## Inference benchmarks

This folder contains a work-in-progress simulation of a Python inference server.

In v0, the backend worker is a single process. It loads a
ResNet-18 checkpoint to `cuda:0` and compiles the model. It accepts requests in
the form of `(tensor, request_time)` from a `multiprocessing.Queue`, runs
inference on each request, and returns `(output, request_time)` on a separate
response `multiprocessing.Queue`.

The frontend worker is a process with three threads:
1. A thread that generates fake data of a given batch size in the form of CPU
   tensors and puts the data into the request queue.
2. A thread that reads responses from the response queue and collects metrics:
   the latency of the first response (which corresponds to the cold start time),
   the average, minimum and maximum response latency, and throughput.
3. A thread that polls `nvidia-smi` for GPU utilization metrics.

For now we omit data preprocessing as well as result post-processing.
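
The request/response flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `server.py`: it swaps in `queue.Queue` and a thread in place of `multiprocessing.Queue` and a separate process (both queue types expose the same `put`/`get` calls), uses nested lists instead of tensors, and replaces ResNet-18 with a trivial stand-in function. The names `backend_loop`, `fake_model`, `run_demo` and the `None` shutdown sentinel are hypothetical.

```python
import queue
import threading
import time


def fake_model(batch):
    # Stand-in for the compiled ResNet-18: sums each fake sample.
    return [sum(sample) for sample in batch]


def backend_loop(request_queue, response_queue, model):
    """Serve (payload, request_time) requests until a None sentinel arrives."""
    while True:
        item = request_queue.get()
        if item is None:  # hypothetical shutdown signal, forwarded downstream
            response_queue.put(None)
            break
        payload, request_time = item
        # Echo request_time back so the frontend can compute latency.
        response_queue.put((model(payload), request_time))


def run_demo(num_requests=5, batch_size=4):
    request_queue, response_queue = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=backend_loop, args=(request_queue, response_queue, fake_model)
    )
    worker.start()

    # Frontend role 1: generate fake batches and enqueue them with a timestamp.
    for _ in range(num_requests):
        batch = [[1.0, 2.0, 3.0] for _ in range(batch_size)]
        request_queue.put((batch, time.perf_counter()))
    request_queue.put(None)

    # Frontend role 2: read responses and record per-request latency.
    outputs, latencies = [], []
    while True:
        response = response_queue.get()
        if response is None:
            break
        output, request_time = response
        outputs.append(output)
        latencies.append(time.perf_counter() - request_time)
    worker.join()
    return outputs, latencies


outputs, latencies = run_demo()
print(f"first response latency: {latencies[0]:.6f}s")
print(f"average latency: {sum(latencies) / len(latencies):.6f}s")
print(f"min/max latency: {min(latencies):.6f}s / {max(latencies):.6f}s")
```

Carrying `request_time` through the backend unchanged lets the response side compute latency without any shared bookkeeping between the two frontend threads.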

### Running a single benchmark

The configurable command-line arguments to the script are as follows:
 - `num_iters` (default: 100): how many requests to send to the backend,
   excluding the first warmup request
 - `batch_size` (default: 32): the batch size of the requests
 - `model_dir` (default: '.'): the directory to load the checkpoint from
 - `compile` (default: compile): pass `--compile` or `--no-compile` to choose whether to `torch.compile()`
   the model
 - `output_file` (default: output.csv): the name of the CSV file to write the outputs to in the `results/` directory
 - `num_workers` (default: 2): the `max_workers` passed to the `ThreadPoolExecutor` in charge of model prediction

For example, a sample command to run the benchmark:

```
python -W ignore server.py --num_iters 1000 --batch_size 32
```

The results will be written to `results/output.csv`; if the file already exists, new results are appended to it.
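
As a rough sketch of how the flags and the append-on-rerun behaviour above could be wired together, the snippet below builds an `argparse` parser with the documented defaults and appends one row to a CSV file, writing the header only when the file is new. It is an assumption about the shape of the script, not a copy of `server.py`; the helper names `build_parser` and `append_row` are hypothetical. `argparse.BooleanOptionalAction` requires Python 3.9 or later.

```python
import argparse
import csv
import os


def build_parser():
    parser = argparse.ArgumentParser(description="Inference benchmark (sketch)")
    parser.add_argument("--num_iters", type=int, default=100)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--model_dir", type=str, default=".")
    # BooleanOptionalAction generates both --compile and --no-compile.
    parser.add_argument(
        "--compile", default=True, action=argparse.BooleanOptionalAction
    )
    parser.add_argument("--output_file", type=str, default="output.csv")
    parser.add_argument("--num_workers", type=int, default=2)
    return parser


def append_row(path, row):
    """Append one result row, writing the header only if the file is new."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["--num_iters", "1000", "--no-compile"])
print(args.num_iters, args.batch_size, args.compile)
```

A call such as `append_row(os.path.join("results", args.output_file), {"batch_size": args.batch_size})` would then add one line per run; the column name here is made up for illustration.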

Note that the `m.compile()` time in the CSV file is not the time taken to compile the model,
which happens during the first iteration, but rather the time for PT2 components
to be lazily imported (e.g. triton).

### Running a sweep

The script `runner.sh` runs a sweep of the benchmark over different batch
sizes with compile on and off, and collects the mean and standard deviation of warmup latency,
average latency, throughput and GPU utilization for each. The `results/` directory will contain the metrics
from running a sweep as we develop this benchmark: `results/output_{batch_size}_{compile}.md`
contains the mean and standard deviation of results for a given batch size and compile setting.
If the file already exists, the metrics from the run are appended as a new row in the markdown table.
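
To illustrate the aggregation step, here is a minimal sketch of turning repeated measurements into one "mean +/- standard deviation" row of a markdown table. How `runner.sh` actually computes and formats these values is not shown in this README, so the cell format, the use of the sample standard deviation, the metric names, and the numbers below are all assumptions made for the example.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of measurements."""
    mean = statistics.mean(samples)
    # stdev needs at least two points; report 0.0 for a single run.
    std = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, std


def markdown_row(metrics):
    """Format one table row with a 'mean +/- std' cell per metric."""
    cells = []
    for samples in metrics.values():
        mean, std = summarize(samples)
        cells.append(f"{mean:.3f} +/- {std:.3f}")
    return "| " + " | ".join(cells) + " |"


# Made-up measurements from two hypothetical runs of the same configuration.
runs = {
    "warmup_latency": [2.0, 4.0],
    "throughput": [100.0, 100.0],
}
print(markdown_row(runs))
```

Appending the returned string as a new line to an existing `.md` file would extend the table by one row, matching the append behaviour described above.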