## Inference benchmarks

This folder contains a work-in-progress simulation of a Python inference server.

The v0 version has a single-process backend worker. It loads a
ResNet-18 checkpoint to 'cuda:0' and compiles the model. It accepts requests in
the form of (tensor, request_time) from a `multiprocessing.Queue`, runs
inference on each request, and returns (output, request_time) on a separate
response `multiprocessing.Queue`.
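
The request/response loop described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: `backend_loop`, `fake_model`, and the `None` shutdown sentinel are hypothetical, a plain function stands in for the compiled ResNet-18, and `queue.Queue` with a thread stands in for `multiprocessing.Queue` with a process (both queue types expose the same `put`/`get` methods) so that the snippet runs without a GPU.

```python
import queue
import threading
import time


def fake_model(batch):
    # Stand-in for the compiled ResNet-18: doubles every input value.
    return [2 * x for x in batch]


def backend_loop(request_queue, response_queue, model):
    # Serve requests until the (assumed) None sentinel arrives.
    while True:
        item = request_queue.get()
        if item is None:
            break
        data, request_time = item
        output = model(data)
        # Echo request_time back so the frontend can compute latency.
        response_queue.put((output, request_time))


request_queue = queue.Queue()
response_queue = queue.Queue()
worker = threading.Thread(
    target=backend_loop, args=(request_queue, response_queue, fake_model)
)
worker.start()

sent_at = time.time()
request_queue.put(([1, 2, 3], sent_at))
output, request_time = response_queue.get(timeout=5)
request_queue.put(None)  # ask the worker to exit
worker.join(timeout=5)
```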

The frontend worker is a process with three threads:
1. A thread that generates fake data of a given batch size in the form of CPU
   tensors and puts the data into the request queue.
2. A thread that reads responses from the response queue and collects metrics:
   the latency of the first response (which corresponds to the cold start time),
   the average, minimum and maximum response latency, and the throughput.
3. A thread that polls nvidia-smi for GPU utilization metrics.
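
As a rough illustration of the metrics the second thread collects, the sketch below derives them from a list of (request_time, response_time) pairs. The function name `summarize_latencies`, the dictionary keys, the exclusion of the warmup request from the steady-state statistics, and the definition of throughput as samples per second over the post-warmup window are all assumptions made for this example, not the benchmark's actual definitions.

```python
def summarize_latencies(timings, batch_size):
    """timings: list of (request_time, response_time) in seconds, in arrival order."""
    latencies = [resp - req for req, resp in timings]
    warmup_latency = latencies[0]  # first response includes the cold start
    steady = latencies[1:]
    # Wall-clock window covering all post-warmup requests.
    window = timings[-1][1] - timings[1][0]
    return {
        "warmup_latency": warmup_latency,
        "average_latency": sum(steady) / len(steady),
        "min_latency": min(steady),
        "max_latency": max(steady),
        "throughput": batch_size * len(steady) / window,
    }


# Hypothetical timings: one slow warmup request, then three fast ones.
timings = [(0.0, 4.0), (4.0, 4.5), (4.5, 4.75), (4.75, 5.0)]
metrics = summarize_latencies(timings, batch_size=32)
```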

For now we omit data preprocessing as well as result post-processing.

### Running a single benchmark

The configurable command line arguments to the script are as follows:
  - `num_iters` (default: 100): how many requests to send to the backend,
    excluding the first warmup request
  - `batch_size` (default: 32): the batch size of the requests
  - `model_dir` (default: '.'): the directory to load the checkpoint from
  - `compile` (default: compile): whether to `torch.compile()` the model;
    pass `--no-compile` to disable
  - `output_file` (default: output.csv): the name of the CSV file to write the outputs to in the `results/` directory
  - `num_workers` (default: 2): the `max_workers` passed to the `ThreadPoolExecutor` in charge of model prediction
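
One way flags like these could be declared with `argparse` is sketched below. It mirrors the names and defaults listed above but is not copied from `server.py`; `build_parser` is a hypothetical helper. It assumes Python 3.9 or later, where `argparse.BooleanOptionalAction` generates the paired `--compile` / `--no-compile` flags.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Inference benchmark (sketch)")
    parser.add_argument("--num_iters", type=int, default=100)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--model_dir", type=str, default=".")
    # BooleanOptionalAction adds both --compile and --no-compile.
    parser.add_argument(
        "--compile", default=True, action=argparse.BooleanOptionalAction
    )
    parser.add_argument("--output_file", type=str, default="output.csv")
    parser.add_argument("--num_workers", type=int, default=2)
    return parser


# Parse an explicit list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--num_iters", "1000", "--no-compile"])
```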

For example, a sample command to run the benchmark:

```
python -W ignore server.py --num_iters 1000 --batch_size 32
```

The results will be written to `results/output.csv`; if the file already exists, new results are appended to it.
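
The append behavior can be approximated as in the sketch below, which writes a header only when the file is new. The helper name `append_result` and the column names are hypothetical, and the example writes into a temporary directory rather than `results/`.

```python
import csv
import os
import tempfile


def append_result(path, row):
    # Write the header only when creating the file; otherwise just append.
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "output.csv")
append_result(out_path, {"batch_size": 32, "throughput": 96.0})
append_result(out_path, {"batch_size": 64, "throughput": 120.5})

# Read the file back to confirm both rows share a single header.
with open(out_path, newline="") as f:
    rows = list(csv.DictReader(f))
```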

Note that the `m.compile()` time in the CSV file is not the time for the model to be compiled,
which happens during the first iteration, but rather the time for PT2 components
to be lazily imported (e.g. triton).

### Running a sweep

The script `runner.sh` runs a sweep of the benchmark over different batch
sizes with compile on and off, and collects the mean and standard deviation of warmup latency,
average latency, throughput and GPU utilization for each. The `results/` directory contains the metrics
from running sweeps as we develop this benchmark; `results/output_{batch_size}_{compile}.md`
contains the mean and standard deviation of results for a given batch size and compile setting.
If the file already exists, the metrics from the run are appended as a new row in the markdown table.
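
The aggregation step can be illustrated as follows. This sketch is written in Python for consistency with the rest of the benchmark and does not reproduce `runner.sh`; the function names, the `mean +/- std` cell format, and the use of the sample standard deviation are assumptions.

```python
import statistics


def summarize(values):
    # Sample standard deviation; needs at least two runs.
    return statistics.mean(values), statistics.stdev(values)


def markdown_row(metrics):
    """metrics: dict mapping metric name -> list of per-run values."""
    cells = []
    for values in metrics.values():
        mean, std = summarize(values)
        cells.append(f"{mean:.3f} +/- {std:.3f}")
    return "| " + " | ".join(cells) + " |"


# Hypothetical results from three runs of one (batch_size, compile) setting.
runs = {
    "warmup_latency": [4.0, 5.0, 6.0],
    "throughput": [90.0, 100.0, 110.0],
}
row = markdown_row(runs)
```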