# `torch.compile()` Benchmarking

This directory contains benchmarking code for TorchDynamo and many
backends, including TorchInductor.  It includes three main benchmark suites:

- [TorchBenchmark](https://github.com/pytorch/benchmark): A diverse set of models, initially seeded from
highly cited research models as ranked by [Papers With Code](https://paperswithcode.com).  See [torchbench
installation](https://github.com/pytorch/benchmark#installation) and `torchbench.py` for the low-level runner.
[Makefile](Makefile) also contains the commands needed to set up TorchBenchmark to match the versions used in
PyTorch CI.

- Models from [HuggingFace](https://github.com/huggingface/transformers): Primarily transformer models, with
representative models chosen for each category available.  The low-level runner (`huggingface.py`) automatically
downloads and installs the needed dependencies on first run.

- Models from [TIMM](https://github.com/huggingface/pytorch-image-models): Primarily vision models, with representative
models chosen for each category available.  The low-level runner (`timm_models.py`) automatically downloads and
installs the needed dependencies on first run.

## GPU Performance Dashboard

Daily results from the benchmarks here are available in the [TorchInductor
Performance Dashboard](https://hud.pytorch.org/benchmark/compilers),
currently run on an NVIDIA A100 GPU.

The [inductor-perf-test-nightly.yml](https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml)
workflow generates the data in the performance dashboard.  If you have the needed permissions, you can benchmark
your own branch on the PyTorch GitHub repo as follows:
1) Select "Run workflow" in the top right of the [workflow](https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml)
2) Select the branch you want to benchmark
3) Choose the options (such as training vs inference)
4) Click "Run workflow"
5) Wait for the job to complete (4 to 12 hours depending on backlog)
6) Go to the [dashboard](https://hud.pytorch.org/benchmark/compilers)
7) Select your branch and commit at the top of the dashboard

The dashboard compares two commits: a "Base Commit" and a "New Commit".
An entry such as `2.38x → 2.41x` means that the performance improved
from `2.38x` in the base to `2.41x` in the new commit.  All performance
results are normalized to eager mode PyTorch (`1x`), and higher is better.
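
To make the normalization concrete, here is a minimal sketch of how such an entry can be read.  It assumes the speedup is computed as eager latency divided by compiled latency, which is consistent with eager mode being `1x` and higher being better.  The function names and latency values are hypothetical and are not part of the benchmark scripts.

```python
def speedup(eager_latency_ms: float, compiled_latency_ms: float) -> float:
    """Speedup over eager mode (assumed: eager latency / compiled latency)."""
    if eager_latency_ms <= 0 or compiled_latency_ms <= 0:
        raise ValueError("latencies must be positive")
    return eager_latency_ms / compiled_latency_ms


def relative_change(base: float, new: float) -> float:
    """Fractional change between two speedups, e.g. base commit vs new commit."""
    return (new - base) / base


# Hypothetical latencies: eager takes 100 ms, the compiled model takes 40 ms.
print(f"{speedup(100.0, 40.0):.2f}x")  # 2.50x

# The example dashboard entry `2.38x → 2.41x`.
print(f"{relative_change(2.38, 2.41):+.1%}")  # +1.3%
```

Read this way, `2.38x → 2.41x` is roughly a 1.3% improvement in speedup relative to the base commit.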

## CPU Performance Dashboard

The [TorchInductor CPU Performance
Dashboard](https://github.com/pytorch/pytorch/issues/93531) is tracked
on a GitHub issue and updated periodically.

## Running Locally

The raw commands used to generate the data for
the performance dashboards can be found
[here](https://github.com/pytorch/pytorch/blob/641ec2115f300a3e3b39c75f6a32ee3f64afcf30/.ci/pytorch/test.sh#L343-L418).

To summarize, there are three scripts, one for each set of benchmarks:
- `./benchmarks/dynamo/torchbench.py ...`
- `./benchmarks/dynamo/huggingface.py ...`
- `./benchmarks/dynamo/timm_models.py ...`

Each of these scripts takes the same set of arguments.  The ones used by the dashboards are:
- `--accuracy` or `--performance`: selects between checking correctness and measuring speedup (both are run for the dashboard).
- `--training` or `--inference`: selects between measuring training or inference (both are run for the dashboard).
- `--device=cuda` or `--device=cpu`: selects the device to measure.
- `--amp`, `--bfloat16`, `--float16`, `--float32`: selects the precision to use.  `--amp` is used for training and `--bfloat16` for inference.
- `--cold-start-latency`: disables caching to accurately measure compile times.
- `--backend=inductor`: selects TorchInductor as the compiler backend to measure.  Many more are available; see `--help`.
- `--output=<filename>.csv`: where to write results.
- `--dynamic-shapes --dynamic-batch-only`: used when the `dynamic` config is enabled.
- `--disable-cudagraphs`: used by configurations without cudagraphs enabled (default).
- `--freezing`: enables additional inference-only optimizations.
- `--cpp-wrapper`: enables C++ wrapper code to lower overheads.
- `TORCHINDUCTOR_MAX_AUTOTUNE=1` (environment variable): used to measure max-autotune mode, which is run weekly due to longer compile times.
- `--export-aot-inductor`: benchmarks ahead-of-time compilation mode.
- `--total-partitions` and `--partition-id`: used to parallelize benchmarking across different machines.

For debugging, you can run just a single benchmark by adding the `--only=<NAME>` flag.

A complete list of options can be seen by running each of the runners with the `--help` flag.
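
The dashboard arguments listed above can also be assembled programmatically.  The following is a minimal sketch, not part of this directory: `dashboard_command` and `DASHBOARD_PRECISION` are hypothetical names, and the helper only combines flags and script paths that are described above.

```python
import shlex

# Precision flags that the argument list above gives for each mode.
DASHBOARD_PRECISION = {"training": "--amp", "inference": "--bfloat16"}


def dashboard_command(suite, mode, only=None, device=None):
    """Build the argument list for one performance run.

    `suite` is the name of a runner script in this directory without the
    `.py` suffix (for example "torchbench"); `mode` is "training" or
    "inference".  This helper is illustrative only.
    """
    if mode not in DASHBOARD_PRECISION:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = [
        f"./benchmarks/dynamo/{suite}.py",
        "--performance",
        f"--{mode}",
        DASHBOARD_PRECISION[mode],
        "--backend=inductor",
        f"--output={suite}_{mode}.csv",
    ]
    if device is not None:
        cmd.append(f"--device={device}")
    if only is not None:
        cmd.append(f"--only={only}")  # restrict the run to a single benchmark
    return cmd


# Print one command per suite and mode; nothing is executed here.
for suite in ("torchbench", "huggingface", "timm_models"):
    for mode in ("training", "inference"):
        print(shlex.join(dashboard_command(suite, mode)))
```

To execute a command rather than print it, the returned list could be passed to `subprocess.run(cmd, check=True)` from the root of a PyTorch checkout, assuming the benchmark dependencies are already installed.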

As an example, the commands to run the first line of the dashboard (performance only) would be:
```
./benchmarks/dynamo/torchbench.py --performance --training --amp --backend=inductor --output=torchbench_training.csv
./benchmarks/dynamo/torchbench.py --performance --inference --bfloat16 --backend=inductor --output=torchbench_inference.csv

./benchmarks/dynamo/huggingface.py --performance --training --amp --backend=inductor --output=huggingface_training.csv
./benchmarks/dynamo/huggingface.py --performance --inference --bfloat16 --backend=inductor --output=huggingface_inference.csv

./benchmarks/dynamo/timm_models.py --performance --training --amp --backend=inductor --output=timm_models_training.csv
./benchmarks/dynamo/timm_models.py --performance --inference --bfloat16 --backend=inductor --output=timm_models_inference.csv
```
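
Once a run has written its CSV, the per-model results can be summarized with a few lines of Python.  This is a sketch under stated assumptions: it assumes the CSV has a header row with a numeric `speedup` column, which this README does not specify, so check the header of your own file first.  The geometric mean is used because it is a common way to aggregate ratios; whether it matches the dashboard's own aggregation is not stated here.

```python
import csv
import io
import math


def geomean_speedup(csv_file, column="speedup"):
    """Geometric mean of the positive values in `column`.

    The column name is an assumption; adjust it to match your CSV header.
    """
    values = []
    for row in csv.DictReader(csv_file):
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric entry
        if value > 0:  # the logarithm is undefined for non-positive values
            values.append(value)
    if not values:
        raise ValueError(f"no positive values found in column {column!r}")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical results standing in for a file such as torchbench_inference.csv.
sample = io.StringIO("name,speedup\nmodel_a,2.0\nmodel_b,8.0\nmodel_c,0.0\n")
print(f"{geomean_speedup(sample):.2f}x")  # 4.00x
```

For a real file, open it with `open("torchbench_inference.csv", newline="")` and pass the file object to `geomean_speedup`.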