# `torch.compile()` Benchmarking

This directory contains benchmarking code for TorchDynamo and many
backends, including TorchInductor. It includes three main benchmark suites:

- [TorchBenchmark](https://github.com/pytorch/benchmark): A diverse set of models, initially seeded from
highly cited research models as ranked by [Papers With Code](https://paperswithcode.com). See [torchbench
installation](https://github.com/pytorch/benchmark#installation) and `torchbench.py` for the low-level runner.
[Makefile](Makefile) also contains the commands needed to set up TorchBenchmark to match the versions used in
PyTorch CI.

- Models from [HuggingFace](https://github.com/huggingface/transformers): Primarily transformer models, with
representative models chosen for each category available. The low-level runner (`huggingface.py`) automatically
downloads and installs the needed dependencies on first run.

- Models from [TIMM](https://github.com/huggingface/pytorch-image-models): Primarily vision models, with representative
models chosen for each category available. The low-level runner (`timm_models.py`) automatically downloads and
installs the needed dependencies on first run.

## GPU Performance Dashboard

Daily results from the benchmarks here are available in the [TorchInductor
Performance Dashboard](https://hud.pytorch.org/benchmark/compilers),
currently run on an NVIDIA A100 GPU.

The [inductor-perf-test-nightly.yml](https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml)
workflow generates the data in the performance dashboard. If you have the needed permissions, you can benchmark
your own branch on the PyTorch GitHub repo as follows:
1) Select "Run workflow" in the top right of the [workflow](https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml)
2) Select the branch you want to benchmark
3) Choose the options (such as training vs inference)
4) Click "Run workflow"
5) Wait for the job to complete (4 to 12 hours depending on backlog)
6) Go to the [dashboard](https://hud.pytorch.org/benchmark/compilers)
7) Select your branch and commit at the top of the dashboard

The dashboard compares two commits: a "Base Commit" and a "New Commit".
An entry such as `2.38x → 2.41x` means that the performance improved
from `2.38x` in the base to `2.41x` in the new commit. All performance
results are normalized to eager mode PyTorch (`1x`), and higher is better.


## CPU Performance Dashboard

The [TorchInductor CPU Performance
Dashboard](https://github.com/pytorch/pytorch/issues/93531) is tracked
in a GitHub issue and updated periodically.

## Running Locally

Raw commands used to generate the data for
the performance dashboards can be found
[here](https://github.com/pytorch/pytorch/blob/641ec2115f300a3e3b39c75f6a32ee3f64afcf30/.ci/pytorch/test.sh#L343-L418).

To summarize, there are three scripts, one for each set of benchmarks:
- `./benchmarks/dynamo/torchbench.py ...`
- `./benchmarks/dynamo/huggingface.py ...`
- `./benchmarks/dynamo/timm_models.py ...`

Each of these scripts takes the same set of arguments.
The ones used by dashboards are:
- `--accuracy` or `--performance`: selects between checking correctness and measuring speedup (both are run for the dashboard).
- `--training` or `--inference`: selects between measuring training or inference (both are run for the dashboard).
- `--device=cuda` or `--device=cpu`: selects the device to measure.
- `--amp`, `--bfloat16`, `--float16`, `--float32`: selects the precision to use; `--amp` is used for training and `--bfloat16` for inference.
- `--cold-start-latency`: disables caching to accurately measure compile times.
- `--backend=inductor`: selects TorchInductor as the compiler backend to measure. Many more are available, see `--help`.
- `--output=<filename>.csv`: where to write results.
- `--dynamic-shapes --dynamic-batch-only`: used when the `dynamic` config is enabled.
- `--disable-cudagraphs`: used by configurations without cudagraphs enabled (default).
- `--freezing`: enables additional inference-only optimizations.
- `--cpp-wrapper`: enables C++ wrapper code to lower overheads.
- `TORCHINDUCTOR_MAX_AUTOTUNE=1` (environment variable): used to measure max-autotune mode, which is run weekly due to longer compile times.
- `--export-aot-inductor`: benchmarks ahead-of-time compilation mode.
- `--total-partitions` and `--partition-id`: used to parallelize benchmarking across different machines.
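
As a rough illustration of how these flags combine, the sketch below assembles the command lines for a performance run of one suite, optionally split across partitions. The helper `build_commands`, the per-partition output file naming, and the assumption that partition IDs start at 0 are hypothetical and not part of the runners. The script only prints commands and does not execute them.

```python
import shlex

# Precision flag paired with each mode, following the list above.
PRECISION_FOR_MODE = {"training": "--amp", "inference": "--bfloat16"}


def build_commands(suite, mode, total_partitions=1, device="cuda"):
    """Return one argv list per partition for a performance run (hypothetical helper)."""
    if mode not in PRECISION_FOR_MODE:
        raise ValueError(f"mode must be one of {sorted(PRECISION_FOR_MODE)}")
    commands = []
    for partition_id in range(total_partitions):  # assumes zero-based partition IDs
        # Output naming for partitioned runs is an illustrative choice.
        suffix = f"_{partition_id}" if total_partitions > 1 else ""
        argv = [
            f"./benchmarks/dynamo/{suite}.py",
            "--performance",
            f"--{mode}",
            PRECISION_FOR_MODE[mode],
            f"--device={device}",
            "--backend=inductor",
            f"--output={suite}_{mode}{suffix}.csv",
        ]
        if total_partitions > 1:
            argv.append(f"--total-partitions={total_partitions}")
            argv.append(f"--partition-id={partition_id}")
        commands.append(argv)
    return commands


# Print shell-quoted commands for a two-machine inference run.
for argv in build_commands("huggingface", "inference", total_partitions=2):
    print(shlex.join(argv))
```

Returning argv lists rather than strings keeps quoting explicit; each list could be passed to `subprocess.run` on the machine that owns that partition.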

For debugging, you can run just a single benchmark by adding the `--only=<NAME>` flag.

A complete list of options can be seen by running each of the runners with the `--help` flag.

As an example, the commands to run the first line of the dashboard (performance only) would be:
```
./benchmarks/dynamo/torchbench.py --performance --training --amp --backend=inductor --output=torchbench_training.csv
./benchmarks/dynamo/torchbench.py --performance --inference --bfloat16 --backend=inductor --output=torchbench_inference.csv

./benchmarks/dynamo/huggingface.py --performance --training --amp --backend=inductor --output=huggingface_training.csv
./benchmarks/dynamo/huggingface.py --performance --inference --bfloat16 --backend=inductor --output=huggingface_inference.csv

./benchmarks/dynamo/timm_models.py --performance --training --amp --backend=inductor --output=timm_models_training.csv
./benchmarks/dynamo/timm_models.py --performance --inference --bfloat16 --backend=inductor --output=timm_models_inference.csv
```
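
Once a run finishes, the CSV written by `--output` can be summarized with a few lines of Python. The sketch below is a minimal example that assumes the file has a column named `speedup` holding the per-model speedup over eager mode; that column name is an assumption, so check the header of your own output first. A geometric mean is used here because it is a common way to aggregate ratios, not because this README prescribes it. The helper `geomean_speedup` and the sample rows are made up for illustration.

```python
import csv
import io
import math


def geomean_speedup(csv_file, column="speedup"):
    """Geometric mean of the positive numeric values in `column` (hypothetical helper)."""
    values = []
    for row in csv.DictReader(csv_file):
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows where the entry is missing or not a number
        if value > 0:  # log() is undefined for zero or negative entries
            values.append(value)
    if not values:
        raise ValueError(f"no positive values found in column {column!r}")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Made-up rows for illustration only; real files come from --output.
sample = io.StringIO("name,speedup\nmodel_a,2.0\nmodel_b,8.0\nmodel_c,0.0\n")
print(f"{geomean_speedup(sample):.2f}x")  # prints 4.00x
```

To use it on a real file, open it with `open("torchbench_inference.csv", newline="")` and pass the file object in place of `sample`.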