| Name | Date | Size | #Lines | LOC |
|------|------|------|--------|-----|
| `ci_expected_accuracy/` | 25-Apr-2025 | - | 14,098 | 3,647 |
| `microbenchmarks/` | 25-Apr-2025 | - | 29,505 | 28,899 |
| `pr_time_benchmarks/` | 25-Apr-2025 | - | 508 | 376 |
| `Makefile` | 25-Apr-2025 | 3.1 KiB | 43 | 35 |
| `README.md` | 25-Apr-2025 | 5.3 KiB | 92 | 70 |
| `__init__.py` | 25-Apr-2025 | 0 | 1 | 0 |
| `all_torchbench_models_list.txt` | 25-Apr-2025 | 1.2 KiB | 73 | 73 |
| `benchmarks.py` | 25-Apr-2025 | 2.9 KiB | 104 | 75 |
| `check_accuracy.py` | 25-Apr-2025 | 2.9 KiB | 102 | 79 |
| `check_csv.py` | 25-Apr-2025 | 829 | 41 | 31 |
| `check_graph_breaks.py` | 25-Apr-2025 | 2.4 KiB | 86 | 68 |
| `check_memory_compression_ratio.py` | 25-Apr-2025 | 1.6 KiB | 58 | 49 |
| `check_perf_csv.py` | 25-Apr-2025 | 1,004 | 44 | 34 |
| `combine_csv.py` | 25-Apr-2025 | 1.3 KiB | 51 | 40 |
| `common.py` | 25-Apr-2025 | 150.9 KiB | 4,319 | 3,553 |
| `dist_util.py` | 25-Apr-2025 | 4 KiB | 148 | 116 |
| `distributed.py` | 25-Apr-2025 | 5.5 KiB | 178 | 145 |
| `expected_ci_perf_inductor_torchbench.csv` | 25-Apr-2025 | 2.8 KiB | 56 | 55 |
| `expected_ci_speedup_inductor_torchbench_cpu.csv` | 25-Apr-2025 | 1.6 KiB | 30 | 29 |
| `huggingface.py` | 25-Apr-2025 | 20.2 KiB | 628 | 504 |
| `huggingface.yaml` | 25-Apr-2025 | 2.8 KiB | 110 | 81 |
| `huggingface_models_list.txt` | 25-Apr-2025 | 1.3 KiB | 52 | 51 |
| `huggingface_models_list_cpu.txt` | 25-Apr-2025 | 1.2 KiB | 48 | 47 |
| `join_results.py` | 25-Apr-2025 | 1.5 KiB | 57 | 44 |
| `parse_logs.py` | 25-Apr-2025 | 5.7 KiB | 199 | 136 |
| `run_all.sh` | 25-Apr-2025 | 1.6 KiB | 39 | 12 |
| `run_delta.sh` | 25-Apr-2025 | 751 | 23 | 13 |
| `runner.py` | 25-Apr-2025 | 53.3 KiB | 1,548 | 1,336 |
| `summarize_perf.py` | 25-Apr-2025 | 4.4 KiB | 145 | 119 |
| `test.py` | 25-Apr-2025 | 1.2 KiB | 46 | 36 |
| `timm_models.py` | 25-Apr-2025 | 11.9 KiB | 423 | 331 |
| `timm_models_list.txt` | 25-Apr-2025 | 1.1 KiB | 62 | 61 |
| `timm_models_list_cpu.txt` | 25-Apr-2025 | 1 KiB | 60 | 59 |
| `torchao_backend.py` | 25-Apr-2025 | 2.2 KiB | 58 | 47 |
| `torchbench.py` | 25-Apr-2025 | 14.6 KiB | 468 | 366 |
| `torchbench.yaml` | 25-Apr-2025 | 5.3 KiB | 269 | 178 |
| `torchbench_models_list.txt` | 25-Apr-2025 | 463 | 29 | 28 |
| `torchbench_models_list_cpu.txt` | 25-Apr-2025 | 803 | 49 | 48 |
| `training_loss.py` | 25-Apr-2025 | 6.3 KiB | 204 | 171 |

README.md

# `torch.compile()` Benchmarking

This directory contains benchmarking code for TorchDynamo and many
backends, including TorchInductor.  It includes three main benchmark suites:

- [TorchBenchmark](https://github.com/pytorch/benchmark): A diverse set of models, initially seeded from
highly cited research models as ranked by [Papers With Code](https://paperswithcode.com).  See [torchbench
installation](https://github.com/pytorch/benchmark#installation) and `torchbench.py` for the low-level runner.
The [Makefile](Makefile) also contains the commands needed to set up TorchBenchmark to match the versions used in
PyTorch CI.

- Models from [HuggingFace](https://github.com/huggingface/transformers): Primarily transformer models, with
representative models chosen for each available category.  The low-level runner (`huggingface.py`) automatically
downloads and installs the needed dependencies on first run.

- Models from [TIMM](https://github.com/huggingface/pytorch-image-models): Primarily vision models, with
representative models chosen for each available category.  The low-level runner (`timm_models.py`) automatically
downloads and installs the needed dependencies on first run.


## GPU Performance Dashboard

Daily results from the benchmarks here are available in the [TorchInductor
Performance Dashboard](https://hud.pytorch.org/benchmark/compilers),
currently run on an NVIDIA A100 GPU.

The [inductor-perf-test-nightly.yml](https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml)
workflow generates the data in the performance dashboard.  If you have the needed permissions, you can benchmark
your own branch on the PyTorch GitHub repo as follows:
1) Select "Run workflow" in the top right of the [workflow](https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml)
2) Select the branch you want to benchmark
3) Choose the options (such as training vs inference)
4) Click "Run workflow"
5) Wait for the job to complete (4 to 12 hours depending on backlog)
6) Go to the [dashboard](https://hud.pytorch.org/benchmark/compilers)
7) Select your branch and commit at the top of the dashboard

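If you prefer the command line, the same workflow dispatch can in principle be triggered with the GitHub CLI.  This is a minimal sketch, not a verified recipe: the branch name is a placeholder, it still requires the same permissions, and the training/inference choices are workflow inputs whose exact names you should check with `gh workflow view inductor-perf-test-nightly.yml` before relying on them.

```
# Hypothetical command-line dispatch of the nightly perf workflow (branch name is a placeholder).
gh workflow run inductor-perf-test-nightly.yml --repo pytorch/pytorch --ref my-feature-branch
```
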
The dashboard compares two commits: a "Base Commit" and a "New Commit".
An entry such as `2.38x → 2.41x` means that the performance improved
from `2.38x` in the base commit to `2.41x` in the new commit.  All performance
results are normalized to eager mode PyTorch (`1x`), and higher is better.


## CPU Performance Dashboard

The [TorchInductor CPU Performance
Dashboard](https://github.com/pytorch/pytorch/issues/93531) is tracked
on a GitHub issue and updated periodically.

## Running Locally

The raw commands used to generate the data for
the performance dashboards can be found
[here](https://github.com/pytorch/pytorch/blob/641ec2115f300a3e3b39c75f6a32ee3f64afcf30/.ci/pytorch/test.sh#L343-L418).

To summarize, there are three scripts, one per benchmark suite:
- `./benchmarks/dynamo/torchbench.py ...`
- `./benchmarks/dynamo/huggingface.py ...`
- `./benchmarks/dynamo/timm_models.py ...`

Each of these scripts takes the same set of arguments.  The ones used by the dashboards are:
- `--accuracy` or `--performance`: selects between checking correctness and measuring speedup (both are run for the dashboard).
- `--training` or `--inference`: selects between measuring training or inference (both are run for the dashboard).
- `--device=cuda` or `--device=cpu`: selects the device to measure.
- `--amp`, `--bfloat16`, `--float16`, `--float32`: selects the precision to use; `--amp` is used for training and `--bfloat16` for inference.
- `--cold-start-latency`: disables caching to accurately measure compile times.
- `--backend=inductor`: selects TorchInductor as the compiler backend to measure.  Many more are available, see `--help`.
- `--output=<filename>.csv`: where to write results.
- `--dynamic-shapes --dynamic-batch-only`: used when the `dynamic` config is enabled.
- `--disable-cudagraphs`: used by configurations without cudagraphs enabled (the default).
- `--freezing`: enables additional inference-only optimizations.
- `--cpp-wrapper`: enables C++ wrapper code to lower overheads.
- `TORCHINDUCTOR_MAX_AUTOTUNE=1` (environment variable): used to measure max-autotune mode, which is run weekly due to longer compile times.
- `--export-aot-inductor`: benchmarks ahead-of-time compilation mode.
- `--total-partitions` and `--partition-id`: used to parallelize benchmarking across different machines.

For debugging, you can run just a single benchmark by adding the `--only=<NAME>` flag; see the sketch below for one way to combine it with the flags above.

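As an illustrative (not prescriptive) combination of these flags, the command below benchmarks one model in max-autotune mode; `resnet50` stands in for whichever `--only` target you care about, and the output filename is arbitrary:

```
# Benchmark a single TorchBench model in max-autotune mode (illustrative flag combination).
TORCHINDUCTOR_MAX_AUTOTUNE=1 ./benchmarks/dynamo/torchbench.py \
  --performance --inference --bfloat16 --backend=inductor \
  --only=resnet50 --output=single_model_inference.csv
```
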
A complete list of options can be seen by running each of the runners with the `--help` flag.

As an example, the commands to run the first line of the dashboard (performance only) would be:
```
./benchmarks/dynamo/torchbench.py --performance --training --amp --backend=inductor --output=torchbench_training.csv
./benchmarks/dynamo/torchbench.py --performance --inference --bfloat16 --backend=inductor --output=torchbench_inference.csv

./benchmarks/dynamo/huggingface.py --performance --training --amp --backend=inductor --output=huggingface_training.csv
./benchmarks/dynamo/huggingface.py --performance --inference --bfloat16 --backend=inductor --output=huggingface_inference.csv

./benchmarks/dynamo/timm_models.py --performance --training --amp --backend=inductor --output=timm_models_training.csv
./benchmarks/dynamo/timm_models.py --performance --inference --bfloat16 --backend=inductor --output=timm_models_inference.csv
```

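The dashboard also runs accuracy checks for each configuration; presumably the matching commands simply swap `--performance` for `--accuracy` (with a different `--output` file), for example:

```
# Hedged sketch of the accuracy counterpart to the first TorchBench row above.
./benchmarks/dynamo/torchbench.py --accuracy --training --amp --backend=inductor --output=torchbench_training_accuracy.csv
```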