# `torch.compile()` Benchmarking

This directory contains benchmarking code for TorchDynamo and many
backends including TorchInductor.  It includes three main benchmark suites:

- [TorchBenchmark](https://github.com/pytorch/benchmark): A diverse set of models, initially seeded from
highly cited research models as ranked by [Papers With Code](https://paperswithcode.com).  See [torchbench
installation](https://github.com/pytorch/benchmark#installation) and `torchbench.py` for the low-level runner.
The [Makefile](Makefile) also contains the commands needed to set up TorchBenchmark to match the versions used in
PyTorch CI.

- Models from [HuggingFace](https://github.com/huggingface/transformers): Primarily transformer models, with
representative models chosen for each category available.  The low-level runner (`huggingface.py`) automatically
downloads and installs the needed dependencies on first run.

- Models from [TIMM](https://github.com/huggingface/pytorch-image-models): Primarily vision models, with representative
models chosen for each category available.  The low-level runner (`timm_models.py`) automatically downloads and
installs the needed dependencies on first run.


## GPU Performance Dashboard

Daily results from the benchmarks here are available in the [TorchInductor
Performance Dashboard](https://hud.pytorch.org/benchmark/compilers),
currently run on an NVIDIA A100 GPU.

The [inductor-perf-test-nightly.yml](https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml)
workflow generates the data in the performance dashboard.  If you have the needed permissions, you can benchmark
your own branch on the PyTorch GitHub repo as follows:
1) Select "Run workflow" in the top right of the [workflow](https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml)
2) Select the branch you want to benchmark
3) Choose the options (such as training vs inference)
4) Click "Run workflow"
5) Wait for the job to complete (4 to 12 hours depending on backlog)
6) Go to the [dashboard](https://hud.pytorch.org/benchmark/compilers)
7) Select your branch and commit at the top of the dashboard

The dashboard compares two commits: a "Base Commit" and a "New Commit".
An entry such as `2.38x → 2.41x` means that the performance improved
from `2.38x` in the base to `2.41x` in the new commit.  All performance
results are normalized to eager mode PyTorch (`1x`), and higher is better.
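To make the normalization concrete, here is a minimal sketch of how such numbers can be read. It assumes that a speedup is the ratio of eager-mode time to compiled time, which is consistent with eager being `1x` and higher being better. The function names and timings are hypothetical and are not taken from the dashboard code.

```python
def speedup(eager_seconds: float, compiled_seconds: float) -> float:
    """Speedup relative to eager mode (assumed: eager time / compiled time)."""
    return eager_seconds / compiled_seconds


def relative_change(base_speedup: float, new_speedup: float) -> float:
    """Fractional change between the base commit and the new commit."""
    return new_speedup / base_speedup - 1.0


# Hypothetical timings: eager takes 0.238 s per iteration, compiled takes 0.100 s.
print(f"{speedup(0.238, 0.100):.2f}x")

# The `2.38x → 2.41x` entry from the text, expressed as a relative improvement.
print(f"{relative_change(2.38, 2.41):+.2%}")  # +1.26%
```

A change of this size is small, so it is worth comparing it against run-to-run noise before drawing conclusions.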


## CPU Performance Dashboard

The [TorchInductor CPU Performance
Dashboard](https://github.com/pytorch/pytorch/issues/93531) is tracked
on a GitHub issue and updated periodically.

## Running Locally

Raw commands used to generate the data for
the performance dashboards can be found
[here](https://github.com/pytorch/pytorch/blob/641ec2115f300a3e3b39c75f6a32ee3f64afcf30/.ci/pytorch/test.sh#L343-L418).

To summarize, there are three scripts, one for each set of benchmarks:
- `./benchmarks/dynamo/torchbench.py ...`
- `./benchmarks/dynamo/huggingface.py ...`
- `./benchmarks/dynamo/timm_models.py ...`

Each of these scripts takes the same set of arguments.  The ones used by dashboards are:
- `--accuracy` or `--performance`: selects between checking correctness and measuring speedup (both are run for the dashboard).
- `--training` or `--inference`: selects between measuring training or inference (both are run for the dashboard).
- `--device=cuda` or `--device=cpu`: selects device to measure.
- `--amp`, `--bfloat16`, `--float16`, `--float32`: selects the precision to use. `--amp` is used for training and `--bfloat16` for inference.
- `--cold-start-latency`: disables caching to accurately measure compile times.
- `--backend=inductor`: selects TorchInductor as the compiler backend to measure.  Many more are available, see `--help`.
- `--output=<filename>.csv`: where to write results to.
- `--dynamic-shapes --dynamic-batch-only`: used when the `dynamic` config is enabled.
- `--disable-cudagraphs`: used by configurations without cudagraphs enabled (default).
- `--freezing`: enable additional inference-only optimizations.
- `--cpp-wrapper`: enable C++ wrapper code to lower overheads.
- `TORCHINDUCTOR_MAX_AUTOTUNE=1` (environment variable): used to measure max-autotune mode, which is run weekly due to longer compile times.
- `--export-aot-inductor`: benchmarks ahead-of-time compilation mode.
- `--total-partitions` and `--partition-id`: used to parallelize benchmarking across different machines.
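As an illustration of the partitioning flags, the following sketch builds one command line per machine. It only assembles strings and does not run anything. The helper name `partition_commands` is hypothetical, and the sketch assumes that partition ids are zero-based; check `--help` before relying on that.

```python
import shlex


def partition_commands(runner, total_partitions, extra_args=()):
    """Return one shell command per partition (assumes zero-based partition ids)."""
    if total_partitions < 1:
        raise ValueError("total_partitions must be at least 1")
    commands = []
    for partition_id in range(total_partitions):
        argv = [
            runner,
            *extra_args,
            f"--total-partitions={total_partitions}",
            f"--partition-id={partition_id}",
        ]
        # shlex.join quotes arguments where needed so the result is shell-safe.
        commands.append(shlex.join(argv))
    return commands


for command in partition_commands(
    "./benchmarks/dynamo/torchbench.py",
    3,
    ["--performance", "--inference", "--bfloat16", "--backend=inductor"],
):
    print(command)
```

Each machine would then run exactly one of the printed commands, ideally with its own `--output` file so that results do not overwrite each other.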

For debugging you can run just a single benchmark by adding the `--only=<NAME>` flag.

A complete list of options can be seen by running each of the runners with the `--help` flag.
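Once a run with `--performance` and `--output=<filename>.csv` has finished, the results can be summarized with a few lines of Python. The sketch below assumes the CSV has `name` and `speedup` columns; inspect the header of your own output file, since the exact columns are not documented here. Using a geometric mean to aggregate ratios is a choice made for this example, not a statement about how the dashboard aggregates.

```python
import csv
import io
from statistics import geometric_mean


def summarize_speedups(csv_file):
    """Return (geometric mean, per-model speedups) from an open CSV file.

    Assumes columns called `name` and `speedup` (an assumption, see above).
    """
    reader = csv.DictReader(csv_file)
    speedups = {row["name"]: float(row["speedup"]) for row in reader}
    return geometric_mean(list(speedups.values())), speedups


# Hypothetical data standing in for a real results file.
sample = io.StringIO("name,speedup\nmodel_a,2.0\nmodel_b,8.0\n")
mean, per_model = summarize_speedups(sample)
print(f"geomean speedup: {mean:.2f}x over {len(per_model)} models")
```

For a real file, replace the `io.StringIO` object with `open("torchbench_inference.csv", newline="")` or whichever path was passed to `--output`.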

As an example, the commands to run the first line of the dashboard (performance only) would be:
```
./benchmarks/dynamo/torchbench.py --performance --training --amp --backend=inductor --output=torchbench_training.csv
./benchmarks/dynamo/torchbench.py --performance --inference --bfloat16 --backend=inductor --output=torchbench_inference.csv

./benchmarks/dynamo/huggingface.py --performance --training --amp --backend=inductor --output=huggingface_training.csv
./benchmarks/dynamo/huggingface.py --performance --inference --bfloat16 --backend=inductor --output=huggingface_inference.csv

./benchmarks/dynamo/timm_models.py --performance --training --amp --backend=inductor --output=timm_models_training.csv
./benchmarks/dynamo/timm_models.py --performance --inference --bfloat16 --backend=inductor --output=timm_models_inference.csv
```
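The six commands above follow a regular pattern: three suites, each run once for training with `--amp` and once for inference with `--bfloat16`. A small sketch that generates them, which may be handy when scripting variations, is shown below. The helper name `dashboard_commands` is hypothetical, and it only produces strings.

```python
from itertools import product

SUITES = ["torchbench", "huggingface", "timm_models"]
# Precision flag used for each mode, as described in the options list.
MODES = {"training": "--amp", "inference": "--bfloat16"}


def dashboard_commands(backend="inductor"):
    """Build the performance commands for every suite and mode."""
    commands = []
    for suite, (mode, precision) in product(SUITES, MODES.items()):
        commands.append(
            f"./benchmarks/dynamo/{suite}.py --performance --{mode} {precision} "
            f"--backend={backend} --output={suite}_{mode}.csv"
        )
    return commands


for command in dashboard_commands():
    print(command)
```

Passing a different `backend` value changes only the `--backend=` argument; the available backends are listed by `--help`.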