# Instruction count microbenchmarks
## Quick start

### To run the benchmark:

```
# From pytorch root
cd benchmarks/instruction_counts
python main.py
```

Currently `main.py` contains a very simple threadpool (so that run time isn't
unbearably onerous) and simply prints the results. These components will be
upgraded in subsequent PRs.

### To define a new benchmark:
* `TimerArgs`: Low level definition which maps directly to
`torch.utils.benchmark.Timer`. (A minimal sketch follows this list.)
* `GroupedStmts`: Benchmark a snippet (Python, C++, or both). Can automatically
generate TorchScript and autograd variants.
* `GroupedModules`: Like `GroupedStmts`, but takes `nn.Module`s.
* `GroupedVariants`: Benchmark-per-line to define many related benchmarks in a
single code block.

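For the lowest level option, a benchmark can be declared directly as a
`TimerArgs` record. The sketch below is illustrative only: it assumes
`TimerArgs` lives alongside `GroupedBenchmark` in `core/api.py` and mirrors the
`stmt` / `setup` / `num_threads` arguments of `torch.utils.benchmark.Timer`,
rather than documenting the checked-in API.

```
# Hypothetical low level definition. Field names mirror
# `torch.utils.benchmark.Timer` and are assumptions, not the actual API.
from core.api import TimerArgs

flat_benchmark = TimerArgs(
    stmt="y = x + x",
    setup="x = torch.ones((4, 4))",
    num_threads=1,
)
```
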
## Architecture
### Benchmark definition.

One primary goal of this suite is to make it easy to define semantically
related clusters of benchmarks. The crux of this effort is the
`GroupedBenchmark` class, which is defined in `core/api.py`. It takes a
definition for a set of related benchmarks, and produces one or more concrete
cases. It's helpful to see an example to understand how the machinery works.
Consider the following benchmark:

```
# `GroupedStmts` is an alias of `GroupedBenchmark.init_from_stmts`
benchmark = GroupedStmts(
    py_stmt=r"y = x * w",
    cpp_stmt=r"auto y = x * w;",

    setup=GroupedSetup(
        py_setup="""
            x = torch.ones((4, 4))
            w = torch.ones((4, 4), requires_grad=True)
        """,
        cpp_setup="""
            auto x = torch::ones({4, 4});
            auto w = torch::ones({4, 4});
            w.set_requires_grad(true);
        """,
    ),

    signature="f(x, w) -> y",
    torchscript=True,
    autograd=True,
)
```

It is trivial to generate Timers for the eager forward mode case (ignoring
`num_threads` for now):

```
Timer(
    stmt=benchmark.py_fwd_stmt,
    setup=benchmark.setup.py_setup,
)

Timer(
    stmt=benchmark.cpp_fwd_stmt,
    setup=benchmark.setup.cpp_setup,
    language="cpp",
)
```

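For the definition above, `benchmark.py_fwd_stmt` is simply the `py_stmt` that
was passed in, so the first Timer is equivalent to the fully spelled out form
below (shown only for concreteness):

```
Timer(
    stmt="y = x * w",
    setup="""
        x = torch.ones((4, 4))
        w = torch.ones((4, 4), requires_grad=True)
    """,
)
```
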
Moreover, because `signature` is provided, we know that creation of `x` and `w`
is part of setup, and the overall computation uses `x` and `w` to produce `y`.
As a result, we can derive TorchScript and AutoGrad variants as well. We can
deduce that a TorchScript model will take the form:

```
@torch.jit.script
def f(x, w):
    # Paste `benchmark.py_fwd_stmt` into the function body.
    y = x * w
    return y  # Set by `-> y` in signature.
```

And because we will want to use this model in both Python and C++, we save it to
disk and load it as needed. At this point Timers for TorchScript become:

```
Timer(
    stmt="""
        y = jit_model(x, w)
    """,
    setup="""
        # benchmark.setup.py_setup
        # jit_model = torch.jit.load(...)
        # Warm up jit_model
    """,
)

Timer(
    stmt="""
        std::vector<torch::jit::IValue> ivalue_inputs({
            torch::jit::IValue(x),
            torch::jit::IValue(w)
        });
        auto y = jit_model.forward(ivalue_inputs);
    """,
    setup="""
        # benchmark.setup.cpp_setup
        # jit_model = torch::jit::load(...)
        # Warm up jit_model
    """,
)
```

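The artifact round trip itself is ordinary TorchScript serialization. A minimal
sketch (paths and names here are illustrative, not what the suite actually
writes; a scripted module stands in for the generated function):

```
import torch

class F(torch.nn.Module):
    def forward(self, x, w):
        y = x * w
        return y

jit_model = torch.jit.script(F())

# Saved once when the concrete cases are generated...
torch.jit.save(jit_model, "/tmp/instruction_counts_example.pt")

# ...and loaded inside each worker's setup. (C++ workers would use
# torch::jit::load on the same file.)
jit_model = torch.jit.load("/tmp/instruction_counts_example.pt")
```
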
While nothing above is particularly complex, there is non-trivial bookkeeping
(managing the model artifact, setting up IValues) which, if done manually, would
be rather bug-prone and hard to read.

The story is similar for autograd: because we know the output variable (`y`)
and we make sure to assign it when calling TorchScript models, testing AutoGrad
is as simple as appending `y.backward()` (or `y.backward();` in C++) to the
`stmt` of the forward-only variant. Of course this requires that `signature` be
provided, as there is nothing special about the name `y`.

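Concretely, for the running example the derived Python AutoGrad Timer is just
the forward variant with the backward call appended (a sketch of the generated
case, not a literal dump of it):

```
Timer(
    stmt="""
        y = x * w
        y.backward()
    """,
    setup=benchmark.setup.py_setup,
)
```
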
The logic for the manipulations above is split between `core/api.py` (for
generating `stmt` based on language, Eager/TorchScript, with or without AutoGrad)
and `core/expand.py` (for larger, more expansive generation). The benchmarks
themselves are defined in `definitions/standard.py`. The current set is chosen
to demonstrate the various model definition APIs, and will be expanded when the
benchmark runner infrastructure is better equipped to deal with a larger run.

### Benchmark execution.

Once `expand.materialize` has flattened the abstract benchmark definitions into
`TimerArgs`, they can be sent to a worker (`worker/main.py`) subprocess for
execution. This worker has no concept of the larger benchmark suite; each
`TimerArgs` maps one-to-one and directly onto the `torch.utils.benchmark.Timer`
instance that the worker instantiates.

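Since this suite measures instruction counts, a single worker invocation
ultimately boils down to something like the sketch below. The `Timer` and
`collect_callgrind` calls are standard `torch.utils.benchmark` APIs; exactly
which arguments the real worker passes is an assumption here.

```
from torch.utils.benchmark import Timer

# One TimerArgs record maps directly onto one Timer.
timer = Timer(
    stmt="y = x * w",
    setup="""
        x = torch.ones((4, 4))
        w = torch.ones((4, 4), requires_grad=True)
    """,
    num_threads=1,
)

# Instruction counts are collected under Callgrind (requires valgrind).
stats = timer.collect_callgrind(number=100)
print(stats.counts(denoise=True))
```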