# Modular Benchmarking Components:

NOTE: These components are currently work in progress.

## Timer
This class is modeled on the `timeit.Timer` API, but with PyTorch specific
facilities which make it more suitable for benchmarking kernels. These fall
into two broad categories:

### Managing 'gotchas':

  `Timer` will invoke `torch.cuda.synchronize()` if applicable, control the
  number of torch threads, add a warmup, and warn if a measurement appears
  suspect or downright unreliable.

### Integration and better measurement:

  `Timer`, while modeled after the `timeit` analog, uses a slightly different
  API from `timeit.Timer`.

  * The constructor accepts additional metadata, and timing methods return
  a `Measurement` class rather than a float. This `Measurement` class is
  serializable and allows many examples to be grouped and interpreted.
  (See `Compare` for more details.)

  * `Timer` implements the `blocked_autorange` function, which is a
  mixture of `timeit.Timer.repeat` and `timeit.Timer.autorange`. This function
  selects an appropriate number of iterations and runs for a roughly fixed
  amount of time (like `autorange`), but is less wasteful than `autorange`,
  which discards ~75% of measurements. It runs many times, similar to
  `repeat`, and returns a `Measurement` containing all of the run results.

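The difference from `timeit` can be seen in a short sketch. The tensor size, metadata strings, and `min_run_time` below are arbitrary choices for illustration:

```python
import torch
from torch.utils.benchmark import Timer

x = torch.ones((128, 128))

timer = Timer(
    stmt="x + x",              # code to benchmark
    globals={"x": x},          # namespace visible to `stmt`
    label="add",               # metadata used by `Compare` (see below)
    description="128x128 ones",
)

# Unlike `timeit.Timer`, this returns a `Measurement` rather than a float.
m = timer.blocked_autorange(min_run_time=0.1)
print(m)
print(f"median: {m.median * 1e6:.1f} us")
```

`blocked_autorange` keeps measuring until `min_run_time` has elapsed, so the resulting `Measurement` aggregates many replicates instead of a single wall-clock number.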
## Compare

`Compare` takes a list of `Measurement`s in its constructor, and displays them
as a formatted table for easier analysis. Identical measurements will be
merged, which allows `Compare` to process replicate measurements. Several
convenience methods are also provided to truncate displayed values based on
the number of significant figures and color code measurements to highlight
performance differences. Grouping and layout is based on metadata passed to
`Timer`:

* `label`: This is a top level description (e.g. `add` or `multiply`); one
table will be generated per unique label.

* `sub_label`: This is the label for a given configuration. Multiple statements
may be logically equivalent but differ in implementation. Assigning separate
sub_labels will result in a row per sub_label. If a sub_label is not provided,
`stmt` is used instead. Statistics (such as computing the fastest
implementation) use all sub_labels.

* `description`: This describes the inputs. For instance, `stmt=torch.add(x, y)`
can be run over several values of `x` and `y`. Each pair should be given its
own `description`, which allows them to appear in separate columns.
Statistics do not mix values of different descriptions, since comparing the
run time of drastically different inputs is generally not meaningful.

* `env`: An optional description of the torch environment (e.g. `master` or
`my_branch`). Like sub_labels, statistics are calculated across envs. (Since
comparing a branch to master or a stable release is a common use case.)
However, `Compare` will visually group rows which are run with the same `env`.

* `num_threads`: By default, `Timer` will run in single-threaded mode. If
`Measurements` with different numbers of threads are given to `Compare`, they
will be grouped into separate blocks of rows.

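Putting the metadata fields together, a minimal sketch might look like the following. The two statements and the input sizes are arbitrary; `sub_label` distinguishes the implementations (rows) and `description` distinguishes the input sizes (columns):

```python
import torch
from torch.utils.benchmark import Timer, Compare

results = []
for n in (64, 256):
    x = torch.ones((n, n))
    for sub_label, stmt in [("add operator", "x + x"),
                            ("torch.add", "torch.add(x, x)")]:
        results.append(
            Timer(
                stmt=stmt,
                globals={"x": x},
                label="add",            # one table for all of these
                sub_label=sub_label,    # one row per implementation
                description=f"{n}x{n}", # one column per input size
            ).blocked_autorange(min_run_time=0.05)
        )

compare = Compare(results)
compare.trim_significant_figures()  # truncate based on significant figures
compare.colorize()                  # highlight performance differences
compare.print()
```

Because the two statements are logically equivalent, grouping them under one `label` with distinct `sub_label`s lets `Compare` place them in the same table for direct comparison.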
## Fuzzing

The `Fuzzer` class is designed to allow very flexible and repeatable
construction of a wide variety of Tensors while automating away some
of the tedium that comes with creating good benchmark inputs. The two
APIs of interest are the constructor and `Fuzzer.take(self, n: int)`.
At construction, a `Fuzzer` is a spec for the kind of Tensors that
should be created. It takes a list of `FuzzedParameters`, a list of
`FuzzedTensors`, and an integer with which to seed the Fuzzer.

The reason for distinguishing between parameters and Tensors is that the shapes
and data of Tensors are often linked (e.g. shapes must be identical or
broadcastable, indices must fall within a given range, etc.) As a result we
must first materialize values for each parameter, and then use them to
construct Tensors in a second pass. As a concrete reference, the following
will create Tensors `x` and `y`, where `x` is a 2D Tensor and `y` is
broadcastable to the shape of `x`:

```python
from torch.utils.benchmark import Fuzzer, FuzzedParameter, FuzzedTensor

fuzzer = Fuzzer(
  parameters=[
    FuzzedParameter("k0", 16, 16 * 1024, "loguniform"),
    FuzzedParameter("k1", 16, 16 * 1024, "loguniform"),
  ],
  tensors=[
    FuzzedTensor(
      name="x", size=("k0", "k1"), probability_contiguous=0.75
    ),
    FuzzedTensor(
      name="y", size=("k0", 1), probability_contiguous=0.75
    ),
  ],
  seed=0,
)
```

Calling `fuzzer.take(n)` will create a generator with `n` elements which
yields randomly generated Tensors satisfying the above definition, as well
as some metadata about the parameters and Tensors. Critically, calling
`.take(...)` multiple times will produce generators which select the same
parameters, allowing repeat measurements and different environments to
conduct the same trial. `FuzzedParameter` and `FuzzedTensor` support a
fairly involved set of behaviors to reflect the rich character of Tensor
operations and representations. (For instance, note the
`probability_contiguous` argument which signals that some fraction of the
time non-contiguous Tensors should be created.) The best way to understand
`Fuzzer`, however, is probably to experiment with `examples.fuzzer`.

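A sketch of consuming `take(n)` is shown below. The parameter ranges here are deliberately smaller than the `16` to `16 * 1024` spec above so that the example runs quickly; each element of the generator is a tuple of the materialized Tensors, per-Tensor metadata, and the drawn parameter values:

```python
from torch.utils.benchmark import Fuzzer, FuzzedParameter, FuzzedTensor

fuzzer = Fuzzer(
    parameters=[
        # Smaller ranges than the spec above, purely to keep this fast.
        FuzzedParameter("k0", 16, 256, "loguniform"),
        FuzzedParameter("k1", 16, 256, "loguniform"),
    ],
    tensors=[
        FuzzedTensor(name="x", size=("k0", "k1"), probability_contiguous=0.75),
        FuzzedTensor(name="y", size=("k0", 1), probability_contiguous=0.75),
    ],
    seed=0,
)

# Each trial yields (tensors, tensor_metadata, params).
for tensors, tensor_params, params in fuzzer.take(3):
    x, y = tensors["x"], tensors["y"]
    print(params["k0"], params["k1"], tuple(x.shape), tuple(y.shape),
          x.is_contiguous())
```

Because the parameter draws are driven by `seed`, reconstructing the same `Fuzzer` in a different environment and calling `take` again reproduces the same trials, which is what makes cross-environment comparisons meaningful.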
# Examples:
`python -m examples.simple_timeit`

`python -m examples.compare`

`python -m examples.fuzzer`

`python -m examples.end_to_end`
122