# Modular Benchmarking Components:

NOTE: These components are currently work in progress.

## Timer
This class is modeled on the `timeit.Timer` API, but with PyTorch specific
facilities which make it more suitable for benchmarking kernels. These fall
into two broad categories:

### Managing 'gotchas':

`Timer` will invoke `torch.cuda.synchronize()` if applicable, control the
number of torch threads, add a warmup, and warn if a measurement appears
suspect or downright unreliable.

### Integration and better measurement:

`Timer`, while modeled after the `timeit` analog, uses a slightly different
API from `timeit.Timer`.

* The constructor accepts additional metadata, and timing methods return
a `Measurement` class rather than a float. This `Measurement` class is
serializable and allows many examples to be grouped and interpreted.
(See `Compare` for more details.)

* `Timer` implements the `blocked_autorange` function, which is a
mixture of `timeit.Timer.repeat` and `timeit.Timer.autorange`. This function
selects an appropriate number of iterations and runs for a roughly fixed
amount of time (like `autorange`), but is less wasteful than `autorange`,
which discards ~75% of measurements. It runs many times, similar to `repeat`,
and returns a `Measurement` containing all of the run results.

## Compare

`Compare` takes a list of `Measurement`s in its constructor, and displays them
as a formatted table for easier analysis. Identical measurements will be
merged, which allows `Compare` to process replicate measurements. Several
convenience methods are also provided to truncate displayed values based on
the number of significant figures and to color code measurements to highlight
performance differences. Grouping and layout are based on metadata passed to
`Timer` (a short sketch combining `Timer` and `Compare` follows this list):

* `label`: This is a top level description (e.g. `add` or `multiply`). One
table will be generated per unique label.

* `sub_label`: This is the label for a given configuration. Multiple statements
may be logically equivalent but differ in implementation. Assigning separate
sub_labels will result in a row per sub_label. If a sub_label is not provided,
`stmt` is used instead. Statistics (such as computing the fastest
implementation) use all sub_labels.

* `description`: This describes the inputs. For instance, `stmt=torch.add(x, y)`
can be run over several values of `x` and `y`. Each pair should be given its
own `description`, which allows them to appear in separate columns.
Statistics do not mix values of different descriptions, since comparing the
run time of drastically different inputs is generally not meaningful.

* `env`: An optional description of the torch environment (e.g. `master` or
`my_branch`). Like sub_labels, statistics are calculated across envs, since
comparing a branch to master or a stable release is a common use case.
However `Compare` will visually group rows which are run with the same `env`.

* `num_threads`: By default, `Timer` will run in single-threaded mode. If
`Measurement`s with different numbers of threads are given to `Compare`, they
will be grouped into separate blocks of rows.
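The sketch below is illustrative rather than authoritative: it assumes `Timer`
and `Compare` are importable from `torch.utils.benchmark` (adjust the import to
wherever this package lives in your checkout), and the shapes, labels, and
`min_run_time` are arbitrary choices. It collects a small grid of
`Measurement`s and renders them using the grouping described above.

```
import torch

# Assumed import path; adjust to wherever this package is importable from.
from torch.utils.benchmark import Timer, Compare

results = []
for n in (1024, 4096):
    x, y = torch.ones((n, n)), torch.ones((n, 1))
    for num_threads in (1, 4):
        stmts = (("torch.add(x, y)", "add"), ("x + y", "operator +"))
        for stmt, sub_label in stmts:
            m = Timer(
                stmt=stmt,
                globals={"torch": torch, "x": x, "y": y},
                label="add",                           # one table per label
                sub_label=sub_label,                   # one row per sub_label
                description=f"({n}, {n}) + ({n}, 1)",  # one column per description
                env="master",                          # visually groups rows
                num_threads=num_threads,               # separate blocks of rows
            ).blocked_autorange(min_run_time=1)
            results.append(m)

compare = Compare(results)
compare.trim_significant_figures()
compare.colorize()
compare.print()
```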
## Fuzzing

The `Fuzzer` class is designed to allow very flexible and repeatable
construction of a wide variety of Tensors while automating away some
of the tedium that comes with creating good benchmark inputs. The two
APIs of interest are the constructor and `Fuzzer.take(self, n: int)`.

At construction, a `Fuzzer` is a spec for the kind of Tensors that
should be created. It takes a list of `FuzzedParameter`s, a list of
`FuzzedTensor`s, and an integer with which to seed the Fuzzer.

The reason for distinguishing between parameters and Tensors is that the shapes
and data of Tensors are often linked (e.g. shapes must be identical or
broadcastable, indices must fall within a given range, etc.). As a result we
must first materialize values for each parameter, and then use them to
construct Tensors in a second pass. As a concrete reference, the following
will create Tensors `x` and `y`, where `x` is a 2D Tensor and `y` is
broadcastable to the shape of `x`:

```
fuzzer = Fuzzer(
    parameters=[
        FuzzedParameter("k0", 16, 16 * 1024, "loguniform"),
        FuzzedParameter("k1", 16, 16 * 1024, "loguniform"),
    ],
    tensors=[
        FuzzedTensor(
            name="x", size=("k0", "k1"), probability_contiguous=0.75
        ),
        FuzzedTensor(
            name="y", size=("k0", 1), probability_contiguous=0.75
        ),
    ],
    seed=0,
)
```

Calling `fuzzer.take(n)` will create a generator with `n` elements which
yields randomly generated Tensors satisfying the above definition, as well
as some metadata about the parameters and Tensors. Critically, calling
`.take(...)` multiple times will produce generators which select the same
parameters, allowing repeat measurements and different environments to
conduct the same trial. `FuzzedParameter` and `FuzzedTensor` support a
fairly involved set of behaviors to reflect the rich character of Tensor
operations and representations. (For instance, note the
`probability_contiguous` argument, which signals that some fraction of the
time non-contiguous Tensors should be created.) The best way to understand
`Fuzzer`, however, is probably to experiment with `examples.fuzzer`.

# Examples:
`python -m examples.simple_timeit`

`python -m examples.compare`

`python -m examples.fuzzer`

`python -m examples.end_to_end`
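For reference, a minimal sketch of consuming the generator produced by the
`fuzzer` defined in the Fuzzing section above. The three-element structure of
each yielded item (Tensors, per-Tensor metadata, parameter values) is an
assumption here; defer to `examples.fuzzer` if it disagrees.

```
# Sketch only: assumes `fuzzer` from the Fuzzing section is in scope and that
# `take` yields (tensors, tensor_metadata, params) triples.
for tensors, tensor_metadata, params in fuzzer.take(10):
    x, y = tensors["x"], tensors["y"]
    # `params` holds the values materialized for "k0" and "k1" in this draw.
    print(x.shape, y.shape, x.is_contiguous(), params["k0"], params["k1"])
```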