# Profiler Overview

This README describes how the profiler is implemented.

The profiler instruments PyTorch to collect information about the model's execution. Its main features are:
* Instrumenting op calls on the CPU side
* Interfacing with [Kineto](https://github.com/pytorch/kineto/) to collect information from the GPU (or other accelerators)
* Collecting Python stack traces
* Exporting this information, e.g. as a chrome trace, or for processing by downstream tools like [HTA](https://github.com/facebookresearch/HolisticTraceAnalysis)

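As a quick illustration of how these features come together, here is a minimal usage sketch of the Python API (the file name `trace.json` is arbitrary):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a small CPU-only workload; GPU activity would be captured by also
# passing ProfilerActivity.CUDA when a GPU and Kineto support are available.
model = torch.nn.Linear(8, 4)
x = torch.randn(2, 8)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

# Export a chrome trace, viewable at chrome://tracing or in Perfetto.
prof.export_chrome_trace("trace.json")
```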
## Table of Contents

- [Codebase Structure](#codebase-structure)
- [`RecordFunction`](#recordfunction)
- [Autograd Integration](#autograd-integration)
- [Collection and Post-Processing](#collection-and-post-processing)
- [Kineto Integration](#kineto-integration)
- [Python Tracing](#python-tracing)

## Codebase Structure ##

TODO

## `RecordFunction` ##

[/aten/src/ATen/record_function.h](/aten/src/ATen/record_function.h)

`RecordFunction` is used by the profiler to instrument CPU-side events.

`RecordFunction` is a general mechanism for instrumenting function calls in PyTorch. It has other applications as well, e.g. see [Features for Large-Scale Deployments](https://pytorch.org/docs/stable/notes/large_scale_deployments.html). PyTorch already includes it at some important locations; notably, in the [dispatcher](https://github.com/pytorch/pytorch/blob/247c603da9b780534e25fb1d90b6e5a528b625b1/aten/src/ATen/core/dispatch/Dispatcher.h#L650), surrounding every op.

Users (or PyTorch itself) can register callbacks that run whenever a `RecordFunction` guard is encountered. The profiler uses this mechanism to record the start and end times of each op call, as well as user-provided `RecordFunction` annotations. The `RecordFunction` machinery is designed to have relatively low overhead, especially when no callbacks are registered; nevertheless, some overhead remains.

There is also a Python binding for `RecordFunction` (`with torch.profiler.record_function(...)`); users often use it to annotate events corresponding to module-level operations.

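For example (a minimal sketch; the label `"my_block"` is arbitrary), an annotation added through the Python binding shows up in the profiler output alongside the ops it wraps:

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

x = torch.randn(4, 4)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    # The annotation emits a RecordFunction event covering the matmul below.
    with record_function("my_block"):
        y = x @ x

# "my_block" appears in the summary table next to the aten:: ops it wraps.
print(prof.key_averages().table(sort_by="cpu_time_total"))
```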
## Autograd Integration ##

The autograd engine is responsible for automatically computing gradients.

The profiler records two pieces of information from the autograd engine:
* [Sequence number](/aten/src/ATen/SequenceNumber.h): a unique-per-thread index assigned to each op call(\*) in the forward pass. When a backward op is triggered, it is assigned the sequence number of the forward op that caused it to be executed. Using this information, the profiler can match forward and backward ops; in chrome traces, this feature is exposed as "fwd_bwd" flow events.
* [Forward thread id](https://github.com/pytorch/pytorch/blob/2e3fce54506ba82eee2c890410bf7a1405a64ec6/aten/src/ATen/record_function.h#L357): Autograd can be used in multi-threaded environments. The forward thread ID records the ID of the thread on which the forward op was executed. This information is needed because the sequence number, mentioned above, is only unique within a thread; the forward thread ID differentiates ops that share a sequence number but ran on different threads.

(\*) Note that only op invocations whose inputs require gradients are assigned a sequence number.

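To see this in action, here is a minimal sketch that profiles a forward and backward pass and exports a chrome trace (whether the "fwd_bwd" flow links are drawn depends on the trace viewer):

```python
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(4, 4, requires_grad=True)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    # Forward ops get sequence numbers because their inputs require grad...
    y = (x * x).sum()
    # ...and the matching backward ops reuse those sequence numbers.
    y.backward()

# Open the trace in chrome://tracing or Perfetto and enable flow events to
# see the forward/backward links.
prof.export_chrome_trace("trace_fwd_bwd.json")
```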
## Collection and Post-Processing ##

TODO

## Kineto Integration ##

TODO

## Python Tracing ##

TODO
57