Thank you for your interest in contributing to PyTorch!
If you're a new contributor, please first read through our
[Contributing Guide](https://github.com/pytorch/pytorch/wiki/The-Ultimate-Guide-to-PyTorch-Contributions), specifically the [Submitting a Change](https://github.com/pytorch/pytorch/wiki/The-Ultimate-Guide-to-PyTorch-Contributions#submitting-a-change) section
that walks through the process of contributing a change to PyTorch.

The rest of this document (CONTRIBUTING.md) covers some of the more technical
aspects of contributing to PyTorch.

# Table of Contents

<!-- toc -->

- [Developing PyTorch](#developing-pytorch)
  - [Setup the development environment](#setup-the-development-environment)
  - [Tips and Debugging](#tips-and-debugging)
- [Nightly Checkout & Pull](#nightly-checkout--pull)
- [Codebase structure](#codebase-structure)
- [Unit testing](#unit-testing)
  - [Python Unit Testing](#python-unit-testing)
  - [Better local unit tests with `pytest`](#better-local-unit-tests-with-pytest)
  - [Local linting](#local-linting)
    - [Running `mypy`](#running-mypy)
  - [C++ Unit Testing](#c-unit-testing)
  - [Run Specific CI Jobs](#run-specific-ci-jobs)
- [Merging your Change](#merging-your-change)
- [Writing documentation](#writing-documentation)
  - [Docstring type formatting](#docstring-type-formatting)
  - [Building documentation](#building-documentation)
    - [Tips](#tips)
    - [Building C++ Documentation](#building-c-documentation)
  - [Previewing changes locally](#previewing-changes-locally)
  - [Previewing documentation on PRs](#previewing-documentation-on-prs)
  - [Adding documentation tests](#adding-documentation-tests)
- [Profiling with `py-spy`](#profiling-with-py-spy)
- [Managing multiple build trees](#managing-multiple-build-trees)
- [C++ development tips](#c-development-tips)
  - [Build only what you need](#build-only-what-you-need)
  - [Code completion and IDE support](#code-completion-and-ide-support)
  - [Make no-op build fast](#make-no-op-build-fast)
    - [Use Ninja](#use-ninja)
    - [Use CCache](#use-ccache)
    - [Use a faster linker](#use-a-faster-linker)
    - [Use pre-compiled headers](#use-pre-compiled-headers)
    - [Workaround for header dependency bug in nvcc](#workaround-for-header-dependency-bug-in-nvcc)
  - [Rebuild few files with debug information](#rebuild-few-files-with-debug-information)
  - [C++ frontend development tips](#c-frontend-development-tips)
  - [GDB integration](#gdb-integration)
  - [C++ stacktraces](#c-stacktraces)
- [CUDA development tips](#cuda-development-tips)
- [Windows development tips](#windows-development-tips)
  - [Known MSVC (and MSVC with NVCC) bugs](#known-msvc-and-msvc-with-nvcc-bugs)
  - [Building on legacy code and CUDA](#building-on-legacy-code-and-cuda)
- [Running clang-tidy](#running-clang-tidy)
- [Pre-commit tidy/linting hook](#pre-commit-tidylinting-hook)
- [Building PyTorch with ASAN](#building-pytorch-with-asan)
  - [Getting `ccache` to work](#getting-ccache-to-work)
  - [Why this stuff with `LD_PRELOAD` and `LIBASAN_RT`?](#why-this-stuff-with-ld_preload-and-libasan_rt)
  - [Why LD_PRELOAD in the build function?](#why-ld_preload-in-the-build-function)
  - [Why no leak detection?](#why-no-leak-detection)
- [Caffe2 notes](#caffe2-notes)
- [CI failure tips](#ci-failure-tips)
  - [Which commit is used in CI?](#which-commit-is-used-in-ci)
- [Dev Infra Office Hours](#dev-infra-office-hours)

<!-- tocstop -->

## Developing PyTorch

Follow the instructions for [installing PyTorch from source](https://github.com/pytorch/pytorch#from-source). If you get stuck when developing PyTorch on your machine, check out the [tips and debugging](#tips-and-debugging) section below for common solutions.

### Setup the development environment

First, you need to [fork the PyTorch project on GitHub](https://github.com/pytorch/pytorch/fork) and follow the instructions at [Connecting to GitHub with SSH](https://docs.github.com/en/authentication/connecting-to-github-with-ssh) to set up your SSH authentication credentials.

Then clone the PyTorch project and set up the development environment:

```bash
git clone git@github.com:<USERNAME>/pytorch.git
cd pytorch
git remote add upstream git@github.com:pytorch/pytorch.git

make setup-env  # or make setup-env-cuda for pre-built CUDA binaries
conda activate pytorch-deps
```

### Tips and Debugging

* If you want to have no-op incremental rebuilds (which are fast), see [Make no-op build fast](#make-no-op-build-fast) below.

* When installing with `python setup.py develop` (in contrast to `python setup.py install`), the Python runtime will use
  the current local source tree when importing the `torch` package. (This is done by creating an [`.egg-link`](https://wiki.python.org/moin/PythonPackagingTerminology#egg-link) file in the `site-packages` folder.)
  This way you do not need to repeatedly reinstall after modifying Python files (`.py`).
  However, you would need to reinstall if you modify the Python interface (`.pyi`, `.pyi.in`) or
  non-Python files (`.cpp`, `.cc`, `.cu`, `.h`, ...).

  One way to avoid running `python setup.py develop` every time you make a change to C++/CUDA/Objective-C files on Linux/Mac
  is to create a symbolic link from the `build` folder to `torch/lib`, for example by issuing the following:

  ```bash
  pushd torch/lib; sh -c "ln -sf ../../build/lib/libtorch_cpu.* ."; popd
  ```

  Afterwards, rebuilding a library (for example, to rebuild `libtorch_cpu.so`, issue `ninja torch_cpu` from the `build` folder)
  is sufficient to make the change visible in the `torch` package.

  To reinstall, first uninstall all existing PyTorch installs. You may need to run `pip
  uninstall torch` multiple times. You'll know `torch` is fully
  uninstalled when you see `WARNING: Skipping torch as it is not
  installed`. (You should only have to `pip uninstall` a few times, but
  you can always `uninstall` with `timeout` or in a loop if you're feeling
  lazy.)

  ```bash
  conda uninstall pytorch -y
  yes | pip uninstall torch
  ```

  Next run `python setup.py clean`. After that, you can install in `develop` mode again.

* If you run into errors when running `python setup.py develop`, here are some debugging steps:
  1. Run `printf '#include <stdio.h>\nint main() { printf("Hello World");}'|clang -x c -; ./a.out` to make sure
  your compiler toolchain works and can compile this simple Hello World program without errors.
  2. Nuke your `build` directory. The `setup.py` script compiles binaries into the `build` folder and caches many
  details along the way, which saves time the next time you build. If you're running into issues, you can always
  `rm -rf build` from the toplevel `pytorch` directory and start over.
  3. If you have made edits to the PyTorch repo, commit any change you'd like to keep and clean the repo with the
  following commands (note that clean _really_ removes all untracked files and changes):
      ```bash
      git submodule deinit -f .
      git clean -xdf
      python setup.py clean
      git submodule update --init --recursive # very important to sync the submodules
      python setup.py develop                 # then try running the command again
      ```
  4. The main step within `python setup.py develop` is running `make` from the `build` directory. If you want to
    experiment with some environment variables, you can pass them into the command:
      ```bash
      ENV_KEY1=ENV_VAL1[ ENV_KEY2=ENV_VAL2]* python setup.py develop
      ```
* If you run into issues running `git submodule update --init --recursive`, please try the following:
  - If you encounter an error such as
    ```
    error: Submodule 'third_party/pybind11' could not be updated
    ```
    check whether your Git local or global config file contains any `submodule.*` settings. If yes, remove them and try again.
    (Please see [this doc](https://git-scm.com/docs/git-config#Documentation/git-config.txt-submoduleltnamegturl) for more info.)

  - If you encounter an error such as
    ```
    fatal: unable to access 'https://github.com/pybind11/pybind11.git': could not load PEM client certificate ...
    ```
    you are likely using an HTTP proxy and the certificate has expired. To check whether the certificate is valid, run
    `git config --global --list` and search for a setting like `http.proxysslcert=<cert_file>`. Then check the certificate's validity dates by running
    ```bash
    openssl x509 -noout -in <cert_file> -dates
    ```

  - If you encounter an error indicating that some third_party modules are not checked out correctly, such as
    ```
    Could not find .../pytorch/third_party/pybind11/CMakeLists.txt
    ```
    remove any `submodule.*` settings in your local git config (`.git/config` of your pytorch repo) and try again.
* If you're a Windows contributor, please check out [Best Practices](https://github.com/pytorch/pytorch/wiki/Best-Practices-to-Edit-and-Compile-Pytorch-Source-Code-On-Windows).
* For help with any part of the contributing process, please don’t hesitate to use our Zoom office hours! See details [here](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours).

## Nightly Checkout & Pull

The `tools/nightly.py` script is provided to ease pure Python development of
PyTorch. This uses `conda` and `git` to check out the nightly development
version of PyTorch and installs pre-built binaries into the current repository.
This is like a development or editable install, but without needing the ability
to compile any C++ code.

You can use this script to check out a new nightly branch with the following:

```bash
./tools/nightly.py checkout -b my-nightly-branch
conda activate pytorch-deps
```

Or if you would like to re-use an existing conda environment, you can pass in
the regular environment parameters (`--name` or `--prefix`):

```bash
./tools/nightly.py checkout -b my-nightly-branch -n my-env
conda activate my-env
```

To install the nightly binaries built with CUDA, you can pass in the flag `--cuda`:

```bash
./tools/nightly.py checkout -b my-nightly-branch --cuda
conda activate pytorch-deps
```

You can also use this tool to pull the nightly commits into the current branch:

```bash
./tools/nightly.py pull -n my-env
conda activate my-env
```

Pulling will reinstall the PyTorch dependencies as well as the nightly binaries
into the repo directory.

## Codebase structure

* [c10](c10) - Core library files that work everywhere, both server
  and mobile. We are slowly moving pieces from [ATen/core](aten/src/ATen/core)
  here. This library is intended only to contain essential functionality,
  and appropriate to use in settings where binary size matters. (But
  you'll have a lot of missing functionality if you try to use it
  directly.)
* [aten](aten) - C++ tensor library for PyTorch (no autograd support)
  * [src](aten/src) - [README](aten/src/README.md)
    * [ATen](aten/src/ATen)
      * [core](aten/src/ATen/core) - Core functionality of ATen. This
        is migrating to the top-level c10 folder.
      * [native](aten/src/ATen/native) - Modern implementations of
        operators. If you want to write a new operator, here is where
        it should go. Most CPU operators go in the top level directory,
        except for operators which need to be compiled specially; see
        cpu below.
        * [cpu](aten/src/ATen/native/cpu) - Not actually CPU
          implementations of operators, but specifically implementations
          which are compiled with processor-specific instructions, like
          AVX. See the [README](aten/src/ATen/native/cpu/README.md) for more
          details.
        * [cuda](aten/src/ATen/native/cuda) - CUDA implementations of
          operators.
        * [sparse](aten/src/ATen/native/sparse) - CPU and CUDA
          implementations of COO sparse tensor operations
        * [mkl](aten/src/ATen/native/mkl) [mkldnn](aten/src/ATen/native/mkldnn)
          [miopen](aten/src/ATen/native/miopen) [cudnn](aten/src/ATen/native/cudnn)
          - implementations of operators which simply bind to some
            backend library.
        * [quantized](aten/src/ATen/native/quantized/) - Quantized tensor (i.e. QTensor) operation implementations. [README](aten/src/ATen/native/quantized/README.md) contains details including how to implement native quantized operations.
* [torch](torch) - The actual PyTorch library. Everything that is not
  in [csrc](torch/csrc) is a Python module, following the PyTorch Python
  frontend module structure.
  * [csrc](torch/csrc) - C++ files composing the PyTorch library. Files
    in this directory tree are a mix of Python binding code, and C++
    heavy lifting. Consult `setup.py` for the canonical list of Python
    binding files; conventionally, they are often prefixed with
    `python_`. [README](torch/csrc/README.md)
    * [jit](torch/csrc/jit) - Compiler and frontend for the TorchScript JIT
      frontend. [README](torch/csrc/jit/README.md)
    * [autograd](torch/csrc/autograd) - Implementation of reverse-mode automatic differentiation. [README](torch/csrc/autograd/README.md)
    * [api](torch/csrc/api) - The PyTorch C++ frontend.
    * [distributed](torch/csrc/distributed) - Distributed training
      support for PyTorch.
* [tools](tools) - Code generation scripts for the PyTorch library.
  See the [README](tools/README.md) of this directory for more details.
* [test](test) - Python unit tests for the PyTorch Python frontend.
  * [test_torch.py](test/test_torch.py) - Basic tests for PyTorch
    functionality.
  * [test_autograd.py](test/test_autograd.py) - Tests for non-NN
    automatic differentiation support.
  * [test_nn.py](test/test_nn.py) - Tests for NN operators and
    their automatic differentiation.
  * [test_jit.py](test/test_jit.py) - Tests for the JIT compiler
    and TorchScript.
  * ...
  * [cpp](test/cpp) - C++ unit tests for the PyTorch C++ frontend.
    * [api](test/cpp/api) - [README](test/cpp/api/README.md)
    * [jit](test/cpp/jit) - [README](test/cpp/jit/README.md)
    * [tensorexpr](test/cpp/tensorexpr) - [README](test/cpp/tensorexpr/README.md)
  * [expect](test/expect) - Automatically generated "expect" files
    which are used to compare against expected output.
  * [onnx](test/onnx) - Tests for ONNX export functionality,
    using both PyTorch and Caffe2.
* [caffe2](caffe2) - The Caffe2 library.
  * [core](caffe2/core) - Core files of Caffe2, e.g., tensor, workspace,
    blobs, etc.
  * [operators](caffe2/operators) - Operators of Caffe2.
  * [python](caffe2/python) - Python bindings to Caffe2.
  * ...
* [.circleci](.circleci) - CircleCI configuration management. [README](.circleci/README.md)

## Unit testing

### Python Unit Testing

**Prerequisites**:
The following packages should be installed with either `conda` or `pip`:
- `expecttest` and `hypothesis` - required to run tests
- `mypy` - recommended for linting
- `pytest` - recommended to run tests more selectively

All PyTorch test suites are located in the `test` folder and start with
`test_`. Run the entire test
suite with

```bash
python test/run_test.py
```

or run individual test suites using the command `python test/FILENAME.py`,
where `FILENAME` represents the file containing the test suite you wish
to run.

For example, to run all the TorchScript JIT tests (located at
`test/test_jit.py`), you would run:

```bash
python test/test_jit.py
```

You can narrow down what you're testing even further by specifying the
name of an individual test with `TESTCLASSNAME.TESTNAME`. Here,
`TESTNAME` is the name of the test you want to run, and `TESTCLASSNAME`
is the name of the class in which it is defined.

Going off the above example, let's say you want to run
`test_Sequential`, which is defined as part of the `TestJit` class
in `test/test_jit.py`. Your command would be:

```bash
python test/test_jit.py TestJit.test_Sequential
```

**Weird note:** In our CI (Continuous Integration) jobs, we actually run the tests from the `test` folder and **not** the root of the repo, since there are various dependencies we set up for CI that expect the tests to be run from the test folder. As such, there may be some inconsistencies between local testing and CI testing--if you observe an inconsistency, please [file an issue](https://github.com/pytorch/pytorch/issues/new/choose).

### Better local unit tests with `pytest`

We don't officially support `pytest`, but it works well with our
`unittest` tests and offers a number of useful features for local
development. Install it via `pip install pytest`.

If you want to just run tests that contain a specific substring, you can
use the `-k` flag:

```bash
pytest test/test_nn.py -k Loss -v
```

The above is an example of testing a change to all Loss functions: this
command runs tests such as `TestNN.test_BCELoss` and
`TestNN.test_MSELoss` and can be useful to save keystrokes.

### Local linting

Install all prerequisites by running

```bash
make setup-lint
```

You can now run the same linting steps that are used in CI locally via `make`:

```bash
make lint
```

Learn more about the linter on the [lintrunner wiki page](https://github.com/pytorch/pytorch/wiki/lintrunner).

#### Running `mypy`

`mypy` is an optional static type checker for Python. We have multiple `mypy`
configs for the PyTorch codebase that are automatically validated against whenever the linter is run.

See [Guide for adding type annotations to
PyTorch](https://github.com/pytorch/pytorch/wiki/Guide-for-adding-type-annotations-to-PyTorch)
for more information on how to set up `mypy` and tackle type annotation
tasks.
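
If you want to invoke `mypy` directly rather than through the linter, a minimal sketch (the config file name and the target file are assumptions; check the repo root for the configs that actually exist at your revision):

```bash
# Hypothetical direct invocation against one of the repo's mypy configs.
pip install mypy
mypy --config-file mypy.ini torch/functional.py
```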

### C++ Unit Testing

PyTorch offers a series of tests located in the `test/cpp` folder.
These tests are written in C++ and use the Google Test testing framework.
After compiling PyTorch from source, the test runner binaries will be
written to the `build/bin` folder. The command to run one of these tests
is `./build/bin/FILENAME --gtest_filter=TESTSUITE.TESTNAME`, where
`TESTNAME` is the name of the test you'd like to run and `TESTSUITE` is
the suite that test is defined in.

For example, if you wanted to run the test `MayContainAlias`, which
is part of the test suite `ContainerAliasingTest` in the file
`test/cpp/jit/test_alias_analysis.cpp`, the command would be:

```bash
./build/bin/test_jit --gtest_filter=ContainerAliasingTest.MayContainAlias
```

### Run Specific CI Jobs

You can generate a commit that limits the CI to only run a specific job by using
`tools/testing/explicit_ci_jobs.py` like so:

```bash
# --job: specify one or more times to filter to a specific job + its dependencies
# --filter-gha: specify github actions workflows to keep
# --make-commit: commit CI changes to git with a message explaining the change
python tools/testing/explicit_ci_jobs.py --job binary_linux_manywheel_3_6m_cpu_devtoolset7_nightly_test --filter-gha '*generated*gcc5.4*' --make-commit

# Make your changes

ghstack submit
```

**NB**: It is not recommended to use this workflow unless you are also using
[`ghstack`](https://github.com/ezyang/ghstack). It creates a large commit that is
of very low signal to reviewers.

## Merging your Change

If you know the right people or team that should approve your PR (and you have the required permissions to do so), add them to the Reviewers list.

If not, leave the Reviewers section empty. Our triage squad will review your PR, add a module label, and assign it to the appropriate reviewer within a couple of business days. The reviewer will then look at your PR and respond.

Occasionally, things might fall through the cracks (sorry!). If your PR either doesn't get assigned to a reviewer or doesn't get any response from the reviewer for 4 business days, please leave a comment on the PR (mentioning the reviewer if one has been assigned). That'll get it nudged back onto people's radar.

If that still doesn't help, come see us during [our office hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours).

Once your PR is approved, you can merge it in by entering a comment with the content `@pytorchmergebot merge` ([what's this bot?](https://github.com/pytorch/pytorch/wiki/Bot-commands)).

## Writing documentation

So you want to write some documentation and don't know where to start?
PyTorch has two main types of documentation:
- **User facing documentation**:
These are the docs that you see over at [our docs website](https://pytorch.org/docs).
- **Developer facing documentation**:
Developer facing documentation is spread around our READMEs in our codebase and in
the [PyTorch Developer Wiki](https://pytorch.org/wiki).
If you're interested in adding new developer docs, please read this [page on the wiki](https://github.com/pytorch/pytorch/wiki/Where-or-how-should-I-add-documentation) on our best practices for where to put it.

The rest of this section is about user-facing documentation.

PyTorch uses [Google style](https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html)
for formatting docstrings. Each line inside a docstring block must be limited to 80 characters so that it fits into Jupyter documentation popups.

### Docstring type formatting

In addition to the standard Google Style docstring formatting rules, the following guidelines should be followed for docstring types (docstring types are the type information contained in the round brackets after the variable name):

* The "`Callable`", "`Any`", "`Iterable`", "`Iterator`", "`Generator`" types should have their first letter capitalized.

* The "`list`" and "`tuple`" types should be completely lowercase.

* Types should not be made plural. For example: `tuple of int` should be used instead of `tuple of ints`.

* The only acceptable delimiter words for types are `or` and `of`. No other non-type words should be used other than `optional`.

* The word `optional` should only be used after the types, and it is only used if the user does not have to specify a value for the variable. Default values are listed after the variable description. Example:

    ```
    my_var (int, optional): Variable description. Default: 1
    ```

* Basic Python types should match their type name so that the [Intersphinx](https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html) extension can correctly identify them. For example:
    * Use `str` instead of `string`.
    * Use `bool` instead of `boolean`.
    * Use `dict` instead of `dictionary`.

* Square brackets should be used for the dictionary type. For example:

    ```
    my_var (dict[str, int]): Variable description.
    ```

* If a variable has two different possible types, then the word `or` should be used without a comma. Variables with 3 or more types should use commas to separate the types. Example:

    ```
    x (type1 or type2): Variable description.
    y (type1, type2, or type3): Variable description.
    ```

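Putting these rules together, here is a short illustrative docstring for a hypothetical function (the function and its parameters are made up for the example):

```python
def scale(x, factor=1.0, dims=None):
    """Scales a tensor along the given dimensions.

    Args:
        x (Tensor): The input tensor.
        factor (float, optional): Scaling factor. Default: 1.0
        dims (list[int] or tuple of int, optional): Dimensions to scale.
            Default: all dimensions.

    Returns:
        Tensor: The scaled tensor.
    """
```
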

### Building documentation

To build the documentation:

1. Build and install PyTorch

2. Install the prerequisites

```bash
cd docs
pip install -r requirements.txt
# `katex` must also be available in your PATH.
# You can either install katex globally if you have properly configured npm:
# npm install -g katex
# Or if you prefer an uncontaminated global executable environment or do not want to go through the node configuration:
# npm install katex && export PATH="$PATH:$(pwd)/node_modules/.bin"
```

> Note: if you installed `nodejs` with a different package manager (e.g.,
`conda`) then `npm` will probably install a version of `katex` that is not
compatible with your version of `nodejs` and doc builds will fail.
A combination of versions that is known to work is `node@6.13.1` and
`katex@0.13.18`. To install the latter with `npm` you can run
`npm install -g katex@0.13.18`.

> Note that if you are a Facebook employee using a devserver, yarn may be more convenient to install katex:

```bash
yarn global add katex
```
> If a specific version is required you can use for example `yarn global add katex@0.13.18`.

3. Generate the documentation HTML files. The generated files will be in `docs/build/html`.

```bash
make html
```

#### Tips

The `.rst` source files live in [docs/source](docs/source). Some of the `.rst`
files pull in docstrings from PyTorch Python code (for example, via
the `autofunction` or `autoclass` directives). To vastly shorten doc build times,
it is helpful to remove the files you are not working on, keeping only the base
`index.rst` file and the files you are editing. The Sphinx build will produce
missing file warnings but will still complete. For example, to work on `jit.rst`:

```bash
cd docs/source
find . -type f | grep rst | grep -v index | grep -v jit | xargs rm

# Make your changes, build the docs, etc.

# Don't commit the deletions!
git add index.rst jit.rst
...
```

#### Building C++ Documentation

For C++ documentation (https://pytorch.org/cppdocs), we use
[Doxygen](http://www.doxygen.nl/) and then convert it to
[Sphinx](http://www.sphinx-doc.org/) via
[Breathe](https://github.com/michaeljones/breathe) and
[Exhale](https://github.com/svenevs/exhale). Check the [Doxygen
reference](https://www.doxygen.nl/manual/) for more
information on the documentation syntax.

We run Doxygen in CI (Travis) to verify that you do not use invalid Doxygen
commands. To run this check locally, run `./check-doxygen.sh` from inside
`docs/cpp/source`.

To build the documentation, follow the same steps as above, but run them from
`docs/cpp` instead of `docs`.

### Previewing changes locally

To view HTML files locally, you can open the files in your web browser. For example,
navigate to `file:///your_pytorch_folder/docs/build/html/index.html` in a web
browser.

If you are developing on a remote machine, you can set up an SSH tunnel so that
you can access the HTTP server on the remote machine from your local machine. To map
remote port 8000 to local port 8000, use either of the following commands.

```bash
# For SSH
ssh my_machine -L 8000:my_machine:8000

# For Eternal Terminal
et my_machine -t="8000:8000"
```

Then navigate to `localhost:8000` in your web browser.

**Tip:**
You can start a lightweight HTTP server on the remote machine with:

```bash
python -m http.server 8000 --directory <path_to_html_output>
```

Alternatively, you can run `rsync` on your local machine to copy the files from
your remote machine:

```bash
mkdir -p build cpp/build
rsync -az me@my_machine:/path/to/pytorch/docs/build/html build
rsync -az me@my_machine:/path/to/pytorch/docs/cpp/build/html cpp/build
```

### Previewing documentation on PRs

PyTorch will host documentation previews at `https://docs-preview.pytorch.org/pytorch/pytorch/<pr number>/index.html` once the
`pytorch_python_doc_build` GitHub Actions job has completed on your PR. You can visit that page directly
or find its link in the automated Dr. CI comment on your PR.

### Adding documentation tests

It is easy for code snippets in docstrings and `.rst` files to get out of date. The docs
build includes the [Sphinx Doctest Extension](https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html),
which can run code in documentation as a unit test. To use the extension, use
the `.. testcode::` directive in your `.rst` and docstrings.

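For example, a minimal `testcode` block in an `.rst` file might look like the following (the snippet itself is illustrative; `testoutput` is the extension's companion directive for checking printed output):

```rst
.. testcode::

   import torch

   print(torch.ones(2, 2).sum().item())

.. testoutput::

   4.0
```
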
To manually run these tests, follow steps 1 and 2 above, then run:

```bash
cd docs
make doctest
```

## Profiling with `py-spy`

Evaluating the performance impact of code changes in PyTorch can be complicated,
particularly if code changes happen in compiled code. One simple way to profile
both Python and C++ code in PyTorch is to use
[`py-spy`](https://github.com/benfred/py-spy), a sampling profiler for Python
that has the ability to profile native code and Python code in the same session.

`py-spy` can be installed via `pip`:

```bash
pip install py-spy
```

To use `py-spy`, first write a Python test script that exercises the
functionality you would like to profile. For example, this script profiles
`torch.add`:

```python
import torch

t1 = torch.tensor([[1, 1], [1, 1.]])
t2 = torch.tensor([[0, 0], [0, 0.]])

for _ in range(1000000):
    torch.add(t1, t2)
```

Since the `torch.add` operation happens in microseconds, we repeat it a large
number of times to get good statistics. The most straightforward way to use
`py-spy` with such a script is to generate a [flame
graph](http://www.brendangregg.com/flamegraphs.html):

```bash
py-spy record -o profile.svg --native -- python test_tensor_tensor_add.py
```

This will output a file named `profile.svg` containing a flame graph you can
view in a web browser or SVG viewer. Individual stack frame entries in the graph
can be selected interactively with your mouse to zoom in on a particular part of
the program execution timeline. The `--native` command-line option tells
`py-spy` to record stack frame entries for PyTorch C++ code. To get line numbers
for C++ code, it may be necessary to compile PyTorch in debug mode by prepending
`DEBUG=1` to your `setup.py develop` call. Depending on
your operating system it may also be necessary to run `py-spy` with root
privileges.

`py-spy` can also work in an `htop`-like "live profiling" mode and can be
tweaked to adjust the stack sampling rate; see the `py-spy` readme for more
details.
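
For instance, a sketch of the live mode against the script above (assuming current `py-spy` flags; `--rate` sets samples per second):

```bash
py-spy top --native --rate 200 -- python test_tensor_tensor_add.py
```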

## Managing multiple build trees

One downside to using `python setup.py develop` is that your development
version of PyTorch will be installed globally on your account (e.g., if
you run `import torch` anywhere else, the development version will be
used).

If you want to manage multiple builds of PyTorch, you can make use of
[conda environments](https://conda.io/docs/using/envs.html) to maintain
separate Python package environments, each of which can be tied to a
specific build of PyTorch. To set one up:

```bash
conda create -n pytorch-myfeature
source activate pytorch-myfeature
# if you run python now, torch will NOT be installed
python setup.py develop
```

## C++ development tips

If you are working on the C++ code, there are a few important things that you
will want to keep in mind:

1. How to rebuild only the code you are working on.
2. How to make rebuilds in the absence of changes go faster.

### Build only what you need

`python setup.py build` will build everything by default, but sometimes you are
only interested in a specific component.

- Working on a test binary? Run `(cd build && ninja bin/test_binary_name)` to
  rebuild only that test binary (without rerunning cmake). (Replace `ninja` with
  `make` if you don't have ninja installed.)

On the initial build, you can also speed things up with the environment
variables `DEBUG`, `USE_DISTRIBUTED`, `USE_MKLDNN`, `USE_CUDA`, `USE_FLASH_ATTENTION`, `USE_MEM_EFF_ATTENTION`, `BUILD_TEST`, `USE_FBGEMM`, `USE_NNPACK` and `USE_QNNPACK`.

- `DEBUG=1` will enable debug builds (-g -O0)
- `REL_WITH_DEB_INFO=1` will enable debug symbols with optimizations (-g -O3)
- `USE_DISTRIBUTED=0` will disable the distributed (c10d, gloo, mpi, etc.) build.
- `USE_MKLDNN=0` will disable using MKL-DNN.
- `USE_CUDA=0` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
- `BUILD_TEST=0` will disable building C++ test binaries.
- `USE_FBGEMM=0` will disable using FBGEMM (quantized 8-bit server operators).
- `USE_NNPACK=0` will disable compiling with NNPACK.
- `USE_QNNPACK=0` will disable the QNNPACK build (quantized 8-bit operators).
- `USE_XNNPACK=0` will disable compiling with XNNPACK.
- `USE_FLASH_ATTENTION=0` and `USE_MEM_EFF_ATTENTION=0` will disable compiling the flash attention and memory efficient attention kernels, respectively.

For example:

```bash
DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 python setup.py develop
```

For subsequent builds (i.e., when `build/CMakeCache.txt` exists), the build
options passed the first time will persist; please run `ccmake build/`, run
`cmake-gui build/`, or directly edit `build/CMakeCache.txt` to adapt build
options.
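
If you prefer not to use the interactive tools, a sketch of flipping a single cached option non-interactively (this re-runs the CMake configure step against the existing cache; the specific option is just an example):

```bash
# Rewrite one cached entry in build/CMakeCache.txt, then rebuild.
cmake -DBUILD_TEST=0 build/
(cd build && ninja)
```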

### Code completion and IDE support

When using `python setup.py develop`, PyTorch will generate
a `compile_commands.json` file that can be used by many editors
to provide command completion and error highlighting for PyTorch's
C++ code. You need to `pip install ninja` to generate accurate
information for the code in `torch/csrc`. More information at:
- https://sarcasm.github.io/notes/dev/compilation-database.html

### Make no-op build fast

#### Use Ninja

By default, cmake will use its Makefile generator to generate your build
system. You can get faster builds if you install the ninja build system
with `pip install ninja`. If PyTorch was already built, you will need
to run `python setup.py clean` once after installing ninja for builds to
succeed.
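
In other words, a minimal sketch of the steps just described:

```bash
pip install ninja
python setup.py clean    # needed once if PyTorch was already built without ninja
python setup.py develop  # subsequent builds now go through ninja
```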

Note: Make sure to use a machine with a larger number of CPU cores; this will significantly reduce your build times.

#### Use CCache

Even when dependencies are tracked with file modification times, there are many
situations where files get rebuilt when a previous compilation was exactly the
same. Using ccache in a situation like this is a real time-saver.

Before building pytorch, install ccache from your package manager of choice:

```bash
conda install ccache -c conda-forge
sudo apt install ccache
sudo yum install ccache
brew install ccache
```

You may also find the default cache size in ccache is too small to be useful.
The cache sizes can be increased from the command line:

```bash
# config: cache dir is ~/.ccache, conf file ~/.ccache/ccache.conf
# max size of cache
ccache -M 25Gi  # -M 0 for unlimited
# unlimited number of files
ccache -F 0
```

To check this is working, do two clean builds of pytorch in a row. The second
build should be substantially and noticeably faster than the first build. If
this doesn't seem to be the case, check the `CMAKE_<LANG>_COMPILER_LAUNCHER`
rules in `build/CMakeCache.txt`, where `<LANG>` is `C`, `CXX` and `CUDA`.
Each of these 3 variables should contain ccache, e.g.

```
//CXX compiler launcher
CMAKE_CXX_COMPILER_LAUNCHER:STRING=/usr/bin/ccache
```

If not, you can define these variables on the command line before invoking `setup.py`:

```bash
export CMAKE_C_COMPILER_LAUNCHER=ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
export CMAKE_CUDA_COMPILER_LAUNCHER=ccache
python setup.py develop
```

#### Use a faster linker

If you are editing a single file and rebuilding in a tight loop, the time spent
linking will dominate. The system linker available in most Linux distributions
(GNU `ld`) is quite slow. Use a faster linker, like [lld](https://lld.llvm.org/).

If you are on a Mac, follow [this guide](https://stackoverflow.com/questions/42730345/how-to-install-llvm-for-mac) instead.

The easiest way to use `lld` is to download the
[latest LLVM binaries](http://releases.llvm.org/download.html#8.0.0) and run:

```bash
ln -s /path/to/downloaded/ld.lld /usr/local/bin/ld
```
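
Alternatively, if your compiler supports it, you can ask for `lld` via a linker flag instead of replacing the system `ld` (a sketch; assumes `clang` or a recent `gcc` with `lld` on your `PATH`, and a fresh CMake configure so the flag is picked up):

```bash
export LDFLAGS="-fuse-ld=lld"
python setup.py develop
```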

#### Use pre-compiled headers

Sometimes there's no way of getting around rebuilding lots of files, for example
editing `native_functions.yaml` usually means 1000+ files being rebuilt. If
you're using CMake newer than 3.16, you can enable pre-compiled headers by
setting `USE_PRECOMPILED_HEADERS=1` either on first setup, or in the
`CMakeCache.txt` file.

```sh
USE_PRECOMPILED_HEADERS=1 python setup.py develop
```

This adds a build step where the compiler takes `<ATen/ATen.h>` and essentially
dumps its internal AST to a file so the compiler can avoid repeating itself for
every `.cpp` file.

One caveat is that when enabled, this header gets included in every file by default,
which may change what code is legal. For example:
- internal functions can never alias existing names in `<ATen/ATen.h>`
- names in `<ATen/ATen.h>` will work even if you don't explicitly include it.

#### Workaround for header dependency bug in nvcc

If re-building without modifying any files results in several CUDA files being
re-compiled, you may be running into an `nvcc` bug where header dependencies are
not converted to absolute paths before being reported to the build system. This
makes `ninja` think one of the header files has been deleted, so it runs the
build again.

A compiler wrapper to fix this is provided in `tools/nvcc_fix_deps.py`. You can use
it as a compiler launcher, similar to `ccache`:

```bash
export CMAKE_CUDA_COMPILER_LAUNCHER="python;`pwd`/tools/nvcc_fix_deps.py;ccache"
python setup.py develop
```

### Rebuild few files with debug information

While debugging a problem, one often has to maintain a debug build in a separate folder.
But often only a few files need to be rebuilt with debug info to get a symbolicated backtrace or to enable source debugging.
This is easily done with the help of `tools/build_with_debinfo.py`.

For example, suppose one wants to debug what is going on while a tensor index is selected, which can be achieved by setting a breakpoint at the `applySelect` function:
```
% lldb -o "b applySelect" -o "process launch" -- python3 -c "import torch;print(torch.rand(5)[3])"
(lldb) target create "python"
Current executable set to '/usr/bin/python3' (arm64).
(lldb) settings set -- target.run-args  "-c" "import torch;print(torch.rand(5)[3])"
(lldb) b applySelect
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) process launch
2 locations added to breakpoint 1
Process 87729 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001023d55a8 libtorch_python.dylib`at::indexing::impl::applySelect(at::Tensor const&, long long, c10::SymInt, long long, c10::Device const&, std::__1::optional<c10::ArrayRef<c10::SymInt>> const&)
libtorch_python.dylib`at::indexing::impl::applySelect:
->  0x1023d55a8 <+0>:  sub    sp, sp, #0xd0
    0x1023d55ac <+4>:  stp    x24, x23, [sp, #0x90]
    0x1023d55b0 <+8>:  stp    x22, x21, [sp, #0xa0]
    0x1023d55b4 <+12>: stp    x20, x19, [sp, #0xb0]
Target 0: (python) stopped.
Process 87729 launched: '/usr/bin/python' (arm64)
```
This is not very informative, but can be easily remedied by rebuilding `python_variable_indexing.cpp` with debug information:
```
% ./tools/build_with_debinfo.py torch/csrc/autograd/python_variable_indexing.cpp
[1 / 2] Building caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_variable_indexing.cpp.o
[2 / 2] Building lib/libtorch_python.dylib
```
And afterwards:
```
% lldb -o "b applySelect" -o "process launch" -- python3 -c "import torch;print(torch.rand(5)[3])"
(lldb) target create "python"
Current executable set to '/usr/bin/python3' (arm64).
(lldb) settings set -- target.run-args  "-c" "import torch;print(torch.rand(5)[3])"
(lldb) b applySelect
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) process launch
2 locations added to breakpoint 1
Process 87741 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001024e2628 libtorch_python.dylib`at::indexing::impl::applySelect(self=0x00000001004ee8a8, dim=0, index=(data_ = 3), real_dim=0, (null)=0x000000016fdfe535, self_sizes= Has Value=true ) at TensorIndexing.h:239:7
   236         const at::Device& /*self_device*/,
   237         const c10::optional<SymIntArrayRef>& self_sizes) {
   238       // See NOTE [nested tensor size for indexing]
-> 239       if (self_sizes.has_value()) {
   240         auto maybe_index = index.maybe_as_int();
   241         if (maybe_index.has_value()) {
   242           TORCH_CHECK_INDEX(
Target 0: (python) stopped.
Process 87741 launched: '/usr/bin/python3' (arm64)
```
Which is much more useful, isn't it?

### C++ frontend development tips

We have very extensive tests in the [test/cpp/api](test/cpp/api) folder. The
tests are a great way to see how certain components are intended to be used.
When compiling PyTorch from source, the test runner binary will be written to
`build/bin/test_api`. The tests use the [GoogleTest](https://github.com/google/googletest/blob/master/googletest)
framework, which you can read up on to learn how to configure the test runner. When
submitting a new feature, we care very much that you write appropriate tests.
Please follow the lead of the other tests to see how to write a new test case.

### GDB integration

If you are debugging pytorch inside GDB, you might be interested in
[pytorch-gdb](tools/gdb/pytorch-gdb.py). This script introduces some
pytorch-specific commands which you can use from the GDB prompt. In
particular, `torch-tensor-repr` prints a human-readable repr of an at::Tensor
object. Example of usage:

```
$ gdb python
GNU gdb (GDB) 9.2
[...]
(gdb) # insert a breakpoint when we call .neg()
(gdb) break at::Tensor::neg
Function "at::Tensor::neg" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (at::Tensor::neg) pending.

(gdb) run
[...]
>>> import torch
>>> t = torch.tensor([1, 2, 3, 4], dtype=torch.float64)
>>> t
tensor([1., 2., 3., 4.], dtype=torch.float64)
>>> t.neg()

Thread 1 "python" hit Breakpoint 1, at::Tensor::neg (this=0x7ffb118a9c88) at aten/src/ATen/core/TensorBody.h:3295
3295    inline at::Tensor Tensor::neg() const {
(gdb) # the default repr of 'this' is not very useful
(gdb) p this
$1 = (const at::Tensor * const) 0x7ffb118a9c88
(gdb) p *this
$2 = {impl_ = {target_ = 0x55629b5cd330}}
(gdb) torch-tensor-repr *this
Python-level repr of *this:
tensor([1., 2., 3., 4.], dtype=torch.float64)
```

GDB tries to automatically load `pytorch-gdb` thanks to the
[.gdbinit](.gdbinit) at the root of the pytorch repo. However, auto-loading is disabled by default for security reasons:

```bash
$ gdb
warning: File "/path/to/pytorch/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /path/to/pytorch/.gdbinit
line to your configuration file "/home/YOUR-USERNAME/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/YOUR-USERNAME/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb)
```

As gdb itself suggests, the best way to enable auto-loading of `pytorch-gdb`
is to add the following line to your `~/.gdbinit` (i.e., the `.gdbinit` file
which is in your home directory, **not** `/path/to/pytorch/.gdbinit`):

```bash
add-auto-load-safe-path /path/to/pytorch/.gdbinit
```

### C++ stacktraces

Set `TORCH_SHOW_CPP_STACKTRACES=1` to get the C++ stacktrace when an error occurs in Python.
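
For example, to see where a shape-mismatch error originates in C++ (the failing snippet is just an illustration):

```bash
TORCH_SHOW_CPP_STACKTRACES=1 python -c "import torch; torch.ones(2) + torch.ones(3)"
```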

## CUDA development tips

If you are working on the CUDA code, here are some useful CUDA debugging tips:

1. `CUDA_DEVICE_DEBUG=1` will enable CUDA device function debug symbols (`-g -G`).
   This will be particularly helpful in debugging device code. However, it will
   slow down the build process by about 50% (compared to only `DEBUG=1`), so use it wisely.
2. `cuda-gdb` and `cuda-memcheck` are your best CUDA debugging friends. Unlike `gdb`,
   `cuda-gdb` can display actual values in a CUDA tensor (rather than all zeros).
3. CUDA supports a lot of C++11/14 features such as `std::numeric_limits`, `std::nextafter`,
   and `std::tuple` in device code. Many such features are possible because of the
   [--expt-relaxed-constexpr](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#constexpr-functions)
   nvcc flag. There is a known [issue](https://github.com/ROCm-Developer-Tools/HIP/issues/374)
   that ROCm errors out on device code which uses such stl functions.
4. A good performance metric for a CUDA kernel is the
   [Effective Memory Bandwidth](https://devblogs.nvidia.com/how-implement-performance-metrics-cuda-cc/).
   It is useful for you to measure this metric whenever you are writing/optimizing a CUDA
   kernel. The following script shows how we can measure the effective bandwidth of the CUDA `uniform_`
   kernel:
   ```python
   import torch
   from torch.utils.benchmark import Timer

   size = 128 * 512
   nrep = 100
   nbytes_read_write = 4  # number of bytes read + written by the kernel; change this to fit your kernel

   for i in range(10):
       a = torch.empty(size).cuda().uniform_()
       torch.cuda.synchronize()
       out = a.uniform_()
       torch.cuda.synchronize()
       t = Timer(stmt="a.uniform_()", globals=globals())
       res = t.blocked_autorange()
       timec = res.median
       print("uniform, size, elements", size, "forward", timec, "bandwidth (GB/s)", size * nbytes_read_write * 1e-9 / timec)
       size *= 2
   ```

See more CUDA development tips [here](https://github.com/pytorch/pytorch/wiki/CUDA-basics).

## Windows development tips

For building from source on Windows, consult
[our documentation](https://pytorch.org/docs/stable/notes/windows.html) on it.

Occasionally, you will write a patch which works on Linux but fails CI on Windows.
There are a few aspects in which MSVC (the Windows compiler toolchain we use) is stricter
than Linux, which are worth keeping in mind when fixing these problems.

1. Symbols are NOT exported by default on Windows; instead, you have to explicitly
   mark a symbol as exported/imported in a header file with `__declspec(dllexport)` /
   `__declspec(dllimport)`. We have codified this pattern into a set of macros
   which follow the convention `*_API`, e.g., `TORCH_API` inside Caffe2, Aten and Torch.
   (Every separate shared library needs a unique macro name, because symbol visibility
   is on a per shared library basis. See c10/macros/Macros.h for more details.)

   The upshot is if you see an "unresolved external" error in your Windows build, this
   is probably because you forgot to mark a function with `*_API`. However, there is
   one important counterexample to this principle: if you want a *templated* function
   to be instantiated at the call site, do NOT mark it with `*_API` (if you do mark it,
   you'll have to explicitly instantiate all of the specializations used by the call
   sites.) See the sketch after this list for an example.

2. If you link against a library, this does not make its dependencies transitively
   visible. You must explicitly specify a link dependency against every library whose
   symbols you use. (This is different from Linux where in most environments,
   transitive dependencies can be used to fulfill unresolved symbols.)

3. If you have a Windows box (we have a few on EC2 which you can request access to) and
   you want to run the build, the easiest way is to just run `.ci/pytorch/win-build.sh`.
   If you need to rebuild, run `REBUILD=1 .ci/pytorch/win-build.sh` (this will avoid
   blowing away your Conda environment.)
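
As a sketch of point 1, here is the general shape of the export-macro pattern (the `FOO_API` macro is illustrative only; the real per-library macros live in c10/macros/Macros.h):

```cpp
// Illustrative only: FOO_API stands in for a per-library export macro
// like TORCH_API (see c10/macros/Macros.h for the real definitions).
#ifdef _WIN32
#  ifdef FOO_BUILD_MAIN_LIB
#    define FOO_API __declspec(dllexport)  // defined while building the DLL itself
#  else
#    define FOO_API __declspec(dllimport)  // defined while consuming the DLL
#  endif
#else
#  define FOO_API  // symbols are visible by default on Linux/macOS
#endif

FOO_API void exported_function();  // callable from other DLLs

template <typename T>
T add_one(T x) {  // templated: do NOT mark with FOO_API if it should be
  return x + 1;   // instantiated at the call site
}
```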
1042
1043Even if you don't know anything about MSVC, you can use cmake to build simple programs on
1044Windows; this can be helpful if you want to learn more about some peculiar linking behavior
1045by reproducing it on a small example. Here's a simple example cmake file that defines
1046two dynamic libraries, one linking with the other:
1047
1048```CMake
1049project(myproject CXX)
1050set(CMAKE_CXX_STANDARD 14)
1051add_library(foo SHARED foo.cpp)
1052add_library(bar SHARED bar.cpp)
1053# NB: don't forget to __declspec(dllexport) at least one symbol from foo,
1054# otherwise foo.lib will not be created.
1055target_link_libraries(bar PUBLIC foo)
1056```
1057
1058You can build it with:
1059
1060```bash
1061mkdir build
1062cd build
1063cmake ..
1064cmake --build .
1065```

### Known MSVC (and MSVC with NVCC) bugs

The PyTorch codebase sometimes likes to use exciting C++ features, and
these exciting features lead to exciting bugs in Windows compilers.
To add insult to injury, the error messages will often not tell you
which line of code actually induced the erroring template instantiation.

We've found the most effective way to debug these problems is to
carefully read over diffs, keeping in mind known bugs in MSVC/NVCC.
Here are a few well known pitfalls and workarounds:

* This is not actually a bug per se, but in general, code generated by MSVC
  is more sensitive to memory errors; you may have written some code
  that does a use-after-free or stack overflows; on Linux the code
  might work, but on Windows your program will crash. ASAN may not
  catch all of these problems: stay vigilant to the possibility that
  your crash is due to a real memory problem.

* (NVCC) `c10::optional` does not work when used from device code. Don't use
  it from kernels. Upstream issue: https://github.com/akrzemi1/Optional/issues/58
  and our local issue #10329.

* `constexpr` generally works less well on MSVC.

  * The idiom `static_assert(f() == f())` to test if `f` is constexpr
    does not work; you'll get "error C2131: expression did not evaluate
    to a constant". Don't use these asserts on Windows.
    (Example: `c10/util/intrusive_ptr.h`)

* (NVCC) Code you access inside a `static_assert` will eagerly be
  evaluated as if it were device code, and so you might get an error
  that the code is "not accessible". For example:

```cpp
class A {
  static A singleton_;
  static constexpr inline A* singleton() {
    return &singleton_;
  }
};
static_assert(std::is_same<A*, decltype(A::singleton())>::value, "hmm");
```

* The compiler will run out of heap space if you attempt to compile files that
  are too large. Splitting such files into separate files helps.
  (Example: `THTensorMath`, `THTensorMoreMath`, `THTensorEvenMoreMath`.)

* MSVC's preprocessor (but not the standard compiler) has a bug
  where it incorrectly tokenizes raw string literals, ending when it sees a `"`.
  This causes preprocessor tokens inside the literal, like an `#endif`, to be incorrectly
  treated as preprocessor directives. See https://godbolt.org/z/eVTIJq as an example.

* Either MSVC or the Windows headers have a PURE macro defined and will replace
  any occurrences of the PURE token in code with an empty string. This is why
  we have AliasAnalysisKind::PURE_FUNCTION and not AliasAnalysisKind::PURE.
  The same is likely true for other identifiers that we just didn't try to use yet.

### Building on legacy code and CUDA

CUDA, MSVC, and PyTorch versions are interdependent; please install matching versions from this table:

| CUDA version | Newest supported VS version                             | PyTorch version |
| ------------ | ------------------------------------------------------- | --------------- |
| 10.1         | Visual Studio 2019 (16.X) (`_MSC_VER` < 1930)           |  1.3.0 ~ 1.7.0  |
| 10.2         | Visual Studio 2019 (16.X) (`_MSC_VER` < 1930)           |  1.5.0 ~ 1.7.0  |
| 11.0         | Visual Studio 2019 (16.X) (`_MSC_VER` < 1930)           |      1.7.0      |

Note: There's a [compilation issue](https://github.com/oneapi-src/oneDNN/issues/812) in several Visual Studio 2019 versions since 16.7.1, so please make sure your Visual Studio 2019 version is not in the range 16.7.1 ~ 16.7.5.

## Running clang-tidy

[Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/index.html) is a C++
linter and static analysis tool based on the clang compiler. We run clang-tidy
in our CI to make sure that new C++ code is safe, sane and efficient. See the
[`clang-tidy` job in our GitHub Workflow's
lint.yml file](https://github.com/pytorch/pytorch/blob/main/.github/workflows/lint.yml)
for the simple commands we use for this.

To run clang-tidy locally, follow these steps:

1. Install clang-tidy.
   We provide custom built binaries which have additional checks enabled. You can install them by running:
   ```bash
   python3 -m tools.linter.clang_tidy.generate_build_files
   ```
   We currently only support Linux and MacOS (x86).

2. Install the clang-tidy driver script dependencies:
   ```bash
   pip3 install -r tools/linter/clang_tidy/requirements.txt
   ```

3. Run clang-tidy:
   ```bash
   # Run clang-tidy on the entire codebase
   make clang-tidy
   # Run clang-tidy only on your changes
   make clang-tidy CHANGED_ONLY=--changed-only
   ```
   This internally invokes our driver script and closely mimics how clang-tidy is run on CI.

## Pre-commit tidy/linting hook

We use clang-tidy to perform additional
formatting and semantic checking of code. We provide a pre-commit git hook for
performing these checks, before a commit is created:

```bash
ln -s ../../tools/git-pre-commit .git/hooks/pre-commit
```

If you have already committed files and
CI reports `flake8` errors, you can run the check locally in your PR branch with:

```bash
flake8 $(git diff --name-only $(git merge-base --fork-point main))
```

You'll need to install an appropriately configured flake8; see
[Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type)
for documentation on how to do this.

Fix the code so that no errors are reported when you re-run the above check,
and then commit the fix.

## Building PyTorch with ASAN

[ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer) is very
useful for debugging memory errors in C++. We run it in CI, but here's how to
get the same thing to run on your local machine.

First, install LLVM 8. The easiest way is to get [prebuilt
binaries](http://releases.llvm.org/download.html#8.0.0) and extract them to a
folder (later called `$LLVM_ROOT`).

Then set up the appropriate scripts. You can put this in your `.bashrc`:

```bash
LLVM_ROOT=<wherever your llvm install is>
PYTORCH_ROOT=<wherever your pytorch checkout is>

LIBASAN_RT="$LLVM_ROOT/lib/clang/8.0.0/lib/linux/libclang_rt.asan-x86_64.so"
build_with_asan()
{
  LD_PRELOAD=${LIBASAN_RT} \
  CC="$LLVM_ROOT/bin/clang" \
  CXX="$LLVM_ROOT/bin/clang++" \
  LDSHARED="clang --shared" \
  LDFLAGS="-stdlib=libstdc++" \
  CFLAGS="-fsanitize=address -fno-sanitize-recover=all -shared-libasan -pthread" \
  CXX_FLAGS="-pthread" \
  USE_CUDA=0 USE_OPENMP=0 USE_DISTRIBUTED=0 DEBUG=1 \
  python setup.py develop
}

run_with_asan()
{
  LD_PRELOAD=${LIBASAN_RT} $@
}

# you can look at build-asan.sh to find the latest options the CI uses
export ASAN_OPTIONS=detect_leaks=0:symbolize=1:strict_init_order=true
export UBSAN_OPTIONS=print_stacktrace=1:suppressions=$PYTORCH_ROOT/ubsan.supp
export ASAN_SYMBOLIZER_PATH=$LLVM_ROOT/bin/llvm-symbolizer
```

Then you can use the scripts like:

```
suo-devfair ~/pytorch ❯ build_with_asan
suo-devfair ~/pytorch ❯ run_with_asan python test/test_jit.py
```

### Getting `ccache` to work

The scripts above specify the `clang` and `clang++` binaries directly, which
bypasses `ccache`. Here's how to get `ccache` to work:

1. Make sure the ccache symlinks for `clang` and `clang++` are set up (see
   CONTRIBUTING.md)
2. Make sure `$LLVM_ROOT/bin` is available on your `$PATH`.
3. Change the `CC` and `CXX` variables in `build_with_asan()` to point
   directly to `clang` and `clang++`.

### Why this stuff with `LD_PRELOAD` and `LIBASAN_RT`?

The “standard” workflow for ASAN assumes you have a standalone binary:

1. Recompile your binary with `-fsanitize=address`.
2. Run the binary, and ASAN will report whatever errors it finds.

Unfortunately, PyTorch is distributed as a shared library that is loaded by
a third-party executable (Python). It’s too much of a hassle to recompile all
of Python every time we want to use ASAN. Luckily, the ASAN folks have a
workaround for cases like this:

1. Recompile your library with `-fsanitize=address -shared-libasan`. The
   extra `-shared-libasan` tells the compiler to ask for the shared ASAN
   runtime library.
2. Use `LD_PRELOAD` to tell the dynamic linker to load the ASAN runtime
   library before anything else.

More information can be found
[here](https://github.com/google/sanitizers/wiki/AddressSanitizerAsDso).

### Why LD_PRELOAD in the build function?

We need `LD_PRELOAD` because there is a cmake check that ensures that a
simple program builds and runs. If we are building with ASAN as a shared
library, we need to `LD_PRELOAD` the runtime library, otherwise there will be
dynamic linker errors and the check will fail.

We don’t actually need either of these if we fix the cmake checks.

### Why no leak detection?

Python leaks a lot of memory. Possibly we could configure a suppression file,
but we haven’t gotten around to it.

## Caffe2 notes

In 2018, we merged Caffe2 into the PyTorch source repository. While the
steady state aspiration is that Caffe2 and PyTorch share code freely,
in the meantime there will be some separation.

There are a few "unusual" directories which, for historical reasons,
are Caffe2/PyTorch specific. Here they are:

- `CMakeLists.txt`, `Makefile`, `binaries`, `cmake`, `conda`, `modules`,
  `scripts` are Caffe2-specific. Don't put PyTorch code in them without
  extra coordination.

- `mypy*`, `requirements.txt`, `setup.py`, `test`, `tools` are
  PyTorch-specific. Don't put Caffe2 code in them without extra
  coordination.

## CI failure tips

Once you submit a PR or push a new commit to a branch that is in
an active PR, CI jobs will be run automatically. Some of these may
fail and you will need to find out why, by looking at the logs.

Fairly often, a CI failure might be unrelated to your changes. You can
confirm by going to our [HUD](https://hud.pytorch.org) and seeing if the CI job
is already failing upstream. In this case, you
can usually ignore the failure. See [the following
subsection](#which-commit-is-used-in-ci) for more details.

Some failures might be related to specific hardware or environment
configurations. In this case, if you're a Meta employee, you can ssh into
the job's session to perform manual debugging following the instructions in
our [CI wiki](https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions).


### Which commit is used in CI?

For CI runs on `main`, this repository is checked out for a given `main`
commit, and CI is run on that commit (there isn't really any other choice).

For PRs, however, it's a bit more complicated. Consider this commit graph, where
`main` is at commit `A`, and the branch for PR #42 (just a placeholder) is at
commit `B`:

```
       o---o---B (refs/pull/42/head)
      /         \
     /           C (refs/pull/42/merge)
    /           /
---o---o---o---A (merge-destination) - usually main
```

There are two possible choices for which commit to use:

1. Checkout commit `B`, the head of the PR (manually committed by the PR
   author).
2. Checkout commit `C`, the hypothetical result of what would happen if the PR
   were merged into its destination (usually `main`).

For all practical purposes, most people can think of the commit being used as
commit `B` (choice **1**).

However, if workflow files (which govern CI behavior) were modified (either by your PR or since your dev branch was created), there's
a nuance to know about:
the workflow files themselves get taken from commit `C`, the merge of your
PR and the `main` branch. But only the workflow files get taken from that merged
commit. Everything else (tests, code, etc.) gets taken directly from your
PR's commit (commit `B`). Note that this scenario would never affect PRs authored by `ghstack`, as they would not automatically ingest the updates from the default branch.


## Dev Infra Office Hours

[Dev Infra Office Hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours) are hosted every Friday to answer any questions regarding developer experience, Green HUD, and CI.
1357