Thank you for your interest in contributing to PyTorch! If you're a new
contributor, please first read our
[Contributing Guide](https://github.com/pytorch/pytorch/wiki/The-Ultimate-Guide-to-PyTorch-Contributions),
specifically the [Submitting a Change](https://github.com/pytorch/pytorch/wiki/The-Ultimate-Guide-to-PyTorch-Contributions#submitting-a-change)
section, which walks through the process of contributing a change to PyTorch.

The rest of this document (CONTRIBUTING.md) covers some of the more technical
aspects of contributing to PyTorch.

# Table of Contents

<!-- toc -->

- [Developing PyTorch](#developing-pytorch)
  - [Setup the development environment](#setup-the-development-environment)
  - [Tips and Debugging](#tips-and-debugging)
- [Nightly Checkout & Pull](#nightly-checkout--pull)
- [Codebase structure](#codebase-structure)
- [Unit testing](#unit-testing)
  - [Python Unit Testing](#python-unit-testing)
  - [Better local unit tests with `pytest`](#better-local-unit-tests-with-pytest)
  - [Local linting](#local-linting)
    - [Running `mypy`](#running-mypy)
  - [C++ Unit Testing](#c-unit-testing)
  - [Run Specific CI Jobs](#run-specific-ci-jobs)
- [Merging your Change](#merging-your-change)
- [Writing documentation](#writing-documentation)
  - [Docstring type formatting](#docstring-type-formatting)
  - [Building documentation](#building-documentation)
    - [Tips](#tips)
    - [Building C++ Documentation](#building-c-documentation)
  - [Previewing changes locally](#previewing-changes-locally)
  - [Previewing documentation on PRs](#previewing-documentation-on-prs)
  - [Adding documentation tests](#adding-documentation-tests)
- [Profiling with `py-spy`](#profiling-with-py-spy)
- [Managing multiple build trees](#managing-multiple-build-trees)
- [C++ development tips](#c-development-tips)
  - [Build only what you need](#build-only-what-you-need)
  - [Code completion and IDE support](#code-completion-and-ide-support)
  - [Make no-op build fast](#make-no-op-build-fast)
    - [Use Ninja](#use-ninja)
    - [Use CCache](#use-ccache)
    - [Use a faster linker](#use-a-faster-linker)
    - [Use pre-compiled headers](#use-pre-compiled-headers)
    - [Workaround for header dependency bug in nvcc](#workaround-for-header-dependency-bug-in-nvcc)
  - [Rebuild few files with debug information](#rebuild-few-files-with-debug-information)
  - [C++ frontend development tips](#c-frontend-development-tips)
  - [GDB integration](#gdb-integration)
  - [C++ stacktraces](#c-stacktraces)
- [CUDA development tips](#cuda-development-tips)
- [Windows development tips](#windows-development-tips)
  - [Known MSVC (and MSVC with NVCC) bugs](#known-msvc-and-msvc-with-nvcc-bugs)
  - [Building on legacy code and CUDA](#building-on-legacy-code-and-cuda)
- [Running clang-tidy](#running-clang-tidy)
- [Pre-commit tidy/linting hook](#pre-commit-tidylinting-hook)
- [Building PyTorch with ASAN](#building-pytorch-with-asan)
  - [Getting `ccache` to work](#getting-ccache-to-work)
  - [Why this stuff with `LD_PRELOAD` and `LIBASAN_RT`?](#why-this-stuff-with-ld_preload-and-libasan_rt)
  - [Why LD_PRELOAD in the build function?](#why-ld_preload-in-the-build-function)
  - [Why no leak detection?](#why-no-leak-detection)
- [Caffe2 notes](#caffe2-notes)
- [CI failure tips](#ci-failure-tips)
  - [Which commit is used in CI?](#which-commit-is-used-in-ci)
- [Dev Infra Office Hours](#dev-infra-office-hours)

<!-- tocstop -->

## Developing PyTorch

Follow the instructions for [installing PyTorch from source](https://github.com/pytorch/pytorch#from-source). If you get stuck when developing PyTorch on your machine, check out the [tips and debugging](#tips-and-debugging) section below for common solutions.

### Setup the development environment

First, you need to [fork the PyTorch project on GitHub](https://github.com/pytorch/pytorch/fork) and follow the instructions at [Connecting to GitHub with SSH](https://docs.github.com/en/authentication/connecting-to-github-with-ssh) to set up your SSH authentication credentials.

Then clone the PyTorch project and set up the development environment:

```bash
git clone git@github.com:<USERNAME>/pytorch.git
cd pytorch
git remote add upstream git@github.com:pytorch/pytorch.git

make setup-env  # or make setup-env-cuda for pre-built CUDA binaries
conda activate pytorch-deps
```

### Tips and Debugging

* If you want to have no-op incremental rebuilds (which are fast), see [Make no-op build fast](#make-no-op-build-fast) below.

* When installing with `python setup.py develop` (in contrast to `python setup.py install`), the Python runtime will use
  the current local source tree when importing the `torch` package. (This is done by creating a [`.egg-link`](https://wiki.python.org/moin/PythonPackagingTerminology#egg-link) file in the `site-packages` folder.)
  This way you do not need to reinstall after modifying Python files (`.py`).
  However, you do need to reinstall if you modify the Python interface (`.pyi`, `.pyi.in`) or
  non-Python files (`.cpp`, `.cc`, `.cu`, `.h`, ...).

  One way to avoid running `python setup.py develop` every time you change C++/CUDA/Objective-C files on Linux/Mac
  is to create symbolic links from the `build` folder into `torch/lib`, for example:
  ```bash
  pushd torch/lib; sh -c "ln -sf ../../build/lib/libtorch_cpu.* ."; popd
  ```
  Afterwards, rebuilding a library (for example, to rebuild `libtorch_cpu.so`, run `ninja torch_cpu` from the `build` folder)
  is sufficient to make the change visible in the `torch` package.

  To reinstall, first uninstall all existing PyTorch installs.
  You may need to run `pip uninstall torch` multiple times. You'll know `torch` is fully
  uninstalled when you see `WARNING: Skipping torch as it is not installed`.
  (You should only have to `pip uninstall` a few times, but you can always
  `uninstall` with `timeout` or in a loop if you're feeling lazy.)

  ```bash
  conda uninstall pytorch -y
  yes | pip uninstall torch
  ```

  Next, run `python setup.py clean`. After that, you can install in `develop` mode again.

* If you run into errors when running `python setup.py develop`, here are some debugging steps:
  1. Run `printf '#include <stdio.h>\nint main() { printf("Hello World");}'|clang -x c -; ./a.out` to make sure
     your compiler toolchain works and can compile this simple Hello World program without errors.
  2. Nuke your `build` directory. The `setup.py` script compiles binaries into the `build` folder and caches many
     details along the way, which saves time the next time you build. If you're running into issues, you can always
     `rm -rf build` from the toplevel `pytorch` directory and start over.
  3. If you have made edits to the PyTorch repo, commit any changes you'd like to keep and clean the repo with the
     following commands (note that clean _really_ removes all untracked files and changes):
     ```bash
     git submodule deinit -f .
     git clean -xdf
     python setup.py clean
     git submodule update --init --recursive # very important to sync the submodules
     python setup.py develop # then try running the command again
     ```
  4. The main step within `python setup.py develop` is running `make` from the `build` directory. If you want to
     experiment with some environment variables, you can pass them into the command:
     ```bash
     ENV_KEY1=ENV_VAL1[, ENV_KEY2=ENV_VAL2]* python setup.py develop
     ```

* If you run into issues running `git submodule update --init --recursive`, please try the following:
  - If you encounter an error such as
    ```
    error: Submodule 'third_party/pybind11' could not be updated
    ```
    check whether your local or global Git config file contains any `submodule.*` settings. If so, remove them and try again.
    (See [this doc](https://git-scm.com/docs/git-config#Documentation/git-config.txt-submoduleltnamegturl) for more info.)

  - If you encounter an error such as
    ```
    fatal: unable to access 'https://github.com/pybind11/pybind11.git': could not load PEM client certificate ...
    ```
    it is likely that you are using an HTTP proxy and the certificate has expired. To check whether the certificate is valid, run
    `git config --global --list` and search for a setting like `http.proxysslcert=<cert_file>`. Then check the certificate's validity dates by running
    ```bash
    openssl x509 -noout -in <cert_file> -dates
    ```

  - If you encounter an error indicating that some third_party modules are not checked out correctly, such as
    ```
    Could not find .../pytorch/third_party/pybind11/CMakeLists.txt
    ```
    remove any `submodule.*` settings in your local Git config (`.git/config` of your pytorch repo) and try again.
* If you're a Windows contributor, please check out [Best Practices](https://github.com/pytorch/pytorch/wiki/Best-Practices-to-Edit-and-Compile-Pytorch-Source-Code-On-Windows).
* For help with any part of the contributing process, please don't hesitate to utilize our Zoom office hours! See details [here](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours).

## Nightly Checkout & Pull

The `tools/nightly.py` script is provided to ease pure Python development of
PyTorch. It uses `conda` and `git` to check out the nightly development
version of PyTorch and install pre-built binaries into the current repository.
This is like a development or editable install, but without needing the ability
to compile any C++ code.

You can use this script to check out a new nightly branch with the following:

```bash
./tools/nightly.py checkout -b my-nightly-branch
conda activate pytorch-deps
```

Or if you would like to re-use an existing conda environment, you can pass in
the regular environment parameters (`--name` or `--prefix`):

```bash
./tools/nightly.py checkout -b my-nightly-branch -n my-env
conda activate my-env
```

To install the nightly binaries built with CUDA, you can pass in the flag `--cuda`:

```bash
./tools/nightly.py checkout -b my-nightly-branch --cuda
conda activate pytorch-deps
```

You can also use this tool to pull the nightly commits into the current branch:

```bash
./tools/nightly.py pull -n my-env
conda activate my-env
```

Pulling will reinstall the PyTorch dependencies as well as the nightly binaries
into the repo directory.

## Codebase structure

* [c10](c10) - Core library files that work everywhere, both server
  and mobile. We are slowly moving pieces from [ATen/core](aten/src/ATen/core)
  here. This library is intended only to contain essential functionality,
  and is appropriate for use in settings where binary size matters. (But
  you'll have a lot of missing functionality if you try to use it
  directly.)
* [aten](aten) - C++ tensor library for PyTorch (no autograd support)
  * [src](aten/src) - [README](aten/src/README.md)
    * [ATen](aten/src/ATen)
      * [core](aten/src/ATen/core) - Core functionality of ATen. This
        is migrating to the top-level c10 folder.
      * [native](aten/src/ATen/native) - Modern implementations of
        operators. If you want to write a new operator, here is where
        it should go. Most CPU operators go in the top level directory,
        except for operators which need to be compiled specially; see
        cpu below.
        * [cpu](aten/src/ATen/native/cpu) - Not actually CPU
          implementations of operators, but specifically implementations
          which are compiled with processor-specific instructions, like
          AVX. See the [README](aten/src/ATen/native/cpu/README.md) for more
          details.
        * [cuda](aten/src/ATen/native/cuda) - CUDA implementations of
          operators.
        * [sparse](aten/src/ATen/native/sparse) - CPU and CUDA
          implementations of COO sparse tensor operations
        * [mkl](aten/src/ATen/native/mkl) [mkldnn](aten/src/ATen/native/mkldnn)
          [miopen](aten/src/ATen/native/miopen) [cudnn](aten/src/ATen/native/cudnn) -
          implementations of operators which simply bind to some
          backend library.
        * [quantized](aten/src/ATen/native/quantized/) - Quantized tensor (i.e. QTensor) operation implementations. [README](aten/src/ATen/native/quantized/README.md) contains details including how to implement native quantized operations.
* [torch](torch) - The actual PyTorch library. Everything that is not
  in [csrc](torch/csrc) is a Python module, following the PyTorch Python
  frontend module structure.
  * [csrc](torch/csrc) - C++ files composing the PyTorch library. Files
    in this directory tree are a mix of Python binding code and C++
    heavy lifting. Consult `setup.py` for the canonical list of Python
    binding files; conventionally, they are often prefixed with
    `python_`. [README](torch/csrc/README.md)
    * [jit](torch/csrc/jit) - Compiler and frontend for the TorchScript
      JIT. [README](torch/csrc/jit/README.md)
    * [autograd](torch/csrc/autograd) - Implementation of reverse-mode automatic differentiation. [README](torch/csrc/autograd/README.md)
    * [api](torch/csrc/api) - The PyTorch C++ frontend.
    * [distributed](torch/csrc/distributed) - Distributed training
      support for PyTorch.
* [tools](tools) - Code generation scripts for the PyTorch library.
  See the [README](tools/README.md) of this directory for more details.
* [test](test) - Python unit tests for the PyTorch Python frontend.
  * [test_torch.py](test/test_torch.py) - Basic tests for PyTorch
    functionality.
  * [test_autograd.py](test/test_autograd.py) - Tests for non-NN
    automatic differentiation support.
  * [test_nn.py](test/test_nn.py) - Tests for NN operators and
    their automatic differentiation.
  * [test_jit.py](test/test_jit.py) - Tests for the JIT compiler
    and TorchScript.
  * ...
  * [cpp](test/cpp) - C++ unit tests for the PyTorch C++ frontend.
    * [api](test/cpp/api) - [README](test/cpp/api/README.md)
    * [jit](test/cpp/jit) - [README](test/cpp/jit/README.md)
    * [tensorexpr](test/cpp/tensorexpr) - [README](test/cpp/tensorexpr/README.md)
  * [expect](test/expect) - Automatically generated "expect" files
    which are used to compare against expected output.
  * [onnx](test/onnx) - Tests for ONNX export functionality,
    using both PyTorch and Caffe2.
* [caffe2](caffe2) - The Caffe2 library.
  * [core](caffe2/core) - Core files of Caffe2, e.g., tensor, workspace,
    blobs, etc.
  * [operators](caffe2/operators) - Operators of Caffe2.
  * [python](caffe2/python) - Python bindings to Caffe2.
  * ...
* [.circleci](.circleci) - CircleCI configuration management. [README](.circleci/README.md)

## Unit testing

### Python Unit Testing

**Prerequisites**:
The following packages should be installed with either `conda` or `pip`:
- `expecttest` and `hypothesis` - required to run tests
- `mypy` - recommended for linting
- `pytest` - recommended to run tests more selectively

All PyTorch test suites are located in the `test` folder and start with
`test_`. Run the entire test suite with

```bash
python test/run_test.py
```

or run individual test suites using the command `python test/FILENAME.py`,
where `FILENAME` represents the file containing the test suite you wish
to run.

For example, to run all the TorchScript JIT tests (located at
`test/test_jit.py`), you would run:

```bash
python test/test_jit.py
```

You can narrow down what you're testing even further by specifying the
name of an individual test with `TESTCLASSNAME.TESTNAME`. Here,
`TESTNAME` is the name of the test you want to run, and `TESTCLASSNAME`
is the name of the class in which it is defined.

Going off the above example, let's say you want to run
`test_Sequential`, which is defined as part of the `TestJit` class
in `test/test_jit.py`. Your command would be:

```bash
python test/test_jit.py TestJit.test_Sequential
```

**Weird note:** In our CI (Continuous Integration) jobs, we actually run the tests from the `test` folder and **not** the root of the repo, since there are various dependencies we set up for CI that expect the tests to be run from the test folder. As such, there may be some inconsistencies between local testing and CI testing--if you observe an inconsistency, please [file an issue](https://github.com/pytorch/pytorch/issues/new/choose).

### Better local unit tests with `pytest`

We don't officially support `pytest`, but it works well with our
`unittest` tests and offers a number of useful features for local
development. Install it via `pip install pytest`.

If you want to just run tests that contain a specific substring, you can
use the `-k` flag:

```bash
pytest test/test_nn.py -k Loss -v
```

The above is an example of testing a change to all Loss functions: this
command runs tests such as `TestNN.test_BCELoss` and
`TestNN.test_MSELoss` and can be useful to save keystrokes.
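
Since `pytest` also understands `unittest`-style test classes, you can select a single test by its node ID rather than a `-k` substring. A quick sketch, reusing the `TestNN.test_BCELoss` name from the example above:

```shell
# Run exactly one test by its node ID: file::class::test
pytest test/test_nn.py::TestNN::test_BCELoss -v

# -k expressions can also be combined with `and`/`or`/`not`
pytest test/test_nn.py -k "Loss and not MSE" -v
```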

### Local linting

Install all prerequisites by running

```bash
make setup-lint
```

You can now run the same linting steps that are used in CI locally via `make`:

```bash
make lint
```

Learn more about the linter on the [lintrunner wiki page](https://github.com/pytorch/pytorch/wiki/lintrunner).

#### Running `mypy`

`mypy` is an optional static type checker for Python. We have multiple `mypy`
configs for the PyTorch codebase that are automatically validated whenever the linter is run.

See the [Guide for adding type annotations to PyTorch](https://github.com/pytorch/pytorch/wiki/Guide-for-adding-type-annotations-to-PyTorch)
for more information on how to set up `mypy` and tackle type annotation
tasks.

### C++ Unit Testing

PyTorch offers a series of tests located in the `test/cpp` folder.
These tests are written in C++ and use the Google Test testing framework.
After compiling PyTorch from source, the test runner binaries will be
written to the `build/bin` folder. The command to run one of these tests
is `./build/bin/FILENAME --gtest_filter=TESTSUITE.TESTNAME`, where
`TESTNAME` is the name of the test you'd like to run and `TESTSUITE` is
the suite that test is defined in.

For example, if you wanted to run the test `MayContainAlias`, which
is part of the test suite `ContainerAliasingTest` in the file
`test/cpp/jit/test_alias_analysis.cpp`, the command would be:

```bash
./build/bin/test_jit --gtest_filter=ContainerAliasingTest.MayContainAlias
```

### Run Specific CI Jobs

You can generate a commit that limits the CI to only run a specific job by using
`tools/testing/explicit_ci_jobs.py` like so:

```bash
# --job: specify one or more times to filter to a specific job + its dependencies
# --filter-gha: specify github actions workflows to keep
# --make-commit: commit CI changes to git with a message explaining the change
python tools/testing/explicit_ci_jobs.py --job binary_linux_manywheel_3_6m_cpu_devtoolset7_nightly_test --filter-gha '*generated*gcc5.4*' --make-commit

# Make your changes

ghstack submit
```

**NB**: It is not recommended to use this workflow unless you are also using
[`ghstack`](https://github.com/ezyang/ghstack). It creates a large commit that is
of very low signal to reviewers.

## Merging your Change

If you know the right people or team that should approve your PR (and you have the required permissions to do so), add them to the Reviewers list.

If not, leave the Reviewers section empty. Our triage squad will review your PR, add a module label, and assign it to the appropriate reviewer within a couple of business days. The reviewer will then look at your PR and respond.

Occasionally, things might fall through the cracks (sorry!). If your PR either doesn't get assigned to a reviewer or doesn't get any response from the reviewer for 4 business days, please leave a comment on the PR (mentioning the reviewer if one has been assigned). That'll get it nudged back onto people's radar.

If that still doesn't help, come see us during [our office hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours).

Once your PR is approved, you can merge it in by entering a comment with the content `@pytorchmergebot merge` ([what's this bot?](https://github.com/pytorch/pytorch/wiki/Bot-commands)).

## Writing documentation

So you want to write some documentation and don't know where to start?
PyTorch has two main types of documentation:
- **User facing documentation**:
  These are the docs that you see over at [our docs website](https://pytorch.org/docs).
- **Developer facing documentation**:
  Developer facing documentation is spread around our READMEs in our codebase and in
  the [PyTorch Developer Wiki](https://pytorch.org/wiki).
  If you're interested in adding new developer docs, please read this [page on the wiki](https://github.com/pytorch/pytorch/wiki/Where-or-how-should-I-add-documentation) on our best practices for where to put it.

The rest of this section is about user-facing documentation.

PyTorch uses [Google style](https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html)
for formatting docstrings. Each line inside a docstring block must be limited to 80 characters so that it fits into Jupyter documentation popups.

### Docstring type formatting

In addition to the standard Google Style docstring formatting rules, the following guidelines should be followed for docstring types (docstring types are the type information contained in the round brackets after the variable name):

* The `Callable`, `Any`, `Iterable`, `Iterator`, and `Generator` types should have their first letter capitalized.

* The `list` and `tuple` types should be completely lowercase.

* Types should not be made plural. For example: `tuple of int` should be used instead of `tuple of ints`.

* The only acceptable delimiter words for types are `or` and `of`. No other non-type words should be used other than `optional`.

* The word `optional` should only be used after the types, and it is only used if the user does not have to specify a value for the variable. Default values are listed after the variable description. Example:

  ```
  my_var (int, optional): Variable description. Default: 1
  ```

* Basic Python types should match their type name so that the [Intersphinx](https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html) extension can correctly identify them. For example:
  * Use `str` instead of `string`.
  * Use `bool` instead of `boolean`.
  * Use `dict` instead of `dictionary`.

* Square brackets should be used for the dictionary type. For example:

  ```
  my_var (dict[str, int]): Variable description.
  ```

* If a variable has two different possible types, then the word `or` should be used without a comma. Variables with three or more types should use commas to separate the types. Example:

  ```
  x (type1 or type2): Variable description.
  y (type1, type2, or type3): Variable description.
  ```

### Building documentation

To build the documentation:

1. Build and install PyTorch

2. Install the prerequisites

```bash
cd docs
pip install -r requirements.txt
# `katex` must also be available in your PATH.
# You can either install katex globally if you have properly configured npm:
# npm install -g katex
# Or if you prefer an uncontaminated global executable environment or do not want to go through the node configuration:
# npm install katex && export PATH="$PATH:$(pwd)/node_modules/.bin"
```

> Note: if you installed `nodejs` with a different package manager (e.g.,
> `conda`) then `npm` will probably install a version of `katex` that is not
> compatible with your version of `nodejs` and doc builds will fail.
> A combination of versions that is known to work is `node@6.13.1` and
> `katex@0.13.18`. To install the latter with `npm` you can run
> `npm install -g katex@0.13.18`.

> Note that if you are a Facebook employee using a devserver, yarn may be more convenient for installing katex:

```bash
yarn global add katex
```

> If a specific version is required, you can use, for example, `yarn global add katex@0.13.18`.

3. Generate the documentation HTML files. The generated files will be in `docs/build/html`.

```bash
make html
```

#### Tips

The `.rst` source files live in [docs/source](docs/source). Some of the `.rst`
files pull in docstrings from PyTorch Python code (for example, via
the `autofunction` or `autoclass` directives). To vastly shorten doc build times,
it is helpful to remove the files you are not working on, keeping only the base
`index.rst` file and the files you are editing. The Sphinx build will produce
missing file warnings but will still complete. For example, to work on `jit.rst`:

```bash
cd docs/source
find . -type f | grep rst | grep -v index | grep -v jit | xargs rm

# Make your changes, build the docs, etc.

# Don't commit the deletions!
git add index.rst jit.rst
...
528``` 529 530#### Building C++ Documentation 531 532For C++ documentation (https://pytorch.org/cppdocs), we use 533[Doxygen](http://www.doxygen.nl/) and then convert it to 534[Sphinx](http://www.sphinx-doc.org/) via 535[Breathe](https://github.com/michaeljones/breathe) and 536[Exhale](https://github.com/svenevs/exhale). Check the [Doxygen 537reference](https://www.doxygen.nl/manual/) for more 538information on the documentation syntax. 539 540We run Doxygen in CI (Travis) to verify that you do not use invalid Doxygen 541commands. To run this check locally, run `./check-doxygen.sh` from inside 542`docs/cpp/source`. 543 544To build the documentation, follow the same steps as above, but run them from 545`docs/cpp` instead of `docs`. 546 547### Previewing changes locally 548 549To view HTML files locally, you can open the files in your web browser. For example, 550navigate to `file:///your_pytorch_folder/docs/build/html/index.html` in a web 551browser. 552 553If you are developing on a remote machine, you can set up an SSH tunnel so that 554you can access the HTTP server on the remote machine from your local machine. To map 555remote port 8000 to local port 8000, use either of the following commands. 556 557```bash 558# For SSH 559ssh my_machine -L 8000:my_machine:8000 560 561# For Eternal Terminal 562et my_machine -t="8000:8000" 563``` 564 565Then navigate to `localhost:8000` in your web browser. 

**Tip:**
You can start a lightweight HTTP server on the remote machine with:

```bash
python -m http.server 8000 --directory <path_to_html_output>
```

Alternatively, you can run `rsync` on your local machine to copy the files from
your remote machine:

```bash
mkdir -p build cpp/build
rsync -az me@my_machine:/path/to/pytorch/docs/build/html build
rsync -az me@my_machine:/path/to/pytorch/docs/cpp/build/html cpp/build
```

### Previewing documentation on PRs

PyTorch will host documentation previews at `https://docs-preview.pytorch.org/pytorch/pytorch/<pr number>/index.html` once the
`pytorch_python_doc_build` GitHub Actions job has completed on your PR. You can visit that page directly
or find its link in the automated Dr. CI comment on your PR.

### Adding documentation tests

It is easy for code snippets in docstrings and `.rst` files to get out of date. The docs
build includes the [Sphinx Doctest Extension](https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html),
which can run code in documentation as a unit test. To use the extension, use
the `.. testcode::` directive in your `.rst` files and docstrings.

To manually run these tests, follow steps 1 and 2 above, then run:

```bash
cd docs
make doctest
```

## Profiling with `py-spy`

Evaluating the performance impact of code changes in PyTorch can be complicated,
particularly if code changes happen in compiled code. One simple way to profile
both Python and C++ code in PyTorch is to use
[`py-spy`](https://github.com/benfred/py-spy), a sampling profiler for Python
that has the ability to profile native code and Python code in the same session.

`py-spy` can be installed via `pip`:

```bash
pip install py-spy
```

To use `py-spy`, first write a Python test script that exercises the
functionality you would like to profile.

For example, this script profiles `torch.add`:

```python
import torch

t1 = torch.tensor([[1, 1], [1, 1.]])
t2 = torch.tensor([[0, 0], [0, 0.]])

for _ in range(1000000):
    torch.add(t1, t2)
```

Since the `torch.add` operation happens in microseconds, we repeat it a large
number of times to get good statistics. The most straightforward way to use
`py-spy` with such a script is to generate a [flame
graph](http://www.brendangregg.com/flamegraphs.html):

```bash
py-spy record -o profile.svg --native -- python test_tensor_tensor_add.py
```

This will output a file named `profile.svg` containing a flame graph you can
view in a web browser or SVG viewer. Individual stack frame entries in the graph
can be selected interactively with your mouse to zoom in on a particular part of
the program execution timeline. The `--native` command-line option tells
`py-spy` to record stack frame entries for PyTorch C++ code. To get line numbers
for C++ code, it may be necessary to compile PyTorch in debug mode by prepending
`DEBUG=1` to your `setup.py develop` call. Depending on
your operating system, it may also be necessary to run `py-spy` with root
privileges.

`py-spy` can also work in an `htop`-like "live profiling" mode and can be
tweaked to adjust the stack sampling rate; see the `py-spy` readme for more
details.

## Managing multiple build trees

One downside to using `python setup.py develop` is that your development
version of PyTorch will be installed globally on your account (e.g., if
you run `import torch` anywhere else, the development version will be
used).

If you want to manage multiple builds of PyTorch, you can make use of
[conda environments](https://conda.io/docs/using/envs.html) to maintain
separate Python package environments, each of which can be tied to a
specific build of PyTorch.
To set one up:

```bash
conda create -n pytorch-myfeature
source activate pytorch-myfeature
# if you run python now, torch will NOT be installed
python setup.py develop
```

## C++ development tips

If you are working on the C++ code, there are a few important things that you
will want to keep in mind:

1. How to rebuild only the code you are working on.
2. How to make rebuilds in the absence of changes go faster.

### Build only what you need

`python setup.py build` will build everything by default, but sometimes you are
only interested in a specific component.

- Working on a test binary? Run `(cd build && ninja bin/test_binary_name)` to
  rebuild only that test binary (without rerunning cmake). (Replace `ninja` with
  `make` if you don't have ninja installed.)

On the initial build, you can also speed things up with the environment
variables `DEBUG`, `USE_DISTRIBUTED`, `USE_MKLDNN`, `USE_CUDA`, `USE_FLASH_ATTENTION`, `USE_MEM_EFF_ATTENTION`, `BUILD_TEST`, `USE_FBGEMM`, `USE_NNPACK`, and `USE_QNNPACK`.

- `DEBUG=1` will enable debug builds (-g -O0)
- `REL_WITH_DEB_INFO=1` will enable debug symbols with optimizations (-g -O3)
- `USE_DISTRIBUTED=0` will disable the distributed (c10d, gloo, mpi, etc.) build.
- `USE_MKLDNN=0` will disable using MKL-DNN.
- `USE_CUDA=0` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
- `BUILD_TEST=0` will disable building C++ test binaries.
- `USE_FBGEMM=0` will disable using FBGEMM (quantized 8-bit server operators).
- `USE_NNPACK=0` will disable compiling with NNPACK.
- `USE_QNNPACK=0` will disable the QNNPACK build (quantized 8-bit operators).
- `USE_XNNPACK=0` will disable compiling with XNNPACK.
- `USE_FLASH_ATTENTION=0` and `USE_MEM_EFF_ATTENTION=0` will disable compiling the flash attention and memory-efficient attention kernels, respectively.

For example:

```bash
DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 python setup.py develop
```

For subsequent builds (i.e., when `build/CMakeCache.txt` exists), the build
options passed the first time will persist; run `ccmake build/`, run
`cmake-gui build/`, or directly edit `build/CMakeCache.txt` to adapt the build
options.

### Code completion and IDE support

When using `python setup.py develop`, PyTorch will generate
a `compile_commands.json` file that can be used by many editors
to provide code completion and error highlighting for PyTorch's
C++ code. You need to `pip install ninja` to generate accurate
information for the code in `torch/csrc`. More information at:
- https://sarcasm.github.io/notes/dev/compilation-database.html

### Make no-op build fast

#### Use Ninja

By default, cmake will use its Makefile generator to generate your build
system. You can get faster builds if you install the ninja build system
with `pip install ninja`. If PyTorch was already built, you will need
to run `python setup.py clean` once after installing ninja for builds to
succeed.

Note: Make sure to use a machine with a large number of CPU cores; this will significantly reduce your build times.

#### Use CCache

Even when dependencies are tracked with file modification times, there are many
situations where files get rebuilt even though a previous compilation was
exactly the same. Using ccache in a situation like this is a real time-saver.
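To make the idea concrete: ccache keys its cache on a hash of the preprocessed source together with the compiler invocation, so an identical recompilation becomes a cheap cache lookup. A toy Python sketch of that mechanism (illustrative only; `compile_with_cache` is hypothetical and this is not how ccache is actually implemented):

```python
import hashlib

# Toy model of ccache's idea: key the object file on a hash of the
# preprocessed source plus the compiler flags, so identical
# compilations are served from the cache instead of recompiling.
cache: dict = {}

def compile_with_cache(preprocessed_source: str, flags: str):
    """Return (object_code, was_cache_hit) for a simulated compilation."""
    key = hashlib.sha256((flags + "\0" + preprocessed_source).encode()).hexdigest()
    if key in cache:
        return cache[key], True   # cache hit: no recompilation
    # Stand-in for actually invoking the compiler.
    obj = ("<object code for %d bytes>" % len(preprocessed_source)).encode()
    cache[key] = obj
    return obj, False             # cache miss: compiled and stored

obj1, hit1 = compile_with_cache("int main() { return 0; }", "-O2")
obj2, hit2 = compile_with_cache("int main() { return 0; }", "-O2")  # same input
print(hit1, hit2)  # False True
```

Note that any change to the flags (e.g. `-O2` vs `-O0`) changes the key, which is why switching build options invalidates the cache.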
Before building PyTorch, install ccache from your package manager of choice:

```bash
conda install ccache -c conda-forge
sudo apt install ccache
sudo yum install ccache
brew install ccache
```

You may also find that the default cache size in ccache is too small to be useful.
The cache sizes can be increased from the command line:

```bash
# config: cache dir is ~/.ccache, conf file ~/.ccache/ccache.conf
# max size of cache
ccache -M 25Gi # -M 0 for unlimited
# unlimited number of files
ccache -F 0
```

To check that this is working, do two clean builds of PyTorch in a row. The second
build should be substantially and noticeably faster than the first build. If
this doesn't seem to be the case, check the `CMAKE_<LANG>_COMPILER_LAUNCHER`
rules in `build/CMakeCache.txt`, where `<LANG>` is `C`, `CXX`, and `CUDA`.
Each of these 3 variables should contain ccache, e.g.

```
//CXX compiler launcher
CMAKE_CXX_COMPILER_LAUNCHER:STRING=/usr/bin/ccache
```

If they don't, you can define these variables on the command line before invoking `setup.py`:

```bash
export CMAKE_C_COMPILER_LAUNCHER=ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
export CMAKE_CUDA_COMPILER_LAUNCHER=ccache
python setup.py develop
```

#### Use a faster linker

If you are editing a single file and rebuilding in a tight loop, the time spent
linking will dominate. The system linker available in most Linux distributions
(GNU `ld`) is quite slow. Use a faster linker, like [lld](https://lld.llvm.org/).

If you're on a Mac, follow [this guide](https://stackoverflow.com/questions/42730345/how-to-install-llvm-for-mac) instead.
The easiest way to use `lld` is to download the
[latest LLVM binaries](http://releases.llvm.org/download.html#8.0.0) and run:

```bash
ln -s /path/to/downloaded/ld.lld /usr/local/bin/ld
```

#### Use pre-compiled headers

Sometimes there's no way of getting around rebuilding lots of files, for example
editing `native_functions.yaml` usually means 1000+ files being rebuilt. If
you're using CMake newer than 3.16, you can enable pre-compiled headers by
setting `USE_PRECOMPILED_HEADERS=1` either on first setup, or in the
`CMakeCache.txt` file.

```sh
USE_PRECOMPILED_HEADERS=1 python setup.py develop
```

This adds a build step where the compiler takes `<ATen/ATen.h>` and essentially
dumps its internal AST to a file so the compiler can avoid repeating itself for
every `.cpp` file.

One caveat is that, when enabled, this header gets included in every file by
default, which may change what code is legal. For example:
- internal functions can never alias existing names in `<ATen/ATen.h>`
- names in `<ATen/ATen.h>` will work even if you don't explicitly include it.

#### Workaround for header dependency bug in nvcc
If re-building without modifying any files results in several CUDA files being
re-compiled, you may be running into an `nvcc` bug where header dependencies are
not converted to absolute paths before being reported to the build system. This
makes `ninja` think one of the header files has been deleted, so it runs the
build again.

A compiler wrapper to fix this is provided in `tools/nvcc_fix_deps.py`. You can use
this as a compiler launcher, similar to `ccache`:
```bash
export CMAKE_CUDA_COMPILER_LAUNCHER="python;`pwd`/tools/nvcc_fix_deps.py;ccache"
python setup.py develop
```

### Rebuild few files with debug information

While debugging a problem, one often has to maintain a debug build in a separate folder.
But often only a few files need to be rebuilt with debug info to get a symbolicated backtrace or enable source debugging.
This can easily be done with the help of `tools/build_with_debinfo.py`.

For example, suppose one wants to debug what is going on while a tensor index is selected, which can be achieved by setting a breakpoint at the `applySelect` function:
```
% lldb -o "b applySelect" -o "process launch" -- python3 -c "import torch;print(torch.rand(5)[3])"
(lldb) target create "python"
Current executable set to '/usr/bin/python3' (arm64).
(lldb) settings set -- target.run-args "-c" "import torch;print(torch.rand(5)[3])"
(lldb) b applySelect
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) process launch
2 locations added to breakpoint 1
Process 87729 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001023d55a8 libtorch_python.dylib`at::indexing::impl::applySelect(at::Tensor const&, long long, c10::SymInt, long long, c10::Device const&, std::__1::optional<c10::ArrayRef<c10::SymInt>> const&)
libtorch_python.dylib`at::indexing::impl::applySelect:
->  0x1023d55a8 <+0>:  sub sp, sp, #0xd0
    0x1023d55ac <+4>:  stp x24, x23, [sp, #0x90]
    0x1023d55b0 <+8>:  stp x22, x21, [sp, #0xa0]
    0x1023d55b4 <+12>: stp x20, x19, [sp, #0xb0]
Target 0: (python) stopped.
Process 87729 launched: '/usr/bin/python' (arm64)
```
This is not very informative, but can easily be remedied by rebuilding `python_variable_indexing.cpp` with debug information:
```
% ./tools/build_with_debinfo.py torch/csrc/autograd/python_variable_indexing.cpp
[1 / 2] Building caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_variable_indexing.cpp.o
[2 / 2] Building lib/libtorch_python.dylib
```
And afterwards:
```
% lldb -o "b applySelect" -o "process launch" -- python3 -c "import torch;print(torch.rand(5)[3])"
(lldb) target create "python"
Current executable set to '/usr/bin/python3' (arm64).
(lldb) settings set -- target.run-args "-c" "import torch;print(torch.rand(5)[3])"
(lldb) b applySelect
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) process launch
2 locations added to breakpoint 1
Process 87741 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001024e2628 libtorch_python.dylib`at::indexing::impl::applySelect(self=0x00000001004ee8a8, dim=0, index=(data_ = 3), real_dim=0, (null)=0x000000016fdfe535, self_sizes= Has Value=true ) at TensorIndexing.h:239:7
   236        const at::Device& /*self_device*/,
   237        const c10::optional<SymIntArrayRef>& self_sizes) {
   238      // See NOTE [nested tensor size for indexing]
-> 239      if (self_sizes.has_value()) {
   240        auto maybe_index = index.maybe_as_int();
   241        if (maybe_index.has_value()) {
   242          TORCH_CHECK_INDEX(
Target 0: (python) stopped.
Process 87741 launched: '/usr/bin/python3' (arm64)
```
This is much more useful, isn't it?

### C++ frontend development tips

We have very extensive tests in the [test/cpp/api](test/cpp/api) folder. The
tests are a great way to see how certain components are intended to be used.
When compiling PyTorch from source, the test runner binary will be written to
`build/bin/test_api`. The tests use the [GoogleTest](https://github.com/google/googletest/blob/master/googletest)
framework, which you can read up about to learn how to configure the test runner. When
submitting a new feature, we care very much that you write appropriate tests.
Please follow the lead of the other tests to see how to write a new test case.

### GDB integration

If you are debugging pytorch inside GDB, you might be interested in
[pytorch-gdb](tools/gdb/pytorch-gdb.py). This script introduces some
pytorch-specific commands which you can use from the GDB prompt. In
particular, `torch-tensor-repr` prints a human-readable repr of an at::Tensor
object. Example of usage:

```
$ gdb python
GNU gdb (GDB) 9.2
[...]
(gdb) # insert a breakpoint when we call .neg()
(gdb) break at::Tensor::neg
Function "at::Tensor::neg" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (at::Tensor::neg) pending.

(gdb) run
[...]
>>> import torch
>>> t = torch.tensor([1, 2, 3, 4], dtype=torch.float64)
>>> t
tensor([1., 2., 3., 4.], dtype=torch.float64)
>>> t.neg()

Thread 1 "python" hit Breakpoint 1, at::Tensor::neg (this=0x7ffb118a9c88) at aten/src/ATen/core/TensorBody.h:3295
3295    inline at::Tensor Tensor::neg() const {
(gdb) # the default repr of 'this' is not very useful
(gdb) p this
$1 = (const at::Tensor * const) 0x7ffb118a9c88
(gdb) p *this
$2 = {impl_ = {target_ = 0x55629b5cd330}}
(gdb) torch-tensor-repr *this
Python-level repr of *this:
tensor([1., 2., 3., 4.], dtype=torch.float64)
```

GDB tries to automatically load `pytorch-gdb` thanks to the
[.gdbinit](.gdbinit) at the root of the pytorch repo.
However, auto-loading is disabled by default for security reasons:

```bash
$ gdb
warning: File "/path/to/pytorch/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
    add-auto-load-safe-path /path/to/pytorch/.gdbinit
line to your configuration file "/home/YOUR-USERNAME/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/YOUR-USERNAME/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
(gdb)
```

As gdb itself suggests, the best way to enable auto-loading of `pytorch-gdb`
is to add the following line to your `~/.gdbinit` (i.e., the `.gdbinit` file
which is in your home directory, **not** `/path/to/pytorch/.gdbinit`):

```bash
add-auto-load-safe-path /path/to/pytorch/.gdbinit
```

### C++ stacktraces
Set `TORCH_SHOW_CPP_STACKTRACES=1` to get the C++ stacktrace when an error occurs in Python.

## CUDA development tips

If you are working on the CUDA code, here are some useful CUDA debugging tips:

1. `CUDA_DEVICE_DEBUG=1` will enable CUDA device function debug symbols (`-g -G`).
   This will be particularly helpful in debugging device code. However, it will
   slow down the build process by about 50% (compared to only `DEBUG=1`), so use it wisely.
2. `cuda-gdb` and `cuda-memcheck` are your best CUDA debugging friends. Unlike `gdb`,
   `cuda-gdb` can display actual values in a CUDA tensor (rather than all zeros).
3. CUDA supports a lot of C++11/14 features such as `std::numeric_limits`, `std::nextafter`,
   and `std::tuple` in device code.
   Many such features are possible because of the
   [--expt-relaxed-constexpr](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#constexpr-functions)
   nvcc flag. There is a known [issue](https://github.com/ROCm-Developer-Tools/HIP/issues/374)
   that ROCm errors out on device code that uses such STL functions.
4. A good performance metric for a CUDA kernel is the
   [Effective Memory Bandwidth](https://devblogs.nvidia.com/how-implement-performance-metrics-cuda-cc/).
   It is useful for you to measure this metric whenever you are writing/optimizing a CUDA
   kernel. The following script shows how we can measure the effective bandwidth of the CUDA `uniform_`
   kernel:
   ```python
   import torch
   from torch.utils.benchmark import Timer
   size = 128*512
   nrep = 100
   nbytes_read_write = 4  # number of bytes read + written by the kernel; change this to fit your kernel

   for i in range(10):
       a = torch.empty(size).cuda().uniform_()
       torch.cuda.synchronize()
       out = a.uniform_()
       torch.cuda.synchronize()
       t = Timer(stmt="a.uniform_()", globals=globals())
       res = t.blocked_autorange()
       timec = res.median
       print("uniform, size, elements", size, "forward", timec, "bandwidth (GB/s)", size*(nbytes_read_write)*1e-9/timec)
       size *= 2
   ```

   See more CUDA development tips [here](https://github.com/pytorch/pytorch/wiki/CUDA-basics).

## Windows development tips

For building from source on Windows, consult
[our documentation](https://pytorch.org/docs/stable/notes/windows.html) on it.

Occasionally, you will write a patch which works on Linux, but fails CI on Windows.
There are a few aspects in which MSVC (the Windows compiler toolchain we use) is stricter
than Linux, which are worth keeping in mind when fixing these problems.

1.
Symbols are NOT exported by default on Windows; instead, you have to explicitly
   mark a symbol as exported/imported in a header file with `__declspec(dllexport)` /
   `__declspec(dllimport)`. We have codified this pattern into a set of macros
   which follow the convention `*_API`, e.g., `TORCH_API` inside Caffe2, ATen and Torch.
   (Every separate shared library needs a unique macro name, because symbol visibility
   is on a per shared library basis. See c10/macros/Macros.h for more details.)

   The upshot is that if you see an "unresolved external" error in your Windows build, this
   is probably because you forgot to mark a function with `*_API`. However, there is
   one important counterexample to this principle: if you want a *templated* function
   to be instantiated at the call site, do NOT mark it with `*_API` (if you do mark it,
   you'll have to explicitly instantiate all of the specializations used by the call
   sites.)

2. If you link against a library, this does not make its dependencies transitively
   visible. You must explicitly specify a link dependency against every library whose
   symbols you use. (This is different from Linux, where in most environments
   transitive dependencies can be used to fulfill unresolved symbols.)

3. If you have a Windows box (we have a few on EC2 which you can request access to) and
   you want to run the build, the easiest way is to just run `.ci/pytorch/win-build.sh`.
   If you need to rebuild, run `REBUILD=1 .ci/pytorch/win-build.sh` (this will avoid
   blowing away your Conda environment.)

Even if you don't know anything about MSVC, you can use cmake to build simple programs on
Windows; this can be helpful if you want to learn more about some peculiar linking behavior
by reproducing it on a small example.
Here's a simple example cmake file that defines
two dynamic libraries, one linking with the other:

```CMake
project(myproject CXX)
set(CMAKE_CXX_STANDARD 14)
add_library(foo SHARED foo.cpp)
add_library(bar SHARED bar.cpp)
# NB: don't forget to __declspec(dllexport) at least one symbol from foo,
# otherwise foo.lib will not be created.
target_link_libraries(bar PUBLIC foo)
```

You can build it with:

```bash
mkdir build
cd build
cmake ..
cmake --build .
```

### Known MSVC (and MSVC with NVCC) bugs

The PyTorch codebase sometimes likes to use exciting C++ features, and
these exciting features lead to exciting bugs in Windows compilers.
To add insult to injury, the error messages will often not tell you
which line of code actually induced the erroring template instantiation.

We've found the most effective way to debug these problems is to
carefully read over diffs, keeping in mind known bugs in MSVC/NVCC.
Here are a few well known pitfalls and workarounds:

* This is not actually a bug per se, but in general, code generated by MSVC
  is more sensitive to memory errors; you may have written some code
  that does a use-after-free or stack overflows; on Linux the code
  might work, but on Windows your program will crash. ASAN may not
  catch all of these problems: stay vigilant to the possibility that
  your crash is due to a real memory problem.

* (NVCC) `c10::optional` does not work when used from device code. Don't use
  it from kernels. Upstream issue: https://github.com/akrzemi1/Optional/issues/58
  and our local issue #10329.

* `constexpr` generally works less well on MSVC.

  * The idiom `static_assert(f() == f())` to test if `f` is constexpr
    does not work; you'll get "error C2131: expression did not evaluate
    to a constant". Don't use these asserts on Windows.
    (Example: `c10/util/intrusive_ptr.h`)

* (NVCC) Code you access inside a `static_assert` will eagerly be
  evaluated as if it were device code, and so you might get an error
  that the code is "not accessible".

```cpp
class A {
  static A singleton_;
  static constexpr inline A* singleton() {
    return &singleton_;
  }
};
static_assert(std::is_same<A*, decltype(A::singleton())>::value, "hmm");
```

* The compiler will run out of heap space if you attempt to compile files that
  are too large. Splitting such files into separate files helps.
  (Example: `THTensorMath`, `THTensorMoreMath`, `THTensorEvenMoreMath`.)

* MSVC's preprocessor (but not the standard compiler) has a bug
  where it incorrectly tokenizes raw string literals, ending when it sees a `"`.
  This causes preprocessor tokens inside the literal like an `#endif` to be incorrectly
  treated as preprocessor directives. See https://godbolt.org/z/eVTIJq as an example.

* Either MSVC or the Windows headers have a PURE macro defined and will replace
  any occurrences of the PURE token in code with an empty string. This is why
  we have AliasAnalysisKind::PURE_FUNCTION and not AliasAnalysisKind::PURE.
  The same is likely true for other identifiers that we just didn't try to use yet.
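Given the PURE pitfall above, it can be worth scanning new code for identifiers that Windows headers are known to claim as macros (`min`/`max` without `NOMINMAX`, `interface`, `PURE` are classic offenders). A hypothetical helper sketching such a check (`find_collisions` and the macro list are illustrative, not part of PyTorch's tooling):

```python
import re

# Hypothetical, partial list of identifiers that Windows headers are
# known to define as macros (min/max unless NOMINMAX is defined,
# PURE and interface from the COM headers).
WINDOWS_MACRO_NAMES = {"PURE", "interface", "min", "max"}

def find_collisions(source: str):
    """Return identifiers in `source` that collide with known Windows macros."""
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))
    return identifiers & WINDOWS_MACRO_NAMES

print(find_collisions("enum class AliasAnalysisKind { PURE, CONSERVATIVE };"))  # {'PURE'}
```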

### Building on legacy code and CUDA

CUDA, MSVC, and PyTorch versions are interdependent; please install matching versions from this table:

| CUDA version | Newest supported VS version                   | PyTorch version |
| ------------ | --------------------------------------------- | --------------- |
| 10.1         | Visual Studio 2019 (16.X) (`_MSC_VER` < 1930) | 1.3.0 ~ 1.7.0   |
| 10.2         | Visual Studio 2019 (16.X) (`_MSC_VER` < 1930) | 1.5.0 ~ 1.7.0   |
| 11.0         | Visual Studio 2019 (16.X) (`_MSC_VER` < 1930) | 1.7.0           |

Note: There's a [compilation issue](https://github.com/oneapi-src/oneDNN/issues/812) in several Visual Studio 2019 versions since 16.7.1, so please make sure your Visual Studio 2019 version is not in the 16.7.1 ~ 16.7.5 range.

## Running clang-tidy

[Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/index.html) is a C++
linter and static analysis tool based on the clang compiler. We run clang-tidy
in our CI to make sure that new C++ code is safe, sane and efficient. See the
[`clang-tidy` job in our GitHub Workflow's
lint.yml file](https://github.com/pytorch/pytorch/blob/main/.github/workflows/lint.yml)
for the simple commands we use for this.

To run clang-tidy locally, follow these steps:

1. Install clang-tidy.
We provide custom built binaries which have additional checks enabled. You can install them by running:
```bash
python3 -m tools.linter.clang_tidy.generate_build_files
```
We currently only support Linux and macOS (x86).

2. Install the clang-tidy driver script dependencies:
```bash
pip3 install -r tools/linter/clang_tidy/requirements.txt
```

3. Run clang-tidy:
```bash
# Run clang-tidy on the entire codebase
make clang-tidy
# Run clang-tidy only on your changes
make clang-tidy CHANGED_ONLY=--changed-only
```
This internally invokes our driver script and closely mimics how clang-tidy is run on CI.

## Pre-commit tidy/linting hook

We use clang-tidy to perform additional
formatting and semantic checking of code. We provide a pre-commit git hook for
performing these checks, before a commit is created:

```bash
ln -s ../../tools/git-pre-commit .git/hooks/pre-commit
```

If you have already committed files and
CI reports `flake8` errors, you can run the check locally in your PR branch with:

```bash
flake8 $(git diff --name-only $(git merge-base --fork-point main))
```

You'll need to install an appropriately configured flake8; see
[Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type)
for documentation on how to do this.

Fix the code so that no errors are reported when you re-run the check,
and then commit the fix.

## Building PyTorch with ASAN

[ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer) is very
useful for debugging memory errors in C++. We run it in CI, but here's how to
get the same thing to run on your local machine.

First, install LLVM 8. The easiest way is to get [prebuilt
binaries](http://releases.llvm.org/download.html#8.0.0) and extract them to a
folder (later called `$LLVM_ROOT`).

Then set up the appropriate scripts.
You can put this in your `.bashrc`:

```bash
LLVM_ROOT=<wherever your llvm install is>
PYTORCH_ROOT=<wherever your pytorch checkout is>

LIBASAN_RT="$LLVM_ROOT/lib/clang/8.0.0/lib/linux/libclang_rt.asan-x86_64.so"
build_with_asan()
{
  LD_PRELOAD=${LIBASAN_RT} \
  CC="$LLVM_ROOT/bin/clang" \
  CXX="$LLVM_ROOT/bin/clang++" \
  LDSHARED="clang --shared" \
  LDFLAGS="-stdlib=libstdc++" \
  CFLAGS="-fsanitize=address -fno-sanitize-recover=all -shared-libasan -pthread" \
  CXX_FLAGS="-pthread" \
  USE_CUDA=0 USE_OPENMP=0 USE_DISTRIBUTED=0 DEBUG=1 \
  python setup.py develop
}

run_with_asan()
{
  LD_PRELOAD=${LIBASAN_RT} $@
}

# you can look at build-asan.sh to find the latest options the CI uses
export ASAN_OPTIONS=detect_leaks=0:symbolize=1:strict_init_order=true
export UBSAN_OPTIONS=print_stacktrace=1:suppressions=$PYTORCH_ROOT/ubsan.supp
export ASAN_SYMBOLIZER_PATH=$LLVM_ROOT/bin/llvm-symbolizer
```

Then you can use the scripts like:

```
suo-devfair ~/pytorch ❯ build_with_asan
suo-devfair ~/pytorch ❯ run_with_asan python test/test_jit.py
```

### Getting `ccache` to work

The scripts above specify the `clang` and `clang++` binaries directly, which
bypasses `ccache`. Here's how to get `ccache` to work:

1. Make sure the ccache symlinks for `clang` and `clang++` are set up (see
   CONTRIBUTING.md)
2. Make sure `$LLVM_ROOT/bin` is available on your `$PATH`.
3. Change the `CC` and `CXX` variables in `build_with_asan()` to point
   directly to `clang` and `clang++`.

### Why this stuff with `LD_PRELOAD` and `LIBASAN_RT`?

The “standard” workflow for ASAN assumes you have a standalone binary:

1. Recompile your binary with `-fsanitize=address`.
2. Run the binary, and ASAN will report whatever errors it finds.

Unfortunately, PyTorch is distributed as a shared library that is loaded by
a third-party executable (Python). It’s too much of a hassle to recompile all
of Python every time we want to use ASAN. Luckily, the ASAN folks have a
workaround for cases like this:

1. Recompile your library with `-fsanitize=address -shared-libasan`. The
   extra `-shared-libasan` tells the compiler to ask for the shared ASAN
   runtime library.
2. Use `LD_PRELOAD` to tell the dynamic linker to load the ASAN runtime
   library before anything else.

More information can be found
[here](https://github.com/google/sanitizers/wiki/AddressSanitizerAsDso).

### Why LD_PRELOAD in the build function?

We need `LD_PRELOAD` because there is a cmake check that ensures that a
simple program builds and runs. If we are building with ASAN as a shared
library, we need to `LD_PRELOAD` the runtime library, otherwise there will be
dynamic linker errors and the check will fail.

We don’t actually need either of these if we fix the cmake checks.

### Why no leak detection?

Python leaks a lot of memory. Possibly we could configure a suppression file,
but we haven’t gotten around to it.

## Caffe2 notes

In 2018, we merged Caffe2 into the PyTorch source repository. While the
steady-state aspiration is that Caffe2 and PyTorch share code freely,
in the meantime there will be some separation.

There are a few "unusual" directories which, for historical reasons,
are Caffe2/PyTorch specific. Here they are:

- `CMakeLists.txt`, `Makefile`, `binaries`, `cmake`, `conda`, `modules`,
  `scripts` are Caffe2-specific. Don't put PyTorch code in them without
  extra coordination.

- `mypy*`, `requirements.txt`, `setup.py`, `test`, `tools` are
  PyTorch-specific. Don't put Caffe2 code in them without extra
  coordination.

## CI failure tips

Once you submit a PR or push a new commit to a branch that is in
an active PR, CI jobs will be run automatically. Some of these may
fail and you will need to find out why by looking at the logs.

Fairly often, a CI failure might be unrelated to your changes. You can
confirm by going to our [HUD](https://hud.pytorch.org) and seeing if the CI job
is failing upstream already. In this case, you
can usually ignore the failure. See [the following
subsection](#which-commit-is-used-in-ci) for more details.

Some failures might be related to specific hardware or environment
configurations. In this case, if you're a Meta employee, you can ssh into
the job's session to perform manual debugging following the instructions in
our [CI wiki](https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions).

### Which commit is used in CI?

For CI runs on `main`, this repository is checked out at a given `main`
commit, and CI is run on that commit (there isn't really any other choice).

For PRs, however, it's a bit more complicated. Consider this commit graph, where
`main` is at commit `A`, and the branch for PR #42 (just a placeholder) is at
commit `B`:

```
       o---o---B (refs/pull/42/head)
      /         \
     /           C (refs/pull/42/merge)
    /           /
---o---o---o---A (merge-destination) - usually main
```

There are two possible choices for which commit to use:

1. Checkout commit `B`, the head of the PR (manually committed by the PR
   author).
2. Checkout commit `C`, the hypothetical result of what would happen if the PR
   were merged into its destination (usually `main`).

For all practical purposes, most people can think of the commit being used as
commit `B` (choice **1**).

However, if workflow files (which govern CI behavior) were modified, either by your PR or since your dev branch was created, there's
a nuance to know about:
the workflow files themselves get taken from commit `C`, the merge of your
PR and the `main` branch. But only the workflow files get taken from that merged
commit. Everything else (tests, code, etc.) gets taken directly from your
PR's commit (commit `B`). Please note that this scenario never affects PRs authored with `ghstack`, as they do not automatically ingest updates from the default branch.

## Dev Infra Office Hours

[Dev Infra Office Hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours) are hosted every Friday to answer any questions regarding developer experience, Green HUD, and CI.