# Summary
These are a collection of scripts for accessing lists of commits between releases. There are other scripts for automatically generating labels for commits.

The release_notes Runbook and other supporting docs can be found here: [Release Notes Supporting Docs](https://drive.google.com/drive/folders/1J0Uwz8oE7TrdcP95zc-id1gdSBPnMKOR?usp=sharing)

An example of generated docs for submodule owners: [2.0 release notes submodule docs](https://drive.google.com/drive/folders/1zQtmF_ak7BkpGEM58YgJfnpNXTnFl25q?usp=share_link)

### Authentication:
First run the `test_release_notes.py` script to make sure you have the correct authentication set up. This script tries to access the GitHub API and fails if you are not authenticated.

- If you have enabled ghstack, authentication should already be set up correctly.
- Otherwise, go to `https://github.com/settings/tokens` and create a token. You can then either follow the steps to set up ghstack or set the environment variable `GITHUB_TOKEN`.


## Steps:

### Part 1: getting a list of commits

You are going to get a list of commits since the last release in CSV format. Assuming `tags/v1.13.1` is the last released version, run the following from this directory:

`python commitlist.py --create_new tags/v1.13.1 <commit_hash>`

This saves a commit list to `results/commitlist.csv`. As a sanity check, visually confirm that the oldest commits weren’t included in the branch cut for the last release.

NB: the commit list contains commits from the merge-base of `tags/<most_recent_release_tag>` and whatever commit hash you give it, so it may contain commits that were cherry-picked to `<most_recent_release_tag>`!

* Go through the list of commits cherry-picked to the last release and delete them from `results/commitlist.csv`.
* This is done manually:
    * Find all the PRs that were merged into the release branch with a GitHub query like `https://github.com/pytorch/pytorch/pulls?q=is%3Apr+base%3Arelease%2F<most_recent_release_tag>+is%3Amerged`.
    * Look at the commit history at `https://github.com/pytorch/pytorch/commits/release/<most_recent_release_tag>` to find all the direct pushes to the release branch (usually reverts).


If you already have a commit list and want to update it, use the following command. It can be helpful if there are cherry-picks to the release branch or if you’re categorizing commits throughout the three months leading up to a release. Warning: this is not very well tested. Make sure that you’re on the same branch (e.g., `release/<upcoming_release_tag>`) as the last time you ran this command, and always *commit* your CSV before running it to avoid losing work.

`python commitlist.py --update_to <commit_hash>`

### Part 2: categorizing commits

#### Exploration and cleanup

This folder contains an IPython notebook that I used for exploration and for finding relevant commits. For example, the commit list attempts to categorize commits based on the `release notes:` label. New `release notes:` labels are added often, and the notebook has a cell that can help you identify them.

There is a list of all known categories defined in `common.py`. It also has designations for types of categories, such as `_frontend`.

The `categorize` function in `commitlist.py` does an adequate job of assigning the appropriate categories. However, since new categories may be created for your release, you may find it helpful to add new heuristics based on the files changed.

If you update the automatic categorization, run the following to update the commit list:

`python commitlist.py --rerun_with_new_filters`

Note that this only updates commits in the commit list whose category is "Uncategorized".
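The files-changed heuristics mentioned above could look something like the following minimal sketch. `guess_category_from_files` and its prefix table are hypothetical and are not the actual `categorize` function in `commitlist.py`; the sketch assumes you have a commit’s changed files as a list of repository-relative paths, and it returns "Uncategorized" whenever it is unsure so that those commits are left for manual review.

```python
from typing import Iterable

# Hypothetical prefix -> category table (categories taken from the list in this README).
# Order matters: the first matching prefix wins.
PATH_PREFIX_TO_CATEGORY = [
    ("torch/csrc/jit/", "jit"),
    ("torch/jit/", "jit"),
    ("torch/ao/quantization/", "quantization"),
    ("torch/onnx/", "onnx"),
    ("torch/fx/", "fx"),
    ("torch/distributed/", "distributed"),
    ("torch/package/", "package"),
    ("caffe2/", "caffe2"),
]


def guess_category_from_files(files_changed: Iterable[str]) -> str:
    """Return a category only if every changed file maps to the same one."""
    categories = set()
    for path in files_changed:
        match = next(
            (cat for prefix, cat in PATH_PREFIX_TO_CATEGORY if path.startswith(prefix)),
            None,
        )
        if match is None:
            # At least one file is outside every known prefix: leave it for a human.
            return "Uncategorized"
        categories.add(match)
    # Only auto-assign when the commit touches exactly one category.
    return categories.pop() if len(categories) == 1 else "Uncategorized"


if __name__ == "__main__":
    print(guess_category_from_files(["torch/fx/graph.py", "torch/fx/node.py"]))
    print(guess_category_from_files(["torch/fx/graph.py", "torch/onnx/utils.py"]))
```

Requiring all files to agree is deliberately conservative: a commit touching both `torch/fx/` and `torch/onnx/` is ambiguous and is better categorized by hand.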
Once you have dug through the commits and done as much automated categorization as you can, use the following interface to categorize any remaining commits.

#### Training a commit classifier
I added scripts to train a commit classifier from the set of labeled commits in `commitlist.csv`. The classifier uses the title, author, and files-changed features of the commits. The scripts require torchtext and tqdm. I had to install torchtext from source, but if you are also a PyTorch developer it is likely already installed.

- A `results/` directory should already exist from generating `commitlist.csv`. Next, create the classifier directory with `mkdir results/classifier`.
- Run `python classifier.py --train`. This trains the model and saves it for inference.
- Run `python categorize.py --use_classifier`. This pre-populates the prompt with the most likely category; pressing Enter confirms the selection.
    - Alternatively, run `python categorize.py` to label without the classifier.

The interface modifies `results/commitlist.csv`. If you want to take a coffee break, you can CTRL-C out of it (`results/commitlist.csv` is written on each categorization) and then commit and push `results/commitlist.csv` to a branch for safekeeping.

If you want to revert a change you just made, you can edit `results/commitlist.csv` directly.

For each commit, after choosing the category, you can also choose a topic. For the frontend category, take the time to do this, as it saves time in the next step. For other categories, only set a topic if you are 100% sure, since an incorrect topic is confusing for submodule owners.
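Before resuming after a break, it can help to see how many commits are still left to categorize. This is a minimal sketch using only the standard library; it assumes `results/commitlist.csv` has a `category` column and that unlabeled commits carry the category "Uncategorized" (check the CSV header if the column name differs).

```python
import csv
from collections import Counter
from typing import Iterable, List, Mapping


def category_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count commits per category; assumes each row has a "category" field."""
    return Counter(row["category"] for row in rows)


def load_rows(path: str = "results/commitlist.csv") -> List[dict]:
    """Read the commit list CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    counts = category_counts(load_rows())
    total = sum(counts.values())
    # Counter returns 0 for missing keys, so this works even when everything is labeled.
    print(f"{counts['Uncategorized']} of {total} commits left to categorize")
    for category, n in counts.most_common():
        print(f"{category}: {n}")
```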
The categories are as follows (be sure to update this list if you add a new category to `common.py`):

* jit: Everything related to the JIT (including tensorexpr)
* quantization: Everything related to the quantization mode/passes/operators
* mobile: Everything related to the mobile build/ops/features
* onnx: Everything related to ONNX
* caffe2: Everything that happens in the caffe2 folder. No need to add any topics here, as these are ignored (they don’t make it into the final release notes)
* distributed: Everything related to distributed training and RPC
* visualization: Everything related to TensorBoard and visualization in general
* releng: Everything related to release engineering (CircleCI, Docker images, etc.)
* amd: Everything related to ROCm and AMD GPUs
* cuda: Everything related to the CUDA backend
* benchmark: Everything related to the opbench folder and the utils.benchmark submodule
* package: Everything related to torch.package
* performance as a product: All changes that improve performance
* profiler: Everything related to the profiler
* composability: Everything related to the dispatcher and ATen native bindings
* fx: Everything related to torch.fx
* code_coverage: Everything related to the code coverage tool
* vulkan: Everything related to Vulkan support (mobile GPU backend)
* skip: Everything that is not end-user or dev facing, like code refactoring or internal implementation changes
* frontend: To ease your future work, we split things here (they may be merged in the final document)
    * python_api
    * cpp_api
    * complex
    * vmap
    * autograd
    * build
    * memory_format
    * foreach
    * dataloader
    * nestedtensor
    * sparse
    * mps


The topics are as follows:

* bc_breaking: All commits marked as BC-breaking (the script should highlight them). If any other commit looks like it could be BC-breaking, add it here as well!
* deprecation: All commits introducing a deprecation. This should be clear from the commit message.
* new_features: All commits introducing a new feature (new functions, new submodule, new supported platform, etc.)
* improvements: All commits providing improvements to existing features (new backend for a function, new argument, better numerical stability)
* bug fixes: All commits that fix bugs and behaviors that do not match the documentation
* performance: All commits that are here mainly for performance (we separate this from improvements above to make it easier for users to look for it)
* documentation: All commits that add/update documentation
* devs: All commits that are not end-user facing but still impact people who compile from source, develop PyTorch, extend PyTorch, write C++ extensions, etc.
* unknown


### Part 3: export categories to markdown

`python commitlist.py --export_markdown`

This exports `results/commitlist.csv` to markdown by listing every commit under its respective category. It creates one file per category in the `results/export/` folder.

This part is a little tedious, but it seems to work. It may be worth exploring pandoc to convert the markdown to Google Doc format.

1. Make sure you are using the light theme in VS Code.
2. Open a preview of the markdown file and copy the preview.
3. Paste the preview into the corresponding Google Doc, making sure to paste WITH formatting.
4. You can now send these Google Docs to the relevant submodule owners for review.
5. Install the Google Docs extension [Docs to Markdown](https://github.com/evbacher/gd2md-html).
6. Start compiling these markdown files back into a single markdown file.

`TODO`: This is by far the most manual process and is ripe for automation. If the next person would like to investigate the Google Docs APIs, there is room for improvement here.
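Step 6 can be partly scripted if you start from the exported files rather than from the Google Docs. The sketch below is a hypothetical helper (not part of `commitlist.py`) that concatenates every `.md` file in `results/export/` in alphabetical order into one file, marking each section with an HTML comment that names its source file; the output path `results/combined.md` is an assumption. Edits that submodule owners made in Google Docs still have to be brought back by hand or via Docs to Markdown.

```python
from pathlib import Path
from typing import Iterable, Tuple


def combine_sections(sections: Iterable[Tuple[str, str]]) -> str:
    """Join (source name, markdown text) pairs into one document, skipping empty ones."""
    parts = []
    for name, text in sections:
        body = text.strip()
        if body:
            # An HTML comment is invisible when rendered but records where the text came from.
            parts.append(f"<!-- {name} -->\n{body}")
    return "\n\n".join(parts) + "\n"


def combine_export_dir(export_dir: str, output_path: str) -> int:
    """Combine every *.md file in export_dir (sorted by name); returns the file count."""
    files = sorted(Path(export_dir).glob("*.md"))
    combined = combine_sections((f.name, f.read_text()) for f in files)
    Path(output_path).write_text(combined)
    return len(files)


if __name__ == "__main__":
    # Write outside results/export/ so a rerun does not pick up its own output.
    count = combine_export_dir("results/export", "results/combined.md")
    print(f"Combined {count} files into results/combined.md")
```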
### Part 4: Cherry Picks

You will likely have started this process before the branch cut is finalized, which means you will need to handle cherry-picks. This was my process for keeping track: I use a notes app to log my progress as I periodically incorporate new cherry-picks.

I initially ran something like:
```bash
python commitlist.py --create_new tags/v1.13.1 <commit-hash>
```
I keep track of that commit hash. Once there are cherry-picks that I want to incorporate, I rebase the release branch onto upstream and run:
```bash
python commitlist.py --update_to <latest-cherry-pick-hash>
```
I then run the following to export only the newly added commits:
```python
import os

import pandas as pd

from commitlist import CommitList, to_markdown

# Hash of the last commit that was already in the list before running --update_to
last_known_good_hash = "<the most recent hash>"
path = "<your_path>/cherry_picks.csv"

commit_list_df = pd.read_csv("results/commitlist.csv")

# read_csv gives a default RangeIndex, so the index label equals the row position
previous_index = commit_list_df[commit_list_df.commit_hash == last_known_good_hash].index.values[0]
# Every row after the last known commit was added by --update_to (the cherry-picks)
cherry_pick_df = commit_list_df.iloc[previous_index + 1:]
cherry_pick_df.to_csv(path, index=False)

# Reload the subset as a CommitList and export one markdown file per category
cherry_pick_commit_list = CommitList.from_existing(path)
categories = list(cherry_pick_commit_list.stat().keys())
for category in categories:
    print(f"Exporting {category}...")
    lines = to_markdown(cherry_pick_commit_list, category)
    filename = f"/tmp/cherry_pick/results/result_{category}.md"
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    with open(filename, "w") as f:
        f.writelines(lines)
```

This creates new markdown files containing only the cherry-picked commits. I then manually copied and pasted these into the submodule Google Docs and added comments so that the submodule owners would see the new commits.


### Part 5: Pulling the submodules into one
I pretty much followed the runbook here. One thing I did was use the [markdown-all-in-one](https://marketplace.visualstudio.com/items?itemName=yzhang.markdown-all-in-one) extension to create a table of contents, which was really helpful for jumping between sections and copying and pasting the appropriate commits.

You will then create a release at [PyTorch Release](https://github.com/pytorch/pytorch/releases). If you save it as a draft, you can see how it will be rendered.



#### Tidbits
Your release notes will probably not fit within GitHub’s character limit. I used the following regex to replace the full links with `(#<pull-request-number>)`:

`\[#(\d+)\]\(https://github.com/pytorch/pytorch/pull/\d+\)`

The shortened references are formatted correctly in the GitHub UI, which you can check when creating a draft release.


The following markdown is helpful for creating side-by-side tables of BC-breaking/deprecated code:


````markdown
<table>
<tr>
<th>PRIOR RELEASE NUM</th>
<th>NEW RELEASE NUM</th>
</tr>
<tr>
<td>

```Python
# Code Snippet 1
```

</td>
<td>

```Python
# Code Snippet 2
```

</td>
</tr>
</table>
````
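The link-shortening regex from the tidbits can also be applied with Python’s `re` module before pasting into GitHub. This sketch assumes each exported entry already wraps its PR link in parentheses, e.g. `([#12345](https://github.com/pytorch/pytorch/pull/12345))`, so replacing the link with `#12345` leaves `(#12345)`; the function name `shorten_pr_links` is hypothetical.

```python
import re

# The regex from the tidbits above; group 1 captures the PR number.
PR_LINK = re.compile(r"\[#(\d+)\]\(https://github.com/pytorch/pytorch/pull/\d+\)")


def shorten_pr_links(markdown: str) -> str:
    """Replace [#N](https://github.com/pytorch/pytorch/pull/N) with #N."""
    return PR_LINK.sub(r"#\1", markdown)


if __name__ == "__main__":
    line = "- Add foo ([#12345](https://github.com/pytorch/pytorch/pull/12345))"
    print(shorten_pr_links(line))  # - Add foo (#12345)
```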