# Summary
These are a collection of scripts for accessing lists of commits between releases. There are other scripts for automatically generating labels for commits.

The release_notes Runbook and other supporting docs can be found here: [Release Notes Supporting Docs](https://drive.google.com/drive/folders/1J0Uwz8oE7TrdcP95zc-id1gdSBPnMKOR?usp=sharing)

An example of generated docs for submodule owners: [2.0 release notes submodule docs](https://drive.google.com/drive/folders/1zQtmF_ak7BkpGEM58YgJfnpNXTnFl25q?usp=share_link)

### Authentication:
First run the `test_release_notes.py` script to make sure you have the correct authentication set up. This script will try to access the GitHub API and will fail if you are not authenticated.

- If you have enabled ghstack then authentication should be set up correctly.
- Otherwise go to `https://github.com/settings/tokens` and create a token. You can either follow the steps to set up ghstack or set the env variable `GITHUB_TOKEN`.

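If you go the `GITHUB_TOKEN` route, a quick local check can catch a missing variable before any script hits the API. The following is only a minimal sketch: the helper name `github_auth_headers` is made up for illustration, it is not what `test_release_notes.py` does, and it does not cover the ghstack path.

```python
import os


def github_auth_headers(env=os.environ):
    """Build request headers from GITHUB_TOKEN, failing early if it is unset."""
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set; create a token at "
            "https://github.com/settings/tokens and export it first."
        )
    # GitHub's REST API accepts a personal access token as a Bearer credential.
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

Calling `github_auth_headers()` in a fresh shell raises immediately when the variable is missing, which is easier to diagnose than a failed API request.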

## Steps:

### Part 1: getting a list of commits

You are going to get a list of commits since the last release in csv format. Assuming `tags/v1.13.1` is the last released version, run the following from this directory:

`python commitlist.py --create_new tags/v1.13.1 <commit_hash>`

This saves a commit list to `results/commitlist.csv`. As a sanity check, please confirm visually that the oldest commits weren’t included in the branch cut for the last release.

NB: the commit list contains commits from the merge-base of `tags/<most_recent_release_tag>` and whatever commit hash you give it, so it may have commits that were cherry-picked to `<most_recent_release_tag>`!

* Go through the list of cherry-picked commits to the last release and delete them from `results/commitlist.csv`.
* This is done manually:
    * Look for all the PRs that were merged in the release branch with a GitHub query like: https://github.com/pytorch/pytorch/pulls?q=is%3Apr+base%3Arelease%2F<most_recent_release_tag>+is%3Amerged
    * Look at the commit history at https://github.com/pytorch/pytorch/commits/release/<most_recent_release_tag> to find all the direct pushes to the release branch (usually for reverts)

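Once you have collected the hashes to remove, deleting the rows does not have to be done by hand in an editor. Below is a rough sketch using only the standard library. It assumes the csv has a `commit_hash` column (the pandas snippet in Part 4 relies on the same column) and that the hashes you pass in are written exactly as they appear in the csv. The function name `drop_commits` and the sample rows are made up.

```python
import csv
import io


def drop_commits(csv_text, hashes_to_drop):
    """Return csv_text without the rows whose commit_hash is in hashes_to_drop."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [row for row in reader if row["commit_hash"] not in hashes_to_drop]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()


# Tiny made-up commit list; only the commit_hash column matters here.
example = "commit_hash,title\naaa111,Fix foo\nbbb222,Add bar\n"
print(drop_commits(example, {"aaa111"}))
```

Keep in mind that a cherry-pick on the release branch has a different hash than the original commit on main, so you need the hash of the original commit as listed in the csv.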

If you already have a commit list and want to update it, use the following command. This command can be helpful if there are cherry-picks to the release branch or if you’re categorizing commits throughout the three months up to a release. Warning: this is not very well tested. Make sure that you’re on the same branch (e.g., `release/<upcoming_release_tag>`) as the last time you ran this command, and that you always *commit* your csv before running this command to avoid losing work.

`python commitlist.py --update_to <commit_hash>`

### Part 2: categorizing commits

#### Exploration and cleanup

In this folder is an IPython notebook (`explore.ipynb`) that I used for exploration and finding relevant commits. For example, the commitlist attempts to categorize commits based on the `release notes:` label. Users of PyTorch often add new release notes labels. The notebook has a cell that can help you identify new labels.

There is a list of all known categories defined in `common.py`. It has designations for types of categories as well, such as `_frontend`.

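The idea behind spotting new labels is a set difference between the `release notes:` labels seen on commits and the known categories. Here is a minimal sketch of that idea. It is not the notebook's actual cell, and the category subset, the label strings, and the name `find_new_labels` are all invented for illustration.

```python
RELEASE_NOTES_PREFIX = "release notes: "

# Hypothetical subset of the known categories; the real list lives in common.py.
known_categories = {"jit", "onnx", "fx", "quantization"}


def find_new_labels(labels, known):
    """Return release-notes label names that are not known categories."""
    names = {
        label[len(RELEASE_NOTES_PREFIX):].strip()
        for label in labels
        if label.startswith(RELEASE_NOTES_PREFIX)
    }
    return sorted(names - set(known))


# Made-up labels; anything without the prefix is ignored.
labels_seen = ["release notes: jit", "release notes: my_new_area", "module: docs"]
print(find_new_labels(labels_seen, known_categories))
```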
The `categorize` function in commitlist.py does an adequate job of adding the appropriate categories. Since new categories may be created for your release, you may find it helpful to add new heuristics around files changed to help with categorization.

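A file-based heuristic can be as simple as mapping path prefixes to categories. The sketch below shows one conservative variant that only assigns a category when every changed file agrees. The rules and the function name `categorize_by_files` are hypothetical and are not taken from `commitlist.py`.

```python
# Hypothetical path-prefix rules; not the mapping used by commitlist.py.
PATH_RULES = [
    ("torch/onnx/", "onnx"),
    ("torch/fx/", "fx"),
    ("torch/csrc/jit/", "jit"),
]


def categorize_by_files(files_changed, rules=PATH_RULES):
    """Return a category only when every changed file matches the same rule."""
    matched = set()
    for path in files_changed:
        for prefix, category in rules:
            if path.startswith(prefix):
                matched.add(category)
                break
        else:
            # A file outside all rules makes the commit ambiguous.
            return "Uncategorized"
    return matched.pop() if len(matched) == 1 else "Uncategorized"
```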
If you update the automatic categorization you can run the following to update the commit list:

`python commitlist.py --rerun_with_new_filters`

Note that this will only update the commits in the commit list that have a category of "Uncategorized".

Once you have dug through the commits and done as much automated categorization as you can, you can run `categorize.py` (see the next section) for an interface to categorize any remaining commits.

#### Training a commit classifier
I added scripts to train a commit classifier from the set of labeled commits in commitlist.csv. This will utilize the title, author, and files changed features of the commits. The script requires torchtext and tqdm. I had to install torchtext from source, but if you are also a PyTorch developer it is likely already installed.

- There should already exist a `results/` directory from gathering the commitlist.csv. The next step is to create the classifier directory: `mkdir results/classifier`
- Run `python classifier.py --train`. This will train the model and save it for inference.
- Run `python categorize.py --use_classifier`. This will pre-populate the output with the most likely category, and pressing enter will confirm the selection.
    - Or run `python categorize.py` to label without the classifier.

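The "press enter to confirm" behavior amounts to treating empty input as acceptance of the suggested category. A small sketch of that logic follows. It is an illustration only, not the code in `categorize.py`, and the name `resolve_category` is made up.

```python
def resolve_category(user_input, suggestion, valid_categories):
    """Return the chosen category, or None if the input should be asked again."""
    choice = user_input.strip()
    if not choice:
        # Bare Enter accepts the pre-populated suggestion.
        return suggestion
    return choice if choice in valid_categories else None
```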
The interface modifies results/commitlist.csv. If you want to take a coffee break, you can CTRL-C out of it (results/commitlist.csv gets written to on each categorization) and then commit and push results/commitlist.csv to a branch for safekeeping.

If you want to revert a change you just made, you can edit results/commitlist.csv directly.

For each commit, after choosing the category, you can also choose a topic. For the frontend category, you should take the time to do it to save time in the next step. For other categories, you can do it, but only if you are 100% sure, as it is confusing for submodule owners otherwise.

The categories are as follows. Be sure to update this list if you add a new category to `common.py`.

* jit: Everything related to the jit (including tensorexpr)
* quantization: Everything related to the quantization mode/passes/operators
* mobile: Everything related to the mobile build/ops/features
* onnx: Everything related to onnx
* caffe2: Everything that happens in the caffe2 folder. No need to add any topics here as these are ignored (they don’t make it into the final release notes)
* distributed: Everything related to distributed training and rpc
* visualization: Everything related to tensorboard and visualization in general
* releng: Everything related to release engineering (CircleCI, docker images, etc)
* amd: Everything related to rocm and amd CPUs
* cuda: Everything related to cuda backend
* benchmark: Everything related to the opbench folder and utils.benchmark submodule
* package: Everything related to torch.package
* performance as a product: All changes that improve performance
* profiler: Everything related to the profiler
* composability: Everything related to the dispatcher and ATen native binding
* fx: Everything related to torch.fx
* code_coverage: Everything related to the code coverage tool
* vulkan: Everything related to vulkan support (mobile GPU backend)
* skip: Everything that is not end user or dev facing like code refactoring or internal implementation changes
* frontend: To ease your future work, we split things here (may be merged in the final document)
    * python_api
    * cpp_api
    * complex
    * vmap
    * autograd
    * build
    * memory_format
    * foreach
    * dataloader
    * nestedtensor
    * sparse
    * mps


The topics are as follows:

* bc_breaking: All commits marked as BC-breaking (the script should highlight them). If any other commit looks like it could be BC-breaking, add it here as well!
* deprecation: All commits introducing a deprecation. This should be clear from the commit message.
* new_features: All commits introducing a new feature (new functions, new submodule, new supported platform etc)
* improvements: All commits providing improvements to existing features should be here (new backend for a function, new argument, better numerical stability)
* bug fixes: All commits that fix bugs and behaviors that do not match the documentation
* performance: All commits that are here mainly for performance (we separate this from improvements above to make it easier for users to look for it)
* documentation: All commits that add/update documentation
* devs: All commits that are not end-user facing but still impact people who compile from source, develop PyTorch, extend PyTorch, write cpp extensions, etc
* unknown


### Part 3: export categories to markdown

`python commitlist.py --export_markdown`

The above exports results/commitlist.csv to markdown by listing every commit under its respective category.
It will create one file per category in the results/export/ folder.

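Conceptually, the export groups commits by category and writes one markdown file per group. The following sketch illustrates that shape with made-up data. It is not the implementation behind `--export_markdown`. The `category` and `title` keys are assumptions, the `result_<category>.md` naming is borrowed from the cherry-pick snippet in Part 4, and it writes to a temporary directory instead of `results/export/`.

```python
import os
import tempfile
from collections import defaultdict


def export_by_category(commits, out_dir):
    """Write one markdown file per category and return the paths written."""
    by_category = defaultdict(list)
    for commit in commits:
        by_category[commit["category"]].append(commit)

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for category, items in sorted(by_category.items()):
        path = os.path.join(out_dir, f"result_{category}.md")
        with open(path, "w") as f:
            f.write(f"# {category}\n\n")
            for item in items:
                f.write(f"- {item['title']}\n")
        paths.append(path)
    return paths


# Made-up commits, written to a temporary directory.
commits = [
    {"category": "jit", "title": "Fix foo"},
    {"category": "onnx", "title": "Add bar"},
    {"category": "jit", "title": "Speed up baz"},
]
with tempfile.TemporaryDirectory() as tmp:
    written = export_by_category(commits, tmp)
    print([os.path.basename(p) for p in written])
```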
Getting the exported markdown into Google Docs is a little tedious but it seems to work. You may want to explore using pandoc to convert the markdown to Google Doc format.

1. Make sure you are using the light theme of VSCode.
2. Open a preview of the markdown file and copy the Preview.
3. In the correct Google Doc, paste the preview and make sure to paste WITH formatting.
4. You can now send these Google Docs to the relevant submodule owners for review.
5. Install the Google Docs extension [docs to markdown](https://github.com/evbacher/gd2md-html)
6. Start compiling these markdown files back down into a single markdown file.

`TODO`: This is by far the most manual process and is ripe for automation. If the next person up would like to investigate the Google Docs APIs, there is some room for improvement here.

### Part 4: Cherry Picks

You will likely have started this process prior to the branch-cut being finalized. This means Cherry Picks.
This was my process for keeping track. I use a notes app to log my progress as I periodically incorporate the new cherry picks.
I will have initially run something like:
```Bash
python commitlist.py --create_new tags/v1.13.1 <commit-hash>
```
I keep track of that commit-hash. Once there are some cherry-picks that I would like to incorporate, I rebase the release branch to upstream
and run:
```Bash
python commitlist.py --update_to <latest-cherry-pick-hash>
```
I then run
```Python
import os

import pandas as pd

from commitlist import CommitList, to_markdown

commit_list_df = pd.read_csv("results/commitlist.csv")
last_known_good_hash = "<the most recent hash>"

# Every row after the last hash already processed is a new cherry pick.
previous_index = commit_list_df[commit_list_df.commit_hash == last_known_good_hash].index.values[0]
cherry_pick_df = commit_list_df.iloc[previous_index + 1:]
path = "<your_path>/cherry_picks.csv"
cherry_pick_df.to_csv(path, index=False)

cherry_pick_commit_list = CommitList.from_existing(path)

# Write one markdown file per category, containing only the cherry picks.
categories = list(cherry_pick_commit_list.stat().keys())
for category in categories:
    print(f"Exporting {category}...")
    lines = to_markdown(cherry_pick_commit_list, category)
    filename = f"/tmp/cherry_pick/results/result_{category}.md"
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    with open(filename, "w") as f:
        f.writelines(lines)
```

This will create new markdown files only from cherry-picked commits. I manually copied and pasted these into the submodule Google Docs and commented so that
the submodule owners would see these new commits.


### Part 5: Pulling all the submodules into one
I pretty much followed the run book here. One thing I did was use the [markdown-all-in-one](https://marketplace.visualstudio.com/items?itemName=yzhang.markdown-all-in-one)
extension to create a table of contents, which was really helpful for jumping to sections and copying and pasting the appropriate commits.

You will then create a release at [PyTorch Releases](https://github.com/pytorch/pytorch/releases). If you save it as a draft you can see how it will be rendered.



#### Tidbits
You will probably have release notes that don't fit into the character limit of GitHub. I used the following regex:
`\[#(\d+)\]\(https://github.com/pytorch/pytorch/pull/\d+\)` to replace the full links with `(#<pull-request-number>)`.
This will get formatted correctly in the GitHub UI and can be checked when creating a draft release.

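If you would rather script the replacement than run it in an editor, the same regex works with Python's `re.sub`. This is a sketch under one assumption: the exported lines already wrap each link in parentheses, so the replacement emits only `#<number>`. If your links are not wrapped, use `r"(#\1)"` as the replacement instead. The PR number and the sample line are made up.

```python
import re

# Same pattern as above; it captures the PR number from a full markdown link.
PR_LINK = re.compile(r"\[#(\d+)\]\(https://github.com/pytorch/pytorch/pull/\d+\)")


def shorten_pr_links(text):
    """Replace each full PR link with the bare #<number> reference."""
    return PR_LINK.sub(r"#\1", text)


# Made-up release note line.
line = "- Fix foo ([#12345](https://github.com/pytorch/pytorch/pull/12345))"
print(shorten_pr_links(line))
```

Run it over the whole release notes file and compare the character count before and after to see how much room it frees up.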

The following markdown code is helpful for creating side-by-side tables of BC-breaking/deprecated code:


````Markdown
<table>
<tr>
<th>PRIOR RELEASE NUM</th>
<th>NEW RELEASE NUM</th>
</tr>
<tr>
<td>

```Python
# Code Snippet 1
```

</td>
<td>

```Python
# Code Snippet 2
```

</td>
</tr>
</table>
````
