xref: /aosp_15_r20/external/zstd/CONTRIBUTING.md (revision 01826a4963a0d8a59bc3812d29bdf0fb76416722)
1*01826a49SYabin Cui# Contributing to Zstandard
2*01826a49SYabin CuiWe want to make contributing to this project as easy and transparent as
3*01826a49SYabin Cuipossible.
4*01826a49SYabin Cui
5*01826a49SYabin Cui## Our Development Process
6*01826a49SYabin CuiNew versions are being developed in the "dev" branch,
7*01826a49SYabin Cuior in their own feature branch.
8*01826a49SYabin CuiWhen they are deemed ready for a release, they are merged into "release".
9*01826a49SYabin Cui
10*01826a49SYabin CuiAs a consequence, all contributions must stage first through "dev"
11*01826a49SYabin Cuior their own feature branch.
12*01826a49SYabin Cui
13*01826a49SYabin Cui## Pull Requests
14*01826a49SYabin CuiWe actively welcome your pull requests.
15*01826a49SYabin Cui
16*01826a49SYabin Cui1. Fork the repo and create your branch from `dev`.
17*01826a49SYabin Cui2. If you've added code that should be tested, add tests.
18*01826a49SYabin Cui3. If you've changed APIs, update the documentation.
19*01826a49SYabin Cui4. Ensure the test suite passes.
20*01826a49SYabin Cui5. Make sure your code lints.
21*01826a49SYabin Cui6. If you haven't already, complete the Contributor License Agreement ("CLA").
22*01826a49SYabin Cui
23*01826a49SYabin Cui## Contributor License Agreement ("CLA")
24*01826a49SYabin CuiIn order to accept your pull request, we need you to submit a CLA. You only need
25*01826a49SYabin Cuito do this once to work on any of Facebook's open source projects.
26*01826a49SYabin Cui
27*01826a49SYabin CuiComplete your CLA here: <https://code.facebook.com/cla>
28*01826a49SYabin Cui
29*01826a49SYabin Cui## Workflow
30*01826a49SYabin CuiZstd uses a branch-based workflow for making changes to the codebase. Typically, zstd
31*01826a49SYabin Cuiwill use a new branch per sizable topic. For smaller changes, it is okay to lump multiple
32*01826a49SYabin Cuirelated changes into a branch.
33*01826a49SYabin Cui
34*01826a49SYabin CuiOur contribution process works in three main stages:
35*01826a49SYabin Cui1. Local development
36*01826a49SYabin Cui    * Update:
37*01826a49SYabin Cui        * Checkout your fork of zstd if you have not already
38*01826a49SYabin Cui        ```
39*01826a49SYabin Cui        git checkout https://github.com/<username>/zstd
40*01826a49SYabin Cui        cd zstd
41*01826a49SYabin Cui        ```
42*01826a49SYabin Cui        * Update your local dev branch
43*01826a49SYabin Cui        ```
44*01826a49SYabin Cui        git pull https://github.com/facebook/zstd dev
45*01826a49SYabin Cui        git push origin dev
46*01826a49SYabin Cui        ```
47*01826a49SYabin Cui    * Topic and development:
48*01826a49SYabin Cui        * Make a new branch on your fork about the topic you're developing for
49*01826a49SYabin Cui        ```
50*01826a49SYabin Cui        # branch names should be concise but sufficiently informative
51*01826a49SYabin Cui        git checkout -b <branch-name>
52*01826a49SYabin Cui        git push origin <branch-name>
53*01826a49SYabin Cui        ```
54*01826a49SYabin Cui        * Make commits and push
55*01826a49SYabin Cui        ```
56*01826a49SYabin Cui        # make some changes =
57*01826a49SYabin Cui        git add -u && git commit -m <message>
58*01826a49SYabin Cui        git push origin <branch-name>
59*01826a49SYabin Cui        ```
60*01826a49SYabin Cui        * Note: run local tests to ensure that your changes didn't break existing functionality
61*01826a49SYabin Cui            * Quick check
62*01826a49SYabin Cui            ```
63*01826a49SYabin Cui            make shortest
64*01826a49SYabin Cui            ```
65*01826a49SYabin Cui            * Longer check
66*01826a49SYabin Cui            ```
67*01826a49SYabin Cui            make test
68*01826a49SYabin Cui            ```
69*01826a49SYabin Cui2. Code Review and CI tests
70*01826a49SYabin Cui    * Ensure CI tests pass:
71*01826a49SYabin Cui        * Before sharing anything to the community, create a pull request in your own fork against the dev branch
72*01826a49SYabin Cui        and make sure that all GitHub Actions CI tests pass. See the Continuous Integration section below for more information.
73*01826a49SYabin Cui        * Ensure that static analysis passes on your development machine. See the Static Analysis section
74*01826a49SYabin Cui        below to see how to do this.
75*01826a49SYabin Cui    * Create a pull request:
76*01826a49SYabin Cui        * When you are ready to share you changes to the community, create a pull request from your branch
77*01826a49SYabin Cui        to facebook:dev. You can do this very easily by clicking 'Create Pull Request' on your fork's home
78*01826a49SYabin Cui        page.
79*01826a49SYabin Cui        * From there, select the branch where you made changes as your source branch and facebook:dev
80*01826a49SYabin Cui        as the destination.
81*01826a49SYabin Cui        * Examine the diff presented between the two branches to make sure there is nothing unexpected.
82*01826a49SYabin Cui    * Write a good pull request description:
83*01826a49SYabin Cui        * While there is no strict template that our contributors follow, we would like them to
84*01826a49SYabin Cui        sufficiently summarize and motivate the changes they are proposing. We recommend all pull requests,
85*01826a49SYabin Cui        at least indirectly, address the following points.
86*01826a49SYabin Cui            * Is this pull request important and why?
87*01826a49SYabin Cui            * Is it addressing an issue? If so, what issue? (provide links for convenience please)
88*01826a49SYabin Cui            * Is this a new feature? If so, why is it useful and/or necessary?
89*01826a49SYabin Cui            * Are there background references and documents that reviewers should be aware of to properly assess this change?
90*01826a49SYabin Cui        * Note: make sure to point out any design and architectural decisions that you made and the rationale behind them.
91*01826a49SYabin Cui        * Note: if you have been working with a specific user and would like them to review your work, make sure you mention them using (@<username>)
92*01826a49SYabin Cui    * Submit the pull request and iterate with feedback.
93*01826a49SYabin Cui3. Merge and Release
94*01826a49SYabin Cui    * Getting approval:
95*01826a49SYabin Cui        * You will have to iterate on your changes with feedback from other collaborators to reach a point
96*01826a49SYabin Cui        where your pull request can be safely merged.
97*01826a49SYabin Cui        * To avoid too many comments on style and convention, make sure that you have a
98*01826a49SYabin Cui        look at our style section below before creating a pull request.
99*01826a49SYabin Cui        * Eventually, someone from the zstd team will approve your pull request and not long after merge it into
100*01826a49SYabin Cui        the dev branch.
101*01826a49SYabin Cui    * Housekeeping:
102*01826a49SYabin Cui        * Most PRs are linked with one or more Github issues. If this is the case for your PR, make sure
103*01826a49SYabin Cui        the corresponding issue is mentioned. If your change 'fixes' or completely addresses the
104*01826a49SYabin Cui        issue at hand, then please indicate this by requesting that an issue be closed by commenting.
105*01826a49SYabin Cui        * Just because your changes have been merged does not mean the topic or larger issue is complete. Remember
106*01826a49SYabin Cui        that the change must make it to an official zstd release for it to be meaningful. We recommend
107*01826a49SYabin Cui        that contributors track the activity on their pull request and corresponding issue(s) page(s) until
108*01826a49SYabin Cui        their change makes it to the next release of zstd. Users will often discover bugs in your code or
109*01826a49SYabin Cui        suggest ways to refine and improve your initial changes even after the pull request is merged.
110*01826a49SYabin Cui
111*01826a49SYabin Cui## Static Analysis
112*01826a49SYabin CuiStatic analysis is a process for examining the correctness or validity of a program without actually
113*01826a49SYabin Cuiexecuting it. It usually helps us find many simple bugs. Zstd uses clang's `scan-build` tool for
114*01826a49SYabin Cuistatic analysis. You can install it by following the instructions for your OS on https://clang-analyzer.llvm.org/scan-build.
115*01826a49SYabin Cui
116*01826a49SYabin CuiOnce installed, you can ensure that our static analysis tests pass on your local development machine
117*01826a49SYabin Cuiby running:
118*01826a49SYabin Cui```
119*01826a49SYabin Cuimake staticAnalyze
120*01826a49SYabin Cui```
121*01826a49SYabin Cui
122*01826a49SYabin CuiIn general, you can use `scan-build` to static analyze any build script. For example, to static analyze
123*01826a49SYabin Cuijust `contrib/largeNbDicts` and nothing else, you can run:
124*01826a49SYabin Cui
125*01826a49SYabin Cui```
126*01826a49SYabin Cuiscan-build make -C contrib/largeNbDicts largeNbDicts
127*01826a49SYabin Cui```
128*01826a49SYabin Cui
129*01826a49SYabin Cui### Pitfalls of static analysis
130*01826a49SYabin Cui`scan-build` is part of our regular CI suite. Other static analyzers are not.
131*01826a49SYabin Cui
132*01826a49SYabin CuiIt can be useful to look at additional static analyzers once in a while (and we do), but it's not a good idea to multiply the nb of analyzers run continuously at each commit and PR. The reasons are :
133*01826a49SYabin Cui
134*01826a49SYabin Cui- Static analyzers are full of false positive. The signal to noise ratio is actually pretty low.
135*01826a49SYabin Cui- A good CI policy is "zero-warning tolerance". That means that all issues must be solved, including false positives. This quickly becomes a tedious workload.
136*01826a49SYabin Cui- Multiple static analyzers will feature multiple kind of false positives, sometimes applying to the same code but in different ways leading to :
137*01826a49SYabin Cui   + tortuous code, trying to please multiple constraints, hurting readability and therefore maintenance. Sometimes, such complexity introduce other more subtle bugs, that are just out of scope of the analyzers.
138*01826a49SYabin Cui   + sometimes, these constraints are mutually exclusive : if one try to solve one, the other static analyzer will complain, they can't be both happy at the same time.
139*01826a49SYabin Cui- As if that was not enough, the list of false positives change with each version. It's hard enough to follow one static analyzer, but multiple ones with their own update agenda, this quickly becomes a massive velocity reducer.
140*01826a49SYabin Cui
141*01826a49SYabin CuiThis is different from running a static analyzer once in a while, looking at the output, and __cherry picking__ a few warnings that seem helpful, either because they detected a genuine risk of bug, or because it helps expressing the code in a way which is more readable or more difficult to misuse. These kinds of reports can be useful, and are accepted.
142*01826a49SYabin Cui
143*01826a49SYabin Cui## Continuous Integration
144*01826a49SYabin CuiCI tests run every time a pull request (PR) is created or updated. The exact tests
145*01826a49SYabin Cuithat get run will depend on the destination branch you specify. Some tests take
146*01826a49SYabin Cuilonger to run than others. Currently, our CI is set up to run a short
147*01826a49SYabin Cuiseries of tests when creating a PR to the dev branch and a longer series of tests
148*01826a49SYabin Cuiwhen creating a PR to the release branch. You can look in the configuration files
149*01826a49SYabin Cuiof the respective CI platform for more information on what gets run when.
150*01826a49SYabin Cui
151*01826a49SYabin CuiMost people will just want to create a PR with the destination set to their local dev
152*01826a49SYabin Cuibranch of zstd. You can then find the status of the tests on the PR's page. You can also
153*01826a49SYabin Cuire-run tests and cancel running tests from the PR page or from the respective CI's dashboard.
154*01826a49SYabin Cui
155*01826a49SYabin CuiAlmost all of zstd's CI runs on GitHub Actions (configured at `.github/workflows`), which will automatically run on PRs to your
156*01826a49SYabin Cuiown fork. A small number of tests run on other services (e.g. Travis CI, Circle CI, Appveyor).
157*01826a49SYabin CuiThese require work to set up on your local fork, and (at least for Travis CI) cost money.
158*01826a49SYabin CuiTherefore, if the PR on your local fork passes GitHub Actions, feel free to submit a PR
159*01826a49SYabin Cuiagainst the main repo.
160*01826a49SYabin Cui
161*01826a49SYabin Cui### Third-party CI
162*01826a49SYabin CuiA small number of tests cannot run on GitHub Actions, or have yet to be migrated.
163*01826a49SYabin CuiFor these, we use a variety of third-party services (listed below). It is not necessary to set
164*01826a49SYabin Cuithese up on your fork in order to contribute to zstd; however, we do link to instructions for those
165*01826a49SYabin Cuiwho want earlier signal.
166*01826a49SYabin Cui
167*01826a49SYabin Cui| Service   | Purpose                                                                                                    | Setup Links                                                                                                                                            | Config Path            |
168*01826a49SYabin Cui|-----------|------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|
169*01826a49SYabin Cui| Travis CI | Used for testing on non-x86 architectures such as PowerPC                                                  | https://docs.travis-ci.com/user/tutorial/#to-get-started-with-travis-ci-using-github <br> https://github.com/marketplace/travis-ci                     | `.travis.yml`          |
170*01826a49SYabin Cui| AppVeyor  | Used for some Windows testing (e.g. cygwin, mingw)                                                         | https://www.appveyor.com/blog/2018/10/02/github-apps-integration/ <br> https://github.com/marketplace/appveyor                                         | `appveyor.yml`         |
171*01826a49SYabin Cui| Cirrus CI | Used for testing on FreeBSD                                                                                | https://github.com/marketplace/cirrus-ci/                                                                                                              | `.cirrus.yml`          |
172*01826a49SYabin Cui| Circle CI | Historically was used to provide faster signal,<br/> but we may be able to migrate these to Github Actions | https://circleci.com/docs/2.0/getting-started/#setting-up-circleci <br> https://youtu.be/Js3hMUsSZ2c <br> https://circleci.com/docs/2.0/enable-checks/ | `.circleci/config.yml` |
173*01826a49SYabin Cui
174*01826a49SYabin CuiNote: the instructions linked above mostly cover how to set up a repository with CI from scratch.
175*01826a49SYabin CuiThe general idea should be the same for setting up CI on your fork of zstd, but you may have to
176*01826a49SYabin Cuifollow slightly different steps. In particular, please ignore any instructions related to setting up
177*01826a49SYabin Cuiconfig files (since zstd already has configs for each of these services).
178*01826a49SYabin Cui
179*01826a49SYabin Cui## Performance
180*01826a49SYabin CuiPerformance is extremely important for zstd and we only merge pull requests whose performance
181*01826a49SYabin Cuilandscape and corresponding trade-offs have been adequately analyzed, reproduced, and presented.
182*01826a49SYabin CuiThis high bar for performance means that every PR which has the potential to
183*01826a49SYabin Cuiimpact performance takes a very long time for us to properly review. That being said, we
184*01826a49SYabin Cuialways welcome contributions to improve performance (or worsen performance for the trade-off of
185*01826a49SYabin Cuisomething else). Please keep the following in mind before submitting a performance related PR:
186*01826a49SYabin Cui
187*01826a49SYabin Cui1. Zstd isn't as old as gzip but it has been around for time now and its evolution is
188*01826a49SYabin Cuivery well documented via past Github issues and pull requests. It may be the case that your
189*01826a49SYabin Cuiparticular performance optimization has already been considered in the past. Please take some
190*01826a49SYabin Cuitime to search through old issues and pull requests using keywords specific to your
191*01826a49SYabin Cuiwould-be PR. Of course, just because a topic has already been discussed (and perhaps rejected
192*01826a49SYabin Cuion some grounds) in the past, doesn't mean it isn't worth bringing up again. But even in that case,
193*01826a49SYabin Cuiit will be helpful for you to have context from that topic's history before contributing.
194*01826a49SYabin Cui2. The distinction between noise and actual performance gains can unfortunately be very subtle
195*01826a49SYabin Cuiespecially when microbenchmarking extremely small wins or losses. The only remedy to getting
196*01826a49SYabin Cuisomething subtle merged is extensive benchmarking. You will be doing us a great favor if you
197*01826a49SYabin Cuitake the time to run extensive, long-duration, and potentially cross-(os, platform, process, etc)
198*01826a49SYabin Cuibenchmarks on your end before submitting a PR. Of course, you will not be able to benchmark
199*01826a49SYabin Cuiyour changes on every single processor and os out there (and neither will we) but do that best
200*01826a49SYabin Cuiyou can:) We've added some things to think about when benchmarking below in the Benchmarking
201*01826a49SYabin CuiPerformance section which might be helpful for you.
202*01826a49SYabin Cui3. Optimizing performance for a certain OS, processor vendor, compiler, or network system is a perfectly
203*01826a49SYabin Cuilegitimate thing to do as long as it does not harm the overall performance health of Zstd.
204*01826a49SYabin CuiThis is a hard balance to strike but please keep in mind other aspects of Zstd when
205*01826a49SYabin Cuisubmitting changes that are clang-specific, windows-specific, etc.
206*01826a49SYabin Cui
207*01826a49SYabin Cui## Benchmarking Performance
208*01826a49SYabin CuiPerformance microbenchmarking is a tricky subject but also essential for Zstd. We value empirical
209*01826a49SYabin Cuitesting over theoretical speculation. This guide it not perfect but for most scenarios, it
210*01826a49SYabin Cuiis a good place to start.
211*01826a49SYabin Cui
212*01826a49SYabin Cui### Stability
213*01826a49SYabin CuiUnfortunately, the most important aspect in being able to benchmark reliably is to have a stable
214*01826a49SYabin Cuibenchmarking machine. A virtual machine, a machine with shared resources, or your laptop
215*01826a49SYabin Cuiwill typically not be stable enough to obtain reliable benchmark results. If you can get your
216*01826a49SYabin Cuihands on a desktop, this is usually a better scenario.
217*01826a49SYabin Cui
218*01826a49SYabin CuiOf course, benchmarking can be done on non-hyper-stable machines as well. You will just have to
219*01826a49SYabin Cuido a little more work to ensure that you are in fact measuring the changes you've made and not
220*01826a49SYabin Cuinoise. Here are some things you can do to make your benchmarks more stable:
221*01826a49SYabin Cui
222*01826a49SYabin Cui1. The most simple thing you can do to drastically improve the stability of your benchmark is
223*01826a49SYabin Cuito run it multiple times and then aggregate the results of those runs. As a general rule of
224*01826a49SYabin Cuithumb, the smaller the change you are trying to measure, the more samples of benchmark runs
225*01826a49SYabin Cuiyou will have to aggregate over to get reliable results. Here are some additional things to keep in
226*01826a49SYabin Cuimind when running multiple trials:
227*01826a49SYabin Cui    * How you aggregate your samples are important. You might be tempted to use the mean of your
228*01826a49SYabin Cui    results. While this is certainly going to be a more stable number than a raw single sample
229*01826a49SYabin Cui    benchmark number, you might have more luck by taking the median. The mean is not robust to
230*01826a49SYabin Cui    outliers whereas the median is. Better still, you could simply take the fastest speed your
231*01826a49SYabin Cui    benchmark achieved on each run since that is likely the fastest your process will be
232*01826a49SYabin Cui    capable of running your code. In our experience, this (aggregating by just taking the sample
233*01826a49SYabin Cui    with the fastest running time) has been the most stable approach.
234*01826a49SYabin Cui    * The more samples you have, the more stable your benchmarks should be. You can verify
235*01826a49SYabin Cui    your improved stability by looking at the size of your confidence intervals as you
236*01826a49SYabin Cui    increase your sample count. These should get smaller and smaller. Eventually hopefully
237*01826a49SYabin Cui    smaller than the performance win you are expecting.
238*01826a49SYabin Cui    * Most processors will take some time to get `hot` when running anything. The observations
239*01826a49SYabin Cui    you collect during that time period will very different from the true performance number. Having
240*01826a49SYabin Cui    a very large number of sample will help alleviate this problem slightly but you can also
241*01826a49SYabin Cui    address is directly by simply not including the first `n` iterations of your benchmark in
242*01826a49SYabin Cui    your aggregations. You can determine `n` by simply looking at the results from each iteration
243*01826a49SYabin Cui    and then hand picking a good threshold after which the variance in results seems to stabilize.
244*01826a49SYabin Cui2. You cannot really get reliable benchmarks if your host machine is simultaneously running
245*01826a49SYabin Cuianother cpu/memory-intensive application in the background. If you are running benchmarks on your
246*01826a49SYabin Cuipersonal laptop for instance, you should close all applications (including your code editor and
247*01826a49SYabin Cuibrowser) before running your benchmarks. You might also have invisible background applications
248*01826a49SYabin Cuirunning. You can see what these are by looking at either Activity Monitor on Mac or Task Manager
249*01826a49SYabin Cuion Windows. You will get more stable benchmark results of you end those processes as well.
250*01826a49SYabin Cui    * If you have multiple cores, you can even run your benchmark on a reserved core to prevent
251*01826a49SYabin Cui    pollution from other OS and user processes. There are a number of ways to do this depending
252*01826a49SYabin Cui    on your OS:
253*01826a49SYabin Cui        * On linux boxes, you have use https://github.com/lpechacek/cpuset.
254*01826a49SYabin Cui        * On Windows, you can "Set Processor Affinity" using https://www.thewindowsclub.com/processor-affinity-windows
255*01826a49SYabin Cui        * On Mac, you can try to use their dedicated affinity API https://developer.apple.com/library/archive/releasenotes/Performance/RN-AffinityAPI/#//apple_ref/doc/uid/TP40006635-CH1-DontLinkElementID_2
256*01826a49SYabin Cui3. To benchmark, you will likely end up writing a separate c/c++ program that will link libzstd.
257*01826a49SYabin CuiDynamically linking your library will introduce some added variation (not a large amount but
258*01826a49SYabin Cuidefinitely some). Statically linking libzstd will be more stable. Static libraries should
259*01826a49SYabin Cuibe enabled by default when building zstd.
260*01826a49SYabin Cui4. Use a profiler with a good high resolution timer. See the section below on profiling for
261*01826a49SYabin Cuidetails on this.
262*01826a49SYabin Cui5. Disable frequency scaling, turbo boost and address space randomization (this will vary by OS)
263*01826a49SYabin Cui6. Try to avoid storage. On some systems you can use tmpfs. Putting the program, inputs and outputs on
264*01826a49SYabin Cuitmpfs avoids touching a real storage system, which can have a pretty big variability.
265*01826a49SYabin Cui
266*01826a49SYabin CuiAlso check our LLVM's guide on benchmarking here: https://llvm.org/docs/Benchmarking.html
267*01826a49SYabin Cui
268*01826a49SYabin Cui### Zstd benchmark
269*01826a49SYabin CuiThe fastest signal you can get regarding your performance changes is via the in-build zstd cli
270*01826a49SYabin Cuibench option. You can run Zstd as you typically would for your scenario using some set of options
271*01826a49SYabin Cuiand then additionally also specify the `-b#` option. Doing this will run our benchmarking pipeline
272*01826a49SYabin Cuifor that options you have just provided. If you want to look at the internals of how this
273*01826a49SYabin Cuibenchmarking script works, you can check out programs/benchzstd.c
274*01826a49SYabin Cui
275*01826a49SYabin CuiFor example: say you have made a change that you believe improves the speed of zstd level 1. The
276*01826a49SYabin Cuivery first thing you should use to assess whether you actually achieved any sort of improvement
277*01826a49SYabin Cuiis `zstd -b`. You might try to do something like this. Note: you can use the `-i` option to
278*01826a49SYabin Cuispecify a running time for your benchmark in seconds (default is 3 seconds).
279*01826a49SYabin CuiUsually, the longer the running time, the more stable your results will be.
280*01826a49SYabin Cui
281*01826a49SYabin Cui```
282*01826a49SYabin Cui$ git checkout <commit-before-your-change>
283*01826a49SYabin Cui$ make && cp zstd zstd-old
284*01826a49SYabin Cui$ git checkout <commit-after-your-change>
285*01826a49SYabin Cui$ make && cp zstd zstd-new
286*01826a49SYabin Cui$ zstd-old -i5 -b1 <your-test-data>
287*01826a49SYabin Cui 1<your-test-data>         :      8990 ->      3992 (2.252), 302.6 MB/s , 626.4 MB/s
288*01826a49SYabin Cui$ zstd-new -i5 -b1 <your-test-data>
289*01826a49SYabin Cui 1<your-test-data>         :      8990 ->      3992 (2.252), 302.8 MB/s , 628.4 MB/s
290*01826a49SYabin Cui```
291*01826a49SYabin Cui
292*01826a49SYabin CuiUnless your performance win is large enough to be visible despite the intrinsic noise
293*01826a49SYabin Cuion your computer, benchzstd alone will likely not be enough to validate the impact of your
294*01826a49SYabin Cuichanges. For example, the results of the example above indicate that effectively nothing
295*01826a49SYabin Cuichanged but there could be a small <3% improvement that the noise on the host machine
296*01826a49SYabin Cuiobscured. So unless you see a large performance win (10-15% consistently) using just
297*01826a49SYabin Cuithis method of evaluation will not be sufficient.
298*01826a49SYabin Cui
299*01826a49SYabin Cui### Profiling
300*01826a49SYabin CuiThere are a number of great profilers out there. We're going to briefly mention how you can
301*01826a49SYabin Cuiprofile your code using `instruments` on mac, `perf` on linux and `visual studio profiler`
302*01826a49SYabin Cuion Windows.
303*01826a49SYabin Cui
304*01826a49SYabin CuiSay you have an idea for a change that you think will provide some good performance gains
305*01826a49SYabin Cuifor level 1 compression on Zstd. Typically this means, you have identified a section of
306*01826a49SYabin Cuicode that you think can be made to run faster.
307*01826a49SYabin Cui
308*01826a49SYabin CuiThe first thing you will want to do is make sure that the piece of code is actually taking up
309*01826a49SYabin Cuia notable amount of time to run. It is usually not worth optimizing something which accounts for less than
310*01826a49SYabin Cui0.0001% of the total running time. Luckily, there are tools to help with this.
311*01826a49SYabin CuiProfilers will let you see how much time your code spends inside a particular function.
312*01826a49SYabin CuiIf your target code snippet is only part of a function, it might be worth trying to
313*01826a49SYabin Cuiisolate that snippet by moving it to its own function (this is usually not necessary but
314*01826a49SYabin Cuimight be).
315*01826a49SYabin Cui
316*01826a49SYabin CuiMost profilers (including the profilers discussed below) will generate a call graph of
317*01826a49SYabin Cuifunctions for you. Your goal will be to find your function of interest in this call graph
318*01826a49SYabin Cuiand then inspect the time spent inside of it. You might also want to look at the annotated
319*01826a49SYabin Cuiassembly which most profilers will provide you with.
320*01826a49SYabin Cui
321*01826a49SYabin Cui#### Instruments
322*01826a49SYabin CuiWe will once again consider the scenario where you think you've identified a piece of code
323*01826a49SYabin Cuiwhose performance can be improved upon. Follow these steps to profile your code using
324*01826a49SYabin CuiInstruments.
325*01826a49SYabin Cui
326*01826a49SYabin Cui1. Open Instruments
327*01826a49SYabin Cui2. Select `Time Profiler` from the list of standard templates
328*01826a49SYabin Cui3. Close all other applications except for your instruments window and your terminal
329*01826a49SYabin Cui4. Run your benchmarking script from your terminal window
330*01826a49SYabin Cui    * You will want a benchmark that runs for at least a few seconds (5 seconds will
331*01826a49SYabin Cui    usually be long enough). This way the profiler will have something to work with
332*01826a49SYabin Cui    and you will have ample time to attach your profiler to this process:)
333*01826a49SYabin Cui    * I will just use benchzstd as my benchmarmking script for this example:
334*01826a49SYabin Cui```
335*01826a49SYabin Cui$ zstd -b1 -i5 <my-data> # this will run for 5 seconds
336*01826a49SYabin Cui```
337*01826a49SYabin Cui5. Once you run your benchmarking script, switch back over to instruments and attach your
338*01826a49SYabin Cuiprocess to the time profiler. You can do this by:
339*01826a49SYabin Cui    * Clicking on the `All Processes` drop down in the top left of the toolbar.
340*01826a49SYabin Cui    * Selecting your process from the dropdown. In my case, it is just going to be labeled
341*01826a49SYabin Cui    `zstd`
342*01826a49SYabin Cui    * Hitting the bright red record circle button on the top left of the toolbar
343*01826a49SYabin Cui6. You profiler will now start collecting metrics from your benchmarking script. Once
344*01826a49SYabin Cuiyou think you have collected enough samples (usually this is the case after 3 seconds of
345*01826a49SYabin Cuirecording), stop your profiler.
346*01826a49SYabin Cui7. Make sure that in toolbar of the bottom window, `profile` is selected.
347*01826a49SYabin Cui8. You should be able to see your call graph.
348*01826a49SYabin Cui    * If you don't see the call graph or an incomplete call graph, make sure you have compiled
349*01826a49SYabin Cui    zstd and your benchmarking script using debug flags. On mac and linux, this just means
350*01826a49SYabin Cui    you will have to supply the `-g` flag alone with your build script. You might also
351*01826a49SYabin Cui    have to provide the `-fno-omit-frame-pointer` flag
352*01826a49SYabin Cui9. Dig down the graph to find your function call and then inspect it by double clicking
353*01826a49SYabin Cuithe list item. You will be able to see the annotated source code and the assembly side by
354*01826a49SYabin Cuiside.
355*01826a49SYabin Cui
356*01826a49SYabin Cui#### Perf
357*01826a49SYabin Cui
358*01826a49SYabin CuiThis wiki has a pretty detailed tutorial on getting started working with perf so we'll
359*01826a49SYabin Cuileave you to check that out of you're getting started:
360*01826a49SYabin Cui
361*01826a49SYabin Cuihttps://perf.wiki.kernel.org/index.php/Tutorial
362*01826a49SYabin Cui
363*01826a49SYabin CuiSome general notes on perf:
364*01826a49SYabin Cui* Use `perf stat -r # <bench-program>` to quickly get some relevant timing and
365*01826a49SYabin Cuicounter statistics. Perf uses a high resolution timer and this is likely one
366*01826a49SYabin Cuiof the first things your team will run when assessing your PR.
367*01826a49SYabin Cui* Perf has a long list of hardware counters that can be viewed with `perf --list`.
368*01826a49SYabin CuiWhen measuring optimizations, something worth trying is to make sure the hardware
369*01826a49SYabin Cuicounters you expect to be impacted by your change are in fact being so. For example,
370*01826a49SYabin Cuiif you expect the L1 cache misses to decrease with your change, you can look at the
371*01826a49SYabin Cuicounter `L1-dcache-load-misses`
372*01826a49SYabin Cui* Perf hardware counters will not work on a virtual machine.
373*01826a49SYabin Cui
374*01826a49SYabin Cui#### Visual Studio
375*01826a49SYabin Cui
376*01826a49SYabin CuiTODO
377*01826a49SYabin Cui
378*01826a49SYabin Cui## Issues
379*01826a49SYabin CuiWe use GitHub issues to track public bugs. Please ensure your description is
380*01826a49SYabin Cuiclear and has sufficient instructions to be able to reproduce the issue.
381*01826a49SYabin Cui
382*01826a49SYabin CuiFacebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
383*01826a49SYabin Cuidisclosure of security bugs. In those cases, please go through the process
384*01826a49SYabin Cuioutlined on that page and do not file a public issue.
385*01826a49SYabin Cui
386*01826a49SYabin Cui## Coding Style
387*01826a49SYabin CuiIt's a pretty long topic, which is difficult to summarize in a single paragraph.
388*01826a49SYabin CuiAs a rule of thumbs, try to imitate the coding style of
389*01826a49SYabin Cuisimilar lines of codes around your contribution.
390*01826a49SYabin CuiThe following is a non-exhaustive list of rules employed in zstd code base:
391*01826a49SYabin Cui
392*01826a49SYabin Cui### C90
393*01826a49SYabin CuiThis code base is following strict C90 standard,
394*01826a49SYabin Cuiwith 2 extensions : 64-bit `long long` types, and variadic macros.
395*01826a49SYabin CuiThis rule is applied strictly to code within `lib/` and `programs/`.
396*01826a49SYabin CuiSub-project in `contrib/` are allowed to use other conventions.
397*01826a49SYabin Cui
398*01826a49SYabin Cui### C++ direct compatibility : symbol mangling
399*01826a49SYabin CuiAll public symbol declarations must be wrapped in `extern “C” { … }`,
400*01826a49SYabin Cuiso that this project can be compiled as C++98 code,
401*01826a49SYabin Cuiand linked into C++ applications.
402*01826a49SYabin Cui
403*01826a49SYabin Cui### Minimal Frugal
404*01826a49SYabin CuiThis design requirement is fundamental to preserve the portability of the code base.
405*01826a49SYabin Cui#### Dependencies
406*01826a49SYabin Cui- Reduce dependencies to the minimum possible level.
407*01826a49SYabin Cui  Any dependency should be considered “bad” by default,
408*01826a49SYabin Cui  and only tolerated because it provides a service in a better way than can be achieved locally.
409*01826a49SYabin Cui  The only external dependencies this repository tolerates are
410*01826a49SYabin Cui  standard C libraries, and in rare cases, system level headers.
411*01826a49SYabin Cui- Within `lib/`, this policy is even more drastic.
412*01826a49SYabin Cui  The only external dependencies allowed are `<assert.h>`, `<stdlib.h>`, `<string.h>`,
413*01826a49SYabin Cui  and even then, not directly.
414*01826a49SYabin Cui  In particular, no function shall ever allocate on heap directly,
415*01826a49SYabin Cui  and must use instead `ZSTD_malloc()` and equivalent.
416*01826a49SYabin Cui  Other accepted non-symbol headers are `<stddef.h>` and `<limits.h>`.
417*01826a49SYabin Cui- Within the project, there is a strict hierarchy of dependencies that must be respected.
418*01826a49SYabin Cui  `programs/` is allowed to depend on `lib/`, but only its public API.
419*01826a49SYabin Cui  Within `lib/`, `lib/common` doesn't depend on any other directory.
420*01826a49SYabin Cui  `lib/compress` and `lib/decompress` shall not depend on each other.
421*01826a49SYabin Cui  `lib/dictBuilder` can depend on `lib/common` and `lib/compress`, but not `lib/decompress`.
422*01826a49SYabin Cui#### Resources
423*01826a49SYabin Cui- Functions in `lib/` must use very little stack space,
424*01826a49SYabin Cui  several dozens of bytes max.
425*01826a49SYabin Cui  Everything larger must use the heap allocator,
426*01826a49SYabin Cui  or require a scratch buffer to be emplaced manually.
427*01826a49SYabin Cui
428*01826a49SYabin Cui### Naming
429*01826a49SYabin Cui* All public symbols are prefixed with `ZSTD_`
430*01826a49SYabin Cui  + private symbols, with a scope limited to their own unit, are free of this restriction.
431*01826a49SYabin Cui    However, since `libzstd` source code can be amalgamated,
432*01826a49SYabin Cui    each symbol name must attempt to be (and remain) unique.
433*01826a49SYabin Cui    Avoid too generic names that could become ground for future collisions.
434*01826a49SYabin Cui    This generally implies usage of some form of prefix.
435*01826a49SYabin Cui* For symbols (functions and variables), naming convention is `PREFIX_camelCase`.
436*01826a49SYabin Cui  + In some advanced cases, one can also find :
437*01826a49SYabin Cui    - `PREFIX_prefix2_camelCase`
438*01826a49SYabin Cui    - `PREFIX_camelCase_extendedQualifier`
439*01826a49SYabin Cui* Multi-words names generally consist of an action followed by object:
440*01826a49SYabin Cui  - for example : `ZSTD_createCCtx()`
441*01826a49SYabin Cui* Prefer positive actions
442*01826a49SYabin Cui  - `goBackward` rather than `notGoForward`
443*01826a49SYabin Cui* Type names (`struct`, etc.) follow similar convention,
444*01826a49SYabin Cui  except that they are allowed and even invited to start by an Uppercase letter.
445*01826a49SYabin Cui  Example : `ZSTD_CCtx`, `ZSTD_CDict`
446*01826a49SYabin Cui* Macro names are all Capital letters.
447*01826a49SYabin Cui  The same composition rules (`PREFIX_NAME_QUALIFIER`) apply.
448*01826a49SYabin Cui* File names are all lowercase letters.
449*01826a49SYabin Cui  The convention is `snake_case`.
450*01826a49SYabin Cui  File names **must** be unique across the entire code base,
451*01826a49SYabin Cui  even when they stand in clearly separated directories.
452*01826a49SYabin Cui
453*01826a49SYabin Cui### Qualifiers
454*01826a49SYabin Cui* This code base is `const` friendly, if not `const` fanatical.
455*01826a49SYabin Cui  Any variable that can be `const` (aka. read-only) **must** be `const`.
456*01826a49SYabin Cui  Any pointer which content will not be modified must be `const`.
457*01826a49SYabin Cui  This property is then controlled at compiler level.
458*01826a49SYabin Cui  `const` variables are an important signal to readers that this variable isn't modified.
459*01826a49SYabin Cui  Conversely, non-const variables are a signal to readers to watch out for modifications later on in the function.
460*01826a49SYabin Cui* If a function must be inlined, mention it explicitly,
461*01826a49SYabin Cui  using project's own portable macros, such as `FORCE_INLINE_ATTR`,
462*01826a49SYabin Cui  defined in `lib/common/compiler.h`.
463*01826a49SYabin Cui
464*01826a49SYabin Cui### Debugging
465*01826a49SYabin Cui* **Assertions** are welcome, and should be used very liberally,
466*01826a49SYabin Cui  to control any condition the code expects for its correct execution.
467*01826a49SYabin Cui  These assertion checks will be run in debug builds, and disabled in production.
468*01826a49SYabin Cui* For traces, this project provides its own debug macros,
469*01826a49SYabin Cui  in particular `DEBUGLOG(level, ...)`, defined in `lib/common/debug.h`.
470*01826a49SYabin Cui
471*01826a49SYabin Cui### Code documentation
472*01826a49SYabin Cui* Avoid code documentation that merely repeats what the code is already stating.
473*01826a49SYabin Cui  Whenever applicable, prefer employing the code as the primary way to convey explanations.
474*01826a49SYabin Cui  Example 1 : `int nbTokens = n;` instead of `int i = n; /* i is a nb of tokens *./`.
475*01826a49SYabin Cui  Example 2 : `assert(size > 0);` instead of `/* here, size should be positive */`.
476*01826a49SYabin Cui* At declaration level, the documentation explains how to use the function or variable
477*01826a49SYabin Cui  and when applicable why it's needed, of the scenarios where it can be useful.
478*01826a49SYabin Cui* At implementation level, the documentation explains the general outline of the algorithm employed,
479*01826a49SYabin Cui  and when applicable why this specific choice was preferred.
480*01826a49SYabin Cui
481*01826a49SYabin Cui### General layout
482*01826a49SYabin Cui* 4 spaces for indentation rather than tabs
483*01826a49SYabin Cui* Code documentation shall directly precede function declaration or implementation
484*01826a49SYabin Cui* Function implementations and its code documentation should be preceded and followed by an empty line
485*01826a49SYabin Cui
486*01826a49SYabin Cui
487*01826a49SYabin Cui## License
488*01826a49SYabin CuiBy contributing to Zstandard, you agree that your contributions will be licensed
489*01826a49SYabin Cuiunder both the [LICENSE](LICENSE) file and the [COPYING](COPYING) file in the root directory of this source tree.
490