1*01826a49SYabin Cui# Contributing to Zstandard 2*01826a49SYabin CuiWe want to make contributing to this project as easy and transparent as 3*01826a49SYabin Cuipossible. 4*01826a49SYabin Cui 5*01826a49SYabin Cui## Our Development Process 6*01826a49SYabin CuiNew versions are being developed in the "dev" branch, 7*01826a49SYabin Cuior in their own feature branch. 8*01826a49SYabin CuiWhen they are deemed ready for a release, they are merged into "release". 9*01826a49SYabin Cui 10*01826a49SYabin CuiAs a consequence, all contributions must stage first through "dev" 11*01826a49SYabin Cuior their own feature branch. 12*01826a49SYabin Cui 13*01826a49SYabin Cui## Pull Requests 14*01826a49SYabin CuiWe actively welcome your pull requests. 15*01826a49SYabin Cui 16*01826a49SYabin Cui1. Fork the repo and create your branch from `dev`. 17*01826a49SYabin Cui2. If you've added code that should be tested, add tests. 18*01826a49SYabin Cui3. If you've changed APIs, update the documentation. 19*01826a49SYabin Cui4. Ensure the test suite passes. 20*01826a49SYabin Cui5. Make sure your code lints. 21*01826a49SYabin Cui6. If you haven't already, complete the Contributor License Agreement ("CLA"). 22*01826a49SYabin Cui 23*01826a49SYabin Cui## Contributor License Agreement ("CLA") 24*01826a49SYabin CuiIn order to accept your pull request, we need you to submit a CLA. You only need 25*01826a49SYabin Cuito do this once to work on any of Facebook's open source projects. 26*01826a49SYabin Cui 27*01826a49SYabin CuiComplete your CLA here: <https://code.facebook.com/cla> 28*01826a49SYabin Cui 29*01826a49SYabin Cui## Workflow 30*01826a49SYabin CuiZstd uses a branch-based workflow for making changes to the codebase. Typically, zstd 31*01826a49SYabin Cuiwill use a new branch per sizable topic. For smaller changes, it is okay to lump multiple 32*01826a49SYabin Cuirelated changes into a branch. 33*01826a49SYabin Cui 34*01826a49SYabin CuiOur contribution process works in three main stages: 35*01826a49SYabin Cui1. Local development 36*01826a49SYabin Cui * Update: 37*01826a49SYabin Cui * Checkout your fork of zstd if you have not already 38*01826a49SYabin Cui ``` 39*01826a49SYabin Cui git checkout https://github.com/<username>/zstd 40*01826a49SYabin Cui cd zstd 41*01826a49SYabin Cui ``` 42*01826a49SYabin Cui * Update your local dev branch 43*01826a49SYabin Cui ``` 44*01826a49SYabin Cui git pull https://github.com/facebook/zstd dev 45*01826a49SYabin Cui git push origin dev 46*01826a49SYabin Cui ``` 47*01826a49SYabin Cui * Topic and development: 48*01826a49SYabin Cui * Make a new branch on your fork about the topic you're developing for 49*01826a49SYabin Cui ``` 50*01826a49SYabin Cui # branch names should be concise but sufficiently informative 51*01826a49SYabin Cui git checkout -b <branch-name> 52*01826a49SYabin Cui git push origin <branch-name> 53*01826a49SYabin Cui ``` 54*01826a49SYabin Cui * Make commits and push 55*01826a49SYabin Cui ``` 56*01826a49SYabin Cui # make some changes = 57*01826a49SYabin Cui git add -u && git commit -m <message> 58*01826a49SYabin Cui git push origin <branch-name> 59*01826a49SYabin Cui ``` 60*01826a49SYabin Cui * Note: run local tests to ensure that your changes didn't break existing functionality 61*01826a49SYabin Cui * Quick check 62*01826a49SYabin Cui ``` 63*01826a49SYabin Cui make shortest 64*01826a49SYabin Cui ``` 65*01826a49SYabin Cui * Longer check 66*01826a49SYabin Cui ``` 67*01826a49SYabin Cui make test 68*01826a49SYabin Cui ``` 69*01826a49SYabin Cui2. Code Review and CI tests 70*01826a49SYabin Cui * Ensure CI tests pass: 71*01826a49SYabin Cui * Before sharing anything to the community, create a pull request in your own fork against the dev branch 72*01826a49SYabin Cui and make sure that all GitHub Actions CI tests pass. See the Continuous Integration section below for more information. 73*01826a49SYabin Cui * Ensure that static analysis passes on your development machine. See the Static Analysis section 74*01826a49SYabin Cui below to see how to do this. 75*01826a49SYabin Cui * Create a pull request: 76*01826a49SYabin Cui * When you are ready to share you changes to the community, create a pull request from your branch 77*01826a49SYabin Cui to facebook:dev. You can do this very easily by clicking 'Create Pull Request' on your fork's home 78*01826a49SYabin Cui page. 79*01826a49SYabin Cui * From there, select the branch where you made changes as your source branch and facebook:dev 80*01826a49SYabin Cui as the destination. 81*01826a49SYabin Cui * Examine the diff presented between the two branches to make sure there is nothing unexpected. 82*01826a49SYabin Cui * Write a good pull request description: 83*01826a49SYabin Cui * While there is no strict template that our contributors follow, we would like them to 84*01826a49SYabin Cui sufficiently summarize and motivate the changes they are proposing. We recommend all pull requests, 85*01826a49SYabin Cui at least indirectly, address the following points. 86*01826a49SYabin Cui * Is this pull request important and why? 87*01826a49SYabin Cui * Is it addressing an issue? If so, what issue? (provide links for convenience please) 88*01826a49SYabin Cui * Is this a new feature? If so, why is it useful and/or necessary? 89*01826a49SYabin Cui * Are there background references and documents that reviewers should be aware of to properly assess this change? 90*01826a49SYabin Cui * Note: make sure to point out any design and architectural decisions that you made and the rationale behind them. 91*01826a49SYabin Cui * Note: if you have been working with a specific user and would like them to review your work, make sure you mention them using (@<username>) 92*01826a49SYabin Cui * Submit the pull request and iterate with feedback. 93*01826a49SYabin Cui3. Merge and Release 94*01826a49SYabin Cui * Getting approval: 95*01826a49SYabin Cui * You will have to iterate on your changes with feedback from other collaborators to reach a point 96*01826a49SYabin Cui where your pull request can be safely merged. 97*01826a49SYabin Cui * To avoid too many comments on style and convention, make sure that you have a 98*01826a49SYabin Cui look at our style section below before creating a pull request. 99*01826a49SYabin Cui * Eventually, someone from the zstd team will approve your pull request and not long after merge it into 100*01826a49SYabin Cui the dev branch. 101*01826a49SYabin Cui * Housekeeping: 102*01826a49SYabin Cui * Most PRs are linked with one or more Github issues. If this is the case for your PR, make sure 103*01826a49SYabin Cui the corresponding issue is mentioned. If your change 'fixes' or completely addresses the 104*01826a49SYabin Cui issue at hand, then please indicate this by requesting that an issue be closed by commenting. 105*01826a49SYabin Cui * Just because your changes have been merged does not mean the topic or larger issue is complete. Remember 106*01826a49SYabin Cui that the change must make it to an official zstd release for it to be meaningful. We recommend 107*01826a49SYabin Cui that contributors track the activity on their pull request and corresponding issue(s) page(s) until 108*01826a49SYabin Cui their change makes it to the next release of zstd. Users will often discover bugs in your code or 109*01826a49SYabin Cui suggest ways to refine and improve your initial changes even after the pull request is merged. 110*01826a49SYabin Cui 111*01826a49SYabin Cui## Static Analysis 112*01826a49SYabin CuiStatic analysis is a process for examining the correctness or validity of a program without actually 113*01826a49SYabin Cuiexecuting it. It usually helps us find many simple bugs. Zstd uses clang's `scan-build` tool for 114*01826a49SYabin Cuistatic analysis. You can install it by following the instructions for your OS on https://clang-analyzer.llvm.org/scan-build. 115*01826a49SYabin Cui 116*01826a49SYabin CuiOnce installed, you can ensure that our static analysis tests pass on your local development machine 117*01826a49SYabin Cuiby running: 118*01826a49SYabin Cui``` 119*01826a49SYabin Cuimake staticAnalyze 120*01826a49SYabin Cui``` 121*01826a49SYabin Cui 122*01826a49SYabin CuiIn general, you can use `scan-build` to static analyze any build script. For example, to static analyze 123*01826a49SYabin Cuijust `contrib/largeNbDicts` and nothing else, you can run: 124*01826a49SYabin Cui 125*01826a49SYabin Cui``` 126*01826a49SYabin Cuiscan-build make -C contrib/largeNbDicts largeNbDicts 127*01826a49SYabin Cui``` 128*01826a49SYabin Cui 129*01826a49SYabin Cui### Pitfalls of static analysis 130*01826a49SYabin Cui`scan-build` is part of our regular CI suite. Other static analyzers are not. 131*01826a49SYabin Cui 132*01826a49SYabin CuiIt can be useful to look at additional static analyzers once in a while (and we do), but it's not a good idea to multiply the nb of analyzers run continuously at each commit and PR. The reasons are : 133*01826a49SYabin Cui 134*01826a49SYabin Cui- Static analyzers are full of false positive. The signal to noise ratio is actually pretty low. 135*01826a49SYabin Cui- A good CI policy is "zero-warning tolerance". That means that all issues must be solved, including false positives. This quickly becomes a tedious workload. 136*01826a49SYabin Cui- Multiple static analyzers will feature multiple kind of false positives, sometimes applying to the same code but in different ways leading to : 137*01826a49SYabin Cui + tortuous code, trying to please multiple constraints, hurting readability and therefore maintenance. Sometimes, such complexity introduce other more subtle bugs, that are just out of scope of the analyzers. 138*01826a49SYabin Cui + sometimes, these constraints are mutually exclusive : if one try to solve one, the other static analyzer will complain, they can't be both happy at the same time. 139*01826a49SYabin Cui- As if that was not enough, the list of false positives change with each version. It's hard enough to follow one static analyzer, but multiple ones with their own update agenda, this quickly becomes a massive velocity reducer. 140*01826a49SYabin Cui 141*01826a49SYabin CuiThis is different from running a static analyzer once in a while, looking at the output, and __cherry picking__ a few warnings that seem helpful, either because they detected a genuine risk of bug, or because it helps expressing the code in a way which is more readable or more difficult to misuse. These kinds of reports can be useful, and are accepted. 142*01826a49SYabin Cui 143*01826a49SYabin Cui## Continuous Integration 144*01826a49SYabin CuiCI tests run every time a pull request (PR) is created or updated. The exact tests 145*01826a49SYabin Cuithat get run will depend on the destination branch you specify. Some tests take 146*01826a49SYabin Cuilonger to run than others. Currently, our CI is set up to run a short 147*01826a49SYabin Cuiseries of tests when creating a PR to the dev branch and a longer series of tests 148*01826a49SYabin Cuiwhen creating a PR to the release branch. You can look in the configuration files 149*01826a49SYabin Cuiof the respective CI platform for more information on what gets run when. 150*01826a49SYabin Cui 151*01826a49SYabin CuiMost people will just want to create a PR with the destination set to their local dev 152*01826a49SYabin Cuibranch of zstd. You can then find the status of the tests on the PR's page. You can also 153*01826a49SYabin Cuire-run tests and cancel running tests from the PR page or from the respective CI's dashboard. 154*01826a49SYabin Cui 155*01826a49SYabin CuiAlmost all of zstd's CI runs on GitHub Actions (configured at `.github/workflows`), which will automatically run on PRs to your 156*01826a49SYabin Cuiown fork. A small number of tests run on other services (e.g. Travis CI, Circle CI, Appveyor). 157*01826a49SYabin CuiThese require work to set up on your local fork, and (at least for Travis CI) cost money. 158*01826a49SYabin CuiTherefore, if the PR on your local fork passes GitHub Actions, feel free to submit a PR 159*01826a49SYabin Cuiagainst the main repo. 160*01826a49SYabin Cui 161*01826a49SYabin Cui### Third-party CI 162*01826a49SYabin CuiA small number of tests cannot run on GitHub Actions, or have yet to be migrated. 163*01826a49SYabin CuiFor these, we use a variety of third-party services (listed below). It is not necessary to set 164*01826a49SYabin Cuithese up on your fork in order to contribute to zstd; however, we do link to instructions for those 165*01826a49SYabin Cuiwho want earlier signal. 166*01826a49SYabin Cui 167*01826a49SYabin Cui| Service | Purpose | Setup Links | Config Path | 168*01826a49SYabin Cui|-----------|------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------| 169*01826a49SYabin Cui| Travis CI | Used for testing on non-x86 architectures such as PowerPC | https://docs.travis-ci.com/user/tutorial/#to-get-started-with-travis-ci-using-github <br> https://github.com/marketplace/travis-ci | `.travis.yml` | 170*01826a49SYabin Cui| AppVeyor | Used for some Windows testing (e.g. cygwin, mingw) | https://www.appveyor.com/blog/2018/10/02/github-apps-integration/ <br> https://github.com/marketplace/appveyor | `appveyor.yml` | 171*01826a49SYabin Cui| Cirrus CI | Used for testing on FreeBSD | https://github.com/marketplace/cirrus-ci/ | `.cirrus.yml` | 172*01826a49SYabin Cui| Circle CI | Historically was used to provide faster signal,<br/> but we may be able to migrate these to Github Actions | https://circleci.com/docs/2.0/getting-started/#setting-up-circleci <br> https://youtu.be/Js3hMUsSZ2c <br> https://circleci.com/docs/2.0/enable-checks/ | `.circleci/config.yml` | 173*01826a49SYabin Cui 174*01826a49SYabin CuiNote: the instructions linked above mostly cover how to set up a repository with CI from scratch. 175*01826a49SYabin CuiThe general idea should be the same for setting up CI on your fork of zstd, but you may have to 176*01826a49SYabin Cuifollow slightly different steps. In particular, please ignore any instructions related to setting up 177*01826a49SYabin Cuiconfig files (since zstd already has configs for each of these services). 178*01826a49SYabin Cui 179*01826a49SYabin Cui## Performance 180*01826a49SYabin CuiPerformance is extremely important for zstd and we only merge pull requests whose performance 181*01826a49SYabin Cuilandscape and corresponding trade-offs have been adequately analyzed, reproduced, and presented. 182*01826a49SYabin CuiThis high bar for performance means that every PR which has the potential to 183*01826a49SYabin Cuiimpact performance takes a very long time for us to properly review. That being said, we 184*01826a49SYabin Cuialways welcome contributions to improve performance (or worsen performance for the trade-off of 185*01826a49SYabin Cuisomething else). Please keep the following in mind before submitting a performance related PR: 186*01826a49SYabin Cui 187*01826a49SYabin Cui1. Zstd isn't as old as gzip but it has been around for time now and its evolution is 188*01826a49SYabin Cuivery well documented via past Github issues and pull requests. It may be the case that your 189*01826a49SYabin Cuiparticular performance optimization has already been considered in the past. Please take some 190*01826a49SYabin Cuitime to search through old issues and pull requests using keywords specific to your 191*01826a49SYabin Cuiwould-be PR. Of course, just because a topic has already been discussed (and perhaps rejected 192*01826a49SYabin Cuion some grounds) in the past, doesn't mean it isn't worth bringing up again. But even in that case, 193*01826a49SYabin Cuiit will be helpful for you to have context from that topic's history before contributing. 194*01826a49SYabin Cui2. The distinction between noise and actual performance gains can unfortunately be very subtle 195*01826a49SYabin Cuiespecially when microbenchmarking extremely small wins or losses. The only remedy to getting 196*01826a49SYabin Cuisomething subtle merged is extensive benchmarking. You will be doing us a great favor if you 197*01826a49SYabin Cuitake the time to run extensive, long-duration, and potentially cross-(os, platform, process, etc) 198*01826a49SYabin Cuibenchmarks on your end before submitting a PR. Of course, you will not be able to benchmark 199*01826a49SYabin Cuiyour changes on every single processor and os out there (and neither will we) but do that best 200*01826a49SYabin Cuiyou can:) We've added some things to think about when benchmarking below in the Benchmarking 201*01826a49SYabin CuiPerformance section which might be helpful for you. 202*01826a49SYabin Cui3. Optimizing performance for a certain OS, processor vendor, compiler, or network system is a perfectly 203*01826a49SYabin Cuilegitimate thing to do as long as it does not harm the overall performance health of Zstd. 204*01826a49SYabin CuiThis is a hard balance to strike but please keep in mind other aspects of Zstd when 205*01826a49SYabin Cuisubmitting changes that are clang-specific, windows-specific, etc. 206*01826a49SYabin Cui 207*01826a49SYabin Cui## Benchmarking Performance 208*01826a49SYabin CuiPerformance microbenchmarking is a tricky subject but also essential for Zstd. We value empirical 209*01826a49SYabin Cuitesting over theoretical speculation. This guide it not perfect but for most scenarios, it 210*01826a49SYabin Cuiis a good place to start. 211*01826a49SYabin Cui 212*01826a49SYabin Cui### Stability 213*01826a49SYabin CuiUnfortunately, the most important aspect in being able to benchmark reliably is to have a stable 214*01826a49SYabin Cuibenchmarking machine. A virtual machine, a machine with shared resources, or your laptop 215*01826a49SYabin Cuiwill typically not be stable enough to obtain reliable benchmark results. If you can get your 216*01826a49SYabin Cuihands on a desktop, this is usually a better scenario. 217*01826a49SYabin Cui 218*01826a49SYabin CuiOf course, benchmarking can be done on non-hyper-stable machines as well. You will just have to 219*01826a49SYabin Cuido a little more work to ensure that you are in fact measuring the changes you've made and not 220*01826a49SYabin Cuinoise. Here are some things you can do to make your benchmarks more stable: 221*01826a49SYabin Cui 222*01826a49SYabin Cui1. The most simple thing you can do to drastically improve the stability of your benchmark is 223*01826a49SYabin Cuito run it multiple times and then aggregate the results of those runs. As a general rule of 224*01826a49SYabin Cuithumb, the smaller the change you are trying to measure, the more samples of benchmark runs 225*01826a49SYabin Cuiyou will have to aggregate over to get reliable results. Here are some additional things to keep in 226*01826a49SYabin Cuimind when running multiple trials: 227*01826a49SYabin Cui * How you aggregate your samples are important. You might be tempted to use the mean of your 228*01826a49SYabin Cui results. While this is certainly going to be a more stable number than a raw single sample 229*01826a49SYabin Cui benchmark number, you might have more luck by taking the median. The mean is not robust to 230*01826a49SYabin Cui outliers whereas the median is. Better still, you could simply take the fastest speed your 231*01826a49SYabin Cui benchmark achieved on each run since that is likely the fastest your process will be 232*01826a49SYabin Cui capable of running your code. In our experience, this (aggregating by just taking the sample 233*01826a49SYabin Cui with the fastest running time) has been the most stable approach. 234*01826a49SYabin Cui * The more samples you have, the more stable your benchmarks should be. You can verify 235*01826a49SYabin Cui your improved stability by looking at the size of your confidence intervals as you 236*01826a49SYabin Cui increase your sample count. These should get smaller and smaller. Eventually hopefully 237*01826a49SYabin Cui smaller than the performance win you are expecting. 238*01826a49SYabin Cui * Most processors will take some time to get `hot` when running anything. The observations 239*01826a49SYabin Cui you collect during that time period will very different from the true performance number. Having 240*01826a49SYabin Cui a very large number of sample will help alleviate this problem slightly but you can also 241*01826a49SYabin Cui address is directly by simply not including the first `n` iterations of your benchmark in 242*01826a49SYabin Cui your aggregations. You can determine `n` by simply looking at the results from each iteration 243*01826a49SYabin Cui and then hand picking a good threshold after which the variance in results seems to stabilize. 244*01826a49SYabin Cui2. You cannot really get reliable benchmarks if your host machine is simultaneously running 245*01826a49SYabin Cuianother cpu/memory-intensive application in the background. If you are running benchmarks on your 246*01826a49SYabin Cuipersonal laptop for instance, you should close all applications (including your code editor and 247*01826a49SYabin Cuibrowser) before running your benchmarks. You might also have invisible background applications 248*01826a49SYabin Cuirunning. You can see what these are by looking at either Activity Monitor on Mac or Task Manager 249*01826a49SYabin Cuion Windows. You will get more stable benchmark results of you end those processes as well. 250*01826a49SYabin Cui * If you have multiple cores, you can even run your benchmark on a reserved core to prevent 251*01826a49SYabin Cui pollution from other OS and user processes. There are a number of ways to do this depending 252*01826a49SYabin Cui on your OS: 253*01826a49SYabin Cui * On linux boxes, you have use https://github.com/lpechacek/cpuset. 254*01826a49SYabin Cui * On Windows, you can "Set Processor Affinity" using https://www.thewindowsclub.com/processor-affinity-windows 255*01826a49SYabin Cui * On Mac, you can try to use their dedicated affinity API https://developer.apple.com/library/archive/releasenotes/Performance/RN-AffinityAPI/#//apple_ref/doc/uid/TP40006635-CH1-DontLinkElementID_2 256*01826a49SYabin Cui3. To benchmark, you will likely end up writing a separate c/c++ program that will link libzstd. 257*01826a49SYabin CuiDynamically linking your library will introduce some added variation (not a large amount but 258*01826a49SYabin Cuidefinitely some). Statically linking libzstd will be more stable. Static libraries should 259*01826a49SYabin Cuibe enabled by default when building zstd. 260*01826a49SYabin Cui4. Use a profiler with a good high resolution timer. See the section below on profiling for 261*01826a49SYabin Cuidetails on this. 262*01826a49SYabin Cui5. Disable frequency scaling, turbo boost and address space randomization (this will vary by OS) 263*01826a49SYabin Cui6. Try to avoid storage. On some systems you can use tmpfs. Putting the program, inputs and outputs on 264*01826a49SYabin Cuitmpfs avoids touching a real storage system, which can have a pretty big variability. 265*01826a49SYabin Cui 266*01826a49SYabin CuiAlso check our LLVM's guide on benchmarking here: https://llvm.org/docs/Benchmarking.html 267*01826a49SYabin Cui 268*01826a49SYabin Cui### Zstd benchmark 269*01826a49SYabin CuiThe fastest signal you can get regarding your performance changes is via the in-build zstd cli 270*01826a49SYabin Cuibench option. You can run Zstd as you typically would for your scenario using some set of options 271*01826a49SYabin Cuiand then additionally also specify the `-b#` option. Doing this will run our benchmarking pipeline 272*01826a49SYabin Cuifor that options you have just provided. If you want to look at the internals of how this 273*01826a49SYabin Cuibenchmarking script works, you can check out programs/benchzstd.c 274*01826a49SYabin Cui 275*01826a49SYabin CuiFor example: say you have made a change that you believe improves the speed of zstd level 1. The 276*01826a49SYabin Cuivery first thing you should use to assess whether you actually achieved any sort of improvement 277*01826a49SYabin Cuiis `zstd -b`. You might try to do something like this. Note: you can use the `-i` option to 278*01826a49SYabin Cuispecify a running time for your benchmark in seconds (default is 3 seconds). 279*01826a49SYabin CuiUsually, the longer the running time, the more stable your results will be. 280*01826a49SYabin Cui 281*01826a49SYabin Cui``` 282*01826a49SYabin Cui$ git checkout <commit-before-your-change> 283*01826a49SYabin Cui$ make && cp zstd zstd-old 284*01826a49SYabin Cui$ git checkout <commit-after-your-change> 285*01826a49SYabin Cui$ make && cp zstd zstd-new 286*01826a49SYabin Cui$ zstd-old -i5 -b1 <your-test-data> 287*01826a49SYabin Cui 1<your-test-data> : 8990 -> 3992 (2.252), 302.6 MB/s , 626.4 MB/s 288*01826a49SYabin Cui$ zstd-new -i5 -b1 <your-test-data> 289*01826a49SYabin Cui 1<your-test-data> : 8990 -> 3992 (2.252), 302.8 MB/s , 628.4 MB/s 290*01826a49SYabin Cui``` 291*01826a49SYabin Cui 292*01826a49SYabin CuiUnless your performance win is large enough to be visible despite the intrinsic noise 293*01826a49SYabin Cuion your computer, benchzstd alone will likely not be enough to validate the impact of your 294*01826a49SYabin Cuichanges. For example, the results of the example above indicate that effectively nothing 295*01826a49SYabin Cuichanged but there could be a small <3% improvement that the noise on the host machine 296*01826a49SYabin Cuiobscured. So unless you see a large performance win (10-15% consistently) using just 297*01826a49SYabin Cuithis method of evaluation will not be sufficient. 298*01826a49SYabin Cui 299*01826a49SYabin Cui### Profiling 300*01826a49SYabin CuiThere are a number of great profilers out there. We're going to briefly mention how you can 301*01826a49SYabin Cuiprofile your code using `instruments` on mac, `perf` on linux and `visual studio profiler` 302*01826a49SYabin Cuion Windows. 303*01826a49SYabin Cui 304*01826a49SYabin CuiSay you have an idea for a change that you think will provide some good performance gains 305*01826a49SYabin Cuifor level 1 compression on Zstd. Typically this means, you have identified a section of 306*01826a49SYabin Cuicode that you think can be made to run faster. 307*01826a49SYabin Cui 308*01826a49SYabin CuiThe first thing you will want to do is make sure that the piece of code is actually taking up 309*01826a49SYabin Cuia notable amount of time to run. It is usually not worth optimizing something which accounts for less than 310*01826a49SYabin Cui0.0001% of the total running time. Luckily, there are tools to help with this. 311*01826a49SYabin CuiProfilers will let you see how much time your code spends inside a particular function. 312*01826a49SYabin CuiIf your target code snippet is only part of a function, it might be worth trying to 313*01826a49SYabin Cuiisolate that snippet by moving it to its own function (this is usually not necessary but 314*01826a49SYabin Cuimight be). 315*01826a49SYabin Cui 316*01826a49SYabin CuiMost profilers (including the profilers discussed below) will generate a call graph of 317*01826a49SYabin Cuifunctions for you. Your goal will be to find your function of interest in this call graph 318*01826a49SYabin Cuiand then inspect the time spent inside of it. You might also want to look at the annotated 319*01826a49SYabin Cuiassembly which most profilers will provide you with. 320*01826a49SYabin Cui 321*01826a49SYabin Cui#### Instruments 322*01826a49SYabin CuiWe will once again consider the scenario where you think you've identified a piece of code 323*01826a49SYabin Cuiwhose performance can be improved upon. Follow these steps to profile your code using 324*01826a49SYabin CuiInstruments. 325*01826a49SYabin Cui 326*01826a49SYabin Cui1. Open Instruments 327*01826a49SYabin Cui2. Select `Time Profiler` from the list of standard templates 328*01826a49SYabin Cui3. Close all other applications except for your instruments window and your terminal 329*01826a49SYabin Cui4. Run your benchmarking script from your terminal window 330*01826a49SYabin Cui * You will want a benchmark that runs for at least a few seconds (5 seconds will 331*01826a49SYabin Cui usually be long enough). This way the profiler will have something to work with 332*01826a49SYabin Cui and you will have ample time to attach your profiler to this process:) 333*01826a49SYabin Cui * I will just use benchzstd as my benchmarmking script for this example: 334*01826a49SYabin Cui``` 335*01826a49SYabin Cui$ zstd -b1 -i5 <my-data> # this will run for 5 seconds 336*01826a49SYabin Cui``` 337*01826a49SYabin Cui5. Once you run your benchmarking script, switch back over to instruments and attach your 338*01826a49SYabin Cuiprocess to the time profiler. You can do this by: 339*01826a49SYabin Cui * Clicking on the `All Processes` drop down in the top left of the toolbar. 340*01826a49SYabin Cui * Selecting your process from the dropdown. In my case, it is just going to be labeled 341*01826a49SYabin Cui `zstd` 342*01826a49SYabin Cui * Hitting the bright red record circle button on the top left of the toolbar 343*01826a49SYabin Cui6. You profiler will now start collecting metrics from your benchmarking script. Once 344*01826a49SYabin Cuiyou think you have collected enough samples (usually this is the case after 3 seconds of 345*01826a49SYabin Cuirecording), stop your profiler. 346*01826a49SYabin Cui7. Make sure that in toolbar of the bottom window, `profile` is selected. 347*01826a49SYabin Cui8. You should be able to see your call graph. 348*01826a49SYabin Cui * If you don't see the call graph or an incomplete call graph, make sure you have compiled 349*01826a49SYabin Cui zstd and your benchmarking script using debug flags. On mac and linux, this just means 350*01826a49SYabin Cui you will have to supply the `-g` flag alone with your build script. You might also 351*01826a49SYabin Cui have to provide the `-fno-omit-frame-pointer` flag 352*01826a49SYabin Cui9. Dig down the graph to find your function call and then inspect it by double clicking 353*01826a49SYabin Cuithe list item. You will be able to see the annotated source code and the assembly side by 354*01826a49SYabin Cuiside. 355*01826a49SYabin Cui 356*01826a49SYabin Cui#### Perf 357*01826a49SYabin Cui 358*01826a49SYabin CuiThis wiki has a pretty detailed tutorial on getting started working with perf so we'll 359*01826a49SYabin Cuileave you to check that out of you're getting started: 360*01826a49SYabin Cui 361*01826a49SYabin Cuihttps://perf.wiki.kernel.org/index.php/Tutorial 362*01826a49SYabin Cui 363*01826a49SYabin CuiSome general notes on perf: 364*01826a49SYabin Cui* Use `perf stat -r # <bench-program>` to quickly get some relevant timing and 365*01826a49SYabin Cuicounter statistics. Perf uses a high resolution timer and this is likely one 366*01826a49SYabin Cuiof the first things your team will run when assessing your PR. 367*01826a49SYabin Cui* Perf has a long list of hardware counters that can be viewed with `perf --list`. 368*01826a49SYabin CuiWhen measuring optimizations, something worth trying is to make sure the hardware 369*01826a49SYabin Cuicounters you expect to be impacted by your change are in fact being so. For example, 370*01826a49SYabin Cuiif you expect the L1 cache misses to decrease with your change, you can look at the 371*01826a49SYabin Cuicounter `L1-dcache-load-misses` 372*01826a49SYabin Cui* Perf hardware counters will not work on a virtual machine. 373*01826a49SYabin Cui 374*01826a49SYabin Cui#### Visual Studio 375*01826a49SYabin Cui 376*01826a49SYabin CuiTODO 377*01826a49SYabin Cui 378*01826a49SYabin Cui## Issues 379*01826a49SYabin CuiWe use GitHub issues to track public bugs. Please ensure your description is 380*01826a49SYabin Cuiclear and has sufficient instructions to be able to reproduce the issue. 381*01826a49SYabin Cui 382*01826a49SYabin CuiFacebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe 383*01826a49SYabin Cuidisclosure of security bugs. In those cases, please go through the process 384*01826a49SYabin Cuioutlined on that page and do not file a public issue. 385*01826a49SYabin Cui 386*01826a49SYabin Cui## Coding Style 387*01826a49SYabin CuiIt's a pretty long topic, which is difficult to summarize in a single paragraph. 388*01826a49SYabin CuiAs a rule of thumbs, try to imitate the coding style of 389*01826a49SYabin Cuisimilar lines of codes around your contribution. 390*01826a49SYabin CuiThe following is a non-exhaustive list of rules employed in zstd code base: 391*01826a49SYabin Cui 392*01826a49SYabin Cui### C90 393*01826a49SYabin CuiThis code base is following strict C90 standard, 394*01826a49SYabin Cuiwith 2 extensions : 64-bit `long long` types, and variadic macros. 395*01826a49SYabin CuiThis rule is applied strictly to code within `lib/` and `programs/`. 396*01826a49SYabin CuiSub-project in `contrib/` are allowed to use other conventions. 397*01826a49SYabin Cui 398*01826a49SYabin Cui### C++ direct compatibility : symbol mangling 399*01826a49SYabin CuiAll public symbol declarations must be wrapped in `extern “C” { … }`, 400*01826a49SYabin Cuiso that this project can be compiled as C++98 code, 401*01826a49SYabin Cuiand linked into C++ applications. 402*01826a49SYabin Cui 403*01826a49SYabin Cui### Minimal Frugal 404*01826a49SYabin CuiThis design requirement is fundamental to preserve the portability of the code base. 405*01826a49SYabin Cui#### Dependencies 406*01826a49SYabin Cui- Reduce dependencies to the minimum possible level. 407*01826a49SYabin Cui Any dependency should be considered “bad” by default, 408*01826a49SYabin Cui and only tolerated because it provides a service in a better way than can be achieved locally. 409*01826a49SYabin Cui The only external dependencies this repository tolerates are 410*01826a49SYabin Cui standard C libraries, and in rare cases, system level headers. 411*01826a49SYabin Cui- Within `lib/`, this policy is even more drastic. 412*01826a49SYabin Cui The only external dependencies allowed are `<assert.h>`, `<stdlib.h>`, `<string.h>`, 413*01826a49SYabin Cui and even then, not directly. 414*01826a49SYabin Cui In particular, no function shall ever allocate on heap directly, 415*01826a49SYabin Cui and must use instead `ZSTD_malloc()` and equivalent. 416*01826a49SYabin Cui Other accepted non-symbol headers are `<stddef.h>` and `<limits.h>`. 417*01826a49SYabin Cui- Within the project, there is a strict hierarchy of dependencies that must be respected. 418*01826a49SYabin Cui `programs/` is allowed to depend on `lib/`, but only its public API. 419*01826a49SYabin Cui Within `lib/`, `lib/common` doesn't depend on any other directory. 420*01826a49SYabin Cui `lib/compress` and `lib/decompress` shall not depend on each other. 421*01826a49SYabin Cui `lib/dictBuilder` can depend on `lib/common` and `lib/compress`, but not `lib/decompress`. 422*01826a49SYabin Cui#### Resources 423*01826a49SYabin Cui- Functions in `lib/` must use very little stack space, 424*01826a49SYabin Cui several dozens of bytes max. 425*01826a49SYabin Cui Everything larger must use the heap allocator, 426*01826a49SYabin Cui or require a scratch buffer to be emplaced manually. 427*01826a49SYabin Cui 428*01826a49SYabin Cui### Naming 429*01826a49SYabin Cui* All public symbols are prefixed with `ZSTD_` 430*01826a49SYabin Cui + private symbols, with a scope limited to their own unit, are free of this restriction. 431*01826a49SYabin Cui However, since `libzstd` source code can be amalgamated, 432*01826a49SYabin Cui each symbol name must attempt to be (and remain) unique. 433*01826a49SYabin Cui Avoid too generic names that could become ground for future collisions. 434*01826a49SYabin Cui This generally implies usage of some form of prefix. 435*01826a49SYabin Cui* For symbols (functions and variables), naming convention is `PREFIX_camelCase`. 436*01826a49SYabin Cui + In some advanced cases, one can also find : 437*01826a49SYabin Cui - `PREFIX_prefix2_camelCase` 438*01826a49SYabin Cui - `PREFIX_camelCase_extendedQualifier` 439*01826a49SYabin Cui* Multi-words names generally consist of an action followed by object: 440*01826a49SYabin Cui - for example : `ZSTD_createCCtx()` 441*01826a49SYabin Cui* Prefer positive actions 442*01826a49SYabin Cui - `goBackward` rather than `notGoForward` 443*01826a49SYabin Cui* Type names (`struct`, etc.) follow similar convention, 444*01826a49SYabin Cui except that they are allowed and even invited to start by an Uppercase letter. 445*01826a49SYabin Cui Example : `ZSTD_CCtx`, `ZSTD_CDict` 446*01826a49SYabin Cui* Macro names are all Capital letters. 447*01826a49SYabin Cui The same composition rules (`PREFIX_NAME_QUALIFIER`) apply. 448*01826a49SYabin Cui* File names are all lowercase letters. 449*01826a49SYabin Cui The convention is `snake_case`. 450*01826a49SYabin Cui File names **must** be unique across the entire code base, 451*01826a49SYabin Cui even when they stand in clearly separated directories. 452*01826a49SYabin Cui 453*01826a49SYabin Cui### Qualifiers 454*01826a49SYabin Cui* This code base is `const` friendly, if not `const` fanatical. 455*01826a49SYabin Cui Any variable that can be `const` (aka. read-only) **must** be `const`. 456*01826a49SYabin Cui Any pointer which content will not be modified must be `const`. 457*01826a49SYabin Cui This property is then controlled at compiler level. 458*01826a49SYabin Cui `const` variables are an important signal to readers that this variable isn't modified. 459*01826a49SYabin Cui Conversely, non-const variables are a signal to readers to watch out for modifications later on in the function. 460*01826a49SYabin Cui* If a function must be inlined, mention it explicitly, 461*01826a49SYabin Cui using project's own portable macros, such as `FORCE_INLINE_ATTR`, 462*01826a49SYabin Cui defined in `lib/common/compiler.h`. 463*01826a49SYabin Cui 464*01826a49SYabin Cui### Debugging 465*01826a49SYabin Cui* **Assertions** are welcome, and should be used very liberally, 466*01826a49SYabin Cui to control any condition the code expects for its correct execution. 467*01826a49SYabin Cui These assertion checks will be run in debug builds, and disabled in production. 468*01826a49SYabin Cui* For traces, this project provides its own debug macros, 469*01826a49SYabin Cui in particular `DEBUGLOG(level, ...)`, defined in `lib/common/debug.h`. 470*01826a49SYabin Cui 471*01826a49SYabin Cui### Code documentation 472*01826a49SYabin Cui* Avoid code documentation that merely repeats what the code is already stating. 473*01826a49SYabin Cui Whenever applicable, prefer employing the code as the primary way to convey explanations. 474*01826a49SYabin Cui Example 1 : `int nbTokens = n;` instead of `int i = n; /* i is a nb of tokens *./`. 475*01826a49SYabin Cui Example 2 : `assert(size > 0);` instead of `/* here, size should be positive */`. 476*01826a49SYabin Cui* At declaration level, the documentation explains how to use the function or variable 477*01826a49SYabin Cui and when applicable why it's needed, of the scenarios where it can be useful. 478*01826a49SYabin Cui* At implementation level, the documentation explains the general outline of the algorithm employed, 479*01826a49SYabin Cui and when applicable why this specific choice was preferred. 480*01826a49SYabin Cui 481*01826a49SYabin Cui### General layout 482*01826a49SYabin Cui* 4 spaces for indentation rather than tabs 483*01826a49SYabin Cui* Code documentation shall directly precede function declaration or implementation 484*01826a49SYabin Cui* Function implementations and its code documentation should be preceded and followed by an empty line 485*01826a49SYabin Cui 486*01826a49SYabin Cui 487*01826a49SYabin Cui## License 488*01826a49SYabin CuiBy contributing to Zstandard, you agree that your contributions will be licensed 489*01826a49SYabin Cuiunder both the [LICENSE](LICENSE) file and the [COPYING](COPYING) file in the root directory of this source tree. 490