<p align="center"><img src="https://raw.githubusercontent.com/facebook/zstd/dev/doc/images/zstd_logo86.png" alt="Zstandard"></p>

__Zstandard__, or `zstd` for short, is a fast lossless compression algorithm,
targeting real-time compression scenarios at zlib-level and better compression ratios.
It's backed by a very fast entropy stage, provided by the [Huff0 and FSE library](https://github.com/Cyan4973/FiniteStateEntropy).

Zstandard's format is stable and documented in [RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple independent implementations are already available.
This repository represents the reference implementation, provided as an open-source dual [BSD](LICENSE) OR [GPLv2](COPYING) licensed **C** library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz` and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on the [Zstandard homepage](https://facebook.github.io/zstd/#other-languages).

**Development branch status:**

[![Build Status][travisDevBadge]][travisLink]
[![Build status][CircleDevBadge]][CircleLink]
[![Build status][CirrusDevBadge]][CirrusLink]
[![Fuzzing Status][OSSFuzzBadge]][OSSFuzzLink]

[travisDevBadge]: https://api.travis-ci.com/facebook/zstd.svg?branch=dev "Continuous Integration test suite"
[travisLink]: https://travis-ci.com/facebook/zstd
[CircleDevBadge]: https://circleci.com/gh/facebook/zstd/tree/dev.svg?style=shield "Short test suite"
[CircleLink]: https://circleci.com/gh/facebook/zstd
[CirrusDevBadge]: https://api.cirrus-ci.com/github/facebook/zstd.svg?branch=dev
[CirrusLink]: https://cirrus-ci.com/github/facebook/zstd
[OSSFuzzBadge]: https://oss-fuzz-build-logs.storage.googleapis.com/badges/zstd.svg
[OSSFuzzLink]: https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:zstd

## Benchmarks

For reference, several fast compression algorithms were tested and compared
on a desktop running Ubuntu 20.04 (`Linux 5.11.0-41-generic`),
with a Core i7-9700K CPU @ 4.9GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with [gcc] 9.3.0,
on the [Silesia compression corpus].

[lzbench]: https://github.com/inikep/lzbench
[Silesia compression corpus]: https://sun.aei.polsl.pl//~sdeor/index.php?page=silesia
[gcc]: https://gcc.gnu.org/

| Compressor name         | Ratio | Compression speed | Decompression speed |
| ----------------------- | ----- | ----------------- | ------------------- |
| **zstd 1.5.1 -1**       | 2.887 | 530 MB/s          | 1700 MB/s           |
| [zlib] 1.2.11 -1        | 2.743 | 95 MB/s           | 400 MB/s            |
| brotli 1.0.9 -0         | 2.702 | 395 MB/s          | 450 MB/s            |
| **zstd 1.5.1 --fast=1** | 2.437 | 600 MB/s          | 2150 MB/s           |
| **zstd 1.5.1 --fast=3** | 2.239 | 670 MB/s          | 2250 MB/s           |
| quicklz 1.5.0 -1        | 2.238 | 540 MB/s          | 760 MB/s            |
| **zstd 1.5.1 --fast=4** | 2.148 | 710 MB/s          | 2300 MB/s           |
| lzo1x 2.10 -1           | 2.106 | 660 MB/s          | 845 MB/s            |
| [lz4] 1.9.3             | 2.101 | 740 MB/s          | 4500 MB/s           |
| lzf 3.6 -1              | 2.077 | 410 MB/s          | 830 MB/s            |
| snappy 1.1.9            | 2.073 | 550 MB/s          | 1750 MB/s           |

[zlib]: https://www.zlib.net/
[lz4]: https://lz4.github.io/lz4/

The negative compression levels, specified with `--fast=#`,
offer faster compression and decompression speed
at the cost of compression ratio (compared to level 1).

Zstd can also offer stronger compression ratios at the cost of compression speed.
Speed vs Compression trade-off is configurable by small increments.
Decompression speed is preserved and remains roughly the same at all settings,
a property shared by most LZ compression algorithms, such as [zlib] or lzma.

The following tests were run
on a server running Linux Debian (`Linux version 4.14.0-3-amd64`)
with a Core i7-6700K CPU @ 4.0GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with [gcc] 7.3.0,
on the [Silesia compression corpus].

Compression Speed vs Ratio | Decompression Speed
---------------------------|--------------------
[Figure: Compression Speed vs Ratio] | [Figure: Decompression Speed]

A few other algorithms can produce higher compression ratios at slower speeds, falling outside of the graph.
For a larger picture including slow modes, [click on this link](doc/images/DCspeed5.png).


## The case for Small Data compression

The previous charts provide results applicable to typical file and stream scenarios (several MB). Small data comes with a different perspective.

The smaller the amount of data to compress, the more difficult it is to compress. This problem is common to all compression algorithms. The reason is that compression algorithms learn from past data how to compress future data, but at the beginning of a new data set there is no "past" to build upon.

To solve this situation, Zstd offers a __training mode__, which can be used to tune the algorithm for a selected type of data.
Training Zstandard is achieved by providing it with a few samples (one file per sample).
The result of this training is stored in a file called a "dictionary", which must be loaded before compression and decompression.
Using this dictionary, the compression ratio achievable on small data improves dramatically.

The following example uses the `github-users` [sample set](https://github.com/facebook/zstd/releases/tag/v1.1.3), created from the [github public API](https://developer.github.com/v3/users/#get-all-users).
It consists of roughly 10K records weighing about 1KB each.

Compression Ratio | Compression Speed | Decompression Speed
------------------|-------------------|--------------------
[Figure: Compression Ratio] | [Figure: Compression Speed] | [Figure: Decompression Speed]


These compression gains are achieved while simultaneously providing _faster_ compression and decompression speeds.

Training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no _universal dictionary_).
Hence, deploying one dictionary per type of data will provide the greatest benefits.
Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.

### Dictionary compression How To:

1. Create the dictionary

   `zstd --train FullPathToTrainingSet/* -o dictionaryName`

2. Compress with dictionary

   `zstd -D dictionaryName FILE`

3. Decompress with dictionary

   `zstd -D dictionaryName --decompress FILE.zst`


## Build instructions

`make` is the officially maintained build system of this project.
All other build systems are "compatible" and 3rd-party maintained;
they may feature small differences in advanced options.
When your system allows it, prefer using `make` to build `zstd` and `libzstd`.

### Makefile

If your system is compatible with standard `make` (or `gmake`),
invoking `make` in the root directory will generate the `zstd` cli in the root directory.
It will also create `libzstd` in `lib/`.

Other available options include:
- `make install` : create and install zstd cli, library and man pages
- `make check` : create and run `zstd`, test its behavior on local platform

The `Makefile` follows the [GNU Standard Makefile conventions](https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html),
allowing staged install, standard flags, directory variables and command variables.

For advanced use cases, specialized compilation flags which control binary generation
are documented in [`lib/README.md`](lib/README.md#modular-build) for the `libzstd` library
and in [`programs/README.md`](programs/README.md#compilation-variables) for the `zstd` CLI.

### cmake

A `cmake` project generator is provided within `build/cmake`.
It can generate Makefiles or other build scripts
to create the `zstd` binary, and `libzstd` dynamic and static libraries.

By default, `CMAKE_BUILD_TYPE` is set to `Release`.

#### Support for Fat (Universal2) Output

`zstd` can be built and installed with support for both Apple Silicon (M1/M2) and Intel by using CMake's Universal2 support.
To perform a Fat/Universal2 build and install, use the following commands:

```bash
# Configure a Ninja build for both Intel and Apple Silicon architectures
cmake -B build-cmake-debug -S build/cmake -G Ninja -DCMAKE_OSX_ARCHITECTURES="x86_64;x86_64h;arm64"
cd build-cmake-debug
# Build, then install system-wide
ninja
sudo ninja install
```

### Meson

A Meson project is provided within [`build/meson`](build/meson). Follow
build instructions in that directory.

You can also take a look at the [`.travis.yml`](.travis.yml) file for an
example of how Meson is used to build this project.

Note that the default build type is **release**.
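
As a rough sketch of what a Meson build can look like, the helper below configures and compiles from `build/meson`. It is not part of the repository: the function name `build_zstd_with_meson` and the default build directory `builddir` are hypothetical, and it assumes `meson` and `ninja` are installed. The instructions in `build/meson` remain the reference.

```bash
# Hypothetical helper (not part of the repository): configure and build with Meson.
# $1: path to the zstd repository root (default: current directory)
# $2: build directory to create (default: builddir, an arbitrary name)
build_zstd_with_meson() {
    repo_root=${1:-.}
    build_dir=${2:-builddir}

    # The Meson project lives in build/meson, not in the repository root.
    if [ ! -d "$repo_root/build/meson" ]; then
        echo "build/meson not found under '$repo_root'" >&2
        return 1
    fi

    # Configure the build directory from the build/meson source directory.
    meson setup "$build_dir" "$repo_root/build/meson" || return 1

    # Compile with Ninja, Meson's default backend on most platforms.
    ninja -C "$build_dir"
}
```

For example, `build_zstd_with_meson /path/to/zstd` would leave the build outputs under `builddir` in the current directory.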

### VCPKG
You can build and install zstd using the [vcpkg](https://github.com/Microsoft/vcpkg/) dependency manager:

    git clone https://github.com/Microsoft/vcpkg.git
    cd vcpkg
    ./bootstrap-vcpkg.sh
    ./vcpkg integrate install
    ./vcpkg install zstd

The zstd port in vcpkg is kept up to date by Microsoft team members and community contributors.
If the version is out of date, please [create an issue or pull request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.

### Visual Studio (Windows)

In the `build` directory, you will find additional possibilities:
- Projects for Visual Studio 2005, 2008 and 2010.
  + VS2010 project is compatible with VS2012, VS2013, VS2015 and VS2017.
- Automated build scripts for Visual compiler by [@KrzysFR](https://github.com/KrzysFR), in `build/VS_scripts`,
  which will build the `zstd` cli and `libzstd` library without any need to open a Visual Studio solution.

### Buck

You can build the zstd binary via buck by executing: `buck build programs:zstd` from the root of the repo.
The output binary will be in `buck-out/gen/programs/`.

### Bazel

You can easily integrate zstd into your Bazel project by using the module hosted on the [Bazel Central Repository](https://registry.bazel.build/modules/zstd).

## Testing

You can run quick local smoke tests by running `make check`.
If you can't use `make`, execute the `playTest.sh` script from the `src/tests` directory.
Two environment variables, `$ZSTD_BIN` and `$DATAGEN_BIN`, are needed for the test script to locate the `zstd` and `datagen` binaries.
For information on CI testing, please refer to `TESTING.md`.

## Status

Zstandard is currently deployed within Facebook and many other large cloud infrastructures.
It is run continuously to compress large amounts of data in multiple formats and use cases.
Zstandard is considered safe for production environments.

## License

Zstandard is dual-licensed under [BSD](LICENSE) OR [GPLv2](COPYING).

## Contributing

The `dev` branch is the one where all contributions are merged before reaching `release`.
If you plan to propose a patch, please commit into the `dev` branch, or its own feature branch.
Direct commits to `release` are not permitted.
For more information, please read [CONTRIBUTING](CONTRIBUTING.md).
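
The branch policy above could be followed with a small helper along these lines. This is only a sketch: `start_feature_branch` and the branch name passed to it are hypothetical, and it assumes it is run inside a local clone that already has a `dev` branch.

```bash
# Hypothetical helper (not part of the repository): start a feature branch from `dev`.
# Run it inside a local clone; $1 is the name of the new branch.
start_feature_branch() {
    branch_name=$1
    if [ -z "$branch_name" ]; then
        echo "usage: start_feature_branch BRANCH_NAME" >&2
        return 1
    fi

    # Patches target `dev`, so the feature branch starts there, not from `release`.
    git checkout dev || return 1
    git checkout -b "$branch_name"
}
```

For example, `start_feature_branch my-fix` switches to `dev` and then creates and checks out `my-fix` on top of it.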