<p align="center"><img src="https://raw.githubusercontent.com/facebook/zstd/dev/doc/images/zstd_logo86.png" alt="Zstandard"></p>

__Zstandard__, or `zstd` for short, is a fast lossless compression algorithm,
targeting real-time compression scenarios at zlib-level and better compression ratios.
It's backed by a very fast entropy stage, provided by the [Huff0 and FSE library](https://github.com/Cyan4973/FiniteStateEntropy).

Zstandard's format is stable and documented in [RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple independent implementations are already available.
This repository represents the reference implementation, provided as an open-source dual [BSD](LICENSE) OR [GPLv2](COPYING) licensed **C** library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz` and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on the [Zstandard homepage](https://facebook.github.io/zstd/#other-languages).

**Development branch status:**

[![Build Status][travisDevBadge]][travisLink]
[![Build status][CircleDevBadge]][CircleLink]
[![Build status][CirrusDevBadge]][CirrusLink]
[![Fuzzing Status][OSSFuzzBadge]][OSSFuzzLink]

[travisDevBadge]: https://api.travis-ci.com/facebook/zstd.svg?branch=dev "Continuous Integration test suite"
[travisLink]: https://travis-ci.com/facebook/zstd
[CircleDevBadge]: https://circleci.com/gh/facebook/zstd/tree/dev.svg?style=shield "Short test suite"
[CircleLink]: https://circleci.com/gh/facebook/zstd
[CirrusDevBadge]: https://api.cirrus-ci.com/github/facebook/zstd.svg?branch=dev
[CirrusLink]: https://cirrus-ci.com/github/facebook/zstd
[OSSFuzzBadge]: https://oss-fuzz-build-logs.storage.googleapis.com/badges/zstd.svg
[OSSFuzzLink]: https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:zstd

## Benchmarks

For reference, several fast compression algorithms were tested and compared
on a desktop running Ubuntu 20.04 (`Linux 5.11.0-41-generic`),
with a Core i7-9700K CPU @ 4.9GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with [gcc] 9.3.0,
on the [Silesia compression corpus].

[lzbench]: https://github.com/inikep/lzbench
[Silesia compression corpus]: https://sun.aei.polsl.pl//~sdeor/index.php?page=silesia
[gcc]: https://gcc.gnu.org/

| Compressor name         | Ratio | Compression speed | Decompression speed |
| ----------------------- | ----- | ----------------- | ------------------- |
| **zstd 1.5.1 -1**       | 2.887 |          530 MB/s |           1700 MB/s |
| [zlib] 1.2.11 -1        | 2.743 |           95 MB/s |            400 MB/s |
| brotli 1.0.9 -0         | 2.702 |          395 MB/s |            450 MB/s |
| **zstd 1.5.1 --fast=1** | 2.437 |          600 MB/s |           2150 MB/s |
| **zstd 1.5.1 --fast=3** | 2.239 |          670 MB/s |           2250 MB/s |
| quicklz 1.5.0 -1        | 2.238 |          540 MB/s |            760 MB/s |
| **zstd 1.5.1 --fast=4** | 2.148 |          710 MB/s |           2300 MB/s |
| lzo1x 2.10 -1           | 2.106 |          660 MB/s |            845 MB/s |
| [lz4] 1.9.3             | 2.101 |          740 MB/s |           4500 MB/s |
| lzf 3.6 -1              | 2.077 |          410 MB/s |            830 MB/s |
| snappy 1.1.9            | 2.073 |          550 MB/s |           1750 MB/s |

[zlib]: https://www.zlib.net/
[lz4]: https://lz4.github.io/lz4/

The negative compression levels, specified with `--fast=#`,
offer faster compression and decompression speed
at the cost of compression ratio (compared to level 1).

Zstd can also offer stronger compression ratios at the cost of compression speed.
The speed vs. compression trade-off is configurable in small increments.
Decompression speed is preserved and remains roughly the same at all settings,
a property shared by most LZ compression algorithms, such as [zlib] or lzma.
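
As a rough illustration of this trade-off, the sketch below compresses one synthetic file at a negative level, at level 1, and at level 19, then prints the resulting sizes. The sample data, the temporary directory and the output file names are assumptions made for the example; actual ratios and timings depend entirely on your data and hardware.

```bash
set -e

# Build a small, compressible sample file (synthetic data, for illustration only).
workdir=$(mktemp -d)
for i in $(seq 1 20000); do
  printf 'record %d: the quick brown fox jumps over the lazy dog\n' "$i"
done > "$workdir/sample.txt"

# Compress the same input at three points of the speed/ratio range.
zstd -q -f --fast=3 "$workdir/sample.txt" -o "$workdir/fast3.zst"
zstd -q -f -1       "$workdir/sample.txt" -o "$workdir/level1.zst"
zstd -q -f -19      "$workdir/sample.txt" -o "$workdir/level19.zst"

# Report the sizes in bytes.
wc -c "$workdir/sample.txt" "$workdir/fast3.zst" "$workdir/level1.zst" "$workdir/level19.zst"

# Whatever level produced it, the output is read back by the same decoder.
zstd -q -d -c "$workdir/level19.zst" | cmp - "$workdir/sample.txt"
```

To measure speed rather than size, the CLI's built-in benchmark mode (`zstd -b1 -e19 FILE`) runs a range of levels in memory and reports ratio and throughput for each.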

The following tests were run
on a server running Linux Debian (`Linux version 4.14.0-3-amd64`)
with a Core i7-6700K CPU @ 4.0GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with [gcc] 7.3.0,
on the [Silesia compression corpus].

Compression Speed vs Ratio | Decompression Speed
---------------------------|--------------------
![Compression Speed vs Ratio](doc/images/CSpeed2.png "Compression Speed vs Ratio") | ![Decompression Speed](doc/images/DSpeed3.png "Decompression Speed")

A few other algorithms can produce higher compression ratios at slower speeds, falling outside of the graph.
For a larger picture including slow modes, [click on this link](doc/images/DCspeed5.png).


## The case for Small Data compression

Previous charts provide results applicable to typical file and stream scenarios (several MB). Small data raises different concerns.

The smaller the amount of data to compress, the more difficult it is to compress. This problem is common to all compression algorithms. The reason is that compression algorithms learn from past data how to compress future data, but at the beginning of a new data set, there is no "past" to build upon.
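
The effect is easy to observe from the command line. The sketch below, which uses made-up JSON-like records and hypothetical file names, compresses one record on its own and then 1000 similar records as a single stream, and compares the compressed cost per record.

```bash
set -e
workdir=$(mktemp -d)

# One record compressed alone: there is no history to exploit.
printf '{"login":"user%d","id":%d,"type":"User","site_admin":false}\n' 1 1 \
  > "$workdir/one.json"
zstd -q -f "$workdir/one.json" -o "$workdir/one.json.zst"

# 1000 similar records in one stream: later records reuse earlier ones.
for i in $(seq 1 1000); do
  printf '{"login":"user%d","id":%d,"type":"User","site_admin":false}\n' "$i" "$i"
done > "$workdir/many.json"
zstd -q -f "$workdir/many.json" -o "$workdir/many.json.zst"

# Sizes in bytes (tr strips the padding some wc implementations add).
one_raw=$(wc -c < "$workdir/one.json" | tr -d ' ')
one_zst=$(wc -c < "$workdir/one.json.zst" | tr -d ' ')
many_raw=$(wc -c < "$workdir/many.json" | tr -d ' ')
many_zst=$(wc -c < "$workdir/many.json.zst" | tr -d ' ')

echo "single record: $one_raw -> $one_zst bytes"
echo "1000 records:  $many_raw -> $many_zst bytes ($((many_zst / 1000)) bytes per record)"
```

The single record barely shrinks, if at all, because the frame overhead and the lack of history dominate; the per-record cost inside the larger stream is far lower.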

To solve this situation, Zstd offers a __training mode__, which can be used to tune the algorithm for a selected type of data.
Training Zstandard is achieved by providing it with a few samples (one file per sample). The result of this training is stored in a file called a "dictionary", which must be loaded before compression and decompression.
Using this dictionary, the compression ratio achievable on small data improves dramatically.

The following example uses the `github-users` [sample set](https://github.com/facebook/zstd/releases/tag/v1.1.3), created from the [GitHub public API](https://developer.github.com/v3/users/#get-all-users).
It consists of roughly 10K records weighing about 1KB each.

Compression Ratio | Compression Speed | Decompression Speed
------------------|-------------------|--------------------
![Compression Ratio](doc/images/dict-cr.png "Compression Ratio") | ![Compression Speed](doc/images/dict-cs.png "Compression Speed") | ![Decompression Speed](doc/images/dict-ds.png "Decompression Speed")


These compression gains are achieved while simultaneously providing _faster_ compression and decompression speeds.

Training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no _universal dictionary_).
Hence, deploying one dictionary per type of data will provide the greatest benefits.
Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.

### Dictionary compression How To:

1. Create the dictionary

   `zstd --train FullPathToTrainingSet/* -o dictionaryName`

2. Compress with dictionary

   `zstd -D dictionaryName FILE`

3. Decompress with dictionary

   `zstd -D dictionaryName --decompress FILE.zst`
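
The three steps can be tried end to end with the sketch below. It is only an illustration under stated assumptions: the training set is synthetic (1000 generated JSON-like records rather than the `github-users` set), the file names are hypothetical, and `--maxdict=4096` is an arbitrary cap chosen because the training set is tiny.

```bash
set -e
workdir=$(mktemp -d)
mkdir "$workdir/samples"

# Synthetic training set: 1000 small records, one file per sample.
for i in $(seq 1 1000); do
  printf '{"login":"user%d","id":%d,"url":"https://api.example.com/users/user%d","type":"User","site_admin":false}\n' \
    "$i" "$i" "$i" > "$workdir/samples/$i.json"
done

# 1. Create the dictionary (size capped, since the samples total only ~100 KB).
zstd -q --train "$workdir"/samples/* --maxdict=4096 -o "$workdir/users.dict"

# A new record that was not part of the training set.
printf '{"login":"user%d","id":%d,"url":"https://api.example.com/users/user%d","type":"User","site_admin":false}\n' \
  98765 98765 98765 > "$workdir/new.json"

# 2. Compress with the dictionary, and without it for comparison.
zstd -q -f -D "$workdir/users.dict" "$workdir/new.json" -o "$workdir/new.dict.zst"
zstd -q -f "$workdir/new.json" -o "$workdir/new.plain.zst"
wc -c "$workdir/new.json" "$workdir/new.plain.zst" "$workdir/new.dict.zst"

# 3. Decompress with the same dictionary and verify the round trip.
zstd -q -f -D "$workdir/users.dict" --decompress "$workdir/new.dict.zst" -o "$workdir/new.out.json"
cmp "$workdir/new.json" "$workdir/new.out.json"
```

The same dictionary file must be available on both sides: a frame compressed with `-D` cannot be decoded without it.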


## Build instructions

`make` is the officially maintained build system of this project.
All other build systems are "compatible" and 3rd-party maintained;
they may feature small differences in advanced options.
When your system allows it, prefer using `make` to build `zstd` and `libzstd`.

### Makefile

If your system is compatible with standard `make` (or `gmake`),
invoking `make` in the root directory will generate the `zstd` CLI there.
It will also create `libzstd` in `lib/`.

Other available options include:
- `make install` : create and install zstd cli, library and man pages
- `make check` : create and run `zstd`, test its behavior on local platform

The `Makefile` follows the [GNU Standard Makefile conventions](https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html),
allowing staged install, standard flags, directory variables and command variables.
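
As a sketch of how these targets and conventions combine, the hypothetical helper below builds a checkout, runs the smoke tests, and performs a staged install. `build_and_stage` and its two arguments are names invented for this example; `DESTDIR` and `prefix` are the standard GNU variables, assumed here to be honored as the conventions describe.

```bash
# build_and_stage SRC_DIR STAGE_DIR
# Builds zstd in SRC_DIR, runs the quick tests, then installs under STAGE_DIR
# instead of the live filesystem (a staged install via DESTDIR).
build_and_stage() {
  src_dir=$1
  stage_dir=$2
  make -C "$src_dir" &&          # zstd CLI in the root directory, libzstd in lib/
  make -C "$src_dir" check &&    # quick local smoke tests
  make -C "$src_dir" install DESTDIR="$stage_dir" prefix=/usr
}
```

Called as `build_and_stage . /tmp/zstd-stage` from the root of a checkout, this would leave the installed tree under `/tmp/zstd-stage/usr`, ready to be packaged or inspected.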

For advanced use cases, specialized compilation flags which control binary generation
are documented in [`lib/README.md`](lib/README.md#modular-build) for the `libzstd` library
and in [`programs/README.md`](programs/README.md#compilation-variables) for the `zstd` CLI.

### cmake

A `cmake` project generator is provided within `build/cmake`.
It can generate Makefiles or other build scripts
to create the `zstd` binary, and `libzstd` dynamic and static libraries.

By default, `CMAKE_BUILD_TYPE` is set to `Release`.
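
A minimal out-of-source build might look like the following sketch. `build_with_cmake` and its arguments are hypothetical names for this example, and the `-S`/`-B` options assume a reasonably recent CMake; the build type is passed explicitly only to make the default visible.

```bash
# build_with_cmake SRC_DIR BUILD_DIR
# Configures the project found in SRC_DIR/build/cmake into BUILD_DIR, then builds it.
build_with_cmake() {
  src_dir=$1
  build_dir=$2
  cmake -S "$src_dir/build/cmake" -B "$build_dir" -DCMAKE_BUILD_TYPE=Release &&
  cmake --build "$build_dir"
}
```

For instance, `build_with_cmake . build-cmake` run from the repository root keeps all generated files inside `build-cmake`.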

#### Support for Fat (Universal2) Output

`zstd` can be built and installed with support for both Apple Silicon (M1/M2) and Intel by using CMake's Universal2 support.
To perform a Fat/Universal2 build and install, use the following commands:

```bash
# Configure a Ninja build targeting both Intel and Apple Silicon architectures
cmake -B build-cmake-debug -S build/cmake -G Ninja -DCMAKE_OSX_ARCHITECTURES="x86_64;x86_64h;arm64"
cd build-cmake-debug
# Build, then install system-wide
ninja
sudo ninja install
```

### Meson

A Meson project is provided within [`build/meson`](build/meson). Follow
build instructions in that directory.

You can also take a look at the [`.travis.yml`](.travis.yml) file for an
example of how Meson is used to build this project.

Note that the default build type is **release**.

### VCPKG
You can build and install zstd using the [vcpkg](https://github.com/Microsoft/vcpkg/) dependency manager:

    git clone https://github.com/Microsoft/vcpkg.git
    cd vcpkg
    ./bootstrap-vcpkg.sh
    ./vcpkg integrate install
    ./vcpkg install zstd

The zstd port in vcpkg is kept up to date by Microsoft team members and community contributors.
If the version is out of date, please [create an issue or pull request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.

### Visual Studio (Windows)

In the `build` directory, you will find additional options:
- Projects for Visual Studio 2005, 2008 and 2010.
  + VS2010 project is compatible with VS2012, VS2013, VS2015 and VS2017.
- Automated build scripts for Visual compiler by [@KrzysFR](https://github.com/KrzysFR), in `build/VS_scripts`,
  which will build `zstd` cli and `libzstd` library without any need to open Visual Studio solution.

### Buck

You can build the zstd binary via buck by executing: `buck build programs:zstd` from the root of the repo.
The output binary will be in `buck-out/gen/programs/`.

### Bazel

You can easily integrate zstd into your Bazel project by using the module hosted on the [Bazel Central Repository](https://registry.bazel.build/modules/zstd).

## Testing

You can run quick local smoke tests by running `make check`.
If you can't use `make`, execute the `playTest.sh` script from the `src/tests` directory.
Two environment variables, `$ZSTD_BIN` and `$DATAGEN_BIN`, are needed for the test script to locate the `zstd` and `datagen` binaries.
For information on CI testing, please refer to `TESTING.md`.

## Status

Zstandard is currently deployed within Facebook and many other large cloud infrastructures.
It is run continuously to compress large amounts of data in multiple formats and use cases.
Zstandard is considered safe for production environments.

## License

Zstandard is dual-licensed under [BSD](LICENSE) OR [GPLv2](COPYING).

## Contributing

The `dev` branch is the one where all contributions are merged before reaching `release`.
If you plan to propose a patch, please commit into the `dev` branch, or its own feature branch.
Direct commits to `release` are not permitted.
For more information, please read [CONTRIBUTING](CONTRIBUTING.md).