Decompressor Errata
===================

This document captures known decompressor bugs, where the decompressor rejects a valid zstd frame.
Each entry contains:
1. The last affected decompressor version.
2. The decompressor components affected.
3. Whether the compressed frame could ever be produced by the reference compressor.
4. An example frame (a hexadecimal string when it is short enough, a link to a golden file otherwise).
5. A description of the bug.

The document is in reverse chronological order, with the bugs that affect the most recent zstd decompressor versions listed first.


No sequence using the 2-byte format
------------------------------------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/zeroSeq_2B.zst

The zstd decoder incorrectly expected FSE tables when a block contained 0 sequences
and the value 0 was encoded using the 2-byte format.
Instead, it should have immediately ended the sequences section and moved on to the next block.
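
To make the two encodings of zero concrete, here is a minimal sketch of how the `Number_of_Sequences` field at the start of the sequences section can be decoded, following the 1-, 2- and 3-byte layouts in the format specification. The function names are hypothetical and are not taken from the zstd sources.

```python
def decode_number_of_sequences(section: bytes) -> tuple[int, int]:
    """Return (number_of_sequences, header_bytes_consumed)."""
    byte0 = section[0]
    if byte0 < 128:
        # 1-byte format: the value is stored directly (0 means no sequences).
        return byte0, 1
    if byte0 < 255:
        # 2-byte format: ((byte0 - 128) << 8) + byte1.
        return ((byte0 - 128) << 8) + section[1], 2
    # 3-byte format: byte1 + (byte2 << 8) + 0x7F00.
    return section[1] + (section[2] << 8) + 0x7F00, 3


def sequences_section_is_finished(section: bytes) -> bool:
    """True when the count is 0, so no modes byte or FSE table follows."""
    count, _ = decode_number_of_sequences(section)
    return count == 0


# Zero written in the 1-byte format and in the 2-byte format.
print(decode_number_of_sequences(b"\x00"))
print(decode_number_of_sequences(b"\x80\x00"))
```

Both inputs decode to a count of 0. Only the second one, where the count occupies 2 bytes, corresponds to the case described in this entry.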

This situation was never generated by the reference compressor,
because representing 0 sequences with the 2-byte format is inefficient
(the 1-byte format is always used in this case).


Compressed block with a size of exactly 128 KB
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/block-128k.zst

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` when their size was exactly 128 KB.
Note that `128 KB - 1` was accepted, and `128 KB + 1` is forbidden by the spec.
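
As an illustration of the boundary involved, the sketch below packs and unpacks the 3-byte block header (`Last_Block` in bit 0, `Block_Type` in bits 1-2, `Block_Size` in the remaining 21 bits, stored little-endian) and applies an inclusive size limit. The helper names and the 1 MiB window size are assumptions made for the example. This is not the decoder's actual check.

```python
BLOCK_SIZE_MAX = 128 * 1024  # 131072 bytes
COMPRESSED_BLOCK = 2         # Block_Type value for Compressed_Block


def encode_block_header(block_size: int, block_type: int, last_block: bool) -> bytes:
    """Pack the three header fields into 3 little-endian bytes."""
    value = (block_size << 3) | (block_type << 1) | int(last_block)
    return value.to_bytes(3, "little")


def decode_block_header(header: bytes) -> tuple[bool, int, int]:
    """Return (last_block, block_type, block_size)."""
    value = int.from_bytes(header[:3], "little")
    return bool(value & 1), (value >> 1) & 3, value >> 3


def block_size_is_allowed(block_size: int, window_size: int) -> bool:
    # The limit is inclusive: exactly 128 KB is valid, 128 KB + 1 is not.
    return block_size <= min(window_size, BLOCK_SIZE_MAX)


header = encode_block_header(BLOCK_SIZE_MAX, COMPRESSED_BLOCK, last_block=True)
print(header.hex(), decode_block_header(header))
print(block_size_is_allowed(BLOCK_SIZE_MAX, window_size=1 << 20))
print(block_size_is_allowed(BLOCK_SIZE_MAX + 1, window_size=1 << 20))
```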

This type of block was never generated by the reference compressor.

These blocks were disallowed by the spec up until spec version 0.3.2, when the restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689). The restriction read:

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).


Compressed block with 0 literals and 0 sequences
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd 2000 1500 0000 00`

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` that encode literals as `Raw_Literals_Block` with no literals and have no sequences.
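
The example frame is short enough to take apart by hand. The following sketch is not a general parser: it assumes the exact layout of this frame (descriptor `0x20`, so a 1-byte `Frame_Content_Size` and no window descriptor) and a 1-byte literals header. The function name and the returned keys are made up for illustration.

```python
FRAME = bytes.fromhex("28b5 2ffd 2000 1500 0000 00")


def describe_empty_block_frame(frame: bytes) -> dict:
    """Decode the fields of the example frame into a dictionary."""
    magic = int.from_bytes(frame[0:4], "little")
    descriptor = frame[4]
    if descriptor != 0x20:
        raise ValueError("sketch only handles Frame_Header_Descriptor 0x20")
    content_size = frame[5]                      # 1-byte Frame_Content_Size
    header = int.from_bytes(frame[6:9], "little")
    block_size = header >> 3
    block = frame[9:9 + block_size]
    literals_header = block[0]
    return {
        "magic": magic,
        "content_size": content_size,
        "last_block": bool(header & 1),
        "block_type": (header >> 1) & 3,         # 2 = Compressed_Block
        "block_size": block_size,
        "literals_type": literals_header & 3,    # 0 = Raw_Literals_Block
        "literals_size": literals_header >> 3,   # 5-bit size, 1-byte header
        "num_sequences": block[1],               # 1-byte count
    }


for name, value in describe_empty_block_frame(FRAME).items():
    print(f"{name}: {value}")
```

The block content is just two zero bytes: a raw literals header announcing 0 literals, followed by a sequence count of 0.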

This type of block was never generated by the reference compressor.

Additionally, these blocks were disallowed by the spec up until spec version 0.3.2, when the restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689). The restriction read:

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).


First block is RLE block
------------------------

**Last affected version**: v1.4.3

**Affected decompressor component(s)**: CLI only

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd a001 0002 0002 0010 000b 0000 00`

The zstd CLI decompressor rejected frames in which the first block was an RLE block with a `Block_Size` of 131072 and the frame contained more than one block.
This only affected the zstd CLI, and not the library.

The example is an RLE block with 131072 bytes, followed by a second RLE block with 1 byte.
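
The structure of the example frame can be checked with a small block walker. This is a sketch under stated assumptions: it only understands `Raw_Block` and `RLE_Block`, ignores any content checksum, and uses hypothetical function names. It is meant to show the two RLE blocks, not to replace a real decoder.

```python
FRAME = bytes.fromhex("28b5 2ffd a001 0002 0002 0010 000b 0000 00")


def frame_header_size(descriptor: int) -> int:
    """Size of the frame header, descriptor byte included."""
    fcs_flag = descriptor >> 6
    single_segment = (descriptor >> 5) & 1
    dict_id_size = (0, 1, 2, 4)[descriptor & 3]
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    window_descriptor_size = 0 if single_segment else 1
    return 1 + window_descriptor_size + dict_id_size + fcs_size


def decode_raw_and_rle_frame(frame: bytes) -> bytes:
    """Regenerate the content of a frame made only of Raw and RLE blocks."""
    if frame[:4] != bytes.fromhex("28b52ffd"):
        raise ValueError("not a standard zstd frame")
    pos = 4 + frame_header_size(frame[4])
    out = bytearray()
    while True:
        header = int.from_bytes(frame[pos:pos + 3], "little")
        pos += 3
        last_block = header & 1
        block_type = (header >> 1) & 3
        block_size = header >> 3
        if block_type == 0:    # Raw_Block: block_size bytes copied verbatim
            out += frame[pos:pos + block_size]
            pos += block_size
        elif block_type == 1:  # RLE_Block: one byte repeated block_size times
            out += frame[pos:pos + 1] * block_size
            pos += 1
        else:
            raise ValueError("sketch handles Raw and RLE blocks only")
        if last_block:
            return bytes(out)


data = decode_raw_and_rle_frame(FRAME)
print(len(data), set(data))
```

The regenerated content is 131073 zero bytes (131072 from the first block plus 1 from the second), which matches the 4-byte `Frame_Content_Size` stored in the header.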

The compressor currently works around this limitation by never producing an RLE block as the first
block.

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L3527-L3535


Tiny FSE Table & Block
----------------------

**Last affected version**: v1.3.4

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: Possibly until version v1.3.4, but probably never

**Example Frame**: `28b5 2ffd 2027 c500 0080 f3f1 f0ec ebc6 c5c7 f09d 4300 0000 e0e0 0658 0100 603e 52`

The zstd library rejected blocks of type `Compressed_Block` in which the last table with type `FSE_Compressed_Mode` started less than 4 bytes from the end of the block.

In more depth, let `Last_Table_Offset` be the offset in the compressed block (excluding the header) at which
the last table with type `FSE_Compressed_Mode` started. If `Block_Content - Last_Table_Offset < 4` then
the buggy zstd decompressor would reject the block. This occurs when the last serialized table is 2 bytes
and the bitstream size is 1 byte.

For example:
* There is 1 sequence in the block
* `Literals_Lengths_Mode` is `FSE_Compressed_Mode` & the serialized table size is 2 bytes
* `Offsets_Mode` is `Predefined_Mode`
* `Match_Lengths_Mode` is `Predefined_Mode`
* The bitstream is 1 byte. E.g. there is only one sequence and it fits in 1 byte.

The total `Block_Content` is `5` bytes, and `Last_Table_Offset` is `2`.
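
The condition can be written as a one-line predicate. The sketch below only restates the inequality from this entry with the numbers given above. The function name is hypothetical, and the code does not reproduce how the decoder computed these values.

```python
def hits_tiny_fse_table_bug(block_content_size: int, last_table_offset: int) -> bool:
    """True when the last FSE table starts less than 4 bytes before the block end."""
    return block_content_size - last_table_offset < 4


# Numbers from the example above: 5 - 2 = 3, which is below the threshold of 4.
print(hits_tiny_fse_table_bug(block_content_size=5, last_table_offset=2))
# One more byte after the table start would have been enough to avoid the bug.
print(hits_tiny_fse_table_bug(block_content_size=6, last_table_offset=2))
```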

See the compressor workaround code:

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L2667-L2682


Magicless format
----------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library

**Produced by the reference compressor**: Yes (example: https://gist.github.com/embg/9940726094f4cf2cef162cffe9319232)

**Example Frame**: `27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59`

v1.5.6 fixes several bugs in which the magicless-format decoder rejects valid frames.
These include but are not limited to:
* Valid frames that happen to begin with a legacy magic number (little-endian)
* Valid frames that happen to begin with a skippable magic number (little-endian)

If you are affected by this issue and cannot update to v1.5.6 or later, there is a
workaround to recover affected data. Simply prepend the ZSTD magic number
`0xFD2FB528` (little-endian) to your data and decompress using the standard-format
decoder.
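
A minimal sketch of the workaround, assuming the magicless frame is already in memory as bytes. The function name is hypothetical. The resulting buffer still has to be handed to a standard-format zstd decoder, which is not shown here.

```python
ZSTD_MAGIC = 0xFD2FB528


def to_standard_format(magicless_frame: bytes) -> bytes:
    """Prepend the 4-byte magic number, stored little-endian as 28 b5 2f fd."""
    return ZSTD_MAGIC.to_bytes(4, "little") + magicless_frame


# The example frame from this entry. Its first four bytes, read as a
# little-endian integer, give 0xFD2FB527, one below the standard magic number.
example = bytes.fromhex("27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59")
fixed = to_standard_format(example)
print(fixed.hex(" "))
```

Because the magic number is simply added in front, the original 16 bytes are left untouched and the frame grows to 20 bytes.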