Decompressor Errata
===================

This document captures known decompressor bugs, where the decompressor rejects a valid zstd frame.
Each entry contains:
1. The last affected decompressor version.
2. The decompressor components affected.
3. Whether the compressed frame could ever be produced by the reference compressor.
4. An example frame (hexadecimal string when it can be short enough, link to golden file otherwise).
5. A description of the bug.

The document is in reverse chronological order, with the bugs that affect the most recent zstd decompressor versions listed first.


No sequence using the 2-byte format
------------------------------------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/zeroSeq_2B.zst

The zstd decoder incorrectly expected FSE tables when a block declared 0 sequences
using the 2-byte `Number_of_Sequences` format.
Instead, it should immediately end the Sequences section and move on to the next block.

This situation was never generated by the reference compressor,
because representing 0 sequences with the 2-byte format is inefficient
(the 1-byte format is always used in this case).
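
The `Number_of_Sequences` rules can be sketched in Python as follows. This is a minimal illustration written from the format specification, not code taken from zstd, and the function names are hypothetical. It shows that the 2-byte format can itself encode 0 (bytes `0x80 0x00`), so a decoder has to test the decoded count rather than only `byte0 == 0`.

```python
from typing import Tuple


def parse_number_of_sequences(section: bytes) -> Tuple[int, int]:
    """Decode Number_of_Sequences at the start of a Sequences_Section.

    Returns (number_of_sequences, header_length_in_bytes).
    """
    byte0 = section[0]
    if byte0 < 128:
        # 1-byte format (byte0 == 0 is the usual way to say "no sequences")
        return byte0, 1
    if byte0 < 255:
        # 2-byte format; byte0 == 0x80 followed by 0x00 also encodes 0
        return ((byte0 - 128) << 8) + section[1], 2
    # 3-byte format
    return section[1] + (section[2] << 8) + 0x7F00, 3


def sequences_section_ends(section: bytes) -> bool:
    """True when no Symbol_Compression_Modes byte or FSE tables should follow.

    Checking the decoded count, instead of only byte0 == 0, also covers
    the 2-byte encoding of 0.
    """
    count, _ = parse_number_of_sequences(section)
    return count == 0
```

A decoder that only tests `byte0 == 0` treats `0x80 0x00` as a non-empty section and goes on to read tables that are not there.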

Compressed block with a size of exactly 128 KB
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/block-128k.zst

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` when their size was exactly 128 KB.
Note that `128 KB - 1` was accepted, and `128 KB + 1` is forbidden by the spec.

This type of block was never generated by the reference compressor.

These blocks used to be disallowed by the spec up until spec version 0.3.2, when [PR#1689](https://github.com/facebook/zstd/pull/1689) lifted the following restriction.
Since a block never decompresses to more than 128 KB, it capped a `Compressed_Block` at `128 KB - 1` bytes:

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).
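
A minimal Python sketch of the 3-byte `Block_Header` layout and the size limit involved, assuming the spec's definition of `Block_Maximum_Size` as the smaller of `Window_Size` and 128 KB. The helper names and constants are hypothetical. A conforming check uses `<=`; a decoder with this bug behaved as if the comparison were strict for `Compressed_Block`.

```python
from typing import Tuple

BLOCK_SIZE_MAX = 128 * 1024          # 131072 bytes
RAW_BLOCK, RLE_BLOCK, COMPRESSED_BLOCK = 0, 1, 2


def encode_block_header(block_size: int, block_type: int, last_block: bool) -> bytes:
    """Pack a 3-byte little-endian Block_Header.

    Bit 0 is Last_Block, bits 1-2 are Block_Type, bits 3-23 are Block_Size.
    """
    value = (block_size << 3) | (block_type << 1) | int(last_block)
    return value.to_bytes(3, "little")


def decode_block_header(header: bytes) -> Tuple[int, int, bool]:
    """Inverse of encode_block_header: (Block_Size, Block_Type, Last_Block)."""
    value = int.from_bytes(header[:3], "little")
    return value >> 3, (value >> 1) & 3, bool(value & 1)


def block_size_is_valid(block_size: int, window_size: int) -> bool:
    """Block_Size may equal Block_Maximum_Size, so the check is '<='."""
    return block_size <= min(window_size, BLOCK_SIZE_MAX)
```

For example, `encode_block_header(131072, COMPRESSED_BLOCK, True)` yields the bytes `05 00 10`, a header that affected decoders rejected even though `block_size_is_valid(131072, window_size)` holds for any window of at least 128 KB.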

Compressed block with 0 literals and 0 sequences
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd 2000 1500 0000 00`

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` that encode literals as a `Raw_Literals_Block` with no literals and contain no sequences.

This type of block was never generated by the reference compressor.

Additionally, these blocks were disallowed by the spec up until spec version 0.3.2, when [PR#1689](https://github.com/facebook/zstd/pull/1689) lifted the following restriction.
Such a block decompresses to 0 bytes, so its `Block_Size` can never be strictly less than its decompressed size:

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).
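
To make the example frame concrete, here is a minimal Python sketch that decodes only as far as this entry needs. It is written from our reading of the format specification (the `Frame_Header_Descriptor`, `Block_Header` and 1-byte literals header layouts), not taken from zstd, and the function names are hypothetical.

```python
from typing import Optional, Tuple

ZSTD_MAGIC = 0xFD2FB528


def read_frame_header(frame: bytes) -> Tuple[Optional[int], int]:
    """Return (Frame_Content_Size or None, offset of the first Block_Header)."""
    if int.from_bytes(frame[:4], "little") != ZSTD_MAGIC:
        raise ValueError("not a standard-format zstd frame")
    fhd = frame[4]                           # Frame_Header_Descriptor
    single_segment = (fhd >> 5) & 1
    pos = 5
    if not single_segment:
        pos += 1                             # Window_Descriptor
    pos += (0, 1, 2, 4)[fhd & 3]             # Dictionary_ID field size
    fcs_len = (single_segment, 2, 4, 8)[fhd >> 6]
    if fcs_len == 0:
        return None, pos
    fcs = int.from_bytes(frame[pos:pos + fcs_len], "little")
    if fcs_len == 2:
        fcs += 256                           # the 2-byte field is offset by 256
    return fcs, pos + fcs_len


def inspect_first_block(frame: bytes) -> dict:
    """Decode just enough of the first block to show what it contains."""
    content_size, pos = read_frame_header(frame)
    header = int.from_bytes(frame[pos:pos + 3], "little")
    block_size = header >> 3
    content = frame[pos + 3:pos + 3 + block_size]
    lit_header = content[0]
    lit_type = lit_header & 3
    if lit_type > 1 or (lit_header >> 2) & 1:
        raise NotImplementedError("sketch handles only 1-byte Raw/RLE literals headers")
    lit_size = lit_header >> 3
    seq_start = 1 + (lit_size if lit_type == 0 else 1)   # skip the literals payload
    return {
        "frame_content_size": content_size,
        "last_block": bool(header & 1),
        "block_type": (header >> 1) & 3,     # 2 == Compressed_Block
        "block_size": block_size,
        "literals_type": lit_type,           # 0 == Raw_Literals_Block
        "regenerated_literals": lit_size,
        "sequences_byte0": content[seq_start],  # 0 == no sequences
    }
```

For this frame the sketch reports a 2-byte `Compressed_Block` holding an empty `Raw_Literals_Block` and a zero `Number_of_Sequences` byte, in a frame whose declared content size is 0.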

First block is an RLE block
------------------------

**Last affected version**: v1.4.3

**Affected decompressor component(s)**: CLI only

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd a001 0002 0002 0010 000b 0000 00`

The zstd CLI decompressor rejected frames where the first block was an RLE block with a `Block_Size` of 131072 and the frame contained more than one block.
This only affected the zstd CLI, and not the library.

The example is an RLE block with 131072 bytes, followed by a second RLE block with 1 byte.

The compressor currently works around this limitation by never emitting an RLE block as the first
block.
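
The example frame can be walked with a short Python sketch that only understands Raw and RLE blocks. It assumes a standard-format frame; the helper names are hypothetical, and the layout comes from our reading of the format specification rather than from zstd's code. By that reading, the frame header declares a `Frame_Content_Size` of 131073, and the two RLE blocks (131072 bytes, then 1 byte) add up to exactly that.

```python
from typing import List, Optional, Tuple

ZSTD_MAGIC = 0xFD2FB528
RAW_BLOCK, RLE_BLOCK = 0, 1


def read_frame_header(frame: bytes) -> Tuple[Optional[int], int]:
    """Return (Frame_Content_Size or None, offset of the first Block_Header)."""
    if int.from_bytes(frame[:4], "little") != ZSTD_MAGIC:
        raise ValueError("not a standard-format zstd frame")
    fhd = frame[4]                           # Frame_Header_Descriptor
    single_segment = (fhd >> 5) & 1
    pos = 5 + (1 - single_segment)           # optional Window_Descriptor
    pos += (0, 1, 2, 4)[fhd & 3]             # Dictionary_ID field size
    fcs_len = (single_segment, 2, 4, 8)[fhd >> 6]
    if fcs_len == 0:
        return None, pos
    fcs = int.from_bytes(frame[pos:pos + fcs_len], "little")
    return fcs + (256 if fcs_len == 2 else 0), pos + fcs_len


def decode_raw_rle_frame(frame: bytes) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Decode a frame made only of Raw and RLE blocks.

    Returns (decompressed bytes, [(Block_Type, Block_Size), ...]).
    A trailing Content_Checksum, if present, is ignored.
    """
    content_size, pos = read_frame_header(frame)
    out = bytearray()
    blocks = []
    while True:
        header = int.from_bytes(frame[pos:pos + 3], "little")
        last_block, block_type, block_size = header & 1, (header >> 1) & 3, header >> 3
        pos += 3
        if block_type == RAW_BLOCK:
            out += frame[pos:pos + block_size]
            pos += block_size
        elif block_type == RLE_BLOCK:
            out += frame[pos:pos + 1] * block_size   # one byte, repeated Block_Size times
            pos += 1
        else:
            raise NotImplementedError("compressed blocks are out of scope for this sketch")
        blocks.append((block_type, block_size))
        if last_block:
            break
    if content_size is not None and len(out) != content_size:
        raise ValueError("Frame_Content_Size mismatch")
    return bytes(out), blocks
```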

See the compressor workaround code:

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L3527-L3535


Tiny FSE Table & Block
----------------------

**Last affected version**: v1.3.4

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: Possibly until version v1.3.4, but probably never

**Example Frame**: `28b5 2ffd 2027 c500 0080 f3f1 f0ec ebc6 c5c7 f09d 4300 0000 e0e0 0658 0100 603e 52`

The zstd library rejected blocks of type `Compressed_Block` in which the last table of type `FSE_Compressed_Mode` started less than 4 bytes before the end of the block.

In more depth, let `Last_Table_Offset` be the offset in the compressed block (excluding the header) at which
the last table of type `FSE_Compressed_Mode` starts. If `Block_Content - Last_Table_Offset < 4`, then
the buggy zstd decompressor would reject the block. This occurs when the last serialized table is 2 bytes
and the bitstream size is 1 byte.

For example:
* There is 1 sequence in the block
* `Literals_Lengths_Mode` is `FSE_Compressed_Mode` & the serialized table size is 2 bytes
* `Offsets_Mode` is `Predefined_Mode`
* `Match_Lengths_Mode` is `Predefined_Mode`
* The bitstream is 1 byte, e.g. because there is only one sequence and it fits in 1 byte.
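
Both the example frame and the block in the list above can be checked with a short Python sketch that measures the distance from the start of the last `FSE_Compressed_Mode` table to the end of the block. It is written from our reading of the format specification, the helper names are hypothetical, and it only handles Raw/RLE literals with a 1-byte header and blocks where no FSE table precedes the last one (skipping such a table would require decoding it). By that reading, the example frame has 16 raw literals, 6 sequences, RLE literal lengths and offsets, and a match-length table of type `FSE_Compressed_Mode` starting at offset 21 of its 24-byte `Block_Content`.

```python
from typing import Optional

RLE_MODE, FSE_COMPRESSED = 1, 2


def first_block_content(frame: bytes) -> bytes:
    """Return the Block_Content of the first block of a standard zstd frame."""
    if int.from_bytes(frame[:4], "little") != 0xFD2FB528:
        raise ValueError("not a standard-format zstd frame")
    fhd = frame[4]                                   # Frame_Header_Descriptor
    single_segment = (fhd >> 5) & 1
    pos = 5 + (1 - single_segment)                   # optional Window_Descriptor
    pos += (0, 1, 2, 4)[fhd & 3]                     # Dictionary_ID field
    pos += (single_segment, 2, 4, 8)[fhd >> 6]       # Frame_Content_Size field
    header = int.from_bytes(frame[pos:pos + 3], "little")
    return frame[pos + 3:pos + 3 + (header >> 3)]


def last_fse_table_gap(content: bytes) -> Optional[int]:
    """Bytes from the start of the last FSE_Compressed_Mode table to the block end.

    Returns None when the block has no such table.
    """
    lit_header = content[0]
    lit_type = lit_header & 3
    if lit_type > 1 or (lit_header >> 2) & 1:
        raise NotImplementedError("only 1-byte Raw/RLE literals headers are handled")
    pos = 1 + ((lit_header >> 3) if lit_type == 0 else 1)   # skip the literals
    byte0 = content[pos]
    if byte0 < 128:
        nb_seq, pos = byte0, pos + 1
    elif byte0 < 255:
        nb_seq, pos = ((byte0 - 128) << 8) + content[pos + 1], pos + 2
    else:
        nb_seq, pos = content[pos + 1] + (content[pos + 2] << 8) + 0x7F00, pos + 3
    if nb_seq == 0:
        return None
    modes_byte = content[pos]                        # Symbol_Compression_Modes
    pos += 1
    # Literals_Lengths_Mode, Offsets_Mode, Match_Lengths_Mode: bits 7-6, 5-4, 3-2
    modes = [(modes_byte >> shift) & 3 for shift in (6, 4, 2)]
    fse_tables = [i for i, m in enumerate(modes) if m == FSE_COMPRESSED]
    if not fse_tables:
        return None
    for mode in modes[:fse_tables[-1]]:
        if mode == FSE_COMPRESSED:
            raise NotImplementedError("cannot skip an earlier FSE table without decoding it")
        if mode == RLE_MODE:
            pos += 1                                 # RLE_Mode stores a single byte
    return len(content) - pos
```

A result below 4 is the condition the buggy decoder rejected. In the usage below, the bytes after the modes byte of the hand-built block are placeholders, not a valid FSE table description; the sketch never decodes them.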

In the block described by the list above, the last table starts only 3 bytes before the end of the block (a 2-byte table followed by a 1-byte bitstream), which is below the 4-byte threshold.
Not counting the literals section, which shifts both values equally, the total `Block_Content` is `5` bytes (1-byte `Number_of_Sequences`, 1-byte `Symbol_Compression_Modes`, 2-byte table, 1-byte bitstream) and `Last_Table_Offset` is `2`.

See the compressor workaround code:

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L2667-L2682


Magicless format
----------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library

**Produced by the reference compressor**: Yes (example: https://gist.github.com/embg/9940726094f4cf2cef162cffe9319232)

**Example Frame**: `27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59`

v1.5.6 fixes several bugs in which the magicless-format decoder rejects valid frames.
These include, but are not limited to:
* Valid frames that happen to begin with a legacy magic number (little-endian)
* Valid frames that happen to begin with a skippable magic number (little-endian)

If you are affected by this issue and cannot update to v1.5.6 or later, there is a
workaround to recover affected data. Simply prepend the zstd magic number
`0xFD2FB528` (little-endian, i.e. the bytes `28 b5 2f fd`) to your data and decompress using the standard-format
decoder.
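
The workaround can be sketched in Python as below. It assumes the input is a single magicless frame held in memory. The helper names are hypothetical, and the list of legacy magic numbers reflects our understanding of zstd's legacy (v0.1 to v0.7) decoders rather than anything stated in this document. Read as a little-endian 32-bit integer, the first four bytes of the example frame are `0xFD2FB527`, which falls in that legacy range.

```python
import struct
from typing import Optional

ZSTD_MAGIC = 0xFD2FB528
# Skippable frames use magic numbers 0x184D2A50 to 0x184D2A5F.
SKIPPABLE_MAGICS = range(0x184D2A50, 0x184D2A60)
# Assumption: legacy (v0.1 to v0.7) magic numbers, per our reading of zstd's legacy support.
LEGACY_MAGICS = {0x1EB52FFD} | set(range(0xFD2FB522, 0xFD2FB528))


def leading_magic_collision(magicless: bytes) -> Optional[str]:
    """Report whether a magicless frame starts with bytes that look like a magic number."""
    word = int.from_bytes(magicless[:4], "little")
    if word in LEGACY_MAGICS:
        return "legacy"
    if word in SKIPPABLE_MAGICS:
        return "skippable"
    return None


def to_standard_format(magicless: bytes) -> bytes:
    """Workaround: prepend the zstd magic number, stored little-endian (28 b5 2f fd)."""
    return struct.pack("<I", ZSTD_MAGIC) + magicless
```

Only the framing changes: the original bytes follow the 4-byte prefix untouched, so the result can go straight to the standard-format decoder.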