# Zstandard Seekable Format

The seekable format splits compressed data into a series of independent "frames",
each compressed individually,
so that decompression of a section in the middle of an archive
only requires zstd to decompress at most a frame's worth of extra data,
instead of the entire archive.
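
To make the cost of a partial read concrete, here is a minimal sketch that computes which frames a read touches and how many bytes must be decompressed to serve it. It assumes every frame except possibly the last holds exactly `frame_size` decompressed bytes; `frame_span` and `frames_for_read` are hypothetical names for illustration, not part of the zstd API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of the zstd API: describes the
 * frames that must be decompressed to serve one read. */
typedef struct {
    size_t first_frame;   /* index of the first frame touched */
    size_t last_frame;    /* index of the last frame touched */
    size_t bytes_decoded; /* decompressed bytes produced to serve the read */
} frame_span;

/* Assumption: all frames hold frame_size decompressed bytes, except the
 * last one, which may be shorter. total_size is the full content size. */
static frame_span frames_for_read(size_t offset, size_t length,
                                  size_t frame_size, size_t total_size)
{
    frame_span span;
    size_t end, decoded_end;

    assert(frame_size > 0);
    assert(length > 0);
    assert(offset < total_size && length <= total_size - offset);

    end = offset + length - 1;              /* last requested byte, inclusive */
    span.first_frame = offset / frame_size;
    span.last_frame = end / frame_size;

    /* Clamp to the end of the content when the last frame is short. */
    decoded_end = (span.last_frame + 1) * frame_size;
    if (decoded_end > total_size) decoded_end = total_size;

    span.bytes_decoded = decoded_end - span.first_frame * frame_size;
    return span;
}
```

With 4096-byte frames, a 100-byte read at offset 5000 touches only frame 1 and decodes 4096 bytes, however large the archive is. A 200-byte read at offset 4000 straddles frames 0 and 1 and decodes 8192 bytes.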

The frames are appended, so that the decompression of the entire payload
still regenerates the original content, using any compliant zstd decoder.

On top of that, the seekable format generates a jump table,
which makes it possible to jump directly to the position of the relevant frame
when requesting only a segment of the data.
The jump table is simply ignored by zstd decoders unaware of the seekable format.
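
As an illustration of how a jump table supports direct access, the sketch below models it as a sorted array of frame start positions and locates the frame containing a given decompressed position with a binary search. The `jump_entry` layout and `find_frame` are assumptions made for this example; they are not the on-disk format or the library's internal data structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory jump table entry: where a frame starts in the
 * compressed archive and in the regenerated (decompressed) content. */
typedef struct {
    size_t compressed_offset;
    size_t decompressed_offset;
} jump_entry;

/* Returns the index of the frame whose decompressed range contains pos.
 * entries must be sorted by decompressed_offset, and entries[0] must
 * start at 0. A pos past the end of the content maps to the last frame,
 * so callers are expected to bounds-check pos themselves. */
static size_t find_frame(const jump_entry *entries, size_t count, size_t pos)
{
    size_t lo = 0, hi = count;

    assert(entries != NULL && count > 0);
    assert(entries[0].decompressed_offset == 0);

    /* Invariant: frame lo starts at or before pos;
     * frame hi (when hi < count) starts after pos. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid].decompressed_offset <= pos)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

Once the frame index is known, a reader can seek to `compressed_offset` in the archive and decompress only from there, skipping every earlier frame.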

The format is delivered with an API to create seekable archives
and to retrieve arbitrary segments inside the archive.

### Maximum Frame Size parameter

When creating a seekable archive, the main parameter is the maximum frame size.

At compression time, the user can manually select the boundaries between segments,
but doesn't have to: long segments are automatically split
when they exceed the selected maximum frame size.
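
The splitting rule can be sketched as simple arithmetic. The example below assumes a long segment is cut into full-size frames followed by one shorter remainder frame; that cut policy and the helper names `frames_in_segment` and `frame_size_at` are assumptions for illustration, not a description of the library's exact behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of frames a non-empty segment produces
 * when no frame may exceed max_frame_size decompressed bytes. */
static size_t frames_in_segment(size_t segment_size, size_t max_frame_size)
{
    assert(max_frame_size > 0);
    assert(segment_size > 0);
    return (segment_size - 1) / max_frame_size + 1;  /* ceiling division */
}

/* Hypothetical helper: decompressed size of frame i within the segment,
 * assuming full-size frames followed by one shorter remainder frame. */
static size_t frame_size_at(size_t segment_size, size_t max_frame_size,
                            size_t i)
{
    size_t n = frames_in_segment(segment_size, max_frame_size);

    assert(i < n);
    if (i + 1 < n)
        return max_frame_size;                 /* every frame but the last */
    return segment_size - i * max_frame_size;  /* remainder in the last */
}
```

Under these assumptions, a 10000-byte segment with a 4096-byte maximum becomes three frames of 4096, 4096 and 1808 bytes, while a segment of exactly 4096 bytes stays a single frame.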

Small frame sizes reduce decompression cost when requesting small segments,
because the decoder has to decompress an entire frame
even to recover just a single byte from it.

A good rule of thumb is to select a maximum frame size roughly equivalent
to the typical request size, when the access pattern is known.
For example, if the application tends to request 4 KB blocks,
then it's a good idea to set a maximum frame size in the vicinity of 4 KB.

But small frame sizes also reduce compression ratio,
and increase the cost of the jump table,
so there is a balance to find.
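
The jump table side of this trade-off can be estimated with a short calculation. The sketch below assumes the entry layout of the seekable format specification as I understand it (not stated in this README): an 8-byte skippable frame header, 8 bytes per frame entry (12 with the optional per-frame checksum), and a 9-byte footer. It also assumes the data is split only at the maximum frame size. `frame_count` and `seek_table_bytes` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: frames needed for total_size bytes when every
 * frame except possibly the last holds max_frame_size bytes. */
static size_t frame_count(size_t total_size, size_t max_frame_size)
{
    assert(max_frame_size > 0);
    assert(total_size > 0);
    return (total_size - 1) / max_frame_size + 1;
}

/* Estimated jump table size in bytes. The constants are assumptions
 * taken from the seekable format specification, not from this README. */
static size_t seek_table_bytes(size_t num_frames, int with_checksum)
{
    const size_t skippable_header = 8; /* magic number + frame size field */
    const size_t footer = 9;           /* frame count + descriptor + magic */
    const size_t entry = with_checksum ? 12 : 8; /* sizes, plus checksum */

    return skippable_header + num_frames * entry + footer;
}
```

Under these assumptions, 1 GiB of content split into 4096-byte frames yields 262144 frames and a jump table of 2097169 bytes (about 2 MiB) without checksums. The same content in 1 MiB frames needs only 1024 entries, or 8209 bytes.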

In general, try to avoid really tiny frame sizes (<1 KB),
which would have a large negative impact on compression ratio.