LZMA compression
----------------
Version: 24.07

This file describes the LZMA encoding and decoding functions written in C.

LZMA is an improved version of the well-known LZ77 compression algorithm.
It was designed to maximize the compression ratio while keeping
decompression fast and the memory requirements for decompression low.

Note: see also the LZMA Specification (lzma-specification.txt from LZMA SDK).

For example source code for LZMA encoding and decoding, see:
  C/Util/Lzma/LzmaUtil.c


LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data
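
The header layout above can be read with a few lines of C. The following is
only a sketch: LzmaFileHeader, ReadLE and ParseLzmaHeader are hypothetical
names that are not part of the LZMA SDK. It assumes the properties byte is
packed as (pb * 5 + lp) * 9 + lc, as described in the LZMA Specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZMA_HEADER_SIZE 13  /* 1 props byte + 4 dict size + 8 uncompressed size */

/* Hypothetical helper type for this sketch; not part of the LZMA SDK. */
typedef struct
{
  unsigned lc, lp, pb;
  uint32_t dictSize;
  uint64_t unpackSize;  /* UINT64_MAX (all bits set, i.e. -1) means unknown size */
} LzmaFileHeader;

/* Reads a little-endian unsigned integer of numBytes bytes (numBytes <= 8). */
static uint64_t ReadLE(const unsigned char *p, unsigned numBytes)
{
  uint64_t v = 0;
  unsigned i;
  assert(numBytes <= 8);
  for (i = 0; i < numBytes; i++)
    v |= (uint64_t)p[i] << (8 * i);
  return v;
}

/* Returns 0 on success, -1 if the buffer is too short
   or the properties byte is out of range. */
static int ParseLzmaHeader(const unsigned char *buf, size_t size, LzmaFileHeader *h)
{
  unsigned d;
  if (size < LZMA_HEADER_SIZE)
    return -1;
  d = buf[0];
  if (d >= 9 * 5 * 5)
    return -1;
  /* Assumed packing: d = (pb * 5 + lp) * 9 + lc */
  h->lc = d % 9;
  d /= 9;
  h->lp = d % 5;
  h->pb = d / 5;
  h->dictSize = (uint32_t)ReadLE(buf + 1, 4);
  h->unpackSize = ReadLE(buf + 5, 8);
  return 0;
}
```

With the common properties byte 0x5D this yields lc=3, lp=0, pb=2.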



ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

Please note that the interfaces for ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces, you can download a previous version of LZMA SDK
from the sourceforge.net site.

To use the ANSI-C LZMA Decoder you need the following files:
1) LzmaDec.h + LzmaDec.c + 7zTypes.h + Precomp.h + Compiler.h

See example code:
  C/Util/Lzma/LzmaUtil.c


Memory requirements for LZMA decoding
-------------------------------------

Stack usage of the LZMA decoding function for local variables is not
larger than 200-400 bytes.

The LZMA Decoder uses a dictionary buffer and an internal state structure.
The internal state structure consumes
  state_size = (4 + (1.5 << (lc + lp))) KB
With the default settings (lc=3, lp=0), state_size = 16 KB.
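
The formula above can be evaluated in integer arithmetic by working in bytes
(1.5 KB = 1536 bytes). A minimal sketch, where LzmaStateSizeBytes is a
hypothetical helper name and not an SDK function:

```c
#include <assert.h>
#include <stddef.h>

/* state_size in bytes: (4 + (1.5 << (lc + lp))) KB, where 1.5 KB = 1536 bytes. */
static size_t LzmaStateSizeBytes(unsigned lc, unsigned lp)
{
  assert(lc <= 8 && lp <= 4);  /* value ranges allowed by the LZMA format */
  return (size_t)4 * 1024 + ((size_t)1536 << (lc + lp));
}
```

For the default settings, LzmaStateSizeBytes(3, 0) gives 16384 bytes (16 KB),
which matches the value stated above.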


How To decompress data
----------------------

The LZMA Decoder (ANSI-C version) supports 2 interfaces:
1) Single-call Decompressing
2) Multi-call State Decompressing (zlib-like interface)

You must use an external allocator:
Example:
void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };

The statement p = p; is there to suppress compiler warnings about the unused parameter.


Single-call Decompressing
-------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures: state_size (16 KB for default settings)

Interface:
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties  (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END, when you know that
                           current output buffer covers last bytes of stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Output:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).

  If the LZMA decoder sees an end marker before reaching the output limit, it returns SZ_OK,
  and the output value of destLen will be less than the output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:
    1) Check the result and the "status" variable.
    2) Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
    3) Check that output(srcLen) = compressedSize, if you know the real compressedSize.
       You must use the correct finish mode in that case.


Multi-call State Decompressing (zlib-like interface)
----------------------------------------------------

When to use: file->file decompressing
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h

Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures: state_size (16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in LZMA properties header)

1) Read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));

2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties

  CLzmaDec state;
  LzmaDec_Constr(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;

3) Init LzmaDec structure before any new LZMA stream, and call LzmaDec_DecodeToBuf in a loop

  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
        const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode);
    ...
  }


4) Free all allocated structures
  LzmaDec_Free(&state, &g_Alloc);

See example code:
  C/Util/Lzma/LzmaUtil.c


How To compress data
--------------------

Compile files:
  7zTypes.h
  Threads.h
  Threads.c
  LzmaEnc.h
  LzmaEnc.c
  LzFind.h
  LzFind.c
  LzFindMt.h
  LzFindMt.c
  LzFindOpt.c
  LzHash.h

Memory Requirements:
  - (dictSize * 11.5 + 6 MB) + state_size
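
As a rough illustration of the formula above, the estimate can be computed in
integer arithmetic by writing 11.5 as 23 / 2. This is only a sketch:
LzmaEncMemEstimate is a hypothetical helper name, not an SDK function, and the
result is an estimate rather than an exact allocation size.

```c
#include <assert.h>
#include <stdint.h>

/* Rough encoder memory estimate in bytes: dictSize * 11.5 + 6 MB + stateSize. */
static uint64_t LzmaEncMemEstimate(uint32_t dictSize, uint64_t stateSize)
{
  const uint64_t k6MB = (uint64_t)6 << 20;
  uint64_t dictPart;
  assert(dictSize != 0);  /* a dictionary size of zero makes no sense here */
  dictPart = (uint64_t)dictSize * 23 / 2;  /* 11.5 = 23 / 2, 64-bit to avoid overflow */
  return dictPart + k6MB + stateSize;
}
```

For a 16 MB dictionary and the default 16 KB state, this gives
184 MB + 6 MB + 16 KB, that is about 190 MB.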

The LZMA Encoder can use two memory allocators:
1) alloc - for small arrays.
2) allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for
better compression speed. Note that Windows has a poor implementation of
Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.


Single-call Compression with callbacks
--------------------------------------

See example code:
  C/Util/Lzma/LzmaUtil.c

When to use: file->file compressing

1) Implement callback structures for the interfaces:
ISeqInStream
ISeqOutStream
ICompressProgress
ISzAlloc

static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) {  p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;


2) Create CLzmaEncHandle object

  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;


3) Initialize CLzmaEncProps properties

  CLzmaEncProps props;
  LzmaEncProps_Init(&props);

  Then you can change some properties in that structure.

4) Send LZMA properties to LZMA Encoder

  res = LzmaEnc_SetProps(enc, &props);

5) Write encoded properties to header

    Byte header[LZMA_PROPS_SIZE + 8];
    size_t headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);

6) Call encoding function:
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);

7) Destroy LZMA Encoder Object
  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);


If a callback function returns an error code, LzmaEnc_Encode returns that code,
or it can return a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.
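
The loop in step 5 stores the uncompressed size as 8 bytes, least significant
byte first. A standalone sketch of that loop, using standard C types instead
of the SDK types (AppendUInt64LE is a hypothetical helper name, not an SDK
function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Appends the 64-bit value to buf at offset pos, least significant byte first,
   and returns the new offset. The caller must provide room for 8 more bytes. */
static size_t AppendUInt64LE(unsigned char *buf, size_t pos, uint64_t value)
{
  int i;
  assert(buf != NULL);
  for (i = 0; i < 8; i++)
    buf[pos++] = (unsigned char)(value >> (8 * i));
  return pos;
}
```

Passing UINT64_MAX writes eight 0xFF bytes, which is the -1 (unknown size)
value from the file format table.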


Single-call RAM->RAM Compression
--------------------------------

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
    const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
    ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);

Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - Output buffer overflow
  SZ_ERROR_THREAD     - Errors in multithreading functions (only for Mt version)



Defines
-------

Z7_LZMA_SIZE_OPT - Enable some code size optimizations in LZMA Decoder to get smaller executable code.

Z7_LZMA_PROB32   - It can increase the speed on some 32-bit CPUs, but memory usage for
                   some structures will be doubled in that case.

Z7_DECL_Int32_AS_long  - Define it if int is 16-bit on your compiler and long is 32-bit.

Z7_DECL_SizeT_AS_unsigned_int  - Define it if you don't want to use size_t type.


Defines for 7z decoder written in C
-----------------------------------
These defines are for 7zDec.c only (the decoder in C).
The C++ 7z decoder doesn't use these macros.

Z7_PPMD_SUPPORT        - define it if you need PPMD method support.
Z7_NO_METHODS_FILTERS  - do not use filters (except for the BCJ2 filter).
Z7_USE_NATIVE_BRANCH_FILTER - use filter for native ISA:
                                 use x86 filter, if compiled to x86 executable,
                                 use arm64 filter, if compiled to arm64 executable.


C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
The C++ LZMA code uses COM-like interfaces. So if you want to use it,
you should study the basics of COM/OLE.
The C++ LZMA code is just a wrapper over the ANSI-C code.


C++ Notes
~~~~~~~~~
If you use some C++ code folders in 7-Zip (for example, C++ code for 7z archive handling),
you must check that you work correctly with the "new" operator.
7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator.
So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator
if compiled by old MSVC compilers (MSVC before version VS 2010):

void *operator new(size_t size)
{
  void *p = ::malloc(size);
  if (!p)
    throw CNewException();
  return p;
}

If the compiler is VS 2010 or newer, NewHandler.cpp doesn't redefine the "new" operator.
So if you use a new compiler (VS 2010 or newer), you can still include "NewHandler.cpp"
in the compilation, and it will not redefine operator new.
You can also compile without "NewHandler.cpp" with new compilers.
If 7-Zip doesn't redefine operator "new", the standard exception will be used instead of CNewException.
Some 7-Zip code catches any exception in internal code and converts it to an HRESULT code.
So you don't need to catch CNewException if you call COM interfaces of 7-Zip.

---

http://www.7-zip.org
http://www.7-zip.org/sdk.html
http://www.7-zip.org/support.html
