XZ Embedded
===========

    XZ Embedded is a relatively small, limited implementation of the .xz
    file format. Currently only decoding is implemented.

    XZ Embedded was written for use in the Linux kernel, but the code can
    be easily used in other environments too, including regular userspace
    applications. See userspace/xzminidec.c for an example program.

        NOTE: The version of XZ Embedded in the Linux kernel lacks a few
        build-time-selectable optional features that are present in the
        upstream XZ Embedded project: support for concatenated .xz files,
        CRC64, and ignoring unsupported check types. These aren't in
        Linux because they don't seem useful there and they would add
        to the code size.

    This README contains information that is useful only when the copy
    of XZ Embedded isn't part of the Linux kernel tree. You should also
    read linux/Documentation/staging/xz.rst even if you aren't using
    XZ Embedded as part of Linux; information in that file is not
    repeated in this README.

Conformance to the .xz file format specification

    As of the .xz file format specification version 1.2.0, this
    decompressor implementation has the following limitations:

      - SHA-256 isn't supported. It can be ignored as an unsupported
        check type if that feature is enabled at build time.

      - Delta filter is not included.

      - BCJ filters don't support a non-default start offset.

      - LZMA2 supports at most a 3 GiB dictionary.

    There are a couple of corner cases where things have been simplified
    at the expense of detecting errors as early as possible. These
    should not matter in practice since they don't cause security
    issues. But it is good to know this when testing the code with the
    test files from XZ Utils.

Compiler requirements

    XZ Embedded should compile with any C99 or C11 compiler. The code
    should also still be GNU-C89 compatible. GNU-C89 was used in the
    Linux kernel until 2022. GNU-C89 support will likely be dropped
    at some point.

Embedding into userspace applications

    To embed the XZ decoder, copy the following files into a single
    directory in your source code tree:

        linux/include/linux/xz.h
        linux/lib/xz/xz_crc32.c
        linux/lib/xz/xz_dec_lzma2.c
        linux/lib/xz/xz_dec_stream.c
        linux/lib/xz/xz_lzma2.h
        linux/lib/xz/xz_private.h
        linux/lib/xz/xz_stream.h
        userspace/xz_config.h

    Alternatively, xz.h may be placed into a different directory, but
    then that directory must be in the compiler include path when
    compiling the .c files.

    Your code should use only the functions declared in xz.h. The rest of
    the .h files are meant only for internal use in XZ Embedded.

    You may want to modify xz_config.h to be more suitable for your build
    environment. You should probably at least skim through it even if
    the default file works as is.

Supporting concatenated .xz files

    Regular .xz files can be concatenated as is and the xz command line
    tool will decompress all streams from a concatenated file (a few
    other popular formats and tools support this too). Such .xz files
    are more common than one might think because pxz, an early threaded
    XZ compressor, created them.

    The xz_dec_run() function will stop after decompressing one stream.
    This is good when XZ data is stored inside some other file format.
    However, if one is decompressing regular standalone .xz files, one
    will want to decompress all streams in the file. This is easy with
    xz_dec_catrun(). To include support for xz_dec_catrun(), you need
    to #define XZ_DEC_CONCATENATED in xz_config.h or in compiler flags.

Integrity check support

    XZ Embedded always supports the integrity check types None and
    CRC32. Support for CRC64 is optional. SHA-256 is currently not
    supported in XZ Embedded although the .xz format does support it.
    The xz tool from XZ Utils uses CRC64 by default, but CRC32 is usually
    enough in embedded systems and keeps the code size smaller.

    If you want support for CRC64, you need to copy linux/lib/xz/xz_crc64.c
    into your application, and #define XZ_USE_CRC64 in xz_config.h or in
    compiler flags.

    When using the internal CRC32 or CRC64, their lookup tables need to be
    initialized with xz_crc32_init() and xz_crc64_init(), respectively.
    See xz.h for details.

    To use external CRC32 or CRC64 code instead of the code from
    xz_crc32.c or xz_crc64.c, the following #defines may be used
    in xz_config.h or in compiler flags:

        #define XZ_INTERNAL_CRC32 0
        #define XZ_INTERNAL_CRC64 0

    Then it is up to you to provide compatible xz_crc32() or xz_crc64()
    functions.
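
    As an illustration of what "compatible" could look like, here is a
    minimal sketch of table-free (bitwise) replacements. It assumes that
    the prototypes in xz.h take a buffer, its size, and the CRC of the
    preceding data (0 for the first call) and return the updated CRC;
    verify this against your copy of xz.h. The bitwise loops are slow
    and were chosen only for brevity; a real external implementation
    would more likely wrap an existing library or hardware CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32 using the reflected polynomial 0xEDB88320.
 * ASSUMPTION: the signature mirrors the prototype in xz.h.
 * "crc" is the return value of the previous call, or 0 at the start.
 */
uint32_t xz_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
	size_t i;
	int bit;

	assert(buf != NULL || size == 0);

	crc = ~crc;
	for (i = 0; i < size; ++i) {
		crc ^= buf[i];
		/* Shift out one bit at a time; XOR the polynomial if it was 1. */
		for (bit = 0; bit < 8; ++bit)
			crc = (crc >> 1) ^ (0xEDB88320U & (0U - (crc & 1U)));
	}

	return ~crc;
}

/*
 * Bitwise CRC64 using the reflected ECMA-182 polynomial.
 * ASSUMPTION: the signature mirrors the prototype in xz.h.
 */
uint64_t xz_crc64(const uint8_t *buf, size_t size, uint64_t crc)
{
	const uint64_t poly = UINT64_C(0xC96C5795D7870F42);
	size_t i;
	int bit;

	assert(buf != NULL || size == 0);

	crc = ~crc;
	for (i = 0; i < size; ++i) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; ++bit)
			crc = (crc >> 1) ^ (poly & (UINT64_C(0) - (crc & 1)));
	}

	return ~crc;
}
```

    With this calling convention, the CRC32 of the nine ASCII bytes
    "123456789" is 0xCBF43926 and the CRC64 is 0x995DC9BBDF1939FA,
    which can serve as a quick self-test of a replacement function.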

    If the .xz file being decompressed uses an integrity check type that
    isn't supported by XZ Embedded, it is treated as an error and the
    file cannot be decompressed. For multi-call mode, this can be modified
    by #defining XZ_DEC_ANY_CHECK. Then xz_dec_run() will return
    XZ_UNSUPPORTED_CHECK when an unsupported check type is detected.
    After that, decompression can be continued normally except that the
    integrity check won't be verified. In single-call mode there's
    no way to continue decoding, so XZ_DEC_ANY_CHECK is almost useless
    in single-call mode.

BCJ filter support

    If you want support for one or more BCJ filters, you need to copy
    linux/lib/xz/xz_dec_bcj.c into your application, and use appropriate
    #defines in xz_config.h or in compiler flags. You don't need these
    #defines in the code that just uses XZ Embedded via xz.h, but having
    them always #defined doesn't hurt either.

        #define             Instruction set     BCJ filter endianness
        XZ_DEC_X86          x86-32 or x86-64    Little endian only
        XZ_DEC_POWERPC      PowerPC             Big endian only
        XZ_DEC_IA64         Itanium (IA-64)     Big or little endian
        XZ_DEC_ARM          ARM                 Little endian instructions
        XZ_DEC_ARMTHUMB     ARM-Thumb           Big or little endian
        XZ_DEC_ARM64        ARM64               Big or little endian
        XZ_DEC_SPARC        SPARC               Big or little endian
        XZ_DEC_RISCV        RISC-V              Big or little endian

    While some architectures are (partially) bi-endian, the endianness
    setting doesn't change the endianness of the instructions on all
    architectures. That's why many filters work for both big and little
    endian executables (Itanium and ARM based architectures have little
    endian instructions and SPARC has big endian instructions).
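
    As a sketch, the optional features described in this README could
    be enabled together in a local copy of xz_config.h as shown below.
    The macro names are the ones listed in this README; the particular
    selection (CRC64, concatenated files, x86 and ARM64 filters) is
    only an assumed example, not a recommendation.

```c
/*
 * Hypothetical additions to a local copy of xz_config.h.
 * The selection below is only an example; enable what you need.
 * Use either these lines or the equivalent compiler flags, not both.
 */
#define XZ_DEC_CONCATENATED   /* provides xz_dec_catrun() */
#define XZ_USE_CRC64          /* also requires compiling xz_crc64.c */
#define XZ_DEC_ANY_CHECK      /* multi-call mode: continue past unknown checks */

/* BCJ filters; these also require compiling xz_dec_bcj.c. */
#define XZ_DEC_X86
#define XZ_DEC_ARM64
```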

Notes about shared libraries

    If you are including XZ Embedded into a shared library, you should
    rename the xz_* functions to prevent symbol conflicts in case your
    library is linked against some other library or application that
    also has XZ Embedded in it (which may even be a different version
    of XZ Embedded).
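
    One possible way to do the renaming without editing every call site
    is a small header of preprocessor aliases, sketched below. The file
    name and the "mylib_" prefix are hypothetical, and the list covers
    only functions that are expected to be declared in xz.h; check it
    against your copy. Any other externally visible xz_* symbols, such
    as internal helpers shared between the .c files, would need the
    same treatment.

```c
/*
 * mylib_xz_rename.h (hypothetical file name and "mylib_" prefix).
 * Include this before xz.h in every file, both in XZ Embedded's
 * sources and in the code that calls it, so that declarations,
 * definitions, and callers all agree on the renamed symbols.
 */
#ifndef MYLIB_XZ_RENAME_H
#define MYLIB_XZ_RENAME_H

#define xz_dec_init    mylib_xz_dec_init
#define xz_dec_run     mylib_xz_dec_run
#define xz_dec_catrun  mylib_xz_dec_catrun
#define xz_dec_reset   mylib_xz_dec_reset
#define xz_dec_end     mylib_xz_dec_end
#define xz_crc32_init  mylib_xz_crc32_init
#define xz_crc32       mylib_xz_crc32
#define xz_crc64_init  mylib_xz_crc64_init
#define xz_crc64       mylib_xz_crc64

#endif /* MYLIB_XZ_RENAME_H */
```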

    Please don't create a shared library of XZ Embedded itself unless
    it is fine to rebuild everything depending on that shared library
    every time you upgrade to a newer version of XZ Embedded. There are
    no API or ABI stability guarantees between different versions of
    XZ Embedded.

Contact information

    Email: Lasse Collin <[email protected]>
    IRC: Larhzu on #tukaani on Libera Chat
    GitHub: https://github.com/tukaani-project/xz-embedded