XZ Embedded
===========

    XZ Embedded is a relatively small, limited implementation of the .xz
    file format. Currently only decoding is implemented.

    XZ Embedded was written for use in the Linux kernel, but the code can
    be easily used in other environments too, including regular userspace
    applications. See userspace/xzminidec.c for an example program.

        NOTE: The version of XZ Embedded in the Linux kernel lacks a few
        build-time-selectable optional features that are present in the
        upstream XZ Embedded project: support for concatenated .xz files,
        CRC64, and ignoring unsupported check types. These aren't in
        Linux because they don't seem useful there but they would add
        to the code size.

    This README contains information that is useful only when the copy
    of XZ Embedded isn't part of the Linux kernel tree. You should also
    read linux/Documentation/staging/xz.rst even if you aren't using
    XZ Embedded as part of Linux; information in that file is not
    repeated in this README.

Conformance to the .xz file format specification

    As of the .xz file format specification version 1.2.0, this
    decompressor implementation has the following limitations:

      - SHA-256 isn't supported. It can be ignored as an unsupported
        check type if that feature is enabled at build time.

      - Delta filter is not included.

      - BCJ filters don't support non-default start offset.

      - LZMA2 supports at most 3 GiB dictionary.

    There are a couple of corner cases where things have been simplified
    at the expense of detecting errors as early as possible. These should
    not matter in practice since they don't cause security issues. But
    it is good to know this if testing the code with the test files from
    XZ Utils.

Compiler requirements

    XZ Embedded should compile with any C99 or C11 compiler. The code
    should also be GNU-C89 compatible still. GNU-C89 was used in the
    Linux kernel until 2022.
    GNU-C89 support likely will be dropped at some point.

Embedding into userspace applications

    To embed the XZ decoder, copy the following files into a single
    directory in your source code tree:

        linux/include/linux/xz.h
        linux/lib/xz/xz_crc32.c
        linux/lib/xz/xz_dec_lzma2.c
        linux/lib/xz/xz_dec_stream.c
        linux/lib/xz/xz_lzma2.h
        linux/lib/xz/xz_private.h
        linux/lib/xz/xz_stream.h
        userspace/xz_config.h

    Alternatively, xz.h may be placed into a different directory but then
    that directory must be in the compiler include path when compiling
    the .c files.

    Your code should use only the functions declared in xz.h. The rest of
    the .h files are meant only for internal use in XZ Embedded.

    You may want to modify xz_config.h to be more suitable for your build
    environment. You should probably at least skim through it even if the
    default file works as is.

Supporting concatenated .xz files

    Regular .xz files can be concatenated as is and the xz command line
    tool will decompress all streams from a concatenated file (a few
    other popular formats and tools support this too). Such .xz files
    are more common than one might think because pxz, an early threaded
    XZ compressor, created them.

    The xz_dec_run() function will stop after decompressing one stream.
    This is good when XZ data is stored inside some other file format.
    However, if one is decompressing regular standalone .xz files, one
    will want to decompress all streams in the file. This is easy with
    xz_dec_catrun(). To include support for xz_dec_catrun(), you need
    to #define XZ_DEC_CONCATENATED in xz_config.h or in compiler flags.

Integrity check support

    XZ Embedded always supports the integrity check types None and
    CRC32. Support for CRC64 is optional. SHA-256 is currently not
    supported in XZ Embedded although the .xz format does support it.
    The xz tool from XZ Utils uses CRC64 by default, but CRC32 is usually
    enough in embedded systems and keeps the code size smaller.

    If you want support for CRC64, you need to copy linux/lib/xz/xz_crc64.c
    into your application, and #define XZ_USE_CRC64 in xz_config.h or in
    compiler flags.

    When using the internal CRC32 or CRC64, their lookup tables need to be
    initialized with xz_crc32_init() and xz_crc64_init(), respectively.
    See xz.h for details.

    To use external CRC32 or CRC64 code instead of the code from
    xz_crc32.c or xz_crc64.c, the following #defines may be used
    in xz_config.h or in compiler flags:

        #define XZ_INTERNAL_CRC32 0
        #define XZ_INTERNAL_CRC64 0

    Then it is up to you to provide compatible xz_crc32() or xz_crc64()
    functions.

    If the .xz file being decompressed uses an integrity check type that
    isn't supported by XZ Embedded, it is treated as an error and the
    file cannot be decompressed. For multi-call mode, this can be modified
    by #defining XZ_DEC_ANY_CHECK. Then xz_dec_run() will return
    XZ_UNSUPPORTED_CHECK when an unsupported check type is detected.
    After that, decompression can be continued normally except that the
    integrity check won't be verified. In single-call mode there's
    no way to continue decoding, so XZ_DEC_ANY_CHECK is almost useless
    in single-call mode.

BCJ filter support

    If you want support for one or more BCJ filters, you need to copy
    linux/lib/xz/xz_dec_bcj.c into your application, and use appropriate
    #defines in xz_config.h or in compiler flags. You don't need these
    #defines in the code that just uses XZ Embedded via xz.h, but having
    them always #defined doesn't hurt either.

        #define             Instruction set     BCJ filter endianness
        XZ_DEC_X86          x86-32 or x86-64    Little endian only
        XZ_DEC_POWERPC      PowerPC             Big endian only
        XZ_DEC_IA64         Itanium (IA-64)     Big or little endian
        XZ_DEC_ARM          ARM                 Little endian instructions
        XZ_DEC_ARMTHUMB     ARM-Thumb           Big or little endian
        XZ_DEC_ARM64        ARM64               Big or little endian
        XZ_DEC_SPARC        SPARC               Big or little endian
        XZ_DEC_RISCV        RISC-V              Big or little endian

    While some architectures are (partially) bi-endian, the endianness
    setting doesn't change the endianness of the instructions on all
    architectures. That's why many filters work for both big and little
    endian executables (Itanium and ARM based architectures have little
    endian instructions and SPARC has big endian instructions).

Notes about shared libraries

    If you are including XZ Embedded in a shared library, you should
    rename the xz_* functions to prevent symbol conflicts in case your
    library is linked against some other library or application that
    also has XZ Embedded in it (which may even be a different version
    of XZ Embedded).

    Please don't create a shared library of XZ Embedded itself unless
    it is fine to rebuild everything depending on that shared library
    every time you upgrade to a newer version of XZ Embedded. There are
    no API or ABI stability guarantees between different versions of
    XZ Embedded.

Contact information

    Email:    Lasse Collin <[email protected]>
    IRC:      Larhzu on #tukaani on Libera Chat
    GitHub:   https://github.com/tukaani-project/xz-embedded
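
Example: providing an external xz_crc32()

    The integrity check section says that with XZ_INTERNAL_CRC32 set
    to 0 it is up to you to provide a compatible xz_crc32(). The sketch
    below shows one way that could look. It is not the code from
    xz_crc32.c: it is a plain bitwise CRC32 without a lookup table, so
    it is small but slow. The prototype is an assumption taken from a
    typical xz.h; check it against the declaration in your own copy
    before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32 using the reflected polynomial 0xEDB88320, which is
 * the CRC32 variant used by the .xz format.
 *
 * ASSUMPTION: the prototype matches the one declared in your xz.h.
 * Pass crc = 0 for the first call; to continue over more data, pass
 * the value returned by the previous call.
 */
uint32_t xz_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
	size_t i;
	int bit;

	/* Undo the final inversion of the previous call (or start at ~0). */
	crc = ~crc;

	for (i = 0; i < size; ++i) {
		crc ^= buf[i];

		/* Process the byte one bit at a time, lowest bit first. */
		for (bit = 0; bit < 8; ++bit) {
			if (crc & 1)
				crc = (crc >> 1) ^ UINT32_C(0xEDB88320);
			else
				crc >>= 1;
		}
	}

	return ~crc;
}
```

    With a function like this there is no lookup table to set up.
    Whether xz_crc32_init() still has to be called or defined in that
    configuration depends on your xz.h and xz_config.h, so check there.
    If speed matters, a table-driven or hardware-assisted CRC32 with
    the same calling convention is the better choice.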