XZ Embedded
===========

    XZ Embedded is a relatively small, limited implementation of the .xz
    file format. Currently only decoding is implemented.

    XZ Embedded was written for use in the Linux kernel, but the code can
    be easily used in other environments too, including regular userspace
    applications. See userspace/xzminidec.c for an example program.

        NOTE: The version of XZ Embedded in the Linux kernel lacks a few
        build-time-selectable optional features that are present in the
        upstream XZ Embedded project: support for concatenated .xz files,
        CRC64, and ignoring unsupported check types. These aren't in
        Linux because they don't seem useful there and they would add
        to the code size.

    This README contains information that is useful only when the copy
    of XZ Embedded isn't part of the Linux kernel tree. You should also
    read linux/Documentation/staging/xz.rst even if you aren't using
    XZ Embedded as part of Linux; information in that file is not
    repeated in this README.

Conformance to the .xz file format specification

    As of the .xz file format specification version 1.2.0, this
    decompressor implementation has the following limitations:

      - SHA-256 isn't supported.
        It can be ignored as an unsupported check type if that
        feature is enabled at build time.

      - Delta filter is not included.

      - BCJ filters don't support non-default start offset.

      - LZMA2 supports at most 3 GiB dictionary.

    There are a couple of corner cases where things have been simplified
    at the expense of detecting errors as early as possible. These should
    not matter in practice since they don't cause security issues, but
    it is good to know this if testing the code with the test files from
    XZ Utils.

Compiler requirements

    XZ Embedded should compile with any C99 or C11 compiler. The code
    should also still be GNU-C89 compatible. GNU-C89 was used in the
    Linux kernel until 2022. GNU-C89 support will likely be dropped
    at some point.

Embedding into userspace applications

    To embed the XZ decoder, copy the following files into a single
    directory in your source code tree:

        linux/include/linux/xz.h
        linux/lib/xz/xz_crc32.c
        linux/lib/xz/xz_dec_lzma2.c
        linux/lib/xz/xz_dec_stream.c
        linux/lib/xz/xz_lzma2.h
        linux/lib/xz/xz_private.h
        linux/lib/xz/xz_stream.h
        userspace/xz_config.h

    Alternatively, xz.h may be placed into a different directory but then
    that directory must be in the compiler include path when compiling
    the .c files.

    Your code should use only the functions declared in xz.h. The rest of
    the .h files are meant only for internal use in XZ Embedded.

    You may want to modify xz_config.h to be more suitable for your build
    environment. Probably you should at least skim through it even if the
    default file works as is.

Supporting concatenated .xz files

    Regular .xz files can be concatenated as is and the xz command line
    tool will decompress all streams from a concatenated file (a few
    other popular formats and tools support this too).
    Such .xz files are more common than one might think because pxz,
    an early threaded XZ compressor, created them.

    The xz_dec_run() function will stop after decompressing one stream.
    This is good when XZ data is stored inside some other file format.
    However, if one is decompressing regular standalone .xz files, one
    will want to decompress all streams in the file. This is easy with
    xz_dec_catrun(). To include support for xz_dec_catrun(), you need
    to #define XZ_DEC_CONCATENATED in xz_config.h or in compiler flags.

Integrity check support

    XZ Embedded always supports the integrity check types None and
    CRC32. Support for CRC64 is optional. SHA-256 is currently not
    supported in XZ Embedded although the .xz format does support it.
    The xz tool from XZ Utils uses CRC64 by default, but CRC32 is usually
    enough in embedded systems to keep the code size smaller.

    If you want support for CRC64, you need to copy linux/lib/xz/xz_crc64.c
    into your application, and #define XZ_USE_CRC64 in xz_config.h or in
    compiler flags.

    When using the internal CRC32 or CRC64, their lookup tables need to be
    initialized with xz_crc32_init() and xz_crc64_init(), respectively.
    See xz.h for details.
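
    To illustrate why a table initialization call is needed before any
    CRC is computed, here is a minimal self-contained sketch of a
    table-driven CRC32. It is not the code from xz_crc32.c: the names
    demo_crc32_init(), demo_crc32() and demo_crc32_table are hypothetical,
    and the function shape (buffer, size, running CRC) is an assumption
    that should be checked against the declarations in xz.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; this is an illustration, not xz_crc32.c. */
static uint32_t demo_crc32_table[256];

/* Fill the 256-entry lookup table for the reflected CRC-32 polynomial. */
void demo_crc32_init(void)
{
	const uint32_t poly = 0xEDB88320;
	uint32_t i, j, r;

	for (i = 0; i < 256; ++i) {
		r = i;
		for (j = 0; j < 8; ++j)
			r = (r >> 1) ^ ((r & 1) ? poly : 0);
		demo_crc32_table[i] = r;
	}
}

/*
 * Update a running CRC32 one byte at a time using the table.
 * Pass 0 as crc for the first chunk and the previous return value
 * for any later chunks of the same data.
 */
uint32_t demo_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
	crc = ~crc;
	while (size-- != 0)
		crc = demo_crc32_table[(*buf++ ^ crc) & 0xFF] ^ (crc >> 8);
	return ~crc;
}
```

    In this sketch, calling demo_crc32() before demo_crc32_init() would
    read an all-zero table and produce wrong values. After initialization,
    the nine ASCII bytes "123456789" give 0xCBF43926, the usual CRC-32
    check value.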

    To use external CRC32 or CRC64 code instead of the code from
    xz_crc32.c or xz_crc64.c, the following #defines may be used
    in xz_config.h or in compiler flags:

        #define XZ_INTERNAL_CRC32 0
        #define XZ_INTERNAL_CRC64 0

    Then it is up to you to provide compatible xz_crc32() or xz_crc64()
    functions.

    If the .xz file being decompressed uses an integrity check type that
    isn't supported by XZ Embedded, it is treated as an error and the
    file cannot be decompressed. For multi-call mode, this can be modified
    by #defining XZ_DEC_ANY_CHECK. Then xz_dec_run() will return
    XZ_UNSUPPORTED_CHECK when an unsupported check type is detected.
    After that, decompression can be continued normally except that the
    integrity check won't be verified. In single-call mode there's
    no way to continue decoding, so XZ_DEC_ANY_CHECK is almost useless
    in single-call mode.

BCJ filter support

    If you want support for one or more BCJ filters, you need to copy
    linux/lib/xz/xz_dec_bcj.c into your application, and use appropriate
    #defines in xz_config.h or in compiler flags. You don't need these
    #defines in the code that just uses XZ Embedded via xz.h, but having
    them always #defined doesn't hurt either.

        #define             Instruction set     BCJ filter endianness
        XZ_DEC_X86          x86-32 or x86-64    Little endian only
        XZ_DEC_POWERPC      PowerPC             Big endian only
        XZ_DEC_IA64         Itanium (IA-64)     Big or little endian
        XZ_DEC_ARM          ARM                 Little endian instructions
        XZ_DEC_ARMTHUMB     ARM-Thumb           Big or little endian
        XZ_DEC_ARM64        ARM64               Big or little endian
        XZ_DEC_SPARC        SPARC               Big or little endian
        XZ_DEC_RISCV        RISC-V              Big or little endian

    While some architectures are (partially) bi-endian, the endianness
    setting doesn't change the endianness of the instructions on all
    architectures. That's why many filters work for both big and little
    endian executables (Itanium and ARM based architectures have little
    endian instructions and SPARC has big endian instructions).

Notes about shared libraries

    If you are including XZ Embedded into a shared library, you should
    rename the xz_* functions to prevent symbol conflicts in case your
    library is linked against some other library or application that
    also has XZ Embedded in it (which may even be a different version
    of XZ Embedded).
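
    One possible way to do the renaming is with preprocessor macros
    that are seen before xz.h in every file compiled into the library.
    The sketch below is only an illustration: the file name
    mylib_xz_rename.h and the mylib_ prefix are hypothetical, and the
    list is not complete. It covers some of the functions declared in
    xz.h, but optional functions and non-static internal symbols would
    need the same treatment, so the exported symbols of the final
    library should be inspected (for example with nm) to see what
    remains.

```c
/*
 * mylib_xz_rename.h (hypothetical file name and prefix).
 * Must be included before xz.h in every file that goes into the
 * library, including the XZ Embedded .c files themselves.
 */
#ifndef MYLIB_XZ_RENAME_H
#define MYLIB_XZ_RENAME_H

#define xz_crc32_init mylib_xz_crc32_init
#define xz_crc32      mylib_xz_crc32
#define xz_dec_init   mylib_xz_dec_init
#define xz_dec_run    mylib_xz_dec_run
#define xz_dec_reset  mylib_xz_dec_reset
#define xz_dec_end    mylib_xz_dec_end

#endif
```

    Because the macros rewrite both the declarations and the callers,
    the rest of the code can keep using the names from xz.h unchanged.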

    Please don't create a shared library of XZ Embedded itself unless
    it is fine to rebuild everything depending on that shared library
    every time you upgrade to a newer version of XZ Embedded. There are
    no API or ABI stability guarantees between different versions of
    XZ Embedded.

Contact information

    Email:    Lasse Collin <[email protected]>
    IRC:      Larhzu on #tukaani on Libera Chat
    GitHub:   https://github.com/tukaani-project/xz-embedded