bzip2(1)                                                              bzip2(1)


NAME
       bzip2, bunzip2 - a block-sorting file compressor, v1.0.8
       bzcat - decompresses files to stdout
       bzip2recover - recovers data from damaged bzip2 files


SYNOPSIS
       bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ... ]
       bunzip2 [ -fkvsVL ] [ filenames ... ]
       bzcat [ -s ] [ filenames ... ]
       bzip2recover filename


DESCRIPTION
       bzip2 compresses files using the Burrows-Wheeler block sorting text
       compression algorithm, and Huffman coding.  Compression is generally
       considerably better than that achieved by more conventional
       LZ77/LZ78-based compressors, and approaches the performance of the PPM
       family of statistical compressors.

       The command-line options are deliberately very similar to those of GNU
       gzip, but they are not identical.

       bzip2 expects a list of file names to accompany the command-line
       flags.  Each file is replaced by a compressed version of itself, with
       the name "original_name.bz2".  Each compressed file has the same
       modification date, permissions, and, when possible, ownership as the
       corresponding original, so that these properties can be correctly
       restored at decompression time.
       File name handling is naive in the sense that there is no mechanism
       for preserving original file names, permissions, ownerships or dates
       in filesystems which lack these concepts, or have serious file name
       length restrictions, such as MS-DOS.

       bzip2 and bunzip2 will by default not overwrite existing files.  If
       you want this to happen, specify the -f flag.

       If no file names are specified, bzip2 compresses from standard input
       to standard output.  In this case, bzip2 will decline to write
       compressed output to a terminal, as this would be entirely
       incomprehensible and therefore pointless.

       bunzip2 (or bzip2 -d) decompresses all specified files.  Files which
       were not created by bzip2 will be detected and ignored, and a warning
       issued.  bzip2 attempts to guess the filename for the decompressed
       file from that of the compressed file as follows:

              filename.bz2    becomes   filename
              filename.bz     becomes   filename
              filename.tbz2   becomes   filename.tar
              filename.tbz    becomes   filename.tar
              anyothername    becomes   anyothername.out

       If the file does not end in one of the recognised endings, .bz2, .bz,
       .tbz2 or .tbz, bzip2 complains that it cannot guess the name of the
       original file, and uses the original name with .out appended.
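
       The suffix rules above can be sketched in a few lines of Python.
       This is an illustration only, not bzip2's own code: the names
       SUFFIX_MAP and guess_output_name are hypothetical, and the sketch
       ignores the warning that bzip2 prints when it falls back to .out.

```python
# Hypothetical helper mirroring the suffix table above; not part of bzip2.
SUFFIX_MAP = [
    (".tbz2", ".tar"),  # tar archives keep a .tar ending
    (".tbz", ".tar"),
    (".bz2", ""),       # plain compressed files just lose the suffix
    (".bz", ""),
]


def guess_output_name(compressed_name: str) -> str:
    """Return the output name implied by the suffix table."""
    for suffix, replacement in SUFFIX_MAP:
        if compressed_name.endswith(suffix):
            return compressed_name[: -len(suffix)] + replacement
    # No recognised ending: append .out to the original name.
    return compressed_name + ".out"


if __name__ == "__main__":
    for name in ("filename.bz2", "filename.tbz", "anyothername"):
        print(name, "becomes", guess_output_name(name))
```

       The list is checked in order, but none of the four suffixes is a
       tail of another, so the order does not change the result.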

       As with compression, supplying no filenames causes decompression from
       standard input to standard output.

       bunzip2 will correctly decompress a file which is the concatenation of
       two or more compressed files.  The result is the concatenation of the
       corresponding uncompressed files.  Integrity testing (-t) of
       concatenated compressed files is also supported.

       You can also compress or decompress files to the standard output by
       giving the -c flag.  Multiple files may be compressed and decompressed
       like this.  The resulting outputs are fed sequentially to stdout.
       Compression of multiple files in this manner generates a stream
       containing multiple compressed file representations.  Such a stream
       can be decompressed correctly only by bzip2 version 0.9.0 or later.
       Earlier versions of bzip2 will stop after decompressing the first file
       in the stream.

       bzcat (or bzip2 -dc) decompresses all specified files to the standard
       output.

       bzip2 will read arguments from the environment variables BZIP2 and
       BZIP, in that order, and will process them before any arguments read
       from the command line.  This gives a convenient way to supply default
       arguments.

       Compression is always performed, even if the compressed file is
       slightly larger than the original.
       Files of less than about one hundred bytes tend to get larger, since
       the compression mechanism has a constant overhead in the region of 50
       bytes.  Random data (including the output of most file compressors) is
       coded at about 8.05 bits per byte, giving an expansion of around 0.5%.

       As a self-check for your protection, bzip2 uses 32-bit CRCs to make
       sure that the decompressed version of a file is identical to the
       original.  This guards against corruption of the compressed data, and
       against undetected bugs in bzip2 (hopefully very unlikely).  The
       chance of data corruption going undetected is microscopic, about one
       in four billion for each file processed.  Be aware, though, that the
       check occurs upon decompression, so it can only tell you that
       something is wrong.  It can’t help you recover the original
       uncompressed data.  You can use bzip2recover to try to recover data
       from damaged files.

       Return values: 0 for a normal exit, 1 for environmental problems (file
       not found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt
       compressed file, 3 for an internal consistency error (e.g., a bug)
       which caused bzip2 to panic.


OPTIONS
       -c --stdout
              Compress or decompress to standard output.

       -d --decompress
              Force decompression.
              bzip2, bunzip2 and bzcat are really the same program, and the
              decision about what actions to take is done on the basis of
              which name is used.  This flag overrides that mechanism, and
              forces bzip2 to decompress.

       -z --compress
              The complement to -d: forces compression, regardless of the
              invocation name.

       -t --test
              Check integrity of the specified file(s), but don’t decompress
              them.  This really performs a trial decompression and throws
              away the result.

       -f --force
              Force overwrite of output files.  Normally, bzip2 will not
              overwrite existing output files.  Also forces bzip2 to break
              hard links to files, which it otherwise wouldn’t do.

              bzip2 normally declines to decompress files which don’t have
              the correct magic header bytes.  If forced (-f), however, it
              will pass such files through unmodified.  This is how GNU gzip
              behaves.

       -k --keep
              Keep (don’t delete) input files during compression or
              decompression.

       -s --small
              Reduce memory usage, for compression, decompression and
              testing.  Files are decompressed and tested using a modified
              algorithm which only requires 2.5 bytes per block byte.  This
              means any file can be decompressed in 2300k of memory, albeit
              at about half the normal speed.

              During compression, -s selects a block size of 200k, which
              limits memory use to around the same figure, at the expense of
              your compression ratio.  In short, if your machine is low on
              memory (8 megabytes or less), use -s for everything.  See
              MEMORY MANAGEMENT below.

       -q --quiet
              Suppress non-essential warning messages.  Messages pertaining
              to I/O errors and other critical events will not be suppressed.

       -v --verbose
              Verbose mode -- show the compression ratio for each file
              processed.  Further -v’s increase the verbosity level, spewing
              out lots of information which is primarily of interest for
              diagnostic purposes.

       -L --license -V --version
              Display the software version, license terms and conditions.

       -1 (or --fast) to -9 (or --best)
              Set the block size to 100 k, 200 k .. 900 k when compressing.
              Has no effect when decompressing.  See MEMORY MANAGEMENT below.
              The --fast and --best aliases are primarily for GNU gzip
              compatibility.  In particular, --fast doesn’t make things
              significantly faster.  And --best merely selects the default
              behaviour.

       --     Treats all subsequent arguments as file names, even if they
              start with a dash.
              This is so you can handle files with names beginning with a
              dash, for example: bzip2 -- -myfilename.

       --repetitive-fast --repetitive-best
              These flags are redundant in versions 0.9.5 and above.  They
              provided some coarse control over the behaviour of the sorting
              algorithm in earlier versions, which was sometimes useful.
              0.9.5 and above have an improved algorithm which renders these
              flags irrelevant.


MEMORY MANAGEMENT
       bzip2 compresses large files in blocks.  The block size affects both
       the compression ratio achieved, and the amount of memory needed for
       compression and decompression.  The flags -1 through -9 specify the
       block size to be 100,000 bytes through 900,000 bytes (the default)
       respectively.  At decompression time, the block size used for
       compression is read from the header of the compressed file, and
       bunzip2 then allocates itself just enough memory to decompress the
       file.  Since block sizes are stored in compressed files, it follows
       that the flags -1 to -9 are irrelevant to, and so ignored during,
       decompression.
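
       As a rough illustration of the block size being stored in the
       compressed file, the Python sketch below uses the standard bz2
       module, whose compresslevel argument (1 to 9) plays the role of the
       -1 to -9 flags.  It assumes the usual bzip2 stream layout, in which
       the first four bytes are "BZh" followed by the block-size digit;
       that layout is not described on this page, and block_size_digit is a
       hypothetical helper name.

```python
import bz2


def block_size_digit(compressed: bytes) -> int:
    """Return the block-size digit (1-9) from a bzip2 stream header.

    Assumes the stream starts with b"BZh" followed by an ASCII digit.
    """
    if len(compressed) < 4 or compressed[:3] != b"BZh":
        raise ValueError("not a bzip2 stream")
    digit = compressed[3] - ord("0")
    if not 1 <= digit <= 9:
        raise ValueError("invalid block-size digit in header")
    return digit


if __name__ == "__main__":
    data = b"example payload " * 64
    for level in (1, 5, 9):
        stream = bz2.compress(data, compresslevel=level)
        digit = block_size_digit(stream)
        # The page gives the block size as 100,000 bytes per level.
        print(f"level {level}: header digit {digit}, "
              f"block size {digit * 100_000} bytes")
        # Decompression takes no level; it is read from the stream itself.
        assert bz2.decompress(stream) == data
```

       Note that bz2.decompress has no level parameter at all, which
       matches the statement that -1 to -9 are ignored when decompressing.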

       Compression and decompression requirements, in bytes, can be
       estimated as:

              Compression:   400k + ( 8 x block size )

              Decompression: 100k + ( 4 x block size ), or
                             100k + ( 2.5 x block size )

       Larger block sizes give rapidly diminishing marginal returns.  Most of
       the compression comes from the first two or three hundred k of block
       size, a fact worth bearing in mind when using bzip2 on small machines.
       It is also important to appreciate that the decompression memory
       requirement is set at compression time by the choice of block size.

       For files compressed with the default 900k block size, bunzip2 will
       require about 3700 kbytes to decompress.  To support decompression of
       any file on a 4 megabyte machine, bunzip2 has an option to decompress
       using approximately half this amount of memory, about 2300 kbytes.
       Decompression speed is also halved, so you should use this option only
       where necessary.  The relevant flag is -s.

       In general, try to use the largest block size memory constraints
       allow, since that maximises the compression achieved.  Compression and
       decompression speed are virtually unaffected by block size.

       Another significant point applies to files which fit in a single block
       -- that means most files you’d encounter using a large block size.
       The amount of real memory touched is proportional to the size of the
       file, since the file is smaller than a block.  For example,
       compressing a file 20,000 bytes long with the flag -9 will cause the
       compressor to allocate around 7600k of memory, but only touch 400k +
       20000 * 8 = 560 kbytes of it.  Similarly, the decompressor will
       allocate 3700k but only touch 100k + 20000 * 4 = 180 kbytes.

       Here is a table which summarises the maximum memory usage for
       different block sizes.  Also recorded is the total compressed size for
       14 files of the Calgary Text Compression Corpus totalling 3,141,622
       bytes.  This column gives some feel for how compression varies with
       block size.  These figures tend to understate the advantage of larger
       block sizes for larger files, since the Corpus is dominated by smaller
       files.

                  Compress   Decompress   Decompress   Corpus
           Flag     usage      usage       -s usage     Size

            -1      1200k       500k         350k      914704
            -2      2000k       900k         600k      877703
            -3      2800k      1300k         850k      860338
            -4      3600k      1700k        1100k      846899
            -5      4400k      2100k        1350k      845160
            -6      5200k      2500k        1600k      838626
            -7      6100k      2900k        1850k      834096
            -8      6800k      3300k        2100k      828642
            -9      7600k      3700k        2350k      828642


RECOVERING DATA FROM DAMAGED FILES
       bzip2 compresses files in blocks, usually 900kbytes long.  Each block
       is handled independently.
       If a media or transmission error causes a multi-block .bz2 file to
       become damaged, it may be possible to recover data from the undamaged
       blocks in the file.

       The compressed representation of each block is delimited by a 48-bit
       pattern, which makes it possible to find the block boundaries with
       reasonable certainty.  Each block also carries its own 32-bit CRC, so
       damaged blocks can be distinguished from undamaged ones.

       bzip2recover is a simple program whose purpose is to search for blocks
       in .bz2 files, and write each block out into its own .bz2 file.  You
       can then use bzip2 -t to test the integrity of the resulting files,
       and decompress those which are undamaged.

       bzip2recover takes a single argument, the name of the damaged file,
       and writes a number of files "rec00001file.bz2", "rec00002file.bz2",
       etc, containing the extracted blocks.  The output filenames are
       designed so that the use of wildcards in subsequent processing -- for
       example, "bzip2 -dc rec*file.bz2 > recovered_data" -- processes the
       files in the correct order.

       bzip2recover should be of most use dealing with large .bz2 files, as
       these will contain many blocks.  It is clearly futile to use it on
       damaged single-block files, since a damaged block cannot be recovered.
       If you wish to minimise any potential data loss through media or
       transmission errors, you might consider compressing with a smaller
       block size.


PERFORMANCE NOTES
       The sorting phase of compression gathers together similar strings in
       the file.  Because of this, files containing very long runs of
       repeated symbols, like "aabaabaabaab ..." (repeated several hundred
       times) may compress more slowly than normal.  Versions 0.9.5 and above
       fare much better than previous versions in this respect.  The ratio
       between worst-case and average-case compression time is in the region
       of 10:1.  For previous versions, this figure was more like 100:1.  You
       can use the -vvvv option to monitor progress in great detail, if you
       want.

       Decompression speed is unaffected by these phenomena.

       bzip2 usually allocates several megabytes of memory to operate in, and
       then charges all over it in a fairly random fashion.  This means that
       performance, both for compressing and decompressing, is largely
       determined by the speed at which your machine can service cache
       misses.  Because of this, small changes to the code to reduce the miss
       rate have been observed to give disproportionately large performance
       improvements.  I imagine bzip2 will perform best on machines with very
       large caches.


CAVEATS
       I/O error messages are not as helpful as they could be.
       bzip2 tries hard to detect I/O errors and exit cleanly, but the
       details of what the problem is sometimes seem rather misleading.

       This manual page pertains to version 1.0.8 of bzip2.  Compressed data
       created by this version is entirely forwards and backwards compatible
       with the previous public releases, versions 0.1pl2, 0.9.0, 0.9.5,
       1.0.0, 1.0.1, 1.0.2 and above, but with the following exception: 0.9.0
       and above can correctly decompress multiple concatenated compressed
       files.  0.1pl2 cannot do this; it will stop after decompressing just
       the first file in the stream.

       bzip2recover versions prior to 1.0.2 used 32-bit integers to represent
       bit positions in compressed files, so they could not handle compressed
       files more than 512 megabytes long.  Versions 1.0.2 and above use
       64-bit ints on some platforms which support them (GNU supported
       targets, and Windows).  To establish whether or not bzip2recover was
       built with such a limitation, run it without arguments.  In any event
       you can build yourself an unlimited version if you can recompile it
       with MaybeUInt64 set to be an unsigned 64-bit integer.


AUTHOR
       Julian Seward, [email protected].

       https://sourceware.org/bzip2/

       The ideas embodied in bzip2 are due to (at least) the following
       people: Michael Burrows and David Wheeler (for the block sorting
       transformation), David Wheeler (again, for the Huffman coder), Peter
       Fenwick (for the structured coding model in the original bzip, and
       many refinements), and Alistair Moffat, Radford Neal and Ian Witten
       (for the arithmetic coder in the original bzip).  I am much indebted
       for their help, support and advice.  See the manual in the source
       distribution for pointers to sources of documentation.  Christian von
       Roques encouraged me to look for faster sorting algorithms, so as to
       speed up compression.  Bela Lubkin encouraged me to improve the
       worst-case compression performance.  Donna Robinson XMLised the
       documentation.  The bz* scripts are derived from those of GNU gzip.
       Many people sent patches, helped with portability problems, lent
       machines, gave advice and were generally helpful.