bzip2(1)                                                 bzip2(1)


NAME
       bzip2, bunzip2 - a block-sorting file compressor, v1.0.8
       bzcat - decompresses files to stdout
       bzip2recover - recovers data from damaged bzip2 files


SYNOPSIS
       bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ...  ]
       bunzip2 [ -fkvsVL ] [ filenames ...  ]
       bzcat [ -s ] [ filenames ...  ]
       bzip2recover filename


DESCRIPTION
       bzip2 compresses files using the Burrows-Wheeler block sorting
       text compression algorithm, and Huffman coding. Compression is
       generally considerably better than that achieved by more
       conventional LZ77/LZ78-based compressors, and approaches the
       performance of the PPM family of statistical compressors.

       The command-line options are deliberately very similar to
       those of GNU gzip, but they are not identical.

       bzip2 expects a list of file names to accompany the
       command-line flags. Each file is replaced by a compressed
       version of itself, with the name "original_name.bz2". Each
       compressed file has the same modification date, permissions,
       and, when possible, ownership as the corresponding original,
       so that these properties can be correctly restored at
       decompression time. File name handling is naive in the sense
       that there is no mechanism for preserving original file names,
       permissions, ownerships or dates in filesystems which lack
       these concepts, or have serious file name length restrictions,
       such as MS-DOS.

       bzip2 and bunzip2 will by default not overwrite existing
       files. If you want this to happen, specify the -f flag.

       If no file names are specified, bzip2 compresses from standard
       input to standard output. In this case, bzip2 will decline to
       write compressed output to a terminal, as this would be
       entirely incomprehensible and therefore pointless.

       bunzip2 (or bzip2 -d) decompresses all specified files. Files
       which were not created by bzip2 will be detected and ignored,
       and a warning issued. bzip2 attempts to guess the filename for
       the decompressed file from that of the compressed file as
       follows:

              filename.bz2    becomes   filename
              filename.bz     becomes   filename
              filename.tbz2   becomes   filename.tar
              filename.tbz    becomes   filename.tar
              anyothername    becomes   anyothername.out

       If the file does not end in one of the recognised endings,
       .bz2, .bz, .tbz2 or .tbz, bzip2 complains that it cannot guess
       the name of the original file, and uses the original name with
       .out appended.
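
       The renaming rules above can be sketched in a few lines of
       Python. This is only an illustration of the table, not the
       code bzip2 itself uses; the names SUFFIX_RULES and
       guess_output_name are hypothetical.

```python
# Suffix rules copied from the table above.
# SUFFIX_RULES and guess_output_name are hypothetical names for this sketch.
SUFFIX_RULES = [
    (".bz2", ""),
    (".bz", ""),
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
]


def guess_output_name(compressed_name: str) -> str:
    """Return the decompressed file name implied by the table."""
    for suffix, replacement in SUFFIX_RULES:
        if compressed_name.endswith(suffix):
            # Drop the recognised ending and append its replacement, if any.
            return compressed_name[: -len(suffix)] + replacement
    # Unrecognised ending: keep the whole name and append ".out".
    return compressed_name + ".out"
```

       For instance, guess_output_name("backup.tbz2") yields
       "backup.tar", while a name with no recognised ending simply
       gains ".out".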

       As with compression, supplying no filenames causes
       decompression from standard input to standard output.

       bunzip2 will correctly decompress a file which is the
       concatenation of two or more compressed files. The result is
       the concatenation of the corresponding uncompressed files.
       Integrity testing (-t) of concatenated compressed files is
       also supported.
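
       The same property can be observed without the command-line
       tools. The sketch below uses the bz2 module from Python's
       standard library (an assumption of this example, not part of
       the bzip2 distribution) to join two compressed streams and
       decompress them in one pass.

```python
import bz2

first = b"first file\n"
second = b"second file\n"

# Compress each input separately, then join the two compressed streams,
# as "cat a.bz2 b.bz2" would.
concatenated = bz2.compress(first) + bz2.compress(second)

# Decompressing the joined streams yields the joined original data.
restored = bz2.decompress(concatenated)
assert restored == first + second
```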

       You can also compress or decompress files to the standard
       output by giving the -c flag. Multiple files may be compressed
       and decompressed like this. The resulting outputs are fed
       sequentially to stdout. Compression of multiple files in this
       manner generates a stream containing multiple compressed file
       representations. Such a stream can be decompressed correctly
       only by bzip2 version 0.9.0 or later. Earlier versions of
       bzip2 will stop after decompressing the first file in the
       stream.

       bzcat (or bzip2 -dc) decompresses all specified files to the
       standard output.

       bzip2 will read arguments from the environment variables BZIP2
       and BZIP, in that order, and will process them before any
       arguments read from the command line. This gives a convenient
       way to supply default arguments.

       Compression is always performed, even if the compressed file
       is slightly larger than the original. Files of less than about
       one hundred bytes tend to get larger, since the compression
       mechanism has a constant overhead in the region of 50 bytes.
       Random data (including the output of most file compressors) is
       coded at about 8.05 bits per byte, giving an expansion of
       around 0.5%.
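
       Both effects are easy to reproduce. The following sketch
       assumes Python's standard bz2 module as a stand-in for the
       bzip2 program; it compresses a very small input and a block
       of random bytes and compares sizes. Exact sizes vary, so none
       are quoted here.

```python
import bz2
import os

# A tiny input: the fixed overhead outweighs any saving.
tiny = b"hello"
tiny_compressed = bz2.compress(tiny)
tiny_grew = len(tiny_compressed) > len(tiny)

# Random bytes are essentially incompressible, so the output is
# slightly larger than the input.
random_data = os.urandom(200_000)
expansion = len(bz2.compress(random_data)) / len(random_data)

print("tiny input grew:", tiny_grew)
print("random data expansion factor: %.4f" % expansion)
```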

       As a self-check for your protection, bzip2 uses 32-bit CRCs to
       make sure that the decompressed version of a file is identical
       to the original. This guards against corruption of the
       compressed data, and against undetected bugs in bzip2
       (hopefully very unlikely). The chance of data corruption going
       undetected is microscopic, about one in four billion for each
       file processed. Be aware, though, that the check occurs upon
       decompression, so it can only tell you that something is
       wrong. It can't help you recover the original uncompressed
       data. You can use bzip2recover to try to recover data from
       damaged files.

       Return values: 0 for a normal exit, 1 for environmental
       problems (file not found, invalid flags, I/O errors, etc.), 2
       to indicate a corrupt compressed file, 3 for an internal
       consistency error (e.g., a bug) which caused bzip2 to panic.
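
       A calling script might map these return values to messages
       along the following lines. This is a sketch only: the names
       EXIT_MEANINGS, describe_exit and check_archive are
       hypothetical, and check_archive assumes a bzip2 executable is
       available on the PATH.

```python
import subprocess

# Meanings taken from the "Return values" paragraph above.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error)",
    2: "corrupt compressed file",
    3: "internal consistency error",
}


def describe_exit(code: int) -> str:
    """Translate a bzip2 exit status into a short description."""
    return EXIT_MEANINGS.get(code, "undocumented exit status")


def check_archive(path: str) -> str:
    """Run 'bzip2 -t' on path and describe the result (needs bzip2 installed)."""
    result = subprocess.run(["bzip2", "-t", path])
    return describe_exit(result.returncode)
```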


OPTIONS
       -c --stdout
              Compress or decompress to standard output.

       -d --decompress
              Force decompression. bzip2, bunzip2 and bzcat are
              really the same program, and the decision about what
              action to take is made on the basis of which name is
              used. This flag overrides that mechanism, and forces
              bzip2 to decompress.

       -z --compress
              The complement to -d: forces compression, regardless of
              the invocation name.

       -t --test
              Check integrity of the specified file(s), but don't
              decompress them. This really performs a trial
              decompression and throws away the result.

       -f --force
              Force overwrite of output files. Normally, bzip2 will
              not overwrite existing output files. Also forces bzip2
              to break hard links to files, which it otherwise
              wouldn't do.

              bzip2 normally declines to decompress files which don't
              have the correct magic header bytes. If forced (-f),
              however, it will pass such files through unmodified.
              This is how GNU gzip behaves.

       -k --keep
              Keep (don't delete) input files during compression or
              decompression.

       -s --small
              Reduce memory usage, for compression, decompression and
              testing. Files are decompressed and tested using a
              modified algorithm which only requires 2.5 bytes per
              block byte. This means any file can be decompressed in
              2300k of memory, albeit at about half the normal speed.

              During compression, -s selects a block size of 200k,
              which limits memory use to around the same figure, at
              the expense of your compression ratio. In short, if
              your machine is low on memory (8 megabytes or less),
              use -s for everything. See MEMORY MANAGEMENT below.

       -q --quiet
              Suppress non-essential warning messages. Messages
              pertaining to I/O errors and other critical events will
              not be suppressed.

       -v --verbose
              Verbose mode -- show the compression ratio for each
              file processed. Further -v's increase the verbosity
              level, spewing out lots of information which is
              primarily of interest for diagnostic purposes.

       -L --license -V --version
              Display the software version, license terms and
              conditions.

       -1 (or --fast) to -9 (or --best)
              Set the block size to 100 k, 200 k .. 900 k when
              compressing. Has no effect when decompressing. See
              MEMORY MANAGEMENT below. The --fast and --best aliases
              are primarily for GNU gzip compatibility. In
              particular, --fast doesn't make things significantly
              faster. And --best merely selects the default
              behaviour.

       --     Treats all subsequent arguments as file names, even if
              they start with a dash. This is so you can handle files
              with names beginning with a dash, for example:
              bzip2 -- -myfilename.

       --repetitive-fast --repetitive-best
              These flags are redundant in versions 0.9.5 and above.
              They provided some coarse control over the behaviour of
              the sorting algorithm in earlier versions, which was
              sometimes useful. 0.9.5 and above have an improved
              algorithm which renders these flags irrelevant.


MEMORY MANAGEMENT
       bzip2 compresses large files in blocks. The block size affects
       both the compression ratio achieved, and the amount of memory
       needed for compression and decompression. The flags -1 through
       -9 specify the block size to be 100,000 bytes through 900,000
       bytes (the default) respectively. At decompression time, the
       block size used for compression is read from the header of the
       compressed file, and bunzip2 then allocates itself just enough
       memory to decompress the file. Since block sizes are stored in
       compressed files, it follows that the flags -1 to -9 are
       irrelevant to and so ignored during decompression.

       Compression and decompression requirements, in bytes, can be
       estimated as:

              Compression:   400k + ( 8 x block size )

              Decompression: 100k + ( 4 x block size ), or
                             100k + ( 2.5 x block size )
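
       These estimates translate directly into a small calculator.
       The sketch below assumes that "k" means 1000 bytes, in line
       with the 100,000-byte block sizes quoted above, and that the
       2.5x variant corresponds to the -s flag; the function names
       are hypothetical.

```python
def block_size_bytes(level: int) -> int:
    """Block size selected by flags -1 .. -9 (100,000 bytes per level)."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    return level * 100_000


def compress_memory(level: int) -> int:
    """Estimated compression memory: 400k + 8 x block size."""
    return 400_000 + 8 * block_size_bytes(level)


def decompress_memory(level: int, small: bool = False) -> int:
    """Estimated decompression memory: 100k + 4 x block size,
    or 100k + 2.5 x block size when the reduced-memory mode is assumed."""
    factor = 2.5 if small else 4
    return int(100_000 + factor * block_size_bytes(level))
```

       For the default level 9 this gives 7,600,000 bytes for
       compression and 3,700,000 bytes (or 2,350,000 in the
       reduced-memory mode) for decompression. The formulas are
       estimates; for -7 they give 6000k for compression, whereas
       the table later in this section lists 6100k.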

       Larger block sizes give rapidly diminishing marginal returns.
       Most of the compression comes from the first two or three
       hundred k of block size, a fact worth bearing in mind when
       using bzip2 on small machines. It is also important to
       appreciate that the decompression memory requirement is set at
       compression time by the choice of block size.

       For files compressed with the default 900k block size, bunzip2
       will require about 3700 kbytes to decompress. To support
       decompression of any file on a 4 megabyte machine, bunzip2 has
       an option to decompress using approximately half this amount
       of memory, about 2300 kbytes. Decompression speed is also
       halved, so you should use this option only where necessary.
       The relevant flag is -s.

       In general, try to use the largest block size memory
       constraints allow, since that maximises the compression
       achieved. Compression and decompression speed are virtually
       unaffected by block size.

       Another significant point applies to files which fit in a
       single block -- that means most files you'd encounter using a
       large block size. The amount of real memory touched is
       proportional to the size of the file, since the file is
       smaller than a block. For example, compressing a file 20,000
       bytes long with the flag -9 will cause the compressor to
       allocate around 7600k of memory, but only touch
       400k + 20000 * 8 = 560 kbytes of it. Similarly, the
       decompressor will allocate 3700k but only touch
       100k + 20000 * 4 = 180 kbytes.
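
       The arithmetic in the example above can be checked with a
       short sketch. It assumes "k" means 1000 bytes and that the
       memory touched depends on the smaller of the file size and
       the block size; the function names are hypothetical.

```python
def touched_when_compressing(file_size: int, block_size: int = 900_000) -> int:
    """Memory actually touched: 400k + 8 x (bytes held in one block)."""
    return 400_000 + 8 * min(file_size, block_size)


def touched_when_decompressing(file_size: int, block_size: int = 900_000) -> int:
    """Memory actually touched: 100k + 4 x (bytes held in one block)."""
    return 100_000 + 4 * min(file_size, block_size)


# The 20,000-byte file from the text, compressed with -9.
print(touched_when_compressing(20_000))    # 560000
print(touched_when_decompressing(20_000))  # 180000
```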

       Here is a table which summarises the maximum memory usage for
       different block sizes. Also recorded is the total compressed
       size for 14 files of the Calgary Text Compression Corpus
       totalling 3,141,622 bytes. This column gives some feel for how
       compression varies with block size. These figures tend to
       understate the advantage of larger block sizes for larger
       files, since the Corpus is dominated by smaller files.

                  Compress   Decompress   Decompress   Corpus
           Flag     usage      usage       -s usage     Size

            -1      1200k       500k         350k      914704
            -2      2000k       900k         600k      877703
            -3      2800k      1300k         850k      860338
            -4      3600k      1700k        1100k      846899
            -5      4400k      2100k        1350k      845160
            -6      5200k      2500k        1600k      838626
            -7      6100k      2900k        1850k      834096
            -8      6800k      3300k        2100k      828642
            -9      7600k      3700k        2350k      828642


RECOVERING DATA FROM DAMAGED FILES
       bzip2 compresses files in blocks, usually 900 kbytes long.
       Each block is handled independently. If a media or
       transmission error causes a multi-block .bz2 file to become
       damaged, it may be possible to recover data from the undamaged
       blocks in the file.

       The compressed representation of each block is delimited by a
       48-bit pattern, which makes it possible to find the block
       boundaries with reasonable certainty. Each block also carries
       its own 32-bit CRC, so damaged blocks can be distinguished
       from undamaged ones.
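
       The delimiter can be seen at the start of any compressed
       stream. The sketch below assumes Python's standard bz2 module
       and the widely documented bzip2 block-header value
       0x314159265359, which this page does not spell out. Only the
       first block is inspected, because later blocks need not start
       on a byte boundary.

```python
import bz2

# Block-header pattern (48 bits). The value is an assumption drawn from
# the bzip2 file format, not from this manual page.
BLOCK_MAGIC = bytes.fromhex("314159265359")

stream = bz2.compress(b"example data", 1)

# Stream header: "BZh" followed by the block-size digit chosen at
# compression time (here "1" for 100k blocks).
has_stream_header = stream[:3] == b"BZh"
level_digit = stream[3:4]

# The first block begins immediately after the 4-byte stream header.
first_block_found = stream[4:10] == BLOCK_MAGIC

print(has_stream_header, level_digit, first_block_found)
```

       The level digit in the header is also how the decompressor
       learns the block size that was used for compression.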

       bzip2recover is a simple program whose purpose is to search
       for blocks in .bz2 files, and write each block out into its
       own .bz2 file. You can then use bzip2 -t to test the integrity
       of the resulting files, and decompress those which are
       undamaged.

       bzip2recover takes a single argument, the name of the damaged
       file, and writes a number of files "rec00001file.bz2",
       "rec00002file.bz2", etc, containing the extracted blocks. The
       output filenames are designed so that the use of wildcards in
       subsequent processing -- for example,
       "bzip2 -dc rec*file.bz2 > recovered_data" -- processes the
       files in the correct order.

       bzip2recover should be of most use dealing with large .bz2
       files, as these will contain many blocks. It is clearly futile
       to use it on damaged single-block files, since a damaged block
       cannot be recovered. If you wish to minimise any potential
       data loss through media or transmission errors, you might
       consider compressing with a smaller block size.


PERFORMANCE NOTES
       The sorting phase of compression gathers together similar
       strings in the file. Because of this, files containing very
       long runs of repeated symbols, like "aabaabaabaab ..."
       (repeated several hundred times) may compress more slowly than
       normal. Versions 0.9.5 and above fare much better than
       previous versions in this respect. The ratio between
       worst-case and average-case compression time is in the region
       of 10:1. For previous versions, this figure was more like
       100:1. You can use the -vvvv option to monitor progress in
       great detail, if you want.

       Decompression speed is unaffected by these phenomena.

       bzip2 usually allocates several megabytes of memory to operate
       in, and then charges all over it in a fairly random fashion.
       This means that performance, both for compressing and
       decompressing, is largely determined by the speed at which
       your machine can service cache misses. Because of this, small
       changes to the code to reduce the miss rate have been observed
       to give disproportionately large performance improvements. I
       imagine bzip2 will perform best on machines with very large
       caches.


CAVEATS
       I/O error messages are not as helpful as they could be. bzip2
       tries hard to detect I/O errors and exit cleanly, but the
       details of what the problem is sometimes seem rather
       misleading.

       This manual page pertains to version 1.0.8 of bzip2.
       Compressed data created by this version is entirely forwards
       and backwards compatible with the previous public releases,
       versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1, 1.0.2 and above,
       but with the following exception: 0.9.0 and above can
       correctly decompress multiple concatenated compressed files.
       0.1pl2 cannot do this; it will stop after decompressing just
       the first file in the stream.

       bzip2recover versions prior to 1.0.2 used 32-bit integers to
       represent bit positions in compressed files, so they could not
       handle compressed files more than 512 megabytes long. Versions
       1.0.2 and above use 64-bit ints on some platforms which
       support them (GNU supported targets, and Windows). To
       establish whether or not bzip2recover was built with such a
       limitation, run it without arguments. In any event you can
       build yourself an unlimited version if you can recompile it
       with MaybeUInt64 set to be an unsigned 64-bit integer.
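
       The 512 megabyte figure follows from the width of the
       counter: a 32-bit integer can number 2^32 distinct bit
       positions, and there are 8 bits in a byte. As a quick check,
       assuming "megabyte" here means 2^20 bytes:

```python
# A 32-bit unsigned counter can address this many bit positions.
addressable_bits = 2 ** 32

# Eight bits per byte gives the largest file size that can be indexed.
addressable_bytes = addressable_bits // 8

megabytes = addressable_bytes // (1024 * 1024)
print(megabytes)  # 512
```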


AUTHOR
       Julian Seward, [email protected].

       https://sourceware.org/bzip2/

       The ideas embodied in bzip2 are due to (at least) the
       following people: Michael Burrows and David Wheeler (for the
       block sorting transformation), David Wheeler (again, for the
       Huffman coder), Peter Fenwick (for the structured coding model
       in the original bzip, and many refinements), and Alistair
       Moffat, Radford Neal and Ian Witten (for the arithmetic coder
       in the original bzip). I am much indebted for their help,
       support and advice. See the manual in the source distribution
       for pointers to sources of documentation. Christian von Roques
       encouraged me to look for faster sorting algorithms, so as to
       speed up compression. Bela Lubkin encouraged me to improve the
       worst-case compression performance. Donna Robinson XMLised the
       documentation. The bz* scripts are derived from those of GNU
       gzip. Many people sent patches, helped with portability
       problems, lent machines, gave advice and were generally
       helpful.