<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"[

<!-- various strings, dates etc. common to all docs -->
<!ENTITY % common-ents SYSTEM "entities.xml"> %common-ents;
]>

<book lang="en" id="userman" xreflabel="bzip2 Manual">

 <bookinfo>
  <title>bzip2 and libbzip2, version &bz-version;</title>
  <subtitle>A program and library for data compression</subtitle>
  <copyright>
   <year>&bz-lifespan;</year>
   <holder>Julian Seward</holder>
  </copyright>
  <releaseinfo>Version &bz-version; of &bz-date;</releaseinfo>

  <authorgroup>
   <author>
    <firstname>Julian</firstname>
    <surname>Seward</surname>
    <affiliation>
     <orgname>&bz-url;</orgname>
    </affiliation>
   </author>
  </authorgroup>

  <legalnotice id="legal">

  <para>This program, <computeroutput>bzip2</computeroutput>, the
  associated library <computeroutput>libbzip2</computeroutput>, and
  all documentation, are copyright © &bz-lifespan; Julian Seward.
  All rights reserved.</para>

  <para>Redistribution and use in source and binary forms, with
  or without modification, are permitted provided that the
  following conditions are met:</para>

  <itemizedlist mark='bullet'>

   <listitem><para>Redistributions of source code must retain the
   above copyright notice, this list of conditions and the
   following disclaimer.</para></listitem>

   <listitem><para>The origin of this software must not be
   misrepresented; you must not claim that you wrote the original
   software. If you use this software in a product, an
   acknowledgment in the product documentation would be
   appreciated but is not required.</para></listitem>

   <listitem><para>Altered source versions must be plainly marked
   as such, and must not be misrepresented as being the original
   software.</para></listitem>

   <listitem><para>The name of the author may not be used to
   endorse or promote products derived from this software without
   specific prior written permission.</para></listitem>

  </itemizedlist>

  <para>THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY
  EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
  THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  PARTICULAR PURPOSE ARE DISCLAIMED.
  IN NO EVENT SHALL THE
  AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
  EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
  TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
  IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  THE POSSIBILITY OF SUCH DAMAGE.</para>

  <para>PATENTS: To the best of my knowledge,
  <computeroutput>bzip2</computeroutput> and
  <computeroutput>libbzip2</computeroutput> do not use any patented
  algorithms. However, I do not have the resources to carry
  out a patent search. Therefore I cannot give any guarantee of
  the above statement.
  </para>

</legalnotice>

</bookinfo>



<chapter id="intro" xreflabel="Introduction">
<title>Introduction</title>

<para><computeroutput>bzip2</computeroutput> compresses files
using the Burrows-Wheeler block-sorting text compression
algorithm, and Huffman coding.
Compression is generally
considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of
the PPM family of statistical compressors.</para>

<para><computeroutput>bzip2</computeroutput> is built on top of
<computeroutput>libbzip2</computeroutput>, a flexible library for
handling compressed data in the
<computeroutput>bzip2</computeroutput> format. This manual
describes both how to use the program and how to work with the
library interface. Most of the manual is devoted to this
library, not the program, which is good news if your interest is
only in the program.</para>

<itemizedlist mark='bullet'>

 <listitem><para><xref linkend="using"/> describes how to use
 <computeroutput>bzip2</computeroutput>; this is the only part
 you need to read if you just want to know how to operate the
 program.</para></listitem>

 <listitem><para><xref linkend="libprog"/> describes the
 programming interfaces in detail, and</para></listitem>

 <listitem><para><xref linkend="misc"/> records some
 miscellaneous notes which I thought ought to be recorded
 somewhere.</para></listitem>

</itemizedlist>

</chapter>


<chapter id="using" xreflabel="How to use bzip2">
<title>How to use bzip2</title>

<para>This chapter contains a copy of the
<computeroutput>bzip2</computeroutput> man page, and nothing
else.</para>

<sect1 id="name" xreflabel="NAME">
<title>NAME</title>

<itemizedlist mark='bullet'>

 <listitem><para><computeroutput>bzip2</computeroutput>,
  <computeroutput>bunzip2</computeroutput> - a block-sorting file
  compressor, v&bz-version;</para></listitem>

 <listitem><para><computeroutput>bzcat</computeroutput> -
  decompresses files to stdout</para></listitem>

 <listitem><para><computeroutput>bzip2recover</computeroutput> -
  recovers data from damaged bzip2 files</para></listitem>

</itemizedlist>

</sect1>


<sect1 id="synopsis" xreflabel="SYNOPSIS">
<title>SYNOPSIS</title>

<itemizedlist mark='bullet'>

 <listitem><para><computeroutput>bzip2</computeroutput> [
  -cdfkqstvzVL123456789 ] [ filenames ... ]</para></listitem>

 <listitem><para><computeroutput>bunzip2</computeroutput> [
  -fkvsVL ] [ filenames ... ]</para></listitem>

 <listitem><para><computeroutput>bzcat</computeroutput> [ -s ] [
  filenames ...
  ]</para></listitem>

 <listitem><para><computeroutput>bzip2recover</computeroutput>
  filename</para></listitem>

</itemizedlist>

</sect1>


<sect1 id="description" xreflabel="DESCRIPTION">
<title>DESCRIPTION</title>

<para><computeroutput>bzip2</computeroutput> compresses files
using the Burrows-Wheeler block sorting text compression
algorithm, and Huffman coding. Compression is generally
considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of
the PPM family of statistical compressors.</para>

<para>The command-line options are deliberately very similar to
those of GNU <computeroutput>gzip</computeroutput>, but they are
not identical.</para>

<para><computeroutput>bzip2</computeroutput> expects a list of
file names to accompany the command-line flags. Each file is
replaced by a compressed version of itself, with the name
<computeroutput>original_name.bz2</computeroutput>. Each
compressed file has the same modification date, permissions, and,
when possible, ownership as the corresponding original, so that
these properties can be correctly restored at decompression time.
File name handling is naive in the sense that there is no
mechanism for preserving original file names, permissions,
ownerships or dates in filesystems which lack these concepts, or
have serious file name length restrictions, such as
MS-DOS.</para>

<para><computeroutput>bzip2</computeroutput> and
<computeroutput>bunzip2</computeroutput> will by default not
overwrite existing files. If you want this to happen, specify
the <computeroutput>-f</computeroutput> flag.</para>

<para>If no file names are specified,
<computeroutput>bzip2</computeroutput> compresses from standard
input to standard output. In this case,
<computeroutput>bzip2</computeroutput> will decline to write
compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.</para>

<para><computeroutput>bunzip2</computeroutput> (or
<computeroutput>bzip2 -d</computeroutput>) decompresses all
specified files. Files which were not created by
<computeroutput>bzip2</computeroutput> will be detected and
ignored, and a warning issued.
<computeroutput>bzip2</computeroutput> attempts to guess the
filename for the decompressed file from that of the compressed
file as follows:</para>

<itemizedlist mark='bullet'>

 <listitem><para><computeroutput>filename.bz2 </computeroutput>
  becomes
  <computeroutput>filename</computeroutput></para></listitem>

 <listitem><para><computeroutput>filename.bz </computeroutput>
  becomes
  <computeroutput>filename</computeroutput></para></listitem>

 <listitem><para><computeroutput>filename.tbz2</computeroutput>
  becomes
  <computeroutput>filename.tar</computeroutput></para></listitem>

 <listitem><para><computeroutput>filename.tbz </computeroutput>
  becomes
  <computeroutput>filename.tar</computeroutput></para></listitem>

 <listitem><para><computeroutput>anyothername </computeroutput>
  becomes
  <computeroutput>anyothername.out</computeroutput></para></listitem>

</itemizedlist>

<para>If the file does not end in one of the recognised endings,
<computeroutput>.bz2</computeroutput>,
<computeroutput>.bz</computeroutput>,
<computeroutput>.tbz2</computeroutput> or
<computeroutput>.tbz</computeroutput>,
<computeroutput>bzip2</computeroutput> complains that it cannot
guess the name of the original file, and uses the original name
with <computeroutput>.out</computeroutput> appended.</para>
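
<para>The naming rules above can be sketched in a few lines of
Python. This is an illustration only: the names
<computeroutput>guess_output_name</computeroutput> and
<computeroutput>SUFFIX_MAP</computeroutput> are assumptions made
for this example and are not part of
<computeroutput>bzip2</computeroutput> or
<computeroutput>libbzip2</computeroutput>, which are written in
C.</para>

<programlisting>
```python
# Hypothetical helper, not part of bzip2: maps a compressed file name to
# the output name described by the suffix rules in this section.
SUFFIX_MAP = (
    (".bz2", ""),
    (".bz", ""),
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
)


def guess_output_name(name):
    """Return the decompressed file name the manual says is chosen."""
    for suffix, replacement in SUFFIX_MAP:
        if name.endswith(suffix):
            # Strip the recognised ending and add its replacement, if any.
            return name[:-len(suffix)] + replacement
    # No recognised ending: the manual says ".out" is appended.
    return name + ".out"


if __name__ == "__main__":
    for example in ("notes.bz2", "notes.bz", "src.tbz2", "src.tbz", "data.bin"):
        print(example, guess_output_name(example))
```
</programlisting>

<para>The sketch only models the name mapping. It does not
reproduce the warning that <computeroutput>bzip2</computeroutput>
issues when it cannot guess the original name.</para>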

<para>As with compression, supplying no filenames causes
decompression from standard input to standard output.</para>

<para><computeroutput>bunzip2</computeroutput> will correctly
decompress a file which is the concatenation of two or more
compressed files. The result is the concatenation of the
corresponding uncompressed files. Integrity testing
(<computeroutput>-t</computeroutput>) of concatenated compressed
files is also supported.</para>

<para>You can also compress or decompress files to the standard
output by giving the <computeroutput>-c</computeroutput> flag.
Multiple files may be compressed and decompressed like this. The
resulting outputs are fed sequentially to stdout. Compression of
multiple files in this manner generates a stream containing
multiple compressed file representations. Such a stream can be
decompressed correctly only by
<computeroutput>bzip2</computeroutput> version 0.9.0 or later.
Earlier versions of <computeroutput>bzip2</computeroutput> will
stop after decompressing the first file in the stream.</para>

<para><computeroutput>bzcat</computeroutput> (or
<computeroutput>bzip2 -dc</computeroutput>) decompresses all
specified files to the standard output.</para>

<para><computeroutput>bzip2</computeroutput> will read arguments
from the environment variables
<computeroutput>BZIP2</computeroutput> and
<computeroutput>BZIP</computeroutput>, in that order, and will
process them before any arguments read from the command line.
This gives a convenient way to supply default arguments.</para>

<para>Compression is always performed, even if the compressed
file is slightly larger than the original. Files of less than
about one hundred bytes tend to get larger, since the compression
mechanism has a constant overhead in the region of 50 bytes.
Random data (including the output of most file compressors) is
coded at about 8.05 bits per byte, giving an expansion of around
0.5%.</para>

<para>As a self-check for your protection,
<computeroutput>bzip2</computeroutput> uses 32-bit CRCs to make
sure that the decompressed version of a file is identical to the
original. This guards against corruption of the compressed data,
and against undetected bugs in
<computeroutput>bzip2</computeroutput> (hopefully very unlikely).
The chance of data corruption going undetected is microscopic,
about one chance in four billion for each file processed. Be
aware, though, that the check occurs upon decompression, so it
can only tell you that something is wrong. It can't help you
recover the original uncompressed data. You can use
<computeroutput>bzip2recover</computeroutput> to try to recover
data from damaged files.</para>

<para>Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, etc.), 2
to indicate a corrupt compressed file, 3 for an internal
consistency error (e.g. a bug) which caused
<computeroutput>bzip2</computeroutput> to panic.</para>

</sect1>


<sect1 id="options" xreflabel="OPTIONS">
<title>OPTIONS</title>

<variablelist>

 <varlistentry>
  <term><computeroutput>-c --stdout</computeroutput></term>
  <listitem><para>Compress or decompress to standard
   output.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-d --decompress</computeroutput></term>
  <listitem><para>Force decompression.
   <computeroutput>bzip2</computeroutput>,
   <computeroutput>bunzip2</computeroutput> and
   <computeroutput>bzcat</computeroutput> are really the same
   program, and the decision about what actions to take is done on
   the basis of which name is used.
   This flag overrides that
   mechanism, and forces bzip2 to decompress.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-z --compress</computeroutput></term>
  <listitem><para>The complement to
   <computeroutput>-d</computeroutput>: forces compression,
   regardless of the invocation name.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-t --test</computeroutput></term>
  <listitem><para>Check integrity of the specified file(s), but
   don't decompress them. This really performs a trial
   decompression and throws away the result.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-f --force</computeroutput></term>
  <listitem><para>Force overwrite of output files. Normally,
   <computeroutput>bzip2</computeroutput> will not overwrite
   existing output files. Also forces
   <computeroutput>bzip2</computeroutput> to break hard links to
   files, which it otherwise wouldn't do.</para>
   <para><computeroutput>bzip2</computeroutput> normally declines
   to decompress files which don't have the correct magic header
   bytes. If forced (<computeroutput>-f</computeroutput>),
   however, it will pass such files through unmodified.
   This is
   how GNU <computeroutput>gzip</computeroutput> behaves.</para>
  </listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-k --keep</computeroutput></term>
  <listitem><para>Keep (don't delete) input files during
   compression or decompression.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-s --small</computeroutput></term>
  <listitem><para>Reduce memory usage, for compression,
   decompression and testing. Files are decompressed and tested
   using a modified algorithm which only requires 2.5 bytes per
   block byte. This means any file can be decompressed in 2300k
   of memory, albeit at about half the normal speed.</para>
   <para>During compression, <computeroutput>-s</computeroutput>
   selects a block size of 200k, which limits memory use to around
   the same figure, at the expense of your compression ratio. In
   short, if your machine is low on memory (8 megabytes or less),
   use <computeroutput>-s</computeroutput> for everything. See
   <xref linkend="memory-management"/> below.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-q --quiet</computeroutput></term>
  <listitem><para>Suppress non-essential warning messages.
   Messages pertaining to I/O errors and other critical events
   will not be suppressed.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-v --verbose</computeroutput></term>
  <listitem><para>Verbose mode -- show the compression ratio for
   each file processed. Further
   <computeroutput>-v</computeroutput>'s increase the verbosity
   level, spewing out lots of information which is primarily of
   interest for diagnostic purposes.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-L --license -V --version</computeroutput></term>
  <listitem><para>Display the software version, license terms and
   conditions.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>-1</computeroutput> (or
  <computeroutput>--fast</computeroutput>) to
  <computeroutput>-9</computeroutput> (or
  <computeroutput>--best</computeroutput>)</term>
  <listitem><para>Set the block size to 100 k, 200 k ... 900 k
   when compressing. Has no effect when decompressing. See <xref
   linkend="memory-management" /> below. The
   <computeroutput>--fast</computeroutput> and
   <computeroutput>--best</computeroutput> aliases are primarily
   for GNU <computeroutput>gzip</computeroutput> compatibility.
   In particular, <computeroutput>--fast</computeroutput> doesn't
   make things significantly faster.
   And
   <computeroutput>--best</computeroutput> merely selects the
   default behaviour.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>--</computeroutput></term>
  <listitem><para>Treats all subsequent arguments as file names,
   even if they start with a dash. This is so you can handle
   files with names beginning with a dash, for example:
   <computeroutput>bzip2 --
   -myfilename</computeroutput>.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>--repetitive-fast</computeroutput></term>
  <term><computeroutput>--repetitive-best</computeroutput></term>
  <listitem><para>These flags are redundant in versions 0.9.5 and
   above. They provided some coarse control over the behaviour of
   the sorting algorithm in earlier versions, which was sometimes
   useful. 0.9.5 and above have an improved algorithm which
   renders these flags irrelevant.</para></listitem>
 </varlistentry>

</variablelist>

</sect1>


<sect1 id="memory-management" xreflabel="MEMORY MANAGEMENT">
<title>MEMORY MANAGEMENT</title>

<para><computeroutput>bzip2</computeroutput> compresses large
files in blocks. The block size affects both the compression
ratio achieved, and the amount of memory needed for compression
and decompression.
The flags <computeroutput>-1</computeroutput>
through <computeroutput>-9</computeroutput> specify the block
size to be 100,000 bytes through 900,000 bytes (the default)
respectively. At decompression time, the block size used for
compression is read from the header of the compressed file, and
<computeroutput>bunzip2</computeroutput> then allocates itself
just enough memory to decompress the file. Since block sizes are
stored in compressed files, it follows that the flags
<computeroutput>-1</computeroutput> to
<computeroutput>-9</computeroutput> are irrelevant to and so
ignored during decompression.</para>

<para>Compression and decompression requirements, in bytes, can be
estimated as:</para>
<programlisting>
Compression:   400k + ( 8 x block size )

Decompression: 100k + ( 4 x block size ), or
               100k + ( 2.5 x block size )
</programlisting>

<para>Larger block sizes give rapidly diminishing marginal
returns. Most of the compression comes from the first two or
three hundred k of block size, a fact worth bearing in mind when
using <computeroutput>bzip2</computeroutput> on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block
size.</para>

<para>For files compressed with the default 900k block size,
<computeroutput>bunzip2</computeroutput> will require about 3700
kbytes to decompress.
To support decompression of any file on a
4 megabyte machine, <computeroutput>bunzip2</computeroutput> has
an option to decompress using approximately half this amount of
memory, about 2300 kbytes. Decompression speed is also halved,
so you should use this option only where necessary. The relevant
flag is <computeroutput>-s</computeroutput>.</para>

<para>In general, try to use the largest block size memory
constraints allow, since that maximises the compression achieved.
Compression and decompression speed are virtually unaffected by
block size.</para>

<para>Another significant point applies to files which fit in a
single block -- that means most files you'd encounter using a
large block size. The amount of real memory touched is
proportional to the size of the file, since the file is smaller
than a block. For example, compressing a file 20,000 bytes long
with the flag <computeroutput>-9</computeroutput> will cause the
compressor to allocate around 7600k of memory, but only touch
400k + 20000 * 8 = 560 kbytes of it. Similarly, the decompressor
will allocate 3700k but only touch 100k + 20000 * 4 = 180
kbytes.</para>

<para>Here is a table which summarises the maximum memory usage
for different block sizes. Also recorded is the total compressed
size for 14 files of the Calgary Text Compression Corpus
totalling 3,141,622 bytes. This column gives some feel for how
compression varies with block size.
These figures tend to
understate the advantage of larger block sizes for larger files,
since the Corpus is dominated by smaller files.</para>

<programlisting>
        Compress   Decompress   Decompress   Corpus
Flag     usage      usage       -s usage     Size

 -1      1200k       500k         350k      914704
 -2      2000k       900k         600k      877703
 -3      2800k      1300k         850k      860338
 -4      3600k      1700k        1100k      846899
 -5      4400k      2100k        1350k      845160
 -6      5200k      2500k        1600k      838626
 -7      6000k      2900k        1850k      834096
 -8      6800k      3300k        2100k      828642
 -9      7600k      3700k        2350k      828642
</programlisting>

</sect1>


<sect1 id="recovering" xreflabel="RECOVERING DATA FROM DAMAGED FILES">
<title>RECOVERING DATA FROM DAMAGED FILES</title>

<para><computeroutput>bzip2</computeroutput> compresses files in
blocks, usually 900kbytes long. Each block is handled
independently. If a media or transmission error causes a
multi-block <computeroutput>.bz2</computeroutput> file to become
damaged, it may be possible to recover data from the undamaged
blocks in the file.</para>

<para>The compressed representation of each block is delimited by
a 48-bit pattern, which makes it possible to find the block
boundaries with reasonable certainty.
Each block also carries
its own 32-bit CRC, so damaged blocks can be distinguished from
undamaged ones.</para>

<para><computeroutput>bzip2recover</computeroutput> is a simple
program whose purpose is to search for blocks in
<computeroutput>.bz2</computeroutput> files, and write each block
out into its own <computeroutput>.bz2</computeroutput> file.  You
can then use <computeroutput>bzip2 -t</computeroutput> to test
the integrity of the resulting files, and decompress those which
are undamaged.</para>

<para><computeroutput>bzip2recover</computeroutput> takes a
single argument, the name of the damaged file, and writes a
number of files <computeroutput>rec0001file.bz2</computeroutput>,
<computeroutput>rec0002file.bz2</computeroutput>, etc., containing
the extracted blocks.  The output filenames are designed so that
the use of wildcards in subsequent processing -- for example,
<computeroutput>bzip2 -dc rec*file.bz2 &gt;
recovered_data</computeroutput> -- lists the files in the correct
order.</para>

<para><computeroutput>bzip2recover</computeroutput> should be of
most use when dealing with large <computeroutput>.bz2</computeroutput>
files, as these will contain many blocks.  It is clearly futile
to use it on damaged single-block files, since a damaged block
cannot be recovered.
If you wish to minimise any potential data
loss through media or transmission errors, you might consider
compressing with a smaller block size.</para>

</sect1>


<sect1 id="performance" xreflabel="PERFORMANCE NOTES">
<title>PERFORMANCE NOTES</title>

<para>The sorting phase of compression gathers together similar
strings in the file.  Because of this, files containing very long
runs of repeated symbols, like "aabaabaabaab ..." (repeated
several hundred times) may compress more slowly than normal.
Versions 0.9.5 and above fare much better than previous versions
in this respect.  The ratio between worst-case and average-case
compression time is in the region of 10:1.  For previous
versions, this figure was more like 100:1.  You can use the
<computeroutput>-vvvv</computeroutput> option to monitor progress
in great detail, if you want.</para>

<para>Decompression speed is unaffected by these
phenomena.</para>

<para><computeroutput>bzip2</computeroutput> usually allocates
several megabytes of memory to operate in, and then charges all
over it in a fairly random fashion.  This means that performance,
both for compressing and decompressing, is largely determined by
the speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the miss
rate have been observed to give disproportionately large
performance improvements.
I imagine
<computeroutput>bzip2</computeroutput> will perform best on
machines with very large caches.</para>

</sect1>



<sect1 id="caveats" xreflabel="CAVEATS">
<title>CAVEATS</title>

<para>I/O error messages are not as helpful as they could be.
<computeroutput>bzip2</computeroutput> tries hard to detect I/O
errors and exit cleanly, but the details of what the problem is
sometimes seem rather misleading.</para>

<para>This manual page pertains to version &bz-version; of
<computeroutput>bzip2</computeroutput>.  Compressed data created by
this version is entirely forwards and backwards compatible with the
previous public releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0,
1.0.1, 1.0.2 and 1.0.3, with the following exception: 0.9.0 and
above can correctly decompress multiple concatenated compressed files.
0.1pl2 cannot do this; it will stop after decompressing just the first
file in the stream.</para>

<para><computeroutput>bzip2recover</computeroutput> versions
prior to 1.0.2 used 32-bit integers to represent bit positions in
compressed files, so they could not handle compressed files more
than 512 megabytes long.  Versions 1.0.2 and above use 64-bit ints
on some platforms which support them (GNU supported targets, and
Windows).  To establish whether or not
<computeroutput>bzip2recover</computeroutput> was built with such
a limitation, run it without arguments.
In any event you can
build yourself an unlimited version if you can recompile it with
<computeroutput>MaybeUInt64</computeroutput> set to be an
unsigned 64-bit integer.</para>

</sect1>



<sect1 id="author" xreflabel="AUTHOR">
<title>AUTHOR</title>

<para>Julian Seward,
<computeroutput>&bz-author;</computeroutput></para>

<para>The ideas embodied in
<computeroutput>bzip2</computeroutput> are due to (at least) the
following people: Michael Burrows and David Wheeler (for the
block sorting transformation), David Wheeler (again, for the
Huffman coder), Peter Fenwick (for the structured coding model in
the original <computeroutput>bzip</computeroutput>, and many
refinements), and Alistair Moffat, Radford Neal and Ian Witten
(for the arithmetic coder in the original
<computeroutput>bzip</computeroutput>).  I am much indebted for
their help, support and advice.  See the manual in the source
distribution for pointers to sources of documentation.  Christian
von Roques encouraged me to look for faster sorting algorithms,
so as to speed up compression.  Bela Lubkin encouraged me to
improve the worst-case compression performance.
Donna Robinson XMLised the documentation.
Many people sent
patches, helped with portability problems, lent machines, gave
advice and were generally helpful.</para>

</sect1>

</chapter>



<chapter id="libprog" xreflabel="Programming with libbzip2">
<title>
Programming with <computeroutput>libbzip2</computeroutput>
</title>

<para>This chapter describes the programming interface to
<computeroutput>libbzip2</computeroutput>.</para>

<para>For general background information, particularly about
memory use and performance aspects, you'd be well advised to read
<xref linkend="using"/> as well.</para>


<sect1 id="top-level" xreflabel="Top-level structure">
<title>Top-level structure</title>

<para><computeroutput>libbzip2</computeroutput> is a flexible
library for compressing and decompressing data in the
<computeroutput>bzip2</computeroutput> data format.
Although
packaged as a single entity, it helps to regard the library as
three separate parts: the low-level interface, the high-level
interface, and some utility functions.</para>

<para>The structure of
<computeroutput>libbzip2</computeroutput>'s interfaces is similar
to that of Jean-loup Gailly's and Mark Adler's excellent
<computeroutput>zlib</computeroutput> library.</para>

<para>All externally visible symbols have names beginning
<computeroutput>BZ2_</computeroutput>.  This is new in version
1.0.  The intention is to minimise pollution of the namespaces of
library clients.</para>

<para>To use any part of the library, you need to
<computeroutput>#include &lt;bzlib.h&gt;</computeroutput>
in your sources.</para>



<sect2 id="ll-summary" xreflabel="Low-level summary">
<title>Low-level summary</title>

<para>This interface provides services for compressing and
decompressing data in memory.  There's no provision for dealing
with files, streams or any other I/O mechanisms, just straight
memory-to-memory work.
In fact, this part of the library can be
compiled without inclusion of
<computeroutput>stdio.h</computeroutput>, which may be helpful
for embedded applications.</para>

<para>The low-level part of the library has no global variables
and is therefore thread-safe.</para>

<para>Six routines make up the low-level interface:
<computeroutput>BZ2_bzCompressInit</computeroutput>,
<computeroutput>BZ2_bzCompress</computeroutput>, and
<computeroutput>BZ2_bzCompressEnd</computeroutput> for
compression, and a corresponding trio
<computeroutput>BZ2_bzDecompressInit</computeroutput>,
<computeroutput>BZ2_bzDecompress</computeroutput> and
<computeroutput>BZ2_bzDecompressEnd</computeroutput> for
decompression.  The <computeroutput>*Init</computeroutput>
functions allocate memory for compression/decompression and do
other initialisations, whilst the
<computeroutput>*End</computeroutput> functions close down
operations and release memory.</para>

<para>The real work is done by
<computeroutput>BZ2_bzCompress</computeroutput> and
<computeroutput>BZ2_bzDecompress</computeroutput>.  These
compress and decompress data from a user-supplied input buffer to
a user-supplied output buffer.  These buffers can be any size;
arbitrary quantities of data are handled by making repeated calls
to these functions.
This is a flexible mechanism allowing a
consumer-pull style of activity, or producer-push, or a mixture
of both.</para>

</sect2>


<sect2 id="hl-summary" xreflabel="High-level summary">
<title>High-level summary</title>

<para>This interface provides some handy wrappers around the
low-level interface to facilitate reading and writing
<computeroutput>bzip2</computeroutput> format files
(<computeroutput>.bz2</computeroutput> files).  The routines
provide hooks to facilitate reading files in which the
<computeroutput>bzip2</computeroutput> data stream is embedded
within some larger-scale file structure, or where there are
multiple <computeroutput>bzip2</computeroutput> data streams
concatenated end-to-end.</para>

<para>For reading files,
<computeroutput>BZ2_bzReadOpen</computeroutput>,
<computeroutput>BZ2_bzRead</computeroutput>,
<computeroutput>BZ2_bzReadClose</computeroutput> and
<computeroutput>BZ2_bzReadGetUnused</computeroutput> are
supplied.  For writing files,
<computeroutput>BZ2_bzWriteOpen</computeroutput>,
<computeroutput>BZ2_bzWrite</computeroutput> and
<computeroutput>BZ2_bzWriteClose</computeroutput> are
available.</para>

<para>As with the low-level library, no global variables are used
so the library is per se thread-safe.
However, if I/O errors
occur whilst reading or writing the underlying compressed files,
you may have to consult <computeroutput>errno</computeroutput> to
determine the cause of the error.  In that case, you'd need a C
library which correctly supports
<computeroutput>errno</computeroutput> in a multithreaded
environment.</para>

<para>To make the library a little simpler and more portable,
<computeroutput>BZ2_bzReadOpen</computeroutput> and
<computeroutput>BZ2_bzWriteOpen</computeroutput> require you to
pass them file handles (<computeroutput>FILE*</computeroutput>s)
which have previously been opened for reading or writing
respectively.  That avoids portability problems associated with
file operations and file attributes, whilst not being much of an
imposition on the programmer.</para>

</sect2>


<sect2 id="util-fns-summary" xreflabel="Utility functions summary">
<title>Utility functions summary</title>

<para>For very simple needs,
<computeroutput>BZ2_bzBuffToBuffCompress</computeroutput> and
<computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> are
provided.  These compress and decompress data in memory from one
buffer to another buffer in a single function call.
You should assess
whether these functions fulfill your memory-to-memory
compression/decompression requirements before investing effort in
understanding the more general but more complex low-level
interface.</para>

<para>Yoshioka Tsuneo
(<computeroutput>[email protected]</computeroutput>) has
contributed some functions to give better
<computeroutput>zlib</computeroutput> compatibility.  These
functions are <computeroutput>BZ2_bzopen</computeroutput>,
<computeroutput>BZ2_bzread</computeroutput>,
<computeroutput>BZ2_bzwrite</computeroutput>,
<computeroutput>BZ2_bzflush</computeroutput>,
<computeroutput>BZ2_bzclose</computeroutput>,
<computeroutput>BZ2_bzerror</computeroutput> and
<computeroutput>BZ2_bzlibVersion</computeroutput>.  You may find
these functions more convenient for simple file reading and
writing than those in the high-level interface.  These functions
are not (yet) officially part of the library, and are minimally
documented here.  If they break, you get to keep all the pieces.
I hope to document them properly when time permits.</para>

<para>Yoshioka also contributed modifications to allow the
library to be built as a Windows DLL.</para>

</sect2>

</sect1>


<sect1 id="err-handling" xreflabel="Error handling">
<title>Error handling</title>

<para>The library is designed to recover cleanly in all
situations, including the worst-case situation of decompressing
random data.  I'm not 100% sure that it can always do this, so
you might want to add a signal handler to catch segmentation
violations during decompression if you are feeling especially
paranoid.  I would be interested in hearing more about the
robustness of the library to corrupted compressed data.</para>

<para>Version 1.0.3 is more robust in this respect than any
previous version.  Investigations with Valgrind (a tool for detecting
problems with memory management) indicate
that, at least for the few files I tested, all single-bit errors
in the decompressed data are caught properly, with no
segmentation faults, no uses of uninitialised data, no out of
range reads or writes, and no infinite looping in the decompressor.
So it's certainly pretty robust, although
I wouldn't claim it to be totally bombproof.</para>

<para>The file <computeroutput>bzlib.h</computeroutput> contains
all definitions needed to use the library.
In particular, you
should definitely not include
<computeroutput>bzlib_private.h</computeroutput>.</para>

<para>In <computeroutput>bzlib.h</computeroutput>, the various
return values are defined.  The following list is not intended as
an exhaustive description of the circumstances in which a given
value may be returned -- those descriptions are given later.
Rather, it is intended to convey the rough meaning of each return
value.  The first five values are normal and not intended to
denote an error situation.</para>

<variablelist>

 <varlistentry>
  <term><computeroutput>BZ_OK</computeroutput></term>
  <listitem><para>The requested action was completed
   successfully.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_RUN_OK, BZ_FLUSH_OK,
    BZ_FINISH_OK</computeroutput></term>
  <listitem><para>In
   <computeroutput>BZ2_bzCompress</computeroutput>, the requested
   nothing-special/flush/finish action was completed
   successfully.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_STREAM_END</computeroutput></term>
  <listitem><para>Compression of data was completed, or the
   logical stream end was detected during
   decompression.</para></listitem>
 </varlistentry>

</variablelist>

<para>The following return values
indicate an error of some
kind.</para>

<variablelist>

 <varlistentry>
  <term><computeroutput>BZ_CONFIG_ERROR</computeroutput></term>
  <listitem><para>Indicates that the library has been improperly
   compiled on your platform -- a major configuration error.
   Specifically, it means that
   <computeroutput>sizeof(char)</computeroutput>,
   <computeroutput>sizeof(short)</computeroutput> and
   <computeroutput>sizeof(int)</computeroutput> are not 1, 2 and
   4 respectively, as they should be.  Note that the library
   should still work properly on 64-bit platforms which follow
   the LP64 programming model -- that is, where
   <computeroutput>sizeof(long)</computeroutput> and
   <computeroutput>sizeof(void*)</computeroutput> are 8.  Under
   LP64, <computeroutput>sizeof(int)</computeroutput> is still 4,
   so <computeroutput>libbzip2</computeroutput>, which doesn't
   use the <computeroutput>long</computeroutput> type, is
   OK.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_SEQUENCE_ERROR</computeroutput></term>
  <listitem><para>When using the library, it is important to call
   the functions in the correct sequence and with data structures
   (buffers etc) in the correct states.
   <computeroutput>libbzip2</computeroutput> checks as much as it
   can to ensure this is happening, and returns
   <computeroutput>BZ_SEQUENCE_ERROR</computeroutput> if not.
   Code which complies precisely with the function semantics, as
   detailed below, should never receive this value; such an event
   denotes buggy code which you should
   investigate.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_PARAM_ERROR</computeroutput></term>
  <listitem><para>Returned when a parameter to a function call is
   out of range or otherwise manifestly incorrect.  As with
   <computeroutput>BZ_SEQUENCE_ERROR</computeroutput>, this
   denotes a bug in the client code.  The distinction between
   <computeroutput>BZ_PARAM_ERROR</computeroutput> and
   <computeroutput>BZ_SEQUENCE_ERROR</computeroutput> is a bit
   hazy, but still worth making.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_MEM_ERROR</computeroutput></term>
  <listitem><para>Returned when a request to allocate memory
   failed.  Note that the quantity of memory needed to decompress
   a stream cannot be determined until the stream's header has
   been read.  So
   <computeroutput>BZ2_bzDecompress</computeroutput> and
   <computeroutput>BZ2_bzRead</computeroutput> may return
   <computeroutput>BZ_MEM_ERROR</computeroutput> even though some
   of the compressed data has been read.
   The same is not true
   for compression; once
   <computeroutput>BZ2_bzCompressInit</computeroutput> or
   <computeroutput>BZ2_bzWriteOpen</computeroutput> has
   successfully completed,
   <computeroutput>BZ_MEM_ERROR</computeroutput> cannot
   occur.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_DATA_ERROR</computeroutput></term>
  <listitem><para>Returned when a data integrity error is
   detected during decompression.  Most importantly, this means
   when stored and computed CRCs for the data do not match.  This
   value is also returned upon detection of any other anomaly in
   the compressed data.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_DATA_ERROR_MAGIC</computeroutput></term>
  <listitem><para>As a special case of
   <computeroutput>BZ_DATA_ERROR</computeroutput>, it is
   sometimes useful to know when the compressed stream does not
   start with the correct magic bytes (<computeroutput>'B' 'Z'
   'h'</computeroutput>).</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_IO_ERROR</computeroutput></term>
  <listitem><para>Returned by
   <computeroutput>BZ2_bzRead</computeroutput> and
   <computeroutput>BZ2_bzWrite</computeroutput> when there is an
   error reading or writing in the compressed file, and by
   <computeroutput>BZ2_bzReadOpen</computeroutput> and
   <computeroutput>BZ2_bzWriteOpen</computeroutput> for attempts
   to use a file for which the error indicator (viz,
   <computeroutput>ferror(f)</computeroutput>) is set.  On
   receipt of <computeroutput>BZ_IO_ERROR</computeroutput>, the
   caller should consult <computeroutput>errno</computeroutput>
   and/or <computeroutput>perror</computeroutput> to acquire
   operating-system specific information about the
   problem.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_UNEXPECTED_EOF</computeroutput></term>
  <listitem><para>Returned by
   <computeroutput>BZ2_bzRead</computeroutput> when the
   compressed file finishes before the logical end of stream is
   detected.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_OUTBUFF_FULL</computeroutput></term>
  <listitem><para>Returned by
   <computeroutput>BZ2_bzBuffToBuffCompress</computeroutput> and
   <computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> to
   indicate that the output data will not fit into the output
   buffer provided.</para></listitem>
 </varlistentry>

</variablelist>

</sect1>



<sect1 id="low-level" xreflabel="Low-level interface">
<title>Low-level interface</title>


<sect2 id="bzcompress-init" xreflabel="BZ2_bzCompressInit">
<title>BZ2_bzCompressInit</title>

<programlisting>
typedef struct {
  char *next_in;
  unsigned int avail_in;
  unsigned int total_in_lo32;
  unsigned int total_in_hi32;

  char *next_out;
  unsigned int avail_out;
  unsigned int total_out_lo32;
  unsigned int total_out_hi32;

  void *state;

  void *(*bzalloc)(void *,int,int);
  void (*bzfree)(void *,void *);
  void *opaque;
} bz_stream;

int BZ2_bzCompressInit ( bz_stream *strm,
                         int blockSize100k,
                         int verbosity,
                         int workFactor );
</programlisting>

<para>Prepares for compression.  The
<computeroutput>bz_stream</computeroutput> structure holds all
data pertaining to the compression activity.  A
<computeroutput>bz_stream</computeroutput> structure should be
allocated and initialised prior to the call.  The fields of
<computeroutput>bz_stream</computeroutput> comprise the entirety
of the user-visible data.  <computeroutput>state</computeroutput>
is a pointer to the private data structures required for
compression.</para>

<para>Custom memory allocators are supported, via fields
<computeroutput>bzalloc</computeroutput>,
<computeroutput>bzfree</computeroutput>, and
<computeroutput>opaque</computeroutput>.
The value
<computeroutput>opaque</computeroutput> is passed as the first
argument to all calls to <computeroutput>bzalloc</computeroutput>
and <computeroutput>bzfree</computeroutput>, but is otherwise
ignored by the library.  The call <computeroutput>bzalloc (
opaque, n, m )</computeroutput> is expected to return a pointer
<computeroutput>p</computeroutput> to <computeroutput>n *
m</computeroutput> bytes of memory, and <computeroutput>bzfree (
opaque, p )</computeroutput> should free that memory.</para>

<para>If you don't want to use a custom memory allocator, set
<computeroutput>bzalloc</computeroutput>,
<computeroutput>bzfree</computeroutput> and
<computeroutput>opaque</computeroutput> to
<computeroutput>NULL</computeroutput>, and the library will then
use the standard <computeroutput>malloc</computeroutput> /
<computeroutput>free</computeroutput> routines.</para>

<para>Before calling
<computeroutput>BZ2_bzCompressInit</computeroutput>, fields
<computeroutput>bzalloc</computeroutput>,
<computeroutput>bzfree</computeroutput> and
<computeroutput>opaque</computeroutput> should be filled
appropriately, as just described.
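As a rough illustration of the allocator contract described above,
the following sketch shows one possible pair of callbacks.  The
names <computeroutput>alloc_stats</computeroutput>,
<computeroutput>counting_bzalloc</computeroutput> and
<computeroutput>counting_bzfree</computeroutput> are hypothetical
and not part of the library; only the two function-pointer types
are taken from the <computeroutput>bz_stream</computeroutput>
declaration.  Using <computeroutput>opaque</computeroutput> to
reach a call counter is purely an assumption made for the example.
<programlisting><![CDATA[
#include <stdlib.h>

/* Hypothetical bookkeeping record, reached through the opaque pointer. */
typedef struct {
  long alloc_calls;   /* successful-looking bzalloc requests seen */
  long free_calls;    /* non-NULL pointers handed to bzfree       */
} alloc_stats;

/* Same type as the bzalloc field: void *(*)(void *,int,int).
   Returns a pointer to n * m bytes, or NULL on failure. */
static void *counting_bzalloc ( void *opaque, int n, int m )
{
  alloc_stats *st = (alloc_stats *)opaque;
  if (n < 0 || m < 0) return NULL;      /* reject nonsensical sizes */
  if (st != NULL) st->alloc_calls++;
  return malloc ( (size_t)n * (size_t)m );
}

/* Same type as the bzfree field: void (*)(void *,void *). */
static void counting_bzfree ( void *opaque, void *p )
{
  alloc_stats *st = (alloc_stats *)opaque;
  if (st != NULL && p != NULL) st->free_calls++;
  free ( p );                           /* free(NULL) is a no-op */
}
]]></programlisting>
Under these assumptions, a caller would store
<computeroutput>counting_bzalloc</computeroutput> in
<computeroutput>bzalloc</computeroutput>,
<computeroutput>counting_bzfree</computeroutput> in
<computeroutput>bzfree</computeroutput>, and the address of an
<computeroutput>alloc_stats</computeroutput> record in
<computeroutput>opaque</computeroutput> before calling
<computeroutput>BZ2_bzCompressInit</computeroutput>.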
Upon return, the internal
state will have been allocated and initialised, and
<computeroutput>total_in_lo32</computeroutput>,
<computeroutput>total_in_hi32</computeroutput>,
<computeroutput>total_out_lo32</computeroutput> and
<computeroutput>total_out_hi32</computeroutput> will have been
set to zero.  These four fields are used by the library to inform
the caller of the total amount of data passed into and out of the
library, respectively.  You should not try to change them.  As of
version 1.0, 64-bit counts are maintained, even on 32-bit
platforms, using the <computeroutput>_hi32</computeroutput>
fields to store the upper 32 bits of the count.  So, for example,
the total amount of data in is <computeroutput>(total_in_hi32
&lt;&lt; 32) + total_in_lo32</computeroutput>, evaluated in a
64-bit integer type.</para>

<para>Parameter <computeroutput>blockSize100k</computeroutput>
specifies the block size to be used for compression.  It should
be a value between 1 and 9 inclusive, and the actual block size
used is 100000 x this figure.  9 gives the best compression but
takes the most memory.</para>

<para>Parameter <computeroutput>verbosity</computeroutput> should
be set to a number between 0 and 4 inclusive.  0 is silent, and
greater numbers give increasingly verbose monitoring/debugging
output.
If the library has been compiled with
<computeroutput>-DBZ_NO_STDIO</computeroutput>, no such output
will appear for any verbosity setting.</para>

<para>Parameter <computeroutput>workFactor</computeroutput>
controls how the compression phase behaves when presented with
worst case, highly repetitive, input data.  If compression runs
into difficulties caused by repetitive data, the library switches
from the standard sorting algorithm to a fallback algorithm.  The
fallback is slower than the standard algorithm by perhaps a
factor of three, but always behaves reasonably, no matter how bad
the input.</para>

<para>Lower values of <computeroutput>workFactor</computeroutput>
reduce the amount of effort the standard algorithm will expend
before resorting to the fallback.  You should set this parameter
carefully: too low, and many inputs will be handled by the
fallback algorithm and so compress rather slowly; too high, and
your average-to-worst case compression times can become very
large.  The default value of 30 gives reasonable behaviour over a
wide range of circumstances.</para>

<para>Allowable values range from 0 to 250 inclusive.
0 is a
special case, equivalent to using the default value of 30.</para>

<para>Note that the compressed output generated is the same
regardless of whether or not the fallback algorithm is
used.</para>

<para>Be aware also that this parameter may disappear entirely in
future versions of the library.  In principle it should be
possible to devise a good way to automatically choose which
algorithm to use.  Such a mechanism would render the parameter
obsolete.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if strm is NULL
  or blockSize &lt; 1 or blockSize &gt; 9
  or verbosity &lt; 0 or verbosity &gt; 4
  or workFactor &lt; 0 or workFactor &gt; 250
BZ_MEM_ERROR
  if not enough memory is available
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzCompress
  if BZ_OK is returned
  no specific action needed in case of error
</programlisting>

</sect2>


<sect2 id="bzCompress" xreflabel="BZ2_bzCompress">
<title>BZ2_bzCompress</title>

<programlisting>
int BZ2_bzCompress ( bz_stream *strm, int action );
</programlisting>

<para>Provides more input and/or output buffer space for the
library.  The caller maintains input and output buffers, and
calls <computeroutput>BZ2_bzCompress</computeroutput> to transfer
data between them.</para>

<para>Before each call to
<computeroutput>BZ2_bzCompress</computeroutput>,
<computeroutput>next_in</computeroutput> should point at the data
to be compressed, and <computeroutput>avail_in</computeroutput>
should indicate how many bytes the library may read.
<computeroutput>BZ2_bzCompress</computeroutput> updates
<computeroutput>next_in</computeroutput>,
<computeroutput>avail_in</computeroutput> and
<computeroutput>total_in</computeroutput> to reflect the number
of bytes it has read.</para>

<para>Similarly, <computeroutput>next_out</computeroutput> should
point to a buffer in which the compressed data is to be placed,
with <computeroutput>avail_out</computeroutput> indicating how
much output space is available.
<computeroutput>BZ2_bzCompress</computeroutput> updates
<computeroutput>next_out</computeroutput>,
<computeroutput>avail_out</computeroutput> and
<computeroutput>total_out</computeroutput> to reflect the number
of bytes output.</para>

<para>You may provide and remove as little or as much data as you
like on each call of
<computeroutput>BZ2_bzCompress</computeroutput>.
In the limit,
it is acceptable to supply and remove data one byte at a time,
although this would be terribly inefficient.  You should always
ensure that at least one byte of output space is available at
each call.</para>

<para>A second purpose of
<computeroutput>BZ2_bzCompress</computeroutput> is to request a
change of mode of the compressed stream.</para>

<para>Conceptually, a compressed stream can be in one of four
states: IDLE, RUNNING, FLUSHING and FINISHING.  Before
initialisation
(<computeroutput>BZ2_bzCompressInit</computeroutput>) and after
termination (<computeroutput>BZ2_bzCompressEnd</computeroutput>),
a stream is regarded as IDLE.</para>

<para>Upon initialisation
(<computeroutput>BZ2_bzCompressInit</computeroutput>), the stream
is placed in the RUNNING state.  Subsequent calls to
<computeroutput>BZ2_bzCompress</computeroutput> should pass
<computeroutput>BZ_RUN</computeroutput> as the requested action;
other actions are illegal and will result in
<computeroutput>BZ_SEQUENCE_ERROR</computeroutput>.</para>

<para>At some point, the calling program will have provided all
the input data it wants to.  It will then want to finish up -- in
effect, asking the library to process any data it might have
buffered internally.
In this state,
<computeroutput>BZ2_bzCompress</computeroutput> will no longer
attempt to read data from
<computeroutput>next_in</computeroutput>, but it will want to
write data to <computeroutput>next_out</computeroutput>.  Because
the output buffer supplied by the user can be arbitrarily small,
the finishing-up operation cannot necessarily be done with a
single call of
<computeroutput>BZ2_bzCompress</computeroutput>.</para>

<para>Instead, the calling program passes
<computeroutput>BZ_FINISH</computeroutput> as an action to
<computeroutput>BZ2_bzCompress</computeroutput>.  This changes
the stream's state to FINISHING.  Any remaining input (ie,
<computeroutput>next_in[0 .. avail_in-1]</computeroutput>) is
compressed and transferred to the output buffer.  To do this,
<computeroutput>BZ2_bzCompress</computeroutput> must be called
repeatedly until all the output has been consumed.  At that
point, <computeroutput>BZ2_bzCompress</computeroutput> returns
<computeroutput>BZ_STREAM_END</computeroutput>, and the stream's
state is set back to IDLE.
<computeroutput>BZ2_bzCompressEnd</computeroutput> should then be
called.</para>

<para>Just to make sure the calling program does not cheat, the
library makes a note of <computeroutput>avail_in</computeroutput>
at the time of the first call to
<computeroutput>BZ2_bzCompress</computeroutput> which has
<computeroutput>BZ_FINISH</computeroutput> as an action (ie, at
the time the program has announced its intention to not supply
any more input).  By comparing this value with that of
<computeroutput>avail_in</computeroutput> over subsequent calls
to <computeroutput>BZ2_bzCompress</computeroutput>, the library
can detect any attempts to slip in more data to compress.  Any
calls for which this is detected will return
<computeroutput>BZ_SEQUENCE_ERROR</computeroutput>.  This
indicates a programming mistake which should be corrected.</para>

<para>Instead of asking to finish, the calling program may ask
<computeroutput>BZ2_bzCompress</computeroutput> to take all the
remaining input, compress it and terminate the current
(Burrows-Wheeler) compression block.  This could be useful for
error control purposes.  The mechanism is analogous to that for
finishing: call <computeroutput>BZ2_bzCompress</computeroutput>
with an action of <computeroutput>BZ_FLUSH</computeroutput>,
remove output data, and persist with the
<computeroutput>BZ_FLUSH</computeroutput> action until the value
<computeroutput>BZ_RUN_OK</computeroutput> is returned.
As with
finishing, <computeroutput>BZ2_bzCompress</computeroutput>
detects any attempt to provide more input data once the flush has
begun.</para>

<para>Once the flush is complete, the stream returns to the
normal RUNNING state.</para>

<para>This all sounds pretty complex, but isn't really.  Here's a
table which shows which actions are allowable in each state, what
action will be taken, what the next state is, and what the
non-error return values are.  Note that you can't explicitly ask
what state the stream is in, but nor do you need to -- it can be
inferred from the values returned by
<computeroutput>BZ2_bzCompress</computeroutput>.</para>

<programlisting>
IDLE/any
  Illegal.  IDLE state only exists after BZ2_bzCompressEnd or
  before BZ2_bzCompressInit.
  Return value = BZ_SEQUENCE_ERROR

RUNNING/BZ_RUN
  Compress from next_in to next_out as much as possible.
  Next state = RUNNING
  Return value = BZ_RUN_OK

RUNNING/BZ_FLUSH
  Remember current value of next_in.  Compress from next_in
  to next_out as much as possible, but do not accept any more input.
  Next state = FLUSHING
  Return value = BZ_FLUSH_OK

RUNNING/BZ_FINISH
  Remember current value of next_in.  Compress from next_in
  to next_out as much as possible, but do not accept any more input.
  Next state = FINISHING
  Return value = BZ_FINISH_OK

FLUSHING/BZ_FLUSH
  Compress from next_in to next_out as much as possible,
  but do not accept any more input.
  If all the existing input has been used up and all compressed
  output has been removed
    Next state = RUNNING; Return value = BZ_RUN_OK
  else
    Next state = FLUSHING; Return value = BZ_FLUSH_OK

FLUSHING/other
  Illegal.
  Return value = BZ_SEQUENCE_ERROR

FINISHING/BZ_FINISH
  Compress from next_in to next_out as much as possible,
  but do not accept any more input.
  If all the existing input has been used up and all compressed
  output has been removed
    Next state = IDLE; Return value = BZ_STREAM_END
  else
    Next state = FINISHING; Return value = BZ_FINISH_OK

FINISHING/other
  Illegal.
  Return value = BZ_SEQUENCE_ERROR
</programlisting>


<para>That still looks complicated?  Well, fair enough.
The
usual sequence of calls for compressing a load of data is:</para>

<orderedlist>

 <listitem><para>Get started with
  <computeroutput>BZ2_bzCompressInit</computeroutput>.</para></listitem>

 <listitem><para>Shovel data in and shlurp out its compressed form
  using zero or more calls of
  <computeroutput>BZ2_bzCompress</computeroutput> with action =
  <computeroutput>BZ_RUN</computeroutput>.</para></listitem>

 <listitem><para>Finish up.  Repeatedly call
  <computeroutput>BZ2_bzCompress</computeroutput> with action =
  <computeroutput>BZ_FINISH</computeroutput>, copying out the
  compressed output, until
  <computeroutput>BZ_STREAM_END</computeroutput> is
  returned.</para></listitem>

 <listitem><para>Close up and go home.  Call
  <computeroutput>BZ2_bzCompressEnd</computeroutput>.</para></listitem>

</orderedlist>

<para>If the data you want to compress fits into your input
buffer all at once, you can skip the calls of
<computeroutput>BZ2_bzCompress ( ..., BZ_RUN )</computeroutput>
and just do the <computeroutput>BZ2_bzCompress ( ..., BZ_FINISH
)</computeroutput> calls.</para>

<para>All required memory is allocated by
<computeroutput>BZ2_bzCompressInit</computeroutput>.  The
compression library can accept any data at all (obviously).
So
you shouldn't get any error return values from the
<computeroutput>BZ2_bzCompress</computeroutput> calls.  If you
do, they will be
<computeroutput>BZ_SEQUENCE_ERROR</computeroutput>, and indicate
a bug in your programming.</para>

<para>Trivial other possible return values:</para>

<programlisting>
BZ_PARAM_ERROR
  if strm is NULL, or strm-&gt;s is NULL
</programlisting>

</sect2>


<sect2 id="bzCompress-end" xreflabel="BZ2_bzCompressEnd">
<title>BZ2_bzCompressEnd</title>

<programlisting>
int BZ2_bzCompressEnd ( bz_stream *strm );
</programlisting>

<para>Releases all memory associated with a compression
stream.</para>

<para>Possible return values:</para>

<programlisting>
BZ_PARAM_ERROR  if strm is NULL or strm-&gt;s is NULL
BZ_OK           otherwise
</programlisting>

</sect2>


<sect2 id="bzDecompress-init" xreflabel="BZ2_bzDecompressInit">
<title>BZ2_bzDecompressInit</title>

<programlisting>
int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );
</programlisting>

<para>Prepares for decompression.
As with
<computeroutput>BZ2_bzCompressInit</computeroutput>, a
<computeroutput>bz_stream</computeroutput> record should be
allocated and initialised before the call.  Fields
<computeroutput>bzalloc</computeroutput>,
<computeroutput>bzfree</computeroutput> and
<computeroutput>opaque</computeroutput> should be set if a custom
memory allocator is required, or made
<computeroutput>NULL</computeroutput> for the normal
<computeroutput>malloc</computeroutput> /
<computeroutput>free</computeroutput> routines.  Upon return, the
internal state will have been initialised, and
<computeroutput>total_in</computeroutput> and
<computeroutput>total_out</computeroutput> will be zero.</para>

<para>For the meaning of parameter
<computeroutput>verbosity</computeroutput>, see
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para>

<para>If <computeroutput>small</computeroutput> is nonzero, the
library will use an alternative decompression algorithm which
uses less memory but at the cost of decompressing more slowly
(roughly speaking, half the speed, but the maximum memory
requirement drops to around 2300k).
See <xref linkend="using"/>
for more information on memory management.</para>

<para>Note that the amount of memory needed to decompress a
stream cannot be determined until the stream's header has been
read, so even if
<computeroutput>BZ2_bzDecompressInit</computeroutput> succeeds, a
subsequent <computeroutput>BZ2_bzDecompress</computeroutput>
could fail with
<computeroutput>BZ_MEM_ERROR</computeroutput>.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if ( small != 0 &amp;&amp; small != 1 )
  or (verbosity &lt; 0 || verbosity &gt; 4)
BZ_MEM_ERROR
  if insufficient memory is available
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzDecompress
  if BZ_OK was returned
  no specific action required in case of error
</programlisting>

</sect2>


<sect2 id="bzDecompress" xreflabel="BZ2_bzDecompress">
<title>BZ2_bzDecompress</title>

<programlisting>
int BZ2_bzDecompress ( bz_stream *strm );
</programlisting>

<para>Provides more input and/or output buffer space for the
library.
The caller maintains input and output buffers, and uses
<computeroutput>BZ2_bzDecompress</computeroutput> to transfer
data between them.</para>

<para>Before each call to
<computeroutput>BZ2_bzDecompress</computeroutput>,
<computeroutput>next_in</computeroutput> should point at the
compressed data, and <computeroutput>avail_in</computeroutput>
should indicate how many bytes the library may read.
<computeroutput>BZ2_bzDecompress</computeroutput> updates
<computeroutput>next_in</computeroutput>,
<computeroutput>avail_in</computeroutput> and
<computeroutput>total_in</computeroutput> to reflect the number
of bytes it has read.</para>

<para>Similarly, <computeroutput>next_out</computeroutput> should
point to a buffer in which the uncompressed output is to be
placed, with <computeroutput>avail_out</computeroutput>
indicating how much output space is available.
<computeroutput>BZ2_bzDecompress</computeroutput> updates
<computeroutput>next_out</computeroutput>,
<computeroutput>avail_out</computeroutput> and
<computeroutput>total_out</computeroutput> to reflect the number
of bytes output.</para>

<para>You may provide and remove as little or as much data as you
like on each call of
<computeroutput>BZ2_bzDecompress</computeroutput>.  In the limit,
it is acceptable to supply and remove data one byte at a time,
although this would be terribly inefficient.
You should always
ensure that at least one byte of output space is available at
each call.</para>

<para>Use of <computeroutput>BZ2_bzDecompress</computeroutput> is
simpler than
<computeroutput>BZ2_bzCompress</computeroutput>.</para>

<para>You should provide input and remove output as described
above, and repeatedly call
<computeroutput>BZ2_bzDecompress</computeroutput> until
<computeroutput>BZ_STREAM_END</computeroutput> is returned.
Appearance of <computeroutput>BZ_STREAM_END</computeroutput>
denotes that <computeroutput>BZ2_bzDecompress</computeroutput>
has detected the logical end of the compressed stream.
<computeroutput>BZ2_bzDecompress</computeroutput> will not
produce <computeroutput>BZ_STREAM_END</computeroutput> until all
output data has been placed into the output buffer, so once
<computeroutput>BZ_STREAM_END</computeroutput> appears, you are
guaranteed to have available all the decompressed output, and
<computeroutput>BZ2_bzDecompressEnd</computeroutput> can safely
be called.</para>

<para>In case of an error return value, you should call
<computeroutput>BZ2_bzDecompressEnd</computeroutput> to clean up
and release memory.</para>

<para>Possible return values:</para>

<programlisting>
BZ_PARAM_ERROR
  if strm is NULL or strm-&gt;s is NULL
  or strm-&gt;avail_out &lt; 1
BZ_DATA_ERROR
  if a data integrity error is detected in the compressed stream
BZ_DATA_ERROR_MAGIC
  if the compressed stream doesn't begin with the right magic bytes
BZ_MEM_ERROR
  if there wasn't enough memory available
BZ_STREAM_END
  if the logical end of the data stream was detected and all
  output has been consumed, eg s-&gt;avail_out &gt; 0
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzDecompress
  if BZ_OK was returned
BZ2_bzDecompressEnd
  otherwise
</programlisting>

</sect2>


<sect2 id="bzDecompress-end" xreflabel="BZ2_bzDecompressEnd">
<title>BZ2_bzDecompressEnd</title>

<programlisting>
int BZ2_bzDecompressEnd ( bz_stream *strm );
</programlisting>

<para>Releases all memory associated with a decompression
stream.</para>

<para>Possible return values:</para>

<programlisting>
BZ_PARAM_ERROR
  if strm is NULL or strm-&gt;s is NULL
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
  None.
</programlisting>

</sect2>

</sect1>


<sect1 id="hl-interface" xreflabel="High-level interface">
<title>High-level interface</title>

<para>This interface provides functions for reading and writing
<computeroutput>bzip2</computeroutput> format files.  First, some
general points.</para>

<itemizedlist mark='bullet'>

 <listitem><para>All of the functions take an
  <computeroutput>int*</computeroutput> first argument,
  <computeroutput>bzerror</computeroutput>.  After each call,
  <computeroutput>bzerror</computeroutput> should be consulted
  first to determine the outcome of the call.  If
  <computeroutput>bzerror</computeroutput> is
  <computeroutput>BZ_OK</computeroutput>, the call completed
  successfully, and only then should the return value of the
  function (if any) be consulted.  If
  <computeroutput>bzerror</computeroutput> is
  <computeroutput>BZ_IO_ERROR</computeroutput>, there was an
  error reading/writing the underlying compressed file, and you
  should then consult <computeroutput>errno</computeroutput> /
  <computeroutput>perror</computeroutput> to determine the cause
  of the difficulty.
  <computeroutput>bzerror</computeroutput>
  may also be set to various other values; precise details are
  given on a per-function basis below.</para></listitem>

 <listitem><para>If <computeroutput>bzerror</computeroutput> indicates
  an error (ie, anything except
  <computeroutput>BZ_OK</computeroutput> and
  <computeroutput>BZ_STREAM_END</computeroutput>), you should
  immediately call
  <computeroutput>BZ2_bzReadClose</computeroutput> (or
  <computeroutput>BZ2_bzWriteClose</computeroutput>, depending on
  whether you are attempting to read or to write) to free up all
  resources associated with the stream.  Once an error has been
  indicated, behaviour of all calls except
  <computeroutput>BZ2_bzReadClose</computeroutput>
  (<computeroutput>BZ2_bzWriteClose</computeroutput>) is
  undefined.  The implication is that (1)
  <computeroutput>bzerror</computeroutput> should be checked
  after each call, and (2) if
  <computeroutput>bzerror</computeroutput> indicates an error,
  <computeroutput>BZ2_bzReadClose</computeroutput>
  (<computeroutput>BZ2_bzWriteClose</computeroutput>) should then
  be called to clean up.</para></listitem>

 <listitem><para>The <computeroutput>FILE*</computeroutput> arguments
  passed to <computeroutput>BZ2_bzReadOpen</computeroutput> /
  <computeroutput>BZ2_bzWriteOpen</computeroutput> should be set
  to binary mode.
  Most Unix systems will do this by default, but
  other platforms, including Windows and Mac, will not. If you
  omit this, you may encounter problems when moving code to new
  platforms.</para></listitem>

 <listitem><para>Memory allocation requests are handled by
  <computeroutput>malloc</computeroutput> /
  <computeroutput>free</computeroutput>. At present there is no
  facility for user-defined memory allocators in the file I/O
  functions (could easily be added, though).</para></listitem>

</itemizedlist>



<sect2 id="bzreadopen" xreflabel="BZ2_bzReadOpen">
<title>BZ2_bzReadOpen</title>

<programlisting>
typedef void BZFILE;

BZFILE *BZ2_bzReadOpen( int *bzerror, FILE *f,
                        int verbosity, int small,
                        void *unused, int nUnused );
</programlisting>

<para>Prepare to read compressed data from file handle
<computeroutput>f</computeroutput>.
<computeroutput>f</computeroutput> should refer to a file which
has been opened for reading, and for which the error indicator
(<computeroutput>ferror(f)</computeroutput>) is not set.
If
<computeroutput>small</computeroutput> is 1, the library will try
to decompress using less memory, at the expense of speed.</para>

<para>For reasons explained below,
<computeroutput>BZ2_bzRead</computeroutput> will decompress the
<computeroutput>nUnused</computeroutput> bytes starting at
<computeroutput>unused</computeroutput>, before starting to read
from the file <computeroutput>f</computeroutput>. At most
<computeroutput>BZ_MAX_UNUSED</computeroutput> bytes may be
supplied like this. If this facility is not required, you should
pass <computeroutput>NULL</computeroutput> and
<computeroutput>0</computeroutput> for
<computeroutput>unused</computeroutput> and
<computeroutput>nUnused</computeroutput> respectively.</para>

<para>For the meaning of parameters
<computeroutput>small</computeroutput> and
<computeroutput>verbosity</computeroutput>, see
<computeroutput>BZ2_bzDecompressInit</computeroutput>.</para>

<para>The amount of memory needed to decompress a file cannot be
determined until the file's header has been read.
So it is
possible that <computeroutput>BZ2_bzReadOpen</computeroutput>
signals <computeroutput>BZ_OK</computeroutput> but a subsequent
call of <computeroutput>BZ2_bzRead</computeroutput> signals
<computeroutput>BZ_MEM_ERROR</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if f is NULL
  or small is neither 0 nor 1
  or ( unused == NULL && nUnused != 0 )
  or ( unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED) )
BZ_IO_ERROR
  if ferror(f) is nonzero
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OK
  otherwise.
</programlisting>

<para>Possible return values:</para>

<programlisting>
Pointer to an abstract BZFILE
  if bzerror is BZ_OK
NULL
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzRead
  if bzerror is BZ_OK
BZ2_bzReadClose
  otherwise
</programlisting>

</sect2>


<sect2 id="bzread" xreflabel="BZ2_bzRead">
<title>BZ2_bzRead</title>

<programlisting>
int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
</programlisting>

<para>Reads up to <computeroutput>len</computeroutput>
(uncompressed) bytes from the compressed file
<computeroutput>b</computeroutput> into the buffer
<computeroutput>buf</computeroutput>. If the read was
successful, <computeroutput>bzerror</computeroutput> is set to
<computeroutput>BZ_OK</computeroutput> and the number of bytes
read is returned. If the logical end-of-stream was detected,
<computeroutput>bzerror</computeroutput> will be set to
<computeroutput>BZ_STREAM_END</computeroutput>, and the number of
bytes read is returned.
All other
<computeroutput>bzerror</computeroutput> values denote an
error.</para>

<para><computeroutput>BZ2_bzRead</computeroutput> will supply
<computeroutput>len</computeroutput> bytes, unless the logical
stream end is detected or an error occurs. Because of this, it
is possible to detect the stream end by observing when the number
of bytes returned is less than the number requested.
Nevertheless, this is regarded as inadvisable; you should instead
check <computeroutput>bzerror</computeroutput> after every call
and watch out for
<computeroutput>BZ_STREAM_END</computeroutput>.</para>

<para>Internally, <computeroutput>BZ2_bzRead</computeroutput>
copies data from the compressed file in chunks of size
<computeroutput>BZ_MAX_UNUSED</computeroutput> bytes before
decompressing it. If the file contains more bytes than strictly
needed to reach the logical end-of-stream,
<computeroutput>BZ2_bzRead</computeroutput> will almost certainly
read some of the trailing data before signalling
<computeroutput>BZ_STREAM_END</computeroutput>.
To collect the
read but unused data once
<computeroutput>BZ_STREAM_END</computeroutput> has appeared,
call <computeroutput>BZ2_bzReadGetUnused</computeroutput>
immediately before
<computeroutput>BZ2_bzReadClose</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_PARAM_ERROR
  if b is NULL or buf is NULL or len < 0
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzWriteOpen
BZ_IO_ERROR
  if there is an error reading from the compressed file
BZ_UNEXPECTED_EOF
  if the compressed file ended before
  the logical end-of-stream was detected
BZ_DATA_ERROR
  if a data integrity error was detected in the compressed stream
BZ_DATA_ERROR_MAGIC
  if the stream does not begin with the requisite header bytes
  (ie, is not a bzip2 data file). This is really
  a special case of BZ_DATA_ERROR.
BZ_MEM_ERROR
  if insufficient memory was available
BZ_STREAM_END
  if the logical end of stream was detected.
BZ_OK
  otherwise.
</programlisting>

<para>Possible return values:</para>

<programlisting>
number of bytes read
  if bzerror is BZ_OK or BZ_STREAM_END
undefined
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
collect data from buf, then BZ2_bzRead or BZ2_bzReadClose
  if bzerror is BZ_OK
collect data from buf, then BZ2_bzReadClose or BZ2_bzReadGetUnused
  if bzerror is BZ_STREAM_END
BZ2_bzReadClose
  otherwise
</programlisting>

</sect2>


<sect2 id="bzreadgetunused" xreflabel="BZ2_bzReadGetUnused">
<title>BZ2_bzReadGetUnused</title>

<programlisting>
void BZ2_bzReadGetUnused( int* bzerror, BZFILE *b,
                          void** unused, int* nUnused );
</programlisting>

<para>Returns data which was read from the compressed file but
was not needed to get to the logical end-of-stream.
<computeroutput>*unused</computeroutput> is set to the address of
the data, and <computeroutput>*nUnused</computeroutput> to the
number of bytes.
<computeroutput>*nUnused</computeroutput> will
be set to a value between <computeroutput>0</computeroutput> and
<computeroutput>BZ_MAX_UNUSED</computeroutput> inclusive.</para>

<para>This function may only be called once
<computeroutput>BZ2_bzRead</computeroutput> has signalled
<computeroutput>BZ_STREAM_END</computeroutput> but before
<computeroutput>BZ2_bzReadClose</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_PARAM_ERROR
  if b is NULL
  or unused is NULL or nUnused is NULL
BZ_SEQUENCE_ERROR
  if BZ_STREAM_END has not been signalled
  or if b was opened with BZ2_bzWriteOpen
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzReadClose
</programlisting>

</sect2>


<sect2 id="bzreadclose" xreflabel="BZ2_bzReadClose">
<title>BZ2_bzReadClose</title>

<programlisting>
void BZ2_bzReadClose ( int *bzerror, BZFILE *b );
</programlisting>

<para>Releases all memory pertaining to the compressed file
<computeroutput>b</computeroutput>.
<computeroutput>BZ2_bzReadClose</computeroutput> does not call
<computeroutput>fclose</computeroutput> on the underlying file
handle, so you should do that yourself if appropriate.
<computeroutput>BZ2_bzReadClose</computeroutput> should be called
to clean up after all error situations.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzWriteOpen
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
none
</programlisting>

</sect2>


<sect2 id="bzwriteopen" xreflabel="BZ2_bzWriteOpen">
<title>BZ2_bzWriteOpen</title>

<programlisting>
BZFILE *BZ2_bzWriteOpen( int *bzerror, FILE *f,
                         int blockSize100k, int verbosity,
                         int workFactor );
</programlisting>

<para>Prepare to write compressed data to file handle
<computeroutput>f</computeroutput>.
<computeroutput>f</computeroutput> should refer to a file which
has been opened for writing, and for which the error indicator
(<computeroutput>ferror(f)</computeroutput>) is not set.</para>

<para>For the meaning of parameters
<computeroutput>blockSize100k</computeroutput>,
<computeroutput>verbosity</computeroutput> and
<computeroutput>workFactor</computeroutput>, see
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para>

<para>All required memory is allocated at this stage, so if the
call completes successfully,
<computeroutput>BZ_MEM_ERROR</computeroutput> cannot be signalled
by a subsequent call to
<computeroutput>BZ2_bzWrite</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if f is NULL
  or blockSize100k < 1 or blockSize100k > 9
BZ_IO_ERROR
  if ferror(f) is nonzero
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OK
  otherwise
</programlisting>

<para>Possible return values:</para>

<programlisting>
Pointer to an abstract BZFILE
  if bzerror is BZ_OK
NULL
  otherwise
</programlisting>

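<para>As a rough illustration of the call sequence, the following
sketch opens a stream for writing and compresses a single in-memory
buffer. The helper name
<computeroutput>compress_buffer_to_file</computeroutput>, its return
convention, and the choice of
<computeroutput>blockSize100k</computeroutput> 9 with
<computeroutput>verbosity</computeroutput> and
<computeroutput>workFactor</computeroutput> both 0 are assumptions
made for this example only; it is not the only reasonable way to
structure the error handling.</para>

<programlisting><![CDATA[
#include <stdio.h>
#include <bzlib.h>

/* Hypothetical helper (not part of libbzip2): compress len bytes
   from data into the file at path.  Returns 0 on success, -1 on
   any failure. */
int compress_buffer_to_file ( const char* path, char* data, int len )
{
   int     bzerror;
   BZFILE* b;
   FILE*   f = fopen ( path, "wb" );   /* binary mode */
   if ( !f ) return -1;

   b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 0 );
   if ( bzerror != BZ_OK ) {
      /* clean up the stream, then the file handle we own */
      BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL );
      fclose ( f );
      return -1;
   }

   BZ2_bzWrite ( &bzerror, b, data, len );
   if ( bzerror != BZ_OK ) {
      BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL );
      fclose ( f );
      return -1;
   }

   /* abandon == 0: finish the stream and flush it to f */
   BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
   if ( bzerror != BZ_OK ) {
      fclose ( f );
      return -1;
   }

   /* the library never calls fclose, so do it here */
   return fclose ( f ) == 0 ? 0 : -1;
}
]]></programlisting>

<para>The sketch closes the <computeroutput>FILE*</computeroutput>
itself on every path, since the library leaves that to the
caller.</para>
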
<para>Allowable next actions:</para>

<programlisting>
BZ2_bzWrite
  if bzerror is BZ_OK
  (you could go directly to BZ2_bzWriteClose, but this would be pretty pointless)
BZ2_bzWriteClose
  otherwise
</programlisting>

</sect2>


<sect2 id="bzwrite" xreflabel="BZ2_bzWrite">
<title>BZ2_bzWrite</title>

<programlisting>
void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
</programlisting>

<para>Absorbs <computeroutput>len</computeroutput> bytes from the
buffer <computeroutput>buf</computeroutput>, eventually to be
compressed and written to the file.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_PARAM_ERROR
  if b is NULL or buf is NULL or len < 0
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzReadOpen
BZ_IO_ERROR
  if there is an error writing the compressed file.
BZ_OK
  otherwise
</programlisting>

</sect2>


<sect2 id="bzwriteclose" xreflabel="BZ2_bzWriteClose">
<title>BZ2_bzWriteClose</title>

<programlisting>
void BZ2_bzWriteClose( int *bzerror, BZFILE* b,
                       int abandon,
                       unsigned int* nbytes_in,
                       unsigned int* nbytes_out );

void BZ2_bzWriteClose64( int *bzerror, BZFILE* b,
                         int abandon,
                         unsigned int* nbytes_in_lo32,
                         unsigned int* nbytes_in_hi32,
                         unsigned int* nbytes_out_lo32,
                         unsigned int* nbytes_out_hi32 );
</programlisting>

<para>Compresses and flushes to the compressed file all data so
far supplied by <computeroutput>BZ2_bzWrite</computeroutput>.
The logical end-of-stream markers are also written, so subsequent
calls to <computeroutput>BZ2_bzWrite</computeroutput> are
illegal. All memory associated with the compressed file
<computeroutput>b</computeroutput> is released.
<computeroutput>fflush</computeroutput> is called on the
compressed file, but it is not
<computeroutput>fclose</computeroutput>'d.</para>

<para>If <computeroutput>BZ2_bzWriteClose</computeroutput> is
called to clean up after an error, the only action is to release
the memory. The library records the error codes issued by
previous calls, so this situation will be detected automatically.
There is no attempt to complete the compression operation, nor to
<computeroutput>fflush</computeroutput> the compressed file. You
can force this behaviour to happen even in the case of no error,
by passing a nonzero value to
<computeroutput>abandon</computeroutput>.</para>

<para>If <computeroutput>nbytes_in</computeroutput> is non-null,
<computeroutput>*nbytes_in</computeroutput> will be set to the
total volume of uncompressed data handled. Similarly,
<computeroutput>*nbytes_out</computeroutput> will be set to the
total volume of compressed data written. For compatibility with
older versions of the library,
<computeroutput>BZ2_bzWriteClose</computeroutput> only yields the
lower 32 bits of these counts. Use
<computeroutput>BZ2_bzWriteClose64</computeroutput> if you want
the full 64 bit counts.
These two functions are otherwise
absolutely identical.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzReadOpen
BZ_IO_ERROR
  if there is an error writing the compressed file
BZ_OK
  otherwise
</programlisting>

</sect2>


<sect2 id="embed" xreflabel="Handling embedded compressed data streams">
<title>Handling embedded compressed data streams</title>

<para>The high-level library facilitates use of
<computeroutput>bzip2</computeroutput> data streams which form
some part of a surrounding, larger data stream.</para>

<itemizedlist mark='bullet'>

 <listitem><para>For writing, the library takes an open file handle,
  writes compressed data to it,
  <computeroutput>fflush</computeroutput>es it but does not
  <computeroutput>fclose</computeroutput> it. The calling
  application can write its own data before and after the
  compressed data stream, using that same file handle.</para></listitem>

 <listitem><para>Reading is more complex, and the facilities are not as
  general as they could be since generality is hard to reconcile
  with efficiency.
  <computeroutput>BZ2_bzRead</computeroutput>
  reads from the compressed file in blocks of size
  <computeroutput>BZ_MAX_UNUSED</computeroutput> bytes, and in
  doing so probably will overshoot the logical end of compressed
  stream. To recover this data once decompression has ended,
  call <computeroutput>BZ2_bzReadGetUnused</computeroutput> after
  the last call of <computeroutput>BZ2_bzRead</computeroutput>
  (the one signalling
  <computeroutput>BZ_STREAM_END</computeroutput>) but before
  calling
  <computeroutput>BZ2_bzReadClose</computeroutput>.</para></listitem>

</itemizedlist>

<para>This mechanism makes it easy to decompress multiple
<computeroutput>bzip2</computeroutput> streams placed end-to-end.
At the end of one stream, when
<computeroutput>BZ2_bzRead</computeroutput> signals
<computeroutput>BZ_STREAM_END</computeroutput>, call
<computeroutput>BZ2_bzReadGetUnused</computeroutput> to collect
the unused data (copy it into your own buffer somewhere). That
data forms the start of the next compressed stream. To start
uncompressing that next stream, call
<computeroutput>BZ2_bzReadOpen</computeroutput> again, feeding in
the unused data via the <computeroutput>unused</computeroutput> /
<computeroutput>nUnused</computeroutput> parameters. Keep doing
this until a <computeroutput>BZ_STREAM_END</computeroutput> result
coincides with the physical end of file
(<computeroutput>feof(f)</computeroutput>).
In this situation
<computeroutput>BZ2_bzReadGetUnused</computeroutput> will of
course return no data.</para>

<para>This should give some feel for how the high-level interface
can be used. If you require extra flexibility, you'll have to
bite the bullet and get to grips with the low-level
interface.</para>

</sect2>


<sect2 id="std-rdwr" xreflabel="Standard file-reading/writing code">
<title>Standard file-reading/writing code</title>

<para>Here's how you'd write data to a compressed file:</para>

<programlisting>
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "wb" );   /* binary mode */
if ( !f ) {
  /* handle error */
}
/* blockSize100k = 9; verbosity and workFactor left at 0 */
b = BZ2_bzWriteOpen( &bzerror, f, 9, 0, 0 );
if (bzerror != BZ_OK) {
  BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
  /* handle error */
}

while ( /* condition */ ) {
  /* get data to write into buf, and set nBuf appropriately */
  BZ2_bzWrite ( &bzerror, b, buf, nBuf );   /* returns void */
  if (bzerror == BZ_IO_ERROR) {
    BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
    /* handle error */
  }
}

BZ2_bzWriteClose( &bzerror, b, 0, NULL, NULL );
if (bzerror == BZ_IO_ERROR) {
  /* handle error */
}
</programlisting>

<para>And to read from a compressed file:</para>

<programlisting>
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "rb" );   /* binary mode */
if ( !f ) {
  /* handle error */
}
/* verbosity = 0, small = 0, no previously unused data */
b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
if ( bzerror != BZ_OK ) {
  BZ2_bzReadClose ( &bzerror, b );
  /* handle error */
}

bzerror = BZ_OK;
while ( bzerror == BZ_OK && /* arbitrary other conditions */) {
  nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ );
  /* the final call signals BZ_STREAM_END and may still return data */
  if ( bzerror == BZ_OK || bzerror == BZ_STREAM_END ) {
    /* do something with buf[0 ..
       nBuf-1] */
  }
}
if ( bzerror != BZ_STREAM_END ) {
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
} else {
   BZ2_bzReadClose ( &bzerror, b );
}
</programlisting>

</sect2>

</sect1>


<sect1 id="util-fns" xreflabel="Utility functions">
<title>Utility functions</title>


<sect2 id="bzbufftobuffcompress" xreflabel="BZ2_bzBuffToBuffCompress">
<title>BZ2_bzBuffToBuffCompress</title>

<programlisting>
int BZ2_bzBuffToBuffCompress( char*         dest,
                              unsigned int* destLen,
                              char*         source,
                              unsigned int  sourceLen,
                              int           blockSize100k,
                              int           verbosity,
                              int           workFactor );
</programlisting>

<para>Attempts to compress the data in <computeroutput>source[0
.. sourceLen-1]</computeroutput> into the destination buffer,
<computeroutput>dest[0 .. *destLen-1]</computeroutput>. If the
destination buffer is big enough,
<computeroutput>*destLen</computeroutput> is set to the size of
the compressed data, and <computeroutput>BZ_OK</computeroutput>
is returned.
If the compressed data won't fit,
<computeroutput>*destLen</computeroutput> is unchanged, and
<computeroutput>BZ_OUTBUFF_FULL</computeroutput> is
returned.</para>

<para>Compression in this manner is a one-shot event, done with a
single call to this function. The resulting compressed data is a
complete <computeroutput>bzip2</computeroutput> format data
stream. There is no mechanism for making additional calls to
provide extra input data. If you want that kind of mechanism,
use the low-level interface.</para>

<para>For the meaning of parameters
<computeroutput>blockSize100k</computeroutput>,
<computeroutput>verbosity</computeroutput> and
<computeroutput>workFactor</computeroutput>, see
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para>

<para>To guarantee that the compressed data will fit in its
buffer, allocate an output buffer of size 1% larger than the
uncompressed data, plus six hundred extra bytes.</para>

<para><computeroutput>BZ2_bzBuffToBuffCompress</computeroutput>
will not write data at or beyond
<computeroutput>dest[*destLen]</computeroutput>, even in case of
buffer overflow.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if dest is NULL or destLen is NULL
  or blockSize100k < 1 or
     blockSize100k > 9
  or verbosity < 0 or verbosity > 4
  or workFactor < 0 or workFactor > 250
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OUTBUFF_FULL
  if the size of the compressed data exceeds *destLen
BZ_OK
  otherwise
</programlisting>

</sect2>


<sect2 id="bzbufftobuffdecompress" xreflabel="BZ2_bzBuffToBuffDecompress">
<title>BZ2_bzBuffToBuffDecompress</title>

<programlisting>
int BZ2_bzBuffToBuffDecompress( char*         dest,
                                unsigned int* destLen,
                                char*         source,
                                unsigned int  sourceLen,
                                int           small,
                                int           verbosity );
</programlisting>

<para>Attempts to decompress the data in <computeroutput>source[0
.. sourceLen-1]</computeroutput> into the destination buffer,
<computeroutput>dest[0 .. *destLen-1]</computeroutput>. If the
destination buffer is big enough,
<computeroutput>*destLen</computeroutput> is set to the size of
the uncompressed data, and <computeroutput>BZ_OK</computeroutput>
is returned. If the uncompressed data won't fit,
<computeroutput>*destLen</computeroutput> is unchanged, and
<computeroutput>BZ_OUTBUFF_FULL</computeroutput> is
returned.</para>

<para><computeroutput>source</computeroutput> is assumed to hold
a complete <computeroutput>bzip2</computeroutput> format data
stream.
<computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> tries
to decompress the entirety of the stream into the output
buffer.</para>

<para>For the meaning of parameters
<computeroutput>small</computeroutput> and
<computeroutput>verbosity</computeroutput>, see
<computeroutput>BZ2_bzDecompressInit</computeroutput>.</para>

<para>Because the compression ratio of the compressed data cannot
be known in advance, there is no easy way to guarantee that the
output buffer will be big enough. You may of course make
arrangements in your code to record the size of the uncompressed
data, but such a mechanism is beyond the scope of this
library.</para>

<para><computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput>
will not write data at or beyond
<computeroutput>dest[*destLen]</computeroutput>, even in case of
buffer overflow.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if dest is NULL or destLen is NULL
  or small != 0 &amp;&amp; small != 1
  or verbosity &lt; 0 or verbosity &gt; 4
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OUTBUFF_FULL
  if the size of the uncompressed data exceeds *destLen
BZ_DATA_ERROR
  if a data integrity error was detected in the compressed data
BZ_DATA_ERROR_MAGIC
  if the compressed data doesn't begin with the right magic bytes
BZ_UNEXPECTED_EOF
  if the compressed data ends unexpectedly
BZ_OK
  otherwise
</programlisting>

</sect2>

</sect1>


<sect1 id="zlib-compat" xreflabel="zlib compatibility functions">
<title>zlib compatibility functions</title>

<para>Yoshioka Tsuneo has contributed some functions to give
better <computeroutput>zlib</computeroutput> compatibility.
These functions are <computeroutput>BZ2_bzopen</computeroutput>,
<computeroutput>BZ2_bzread</computeroutput>,
<computeroutput>BZ2_bzwrite</computeroutput>,
<computeroutput>BZ2_bzflush</computeroutput>,
<computeroutput>BZ2_bzclose</computeroutput>,
<computeroutput>BZ2_bzerror</computeroutput> and
<computeroutput>BZ2_bzlibVersion</computeroutput>. These
functions are not (yet) officially part of the library. If they
break, you get to keep all the pieces.
Nevertheless, I think
they work ok.</para>

<programlisting>
typedef void BZFILE;

const char * BZ2_bzlibVersion ( void );
</programlisting>

<para>Returns a string indicating the library version.</para>

<programlisting>
BZFILE * BZ2_bzopen  ( const char *path, const char *mode );
BZFILE * BZ2_bzdopen ( int        fd,   const char *mode );
</programlisting>

<para>Opens a <computeroutput>.bz2</computeroutput> file for
reading or writing, using either its name or a pre-existing file
descriptor. Analogous to <computeroutput>fopen</computeroutput>
and <computeroutput>fdopen</computeroutput>.</para>

<programlisting>
int BZ2_bzread  ( BZFILE* b, void* buf, int len );
int BZ2_bzwrite ( BZFILE* b, void* buf, int len );
</programlisting>

<para>Reads/writes data from/to a previously opened
<computeroutput>BZFILE</computeroutput>. Analogous to
<computeroutput>fread</computeroutput> and
<computeroutput>fwrite</computeroutput>.</para>

<programlisting>
int  BZ2_bzflush ( BZFILE* b );
void BZ2_bzclose ( BZFILE* b );
</programlisting>

<para>Flushes/closes a <computeroutput>BZFILE</computeroutput>.
<computeroutput>BZ2_bzflush</computeroutput> doesn't actually do
anything.
Analogous to <computeroutput>fflush</computeroutput>
and <computeroutput>fclose</computeroutput>.</para>

<programlisting>
const char * BZ2_bzerror ( BZFILE *b, int *errnum );
</programlisting>

<para>Returns a string describing the most recent error status of
<computeroutput>b</computeroutput>, and also sets
<computeroutput>*errnum</computeroutput> to its numerical
value.</para>

</sect1>


<sect1 id="stdio-free"
       xreflabel="Using the library in a stdio-free environment">
<title>Using the library in a stdio-free environment</title>


<sect2 id="stdio-bye" xreflabel="Getting rid of stdio">
<title>Getting rid of stdio</title>

<para>In a deeply embedded application, you might want to use
just the memory-to-memory functions. You can do this
conveniently by compiling the library with preprocessor symbol
<computeroutput>BZ_NO_STDIO</computeroutput> defined.
Doing this
gives you a library containing only the following eight
functions:</para>

<para><computeroutput>BZ2_bzCompressInit</computeroutput>,
<computeroutput>BZ2_bzCompress</computeroutput>,
<computeroutput>BZ2_bzCompressEnd</computeroutput>,
<computeroutput>BZ2_bzDecompressInit</computeroutput>,
<computeroutput>BZ2_bzDecompress</computeroutput>,
<computeroutput>BZ2_bzDecompressEnd</computeroutput>,
<computeroutput>BZ2_bzBuffToBuffCompress</computeroutput>,
<computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput></para>

<para>When compiled like this, all functions will ignore
<computeroutput>verbosity</computeroutput> settings.</para>

</sect2>


<sect2 id="critical-error" xreflabel="Critical error handling">
<title>Critical error handling</title>

<para><computeroutput>libbzip2</computeroutput> contains a number
of internal assertion checks which should, needless to say, never
be activated. Nevertheless, if an assertion should fail,
behaviour depends on whether or not the library was compiled with
<computeroutput>BZ_NO_STDIO</computeroutput> set.</para>

<para>For a normal compile, an assertion failure yields the
message:</para>

<blockquote>
<para>bzip2/libbzip2: internal error number N.</para>
<para>This is a bug in bzip2/libbzip2, &bz-version; of &bz-date;.
Please report it to: &bz-email;.
If this happened
when you were using some program which uses libbzip2 as a
component, you should also report this bug to the author(s)
of that program. Please make an effort to report this bug;
timely and accurate bug reports eventually lead to higher
quality software. Thanks.
</para></blockquote>

<para>where <computeroutput>N</computeroutput> is some error code
number. If <computeroutput>N == 1007</computeroutput>, it also
prints some extra text advising the reader that unreliable memory
is often associated with internal error 1007. (This is a
frequently observed phenomenon with versions 1.0.0/1.0.1).</para>

<para><computeroutput>exit(3)</computeroutput> is then
called.</para>

<para>For a <computeroutput>stdio</computeroutput>-free library,
assertion failures result in a call to a function declared
as:</para>

<programlisting>
extern void bz_internal_error ( int errcode );
</programlisting>

<para>The relevant code is passed as a parameter. You should
supply such a function.</para>

<para>In either case, once an assertion failure has occurred, any
<computeroutput>bz_stream</computeroutput> records involved can
be regarded as invalid. You should not attempt to resume normal
operation with them.</para>

<para>You may, of course, change critical error handling to suit
your needs.
As I said above, critical errors indicate bugs in
the library and should not occur. All "normal" error situations
are indicated via error return codes from functions, and can be
recovered from.</para>

</sect2>

</sect1>


<sect1 id="win-dll" xreflabel="Making a Windows DLL">
<title>Making a Windows DLL</title>

<para>Everything related to Windows has been contributed by
Yoshioka Tsuneo
(<computeroutput>[email protected]</computeroutput>), so
you should send your queries to him (but please Cc:
<computeroutput>&bz-email;</computeroutput>).</para>

<para>My vague understanding of what to do is: using Visual C++
5.0, open the project file
<computeroutput>libbz2.dsp</computeroutput>, and build. That's
all.</para>

<para>If you can't open the project file for some reason, make a
new one, naming these files:
<computeroutput>blocksort.c</computeroutput>,
<computeroutput>bzlib.c</computeroutput>,
<computeroutput>compress.c</computeroutput>,
<computeroutput>crctable.c</computeroutput>,
<computeroutput>decompress.c</computeroutput>,
<computeroutput>huffman.c</computeroutput>,
<computeroutput>randtable.c</computeroutput> and
<computeroutput>libbz2.def</computeroutput>.
You will also need
to name the header files <computeroutput>bzlib.h</computeroutput>
and <computeroutput>bzlib_private.h</computeroutput>.</para>

<para>If you don't use VC++, you may need to define the
preprocessor symbol
<computeroutput>_WIN32</computeroutput>.</para>

<para>Finally, <computeroutput>dlltest.c</computeroutput> is a
sample program using the DLL. It has a project file,
<computeroutput>dlltest.dsp</computeroutput>.</para>

<para>If you just want a makefile for Visual C, have a look at
<computeroutput>makefile.msc</computeroutput>.</para>

<para>Be aware that if you compile
<computeroutput>bzip2</computeroutput> itself on Win32, you must
set <computeroutput>BZ_UNIX</computeroutput> to 0 and
<computeroutput>BZ_LCCWIN32</computeroutput> to 1, in the file
<computeroutput>bzip2.c</computeroutput>, before compiling.
Otherwise the resulting binary won't work correctly.</para>

<para>I haven't tried any of this stuff myself, but it all looks
plausible.</para>

</sect1>

</chapter>



<chapter id="misc" xreflabel="Miscellanea">
<title>Miscellanea</title>

<para>These are just some random thoughts of mine.
Your mileage
may vary.</para>


<sect1 id="limits" xreflabel="Limitations of the compressed file format">
<title>Limitations of the compressed file format</title>

<para><computeroutput>bzip2-1.0.X</computeroutput>,
<computeroutput>0.9.5</computeroutput> and
<computeroutput>0.9.0</computeroutput> use exactly the same file
format as the original version,
<computeroutput>bzip2-0.1</computeroutput>. This decision was
made in the interests of stability. Creating yet another
incompatible compressed file format would create further
confusion and disruption for users.</para>

<para>Nevertheless, this is not a painless decision. Development
work since the release of
<computeroutput>bzip2-0.1</computeroutput> in August 1997 has
shown complexities in the file format which slow down
decompression and, in retrospect, are unnecessary. These
are:</para>

<itemizedlist mark='bullet'>

 <listitem><para>The run-length encoder, which is the first of the
   compression transformations, is entirely irrelevant. The
   original purpose was to protect the sorting algorithm from the
   very worst case input: a string of repeated symbols.
   But
   algorithm steps Q6a and Q6b in the original Burrows-Wheeler
   technical report (SRC-124) show how repeats can be handled
   without difficulty in block sorting.</para></listitem>

 <listitem><para>The randomisation mechanism doesn't really need to be
   there. Udi Manber and Gene Myers published a suffix array
   construction algorithm a few years back, which can be employed
   to sort any block, no matter how repetitive, in O(N log N)
   time. Subsequent work by Kunihiko Sadakane has produced a
   derivative O(N (log N)^2) algorithm which usually outperforms
   the Manber-Myers algorithm.</para>

   <para>I could have changed to Sadakane's algorithm, but I find
   it to be slower than <computeroutput>bzip2</computeroutput>'s
   existing algorithm for most inputs, and the randomisation
   mechanism protects adequately against bad cases. I didn't
   think it was a good tradeoff to make.
   Partly this is due to
   the fact that I was not flooded with email complaints about
   <computeroutput>bzip2-0.1</computeroutput>'s performance on
   repetitive data, so perhaps it isn't a problem for real
   inputs.</para>

   <para>Probably the best long-term solution, and the one I have
   incorporated into 0.9.5 and above, is to use the existing
   sorting algorithm initially, and fall back to an O(N (log N)^2)
   algorithm if the standard algorithm gets into
   difficulties.</para></listitem>

 <listitem><para>The compressed file format was never designed to be
   handled by a library, and I have had to jump through some hoops
   to produce an efficient implementation of decompression. It's
   a bit hairy. Try passing
   <computeroutput>decompress.c</computeroutput> through the C
   preprocessor and you'll see what I mean.
   Much of this
   complexity could have been avoided if the compressed size of
   each block of data was recorded in the data stream.</para></listitem>

 <listitem><para>An Adler-32 checksum, rather than a CRC32 checksum,
   would be faster to compute.</para></listitem>

</itemizedlist>

<para>It would be fair to say that the
<computeroutput>bzip2</computeroutput> format was frozen before I
properly and fully understood the performance consequences of
doing so.</para>

<para>Improvements which I was able to incorporate into 0.9.0,
despite using the same file format, are:</para>

<itemizedlist mark='bullet'>

 <listitem><para>Single array implementation of the inverse BWT. This
   significantly speeds up decompression, presumably because it
   reduces the number of cache misses.</para></listitem>

 <listitem><para>Faster inverse MTF transform for large MTF values.
   The new implementation is based on the notion of sliding blocks
   of values.</para></listitem>

 <listitem><para><computeroutput>bzip2-0.9.0</computeroutput> now reads
   and writes files with <computeroutput>fread</computeroutput>
   and <computeroutput>fwrite</computeroutput>; version 0.1 used
   <computeroutput>putc</computeroutput> and
   <computeroutput>getc</computeroutput>. Duh!
   Well, you live
   and learn.</para></listitem>

</itemizedlist>

<para>Further ahead, it would be nice to be able to do random
access into files. This will require some careful design of
compressed file formats.</para>

</sect1>


<sect1 id="port-issues" xreflabel="Portability issues">
<title>Portability issues</title>

<para>After some consideration, I have decided not to use GNU
<computeroutput>autoconf</computeroutput> to configure 0.9.5 or
1.0.</para>

<para><computeroutput>autoconf</computeroutput>, admirable and
wonderful though it is, mainly assists with portability problems
between Unix-like platforms. But
<computeroutput>bzip2</computeroutput> doesn't have much in the
way of portability problems on Unix; most of the difficulties
appear when porting to the Mac, or to Microsoft's operating
systems. <computeroutput>autoconf</computeroutput> doesn't help
in those cases, and brings in a whole load of new
complexity.</para>

<para>Most people should be able to compile the library and
program under Unix straight out-of-the-box, so to speak,
especially if you have a version of GNU C available.</para>

<para>There are a couple of
<computeroutput>__inline__</computeroutput> directives in the
code.
GNU C (<computeroutput>gcc</computeroutput>) should be
able to handle them. If you're not using GNU C, your C compiler
shouldn't see them at all. If your compiler does, for some
reason, see them and doesn't like them, just
<computeroutput>#define</computeroutput>
<computeroutput>__inline__</computeroutput> to be
<computeroutput>/* */</computeroutput>. One easy way to do this
is to compile with the flag
<computeroutput>-D__inline__=</computeroutput>, which should be
understood by most Unix compilers.</para>

<para>If you still have difficulties, try compiling with the
macro <computeroutput>BZ_STRICT_ANSI</computeroutput> defined.
This should enable you to build the library in a strictly ANSI
compliant environment. Building the program itself like this is
dangerous and not supported, since you remove
<computeroutput>bzip2</computeroutput>'s checks against
compressing directories, symbolic links, devices, and other
not-really-a-file entities. This could cause filesystem
corruption!</para>

<para>One other thing: if you create a
<computeroutput>bzip2</computeroutput> binary for public distribution,
please consider linking it statically (<computeroutput>gcc
-static</computeroutput>).
This avoids all sorts of library-version
issues that others may encounter later on.</para>

<para>If you build <computeroutput>bzip2</computeroutput> on
Win32, you must set <computeroutput>BZ_UNIX</computeroutput> to 0
and <computeroutput>BZ_LCCWIN32</computeroutput> to 1, in the
file <computeroutput>bzip2.c</computeroutput>, before compiling.
Otherwise the resulting binary won't work correctly.</para>

</sect1>


<sect1 id="bugs" xreflabel="Reporting bugs">
<title>Reporting bugs</title>

<para>I tried pretty hard to make sure
<computeroutput>bzip2</computeroutput> is bug free, both by
design and by testing. Hopefully you'll never need to read this
section for real.</para>

<para>Nevertheless, if <computeroutput>bzip2</computeroutput> dies
with a segmentation fault, a bus error or an internal assertion
failure, it will ask you to email me a bug report. Experience from
years of feedback from bzip2 users indicates that almost all these
problems can be traced to either compiler bugs or hardware
problems.</para>

<itemizedlist mark='bullet'>

 <listitem><para>Recompile the program with no optimisation, and
  see if it works. And/or try a different compiler.
  I heard all
  sorts of stories about various flavours of GNU C (and other
  compilers) generating bad code for
  <computeroutput>bzip2</computeroutput>, and I've run across two
  such examples myself.</para>

  <para>2.7.X versions of GNU C are known to generate bad code
  from time to time, at high optimisation levels. If you get
  problems, try using the flags
  <computeroutput>-O2</computeroutput>
  <computeroutput>-fomit-frame-pointer</computeroutput>
  <computeroutput>-fno-strength-reduce</computeroutput>. You
  should specifically <emphasis>not</emphasis> use
  <computeroutput>-funroll-loops</computeroutput>.</para>

  <para>You may notice that the Makefile runs six tests as part
  of the build process. If the program passes all of these, it's
  a pretty good (but not 100%) indication that the compiler has
  done its job correctly.</para></listitem>

 <listitem><para>If <computeroutput>bzip2</computeroutput>
  crashes randomly, and the crashes are not repeatable, you may
  have a flaky memory subsystem.
  <computeroutput>bzip2</computeroutput> really hammers your
  memory hierarchy, and if it's a bit marginal, you may get these
  problems. Ditto if your disk or I/O subsystem is slowly
  failing.
  Yup, this really does happen.</para>

  <para>Try using a different machine of the same type, and see
  if you can repeat the problem.</para></listitem>

 <listitem><para>This isn't really a bug, but ... If
  <computeroutput>bzip2</computeroutput> tells you your file is
  corrupted on decompression, and you obtained the file via FTP,
  there is a possibility that you forgot to tell FTP to do a
  binary mode transfer. That absolutely will cause the file to
  be non-decompressible. You'll have to transfer it
  again.</para></listitem>

</itemizedlist>

<para>If you've incorporated
<computeroutput>libbzip2</computeroutput> into your own program
and are getting problems, please, please, please, check that the
parameters you are passing in calls to the library are correct,
and in accordance with what the documentation says is allowable.
I have tried to make the library robust against such problems,
but I'm sure I haven't succeeded.</para>

<para>Finally, if the above comments don't help, you'll have to
send me a bug report. Now, it's just amazing how many people
will send me a bug report saying something like:</para>

<programlisting>
bzip2 crashed with segmentation fault on my machine
</programlisting>

<para>and absolutely nothing else.
Needless to say, such a
report is <emphasis>totally, utterly, completely and
comprehensively 100% useless; a waste of your time, my time, and
net bandwidth</emphasis>. With no details at all, there's no way
I can possibly begin to figure out what the problem is.</para>

<para>The rules of the game are: facts, facts, facts. Don't omit
them because "oh, they won't be relevant". At the bare
minimum:</para>

<programlisting>
Machine type. Operating system version.
Exact version of bzip2 (do bzip2 -V).
Exact version of the compiler used.
Flags passed to the compiler.
</programlisting>

<para>However, the most important single thing that will help me
is the file that you were trying to compress or decompress at the
time the problem happened. Without that, my ability to do
anything more than speculate about the cause is limited.</para>

</sect1>


<sect1 id="package" xreflabel="Did you get the right package?">
<title>Did you get the right package?</title>

<para><computeroutput>bzip2</computeroutput> is a resource hog.
It soaks up large amounts of CPU cycles and memory. Also, it
gives very large latencies.
In the worst case, you can feed many
megabytes of uncompressed data into the library before getting
any compressed output, so this probably rules out applications
requiring interactive behaviour.</para>

<para>These aren't faults of my implementation, I hope, but more
an intrinsic property of the Burrows-Wheeler transform
(unfortunately). Maybe this isn't what you want.</para>

<para>If you want a compressor and/or library which is faster,
uses less memory but gets pretty good compression, and has
minimal latency, consider Jean-loup Gailly's and Mark Adler's
work, <computeroutput>zlib-1.2.1</computeroutput> and
<computeroutput>gzip-1.2.4</computeroutput>. Look for them at
<ulink url="http://www.zlib.org">http://www.zlib.org</ulink> and
<ulink url="http://www.gzip.org">http://www.gzip.org</ulink>
respectively.</para>

<para>For something faster and lighter still, you might try Markus F
X J Oberhumer's <computeroutput>LZO</computeroutput> real-time
compression/decompression library, at
<ulink url="http://www.oberhumer.com/opensource">http://www.oberhumer.com/opensource</ulink>.</para>

</sect1>



<sect1 id="reading" xreflabel="Further Reading">
<title>Further Reading</title>

<para><computeroutput>bzip2</computeroutput> is not research
work, in the sense that it doesn't present any new ideas.
Rather, it's an engineering exercise based on existing
ideas.</para>

<para>Four documents describe essentially all the ideas behind
<computeroutput>bzip2</computeroutput>:</para>

<literallayout>Michael Burrows and D. J. Wheeler:
   "A block-sorting lossless data compression algorithm"
   10th May 1994.
   Digital SRC Research Report 124.
   ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
   If you have trouble finding it, try searching at the
   New Zealand Digital Library, http://www.nzdl.org.

Daniel S. Hirschberg and Debra A. LeLewer
   "Efficient Decoding of Prefix Codes"
   Communications of the ACM, April 1990, Vol 33, Number 4.
   You might be able to get an electronic copy of this
   from the ACM Digital Library.

David J. Wheeler
   Program bred3.c and accompanying document bred3.ps.
   This contains the idea behind the multi-table Huffman coding scheme.
   ftp://ftp.cl.cam.ac.uk/users/djw3/

Jon L. Bentley and Robert Sedgewick
   "Fast Algorithms for Sorting and Searching Strings"
   Available from Sedgewick's web page,
   www.cs.princeton.edu/~rs
</literallayout>

<para>The following paper gives valuable additional insights into
the algorithm, but is not immediately the basis of any code used
in bzip2.</para>

<literallayout>Peter Fenwick:
   Block Sorting Text Compression
   Proceedings of the 19th Australasian Computer Science Conference,
   Melbourne, Australia.  Jan 31 - Feb 2, 1996.
   ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps</literallayout>

<para>Kunihiko Sadakane's sorting algorithm, mentioned above, is
available from:</para>

<literallayout>http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
</literallayout>

<para>The Manber-Myers suffix array construction algorithm is
described in a paper available from:</para>

<literallayout>http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
</literallayout>

<para>Finally, the following papers document some
investigations I made into the performance of sorting
and decompression algorithms:</para>

<literallayout>Julian Seward
   On the Performance of BWT Sorting Algorithms
   Proceedings of the IEEE Data Compression Conference 2000
   Snowbird, Utah.  28-30 March 2000.

Julian Seward
   Space-time Tradeoffs in the Inverse B-W Transform
   Proceedings of the IEEE Data Compression Conference 2001
   Snowbird, Utah.  27-29 March 2001.
</literallayout>

</sect1>

</chapter>

</book>