<?xml version="1.0"?>
<!--

   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

-->
<document>
  <properties>
    <title>Commons Compress TAR package</title>
    <author email="[email protected]">Commons Documentation Team</author>
  </properties>
  <body>
    <section name="The TAR package">

      <p>In addition to the information stored
      in <code>ArchiveEntry</code>, a <code>TarArchiveEntry</code>
      stores various attributes, including information about the
      original owner and permissions.</p>

      <p>There are several different dialects of the TAR format, maybe
      even different TAR formats. The tar package contains special
      cases in order to read many of the existing dialects and will by
      default try to create archives in the original format (often
      called "ustar"). This original format didn't support file names
      longer than 100 characters or entries bigger than 8 GiB, and the tar
      package will by default fail if you try to write an entry that
      goes beyond those limits.
      "ustar" is the common denominator of
      all the existing tar dialects and is understood by most of the
      existing tools.</p>

      <p>The tar package does not support the full POSIX tar standard
      nor the more modern GNU extensions of said standard.</p>

      <subsection name="Long File Names">

        <p>The <code>longFileMode</code> option of
        <code>TarArchiveOutputStream</code> controls how files with
        names longer than 100 characters are handled. The possible
        choices are:</p>

        <ul>
          <li><code>LONGFILE_ERROR</code>: throw an exception if such a
          file is added. This is the default.</li>
          <li><code>LONGFILE_TRUNCATE</code>: truncate such names.</li>
          <li><code>LONGFILE_GNU</code>: use a GNU tar variant, now
          referred to as "oldgnu", of storing such names.
          If you choose
          the GNU tar option, the archive cannot be extracted using
          many other tar implementations like the ones of OpenBSD,
          Solaris or Mac OS X.</li>
          <li><code>LONGFILE_POSIX</code>: use a PAX <a
          href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
          header</a> as defined by POSIX 1003.1. Most modern tar
          implementations are able to extract such archives. <em>since
          Commons Compress 1.4</em></li>
        </ul>

        <p><code>TarArchiveInputStream</code> recognizes the GNU
        tar as well as the POSIX extensions (starting with Commons
        Compress 1.2) for long file names and reads the longer names
        transparently.</p>
      </subsection>

      <subsection name="Big Numeric Values">

        <p>The <code>bigNumberMode</code> option of
        <code>TarArchiveOutputStream</code> controls how files larger
        than 8 GiB or with other big numeric values that can't be
        encoded in traditional header fields are handled.
        The
        possible choices are:</p>

        <ul>
          <li><code>BIGNUMBER_ERROR</code>: throw an exception if such an
          entry is added. This is the default.</li>
          <li><code>BIGNUMBER_STAR</code>: use a variant first
          introduced by Jörg Schilling's <a
          href="http://developer.berlios.de/projects/star">star</a>
          and later adopted by GNU and BSD tar. This method is not
          supported by all implementations.</li>
          <li><code>BIGNUMBER_POSIX</code>: use a PAX <a
          href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
          header</a> as defined by POSIX 1003.1.
          Most modern tar
          implementations are able to extract such archives.</li>
        </ul>

        <p>Starting with Commons Compress 1.4,
        <code>TarArchiveInputStream</code> recognizes the star as
        well as the POSIX extensions for big numeric values and reads them
        transparently.</p>
      </subsection>

      <subsection name="File Name Encoding">
        <p>The original ustar format only supports 7-bit ASCII file
        names; later implementations use the platform's default
        encoding to encode file names. The POSIX standard recommends
        using PAX extension headers for non-ASCII file names
        instead.</p>

        <p>Commons Compress 1.1 to 1.3 assumed file names would be
        encoded using ISO-8859-1.
        Starting with Commons Compress 1.4
        you can specify the encoding to expect (to use when writing)
        as a parameter to <code>TarArchiveInputStream</code>
        (<code>TarArchiveOutputStream</code>); the encoding now defaults to the
        platform's default encoding.</p>

        <p>Since Commons Compress 1.4 another optional parameter -
        <code>addPaxHeadersForNonAsciiNames</code> - of
        <code>TarArchiveOutputStream</code> controls whether PAX
        extension headers will be written for non-ASCII file names.
        By default they are not written, in order to save space.
        <code>TarArchiveInputStream</code> will read them
        transparently if present.</p>
      </subsection>

      <subsection name="Sparse files">

        <p><code>TarArchiveInputStream</code> will recognize sparse
        file entries stored using the "oldgnu" format
        (<code>--sparse-version=0.0</code> in GNU tar) but is not
        able to extract them correctly. <a href="#Unsupported Features"><code>canReadEntryData</code></a> will return false
        on such entries.
        The other variants of sparse files can
        currently not be detected at all.</p>
      </subsection>

      <subsection name="Consuming Archives Completely">

        <p>The end of a tar archive is signalled by two consecutive
        records of all zeros. Unfortunately not all tar
        implementations adhere to this and some only write one record
        to end the archive. Commons Compress will always write two
        records but stop reading an archive as soon as it finds one
        record of all zeros.</p>

        <p>Prior to version 1.5 this could leave the second EOF record
        inside the stream when <code>getNextEntry</code> or
        <code>getNextTarEntry</code> returned <code>null</code>.
        Starting with version 1.5 <code>TarArchiveInputStream</code>
        will try to read a second record as well if present,
        effectively consuming the archive completely.</p>

      </subsection>

      <subsection name="PAX Extended Header">
        <p>The tar package has supported reading PAX extended headers
        since 1.3 for
        local headers and 1.11 for global headers. The
        following entries of PAX headers are applied when reading:</p>

        <dl>
          <dt>path</dt>
          <dd>set the entry's name</dd>

          <dt>linkpath</dt>
          <dd>set the entry's link name</dd>

          <dt>gid</dt>
          <dd>set the entry's group id</dd>

          <dt>gname</dt>
          <dd>set the entry's group name</dd>

          <dt>uid</dt>
          <dd>set the entry's user id</dd>

          <dt>uname</dt>
          <dd>set the entry's user name</dd>

          <dt>size</dt>
          <dd>set the entry's size</dd>

          <dt>mtime</dt>
          <dd>set the entry's modification time</dd>

          <dt>SCHILY.devminor</dt>
          <dd>set the entry's minor device number</dd>

          <dt>SCHILY.devmajor</dt>
          <dd>set the entry's major device number</dd>
        </dl>

        <p>In addition, some fields used by GNU tar and star to
        signal sparse entries are supported and are used for the
        <code>is*GNUSparse</code> and <code>isStarSparse</code>
        methods.</p>

        <p>Some PAX extra headers may be set when writing archives,
        for example for non-ASCII names or big numeric values. This
        depends on various settings of the output stream - see the
        previous sections.</p>

        <p>Since 1.15 you can directly access all PAX extension
        headers that have been found when reading an entry or specify
        extra headers to be written to a (local) PAX extended header
        entry.</p>

        <p>Some hints if you try to set extended headers:</p>

        <ul>
          <li>pax header keywords should be ASCII. star/gnutar
          (SCHILY.xattr.*) do not check for this.
          libarchive/bsdtar
          (LIBARCHIVE.xattr.*) uses URL encoding.</li>
          <li>pax header values should be encoded as UTF-8 characters
          (including trailing <code>\0</code>). star/gnutar
          (SCHILY.xattr.*) do not check for this. libarchive/bsdtar
          (LIBARCHIVE.xattr.*) encodes values using Base64.</li>
          <li>libarchive/bsdtar will read SCHILY.xattr headers, but
          will not generate them.</li>
          <li>gnutar will complain about LIBARCHIVE.xattr (and any
          other unknown) headers and will neither encode nor decode
          them.</li>
        </ul>
      </subsection>

    </section>
  </body>
</document>