<?xml version="1.0"?>
<!--

   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

-->
<document>
  <properties>
    <title>Commons Compress TAR package</title>
    <author email="[email protected]">Commons Documentation Team</author>
  </properties>
  <body>
    <section name="The TAR package">

      <p>In addition to the information stored
      in <code>ArchiveEntry</code>, a <code>TarArchiveEntry</code>
      stores various attributes including information about the
      original owner and permissions.</p>

      <p>There are several different dialects of the TAR format, maybe
      even different TAR formats. The tar package contains special
      cases in order to read many of the existing dialects and will by
      default try to create archives in the original format (often
      called "ustar"). This original format didn't support file names
      longer than 100 characters or entries bigger than 8 GiB, and the tar
      package will by default fail if you try to write an entry that
      goes beyond those limits. "ustar" is the common denominator of
      all the existing tar dialects and is understood by most of the
      existing tools.</p>
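
      <p>The size limit follows from a fixed-width header field. As a
      minimal sketch in plain Java (it does not use Commons Compress,
      and the class and method names are hypothetical), assuming the
      classic header stores the entry size as eleven octal digits, the
      largest representable size is one byte short of 8 GiB:</p>

      <source><![CDATA[
```java
// Hypothetical helper illustrating where the 8 GiB limit comes from.
class UstarLimits {
    // Assumption: the size field holds 11 octal digits plus a terminator.
    static final int SIZE_FIELD_DIGITS = 11;

    // Largest value that fits into the given number of octal digits
    // (each octal digit carries three bits).
    static long maxOctalValue(int digits) {
        return (1L << (3 * digits)) - 1;
    }

    // True if the size can be stored in the traditional octal size field.
    static boolean fitsSizeField(long size) {
        return size >= 0 && size <= maxOctalValue(SIZE_FIELD_DIGITS);
    }

    public static void main(String[] args) {
        long max = maxOctalValue(SIZE_FIELD_DIGITS);
        System.out.println("largest size: " + max + " bytes");
        System.out.println("8 GiB fits: " + fitsSizeField(8L * 1024 * 1024 * 1024));
    }
}
```
]]></source>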

      <p>The tar package supports neither the full POSIX tar standard
      nor the more modern GNU extensions of said standard.</p>

      <subsection name="Long File Names">

        <p>The <code>longFileMode</code> option of
        <code>TarArchiveOutputStream</code> controls how files with
        names longer than 100 characters are handled.  The possible
        choices are:</p>

        <ul>
          <li><code>LONGFILE_ERROR</code>: throw an exception if such a
          file is added.  This is the default.</li>
          <li><code>LONGFILE_TRUNCATE</code>: truncate such names.</li>
          <li><code>LONGFILE_GNU</code>: use the GNU tar variant, now
          referred to as "oldgnu", for storing such names.  If you choose
          the GNU tar option, the archive cannot be extracted by
          many other tar implementations, such as those of OpenBSD,
          Solaris or Mac OS X.</li>
          <li><code>LONGFILE_POSIX</code>: use a PAX <a
          href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
          header</a> as defined by POSIX 1003.1.  Most modern tar
          implementations are able to extract such archives. <em>since
          Commons Compress 1.4</em></li>
        </ul>

        <p><code>TarArchiveInputStream</code> recognizes the GNU
        tar as well as the POSIX extensions (starting with Commons
        Compress 1.2) for long file names and reads the longer names
        transparently.</p>
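
        <p>Whether a name counts as long depends on its encoded form.
        The following plain-Java sketch uses hypothetical class and
        method names and no Commons Compress calls. It assumes that the
        100-character limit is measured in bytes of the chosen
        encoding, because the header's name field has a fixed width;
        the exact boundary check inside the library may differ.</p>

        <source><![CDATA[
```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decides whether a name exceeds the classic name field.
class LongNameCheck {
    // Assumption: the classic header has a 100-byte name field.
    static final int NAME_FIELD_BYTES = 100;

    // True if the encoded name is longer than the name field.
    static boolean needsLongNameHandling(String name, Charset encoding) {
        return name.getBytes(encoding).length > NAME_FIELD_BYTES;
    }

    public static void main(String[] args) {
        // Build a name of 99 ASCII characters followed by one non-ASCII character.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 99; i++) {
            sb.append('x');
        }
        String name = sb.append('\u00f6').toString();

        // 100 characters, 100 bytes in ISO-8859-1, but 101 bytes in UTF-8.
        System.out.println("characters: " + name.length());
        System.out.println("long in ISO-8859-1: "
                + needsLongNameHandling(name, StandardCharsets.ISO_8859_1));
        System.out.println("long in UTF-8: "
                + needsLongNameHandling(name, StandardCharsets.UTF_8));
    }
}
```
]]></source>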
      </subsection>

      <subsection name="Big Numeric Values">

        <p>The <code>bigNumberMode</code> option of
        <code>TarArchiveOutputStream</code> controls how files larger
        than 8 GiB or with other big numeric values that can't be
        encoded in traditional header fields are handled.  The
        possible choices are:</p>

        <ul>
          <li><code>BIGNUMBER_ERROR</code>: throw an exception if such an
          entry is added.  This is the default.</li>
          <li><code>BIGNUMBER_STAR</code>: use a variant first
          introduced by J&#xf6;rg Schilling's <a
          href="http://developer.berlios.de/projects/star">star</a>
          and later adopted by GNU and BSD tar.  This method is not
          supported by all implementations.</li>
          <li><code>BIGNUMBER_POSIX</code>: use a PAX <a
          href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
          header</a> as defined by POSIX 1003.1.  Most modern tar
          implementations are able to extract such archives.</li>
        </ul>

        <p>Starting with Commons Compress 1.4,
        <code>TarArchiveInputStream</code> recognizes the star as
        well as the POSIX extensions for big numeric values and reads them
        transparently.</p>
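
        <p>For illustration, the star variant is commonly described as
        a base-256 (binary) encoding: the first byte of the numeric
        field has its high bit set and the remaining bytes hold the
        value in big-endian order.  The plain-Java sketch below assumes
        that layout and a 12-byte size field.  The class and method
        names are hypothetical, negative values are not handled, and
        the code is not taken from Commons Compress.</p>

        <source><![CDATA[
```java
// Hypothetical sketch of a base-256 numeric header field.
class Base256Field {
    // Encodes a non-negative value into a field of the given length.
    static byte[] encode(long value, int length) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        byte[] field = new byte[length];
        long rest = value;
        // Fill the field from the last byte towards the front, big-endian.
        for (int i = length - 1; i > 0; i--) {
            field[i] = (byte) (rest & 0xff);
            rest >>>= 8;
        }
        if (rest != 0) {
            throw new IllegalArgumentException("value does not fit into " + length + " bytes");
        }
        field[0] = (byte) 0x80; // marker bit: binary value instead of octal digits
        return field;
    }

    // Decodes a field written by encode().
    static long decode(byte[] field) {
        if ((field[0] & 0x80) == 0) {
            throw new IllegalArgumentException("marker bit missing");
        }
        long value = field[0] & 0x7f;
        for (int i = 1; i < field.length; i++) {
            value = (value << 8) | (field[i] & 0xff);
        }
        return value;
    }

    public static void main(String[] args) {
        long eightGiB = 8L * 1024 * 1024 * 1024;
        byte[] field = encode(eightGiB, 12);
        System.out.println("round trip ok: " + (decode(field) == eightGiB));
    }
}
```
]]></source>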
      </subsection>

      <subsection name="File Name Encoding">
        <p>The original ustar format only supports 7-bit ASCII file
        names; later implementations use the platform's default
        encoding to encode file names.  The POSIX standard recommends
        using PAX extension headers for non-ASCII file names
        instead.</p>

        <p>Commons Compress 1.1 to 1.3 assumed file names would be
        encoded using ISO-8859-1.  Starting with Commons Compress 1.4
        you can specify the encoding to expect when reading as a
        parameter to <code>TarArchiveInputStream</code>, or the
        encoding to use when writing as a parameter to
        <code>TarArchiveOutputStream</code>.  Both now default to the
        platform's default encoding.</p>

        <p>Since Commons Compress 1.4 another optional parameter -
        <code>addPaxHeadersForNonAsciiNames</code> - of
        <code>TarArchiveOutputStream</code> controls whether PAX
        extension headers will be written for non-ASCII file names.
        By default they are not written, in order to preserve space.
        <code>TarArchiveInputStream</code> will read them
        transparently if present.</p>
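
        <p>To see why the encoding matters, the following plain-Java
        sketch (hypothetical class and method names, no Commons
        Compress calls) checks whether a name is pure ASCII and shows
        that the same name occupies a different number of bytes in
        ISO-8859-1 and in UTF-8.  A reader that assumes the wrong
        encoding would therefore decode a different name.</p>

        <source><![CDATA[
```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting how a file name would be encoded.
class NameEncoding {
    // True if every character of the name can be encoded as 7-bit ASCII.
    static boolean isAscii(String name) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        String name = "J\u00f6rg.txt"; // contains one non-ASCII character
        System.out.println("ASCII only: " + isAscii(name));
        System.out.println("ISO-8859-1 bytes: "
                + name.getBytes(StandardCharsets.ISO_8859_1).length);
        System.out.println("UTF-8 bytes: "
                + name.getBytes(StandardCharsets.UTF_8).length);
    }
}
```
]]></source>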
      </subsection>

      <subsection name="Sparse files">

        <p><code>TarArchiveInputStream</code> will recognize sparse
        file entries stored using the "oldgnu" format
        (<code>-&#x2d;sparse-version=0.0</code> in GNU tar) but is not
        able to extract them correctly.
        <a href="#Unsupported Features"><code>canReadEntryData</code></a>
        will return false on such entries.  The other variants of sparse
        files cannot currently be detected at all.</p>
      </subsection>

      <subsection name="Consuming Archives Completely">

        <p>The end of a tar archive is signalled by two consecutive
        records of all zeros.  Unfortunately not all tar
        implementations adhere to this and some only write one record
        to end the archive.  Commons Compress will always write two
        records but stop reading an archive as soon as it finds one
        record of all zeros.</p>

        <p>Prior to version 1.5 this could leave the second EOF record
        inside the stream when <code>getNextEntry</code> or
        <code>getNextTarEntry</code> returned <code>null</code>.
        Starting with version 1.5 <code>TarArchiveInputStream</code>
        will try to read a second record as well if present,
        effectively consuming the archive completely.</p>
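
        <p>The end-of-archive marker can be illustrated without the
        library.  This sketch assumes 512-byte records and uses
        hypothetical names; it finds the first all-zero record in an
        in-memory archive image and then checks whether a second
        all-zero record follows it, which is the record that could be
        left unread.</p>

        <source><![CDATA[
```java
// Hypothetical sketch: locating the end-of-archive records in a byte array.
class EofRecords {
    static final int RECORD_SIZE = 512; // assumption: standard tar record size

    // True if a complete record exists at the given index and is all zeros.
    static boolean isZeroRecord(byte[] data, int record) {
        int start = record * RECORD_SIZE;
        if (record < 0 || start + RECORD_SIZE > data.length) {
            return false; // no complete record at this index
        }
        for (int i = start; i < start + RECORD_SIZE; i++) {
            if (data[i] != 0) {
                return false;
            }
        }
        return true;
    }

    // Index of the first all-zero record, or -1 if there is none.
    static int firstZeroRecord(byte[] data) {
        for (int record = 0; (record + 1) * RECORD_SIZE <= data.length; record++) {
            if (isZeroRecord(data, record)) {
                return record;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // One non-empty record followed by two EOF records.
        byte[] image = new byte[3 * RECORD_SIZE];
        image[0] = 'f';
        int eof = firstZeroRecord(image);
        System.out.println("first EOF record: " + eof);
        System.out.println("second EOF record present: " + isZeroRecord(image, eof + 1));
    }
}
```
]]></source>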

      </subsection>

      <subsection name="PAX Extended Header">
        <p>The tar package has supported reading PAX extended headers
        since 1.3 for local headers and 1.11 for global headers. The
        following entries of PAX headers are applied when reading:</p>

        <dl>
          <dt>path</dt>
          <dd>set the entry's name</dd>

          <dt>linkpath</dt>
          <dd>set the entry's link name</dd>

          <dt>gid</dt>
          <dd>set the entry's group id</dd>

          <dt>gname</dt>
          <dd>set the entry's group name</dd>

          <dt>uid</dt>
          <dd>set the entry's user id</dd>

          <dt>uname</dt>
          <dd>set the entry's user name</dd>

          <dt>size</dt>
          <dd>set the entry's size</dd>

          <dt>mtime</dt>
          <dd>set the entry's modification time</dd>

          <dt>SCHILY.devminor</dt>
          <dd>set the entry's minor device number</dd>

          <dt>SCHILY.devmajor</dt>
          <dd>set the entry's major device number</dd>
        </dl>

        <p>In addition, some fields that GNU tar and star use to
        signal sparse entries are supported; they are used by the
        <code>is*GNUSparse</code> and <code>isStarSparse</code>
        methods.</p>
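
        <p>As background, the body of a PAX extended header is a
        sequence of records of the form <code>length
        keyword=value</code> followed by a newline, where
        <code>length</code> is the decimal byte count of the whole
        record including the length digits themselves.  The sketch
        below parses such a body into a map.  It is plain Java with
        hypothetical names, assumes well-formed UTF-8 input and is not
        the parser used by Commons Compress.</p>

        <source><![CDATA[
```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parsing the records of a PAX extended header body.
class PaxRecordParser {
    // Parses "length keyword=value\n" records; assumes well-formed input.
    static Map<String, String> parse(byte[] body) {
        Map<String, String> headers = new LinkedHashMap<>();
        int pos = 0;
        while (pos < body.length) {
            // The decimal length runs up to the first blank.
            int space = pos;
            while (body[space] != ' ') {
                space++;
            }
            int digits = space - pos;
            int length = Integer.parseInt(
                    new String(body, pos, digits, StandardCharsets.US_ASCII));
            // Content is the record minus digits, blank and trailing newline.
            String content = new String(body, space + 1, length - digits - 2,
                    StandardCharsets.UTF_8);
            int eq = content.indexOf('=');
            headers.put(content.substring(0, eq), content.substring(eq + 1));
            pos += length;
        }
        return headers;
    }

    public static void main(String[] args) {
        byte[] body = ("12 path=foo\n" + "30 mtime=1350244992.023960108\n")
                .getBytes(StandardCharsets.UTF_8);
        for (Map.Entry<String, String> e : parse(body).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```
]]></source>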

        <p>Some PAX extra headers may be set when writing archives,
        for example for non-ASCII names or big numeric values. This
        depends on various settings of the output stream - see the
        previous sections.</p>

        <p>Since 1.15 you can directly access all PAX extension
        headers that have been found when reading an entry or specify
        extra headers to be written to a (local) PAX extended header
        entry.</p>
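
        <p>When extra headers are written, each one becomes a record
        whose length prefix counts its own digits.  The following
        plain-Java sketch shows one way to compute that prefix; the
        class and method names are hypothetical and the code is an
        illustration of the record format, not the library's
        implementation.</p>

        <source><![CDATA[
```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: building a single PAX extended header record.
class PaxRecordWriter {
    // Returns the bytes of "length keyword=value\n" with a correct length.
    static byte[] record(String keyword, String value) {
        byte[] payload = (" " + keyword + "=" + value + "\n")
                .getBytes(StandardCharsets.UTF_8);
        // The length includes its own digits, so iterate until it is stable.
        int length = payload.length + 1;
        while (String.valueOf(length).length() + payload.length != length) {
            length = String.valueOf(length).length() + payload.length;
        }
        byte[] prefix = String.valueOf(length).getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[length];
        System.arraycopy(prefix, 0, out, 0, prefix.length);
        System.arraycopy(payload, 0, out, prefix.length, payload.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] rec = record("path", "foo");
        System.out.print(new String(rec, StandardCharsets.UTF_8));
        System.out.println("record length: " + rec.length);
    }
}
```
]]></source>

        <p>Because the length is counted in bytes, a non-ASCII value
        makes the record longer than its character count suggests.</p>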

        <p>Some hints if you try to set extended headers:</p>

        <ul>
          <li>PAX header keywords should be ASCII.  star/gnutar
          (SCHILY.xattr.*) do not check for this.  libarchive/bsdtar
          (LIBARCHIVE.xattr.*) uses URL-Encoding.</li>
          <li>PAX header values should be encoded as UTF-8 characters
          (including trailing <code>\0</code>).  star/gnutar
          (SCHILY.xattr.*) do not check for this.  libarchive/bsdtar
          (LIBARCHIVE.xattr.*) encode values using Base64.</li>
          <li>libarchive/bsdtar will read SCHILY.xattr headers, but
          will not generate them.</li>
          <li>gnutar will complain about LIBARCHIVE.xattr (and any
          other unknown) headers and will neither encode nor decode
          them.</li>
        </ul>
      </subsection>

    </section>
  </body>
</document>