1:mod:`tarfile` --- Read and write tar archive files
2===================================================
3
4.. module:: tarfile
5   :synopsis: Read and write tar-format archive files.
6
7.. moduleauthor:: Lars Gustäbel <[email protected]>
8.. sectionauthor:: Lars Gustäbel <[email protected]>
9
10**Source code:** :source:`Lib/tarfile.py`
11
12--------------
13
14The :mod:`tarfile` module makes it possible to read and write tar
15archives, including those using gzip, bz2 and lzma compression.
16Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
17higher-level functions in :ref:`shutil <archiving-operations>`.
18
19Some facts and figures:
20
21* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
22  if the respective modules are available.
23
24* read/write support for the POSIX.1-1988 (ustar) format.
25
26* read/write support for the GNU tar format including *longname* and *longlink*
27  extensions, read-only support for all variants of the *sparse* extension
28  including restoration of sparse files.
29
30* read/write support for the POSIX.1-2001 (pax) format.
31
32* handles directories, regular files, hardlinks, symbolic links, fifos,
33  character devices and block devices and is able to acquire and restore file
34  information like timestamp, access permissions and owner.
35
36.. versionchanged:: 3.3
37   Added support for :mod:`lzma` compression.
38
39
40.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
41
42   Return a :class:`TarFile` object for the pathname *name*. For detailed
43   information on :class:`TarFile` objects and the keyword arguments that are
44   allowed, see :ref:`tarfile-objects`.
45
   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:
48
49   +------------------+---------------------------------------------+
50   | mode             | action                                      |
51   +==================+=============================================+
52   | ``'r' or 'r:*'`` | Open for reading with transparent           |
53   |                  | compression (recommended).                  |
54   +------------------+---------------------------------------------+
55   | ``'r:'``         | Open for reading exclusively without        |
56   |                  | compression.                                |
57   +------------------+---------------------------------------------+
58   | ``'r:gz'``       | Open for reading with gzip compression.     |
59   +------------------+---------------------------------------------+
60   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
61   +------------------+---------------------------------------------+
62   | ``'r:xz'``       | Open for reading with lzma compression.     |
63   +------------------+---------------------------------------------+
64   | ``'x'`` or       | Create a tarfile exclusively without        |
65   | ``'x:'``         | compression.                                |
66   |                  | Raise a :exc:`FileExistsError` exception    |
67   |                  | if it already exists.                       |
68   +------------------+---------------------------------------------+
69   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
70   |                  | Raise a :exc:`FileExistsError` exception    |
71   |                  | if it already exists.                       |
72   +------------------+---------------------------------------------+
73   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
74   |                  | Raise a :exc:`FileExistsError` exception    |
75   |                  | if it already exists.                       |
76   +------------------+---------------------------------------------+
77   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
78   |                  | Raise a :exc:`FileExistsError` exception    |
79   |                  | if it already exists.                       |
80   +------------------+---------------------------------------------+
81   | ``'a' or 'a:'``  | Open for appending with no compression. The |
82   |                  | file is created if it does not exist.       |
83   +------------------+---------------------------------------------+
84   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
85   +------------------+---------------------------------------------+
86   | ``'w:gz'``       | Open for gzip compressed writing.           |
87   +------------------+---------------------------------------------+
88   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
89   +------------------+---------------------------------------------+
90   | ``'w:xz'``       | Open for lzma compressed writing.           |
91   +------------------+---------------------------------------------+
92
   Note that appending to a compressed archive (``'a:gz'``, ``'a:bz2'`` or
   ``'a:xz'``) is not possible. If *mode* is not suitable to open a certain
   (compressed) file for reading,
95   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this.  If a
96   compression method is not supported, :exc:`CompressionError` is raised.
97
98   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
99   opened in binary mode for *name*. It is supposed to be at position 0.
100
101   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
102   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
103   *compresslevel* (default ``9``) to specify the compression level of the file.
104
105   For modes ``'w:xz'`` and ``'x:xz'``, :func:`tarfile.open` accepts the
106   keyword argument *preset* to specify the compression level of the file.
107
108   For special purposes, there is a second format for *mode*:
109   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
110   object that processes its data as a stream of blocks.  No random seeking will
111   be done on the file. If given, *fileobj* may be any object that has a
112   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
113   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
114   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
115   device. However, such a :class:`TarFile` object is limited in that it does
116   not allow random access, see :ref:`tar-examples`.  The currently
117   possible modes:
118
119   +-------------+--------------------------------------------+
120   | Mode        | Action                                     |
121   +=============+============================================+
122   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
123   |             | with transparent compression.              |
124   +-------------+--------------------------------------------+
125   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
126   |             | for reading.                               |
127   +-------------+--------------------------------------------+
128   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
129   |             | reading.                                   |
130   +-------------+--------------------------------------------+
131   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
132   |             | reading.                                   |
133   +-------------+--------------------------------------------+
134   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
135   |             | reading.                                   |
136   +-------------+--------------------------------------------+
137   | ``'w|'``    | Open an uncompressed *stream* for writing. |
138   +-------------+--------------------------------------------+
139   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
140   |             | writing.                                   |
141   +-------------+--------------------------------------------+
142   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
143   |             | writing.                                   |
144   +-------------+--------------------------------------------+
145   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
146   |             | writing.                                   |
147   +-------------+--------------------------------------------+
148
149   .. versionchanged:: 3.5
150      The ``'x'`` (exclusive creation) mode was added.
151
152   .. versionchanged:: 3.6
153      The *name* parameter accepts a :term:`path-like object`.
154
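The mode table for :func:`tarfile.open` can be illustrated with a short round trip. The following is a minimal sketch, not a canonical recipe: the file names ``hello.txt`` and ``demo.tar.gz`` and the helper ``roundtrip_gzip_archive`` are made up for illustration, and the example assumes the :mod:`gzip` module is available.

```python
import os
import tarfile
import tempfile


def roundtrip_gzip_archive():
    """Write a small gzip-compressed archive, then read it back."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "hello.txt")
        with open(source, "w", encoding="utf-8") as f:
            f.write("hello, tar\n")

        archive = os.path.join(workdir, "demo.tar.gz")

        # 'w:gz' creates (or overwrites) a gzip-compressed archive.
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(source, arcname="hello.txt")

        # 'x:gz' refuses to replace the archive that now exists.
        try:
            tarfile.open(archive, "x:gz")
            exclusive_failed = False
        except FileExistsError:
            exclusive_failed = True

        # Plain 'r' detects the compression transparently.
        with tarfile.open(archive, "r") as tar:
            names = tar.getnames()

    return names, exclusive_failed


names, exclusive_failed = roundtrip_gzip_archive()
print(names, exclusive_failed)
```

Reading with ``'r'`` rather than ``'r:gz'`` keeps the reading code independent of the compression chosen when the archive was written.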
155
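The stream variant of *mode* (``'filemode|[compression]'``) can be sketched with an in-memory buffer standing in for a socket or pipe. The member name ``data.bin`` and the helper ``stream_roundtrip`` are illustrative assumptions; the point is that members are consumed during iteration because random access is not available.

```python
import io
import tarfile


def stream_roundtrip(payload: bytes) -> dict:
    """Write one member to a gzip stream and read it back sequentially."""
    buffer = io.BytesIO()

    # 'w|gz' writes a gzip-compressed stream; the target only needs write().
    with tarfile.open(fileobj=buffer, mode="w|gz") as tar:
        info = tarfile.TarInfo(name="data.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    buffer.seek(0)
    contents = {}

    # 'r|*' reads the stream strictly front to back, so each member is
    # read while iterating instead of being looked up afterwards.
    with tarfile.open(fileobj=buffer, mode="r|*") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is not None:
                contents[member.name] = extracted.read()
    return contents


contents = stream_roundtrip(b"streamed bytes")
print(contents)
```
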
156.. class:: TarFile
157   :noindex:
158
159   Class for reading and writing tar archives. Do not use this class directly:
160   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.
161
162
163.. function:: is_tarfile(name)
164
   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read. *name* may be a :class:`str`, file, or file-like object.
167
168   .. versionchanged:: 3.9
169      Support for file and file-like objects.
170
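As a rough illustration of :func:`is_tarfile`, the sketch below checks one genuine archive and one file that merely carries a ``.tar`` suffix. Both file names and the helper ``check_candidates`` are hypothetical, and only the :class:`str` form of *name* is used so that the example does not depend on file-object support.

```python
import os
import tarfile
import tempfile


def check_candidates():
    """Return is_tarfile() results for a real archive and an impostor."""
    with tempfile.TemporaryDirectory() as workdir:
        real = os.path.join(workdir, "real.tar")
        fake = os.path.join(workdir, "fake.tar")

        # A genuine archive holding a single empty member.
        with tarfile.open(real, "w") as tar:
            tar.addfile(tarfile.TarInfo("empty.txt"))

        # The suffix alone does not make this a tar archive.
        with open(fake, "wb") as f:
            f.write(b"this is not a tar archive")

        return tarfile.is_tarfile(real), tarfile.is_tarfile(fake)


real_result, fake_result = check_candidates()
print(real_result, fake_result)
```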
171
172The :mod:`tarfile` module defines the following exceptions:
173
174
175.. exception:: TarError
176
177   Base class for all :mod:`tarfile` exceptions.
178
179
180.. exception:: ReadError
181
   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.
184
185
186.. exception:: CompressionError
187
188   Is raised when a compression method is not supported or when the data cannot be
189   decoded properly.
190
191
192.. exception:: StreamError
193
194   Is raised for the limitations that are typical for stream-like :class:`TarFile`
195   objects.
196
197
198.. exception:: ExtractError
199
200   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
201   :attr:`TarFile.errorlevel`\ ``== 2``.
202
203
204.. exception:: HeaderError
205
206   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
207
208
209.. exception:: FilterError
210
211   Base class for members :ref:`refused <tarfile-extraction-refuse>` by
212   filters.
213
214   .. attribute:: tarinfo
215
216      Information about the member that the filter refused to extract,
217      as :ref:`TarInfo <tarinfo-objects>`.
218
219.. exception:: AbsolutePathError
220
221   Raised to refuse extracting a member with an absolute path.
222
223.. exception:: OutsideDestinationError
224
225   Raised to refuse extracting a member outside the destination directory.
226
227.. exception:: SpecialFileError
228
229   Raised to refuse extracting a special file (e.g. a device or pipe).
230
231.. exception:: AbsoluteLinkError
232
233   Raised to refuse extracting a symbolic link with an absolute path.
234
235.. exception:: LinkOutsideDestinationError
236
237   Raised to refuse extracting a symbolic link pointing outside the destination
238   directory.
239
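The exception hierarchy above can be exercised with a small sketch. The helper names are hypothetical. The second helper calls the ``data`` filter function directly instead of extracting anything, and it looks the function up defensively because extraction filters are not present in every Python release.

```python
import io
import tarfile
import tempfile


def describe_open_failure(data: bytes) -> str:
    """Try to open *data* as an uncompressed archive and report the result."""
    try:
        # 'r:' insists on an uncompressed tar archive.
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:"):
            return "ok"
    except tarfile.ReadError as exc:      # most specific class first
        return f"ReadError: {exc}"
    except tarfile.TarError as exc:       # base class of all tarfile errors
        return f"TarError: {exc}"


def describe_filter_rejection(member_name: str) -> str:
    """Ask the 'data' filter about a member without extracting anything."""
    data_filter = getattr(tarfile, "data_filter", None)
    if data_filter is None:
        return "extraction filters unavailable"
    try:
        data_filter(tarfile.TarInfo(member_name), tempfile.gettempdir())
        return "accepted"
    except tarfile.FilterError as exc:
        return type(exc).__name__


outcome = describe_open_failure(b"definitely not tar data")
rejection = describe_filter_rejection("../escape.txt")
print(outcome)
print(rejection)
```

Catching :exc:`TarError` alone would also handle the first case; listing :exc:`ReadError` first simply allows a more specific message.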
240
241The following constants are available at the module level:
242
243.. data:: ENCODING
244
245   The default character encoding: ``'utf-8'`` on Windows, the value returned by
246   :func:`sys.getfilesystemencoding` otherwise.
247
248
249Each of the following constants defines a tar archive format that the
250:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
251details.
252
253
254.. data:: USTAR_FORMAT
255
256   POSIX.1-1988 (ustar) format.
257
258
259.. data:: GNU_FORMAT
260
261   GNU tar format.
262
263
264.. data:: PAX_FORMAT
265
266   POSIX.1-2001 (pax) format.
267
268
269.. data:: DEFAULT_FORMAT
270
271   The default format for creating archives. This is currently :const:`PAX_FORMAT`.
272
273   .. versionchanged:: 3.8
274      The default format for new archives was changed to
275      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.
276
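One practical difference between the format constants is how they cope with long member names. The sketch below is an illustration under assumptions: ``LONG_NAME`` and ``try_format`` are invented for the example, and the name is a single path component of more than 100 characters, which the ustar header cannot store.

```python
import io
import tarfile

# One path component longer than the 100-character ustar name field.
LONG_NAME = "x" * 150 + ".txt"


def try_format(fmt):
    """Store LONG_NAME using *fmt* and return the name read back."""
    buffer = io.BytesIO()
    try:
        with tarfile.open(fileobj=buffer, mode="w", format=fmt) as tar:
            tar.addfile(tarfile.TarInfo(LONG_NAME))
    except ValueError as exc:
        return f"rejected: {exc}"

    buffer.seek(0)
    # The format is detected automatically when reading.
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        return tar.getnames()[0]


results = {
    "ustar": try_format(tarfile.USTAR_FORMAT),
    "gnu": try_format(tarfile.GNU_FORMAT),
    "pax": try_format(tarfile.PAX_FORMAT),
}
for label, result in results.items():
    print(label, result[:40])
```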
277
278.. seealso::
279
280   Module :mod:`zipfile`
281      Documentation of the :mod:`zipfile` standard module.
282
283   :ref:`archiving-operations`
284      Documentation of the higher-level archiving facilities provided by the
285      standard :mod:`shutil` module.
286
287   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
288      Documentation for tar archive files, including GNU tar extensions.
289
290
291.. _tarfile-objects:
292
293TarFile Objects
294---------------
295
296The :class:`TarFile` object provides an interface to a tar archive. A tar
297archive is a sequence of blocks. An archive member (a stored file) is made up of
298a header block followed by data blocks. It is possible to store a file in a tar
299archive several times. Each archive member is represented by a :class:`TarInfo`
300object, see :ref:`tarinfo-objects` for details.
301
302A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
303statement. It will automatically be closed when the block is completed. Please
304note that in the event of an exception an archive opened for writing will not
305be finalized; only the internally used file object will be closed. See the
306:ref:`tar-examples` section for a use case.
307
308.. versionadded:: 3.2
309   Added support for the context management protocol.
310
311.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=1)
312
313   All following arguments are optional and can be accessed as instance attributes
314   as well.
315
316   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
317   It can be omitted if *fileobj* is given.
318   In this case, the file object's :attr:`name` attribute is used if it exists.
319
320   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
321   data to an existing file, ``'w'`` to create a new file overwriting an existing
322   one, or ``'x'`` to create a new file only if it does not already exist.
323
324   If *fileobj* is given, it is used for reading or writing data. If it can be
325   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
326   from position 0.
327
328   .. note::
329
      *fileobj* is not closed when :class:`TarFile` is closed.
331
332   *format* controls the archive format for writing. It must be one of the constants
333   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
334   defined at module level. When reading, format will be automatically detected, even
335   if different formats are present in a single archive.
336
337   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
338   with a different one.
339
340   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
341   is :const:`True`, add the content of the target files to the archive. This has no
342   effect on systems that do not support symbolic links.
343
344   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
345   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
346   as possible. This is only useful for reading concatenated or damaged archives.
347
348   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
349   messages). The messages are written to ``sys.stderr``.
350
351   *errorlevel* controls how extraction errors are handled,
352   see :attr:`the corresponding attribute <~TarFile.errorlevel>`.
353
354   The *encoding* and *errors* arguments define the character encoding to be
355   used for reading or writing the archive and how conversion errors are going
356   to be handled. The default settings will work for most users.
357   See section :ref:`tar-unicode` for in-depth information.
358
359   The *pax_headers* argument is an optional dictionary of strings which
360   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
361
362   .. versionchanged:: 3.2
363      Use ``'surrogateescape'`` as the default for the *errors* argument.
364
365   .. versionchanged:: 3.5
366      The ``'x'`` (exclusive creation) mode was added.
367
368   .. versionchanged:: 3.6
369      The *name* parameter accepts a :term:`path-like object`.
370
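The effect of *ignore_zeros* is easiest to see with two archives concatenated back to back. This is a minimal sketch; ``make_archive`` and ``list_names`` are hypothetical helpers and everything is kept in memory.

```python
import io
import tarfile


def make_archive(name: str, payload: bytes) -> bytes:
    """Return the bytes of an uncompressed archive with one member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_names(data: bytes, ignore_zeros: bool):
    """List member names, optionally skipping over empty blocks."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=ignore_zeros) as tar:
        return tar.getnames()


# Each archive ends with zero blocks, so the second one starts after them.
combined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

strict_names = list_names(combined, ignore_zeros=False)
lenient_names = list_names(combined, ignore_zeros=True)
print(strict_names, lenient_names)
```

With the default setting, reading stops at the end-of-archive marker of the first archive; with ``ignore_zeros=True`` the empty blocks are skipped and the second archive's member is found as well.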
371
372.. classmethod:: TarFile.open(...)
373
374   Alternative constructor. The :func:`tarfile.open` function is actually a
375   shortcut to this classmethod.
376
377
378.. method:: TarFile.getmember(name)
379
   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.
382
383   .. note::
384
385      If a member occurs more than once in the archive, its last occurrence is assumed
386      to be the most up-to-date version.
387
388
389.. method:: TarFile.getmembers()
390
391   Return the members of the archive as a list of :class:`TarInfo` objects. The
392   list has the same order as the members in the archive.
393
394
395.. method:: TarFile.getnames()
396
397   Return the members as a list of their names. It has the same order as the list
398   returned by :meth:`getmembers`.
399
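The relationship between :meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and :meth:`~TarFile.getnames` when a name is stored twice can be sketched as follows. The member name ``notes.txt`` and the helper ``build_archive_with_duplicate`` are illustrative only.

```python
import io
import tarfile


def build_archive_with_duplicate() -> io.BytesIO:
    """Create an in-memory archive that stores 'notes.txt' twice."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for payload in (b"v1", b"version two"):
            info = tarfile.TarInfo("notes.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


with tarfile.open(fileobj=build_archive_with_duplicate(), mode="r") as tar:
    # Both listings keep archive order and include every occurrence.
    all_names = tar.getnames()
    sizes_in_order = [member.size for member in tar.getmembers()]

    # getmember() resolves the name to its last occurrence.
    latest_text = tar.extractfile(tar.getmember("notes.txt")).read()

    try:
        tar.getmember("missing.txt")
        missing_found = True
    except KeyError:
        missing_found = False

print(all_names, sizes_in_order, latest_text, missing_found)
```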
400
401.. method:: TarFile.list(verbose=True, *, members=None)
402
403   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
404   only the names of the members are printed. If it is :const:`True`, output
405   similar to that of :program:`ls -l` is produced. If optional *members* is
406   given, it must be a subset of the list returned by :meth:`getmembers`.
407
408   .. versionchanged:: 3.5
409      Added the *members* parameter.
410
411
412.. method:: TarFile.next()
413
   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.
417
418
419.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False, filter=None)
420
421   Extract all members from the archive to the current working directory or
422   directory *path*. If optional *members* is given, it must be a subset of the
423   list returned by :meth:`getmembers`. Directory information like owner,
424   modification time and permissions are set after all members have been extracted.
425   This is done to work around two problems: A directory's modification time is
426   reset each time a file is created in it. And, if a directory's permissions do
427   not allow writing, extracting files to it will fail.
428
429   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
430   are used to set the owner/group for the extracted files. Otherwise, the named
431   values from the tarfile are used.
432
433   The *filter* argument, which was added in Python 3.11.4, specifies how
434   ``members`` are modified or rejected before extraction.
435   See :ref:`tarfile-extraction-filter` for details.
436   It is recommended to set this explicitly depending on which *tar* features
437   you need to support.
438
439   .. warning::
440
441      Never extract archives from untrusted sources without prior inspection.
442      It is possible that files are created outside of *path*, e.g. members
443      that have absolute filenames starting with ``"/"`` or filenames with two
444      dots ``".."``.
445
446      Set ``filter='data'`` to prevent the most dangerous security issues,
447      and read the :ref:`tarfile-extraction-filter` section for details.
448
449   .. versionchanged:: 3.5
450      Added the *numeric_owner* parameter.
451
452   .. versionchanged:: 3.6
453      The *path* parameter accepts a :term:`path-like object`.
454
455   .. versionchanged:: 3.11.4
456      Added the *filter* parameter.
457
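A cautious call to :meth:`~TarFile.extractall` might look like the sketch below. The helper names and the member ``docs/readme.txt`` are assumptions made for the example. Because the *filter* argument does not exist in every Python release, the sketch passes it only when the module exposes ``data_filter``; code that targets a single known version can pass ``filter='data'`` unconditionally.

```python
import io
import os
import tarfile
import tempfile


def build_archive() -> io.BytesIO:
    """Create an in-memory archive with one nested regular file."""
    buffer = io.BytesIO()
    payload = b"read me first\n"
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo("docs/readme.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def extract_safely(archive, destination):
    """Extract everything, requesting the 'data' filter when supported."""
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=archive, mode="r") as tar:
        tar.extractall(path=destination, **kwargs)


with tempfile.TemporaryDirectory() as destination:
    extract_safely(build_archive(), destination)
    with open(os.path.join(destination, "docs", "readme.txt"), "rb") as f:
        extracted = f.read()

print(extracted)
```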
458
459.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None)
460
461   Extract a member from the archive to the current working directory, using its
462   full name. Its file information is extracted as accurately as possible. *member*
463   may be a filename or a :class:`TarInfo` object. You can specify a different
464   directory using *path*. *path* may be a :term:`path-like object`.
465   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.
466
467   The *numeric_owner* and *filter* arguments are the same as
468   for :meth:`extractall`.
469
470   .. note::
471
472      The :meth:`extract` method does not take care of several extraction issues.
473      In most cases you should consider using the :meth:`extractall` method.
474
475   .. warning::
476
477      See the warning for :meth:`extractall`.
478
479      Set ``filter='data'`` to prevent the most dangerous security issues,
480      and read the :ref:`tarfile-extraction-filter` section for details.
481
482   .. versionchanged:: 3.2
483      Added the *set_attrs* parameter.
484
485   .. versionchanged:: 3.5
486      Added the *numeric_owner* parameter.
487
488   .. versionchanged:: 3.6
489      The *path* parameter accepts a :term:`path-like object`.
490
491   .. versionchanged:: 3.11.4
492      Added the *filter* parameter.
493
494
495.. method:: TarFile.extractfile(member)
496
497   Extract a member from the archive as a file object. *member* may be
498   a filename or a :class:`TarInfo` object. If *member* is a regular file or
499   a link, an :class:`io.BufferedReader` object is returned. For all other
500   existing members, :const:`None` is returned. If *member* does not appear
501   in the archive, :exc:`KeyError` is raised.
502
503   .. versionchanged:: 3.3
504      Return an :class:`io.BufferedReader` object.
505
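The return values of :meth:`~TarFile.extractfile` described above can be illustrated without touching the file system. In this sketch the member names and the helper ``build_archive`` are hypothetical.

```python
import io
import tarfile


def build_archive() -> io.BytesIO:
    """Create an in-memory archive with a directory and a regular file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        folder = tarfile.TarInfo("config")
        folder.type = tarfile.DIRTYPE
        tar.addfile(folder)

        payload = b"debug = false\n"
        info = tarfile.TarInfo("config/app.ini")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


with tarfile.open(fileobj=build_archive(), mode="r") as tar:
    # A directory exists in the archive but has no data to read.
    directory_handle = tar.extractfile("config")

    # A regular file is returned as a readable binary file object.
    with tar.extractfile("config/app.ini") as member_file:
        text = member_file.read().decode("utf-8")

    try:
        tar.extractfile("missing.ini")
        missing_found = True
    except KeyError:
        missing_found = False

print(directory_handle, repr(text), missing_found)
```
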
506.. attribute:: TarFile.errorlevel
507   :type: int
508
509   If *errorlevel* is ``0``, errors are ignored when using :meth:`TarFile.extract`
510   and :meth:`TarFile.extractall`.
511   Nevertheless, they appear as error messages in the debug output when
512   *debug* is greater than 0.
513   If ``1`` (the default), all *fatal* errors are raised as :exc:`OSError` or
514   :exc:`FilterError` exceptions. If ``2``, all *non-fatal* errors are raised
515   as :exc:`TarError` exceptions as well.
516
517   Some exceptions, e.g. ones caused by wrong argument types or data
518   corruption, are always raised.
519
520   Custom :ref:`extraction filters <tarfile-extraction-filter>`
521   should raise :exc:`FilterError` for *fatal* errors
522   and :exc:`ExtractError` for *non-fatal* ones.
523
524   Note that when an exception is raised, the archive may be partially
525   extracted. It is the user’s responsibility to clean up.
526
527.. attribute:: TarFile.extraction_filter
528
529   .. versionadded:: 3.11.4
530
531   The :ref:`extraction filter <tarfile-extraction-filter>` used
532   as a default for the *filter* argument of :meth:`~TarFile.extract`
533   and :meth:`~TarFile.extractall`.
534
535   The attribute may be ``None`` or a callable.
536   String names are not allowed for this attribute, unlike the *filter*
537   argument to :meth:`~TarFile.extract`.
538
539   If ``extraction_filter`` is ``None`` (the default),
540   calling an extraction method without a *filter* argument will
541   use the :func:`fully_trusted <fully_trusted_filter>` filter for
542   compatibility with previous Python versions.
543
544   In Python 3.12+, leaving ``extraction_filter=None`` will emit a
545   ``DeprecationWarning``.
546
547   In Python 3.14+, leaving ``extraction_filter=None`` will cause
548   extraction methods to use the :func:`data <data_filter>` filter by default.
549
550   The attribute may be set on instances or overridden in subclasses.
   It is also possible to set it on the ``TarFile`` class itself to set a
552   global default, although, since it affects all uses of *tarfile*,
553   it is best practice to only do so in top-level applications or
554   :mod:`site configuration <site>`.
555   To set a global default this way, a filter function needs to be wrapped in
556   :func:`staticmethod()` to prevent injection of a ``self`` argument.
557
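Overriding ``extraction_filter`` in a subclass could look roughly like this. ``DataFilteredTarFile`` is a hypothetical class name, and ``data_filter`` is looked up with :func:`getattr` because the example is meant to run on releases without extraction filters too; on such releases the subclass simply behaves like :class:`TarFile`.

```python
import io
import os
import tarfile
import tempfile

# Extraction filters are absent from older Python releases, so the
# filter function is looked up defensively.
_data_filter = getattr(tarfile, "data_filter", None)


class DataFilteredTarFile(tarfile.TarFile):
    """Hypothetical subclass whose extractions default to the data filter."""

    if _data_filter is not None:
        # staticmethod() keeps the function from being bound as a method,
        # which would otherwise inject ``self`` as its first argument.
        extraction_filter = staticmethod(_data_filter)


def build_archive() -> io.BytesIO:
    """Create an in-memory archive with one regular file."""
    buffer = io.BytesIO()
    payload = b"remember the milk\n"
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo("notes.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


with tempfile.TemporaryDirectory() as destination:
    # open() is a classmethod, so it returns an instance of the subclass.
    with DataFilteredTarFile.open(fileobj=build_archive(), mode="r") as tar:
        uses_subclass = isinstance(tar, DataFilteredTarFile)
        tar.extractall(destination)  # no explicit filter argument needed
    with open(os.path.join(destination, "notes.txt"), "rb") as f:
        extracted = f.read()

print(uses_subclass, extracted)
```

Setting the attribute on a single instance (``tar.extraction_filter = tarfile.data_filter``) needs no :func:`staticmethod` wrapper, since functions stored on an instance are not bound.
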
558.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)
559
560   Add the file *name* to the archive. *name* may be any type of file
561   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
562   alternative name for the file in the archive. Directories are added
563   recursively by default. This can be avoided by setting *recursive* to
564   :const:`False`. Recursion adds entries in sorted order.
565   If *filter* is given, it
566   should be a function that takes a :class:`TarInfo` object argument and
567   returns the changed :class:`TarInfo` object. If it instead returns
568   :const:`None` the :class:`TarInfo` object will be excluded from the
569   archive. See :ref:`tar-examples` for an example.
570
571   .. versionchanged:: 3.2
572      Added the *filter* parameter.
573
574   .. versionchanged:: 3.7
575      Recursion adds entries in sorted order.
576
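The *filter* argument of :meth:`~TarFile.add` can both rewrite and exclude members. The following sketch uses made-up names throughout (``anonymize_and_skip_temp``, ``project``, ``keep.txt``, ``scratch.tmp``): it resets ownership information and drops files ending in ``.tmp``.

```python
import os
import tarfile
import tempfile


def anonymize_and_skip_temp(info):
    """Drop '*.tmp' members and reset ownership on everything else."""
    if info.name.endswith(".tmp"):
        return None  # returning None excludes the member
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info


with tempfile.TemporaryDirectory() as workdir:
    project = os.path.join(workdir, "project")
    os.mkdir(project)
    for filename in ("keep.txt", "scratch.tmp"):
        with open(os.path.join(project, filename), "w", encoding="utf-8") as f:
            f.write(filename)

    archive = os.path.join(workdir, "project.tar")
    with tarfile.open(archive, "w") as tar:
        # The directory is added recursively; the filter sees every member.
        tar.add(project, arcname="project", filter=anonymize_and_skip_temp)

    with tarfile.open(archive, "r") as tar:
        archived = [(member.name, member.uname) for member in tar.getmembers()]

print(archived)
```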
577
578.. method:: TarFile.addfile(tarinfo, fileobj=None)
579
580   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
581   it should be a :term:`binary file`, and
582   ``tarinfo.size`` bytes are read from it and added to the archive.  You can
583   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.
584
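A common use of :meth:`~TarFile.addfile` is storing data that never existed as a file on disk. The sketch below creates the :class:`TarInfo` object directly; ``add_bytes`` and ``greeting.txt`` are hypothetical names. Setting ``size`` matters because exactly that many bytes are read from *fileobj*.

```python
import io
import tarfile
import time


def add_bytes(tar, name, payload):
    """Add *payload* to *tar* as a regular file called *name*."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)        # addfile() reads exactly this many bytes
    info.mtime = int(time.time())   # otherwise the timestamp stays at 0
    tar.addfile(info, io.BytesIO(payload))


buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    add_bytes(tar, "greeting.txt", b"hello from memory\n")

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    member = tar.getmember("greeting.txt")
    stored = tar.extractfile(member).read()

print(member.size, stored)
```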
585
586.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)
587
588   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
589   equivalent on an existing file.  The file is either named by *name*, or
590   specified as a :term:`file object` *fileobj* with a file descriptor.
591   *name* may be a :term:`path-like object`.  If
592   given, *arcname* specifies an alternative name for the file in the
593   archive, otherwise, the name is taken from *fileobj*’s
594   :attr:`~io.FileIO.name` attribute, or the *name* argument.  The name
595   should be a text string.
596
597   You can modify
598   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
599   If the file object is not an ordinary file object positioned at the
600   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
601   modifying.  This is the case for objects such as :class:`~gzip.GzipFile`.
602   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
603   could be a dummy string.
604
605   .. versionchanged:: 3.6
606      The *name* parameter accepts a :term:`path-like object`.
607
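Combining :meth:`~TarFile.gettarinfo` with :meth:`~TarFile.addfile` allows metadata to be adjusted before a file is stored. This is a sketch under assumptions: the paths ``report.txt`` and ``reports/summary.txt`` are invented, and the permission change is only an example of an attribute one might modify.

```python
import os
import tarfile
import tempfile

payload = b"quarterly numbers\n"

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "report.txt")
    with open(source, "wb") as f:
        f.write(payload)

    archive = os.path.join(workdir, "out.tar")
    with tarfile.open(archive, "w") as tar:
        # Size, timestamps and ownership come from os.stat() on the file.
        info = tar.gettarinfo(source, arcname="reports/summary.txt")
        info.mode = 0o600  # adjust an attribute before storing the member
        with open(source, "rb") as f:
            tar.addfile(info, f)

    with tarfile.open(archive, "r") as tar:
        member = tar.getmember("reports/summary.txt")
        stored = (member.name, member.size, member.mode)

print(stored)
```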
608
609.. method:: TarFile.close()
610
611   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
612   appended to the archive.
613
614
615.. attribute:: TarFile.pax_headers
616
617   A dictionary containing key-value pairs of pax global headers.
618
619
620
621.. _tarinfo-objects:
622
623TarInfo Objects
624---------------
625
626A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
627from storing all required attributes of a file (like file type, size, time,
628permissions, owner etc.), it provides some useful methods to determine its type.
629It does *not* contain the file's data itself.
630
631:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
632:meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and
633:meth:`~TarFile.gettarinfo`.
634
635Modifying the objects returned by :meth:`~!TarFile.getmember` or
636:meth:`~!TarFile.getmembers` will affect all subsequent
637operations on the archive.
638For cases where this is unwanted, you can use :mod:`copy.copy() <copy>` or
639call the :meth:`~TarInfo.replace` method to create a modified copy in one step.
640
641Several attributes can be set to ``None`` to indicate that a piece of metadata
642is unused or unknown.
Different :class:`TarFile` methods handle ``None`` differently:
644
645- The :meth:`~TarFile.extract` or :meth:`~TarFile.extractall` methods will
646  ignore the corresponding metadata, leaving it set to a default.
647- :meth:`~TarFile.addfile` will fail.
648- :meth:`~TarFile.list` will print a placeholder string.
649
650
651.. versionchanged:: 3.11.4
652   Added :meth:`~TarInfo.replace` and handling of ``None``.
653
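To avoid changing the objects that the archive itself keeps, a copy can be modified instead. The sketch below uses :func:`copy.copy` and then, where the method exists, :meth:`~TarInfo.replace`; the :func:`hasattr` check is an assumption made so that the example also runs on releases without ``replace()``. The member name ``app.log`` is illustrative.

```python
import copy
import io
import tarfile


def build_archive() -> io.BytesIO:
    """Create an in-memory archive with one regular file."""
    buffer = io.BytesIO()
    payload = b"started\n"
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo("app.log")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


with tarfile.open(fileobj=build_archive(), mode="r") as tar:
    member = tar.getmember("app.log")

    # Work on a copy so the archive's own bookkeeping stays untouched.
    draft = copy.copy(member)
    draft.name = "renamed.log"
    draft.mode = 0o600

    # replace() performs the copy and the changes in one step.
    if hasattr(member, "replace"):
        draft = member.replace(name="renamed.log", mode=0o600)

    names_in_archive = tar.getnames()

print(member.name, draft.name, names_in_archive)
```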
654
655.. class:: TarInfo(name="")
656
657   Create a :class:`TarInfo` object.
658
659
660.. classmethod:: TarInfo.frombuf(buf, encoding, errors)
661
662   Create and return a :class:`TarInfo` object from string buffer *buf*.
663
664   Raises :exc:`HeaderError` if the buffer is invalid.
665
666
667.. classmethod:: TarInfo.fromtarfile(tarfile)
668
669   Read the next member from the :class:`TarFile` object *tarfile* and return it as
670   a :class:`TarInfo` object.
671
672
673.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')
674
675   Create a string buffer from a :class:`TarInfo` object. For information on the
676   arguments see the constructor of the :class:`TarFile` class.
677
678   .. versionchanged:: 3.2
679      Use ``'surrogateescape'`` as the default for the *errors* argument.
680
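:meth:`~TarInfo.tobuf` and :meth:`~TarInfo.frombuf` are inverses at the level of a single header. As a small sketch (the member name is arbitrary, and the ustar format is chosen so that the header is a single 512-byte block):

```python
import tarfile

info = tarfile.TarInfo("hello.txt")
info.size = 5

# Serialise the metadata into a raw header block.
header = info.tobuf(format=tarfile.USTAR_FORMAT)

# Parse the block again; encoding and errors mirror the tobuf() defaults.
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A block that does not hold a valid header is rejected.
try:
    tarfile.TarInfo.frombuf(b"\x01" * 512, tarfile.ENCODING, "surrogateescape")
    invalid_rejected = False
except tarfile.HeaderError:
    invalid_rejected = True

print(len(header), parsed.name, parsed.size, invalid_rejected)
```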
681
682A ``TarInfo`` object has the following public data attributes:
683
684
685.. attribute:: TarInfo.name
686   :type: str
687
688   Name of the archive member.
689
690
691.. attribute:: TarInfo.size
692   :type: int
693
694   Size in bytes.
695
696
697.. attribute:: TarInfo.mtime
698   :type: int | float
699
700   Time of last modification in seconds since the :ref:`epoch <epoch>`,
701   as in :attr:`os.stat_result.st_mtime`.
702
703   .. versionchanged:: 3.11.4
704
705      Can be set to ``None`` for :meth:`~TarFile.extract` and
706      :meth:`~TarFile.extractall`, causing extraction to skip applying this
707      attribute.
708
709.. attribute:: TarInfo.mode
710   :type: int
711
712   Permission bits, as for :func:`os.chmod`.
713
714   .. versionchanged:: 3.11.4
715
716      Can be set to ``None`` for :meth:`~TarFile.extract` and
717      :meth:`~TarFile.extractall`, causing extraction to skip applying this
718      attribute.
719
720.. attribute:: TarInfo.type
721
722   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
723   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
724   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
725   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
726   more conveniently, use the ``is*()`` methods below.
727
728
729.. attribute:: TarInfo.linkname
730   :type: str
731
   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.
734
735
736.. attribute:: TarInfo.uid
737   :type: int
738
739   User ID of the user who originally stored this member.
740
741   .. versionchanged:: 3.11.4
742
743      Can be set to ``None`` for :meth:`~TarFile.extract` and
744      :meth:`~TarFile.extractall`, causing extraction to skip applying this
745      attribute.
746
747.. attribute:: TarInfo.gid
748   :type: int
749
750   Group ID of the user who originally stored this member.
751
752   .. versionchanged:: 3.11.4
753
754      Can be set to ``None`` for :meth:`~TarFile.extract` and
755      :meth:`~TarFile.extractall`, causing extraction to skip applying this
756      attribute.
757
758.. attribute:: TarInfo.uname
759   :type: str
760
761   User name.
762
763   .. versionchanged:: 3.11.4
764
765      Can be set to ``None`` for :meth:`~TarFile.extract` and
766      :meth:`~TarFile.extractall`, causing extraction to skip applying this
767      attribute.
768
769.. attribute:: TarInfo.gname
770   :type: str
771
772   Group name.
773
774   .. versionchanged:: 3.11.4
775
776      Can be set to ``None`` for :meth:`~TarFile.extract` and
777      :meth:`~TarFile.extractall`, causing extraction to skip applying this
778      attribute.
779
780.. attribute:: TarInfo.pax_headers
781   :type: dict
782
783   A dictionary containing key-value pairs of an associated pax extended header.
784
785.. method:: TarInfo.replace(name=..., mtime=..., mode=..., linkname=...,
786                            uid=..., gid=..., uname=..., gname=...,
787                            deep=True)
788
789   .. versionadded:: 3.11.4
790
791   Return a *new* copy of the :class:`!TarInfo` object with the given attributes
792   changed. For example, to return a ``TarInfo`` with the group name set to
793   ``'staff'``, use::
794
795       new_tarinfo = old_tarinfo.replace(gname='staff')
796
797   By default, a deep copy is made.
798   If *deep* is false, the copy is shallow, i.e. ``pax_headers``
799   and any custom attributes are shared with the original ``TarInfo`` object.
800
801A :class:`TarInfo` object also provides some convenient query methods:
802
803
804.. method:: TarInfo.isfile()
805
   Return :const:`True` if the :class:`TarInfo` object is a regular file.
807
808
809.. method:: TarInfo.isreg()
810
811   Same as :meth:`isfile`.
812
813
814.. method:: TarInfo.isdir()
815
816   Return :const:`True` if it is a directory.
817
818
819.. method:: TarInfo.issym()
820
821   Return :const:`True` if it is a symbolic link.
822
823
824.. method:: TarInfo.islnk()
825
826   Return :const:`True` if it is a hard link.
827
828
829.. method:: TarInfo.ischr()
830
831   Return :const:`True` if it is a character device.
832
833
834.. method:: TarInfo.isblk()
835
836   Return :const:`True` if it is a block device.
837
838
839.. method:: TarInfo.isfifo()
840
841   Return :const:`True` if it is a FIFO.
842
843
844.. method:: TarInfo.isdev()
845
   Return :const:`True` if it is a character device, block device or FIFO.
847
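The query methods can be combined into a small classifier. The sketch below is
only one possible arrangement; the ``describe`` helper and the hand-built
members are hypothetical and not part of :mod:`tarfile`.

```python
import tarfile


def describe(member: tarfile.TarInfo) -> str:
    """Return a short human-readable description of a member's type."""
    if member.isfile():
        return "regular file"
    if member.isdir():
        return "directory"
    if member.issym():
        return f"symbolic link to {member.linkname}"
    if member.islnk():
        return f"hard link to {member.linkname}"
    if member.isdev():
        return "device or FIFO"
    return "other"


# Hypothetical members, constructed by hand instead of read from an archive.
regular = tarfile.TarInfo("docs/readme.txt")  # the default type is REGTYPE

directory = tarfile.TarInfo("docs")
directory.type = tarfile.DIRTYPE

link = tarfile.TarInfo("docs/latest")
link.type = tarfile.SYMTYPE
link.linkname = "v2"

for member in (regular, directory, link):
    print(member.name, "->", describe(member))
```

Note that ``linkname`` is only consulted for members that are links, matching
the description of :attr:`TarInfo.linkname` above.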
848
849.. _tarfile-extraction-filter:
850
851Extraction filters
852------------------
853
854.. versionadded:: 3.11.4
855
856The *tar* format is designed to capture all details of a UNIX-like filesystem,
857which makes it very powerful.
858Unfortunately, the features make it easy to create tar files that have
859unintended -- and possibly malicious -- effects when extracted.
860For example, extracting a tar file can overwrite arbitrary files in various
861ways (e.g.  by using absolute paths, ``..`` path components, or symlinks that
862affect later members).
863
864In most cases, the full functionality is not needed.
865Therefore, *tarfile* supports extraction filters: a mechanism to limit
866functionality, and thus mitigate some of the security issues.
867
868.. seealso::
869
870   :pep:`706`
871      Contains further motivation and rationale behind the design.
872
873The *filter* argument to :meth:`TarFile.extract` or :meth:`~TarFile.extractall`
874can be:
875
876* the string ``'fully_trusted'``: Honor all metadata as specified in the
877  archive.
878  Should be used if the user trusts the archive completely, or implements
879  their own complex verification.
880
881* the string ``'tar'``: Honor most *tar*-specific features (i.e. features of
882  UNIX-like filesystems), but block features that are very likely to be
883  surprising or malicious. See :func:`tar_filter` for details.
884
885* the string ``'data'``: Ignore or block most features specific to UNIX-like
886  filesystems. Intended for extracting cross-platform data archives.
887  See :func:`data_filter` for details.
888
889* ``None`` (default): Use :attr:`TarFile.extraction_filter`.
890
891  If that is also ``None`` (the default), the ``'fully_trusted'``
892  filter will be used (for compatibility with earlier versions of Python).
893
894  In Python 3.12, the default will emit a ``DeprecationWarning``.
895
896  In Python 3.14, the ``'data'`` filter will become the default instead.
897  It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.
898
899* A callable which will be called for each extracted member with a
900  :ref:`TarInfo <tarinfo-objects>` describing the member and the destination
901  path to where the archive is extracted (i.e. the same path is used for all
902  members)::
903
      filter(member: TarInfo, path: str, /) -> TarInfo | None
905
906  The callable is called just before each member is extracted, so it can
907  take the current state of the disk into account.
908  It can:
909
910  - return a :class:`TarInfo` object which will be used instead of the metadata
911    in the archive, or
912  - return ``None``, in which case the member will be skipped, or
913  - raise an exception to abort the operation or skip the member,
914    depending on :attr:`~TarFile.errorlevel`.
915    Note that when extraction is aborted, :meth:`~TarFile.extractall` may leave
916    the archive partially extracted. It does not attempt to clean up.
917
918Default named filters
919~~~~~~~~~~~~~~~~~~~~~
920
921The pre-defined, named filters are available as functions, so they can be
922reused in custom filters:
923
.. function:: fully_trusted_filter(member, path, /)
925
926   Return *member* unchanged.
927
928   This implements the ``'fully_trusted'`` filter.
929
.. function:: tar_filter(member, path, /)
931
932  Implements the ``'tar'`` filter.
933
934  - Strip leading slashes (``/`` and :attr:`os.sep`) from filenames.
935  - :ref:`Refuse <tarfile-extraction-refuse>` to extract files with absolute
936    paths (in case the name is absolute
937    even after stripping slashes, e.g. ``C:/foo`` on Windows).
938    This raises :class:`~tarfile.AbsolutePathError`.
939  - :ref:`Refuse <tarfile-extraction-refuse>` to extract files whose absolute
940    path (after following symlinks) would end up outside the destination.
941    This raises :class:`~tarfile.OutsideDestinationError`.
942  - Clear high mode bits (setuid, setgid, sticky) and group/other write bits
943    (:attr:`~stat.S_IWGRP`|:attr:`~stat.S_IWOTH`).
944
945  Return the modified ``TarInfo`` member.
946
.. function:: data_filter(member, path, /)
948
949  Implements the ``'data'`` filter.
950  In addition to what ``tar_filter`` does:
951
952  - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
953    that link to absolute paths, or ones that link outside the destination.
954
955    This raises :class:`~tarfile.AbsoluteLinkError` or
956    :class:`~tarfile.LinkOutsideDestinationError`.
957
958    Note that such files are refused even on platforms that do not support
959    symbolic links.
960
961  - :ref:`Refuse <tarfile-extraction-refuse>` to extract device files
962    (including pipes).
963    This raises :class:`~tarfile.SpecialFileError`.
964
965  - For regular files, including hard links:
966
967    - Set the owner read and write permissions
968      (:attr:`~stat.S_IRUSR`|:attr:`~stat.S_IWUSR`).
969    - Remove the group & other executable permission
970      (:attr:`~stat.S_IXGRP`|:attr:`~stat.S_IXOTH`)
971      if the owner doesn’t have it (:attr:`~stat.S_IXUSR`).
972
973  - For other files (directories), set ``mode`` to ``None``, so
974    that extraction methods skip applying permission bits.
975  - Set user and group info (``uid``, ``gid``, ``uname``, ``gname``)
976    to ``None``, so that extraction methods skip setting it.
977
978  Return the modified ``TarInfo`` member.
979
980
981.. _tarfile-extraction-refuse:
982
983Filter errors
984~~~~~~~~~~~~~
985
986When a filter refuses to extract a file, it will raise an appropriate exception,
987a subclass of :class:`~tarfile.FilterError`.
988This will abort the extraction if :attr:`TarFile.errorlevel` is 1 or more.
989With ``errorlevel=0`` the error will be logged and the member will be skipped,
990but extraction will continue.
991
992
993Hints for further verification
994~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
995
996Even with ``filter='data'``, *tarfile* is not suited for extracting untrusted
997files without prior inspection.
998Among other issues, the pre-defined filters do not prevent denial-of-service
999attacks. Users should do additional checks.
1000
1001Here is an incomplete list of things to consider:
1002
1003* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
1004  to prevent e.g. exploiting pre-existing links, and to make it easier to
1005  clean up after a failed extraction.
1006* When working with untrusted data, use external (e.g. OS-level) limits on
1007  disk, memory and CPU usage.
1008* Check filenames against an allow-list of characters
1009  (to filter out control characters, confusables, foreign path separators,
1010  etc.).
* Check that filenames have expected extensions (discouraging files that
  execute when you “click on them”, or extension-less files like Windows
  special device names).
1013* Limit the number of extracted files, total size of extracted data,
1014  filename length (including symlink length), and size of individual files.
1015* Check for files that would be shadowed on case-insensitive filesystems.
1016
1017Also note that:
1018
1019* Tar files may contain multiple versions of the same file.
1020  Later ones are expected to overwrite any earlier ones.
1021  This feature is crucial to allow updating tape archives, but can be abused
1022  maliciously.
1023* *tarfile* does not protect against issues with “live” data,
1024  e.g. an attacker tinkering with the destination (or source) directory while
1025  extraction (or archiving) is in progress.
1026
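Some of the checks listed above can be run on the headers before anything is
extracted. The following is a minimal sketch, not a complete defence: the
``find_problems`` helper, the allow-list pattern and all limits are
assumptions made for this example and should be adapted to the application.

```python
import io
import re
import tarfile

# All limits below are hypothetical; tune them for the application.
SAFE_NAME = re.compile(r"[A-Za-z0-9._/-]+")
MAX_FILES = 1000
MAX_TOTAL_SIZE = 100 * 1024 * 1024
MAX_NAME_LENGTH = 255


def find_problems(tar):
    """Return a list of human-readable complaints about an open archive."""
    problems = []
    seen_lowercase = {}
    total_size = 0
    for count, member in enumerate(tar, start=1):
        if count > MAX_FILES:
            problems.append("too many members")
            break
        total_size += member.size
        if not SAFE_NAME.fullmatch(member.name):
            problems.append(f"unexpected characters in {member.name!r}")
        if len(member.name) > MAX_NAME_LENGTH:
            problems.append(f"name too long: {member.name[:40]!r}...")
        # Names that differ only by case collide on some filesystems.
        previous = seen_lowercase.setdefault(member.name.lower(), member.name)
        if previous != member.name:
            problems.append(f"{member.name!r} shadows {previous!r} "
                            "on case-insensitive filesystems")
    if total_size > MAX_TOTAL_SIZE:
        problems.append("archive contents are too large")
    return problems


# Demo archive with a case clash and a control character in a name.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("README", "readme", "bad\nname"):
        tar.addfile(tarfile.TarInfo(name))  # empty regular files
buffer.seek(0)

with tarfile.open(fileobj=buffer) as tar:
    problems = find_problems(tar)

for problem in problems:
    print(problem)
```

Sizes and names come from the archive headers, so these checks complement
rather than replace OS-level resource limits.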
1027
1028Supporting older Python versions
1029~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1030
1031Extraction filters were added to Python 3.12, and are backported to older
1032versions as security updates.
1033To check whether the feature is available, use e.g.
1034``hasattr(tarfile, 'data_filter')`` rather than checking the Python version.
1035
1036The following examples show how to support Python versions with and without
1037the feature.
1038Note that setting ``extraction_filter`` will affect any subsequent operations.
1039
1040* Fully trusted archive::
1041
1042    my_tarfile.extraction_filter = (lambda member, path: member)
1043    my_tarfile.extractall()
1044
1045* Use the ``'data'`` filter if available, but revert to Python 3.11 behavior
1046  (``'fully_trusted'``) if this feature is not available::
1047
1048    my_tarfile.extraction_filter = getattr(tarfile, 'data_filter',
1049                                           (lambda member, path: member))
1050    my_tarfile.extractall()
1051
1052* Use the ``'data'`` filter; *fail* if it is not available::
1053
1054    my_tarfile.extractall(filter=tarfile.data_filter)
1055
1056  or::
1057
1058    my_tarfile.extraction_filter = tarfile.data_filter
1059    my_tarfile.extractall()
1060
1061* Use the ``'data'`` filter; *warn* if it is not available::
1062
1063   if hasattr(tarfile, 'data_filter'):
1064       my_tarfile.extractall(filter='data')
1065   else:
1066       # remove this when no longer needed
1067       warn_the_user('Extracting may be unsafe; consider updating Python')
1068       my_tarfile.extractall()
1069
1070
1071Stateful extraction filter example
1072~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1073
1074While *tarfile*'s extraction methods take a simple *filter* callable,
1075custom filters may be more complex objects with an internal state.
1076It may be useful to write these as context managers, to be used like this::
1077
1078    with StatefulFilter() as filter_func:
1079        tar.extractall(path, filter=filter_func)
1080
1081Such a filter can be written as, for example::
1082
1083    class StatefulFilter:
1084        def __init__(self):
1085            self.file_count = 0
1086
1087        def __enter__(self):
1088            return self
1089
1090        def __call__(self, member, path):
1091            self.file_count += 1
1092            return member
1093
1094        def __exit__(self, *exc_info):
1095            print(f'{self.file_count} files extracted')
1096
1097
1098.. _tarfile-commandline:
1099.. program:: tarfile
1100
1101
1102Command-Line Interface
1103----------------------
1104
1105.. versionadded:: 3.4
1106
1107The :mod:`tarfile` module provides a simple command-line interface to interact
1108with tar archives.
1109
1110If you want to create a new tar archive, specify its name after the :option:`-c`
1111option and then list the filename(s) that should be included:
1112
1113.. code-block:: shell-session
1114
1115    $ python -m tarfile -c monty.tar  spam.txt eggs.txt
1116
1117Passing a directory is also acceptable:
1118
1119.. code-block:: shell-session
1120
1121    $ python -m tarfile -c monty.tar life-of-brian_1979/
1122
1123If you want to extract a tar archive into the current directory, use
1124the :option:`-e` option:
1125
1126.. code-block:: shell-session
1127
1128    $ python -m tarfile -e monty.tar
1129
1130You can also extract a tar archive into a different directory by passing the
1131directory's name:
1132
1133.. code-block:: shell-session
1134
1135    $ python -m tarfile -e monty.tar  other-dir/
1136
1137For a list of the files in a tar archive, use the :option:`-l` option:
1138
1139.. code-block:: shell-session
1140
1141    $ python -m tarfile -l monty.tar
1142
1143
1144Command-line options
1145~~~~~~~~~~~~~~~~~~~~
1146
1147.. cmdoption:: -l <tarfile>
1148               --list <tarfile>
1149
1150   List files in a tarfile.
1151
1152.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
1153               --create <tarfile> <source1> ... <sourceN>
1154
1155   Create tarfile from source files.
1156
1157.. cmdoption:: -e <tarfile> [<output_dir>]
1158               --extract <tarfile> [<output_dir>]
1159
1160   Extract tarfile into the current directory if *output_dir* is not specified.
1161
1162.. cmdoption:: -t <tarfile>
1163               --test <tarfile>
1164
1165   Test whether the tarfile is valid or not.
1166
1167.. cmdoption:: -v, --verbose
1168
1169   Verbose output.
1170
1171.. cmdoption:: --filter <filtername>
1172
1173   Specifies the *filter* for ``--extract``.
1174   See :ref:`tarfile-extraction-filter` for details.
1175   Only string names are accepted (that is, ``fully_trusted``, ``tar``,
1176   and ``data``).
1177
1178   .. versionadded:: 3.11.4
1179
1180.. _tar-examples:
1181
1182Examples
1183--------
1184
1185How to extract an entire tar archive to the current working directory::
1186
1187   import tarfile
1188   tar = tarfile.open("sample.tar.gz")
   tar.extractall(filter='data')
1190   tar.close()
1191
1192How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
1193a generator function instead of a list::
1194
1195   import os
1196   import tarfile
1197
1198   def py_files(members):
1199       for tarinfo in members:
1200           if os.path.splitext(tarinfo.name)[1] == ".py":
1201               yield tarinfo
1202
1203   tar = tarfile.open("sample.tar.gz")
1204   tar.extractall(members=py_files(tar))
1205   tar.close()
1206
1207How to create an uncompressed tar archive from a list of filenames::
1208
1209   import tarfile
1210   tar = tarfile.open("sample.tar", "w")
1211   for name in ["foo", "bar", "quux"]:
1212       tar.add(name)
1213   tar.close()
1214
1215The same example using the :keyword:`with` statement::
1216
1217    import tarfile
1218    with tarfile.open("sample.tar", "w") as tar:
1219        for name in ["foo", "bar", "quux"]:
1220            tar.add(name)
1221
1222How to read a gzip compressed tar archive and display some member information::
1223
1224   import tarfile
1225   tar = tarfile.open("sample.tar.gz", "r:gz")
1226   for tarinfo in tar:
1227       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
1228       if tarinfo.isreg():
1229           print("a regular file.")
1230       elif tarinfo.isdir():
1231           print("a directory.")
1232       else:
1233           print("something else.")
1234   tar.close()
1235
1236How to create an archive and reset the user information using the *filter*
1237parameter in :meth:`TarFile.add`::
1238
1239    import tarfile
1240    def reset(tarinfo):
1241        tarinfo.uid = tarinfo.gid = 0
1242        tarinfo.uname = tarinfo.gname = "root"
1243        return tarinfo
1244    tar = tarfile.open("sample.tar.gz", "w:gz")
1245    tar.add("foo", filter=reset)
1246    tar.close()
1247
1248
1249.. _tar-formats:
1250
1251Supported tar formats
1252---------------------
1253
1254There are three tar formats that can be created with the :mod:`tarfile` module:
1255
1256* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
1257  up to a length of at best 256 characters and linknames up to 100 characters.
1258  The maximum file size is 8 GiB. This is an old and limited but widely
1259  supported format.
1260
1261* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
1262  linknames, files bigger than 8 GiB and sparse files. It is the de facto
1263  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.
1265
1266* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
1267  format with virtually no limits. It supports long filenames and linknames, large
1268  files and stores pathnames in a portable way. Modern tar implementations,
1269  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
1270  features; some old or unmaintained libraries may not, but should treat
1271  *pax* archives as if they were in the universally supported *ustar* format.
1272  It is the current default format for new archives.
1273
1274  It extends the existing *ustar* format with extra headers for information
1275  that cannot be stored otherwise. There are two flavours of pax headers:
1276  Extended headers only affect the subsequent file header, global
1277  headers are valid for the complete archive and affect all following files.
1278  All the data in a pax header is encoded in *UTF-8* for portability reasons.
1279
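The two flavours of pax headers can be seen in a round trip through an
in-memory archive. In this sketch, the ``pax_headers`` argument of
:func:`tarfile.open` supplies a global header and :attr:`TarInfo.pax_headers`
supplies an extended header for one member. The keys and values are
hypothetical.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "applies to the whole archive"}) as tar:
    data = b"hello"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(data)
    # Hypothetical per-member key; it is written to an extended header.
    info.pax_headers = {"example.origin": "unit-test"}
    tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    member = tar.getmember("hello.txt")
    global_headers = dict(tar.pax_headers)
    member_headers = dict(member.pax_headers)

print(global_headers)
print(member_headers)
```
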
1280There are some more variants of the tar format which can be read, but not
1281created:
1282
1283* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
1284  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
1286  miscalculated header checksums in case of fields with non-ASCII characters.
1287
1288* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
1289  pax format, but is not compatible.
1290
1291.. _tar-unicode:
1292
1293Unicode issues
1294--------------
1295
1296The tar format was originally conceived to make backups on tape drives with the
1297main focus on preserving file system information. Nowadays tar archives are
1298commonly used for file distribution and exchanging archives over networks. One
1299problem of the original format (which is the basis of all other formats) is
1300that there is no concept of supporting different character encodings. For
1301example, an ordinary tar archive created on a *UTF-8* system cannot be read
1302correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
1303metadata (like filenames, linknames, user/group names) will appear damaged.
1304Unfortunately, there is no way to autodetect the encoding of an archive. The
1305pax format was designed to solve this problem. It stores non-ASCII metadata
1306using the universal character encoding *UTF-8*.
1307
1308The details of character conversion in :mod:`tarfile` are controlled by the
1309*encoding* and *errors* keyword arguments of the :class:`TarFile` class.
1310
1311*encoding* defines the character encoding to use for the metadata in the
1312archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
1313as a fallback. Depending on whether the archive is read or written, the
1314metadata must be either decoded or encoded. If *encoding* is not set
1315appropriately, this conversion may fail.
1316
The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
1319The default scheme is ``'surrogateescape'`` which Python also uses for its
1320file system calls, see :ref:`os-filenames`.
1321
1322For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
1323because all the metadata is stored using *UTF-8*. *encoding* is only used in
1324the rare cases when binary pax headers are decoded or when strings with
1325surrogate characters are stored.
1326
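The role of *encoding* can be sketched with a non-ASCII file name. The example
below writes a :const:`GNU_FORMAT` archive using Latin-1 and reads it back with
a matching and a mismatching *encoding*, then repeats the read for a
:const:`PAX_FORMAT` archive. The ``first_name`` and ``archive_with`` helpers
are hypothetical.

```python
import io
import tarfile


def first_name(raw, encoding):
    """Return the name of the first member, decoded with *encoding*."""
    with tarfile.open(fileobj=io.BytesIO(raw), encoding=encoding) as tar:
        return tar.getnames()[0]


def archive_with(name, **kwargs):
    """Return the bytes of an archive holding one empty file called *name*."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", **kwargs) as tar:
        tar.addfile(tarfile.TarInfo(name))
    return buffer.getvalue()


gnu_raw = archive_with("café.txt", format=tarfile.GNU_FORMAT,
                       encoding="latin-1")
matching = first_name(gnu_raw, "latin-1")
# With the default 'surrogateescape' handler the undecodable byte survives
# as a lone surrogate instead of raising an error.
mismatched = first_name(gnu_raw, "utf-8")

# pax stores the name as UTF-8, so the reader's *encoding* does not matter.
pax_raw = archive_with("café.txt", format=tarfile.PAX_FORMAT)
pax_name = first_name(pax_raw, "ascii")

print(ascii(matching), ascii(mismatched), ascii(pax_name))
```

:func:`ascii` is used for printing because a name containing lone surrogates
cannot be written to a UTF-8 terminal directly.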