:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <[email protected]>
.. sectionauthor:: Lars Gustäbel <[email protected]>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``.
   Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For modes ``'w:xz'`` and ``'x:xz'``, :func:`tarfile.open` accepts the
   keyword argument *preset* to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device.
   However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. class:: TarFile
   :noindex:

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read. *name* may be a :class:`str`, file, or file-like object.

   .. versionchanged:: 3.9
      Support for file and file-like objects.


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


.. exception:: FilterError

   Base class for members :ref:`refused <tarfile-extraction-refuse>` by
   filters.

   .. attribute:: tarinfo

      Information about the member that the filter refused to extract,
      as :ref:`TarInfo <tarinfo-objects>`.

.. exception:: AbsolutePathError

   Raised to refuse extracting a member with an absolute path.

.. exception:: OutsideDestinationError

   Raised to refuse extracting a member outside the destination directory.

.. exception:: SpecialFileError

   Raised to refuse extracting a special file (e.g. a device or pipe).

.. exception:: AbsoluteLinkError

   Raised to refuse extracting a symbolic link with an absolute path.

.. exception:: LinkOutsideDestinationError

   Raised to refuse extracting a symbolic link pointing outside the destination
   directory.


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement.
It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=1)

   All of the following arguments are optional and can also be accessed as
   instance attributes.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format for writing. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level. When reading, the format is detected automatically, even
   if different formats are present in a single archive.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive.
   If it is :const:`True`, add the content of the target files to the archive.
   This has no effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   *errorlevel* controls how extraction errors are handled;
   see :attr:`the corresponding attribute <~TarFile.errorlevel>`.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False, filter=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   The *filter* argument, which was added in Python 3.11.4, specifies how
   ``members`` are modified or rejected before extraction.
   See :ref:`tarfile-extraction-filter` for details.
   It is recommended to set this explicitly depending on which *tar* features
   you need to support.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

      Set ``filter='data'`` to prevent the most dangerous security issues,
      and read the :ref:`tarfile-extraction-filter` section for details.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.11.4
      Added the *filter* parameter.


.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   The *numeric_owner* and *filter* arguments are the same as
   for :meth:`extractall`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

      Set ``filter='data'`` to prevent the most dangerous security issues,
      and read the :ref:`tarfile-extraction-filter` section for details.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

   .. versionchanged:: 3.11.4
      Added the *filter* parameter.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be
   a filename or a :class:`TarInfo` object. If *member* is a regular file or
   a link, an :class:`io.BufferedReader` object is returned. For all other
   existing members, :const:`None` is returned. If *member* does not appear
   in the archive, :exc:`KeyError` is raised.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

.. attribute:: TarFile.errorlevel
   :type: int

   If *errorlevel* is ``0``, errors are ignored when using :meth:`TarFile.extract`
   and :meth:`TarFile.extractall`.
   Nevertheless, they appear as error messages in the debug output when
   *debug* is greater than 0.
   If ``1`` (the default), all *fatal* errors are raised as :exc:`OSError` or
   :exc:`FilterError` exceptions. If ``2``, all *non-fatal* errors are raised
   as :exc:`TarError` exceptions as well.

   Some exceptions, e.g. ones caused by wrong argument types or data
   corruption, are always raised.

   Custom :ref:`extraction filters <tarfile-extraction-filter>`
   should raise :exc:`FilterError` for *fatal* errors
   and :exc:`ExtractError` for *non-fatal* ones.

   Note that when an exception is raised, the archive may be partially
   extracted. It is the user’s responsibility to clean up.

.. attribute:: TarFile.extraction_filter

   .. versionadded:: 3.11.4

   The :ref:`extraction filter <tarfile-extraction-filter>` used
   as a default for the *filter* argument of :meth:`~TarFile.extract`
   and :meth:`~TarFile.extractall`.

   The attribute may be ``None`` or a callable.
   String names are not allowed for this attribute, unlike the *filter*
   argument to :meth:`~TarFile.extract`.

   If ``extraction_filter`` is ``None`` (the default),
   calling an extraction method without a *filter* argument will
   use the :func:`fully_trusted <fully_trusted_filter>` filter for
   compatibility with previous Python versions.

   In Python 3.12+, leaving ``extraction_filter=None`` will emit a
   ``DeprecationWarning``.

   In Python 3.14+, leaving ``extraction_filter=None`` will cause
   extraction methods to use the :func:`data <data_filter>` filter by default.

   The attribute may be set on instances or overridden in subclasses.
   It is also possible to set it on the ``TarFile`` class itself to set a
   global default, although, since it affects all uses of *tarfile*,
   it is best practice to only do so in top-level applications or
   :mod:`site configuration <site>`.
   To set a global default this way, a filter function needs to be wrapped in
   :func:`staticmethod()` to prevent injection of a ``self`` argument.

.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None`, the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive.
   If *fileobj* is given, it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`. If
   given, *arcname* specifies an alternative name for the file in the
   archive; otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.
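
Because a :class:`TarInfo` object only describes a member, the member's data
has to be supplied separately when writing an archive. The following is a
minimal sketch, not part of the module itself, that builds an uncompressed
archive entirely in memory with :class:`io.BytesIO`: each :class:`TarInfo` is
created by hand, its :attr:`~TarInfo.size` is set to the length of the data,
and the bytes are passed to :meth:`TarFile.addfile`. The helper functions
``build_archive`` and ``read_member`` are hypothetical names used only for
illustration.

```python
import io
import tarfile


def build_archive(files: dict[str, bytes]) -> bytes:
    """Return the bytes of an uncompressed tar archive holding *files* (hypothetical helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            # The TarInfo only describes the member; the data itself is
            # passed separately through the fileobj argument of addfile().
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # addfile() copies exactly this many bytes
            tar.addfile(info, io.BytesIO(data))
    # Closing the TarFile wrote the end-of-archive blocks but left the
    # BytesIO open, so its contents can still be retrieved here.
    return buffer.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    """Return the contents of member *name* from an in-memory archive (hypothetical helper)."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        member = tar.getmember(name)  # raises KeyError if *name* is missing
        extracted = tar.extractfile(member)  # None for non-regular members
        if extracted is None:
            raise ValueError(f"{name!r} is not a regular file or link")
        return extracted.read()


archive = build_archive({"hello.txt": b"Hello, world!\n"})
print(read_member(archive, "hello.txt"))
```

Leaving :attr:`~TarInfo.size` at its default of ``0`` would store an empty
member, since :meth:`~TarFile.addfile` reads exactly ``tarinfo.size`` bytes
from *fileobj*.
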

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and
:meth:`~TarFile.gettarinfo`.

Modifying the objects returned by :meth:`~!TarFile.getmember` or
:meth:`~!TarFile.getmembers` will affect all subsequent
operations on the archive.
For cases where this is unwanted, you can use :mod:`copy.copy() <copy>` or
call the :meth:`~TarInfo.replace` method to create a modified copy in one step.

Several attributes can be set to ``None`` to indicate that a piece of metadata
is unused or unknown.
Different :class:`TarFile` methods handle ``None`` differently:

- The :meth:`~TarFile.extract` or :meth:`~TarFile.extractall` methods will
  ignore the corresponding metadata, leaving it set to a default.
- :meth:`~TarFile.addfile` will fail.
- :meth:`~TarFile.list` will print a placeholder string.


.. versionchanged:: 3.11.4
   Added :meth:`~TarInfo.replace` and handling of ``None``.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from the bytes buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a bytes buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name
   :type: str

   Name of the archive member.


.. attribute:: TarInfo.size
   :type: int

   Size in bytes.


.. attribute:: TarInfo.mtime
   :type: int | float

   Time of last modification in seconds since the :ref:`epoch <epoch>`,
   as in :attr:`os.stat_result.st_mtime`.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.mode
   :type: int

   Permission bits, as for :func:`os.chmod`.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname
   :type: str

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid
   :type: int

   User ID of the user who originally stored this member.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.gid
   :type: int

   Group ID of the user who originally stored this member.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.uname
   :type: str

   User name.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.gname
   :type: str

   Group name.

   .. versionchanged:: 3.11.4

      Can be set to ``None`` for :meth:`~TarFile.extract` and
      :meth:`~TarFile.extractall`, causing extraction to skip applying this
      attribute.

.. attribute:: TarInfo.pax_headers
   :type: dict

   A dictionary containing key-value pairs of an associated pax extended header.

.. method:: TarInfo.replace(name=..., mtime=..., mode=..., linkname=...,
                            uid=..., gid=..., uname=..., gname=...,
                            deep=True)

   .. versionadded:: 3.11.4

   Return a *new* copy of the :class:`!TarInfo` object with the given attributes
   changed. For example, to return a ``TarInfo`` with the group name set to
   ``'staff'``, use::

       new_tarinfo = old_tarinfo.replace(gname='staff')

   By default, a deep copy is made.
   If *deep* is false, the copy is shallow, i.e. ``pax_headers``
   and any custom attributes are shared with the original ``TarInfo`` object.

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tarfile-extraction-filter:

Extraction filters
------------------

.. versionadded:: 3.11.4

The *tar* format is designed to capture all details of a UNIX-like filesystem,
which makes it very powerful.
Unfortunately, these features make it easy to create tar files that have
unintended -- and possibly malicious -- effects when extracted.
For example, extracting a tar file can overwrite arbitrary files in various
ways (e.g. by using absolute paths, ``..`` path components, or symlinks that
affect later members).

In most cases, the full functionality is not needed.
Therefore, *tarfile* supports extraction filters: a mechanism to limit
functionality, and thus mitigate some of the security issues.

.. seealso::

   :pep:`706`
      Contains further motivation and rationale behind the design.

The *filter* argument to :meth:`TarFile.extract` or :meth:`~TarFile.extractall`
can be:

* the string ``'fully_trusted'``: Honor all metadata as specified in the
  archive.
  Should be used if the user trusts the archive completely, or implements
  their own complex verification.

* the string ``'tar'``: Honor most *tar*-specific features (i.e. features of
  UNIX-like filesystems), but block features that are very likely to be
  surprising or malicious. See :func:`tar_filter` for details.

* the string ``'data'``: Ignore or block most features specific to UNIX-like
  filesystems. Intended for extracting cross-platform data archives.
  See :func:`data_filter` for details.

* ``None`` (default): Use :attr:`TarFile.extraction_filter`.

  If that is also ``None`` (the default), the ``'fully_trusted'``
  filter will be used (for compatibility with earlier versions of Python).

  In Python 3.12, the default will emit a ``DeprecationWarning``.

  In Python 3.14, the ``'data'`` filter will become the default instead.
  It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.

* A callable which will be called for each extracted member with a
  :ref:`TarInfo <tarinfo-objects>` describing the member and the destination
  path to where the archive is extracted (i.e. the same path is used for all
  members)::

      filter(/, member: TarInfo, path: str) -> TarInfo | None

  The callable is called just before each member is extracted, so it can
  take the current state of the disk into account.
  It can:

  - return a :class:`TarInfo` object which will be used instead of the metadata
    in the archive, or
  - return ``None``, in which case the member will be skipped, or
  - raise an exception to abort the operation or skip the member,
    depending on :attr:`~TarFile.errorlevel`.
    Note that when extraction is aborted, :meth:`~TarFile.extractall` may leave
    the archive partially extracted. It does not attempt to clean up.

Default named filters
~~~~~~~~~~~~~~~~~~~~~

The pre-defined, named filters are available as functions, so they can be
reused in custom filters:

.. function:: fully_trusted_filter(/, member, path)

   Return *member* unchanged.

   This implements the ``'fully_trusted'`` filter.

.. function:: tar_filter(/, member, path)

   Implements the ``'tar'`` filter.

   - Strip leading slashes (``/`` and :attr:`os.sep`) from filenames.
   - :ref:`Refuse <tarfile-extraction-refuse>` to extract files with absolute
     paths (in case the name is absolute
     even after stripping slashes, e.g.
     ``C:/foo`` on Windows).
     This raises :class:`~tarfile.AbsolutePathError`.
   - :ref:`Refuse <tarfile-extraction-refuse>` to extract files whose absolute
     path (after following symlinks) would end up outside the destination.
     This raises :class:`~tarfile.OutsideDestinationError`.
   - Clear high mode bits (setuid, setgid, sticky) and group/other write bits
     (:attr:`~stat.S_IWGRP` | :attr:`~stat.S_IWOTH`).

   Return the modified ``TarInfo`` member.

.. function:: data_filter(/, member, path)

   Implements the ``'data'`` filter.
   In addition to what ``tar_filter`` does:

   - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
     that link to absolute paths, or ones that link outside the destination.

     This raises :class:`~tarfile.AbsoluteLinkError` or
     :class:`~tarfile.LinkOutsideDestinationError`.

     Note that such files are refused even on platforms that do not support
     symbolic links.

   - :ref:`Refuse <tarfile-extraction-refuse>` to extract device files
     (including pipes).
     This raises :class:`~tarfile.SpecialFileError`.

   - For regular files, including hard links:

     - Set the owner read and write permissions
       (:attr:`~stat.S_IRUSR` | :attr:`~stat.S_IWUSR`).
     - Remove the group and other executable permissions
       (:attr:`~stat.S_IXGRP` | :attr:`~stat.S_IXOTH`)
       if the owner doesn’t have it (:attr:`~stat.S_IXUSR`).

   - For other files (directories), set ``mode`` to ``None``, so
     that extraction methods skip applying permission bits.
   - Set user and group info (``uid``, ``gid``, ``uname``, ``gname``)
     to ``None``, so that extraction methods skip setting it.

   Return the modified ``TarInfo`` member.


.. _tarfile-extraction-refuse:

Filter errors
~~~~~~~~~~~~~

When a filter refuses to extract a file, it will raise an appropriate exception,
a subclass of :class:`~tarfile.FilterError`.
This will abort the extraction if :attr:`TarFile.errorlevel` is 1 or more.
With ``errorlevel=0`` the error will be logged and the member will be skipped,
but extraction will continue.


Hints for further verification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even with ``filter='data'``, *tarfile* is not suited for extracting untrusted
files without prior inspection.
Among other issues, the pre-defined filters do not prevent denial-of-service
attacks. Users should do additional checks.

Here is an incomplete list of things to consider:

* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
  to prevent e.g. exploiting pre-existing links, and to make it easier to
  clean up after a failed extraction.
* When working with untrusted data, use external (e.g. OS-level) limits on
  disk, memory and CPU usage.
* Check filenames against an allow-list of characters
  (to filter out control characters, confusables, foreign path separators,
  etc.).
* Check that filenames have expected extensions (discouraging files that
  execute when you “click on them”, or extension-less files like Windows
  special device names).
* Limit the number of extracted files, total size of extracted data,
  filename length (including symlink length), and size of individual files.
* Check for files that would be shadowed on case-insensitive filesystems.

Also note that:

* Tar files may contain multiple versions of the same file.
  Later ones are expected to overwrite any earlier ones.
  This feature is crucial to allow updating tape archives, but can be abused
  maliciously.
* *tarfile* does not protect against issues with “live” data,
  e.g. an attacker tinkering with the destination (or source) directory while
  extraction (or archiving) is in progress.
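
Some of the checks above can be combined with the pre-defined ``'data'``
filter in one filter callable. The following is a minimal sketch, not a
complete safeguard: the class ``LimitingFilter``, the helper
``extract_untrusted`` and the default limits are hypothetical and chosen only
for illustration. It extracts into a new temporary directory, applies
:func:`data_filter` first, and raises :class:`~tarfile.FilterError` once a
member-count or total-size limit is exceeded. With the default
:attr:`~TarFile.errorlevel` of 1 this aborts the extraction, and the sketch
then removes the partially extracted directory:

.. code-block:: python

   import shutil
   import tarfile
   import tempfile


   class LimitingFilter:
       """Hypothetical filter: the 'data' filter plus count and size limits."""

       def __init__(self, max_files=1000, max_total_size=100 * 1024 * 1024):
           # Illustrative defaults: 1000 members, 100 MiB of member data.
           self.max_files = max_files
           self.max_total_size = max_total_size
           self.file_count = 0
           self.total_size = 0

       def __call__(self, member, path):
           # The pre-defined 'data' filter refuses absolute paths, links that
           # point outside the destination, device files, and so on.
           member = tarfile.data_filter(member, path)
           self.file_count += 1
           self.total_size += member.size  # size declared in the header
           if self.file_count > self.max_files:
               raise tarfile.FilterError(f'more than {self.max_files} members')
           if self.total_size > self.max_total_size:
               raise tarfile.FilterError(
                   f'more than {self.max_total_size} bytes of member data')
           return member


   def extract_untrusted(fileobj, **limits):
       """Extract *fileobj* into a new temporary directory; return its path."""
       dest = tempfile.mkdtemp()
       try:
           with tarfile.open(fileobj=fileobj) as tar:
               tar.extractall(dest, filter=LimitingFilter(**limits))
       except Exception:
           # A fresh directory makes cleanup after a refused member simple.
           shutil.rmtree(dest)
           raise
       return dest

For example, call ``extract_untrusted(f, max_files=50)`` with a binary file
object *f*. Because the limits are checked against header sizes just before
each member is written, they complement rather than replace OS-level limits
on disk usage.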


Supporting older Python versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Extraction filters were added to Python 3.12, and are backported to older
versions as security updates.
To check whether the feature is available, use e.g.
``hasattr(tarfile, 'data_filter')`` rather than checking the Python version.

The following examples show how to support Python versions with and without
the feature.
Note that setting ``extraction_filter`` will affect any subsequent operations.

* Fully trusted archive::

    my_tarfile.extraction_filter = (lambda member, path: member)
    my_tarfile.extractall()

* Use the ``'data'`` filter if available, but revert to Python 3.11 behavior
  (``'fully_trusted'``) if this feature is not available::

    my_tarfile.extraction_filter = getattr(tarfile, 'data_filter',
                                           (lambda member, path: member))
    my_tarfile.extractall()

* Use the ``'data'`` filter; *fail* (with :exc:`AttributeError`) if it is not
  available::

    my_tarfile.extractall(filter=tarfile.data_filter)

  or::

    my_tarfile.extraction_filter = tarfile.data_filter
    my_tarfile.extractall()

* Use the ``'data'`` filter; *warn* if it is not available::

    if hasattr(tarfile, 'data_filter'):
        my_tarfile.extractall(filter='data')
    else:
        # warn_the_user() is a placeholder for your own warning mechanism;
        # remove this when no longer needed
        warn_the_user('Extracting may be unsafe; consider updating Python')
        my_tarfile.extractall()


Stateful extraction filter example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While *tarfile*'s extraction methods take a simple *filter* callable,
custom filters may be more complex objects with an internal state.
It may be useful to write these as context managers, to be used like this::

    with StatefulFilter() as filter_func:
        tar.extractall(path, filter=filter_func)

Such a filter can be written as, for example::

    class StatefulFilter:
        def __init__(self):
            self.file_count = 0

        def __enter__(self):
            return self

        def __call__(self, member, path):
            self.file_count += 1
            return member

        def __exit__(self, *exc_info):
            print(f'{self.file_count} files extracted')


.. _tarfile-commandline:
.. program:: tarfile


Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ...
<sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into *output_dir*, or into the current directory if
   *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. cmdoption:: --filter <filtername>

   Specifies the *filter* for ``--extract``.
   See :ref:`tarfile-extraction-filter` for details.
   Only string names are accepted (that is, ``fully_trusted``, ``tar``,
   and ``data``).

   .. versionadded:: 3.11.4

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip-compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in
tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. Modern tar implementations,
  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
  features; some old or unmaintained libraries may not, but should treat
  *pax* archives as if they were in the universally supported *ustar* format.
  It is the current default format for new archives.

  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  extended headers only affect the subsequent file header, while global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.
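
The effect can be reproduced with an in-memory archive. The following minimal
sketch (the helper ``roundtrip`` and the member name ``café.txt`` are
illustrative, not part of :mod:`tarfile`) writes a member with
``encoding='latin-1'`` and reads it back with ``encoding='utf-8'``. With
:const:`USTAR_FORMAT` the raw Latin-1 byte cannot be decoded and survives only
as a surrogate escape (because of the default ``'surrogateescape'`` error
handler); with :const:`PAX_FORMAT` the name is stored as *UTF-8* in an
extended header and is read back intact:

.. code-block:: python

   import io
   import tarfile

   NAME = 'café.txt'  # illustrative non-ASCII member name


   def roundtrip(fmt, write_encoding, read_encoding):
       """Write one empty member in format *fmt* and read its name back."""
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode='w', format=fmt,
                         encoding=write_encoding) as tar:
           tar.addfile(tarfile.TarInfo(NAME))
       buf.seek(0)
       with tarfile.open(fileobj=buf, encoding=read_encoding) as tar:
           return tar.getnames()[0]


   # ustar keeps the writer's raw Latin-1 bytes; decoding them as UTF-8
   # leaves the byte 0xE9 as the lone surrogate '\udce9'.
   print(repr(roundtrip(tarfile.USTAR_FORMAT, 'latin-1', 'utf-8')))  # 'caf\udce9.txt'
   # pax stores the non-ASCII name as UTF-8, whatever the writer's encoding.
   print(repr(roundtrip(tarfile.PAX_FORMAT, 'latin-1', 'utf-8')))  # 'café.txt'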

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
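
The following minimal sketch shows the *errors* argument at work when writing
a :const:`USTAR_FORMAT` archive, where (unlike with pax) non-ASCII names are
stored in the archive's *encoding*. The helper ``ustar_names`` is illustrative
only. With ``errors='strict'``, a name that ASCII cannot represent raises
:exc:`UnicodeEncodeError`; with ``'replace'`` the character is stored as
``?``; and a name containing a surrogate escape round-trips unchanged when
``'surrogateescape'`` is used for both writing and reading:

.. code-block:: python

   import io
   import tarfile


   def ustar_names(name, encoding, errors):
       """Write one empty ustar member named *name*; return the names read back."""
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode='w', format=tarfile.USTAR_FORMAT,
                         encoding=encoding, errors=errors) as tar:
           tar.addfile(tarfile.TarInfo(name))
       buf.seek(0)
       with tarfile.open(fileobj=buf, encoding=encoding, errors=errors) as tar:
           return tar.getnames()


   try:
       ustar_names('café.txt', 'ascii', 'strict')
   except UnicodeEncodeError as exc:
       print('strict:', exc.reason)

   print(ustar_names('café.txt', 'ascii', 'replace'))  # ['caf?.txt']

   # Raw bytes decoded with 'surrogateescape': 0xE9 becomes '\udce9'.
   raw_name = b'caf\xe9.txt'.decode('utf-8', 'surrogateescape')
   print(ustar_names(raw_name, 'utf-8', 'surrogateescape') == [raw_name])  # True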