:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

**Source code:** :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``, ``ws``, ``wss``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-item :term:`named tuple`. This
   corresponds to the general structure of a URL:
   ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and %
   escapes are not expanded.
   The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse("scheme://netloc/path;parameters?query#fragment")
      ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='',
                  query='query', fragment='fragment')
      >>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
      ...              "highlight=params#url-parsing")
      >>> o
      ParseResult(scheme='http', netloc='docs.python.org:80',
                  path='/3/library/urllib.parse.html', params='',
                  query='highlight=params', fragment='url-parsing')
      >>> o.scheme
      'http'
      >>> o.netloc
      'docs.python.org:80'
      >>> o.hostname
      'docs.python.org'
      >>> o.port
      80
      >>> o._replace(fragment="").geturl()
      'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'

   Following the syntax specifications in :rfc:`1808`, urlparse recognizes
   a netloc only if it is properly introduced by '//'. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one.
   It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.

   The return value is a :term:`named tuple`, which means that its items can
   be accessed by index or as named attributes, which are:

   +------------------+-------+-------------------------+------------------------+
   | Attribute        | Index | Value                   | Value if not present   |
   +==================+=======+=========================+========================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter     |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`params`   | 3     | Parameters for last     | empty string           |
   |                  |       | path element            |                        |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`query`    | 4     | Query component         | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`fragment` | 5     | Fragment identifier     | empty string           |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`username` |       | User name               | :const:`None`          |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`password` |       | Password                | :const:`None`          |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`          |
   +------------------+-------+-------------------------+------------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`          |
   |                  |       | if present              |                        |
   +------------------+-------+-------------------------+------------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   As is the case with all named tuples, the subclass has a few additional methods
   and attributes that are particularly useful. One such method is :meth:`_replace`.
   The :meth:`_replace` method will return a new ParseResult object replacing specified
   fields with new values.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      >>> u
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> u._replace(scheme='http')
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')

   .. warning::

      :func:`urlparse` does not perform validation. See :ref:`URL parsing
      security <url-parsing-security>` for details.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, an allowlist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if there are more than
   *max_num_fields* fields read.

   The optional argument *separator* is the symbol to use for separating the
   query arguments. It defaults to ``&``.
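
   As a minimal sketch of the parameters described above, the following
   compares the default behavior with *keep_blank_values* and an explicit
   *separator*. The query strings and variable names are made up for this
   illustration:

   .. code-block:: python

      from urllib.parse import parse_qs

      # Hypothetical query string: "a" repeats and "b" has a blank value.
      qs = "a=1&a=2&b=&c=3"

      # By default the blank value for "b" is ignored.
      default_result = parse_qs(qs)

      # keep_blank_values=True retains "b" with an empty-string value.
      with_blanks = parse_qs(qs, keep_blank_values=True)

      # A separator other than "&" has to be requested explicitly.
      semicolon_result = parse_qs("a=1;b=2", separator=";")

      print(default_result)
      print(with_blanks)
      print(semicolon_result)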

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.


   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

   .. versionchanged:: 3.10
      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separators. This has been changed to allow only a single
      separator key, with ``&`` as the default separator.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if there are more than
   *max_num_fields* fields read.
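
   A short sketch of the list-of-pairs result and of the *max_num_fields*
   limit follows. The query string and the limit of 2 are arbitrary values
   chosen for this illustration:

   .. code-block:: python

      from urllib.parse import parse_qsl

      # Hypothetical query string with a repeated name and a blank value.
      qs = "x=1&y=&x=3"

      # Pairs come back in the order they appear; repeated names are kept.
      pairs = parse_qsl(qs, keep_blank_values=True)

      # The string holds three fields, so a limit of two is exceeded.
      try:
          parse_qsl(qs, max_num_fields=2)
          limit_exceeded = False
      except ValueError:
          limit_exceeded = True

      print(pairs)
      print(limit_exceeded)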

   The optional argument *separator* is the symbol to use for separating the
   query arguments. It defaults to ``&``.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

   .. versionchanged:: 3.10
      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separators. This has been changed to allow only a single
      separator key, with ``&`` as the default separator.


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-item
   :term:`named tuple`::

      (addressing scheme, network location, path, query, fragment identifier).
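
   As a rough illustration of the difference from :func:`urlparse`, the
   sketch below parses one URL with both functions. The URL and variable
   names are hypothetical:

   .. code-block:: python

      from urllib.parse import urlparse, urlsplit

      # Hypothetical URL whose last path segment carries a ";x" parameter.
      url = "http://example.com/a/b;x?q=1#top"

      split_result = urlsplit(url)   # 5 items, params stay in the path
      parse_result = urlparse(url)   # 6 items, params split out

      print(split_result.path)
      print(parse_result.path, parse_result.params)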

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.
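
   The two error cases above can be sketched as follows. The URLs are
   hypothetical; note that in this sketch the invalid port only fails when
   :attr:`port` is read, while the unmatched bracket fails during the split:

   .. code-block:: python

      from urllib.parse import urlsplit

      # Splitting succeeds even though the port is out of range.
      parts = urlsplit("http://example.com:99999/")
      try:
          parts.port
          port_error = False
      except ValueError:
          port_error = True

      # An opening bracket without its closing bracket is rejected.
      try:
          urlsplit("http://[::1/path")
          bracket_error = False
      except ValueError:
          bracket_error = True

      print(port_error, bracket_error)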

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
   control and space characters are stripped from the URL. ``\n``,
   ``\r`` and tab ``\t`` characters are removed from the URL at any position.

   .. warning::

      :func:`urlsplit` does not perform validation. See :ref:`URL parsing
      security <url-parsing-security>` for details.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

   .. versionchanged:: 3.10
      ASCII newline and tab characters are stripped from the URL.

   .. versionchanged:: 3.11.4
      Leading WHATWG C0 control and space characters are stripped from the URL.

.. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL.
   For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, it starts with ``//`` or ``scheme://``),
      the *url*'s hostname and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.


   .. versionchanged:: 3.5

      Behavior updated to match the semantics defined in :rfc:`3986`.


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

.. function:: unwrap(url)

   Extract the url from a wrapped URL (that is, a string formatted as
   ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
   or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
   without changes.

.. _url-parsing-security:

URL parsing security
--------------------

The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation** of
inputs. They may not raise errors on inputs that other applications consider
invalid. They may also succeed on some inputs that might not be considered
URLs elsewhere. Their purpose is for practical functionality rather than
purity.

Instead of raising an exception on unusual input, they may instead return some
component parts as empty strings. Or components may contain more than perhaps
they should.

We recommend that users of these APIs where the values may be used anywhere
with security implications code defensively. Do some verification within your
code before trusting a returned component part. Does that ``scheme`` make
sense? Is that a sensible ``path``? Is there anything strange about that
``hostname``? etc.

What constitutes a URL is not universally well defined. Different applications
have different needs and desired constraints. For instance, the living `WHATWG
spec`_ describes what user-facing web clients such as a web browser require,
while :rfc:`3986` is more general. These functions incorporate some aspects of
both, but cannot be claimed compliant with either. The APIs and existing user
code with expectations on specific behaviors predate both standards, leading us
to be very cautious about making API behavior changes.

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions.
The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape.
   Letters,
   digits, and the characters ``'_.-~'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted --- its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   .. versionchanged:: 3.7
      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
      included in the set of unreserved characters.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.


.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces with plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes with their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

   .. versionchanged:: 3.9
      *string* parameter supports bytes and str objects (previously only str).


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs with spaces, as required
   for unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes with their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.


.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string. If the resultant string is to be used as the *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then
   it should be encoded to bytes, otherwise it would result in a
   :exc:`TypeError`.
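
   A minimal sketch of preparing POST data along those lines is shown below.
   The parameter names and values are invented for this illustration, and no
   request is actually sent:

   .. code-block:: python

      from urllib.parse import urlencode

      # Hypothetical form fields; the int value is converted with str().
      params = {"q": "El Niño", "page": 2}

      # urlencode() returns a percent-encoded ASCII str ...
      query_string = urlencode(params)

      # ... which is encoded to bytes before being used as POST data.
      post_data = query_string.encode("ascii")

      print(query_string)
      print(post_data)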

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used to quote the values, which
   means spaces are quoted as a ``'+'`` character and ``'/'`` characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``
   and not encode ``'/'`` characters. For maximum control of what is quoted, use
   ``quote`` and specify a value for *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urllib.parse.urlencode` function can be used for generating the query
   string of a URL or data for a POST request.

   .. versionchanged:: 3.2
      *query* supports bytes and string objects.

   .. versionadded:: 3.5
      *quote_via* parameter.


.. seealso::

   `WHATWG`_ - URL Living standard
      Working Group for the URL Standard that defines URLs, domains, IP addresses, the
      application/x-www-form-urlencoded format, and their API.

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to urllib.parse module
      should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.

.. _WHATWG: https://url.spec.whatwg.org/