1:mod:`urllib.parse` --- Parse URLs into components
2==================================================
3
4.. module:: urllib.parse
5   :synopsis: Parse URLs into or assemble them from components.
6
7**Source code:** :source:`Lib/urllib/parse.py`
8
9.. index::
10   single: WWW
11   single: World Wide Web
12   single: URL
13   pair: URL; parsing
14   pair: relative; URL
15
16--------------
17
This module defines a standard interface to break Uniform Resource Locator (URL)
strings into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL".
22
The module has been designed to match :rfc:`1808`, the internet RFC on Relative
Uniform Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
25``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
26``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
27``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
28``wais``, ``ws``, ``wss``.
29
30The :mod:`urllib.parse` module defines functions that fall into two broad
31categories: URL parsing and URL quoting. These are covered in detail in
32the following sections.
33
34URL Parsing
35-----------
36
37The URL parsing functions focus on splitting a URL string into its components,
38or on combining URL components into a URL string.
39
40.. function:: urlparse(urlstring, scheme='', allow_fragments=True)
41
42   Parse a URL into six components, returning a 6-item :term:`named tuple`.  This
43   corresponds to the general structure of a URL:
44   ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present.  For example:
50
51   .. doctest::
52      :options: +NORMALIZE_WHITESPACE
53
54      >>> from urllib.parse import urlparse
55      >>> urlparse("scheme://netloc/path;parameters?query#fragment")
56      ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='',
57                  query='query', fragment='fragment')
58      >>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
59      ...              "highlight=params#url-parsing")
60      >>> o
61      ParseResult(scheme='http', netloc='docs.python.org:80',
62                  path='/3/library/urllib.parse.html', params='',
63                  query='highlight=params', fragment='url-parsing')
64      >>> o.scheme
65      'http'
66      >>> o.netloc
67      'docs.python.org:80'
68      >>> o.hostname
69      'docs.python.org'
70      >>> o.port
71      80
72      >>> o._replace(fragment="").geturl()
73      'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'
74
   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``.  Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.
79
80   .. doctest::
81      :options: +NORMALIZE_WHITESPACE
82
83      >>> from urllib.parse import urlparse
84      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
85      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
86                  params='', query='', fragment='')
87      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
88      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
89                  params='', query='', fragment='')
90      >>> urlparse('help/Python.html')
91      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
92                  query='', fragment='')
93
94   The *scheme* argument gives the default addressing scheme, to be
95   used only if the URL does not specify one.  It should be the same type
96   (text or bytes) as *urlstring*, except that the default value ``''`` is
97   always allowed, and is automatically converted to ``b''`` if appropriate.
98
99   If the *allow_fragments* argument is false, fragment identifiers are not
100   recognized.  Instead, they are parsed as part of the path, parameters
101   or query component, and :attr:`fragment` is set to the empty string in
102   the return value.
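
   As a minimal sketch of the two optional arguments described above (the
   ``example.com`` URLs are placeholders chosen for illustration):

   .. code-block:: python

      from urllib.parse import urlparse

      # No scheme in the URL, so the *scheme* argument supplies the default.
      with_default = urlparse("//example.com/index.html", scheme="https")

      # A scheme present in the URL takes precedence over the default.
      explicit = urlparse("http://example.com/index.html", scheme="https")

      # With allow_fragments=False the "#..." part stays in the path.
      no_frag = urlparse("http://example.com/page#section", allow_fragments=False)

      print(with_default.scheme)     # https
      print(explicit.scheme)         # http
      print(no_frag.path)            # /page#section
      print(repr(no_frag.fragment))  # ''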
103
104   The return value is a :term:`named tuple`, which means that its items can
105   be accessed by index or as named attributes, which are:
106
107   +------------------+-------+-------------------------+------------------------+
108   | Attribute        | Index | Value                   | Value if not present   |
109   +==================+=======+=========================+========================+
110   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter     |
111   +------------------+-------+-------------------------+------------------------+
112   | :attr:`netloc`   | 1     | Network location part   | empty string           |
113   +------------------+-------+-------------------------+------------------------+
114   | :attr:`path`     | 2     | Hierarchical path       | empty string           |
115   +------------------+-------+-------------------------+------------------------+
116   | :attr:`params`   | 3     | Parameters for last     | empty string           |
117   |                  |       | path element            |                        |
118   +------------------+-------+-------------------------+------------------------+
119   | :attr:`query`    | 4     | Query component         | empty string           |
120   +------------------+-------+-------------------------+------------------------+
121   | :attr:`fragment` | 5     | Fragment identifier     | empty string           |
122   +------------------+-------+-------------------------+------------------------+
123   | :attr:`username` |       | User name               | :const:`None`          |
124   +------------------+-------+-------------------------+------------------------+
125   | :attr:`password` |       | Password                | :const:`None`          |
126   +------------------+-------+-------------------------+------------------------+
127   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`          |
128   +------------------+-------+-------------------------+------------------------+
129   | :attr:`port`     |       | Port number as integer, | :const:`None`          |
130   |                  |       | if present              |                        |
131   +------------------+-------+-------------------------+------------------------+
132
133   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
134   an invalid port is specified in the URL.  See section
135   :ref:`urlparse-result-object` for more information on the result object.
136
137   Unmatched square brackets in the :attr:`netloc` attribute will raise a
138   :exc:`ValueError`.
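
   The following sketch illustrates when these errors surface, using made-up
   URLs: an out-of-range port is only reported when :attr:`port` is read,
   while unmatched brackets are rejected during parsing.

   .. code-block:: python

      from urllib.parse import urlparse

      # Parsing succeeds; the port is validated only when it is read.
      parsed = urlparse("http://example.com:99999/")
      try:
          parsed.port
          port_error = None
      except ValueError as exc:
          port_error = exc

      # Unmatched square brackets in the netloc fail at parse time.
      try:
          urlparse("http://[::1/")
          bracket_error = None
      except ValueError as exc:
          bracket_error = exc

      print(type(port_error).__name__, type(bracket_error).__name__)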
139
140   Characters in the :attr:`netloc` attribute that decompose under NFKC
141   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
142   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
143   decomposed before parsing, no error will be raised.
144
   As with all named tuples, the subclass has a few additional methods and
   attributes that are particularly useful. One such method is :meth:`_replace`,
   which returns a new :class:`ParseResult` object with the specified fields
   replaced by new values.
149
150   .. doctest::
151      :options: +NORMALIZE_WHITESPACE
152
153      >>> from urllib.parse import urlparse
154      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
155      >>> u
156      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
157                  params='', query='', fragment='')
158      >>> u._replace(scheme='http')
159      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
160                  params='', query='', fragment='')
161
162   .. warning::
163
164      :func:`urlparse` does not perform validation.  See :ref:`URL parsing
165      security <url-parsing-security>` for details.
166
167   .. versionchanged:: 3.2
168      Added IPv6 URL parsing capabilities.
169
170   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`.  Previously, an allowlist of
      schemes that support fragments existed.
174
175   .. versionchanged:: 3.6
176      Out-of-range port numbers now raise :exc:`ValueError`, instead of
177      returning :const:`None`.
178
179   .. versionchanged:: 3.8
180      Characters that affect netloc parsing under NFKC normalization will
181      now raise :exc:`ValueError`.
182
183
184.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
185
186   Parse a query string given as a string argument (data of type
187   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
188   dictionary.  The dictionary keys are the unique query variable names and the
189   values are lists of values for each name.
190
191   The optional argument *keep_blank_values* is a flag indicating whether blank
192   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings.  The default false
194   value indicates that blank values are to be ignored and treated as if they were
195   not included.
196
197   The optional argument *strict_parsing* is a flag indicating what to do with
198   parsing errors.  If false (the default), errors are silently ignored.  If true,
199   errors raise a :exc:`ValueError` exception.
200
201   The optional *encoding* and *errors* parameters specify how to decode
202   percent-encoded sequences into Unicode characters, as accepted by the
203   :meth:`bytes.decode` method.
204
   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised when the query string contains
   more than *max_num_fields* fields.
208
209   The optional argument *separator* is the symbol to use for separating the
210   query arguments. It defaults to ``&``.
211
212   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
213   parameter set to ``True``) to convert such dictionaries into query
214   strings.
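
   A short sketch of how these options interact, using an invented query
   string; the variable names are illustrative only:

   .. code-block:: python

      from urllib.parse import parse_qs, urlencode

      qs = "name=Ada&lang=en&lang=fr&note="

      default = parse_qs(qs)                       # blank "note" is dropped
      kept = parse_qs(qs, keep_blank_values=True)  # blank "note" kept as ''
      semi = parse_qs("a=1;b=2", separator=";")    # custom separator

      # Exceeding max_num_fields raises ValueError.
      try:
          parse_qs("a=1&b=2&c=3", max_num_fields=2)
          too_many = False
      except ValueError:
          too_many = True

      # doseq=True expands each list back into repeated key=value pairs.
      round_trip = urlencode(kept, doseq=True)

      print(default)     # {'name': ['Ada'], 'lang': ['en', 'fr']}
      print(round_trip)  # name=Ada&lang=en&lang=fr&note=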
215
216
217   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.
219
220   .. versionchanged:: 3.8
221      Added *max_num_fields* parameter.
222
223   .. versionchanged:: 3.10
224      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separators. This has been changed to allow only a single
      separator, with ``&`` as the default.
228
229
230.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
231
232   Parse a query string given as a string argument (data of type
233   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a list of
234   name, value pairs.
235
236   The optional argument *keep_blank_values* is a flag indicating whether blank
237   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings.  The default false
239   value indicates that blank values are to be ignored and treated as if they were
240   not included.
241
242   The optional argument *strict_parsing* is a flag indicating what to do with
243   parsing errors.  If false (the default), errors are silently ignored.  If true,
244   errors raise a :exc:`ValueError` exception.
245
246   The optional *encoding* and *errors* parameters specify how to decode
247   percent-encoded sequences into Unicode characters, as accepted by the
248   :meth:`bytes.decode` method.
249
   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised when the query string contains
   more than *max_num_fields* fields.
253
254   The optional argument *separator* is the symbol to use for separating the
255   query arguments. It defaults to ``&``.
256
257   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
258   query strings.
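
   For illustration, a small sketch (with an invented query string) showing
   the list-of-pairs result, the effect of *strict_parsing*, and the round
   trip through :func:`urlencode`:

   .. code-block:: python

      from urllib.parse import parse_qsl, urlencode

      pairs = parse_qsl("a=1&a=2&b=3")   # order and duplicates are preserved
      lenient = parse_qsl("a=1&b")       # the malformed field "b" is skipped

      # With strict_parsing=True the same malformed field raises ValueError.
      try:
          parse_qsl("a=1&b", strict_parsing=True)
          strict_failed = False
      except ValueError:
          strict_failed = True

      round_trip = urlencode(pairs)

      print(pairs)       # [('a', '1'), ('a', '2'), ('b', '3')]
      print(round_trip)  # a=1&a=2&b=3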
259
260   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.
262
263   .. versionchanged:: 3.8
264      Added *max_num_fields* parameter.
265
266   .. versionchanged:: 3.10
267      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separators. This has been changed to allow only a single
      separator, with ``&`` as the default.
271
272
273.. function:: urlunparse(parts)
274
   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
276   argument can be any six-item iterable. This may result in a slightly
277   different, but equivalent URL, if the URL that was parsed originally had
278   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
279   states that these are equivalent).
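
   A brief sketch with placeholder values, showing construction from a plain
   six-item tuple and the dropping of an unnecessary ``?`` delimiter:

   .. code-block:: python

      from urllib.parse import urlparse, urlunparse

      # Any six-item iterable works: scheme, netloc, path, params, query, fragment.
      built = urlunparse(("https", "example.com", "/docs", "", "q=1", "top"))

      # The trailing "?" with an empty query is not reproduced.
      rebuilt = urlunparse(urlparse("http://example.com/path?"))

      print(built)    # https://example.com/docs?q=1#top
      print(rebuilt)  # http://example.com/path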
280
281
282.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)
283
284   This is similar to :func:`urlparse`, but does not split the params from the URL.
285   This should generally be used instead of :func:`urlparse` if the more recent URL
286   syntax allowing parameters to be applied to each segment of the *path* portion
287   of the URL (see :rfc:`2396`) is wanted.  A separate function is needed to
288   separate the path segments and parameters.  This function returns a 5-item
289   :term:`named tuple`::
290
291      (addressing scheme, network location, path, query, fragment identifier).
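
   The difference from :func:`urlparse` can be seen in this sketch, which uses
   a made-up URL carrying parameters on two path segments:

   .. code-block:: python

      from urllib.parse import urlparse, urlsplit

      url = "http://example.com/a;x=1/b;y=2"

      parsed = urlparse(url)  # splits params off the last path segment only
      split = urlsplit(url)   # leaves every parameter inside the path

      print(parsed.path, parsed.params)  # /a;x=1/b y=2
      print(split.path)                  # /a;x=1/b;y=2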
292
   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:
295
296   +------------------+-------+-------------------------+----------------------+
297   | Attribute        | Index | Value                   | Value if not present |
298   +==================+=======+=========================+======================+
299   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
300   +------------------+-------+-------------------------+----------------------+
301   | :attr:`netloc`   | 1     | Network location part   | empty string         |
302   +------------------+-------+-------------------------+----------------------+
303   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
304   +------------------+-------+-------------------------+----------------------+
305   | :attr:`query`    | 3     | Query component         | empty string         |
306   +------------------+-------+-------------------------+----------------------+
307   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
308   +------------------+-------+-------------------------+----------------------+
309   | :attr:`username` |       | User name               | :const:`None`        |
310   +------------------+-------+-------------------------+----------------------+
311   | :attr:`password` |       | Password                | :const:`None`        |
312   +------------------+-------+-------------------------+----------------------+
313   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
314   +------------------+-------+-------------------------+----------------------+
315   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
316   |                  |       | if present              |                      |
317   +------------------+-------+-------------------------+----------------------+
318
319   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
320   an invalid port is specified in the URL.  See section
321   :ref:`urlparse-result-object` for more information on the result object.
322
323   Unmatched square brackets in the :attr:`netloc` attribute will raise a
324   :exc:`ValueError`.
325
326   Characters in the :attr:`netloc` attribute that decompose under NFKC
327   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
328   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
329   decomposed before parsing, no error will be raised.
330
331   Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
332   control and space characters are stripped from the URL. ``\n``,
333   ``\r`` and tab ``\t`` characters are removed from the URL at any position.
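
   A minimal sketch of the newline and tab removal, assuming a Python version
   that includes the 3.10 change noted below (the URL is invented):

   .. code-block:: python

      from urllib.parse import urlsplit

      # Newline and tab characters are dropped wherever they occur.
      cleaned = urlsplit("http://exa\nmple.com/pa\tth")

      print(cleaned.netloc)  # example.com
      print(cleaned.path)    # /path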
334
335   .. warning::
336
337      :func:`urlsplit` does not perform validation.  See :ref:`URL parsing
338      security <url-parsing-security>` for details.
339
340   .. versionchanged:: 3.6
341      Out-of-range port numbers now raise :exc:`ValueError`, instead of
342      returning :const:`None`.
343
344   .. versionchanged:: 3.8
345      Characters that affect netloc parsing under NFKC normalization will
346      now raise :exc:`ValueError`.
347
348   .. versionchanged:: 3.10
349      ASCII newline and tab characters are stripped from the URL.
350
351   .. versionchanged:: 3.11.4
352      Leading WHATWG C0 control and space characters are stripped from the URL.
353
354.. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser
355
356.. function:: urlunsplit(parts)
357
   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).
363
364
365.. function:: urljoin(base, url, allow_fragments=True)
366
367   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
368   another URL (*url*).  Informally, this uses components of the base URL, in
369   particular the addressing scheme, the network location and (part of) the
370   path, to provide missing components in the relative URL.  For example:
371
372      >>> from urllib.parse import urljoin
373      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
374      'http://www.cwi.nl/%7Eguido/FAQ.html'
375
376   The *allow_fragments* argument has the same meaning and default as for
377   :func:`urlparse`.
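
   As a further sketch, the following resolves several kinds of relative
   reference against one base URL. The base and references are taken from the
   resolution examples in :rfc:`3986`:

   .. code-block:: python

      from urllib.parse import urljoin

      base = "http://a/b/c/d;p?q"
      references = ("g", "../g", "/g", "?y", "#s")

      # Map each relative reference to its resolved absolute URL.
      resolved = {ref: urljoin(base, ref) for ref in references}

      for ref, target in resolved.items():
          print(f"{ref!r:8} -> {target}")
      # 'g'      -> http://a/b/c/g
      # '../g'   -> http://a/b/g
      # '/g'     -> http://a/g
      # '?y'     -> http://a/b/c/d;p?y
      # '#s'     -> http://a/b/c/d;p?q#s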
378
379   .. note::
380
381      If *url* is an absolute URL (that is, it starts with ``//`` or ``scheme://``),
382      the *url*'s hostname and/or scheme will be present in the result.  For example:
383
384      .. doctest::
385
386         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
387         ...         '//www.python.org/%7Eguido')
388         'http://www.python.org/%7Eguido'
389
390      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
391      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
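
      One possible way to do that preprocessing is sketched here. It is an
      illustration rather than a complete safeguard; for instance, it does not
      handle paths that themselves begin with ``//``:

      .. code-block:: python

         from urllib.parse import urljoin, urlsplit, urlunsplit

         base = "http://www.cwi.nl/%7Eguido/Python.html"
         parts = urlsplit("//www.python.org/%7Eguido")

         # Keep only the path, query and fragment of the untrusted URL.
         relative = urlunsplit(("", "", parts.path, parts.query, parts.fragment))
         joined = urljoin(base, relative)

         print(relative)  # /%7Eguido
         print(joined)    # http://www.cwi.nl/%7Eguido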
392
393
394   .. versionchanged:: 3.5
395
396      Behavior updated to match the semantics defined in :rfc:`3986`.
397
398
399.. function:: urldefrag(url)
400
401   If *url* contains a fragment identifier, return a modified version of *url*
402   with no fragment identifier, and the fragment identifier as a separate
403   string.  If there is no fragment identifier in *url*, return *url* unmodified
404   and an empty string.
405
   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:
408
409   +------------------+-------+-------------------------+----------------------+
410   | Attribute        | Index | Value                   | Value if not present |
411   +==================+=======+=========================+======================+
412   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
413   +------------------+-------+-------------------------+----------------------+
414   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
415   +------------------+-------+-------------------------+----------------------+
416
417   See section :ref:`urlparse-result-object` for more information on the result
418   object.
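
   A small sketch with placeholder URLs, showing attribute access, tuple
   unpacking, and the no-fragment case:

   .. code-block:: python

      from urllib.parse import urldefrag

      result = urldefrag("http://example.com/page#section")
      url, fragment = result  # the result also unpacks like a 2-tuple

      unchanged = urldefrag("http://example.com/page")

      print(result.url, result.fragment)  # http://example.com/page section
      print(repr(unchanged.fragment))     # ''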
419
420   .. versionchanged:: 3.2
421      Result is a structured object rather than a simple 2-tuple.
422
423.. function:: unwrap(url)
424
425   Extract the url from a wrapped URL (that is, a string formatted as
426   ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
427   or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
428   without changes.
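
   For illustration, each of the four forms listed above (with a placeholder
   URL substituted) unwraps to the same string:

   .. code-block:: python

      from urllib.parse import unwrap

      wrapped_forms = [
          "<URL:http://example.com/path>",
          "<http://example.com/path>",
          "URL:http://example.com/path",
          "http://example.com/path",
      ]
      unwrapped = [unwrap(form) for form in wrapped_forms]

      print(set(unwrapped))  # {'http://example.com/path'}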
429
430.. _url-parsing-security:
431
432URL parsing security
433--------------------
434
435The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation** of
436inputs.  They may not raise errors on inputs that other applications consider
437invalid.  They may also succeed on some inputs that might not be considered
438URLs elsewhere.  Their purpose is for practical functionality rather than
439purity.
440
Instead of raising an exception on unusual input, they may return some
component parts as empty strings, or components may contain more than
they should.
444
We recommend that users of these APIs code defensively wherever the returned
values may be used with security implications. Do some verification within your
code before trusting a returned component part.  Does that ``scheme`` make
sense?  Is that a sensible ``path``?  Is there anything strange about that
``hostname``?
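
The kind of checking meant here might look like the following sketch. The
helper name ``is_acceptable`` and both allowlists are hypothetical and stand in
for whatever policy a given application needs; this is not a general-purpose
validator.

.. code-block:: python

   from urllib.parse import urlsplit

   # Assumptions for this sketch: the application's own policy.
   ALLOWED_SCHEMES = {"http", "https"}
   ALLOWED_HOSTS = {"example.com", "www.example.com"}


   def is_acceptable(url):
       """Return True if *url* passes this sketch's application-specific checks."""
       try:
           parts = urlsplit(url)
           port = parts.port  # reading the port may raise ValueError
       except ValueError:
           return False
       if parts.scheme not in ALLOWED_SCHEMES:
           return False
       if parts.hostname not in ALLOWED_HOSTS:
           return False
       # Reject embedded credentials such as "user@host".
       if parts.username is not None or parts.password is not None:
           return False
       return port in (None, 80, 443)


   print(is_acceptable("https://example.com/a"))           # True
   print(is_acceptable("https://example.com@evil.test/"))  # False
   print(is_acceptable("javascript:alert(1)"))             # False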
450
What constitutes a URL is not universally well defined.  Different applications
have different needs and desired constraints.  For instance, the living `WHATWG
spec`_ describes what user-facing web clients such as web browsers require,
while :rfc:`3986` is more general.  These functions incorporate some aspects of
both, but cannot be claimed to be compliant with either.  The APIs, and existing
user code with expectations of specific behaviors, predate both standards, which
leads us to be very cautious about making API behavior changes.
458
459.. _parsing-ascii-encoded-bytes:
460
461Parsing ASCII Encoded Bytes
462---------------------------
463
464The URL parsing functions were originally designed to operate on character
465strings only. In practice, it is useful to be able to manipulate properly
466quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
467URL parsing functions in this module all operate on :class:`bytes` and
468:class:`bytearray` objects in addition to :class:`str` objects.
469
470If :class:`str` data is passed in, the result will also contain only
471:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
472passed in, the result will contain only :class:`bytes` data.
473
474Attempting to mix :class:`str` data with :class:`bytes` or
475:class:`bytearray` in a single function call will result in a
476:exc:`TypeError` being raised, while attempting to pass in non-ASCII
477byte values will trigger :exc:`UnicodeDecodeError`.
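
These rules can be illustrated with a short sketch (the URLs are placeholders):

.. code-block:: python

   from urllib.parse import urlsplit

   text_result = urlsplit("http://example.com/a?x=1")
   bytes_result = urlsplit(b"http://example.com/a?x=1")

   # Mixing str and bytes arguments in one call raises TypeError.
   try:
       urlsplit("http://example.com/", scheme=b"http")
       mixed_error = None
   except TypeError as exc:
       mixed_error = exc

   # Non-ASCII byte values raise UnicodeDecodeError.
   try:
       urlsplit(b"http://ex\xe9mple.com/")
       ascii_error = None
   except UnicodeDecodeError as exc:
       ascii_error = exc

   print(text_result.netloc)   # example.com
   print(bytes_result.netloc)  # b'example.com'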
478
479To support easier conversion of result objects between :class:`str` and
480:class:`bytes`, all return values from URL parsing functions provide
481either an :meth:`encode` method (when the result contains :class:`str`
482data) or a :meth:`decode` method (when the result contains :class:`bytes`
483data). The signatures of these methods match those of the corresponding
484:class:`str` and :class:`bytes` methods (except that the default encoding
485is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
486corresponding type that contains either :class:`bytes` data (for
487:meth:`encode` methods) or :class:`str` data (for
488:meth:`decode` methods).
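
A minimal sketch of converting a result in both directions, again with a
placeholder URL:

.. code-block:: python

   from urllib.parse import SplitResultBytes, urlsplit

   text_result = urlsplit("http://example.com/a?x=1")

   as_bytes = text_result.encode()  # default encoding is 'ascii'
   back = as_bytes.decode()

   print(isinstance(as_bytes, SplitResultBytes))  # True
   print(as_bytes.path)                           # b'/a'
   print(back == text_result)                     # True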
489
490Applications that need to operate on potentially improperly quoted URLs
491that may contain non-ASCII data will need to do their own decoding from
492bytes to characters before invoking the URL parsing methods.
493
494The behaviour described in this section applies only to the URL parsing
495functions. The URL quoting functions use their own rules when producing
496or consuming byte sequences as detailed in the documentation of the
497individual URL quoting functions.
498
499.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.
501
502
503.. _urlparse-result-object:
504
505Structured Parse Results
506------------------------
507
The result objects from the :func:`urlparse`, :func:`urlsplit` and
509:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
510These subclasses add the attributes listed in the documentation for
511those functions, the encoding and decoding support described in the
512previous section, as well as an additional method:
513
514.. method:: urllib.parse.SplitResult.geturl()
515
516   Return the re-combined version of the original URL as a string. This may
517   differ from the original URL in that the scheme may be normalized to lower
518   case and empty components may be dropped. Specifically, empty parameters,
519   queries, and fragment identifiers will be removed.
520
521   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
522   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
523   made to the URL returned by this method.
524
525   The result of this method remains unchanged if passed back through the original
526   parsing function:
527
528      >>> from urllib.parse import urlsplit
529      >>> url = 'HTTP://www.Python.org/doc/#'
530      >>> r1 = urlsplit(url)
531      >>> r1.geturl()
532      'http://www.Python.org/doc/'
533      >>> r2 = urlsplit(r1.geturl())
534      >>> r2.geturl()
535      'http://www.Python.org/doc/'
536
537
538The following classes provide the implementations of the structured parse
539results when operating on :class:`str` objects:
540
541.. class:: DefragResult(url, fragment)
542
543   Concrete class for :func:`urldefrag` results containing :class:`str`
544   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
545   instance.
546
547   .. versionadded:: 3.2
548
549.. class:: ParseResult(scheme, netloc, path, params, query, fragment)
550
551   Concrete class for :func:`urlparse` results containing :class:`str`
552   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
553   instance.
554
555.. class:: SplitResult(scheme, netloc, path, query, fragment)
556
557   Concrete class for :func:`urlsplit` results containing :class:`str`
558   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
559   instance.
560
561
562The following classes provide the implementations of the parse results when
563operating on :class:`bytes` or :class:`bytearray` objects:
564
565.. class:: DefragResultBytes(url, fragment)
566
567   Concrete class for :func:`urldefrag` results containing :class:`bytes`
568   data. The :meth:`decode` method returns a :class:`DefragResult`
569   instance.
570
571   .. versionadded:: 3.2
572
573.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)
574
575   Concrete class for :func:`urlparse` results containing :class:`bytes`
576   data. The :meth:`decode` method returns a :class:`ParseResult`
577   instance.
578
579   .. versionadded:: 3.2
580
581.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)
582
583   Concrete class for :func:`urlsplit` results containing :class:`bytes`
584   data. The :meth:`decode` method returns a :class:`SplitResult`
585   instance.
586
587   .. versionadded:: 3.2
588
589
590URL Quoting
591-----------
592
593The URL quoting functions focus on taking program data and making it safe
594for use as URL components by quoting special characters and appropriately
595encoding non-ASCII text. They also support reversing these operations to
596recreate the original data from the contents of a URL component if that
597task isn't already covered by the URL parsing functions above.
598
599.. function:: quote(string, safe='/', encoding=None, errors=None)
600
601   Replace special characters in *string* using the ``%xx`` escape. Letters,
602   digits, and the characters ``'_.-~'`` are never quoted. By default, this
603   function is intended for quoting the path section of a URL. The optional
604   *safe* parameter specifies additional ASCII characters that should not be
605   quoted --- its default value is ``'/'``.
606
607   *string* may be either a :class:`str` or a :class:`bytes` object.
608
609   .. versionchanged:: 3.7
610      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
611      included in the set of unreserved characters.
612
613   The optional *encoding* and *errors* parameters specify how to deal with
614   non-ASCII characters, as accepted by the :meth:`str.encode` method.
615   *encoding* defaults to ``'utf-8'``.
616   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
617   :class:`UnicodeEncodeError`.
618   *encoding* and *errors* must not be supplied if *string* is a
619   :class:`bytes`, or a :class:`TypeError` is raised.
620
621   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
622   ``quote_from_bytes(string.encode(encoding, errors), safe)``.
623
624   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
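
   A short sketch of how *safe* and *encoding* change the result (the input
   strings are arbitrary examples):

   .. code-block:: python

      from urllib.parse import quote

      default = quote("a b/c?d=e")                # "/" is safe by default
      no_safe = quote("a b/c", safe="")           # now "/" is quoted too
      latin1 = quote("Niño", encoding="latin-1")  # one byte for "ñ"

      # Supplying *encoding* together with bytes input raises TypeError.
      try:
          quote(b"a b", encoding="utf-8")
          bytes_error = None
      except TypeError as exc:
          bytes_error = exc

      print(default)  # a%20b/c%3Fd%3De
      print(no_safe)  # a%20b%2Fc
      print(latin1)   # Ni%F1o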
625
626
627.. function:: quote_plus(string, safe='', encoding=None, errors=None)
628
629   Like :func:`quote`, but also replace spaces with plus signs, as required for
630   quoting HTML form values when building up a query string to go into a URL.
631   Plus signs in the original string are escaped unless they are included in
632   *safe*.  It also does not have *safe* default to ``'/'``.
633
634   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.
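
   To illustrate the differences from :func:`quote`, a small sketch with an
   arbitrary input string:

   .. code-block:: python

      from urllib.parse import quote, quote_plus

      form_value = quote_plus("a b+c/d")         # space -> "+", "+" and "/" quoted
      keep_plus = quote_plus("a b+c", safe="+")  # "+" left alone when in *safe*
      path_style = quote("a b+c/d")              # space -> "%20", "/" kept

      print(form_value)  # a+b%2Bc%2Fd
      print(keep_plus)   # a+b+c
      print(path_style)  # a%20b%2Bc/d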
635
636
637.. function:: quote_from_bytes(bytes, safe='/')
638
639   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
640   :class:`str`, and does not perform string-to-bytes encoding.
641
642   Example: ``quote_from_bytes(b'a&\xef')`` yields
643   ``'a%26%EF'``.
644
645
646.. function:: unquote(string, encoding='utf-8', errors='replace')
647
648   Replace ``%xx`` escapes with their single-character equivalent.
649   The optional *encoding* and *errors* parameters specify how to decode
650   percent-encoded sequences into Unicode characters, as accepted by the
651   :meth:`bytes.decode` method.
652
653   *string* may be either a :class:`str` or a :class:`bytes` object.
654
655   *encoding* defaults to ``'utf-8'``.
656   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
657   by a placeholder character.
658
659   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.
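
   The effect of *encoding* and *errors* can be sketched as follows, using an
   arbitrary percent-encoded string:

   .. code-block:: python

      from urllib.parse import unquote

      utf8 = unquote("caf%C3%A9")                     # valid UTF-8 sequence
      latin1 = unquote("caf%E9", encoding="latin-1")  # same text, Latin-1 bytes
      replaced = unquote("caf%E9")                    # invalid UTF-8 is replaced

      # errors='strict' turns the invalid sequence into an exception.
      try:
          unquote("caf%E9", errors="strict")
          strict_error = None
      except UnicodeDecodeError as exc:
          strict_error = exc

      print(utf8, latin1)     # café café
      print(ascii(replaced))  # 'caf\ufffd'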
660
661   .. versionchanged:: 3.9
662      *string* parameter supports bytes and str objects (previously only str).
663
664
665
666
667.. function:: unquote_plus(string, encoding='utf-8', errors='replace')
668
669   Like :func:`unquote`, but also replace plus signs with spaces, as required
670   for unquoting HTML form values.
671
672   *string* must be a :class:`str`.
673
674   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.
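
   A brief sketch contrasting the two functions on the same arbitrary input:

   .. code-block:: python

      from urllib.parse import unquote, unquote_plus

      plain = unquote("a+b%2Bc")      # literal "+" is left untouched
      form = unquote_plus("a+b%2Bc")  # literal "+" becomes a space first

      print(plain)  # a+b+c
      print(form)   # a b+c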
675
676
677.. function:: unquote_to_bytes(string)
678
679   Replace ``%xx`` escapes with their single-octet equivalent, and return a
680   :class:`bytes` object.
681
682   *string* may be either a :class:`str` or a :class:`bytes` object.
683
684   If it is a :class:`str`, unescaped non-ASCII characters in *string*
685   are encoded into UTF-8 bytes.
686
687   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.
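
   A short sketch of the three input cases described above, using made-up
   inputs:

   .. code-block:: python

      from urllib.parse import unquote_to_bytes

      # Escapes become single octets; no decoding to text takes place.
      print(unquote_to_bytes('Ni%C3%B1o'))  # b'Ni\xc3\xb1o'

      # An unescaped non-ASCII character in a str is encoded as UTF-8.
      print(unquote_to_bytes('ñ%41'))       # b'\xc3\xb1A'

      # bytes input is accepted as well.
      print(unquote_to_bytes(b'a%26b'))     # b'a&b'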
688
689
690.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
691                        errors=None, quote_via=quote_plus)
692
693   Convert a mapping object or a sequence of two-element tuples, which may
694   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string.  If the resulting string is to be used as the *data* argument
   of a POST request made with the :func:`~urllib.request.urlopen` function,
   it must first be encoded to bytes; otherwise a :exc:`TypeError` is
   raised.
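
   A minimal sketch of preparing POST data this way.  The field names and the
   URL are placeholders, and the request is only constructed, never sent.

   .. code-block:: python

      from urllib.parse import urlencode
      from urllib.request import Request

      fields = {'name': 'El Niño', 'year': 1997}  # hypothetical form fields

      query = urlencode(fields)     # str
      print(query)                  # name=El+Ni%C3%B1o&year=1997

      data = query.encode('ascii')  # bytes, as required for POST data

      # Building a Request does not touch the network.
      request = Request('http://www.example.com/submit', data=data)
      print(request.get_method())   # POST

   The result of :func:`urlencode` is pure ASCII, so encoding it with
   ``'ascii'`` cannot fail.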
699
   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function.  By default, :func:`quote_plus` is used, which means spaces are
   quoted as a ``'+'`` character and ``'/'`` characters are encoded as
   ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``).  An alternate function that can be
   passed as *quote_via* is :func:`quote`, which encodes spaces as ``%20``.
   Because *safe* defaults to ``''`` in this function, ``'/'`` characters are
   left unencoded only if ``safe='/'`` is passed as well.  For maximum control
   of what is quoted, use :func:`quote` and specify a value for *safe*.
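
   The choice of *quote_via* can be sketched like this.  The parameter name
   and value are invented for the example.  Note that :func:`urlencode`
   forwards its own *safe* (``''`` by default) to *quote_via*, so ``'/'`` is
   passed explicitly in the last call.

   .. code-block:: python

      from urllib.parse import quote, urlencode

      params = {'path': '/a b/'}  # hypothetical query parameter

      # Default: quote_plus, so ' ' becomes '+' and '/' becomes %2F.
      print(urlencode(params))                              # path=%2Fa+b%2F

      # quote with the forwarded safe='': %20 for spaces, '/' still encoded.
      print(urlencode(params, quote_via=quote))             # path=%2Fa%20b%2F

      # quote with safe='/': slashes are left as they are.
      print(urlencode(params, quote_via=quote, safe='/'))   # path=/a%20b/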
709
710   When a sequence of two-element tuples is used as the *query*
711   argument, the first element of each tuple is a key and the second is a
   value. The value element can itself be a sequence; in that case, if
   the optional parameter *doseq* evaluates to ``True``, an individual
   ``key=value`` pair is generated for each element of the value sequence,
   with the pairs separated by ``'&'``.  The order of parameters in the
   encoded string will match the order of parameter tuples in the sequence.
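
   The effect of *doseq* can be sketched as follows, with made-up keys and
   values:

   .. code-block:: python

      from urllib.parse import urlencode

      pairs = [('tag', ['a', 'b']), ('q', 'x')]  # hypothetical parameters

      # With doseq=True each element of the list gets its own pair.
      print(urlencode(pairs, doseq=True))  # tag=a&tag=b&q=x

      # Without it, the list is converted with str() and quoted as a whole.
      print(urlencode(pairs))              # tag=%5B%27a%27%2C+%27b%27%5D&q=x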
717
718   The *safe*, *encoding*, and *errors* parameters are passed down to
719   *quote_via* (the *encoding* and *errors* parameters are only passed
720   when a query element is a :class:`str`).
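
   As a small sketch of how *encoding* is applied, assuming a made-up
   parameter whose value contains a non-ASCII character:

   .. code-block:: python

      from urllib.parse import urlencode

      # str values are encoded with UTF-8 unless *encoding* says otherwise.
      print(urlencode({'name': 'Niño'}))                        # name=Ni%C3%B1o
      print(urlencode({'name': 'Niño'}, encoding='latin-1'))    # name=Ni%F1o

      # bytes values are quoted as they are; *encoding* is not applied to them.
      print(urlencode({'name': b'Ni\xf1o'}, encoding='utf-8'))  # name=Ni%F1o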
721
722   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
723   provided in this module to parse query strings into Python data structures.
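
   A round trip through these functions might look like the following
   sketch; the parameters are illustrative only.

   .. code-block:: python

      from urllib.parse import parse_qs, parse_qsl, urlencode

      query = urlencode([('tag', ['a', 'b']), ('q', 'El Niño')], doseq=True)
      print(query)             # tag=a&tag=b&q=El+Ni%C3%B1o

      # parse_qs() groups values by key; every value is a list.
      print(parse_qs(query))   # {'tag': ['a', 'b'], 'q': ['El Niño']}

      # parse_qsl() keeps one (key, value) tuple per pair, in order.
      print(parse_qsl(query))  # [('tag', 'a'), ('tag', 'b'), ('q', 'El Niño')]

      # Encoding the parsed mapping again with doseq=True restores the string.
      print(urlencode(parse_qs(query), doseq=True) == query)  # True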
724
   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urllib.parse.urlencode` function can be used to generate the query
   string of a URL or the data for a POST request.
728
729   .. versionchanged:: 3.2
730      *query* supports bytes and string objects.
731
732   .. versionadded:: 3.5
733      *quote_via* parameter.
734
735
736.. seealso::
737
   `WHATWG`_ - URL Living Standard
      Working Group for the URL Standard, which defines URLs, domains, IP
      addresses, the application/x-www-form-urlencoded format, and their API.
741
742   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the
      :mod:`urllib.parse` module should conform to it. Certain deviations may
      be observed, mostly for backward compatibility and to meet de facto
      parsing requirements commonly observed in major browsers.
747
748   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
749      This specifies the parsing requirements of IPv6 URLs.
750
751   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
752      Document describing the generic syntactic requirements for both Uniform Resource
753      Names (URNs) and Uniform Resource Locators (URLs).
754
755   :rfc:`2368` - The mailto URL scheme.
756      Parsing requirements for mailto URL schemes.
757
758   :rfc:`1808` - Relative Uniform Resource Locators
759      This Request For Comments includes the rules for joining an absolute and a
760      relative URL, including a fair number of "Abnormal Examples" which govern the
761      treatment of border cases.
762
763   :rfc:`1738` - Uniform Resource Locators (URL)
764      This specifies the formal syntax and semantics of absolute URLs.
765
766.. _WHATWG: https://url.spec.whatwg.org/
767