1:mod:`urllib.request` --- Extensible library for opening URLs 2============================================================= 3 4.. module:: urllib.request 5 :synopsis: Extensible library for opening URLs. 6 7.. moduleauthor:: Jeremy Hylton <[email protected]> 8.. sectionauthor:: Moshe Zadka <[email protected]> 9.. sectionauthor:: Senthil Kumaran <[email protected]> 10 11**Source code:** :source:`Lib/urllib/request.py` 12 13-------------- 14 15The :mod:`urllib.request` module defines functions and classes which help in 16opening URLs (mostly HTTP) in a complex world --- basic and digest 17authentication, redirections, cookies and more. 18 19.. seealso:: 20 21 The `Requests package <https://requests.readthedocs.io/en/master/>`_ 22 is recommended for a higher-level HTTP client interface. 23 24.. include:: ../includes/wasm-notavail.rst 25 26The :mod:`urllib.request` module defines the following functions: 27 28 29.. function:: urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None) 30 31 Open *url*, which can be either a string containing a valid, properly 32 encoded URL, or a :class:`Request` object. 33 34 *data* must be an object specifying additional data to be sent to the 35 server, or ``None`` if no such data is needed. See :class:`Request` 36 for details. 37 38 urllib.request module uses HTTP/1.1 and includes ``Connection:close`` header 39 in its HTTP requests. 40 41 The optional *timeout* parameter specifies a timeout in seconds for 42 blocking operations like the connection attempt (if not specified, 43 the global default timeout setting will be used). This actually 44 only works for HTTP, HTTPS and FTP connections. 45 46 If *context* is specified, it must be a :class:`ssl.SSLContext` instance 47 describing the various SSL options. See :class:`~http.client.HTTPSConnection` 48 for more details. 49 50 The optional *cafile* and *capath* parameters specify a set of trusted 51 CA certificates for HTTPS requests. *cafile* should point to a single 52 file containing a bundle of CA certificates, whereas *capath* should 53 point to a directory of hashed certificate files. More information can 54 be found in :meth:`ssl.SSLContext.load_verify_locations`. 55 56 The *cadefault* parameter is ignored. 57 58 This function always returns an object which can work as a 59 :term:`context manager` and has the properties *url*, *headers*, and *status*. 60 See :class:`urllib.response.addinfourl` for more detail on these properties. 61 62 For HTTP and HTTPS URLs, this function returns a 63 :class:`http.client.HTTPResponse` object slightly modified. In addition 64 to the three new methods above, the msg attribute contains the 65 same information as the :attr:`~http.client.HTTPResponse.reason` 66 attribute --- the reason phrase returned by server --- instead of 67 the response headers as it is specified in the documentation for 68 :class:`~http.client.HTTPResponse`. 69 70 For FTP, file, and data URLs and requests explicitly handled by legacy 71 :class:`URLopener` and :class:`FancyURLopener` classes, this function 72 returns a :class:`urllib.response.addinfourl` object. 73 74 Raises :exc:`~urllib.error.URLError` on protocol errors. 75 76 Note that ``None`` may be returned if no handler handles the request (though 77 the default installed global :class:`OpenerDirector` uses 78 :class:`UnknownHandler` to ensure this never happens). 79 80 In addition, if proxy settings are detected (for example, when a ``*_proxy`` 81 environment variable like :envvar:`http_proxy` is set), 82 :class:`ProxyHandler` is default installed and makes sure the requests are 83 handled through the proxy. 84 85 The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been 86 discontinued; :func:`urllib.request.urlopen` corresponds to the old 87 ``urllib2.urlopen``. Proxy handling, which was done by passing a dictionary 88 parameter to ``urllib.urlopen``, can be obtained by using 89 :class:`ProxyHandler` objects. 90 91 .. audit-event:: urllib.Request fullurl,data,headers,method urllib.request.urlopen 92 93 The default opener raises an :ref:`auditing event <auditing>` 94 ``urllib.Request`` with arguments ``fullurl``, ``data``, ``headers``, 95 ``method`` taken from the request object. 96 97 .. versionchanged:: 3.2 98 *cafile* and *capath* were added. 99 100 .. versionchanged:: 3.2 101 HTTPS virtual hosts are now supported if possible (that is, if 102 :data:`ssl.HAS_SNI` is true). 103 104 .. versionadded:: 3.2 105 *data* can be an iterable object. 106 107 .. versionchanged:: 3.3 108 *cadefault* was added. 109 110 .. versionchanged:: 3.4.3 111 *context* was added. 112 113 .. versionchanged:: 3.10 114 HTTPS connection now send an ALPN extension with protocol indicator 115 ``http/1.1`` when no *context* is given. Custom *context* should set 116 ALPN protocols with :meth:`~ssl.SSLContext.set_alpn_protocol`. 117 118 .. deprecated:: 3.6 119 120 *cafile*, *capath* and *cadefault* are deprecated in favor of *context*. 121 Please use :meth:`ssl.SSLContext.load_cert_chain` instead, or let 122 :func:`ssl.create_default_context` select the system's trusted CA 123 certificates for you. 124 125 126.. function:: install_opener(opener) 127 128 Install an :class:`OpenerDirector` instance as the default global opener. 129 Installing an opener is only necessary if you want urlopen to use that 130 opener; otherwise, simply call :meth:`OpenerDirector.open` instead of 131 :func:`~urllib.request.urlopen`. The code does not check for a real 132 :class:`OpenerDirector`, and any class with the appropriate interface will 133 work. 134 135 136.. function:: build_opener([handler, ...]) 137 138 Return an :class:`OpenerDirector` instance, which chains the handlers in the 139 order given. *handler*\s can be either instances of :class:`BaseHandler`, or 140 subclasses of :class:`BaseHandler` (in which case it must be possible to call 141 the constructor without any parameters). Instances of the following classes 142 will be in front of the *handler*\s, unless the *handler*\s contain them, 143 instances of them or subclasses of them: :class:`ProxyHandler` (if proxy 144 settings are detected), :class:`UnknownHandler`, :class:`HTTPHandler`, 145 :class:`HTTPDefaultErrorHandler`, :class:`HTTPRedirectHandler`, 146 :class:`FTPHandler`, :class:`FileHandler`, :class:`HTTPErrorProcessor`. 147 148 If the Python installation has SSL support (i.e., if the :mod:`ssl` module 149 can be imported), :class:`HTTPSHandler` will also be added. 150 151 A :class:`BaseHandler` subclass may also change its :attr:`handler_order` 152 attribute to modify its position in the handlers list. 153 154 155.. function:: pathname2url(path) 156 157 Convert the pathname *path* from the local syntax for a path to the form used in 158 the path component of a URL. This does not produce a complete URL. The return 159 value will already be quoted using the :func:`~urllib.parse.quote` function. 160 161 162.. function:: url2pathname(path) 163 164 Convert the path component *path* from a percent-encoded URL to the local syntax for a 165 path. This does not accept a complete URL. This function uses 166 :func:`~urllib.parse.unquote` to decode *path*. 167 168.. function:: getproxies() 169 170 This helper function returns a dictionary of scheme to proxy server URL 171 mappings. It scans the environment for variables named ``<scheme>_proxy``, 172 in a case insensitive approach, for all operating systems first, and when it 173 cannot find it, looks for proxy information from System 174 Configuration for macOS and Windows Systems Registry for Windows. 175 If both lowercase and uppercase environment variables exist (and disagree), 176 lowercase is preferred. 177 178 .. note:: 179 180 If the environment variable ``REQUEST_METHOD`` is set, which usually 181 indicates your script is running in a CGI environment, the environment 182 variable ``HTTP_PROXY`` (uppercase ``_PROXY``) will be ignored. This is 183 because that variable can be injected by a client using the "Proxy:" HTTP 184 header. If you need to use an HTTP proxy in a CGI environment, either use 185 ``ProxyHandler`` explicitly, or make sure the variable name is in 186 lowercase (or at least the ``_proxy`` suffix). 187 188 189The following classes are provided: 190 191.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None) 192 193 This class is an abstraction of a URL request. 194 195 *url* should be a string containing a valid, properly encoded URL. 196 197 *data* must be an object specifying additional data to send to the 198 server, or ``None`` if no such data is needed. Currently HTTP 199 requests are the only ones that use *data*. The supported object 200 types include bytes, file-like objects, and iterables of bytes-like objects. 201 If no ``Content-Length`` nor ``Transfer-Encoding`` header field 202 has been provided, :class:`HTTPHandler` will set these headers according 203 to the type of *data*. ``Content-Length`` will be used to send 204 bytes objects, while ``Transfer-Encoding: chunked`` as specified in 205 :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables. 206 207 For an HTTP POST request method, *data* should be a buffer in the 208 standard :mimetype:`application/x-www-form-urlencoded` format. The 209 :func:`urllib.parse.urlencode` function takes a mapping or sequence 210 of 2-tuples and returns an ASCII string in this format. It should 211 be encoded to bytes before being used as the *data* parameter. 212 213 *headers* should be a dictionary, and will be treated as if 214 :meth:`add_header` was called with each key and value as arguments. 215 This is often used to "spoof" the ``User-Agent`` header value, which is 216 used by a browser to identify itself -- some HTTP servers only 217 allow requests coming from common browsers as opposed to scripts. 218 For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0 219 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while 220 :mod:`urllib`'s default user agent string is 221 ``"Python-urllib/2.6"`` (on Python 2.6). 222 All header keys are sent in camel case. 223 224 An appropriate ``Content-Type`` header should be included if the *data* 225 argument is present. If this header has not been provided and *data* 226 is not None, ``Content-Type: application/x-www-form-urlencoded`` will 227 be added as a default. 228 229 The next two arguments are only of interest for correct handling 230 of third-party HTTP cookies: 231 232 *origin_req_host* should be the request-host of the origin 233 transaction, as defined by :rfc:`2965`. It defaults to 234 ``http.cookiejar.request_host(self)``. This is the host name or IP 235 address of the original request that was initiated by the user. 236 For example, if the request is for an image in an HTML document, 237 this should be the request-host of the request for the page 238 containing the image. 239 240 *unverifiable* should indicate whether the request is unverifiable, 241 as defined by :rfc:`2965`. It defaults to ``False``. An unverifiable 242 request is one whose URL the user did not have the option to 243 approve. For example, if the request is for an image in an HTML 244 document, and the user had no option to approve the automatic 245 fetching of the image, this should be true. 246 247 *method* should be a string that indicates the HTTP request method that 248 will be used (e.g. ``'HEAD'``). If provided, its value is stored in the 249 :attr:`~Request.method` attribute and is used by :meth:`get_method()`. 250 The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise. 251 Subclasses may indicate a different default method by setting the 252 :attr:`~Request.method` attribute in the class itself. 253 254 .. note:: 255 The request will not work as expected if the data object is unable 256 to deliver its content more than once (e.g. a file or an iterable 257 that can produce the content only once) and the request is retried 258 for HTTP redirects or authentication. The *data* is sent to the 259 HTTP server right away after the headers. There is no support for 260 a 100-continue expectation in the library. 261 262 .. versionchanged:: 3.3 263 :attr:`Request.method` argument is added to the Request class. 264 265 .. versionchanged:: 3.4 266 Default :attr:`Request.method` may be indicated at the class level. 267 268 .. versionchanged:: 3.6 269 Do not raise an error if the ``Content-Length`` has not been 270 provided and *data* is neither ``None`` nor a bytes object. 271 Fall back to use chunked transfer encoding instead. 272 273.. class:: OpenerDirector() 274 275 The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained 276 together. It manages the chaining of handlers, and recovery from errors. 277 278 279.. class:: BaseHandler() 280 281 This is the base class for all registered handlers --- and handles only the 282 simple mechanics of registration. 283 284 285.. class:: HTTPDefaultErrorHandler() 286 287 A class which defines a default handler for HTTP error responses; all responses 288 are turned into :exc:`~urllib.error.HTTPError` exceptions. 289 290 291.. class:: HTTPRedirectHandler() 292 293 A class to handle redirections. 294 295 296.. class:: HTTPCookieProcessor(cookiejar=None) 297 298 A class to handle HTTP Cookies. 299 300 301.. class:: ProxyHandler(proxies=None) 302 303 Cause requests to go through a proxy. If *proxies* is given, it must be a 304 dictionary mapping protocol names to URLs of proxies. The default is to read 305 the list of proxies from the environment variables 306 ``<protocol>_proxy``. If no proxy environment variables are set, then 307 in a Windows environment proxy settings are obtained from the registry's 308 Internet Settings section, and in a macOS environment proxy information 309 is retrieved from the System Configuration Framework. 310 311 To disable autodetected proxy pass an empty dictionary. 312 313 The :envvar:`no_proxy` environment variable can be used to specify hosts 314 which shouldn't be reached via proxy; if set, it should be a comma-separated 315 list of hostname suffixes, optionally with ``:port`` appended, for example 316 ``cern.ch,ncsa.uiuc.edu,some.host:8080``. 317 318 .. note:: 319 320 ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set; 321 see the documentation on :func:`~urllib.request.getproxies`. 322 323 324.. class:: HTTPPasswordMgr() 325 326 Keep a database of ``(realm, uri) -> (user, password)`` mappings. 327 328 329.. class:: HTTPPasswordMgrWithDefaultRealm() 330 331 Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of 332 ``None`` is considered a catch-all realm, which is searched if no other realm 333 fits. 334 335 336.. class:: HTTPPasswordMgrWithPriorAuth() 337 338 A variant of :class:`HTTPPasswordMgrWithDefaultRealm` that also has a 339 database of ``uri -> is_authenticated`` mappings. Can be used by a 340 BasicAuth handler to determine when to send authentication credentials 341 immediately instead of waiting for a ``401`` response first. 342 343 .. versionadded:: 3.5 344 345 346.. class:: AbstractBasicAuthHandler(password_mgr=None) 347 348 This is a mixin class that helps with HTTP authentication, both to the remote 349 host and to a proxy. *password_mgr*, if given, should be something that is 350 compatible with :class:`HTTPPasswordMgr`; refer to section 351 :ref:`http-password-mgr` for information on the interface that must be 352 supported. If *passwd_mgr* also provides ``is_authenticated`` and 353 ``update_authenticated`` methods (see 354 :ref:`http-password-mgr-with-prior-auth`), then the handler will use the 355 ``is_authenticated`` result for a given URI to determine whether or not to 356 send authentication credentials with the request. If ``is_authenticated`` 357 returns ``True`` for the URI, credentials are sent. If ``is_authenticated`` 358 is ``False``, credentials are not sent, and then if a ``401`` response is 359 received the request is re-sent with the authentication credentials. If 360 authentication succeeds, ``update_authenticated`` is called to set 361 ``is_authenticated`` ``True`` for the URI, so that subsequent requests to 362 the URI or any of its super-URIs will automatically include the 363 authentication credentials. 364 365 .. versionadded:: 3.5 366 Added ``is_authenticated`` support. 367 368 369.. class:: HTTPBasicAuthHandler(password_mgr=None) 370 371 Handle authentication with the remote host. *password_mgr*, if given, should 372 be something that is compatible with :class:`HTTPPasswordMgr`; refer to 373 section :ref:`http-password-mgr` for information on the interface that must 374 be supported. HTTPBasicAuthHandler will raise a :exc:`ValueError` when 375 presented with a wrong Authentication scheme. 376 377 378.. class:: ProxyBasicAuthHandler(password_mgr=None) 379 380 Handle authentication with the proxy. *password_mgr*, if given, should be 381 something that is compatible with :class:`HTTPPasswordMgr`; refer to section 382 :ref:`http-password-mgr` for information on the interface that must be 383 supported. 384 385 386.. class:: AbstractDigestAuthHandler(password_mgr=None) 387 388 This is a mixin class that helps with HTTP authentication, both to the remote 389 host and to a proxy. *password_mgr*, if given, should be something that is 390 compatible with :class:`HTTPPasswordMgr`; refer to section 391 :ref:`http-password-mgr` for information on the interface that must be 392 supported. 393 394 395.. class:: HTTPDigestAuthHandler(password_mgr=None) 396 397 Handle authentication with the remote host. *password_mgr*, if given, should 398 be something that is compatible with :class:`HTTPPasswordMgr`; refer to 399 section :ref:`http-password-mgr` for information on the interface that must 400 be supported. When both Digest Authentication Handler and Basic 401 Authentication Handler are both added, Digest Authentication is always tried 402 first. If the Digest Authentication returns a 40x response again, it is sent 403 to Basic Authentication handler to Handle. This Handler method will raise a 404 :exc:`ValueError` when presented with an authentication scheme other than 405 Digest or Basic. 406 407 .. versionchanged:: 3.3 408 Raise :exc:`ValueError` on unsupported Authentication Scheme. 409 410 411 412.. class:: ProxyDigestAuthHandler(password_mgr=None) 413 414 Handle authentication with the proxy. *password_mgr*, if given, should be 415 something that is compatible with :class:`HTTPPasswordMgr`; refer to section 416 :ref:`http-password-mgr` for information on the interface that must be 417 supported. 418 419 420.. class:: HTTPHandler() 421 422 A class to handle opening of HTTP URLs. 423 424 425.. class:: HTTPSHandler(debuglevel=0, context=None, check_hostname=None) 426 427 A class to handle opening of HTTPS URLs. *context* and *check_hostname* 428 have the same meaning as in :class:`http.client.HTTPSConnection`. 429 430 .. versionchanged:: 3.2 431 *context* and *check_hostname* were added. 432 433 434.. class:: FileHandler() 435 436 Open local files. 437 438.. class:: DataHandler() 439 440 Open data URLs. 441 442 .. versionadded:: 3.4 443 444.. class:: FTPHandler() 445 446 Open FTP URLs. 447 448 449.. class:: CacheFTPHandler() 450 451 Open FTP URLs, keeping a cache of open FTP connections to minimize delays. 452 453 454.. class:: UnknownHandler() 455 456 A catch-all class to handle unknown URLs. 457 458 459.. class:: HTTPErrorProcessor() 460 461 Process HTTP error responses. 462 463 464.. _request-objects: 465 466Request Objects 467--------------- 468 469The following methods describe :class:`Request`'s public interface, 470and so all may be overridden in subclasses. It also defines several 471public attributes that can be used by clients to inspect the parsed 472request. 473 474.. attribute:: Request.full_url 475 476 The original URL passed to the constructor. 477 478 .. versionchanged:: 3.4 479 480 Request.full_url is a property with setter, getter and a deleter. Getting 481 :attr:`~Request.full_url` returns the original request URL with the 482 fragment, if it was present. 483 484.. attribute:: Request.type 485 486 The URI scheme. 487 488.. attribute:: Request.host 489 490 The URI authority, typically a host, but may also contain a port 491 separated by a colon. 492 493.. attribute:: Request.origin_req_host 494 495 The original host for the request, without port. 496 497.. attribute:: Request.selector 498 499 The URI path. If the :class:`Request` uses a proxy, then selector 500 will be the full URL that is passed to the proxy. 501 502.. attribute:: Request.data 503 504 The entity body for the request, or ``None`` if not specified. 505 506 .. versionchanged:: 3.4 507 Changing value of :attr:`Request.data` now deletes "Content-Length" 508 header if it was previously set or calculated. 509 510.. attribute:: Request.unverifiable 511 512 boolean, indicates whether the request is unverifiable as defined 513 by :rfc:`2965`. 514 515.. attribute:: Request.method 516 517 The HTTP request method to use. By default its value is :const:`None`, 518 which means that :meth:`~Request.get_method` will do its normal computation 519 of the method to be used. Its value can be set (thus overriding the default 520 computation in :meth:`~Request.get_method`) either by providing a default 521 value by setting it at the class level in a :class:`Request` subclass, or by 522 passing a value in to the :class:`Request` constructor via the *method* 523 argument. 524 525 .. versionadded:: 3.3 526 527 .. versionchanged:: 3.4 528 A default value can now be set in subclasses; previously it could only 529 be set via the constructor argument. 530 531 532.. method:: Request.get_method() 533 534 Return a string indicating the HTTP request method. If 535 :attr:`Request.method` is not ``None``, return its value, otherwise return 536 ``'GET'`` if :attr:`Request.data` is ``None``, or ``'POST'`` if it's not. 537 This is only meaningful for HTTP requests. 538 539 .. versionchanged:: 3.3 540 get_method now looks at the value of :attr:`Request.method`. 541 542 543.. method:: Request.add_header(key, val) 544 545 Add another header to the request. Headers are currently ignored by all 546 handlers except HTTP handlers, where they are added to the list of headers sent 547 to the server. Note that there cannot be more than one header with the same 548 name, and later calls will overwrite previous calls in case the *key* collides. 549 Currently, this is no loss of HTTP functionality, since all headers which have 550 meaning when used more than once have a (header-specific) way of gaining the 551 same functionality using only one header. Note that headers added using 552 this method are also added to redirected requests. 553 554 555.. method:: Request.add_unredirected_header(key, header) 556 557 Add a header that will not be added to a redirected request. 558 559 560.. method:: Request.has_header(header) 561 562 Return whether the instance has the named header (checks both regular and 563 unredirected). 564 565 566.. method:: Request.remove_header(header) 567 568 Remove named header from the request instance (both from regular and 569 unredirected headers). 570 571 .. versionadded:: 3.4 572 573 574.. method:: Request.get_full_url() 575 576 Return the URL given in the constructor. 577 578 .. versionchanged:: 3.4 579 580 Returns :attr:`Request.full_url` 581 582 583.. method:: Request.set_proxy(host, type) 584 585 Prepare the request by connecting to a proxy server. The *host* and *type* will 586 replace those of the instance, and the instance's selector will be the original 587 URL given in the constructor. 588 589 590.. method:: Request.get_header(header_name, default=None) 591 592 Return the value of the given header. If the header is not present, return 593 the default value. 594 595 596.. method:: Request.header_items() 597 598 Return a list of tuples (header_name, header_value) of the Request headers. 599 600.. versionchanged:: 3.4 601 The request methods add_data, has_data, get_data, get_type, get_host, 602 get_selector, get_origin_req_host and is_unverifiable that were deprecated 603 since 3.3 have been removed. 604 605 606.. _opener-director-objects: 607 608OpenerDirector Objects 609---------------------- 610 611:class:`OpenerDirector` instances have the following methods: 612 613 614.. method:: OpenerDirector.add_handler(handler) 615 616 *handler* should be an instance of :class:`BaseHandler`. The following methods 617 are searched, and added to the possible chains (note that HTTP errors are a 618 special case). Note that, in the following, *protocol* should be replaced 619 with the actual protocol to handle, for example :meth:`http_response` would 620 be the HTTP protocol response handler. Also *type* should be replaced with 621 the actual HTTP code, for example :meth:`http_error_404` would handle HTTP 622 404 errors. 623 624 * :meth:`<protocol>_open` --- signal that the handler knows how to open *protocol* 625 URLs. 626 627 See |protocol_open|_ for more information. 628 629 * :meth:`http_error_\<type\>` --- signal that the handler knows how to handle HTTP 630 errors with HTTP error code *type*. 631 632 See |http_error_nnn|_ for more information. 633 634 * :meth:`<protocol>_error` --- signal that the handler knows how to handle errors 635 from (non-\ ``http``) *protocol*. 636 637 * :meth:`<protocol>_request` --- signal that the handler knows how to pre-process 638 *protocol* requests. 639 640 See |protocol_request|_ for more information. 641 642 * :meth:`<protocol>_response` --- signal that the handler knows how to 643 post-process *protocol* responses. 644 645 See |protocol_response|_ for more information. 646 647.. |protocol_open| replace:: :meth:`BaseHandler.<protocol>_open` 648.. |http_error_nnn| replace:: :meth:`BaseHandler.http_error_\<nnn\>` 649.. |protocol_request| replace:: :meth:`BaseHandler.<protocol>_request` 650.. |protocol_response| replace:: :meth:`BaseHandler.<protocol>_response` 651 652.. method:: OpenerDirector.open(url, data=None[, timeout]) 653 654 Open the given *url* (which can be a request object or a string), optionally 655 passing the given *data*. Arguments, return values and exceptions raised are 656 the same as those of :func:`urlopen` (which simply calls the :meth:`open` 657 method on the currently installed global :class:`OpenerDirector`). The 658 optional *timeout* parameter specifies a timeout in seconds for blocking 659 operations like the connection attempt (if not specified, the global default 660 timeout setting will be used). The timeout feature actually works only for 661 HTTP, HTTPS and FTP connections. 662 663 664.. method:: OpenerDirector.error(proto, *args) 665 666 Handle an error of the given protocol. This will call the registered error 667 handlers for the given protocol with the given arguments (which are protocol 668 specific). The HTTP protocol is a special case which uses the HTTP response 669 code to determine the specific error handler; refer to the :meth:`http_error_\<type\>` 670 methods of the handler classes. 671 672 Return values and exceptions raised are the same as those of :func:`urlopen`. 673 674OpenerDirector objects open URLs in three stages: 675 676The order in which these methods are called within each stage is determined by 677sorting the handler instances. 678 679#. Every handler with a method named like :meth:`<protocol>_request` has that 680 method called to pre-process the request. 681 682#. Handlers with a method named like :meth:`<protocol>_open` are called to handle 683 the request. This stage ends when a handler either returns a non-\ :const:`None` 684 value (ie. a response), or raises an exception (usually 685 :exc:`~urllib.error.URLError`). Exceptions are allowed to propagate. 686 687 In fact, the above algorithm is first tried for methods named 688 :meth:`default_open`. If all such methods return :const:`None`, the algorithm 689 is repeated for methods named like :meth:`<protocol>_open`. If all such methods 690 return :const:`None`, the algorithm is repeated for methods named 691 :meth:`unknown_open`. 692 693 Note that the implementation of these methods may involve calls of the parent 694 :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and 695 :meth:`~OpenerDirector.error` methods. 696 697#. Every handler with a method named like :meth:`<protocol>_response` has that 698 method called to post-process the response. 699 700 701.. _base-handler-objects: 702 703BaseHandler Objects 704------------------- 705 706:class:`BaseHandler` objects provide a couple of methods that are directly 707useful, and others that are meant to be used by derived classes. These are 708intended for direct use: 709 710 711.. method:: BaseHandler.add_parent(director) 712 713 Add a director as parent. 714 715 716.. method:: BaseHandler.close() 717 718 Remove any parents. 719 720The following attribute and methods should only be used by classes derived from 721:class:`BaseHandler`. 722 723.. note:: 724 725 The convention has been adopted that subclasses defining 726 :meth:`<protocol>_request` or :meth:`<protocol>_response` methods are named 727 :class:`\*Processor`; all others are named :class:`\*Handler`. 728 729 730.. attribute:: BaseHandler.parent 731 732 A valid :class:`OpenerDirector`, which can be used to open using a different 733 protocol, or handle errors. 734 735 736.. method:: BaseHandler.default_open(req) 737 738 This method is *not* defined in :class:`BaseHandler`, but subclasses should 739 define it if they want to catch all URLs. 740 741 This method, if implemented, will be called by the parent 742 :class:`OpenerDirector`. It should return a file-like object as described in 743 the return value of the :meth:`~OpenerDirector.open` method of :class:`OpenerDirector`, or ``None``. 744 It should raise :exc:`~urllib.error.URLError`, unless a truly exceptional 745 thing happens (for example, :exc:`MemoryError` should not be mapped to 746 :exc:`URLError`). 747 748 This method will be called before any protocol-specific open method. 749 750 751.. _protocol_open: 752.. method:: BaseHandler.<protocol>_open(req) 753 :noindex: 754 755 This method is *not* defined in :class:`BaseHandler`, but subclasses should 756 define it if they want to handle URLs with the given protocol. 757 758 This method, if defined, will be called by the parent :class:`OpenerDirector`. 759 Return values should be the same as for :meth:`default_open`. 760 761 762.. method:: BaseHandler.unknown_open(req) 763 764 This method is *not* defined in :class:`BaseHandler`, but subclasses should 765 define it if they want to catch all URLs with no specific registered handler to 766 open it. 767 768 This method, if implemented, will be called by the :attr:`parent` 769 :class:`OpenerDirector`. Return values should be the same as for 770 :meth:`default_open`. 771 772 773.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs) 774 775 This method is *not* defined in :class:`BaseHandler`, but subclasses should 776 override it if they intend to provide a catch-all for otherwise unhandled HTTP 777 errors. It will be called automatically by the :class:`OpenerDirector` getting 778 the error, and should not normally be called in other circumstances. 779 780 *req* will be a :class:`Request` object, *fp* will be a file-like object with 781 the HTTP error body, *code* will be the three-digit code of the error, *msg* 782 will be the user-visible explanation of the code and *hdrs* will be a mapping 783 object with the headers of the error. 784 785 Return values and exceptions raised should be the same as those of 786 :func:`urlopen`. 787 788 789.. _http_error_nnn: 790.. method:: BaseHandler.http_error_<nnn>(req, fp, code, msg, hdrs) 791 792 *nnn* should be a three-digit HTTP error code. This method is also not defined 793 in :class:`BaseHandler`, but will be called, if it exists, on an instance of a 794 subclass, when an HTTP error with code *nnn* occurs. 795 796 Subclasses should override this method to handle specific HTTP errors. 797 798 Arguments, return values and exceptions raised should be the same as for 799 :meth:`http_error_default`. 800 801 802.. _protocol_request: 803.. method:: BaseHandler.<protocol>_request(req) 804 :noindex: 805 806 This method is *not* defined in :class:`BaseHandler`, but subclasses should 807 define it if they want to pre-process requests of the given protocol. 808 809 This method, if defined, will be called by the parent :class:`OpenerDirector`. 810 *req* will be a :class:`Request` object. The return value should be a 811 :class:`Request` object. 812 813 814.. _protocol_response: 815.. method:: BaseHandler.<protocol>_response(req, response) 816 :noindex: 817 818 This method is *not* defined in :class:`BaseHandler`, but subclasses should 819 define it if they want to post-process responses of the given protocol. 820 821 This method, if defined, will be called by the parent :class:`OpenerDirector`. 822 *req* will be a :class:`Request` object. *response* will be an object 823 implementing the same interface as the return value of :func:`urlopen`. The 824 return value should implement the same interface as the return value of 825 :func:`urlopen`. 826 827 828.. _http-redirect-handler: 829 830HTTPRedirectHandler Objects 831--------------------------- 832 833.. note:: 834 835 Some HTTP redirections require action from this module's client code. If this 836 is the case, :exc:`~urllib.error.HTTPError` is raised. See :rfc:`2616` for 837 details of the precise meanings of the various redirection codes. 838 839 An :class:`HTTPError` exception raised as a security consideration if the 840 HTTPRedirectHandler is presented with a redirected URL which is not an HTTP, 841 HTTPS or FTP URL. 842 843 844.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl) 845 846 Return a :class:`Request` or ``None`` in response to a redirect. This is called 847 by the default implementations of the :meth:`http_error_30\*` methods when a 848 redirection is received from the server. If a redirection should take place, 849 return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the 850 redirect to *newurl*. Otherwise, raise :exc:`~urllib.error.HTTPError` if 851 no other handler should try to handle this URL, or return ``None`` if you 852 can't but another handler might. 853 854 .. note:: 855 856 The default implementation of this method does not strictly follow :rfc:`2616`, 857 which says that 301 and 302 responses to ``POST`` requests must not be 858 automatically redirected without confirmation by the user. In reality, browsers 859 do allow automatic redirection of these responses, changing the POST to a 860 ``GET``, and the default implementation reproduces this behavior. 861 862 863.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs) 864 865 Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the 866 parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response. 867 868 869.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs) 870 871 The same as :meth:`http_error_301`, but called for the 'found' response. 872 873 874.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs) 875 876 The same as :meth:`http_error_301`, but called for the 'see other' response. 877 878 879.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs) 880 881 The same as :meth:`http_error_301`, but called for the 'temporary redirect' 882 response. It does not allow changing the request method from ``POST`` 883 to ``GET``. 884 885 886.. method:: HTTPRedirectHandler.http_error_308(req, fp, code, msg, hdrs) 887 888 The same as :meth:`http_error_301`, but called for the 'permanent redirect' 889 response. It does not allow changing the request method from ``POST`` 890 to ``GET``. 891 892 .. versionadded:: 3.11 893 894 895.. _http-cookie-processor: 896 897HTTPCookieProcessor Objects 898--------------------------- 899 900:class:`HTTPCookieProcessor` instances have one attribute: 901 902.. attribute:: HTTPCookieProcessor.cookiejar 903 904 The :class:`http.cookiejar.CookieJar` in which cookies are stored. 905 906 907.. _proxy-handler: 908 909ProxyHandler Objects 910-------------------- 911 912 913.. method:: ProxyHandler.<protocol>_open(request) 914 :noindex: 915 916 The :class:`ProxyHandler` will have a method :meth:`<protocol>_open` for every 917 *protocol* which has a proxy in the *proxies* dictionary given in the 918 constructor. The method will modify requests to go through the proxy, by 919 calling ``request.set_proxy()``, and call the next handler in the chain to 920 actually execute the protocol. 921 922 923.. _http-password-mgr: 924 925HTTPPasswordMgr Objects 926----------------------- 927 928These methods are available on :class:`HTTPPasswordMgr` and 929:class:`HTTPPasswordMgrWithDefaultRealm` objects. 930 931 932.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd) 933 934 *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and 935 *passwd* must be strings. This causes ``(user, passwd)`` to be used as 936 authentication tokens when authentication for *realm* and a super-URI of any of 937 the given URIs is given. 938 939 940.. method:: HTTPPasswordMgr.find_user_password(realm, authuri) 941 942 Get user/password for given realm and URI, if any. This method will return 943 ``(None, None)`` if there is no matching user/password. 944 945 For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be 946 searched if the given *realm* has no matching user/password. 947 948 949.. _http-password-mgr-with-prior-auth: 950 951HTTPPasswordMgrWithPriorAuth Objects 952------------------------------------ 953 954This password manager extends :class:`HTTPPasswordMgrWithDefaultRealm` to support 955tracking URIs for which authentication credentials should always be sent. 956 957 958.. method:: HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, \ 959 passwd, is_authenticated=False) 960 961 *realm*, *uri*, *user*, *passwd* are as for 962 :meth:`HTTPPasswordMgr.add_password`. *is_authenticated* sets the initial 963 value of the ``is_authenticated`` flag for the given URI or list of URIs. 964 If *is_authenticated* is specified as ``True``, *realm* is ignored. 965 966 967.. method:: HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri) 968 969 Same as for :class:`HTTPPasswordMgrWithDefaultRealm` objects 970 971 972.. method:: HTTPPasswordMgrWithPriorAuth.update_authenticated(self, uri, \ 973 is_authenticated=False) 974 975 Update the ``is_authenticated`` flag for the given *uri* or list 976 of URIs. 977 978 979.. method:: HTTPPasswordMgrWithPriorAuth.is_authenticated(self, authuri) 980 981 Returns the current state of the ``is_authenticated`` flag for 982 the given URI. 983 984 985.. _abstract-basic-auth-handler: 986 987AbstractBasicAuthHandler Objects 988-------------------------------- 989 990 991.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers) 992 993 Handle an authentication request by getting a user/password pair, and re-trying 994 the request. *authreq* should be the name of the header where the information 995 about the realm is included in the request, *host* specifies the URL and path to 996 authenticate for, *req* should be the (failed) :class:`Request` object, and 997 *headers* should be the error headers. 998 999 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an 1000 authority component (e.g. ``"http://python.org/"``). In either case, the 1001 authority must not contain a userinfo component (so, ``"python.org"`` and 1002 ``"python.org:80"`` are fine, ``"joe:[email protected]"`` is not). 1003 1004 1005.. _http-basic-auth-handler: 1006 1007HTTPBasicAuthHandler Objects 1008---------------------------- 1009 1010 1011.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs) 1012 1013 Retry the request with authentication information, if available. 1014 1015 1016.. _proxy-basic-auth-handler: 1017 1018ProxyBasicAuthHandler Objects 1019----------------------------- 1020 1021 1022.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs) 1023 1024 Retry the request with authentication information, if available. 1025 1026 1027.. _abstract-digest-auth-handler: 1028 1029AbstractDigestAuthHandler Objects 1030--------------------------------- 1031 1032 1033.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers) 1034 1035 *authreq* should be the name of the header where the information about the realm 1036 is included in the request, *host* should be the host to authenticate to, *req* 1037 should be the (failed) :class:`Request` object, and *headers* should be the 1038 error headers. 1039 1040 1041.. _http-digest-auth-handler: 1042 1043HTTPDigestAuthHandler Objects 1044----------------------------- 1045 1046 1047.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs) 1048 1049 Retry the request with authentication information, if available. 1050 1051 1052.. _proxy-digest-auth-handler: 1053 1054ProxyDigestAuthHandler Objects 1055------------------------------ 1056 1057 1058.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs) 1059 1060 Retry the request with authentication information, if available. 1061 1062 1063.. _http-handler-objects: 1064 1065HTTPHandler Objects 1066------------------- 1067 1068 1069.. method:: HTTPHandler.http_open(req) 1070 1071 Send an HTTP request, which can be either GET or POST, depending on 1072 ``req.has_data()``. 1073 1074 1075.. _https-handler-objects: 1076 1077HTTPSHandler Objects 1078-------------------- 1079 1080 1081.. method:: HTTPSHandler.https_open(req) 1082 1083 Send an HTTPS request, which can be either GET or POST, depending on 1084 ``req.has_data()``. 1085 1086 1087.. _file-handler-objects: 1088 1089FileHandler Objects 1090------------------- 1091 1092 1093.. method:: FileHandler.file_open(req) 1094 1095 Open the file locally, if there is no host name, or the host name is 1096 ``'localhost'``. 1097 1098 .. versionchanged:: 3.2 1099 This method is applicable only for local hostnames. When a remote 1100 hostname is given, an :exc:`~urllib.error.URLError` is raised. 1101 1102 1103.. _data-handler-objects: 1104 1105DataHandler Objects 1106------------------- 1107 1108.. method:: DataHandler.data_open(req) 1109 1110 Read a data URL. This kind of URL contains the content encoded in the URL 1111 itself. The data URL syntax is specified in :rfc:`2397`. This implementation 1112 ignores white spaces in base64 encoded data URLs so the URL may be wrapped 1113 in whatever source file it comes from. But even though some browsers don't 1114 mind about a missing padding at the end of a base64 encoded data URL, this 1115 implementation will raise an :exc:`ValueError` in that case. 1116 1117 1118.. _ftp-handler-objects: 1119 1120FTPHandler Objects 1121------------------ 1122 1123 1124.. method:: FTPHandler.ftp_open(req) 1125 1126 Open the FTP file indicated by *req*. The login is always done with empty 1127 username and password. 1128 1129 1130.. _cacheftp-handler-objects: 1131 1132CacheFTPHandler Objects 1133----------------------- 1134 1135:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the 1136following additional methods: 1137 1138 1139.. method:: CacheFTPHandler.setTimeout(t) 1140 1141 Set timeout of connections to *t* seconds. 1142 1143 1144.. method:: CacheFTPHandler.setMaxConns(m) 1145 1146 Set maximum number of cached connections to *m*. 1147 1148 1149.. _unknown-handler-objects: 1150 1151UnknownHandler Objects 1152---------------------- 1153 1154 1155.. method:: UnknownHandler.unknown_open() 1156 1157 Raise a :exc:`~urllib.error.URLError` exception. 1158 1159 1160.. _http-error-processor-objects: 1161 1162HTTPErrorProcessor Objects 1163-------------------------- 1164 1165.. method:: HTTPErrorProcessor.http_response(request, response) 1166 1167 Process HTTP error responses. 1168 1169 For 200 error codes, the response object is returned immediately. 1170 1171 For non-200 error codes, this simply passes the job on to the 1172 :meth:`http_error_\<type\>` handler methods, via :meth:`OpenerDirector.error`. 1173 Eventually, :class:`HTTPDefaultErrorHandler` will raise an 1174 :exc:`~urllib.error.HTTPError` if no other handler handles the error. 1175 1176 1177.. method:: HTTPErrorProcessor.https_response(request, response) 1178 1179 Process HTTPS error responses. 1180 1181 The behavior is same as :meth:`http_response`. 1182 1183 1184.. _urllib-request-examples: 1185 1186Examples 1187-------- 1188 1189In addition to the examples below, more examples are given in 1190:ref:`urllib-howto`. 1191 1192This example gets the python.org main page and displays the first 300 bytes of 1193it. :: 1194 1195 >>> import urllib.request 1196 >>> with urllib.request.urlopen('http://www.python.org/') as f: 1197 ... print(f.read(300)) 1198 ... 1199 b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 1200 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html 1201 xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n 1202 <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n 1203 <title>Python Programming ' 1204 1205Note that urlopen returns a bytes object. This is because there is no way 1206for urlopen to automatically determine the encoding of the byte stream 1207it receives from the HTTP server. In general, a program will decode 1208the returned bytes object to string once it determines or guesses 1209the appropriate encoding. 1210 1211The following W3C document, https://www.w3.org/International/O-charset\ , lists 1212the various ways in which an (X)HTML or an XML document could have specified its 1213encoding information. 1214 1215As the python.org website uses *utf-8* encoding as specified in its meta tag, we 1216will use the same for decoding the bytes object. :: 1217 1218 >>> with urllib.request.urlopen('http://www.python.org/') as f: 1219 ... print(f.read(100).decode('utf-8')) 1220 ... 1221 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 1222 "http://www.w3.org/TR/xhtml1/DTD/xhtm 1223 1224It is also possible to achieve the same result without using the 1225:term:`context manager` approach. :: 1226 1227 >>> import urllib.request 1228 >>> f = urllib.request.urlopen('http://www.python.org/') 1229 >>> print(f.read(100).decode('utf-8')) 1230 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 1231 "http://www.w3.org/TR/xhtml1/DTD/xhtm 1232 1233In the following example, we are sending a data-stream to the stdin of a CGI 1234and reading the data it returns to us. Note that this example will only work 1235when the Python installation supports SSL. :: 1236 1237 >>> import urllib.request 1238 >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi', 1239 ... data=b'This data is passed to stdin of the CGI') 1240 >>> with urllib.request.urlopen(req) as f: 1241 ... print(f.read().decode('utf-8')) 1242 ... 1243 Got Data: "This data is passed to stdin of the CGI" 1244 1245The code for the sample CGI used in the above example is:: 1246 1247 #!/usr/bin/env python 1248 import sys 1249 data = sys.stdin.read() 1250 print('Content-type: text/plain\n\nGot Data: "%s"' % data) 1251 1252Here is an example of doing a ``PUT`` request using :class:`Request`:: 1253 1254 import urllib.request 1255 DATA = b'some data' 1256 req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT') 1257 with urllib.request.urlopen(req) as f: 1258 pass 1259 print(f.status) 1260 print(f.reason) 1261 1262Use of Basic HTTP Authentication:: 1263 1264 import urllib.request 1265 # Create an OpenerDirector with support for Basic HTTP Authentication... 1266 auth_handler = urllib.request.HTTPBasicAuthHandler() 1267 auth_handler.add_password(realm='PDQ Application', 1268 uri='https://mahler:8092/site-updates.py', 1269 user='klem', 1270 passwd='kadidd!ehopper') 1271 opener = urllib.request.build_opener(auth_handler) 1272 # ...and install it globally so it can be used with urlopen. 1273 urllib.request.install_opener(opener) 1274 urllib.request.urlopen('http://www.example.com/login.html') 1275 1276:func:`build_opener` provides many handlers by default, including a 1277:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment 1278variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme 1279involved. For example, the :envvar:`http_proxy` environment variable is read to 1280obtain the HTTP proxy's URL. 1281 1282This example replaces the default :class:`ProxyHandler` with one that uses 1283programmatically supplied proxy URLs, and adds proxy authorization support with 1284:class:`ProxyBasicAuthHandler`. :: 1285 1286 proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'}) 1287 proxy_auth_handler = urllib.request.ProxyBasicAuthHandler() 1288 proxy_auth_handler.add_password('realm', 'host', 'username', 'password') 1289 1290 opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler) 1291 # This time, rather than install the OpenerDirector, we use it directly: 1292 opener.open('http://www.example.com/login.html') 1293 1294Adding HTTP headers: 1295 1296Use the *headers* argument to the :class:`Request` constructor, or:: 1297 1298 import urllib.request 1299 req = urllib.request.Request('http://www.example.com/') 1300 req.add_header('Referer', 'http://www.python.org/') 1301 # Customize the default User-Agent header value: 1302 req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)') 1303 r = urllib.request.urlopen(req) 1304 1305:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to 1306every :class:`Request`. To change this:: 1307 1308 import urllib.request 1309 opener = urllib.request.build_opener() 1310 opener.addheaders = [('User-agent', 'Mozilla/5.0')] 1311 opener.open('http://www.example.com/') 1312 1313Also, remember that a few standard headers (:mailheader:`Content-Length`, 1314:mailheader:`Content-Type` and :mailheader:`Host`) 1315are added when the :class:`Request` is passed to :func:`urlopen` (or 1316:meth:`OpenerDirector.open`). 1317 1318.. _urllib-examples: 1319 1320Here is an example session that uses the ``GET`` method to retrieve a URL 1321containing parameters:: 1322 1323 >>> import urllib.request 1324 >>> import urllib.parse 1325 >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}) 1326 >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params 1327 >>> with urllib.request.urlopen(url) as f: 1328 ... print(f.read().decode('utf-8')) 1329 ... 1330 1331The following example uses the ``POST`` method instead. Note that params output 1332from urlencode is encoded to bytes before it is sent to urlopen as data:: 1333 1334 >>> import urllib.request 1335 >>> import urllib.parse 1336 >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}) 1337 >>> data = data.encode('ascii') 1338 >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f: 1339 ... print(f.read().decode('utf-8')) 1340 ... 1341 1342The following example uses an explicitly specified HTTP proxy, overriding 1343environment settings:: 1344 1345 >>> import urllib.request 1346 >>> proxies = {'http': 'http://proxy.example.com:8080/'} 1347 >>> opener = urllib.request.FancyURLopener(proxies) 1348 >>> with opener.open("http://www.python.org") as f: 1349 ... f.read().decode('utf-8') 1350 ... 1351 1352The following example uses no proxies at all, overriding environment settings:: 1353 1354 >>> import urllib.request 1355 >>> opener = urllib.request.FancyURLopener({}) 1356 >>> with opener.open("http://www.python.org/") as f: 1357 ... f.read().decode('utf-8') 1358 ... 1359 1360 1361Legacy interface 1362---------------- 1363 1364The following functions and classes are ported from the Python 2 module 1365``urllib`` (as opposed to ``urllib2``). They might become deprecated at 1366some point in the future. 1367 1368.. function:: urlretrieve(url, filename=None, reporthook=None, data=None) 1369 1370 Copy a network object denoted by a URL to a local file. If the URL 1371 points to a local file, the object will not be copied unless filename is supplied. 1372 Return a tuple ``(filename, headers)`` where *filename* is the 1373 local file name under which the object can be found, and *headers* is whatever 1374 the :meth:`info` method of the object returned by :func:`urlopen` returned (for 1375 a remote object). Exceptions are the same as for :func:`urlopen`. 1376 1377 The second argument, if present, specifies the file location to copy to (if 1378 absent, the location will be a tempfile with a generated name). The third 1379 argument, if present, is a callable that will be called once on 1380 establishment of the network connection and once after each block read 1381 thereafter. The callable will be passed three arguments; a count of blocks 1382 transferred so far, a block size in bytes, and the total size of the file. The 1383 third argument may be ``-1`` on older FTP servers which do not return a file 1384 size in response to a retrieval request. 1385 1386 The following example illustrates the most common usage scenario:: 1387 1388 >>> import urllib.request 1389 >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/') 1390 >>> html = open(local_filename) 1391 >>> html.close() 1392 1393 If the *url* uses the :file:`http:` scheme identifier, the optional *data* 1394 argument may be given to specify a ``POST`` request (normally the request 1395 type is ``GET``). The *data* argument must be a bytes object in standard 1396 :mimetype:`application/x-www-form-urlencoded` format; see the 1397 :func:`urllib.parse.urlencode` function. 1398 1399 :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that 1400 the amount of data available was less than the expected amount (which is the 1401 size reported by a *Content-Length* header). This can occur, for example, when 1402 the download is interrupted. 1403 1404 The *Content-Length* is treated as a lower bound: if there's more data to read, 1405 urlretrieve reads more data, but if less data is available, it raises the 1406 exception. 1407 1408 You can still retrieve the downloaded data in this case, it is stored in the 1409 :attr:`content` attribute of the exception instance. 1410 1411 If no *Content-Length* header was supplied, urlretrieve can not check the size 1412 of the data it has downloaded, and just returns it. In this case you just have 1413 to assume that the download was successful. 1414 1415.. function:: urlcleanup() 1416 1417 Cleans up temporary files that may have been left behind by previous 1418 calls to :func:`urlretrieve`. 1419 1420.. class:: URLopener(proxies=None, **x509) 1421 1422 .. deprecated:: 3.3 1423 1424 Base class for opening and reading URLs. Unless you need to support opening 1425 objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`, 1426 you probably want to use :class:`FancyURLopener`. 1427 1428 By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header 1429 of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number. 1430 Applications can define their own :mailheader:`User-Agent` header by subclassing 1431 :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute 1432 :attr:`version` to an appropriate string value in the subclass definition. 1433 1434 The optional *proxies* parameter should be a dictionary mapping scheme names to 1435 proxy URLs, where an empty dictionary turns proxies off completely. Its default 1436 value is ``None``, in which case environmental proxy settings will be used if 1437 present, as discussed in the definition of :func:`urlopen`, above. 1438 1439 Additional keyword parameters, collected in *x509*, may be used for 1440 authentication of the client when using the :file:`https:` scheme. The keywords 1441 *key_file* and *cert_file* are supported to provide an SSL key and certificate; 1442 both are needed to support client authentication. 1443 1444 :class:`URLopener` objects will raise an :exc:`OSError` exception if the server 1445 returns an error code. 1446 1447 .. method:: open(fullurl, data=None) 1448 1449 Open *fullurl* using the appropriate protocol. This method sets up cache and 1450 proxy information, then calls the appropriate open method with its input 1451 arguments. If the scheme is not recognized, :meth:`open_unknown` is called. 1452 The *data* argument has the same meaning as the *data* argument of 1453 :func:`urlopen`. 1454 1455 This method always quotes *fullurl* using :func:`~urllib.parse.quote`. 1456 1457 .. method:: open_unknown(fullurl, data=None) 1458 1459 Overridable interface to open unknown URL types. 1460 1461 1462 .. method:: retrieve(url, filename=None, reporthook=None, data=None) 1463 1464 Retrieves the contents of *url* and places it in *filename*. The return value 1465 is a tuple consisting of a local filename and either an 1466 :class:`email.message.Message` object containing the response headers (for remote 1467 URLs) or ``None`` (for local URLs). The caller must then open and read the 1468 contents of *filename*. If *filename* is not given and the URL refers to a 1469 local file, the input filename is returned. If the URL is non-local and 1470 *filename* is not given, the filename is the output of :func:`tempfile.mktemp` 1471 with a suffix that matches the suffix of the last path component of the input 1472 URL. If *reporthook* is given, it must be a function accepting three numeric 1473 parameters: A chunk number, the maximum size chunks are read in and the total size of the download 1474 (-1 if unknown). It will be called once at the start and after each chunk of data is read from the 1475 network. *reporthook* is ignored for local URLs. 1476 1477 If the *url* uses the :file:`http:` scheme identifier, the optional *data* 1478 argument may be given to specify a ``POST`` request (normally the request type 1479 is ``GET``). The *data* argument must in standard 1480 :mimetype:`application/x-www-form-urlencoded` format; see the 1481 :func:`urllib.parse.urlencode` function. 1482 1483 1484 .. attribute:: version 1485 1486 Variable that specifies the user agent of the opener object. To get 1487 :mod:`urllib` to tell servers that it is a particular user agent, set this in a 1488 subclass as a class variable or in the constructor before calling the base 1489 constructor. 1490 1491 1492.. class:: FancyURLopener(...) 1493 1494 .. deprecated:: 3.3 1495 1496 :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling 1497 for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x 1498 response codes listed above, the :mailheader:`Location` header is used to fetch 1499 the actual URL. For 401 response codes (authentication required), basic HTTP 1500 authentication is performed. For the 30x response codes, recursion is bounded 1501 by the value of the *maxtries* attribute, which defaults to 10. 1502 1503 For all other response codes, the method :meth:`http_error_default` is called 1504 which you can override in subclasses to handle the error appropriately. 1505 1506 .. note:: 1507 1508 According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests 1509 must not be automatically redirected without confirmation by the user. In 1510 reality, browsers do allow automatic redirection of these responses, changing 1511 the POST to a GET, and :mod:`urllib` reproduces this behaviour. 1512 1513 The parameters to the constructor are the same as those for :class:`URLopener`. 1514 1515 .. note:: 1516 1517 When performing basic authentication, a :class:`FancyURLopener` instance calls 1518 its :meth:`prompt_user_passwd` method. The default implementation asks the 1519 users for the required information on the controlling terminal. A subclass may 1520 override this method to support more appropriate behavior if needed. 1521 1522 The :class:`FancyURLopener` class offers one additional method that should be 1523 overloaded to provide the appropriate behavior: 1524 1525 .. method:: prompt_user_passwd(host, realm) 1526 1527 Return information needed to authenticate the user at the given host in the 1528 specified security realm. The return value should be a tuple, ``(user, 1529 password)``, which can be used for basic authentication. 1530 1531 The implementation prompts for this information on the terminal; an application 1532 should override this method to use an appropriate interaction model in the local 1533 environment. 1534 1535 1536:mod:`urllib.request` Restrictions 1537---------------------------------- 1538 1539 .. index:: 1540 pair: HTTP; protocol 1541 pair: FTP; protocol 1542 1543* Currently, only the following protocols are supported: HTTP (versions 0.9 and 1544 1.0), FTP, local files, and data URLs. 1545 1546 .. versionchanged:: 3.4 Added support for data URLs. 1547 1548* The caching feature of :func:`urlretrieve` has been disabled until someone 1549 finds the time to hack proper processing of Expiration time headers. 1550 1551* There should be a function to query whether a particular URL is in the cache. 1552 1553* For backward compatibility, if a URL appears to point to a local file but the 1554 file can't be opened, the URL is re-interpreted using the FTP protocol. This 1555 can sometimes cause confusing error messages. 1556 1557* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily 1558 long delays while waiting for a network connection to be set up. This means 1559 that it is difficult to build an interactive web client using these functions 1560 without using threads. 1561 1562 .. index:: 1563 single: HTML 1564 pair: HTTP; protocol 1565 1566* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data 1567 returned by the server. This may be binary data (such as an image), plain text 1568 or (for example) HTML. The HTTP protocol provides type information in the reply 1569 header, which can be inspected by looking at the :mailheader:`Content-Type` 1570 header. If the returned data is HTML, you can use the module 1571 :mod:`html.parser` to parse it. 1572 1573 .. index:: single: FTP 1574 1575* The code handling the FTP protocol cannot differentiate between a file and a 1576 directory. This can lead to unexpected behavior when attempting to read a URL 1577 that points to a file that is not accessible. If the URL ends in a ``/``, it is 1578 assumed to refer to a directory and will be handled accordingly. But if an 1579 attempt to read a file leads to a 550 error (meaning the URL cannot be found or 1580 is not accessible, often for permission reasons), then the path is treated as a 1581 directory in order to handle the case when a directory is specified by a URL but 1582 the trailing ``/`` has been left off. This can cause misleading results when 1583 you try to fetch a file whose read permissions make it inaccessible; the FTP 1584 code will try to read it, fail with a 550 error, and then perform a directory 1585 listing for the unreadable file. If fine-grained control is needed, consider 1586 using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing 1587 *_urlopener* to meet your needs. 1588 1589 1590 1591:mod:`urllib.response` --- Response classes used by urllib 1592========================================================== 1593 1594.. module:: urllib.response 1595 :synopsis: Response classes used by urllib. 1596 1597The :mod:`urllib.response` module defines functions and classes which define a 1598minimal file-like interface, including ``read()`` and ``readline()``. 1599Functions defined by this module are used internally by the :mod:`urllib.request` module. 1600The typical response object is a :class:`urllib.response.addinfourl` instance: 1601 1602.. class:: addinfourl 1603 1604 .. attribute:: url 1605 1606 URL of the resource retrieved, commonly used to determine if a redirect was followed. 1607 1608 .. attribute:: headers 1609 1610 Returns the headers of the response in the form of an :class:`~email.message.EmailMessage` instance. 1611 1612 .. attribute:: status 1613 1614 .. versionadded:: 3.9 1615 1616 Status code returned by server. 1617 1618 .. method:: geturl() 1619 1620 .. deprecated:: 3.9 1621 Deprecated in favor of :attr:`~addinfourl.url`. 1622 1623 .. method:: info() 1624 1625 .. deprecated:: 3.9 1626 Deprecated in favor of :attr:`~addinfourl.headers`. 1627 1628 .. attribute:: code 1629 1630 .. deprecated:: 3.9 1631 Deprecated in favor of :attr:`~addinfourl.status`. 1632 1633 .. method:: getcode() 1634 1635 .. deprecated:: 3.9 1636 Deprecated in favor of :attr:`~addinfourl.status`. 1637