# Chrome's URL library

## Layers

There are several conceptual layers in this directory. Going from the lowest
level up, they are:

### Parsing

The `url_parse.*` files are the parser. This code does no string
transformations. Its only job is to take an input string and split out the
components of the URL as best as it can deduce them, for a given type of URL.
Parsing can never fail; it will take its best guess. This layer does not
have logic for determining the type of URL parsing to apply; that decision
needs to be made at a higher layer (the "util" layer below).

Because the parser code is derived (_very_ distantly) from some code in
Mozilla, some of the parser files are in `url/third_party/mozilla/`.

The main header to include for calling the parser is
`url/third_party/mozilla/url_parse.h`.

### Canonicalization

The `url_canon*` files are the canonicalizer. This code will transform specific
URL components or specific types of URLs into a standard form. For some
dangerous or invalid data, the canonicalizer will report that a URL is invalid,
although it will always try its best to produce output (so the calling code
can, for example, show the user an error that the URL is invalid). The
canonicalizer attempts to provide as consistent a representation as possible
without changing the meaning of a URL.

The canonicalizer layer is designed to be independent of the string type of
the embedder, so all string output is done through a `CanonOutput` wrapper
object. An implementation for `std::string` output is provided in
`url_canon_stdstring.h`.

The main header to include for calling the canonicalizer is
`url/url_canon.h`.

### Utility

The `url_util*` files provide a higher-level wrapper around the parser and
canonicalizer.
While it can be called directly, it is designed to be the
foundation for writing URL wrapper objects (the GURL layer and Blink's KURL
object use the Utility layer to implement the low-level logic).

The Utility code makes decisions about URL types and calls the correct parsing
and canonicalization functions for those types. It provides an interface to
register application-specific schemes that have specific requirements.
Sharing this logic between KURL and GURL is important so that URLs are
handled consistently across the application.

The main header to include is `url/url_util.h`.

### Google URL (GURL) and Origin

At the highest layer, a C++ object for representing URLs is provided. This
object uses STL. Most uses need only this layer. Include `url/gurl.h`.

Also at this layer is the Origin object, which exists to make security
decisions on the web. Include `url/origin.h`.

## Historical background

This code was originally a separate library that was designed to be embedded
into both Chrome (which uses STL) and WebKit (which didn't use any STL at the
time). As a result, the parsing, canonicalization, and utility code could
not use STL, or any other common code in Chromium like base.

After WebKit was forked into the Chromium repo and renamed Blink, this
restriction was relaxed somewhat. Blink still provides its own URL object
using its own string type, so the insulation that the Utility layer provides is
still useful. But some STL strings and calls to base functions have gradually
been added in places where doing so is possible.

## Caution about terminology

Due to historical usage, the term "Standard URL" is currently used within the
code to represent "[Special URLs][1]", except for "file:" scheme URLs, as defined
in the URL Standard.
However, this terminology is outdated and can lead to
confusion, particularly now that we are supporting [non-special URLs][2] as well
([crbug/1416006][3]). For the sake of consistency and clarity, it is recommended
to switch to the more accurate term "Special URL" throughout the codebase.
However, this change should be carefully planned and executed due to the
widespread use of the current terminology in both internal and third-party code.
For the time being, "Standard URL" and "Special URL" are used interchangeably.

[1]: https://url.spec.whatwg.org/#is-special
[2]: https://url.spec.whatwg.org/#is-not-special
[3]: https://crbug.com/1416006
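
To make the distinction above concrete, here is a minimal, self-contained
sketch. It is not code from this directory: `kSpecialSchemes`,
`IsSpecialScheme`, and `IsStandardInLegacySense` are hypothetical names invented
for illustration. The scheme list is the set of special schemes given in the URL
Standard, and the second helper mirrors only the simplified description in this
section ("special, except `file:`"). The real Utility layer keeps its own scheme
registry, which embedders can extend, so its notion of "standard" may differ
from this sketch. Both helpers assume the scheme is already lowercased and has
no trailing colon.

```cpp
#include <array>
#include <string_view>

// Special schemes as listed in the URL Standard.
// Hypothetical constant for illustration; not part of the url library.
inline constexpr std::array<std::string_view, 6> kSpecialSchemes = {
    "ftp", "file", "http", "https", "ws", "wss"};

// Hypothetical helper: true if `scheme` (lowercase, no trailing ':') is one
// of the URL Standard's special schemes.
constexpr bool IsSpecialScheme(std::string_view scheme) {
  for (std::string_view special : kSpecialSchemes) {
    if (special == scheme) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: "Standard" in the historical sense described in this
// section, that is, a special scheme other than "file". This deliberately
// ignores any application-registered schemes.
constexpr bool IsStandardInLegacySense(std::string_view scheme) {
  return IsSpecialScheme(scheme) && scheme != "file";
}
```

Under these assumptions, `"https"` and `"wss"` are both special and "standard",
`"file"` is special but not "standard", and a scheme such as `"git"` is
non-special.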