# jsoup: Java HTML Parser

**jsoup** is a Java library that makes it easy to work with real-world HTML and XML. It offers a convenient API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, and XPath selectors.

**jsoup** implements the [WHATWG HTML5](https://html.spec.whatwg.org/multipage/) specification, and parses HTML to the same DOM as modern browsers.

* scrape and [parse](https://jsoup.org/cookbook/input/parse-document-from-string) HTML from a URL, file, or string
* find and [extract data](https://jsoup.org/cookbook/extracting-data/selector-syntax), using DOM traversal or CSS selectors
* manipulate the [HTML elements](https://jsoup.org/cookbook/modifying-data/set-html), attributes, and text
* [clean](https://jsoup.org/cookbook/cleaning-html/safelist-sanitizer) user-submitted content against a safe-list, to prevent XSS attacks
* output tidy HTML
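
The sketch below ties several of these features together: it parses an HTML string with a base URI, finds links with a CSS selector, counts paragraphs with an XPath query, rewrites attributes and text, and cleans untrusted input against `Safelist.basic()`. The class and method names (`QuickTour`, `parseAndEdit`, `paragraphCount`, `sanitize`) are hypothetical, and the code assumes a jsoup release recent enough to provide `Safelist` and `selectXpath` is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;
import org.jsoup.select.Elements;

// Hypothetical demo class; requires jsoup on the classpath.
public class QuickTour {
    /** Parses an HTML string, then rewrites its links and intro text. */
    static Document parseAndEdit(String html) {
        // The base URI lets absUrl() resolve relative links such as "/docs".
        Document doc = Jsoup.parse(html, "https://example.com/");

        // CSS selector: every <a> that has an href attribute.
        for (Element link : doc.select("a[href]")) {
            link.attr("href", link.absUrl("href")); // make the link absolute
            link.attr("rel", "nofollow");           // add an attribute
        }

        // Replace the text content of the first <p class="intro">, if present.
        Element intro = doc.selectFirst("p.intro");
        if (intro != null) {
            intro.text("Replaced text");
        }
        return doc;
    }

    /** Counts paragraphs using an XPath query instead of a CSS selector. */
    static int paragraphCount(Document doc) {
        Elements paragraphs = doc.selectXpath("//p");
        return paragraphs.size();
    }

    /** Cleans untrusted input: tags and attributes not on the basic safelist are dropped. */
    static String sanitize(String untrusted) {
        return Jsoup.clean(untrusted, Safelist.basic());
    }

    public static void main(String[] args) {
        Document doc = parseAndEdit(
            "<p class=intro>Hello</p><p><a href='/docs'>Docs</a></p>");
        System.out.println(doc.body().html()); // pretty-printed output
        System.out.println(paragraphCount(doc));
        System.out.println(sanitize(
            "<p><a href='https://example.com/' onclick='x()'>ok</a><script>bad()</script></p>"));
    }
}
```

Splitting parsing, querying, and cleaning into separate methods keeps the untrusted-input path (`sanitize`) independent of the code that edits trusted documents.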

jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating markup to invalid tag soup, and will create a sensible parse tree from any of them.
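
As a small illustration, the sketch below parses a fragment with unclosed `<p>` and `<b>` tags and no `<html>`, `<head>`, or `<body>`. Under the HTML5 tree-construction rules, the second `<p>` closes the first and the still-open `<b>` is re-opened inside it, so the result should be a complete document with two paragraphs, each containing its own `<b>`. The `TagSoup` class and `parseMessy` method are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; requires jsoup on the classpath.
public class TagSoup {
    /** Parses invalid markup into a full HTML5 document tree. */
    static Document parseMessy(String html) {
        // jsoup adds the missing html/head/body elements and repairs nesting.
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        // Unclosed <p> and <b>; the second <p> implicitly closes the first.
        Document doc = parseMessy("<p>Lorem <b>ipsum <p>dolor");
        System.out.println(doc.html()); // well-formed, pretty-printed output
    }
}
```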

See [**jsoup.org**](https://jsoup.org/) for downloads and the full [API documentation](https://jsoup.org/apidocs/).

[![Build Status](https://github.com/jhy/jsoup/workflows/Build/badge.svg)](https://github.com/jhy/jsoup/actions?query=workflow%3ABuild)

## Example
Fetch the [Wikipedia](https://en.wikipedia.org/wiki/Main_Page) homepage, parse it to a [DOM](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction), and select the headlines from the *In the News* section into a list of [Elements](https://jsoup.org/apidocs/org/jsoup/select/Elements.html):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Fetch and parse the page; get() throws IOException on network errors
Document doc = Jsoup.connect("https://en.wikipedia.org/").get();
log(doc.title()); // log() is a format-style helper defined in the full source
Elements newsHeadlines = doc.select("#mp-itn b a"); // bold links in "In the news"
for (Element headline : newsHeadlines) {
  log("%s\n\t%s",
    headline.attr("title"), headline.absUrl("href"));
}
```
[Online sample](https://try.jsoup.org/~LGB7rk_atM2roavV0d-czMt3J_g), [full source](https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/examples/Wikipedia.java).

## Open source
jsoup is an open source project distributed under the liberal [MIT license](https://jsoup.org/license). The source code is available on [GitHub](https://github.com/jhy/jsoup).

## Getting started
1. [Download](https://jsoup.org/download) the latest jsoup jar (or add it to your Maven/Gradle build)
2. Read the [cookbook](https://jsoup.org/cookbook/)
3. Enjoy!
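
Once the jar is on the classpath, a first program might look like this minimal sketch, which parses a local HTML file and prints its `<title>`. The class name, method name, and command-line argument are illustrative assumptions, and the file is assumed to be UTF-8 encoded.

```java
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical first program; requires the jsoup jar on the classpath.
public class FirstSteps {
    /** Parses a local HTML file and returns the text of its <title>. */
    static String titleOf(File file) throws IOException {
        // Contents are read as UTF-8; passing null instead lets jsoup
        // detect the charset from the document, falling back to UTF-8.
        Document doc = Jsoup.parse(file, "UTF-8");
        return doc.title();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: FirstSteps <file.html>");
            return;
        }
        System.out.println(titleOf(new File(args[0])));
    }
}
```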

### Android support
When used in Android projects, [core library desugaring](https://developer.android.com/studio/write/java8-support#library-desugaring) with the [NIO specification](https://developer.android.com/studio/write/java11-nio-support-table) should be enabled to support Java 8+ features.

## Development and support
If you have any questions on how to use jsoup, or have ideas for future development, please get in touch via [jsoup Discussions](https://github.com/jhy/jsoup/discussions).

If you find any issues, please file a [bug](https://jsoup.org/bugs) after checking for duplicates.

The [colophon](https://jsoup.org/colophon) talks about the history of and tools used to build jsoup.

## Status
jsoup is stable and in general release.