jsoup Changelog Archive

Contains change notes for versions 0.1.1 (2010-Jan-31) through 1.17.1 (2023-Nov-27).
More recent changes may be found in CHANGES.md.

Release 1.17.1 [27-Nov-2023]
  * Improvement: in Jsoup.connect(), added support for request-level authentication, supporting authentication to
    proxies and to servers.
    <https://github.com/jhy/jsoup/pull/2046>

  * Improvement: in the Elements list, added direct support for `#set(index, element)`, `#remove(index)`,
    `#remove(object)`, `#clear()`, `#removeAll(collection)`, `#retainAll(collection)`, `#removeIf(filter)`,
    `#replaceAll(operator)`. These methods update the original DOM, as well as the Elements list.
    <https://github.com/jhy/jsoup/pull/2017>

  * Improvement: added the NodeIterator class, to efficiently traverse a node tree using the Iterator interface. And
    added Stream Element#stream() and Node#nodeStream() methods, to enable fluent composable stream pipelines of node
    traversals.
    <https://github.com/jhy/jsoup/pull/2051>

  * Improvement: when changing the OutputSettings syntax to XML, the xhtml EscapeMode is automatically set by default.

  * Improvement: added the `:is(selector list)` pseudo-selector, which finds elements that match any of the selectors in
    the selector list. Useful for making large ORed selectors more readable.

  * Improvement: repackaged the library with native (vs automatic) JPMS module support.
    <https://github.com/jhy/jsoup/pull/2025>

  * Improvement: better fidelity of source positions when tracking is enabled. And implicitly created or closed elements
    are tracked and detectable via Range.isImplicit().
    <https://github.com/jhy/jsoup/pull/2056>

  * Improvement: when source tracking is enabled, the source position for attribute names and values is now available.
    Attribute#sourceRange() provides the ranges.
    <https://github.com/jhy/jsoup/pull/2057>

  * Improvement: when running concurrently under Java 21+ Virtual Threads, virtual threads could be pinned to their
    carrier platform thread when parsing an input stream. To improve performance, particularly when parsing fetched
    URLs, the internal ConstrainableInputStream has been replaced by ControllableInputStream, which avoids the locking
    which caused that pinning.
    <https://github.com/jhy/jsoup/issues/2054>

  * Improvement: in Jsoup.Connect, allow any XML mimetype as a supported mimetype. Was previously limited to
    `{application|text}/xml`. This enables, for example, fetching SVGs with an image/svg+xml mimetype, without having
    to disable mimetype validation.
    <https://github.com/jhy/jsoup/issues/2059>

  * Bugfix: when outputting with XML syntax, HTML elements that were parsed as data nodes (<script> and <style>) should
    be emitted as CDATA nodes, so that they can be parsed correctly by an XML parser.
    <https://github.com/jhy/jsoup/pull/1720>

  * Bugfix: the Immediate Parent selector `>` could match elements above the root context element, causing incorrect
    elements to be returned when used on elements other than the root document.
    <https://github.com/jhy/jsoup/issues/2018>

  * Bugfix: in a sub-query such as `p:has(> span, > i)`, combinators following the `,` Or combinator would be
    incorrectly skipped, such that the sub-query was parsed as `i` instead of `> i`.
    <https://github.com/jhy/jsoup/issues/1707>

  * Bugfix: in W3CDom, if the jsoup input document contained an empty doctype, the conversion would fail with a
    DOMException. Now, said doctype is discarded, and the conversion continues.

  * Bugfix: when cleaning a document containing SVG elements (or other foreign elements that have preserved case names),
    the cleaned output would be incorrectly nested if the safelist had a different case than the input document.
    <https://github.com/jhy/jsoup/issues/2049>

  * Bugfix: when cleaning a document, the output style of unknown self-closing tags from the input was not preserved in
    the output. (So a <foo /> in the input, if safe-listed, would be output as <foo></foo>.)
    <https://github.com/jhy/jsoup/issues/2049>

  * Build Improvement: added a local test proxy implementation, for proxy integration tests.
    <https://github.com/jhy/jsoup/pull/2029>

  * Build Improvement: added tests for HTTPS request support, using a local self-signed cert. Includes proxy tests.
    <https://github.com/jhy/jsoup/pull/2032>

  * Change: the InputStream returned in Connection.Response.bodyStream() is no longer a ConstrainedInputStream, and
    so is not subject to settings such as timeout or maximum size. It is now a plain BufferedInputStream around the
    response stream. Whilst this behaviour was not documented, you may have been inadvertently relying on those
    constraints. The constraints are still applied to other methods such as .parse() and .bufferUp(). So if you do want
    a constrained BufferedInputStream, you may do Connection.Response.bufferUp().bodyStream().
    <https://github.com/jhy/jsoup/issues/2054>

Release 1.16.2 [20-Oct-2023]
  * Improvement: optimized the performance of complex CSS selectors, by adding a cost-based query planner. Evaluators
    are sorted by their relative execution cost, and executed in order of lower to higher cost. This speeds the
    matching process by ensuring that simpler evaluations (such as a tag name match) are conducted prior to more
    complex evaluations (such as an attribute regex, or a deep child scan with a :has).

  * Improvement: added support for <svg> and <math> tags (and their children). This includes tag namespaces and case
    preservation on applicable tags and attributes.
    <https://github.com/jhy/jsoup/pull/2008>

  * Improvement: when converting jsoup Documents to W3C Documents in W3CDom, HTML documents will be placed in the
    `http://www.w3.org/1999/xhtml` namespace by default, per the HTML5 spec. This can be controlled by setting
    `W3CDom#namespaceAware(false)`.
    <https://github.com/jhy/jsoup/pull/1848>

  * Improvement: speed optimized the Structural Evaluators by memoizing previous evaluations. Particularly the `~`
    (any preceding sibling) and `:nth-of-type` selectors are improved.
    <https://github.com/jhy/jsoup/issues/1956>

  * Improvement: tweaked the performance of the Element nextElementSibling, previousElementSibling, firstElementSibling,
    lastElementSibling, firstElementChild, and lastElementChild. They now inplace filter/skip in the child-node list, vs
    having to allocate and scan a complete Element filtered list.

  * Improvement: optimized internal methods that previously called Element.children() to use filter/skip child-node list
    accessors instead, reducing new Element List allocations.

  * Improvement: tweaked the performance of parsing :pseudo selectors.

  * Improvement: when using the `:empty` pseudo-selector, blank textnodes are now considered empty. Previously,
    an element containing any whitespace was not considered empty.
    <https://github.com/jhy/jsoup/issues/1976>

  * Improvement: in forms, <input type="image"> should be excluded from formData() (and hence from form submissions).
    <https://github.com/jhy/jsoup/pull/2010>

  * Improvement: in Safelist, made isSafeTag and isSafeAttribute public methods, for extensibility.
    <https://github.com/jhy/jsoup/issues/1780>

  * Bugfix: `form` elements and empty elements (such as `img`) did not have their attributes de-duplicated.
    <https://github.com/jhy/jsoup/pull/1950>

  * Bugfix: if Document.OutputSettings was cloned from a clone, an NPE would be thrown when used.
    <https://github.com/jhy/jsoup/pull/1964>

  * Bugfix: in Jsoup.connect(url), URL paths containing a %2B were incorrectly recoded to a '+', or a '+' was recoded
    to a ' '. Fixed by reverting to the previous behavior of not encoding supplied paths, other than normalizing to
    ASCII.
    <https://github.com/jhy/jsoup/issues/1952>

  * Bugfix: in Jsoup.connect(url), strings containing supplemental characters (e.g. emoji) were not URL escaped
    correctly.

  * Bugfix: in Jsoup.connect(url), the ConstrainableInputStream would clear Thread interrupts when reading the body.
    This precluded callers from spawning a thread, running a number of requests for a length of time, then joining that
    thread after interrupting it.
    <https://github.com/jhy/jsoup/issues/1991>

  * Bugfix: when tracking HTML source positions, the closing tags for H1...H6 elements were not tracked correctly.
    <https://github.com/jhy/jsoup/issues/1987>

  * Bugfix: in Jsoup.connect(), a DELETE method request did not support a request body.
    <https://github.com/jhy/jsoup/issues/1972>

  * Bugfix: when calling Element.cssSelector() on an extremely deeply nested element, a StackOverflowError could occur.
    Further, a StackOverflowError may occur when running the query.
    <https://github.com/jhy/jsoup/issues/2001>

  * Bugfix: appending a node back to its original Element after empty() would throw an Index out of bounds exception.
    Also, now the child nodes that were removed have their parent node cleared, fully detaching them from the original
    parent.
    <https://github.com/jhy/jsoup/issues/2013>

  * Bugfix: in Jsoup.Connection when adding headers, the value may have been assumed to be an incorrectly decoded
    ISO_8859_1 string, and re-encoded as UTF-8. The value is now left as-is.
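
The header re-encoding failure mode described in the note above can be reproduced with the JDK alone. The sketch
below is an illustration only, not jsoup's code: the class HeaderRecode and its recode() helper are hypothetical
names. It shows that re-reading a value's ISO_8859_1 bytes as UTF-8 repairs a value that really was mis-decoded,
but damages a value that was already correct, which is why leaving the value as-is is the safer default.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of jsoup.
class HeaderRecode {
    // Assumes the value was wrongly decoded as ISO-8859-1 and re-reads its bytes as UTF-8.
    static String recode(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "caf\u00e9";           // already a proper Java string
        String misdecoded = "caf\u00c3\u00a9";  // the UTF-8 bytes of the same text, read as ISO-8859-1

        // The guess is right here, so the original text is recovered.
        System.out.println(recode(misdecoded).equals(correct));
        // The guess is wrong here: the lone 0xE9 byte is not valid UTF-8, so the text is changed.
        System.out.println(recode(correct).equals(correct));
    }
}
```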

  * Change: removed previously deprecated methods Document#normalise, Element#forEach(org.jsoup.helper.Consumer<>),
    Node#forEach(org.jsoup.helper.Consumer<>), and the org.jsoup.helper.Consumer interface; the latter being a
    previously required compatibility shim prior to Android's de-sugaring support.

  * Change: the previous compatibility shim org.jsoup.UncheckedIOException is deprecated in favor of the now supported
    java.io.UncheckedIOException. If you are catching the former, modify your code to catch the latter instead.
    <https://github.com/jhy/jsoup/pull/1989>

  * Change: blocked noscript tags from being added to Safelists, due to incompatibilities between parsers with and
    without script-mode enabled.

Release 1.16.1 [29-Apr-2023]
  * Improvement: in Jsoup.connect(url), natively support URLs with Unicode characters in the path or query string,
    without having to be escaped by the caller.
    <https://github.com/jhy/jsoup/issues/1914>

  * Improvement: Calling Node.remove() on a node with no parent is now a no-op, vs a validation error.
    <https://github.com/jhy/jsoup/issues/1898>

  * Bugfix: aligned the HTML Tree Builder processing steps for AfterBody and AfterAfterBody to the updated WHATWG
    standard, to not pop the stack to close <body> or <html> elements. This prevents an errant </html> closing preceding
    structure. Also added appropriate error message outputs in this case.
    <https://github.com/jhy/jsoup/issues/1851>

  * Bugfix: Corrected support for ruby elements (<ruby>, <rp>, <rt>, and <rtc>) to current spec.
    <https://github.com/jhy/jsoup/issues/1294>

  * Bugfix: When using Node.before(node) or Node.after(node), if the incoming node was a sibling of the context node,
    the incoming node may be inserted into the wrong relative location.
    <https://github.com/jhy/jsoup/issues/1898>

  * Bugfix: In Jsoup.connect(url), if the input URL had components that were already % escaped, they would be escaped
    again, causing errors when fetched.
    <https://github.com/jhy/jsoup/issues/1902>

  * Bugfix: when tracking input source positions, text in tables that was fostered had invalid positions.
    <https://github.com/jhy/jsoup/issues/1927>

  * Bugfix: If the Document.OutputSettings class was initialized, and then Entities.escape(String) called, an NPE may be
    thrown due to a class loading circular dependency.
    <https://github.com/jhy/jsoup/issues/1910>

  * Bugfix: when pretty-printing, the first inline Element or Comment in a block would not be wrap-indented if it were
    preceded by a blank text node.
    <https://github.com/jhy/jsoup/issues/1906>

  * Bugfix: when pretty-printing a <pre> containing block tags, those tags were incorrectly indented.
    <https://github.com/jhy/jsoup/issues/1891>

  * Bugfix: when pretty-printing nested inlineable blocks (such as a <p> in a <td>), the inner element should be
    indented.
    <https://github.com/jhy/jsoup/issues/1926>

  * Bugfix: <br> tags should be wrap-indented when in block tags (and not when in inline tags).
    <https://github.com/jhy/jsoup/issues/1911>

  * Bugfix: the contents of a sufficiently large <textarea> with un-escaped HTML closing tags may be incorrectly parsed
    to an empty node.
    <https://github.com/jhy/jsoup/issues/1929>

Release 1.15.4 [18-Feb-2023]
  * Improvement: added the ability to escape CSS selectors (tags, IDs, classes) to match elements that don't follow
    regular CSS syntax. For example, to match by classname <p class="one.two">, use document.select("p.one\\.two");
    <https://github.com/jhy/jsoup/issues/838>

  * Improvement: when pretty-printing, wrap text that follows a <br> tag.
    <https://github.com/jhy/jsoup/issues/1858>

  * Improvement: when pretty-printing, normalize newlines that follow self-closing tags in custom tags.
    <https://github.com/jhy/jsoup/issues/1852>

  * Improvement: when pretty-printing, collapse non-significant whitespace between a block and an inline tag.
    <https://github.com/jhy/jsoup/issues/1802>

  * Improvement: in Element#forEach and Node#forEachNode, use java.util.function.Consumer instead of the previous
    Android compatibility shim org.jsoup.helper.Consumer. Subsequently, the latter has been deprecated.
    <https://github.com/jhy/jsoup/pull/1870>

  * Improvement: added a new method Document#forms(), to conveniently retrieve a List<FormElement> containing the <form>
    elements in a document.

  * Improvement: added a new method Document#expectForm(query), to find the first matching FormElement, or blow up
    trying.

  * Bugfix: URLs containing characters such as [ and ] were not escaped correctly, and would throw a
    MalformedURLException when fetched.
    <https://github.com/jhy/jsoup/issues/1873>

  * Bugfix: Element.cssSelector would create invalid selectors for elements where the tag name, ID, or classnames needed
    to be escaped (e.g. if a class name contained a ':' or '.').
    <https://github.com/jhy/jsoup/issues/1742>

  * Bugfix: element.text() should have a space between a block and an inline element.
    <https://github.com/jhy/jsoup/issues/1877>

  * Bugfix: if a Node or an Element was replaced with itself, that node would incorrectly be orphaned.
    <https://github.com/jhy/jsoup/issues/1843>

  * Bugfix: form data on a previous request was copied to a new request in newRequest(), resulting in an accumulation of
    form data when executing multi-step form submissions, or data sent to later requests incorrectly.
    Now, newRequest() only copies session related settings (cookies, proxy settings, user-agent, etc) but not the
    request data nor the body.
    <https://github.com/jhy/jsoup/issues/1778>

  * Bugfix: fixed an issue in Safelist.removeAttributes which could throw a ConcurrentModificationException when using
    the ":all" pseudo-attribute.

  * Bugfix: given extremely deeply nested HTML, a number of methods in Element could throw a StackOverflowError due
    to excessive recursion. Namely: #data(), #hasText(), #parents(), and #wrap(html).
    <https://github.com/jhy/jsoup/issues/1864>

  * Change: deprecated the unused Document#normalise() method. Normalization occurs during the HTML tree construction,
    and no longer as a distinct phase.

Release 1.15.3 [2022-Aug-24]
  * Security: fixed an issue where the jsoup cleaner may incorrectly sanitize crafted XSS attempts if
    Safelist.preserveRelativeLinks is enabled.
    <https://github.com/jhy/jsoup/security/advisories/GHSA-gp7f-rwcx-9369>

  * Improvement: the Cleaner will preserve the source position of cleaned elements, if source tracking is enabled in the
    original parse.

  * Improvement: the error messages output from Validate are more descriptive. Exceptions are now ValidationExceptions
    (extending IllegalArgumentException). Stack traces do not include the Validate class, to make it simpler to see
    where the exception originated. Common validation errors including malformed URLs and empty selector results have
    more explicit error messages.

  * Bugfix: the DataUtil would incorrectly read from InputStreams that emitted reads less than the requested size. This
    led to incorrect results when parsing from chunked server responses, for example.
    <https://github.com/jhy/jsoup/issues/1807>

  * Build Improvement: added implementation version and related fields to the jar manifest.
    <https://github.com/jhy/jsoup/issues/1809>

*** Release 1.15.2 [2022-Jul-04]
  * Improvement: added the ability to track the position (line, column, index) in the original input source from where
    a given node was parsed. Accessible via Node.sourceRange() and Element.endSourceRange().
    <https://github.com/jhy/jsoup/pull/1790>

  * Improvement: added Element.firstElementChild(), Element.lastElementChild(), Node.firstChild(), Node.lastChild(),
    as convenient accessors to those child nodes and elements.

  * Improvement: added Element.expectFirst(cssQuery), which is just like Element.selectFirst(), but instead of returning
    a null if there is no match, will throw an IllegalArgumentException. This is useful if you want to simply abort
    processing if an expected match is not found.

  * Improvement: when pretty-printing HTML, doctypes are emitted on a newline if there is a preceding comment.
    <https://github.com/jhy/jsoup/pull/1664>

  * Improvement: when pretty-printing, trim the leading and trailing spaces of textnodes in block tags when possible,
    so that they are indented correctly.
    <https://github.com/jhy/jsoup/issues/1798>

  * Improvement: in Element#selectXpath(), disable namespace awareness. This makes it possible to always select elements
    by their simple local name, regardless of whether an xmlns attribute was set.
    <https://github.com/jhy/jsoup/issues/1801>

  * Bugfix: when using the readToByteBuffer method, such as in Connection.Response.body(), if the document has not
    already been parsed and must be read fully, and there is any maximum buffer size being applied, only the default
    internal buffer size is read.
    <https://github.com/jhy/jsoup/issues/1774>

  * Bugfix: when serializing HTML, newlines in elements descending from a pre tag were incorrectly skipped. That caused
    what should have been preformatted output to instead be a run of text.
    <https://github.com/jhy/jsoup/issues/1776>

  * Bugfix: when pretty-print serializing HTML, newlines separating phrasing content (e.g. a <span> tag within a <p>
    tag) would be incorrectly skipped, instead of normalized to a space. Additionally, improved space normalization
    between other end of line occurrences, and whitespace handling after a closing </body>.
    <https://github.com/jhy/jsoup/issues/1787>

*** Release 1.15.1 [2022-May-15]
  * Change: removed previously deprecated methods and classes (including org.jsoup.safety.Whitelist; use
    org.jsoup.safety.Safelist instead).

  * Improvement: when converting jsoup Documents to W3C Documents in W3CDom, preserve HTML valid attribute names if the
    input document is using the HTML syntax. (Previously, would always coerce using the more restrictive XML syntax.)
    <https://github.com/jhy/jsoup/pull/1648>

  * Improvement: added the :containsWholeText(text) selector, to match against non-normalized Element text. That can be
    useful when elements can only be distinguished by e.g. specific case, or leading whitespace, etc.
    <https://github.com/jhy/jsoup/issues/1636>

  * Improvement: added Element#wholeOwnText() to retrieve the original (non-normalized) ownText of an Element. Also
    added the :containsWholeOwnText(text) selector, to match against that. BR elements are now treated as newlines
    in the wholeText methods.
    <https://github.com/jhy/jsoup/issues/1636>

  * Improvement: added the :matchesWholeText(regex) and :matchesWholeOwnText(regex) selectors, to match against whole
    (non-normalized, case sensitive) element text and own text, respectively.
    <https://github.com/jhy/jsoup/issues/1636>

  * Improvement: when evaluating an XPath query against a context element, the complete document is now visible to the
    query, vs only the context element's sub-tree. This enables support for queries outside (parent or sibling) the
    element, e.g.
    ancestor-or-self::*.
    <https://github.com/jhy/jsoup/issues/1652>

  * Improvement: allow a maxPaddingWidth on the indent level in OutputSettings when pretty printing. This defaults to
    30 to limit the indent level for very deeply nested elements, and may be disabled by setting to -1.
    <https://github.com/jhy/jsoup/pull/1655>

  * Improvement: when cloning a Node or an Element, the clone gets a cloned OwnerDocument containing only that clone, so
    as to preserve applicable settings, such as the Pretty Print settings.
    <https://github.com/jhy/jsoup/issues/763>

  * Improvement: added a convenience method Jsoup.parse(File).
    <https://github.com/jhy/jsoup/issues/1693>

  * Improvement: in the NodeTraversor, added default implementations for NodeVisitor.tail() and NodeFilter.tail(), so
    that code using only head() methods can be written as lambdas.

  * Improvement: in NodeTraversor, added support for removing nodes via Node.remove() during NodeVisitor.head().
    <https://github.com/jhy/jsoup/issues/1699>

  * Improvement: added Node.forEachNode(Consumer<Node>) and Element.forEach(Consumer<Element>) methods, to efficiently
    traverse the DOM with a functional interface.
    <https://github.com/jhy/jsoup/issues/1700>

  * Bugfix: boolean attribute names should be case-insensitive, but were not when the parser was configured to preserve
    case.
    <https://github.com/jhy/jsoup/issues/1656>

  * Bugfix: when reading from SequenceInputStreams across the buffer, the input stream was closed too early, resulting
    in missed content.
    <https://github.com/jhy/jsoup/pull/1671>

  * Bugfix: a comment with all dashes (<!----->) should not emit a parse error.
    <https://github.com/jhy/jsoup/issues/1667>

  * Bugfix: when throwing a SelectorParseException for an invalid selector, don't try to String.format the input, as
    that could throw an IllegalFormatException.
    <https://github.com/jhy/jsoup/issues/1691>

  * Bugfix: when serializing HTML with Pretty Print enabled, extraneous whitespace may be added on closing tags, or
    extra newlines may be added at the end of script blocks.
    <https://github.com/jhy/jsoup/issues/1688>
    <https://github.com/jhy/jsoup/issues/1689>

  * Bugfix: when copy-creating a Safelist from another, perform a deep-copy of the original's settings, so that changes
    to the original after creation do not affect the copy.
    <https://github.com/jhy/jsoup/pull/1763>

  * Bugfix [Fuzz]: speed improvement when parsing constructed HTML containing very deeply incorrectly stacked formatting
    elements with many attributes.
    <https://github.com/jhy/jsoup/issues/1695>

  * Bugfix [Fuzz]: during parsing, a StackOverflowError was possible given crafted HTML with hundreds of nested
    table elements followed by invalid formatting elements.
    <https://github.com/jhy/jsoup/issues/1697>

*** Release 1.14.3 [2021-Sep-30]
  * Improvement: added native XPath support in Element#selectXpath(String)
    <https://github.com/jhy/jsoup/pull/1629>

  * Improvement: added full support for the <template> tag to the HTML5 parser spec.
    <https://github.com/jhy/jsoup/issues/1634>

  * Improvement: added support in CharacterReader to track newlines, so that parse errors can be reported more
    intuitively.
    <https://github.com/jhy/jsoup/pull/1624>

  * Improvement: tracked parse errors now have more details, including the erroneous token, to help clarify the errors.

  * Improvement: speed and memory optimizations for the :has(subquery) selector.

  * Improvement: the :contains(text) and :containsOwn(text) selectors are now whitespace normalized, aligning to the
    document text that they are matching against.
    <https://github.com/jhy/jsoup/issues/876>

  * Improvement: in Element, speed optimized adopting all of an element's child nodes into a currently empty element.
    Improves the HTML adoption agency algorithm when adopting elements with many children.
    <https://github.com/jhy/jsoup/issues/1638>

  * Improvement: increased the parse speed when in RCData (e.g. <title>) and unescaped <tag> tokens are found, by
    memoizing the </title> scan and reducing GC.
    <https://github.com/jhy/jsoup/issues/1644>

  * Improvement: when parsing custom tags (in HTML or XML), added a flyweight cache on Tag.valueOf(name) to reduce
    memory overhead when many tags are repeated. Also tuned other areas of the parser when many very deeply stacked
    custom elements were present.
    <https://github.com/jhy/jsoup/issues/1646>

  * Bugfix: when tracking errors or checking for validity in the Cleaner, errors were incorrectly raised for missing
    optional closing tags.

  * Bugfix: the OSGi bundle meta-data incorrectly set a version on the import of javax.annotation (used as a build-time
    dependency for nullability assertions).
    <https://github.com/jhy/jsoup/issues/1616>

  * Bugfix: the Attributes::equals() method was sensitive to the order of its contents, but it should not be.
    <https://github.com/jhy/jsoup/issues/1492>

  * Bugfix: when the HTML parser was configured to preserve case, Element text methods would miss adding whitespace for
    "BR" tags.

  * Bugfix: attribute names are now normalized & validated correctly for the specific output syntax (HTML or XML).
    Previously, syntactically invalid attribute names could be output by the html() methods. Such attributes are still
    available in the DOM, and will be normalized if possible on output.
    <https://github.com/jhy/jsoup/issues/1474>

  * Bugfix [Fuzz]: fixed an IOOB when an empty select tag was followed by a body tag that needed reparenting.
    <https://github.com/jhy/jsoup/issues/1639>

  * Build Improvement: fixed nullability annotations for Node.equals(other) and other equals methods.
    <https://github.com/jhy/jsoup/issues/1628>

  * Build Improvement: added JDK 17 to the CI builds.
    <https://github.com/jhy/jsoup/pull/1641>

*** Release 1.14.2 [2021-Aug-15]
  * Improvement: support Pattern.quote \Q and \E escapes in the selector regex matchers.
    <https://github.com/jhy/jsoup/pull/1536>

  * Improvement: Element.absUrl() now supports tel: URLs, and other URLs that are already absolute but that Java does
    not have input stream handlers for.
    <https://github.com/jhy/jsoup/issues/1610>

  * Bugfix: when serializing output, escape characters that are in the < 0x20 range. This improves XML output
    compatibility, and makes HTML output with these characters easier to read (as they're otherwise invisible).
    <https://github.com/jhy/jsoup/issues/1556>

  * Bugfix: the *|el wildcard namespace selector now also matches elements with no namespace.
    <https://github.com/jhy/jsoup/issues/1565>

  * Bugfix: corrected a potential case of the parser input stream not being closed immediately on a read exception.

  * Bugfix: when making an HTTP POST, if the request write fails, make sure the connection is immediately cleaned up.

  * Bugfix: in the XML parser, XML processing instructions without attributes would be serialized as if they did.
    <https://github.com/jhy/jsoup/issues/770>

  * Bugfix: updated the HtmlTreeParser resetInsertionMode to the current spec for supported elements.
    <https://github.com/jhy/jsoup/issues/1491>

  * Bugfix: fixed an NPE when parsing fragment HTML into a standalone table element.
    <https://github.com/jhy/jsoup/issues/1603>

  * Bugfix: fixed an NPE when parsing fragment heading HTML into a standalone p element.
    <https://github.com/jhy/jsoup/issues/1601>

  * Bugfix: fixed an IOOB when parsing a formatting fragment into a standalone p element.
    <https://github.com/jhy/jsoup/issues/1602>

  * Bugfix: tag names must start with an ascii-alpha character.
    <https://github.com/jhy/jsoup/issues/1006>

  * Bugfix [Fuzz]: fixed a slow parse when a tag or an attribute name has thousands of null characters in it.
    <https://github.com/jhy/jsoup/issues/1580>

  * Bugfix [Fuzz]: the adoption agency algorithm can have an incorrect bookmark position.
    <https://github.com/jhy/jsoup/issues/1576>

  * Bugfix [Fuzz]: malformed HTML could result in null elements on stack.
    <https://github.com/jhy/jsoup/issues/1579>

  * Bugfix [Fuzz]: malformed deeply nested table elements could create a stack overflow.
    <https://github.com/jhy/jsoup/issues/1577>

  * Bugfix [Fuzz]: Speed optimized parsing of malformed HTML that creates elements with thousands of attributes, by
    limiting the attribute count per element when parsing to 512 (in real-world HTML, P99 is ~ 8).
    <https://github.com/jhy/jsoup/issues/1578>

  * Bugfix [Fuzz]: Speed improvement for the foster formatting elements algo, by limiting how far up a crafted stack
    to scan.
    <https://github.com/jhy/jsoup/issues/1593>

  * Bugfix [Fuzz]: Speed improvement when parsing crafted HTML when transferring form attributes.
    <https://github.com/jhy/jsoup/issues/1595>

  * Bugfix [Fuzz]: Speed improvement when the stack was thousands of items deep, and non-matching close tags sent.
    <https://github.com/jhy/jsoup/issues/1596>

  * Bugfix [Fuzz]: Speed improvement when an attribute name is 600K of quote characters or otherwise needs accumulation
    vs being able to read in one hit.
    <https://github.com/jhy/jsoup/issues/1605>

  * Bugfix [Fuzz]: Speed improvement when closing missing empty tags (in XML comment processed as HTML) when thousands
    deep in stack.
    <https://github.com/jhy/jsoup/issues/1606>

  * Bugfix [Fuzz]: Fix a potential stack-overflow in the parser given crafted HTML, when the parser looped in the
    InSelectInTable state.

  * Bugfix [Fuzz]: Fix an IOOB when the HTML root was cleared from the stack and then attributes were merged onto it.
    <https://github.com/jhy/jsoup/issues/1611>

  * Bugfix [Fuzz]: Improved the speed of parsing when crafted HTML contains hundreds of active formatting elements
    that were copied for all new elements (similar to an amplification attack). The number of considered active
    formatting elements that will be cloned when mis-nested is now capped to 12.
    <https://github.com/jhy/jsoup/issues/1613>

*** Release 1.14.1 [2021-Jul-10]
  * Change: updated the minimum supported Java version from Java 7 to Java 8.

  * Change: updated the minimum Android API level from 8 to 10.

  * Change: although Node#childNodes() returns an UnmodifiableList as a view into its children, it was still
    directly backed by the internal child list. That made some uses, such as looping and moving those children to
    another element, throw a ConcurrentModificationException. Now this method returns its own list so that they are
    separated and changes to the parent's contents will not impact the children view. This aligns with similar methods
    such as Element#children(). If you have code that iterates this list and makes parenting changes to its contents,
    you may need to make a code update.
    <https://github.com/jhy/jsoup/issues/1431>

  * Change: the org.jsoup.Connection interface has been modified to introduce new methods for sessions and the cookie
    store. If you have a custom implementation of this interface, you will need to add implementations of these methods.

  * Improvement: added HTTP request session management support with Jsoup.newSession().
    This extends the Connection implementation to support (optional) sessions, which allow request defaults
    (timeout, proxy, etc) to be set once and then applied to all requests within that session.

    Cookies are re-implemented to correctly support path and domain filtering when used within a session. A default
    in-memory cookie store is used for the session, or a custom implementation (perhaps disk-persistent, or pre-set)
    can be used instead.

    Forms submitted using FormElement#submit() use the same session that was used to fetch the document and so pass
    cookies and other defaults appropriately.

    The session is multi-thread safe and can execute multiple requests concurrently. If the user accidentally tries to
    execute the same request object across multiple threads (vs calling Connection#newRequest()),
    that is detected cleanly and a clear exception is thrown (vs weird blowups in input stream reading, or forcing
    everything through a synchronized bottleneck).
    <https://github.com/jhy/jsoup/pull/1476>

  * Improvement: renamed the Whitelist class to Safelist, with the goal of more inclusive language. A shim is provided
    for backwards compatibility (source and binary). This shim is marked as deprecated and will be removed in the
    jsoup 1.15.1 release.
    <https://github.com/jhy/jsoup/pull/1464>

  * Improvement: added support for Internationalized Domain Names (IDNs) in Jsoup.Connect.
    <https://github.com/jhy/jsoup/issues/1300>

  * Improvement: added support for loading and parsing gzipped HTML files in Jsoup.parse(File in, charset, baseUri).

  * Improvement: reduced thread contention in HttpConnection and Document.
    <https://github.com/jhy/jsoup/pull/1455>

  * Improvement: better parsing performance when under high thread concurrency.
    <https://github.com/jhy/jsoup/pull/1402>

  * Improvement: added Element#id(String) ID attribute setter.
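
For background on the IDN entry above: an internationalized host name has to be converted to its ASCII (Punycode)
form before it can be used in a request. The sketch below shows that conversion with the JDK's java.net.IDN class
only. It is an illustration, not jsoup's implementation; the class and method names (IdnHostSketch, toAsciiHost)
are hypothetical.

```java
import java.net.IDN;

final class IdnHostSketch {
    // Converts a Unicode host name to its ASCII-compatible (Punycode) form.
    // Hosts that are already plain ASCII are returned unchanged.
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) {
        // "\u00fc" is the letter u with umlaut, written as an escape to avoid source-encoding issues.
        System.out.println(toAsciiHost("b\u00fccher.example")); // prints xn--bcher-kva.example
        System.out.println(toAsciiHost("example.com"));         // prints example.com
    }
}
```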

  * Improvement: in Document, #body() and #head() accessors will now automatically create those elements, if they were
    missing (e.g. if the Document was not parsed from HTML). Additionally, the #body() method returns the frameset
    element (instead of null) for frameset documents.

  * Improvement: when cleaning a document, the output settings of the original document are cloned into the cleaned
    document.
    <https://github.com/jhy/jsoup/issues/1417>

  * Improvement: when parsing XML, disable pretty-printing by default.
    <https://github.com/jhy/jsoup/issues/1168>

  * Improvement: much better performance in Node#clone() for large and deeply nested documents. Complexity was O(n^2) or
    worse, now O(n).

  * Improvement: during traversal using the NodeTraversor, nodes may now be replaced with Node#replaceWith(Node).
    <https://github.com/jhy/jsoup/issues/1289>

  * Improvement: added Element#appendChildren and Element#prependChildren, as convenience methods in addition to
    Element#insertChildren(index, children), for bulk moving nodes.

  * Improvement: better clean-up of relative URLs with too many .. segments.
    <https://github.com/jhy/jsoup/pull/1482>

  * Build Improvement: integrated jsoup into the OSS Fuzz project, which semi-randomly generates millions of different
    HTML and XML input files, searching for areas to improve in the parser for increased robustness and throughput.
    <https://github.com/jhy/jsoup/issues/1502>

  * Build Improvement: integrated with GitHub's CodeQL static code analyzer.
    <https://github.com/jhy/jsoup/pull/1494>

  * Build Improvement: moved to GitHub Workflows for build verification.

  * Build Improvement: updated Jetty (used for integration tests; not bundled) to 9.4.42.

  * Build Improvement: added nullability annotations and initial settings.
    <https://github.com/jhy/jsoup/pull/1467>

  * Bugfix: corrected the adoption agency algorithm, to handle cases where e.g. an <a> tag incorrectly nests further <a>
    tags.
    <https://github.com/jhy/jsoup/pull/1517> <https://github.com/jhy/jsoup/issues/845>

  * Bugfix: when parsing HTML, could throw NPEs on some tags (isindex or table>input).
    <https://github.com/jhy/jsoup/issues/1404>

  * Bugfix: in HttpConnection.Request, headers beginning with "sec-" (e.g. Sec-Fetch-Mode) were silently discarded by
    the underlying Java HttpURLConnection. These are now settable correctly.
    <https://github.com/jhy/jsoup/issues/1461>

  * Bugfix: when adding child Nodes to a Node, could incorrectly reparent all nodes if the first parent had the same
    length of children as the incoming node list.

  * Bugfix: when wrapping an orphaned element, would throw an NPE.

  * Bugfix: when wrapping an element with HTML that included multiple sibling elements, those siblings were incorrectly
    added as children of the wrapper instead of siblings.

  * Bugfix: when setting the content of a script or style tag via the Element#html(String) method, the content is now
    treated as a DataNode, not a TextNode. This means that characters like '<' will no longer be incorrectly escaped.
    As a related ergonomic improvement, the same behavior applies for Element#text(String) (i.e. the content will be
    treated as a DataNode, despite calling the text() method).
    <https://github.com/jhy/jsoup/issues/1419>

  * Bugfix: when wrapping HTML around an existing element with Element#wrap(String), the content is now taken as
    provided and normal HTML tree-building rules are ignored. This allows for e.g. a div tag to be placed inside of
    p tags.

  * Bugfix: the Elements#forms() method should return the selected immediate elements that are Forms, not children.
    <https://github.com/jhy/jsoup/pull/1403>

  * Bugfix: when creating a selector for an element with Element#cssSelector, if the element used a non-unique ID
    attribute, the returned selector may not match the desired element.
    <https://github.com/jhy/jsoup/issues/1085>

  * Bugfix: corrected the toString() methods of the Evaluator classes.

  * Bugfix: when converting a jsoup document to a W3C document (in W3CDom#convert), if a tag had XML illegal characters,
    a DOMException would be thrown. Now instead, that tag is represented as a text node.
    <https://github.com/jhy/jsoup/issues/1093>

  * Bugfix: if an HTML file ended with an open noscript tag, an "EOF" string would appear in the HTML output.

  * Bugfix: when parsing a document as XML, automatically set the output syntax to XML, and ensure that "<" characters
    in attributes are escaped as "&lt;" (which is not required in HTML as the quoted attribute contents are safe, but is
    required in XML).
    <https://github.com/jhy/jsoup/issues/1420>

  * Bugfix: [Fuzz] when parsing an attribute key containing "abs:abs", a validation error would be incorrectly
    thrown.
    <https://github.com/jhy/jsoup/issues/1541>

  * Bugfix: [Fuzz] could NPE while parsing in resetInsertionMode().
    <https://github.com/jhy/jsoup/issues/1538>

  * Bugfix: [Fuzz] when parsing XML, could Stack Overflow when parsing XML declarations.
    <https://github.com/jhy/jsoup/issues/1539>

  * Bugfix: [Fuzz] fixed a potential Stack Overflow when parsing mis-nested tfoot tags, and updated the tree parser for
    this situation to match the updated HTML5 spec.
    <https://github.com/jhy/jsoup/issues/1543>

  * Bugfix: [Fuzz] fixed a potentially slow HTML parse when tags are nested extremely deep (e.g. 88K depth), by limiting
    the formatting tag search depth to 256. In practice, it's generally between 4 - 8.
    <https://github.com/jhy/jsoup/issues/1544>

  * Bugfix: [Fuzz] when parsing an unterminated RCDATA token (e.g. a <title> tag), could throw an IO Exception "No
    buffer left to unconsume" when trying to rewind the buffer.
    <https://github.com/jhy/jsoup/issues/1542>

*** Release 1.13.1 [2020-Feb-29]
  * Improvement: added Element#closest(selector), which walks up the tree to find the nearest element matching the
    selector.
    <https://github.com/jhy/jsoup/issues/1326>

  * Improvement: memory optimizations, reducing the retained size of a Document by ~ 39%, and allocations by ~ 9%:
    1. Attributes holder in Elements is only created if the element has attributes
    2. Only track the baseUri in an element when it is set via DOM to a new value for a given tree
    3. After parsing, do not retain the input character reader (and associated buffers) in the Document#parser

  * Improvement: substantial parse speed improvements vs 1.12.x (bringing back to par with previous releases).
    <https://github.com/jhy/jsoup/issues/1327>

  * Improvement: when pretty-printing, comments in inline tags are not pushed to a newline.

  * Improvement: added Attributes#hasDeclaredValueForKey(key) and Attributes#hasDeclaredValueForKeyIgnoreCase(key), to
    check if an attribute is set but has no value. Useful in place of the deprecated and removed BooleanAttribute class
    and instanceof test.

  * Improvement: removed old methods and classes that were marked deprecated in previous releases.

  * Improvement: added Element#select(Evaluator) and Element#selectFirst(Evaluator), to allow re-use of a parsed CSS
    selector if using the same evaluator many times.
    <https://github.com/jhy/jsoup/issues/1319>

  * Improvement: added Elements#forms(), Elements#textNodes(), Elements#dataNodes(), and Elements#comments(), as a
    convenient way to get access to these node types directly from an element selection.
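
The "walks up the tree" behaviour described for Element#closest(selector) at the top of this release can be
sketched in isolation. The code below is a minimal illustration using a hypothetical TinyNode class and a plain
Predicate in place of a CSS selector; it is not jsoup's code, and it assumes the start node itself is tested
before its ancestors.

```java
import java.util.function.Predicate;

final class ClosestSketch {
    // Hypothetical minimal tree node; stands in for an element with a parent link.
    static final class TinyNode {
        final String tag;
        final TinyNode parent;

        TinyNode(String tag, TinyNode parent) {
            this.tag = tag;
            this.parent = parent;
        }
    }

    // Tests the start node, then each ancestor in turn; returns the first match, or null if none matches.
    static TinyNode closest(TinyNode start, Predicate<TinyNode> matches) {
        for (TinyNode node = start; node != null; node = node.parent) {
            if (matches.test(node)) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyNode html = new TinyNode("html", null);
        TinyNode div = new TinyNode("div", html);
        TinyNode span = new TinyNode("span", div);
        System.out.println(closest(span, n -> n.tag.equals("div")).tag); // prints div
    }
}
```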

  * Improvement: preserve whitespace before html and head tag, if pretty-printing is off.

  * Bugfix: in a <select> tag, a second <optgroup> would not automatically close an earlier open <optgroup>
    <https://github.com/jhy/jsoup/issues/1313>

  * Bugfix: in CharacterReader when parsing an input stream, could throw a Mark Invalid exception if the reader was
    marked, a bufferUp occurred, and then the reader was rewound.
    <https://github.com/jhy/jsoup/issues/1324>

  * Bugfix: empty tags and form tags did not have their attributes normalized (lower-cased by default)
    <https://github.com/jhy/jsoup/pull/1323>

  * Bugfix: when preserve case was set to on, the HTML pretty-print formatter didn't indent capitalized tags correctly.

  * Bugfix: ensure that script and style contents are parsed into DataNodes, not TextNodes, when in case-sensitive
    parse mode.

*** Release 1.12.2 [2020-Feb-08]
  * Improvement: the :has() selector now supports relative selectors. For example, the query
    "div:has(> a)" will select all "div" elements that have at least one direct child "a" element.
    <https://github.com/jhy/jsoup/pull/1214>

  * Improvement: added Element chaining methods for various overridden methods on Node.
    <https://github.com/jhy/jsoup/issues/1193>

  * Improvement: ensure HTTP keepalives work when fetching content via body() and bodyAsBytes().
    <https://github.com/jhy/jsoup/issues/1232>

  * Improvement: set the default max body size in Jsoup.Connection to 2MB (up from 1MB) so fewer people get trimmed
    content if they have not set it, but still in sensible bounds. Also updated the default user-agent to improve
    default compatibility.

  * Improvement: dramatic speed improvement when bulk inserting child nodes into an element (wrapping contents).
    <https://github.com/jhy/jsoup/issues/1281>

  * Improvement: added Element#childrenSize() as a convenience to get the size of an element's element children.
    <https://github.com/jhy/jsoup/pull/1291>

  * Improvement: in W3CDom.asString, allow the output mode to be specified as HTML or as XML. It will default to
    checking the content, and automatically selecting.

  * Improvement: added a Document#documentType() method, to get a doc's doctype.

  * Improvement: To DocumentType, added #name(), #publicId(), and #systemId() methods to fetch those fields.

  * Improvement: in W3CDom conversions from jsoup documents, retain the DocumentType, and be able to serialize it.
    <https://github.com/jhy/jsoup/issues/1183>

  * Bugfix: on pages fetched by Jsoup.Connection, a "Mark Invalid" exception might be incorrectly thrown, or the page
    may miss some data. This occurred on larger pages when the file transfer was chunked, and an invalid HTML entity
    happened to cross a chunk boundary.
    <https://github.com/jhy/jsoup/issues/1218>

  * Bugfix: if duplicate attributes in an element exist, retain the first vs the last attribute with the same name. Case
    aware (HTML case-insensitive names, XML are case-sensitive).
    <https://github.com/jhy/jsoup/issues/1219>

  * Bugfix: don't submit input type=button form elements.
    <https://github.com/jhy/jsoup/issues/1231>

  * Bugfix: handle error position reporting correctly and don't blow up in some edge cases.
    <https://github.com/jhy/jsoup/issues/1251>
    <https://github.com/jhy/jsoup/pull/1253>

  * Bugfix: handle the ^= (starts with) selector correctly when the prefix starts with a space.
    <https://github.com/jhy/jsoup/pull/1280>

  * Bugfix: don't strip out zero-width-joiners (or zero-width-non-joiners) when normalizing text. That breaks combined
    emoji (and other text semantics).
    <https://github.com/jhy/jsoup/issues/1269>

  * Bugfix: Evaluator.TagEndsWith (namespaced elements) and Tag disagreed in case-sensitivity. Now correctly matches
    case-insensitively.
    <https://github.com/jhy/jsoup/issues/1257>

  * Bugfix: Don't throw an exception if a selector ends in a space, just trim it.
    <https://github.com/jhy/jsoup/issues/1274>

  * Bugfix: HTML parser adds redundant text when parsing self-closing textarea.
    <https://github.com/jhy/jsoup/issues/1220>

  * Bugfix: Don't add spurious whitespace or newlines to HTML or text for inline tags.
    <https://github.com/jhy/jsoup/issues/1305>
    <https://github.com/jhy/jsoup/issues/731>

  * Bugfix: TextNode.outerHtml() wouldn't normalize correctly without a parent.
    <https://github.com/jhy/jsoup/issues/1309>

  * Bugfix: Removed binary input detection as it was causing too many false positives.
    <https://github.com/jhy/jsoup/issues/1250>

  * Bugfix: when cloning a TextNode, if .attributes() was hit before the clone() method, the text value would only be a
    shallow clone.
    <https://github.com/jhy/jsoup/issues/1176>

  * Various code hygiene updates.

*** Release 1.12.1 [2019-May-12]
  * Change: removed deprecated method to disable TLS cert checking Connection.validateTLSCertificates().

  * Change: some internal methods have been rearranged; if you extended any of the Jsoup internals you may need to make
    updates.

  * Improvement: documents now remember their parser, so when later manipulating them, the correct HTML or XML tree
    builder is reused, as are the parser settings like case preservation.
    <https://github.com/jhy/jsoup/issues/769>

  * Improvement: Jsoup now detects the character set of the input if specified in an XML Declaration, when using the
    HTML parser. Previously that only happened when the XML parser was specified.
    <https://github.com/jhy/jsoup/issues/1009>

  * Improvement: if the document's input character set does not support encoding, flip it to one that does.
    <https://github.com/jhy/jsoup/issues/1007>

  * Improvement: if a start tag is missing a > and a new tag is seen with a <, treat that as a new tag. (This differs
    from the HTML5 spec, which would make an attribute with a name beginning with <, but in practice this impacts too
    many pages.)
    <https://github.com/jhy/jsoup/issues/797>

  * Improvement: performance tweaks when parsing start tags, data, tables.

  * Improvement: added Element.nextElementSiblings() and Element.previousElementSiblings()
    <https://github.com/jhy/jsoup/pull/1054>

  * Improvement: treat center tags as block tags.
    <https://github.com/jhy/jsoup/pull/1113>

  * Improvement: allow forms to be submitted with Content-Type=multipart/form-data without requiring a file upload;
    automatically set the mime boundary.
    <https://github.com/jhy/jsoup/pull/1058>

  * Improvement: Jsoup will now detect if an input file or URL is binary, and will refuse to attempt to parse it, with
    an IO exception. This prevents runaway processing time and wasted effort creating meaningless parsed DOM trees.
    <https://github.com/jhy/jsoup/issues/1192>

  * Bugfix: when using the tag case preserving parsing settings, certain HTML tree building rules were not followed
    for upper case tags.
    <https://github.com/jhy/jsoup/issues/1149>

  * Bugfix: when converting a Jsoup document to a W3C DOM, if an element is namespaced but not in a defined namespace,
    set it to the global namespace.
    <https://github.com/jhy/jsoup/issues/848>

  * Bugfix: attributes created with the Attribute constructor with just spaces for names would incorrectly pass
    validation.
    <https://github.com/jhy/jsoup/issues/1159>

  * Bugfix: some pseudo XML Declarations were incorrectly handled when using the XML Parser, leading to an IOOB
    exception when parsing.
    <https://github.com/jhy/jsoup/issues/1139>

  * Bugfix: when parsing URL parameter names in an attribute that is not correctly HTML encoded, and near the end of the
    current buffer, those parameters may be incorrectly dropped. (Improved CharacterReader mark/reset support.)
    <https://github.com/jhy/jsoup/pull/1154>

  * Bugfix: boolean attribute values would be returned as null, vs an empty string, when accessed via the
    Attribute#getValue() method.
    <https://github.com/jhy/jsoup/issues/1065>

  * Bugfix: orphan Attribute objects (i.e. created outside of a parse or an Element) would throw an NPE on
    Attribute#setValue(val)
    <https://github.com/jhy/jsoup/issues/1107>

  * Bugfix: Element.shallowClone() was not making a clone of its attributes.
    <https://github.com/jhy/jsoup/issues/1201>

  * Bugfix: fixed an ArrayIndexOutOfBoundsException in HttpConnection.looksLikeUtf8 when testing small strings in
    specific ranges.
    <https://github.com/jhy/jsoup/issues/1172>

  * Updated jetty-server (which is used for integration tests) to latest 9.2 series (9.2.28).

*** Release 1.11.3 [2018-Apr-15]
  * Improvement: CDATA sections are now treated as whitespace preserving (regardless of the containing element), and are
    round-tripped into output HTML.
    <https://github.com/jhy/jsoup/issues/406>
    <https://github.com/jhy/jsoup/issues/965>

  * Improvement: added support for Deflate encoding.
    <https://github.com/jhy/jsoup/pull/982>

  * Improvement: when parsing <pre> tags, skip the first newline if present.
    <https://github.com/jhy/jsoup/issues/825>

  * Improvement: support nested quotes for attribute selection queries.
    <https://github.com/jhy/jsoup/pull/988>

  * Improvement: character references from Windows-1252 that are not valid Unicode are mapped to the appropriate
    Unicode replacement.
    <https://github.com/jhy/jsoup/pull/1046>

  * Improvement: accept a custom SSL socket factory in Jsoup.Connection.
    <https://github.com/jhy/jsoup/pull/1038>

  * Bugfix: "Mark has been invalidated" exception was thrown when parsing some URLs on Android <= 6.
    <https://github.com/jhy/jsoup/issues/990>

  * Bugfix: The Element.text() for <div>One</div>Two was "OneTwo", not "One Two".
    <https://github.com/jhy/jsoup/issues/812>

  * Bugfix: boolean attributes with empty string values were not collapsing in HTML output.
    <https://github.com/jhy/jsoup/issues/985>

  * Bugfix: when using the XML Parser set to lowercase normalize tags, uppercase closing tags were not correctly
    handled.
    <https://github.com/jhy/jsoup/issues/998>

  * Bugfix: when parsing from a URL, an end tag could be read incorrectly if it started on a buffer boundary.
    <https://github.com/jhy/jsoup/issues/995>

  * Bugfix: when parsing from a URL, if the remote server failed to complete its write (i.e. it writes less than the
    Content Length header promised on a gzipped stream), the parse method would incorrectly throw an unchecked
    exception. It now throws the declared IOException.
    <https://github.com/jhy/jsoup/issues/980>

  * Bugfix: leaf nodes (such as text nodes) were throwing an unsupported operation exception on childNodes(), instead
    of just returning an empty list.
    <https://github.com/jhy/jsoup/issues/1032>

  * Bugfix: documents with a leading UTF-8 BOM did not have that BOM consumed, so it acted as a zero width no-break
    space, which could impact the parse tree.
    <https://github.com/jhy/jsoup/issues/1003>

  * Bugfix: when parsing an invalid XML declaration, the parse would fail.
    <https://github.com/jhy/jsoup/issues/1015>

*** Release 1.11.2 [2017-Nov-19]
  * Improvement: added a new pseudo selector :matchText, which allows text nodes to match as if they were elements.
    This enables finding text that is only marked by a "br" tag, for example.
    <https://github.com/jhy/jsoup/issues/550>

  * Change: marked Connection.validateTLSCertificates() as deprecated.

  * Improvement: normalize invisible characters (like soft-hyphens) in Element.text().
    <https://github.com/jhy/jsoup/issues/978>

  * Improvement: added Element.wholeText(), to easily get the un-normalized text value of an element and its children.
    <https://github.com/jhy/jsoup/pull/564>

  * Bugfix: in a deep DOM stack, a StackOverFlow exception could occur when generating implied end tags.
    <https://github.com/jhy/jsoup/issues/966>

  * Bugfix: when parsing attribute values that happened to cross a buffer boundary, a character was dropped.
    <https://github.com/jhy/jsoup/issues/967>

  * Bugfix: fixed an issue that prevented using infinite timeouts in Jsoup.Connection.
    <https://github.com/jhy/jsoup/issues/968>

  * Bugfix: whitespace preserving tags were not honoured when nested deeper than two levels deep.
    <https://github.com/jhy/jsoup/issues/722>

  * Bugfix: an unterminated comment token at the end of the HTML input would cause an out of bounds exception.
    <https://github.com/jhy/jsoup/issues/972>

  * Bugfix: an NPE in the Cleaner which would occur if an <a href> attribute value was missing.
    <https://github.com/jhy/jsoup/issues/973>

  * Bugfix: when serializing the same document in multiple threads, on Android, with a character set that is not ascii
    or UTF-8, an encoding exception could occur.
    <https://github.com/jhy/jsoup/issues/970>

  * Bugfix: removing a form value from the DOM would not remove it from FormData.
    <https://github.com/jhy/jsoup/pull/969>

  * Bugfix: in the W3CDom transformer, siblings were incorrectly inheriting namespaces defined on previous siblings.
    <https://github.com/jhy/jsoup/issues/977>

*** Release 1.11.1 [2017-Nov-06]
  * Updated language level to Java 7 from Java 5. To maintain Android support (of minversion 8), try-with-resources are
    not used.
    <https://github.com/jhy/jsoup/issues/899>

  * When loading content from a URL or a file, the content is now parsed as it streams in from the network or disk,
    rather than being fully buffered before parsing. This substantially reduces memory consumption & large garbage
    objects when loading large files. Note that this change means that a response, once parsed, may not be parsed
    again from the same response object unless you call response.bufferUp() first, which will buffer the full response
    into memory.
    <https://github.com/jhy/jsoup/issues/904>

  * Added Connection.Response.bodyStream(), a method to get the response body as an input stream. This is useful for
    saving a large response straight to a file, without buffering fully into memory first.

  * Performance improvements in text and HTML generation (through less GC).

  * Reduced memory consumption of text, scripts, and comments in the DOM by 40%, by refactoring the node
    hierarchy to not track childnodes or attributes by default for leaf nodes. For the average document, that's about a
    30% memory reduction.
    <https://github.com/jhy/jsoup/issues/911>

  * Reduced memory consumption of Elements by refactoring their Attributes to be a simple pair of arrays, vs a
    LinkedHashSet.
    <https://github.com/jhy/jsoup/issues/911>

  * Added support for Element.selectFirst(query), to efficiently find the first matching element.

  * Added Element.appendTo(parent) to simplify slinging elements about.
    <https://github.com/jhy/jsoup/pull/662>

  * Added support for multiple headers with the same name in Jsoup.Connect

  * Added Element.shallowClone() and Node.shallowClone(), to allow cloning nodes without getting all their children.
    <https://github.com/jhy/jsoup/issues/900>

  * Updated Element.text() and the :contains(text) selector to consider &nbsp; characters as spaces.

  * Updated Jsoup.connect().timeout() to implement a total connect + combined read timeout. Previously it specified
    connect and buffer read times only, so to implement a combined total timeout, you had to have another thread send
    an interrupt.

  * Improved performance of Node.addChildren (was quadratic)
    <https://github.com/jhy/jsoup/pull/930>

  * Added missing support for template tags in tables
    <https://github.com/jhy/jsoup/pull/901>

  * In Jsoup.connect file uploads, added the ability to set the uploaded files' mimetype.
    <https://github.com/jhy/jsoup/issues/936>

  * Improved Node traversal, including less object creation, and partial and filtering traversor support.
    <https://github.com/jhy/jsoup/pull/849>

  * Bugfix: if a document was re-decoded after character set detection, the HTML parser was not reset correctly,
    which could lead to an incorrect DOM.
    <https://github.com/jhy/jsoup/issues/877>

  * Bugfix: attributes with the same name but different case would be incorrectly treated as different attributes.
    <https://github.com/jhy/jsoup/pull/903>

  * Bugfix: self-closing tags for known empty elements were incorrectly treated as errors.
    <https://github.com/jhy/jsoup/issues/868>

  * Bugfix: fixed an issue where a self-closing title, noframes, or style tag would cause the rest of the page to be
    incorrectly parsed as data or text.
    <https://github.com/jhy/jsoup/issues/906>

  * Bugfix: fixed an issue with unknown mixed-case tags
    <https://github.com/jhy/jsoup/pull/942>

  * Bugfix: fixed an issue where the entity resources were left open after startup, causing a warning.
    <https://github.com/jhy/jsoup/pull/928>

  * Bugfix: fixed an issue where Element.getElementsByIndexLessThan(index) would incorrectly provide the root element
    <https://github.com/jhy/jsoup/pull/918>

  * Improved parse time for pages with exceptionally deeply nested tags.
    <https://github.com/jhy/jsoup/issues/955>

  * Improvement / workaround: modified the Entities implementation to load its data from a .class vs from a jar resource.
    Faster, and safer on Android.
    <https://github.com/jhy/jsoup/issues/959>

*** Release 1.10.3 [2017-Jun-11]
  * Added Elements.eachText() and Elements.eachAttr(name), which return a list of each element's text or attribute
    values, respectively. This makes it simpler to, for example, get a list of each URL on a page:
      List<String> urls = doc.select("a").eachAttr("abs:href");

  * Improved selector validation for :contains(...) with unbalanced quotes.
    <https://github.com/jhy/jsoup/issues/803>

  * Improved the speed of index based CSS selectors and other methods that use elementSiblingIndex, by a factor of 34x.
    <https://github.com/jhy/jsoup/pull/862>

  * Added Node.clearAttributes(), to simplify removing of all attributes of a Node / Element.
    <https://github.com/jhy/jsoup/issues/829>

  * Bugfix: if an attribute name started or ended with a control character, the parse would fail with a validation
    exception.
    <https://github.com/jhy/jsoup/issues/793>

  * Bugfix: Element.hasClass() and the ".classname" selector would not find the class attribute case-insensitively.
    <https://github.com/jhy/jsoup/issues/814>

  * Bugfix: In Jsoup.Connection, if a redirect contained a query string with %xx escapes, they would be double escaped
    before the redirect was followed, leading to fetching an incorrect location.

  * Bugfix: In Jsoup.Connection, if a request body was set and the connection was redirected, the body would incorrectly
    still be sent.
    <https://github.com/jhy/jsoup/pull/881>

  * Bugfix: In DataUtil when detecting the character set from meta data, and there are two Content-Types defined, use
    the one that defines a character set.
    <https://github.com/jhy/jsoup/pull/835>

  * Bugfix: when parsing unknown tags in case-sensitive HTML mode, end tags would not close scope correctly.
    <https://github.com/jhy/jsoup/issues/819>

  * In Jsoup.Connection, ensure there is no Content-Type set when being redirected to a GET.
    <https://github.com/jhy/jsoup/pull/895>

  * Bugfix: in certain locales (Turkey specifically), lowercasing and case insensitivity could fail for specific items.
    <https://github.com/jhy/jsoup/pull/820>

  * Bugfix: after an element was cloned, changes to its child list were not notifying the element correctly.
    <https://github.com/jhy/jsoup/issues/951>

*** Release 1.10.2 [2017-Jan-02]
  * Improved startup time, particularly on Android, by reducing garbage generation and CPU execution time when loading
    the HTML entity files. About 1.72x faster in this area.

  * Added Element.is(query) to check if an element matches this CSS query.

  * Added new methods to Elements: next(query), nextAll(query), prev(query), prevAll(query) to select next and previous
    element siblings from a current selection, with optional selectors.

  * Added Node.root() to get the topmost ancestor of a Node.
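
As background to the Turkish-locale bugfix listed under 1.10.3: in the JDK, String#toLowerCase() without an explicit
locale uses the default locale, and under Turkish rules an upper-case "I" lower-cases to a dotless "ı", so tag names
such as TITLE no longer compare equal to "title". The sketch below reproduces that effect with the JDK alone. It
does not show how jsoup fixed the issue; lower-casing with a fixed locale such as Locale.ROOT is one common remedy,
and the class and method names are hypothetical.

```java
import java.util.Locale;

final class TurkishLowercaseSketch {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Locale-sensitive lower-casing: the result depends on the locale passed in.
    static String lowerWithLocale(String input, Locale locale) {
        return input.toLowerCase(locale);
    }

    // Locale-neutral lower-casing: the same result on every machine.
    static String lowerNeutral(String input) {
        return input.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Under Turkish rules "I" becomes the dotless i (U+0131), so the comparison fails.
        System.out.println(lowerWithLocale("TITLE", TURKISH).equals("title")); // prints false
        System.out.println(lowerNeutral("TITLE").equals("title"));             // prints true
    }
}
```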

  * Added the new selector :containsData(), to find elements that hold data, like script and style tags.

  * Changed Jsoup.isValid(bodyHtml) to validate that the input contains only body HTML that is safe according to the
    safelist, and does not include HTML errors. And in the Jsoup.Cleaner.isValid(Document) method, make sure the doc
    only includes body HTML.
    <https://github.com/jhy/jsoup/issues/245>
    <https://github.com/jhy/jsoup/issues/632>

  * In Safelists, validate that a removed protocol exists before removing said protocol.

  * Allow the Jsoup.Connect thread to be interrupted when reading the input stream; helps when reading from a long stream
    of data that never triggers the read timeout.
    <https://github.com/jhy/jsoup/pull/712>

  * Jsoup.Connect now uses a desktop user agent by default. Many developers were getting caught by not specifying the
    user agent, and sending the default 'Java'. That causes many servers to return different content than what they would
    to a desktop browser, and what the developer was expecting.

  * Increased the default connect/read timeout in Jsoup.Connect to 30 seconds.

  * Jsoup.Connect now detects if a header value is actually in UTF-8 vs the HTTP spec of ISO-8859-1, and converts
    the header value appropriately. This improves compatibility with servers that are configured incorrectly.

  * Bugfix: in Jsoup.Connect, URLs containing non-URL-safe characters were not encoded to URL safe correctly.
    <https://github.com/jhy/jsoup/issues/706>

  * Bugfix: a "SYSTEM" flag in doctype tags would be incorrectly removed.
    <https://github.com/jhy/jsoup/issues/408>

  * Bugfix: removing attributes from an Element with removeAttr() would cause a ConcurrentModificationException.
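
The header charset handling noted above (UTF-8 vs ISO-8859-1) can be illustrated with the JDK alone. The sketch
below takes a header value that was decoded as ISO-8859-1, recovers the raw bytes, and keeps a UTF-8 re-decode only
if those bytes are strictly valid UTF-8. This is an assumption-laden illustration of the idea, not jsoup's actual
detection code; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class HeaderCharsetSketch {
    // Assumes the input was produced by decoding raw header bytes as ISO-8859-1,
    // so getBytes(ISO_8859_1) recovers the original bytes exactly.
    static String redecodeIfUtf8(String latin1Decoded) {
        byte[] raw = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // Valid UTF-8 (including plain ASCII) decodes cleanly.
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: treat the value as genuine ISO-8859-1 and keep it unchanged.
            return latin1Decoded;
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes C3 A9 mis-read as two Latin-1 characters become one e-acute again.
        System.out.println(redecodeIfUtf8("\u00c3\u00a9").equals("\u00e9")); // prints true
        // A lone Latin-1 e-acute (byte E9) is not valid UTF-8, so it is left as-is.
        System.out.println(redecodeIfUtf8("caf\u00e9").equals("caf\u00e9")); // prints true
    }
}
```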

 * Bugfix: the contents of Comment nodes were not returned by Element.data()

 * Bugfix: if source checked out on Windows with git autocrlf=true, Entities.load would fail because of the \r char.

*** Release 1.10.1 [2016-Oct-23]
 * New feature: added the option to preserve case for tags and/or attributes, with ParseSettings. By default, the HTML
   parser will continue to normalize tag names and attribute names to lower case, and the XML parser will now preserve
   case, according to the relevant spec. The CSS selectors for tags and attributes remain case insensitive, per the CSS
   spec.

 * Improved support for extended HTML entities, including supplemental characters and multiple character references.
   Also reduced memory consumption of the entity tables.
   <https://github.com/jhy/jsoup/issues/602>
   <https://github.com/jhy/jsoup/issues/603>

 * Added support for *|E wildcard namespace selectors.
   <https://github.com/jhy/jsoup/pull/724>

 * Added support for setting multiple connection headers at once with Connection.headers(Map)
   <https://github.com/jhy/jsoup/pull/725>

 * Added support for setting/overriding the response character set in Connection.Response, for cases where the charset
   is not defined by the server, or is defined incorrectly.
   <https://github.com/jhy/jsoup/issues/743>

 * Improved performance of class selectors by reducing memory allocation and garbage collection.
   <https://github.com/jhy/jsoup/pull/753>

 * Improved performance of HTML output by reducing the creation of temporary attribute list iterators.
   <https://github.com/jhy/jsoup/pull/755>

 * Fixed an issue when converting to the W3CDom XML, where valid (but ugly) HTML attribute names containing characters
   like '"' could not be converted into valid XML attribute names. These attribute names are now normalized if possible,
   or not added to the XML DOM.
   <https://github.com/jhy/jsoup/issues/721>

 * Fixed an OOB exception when loading an empty-body URL and parsing with the XML parser.
   <https://github.com/jhy/jsoup/issues/727>

 * Fixed an issue where attribute names starting with a slash would be parsed incorrectly.
   <https://github.com/jhy/jsoup/pull/748>

 * Don't reuse charset encoders from OutputSettings, so that output is threadsafe.
   <https://github.com/jhy/jsoup/issues/740>

 * Fixed an issue in connections with a requestBody where a custom content-type header could be ignored.
   <https://github.com/jhy/jsoup/issues/756>

*** Release 1.9.2 [2016-May-17]
 * Fixed an issue where tag names that contained non-ascii characters but started with an ascii character
   would cause the parser to get stuck in an infinite loop.
   <https://github.com/jhy/jsoup/issues/704>

 * In XML documents, detect the charset from the XML prolog - <?xml encoding="UTF-8"?>
   <https://github.com/jhy/jsoup/issues/701>

 * Fixed an issue where created XML documents would have an incorrect prolog.
   <https://github.com/jhy/jsoup/issues/652>

 * Fixed an issue where you could not use an attribute selector to find values containing unbalanced braces or
   parentheses.
   <https://github.com/jhy/jsoup/issues/611>

 * Fixed an issue where namespaced tags (like <fb:comment>) would cause Element.cssSelector() to fail.
   <https://github.com/jhy/jsoup/pull/677>

*** Release 1.9.1 [2016-Apr-16]
 * Added support for HTTP and SOCKS request proxies, specifiable per connection.
   <https://github.com/jhy/jsoup/pull/570>

 * Added support for sending plain HTTP request bodies in POST and PUT requests, with Connection.requestBody(String).

 * Added support in Jsoup.Connect for HEAD, OPTIONS, TRACE.
   <https://github.com/jhy/jsoup/issues/613>

 * Added support for HTTP 307 Temporary Redirect (replays posts, if applicable).
   <https://github.com/jhy/jsoup/pull/666>

 * Performance improvements when parsing HTML, particularly for Android Dalvik.

 * Added support for writing HTML into Appendable objects (like OutputStreamWriter), to enable stream serialization.
   <https://github.com/jhy/jsoup/pull/470/>

 * Added support for XML namespaces when converting jsoup documents to W3C documents.
   <https://github.com/jhy/jsoup/pull/672>

 * Added support for UTF-16 and UTF-32 character set detection from byte-order-marks (BOM).
   <https://github.com/jhy/jsoup/issues/695>

 * Added support for tags with non-ascii (unicode) letters.
   <https://github.com/jhy/jsoup/issues/667>

 * Added Connection.data(key) to retrieve a data KeyVal by its key. Useful to update form data before submission.

 * Fixed an issue in the Parent selector where it would not match against the root element it was applied to.
   <https://github.com/jhy/jsoup/pull/619>

 * Fixed an issue where elements.select(query) would not return every matching element if they had the same content.
   <https://github.com/jhy/jsoup/issues/614>

 * Added not-null validators to Element.appendText() and Element.prependText()
   <https://github.com/jhy/jsoup/issues/690>

 * Fixed an issue when moving nodes using Element.insert(index, children) where the sibling index would be set
   incorrectly, leading to the original nodes being lost.
   <https://github.com/jhy/jsoup/issues/689>

 * Reverted Node.equals() and Node.hashCode() back to identity (object) comparisons, as deep content inspection
   had negative performance impacts and hashkey stability problems. Functionality replaced with Node.hasSameContent().
   <https://github.com/jhy/jsoup/issues/688>

 * In Jsoup.Connect, if the same header key is seen multiple times, combine their values with a comma per the HTTP RFC,
   instead of keeping just one value. Also fixes an issue where header values could be out of order.
   <https://github.com/jhy/jsoup/issues/618>

*** Release 1.8.3 [2015-Aug-02]
 * Added support for custom boolean attributes.
   <https://github.com/jhy/jsoup/pull/555>

 * When fetching XML URLs, automatically switch to the XML parser instead of the HTML parser.
   <https://github.com/jhy/jsoup/pull/574>

 * Performance improvement on parsing larger HTML pages. On Android KitKat, around 1.7x faster. On Android
   Lollipop, ~ 1.3x faster. Improvements largely from re-ordering the HtmlTreeBuilder methods based on analysis of
   various websites; also from further memory reduction for nodes with no children, and other tweaks.

 * Fixed an issue in Element.getElementSiblingIndex (and related methods) where sibling elements with the same content
   would incorrectly have the same sibling index.
   <https://github.com/jhy/jsoup/issues/554>

 * Fixed an issue where unexpected elements in a badly nested table could be moved to the wrong location in the
   document.
   <https://github.com/jhy/jsoup/issues/552>

 * Fixed an issue where a table nested within a TH cell would parse to an incorrect tree.
   <https://github.com/jhy/jsoup/issues/575>

 * When serializing a document using the XHTML encoding entities, if the character set did not support &nbsp; chars
   (such as Shift_JIS), the character would be skipped. For visibility, will now always output &#xa0; when using XHTML
   encoding entities (as &nbsp; is not defined), regardless of the output character set.
   <https://github.com/jhy/jsoup/issues/523>

 * Fixed an issue when resolving URLs, if the absolute URL had no path, the relative URL was not normalized correctly.
   Also fixed an issue where connections that were redirected to a relative URL did not have the same normalization
   rules as a URL read from Node.absUrl(String).
   <https://github.com/jhy/jsoup/issues/585>

 * When serialising XML, ensure that '<' characters in attributes are escaped, per spec. Not required in HTML.
   <https://github.com/jhy/jsoup/issues/528>

*** Release 1.8.2 [2015-Apr-13]
 * Performance improvements for parsing HTML on Android, of 1.5x to 1.9x, with larger parses getting a bigger
   speed increase. For non-Android JREs, around 1.1x to 1.2x.

 * Dramatic performance improvement in HTML serialization on Android (KitKat and later), of 115x. Improvement by working
   around a character set encoding speed regression in Android.
   <https://github.com/jhy/jsoup/issues/383>

 * Performance improvement for the class name selector on Android (.class) of 2.5x to 14x. Around 1.2x
   on non-Android JREs.

 * File upload support. Added the ability to specify input streams for POST data, which will upload content in
   MIME multipart/form-data encoding.

 * Add a meta-charset element to documents when setting the character set, so that the document's charset is
   unambiguous.
   <https://github.com/jhy/jsoup/pull/486>

 * Added ability to disable TLS (SSL) certificate validation. Helpful if you're hitting a host with a bad cert,
   or your JDK doesn't support SNI.
   <https://github.com/jhy/jsoup/pull/343>

 * Added ability to further tweak the canned Cleaner Safelists by removing existing settings.
   <https://github.com/jhy/jsoup/pull/449>

 * Added option in Cleaner Safelist to allow linking to in-page anchors (#)
   <https://github.com/jhy/jsoup/pull/441>

 * Use a lowercase doctype tag for HTML5 documents.

 * Add support for 201 Created with redirect, and other status codes. Treats any HTTP status code 2xx or 3xx as an OK
   response, and follows redirects whenever there is a Location header.
   <https://github.com/jhy/jsoup/issues/312>

 * Added support for HTTP method verbs PUT, DELETE, and PATCH.

 * Added support for overriding the default POST character set of UTF-8
   <https://github.com/jhy/jsoup/pull/491>

 * W3C DOM support: added ability to convert from a jsoup document to a W3C document, with the W3CDom helper class.

 * In the HtmlToPlainText example program, added the ability to filter using a CSS selector. Also clarified
   the usage documentation.

 * Fixed validation of cookie names in HttpConnection cookie methods.
   <https://github.com/jhy/jsoup/pull/377>

 * Fixed an issue where <option> tags would be missed when preparing a form for submission if missing a selected
   attribute.

 * Fixed an issue where submitting a form would incorrectly include radio and checkbox values without the checked
   attribute.

 * Fixed an issue where Element.classNames() would return a set containing an empty class; and may have extraneous
   whitespace.
   <https://github.com/jhy/jsoup/pull/469>

 * Fixed an issue where attributes selected by value were not correctly space normalized.
   <https://github.com/jhy/jsoup/pull/526>

 * In head+noscript elements, treat content as character data, instead of jumping out of head parsing.
   <https://github.com/jhy/jsoup/pull/540>

 * Fixed performance issue when parsing HTML with elements with many children that need re-parenting.
   <https://github.com/jhy/jsoup/pull/506>

 * Fixed an issue where a server returning an unsupported character set response would cause a runtime
   UnsupportedCharsetException, instead of falling back to the default UTF-8 charset.
   <https://github.com/jhy/jsoup/pull/509>

 * Fixed an issue where Jsoup.Connection would throw an IO Exception when reading a page with zero content-length.
   <https://github.com/jhy/jsoup/issues/538>

 * Improved the equals() and hashcode() methods in Node, to consider all their child content, for DOM tree comparisons.
   <https://github.com/jhy/jsoup/issues/537>

 * Improved performance in Selector when searching multiple roots.
   <https://github.com/jhy/jsoup/issues/518>

*** Release 1.8.1 [2014-Sep-27]
 * Introduced the ability to choose between HTML and XML output, and made HTML the default. This means img tags are
   output as <img>, not <img />. XML is the default when using the XmlTreeBuilder. Control this with the
   Document.OutputSettings.syntax() method.

 * Improved the performance of Element.text() by 3.2x

 * Improved the performance of Element.html() by 1.7x

 * Improved file read time by 2x, giving around a 10% speed improvement to file parses.
   <https://github.com/jhy/jsoup/issues/248>

 * Tightened the scope of what characters are escaped in attributes and textnodes, to align with the spec. Also, when
   using the extended escape entities map, only escape a character if the current output charset does not support it.
   This produces smaller, more legible HTML, with greater control over the output (by setting charset and escape mode).

 * If pretty-print is disabled, don't trim outer whitespace in Element.html()
   <https://github.com/jhy/jsoup/issues/368>

 * In the HTML Cleaner, allow span tags in the basic safelist, and span and div tags in the relaxed safelist.

 * Added Element.cssSelector(), which returns a unique CSS selector/path for an element.
   <https://github.com/jhy/jsoup/pull/459>

 * Fixed an issue where <svg><img/></svg> was parsed as <svg><image/></svg>
   <https://github.com/jhy/jsoup/issues/364>

 * Fixed an issue where a UTF-8 BOM character was not detected if the HTTP response did not specify a charset, and
   the HTML body did, leading to the head contents incorrectly being parsed into the body. Changed the behavior so that
   when the UTF-8 BOM is detected, it will take precedence for determining the charset to decode with.
   <https://github.com/jhy/jsoup/issues/348>

 * Relaxed doctype validation, allowing doctypes to not specify a name.
   <https://github.com/jhy/jsoup/issues/460>

 * Fixed an issue in parsing a base URI when loading a URL containing a http-equiv element.
   <https://github.com/jhy/jsoup/issues/440>

 * Fixed an issue for Java 1.5 / Android 2.2 compatibility, and verify it doesn't regress.
   <https://github.com/jhy/jsoup/issues/375>
   <https://github.com/jhy/jsoup/pull/403>

 * Fixed an issue that would throw an NPE when trying to set invalid HTML into a title element.
   <https://github.com/jhy/jsoup/pull/410>

 * Added support for quoted attribute values in CSS Selectors
   <https://github.com/jhy/jsoup/pull/400>

 * Fixed support for nth-of-type selectors with unknown tags.
   <https://github.com/jhy/jsoup/pull/402>

 * Added support for 'application/*+xml' mimetypes.
   <https://github.com/jhy/jsoup/pull/444>

 * Fixed support for allowing script tags in cleaner Safelists.
   <https://github.com/jhy/jsoup/issues/299>
   <https://github.com/jhy/jsoup/issues/388>

 * In FormElements, don't submit disabled inputs, and use 'on' as checkbox value default.
   <https://github.com/jhy/jsoup/issues/489>

*** Release 1.7.3 [2013-Nov-10]
 * Introduced FormElement, providing easy access to form controls and their data, and the ability to submit forms
   with Jsoup.Connect.

 * Reduced GC impact during HTML parsing, with 17% fewer objects created, and 3% faster parses.

 * Reduced CSS selection time by 26% for common queries.

 * Improved HTTP character set detection.
   <https://github.com/jhy/jsoup/pull/325> <https://github.com/jhy/jsoup/issues/321>

 * Added Document.location, to get the URL the document was retrieved from. Helpful if connection was redirected.
   <https://github.com/jhy/jsoup/pull/306>

 * Fixed support for self-closing script tags.
   <https://github.com/jhy/jsoup/issues/305>

 * Fixed a crash when reading an unterminated CDATA section.
   <https://github.com/jhy/jsoup/issues/349>

 * Fixed an issue where elements added via the adoption agency algorithm did not preserve their attributes.
   <https://github.com/jhy/jsoup/issues/313>

 * Fixed an issue when cloning a document with extremely nested elements that could cause a stack-overflow.
   <https://github.com/jhy/jsoup/issues/290>

 * Fixed an issue when connecting or redirecting to a URL that contains a space.
   <https://github.com/jhy/jsoup/pull/354> <https://github.com/jhy/jsoup/issues/114>

 * Added support for the HTTP/1.1 Temporary Redirect (307) status code.
   <https://github.com/jhy/jsoup/issues/452>

*** Release 1.7.2 [2013-Jan-27]
 * Added support for supplementary characters outside of the Basic Multilingual Plane.
   <https://github.com/jhy/jsoup/issues/288> <https://github.com/jhy/jsoup/pull/289>

 * Added support for structural pseudo CSS selectors, including :first-child, :last-child, :nth-child, :nth-last-child,
   :first-of-type, :last-of-type, :nth-of-type, :nth-last-of-type, :only-child, :only-of-type, :empty, and :root
   <https://github.com/jhy/jsoup/pull/208>

 * Added a maximum body response size to Jsoup.Connection, to prevent running out of memory when trying to read
   extremely large documents. The default is 1MB.

 * Refactored the Cleaner to traverse rather than recurse child nodes, to avoid the risk of overflowing the stack.
   <https://github.com/jhy/jsoup/issues/246>

 * Added Element.insertChildren(), to easily insert a list of child nodes at a specific index.
   <https://github.com/jhy/jsoup/issues/239>

 * Added Node.childNodesCopy(), to create an independent copy of a Node's children.

 * When parsing in XML mode, preserve XML declarations (<?xml ... ?>).
   <https://github.com/jhy/jsoup/issues/242>

 * Introduced Parser.parseXmlFragment(), to allow easy parsing of XML fragments.
   <https://github.com/jhy/jsoup/issues/279>

 * Allow Safelist test methods to be extended
   <https://github.com/jhy/jsoup/issues/85>

 * Added Document.OutputSettings.outline mode, to aid HTML debugging by printing out in outline mode, similar to
   browser HTML inspectors.
   <https://github.com/jhy/jsoup/issues/273>

 * When parsing, allow all tags to self-close. Tags that aren't expected to self-close will get an end tag.
   <https://github.com/jhy/jsoup/issues/258>

 * Fixed an issue when parsing <textarea>/RCData tags containing unescaped closing tags that would drop the trailing >.

 * Corrected the javadoc for Element#child() to note that it throws IndexOutOfBounds.
   <https://github.com/jhy/jsoup/issues/277>

 * When cloning an Element, reset the classnames set so as not to hold a pointer to the source's.
   <https://github.com/jhy/jsoup/issues/278>

 * Limit how far up the stack the formatting adoption agency algorithm will travel, to prevent the chance of a run-away
   parse when the HTML stack is hopelessly deep.
   <https://github.com/jhy/jsoup/issues/234>

 * Modified Element.text() to build text by traversing child nodes rather than recursing. This avoids stack-overflow
   errors when the DOM is very deep and the VM stack-size is low.
   <https://github.com/jhy/jsoup/issues/271>

*** Release 1.7.1 [2012-Sep-23]
 * Improved parse time, now 2.3x faster than previous release, with lower memory consumption.

 * Reduced memory consumption when selecting elements.

 * Introduced finer granularity of exceptions in Jsoup.connect, including HttpStatusException and
   UnsupportedMimeTypeException.
   <https://github.com/jhy/jsoup/issues/229>

 * Fixed an issue when determining the Windows-1254 character-set from a meta tag when run in the Turkish locale.
   <https://github.com/jhy/jsoup/issues/191>

 * Fixed whitespace preservation in <textarea> tags.
   <https://github.com/jhy/jsoup/issues/167>

 * In jsoup.connect, fail faster if the return content type is not supported.
   <https://github.com/jhy/jsoup/issues/153>

 * In jsoup.clean, allow custom OutputSettings, to control pretty printing, character set, and entity escaping.
   <https://github.com/jhy/jsoup/issues/148>

 * Fixed an issue that prevented frameset documents from being cleaned by the Cleaner.
   <https://github.com/jhy/jsoup/issues/154>

 * Fixed an issue when normalising whitespace for strings containing high-surrogate characters.
   <https://github.com/jhy/jsoup/issues/214>

 * If a server doesn't specify a content-type header, treat that as OK.
   <https://github.com/jhy/jsoup/issues/213>

 * If a server returns an unsupported character-set header, attempt to decode the content with the default charset
   (UTF8), instead of bailing with an unsupported charset exception.
   <https://github.com/jhy/jsoup/issues/215>

 * Removed an unnecessary synchronisation in Tag.valueOf, allowing multi-threaded parsing to run faster.
   <https://github.com/jhy/jsoup/issues/238>

 * Made entity decoding less greedy, so that non-entities are less likely to be incorrectly treated as entities.
   <https://github.com/jhy/jsoup/issues/224>

 * Whitespace normalise document.title() output.
   <https://github.com/jhy/jsoup/issues/168>

 * In Jsoup.connection, enforce a connection disconnect after every connect. This precludes keep-alive connections to
   the same host, but in practice many implementations will leak connections, particularly on error.

*** Release 1.6.3 [2012-May-28]
 * Fixed parsing of group-or commas in CSS selectors, to correctly handle sub-queries containing commas.
   <https://github.com/jhy/jsoup/issues/179>

 * If a node has no parent, return null on previousSibling and nextSibling instead of throwing a null pointer exception.
   <https://github.com/jhy/jsoup/issues/184>

 * Updated Node.siblingNodes() and Element.siblingElements() to exclude the current node (a node is not its own sibling).

 * Fixed HTML entity parser to correctly parse entities like frac14 (letter + number combo).
   <https://github.com/jhy/jsoup/issues/145>

 * Fixed issue where contents of a script tag within a comment could be incorrectly parsed.
   <https://github.com/jhy/jsoup/issues/115>

 * Fixed GAE support: load HTML entities from a file on startup, instead of embedding in the class.

 * Fixed NPE when HTML fragment parsing a <style> tag
   <https://github.com/jhy/jsoup/issues/189>

 * Fixed issue with :all pseudo-tag in HTML sanitizer when cleaning tags previously defined in safelist
   <https://github.com/jhy/jsoup/issues/156>

 * Fixed NPE in Parser.parseFragment() when context parameter is null.
   <https://github.com/jhy/jsoup/issues/195>

 * In HTML Safelists, when defining allowed attributes for a tag, automatically add the tag to the allowed list.

*** Release 1.6.2 [2012-Mar-27]
 * Added a simplified XML parsing mode, which can usefully parse valid and invalid XML, but does not enforce any HTML
   document structure or special tag behaviour.

 * Added the optional ability to track errors when tokenising and parsing.

 * Added jsoup.connect.cookies(Map) method, to set multiple cookies at once, possibly from a prior request.

 * Added Element.textNodes() and Element.dataNodes(), to easily access an element's children text nodes and data nodes.

 * Added an example program that demonstrates how to format HTML as plain-text, and the use of the NodeVisitor interface.

 * Added Node.traverse() and Elements.traverse() methods, to iterate through a node's descendants.

 * Updated jsoup.connect so that when requests made as POSTs are redirected, the redirect is followed as a GET.
   <https://github.com/jhy/jsoup/issues/120>

 * Updated the Cleaner and Safelists to optionally preserve relative links in elements, instead of converting them
   to absolute links.

 * Updated the Cleaner to support custom allowed protocols such as "cid:" and "data:".
   <https://github.com/jhy/jsoup/issues/127>

 * Updated handling of <base href> tags, to act on only the first one seen when parsing, to align with modern browsers.

 * Updated Node.setBaseUri(), to recursively set on all the node's descendants.

 * Fixed handling of null characters within comments.
   <https://github.com/jhy/jsoup/issues/121>

 * Tweaked escaped entity detection in attributes to not treat &entity_... as an entity form.
   <https://github.com/jhy/jsoup/issues/129>

 * Fixed doctype tokeniser to allow whitespace between name and public identifier.

 * Fixed issue where comments within a table tag would be duplicate-fostered into body.
   <https://github.com/jhy/jsoup/pull/165>

 * Fixed an issue where a spurious byte-order-mark at the start of a document would cause the parser to miss head
   contents.
   <https://github.com/jhy/jsoup/issues/134>

 * Fixed an issue where content after a frameset could cause a NPE crash. Now correctly implements spec and ignores
   the trailing content.
   <https://github.com/jhy/jsoup/issues/162>

 * Tweaked whitespace checks to align with HTML spec
   <https://github.com/jhy/jsoup/pull/175>

 * Tweaked HTML output of closing script and style tags to not add an extraneous newline when pretty-printing.

 * Substantially reduced default memory allocation within Node.outerHtml, to reduce memory pressure when serialising
   smaller DOMs.
   <https://github.com/jhy/jsoup/issues/143>

*** Release 1.6.1 [2011-Jul-02]
 * Fixed Java 1.5 compatibility.
   <https://github.com/jhy/jsoup/issues/103>

 * Fixed an issue when parsing <script> tags in body where the tokeniser wouldn't switch to the InScript state, which
   meant that data wasn't parsed correctly.
   <https://github.com/jhy/jsoup/issues/104>

 * Fixed an issue with a missing quote when serialising DocumentType nodes.
   <https://github.com/jhy/jsoup/issues/109>

 * Fixed issue where a single 0 character was lexed incorrectly as a null character.
   <https://github.com/jhy/jsoup/issues/107>

 * Fixed normalisation of carriage returns to newlines on input HTML.
   <https://github.com/jhy/jsoup/issues/110>

 * Disabled memory mapped files when loading files from disk, to improve compatibility in Windows environments.

*** Release 1.6.0 [2011-Jun-13]
 * HTML5 conformant parser. Complete reimplementation of HTML tokenisation and parsing, to implement the
   http://whatwg.org/html spec. This ensures jsoup parses HTML identically to current modern browsers.

 * When parsing files from disk, files are loaded via memory mapping, to increase parse speed.

 * Reduced memory overhead and lowered garbage collector pressure with Attribute, Node and Element model optimisations.

 * Improved "abs:" absolute URL handling in Elements.attr("abs:href") and Node.hasAttr("abs:href").
   <https://github.com/jhy/jsoup/issues/97>

 * Fixed cookie handling issue in jsoup.Connect where empty cookies would cause a validation exception.
   <https://github.com/jhy/jsoup/issues/87>

 * Added jsoup.Connect configuration options to allow HTTP errors to be ignored, and the content-type to be ignored.
   Contributed by Jesse Piascik (piascikj)
   <https://github.com/jhy/jsoup/pull/78>

 * Added Node.before(node) and Node.after(node), to allow existing nodes to be moved, or new nodes to be inserted, into
   precise DOM positions.

 * Added Node.unwrap() and Elements.unwrap(), to remove a node but keep its contents. Useful for e.g. removing unwanted
   formatting tags.
   <https://github.com/jhy/jsoup/issues/100>

 * Now handles unclosed <title> tags in document by breaking out of the title at the next start tag, instead of
   eating up to the end of the document.
   <https://github.com/jhy/jsoup/issues/82>

 * Added OSGi bundle support to the jsoup package jar.
   <https://github.com/jhy/jsoup/issues/98>

*** Release 1.5.2 [2011-Feb-27]
 * Fixed issue with selector parser where some boolean AND + OR combined queries
   (e.g. "meta[http-equiv], meta[content]") were being parsed incorrectly as OR-only queries
   (e.g. the former as "meta, [http-equiv], meta[content]").

 * Fixed issue where a content-type specified in a meta tag may not be reliably detected, due to the above issue.

 * Updated Element.text() and Element.ownText() methods to ensure <br> tags output as whitespace.

 * Tweaked Element.outerHtml() method to not generate initial newline on first output element.

*** Release 1.5.1 [2011-Feb-19]

 * Integrated new single-pass selector evaluators, contributed by knz (Anton Kazennikov). This significantly speeds up
   the execution of combined selector queries.

 * Implemented workaround to fix Scala support. Contributed by bbeck (Brandon Beck).

 * Added ability to change an element's tag with Element.tagName(String), and to change many at once
   with Elements.tagName(String).

 * Added Node.wrap(html), Node.before(html), and Node.after(html), to allow HTML to be easily added to all nodes. These
   functions were previously supported on Elements only.

 * Added TextNode.splitText(index), which allows a text node to be split into two nodes at a specified index point.
   This is convenient if you need to surround some text in an element.

 * Updated Jsoup.Connection so that cookies set on a redirect response will be included on both the redirected request
   and response.

 * Infinite redirection loops in Jsoup.Connect are now prevented.

 * Allow Jsoup.Connect to parse application/xml and application/xhtml+xml responses.

 * Modified Jsoup.Connect to always follow relative links, regardless of the underlying HTTP sub-system.

 * Defined U (underline) element as an inline tag.

 * Force strict entity matching (must be &xxx; and not &xxx) in element attributes.

 * Implemented clone method for Elements (contributed by knz).

 * Fixed tokeniser optimisation when scanning for missing data element close tags.

 * Fixed issue when using descendant regex attribute selectors.

*** Release 1.4.1 [2010-Nov-23]

 * Added ability to load and parse HTML from an input stream.

 * Implemented Node.clone() to create deep, independent copies of Nodes, Elements, and Documents.

 * Added :not() selector, to find elements that do not match the selector. E.g. div:not(.logo) finds divs that
   do not have the "logo" class name.

 * Added Elements.not(selector) method, to remove undesired results from selector results.

 * Implemented DataNode.setWholeData() to allow updating of script and style data contents.

 * Relaxed parse rules of H1 - H6, to allow nested content. This is against spec, but matches browser and publisher
   behaviour.

 * Relaxed parse rule of SPAN to treat as block, to allow nested block content.

 * Fixed issue in jsoup.connect when extracting character set from content-type header; now supports quoted
   charset declaration.

 * Fixed support for jsoup.connect to follow redirects between http & https URLs.

 * Document normalisation now more enthusiastically enforces the correct document structure.

 * Support node.outerHtml() method when node has no parent (e.g. when it has been removed from its DOM tree)

 * Fixed support for HTML entities with numbers in name (e.g. &frac34;, &sup1;).

 * Fixed absolute URL generation from relative URLs which are only query strings.

*** Release 1.3.3 [2010-Sep-19]
 * Implemented Elements.empty() and Elements.remove().
   This allows easy element removal, like:
   doc.select("iframe").remove();

 * Fixed issue in Entities when unescaping $ ("$")
   <http://github.com/jhy/jsoup/issues/issue/34>

 * Added restricted XHTML output entity option
   <http://github.com/jhy/jsoup/issues/issue/35>

*** Release 1.3.2 [2010-Aug-30]
 * Treat HTTP headers as case insensitive in Jsoup.Connection. Improves compatibility for HTTP responses.

 * Improved malformed table parsing by implementing ignorable end tags.

*** Release 1.3.1 [2010-Aug-23]
 * Removed dependency on Apache Commons-lang. Jsoup now has no external dependencies.

 * Added new Connection implementation, to enable easier and richer HTTP requests that parse to Documents. This includes
   support for gzip responses, cookies, headers, data parameters, user-agent, referrer, etc.

 * Added Element.ownText() method, to get only the direct text of an element, not including the text of its children.

 * Added support for selectors :containsOwn(text) and :matchesOwn(regex), to supplement Element.ownText().

 * Added support for non-pretty-printed HTML output, to more closely mirror the input HTML.

 * Further speed optimisations for parsing and output generation.

 * Fixed support for case-sensitive HTML escape entities.
   <http://github.com/jhy/jsoup/issues/issue/31>

 * Fixed issue when parsing tags with keyless attributes.
   <http://github.com/jhy/jsoup/issues/issue/32>

*** Release 1.2.3 [2010-Aug-04]
 * Added support for automatic input character set detection and decoding. Jsoup now automatically detects the encoding
   character set when parsing HTML from a File or URL. The parser checks the content-type header, then the
   <meta http-equiv> or <meta charset> tag, and finally falls back to UTF-8.

 * Added ability to configure the document's output charset, to control which characters are HTML escaped, and which
   are kept intact. The output charset defaults to the document's input charset. This simplifies non-ascii output.

 * Added full support for all new HTML5 tags.

 * Added support for HTML5 dataset custom data attributes, with the Element.dataset() map.

 * Added support for the [^attributePrefix] selector query, to find elements with attributes starting with a prefix.
   Useful for finding elements with datasets: [^data-] matches <p data-name="jsoup">

 * Added support for namespaced elements (<fb:name>) and selectors to find them (fb|name)

 * Implemented Node.ownerDocument DOM method

 * Improved implicit table element handling (particularly around thead, tbody, and tfoot).

 * Improved HTML output format for empty elements and auto-detected self closing tags

 * Changed DT & DD tags to block-mode tags, to follow practice over spec

 * Added support for tag names with - and _ (<abc_foo>, <abc-foo>)

 * Handle tags with internal trailing space (<foo >)

 * Fixed support for character class regular expressions in [attr~=regex] selector

*** Release 1.2.2 [2010-Jul-11]

 * Performance optimisation:
    - core HTML parser engine now 3.5 times faster
    - HTML generator now 2.5 times faster
    - much lower memory use and garbage collection time

 * Added support for :matches(regex) selector, to find elements containing text matching regular expression

 * Added support for [key~=regex] attribute selector, to find elements with attribute values matching regular expression

 * Upgraded the selector query parser to allow nested selectors like 'div:has(p:matches(regex))'

*** Release 1.2.1 [2010-Jun-21]
 * Added .before(html) and .after(html) methods to Element and Elements, to insert sibling HTML

 * Added :contains(text) selector, to search for elements containing the specified text

 * Added :has(selector) pseudo-selector
   <http://github.com/jhy/jsoup/issues/issue/20>

 * Added Element#parents and Elements#parents to retrieve an element's ancestor chain
   <http://github.com/jhy/jsoup/issues/issue/20>

 * Fixes an issue where appending / prepending rows to a table (or to similar implicit
   element structures) would create redundant wrapping elements
   <http://github.com/jhy/jsoup/issues/issue/21>

 * Improved implicit close tag heuristic detection when parsing malformed HTML

 * Fixes an issue where text content after a script (or other data-node) was
   incorrectly added to the data node.
   <http://github.com/jhy/jsoup/issues/issue/22>

 * Fixes an issue where text order was incorrect when parsing pre-document
   HTML.
   <http://github.com/jhy/jsoup/issues/issue/23>

*** Release 1.1.1 [2010-Jun-08]
 * Added selector support for :eq, :lt, and :gt
   <http://github.com/jhy/jsoup/issues/issue/16>

 * Added TextNode#text and TextNode#text(String)
   <http://github.com/jhy/jsoup/issues/issue/18>

 * Throw exception if trying to parse non-text content
   <http://github.com/jhy/jsoup/issues/issue/17>

 * Added Node#remove and Node#replaceWith
   <http://github.com/jhy/jsoup/issues/issue/19>

 * Allow _ and - in CSS ID selectors (per CSS spec).
   <http://github.com/jhy/jsoup/issues/issue/10>

 * Relative links are resolved to absolute when cleaning, to normalize
   output and to verify safe protocol. (Were previously discarded.)
   <http://github.com/jhy/jsoup/issues/issue/12>

 * Allow combinators at start of selector query, for query refinements
   <http://github.com/jhy/jsoup/issues/issue/13>

 * Added Element#val() and #val(String) methods, for form values
   <http://github.com/jhy/jsoup/issues/issue/14>

 * Changed textarea contents to parse as TextNodes, not DataNodes,
   so that the contents are visible to text() (and to val(), as a textarea is treated as a form input)

 * Fixed support for Java 1.5

*** Release 0.3.1 (2010-Feb-20)
 * New features: supports Elements#html(), html(String),
   prepend(String), append(String); bulk methods for corresponding
   methods in Element.

 * New feature: Jsoup.isValid(html, safelist) method for user input
   form validation.

 * Improved Elements.attr(String) to find first matching element
   with attribute.

 * Fixed assertion error when cleaning HTML with empty attribute
   <http://github.com/jhy/jsoup/issues/issue/7>

*** Release 0.2.2 (2010-Feb-07)
 * jsoup packages are now available in the Maven central repository.

 * New feature: supports Element#addClass, removeClass, toggleClass;
   also collection class methods on Elements.
 * New feature: supports Element#wrap(html) and Elements#wrap(html).
 * New selector syntax: supports E + F adjacent sibling selector
 * New selector syntax: supports E ~ F preceding sibling selector
 * New: supports Element#elementSiblingIndex()

 * Improved document normalisation.
 * Improved HTML string output format (pretty-print)

 * Fixed absolute URL resolution issue when a base tag has no href.

*** Release 0.1.2 (2010-Feb-02)
 * Fixed unrecognised tag handler to be more permissive
   <http://github.com/jhy/jsoup/issues/issue/1>


*** Release 0.1.1 (2010-Jan-31)
 * Initial beta release of jsoup