3 Contains change notes for versions 0.1.1 (2010-Jan-31) through 1.17.1 (2023-Nov-27).
6 Release 1.17.1 [27-Nov-2023]
7 …* Improvement: in Jsoup.connect(), added support for request-level authentication, supporting auth…
23 …* Improvement: added the `:is(selector list)` pseudo-selector, which finds elements that match any…
56 …* Bugfix: in a sub-query such as `p:has(> span, > i)`, combinators following the `,` Or combinator…
57 incorrectly skipped, such that the sub-query was parsed as `i` instead of `> i`.
67 …* Bugfix: when cleaning a document, the output style of unknown self-closing tags from the input w…
68 the output. (So a <foo /> in the input, if safe-listed, would be output as <foo></foo>.)
74 …* Build Improvement: added tests for HTTPS request support, using a local self-signed cert. Includ…
84 Release 1.16.2 [20-Oct-2023]
85 …* Improvement: optimized the performance of complex CSS selectors, by adding a cost-based query pl…
88 complex evaluations (such as an attribute regex, or a deep child scan with a :has).
100 (any preceding sibling) and `:nth-of-type` selectors are improved.
104 …g, firstElementChild, and lastElementChild. They now inplace filter/skip in the child-node list, vs
107 …mized internal methods that previously called Element.children() to use filter/skip child-node list
112 …* Improvement: when using the `:empty` pseudo-selector, blank textnodes are now considered empty. …
122 …* Bugfix: `form` elements and empty elements (such as `img`) did not have their attributes de-dupl…
157 ISO_8859_1 string, and re-encoded as UTF-8. The value is now left as-is.
161 previously required compatibility shim prior to Android's de-sugaring support.
168 without script-mode enabled.
170 Release 1.16.1 [29-Apr-2023]
175 …* Improvement: Calling Node.remove() on a node with no parent is now a no-op, vs a validation erro…
201 …* Bugfix: when pretty-printing, the first inline Element or Comment in a block would not be wrap-i…
205 …* Bugfix: when pretty-printing a <pre> containing block tags, those tags were incorrectly indented.
208 …* Bugfix: when pretty-printing nested inlineable blocks (such as a <p> in a <td>), the inner eleme…
212 * Bugfix: <br> tags should be wrap-indented when in block tags (and not when in inline tags).
215 …* Bugfix: the contents of a sufficiently large <textarea> with un-escaped HTML closing tags may be…
219 Release 1.15.4 [18-Feb-2023]
224 * Improvement: when pretty-printing, wrap text that follows a <br> tag.
227 …* Improvement: when pretty-printing, normalize newlines that follow self-closing tags in custom ta…
230 …* Improvement: when pretty-printing, collapse non-significant whitespace between a block and an in…
258 …form data when executing multi-step form submissions, or data sent to later requests incorrectly. …
259 …only copies session related settings (cookies, proxy settings, user-agent, etc) but not the reques…
264 the ":all" pseudo-attribute.
273 Release 1.15.3 [2022-Aug-24]
276 <https://github.com/jhy/jsoup/security/advisories/GHSA-gp7f-rwcx-9369>
278 …* Improvement: the Cleaner will preserve the source position of cleaned elements, if source tracki…
293 *** Release 1.15.2 [2022-Jul-04]
305 …* Improvement: when pretty-printing HTML, doctypes are emitted on a newline if there is a precedin…
308 …* Improvement: when pretty-printing, trim the leading and trailing spaces of textnodes in block ta…
325 …* Bugfix: when pretty-print serializing HTML, newlines separating phrasing content (e.g. a <span> …
330 *** Release 1.15.1 [2022-May-15]
334 …* Improvement: when converting jsoup Documents to W3C Documents in W3CDom, preserve HTML valid att…
338 …* Improvement: added the :containsWholeText(text) selector, to match against non-normalized Elemen…
342 …* Improvement: added Element#wholeOwnText() to retrieve the original (non-normalized) ownText of a…
347 …* Improvement: added the :matchesWholeText(regex) and :matchesWholeOwnText(regex) selectors, to ma…
348 (non-normalized, case sensitive) element text and own text, respectively.
352 …query, vs only the context element's sub-tree. This enables support for queries outside (parent or…
353 element, e.g. ancestor-or-self::*.
357 …30 to limit the indent level for very deeply nested elements, and may be disabled by setting to -1.
361 as to preserve applicable settings, such as the Pretty Print settings.
377 …ix: boolean attribute names should be case-insensitive, but were not when the parser was configure…
385 * Bugfix: a comment with all dashes (<!----->) should not emit a parse error.
397 …* Bugfix: when copy-creating a Safelist from another, perform a deep-copy of the original's settin…
409 *** Release 1.14.3 [2021-Sep-30]
444 …* Bugfix: the OSGi bundle meta-data incorrectly set a version on the import of javax.annotation (u…
451 …* Bugfix: when the HTML parser was configured to preserve case, Element text methods would miss ad…
468 *** Release 1.14.2 [2021-Aug-15]
469 * Improvement: support Pattern.quote \Q and \E escapes in the selector regex matchers.
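As a reminder of what the `\Q` and `\E` escapes do, here is a minimal sketch using only `java.util.regex` (it does not call jsoup). The class name `QuoteEscapes` and the helper `containsLiteral` are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

class QuoteEscapes {
    // Returns true if `text` contains `literal` verbatim, with regex
    // metacharacters in `literal` treated as plain characters.
    static boolean containsLiteral(String text, String literal) {
        // Pattern.quote wraps the literal in \Q ... \E, so ".", "(" and ")"
        // lose their special meaning inside the quoted region.
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("Price: 1.5 (USD)", "1.5 (USD)"));
        System.out.println(containsLiteral("Price: 1x5 (USD)", "1.5 (USD)"));
    }
}
```

Without the quoting, the `.` in the literal would also match the `x` in the second input, and the parentheses would be read as a capturing group.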
502 * Bugfix: tag names must start with an ascii-alpha character.
517 …* Bugfix [Fuzz]: Speed optimized malformed HTML creating elements with thousands of elements - lim…
518 count per element when parsing to 512 (in real-world HTML, P99 is ~ 8).
528 …* Bugfix [Fuzz]: Speed improvement when the stack was thousands of items deep, and non-matching cl…
536 deep in stack.
539 …* Bugfix [Fuzz]: Fix a potential stack-overflow in the parser given crafted HTML, when the parser …
547 formatting elements that will be cloned when mis-nested is now capped to 12.
550 *** Release 1.14.1 [2021-Jul-10]
570 …Cookies are re-implemented to correctly support path and domain filtering when used within a sessi…
571 …in-memory cookie store is used for the session, or a custom implementation (perhaps disk-persisten…
577 …The session is multi-thread safe and can execute multiple requests concurrently. If the user accid…
609 * Improvement: when parsing XML, disable pretty-printing by default.
624 …* Build Improvement: integrated jsoup into the OSS Fuzz project, which semi-randomly generates mil…
645 …* Bugfix: in HttpConnection.Request, headers beginning with "sec-" (e.g. Sec-Fetch-Mode) were sile…
664 …provided and ignore normal HTML tree-building rules. This allows for e.g. a div tag to be placed i…
669 …en creating a selector for an element with Element#cssSelector, if the element used a non-unique ID
696 …* Bugfix: [Fuzz] fixed a potential Stack Overflow when parsing mis-nested tfoot tags, and updated …
700 …* Bugfix: [Fuzz] fixed a potentially slow HTML parse when tags are nested extremely deep (e.g. 88K…
701 the formatting tag search depth to 256. In practice, it's generally between 4 - 8.
708 *** Release 1.13.1 [2020-Feb-29]
721 * Improvement: when pretty-printing, comments in inline tags are not pushed to a newline
729 …* Improvement: added Element#select(Evaluator) and Element#selectFirst(Evaluator), to allow re-use…
736 * Improvement: preserve whitespace before html and head tag, if pretty-printing is off.
745 …* Bugfix: empty tags and form tags did not have their attributes normalized (lower-cased by defaul…
748 …* Bugfix: when preserve case was set to on, the HTML pretty-print formatter didn't indent capitali…
750 …ure that script and style contents are parsed into DataNodes, not TextNodes, when in case-sensitive
753 **** Release 1.12.2 [2020-Feb-08]
765 …content if they have not set it, but still in sensible bounds. Also updated the default user-agent…
790 aware (HTML case-insensitive names, XML are case-sensitive).
803 …* Bugfix: don't strip out zero-width-joiners (or zero-width-non-joiners) when normalizing text. Th…
807 …* Bugfix: Evaluator.TagEndsWith (namespaced elements) and Tag disagreed in case-sensitivity. Now c…
808 case-insensitively.
814 * Bugfix: HTML parser adds redundant text when parsing self-closing textarea.
833 **** Release 1.12.1 [2019-May-12]
863 …* Improvement: allow forms to be submitted with Content-Type=multipart/form-data without requiring…
906 * Updated jetty-server (which is used for integration tests) to latest 9.2 series (9.2.28).
908 *** Release 1.11.3 [2018-Apr-15]
910 round-tripped into output HTML.
923 …* Improvement: character references from Windows-1252 that are not valid Unicode are mapped to the…
955 …* Bugfix: documents with a leading UTF-8 BOM did not have that BOM consumed, so it acted as a zero…
962 *** Release 1.11.2 [2017-Nov-19]
969 * Improvement: normalize invisible characters (like soft-hyphens) in Element.text().
972 …* Improvement: added Element.wholeText(), to easily get the un-normalized text value of an element…
975 …* Bugfix: in a deep DOM stack, a StackOverflowError could occur when generating implied end t…
984 * Bugfix: whitespace preserving tags were not honoured when nested more than two levels deep.
994 or UTF-8, an encoding exception could occur.
1003 *** Release 1.11.1 [2017-Nov-06]
1004 … level to Java 7 from Java 5. To maintain Android support (of minversion 8), try-with-resources are
1057 …* Bugfix: if a document was re-decoded after character set detection, the HTML parser was not rese…
1064 * Bugfix: self-closing tags for known empty elements were incorrectly treated as errors.
1067 …* Bugfix: fixed an issue where a self-closing title, noframes, or style tag would cause the rest o…
1071 * Bugfix: fixed an issue with unknown mixed-case tags
1087 *** Release 1.10.3 [2017-Jun-11]
1105 …ent.hasClass() and the ".classname" selector would not find the class attribute case-insensitively.
1115 …* Bugfix: In DataUtil when detecting the character set from meta data, and there are two Content-T…
1119 …* Bugfix: when parsing unknown tags in case-sensitive HTML mode, end tags would not close scope co…
1122 * In Jsoup.Connection, ensure there is no Content-Type set when being redirected to a GET.
1131 *** Release 1.10.2 [2017-Jan-02]
1162 …* Jsoup.Connect now detects if a header value is actually in UTF-8 vs the HTTP spec of ISO-8859, a…
1165 …* Bugfix: in Jsoup.Connect, URLs containing non-URL-safe characters were not encoded to URL safe c…
1177 *** Release 1.10.1 [2016-Oct-23]
1178 …* New feature: added the option to preserve case for tags and/or attributes, with ParseSettings. B…
1179 …inue to normalize tag names and attribute names to lower case, and the XML parser will now preserve
1209 * Fixed an OOB exception when loading an empty-body URL and parsing with the XML parser.
1218 …* Fixed an issue in connections with a requestBody where a custom content-type header could be ign…
1221 *** Release 1.9.2 [2016-May-17]
1222 …* Fixed an issue where tag names that contained non-ascii characters but started with an ascii cha…
1226 * In XML documents, detect the charset from the XML prolog - <?xml encoding="UTF-8"?>
1239 *** Release 1.9.1 [2016-Apr-16]
1259 * Added support for UTF-16 and UTF-32 character set detection from byte-order-marks (BOM).
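The general idea of byte-order-mark detection can be sketched as below. This is an illustration under stated assumptions, not jsoup's implementation: the class `BomSniffer`, the method `detect`, and the exact charset names returned are all choices made for this example.

```java
class BomSniffer {
    // Returns the charset name implied by a leading byte-order mark,
    // or null when the data does not start with a recognised BOM.
    static String detect(byte[] data) {
        // UTF-32 is checked before UTF-16 because the UTF-32LE BOM
        // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the first bytes of `data` (as unsigned values) to `prefix`.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

The ordering matters: checking the two-byte UTF-16 marks first would misreport UTF-32LE input as UTF-16LE.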
1262 * Added support for tags with non-ascii (unicode) letters.
1273 * Added not-null validators to Element.appendText() and Element.prependText()
1280 …* Reverted Node.equals() and Node.hashCode() back to identity (object) comparisons, as deep conten…
1288 *** Release 1.8.3 [2015-Aug-02]
1296 …Lollipop, ~ 1.3x faster. Improvements largely from re-ordering the HtmlTreeBuilder methods based o…
1323 *** Release 1.8.2 [2015-Apr-13]
1325 speed increase. For non-Android JREs, around 1.1x to 1.2x.
1332 on non-Android JREs.
1335 MIME multipart/form-data encoding.
1337 …* Add a meta-charset element to documents when setting the character set, so that the document's c…
1348 * Added option in Cleaner Safelist to allow linking to in-page anchors (#)
1359 * Added support for overriding the default POST character set of UTF-8
1386 …* Fixed performance issue when parsing HTML with elements with many children that need re-parentin…
1390 UnsupportedCharsetException, instead of falling back to the default UTF-8 charset.
1393 …ue where Jsoup.Connection would throw an IO Exception when reading a page with zero content-length.
1402 *** Release 1.8.1 [2014-Sep-27]
1418 * If pretty-print is disabled, don't trim outer whitespace in Element.html()
1429 …* Fixed an issue where a UTF-8 BOM character was not detected if the HTTP response did not specify…
1431 …when the UTF-8 BOM is detected, it will take precedence for determining the charset to decode with.
1437 * Fixed an issue in parsing a base URI when loading a URL containing a http-equiv element.
1450 * Fixed support for nth-of-type selectors with unknown tags.
1463 *** Release 1.7.3 [2013-Nov-10]
1477 * Fixed support for self-closing script tags.
1483 …* Fixed an issue where elements added via the adoption agency algorithm did not preserve their att…
1486 …* Fixed an issue when cloning a document with extremely nested elements that could cause a stack-o…
1495 *** Release 1.7.2 [2013-Jan-27]
1499 …t for structural pseudo CSS selectors, including :first-child, :last-child, :nth-child, :nth-last-…
1500 …:first-of-type, :last-of-type, :nth-of-type, :nth-last-of-type, :only-child, :only-of-type, :empty…
1514 * When parsing in XML mode, preserve XML declarations (<?xml ... ?>).
1527 …* When parsing, allow all tags to self-close. Tags that aren't expected to self-close will get an …
1538 …the stack the formatting adoption agency algorithm will travel, to prevent the chance of a run-away
1539 parse when the HTML stack is hopelessly deep.
1542 …nt.text() to build text by traversing child nodes rather than recursing. This avoids stack-overflow
1543 errors when the DOM is very deep and the VM stack-size is low.
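The traverse-instead-of-recurse approach described here can be illustrated with a small sketch. It assumes a hypothetical minimal `Node` type rather than jsoup's own classes, and shows only the general technique: the pending nodes live in a heap-allocated stack, so tree depth no longer consumes VM call-stack frames.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IterativeText {
    // Hypothetical minimal node type, standing in for a DOM node.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Collects text in document order using an explicit stack
    // instead of one recursive call per tree level.
    static String text(Node root) {
        StringBuilder sb = new StringBuilder();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            sb.append(node.text);
            // Push children in reverse so the first child is visited first.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(node.children.get(i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = new Node("a")
                .add(new Node("b").add(new Node("c")))
                .add(new Node("d"));
        System.out.println(text(root));
    }
}
```

A chain of several hundred thousand nested nodes can be walked this way; a naive recursive version would be at risk of a `StackOverflowError` on a small VM stack.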
1546 *** Release 1.7.1 [2012-Sep-23]
1555 …* Fixed an issue when determining the Windows-1254 character-set from a meta tag when run in the T…
1570 * Fixed an issue when normalising whitespace for strings containing high-surrogate characters.
1573 * If a server doesn't specify a content-type header, treat that as OK.
1576 …* If a server returns an unsupported character-set header, attempt to decode the content with the …
1580 …* Removed an unnecessary synchronisation in Tag.valueOf, allowing multi-threaded parsing to run fa…
1583 …* Made entity decoding less greedy, so that non-entities are less likely to be incorrectly treated…
1589 …* In Jsoup.connection, enforce a connection disconnect after every connect. This precludes keep-al…
1592 *** Release 1.6.3 [2012-May-28]
1593 …* Fixed parsing of group-or commas in CSS selectors, to correctly handle sub-queries containing co…
1612 …* Fixed issue with :all pseudo-tag in HTML sanitizer when cleaning tags previously defined in safe…
1620 *** Release 1.6.2 [2012-Mar-27]
1630 …* Added an example program that demonstrates how to format HTML as plain-text, and the use of the …
1637 …* Updated the Cleaner and Safelists to optionally preserve related links in elements, instead of c…
1655 * Fixed issue where comments within a table tag would be duplicate-fostered into body.
1658 …* Fixed an issue where a spurious byte-order-mark at the start of a document would cause the parse…
1669 …HTML output of closing script and style tags to not add an extraneous newline when pretty-printing.
1675 *** Release 1.6.1 [2011-Jul-02]
1694 *** Release 1.6.0 [2011-Jun-13]
1708 …* Added jsoup.Connect configuration options to allow HTTP errors to be ignored, and the content-ty…
1726 *** Release 1.5.2 [2011-Feb-27]
1727 …* Fixed issue with selector parser where some boolean AND + OR combined queries (e.g. "meta[http-e…
1728 …were being parsed incorrectly as OR only queries (e.g. former as "meta, [http-equiv], meta[content…
1730 …* Fixed issue where a content-type specified in a meta tag may not be reliably detected, due to th…
1736 *** Release 1.5.1 [2011-Feb-19]
1738 …* Integrated new single-pass selector evaluators, contributed by knz (Anton Kazennikov). This sign…
1759 …* Modified Jsoup.Connect to always follow relative links, regardless of the underlying HTTP sub-sy…
1769 * Fixed issue when using descendant regex attribute selectors.
1771 *** Release 1.4.1 [2010-Nov-23]
1775 * Implemented Node.clone() to create deep, independent copies of Nodes, Elements, and Documents.
1784 …* Relaxed parse rules of H1 - H6, to allow nested content. This is against spec, but matches brows…
1789 …* Fixed issue in jsoup.connect when extracting character set from content-type header; now support…
1802 *** Release 1.3.3 [2010-Sep-19]
1812 *** Release 1.3.2 [2010-Aug-30]
1817 *** Release 1.3.1 [2010-Aug-23]
1818 * Removed dependency on Apache Commons-lang. Jsoup now has no external dependencies.
1821 support for gzip responses, cookies, headers, data parameters, user-agent, referrer, etc.
1825 …* Added support for selectors :containsOwn(text) and :matchesOwn(regex), to supplement Element.own…
1827 * Added support for non-pretty-printed HTML output, to more closely mirror the input HTML.
1831 * Fixed support for case-sensitive HTML escape entities.
1837 *** Release 1.2.3 [2010-Aug-04]
1839 …character set when parsing HTML from a File or URL. The parser checks the content-type header, the…
1840 <meta http-equiv> or <meta charset> tag, and finally falls back to UTF-8.
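The fallback order named in this entry (content-type header, then meta tag, then UTF-8) might be sketched as follows. This is an assumption-laden illustration, not jsoup's code: the class `CharsetFallback`, its methods, and the regular expression are hypothetical, and the meta tag's charset is assumed to have been extracted already by the caller.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetFallback {
    // Matches e.g. "charset=ISO-8859-1" or charset="utf-8", case-insensitively.
    private static final Pattern CHARSET =
            Pattern.compile("(?i)\\bcharset=\\s*[\"']?([^\\s;\"']+)");

    // Extracts the charset parameter from a Content-Type value, or null.
    static String fromContentType(String contentType) {
        if (contentType == null) return null;
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    // Header first, then the charset found in a meta tag, then UTF-8.
    static String choose(String contentTypeHeader, String metaCharset) {
        String fromHeader = fromContentType(contentTypeHeader);
        if (fromHeader != null) return fromHeader;
        if (metaCharset != null && !metaCharset.isEmpty()) return metaCharset;
        return "UTF-8";
    }
}
```

For instance, `choose("text/html", "windows-1252")` falls through to the meta value, while `choose(null, null)` ends at the UTF-8 default.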
1843 …act. The output charset defaults to the document's input charset. This simplifies non-ascii output.
1850 Useful for finding elements with datasets: [^data-] matches <p data-name="jsoup">
1858 * Improved HTML output format for empty elements and auto-detected self closing tags
1860 * Changed DT & DD tags to block-mode tags, to follow practice over spec
1862 * Added support for tag names with - and _ (<abc_foo>, <abc-foo>)
1866 * Fixed support for character class regular expressions in [attr~=regex] selector
1868 *** Release 1.2.2 [2010-Jul-11]
1871 - core HTML parser engine now 3.5 times faster
1872 - HTML generator now 2.5 times faster
1873 - much lower memory use and garbage collection time
1875 …* Added support for :matches(regex) selector, to find elements containing text matching regular ex…
1877 …* Added support for [key~=regex] attribute selector, to find elements with attribute values matchi…
1879 * Upgraded the selector query parser to allow nested selectors like 'div:has(p:matches(regex))'
1881 *** Release 1.2.1 [2010-Jun-21]
1886 * Added :has(selector) pseudo-selector
1898 * Fixes an issue where text content after a script (or other data-node) was
1902 * Fixes an issue where text order was incorrect when parsing pre-document
1906 *** Release 1.1.1 [2010-Jun-08]
1913 * Throw exception if trying to parse non-text content
1919 * Allow _ and - in CSS ID selectors (per CSS spec).
1937 *** Release 0.3.1 (2010-Feb-20)
1951 *** Release 0.2.2 (2010-Feb-07)
1962 * Improved HTML string output format (pretty-print)
1966 *** Release 0.1.2 (2010-Feb-02)
1971 *** Release 0.1.1 (2010-Jan-31)