News about PCRE2 releases
-------------------------


Version 10.44 07-June-2024
--------------------------

This is mostly a bug-fix and tidying release. There is one new function, to set
a maximum size for a compiled pattern. The maximum name length for groups is
increased to 128. Some auxiliary files for building under VMS are added.


Version 10.43 16-February-2024
------------------------------

There are quite a lot of changes in this release (see ChangeLog and git log for
a list). Those that are not bugfixes or code tidies are:

* The JIT code no longer supports ARMv5 architecture.

* A new function pcre2_get_match_data_heapframes_size() for finer heap control.

* New option flags to restrict the interaction between ASCII and non-ASCII
  characters for caseless matching and \d and friends. There are also new
  pattern constructs to control these flags from within a pattern.

* Upgrade to Unicode 15.0.0.

* Treat a NULL pattern with zero length as an empty string.

* Added support for limited-length variable-length lookbehind assertions, with
  a default maximum length of 255 characters (same as Perl) but with a function
  to adjust the limit.

* Support for LoongArch in JIT.

* Perl changed the meaning of (for example) {,3}, which was not previously
  recognized as a quantifier. Now it means {0,3} and PCRE2 has also changed.
  Note that {,} is still not a quantifier.

* Following Perl, allow spaces and tabs after { and before } in all Perl-
  compatible items that use braces, and also around commas in quantifiers. The
  one exception in PCRE2 is \u{...}, which is from ECMAScript, not Perl, and
  PCRE2 follows ECMAScript usage.

* Changed the meaning of \w and its synonyms and derivatives (\b and \B) in UCP
  mode to follow Perl. It now matches characters whose general categories are L
  or N or whose particular categories are Mn (non-spacing mark) or Pc
  (connector punctuation).

* Changed the default meaning of [:xdigit:] in UCP mode to follow Perl. It now
  matches the "fullwidth" versions of hex digits. PCRE2_EXTRA_ASCII_DIGIT can
  be used to keep it ASCII only.

* Make PCRE2_UCP the default in UTF mode in pcre2grep and add --no-ucp,
  --case-restrict and --posix-digit.

* Add --group-separator and --no-group-separator to pcre2grep.
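
The quantifier changes above ({,3} now meaning {0,3}, {,} remaining a literal,
and spaces or tabs being allowed after {, before } and around the comma) can be
illustrated with a small C sketch. It is not PCRE2's parser: the function name
parse_brace_quantifier() is hypothetical, and errors such as {5,2}, which PCRE2
rejects at compile time, are not diagnosed here.

```c
#include <ctype.h>
#include <stdbool.h>

/* Skip the spaces and tabs that 10.43 allows after '{', before '}',
   and around the comma of a quantifier. */
static const char *skip_blanks(const char *p) {
    while (*p == ' ' || *p == '\t') p++;
    return p;
}

/* Read an optional decimal number (no overflow checking in this sketch). */
static const char *read_number(const char *p, long *value, bool *found) {
    *value = 0;
    *found = false;
    while (isdigit((unsigned char)*p)) {
        *value = *value * 10 + (*p - '0');
        *found = true;
        p++;
    }
    return p;
}

/* Hypothetical helper: if p points at a brace quantifier, store its bounds
   (max == -1 means "no upper limit"), point *end just past the '}' and
   return true. Otherwise return false: the brace is then a literal. */
bool parse_brace_quantifier(const char *p, long *min, long *max,
                            const char **end) {
    long lo, hi;
    bool have_lo, have_hi;

    if (*p != '{') return false;
    p = skip_blanks(p + 1);
    p = read_number(p, &lo, &have_lo);
    p = skip_blanks(p);

    if (*p == '}') {              /* {n} means exactly n; {} is not a quantifier */
        if (!have_lo) return false;
        *min = *max = lo;
        *end = p + 1;
        return true;
    }
    if (*p != ',') return false;

    p = skip_blanks(p + 1);
    p = read_number(p, &hi, &have_hi);
    p = skip_blanks(p);
    if (*p != '}') return false;
    if (!have_lo && !have_hi) return false;   /* {,} is still a literal */

    *min = have_lo ? lo : 0;                  /* {,3} now means {0,3} */
    *max = have_hi ? hi : -1;                 /* {2,} means 2 or more */
    *end = p + 1;
    return true;
}
```

With these rules, "{ 2 , 5 }" yields the bounds 2 and 5, "{,3}" yields 0 and 3,
and "{,}" is not a quantifier, so its characters would be treated as literals.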


Version 10.42 11-December-2022
------------------------------

This is an unexpectedly early release to fix a problem that was introduced in
10.41. ChangeLog number 19 (GitHub #139) added the default definition of
PCRE2_CALL_CONVENTION to pcre2posix.c instead of pcre2posix.h, which meant that
programs including pcre2posix.h but not pcre2.h couldn't compile. A new test
that checks this case has been added.

A couple of other minor issues are fixed, and a patch for an intermittent JIT
fault is also included. See ChangeLog and the Git log.
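
The underlying problem is a common C header idiom: a header that uses a macro in
its declarations should supply its own fallback definition rather than relying
on a .c file or another header. A minimal sketch of the idiom follows; the
function name example_wrapper() is hypothetical, and the exact text of
pcre2posix.h may differ.

```c
/* Fallback definition placed in the header itself. If the including program
   has not defined PCRE2_CALL_CONVENTION (for example, because it never
   included pcre2.h), the macro expands to nothing and the declaration
   below still compiles. */
#ifndef PCRE2_CALL_CONVENTION
#define PCRE2_CALL_CONVENTION
#endif

/* Hypothetical declaration in the style of a POSIX wrapper header. */
int PCRE2_CALL_CONVENTION example_wrapper(int cflags);

/* Hypothetical definition, standing in for the library side. */
int PCRE2_CALL_CONVENTION example_wrapper(int cflags) {
    return cflags != 0;
}
```

A program that includes only such a header still sees a complete, compilable
declaration, which is the situation the new test checks.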


Version 10.41 06-December-2022
------------------------------

This is another mainly bug-fixing and code-tidying release. There is one
significant upgrade to pcre2grep: it now behaves like GNU grep when matching
more than one pattern and a later pattern matches at an earlier point in the
subject when the matched substrings are being identified by colour or by
offsets.


Version 10.40 15-April-2022
---------------------------

This is mostly a bug-fixing and code-tidying release. However, there are some
extensions to Unicode property handling:

* Added support for Bidi_Class and a number of binary Unicode properties,
  including Bidi_Control.

* A number of changes to script matching for \p and \P:

  (a) Script extensions for a character are now coded as a bitmap instead of
      a list of script numbers, which should be faster and does not need a
      loop.

  (b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms
      sc and scx).

  (c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being
      the same as \p{scx:scriptname} because this change happened in Perl at
      release 5.26.

  (d) The standard Unicode 4-letter abbreviations for script names are now
      recognized.

  (e) In accordance with Unicode and Perl's "loose matching" rules, spaces,
      hyphens, and underscores are ignored in property names, which are then
      matched without regard to case.
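
Rule (e) above can be illustrated with a small comparison function. This is a
sketch of the rule as stated, not PCRE2's own lookup code; loose_name_equal() is
a hypothetical name, and only ASCII property names are handled.

```c
#include <ctype.h>
#include <stdbool.h>

/* Characters that loose matching ignores in property names. */
static bool is_ignorable(char c) {
    return c == ' ' || c == '-' || c == '_';
}

/* Compare two property names, skipping spaces, hyphens and underscores,
   and comparing letters without regard to case. */
bool loose_name_equal(const char *a, const char *b) {
    for (;;) {
        while (is_ignorable(*a)) a++;
        while (is_ignorable(*b)) b++;
        if (*a == '\0' || *b == '\0') return *a == *b;   /* both must end */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
}
```

For example, "Old_Italic", "old italic" and "OLD-ITALIC" all compare equal,
while "Greek" and "Greeks" do not.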

As always, see ChangeLog for a list of all changes (also the Git log).


Version 10.39 29-October-2021
-----------------------------

This release is happening soon after 10.38 because the bug fix is important.

1. Fix incorrect detection of alternatives in first character search in JIT.

2. Update to Unicode 14.0.0.

3. Some code cleanups (see ChangeLog).


Version 10.38 01-October-2021
-----------------------------

As well as some bug fixes and tidies (as always, see ChangeLog for details),
the documentation is updated to list the new URLs, following the move of the
source repository to GitHub and the mailing list to Google Groups.

* The CMake build system can now build both static and shared libraries in one
  go.

* Following Perl's lead, \K is now locked out in lookaround assertions by
  default, but an option is provided to re-enable the previous behaviour.


Version 10.37 26-May-2021
-------------------------

A few more bug fixes and tidies. The only change of real note is the removal of
the actual POSIX names regcomp etc. from the POSIX wrapper library because
these have caused issues for some applications (see 10.33 #2 below).


Version 10.36 04-December-2020
------------------------------

Again, mainly bug fixes and tidies. The only enhancements are the addition of
GNU grep's -m (aka --max-count) option to pcre2grep, and also unifying the
handling of substitution strings for both -O and callouts in pcre2grep, with
the addition of $x{...} and $o{...} to allow for characters whose code points
are greater than 255 in Unicode mode.
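
In Unicode (UTF-8) mode, a code point above 255 given in a substitution string
must be written out as a multi-byte sequence. The sketch below shows one way to
expand such an escape, assuming, by analogy with \x{...} and \o{...} in
patterns, that $x{...} takes hexadecimal digits and $o{...} octal ones. The
function expand_code_escape() and its return convention are hypothetical, not
pcre2grep's code.

```c
#include <stddef.h>

/* Encode code point cp as UTF-8 in out (room for 4 bytes). Returns the
   number of bytes written, or 0 for surrogates and values above 0x10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char *out) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp > 0x10FFFF) return 0;
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: if s starts with $x{hex digits} or $o{octal digits},
   write the UTF-8 form of that code point to out, store the length of the
   escape in *consumed and return the number of bytes written. Return 0 if
   s does not start with a valid escape. */
size_t expand_code_escape(const char *s, unsigned char *out, size_t *consumed) {
    unsigned long cp = 0;
    unsigned base;
    size_t i = 3, digits = 0, n;

    if (s[0] != '$') return 0;
    if (s[1] == 'x') base = 16;
    else if (s[1] == 'o') base = 8;
    else return 0;
    if (s[2] != '{') return 0;

    for (;; i++, digits++) {
        char c = s[i];
        unsigned d;
        if (c >= '0' && c <= '9') d = (unsigned)(c - '0');
        else if (base == 16 && c >= 'a' && c <= 'f') d = (unsigned)(c - 'a' + 10);
        else if (base == 16 && c >= 'A' && c <= 'F') d = (unsigned)(c - 'A' + 10);
        else break;
        if (d >= base) return 0;           /* 8 or 9 in an octal escape */
        if (cp > 0x10FFFF) return 0;       /* already too large; stop early */
        cp = cp * base + d;
    }
    if (digits == 0 || s[i] != '}') return 0;

    n = utf8_encode(cp, out);
    if (n == 0) return 0;
    *consumed = i + 1;                     /* include the closing brace */
    return n;
}
```

For example, $x{263A} expands to the three bytes E2 98 BA, and $o{101} to the
single byte 0x41 ("A").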

NOTE: there is an outstanding issue with JIT support for MacOS on arm64
hardware. For details, please see Bugzilla issue #2618.


Version 10.35 15-April-2020
---------------------------

Bugfixes, tidies, and a few new enhancements.

1. Capturing groups that contain recursive backreferences to themselves are no
longer automatically atomic, because the restriction is no longer necessary
as a result of the 10.30 restructuring.

2. Several new options for pcre2_substitute().

3. When Unicode is supported and PCRE2_UCP is set without PCRE2_UTF, Unicode
character properties are used for upper/lower case computations on characters
whose code points are greater than 127.

4. The character tables (for low-valued characters) can now more easily be
saved and restored in binary.

5. Updated to Unicode 13.0.0.


Version 10.34 21-November-2019
------------------------------

Another release with a few enhancements as well as bugfixes and tidies. The
main new features are:

1. There is now some support for matching in invalid UTF strings.

2. Non-atomic positive lookarounds are implemented in the pcre2_match()
interpreter, but not in JIT.

3. Added two new functions: pcre2_get_match_data_size() and
pcre2_maketables_free().

4. Upgraded to Unicode 12.1.0.


Version 10.33 16-April-2019
---------------------------

Yet more bugfixes, tidies, and a few enhancements, summarized here (see
ChangeLog for the full list):

1. Callouts from pcre2_substitute() are now available.

2. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
names are defined as macros. This should help avoid linking with the wrong
library in some environments, while still exporting the POSIX names for
pre-existing programs that use them.

3. Some new options:

   (a) PCRE2_EXTRA_ESCAPED_CR_IS_LF makes \r behave as \n.

   (b) PCRE2_EXTRA_ALT_BSUX enables support for ECMAScript 6's \u{hh...}
       construct.

   (c) PCRE2_COPY_MATCHED_SUBJECT causes a copy of a matched subject to be
       made, instead of just remembering a pointer.

4. Some new Perl features:

   (a) Perl 5.28's experimental alphabetic names for atomic groups and
       lookaround assertions, for example, (*pla:...) and (*atomic:...).

   (b) The new Perl "script run" features (*script_run:...) and
       (*atomic_script_run:...) aka (*sr:...) and (*asr:...).

   (c) When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in
       capture group names.

5. --disable-percent-zt disables the use of %zu and %td in formatting strings
in pcre2test. They were already automatically disabled for VC and older C
compilers.

6. Some changes related to callouts in pcre2grep:

   (a) Support for running an external program under VMS has been added, in
       addition to Windows and fork() support.

   (b) --disable-pcre2grep-callout-fork restricts the callout support to the
       inbuilt echo facility.


Version 10.32 10-September-2018
-------------------------------

This is another mainly bugfix and tidying release with a few minor
enhancements. These are the main ones:

1. pcre2grep now supports the inclusion of binary zeros in patterns that are
read from files via the -f option.

2. ./configure now supports --enable-jit=auto, which automatically enables JIT
if the hardware supports it.

3. In pcre2_dfa_match(), internal recursive calls no longer use the stack for
local workspace and local ovectors. Instead, an initial block of stack is
reserved, but if this is insufficient, heap memory is used. The heap limit
parameter now applies to pcre2_dfa_match().

4. Updated to Unicode version 11.0.0.

5. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.

6. Added support for \N{U+dddd}, but only in Unicode mode.

7. Added support for (?^) to unset all imnsx options.
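
Item 7 follows Perl: after (?^ the i, m, n, s, and x options are all switched
off before any letters that follow are applied. The sketch below models that on
a bitmask. The OPT_* flag names and apply_inline_options() are hypothetical,
and syntax checking beyond rejecting unknown letters is omitted.

```c
#include <stdbool.h>

/* Hypothetical bit flags for the five options that (?^) resets. */
enum {
    OPT_I = 1 << 0,   /* caseless */
    OPT_M = 1 << 1,   /* multiline */
    OPT_N = 1 << 2,   /* no auto capture */
    OPT_S = 1 << 3,   /* dot matches newline */
    OPT_X = 1 << 4    /* extended */
};

static unsigned option_bit(char c) {
    switch (c) {
    case 'i': return OPT_I;
    case 'm': return OPT_M;
    case 'n': return OPT_N;
    case 's': return OPT_S;
    case 'x': return OPT_X;
    default:  return 0;
    }
}

/* Apply the letters between "(?" and ")" to *opts. A leading '^' first
   clears all five options; letters after '-' are unset. Returns false,
   leaving *opts unchanged, if an unknown letter is found. */
bool apply_inline_options(const char *spec, unsigned *opts) {
    unsigned result = *opts;
    bool unsetting = false;

    if (*spec == '^') {
        result &= ~(unsigned)(OPT_I | OPT_M | OPT_N | OPT_S | OPT_X);
        spec++;
    }
    for (; *spec != '\0'; spec++) {
        unsigned bit;
        if (*spec == '-') { unsetting = true; continue; }
        bit = option_bit(*spec);
        if (bit == 0) return false;
        if (unsetting) result &= ~bit;
        else result |= bit;
    }
    *opts = result;
    return true;
}
```

For example, starting with i and x set, (?^) leaves no options set, and (?^s)
leaves only s set.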


Version 10.31 12-February-2018
------------------------------

This is mainly a bugfix and tidying release (see ChangeLog for full details).
However, there are some minor enhancements.

1. New pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
PCRE2_CONFIG_COMPILED_WIDTHS.

2. New pcre2_pattern_info() option PCRE2_INFO_EXTRAOPTIONS to retrieve the
extra compile-time options.

3. There are now public names for all the pcre2_compile() error numbers.

4. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks.


Version 10.30 14-August-2017
----------------------------

The full list of changes that includes bugfixes and tidies is, as always, in
ChangeLog. These are the most important new features:

1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the system stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
previously hard-to-do issues. For patterns that have a lot of backtracking, the
heap is now used, and there is an explicit limit on the amount, settable by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
but is renamed as "depth limit" (though the old names remain for
compatibility).

There is also a change in the way callouts from pcre2_match() are handled. The
offset_vector field in the callout block is no longer a pointer to the
actual ovector that was passed to the matching function in the match data
block. Instead it points to an internal ovector of a size large enough to hold
all possible captured substrings in the pattern.

2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at
the end of the subject.

3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and
pcre2test is upgraded to support it. Setting within the pattern by (?xx) is
also supported.

4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this.

5. Additional compile options in the compile context are now available, and the
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

6. The newline type PCRE2_NEWLINE_NUL is now available.

7. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply.

8. The option REG_PEND (a GNU extension) is now available for the POSIX
wrapper. Also there is a new option PCRE2_LITERAL which is used to support
REG_NOSPEC.

9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the
benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented
using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This
is tidier and also fixes some bugs.

10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0.

11. There are some experimental functions for converting foreign patterns
(globs and POSIX patterns) into PCRE2 patterns.
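
Item 1 describes keeping backtracking positions on the heap instead of in
recursive function calls, with an explicit limit on the amount. The toy matcher
below shows the general technique for a tiny pattern language (literal
characters, "." and a postfix "*", matched against the whole subject). It was
written for this note and is not how pcre2_match() is organized internally;
backtrack_match(), its result codes, and max_frames are hypothetical, and
max_frames only loosely corresponds to the limit set by pcre2_set_heap_limit().

```c
#include <stdlib.h>

/* Result codes for this sketch (not PCRE2's error codes). */
enum { NOMATCH = 0, MATCH = 1, LIMIT_EXCEEDED = -1, NO_MEMORY = -2 };

/* A saved backtracking position: where to resume in pattern and subject. */
typedef struct { size_t pi, si; } frame;

/* Does pattern item p ('.' or a literal) match subject character s? */
static int item_matches(char p, char s) {
    return s != '\0' && (p == '.' || p == s);
}

/* Anchored match of pat against the whole of subj. Backtracking points are
   kept in a heap-allocated stack rather than on the C call stack; at most
   max_frames of them may be stored at once. */
int backtrack_match(const char *pat, const char *subj, size_t max_frames) {
    size_t cap = 16, top = 0;
    int rc = NOMATCH;
    frame *stack = malloc(cap * sizeof *stack);

    if (stack == NULL) return NO_MEMORY;
    stack[top++] = (frame){0, 0};            /* the initial attempt */

    while (top > 0) {
        frame f = stack[--top];              /* resume the latest saved point */
        for (;;) {
            if (pat[f.pi] == '\0') {
                if (subj[f.si] == '\0') { rc = MATCH; goto done; }
                break;                       /* subject left over: backtrack */
            }
            if (pat[f.pi + 1] == '*') {
                /* Save "stop repeating here" before trying one more
                   repetition, so the longest repetition is tried first. */
                if (top == max_frames) { rc = LIMIT_EXCEEDED; goto done; }
                if (top == cap) {
                    frame *bigger = realloc(stack, 2 * cap * sizeof *stack);
                    if (bigger == NULL) { rc = NO_MEMORY; goto done; }
                    stack = bigger;
                    cap *= 2;
                }
                stack[top++] = (frame){f.pi + 2, f.si};
                if (!item_matches(pat[f.pi], subj[f.si])) break;
                f.si++;                      /* stay on the starred item */
            } else {
                if (!item_matches(pat[f.pi], subj[f.si])) break;
                f.pi++;
                f.si++;
            }
        }
    }
done:
    free(stack);
    return rc;
}
```

With "a*a" against "aaa", the greedy a* first consumes all three characters;
the final "a" then fails, and saved positions are popped until a* has given
back one character. With max_frames set to 2, "a*b" against "aaab" stops with
LIMIT_EXCEEDED instead of continuing to allocate memory.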


Version 10.23 14-February-2017
------------------------------

1. ChangeLog has the details of a lot of bug fixes and tidies.

2. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
checking is now done in the pre-pass that identifies capturing groups. This has
reduced the amount of duplication and made the code tidier. While doing this,
some minor bugs and Perl incompatibilities were fixed (see ChangeLog for
details).

3. Back references are now permitted in lookbehind assertions when there are
no duplicated group numbers (that is, (?| has not been used), and, if the
reference is by name, there is only one group of that name. The referenced
group must, of course, be of fixed length.

4. \g{+<number>} (e.g. \g{+2}) is now supported. It is a "forward back
reference" and can be useful in repetitions (compare \g{-<number>}). Perl does
not recognize this syntax.

5. pcre2grep now automatically expands its buffer up to a maximum set by
--max-buffer-size.

6. The -t option (grand total) has been added to pcre2grep.

7. A new function called pcre2_code_copy_with_tables() exists to copy a
compiled pattern along with a private copy of the character tables that it
uses.

8. A user supplied a number of patches to upgrade pcre2grep under Windows and
tidy the code.

9. Several updates have been made to pcre2test and test scripts (see
ChangeLog).
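
Relative references in item 4 are resolved by counting capturing groups:
\g{-n} refers to the nth most recently opened group before the reference, and
\g{+n} to the nth group opened after it. The sketch below converts a relative
reference to an absolute group number by counting unescaped "(" characters that
are not followed by "?" or "*". This is a deliberate simplification (it ignores
character classes, named groups written as (?<name>...), and (?| duplicate
numbering, and it does not check that a forward group exists), and
resolve_relative_ref() is a hypothetical name.

```c
#include <stddef.h>

/* Count capturing groups opened in pattern[0 .. pos), treating any '('
   that is not escaped and not followed by '?' or '*' as capturing. */
static int groups_opened_before(const char *pattern, size_t pos) {
    int count = 0;
    for (size_t i = 0; i < pos && pattern[i] != '\0'; i++) {
        if (pattern[i] == '\\') {         /* skip the escaped character */
            if (pattern[i + 1] != '\0') i++;
            continue;
        }
        if (pattern[i] == '(' && pattern[i + 1] != '?' && pattern[i + 1] != '*')
            count++;
    }
    return count;
}

/* Convert a relative reference at offset pos (rel = -n for \g{-n}, +n for
   \g{+n}) to an absolute group number; return 0 if the result is below 1. */
int resolve_relative_ref(const char *pattern, size_t pos, int rel) {
    int opened = groups_opened_before(pattern, pos);
    int absolute;
    if (rel < 0) absolute = opened + rel + 1;   /* -1: latest opened group */
    else if (rel > 0) absolute = opened + rel;  /* +1: next group opened */
    else return 0;
    return absolute >= 1 ? absolute : 0;
}
```

For the pattern (a)(b)\g{-1} the reference resolves to group 2; in
(a)\g{+1}(b) it also resolves to group 2, a group that has not yet been opened
at the point of reference.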


Version 10.22 29-July-2016
--------------------------

1. ChangeLog has the details of a number of bug fixes.

2. The POSIX wrapper function regcomp() did not previously support back
references and subroutine calls if called with the REG_NOSUB option. It now
does.

3. A new function, pcre2_code_copy(), is added, to make a copy of a compiled
pattern.

4. Support for string callouts is added to pcre2grep.

5. Added the PCRE2_NO_JIT option to pcre2_match().

6. The pcre2_get_error_message() function now returns a negative error code if
the error number it is given is unknown.

7. Several updates have been made to pcre2test and test scripts (see
ChangeLog).


Version 10.21 12-January-2016
-----------------------------

1. Many bugs have been fixed. A large number of them were provoked only by very
strange pattern input, and were discovered by fuzzers. Some others were
discovered by code auditing. See ChangeLog for details.

2. The Unicode tables have been updated to Unicode version 8.0.0.

3. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
class, where both values are literal letters in the same case, omit the
non-letter EBCDIC code points within the range.

4. There have been a number of enhancements to the pcre2_substitute() function,
giving more flexibility to replacement facilities. It is now also possible to
cause the function to return the needed buffer size if the one given is too
small.

5. The PCRE2_ALT_VERBNAMES option causes the "name" parts of special verbs such
as (*THEN:name) to be processed for backslashes and to take note of
PCRE2_EXTENDED.

6. PCRE2_INFO_HASBACKSLASHC makes it possible for a client to find out if a
pattern uses \C, and the configure option --enable-never-backslash-C makes it
possible to compile a version of PCRE2 in which the use of \C is always
forbidden.

7. A limit to the length of pattern that can be handled can now be set by
calling pcre2_set_max_pattern_length().

8. When matching an unanchored pattern, a match can be required to begin within
a given number of code units after the start of the subject by calling
pcre2_set_offset_limit().

9. The pcre2test program has been extended to test new facilities, and it can
now run the tests when LF on its own is not a valid newline sequence.

10. The RunTest script has also been updated to enable more tests to be run.

11. There have been some minor performance enhancements.
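
Item 3 can be made concrete with the EBCDIC letter layout (assumed here to be
that of code pages 037 and 1047), in which the lowercase letters occupy three
separate runs, 0x81-0x89, 0x91-0x99 and 0xA2-0xA9. The raw range from a (0x81)
to z (0xA9) therefore spans 41 code points, of which only 26 are letters. The
sketch below counts the members of a class range under the rule described
above; ebcdic_range_size() is a hypothetical name, it is not PCRE2 code, and it
does not model the requirement that both ends be literal characters.

```c
#include <stdbool.h>

/* Letter positions in EBCDIC code pages 037 and 1047. */
static bool ebcdic_is_lower(unsigned c) {
    return (c >= 0x81 && c <= 0x89) || (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

static bool ebcdic_is_upper(unsigned c) {
    return (c >= 0xC1 && c <= 0xC9) || (c >= 0xD1 && c <= 0xD9) ||
           (c >= 0xE2 && c <= 0xE9);
}

/* Number of code points in the class range lo-hi (byte values 0-255).
   When both ends are letters of the same case, only letters of that case
   are included; otherwise every code point from lo to hi is included. */
unsigned ebcdic_range_size(unsigned lo, unsigned hi) {
    bool lower = ebcdic_is_lower(lo) && ebcdic_is_lower(hi);
    bool upper = ebcdic_is_upper(lo) && ebcdic_is_upper(hi);
    unsigned count = 0;

    if (lo > hi) return 0;
    for (unsigned c = lo; c <= hi; c++) {
        if (lower && !ebcdic_is_lower(c)) continue;   /* skip the gaps */
        if (upper && !ebcdic_is_upper(c)) continue;
        count++;
    }
    return count;
}
```

With this rule, a-z (0x81 to 0xA9) has 26 members rather than 41, and a-j has
10 members, because the non-letters between i and j are omitted.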


Version 10.20 30-June-2015
--------------------------

1. Callouts with string arguments and the pcre2_callout_enumerate() function
have been implemented.

2. The PCRE2_NEVER_BACKSLASH_C option, which locks out the use of \C, is added.

3. The PCRE2_ALT_CIRCUMFLEX option lets ^ match after a newline at the end of a
subject in multiline mode.

4. The way named subpatterns are handled has been refactored. The previous
approach had several bugs.

5. The handling of \c in EBCDIC environments has been changed to conform to the
perlebcdic document. This is an incompatible change.

6. Bugs have been mended, many of them discovered by fuzzers.
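
In multiline mode, ^ matches at the start of the subject and after any internal
newline, but by default not after a newline that ends the subject; item 3's
PCRE2_ALT_CIRCUMFLEX removes that exception. The hypothetical helper below
models the rule for LF newlines only; other newline conventions and
PCRE2_NOTBOL are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Can a multiline-mode ^ match at offset pos of a subject of length len?
   alt_circumflex models the PCRE2_ALT_CIRCUMFLEX option; only "\n" is
   treated as a newline in this sketch. */
bool multiline_circumflex_at(const char *subject, size_t len, size_t pos,
                             bool alt_circumflex) {
    if (pos == 0) return true;                   /* start of subject */
    if (pos > len || subject[pos - 1] != '\n') return false;
    if (pos < len) return true;                  /* after an internal newline */
    return alt_circumflex;                       /* after a final newline */
}
```

For the subject "a\nb\n", ^ can match at offsets 0 and 2 by default, and also at
offset 4 when alt_circumflex is true.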


Version 10.10 06-March-2015
---------------------------

1. Serialization and de-serialization functions have been added to the API,
making it possible to save and restore sets of compiled patterns, though
restoration must be done in the same environment that was used for compilation.

2. The (*NO_JIT) feature has been added; this makes it possible for a pattern
creator to specify that JIT is not to be used.

3. A number of bugs have been fixed. In particular, bugs that caused building
on Windows using CMake to fail have been mended.


Version 10.00 05-January-2015
-----------------------------

Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36. New programs are recommended to use the
new library. Programs that use the original (PCRE1) API will need changing
before linking with the new library.

****