Change Log for PCRE2
--------------------

Before the move to GitHub, this was the only record of changes to PCRE2. Now
there is also the log of commit messages.

Version 10.44 07-June-2024
--------------------------

1. If a pattern contained a variable-length lookbehind in which the first
branch was not the one with the shortest minimum length, and the lookbehind
contained a capturing group, and elsewhere in the pattern there was another
lookbehind that referenced that group, the pattern was incorrectly compiled,
leading to unpredictable results, including crashes in JIT compiling. An
example pattern is: /(((?<=123?456456|ABC)))(?<=\2)/

2. Further updates to the oss-fuzz support:

   (a) Limit quantifiers for groups and classes to be no more than 10. This
       avoids very long JIT compile times that happen in some cases when groups
       are replicated for quantification, and very long match times when
       classes contain a lot of non-ASCII characters.

   (b) Added PCRE2_EXTENDED_MORE to the list of allowed options.

   (c) Arranged for text error messages to be shown in 16-bit and 32-bit modes.

   (d) Made the output in standalone mode more readable.

   (e) General code tidies.

   (f) Limit the size of compiled patterns to 10MB (see 6 below).

   (g) Do not run JIT on patterns whose compiled length is greater than 200K
       bytes because this takes a long time, causing oss-fuzz to time out.

   (h) Avoid compiling or matching twice with the same options (this could
       happen if the input didn't set any options).

3. Increase the maximum length of a name for a group from 32 to 128 because
there is a user for whom 32 is too small.

4. Cause pcre2test to output a message when pcre2_jit_compile() gives an error
return if either jitverify or info is specified.

5. Some auxiliary files for building under OpenVMS that were contributed by
Alexey Chupahin have been installed.

6. Added pcre2_set_max_pattern_compiled_length() to limit the size of compiled
patterns.

7. There was a bug in the implementation of \X caused by my (PH) misreading or
misunderstanding one of the grapheme sequence breaking rules in Unicode Annex
#29. A break should occur between two characters with the Extended Pictographic
break property unless a zero-width joiner intervenes. PCRE2 was not insisting
on the ZWJ, causing \X to match more than it should. See GitHub issue #410.

8. Avoid compilation issues with proprietary UNIX compilers that had appeared
in 10.43.

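The rule described in item 7 can be sketched in C as follows. This is only a
minimal illustration under simplifying assumptions, not PCRE2's implementation:
the gb_prop enumeration and gb11_no_break() are hypothetical names, and only
rule GB11 of Unicode Annex #29 (ExtPict Extend* ZWJ x ExtPict) is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified break properties; not PCRE2's internal tables. */
typedef enum { GB_OTHER, GB_EXTEND, GB_ZWJ, GB_EXT_PICT } gb_prop;

/* Returns true when rule GB11 forbids a break before props[i], that is, when
   the preceding characters are ExtPict Extend* ZWJ and props[i] is ExtPict. */
bool gb11_no_break(const gb_prop *props, size_t i)
{
  size_t j;

  /* The ZWJ immediately before the second pictographic is mandatory. */
  if (i < 2 || props[i] != GB_EXT_PICT || props[i - 1] != GB_ZWJ)
    return false;

  /* Skip any run of Extend characters that precedes the ZWJ. */
  j = i - 2;
  while (j > 0 && props[j] == GB_EXTEND) j--;

  return props[j] == GB_EXT_PICT;
}
```

For the sequence ExtPict, ZWJ, ExtPict the function reports that no break is
allowed before the third character. For ExtPict, ExtPict it returns false, so a
break is allowed, which is the case that the old code got wrong.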

Version 10.43 16-February-2024
------------------------------

1. The test program added by change 2 of 10.42 didn't work when the default
newline setting didn't include \n as a newline. One test needed (*LF) to ensure
that it worked.

2. Added the new freestanding POSIX test program to the ManyConfigTests script
in the maint directory (overlooked in change 2 of 10.42). Also improved the
selection facilities in that script, and added a test with JIT in a non-source
directory, fixing an oversight that would have made such a test fail before.

3. Added pcre2_get_match_data_heapframes_size() and related pcre2test flags
to allow for finer control of the heap used when pcre2_match() without JIT is
used and the match_data might be reused. This began as PR #191, but has had
further refinement and documentation edits.

4. Applied PR #181, which tidies some casts in pcre2_valid_utf.c.

5. Applied PR #184, which avoids overflow issues with the heap limit
(introduced in 10.41/9).

6. Applied PR #192, which changes the timing units for pcre2test from
milliseconds to microseconds. This is more useful for modern CPUs.

7. Applied PR #193, which makes the requirement for C99 explicit in
configure.ac and CMakeLists.txt.

8. Fixed a bug in pcre2test when a ridiculously large string repeat required a
stupid amount of memory. It now gives a clean realloc() failure error.

9. Updates to restrict the interaction between ASCII and non-ASCII characters
for caseless matching and items like \d:

   (a) Added PCRE2_EXTRA_CASELESS_RESTRICT to lock out mixing of ASCII and
       non-ASCII when matching caselessly. This is also /r in pcre2test and
       (?r) within patterns.

   (b) Added PCRE2_EXTRA_ASCII_{BSD,BSS,BSW,POSIX} and corresponding (?aD) etc
       in patterns and /a in pcre2test.

   (c) Corresponding updates to pcre2test.

10. Unicode has been updated to 15.0.0.

11. The Python scripts and ucptest.c in maint have been updated: (a) a minor
change needed for 9(a) above; (b) bug fixes in ucptest.

12. Integer overflow testing is now centralized in a new function.

13. Made PCRE2_UCP the default in UTF mode in pcre2grep, and added new options
--case-restrict and --no-ucp.

14. In the debugging printint module (which is normally only linked into
pcre2test), avoid the use of a variable called "not" because that's deprecated
in C and forbidden in C++. Also rewrite some code to avoid a goto into a block
that bypassed its initialization (though it didn't actually matter).

15. More minor code adjustments to avoid using reserved C++ words as variable
names ("new" and "typename") and another jump that bypassed an (irrelevant)
initialization.

16. Merged a pull request that removed pcre2_ucptables.c from the list of files
to compile in NON-AUTOTOOLS-BUILD because it is #included in pcre2_tables.c.
Also adjusted the BUILD.bazel and build.zig files, which had the same issue. At
the same time, fixed a typo in the Bazel file.

17. Add PCRE2_EXTRA_ASCII_DIGIT to allow [:digit:] to be kept in sync with \d
even in UCP mode.

18. Fix an invalid match of ASCII word classes when invalid UTF is enabled.

19. Add a --posix-digit option to pcre2grep for compatibility with GNU grep and
other tools that prefer the POSIX compatible Unicode definition for \d.

20. Report the bit width of the library in use by pcre2test for usability.

21. A pathological pattern conversion test could result in a string longer than
the available input buffer. Cause such a test to fail.

22. Add a check that forces a compiler error if PCRE2_CODE_UNIT_WIDTH is not 8,
16, or 32 when compiling any of the library modules.

23. Update pcre2_compile() to treat a NULL pattern with zero length as an empty
string.

24. Add support for limited-length variable-length lookbehind assertions, with
default maximum length 255 characters (same as Perl) but with a function to
adjust the limit.

25. Applied pull request #262, which updates the zig configuration, and #278
which fixes a bug with out-of-source-tree CMake build testing.

26. Add support for LoongArch to JIT.

27. Fixed a bug in pcre2_match() in the code for handling the vector of
backtracking frames on the heap, which caused a heap overflow if *LIMIT_HEAP
restricted an attempt to extend to less than the frame size. Generally tidy up
the code for extending the heap frames vector. This fixes GitHub issue #275.

28. Update pcre2_fuzzsupport.c to avoid clang sanitize complaint about shifting
left by 16 when there are non-zeros in the top 16 bits.

29. Perl 5.34.0 changed the meaning of (for example) {,3}, which was not
previously treated as a quantifier. Now it is interpreted as {0,3} and PCRE2
has changed to match. Note that {,} is still not a quantifier.

30. Perl allows spaces and/or horizontal tabs after { or before } in all items
that use braces, and also before or after the comma in quantifiers. PCRE2 now
does the same, except for \u{...}, which is recognized only when
PCRE2_EXTRA_ALT_BSUX is set. This is an ECMAScript, non-Perl compatible,
extension, so PCRE2 follows ECMAScript rather than Perl.

31. Applied pull request #300 by Carlo, which fixes #261. The bug was that
pcre2_match() was not fully resetting all captures that had been set within a
(possibly recursive) subroutine call such as (?3).

32. Changed the meaning of \w (and its synonyms) in UCP mode to match Perl. It
now matches characters whose general categories are L or N or whose particular
categories are Mn (non-spacing mark) or Pc (connector punctuation). The latter
includes underscore.

33. Changed the meaning of [:xdigit:] in UCP mode to match Perl. It now also
matches the "fullwidth" versions of the hex digits. Just like it is done for
[:digit:], PCRE2_EXTRA_ASCII_DIGIT can be used to keep this class ASCII only
without affecting other POSIX classes.

34. GitHub PR #305 fixes a potential integer overflow in pcre2_dfa_match().

35. Updated handling of \b and \B in UCP mode to match the changes to \w in 32
above because \b and \B are defined in terms of \w.

36. Within a pattern (?aT) and (?-aT) set and reset the PCRE2_EXTRA_ASCII_DIGIT
option, and (?aP) also sets (?aT) so that (?-aP) disables all ASCII
restrictions on POSIX classes.

37. If PCRE2_FIRSTLINE was set on an anchored pattern, pcre2_match() and
pcre2_dfa_match() misbehaved. PCRE2_FIRSTLINE is now ignored for anchored
patterns.

38. Add a test for ridiculous ovector offset values to the substring extraction
functions.

39. Make OP_REVERSE use IMM2_SIZE for its data instead of LINK_SIZE, for
consistency with OP_VREVERSE.

40. In some legacy environments with a pre C99 snprintf, pcre2_regerror could
return an incorrect value when the provided buffer was too small.

41. Applied pull request #342 which adds sanity checks for ctype functions and
locks out any accidental sign-extension.

42. In the 32-bit library, in non-UTF mode, a quantifier that followed a
literal character with a value greater than or equal to 0x80000000u caused
undefined behaviour.

43. \z was misbehaving when matching fragments inside invalid UTF strings.

44. Implement --group-separator and --no-group-separator for pcre2grep.

45. Fix \X matching in 32 bit mode without UTF in JIT.

46. Fix backref iterators when PCRE2_MATCH_UNSET_BACKREF is set in JIT.

47. Refactor the handling of whole-pattern recursion (?0) in pcre2_match() so
that its end is handled similarly to other recursions. This has altered the
behaviour of   /|(?0)./endanchored   which was previously not right.

48. Improved the test for looping recursion by checking the last referenced
character as well as the current character. This allows some patterns that
previously triggered the check to run to completion instead of giving the loop
error.

49. In 32-bit mode, the compiler looped for the pattern /[\x{ffffffff}]/ when
PCRE2_CASELESS and PCRE2_UCP (but not PCRE2_UTF) were set. Fixed by not trying
to look for other cases for characters above the Unicode range.

50. In caseless 32-bit mode with UCP (but not UTF) set, the character
0xffffffff incorrectly matched any character that has more than one other case,
in particular k and s.

51. Fix accept and endanchored interaction in JIT.

52. Fix backreferences with unset backref and non-greedy iterators in JIT.

53. Improve the logic that checks for a list of starting code units -- positive
lookahead assertions are now ignored if the immediately following item is one
that sets a mandatory starting character. For example, /a?(?=bc|)d/ used to set
all of a, b, and d as possible starting code units; now it sets only a and d.

54. Fix incorrect class character matches in JIT.

55. In pcre2test, ensure pcre2_jit_match() is used when jitfast is used with
substitution testing.

56. Insert omitted setting of subject length in match data at the end of
pcre2_jit_match().

57. Implemented PCRE2_DISABLE_RECURSELOOP_CHECK for pcre2_match() to enable
some apparently looping recursions to run to completion and therefore match the
JIT behaviour. With this set, real loops will eventually get caught by match or
heap limits or run out of resource.

58. AC did a lot of work on pcre2_fuzzsupport.c to extend it to 16-bit and
32-bit libraries and to compare JIT and non-JIT matching.

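The quantifier rules in items 29 and 30 above can be illustrated with a small
stand-alone parser. This is a sketch under stated assumptions, not the code
used by pcre2_compile(): parse_quantifier() and skip_blanks() are hypothetical
names, a maximum of -1 stands for "unlimited", and no range or overflow
checking is done.

```c
#include <stdbool.h>
#include <stddef.h>

/* Skip spaces and horizontal tabs, which item 30 allows inside the braces. */
static const char *skip_blanks(const char *p)
{
  while (*p == ' ' || *p == '\t') p++;
  return p;
}

/* Hypothetical helper. Accepts "{n}", "{min,}", "{,max}" and "{min,max}" at
   the start of s. Returns false for anything else, including "{,}" and "{}".
   On success, *max is -1 when there is no upper bound. */
bool parse_quantifier(const char *s, int *min, int *max)
{
  bool have_min = false, have_max = false;
  int lo = 0, hi = 0;
  const char *p;

  if (*s != '{') return false;
  p = skip_blanks(s + 1);
  while (*p >= '0' && *p <= '9') { lo = lo * 10 + (*p++ - '0'); have_min = true; }
  p = skip_blanks(p);

  if (*p == '}')                      /* "{n}" form: a minimum is required */
  {
    if (!have_min) return false;
    *min = *max = lo;
    return true;
  }

  if (*p != ',') return false;
  p = skip_blanks(p + 1);
  while (*p >= '0' && *p <= '9') { hi = hi * 10 + (*p++ - '0'); have_max = true; }
  p = skip_blanks(p);

  /* "{,}" has neither bound and is therefore not a quantifier. */
  if (*p != '}' || (!have_min && !have_max)) return false;

  *min = lo;                          /* a missing minimum means zero */
  *max = have_max ? hi : -1;
  return true;
}
```

With this sketch "{,3}" yields the bounds 0 and 3, "{ 2 , 5 }" yields 2 and 5,
and "{,}" is rejected, so it would be treated as literal text.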

Version 10.42 11-December-2022
------------------------------

1. Change 19 of 10.41 wasn't quite right; it put the definition of a default,
empty value for PCRE2_CALL_CONVENTION in src/pcre2posix.c instead of
src/pcre2posix.h, which meant that programs that included pcre2posix.h but not
pcre2.h failed to compile.

2. To catch similar issues to the above in future, a new small test program
that includes pcre2posix.h but not pcre2.h has been added to the test suite.

3. When the -S option of pcre2test was used to set a stack size greater than
the allowed maximum, the error message displayed the hard limit incorrectly.
This was pointed out on GitHub pull request #171, but the suggested patch
didn't cope with all cases. Some further modification was required.

4. Supplying an ovector count of more than 65535 to pcre2_match_data_create()
caused a crash because the field in the match data block is only 16 bits. A
maximum of 65535 is now silently applied.

5. Merged @carenas patch #175 which fixes #86 - segfault on aarch64 (ARM).

6. The prototype for pcre2_substring_list_free() specified its argument as
PCRE2_SPTR * which is a const data type, whereas the yield from
pcre2_substring_list() is not const. This caused compiler warnings. I have
changed the argument of pcre2_substring_list_free() to be PCRE2_UCHAR ** to
remove this anomaly. This might cause new warnings in existing code where a
cast has been used to avoid previous ones.

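The silent limit described in item 4 above amounts to clamping the requested
count before it is stored in a 16-bit field. The following is a sketch only:
md_count_sketch and set_oveccount() are hypothetical names and do not
reproduce the real match data block.

```c
#include <stdint.h>

/* Hypothetical stand-in for the part of a match data block that holds the
   ovector count in a 16-bit field. */
typedef struct {
  uint16_t oveccount;
} md_count_sketch;

/* Store the requested count, silently limiting it to what 16 bits can hold.
   Without the clamp, a value such as 65536 would be truncated to 0. */
void set_oveccount(md_count_sketch *md, uint32_t requested)
{
  if (requested > 65535u) requested = 65535u;
  md->oveccount = (uint16_t)requested;
}
```

A request for 70000 pairs is stored as 65535, while smaller requests are
stored unchanged.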

Version 10.41 06-December-2022
------------------------------

1. Add fflush() before and after a fork callout in pcre2grep to get its output
to be the same on all systems. (There were previously ordering differences in
Alpine Linux).

2. Merged patch from @carenas (GitHub #110) for pthreads support in CMake.

3. SSF scorecards grumbled about possible overflow in an expression in
pcre2test. It never would have overflowed in practice, but some casts have been
added and at the same time there's been some tidying of fprintf() calls that
output size_t values.

4. PR #94 showed up an unused enum in pcre2_convert.c, which is now removed.

5. Minor code re-arrangement to remove gcc warning about realloc() in
pcre2test.

6. Change a number of int variables that hold buffer and line lengths in
pcre2grep to PCRE2_SIZE (aka size_t).

7. Added an #ifdef to cut out a call to PRIV(jit_free) when JIT is not
supported (even though that function would do nothing in that case) at the
request of a user who doesn't even want to link with pcre2_jit_compile.o. Also
tidied up an untidy #ifdef arrangement in pcre2test.

8. Fixed an issue in the backtracking optimization of character repeats in
JIT. Furthermore, star repetitions are now optimized, not just plus
repetitions.

9. Removed the use of an initial backtracking frames vector on the system stack
in pcre2_match() so that it now always uses the heap. (In a multi-thread
environment with very small stacks there had been an issue.) This also is
tidier for JIT matching, which didn't need that vector. The heap vector is now
remembered in the match data block and re-used if that block itself is re-used.
It is freed with the match data block.

10. Adjusted the find_limits code in pcre2test to work with change 9 above.

11. Added find_limits_noheap to pcre2test, because the heap limits are now
different in different environments and so cannot be included in the standard
tests.

12. Created a test for pcre2_match() heap processing that is not part of the
tests run by 'make check', but can be run manually. The current output is from
a 64-bit system.

13. Implemented -Z aka --null in pcre2grep.

14. A minor change to pcre2test and the addition of several new pcre2grep tests
have improved LCOV coverage statistics. At the same time, code in pcre2grep and
elsewhere that can never be obeyed in normal testing has been excluded from
coverage.

15. Fixed a bug in pcre2grep that could cause an extra newline to be written
after output generated by --output.

16. If a file has a .bz2 extension but is not in fact compressed, pcre2grep
should process it as a plain text file. A bug stopped this happening; now fixed
and added to the tests.

17. When pcre2grep was running not in UTF mode, if a string specified by
--output or obtained from a callout in a pattern contained a character (byte)
greater than 127, it was incorrectly output in UTF-8 format.

18. Added some casts after warnings from Clang sanitize.

19. Merged patch from cbouc (GitHub #139): 4 function prototypes were missing
PCRE2_CALL_CONVENTION in src/pcre2posix.h. All function prototypes returning
pointers had out of place PCRE2_CALL_CONVENTION in src/pcre2.h.*. These
produced errors when building for Windows with #define PCRE2_CALL_CONVENTION
__stdcall.

20. A negative repeat value in a pcre2test subject line was not being
diagnosed, leading to infinite looping.

21. Updated RunGrepTest to discard the warning that Bash now gives when setting
LC_CTYPE to a bad value (because older versions didn't).

22. Updated pcre2grep so that it behaves like GNU grep when matching more than
one pattern and a later pattern matches at an earlier point in the subject when
the matched substrings are being identified by colour or by offsets.

23. Updated the PrepareRelease script so that the man page that it makes for
the pcre2demo demonstration program is more standard and does not cause errors
when processed by lexgrog or mandb -c (GitHub issue #160).

24. The JIT compiler was updated.

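The caching arrangement described in item 9 above (a heap vector remembered in
the match data block, re-used when the block is re-used, and freed with it) can
be sketched as follows. This is an illustration under assumptions, not the
library's code: md_frames_sketch, get_frames() and free_frames() are
hypothetical names, and realloc() is used here simply to keep the sketch short.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the part of a match data block that remembers
   the backtracking frames vector between matches. */
typedef struct {
  void  *heapframes;       /* cached vector, NULL until first needed */
  size_t heapframes_size;  /* size of the cached vector in bytes */
} md_frames_sketch;

/* Return a vector of at least `needed` bytes. The cached vector is re-used
   when it is already big enough; otherwise it is enlarged. Returns NULL if
   memory cannot be obtained, leaving the existing vector in place. */
void *get_frames(md_frames_sketch *md, size_t needed)
{
  if (md->heapframes_size < needed)
  {
    void *p = realloc(md->heapframes, needed);
    if (p == NULL) return NULL;
    md->heapframes = p;
    md->heapframes_size = needed;
  }
  return md->heapframes;
}

/* Release the cached vector; in this sketch it stands for the part of
   freeing the match data block that disposes of the frames. */
void free_frames(md_frames_sketch *md)
{
  free(md->heapframes);
  md->heapframes = NULL;
  md->heapframes_size = 0;
}
```

A second match that needs no more memory than the first gets the same vector
back without a new allocation, which is the saving that the change is after.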

Version 10.40 15-April-2022
---------------------------

1. Merged patch from @carenas (GitHub #35, 7db87842) to fix pcre2grep incorrect
handling of multiple passes.

2. Merged patch from @carenas (GitHub #36, dae47509) to fix portability issue
in pcre2grep with buffered fseek(stdin).

3. Merged patch from @carenas (GitHub #37, acc520924) to fix tests when -S is
not supported.

4. Revert an unintended change in JIT repeat detection.

5. Merged patch from @carenas (GitHub #52, b037bfa1) to fix build on GNU Hurd.

6. Merged documentation and comments patches from @carenas (GitHub #47).

7. Merged patch from @carenas (GitHub #49) to remove obsolete JFriedl test code
from pcre2grep.

8. Merged patch from @carenas (GitHub #48) to fix CMake install issue #46.

9. Merged patch from @carenas (GitHub #53) fixing NULL checks in matching and
substituting.

10. Add null_subject and null_replacement modifiers to pcre2test.

11. Add check for NULL subject to POSIX regexec() function.

12. Add check for NULL replacement to pcre2_substitute().

13. For the subject arguments of pcre2_match(), pcre2_dfa_match(), and
pcre2_substitute(), and the replacement argument of the latter, if the pointer
is NULL and the length is zero, treat as an empty string. Apparently a number
of applications treat NULL/0 in this way.

14. Added support for Bidi_Class and a number of binary Unicode properties,
including Bidi_Control.

15. Fix some minor issues raised by clang sanitize.

16. Very minor code speed up for maximizing character property matches.

17. A number of changes to script matching for \p and \P:

    (a) Script extensions for a character are now coded as a bitmap instead of
        a list of script numbers, which should be faster and does not need a
        loop.

    (b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms
        sc and scx).

    (c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being
        the same as \p{scx:scriptname} because this change happened in Perl at
        release 5.26.

    (d) The standard Unicode 4-letter abbreviations for script names are now
        recognized.

    (e) In accordance with Unicode and Perl's "loose matching" rules, spaces,
        hyphens, and underscores are ignored in property names, which are then
        matched independent of case.

452*22dc650dSSadaf Ebrahimi18. The Python scripts in the maint directory have been refactored. There are
453*22dc650dSSadaf Ebrahiminow three scripts that generate pcre2_ucd.c, pcre2_ucp.h, and pcre2_ucptables.c
454*22dc650dSSadaf Ebrahimi(which is #included by pcre2_tables.c). The data lists that used to be
455*22dc650dSSadaf Ebrahimiduplicated are now held in a single common Python module.
456*22dc650dSSadaf Ebrahimi
457*22dc650dSSadaf Ebrahimi19. On CHERI, and thus Arm's Morello prototype, pointers are represented as
458*22dc650dSSadaf Ebrahimihardware capabilities, which consist of both an integer address and additional
459*22dc650dSSadaf Ebrahimimetadata, meaning they are twice the size of the platform's size_t type, i.e.
460*22dc650dSSadaf Ebrahimi16 bytes on a 64-bit system. The ovector member of heapframe happens to only be
461*22dc650dSSadaf Ebrahimi8 byte aligned, and so computing frame_size ended up with a multiple of 8 but
462*22dc650dSSadaf Ebrahiminot 16. Whilst the first frame was always suitably aligned, this then
463*22dc650dSSadaf Ebrahimimisaligned the frame that follows, resulting in an alignment fault when storing
464*22dc650dSSadaf Ebrahimia pointer to Fecode at the start of match. Patch to fix this issue by Jessica
465*22dc650dSSadaf EbrahimiClarke PR#72.
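
The alignment problem in 19 can be illustrated with a small C sketch. It
assumes a hypothetical frame structure and rounds the computed size up to a
multiple of the pointer alignment, so that each following frame starts on a
suitably aligned address. The names sketch_frame, round_up_to and
sketch_frame_size are invented for this example, and the actual patch may
differ.

```c
#include <stddef.h>

/* Round size up to the next multiple of align.
   align must be a power of two. */
size_t round_up_to(size_t size, size_t align)
{
  return (size + align - 1) & ~(align - 1);
}

/* Hypothetical frame layout: a pointer member followed by the ovector. */
struct sketch_frame {
  const void *ecode;   /* stands in for Fecode */
  size_t ovector[1];   /* really variable length */
};

/* Size of one frame holding ovector_entries values, padded so that the
   next frame is aligned to pointer_align bytes. */
size_t sketch_frame_size(size_t ovector_entries, size_t pointer_align)
{
  size_t raw = offsetof(struct sketch_frame, ovector) +
    ovector_entries * sizeof(size_t);
  return round_up_to(raw, pointer_align);
}
```

For example, a raw size of 40 bytes with a 16-byte pointer alignment is rounded
up to 48, whereas without the rounding the second frame would start at an
address that is only 8-byte aligned.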

20. Added -LP and -LS listing options to pcre2test.

21. A user discovered that the library names in CMakeLists.txt for MSVC
debugger (PDB) files were incorrect - perhaps never tried for PCRE2?

22. An item such as [Aa] is optimized into a caseless single character match.
When this was quantified (e.g. [Aa]{2}) and was also the last literal item in a
pattern, the optimizing "must be present for a match" character check was not
being flagged as caseless, causing some matches that should have succeeded to
fail.

23. Fixed a Unicode property matching issue in JIT. The character was not
fully read in caseless matching.

24. Fixed an issue affecting recursions in JIT caused by duplicated data
transfers.

25. Merged patch from @carenas (GitHub #96) which fixes some problems with
pcre2test and readline/readedit:

  * Use the right header for libedit in FreeBSD with autoconf
  * Really allow libedit with cmake
  * Avoid using readline headers with libedit


Version 10.39 29-October-2021
-----------------------------

1. Fix incorrect detection of alternatives in first character search in JIT.

2. Merged patch from @carenas (GitHub #28):

  Visual Studio 2013 includes support for %zu and %td, so let newer
  versions of it avoid the fallback, and while at it, make sure that
  the first check is for DISABLE_PERCENT_ZT so it will be always
  honoured if chosen.

  ptrdiff_t is signed, so use a signed type instead, and make sure
  that an appropriate width is chosen if pointers are 64 bits wide and
  long is not (e.g. 64-bit Windows).

  IMHO removing the cast (and therefore the possibility of truncation)
  makes the code cleaner and the fallback is likely portable enough
  with all 64-bit POSIX systems doing LP64 except for Windows.

3. Merged patch from @carenas (GitHub #29) to update to Unicode 14.0.0.

4. Merged patch from @carenas (GitHub #30):

  * Cleanup: remove references to no longer used stdint.h

  Since 19c50b9d (Unconditionally use inttypes.h instead of trying for stdint.h
  (simplification) and remove the now unnecessary inclusion in
  pcre2_internal.h., 2018-11-14), stdint.h is no longer used.

  Remove checks for it in autotools and CMake and document better the expected
  build failures for systems that might have stdint.h (C99) and not inttypes.h
  (from POSIX), like old Windows.

  * Cleanup: remove detection for inttypes.h which is a hard dependency

  CMake checks for standard headers are not meant to be used for hard
  dependencies, and would prevent a possible fallback from working.

  Alternatively, the header could be checked to make the configuration fail
  instead of breaking the build, but that was punted, as it was missing anyway
  from autotools.

5. Merged patch from @carenas (GitHub #32):

  * jit: allow building with ancient MSVC versions

  Visual Studio older than 2013 fails to build with JIT enabled, because it is
  unable to parse non C89 compatible syntax, with mixed declarations and code.
  While most recent compilers wouldn't even report this as a warning since it
  is valid C99, it could be also made visible by adding to gcc/clang the
  -Wdeclaration-after-statement flag at build time.

  Move the code below the affected definitions.

  * pcre2grep: avoid mixing declarations with code

  Since d5a61ee8 (Patch to detect (and ignore) symlink loops in pcre2grep,
  2021-08-28), code will fail to build in a strict C89 compiler.

  Reformat slightly to make it C89 compatible again.
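
The C89 restriction discussed in 5 can be shown with a small example. The
function below is hypothetical and unrelated to the PCRE2 sources; it only
contrasts a declaration that follows a statement (valid C99, rejected by a
strict C89 compiler) with the reordered form that both accept.

```c
#include <stddef.h>

/* Mixed form, valid in C99 but not in strict C89:
 *
 *   size_t count_spaces(const char *s)
 *   {
 *     if (s == NULL) return 0;
 *     size_t n = 0;              (declaration after a statement)
 *     ...
 *   }
 */

/* C89-compatible form: all declarations precede the first statement. */
size_t count_spaces(const char *s)
{
  size_t n = 0;
  const char *p;

  if (s == NULL) return 0;
  for (p = s; *p != '\0'; p++)
    if (*p == ' ') n++;
  return n;
}
```

Building with -Wdeclaration-after-statement, as mentioned above, is one way to
have gcc or clang point out the mixed form.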


Version 10.38 01-October-2021
-----------------------------

1. Fix invalid single character repetition issues in JIT when the repetition
is inside a capturing bracket and the bracket is preceded by character
literals.

2. Installed revised CMake configuration files provided by Jan-Willem Blokland.
This extends the CMake build system to build both static and shared libraries
in one go, builds the static library with PIC, and exposes PCRE2 libraries
using the CMake config files. JWB provided these notes:

- Introduced CMake variable BUILD_STATIC_LIBS to build the static library.

- Make a small modification to config-cmake.h.in by removing the PCRE2_STATIC
  variable. Added PCRE2_STATIC variable to the static build using the
  target_compile_definitions() function.

- Extended the CMake config files.

  - Introduced CMake variable PCRE2_USE_STATIC_LIBS to easily switch between
    the static and shared libraries.

  - Added the PCRE2_STATIC variable to the target compile definitions for the
    import of the static library.

Building static and shared libraries using MSVC results in a name clash of
the libraries. Both static and shared library builds create, for example, the
file pcre2-8.lib. Therefore, I decided to change the static library names by
adding "-static". For example, pcre2-8.lib has become pcre2-8-static.lib.
[Comment by PH: this is MSVC-specific. It doesn't happen on Linux.]

3. Increased the minimum release number for CMake from 2.8.5 to 3.0.0, because
requiring anything older than 2.8.12 is deprecated and causes warnings. Even
3.0.0 is quite old; it was released in 2014.

4. Implemented a modified version of Thomas Tempelmann's pcre2grep patch for
detecting symlink loops. This is dependent on the availability of realpath(),
which is now tested for in ./configure and CMakeLists.txt.

5. Implemented a modified version of Thomas Tempelmann's patch for faster
case-independent "first code unit" searches for unanchored patterns in 8-bit
mode in the interpreters. Instead of just remembering whether one case matched
or not, it remembers the position of a previous match so as to avoid
unnecessary repeated searching.
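
The idea in 5 can be sketched in C as follows. This is not the code from the
patch: the structure caseless_cache and the function next_first_cu are
hypothetical names, and the sketch assumes an 8-bit subject searched with
memchr(). For each of the two cases it remembers where the next occurrence was
last found, and searches again only when the new starting point has moved past
that position.

```c
#include <stddef.h>
#include <string.h>

/* Cached position of the next occurrence of each case. NULL means "not
   searched yet"; a value equal to the subject end means "no occurrence". */
typedef struct {
  const char *next1;
  const char *next2;
} caseless_cache;

/* Find c at or after start; return end if it does not occur. */
static const char *find_from(const char *start, const char *end, int c)
{
  const char *p = memchr(start, c, (size_t)(end - start));
  return (p != NULL) ? p : end;
}

/* Return the first position at or after start that holds c1 or c2 (the two
   cases of one character), or end if there is none. */
const char *next_first_cu(caseless_cache *cache, const char *start,
  const char *end, char c1, char c2)
{
  /* Search again only if the remembered position is behind start. */
  if (cache->next1 == NULL || cache->next1 < start)
    cache->next1 = find_from(start, end, (unsigned char)c1);
  if (cache->next2 == NULL || cache->next2 < start)
    cache->next2 = find_from(start, end, (unsigned char)c2);
  return (cache->next1 < cache->next2) ? cache->next1 : cache->next2;
}
```

In the subject "xxAxxa", the first call finds "A" at offset 2 and also records
"a" at offset 5. A second call starting at offset 3 reuses the recorded
position of "a" and only searches again for "A".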

6. Perl now locks out \K in lookarounds, so PCRE2 now does the same by default.
However, just in case anybody was relying on the old behaviour, there is an
option called PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK that enables the old behaviour.
An option has also been added to pcre2grep to enable this.

7. Re-enable a JIT optimization which was unintentionally disabled in 10.35.

8. There is a loop counter to catch excessively crazy patterns when checking
the lengths of lookbehinds at compile time. This was incorrectly getting reset
whenever a lookahead was processed, leading to some fuzzer-generated patterns
taking a very long time to compile when (?|) was present in the pattern,
because (?|) disables caching of group lengths.


Version 10.37 26-May-2021
-------------------------

1. Change RunGrepTest to use tr instead of sed when testing with binary
zero bytes, because sed varies a lot from system to system and has problems
with binary zeros. This is from Bugzilla #2681. Patch from Jeremie
Courreges-Anglas via Nam Nguyen. This fixes RunGrepTest for OpenBSD. Later:
it broke it for at least one version of Solaris, where tr can't handle binary
zeros. However, that system had /usr/xpg4/bin/tr installed, which works OK, so
RunGrepTest now checks for that command and uses it if found.

2. Compiling with gcc 10.2's -fanalyzer option showed up a hypothetical problem
with a NULL dereference. I don't think this case could ever occur in practice,
but I have put in a check in order to get rid of the compiler error.

3. An alternative patch for CMakeLists.txt because 10.36 #4 breaks CMake on
Windows. Patch from [email protected] fixes Bugzilla #2688.

4. Two bugs related to over-large numbers have been fixed so the behaviour is
now the same as Perl.

  (a) A pattern such as /\214748364/ gave an overflow error instead of being
  treated as the octal number \214 followed by literal digits.

  (b) A sequence such as {65536 that has no terminating } so is not a
  quantifier was nevertheless complaining that a quantifier number was too big.
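
A minimal C sketch of the behaviour in 4(a): at most three octal digits are
read after the backslash and the remainder is left as literal text. The
function name read_octal_escape is hypothetical, and deciding whether the
digits should be treated as a back reference instead is outside the scope of
the sketch.

```c
#include <stddef.h>

/* Reads up to three octal digits from s (the text just after a backslash).
   Stores the number of characters consumed in *consumed and returns the
   value of the digits that were read. */
unsigned int read_octal_escape(const char *s, size_t *consumed)
{
  unsigned int value = 0;
  size_t i = 0;

  while (i < 3 && s[i] >= '0' && s[i] <= '7')
    {
    value = value * 8 + (unsigned int)(s[i] - '0');
    i++;
    }
  *consumed = i;
  return value;
}
```

For the text 214748364 this returns 140 (octal 214) and consumes three
characters, leaving 748364 to be treated as literal digits.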

5. A run of autoconf suggested that configure.ac was out-of-date with respect
to the latest autoconf. Running autoupdate made some valid changes, some valid
suggestions, and also some invalid changes, which were fixed by hand. Autoconf
now runs clean and the resulting "configure" seems to work, so I hope nothing
is broken. Later: the requirement for autoconf 2.70 broke some automatic test
robots. It doesn't seem to be necessary: trying a reduction to 2.60.

6. The pattern /a\K.(?0)*/ when matched against "abac" by the interpreter gave
the answer "bac", whereas Perl and JIT both yield "c". This was because the
effect of \K was not propagating back from the full pattern recursion. Other
recursions such as /(a\K.(?1)*)/ did not have this problem.

7. Restore single character repetition optimization in JIT. Currently fewer
character repetitions are optimized than in 10.34.

8. When the names of the functions in the POSIX wrapper were changed to
pcre2_regcomp() etc. (see change 10.33 #4 below), functions with the original
names were left in the library so that pre-compiled programs would still work.
However, this has proved troublesome when programs link with several libraries,
some of which use PCRE2 via the POSIX interface while others use a native POSIX
library. For this reason, the POSIX function names are removed in this release.
The macros in pcre2posix.h should ensure that re-compiling fixes any programs
that haven't been compiled since before 10.33.


Version 10.36 04-December-2020
------------------------------

1. Add CET_CFLAGS so that when Intel CET is enabled, pass -mshstk to
compiler. This fixes https://bugs.exim.org/show_bug.cgi?id=2578. Patch for
Makefile.am and configure.ac by H.J. Lu. Equivalent patch for CMakeLists.txt
invented by PH.

2. Fix infinite loop when a single byte newline is searched in JIT when
invalid utf8 mode is enabled.

3. Updated CMakeLists.txt with patch from Wolfgang Stöggl (Bugzilla #2584):

  - Include GNUInstallDirs and use ${CMAKE_INSTALL_LIBDIR} instead of hardcoded
    lib. This allows differentiation between lib and lib64.
    CMAKE_INSTALL_LIBDIR is used for installation of libraries and also for
    pkgconfig file generation.

  - Add the version of PCRE2 to the configuration summary like ./configure
    does.

  - Fix typo: MACTHED_STRING->MATCHED_STRING

4. Updated CMakeLists.txt with another patch from Wolfgang Stöggl (Bugzilla
#2588):

  - Add escaped double quotes around include directory in CMakeLists.txt to
    allow spaces in directory names.

  - This fixes a cmake error, if the path of the pcre2 source contains a space.

5. Updated CMakeLists.txt with a patch from B. Scott Michel: CMake's
documentation suggests using CHECK_SYMBOL_EXISTS over CHECK_FUNCTION_EXISTS.
Moreover, these functions come from specific header files, which need to be
specified (and, thankfully, are the same on both the Linux and WinXX
platforms).

6. Added a (uint32_t) cast to prevent a compiler warning in pcre2_compile.c.

7. Applied a patch from Wolfgang Stöggl (Bugzilla #2600) to fix postfix for
debug Windows builds using CMake. This also updated configure so that it
generates *.pc files and pcre2-config with the same content, as in the past.

8. If a pattern ended with (?(VERSION=n.d where n is any number but d is just a
single digit, the code unit beyond d was being read (i.e. there was a read
buffer overflow). Fixes ClusterFuzz 23779.
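
The kind of check that avoids the overread in 8 can be sketched in C. The
function parse_minor_version is hypothetical and simplified; in particular,
treating a single digit as tens (so that .3 means 30) is an assumption of this
sketch, not something stated above. The point is only that the end of the
pattern is tested before each code unit is read.

```c
#include <stddef.h>

/* Parses the one or two digits that follow the dot in a version number,
   never reading at or beyond end. Returns the minor version (0..99), or -1
   if no digit is present. On success, *next is set to the position after
   the digits that were consumed. */
int parse_minor_version(const char *p, const char *end, const char **next)
{
  int minor;

  if (p >= end || *p < '0' || *p > '9') return -1;
  minor = (*p++ - '0') * 10;              /* assumed: one digit counts as tens */
  if (p < end && *p >= '0' && *p <= '9')  /* bounds check before reading */
    minor += *p++ - '0';
  *next = p;
  return minor;
}
```

When the pattern ends immediately after a single digit, the second test fails
on p < end and nothing beyond the buffer is examined.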

9. After the rework in r1235, certain character ranges were incorrectly
handled by an optimization in JIT. Furthermore a wrong offset was used to
read a value from a buffer which could lead to memory overread.

10. Unnoticed for many years was the fact that delimiters other than / in the
testinput1 and testinput4 files could cause incorrect behaviour when these
files were processed by perltest.sh. There were several tests that used quotes
as delimiters, and it was just luck that they didn't go wrong with perltest.sh.
All the patterns in testinput1 and testinput4 now use / as their delimiter.
This fixes Bugzilla #2641.

11. Perl has started to give an error for \K within lookarounds (though there
are cases where it doesn't). PCRE2 still allows this, so the tests that include
this case have been moved from test 1 to test 2.

12. Further to 10 above, pcre2test has been updated to detect and grumble if a
delimiter other than / is used after #perltest.

13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS
was set and PCRE2_NO_START_OPTIMIZE was not set. The optimization for finding
the start of a match was not resetting correctly after a failed match on the
first valid fragment of the subject, possibly causing incorrect "no match"
returns on subsequent fragments. For example, the pattern /A/ failed to match
the subject \xe5A. Fixes Bugzilla #2642.

14. Fixed a bug in character set matching when JIT is enabled and both Unicode
scripts and Unicode classes are present at the same time.

15. Added GNU grep's -m (aka --max-count) option to pcre2grep.

16. Refactored substitution processing in pcre2grep strings, both for the -O
option and when dealing with callouts. There is now a single function that
handles $ expansion in all cases (instead of multiple copies of almost
identical code). This means that the same escape sequences are available
everywhere, which was not previously the case. At the same time, the escape
sequences $x{...} and $o{...} have been introduced, to allow for characters
whose code points are greater than 255 in Unicode mode.

17. Applied the patch from Bugzilla #2628 to RunGrepTest. This does an explicit
test for a version of sed that can handle binary zero, instead of assuming that
any Linux version will work. Later: replaced $(...) by `...` because not all
shells recognize the former.

18. Fixed a word boundary check bug in JIT when partial matching is enabled.

19. Fix ARM64 compilation warning in JIT. Patch by Carlo.

20. A bug in the RunTest script meant that if the first part of test 2 failed,
the failure was not reported.

21. Test 2 was failing when run from a directory other than the source
directory. This failure was previously missed in RunTest because of 20 above.
Fixes added to both RunTest and RunTest.bat.

22. Patch to CMakeLists.txt from Daniel to fix problem with testing under
Windows.


Version 10.35 09-May-2020
-------------------------

1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT.

2. Fix ARMv5 JIT improper handling of labels right after a constant pool.

3. Fixed a JIT bug that allowed the fields of the compiled pattern to be read
before its existence was checked.

4. Back in the PCRE1 days, capturing groups that contained recursive back
references to themselves were made atomic (version 8.01, change 18) because
after the end of a repeated group, the captured substrings had their values
from the final repetition, not from an earlier repetition that might be the
destination of a backtrack. This feature was documented, and was carried over
into PCRE2. However, it has now been realized that the major refactoring that
was done for 10.30 has made this atomizing unnecessary, and it is confusing
when users are unaware of it, making some patterns appear not to be working as
expected. Capture values of recursive back references in repeated groups are
now correctly backtracked, so this unnecessary restriction has been removed.

5. Added PCRE2_SUBSTITUTE_LITERAL.

6. Avoid some VS compiler warnings.

7. Added PCRE2_SUBSTITUTE_MATCHED.

8. Added (?* and (?<* as synonyms for (*napla: and (*naplb: to match another
regex engine. The Perl regex folks are aware of this usage and have made a note
about it.

9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
1, believing that repeating an assertion is pointless. However, if a positive
assertion contains capturing groups, repetition can be useful. In any case, an
assertion could always be wrapped in a repeated group. The only restriction
that is now imposed is that an unlimited maximum is changed to one more than
the minimum.

10. Fix *THEN verbs in lookahead assertions in JIT.

11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.

12. The JIT stack should be freed when the low-level stack allocation fails.

13. In pcre2grep, if the final line in a scanned file is output but does not
end with a newline sequence, add a newline according to the --newline setting.

14. (?(DEFINE)...) groups were not being handled correctly when checking for
the fixed length of a lookbehind assertion. Such a group within a lookbehind
should be skipped, as it does not contribute to the length of the group.
Instead, the (DEFINE) group was being processed, and if it was at the end of
the lookbehind, that end was not correctly recognized. Errors such as
"lookbehind assertion is not fixed length" and also "internal error: bad code
value in parsed_skip()" could result.

15. Put a limit of 1000 on recursive calls in pcre2_study() when searching
nested groups for starting code units, in order to avoid stack overflow issues.
If the limit is reached, it just gives up trying for this optimization.

16. The control verb chain list must always be restored when exiting from a
recurse function in JIT.

17. Fix a crash which occurs when the character type of an invalid UTF
character is decoded in JIT.

18. Changes in many areas of the code so that when Unicode is supported and
PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for
upper/lower case computations on characters whose code points are greater than
127.

19. The function for checking UTF-16 validity was returning an incorrect offset
for the start of the error when a high surrogate was not followed by a valid
low surrogate. This caused incorrect behaviour, for example when
PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the
invalid high surrogate, such as /aa/ matching "\x{d800}aa".
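
The check described in 19 can be illustrated with a simplified C sketch. The
function utf16_first_error is hypothetical and is not the PCRE2 validity
function; it assumes that the error for an unpaired high surrogate is reported
at the offset of the high surrogate itself, so that matching can resume at the
code unit that follows it.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first invalid code unit in a UTF-16 string of
   len code units, or -1 if the whole string is valid. */
ptrdiff_t utf16_first_error(const uint16_t *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
    uint16_t c = s[i];
    if (c >= 0xd800 && c <= 0xdbff)            /* high surrogate */
      {
      if (i + 1 >= len || s[i + 1] < 0xdc00 || s[i + 1] > 0xdfff)
        return (ptrdiff_t)i;                   /* error starts here */
      i += 2;                                  /* valid surrogate pair */
      }
    else if (c >= 0xdc00 && c <= 0xdfff)       /* isolated low surrogate */
      return (ptrdiff_t)i;
    else i++;
    }
  return -1;
}
```

For the subject in the example above (0xd800 followed by "aa") the sketch
reports offset 0, which leaves both "a" code units available for a match.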

20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern
could be mis-compiled and therefore not match correctly. This is the example
that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to
match "word" because the "move back" value was set to zero.

21. Following a request from a user, some extensions and tidies to the
character tables handling have been done:

  (a) The dftables auxiliary program is renamed pcre2_dftables, but it is still
  not installed for public use.

  (b) There is now a -b option for pcre2_dftables, which causes the tables to
  be written in binary. There is also a -help option.

  (c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an
  application that wants to save tables in binary knows how long they are.

22. Changed setting of CMAKE_MODULE_PATH in CMakeLists.txt from SET to
LIST(APPEND...) to allow a setting from the command line to be included.

23. Updated to Unicode 13.0.0.

24. CMake build now checks for secure_getenv() and strerror(). Patch by Carlo.

25. Avoid using [-1] as a suffix in pcre2test because it can provoke a compiler
warning.

26. Added tests for __attribute__((uninitialized)) to both the configure and
CMake build files, and then applied this attribute to the variable called
stack_frames_vector[] in pcre2_match(). When implemented, this disables
automatic initialization (a facility in clang), which can take time on big
variables.
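
A minimal C sketch of how such an attribute might be applied, assuming a
compiler that provides __has_attribute. The macro SKETCH_KEEP_UNINITIALIZED and
the function sum_via_scratch are hypothetical names for this example; the
build-time tests mentioned in 26 are replaced here by a preprocessor check, and
the macro expands to nothing when the attribute is not available.

```c
#include <stddef.h>
#include <string.h>

/* Use the attribute only where the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(uninitialized)
#    define SKETCH_KEEP_UNINITIALIZED __attribute__((uninitialized))
#  endif
#endif
#ifndef SKETCH_KEEP_UNINITIALIZED
#  define SKETCH_KEEP_UNINITIALIZED
#endif

/* Copies up to n bytes of src into a large scratch buffer and returns the
   sum of the copied bytes. Only bytes that have been written are read, so
   the buffer does not need to be initialized on entry. */
unsigned long sum_via_scratch(const unsigned char *src, size_t n)
{
  unsigned char scratch[20480] SKETCH_KEEP_UNINITIALIZED;
  unsigned long sum = 0;
  size_t i;

  if (n > sizeof(scratch)) n = sizeof(scratch);
  memcpy(scratch, src, n);
  for (i = 0; i < n; i++) sum += scratch[i];
  return sum;
}
```

Without the attribute, a build that enables automatic initialization of local
variables would fill all 20480 bytes on every call, even though only the first
n bytes are used.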
880*22dc650dSSadaf Ebrahimi
881*22dc650dSSadaf Ebrahimi27. Updated CMakeLists.txt (patches by Uwe Korn) to add support for
882*22dc650dSSadaf Ebrahimipcre2-config, the libpcre*.pc files, SOVERSION, VERSION and the
883*22dc650dSSadaf EbrahimiMACHO_*_VERSIONS settings for CMake builds.
884*22dc650dSSadaf Ebrahimi
885*22dc650dSSadaf Ebrahimi28. Another patch to CMakeLists.txt to check for mkostemp (configure already
886*22dc650dSSadaf Ebrahimidoes). Patch by Carlo Marcelo Arenas Belon.
887*22dc650dSSadaf Ebrahimi
888*22dc650dSSadaf Ebrahimi29. Check for the existence of memfd_create in both CMake and configure
889*22dc650dSSadaf Ebrahimiconfigurations. Patch by Carlo Marcelo Arenas Belon.
890*22dc650dSSadaf Ebrahimi
891*22dc650dSSadaf Ebrahimi30. Restrict the configuration setting for the SELinux compatible execmem
892*22dc650dSSadaf Ebrahimiallocator (change 10.30/44) to Linux and NetBSD.
893*22dc650dSSadaf Ebrahimi
894*22dc650dSSadaf Ebrahimi
895*22dc650dSSadaf EbrahimiVersion 10.34 21-November-2019
896*22dc650dSSadaf Ebrahimi------------------------------
897*22dc650dSSadaf Ebrahimi
898*22dc650dSSadaf Ebrahimi1. The maximum number of capturing subpatterns is 65535 (documented), but no
899*22dc650dSSadaf Ebrahimicheck on this was ever implemented. This omission has been rectified; it fixes
900*22dc650dSSadaf EbrahimiClusterFuzz 14376.
901*22dc650dSSadaf Ebrahimi
902*22dc650dSSadaf Ebrahimi2. Improved the invalid utf32 support of the JIT compiler. Now it correctly
903*22dc650dSSadaf Ebrahimidetects invalid characters in the 0xd800-0xdfff range.
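
As an illustration of the range check in item 2 (a sketch only, not the JIT
compiler's code): a UTF-32 code unit is invalid if it is a surrogate in the
range 0xd800-0xdfff or if it lies above 0x10ffff. Both function names below
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if c is a valid UTF-32 code unit, that is,
   not a surrogate and not above the Unicode maximum of 0x10ffff. */
int is_valid_utf32_unit(uint32_t c)
{
  if (c >= 0xd800u && c <= 0xdfffu) return 0;   /* surrogate range */
  if (c > 0x10ffffu) return 0;                  /* beyond Unicode */
  return 1;
}

/* Returns the offset of the first invalid code unit, or len if there is none. */
size_t first_invalid_utf32(const uint32_t *s, size_t len)
{
  size_t i;
  assert(s != NULL || len == 0);
  for (i = 0; i < len; i++)
    if (!is_valid_utf32_unit(s[i])) break;
  return i;
}
```

For a three-unit subject whose second unit is 0xdc00, first_invalid_utf32()
returns offset 1.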

3. Fix minor typo bug in JIT compile when \X is used in a non-UTF string.

4. Add support for matching in invalid UTF strings to the pcre2_match()
interpreter, and integrate with the existing JIT support via the new
PCRE2_MATCH_INVALID_UTF compile-time option.

5. Give more error detail for invalid UTF-8 when detected in pcre2grep.

6. Add support for invalid UTF-8 to pcre2grep.

7. Adjust the limit for "must have" code unit searching; in particular,
increase it substantially for non-anchored patterns.

8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
minimum is potentially useful.

9. Some changes to the way the minimum subject length is handled:

   * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed;
     pcre2test now omits this item instead of showing a value of zero.

   * An incorrect minimum length could be calculated for a pattern that
     contained (*ACCEPT) inside a quantified group whose minimum repetition
     was zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a
     minimum of 2. The minimum length scan no longer happens for a pattern
     that contains (*ACCEPT).

   * When no minimum length is set by the normal scan, but a first and/or last
     code unit is recorded, set the minimum to 1 or 2 as appropriate.

   * When a pattern contains multiple groups with the same number, a back
     reference cannot know which one to scan for a minimum length. This used to
     cause the minimum length finder to give up with no result. Now it treats
     such references as not adding to the minimum length (which it should have
     done all along).

   * Furthermore, the above action now happens only if the back reference is to
     a group that exists more than once in a pattern instead of any back
     reference in a pattern with duplicate numbers.

10. A (*MARK) value inside a successful condition was not being returned by the
interpretive matcher (it was returned by JIT). This bug has been mended.

11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
if the pattern had more than 32 capturing parentheses. This is fixed. In
addition (a) the default limit for groups requested by -o<n> has been raised to
50, (b) the new --om-capture option changes the limit, (c) an error is raised
if -o asks for a group that is above the limit.

12. The quantifier {1} was always being ignored, but this is incorrect when it
is made possessive and applied to an item in parentheses, because a
parenthesized item may contain multiple branches or other backtracking points,
for example /(a|ab){1}+c/ or /(a+){1}+a/.

13. For partial matches, pcre2test was always showing the maximum lookbehind
characters, flagged with "<", which is misleading when the lookbehind didn't
actually look behind the start (because it was later in the pattern). Showing
all consulted preceding characters for partial matches is now controlled by the
existing "allusedtext" modifier and, as for complete matches, this facility is
available only for non-JIT matching, because JIT does not maintain the first
and last consulted characters.

14. DFA matching (using pcre2_dfa_match()) was not recognising a partial match
if the end of the subject was encountered in a lookahead (conditional or
otherwise), an atomic group, or a recursion.

15. Give error if pcre2test -t, -T, -tm or -TM is given an argument of zero.

16. Check for integer overflow when computing lookbehind lengths. Fixes
ClusterFuzz issue 15636.
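
The kind of check described in item 16 can be sketched as follows. This is not
the code in PCRE2; the function name and the use of int for lengths are
assumptions made for the example. The point is that both the multiplication
(for a repeated item) and the addition are tested before they are performed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: adds item_len * count to *total. Returns 1 on success.
   Returns 0, leaving *total unchanged, if the result would overflow int. */
int add_repeated_length(int *total, int item_len, int count)
{
  assert(total != NULL && *total >= 0 && item_len >= 0 && count >= 0);
  if (item_len != 0 && count > INT_MAX / item_len)
    return 0;                                   /* product would overflow */
  if (item_len * count > INT_MAX - *total)
    return 0;                                   /* sum would overflow */
  *total += item_len * count;
  return 1;
}
```

A caller would treat a zero return as a compile error for the pattern instead
of continuing with a wrapped length.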

17. Implemented non-atomic positive lookaround assertions.

18. If a lookbehind contained a lookahead that contained another lookbehind
within it, the nested lookbehind was not correctly processed. For example, if
/(?<=(?=(?<=a)))b/ was matched to "ab" it gave no match instead of matching
"b".

19. Implemented pcre2_get_match_data_size().

20. Three alterations to partial matching:

    (a) The definition of a partial match is slightly changed: if a pattern
    contains any lookbehinds, an empty partial match may be given, because this
    is another situation where adding characters to the current subject can
    lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab".

    (b) Similarly, if a pattern could match an empty string, an empty partial
    match may be given. Example: /(?![ab]).*/ with subject "ab". This case
    applies only to PCRE2_PARTIAL_HARD.

    (c) An empty string partial hard match can be returned for \z and \Z as it
    is documented that they shouldn't match.

21. A branch that started with (*ACCEPT) was not being recognized as one that
could match an empty string.

22. Corrected pcre2_set_character_tables() tables data type: was const unsigned
char * instead of const uint8_t *, as generated by pcre2_maketables().

23. Upgraded to Unicode 12.1.0.

24. Add -jitfast command line option to pcre2test (to make all the jit options
available directly).

25. Make pcre2test -C show if libreadline or libedit is supported.

26. If the length of one branch of a group exceeded 65535 (the maximum value
that is remembered as a minimum length), the whole group's length was
incorrectly recorded as 65535, leading to incorrect "no match" when start-up
optimizations were in force.
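
The intended behaviour in item 26 can be illustrated with a small sketch. The
function and macro names are hypothetical and this is not PCRE2's code: the
minimum for a group is the smallest of its branch minimums, and the 65535 cap
is applied only to that result, so one over-long branch does not hide a
shorter one.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REMEMBERED_MIN 65535ul   /* the cap mentioned in item 26 */

/* Hypothetical sketch: returns the minimum length to record for a group,
   given the true minimum length of each of its branches. */
unsigned long group_min_length(const unsigned long *branch_min,
  size_t nbranches)
{
  unsigned long min;
  size_t i;
  assert(branch_min != NULL && nbranches > 0);
  min = branch_min[0];
  for (i = 1; i < nbranches; i++)
    if (branch_min[i] < min) min = branch_min[i];
  return (min > MAX_REMEMBERED_MIN) ? MAX_REMEMBERED_MIN : min;
}
```

With branch minimums of 70000 and 3 the group minimum is 3, not 65535, so a
short subject that matches the second branch is not rejected by the start-up
optimization.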

27. The "rightmost consulted character" value was not always correct; in
particular, if a pattern ended with a negative lookahead, characters that were
inspected in that lookahead were not included.

28. Add the pcre2_maketables_free() function.

29. The start-up optimization that looks for a unique initial matching
code unit in the interpretive engines uses memchr() in 8-bit mode. When the
search is caseless, it was doing so inefficiently, which ended up slowing down
the match drastically when the subject was very long. The revised code (a)
remembers if one case is not found, so it never repeats the search for that
case after a bumpalong and (b) when one case has been found, it searches only
up to that position for an earlier occurrence of the other case. This fix
applies to both interpretive pcre2_match() and to pcre2_dfa_match().
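
The two ideas in item 29 can be sketched with plain memchr(). This is an
illustration under stated assumptions, not the code in pcre2_match(): the
structure and function names are hypothetical, and the "absent" flags are
valid only while successive calls move forwards through the same subject.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state kept across bumpalongs: which case is already known to
   be absent from the remainder of the subject. */
typedef struct {
  int c1_absent;
  int c2_absent;
} caseless_search_state;

/* Returns a pointer to the first occurrence of c1 or c2 in [start, end), or
   NULL if neither occurs. */
const unsigned char *find_first_caseless(const unsigned char *start,
  const unsigned char *end, unsigned char c1, unsigned char c2,
  caseless_search_state *st)
{
  const unsigned char *p1 = NULL;
  const unsigned char *p2 = NULL;
  assert(start != NULL && end != NULL && start <= end && st != NULL);

  if (!st->c1_absent)
    {
    p1 = memchr(start, c1, (size_t)(end - start));
    if (p1 == NULL) st->c1_absent = 1;    /* (a) never search for c1 again */
    }

  if (!st->c2_absent)
    {
    /* (b) if c1 was found, c2 is of interest only if it occurs before it */
    const unsigned char *limit = (p1 != NULL) ? p1 : end;
    p2 = memchr(start, c2, (size_t)(limit - start));
    /* c2 is known to be absent only if the search ran to the end */
    if (p2 == NULL && p1 == NULL) st->c2_absent = 1;
    }

  return (p2 != NULL) ? p2 : p1;
}
```

On a very long subject that contains only one of the two cases, the other
case is searched for to the end once, not once per bumpalong.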

30. While scanning to find the minimum length of a group, if any branch has
minimum length zero, there is no need to scan any subsequent branches (a small
compile-time performance improvement).

31. Installed a .gitignore file on a user's suggestion. When using the svn
repository with git (through git svn) this helps keep it tidy.

32. Add a check in JIT for the underflow that may occur when the value of the
subject string pointer is close to 0.

33. Arrange for classes such as [Aa] which contain just the two cases of the
same character, to be treated as a single caseless character. This causes the
first and required code unit optimizations to kick in where relevant.

34. Improve the bitmap of starting bytes for positive classes that include wide
characters, but no property types, in UTF-8 mode. Previously, on encountering
such a class, the bits for all bytes greater than \xc4 were set, thus
specifying any character with codepoint >= 0x100. Now the only bits that are
set are for the relevant bytes that start the wide characters. This can give a
noticeable performance improvement.

35. If the bitmap of starting code units contains only 1 or 2 bits, replace it
with a single starting code unit (1 bit) or a caseless single starting code
unit if the two relevant characters are case-partners. This is particularly
relevant to the 8-bit library, though it applies to all. It can give a
performance boost for patterns such as [Ww]ord and (word|WORD). However, this
optimization doesn't happen if there is a "required" code unit of the same
value (because the search for a "required" code unit starts at the match start
for non-unique first code unit patterns, but after a unique first code unit,
and patterns such as a*a need the former action).
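
A rough sketch of the decision in item 35 for the 8-bit case follows. The enum
and function names are hypothetical, case partners are taken from the C
library's toupper() and tolower() (an assumption made for this example), and
the exception for a "required" code unit of the same value is not modelled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

enum start_kind { START_BITMAP, START_SINGLE, START_CASELESS };

/* Hypothetical sketch: inspects a 256-bit bitmap of possible starting code
   units and reports whether it can be replaced by one code unit, matched
   either exactly or caselessly. On success *unit is set to that code unit
   (the lower-valued one of a case pair). */
enum start_kind classify_start_bitmap(const uint8_t bitmap[32], int *unit)
{
  int found[2] = {0, 0};
  int count = 0;
  int c;
  assert(bitmap != NULL && unit != NULL);

  for (c = 0; c < 256; c++)
    {
    if ((bitmap[c / 8] & (1u << (c % 8))) == 0) continue;
    if (count == 2) return START_BITMAP;    /* three or more bits are set */
    found[count++] = c;
    }

  if (count == 1)
    {
    *unit = found[0];
    return START_SINGLE;
    }

  if (count == 2 &&
      (tolower(found[0]) == found[1] || toupper(found[0]) == found[1]))
    {
    *unit = found[0];
    return START_CASELESS;
    }

  return START_BITMAP;    /* empty bitmap, or two unrelated code units */
}
```

For [Ww]ord the bitmap has bits for 'W' and 'w' only, so the result is
START_CASELESS and a single caseless search can replace the bitmap test.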

36. Small patch to pcre2posix.c to set the erroroffset field to -1 immediately
after a successful compile, instead of at the start of matching to avoid a
sanitizer complaint (regexec is supposed to be thread safe).

37. Add NEON vectorization to JIT to speed up matching of first character and
pairs of characters on ARM64 CPUs.

38. If a non-ASCII character was the first in a starting assertion in a
caseless match, the "first code unit" optimization did not get the casing
right, and the assertion failed to match a character in the other case if it
did not start with the same code unit.

39. Fixed the incorrect computation of jump sizes on x86 CPUs in JIT. A masking
operation was incorrectly removed in r1136. Reported by Ralf Junker.


Version 10.33 16-April-2019
---------------------------

1. Added "allvector" to pcre2test to make it easy to check the part of the
ovector that shouldn't be changed, in particular after substitute and failed or
partial matches.

2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
a fixed quantifier greater than 1. This issue was found by Yunho Kim.

3. Added support for callouts from pcre2_substitute(). After 10.33-RC1, but
prior to release, fixed a bug that caused a crash if pcre2_substitute() was
called with a NULL match context.

4. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
names are defined as macros. This should help avoid linking with the wrong
library in some environments while still exporting the POSIX names for
pre-existing programs that use them. (The Debian alternative names are also
defined as macros, but not documented.)

5. Fix an xclass matching issue in JIT.

6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).

7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and
lookaround assertions, for example, (*pla:...) and (*atomic:...). These are
characterized by a lower case letter following (* and to simplify coding for
this, the character tables created by pcre2_maketables() were updated to add a
new "is lower case letter" bit. At the same time, the now unused "is
hexadecimal digit" bit was removed. The default tables in
src/pcre2_chartables.c.dist are updated.

8. Implement the new Perl "script run" features (*script_run:...) and
(*atomic_script_run:...) aka (*sr:...) and (*asr:...).

9. Fixed two typos in change 22 for 10.21, which added special handling for
ranges such as a-z in EBCDIC environments. The original code probably never
worked, though there were no bug reports.

10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
path. Also, when a match fails, set the subject field in the match data to NULL
for tidiness - none of the substring extractors should reference this after
match failure.

11. If a pattern started with a subroutine call that had a quantifier with a
minimum of zero, an incorrect "match must start with this character" could be
recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
be the first character of a match.

12. The heap limit checking code in pcre2_dfa_match() could suffer from
overflow if the heap limit was set very large. This could cause incorrect "heap
limit exceeded" errors.
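
One way to avoid the kind of overflow described in item 12 is sketched below.
It is not the code in pcre2_dfa_match(); the function name is hypothetical,
and it assumes a limit expressed in kibibytes. Scaling the limit up to bytes
could wrap round for a very large limit, so the requirement is scaled down
instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if needed_bytes exceeds a limit given in
   kibibytes. The byte count is converted to kibibytes (rounding up) so that
   no multiplication of the limit by 1024 is needed. */
int heap_limit_exceeded(size_t needed_bytes, uint32_t limit_kib)
{
  size_t needed_kib;
  assert(sizeof(size_t) >= sizeof(uint32_t));   /* the cast below is lossless */
  needed_kib = needed_bytes / 1024 + (needed_bytes % 1024 != 0);
  return needed_kib > (size_t)limit_kib;
}
```

With this form, a limit at the maximum 32-bit value never produces a spurious
"exceeded" result for a small request.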

13. Add "kibibytes" to the heap limit output from pcre2test -C to make the
units clear.

14. Add a call to pcre2_jit_free_unused_memory() in pcre2grep, for tidiness.

15. Updated the VMS-specific code in pcre2test on the advice of a VMS user.

16. Removed the unnecessary inclusion of stdint.h (or inttypes.h) from
pcre2_internal.h as it is now included by pcre2.h. Also, change 17 for 10.32
below was unnecessarily complicated, as inttypes.h is a Standard C header,
which is defined to be a superset of stdint.h. Instead of conditionally
including stdint.h or inttypes.h, pcre2.h now unconditionally includes
inttypes.h. This supports environments that do not have stdint.h but do have
inttypes.h, which are known to exist. A note in the autotools documentation
says (November 2018) that there are none known that are the other way round.

17. Added --disable-percent-zt to "configure" (and equivalent to CMake) to
forcibly disable the use of %zu and %td in formatting strings because there is
at least one version of VMS that claims to be C99 but does not support these
modifiers.

18. Added --disable-pcre2grep-callout-fork, which restricts the callout support
in pcre2grep to the inbuilt echo facility. This may be useful in environments
that do not support fork().

19. Fix two instances of <= 0 being applied to unsigned integers (the VMS
compiler complains).

20. Added "fork" support for VMS to pcre2grep, for running an external program
via a string callout.

21. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.

22. If a pattern started with (*MARK), (*COMMIT), (*PRUNE), (*SKIP), or (*THEN)
followed by ^ it was not recognized as anchored.

23. The RunGrepTest script used to cut out the test of NUL characters for
Solaris and MacOS as printf and sed can't handle them. It seems that the *BSD
systems can't either. I've inverted the test so that only those OS that are
known to work (currently only Linux) try to run this test.

24. Some tests in RunGrepTest appended to testtrygrep from two different file
descriptors instead of redirecting stderr to stdout. This worked on Linux, but
it was reported not to on other systems, causing the tests to fail.

25. In the RunTest script, make the test for stack setting use the same value
for the stack as it needs for -bigstack.

26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning.

26. With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s
which are valid in character classes, but not as the end of ranges, were being
treated as literals. An example is [_-\s] (but not [\s-_] because that gave an
error at the *start* of a range). Now an "invalid range" error is given
independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

27. Related to 26 above, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL was affecting known
escape sequences such as \eX when they appeared invalidly in a character
class. Now the option applies only to unrecognized or malformed escape
sequences.

28. Fix word boundary in JIT compiler. Patch by Mike Munday.

29. The pcre2_dfa_match() function was incorrectly handling conditional version
tests such as (?(VERSION>=0)...) when the version test was true. Incorrect
processing or a crash could result.

30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group
names, as Perl does. There was a small bug in this new code, found by
ClusterFuzz 12950, fixed before release.

31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh}
construct.

32. Compile \p{Any} to be the same as . in DOTALL mode, so that it benefits
from auto-anchoring if \p{Any}* starts a pattern.

33. Compile invalid UTF check in JIT test when only pcre32 is enabled.

34. For some time now, CMake has been warning about the setting of policy
CMP0026 to "OLD" in CMakeLists.txt, and hinting that the feature might be
removed in a future version. A request for CMake expertise on the list produced
no result, so I have now hacked CMakeLists.txt along the lines of some changes
I found on the Internet. The new code no longer needs the policy setting, and
it appears to work fine on Linux.

35. Setting --enable-jit=auto for an out-of-tree build failed because the
source directory wasn't always in the search path for AC_TRY_COMPILE. Patch
from Ross Burton.

36. Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available.
Patch by Guillem Jover.

37. Changed expressions such as 1<<10 to 1u<<10 in many places because compiler
warnings were reported.
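
A small illustration of why the unsigned literal in item 37 matters (a sketch
that assumes unsigned int is at least 32 bits wide; the function name is
hypothetical). With a signed literal, a shift such as 1 << 31 does not fit in
int, and mixing signed masks with unsigned values draws conversion warnings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a 32-bit mask with only bit n set. The
   unsigned literal keeps the shift well defined for every n from 0 to 31. */
uint32_t bit_mask(unsigned int n)
{
  assert(n < 32);
  return (uint32_t)(1u << n);
}
```

For example, bit_mask(10) is 1024 and bit_mask(31) is 0x80000000.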

38. Using the clang compiler with sanitizing options causes runtime complaints
about truncation for statements such as x = ~x when x is an 8-bit value; it
seems to compute ~x as a 32-bit value. Changing such statements to x = 255 ^ x
gets rid of the warnings. There were also two missing casts in pcre2test.
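
The behaviour in item 38 follows from C's integer promotions: in x = ~x the
8-bit operand is promoted to int before the complement, so the result lies
outside 0..255 (for example, ~0x0f is -16 as an int) and is truncated when it
is stored. The sketch below shows the replacement form; the function name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: inverts all eight bits of x. 255 ^ x is also computed
   as an int, but its value always lies in 0..255, so converting it back to
   uint8_t loses nothing. */
uint8_t invert8(uint8_t x)
{
  int inverted = 255 ^ x;
  assert(inverted >= 0 && inverted <= 255);
  return (uint8_t)inverted;
}
```

For example, invert8(0x0f) is 0xf0.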


Version 10.32 10-September-2018
-------------------------------

1. When matching using the REG_STARTEND feature of the POSIX API with a
non-zero starting offset, unset capturing groups with lower numbers than a
group that did capture something were not being correctly returned as "unset"
(that is, with offset values of -1).

2. When matching using the POSIX API, pcre2test used to omit listing unset
groups altogether. Now it shows those that come before any actual captures as
"<unset>", as happens for non-POSIX matching.

3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
whatever the build configuration was. It now correctly says "\R matches all
Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
specified. Similarly, running "pcre2test -C bsr" never produced the result
ANY.

4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
multi-code-unit characters caused bad behaviour and possibly a crash. This
issue was fixed for other kinds of repeat in release 10.20 by change 19, but
repeating character classes were overlooked.

5. pcre2grep now supports the inclusion of binary zeros in patterns that are
read from files via the -f option.

6. A small fix to pcre2grep to avoid compiler warnings for -Wformat-overflow=2.

7. Added --enable-jit=auto support to configure.ac.

8. Added some dummy variables to the heapframe structure in 16-bit and 32-bit
modes for the benefit of m68k, where pointers can be 16-bit aligned. The
dummies force 32-bit alignment and this ensures that the structure is a
multiple of PCRE2_SIZE, a requirement that is tested at compile time. In other
architectures, alignment requirements take care of this automatically.

9. When returning an error from pcre2_pattern_convert(), ensure the error
offset is set to zero for early errors.

10. A number of patches for Windows support from Daniel Richard G:

  (a) List of error numbers in Runtest.bat corrected (it was not the same as in
      Runtest).

  (b) pcre2grep snprintf() workaround as used elsewhere in the tree.

  (c) Support for non-C99 snprintf() that returns -1 in the overflow case.
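
The difference addressed by item 10(c) can be handled along the following
lines. This is a sketch, not the patch itself, and the function name is
hypothetical: a C99 vsnprintf() reports overflow by returning a value that is
not less than the buffer size, whereas some older implementations return -1,
so both results are treated as "did not fit".

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: formats into buf and returns 1 if the whole string
   fitted, or 0 if it was truncated or an error occurred. */
int format_fits(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int n;
  assert(buf != NULL && size > 0 && fmt != NULL);
  va_start(ap, fmt);
  n = vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  buf[size - 1] = 0;   /* older implementations may not terminate on overflow */
  return n >= 0 && (size_t)n < size;
}
```

The caller tests only the return value of the wrapper, so it does not need to
know which convention the C library follows.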

11. Minor tidy of pcre2_dfa_match() code.

12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
use the stack for local workspace and local ovectors. Instead, an initial block
of stack is reserved, but if this is insufficient, heap memory is used. The
heap limit parameter now applies to pcre2_dfa_match().

13. If a "find limits" test of DFA matching in pcre2test resulted in too many
matches for the ovector, no matches were displayed.

14. Removed an occurrence of ctrl/Z from test 6 because Windows treats it as
EOF. The test looks to have come from a fuzzer.

15. If PCRE2 was built with a default match limit a lot greater than the
standard default of 10 000 000, some JIT tests of the match limit no longer
failed. All such tests now set 10 000 000 as the upper limit.

16. Another Windows related patch for pcre2grep to ensure that WIN32 is
undefined under Cygwin.

17. Test for the presence of stdint.h and inttypes.h in configure and CMake and
include whichever exists (stdint preferred) instead of unconditionally
including stdint. This makes life easier for old and non-standard systems.

18. Further changes to improve portability, especially to old and/or
non-standard systems:

  (a) Put all printf arguments in RunGrepTest into single, not double, quotes,
      and use \0 not \x00 for binary zero.

  (b) Avoid the use of C++ (i.e. BCPL) // comments.

  (c) Parameterize the use of %zu in pcre2test to make it like %td. For both of
      these now, if using MSVC or a standard C before C99, %lu is used with a
      cast if necessary.
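
The parameterization described in item 18(c) might look roughly like this. It
is a sketch modelled on the description above, not a copy of pcre2test: the
macro names SIZ_FORM and SIZ_CAST and the function are assumptions made for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Choose a format and cast for printing a size_t: %zu where C99 is
   available, otherwise %lu with a cast to unsigned long. */
#if defined(_MSC_VER) || !defined(__STDC_VERSION__) || \
    __STDC_VERSION__ < 199901L
#define SIZ_FORM "lu"
#define SIZ_CAST (unsigned long)
#else
#define SIZ_FORM "zu"
#define SIZ_CAST
#endif

/* Hypothetical helper: writes value in decimal to buf, which must hold at
   least 32 bytes, and returns the number of characters written. */
int format_size(char *buf, size_t bufsize, size_t value)
{
  assert(buf != NULL && bufsize >= 32);
  return sprintf(buf, "%" SIZ_FORM, SIZ_CAST value);
}
```

Every place that prints a size_t then uses the two macros instead of a
literal %zu, so only the macro definitions depend on the compiler.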

19. Applied a contributed patch to CMakeLists.txt to increase the stack size
when linking pcre2test with MSVC. This gets rid of a stack overflow error in
the standard set of tests.

20. Output a warning in pcre2test when ignoring the "altglobal" modifier when
it is given with the "replace" modifier.

21. In both pcre2test and pcre2_substitute(), with global matching, a pattern
that matched an empty string, but never at the starting match offset, was not
handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such
a pattern. Because \G is in a lookbehind assertion, there has to be a
"bumpalong" before there can be a match. The automatic "advance by one
character after an empty string match" rule is therefore inappropriate. A more
complicated algorithm has now been implemented.

22. When checking to see if a lookbehind is of fixed length, lookaheads were
correctly ignored, but quantifiers on lookaheads were not being ignored,
leading to an incorrect "lookbehind assertion is not fixed length" error.

23. The VERSION condition test was reading fractional PCRE2 version numbers
such as the 04 in 10.04 incorrectly and hence giving wrong results.
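
To show why the fractional part needs care, here is a minimal sketch of reading
a version string so that 10.04 and 10.4 stay distinct. The function
parse_version() and its convention (a single fractional digit counts as tens,
so "10.4" is read as 10.40) are assumptions for illustration; this is not the
code in the library.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parse "MAJOR" or "MAJOR.MINOR", where MINOR has one or two decimal
   digits. Returns 1 on success and 0 for a malformed string. */
static int parse_version(const char *s, int *major, int *minor)
{
  assert(s != NULL && major != NULL && minor != NULL);
  if (!isdigit((unsigned char)*s)) return 0;
  *major = 0;
  while (isdigit((unsigned char)*s)) *major = *major * 10 + (*s++ - '0');
  *minor = 0;
  if (*s == '.')
    {
    s++;
    if (!isdigit((unsigned char)*s)) return 0;
    *minor = (*s++ - '0') * 10;                  /* first digit is tens */
    if (isdigit((unsigned char)*s)) *minor += *s++ - '0';
    }
  return *s == '\0';                             /* reject trailing text */
}
```

With this convention "10.04" yields minor 4 and "10.4" yields minor 40, so a
comparison against the running version can be done on (major, minor) pairs.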

24. Updated to Unicode version 11.0.0. As well as the usual addition of new
scripts and characters, this involved re-jigging the grapheme break property
algorithm because Unicode has changed the way emojis are handled.

25. Fixed an obscure bug that struck when there were two atomic groups not
separated by something with a backtracking point. There could be an incorrect
backtrack into the first of the atomic groups. A complicated example is
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ matched against "abc", where the *SKIP
shouldn't find a MARK (because it is in an atomic group), but it did.

26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
a list of modifiers for all subsequent patterns - only those that the script
recognizes are meaningful; (2) #subject lines can be used to set or unset a
default "mark" modifier; (3) Unsupported #command lines give a warning when
they are ignored; (4) Mark data is output only if the "mark" modifier is
present.

27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.

28. A (*MARK) name was not being passed back for positive assertions that were
terminated by (*ACCEPT).

29. Add support for \N{U+dddd}, but only in Unicode mode.

30. Add support for (?^) for unsetting all imnsx options.

31. The PCRE2_EXTENDED (/x) option only ever discarded space characters whose
code point was less than 256 and that were recognized by the lookup table
generated by pcre2_maketables(), which uses isspace() to identify white space.
Now, when Unicode support is compiled, PCRE2_EXTENDED also discards U+0085,
U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by
Unicode as "Pattern White Space". This makes PCRE2 compatible with Perl.
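
For reference, the full Unicode Pattern_White_Space set can be tested with a
function like the one below. The name is_pattern_white_space() is hypothetical,
and the sketch hard-codes the Unicode list, whereas the library (as described
above) relies on its isspace()-derived table for code points below 256.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if c has the Unicode Pattern_White_Space property, else 0. */
static int is_pattern_white_space(uint32_t c)
{
  assert(c <= 0x10ffffu);
  return (c >= 0x0009 && c <= 0x000d) ||   /* TAB, LF, VT, FF, CR */
         c == 0x0020 ||                    /* SPACE */
         c == 0x0085 ||                    /* NEXT LINE */
         c == 0x200e || c == 0x200f ||     /* LRM, RLM */
         c == 0x2028 || c == 0x2029;       /* LINE and PARAGRAPH SEPARATOR */
}
```

Note that characters such as U+00A0 (no-break space) and U+2000 (en quad) are
white space in the general Unicode sense but are not Pattern White Space.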

32. In certain circumstances, option settings within patterns were not being
correctly processed. For example, the pattern /((?i)A)(?m)B/ incorrectly
matched "ab". (The (?m) setting lost the fact that (?i) should be reset at the
end of its group during the parse process, but without another setting such as
(?m) the compile phase got it right.) This bug was introduced by the
refactoring in release 10.23.

33. PCRE2 uses bcopy() if available when memmove() is not, and it used simply
to define memmove() as a function call to bcopy(). This hasn't been tested for
a long time because in pcre2test the result of memmove() was being used,
whereas bcopy() doesn't return a result. This feature is now refactored always
to call an emulation function when there is no memmove(). The emulation makes
use of bcopy() when available.
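
An emulation of this kind might look like the sketch below. The function name
sketch_memmove() and the use of a HAVE_BCOPY macro are assumptions for
illustration. The point is that the emulation, unlike a bare bcopy() call,
returns the destination pointer and copes with overlapping areas.

```c
#include <assert.h>
#include <stddef.h>
#ifdef HAVE_BCOPY
#include <strings.h>
#endif

/* Copy n bytes from s to d, allowing overlap, and return d as memmove()
   does. */
static void *sketch_memmove(void *d, const void *s, size_t n)
{
#ifdef HAVE_BCOPY
  assert(n == 0 || (d != NULL && s != NULL));
  bcopy(s, d, n);          /* handles overlap, but returns nothing */
  return d;
#else
  unsigned char *dest = (unsigned char *)d;
  const unsigned char *src = (const unsigned char *)s;
  size_t i;
  assert(n == 0 || (d != NULL && s != NULL));
  if (dest > src)
    for (i = n; i > 0; i--) dest[i - 1] = src[i - 1];   /* copy backwards */
  else
    for (i = 0; i < n; i++) dest[i] = src[i];           /* copy forwards */
  return d;
#endif
}
```

Copying backwards when the destination lies above the source is what keeps an
overlapping move from overwriting bytes that have not yet been read.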

34. When serializing a pattern, set the memctl, executable_jit, and tables
fields (that is, all the fields that contain pointers) to zeros so that the
result of serializing is always the same. These fields are re-set when the
pattern is deserialized.
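
The effect can be illustrated with a small stand-in structure. Everything in
this sketch (demo_code, its non-pointer fields, and demo_serialize()) is
hypothetical; only the three pointer field names are taken from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the header of a compiled pattern. */
typedef struct demo_code {
  void *memctl;              /* memory control data (pointer) */
  void *executable_jit;      /* JIT code (pointer) */
  const uint8_t *tables;     /* character tables (pointer) */
  uint32_t magic;            /* plain data that must survive */
  uint32_t flags;
} demo_code;

/* Copy src into out, then clear the pointer fields so that the bytes in
   out depend only on pattern data, not on where things live in memory. */
static void demo_serialize(const demo_code *src, demo_code *out)
{
  assert(src != NULL && out != NULL);
  memcpy(out, src, sizeof(*out));
  out->memctl = NULL;
  out->executable_jit = NULL;
  out->tables = NULL;
}
```

Two patterns that differ only in these pointers then serialize to the same
bytes, which is what makes the output reproducible from run to run.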

35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
negative class with no characters less than 0x100 followed by a positive class
with only characters less than 0x100, the first class was incorrectly being
auto-possessified, causing incorrect match failures.

36. Removed the character type bit ctype_meta, which dates from PCRE1 and is
not used in PCRE2.

37. Tidied up unnecessarily complicated macros used in the escapes table.

38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted
from distribution tarballs, owing to a typo in Makefile.am which had
testoutput8-16-3 twice. Now fixed.

39. If the only branch in a conditional subpattern was anchored, the whole
subpattern was treated as anchored, when it should not have been, since the
assumed empty second branch cannot be anchored. Demonstrated by test patterns
such as /(?(1)^())b/ or /(?(?=^))b/.

40. A repeated conditional subpattern that could match an empty string was
always assumed to be unanchored. Now it is checked just like any other
repeated conditional subpattern, and can be found to be anchored if the minimum
quantifier is one or more. I can't see much use for a repeated anchored
pattern, but the behaviour is now consistent.

41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint
(for an event that could never occur but you had to have external information
to know that).

42. If before the first match in a file that was being searched by pcre2grep
there was a line that was sufficiently long to cause the input buffer to be
expanded, the variable holding the location of the end of the previous match
was being adjusted incorrectly, and could cause an overflow warning from a code
sanitizer. However, as the value is used only to print pending "after" lines
when the next match is reached (and there are no such lines in this case) this
bug could do no damage.


Version 10.31 12-February-2018
------------------------------

1. Fix typo (missing ]) in VMS code in pcre2test.c.

2. Replace the replicated code for matching extended Unicode grapheme sequences
(which got a lot more complicated by change 10.30/49) by a single subroutine
that is called by both pcre2_match() and pcre2_dfa_match().

3. Add idempotent guard to pcre2_internal.h.

4. Add new pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
PCRE2_CONFIG_COMPILED_WIDTHS.

5. Cut out \C tests in the JIT regression tests when NEVER_BACKSLASH_C is
defined (e.g. by --enable-never-backslash-C).

6. Defined public names for all the pcre2_compile() error numbers, and used
the public names in pcre2_convert.c.

7. Fixed a small memory leak in pcre2test (convert contexts).

8. Added two casts to compile.c and one to match.c to avoid compiler warnings.

9. Added code to pcre2grep when compiled under VMS to set the symbol
PCRE2GREP_RC to the exit status, because VMS does not distinguish between
exit(0) and exit(1).

10. Added the -LM (list modifiers) option to pcre2test. Also made -C complain
about a bad option only if the following argument item does not start with a
hyphen.

11. pcre2grep was truncating components of file names to 128 characters when
processing files with the -r option, and also (some very odd code) truncating
path names to 512 characters. There is now a check on the absolute length of
full path file names, which may be up to 2047 characters long.

12. When an assertion contained (*ACCEPT) it caused all open capturing groups
to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to
misbehaviour for subsequent references to groups that started outside the
assertion. ACCEPT in an assertion now closes only those groups that were
started within that assertion. Fixes oss-fuzz issues 3852 and 3891.

13. Multiline matching in pcre2grep was misbehaving if the pattern matched
within a line, and then matched again at the end of the line and over into
subsequent lines. Behaviour was different with and without colouring, and
sometimes context lines were incorrectly printed and/or line endings were lost.
All these issues should now be fixed.

14. If --line-buffered was specified for pcre2grep when input was from a
compressed file (.gz or .bz2) a segfault occurred. (Line buffering should be
ignored for compressed files.)

15. Although pcre2_jit_match checks whether the pattern is compiled
in a given mode, it was also expected that at least one mode is available.
This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION
when the pattern is not optimized by JIT at all.

16. The line number and related variables such as match counts in pcre2grep
were all int variables, causing overflow when files with more than 2147483647
lines were processed (assuming 32-bit ints). They have all been changed to
unsigned long ints.

17. If a backreference with a minimum repeat count of zero was first in a
pattern, apart from assertions, an incorrect first matching character could be
recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
as the first character of a match.

18. Characters in a leading positive assertion are considered for recording a
first character of a match when the rest of the pattern does not provide one.
However, a character in a non-assertive group within a leading assertion such
as in the pattern /(?=(a))\1?b/ caused this process to fail. This was an
infelicity rather than an outright bug, because it did not affect the result of
a match, just its speed. (In fact, in this case, the starting 'a' was
subsequently picked up in the study.)

19. A minor tidy in pcre2_match(): making all PCRE2_ERROR_ returns use "return"
instead of "RRETURN" saves unwinding the backtracks in these cases (only one
didn't).

20. Allocate a single callout block on the stack at the start of pcre2_match()
and set its never-changing fields once only. Do the same for pcre2_dfa_match().

21. Save the extra compile options (set in the compile context) with the
compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS
to retrieve them, and update pcre2test to show them.

22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks. The bits are set by pcre2_match(), but
not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts
if the callout_extra subject modifier is set. These bits are provided to help
with tracking how a backtracking match is proceeding.

23. Updated the pcre2demo.c demonstration program, which was missing the extra
code for -g that handles the case when \K in an assertion causes the match to
end at the original start point. Also arranged for it to detect when \K causes
the end of a match to be before its start.

24. Similar to 23 above, strange things (including loops) could happen in
pcre2grep when \K was used in an assertion when --colour was used or in
multiline mode. The "end at original start point" bug is fixed, and if the end
point is found to be before the start point, they are swapped.

25. When PCRE2_FIRSTLINE without PCRE2_NO_START_OPTIMIZE was used in non-JIT
matching (both pcre2_match() and pcre2_dfa_match()) and the matched string
started with the first code unit of a newline sequence, matching failed because
it was not tried at the newline.

26. Code for giving up a non-partial match after failing to find a starting
code unit anywhere in the subject was missing when searching for one of a
number of code units (the bitmap case) in both pcre2_match() and
pcre2_dfa_match(). This was a missing optimization rather than a bug.

27. Tidied up the ACROSSCHAR macro to be like FORWARDCHAR and BACKCHAR, using a
pointer argument rather than a code unit value. This should not have affected
the generated code.

28. The JIT compiler has been updated.

29. Avoid pointer overflow for unset captures in pcre2_substring_list_get().
This could not actually cause a crash because it was always used in a memcpy()
call with zero length.

30. Some internal structures have a variable-length ovector[] as their last
element. Their actual memory is obtained dynamically, giving an ovector of
appropriate length. However, they are defined in the structure as
ovector[NUMBER], where NUMBER is large so that array bound checkers don't
grumble. The value of NUMBER was 10000, but a fuzzer exceeded 5000 capturing
groups, making the ovector larger than this. The number has been increased to
131072, which allows for the maximum number of captures (65535) plus the
overall match. This fixes oss-fuzz issue 5415.
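
The arithmetic and the allocation technique can be sketched as below. The type
demo_frame and the helper names are hypothetical; the sketch only mirrors the
idea of declaring a nominally large trailing array while allocating just what
is needed (a C99 flexible array member would be another way to express this).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_GROUPS 65535u
#define DEMO_OVECTOR_SLOTS (2u * (DEMO_MAX_GROUPS + 1u))   /* 131072 */

typedef struct demo_frame {
  size_t oveccount;                      /* pairs actually available */
  size_t ovector[DEMO_OVECTOR_SLOTS];    /* nominal size only */
} demo_frame;

/* Slots needed for ngroups capturing groups: one start/end pair per group
   plus one pair for the overall match. */
static size_t demo_slots_needed(size_t ngroups)
{
  assert(ngroups <= DEMO_MAX_GROUPS);
  return 2 * (ngroups + 1);
}

/* Allocate only as much memory as the real ovector needs. The caller
   releases the block with free(). */
static demo_frame *demo_frame_create(size_t ngroups)
{
  size_t bytes = offsetof(demo_frame, ovector) +
                 demo_slots_needed(ngroups) * sizeof(size_t);
  demo_frame *f = malloc(bytes);
  if (f != NULL) f->oveccount = ngroups + 1;
  return f;
}
```

With 5000 groups the sketch needs 2 * 5001 = 10002 slots, which is already over
the old bound of 10000, while 65535 groups need exactly 131072.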

31. Auto-possessification at the end of a capturing group was dependent on what
follows the group (e.g. /(a+)b/ would auto-possessify the a+) but this caused
incorrect behaviour when the group was called recursively from elsewhere in the
pattern where something different might follow. This bug is an unforeseen
consequence of change #1 for 10.30 - the implementation of backtracking into
recursions. Iterators at the ends of capturing groups are no longer considered
for auto-possessification if the pattern contains any recursions. Fixes
Bugzilla #2232.


Version 10.30 14-August-2017
----------------------------

1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
hard-to-do issues such as #1887 in Bugzilla. The code is also cleaner because
the old code had a number of fudges to try to reduce stack usage. It seems to
run no slower than the old code.

A number of bugs in the refactored code were subsequently fixed during testing
before release, but after the code was made available in the repository. These
bugs were never in fully released code, but are noted here for the record.

  (a) If a pattern had fewer capturing parentheses than the ovector supplied in
      the match data block, a memory error (detectable by ASAN) occurred after
      a match, because the external block was being set from non-existent
      internal ovector fields. Fixes oss-fuzz issue 781.

  (b) A pattern with very many capturing parentheses (when the internal frame
      size was greater than the initial frame vector on the stack) caused a
      crash. A vector on the heap is now set up at the start of matching if the
      vector on the stack is not big enough to handle at least 10 frames.
      Fixes oss-fuzz issue 783.

  (c) Handling of (*VERB)s in recursions was wrong in some cases.

  (d) Captures in negative assertions that were used as conditions were not
      happening if the assertion matched via (*ACCEPT).

  (e) Mark values were not being passed out of recursions.

  (f) Refactor some code in do_callout() to avoid picky compiler warnings about
      negative indices. Fixes oss-fuzz issue 1454.

  (g) Similarly refactor the way the variable length ovector is addressed for
      similar reasons. Fixes oss-fuzz issue 1465.

2. Now that pcre2_match() no longer uses recursive function calls (see above),
the "match limit recursion" value seems misnamed. It still exists, and limits
the depth of tree that is searched. To avoid future confusion, it has been
renamed as "depth limit" in all relevant places (--with-depth-limit,
(*LIMIT_DEPTH), pcre2_set_depth_limit(), etc) but the old names are still
available for backwards compatibility.

3. Hardened pcre2test so as to reduce the number of bugs reported by fuzzers:

  (a) Check for malloc failures when getting memory for the ovector (POSIX) or
      the match data block (non-POSIX).

4. In the 32-bit library in non-UTF mode, an attempt to find a Unicode property
for a character with a code point greater than 0x10ffff (the Unicode maximum)
caused a crash.

5. If a lookbehind assertion that contained a back reference to a group
appearing later in the pattern was compiled with the PCRE2_ANCHORED option,
undefined actions (often a segmentation fault) could occur, depending on what
other options were set. An example assertion is (?<!\1(abc)) where the
reference \1 precedes the group (abc). This fixes oss-fuzz issue 865.

6. Added the PCRE2_INFO_FRAMESIZE item to pcre2_pattern_info() and arranged for
pcre2test to use it to output the frame size when the "framesize" modifier is
given.

7. Reworked the recursive pattern matching in the JIT compiler to follow the
interpreter changes.

8. When the zero_terminate modifier was specified on a pcre2test subject line
for global matching, unpredictable things could happen. For example, in UTF-8
mode, the pattern //g,zero_terminate read random memory when matched against an
empty string with zero_terminate. This was a bug in pcre2test, not the library.

9. Moved some Windows-specific code in pcre2grep (introduced in 10.23/13) out
of the section that is compiled when Unix-style directory scanning is
available, and into a new section that is always compiled for Windows.

10. In pcre2test, explicitly close the file after an error during serialization
or deserialization (the "load" or "save" commands).

11. Fix memory leak in pcre2_serialize_decode() when the input is invalid.

12. Fix potential NULL dereference in pcre2_callout_enumerate() if called with
a NULL pattern pointer when Unicode support is available.

13. When the 32-bit library was being tested by pcre2test, error messages that
were longer than 64 code units could cause a buffer overflow. This was a bug in
pcre2test.

14. The alternative matching function, pcre2_dfa_match(), misbehaved if it
encountered a character class with a possessive repeat, for example [a-f]{3}+.

15. The depth (formerly recursion) limit now applies to DFA matching (as
of 10.23/36); pcre2test has been upgraded so that \=find_limits works with DFA
matching to find the minimum value for this limit.

16. Since 10.21, if pcre2_match() was called with a null context, default
memory allocation functions were used instead of whatever was used when the
pattern was compiled.

17. Changes to the pcre2test "memory" modifier on a subject line. These apply
only to pcre2_match():

  (a) Warn if null_context is set on both pattern and subject, because the
      memory details cannot then be shown.

  (b) Remember (up to a certain number of) memory allocations and their
      lengths, and list only the lengths, so as to be system-independent.
      (In practice, the new interpreter never has more than 2 blocks allocated
      simultaneously.)

18. Make pcre2test detect an error return from pcre2_get_error_message(), give
a message, and abandon the run (this would have detected #13 above).

19. Implemented PCRE2_ENDANCHORED.

20. Applied Jason Hood's patches (slightly modified) to pcre2grep, to implement
the --output=text (-O) option and the inbuilt callout echo.

21. Extend auto-anchoring etc. to ignore groups with a zero quantifier and
single-branch conditions with a false condition (e.g. DEFINE) at the start of a
branch. For example, /(?(DEFINE)...)^A/ and /(...){0}^B/ are now flagged as
anchored.

22. Added an explicit limit on the amount of heap used by pcre2_match(), set by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). Upgraded pcre2test to show the
heap limit along with other pattern information, and to find the minimum when
the find_limits modifier is set.

23. Write to the last 8 bytes of the pcre2_real_code structure when a compiled
pattern is set up so as to initialize any padding the compiler might have
included. This avoids valgrind warnings when a compiled pattern is copied, in
particular when it is serialized.
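
A minimal sketch of the padding technique follows, using a made-up structure.
The names demo_real_code and demo_code_create() and the field layout are
assumptions for illustration and are much simpler than the real structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure whose last member is usually followed by padding. */
typedef struct demo_real_code {
  uint64_t blocksize;
  uint16_t top_bracket;
  uint8_t  name_entry_size;
} demo_real_code;

/* Create a block in which every byte has a defined value, including any
   padding at the end. The caller releases it with free(). */
static demo_real_code *demo_code_create(void)
{
  demo_real_code *re = malloc(sizeof(demo_real_code));
  if (re == NULL) return NULL;
  assert(sizeof(demo_real_code) >= 8);
  /* Clear the tail first so that compiler-inserted padding is defined... */
  memset((char *)re + sizeof(demo_real_code) - 8, 0, 8);
  /* ...then set the real fields, some of which may live in that tail. */
  re->blocksize = sizeof(demo_real_code);
  re->top_bracket = 0;
  re->name_entry_size = 0;
  return re;
}
```

Copying such a block byte by byte, as serialization does, then never reads an
uninitialized byte, which is the kind of access that valgrind reports.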

24. Remove a redundant line of code left in accidentally a long time ago.

25. Remove a duplication typo in pcre2_tables.c.

26. Correct an incorrect cast in pcre2_valid_utf.c.

27. Update pcre2test, remove some unused code in pcre2_match(), and upgrade the
tests to improve coverage.

28. Some fixes/tidies as a result of looking at Coverity Scan output:

    (a) Typo: ">" should be ">=" in opcode check in pcre2_auto_possess.c.
    (b) Added some casts to avoid "suspicious implicit sign extension".
    (c) Resource leaks in pcre2test in rare error cases.
    (d) Avoid warning for never-used case OP_TABLE_LENGTH which is just a fudge
        for checking at compile time that tables are the right size.
    (e) Add missing "fall through" comment.

29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features.

30. Implement (?n: for PCRE2_NO_AUTO_CAPTURE, because Perl now has this.

31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
pcre2test, a crash could occur.

32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16MiB) so
that all the tests can run with clang's sanitizing options.

33. Implement extra compile options in the compile context and add the first
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.

34. Implement newline type PCRE2_NEWLINE_NUL.

35. A lookbehind assertion that had a zero-length branch caused undefined
behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859.

36. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.

37. Implement PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

38. Fix returned offsets from regexec() when REG_STARTEND is used with a
starting offset greater than zero.

39. Implement REG_PEND (GNU extension) for the POSIX wrapper.

40. Implement the subject_literal modifier in pcre2test, and allow jitstack on
pattern lines.

41. Implement PCRE2_LITERAL and use it to support REG_NOSPEC.
1760*22dc650dSSadaf Ebrahimi42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
1761*22dc650dSSadaf Ebrahimiof pcre2grep.
1762*22dc650dSSadaf Ebrahimi
1763*22dc650dSSadaf Ebrahimi43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
1764*22dc650dSSadaf EbrahimiPCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:
1765*22dc650dSSadaf Ebrahimi
1766*22dc650dSSadaf Ebrahimi    (a) The -F option did not work for fixed strings containing \E.
1767*22dc650dSSadaf Ebrahimi    (b) The -w option did not work for patterns with multiple branches.
1768*22dc650dSSadaf Ebrahimi
1769*22dc650dSSadaf Ebrahimi44. Added configuration options for the SELinux compatible execmem allocator in
1770*22dc650dSSadaf EbrahimiJIT.
1771*22dc650dSSadaf Ebrahimi
1772*22dc650dSSadaf Ebrahimi45. Increased the limit for searching for a "must be present" code unit in
1773*22dc650dSSadaf Ebrahimisubjects from 1000 to 2000 for 8-bit searches, since they use memchr() and are
1774*22dc650dSSadaf Ebrahimimuch faster.
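
A rough sketch of the kind of check item 45 refers to, for the 8-bit case. The
names are hypothetical and the reading of the limit (skip the check for
subjects longer than the limit) is an assumption made for this sketch, not a
copy of the library code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_REQ_SEARCH_LIMIT 2000   /* hypothetical name for the limit */

/* Returns 0 when the required byte is known to be absent, so no match is
   possible. Returns 1 when a match attempt is still worthwhile, either
   because the byte was found or because the subject is too long to scan. */
int demo_required_byte_ok(const unsigned char *subject, size_t length,
  unsigned char required)
{
  assert(subject != NULL);
  if (length > DEMO_REQ_SEARCH_LIMIT) return 1;   /* too long: do not scan */
  return memchr(subject, required, length) != NULL;
}
```

Because memchr() is usually heavily optimized, scanning up to the larger limit
costs little compared with a failed match attempt.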
1775*22dc650dSSadaf Ebrahimi
1776*22dc650dSSadaf Ebrahimi46. Arrange for anchored patterns to record and use "first code unit" data,
1777*22dc650dSSadaf Ebrahimibecause this can give a fast "no match" without searching for a "required code
1778*22dc650dSSadaf Ebrahimiunit". Previously only non-anchored patterns did this.
1779*22dc650dSSadaf Ebrahimi
1780*22dc650dSSadaf Ebrahimi47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
1781*22dc650dSSadaf Ebrahimi
1782*22dc650dSSadaf Ebrahimi48. Add the callout_no_where modifier to pcre2test.
1783*22dc650dSSadaf Ebrahimi
1784*22dc650dSSadaf Ebrahimi49. Update extended grapheme breaking rules to the latest set that are in
1785*22dc650dSSadaf EbrahimiUnicode Standard Annex #29.
1786*22dc650dSSadaf Ebrahimi
1787*22dc650dSSadaf Ebrahimi50. Added experimental foreign pattern conversion facilities
1788*22dc650dSSadaf Ebrahimi(pcre2_pattern_convert() and friends).
1789*22dc650dSSadaf Ebrahimi
1790*22dc650dSSadaf Ebrahimi51. Change the macro FWRITE, used in pcre2grep, to FWRITE_IGNORE because FWRITE
1791*22dc650dSSadaf Ebrahimiis defined in a system header in cygwin. Also modified some of the #ifdefs in
1792*22dc650dSSadaf Ebrahimipcre2grep related to Windows and Cygwin support.
1793*22dc650dSSadaf Ebrahimi
1794*22dc650dSSadaf Ebrahimi52. Change 3(g) for 10.23 was a bit too zealous. If a hyphen that follows a
1795*22dc650dSSadaf Ebrahimicharacter class is the last character in the class, Perl does not give a
1796*22dc650dSSadaf Ebrahimiwarning. PCRE2 now also treats this as a literal.
1797*22dc650dSSadaf Ebrahimi
1798*22dc650dSSadaf Ebrahimi53. Related to 52, though PCRE2 was throwing an error for [[:digit:]-X] it was
1799*22dc650dSSadaf Ebrahiminot doing so for [\d-X] (and similar escapes), as is documented.
1800*22dc650dSSadaf Ebrahimi
1801*22dc650dSSadaf Ebrahimi54. Fixed a MIPS issue in the JIT compiler reported by Joshua Kinard.
1802*22dc650dSSadaf Ebrahimi
1803*22dc650dSSadaf Ebrahimi55. Fixed a "maybe uninitialized" warning for class_uchardata in \p handling in
1804*22dc650dSSadaf Ebrahimipcre2_compile() which could never actually trigger (code should have been cut
1805*22dc650dSSadaf Ebrahimiout when Unicode support is disabled).
1806*22dc650dSSadaf Ebrahimi
1807*22dc650dSSadaf Ebrahimi
1808*22dc650dSSadaf EbrahimiVersion 10.23 14-February-2017
1809*22dc650dSSadaf Ebrahimi------------------------------
1810*22dc650dSSadaf Ebrahimi
1811*22dc650dSSadaf Ebrahimi1. Extended pcre2test with the utf8_input modifier so that it is able to
1812*22dc650dSSadaf Ebrahimigenerate all possible 16-bit and 32-bit code unit values in non-UTF modes.
1813*22dc650dSSadaf Ebrahimi
1814*22dc650dSSadaf Ebrahimi2. In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode), without
1815*22dc650dSSadaf EbrahimiPCRE2_UCP set, a negative character type such as \D in a positive class should
1816*22dc650dSSadaf Ebrahimicause all characters greater than 255 to match, whatever else is in the class.
1817*22dc650dSSadaf EbrahimiThere was a bug that caused this not to happen if a Unicode property item was
1818*22dc650dSSadaf Ebrahimiadded to such a class, for example [\D\P{Nd}] or [\W\pL].
1819*22dc650dSSadaf Ebrahimi
1820*22dc650dSSadaf Ebrahimi3. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
1821*22dc650dSSadaf Ebrahimichecking is now done in the pre-pass that identifies capturing groups. This has
1822*22dc650dSSadaf Ebrahimireduced the amount of duplication and made the code tidier. While doing this,
1823*22dc650dSSadaf Ebrahimisome minor bugs and Perl incompatibilities were fixed, including:
1824*22dc650dSSadaf Ebrahimi
1825*22dc650dSSadaf Ebrahimi  (a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead
1826*22dc650dSSadaf Ebrahimi      of giving an invalid quantifier error.
1827*22dc650dSSadaf Ebrahimi
1828*22dc650dSSadaf Ebrahimi  (b) {0} can now be used after a group in a lookbehind assertion; previously
1829*22dc650dSSadaf Ebrahimi      this caused an "assertion is not fixed length" error.
1830*22dc650dSSadaf Ebrahimi
1831*22dc650dSSadaf Ebrahimi  (c) Perl always treats (?(DEFINE) as a "define" group, even if a group with
1832*22dc650dSSadaf Ebrahimi      the name "DEFINE" exists. PCRE2 now does likewise.
1833*22dc650dSSadaf Ebrahimi
1834*22dc650dSSadaf Ebrahimi  (d) A recursion condition test such as (?(R2)...) must now refer to an
1835*22dc650dSSadaf Ebrahimi      existing subpattern.
1836*22dc650dSSadaf Ebrahimi
1837*22dc650dSSadaf Ebrahimi  (e) A conditional recursion test such as (?(R)...) misbehaved if there was a
1838*22dc650dSSadaf Ebrahimi      group whose name began with "R".
1839*22dc650dSSadaf Ebrahimi
1840*22dc650dSSadaf Ebrahimi  (f) When testing zero-terminated patterns under valgrind, the terminating
1841*22dc650dSSadaf Ebrahimi      zero is now marked "no access". This catches bugs that would otherwise
1842*22dc650dSSadaf Ebrahimi      show up only with non-zero-terminated patterns.
1843*22dc650dSSadaf Ebrahimi
1844*22dc650dSSadaf Ebrahimi  (g) A hyphen appearing immediately after a POSIX character class (for example
1845*22dc650dSSadaf Ebrahimi      /[[:ascii:]-z]/) now generates an error. Perl does accept this as a
1846*22dc650dSSadaf Ebrahimi      literal, but gives a warning, so it seems best to fail it in PCRE.
1847*22dc650dSSadaf Ebrahimi
1848*22dc650dSSadaf Ebrahimi  (h) An empty \Q\E sequence may appear after a callout that precedes an
1849*22dc650dSSadaf Ebrahimi      assertion condition (it is, of course, ignored).
1850*22dc650dSSadaf Ebrahimi
1851*22dc650dSSadaf EbrahimiOne effect of the refactoring is that some error numbers and messages have
1852*22dc650dSSadaf Ebrahimichanged, and the pattern offset given for compiling errors is not always the
1853*22dc650dSSadaf Ebrahimiright-most character that has been read. In particular, for a variable-length
1854*22dc650dSSadaf Ebrahimilookbehind assertion it now points to the start of the assertion. Another
1855*22dc650dSSadaf Ebrahimichange is that when a callout appears before a group, the "length of next
1856*22dc650dSSadaf Ebrahimipattern item" that is passed now just gives the length of the opening
1857*22dc650dSSadaf Ebrahimiparenthesis item, not the length of the whole group. A length of zero is now
1858*22dc650dSSadaf Ebrahimigiven only for a callout at the end of the pattern. Automatic callouts are no
1859*22dc650dSSadaf Ebrahimilonger inserted before and after explicit callouts in the pattern.
1860*22dc650dSSadaf Ebrahimi
1861*22dc650dSSadaf EbrahimiA number of bugs in the refactored code were subsequently fixed during testing
1862*22dc650dSSadaf Ebrahimibefore release, but after the code was made available in the repository. Many
of the bugs were discovered by fuzz testing. Several of them were related to
1864*22dc650dSSadaf Ebrahimithe change from assuming a zero-terminated pattern (which previously had
1865*22dc650dSSadaf Ebrahimirequired non-zero terminated strings to be copied). These bugs were never in
1866*22dc650dSSadaf Ebrahimifully released code, but are noted here for the record.
1867*22dc650dSSadaf Ebrahimi
1868*22dc650dSSadaf Ebrahimi  (a) An overall recursion such as (?0) inside a lookbehind assertion was not
1869*22dc650dSSadaf Ebrahimi      being diagnosed as an error.
1870*22dc650dSSadaf Ebrahimi
1871*22dc650dSSadaf Ebrahimi  (b) In utf mode, the length of a *MARK (or other verb) name was being checked
1872*22dc650dSSadaf Ebrahimi      in characters instead of code units, which could lead to bad code being
1873*22dc650dSSadaf Ebrahimi      compiled, leading to unpredictable behaviour.
1874*22dc650dSSadaf Ebrahimi
1875*22dc650dSSadaf Ebrahimi  (c) In extended /x mode, characters whose code was greater than 255 caused
1876*22dc650dSSadaf Ebrahimi      a lookup outside one of the global tables. A similar bug existed for wide
1877*22dc650dSSadaf Ebrahimi      characters in *VERB names.
1878*22dc650dSSadaf Ebrahimi
1879*22dc650dSSadaf Ebrahimi  (d) The amount of memory needed for a compiled pattern was miscalculated if a
1880*22dc650dSSadaf Ebrahimi      lookbehind contained more than one toplevel branch and the first branch
1881*22dc650dSSadaf Ebrahimi      was of length zero.
1882*22dc650dSSadaf Ebrahimi
1883*22dc650dSSadaf Ebrahimi  (e) In UTF-8 or UTF-16 modes with PCRE2_EXTENDED (/x) set and a non-zero-
1884*22dc650dSSadaf Ebrahimi      terminated pattern, if a # comment ran on to the end of the pattern, one
1885*22dc650dSSadaf Ebrahimi      or more code units past the end were being read.
1886*22dc650dSSadaf Ebrahimi
1887*22dc650dSSadaf Ebrahimi  (f) An unterminated repeat at the end of a non-zero-terminated pattern (e.g.
1888*22dc650dSSadaf Ebrahimi      "{2,2") could cause reading beyond the pattern.
1889*22dc650dSSadaf Ebrahimi
1890*22dc650dSSadaf Ebrahimi  (g) When reading a callout string, if the end delimiter was at the end of the
1891*22dc650dSSadaf Ebrahimi      pattern one further code unit was read.
1892*22dc650dSSadaf Ebrahimi
1893*22dc650dSSadaf Ebrahimi  (h) An unterminated number after \g' could cause reading beyond the pattern.
1894*22dc650dSSadaf Ebrahimi
1895*22dc650dSSadaf Ebrahimi  (i) An insufficient memory size was being computed for compiling with
1896*22dc650dSSadaf Ebrahimi      PCRE2_AUTO_CALLOUT.
1897*22dc650dSSadaf Ebrahimi
1898*22dc650dSSadaf Ebrahimi  (j) A conditional group with an assertion condition used more memory than was
1899*22dc650dSSadaf Ebrahimi      allowed for it during parsing, so too many of them could therefore
1900*22dc650dSSadaf Ebrahimi      overrun a buffer.
1901*22dc650dSSadaf Ebrahimi
1902*22dc650dSSadaf Ebrahimi  (k) If parsing a pattern exactly filled the buffer, the internal test for
1903*22dc650dSSadaf Ebrahimi      overrun did not check when the final META_END item was added.
1904*22dc650dSSadaf Ebrahimi
1905*22dc650dSSadaf Ebrahimi  (l) If a lookbehind contained a subroutine call, and the called group
1906*22dc650dSSadaf Ebrahimi      contained an option setting such as (?s), and the PCRE2_ANCHORED option
1907*22dc650dSSadaf Ebrahimi      was set, unpredictable behaviour could occur. The underlying bug was
1908*22dc650dSSadaf Ebrahimi      incorrect code and insufficient checking while searching for the end of
1909*22dc650dSSadaf Ebrahimi      the called subroutine in the parsed pattern.
1910*22dc650dSSadaf Ebrahimi
1911*22dc650dSSadaf Ebrahimi  (m) Quantifiers following (*VERB)s were not being diagnosed as errors.
1912*22dc650dSSadaf Ebrahimi
1913*22dc650dSSadaf Ebrahimi  (n) The use of \Q...\E in a (*VERB) name when PCRE2_ALT_VERBNAMES and
1914*22dc650dSSadaf Ebrahimi      PCRE2_AUTO_CALLOUT were both specified caused undetermined behaviour.
1915*22dc650dSSadaf Ebrahimi
1916*22dc650dSSadaf Ebrahimi  (o) If \Q was preceded by a quantified item, and the following \E was
1917*22dc650dSSadaf Ebrahimi      followed by '?' or '+', and there was at least one literal character
1918*22dc650dSSadaf Ebrahimi      between them, an internal error "unexpected repeat" occurred (example:
1919*22dc650dSSadaf Ebrahimi      /.+\QX\E+/).
1920*22dc650dSSadaf Ebrahimi
1921*22dc650dSSadaf Ebrahimi  (p) A buffer overflow could occur while sorting the names in the group name
1922*22dc650dSSadaf Ebrahimi      list (depending on the order in which the names were seen).
1923*22dc650dSSadaf Ebrahimi
1924*22dc650dSSadaf Ebrahimi  (q) A conditional group that started with a callout was not doing the right
1925*22dc650dSSadaf Ebrahimi      check for a following assertion, leading to compiling bad code. Example:
1926*22dc650dSSadaf Ebrahimi      /(?(C'XX))?!XX/
1927*22dc650dSSadaf Ebrahimi
1928*22dc650dSSadaf Ebrahimi  (r) If a character whose code point was greater than 0xffff appeared within
1929*22dc650dSSadaf Ebrahimi      a lookbehind that was within another lookbehind, the calculation of the
1930*22dc650dSSadaf Ebrahimi      lookbehind length went wrong and could provoke an internal error.
1931*22dc650dSSadaf Ebrahimi
1932*22dc650dSSadaf Ebrahimi  (t) The sequence \E- or \Q\E- after a POSIX class in a character class caused
1933*22dc650dSSadaf Ebrahimi      an internal error. Now the hyphen is treated as a literal.
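
Item (b) in the list above turns on the difference between characters and code
units. A minimal sketch for the 8-bit library, assuming valid UTF-8 input; the
demo_ functions are hypothetical helpers, not part of PCRE2.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 8-bit code units (bytes) before the terminating zero. */
size_t demo_code_units(const char *s)
{
  size_t n = 0;
  assert(s != NULL);
  while (s[n] != '\0') n++;
  return n;
}

/* Number of characters in a valid UTF-8 string: every byte that is not a
   continuation byte (binary 10xxxxxx) starts a new character. */
size_t demo_characters(const char *s)
{
  size_t n = 0;
  assert(s != NULL);
  for (; *s != '\0'; s++)
    if (((unsigned char)*s & 0xC0) != 0x80) n++;
  return n;
}
```

For a name that contains non-ASCII characters the two counts differ, so a
length limit checked in characters can let through a name that needs more code
units than the limit allows.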
1934*22dc650dSSadaf Ebrahimi
1935*22dc650dSSadaf Ebrahimi4. Back references are now permitted in lookbehind assertions when there are
1936*22dc650dSSadaf Ebrahimino duplicated group numbers (that is, (?| has not been used), and, if the
1937*22dc650dSSadaf Ebrahimireference is by name, there is only one group of that name. The referenced
group must, of course, be of fixed length.
1939*22dc650dSSadaf Ebrahimi
1940*22dc650dSSadaf Ebrahimi5. pcre2test has been upgraded so that, when run under valgrind with valgrind
1941*22dc650dSSadaf Ebrahimisupport enabled, reading past the end of the pattern is detected, both when
1942*22dc650dSSadaf Ebrahimicompiling and during callout processing.
1943*22dc650dSSadaf Ebrahimi
1944*22dc650dSSadaf Ebrahimi6. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
1945*22dc650dSSadaf Ebrahimireference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
1946*22dc650dSSadaf Ebrahiminot recognize this syntax.
1947*22dc650dSSadaf Ebrahimi
1948*22dc650dSSadaf Ebrahimi7. Automatic callouts are no longer generated before and after callouts in the
1949*22dc650dSSadaf Ebrahimipattern.
1950*22dc650dSSadaf Ebrahimi
1951*22dc650dSSadaf Ebrahimi8. When pcre2test was outputting information from a callout, the caret indicator
1952*22dc650dSSadaf Ebrahimifor the current position in the subject line was incorrect if it was after an
1953*22dc650dSSadaf Ebrahimiescape sequence for a character whose code point was greater than \x{ff}.
1954*22dc650dSSadaf Ebrahimi
1955*22dc650dSSadaf Ebrahimi9. Change 19 for 10.22 had a typo (PCRE_STATIC_RUNTIME should be
1956*22dc650dSSadaf EbrahimiPCRE2_STATIC_RUNTIME). Fix from David Gaussmann.
1957*22dc650dSSadaf Ebrahimi
1958*22dc650dSSadaf Ebrahimi10. Added --max-buffer-size to pcre2grep, to allow for automatic buffer
1959*22dc650dSSadaf Ebrahimiexpansion when long lines are encountered. Original patch by Dmitry
1960*22dc650dSSadaf EbrahimiCherniachenko.
1961*22dc650dSSadaf Ebrahimi
1962*22dc650dSSadaf Ebrahimi11. If pcre2grep was compiled with JIT support, but the library was compiled
without it (something that neither ./configure nor CMake allows, but it can be
1964*22dc650dSSadaf Ebrahimidone by editing config.h), pcre2grep was giving a JIT error. Now it detects
1965*22dc650dSSadaf Ebrahimithis situation and does not try to use JIT.
1966*22dc650dSSadaf Ebrahimi
1967*22dc650dSSadaf Ebrahimi12. Added some "const" qualifiers to variables in pcre2grep.
1968*22dc650dSSadaf Ebrahimi
1969*22dc650dSSadaf Ebrahimi13. Added Dmitry Cherniachenko's patch for colouring output in Windows
1970*22dc650dSSadaf Ebrahimi(untested by me). Also, look for GREP_COLOUR or GREP_COLOR if the environment
1971*22dc650dSSadaf Ebrahimivariables PCRE2GREP_COLOUR and PCRE2GREP_COLOR are not found.
1972*22dc650dSSadaf Ebrahimi
1973*22dc650dSSadaf Ebrahimi14. Add the -t (grand total) option to pcre2grep.
1974*22dc650dSSadaf Ebrahimi
1975*22dc650dSSadaf Ebrahimi15. A number of bugs have been mended relating to match start-up optimizations
1976*22dc650dSSadaf Ebrahimiwhen the first thing in a pattern is a positive lookahead. These all applied
1977*22dc650dSSadaf Ebrahimionly when PCRE2_NO_START_OPTIMIZE was *not* set:
1978*22dc650dSSadaf Ebrahimi
1979*22dc650dSSadaf Ebrahimi    (a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed
1980*22dc650dSSadaf Ebrahimi        both an initial 'X' and a following 'X'.
1981*22dc650dSSadaf Ebrahimi    (b) Some patterns starting with an assertion that started with .* were
1982*22dc650dSSadaf Ebrahimi        incorrectly optimized as having to match at the start of the subject or
1983*22dc650dSSadaf Ebrahimi        after a newline. There are cases where this is not true, for example,
1984*22dc650dSSadaf Ebrahimi        (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that
1985*22dc650dSSadaf Ebrahimi        start with spaces. Starting .* in an assertion is no longer taken as an
1986*22dc650dSSadaf Ebrahimi        indication of matching at the start (or after a newline).
1987*22dc650dSSadaf Ebrahimi
1988*22dc650dSSadaf Ebrahimi16. The "offset" modifier in pcre2test was not being ignored (as documented)
1989*22dc650dSSadaf Ebrahimiwhen the POSIX API was in use.
1990*22dc650dSSadaf Ebrahimi
17. Added --enable-fuzz-support to "configure", causing a non-installed
library containing a test function that can be called by fuzzers to be
compiled. A non-installed binary to run the test function locally, called
pcre2fuzzcheck, is also compiled.
1995*22dc650dSSadaf Ebrahimi
1996*22dc650dSSadaf Ebrahimi18. A pattern with PCRE2_DOTALL (/s) set but not PCRE2_NO_DOTSTAR_ANCHOR, and
1997*22dc650dSSadaf Ebrahimiwhich started with .* inside a positive lookahead was incorrectly being
1998*22dc650dSSadaf Ebrahimicompiled as implicitly anchored.
1999*22dc650dSSadaf Ebrahimi
2000*22dc650dSSadaf Ebrahimi19. Removed all instances of "register" declarations, as they are considered
2001*22dc650dSSadaf Ebrahimiobsolete these days and in any case had become very haphazard.
2002*22dc650dSSadaf Ebrahimi
2003*22dc650dSSadaf Ebrahimi20. Add strerror() to pcre2test for failed file opening.
2004*22dc650dSSadaf Ebrahimi
2005*22dc650dSSadaf Ebrahimi21. Make pcre2test -C list valgrind support when it is enabled.
2006*22dc650dSSadaf Ebrahimi
2007*22dc650dSSadaf Ebrahimi22. Add the use_length modifier to pcre2test.
2008*22dc650dSSadaf Ebrahimi
2009*22dc650dSSadaf Ebrahimi23. Fix an off-by-one bug in pcre2test for the list of names for 'get' and
2010*22dc650dSSadaf Ebrahimi'copy' modifiers.
2011*22dc650dSSadaf Ebrahimi
2012*22dc650dSSadaf Ebrahimi24. Add PCRE2_CALL_CONVENTION into the prototype declarations in pcre2.h as it
2013*22dc650dSSadaf Ebrahimiis apparently needed there as well as in the function definitions. (Why did
2014*22dc650dSSadaf Ebrahiminobody ask for this in PCRE1?)
2015*22dc650dSSadaf Ebrahimi
25. Change the _PCRE2_H and _PCRE2_UCP_H guard macros in the header files to
PCRE2_H_IDEMPOTENT_GUARD and PCRE2_UCP_H_IDEMPOTENT_GUARD to be more
standards-compliant and unique.
2019*22dc650dSSadaf Ebrahimi
2020*22dc650dSSadaf Ebrahimi26. pcre2-config --libs-posix was listing -lpcre2posix instead of
2021*22dc650dSSadaf Ebrahimi-lpcre2-posix. Also, the CMake build process was building the library with the
2022*22dc650dSSadaf Ebrahimiwrong name.
2023*22dc650dSSadaf Ebrahimi
2024*22dc650dSSadaf Ebrahimi27. In pcre2test, give some offset information for errors in hex patterns.
2025*22dc650dSSadaf EbrahimiThis uses the C99 formatting sequence %td, except for MSVC which doesn't
2026*22dc650dSSadaf Ebrahimisupport it - %lu is used instead.
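
The format selection in item 27 might look something like this. The DEMO_
macros and demo_format_offset() are hypothetical names for this sketch; only
%td, %lu and the _MSC_VER test correspond to what the item describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Choose a format for pointer differences: %td is the C99 sequence for
   ptrdiff_t; for MSVC the value is cast and printed with %lu instead. */
#ifdef _MSC_VER
#define DEMO_PTR_FMT "%lu"
#define DEMO_PTR_ARG(x) ((unsigned long)(x))
#else
#define DEMO_PTR_FMT "%td"
#define DEMO_PTR_ARG(x) ((ptrdiff_t)(x))
#endif

/* Write "error at offset N" into buf, where N is the distance from start
   to where. Returns the value returned by snprintf(). */
int demo_format_offset(char *buf, size_t size, const char *start,
  const char *where)
{
  assert(buf != NULL && where >= start);
  return snprintf(buf, size, "error at offset " DEMO_PTR_FMT,
    DEMO_PTR_ARG(where - start));
}
```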
2027*22dc650dSSadaf Ebrahimi
2028*22dc650dSSadaf Ebrahimi28. Implemented pcre2_code_copy_with_tables(), and added pushtablescopy to
2029*22dc650dSSadaf Ebrahimipcre2test for testing it.
2030*22dc650dSSadaf Ebrahimi
2031*22dc650dSSadaf Ebrahimi29. Fix small memory leak in pcre2test.
2032*22dc650dSSadaf Ebrahimi
2033*22dc650dSSadaf Ebrahimi30. Fix out-of-bounds read for partial matching of /./ against an empty string
2034*22dc650dSSadaf Ebrahimiwhen the newline type is CRLF.
2035*22dc650dSSadaf Ebrahimi
2036*22dc650dSSadaf Ebrahimi31. Fix a bug in pcre2test that caused a crash when a locale was set either in
2037*22dc650dSSadaf Ebrahimithe current pattern or a previous one and a wide character was matched.
2038*22dc650dSSadaf Ebrahimi
2039*22dc650dSSadaf Ebrahimi32. The appearance of \p, \P, or \X in a substitution string when
2040*22dc650dSSadaf EbrahimiPCRE2_SUBSTITUTE_EXTENDED was set caused a segmentation fault (NULL
2041*22dc650dSSadaf Ebrahimidereference).
2042*22dc650dSSadaf Ebrahimi
2043*22dc650dSSadaf Ebrahimi33. If the starting offset was specified as greater than the subject length in
2044*22dc650dSSadaf Ebrahimia call to pcre2_substitute() an out-of-bounds memory reference could occur.
2045*22dc650dSSadaf Ebrahimi
2046*22dc650dSSadaf Ebrahimi34. When PCRE2 was compiled to use the heap instead of the stack for recursive
2047*22dc650dSSadaf Ebrahimicalls to match(), a repeated minimizing caseless back reference, or a
2048*22dc650dSSadaf Ebrahimimaximizing one where the two cases had different numbers of code units,
2049*22dc650dSSadaf Ebrahimifollowed by a caseful back reference, could lose the caselessness of the first
2050*22dc650dSSadaf Ebrahimirepeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX
2051*22dc650dSSadaf Ebrahimibut didn't).
2052*22dc650dSSadaf Ebrahimi
2053*22dc650dSSadaf Ebrahimi35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum
2054*22dc650dSSadaf Ebrahimimatching length and just records zero. Typically this happens when there are
2055*22dc650dSSadaf Ebrahimitoo many nested or recursive back references. If the limit was reached in
2056*22dc650dSSadaf Ebrahimicertain recursive cases it failed to be triggered and an internal error could
2057*22dc650dSSadaf Ebrahimibe the result.
2058*22dc650dSSadaf Ebrahimi
2059*22dc650dSSadaf Ebrahimi36. The pcre2_dfa_match() function now takes note of the recursion limit for
2060*22dc650dSSadaf Ebrahimithe internal recursive calls that are used for lookrounds and recursions within
2061*22dc650dSSadaf Ebrahimithe pattern.
2062*22dc650dSSadaf Ebrahimi
2063*22dc650dSSadaf Ebrahimi37. More refactoring has got rid of the internal could_be_empty_branch()
2064*22dc650dSSadaf Ebrahimifunction (around 400 lines of code, including comments) by keeping track of
2065*22dc650dSSadaf Ebrahimicould-be-emptiness as the pattern is compiled instead of scanning compiled
2066*22dc650dSSadaf Ebrahimigroups. (This would have been much harder before the refactoring of #3 above.)
2067*22dc650dSSadaf EbrahimiThis lifts a restriction on the number of branches in a group (more than about
2068*22dc650dSSadaf Ebrahimi1100 would give "pattern is too complicated").
2069*22dc650dSSadaf Ebrahimi
2070*22dc650dSSadaf Ebrahimi38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
2071*22dc650dSSadaf Ebrahimiauto_callout".
2072*22dc650dSSadaf Ebrahimi
2073*22dc650dSSadaf Ebrahimi39. In a library with Unicode support, incorrect data was compiled for a
2074*22dc650dSSadaf Ebrahimipattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide
2075*22dc650dSSadaf Ebrahimicharacters to match (for example, /[\s[:^ascii:]]/).
2076*22dc650dSSadaf Ebrahimi
2077*22dc650dSSadaf Ebrahimi40. The callout_error modifier has been added to pcre2test to make it possible
2078*22dc650dSSadaf Ebrahimito return PCRE2_ERROR_CALLOUT from a callout.
2079*22dc650dSSadaf Ebrahimi
2080*22dc650dSSadaf Ebrahimi41. A minor change to pcre2grep: colour reset is now "<esc>[0m" instead of
2081*22dc650dSSadaf Ebrahimi"<esc>[00m".
2082*22dc650dSSadaf Ebrahimi
2083*22dc650dSSadaf Ebrahimi42. The limit in the auto-possessification code that was intended to catch
2084*22dc650dSSadaf Ebrahimioverly-complicated patterns and not spend too much time auto-possessifying was
2085*22dc650dSSadaf Ebrahimibeing reset too often, resulting in very long compile times for some patterns.
2086*22dc650dSSadaf EbrahimiNow such patterns are no longer completely auto-possessified.
2087*22dc650dSSadaf Ebrahimi
2088*22dc650dSSadaf Ebrahimi43. Applied Jason Hood's revised patch for RunTest.bat.
2089*22dc650dSSadaf Ebrahimi
2090*22dc650dSSadaf Ebrahimi44. Added a new Windows script RunGrepTest.bat, courtesy of Jason Hood.
2091*22dc650dSSadaf Ebrahimi
2092*22dc650dSSadaf Ebrahimi45. Minor cosmetic fix to pcre2test: move a variable that is not used under
2093*22dc650dSSadaf EbrahimiWindows into the "not Windows" code.
2094*22dc650dSSadaf Ebrahimi
2095*22dc650dSSadaf Ebrahimi46. Applied Jason Hood's patches to upgrade pcre2grep under Windows and tidy
2096*22dc650dSSadaf Ebrahimisome of the code:
2097*22dc650dSSadaf Ebrahimi
2098*22dc650dSSadaf Ebrahimi  * normalised the Windows condition by ensuring WIN32 is defined;
2099*22dc650dSSadaf Ebrahimi  * enables the callout feature under Windows;
2100*22dc650dSSadaf Ebrahimi  * adds globbing (Microsoft's implementation expands quoted args),
2101*22dc650dSSadaf Ebrahimi    using a tweaked opendirectory;
2102*22dc650dSSadaf Ebrahimi  * implements the is_*_tty functions for Windows;
2103*22dc650dSSadaf Ebrahimi  * --color=always will write the ANSI sequences to file;
2104*22dc650dSSadaf Ebrahimi  * add sequences 4 (underline works on Win10) and 5 (blink as bright
2105*22dc650dSSadaf Ebrahimi    background, relatively standard on DOS/Win);
2106*22dc650dSSadaf Ebrahimi  * remove the (char *) casts for the now-const strings;
2107*22dc650dSSadaf Ebrahimi  * remove GREP_COLOUR (grep's command line allowed the 'u', but not
2108*22dc650dSSadaf Ebrahimi    the environment), parsing GREP_COLORS instead;
2109*22dc650dSSadaf Ebrahimi  * uses the current colour if not set, rather than black;
2110*22dc650dSSadaf Ebrahimi  * add print_match for the undefined case;
2111*22dc650dSSadaf Ebrahimi  * fixes a typo.
2112*22dc650dSSadaf Ebrahimi
2113*22dc650dSSadaf EbrahimiIn addition, colour settings containing anything other than digits and
2114*22dc650dSSadaf Ebrahimisemicolon are ignored, and the colour controls are no longer output for empty
2115*22dc650dSSadaf Ebrahimistrings.
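
The rule about colour settings can be expressed as a small check. This is a
sketch of the stated rule only; demo_colour_setting_usable() is a hypothetical
name and is not taken from the pcre2grep source.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the setting consists only of digits and semicolons, and 0
   if it contains anything else (in which case it would be ignored). */
int demo_colour_setting_usable(const char *setting)
{
  assert(setting != NULL);
  for (; *setting != '\0'; setting++)
    if (!isdigit((unsigned char)*setting) && *setting != ';') return 0;
  return 1;
}
```

So a setting such as 1;31 is accepted, while red or 1;31m is ignored.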
2116*22dc650dSSadaf Ebrahimi
2117*22dc650dSSadaf Ebrahimi47. Detecting patterns that are too large inside the length-measuring loop
2118*22dc650dSSadaf Ebrahimisaves processing ridiculously long patterns to their end.
2119*22dc650dSSadaf Ebrahimi
2120*22dc650dSSadaf Ebrahimi48. Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it
2121*22dc650dSSadaf Ebrahimijust wastes time. In the UTF case it can also produce redundant entries in
2122*22dc650dSSadaf EbrahimiXCLASS lists caused by characters with multiple other cases and pairs of
2123*22dc650dSSadaf Ebrahimicharacters in the same "not-x" sublists.
2124*22dc650dSSadaf Ebrahimi
2125*22dc650dSSadaf Ebrahimi49. A pattern such as /(?=(a\K))/ can report the end of the match being before
2126*22dc650dSSadaf Ebrahimiits start; pcre2test was not handling this correctly when using the POSIX
2127*22dc650dSSadaf Ebrahimiinterface (it was OK with the native interface).
2128*22dc650dSSadaf Ebrahimi
2129*22dc650dSSadaf Ebrahimi50. In pcre2grep, ignore all JIT compile errors. This means that pcre2grep will
2130*22dc650dSSadaf Ebrahimicontinue to work, falling back to interpretation if anything goes wrong with
2131*22dc650dSSadaf EbrahimiJIT.
2132*22dc650dSSadaf Ebrahimi
2133*22dc650dSSadaf Ebrahimi51. Applied patches from Christian Persch to configure.ac to make use of the
2134*22dc650dSSadaf EbrahimiAC_USE_SYSTEM_EXTENSIONS macro and to test for functions used by the JIT
2135*22dc650dSSadaf Ebrahimimodules.
2136*22dc650dSSadaf Ebrahimi
2137*22dc650dSSadaf Ebrahimi52. Minor fixes to pcre2grep from Jason Hood:
2138*22dc650dSSadaf Ebrahimi    * fixed some spacing;
2139*22dc650dSSadaf Ebrahimi    * Windows doesn't usually use single quotes, so I've added a define
2140*22dc650dSSadaf Ebrahimi      to use appropriate quotes [in an example];
2141*22dc650dSSadaf Ebrahimi    * LC_ALL was displayed as "LCC_ALL";
2142*22dc650dSSadaf Ebrahimi    * numbers 11, 12 & 13 should end in "th";
2143*22dc650dSSadaf Ebrahimi    * use double quotes in usage message.
2144*22dc650dSSadaf Ebrahimi
2145*22dc650dSSadaf Ebrahimi53. When autopossessifying, skip empty branches without recursion, to reduce
2146*22dc650dSSadaf Ebrahimistack usage for the benefit of clang with -fsanitize-address, which uses huge
2147*22dc650dSSadaf Ebrahimistack frames. Example pattern: /X?(R||){3335}/. Fixes oss-fuzz issue 553.
2148*22dc650dSSadaf Ebrahimi
2149*22dc650dSSadaf Ebrahimi54. A pattern with very many explicit back references to a group that is a long
2150*22dc650dSSadaf Ebrahimiway from the start of the pattern could take a long time to compile because
2151*22dc650dSSadaf Ebrahimisearching for the referenced group in order to find the minimum length was
2152*22dc650dSSadaf Ebrahimibeing done repeatedly. Now up to 128 group minimum lengths are cached and the
2153*22dc650dSSadaf Ebrahimiattempt to find a minimum length is abandoned if there is a back reference to a
2154*22dc650dSSadaf Ebrahimigroup whose number is greater than 128. (In that case, the pattern is so
2155*22dc650dSSadaf Ebrahimicomplicated that this optimization probably isn't worth it.) This fixes
2156*22dc650dSSadaf Ebrahimioss-fuzz issue 557.
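
The caching scheme in item 54 can be sketched like this. Everything here is
hypothetical: the names, the return codes, and demo_scan_group(), which merely
stands in for the expensive search through the compiled pattern.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHE_GROUPS 128
#define DEMO_UNSET   (-1)
#define DEMO_GIVE_UP (-2)

typedef struct demo_minlen_cache {
  int minlen[DEMO_CACHE_GROUPS + 1];  /* indexed by group number, from 1 */
  int scans;                          /* number of expensive scans done */
} demo_minlen_cache;

void demo_cache_init(demo_minlen_cache *c)
{
  int i;
  assert(c != NULL);
  for (i = 0; i <= DEMO_CACHE_GROUPS; i++) c->minlen[i] = DEMO_UNSET;
  c->scans = 0;
}

/* Stand-in for finding the group and computing its minimum length. */
int demo_scan_group(demo_minlen_cache *c, int group)
{
  c->scans++;
  return group % 7;   /* arbitrary value for the sketch */
}

/* Minimum length for a back reference to the given group. The scan is done
   at most once per group; groups above the cache size abandon the attempt. */
int demo_backref_minlen(demo_minlen_cache *c, int group)
{
  assert(c != NULL && group >= 1);
  if (group > DEMO_CACHE_GROUPS) return DEMO_GIVE_UP;
  if (c->minlen[group] == DEMO_UNSET)
    c->minlen[group] = demo_scan_group(c, group);
  return c->minlen[group];
}
```

Repeated references to the same group then cost one scan in total rather than
one scan per reference.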
2157*22dc650dSSadaf Ebrahimi
55. Issue 38 for 10.22 below was not correctly fixed. If pcre2grep in multiline
mode with --only-matching matched several lines, it restarted scanning at the
next line instead of moving on to the end of the matched string, which can be
several lines after the start.

56. Applied Jason Hood's new patch for RunGrepTest.bat that updates it in line
with updates to the non-Windows version.



Version 10.22 29-July-2016
--------------------------

1. Applied Jason Hood's patches to RunTest.bat and testdata/wintestoutput3
to fix problems with running the tests under Windows.

2. Implemented a facility for quoting literal characters within hexadecimal
patterns in pcre2test, to make it easier to create patterns with just a few
non-printing characters.

3. Binary zeros are not supported in pcre2test input files. It now detects them
and gives an error.

4. Updated the valgrind parameters in RunTest: (a) changed smc-check=all to
smc-check=all-non-file; (b) changed obj:* in the suppression file to obj:??? so
that it matches only unknown objects.

5. Updated the maintenance script maint/ManyConfigTests to make it easier to
select individual groups of tests.

6. When the POSIX wrapper function regcomp() is called, the REG_NOSUB option
used to set PCRE2_NO_AUTO_CAPTURE when calling pcre2_compile(). However, this
disables the use of back references (and subroutine calls), which are supported
by other implementations of regcomp() with REG_NOSUB. Therefore, REG_NOSUB no
longer causes PCRE2_NO_AUTO_CAPTURE to be set, though it still ignores nmatch
and pmatch when regexec() is called.

7. Because of 6 above, pcre2test has been modified with a new modifier called
posix_nosub, to call regcomp() with REG_NOSUB. Previously the no_auto_capture
modifier had this effect. That option is now ignored when the POSIX API is in
use.

8. Minor tidies to the pcre2demo.c sample program, including more comments
about its 8-bit-ness.

9. Detect unmatched closing parentheses and give the error in the pre-scan
instead of later. Previously the pre-scan carried on and could give a
misleading incorrect error message. For example, /(?J)(?'a'))(?'a')/ gave a
message about invalid duplicate group names.

10. It has happened that pcre2test was accidentally linked with another POSIX
regex library instead of libpcre2-posix. In this situation, a call to regcomp()
(in the other library) may succeed, returning zero, but of course putting its
own data into the regex_t block. In one example the re_pcre2_code field was
left as NULL, which made pcre2test think it had not got a compiled POSIX regex,
so it treated the next line as another pattern line, resulting in a confusing
error message. A check has been added to pcre2test to see if the data returned
from a successful call of regcomp() are valid for PCRE2's regcomp(). If they
are not, an error message is output and the pcre2test run is abandoned. The
message points out the possibility of a mis-linking. Hopefully this will avoid
some head-scratching the next time this happens.

11. A pattern such as /(?<=((?C)0))/, which has a callout inside a lookbehind
assertion, caused pcre2test to output a very large number of spaces when the
callout was taken, making the program appear to loop.

12. A pattern that included (*ACCEPT) in the middle of a sufficiently deeply
nested set of parentheses of sufficient size caused an overflow of the
compiling workspace (which was diagnosed, but of course is not desirable).

13. Detect missing closing parentheses during the pre-pass for group
identification.

14. Changed some integer variable types and put in a number of casts, following
a report of compiler warnings from Visual Studio 2013 and a few tests with
gcc's -Wconversion (which still throws up a lot).

15. Implemented pcre2_code_copy(), and added pushcopy and #popcopy to pcre2test
for testing it.

16. Change 66 for 10.21 introduced the use of snprintf() in PCRE2's version of
regerror(). When the error buffer is too small, my version of snprintf() puts a
binary zero in the final byte. Bug #1801 seems to show that other versions do
not do this, leading to bad output from pcre2test when it was checking for
buffer overflow. It no longer assumes a binary zero at the end of a too-small
regerror() buffer.

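The portability point in 16 above is that a caller should not rely on a
formatting function having terminated a buffer that was too small. A minimal
sketch of a defensive wrapper follows. The name format_error() and the message
layout are hypothetical, not taken from PCRE2; the sketch assumes only the
standard snprintf() interface and copes with implementations that return a
negative value or omit the terminator on truncation.

```c
#include <stdio.h>
#include <stddef.h>

/* Format an error message into buf. Returns 1 if the whole message fitted,
   or 0 if it was truncated or formatting failed. The buffer is always
   zero-terminated on return when size is non-zero. */
static int format_error(char *buf, size_t size, const char *text, int offset)
{
  int n;
  if (size == 0) return 0;
  n = snprintf(buf, size, "%s at offset %d", text, offset);
  buf[size - 1] = 0;   /* never rely on snprintf() having done this */
  return n >= 0 && (size_t)n < size;
}
```

A caller that needs to detect truncation tests the return value instead of
inspecting the final byte of the buffer.
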
17. Fixed typo ("&&" for "&") in pcre2_study(). Fortunately, this could not
actually affect anything, by sheer luck.

18. Two minor fixes for MSVC compilation: (a) removal of apparently incorrect
"const" qualifiers in pcre2test and (b) defining snprintf as _snprintf for
older MSVC compilers. This has been done both in src/pcre2_internal.h for most
of the library, and also in src/pcre2posix.c, which no longer includes
pcre2_internal.h (see 24 below).

19. Applied Chris Wilson's patch (Bugzilla #1681) to CMakeLists.txt for MSVC
static compilation. Subsequently applied Chris Wilson's second patch, putting
the first patch under a new option instead of being unconditional when
PCRE_STATIC is set.

20. Updated pcre2grep to set stdout as binary when run under Windows, so as not
to convert \r\n at the ends of reflected lines into \r\r\n. This required
ensuring that other output that is written to stdout (e.g. file names) uses the
appropriate line terminator: \r\n for Windows, \n otherwise.

21. When a line is too long for pcre2grep's internal buffer, show the maximum
length in the error message.

22. Added support for string callouts to pcre2grep (Zoltan's patch with PH
additions).

23. RunTest.bat was missing a "set type" line for test 22.

24. The pcre2posix.c file was including pcre2_internal.h, and using some
"private" knowledge of the data structures. This is unnecessary; the code has
been re-factored and no longer includes pcre2_internal.h.

25. Fixed a race condition in JIT that was reported by Mozilla.

26. Minor code refactor to avoid "array subscript is below array bounds"
compiler warning.

27. Minor code refactor to avoid "left shift of negative number" warning.

28. Add a bit more sanity checking to pcre2_serialize_decode() and document
that it expects trusted data.

29. Fix typo in pcre2_jit_test.c.

30. Due to an oversight, pcre2grep was not making use of JIT when available.
This is now fixed.

31. The RunGrepTest script is updated to use the valgrind suppressions file
when testing with JIT under valgrind (compare 10.21/51 below). The suppressions
file is updated so that it is now the same as for PCRE1: it suppresses the
Memcheck warnings Addr16 and Cond in unknown objects (that is, JIT-compiled
code). Also changed smc-check=all to smc-check=all-non-file as was done for
RunTest (see 4 above).

32. Implemented the PCRE2_NO_JIT option for pcre2_match().

33. Fix typo that gave a compiler error when JIT not supported.

34. Fix comment describing the returns from find_fixedlength().

35. Fix potential negative index in pcre2test.

36. Calls to pcre2_get_error_message() with error numbers that are never
returned by PCRE2 functions were returning empty strings. Now the error code
PCRE2_ERROR_BADDATA is returned. A facility has been added to pcre2test to
show the texts for given error numbers (i.e. to call pcre2_get_error_message()
and display what it returns) and a few representative error codes are now
checked in RunTest.

37. Added "&& !defined(__INTEL_COMPILER)" to the test for __GNUC__ in
pcre2_match.c, in anticipation that this is needed for the same reason it was
recently added to pcrecpp.cc in PCRE1.

38. Using -o with -M in pcre2grep could cause unnecessary repeated output when
the match extended over a line boundary, as it tried to find more matches "on
the same line" - but it was already over the end.

39. Allow \C in lookbehinds and DFA matching in UTF-32 mode (by converting it
to the same code as '.' when PCRE2_DOTALL is set).

40. Fix two clang compiler warnings in pcre2test when only one code unit width
is supported.

41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
if it fails when running the interpreter with a 16MiB stack (and if changing
the stack size via pcre2test is possible). This avoids having to manually set a
large stack size when testing with clang.

42. Fix register overwrite in JIT when SSE2 acceleration is enabled.

43. Detect integer overflow in pcre2test pattern and data repetition counts.

44. In pcre2test, ignore "allcaptures" after DFA matching.

45. Fix unaligned accesses on x86. Patch by Marc Mutz.

46. Fix some more clang compiler warnings.


Version 10.21 12-January-2016
-----------------------------

1. Improve matching speed of patterns starting with + or * in JIT.

2. Use memchr() to find the first character in an unanchored match in 8-bit
mode in the interpreter. This gives a significant speed improvement.

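The idea behind 2 above is to let the C library locate the next possible
starting position instead of stepping through the subject one code unit at a
time. The following is a minimal sketch of that technique, not the
interpreter's code; next_candidate() is a hypothetical helper and the subject
is assumed to be an 8-bit string of known length.

```c
#include <string.h>
#include <stddef.h>

/* Return the offset of the first position at or after start where a match
   beginning with first_char could start, or -1 if there is none. */
static long next_candidate(const char *subject, size_t length, size_t start,
  char first_char)
{
  const char *p;
  if (start >= length) return -1;
  p = memchr(subject + start, (unsigned char)first_char, length - start);
  return (p == NULL)? -1 : (long)(p - subject);
}
```

An unanchored matcher would try a full match only at the offsets this helper
returns, advancing start past each failed candidate.
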
3. Removed a redundant copy of the opcode_possessify table in the
pcre2_auto_possessify.c source.

4. Fix typos in dftables.c for z/OS.

5. Change 36 for 10.20 broke the handling of [[:>:]] and [[:<:]] in that
processing them could involve a buffer overflow if the following character was
an opening parenthesis.

6. Change 36 for 10.20 also introduced a bug in processing this pattern:
/((?x)(*:0))#(?'/. Specifically: if a setting of (?x) was followed by a (*MARK)
setting (which (*:0) is), then (?x) did not get unset at the end of its group
during the scan for named groups, and hence the external # was incorrectly
treated as a comment and the invalid (?' at the end of the pattern was not
diagnosed. This caused a buffer overflow during the real compile. This bug was
discovered by Karl Skomski with the LLVM fuzzer.

7. Moved the pcre2_find_bracket() function from src/pcre2_compile.c into its
own source module to avoid a circular dependency between src/pcre2_compile.c
and src/pcre2_study.c.

8. A callout with a string argument containing an opening square bracket, for
example /(?C$[$)(?<]/, was incorrectly processed and could provoke a buffer
overflow. This bug was discovered by Karl Skomski with the LLVM fuzzer.

9. The handling of callouts during the pre-pass for named group identification
has been tightened up.

10. The quantifier {1} can be ignored, whether greedy, non-greedy, or
possessive. This is a very minor optimization.

11. A possessively repeated conditional group that could match an empty string,
for example, /(?(R))*+/, was incorrectly compiled.

12. The Unicode tables have been updated to Unicode 8.0.0 (thanks to Christian
Persch).

13. An empty comment (?#) in a pattern was incorrectly processed and could
provoke a buffer overflow. This bug was discovered by Karl Skomski with the
LLVM fuzzer.

14. Fix infinite recursion in the JIT compiler when certain patterns such as
/(?:|a|){100}x/ are analysed.

15. Some patterns with character classes involving [: and \\ were incorrectly
compiled and could cause reading from uninitialized memory or an incorrect
error diagnosis. Examples are: /[[:\\](?<[::]/ and /[[:\\](?'abc')[a:]/. The
first of these bugs was discovered by Karl Skomski with the LLVM fuzzer.

16. Pathological patterns containing many nested occurrences of [: caused
pcre2_compile() to run for a very long time. This bug was found by the LLVM
fuzzer.

17. A missing closing parenthesis for a callout with a string argument was not
being diagnosed, possibly leading to a buffer overflow. This bug was found by
the LLVM fuzzer.

18. A conditional group with only one branch has an implicit empty alternative
branch and must therefore be treated as potentially matching an empty string.

19. If (?R was followed by - or +, incorrect behaviour happened instead of a
diagnostic. This bug was discovered by Karl Skomski with the LLVM fuzzer.

20. Another bug that was introduced by change 36 for 10.20: conditional groups
whose condition was an assertion preceded by an explicit callout with a string
argument might be incorrectly processed, especially if the string contained \Q.
This bug was discovered by Karl Skomski with the LLVM fuzzer.

21. Compiling PCRE2 with the sanitize options of clang showed up a number of
very pedantic coding infelicities and a buffer overflow while checking a UTF-8
string if the final multi-byte UTF-8 character was truncated.

22. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
class, where both values are literal letters in the same case, omit the
non-letter EBCDIC code points within the range.

23. Finding the minimum matching length of complex patterns with back
references and/or recursions can take a long time. There is now a cut-off that
gives up trying to find a minimum length when things get too complex.

24. An optimization has been added that speeds up finding the minimum matching
length for patterns containing repeated capturing groups or recursions.

25. If a pattern contained a back reference to a group whose number was
duplicated as a result of appearing in a (?|...) group, the computation of the
minimum matching length gave a wrong result, which could cause incorrect "no
match" errors. For such patterns, a minimum matching length cannot at present
be computed.

26. Added a check for integer overflow in conditions (?(<digits>) and
(?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
fuzzer.

27. Fixed an issue when \p{Any} inside an xclass did not read the current
character.

28. If pcre2grep was given the -q option with -c or -l, or when handling a
binary file, it incorrectly wrote output to stdout.

29. The JIT compiler did not restore the control verb head in case of (*THEN)
control verbs. This issue was found by Karl Skomski with a custom LLVM fuzzer.

30. The way recursive references such as (?3) are compiled has been re-written
because the old way was the cause of many issues. Now, conversion of the group
number into a pattern offset does not happen until the pattern has been
completely compiled. This does mean that detection of all infinitely looping
recursions is postponed till match time. In the past, some easy ones were
detected at compile time. This re-writing was done in response to yet another
bug found by the LLVM fuzzer.

31. A test for a back reference to a non-existent group was missing for items
such as \987. This caused incorrect code to be compiled. This issue was found
by Karl Skomski with a custom LLVM fuzzer.

32. Error messages for syntax errors following \g and \k were giving inaccurate
offsets in the pattern.

33. Improve the performance of starting single character repetitions in JIT.

34. (*LIMIT_MATCH=) now gives an error instead of setting the value to 0.

35. Error messages for syntax errors in *LIMIT_MATCH and *LIMIT_RECURSION now
give the right offset instead of zero.

36. The JIT compiler should not check repeats after a {0,1} repeat byte code.
This issue was found by Karl Skomski with a custom LLVM fuzzer.

37. The JIT compiler should restore the control chain for empty possessive
repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.

38. A bug which was introduced by the single character repetition optimization
was fixed.

39. Match limit check added to recursion. This issue was found by Karl Skomski
with a custom LLVM fuzzer.

40. Arrange for the UTF check in pcre2_match() and pcre2_dfa_match() to look
only at the part of the subject that is relevant when the starting offset is
non-zero.

41. Improve first character match in JIT with SSE2 on x86.

42. Fix two assertion failures in JIT. These issues were found by Karl Skomski
with a custom LLVM fuzzer.

43. Correct the setting of CMAKE_C_FLAGS in CMakeLists.txt (patch from Roy Ivy
III).

44. Fix bug in RunTest.bat for new test 14, and adjust the script for the added
test (there are now 20 in total).

45. Fixed a corner case of range optimization in JIT.

46. Add the ${*MARK} facility to pcre2_substitute().

47. Modifier lists in pcre2test were splitting at spaces without the required
commas.

48. Implemented PCRE2_ALT_VERBNAMES.

49. Fixed two issues in JIT. These were found by Karl Skomski with a custom
LLVM fuzzer.

50. The pcre2test program has been extended by adding the #newline_default
command. This has made it possible to run the standard tests when PCRE2 is
compiled with either CR or CRLF as the default newline convention. As part of
this work, the new command was added to several test files and the testing
scripts were modified. The pcre2grep tests can now also be run when there is no
LF in the default newline convention.

51. The RunTest script has been modified so that, when JIT is used and valgrind
is specified, a valgrind suppressions file is set up to ignore "Invalid read of
size 16" errors because these are false positives when the hardware supports
the SSE2 instruction set.

52. It is now possible to have comment lines amid the subject strings in
pcre2test (and perltest.sh) input.

53. Implemented PCRE2_USE_OFFSET_LIMIT and pcre2_set_offset_limit().

54. Add the null_context modifier to pcre2test so that calling pcre2_compile()
and the matching functions with NULL contexts can be tested.

55. Implemented PCRE2_SUBSTITUTE_EXTENDED.

56. In a character class such as [\W\p{Any}] where both a negative-type escape
("not a word character") and a property escape were present, the property
escape was being ignored.

57. Fixed integer overflow for patterns whose minimum matching length is very,
very large.

58. Implemented --never-backslash-C.

59. Change 55 above introduced a bug by which certain patterns provoked the
erroneous error "\ at end of pattern".

60. The special sequences [[:<:]] and [[:>:]] gave rise to incorrect compiling
errors or other strange effects if compiled in UCP mode. Found with libFuzzer
and AddressSanitizer.

61. Whitespace at the end of a pcre2test pattern line caused a spurious error
message if there were only single-character modifiers. It should be ignored.

62. The use of PCRE2_NO_AUTO_CAPTURE could cause incorrect compilation results
or segmentation errors for some patterns. Found with libFuzzer and
AddressSanitizer.

63. Very long names in (*MARK) or (*THEN) etc. items could provoke a buffer
overflow.

64. Improve error message for overly-complicated patterns.

65. Implemented an optional replication feature for patterns in pcre2test, to
make it easier to test long repetitive patterns. The tests for 63 above are
converted to use the new feature.

66. In the POSIX wrapper, if regerror() was given too small a buffer, it could
misbehave.

67. In pcre2_substitute() in UTF mode, the UTF validity check on the
replacement string was happening before the length setting when the replacement
string was zero-terminated.

68. In pcre2_substitute() in UTF mode, PCRE2_NO_UTF_CHECK can be set for the
second and subsequent calls to pcre2_match().

69. There was no check for integer overflow for a replacement group number in
pcre2_substitute(). An added check for a number greater than the largest group
number in the pattern means this is not now needed.

70. The PCRE2-specific VERSION condition didn't work correctly if only one
digit was given after the decimal point, or if more than two digits were given.
It now works with one or two digits, and gives a compile time error if more are
given.

71. In pcre2_substitute() there was the possibility of reading one code unit
beyond the end of the replacement string.

72. The code for checking a subject's UTF-32 validity for a pattern with a
lookbehind involved an out-of-bounds pointer, which could potentially cause
trouble in some environments.

73. The maximum lookbehind length was incorrectly calculated for patterns such
as /(?<=(a)(?-1))x/ which have a recursion within a lookbehind.

2597*22dc650dSSadaf Ebrahimi74. Give an error if a lookbehind assertion is longer than 65535 code units.
2598*22dc650dSSadaf Ebrahimi
2599*22dc650dSSadaf Ebrahimi75. Give an error in pcre2_substitute() if a match ends before it starts (as a
2600*22dc650dSSadaf Ebrahimiresult of the use of \K).

76. Check the length of subpattern names and the names in (*MARK:xx) etc.
dynamically to avoid the possibility of integer overflow.

77. Implement pcre2_set_max_pattern_length() so that programs can restrict the
size of patterns that they are prepared to handle.

78. (*NO_AUTO_POSSESS) was not working.

79. Adding group information caching improves the speed of compiling when
checking whether a group has a fixed length and/or could match an empty string,
especially when recursion or subroutine calls are involved. However, this
cannot be used when (?| is present in the pattern because the same number may
be used for groups of different sizes. To catch runaway patterns in this
situation, counts have been introduced to the functions that scan for empty
branches or compute fixed lengths.

80. Allow for the possibility of the size of the nest_save structure not being
a factor of the size of the compiling workspace (it currently is).

81. Check for integer overflow in minimum length calculation and cap it at
65535.
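
A capped addition of this kind might look like the sketch below.
add_min_length() and MIN_LENGTH_CAP are hypothetical names; the real
calculation in PCRE2 is more involved.

```c
#include <stdint.h>

#define MIN_LENGTH_CAP 65535u

/* Hypothetical helper, not PCRE2 code: add two partial minimum lengths,
   saturating at the cap instead of wrapping around. */
uint32_t add_min_length(uint32_t a, uint32_t b)
{
  if (a > MIN_LENGTH_CAP) a = MIN_LENGTH_CAP;
  if (b > MIN_LENGTH_CAP - a) return MIN_LENGTH_CAP;   /* would exceed cap */
  return a + b;
}
```

The comparison is made against the remaining headroom, so the sum itself is
never computed when it would be out of range.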

82. Small optimizations in code for finding the minimum matching length.

83. Lock out configuring for EBCDIC with non-8-bit libraries.

84. Test for error code <= 0 in regerror().

85. Check for too many replacements (more than INT_MAX) in pcre2_substitute().

86. Avoid the possibility of computing with an out-of-bounds pointer (though
not dereferencing it) while handling lookbehind assertions.

87. Failure to get memory for the match data in regcomp() is now given as a
regcomp() error instead of waiting for regexec() to pick it up.

88. In pcre2_substitute(), ensure that CRLF is not split when it is a valid
newline sequence.
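
The entry does not say where the split could occur. Assuming it is the step
that moves on by one character after an empty match, the idea can be sketched
like this; advance_after_empty_match() is a hypothetical helper that ignores
multi-unit UTF characters.

```c
#include <stddef.h>

/* Hypothetical helper, not PCRE2 code: return the offset at which to resume
   after an empty match at pos. crlf_is_newline is nonzero when CRLF is a
   valid newline sequence for the pattern. */
size_t advance_after_empty_match(const char *subject, size_t length,
  size_t pos, int crlf_is_newline)
{
  if (pos >= length) return length;

  /* Step over CR and LF together so the pair is never split. */
  if (crlf_is_newline && subject[pos] == '\r' &&
      pos + 1 < length && subject[pos + 1] == '\n')
    return pos + 2;

  return pos + 1;
}
```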

89. Paranoid check in regcomp() for bad error code from pcre2_compile().

90. Run test 8 (internal offsets and code sizes) for link sizes 3 and 4 as well
as for link size 2.

91. Document that JIT has a limit on pattern size, and give more information
about JIT compile failures in pcre2test.

92. Implement PCRE2_INFO_HASBACKSLASHC.

93. Re-arrange valgrind support code in pcre2test to avoid spurious reports
with JIT (possibly caused by SSE2?).

94. Support offset_limit in JIT.

95. A sequence such as [[:punct:]b], that is, a POSIX character class followed
by a single ASCII character in a class item, was incorrectly compiled in UCP
mode. The POSIX class got lost, but only if the single character followed it.

96. [:punct:] in UCP mode was matching some characters in the range 128-255
that should not have been matched.

97. If [:^ascii:] or [:^xdigit:] are present in a non-negated class, all
characters with code points greater than 255 are in the class. When a Unicode
property was also in the class (if PCRE2_UCP is set, escapes such as \w are
turned into Unicode properties), wide characters were not correctly handled,
and could fail to match.

98. In pcre2test, make the "startoffset" modifier a synonym of "offset",
because it sets the "startoffset" parameter for pcre2_match().

99. If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between
an item and its quantifier (for example, A(?#comment)?B), pcre2_compile()
misbehaved. This bug was found by the LLVM fuzzer.

100. The error for an invalid UTF pattern string always gave the code unit
offset as zero instead of where the invalidity was found.

101. Further to 97 above, negated classes such as [^[:^ascii:]\d] were also not
working correctly in UCP mode.

102. Similar to 99 above, if an isolated \E was present between an item and its
quantifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile() misbehaved. This
bug was found by the LLVM fuzzer.

103. The POSIX wrapper function regexec() crashed if the option REG_STARTEND
was set when the pmatch argument was NULL. It now returns REG_INVARG.

104. Allow for up to 32-bit numbers in the ordin() function in pcre2grep.

105. An empty \Q\E sequence between an item and its quantifier caused
pcre2_compile() to misbehave when auto callouts were enabled. This bug
was found by the LLVM fuzzer.

106. If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a (*MARK) or
other verb "name" ended with whitespace immediately before the closing
parenthesis, pcre2_compile() misbehaved. Example: /(*:abc )/, but only when
both those options were set.

107. In a number of places pcre2_compile() was not handling NULL characters
correctly, and pcre2test with the "bincode" modifier was not always correctly
displaying fields containing NULLS:

   (a) Within /x extended #-comments
   (b) Within the "name" part of (*MARK) and other *verbs
   (c) Within the text argument of a callout

108. If a pattern that was compiled with PCRE2_EXTENDED started with white
space or a #-type comment that was followed by (?-x), which turns off
PCRE2_EXTENDED, and there was no subsequent (?x) to turn it on again,
pcre2_compile() assumed that (?-x) applied to the whole pattern and
consequently mis-compiled it. This bug was found by the LLVM fuzzer. The fix
for this bug means that a setting of any of the (?imsxJU) options at the start
of a pattern is no longer transferred to the options that are returned by
PCRE2_INFO_ALLOPTIONS. In fact, this was an anachronism that should have
changed when the effects of those options were all moved to compile time.

109. An escaped closing parenthesis in the "name" part of a (*verb) when
PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug
was found by the LLVM fuzzer.

110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it
possible to test it.

111. "Harden" pcre2test against ridiculously large values in modifiers and
command line arguments.

112. Implemented PCRE2_SUBSTITUTE_UNKNOWN_UNSET and
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.

113. Fix printing of *MARK names that contain binary zeroes in pcre2test.


Version 10.20 30-June-2015
--------------------------

1. Callouts with string arguments have been added.

2. Assertion code generator in JIT has been optimized.

3. The invalid pattern (?(?C) has a missing assertion condition at the end. The
pcre2_compile() function read past the end of the input before diagnosing an
error. This bug was discovered by the LLVM fuzzer.

4. Implemented pcre2_callout_enumerate().

5. Fix JIT compilation of conditional blocks whose assertion is converted to
(*FAIL). E.g. /(?(?!))/.

6. The pattern /(?(?!)^)/ caused references to random memory. This bug was
discovered by the LLVM fuzzer.

7. The assertion (?!) is optimized to (*FAIL). This was not handled correctly
when this assertion was used as a condition, for example (?(?!)a|b). In
pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect
error about an unsupported item.

8. For some types of pattern, for example /Z*(|d*){216}/, the
auto-possessification code could take exponential time to complete. A recursion
depth limit of 1000 has been imposed to limit the resources used by this
optimization. This infelicity was discovered by the LLVM fuzzer.
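
The general technique of bounding a recursive scan can be sketched as below.
This is a generic example, not the auto-possessification code: scan_group(),
check_nesting() and DEPTH_LIMIT are hypothetical, and the scan only checks that
parentheses are balanced.

```c
#include <stddef.h>

#define DEPTH_LIMIT 1000

/* Hypothetical example, not PCRE2 code. Recurses once per nested group and
   advances *pos as characters are consumed. Returns 1 if the input is
   balanced, 0 on a syntax error, -1 if nesting exceeds DEPTH_LIMIT. */
static int scan_group(const char *s, size_t *pos, int depth)
{
  if (depth > DEPTH_LIMIT) return -1;      /* give up instead of running on */

  while (s[*pos] != '\0')
    {
    char c = s[(*pos)++];
    if (c == '(')
      {
      int rc = scan_group(s, pos, depth + 1);
      if (rc != 1) return rc;              /* pass errors straight up */
      }
    else if (c == ')')
      return depth > 0;                    /* stray ')' at top level fails */
    }

  return depth == 0;                       /* input ended inside a group? */
}

int check_nesting(const char *s)
{
  size_t pos = 0;
  return scan_group(s, &pos, 0);
}
```

A caller that receives -1 can simply skip the optimization rather than treat
the pattern as an error.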

9. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special
class such as \S in non-UCP mode, explicit wide characters (> 255) can be
ignored because \S ensures they are all in the class. The code for doing this
was interacting badly with the code for computing the amount of space needed
to compile the pattern, leading to a buffer overflow. This bug was discovered
by the LLVM fuzzer.

10. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
other kinds of group caused stack overflow at compile time. This bug was
discovered by the LLVM fuzzer.

11. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
between a subroutine call and its quantifier was incorrectly compiled, leading
to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer.

12. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
assertion after (?(. The code was failing to check the character after (?(?<
for the ! or = that would indicate a lookbehind assertion. This bug was
discovered by the LLVM fuzzer.

13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
a fixed maximum following a group that contains a subroutine reference was
incorrectly compiled and could trigger buffer overflow. This bug was discovered
by the LLVM fuzzer.

14. Negative relative recursive references such as (?-7) to non-existent
subpatterns were not being diagnosed and could lead to unpredictable behaviour.
This bug was discovered by the LLVM fuzzer.

15. The bug fixed in 14 was due to an integer variable that was unsigned when
it should have been signed. Some other "int" variables, having been checked,
have either been changed to uint32_t or commented as "must be signed".
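
The effect of the wrong signedness can be shown with a small sketch. Both
functions are hypothetical and are not the PCRE2 code; they assume that (?-1)
refers to the most recently opened group.

```c
#include <stdint.h>

/* Hypothetical helper: resolve a relative reference such as (?-7), where
   groups_so_far is the number of groups opened so far and back is 7.
   Returns the absolute group number, or -1 if there is no such group. */
int resolve_relative_group(int groups_so_far, int back)
{
  int target = groups_so_far - back + 1;
  return (target > 0) ? target : -1;
}

/* The same arithmetic in unsigned types wraps to a huge positive value, so a
   "greater than zero" test can no longer reject the reference. */
uint32_t resolve_relative_group_unsigned(uint32_t groups_so_far, uint32_t back)
{
  return groups_so_far - back + 1u;
}
```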

16. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
caused a stack overflow instead of the diagnosis of a non-fixed length
lookbehind assertion. This bug was discovered by the LLVM fuzzer.

17. The use of \K in a positive lookbehind assertion in a non-anchored pattern
(e.g. /(?<=\Ka)/) could make pcre2grep loop.

18. There was a similar problem to 17 in pcre2test for global matches, though
the code there did catch the loop.

19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
and a subsequent item in the pattern caused a non-match, backtracking over the
repeated \X did not stop, but carried on past the start of the subject, causing
reference to random memory and/or a segfault. There were also some other cases
where backtracking after \C could crash. This set of bugs was discovered by the
LLVM fuzzer.

20. The function for finding the minimum length of a matching string could take
a very long time if mutual recursion was present many times in a pattern, for
example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has
been implemented. This infelicity was discovered by the LLVM fuzzer.

21. Implemented PCRE2_NEVER_BACKSLASH_C.

22. The feature for string replication in pcre2test could read from freed
memory if the replication required a buffer to be extended, and it was not
working properly in 16-bit and 32-bit modes. This issue was discovered by a
fuzzer: see http://lcamtuf.coredump.cx/afl/.
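
One common way to avoid this class of bug is to remember an offset rather than
a pointer across a reallocation. The sketch below illustrates that idea only;
strbuf and its functions are hypothetical and do not reflect how pcre2test is
written.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { char *data; size_t len; size_t cap; } strbuf;

/* Ensure room for extra more bytes. This may move b->data. */
int strbuf_reserve(strbuf *b, size_t extra)
{
  char *p;
  size_t need;

  if (extra > (size_t)-1 - b->len) return -1;
  need = b->len + extra;
  if (need <= b->cap) return 0;

  p = realloc(b->data, need);
  if (p == NULL) return -1;
  b->data = p;
  b->cap = need;
  return 0;
}

/* Append count further copies of the last n bytes of the buffer. */
int strbuf_replicate(strbuf *b, size_t n, size_t count)
{
  size_t start, i;

  if (n > b->len) return -1;
  if (n == 0 || count == 0) return 0;
  if (count > (size_t)-1 / n) return -1;

  start = b->len - n;     /* an offset stays valid if the buffer moves */
  if (strbuf_reserve(b, n * count) != 0) return -1;

  for (i = 0; i < count; i++)
    {
    memcpy(b->data + b->len, b->data + start, n);
    b->len += n;
    }
  return 0;
}
```

A pointer to the source bytes taken before strbuf_reserve() could refer to
freed memory afterwards; recomputing b->data + start on each use does not.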

23. Added the PCRE2_ALT_CIRCUMFLEX option.

24. Adjust the treatment of \8 and \9 to be the same as the current Perl
behaviour.

25. Static linking against the PCRE2 library using the pkg-config module was
failing on missing pthread symbols.

26. If a group that contained a recursive back reference also contained a
forward reference subroutine call followed by a non-forward-reference
subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
compile correct code, leading to undefined behaviour or an internally detected
error. This bug was discovered by the LLVM fuzzer.

27. Quantification of certain items (e.g. atomic back references) could cause
incorrect code to be compiled when recursive forward references were involved.
For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. This bug was
discovered by the LLVM fuzzer.

28. A repeated conditional group whose condition was a reference by name caused
a buffer overflow if there was more than one group with the given name. This
bug was discovered by the LLVM fuzzer.

29. A recursive back reference by name within a group that had the same name as
another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/.
This bug was discovered by the LLVM fuzzer.

30. A forward reference by name to a group whose number is the same as the
current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a
buffer overflow at compile time. This bug was discovered by the LLVM fuzzer.

31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1
as an int; fixed by writing it as 1u).
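
For illustration, a bit mask helper written with an unsigned literal; bit_mask()
is a hypothetical name and not a PCRE2 function.

```c
#include <stdint.h>

/* Bit n (0..31) of a 32-bit mask. With a 32-bit int, 1 << 31 shifts into the
   sign bit, which is undefined behaviour; 1u << 31 is well defined. */
uint32_t bit_mask(unsigned int n)
{
  return (uint32_t)1u << n;
}
```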

32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives
a warning for "fileno" unless -std=gnu99 is used.

33. A lookbehind assertion within a set of mutually recursive subpatterns could
provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.

34. Give an error for an empty subpattern name such as (?'').

35. Make pcre2test give an error if a pattern that follows #forbid_utf contains
\P, \p, or \X.

36. The way named subpatterns are handled has been refactored. There is now a
pre-pass over the regex which does nothing other than identify named
subpatterns and count the total captures. This means that information about
named patterns is known before the rest of the compile. In particular, it means
that forward references can be checked as they are encountered. Previously, the
code for handling forward references was contorted and led to several errors in
computing the memory requirements for some patterns, leading to buffer
overflows.

37. There was no check for integer overflow in subroutine calls such as (?123).

38. The table entry for \l in EBCDIC environments was incorrect, leading to its
being treated as a literal 'l' instead of causing an error.

39. If a non-capturing group containing a conditional group that could match
an empty string was repeated, it was not identified as matching an empty string
itself. For example: /^(?:(?(1)x|)+)+$()/.

40. In an EBCDIC environment, pcretest was mishandling the escape sequences
\a and \e in test subject lines.

41. In an EBCDIC environment, \a in a pattern was converted to the ASCII
instead of the EBCDIC value.

42. The handling of \c in an EBCDIC environment has been revised so that it is
now compatible with the specification in Perl's perlebcdic page.

43. Single character repetition in JIT has been improved. 20-30% speedup
was achieved on certain patterns.

44. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
ASCII/Unicode. This has now been added to the list of characters that are
recognized as white space in EBCDIC.

45. When PCRE2 was compiled without Unicode support, the use of \p and \P gave
an error (correctly) when used outside a class, but did not give an error
within a class.

46. \h within a class was incorrectly compiled in EBCDIC environments.

47. JIT should return with error when the compiled pattern requires
more stack space than the maximum.

48. Fixed a memory leak in pcre2grep when a locale is set.


Version 10.10 06-March-2015
---------------------------

1. When a pattern is compiled, it remembers the highest back reference so that
when matching, if the ovector is too small, extra memory can be obtained to
use instead. A conditional subpattern whose condition is a check on a capture
having happened, such as in the pattern /^(?:(a)|b)(?(1)A|B)/, is another kind
of back reference, but it was not setting the highest backreference number.
This mattered only if pcre2_match() was called with an ovector that was too
small to hold the capture, and there was no other kind of back reference (a
situation which is probably quite rare). The effect of the bug was that the
condition was always treated as FALSE when the capture could not be consulted,
leading to incorrect behaviour by pcre2_match(). This bug has been fixed.

2. Functions for serialization and deserialization of sets of compiled patterns
have been added.

3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
excess code units at the end of the data block that may occasionally occur if
the code for calculating the size over-estimates. This change stops the
serialization code copying uninitialized data, to which valgrind objects. The
documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not
include the general overhead. This has been corrected.

4. All code units in every slot in the table of group names are now set, again
in order to avoid accessing uninitialized data when serializing.

5. The (*NO_JIT) feature is implemented.

6. If a bug that caused pcre2_compile() to use more memory than allocated was
triggered when using valgrind, the code in (3) above passed a stupidly large
value to valgrind. This caused a crash instead of an "internal error" return.

7. A reference to a duplicated named group (either a back reference or a test
for being set in a conditional) that occurred in a part of the pattern where
PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern
to be incorrectly calculated, leading to overwriting.

8. A mutually recursive set of back references such as (\2)(\1) caused a
segfault at compile time (while trying to find the minimum matching length).
The infinite loop is now broken (with the minimum length unset, that is, zero).

9. If an assertion that was used as a condition was quantified with a minimum
of zero, matching went wrong. In particular, if the whole group had unlimited
repetition and could match an empty string, a segfault was likely. The pattern
(?(?=0)?)+ is an example that caused this. Perl allows assertions to be
quantified, but not if they are being used as conditions, so the above pattern
is faulted by Perl. PCRE2 has now been changed so that it also rejects such
patterns.

10. The error message for an invalid quantifier has been changed from "nothing
to repeat" to "quantifier does not follow a repeatable item".

11. If a bad UTF string is compiled with NO_UTF_CHECK, it may succeed, but
scanning the compiled pattern in subsequent auto-possessification can get out
of step and lead to an unknown opcode. Previously this could have caused an
infinite loop. Now it generates an "internal error" error. This is a tidyup,
not a bug fix; passing bad UTF with NO_UTF_CHECK is documented as having an
undefined outcome.

12. A UTF pattern containing a "not" match of a non-ASCII character and a
subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.

13. The locale test (RunTest 3) has been upgraded. It now checks that a locale
that is found in the output of "locale -a" can actually be set by pcre2test
before it is accepted. Previously, in an environment where a locale was listed
but would not set (an example does exist), the test would "pass" without
actually doing anything. Also the fr_CA locale has been added to the list of
locales that can be used.
2986*22dc650dSSadaf Ebrahimi
2987*22dc650dSSadaf Ebrahimi14. Fixed a bug in pcre2_substitute(). If a replacement string ended in a
2988*22dc650dSSadaf Ebrahimicapturing group number without parentheses, the last character was incorrectly
2989*22dc650dSSadaf Ebrahimiliterally included at the end of the replacement string.
2990*22dc650dSSadaf Ebrahimi
2991*22dc650dSSadaf Ebrahimi15. A possessive capturing group such as (a)*+ with a minimum repeat of zero
2992*22dc650dSSadaf Ebrahimifailed to allow the zero-repeat case if pcre2_match() was called with an
2993*22dc650dSSadaf Ebrahimiovector too small to capture the group.
2994*22dc650dSSadaf Ebrahimi
2995*22dc650dSSadaf Ebrahimi16. Improved error message in pcre2test when setting the stack size (-S) fails.
2996*22dc650dSSadaf Ebrahimi
2997*22dc650dSSadaf Ebrahimi17. Fixed two bugs in CMakeLists.txt: (1) Some lines had got lost in the
2998*22dc650dSSadaf Ebrahimitransfer from PCRE1, meaning that CMake configuration failed if "build tests"
2999*22dc650dSSadaf Ebrahimiwas selected. (2) The file src/pcre2_serialize.c had not been added to the list
3000*22dc650dSSadaf Ebrahimiof PCRE2 sources, which caused a failure to build pcre2test.
3001*22dc650dSSadaf Ebrahimi
3002*22dc650dSSadaf Ebrahimi18. Fixed typo in pcre2_serialize.c (DECL instead of DEFN) that causes problems
3003*22dc650dSSadaf Ebrahimionly on Windows.
3004*22dc650dSSadaf Ebrahimi
3005*22dc650dSSadaf Ebrahimi19. Use binary input when reading back saved serialized patterns in pcre2test.
3006*22dc650dSSadaf Ebrahimi
3007*22dc650dSSadaf Ebrahimi20. Added RunTest.bat for running the tests under Windows.
3008*22dc650dSSadaf Ebrahimi
3009*22dc650dSSadaf Ebrahimi21. "make distclean" was not removing config.h, a file that may be created for
3010*22dc650dSSadaf Ebrahimiuse with CMake.
3011*22dc650dSSadaf Ebrahimi
3012*22dc650dSSadaf Ebrahimi22. A pattern such as "((?2){0,1999}())?", which has a group containing a
3013*22dc650dSSadaf Ebrahimiforward reference repeated a large (but limited) number of times within a
3014*22dc650dSSadaf Ebrahimirepeated outer group that has a zero minimum quantifier, caused incorrect code
3015*22dc650dSSadaf Ebrahimito be compiled, leading to the error "internal error: previously-checked
3016*22dc650dSSadaf Ebrahimireferenced subpattern not found" when an incorrect memory address was read.
3017*22dc650dSSadaf EbrahimiThis bug was reported as "heap overflow", discovered by Kai Lu of Fortinet's
3018*22dc650dSSadaf EbrahimiFortiGuard Labs. (Added 24-March-2015: CVE-2015-2325 was given to this.)
3019*22dc650dSSadaf Ebrahimi
3020*22dc650dSSadaf Ebrahimi23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
3021*22dc650dSSadaf Ebrahimicall within a group that also contained a recursive back reference caused
3022*22dc650dSSadaf Ebrahimiincorrect code to be compiled. This bug was reported as "heap overflow",
3023*22dc650dSSadaf Ebrahimidiscovered by Kai Lu of Fortinet's FortiGuard Labs. (Added 24-March-2015:
3024*22dc650dSSadaf EbrahimiCVE-2015-2326 was given to this.)

24. Computing the size of the JIT read-only data in advance has been a source
of various issues, and new ones still appear, unfortunately. To fix existing
and future issues, size computation has been eliminated from the code and
replaced by on-demand memory allocation.

25. A pattern such as /(?i)[A-`]/, where characters in the other case are
adjacent to the end of the range, and the range contained characters with more
than one other case, caused incorrect behaviour when compiled in UTF mode. In
that example, the range a-j was left out of the class.
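
As a rough illustration of what the caseless class in item 25 is expected to
contain, the following Python sketch builds the ASCII membership of [A-`]
with case variants added. It is not PCRE2 code: caseless_range is a
hypothetical helper, and it relies on Python's own case mappings, so it
ignores additional other-case characters such as U+212A (KELVIN SIGN).

```python
# Illustrative only: computes which characters a caseless [A-`] class is
# expected to contain. caseless_range is a hypothetical helper, not PCRE2's
# algorithm; it uses str.lower()/str.upper() and therefore does not add
# extra other-case characters such as U+212A (KELVIN SIGN).

def caseless_range(first, last):
    """Return the characters in first..last plus their case variants."""
    members = set()
    for cp in range(ord(first), ord(last) + 1):
        ch = chr(cp)
        members.update((ch, ch.lower(), ch.upper()))
    return members

members = caseless_range("A", "`")

# The bug described above dropped a-j; a correct class contains all of them.
missing = [c for c in "abcdefghij" if c not in members]
print(missing)   # []
```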


Version 10.00 05-January-2015
-----------------------------

Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36.

The code of the library was heavily revised as part of the new API
implementation. Details of each and every modification were not individually
logged. In addition to the API changes, the following changes were made. They
are either new functionality, or bug fixes and other noticeable changes of
behaviour that were implemented after the code had been forked.

1. Including Unicode support at build time is now enabled by default, but it
can optionally be disabled. It is not enabled by default at run time (no
change).

2. The test program, now called pcre2test, was re-specified and almost
completely re-written. Its input is not compatible with input for pcretest.

3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is
matched by that pattern.

4. For the benefit of those who use PCRE2 via some other application, that is,
not writing the function calls themselves, it is possible to check the PCRE2
version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a
string such as "yesno".

5. There are case-equivalent Unicode characters whose encodings use different
numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is
theoretically possible for this to happen in UTF-16 too.) If a backreference to
a group containing one of these characters was greedily repeated, and during
the match a backtrack occurred, the subject might be backtracked by the wrong
number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly
(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should
capture the final character, which is the three bytes E2, B1, and A5 in UTF-8.
Incorrect backtracking meant that group 2 captured only the last two bytes.
This bug has been fixed; the new code is slower, but it is used only when the
strings matched by the repetition are not all the same length.
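
The length mismatch behind item 5 can be seen by encoding the two characters.
The following Python sketch is illustrative only and is not PCRE2 code;
utf8_units is a hypothetical helper name.

```python
# Illustrative only: shows that U+023A and U+2C65 are case-equivalent but
# occupy different numbers of UTF-8 code units. utf8_units is a hypothetical
# helper, not part of PCRE2.

def utf8_units(ch):
    """Return the UTF-8 encoding of a single character as bytes."""
    return ch.encode("utf-8")

upper = "\u023a"   # LATIN CAPITAL LETTER A WITH STROKE
lower = "\u2c65"   # LATIN SMALL LETTER A WITH STROKE

print(upper.lower() == lower)        # True: the two are case-equivalent
print(utf8_units(upper).hex(" "))    # c8 ba    (two code units)
print(utf8_units(lower).hex(" "))    # e2 b1 a5 (three code units)
```

Stepping back by the length of the wrong member of the pair (two bytes
instead of three) would land inside the E2 B1 A5 sequence, which is
consistent with group 2 capturing only the last two bytes.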

6. A pattern such as /()a/ was not setting the "first character must be 'a'"
information. This applied to any pattern with a group that matched no
characters, for example: /(?:(?=.)|(?<!x))a/.

7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for
those parentheses to be closed with whatever has been captured so far. However,
it was failing to mark any other groups between the highest capture so far and
the current group as "unset". Thus, the ovector for those groups contained
whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
matched against "abcd".

8. The pcre2_substitute() function has been implemented.

9. If an assertion used as a condition was quantified with a minimum of zero
(an odd thing to do, but it happened), SIGSEGV or other misbehaviour could
occur.

10. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented.

****