Change Log for PCRE2
--------------------

Before the move to GitHub, this was the only record of changes to PCRE2. Now
there is also the log of commit messages.

Version 10.44 07-June-2024
--------------------------

1. If a pattern contained a variable-length lookbehind in which the first
branch was not the one with the shortest minimum length, and the lookbehind
contained a capturing group, and elsewhere in the pattern there was another
lookbehind that referenced that group, the pattern was incorrectly compiled,
leading to unpredictable results, including crashes in JIT compiling. An
example pattern is: /(((?<=123?456456|ABC)))(?<=\2)/

2. Further updates to the oss-fuzz support:

  (a) Limit quantifiers for groups and classes to be no more than 10. This
      avoids very long JIT compile times that happen in some cases when groups
      are replicated for quantification, and very long match times when
      classes contain a lot of non-ascii characters.

  (b) Added PCRE2_EXTENDED_MORE to the list of allowed options.

  (c) Arranged for text error messages to be shown in 16-bit and 32-bit modes.

  (d) Made the output in standalone mode more readable.

  (e) General code tidies.

  (f) Limit the size of compiled patterns to 10MB (see 6 below).

  (g) Do not run JIT on patterns whose compiled length is greater than 200K
      bytes because this takes a long time, causing oss-fuzz to time out.

  (h) Avoid compiling or matching twice with the same options (this could
      happen if the input didn't set any options).

3. Increase the maximum length of a name for a group from 32 to 128 because
there is a user for whom 32 is too small.

4. Cause pcre2test to output a message when pcre2_jit_compile() gives an error
return if either jitverify or info is specified.

5. Some auxiliary files for building under OpenVMS that were contributed by
Alexey Chupahin have been installed.

6. Added pcre2_set_max_pattern_compiled_length() to limit the size of compiled
patterns.

7. There was a bug in the implementation of \X caused by my (PH) misreading or
misunderstanding one of the grapheme sequence breaking rules in Unicode Annex
#29. A break should occur between two characters with the Extended Pictographic
break property unless a zero-width joiner intervenes. PCRE2 was not insisting
on the ZWJ, causing \X to match more than it should. See GitHub issue #410.

8. Avoid compilation issues with proprietary compilers in UNIX since 10.43.
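
The rule described in change 7 of 10.44 above can be illustrated with a small
C sketch. This is not PCRE2's code: the gb_prop enum and cluster_len() are
hypothetical names, and only four break properties are modelled, which is far
fewer than Unicode Annex #29 defines. The sketch assumes the usual reading of
the rule, namely that two Extended Pictographic characters stay in one cluster
only when a ZWJ (optionally preceded by Extend characters) sits between them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified set of grapheme break properties. */
typedef enum { GB_OTHER, GB_EXT_PICT, GB_EXTEND, GB_ZWJ } gb_prop;

/* Return how many leading items of p[] form one cluster under the
   simplified rules: never break before Extend or ZWJ, and join two
   Extended Pictographic items only across "Extend* ZWJ". */
static size_t cluster_len(const gb_prop *p, size_t n)
{
  size_t i;
  int in_pict;      /* seen ExtPict followed only by Extend so far */
  int joined = 0;   /* previous item was a ZWJ that ended such a run */

  assert(p != NULL || n == 0);
  if (n == 0) return 0;
  in_pict = (p[0] == GB_EXT_PICT);

  for (i = 1; i < n; i++)
    {
    switch (p[i])
      {
      case GB_EXTEND:          /* no break before Extend */
      joined = 0;
      break;

      case GB_ZWJ:             /* no break before ZWJ */
      joined = in_pict;
      in_pict = 0;
      break;

      case GB_EXT_PICT:
      if (!joined) return i;   /* no ZWJ intervened: break here */
      joined = 0;
      in_pict = 1;
      break;

      default:
      return i;                /* any other property starts a new cluster */
      }
    }
  return n;
}
```

With this sketch, the sequence {ExtPict, ExtPict} gives a cluster length of 1,
whereas {ExtPict, ZWJ, ExtPict} gives 3. Dropping the "joined" test would
reproduce the kind of over-long match that the entry describes.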


Version 10.43 16-February-2024
------------------------------

1. The test program added by change 2 of 10.42 didn't work when the default
newline setting didn't include \n as a newline. One test needed (*LF) to ensure
that it worked.

2. Added the new freestanding POSIX test program to the ManyConfigTests script
in the maint directory (overlooked in 2 below). Also improved the selection
facilities in that script, and added a test with JIT in a non-source directory,
fixing an oversight that would have made such a test fail before.

3. Added pcre2_get_match_data_heapframes_size() and related pcre2test flags
to allow for finer control of the heap used when pcre2_match() without JIT is
used and the match_data might be reused. This began as PR #191, but has had
further refinement and documentation edits.

4. Applied PR #181, which tidies some casts in pcre2_valid_utf.c.

5. Applied PR #184, which avoids overflow issues with the heap limit
(introduced in 10.41/9).

6. Applied PR #192, which changes the timing units for pcre2test from
milliseconds to microseconds. This is more useful for modern CPUs.

7. Applied PR #193, which makes the requirement for C99 explicit in
configure.ac and CMakeLists.txt.

8. Fixed a bug in pcre2test when a ridiculously large string repeat required a
stupid amount of memory. It now gives a clean realloc() failure error.

9. Updates to restrict the interaction between ASCII and non-ASCII characters
for caseless matching and items like \d:

  (a) Added PCRE2_EXTRA_CASELESS_RESTRICT to lock out mixing of ASCII and
      non-ASCII when matching caselessly. This is also /r in pcre2test and
      (?r) within patterns.

  (b) Added PCRE2_EXTRA_ASCII_{BSD,BSS,BSW,POSIX} and corresponding (?aD) etc
      in patterns and /a in pcre2test.

  (c) Corresponding updates to pcre2test.

10. Unicode has been updated to 15.0.0.

11. The Python scripts and ucptest.c in maint have been updated: (a) a minor
change needed for 9(a) above; (b) bug fixes in ucptest.

12. Integer overflow testing is now centralized in a new function.

13. Made PCRE2_UCP the default in UTF mode in pcre2grep, and added new options
--case-restrict and --no-ucp.

14. In the debugging printint module (which is normally only linked into
pcre2test), avoid the use of a variable called "not" because that's deprecated
in C and forbidden in C++. Also rewrite some code to avoid a goto into a block
that bypassed its initialization (though it didn't actually matter).

15. More minor code adjustments to avoid using reserved C++ words as variable
names ("new" and "typename") and another jump that bypassed an (irrelevant)
initialization.

16. Merged a pull request that removed pcre2_ucptables.c from the list of files
to compile in NON-AUTOTOOLS-BUILD because it is #included in pcre2_tables.c.
Also adjusted the BUILD.bazel and build.zig files, which had the same issue. At
the same time, fixed a typo in the Bazel file.

17. Add PCRE2_EXTRA_ASCII_DIGIT to allow [:digit:] to be kept in sync with \d
even in UCP mode.

18. Fix an invalid match of ASCII word classes when invalid UTF is enabled.

19. Add --posix-digit to pcre2grep for compatibility with GNU grep, and
other tools that prefer the POSIX compatible Unicode definition for \d.

20. Report the bit width of the library in use by pcre2test for usability.

21. A pathological pattern conversion test could result in a string longer than
the available input buffer. Cause such a test to fail.

22. Add a check that forces a compiler error if PCRE2_CODE_UNIT_WIDTH is not 8,
16, or 32 when compiling any of the library modules.
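
A compile-time check of the kind described in change 22 of 10.43 above might
look like the following sketch. It is an illustration only, not the check that
is in the library sources: the default of 8, the CODE_UNIT_BYTES macro, and
units_to_bytes() are assumptions made so that the fragment compiles on its
own. Only PCRE2_CODE_UNIT_WIDTH comes from the entry.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch only: fall back to 8 when the build has not
   defined the width, so that the fragment is self-contained. */
#ifndef PCRE2_CODE_UNIT_WIDTH
#define PCRE2_CODE_UNIT_WIDTH 8
#endif

/* Stop the compilation for any width other than 8, 16, or 32. */
#if PCRE2_CODE_UNIT_WIDTH != 8 && \
    PCRE2_CODE_UNIT_WIDTH != 16 && \
    PCRE2_CODE_UNIT_WIDTH != 32
#error PCRE2_CODE_UNIT_WIDTH must be 8, 16, or 32
#endif

/* Hypothetical helper macro: size of one code unit in bytes. */
#define CODE_UNIT_BYTES ((size_t)(PCRE2_CODE_UNIT_WIDTH / 8))

/* Hypothetical helper: convert a length in code units to bytes. The
   assertion guards against overflow of the multiplication. */
static size_t units_to_bytes(size_t units)
{
  assert(units <= (size_t)-1 / CODE_UNIT_BYTES);
  return units * CODE_UNIT_BYTES;
}
```

Compiling the fragment with, for example, -DPCRE2_CODE_UNIT_WIDTH=24 should
stop at the #error line, so a bad width is caught before any code that
depends on it is built.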

23. Update pcre2_compile() to treat a NULL pattern with zero length as an empty
string.

24. Add support for limited-length variable-length lookbehind assertions, with
default maximum length 255 characters (same as Perl) but with a function to
adjust the limit.

25. Applied pull request #262, which updates the zig configuration, and #278
which fixes a bug with out-of-source-tree CMake build testing.

26. Add support for LoongArch to JIT.

27. Fixed a bug in pcre2_match() in the code for handling the vector of
backtracking frames on the heap, which caused a heap overflow if *LIMIT_HEAP
restricted an attempt to extend to less than the frame size. Generally tidy up
the code for extending the heap frames vector. This fixes GitHub issue #275.

28. Update pcre2_fuzzsupport.c to avoid clang sanitize complaint about shifting
left by 16 when there are non-zeros in the top 16 bits.

29. Perl 5.34.0 changed the meaning of (for example) {,3} which was not
previously treated as a quantifier. Now it is interpreted as {0,3} and PCRE2
has changed to match. Note that {,} is still not a quantifier.

30. Perl allows spaces and/or horizontal tabs after { or before } in all items
that use braces, and also before or after the comma in quantifiers. PCRE2 now
does the same, except for \u{...}, which is recognized only when
PCRE2_EXTRA_ALT_BSUX is set. This is an ECMAScript, non-Perl compatible,
extension, so PCRE2 follows ECMAScript rather than Perl.

31. Applied pull request #300 by Carlo, which fixes #261. The bug was that
pcre2_match() was not fully resetting all captures that had been set within a
(possibly recursive) subroutine call such as (?3).

32. Changed the meaning of \w (and its synonyms) in UCP mode to match Perl. It
now matches characters whose general categories are L or N or whose particular
categories are Mn (non-spacing mark) or Pc (connector punctuation). The latter
includes underscore.

33. Changed the meaning of [:xdigit:] in UCP mode to match Perl. It now also
matches the "fullwidth" versions of the hex digits. Just like it is done for
[:digit:], PCRE2_EXTRA_ASCII_DIGIT can be used to keep this class ASCII only
without affecting other POSIX classes.

34. GitHub PR #305 fixes a potential integer overflow in pcre2_dfa_match().

35. Updated handling of \b and \B in UCP mode to match the changes to \w in 32
above because \b and \B are defined in terms of \w.

36. Within a pattern (?aT) and (?-aT) set and reset the PCRE2_EXTRA_ASCII_DIGIT
option, and (?aP) also sets (?aT) so that (?-aP) disables all ASCII
restrictions on POSIX classes.

37. If PCRE2_FIRSTLINE was set on an anchored pattern, pcre2_match() and
pcre2_dfa_match() misbehaved. PCRE2_FIRSTLINE is now ignored for anchored
patterns.

38. Add a test for ridiculous ovector offset values to the substring extraction
functions.

39. Make OP_REVERSE use IMM2_SIZE for its data instead of LINK_SIZE, for
consistency with OP_VREVERSE.

40. In some legacy environments with a pre C99 snprintf, pcre2_regerror could
return an incorrect value when the provided buffer was too small.

41. Applied pull request #342 which adds sanity checks for ctype functions and
locks out any accidental sign-extension.

42. In the 32-bit library, in non-UTF mode, a quantifier that followed a
literal character with a value greater than or equal to 0x80000000u caused
undefined behaviour.

43. \z was misbehaving when matching fragments inside invalid UTF strings.

44. Implement --group-separator and --no-group-separator for pcre2grep.

45. Fix \X matching in 32-bit mode without UTF in JIT.

46. Fix backref iterators when PCRE2_MATCH_UNSET_BACKREF is set in JIT.

47. Refactor the handling of whole-pattern recursion (?0) in pcre2_match() so
that its end is handled similarly to other recursions. This has altered the
behaviour of /|(?0)./endanchored which was previously not right.

48. Improved the test for looping recursion by checking the last referenced
character as well as the current character. This allows some patterns that
previously triggered the check to run to completion instead of giving the loop
error.

49. In 32-bit mode, the compiler looped for the pattern /[\x{ffffffff}]/ when
PCRE2_CASELESS and PCRE2_UCP (but not PCRE2_UTF) were set. Fixed by not trying
to look for other cases for characters above the Unicode range.

50. In caseless 32-bit mode with UCP (but not UTF) set, the character
0xffffffff incorrectly matched any character that has more than one other case,
in particular k and s.

51. Fix accept and endanchored interaction in JIT.

52. Fix backreferences with unset backref and non-greedy iterators in JIT.

53. Improve the logic that checks for a list of starting code units -- positive
lookahead assertions are now ignored if the immediately following item is one
that sets a mandatory starting character.
For example, /a?(?=bc|)d/ used to set all of a, b, and d as possible starting
code units; now it sets only a and d.

54. Fix incorrect class character matches in JIT.

55. In pcre2test, ensure pcre2_jit_match() is used when jitfast is used with
substitution testing.

56. Insert omitted setting of subject length in match data at the end of
pcre2_jit_match().

57. Implemented PCRE2_DISABLE_RECURSELOOP_CHECK for pcre2_match() to enable
some apparently looping recursions to run to completion and therefore match the
JIT behaviour. With this set, real loops will eventually get caught by match or
heap limits or run out of resource.

58. AC did a lot of work on pcre2_fuzzsupport.c to extend it to 16-bit and
32-bit libraries and to compare JIT and non-JIT matching.


Version 10.42 11-December-2022
------------------------------

1. Change 19 of 10.41 wasn't quite right; it put the definition of a default,
empty value for PCRE2_CALL_CONVENTION in src/pcre2posix.c instead of
src/pcre2posix.h, which meant that programs that included pcre2posix.h but not
pcre2.h failed to compile.

2. To catch similar issues to the above in future, a new small test program
that includes pcre2posix.h but not pcre2.h has been added to the test suite.

3. When the -S option of pcre2test was used to set a stack size greater than
the allowed maximum, the error message displayed the hard limit incorrectly.
This was pointed out on GitHub pull request #171, but the suggested patch
didn't cope with all cases. Some further modification was required.

4. Supplying an ovector count of more than 65535 to pcre2_match_data_create()
caused a crash because the field in the match data block is only 16 bits. A
maximum of 65535 is now silently applied.

5. Merged @carenas patch #175 which fixes #86 - segfault on aarch64 (ARM).

6. The prototype for pcre2_substring_list_free() specified its argument as
PCRE2_SPTR * which is a const data type, whereas the yield from
pcre2_substring_list() is not const. This caused compiler warnings. I have
changed the argument of pcre2_substring_list_free() to be PCRE2_UCHAR ** to
remove this anomaly. This might cause new warnings in existing code where a
cast has been used to avoid previous ones.


Version 10.41 06-December-2022
------------------------------

1. Add fflush() before and after a fork callout in pcre2grep to get its output
to be the same on all systems. (There were previously ordering differences in
Alpine Linux).

2. Merged patch from @carenas (GitHub #110) for pthreads support in CMake.

3. SSF scorecards grumbled about possible overflow in an expression in
pcre2test. It never would have overflowed in practice, but some casts have been
added and at the same time there's been some tidying of fprints that output
size_t values.

4. PR #94 showed up an unused enum in pcre2_convert.c, which is now removed.

5. Minor code re-arrangement to remove gcc warning about realloc() in
pcre2test.

6. Change a number of int variables that hold buffer and line lengths in
pcre2grep to PCRE2_SIZE (aka size_t).

7. Added an #ifdef to cut out a call to PRIV(jit_free) when JIT is not
supported (even though that function would do nothing in that case) at the
request of a user who doesn't even want to link with pcre_jit_compile.o. Also
tidied up an untidy #ifdef arrangement in pcre2test.

8. Fixed an issue in the backtracking optimization of character repeats in
JIT. Furthermore optimize star repetitions, not just plus repetitions.

9. Removed the use of an initial backtracking frames vector on the system stack
in pcre2_match() so that it now always uses the heap. (In a multi-thread
environment with very small stacks there had been an issue.) This also is
tidier for JIT matching, which didn't need that vector. The heap vector is now
remembered in the match data block and re-used if that block itself is re-used.
It is freed with the match data block.

10. Adjusted the find_limits code in pcre2test to work with change 9 above.

11. Added find_limits_noheap to pcre2test, because the heap limits are now
different in different environments and so cannot be included in the standard
tests.

12. Created a test for pcre2_match() heap processing that is not part of the
tests run by 'make check', but can be run manually. The current output is from
a 64-bit system.

13. Implemented -Z aka --null in pcre2grep.

14. A minor change to pcre2test and the addition of several new pcre2grep tests
have improved LCOV coverage statistics. At the same time, code in pcre2grep and
elsewhere that can never be obeyed in normal testing has been excluded from
coverage.

15. Fixed a bug in pcre2grep that could cause an extra newline to be written
after output generated by --output.

16. If a file has a .bz2 extension but is not in fact compressed, pcre2grep
should process it as a plain text file. A bug stopped this happening; now fixed
and added to the tests.

17. When pcre2grep was running not in UTF mode, if a string specified by
--output or obtained from a callout in a pattern contained a character (byte)
greater than 127, it was incorrectly output in UTF-8 format.

18. Added some casts after warnings from Clang sanitize.

19. Merged patch from cbouc (GitHub #139): 4 function prototypes were missing
PCRE2_CALL_CONVENTION in src/pcre2posix.h. All function prototypes returning
pointers had out of place PCRE2_CALL_CONVENTION in src/pcre2.h.*. These
produced errors when building for Windows with #define PCRE2_CALL_CONVENTION
__stdcall.

20. A negative repeat value in a pcre2test subject line was not being
diagnosed, leading to infinite looping.

21. Updated RunGrepTest to discard the warning that Bash now gives when setting
LC_CTYPE to a bad value (because older versions didn't).

22. Updated pcre2grep so that it behaves like GNU grep when matching more than
one pattern and a later pattern matches at an earlier point in the subject when
the matched substrings are being identified by colour or by offsets.

23. Updated the PrepareRelease script so that the man page that it makes for
the pcre2demo demonstration program is more standard and does not cause errors
when processed by lexgrog or mandb -c (GitHub issue #160).

24. The JIT compiler was updated.


Version 10.40 15-April-2022
---------------------------

1. Merged patch from @carenas (GitHub #35, 7db87842) to fix pcre2grep incorrect
handling of multiple passes.

2. Merged patch from @carenas (GitHub #36, dae47509) to fix portability issue
in pcre2grep with buffered fseek(stdin).

3. Merged patch from @carenas (GitHub #37, acc520924) to fix tests when -S is
not supported.

4. Revert an unintended change in JIT repeat detection.

5. Merged patch from @carenas (GitHub #52, b037bfa1) to fix build on GNU Hurd.

6. Merged documentation and comments patches from @carenas (GitHub #47).

7. Merged patch from @carenas (GitHub #49) to remove obsolete JFriedl test code
from pcre2grep.

8. Merged patch from @carenas (GitHub #48) to fix CMake install issue #46.

9. Merged patch from @carenas (GitHub #53) fixing NULL checks in matching and
substituting.

10. Add null_subject and null_replacement modifiers to pcre2test.

11. Add check for NULL subject to POSIX regexec() function.

12. Add check for NULL replacement to pcre2_substitute().

13. For the subject arguments of pcre2_match(), pcre2_dfa_match(), and
pcre2_substitute(), and the replacement argument of the latter, if the pointer
is NULL and the length is zero, treat as an empty string. Apparently a number
of applications treat NULL/0 in this way.

14. Added support for Bidi_Class and a number of binary Unicode properties,
including Bidi_Control.

15. Fix some minor issues raised by clang sanitize.

16. Very minor code speed up for maximizing character property matches.

17. A number of changes to script matching for \p and \P:

  (a) Script extensions for a character are now coded as a bitmap instead of
      a list of script numbers, which should be faster and does not need a
      loop.

  (b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms
      sc and scx).

  (c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being
      the same as \p{scx:scriptname} because this change happened in Perl at
      release 5.26.

    (d) The standard Unicode 4-letter abbreviations for script names are now
        recognized.

    (e) In accordance with Unicode and Perl's "loose matching" rules, spaces,
        hyphens, and underscores are ignored in property names, which are then
        matched independent of case.

18. The Python scripts in the maint directory have been refactored. There are
now three scripts that generate pcre2_ucd.c, pcre2_ucp.h, and pcre2_ucptables.c
(which is #included by pcre2_tables.c). The data lists that used to be
duplicated are now held in a single common Python module.

19. On CHERI, and thus Arm's Morello prototype, pointers are represented as
hardware capabilities, which consist of both an integer address and additional
metadata, meaning they are twice the size of the platform's size_t type, i.e.
16 bytes on a 64-bit system. The ovector member of heapframe happens to only be
8 byte aligned, and so computing frame_size ended up with a multiple of 8 but
not 16. Whilst the first frame was always suitably aligned, this then
misaligned the frame that follows, resulting in an alignment fault when storing
a pointer to Fecode at the start of match. Patch to fix this issue by Jessica
Clarke PR#72.

20. Added -LP and -LS listing options to pcre2test.

21. A user discovered that the library names in CMakeLists.txt for MSVC
debugger (PDB) files were incorrect - perhaps never tried for PCRE2?

22. An item such as [Aa] is optimized into a caseless single character match.
When this was quantified (e.g. [Aa]{2}) and was also the last literal item in a
pattern, the optimizing "must be present for a match" character check was not
being flagged as caseless, causing some matches that should have succeeded to
fail.

23. Fixed a unicode property matching issue in JIT. The character was not
fully read in caseless matching.

24. Fixed an issue affecting recursions in JIT caused by duplicated data
transfers.

25. Merged patch from @carenas (GitHub #96) which fixes some problems with
pcre2test and readline/readedit:

    * Use the right header for libedit in FreeBSD with autoconf
    * Really allow libedit with cmake
    * Avoid using readline headers with libedit


Version 10.39 29-October-2021
-----------------------------

1. Fix incorrect detection of alternatives in first character search in JIT.

2. Merged patch from @carenas (GitHub #28):

    Visual Studio 2013 includes support for %zu and %td, so let newer
    versions of it avoid the fallback, and while at it, make sure that
    the first check is for DISABLE_PERCENT_ZT so it will be always
    honoured if chosen.

    ptrdiff_t is signed, so use a signed type instead, and make sure
    that an appropriate width is chosen if pointers are 64bit wide and
    long is not (ex: Windows 64bit).

    IMHO removing the cast (and therefore the possibility of truncation)
    makes the code cleaner and the fallback is likely portable enough
    with all 64-bit POSIX systems doing LP64 except for Windows.

3. Merged patch from @carenas (GitHub #29) to update to Unicode 14.0.0.

4. Merged patch from @carenas (GitHub #30):

    * Cleanup: remove references to no longer used stdint.h

    Since 19c50b9d (Unconditionally use inttypes.h instead of trying for stdint.h
    (simplification) and remove the now unnecessary inclusion in
    pcre2_internal.h., 2018-11-14), stdint.h is no longer used.

    Remove checks for it in autotools and CMake and document better the expected
    build failures for systems that might have stdint.h (C99) and not inttypes.h
    (from POSIX), like old Windows.

    * Cleanup: remove detection for inttypes.h which is a hard dependency

    CMake checks for standard headers are not meant to be used for hard
    dependencies, so they will prevent a possible fallback from working.

    Alternatively, the header could be checked to make the configuration fail
    instead of breaking the build, but that was punted, as it was missing anyway
    from autotools.

5. Merged patch from @carenas (GitHub #32):

    * jit: allow building with ancient MSVC versions

    Visual Studio older than 2013 fails to build with JIT enabled, because it is
    unable to parse non C89 compatible syntax, with mixed declarations and code.
    While most recent compilers wouldn't even report this as a warning since it
    is valid C99, it could be also made visible by adding to gcc/clang the
    -Wdeclaration-after-statement flag at build time.

    Move the code below the affected definitions.

    * pcre2grep: avoid mixing declarations with code

    Since d5a61ee8 (Patch to detect (and ignore) symlink loops in pcre2grep,
    2021-08-28), code will fail to build in a strict C89 compiler.

    Reformat slightly to make it C89 compatible again.


Version 10.38 01-October-2021
-----------------------------

1. Fix invalid single character repetition issues in JIT when the repetition
is inside a capturing bracket and the bracket is preceded by character
literals.

2. Installed revised CMake configuration files provided by Jan-Willem Blokland.
This extends the CMake build system to build both static and shared libraries
in one go, builds the static library with PIC, and exposes PCRE2 libraries
using the CMake config files. JWB provided these notes:

- Introduced CMake variable BUILD_STATIC_LIBS to build the static library.

- Make a small modification to config-cmake.h.in by removing the PCRE2_STATIC
  variable. Added PCRE2_STATIC variable to the static build using the
  target_compile_definitions() function.

- Extended the CMake config files.

  - Introduced CMake variable PCRE2_USE_STATIC_LIBS to easily switch between
    the static and shared libraries.

  - Added the PCRE2_STATIC variable to the target compile definitions for the
    import of the static library.

Building static and shared libraries using MSVC results in a name clash of
the libraries.
Both static and shared library builds create, for example, the
file pcre2-8.lib. Therefore, I decided to change the static library names by
adding "-static". For example, pcre2-8.lib has become pcre2-8-static.lib.
[Comment by PH: this is MSVC-specific. It doesn't happen on Linux.]

3. Increased the minimum release number for CMake to 3.0.0 because anything
older than 2.8.12 is deprecated (it was set to 2.8.5) and causes warnings. Even
3.0.0 is quite old; it was released in 2014.

4. Implemented a modified version of Thomas Tempelmann's pcre2grep patch for
detecting symlink loops. This is dependent on the availability of realpath(),
which is now tested for in ./configure and CMakeLists.txt.

5. Implemented a modified version of Thomas Tempelmann's patch for faster
case-independent "first code unit" searches for unanchored patterns in 8-bit
mode in the interpreters. Instead of just remembering whether one case matched
or not, it remembers the position of a previous match so as to avoid
unnecessary repeated searching.

6. Perl now locks out \K in lookarounds, so PCRE2 now does the same by default.
However, just in case anybody was relying on the old behaviour, there is an
option called PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK that enables the old behaviour.
An option has also been added to pcre2grep to enable this.

7. Re-enable a JIT optimization which was unintentionally disabled in 10.35.

8. There is a loop counter to catch excessively crazy patterns when checking
the lengths of lookbehinds at compile time. This was incorrectly getting reset
whenever a lookahead was processed, leading to some fuzzer-generated patterns
taking a very long time to compile when (?|) was present in the pattern,
because (?|) disables caching of group lengths.


Version 10.37 26-May-2021
-------------------------

1. Change RunGrepTest to use tr instead of sed when testing with binary
zero bytes, because sed varies a lot from system to system and has problems
with binary zeros. This is from Bugzilla #2681. Patch from Jeremie
Courreges-Anglas via Nam Nguyen. This fixes RunGrepTest for OpenBSD. Later:
it broke it for at least one version of Solaris, where tr can't handle binary
zeros. However, that system had /usr/xpg4/bin/tr installed, which works OK, so
RunGrepTest now checks for that command and uses it if found.

2. Compiling with gcc 10.2's -fanalyzer option showed up a hypothetical problem
with a NULL dereference. I don't think this case could ever occur in practice,
but I have put in a check in order to get rid of the compiler error.

3. An alternative patch for CMakeLists.txt because 10.36 #4 breaks CMake on
Windows.
Patch from [email protected] fixes Bugzilla #2688.

4. Two bugs related to over-large numbers have been fixed so the behaviour is
now the same as Perl.

    (a) A pattern such as /\214748364/ gave an overflow error instead of being
        treated as the octal number \214 followed by literal digits.

    (b) A sequence such as {65536, which has no terminating } and so is not a
        quantifier, nevertheless provoked a complaint that a quantifier number
        was too big.

5. A run of autoconf suggested that configure.ac was out-of-date with respect
to the latest autoconf. Running autoupdate made some valid changes, some valid
suggestions, and also some invalid changes, which were fixed by hand. Autoconf
now runs clean and the resulting "configure" seems to work, so I hope nothing
is broken. Later: the requirement for autoconf 2.70 broke some automatic test
robots. It doesn't seem to be necessary: trying a reduction to 2.60.

6. The pattern /a\K.(?0)*/ when matched against "abac" by the interpreter gave
the answer "bac", whereas Perl and JIT both yield "c". This was because the
effect of \K was not propagating back from the full pattern recursion. Other
recursions such as /(a\K.(?1)*)/ did not have this problem.

7. Restore single character repetition optimization in JIT. Currently fewer
character repetitions are optimized than in 10.34.
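
The octal fallback described in item 4(a) can be illustrated with a small
Python sketch. This is not PCRE2's C code. It is a simplified model of the
documented rule for a backslash followed by digits outside a character class,
and the function name, the group_count parameter, and the return shape are all
hypothetical. It assumes the first digit is 1-9, because \0 is handled
separately.

```python
def interpret_backslash_digits(digits, group_count):
    """Classify the digits that follow a backslash (simplified model).

    Returns ("backref", number, "") or ("octal", code, literal_rest).
    """
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected one or more decimal digits")
    if digits[0] == "0":
        raise ValueError("\\0 is outside the scope of this sketch")

    number = int(digits)  # Python integers do not overflow
    if number < 10 or digits[0] in "89" or number <= group_count:
        return ("backref", number, "")

    # Otherwise read up to three octal digits; anything left is literal text.
    end = 0
    while end < 3 and end < len(digits) and digits[end] in "01234567":
        end += 1
    return ("octal", int(digits[:end], 8), digits[end:])


if __name__ == "__main__":
    print(interpret_backslash_digits("214748364", 0))
```

With no capture groups, the digits 214748364 are read as octal 214 followed by
the literal digits 748364, which is the behaviour the fix restores.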

8. When the names of the functions in the POSIX wrapper were changed to
pcre2_regcomp() etc. (see change 10.33 #4 below), functions with the original
names were left in the library so that pre-compiled programs would still work.
However, this has proved troublesome when programs link with several libraries,
some of which use PCRE2 via the POSIX interface while others use a native POSIX
library. For this reason, the POSIX function names are removed in this release.
The macros in pcre2posix.h should ensure that re-compiling fixes any programs
that haven't been compiled since before 10.33.


Version 10.36 04-December-2020
------------------------------

1. Add CET_CFLAGS so that -mshstk is passed to the compiler when Intel CET is
enabled. This fixes https://bugs.exim.org/show_bug.cgi?id=2578. Patch for
Makefile.am and configure.ac by H.J. Lu. Equivalent patch for CMakeLists.txt
invented by PH.

2. Fix infinite loop when a single byte newline is searched in JIT when
invalid utf8 mode is enabled.

3. Updated CMakeLists.txt with patch from Wolfgang Stöggl (Bugzilla #2584):

  - Include GNUInstallDirs and use ${CMAKE_INSTALL_LIBDIR} instead of hardcoded
    lib. This allows differentiation between lib and lib64.
    CMAKE_INSTALL_LIBDIR is used for installation of libraries and also for
    pkgconfig file generation.

  - Add the version of PCRE2 to the configuration summary like ./configure
    does.

  - Fix typo: MACTHED_STRING->MATCHED_STRING

4. Updated CMakeLists.txt with another patch from Wolfgang Stöggl (Bugzilla
#2588):

  - Add escaped double quotes around include directory in CMakeLists.txt to
    allow spaces in directory names.

  - This fixes a cmake error, if the path of the pcre2 source contains a space.

5. Updated CMakeLists.txt with a patch from B. Scott Michel: CMake's
documentation suggests using CHECK_SYMBOL_EXISTS over CHECK_FUNCTION_EXISTS.
Moreover, these functions come from specific header files, which need to be
specified (and, thankfully, are the same on both the Linux and WinXX
platforms).

6. Added a (uint32_t) cast to prevent a compiler warning in pcre2_compile.c.

7. Applied a patch from Wolfgang Stöggl (Bugzilla #2600) to fix postfix for
debug Windows builds using CMake. This also updated configure so that it
generates *.pc files and pcre2-config with the same content, as in the past.

8. If a pattern ended with (?(VERSION=n.d where n is any number but d is just a
single digit, the code unit beyond d was being read (i.e. there was a read
buffer overflow). Fixes ClusterFuzz 23779.

9. After the rework in r1235, certain character ranges were incorrectly
handled by an optimization in JIT. Furthermore a wrong offset was used to
read a value from a buffer which could lead to memory overread.

10. Unnoticed for many years was the fact that delimiters other than / in the
testinput1 and testinput4 files could cause incorrect behaviour when these
files were processed by perltest.sh. There were several tests that used quotes
as delimiters, and it was just luck that they didn't go wrong with perltest.sh.
All the patterns in testinput1 and testinput4 now use / as their delimiter.
This fixes Bugzilla #2641.

11. Perl has started to give an error for \K within lookarounds (though there
are cases where it doesn't). PCRE2 still allows this, so the tests that include
this case have been moved from test 1 to test 2.

12. Further to 10 above, pcre2test has been updated to detect and grumble if a
delimiter other than / is used after #perltest.

13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS
was set and PCRE2_NO_START_OPTIMIZE was not set.
The optimization for finding
the start of a match was not resetting correctly after a failed match on the
first valid fragment of the subject, possibly causing incorrect "no match"
returns on subsequent fragments. For example, the pattern /A/ failed to match
the subject \xe5A. Fixes Bugzilla #2642.

14. Fixed a bug in character set matching when JIT is enabled and both unicode
scripts and unicode classes are present at the same time.

15. Added GNU grep's -m (aka --max-count) option to pcre2grep.

16. Refactored substitution processing in pcre2grep strings, both for the -O
option and when dealing with callouts. There is now a single function that
handles $ expansion in all cases (instead of multiple copies of almost
identical code). This means that the same escape sequences are available
everywhere, which was not previously the case. At the same time, the escape
sequences $x{...} and $o{...} have been introduced, to allow for characters
whose code points are greater than 255 in Unicode mode.

17. Applied the patch from Bugzilla #2628 to RunGrepTest. This does an explicit
test for a version of sed that can handle binary zero, instead of assuming that
any Linux version will work. Later: replaced $(...) by `...` because not all
shells recognize the former.

18. Fixed a word boundary check bug in JIT when partial matching is enabled.
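
As an illustration of the $x{...} and $o{...} escapes introduced in item 16,
here is a minimal Python sketch. It is not the pcre2grep implementation. It
assumes that $x{...} holds hexadecimal digits and $o{...} holds octal digits,
it ignores every other $ sequence that pcre2grep supports, and the function
name is hypothetical.

```python
import re

# One alternative per escape so each only accepts digits valid in its base.
_CODE_POINT_ESCAPE = re.compile(r"\$x\{([0-9A-Fa-f]+)\}|\$o\{([0-7]+)\}")


def expand_code_point_escapes(text):
    """Replace $x{hex} and $o{octal} with the character at that code point."""

    def replace(match):
        hex_digits, octal_digits = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        return chr(int(octal_digits, 8))

    return _CODE_POINT_ESCAPE.sub(replace, text)


if __name__ == "__main__":
    # Both escapes name code point 256, which is above the 8-bit range.
    print(expand_code_point_escapes("$x{100}") == expand_code_point_escapes("$o{400}"))
```

Using a single expansion function for every context mirrors the refactoring
described above: the same escapes then behave identically wherever a
substitution string is accepted.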

19. Fix ARM64 compilation warning in JIT. Patch by Carlo.

20. A bug in the RunTest script meant that if the first part of test 2 failed,
the failure was not reported.

21. Test 2 was failing when run from a directory other than the source
directory. This failure was previously missed in RunTest because of 20 above.
Fixes added to both RunTest and RunTest.bat.

22. Patch to CMakeLists.txt from Daniel to fix problem with testing under
Windows.


Version 10.35 09-May-2020
-------------------------

1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT.

2. Fix ARMv5 JIT improper handling of labels right after a constant pool.

3. A JIT bug is fixed which allowed the fields of the compiled pattern to be
read before its existence is checked.

4. Back in the PCRE1 day, capturing groups that contained recursive back
references to themselves were made atomic (version 8.01, change 18) because
after the end of a repeated group, the captured substrings had their values
from the final repetition, not from an earlier repetition that might be the
destination of a backtrack. This feature was documented, and was carried over
into PCRE2.
However, it has now been realized that the major refactoring that
was done for 10.30 has made this atomizing unnecessary, and it is confusing
when users are unaware of it, making some patterns appear not to be working as
expected. Capture values of recursive back references in repeated groups are
now correctly backtracked, so this unnecessary restriction has been removed.

5. Added PCRE2_SUBSTITUTE_LITERAL.

6. Avoid some VS compiler warnings.

7. Added PCRE2_SUBSTITUTE_MATCHED.

8. Added (?* and (?<* as synonyms for (*napla: and (*naplb: to match another
regex engine. The Perl regex folks are aware of this usage and have made a note
about it.

9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
1, believing that repeating an assertion is pointless. However, if a positive
assertion contains capturing groups, repetition can be useful. In any case, an
assertion could always be wrapped in a repeated group. The only restriction
that is now imposed is that an unlimited maximum is changed to one more than
the minimum.

10. Fix *THEN verbs in lookahead assertions in JIT.

11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.

12. The JIT stack should be freed when the low-level stack allocation fails.
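
The remaining restriction on repeated assertions (item 9) amounts to a single
adjustment of the quantifier bounds. The Python sketch below models only that
rule. The UNLIMITED marker and the function name are hypothetical and are not
taken from the PCRE2 source.

```python
UNLIMITED = None  # hypothetical marker for an open-ended maximum such as {2,}


def adjust_assertion_repeat(minimum, maximum):
    """Return the (min, max) bounds applied to a quantified assertion."""
    if maximum is UNLIMITED:
        # An unlimited maximum is changed to one more than the minimum.
        return minimum, minimum + 1
    # Finite bounds are left as written.
    return minimum, maximum


if __name__ == "__main__":
    print(adjust_assertion_repeat(0, UNLIMITED))  # bounds for an assertion followed by *
    print(adjust_assertion_repeat(2, UNLIMITED))  # bounds for an assertion followed by {2,}
```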

13. In pcre2grep, if the final line in a scanned file is output but does not
end with a newline sequence, add a newline according to the --newline setting.

14. (?(DEFINE)...) groups were not being handled correctly when checking for
the fixed length of a lookbehind assertion. Such a group within a lookbehind
should be skipped, as it does not contribute to the length of the group.
Instead, the (DEFINE) group was being processed, and if it was at the end of
the lookbehind, that end was not correctly recognized. Errors such as
"lookbehind assertion is not fixed length" and also "internal error: bad code
value in parsed_skip()" could result.

15. Put a limit of 1000 on recursive calls in pcre2_study() when searching
nested groups for starting code units, in order to avoid stack overflow issues.
If the limit is reached, it just gives up trying for this optimization.

16. The control verb chain list must always be restored when exiting from a
recurse function in JIT.

17. Fix a crash which occurs when the character type of an invalid UTF
character is decoded in JIT.

18. Changes in many areas of the code so that when Unicode is supported and
PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for
upper/lower case computations on characters whose code points are greater than
127.

19. The function for checking UTF-16 validity was returning an incorrect offset
for the start of the error when a high surrogate was not followed by a valid
low surrogate. This caused incorrect behaviour, for example when
PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the
invalid high surrogate, such as /aa/ matching "\x{d800}aa".

20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern
could be mis-compiled and therefore not match correctly. This is the example
that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to
match "word" because the "move back" value was set to zero.

21. Following a request from a user, some extensions and tidies to the
character tables handling have been done:

    (a) The dftables auxiliary program is renamed pcre2_dftables, but it is still
        not installed for public use.

    (b) There is now a -b option for pcre2_dftables, which causes the tables to
        be written in binary. There is also a -help option.

    (c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an
        application that wants to save tables in binary knows how long they are.

22. Changed setting of CMAKE_MODULE_PATH in CMakeLists.txt from SET to
LIST(APPEND...) to allow a setting from the command line to be included.

23. Updated to Unicode 13.0.0.

24. CMake build now checks for secure_getenv() and strerror(). Patch by Carlo.

25. Avoid using [-1] as a suffix in pcre2test because it can provoke a compiler
warning.

26. Added tests for __attribute__((uninitialized)) to both the configure and
CMake build files, and then applied this attribute to the variable called
stack_frames_vector[] in pcre2_match(). When implemented, this disables
automatic initialization (a facility in clang), which can take time on big
variables.

27. Updated CMakeLists.txt (patches by Uwe Korn) to add support for
pcre2-config, the libpcre*.pc files, SOVERSION, VERSION and the
MACHO_*_VERSIONS settings for CMake builds.

28. Another patch to CMakeLists.txt to check for mkostemp (configure already
does). Patch by Carlo Marcelo Arenas Belon.

29. Check for the existence of memfd_create in both CMake and configure
configurations.
Patch by Carlo Marcelo Arenas Belon.

30. Restrict the configuration setting for the SELinux compatible execmem
allocator (change 10.30/44) to Linux and NetBSD.


Version 10.34 21-November-2019
------------------------------

1. The maximum number of capturing subpatterns is 65535 (documented), but no
check on this was ever implemented. This omission has been rectified; it fixes
ClusterFuzz 14376.

2. Improved the invalid utf32 support of the JIT compiler. Now it correctly
detects invalid characters in the 0xd800-0xdfff range.

3. Fix minor typo bug in JIT compile when \X is used in a non-UTF string.

4. Add support for matching in invalid UTF strings to the pcre2_match()
interpreter, and integrate with the existing JIT support via the new
PCRE2_MATCH_INVALID_UTF compile-time option.

5. Give more error detail for invalid UTF-8 when detected in pcre2grep.

6. Add support for invalid UTF-8 to pcre2grep.

7. Adjust the limit for "must have" code unit searching, in particular,
increase it substantially for non-anchored patterns.

8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
minimum is potentially useful.
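
The surrogate range mentioned in item 2 can be shown with a short Python
sketch of a UTF-32 validity scan. This is an illustration only, not the JIT
code. It relies on two general Unicode facts: code points 0xd800-0xdfff are
reserved for surrogates, and no code point exceeds 0x10ffff. The function name
is hypothetical.

```python
def first_invalid_utf32(code_units):
    """Return the index of the first invalid UTF-32 code unit, or None."""
    for index, unit in enumerate(code_units):
        is_surrogate = 0xD800 <= unit <= 0xDFFF
        is_too_large = unit > 0x10FFFF
        if is_surrogate or is_too_large:
            return index
    return None


if __name__ == "__main__":
    # 'a', a lone surrogate, 'b': the scan stops at the surrogate.
    print(first_invalid_utf32([0x61, 0xD800, 0x62]))
```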

9. Some changes to the way the minimum subject length is handled:

  * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed;
    pcre2test now omits this item instead of showing a value of zero.

  * An incorrect minimum length could be calculated for a pattern that
    contained (*ACCEPT) inside a quantified group whose minimum repetition was
    zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum
    of 2. The minimum length scan no longer happens for a pattern that
    contains (*ACCEPT).

  * When no minimum length is set by the normal scan, but a first and/or last
    code unit is recorded, set the minimum to 1 or 2 as appropriate.

  * When a pattern contains multiple groups with the same number, a back
    reference cannot know which one to scan for a minimum length. This used to
    cause the minimum length finder to give up with no result. Now it treats
    such references as not adding to the minimum length (which it should have
    done all along).

  * Furthermore, the above action now happens only if the back reference is to
    a group that exists more than once in a pattern instead of any back
    reference in a pattern with duplicate numbers.

10.
A (*MARK) value inside a successful condition was not being returned by the
interpretive matcher (it was returned by JIT). This bug has been mended.

11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
if the pattern had more than 32 capturing parentheses. This is fixed. In
addition (a) the default limit for groups requested by -o<n> has been raised to
50, (b) the new --om-capture option changes the limit, (c) an error is raised
if -o asks for a group that is above the limit.

12. The quantifier {1} was always being ignored, but this is incorrect when it
is made possessive and applied to an item in parentheses, because a
parenthesized item may contain multiple branches or other backtracking points,
for example /(a|ab){1}+c/ or /(a+){1}+a/.

13. For partial matches, pcre2test was always showing the maximum lookbehind
characters, flagged with "<", which is misleading when the lookbehind didn't
actually look behind the start (because it was later in the pattern). Showing
all consulted preceding characters for partial matches is now controlled by the
existing "allusedtext" modifier and, as for complete matches, this facility is
available only for non-JIT matching, because JIT does not maintain the first
and last consulted characters.

14.
DFA matching (using pcre2_dfa_match()) was not recognising a partial match
if the end of the subject was encountered in a lookahead (conditional or
otherwise), an atomic group, or a recursion.

15. Give error if pcre2test -t, -T, -tm or -TM is given an argument of zero.

16. Check for integer overflow when computing lookbehind lengths. Fixes
ClusterFuzz issue 15636.

17. Implemented non-atomic positive lookaround assertions.

18. If a lookbehind contained a lookahead that contained another lookbehind
within it, the nested lookbehind was not correctly processed. For example, if
/(?<=(?=(?<=a)))b/ was matched to "ab" it gave no match instead of matching
"b".

19. Implemented pcre2_get_match_data_size().

20. Three alterations to partial matching:

  (a) The definition of a partial match is slightly changed: if a pattern
      contains any lookbehinds, an empty partial match may be given, because
      this is another situation where adding characters to the current subject
      can lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab".

  (b) Similarly, if a pattern could match an empty string, an empty partial
      match may be given. Example: /(?![ab]).*/ with subject "ab". This case
      applies only to PCRE2_PARTIAL_HARD.

  (c) An empty string partial hard match can be returned for \z and \Z as it
      is documented that they shouldn't match.

21. A branch that started with (*ACCEPT) was not being recognized as one that
could match an empty string.

22. Corrected pcre2_set_character_tables() tables data type: was const unsigned
char * instead of const uint8_t *, as generated by pcre2_maketables().

23. Upgraded to Unicode 12.1.0.

24. Add -jitfast command line option to pcre2test (to make all the jit options
available directly).

25. Make pcre2test -C show if libreadline or libedit is supported.

26. If the length of one branch of a group exceeded 65535 (the maximum value
that is remembered as a minimum length), the whole group's length was
incorrectly recorded as 65535, leading to incorrect "no match" when start-up
optimizations were in force.

27. The "rightmost consulted character" value was not always correct; in
particular, if a pattern ended with a negative lookahead, characters that were
inspected in that lookahead were not included.

28. Add the pcre2_maketables_free() function.

29.
The start-up optimization that looks for a unique initial matching
code unit in the interpretive engines uses memchr() in 8-bit mode. When the
search is caseless, it was doing so inefficiently, which ended up slowing down
the match drastically when the subject was very long. The revised code (a)
remembers if one case is not found, so it never repeats the search for that
case after a bumpalong and (b) when one case has been found, it searches only
up to that position for an earlier occurrence of the other case. This fix
applies to both interpretive pcre2_match() and to pcre2_dfa_match().

30. While scanning to find the minimum length of a group, if any branch has
minimum length zero, there is no need to scan any subsequent branches (a small
compile-time performance improvement).

31. Installed a .gitignore file on a user's suggestion. When using the svn
repository with git (through git svn) this helps keep it tidy.

32. Add underflow check in JIT which may occur when the value of subject
string pointer is close to 0.

33. Arrange for classes such as [Aa] which contain just the two cases of the
same character, to be treated as a single caseless character. This causes the
first and required code unit optimizations to kick in where relevant.

34.
Improve the bitmap of starting bytes for positive classes that include wide
characters, but no property types, in UTF-8 mode. Previously, on encountering
such a class, the bits for all bytes greater than \xc4 were set, thus
specifying any character with codepoint >= 0x100. Now the only bits that are
set are for the relevant bytes that start the wide characters. This can give a
noticeable performance improvement.

35. If the bitmap of starting code units contains only 1 or 2 bits, replace it
with a single starting code unit (1 bit) or a caseless single starting code
unit if the two relevant characters are case-partners. This is particularly
relevant to the 8-bit library, though it applies to all. It can give a
performance boost for patterns such as [Ww]ord and (word|WORD). However, this
optimization doesn't happen if there is a "required" code unit of the same
value (because the search for a "required" code unit starts at the match start
for non-unique first code unit patterns, but after a unique first code unit,
and patterns such as a*a need the former action).

36. Small patch to pcre2posix.c to set the erroroffset field to -1 immediately
after a successful compile, instead of at the start of matching to avoid a
sanitizer complaint (regexec is supposed to be thread safe).

37.
Add NEON vectorization to JIT to speed up matching of first character and
pairs of characters on ARM64 CPUs.

38. If a non-ASCII character was the first in a starting assertion in a
caseless match, the "first code unit" optimization did not get the casing
right, and the assertion failed to match a character in the other case if it
did not start with the same code unit.

39. Fixed the incorrect computation of jump sizes on x86 CPUs in JIT. A masking
operation was incorrectly removed in r1136. Reported by Ralf Junker.


Version 10.33 16-April-2019
---------------------------

1. Added "allvector" to pcre2test to make it easy to check the part of the
ovector that shouldn't be changed, in particular after substitute and failed or
partial matches.

2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
a greater than 1 fixed quantifier. This issue was found by Yunho Kim.

3. Added support for callouts from pcre2_substitute(). After 10.33-RC1, but
prior to release, fixed a bug that caused a crash if pcre2_substitute() was
called with a NULL match context.

4. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
functions that use the standard POSIX names.
However, in pcre2posix.h the POSIX
names are defined as macros. This should help avoid linking with the wrong
library in some environments while still exporting the POSIX names for
pre-existing programs that use them. (The Debian alternative names are also
defined as macros, but not documented.)

5. Fix an xclass matching issue in JIT.

6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).

7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and
lookaround assertions, for example, (*pla:...) and (*atomic:...). These are
characterized by a lower case letter following (* and to simplify coding for
this, the character tables created by pcre2_maketables() were updated to add a
new "is lower case letter" bit. At the same time, the now unused "is
hexadecimal digit" bit was removed. The default tables in
src/pcre2_chartables.c.dist are updated.

8. Implement the new Perl "script run" features (*script_run:...) and
(*atomic_script_run:...) aka (*sr:...) and (*asr:...).

9. Fixed two typos in change 22 for 10.21, which added special handling for
ranges such as a-z in EBCDIC environments. The original code probably never
worked, though there were no bug reports.

10.
Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
path. Also, when a match fails, set the subject field in the match data to NULL
for tidiness - none of the substring extractors should reference this after
match failure.

11. If a pattern started with a subroutine call that had a quantifier with a
minimum of zero, an incorrect "match must start with this character" could be
recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
be the first character of a match.

12. The heap limit checking code in pcre2_dfa_match() could suffer from
overflow if the heap limit was set very large. This could cause incorrect "heap
limit exceeded" errors.

13. Add "kibibytes" to the heap limit output from pcre2test -C to make the
units clear.

14. Add a call to pcre2_jit_free_unused_memory() in pcre2grep, for tidiness.

15. Updated the VMS-specific code in pcre2test on the advice of a VMS user.

16. Removed the unnecessary inclusion of stdint.h (or inttypes.h) from
pcre2_internal.h as it is now included by pcre2.h. Also, change 17 for 10.32
below was unnecessarily complicated, as inttypes.h is a Standard C header,
which is defined to be a superset of stdint.h.
Instead of conditionally
including stdint.h or inttypes.h, pcre2.h now unconditionally includes
inttypes.h. This supports environments that do not have stdint.h but do have
inttypes.h, which are known to exist. A note in the autotools documentation
says (November 2018) that there are none known that are the other way round.

17. Added --disable-percent-zt to "configure" (and equivalent to CMake) to
forcibly disable the use of %zu and %td in formatting strings because there is
at least one version of VMS that claims to be C99 but does not support these
modifiers.

18. Added --disable-pcre2grep-callout-fork, which restricts the callout support
in pcre2grep to the inbuilt echo facility. This may be useful in environments
that do not support fork().

19. Fix two instances of <= 0 being applied to unsigned integers (the VMS
compiler complains).

20. Added "fork" support for VMS to pcre2grep, for running an external program
via a string callout.

21. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.

22. If a pattern started with (*MARK), (*COMMIT), (*PRUNE), (*SKIP), or (*THEN)
followed by ^ it was not recognized as anchored.

23.
The RunGrepTest script used to cut out the test of NUL characters for
Solaris and MacOS as printf and sed can't handle them. It seems that the *BSD
systems can't either. I've inverted the test so that only those OS that are
known to work (currently only Linux) try to run this test.

24. Some tests in RunGrepTest appended to testtrygrep from two different file
descriptors instead of redirecting stderr to stdout. This worked on Linux, but
it was reported not to on other systems, causing the tests to fail.

25. In the RunTest script, make the test for stack setting use the same value
for the stack as it needs for -bigstack.

26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning.

26. With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s
which are valid in character classes, but not as the end of ranges, were being
treated as literals. An example is [_-\s] (but not [\s-_] because that gave an
error at the *start* of a range). Now an "invalid range" error is given
independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

27. Related to 26 above, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL was affecting known
escape sequences such as \eX when they appeared invalidly in a character class.
Now the option applies only to unrecognized or malformed escape sequences.

28.
Fix word boundary in JIT compiler. Patch by Mike Munday.

29. The pcre2_dfa_match() function was incorrectly handling conditional version
tests such as (?(VERSION>=0)...) when the version test was true. Incorrect
processing or a crash could result.

30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group
names, as Perl does. There was a small bug in this new code, found by
ClusterFuzz 12950, fixed before release.

31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh}
construct.

32. Compile \p{Any} to be the same as . in DOTALL mode, so that it benefits
from auto-anchoring if \p{Any}* starts a pattern.

33. Compile invalid UTF check in JIT test when only pcre32 is enabled.

34. For some time now, CMake has been warning about the setting of policy
CMP0026 to "OLD" in CMakeLists.txt, and hinting that the feature might be
removed in a future version. A request for CMake expertise on the list produced
no result, so I have now hacked CMakeLists.txt along the lines of some changes
I found on the Internet. The new code no longer needs the policy setting, and
it appears to work fine on Linux.

35.
Setting --enable-jit=auto for an out-of-tree build failed because the
source directory wasn't in the search path for AC_TRY_COMPILE always. Patch
from Ross Burton.

36. Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available.
Patch by Guillem Jover.

37. Changed expressions such as 1<<10 to 1u<<10 in many places because compiler
warnings were reported.

38. Using the clang compiler with sanitizing options causes runtime complaints
about truncation for statements such as x = ~x when x is an 8-bit value; it
seems to compute ~x as a 32-bit value. Changing such statements to x = 255 ^ x
gets rid of the warnings. There were also two missing casts in pcre2test.


Version 10.32 10-September-2018
-------------------------------

1. When matching using the REG_STARTEND feature of the POSIX API with a
non-zero starting offset, unset capturing groups with lower numbers than a
group that did capture something were not being correctly returned as "unset"
(that is, with offset values of -1).

2. When matching using the POSIX API, pcre2test used to omit listing unset
groups altogether. Now it shows those that come before any actual captures as
"<unset>", as happens for non-POSIX matching.

3.
Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
whatever the build configuration was. It now correctly says "\R matches all
Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
specified. Similarly, running "pcre2test -C bsr" never produced the result
ANY.

4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
multi-code-unit characters caused bad behaviour and possibly a crash. This
issue was fixed for other kinds of repeat in release 10.20 by change 19, but
repeating character classes were overlooked.

5. pcre2grep now supports the inclusion of binary zeros in patterns that are
read from files via the -f option.

6. A small fix to pcre2grep to avoid compiler warnings for -Wformat-overflow=2.

7. Added --enable-jit=auto support to configure.ac.

8. Added some dummy variables to the heapframe structure in 16-bit and 32-bit
modes for the benefit of m68k, where pointers can be 16-bit aligned. The
dummies force 32-bit alignment and this ensures that the structure is a
multiple of PCRE2_SIZE, a requirement that is tested at compile time. In other
architectures, alignment requirements take care of this automatically.

9.
When returning an error from pcre2_pattern_convert(), ensure the error
offset is set to zero for early errors.

10. A number of patches for Windows support from Daniel Richard G:

  (a) List of error numbers in Runtest.bat corrected (it was not the same as in
      Runtest).

  (b) pcre2grep snprintf() workaround as used elsewhere in the tree.

  (c) Support for non-C99 snprintf() that returns -1 in the overflow case.

11. Minor tidy of pcre2_dfa_match() code.

12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
use the stack for local workspace and local ovectors. Instead, an initial block
of stack is reserved, but if this is insufficient, heap memory is used. The
heap limit parameter now applies to pcre2_dfa_match().

13. If a "find limits" test of DFA matching in pcre2test resulted in too many
matches for the ovector, no matches were displayed.

14. Removed an occurrence of ctrl/Z from test 6 because Windows treats it as
EOF. The test looks to have come from a fuzzer.

15. If PCRE2 was built with a default match limit a lot greater than the
default default of 10 000 000, some JIT tests of the match limit no longer
failed. All such tests now set 10 000 000 as the upper limit.
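
Change 10(c) above concerns snprintf() implementations that return -1 when the
output does not fit, where C99 returns the length that would have been written.
A minimal sketch of a caller that copes with both conventions is given here; it
makes no claim about how pcre2grep does it, and format_count() is a
hypothetical name used only for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of PCRE2: formats value into buf, which holds
   size bytes. Returns 1 if the whole number fitted, 0 if it did not. Both
   overflow conventions are handled: a C99 snprintf() returns the length it
   needed (>= size), an older one may return a negative value. */
int format_count(char *buf, size_t size, unsigned long value)
{
  int n;
  if (size == 0) return 0;
  n = snprintf(buf, size, "%lu", value);
  if (n < 0 || (size_t)n >= size)
    {
    buf[size - 1] = '\0';   /* older implementations may not terminate */
    return 0;
    }
  return 1;
}
```

Treating a negative return and a too-large return identically means the caller
needs only one overflow path, at the cost of not distinguishing a genuine
encoding error from truncation.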

16. Another Windows related patch for pcre2grep to ensure that WIN32 is
undefined under Cygwin.

17. Test for the presence of stdint.h and inttypes.h in configure and CMake and
include whichever exists (stdint preferred) instead of unconditionally
including stdint. This makes life easier for old and non-standard systems.

18. Further changes to improve portability, especially to old and/or
non-standard systems:

  (a) Put all printf arguments in RunGrepTest into single, not double, quotes,
      and use \0 not \x00 for binary zero.

  (b) Avoid the use of C++ (i.e. BCPL) // comments.

  (c) Parameterize the use of %zu in pcre2test to make it like %td. For both of
      these now, if using MSVC or a standard C before C99, %lu is used with a
      cast if necessary.

19. Applied a contributed patch to CMakeLists.txt to increase the stack size
when linking pcre2test with MSVC. This gets rid of a stack overflow error in
the standard set of tests.

20. Output a warning in pcre2test when ignoring the "altglobal" modifier when
it is given with the "replace" modifier.

21.
In both pcre2test and pcre2_substitute(), with global matching, a pattern
that matched an empty string, but never at the starting match offset, was not
handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such
a pattern. Because \G is in a lookbehind assertion, there has to be a
"bumpalong" before there can be a match. The automatic "advance by one
character after an empty string match" rule is therefore inappropriate. A more
complicated algorithm has now been implemented.

22. When checking to see if a lookbehind is of fixed length, lookaheads were
correctly ignored, but quantifiers on lookaheads were not being ignored, leading
to an incorrect "lookbehind assertion is not fixed length" error.

23. The VERSION condition test was reading fractional PCRE2 version numbers
such as the 04 in 10.04 incorrectly and hence giving wrong results.

24. Updated to Unicode version 11.0.0. As well as the usual addition of new
scripts and characters, this involved re-jigging the grapheme break property
algorithm because Unicode has changed the way emojis are handled.

25. Fixed an obscure bug that struck when there were two atomic groups not
separated by something with a backtracking point. There could be an incorrect
backtrack into the first of the atomic groups.
A complicated example is
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ matched against "abc", where the *SKIP
shouldn't find a MARK (because it is in an atomic group), but it did.

26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
a list of modifiers for all subsequent patterns - only those that the script
recognizes are meaningful; (2) #subject lines can be used to set or unset a
default "mark" modifier; (3) Unsupported #command lines give a warning when
they are ignored; (4) Mark data is output only if the "mark" modifier is
present.

27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.

28. A (*MARK) name was not being passed back for positive assertions that were
terminated by (*ACCEPT).

29. Add support for \N{U+dddd}, but only in Unicode mode.

30. Add support for (?^) for unsetting all imnsx options.

31. The PCRE2_EXTENDED (/x) option only ever discarded space characters whose
code point was less than 256 and that were recognized by the lookup table
generated by pcre2_maketables(), which uses isspace() to identify white space.
Now, when Unicode support is compiled, PCRE2_EXTENDED also discards U+0085,
U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by
Unicode as "Pattern White Space".
This makes PCRE2 compatible with Perl.

32. In certain circumstances, option settings within patterns were not being
correctly processed. For example, the pattern /((?i)A)(?m)B/ incorrectly
matched "ab". (The (?m) setting lost the fact that (?i) should be reset at the
end of its group during the parse process, but without another setting such as
(?m) the compile phase got it right.) This bug was introduced by the
refactoring in release 10.23.

33. PCRE2 uses bcopy() if available when memmove() is not, and it used just to
define memmove() as a function call to bcopy(). This hasn't been tested for a
long time because in pcre2test the result of memmove() was being used, whereas
bcopy() doesn't return a result. This feature is now refactored always to call
an emulation function when there is no memmove(). The emulation makes use of
bcopy() when available.

34. When serializing a pattern, set the memctl, executable_jit, and tables
fields (that is, all the fields that contain pointers) to zeros so that the
result of serializing is always the same. These fields are re-set when the
pattern is deserialized.

35.
In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
negative class with no characters less than 0x100 followed by a positive class
with only characters less than 0x100, the first class was incorrectly being
auto-possessified, causing incorrect match failures.

36. Removed the character type bit ctype_meta, which dates from PCRE1 and is
not used in PCRE2.

37. Tidied up unnecessarily complicated macros used in the escapes table.

38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted
from distribution tarballs, owing to a typo in Makefile.am which had
testoutput8-16-3 twice. Now fixed.

39. If the only branch in a conditional subpattern was anchored, the whole
subpattern was treated as anchored, when it should not have been, since the
assumed empty second branch cannot be anchored. Demonstrated by test patterns
such as /(?(1)^())b/ or /(?(?=^))b/.

40. A repeated conditional subpattern that could match an empty string was
always assumed to be unanchored. Now it is checked just like any other
repeated conditional subpattern, and can be found to be anchored if the minimum
quantifier is one or more. I can't see much use for a repeated anchored
pattern, but the behaviour is now consistent.

41.
Minor addition to pcre2_jit_compile.c to avoid a static analyzer complaint
(for an event that could never occur but you had to have external information
to know that).

42. If before the first match in a file that was being searched by pcre2grep
there was a line that was sufficiently long to cause the input buffer to be
expanded, the variable holding the location of the end of the previous match
was being adjusted incorrectly, and could cause an overflow warning from a code
sanitizer. However, as the value is used only to print pending "after" lines
when the next match is reached (and there are no such lines in this case) this
bug could do no damage.


Version 10.31 12-February-2018
------------------------------

1. Fix typo (missing ]) in VMS code in pcre2test.c.

2. Replace the replicated code for matching extended Unicode grapheme sequences
(which got a lot more complicated by change 10.30/49) by a single subroutine
that is called by both pcre2_match() and pcre2_dfa_match().

3. Add idempotent guard to pcre2_internal.h.

4. Add new pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
PCRE2_CONFIG_COMPILED_WIDTHS.

5.
Cut out \C tests in the JIT regression tests when NEVER_BACKSLASH_C is
defined (e.g. by --enable-never-backslash-C).

6. Defined public names for all the pcre2_compile() error numbers, and used
the public names in pcre2_convert.c.

7. Fixed a small memory leak in pcre2test (convert contexts).

8. Added two casts to compile.c and one to match.c to avoid compiler warnings.

9. Added code to pcre2grep when compiled under VMS to set the symbol
PCRE2GREP_RC to the exit status, because VMS does not distinguish between
exit(0) and exit(1).

10. Added the -LM (list modifiers) option to pcre2test. Also made -C complain
about a bad option only if the following argument item does not start with a
hyphen.

11. pcre2grep was truncating components of file names to 128 characters when
processing files with the -r option, and also (some very odd code) truncating
path names to 512 characters. There is now a check on the absolute length of
full path file names, which may be up to 2047 characters long.

12. When an assertion contained (*ACCEPT) it caused all open capturing groups
to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to
misbehaviour for subsequent references to groups that started outside the
assertion.
ACCEPT in an assertion now closes only those groups that were
started within that assertion. Fixes oss-fuzz issues 3852 and 3891.

13. Multiline matching in pcre2grep was misbehaving if the pattern matched
within a line, and then matched again at the end of the line and over into
subsequent lines. Behaviour was different with and without colouring, and
sometimes context lines were incorrectly printed and/or line endings were lost.
All these issues should now be fixed.

14. If --line-buffered was specified for pcre2grep when input was from a
compressed file (.gz or .bz2) a segfault occurred. (Line buffering should be
ignored for compressed files.)

15. Although pcre2_jit_match checks whether the pattern is compiled
in a given mode, it was also expected that at least one mode is available.
This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION
when the pattern is not optimized by JIT at all.

16. The line number and related variables such as match counts in pcre2grep
were all int variables, causing overflow when files with more than 2147483647
lines were processed (assuming 32-bit ints). They have all been changed to
unsigned long ints.

17.
If a backreference with a minimum repeat count of zero was first in a
pattern, apart from assertions, an incorrect first matching character could be
recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
as the first character of a match.

18. Characters in a leading positive assertion are considered for recording a
first character of a match when the rest of the pattern does not provide one.
However, a character in a non-assertive group within a leading assertion such
as in the pattern /(?=(a))\1?b/ caused this process to fail. This was an
infelicity rather than an outright bug, because it did not affect the result of
a match, just its speed. (In fact, in this case, the starting 'a' was
subsequently picked up in the study.)

19. A minor tidy in pcre2_match(): making all PCRE2_ERROR_ returns use "return"
instead of "RRETURN" saves unwinding the backtracks in these cases (only one
didn't).

20. Allocate a single callout block on the stack at the start of pcre2_match()
and set its never-changing fields once only. Do the same for pcre2_dfa_match().

21. Save the extra compile options (set in the compile context) with the
compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS
to retrieve them, and update pcre2test to show them.

22.
Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks. The bits are set by pcre2_match(), but
not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts
if the callout_extra subject modifier is set. These bits are provided to help
with tracking how a backtracking match is proceeding.

23. Updated the pcre2demo.c demonstration program, which was missing the extra
code for -g that handles the case when \K in an assertion causes the match to
end at the original start point. Also arranged for it to detect when \K causes
the end of a match to be before its start.

24. Similar to 23 above, strange things (including loops) could happen in
pcre2grep when \K was used in an assertion when --colour was used or in
multiline mode. The "end at original start point" bug is fixed, and if the end
point is found to be before the start point, they are swapped.

25. When PCRE2_FIRSTLINE without PCRE2_NO_START_OPTIMIZE was used in non-JIT
matching (both pcre2_match() and pcre2_dfa_match()) and the matched string
started with the first code unit of a newline sequence, matching failed because
it was not tried at the newline.

26.
Code for giving up a non-partial match after failing to find a starting
code unit anywhere in the subject was missing when searching for one of a
number of code units (the bitmap case) in both pcre2_match() and
pcre2_dfa_match(). This was a missing optimization rather than a bug.

27. Tidied up the ACROSSCHAR macro to be like FORWARDCHAR and BACKCHAR, using a
pointer argument rather than a code unit value. This should not have affected
the generated code.

28. The JIT compiler has been updated.

29. Avoid pointer overflow for unset captures in pcre2_substring_list_get().
This could not actually cause a crash because it was always used in a memcpy()
call with zero length.

30. Some internal structures have a variable-length ovector[] as their last
element. Their actual memory is obtained dynamically, giving an ovector of
appropriate length. However, they are defined in the structure as
ovector[NUMBER], where NUMBER is large so that array bound checkers don't
grumble. The value of NUMBER was 10000, but a fuzzer exceeded 5000 capturing
groups, making the ovector larger than this. The number has been increased to
131072, which allows for the maximum number of captures (65535) plus the
overall match. This fixes oss-fuzz issue 5415.

31.
Auto-possessification at the end of a capturing group was dependent on what
follows the group (e.g. /(a+)b/ would auto-possessify the a+) but this caused
incorrect behaviour when the group was called recursively from elsewhere in the
pattern where something different might follow. This bug is an unforeseen
consequence of change #1 for 10.30 - the implementation of backtracking into
recursions. Iterators at the ends of capturing groups are no longer considered
for auto-possessification if the pattern contains any recursions. Fixes
Bugzilla #2232.


Version 10.30 14-August-2017
----------------------------

1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
hard-to-do issues such as #1887 in Bugzilla. The code is also cleaner because
the old code had a number of fudges to try to reduce stack usage. It seems to
run no slower than the old code.

A number of bugs in the refactored code were subsequently fixed during testing
before release, but after the code was made available in the repository.
These bugs were never in fully released code, but are noted here for the
record.

  (a) If a pattern had fewer capturing parentheses than the ovector supplied in
      the match data block, a memory error (detectable by ASAN) occurred after
      a match, because the external block was being set from non-existent
      internal ovector fields. Fixes oss-fuzz issue 781.

  (b) A pattern with very many capturing parentheses (when the internal frame
      size was greater than the initial frame vector on the stack) caused a
      crash. A vector on the heap is now set up at the start of matching if the
      vector on the stack is not big enough to handle at least 10 frames.
      Fixes oss-fuzz issue 783.

  (c) Handling of (*VERB)s in recursions was wrong in some cases.

  (d) Captures in negative assertions that were used as conditions were not
      happening if the assertion matched via (*ACCEPT).

  (e) Mark values were not being passed out of recursions.

  (f) Refactor some code in do_callout() to avoid picky compiler warnings about
      negative indices. Fixes oss-fuzz issue 1454.

  (g) Similarly refactor the way the variable length ovector is addressed for
      similar reasons. Fixes oss-fuzz issue 1465.

2.
Now that pcre2_match() no longer uses recursive function calls (see above),
the "match limit recursion" value seems misnamed. It still exists, and limits
the depth of tree that is searched. To avoid future confusion, it has been
renamed as "depth limit" in all relevant places (--with-depth-limit,
(*LIMIT_DEPTH), pcre2_set_depth_limit(), etc.) but the old names are still
available for backwards compatibility.

3. Hardened pcre2test so as to reduce the number of bugs reported by fuzzers:

  (a) Check for malloc failures when getting memory for the ovector (POSIX) or
      the match data block (non-POSIX).

4. In the 32-bit library in non-UTF mode, an attempt to find a Unicode property
for a character with a code point greater than 0x10ffff (the Unicode maximum)
caused a crash.

5. If a lookbehind assertion that contained a back reference to a group
appearing later in the pattern was compiled with the PCRE2_ANCHORED option,
undefined actions (often a segmentation fault) could occur, depending on what
other options were set. An example assertion is (?<!\1(abc)) where the
reference \1 precedes the group (abc). This fixes oss-fuzz issue 865.

6. Added the PCRE2_INFO_FRAMESIZE item to pcre2_pattern_info() and arranged for
pcre2test to use it to output the frame size when the "framesize" modifier is
given.
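
As an illustration of the kind of problem described in item 4 above (a lookup
attempted for a code point above 0x10ffff), here is a minimal sketch of a range
check placed before a table access. It is not PCRE2's code and this log does
not show how the fix was actually made: plane_kind_of(), plane_table, and the
plane_kind values are hypothetical names invented for the example. The table
has one entry per Unicode plane, so an unchecked index of ch >> 16 would run
off the end for any larger 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: these names are hypothetical and are not PCRE2's. */
#define MAX_CODE_POINT 0x10ffffu   /* the Unicode maximum */
#define PLANE_COUNT    17          /* planes 0..16 cover 0..0x10ffff */

enum plane_kind {
  PLANE_NONE,           /* value is not a Unicode code point at all */
  PLANE_BMP,            /* plane 0 */
  PLANE_SUPPLEMENTARY,  /* planes 1-3 and 14 */
  PLANE_UNASSIGNED,     /* planes 4-13 */
  PLANE_PRIVATE_USE     /* planes 15-16 */
};

/* One entry per plane. Indexing with (ch >> 16) is only safe when
   ch <= MAX_CODE_POINT. */
static const uint8_t plane_table[PLANE_COUNT] = {
  PLANE_BMP,
  PLANE_SUPPLEMENTARY, PLANE_SUPPLEMENTARY, PLANE_SUPPLEMENTARY,
  PLANE_UNASSIGNED, PLANE_UNASSIGNED, PLANE_UNASSIGNED, PLANE_UNASSIGNED,
  PLANE_UNASSIGNED, PLANE_UNASSIGNED, PLANE_UNASSIGNED, PLANE_UNASSIGNED,
  PLANE_UNASSIGNED, PLANE_UNASSIGNED,
  PLANE_SUPPLEMENTARY,
  PLANE_PRIVATE_USE, PLANE_PRIVATE_USE
};

/* In a 32-bit non-UTF subject any uint32_t value can appear, so the range
   check has to come before the table access. */
enum plane_kind plane_kind_of(uint32_t ch)
{
  if (ch > MAX_CODE_POINT) return PLANE_NONE;
  return (enum plane_kind)plane_table[ch >> 16];
}
```

With this guard, a call such as plane_kind_of(0x110000) yields PLANE_NONE
instead of reading beyond the 17-entry table.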

7. Reworked the recursive pattern matching in the JIT compiler to follow the
interpreter changes.

8. When the zero_terminate modifier was specified on a pcre2test subject line
for global matching, unpredictable things could happen. For example, in UTF-8
mode, the pattern //g,zero_terminate read random memory when matched against an
empty string with zero_terminate. This was a bug in pcre2test, not the library.

9. Moved some Windows-specific code in pcre2grep (introduced in 10.23/13) out
of the section that is compiled when Unix-style directory scanning is
available, and into a new section that is always compiled for Windows.

10. In pcre2test, explicitly close the file after an error during serialization
or deserialization (the "load" or "save" commands).

11. Fix memory leak in pcre2_serialize_decode() when the input is invalid.

12. Fix potential NULL dereference in pcre2_callout_enumerate() if called with
a NULL pattern pointer when Unicode support is available.

13. When the 32-bit library was being tested by pcre2test, error messages that
were longer than 64 code units could cause a buffer overflow. This was a bug in
pcre2test.

14.
The alternative matching function, pcre2_dfa_match(), misbehaved if it
encountered a character class with a possessive repeat, for example [a-f]{3}+.

15. The depth (formerly recursion) limit now applies to DFA matching (as
of 10.23/36); pcre2test has been upgraded so that \=find_limits works with DFA
matching to find the minimum value for this limit.

16. Since 10.21, if pcre2_match() was called with a null context, default
memory allocation functions were used instead of whatever was used when the
pattern was compiled.

17. Changes to the pcre2test "memory" modifier on a subject line. These apply
only to pcre2_match():

  (a) Warn if null_context is set on both pattern and subject, because the
      memory details cannot then be shown.

  (b) Remember (up to a certain number of) memory allocations and their
      lengths, and list only the lengths, so as to be system-independent.
      (In practice, the new interpreter never has more than 2 blocks allocated
      simultaneously.)

18. Make pcre2test detect an error return from pcre2_get_error_message(), give
a message, and abandon the run (this would have detected #13 above).

19. Implemented PCRE2_ENDANCHORED.

20.
Applied Jason Hood's patches (slightly modified) to pcre2grep, to implement
the --output=text (-O) option and the inbuilt callout echo.

21. Extend auto-anchoring etc. to ignore groups with a zero quantifier and
single-branch conditions with a false condition (e.g. DEFINE) at the start of a
branch. For example, /(?(DEFINE)...)^A/ and /(...){0}^B/ are now flagged as
anchored.

22. Added an explicit limit on the amount of heap used by pcre2_match(), set by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). Upgraded pcre2test to show the
heap limit along with other pattern information, and to find the minimum when
the find_limits modifier is set.

23. Write to the last 8 bytes of the pcre2_real_code structure when a compiled
pattern is set up so as to initialize any padding the compiler might have
included. This avoids valgrind warnings when a compiled pattern is copied, in
particular when it is serialized.

24. Remove a redundant line of code left in accidentally a long time ago.

25. Remove a duplication typo in pcre2_tables.c.

26. Correct an incorrect cast in pcre2_valid_utf.c.

27. Update pcre2test, remove some unused code in pcre2_match(), and upgrade the
tests to improve coverage.

28.
Some fixes/tidies as a result of looking at Coverity Scan output:

  (a) Typo: ">" should be ">=" in opcode check in pcre2_auto_possess.c.
  (b) Added some casts to avoid "suspicious implicit sign extension".
  (c) Resource leaks in pcre2test in rare error cases.
  (d) Avoid warning for never-used case OP_TABLE_LENGTH which is just a fudge
      for checking at compile time that tables are the right size.
  (e) Add missing "fall through" comment.

29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features.

30. Implement (?n: for PCRE2_NO_AUTO_CAPTURE, because Perl now has this.

31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
pcre2test, a crash could occur.

32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16MiB) so
that all the tests can run with clang's sanitizing options.

33. Implement extra compile options in the compile context and add the first
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.

34. Implement newline type PCRE2_NEWLINE_NUL.

35. A lookbehind assertion that had a zero-length branch caused undefined
behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859.

36.
The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.

37. Implement PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

38. Fix returned offsets from regexec() when REG_STARTEND is used with a
starting offset greater than zero.

39. Implement REG_PEND (GNU extension) for the POSIX wrapper.

40. Implement the subject_literal modifier in pcre2test, and allow jitstack on
pattern lines.

41. Implement PCRE2_LITERAL and use it to support REG_NOSPEC.

42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
of pcre2grep.

43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:

  (a) The -F option did not work for fixed strings containing \E.
  (b) The -w option did not work for patterns with multiple branches.

44. Added configuration options for the SELinux compatible execmem allocator in
JIT.

45.
Increased the limit for searching for a "must be present" code unit in 1773*22dc650dSSadaf Ebrahimisubjects from 1000 to 2000 for 8-bit searches, since they use memchr() and are 1774*22dc650dSSadaf Ebrahimimuch faster. 1775*22dc650dSSadaf Ebrahimi 1776*22dc650dSSadaf Ebrahimi46. Arrange for anchored patterns to record and use "first code unit" data, 1777*22dc650dSSadaf Ebrahimibecause this can give a fast "no match" without searching for a "required code 1778*22dc650dSSadaf Ebrahimiunit". Previously only non-anchored patterns did this. 1779*22dc650dSSadaf Ebrahimi 1780*22dc650dSSadaf Ebrahimi47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0. 1781*22dc650dSSadaf Ebrahimi 1782*22dc650dSSadaf Ebrahimi48. Add the callout_no_where modifier to pcre2test. 1783*22dc650dSSadaf Ebrahimi 1784*22dc650dSSadaf Ebrahimi49. Update extended grapheme breaking rules to the latest set that are in 1785*22dc650dSSadaf EbrahimiUnicode Standard Annex #29. 1786*22dc650dSSadaf Ebrahimi 1787*22dc650dSSadaf Ebrahimi50. Added experimental foreign pattern conversion facilities 1788*22dc650dSSadaf Ebrahimi(pcre2_pattern_convert() and friends). 1789*22dc650dSSadaf Ebrahimi 1790*22dc650dSSadaf Ebrahimi51. Change the macro FWRITE, used in pcre2grep, to FWRITE_IGNORE because FWRITE 1791*22dc650dSSadaf Ebrahimiis defined in a system header in cygwin. Also modified some of the #ifdefs in 1792*22dc650dSSadaf Ebrahimipcre2grep related to Windows and Cygwin support. 1793*22dc650dSSadaf Ebrahimi 1794*22dc650dSSadaf Ebrahimi52. Change 3(g) for 10.23 was a bit too zealous. If a hyphen that follows a 1795*22dc650dSSadaf Ebrahimicharacter class is the last character in the class, Perl does not give a 1796*22dc650dSSadaf Ebrahimiwarning. PCRE2 now also treats this as a literal. 1797*22dc650dSSadaf Ebrahimi 1798*22dc650dSSadaf Ebrahimi53. 
Related to 52, though PCRE2 was throwing an error for [[:digit:]-X] it was 1799*22dc650dSSadaf Ebrahiminot doing so for [\d-X] (and similar escapes), as is documented. 1800*22dc650dSSadaf Ebrahimi 1801*22dc650dSSadaf Ebrahimi54. Fixed a MIPS issue in the JIT compiler reported by Joshua Kinard. 1802*22dc650dSSadaf Ebrahimi 1803*22dc650dSSadaf Ebrahimi55. Fixed a "maybe uninitialized" warning for class_uchardata in \p handling in 1804*22dc650dSSadaf Ebrahimipcre2_compile() which could never actually trigger (code should have been cut 1805*22dc650dSSadaf Ebrahimiout when Unicode support is disabled). 1806*22dc650dSSadaf Ebrahimi 1807*22dc650dSSadaf Ebrahimi 1808*22dc650dSSadaf EbrahimiVersion 10.23 14-February-2017 1809*22dc650dSSadaf Ebrahimi------------------------------ 1810*22dc650dSSadaf Ebrahimi 1811*22dc650dSSadaf Ebrahimi1. Extended pcre2test with the utf8_input modifier so that it is able to 1812*22dc650dSSadaf Ebrahimigenerate all possible 16-bit and 32-bit code unit values in non-UTF modes. 1813*22dc650dSSadaf Ebrahimi 1814*22dc650dSSadaf Ebrahimi2. In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode), without 1815*22dc650dSSadaf EbrahimiPCRE2_UCP set, a negative character type such as \D in a positive class should 1816*22dc650dSSadaf Ebrahimicause all characters greater than 255 to match, whatever else is in the class. 1817*22dc650dSSadaf EbrahimiThere was a bug that caused this not to happen if a Unicode property item was 1818*22dc650dSSadaf Ebrahimiadded to such a class, for example [\D\P{Nd}] or [\W\pL]. 1819*22dc650dSSadaf Ebrahimi 1820*22dc650dSSadaf Ebrahimi3. There has been a major re-factoring of the pcre2_compile.c file. Most syntax 1821*22dc650dSSadaf Ebrahimichecking is now done in the pre-pass that identifies capturing groups. This has 1822*22dc650dSSadaf Ebrahimireduced the amount of duplication and made the code tidier. 
While doing this, 1823*22dc650dSSadaf Ebrahimisome minor bugs and Perl incompatibilities were fixed, including: 1824*22dc650dSSadaf Ebrahimi 1825*22dc650dSSadaf Ebrahimi (a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead 1826*22dc650dSSadaf Ebrahimi of giving an invalid quantifier error. 1827*22dc650dSSadaf Ebrahimi 1828*22dc650dSSadaf Ebrahimi (b) {0} can now be used after a group in a lookbehind assertion; previously 1829*22dc650dSSadaf Ebrahimi this caused an "assertion is not fixed length" error. 1830*22dc650dSSadaf Ebrahimi 1831*22dc650dSSadaf Ebrahimi (c) Perl always treats (?(DEFINE) as a "define" group, even if a group with 1832*22dc650dSSadaf Ebrahimi the name "DEFINE" exists. PCRE2 now does likewise. 1833*22dc650dSSadaf Ebrahimi 1834*22dc650dSSadaf Ebrahimi (d) A recursion condition test such as (?(R2)...) must now refer to an 1835*22dc650dSSadaf Ebrahimi existing subpattern. 1836*22dc650dSSadaf Ebrahimi 1837*22dc650dSSadaf Ebrahimi (e) A conditional recursion test such as (?(R)...) misbehaved if there was a 1838*22dc650dSSadaf Ebrahimi group whose name began with "R". 1839*22dc650dSSadaf Ebrahimi 1840*22dc650dSSadaf Ebrahimi (f) When testing zero-terminated patterns under valgrind, the terminating 1841*22dc650dSSadaf Ebrahimi zero is now marked "no access". This catches bugs that would otherwise 1842*22dc650dSSadaf Ebrahimi show up only with non-zero-terminated patterns. 1843*22dc650dSSadaf Ebrahimi 1844*22dc650dSSadaf Ebrahimi (g) A hyphen appearing immediately after a POSIX character class (for example 1845*22dc650dSSadaf Ebrahimi /[[:ascii:]-z]/) now generates an error. Perl does accept this as a 1846*22dc650dSSadaf Ebrahimi literal, but gives a warning, so it seems best to fail it in PCRE. 1847*22dc650dSSadaf Ebrahimi 1848*22dc650dSSadaf Ebrahimi (h) An empty \Q\E sequence may appear after a callout that precedes an 1849*22dc650dSSadaf Ebrahimi assertion condition (it is, of course, ignored). 

One effect of the refactoring is that some error numbers and messages have
changed, and the pattern offset given for compiling errors is not always the
right-most character that has been read. In particular, for a variable-length
lookbehind assertion it now points to the start of the assertion. Another
change is that when a callout appears before a group, the "length of next
pattern item" that is passed now just gives the length of the opening
parenthesis item, not the length of the whole group. A length of zero is now
given only for a callout at the end of the pattern. Automatic callouts are no
longer inserted before and after explicit callouts in the pattern.

A number of bugs in the refactored code were subsequently fixed during testing
before release, but after the code was made available in the repository. Many
of the bugs were discovered by fuzz testing. Several of them were related to
the change from assuming a zero-terminated pattern (which previously had
required non-zero-terminated strings to be copied). These bugs were never in
fully released code, but are noted here for the record.

  (a) An overall recursion such as (?0) inside a lookbehind assertion was not
      being diagnosed as an error.

  (b) In utf mode, the length of a *MARK (or other verb) name was being checked
      in characters instead of code units, which could lead to bad code being
      compiled, leading to unpredictable behaviour.

  (c) In extended /x mode, characters whose code was greater than 255 caused
      a lookup outside one of the global tables. A similar bug existed for wide
      characters in *VERB names.

  (d) The amount of memory needed for a compiled pattern was miscalculated if a
      lookbehind contained more than one toplevel branch and the first branch
      was of length zero.

  (e) In UTF-8 or UTF-16 modes with PCRE2_EXTENDED (/x) set and a non-zero-
      terminated pattern, if a # comment ran on to the end of the pattern, one
      or more code units past the end were being read.

  (f) An unterminated repeat at the end of a non-zero-terminated pattern (e.g.
      "{2,2") could cause reading beyond the pattern.

  (g) When reading a callout string, if the end delimiter was at the end of the
      pattern one further code unit was read.

  (h) An unterminated number after \g' could cause reading beyond the pattern.

  (i) An insufficient memory size was being computed for compiling with
      PCRE2_AUTO_CALLOUT.

  (j) A conditional group with an assertion condition used more memory than was
      allowed for it during parsing, so too many of them could therefore
      overrun a buffer.

  (k) If parsing a pattern exactly filled the buffer, the internal test for
      overrun did not check when the final META_END item was added.

  (l) If a lookbehind contained a subroutine call, and the called group
      contained an option setting such as (?s), and the PCRE2_ANCHORED option
      was set, unpredictable behaviour could occur. The underlying bug was
      incorrect code and insufficient checking while searching for the end of
      the called subroutine in the parsed pattern.

  (m) Quantifiers following (*VERB)s were not being diagnosed as errors.

  (n) The use of \Q...\E in a (*VERB) name when PCRE2_ALT_VERBNAMES and
      PCRE2_AUTO_CALLOUT were both specified caused undetermined behaviour.

  (o) If \Q was preceded by a quantified item, and the following \E was
      followed by '?' or '+', and there was at least one literal character
      between them, an internal error "unexpected repeat" occurred (example:
      /.+\QX\E+/).

  (p) A buffer overflow could occur while sorting the names in the group name
      list (depending on the order in which the names were seen).

  (q) A conditional group that started with a callout was not doing the right
      check for a following assertion, leading to compiling bad code. Example:
      /(?(C'XX))?!XX/

  (r) If a character whose code point was greater than 0xffff appeared within
      a lookbehind that was within another lookbehind, the calculation of the
      lookbehind length went wrong and could provoke an internal error.

  (t) The sequence \E- or \Q\E- after a POSIX class in a character class caused
      an internal error. Now the hyphen is treated as a literal.

4. Back references are now permitted in lookbehind assertions when there are
no duplicated group numbers (that is, (?| has not been used), and, if the
reference is by name, there is only one group of that name. The referenced
group must, of course, be of fixed length.

5. pcre2test has been upgraded so that, when run under valgrind with valgrind
support enabled, reading past the end of the pattern is detected, both when
compiling and during callout processing.

6. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
not recognize this syntax.

7. Automatic callouts are no longer generated before and after callouts in the
pattern.

8. When pcre2test was outputting information from a callout, the caret indicator
for the current position in the subject line was incorrect if it was after an
escape sequence for a character whose code point was greater than \x{ff}.

9. Change 19 for 10.22 had a typo (PCRE_STATIC_RUNTIME should be
PCRE2_STATIC_RUNTIME). Fix from David Gaussmann.

10. Added --max-buffer-size to pcre2grep, to allow for automatic buffer
expansion when long lines are encountered. Original patch by Dmitry
Cherniachenko.

11. If pcre2grep was compiled with JIT support, but the library was compiled
without it (something that neither ./configure nor CMake allows, but it can be
done by editing config.h), pcre2grep was giving a JIT error. Now it detects
this situation and does not try to use JIT.

12. Added some "const" qualifiers to variables in pcre2grep.

13. Added Dmitry Cherniachenko's patch for colouring output in Windows
(untested by me). Also, look for GREP_COLOUR or GREP_COLOR if the environment
variables PCRE2GREP_COLOUR and PCRE2GREP_COLOR are not found.

14. Add the -t (grand total) option to pcre2grep.

15. A number of bugs have been mended relating to match start-up optimizations
when the first thing in a pattern is a positive lookahead. These all applied
only when PCRE2_NO_START_OPTIMIZE was *not* set:

  (a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed
      both an initial 'X' and a following 'X'.
  (b) Some patterns starting with an assertion that started with .* were
      incorrectly optimized as having to match at the start of the subject or
      after a newline. There are cases where this is not true, for example,
      (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that
      start with spaces. Starting .* in an assertion is no longer taken as an
      indication of matching at the start (or after a newline).

16. The "offset" modifier in pcre2test was not being ignored (as documented)
when the POSIX API was in use.

17. Added --enable-fuzz-support to "configure", causing a non-installed
library containing a test function that can be called by fuzzers to be
compiled. A non-installed binary to run the test function locally, called
pcre2fuzzcheck, is also compiled.

18. A pattern with PCRE2_DOTALL (/s) set but not PCRE2_NO_DOTSTAR_ANCHOR, and
which started with .* inside a positive lookahead was incorrectly being
compiled as implicitly anchored.

19. Removed all instances of "register" declarations, as they are considered
obsolete these days and in any case had become very haphazard.

20. Add strerror() to pcre2test for failed file opening.

21. Make pcre2test -C list valgrind support when it is enabled.

22. Add the use_length modifier to pcre2test.

23. Fix an off-by-one bug in pcre2test for the list of names for 'get' and
'copy' modifiers.

24. Add PCRE2_CALL_CONVENTION into the prototype declarations in pcre2.h as it
is apparently needed there as well as in the function definitions. (Why did
nobody ask for this in PCRE1?)

25. Change the _PCRE2_H and _PCRE2_UCP_H guard macros in the header files to
PCRE2_H_IDEMPOTENT_GUARD and PCRE2_UCP_H_IDEMPOTENT_GUARD to be more standard
compliant and unique.

26. pcre2-config --libs-posix was listing -lpcre2posix instead of
-lpcre2-posix. Also, the CMake build process was building the library with the
wrong name.

27. In pcre2test, give some offset information for errors in hex patterns.
This uses the C99 formatting sequence %td, except for MSVC which doesn't
support it - %lu is used instead.

28. Implemented pcre2_code_copy_with_tables(), and added pushtablescopy to
pcre2test for testing it.

29. Fix small memory leak in pcre2test.

30. Fix out-of-bounds read for partial matching of /./ against an empty string
when the newline type is CRLF.

31. Fix a bug in pcre2test that caused a crash when a locale was set either in
the current pattern or a previous one and a wide character was matched.

32. The appearance of \p, \P, or \X in a substitution string when
PCRE2_SUBSTITUTE_EXTENDED was set caused a segmentation fault (NULL
dereference).

33. If the starting offset was specified as greater than the subject length in
a call to pcre2_substitute() an out-of-bounds memory reference could occur.

34. When PCRE2 was compiled to use the heap instead of the stack for recursive
calls to match(), a repeated minimizing caseless back reference, or a
maximizing one where the two cases had different numbers of code units,
followed by a caseful back reference, could lose the caselessness of the first
repeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX
but didn't).

35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum
matching length and just records zero. Typically this happens when there are
too many nested or recursive back references. If the limit was reached in
certain recursive cases it failed to be triggered and an internal error could
be the result.

36. The pcre2_dfa_match() function now takes note of the recursion limit for
the internal recursive calls that are used for lookrounds and recursions within
the pattern.

37. More refactoring has got rid of the internal could_be_empty_branch()
function (around 400 lines of code, including comments) by keeping track of
could-be-emptiness as the pattern is compiled instead of scanning compiled
groups. (This would have been much harder before the refactoring of #3 above.)
This lifts a restriction on the number of branches in a group (more than about
1100 would give "pattern is too complicated").

38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
auto_callout".

39. In a library with Unicode support, incorrect data was compiled for a
pattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide
characters to match (for example, /[\s[:^ascii:]]/).

40. The callout_error modifier has been added to pcre2test to make it possible
to return PCRE2_ERROR_CALLOUT from a callout.

41. A minor change to pcre2grep: colour reset is now "<esc>[0m" instead of
"<esc>[00m".

42. The limit in the auto-possessification code that was intended to catch
overly-complicated patterns and not spend too much time auto-possessifying was
being reset too often, resulting in very long compile times for some patterns.
Now such patterns are no longer completely auto-possessified.

43. Applied Jason Hood's revised patch for RunTest.bat.

44. Added a new Windows script RunGrepTest.bat, courtesy of Jason Hood.

45. Minor cosmetic fix to pcre2test: move a variable that is not used under
Windows into the "not Windows" code.

46. Applied Jason Hood's patches to upgrade pcre2grep under Windows and tidy
some of the code:

  * normalised the Windows condition by ensuring WIN32 is defined;
  * enables the callout feature under Windows;
  * adds globbing (Microsoft's implementation expands quoted args),
    using a tweaked opendirectory;
  * implements the is_*_tty functions for Windows;
  * --color=always will write the ANSI sequences to file;
  * add sequences 4 (underline works on Win10) and 5 (blink as bright
    background, relatively standard on DOS/Win);
  * remove the (char *) casts for the now-const strings;
  * remove GREP_COLOUR (grep's command line allowed the 'u', but not
    the environment), parsing GREP_COLORS instead;
  * uses the current colour if not set, rather than black;
  * add print_match for the undefined case;
  * fixes a typo.

In addition, colour settings containing anything other than digits and
semicolon are ignored, and the colour controls are no longer output for empty
strings.

47. Detecting patterns that are too large inside the length-measuring loop
saves processing ridiculously long patterns to their end.

48. Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it
just wastes time. In the UTF case it can also produce redundant entries in
XCLASS lists caused by characters with multiple other cases and pairs of
characters in the same "not-x" sublists.

49. A pattern such as /(?=(a\K))/ can report the end of the match being before
its start; pcre2test was not handling this correctly when using the POSIX
interface (it was OK with the native interface).

50. In pcre2grep, ignore all JIT compile errors. This means that pcre2grep will
continue to work, falling back to interpretation if anything goes wrong with
JIT.

51. Applied patches from Christian Persch to configure.ac to make use of the
AC_USE_SYSTEM_EXTENSIONS macro and to test for functions used by the JIT
modules.

52. Minor fixes to pcre2grep from Jason Hood:
  * fixed some spacing;
  * Windows doesn't usually use single quotes, so I've added a define
    to use appropriate quotes [in an example];
  * LC_ALL was displayed as "LCC_ALL";
  * numbers 11, 12 & 13 should end in "th";
  * use double quotes in usage message.

53. When autopossessifying, skip empty branches without recursion, to reduce
stack usage for the benefit of clang with -fsanitize=address, which uses huge
stack frames. Example pattern: /X?(R||){3335}/. Fixes oss-fuzz issue 553.
2148*22dc650dSSadaf Ebrahimi 2149*22dc650dSSadaf Ebrahimi54. A pattern with very many explicit back references to a group that is a long 2150*22dc650dSSadaf Ebrahimiway from the start of the pattern could take a long time to compile because 2151*22dc650dSSadaf Ebrahimisearching for the referenced group in order to find the minimum length was 2152*22dc650dSSadaf Ebrahimibeing done repeatedly. Now up to 128 group minimum lengths are cached and the 2153*22dc650dSSadaf Ebrahimiattempt to find a minimum length is abandoned if there is a back reference to a 2154*22dc650dSSadaf Ebrahimigroup whose number is greater than 128. (In that case, the pattern is so 2155*22dc650dSSadaf Ebrahimicomplicated that this optimization probably isn't worth it.) This fixes 2156*22dc650dSSadaf Ebrahimioss-fuzz issue 557. 2157*22dc650dSSadaf Ebrahimi 2158*22dc650dSSadaf Ebrahimi55. Issue 32 for 10.22 below was not correctly fixed. If pcre2grep in multiline 2159*22dc650dSSadaf Ebrahimimode with --only-matching matched several lines, it restarted scanning at the 2160*22dc650dSSadaf Ebrahiminext line instead of moving on to the end of the matched string, which can be 2161*22dc650dSSadaf Ebrahimiseveral lines after the start. 2162*22dc650dSSadaf Ebrahimi 2163*22dc650dSSadaf Ebrahimi56. Applied Jason Hood's new patch for RunGrepTest.bat that updates it in line 2164*22dc650dSSadaf Ebrahimiwith updates to the non-Windows version. 2165*22dc650dSSadaf Ebrahimi 2166*22dc650dSSadaf Ebrahimi 2167*22dc650dSSadaf Ebrahimi 2168*22dc650dSSadaf EbrahimiVersion 10.22 29-July-2016 2169*22dc650dSSadaf Ebrahimi-------------------------- 2170*22dc650dSSadaf Ebrahimi 2171*22dc650dSSadaf Ebrahimi1. Applied Jason Hood's patches to RunTest.bat and testdata/wintestoutput3 2172*22dc650dSSadaf Ebrahimito fix problems with running the tests under Windows. 2173*22dc650dSSadaf Ebrahimi 2174*22dc650dSSadaf Ebrahimi2. 
Implemented a facility for quoting literal characters within hexadecimal 2175*22dc650dSSadaf Ebrahimipatterns in pcre2test, to make it easier to create patterns with just a few 2176*22dc650dSSadaf Ebrahiminon-printing characters. 2177*22dc650dSSadaf Ebrahimi 2178*22dc650dSSadaf Ebrahimi3. Binary zeros are not supported in pcre2test input files. It now detects them 2179*22dc650dSSadaf Ebrahimiand gives an error. 2180*22dc650dSSadaf Ebrahimi 2181*22dc650dSSadaf Ebrahimi4. Updated the valgrind parameters in RunTest: (a) changed smc-check=all to 2182*22dc650dSSadaf Ebrahimismc-check=all-non-file; (b) changed obj:* in the suppression file to obj:??? so 2183*22dc650dSSadaf Ebrahimithat it matches only unknown objects. 2184*22dc650dSSadaf Ebrahimi 2185*22dc650dSSadaf Ebrahimi5. Updated the maintenance script maint/ManyConfigTests to make it easier to 2186*22dc650dSSadaf Ebrahimiselect individual groups of tests. 2187*22dc650dSSadaf Ebrahimi 2188*22dc650dSSadaf Ebrahimi6. When the POSIX wrapper function regcomp() is called, the REG_NOSUB option 2189*22dc650dSSadaf Ebrahimiused to set PCRE2_NO_AUTO_CAPTURE when calling pcre2_compile(). However, this 2190*22dc650dSSadaf Ebrahimidisables the use of back references (and subroutine calls), which are supported 2191*22dc650dSSadaf Ebrahimiby other implementations of regcomp() with RE_NOSUB. Therefore, REG_NOSUB no 2192*22dc650dSSadaf Ebrahimilonger causes PCRE2_NO_AUTO_CAPTURE to be set, though it still ignores nmatch 2193*22dc650dSSadaf Ebrahimiand pmatch when regexec() is called. 2194*22dc650dSSadaf Ebrahimi 2195*22dc650dSSadaf Ebrahimi7. Because of 6 above, pcre2test has been modified with a new modifier called 2196*22dc650dSSadaf Ebrahimiposix_nosub, to call regcomp() with REG_NOSUB. Previously the no_auto_capture 2197*22dc650dSSadaf Ebrahimimodifier had this effect. That option is now ignored when the POSIX API is in 2198*22dc650dSSadaf Ebrahimiuse. 2199*22dc650dSSadaf Ebrahimi 2200*22dc650dSSadaf Ebrahimi8. 
Minor tidies to the pcre2demo.c sample program, including more comments 2201*22dc650dSSadaf Ebrahimiabout its 8-bit-ness. 2202*22dc650dSSadaf Ebrahimi 2203*22dc650dSSadaf Ebrahimi9. Detect unmatched closing parentheses and give the error in the pre-scan 2204*22dc650dSSadaf Ebrahimiinstead of later. Previously the pre-scan carried on and could give a 2205*22dc650dSSadaf Ebrahimimisleading incorrect error message. For example, /(?J)(?'a'))(?'a')/ gave a 2206*22dc650dSSadaf Ebrahimimessage about invalid duplicate group names. 2207*22dc650dSSadaf Ebrahimi 2208*22dc650dSSadaf Ebrahimi10. It has happened that pcre2test was accidentally linked with another POSIX 2209*22dc650dSSadaf Ebrahimiregex library instead of libpcre2-posix. In this situation, a call to regcomp() 2210*22dc650dSSadaf Ebrahimi(in the other library) may succeed, returning zero, but of course putting its 2211*22dc650dSSadaf Ebrahimiown data into the regex_t block. In one example the re_pcre2_code field was 2212*22dc650dSSadaf Ebrahimileft as NULL, which made pcre2test think it had not got a compiled POSIX regex, 2213*22dc650dSSadaf Ebrahimiso it treated the next line as another pattern line, resulting in a confusing 2214*22dc650dSSadaf Ebrahimierror message. A check has been added to pcre2test to see if the data returned 2215*22dc650dSSadaf Ebrahimifrom a successful call of regcomp() are valid for PCRE2's regcomp(). If they 2216*22dc650dSSadaf Ebrahimiare not, an error message is output and the pcre2test run is abandoned. The 2217*22dc650dSSadaf Ebrahimimessage points out the possibility of a mis-linking. Hopefully this will avoid 2218*22dc650dSSadaf Ebrahimisome head-scratching the next time this happens. 2219*22dc650dSSadaf Ebrahimi 2220*22dc650dSSadaf Ebrahimi11. 
A pattern such as /(?<=((?C)0))/, which has a callout inside a lookbehind
assertion, caused pcre2test to output a very large number of spaces when the
callout was taken, making the program appear to loop.

12. A pattern that included (*ACCEPT) in the middle of a sufficiently deeply
nested set of parentheses of sufficient size caused an overflow of the
compiling workspace (which was diagnosed, but of course is not desirable).

13. Detect missing closing parentheses during the pre-pass for group
identification.

14. Changed some integer variable types and put in a number of casts, following
a report of compiler warnings from Visual Studio 2013 and a few tests with
gcc's -Wconversion (which still throws up a lot).

15. Implemented pcre2_code_copy(), and added #pushcopy and #popcopy to pcre2test
for testing it.

16. Change 66 for 10.21 introduced the use of snprintf() in PCRE2's version of
regerror(). When the error buffer is too small, my version of snprintf() puts a
binary zero in the final byte. Bug #1801 seems to show that other versions do
not do this, leading to bad output from pcre2test when it was checking for
buffer overflow. It no longer assumes a binary zero at the end of a too-small
regerror() buffer.

17. Fixed typo ("&&" for "&") in pcre2_study().
Fortunately, this could not
actually affect anything, by sheer luck.

18. Two minor fixes for MSVC compilation: (a) removal of apparently incorrect
"const" qualifiers in pcre2test and (b) defining snprintf as _snprintf for
older MSVC compilers. This has been done both in src/pcre2_internal.h for most
of the library, and also in src/pcre2posix.c, which no longer includes
pcre2_internal.h (see 24 below).

19. Applied Chris Wilson's patch (Bugzilla #1681) to CMakeLists.txt for MSVC
static compilation. Subsequently applied Chris Wilson's second patch, putting
the first patch under a new option instead of being unconditional when
PCRE_STATIC is set.

20. Updated pcre2grep to set stdout as binary when run under Windows, so as not
to convert \r\n at the ends of reflected lines into \r\r\n. This required
ensuring that other output that is written to stdout (e.g. file names) uses the
appropriate line terminator: \r\n for Windows, \n otherwise.

21. When a line is too long for pcre2grep's internal buffer, show the maximum
length in the error message.

22. Added support for string callouts to pcre2grep (Zoltan's patch with PH
additions).

23. RunTest.bat was missing a "set type" line for test 22.

24.
The pcre2posix.c file was including pcre2_internal.h, and using some
"private" knowledge of the data structures. This is unnecessary; the code has
been re-factored and no longer includes pcre2_internal.h.

25. A race condition in JIT, reported by Mozilla, has been fixed.

26. Minor code refactor to avoid "array subscript is below array bounds"
compiler warning.

27. Minor code refactor to avoid "left shift of negative number" warning.

28. Add a bit more sanity checking to pcre2_serialize_decode() and document
that it expects trusted data.

29. Fix typo in pcre2_jit_test.c

30. Due to an oversight, pcre2grep was not making use of JIT when available.
This is now fixed.

31. The RunGrepTest script is updated to use the valgrind suppressions file
when testing with JIT under valgrind (compare 10.21/51 below). The suppressions
file is updated so that it is now the same as for PCRE1: it suppresses the
Memcheck warnings Addr16 and Cond in unknown objects (that is, JIT-compiled
code). Also changed smc-check=all to smc-check=all-non-file as was done for
RunTest (see 4 above).

32. Implemented the PCRE2_NO_JIT option for pcre2_match().

33. Fix typo that gave a compiler error when JIT not supported.

34. Fix comment describing the returns from find_fixedlength().

35. Fix potential negative index in pcre2test.

36. Calls to pcre2_get_error_message() with error numbers that are never
returned by PCRE2 functions were returning empty strings. Now the error code
PCRE2_ERROR_BADDATA is returned. A facility has been added to pcre2test to
show the texts for given error numbers (i.e. to call pcre2_get_error_message()
and display what it returns) and a few representative error codes are now
checked in RunTest.

37. Added "&& !defined(__INTEL_COMPILER)" to the test for __GNUC__ in
pcre2_match.c, in anticipation that this is needed for the same reason it was
recently added to pcrecpp.cc in PCRE1.

38. Using -o with -M in pcre2grep could cause unnecessary repeated output when
the match extended over a line boundary, as it tried to find more matches "on
the same line" - but it was already over the end.

39. Allow \C in lookbehinds and DFA matching in UTF-32 mode (by converting it
to the same code as '.' when PCRE2_DOTALL is set).

40. Fix two clang compiler warnings in pcre2test when only one code unit width
is supported.

41.
Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
if it fails when running the interpreter with a 16MiB stack (and if changing
the stack size via pcre2test is possible). This avoids having to manually set a
large stack size when testing with clang.

42. Fix register overwrite in JIT when SSE2 acceleration is enabled.

43. Detect integer overflow in pcre2test pattern and data repetition counts.

44. In pcre2test, ignore "allcaptures" after DFA matching.

45. Fix unaligned accesses on x86. Patch by Marc Mutz.

46. Fix some more clang compiler warnings.


Version 10.21 12-January-2016
-----------------------------

1. Improve matching speed of patterns starting with + or * in JIT.

2. Use memchr() to find the first character in an unanchored match in 8-bit
mode in the interpreter. This gives a significant speed improvement.

3. Removed a redundant copy of the opcode_possessify table in the
pcre2_auto_possessify.c source.

4. Fix typos in dftables.c for z/OS.

5.
Change 36 for 10.20 broke the handling of [[:>:]] and [[:<:]] in that
processing them could involve a buffer overflow if the following character was
an opening parenthesis.

6. Change 36 for 10.20 also introduced a bug in processing this pattern:
/((?x)(*:0))#(?'/. Specifically: if a setting of (?x) was followed by a (*MARK)
setting (which (*:0) is), then (?x) did not get unset at the end of its group
during the scan for named groups, and hence the external # was incorrectly
treated as a comment and the invalid (?' at the end of the pattern was not
diagnosed. This caused a buffer overflow during the real compile. This bug was
discovered by Karl Skomski with the LLVM fuzzer.

7. Moved the pcre2_find_bracket() function from src/pcre2_compile.c into its
own source module to avoid a circular dependency between src/pcre2_compile.c
and src/pcre2_study.c

8. A callout with a string argument containing an opening square bracket, for
example /(?C$[$)(?<]/, was incorrectly processed and could provoke a buffer
overflow. This bug was discovered by Karl Skomski with the LLVM fuzzer.

9. The handling of callouts during the pre-pass for named group identification
has been tightened up.

10. The quantifier {1} can be ignored, whether greedy, non-greedy, or
possessive. This is a very minor optimization.
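
A minimal sketch of the idea behind change 2 above: memchr() can jump straight
to the positions where an unanchored 8-bit match could begin, instead of
testing every byte in turn. This is not the PCRE2 source; the helper names
find_first_candidate() and count_candidates() are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first occurrence of the
   required first byte in [start, end), or NULL if there is none. */
const unsigned char *
find_first_candidate(const unsigned char *start, const unsigned char *end,
                     unsigned char first_byte)
{
  assert(start != NULL && end != NULL && start <= end);
  if (start == end) return NULL;
  return memchr(start, first_byte, (size_t)(end - start));
}

/* Hypothetical helper: count the positions at which a match attempt would
   be started. The subject is length-delimited, so embedded zero bytes are
   scanned like any other byte. */
size_t
count_candidates(const char *subject, size_t length, char first_byte)
{
  const unsigned char *p = (const unsigned char *)subject;
  const unsigned char *end = p + length;
  size_t count = 0;

  while ((p = find_first_candidate(p, end, (unsigned char)first_byte)) != NULL)
    {
    count++;   /* a real matcher would attempt a full match here */
    p++;       /* resume the scan after this candidate */
    }
  return count;
}
```

Only the candidate positions are visited by the caller; the bytes in between
are skipped inside memchr(), which is typically well optimized by the C
library.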

11. A possessively repeated conditional group that could match an empty string,
for example, /(?(R))*+/, was incorrectly compiled.

12. The Unicode tables have been updated to Unicode 8.0.0 (thanks to Christian
Persch).

13. An empty comment (?#) in a pattern was incorrectly processed and could
provoke a buffer overflow. This bug was discovered by Karl Skomski with the
LLVM fuzzer.

14. Fix infinite recursion in the JIT compiler when certain patterns such as
/(?:|a|){100}x/ are analysed.

15. Some patterns with character classes involving [: and \\ were incorrectly
compiled and could cause reading from uninitialized memory or an incorrect
error diagnosis. Examples are: /[[:\\](?<[::]/ and /[[:\\](?'abc')[a:]. The
first of these bugs was discovered by Karl Skomski with the LLVM fuzzer.

16. Pathological patterns containing many nested occurrences of [: caused
pcre2_compile() to run for a very long time. This bug was found by the LLVM
fuzzer.

17. A missing closing parenthesis for a callout with a string argument was not
being diagnosed, possibly leading to a buffer overflow. This bug was found by
the LLVM fuzzer.

18.
A conditional group with only one branch has an implicit empty alternative
branch and must therefore be treated as potentially matching an empty string.

19. If (?R was followed by - or + incorrect behaviour happened instead of a
diagnostic. This bug was discovered by Karl Skomski with the LLVM fuzzer.

20. Another bug that was introduced by change 36 for 10.20: conditional groups
whose condition was an assertion preceded by an explicit callout with a string
argument might be incorrectly processed, especially if the string contained \Q.
This bug was discovered by Karl Skomski with the LLVM fuzzer.

21. Compiling PCRE2 with the sanitize options of clang showed up a number of
very pedantic coding infelicities and a buffer overflow while checking a UTF-8
string if the final multi-byte UTF-8 character was truncated.

22. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
class, where both values are literal letters in the same case, omit the
non-letter EBCDIC code points within the range.

23. Finding the minimum matching length of complex patterns with back
references and/or recursions can take a long time. There is now a cut-off that
gives up trying to find a minimum length when things get too complex.

24.
An optimization has been added that speeds up finding the minimum matching
length for patterns containing repeated capturing groups or recursions.

25. If a pattern contained a back reference to a group whose number was
duplicated as a result of appearing in a (?|...) group, the computation of the
minimum matching length gave a wrong result, which could cause incorrect "no
match" errors. For such patterns, a minimum matching length cannot at present
be computed.

26. Added a check for integer overflow in conditions (?(<digits>) and
(?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
fuzzer.

27. Fixed an issue when \p{Any} inside an xclass did not read the current
character.

28. If pcre2grep was given the -q option with -c or -l, or when handling a
binary file, it incorrectly wrote output to stdout.

29. The JIT compiler did not restore the control verb head in case of *THEN
control verbs. This issue was found by Karl Skomski with a custom LLVM fuzzer.

30. The way recursive references such as (?3) are compiled has been re-written
because the old way was the cause of many issues. Now, conversion of the group
number into a pattern offset does not happen until the pattern has been
completely compiled.
This does mean that detection of all infinitely looping
recursions is postponed till match time. In the past, some easy ones were
detected at compile time. This re-writing was done in response to yet another
bug found by the LLVM fuzzer.

31. A test for a back reference to a non-existent group was missing for items
such as \987. This caused incorrect code to be compiled. This issue was found
by Karl Skomski with a custom LLVM fuzzer.

32. Error messages for syntax errors following \g and \k were giving inaccurate
offsets in the pattern.

33. Improve the performance of starting single character repetitions in JIT.

34. (*LIMIT_MATCH=) now gives an error instead of setting the value to 0.

35. Error messages for syntax errors in *LIMIT_MATCH and *LIMIT_RECURSION now
give the right offset instead of zero.

36. The JIT compiler should not check repeats after a {0,1} repeat byte code.
This issue was found by Karl Skomski with a custom LLVM fuzzer.

37. The JIT compiler should restore the control chain for empty possessive
repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.

38. A bug which was introduced by the single character repetition optimization
was fixed.

39.
Match limit check added to recursion. This issue was found by Karl Skomski
with a custom LLVM fuzzer.

40. Arrange for the UTF check in pcre2_match() and pcre2_dfa_match() to look
only at the part of the subject that is relevant when the starting offset is
non-zero.

41. Improve first character match in JIT with SSE2 on x86.

42. Fix two assertion fails in JIT. These issues were found by Karl Skomski
with a custom LLVM fuzzer.

43. Correct the setting of CMAKE_C_FLAGS in CMakeLists.txt (patch from Roy Ivy
III).

44. Fix bug in RunTest.bat for new test 14, and adjust the script for the added
test (there are now 20 in total).

45. Fixed a corner case of range optimization in JIT.

46. Add the ${*MARK} facility to pcre2_substitute().

47. Modifier lists in pcre2test were splitting at spaces without the required
commas.

48. Implemented PCRE2_ALT_VERBNAMES.

49. Fixed two issues in JIT. These were found by Karl Skomski with a custom
LLVM fuzzer.

50. The pcre2test program has been extended by adding the #newline_default
command.
This has made it possible to run the standard tests when PCRE2 is
compiled with either CR or CRLF as the default newline convention. As part of
this work, the new command was added to several test files and the testing
scripts were modified. The pcre2grep tests can now also be run when there is no
LF in the default newline convention.

51. The RunTest script has been modified so that, when JIT is used and valgrind
is specified, a valgrind suppressions file is set up to ignore "Invalid read of
size 16" errors because these are false positives when the hardware supports
the SSE2 instruction set.

52. It is now possible to have comment lines amid the subject strings in
pcre2test (and perltest.sh) input.

53. Implemented PCRE2_USE_OFFSET_LIMIT and pcre2_set_offset_limit().

54. Add the null_context modifier to pcre2test so that calling pcre2_compile()
and the matching functions with NULL contexts can be tested.

55. Implemented PCRE2_SUBSTITUTE_EXTENDED.

56. In a character class such as [\W\p{Any}] where both a negative-type escape
("not a word character") and a property escape were present, the property
escape was being ignored.

57. Fixed integer overflow for patterns whose minimum matching length is very,
very large.

58. Implemented --never-backslash-C.

59. Change 55 above introduced a bug by which certain patterns provoked the
erroneous error "\ at end of pattern".

60. The special sequences [[:<:]] and [[:>:]] gave rise to incorrect compiling
errors or other strange effects if compiled in UCP mode. Found with libFuzzer
and AddressSanitizer.

61. Whitespace at the end of a pcre2test pattern line caused a spurious error
message if there were only single-character modifiers. It should be ignored.

62. The use of PCRE2_NO_AUTO_CAPTURE could cause incorrect compilation results
or segmentation errors for some patterns. Found with libFuzzer and
AddressSanitizer.

63. Very long names in (*MARK) or (*THEN) etc. items could provoke a buffer
overflow.

64. Improve error message for overly-complicated patterns.

65. Implemented an optional replication feature for patterns in pcre2test, to
make it easier to test long repetitive patterns. The tests for 63 above are
converted to use the new feature.

66. In the POSIX wrapper, if regerror() was given too small a buffer, it could
misbehave.

67.
In pcre2_substitute() in UTF mode, the UTF validity check on the
replacement string was happening before the length setting when the replacement
string was zero-terminated.

68. In pcre2_substitute() in UTF mode, PCRE2_NO_UTF_CHECK can be set for the
second and subsequent calls to pcre2_match().

69. There was no check for integer overflow for a replacement group number in
pcre2_substitute(). An added check for a number greater than the largest group
number in the pattern means this is not now needed.

70. The PCRE2-specific VERSION condition didn't work correctly if only one
digit was given after the decimal point, or if more than two digits were given.
It now works with one or two digits, and gives a compile time error if more are
given.

71. In pcre2_substitute() there was the possibility of reading one code unit
beyond the end of the replacement string.

72. The code for checking a subject's UTF-32 validity for a pattern with a
lookbehind involved an out-of-bounds pointer, which could potentially cause
trouble in some environments.

73. The maximum lookbehind length was incorrectly calculated for patterns such
as /(?<=(a)(?-1))x/ which have a recursion within a backreference.

74.
Give an error if a lookbehind assertion is longer than 65535 code units.

75. Give an error in pcre2_substitute() if a match ends before it starts (as a
result of the use of \K).

76. Check the length of subpattern names and the names in (*MARK:xx) etc.
dynamically to avoid the possibility of integer overflow.

77. Implement pcre2_set_max_pattern_length() so that programs can restrict the
size of patterns that they are prepared to handle.

78. (*NO_AUTO_POSSESS) was not working.

79. Adding group information caching improves the speed of compiling when
checking whether a group has a fixed length and/or could match an empty string,
especially when recursion or subroutine calls are involved. However, this
cannot be used when (?| is present in the pattern because the same number may
be used for groups of different sizes. To catch runaway patterns in this
situation, counts have been introduced to the functions that scan for empty
branches or compute fixed lengths.

80. Allow for the possibility of the size of the nest_save structure not being
a factor of the size of the compiling workspace (it currently is).

81. Check for integer overflow in minimum length calculation and cap it at
65535.

82.
Small optimizations in code for finding the minimum matching length.

83. Lock out configuring for EBCDIC with non-8-bit libraries.

84. Test for error code <= 0 in regerror().

85. Check for too many replacements (more than INT_MAX) in pcre2_substitute().

86. Avoid the possibility of computing with an out-of-bounds pointer (though
not dereferencing it) while handling lookbehind assertions.

87. Failure to get memory for the match data in regcomp() is now given as a
regcomp() error instead of waiting for regexec() to pick it up.

88. In pcre2_substitute(), ensure that CRLF is not split when it is a valid
newline sequence.

89. Paranoid check in regcomp() for bad error code from pcre2_compile().

90. Run test 8 (internal offsets and code sizes) for link sizes 3 and 4 as well
as for link size 2.

91. Document that JIT has a limit on pattern size, and give more information
about JIT compile failures in pcre2test.

92. Implement PCRE2_INFO_HASBACKSLASHC.

93. Re-arrange valgrind support code in pcre2test to avoid spurious reports
with JIT (possibly caused by SSE2?).

94. Support offset_limit in JIT.
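
A minimal sketch of the kind of check described in change 81 above: partial
minimum lengths are combined with arithmetic that saturates at 65535 rather
than being allowed to wrap. This is an illustration under stated assumptions,
not the code in PCRE2; add_min_length(), repeat_min_length() and
MIN_LENGTH_CAP are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_LENGTH_CAP 65535u   /* the cap mentioned in change 81 */

/* Hypothetical helper: add two partial minimum lengths, saturating at the
   cap. Both inputs are assumed to be already capped. */
uint32_t add_min_length(uint32_t a, uint32_t b)
{
  uint32_t sum;
  assert(a <= MIN_LENGTH_CAP && b <= MIN_LENGTH_CAP);
  sum = a + b;                  /* cannot wrap: both operands <= 65535 */
  return (sum > MIN_LENGTH_CAP) ? MIN_LENGTH_CAP : sum;
}

/* Hypothetical helper: multiply a group's minimum length by a repeat count.
   The division test detects an over-large product before it is computed. */
uint32_t repeat_min_length(uint32_t length, uint32_t repeat_count)
{
  assert(length <= MIN_LENGTH_CAP);
  if (length != 0 && repeat_count > MIN_LENGTH_CAP / length)
    return MIN_LENGTH_CAP;      /* product would exceed the cap */
  return length * repeat_count;
}
```

Testing before multiplying, instead of inspecting the product afterwards,
means the check does not depend on how wide the integer type happens to be.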

95. A sequence such as [[:punct:]b], that is, a POSIX character class followed
by a single ASCII character in a class item, was incorrectly compiled in UCP
mode. The POSIX class got lost, but only if the single character followed it.

96. [:punct:] in UCP mode was matching some characters in the range 128-255
that should not have been matched.

97. If [:^ascii:] or [:^xdigit:] are present in a non-negated class, all
characters with code points greater than 255 are in the class. When a Unicode
property was also in the class (if PCRE2_UCP is set, escapes such as \w are
turned into Unicode properties), wide characters were not correctly handled,
and could fail to match.

98. In pcre2test, make the "startoffset" modifier a synonym of "offset",
because it sets the "startoffset" parameter for pcre2_match().

99. If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between
an item and its qualifier (for example, A(?#comment)?B) pcre2_compile()
misbehaved. This bug was found by the LLVM fuzzer.

100. The error for an invalid UTF pattern string always gave the code unit
offset as zero instead of where the invalidity was found.

101. Further to 97 above, negated classes such as [^[:^ascii:]\d] were also not
working correctly in UCP mode.

102. Similar to 99 above, if an isolated \E was present between an item and
its quantifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile() misbehaved.
This bug was found by the LLVM fuzzer.

103. The POSIX wrapper function regexec() crashed if the option REG_STARTEND
was set when the pmatch argument was NULL. It now returns REG_INVARG.

104. Allow for up to 32-bit numbers in the ordin() function in pcre2grep.

105. An empty \Q\E sequence between an item and its quantifier caused
pcre2_compile() to misbehave when auto callouts were enabled. This bug
was found by the LLVM fuzzer.

106. If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a (*MARK) or
other verb "name" ended with whitespace immediately before the closing
parenthesis, pcre2_compile() misbehaved. Example: /(*:abc )/, but only when
both those options were set.

107. In a number of places pcre2_compile() was not handling NUL characters
correctly, and pcre2test with the "bincode" modifier was not always correctly
displaying fields containing NULs:

  (a) Within /x extended #-comments
  (b) Within the "name" part of (*MARK) and other *verbs
  (c) Within the text argument of a callout

108. If a pattern that was compiled with PCRE2_EXTENDED started with white
space or a #-type comment that was followed by (?-x), which turns off
PCRE2_EXTENDED, and there was no subsequent (?x) to turn it on again,
pcre2_compile() assumed that (?-x) applied to the whole pattern and
consequently mis-compiled it. This bug was found by the LLVM fuzzer. The fix
for this bug means that a setting of any of the (?imsxJU) options at the start
of a pattern is no longer transferred to the options that are returned by
PCRE2_INFO_ALLOPTIONS. In fact, this was an anachronism that should have
changed when the effects of those options were all moved to compile time.

109. An escaped closing parenthesis in the "name" part of a (*verb) when
PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug
was found by the LLVM fuzzer.

110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it
possible to test it.

111. "Harden" pcre2test against ridiculously large values in modifiers and
command line arguments.

112. Implemented PCRE2_SUBSTITUTE_UNKNOWN_UNSET and
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.

113. Fix printing of *MARK names that contain binary zeroes in pcre2test.
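
Items 107 and 113 above both come down to treating a name as a counted string
rather than a zero-terminated one. The sketch below shows the general idea
only. It is not taken from pcre2test: the helper format_counted() and the
\x{hh} escape style it uses are assumptions chosen for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from pcre2test. Formats a counted 8-bit
   string into out, writing non-printing bytes (including binary zero) as
   \x{hh}. Returns the number of characters written, excluding the
   terminating zero, or -1 if out is too small. */
static int format_counted(const unsigned char *name, size_t length,
                          char *out, size_t outsize)
{
  size_t used = 0;
  assert(name != NULL && out != NULL);
  for (size_t i = 0; i < length; i++) {
    char piece[8];             /* large enough for "\x{hh}" plus a zero */
    int n;
    if (isprint(name[i])) n = snprintf(piece, sizeof piece, "%c", name[i]);
    else n = snprintf(piece, sizeof piece, "\\x{%02x}", (unsigned int)name[i]);
    if (used + (size_t)n + 1 > outsize) return -1;
    memcpy(out + used, piece, (size_t)n);
    used += (size_t)n;
  }
  if (used + 1 > outsize) return -1;
  out[used] = '\0';
  return (int)used;
}
```

Because the length is passed explicitly, the three bytes 'a', 0, 'b' are all
formatted, whereas strlen() or printf("%s") would stop after the first byte.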


Version 10.20 30-June-2015
--------------------------

1. Callouts with string arguments have been added.

2. Assertion code generator in JIT has been optimized.

3. The invalid pattern (?(?C) has a missing assertion condition at the end. The
pcre2_compile() function read past the end of the input before diagnosing an
error. This bug was discovered by the LLVM fuzzer.

4. Implemented pcre2_callout_enumerate().

5. Fix JIT compilation of conditional blocks whose assertion is converted to
(*FAIL). E.g. /(?(?!))/.

6. The pattern /(?(?!)^)/ caused references to random memory. This bug was
discovered by the LLVM fuzzer.

7. The assertion (?!) is optimized to (*FAIL). This was not handled correctly
when this assertion was used as a condition, for example (?(?!)a|b). In
pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect
error about an unsupported item.

8. For some types of pattern, for example /Z*(|d*){216}/, the
auto-possessification code could take exponential time to complete. A recursion
depth limit of 1000 has been imposed to limit the resources used by this
optimization. This infelicity was discovered by the LLVM fuzzer.

9. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special
class such as \S in non-UCP mode, explicit wide characters (> 255) can be
ignored because \S ensures they are all in the class. The code for doing this
was interacting badly with the code for computing the amount of space needed to
compile the pattern, leading to a buffer overflow. This bug was discovered by
the LLVM fuzzer.

10. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
other kinds of group caused stack overflow at compile time. This bug was
discovered by the LLVM fuzzer.

11. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
between a subroutine call and its quantifier was incorrectly compiled, leading
to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer.

12. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
assertion after (?(. The code was failing to check the character after (?(?<
for the ! or = that would indicate a lookbehind assertion. This bug was
discovered by the LLVM fuzzer.

13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
a fixed maximum following a group that contains a subroutine reference was
incorrectly compiled and could trigger buffer overflow. This bug was discovered
by the LLVM fuzzer.

14. Negative relative recursive references such as (?-7) to non-existent
subpatterns were not being diagnosed and could lead to unpredictable behaviour.
This bug was discovered by the LLVM fuzzer.

15. The bug fixed in 14 was due to an integer variable that was unsigned when
it should have been signed. Some other "int" variables, having been checked,
have either been changed to uint32_t or commented as "must be signed".

16. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
caused a stack overflow instead of the diagnosis of a non-fixed length
lookbehind assertion. This bug was discovered by the LLVM fuzzer.

17. The use of \K in a positive lookbehind assertion in a non-anchored pattern
(e.g. /(?<=\Ka)/) could make pcre2grep loop.

18. There was a similar problem to 17 in pcre2test for global matches, though
the code there did catch the loop.

19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
and a subsequent item in the pattern caused a non-match, backtracking over the
repeated \X did not stop, but carried on past the start of the subject, causing
reference to random memory and/or a segfault. There were also some other cases
where backtracking after \C could crash. This set of bugs was discovered by the
LLVM fuzzer.

20. The function for finding the minimum length of a matching string could take
a very long time if mutual recursion was present many times in a pattern, for
example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has
been implemented. This infelicity was discovered by the LLVM fuzzer.

21. Implemented PCRE2_NEVER_BACKSLASH_C.

22. The feature for string replication in pcre2test could read from freed
memory if the replication required a buffer to be extended, and it was not
working properly in 16-bit and 32-bit modes. This issue was discovered by a
fuzzer: see http://lcamtuf.coredump.cx/afl/.

23. Added the PCRE2_ALT_CIRCUMFLEX option.

24. Adjust the treatment of \8 and \9 to be the same as the current Perl
behaviour.

25. Static linking against the PCRE2 library using the pkg-config module was
failing on missing pthread symbols.

26. If a group that contained a recursive back reference also contained a
forward reference subroutine call followed by a non-forward-reference
subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
compile correct code, leading to undefined behaviour or an internally detected
error. This bug was discovered by the LLVM fuzzer.

27. Quantification of certain items (e.g. atomic back references) could cause
incorrect code to be compiled when recursive forward references were involved.
For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. This bug was
discovered by the LLVM fuzzer.

28. A repeated conditional group whose condition was a reference by name caused
a buffer overflow if there was more than one group with the given name. This
bug was discovered by the LLVM fuzzer.

29. A recursive back reference by name within a group that had the same name as
another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/.
This bug was discovered by the LLVM fuzzer.

30. A forward reference by name to a group whose number is the same as the
current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a
buffer overflow at compile time. This bug was discovered by the LLVM fuzzer.

31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1
as an int; fixed by writing it as 1u).

32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives
a warning for "fileno" unless -std=gnu99 is used.

33. A lookbehind assertion within a set of mutually recursive subpatterns could
provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.

34. Give an error for an empty subpattern name such as (?'').

35. Make pcre2test give an error if a pattern that follows #forbid_utf contains
\P, \p, or \X.

36. The way named subpatterns are handled has been refactored. There is now a
pre-pass over the regex which does nothing other than identify named
subpatterns and count the total captures. This means that information about
named patterns is known before the rest of the compile. In particular, it means
that forward references can be checked as they are encountered. Previously, the
code for handling forward references was contorted and led to several errors in
computing the memory requirements for some patterns, leading to buffer
overflows.

37. There was no check for integer overflow in subroutine calls such as (?123).

38. The table entry for \l in EBCDIC environments was incorrect, leading to its
being treated as a literal 'l' instead of causing an error.

39. If a non-capturing group containing a conditional group that could match
an empty string was repeated, it was not identified as matching an empty string
itself. For example: /^(?:(?(1)x|)+)+$()/.

40. In an EBCDIC environment, pcretest was mishandling the escape sequences
\a and \e in test subject lines.

41. In an EBCDIC environment, \a in a pattern was converted to the ASCII
instead of the EBCDIC value.

42. The handling of \c in an EBCDIC environment has been revised so that it is
now compatible with the specification in Perl's perlebcdic page.

43. Single character repetition in JIT has been improved. 20-30% speedup
was achieved on certain patterns.

44. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
ASCII/Unicode. This has now been added to the list of characters that are
recognized as white space in EBCDIC.

45. When PCRE2 was compiled without Unicode support, the use of \p and \P gave
an error (correctly) when used outside a class, but did not give an error
within a class.

46. \h within a class was incorrectly compiled in EBCDIC environments.
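
Returning to item 31 above, the difference between 1 and 1u in a shift can be
shown in a few lines. This is a sketch under the assumption that int and
unsigned int are 32 bits wide. The helper name bit_mask() is hypothetical and
does not appear in the PCRE2 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 32-bit mask with only bit n set (0 <= n <= 31).
   Writing the constant as 1u makes the shift happen in unsigned arithmetic,
   so n == 31 is well defined. With a plain 1 (a signed int), shifting by 31
   is undefined behaviour when int is 32 bits wide, which is what
   -fsanitize=undefined reports. */
static uint32_t bit_mask(unsigned int n)
{
  assert(n < 32);
  return (uint32_t)(1u << n);
}
```

Under that assumption, bit_mask(31) yields 0x80000000 without involving a
signed intermediate value.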

47. JIT should return with error when the compiled pattern requires
more stack space than the maximum.

48. Fixed a memory leak in pcre2grep when a locale is set.


Version 10.10 06-March-2015
---------------------------

1. When a pattern is compiled, it remembers the highest back reference so that
when matching, if the ovector is too small, extra memory can be obtained to
use instead. A conditional subpattern whose condition is a check on a capture
having happened, as in the pattern /^(?:(a)|b)(?(1)A|B)/, is another kind of
back reference, but it was not setting the highest back reference number. This
mattered only if pcre2_match() was called with an ovector that was too small to
hold the capture, and there was no other kind of back reference (a situation
which is probably quite rare). The effect of the bug was that the condition was
always treated as FALSE when the capture could not be consulted, leading to
incorrect behaviour by pcre2_match(). This bug has been fixed.

2. Functions for serialization and deserialization of sets of compiled patterns
have been added.

3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
excess code units at the end of the data block that may occasionally occur if
the code for calculating the size over-estimates. This change stops the
serialization code copying uninitialized data, to which valgrind objects. The
documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not
include the general overhead. This has been corrected.

4. All code units in every slot in the table of group names are now set, again
in order to avoid accessing uninitialized data when serializing.

5. The (*NO_JIT) feature is implemented.

6. If a bug that caused pcre2_compile() to use more memory than allocated was
triggered when using valgrind, the code in (3) above passed a stupidly large
value to valgrind. This caused a crash instead of an "internal error" return.

7. A reference to a duplicated named group (either a back reference or a test
for being set in a conditional) that occurred in a part of the pattern where
PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern
to be incorrectly calculated, leading to overwriting.

8. A mutually recursive set of back references such as (\2)(\1) caused a
segfault at compile time (while trying to find the minimum matching length).
The infinite loop is now broken (with the minimum length unset, that is, zero).

9. If an assertion that was used as a condition was quantified with a minimum
of zero, matching went wrong. In particular, if the whole group had unlimited
repetition and could match an empty string, a segfault was likely. The pattern
(?(?=0)?)+ is an example that caused this. Perl allows assertions to be
quantified, but not if they are being used as conditions, so the above pattern
is faulted by Perl. PCRE2 has now been changed so that it also rejects such
patterns.

10. The error message for an invalid quantifier has been changed from "nothing
to repeat" to "quantifier does not follow a repeatable item".

11. If a bad UTF string is compiled with NO_UTF_CHECK, it may succeed, but
scanning the compiled pattern in subsequent auto-possessification can get out
of step and lead to an unknown opcode. Previously this could have caused an
infinite loop. Now it generates an "internal error" error. This is a tidyup,
not a bug fix; passing bad UTF with NO_UTF_CHECK is documented as having an
undefined outcome.

12. A UTF pattern containing a "not" match of a non-ASCII character and a
subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.

13. The locale test (RunTest 3) has been upgraded. It now checks that a locale
that is found in the output of "locale -a" can actually be set by pcre2test
before it is accepted. Previously, in an environment where a locale was listed
but would not set (an example does exist), the test would "pass" without
actually doing anything. Also the fr_CA locale has been added to the list of
locales that can be used.

14. Fixed a bug in pcre2_substitute(). If a replacement string ended in a
capturing group number without parentheses, the last character was incorrectly
literally included at the end of the replacement string.

15. A possessive capturing group such as (a)*+ with a minimum repeat of zero
failed to allow the zero-repeat case if pcre2_match() was called with an
ovector too small to capture the group.

16. Improved error message in pcre2test when setting the stack size (-S) fails.

17. Fixed two bugs in CMakeLists.txt: (1) Some lines had got lost in the
transfer from PCRE1, meaning that CMake configuration failed if "build tests"
was selected. (2) The file src/pcre2_serialize.c had not been added to the list
of PCRE2 sources, which caused a failure to build pcre2test.

18. Fixed typo in pcre2_serialize.c (DECL instead of DEFN) that causes problems
only on Windows.

19. Use binary input when reading back saved serialized patterns in pcre2test.

20. Added RunTest.bat for running the tests under Windows.

21. "make distclean" was not removing config.h, a file that may be created for
use with CMake.

22. A pattern such as "((?2){0,1999}())?", which has a group containing a
forward reference repeated a large (but limited) number of times within a
repeated outer group that has a zero minimum quantifier, caused incorrect code
to be compiled, leading to the error "internal error: previously-checked
referenced subpattern not found" when an incorrect memory address was read.
This bug was reported as "heap overflow", discovered by Kai Lu of Fortinet's
FortiGuard Labs. (Added 24-March-2015: CVE-2015-2325 was given to this.)

23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
call within a group that also contained a recursive back reference caused
incorrect code to be compiled. This bug was reported as "heap overflow",
discovered by Kai Lu of Fortinet's FortiGuard Labs. (Added 24-March-2015:
CVE-2015-2326 was given to this.)

24. Computing the size of the JIT read-only data in advance has been a source
of various issues, and new ones still appear, unfortunately. To fix existing
and future issues, size computation has been eliminated from the code and
replaced by on-demand memory allocation.

25. A pattern such as /(?i)[A-`]/, where characters in the other case are
adjacent to the end of the range, and the range contained characters with more
than one other case, caused incorrect behaviour when compiled in UTF mode. In
that example, the range a-j was left out of the class.


Version 10.00 05-January-2015
-----------------------------

Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36.

The code of the library was heavily revised as part of the new API
implementation. Details of each and every modification were not individually
logged. In addition to the API changes, the following changes were made. They
are either new functionality, or bug fixes and other noticeable changes of
behaviour that were implemented after the code had been forked.

1. Including Unicode support at build time is now enabled by default, but it
can optionally be disabled. It is not enabled by default at run time (no
change).

2. The test program, now called pcre2test, was re-specified and almost
completely re-written. Its input is not compatible with input for pcretest.

3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is
matched by that pattern.

4. For the benefit of those who use PCRE2 via some other application, that is,
not writing the function calls themselves, it is possible to check the PCRE2
version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a
string such as "yesno".

5. There are case-equivalent Unicode characters whose encodings use different
numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is
theoretically possible for this to happen in UTF-16 too.) If a backreference to
a group containing one of these characters was greedily repeated, and during
the match a backtrack occurred, the subject might be backtracked by the wrong
number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly
(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should
capture the final character, which is the three bytes E2, B1, and A5 in UTF-8.
Incorrect backtracking meant that group 2 captured only the last two bytes.
This bug has been fixed; the new code is slower, but it is used only when the
strings matched by the repetition are not all the same length.

6. A pattern such as /()a/ was not setting the "first character must be 'a'"
information. This applied to any pattern with a group that matched no
characters, for example: /(?:(?=.)|(?<!x))a/.

7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for
those parentheses to be closed with whatever has been captured so far. However,
it was failing to mark any other groups between the highest capture so far and
the current group as "unset". Thus, the ovector for those groups contained
whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
matched against "abcd".

8. The pcre2_substitute() function has been implemented.

9. If an assertion used as a condition was quantified with a minimum of zero
(an odd thing to do, but it happened), SIGSEGV or other misbehaviour could
occur.

10. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented.

****