<html>
<head>
<title>pcre2test specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2test man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
<li><a name="TOC3" href="#SEC3">INPUT ENCODING</a>
<li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a>
<li><a name="TOC5" href="#SEC5">DESCRIPTION</a>
<li><a name="TOC6" href="#SEC6">COMMAND LINES</a>
<li><a name="TOC7" href="#SEC7">MODIFIER SYNTAX</a>
<li><a name="TOC8" href="#SEC8">PATTERN SYNTAX</a>
<li><a name="TOC9" href="#SEC9">SUBJECT LINE SYNTAX</a>
<li><a name="TOC10" href="#SEC10">PATTERN MODIFIERS</a>
<li><a name="TOC11" href="#SEC11">SUBJECT MODIFIERS</a>
<li><a name="TOC12" href="#SEC12">THE ALTERNATIVE MATCHING FUNCTION</a>
<li><a name="TOC13" href="#SEC13">DEFAULT OUTPUT FROM pcre2test</a>
<li><a name="TOC14" href="#SEC14">OUTPUT FROM THE
ALTERNATIVE MATCHING FUNCTION</a>
<li><a name="TOC15" href="#SEC15">RESTARTING AFTER A PARTIAL MATCH</a>
<li><a name="TOC16" href="#SEC16">CALLOUTS</a>
<li><a name="TOC17" href="#SEC17">NON-PRINTING CHARACTERS</a>
<li><a name="TOC18" href="#SEC18">SAVING AND RESTORING COMPILED PATTERNS</a>
<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
<li><a name="TOC20" href="#SEC20">AUTHOR</a>
<li><a name="TOC21" href="#SEC21">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcre2test [options] [input file [output file]]</b>
<br>
<br>
<b>pcre2test</b> is a test program for the PCRE2 regular expression libraries,
but it can also be used for experimenting with regular expressions. This
document describes the features of the test program; for details of the regular
expressions themselves, see the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation. For details of the PCRE2 library function calls and their
options, see the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
The input for <b>pcre2test</b> is a sequence of regular expression patterns and
subject strings to be matched. There are also command lines for setting
defaults and controlling some special actions.
The output shows the result of
each match attempt. Modifiers on external or internal command lines, the
patterns, and the subject lines specify PCRE2 function options, control how the
subject is processed, and what output is produced.
</P>
<P>
There are many obscure modifiers, some of which are specifically designed for
use in conjunction with the test script and data files that are distributed as
part of PCRE2. All the modifiers are documented here, some without much
justification, but many of them are unlikely to be of use except when testing
the libraries.
</P>
<br><a name="SEC2" href="#TOC1">PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
<P>
Different versions of the PCRE2 library can be built to support character
strings that are encoded in 8-bit, 16-bit, or 32-bit code units. One, two, or
all three of these libraries may be simultaneously installed. The
<b>pcre2test</b> program can be used to test all the libraries. However, its own
input and output are always in 8-bit format. When testing the 16-bit or 32-bit
libraries, patterns and subject strings are converted to 16-bit or 32-bit
format before being passed to the library functions. Results are converted back
to 8-bit code units for output.
</P>
<P>
In the rest of this document, the names of library functions and structures
are given in generic form, for example, <b>pcre2_compile()</b>. The actual
names used in the libraries have a suffix _8, _16, or _32, as appropriate.
<a name="inputencoding"></a></P>
<br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
<P>
Input to <b>pcre2test</b> is processed line by line, either by calling the C
library's <b>fgets()</b> function, or via the <b>libreadline</b> or <b>libedit</b>
library. In some Windows environments character 26 (hex 1A) causes an immediate
end of file, and no further data is read, so this character should be avoided
unless you really want that action.
</P>
<P>
The input is processed using C's string functions, so must not contain binary
zeros, even though in Unix-like environments, <b>fgets()</b> treats any bytes
other than newline as data characters. An error is generated if a binary zero
is encountered. By default subject lines are processed for backslash escapes,
which makes it possible to include any data value in strings that are passed to
the library for matching. For patterns, there is a facility for specifying some
or all of the 8-bit input characters as hexadecimal pairs, which makes it
possible to include binary zeros.
</P>
<br><b>
Input for the 16-bit and 32-bit libraries
</b><br>
<P>
When testing the 16-bit or 32-bit libraries, there is a need to be able to
generate character code points greater than 255 in the strings that are passed
to the library. For subject lines, backslash escapes can be used. In addition,
when the <b>utf</b> modifier (see
<a href="#optionmodifiers">"Setting compilation options"</a>
below) is set, the pattern and any following subject lines are interpreted as
UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
</P>
<P>
For non-UTF testing of wide characters, the <b>utf8_input</b> modifier can be
used. This is mutually exclusive with <b>utf</b>, and is allowed only in 16-bit
or 32-bit mode. It causes the pattern and following subject lines to be treated
as UTF-8 according to the original definition (RFC 2279), which allows for
character values up to 0x7fffffff. Each character is placed in one 16-bit or
32-bit code unit (in the 16-bit case, values greater than 0xffff cause an error
to occur).
</P>
<P>
UTF-8 (in its original definition) is not capable of encoding values greater
than 0x7fffffff, but such values can be handled by the 32-bit library.
When
testing this library in non-UTF mode with <b>utf8_input</b> set, if any
character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
0x80000000 is added to the character's value. This is the only way of passing
such code points in a pattern string. For subject strings, using an escape
sequence is preferable.
</P>
<br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
<P>
<b>-8</b>
If the 8-bit library has been built, this option causes it to be used (this is
the default). If the 8-bit library has not been built, this option causes an
error.
</P>
<P>
<b>-16</b>
If the 16-bit library has been built, this option causes it to be used. If the
8-bit library has not been built, this is the default. If the 16-bit library
has not been built, this option causes an error.
</P>
<P>
<b>-32</b>
If the 32-bit library has been built, this option causes it to be used. If no
other library has been built, this is the default. If the 32-bit library has
not been built, this option causes an error.
</P>
<P>
<b>-ac</b>
Behave as if each pattern has the <b>auto_callout</b> modifier, that is, insert
automatic callouts into every pattern that is compiled.
</P>
<P>
<b>-AC</b>
As for <b>-ac</b>, but in addition behave as if each subject line has the
<b>callout_extra</b> modifier, that is, show additional information from
callouts.
</P>
<P>
<b>-b</b>
Behave as if each pattern has the <b>fullbincode</b> modifier; the full
internal binary form of the pattern is output after compilation.
</P>
<P>
<b>-C</b>
Output the version number of the PCRE2 library, and all available information
about the optional features that are included, and then exit with zero exit
code. All other options are ignored. If both -C and any -Lx options are
present, whichever is first is recognized.
</P>
<P>
<b>-C</b> <i>option</i>
Output information about a specific build-time option, then exit. This
functionality is intended for use in scripts such as <b>RunTest</b>.
The
following options output the value and set the exit code as indicated:
<pre>
  ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
               0x15 or 0x25
               0 if used in an ASCII environment
               exit code is always 0
  linksize   the configured internal link size (2, 3, or 4)
               exit code is set to the link size
  newline    the default newline setting:
               CR, LF, CRLF, ANYCRLF, ANY, or NUL
               exit code is always 0
  bsr        the default setting for what \R matches:
               ANYCRLF or ANY
               exit code is always 0
</pre>
The following options output 1 for true or 0 for false, and set the exit code
to the same value:
<pre>
  backslash-C  \C is supported (not locked out)
  ebcdic       compiled for an EBCDIC environment
  jit          just-in-time support is available
  pcre2-16     the 16-bit library was built
  pcre2-32     the 32-bit library was built
  pcre2-8      the 8-bit library was built
  unicode      Unicode support is available
</pre>
If an unknown option is given, an error message is output; the exit code is 0.
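</P>
<P>
Because the true/false options set the exit code to 1 or 0, a script can test
for a feature without parsing any output. The following is a minimal Python
sketch of that idea, not part of PCRE2 itself. The function name
<b>has_feature</b> and its <b>run</b> and <b>binary</b> parameters are
hypothetical; <b>run</b> is injectable only so that the logic can be exercised
without a <b>pcre2test</b> binary being installed.

```python
import subprocess


def has_feature(name, run=subprocess.run, binary="pcre2test"):
    """Return True if `pcre2test -C NAME` reports the feature as present.

    Relies on the documented rule that true/false options set the exit
    code to 1 (true) or 0 (false). `run` and `binary` are hypothetical
    parameters introduced for this sketch.
    """
    result = run([binary, "-C", name], capture_output=True, text=True)
    return result.returncode == 1


if __name__ == "__main__":
    # Requires pcre2test on the PATH; "jit" is one of the options listed above.
    print("JIT available:", has_feature("jit"))
```

Note that an unknown option name also yields exit code 0, so a misspelled
feature name is indistinguishable from an absent feature in this sketch.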
</P>
<P>
<b>-d</b>
Behave as if each pattern has the <b>debug</b> modifier; the internal
form and information about the compiled pattern is output after compilation;
<b>-d</b> is equivalent to <b>-b -i</b>.
</P>
<P>
<b>-dfa</b>
Behave as if each subject line has the <b>dfa</b> modifier; matching is done
using the <b>pcre2_dfa_match()</b> function instead of the default
<b>pcre2_match()</b>.
</P>
<P>
<b>-error</b> <i>number[,number,...]</i>
Call <b>pcre2_get_error_message()</b> for each of the error numbers in the
comma-separated list, display the resulting messages on the standard output,
then exit with zero exit code. The numbers may be positive or negative. This is
a convenience facility for PCRE2 maintainers.
</P>
<P>
<b>-help</b>
Output a brief summary of these options and then exit.
</P>
<P>
<b>-i</b>
Behave as if each pattern has the <b>info</b> modifier; information about the
compiled pattern is given after compilation.
</P>
<P>
<b>-jit</b>
Behave as if each pattern line has the <b>jit</b> modifier; after successful
compilation, each pattern is passed to the just-in-time compiler, if available.
</P>
<P>
<b>-jitfast</b>
Behave as if each pattern line has the <b>jitfast</b> modifier; after
successful compilation, each pattern is passed to the just-in-time compiler, if
available, and each subject line is passed directly to the JIT matcher via its
"fast path".
</P>
<P>
<b>-jitverify</b>
Behave as if each pattern line has the <b>jitverify</b> modifier; after
successful compilation, each pattern is passed to the just-in-time compiler, if
available, and the use of JIT for matching is verified.
</P>
<P>
<b>-LM</b>
List modifiers: write a list of available pattern and subject modifiers to the
standard output, then exit with zero exit code. All other options are ignored.
If both -C and any -Lx options are present, whichever is first is recognized.
</P>
<P>
<b>-LP</b>
List properties: write a list of recognized Unicode properties to the standard
output, then exit with zero exit code. All other options are ignored. If both
-C and any -Lx options are present, whichever is first is recognized.
</P>
<P>
<b>-LS</b>
List scripts: write a list of recognized Unicode script names to the standard
output, then exit with zero exit code. All other options are ignored. If both
-C and any -Lx options are present, whichever is first is recognized.
</P>
<P>
<b>-pattern</b> <i>modifier-list</i>
Behave as if each pattern line contains the given modifiers.
</P>
<P>
<b>-q</b>
Do not output the version number of <b>pcre2test</b> at the start of execution.
</P>
<P>
<b>-S</b> <i>size</i>
On Unix-like systems, set the size of the run-time stack to <i>size</i>
mebibytes (units of 1024*1024 bytes).
</P>
<P>
<b>-subject</b> <i>modifier-list</i>
Behave as if each subject line contains the given modifiers.
</P>
<P>
<b>-t</b>
Run each compile and match many times with a timer, and output the resulting
times per compile or match. When JIT is used, separate times are given for the
initial compile and the JIT compile. You can control the number of iterations
that are used for timing by following <b>-t</b> with a number (as a separate
item on the command line). For example, "-t 1000" iterates 1000 times.
The
default is to iterate 500,000 times.
</P>
<P>
<b>-tm</b>
This is like <b>-t</b> except that it times only the matching phase, not the
compile phase.
</P>
<P>
<b>-T</b> <b>-TM</b>
These behave like <b>-t</b> and <b>-tm</b>, but in addition, at the end of a run,
the total times for all compiles and matches are output.
</P>
<P>
<b>-version</b>
Output the PCRE2 version number and then exit.
</P>
<br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br>
<P>
If <b>pcre2test</b> is given two filename arguments, it reads from the first and
writes to the second. If the first name is "-", input is taken from the
standard input. If <b>pcre2test</b> is given only one argument, it reads from
that file and writes to stdout. Otherwise, it reads from stdin and writes to
stdout.
</P>
<P>
When <b>pcre2test</b> is built, a configuration option can specify that it
should be linked with the <b>libreadline</b> or <b>libedit</b> library. When this
is done, if the input is from a terminal, it is read using the <b>readline()</b>
function. This provides line-editing and history facilities. The output from
the <b>-help</b> option states whether or not <b>readline()</b> will be used.
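</P>
<P>
The file-argument rules given at the start of this section can be summarised
as a small decision function. This is an illustrative Python sketch, not the
program's own code. The name <b>resolve_io</b> is hypothetical, the strings
"stdin" and "stdout" merely stand for the standard streams, and treating "-"
as standard input in the one-argument case is an assumption.

```python
def resolve_io(file_args):
    """Map pcre2test's file arguments to an (input, output) pair.

    `file_args` holds only the non-option arguments. The strings
    "stdin" and "stdout" stand for the standard streams. Treating "-"
    as stdin when it is the only argument is an assumption.
    """
    if len(file_args) > 2:
        raise ValueError("at most two file arguments are expected")
    source = file_args[0] if len(file_args) >= 1 else "-"
    dest = file_args[1] if len(file_args) == 2 else None
    return ("stdin" if source == "-" else source,
            "stdout" if dest is None else dest)


if __name__ == "__main__":
    for args in ([], ["tests.txt"], ["tests.txt", "out.txt"], ["-", "out.txt"]):
        print(args, "->", resolve_io(args))
```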
</P>
<P>
The program handles any number of tests, each of which consists of a set of
input lines. Each set starts with a regular expression pattern, followed by any
number of subject lines to be matched against that pattern. In between sets of
test data, command lines that begin with # may appear. This file format, with
some restrictions, can also be processed by the <b>perltest.sh</b> script that
is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
and Perl is the same. For a specification of <b>perltest.sh</b>, see the
comments near its beginning. See also the #perltest command below.
</P>
<P>
When the input is a terminal, <b>pcre2test</b> prompts for each line of input,
using "re&#62;" to prompt for regular expression patterns, and "data&#62;" to prompt
for subject lines. Command lines starting with # can be entered only in
response to the "re&#62;" prompt.
</P>
<P>
Each subject line is matched separately and independently. If you want to do
multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
etc., depending on the newline setting) in a single line of input to encode the
newline sequences. There is no limit on the length of subject lines; the input
buffer is automatically extended if it is too small.
There are replication
features that make it possible to generate long repetitive pattern or subject
lines without having to supply them explicitly.
</P>
<P>
An empty line or the end of the file signals the end of the subject lines for a
test, at which point a new pattern or command line is expected if there is
still input to be read.
</P>
<br><a name="SEC6" href="#TOC1">COMMAND LINES</a><br>
<P>
In between sets of test data, a line that begins with # is interpreted as a
command line. If the first character is followed by white space or an
exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
following commands are recognized:
<pre>
  #forbid_utf
</pre>
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and
the use of (*UTF) and (*UCP) at the start of patterns. This command also forces
an error if a subsequent pattern contains any occurrences of \P, \p, or \X,
which are still supported when PCRE2_UTF is not set, but which require Unicode
property support to be included in the library.
</P>
<P>
This is a trigger guard that is used in test files to ensure that UTF or
Unicode property tests are not accidentally added to files that are used when
Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and
PCRE2_NEVER_UCP as a default can also be obtained by the use of <b>#pattern</b>;
the difference is that <b>#forbid_utf</b> cannot be unset, and the automatic
options are not displayed in pattern information, to avoid cluttering up test
output.
<pre>
  #load &#60;filename&#62;
</pre>
This command is used to load a set of precompiled patterns from a file, as
described in the section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
<pre>
  #loadtables &#60;filename&#62;
</pre>
This command is used to load a set of binary character tables that can be
accessed by the tables=3 qualifier. Such tables can be created by the
<b>pcre2_dftables</b> program with the -b option.
<pre>
  #newline_default [&#60;newline-list&#62;]
</pre>
When PCRE2 is built, a default newline convention can be specified. This
determines which characters and/or character pairs are recognized as indicating
a newline in a pattern or subject string. The default can be overridden when a
pattern is compiled.
The standard test files contain tests of various newline
conventions, but the majority of the tests expect a single linefeed to be
recognized as a newline by default. Without special action the tests would fail
when PCRE2 is compiled with either CR or CRLF as the default newline.
</P>
<P>
The #newline_default command specifies a list of newline types that are
acceptable as the default. Each type must be one of CR, LF, CRLF, ANYCRLF,
ANY, or NUL (in upper or lower case), for example:
<pre>
  #newline_default LF Any anyCRLF
</pre>
If the default newline is in the list, this command has no effect. Otherwise,
except when testing the POSIX API, a <b>newline</b> modifier that specifies the
first newline convention in the list (LF in the above example) is added to any
pattern that does not already have a <b>newline</b> modifier. If the newline
list is empty, the feature is turned off. This command is present in a number
of the standard test input files.
</P>
<P>
When the POSIX API is being tested there is no way to override the default
newline convention, though it is possible to set the newline convention from
within the pattern. A warning is given if the <b>posix</b> or <b>posix_nosub</b>
modifier is used when <b>#newline_default</b> would set a default for the
non-POSIX API.
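</P>
<P>
The effect of <b>#newline_default</b> described above amounts to a short
sequence of checks. The following Python sketch models those checks as stated
in the text; it is not taken from the <b>pcre2test</b> source, and the function
and parameter names are hypothetical.

```python
NEWLINE_TYPES = {"CR", "LF", "CRLF", "ANYCRLF", "ANY", "NUL"}


def newline_modifier_to_add(build_default, acceptable,
                            pattern_has_newline, posix=False):
    """Return the newline type to add as a `newline` modifier, or None.

    build_default       -- newline convention PCRE2 was built with
    acceptable          -- types listed on the #newline_default line
    pattern_has_newline -- pattern already carries a newline modifier
    posix               -- the pattern is being tested via the POSIX API
    """
    accepted = [t.upper() for t in acceptable]   # case is not significant
    unknown = [t for t in accepted if t not in NEWLINE_TYPES]
    if unknown:
        raise ValueError(f"unknown newline type(s): {unknown}")
    if not accepted:                      # empty list turns the feature off
        return None
    if build_default.upper() in accepted:  # build default is acceptable
        return None
    if posix or pattern_has_newline:      # POSIX API, or explicit modifier
        return None
    return accepted[0]                    # first convention in the list


if __name__ == "__main__":
    print(newline_modifier_to_add("CR", ["LF", "Any", "anyCRLF"], False))
```

With a library built for CR and the example list above, the sketch selects LF,
which matches the behaviour described for that example.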
<pre>
  #pattern &#60;modifier-list&#62;
</pre>
This command sets a default modifier list that applies to all subsequent
patterns. Modifiers on a pattern can change these settings.
<pre>
  #perltest
</pre>
This line is used in test files that can also be processed by <b>perltest.sh</b>
to confirm that Perl gives the same results as PCRE2. Subsequent tests are
checked for the use of <b>pcre2test</b> features that are incompatible with the
<b>perltest.sh</b> script.
</P>
<P>
Patterns must use '/' as their delimiter, and only certain modifiers are
supported. Comment lines, #pattern commands, and #subject commands that set or
unset "mark" are recognized and acted on. The #perltest, #forbid_utf, and
#newline_default commands, which are needed in the relevant pcre2test files,
are silently ignored. All other command lines are ignored, but give a warning
message. The <b>#perltest</b> command helps detect tests that are accidentally
put in the wrong file or use the wrong delimiter. For more details of the
<b>perltest.sh</b> script see the comments it contains.
<pre>
  #pop [&#60;modifiers&#62;]
  #popcopy [&#60;modifiers&#62;]
</pre>
These commands are used to manipulate the stack of compiled patterns, as
described in the section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
<pre>
  #save &#60;filename&#62;
</pre>
This command is used to save a set of compiled patterns to a file, as described
in the section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
<pre>
  #subject &#60;modifier-list&#62;
</pre>
This command sets a default modifier list that applies to all subsequent
subject lines. Modifiers on a subject line can change these settings.
</P>
<br><a name="SEC7" href="#TOC1">MODIFIER SYNTAX</a><br>
<P>
Modifier lists are used with both pattern and subject lines. Items in a list
are separated by commas followed by optional white space. Trailing white space
in a modifier list is ignored. Some modifiers may be given for both patterns
and subject lines, whereas others are valid only for one or the other. Each
modifier has a long name, for example "anchored", and some of them must be
followed by an equals sign and a value, for example, "offset=12". Values cannot
contain comma characters, but may contain spaces.
Modifiers that do not take
values may be preceded by a minus sign to turn off a previous setting.
</P>
<P>
A few of the more common modifiers can also be specified as single letters, for
example "i" for "caseless". In documentation, following the Perl convention,
these are written with a slash ("the /i modifier") for clarity. Abbreviated
modifiers must all be concatenated in the first item of a modifier list. If the
first item is not recognized as a long modifier name, it is interpreted as a
sequence of these abbreviations. For example:
<pre>
  /abc/ig,newline=cr,jit=3
</pre>
This is a pattern line whose modifier list starts with two one-letter modifiers
(/i and /g). The lower-case abbreviated modifiers are the same as used in Perl.
</P>
<br><a name="SEC8" href="#TOC1">PATTERN SYNTAX</a><br>
<P>
A pattern line must start with one of the following characters (common symbols,
excluding pattern meta-characters):
<pre>
  / ! " ' ` - = _ : ; , % & @ ~
</pre>
This is interpreted as the pattern's delimiter. A regular expression may be
continued over several input lines, in which case the newline characters are
included within it.
It is possible to include the delimiter as a literal within
the pattern by escaping it with a backslash, for example
<pre>
  /abc\/def/
</pre>
If you do this, the escape and the delimiter form part of the pattern, but
since the delimiters are all non-alphanumeric, the inclusion of the backslash
does not affect the pattern's interpretation. Note, however, that this trick
does not work within \Q...\E literal bracketing because the backslash will
itself be interpreted as a literal. If the terminating delimiter is immediately
followed by a backslash, for example,
<pre>
  /abc/\
</pre>
a backslash is added to the end of the pattern. This is done to provide a way
of testing the error condition that arises if a pattern finishes with a
backslash, because
<pre>
  /abc\/
</pre>
is interpreted as the first line of a pattern that starts with "abc/", causing
pcre2test to read the next line as a continuation of the regular expression.
</P>
<P>
A pattern can be followed by a modifier list (details below).
</P>
<br><a name="SEC9" href="#TOC1">SUBJECT LINE SYNTAX</a><br>
<P>
Before each subject line is passed to <b>pcre2_match()</b>,
<b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>, leading and trailing white
space is removed, and the line is scanned for backslash escapes, unless the
<b>subject_literal</b> modifier was set for the pattern. The following provide a
means of encoding non-printing characters in a visible way:
<pre>
  \a         alarm (BEL, \x07)
  \b         backspace (\x08)
  \e         escape (\x1b)
  \f         form feed (\x0c)
  \n         newline (\x0a)
  \r         carriage return (\x0d)
  \t         tab (\x09)
  \v         vertical tab (\x0b)
  \nnn       octal character (up to 3 octal digits); always
               a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
  \o{dd...}  octal character (any number of octal digits)
  \xhh       hexadecimal byte (up to 2 hex digits)
  \x{hh...}  hexadecimal character (any number of hex digits)
</pre>
The use of \x{hh...} is not dependent on the use of the <b>utf</b> modifier on
the pattern. It is always recognized. There may be any number of hexadecimal
digits inside the braces; invalid values provoke error messages.
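As an illustration only, the escape forms listed above can be sketched in Python. This is not how <b>pcre2test</b> is implemented (it is written in C); the names SIMPLE, TOKEN and decode_subject are hypothetical, the function returns plain integer values without applying any UTF or code unit width rules, and sequences such as \= and \[...]{...} are deliberately not handled.

```python
import re

# Single-letter escapes from the table above (ESC is code 0x1B).
SIMPLE = {"a": 0x07, "b": 0x08, "e": 0x1B, "f": 0x0C,
          "n": 0x0A, "r": 0x0D, "t": 0x09, "v": 0x0B}

# One alternative per escape form; the final branch takes any other character.
TOKEN = re.compile(
    r"\\x\{([0-9A-Fa-f]+)\}"     # \x{hh...}
    r"|\\x([0-9A-Fa-f]{1,2})"    # \xhh
    r"|\\o\{([0-7]+)\}"          # \o{dd...}
    r"|\\([0-7]{1,3})"           # \nnn
    r"|\\([abefnrtv])"           # \a \b \e \f \n \r \t \v
    r"|(.)",
    re.DOTALL,
)


def decode_subject(line):
    # Return the list of integer values that the escapes in `line` denote.
    values = []
    for m in TOKEN.finditer(line):
        hex_braced, hex_short, oct_braced, oct_short, letter, other = m.groups()
        if hex_braced is not None:
            values.append(int(hex_braced, 16))
        elif hex_short is not None:
            values.append(int(hex_short, 16))
        elif oct_braced is not None:
            values.append(int(oct_braced, 8))
        elif oct_short is not None:
            values.append(int(oct_short, 8))
        elif letter is not None:
            values.append(SIMPLE[letter])
        elif other == "\\":
            # Any other backslash sequence is outside the scope of this sketch.
            raise ValueError("escape not handled by this sketch")
        else:
            values.append(ord(other))
    return values


print(decode_subject(r"a\tb\x{100}\e"))
```

The sketch keeps each escape form as a separate regular expression branch so that the correspondence with the table is easy to check by eye.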
</P>
<P>
Note that \xhh specifies one byte rather than one character in UTF-8 mode;
this makes it possible to construct invalid UTF-8 sequences for testing
purposes. On the other hand, \x{hh} is interpreted as a UTF-8 character in
UTF-8 mode, generating more than one byte if the value is greater than 127.
When testing the 8-bit library not in UTF-8 mode, \x{hh} generates one byte
for values less than 256, and causes an error for greater values.
</P>
<P>
In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
possible to construct invalid UTF-16 sequences for testing purposes.
</P>
<P>
In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it
possible to construct invalid UTF-32 sequences for testing purposes.
</P>
<P>
There is a special backslash sequence that specifies replication of one or more
characters:
<pre>
  \[<characters>]{<count>}
</pre>
This makes it possible to test long strings without having to provide them as
part of the file. For example:
<pre>
  \[abc]{4}
</pre>
is converted to "abcabcabcabc". This feature does not support nesting. To
include a closing square bracket in the characters, code it as \x5D.
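The replication rule can be illustrated with a minimal Python sketch. It only mirrors the behaviour described here and is not the <b>pcre2test</b> implementation; the helper name expand_replication is hypothetical, and other backslash escapes (including \x5D) are left untouched by it.

```python
import re

# Matches \[<characters>]{<count>}; <characters> may not contain ']',
# which is why nesting is not possible.
_REPLICATION = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")


def expand_replication(subject):
    # Replace each replication item with <characters> repeated <count> times.
    return _REPLICATION.sub(lambda m: m.group(1) * int(m.group(2)), subject)


print(expand_replication(r"\[abc]{4}"))           # abcabcabcabc
print(len(expand_replication(r"x\[ab]{1000}y")))  # 2002
```

The second call shows the intended use: a 2002-character subject is described by a 13-character input line.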
</P>
<P>
A backslash followed by an equals sign marks the end of the subject string and
the start of a modifier list. For example:
<pre>
  abc\=notbol,notempty
</pre>
If the subject string is empty and \= is followed by whitespace, the line is
treated as a comment line, and is not used for matching. For example:
<pre>
  \= This is a comment.
  abc\= This is an invalid modifier list.
</pre>
A backslash followed by any other non-alphanumeric character just escapes that
character. A backslash followed by anything else causes an error. However, if
the very last character in the line is a backslash (and there is no modifier
list), it is ignored. This gives a way of passing an empty line as data, since
a real empty line terminates the data input.
</P>
<P>
If the <b>subject_literal</b> modifier is set for a pattern, all subject lines
that follow are treated as literals, with no special treatment of backslashes.
No replication is possible, and any subject modifiers must be set as defaults
by a <b>#subject</b> command.
</P>
<br><a name="SEC10" href="#TOC1">PATTERN MODIFIERS</a><br>
<P>
There are several types of modifier that can appear in pattern lines.
Except
where noted below, they may also be used in <b>#pattern</b> commands. A
pattern's modifier list can add to or override default modifiers that were set
by a previous <b>#pattern</b> command.
<a name="optionmodifiers"></a></P>
<br><b>
Setting compilation options
</b><br>
<P>
The following modifiers set options for <b>pcre2_compile()</b>. Most of them set
bits in the options argument of that function, but those whose names start with
PCRE2_EXTRA are additional options that are set in the compile context.
Some of these options have single-letter abbreviations. There is special
handling for /x: if a second x is present, PCRE2_EXTENDED is converted into
PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EXTENDED as well,
though this makes no difference to the way <b>pcre2_compile()</b> behaves. See
<a href="pcre2api.html"><b>pcre2api</b></a>
for a description of the effects of these options.
<pre>
      allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
      allow_lookaround_bsk      set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
      allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
      alt_bsux                  set PCRE2_ALT_BSUX
      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
      alt_verbnames             set PCRE2_ALT_VERBNAMES
      anchored                  set PCRE2_ANCHORED
  /a  ascii_all                 set all ASCII options
      ascii_bsd                 set PCRE2_EXTRA_ASCII_BSD
      ascii_bss                 set PCRE2_EXTRA_ASCII_BSS
      ascii_bsw                 set PCRE2_EXTRA_ASCII_BSW
      ascii_digit               set PCRE2_EXTRA_ASCII_DIGIT
      ascii_posix               set PCRE2_EXTRA_ASCII_POSIX
      auto_callout              set PCRE2_AUTO_CALLOUT
      bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
  /i  caseless                  set PCRE2_CASELESS
  /r  caseless_restrict         set PCRE2_EXTRA_CASELESS_RESTRICT
      dollar_endonly            set PCRE2_DOLLAR_ENDONLY
  /s  dotall                    set PCRE2_DOTALL
      dupnames                  set PCRE2_DUPNAMES
      endanchored               set PCRE2_ENDANCHORED
      escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF
  /x  extended                  set PCRE2_EXTENDED
  /xx extended_more             set PCRE2_EXTENDED_MORE
      extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX
      firstline                 set PCRE2_FIRSTLINE
      literal                   set PCRE2_LITERAL
      match_line                set PCRE2_EXTRA_MATCH_LINE
      match_invalid_utf         set PCRE2_MATCH_INVALID_UTF
      match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
      match_word                set PCRE2_EXTRA_MATCH_WORD
  /m  multiline                 set PCRE2_MULTILINE
      never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
      never_ucp                 set PCRE2_NEVER_UCP
      never_utf                 set PCRE2_NEVER_UTF
  /n  no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
      no_auto_possess           set PCRE2_NO_AUTO_POSSESS
      no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
      no_start_optimize         set PCRE2_NO_START_OPTIMIZE
      no_utf_check              set PCRE2_NO_UTF_CHECK
      ucp                       set PCRE2_UCP
      ungreedy                  set PCRE2_UNGREEDY
      use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
      utf                       set PCRE2_UTF
</pre>
As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all
non-printing characters in output strings to be printed using the \x{hh...}
notation. Otherwise, those less than 0x100 are output in hex without the curly
brackets. Setting <b>utf</b> in 16-bit or 32-bit mode also causes pattern and
subject strings to be translated to UTF-16 or UTF-32, respectively, before
being passed to library functions.
<a name="controlmodifiers"></a></P>
<br><b>
Setting compilation controls
</b><br>
<P>
The following modifiers affect the compilation process or request information
about the pattern.
There are single-letter abbreviations for some that are
heavily used in the test files.
<pre>
      bsr=[anycrlf|unicode]     specify \R handling
  /B  bincode                   show binary code without lengths
      callout_info              show callout information
      convert=<options>         request foreign pattern conversion
      convert_glob_escape=c     set glob escape character
      convert_glob_separator=c  set glob separator character
      convert_length            set convert buffer length
      debug                     same as info,fullbincode
      framesize                 show matching frame size
      fullbincode               show binary code with lengths
  /I  info                      show info about compiled pattern
      hex                       unquoted characters are hexadecimal
      jit[=<number>]            use JIT
      jitfast                   use JIT fast path
      jitverify                 verify JIT use
      locale=<name>             use this locale
      max_pattern_compiled      ) set maximum compiled pattern
        _length=<n>             )   length (bytes)
      max_pattern_length=<n>    set maximum pattern length (code units)
      max_varlookbehind=<n>     set maximum variable lookbehind length
      memory                    show memory used
      newline=<type>            set newline type
      null_context              compile with a NULL context
      null_pattern              pass pattern as NULL
      parens_nest_limit=<n>     set maximum parentheses depth
      posix                     use the POSIX API
      posix_nosub               use the POSIX API with REG_NOSUB
      push                      push compiled pattern onto the stack
      pushcopy                  push a copy onto the stack
      stackguard=<number>       test the stackguard feature
      subject_literal           treat all subject lines as literal
      tables=[0|1|2|3]          select internal tables
      use_length                do not zero-terminate the pattern
      utf8_input                treat input as UTF-8
</pre>
The effects of these modifiers are described in the following sections.
</P>
<br><b>
Newline and \R handling
</b><br>
<P>
The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
\R matches any Unicode newline sequence. The default can be specified when
PCRE2 is built; if it is not, the default is set to Unicode.
</P>
<P>
The <b>newline</b> modifier specifies which characters are to be interpreted as
newlines, both in the pattern and in subject lines. The type must be one of CR,
LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case).
</P>
<br><b>
Information about a pattern
</b><br>
<P>
The <b>debug</b> modifier is a shorthand for <b>info,fullbincode</b>, requesting
all available information.
</P>
<P>
The <b>bincode</b> modifier causes a representation of the compiled code to be
output after compilation. This information does not contain length and offset
values, which ensures that the same output is generated for different internal
link sizes and different code unit widths. By using <b>bincode</b>, the same
regression tests can be used in different environments.
</P>
<P>
The <b>fullbincode</b> modifier, by contrast, <i>does</i> include length and
offset values. This is used in a few special tests that run only for specific
code unit widths and link sizes, and is also useful for one-off tests.
</P>
<P>
The <b>info</b> modifier requests information about the compiled pattern
(whether it is anchored, has a fixed first character, and so on). The
information is obtained from the <b>pcre2_pattern_info()</b> function.
Here are
some typical examples:
<pre>
    re> /(?i)(^a|^b)/m,info
  Capture group count = 1
  Compile options: multiline
  Overall options: caseless multiline
  First code unit at start or follows newline
  Subject length lower bound = 1

    re> /(?i)abc/info
  Capture group count = 0
  Compile options: <none>
  Overall options: caseless
  First code unit = 'a' (caseless)
  Last code unit = 'c' (caseless)
  Subject length lower bound = 3
</pre>
"Compile options" are those specified by modifiers; "overall options" have
added options that are taken or deduced from the pattern. If both sets of
options are the same, just a single "options" line is output; if there are no
options, the line is omitted. "First code unit" is where any match must start;
if there is more than one they are listed as "starting code units". "Last code
unit" is the last literal code unit that must be present in any match. This is
not necessarily the last character. These lines are omitted if no starting or
ending code units are recorded. The subject length line is omitted when
<b>no_start_optimize</b> is set because the minimum length is not calculated
when it can never be used.
</P>
<P>
The <b>framesize</b> modifier shows the size, in bytes, of each storage frame
used by <b>pcre2_match()</b> for handling backtracking. The size depends on the
number of capturing parentheses in the pattern. A vector of these frames is
used at matching time; its overall size is shown when the <b>heapframes_size</b>
subject modifier is set.
</P>
<P>
The <b>callout_info</b> modifier requests information about all the callouts in
the pattern. A list of them is output at the end of any other information that
is requested. For each callout, either its number or string is given, followed
by the item that follows it in the pattern.
</P>
<br><b>
Passing a NULL context
</b><br>
<P>
Normally, <b>pcre2test</b> passes a context block to <b>pcre2_compile()</b>. If
the <b>null_context</b> modifier is set, however, NULL is passed. This is for
testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses
default values).
</P>
<br><b>
Passing a NULL pattern
</b><br>
<P>
The <b>null_pattern</b> modifier is for testing the behaviour of
<b>pcre2_compile()</b> when the pattern argument is NULL.
The length value
passed is the default PCRE2_ZERO_TERMINATED unless <b>use_length</b> is set.
Any length other than zero causes an error.
</P>
<br><b>
Specifying pattern characters in hexadecimal
</b><br>
<P>
The <b>hex</b> modifier specifies that the characters of the pattern, except for
substrings enclosed in single or double quotes, are to be interpreted as pairs
of hexadecimal digits. This feature is provided as a way of creating patterns
that contain binary zeros and other non-printing characters. White space is
permitted between pairs of digits. For example, this pattern contains three
characters:
<pre>
  /ab 32 59/hex
</pre>
Parts of such a pattern are taken literally if quoted. This pattern contains
nine characters, only two of which are specified in hexadecimal:
<pre>
  /ab "literal" 32/hex
</pre>
Either single or double quotes may be used. There is no way of including
the delimiter within a substring. The <b>hex</b> and <b>expand</b> modifiers are
mutually exclusive.
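The character counts in the two examples can be checked with a short Python sketch of the decoding rule. This is an illustration under stated assumptions, not the <b>pcre2test</b> code: the function name decode_hex_pattern is hypothetical, it takes the text between the delimiters, and it models the 8-bit case by producing bytes (quoted text is encoded as Latin-1).

```python
import string


def decode_hex_pattern(text):
    # Decode the body of a pattern that carries the hex modifier.
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # White space is permitted between pairs of digits.
            i += 1
        elif ch in "'\"":
            # A quoted substring is taken literally up to the matching quote.
            end = text.find(ch, i + 1)
            if end < 0:
                raise ValueError("unterminated quoted substring")
            out += text[i + 1:end].encode("latin-1")
            i = end + 1
        else:
            # Everything else must be a pair of hexadecimal digits.
            pair = text[i:i + 2]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("expected a pair of hexadecimal digits")
            out.append(int(pair, 16))
            i += 2
    return bytes(out)


print(len(decode_hex_pattern('ab 32 59')))         # 3
print(len(decode_hex_pattern('ab "literal" 32')))  # 9
```

Because the result may contain binary zeros, a real caller would have to pass it with an explicit length and not as a zero-terminated string.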
</P>
<br><b>
Specifying the pattern's length
</b><br>
<P>
By default, patterns are passed to the compiling functions as zero-terminated
strings. The <b>use_length</b> modifier causes a pattern to be passed with an
explicit length instead. A length is used automatically (whether or not
<b>use_length</b> is set) when <b>hex</b> is set, because patterns specified in
hexadecimal may contain binary zeros.
</P>
<P>
If <b>hex</b> or <b>use_length</b> is used with the POSIX wrapper API (see
<a href="#posixwrapper">"Using the POSIX wrapper API"</a>
below), the REG_PEND extension is used to pass the pattern's length.
</P>
<br><b>
Specifying a maximum for variable lookbehinds
</b><br>
<P>
Variable lookbehind assertions are supported only if, for each one, there is a
maximum length (in characters) that it can match. There is a limit on this,
whose default can be set at build time, with an ultimate default of 255. The
<b>max_varlookbehind</b> modifier uses the <b>pcre2_set_max_varlookbehind()</b>
function to change the limit. Lookbehinds whose branches each match a fixed
length are limited to 65535 characters per branch.
</P>
<br><b>
Specifying wide characters in 16-bit and 32-bit modes
</b><br>
<P>
In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
translated to UTF-16 or UTF-32 when the <b>utf</b> modifier is set. For testing
the 16-bit and 32-bit libraries in non-UTF mode, the <b>utf8_input</b> modifier
can be used. It is mutually exclusive with <b>utf</b>. Input lines are
interpreted as UTF-8 as a means of specifying wide characters. More details are
given in
<a href="#inputencoding">"Input encoding"</a>
above.
</P>
<br><b>
Generating long repetitive patterns
</b><br>
<P>
Some tests use long patterns that are very repetitive. Instead of creating a
very long input line for such a pattern, you can use a special repetition
feature, similar to the one described for subject lines above. If the
<b>expand</b> modifier is present on a pattern, parts of the pattern that have
the form
<pre>
  \[<characters>]{<count>}
</pre>
are expanded before the pattern is passed to <b>pcre2_compile()</b>. For
example, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
cannot be nested.
An initial "\[" sequence is recognized only if "]{" followed
by decimal digits and "}" is found later in the pattern. If not, the characters
remain in the pattern unaltered. The <b>expand</b> and <b>hex</b> modifiers are
mutually exclusive.
</P>
<P>
If part of an expanded pattern looks like an expansion, but is really part of
the actual pattern, unwanted expansion can be avoided by giving two values in
the quantifier. For example, \[AB]{6000,6000} is not recognized as an
expansion item.
</P>
<P>
If the <b>info</b> modifier is set on an expanded pattern, the result of the
expansion is included in the information that is output.
</P>
<br><b>
JIT compilation
</b><br>
<P>
Just-in-time (JIT) compiling is a heavyweight optimization that can greatly
speed up pattern matching. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for details. JIT compiling happens, optionally, after a pattern
has been successfully compiled into an internal form. The JIT compiler converts
this to optimized machine code. It needs to know whether the match-time options
PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used, because
different code is generated for the different cases.
See the <b>partial</b>
modifier in "Subject Modifiers"
<a href="#subjectmodifiers">below</a>
for details of how these options are specified for each match attempt.
</P>
<P>
JIT compilation is requested by the <b>jit</b> pattern modifier, which may
optionally be followed by an equals sign and a number in the range 0 to 7.
The three bits that make up the number specify which of the three JIT operating
modes are to be compiled:
<pre>
  1  compile JIT code for non-partial matching
  2  compile JIT code for soft partial matching
  4  compile JIT code for hard partial matching
</pre>
The possible values for the <b>jit</b> modifier are therefore:
<pre>
  0  disable JIT
  1  normal matching only
  2  soft partial matching only
  3  normal and soft partial matching
  4  hard partial matching only
  5  normal and hard partial matching
  6  soft and hard partial matching only
  7  all three modes
</pre>
If no number is given, 7 is assumed. The phrase "partial matching" means a call
to <b>pcre2_match()</b> with either the PCRE2_PARTIAL_SOFT or the
PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
match; the options enable the possibility of a partial match, but do not
require it.
Note also that if you request JIT compilation only for partial
matching (for example, jit=2) but do not set the <b>partial</b> modifier on a
subject line, that match will not use JIT code because none was compiled for
non-partial matching.
</P>
<P>
If JIT compilation is successful, the compiled JIT code will automatically be
used when an appropriate type of match is run, except when incompatible
run-time options are specified. For more details, see the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation. See also the <b>jitstack</b> modifier below for a way of
setting the size of the JIT stack.
</P>
<P>
If the <b>jitfast</b> modifier is specified, matching is done using the JIT
"fast path" interface, <b>pcre2_jit_match()</b>, which skips some of the sanity
checks that are done by <b>pcre2_match()</b>, and of course does not work when
JIT is not supported. If <b>jitfast</b> is specified without <b>jit</b>, jit=7 is
assumed.
</P>
<P>
If the <b>jitverify</b> modifier is specified, information about the compiled
pattern shows whether JIT compilation was or was not successful. If
<b>jitverify</b> is specified without <b>jit</b>, jit=7 is assumed.
If JIT
compilation is successful when <b>jitverify</b> is set, the text "(JIT)" is
added to the first output line after a match or non match when JIT-compiled
code was actually used in the match.
</P>
<br><b>
Setting a locale
</b><br>
<P>
The <b>locale</b> modifier must specify the name of a locale, for example:
<pre>
  /pattern/locale=fr_FR
</pre>
The given locale is set, <b>pcre2_maketables()</b> is called to build a set of
character tables for the locale, and this is then passed to
<b>pcre2_compile()</b> when compiling the regular expression. The same tables
are used when matching the following subject lines. The <b>locale</b> modifier
applies only to the pattern on which it appears, but can be given in a
<b>#pattern</b> command if a default is needed. Setting a locale and alternate
character tables are mutually exclusive.
</P>
<br><b>
Showing pattern memory
</b><br>
<P>
The <b>memory</b> modifier causes the size in bytes of the memory used to hold
the compiled pattern to be output. This does not include the size of the
<b>pcre2_code</b> block; it is just the actual compiled data. If the pattern is
subsequently passed to the JIT compiler, the size of the JIT compiled code is
also output.
Here is an example:
<pre>
    re&gt; /a(b)c/jit,memory
  Memory allocation (code space): 21
  Memory allocation (JIT code): 1910

</pre>
</P>
<br><b>
Limiting nested parentheses
</b><br>
<P>
The <b>parens_nest_limit</b> modifier sets a limit on the depth of nested
parentheses in a pattern. Breaching the limit causes a compilation error.
The default for the library is set when PCRE2 is built, but <b>pcre2test</b>
sets its own default of 220, which is required for running the standard test
suite.
</P>
<br><b>
Limiting the pattern length
</b><br>
<P>
The <b>max_pattern_length</b> modifier sets a limit, in code units, to the
length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit
causes a compilation error. The default is the largest number a PCRE2_SIZE
variable can hold (essentially unlimited).
</P>
<br><b>
Limiting the size of a compiled pattern
</b><br>
<P>
The <b>max_pattern_compiled_length</b> modifier sets a limit, in bytes, to the
amount of memory used by a compiled pattern. Breaching the limit causes a
compilation error.
The default is the largest number a PCRE2_SIZE variable can
hold (essentially unlimited).
<a name="posixwrapper"></a></P>
<br><b>
Using the POSIX wrapper API
</b><br>
<P>
The <b>posix</b> and <b>posix_nosub</b> modifiers cause <b>pcre2test</b> to call
PCRE2 via the POSIX wrapper API rather than its native API. When
<b>posix_nosub</b> is used, the POSIX option REG_NOSUB is passed to
<b>regcomp()</b>. The POSIX wrapper supports only the 8-bit library. Note that
it does not imply POSIX matching semantics; for more detail see the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation. The following pattern modifiers set options for the
<b>regcomp()</b> function:
<pre>
  caseless           REG_ICASE
  multiline          REG_NEWLINE
  dotall             REG_DOTALL     )
  ungreedy           REG_UNGREEDY   ) These options are not part of
  ucp                REG_UCP        )   the POSIX standard
  utf                REG_UTF8       )
</pre>
The <b>regerror_buffsize</b> modifier specifies a size for the error buffer that
is passed to <b>regerror()</b> in the event of a compilation error. For example:
<pre>
  /abc/posix,regerror_buffsize=20
</pre>
This provides a means of testing the behaviour of <b>regerror()</b> when the
buffer is too small for the error message.
If this modifier has not been set, a
large buffer is used.
</P>
<P>
The <b>aftertext</b> and <b>allaftertext</b> subject modifiers work as described
below. All other modifiers are either ignored, with a warning message, or cause
an error.
</P>
<P>
The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
default, but if the <b>use_length</b> or <b>hex</b> modifier is set, the
REG_PEND extension is used to pass it by length.
</P>
<br><b>
Testing the stack guard feature
</b><br>
<P>
The <b>stackguard</b> modifier is used to test the use of
<b>pcre2_set_compile_recursion_guard()</b>, a function that is provided to
enable stack availability to be checked during compilation (see the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation for details). If the number specified by the modifier is greater
than zero, <b>pcre2_set_compile_recursion_guard()</b> is called to set up
a callback from <b>pcre2_compile()</b> to a local function. The argument it
receives is the current parenthesis nesting depth; if this is greater than the
value given by the modifier, non-zero is returned, causing the compilation to
be aborted.
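As a sketch only, and assuming that the depth passed to the callback counts
enclosing parentheses in the obvious way, a pattern line such as the following
(an arbitrary pattern, not taken from the test suite) would be expected to fail
to compile, because its nesting goes deeper than the given value of 2:
<pre>
  /((((a))))/stackguard=2
</pre>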
</P>
<br><b>
Using alternative character tables
</b><br>
<P>
The value specified for the <b>tables</b> modifier must be one of the digits 0,
1, 2, or 3. It causes a specific set of built-in character tables to be passed
to <b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour
with different character tables. The digit specifies the tables as follows:
<pre>
  0   do not pass any special character tables
  1   the default ASCII tables, as distributed in
        pcre2_chartables.c.dist
  2   a set of tables defining ISO 8859 characters
  3   a set of tables loaded by the #loadtables command
</pre>
In tables 2, some characters whose codes are greater than 128 are identified as
letters, digits, spaces, etc. Tables 3 can be used only after a
<b>#loadtables</b> command has loaded them from a binary file. Setting alternate
character tables and a locale are mutually exclusive.
</P>
<br><b>
Setting certain match controls
</b><br>
<P>
The following modifiers are really subject modifiers, and are described under
"Subject Modifiers" below.
However, they may be included in a pattern's
modifier list, in which case they are applied to every subject line that is
processed with that pattern. These modifiers do not affect the compilation
process.
<pre>
      aftertext                   show text after match
      allaftertext                show text after captures
      allcaptures                 show all captures
      allvector                   show the entire ovector
      allusedtext                 show all consulted text
      altglobal                   alternative global matching
  /g  global                      global matching
      heapframes_size             show match data heapframes size
      jitstack=&lt;n&gt;                set size of JIT stack
      mark                        show mark values
      replace=&lt;string&gt;            specify a replacement string
      startchar                   show starting character when relevant
      substitute_callout          use substitution callouts
      substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
      substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
      substitute_matched          use PCRE2_SUBSTITUTE_MATCHED
      substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
      substitute_skip=&lt;n&gt;         skip substitution &lt;n&gt;
      substitute_stop=&lt;n&gt;         skip substitution &lt;n&gt; and following
      substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
      substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY
</pre>
These
modifiers may not appear in a <b>#pattern</b> command. If you want them as
defaults, set them in a <b>#subject</b> command.
</P>
<br><b>
Specifying literal subject lines
</b><br>
<P>
If the <b>subject_literal</b> modifier is present on a pattern, all the subject
lines that it matches are taken as literal strings, with no interpretation of
backslashes. It is not possible to set subject modifiers on such lines, but any
that are set as defaults by a <b>#subject</b> command are recognized.
</P>
<br><b>
Saving a compiled pattern
</b><br>
<P>
When a pattern with the <b>push</b> modifier is successfully compiled, it is
pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
line to contain a new pattern (or a command) instead of a subject line. This
facility is used when saving compiled patterns to a file, as described in the
section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled
pattern is stacked, leaving the original as current, ready to match the
following input lines. This provides a way of testing the
<b>pcre2_code_copy()</b> function.
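As an illustration only (the patterns and the subject are arbitrary), the
first pattern below is stacked and must be followed by another pattern or a
command, whereas the second is stacked as a copy, so that a subject line can
follow it:
<pre>
  /abc/push
  /xyz/pushcopy
    xyz123
</pre>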
The <b>push</b> and <b>pushcopy</b> modifiers are incompatible with
pattern-line modifiers such as <b>global</b> that act at match time. Any that are
specified are ignored (for the stacked copy), with a warning message, except for
<b>replace</b>, which causes an error. Note that <b>jitverify</b>, which is
allowed, does not carry through to any subsequent matching that uses a stacked
pattern.
</P>
<br><b>
Testing foreign pattern conversion
</b><br>
<P>
The experimental foreign pattern conversion functions in PCRE2 can be tested by
setting the <b>convert</b> modifier. Its argument is a colon-separated list of
options, which set the equivalent option for the <b>pcre2_pattern_convert()</b>
function:
<pre>
  glob                    PCRE2_CONVERT_GLOB
  glob_no_starstar        PCRE2_CONVERT_GLOB_NO_STARSTAR
  glob_no_wild_separator  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
  posix_basic             PCRE2_CONVERT_POSIX_BASIC
  posix_extended          PCRE2_CONVERT_POSIX_EXTENDED
  unset                   Unset all options
</pre>
The "unset" value is useful for turning off a default that has been set by a
<b>#pattern</b> command. When one of these options is set, the input pattern is
passed to <b>pcre2_pattern_convert()</b>.
If the conversion is successful, the
result is reflected in the output and then passed to <b>pcre2_compile()</b>. The
normal <b>utf</b> and <b>no_utf_check</b> options, if set, cause the
PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be passed to
<b>pcre2_pattern_convert()</b>.
</P>
<P>
By default, the conversion function is allowed to allocate a buffer for its
output. However, if the <b>convert_length</b> modifier is set to a value greater
than zero, <b>pcre2test</b> passes a buffer of the given length. This makes it
possible to test the length check.
</P>
<P>
The <b>convert_glob_escape</b> and <b>convert_glob_separator</b> modifiers can be
used to specify the escape and separator characters for glob processing,
overriding the defaults, which are operating-system dependent.
<a name="subjectmodifiers"></a></P>
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
<P>
The modifiers that can appear in subject lines and the <b>#subject</b>
command are of two types.
</P>
<br><b>
Setting match options
</b><br>
<P>
The following modifiers set options for <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b>.
See
<a href="pcre2api.html"><b>pcre2api</b></a>
for a description of their effects.
<pre>
      anchored                   set PCRE2_ANCHORED
      endanchored                set PCRE2_ENDANCHORED
      dfa_restart                set PCRE2_DFA_RESTART
      dfa_shortest               set PCRE2_DFA_SHORTEST
      disable_recurseloop_check  set PCRE2_DISABLE_RECURSELOOP_CHECK
      no_jit                     set PCRE2_NO_JIT
      no_utf_check               set PCRE2_NO_UTF_CHECK
      notbol                     set PCRE2_NOTBOL
      notempty                   set PCRE2_NOTEMPTY
      notempty_atstart           set PCRE2_NOTEMPTY_ATSTART
      noteol                     set PCRE2_NOTEOL
      partial_hard (or ph)       set PCRE2_PARTIAL_HARD
      partial_soft (or ps)       set PCRE2_PARTIAL_SOFT
</pre>
The partial matching modifiers are provided with abbreviations because they
appear frequently in tests.
</P>
<P>
If the <b>posix</b> or <b>posix_nosub</b> modifier was present on the pattern,
causing the POSIX wrapper API to be used, the only option-setting modifiers
that have any effect are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>,
causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
<b>regexec()</b>. The other modifiers are ignored, with a warning message.
</P>
<P>
There is one additional modifier that can be used with the POSIX wrapper.
It is
ignored (with a warning) if used for non-POSIX matching.
<pre>
      posix_startend=&lt;n&gt;[:&lt;m&gt;]
</pre>
This causes the subject string to be passed to <b>regexec()</b> using the
REG_STARTEND option, which uses offsets to specify which part of the string is
searched. If only one number is given, the end offset is passed as the end of
the subject string. For more detail of REG_STARTEND, see the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation. If the subject string contains binary zeros (coded as escapes
such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
its input), you must use <b>posix_startend</b> to specify its length.
</P>
<br><b>
Setting match controls
</b><br>
<P>
The following modifiers affect the matching process or request additional
information. Some of them may also be specified on a pattern line (see above),
in which case they apply to every subject line that is matched against that
pattern, but can be overridden by modifiers on the subject.
<pre>
      aftertext                   show text after match
      allaftertext                show text after captures
      allcaptures                 show all captures
      allvector                   show the entire ovector
      allusedtext                 show all consulted text (non-JIT only)
      altglobal                   alternative global matching
      callout_capture             show captures at callout time
      callout_data=&lt;n&gt;            set a value to pass via callouts
      callout_error=&lt;n&gt;[:&lt;m&gt;]     control callout error
      callout_extra               show extra callout information
      callout_fail=&lt;n&gt;[:&lt;m&gt;]      control callout failure
      callout_no_where            do not show position of a callout
      callout_none                do not supply a callout function
      copy=&lt;number or name&gt;       copy captured substring
      depth_limit=&lt;n&gt;             set a depth limit
      dfa                         use <b>pcre2_dfa_match()</b>
      find_limits                 find heap, match and depth limits
      find_limits_noheap          find match and depth limits
      get=&lt;number or name&gt;        extract captured substring
      getall                      extract all captured substrings
  /g  global                      global matching
      heapframes_size             show match data heapframes size
      heap_limit=&lt;n&gt;              set a limit on heap memory (Kbytes)
      jitstack=&lt;n&gt;                set size of JIT stack
      mark                        show mark values
      match_limit=&lt;n&gt;             set a match limit
      memory                      show heap memory usage
      null_context                match with a NULL context
      null_replacement            substitute with NULL replacement
      null_subject                match with NULL subject
      offset=&lt;n&gt;                  set starting offset
      offset_limit=&lt;n&gt;            set offset limit
      ovector=&lt;n&gt;                 set size of output vector
      recursion_limit=&lt;n&gt;         obsolete synonym for depth_limit
      replace=&lt;string&gt;            specify a replacement string
      startchar                   show startchar when relevant
      startoffset=&lt;n&gt;             same as offset=&lt;n&gt;
      substitute_callout          use substitution callouts
      substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
      substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
      substitute_matched          use PCRE2_SUBSTITUTE_MATCHED
      substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
      substitute_skip=&lt;n&gt;         skip substitution number n
      substitute_stop=&lt;n&gt;         skip substitution number n and greater
      substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
      substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY
      zero_terminate              pass the subject as zero-terminated
</pre>
The effects of these modifiers are described in the following sections. When
matching via the POSIX wrapper API, the <b>aftertext</b>, <b>allaftertext</b>,
and <b>ovector</b> subject modifiers work as described below. All other
modifiers are either ignored, with a warning message, or cause an error.
</P>
<br><b>
Showing more text
</b><br>
<P>
The <b>aftertext</b> modifier requests that as well as outputting the part of
the subject string that matched the entire pattern, <b>pcre2test</b> should in
addition output the remainder of the subject string. This is useful for tests
where the subject contains multiple copies of the same substring. The
<b>allaftertext</b> modifier requests the same action for captured substrings as
well as the main matched substring. In each case the remainder is output on the
following line with a plus character following the capture number.
</P>
<P>
The <b>allusedtext</b> modifier requests that all the text that was consulted
during a successful pattern match by the interpreter should be shown, for both
full and partial matches. This feature is not supported for JIT matching, and
if requested with JIT it is ignored (with a warning message). Setting this
modifier affects the output if there is a lookbehind at the start of a match,
or, for a complete match, a lookahead at the end, or if \K is used in the
pattern. Characters that precede or follow the start and end of the actual
match are indicated in the output by '&lt;' or '&gt;' characters underneath them.
Here is an example:
<pre>
    re&gt; /(?&lt;=pqr)abc(?=xyz)/
  data&gt; 123pqrabcxyz456\=allusedtext
   0: pqrabcxyz
      &lt;&lt;&lt;   &gt;&gt;&gt;
  data&gt; 123pqrabcxy\=ph,allusedtext
  Partial match: pqrabcxy
                 &lt;&lt;&lt;
</pre>
The first, complete match shows that the matched string is "abc", with the
preceding and following strings "pqr" and "xyz" having been consulted during
the match (when processing the assertions). The partial match can indicate only
the preceding string.
</P>
<P>
The <b>startchar</b> modifier requests that the starting character for the match
be indicated, if it is different to the start of the matched string. The only
time when this occurs is when \K has been processed as part of the match. In
this situation, the output for the matched string is displayed from the
starting character instead of from the match point, with circumflex characters
under the earlier characters. For example:
<pre>
    re&gt; /abc\Kxyz/
  data&gt; abcxyz\=startchar
   0: abcxyz
      ^^^
</pre>
Unlike <b>allusedtext</b>, the <b>startchar</b> modifier can be used with JIT.
However, these two modifiers are mutually exclusive.
</P>
<br><b>
Showing the value of all capture groups
</b><br>
<P>
The <b>allcaptures</b> modifier requests that the values of all potential
captured parentheses be output after a match. By default, only those up to the
highest one actually used in the match are output (corresponding to the return
code from <b>pcre2_match()</b>). Groups that did not take part in the match
are output as "&lt;unset&gt;". This modifier is not relevant for DFA matching (which
does no capturing) and does not apply when <b>replace</b> is specified; it is
ignored, with a warning message, if present.
</P>
<br><b>
Showing the entire ovector, for all outcomes
</b><br>
<P>
The <b>allvector</b> modifier requests that the entire ovector be shown,
whatever the outcome of the match. Compare <b>allcaptures</b>, which shows only
up to the maximum number of capture groups for the pattern, and then only for a
successful complete non-DFA match. This modifier, which acts after any match
result, and also for DFA matching, provides a means of checking that there are
no unexpected modifications to ovector fields. Before each match attempt, the
ovector is filled with a special value, and if this is found in both elements
of a capturing pair, "&lt;unchanged&gt;" is output.
After a successful match, this
applies to all groups after the maximum capture group for the pattern. In other
cases it applies to the entire ovector. After a partial match, the first two
elements are the only ones that should be set. After a DFA match, the amount of
ovector that is used depends on the number of matches that were found.
</P>
<br><b>
Testing pattern callouts
</b><br>
<P>
A callout function is supplied when <b>pcre2test</b> calls the library matching
functions, unless <b>callout_none</b> is specified. Its behaviour can be
controlled by various modifiers listed above whose names begin with
<b>callout_</b>. Details are given in the section entitled "Callouts"
<a href="#callouts">below.</a>
Testing callouts from <b>pcre2_substitute()</b> is described separately in
"Testing the substitution function"
<a href="#substitution">below.</a>
</P>
<br><b>
Finding all matches in a string
</b><br>
<P>
Searching for all possible matches within a subject can be requested by the
<b>global</b> or <b>altglobal</b> modifier. After finding a match, the matching
function is called again to search the remainder of the subject.
The difference
between <b>global</b> and <b>altglobal</b> is that the former uses the
<i>start_offset</i> argument to <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
to start searching at a new point within the entire string (which is what Perl
does), whereas the latter passes over a shortened subject. This makes a
difference to the matching process if the pattern begins with a lookbehind
assertion (including \b or \B).
</P>
<P>
If an empty string is matched, the next match is done with the
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
another, non-empty, match at the same point in the subject. If this match
fails, the start offset is advanced, and the normal match is retried. This
imitates the way Perl handles such cases when using the <b>/g</b> modifier or
the <b>split()</b> function. Normally, the start offset is advanced by one
character, but if the newline convention recognizes CRLF as a newline, and the
current character is CR followed by LF, an advance of two characters occurs.
</P>
<br><b>
Testing substring extraction functions
</b><br>
<P>
The <b>copy</b> and <b>get</b> modifiers can be used to test the
<b>pcre2_substring_copy_xxx()</b> and <b>pcre2_substring_get_xxx()</b> functions.
They can be given more than once, and each can specify a capture group name or
number, for example:
<pre>
   abcd\=copy=1,copy=3,get=G1
</pre>
If the <b>#subject</b> command is used to set default copy and/or get lists,
these can be unset by specifying a negative number to cancel all numbered
groups and an empty name to cancel all named groups.
</P>
<P>
The <b>getall</b> modifier tests <b>pcre2_substring_list_get()</b>, which
extracts all captured substrings.
</P>
<P>
If the subject line is successfully matched, the substrings extracted by the
convenience functions are output with C, G, or L after the string number
instead of a colon. This is in addition to the normal full list. The string
length (that is, the return from the extraction function) is given in
parentheses after each substring, followed by the name when the extraction was
by name.
<a name="substitution"></a></P>
<br><b>
Testing the substitution function
</b><br>
<P>
If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
called instead of one of the matching functions (or after one call of
<b>pcre2_match()</b> in the case of PCRE2_SUBSTITUTE_MATCHED).
Note that
replacement strings cannot contain commas, because a comma signifies the end of
a modifier. This is not thought to be an issue in a test program.
</P>
<P>
Specifying a completely empty replacement string disables this modifier.
However, it is possible to specify an empty replacement by providing a buffer
length, as described below, for an otherwise empty replacement.
</P>
<P>
Unlike subject strings, <b>pcre2test</b> does not process replacement strings
for escape sequences. In UTF mode, a replacement string is checked to see if it
is a valid UTF-8 string. If so, it is correctly converted to a UTF string of
the appropriate code unit width. If it is not a valid UTF-8 string, the
individual code units are copied directly. This provides a means of passing an
invalid UTF-8 string for testing purposes.
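The UTF-mode check described above can be pictured with a short Python sketch. It is only an illustration of the decision, not the C code used by <b>pcre2test</b>; the function name convert_replacement and its width parameter are hypothetical.

```python
def convert_replacement(raw, width=8):
    """Return the code units that would be passed for a replacement string.

    raw   -- the bytes typed after replace= (assumed input for this sketch)
    width -- code unit width of the library under test: 8, 16 or 32
    """
    if width not in (8, 16, 32):
        raise ValueError("width must be 8, 16 or 32")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: copy the individual code units (bytes) unchanged.
        return list(raw)
    if width == 8:
        # Already valid UTF-8, so the bytes are the code units.
        return list(raw)
    if width == 16:
        encoded = text.encode("utf-16-le")
        return [int.from_bytes(encoded[i:i + 2], "little")
                for i in range(0, len(encoded), 2)]
    # 32-bit code units are simply the code points.
    return [ord(ch) for ch in text]
```

Under these assumptions, a valid two-byte UTF-8 character becomes a single 16-bit code unit, whereas an invalid byte such as 0xFF is passed through as it stands.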
</P>
<P>
The following modifiers set options (in addition to the normal match options)
for <b>pcre2_substitute()</b>:
<pre>
  global                      PCRE2_SUBSTITUTE_GLOBAL
  substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
  substitute_literal          PCRE2_SUBSTITUTE_LITERAL
  substitute_matched          PCRE2_SUBSTITUTE_MATCHED
  substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
  substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
  substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
  substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
</pre>
See the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation for details of these options.
</P>
<P>
After a successful substitution, the modified string is output, preceded by the
number of replacements. This may be zero if there were no matches. Here is a
simple example of a substitution test:
<pre>
  /abc/replace=xxx
      =abc=abc=
   1: =xxx=abc=
      =abc=abc=\=global
   2: =xxx=xxx=
</pre>
Subject and replacement strings should be kept relatively short (fewer than 256
characters) for substitution tests, as fixed-size buffers are used.
To make it
easy to test for buffer overflow, if the replacement string starts with a
number in square brackets, that number is passed to <b>pcre2_substitute()</b> as
the size of the output buffer, with the replacement string starting at the next
character. Here is an example that tests the edge case:
<pre>
  /abc/
      123abc123\=replace=[10]XYZ
   1: 123XYZ123
      123abc123\=replace=[9]XYZ
  Failed: error -47: no more memory
</pre>
The default action of <b>pcre2_substitute()</b> is to return
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
<b>substitute_overflow_length</b> modifier), <b>pcre2_substitute()</b> continues
to go through the motions of matching and substituting (but not doing any
callouts), in order to compute the size of buffer that is required. When this
happens, <b>pcre2test</b> shows the required buffer length (which includes space
for the trailing zero) as part of the error message. For example:
<pre>
  /abc/substitute_overflow_length
      123abc123\=replace=[9]XYZ
  Failed: error -47: no more memory: 10 code units are needed
</pre>
A replacement string is ignored with POSIX and DFA matching.
Specifying partial
matching provokes an error return ("bad option value") from
<b>pcre2_substitute()</b>.
</P>
<br><b>
Testing substitute callouts
</b><br>
<P>
If the <b>substitute_callout</b> modifier is set, a substitution callout
function is set up. The <b>null_context</b> modifier must not be set, because
the address of the callout function is passed in a match context. When the
callout function is called (after each substitution), details of the input
and output strings are output. For example:
<pre>
  /abc/g,replace=<$0>,substitute_callout
      abcdefabcpqr
   1(1) Old 0 3 "abc" New 0 5 "<abc>"
   2(1) Old 6 9 "abc" New 8 13 "<abc>"
   2: <abc>def<abc>pqr
</pre>
The first number on each callout line is the count of matches. The
parenthesized number is the number of pairs that are set in the ovector (that
is, one more than the number of capturing groups that were set). Next come the
offsets of the old substring and its contents, followed by the same for the
replacement.
</P>
<P>
By default, the substitution callout function returns zero, which accepts the
replacement and causes matching to continue if /g was used. Two further
modifiers can be used to test other return values.
If <b>substitute_skip</b> is
set to a value greater than zero, the callout function returns +1 for the match
of that number, and similarly <b>substitute_stop</b> returns -1. These cause the
replacement to be rejected, and -1 causes no further matching to take place. If
either of them is set, <b>substitute_callout</b> is assumed. For example:
<pre>
  /abc/g,replace=<$0>,substitute_skip=1
      abcdefabcpqr
   1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
   2(1) Old 6 9 "abc" New 6 11 "<abc>"
   2: abcdef<abc>pqr
      abcdefabcpqr\=substitute_stop=1
   1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
   1: abcdefabcpqr
</pre>
If both are set for the same number, stop takes precedence. Only a single skip
or stop is supported, which is sufficient for testing that the feature works.
</P>
<br><b>
Setting the JIT stack size
</b><br>
<P>
The <b>jitstack</b> modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kibibytes (units of
1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
that is larger than the default is necessary only for very complicated
patterns.
If <b>jitstack</b> is set non-zero on a subject line it overrides any
value that was set on the pattern.
</P>
<br><b>
Setting heap, match, and depth limits
</b><br>
<P>
The <b>heap_limit</b>, <b>match_limit</b>, and <b>depth_limit</b> modifiers set
the appropriate limits in the match context. These values are ignored when the
<b>find_limits</b> or <b>find_limits_noheap</b> modifier is specified.
</P>
<br><b>
Finding minimum limits
</b><br>
<P>
If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
calls the relevant matching function several times, setting different values in
the match context via <b>pcre2_set_heap_limit()</b>,
<b>pcre2_set_match_limit()</b>, or <b>pcre2_set_depth_limit()</b> until it finds
the smallest value for each parameter that allows the match to complete without
a "limit exceeded" error. The match itself may succeed or fail. An alternative
modifier, <b>find_limits_noheap</b>, omits the heap limit. This is used in the
standard tests, because the minimum heap limit varies between systems. If JIT
is being used, only the match limit is relevant, and the other two are
automatically omitted.
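The search for a smallest workable limit can be pictured with the following Python sketch. The strategy shown (doubling, then bisection) is an assumption about one reasonable approach and is not taken from the <b>pcre2test</b> source. The name find_minimum_limit is hypothetical, and completes is a hypothetical stand-in for "run the match with this limit and report whether it finished without a limit exceeded error".

```python
def find_minimum_limit(completes, upper=2**32 - 1):
    """Smallest limit in [1, upper] for which completes(limit) is true.

    Assumes completes() is monotonic: once a limit is large enough,
    every larger limit is large enough too. Returns None if even the
    ceiling is too small.
    """
    # Phase 1: double the limit until the match completes or the ceiling is hit.
    high = 1
    while high < upper and not completes(high):
        high = min(high * 2, upper)
    if not completes(high):
        return None
    # Phase 2: bisect between a known failure and the known success.
    low = high // 2 + 1 if high > 1 else 1
    while low < high:
        mid = (low + high) // 2
        if completes(mid):
            high = mid
        else:
            low = mid + 1
    return high
```

In a real run, each of the heap, match, and depth limits would be searched separately, with the other limits left at their defaults.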
</P>
<P>
When using this modifier, the pattern should not contain any limit settings
such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
lower than the minimum matching value, the minimum value cannot be found
because <b>pcre2_set_match_limit()</b> etc. are only able to reduce the value of
an in-pattern limit; they cannot increase it.
</P>
<P>
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
much nested backtracking happens (that is, how deeply the pattern's tree is
searched). In the case of DFA matching, <i>depth_limit</i> controls the depth of
recursive calls of the internal function that is used for handling pattern
recursion, lookaround assertions, and atomic groups.
</P>
<P>
For non-DFA matching, the <i>match_limit</i> number is a measure of the amount
of backtracking that takes place, and learning the minimum value can be
instructive. For most simple matches, the number is quite small, but for
patterns with very large numbers of matching possibilities, it can become large
very quickly with increasing length of subject string. In the case of DFA
matching, <i>match_limit</i> controls the total number of calls, both recursive
and non-recursive, to the internal matching function, thus controlling the
overall amount of computing resource that is used.
</P>
<P>
For both kinds of matching, the <i>heap_limit</i> number, which is in kibibytes
(units of 1024 bytes), limits the amount of heap memory used for matching.
</P>
<br><b>
Showing MARK names
</b><br>
<P>
The <b>mark</b> modifier causes the names from backtracking control verbs that
are returned from calls to <b>pcre2_match()</b> to be displayed. If a mark is
returned for a match, non-match, or partial match, <b>pcre2test</b> shows it.
For a match, it is on a line by itself, tagged with "MK:". Otherwise, it
is added to the non-match message.
</P>
<br><b>
Showing memory usage
</b><br>
<P>
The <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
memory allocation and freeing calls that occur during a call to
<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>. In the latter case, heap memory
is used only when a match requires more internal workspace than the default
allocation on the stack, so in many cases there will be no output. No heap
memory is allocated during matching with JIT. For this modifier to work, the
<b>null_context</b> modifier must not be set on both the pattern and the
subject, though it can be set on one or the other.
</P>
<br><b>
Showing the heap frame overall vector size
</b><br>
<P>
The <b>heapframes_size</b> modifier is relevant for matches using
<b>pcre2_match()</b> without JIT. After a match has run (whether successful or
not) the size, in bytes, of the allocated heap frames vector that is left
attached to the match data block is shown. If the matching action involved
several calls to <b>pcre2_match()</b> (for example, global matching or for
timing) only the final value is shown.
</P>
<P>
This modifier is ignored, with a warning, for POSIX or DFA matching. JIT
matching does not use the heap frames vector, so the size is always zero,
unless there was a previous non-JIT match. Note that specifying a size of zero
for the output vector (see below) causes <b>pcre2test</b> to free its match data
block (and associated heap frames vector) and allocate a new one.
</P>
<br><b>
Setting a starting offset
</b><br>
<P>
The <b>offset</b> modifier sets an offset in the subject string at which
matching starts. Its value is a number of code units, not characters.
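Because the value is counted in code units, a character position has to be converted before it is used with a UTF subject. The following Python sketch shows the arithmetic; the helper name code_unit_offset is hypothetical and is not part of <b>pcre2test</b>.

```python
def code_unit_offset(subject, char_index, width=8):
    """Number of code units that precede character number char_index.

    subject    -- the subject as a Python string (one element per character)
    char_index -- zero-based character position
    width      -- code unit width of the library in use: 8, 16 or 32
    """
    prefix = subject[:char_index]
    if width == 8:
        return len(prefix.encode("utf-8"))
    if width == 16:
        return len(prefix.encode("utf-16-le")) // 2
    if width == 32:
        return len(prefix)
    raise ValueError("width must be 8, 16 or 32")
```

For example, with the 8-bit library in UTF mode and a subject whose third character occupies two bytes, starting at the fourth character needs an offset of 4 rather than 3.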
</P>
<br><b>
Setting an offset limit
</b><br>
<P>
The <b>offset_limit</b> modifier sets a limit for unanchored matches. If a match
cannot be found starting at or before this offset in the subject, a "no match"
return is given. The data value is a number of code units, not characters. When
this modifier is used, the <b>use_offset_limit</b> modifier must have been set
for the pattern; if not, an error is generated.
</P>
<br><b>
Setting the size of the output vector
</b><br>
<P>
The <b>ovector</b> modifier applies only to the subject line in which it
appears, though of course it can also be used to set a default in a
<b>#subject</b> command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15.
</P>
<P>
A value of zero is useful when testing the POSIX API because it causes
<b>regexec()</b> to be called with a NULL capture vector. When not testing the
POSIX API, a value of zero is used to cause
<b>pcre2_match_data_create_from_pattern()</b> to be called, in order to create a
new match block of exactly the right size for the pattern.
(It is not possible
to create a match block with a zero-length ovector; there is always at least
one pair of offsets.) The old match data block is freed.
</P>
<br><b>
Passing the subject as zero-terminated
</b><br>
<P>
By default, the subject string is passed to a native API matching function with
its correct length. In order to test the facility for passing a zero-terminated
string, the <b>zero_terminate</b> modifier is provided. It causes the length to
be passed as PCRE2_ZERO_TERMINATED. When matching via the POSIX interface,
this modifier is ignored, with a warning.
</P>
<P>
When testing <b>pcre2_substitute()</b>, this modifier also has the effect of
passing the replacement string as zero-terminated.
</P>
<br><b>
Passing a NULL context, subject, or replacement
</b><br>
<P>
Normally, <b>pcre2test</b> passes a context block to <b>pcre2_match()</b>,
<b>pcre2_dfa_match()</b>, <b>pcre2_jit_match()</b> or <b>pcre2_substitute()</b>.
If the <b>null_context</b> modifier is set, however, NULL is passed. This is for
testing that the matching and substitution functions behave correctly in this
case (they use default values).
This modifier cannot be used with the
<b>find_limits</b>, <b>find_limits_noheap</b>, or <b>substitute_callout</b>
modifiers.
</P>
<P>
Similarly, for testing purposes, if the <b>null_subject</b> or
<b>null_replacement</b> modifier is set, the subject or replacement string
pointers are passed as NULL, respectively, to the relevant functions.
</P>
<br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
By default, <b>pcre2test</b> uses the standard PCRE2 matching function,
<b>pcre2_match()</b>, to match each subject line. PCRE2 also supports an
alternative matching function, <b>pcre2_dfa_match()</b>, which operates in a
different way, and has some restrictions. The differences between the two
functions are described in the
<a href="pcre2matching.html"><b>pcre2matching</b></a>
documentation.
</P>
<P>
If the <b>dfa</b> modifier is set, the alternative matching function is used.
This function finds all possible matches at a given point in the subject. If,
however, the <b>dfa_shortest</b> modifier is set, processing stops after the
first match is found. This is always the shortest possible match.
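To illustrate the difference between the two modes, here is a toy model in Python. It handles only a list of literal alternatives and says nothing about how <b>pcre2_dfa_match()</b> is implemented; the function name matches_at_first_point and its shortest flag are hypothetical.

```python
def matches_at_first_point(alternatives, subject, shortest=False):
    """Toy model of 'all matches at one point' for literal alternatives only.

    Scans the subject for the first position where any alternative matches,
    then reports every alternative that matches there, longest first.
    With shortest=True, only the shortest match is reported.
    """
    for start in range(len(subject) + 1):
        found = {alt for alt in alternatives if subject.startswith(alt, start)}
        if found:
            ordered = sorted(found, key=len)
            if shortest:
                return ordered[:1]   # stop at the shortest match
            return ordered[::-1]     # longest match first
    return []
```

With the alternatives tang, tangerine, and tan and the subject "yellow tangerine", the model reports tangerine, tang, and tan in that order, or just tan when shortest is set.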
</P>
<br><a name="SEC13" href="#TOC1">DEFAULT OUTPUT FROM pcre2test</a><br>
<P>
This section describes the output when the normal matching function,
<b>pcre2_match()</b>, is being used.
</P>
<P>
When a match succeeds, <b>pcre2test</b> outputs the list of captured substrings,
starting with number 0 for the string that matched the whole pattern.
Otherwise, it outputs "No match" when the return is PCRE2_ERROR_NOMATCH, or
"Partial match:" followed by the partially matching substring when the
return is PCRE2_ERROR_PARTIAL. (Note that this is the
entire substring that was inspected during the partial match; it may include
characters before the actual match start if a lookbehind assertion, \K, \b,
or \B was involved.)
</P>
<P>
For any other return, <b>pcre2test</b> outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string check, the
code unit offset of the start of the failing character is also output. Here is
an example of an interactive <b>pcre2test</b> run.
<pre>
  $ pcre2test
  PCRE2 version 10.22 2016-07-29

    re> /^abc(\d+)/
  data> abc123
   0: abc123
   1: 123
  data> xyz
  No match
</pre>
Unset capturing substrings that are not followed by one that is set are not
shown by <b>pcre2test</b> unless the <b>allcaptures</b> modifier is specified. In
the following example, there are two capturing substrings, but when the first
data line is matched, the second, unset substring is not shown. An "internal"
unset substring is shown as "<unset>", as for the second data line.
<pre>
    re> /(a)|(b)/
  data> a
   0: a
   1: a
  data> b
   0: b
   1: <unset>
   2: b
</pre>
If the strings contain any non-printing characters, they are output as \xhh
escapes if the value is less than 256 and UTF mode is not set. Otherwise they
are output as \x{hh...} escapes. See below for the definition of non-printing
characters.
If the <b>aftertext</b> modifier is set, the output for substring 0
is followed by the rest of the subject string, identified by "0+" like this:
<pre>
    re> /cat/aftertext
  data> cataract
   0: cat
   0+ aract
</pre>
If global matching is requested, the results of successive matching attempts
are output in sequence, like this:
<pre>
    re> /\Bi(\w\w)/g
  data> Mississippi
   0: iss
   1: ss
   0: iss
   1: ss
   0: ipp
   1: pp
</pre>
"No match" is output only if the first match attempt fails. Here is an example
of a failure message (the offset 4 that is specified by the <b>offset</b>
modifier is past the end of the subject string):
<pre>
    re> /xyz/
  data> xyz\=offset=4
  Error -24 (bad offset value)
</pre>
</P>
<P>
Note that whereas patterns can be continued over several lines (a plain ">"
prompt is used for continuations), subject lines may not. However, newlines can
be included in a subject by means of the \n escape (or \r, \r\n, etc.,
depending on the newline sequence setting).
</P>
<br><a name="SEC14" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
When the alternative matching function, <b>pcre2_dfa_match()</b>, is used, the
output consists of a list of all the matches that start at the first point in
the subject where there is at least one match. For example:
<pre>
    re&gt; /(tang|tangerine|tan)/
  data&gt; yellow tangerine\=dfa
   0: tangerine
   1: tang
   2: tan
</pre>
Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero). After a
PCRE2_ERROR_PARTIAL return, the output is "Partial match:", followed by the
partially matching substring. Note that this is the entire substring that was
inspected during the partial match; it may include characters before the actual
match start if a lookbehind assertion, \b, or \B was involved. (\K is not
supported for DFA matching.)
</P>
<P>
If global matching is requested, the search for further matches resumes
at the end of the longest match.
For example:
<pre>
    re&gt; /(tang|tangerine|tan)/g
  data&gt; yellow tangerine and tangy sultana\=dfa
   0: tangerine
   1: tang
   2: tan
   0: tang
   1: tan
   0: tan
</pre>
The alternative matching function does not support substring capture, so the
modifiers that are concerned with captured substrings are not relevant.
</P>
<br><a name="SEC15" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
<P>
When the alternative matching function has given the PCRE2_ERROR_PARTIAL
return, indicating that the subject partially matched the pattern, you can
restart the match with additional subject data by means of the
<b>dfa_restart</b> modifier. For example:
<pre>
    re&gt; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data&gt; 23ja\=ps,dfa
  Partial match: 23ja
  data&gt; n05\=dfa,dfa_restart
   0: n05
</pre>
For further information about partial matching, see the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation.
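The idea behind restarting can be pictured with a toy matcher that keeps its
state between segments. The Python sketch below is not how
<b>pcre2_dfa_match()</b> is implemented, and it handles only an anchored
alternation of literal words; the class name StreamingAlternation and its
feed() method are hypothetical. It shows why the second segment ("n") can
complete a match only when the state left by the first segment ("ja") has
been retained.

```python
class StreamingAlternation:
    """Toy matcher for an anchored alternation of literal words, fed in pieces."""

    def __init__(self, words):
        # Live states: (word, number of characters of it matched so far).
        self.live = [(w, 0) for w in words]

    def feed(self, chunk):
        """Consume one segment and report 'match', 'partial' or 'no match'."""
        states = self.live
        for ch in chunk:
            # Keep only the states whose next expected character is ch.
            states = [(w, i + 1) for (w, i) in states
                      if i < len(w) and w[i] == ch]
            if not states:
                self.live = []
                return "no match"
        # The surviving states are kept so that a later segment can continue.
        self.live = states
        if any(i == len(w) for (w, i) in states):
            return "match"
        return "partial"


if __name__ == "__main__":
    months = ["jan", "feb", "mar"]
    m = StreamingAlternation(months)
    print(m.feed("ja"))   # first segment
    print(m.feed("n"))    # continuation, using the retained state
    print(StreamingAlternation(months).feed("n"))  # same data, no saved state
```

A fresh matcher given only "n" fails, which is the situation the
<b>dfa_restart</b> modifier is there to avoid.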
<a name="callouts"></a></P>
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcre2test</b>'s callout
function is called during matching unless <b>callout_none</b> is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical arguments and
those with string arguments is slightly different.
</P>
<br><b>
Callouts with numerical arguments
</b><br>
<P>
By default, the callout function displays the callout number, the start and
current positions in the subject text at the callout time, and the next pattern
item to be tested. For example:
<pre>
  ---&gt;pqrabcdef
    0    ^  ^     \d
</pre>
This output indicates that callout number 0 occurred for a match attempt
starting at the fourth character of the subject string, when the pointer was at
the seventh character, and when the next pattern item was \d. Just
one circumflex is output if the start and current positions are the same, or if
the current position precedes the start position, which can happen if the
callout is in a lookbehind assertion.
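The layout of these two lines can be approximated in a few lines of Python.
This is a sketch of the format described above, not <b>pcre2test</b>'s code:
the function name callout_lines and its parameters are hypothetical, the
column widths are assumptions modelled on the examples in this section, and
placing the single circumflex at the start position when the current position
precedes it is likewise an assumption.

```python
def callout_lines(number, subject, start, current, next_item, auto_offset=None):
    """Return the two display lines for one callout (approximate layout)."""
    # Assumption: a current position before the start is clamped to the
    # start, so that only one circumflex is drawn (the lookbehind case).
    current = max(current, start)
    if number == 255:
        # Automatic callout: show the pattern offset preceded by a plus.
        label = "%+3d " % auto_offset
    else:
        label = "%3d " % number
    # One column per subject character, plus one for "end of subject".
    marks = [" "] * (len(subject) + 1)
    marks[start] = "^"
    marks[current] = "^"
    pointer = label + "".join(marks) + " " * 4 + next_item
    return ["--->" + subject, pointer.rstrip()]


if __name__ == "__main__":
    # Callout 0, attempt starting at the fourth character, pointer at the
    # seventh character, next pattern item \d.
    for line in callout_lines(0, "pqrabcdef", 3, 6, r"\d"):
        print(line)
```

Positions are zero-based offsets here, so the fourth and seventh characters
are offsets 3 and 6.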
</P>
<P>
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
result of the <b>auto_callout</b> pattern modifier. In this case, instead of
showing the callout number, the offset in the pattern, preceded by a plus, is
output. For example:
<pre>
    re&gt; /\d?[A-E]\*/auto_callout
  data&gt; E*
  ---&gt;E*
   +0 ^      \d?
   +3 ^      [A-E]
   +8 ^^     \*
  +10 ^ ^
   0: E*
</pre>
If a pattern contains (*MARK) items, an additional line is output whenever
a change of latest mark is passed to the callout function. For example:
<pre>
    re&gt; /a(*MARK:X)bc/auto_callout
  data&gt; abc
  ---&gt;abc
   +0 ^       a
   +1 ^^      (*MARK:X)
  +10 ^^      b
  Latest Mark: X
  +11 ^ ^     c
  +12 ^  ^
   0: abc
</pre>
The mark changes between matching "a" and "b", but stays the same for the rest
of the match, so nothing more is output. If, as a result of backtracking, the
mark reverts to being unset, the text "&lt;unset&gt;" is output.
</P>
<br><b>
Callouts with string arguments
</b><br>
<P>
The output for a callout with a string argument is similar, except that instead
of outputting a callout number before the position indicators, the callout
string and its offset in the pattern string are output before the reflection of
the subject string, and the subject string is reflected for each callout. For
example:
<pre>
    re&gt; /^ab(?C'first')cd(?C"second")ef/
  data&gt; abcdefg
  Callout (7): 'first'
  ---&gt;abcdefg
      ^ ^         c
  Callout (20): "second"
  ---&gt;abcdefg
      ^   ^       e
   0: abcdef

</PRE>
</P>
<br><b>
Callout modifiers
</b><br>
<P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line to
change this and other parameters of the callout (see below).
</P>
<P>
If the <b>callout_capture</b> modifier is set, the current captured groups are
output when a callout occurs.
This is useful only for non-DFA matching, as
<b>pcre2_dfa_match()</b> does not support capturing, so no captures are ever
shown.
</P>
<P>
The normal callout output, showing the callout number or pattern offset (as
described above) is suppressed if the <b>callout_no_where</b> modifier is set.
</P>
<P>
When using the interpretive matching function <b>pcre2_match()</b> without JIT,
setting the <b>callout_extra</b> modifier causes additional output from
<b>pcre2test</b>'s callout function to be generated. For the first callout in a
match attempt at a new starting position in the subject, "New match attempt" is
output. If there has been a backtrack since the last callout (or start of
matching if this is the first callout), "Backtrack" is output, followed by "No
other matching paths" if the backtrack ended the previous match attempt.
For
example:
<pre>
    re&gt; /(a+)b/auto_callout,no_start_optimize,no_auto_possess
  data&gt; aac\=callout_extra
  New match attempt
  ---&gt;aac
   +0 ^       (
   +1 ^       a+
   +3 ^ ^     )
   +4 ^ ^     b
  Backtrack
  ---&gt;aac
   +3 ^^      )
   +4 ^^      b
  Backtrack
  No other matching paths
  New match attempt
  ---&gt;aac
   +0  ^      (
   +1  ^      a+
   +3  ^^     )
   +4  ^^     b
  Backtrack
  No other matching paths
  New match attempt
  ---&gt;aac
   +0   ^     (
   +1   ^     a+
  Backtrack
  No other matching paths
  New match attempt
  ---&gt;aac
   +0    ^    (
   +1    ^    a+
  No match
</pre>
Notice that various optimizations must be turned off if you want all possible
matching paths to be scanned. If <b>no_start_optimize</b> is not used, there is
an immediate "no match", without any callouts, because the starting
optimization fails to find "b" in the subject, which it knows must be present
for any match.
If <b>no_auto_possess</b> is not used, the "a+" item is turned
into "a++", which reduces the number of backtracks.
</P>
<P>
The <b>callout_extra</b> modifier has no effect if used with the DFA matching
function, or with JIT.
</P>
<br><b>
Return values from callouts
</b><br>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (&lt;n&gt;:&lt;m&gt;)
are given, 1 is returned when callout &lt;n&gt; is reached and there have been at
least &lt;m&gt; callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero.
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked.
Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
</P>
<P>
Inserting callouts can be helpful when using <b>pcre2test</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
</P>
<br><a name="SEC17" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
<P>
When <b>pcre2test</b> is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters and are
therefore shown as hex escapes.
</P>
<P>
When <b>pcre2test</b> is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been set for
the pattern (using the <b>locale</b> modifier). In this case, the
<b>isprint()</b> function is used to distinguish printing and non-printing
characters.
<a name="saverestore"></a></P>
<br><a name="SEC18" href="#TOC1">SAVING AND RESTORING COMPILED PATTERNS</a><br>
<P>
It is possible to save compiled patterns on disc or elsewhere, and reload them
later, subject to a number of restrictions. JIT data cannot be saved.
The host
on which the patterns are reloaded must be running the same version of PCRE2,
with the same code unit width, and must also have the same endianness, pointer
width and PCRE2_SIZE type. Before compiled patterns can be saved they must be
serialized, that is, converted to a stream of bytes. A single byte stream may
contain any number of compiled patterns, but they must all use the same
character tables. A single copy of the tables is included in the byte stream
(its size is 1088 bytes).
</P>
<P>
The functions whose names begin with <b>pcre2_serialize_</b> are used
for serializing and de-serializing. They are described in the
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
documentation. In this section we describe the features of <b>pcre2test</b> that
can be used to test these functions.
</P>
<P>
Note that "serialization" in PCRE2 does not convert compiled patterns to an
abstract format like Java or .NET. It just makes a reloadable byte code stream.
Hence the restrictions on reloading mentioned above.
</P>
<P>
In <b>pcre2test</b>, when a pattern with <b>push</b> modifier is successfully
compiled, it is pushed onto a stack of compiled patterns, and <b>pcre2test</b>
expects the next line to contain a new pattern (or command) instead of a
subject line.
By contrast, the <b>pushcopy</b> modifier causes a copy of the
compiled pattern to be stacked, leaving the original available for immediate
matching. By using <b>push</b> and/or <b>pushcopy</b>, a number of patterns can
be compiled and retained. These modifiers are incompatible with <b>posix</b>,
and control modifiers that act at match time are ignored (with a message) for
the stacked patterns. The <b>jitverify</b> modifier applies only at compile
time.
</P>
<P>
The command
<pre>
  #save &lt;filename&gt;
</pre>
causes all the stacked patterns to be serialized and the result written to the
named file. Afterwards, all the stacked patterns are freed. The command
<pre>
  #load &lt;filename&gt;
</pre>
reads the data in the file, and then arranges for it to be de-serialized, with
the resulting compiled patterns added to the pattern stack. The pattern on the
top of the stack can be retrieved by the #pop command, which must be followed
by lines of subjects that are to be matched with the pattern, terminated as
usual by an empty line or end of file. This command may be followed by a
modifier list containing only
<a href="#controlmodifiers">control modifiers</a>
that act after a pattern has been compiled.
In particular, <b>hex</b>,
<b>posix</b>, <b>posix_nosub</b>, <b>push</b>, and <b>pushcopy</b> are not allowed,
nor are any
<a href="#optionmodifiers">option-setting modifiers.</a>
The JIT modifiers are, however, permitted. Here is an example that saves and
reloads two patterns.
<pre>
  /abc/push
  /xyz/push
  #save tempfile
  #load tempfile
  #pop info
  xyz

  #pop jit,bincode
  abc
</pre>
If <b>jitverify</b> is used with #pop, it does not automatically imply
<b>jit</b>, which is different behaviour from when it is used on a pattern.
</P>
<P>
The #popcopy command is analogous to the <b>pushcopy</b> modifier in that it
makes current a copy of the topmost stack pattern, leaving the original still
on the stack.
</P>
<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
<b>pcre2jit</b>(3), <b>pcre2matching</b>(3), <b>pcre2partial</b>(3),
<b>pcre2pattern</b>(3), <b>pcre2serialize</b>(3).
</P>
<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 24 April 2024
<br>
Copyright © 1997-2024 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>