<html>
<head>
<title>pcre2test specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2test man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
<li><a name="TOC3" href="#SEC3">INPUT ENCODING</a>
<li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a>
<li><a name="TOC5" href="#SEC5">DESCRIPTION</a>
<li><a name="TOC6" href="#SEC6">COMMAND LINES</a>
<li><a name="TOC7" href="#SEC7">MODIFIER SYNTAX</a>
<li><a name="TOC8" href="#SEC8">PATTERN SYNTAX</a>
<li><a name="TOC9" href="#SEC9">SUBJECT LINE SYNTAX</a>
<li><a name="TOC10" href="#SEC10">PATTERN MODIFIERS</a>
<li><a name="TOC11" href="#SEC11">SUBJECT MODIFIERS</a>
<li><a name="TOC12" href="#SEC12">THE ALTERNATIVE MATCHING FUNCTION</a>
<li><a name="TOC13" href="#SEC13">DEFAULT OUTPUT FROM pcre2test</a>
<li><a name="TOC14" href="#SEC14">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
<li><a name="TOC15" href="#SEC15">RESTARTING AFTER A PARTIAL MATCH</a>
<li><a name="TOC16" href="#SEC16">CALLOUTS</a>
<li><a name="TOC17" href="#SEC17">NON-PRINTING CHARACTERS</a>
<li><a name="TOC18" href="#SEC18">SAVING AND RESTORING COMPILED PATTERNS</a>
<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
<li><a name="TOC20" href="#SEC20">AUTHOR</a>
<li><a name="TOC21" href="#SEC21">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcre2test [options] [input file [output file]]</b>
<br>
<br>
<b>pcre2test</b> is a test program for the PCRE2 regular expression libraries,
but it can also be used for experimenting with regular expressions. This
document describes the features of the test program; for details of the regular
expressions themselves, see the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation. For details of the PCRE2 library function calls and their
options, see the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
The input for <b>pcre2test</b> is a sequence of regular expression patterns and
subject strings to be matched. There are also command lines for setting
defaults and controlling some special actions. The output shows the result of
each match attempt. Modifiers on external or internal command lines, the
patterns, and the subject lines specify PCRE2 function options, control how the
subject is processed, and determine what output is produced.
</P>
<P>
There are many obscure modifiers, some of which are specifically designed for
use in conjunction with the test script and data files that are distributed as
part of PCRE2. All the modifiers are documented here, some without much
justification, but many of them are unlikely to be of use except when testing
the libraries.
</P>
<br><a name="SEC2" href="#TOC1">PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
<P>
Different versions of the PCRE2 library can be built to support character
strings that are encoded in 8-bit, 16-bit, or 32-bit code units. One, two, or
all three of these libraries may be simultaneously installed. The
<b>pcre2test</b> program can be used to test all the libraries. However, its own
input and output are always in 8-bit format. When testing the 16-bit or 32-bit
libraries, patterns and subject strings are converted to 16-bit or 32-bit
format before being passed to the library functions. Results are converted back
to 8-bit code units for output.
</P>
<P>
In the rest of this document, the names of library functions and structures
are given in generic form, for example, <b>pcre2_compile()</b>. The actual
names used in the libraries have a suffix _8, _16, or _32, as appropriate.
<a name="inputencoding"></a></P>
<br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
<P>
Input to <b>pcre2test</b> is processed line by line, either by calling the C
library's <b>fgets()</b> function, or via the <b>libreadline</b> or <b>libedit</b>
library. In some Windows environments character 26 (hex 1A) causes an immediate
end of file, and no further data is read, so this character should be avoided
unless you really want that action.
</P>
<P>
The input is processed using C's string functions, so it must not contain binary
zeros, even though in Unix-like environments, <b>fgets()</b> treats any bytes
other than newline as data characters. An error is generated if a binary zero
is encountered. By default, subject lines are processed for backslash escapes,
which makes it possible to include any data value in strings that are passed to
the library for matching. For patterns, there is a facility for specifying some
or all of the 8-bit input characters as hexadecimal pairs, which makes it
possible to include binary zeros.
</P>
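<P>
As a rough illustration of why hexadecimal input makes binary zeros possible, the following hypothetical helper decodes pairs of hex digits into a byte buffer and reports the length explicitly, because a decoded zero byte would otherwise end a C string. It is a simplified sketch of the idea only, and pcre2test's own hexadecimal facility may differ in detail.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: decode pairs of hex digits, skipping white space
   between pairs, into out[].  Returns the number of bytes written, or -1
   for a non-hex character, an unpaired digit, or a full buffer. */
static long decode_hex_pairs(const char *text, unsigned char *out, size_t outsize)
{
  size_t n = 0;
  assert(text != NULL && out != NULL);
  while (*text != '\0')
  {
    int hi, lo;
    if (isspace((unsigned char)*text)) { text++; continue; }
    hi = hex_value((unsigned char)text[0]);
    lo = hex_value((unsigned char)text[1]);   /* text[1] may be '\0' */
    if (hi < 0 || lo < 0 || n >= outsize) return -1;
    out[n++] = (unsigned char)(hi * 16 + lo);
    text += 2;
  }
  return (long)n;
}
```

<P>
The explicit length matters because <b>pcre2_compile()</b> takes the pattern length as an argument, so a length-counted buffer like this one can carry zero bytes into the library.
</P>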
<br><b>
Input for the 16-bit and 32-bit libraries
</b><br>
<P>
When testing the 16-bit or 32-bit libraries, there is a need to be able to
generate character code points greater than 255 in the strings that are passed
to the library. For subject lines, backslash escapes can be used. In addition,
when the <b>utf</b> modifier (see
<a href="#optionmodifiers">"Setting compilation options"</a>
below) is set, the pattern and any following subject lines are interpreted as
UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
</P>
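<P>
For the 16-bit library, translating to UTF-16 means that a code point above U+FFFF is stored as a surrogate pair of two code units, whereas UTF-32 uses one code unit per character. The following is a minimal sketch of that encoding step only; the helper <b>utf16_encode()</b> is hypothetical, not taken from pcre2test, and the UTF-8 decoding that precedes it is omitted.
</P>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes one or two code units to out[] and returns how many were
   written, or 0 if the value is a surrogate or above 0x10ffff. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
  assert(out != 0);
  if (cp >= 0xd800 && cp <= 0xdfff) return 0;   /* surrogates are not characters */
  if (cp < 0x10000)                             /* fits in one code unit */
  {
    out[0] = (uint16_t)cp;
    return 1;
  }
  if (cp > 0x10ffff) return 0;                  /* beyond the Unicode range */
  cp -= 0x10000;                                /* 20 bits remain */
  out[0] = (uint16_t)(0xd800 + (cp >> 10));     /* high surrogate: top 10 bits */
  out[1] = (uint16_t)(0xdc00 + (cp & 0x3ff));   /* low surrogate: bottom 10 bits */
  return 2;
}
```

<P>
For example, U+1F600 becomes the pair 0xD83D 0xDE00, while U+00E9 stays a single code unit.
</P>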
<P>
For non-UTF testing of wide characters, the <b>utf8_input</b> modifier can be
used. This is mutually exclusive with <b>utf</b>, and is allowed only in 16-bit
or 32-bit mode. It causes the pattern and following subject lines to be treated
as UTF-8 according to the original definition (RFC 2279), which allows for
character values up to 0x7fffffff. Each character is placed in one 16-bit or
32-bit code unit (in the 16-bit case, values greater than 0xffff cause an error
to occur).
</P>
<P>
UTF-8 (in its original definition) is not capable of encoding values greater
than 0x7fffffff, but such values can be handled by the 32-bit library. When
testing this library in non-UTF mode with <b>utf8_input</b> set, if any
character is preceded by the byte 0xff (which is an invalid byte in UTF-8),
0x80000000 is added to the character's value. This is the only way of passing
such code points in a pattern string. For subject strings, using an escape
sequence is preferable.
</P>
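<P>
The <b>utf8_input</b> rules above can be sketched as a decoder for the original RFC 2279 form of UTF-8, which allows sequences of up to six bytes, together with the 0xff marker and the 16-bit range check. The function names are hypothetical, overlong sequences are not rejected, and this illustrates the rules rather than pcre2test's implementation.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decode one character of RFC 2279 UTF-8 (one to six
   bytes, values up to 0x7fffffff).  A leading 0xff byte, which never
   occurs in UTF-8, adds 0x80000000 to the character that follows it.
   Returns the number of bytes consumed, or 0 for malformed input.
   Overlong sequences are not rejected. */
static size_t decode_rfc2279(const unsigned char *s, size_t len, uint32_t *value)
{
  uint32_t c, extra = 0;
  size_t i, n, used = 0;

  assert(s != NULL && value != NULL);
  if (len > 0 && s[0] == 0xff)          /* marker for values >= 0x80000000 */
  {
    extra = 0x80000000u;
    s++; len--; used = 1;
  }
  if (len == 0) return 0;

  c = s[0];
  if (c < 0x80) n = 0;                                 /* 0xxxxxxx */
  else if ((c & 0xe0) == 0xc0) { n = 1; c &= 0x1f; }   /* 110xxxxx */
  else if ((c & 0xf0) == 0xe0) { n = 2; c &= 0x0f; }   /* 1110xxxx */
  else if ((c & 0xf8) == 0xf0) { n = 3; c &= 0x07; }   /* 11110xxx */
  else if ((c & 0xfc) == 0xf8) { n = 4; c &= 0x03; }   /* 111110xx */
  else if ((c & 0xfe) == 0xfc) { n = 5; c &= 0x01; }   /* 1111110x */
  else return 0;              /* continuation byte, 0xfe, or a second 0xff */

  if (len < n + 1) return 0;
  for (i = 1; i <= n; i++)
  {
    if ((s[i] & 0xc0) != 0x80) return 0;              /* must be 10xxxxxx */
    c = (c << 6) | (s[i] & 0x3f);
  }
  *value = c + extra;
  return used + n + 1;
}

/* In 16-bit mode each character must fit in a single code unit;
   in 32-bit mode every decoded value fits. */
static int fits_code_unit(uint32_t value, int width)
{
  return width == 32 || (width == 16 && value <= 0xffff);
}
```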
<br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
<P>
<b>-8</b>
If the 8-bit library has been built, this option causes it to be used (this is
the default). If the 8-bit library has not been built, this option causes an
error.
</P>
<P>
<b>-16</b>
If the 16-bit library has been built, this option causes it to be used. If the
8-bit library has not been built, this is the default. If the 16-bit library
has not been built, this option causes an error.
</P>
<P>
<b>-32</b>
If the 32-bit library has been built, this option causes it to be used. If no
other library has been built, this is the default. If the 32-bit library has
not been built, this option causes an error.
</P>
<P>
<b>-ac</b>
Behave as if each pattern has the <b>auto_callout</b> modifier, that is, insert
automatic callouts into every pattern that is compiled.
</P>
<P>
<b>-AC</b>
As for <b>-ac</b>, but in addition behave as if each subject line has the
<b>callout_extra</b> modifier, that is, show additional information from
callouts.
</P>
<P>
<b>-b</b>
Behave as if each pattern has the <b>fullbincode</b> modifier; the full
internal binary form of the pattern is output after compilation.
</P>
<P>
<b>-C</b>
Output the version number of the PCRE2 library, and all available information
about the optional features that are included, and then exit with zero exit
code. All other options are ignored. If both -C and -LM are present, whichever
is first is recognized.
</P>
<P>
<b>-C</b> <i>option</i>
Output information about a specific build-time option, then exit. This
functionality is intended for use in scripts such as <b>RunTest</b>. The
following options output the value and set the exit code as indicated:
<pre>
  ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
               0x15 or 0x25
               0 if used in an ASCII environment
               exit code is always 0
  linksize   the configured internal link size (2, 3, or 4)
               exit code is set to the link size
  newline    the default newline setting:
               CR, LF, CRLF, ANYCRLF, ANY, or NUL
               exit code is always 0
  bsr        the default setting for what \R matches:
               ANYCRLF or ANY
               exit code is always 0
</pre>
The following options output 1 for true or 0 for false, and set the exit code
to the same value:
<pre>
  backslash-C  \C is supported (not locked out)
  ebcdic       compiled for an EBCDIC environment
  jit          just-in-time support is available
  pcre2-16     the 16-bit library was built
  pcre2-32     the 32-bit library was built
  pcre2-8      the 8-bit library was built
  unicode      Unicode support is available
</pre>
If an unknown option is given, an error message is output; the exit code is 0.
</P>
<P>
<b>-d</b>
Behave as if each pattern has the <b>debug</b> modifier; the internal
form and information about the compiled pattern are output after compilation;
<b>-d</b> is equivalent to <b>-b -i</b>.
</P>
<P>
<b>-dfa</b>
Behave as if each subject line has the <b>dfa</b> modifier; matching is done
using the <b>pcre2_dfa_match()</b> function instead of the default
<b>pcre2_match()</b>.
</P>
<P>
<b>-error</b> <i>number[,number,...]</i>
Call <b>pcre2_get_error_message()</b> for each of the error numbers in the
comma-separated list, display the resulting messages on the standard output,
then exit with zero exit code. The numbers may be positive or negative. This is
a convenience facility for PCRE2 maintainers.
</P>
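<P>
The argument to <b>-error</b> is a comma-separated list of signed decimal numbers. The hypothetical helper below sketches one way to parse such a list with the standard <b>strtol()</b> function; it does not call PCRE2, and pcre2test's own argument handling may differ.
</P>

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a list such as "101,-2,3" into out[].
   Returns the number of values stored (at most max), or -1 if the
   list is malformed, a value is out of range, or out[] is full. */
static int parse_error_list(const char *arg, int *out, int max)
{
  int count = 0;
  assert(arg != NULL && out != NULL);
  for (;;)
  {
    char *end;
    long n;

    errno = 0;
    n = strtol(arg, &end, 10);
    if (end == arg || errno != 0 || n < INT_MIN || n > INT_MAX || count >= max)
      return -1;
    out[count++] = (int)n;
    if (*end == '\0') return count;             /* end of the list */
    if (*end != ',') return -1;                 /* junk after a number */
    arg = end + 1;
  }
}
```

<P>
Each parsed number would then be passed as the first argument of <b>pcre2_get_error_message()</b>, whose remaining arguments are a buffer and its length.
</P>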
<P>
<b>-help</b>
Output a brief summary of these options and then exit.
</P>
<P>
<b>-i</b>
Behave as if each pattern has the <b>info</b> modifier; information about the
compiled pattern is given after compilation.
</P>
<P>
<b>-jit</b>
Behave as if each pattern line has the <b>jit</b> modifier; after successful
compilation, each pattern is passed to the just-in-time compiler, if available.
</P>
<P>
<b>-jitfast</b>
Behave as if each pattern line has the <b>jitfast</b> modifier; after
successful compilation, each pattern is passed to the just-in-time compiler, if
available, and each subject line is passed directly to the JIT matcher via its
"fast path".
</P>
<P>
<b>-jitverify</b>
Behave as if each pattern line has the <b>jitverify</b> modifier; after
successful compilation, each pattern is passed to the just-in-time compiler, if
available, and the use of JIT for matching is verified.
</P>
<P>
<b>-LM</b>
List modifiers: write a list of available pattern and subject modifiers to the
standard output, then exit with zero exit code. All other options are ignored.
If both -C and any -Lx options are present, whichever is first is recognized.
</P>
<P>
<b>-LP</b>
List properties: write a list of recognized Unicode properties to the standard
output, then exit with zero exit code. All other options are ignored. If both
-C and any -Lx options are present, whichever is first is recognized.
</P>
<P>
<b>-LS</b>
List scripts: write a list of recognized Unicode script names to the standard
output, then exit with zero exit code. All other options are ignored. If both
-C and any -Lx options are present, whichever is first is recognized.
</P>
<P>
<b>-pattern</b> <i>modifier-list</i>
Behave as if each pattern line contains the given modifiers.
</P>
<P>
<b>-q</b>
Do not output the version number of <b>pcre2test</b> at the start of execution.
</P>
<P>
<b>-S</b> <i>size</i>
On Unix-like systems, set the size of the run-time stack to <i>size</i>
mebibytes (units of 1024*1024 bytes).
</P>
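<P>
On Unix-like systems, a request such as <b>-S 16</b> can be carried out with the POSIX <b>getrlimit()</b> and <b>setrlimit()</b> functions and the RLIMIT_STACK resource. The helper below is a hypothetical sketch of the conversion from mebibytes to bytes; pcre2test's own code may check and report errors differently.
</P>

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: set the soft stack limit to mib mebibytes
   (units of 1024*1024 bytes).  Returns 0 on success, or -1 if the
   request exceeds the hard limit or the system call fails. */
static int set_stack_mebibytes(unsigned long mib)
{
  struct rlimit rlim;
  rlim_t want;

  assert(mib > 0);
  want = (rlim_t)mib * 1024 * 1024;
  if (getrlimit(RLIMIT_STACK, &rlim) != 0) return -1;
  if (rlim.rlim_max != RLIM_INFINITY && want > rlim.rlim_max) return -1;
  rlim.rlim_cur = want;                /* only the soft limit changes */
  return setrlimit(RLIMIT_STACK, &rlim) == 0 ? 0 : -1;
}
```

<P>
A soft limit can never be raised above the hard limit, which is why the sketch checks <b>rlim_max</b> before calling <b>setrlimit()</b>.
</P>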
<P>
<b>-subject</b> <i>modifier-list</i>
Behave as if each subject line contains the given modifiers.
</P>
<P>
<b>-t</b>
Run each compile and match many times with a timer, and output the resulting
times per compile or match. When JIT is used, separate times are given for the
initial compile and the JIT compile. You can control the number of iterations
that are used for timing by following <b>-t</b> with a number (as a separate
item on the command line). For example, "-t 1000" iterates 1000 times. The
default is to iterate 500,000 times.
</P>
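<P>
The idea behind <b>-t</b> can be sketched as a loop that repeats an operation a fixed number of times and divides the elapsed processor time by the iteration count. The helper and the stand-in workload below are hypothetical, use the standard C <b>clock()</b> function, and do not reproduce pcre2test's output format.
</P>

```c
#include <assert.h>
#include <time.h>

/* A stand-in workload; a real timing run would call a compile or
   match function here instead. */
static volatile unsigned long sink;
static void sample_work(void) { sink++; }

/* Hypothetical helper: call fn() iterations times and return the
   average processor time per call in milliseconds. */
static double ms_per_call(void (*fn)(void), long iterations)
{
  clock_t start, stop;
  long i;

  assert(fn != 0 && iterations > 0);
  start = clock();
  for (i = 0; i < iterations; i++) fn();
  stop = clock();
  return (double)(stop - start) * 1000.0 / CLOCKS_PER_SEC / (double)iterations;
}
```

<P>
Repeating the operation many times, as the 500,000-iteration default does, makes the per-call figure meaningful even when a single call is far shorter than the resolution of the clock.
</P>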
<P>
<b>-tm</b>
This is like <b>-t</b> except that it times only the matching phase, not the
compile phase.
</P>
<P>
<b>-T</b> <b>-TM</b>
These behave like <b>-t</b> and <b>-tm</b>, but in addition, at the end of a run,
the total times for all compiles and matches are output.
</P>
<P>
<b>-version</b>
Output the PCRE2 version number and then exit.
</P>
<br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br>
<P>
If <b>pcre2test</b> is given two filename arguments, it reads from the first and
writes to the second. If the first name is "-", input is taken from the
standard input. If <b>pcre2test</b> is given only one argument, it reads from
that file and writes to stdout. Otherwise, it reads from stdin and writes to
stdout.
</P>
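<P>
These rules for choosing the input and output streams can be sketched as follows. The helper name is hypothetical, and the sketch treats only the first two non-option arguments as file names.
</P>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose input and output streams from up to two
   file-name arguments.  No names: stdin and stdout.  One name: read that
   file, write stdout.  Two names: read the first, write the second.
   An input name of "-" means stdin.  Returns 0, or -1 if a file cannot
   be opened (any file already opened is closed again). */
static int open_streams(int nnames, char **names, FILE **in, FILE **out)
{
  assert(in != NULL && out != NULL && (nnames == 0 || names != NULL));
  *in = stdin;
  *out = stdout;
  if (nnames >= 1 && strcmp(names[0], "-") != 0)
  {
    *in = fopen(names[0], "r");
    if (*in == NULL) { *in = stdin; return -1; }
  }
  if (nnames >= 2)
  {
    *out = fopen(names[1], "w");
    if (*out == NULL)
    {
      if (*in != stdin) fclose(*in);
      *in = stdin;
      *out = stdout;
      return -1;
    }
  }
  return 0;
}
```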
<P>
When <b>pcre2test</b> is built, a configuration option can specify that it
should be linked with the <b>libreadline</b> or <b>libedit</b> library. When this
is done, if the input is from a terminal, it is read using the <b>readline()</b>
function. This provides line-editing and history facilities. The output from
the <b>-help</b> option states whether or not <b>readline()</b> will be used.
</P>
<P>
The program handles any number of tests, each of which consists of a set of
input lines. Each set starts with a regular expression pattern, followed by any
number of subject lines to be matched against that pattern. In between sets of
test data, command lines that begin with # may appear. This file format, with
some restrictions, can also be processed by the <b>perltest.sh</b> script that
is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
and Perl is the same. For a specification of <b>perltest.sh</b>, see the
comments near its beginning. See also the #perltest command below.
</P>
<P>
When the input is a terminal, <b>pcre2test</b> prompts for each line of input,
using "re&#62;" to prompt for regular expression patterns, and "data&#62;" to prompt
for subject lines. Command lines starting with # can be entered only in
response to the "re&#62;" prompt.
</P>
<P>
Each subject line is matched separately and independently. If you want to do
multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
etc., depending on the newline setting) in a single line of input to encode the
newline sequences. There is no limit on the length of subject lines; the input
buffer is automatically extended if it is too small. There are replication
features that make it possible to generate long repetitive pattern or subject
lines without having to supply them explicitly.
</P>
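<P>
To illustrate how an escape such as \n places a newline inside a single subject line, here is a sketch that expands a small subset of backslash escapes (\n, \r, \t, \\, and \x followed by one or two hex digits). It is hypothetical and far less complete than pcre2test's escape processing; because \x00 can produce a binary zero, it returns an explicit length.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hexdigit(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical sketch: expand \n, \r, \t, \\ and \xh or \xhh in a
   subject line.  out[] needs room for strlen(in) bytes.  Returns the
   number of bytes produced; the result may contain binary zeros, so it
   is not zero-terminated.  Any other escape is copied unchanged. */
static size_t expand_escapes(const char *in, unsigned char *out)
{
  size_t n = 0;
  assert(in != NULL && out != NULL);
  while (*in != '\0')
  {
    if (*in != '\\' || in[1] == '\0')          /* ordinary character */
    {
      out[n++] = (unsigned char)*in++;
      continue;
    }
    in++;                                       /* skip the backslash */
    switch (*in)
    {
      case 'n':  out[n++] = '\n'; in++; break;
      case 'r':  out[n++] = '\r'; in++; break;
      case 't':  out[n++] = '\t'; in++; break;
      case '\\': out[n++] = '\\'; in++; break;
      case 'x':
      {
        int value = 0, digits = 0, d;
        in++;
        while (digits < 2 && (d = hexdigit((unsigned char)*in)) >= 0)
        {
          value = value * 16 + d;
          in++;
          digits++;
        }
        out[n++] = (unsigned char)value;
        break;
      }
      default:   out[n++] = '\\'; out[n++] = (unsigned char)*in++; break;
    }
  }
  return n;
}
```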
<P>
An empty line or the end of the file signals the end of the subject lines for a
test, at which point a new pattern or command line is expected if there is
still input to be read.
</P>
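<P>
The overall layout of an input file (a pattern, then subject lines until an empty line, with # command and comment lines allowed between tests) can be sketched as a small state machine. This is a simplified, hypothetical illustration: it assumes each pattern fits on one line and does not interpret commands.
</P>

```c
#include <assert.h>
#include <ctype.h>

enum line_kind { LINE_BLANK, LINE_COMMENT, LINE_COMMAND, LINE_PATTERN, LINE_SUBJECT };

/* Hypothetical reader state: are we reading subject lines for a pattern? */
struct reader_state { int in_subjects; };

/* Classify one input line, given without its trailing newline.
   Because the newline has been removed, a lone '#' is treated as if it
   were followed by white space, that is, as a comment. */
static enum line_kind classify_line(struct reader_state *st, const char *line)
{
  assert(st != 0 && line != 0);
  if (line[0] == '\0')
  {
    st->in_subjects = 0;          /* an empty line ends the subject lines */
    return LINE_BLANK;
  }
  if (st->in_subjects) return LINE_SUBJECT;
  if (line[0] == '#')             /* only recognized between tests */
  {
    if (line[1] == '\0' || line[1] == '!' || isspace((unsigned char)line[1]))
      return LINE_COMMENT;
    return LINE_COMMAND;
  }
  st->in_subjects = 1;            /* a pattern starts a new test */
  return LINE_PATTERN;
}
```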
<br><a name="SEC6" href="#TOC1">COMMAND LINES</a><br>
<P>
In between sets of test data, a line that begins with # is interpreted as a
command line. If the first character is followed by white space or an
exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
following commands are recognized:
<pre>
  #forbid_utf
</pre>
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and
the use of (*UTF) and (*UCP) at the start of patterns. This command also forces
an error if a subsequent pattern contains any occurrences of \P, \p, or \X,
which are still supported when PCRE2_UTF is not set, but which require Unicode
property support to be included in the library.
</P>
<P>
This is a trigger guard that is used in test files to ensure that UTF or
Unicode property tests are not accidentally added to files that are used when
Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and
PCRE2_NEVER_UCP as a default can also be obtained by the use of <b>#pattern</b>;
the difference is that <b>#forbid_utf</b> cannot be unset, and the automatic
options are not displayed in pattern information, to avoid cluttering up test
output.
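<P>
A check of the kind that #forbid_utf applies to \P, \p, and \X can be sketched as a simple scan of the pattern text. The function below is hypothetical and only approximate (it ignores \Q...\E quoting and comments), and it is not how pcre2test is known to perform the check.
</P>

```c
#include <assert.h>

/* Hypothetical sketch: return 1 if the pattern contains \p, \P or \X.
   A doubled backslash is skipped as a unit, so "\\p" (a literal
   backslash followed by p) is not reported.  \Q...\E quoting and
   comments are not handled. */
static int uses_property_escape(const char *pattern)
{
  assert(pattern != 0);
  while (*pattern != '\0')
  {
    if (pattern[0] == '\\')
    {
      char next = pattern[1];
      if (next == 'p' || next == 'P' || next == 'X') return 1;
      if (next == '\0') break;         /* trailing lone backslash */
      pattern += 2;                    /* skip the whole two-character escape */
      continue;
    }
    pattern++;
  }
  return 0;
}
```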
<pre>
  #load &#60;filename&#62;
</pre>
This command is used to load a set of precompiled patterns from a file, as
described in the section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
<pre>
  #loadtables &#60;filename&#62;
</pre>
This command is used to load a set of binary character tables that can be
accessed by the tables=3 modifier. Such tables can be created by the
<b>pcre2_dftables</b> program with the -b option.
<pre>
  #newline_default [&#60;newline-list&#62;]
</pre>
When PCRE2 is built, a default newline convention can be specified. This
determines which characters and/or character pairs are recognized as indicating
a newline in a pattern or subject string. The default can be overridden when a
pattern is compiled. The standard test files contain tests of various newline
conventions, but the majority of the tests expect a single linefeed to be
recognized as a newline by default. Without special action the tests would fail
when PCRE2 is compiled with either CR or CRLF as the default newline.
</P>
<P>
The #newline_default command specifies a list of newline types that are
acceptable as the default. The types must be one of CR, LF, CRLF, ANYCRLF,
ANY, or NUL (in upper or lower case), for example:
<pre>
  #newline_default LF Any anyCRLF
</pre>
If the default newline is in the list, this command has no effect. Otherwise,
except when testing the POSIX API, a <b>newline</b> modifier that specifies the
first newline convention in the list (LF in the above example) is added to any
pattern that does not already have a <b>newline</b> modifier. If the newline
list is empty, the feature is turned off. This command is present in a number
of the standard test input files.
</P>
<P>
When the POSIX API is being tested there is no way to override the default
newline convention, though it is possible to set the newline convention from
within the pattern. A warning is given if the <b>posix</b> or <b>posix_nosub</b>
modifier is used when <b>#newline_default</b> would set a default for the
non-POSIX API.
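<P>
For the non-POSIX case, the decision that #newline_default makes for each pattern can be sketched as follows. The helper names are hypothetical; newline types are compared without regard to case, as the command allows.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two newline type names. */
static int same_name(const char *a, const char *b)
{
  while (*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b))
  {
    a++;
    b++;
  }
  return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Hypothetical sketch: given the library's built-in default newline type
   and the list from a #newline_default command, return the newline type
   to add as a modifier to a pattern that has none, or NULL if nothing is
   added (the default is acceptable, or the list is empty). */
static const char *newline_to_add(const char *builtin_default,
                                  const char *const *list, size_t count)
{
  size_t i;
  assert(builtin_default != NULL && (count == 0 || list != NULL));
  if (count == 0) return NULL;                  /* empty list: feature off */
  for (i = 0; i < count; i++)
    if (same_name(builtin_default, list[i])) return NULL;
  return list[0];                               /* first type in the list */
}
```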
422*22dc650dSSadaf Ebrahimi<pre>
423*22dc650dSSadaf Ebrahimi  #pattern &#60;modifier-list&#62;
424*22dc650dSSadaf Ebrahimi</pre>
425*22dc650dSSadaf EbrahimiThis command sets a default modifier list that applies to all subsequent
426*22dc650dSSadaf Ebrahimipatterns. Modifiers on a pattern can change these settings.
427*22dc650dSSadaf Ebrahimi<pre>
428*22dc650dSSadaf Ebrahimi  #perltest
429*22dc650dSSadaf Ebrahimi</pre>
430*22dc650dSSadaf EbrahimiThis line is used in test files that can also be processed by <b>perltest.sh</b>
431*22dc650dSSadaf Ebrahimito confirm that Perl gives the same results as PCRE2. Subsequent tests are
432*22dc650dSSadaf Ebrahimichecked for the use of <b>pcre2test</b> features that are incompatible with the
433*22dc650dSSadaf Ebrahimi<b>perltest.sh</b> script.
434*22dc650dSSadaf Ebrahimi</P>
<P>
Patterns must use '/' as their delimiter, and only certain modifiers are
supported. Comment lines, #pattern commands, and #subject commands that set or
unset "mark" are recognized and acted on. The #perltest, #forbid_utf, and
#newline_default commands, which are needed in the relevant pcre2test files,
are silently ignored. All other command lines are ignored but give a warning
message. The <b>#perltest</b> command helps detect tests that are accidentally
put in the wrong file or use the wrong delimiter. For more details of the
<b>perltest.sh</b> script, see the comments it contains.
<pre>
  #pop [&#60;modifiers&#62;]
  #popcopy [&#60;modifiers&#62;]
</pre>
These commands are used to manipulate the stack of compiled patterns, as
described in the section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
<pre>
  #save &#60;filename&#62;
</pre>
This command is used to save a set of compiled patterns to a file, as described
in the section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
<pre>
  #subject &#60;modifier-list&#62;
</pre>
This command sets a default modifier list that applies to all subsequent
subject lines. Modifiers on a subject line can change these settings.
</P>
<br><a name="SEC7" href="#TOC1">MODIFIER SYNTAX</a><br>
<P>
Modifier lists are used with both pattern and subject lines. Items in a list
are separated by commas followed by optional white space. Trailing whitespace
in a modifier list is ignored. Some modifiers may be given for both patterns
and subject lines, whereas others are valid only for one or the other. Each
modifier has a long name, for example "anchored", and some of them must be
followed by an equals sign and a value, for example, "offset=12". Values cannot
contain comma characters, but may contain spaces. Modifiers that do not take
values may be preceded by a minus sign to turn off a previous setting.
</P>
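<P>
As a rough illustration of these rules, the following C sketch splits a
modifier list into items and records, for each one, its name, any value after
an equals sign, and whether it was negated with a leading minus. It is not
taken from the <b>pcre2test</b> source: the function split_modifiers() and
struct modifier are hypothetical, and the sketch also trims white space before
a comma, which may be more lenient than <b>pcre2test</b> itself.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical layout for one parsed item; not pcre2test's own structure. */
struct modifier {
  char name[32];   /* long modifier name, e.g. "offset" */
  char value[64];  /* text after '=', or "" if there is none */
  bool negated;    /* true if the item began with '-' */
};

/* Copy [start, end) into dst, truncating to cap-1 bytes, and NUL-terminate. */
static void copy_span(char *dst, size_t cap, const char *start, const char *end) {
  size_t n = (size_t)(end - start);
  if (n >= cap) n = cap - 1;
  memcpy(dst, start, n);
  dst[n] = '\0';
}

/* Split a modifier list into at most max items; returns the number stored. */
int split_modifiers(const char *list, struct modifier *items, int max) {
  assert(list != NULL && items != NULL);
  int count = 0;
  const char *p = list;
  while (*p != '\0' && count < max) {
    while (isspace((unsigned char)*p)) p++;        /* optional space after a comma */
    const char *end = strchr(p, ',');
    if (end == NULL) end = p + strlen(p);
    const char *stop = end;
    while (stop > p && isspace((unsigned char)stop[-1])) stop--;  /* trailing space */
    if (stop > p) {
      struct modifier *m = &items[count++];
      m->negated = (*p == '-');                    /* "-name" turns a setting off */
      if (m->negated) p++;
      const char *eq = memchr(p, '=', (size_t)(stop - p));
      copy_span(m->name, sizeof m->name, p, eq ? eq : stop);
      copy_span(m->value, sizeof m->value, eq ? eq + 1 : stop, stop);
    }
    p = (*end == ',') ? end + 1 : end;
  }
  return count;
}
```

<P>
For example, the list "anchored, offset=12,-notbol" yields three items:
"anchored" with no value, "offset" with the value "12", and a negated "notbol".
</P>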
<P>
A few of the more common modifiers can also be specified as single letters, for
example "i" for "caseless". In documentation, following the Perl convention,
these are written with a slash ("the /i modifier") for clarity. Abbreviated
modifiers must all be concatenated in the first item of a modifier list. If the
first item is not recognized as a long modifier name, it is interpreted as a
sequence of these abbreviations. For example:
<pre>
  /abc/ig,newline=cr,jit=3
</pre>
This is a pattern line whose modifier list starts with two one-letter modifiers
(/i and /g). The lower-case abbreviated modifiers are the same as those used in
Perl.
</P>
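<P>
The rule for the first item can be sketched in C as follows. The long-name
list and the abbreviation table are deliberately partial, and the function
expand_first_item() is hypothetical; in particular, the special treatment of a
repeated /x (see "Setting compilation options") is not modeled.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Partial, illustrative table of one-letter abbreviations. */
static const struct { char letter; const char *name; } abbrevs[] = {
  {'a', "ascii_all"}, {'g', "global"},   {'i', "caseless"},
  {'m', "multiline"}, {'n', "no_auto_capture"}, {'r', "caseless_restrict"},
  {'s', "dotall"},    {'x', "extended"}, {'B', "bincode"}, {'I', "info"},
};

/* A few long names, enough to show that "info" is not split into letters. */
static const char *const long_names[] = {"anchored", "caseless", "global", "hex", "info"};

/* Interpret the first item of a modifier list. A recognized long name is
   returned as is; otherwise every letter must be an abbreviation. Returns the
   number of names stored in out, or -1 for an unknown letter or no room. */
int expand_first_item(const char *item, const char **out, int max) {
  assert(item != NULL && out != NULL);
  for (size_t k = 0; k < sizeof long_names / sizeof long_names[0]; k++) {
    if (strcmp(item, long_names[k]) == 0) {
      if (max < 1) return -1;
      out[0] = long_names[k];
      return 1;
    }
  }
  int n = 0;
  for (const char *p = item; *p != '\0'; p++) {
    const char *found = NULL;
    for (size_t k = 0; k < sizeof abbrevs / sizeof abbrevs[0]; k++)
      if (abbrevs[k].letter == *p) found = abbrevs[k].name;
    if (found == NULL || n >= max) return -1;
    out[n++] = found;
  }
  return n;
}
```

<P>
With this table, "ig" expands to caseless and global, whereas "info" is
returned unchanged because it is checked against the long names first.
</P>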
<br><a name="SEC8" href="#TOC1">PATTERN SYNTAX</a><br>
<P>
A pattern line must start with one of the following characters (common symbols,
excluding pattern meta-characters):
<pre>
  / ! " ' ` - = _ : ; , % &#38; @ ~
</pre>
This character is interpreted as the pattern's delimiter. A regular expression
may be continued over several input lines, in which case the newline characters
are included within it. It is possible to include the delimiter as a literal
within the pattern by escaping it with a backslash, for example
<pre>
  /abc\/def/
</pre>
If you do this, the escape and the delimiter form part of the pattern, but
since the delimiters are all non-alphanumeric, the inclusion of the backslash
does not affect the pattern's interpretation. Note, however, that this trick
does not work within \Q...\E literal bracketing because the backslash will
itself be interpreted as a literal. If the terminating delimiter is immediately
followed by a backslash, for example,
<pre>
  /abc/\
</pre>
a backslash is added to the end of the pattern. This is done to provide a way
of testing the error condition that arises if a pattern finishes with a
backslash, because
<pre>
  /abc\/
</pre>
is interpreted as the first line of a pattern that starts with "abc/", causing
<b>pcre2test</b> to read the next line as a continuation of the regular expression.
</P>
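<P>
The delimiter rules can be modeled with a short scan. In this C sketch, the
function find_terminator() is hypothetical, and \Q...\E is deliberately
ignored because it affects only how PCRE2 later interprets the backslash. A
backslash always carries the following character with it, so an escaped
delimiter stays in the pattern and an escaped final delimiter leaves the
pattern incomplete.
</P>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Characters accepted as pattern delimiters, per the list above. */
static const char delimiters[] = "/!\"'`-=_:;,%&@~";

/* Scan one pattern line. Returns the offset of the terminating delimiter,
   -1 if the pattern continues on the next line, or -2 if the first
   character is not a permitted delimiter. *append_backslash is set when the
   closing delimiter is immediately followed by a backslash. */
int find_terminator(const char *line, bool *append_backslash) {
  *append_backslash = false;
  if (line[0] == '\0' || strchr(delimiters, line[0]) == NULL) return -2;
  char delim = line[0];
  for (int i = 1; line[i] != '\0'; i++) {
    if (line[i] == '\\' && line[i + 1] != '\0') {
      i++;                 /* keep the escape and the escaped character */
      continue;
    }
    if (line[i] == delim) {
      *append_backslash = (line[i + 1] == '\\');
      return i;
    }
  }
  return -1;
}
```

<P>
Applied to the three examples above, it returns the offset of the closing
delimiter for /abc\/def/, reports a trailing backslash for /abc/\, and
returns -1 (keep reading) for /abc\/.
</P>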
<P>
A pattern can be followed by a modifier list (details below).
</P>
<br><a name="SEC9" href="#TOC1">SUBJECT LINE SYNTAX</a><br>
<P>
Before each subject line is passed to <b>pcre2_match()</b>,
<b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>, leading and trailing white
space is removed, and the line is scanned for backslash escapes, unless the
<b>subject_literal</b> modifier was set for the pattern. The following escapes
provide a means of encoding non-printing characters in a visible way:
<pre>
  \a         alarm (BEL, \x07)
  \b         backspace (\x08)
  \e         escape (\x1b)
  \f         form feed (\x0c)
  \n         newline (\x0a)
  \r         carriage return (\x0d)
  \t         tab (\x09)
  \v         vertical tab (\x0b)
  \nnn       octal character (up to 3 octal digits); always
               a byte unless &#62; 255 in UTF-8 or 16-bit or 32-bit mode
  \o{dd...}  octal character (any number of octal digits)
  \xhh       hexadecimal byte (up to 2 hex digits)
  \x{hh...}  hexadecimal character (any number of hex digits)
</pre>
The use of \x{hh...} is not dependent on the use of the <b>utf</b> modifier on
the pattern; it is always recognized. There may be any number of hexadecimal
digits inside the braces; invalid values provoke error messages.
</P>
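<P>
A minimal C sketch of this escape processing, assuming the 8-bit library in
non-UTF mode, follows. It covers only the single-letter escapes, \nnn, \xhh,
and a backslash before punctuation; \o{...} and \x{...} are omitted. The
function decode_escapes() is hypothetical and does not attempt to reproduce
every edge case of <b>pcre2test</b>'s own decoder.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Decode some subject-line escapes into bytes (8-bit, non-UTF).
   Returns the number of bytes written, or -1 on an unsupported escape
   or a full output buffer. */
int decode_escapes(const char *in, unsigned char *out, size_t cap) {
  size_t n = 0;
  while (*in != '\0') {
    if (n >= cap) return -1;
    if (*in != '\\') {                      /* ordinary character */
      out[n++] = (unsigned char)*in++;
      continue;
    }
    in++;                                   /* step past the backslash */
    if (*in == '\0') break;                 /* a final backslash is ignored */
    unsigned value = 0;
    int digits = 0;
    switch (*in) {
      case 'a': value = 0x07; in++; break;
      case 'b': value = 0x08; in++; break;
      case 'e': value = 0x1b; in++; break;  /* ESC */
      case 'f': value = 0x0c; in++; break;
      case 'n': value = 0x0a; in++; break;
      case 'r': value = 0x0d; in++; break;
      case 't': value = 0x09; in++; break;
      case 'v': value = 0x0b; in++; break;
      case 'x':                             /* \xhh: up to 2 hex digits */
        in++;
        while (digits < 2 && isxdigit((unsigned char)*in)) {
          int c = tolower((unsigned char)*in++);
          value = value * 16 + (unsigned)(isdigit(c) ? c - '0' : c - 'a' + 10);
          digits++;
        }
        break;
      default:
        if (*in >= '0' && *in <= '7') {     /* \nnn: up to 3 octal digits */
          while (digits < 3 && *in >= '0' && *in <= '7') {
            value = value * 8 + (unsigned)(*in++ - '0');
            digits++;
          }
          if (value > 255) return -1;       /* this sketch rejects non-bytes */
        } else if (!isalnum((unsigned char)*in)) {
          value = (unsigned char)*in++;     /* escaped punctuation */
        } else {
          return -1;                        /* e.g. \q is an error */
        }
        break;
    }
    out[n++] = (unsigned char)value;
  }
  return (int)n;
}
```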
<P>
Note that \xhh specifies one byte rather than one character in UTF-8 mode;
this makes it possible to construct invalid UTF-8 sequences for testing
purposes. On the other hand, \x{hh} is interpreted as a UTF-8 character in
UTF-8 mode, generating more than one byte if the value is greater than 127.
When testing the 8-bit library in non-UTF-8 mode, \x{hh} generates one byte
for values less than 256, and causes an error for greater values.
</P>
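<P>
The difference between \xhh and \x{hh} in UTF-8 mode comes down to standard
UTF-8 encoding. The C sketch below uses a hypothetical helper, utf8_encode(),
which does not reject surrogate code points. It shows why \x{e9} becomes the
two bytes 0xc3 0xa9, whereas \xe9 is the single byte 0xe9, which on its own is
not valid UTF-8.
</P>

```c
#include <assert.h>
#include <stdint.h>

/* Encode one code point as UTF-8, as \x{hh...} does in UTF-8 mode.
   Returns the number of bytes written (1-4), or 0 if cp exceeds 0x10FFFF.
   Surrogates (0xD800-0xDFFF) are not rejected by this sketch. */
int utf8_encode(uint32_t cp, unsigned char out[4]) {
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}
```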
<P>
In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
possible to construct invalid UTF-16 sequences for testing purposes.
</P>
<P>
In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it
possible to construct invalid UTF-32 sequences for testing purposes.
</P>
<P>
There is a special backslash sequence that specifies replication of one or more
characters:
<pre>
  \[&#60;characters&#62;]{&#60;count&#62;}
</pre>
This makes it possible to test long strings without having to provide them as
part of the file. For example:
<pre>
  \[abc]{4}
</pre>
is converted to "abcabcabcabc". This feature does not support nesting. To
include a closing square bracket in the characters, code it as \x5D.
</P>
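<P>
The replication rule can be sketched in C as follows, assuming the whole
argument is a single \[...]{...} item. The function expand_replication() is
hypothetical; it takes the characters up to the first "]{", does not support
nesting, and does not decode escapes such as \x5D.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Expand one item of the form \[chars]{count} into dst (capacity cap,
   NUL-terminated). Returns the expanded length, or -1 if item is not of
   that form or the result would not fit. */
long expand_replication(const char *item, char *dst, size_t cap) {
  if (cap == 0 || strncmp(item, "\\[", 2) != 0) return -1;
  const char *chars = item + 2;
  const char *close = strstr(chars, "]{");   /* first "]{" ends the characters */
  if (close == NULL || !isdigit((unsigned char)close[2])) return -1;
  char *end;
  unsigned long count = strtoul(close + 2, &end, 10);
  if (end[0] != '}' || end[1] != '\0') return -1;
  size_t unit = (size_t)(close - chars);
  if (unit == 0) {                           /* nothing to repeat */
    dst[0] = '\0';
    return 0;
  }
  if (count > (cap - 1) / unit) return -1;   /* too long for dst */
  size_t len = 0;
  for (unsigned long i = 0; i < count; i++) {
    memcpy(dst + len, chars, unit);
    len += unit;
  }
  dst[len] = '\0';
  return (long)len;
}
```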
<P>
A backslash followed by an equals sign marks the end of the subject string and
the start of a modifier list. For example:
<pre>
  abc\=notbol,notempty
</pre>
If the subject string is empty and \= is followed by whitespace, the line is
treated as a comment line, and is not used for matching. For example:
<pre>
  \= This is a comment.
  abc\= This is an invalid modifier list.
</pre>
A backslash followed by any other non-alphanumeric character just escapes that
character. A backslash followed by anything else causes an error. However, if
the very last character in the line is a backslash (and there is no modifier
list), it is ignored. This gives a way of passing an empty line as data, since
a real empty line terminates the data input.
</P>
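<P>
Locating the end of the subject can be sketched in C as below, assuming that
leading and trailing white space has already been removed. The function
split_subject() and enum subject_kind are hypothetical; the sketch treats
"\\" as an escaped backslash, so an "=" that follows it does not start a
modifier list.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum subject_kind { SUBJECT_PLAIN, SUBJECT_WITH_MODIFIERS, SUBJECT_COMMENT };

/* Find the first "\=" that is not itself escaped. *subject_len receives the
   length of the text before it; *modifiers points at the modifier list, or
   is NULL when there is none. */
enum subject_kind split_subject(const char *line, size_t *subject_len,
                                const char **modifiers) {
  *modifiers = NULL;
  for (size_t i = 0; line[i] != '\0'; i++) {
    if (line[i] != '\\') continue;
    if (line[i + 1] == '=') {
      *subject_len = i;
      /* Empty subject followed by white space: the line is a comment. */
      if (i == 0 && isspace((unsigned char)line[2])) return SUBJECT_COMMENT;
      *modifiers = line + i + 2;
      return SUBJECT_WITH_MODIFIERS;
    }
    if (line[i + 1] != '\0') i++;   /* skip the escaped character */
  }
  *subject_len = strlen(line);
  return SUBJECT_PLAIN;
}
```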
<P>
If the <b>subject_literal</b> modifier is set for a pattern, all subject lines
that follow are treated as literals, with no special treatment of backslashes.
No replication is possible, and any subject modifiers must be set as defaults
by a <b>#subject</b> command.
</P>
<br><a name="SEC10" href="#TOC1">PATTERN MODIFIERS</a><br>
<P>
There are several types of modifier that can appear in pattern lines. Except
where noted below, they may also be used in <b>#pattern</b> commands. A
pattern's modifier list can add to or override default modifiers that were set
by a previous <b>#pattern</b> command.
<a name="optionmodifiers"></a></P>
<br><b>
Setting compilation options
</b><br>
<P>
The following modifiers set options for <b>pcre2_compile()</b>. Most of them set
bits in the options argument of that function, but those whose names start with
PCRE2_EXTRA are additional options that are set in the compile context.
Some of these options have single-letter abbreviations. There is special
handling for /x: if a second x is present, PCRE2_EXTENDED is converted into
PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EXTENDED as well,
though this makes no difference to the way <b>pcre2_compile()</b> behaves. See
<a href="pcre2api.html"><b>pcre2api</b></a>
for a description of the effects of these options.
<pre>
      allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
      allow_lookaround_bsk      set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
      allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
      alt_bsux                  set PCRE2_ALT_BSUX
      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
      alt_verbnames             set PCRE2_ALT_VERBNAMES
      anchored                  set PCRE2_ANCHORED
  /a  ascii_all                 set all ASCII options
      ascii_bsd                 set PCRE2_EXTRA_ASCII_BSD
      ascii_bss                 set PCRE2_EXTRA_ASCII_BSS
      ascii_bsw                 set PCRE2_EXTRA_ASCII_BSW
      ascii_digit               set PCRE2_EXTRA_ASCII_DIGIT
      ascii_posix               set PCRE2_EXTRA_ASCII_POSIX
      auto_callout              set PCRE2_AUTO_CALLOUT
      bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
  /i  caseless                  set PCRE2_CASELESS
  /r  caseless_restrict         set PCRE2_EXTRA_CASELESS_RESTRICT
      dollar_endonly            set PCRE2_DOLLAR_ENDONLY
  /s  dotall                    set PCRE2_DOTALL
      dupnames                  set PCRE2_DUPNAMES
      endanchored               set PCRE2_ENDANCHORED
      escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF
  /x  extended                  set PCRE2_EXTENDED
  /xx extended_more             set PCRE2_EXTENDED_MORE
      extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX
      firstline                 set PCRE2_FIRSTLINE
      literal                   set PCRE2_LITERAL
      match_line                set PCRE2_EXTRA_MATCH_LINE
      match_invalid_utf         set PCRE2_MATCH_INVALID_UTF
      match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
      match_word                set PCRE2_EXTRA_MATCH_WORD
  /m  multiline                 set PCRE2_MULTILINE
      never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
      never_ucp                 set PCRE2_NEVER_UCP
      never_utf                 set PCRE2_NEVER_UTF
  /n  no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
      no_auto_possess           set PCRE2_NO_AUTO_POSSESS
      no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
      no_start_optimize         set PCRE2_NO_START_OPTIMIZE
      no_utf_check              set PCRE2_NO_UTF_CHECK
      ucp                       set PCRE2_UCP
      ungreedy                  set PCRE2_UNGREEDY
      use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
      utf                       set PCRE2_UTF
</pre>
As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all
non-printing characters in output strings to be printed using the \x{hh...}
notation. Otherwise, those less than 0x100 are output in hex without the curly
brackets. Setting <b>utf</b> in 16-bit or 32-bit mode also causes pattern and
subject strings to be translated to UTF-16 or UTF-32, respectively, before
being passed to library functions.
<a name="controlmodifiers"></a></P>
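<P>
The output convention for non-printing characters can be sketched in C as
follows. The function format_char() is hypothetical, and treating only 0x20 to
0x7e as printable, as well as using lower-case, two-digit hex for \xhh, are
assumptions of this sketch rather than documented <b>pcre2test</b> behaviour.
</P>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one character for output. Printable ASCII (assumed: 0x20-0x7e) is
   emitted as is. Other characters use \x{...} when utf is set; otherwise
   \xhh below 0x100 and \x{...} above. Returns snprintf's result. */
int format_char(uint32_t c, bool utf, char *buf, size_t cap) {
  if (c >= 0x20 && c < 0x7f) return snprintf(buf, cap, "%c", (int)c);
  if (utf || c >= 0x100) return snprintf(buf, cap, "\\x{%x}", (unsigned)c);
  return snprintf(buf, cap, "\\x%02x", (unsigned)c);
}
```

<P>
Under these assumptions, a linefeed is shown as \x0a by default and as \x{a}
when <b>utf</b> is set, while a character such as U+263A is always shown as
\x{263a}.
</P>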
<br><b>
Setting compilation controls
</b><br>
<P>
The following modifiers affect the compilation process or request information
about the pattern. There are single-letter abbreviations for some that are
heavily used in the test files.
<pre>
      bsr=[anycrlf|unicode]     specify \R handling
  /B  bincode                   show binary code without lengths
      callout_info              show callout information
      convert=&#60;options&#62;         request foreign pattern conversion
      convert_glob_escape=c     set glob escape character
      convert_glob_separator=c  set glob separator character
      convert_length            set convert buffer length
      debug                     same as info,fullbincode
      framesize                 show matching frame size
      fullbincode               show binary code with lengths
  /I  info                      show info about compiled pattern
      hex                       unquoted characters are hexadecimal
      jit[=&#60;number&#62;]            use JIT
      jitfast                   use JIT fast path
      jitverify                 verify JIT use
      locale=&#60;name&#62;             use this locale
      max_pattern_compiled      ) set maximum compiled pattern
                 _length=&#60;n&#62;    )   length (bytes)
      max_pattern_length=&#60;n&#62;    set maximum pattern length (code units)
      max_varlookbehind=&#60;n&#62;     set maximum variable lookbehind length
      memory                    show memory used
      newline=&#60;type&#62;            set newline type
      null_context              compile with a NULL context
      null_pattern              pass pattern as NULL
      parens_nest_limit=&#60;n&#62;     set maximum parentheses depth
      posix                     use the POSIX API
      posix_nosub               use the POSIX API with REG_NOSUB
      push                      push compiled pattern onto the stack
      pushcopy                  push a copy onto the stack
      stackguard=&#60;number&#62;       test the stackguard feature
      subject_literal           treat all subject lines as literal
      tables=[0|1|2|3]          select internal tables
      use_length                do not zero-terminate the pattern
      utf8_input                treat input as UTF-8
</pre>
The effects of these modifiers are described in the following sections.
</P>
<br><b>
Newline and \R handling
</b><br>
<P>
The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
\R matches any Unicode newline sequence. The default can be specified when
PCRE2 is built; if it is not, the default is set to Unicode.
</P>
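<P>
The difference between the two <b>bsr</b> settings can be sketched in C on a
subject that has already been decoded into code points. The function
bsr_length() is hypothetical; it returns the length of the newline sequence
that \R would match at a given position, accepting VT, FF, NEL (U+0085), LS
(U+2028), and PS (U+2029) only in the Unicode case.
</P>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the newline sequence \R would match at s[i], or 0 if none.
   unicode == false models bsr=anycrlf; unicode == true models bsr=unicode. */
size_t bsr_length(const uint32_t *s, size_t len, size_t i, bool unicode) {
  if (i >= len) return 0;
  uint32_t c = s[i];
  if (c == 0x0d) return (i + 1 < len && s[i + 1] == 0x0a) ? 2 : 1;  /* CR, CRLF */
  if (c == 0x0a) return 1;                                          /* LF */
  if (!unicode) return 0;
  switch (c) {
    case 0x0b: case 0x0c: case 0x85: case 0x2028: case 0x2029:
      return 1;                            /* VT, FF, NEL, LS, PS */
    default:
      return 0;
  }
}
```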
<P>
The <b>newline</b> modifier specifies which characters are to be interpreted as
newlines, both in the pattern and in subject lines. The type must be one of CR,
LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case).
</P>
<br><b>
Information about a pattern
</b><br>
<P>
The <b>debug</b> modifier is a shorthand for <b>info,fullbincode</b>, requesting
all available information.
</P>
<P>
The <b>bincode</b> modifier causes a representation of the compiled code to be
output after compilation. This information does not contain length and offset
values, which ensures that the same output is generated for different internal
link sizes and different code unit widths. By using <b>bincode</b>, the same
regression tests can be used in different environments.
</P>
<P>
The <b>fullbincode</b> modifier, by contrast, <i>does</i> include length and
offset values. This is used in a few special tests that run only for specific
code unit widths and link sizes, and is also useful for one-off tests.
</P>
<P>
The <b>info</b> modifier requests information about the compiled pattern
(whether it is anchored, has a fixed first character, and so on). The
information is obtained from the <b>pcre2_pattern_info()</b> function. Here are
some typical examples:
<pre>
    re&#62; /(?i)(^a|^b)/m,info
  Capture group count = 1
  Compile options: multiline
  Overall options: caseless multiline
  First code unit at start or follows newline
  Subject length lower bound = 1

    re&#62; /(?i)abc/info
  Capture group count = 0
  Compile options: &#60;none&#62;
  Overall options: caseless
  First code unit = 'a' (caseless)
  Last code unit = 'c' (caseless)
  Subject length lower bound = 3
</pre>
"Compile options" are those specified by modifiers; "overall options" have
added options that are taken or deduced from the pattern. If both sets of
options are the same, just a single "options" line is output; if there are no
options, the line is omitted. "First code unit" is where any match must start;
if there is more than one, they are listed as "starting code units". "Last code
unit" is the last literal code unit that must be present in any match. It is
not necessarily the last character of the match. These lines are omitted if no
starting or ending code units are recorded. The subject length line is omitted
when <b>no_start_optimize</b> is set because the minimum length is not
calculated when it can never be used.
</P>
<P>
The <b>framesize</b> modifier shows the size, in bytes, of each storage frame
used by <b>pcre2_match()</b> for handling backtracking. The size depends on the
number of capturing parentheses in the pattern. A vector of these frames is
used at matching time; its overall size is shown when the <b>heapframes_size</b>
subject modifier is set.
</P>
<P>
The <b>callout_info</b> modifier requests information about all the callouts in
the pattern. A list of them is output at the end of any other information that
is requested. For each callout, either its number or string is given, followed
by the item that follows it in the pattern.
</P>
<br><b>
Passing a NULL context
</b><br>
<P>
Normally, <b>pcre2test</b> passes a context block to <b>pcre2_compile()</b>. If
the <b>null_context</b> modifier is set, however, NULL is passed. This is for
testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses
default values).
</P>
<br><b>
Passing a NULL pattern
</b><br>
<P>
The <b>null_pattern</b> modifier is for testing the behaviour of
<b>pcre2_compile()</b> when the pattern argument is NULL. The length value
passed is the default PCRE2_ZERO_TERMINATED unless <b>use_length</b> is set.
Any length other than zero causes an error.
</P>
<br><b>
Specifying pattern characters in hexadecimal
</b><br>
<P>
The <b>hex</b> modifier specifies that the characters of the pattern, except for
substrings enclosed in single or double quotes, are to be interpreted as pairs
of hexadecimal digits. This feature is provided as a way of creating patterns
that contain binary zeros and other non-printing characters. White space is
permitted between pairs of digits. For example, this pattern contains three
characters:
<pre>
  /ab 32 59/hex
</pre>
Parts of such a pattern are taken literally if quoted. This pattern contains
nine characters, only two of which are specified in hexadecimal:
<pre>
  /ab "literal" 32/hex
</pre>
Either single or double quotes may be used. There is no way of including
the delimiter within a substring. The <b>hex</b> and <b>expand</b> modifiers are
mutually exclusive.
</P>
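<P>
The <b>hex</b> rules can be sketched in C as a small decoder for the text
between the pattern delimiters. The function decode_hex_pattern() is
hypothetical; it returns a byte count rather than a string because the result
may contain binary zeros, and it assumes white space may appear only between
pairs, not within one.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit (the caller has already checked isxdigit). */
static int hex_value(int c) {
  return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Convert the body of a hex pattern to bytes. Pairs of hex digits may be
   separated by white space; text in single or double quotes is copied
   literally. Returns the byte count, or -1 on a malformed pair, an
   unterminated quote, or a full output buffer. */
long decode_hex_pattern(const char *body, unsigned char *out, size_t cap) {
  size_t n = 0;
  const char *p = body;
  while (*p != '\0') {
    if (isspace((unsigned char)*p)) { p++; continue; }
    if (*p == '"' || *p == '\'') {              /* quoted literal run */
      char quote = *p++;
      while (*p != '\0' && *p != quote) {
        if (n >= cap) return -1;
        out[n++] = (unsigned char)*p++;
      }
      if (*p != quote) return -1;               /* unterminated quote */
      p++;
      continue;
    }
    if (!isxdigit((unsigned char)p[0]) || !isxdigit((unsigned char)p[1])) return -1;
    if (n >= cap) return -1;
    out[n++] = (unsigned char)(hex_value((unsigned char)p[0]) * 16 +
                               hex_value((unsigned char)p[1]));
    p += 2;
  }
  return (long)n;
}
```

<P>
It yields three bytes for the first example above and nine for the second.
</P>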
<br><b>
Specifying the pattern's length
</b><br>
<P>
By default, patterns are passed to the compiling functions as zero-terminated
strings, but they can instead be passed with an explicit length. The
<b>use_length</b> modifier causes this to happen. Using a length happens
automatically (whether or not <b>use_length</b> is set) when <b>hex</b> is set,
because patterns specified in hexadecimal may contain binary zeros.
</P>
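<P>
Why the length matters can be shown without calling the library. In the C
sketch below, ZERO_TERMINATED stands in for PCRE2_ZERO_TERMINATED (which
pcre2.h defines as ~(PCRE2_SIZE)0), and the functions pattern_length_arg() and
units_seen() are hypothetical. They model only the choice of length argument,
not <b>pcre2_compile()</b> itself.
</P>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for PCRE2_ZERO_TERMINATED, defined in pcre2.h as ~(PCRE2_SIZE)0. */
#define ZERO_TERMINATED (~(size_t)0)

/* Length argument for the compile call: the real length when use_length or
   hex is in effect, otherwise the "zero-terminated" marker. */
size_t pattern_length_arg(size_t real_length, bool use_length, bool hex) {
  return (use_length || hex) ? real_length : ZERO_TERMINATED;
}

/* Number of code units a callee would see for a given length argument:
   with the marker it can only stop at the first binary zero. */
size_t units_seen(const char *pattern, size_t length_arg) {
  return length_arg == ZERO_TERMINATED ? strlen(pattern) : length_arg;
}
```

<P>
For the hex pattern 61 00 62, a zero-terminated pattern would appear to be the
single character "a", whereas passing the length exposes all three code units.
</P>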
848*22dc650dSSadaf Ebrahimi<P>
849*22dc650dSSadaf EbrahimiIf <b>hex</b> or <b>use_length</b> is used with the POSIX wrapper API (see
850*22dc650dSSadaf Ebrahimi<a href="#posixwrapper">"Using the POSIX wrapper API"</a>
851*22dc650dSSadaf Ebrahimibelow), the REG_PEND extension is used to pass the pattern's length.
852*22dc650dSSadaf Ebrahimi</P>
853*22dc650dSSadaf Ebrahimi<br><b>
854*22dc650dSSadaf EbrahimiSpecifying a maximum for variable lookbehinds
855*22dc650dSSadaf Ebrahimi</b><br>
856*22dc650dSSadaf Ebrahimi<P>
857*22dc650dSSadaf EbrahimiVariable lookbehind assertions are supported only if, for each one, there is a
858*22dc650dSSadaf Ebrahimimaximum length (in characters) that it can match. There is a limit on this,
859*22dc650dSSadaf Ebrahimiwhose default can be set at build time, with an ultimate default of 255. The
860*22dc650dSSadaf Ebrahimi<b>max_varlookbehind</b> modifier uses the <b>pcre2_set_max_varlookbehind()</b>
861*22dc650dSSadaf Ebrahimifunction to change the limit. Lookbehinds whose branches each match a fixed
862*22dc650dSSadaf Ebrahimilength are limited to 65535 characters per branch.
863*22dc650dSSadaf Ebrahimi</P>
864*22dc650dSSadaf Ebrahimi<br><b>
865*22dc650dSSadaf EbrahimiSpecifying wide characters in 16-bit and 32-bit modes
866*22dc650dSSadaf Ebrahimi</b><br>
867*22dc650dSSadaf Ebrahimi<P>
868*22dc650dSSadaf EbrahimiIn 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
869*22dc650dSSadaf Ebrahimitranslated to UTF-16 or UTF-32 when the <b>utf</b> modifier is set. For testing
870*22dc650dSSadaf Ebrahimithe 16-bit and 32-bit libraries in non-UTF mode, the <b>utf8_input</b> modifier
871*22dc650dSSadaf Ebrahimican be used. It is mutually exclusive with <b>utf</b>. Input lines are
872*22dc650dSSadaf Ebrahimiinterpreted as UTF-8 as a means of specifying wide characters. More details are
873*22dc650dSSadaf Ebrahimigiven in
874*22dc650dSSadaf Ebrahimi<a href="#inputencoding">"Input encoding"</a>
875*22dc650dSSadaf Ebrahimiabove.
876*22dc650dSSadaf Ebrahimi</P>
<br><b>
Generating long repetitive patterns
</b><br>
<P>
Some tests use long patterns that are very repetitive. Instead of creating a
very long input line for such a pattern, you can use a special repetition
feature, similar to the one described for subject lines above. If the
<b>expand</b> modifier is present on a pattern, parts of the pattern that have
the form
<pre>
  \[&#60;characters&#62;]{&#60;count&#62;}
</pre>
are expanded before the pattern is passed to <b>pcre2_compile()</b>. For
example, \[AB]{6000} is expanded to "ABAB...", with "AB" repeated 6000 times.
This construction cannot be nested. An initial "\[" sequence is recognized only
if "]{" followed by decimal digits and "}" is found later in the pattern. If
not, the characters remain in the pattern unaltered. The <b>expand</b> and
<b>hex</b> modifiers are mutually exclusive.
</P>
<P>
If part of an expanded pattern looks like an expansion, but is really part of
the actual pattern, unwanted expansion can be avoided by giving two values in
the quantifier. For example, \[AB]{6000,6000} is not recognized as an
expansion item.
</P>
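<P>
The expansion rule can be illustrated with a short C sketch. It is not the code
that <b>pcre2test</b> itself uses: <b>expand_pattern()</b> and <b>append()</b>
are hypothetical names, and searching for the first "]{" after each "\[" is an
assumption that may not reproduce the real program in unusual cases.
<pre>
#include &#60;ctype.h&#62;
#include &#60;stdlib.h&#62;
#include &#60;string.h&#62;

/* Hypothetical helper: append n bytes to a growable, zero-terminated buffer. */
static int append(char **buf, size_t *len, size_t *cap, const char *s, size_t n)
{
  if (*len + n + 1 &#62; *cap) {
    size_t newcap = (*len + n + 1) * 2;
    char *p = realloc(*buf, newcap);
    if (p == NULL) return 0;
    *buf = p;
    *cap = newcap;
  }
  memcpy(*buf + *len, s, n);
  *len += n;
  (*buf)[*len] = '\0';
  return 1;
}

/* Hypothetical sketch: replace each \[chars]{count} by count copies of
   chars. Returns a malloc'd string, or NULL if memory runs out. */
char *expand_pattern(const char *in)
{
  char *out = NULL;
  size_t len = 0, cap = 0;
  if (!append(&#38;out, &#38;len, &#38;cap, "", 0)) return NULL;
  while (*in != '\0') {
    if (in[0] == '\\' &#38;&#38; in[1] == '[') {
      /* Recognized only if "]{", decimal digits and "}" follow later. */
      const char *close = strstr(in + 2, "]{");
      if (close != NULL &#38;&#38; isdigit((unsigned char)close[2])) {
        char *end;
        unsigned long count = strtoul(close + 2, &#38;end, 10);
        if (*end == '}') {   /* {n,m} stops at ',' so is left alone */
          size_t unit = (size_t)(close - (in + 2));
          for (unsigned long i = 0; i &#60; count; i++)
            if (!append(&#38;out, &#38;len, &#38;cap, in + 2, unit)) { free(out); return NULL; }
          in = end + 1;
          continue;
        }
      }
    }
    /* Anything else, including an unrecognized "\[", is copied as is. */
    if (!append(&#38;out, &#38;len, &#38;cap, in, 1)) { free(out); return NULL; }
    in++;
  }
  return out;
}
</pre>
For example, expand_pattern("x\\[AB]{3}y") returns "xABABABy", whereas
\[AB]{6000,6000} is returned unchanged because a comma, not "}", follows the
digits. The caller must free() the result.
</P>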
<P>
If the <b>info</b> modifier is set on an expanded pattern, the result of the
expansion is included in the information that is output.
</P>
<br><b>
JIT compilation
</b><br>
<P>
Just-in-time (JIT) compiling is a heavyweight optimization that can greatly
speed up pattern matching. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for details. JIT compiling happens, optionally, after a pattern
has been successfully compiled into an internal form. The JIT compiler converts
this to optimized machine code. It needs to know whether the match-time options
PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used, because
different code is generated for the different cases. See the
<b>partial_hard</b> and <b>partial_soft</b> modifiers in "Subject Modifiers"
<a href="#subjectmodifiers">below</a>
for details of how these options are specified for each match attempt.
</P>
<P>
JIT compilation is requested by the <b>jit</b> pattern modifier, which may
optionally be followed by an equals sign and a number in the range 0 to 7.
The three bits that make up the number specify which of the three JIT operating
modes are to be compiled:
<pre>
  1  compile JIT code for non-partial matching
  2  compile JIT code for soft partial matching
  4  compile JIT code for hard partial matching
</pre>
The possible values for the <b>jit</b> modifier are therefore:
<pre>
  0  disable JIT
  1  normal matching only
  2  soft partial matching only
  3  normal and soft partial matching
  4  hard partial matching only
  5  normal and hard partial matching
  6  soft and hard partial matching only
  7  all three modes
</pre>
If no number is given, 7 is assumed. The phrase "partial matching" means a call
to <b>pcre2_match()</b> with either the PCRE2_PARTIAL_SOFT or the
PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
match; the options enable the possibility of a partial match, but do not
require it. Note also that if you request JIT compilation only for partial
matching (for example, jit=2) but do not set a partial matching modifier
(<b>partial_soft</b> or <b>partial_hard</b>) on a subject line, that match will
not use JIT code because none was compiled for non-partial matching.
</P>
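<P>
The following C sketch shows how the bits of a <b>jit</b> value decide whether
JIT code exists for a given kind of match. The names
<b>jit_code_available()</b>, <b>match_kind</b>, and the JIT_* constants are
hypothetical; the bit values are those in the table above, which coincide with
the library's PCRE2_JIT_COMPLETE, PCRE2_JIT_PARTIAL_SOFT, and
PCRE2_JIT_PARTIAL_HARD options. Incompatible run-time options, which can also
prevent JIT use, are not modelled.
<pre>
#include &#60;stdbool.h&#62;

/* Bits of the jit modifier value (hypothetical names). */
enum {
  JIT_NONPARTIAL   = 1,   /* non-partial matching  */
  JIT_PARTIAL_SOFT = 2,   /* soft partial matching */
  JIT_PARTIAL_HARD = 4    /* hard partial matching */
};

/* Hypothetical classification of what a subject line asks for. */
typedef enum { MATCH_NORMAL, MATCH_PARTIAL_SOFT, MATCH_PARTIAL_HARD } match_kind;

/* True if jit=value compiled JIT code suitable for this kind of match. */
bool jit_code_available(unsigned value, match_kind kind)
{
  switch (kind) {
    case MATCH_PARTIAL_SOFT: return (value &#38; JIT_PARTIAL_SOFT) != 0;
    case MATCH_PARTIAL_HARD: return (value &#38; JIT_PARTIAL_HARD) != 0;
    default:                 return (value &#38; JIT_NONPARTIAL) != 0;
  }
}
</pre>
With jit=2, jit_code_available(2, MATCH_NORMAL) is false, which is the case
described above where a subject line without a partial matching modifier is not
matched by JIT code.
</P>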
<P>
If JIT compilation is successful, the compiled JIT code will automatically be
used when an appropriate type of match is run, except when incompatible
run-time options are specified. For more details, see the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation. See also the <b>jitstack</b> modifier below for a way of
setting the size of the JIT stack.
</P>
<P>
If the <b>jitfast</b> modifier is specified, matching is done using the JIT
"fast path" interface, <b>pcre2_jit_match()</b>, which skips some of the sanity
checks that are done by <b>pcre2_match()</b>, and of course does not work when
JIT is not supported. If <b>jitfast</b> is specified without <b>jit</b>, jit=7 is
assumed.
</P>
<P>
If the <b>jitverify</b> modifier is specified, information about the compiled
pattern shows whether JIT compilation was or was not successful. If
<b>jitverify</b> is specified without <b>jit</b>, jit=7 is assumed. If JIT
compilation is successful when <b>jitverify</b> is set, the text "(JIT)" is
added to the first output line after a match or non-match when JIT-compiled
code was actually used in the match.
</P>
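<P>
The defaulting rules for <b>jit</b>, <b>jitfast</b>, and <b>jitverify</b> can
be summarized in a few lines of C. The <b>jit_modifiers</b> structure and
<b>effective_jit()</b> are hypothetical, and defaults set elsewhere (for
example in a <b>#pattern</b> command) are not taken into account.
<pre>
#include &#60;stdbool.h&#62;

/* Hypothetical record of the JIT-related modifiers on one pattern line. */
typedef struct {
  bool jit;             /* jit modifier present */
  bool jit_has_value;   /* written as jit=&#60;n&#62; rather than plain jit */
  unsigned jit_value;   /* the &#60;n&#62; in jit=&#60;n&#62; */
  bool jitfast;
  bool jitverify;
} jit_modifiers;

/* Effective jit value for the pattern; 0 means no JIT compilation. */
unsigned effective_jit(const jit_modifiers *m)
{
  if (m-&#62;jit)
    return m-&#62;jit_has_value ? m-&#62;jit_value : 7;  /* plain "jit" means jit=7 */
  if (m-&#62;jitfast || m-&#62;jitverify)
    return 7;                                       /* implied by jitfast or jitverify */
  return 0;
}
</pre>
</P>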
<br><b>
Setting a locale
</b><br>
<P>
The <b>locale</b> modifier must specify the name of a locale, for example:
<pre>
  /pattern/locale=fr_FR
</pre>
The given locale is set, <b>pcre2_maketables()</b> is called to build a set of
character tables for the locale, and these are then passed to
<b>pcre2_compile()</b> when compiling the regular expression. The same tables
are used when matching the following subject lines. The <b>locale</b> modifier
applies only to the pattern on which it appears, but can be given in a
<b>#pattern</b> command if a default is needed. The <b>locale</b> and
<b>tables</b> modifiers are mutually exclusive.
</P>
<br><b>
Showing pattern memory
</b><br>
<P>
The <b>memory</b> modifier causes the size in bytes of the memory used to hold
the compiled pattern to be output. This does not include the size of the
<b>pcre2_code</b> block; it is just the actual compiled data. If the pattern is
subsequently passed to the JIT compiler, the size of the JIT compiled code is
also output. Here is an example:
<pre>
    re&#62; /a(b)c/jit,memory
  Memory allocation (code space): 21
  Memory allocation (JIT code): 1910
</pre>
</P>
<br><b>
Limiting nested parentheses
</b><br>
<P>
The <b>parens_nest_limit</b> modifier sets a limit on the depth of nested
parentheses in a pattern. Breaching the limit causes a compilation error.
The default for the library is set when PCRE2 is built, but <b>pcre2test</b>
sets its own default of 220, which is required for running the standard test
suite.
</P>
<br><b>
Limiting the pattern length
</b><br>
<P>
The <b>max_pattern_length</b> modifier sets a limit, in code units, to the
length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit
causes a compilation error. The default is the largest number a PCRE2_SIZE
variable can hold (essentially unlimited).
</P>
<br><b>
Limiting the size of a compiled pattern
</b><br>
<P>
The <b>max_pattern_compiled_length</b> modifier sets a limit, in bytes, to the
amount of memory used by a compiled pattern. Breaching the limit causes a
compilation error. The default is the largest number a PCRE2_SIZE variable can
hold (essentially unlimited).
<a name="posixwrapper"></a></P>
<br><b>
Using the POSIX wrapper API
</b><br>
<P>
The <b>posix</b> and <b>posix_nosub</b> modifiers cause <b>pcre2test</b> to call
PCRE2 via the POSIX wrapper API rather than its native API. When
<b>posix_nosub</b> is used, the POSIX option REG_NOSUB is passed to
<b>regcomp()</b>. The POSIX wrapper supports only the 8-bit library. Note that
it does not imply POSIX matching semantics; for more detail see the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation. The following pattern modifiers set options for the
<b>regcomp()</b> function:
<pre>
  caseless           REG_ICASE
  multiline          REG_NEWLINE
  dotall             REG_DOTALL     )
  ungreedy           REG_UNGREEDY   ) These options are not part of
  ucp                REG_UCP        )   the POSIX standard
  utf                REG_UTF8       )
</pre>
The <b>regerror_buffsize</b> modifier specifies a size for the error buffer that
is passed to <b>regerror()</b> in the event of a compilation error. For example:
<pre>
  /abc/posix,regerror_buffsize=20
</pre>
This provides a means of testing the behaviour of <b>regerror()</b> when the
buffer is too small for the error message. If this modifier has not been set, a
large buffer is used.
</P>
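<P>
The behaviour that <b>regerror_buffsize</b> exercises is the standard POSIX
contract for <b>regerror()</b>: it returns the size needed for the whole
message, including the terminating zero, and if the buffer is too small it
stores a truncated, zero-terminated message. The C sketch below shows this with
the system C library's <b>regcomp()</b> and <b>regerror()</b>, not PCRE2's
wrapper, so that it builds without PCRE2. The name <b>error_into_buffer()</b> is
hypothetical, and the message text depends on the C library.
<pre>
#include &#60;regex.h&#62;
#include &#60;stddef.h&#62;

/* Compile a deliberately invalid ERE and fetch the error text into a
   caller-supplied buffer of bufsize bytes. Returns the size regerror()
   reports for the complete message (terminating zero included), or 0 if
   the pattern unexpectedly compiled. */
size_t error_into_buffer(char *buf, size_t bufsize)
{
  regex_t re;
  int rc = regcomp(&#38;re, "a(", REG_EXTENDED);   /* unmatched parenthesis */
  if (rc == 0) {
    regfree(&#38;re);
    return 0;
  }
  /* At most bufsize - 1 characters plus a zero are stored in buf. */
  return regerror(rc, &#38;re, buf, bufsize);
}
</pre>
A caller can compare the returned size with the buffer size to detect that the
message was truncated.
</P>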
<P>
The <b>aftertext</b> and <b>allaftertext</b> subject modifiers work as described
below. All other modifiers are either ignored, with a warning message, or cause
an error.
</P>
<P>
The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
REG_PEND extension is used to pass it by length.
</P>
<br><b>
Testing the stack guard feature
</b><br>
<P>
The <b>stackguard</b> modifier is used to test the use of
<b>pcre2_set_compile_recursion_guard()</b>, a function that is provided to
enable stack availability to be checked during compilation (see the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation for details). If the number specified by the modifier is greater
than zero, <b>pcre2_set_compile_recursion_guard()</b> is called to set up a
callback from <b>pcre2_compile()</b> to a local function. The argument it
receives is the current parenthesis nesting depth; if this is greater than the
value given by the modifier, non-zero is returned, causing the compilation to
be aborted.
</P>
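<P>
A guard function of the kind described above might look like the following C
sketch. Its signature follows the callback type that
<b>pcre2_set_compile_recursion_guard()</b> accepts, int (*)(uint32_t, void *).
Passing the limit through the user data pointer is an assumption made for
illustration, and <b>depth_guard()</b> is a hypothetical name.
<pre>
#include &#60;stdint.h&#62;

/* Guard callback: depth is the current parenthesis nesting depth and
   user_data is assumed to point at the stackguard limit. */
int depth_guard(uint32_t depth, void *user_data)
{
  uint32_t limit = *(const uint32_t *)user_data;
  return depth &#62; limit;   /* non-zero makes pcre2_compile() give up */
}
</pre>
With PCRE2 available, such a function could be registered by calling
pcre2_set_compile_recursion_guard(ccontext, depth_guard, &#38;limit), where
ccontext is a compile context obtained from pcre2_compile_context_create().
</P>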
<br><b>
Using alternative character tables
</b><br>
<P>
The value specified for the <b>tables</b> modifier must be one of the digits 0,
1, 2, or 3. It causes a specific set of character tables to be passed to
<b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour
with different character tables. The digit specifies the tables as follows:
<pre>
  0   do not pass any special character tables
  1   the default ASCII tables, as distributed in
        pcre2_chartables.c.dist
  2   a set of tables defining ISO 8859 characters
  3   a set of tables loaded by the #loadtables command
</pre>
In table set 2, some characters whose codes are greater than 128 are identified
as letters, digits, spaces, etc. Table set 3 can be used only after a
<b>#loadtables</b> command has loaded it from a binary file. The <b>tables</b>
and <b>locale</b> modifiers are mutually exclusive.
</P>
<br><b>
Setting certain match controls
</b><br>
<P>
The following modifiers are really subject modifiers, and are described under
"Subject Modifiers" below. However, they may be included in a pattern's
modifier list, in which case they are applied to every subject line that is
processed with that pattern. These modifiers do not affect the compilation
process.
<pre>
      aftertext                   show text after match
      allaftertext                show text after captures
      allcaptures                 show all captures
      allvector                   show the entire ovector
      allusedtext                 show all consulted text
      altglobal                   alternative global matching
  /g  global                      global matching
      heapframes_size             show match data heapframes size
      jitstack=&#60;n&#62;                set size of JIT stack
      mark                        show mark values
      replace=&#60;string&#62;            specify a replacement string
      startchar                   show starting character when relevant
      substitute_callout          use substitution callouts
      substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
      substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
      substitute_matched          use PCRE2_SUBSTITUTE_MATCHED
      substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
      substitute_skip=&#60;n&#62;         skip substitution &#60;n&#62;
      substitute_stop=&#60;n&#62;         skip substitution &#60;n&#62; and following
      substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
      substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY
</pre>
These modifiers may not appear in a <b>#pattern</b> command. If you want them as
defaults, set them in a <b>#subject</b> command.
</P>
<br><b>
Specifying literal subject lines
</b><br>
<P>
If the <b>subject_literal</b> modifier is present on a pattern, all the subject
lines that are matched against it are taken as literal strings, with no
interpretation of backslashes. It is not possible to set subject modifiers on
such lines, but any that are set as defaults by a <b>#subject</b> command are
recognized.
</P>
<br><b>
Saving a compiled pattern
</b><br>
<P>
When a pattern with the <b>push</b> modifier is successfully compiled, it is
pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
line to contain a new pattern (or a command) instead of a subject line. This
facility is used when saving compiled patterns to a file, as described in the
section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled
pattern is stacked, leaving the original as current, ready to match the
following input lines. This provides a way of testing the
<b>pcre2_code_copy()</b> function.
The <b>push</b> and <b>pushcopy</b> modifiers are incompatible with pattern
modifiers such as <b>global</b> that act at match time. Any that are specified
are ignored (for the stacked copy), with a warning message, except for
<b>replace</b>, which causes an error. Note that <b>jitverify</b>, which is
allowed, does not carry through to any subsequent matching that uses a stacked
pattern.
</P>
<br><b>
Testing foreign pattern conversion
</b><br>
<P>
The experimental foreign pattern conversion functions in PCRE2 can be tested by
setting the <b>convert</b> modifier. Its argument is a colon-separated list of
options, which set the equivalent option for the <b>pcre2_pattern_convert()</b>
function:
<pre>
  glob                    PCRE2_CONVERT_GLOB
  glob_no_starstar        PCRE2_CONVERT_GLOB_NO_STARSTAR
  glob_no_wild_separator  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
  posix_basic             PCRE2_CONVERT_POSIX_BASIC
  posix_extended          PCRE2_CONVERT_POSIX_EXTENDED
  unset                   Unset all options
</pre>
The "unset" value is useful for turning off a default that has been set by a
<b>#pattern</b> command. When one of these options is set, the input pattern is
passed to <b>pcre2_pattern_convert()</b>. If the conversion is successful, the
result is reflected in the output and then passed to <b>pcre2_compile()</b>. The
normal <b>utf</b> and <b>no_utf_check</b> options, if set, cause the
PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be passed to
<b>pcre2_pattern_convert()</b>.
</P>
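<P>
The colon-separated <b>convert</b> list can be pictured as a set of option bits
accumulated item by item, as in this C sketch. The CONV_* constants and
<b>parse_convert()</b> are hypothetical: the constants are independent bits
chosen for illustration (the real PCRE2_CONVERT_* values are defined in
<b>pcre2.h</b> and differ), and treating "unset" as clearing everything
collected so far is an assumption.
<pre>
#include &#60;stdint.h&#62;
#include &#60;string.h&#62;

/* Hypothetical, independent option bits. */
enum {
  CONV_GLOB                   = 1 &#60;&#60; 0,
  CONV_GLOB_NO_STARSTAR       = 1 &#60;&#60; 1,
  CONV_GLOB_NO_WILD_SEPARATOR = 1 &#60;&#60; 2,
  CONV_POSIX_BASIC            = 1 &#60;&#60; 3,
  CONV_POSIX_EXTENDED         = 1 &#60;&#60; 4
};

static const struct { const char *name; uint32_t bit; } conv_names[] = {
  { "glob",                   CONV_GLOB },
  { "glob_no_starstar",       CONV_GLOB_NO_STARSTAR },
  { "glob_no_wild_separator", CONV_GLOB_NO_WILD_SEPARATOR },
  { "posix_basic",            CONV_POSIX_BASIC },
  { "posix_extended",         CONV_POSIX_EXTENDED }
};

#define N_CONV_NAMES (sizeof conv_names / sizeof conv_names[0])

/* Parse a list such as "glob:glob_no_starstar" into *options.
   Returns 0 on success, or -1 if an item is not recognized. */
int parse_convert(const char *list, uint32_t *options)
{
  uint32_t opts = 0;
  const char *p = list;
  for (;;) {
    size_t n = strcspn(p, ":");            /* length of this item */
    if (n == 5 &#38;&#38; strncmp(p, "unset", 5) == 0) {
      opts = 0;                            /* assumed: clear all so far */
    } else {
      size_t i = 0;
      while (i &#60; N_CONV_NAMES &#38;&#38;
             !(strlen(conv_names[i].name) == n &#38;&#38;
               strncmp(p, conv_names[i].name, n) == 0))
        i++;
      if (i == N_CONV_NAMES) return -1;
      opts |= conv_names[i].bit;
    }
    if (p[n] == '\0') break;               /* last item */
    p += n + 1;                            /* skip the ':' */
  }
  *options = opts;
  return 0;
}
</pre>
</P>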
<P>
By default, the conversion function is allowed to allocate a buffer for its
output. However, if the <b>convert_length</b> modifier is set to a value greater
than zero, <b>pcre2test</b> passes a buffer of the given length. This makes it
possible to test the length check.
</P>
<P>
The <b>convert_glob_escape</b> and <b>convert_glob_separator</b> modifiers can be
used to specify the escape and separator characters for glob processing,
overriding the defaults, which are operating-system dependent.
<a name="subjectmodifiers"></a></P>
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
<P>
The modifiers that can appear in subject lines and the <b>#subject</b>
command are of two types.
</P>
<br><b>
Setting match options
</b><br>
<P>
The following modifiers set options for <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b>. See
<a href="pcre2api.html"><b>pcre2api</b></a>
for a description of their effects.
<pre>
      anchored                   set PCRE2_ANCHORED
      endanchored                set PCRE2_ENDANCHORED
      dfa_restart                set PCRE2_DFA_RESTART
      dfa_shortest               set PCRE2_DFA_SHORTEST
      disable_recurseloop_check  set PCRE2_DISABLE_RECURSELOOP_CHECK
      no_jit                     set PCRE2_NO_JIT
      no_utf_check               set PCRE2_NO_UTF_CHECK
      notbol                     set PCRE2_NOTBOL
      notempty                   set PCRE2_NOTEMPTY
      notempty_atstart           set PCRE2_NOTEMPTY_ATSTART
      noteol                     set PCRE2_NOTEOL
      partial_hard (or ph)       set PCRE2_PARTIAL_HARD
      partial_soft (or ps)       set PCRE2_PARTIAL_SOFT
</pre>
The partial matching modifiers are provided with abbreviations because they
appear frequently in tests.
</P>
<P>
If the <b>posix</b> or <b>posix_nosub</b> modifier was present on the pattern,
causing the POSIX wrapper API to be used, the only option-setting modifiers
that have any effect are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>,
causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
<b>regexec()</b>. The other modifiers are ignored, with a warning message.
</P>
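<P>
The effect of <b>notbol</b> and <b>noteol</b> can be seen with the standard
POSIX REG_NOTBOL and REG_NOTEOL flags. This C sketch uses the system C
library's regex functions rather than PCRE2's wrapper, so REG_NOTEMPTY, which
is a PCRE2 extension, is not shown; <b>posix_matches()</b> is a hypothetical
name.
<pre>
#include &#60;regex.h&#62;
#include &#60;stddef.h&#62;

/* Returns 1 if pattern matches subject when regexec() is given eflags,
   0 if it does not, and -1 if the pattern does not compile. */
int posix_matches(const char *pattern, const char *subject, int eflags)
{
  regex_t re;
  if (regcomp(&#38;re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec(&#38;re, subject, 0, NULL, eflags);
  regfree(&#38;re);
  return rc == 0;
}
</pre>
For instance, posix_matches("^abc", "abc", REG_NOTBOL) returns 0, because
REG_NOTBOL stops "^" from matching at the start of the subject.
</P>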
<P>
There is one additional modifier that can be used with the POSIX wrapper. It is
ignored (with a warning) if used for non-POSIX matching.
<pre>
      posix_startend=&#60;n&#62;[:&#60;m&#62;]
</pre>
This causes the subject string to be passed to <b>regexec()</b> using the
REG_STARTEND option, which uses offsets to specify which part of the string is
searched. If only one number is given, the end offset is passed as the end of
the subject string. For more details of REG_STARTEND, see the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation. If the subject string contains binary zeros (coded as escapes
such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
its input), you must use <b>posix_startend</b> to specify its length.
</P>
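<P>
The value of <b>posix_startend</b> is a start offset and an optional end
offset. The C sketch below, using the hypothetical name
<b>parse_startend()</b>, shows one way of turning it into the two offsets; its
range checks are assumptions rather than a description of what
<b>pcre2test</b> does with out-of-range values.
<pre>
#include &#60;ctype.h&#62;
#include &#60;stdlib.h&#62;

/* Hypothetical parser for the value of posix_startend=&#60;n&#62;[:&#60;m&#62;].
   Stores the offsets and returns 0, or returns -1 if the value is
   malformed. Without :&#60;m&#62;, the end offset is the subject length. */
int parse_startend(const char *value, size_t subject_length,
                   size_t *start, size_t *end)
{
  char *rest;
  if (!isdigit((unsigned char)value[0]))
    return -1;
  unsigned long n = strtoul(value, &#38;rest, 10);
  unsigned long m = subject_length;
  if (*rest == ':') {
    if (!isdigit((unsigned char)rest[1]))
      return -1;
    m = strtoul(rest + 1, &#38;rest, 10);
  }
  /* Assumed checks: nothing trailing, start not after end, end in range. */
  if (*rest != '\0' || n &#62; m || m &#62; subject_length)
    return -1;
  *start = n;
  *end = m;
  return 0;
}
</pre>
With REG_STARTEND, the resulting offsets are supplied to <b>regexec()</b> in
pmatch[0].rm_so and pmatch[0].rm_eo.
</P>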
<br><b>
Setting match controls
</b><br>
<P>
The following modifiers affect the matching process or request additional
information. Some of them may also be specified on a pattern line (see above),
in which case they apply to every subject line that is matched against that
pattern, but can be overridden by modifiers on the subject.
<pre>
      aftertext                  show text after match
      allaftertext               show text after captures
      allcaptures                show all captures
      allvector                  show the entire ovector
      allusedtext                show all consulted text (non-JIT only)
      altglobal                  alternative global matching
      callout_capture            show captures at callout time
      callout_data=&#60;n&#62;           set a value to pass via callouts
      callout_error=&#60;n&#62;[:&#60;m&#62;]    control callout error
      callout_extra              show extra callout information
      callout_fail=&#60;n&#62;[:&#60;m&#62;]     control callout failure
      callout_no_where           do not show position of a callout
      callout_none               do not supply a callout function
      copy=&#60;number or name&#62;      copy captured substring
      depth_limit=&#60;n&#62;            set a depth limit
      dfa                        use <b>pcre2_dfa_match()</b>
      find_limits                find heap, match and depth limits
      find_limits_noheap         find match and depth limits
      get=&#60;number or name&#62;       extract captured substring
      getall                     extract all captured substrings
  /g  global                     global matching
      heapframes_size            show match data heapframes size
      heap_limit=&#60;n&#62;             set a limit on heap memory (Kbytes)
      jitstack=&#60;n&#62;               set size of JIT stack
      mark                       show mark values
      match_limit=&#60;n&#62;            set a match limit
      memory                     show heap memory usage
      null_context               match with a NULL context
      null_replacement           substitute with NULL replacement
      null_subject               match with NULL subject
      offset=&#60;n&#62;                 set starting offset
      offset_limit=&#60;n&#62;           set offset limit
      ovector=&#60;n&#62;                set size of output vector
      recursion_limit=&#60;n&#62;        obsolete synonym for depth_limit
      replace=&#60;string&#62;           specify a replacement string
      startchar                  show startchar when relevant
      startoffset=&#60;n&#62;            same as offset=&#60;n&#62;
      substitute_callout         use substitution callouts
      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
      substitute_literal         use PCRE2_SUBSTITUTE_LITERAL
      substitute_matched         use PCRE2_SUBSTITUTE_MATCHED
      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
      substitute_skip=&#60;n&#62;        skip substitution number n
      substitute_stop=&#60;n&#62;        skip substitution number n and greater
      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
      zero_terminate             pass the subject as zero-terminated
</pre>
The effects of these modifiers are described in the following sections. When
matching via the POSIX wrapper API, the <b>aftertext</b>, <b>allaftertext</b>,
and <b>ovector</b> subject modifiers work as described below. All other
modifiers are either ignored, with a warning message, or cause an error.
</P>
<br><b>
Showing more text
</b><br>
<P>
The <b>aftertext</b> modifier requests that, as well as outputting the part of
the subject string that matched the entire pattern, <b>pcre2test</b> should also
output the remainder of the subject string. This is useful for tests where the
subject contains multiple copies of the same substring. The
<b>allaftertext</b> modifier requests the same action for captured substrings as
well as the main matched substring. In each case the remainder is output on the
following line with a plus character following the capture number.
</P>
<P>
The <b>allusedtext</b> modifier requests that all the text consulted by the
interpreter during a successful pattern match be shown, for both full and
partial matches. This feature is not supported for JIT matching; if it is
requested with JIT, it is ignored (with a warning message). Setting this
modifier affects the output if there is a lookbehind at the start of a match,
or, for a complete match, a lookahead at the end, or if \K is used in the
pattern. Characters that precede or follow the start and end of the actual
match are indicated in the output by '&#60;' or '&#62;' characters underneath them.
Here is an example:
<pre>
    re&#62; /(?&#60;=pqr)abc(?=xyz)/
  data&#62; 123pqrabcxyz456\=allusedtext
   0: pqrabcxyz
      &#60;&#60;&#60;   &#62;&#62;&#62;
  data&#62; 123pqrabcxy\=ph,allusedtext
  Partial match: pqrabcxy
                 &#60;&#60;&#60;
</pre>
The first, complete match shows that the matched string is "abc", with the
preceding and following strings "pqr" and "xyz" having been consulted during
the match (when processing the assertions). The partial match can indicate only
the preceding string.
</P>
1360*22dc650dSSadaf Ebrahimi</P>
1361*22dc650dSSadaf Ebrahimi<P>
1362*22dc650dSSadaf EbrahimiThe <b>startchar</b> modifier requests that the starting character for the match
1363*22dc650dSSadaf Ebrahimibe indicated, if it is different to the start of the matched string. The only
1364*22dc650dSSadaf Ebrahimitime when this occurs is when \K has been processed as part of the match. In
1365*22dc650dSSadaf Ebrahimithis situation, the output for the matched string is displayed from the
1366*22dc650dSSadaf Ebrahimistarting character instead of from the match point, with circumflex characters
1367*22dc650dSSadaf Ebrahimiunder the earlier characters. For example:
1368*22dc650dSSadaf Ebrahimi<pre>
1369*22dc650dSSadaf Ebrahimi    re&#62; /abc\Kxyz/
1370*22dc650dSSadaf Ebrahimi  data&#62; abcxyz\=startchar
1371*22dc650dSSadaf Ebrahimi   0: abcxyz
1372*22dc650dSSadaf Ebrahimi      ^^^
1373*22dc650dSSadaf Ebrahimi</pre>
1374*22dc650dSSadaf EbrahimiUnlike <b>allusedtext</b>, the <b>startchar</b> modifier can be used with JIT.
1375*22dc650dSSadaf EbrahimiHowever, these two modifiers are mutually exclusive.
1376*22dc650dSSadaf Ebrahimi</P>
1377*22dc650dSSadaf Ebrahimi<br><b>
1378*22dc650dSSadaf EbrahimiShowing the value of all capture groups
1379*22dc650dSSadaf Ebrahimi</b><br>
1380*22dc650dSSadaf Ebrahimi<P>
1381*22dc650dSSadaf EbrahimiThe <b>allcaptures</b> modifier requests that the values of all potential
1382*22dc650dSSadaf Ebrahimicaptured parentheses be output after a match. By default, only those up to the
1383*22dc650dSSadaf Ebrahimihighest one actually used in the match are output (corresponding to the return
1384*22dc650dSSadaf Ebrahimicode from <b>pcre2_match()</b>). Groups that did not take part in the match
1385*22dc650dSSadaf Ebrahimiare output as "&#60;unset&#62;". This modifier is not relevant for DFA matching (which
1386*22dc650dSSadaf Ebrahimidoes no capturing) and does not apply when <b>replace</b> is specified; it is
1387*22dc650dSSadaf Ebrahimiignored, with a warning message, if present.
1388*22dc650dSSadaf Ebrahimi</P>
1389*22dc650dSSadaf Ebrahimi<br><b>
1390*22dc650dSSadaf EbrahimiShowing the entire ovector, for all outcomes
1391*22dc650dSSadaf Ebrahimi</b><br>
1392*22dc650dSSadaf Ebrahimi<P>
1393*22dc650dSSadaf EbrahimiThe <b>allvector</b> modifier requests that the entire ovector be shown,
1394*22dc650dSSadaf Ebrahimiwhatever the outcome of the match. Compare <b>allcaptures</b>, which shows only
1395*22dc650dSSadaf Ebrahimiup to the maximum number of capture groups for the pattern, and then only for a
1396*22dc650dSSadaf Ebrahimisuccessful complete non-DFA match. This modifier, which acts after any match
1397*22dc650dSSadaf Ebrahimiresult, and also for DFA matching, provides a means of checking that there are
1398*22dc650dSSadaf Ebrahimino unexpected modifications to ovector fields. Before each match attempt, the
1399*22dc650dSSadaf Ebrahimiovector is filled with a special value, and if this is found in both elements
1400*22dc650dSSadaf Ebrahimiof a capturing pair, "&#60;unchanged&#62;" is output. After a successful match, this
1401*22dc650dSSadaf Ebrahimiapplies to all groups after the maximum capture group for the pattern. In other
1402*22dc650dSSadaf Ebrahimicases it applies to the entire ovector. After a partial match, the first two
1403*22dc650dSSadaf Ebrahimielements are the only ones that should be set. After a DFA match, the amount of
1404*22dc650dSSadaf Ebrahimiovector that is used depends on the number of matches that were found.
1405*22dc650dSSadaf Ebrahimi</P>
1406*22dc650dSSadaf Ebrahimi<br><b>
1407*22dc650dSSadaf EbrahimiTesting pattern callouts
1408*22dc650dSSadaf Ebrahimi</b><br>
1409*22dc650dSSadaf Ebrahimi<P>
1410*22dc650dSSadaf EbrahimiA callout function is supplied when <b>pcre2test</b> calls the library matching
1411*22dc650dSSadaf Ebrahimifunctions, unless <b>callout_none</b> is specified. Its behaviour can be
1412*22dc650dSSadaf Ebrahimicontrolled by various modifiers listed above whose names begin with
1413*22dc650dSSadaf Ebrahimi<b>callout_</b>. Details are given in the section entitled "Callouts"
1414*22dc650dSSadaf Ebrahimi<a href="#callouts">below.</a>
1415*22dc650dSSadaf EbrahimiTesting callouts from <b>pcre2_substitute()</b> is described separately in
1416*22dc650dSSadaf Ebrahimi"Testing the substitution function"
1417*22dc650dSSadaf Ebrahimi<a href="#substitution">below.</a>
1418*22dc650dSSadaf Ebrahimi</P>
1419*22dc650dSSadaf Ebrahimi<br><b>
1420*22dc650dSSadaf EbrahimiFinding all matches in a string
1421*22dc650dSSadaf Ebrahimi</b><br>
1422*22dc650dSSadaf Ebrahimi<P>
1423*22dc650dSSadaf EbrahimiSearching for all possible matches within a subject can be requested by the
1424*22dc650dSSadaf Ebrahimi<b>global</b> or <b>altglobal</b> modifier. After finding a match, the matching
1425*22dc650dSSadaf Ebrahimifunction is called again to search the remainder of the subject. The difference
1426*22dc650dSSadaf Ebrahimibetween <b>global</b> and <b>altglobal</b> is that the former uses the
1427*22dc650dSSadaf Ebrahimi<i>start_offset</i> argument to <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
1428*22dc650dSSadaf Ebrahimito start searching at a new point within the entire string (which is what Perl
1429*22dc650dSSadaf Ebrahimidoes), whereas the latter passes over a shortened subject. This makes a
1430*22dc650dSSadaf Ebrahimidifference to the matching process if the pattern begins with a lookbehind
1431*22dc650dSSadaf Ebrahimiassertion (including \b or \B).
1432*22dc650dSSadaf Ebrahimi</P>
1433*22dc650dSSadaf Ebrahimi<P>
1434*22dc650dSSadaf EbrahimiIf an empty string is matched, the next match is done with the
1435*22dc650dSSadaf EbrahimiPCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
1436*22dc650dSSadaf Ebrahimianother, non-empty, match at the same point in the subject. If this match
1437*22dc650dSSadaf Ebrahimifails, the start offset is advanced, and the normal match is retried. This
1438*22dc650dSSadaf Ebrahimiimitates the way Perl handles such cases when using the <b>/g</b> modifier or
1439*22dc650dSSadaf Ebrahimithe <b>split()</b> function. Normally, the start offset is advanced by one
1440*22dc650dSSadaf Ebrahimicharacter, but if the newline convention recognizes CRLF as a newline, and the
1441*22dc650dSSadaf Ebrahimicurrent character is CR followed by LF, an advance of two characters occurs.
1442*22dc650dSSadaf Ebrahimi</P>
1443*22dc650dSSadaf Ebrahimi<br><b>
1444*22dc650dSSadaf EbrahimiTesting substring extraction functions
1445*22dc650dSSadaf Ebrahimi</b><br>
1446*22dc650dSSadaf Ebrahimi<P>
1447*22dc650dSSadaf EbrahimiThe <b>copy</b> and <b>get</b> modifiers can be used to test the
1448*22dc650dSSadaf Ebrahimi<b>pcre2_substring_copy_xxx()</b> and <b>pcre2_substring_get_xxx()</b> functions.
1449*22dc650dSSadaf EbrahimiThey can be given more than once, and each can specify a capture group name or
1450*22dc650dSSadaf Ebrahiminumber, for example:
1451*22dc650dSSadaf Ebrahimi<pre>
1452*22dc650dSSadaf Ebrahimi   abcd\=copy=1,copy=3,get=G1
1453*22dc650dSSadaf Ebrahimi</pre>
1454*22dc650dSSadaf EbrahimiIf the <b>#subject</b> command is used to set default copy and/or get lists,
1455*22dc650dSSadaf Ebrahimithese can be unset by specifying a negative number to cancel all numbered
1456*22dc650dSSadaf Ebrahimigroups and an empty name to cancel all named groups.
1457*22dc650dSSadaf Ebrahimi</P>
1458*22dc650dSSadaf Ebrahimi<P>
1459*22dc650dSSadaf EbrahimiThe <b>getall</b> modifier tests <b>pcre2_substring_list_get()</b>, which
1460*22dc650dSSadaf Ebrahimiextracts all captured substrings.
1461*22dc650dSSadaf Ebrahimi</P>
1462*22dc650dSSadaf Ebrahimi<P>
1463*22dc650dSSadaf EbrahimiIf the subject line is successfully matched, the substrings extracted by the
1464*22dc650dSSadaf Ebrahimiconvenience functions are output with C, G, or L after the string number
1465*22dc650dSSadaf Ebrahimiinstead of a colon. This is in addition to the normal full list. The string
1466*22dc650dSSadaf Ebrahimilength (that is, the return from the extraction function) is given in
1467*22dc650dSSadaf Ebrahimiparentheses after each substring, followed by the name when the extraction was
1468*22dc650dSSadaf Ebrahimiby name.
1469*22dc650dSSadaf Ebrahimi<a name="substitution"></a></P>
1470*22dc650dSSadaf Ebrahimi<br><b>
1471*22dc650dSSadaf EbrahimiTesting the substitution function
1472*22dc650dSSadaf Ebrahimi</b><br>
1473*22dc650dSSadaf Ebrahimi<P>
1474*22dc650dSSadaf EbrahimiIf the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
1475*22dc650dSSadaf Ebrahimicalled instead of one of the matching functions (or after one call of
1476*22dc650dSSadaf Ebrahimi<b>pcre2_match()</b> in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
1477*22dc650dSSadaf Ebrahimireplacement strings cannot contain commas, because a comma signifies the end of
1478*22dc650dSSadaf Ebrahimia modifier. This is not thought to be an issue in a test program.
1479*22dc650dSSadaf Ebrahimi</P>
1480*22dc650dSSadaf Ebrahimi<P>
1481*22dc650dSSadaf EbrahimiSpecifying a completely empty replacement string disables this modifier.
1482*22dc650dSSadaf EbrahimiHowever, it is possible to specify an empty replacement by providing a buffer
1483*22dc650dSSadaf Ebrahimilength, as described below, for an otherwise empty replacement.
1484*22dc650dSSadaf Ebrahimi</P>
1485*22dc650dSSadaf Ebrahimi<P>
1486*22dc650dSSadaf EbrahimiUnlike subject strings, <b>pcre2test</b> does not process replacement strings
1487*22dc650dSSadaf Ebrahimifor escape sequences. In UTF mode, a replacement string is checked to see if it
1488*22dc650dSSadaf Ebrahimiis a valid UTF-8 string. If so, it is correctly converted to a UTF string of
1489*22dc650dSSadaf Ebrahimithe appropriate code unit width. If it is not a valid UTF-8 string, the
1490*22dc650dSSadaf Ebrahimiindividual code units are copied directly. This provides a means of passing an
1491*22dc650dSSadaf Ebrahimiinvalid UTF-8 string for testing purposes.
1492*22dc650dSSadaf Ebrahimi</P>
1493*22dc650dSSadaf Ebrahimi<P>
1494*22dc650dSSadaf EbrahimiThe following modifiers set options (in additional to the normal match options)
1495*22dc650dSSadaf Ebrahimifor <b>pcre2_substitute()</b>:
1496*22dc650dSSadaf Ebrahimi<pre>
1497*22dc650dSSadaf Ebrahimi  global                      PCRE2_SUBSTITUTE_GLOBAL
1498*22dc650dSSadaf Ebrahimi  substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
1499*22dc650dSSadaf Ebrahimi  substitute_literal          PCRE2_SUBSTITUTE_LITERAL
1500*22dc650dSSadaf Ebrahimi  substitute_matched          PCRE2_SUBSTITUTE_MATCHED
1501*22dc650dSSadaf Ebrahimi  substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
1502*22dc650dSSadaf Ebrahimi  substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
1503*22dc650dSSadaf Ebrahimi  substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
1504*22dc650dSSadaf Ebrahimi  substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
1505*22dc650dSSadaf Ebrahimi</pre>
1506*22dc650dSSadaf EbrahimiSee the
1507*22dc650dSSadaf Ebrahimi<a href="pcre2api.html"><b>pcre2api</b></a>
1508*22dc650dSSadaf Ebrahimidocumentation for details of these options.
1509*22dc650dSSadaf Ebrahimi</P>
1510*22dc650dSSadaf Ebrahimi<P>
1511*22dc650dSSadaf EbrahimiAfter a successful substitution, the modified string is output, preceded by the
1512*22dc650dSSadaf Ebrahiminumber of replacements. This may be zero if there were no matches. Here is a
1513*22dc650dSSadaf Ebrahimisimple example of a substitution test:
1514*22dc650dSSadaf Ebrahimi<pre>
1515*22dc650dSSadaf Ebrahimi  /abc/replace=xxx
1516*22dc650dSSadaf Ebrahimi      =abc=abc=
1517*22dc650dSSadaf Ebrahimi   1: =xxx=abc=
1518*22dc650dSSadaf Ebrahimi      =abc=abc=\=global
1519*22dc650dSSadaf Ebrahimi   2: =xxx=xxx=
1520*22dc650dSSadaf Ebrahimi</pre>
1521*22dc650dSSadaf EbrahimiSubject and replacement strings should be kept relatively short (fewer than 256
1522*22dc650dSSadaf Ebrahimicharacters) for substitution tests, as fixed-size buffers are used. To make it
1523*22dc650dSSadaf Ebrahimieasy to test for buffer overflow, if the replacement string starts with a
1524*22dc650dSSadaf Ebrahiminumber in square brackets, that number is passed to <b>pcre2_substitute()</b> as
1525*22dc650dSSadaf Ebrahimithe size of the output buffer, with the replacement string starting at the next
1526*22dc650dSSadaf Ebrahimicharacter. Here is an example that tests the edge case:
1527*22dc650dSSadaf Ebrahimi<pre>
1528*22dc650dSSadaf Ebrahimi  /abc/
1529*22dc650dSSadaf Ebrahimi      123abc123\=replace=[10]XYZ
1530*22dc650dSSadaf Ebrahimi   1: 123XYZ123
1531*22dc650dSSadaf Ebrahimi      123abc123\=replace=[9]XYZ
1532*22dc650dSSadaf Ebrahimi  Failed: error -47: no more memory
1533*22dc650dSSadaf Ebrahimi</pre>
1534*22dc650dSSadaf EbrahimiThe default action of <b>pcre2_substitute()</b> is to return
1535*22dc650dSSadaf EbrahimiPCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
1536*22dc650dSSadaf EbrahimiPCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
1537*22dc650dSSadaf Ebrahimi<b>substitute_overflow_length</b> modifier), <b>pcre2_substitute()</b> continues
1538*22dc650dSSadaf Ebrahimito go through the motions of matching and substituting (but not doing any
1539*22dc650dSSadaf Ebrahimicallouts), in order to compute the size of buffer that is required. When this
1540*22dc650dSSadaf Ebrahimihappens, <b>pcre2test</b> shows the required buffer length (which includes space
1541*22dc650dSSadaf Ebrahimifor the trailing zero) as part of the error message. For example:
1542*22dc650dSSadaf Ebrahimi<pre>
1543*22dc650dSSadaf Ebrahimi  /abc/substitute_overflow_length
1544*22dc650dSSadaf Ebrahimi      123abc123\=replace=[9]XYZ
1545*22dc650dSSadaf Ebrahimi  Failed: error -47: no more memory: 10 code units are needed
1546*22dc650dSSadaf Ebrahimi</pre>
1547*22dc650dSSadaf EbrahimiA replacement string is ignored with POSIX and DFA matching. Specifying partial
1548*22dc650dSSadaf Ebrahimimatching provokes an error return ("bad option value") from
1549*22dc650dSSadaf Ebrahimi<b>pcre2_substitute()</b>.
1550*22dc650dSSadaf Ebrahimi</P>
1551*22dc650dSSadaf Ebrahimi<br><b>
1552*22dc650dSSadaf EbrahimiTesting substitute callouts
1553*22dc650dSSadaf Ebrahimi</b><br>
1554*22dc650dSSadaf Ebrahimi<P>
1555*22dc650dSSadaf EbrahimiIf the <b>substitute_callout</b> modifier is set, a substitution callout
1556*22dc650dSSadaf Ebrahimifunction is set up. The <b>null_context</b> modifier must not be set, because
1557*22dc650dSSadaf Ebrahimithe address of the callout function is passed in a match context. When the
1558*22dc650dSSadaf Ebrahimicallout function is called (after each substitution), details of the input
1559*22dc650dSSadaf Ebrahimiand output strings are output. For example:
1560*22dc650dSSadaf Ebrahimi<pre>
1561*22dc650dSSadaf Ebrahimi  /abc/g,replace=&#60;$0&#62;,substitute_callout
1562*22dc650dSSadaf Ebrahimi      abcdefabcpqr
1563*22dc650dSSadaf Ebrahimi   1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62;"
1564*22dc650dSSadaf Ebrahimi   2(1) Old 6 9 "abc" New 8 13 "&#60;abc&#62;"
1565*22dc650dSSadaf Ebrahimi   2: &#60;abc&#62;def&#60;abc&#62;pqr
1566*22dc650dSSadaf Ebrahimi</pre>
1567*22dc650dSSadaf EbrahimiThe first number on each callout line is the count of matches. The
1568*22dc650dSSadaf Ebrahimiparenthesized number is the number of pairs that are set in the ovector (that
1569*22dc650dSSadaf Ebrahimiis, one more than the number of capturing groups that were set). Then are
1570*22dc650dSSadaf Ebrahimilisted the offsets of the old substring, its contents, and the same for the
1571*22dc650dSSadaf Ebrahimireplacement.
1572*22dc650dSSadaf Ebrahimi</P>
1573*22dc650dSSadaf Ebrahimi<P>
1574*22dc650dSSadaf EbrahimiBy default, the substitution callout function returns zero, which accepts the
1575*22dc650dSSadaf Ebrahimireplacement and causes matching to continue if /g was used. Two further
1576*22dc650dSSadaf Ebrahimimodifiers can be used to test other return values. If <b>substitute_skip</b> is
1577*22dc650dSSadaf Ebrahimiset to a value greater than zero the callout function returns +1 for the match
1578*22dc650dSSadaf Ebrahimiof that number, and similarly <b>substitute_stop</b> returns -1. These cause the
1579*22dc650dSSadaf Ebrahimireplacement to be rejected, and -1 causes no further matching to take place. If
1580*22dc650dSSadaf Ebrahimieither of them are set, <b>substitute_callout</b> is assumed. For example:
1581*22dc650dSSadaf Ebrahimi<pre>
1582*22dc650dSSadaf Ebrahimi  /abc/g,replace=&#60;$0&#62;,substitute_skip=1
1583*22dc650dSSadaf Ebrahimi      abcdefabcpqr
1584*22dc650dSSadaf Ebrahimi   1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; SKIPPED"
1585*22dc650dSSadaf Ebrahimi   2(1) Old 6 9 "abc" New 6 11 "&#60;abc&#62;"
1586*22dc650dSSadaf Ebrahimi   2: abcdef&#60;abc&#62;pqr
1587*22dc650dSSadaf Ebrahimi      abcdefabcpqr\=substitute_stop=1
1588*22dc650dSSadaf Ebrahimi   1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; STOPPED"
1589*22dc650dSSadaf Ebrahimi   1: abcdefabcpqr
1590*22dc650dSSadaf Ebrahimi</pre>
1591*22dc650dSSadaf EbrahimiIf both are set for the same number, stop takes precedence. Only a single skip
1592*22dc650dSSadaf Ebrahimior stop is supported, which is sufficient for testing that the feature works.
1593*22dc650dSSadaf Ebrahimi</P>
1594*22dc650dSSadaf Ebrahimi<br><b>
1595*22dc650dSSadaf EbrahimiSetting the JIT stack size
1596*22dc650dSSadaf Ebrahimi</b><br>
1597*22dc650dSSadaf Ebrahimi<P>
1598*22dc650dSSadaf EbrahimiThe <b>jitstack</b> modifier provides a way of setting the maximum stack size
1599*22dc650dSSadaf Ebrahimithat is used by the just-in-time optimization code. It is ignored if JIT
1600*22dc650dSSadaf Ebrahimioptimization is not being used. The value is a number of kibibytes (units of
1601*22dc650dSSadaf Ebrahimi1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
1602*22dc650dSSadaf Ebrahimithat is larger than the default is necessary only for very complicated
1603*22dc650dSSadaf Ebrahimipatterns. If <b>jitstack</b> is set non-zero on a subject line it overrides any
1604*22dc650dSSadaf Ebrahimivalue that was set on the pattern.
1605*22dc650dSSadaf Ebrahimi</P>
1606*22dc650dSSadaf Ebrahimi<br><b>
1607*22dc650dSSadaf EbrahimiSetting heap, match, and depth limits
1608*22dc650dSSadaf Ebrahimi</b><br>
1609*22dc650dSSadaf Ebrahimi<P>
1610*22dc650dSSadaf EbrahimiThe <b>heap_limit</b>, <b>match_limit</b>, and <b>depth_limit</b> modifiers set
1611*22dc650dSSadaf Ebrahimithe appropriate limits in the match context. These values are ignored when the
1612*22dc650dSSadaf Ebrahimi<b>find_limits</b> or <b>find_limits_noheap</b> modifier is specified.
1613*22dc650dSSadaf Ebrahimi</P>
1614*22dc650dSSadaf Ebrahimi<br><b>
1615*22dc650dSSadaf EbrahimiFinding minimum limits
1616*22dc650dSSadaf Ebrahimi</b><br>
1617*22dc650dSSadaf Ebrahimi<P>
1618*22dc650dSSadaf EbrahimiIf the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
1619*22dc650dSSadaf Ebrahimicalls the relevant matching function several times, setting different values in
1620*22dc650dSSadaf Ebrahimithe match context via <b>pcre2_set_heap_limit()</b>,
1621*22dc650dSSadaf Ebrahimi<b>pcre2_set_match_limit()</b>, or <b>pcre2_set_depth_limit()</b> until it finds
1622*22dc650dSSadaf Ebrahimithe smallest value for each parameter that allows the match to complete without
1623*22dc650dSSadaf Ebrahimia "limit exceeded" error. The match itself may succeed or fail. An alternative
1624*22dc650dSSadaf Ebrahimimodifier, <b>find_limits_noheap</b>, omits the heap limit. This is used in the
1625*22dc650dSSadaf Ebrahimistandard tests, because the minimum heap limit varies between systems. If JIT
1626*22dc650dSSadaf Ebrahimiis being used, only the match limit is relevant, and the other two are
1627*22dc650dSSadaf Ebrahimiautomatically omitted.
1628*22dc650dSSadaf Ebrahimi</P>
1629*22dc650dSSadaf Ebrahimi<P>
1630*22dc650dSSadaf EbrahimiWhen using this modifier, the pattern should not contain any limit settings
1631*22dc650dSSadaf Ebrahimisuch as (*LIMIT_MATCH=...) within it. If such a setting is present and is
1632*22dc650dSSadaf Ebrahimilower than the minimum matching value, the minimum value cannot be found
1633*22dc650dSSadaf Ebrahimibecause <b>pcre2_set_match_limit()</b> etc. are only able to reduce the value of
1634*22dc650dSSadaf Ebrahimian in-pattern limit; they cannot increase it.
1635*22dc650dSSadaf Ebrahimi</P>
1636*22dc650dSSadaf Ebrahimi<P>
1637*22dc650dSSadaf EbrahimiFor non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
1638*22dc650dSSadaf Ebrahimimuch nested backtracking happens (that is, how deeply the pattern's tree is
1639*22dc650dSSadaf Ebrahimisearched). In the case of DFA matching, <i>depth_limit</i> controls the depth of
1640*22dc650dSSadaf Ebrahimirecursive calls of the internal function that is used for handling pattern
1641*22dc650dSSadaf Ebrahimirecursion, lookaround assertions, and atomic groups.
1642*22dc650dSSadaf Ebrahimi</P>
1643*22dc650dSSadaf Ebrahimi<P>
1644*22dc650dSSadaf EbrahimiFor non-DFA matching, the <i>match_limit</i> number is a measure of the amount
1645*22dc650dSSadaf Ebrahimiof backtracking that takes place, and learning the minimum value can be
1646*22dc650dSSadaf Ebrahimiinstructive. For most simple matches, the number is quite small, but for
1647*22dc650dSSadaf Ebrahimipatterns with very large numbers of matching possibilities, it can become large
1648*22dc650dSSadaf Ebrahimivery quickly with increasing length of subject string. In the case of DFA
1649*22dc650dSSadaf Ebrahimimatching, <i>match_limit</i> controls the total number of calls, both recursive
1650*22dc650dSSadaf Ebrahimiand non-recursive, to the internal matching function, thus controlling the
1651*22dc650dSSadaf Ebrahimioverall amount of computing resource that is used.
1652*22dc650dSSadaf Ebrahimi</P>
1653*22dc650dSSadaf Ebrahimi<P>
1654*22dc650dSSadaf EbrahimiFor both kinds of matching, the <i>heap_limit</i> number, which is in kibibytes
1655*22dc650dSSadaf Ebrahimi(units of 1024 bytes), limits the amount of heap memory used for matching.
1656*22dc650dSSadaf Ebrahimi</P>
1657*22dc650dSSadaf Ebrahimi<br><b>
1658*22dc650dSSadaf EbrahimiShowing MARK names
1659*22dc650dSSadaf Ebrahimi</b><br>
1660*22dc650dSSadaf Ebrahimi<P>
1661*22dc650dSSadaf EbrahimiThe <b>mark</b> modifier causes the names from backtracking control verbs that
1662*22dc650dSSadaf Ebrahimiare returned from calls to <b>pcre2_match()</b> to be displayed. If a mark is
1663*22dc650dSSadaf Ebrahimireturned for a match, non-match, or partial match, <b>pcre2test</b> shows it.
1664*22dc650dSSadaf EbrahimiFor a match, it is on a line by itself, tagged with "MK:". Otherwise, it
1665*22dc650dSSadaf Ebrahimiis added to the non-match message.
1666*22dc650dSSadaf Ebrahimi</P>
1667*22dc650dSSadaf Ebrahimi<br><b>
1668*22dc650dSSadaf EbrahimiShowing memory usage
1669*22dc650dSSadaf Ebrahimi</b><br>
1670*22dc650dSSadaf Ebrahimi<P>
1671*22dc650dSSadaf EbrahimiThe <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
1672*22dc650dSSadaf Ebrahimimemory allocation and freeing calls that occur during a call to
1673*22dc650dSSadaf Ebrahimi<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>. In the latter case, heap memory
1674*22dc650dSSadaf Ebrahimiis used only when a match requires more internal workspace that the default
1675*22dc650dSSadaf Ebrahimiallocation on the stack, so in many cases there will be no output. No heap
1676*22dc650dSSadaf Ebrahimimemory is allocated during matching with JIT. For this modifier to work, the
1677*22dc650dSSadaf Ebrahimi<b>null_context</b> modifier must not be set on both the pattern and the
1678*22dc650dSSadaf Ebrahimisubject, though it can be set on one or the other.
1679*22dc650dSSadaf Ebrahimi</P>
1680*22dc650dSSadaf Ebrahimi<br><b>
1681*22dc650dSSadaf EbrahimiShowing the heap frame overall vector size
1682*22dc650dSSadaf Ebrahimi</b><br>
1683*22dc650dSSadaf Ebrahimi<P>
1684*22dc650dSSadaf EbrahimiThe <b>heapframes_size</b> modifier is relevant for matches using
1685*22dc650dSSadaf Ebrahimi<b>pcre2_match()</b> without JIT. After a match has run (whether successful or
1686*22dc650dSSadaf Ebrahiminot) the size, in bytes, of the allocated heap frames vector that is left
1687*22dc650dSSadaf Ebrahimiattached to the match data block is shown. If the matching action involved
1688*22dc650dSSadaf Ebrahimiseveral calls to <b>pcre2_match()</b> (for example, global matching or for
1689*22dc650dSSadaf Ebrahimitiming) only the final value is shown.
1690*22dc650dSSadaf Ebrahimi</P>
1691*22dc650dSSadaf Ebrahimi<P>
1692*22dc650dSSadaf EbrahimiThis modifier is ignored, with a warning, for POSIX or DFA matching. JIT
1693*22dc650dSSadaf Ebrahimimatching does not use the heap frames vector, so the size is always zero,
1694*22dc650dSSadaf Ebrahimiunless there was a previous non-JIT match. Note that specifing a size of zero
1695*22dc650dSSadaf Ebrahimifor the output vector (see below) causes <b>pcre2test</b> to free its match data
1696*22dc650dSSadaf Ebrahimiblock (and associated heap frames vector) and allocate a new one.
1697*22dc650dSSadaf Ebrahimi</P>
1698*22dc650dSSadaf Ebrahimi<br><b>
1699*22dc650dSSadaf EbrahimiSetting a starting offset
1700*22dc650dSSadaf Ebrahimi</b><br>
1701*22dc650dSSadaf Ebrahimi<P>
1702*22dc650dSSadaf EbrahimiThe <b>offset</b> modifier sets an offset in the subject string at which
1703*22dc650dSSadaf Ebrahimimatching starts. Its value is a number of code units, not characters.
1704*22dc650dSSadaf Ebrahimi</P>
1705*22dc650dSSadaf Ebrahimi<br><b>
1706*22dc650dSSadaf EbrahimiSetting an offset limit
1707*22dc650dSSadaf Ebrahimi</b><br>
1708*22dc650dSSadaf Ebrahimi<P>
1709*22dc650dSSadaf EbrahimiThe <b>offset_limit</b> modifier sets a limit for unanchored matches. If a match
1710*22dc650dSSadaf Ebrahimicannot be found starting at or before this offset in the subject, a "no match"
1711*22dc650dSSadaf Ebrahimireturn is given. The data value is a number of code units, not characters. When
1712*22dc650dSSadaf Ebrahimithis modifier is used, the <b>use_offset_limit</b> modifier must have been set
1713*22dc650dSSadaf Ebrahimifor the pattern; if not, an error is generated.
1714*22dc650dSSadaf Ebrahimi</P>
1715*22dc650dSSadaf Ebrahimi<br><b>
1716*22dc650dSSadaf EbrahimiSetting the size of the output vector
1717*22dc650dSSadaf Ebrahimi</b><br>
1718*22dc650dSSadaf Ebrahimi<P>
1719*22dc650dSSadaf EbrahimiThe <b>ovector</b> modifier applies only to the subject line in which it
1720*22dc650dSSadaf Ebrahimiappears, though of course it can also be used to set a default in a
1721*22dc650dSSadaf Ebrahimi<b>#subject</b> command. It specifies the number of pairs of offsets that are
1722*22dc650dSSadaf Ebrahimiavailable for storing matching information. The default is 15.
1723*22dc650dSSadaf Ebrahimi</P>
1724*22dc650dSSadaf Ebrahimi<P>
1725*22dc650dSSadaf EbrahimiA value of zero is useful when testing the POSIX API because it causes
1726*22dc650dSSadaf Ebrahimi<b>regexec()</b> to be called with a NULL capture vector. When not testing the
1727*22dc650dSSadaf EbrahimiPOSIX API, a value of zero is used to cause
1728*22dc650dSSadaf Ebrahimi<b>pcre2_match_data_create_from_pattern()</b> to be called, in order to create a
1729*22dc650dSSadaf Ebrahiminew match block of exactly the right size for the pattern. (It is not possible
1730*22dc650dSSadaf Ebrahimito create a match block with a zero-length ovector; there is always at least
1731*22dc650dSSadaf Ebrahimione pair of offsets.) The old match data block is freed.
1732*22dc650dSSadaf Ebrahimi</P>
1733*22dc650dSSadaf Ebrahimi<br><b>
1734*22dc650dSSadaf EbrahimiPassing the subject as zero-terminated
1735*22dc650dSSadaf Ebrahimi</b><br>
1736*22dc650dSSadaf Ebrahimi<P>
1737*22dc650dSSadaf EbrahimiBy default, the subject string is passed to a native API matching function with
1738*22dc650dSSadaf Ebrahimiits correct length. In order to test the facility for passing a zero-terminated
1739*22dc650dSSadaf Ebrahimistring, the <b>zero_terminate</b> modifier is provided. It causes the length to
1740*22dc650dSSadaf Ebrahimibe passed as PCRE2_ZERO_TERMINATED. When matching via the POSIX interface,
1741*22dc650dSSadaf Ebrahimithis modifier is ignored, with a warning.
1742*22dc650dSSadaf Ebrahimi</P>
1743*22dc650dSSadaf Ebrahimi<P>
1744*22dc650dSSadaf EbrahimiWhen testing <b>pcre2_substitute()</b>, this modifier also has the effect of
1745*22dc650dSSadaf Ebrahimipassing the replacement string as zero-terminated.
1746*22dc650dSSadaf Ebrahimi</P>
1747*22dc650dSSadaf Ebrahimi<br><b>
1748*22dc650dSSadaf EbrahimiPassing a NULL context, subject, or replacement
1749*22dc650dSSadaf Ebrahimi</b><br>
1750*22dc650dSSadaf Ebrahimi<P>
1751*22dc650dSSadaf EbrahimiNormally, <b>pcre2test</b> passes a context block to <b>pcre2_match()</b>,
1752*22dc650dSSadaf Ebrahimi<b>pcre2_dfa_match()</b>, <b>pcre2_jit_match()</b> or <b>pcre2_substitute()</b>.
1753*22dc650dSSadaf EbrahimiIf the <b>null_context</b> modifier is set, however, NULL is passed. This is for
1754*22dc650dSSadaf Ebrahimitesting that the matching and substitution functions behave correctly in this
1755*22dc650dSSadaf Ebrahimicase (they use default values). This modifier cannot be used with the
1756*22dc650dSSadaf Ebrahimi<b>find_limits</b>, <b>find_limits_noheap</b>, or <b>substitute_callout</b>
1757*22dc650dSSadaf Ebrahimimodifiers.
1758*22dc650dSSadaf Ebrahimi</P>
1759*22dc650dSSadaf Ebrahimi<P>
1760*22dc650dSSadaf EbrahimiSimilarly, for testing purposes, if the <b>null_subject</b> or
1761*22dc650dSSadaf Ebrahimi<b>null_replacement</b> modifier is set, the subject or replacement string
1762*22dc650dSSadaf Ebrahimipointers are passed as NULL, respectively, to the relevant functions.
1763*22dc650dSSadaf Ebrahimi</P>
<br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
By default, <b>pcre2test</b> uses the standard PCRE2 matching function,
<b>pcre2_match()</b>, to match each subject line. PCRE2 also supports an
alternative matching function, <b>pcre2_dfa_match()</b>, which operates in a
different way and has some restrictions. The differences between the two
functions are described in the
<a href="pcre2matching.html"><b>pcre2matching</b></a>
documentation.
</P>
<P>
If the <b>dfa</b> modifier is set, the alternative matching function is used.
This function finds all possible matches at a given point in the subject. If,
however, the <b>dfa_shortest</b> modifier is set, processing stops after the
first match is found, which is always the shortest possible match.
</P>
<br><a name="SEC13" href="#TOC1">DEFAULT OUTPUT FROM pcre2test</a><br>
<P>
This section describes the output when the normal matching function,
<b>pcre2_match()</b>, is being used.
</P>
<P>
When a match succeeds, <b>pcre2test</b> outputs the list of captured substrings,
starting with number 0 for the string that matched the whole pattern.
Otherwise, it outputs "No match" when the return is PCRE2_ERROR_NOMATCH, or
"Partial match:" followed by the partially matching substring when the
return is PCRE2_ERROR_PARTIAL. (Note that this is the
entire substring that was inspected during the partial match; it may include
characters before the actual match start if a lookbehind assertion, \K, \b,
or \B was involved.)
</P>
<P>
For any other return, <b>pcre2test</b> outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string check, the
code unit offset of the start of the failing character is also output. Here is
an example of an interactive <b>pcre2test</b> run:
<pre>
  $ pcre2test
  PCRE2 version 10.22 2016-07-29

    re&#62; /^abc(\d+)/
  data&#62; abc123
   0: abc123
   1: 123
  data&#62; xyz
  No match
</pre>
Unset capturing substrings that are not followed by a set one are not shown by
<b>pcre2test</b> unless the <b>allcaptures</b> modifier is specified. In the
following example, there are two capturing substrings, but when the first data
line is matched, the second, unset substring is not shown. An "internal" unset
substring (one that is followed by a set substring) is shown as
"&#60;unset&#62;", as for the second data line.
<pre>
    re&#62; /(a)|(b)/
  data&#62; a
   0: a
   1: a
  data&#62; b
   0: b
   1: &#60;unset&#62;
   2: b
</pre>
If the strings contain any non-printing characters, they are output as \xhh
escapes if the value is less than 256 and UTF mode is not set. Otherwise, they
are output as \x{hh...} escapes. See below for the definition of non-printing
characters. If the <b>aftertext</b> modifier is set, the output for substring 0
is followed by the rest of the subject string, identified by "0+", like this:
<pre>
    re&#62; /cat/aftertext
  data&#62; cataract
   0: cat
   0+ aract
</pre>
If global matching is requested, the results of successive matching attempts
are output in sequence, like this:
<pre>
    re&#62; /\Bi(\w\w)/g
  data&#62; Mississippi
   0: iss
   1: ss
   0: iss
   1: ss
   0: ipp
   1: pp
</pre>
"No match" is output only if the first match attempt fails. Here is an example
of a failure message (the offset 4 that is specified by the <b>offset</b>
modifier is past the end of the subject string):
<pre>
    re&#62; /xyz/
  data&#62; xyz\=offset=4
  Error -24 (bad offset value)
</PRE>
</P>
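<P>
The rules above for which capturing substrings are listed can be modelled in a
few lines of C. This is a minimal sketch written for this page, not code taken
from <b>pcre2test</b>. The function name <b>format_captures()</b> and the
<b>SKETCH_UNSET</b> marker (standing in for PCRE2_UNSET) are hypothetical, the
ovector is passed as a plain array of offset pairs, and non-printing characters
are not escaped. The <b>rc</b> argument is the value that <b>pcre2_match()</b>
returns for a successful match, which is one more than the number of the
highest-numbered capturing group that was set.
</P>

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for PCRE2_UNSET, which marks an unset offset pair. */
#define SKETCH_UNSET SIZE_MAX

/* Write the lines that show a successful match into out.
   ovector:       (start, end) offset pairs, group 0 first
   rc:            match return value (1 + highest group number that was set)
   capture_count: number of capturing groups in the pattern
   allcaptures:   nonzero to list trailing unset groups as well
   aftertext:     nonzero to add a "0+" line with the rest of the subject */
static void format_captures(const char *subject, const size_t *ovector,
                            int rc, int capture_count, int allcaptures,
                            int aftertext, char *out, size_t outsize)
{
  size_t used = 0;
  int last = allcaptures ? capture_count : rc - 1;

  assert(out != NULL && outsize > 0);
  out[0] = '\0';
  for (int i = 0; i <= last; i++) {
    size_t start = ovector[2 * i], end = ovector[2 * i + 1];
    int n;

    if (start == SKETCH_UNSET)              /* unset group inside the list */
      n = snprintf(out + used, outsize - used, "%2d: <unset>\n", i);
    else
      n = snprintf(out + used, outsize - used, "%2d: %.*s\n", i,
                   (int)(end - start), subject + start);
    if (n < 0 || (size_t)n >= outsize - used) return;   /* out of space */
    used += (size_t)n;

    if (i == 0 && aftertext) {              /* rest of subject after match */
      n = snprintf(out + used, outsize - used, "%2d+ %s\n", 0, subject + end);
      if (n < 0 || (size_t)n >= outsize - used) return;
      used += (size_t)n;
    }
  }
}
```

<P>
For the second data line of the /(a)|(b)/ example, the ovector is
{0, 1, SKETCH_UNSET, SKETCH_UNSET, 0, 1} and rc is 3, which yields the same
three lines that <b>pcre2test</b> shows (apart from this page's indentation).
</P>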
<P>
Note that whereas patterns can be continued over several lines (a plain "&#62;"
prompt is used for continuations), subject lines cannot. However, newlines can
be included in a subject by means of the \n escape (or \r, \r\n, etc.,
depending on the newline sequence setting).
</P>
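<P>
As a rough illustration of how an escape in a subject line becomes a newline
character, here is a minimal C sketch that expands only \n, \r, and \\ in
place. The function name is hypothetical, and the subset is deliberately
small: <b>pcre2test</b> itself recognizes many more escapes in subject lines.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Expand \n, \r and \\ in a subject line, in place. Any other backslash
   sequence is copied unchanged. Returns the new length of the string. */
static size_t expand_subject_escapes(char *s)
{
  char *src = s, *dst = s;

  assert(s != NULL);
  while (*src != '\0') {
    if (src[0] == '\\' && src[1] != '\0') {
      char c = src[1];
      if (c == 'n' || c == 'r' || c == '\\') {
        *dst++ = (c == 'n') ? '\n' : (c == 'r') ? '\r' : '\\';
        src += 2;                 /* consume backslash and escape letter */
        continue;
      }
    }
    *dst++ = *src++;              /* ordinary character: copy as-is */
  }
  *dst = '\0';
  return (size_t)(dst - s);
}
```

<P>
Because the expansion happens after the line has been read, a single subject
line can still contain several newline characters, which is how multi-line
subjects are tested.
</P>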
<br><a name="SEC14" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
When the alternative matching function, <b>pcre2_dfa_match()</b>, is used, the
output consists of a list of all the matches that start at the first point in
the subject where there is at least one match. For example:
<pre>
    re&#62; /(tang|tangerine|tan)/
  data&#62; yellow tangerine\=dfa
   0: tangerine
   1: tang
   2: tan
</pre>
Using the normal matching function on this data finds only "tang", because
<b>pcre2_match()</b> tries the alternatives from left to right and stops at the
first one that leads to a match. With <b>pcre2_dfa_match()</b>, the longest
matching string is always given first (and numbered zero). After a
PCRE2_ERROR_PARTIAL return, the output is "Partial match:", followed by the
partially matching substring. Note that this is the entire substring that was
inspected during the partial match; it may include characters before the actual
match start if a lookbehind assertion, \b, or \B was involved. (\K is not
supported for DFA matching.)
</P>
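<P>
To make the "all matches at the first matching position, longest first" rule
concrete, the following C sketch models it for the special case of a pattern
that is a plain alternation of distinct literal strings, such as
/(tang|tangerine|tan)/. It is a toy model written for this page, not how
<b>pcre2_dfa_match()</b> works internally, and the function and parameter names
are hypothetical. The <b>shortest</b> flag imitates the <b>dfa_shortest</b>
modifier by keeping only the shortest match.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the first subject offset at which any literal alternative matches,
   store the lengths of all matches starting there in lens[], longest
   first, and return how many there are (0 if none). With shortest != 0,
   only the shortest match is returned. */
static int dfa_literal_matches(const char *subject, const char *const *alts,
                               int nalts, int shortest, size_t *start,
                               size_t *lens, int maxlens)
{
  size_t slen = strlen(subject);

  assert(maxlens >= nalts);
  for (size_t pos = 0; pos <= slen; pos++) {
    int count = 0;

    for (int i = 0; i < nalts; i++) {       /* every alternative that fits */
      size_t len = strlen(alts[i]);
      if (len <= slen - pos && memcmp(subject + pos, alts[i], len) == 0)
        lens[count++] = len;
    }
    if (count == 0) continue;

    for (int i = 1; i < count; i++) {       /* insertion sort, descending */
      size_t v = lens[i];
      int j = i - 1;
      while (j >= 0 && lens[j] < v) { lens[j + 1] = lens[j]; j--; }
      lens[j + 1] = v;
    }
    *start = pos;
    if (shortest) { lens[0] = lens[count - 1]; return 1; }
    return count;
  }
  return 0;
}
```

<P>
For the subject "yellow tangerine", the function returns 3 with the start
offset set to 7 and lengths 9, 4, and 3, matching the three lines in the example
above. With <b>shortest</b> set, it returns the single length 3.
</P>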
<P>
If global matching is requested, the search for further matches resumes
at the end of the longest match. For example:
<pre>
    re&#62; /(tang|tangerine|tan)/g
  data&#62; yellow tangerine and tangy sultana\=dfa
   0: tangerine
   1: tang
   2: tan
   0: tang
   1: tan
   0: tan
</pre>
The alternative matching function does not support substring capture, so the
modifiers that are concerned with captured substrings are not relevant.
</P>
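<P>
Using the same toy model (again restricted to an alternation of distinct
literal strings, and again not PCRE2's implementation), a global search that
resumes at the end of the longest match reproduces the sequence of results
shown above. The function name is hypothetical. Stepping on by one character
after an empty longest match is a simplification assumed for this sketch.
</P>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ALTS 16

/* Repeatedly find all literal matches at the next matching position,
   write them longest first in pcre2test style, and resume the search at
   the end of the longest one. Output is written to out. */
static void dfa_global_literals(const char *subject, const char *const *alts,
                                int nalts, char *out, size_t outsize)
{
  size_t slen = strlen(subject), pos = 0, used = 0;

  assert(nalts <= MAX_ALTS && out != NULL && outsize > 0);
  out[0] = '\0';
  while (pos <= slen) {
    size_t lens[MAX_ALTS], at;
    int count = 0;

    for (at = pos; at <= slen; at++) {       /* first position with a match */
      for (int i = 0; i < nalts; i++) {
        size_t len = strlen(alts[i]);
        if (len <= slen - at && memcmp(subject + at, alts[i], len) == 0)
          lens[count++] = len;
      }
      if (count > 0) break;
    }
    if (count == 0) break;                   /* no further matches */

    for (int i = 0; i < count; i++)          /* sort lengths, longest first */
      for (int j = i + 1; j < count; j++)
        if (lens[j] > lens[i]) { size_t t = lens[i]; lens[i] = lens[j]; lens[j] = t; }

    for (int i = 0; i < count; i++) {
      int n = snprintf(out + used, outsize - used, "%2d: %.*s\n", i,
                       (int)lens[i], subject + at);
      if (n < 0 || (size_t)n >= outsize - used) return;
      used += (size_t)n;
    }
    pos = at + (lens[0] > 0 ? lens[0] : 1);  /* resume after longest match */
  }
}
```

<P>
Applied to "yellow tangerine and tangy sultana", the sketch finds
"tangerine", "tang", and "tan" at offset 7, then resumes at offset 16, finds
"tang" and "tan" in "tangy", and finally finds "tan" in "sultana".
</P>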
<br><a name="SEC15" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
<P>
When the alternative matching function has given the PCRE2_ERROR_PARTIAL
return, indicating that the subject partially matched the pattern, you can
restart the match with additional subject data by means of the
<b>dfa_restart</b> modifier. For example:
<pre>
    re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data&#62; 23ja\=ps,dfa
  Partial match: 23ja
  data&#62; n05\=dfa,dfa_restart
   0: n05
</pre>
For further information about partial matching, see the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation.
<a name="callouts"></a></P>
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcre2test</b>'s callout
function is called during matching unless <b>callout_none</b> is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical arguments and
those with string arguments is slightly different.
</P>
<br><b>
Callouts with numerical arguments
</b><br>
<P>
By default, the callout function displays the callout number, the start and
current positions in the subject text at the callout time, and the next pattern
item to be tested. For example:
<pre>
  ---&#62;pqrabcdef
    0    ^  ^     \d
</pre>
This output indicates that callout number 0 occurred for a match attempt
starting at the fourth character of the subject string, when the current
position was at the seventh character, and when the next pattern item was \d.
Just one circumflex is output if the start and current positions are the same,
or if the current position precedes the start position, which can happen if the
callout is in a lookbehind assertion.
</P>
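<P>
The position indicator line can be reproduced with a short C function. This is
a sketch based on the description and example above, not <b>pcre2test</b>'s own
code. The function name is hypothetical, offsets are counted in characters, and
placing the lone circumflex at the start position when the current position
precedes it is an assumption. In the example, the match attempt starts at
offset 3 (the fourth character) and the current position is offset 6 (the
seventh character).
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the circumflex line printed under the reflected subject: one '^'
   at the start of the match attempt and, if the current position is
   further on, a second '^' there. out needs max(start, current) + 2 bytes. */
static void callout_indicators(size_t start, size_t current, char *out,
                               size_t outsize)
{
  size_t n = start;

  assert(outsize >= (current > start ? current : start) + 2);
  memset(out, ' ', start);                   /* pad up to the start offset */
  out[n++] = '^';
  if (current > start) {
    memset(out + n, ' ', current - start - 1);
    n = current;
    out[n++] = '^';                          /* current position marker */
  }
  out[n] = '\0';
}
```

<P>
Called with start 3 and current 6, it produces "   ^  ^", the indicator part
of the second line of the example. Start 0 and current 1 give "^^", as in the
+8 line of the automatic callout example.
</P>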
<P>
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
result of the <b>auto_callout</b> pattern modifier. In this case, instead of
showing the callout number, the offset in the pattern, preceded by a plus sign,
is output. For example:
<pre>
    re&#62; /\d?[A-E]\*/auto_callout
  data&#62; E*
  ---&#62;E*
   +0 ^      \d?
   +3 ^      [A-E]
   +8 ^^     \*
  +10 ^ ^
   0: E*
</pre>
If a pattern contains (*MARK) items, an additional line is output whenever
a change of latest mark is passed to the callout function. For example:
<pre>
    re&#62; /a(*MARK:X)bc/auto_callout
  data&#62; abc
  ---&#62;abc
   +0 ^       a
   +1 ^^      (*MARK:X)
  +10 ^^      b
  Latest Mark: X
  +11 ^ ^     c
  +12 ^  ^
   0: abc
</pre>
The mark changes between matching "a" and "b", but stays the same for the rest
of the match, so nothing more is output. If, as a result of backtracking, the
mark reverts to being unset, the text "&#60;unset&#62;" is output.
</P>
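<P>
The label printed before the position indicators, and the extra mark line, can
also be sketched in C. The field widths below are inferred from the examples on
this page (a three-character field followed by a space), the function names are
hypothetical, and comparing marks by their text is an assumption of this
sketch.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Callout 255 is taken to be automatic: show "+offset" in the pattern.
   Any other callout shows its number. */
static void callout_label(int number, size_t pattern_offset, char *out,
                          size_t outsize)
{
  assert(out != NULL && outsize >= 16);
  if (number == 255)
    snprintf(out, outsize, "%+3d ", (int)pattern_offset);
  else
    snprintf(out, outsize, "%3d ", number);
}

/* Return 1 and write a "Latest Mark:" line only when the mark differs from
   the one last reported (NULL means unset); otherwise return 0. */
static int mark_change_line(const char *mark, const char **last_shown,
                            char *out, size_t outsize)
{
  const char *prev = *last_shown;
  int same = (mark == NULL) ? (prev == NULL)
                            : (prev != NULL && strcmp(mark, prev) == 0);

  if (same) return 0;
  *last_shown = mark;
  snprintf(out, outsize, "Latest Mark: %s", mark != NULL ? mark : "<unset>");
  return 1;
}
```

<P>
In the /a(*MARK:X)bc/ example, the mark is unset at the +0 and +1 callouts and
is "X" from the +10 callout onwards, so only one "Latest Mark:" line appears.
</P>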
<br><b>
Callouts with string arguments
</b><br>
<P>
The output for a callout with a string argument is similar, except that instead
of a callout number before the position indicators, a separate line showing the
offset of the callout string in the pattern and the string itself (with its
delimiters) is output before the reflection of the subject string, and the
subject string is reflected for each callout. For example:
<pre>
    re&#62; /^ab(?C'first')cd(?C"second")ef/
  data&#62; abcdefg
  Callout (7): 'first'
  ---&#62;abcdefg
      ^ ^         c
  Callout (20): "second"
  ---&#62;abcdefg
      ^   ^       e
   0: abcdef

</PRE>
</P>
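<P>
In the example above, 7 and 20 are the offsets in the pattern of the first
characters of "first" and "second", just after each opening delimiter. The C
sketch below is a simplified scanner written for this page that finds those
offsets. It handles only the ' and " delimiters and ignores doubled delimiters
inside a string, so it is not a substitute for PCRE2's own parsing. The function
name is hypothetical.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Record, for each (?C'...') or (?C"...") callout in pattern, the offset
   of the first character of its string argument. Returns the count. */
static int string_callout_offsets(const char *pattern, size_t *offsets,
                                  int max)
{
  const char *p = pattern;
  int n = 0;

  assert(pattern != NULL && (offsets != NULL || max == 0));
  while (n < max && (p = strstr(p, "(?C")) != NULL) {
    char delim = p[3];
    if (delim == '\'' || delim == '"') {
      const char *close = strchr(p + 4, delim);
      if (close == NULL) break;              /* unterminated string */
      offsets[n++] = (size_t)(p + 4 - pattern);
      p = close + 1;                         /* continue after the string */
    } else {
      p += 3;                                /* numbered callout: skip it */
    }
  }
  return n;
}
```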
<br><b>
Callout modifiers
</b><br>
<P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line to
change this; other modifiers that affect the callout's behaviour are described
below.
</P>
<P>
If the <b>callout_capture</b> modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching:
<b>pcre2_dfa_match()</b> does not support capturing, so no captures are ever
shown.
</P>
<P>
The normal callout output, showing the callout number or pattern offset (as
described above), is suppressed if the <b>callout_no_where</b> modifier is set.
</P>
<P>
When using the interpretive matching function <b>pcre2_match()</b> without JIT,
setting the <b>callout_extra</b> modifier causes additional output from
<b>pcre2test</b>'s callout function to be generated. For the first callout in a
match attempt at a new starting position in the subject, "New match attempt" is
output. If there has been a backtrack since the last callout (or since the start
of matching, if this is the first callout), "Backtrack" is output, followed by
"No other matching paths" if the backtrack ended the previous match attempt. For
example:
<pre>
    re&#62; /(a+)b/auto_callout,no_start_optimize,no_auto_possess
  data&#62; aac\=callout_extra
  New match attempt
  ---&#62;aac
   +0 ^       (
   +1 ^       a+
   +3 ^ ^     )
   +4 ^ ^     b
  Backtrack
  ---&#62;aac
   +3 ^^      )
   +4 ^^      b
  Backtrack
  No other matching paths
  New match attempt
  ---&#62;aac
   +0  ^      (
   +1  ^      a+
   +3  ^^     )
   +4  ^^     b
  Backtrack
  No other matching paths
  New match attempt
  ---&#62;aac
   +0   ^     (
   +1   ^     a+
  Backtrack
  No other matching paths
  New match attempt
  ---&#62;aac
   +0    ^    (
   +1    ^    a+
  No match
</pre>
Note that various optimizations must be turned off if you want all possible
matching paths to be scanned. If <b>no_start_optimize</b> is not used, there is
an immediate "No match", without any callouts, because the starting
optimization fails to find "b" in the subject, which it knows must be present
for any match. If <b>no_auto_possess</b> is not used, the "a+" item is turned
into "a++", which reduces the number of backtracks.
</P>
<P>
The <b>callout_extra</b> modifier has no effect if used with the DFA matching
function or with JIT.
</P>
<br><b>
Return values from callouts
</b><br>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;)
are given, 1 is returned when callout &#60;n&#62; is reached and there have been at
least &#60;m&#62; callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero.
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as the return value of <b>pcre2test</b>'s callout function.
</P>
<P>
Inserting callouts can be helpful when using <b>pcre2test</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
</P>
<br><a name="SEC17" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
<P>
When <b>pcre2test</b> is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters and are
therefore shown as hex escapes.
</P>
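<P>
The escaping rule for non-printing characters, given earlier in the section on
default output, can be sketched in C as follows. The sketch assumes the default
behaviour (no <b>locale</b> modifier). The lower-case hex with at least two
digits is an assumption based on the \xhh and \x{hh...} forms, and the function
name is hypothetical.
</P>

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format one character for output: 32-126 are printed as themselves;
   anything else becomes \xhh if below 256 and not in UTF mode, or
   \x{hh...} otherwise. Returns the number of characters written. */
static int format_char(uint32_t c, int utf, char *out, size_t outsize)
{
  assert(out != NULL && outsize >= 16);
  if (c >= 32 && c <= 126)
    return snprintf(out, outsize, "%c", (int)c);
  if (c < 256 && !utf)
    return snprintf(out, outsize, "\\x%02x", (unsigned)c);
  return snprintf(out, outsize, "\\x{%02x}", (unsigned)c);
}
```

<P>
In this model, a linefeed in a non-UTF subject is shown as \x0a, while the same
code point in UTF mode, or any value of 256 or more, uses the braced form.
</P>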
<P>
When <b>pcre2test</b> is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been set for
the pattern (using the <b>locale</b> modifier). In this case, the
<b>isprint()</b> function is used to distinguish printing from non-printing
characters.
<a name="saverestore"></a></P>
<br><a name="SEC18" href="#TOC1">SAVING AND RESTORING COMPILED PATTERNS</a><br>
<P>
It is possible to save compiled patterns on disc or elsewhere, and reload them
later, subject to a number of restrictions. JIT data cannot be saved. The host
on which the patterns are reloaded must be running the same version of PCRE2,
with the same code unit width, and must also have the same endianness, pointer
width, and PCRE2_SIZE type. Before compiled patterns can be saved, they must be
serialized, that is, converted to a stream of bytes. A single byte stream may
contain any number of compiled patterns, but they must all use the same
character tables. A single copy of the tables is included in the byte stream
(its size is 1088 bytes).
</P>
<P>
The functions whose names begin with <b>pcre2_serialize_</b> are used
for serializing and de-serializing. They are described in the
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
documentation. In this section we describe the features of <b>pcre2test</b> that
can be used to test these functions.
</P>
<P>
Note that "serialization" in PCRE2 does not convert compiled patterns to an
abstract format, as serialization in Java or .NET does. It just makes a
reloadable byte code stream, hence the restrictions on reloading mentioned
above.
</P>
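<P>
The reload restrictions amount to a check that the saving and loading hosts
agree on a handful of properties. The C sketch below illustrates that idea with
a hypothetical header structure. It does not describe PCRE2's actual serialized
format, which is internal and accessed only through the <b>pcre2_serialize_</b>
functions. The version encoding and all names are assumptions, and PCRE2_SIZE
is taken to be size_t, its default definition.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the host that wrote a byte stream. */
struct host_fingerprint {
  uint32_t version;          /* library version; encoding is illustrative */
  uint8_t code_unit_width;   /* 8, 16 or 32 */
  uint8_t little_endian;     /* 1 if the host is little-endian */
  uint8_t pointer_size;      /* sizeof(void *) */
  uint8_t size_type_size;    /* sizeof(PCRE2_SIZE), assumed to be size_t */
};

/* Describe the running host for a given library version and width. */
static struct host_fingerprint this_host(uint32_t version, uint8_t width)
{
  const uint16_t probe = 1;
  struct host_fingerprint h;

  h.version = version;
  h.code_unit_width = width;
  h.little_endian = (*(const uint8_t *)&probe == 1);  /* low byte first? */
  h.pointer_size = (uint8_t)sizeof(void *);
  h.size_type_size = (uint8_t)sizeof(size_t);
  return h;
}

/* Patterns saved on host `saved` may be reloaded on host `here` only if
   every recorded property matches. */
static int can_reload(const struct host_fingerprint *saved,
                      const struct host_fingerprint *here)
{
  assert(saved != NULL && here != NULL);
  return saved->version == here->version &&
         saved->code_unit_width == here->code_unit_width &&
         saved->little_endian == here->little_endian &&
         saved->pointer_size == here->pointer_size &&
         saved->size_type_size == here->size_type_size;
}
```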
<P>
In <b>pcre2test</b>, when a pattern with the <b>push</b> modifier is successfully
compiled, it is pushed onto a stack of compiled patterns, and <b>pcre2test</b>
expects the next line to contain a new pattern (or command) instead of a
subject line. By contrast, the <b>pushcopy</b> modifier causes a copy of the
compiled pattern to be stacked, leaving the original available for immediate
matching. By using <b>push</b> and/or <b>pushcopy</b>, a number of patterns can
be compiled and retained. These modifiers are incompatible with <b>posix</b>,
and control modifiers that act at match time are ignored (with a message) for
the stacked patterns. The <b>jitverify</b> modifier applies only at compile
time.
</P>
<P>
The command
<pre>
  #save &#60;filename&#62;
</pre>
causes all the stacked patterns to be serialized and the result written to the
named file. Afterwards, all the stacked patterns are freed. The command
<pre>
  #load &#60;filename&#62;
</pre>
reads the data in the file and then arranges for it to be de-serialized, with
the resulting compiled patterns added to the pattern stack. The pattern on the
top of the stack can be retrieved by the #pop command, which must be followed
by lines of subjects that are to be matched with the pattern, terminated as
usual by an empty line or end of file. The #pop command may be followed by a
modifier list containing only
<a href="#controlmodifiers">control modifiers</a>
that act after a pattern has been compiled. In particular, <b>hex</b>,
<b>posix</b>, <b>posix_nosub</b>, <b>push</b>, and <b>pushcopy</b> are not allowed,
nor are any
<a href="#optionmodifiers">option-setting modifiers.</a>
The JIT modifiers are, however, permitted. Here is an example that saves and
reloads two patterns:
<pre>
  /abc/push
  /xyz/push
  #save tempfile
  #load tempfile
  #pop info
  xyz

  #pop jit,bincode
  abc
</pre>
If <b>jitverify</b> is used with #pop, it does not automatically imply
<b>jit</b>, which is different behaviour from when it is used on a pattern.
</P>
<P>
The #popcopy command is analogous to the <b>pushcopy</b> modifier in that it
makes a copy of the topmost stack pattern current, leaving the original on the
stack.
</P>
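<P>
The effect of <b>push</b>, <b>pushcopy</b>, #save, #load, #pop, and #popcopy on
the pattern stack can be modelled with a small C structure in which pattern
strings stand in for compiled patterns. This is an illustrative sketch with
hypothetical names and a fixed capacity, not <b>pcre2test</b>'s implementation.
In it, "saving" just copies pointers into an array rather than serializing to a
file.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_MAX 8   /* illustrative capacity */

struct pattern_stack {
  const char *items[STACK_MAX];   /* bottom of the stack first */
  int depth;
  const char *current;            /* pattern available for matching, or NULL */
};

/* push: stack the compiled pattern; nothing is left for matching. */
static void do_push(struct pattern_stack *s, const char *pat)
{
  assert(s->depth < STACK_MAX);
  s->items[s->depth++] = pat;
  s->current = NULL;
}

/* pushcopy: stack a copy and keep the original for immediate matching. */
static void do_pushcopy(struct pattern_stack *s, const char *pat)
{
  assert(s->depth < STACK_MAX);
  s->items[s->depth++] = pat;
  s->current = pat;
}

/* #save: copy every stacked pattern into saved[], then empty the stack. */
static int do_save(struct pattern_stack *s, const char **saved)
{
  int n = s->depth;
  memcpy(saved, s->items, (size_t)n * sizeof *saved);
  s->depth = 0;
  return n;
}

/* #load: append the reloaded patterns to the stack in their saved order. */
static void do_load(struct pattern_stack *s, const char **saved, int n)
{
  assert(s->depth + n <= STACK_MAX);
  memcpy(s->items + s->depth, saved, (size_t)n * sizeof *saved);
  s->depth += n;
}

/* #pop removes the top pattern and makes it current; #popcopy (copy != 0)
   makes it current but leaves it on the stack. */
static const char *do_pop(struct pattern_stack *s, int copy)
{
  assert(s->depth > 0);
  s->current = copy ? s->items[s->depth - 1] : s->items[--s->depth];
  return s->current;
}
```

<P>
Replaying the save-and-reload example above (push "abc", push "xyz", save,
load), the first pop yields "xyz" and the second "abc", which is the order in
which that example matches its subjects.
</P>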
<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
<b>pcre2jit</b>(3), <b>pcre2matching</b>(3), <b>pcre2partial</b>(3),
<b>pcre2pattern</b>(3), <b>pcre2serialize</b>(3).
</P>
<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 24 April 2024
<br>
Copyright &copy; 1997-2024 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>