PCRE2TEST(1)                General Commands Manual               PCRE2TEST(1)


NAME
       pcre2test - a program for testing Perl-compatible regular expressions.


SYNOPSIS

       pcre2test [options] [input file [output file]]

       pcre2test is a test program for the PCRE2 regular expression libraries,
       but  it  can  also  be used for experimenting with regular expressions.
       This document describes the features of the test program;  for  details
       of  the regular expressions themselves, see the pcre2pattern documenta-
       tion. For details of the PCRE2 library function  calls  and  their  op-
       tions, see the pcre2api documentation.

       The  input  for  pcre2test is a sequence of regular expression patterns
       and subject strings to be matched. There are  also  command  lines  for
       setting defaults and controlling some special actions. The output shows
       the  result  of  each  match attempt. Modifiers on external or internal
       command lines, the patterns, and the subject lines specify PCRE2  func-
       tion  options, control how the subject is processed, and what output is
       produced.

       There are many obscure modifiers, some of which  are  specifically  de-
       signed  for use in conjunction with the test script and data files that
       are distributed as part of PCRE2.  All  the  modifiers  are  documented
       here, some without much justification, but many of them are unlikely to
       be of use except when testing the libraries.


PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES

       Different versions of the PCRE2 library can be built to support charac-
       ter  strings  that  are encoded in 8-bit, 16-bit, or 32-bit code units.
       One, two, or all three of these libraries  may  be  simultaneously  in-
       stalled.  The  pcre2test program can be used to test all the libraries.
       However, its own input and output are  always  in  8-bit  format.  When
       testing  the  16-bit  or 32-bit libraries, patterns and subject strings
       are converted to 16-bit or 32-bit format before being passed to the li-
       brary functions. Results are converted back to  8-bit  code  units  for
       output.

       In the rest of this document, the names of library functions and struc-
       tures  are given in generic form, for example, pcre2_compile(). The ac-
       tual names used in the libraries have a suffix _8, _16, or _32, as  ap-
       propriate.
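
       As a small illustration of the naming rule above, the following  Python
       sketch  maps  a generic name to its width-specific form. The helper ac-
       tual_name() is hypothetical and is not part of PCRE2 or pcre2test.

```python
def actual_name(generic: str, width: int) -> str:
    """Return the width-specific form of a generic PCRE2 name.

    Hypothetical helper for illustration: it only appends the _8, _16,
    or _32 suffix described in the text.
    """
    if width not in (8, 16, 32):
        raise ValueError(f"unsupported code unit width: {width}")
    return f"{generic}_{width}"


print(actual_name("pcre2_compile", 16))
```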


INPUT ENCODING

       Input  to  pcre2test is processed line by line, either by calling the C
       library's fgets() function, or via the libreadline or libedit  library.
       In  some Windows environments character 26 (hex 1A) causes an immediate
       end of file, and no further data is read, so this character  should  be
       avoided unless you really want that action.

       The  input is processed using C's string functions, so must not contain
       binary zeros, even though in Unix-like environments, fgets() treats any
       bytes other than newline as data characters. An error is generated if a
       binary zero is encountered. By default subject lines are processed  for
       backslash escapes, which makes it possible to include any data value in
       strings  that  are  passed  to  the library for matching. For patterns,
       there is a facility for specifying some or all of the 8-bit input char-
       acters as hexadecimal pairs, which makes it possible to include  binary
       zeros.

   Input for the 16-bit and 32-bit libraries

       When testing the 16-bit or 32-bit libraries, there is a need to be able
       to  generate character code points greater than 255 in the strings that
       are passed to the library. For subject lines, backslash escapes can  be
       used.  In addition, when the utf modifier (see "Setting compilation op-
       tions" below) is set, the pattern and any following subject  lines  are
       interpreted  as UTF-8 strings and translated to UTF-16 or UTF-32 as ap-
       propriate.

       For non-UTF testing of wide characters, the utf8_input modifier can  be
       used.  This  is  mutually  exclusive  with  utf, and is allowed only in
       16-bit or 32-bit mode. It causes  the  pattern  and  following  subject
       lines  to be treated as UTF-8 according to the original definition (RFC
       2279), which allows for character values up to 0x7fffffff. Each charac-
       ter is placed in one 16-bit or 32-bit code unit (in  the  16-bit  case,
       values greater than 0xffff cause an error to occur).

       UTF-8  (in  its  original definition) is not capable of encoding values
       greater than 0x7fffffff, but such values can be handled by  the  32-bit
       library. When testing this library in non-UTF mode with utf8_input set,
       if any character is preceded by the byte 0xff (which is an invalid byte
       in  UTF-8)  0x80000000  is  added to the character's value. This is the
       only way of passing such code points in a pattern string.  For  subject
       strings, using an escape sequence is preferable.
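
       The  utf8_input  rules  above  can  be sketched in Python as follows.
       This is an illustration only, not the code used by  pcre2test:  the
       function  name  decode_utf8_input() is hypothetical, overlong forms
       are not rejected, and the real program may  report  errors  differ-
       ently.

```python
from typing import List

# (mask, expected bits, continuation byte count) for each lead-byte form of
# the original UTF-8 definition (RFC 2279), which allows up to 6 bytes.
_LEAD_FORMS = [
    (0x80, 0x00, 0),
    (0xE0, 0xC0, 1),
    (0xF0, 0xE0, 2),
    (0xF8, 0xF0, 3),
    (0xFC, 0xF8, 4),
    (0xFE, 0xFC, 5),
]


def decode_utf8_input(data: bytes, width: int) -> List[int]:
    """Decode bytes into one code unit per character (hypothetical sketch)."""
    if width not in (16, 32):
        raise ValueError("utf8_input is allowed only in 16-bit or 32-bit mode")
    units = []
    i = 0
    while i < len(data):
        extra = 0
        if data[i] == 0xFF:
            # A 0xff byte before a character adds 0x80000000 to its value.
            extra = 0x80000000
            i += 1
            if i >= len(data):
                raise ValueError("0xff must be followed by a character")
        lead = data[i]
        for mask, bits, count in _LEAD_FORMS:
            if (lead & mask) == bits:
                value = lead & ~mask & 0xFF
                break
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x}")
        tail = data[i + 1 : i + 1 + count]
        if len(tail) != count or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError("truncated or malformed sequence")
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        value += extra
        if width == 16 and value > 0xFFFF:
            raise ValueError("value does not fit in a 16-bit code unit")
        units.append(value)
        i += 1 + count
    return units
```

       In this sketch a value above 0xffff in 16-bit mode raises an error,
       which mirrors the restriction stated in the text.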


COMMAND LINE OPTIONS

       -8        If the 8-bit library has been built, this option causes it to
                 be  used  (this is the default). If the 8-bit library has not
                 been built, this option causes an error.

       -16       If the 16-bit library has been built, this option  causes  it
                 to  be used. If the 8-bit library has not been built, this is
                 the default. If the 16-bit library has not been  built,  this
                 option causes an error.

       -32       If  the  32-bit library has been built, this option causes it
                 to be used. If no other library has been built, this  is  the
                 default.  If  the 32-bit library has not been built, this op-
                 tion causes an error.

       -ac       Behave as if each pattern has the auto_callout modifier, that
                 is, insert automatic callouts into every pattern that is com-
                 piled.

       -AC       As for -ac, but in addition behave as if  each  subject  line
                 has  the callout_extra modifier, that is, show additional in-
                 formation from callouts.

       -b        Behave as if each pattern has the fullbincode  modifier;  the
                 full internal binary form of the pattern is output after com-
                 pilation.

       -C        Output  the  version  number  of  the  PCRE2 library, and all
                 available information about the optional  features  that  are
                 included,  and  then  exit with zero exit code. All other op-
                 tions are ignored. If both -C and -LM are present,  whichever
                 is first is recognized.

       -C option Output  information  about a specific build-time option, then
                 exit. This functionality is intended for use in scripts  such
                 as  RunTest.  The  following options output the value and set
                 the exit code as indicated:

                   ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
                                0x15 or 0x25
                                0 if used in an ASCII environment
                                exit code is always 0
                   linksize   the configured internal link size (2, 3, or 4)
                                exit code is set to the link size
                   newline    the default newline setting:
                                CR, LF, CRLF, ANYCRLF, ANY, or NUL
                                exit code is always 0
                   bsr        the default setting for what \R matches:
                                ANYCRLF or ANY
                                exit code is always 0

                 The following options output 1 for true or 0 for  false,  and
                 set the exit code to the same value:

                   backslash-C  \C is supported (not locked out)
                   ebcdic       compiled for an EBCDIC environment
                   jit          just-in-time support is available
                   pcre2-16     the 16-bit library was built
                   pcre2-32     the 32-bit library was built
                   pcre2-8      the 8-bit library was built
                   unicode      Unicode support is available

                 If  an  unknown  option is given, an error message is output;
                 the exit code is 0.

       -d        Behave as if each pattern has the debug modifier; the  inter-
                 nal form and information about the compiled pattern is output
                 after compilation; -d is equivalent to -b -i.

       -dfa      Behave as if each subject line has the dfa modifier; matching
                 is  done  using the pcre2_dfa_match() function instead of the
                 default pcre2_match().

       -error number[,number,...]
                 Call pcre2_get_error_message() for each of the error  numbers
                 in  the  comma-separated list, display the resulting messages
                 on the standard output, then exit with zero  exit  code.  The
                 numbers  may  be  positive or negative. This is a convenience
                 facility for PCRE2 maintainers.

       -help     Output a brief summary of these options and then exit.

       -i        Behave as if each pattern has the info modifier;  information
                 about the compiled pattern is given after compilation.

       -jit      Behave  as  if  each pattern line has the jit modifier; after
                 successful compilation, each pattern is passed to  the  just-
                 in-time compiler, if available.

       -jitfast  Behave  as if each pattern line has the jitfast modifier; af-
                 ter successful compilation, each pattern  is  passed  to  the
                 just-in-time compiler, if available, and each subject line is
                 passed directly to the JIT matcher via its "fast path".

       -jitverify
                 Behave  as  if  each pattern line has the jitverify modifier;
                 after successful compilation, each pattern is passed  to  the
                 just-in-time  compiler,  if available, and the use of JIT for
                 matching is verified.

       -LM       List modifiers: write a list of available pattern and subject
                 modifiers to the standard output, then exit  with  zero  exit
                 code.  All other options are ignored.  If both -C and any -Lx
                 options are present, whichever is first is recognized.

       -LP       List properties: write a list of recognized  Unicode  proper-
                 ties  to  the standard output, then exit with zero exit code.
                 All other options are ignored. If both -C and any -Lx options
                 are present, whichever is first is recognized.

       -LS       List scripts: write a list of recognized Unicode script names
                 to the standard output, then exit with zero  exit  code.  All
                 other options are ignored. If both -C and any -Lx options are
                 present, whichever is first is recognized.

       -pattern modifier-list
                 Behave as if each pattern line contains the given modifiers.

       -q        Do not output the version number of pcre2test at the start of
                 execution.

       -S size   On  Unix-like  systems, set the size of the run-time stack to
                 size mebibytes (units of 1024*1024 bytes).

       -subject modifier-list
                 Behave as if each subject line contains the given modifiers.

       -t        Run each compile and match many times with a timer, and  out-
                 put  the  resulting  times  per compile or match. When JIT is
                 used, separate times are given for the  initial  compile  and
                 the  JIT  compile.  You  can control the number of iterations
                 that are used for timing by following -t with a number (as  a
                 separate  item  on  the command line). For example, "-t 1000"
                 iterates 1000 times. The default is to iterate 500,000 times.

       -tm       This is like -t except that it times only the matching phase,
                 not the compile phase.

       -T -TM    These behave like -t and -tm, but in addition, at the end  of
                 a  run, the total times for all compiles and matches are out-
                 put.

       -version  Output the PCRE2 version number and then exit.
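
       The  way  the  -8, -16, and -32 options interact with the set of built
       libraries can be summarized by the Python sketch below.  The  function
       names  are  hypothetical and the code only models the rules stated for
       those three options; it is not taken from the pcre2test source.

```python
from typing import Iterable, Optional


def default_width(built: Iterable[int]) -> int:
    """Pick the default library: 8-bit if built, else 16-bit, else 32-bit."""
    available = set(built)
    for width in (8, 16, 32):
        if width in available:
            return width
    raise ValueError("no PCRE2 library has been built")


def select_width(built: Iterable[int], requested: Optional[int] = None) -> int:
    """Model -8/-16/-32: an explicit request for a missing library is an error."""
    available = set(built)
    if requested is None:
        return default_width(available)
    if requested not in available:
        raise ValueError(f"the {requested}-bit library has not been built")
    return requested
```

       For example, with only the 16-bit and 32-bit libraries  built,  the
       default in this model is the 16-bit library.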


DESCRIPTION

       If pcre2test is given two filename arguments, it reads from  the  first
       and writes to the second. If the first name is "-", input is taken from
       the  standard  input. If pcre2test is given only one argument, it reads
       from that file and writes to stdout. Otherwise, it reads from stdin and
       writes to stdout.
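
       The filename rules above can be expressed as a short Python sketch. The
       function resolve_streams() is hypothetical; None stands for the  stan-
       dard  stream. Treating "-" as standard input when it is the only argu-
       ment is an assumption, since the text states the  rule  only  for  the
       first name.

```python
from typing import List, Optional, Tuple


def resolve_streams(filenames: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (input_file, output_file); None means stdin or stdout."""
    if len(filenames) > 2:
        raise ValueError("at most two filename arguments are expected")
    infile = filenames[0] if filenames else None
    if infile == "-":
        # "-" as the first name selects the standard input.
        infile = None
    outfile = filenames[1] if len(filenames) == 2 else None
    return infile, outfile
```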

       When pcre2test is built, a configuration option  can  specify  that  it
       should  be linked with the libreadline or libedit library. When this is
       done, if the input is from a terminal, it is read using the  readline()
       function. This provides line-editing and history facilities. The output
       from the -help option states whether or not readline() will be used.

       The  program  handles  any number of tests, each of which consists of a
       set of input lines. Each set starts with a regular expression  pattern,
       followed by any number of subject lines to be matched against that pat-
       tern. In between sets of test data, command lines that begin with # may
       appear. This file format, with some restrictions, can also be processed
       by  the perltest.sh script that is distributed with PCRE2 as a means of
       checking that the behaviour of PCRE2 and Perl is the same. For a speci-
       fication of perltest.sh, see the comments near its beginning. See  also
       the #perltest command below.

       When the input is a terminal, pcre2test prompts for each line of input,
       using  "re>"  to prompt for regular expression patterns, and "data>" to
       prompt for subject lines. Command lines starting with # can be  entered
       only in response to the "re>" prompt.

       Each  subject line is matched separately and independently. If you want
       to do multi-line matches, you have to use the \n escape sequence (or \r
       or \r\n, etc., depending on the newline setting) in a  single  line  of
       input  to encode the newline sequences. There is no limit on the length
       of subject lines; the input buffer is automatically extended if  it  is
       too  small.  There  are  replication features that make it possible  to
       generate long repetitive pattern or subject  lines  without  having  to
       supply them explicitly.

       An  empty  line  or  the end of the file signals the end of the subject
       lines for a test, at which point a new pattern or command line  is  ex-
       pected if there is still input to be read.


COMMAND LINES

       In  between sets of test data, a line that begins with # is interpreted
       as a command line. If the first character is followed by white space or
       an exclamation mark, the line is treated as  a  comment,  and  ignored.
       Otherwise, the following commands are recognized:

         #forbid_utf

       Subsequent   patterns   automatically   have  the  PCRE2_NEVER_UTF  and
       PCRE2_NEVER_UCP options set, which locks out the use of  the  PCRE2_UTF
       and  PCRE2_UCP options and the use of (*UTF) and (*UCP) at the start of
       patterns. This command also forces an error  if  a  subsequent  pattern
       contains  any  occurrences  of \P, \p, or \X, which are still supported
       when PCRE2_UTF is not set, but which require Unicode  property  support
       to be included in the library.

       This  is  a trigger guard that is used in test files to ensure that UTF
       or Unicode property tests are not accidentally added to files that  are
       used  when  Unicode  support  is  not  included in the library. Setting
       PCRE2_NEVER_UTF and PCRE2_NEVER_UCP as a default can also  be  obtained
       by  the  use  of #pattern; the difference is that #forbid_utf cannot be
       unset, and the automatic options are not displayed in pattern  informa-
       tion, to avoid cluttering up test output.

         #load <filename>

       This command is used to load a set of precompiled patterns from a file,
       as  described  in  the  section entitled "Saving and restoring compiled
       patterns" below.

         #loadtables <filename>

       This command is used to load a set of binary character tables that  can
       be  accessed  by  the tables=3 qualifier. Such tables can be created by
       the pcre2_dftables program with the -b option.

         #newline_default [<newline-list>]

       When PCRE2 is built, a default newline  convention  can  be  specified.
       This  determines which characters and/or character pairs are recognized
       as indicating a newline in a pattern or subject string. The default can
       be overridden when a pattern is compiled. The standard test files  con-
       tain  tests  of  various  newline  conventions, but the majority of the
       tests expect a single linefeed to be recognized as  a  newline  by  de-
       fault.  Without  special action the tests would fail when PCRE2 is com-
       piled with either CR or CRLF as the default newline.

       The #newline_default command specifies a list of newline types that are
       acceptable as the default. The types must be one of CR, LF, CRLF,  ANY-
       CRLF, ANY, or NUL (in upper or lower case), for example:

         #newline_default LF Any anyCRLF

       If the default newline is in the list, this command has no effect. Oth-
       erwise,  except  when  testing  the  POSIX API, a newline modifier that
       specifies the first newline convention in the list (LF in the above ex-
       ample) is added to any pattern that does not  already  have  a  newline
       modifier. If the newline list is empty, the feature is turned off. This
       command is present in a number of the standard test input files.

       When  the POSIX API is being tested there is no way to override the de-
       fault newline convention, though it is possible to set the newline con-
       vention from within the pattern. A warning is given  if  the  posix  or
       posix_nosub  modifier is used when #newline_default would set a default
       for the non-POSIX API.
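
       The decision made by #newline_default can be modelled  by  the  Python
       sketch  below.  The  function forced_newline() and its parameter names
       are hypothetical; the sketch follows the rules stated above  and  does
       not model the warning that is given for the POSIX modifiers.

```python
from typing import List, Optional

NEWLINE_TYPES = {"CR", "LF", "CRLF", "ANYCRLF", "ANY", "NUL"}


def forced_newline(build_default: str,
                   acceptable: List[str],
                   pattern_has_newline_modifier: bool,
                   posix: bool = False) -> Optional[str]:
    """Return the newline convention to add to a pattern, or None."""
    names = [name.upper() for name in acceptable]  # upper or lower case is allowed
    for name in names:
        if name not in NEWLINE_TYPES:
            raise ValueError(f"unknown newline type: {name}")
    if not names:
        return None  # an empty list turns the feature off
    if build_default.upper() in names:
        return None  # the built-in default is acceptable: no effect
    if posix or pattern_has_newline_modifier:
        return None  # POSIX API tests and explicit modifiers are left alone
    return names[0]  # otherwise the first convention in the list is used
```

       With a library built with CRLF as its default, the  list  "LF  Any
       anyCRLF" gives LF for a pattern that has no newline modifier.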

         #pattern <modifier-list>

       This command sets a default modifier list that applies  to  all  subse-
       quent patterns. Modifiers on a pattern can change these settings.

         #perltest

       This  line  is  used  in test files that can also be processed by perl-
       test.sh to confirm that Perl gives the same results  as  PCRE2.  Subse-
       quent  tests are checked for the use of pcre2test features that are in-
       compatible with the perltest.sh script.

       Patterns must use '/' as their delimiter, and  only  certain  modifiers
       are  supported. Comment lines, #pattern commands, and #subject commands
       that set or unset "mark" are recognized and acted  on.  The  #perltest,
       #forbid_utf,  and  #newline_default  commands,  which are needed in the
       relevant pcre2test files, are silently ignored. All other command lines
       are ignored, but give a warning message. The  #perltest  command  helps
       detect  tests  that  are  accidentally put in the wrong file or use the
       wrong delimiter. For more details of the  perltest.sh  script  see  the
       comments it contains.

         #pop [<modifiers>]
         #popcopy [<modifiers>]

       These  commands  are used to manipulate the stack of compiled patterns,
       as described in the section entitled  "Saving  and  restoring  compiled
       patterns" below.

         #save <filename>

       This  command  is used to save a set of compiled patterns to a file, as
       described in the section entitled "Saving and restoring  compiled  pat-
       terns" below.

         #subject <modifier-list>

       This  command  sets  a default modifier list that applies to all subse-
       quent subject lines. Modifiers on a subject line can change these  set-
       tings.
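
       The distinction between comments and commands described at  the  start
       of  this  section  can  be sketched in Python as follows. The function
       classify_hash_line() is hypothetical. Treating a  line  that  consists
       only of "#" as a comment is an assumption, and the sketch does not re-
       produce how pcre2test reports an unrecognized command.

```python
from typing import Optional, Tuple

# Command names taken from the list in this section.
COMMANDS = {
    "forbid_utf", "load", "loadtables", "newline_default", "pattern",
    "perltest", "pop", "popcopy", "save", "subject",
}


def classify_hash_line(line: str) -> Tuple[str, Optional[str], str]:
    """Return (kind, name, argument) for a line that begins with #."""
    if not line.startswith("#"):
        raise ValueError("not a command line")
    rest = line[1:]
    # "#" followed by white space or "!" is a comment. A bare "#" is
    # treated as a comment here, which is an assumption.
    if rest == "" or rest[0].isspace() or rest[0] == "!":
        return ("comment", None, "")
    parts = rest.split(None, 1)
    name = parts[0]
    argument = parts[1].strip() if len(parts) > 1 else ""
    kind = "command" if name in COMMANDS else "unknown"
    return (kind, name, argument)
```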
394*22dc650dSSadaf Ebrahimi
395*22dc650dSSadaf Ebrahimi
396*22dc650dSSadaf EbrahimiMODIFIER SYNTAX
397*22dc650dSSadaf Ebrahimi
398*22dc650dSSadaf Ebrahimi       Modifier lists are used with both pattern and subject lines. Items in a
399*22dc650dSSadaf Ebrahimi       list are separated by commas followed by optional white space. Trailing
400*22dc650dSSadaf Ebrahimi       whitespace  in  a modifier list is ignored. Some modifiers may be given
401*22dc650dSSadaf Ebrahimi       for both patterns and subject lines, whereas others are valid only  for
402*22dc650dSSadaf Ebrahimi       one  or  the  other.  Each  modifier  has a long name, for example "an-
403*22dc650dSSadaf Ebrahimi       chored", and some of them must be followed by  an  equals  sign  and  a
404*22dc650dSSadaf Ebrahimi       value,  for  example,  "offset=12". Values cannot contain comma charac-
405*22dc650dSSadaf Ebrahimi       ters, but may contain spaces. Modifiers that do not take values may  be
406*22dc650dSSadaf Ebrahimi       preceded by a minus sign to turn off a previous setting.
407*22dc650dSSadaf Ebrahimi
408*22dc650dSSadaf Ebrahimi       A few of the more common modifiers can also be specified as single let-
409*22dc650dSSadaf Ebrahimi       ters,  for  example "i" for "caseless". In documentation, following the
410*22dc650dSSadaf Ebrahimi       Perl convention, these are written with a slash ("the /i modifier") for
411*22dc650dSSadaf Ebrahimi       clarity. Abbreviated modifiers must all be concatenated  in  the  first
412*22dc650dSSadaf Ebrahimi       item  of a modifier list. If the first item is not recognized as a long
413*22dc650dSSadaf Ebrahimi       modifier name, it is interpreted as a sequence of these  abbreviations.
414*22dc650dSSadaf Ebrahimi       For example:
415*22dc650dSSadaf Ebrahimi
416*22dc650dSSadaf Ebrahimi         /abc/ig,newline=cr,jit=3
417*22dc650dSSadaf Ebrahimi
418*22dc650dSSadaf Ebrahimi       This  is  a pattern line whose modifier list starts with two one-letter
419*22dc650dSSadaf Ebrahimi       modifiers (/i and /g). The lower-case  abbreviated  modifiers  are  the
420*22dc650dSSadaf Ebrahimi       same as used in Perl.
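       The parsing rules above can be illustrated with a minimal Python
       sketch. It is not pcre2test's actual parser: the function name, the
       LONG_NAMES set, and the ABBREVIATIONS table are hypothetical and
       cover only a few modifiers, and the long name "global" for /g is an
       assumption (the text above names only the abbreviation).

```python
# Hypothetical subsets of the real modifier tables, for illustration only.
LONG_NAMES = {"anchored", "caseless", "jit", "newline", "offset"}
ABBREVIATIONS = {"i": "caseless", "g": "global"}  # "global" is assumed


def parse_modifier_list(text):
    """Return a dict mapping long modifier names to True, False, or a value."""
    settings = {}
    # Items are separated by commas followed by optional white space;
    # trailing white space in the whole list is ignored.
    items = [item.lstrip() for item in text.rstrip().split(",")]
    for position, item in enumerate(items):
        name, equals, value = item.partition("=")
        # Only modifiers that do not take a value may be negated with "-".
        negated = name.startswith("-") and not equals
        if negated:
            name = name[1:]
        if name in LONG_NAMES:
            settings[name] = value if equals else not negated
        elif (position == 0 and name and not equals and not negated
              and all(ch in ABBREVIATIONS for ch in name)):
            # First item, not a long name: a run of one-letter abbreviations.
            for ch in name:
                settings[ABBREVIATIONS[ch]] = True
        else:
            raise ValueError("unrecognized modifier: " + repr(item))
    return settings
```

       With these assumed tables, parse_modifier_list("ig,newline=cr,jit=3")
       yields caseless and global set to True, newline set to "cr", and jit
       set to "3". Abbreviations anywhere other than the first item are
       rejected.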


PATTERN SYNTAX

       A  pattern line must start with one of the following characters (common
       symbols, excluding pattern meta-characters):

         / ! " ' ` - = _ : ; , % & @ ~

       This is interpreted as the pattern's delimiter.  A  regular  expression
       may  be  continued  over several input lines, in which case the newline
       characters are included within it. It is possible to include the delim-
       iter as a literal within the pattern by escaping it with  a  backslash,
       for example

         /abc\/def/

       If  you do this, the escape and the delimiter form part of the pattern,
       but since the delimiters are all non-alphanumeric, the inclusion of the
       backslash does not affect the pattern's interpretation. Note,  however,
       that this trick does not work within \Q...\E literal bracketing because
       the backslash will itself be interpreted as a literal. If the terminat-
       ing delimiter is immediately followed by a backslash, for example,

         /abc/\

       a backslash is added to the end of the pattern. This is done to provide
       a  way of testing the error condition that arises if a pattern finishes
       with a backslash, because

         /abc\/

       is interpreted as the first line of a pattern that starts with  "abc/",
       causing  pcre2test to read the next line as a continuation of the regu-
       lar expression.

       A pattern can be followed by a modifier list (details below).
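       As an illustration of the delimiter rules above, the following Python
       sketch splits one input line into a pattern and the text of its
       modifier list. The function name split_pattern_line is hypothetical,
       and the scanning loop only approximates the described behaviour; it
       is not taken from the pcre2test source. It returns None when no
       terminating delimiter is found, which corresponds to the case where
       the next line is read as a continuation.

```python
DELIMITERS = set("/!\"'`-=_:;,%&@~")


def split_pattern_line(line):
    """Split a pattern line into (pattern, modifier_text).

    Returns None if the terminating delimiter is not on this line.
    """
    if not line or line[0] not in DELIMITERS:
        raise ValueError("line does not start with a pattern delimiter")
    delimiter = line[0]
    i = 1
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            # A backslash escapes the next character (possibly the
            # delimiter); both characters remain part of the pattern.
            i += 2
        elif line[i] == delimiter:
            pattern, rest = line[1:i], line[i + 1:]
            if rest.startswith("\\"):
                # Delimiter immediately followed by a backslash: the
                # backslash is added to the end of the pattern.
                pattern, rest = pattern + "\\", rest[1:]
            return pattern, rest
        else:
            i += 1
    return None
```

       For the three examples above, the sketch keeps "abc\/def" as the
       pattern of the first, appends a backslash to "abc" for the second,
       and returns None for the third.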


SUBJECT LINE SYNTAX

       Before each subject line is passed to pcre2_match(), pcre2_dfa_match(),
       or pcre2_jit_match(), leading and trailing white space is removed,  and
       the  line  is scanned for backslash escapes, unless the subject_literal
       modifier was set for the pattern. The following provide a means of  en-
       coding non-printing characters in a visible way:

         \a         alarm (BEL, \x07)
         \b         backspace (\x08)
         \e         escape (\x1b)
         \f         form feed (\x0c)
         \n         newline (\x0a)
         \r         carriage return (\x0d)
         \t         tab (\x09)
         \v         vertical tab (\x0b)
         \nnn       octal character (up to 3 octal digits); always
                      a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
         \o{dd...}  octal character (any number of octal digits)
         \xhh       hexadecimal byte (up to 2 hex digits)
         \x{hh...}  hexadecimal character (any number of hex digits)

       The use of \x{hh...} is not dependent on the use of the utf modifier on
       the  pattern. It is always recognized. There may be any number of hexa-
       decimal digits inside the braces; invalid  values  provoke  error  mes-
       sages.

       Note  that  \xhh  specifies one byte rather than one character in UTF-8
       mode; this makes it possible to construct invalid UTF-8  sequences  for
       testing  purposes.  On the other hand, \x{hh} is interpreted as a UTF-8
       character in UTF-8 mode, generating more than one byte if the value  is
       greater  than  127.   When testing the 8-bit library not in UTF-8 mode,
       \x{hh} generates one byte for values less than 256, and causes an error
       for greater values.

       In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
       possible to construct invalid UTF-16 sequences for testing purposes.

       In UTF-32 mode, all 4- to 8-digit \x{...}  values  are  accepted.  This
       makes  it  possible  to  construct invalid UTF-32 sequences for testing
       purposes.
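       The difference between \xhh and \x{hh} in the 8-bit library can be
       sketched in Python as follows. This models only the byte-generation
       rules stated above; the function name and its parameters are
       hypothetical and do not correspond to anything in pcre2test itself.

```python
def encode_x_escape(value, braced, utf8_mode):
    """Return the bytes that a \\xhh or \\x{hh...} escape would contribute
    to an 8-bit subject, following the rules described in the text."""
    if not braced:
        # \xhh always specifies exactly one byte, even in UTF-8 mode.
        if value > 0xFF:
            raise ValueError("\\xhh takes at most two hex digits")
        return bytes([value])
    if utf8_mode:
        # \x{hh...} is a character, so it is encoded as UTF-8. Python
        # itself refuses surrogate code points here.
        return chr(value).encode("utf-8")
    if value > 0xFF:
        raise ValueError("value too large for a non-UTF 8-bit subject")
    return bytes([value])
```

       For the value 0xE9, the unbraced form gives the single byte E9 (not
       valid UTF-8 on its own), while the braced form in UTF-8 mode gives
       the two bytes C3 A9.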

       There is a special backslash sequence that specifies replication of one
       or more characters:

         \[<characters>]{<count>}

       This makes it possible to test long strings without having  to  provide
       them as part of the file. For example:

         \[abc]{4}

       is  converted to "abcabcabcabc". This feature does not support nesting.
       To include a closing square bracket in the characters, code it as \x5D.

       A backslash followed by an equals sign marks the  end  of  the  subject
       string and the start of a modifier list. For example:

         abc\=notbol,notempty

       If  the  subject  string is empty and \= is followed by whitespace, the
       line is treated as a comment line, and is not used  for  matching.  For
       example:

         \= This is a comment.
         abc\= This is an invalid modifier list.

       A  backslash  followed by any other non-alphanumeric character just es-
       capes that character. A backslash followed by anything else  causes  an
       error.  However,  if the very last character in the line is a backslash
       (and there is no modifier list), it is ignored. This  gives  a  way  of
       passing  an  empty line as data, since a real empty line terminates the
       data input.

       If the subject_literal modifier is set for a pattern, all subject lines
       that follow are treated as literals, with no special treatment of back-
       slashes.  No replication is possible, and any subject modifiers must be
       set as defaults by a #subject command.


PATTERN MODIFIERS

       There are several types of modifier that can appear in  pattern  lines.
       Except where noted below, they may also be used in #pattern commands. A
       pattern's  modifier  list can add to or override default modifiers that
       were set by a previous #pattern command.

   Setting compilation options

       The following modifiers set options for pcre2_compile(). Most  of  them
       set  bits  in  the  options  argument of that function, but those whose
       names start with PCRE2_EXTRA are additional options that are set in the
       compile context.  Some of these options  have  single-letter  abbrevia-
       tions.  There  is  special  handling  for /x: if a second x is present,
       PCRE2_EXTENDED is converted into  PCRE2_EXTENDED_MORE  as  in  Perl.  A
       third appearance adds PCRE2_EXTENDED as well, though this makes no dif-
       ference to the way pcre2_compile() behaves. See pcre2api for a descrip-
       tion of the effects of these options.

             allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
             allow_lookaround_bsk      set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
             allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
             alt_bsux                  set PCRE2_ALT_BSUX
             alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
             alt_verbnames             set PCRE2_ALT_VERBNAMES
             anchored                  set PCRE2_ANCHORED
         /a  ascii_all                 set all ASCII options
             ascii_bsd                 set PCRE2_EXTRA_ASCII_BSD
             ascii_bss                 set PCRE2_EXTRA_ASCII_BSS
             ascii_bsw                 set PCRE2_EXTRA_ASCII_BSW
             ascii_digit               set PCRE2_EXTRA_ASCII_DIGIT
             ascii_posix               set PCRE2_EXTRA_ASCII_POSIX
             auto_callout              set PCRE2_AUTO_CALLOUT
             bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
         /i  caseless                  set PCRE2_CASELESS
         /r  caseless_restrict         set PCRE2_EXTRA_CASELESS_RESTRICT
             dollar_endonly            set PCRE2_DOLLAR_ENDONLY
         /s  dotall                    set PCRE2_DOTALL
             dupnames                  set PCRE2_DUPNAMES
             endanchored               set PCRE2_ENDANCHORED
             escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF
         /x  extended                  set PCRE2_EXTENDED
         /xx extended_more             set PCRE2_EXTENDED_MORE
             extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX
             firstline                 set PCRE2_FIRSTLINE
             literal                   set PCRE2_LITERAL
             match_line                set PCRE2_EXTRA_MATCH_LINE
             match_invalid_utf         set PCRE2_MATCH_INVALID_UTF
             match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
             match_word                set PCRE2_EXTRA_MATCH_WORD
         /m  multiline                 set PCRE2_MULTILINE
             never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
             never_ucp                 set PCRE2_NEVER_UCP
             never_utf                 set PCRE2_NEVER_UTF
         /n  no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
             no_auto_possess           set PCRE2_NO_AUTO_POSSESS
             no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
             no_start_optimize         set PCRE2_NO_START_OPTIMIZE
             no_utf_check              set PCRE2_NO_UTF_CHECK
             ucp                       set PCRE2_UCP
             ungreedy                  set PCRE2_UNGREEDY
             use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
             utf                       set PCRE2_UTF

       As well as turning on the PCRE2_UTF option, the utf modifier causes all
       non-printing  characters  in  output  strings  to  be printed using the
       \x{hh...} notation. Otherwise, those less than 0x100 are output in  hex
       without  the  curly brackets. Setting utf in 16-bit or 32-bit mode also
       causes pattern and subject  strings  to  be  translated  to  UTF-16  or
       UTF-32, respectively, before being passed to library functions.

   Setting compilation controls

       The  following  modifiers affect the compilation process or request in-
       formation about the pattern. There are single-letter abbreviations  for
       some that are heavily used in the test files.

             bsr=[anycrlf|unicode]     specify \R handling
         /B  bincode                   show binary code without lengths
             callout_info              show callout information
             convert=<options>         request foreign pattern conversion
             convert_glob_escape=c     set glob escape character
             convert_glob_separator=c  set glob separator character
             convert_length            set convert buffer length
             debug                     same as info,fullbincode
             framesize                 show matching frame size
             fullbincode               show binary code with lengths
         /I  info                      show info about compiled pattern
             hex                       unquoted characters are hexadecimal
             jit[=<number>]            use JIT
             jitfast                   use JIT fast path
             jitverify                 verify JIT use
             locale=<name>             use this locale
             max_pattern_compiled      ) set maximum compiled pattern
                        _length=<n>    )   length (bytes)
             max_pattern_length=<n>    set maximum pattern length (code units)
             max_varlookbehind=<n>     set maximum variable lookbehind length
             memory                    show memory used
             newline=<type>            set newline type
             null_context              compile with a NULL context
             null_pattern              pass pattern as NULL
             parens_nest_limit=<n>     set maximum parentheses depth
             posix                     use the POSIX API
             posix_nosub               use the POSIX API with REG_NOSUB
             push                      push compiled pattern onto the stack
             pushcopy                  push a copy onto the stack
             stackguard=<number>       test the stackguard feature
             subject_literal           treat all subject lines as literal
             tables=[0|1|2|3]          select internal tables
             use_length                do not zero-terminate the pattern
             utf8_input                treat input as UTF-8

       The effects of these modifiers are described in the following sections.

   Newline and \R handling

       The  bsr modifier specifies what \R in a pattern should match. If it is
       set to "anycrlf", \R matches CR, LF, or CRLF only.  If  it  is  set  to
       "unicode",  \R matches any Unicode newline sequence. The default can be
       specified when PCRE2 is built; if it is not, the default is set to Uni-
       code.

       The newline modifier specifies which characters are to  be  interpreted
       as newlines, both in the pattern and in subject lines. The type must be
       one of CR, LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case).

   Information about a pattern

       The  debug modifier is a shorthand for info,fullbincode, requesting all
       available information.

       The bincode modifier causes a representation of the compiled code to be
       output after compilation. This information does not contain length  and
       offset values, which ensures that the same output is generated for dif-
       ferent  internal  link  sizes  and different code unit widths. By using
       bincode, the same regression tests can be used  in  different  environ-
       ments.

       The  fullbincode  modifier, by contrast, does include length and offset
       values. This is used in a few special tests that run only for  specific
       code unit widths and link sizes, and is also useful for one-off tests.

       The  info  modifier  requests  information  about  the compiled pattern
       (whether it is anchored, has a fixed first character, and so  on).  The
       information  is  obtained  from the pcre2_pattern_info() function. Here
       are some typical examples:

           re> /(?i)(^a|^b)/m,info
         Capture group count = 1
         Compile options: multiline
         Overall options: caseless multiline
         First code unit at start or follows newline
         Subject length lower bound = 1

           re> /(?i)abc/info
         Capture group count = 0
         Compile options: <none>
         Overall options: caseless
         First code unit = 'a' (caseless)
         Last code unit = 'c' (caseless)
         Subject length lower bound = 3

       "Compile options" are those specified by modifiers;  "overall  options"
       have  added options that are taken or deduced from the pattern. If both
       sets of options are the same, just a single "options" line  is  output;
       if  there  are  no  options,  the line is omitted. "First code unit" is
       where any match must start; if there is more than one they  are  listed
       as  "starting  code  units".  "Last code unit" is the last literal code
       unit that must be present in any match. This  is  not  necessarily  the
       last  character.  These lines are omitted if no starting or ending code
       units  are  recorded.  The  subject  length  line   is   omitted   when
       no_start_optimize  is  set because the minimum length is not calculated
       when it can never be used.

       The framesize modifier shows the size, in bytes, of each storage  frame
       used  by  pcre2_match()  for handling backtracking. The size depends on
       the number of capturing parentheses in the pattern. A vector  of  these
       frames  is  used  at  matching time; its overall size is shown when the
       heapframes_size subject modifier is set.

       The callout_info modifier requests information about all  the  callouts
       in the pattern. A list of them is output at the end of any other infor-
       mation that is requested. For each callout, either its number or string
       is given, followed by the item that follows it in the pattern.

   Passing a NULL context

       Normally,  pcre2test  passes a context block to pcre2_compile(). If the
       null_context modifier is set, however, NULL  is  passed.  This  is  for
       testing  that  pcre2_compile()  behaves correctly in this case (it uses
       default values).

   Passing a NULL pattern

       The null_pattern modifier is for testing the  behaviour  of  pcre2_com-
       pile()  when  the  pattern argument is NULL. The length value passed is
       the default PCRE2_ZERO_TERMINATED unless use_length is set.  Any length
       other than zero causes an error.

   Specifying pattern characters in hexadecimal

       The hex modifier specifies that the characters of the  pattern,  except
       for  substrings  enclosed  in single or double quotes, are to be inter-
       preted as pairs of hexadecimal digits. This feature is  provided  as  a
       way of creating patterns that contain binary zeros and other non-print-
       ing  characters.  White space is permitted between pairs of digits. For
       example, this pattern contains three characters:

         /ab 32 59/hex

       Parts of such a pattern are taken literally  if  quoted.  This  pattern
       contains  nine characters, only two of which are specified in hexadeci-
       mal:

         /ab "literal" 32/hex

       Either single or double quotes may be used. There is no way of  includ-
       ing  the delimiter within a substring. The hex and expand modifiers are
       mutually exclusive.
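       The decoding performed for the hex modifier can be modelled by the
       following Python sketch. It assumes the 8-bit library, so that each
       pair of digits becomes one byte, and it assumes that quoted text is
       limited to characters below 256; the function name is hypothetical
       and the code is not derived from the pcre2test source.

```python
def decode_hex_pattern(text):
    """Decode the body of a pattern that uses the hex modifier."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # White space is permitted between pairs of digits.
            i += 1
        elif ch in "'\"":
            # A quoted substring is taken literally up to the same quote.
            end = text.find(ch, i + 1)
            if end < 0:
                raise ValueError("unterminated quoted substring")
            out += text[i + 1:end].encode("latin-1")
            i = end + 1
        else:
            pair = text[i:i + 2]
            if len(pair) != 2:
                raise ValueError("odd number of hexadecimal digits")
            out += bytes.fromhex(pair)  # ValueError if not hex digits
            i += 2
    return bytes(out)
```

       Applied to the two examples above, the sketch produces three and
       nine bytes respectively, matching the character counts given in the
       text.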

   Specifying the pattern's length

       By default, patterns are passed to the compiling functions as zero-ter-
       minated strings. The use_length modifier causes a pattern to be  passed
       by  length instead. A length is also used automatically (whether or not
       use_length is set) when hex is set, because patterns specified in hexa-
       decimal may contain binary zeros.

       If hex or use_length is used with the POSIX wrapper API (see "Using the
       POSIX wrapper API" below), the REG_PEND extension is used to  pass  the
       pattern's length.

   Specifying a maximum for variable lookbehinds

       Variable  lookbehind  assertions  are  supported only if, for each one,
       there is a maximum length (in characters) that it can match. There is a
       limit on this, whose default can be set at build time, with an ultimate
       default   of   255.   The   max_varlookbehind   modifier    uses    the
       pcre2_set_max_varlookbehind() function to change the limit. Lookbehinds
       whose  branches  each match a fixed length are limited to 65535 charac-
       ters per branch.

   Specifying wide characters in 16-bit and 32-bit modes

       In 16-bit and 32-bit modes, all input is automatically treated as UTF-8
       and translated to UTF-16 or UTF-32 when the utf modifier  is  set.  For
       testing the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input
       modifier  can  be  used. It is mutually exclusive with utf. Input lines
       are interpreted as UTF-8 as a means of specifying wide characters. More
       details are given in "Input encoding" above.

   Generating long repetitive patterns

       Some tests use long patterns that are very repetitive. Instead of  cre-
       ating  a very long input line for such a pattern, you can use a special
       repetition feature, similar to the  one  described  for  subject  lines
       above.  If  the  expand  modifier is present on a pattern, parts of the
       pattern that have the form

         \[<characters>]{<count>}

       are expanded before the pattern is passed to pcre2_compile(). For exam-
       ple, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
       cannot be nested. An initial "\[" sequence is recognized only  if  "]{"
       followed  by  decimal  digits and "}" is found later in the pattern. If
       not, the characters remain in the pattern unaltered. The expand and hex
       modifiers are mutually exclusive.

       If part of an expanded pattern looks like an expansion, but  is  really
       part of the actual pattern, unwanted expansion can be avoided by giving
       two values in the quantifier. For example, \[AB]{6000,6000} is not rec-
       ognized as an expansion item.
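       The recognition rule for expansion items can be approximated with a
       regular expression, as in the Python sketch below. The function name
       and the exact expression are assumptions made for illustration;
       pcre2test's own scanning code may differ in detail, for example in
       how it treats nested or malformed items.

```python
import re

# "\[" followed by characters other than "]", then "]{", decimal digits,
# and "}". A quantifier with two values, such as {3,3}, does not match.
EXPANSION_ITEM = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")


def expand_pattern(pattern):
    """Replace each \\[<characters>]{<count>} item by its repetition."""
    return EXPANSION_ITEM.sub(
        lambda m: m.group(1) * int(m.group(2)), pattern)
```

       Under this approximation, a\[AB]{3}b becomes aABABABb, whereas
       \[AB]{3,3} is left unaltered and reaches the compiler as an escaped
       bracket, two letters, a closing bracket, and an ordinary quantifier.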
813*22dc650dSSadaf Ebrahimi
814*22dc650dSSadaf Ebrahimi       If  the  info modifier is set on an expanded pattern, the result of the
815*22dc650dSSadaf Ebrahimi       expansion is included in the information that is output.
816*22dc650dSSadaf Ebrahimi
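       The expansion rule above can be approximated in a few lines of  Python.
       This is a minimal sketch, not the pcre2test implementation:  the  name
       expand_pattern and the regular expression used to locate items are  as-
       sumptions made for illustration, and the real scanner may treat unusual
       input differently.

```python
import re

# Assumed approximation of an expansion item: \[<characters>]{<count>}.
# A quantifier with two values, such as {3,3}, does not fit this shape.
_EXPAND_ITEM = re.compile(r"\\\[(.*?)\]\{(\d+)\}")


def expand_pattern(pattern: str) -> str:
    """Replace each expansion item by its characters repeated count times.

    Text that does not form a complete item is returned unaltered.
    """
    return _EXPAND_ITEM.sub(lambda m: m.group(1) * int(m.group(2)), pattern)


print(expand_pattern(r"x\[AB]{3}y"))   # the item is expanded
print(expand_pattern(r"\[AB]{3,3}"))   # two values: left as it is
```

       The second call shows the escape route described above: giving two val-
       ues in the quantifier prevents the text from being treated as an expan-
       sion item.
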
   JIT compilation

       Just-in-time (JIT) compiling is a  heavyweight  optimization  that  can
       greatly  speed  up pattern matching. See the pcre2jit documentation for
       details. JIT compiling happens, optionally, after a  pattern  has  been
       successfully  compiled into an internal form. The JIT compiler converts
       this to optimized machine code. It needs to know whether the match-time
       options PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used,
       because different code is generated for the different  cases.  See  the
       partial  modifier in "Subject Modifiers" below for details of how these
       options are specified for each match attempt.

       JIT compilation is requested by the jit pattern modifier, which may op-
       tionally be followed by an equals sign and a number in the range  0  to
       7.   The  three bits that make up the number specify which of the three
       JIT operating modes are to be compiled:

         1  compile JIT code for non-partial matching
         2  compile JIT code for soft partial matching
         4  compile JIT code for hard partial matching

       The possible values for the jit modifier are therefore:

         0  disable JIT
         1  normal matching only
         2  soft partial matching only
         3  normal and soft partial matching
         4  hard partial matching only
         5  normal and hard partial matching
         6  soft and hard partial matching only
         7  all three modes

       If no number is given, 7 is  assumed.  The  phrase  "partial  matching"
       means a call to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the
       PCRE2_PARTIAL_HARD  option set. Note that such a call may return a com-
       plete match; the options enable the possibility of a partial match, but
       do not require it. Note also that if you request JIT  compilation  only
       for  partial  matching  (for example, jit=2) but do not set the partial
       modifier on a subject line, that match will not use  JIT  code  because
       none was compiled for non-partial matching.

       If  JIT compilation is successful, the compiled JIT code will automati-
       cally be used when an appropriate type of match is run, except when in-
       compatible run-time options are specified. For more  details,  see  the
       pcre2jit  documentation. See also the jitstack modifier below for a way
       of setting the size of the JIT stack.

       If the jitfast modifier is specified, matching is done  using  the  JIT
       "fast  path" interface, pcre2_jit_match(), which skips some of the san-
       ity checks that are done by pcre2_match(), and of course does not  work
       when  JIT  is not supported. If jitfast is specified without jit, jit=7
       is assumed.

       If the jitverify modifier is specified, information about the  compiled
       pattern  shows  whether  JIT  compilation was or was not successful. If
       jitverify is specified without jit, jit=7 is assumed. If  JIT  compila-
       tion  is successful when jitverify is set, the text "(JIT)" is added to
       the first output line after a match or non-match when JIT-compiled code
       was actually used in the match.

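       The relationship between the jit value and the kind of match  that  can
       use JIT code can be sketched as follows. The constant and function names
       are labels chosen for this illustration, and the sketch deliberately ig-
       nores the case of incompatible run-time options mentioned above.

```python
# Bit values of the three JIT operating modes, as listed for the jit modifier.
JIT_COMPLETE = 1       # non-partial matching
JIT_PARTIAL_SOFT = 2   # soft partial matching
JIT_PARTIAL_HARD = 4   # hard partial matching


def compiled_modes(jit_value: int = 7) -> set:
    """Return the set of mode bits selected by a jit value (default 7)."""
    if not 0 <= jit_value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    return {bit for bit in (JIT_COMPLETE, JIT_PARTIAL_SOFT, JIT_PARTIAL_HARD)
            if jit_value & bit}


def match_can_use_jit(jit_value: int, partial=None) -> bool:
    """Report whether JIT code exists for this type of match.

    partial is None for a non-partial match, or "soft" or "hard".
    """
    needed = {None: JIT_COMPLETE,
              "soft": JIT_PARTIAL_SOFT,
              "hard": JIT_PARTIAL_HARD}[partial]
    return needed in compiled_modes(jit_value)


print(match_can_use_jit(2))           # jit=2, no partial modifier on subject
print(match_can_use_jit(2, "soft"))   # jit=2 with a soft partial match
```

       With jit=2 only soft partial code is compiled, so a  subject  line  with
       no partial modifier falls back to the interpreter.
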
   Setting a locale

       The locale modifier must specify the name of a locale, for example:

         /pattern/locale=fr_FR

       The given locale is set, pcre2_maketables() is called to build a set of
       character tables for the locale, and this is then passed to  pcre2_com-
       pile()  when compiling the regular expression. The same tables are used
       when matching the following subject lines. The locale modifier  applies
       only to the pattern on which it appears, but can be given in a #pattern
       command  if a default is needed. Setting a locale and alternate charac-
       ter tables are mutually exclusive.

   Showing pattern memory

       The memory modifier causes the size in bytes of the memory used to hold
       the compiled pattern to be output. This does not include  the  size  of
       the  pcre2_code block; it is just the actual compiled data. If the pat-
       tern is subsequently passed to the JIT compiler, the size  of  the  JIT
       compiled code is also output. Here is an example:

           re> /a(b)c/jit,memory
         Memory allocation (code space): 21
         Memory allocation (JIT code): 1910


   Limiting nested parentheses

       The  parens_nest_limit  modifier  sets  a  limit on the depth of nested
       parentheses in a pattern. Breaching the limit causes a compilation  er-
       ror.   The  default  for  the  library  is set when PCRE2 is built, but
       pcre2test sets its own default of 220, which is  required  for  running
       the standard test suite.

   Limiting the pattern length

       The  max_pattern_length  modifier  sets  a limit, in code units, to the
       length of pattern that pcre2_compile() will accept. Breaching the limit
       causes a compilation  error.  The  default  is  the  largest  number  a
       PCRE2_SIZE variable can hold (essentially unlimited).

   Limiting the size of a compiled pattern

       The max_pattern_compiled_length modifier sets a limit, in bytes, to the
       amount of memory used by a compiled pattern. Breaching the limit causes
       a  compilation  error.  The  default is the largest number a PCRE2_SIZE
       variable can hold (essentially unlimited).

   Using the POSIX wrapper API

       The posix and posix_nosub modifiers cause pcre2test to call  PCRE2  via
       the  POSIX  wrapper API rather than its native API. When posix_nosub is
       used, the POSIX option REG_NOSUB is  passed  to  regcomp().  The  POSIX
       wrapper  supports  only  the 8-bit library. Note that it does not imply
       POSIX matching semantics; for more detail see the pcre2posix documenta-
       tion. The following pattern modifiers set  options  for  the  regcomp()
       function:

         caseless           REG_ICASE
         multiline          REG_NEWLINE
         dotall             REG_DOTALL     )
         ungreedy           REG_UNGREEDY   ) These options are not part of
         ucp                REG_UCP        )   the POSIX standard
         utf                REG_UTF        )

       The  regerror_buffsize  modifier  specifies a size for the error buffer
       that is passed to regerror() in the event of a compilation  error.  For
       example:

         /abc/posix,regerror_buffsize=20

       This  provides  a means of testing the behaviour of regerror() when the
       buffer is too small for the error message. If  this  modifier  has  not
       been set, a large buffer is used.

       The  aftertext and allaftertext subject modifiers work as described be-
       low. All other modifiers are either ignored, with a warning message, or
       cause an error.

       The pattern is passed to regcomp() as a zero-terminated string  by  de-
       fault, but if the use_length or hex modifiers are set, the REG_PEND ex-
       tension is used to pass it by length.

   Testing the stack guard feature

       The  stackguard  modifier  is  used  to  test the use of pcre2_set_com-
       pile_recursion_guard(), a function that is  provided  to  enable  stack
       availability  to  be checked during compilation (see the pcre2api docu-
       mentation for details). If the number  specified  by  the  modifier  is
       greater than zero, pcre2_set_compile_recursion_guard() is called to set
       up a callback from pcre2_compile() to a local function. The argument it
       receives is the current nesting parenthesis depth; if this  is  greater
       than the value given by the modifier, non-zero is returned, causing the
       compilation to be aborted.

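       The behaviour of the guard can be modelled  in  Python.  The  following
       sketch is an illustration only: make_guard and toy_compile are hypothet-
       ical names, and toy_compile is a stand-in that merely counts  parenthe-
       ses (it ignores escapes and character classes); it is not how PCRE2 com-
       piles a pattern.

```python
def make_guard(limit: int):
    """Build a guard callback for a stackguard value, or None if limit <= 0.

    The callback receives the current parenthesis nesting depth and returns
    non-zero when the depth exceeds the limit.
    """
    if limit <= 0:
        return None

    def guard(depth: int) -> int:
        return 1 if depth > limit else 0

    return guard


def toy_compile(pattern: str, guard=None) -> bool:
    """Toy stand-in for compilation: consult the guard at each '('.

    Returns False if the guard aborts the compilation, True otherwise.
    """
    depth = 0
    for ch in pattern:
        if ch == "(":
            depth += 1
            if guard is not None and guard(depth) != 0:
                return False   # guard returned non-zero: abort
        elif ch == ")":
            depth -= 1
    return True


print(toy_compile("((a))", make_guard(2)))     # depth 2 is within the limit
print(toy_compile("(((a)))", make_guard(2)))   # depth 3 exceeds the limit
```
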
   Using alternative character tables

       The  value  specified for the tables modifier must be one of the digits
       0, 1, 2, or 3. It causes a specific set of built-in character tables to
       be passed to pcre2_compile(). This is used in the PCRE2 tests to  check
       behaviour  with different character tables. The digit specifies the ta-
       bles as follows:

         0   do not pass any special character tables
         1   the default ASCII tables, as distributed in
               pcre2_chartables.c.dist
         2   a set of tables defining ISO 8859 characters
         3   a set of tables loaded by the #loadtables command

       In tables 2, some characters whose codes are greater than 128 are iden-
       tified as letters, digits, spaces, etc. Tables 3 can be used only after
       a #loadtables command has loaded them from a binary file.  Setting  al-
       ternate character tables and a locale are mutually exclusive.

   Setting certain match controls

       The following modifiers are really subject modifiers, and are described
       under  "Subject  Modifiers"  below.  However, they may be included in a
       pattern's modifier list, in which case they are applied to  every  sub-
       ject  line  that is processed with that pattern. These modifiers do not
       affect the compilation process.

             aftertext                   show text after match
             allaftertext                show text after captures
             allcaptures                 show all captures
             allvector                   show the entire ovector
             allusedtext                 show all consulted text
             altglobal                   alternative global matching
         /g  global                      global matching
             heapframes_size             show match data heapframes size
             jitstack=<n>                set size of JIT stack
             mark                        show mark values
             replace=<string>            specify a replacement string
             startchar                   show starting character when relevant
             substitute_callout          use substitution callouts
             substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
             substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
             substitute_matched          use PCRE2_SUBSTITUTE_MATCHED
             substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
             substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
             substitute_skip=<n>         skip substitution <n>
             substitute_stop=<n>         skip substitution <n> and following
             substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
             substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY

       These modifiers may not appear in a #pattern command. If you want  them
       as defaults, set them in a #subject command.

   Specifying literal subject lines

       If  the  subject_literal modifier is present on a pattern, all the sub-
       ject lines that it matches are taken as literal strings, with no inter-
       pretation of backslashes. It is not possible to set  subject  modifiers
       on  such  lines, but any that are set as defaults by a #subject command
       are recognized.

   Saving a compiled pattern

       When a pattern with the push modifier is successfully compiled,  it  is
       pushed  onto  a  stack  of compiled patterns, and pcre2test expects the
       next line to contain a new pattern (or a command) instead of a  subject
       line. This facility is used when saving compiled patterns to a file, as
       described  in  the section entitled "Saving and restoring compiled pat-
       terns" below. If pushcopy is used instead of push, a copy of  the  com-
       piled  pattern  is  stacked,  leaving the original as current, ready to
       match the following input lines. This provides a  way  of  testing  the
       pcre2_code_copy() function. The push and pushcopy modifiers are  incom-
       patible with pattern modifiers such as global that act at  match  time.
       Any that are specified are ignored (for the stacked copy), with a warn-
       ing message, except for replace, which causes an error. Note that
       jitverify, which is allowed, does not carry through to any subsequent
       matching that uses a stacked pattern.

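       The difference between push and pushcopy can be pictured with  a  small
       model. This is a sketch under stated assumptions: PatternStack  and  its
       attributes are hypothetical names, a dictionary stands in for a compiled
       pattern, and clearing the current pattern after push is only a  way  of
       modelling the fact that no subject line is expected next.

```python
import copy


class PatternStack:
    """Toy model of the stack of compiled patterns (names are hypothetical)."""

    def __init__(self):
        self.stack = []       # patterns saved by push or pushcopy
        self.current = None   # pattern that following subject lines would use

    def compile(self, source, push=False, pushcopy=False):
        compiled = {"source": source}   # stand-in for a compiled pattern
        if push:
            # The pattern itself is stacked; nothing is left to match with.
            self.stack.append(compiled)
            self.current = None
        elif pushcopy:
            # A copy is stacked; the original stays current.
            self.stack.append(copy.deepcopy(compiled))
            self.current = compiled
        else:
            self.current = compiled
        return self.current


patterns = PatternStack()
patterns.compile("abc", push=True)
patterns.compile("def", pushcopy=True)
print(len(patterns.stack), patterns.current)
```
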
   Testing foreign pattern conversion

       The  experimental  foreign pattern conversion functions in PCRE2 can be
       tested by setting the convert modifier. Its argument is  a  colon-sepa-
       rated  list  of  options,  which  set  the  equivalent  option  for the
       pcre2_pattern_convert() function:

         glob                    PCRE2_CONVERT_GLOB
         glob_no_starstar        PCRE2_CONVERT_GLOB_NO_STARSTAR
         glob_no_wild_separator  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
         posix_basic             PCRE2_CONVERT_POSIX_BASIC
         posix_extended          PCRE2_CONVERT_POSIX_EXTENDED
         unset                   Unset all options

       The "unset" value is useful for turning off a default that has been set
       by a #pattern command. When one of these options is set, the input pat-
       tern is passed to pcre2_pattern_convert(). If the  conversion  is  suc-
       cessful,  the  result  is  reflected  in  the output and then passed to
       pcre2_compile(). The normal utf and no_utf_check options, if set, cause
       the PCRE2_CONVERT_UTF  and  PCRE2_CONVERT_NO_UTF_CHECK  options  to  be
       passed to pcre2_pattern_convert().

       By default, the conversion function is allowed to allocate a buffer for
       its  output.  However, if the convert_length modifier is set to a value
       greater than zero, pcre2test passes a buffer of the given length.  This
       makes it possible to test the length check.

       The  convert_glob_escape  and  convert_glob_separator  modifiers can be
       used to specify the escape and separator characters for  glob  process-
       ing, overriding the defaults, which are operating-system dependent.


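       The way a colon-separated convert argument selects  options  can  be
       sketched as below. The function name parse_convert, the use of a set of
       option names rather than bit flags, and the left-to-right  handling  of
       "unset" are assumptions of this sketch, not a description of  the  pcre2-
       test source.

```python
# Mapping from convert argument names to the option names in the table above.
CONVERT_OPTIONS = {
    "glob": "PCRE2_CONVERT_GLOB",
    "glob_no_starstar": "PCRE2_CONVERT_GLOB_NO_STARSTAR",
    "glob_no_wild_separator": "PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR",
    "posix_basic": "PCRE2_CONVERT_POSIX_BASIC",
    "posix_extended": "PCRE2_CONVERT_POSIX_EXTENDED",
}


def parse_convert(argument: str, defaults=frozenset()) -> set:
    """Return the option names selected by a convert=<argument> modifier.

    defaults models options already set by a #pattern command.
    """
    options = set(defaults)
    for name in argument.split(":"):
        if name == "unset":
            options.clear()          # turn off everything set so far
        elif name in CONVERT_OPTIONS:
            options.add(CONVERT_OPTIONS[name])
        else:
            raise ValueError(f"unknown convert option: {name}")
    return options


print(sorted(parse_convert("glob:glob_no_starstar")))
print(parse_convert("unset", defaults={"PCRE2_CONVERT_GLOB"}))
```
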
SUBJECT MODIFIERS

       The modifiers that can appear in subject lines and the #subject command
       are of two types.

   Setting match options

       The    following   modifiers   set   options   for   pcre2_match()   or
       pcre2_dfa_match(). See pcre2api for a description of their effects.

             anchored                   set PCRE2_ANCHORED
             endanchored                set PCRE2_ENDANCHORED
             dfa_restart                set PCRE2_DFA_RESTART
             dfa_shortest               set PCRE2_DFA_SHORTEST
             disable_recurseloop_check  set PCRE2_DISABLE_RECURSELOOP_CHECK
             no_jit                     set PCRE2_NO_JIT
             no_utf_check               set PCRE2_NO_UTF_CHECK
             notbol                     set PCRE2_NOTBOL
             notempty                   set PCRE2_NOTEMPTY
             notempty_atstart           set PCRE2_NOTEMPTY_ATSTART
             noteol                     set PCRE2_NOTEOL
             partial_hard (or ph)       set PCRE2_PARTIAL_HARD
             partial_soft (or ps)       set PCRE2_PARTIAL_SOFT

       The partial matching modifiers are provided with abbreviations  because
       they appear frequently in tests.

       If  the posix or posix_nosub modifier was present on the pattern, caus-
       ing the POSIX wrapper API to be used, the only option-setting modifiers
       that have any effect are notbol, notempty, and noteol, causing REG_NOT-
       BOL, REG_NOTEMPTY,  and  REG_NOTEOL,  respectively,  to  be  passed  to
       regexec(). The other modifiers are ignored, with a warning message.

       There  is one additional modifier that can be used with the POSIX wrap-
       per. It is ignored (with a warning) if used for non-POSIX matching.

             posix_startend=<n>[:<m>]

       This causes the subject string to be  passed  to  regexec()  using  the
       REG_STARTEND  option,  which  uses offsets to specify which part of the
       string is searched. If only one number is  given,  the  end  offset  is
       passed  as  the end of the subject string. For more detail of REG_STAR-
       TEND, see the pcre2posix documentation. If the subject string  contains
       binary  zeros  (coded  as escapes such as \x{00} because pcre2test does
       not support actual binary zeros in its input), you must use posix_star-
       tend to specify its length.

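       The rule for the two numbers can be written out as a short Python func-
       tion. It is a sketch only: startend_offsets is  a  hypothetical  helper
       that shows how <n> and the optional <m> select part of the subject, and
       it does not call regexec().

```python
def startend_offsets(value: str, subject: bytes):
    """Turn the text after posix_startend= into (start, end) offsets.

    value is "<n>" or "<n>:<m>". When only <n> is given, the end offset is
    the end of the subject string.
    """
    start_text, _, end_text = value.partition(":")
    start = int(start_text)
    end = int(end_text) if end_text else len(subject)
    return start, end


subject = b"ab\x00cdefgh"            # contains a binary zero
start, end = startend_offsets("1", subject)
print(start, end, subject[start:end])
start, end = startend_offsets("2:5", subject)
print(start, end, subject[start:end])
```

       Because the offsets carry the length explicitly, a binary zero  in  the
       subject does not end the part that is searched, which is why the modi-
       fier is needed for such subjects.
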
   Setting match controls

       The following modifiers affect the matching process  or  request  addi-
       tional  information.  Some  of  them may also be specified on a pattern
       line (see above), in which case they apply to every subject  line  that
       is  matched against that pattern, but can be overridden by modifiers on
       the subject.

             aftertext                   show text after match
             allaftertext                show text after captures
             allcaptures                 show all captures
             allvector                   show the entire ovector
             allusedtext                 show all consulted text (non-JIT only)
             altglobal                   alternative global matching
             callout_capture             show captures at callout time
             callout_data=<n>            set a value to pass via callouts
             callout_error=<n>[:<m>]     control callout error
             callout_extra               show extra callout information
             callout_fail=<n>[:<m>]      control callout failure
             callout_no_where            do not show position of a callout
             callout_none                do not supply a callout function
             copy=<number or name>       copy captured substring
             depth_limit=<n>             set a depth limit
             dfa                         use pcre2_dfa_match()
             find_limits                 find heap, match and depth limits
             find_limits_noheap          find match and depth limits
             get=<number or name>        extract captured substring
             getall                      extract all captured substrings
         /g  global                      global matching
             heapframes_size             show match data heapframes size
             heap_limit=<n>              set a limit on heap memory (Kbytes)
             jitstack=<n>                set size of JIT stack
             mark                        show mark values
             match_limit=<n>             set a match limit
             memory                      show heap memory usage
             null_context                match with a NULL context
             null_replacement            substitute with NULL replacement
             null_subject                match with NULL subject
             offset=<n>                  set starting offset
             offset_limit=<n>            set offset limit
             ovector=<n>                 set size of output vector
             recursion_limit=<n>         obsolete synonym for depth_limit
             replace=<string>            specify a replacement string
             startchar                   show startchar when relevant
             startoffset=<n>             same as offset=<n>
             substitute_callout          use substitution callouts
             substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
             substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
             substitute_matched          use PCRE2_SUBSTITUTE_MATCHED
             substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
             substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
             substitute_skip=<n>         skip substitution number n
             substitute_stop=<n>         skip substitution number n and greater
             substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
             substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY
             zero_terminate              pass the subject as zero-terminated

       The effects of these modifiers are described in the following sections.
       When matching via the POSIX wrapper API, the  aftertext,  allaftertext,
       and  ovector subject modifiers work as described below. All other modi-
       fiers are either ignored, with a warning message, or cause an error.

1191*22dc650dSSadaf Ebrahimi   Showing more text
1192*22dc650dSSadaf Ebrahimi
1193*22dc650dSSadaf Ebrahimi       The aftertext modifier requests that as well as outputting the part  of
1194*22dc650dSSadaf Ebrahimi       the subject string that matched the entire pattern, pcre2test should in
1195*22dc650dSSadaf Ebrahimi       addition output the remainder of the subject string. This is useful for
1196*22dc650dSSadaf Ebrahimi       tests where the subject contains multiple copies of the same substring.
1197*22dc650dSSadaf Ebrahimi       The  allaftertext  modifier  requests the same action for captured sub-
1198*22dc650dSSadaf Ebrahimi       strings as well as the main matched substring. In each case the remain-
1199*22dc650dSSadaf Ebrahimi       der is output on the following line with a plus character following the
1200*22dc650dSSadaf Ebrahimi       capture number.
1201*22dc650dSSadaf Ebrahimi
1202*22dc650dSSadaf Ebrahimi       The allusedtext modifier requests that all the text that was  consulted
1203*22dc650dSSadaf Ebrahimi       during  a  successful pattern match by the interpreter should be shown,
1204*22dc650dSSadaf Ebrahimi       for both full and partial matches. This feature is  not  supported  for
1205*22dc650dSSadaf Ebrahimi       JIT  matching,  and if requested with JIT it is ignored (with a warning
1206*22dc650dSSadaf Ebrahimi       message). Setting this modifier affects the output if there is a  look-
1207*22dc650dSSadaf Ebrahimi       behind  at  the start of a match, or, for a complete match, a lookahead
1208*22dc650dSSadaf Ebrahimi       at the end, or if \K is used in the pattern. Characters that precede or
1209*22dc650dSSadaf Ebrahimi       follow the start and end of the actual match are indicated in the  out-
1210*22dc650dSSadaf Ebrahimi       put by '<' or '>' characters underneath them.  Here is an example:
1211*22dc650dSSadaf Ebrahimi
1212*22dc650dSSadaf Ebrahimi           re> /(?<=pqr)abc(?=xyz)/
1213*22dc650dSSadaf Ebrahimi         data> 123pqrabcxyz456\=allusedtext
1214*22dc650dSSadaf Ebrahimi          0: pqrabcxyz
1215*22dc650dSSadaf Ebrahimi             <<<   >>>
1216*22dc650dSSadaf Ebrahimi         data> 123pqrabcxy\=ph,allusedtext
1217*22dc650dSSadaf Ebrahimi         Partial match: pqrabcxy
1218*22dc650dSSadaf Ebrahimi                        <<<
1219*22dc650dSSadaf Ebrahimi
1220*22dc650dSSadaf Ebrahimi       The  first, complete match shows that the matched string is "abc", with
1221*22dc650dSSadaf Ebrahimi       the preceding and following strings "pqr" and "xyz"  having  been  con-
1222*22dc650dSSadaf Ebrahimi       sulted  during  the match (when processing the assertions). The partial
1223*22dc650dSSadaf Ebrahimi       match can indicate only the preceding string.
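
       The layout of the marker line can be sketched in Python as follows.
       This is only an illustrative model, not pcre2test's own code: the
       function name and the four offset parameters are assumptions, standing
       for the earliest consulted offset, the match start, the match end, and
       the offset just past the last consulted character.

```python
def allusedtext_display(subject, leftmost, start, end, rightmost):
    """Return the displayed text and its marker line.

    subject[leftmost:rightmost] is everything that was consulted;
    subject[start:end] is the actual match. All names are hypothetical.
    """
    shown = subject[leftmost:rightmost]
    markers = (
        "<" * (start - leftmost)      # consulted before the match
        + " " * (end - start)         # the match itself is left unmarked
        + ">" * (rightmost - end)     # consulted after the match
    )
    return shown, markers.rstrip()


# Complete match from the example above: "abc" lies at offsets 6..9.
print(allusedtext_display("123pqrabcxyz456", 3, 6, 9, 12))
# Partial match: nothing beyond the end of the subject could be consulted.
print(allusedtext_display("123pqrabcxy", 3, 6, 11, 11))
```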
1224*22dc650dSSadaf Ebrahimi
1225*22dc650dSSadaf Ebrahimi       The startchar modifier requests that the  starting  character  for  the
1226*22dc650dSSadaf Ebrahimi       match  be  indicated,  if  it  is different to the start of the matched
1227*22dc650dSSadaf Ebrahimi       string. The only time when this occurs is when \K has been processed as
1228*22dc650dSSadaf Ebrahimi       part of the match. In this situation, the output for the matched string
1229*22dc650dSSadaf Ebrahimi       is displayed from the starting character  instead  of  from  the  match
1230*22dc650dSSadaf Ebrahimi       point, with circumflex characters under the earlier characters. For ex-
1231*22dc650dSSadaf Ebrahimi       ample:
1232*22dc650dSSadaf Ebrahimi
1233*22dc650dSSadaf Ebrahimi           re> /abc\Kxyz/
1234*22dc650dSSadaf Ebrahimi         data> abcxyz\=startchar
1235*22dc650dSSadaf Ebrahimi          0: abcxyz
1236*22dc650dSSadaf Ebrahimi             ^^^
1237*22dc650dSSadaf Ebrahimi
1238*22dc650dSSadaf Ebrahimi       Unlike  allusedtext, the startchar modifier can be used with JIT.  How-
1239*22dc650dSSadaf Ebrahimi       ever, these two modifiers are mutually exclusive.
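
       As a rough illustration of the startchar display, the following Python
       sketch builds the text and the circumflex line from three offsets. The
       function and parameter names are assumptions made for this example and
       do not come from pcre2test.

```python
def startchar_display(subject, startchar, start, end):
    """Return the displayed text and the circumflex marker line.

    startchar is where matching began; start and end delimit the reported
    match (start is later than startchar only when \\K was processed).
    """
    shown = subject[startchar:end]
    markers = "^" * (start - startchar)
    return shown, markers


# /abc\Kxyz/ against "abcxyz": matching began at 0, the match is 3..6.
print(startchar_display("abcxyz", 0, 3, 6))
```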
1240*22dc650dSSadaf Ebrahimi
1241*22dc650dSSadaf Ebrahimi   Showing the value of all capture groups
1242*22dc650dSSadaf Ebrahimi
1243*22dc650dSSadaf Ebrahimi       The allcaptures modifier requests that the values of all potential cap-
1244*22dc650dSSadaf Ebrahimi       tured parentheses be output after a match. By default, only those up to
1245*22dc650dSSadaf Ebrahimi       the highest one actually used in the match are output (corresponding to
1246*22dc650dSSadaf Ebrahimi       the return code from pcre2_match()). Groups that did not take  part  in
1247*22dc650dSSadaf Ebrahimi       the  match  are  output as "<unset>". This modifier is not relevant for
1248*22dc650dSSadaf Ebrahimi       DFA matching (which does no capturing) and does not apply when  replace
1249*22dc650dSSadaf Ebrahimi       is specified; it is ignored, with a warning message, if present.
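
       The difference between the default listing and allcaptures can be
       modelled with Python's re module, which is used here only as a
       stand-in for PCRE2. The helper name capture_lines is hypothetical.

```python
import re


def capture_lines(match, allcaptures=False):
    """List captured substrings in a pcre2test-like layout."""
    total = match.re.groups
    # Highest-numbered group that took part in the match (0 if none did).
    highest = max(
        (i for i in range(1, total + 1) if match.group(i) is not None),
        default=0,
    )
    last = total if allcaptures else highest
    lines = [f" 0: {match.group(0)}"]
    for i in range(1, last + 1):
        value = match.group(i)
        lines.append(f"{i:2d}: {value if value is not None else '<unset>'}")
    return lines


m = re.search(r"(a)|(b)", "a")
print(capture_lines(m))                    # stops at group 1
print(capture_lines(m, allcaptures=True))  # also reports group 2
```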
1250*22dc650dSSadaf Ebrahimi
1251*22dc650dSSadaf Ebrahimi   Showing the entire ovector, for all outcomes
1252*22dc650dSSadaf Ebrahimi
1253*22dc650dSSadaf Ebrahimi       The allvector modifier requests that the entire ovector be shown, what-
1254*22dc650dSSadaf Ebrahimi       ever the outcome of the match. Compare allcaptures, which shows only up
1255*22dc650dSSadaf Ebrahimi       to  the maximum number of capture groups for the pattern, and then only
1256*22dc650dSSadaf Ebrahimi       for a successful complete non-DFA match. This modifier, which acts  af-
1257*22dc650dSSadaf Ebrahimi       ter  any  match  result, and also for DFA matching, provides a means of
1258*22dc650dSSadaf Ebrahimi       checking that there are no unexpected modifications to ovector  fields.
1259*22dc650dSSadaf Ebrahimi       Before  each match attempt, the ovector is filled with a special value,
1260*22dc650dSSadaf Ebrahimi       and if this is found in  both  elements  of  a  capturing  pair,  "<un-
1261*22dc650dSSadaf Ebrahimi       changed>"  is  output.  After  a  successful match, this applies to all
1262*22dc650dSSadaf Ebrahimi       groups after the maximum capture group for the pattern. In other  cases
1263*22dc650dSSadaf Ebrahimi       it  applies to the entire ovector. After a partial match, the first two
1264*22dc650dSSadaf Ebrahimi       elements are the only ones that should be set. After a DFA  match,  the
1265*22dc650dSSadaf Ebrahimi       amount  of  ovector  that is used depends on the number of matches that
1266*22dc650dSSadaf Ebrahimi       were found.
1267*22dc650dSSadaf Ebrahimi
1268*22dc650dSSadaf Ebrahimi   Testing pattern callouts
1269*22dc650dSSadaf Ebrahimi
1270*22dc650dSSadaf Ebrahimi       A callout function is supplied when pcre2test calls the library  match-
1271*22dc650dSSadaf Ebrahimi       ing  functions,  unless callout_none is specified. Its behaviour can be
1272*22dc650dSSadaf Ebrahimi       controlled by various modifiers listed above  whose  names  begin  with
1273*22dc650dSSadaf Ebrahimi       callout_.  Details  are given in the section entitled "Callouts" below.
1274*22dc650dSSadaf Ebrahimi       Testing callouts from pcre2_substitute()  is  described  separately  in
1275*22dc650dSSadaf Ebrahimi       "Testing the substitution function" below.
1276*22dc650dSSadaf Ebrahimi
1277*22dc650dSSadaf Ebrahimi   Finding all matches in a string
1278*22dc650dSSadaf Ebrahimi
1279*22dc650dSSadaf Ebrahimi       Searching for all possible matches within a subject can be requested by
1280*22dc650dSSadaf Ebrahimi       the  global  or altglobal modifier. After finding a match, the matching
1281*22dc650dSSadaf Ebrahimi       function is called again to search the remainder of  the  subject.  The
1282*22dc650dSSadaf Ebrahimi       difference  between  global  and  altglobal is that the former uses the
1283*22dc650dSSadaf Ebrahimi       start_offset argument to pcre2_match() or  pcre2_dfa_match()  to  start
1284*22dc650dSSadaf Ebrahimi       searching  at  a new point within the entire string (which is what Perl
1285*22dc650dSSadaf Ebrahimi       does), whereas the latter passes over a shortened subject. This makes a
1286*22dc650dSSadaf Ebrahimi       difference to the matching process if the pattern begins with a lookbe-
1287*22dc650dSSadaf Ebrahimi       hind assertion (including \b or \B).
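
       The effect can be reproduced with Python's re module, used here purely
       as an analogy for the two PCRE2 calling styles. Passing a start
       position lets \b see the preceding character, whereas slicing the
       subject hides it.

```python
import re

regex = re.compile(r"\babc")
subject = "abcabc"

first = regex.search(subject)

# Like global: same subject, new start offset. The "c" before the second
# "abc" is visible, so there is no word boundary and no match.
with_offset = regex.search(subject, first.end())

# Like altglobal: a shortened subject. The second "abc" now sits at the
# start of the string, so \b succeeds.
shortened = regex.search(subject[first.end():])

print(with_offset, shortened)
```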
1288*22dc650dSSadaf Ebrahimi
1289*22dc650dSSadaf Ebrahimi       If an empty string  is  matched,  the  next  match  is  done  with  the
1290*22dc650dSSadaf Ebrahimi       PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
1291*22dc650dSSadaf Ebrahimi       for another, non-empty, match at the same point in the subject. If this
1292*22dc650dSSadaf Ebrahimi       match  fails, the start offset is advanced, and the normal match is re-
1293*22dc650dSSadaf Ebrahimi       tried. This imitates the way Perl handles such cases when using the  /g
1294*22dc650dSSadaf Ebrahimi       modifier  or  the  split()  function. Normally, the start offset is ad-
1295*22dc650dSSadaf Ebrahimi       vanced by one character, but if the newline convention recognizes  CRLF
1296*22dc650dSSadaf Ebrahimi       as  a  newline,  and the current character is CR followed by LF, an ad-
1297*22dc650dSSadaf Ebrahimi       vance of two characters occurs.
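
       The retry-and-advance logic can be sketched in Python. This is an
       approximation built on the re module, not the pcre2test code: an
       anchored match whose result is empty is simply rejected, which only
       roughly imitates PCRE2_NOTEMPTY_ATSTART (the real option makes the
       matcher backtrack to look for a non-empty alternative). The function
       names are assumptions.

```python
import re


def advance(subject, offset, crlf_is_newline):
    """Next start offset after the retry at `offset` has failed."""
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2  # step over CR LF as a single newline
    return offset + 1


def global_match_spans(pattern, subject, crlf_is_newline=False):
    regex = re.compile(pattern)
    spans = []
    offset = 0
    retry_nonempty = False  # True just after an empty match
    while offset <= len(subject):
        if retry_nonempty:
            # Anchored attempt at the same point; empty results do not count.
            m = regex.match(subject, offset)
            if m is not None and m.start() == m.end():
                m = None
            if m is None:
                offset = advance(subject, offset, crlf_is_newline)
                retry_nonempty = False
                continue
        else:
            m = regex.search(subject, offset)
            if m is None:
                break
        spans.append(m.span())
        offset = m.end()
        retry_nonempty = m.start() == m.end()
    return spans


print(global_match_spans(r"a*", "baac"))
```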
1298*22dc650dSSadaf Ebrahimi
1299*22dc650dSSadaf Ebrahimi   Testing substring extraction functions
1300*22dc650dSSadaf Ebrahimi
1301*22dc650dSSadaf Ebrahimi       The copy  and  get  modifiers  can  be  used  to  test  the  pcre2_sub-
1302*22dc650dSSadaf Ebrahimi       string_copy_xxx() and pcre2_substring_get_xxx() functions.  They can be
1303*22dc650dSSadaf Ebrahimi       given more than once, and each can specify a capture group name or num-
1304*22dc650dSSadaf Ebrahimi       ber, for example:
1305*22dc650dSSadaf Ebrahimi
1306*22dc650dSSadaf Ebrahimi          abcd\=copy=1,copy=3,get=G1
1307*22dc650dSSadaf Ebrahimi
1308*22dc650dSSadaf Ebrahimi       If  the  #subject command is used to set default copy and/or get lists,
1309*22dc650dSSadaf Ebrahimi       these can be unset by specifying a negative number to cancel  all  num-
1310*22dc650dSSadaf Ebrahimi       bered groups and an empty name to cancel all named groups.
1311*22dc650dSSadaf Ebrahimi
1312*22dc650dSSadaf Ebrahimi       The  getall  modifier  tests pcre2_substring_list_get(), which extracts
1313*22dc650dSSadaf Ebrahimi       all captured substrings.
1314*22dc650dSSadaf Ebrahimi
1315*22dc650dSSadaf Ebrahimi       If the subject line is successfully matched, the  substrings  extracted
1316*22dc650dSSadaf Ebrahimi       by  the  convenience  functions  are  output  with C, G, or L after the
1317*22dc650dSSadaf Ebrahimi       string number instead of a colon. This is in  addition  to  the  normal
1318*22dc650dSSadaf Ebrahimi       full  list.  The string length (that is, the return from the extraction
1319*22dc650dSSadaf Ebrahimi       function) is given in parentheses after each substring, followed by the
1320*22dc650dSSadaf Ebrahimi       name when the extraction was by name.
1321*22dc650dSSadaf Ebrahimi
1322*22dc650dSSadaf Ebrahimi   Testing the substitution function
1323*22dc650dSSadaf Ebrahimi
1324*22dc650dSSadaf Ebrahimi       If the replace modifier is  set,  the  pcre2_substitute()  function  is
1325*22dc650dSSadaf Ebrahimi       called  instead  of one of the matching functions (or after one call of
1326*22dc650dSSadaf Ebrahimi       pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note  that  re-
1327*22dc650dSSadaf Ebrahimi       placement  strings cannot contain commas, because a comma signifies the
1328*22dc650dSSadaf Ebrahimi       end of a modifier. This is not thought to be an issue in  a  test  pro-
1329*22dc650dSSadaf Ebrahimi       gram.
1330*22dc650dSSadaf Ebrahimi
1331*22dc650dSSadaf Ebrahimi       Specifying  a  completely  empty replacement string disables this modi-
1332*22dc650dSSadaf Ebrahimi       fier.  However, it is possible to specify an empty replacement by  pro-
1333*22dc650dSSadaf Ebrahimi       viding  a buffer length, as described below, for an otherwise empty re-
1334*22dc650dSSadaf Ebrahimi       placement.
1335*22dc650dSSadaf Ebrahimi
1336*22dc650dSSadaf Ebrahimi       Unlike subject strings, pcre2test does not process replacement  strings
1337*22dc650dSSadaf Ebrahimi       for  escape  sequences. In UTF mode, a replacement string is checked to
1338*22dc650dSSadaf Ebrahimi       see if it is a valid UTF-8 string. If so, it is correctly converted  to
1339*22dc650dSSadaf Ebrahimi       a  UTF  string of the appropriate code unit width. If it is not a valid
1340*22dc650dSSadaf Ebrahimi       UTF-8 string, the individual code units are copied directly. This  pro-
1341*22dc650dSSadaf Ebrahimi       vides a means of passing an invalid UTF-8 string for testing purposes.
1342*22dc650dSSadaf Ebrahimi
       The  following  modifiers set options (in addition to the normal match
       options) for pcre2_substitute():
1345*22dc650dSSadaf Ebrahimi
1346*22dc650dSSadaf Ebrahimi         global                      PCRE2_SUBSTITUTE_GLOBAL
1347*22dc650dSSadaf Ebrahimi         substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
1348*22dc650dSSadaf Ebrahimi         substitute_literal          PCRE2_SUBSTITUTE_LITERAL
1349*22dc650dSSadaf Ebrahimi         substitute_matched          PCRE2_SUBSTITUTE_MATCHED
1350*22dc650dSSadaf Ebrahimi         substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
1351*22dc650dSSadaf Ebrahimi         substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
1352*22dc650dSSadaf Ebrahimi         substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
1353*22dc650dSSadaf Ebrahimi         substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
1354*22dc650dSSadaf Ebrahimi
1355*22dc650dSSadaf Ebrahimi       See the pcre2api documentation for details of these options.
1356*22dc650dSSadaf Ebrahimi
1357*22dc650dSSadaf Ebrahimi       After a successful substitution, the modified string  is  output,  pre-
1358*22dc650dSSadaf Ebrahimi       ceded  by the number of replacements. This may be zero if there were no
1359*22dc650dSSadaf Ebrahimi       matches. Here is a simple example of a substitution test:
1360*22dc650dSSadaf Ebrahimi
1361*22dc650dSSadaf Ebrahimi         /abc/replace=xxx
1362*22dc650dSSadaf Ebrahimi             =abc=abc=
1363*22dc650dSSadaf Ebrahimi          1: =xxx=abc=
1364*22dc650dSSadaf Ebrahimi             =abc=abc=\=global
1365*22dc650dSSadaf Ebrahimi          2: =xxx=xxx=
1366*22dc650dSSadaf Ebrahimi
1367*22dc650dSSadaf Ebrahimi       Subject and replacement strings should be kept relatively short  (fewer
1368*22dc650dSSadaf Ebrahimi       than  256 characters) for substitution tests, as fixed-size buffers are
1369*22dc650dSSadaf Ebrahimi       used. To make it easy to test for buffer overflow, if  the  replacement
1370*22dc650dSSadaf Ebrahimi       string  starts  with a number in square brackets, that number is passed
1371*22dc650dSSadaf Ebrahimi       to pcre2_substitute() as the size of the output buffer,  with  the  re-
1372*22dc650dSSadaf Ebrahimi       placement  string  starting  at  the next character. Here is an example
1373*22dc650dSSadaf Ebrahimi       that tests the edge case:
1374*22dc650dSSadaf Ebrahimi
1375*22dc650dSSadaf Ebrahimi         /abc/
1376*22dc650dSSadaf Ebrahimi             123abc123\=replace=[10]XYZ
1377*22dc650dSSadaf Ebrahimi          1: 123XYZ123
1378*22dc650dSSadaf Ebrahimi             123abc123\=replace=[9]XYZ
1379*22dc650dSSadaf Ebrahimi         Failed: error -47: no more memory
1380*22dc650dSSadaf Ebrahimi
1381*22dc650dSSadaf Ebrahimi       The  default  action  of  pcre2_substitute()  is  to  return  PCRE2_ER-
1382*22dc650dSSadaf Ebrahimi       ROR_NOMEMORY  when  the  output  buffer  is  too small. However, if the
1383*22dc650dSSadaf Ebrahimi       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by  using  the  substi-
1384*22dc650dSSadaf Ebrahimi       tute_overflow_length  modifier),  pcre2_substitute()  continues  to  go
1385*22dc650dSSadaf Ebrahimi       through the motions of matching and substituting  (but  not  doing  any
1386*22dc650dSSadaf Ebrahimi       callouts),  in  order  to  compute the size of buffer that is required.
1387*22dc650dSSadaf Ebrahimi       When this happens, pcre2test shows the required  buffer  length  (which
1388*22dc650dSSadaf Ebrahimi       includes space for the trailing zero) as part of the error message. For
1389*22dc650dSSadaf Ebrahimi       example:
1390*22dc650dSSadaf Ebrahimi
1391*22dc650dSSadaf Ebrahimi         /abc/substitute_overflow_length
1392*22dc650dSSadaf Ebrahimi             123abc123\=replace=[9]XYZ
1393*22dc650dSSadaf Ebrahimi         Failed: error -47: no more memory: 10 code units are needed
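
       The arithmetic behind the two buffer examples can be checked with a
       small Python model. It assumes one character per code unit and a
       single, literal replacement; the function name and the returned
       strings are illustrative only and are not the PCRE2 API.

```python
import re


def substitute_into_buffer(pattern, subject, replacement, buffer_size,
                           overflow_length=False):
    """Model the output-buffer check for a single substitution."""
    # The lambda keeps the replacement literal (no escape processing).
    result, count = re.subn(pattern, lambda m: replacement, subject, count=1)
    needed = len(result) + 1  # one extra code unit for the trailing zero
    if needed <= buffer_size:
        return f"{count}: {result}"
    if overflow_length:
        return f"no more memory: {needed} code units are needed"
    return "no more memory"


print(substitute_into_buffer("abc", "123abc123", "XYZ", 10))
print(substitute_into_buffer("abc", "123abc123", "XYZ", 9))
print(substitute_into_buffer("abc", "123abc123", "XYZ", 9,
                             overflow_length=True))
```

       The result "123XYZ123" is nine code units long, so a buffer of ten is
       the smallest that also holds the trailing zero.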
1394*22dc650dSSadaf Ebrahimi
1395*22dc650dSSadaf Ebrahimi       A replacement string is ignored with POSIX and DFA matching. Specifying
1396*22dc650dSSadaf Ebrahimi       partial  matching  provokes  an  error return ("bad option value") from
1397*22dc650dSSadaf Ebrahimi       pcre2_substitute().
1398*22dc650dSSadaf Ebrahimi
1399*22dc650dSSadaf Ebrahimi   Testing substitute callouts
1400*22dc650dSSadaf Ebrahimi
1401*22dc650dSSadaf Ebrahimi       If the substitute_callout modifier is set, a substitution callout func-
1402*22dc650dSSadaf Ebrahimi       tion is set up. The null_context modifier must not be set, because  the
1403*22dc650dSSadaf Ebrahimi       address  of the callout function is passed in a match context. When the
1404*22dc650dSSadaf Ebrahimi       callout function is called (after each substitution),  details  of  the
1405*22dc650dSSadaf Ebrahimi       input and output strings are output. For example:
1406*22dc650dSSadaf Ebrahimi
1407*22dc650dSSadaf Ebrahimi         /abc/g,replace=<$0>,substitute_callout
1408*22dc650dSSadaf Ebrahimi             abcdefabcpqr
1409*22dc650dSSadaf Ebrahimi          1(1) Old 0 3 "abc" New 0 5 "<abc>"
1410*22dc650dSSadaf Ebrahimi          2(1) Old 6 9 "abc" New 8 13 "<abc>"
1411*22dc650dSSadaf Ebrahimi          2: <abc>def<abc>pqr
1412*22dc650dSSadaf Ebrahimi
1413*22dc650dSSadaf Ebrahimi       The  first  number  on  each  callout line is the count of matches. The
1414*22dc650dSSadaf Ebrahimi       parenthesized number is the number of pairs that are set in the ovector
       (that is, one more than the number of capturing groups that were set).
       These are followed by the offsets of the old  substring  and  its
       contents, and then the same for the replacement.
1418*22dc650dSSadaf Ebrahimi
1419*22dc650dSSadaf Ebrahimi       By  default,  the substitution callout function returns zero, which ac-
1420*22dc650dSSadaf Ebrahimi       cepts the replacement and causes matching to continue if /g  was  used.
1421*22dc650dSSadaf Ebrahimi       Two  further modifiers can be used to test other return values. If sub-
1422*22dc650dSSadaf Ebrahimi       stitute_skip is set to a value greater than zero the  callout  function
1423*22dc650dSSadaf Ebrahimi       returns  +1 for the match of that number, and similarly substitute_stop
1424*22dc650dSSadaf Ebrahimi       returns -1. These cause the replacement to be rejected, and  -1  causes
       no  further  matching to take place. If either of them is set, substi-
       tute_callout is assumed. For example:
1427*22dc650dSSadaf Ebrahimi
1428*22dc650dSSadaf Ebrahimi         /abc/g,replace=<$0>,substitute_skip=1
1429*22dc650dSSadaf Ebrahimi             abcdefabcpqr
1430*22dc650dSSadaf Ebrahimi          1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
1431*22dc650dSSadaf Ebrahimi          2(1) Old 6 9 "abc" New 6 11 "<abc>"
1432*22dc650dSSadaf Ebrahimi          2: abcdef<abc>pqr
1433*22dc650dSSadaf Ebrahimi             abcdefabcpqr\=substitute_stop=1
1434*22dc650dSSadaf Ebrahimi          1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
1435*22dc650dSSadaf Ebrahimi          1: abcdefabcpqr
1436*22dc650dSSadaf Ebrahimi
1437*22dc650dSSadaf Ebrahimi       If both are set for the same number, stop takes precedence. Only a sin-
1438*22dc650dSSadaf Ebrahimi       gle skip or stop is supported, which is sufficient for testing that the
1439*22dc650dSSadaf Ebrahimi       feature works.
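
       The offsets in the callout lines above can be reproduced with the
       following Python sketch. It is a model of the bookkeeping only, under
       several assumptions: Python's re module stands in for PCRE2, the
       placeholder "{0}" stands in for $0, and the function name, its
       parameters, and the way the count is reported simply mirror the
       examples shown above.

```python
import re


def substitute_trace(pattern, subject, template, skip=0, stop=0):
    """Return (callout lines, final result line) for a global substitution."""
    out, out_len, last, count, lines = [], 0, 0, 0, []
    for number, m in enumerate(re.finditer(pattern, subject), start=1):
        count = number
        gap = subject[last:m.start()]          # unmatched text is copied
        out.append(gap)
        out_len += len(gap)
        new = template.format(m.group(0))
        # Pairs set in the ovector: one more than the highest group set.
        pairs = 1 + max((i for i in range(1, m.re.groups + 1)
                         if m.group(i) is not None), default=0)
        verdict = ""
        if number == stop:                     # stop takes precedence
            verdict = " STOPPED"
        elif number == skip:
            verdict = " SKIPPED"
        lines.append(f'{number}({pairs}) Old {m.start()} {m.end()} '
                     f'"{m.group(0)}" New {out_len} {out_len + len(new)} '
                     f'"{new}{verdict}"')
        piece = m.group(0) if verdict else new  # rejected: keep old text
        out.append(piece)
        out_len += len(piece)
        last = m.end()
        if number == stop:
            break
    out.append(subject[last:])
    return lines, f"{count}: {''.join(out)}"


for line in substitute_trace("abc", "abcdefabcpqr", "<{0}>", skip=1)[0]:
    print(line)
```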
1440*22dc650dSSadaf Ebrahimi
1441*22dc650dSSadaf Ebrahimi   Setting the JIT stack size
1442*22dc650dSSadaf Ebrahimi
1443*22dc650dSSadaf Ebrahimi       The jitstack modifier provides a way of setting the maximum stack  size
1444*22dc650dSSadaf Ebrahimi       that  is  used  by the just-in-time optimization code. It is ignored if
1445*22dc650dSSadaf Ebrahimi       JIT optimization is not being used. The value is a number of  kibibytes
1446*22dc650dSSadaf Ebrahimi       (units  of  1024  bytes). Setting zero reverts to the default of 32KiB.
1447*22dc650dSSadaf Ebrahimi       Providing a stack that is larger than the default is necessary only for
1448*22dc650dSSadaf Ebrahimi       very complicated patterns. If jitstack is set  non-zero  on  a  subject
1449*22dc650dSSadaf Ebrahimi       line it overrides any value that was set on the pattern.
1450*22dc650dSSadaf Ebrahimi
1451*22dc650dSSadaf Ebrahimi   Setting heap, match, and depth limits
1452*22dc650dSSadaf Ebrahimi
1453*22dc650dSSadaf Ebrahimi       The  heap_limit,  match_limit, and depth_limit modifiers set the appro-
1454*22dc650dSSadaf Ebrahimi       priate limits in the match context. These values are ignored  when  the
1455*22dc650dSSadaf Ebrahimi       find_limits or find_limits_noheap modifier is specified.
1456*22dc650dSSadaf Ebrahimi
1457*22dc650dSSadaf Ebrahimi   Finding minimum limits
1458*22dc650dSSadaf Ebrahimi
1459*22dc650dSSadaf Ebrahimi       If  the  find_limits  modifier  is present on a subject line, pcre2test
1460*22dc650dSSadaf Ebrahimi       calls the relevant matching function several times,  setting  different
1461*22dc650dSSadaf Ebrahimi       values    in    the    match    context   via   pcre2_set_heap_limit(),
1462*22dc650dSSadaf Ebrahimi       pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds  the
1463*22dc650dSSadaf Ebrahimi       smallest  value  for  each  parameter that allows the match to complete
1464*22dc650dSSadaf Ebrahimi       without a "limit exceeded" error. The match itself may succeed or fail.
1465*22dc650dSSadaf Ebrahimi       An alternative modifier, find_limits_noheap, omits the heap limit. This
1466*22dc650dSSadaf Ebrahimi       is used in the standard tests, because the minimum  heap  limit  varies
1467*22dc650dSSadaf Ebrahimi       between  systems.  If  JIT is being used, only the match limit is rele-
1468*22dc650dSSadaf Ebrahimi       vant, and the other two are automatically omitted.
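
       One plausible way to search for such a minimum is to double the limit
       until the match completes and then bisect. The Python sketch below
       shows that strategy; it is not taken from the pcre2test source. The
       callable completes is hypothetical: it is assumed to run the match
       with the given limit and to return True when no "limit exceeded"
       error occurs, and the search relies on larger limits never failing
       once a smaller one has succeeded.

```python
def find_min_limit(completes, start=1, ceiling=2**32 - 1):
    """Smallest limit for which completes(limit) is True."""
    lo = 0          # largest value known to fail (0 means none tried yet)
    hi = start
    while not completes(hi):
        if hi >= ceiling:
            raise ValueError("match does not complete even at the ceiling")
        lo = hi
        hi = min(hi * 2, ceiling)
    # Invariant: completes(hi) is True and every value <= lo fails.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if completes(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Toy stand-in for a real match that needs a limit of at least 37.
print(find_min_limit(lambda limit: limit >= 37))
```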
1469*22dc650dSSadaf Ebrahimi
1470*22dc650dSSadaf Ebrahimi       When using this modifier, the pattern should not contain any limit set-
1471*22dc650dSSadaf Ebrahimi       tings such as (*LIMIT_MATCH=...)  within  it.  If  such  a  setting  is
1472*22dc650dSSadaf Ebrahimi       present and is lower than the minimum matching value, the minimum value
1473*22dc650dSSadaf Ebrahimi       cannot  be  found because pcre2_set_match_limit() etc. are only able to
1474*22dc650dSSadaf Ebrahimi       reduce the value of an in-pattern limit; they cannot increase it.
1475*22dc650dSSadaf Ebrahimi
1476*22dc650dSSadaf Ebrahimi       For non-DFA matching, the minimum depth_limit number is  a  measure  of
1477*22dc650dSSadaf Ebrahimi       how much nested backtracking happens (that is, how deeply the pattern's
1478*22dc650dSSadaf Ebrahimi       tree  is  searched).  In the case of DFA matching, depth_limit controls
1479*22dc650dSSadaf Ebrahimi       the depth of recursive calls of the internal function that is used  for
1480*22dc650dSSadaf Ebrahimi       handling pattern recursion, lookaround assertions, and atomic groups.
1481*22dc650dSSadaf Ebrahimi
1482*22dc650dSSadaf Ebrahimi       For non-DFA matching, the match_limit number is a measure of the amount
1483*22dc650dSSadaf Ebrahimi       of backtracking that takes place, and learning the minimum value can be
1484*22dc650dSSadaf Ebrahimi       instructive.  For  most  simple matches, the number is quite small, but
1485*22dc650dSSadaf Ebrahimi       for patterns with very large numbers of matching possibilities, it  can
1486*22dc650dSSadaf Ebrahimi       become  large very quickly with increasing length of subject string. In
1487*22dc650dSSadaf Ebrahimi       the case of DFA matching, match_limit  controls  the  total  number  of
1488*22dc650dSSadaf Ebrahimi       calls, both recursive and non-recursive, to the internal matching func-
1489*22dc650dSSadaf Ebrahimi       tion, thus controlling the overall amount of computing resource that is
1490*22dc650dSSadaf Ebrahimi       used.
1491*22dc650dSSadaf Ebrahimi
1492*22dc650dSSadaf Ebrahimi       For  both  kinds  of  matching,  the  heap_limit  number,  which  is in
1493*22dc650dSSadaf Ebrahimi       kibibytes (units of 1024 bytes), limits the amount of heap memory  used
1494*22dc650dSSadaf Ebrahimi       for matching.
1495*22dc650dSSadaf Ebrahimi
1496*22dc650dSSadaf Ebrahimi   Showing MARK names
1497*22dc650dSSadaf Ebrahimi
1499*22dc650dSSadaf Ebrahimi       The mark modifier causes the names from backtracking control verbs that
1500*22dc650dSSadaf Ebrahimi       are  returned from calls to pcre2_match() to be displayed. If a mark is
1501*22dc650dSSadaf Ebrahimi       returned for a match, non-match, or partial match, pcre2test shows  it.
1502*22dc650dSSadaf Ebrahimi       For  a  match, it is on a line by itself, tagged with "MK:". Otherwise,
1503*22dc650dSSadaf Ebrahimi       it is added to the non-match message.
1504*22dc650dSSadaf Ebrahimi
1505*22dc650dSSadaf Ebrahimi   Showing memory usage
1506*22dc650dSSadaf Ebrahimi
1507*22dc650dSSadaf Ebrahimi       The memory modifier causes pcre2test to log the sizes of all heap  mem-
1508*22dc650dSSadaf Ebrahimi       ory   allocation  and  freeing  calls  that  occur  during  a  call  to
1509*22dc650dSSadaf Ebrahimi       pcre2_match() or pcre2_dfa_match(). In the latter case, heap memory  is
       used only when a match requires more internal workspace than the
       default allocation on the stack, so in many cases there will be no
       output. No heap memory is allocated during matching with JIT. For this
1513*22dc650dSSadaf Ebrahimi       modifier to work, the null_context modifier must not be set on both the
1514*22dc650dSSadaf Ebrahimi       pattern and the subject, though it can be set on one or the other.
1515*22dc650dSSadaf Ebrahimi
1516*22dc650dSSadaf Ebrahimi   Showing the heap frame overall vector size
1517*22dc650dSSadaf Ebrahimi
1518*22dc650dSSadaf Ebrahimi       The  heapframes_size   modifier   is   relevant   for   matches   using
1519*22dc650dSSadaf Ebrahimi       pcre2_match() without JIT. After a match has run (whether successful or
1520*22dc650dSSadaf Ebrahimi       not)  the  size,  in bytes, of the allocated heap frames vector that is
1521*22dc650dSSadaf Ebrahimi       left attached to the match data block is shown. If the matching  action
1522*22dc650dSSadaf Ebrahimi       involved  several  calls to pcre2_match() (for example, global matching
1523*22dc650dSSadaf Ebrahimi       or for timing) only the final value is shown.
1524*22dc650dSSadaf Ebrahimi
1525*22dc650dSSadaf Ebrahimi       This modifier is ignored, with a warning, for POSIX  or  DFA  matching.
1526*22dc650dSSadaf Ebrahimi       JIT matching does not use the heap frames vector, so the size is always
       zero, unless there was a previous non-JIT match. Note that specifying a
1528*22dc650dSSadaf Ebrahimi       size of zero for the output vector (see below) causes pcre2test to free
1529*22dc650dSSadaf Ebrahimi       its match data block (and associated heap frames vector) and allocate a
1530*22dc650dSSadaf Ebrahimi       new one.
1531*22dc650dSSadaf Ebrahimi
1532*22dc650dSSadaf Ebrahimi   Setting a starting offset
1533*22dc650dSSadaf Ebrahimi
1534*22dc650dSSadaf Ebrahimi       The offset modifier sets an offset  in  the  subject  string  at  which
1535*22dc650dSSadaf Ebrahimi       matching starts. Its value is a number of code units, not characters.
1536*22dc650dSSadaf Ebrahimi
1537*22dc650dSSadaf Ebrahimi   Setting an offset limit
1538*22dc650dSSadaf Ebrahimi
1539*22dc650dSSadaf Ebrahimi       The  offset_limit  modifier  sets  a limit for unanchored matches. If a
1540*22dc650dSSadaf Ebrahimi       match cannot be found starting at or before this offset in the subject,
1541*22dc650dSSadaf Ebrahimi       a "no match" return is given. The data value is a number of code units,
1542*22dc650dSSadaf Ebrahimi       not characters. When this modifier is used, the use_offset_limit  modi-
1543*22dc650dSSadaf Ebrahimi       fier must have been set for the pattern; if not, an error is generated.
1544*22dc650dSSadaf Ebrahimi
1545*22dc650dSSadaf Ebrahimi   Setting the size of the output vector
1546*22dc650dSSadaf Ebrahimi
1547*22dc650dSSadaf Ebrahimi       The  ovector  modifier applies only to the subject line in which it ap-
1548*22dc650dSSadaf Ebrahimi       pears, though of course it can also be used to set a default in a #sub-
1549*22dc650dSSadaf Ebrahimi       ject command. It specifies the number of  pairs  of  offsets  that  are
1550*22dc650dSSadaf Ebrahimi       available for storing matching information. The default is 15.
1551*22dc650dSSadaf Ebrahimi
1552*22dc650dSSadaf Ebrahimi       A  value of zero is useful when testing the POSIX API because it causes
1553*22dc650dSSadaf Ebrahimi       regexec() to be called with a NULL capture vector. When not testing the
1554*22dc650dSSadaf Ebrahimi       POSIX API, a value of  zero  is  used  to  cause  pcre2_match_data_cre-
1555*22dc650dSSadaf Ebrahimi       ate_from_pattern()  to  be called, in order to create a new match block
1556*22dc650dSSadaf Ebrahimi       of exactly the right size for the pattern. (It is not possible to  cre-
1557*22dc650dSSadaf Ebrahimi       ate  a match block with a zero-length ovector; there is always at least
1558*22dc650dSSadaf Ebrahimi       one pair of offsets.) The old match data block is freed.
1559*22dc650dSSadaf Ebrahimi
1560*22dc650dSSadaf Ebrahimi   Passing the subject as zero-terminated
1561*22dc650dSSadaf Ebrahimi
1562*22dc650dSSadaf Ebrahimi       By default, the subject string is passed to a native API matching func-
1563*22dc650dSSadaf Ebrahimi       tion with its correct length. In order to test the facility for passing
1564*22dc650dSSadaf Ebrahimi       a zero-terminated string, the zero_terminate modifier is  provided.  It
1565*22dc650dSSadaf Ebrahimi       causes  the length to be passed as PCRE2_ZERO_TERMINATED. When matching
1566*22dc650dSSadaf Ebrahimi       via the POSIX interface, this modifier is ignored, with a warning.
1567*22dc650dSSadaf Ebrahimi
1568*22dc650dSSadaf Ebrahimi       When testing pcre2_substitute(), this modifier also has the  effect  of
1569*22dc650dSSadaf Ebrahimi       passing the replacement string as zero-terminated.
1570*22dc650dSSadaf Ebrahimi
1571*22dc650dSSadaf Ebrahimi   Passing a NULL context, subject, or replacement
1572*22dc650dSSadaf Ebrahimi
1573*22dc650dSSadaf Ebrahimi       Normally,   pcre2test   passes   a   context  block  to  pcre2_match(),
1574*22dc650dSSadaf Ebrahimi       pcre2_dfa_match(), pcre2_jit_match()  or  pcre2_substitute().   If  the
1575*22dc650dSSadaf Ebrahimi       null_context  modifier  is  set,  however,  NULL is passed. This is for
1576*22dc650dSSadaf Ebrahimi       testing that the matching and substitution functions  behave  correctly
1577*22dc650dSSadaf Ebrahimi       in  this  case  (they use default values). This modifier cannot be used
1578*22dc650dSSadaf Ebrahimi       with the find_limits, find_limits_noheap, or  substitute_callout  modi-
1579*22dc650dSSadaf Ebrahimi       fiers.
1580*22dc650dSSadaf Ebrahimi
1581*22dc650dSSadaf Ebrahimi       Similarly,  for  testing purposes, if the null_subject or null_replace-
1582*22dc650dSSadaf Ebrahimi       ment modifier is set, the subject or replacement  string  pointers  are
1583*22dc650dSSadaf Ebrahimi       passed as NULL, respectively, to the relevant functions.
1584*22dc650dSSadaf Ebrahimi
1585*22dc650dSSadaf Ebrahimi
1586*22dc650dSSadaf EbrahimiTHE ALTERNATIVE MATCHING FUNCTION
1587*22dc650dSSadaf Ebrahimi
       By default, pcre2test uses the standard PCRE2 matching function,
       pcre2_match(), to match each subject line. PCRE2 also supports an
       alternative matching function, pcre2_dfa_match(), which operates in a
       different way, and has some restrictions. The differences between the
       two functions are described in the pcre2matching documentation.
1593*22dc650dSSadaf Ebrahimi
1594*22dc650dSSadaf Ebrahimi       If the dfa modifier is set, the alternative matching function is  used.
1595*22dc650dSSadaf Ebrahimi       This  function  finds all possible matches at a given point in the sub-
1596*22dc650dSSadaf Ebrahimi       ject. If, however, the dfa_shortest modifier is set,  processing  stops
1597*22dc650dSSadaf Ebrahimi       after  the  first  match is found. This is always the shortest possible
1598*22dc650dSSadaf Ebrahimi       match.
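
       The idea of "all possible matches at a given point" can be illustrated
       with Python's re module. This is a brute-force model and not how
       pcre2_dfa_match() works internally: it finds the first point at which
       any match exists and then tests every possible end offset. The
       function name is an assumption, and the longest-first ordering is
       chosen here for readability.

```python
import re


def matches_at_first_point(pattern, subject, shortest=False):
    """All strings the pattern can match at the first matching point."""
    regex = re.compile(pattern)
    first = regex.search(subject)
    if first is None:
        return []
    start = first.start()
    # Try every end offset; fullmatch succeeds only if the pattern can
    # match exactly subject[start:end].
    found = [subject[start:end]
             for end in range(start, len(subject) + 1)
             if regex.fullmatch(subject, start, end)]
    if shortest:
        return found[:1]      # stop at the shortest possible match
    return found[::-1]        # longest first


print(matches_at_first_point(r"cat(er(pillar)?)?", "the caterpillar catchment"))
print(matches_at_first_point(r"cat(er(pillar)?)?", "the caterpillar catchment",
                             shortest=True))
```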
1599*22dc650dSSadaf Ebrahimi
1600*22dc650dSSadaf Ebrahimi
DEFAULT OUTPUT FROM pcre2test

       This section describes the output when the  normal  matching  function,
       pcre2_match(), is being used.

       When  a  match  succeeds,  pcre2test  outputs the list of captured sub-
       strings, starting with number 0 for the string that matched  the  whole
       pattern.  Otherwise, it outputs "No match" when the return is PCRE2_ER-
       ROR_NOMATCH,  or  "Partial  match:"  followed by the partially matching
       substring when the return is PCRE2_ERROR_PARTIAL. (Note  that  this  is
       the  entire  substring  that was inspected during the partial match; it
       may include characters before the actual match start  if  a  lookbehind
       assertion, \K, \b, or \B was involved.)

       For any other return, pcre2test outputs the PCRE2 negative error number
       and  a  short  descriptive  phrase. If the error is a failed UTF string
       check, the code unit offset of the start of the  failing  character  is
       also output. Here is an example of an interactive pcre2test run.

         $ pcre2test
         PCRE2 version 10.22 2016-07-29

           re> /^abc(\d+)/
         data> abc123
          0: abc123
          1: 123
         data> xyz
         No match

       Unset capturing substrings that are not followed by one that is set are
       not shown by pcre2test unless the allcaptures modifier is specified. In
       the following example, there are two capturing substrings, but when the
       first  data  line is matched, the second, unset substring is not shown.
       An "internal" unset substring is shown as "<unset>", as for the  second
       data line.

           re> /(a)|(b)/
         data> a
          0: a
          1: a
         data> b
          0: b
          1: <unset>
          2: b

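       The output format described above is regular enough to be read by a
       script. The following is a minimal Python sketch, not part of
       pcre2test, that parses the output lines produced for one subject line.
       The function name parse_match_output and the layout of the returned
       dictionary are assumptions made for illustration. It cannot tell a
       captured string that is literally "<unset>" from an unset group.

```python
import re

# A capture line looks like " 0: abc123" or " 1: <unset>".
CAPTURE_LINE = re.compile(r"\s*(\d+): (.*)$")

def parse_match_output(lines):
    """Parse pcre2test output lines for one subject (hypothetical helper)."""
    result = {"status": None, "captures": {}}
    for line in lines:
        text = line.rstrip("\n")
        if text.strip() == "No match":
            result["status"] = "nomatch"
        elif text.lstrip().startswith("Partial match:"):
            result["status"] = "partial"
            result["partial"] = text.split(":", 1)[1].strip()
        else:
            m = CAPTURE_LINE.match(text)
            if m:
                result["status"] = "match"
                value = m.group(2)
                # "<unset>" marks an internal unset group; map it to None.
                result["captures"][int(m.group(1))] = (
                    None if value == "<unset>" else value)
    return result

print(parse_match_output([" 0: b", " 1: <unset>", " 2: b"]))
```

       Lines that do not fit any of the three shapes, such as the "0+" line
       produced by the aftertext modifier, are simply skipped by this sketch.
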
       If  the strings contain any non-printing characters, they are output as
       \xhh escapes if the value is less than 256 and UTF  mode  is  not  set.
       Otherwise they are output as \x{hh...} escapes. See below for the defi-
       nition  of  non-printing  characters. If the aftertext modifier is set,
       the output for substring 0 is followed  by  the  rest  of  the  subject
       string, identified by "0+" like this:

           re> /cat/aftertext
         data> cataract
          0: cat
          0+ aract

       If global matching is requested, the results of successive matching at-
       tempts are output in sequence, like this:

           re> /\Bi(\w\w)/g
         data> Mississippi
          0: iss
          1: ss
          0: iss
          1: ss
          0: ipp
          1: pp

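       As a rough analogy only, the same sequence of successive matches can
       be reproduced with Python's re module. That module is a different
       engine from PCRE2, so this is an illustration of the idea of global
       matching, not of how pcre2test implements it; for this simple pattern
       the two behave alike.

```python
import re

subject = "Mississippi"

# Each successful attempt resumes where the previous match ended.
results = [(m.group(0), m.group(1))
           for m in re.finditer(r"\Bi(\w\w)", subject)]

for whole, group1 in results:
    print(" 0:", whole)
    print(" 1:", group1)
```

       Patterns that can match an empty string need extra care when matching
       repeatedly, and the two engines may differ in that case.
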
       "No  match" is output only if the first match attempt fails. Here is an
       example of a failure message (the offset 4 that  is  specified  by  the
       offset modifier is past the end of the subject string):

           re> /xyz/
         data> xyz\=offset=4
         Error -24 (bad offset value)

       Note that whereas patterns can be continued over several lines (a plain
       ">"  prompt is used for continuations), subject lines may not. However,
       newlines can be included in a subject by means of the \n escape (or \r,
       \r\n, etc., depending on the newline sequence setting).


OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

       When the alternative matching function, pcre2_dfa_match(), is used, the
       output consists of a list of all the matches that start  at  the  first
       point in the subject where there is at least one match. For example:

           re> /(tang|tangerine|tan)/
         data> yellow tangerine\=dfa
          0: tangerine
          1: tang
          2: tan

       Using  the normal matching function on this data finds only "tang". The
       longest matching string is always given first (and numbered zero).  Af-
       ter  a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
       lowed by the partially matching substring. Note that this is the entire
       substring that was inspected during the partial match; it  may  include
       characters before the actual match start if a lookbehind assertion, \b,
       or \B was involved. (\K is not supported for DFA matching.)

       If global matching is requested, the search for further matches resumes
       at the end of the longest match. For example:

           re> /(tang|tangerine|tan)/g
         data> yellow tangerine and tangy sultana\=dfa
          0: tangerine
          1: tang
          2: tan
          0: tang
          1: tan
          0: tan

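       The two rules above (report every match at the first matching point,
       longest first, then resume at the end of the longest match) can be
       modelled in a few lines of Python. This is a toy sketch that only
       handles a list of non-empty literal alternatives; it is not a DFA
       matcher, and the function names are invented for the illustration.

```python
def all_matches_at_first_point(alternatives, subject, start=0):
    """Return (end_offset, matches) for the first position at or after
    start where at least one literal alternative matches, longest first."""
    for pos in range(start, len(subject)):
        found = [alt for alt in alternatives if subject.startswith(alt, pos)]
        if found:
            found.sort(key=len, reverse=True)
            return pos + len(found[0]), found
    return None, []

def global_matches(alternatives, subject):
    """Repeat the search, resuming at the end of the longest match."""
    results, pos = [], 0
    while True:
        end, found = all_matches_at_first_point(alternatives, subject, pos)
        if not found:
            return results
        results.append(found)
        pos = end

alts = ["tang", "tangerine", "tan"]
for group in global_matches(alts, "yellow tangerine and tangy sultana"):
    print(group)
```

       Each printed list corresponds to one block of numbered lines in the
       global example, with the entry at index 0 being the longest match.
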
       The  alternative  matching function does not support substring capture,
       so the modifiers that are concerned with captured  substrings  are  not
       relevant.


RESTARTING AFTER A PARTIAL MATCH

       When  the  alternative matching function has given the PCRE2_ERROR_PAR-
       TIAL return, indicating that the subject partially matched the pattern,
       you can restart the match with additional subject data by means of  the
       dfa_restart modifier. For example:

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 23ja\=ps,dfa
         Partial match: 23ja
         data> n05\=dfa,dfa_restart
          0: n05

       For  further  information  about partial matching, see the pcre2partial
       documentation.


CALLOUTS

       If the pattern contains any callout requests, pcre2test's callout func-
       tion is called during matching unless callout_none is  specified.  This
       works with both matching functions, and with JIT, though there are some
       differences  in behaviour. The output for callouts with numerical argu-
       ments and those with string arguments is slightly different.

   Callouts with numerical arguments

       By default, the callout function displays the callout number, the start
       and current positions in the subject text at the callout time, and  the
       next pattern item to be tested. For example:

         --->pqrabcdef
           0    ^  ^     \d

       This  output  indicates  that callout number 0 occurred for a match at-
       tempt starting at the fourth character of the subject string, when  the
       pointer  was  at  the seventh character, and when the next pattern item
       was \d. Just one circumflex is output if the start  and  current  posi-
       tions are the same, or if the current position precedes the start posi-
       tion, which can happen if the callout is in a lookbehind assertion.

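       To make the position markers concrete, here is a small Python sketch
       that builds the circumflex string from a start offset and a current
       offset (both zero-based, so the fourth character is offset 3). The
       function name is hypothetical, and where the single circumflex goes
       when the current position precedes the start is an assumption of the
       sketch, since the text above says only that one marker is output.

```python
def position_markers(start, current):
    """Build the circumflex marker string for a callout (illustrative)."""
    if current <= start:
        # Same position, or current before start (lookbehind): one marker.
        # Placing it at the current position is an assumption of this sketch.
        return " " * current + "^"
    # Otherwise mark the start of the match attempt and the current position.
    return " " * start + "^" + " " * (current - start - 1) + "^"

subject = "pqrabcdef"
print("--->" + subject)
print("    " + position_markers(3, 6))
```

       The four leading spaces in the second print call stand in for the
       callout number column, so the markers line up under the subject.
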
       Callouts numbered 255 are assumed to be automatic callouts, inserted as
       a result of the auto_callout pattern modifier. In this case, instead of
       showing  the  callout  number, the offset in the pattern, preceded by a
       plus, is output. For example:

           re> /\d?[A-E]\*/auto_callout
         data> E*
         --->E*
          +0 ^      \d?
          +3 ^      [A-E]
          +8 ^^     \*
         +10 ^ ^
          0: E*

       If a pattern contains (*MARK) items, an additional line is output when-
       ever a change of latest mark is passed to the callout function. For ex-
       ample:

           re> /a(*MARK:X)bc/auto_callout
         data> abc
         --->abc
          +0 ^       a
          +1 ^^      (*MARK:X)
         +10 ^^      b
         Latest Mark: X
         +11 ^ ^     c
         +12 ^  ^
          0: abc

       The mark changes between matching "a" and "b", but stays the  same  for
       the  rest  of  the match, so nothing more is output. If, as a result of
       backtracking, the mark reverts to being unset, the  text  "<unset>"  is
       output.

   Callouts with string arguments

       The output for a callout with a string argument is similar, except that
       instead  of outputting a callout number before the position indicators,
       the callout string and its offset in the pattern string are output  be-
       fore  the  reflection  of the subject string, and the subject string is
       reflected for each callout. For example:

           re> /^ab(?C'first')cd(?C"second")ef/
         data> abcdefg
         Callout (7): 'first'
         --->abcdefg
             ^ ^         c
         Callout (20): "second"
         --->abcdefg
             ^   ^       e
          0: abcdef


   Callout modifiers

       The callout function in pcre2test returns zero (carry on  matching)  by
       default,  but  you can use a callout_fail modifier in a subject line to
       change this and other parameters of the callout (see below).

       If the callout_capture modifier is set, the current captured groups are
       output when a callout occurs. This is useful only for non-DFA matching,
       as pcre2_dfa_match() does not support capturing,  so  no  captures  are
       ever shown.

       The normal callout output, showing the callout number or pattern offset
       (as described above), is suppressed if the callout_no_where modifier is
       set.

       When using the interpretive  matching  function  pcre2_match()  without
       JIT,  setting  the callout_extra modifier causes additional output from
       pcre2test's callout function to be generated. For the first callout  in
       a  match  attempt at a new starting position in the subject, "New match
       attempt" is output. If there has been a backtrack since the last  call-
       out (or start of matching if this is the first callout), "Backtrack" is
       output,  followed  by  "No other matching paths" if the backtrack ended
       the previous match attempt. For example:

           re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
         data> aac\=callout_extra
         New match attempt
         --->aac
          +0 ^       (
          +1 ^       a+
          +3 ^ ^     )
          +4 ^ ^     b
         Backtrack
         --->aac
          +3 ^^      )
          +4 ^^      b
         Backtrack
         No other matching paths
         New match attempt
         --->aac
          +0  ^      (
          +1  ^      a+
          +3  ^^     )
          +4  ^^     b
         Backtrack
         No other matching paths
         New match attempt
         --->aac
          +0   ^     (
          +1   ^     a+
         Backtrack
         No other matching paths
         New match attempt
         --->aac
          +0    ^    (
          +1    ^    a+
         No match

       Notice that various optimizations must be turned off if  you  want  all
       possible  matching  paths  to  be  scanned. If no_start_optimize is not
       used, there is an immediate "no match", without any  callouts,  because
       the  starting  optimization  fails to find "b" in the subject, which it
       knows must be present for any match. If no_auto_possess  is  not  used,
       the  "a+"  item is turned into "a++", which reduces the number of back-
       tracks.

       The callout_extra modifier has no effect if used with the DFA  matching
       function, or with JIT.

   Return values from callouts

       The  default  return  from  the  callout function is zero, which allows
       matching to continue. The callout_fail modifier can be given one or two
       numbers. If there is only one number, 1 is returned instead of 0 (caus-
       ing matching to backtrack) when a callout of that number is reached. If
       two numbers (<n>:<m>) are given, 1 is  returned  when  callout  <n>  is
       reached  and  there  have been at least <m> callouts. The callout_error
       modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
       ing the entire matching process to be aborted. If both these  modifiers
       are  set  for  the same callout number, callout_error takes precedence.
       Note that callouts with string arguments are always  given  the  number
       zero.

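       The decision rules in the paragraph above can be summarised as a small
       Python model. It is a sketch of the documented behaviour, not the code
       used by pcre2test: the function name and parameters are invented, the
       error code is represented symbolically rather than by its numeric
       value, and counting the current callout towards <m> is an assumption.

```python
# Symbolic stand-in for the library's error code, not its real value.
PCRE2_ERROR_CALLOUT = "PCRE2_ERROR_CALLOUT"

def callout_return(number, count, fail=None, error=None):
    """Model the value returned by the callout function.

    number: callout number (zero for callouts with string arguments)
    count:  how many callouts have occurred so far, including this one
            (whether the current one is counted is an assumption)
    fail, error: (n, m) tuples modelling callout_fail=n:m and
            callout_error=n:m; a single number n is written as (n, 0)
    """
    if error is not None and number == error[0] and count >= error[1]:
        return PCRE2_ERROR_CALLOUT   # abort the whole matching process
    if fail is not None and number == fail[0] and count >= fail[1]:
        return 1                     # force a backtrack
    return 0                         # carry on matching

print(callout_return(2, 1, fail=(2, 0)))
print(callout_return(2, 1, fail=(2, 0), error=(2, 0)))
```

       Checking the error setting first is what gives callout_error its
       precedence when both modifiers name the same callout number.
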
       The  callout_data  modifier can be given an unsigned or a negative num-
       ber.  This is set as the "user data" that is  passed  to  the  matching
       function,  and  passed  back  when the callout function is invoked. Any
       value other than zero is used as  a  return  from  pcre2test's  callout
       function.

       Inserting callouts can be helpful when using pcre2test to check compli-
       cated  regular expressions. For further information about callouts, see
       the pcre2callout documentation.


NON-PRINTING CHARACTERS

       When pcre2test is outputting text in the compiled version of a pattern,
       bytes other than 32-126 are always treated as  non-printing  characters
       and are therefore shown as hex escapes.

       When  pcre2test  is outputting text that is a matched part of a subject
       string, it behaves in the same way, unless a different locale has  been
       set  for the pattern (using the locale modifier). In this case, the is-
       print() function is used to distinguish printing and non-printing char-
       acters.

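       For the default case, with no locale set, the rule above can be
       combined with the \xhh and \x{hh...} rule given in the section on
       default output. The following Python sketch shows one way to express
       that combination. The function name is hypothetical, and the use of
       lower-case hex digits padded to two places is an assumption about the
       exact formatting, which the text does not specify.

```python
def show_chars(code_points, utf=False):
    """Render code points as the text describes (illustrative sketch)."""
    out = []
    for c in code_points:
        if 32 <= c <= 126:
            out.append(chr(c))            # printing character, shown as is
        elif c < 256 and not utf:
            out.append("\\x%02x" % c)     # \xhh form
        else:
            out.append("\\x{%02x}" % c)   # \x{hh...} form
    return "".join(out)

print(show_chars([97, 9, 98]))
print(show_chars([97, 0x100]))
print(show_chars([97, 9, 98], utf=True))
```

       A locale-aware version would replace the 32-126 range test with a call
       to an isprint()-style check, as described above.
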

SAVING AND RESTORING COMPILED PATTERNS

       It is possible to save compiled patterns on disc or elsewhere, and  re-
       load  them  later, subject to a number of restrictions. JIT data cannot
       be saved. The host on which the patterns are reloaded must  be  running
       the same version of PCRE2, with the same code unit width, and must also
       have  the  same  endianness,  pointer width and PCRE2_SIZE type. Before
       compiled patterns can be saved they must be serialized, that  is,  con-
       verted  to a stream of bytes. A single byte stream may contain any num-
       ber of compiled patterns, but they must all use the same character  ta-
       bles.  A  single copy of the tables is included in the byte stream (its
       size is 1088 bytes).

       The functions whose names begin with pcre2_serialize_ are used for  se-
       rializing  and de-serializing. They are described in the pcre2serialize
       documentation. In this section we describe the  features  of  pcre2test
       that can be used to test these functions.

       Note  that  "serialization" in PCRE2 does not convert compiled patterns
       to an abstract format like Java or .NET. It  just  makes  a  reloadable
       byte code stream.  Hence the restrictions on reloading mentioned above.

       In pcre2test, when a pattern with the push modifier is successfully com-
       piled, it is pushed onto a stack of compiled  patterns,  and  pcre2test
       expects  the next line to contain a new pattern (or command) instead of
       a subject line. By contrast, the pushcopy modifier causes a copy of the
       compiled pattern to be stacked, leaving the original available for  im-
       mediate  matching.  By using push and/or pushcopy, a number of patterns
       can be compiled and retained. These  modifiers  are  incompatible  with
       posix, and control modifiers that act at match time are ignored (with a
       message)  for the stacked patterns. The jitverify modifier applies only
       at compile time.

       The command

         #save <filename>

       causes all the stacked patterns to be serialized and the result written
       to the named file. Afterwards, all the stacked patterns are freed.  The
       command

         #load <filename>

       reads  the  data in the file, and then arranges for it to be de-serial-
       ized, with the resulting compiled patterns added to the pattern  stack.
       The  pattern  on the top of the stack can be retrieved by the #pop com-
       mand, which must be followed by  lines  of  subjects  that  are  to  be
       matched  with  the pattern, terminated as usual by an empty line or end
       of file. This command may be followed by  a  modifier  list  containing
       only  control  modifiers that act after a pattern has been compiled. In
       particular, hex, posix, posix_nosub, push, and  pushcopy  are  not  al-
       lowed,  nor  are  any option-setting modifiers.  The JIT modifiers are,
       however, permitted. Here is an example that saves and reloads two  pat-
       terns.

         /abc/push
         /xyz/push
         #save tempfile
         #load tempfile
         #pop info
         xyz

         #pop jit,bincode
         abc

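       The stack behaviour in this example, where the first #pop retrieves
       the pattern that was pushed last, can be pictured with a toy Python
       model. Everything here is hypothetical: the class and method names are
       invented, patterns are plain strings instead of compiled code, and the
       saved data is held in memory instead of being written to a file. The
       order in which reloaded patterns go back on the stack is inferred from
       the subject lines in the example.

```python
class PatternStack:
    """Toy model of the compiled-pattern stack (names are hypothetical)."""

    def __init__(self):
        self.stack = []

    def push(self, pattern):
        self.stack.append(pattern)

    def save(self):
        # Serialize every stacked pattern, then free the stack.
        data = "\n".join(self.stack).encode()
        self.stack.clear()
        return data

    def load(self, data):
        # De-serialize and add the patterns back onto the stack.
        if data:
            self.stack.extend(data.decode().split("\n"))

    def pop(self):
        # Retrieve (and remove) the pattern on the top of the stack.
        return self.stack.pop()

s = PatternStack()
s.push("/abc/")
s.push("/xyz/")
saved = s.save()
s.load(saved)
first, second = s.pop(), s.pop()
print(first, second)
```

       A #popcopy-style operation in this model would return the last list
       element without removing it.
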
       If  jitverify  is  used with #pop, it does not automatically imply jit,
       which is different behaviour from when it is used on a pattern.

       The #popcopy command is analogous to the pushcopy modifier in  that  it
       makes current a copy of the topmost stack pattern, leaving the original
       still on the stack.


SEE ALSO

       pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3), pcre2matching(3),
       pcre2partial(3), pcre2pattern(3), pcre2serialize(3).


AUTHOR

       Philip Hazel
       Retired from University Computing Service
       Cambridge, England.


REVISION

       Last updated: 24 April 2024
       Copyright (c) 1997-2024 University of Cambridge.


PCRE 10.44                       24 April 2024                    PCRE2TEST(1)