<html>
<head>
<title>pcre2syntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2syntax man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">BRACED ITEMS</a>
<li><a name="TOC4" href="#SEC4">ESCAPED CHARACTERS</a>
<li><a name="TOC5" href="#SEC5">CHARACTER TYPES</a>
<li><a name="TOC6" href="#SEC6">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC7" href="#SEC7">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC8" href="#SEC8">BINARY PROPERTIES FOR \p AND \P</a>
<li><a name="TOC9" href="#SEC9">SCRIPT MATCHING WITH \p AND \P</a>
<li><a name="TOC10" href="#SEC10">THE BIDI_CLASS PROPERTY FOR \p AND \P</a>
<li><a name="TOC11" href="#SEC11">CHARACTER CLASSES</a>
<li><a name="TOC12" href="#SEC12">QUANTIFIERS</a>
<li><a name="TOC13" href="#SEC13">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC14" href="#SEC14">REPORTED MATCH POINT SETTING</a>
<li><a name="TOC15" href="#SEC15">ALTERNATION</a>
<li><a name="TOC16" href="#SEC16">CAPTURING</a>
<li><a name="TOC17" href="#SEC17">ATOMIC GROUPS</a>
<li><a name="TOC18" href="#SEC18">COMMENT</a>
<li><a name="TOC19" href="#SEC19">OPTION SETTING</a>
<li><a name="TOC20" href="#SEC20">NEWLINE CONVENTION</a>
<li><a name="TOC21" href="#SEC21">WHAT \R MATCHES</a>
<li><a name="TOC22" href="#SEC22">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC23" href="#SEC23">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
<li><a name="TOC24" href="#SEC24">SCRIPT RUNS</a>
<li><a name="TOC25" href="#SEC25">BACKREFERENCES</a>
<li><a name="TOC26" href="#SEC26">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC27" href="#SEC27">CONDITIONAL PATTERNS</a>
<li><a name="TOC28" href="#SEC28">BACKTRACKING CONTROL</a>
<li><a name="TOC29" href="#SEC29">CALLOUTS</a>
<li><a name="TOC30" href="#SEC30">SEE ALSO</a>
<li><a name="TOC31" href="#SEC31">AUTHOR</a>
<li><a name="TOC32" href="#SEC32">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE2 are described in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation. This document contains a quick-reference summary of the syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</pre>
Note that white space inside \Q...\E is always treated as literal, even if
PCRE2_EXTENDED is set (which causes most other white space to be ignored).
</P>
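<P>
A minimal sketch of the first rule above, in C: the hypothetical helper escape_literal() builds a pattern that matches a given string literally by putting a backslash before every ASCII non-alphanumeric character. Bytes of 0x80 and above are copied unchanged, on the assumption that the pattern may be compiled in UTF-8 mode, where a backslash inserted between the bytes of one character would corrupt it. Wrapping the text in \Q...\E is the other option, but a \E inside the text would then need separate handling.
</P>

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated pattern that matches
   `lit` literally. Every ASCII non-alphanumeric character is preceded
   by a backslash; bytes >= 0x80 are copied as they are so that UTF-8
   sequences stay intact. The caller frees the result.
   Returns NULL if allocation fails. */
char *escape_literal(const char *lit)
{
    size_t n = strlen(lit);
    char *out = malloc(2 * n + 1);      /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const unsigned char *s = (const unsigned char *)lit; *s != '\0'; s++) {
        if (*s < 0x80 && !isalnum(*s))
            *p++ = '\\';                /* \x with x non-alphanumeric is literal x */
        *p++ = (char)*s;
    }
    *p = '\0';
    return out;
}
```

<P>
For example, escape_literal("1+1=2?") yields 1\+1\=2\? as the pattern text.
</P>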
<br><a name="SEC3" href="#TOC1">BRACED ITEMS</a><br>
<P>
With one exception, wherever brace characters { and } are required to enclose
data for constructions such as \g{2} or \k{name}, space and/or horizontal tab
characters that follow { or precede } are allowed and are ignored. In the case
of quantifiers, they may also appear before or after the comma. The exception
is \u{...}, which is not Perl-compatible and is recognized only when
PCRE2_EXTRA_ALT_BSUX is set. This is an ECMAScript compatibility feature and
follows ECMAScript's behaviour.
</P>
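<P>
The brace rules for quantifiers can be illustrated with a small C parser. It is a hypothetical sketch, not PCRE2's own code: it accepts {n}, {n,m}, {n,} and {,m}, skips spaces and tabs after {, before }, and on either side of the comma, and, on the assumption that {,} is not treated as a quantifier, rejects it. It does not check for overflow or for PCRE2's upper limit on repeat counts.
</P>

```c
#include <ctype.h>
#include <stdbool.h>

/* Skip the blanks (space, horizontal tab) that the brace rule allows. */
static const char *skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* Read an optional run of decimal digits (no overflow check). */
static const char *read_number(const char *s, long *out, bool *found)
{
    long v = 0;
    *found = false;
    while (isdigit((unsigned char)*s)) {
        v = v * 10 + (*s - '0');
        *found = true;
        s++;
    }
    *out = v;
    return s;
}

/* Hypothetical parser for a quantifier starting at '{'. On success it
   stores the bounds (*max == -1 means "no upper bound"), points *end
   just past the closing '}', and returns true. */
bool parse_quantifier(const char *s, long *min, long *max, const char **end)
{
    bool have_min, have_max;
    long lo, hi;

    if (*s != '{')
        return false;
    s = skip_blanks(s + 1);                 /* blanks after '{' */
    s = read_number(s, &lo, &have_min);
    s = skip_blanks(s);                     /* blanks before ',' or '}' */

    if (*s == '}') {                        /* {n} */
        if (!have_min)
            return false;
        *min = *max = lo;
        *end = s + 1;
        return true;
    }
    if (*s != ',')
        return false;

    s = skip_blanks(s + 1);                 /* blanks after ',' */
    s = read_number(s, &hi, &have_max);
    s = skip_blanks(s);                     /* blanks before '}' */
    if (*s != '}' || (!have_min && !have_max))
        return false;                       /* "{,}" is not a quantifier */

    *min = have_min ? lo : 0;               /* {,m} means {0,m} */
    *max = have_max ? hi : -1;              /* {n,} has no upper bound */
    if (*max != -1 && *max < *min)
        return false;                       /* out of order: a real compiler reports an error */
    *end = s + 1;
    return true;
}
```

<P>
With this sketch, "{ 2 , 5 }" gives the bounds 2 and 5, and "{,4}" gives 0 and 4.
</P>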
<br><a name="SEC4" href="#TOC1">ESCAPED CHARACTERS</a><br>
<P>
This table applies to ASCII and Unicode environments. An unrecognized escape
sequence causes an error.
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is a non-control ASCII character
  \e         escape (hex 1B)
  \f         form feed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \0dd       character with octal code 0dd
  \ddd       character with octal code ddd, or backreference
  \o{ddd..}  character with octal code ddd..
  \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
  \xhh       character with hex code hh
  \x{hh..}   character with hex code hh..
</pre>
If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
following are also recognized:
<pre>
  \U         the character "U"
  \uhhhh     character with hex code hhhh
  \u{hh..}   character with hex code hh.. (EXTRA_ALT_BSUX only)
</pre>
When \x is not followed by {, from zero to two hexadecimal digits are read,
but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be
recognized as a hexadecimal escape; otherwise it matches a literal "x".
Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits
or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
matches a literal "u".
</P>
<P>
Note that \0dd is always an octal code. The treatment of backslash followed by
a non-zero digit is complicated; for details see the section
<a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation, where details of escape processing in EBCDIC environments are
also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
supported in EBCDIC environments. Note that \N not followed by an opening
curly bracket has a different meaning (see below).
</P>
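<P>
The table describes \cx only briefly. The pcre2pattern documentation gives the rule for ASCII environments: if x is a lower case letter it is first converted to upper case, and then bit 6 (hex 40) is inverted; an x below 32 or above 126 is a compile-time error. The C function below is a sketch of that rule; its name and its -1 error return are illustrative choices, and EBCDIC environments are not covered.
</P>

```c
/* Sketch of the ASCII rule for \cx: returns the resulting character
   code, or -1 where PCRE2 would report a compile-time error. */
int control_escape(int x)
{
    if (x < 32 || x > 126)
        return -1;                  /* only printing ASCII characters allowed */
    if (x >= 'a' && x <= 'z')
        x -= 'a' - 'A';             /* lower case letter -> upper case */
    return x ^ 0x40;                /* invert bit 6 */
}
```

<P>
So \cA is hex 01 and \cz is hex 1A, while \c{ is hex 3B and \c; is hex 7B.
</P>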
<br><a name="SEC5" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one code unit, even in UTF mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal white space character
  \H         a character that is not a horizontal white space character
  \N         a character that is not a newline
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a white space character
  \S         a character that is not a white space character
  \v         a vertical white space character
  \V         a character that is not a vertical white space character
  \w         a "word" character
  \W         a "non-word" character
  \X         a Unicode extended grapheme cluster
</pre>
\C is dangerous because it may leave the current matching point in the middle
of a UTF-8 or UTF-16 character. The application can lock out the use of \C by
setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2
with the use of \C permanently disabled.
</P>
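<P>
To see why \C is risky, note that in UTF-8 a character such as "é" occupies two code units, and that continuation bytes always have the bit pattern 10xxxxxx. The hypothetical C helper below tests whether a byte offset falls on a character boundary; if \C consumed only the first byte of "é", the match would continue from an offset for which this test fails.
</P>

```c
#include <stdbool.h>
#include <stddef.h>

/* True if byte offset `off` in the UTF-8 string `s` (length `len`)
   is the start of a character or the end of the string.
   UTF-8 continuation bytes have the form 10xxxxxx. */
bool at_utf8_boundary(const unsigned char *s, size_t len, size_t off)
{
    if (off >= len)
        return off == len;
    return (s[off] & 0xC0) != 0x80;
}
```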
<P>
By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
happening, \s and \w may also match characters with code points in the range
128-255. If the PCRE2_UCP option is set, the behaviour of these escape
sequences is changed to use Unicode properties and they match many more
characters, but option settings such as (?aD), (?aS), and (?aW) can restrict
individual sequences to matching only ASCII characters.
</P>
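<P>
As a sketch, assuming the default character tables (no locale-specific tables) and PCRE2_UCP not set, the three escapes reduce to the following C predicates on a code point. The function names are hypothetical; \s here covers space plus the characters HT, LF, VT, FF, and CR (hex 09 to 0D).
</P>

```c
#include <stdbool.h>
#include <stdint.h>

/* \d without UCP: ASCII digits only. */
bool ascii_digit(uint32_t c)
{
    return c >= '0' && c <= '9';
}

/* \s without UCP: space, HT, LF, VT, FF, CR. */
bool ascii_space(uint32_t c)
{
    return c == ' ' || (c >= 0x09 && c <= 0x0D);
}

/* \w without UCP: ASCII letters, digits, and underscore. */
bool ascii_word(uint32_t c)
{
    return ascii_digit(c) || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') || c == '_';
}
```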
<P>
Property descriptions in \p and \P are matched caselessly; hyphens,
underscores, and white space are ignored, in accordance with Unicode's "loose
matching" rules.
</P>
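<P>
The loose matching rule can be sketched in C as a comparison that skips hyphens, underscores, and white space and ignores case. This hypothetical helper only compares two ASCII names; it does not know which property names PCRE2 actually recognizes.
</P>

```c
#include <ctype.h>
#include <stdbool.h>

/* Skip the characters that loose matching ignores. */
static const char *skip_ignorable(const char *s)
{
    while (*s == '-' || *s == '_' || isspace((unsigned char)*s))
        s++;
    return s;
}

/* Compare two property names, ignoring case, hyphens, underscores,
   and white space. */
bool loose_equal(const char *a, const char *b)
{
    for (;;) {
        a = skip_ignorable(a);
        b = skip_ignorable(b);
        if (*a == '\0' || *b == '\0')
            return *a == *b;            /* equal only if both ended */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
}
```

<P>
Under this rule, Bidi_Class, bidi-class, and BIDI CLASS all compare equal.
</P>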
<br><a name="SEC6" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  Lc         Ll, Lu, or Lt
  L&         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
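<P>
The one-letter names above cover every two-letter category that starts with the same letter, and Lc and L&#38; are a narrower union of three letter categories. A C sketch of that relationship, assuming the character's two-letter category has already been looked up elsewhere and that names are given in their canonical case (loose matching is left out); the function name is hypothetical.
</P>

```c
#include <stdbool.h>
#include <string.h>

/* Does a character whose two-letter general category is `gc`
   (for example "Lu") satisfy \p{query}? */
bool gc_matches(const char *query, const char *gc)
{
    /* Lc and L& mean: Ll, Lu, or Lt. */
    if (strcmp(query, "Lc") == 0 || strcmp(query, "L&") == 0)
        return strcmp(gc, "Ll") == 0 || strcmp(gc, "Lu") == 0 ||
               strcmp(gc, "Lt") == 0;
    /* A one-letter name covers every category with that first letter. */
    if (strlen(query) == 1)
        return query[0] == gc[0];
    return strcmp(query, gc) == 0;
}
```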
<br><a name="SEC7" href="#TOC1">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
  Xuc        Universally-named character: one that can be
               represented by a Universal Character Name
  Xwd        Perl word: property Xan or underscore
</pre>
Perl and POSIX space are now the same. Perl added VT to its space character set
at release 5.18.
</P>
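<P>
Expressed in C, and again assuming the character's two-letter general category is supplied by the caller, the Xan, Xwd, and Xsp (equivalently Xps) definitions in the table look like this. The function names are hypothetical.
</P>

```c
#include <stdbool.h>
#include <stdint.h>

/* Xan: property L or N. */
bool x_an(const char *gc)
{
    return gc[0] == 'L' || gc[0] == 'N';
}

/* Xwd: Xan or underscore. */
bool x_wd(uint32_t c, const char *gc)
{
    return x_an(gc) || c == '_';
}

/* Xsp and Xps: property Z, or tab, NL, VT, FF, CR (hex 09 to 0D). */
bool x_sp(uint32_t c, const char *gc)
{
    return gc[0] == 'Z' || (c >= 0x09 && c <= 0x0D);
}
```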
<br><a name="SEC8" href="#TOC1">BINARY PROPERTIES FOR \p AND \P</a><br>
<P>
Unicode defines a number of binary properties, that is, properties whose only
values are true or false. You can obtain a list of those that are recognized by
\p and \P, along with their abbreviations, by running this command:
<pre>
  pcre2test -LP
</PRE>
</P>
<br><a name="SEC9" href="#TOC1">SCRIPT MATCHING WITH \p AND \P</a><br>
<P>
Many script names and their 4-letter abbreviations are recognized in
\p{sc:...} or \p{scx:...} items, or on their own with \p (and also \P of
course). You can obtain a list of these scripts by running this command:
<pre>
  pcre2test -LS
</PRE>
</P>
<br><a name="SEC10" href="#TOC1">THE BIDI_CLASS PROPERTY FOR \p AND \P</a><br>
<P>
<pre>
  \p{Bidi_Class:&#60;class&#62;}   matches a character with the given class
  \p{BC:&#60;class&#62;}           matches a character with the given class
</pre>
The recognized classes are:
<pre>
  AL          Arabic letter
  AN          Arabic number
  B           paragraph separator
  BN          boundary neutral
  CS          common separator
  EN          European number
  ES          European separator
  ET          European terminator
  FSI         first strong isolate
  L           left-to-right
  LRE         left-to-right embedding
  LRI         left-to-right isolate
  LRO         left-to-right override
  NSM         non-spacing mark
  ON          other neutral
  PDF         pop directional format
  PDI         pop directional isolate
  R           right-to-left
  RLE         right-to-left embedding
  RLI         right-to-left isolate
  RLO         right-to-left override
  S           segment separator
  WS          white space
</PRE>
</P>
<br><a name="SEC11" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE2, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE2_UCP is set. You can use
\Q...\E inside a character class.
</P>
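<P>
In the default (non-UCP) case, the POSIX names correspond closely to the C library's classification functions in the "C" locale. The sketch below assumes that locale and maps each name to such a function; posix_class_match() is a hypothetical name, and a negated set [[:^xxx:]] matches exactly the characters for which it returns false.
</P>

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static int is_ascii(int c) { return c >= 0 && c <= 127; }
static int is_word(int c)  { return isalnum(c) || c == '_'; }

/* Name-to-predicate table, assuming the default "C" locale. */
static const struct {
    const char *name;
    int (*pred)(int);
} classes[] = {
    {"alnum", isalnum}, {"alpha", isalpha}, {"ascii", is_ascii},
    {"blank", isblank}, {"cntrl", iscntrl}, {"digit", isdigit},
    {"graph", isgraph}, {"lower", islower}, {"print", isprint},
    {"punct", ispunct}, {"space", isspace}, {"upper", isupper},
    {"word", is_word},  {"xdigit", isxdigit},
};

/* Does character c belong to [[:name:]] in non-UCP mode?
   Unknown names and non-ASCII characters give false. */
bool posix_class_match(const char *name, int c)
{
    if (c < 0 || c > 127)
        return false;                   /* ASCII only by default */
    for (size_t i = 0; i < sizeof classes / sizeof classes[0]; i++)
        if (strcmp(classes[i].name, name) == 0)
            return classes[i].pred(c) != 0;
    return false;
}
```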
<br><a name="SEC12" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
  {,m}        zero up to m, greedy
  {,m}+       zero up to m, possessive
  {,m}?       zero up to m, lazy
</PRE>
</P>
<br><a name="SEC13" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
                also after an internal newline in multiline mode
                (after any newline if PCRE2_ALT_CIRCUMFLEX is set)
  \A          start of subject
  $           end of subject
                also before newline at end of subject
                also before internal newline in multiline mode
  \Z          end of subject
                also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
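<P>
A C sketch of where the end-of-subject assertions can succeed, assuming the newline convention is a single LF and that PCRE2_DOLLAR_ENDONLY is not set. The function names are hypothetical; each one takes a position between characters, from 0 to the subject length.
</P>

```c
#include <stdbool.h>
#include <stddef.h>

/* \z: only at the very end of the subject. */
bool end_z_matches(size_t len, size_t pos)
{
    return pos == len;
}

/* \Z, and $ outside multiline mode: at the end, or just before a
   newline that is the last character of the subject. */
bool end_Z_matches(const char *s, size_t len, size_t pos)
{
    return pos == len || (pos + 1 == len && s[pos] == '\n');
}

/* $ in multiline mode: at the end, or before any newline. */
bool dollar_multiline_matches(const char *s, size_t len, size_t pos)
{
    return pos == len || (pos < len && s[pos] == '\n');
}
```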
<br><a name="SEC14" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
<P>
<pre>
  \K          set reported start of match
</pre>
From release 10.38, \K is not permitted by default in lookaround assertions,
for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
option is set, the previous behaviour is re-enabled. When this option is set,
\K is honoured in positive assertions, but ignored in negative ones.
</P>
<br><a name="SEC15" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
<br><a name="SEC16" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capture group
  (?&#60;name&#62;...)    named capture group (Perl)
  (?'name'...)    named capture group (Perl)
  (?P&#60;name&#62;...)   named capture group (Python)
  (?:...)         non-capture group
  (?|...)         non-capture group; reset group numbers for
                   capture groups in each alternative
</pre>
In non-UTF modes, names may contain underscores and ASCII letters and digits;
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
both cases, a name must not start with a digit.
</P>
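<P>
A C sketch of the non-UTF naming rule, assuming the "C" locale so that isalpha() and isalnum() accept only ASCII letters and digits. The function name is hypothetical, and any maximum name length is ignored.
</P>

```c
#include <ctype.h>
#include <stdbool.h>

/* Non-UTF rule: ASCII letters, ASCII digits, and underscore only,
   and the first character must not be a digit. */
bool valid_group_name_ascii(const char *name)
{
    /* First character: letter or underscore (this also rejects ""). */
    if (!(isalpha((unsigned char)name[0]) || name[0] == '_'))
        return false;
    for (const char *p = name + 1; *p != '\0'; p++)
        if (!(isalnum((unsigned char)*p) || *p == '_'))
            return false;
    return true;
}
```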
<br><a name="SEC17" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&#62;...)         atomic non-capture group
  (*atomic:...)   atomic non-capture group
</PRE>
</P>
<br><a name="SEC18" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC19" href="#TOC1">OPTION SETTING</a><br>
<P>
Changes of these options within a group are automatically cancelled at the end
of the group.
<pre>
  (?a)            all ASCII options
  (?aD)           restrict \d to ASCII in UCP mode
  (?aS)           restrict \s to ASCII in UCP mode
  (?aW)           restrict \w to ASCII in UCP mode
  (?aP)           restrict all POSIX classes to ASCII in UCP mode
  (?aT)           restrict POSIX digit classes to ASCII in UCP mode
  (?i)            caseless
  (?J)            allow duplicate named groups
  (?m)            multiline
  (?n)            no auto capture
  (?r)            restrict caseless to either ASCII or non-ASCII
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            ignore white space except in classes or \Q...\E
  (?xx)           as (?x) but also ignore space and tab in classes
  (?-...)         unset the given option(s)
  (?^)            unset imnrsx options
</pre>
(?aP) implies (?aT) as well, though this has no additional effect. However, it
means that (?-aP) is really (?-PT), which disables all ASCII restrictions for
POSIX classes.
</P>
<P>
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but not unsetting) is allowed after (?^, for example
(?^in). An option setting may appear at the start of a non-capture group, for
example (?i:...).
</P>
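<P>
The syntax rules in this paragraph can be modelled in C. The sketch below is hypothetical: it uses its own bit flags rather than PCRE2's option bits and handles only i, m, n, r, s, x, and xx, but it applies the stated rules that there may be only one hyphen, that unsetting x or xx unsets both, and that only setting is allowed after ^.
</P>

```c
#include <stdbool.h>

/* Hypothetical bit flags for a subset of the inline options
   (not PCRE2's real option bits). */
enum {
    OPT_I  = 1u << 0,   /* (?i) caseless */
    OPT_M  = 1u << 1,   /* (?m) multiline */
    OPT_N  = 1u << 2,   /* (?n) no auto capture */
    OPT_R  = 1u << 3,   /* (?r) restricted caseless */
    OPT_S  = 1u << 4,   /* (?s) dotall */
    OPT_X  = 1u << 5,   /* (?x) extended */
    OPT_XX = 1u << 6    /* (?xx) extended, also in classes */
};

#define CARET_CLEARS (OPT_I | OPT_M | OPT_N | OPT_R | OPT_S | OPT_X | OPT_XX)

/* Applies the text between "(?" and ")" (for example "i-x" or "^s")
   to *opts. Returns false, leaving *opts unchanged, on a syntax error. */
bool apply_inline_options(unsigned *opts, const char *spec)
{
    unsigned set = 0, unset = 0;
    bool unsetting = false, caret = false;

    if (*spec == '^') {                 /* (?^ first unsets imnrsx */
        caret = true;
        spec++;
    }
    for (; *spec != '\0'; spec++) {
        unsigned bit;
        switch (*spec) {
        case '-':
            if (unsetting || caret)     /* only one hyphen; none after ^ */
                return false;
            unsetting = true;
            continue;
        case 'i': bit = OPT_I; break;
        case 'm': bit = OPT_M; break;
        case 'n': bit = OPT_N; break;
        case 'r': bit = OPT_R; break;
        case 's': bit = OPT_S; break;
        case 'x':
            bit = OPT_X;
            if (spec[1] == 'x') {       /* "xx" */
                bit |= OPT_XX;
                spec++;
            }
            if (unsetting)              /* unsetting x or xx unsets both */
                bit = OPT_X | OPT_XX;
            break;
        default:
            return false;               /* letter outside this sketch */
        }
        if (unsetting)
            unset |= bit;
        else
            set |= bit;
    }

    unsigned result = *opts;
    if (caret)
        result &= ~(unsigned)CARET_CLEARS;
    *opts = (result | set) & ~unset;
    return true;
}
```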
<P>
The following are recognized only at the very start of a pattern or after one
of the newline or \R options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number.
<pre>
  (*LIMIT_DEPTH=d) set the backtracking depth limit to d
  (*LIMIT_HEAP=d)  set the heap size limit to d * 1024 bytes
  (*LIMIT_MATCH=d) set the match limit to d
  (*NOTEMPTY)      set PCRE2_NOTEMPTY when matching
  (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
  (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
  (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR)
  (*NO_JIT)       disable JIT optimization
  (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
  (*UTF)          set appropriate UTF mode for the library in use
  (*UCP)          set PCRE2_UCP (use Unicode properties for \d etc.)
</pre>
Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the value of
the limits set by the caller of <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>,
not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
application can lock out the use of (*UTF) and (*UCP) by setting the
PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
</P>
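<P>
The limit rules amount to taking a minimum, plus a unit conversion for the heap limit. A C sketch with hypothetical function names, assuming the caller's limit is already known:
</P>

```c
#include <stdbool.h>
#include <stdint.h>

/* A (*LIMIT_xxx=d) item can only lower the caller's limit. */
uint32_t effective_limit(uint32_t caller_limit, bool in_pattern,
                         uint32_t pattern_value)
{
    if (in_pattern && pattern_value < caller_limit)
        return pattern_value;
    return caller_limit;
}

/* (*LIMIT_HEAP=d) counts in units of 1024 bytes. */
uint64_t heap_limit_bytes(uint32_t d)
{
    return (uint64_t)d * 1024u;
}
```

<P>
(*LIMIT_MATCH=5000) therefore lowers a caller limit of 10000000 to 5000, but cannot raise a caller limit of 1000.
</P>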
<br><a name="SEC20" href="#TOC1">NEWLINE CONVENTION</a><br>
<P>
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
<pre>
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
  (*NUL)          the NUL character (binary zero)
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
<pre>
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
</PRE>
</P>
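<P>
A sketch, in C, of how many characters \R consumes at a given position under each setting, working on an array of code points. It assumes the usual Unicode newline set for BSR_UNICODE: CR, LF, CRLF, VT, FF, NEL (hex 85), LS (hex 2028), and PS (hex 2029). CRLF is taken as a single two-character sequence; the function name is hypothetical.
</P>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns how many code points \R consumes at s[pos], or 0 if \R
   cannot match there. `unicode` selects BSR_UNICODE over BSR_ANYCRLF. */
size_t bsr_match_len(const uint32_t *s, size_t len, size_t pos, bool unicode)
{
    if (pos >= len)
        return 0;
    uint32_t c = s[pos];
    if (c == 0x0D)                                  /* CR, or CRLF as a unit */
        return (pos + 1 < len && s[pos + 1] == 0x0A) ? 2 : 1;
    if (c == 0x0A)                                  /* LF */
        return 1;
    if (unicode && (c == 0x0B || c == 0x0C || c == 0x85 ||
                    c == 0x2028 || c == 0x2029))    /* VT FF NEL LS PS */
        return 1;
    return 0;
}
```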
<br><a name="SEC22" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)                     )
  (*pla:...)                  ) positive lookahead
  (*positive_lookahead:...)   )

  (?!...)                     )
  (*nla:...)                  ) negative lookahead
  (*negative_lookahead:...)   )

  (?&#60;=...)                    )
  (*plb:...)                  ) positive lookbehind
  (*positive_lookbehind:...)  )

  (?&#60;!...)                    )
  (*nlb:...)                  ) negative lookbehind
  (*negative_lookbehind:...)  )
</pre>
Each top-level branch of a lookbehind must have an upper limit on the number of
characters it can match. If any branch can match a variable number of
characters, the maximum length of each branch is limited to a value that the
caller of <b>pcre2_compile()</b> can set; otherwise a default applies. The
default is chosen when PCRE2 is built (the ultimate default is 255). If every
branch matches a fixed number of characters, the limit for each branch is 65535
characters.
</P>
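<P>
The short forms (?=, (?!, (?&#60;= and (?&#60;! are also accepted, with the same meaning, by Python's re module. That module is used below only to show what each assertion checks; it is not PCRE2. It does not accept the (*pla:...) style names. It also requires each lookbehind to match one fixed number of characters, which is stricter than the PCRE2 rules above. The sample strings are made up.
</P>

```python
import re

# Each assertion tests its condition without consuming any characters.

# Positive lookahead (?=...): a word that is followed by ":".
pla = re.findall(r"\w+(?=:)", "key: value, other:x")
# Negative lookahead (?!...): a whole number that is not followed by "%".
nla = re.findall(r"\b\d+\b(?!%)", "10% of 250 is 25")
# Positive lookbehind (?<=...): digits preceded by "$".
plb = re.findall(r"(?<=\$)\d+", "cost $42 or 17")
# Negative lookbehind (?<!...): digits with no "-" immediately before them.
nlb = re.findall(r"(?<!-)\b\d+", "-5 and 7")

print(pla, nla, plb, nlb)  # ['key', 'other'] ['250', '25'] ['42'] ['7']
```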
<br><a name="SEC23" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
<P>
These assertions are specific to PCRE2 and are not Perl-compatible.
<pre>
  (?*...)                                )
  (*napla:...)                           ) non-atomic positive lookahead
  (*non_atomic_positive_lookahead:...)   )

  (?&#60;*...)                               )
  (*naplb:...)                           ) non-atomic positive lookbehind
  (*non_atomic_positive_lookbehind:...)  )
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">SCRIPT RUNS</a><br>
<P>
<pre>
  (*script_run:...)           ) script run, can be backtracked into
  (*sr:...)                   )

  (*atomic_script_run:...)    ) atomic script run
  (*asr:...)                  )
</PRE>
</P>
<br><a name="SEC25" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous with an octal escape)
  \gn             reference by number
  \g{n}           reference by number
  \g+n            relative reference by number (PCRE2 extension)
  \g-n            relative reference by number
  \g{+n}          relative reference by number (PCRE2 extension)
  \g{-n}          relative reference by number
  \k&#60;name&#62;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
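<P>
Two of the spellings above, \n and (?P=name), are also accepted by Python's re module and mean the same thing there. The other spellings in the table are PCRE2, Perl or .NET syntax that re does not accept in a pattern. The short Python sketch below illustrates the two shared forms. The sample strings are made up.
</P>

```python
import re

# \n (here \1): the text captured by group 1 must appear again; finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")

# (?P=name): the closing quote must be the same character as the opening one.
quoted = [m.group(0)
          for m in re.finditer(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' or \"bye\"")]

print(doubled.group(0))  # is is
print(quoted)            # ["'hi'", '"bye"']
```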
<br><a name="SEC26" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subroutine by absolute number
  (?+n)           call subroutine by relative number
  (?-n)           call subroutine by relative number
  (?&name)        call subroutine by name (Perl)
  (?P&#62;name)       call subroutine by name (Python)
  \g&#60;name&#62;        call subroutine by name (Oniguruma)
  \g'name'        call subroutine by name (Oniguruma)
  \g&#60;n&#62;           call subroutine by absolute number (Oniguruma)
  \g'n'           call subroutine by absolute number (Oniguruma)
  \g&#60;+n&#62;          call subroutine by relative number (PCRE2 extension)
  \g'+n'          call subroutine by relative number (PCRE2 extension)
  \g&#60;-n&#62;          call subroutine by relative number (PCRE2 extension)
  \g'-n'          call subroutine by relative number (PCRE2 extension)
</PRE>
</P>
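<P>
The classic use of (?R) is the pattern \((?:[^()]|(?R))*\), which matches a balanced parenthesized group. Python's re module has no subroutine calls. The hypothetical function below instead mirrors that pattern by hand: wherever the pattern says (?R), the function calls itself. It illustrates what recursion adds to a pattern and says nothing about how PCRE2 implements it.
</P>

```python
# Hand-written mirror of the recursive pattern \((?:[^()]|(?R))*\).
# Python's re module has no (?R), so recursion is done with a function call.
def match_parens(s: str, i: int = 0):
    """Return the index just past a balanced group starting at s[i], else None."""
    if i >= len(s) or s[i] != "(":
        return None               # the pattern must start with a literal "("
    i += 1
    while i < len(s):
        c = s[i]
        if c == ")":
            return i + 1          # the closing \) ends this level
        if c == "(":
            end = match_parens(s, i)  # this is where the pattern says (?R)
            if end is None:
                return None
            i = end
        else:
            i += 1                # [^()] consumes one ordinary character
    return None                   # input ended before the closing ")"
```

For example, `match_parens("(a(b)c)")` returns 7, the length of the whole string. `match_parens("(a(b c)")` returns None because the outer group is never closed.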
<br><a name="SEC27" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)               absolute reference condition
  (?(+n)              relative reference condition (PCRE2 extension)
  (?(-n)              relative reference condition (PCRE2 extension)
  (?(&#60;name&#62;)          named reference condition (Perl)
  (?('name')          named reference condition (Perl)
  (?(name)            named reference condition (PCRE2, deprecated)
  (?(R)               overall recursion condition
  (?(Rn)              specific numbered group recursion condition
  (?(R&name)          specific named group recursion condition
  (?(DEFINE)          define groups for reference
  (?(VERSION[&#62;]=n.m)  test PCRE2 version
  (?(assert)          assertion condition
</pre>
Note that (?(R) and (?(Rn) are ambiguous: they might be named reference
conditions or recursion tests. Such a condition is interpreted as a reference
condition if a group with the relevant name exists.
</P>
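<P>
The absolute reference condition (?(n)...) is also supported by Python's re module with the same meaning. The short sketch below uses re to illustrate it. The pattern accepts a word that is either bare or wrapped in double quotes, and rejects a word with an unbalanced quote. The sample strings are made up.
</P>

```python
import re

# (?(1)") means: if group 1 took part in the match, require a closing quote;
# otherwise require nothing.
pattern = re.compile(r'^(")?\w+(?(1)")$')

results = {s: bool(pattern.match(s)) for s in ['"abc"', 'abc', '"abc', 'abc"']}
print(results)  # {'"abc"': True, 'abc': True, '"abc': False, 'abc"': False}
```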
<br><a name="SEC28" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
All backtracking control verbs may be written in the form (*VERB:NAME). For
(*MARK) the name is mandatory; for the others it is optional. (*SKIP) changes
its behaviour if :NAME is present. The others just set a name for passing back
to the caller, but this is not a name that (*SKIP) can see. The following act
as soon as they are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
</pre>
The effect of one of these verbs in a group called as a subroutine is confined
to the subroutine call.
</P>
<br><a name="SEC29" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)            callout (assumed number 0)
  (?Cn)           callout with numerical data n
  (?C"text")      callout with string data
</pre>
The allowed string delimiters are ` ' " ^ % # $, each of which is used for both
the start and the end of the string, and {, which is closed by }. To include
the ending delimiter within the string, double it.
</P>
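<P>
The delimiter rules above can be illustrated with a small Python decoder. It is a hypothetical helper written only from this description and is not PCRE2's own parser. It takes the pattern text and the index of the opening delimiter. It returns the decoded string and the index just past the closing delimiter. A doubled ending delimiter becomes one literal copy.
</P>

```python
# Hypothetical decoder for a callout string argument, following the rules
# described above; PCRE2's own parser is written in C and is not shown here.
CLOSING = {"`": "`", "'": "'", '"': '"', "^": "^", "%": "%",
           "#": "#", "$": "$", "{": "}"}


def parse_callout_string(src: str, i: int):
    """Decode the string whose opening delimiter is src[i].

    Returns (decoded_text, index just past the closing delimiter).
    """
    close = CLOSING.get(src[i])
    if close is None:
        raise ValueError(f"{src[i]!r} is not a callout string delimiter")
    out = []
    i += 1
    while i < len(src):
        if src[i] == close:
            if src.startswith(close * 2, i):
                out.append(close)       # doubled ending delimiter = one literal copy
                i += 2
                continue
            return "".join(out), i + 1  # a single ending delimiter ends the string
        out.append(src[i])
        i += 1
    raise ValueError("unterminated callout string")
```

For example, `parse_callout_string('(?C"say ""hi""")', 3)` returns `('say "hi"', 15)`, where index 15 is the parenthesis that closes the callout.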
<br><a name="SEC30" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
<b>pcre2matching</b>(3), <b>pcre2</b>(3).
</P>
<br><a name="SEC31" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
Last updated: 12 October 2023
<br>
Copyright &copy; 1997-2023 University of Cambridge.
<br>