<html>
<head>
<title>pcre2syntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2syntax man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">BRACED ITEMS</a>
<li><a name="TOC4" href="#SEC4">ESCAPED CHARACTERS</a>
<li><a name="TOC5" href="#SEC5">CHARACTER TYPES</a>
<li><a name="TOC6" href="#SEC6">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC7" href="#SEC7">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC8" href="#SEC8">BINARY PROPERTIES FOR \p AND \P</a>
<li><a name="TOC9" href="#SEC9">SCRIPT MATCHING WITH \p AND \P</a>
<li><a name="TOC10" href="#SEC10">THE BIDI_CLASS PROPERTY FOR \p AND \P</a>
<li><a name="TOC11" href="#SEC11">CHARACTER CLASSES</a>
<li><a name="TOC12" href="#SEC12">QUANTIFIERS</a>
<li><a name="TOC13" href="#SEC13">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC14" href="#SEC14">REPORTED MATCH POINT SETTING</a>
<li><a name="TOC15" href="#SEC15">ALTERNATION</a>
<li><a name="TOC16" href="#SEC16">CAPTURING</a>
<li><a name="TOC17" href="#SEC17">ATOMIC GROUPS</a>
<li><a name="TOC18" href="#SEC18">COMMENT</a>
<li><a name="TOC19" href="#SEC19">OPTION SETTING</a>
<li><a name="TOC20" href="#SEC20">NEWLINE CONVENTION</a>
<li><a name="TOC21" href="#SEC21">WHAT \R MATCHES</a>
<li><a name="TOC22" href="#SEC22">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC23" href="#SEC23">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
<li><a name="TOC24" href="#SEC24">SCRIPT RUNS</a>
<li><a name="TOC25" href="#SEC25">BACKREFERENCES</a>
<li><a name="TOC26" href="#SEC26">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC27" href="#SEC27">CONDITIONAL PATTERNS</a>
<li><a name="TOC28" href="#SEC28">BACKTRACKING CONTROL</a>
<li><a name="TOC29" href="#SEC29">CALLOUTS</a>
<li><a name="TOC30" href="#SEC30">SEE ALSO</a>
<li><a name="TOC31" href="#SEC31">AUTHOR</a>
<li><a name="TOC32" href="#SEC32">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE2 are described in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation. This document contains a quick-reference summary of the syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</pre>
Note that white space inside \Q...\E is always treated as literal, even if
PCRE2_EXTENDED is set, causing most other white space to be ignored.
</P>
<br><a name="SEC3" href="#TOC1">BRACED ITEMS</a><br>
<P>
With one exception, wherever brace characters { and } are required to enclose
data for constructions such as \g{2} or \k{name}, space and/or horizontal tab
characters that follow { or precede } are allowed and are ignored. In the case
of quantifiers, they may also appear before or after the comma. The exception
is \u{...} which is not Perl-compatible and is recognized only when
PCRE2_EXTRA_ALT_BSUX is set. This is an ECMAScript compatibility feature, and
follows ECMAScript's behaviour.
</P>
<br><a name="SEC4" href="#TOC1">ESCAPED CHARACTERS</a><br>
<P>
This table applies to ASCII and Unicode environments. An unrecognized escape
sequence causes an error.
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is a non-control ASCII character
  \e         escape (hex 1B)
  \f         form feed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \0dd       character with octal code 0dd
  \ddd       character with octal code ddd, or backreference
  \o{ddd..}  character with octal code ddd..
  \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
  \xhh       character with hex code hh
  \x{hh..}   character with hex code hh..
</pre>
If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
following are also recognized:
<pre>
  \U         the character "U"
  \uhhhh     character with hex code hhhh
  \u{hh..}   character with hex code hh.. but only for EXTRA_ALT_BSUX
</pre>
When \x is not followed by {, from zero to two hexadecimal digits are read,
but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be
recognized as a hexadecimal escape; otherwise it matches a literal "x".
Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits
or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
matches a literal "u".
</P>
<P>
Note that \0dd is always an octal code. The treatment of backslash followed by
a non-zero digit is complicated; for details see the section
<a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation, where details of escape processing in EBCDIC environments are
also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
supported in EBCDIC environments. Note that \N not followed by an opening
curly bracket has a different meaning (see below).
</P>
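<P>
The hex and octal escapes in the table can be tried out with a small sketch.
It assumes Python's standard <b>re</b> module as a stand-in: that is a separate
engine, not PCRE2, and is used here only because it accepts the same spelling
for \xhh, \t, \n, \0dd and three-digit \ddd. The variable names are
illustrative.
</P>

```python
import re

# Assumption: Python's `re` stands in for PCRE2 here; only escapes that
# both engines spell the same way are used.
pattern = re.compile(r"\x41\t\101\n")  # hex 41, tab, octal 101, newline

# "A" has hex code 41 and octal code 101, so both escapes denote "A".
hex_tab_octal = pattern.fullmatch("A\tA\n") is not None

# \0dd is always read as octal: \012 is the newline character (decimal 10).
octal_newline = re.fullmatch(r"\012", "\n") is not None

print(hex_tab_octal, octal_newline)
```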
<br><a name="SEC5" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one code unit, even in UTF mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal white space character
  \H         a character that is not a horizontal white space character
  \N         a character that is not a newline
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a white space character
  \S         a character that is not a white space character
  \v         a vertical white space character
  \V         a character that is not a vertical white space character
  \w         a "word" character
  \W         a "non-word" character
  \X         a Unicode extended grapheme cluster
</pre>
\C is dangerous because it may leave the current matching point in the middle
of a UTF-8 or UTF-16 character. The application can lock out the use of \C by
setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2
with the use of \C permanently disabled.
</P>
<P>
By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
happening, \s and \w may also match characters with code points in the range
128-255. If the PCRE2_UCP option is set, the behaviour of these escape
sequences is changed to use Unicode properties and they match many more
characters, but there are some option settings that can restrict individual
sequences to matching only ASCII characters.
</P>
<P>
Property descriptions in \p and \P are matched caselessly; hyphens,
underscores, and white space are ignored, in accordance with Unicode's "loose
matching" rules.
</P>
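<P>
The difference between ASCII-only and Unicode-property matching for \d and \w
can be illustrated with a rough analogue. This sketch assumes Python's
standard <b>re</b> module, which is not PCRE2 and has the opposite default:
it is Unicode-aware unless the re.ASCII flag is given, whereas PCRE2 is
ASCII-only unless PCRE2_UCP is set.
</P>

```python
import re

# "caf" + e-acute, a space, "7", then U+0663 (ARABIC-INDIC DIGIT THREE).
subject = "caf\u00e9 7\u0663"

# ASCII-only behaviour, comparable to PCRE2 without PCRE2_UCP.
ascii_digits = re.findall(r"\d", subject, flags=re.ASCII)
ascii_words = re.findall(r"\w+", subject, flags=re.ASCII)

# Unicode-property behaviour, comparable to PCRE2 with PCRE2_UCP set.
unicode_digits = re.findall(r"\d", subject)
unicode_words = re.findall(r"\w+", subject)

print(ascii_digits, ascii_words)
print(unicode_digits, unicode_words)
```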
<br><a name="SEC6" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  Lc         Ll, Lu, or Lt
  L&         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
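<P>
To see which of the two-letter codes above applies to a given character, the
Unicode general category can be looked up directly. This sketch assumes
Python's standard <b>unicodedata</b> module rather than PCRE2 itself; the
helper names are hypothetical, and the grouped codes Lc and L& are never
returned by this lookup.
</P>

```python
import unicodedata

def general_category(ch):
    """Return the two-letter general category code, for example 'Lu'."""
    return unicodedata.category(ch)

def major_class(ch):
    """Return the one-letter class: L, M, N, P, S, Z or C."""
    return unicodedata.category(ch)[0]

# Print the code and the one-letter class for a few ASCII characters.
for ch in ["A", "a", "5", "_", "$", "+", " ", "\n"]:
    print(repr(ch), general_category(ch), major_class(ch))
```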
<br><a name="SEC7" href="#TOC1">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
  Xuc        Universally-named character: one that can be
               represented by a Universal Character Name
  Xwd        Perl word: property Xan or underscore
</pre>
Perl and POSIX space are now the same. Perl added VT to its space character set
at release 5.18.
</P>
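<P>
The definitions of Xan, Xsp/Xps and Xwd in the table can be modelled as simple
predicates. This is only a sketch of the definitions as stated above, written
with Python's standard <b>unicodedata</b> module; it is not how PCRE2
implements them, and the function names are hypothetical.
</P>

```python
import unicodedata

def is_xan(ch):
    """Xan: the character has general category L (letter) or N (number)."""
    return unicodedata.category(ch)[0] in "LN"

def is_xsp(ch):
    """Xsp and Xps: category Z, or one of tab, NL, VT, FF, CR."""
    return unicodedata.category(ch)[0] == "Z" or ch in "\t\n\v\f\r"

def is_xwd(ch):
    """Xwd: Xan or underscore."""
    return is_xan(ch) or ch == "_"

print(is_xan("5"), is_xan("_"), is_xwd("_"), is_xsp("\v"))
```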
<br><a name="SEC8" href="#TOC1">BINARY PROPERTIES FOR \p AND \P</a><br>
<P>
Unicode defines a number of binary properties, that is, properties whose only
values are true or false. You can obtain a list of those that are recognized by
\p and \P, along with their abbreviations, by running this command:
<pre>
  pcre2test -LP
</PRE>
</P>
<br><a name="SEC9" href="#TOC1">SCRIPT MATCHING WITH \p AND \P</a><br>
<P>
Many script names and their 4-letter abbreviations are recognized in
\p{sc:...} or \p{scx:...} items, or on their own with \p (and also \P of
course). You can obtain a list of these scripts by running this command:
<pre>
  pcre2test -LS
</PRE>
</P>
<br><a name="SEC10" href="#TOC1">THE BIDI_CLASS PROPERTY FOR \p AND \P</a><br>
<P>
<pre>
  \p{Bidi_Class:&#60;class&#62;}   matches a character with the given class
  \p{BC:&#60;class&#62;}           matches a character with the given class
</pre>
The recognized classes are:
<pre>
  AL          Arabic letter
  AN          Arabic number
  B           paragraph separator
  BN          boundary neutral
  CS          common separator
  EN          European number
  ES          European separator
  ET          European terminator
  FSI         first strong isolate
  L           left-to-right
  LRE         left-to-right embedding
  LRI         left-to-right isolate
  LRO         left-to-right override
  NSM         non-spacing mark
  ON          other neutral
  PDF         pop directional format
  PDI         pop directional isolate
  R           right-to-left
  RLE         right-to-left embedding
  RLI         right-to-left isolate
  RLO         right-to-left override
  S           segment separator
  WS          white space
</PRE>
</P>
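<P>
The class abbreviations listed above are the standard Unicode Bidi_Class
values, so the class of a character can be checked outside PCRE2. This sketch
assumes Python's standard <b>unicodedata</b> module; the dictionary name is
illustrative.
</P>

```python
import unicodedata

# Map a few characters to their Bidi_Class abbreviation.
bidi_samples = {
    ch: unicodedata.bidirectional(ch)
    for ch in ["A", "1", " ", "\u05d0", "\u0627", "\u0661"]
}

# U+05D0 is HEBREW LETTER ALEF, U+0627 is ARABIC LETTER ALEF,
# U+0661 is ARABIC-INDIC DIGIT ONE.
for ch, bidi_class in bidi_samples.items():
    print("U+%04X" % ord(ch), bidi_class)
```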
<br><a name="SEC11" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE2, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE2_UCP is set. You can use
\Q...\E inside a character class.
</P>
<br><a name="SEC12" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
  {,m}        zero up to m, greedy
  {,m}+       zero up to m, possessive
  {,m}?       zero up to m, lazy
</PRE>
</P>
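<P>
The practical difference between greedy and lazy quantifiers is easiest to see
on a short subject string. This sketch assumes Python's standard <b>re</b>
module as a stand-in for PCRE2; it covers only the greedy and lazy forms and
leaves out the possessive ones.
</P>

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as it can, so the match runs to the last ">".
greedy = re.match(r"<.+>", subject).group()

# Lazy: .+? takes as little as it can, so the match stops at the first ">".
lazy = re.match(r"<.+?>", subject).group()

numbers = "1 22 333 4444"
bounded_greedy = re.findall(r"\d{2,3}", numbers)  # prefers three digits
bounded_lazy = re.findall(r"\d{2,3}?", numbers)   # prefers two digits

# {,m} means zero up to m repetitions.
up_to_two = re.fullmatch(r"a{,2}", "aa") is not None

print(greedy, lazy, bounded_greedy, bounded_lazy, up_to_two)
```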
<br><a name="SEC13" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
                also after an internal newline in multiline mode
                (after any newline if PCRE2_ALT_CIRCUMFLEX is set)
  \A          start of subject
  $           end of subject
                also before newline at end of subject
                also before internal newline in multiline mode
  \Z          end of subject
                also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
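<P>
The effect of multiline mode on ^ and $ can be sketched as follows. The example
assumes Python's standard <b>re</b> module, not PCRE2, and uses only ^, $, \A
and \b, which it treats in the way the table describes. \Z and \z are left out
because Python's spelling of the end-of-subject anchors differs.
</P>

```python
import re

subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
start_default = re.findall(r"^\w+", subject)
# In multiline mode, ^ also matches after an internal newline.
start_multiline = re.findall(r"^\w+", subject, flags=re.MULTILINE)

# $ matches at the very end and also before a newline at the end.
end_default = re.findall(r"\w+$", subject)
# In multiline mode, $ also matches before an internal newline.
end_multiline = re.findall(r"\w+$", subject, flags=re.MULTILINE)

# \A is the start of the subject, regardless of multiline mode.
start_only = re.findall(r"\A\w+", subject, flags=re.MULTILINE)

# \b requires a word boundary on both sides of "cat".
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(start_default, start_multiline, end_default, end_multiline)
```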
<br><a name="SEC14" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
<P>
<pre>
  \K          set reported start of match
</pre>
From release 10.38 \K is not permitted by default in lookaround assertions,
for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
option is set, the previous behaviour is re-enabled. When this option is set,
\K is honoured in positive assertions, but ignored in negative ones.
</P>
<br><a name="SEC15" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
<br><a name="SEC16" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capture group
  (?&#60;name&#62;...)    named capture group (Perl)
  (?'name'...)    named capture group (Perl)
  (?P&#60;name&#62;...)   named capture group (Python)
  (?:...)         non-capture group
  (?|...)         non-capture group; reset group numbers for
                   capture groups in each alternative
</pre>
In non-UTF modes, names may contain underscores and ASCII letters and digits;
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
both cases, a name must not start with a digit.
</P>
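<P>
How numbered, named and non-capture groups interact can be shown with one
pattern. This sketch assumes Python's standard <b>re</b> module rather than
PCRE2, so it uses the Python-style (?P&#60;name&#62;...) spelling from the
table; the group name "year" is illustrative.
</P>

```python
import re

# Group 1 is named "year", group 2 is a plain capture group, and the
# optional day part is a non-capture group that gets no number.
pattern = re.compile(r"(?P<year>\d{4})-(\d{2})(?:-\d{2})?")

match = pattern.fullmatch("2023-10-12")
captured = match.groups()    # only the two capture groups appear
year = match.group("year")   # lookup by name
month = match.group(2)       # lookup by number

print(captured, year, month)
```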
<br><a name="SEC17" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&#62;...)         atomic non-capture group
  (*atomic:...)   atomic non-capture group
</PRE>
</P>
<br><a name="SEC18" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC19" href="#TOC1">OPTION SETTING</a><br>
<P>
Changes of these options within a group are automatically cancelled at the end
of the group.
<pre>
  (?a)            all ASCII options
  (?aD)           restrict \d to ASCII in UCP mode
  (?aS)           restrict \s to ASCII in UCP mode
  (?aW)           restrict \w to ASCII in UCP mode
  (?aP)           restrict all POSIX classes to ASCII in UCP mode
  (?aT)           restrict POSIX digit classes to ASCII in UCP mode
  (?i)            caseless
  (?J)            allow duplicate named groups
  (?m)            multiline
  (?n)            no auto capture
  (?r)            restrict caseless to either ASCII or non-ASCII
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            ignore white space except in classes or \Q...\E
  (?xx)           as (?x) but also ignore space and tab in classes
  (?-...)         unset the given option(s)
  (?^)            unset imnrsx options
</pre>
(?aP) implies (?aT) as well, though this has no additional effect. However, it
means that (?-aP) is really (?-PT) which disables all ASCII restrictions for
POSIX classes.
</P>
<P>
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but not unsetting) is allowed after (?^, for example
(?^in). An option setting may appear at the start of a non-capture group, for
example (?i:...).
</P>
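<P>
The scoping of inline options can be sketched with the few letters that
Python's standard <b>re</b> module shares with the table above (i, s and x).
Python's <b>re</b> is assumed here as a stand-in and is not PCRE2; most of the
other option letters listed above are not available in it.
</P>

```python
import re

# (?i) at the start makes the whole pattern caseless.
whole_pattern = re.fullmatch(r"(?i)abc", "ABC") is not None

# (?i:...) limits the option to the group; "a" and "c" stay case-sensitive.
scoped_hit = re.fullmatch(r"a(?i:b)c", "aBc") is not None
scoped_miss = re.fullmatch(r"a(?i:b)c", "aBC") is not None

# (?s) lets dot match a newline; without it, dot does not.
dotall_on = re.fullmatch(r"(?s)a.b", "a\nb") is not None
dotall_off = re.fullmatch(r"a.b", "a\nb") is not None

# (?x) ignores white space in the pattern outside character classes.
extended = re.fullmatch(r"(?x) a b c", "abc") is not None

print(whole_pattern, scoped_hit, scoped_miss, dotall_on, dotall_off, extended)
```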
<P>
The following are recognized only at the very start of a pattern or after one
of the newline or \R options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number.
<pre>
  (*LIMIT_DEPTH=d) set the backtracking limit to d
  (*LIMIT_HEAP=d)  set the heap size limit to d * 1024 bytes
  (*LIMIT_MATCH=d) set the match limit to d
  (*NOTEMPTY)      set PCRE2_NOTEMPTY when matching
  (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
  (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
  (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR)
  (*NO_JIT)       disable JIT optimization
  (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
  (*UTF)          set appropriate UTF mode for the library in use
  (*UCP)          set PCRE2_UCP (use Unicode properties for \d etc)
</pre>
Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the value of
the limits set by the caller of <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>,
not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
application can lock out the use of (*UTF) and (*UCP) by setting the
PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
</P>
<br><a name="SEC20" href="#TOC1">NEWLINE CONVENTION</a><br>
<P>
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
<pre>
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
  (*NUL)          the NUL character (binary zero)
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
<pre>
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC22" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)                     )
  (*pla:...)                  ) positive lookahead
  (*positive_lookahead:...)   )

  (?!...)                     )
  (*nla:...)                  ) negative lookahead
  (*negative_lookahead:...)   )

  (?&#60;=...)                    )
  (*plb:...)                  ) positive lookbehind
  (*positive_lookbehind:...)  )

  (?&#60;!...)                    )
  (*nlb:...)                  ) negative lookbehind
  (*negative_lookbehind:...)  )
</pre>
Each top-level branch of a lookbehind must have a limit for the number of
characters it matches. If any branch can match a variable number of characters,
the maximum for each branch is limited to a value set by the caller of
<b>pcre2_compile()</b> or defaulted. The default is set when PCRE2 is built
(ultimate default 255). If every branch matches a fixed number of characters,
the limit for each branch is 65535 characters.
</P>
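<P>
The four short forms of the assertions can be compared on one subject string.
This sketch assumes Python's standard <b>re</b> module, which is not PCRE2 and
accepts only the (?=, (?!, (?&#60;= and (?&#60;! spellings, with fixed-length
lookbehinds; the subject string is made up for illustration.
</P>

```python
import re

subject = "cost: $15, weight: 15kg, id 42"

# Positive lookahead: digits that are followed by "kg".
before_kg = re.findall(r"\d+(?=kg)", subject)

# Negative lookahead: digits not followed by another digit or by "k".
not_before_kg = re.findall(r"\d+(?![\dk])", subject)

# Positive lookbehind: digits that are preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", subject)

# Negative lookbehind: whole numbers that are not preceded by "$".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", subject)

print(before_kg, not_before_kg, after_dollar, not_after_dollar)
```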
<br><a name="SEC23" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
<P>
These assertions are specific to PCRE2 and are not Perl-compatible.
<pre>
  (?*...)                                )
  (*napla:...)                           ) synonyms
  (*non_atomic_positive_lookahead:...)   )

  (?&#60;*...)                               )
  (*naplb:...)                           ) synonyms
  (*non_atomic_positive_lookbehind:...)  )
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">SCRIPT RUNS</a><br>
<P>
<pre>
  (*script_run:...)           ) script run, can be backtracked into
  (*sr:...)                   )

  (*atomic_script_run:...)    ) atomic script run
  (*asr:...)                  )
</PRE>
</P>
<br><a name="SEC25" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g+n            relative reference by number (PCRE2 extension)
  \g-n            relative reference by number
  \g{+n}          relative reference by number (PCRE2 extension)
  \g{-n}          relative reference by number
  \k&#60;name&#62;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
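<P>
A backreference matches the same text that the referenced group captured, not
the group's pattern again. This sketch assumes Python's standard <b>re</b>
module as a stand-in for PCRE2, so only the \n and (?P=name) forms from the
table are shown; the group name "quote" is illustrative.
</P>

```python
import re

# \1 must repeat whatever group 1 captured, which finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was was fine")
doubled_word = doubled.group(1)

# (?P=quote) must match the same quote character that opened the string.
quoted = re.search(r"(?P<quote>['\"]).*?(?P=quote)", "say 'hi' now")
quoted_text = quoted.group()

# Mismatched delimiters do not match, because the reference is to the text.
mismatched = re.search(r"(?P<quote>['\"]).*?(?P=quote)", "say 'hi\" now")

print(doubled_word, quoted_text, mismatched)
```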
<br><a name="SEC26" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subroutine by absolute number
  (?+n)           call subroutine by relative number
  (?-n)           call subroutine by relative number
  (?&name)        call subroutine by name (Perl)
  (?P&#62;name)       call subroutine by name (Python)
  \g&#60;name&#62;        call subroutine by name (Oniguruma)
  \g'name'        call subroutine by name (Oniguruma)
  \g&#60;n&#62;           call subroutine by absolute number (Oniguruma)
  \g'n'           call subroutine by absolute number (Oniguruma)
  \g&#60;+n&#62;          call subroutine by relative number (PCRE2 extension)
  \g'+n'          call subroutine by relative number (PCRE2 extension)
  \g&#60;-n&#62;          call subroutine by relative number (PCRE2 extension)
  \g'-n'          call subroutine by relative number (PCRE2 extension)
</PRE>
</P>
<br><a name="SEC27" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)               absolute reference condition
  (?(+n)              relative reference condition (PCRE2 extension)
  (?(-n)              relative reference condition (PCRE2 extension)
  (?(&#60;name&#62;)          named reference condition (Perl)
  (?('name')          named reference condition (Perl)
  (?(name)            named reference condition (PCRE2, deprecated)
  (?(R)               overall recursion condition
  (?(Rn)              specific numbered group recursion condition
  (?(R&name)          specific named group recursion condition
  (?(DEFINE)          define groups for reference
  (?(VERSION[&#62;]=n.m)  test PCRE2 version
  (?(assert)          assertion condition
</pre>
Note the ambiguity of (?(R) and (?(Rn) which might be named reference
conditions or recursion tests. Such a condition is interpreted as a reference
condition if the relevant named group exists.
</P>
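<P>
An absolute reference condition, (?(n)yes-pattern), tests whether group n has
taken part in the match so far. This sketch assumes Python's standard
<b>re</b> module rather than PCRE2, which accepts this one form of condition.
The pattern requires a closing parenthesis only if an opening one was
matched.
</P>

```python
import re

# Group 1 optionally matches "("; the condition (?(1)\)) then demands ")"
# only when group 1 took part in the match.
pattern = re.compile(r"(\()?\d+(?(1)\))")

both_parens = pattern.fullmatch("(42)") is not None
no_parens = pattern.fullmatch("42") is not None
open_only = pattern.fullmatch("(42") is not None
close_only = pattern.fullmatch("42)") is not None

print(both_parens, no_parens, open_only, close_only)
```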
<br><a name="SEC28" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
if :NAME is present. The others just set a name for passing back to the caller,
but this is not a name that (*SKIP) can see. The following act as soon as they
are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
</pre>
The effect of one of these verbs in a group called as a subroutine is confined
to the subroutine call.
</P>
<br><a name="SEC29" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)            callout (assumed number 0)
  (?Cn)           callout with numerical data n
  (?C"text")      callout with string data
</pre>
The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
start and the end), and the starting delimiter { matched with the ending
delimiter }. To encode the ending delimiter within the string, double it.
</P>
<br><a name="SEC30" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
<b>pcre2matching</b>(3), <b>pcre2</b>(3).
</P>
<br><a name="SEC31" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
Last updated: 12 October 2023
<br>
Copyright &copy; 1997-2023 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
636