Lines Matching full:is
7 PCRE2 is a new API for PCRE, starting at release 10.0. This document contains a
289 backward compatibility. They should not be used in new code. The first is
290 replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
320 patterns that can be processed by \fBpcre2_compile()\fP. This facility is
333 units, respectively. However, there is just one header file, \fBpcre2.h\fP.
351 For example, PCRE2_UCHAR16 is usually defined as `uint16_t'.
353 that is, they are pointers to vectors of unsigned code units.
364 PCRE2_CODE_UNIT_WIDTH is not defined by default. An application must define it
370 including \fBpcre2.h\fP, and then use the real function names. Any code that is
371 to be included in an environment where the value of PCRE2_CODE_UNIT_WIDTH is
372 unknown should also use the real function names. (Unfortunately, it is not
375 If PCRE2_CODE_UNIT_WIDTH is not defined before including \fBpcre2.h\fP, a
392 PCRE2 has its own native API, which is described in this document. There are
413 sample program that demonstrates the simplest way of using them is provided in
415 of this program is given in the
431 Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
436 support is not available.
442 JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
443 unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
451 A second matching function, \fBpcre2_dfa_match()\fP, which is not
452 Perl-compatible, is also provided. This uses a different algorithm for the
457 and disadvantages is given in the
461 documentation. There is no JIT support for \fBpcre2_dfa_match()\fP.
479 functions is called with a NULL argument, the function returns immediately
494 blocks of various sorts. In all cases, if one of these functions is called with
502 several places. These values are always of type PCRE2_SIZE, which is an
504 value that can be stored in such a type (that is ~(PCRE2_SIZE)0) is reserved
506 Therefore, the longest string that can be handled is one less than this
508 8-bit library is such a length the same as the number of bytes in the string.
523 Each of the first three conventions is used by at least one operating system as
524 its standard newline sequence. When PCRE2 is built, a default can be specified.
525 If it is not, the default is set to LF, which is the Unix standard. However,
534 In the PCRE2 documentation the word "newline" is used to mean "the character or
537 metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
539 non-anchored pattern. There is more detail about this in the
554 In a multithreaded application it is important to keep thread-specific data
556 itself is thread-safe: it contains no static or global variables. The API is
567 A pointer to the compiled form of a pattern is returned to the user when
568 \fBpcre2_compile()\fP is successful. The data in the compiled pattern is fixed,
569 and does not change when the pattern is matched. Therefore, it is thread-safe,
570 that is, the same compiled pattern can be used by more than one thread
573 just-in-time (JIT) optimization feature is being used, it needs separate memory
583 is somewhat tricky to do correctly. If you know that writing to a pointer is
597 The reason for checking the pointer a second time is as follows: Several
607 is not sufficient. The thread that is doing the compiling may be descheduled
625 If JIT is being used, but the JIT compilation is not being done immediately
626 (perhaps waiting to see if the pattern is used often enough), similar logic is
638 functions are called. A context is nothing more than a collection of parameters
640 in a context is a convenient way of passing them to a PCRE2 function without
668 directly. A context is just a block of memory that holds the parameter values.
670 NULL when a context pointer is required.
672 There are three different types of context: a general context that is relevant
681 library. The context is named `general' rather than specifically `memory'
684 general context. A general context is created by:
698 Whenever code in PCRE2 calls these functions, the final argument is the value
701 \fImalloc()\fP and \fIfree()\fP are used. (This is not currently useful, as
703 The \fIprivate_malloc()\fP function is used (if supplied) to obtain memory for
708 used. When the time comes to free the block, this function is called.
723 If this function is passed a NULL argument, it returns immediately without
731 A compile context is required if you want to provide an external function for
742 A compile context is also required if you are using custom memory management.
746 A compile context is created, copied, and freed by the following functions:
758 A compile context is created with default values for its parameters. These can
760 PCRE2_ERROR_BADDATA if invalid data is detected.
769 ending sequence. The value is used by the JIT compiler and by the two
779 argument is a general context. This function builds a set of character tables
804 This sets a maximum length, in code units, for any pattern string that is
805 compiled with this context. If the pattern is longer, an error is generated.
806 This facility is provided so that applications that accept patterns from
807 external sources can limit their size. The default is the largest number that a
808 PCRE2_SIZE variable can hold, which is effectively unlimited.
816 version of a pattern that is compiled with this context. If the pattern needs
817 more memory, an error is generated. This facility is provided so that
819 memory they use. The default is the largest number that a PCRE2_SIZE variable
820 can hold, which is effectively unlimited.
828 variable-length lookbehind assertion. The default is set when PCRE2 is built,
842 NUL character, that is, a binary zero).
851 When a pattern is compiled with the PCRE2_EXTENDED or PCRE2_EXTENDED_MORE
853 comments starting with #. The value is saved with the compiled pattern for
862 This parameter adjusts the limit, set when PCRE2 is built (default 250), on the
872 There is at least one application that runs PCRE2 in threads with very limited
873 system stack, where running out of stack is to be avoided at all costs. The
874 parenthesis limit above cannot take account of how much stack is actually
876 that is called whenever \fBpcre2_compile()\fP starts to compile a parenthesized
881 nesting, and the second is user data that is set up by the last argument of
883 zero if all is well, or non-zero to force an error.
890 A match context is required if you want to:
902 A match context is created, copied, and freed by the following functions:
914 A match context is created with default values for its parameters. These can
916 PCRE2_ERROR_BADDATA if invalid data is detected.
951 advance in the subject string. The default value is PCRE2_UNSET. The
954 offset is not found. The \fBpcre2_substitute()\fP function makes no more
957 For example, if the pattern /abc/ is matched against "123abc" with an offset
958 limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match can never be
960 \fBpcre2_dfa_match()\fP, or \fBpcre2_substitute()\fP is greater than the offset
964 calling \fBpcre2_compile()\fP so that when JIT is in use, different code can be
965 compiled. If a match is started with a non-default offset limit when
966 PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
971 newline that follows the start of matching in the subject. If this is set with
973 offset limit. In other words, whichever limit comes first is used.
990 documentation for more details). If the limit is reached, the negative error
991 code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
992 is built; if it is not, the default is set very large and is essentially
1000 where ddd is a decimal number. However, such a setting is ignored unless ddd is
1002 limit is set, less than the default.
1012 For \fBpcre2_dfa_match()\fP, a vector on the system stack is used when
1014 is not big enough is heap memory used. In this case, setting a value of zero
1025 trees. The classic example is a pattern that uses nested unlimited repeats.
1027 There is an internal counter in \fBpcre2_match()\fP that is incremented each
1033 though the counting is done in a different way.
1035 When \fBpcre2_match()\fP is called with a pattern that was successfully
1036 processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed
1037 is entirely different. However, there is still the possibility of runaway
1042 The default value for the limit can be set when PCRE2 is built; the default is
1048 where ddd is a decimal number. However, such a setting is ignored unless ddd is
1050 \fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
1058 Each time a nested backtracking point is passed, a new memory frame is used
1060 indirectly limits the amount of memory that is used in a match. However,
1066 The depth limit is not relevant, and is ignored, when matching is done using
1067 JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
1070 limits, indirectly, the amount of system stack that is used. It was more useful
1076 If the depth of internal recursive function calls is great enough, local
1078 depth limit also indirectly limits the amount of heap memory that is used. A
1080 using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
1084 The default value for the depth limit can be set when PCRE2 is built; if it is
1085 not, the default is set to the same value as the default for the match limit.
1086 If the limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
1092 where ddd is a decimal number. However, such a setting is ignored unless ddd is
1094 \fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
1110 The first argument for \fBpcre2_config()\fP specifies which information is
1111 required. The second argument is a pointer to memory into which the information
1112 is placed. If NULL is passed, the function returns the amount of memory that is
1114 the value is in bytes; when requesting these values, \fIwhere\fP should point
1116 length is given in code units, not counting the terminating zero.
1118 When requesting information, the returned value from \fBpcre2_config()\fP is
1120 the value in the first argument is not recognized. The following information is
1125 The output is a uint32_t integer whose value indicates what character
1129 default can be overridden when a pattern is compiled.
1133 The output is a uint32_t integer whose lower bits indicate which code unit
1139 The output is a uint32_t integer that gives the default limit for the depth of
1146 The output is a uint32_t integer that gives, in kibibytes, the default limit
1153 The output is a uint32_t integer that is set to one if support for just-in-time
1154 compiling is included in the library; otherwise it is set to zero. Note that
1164 The \fIwhere\fP argument should point to a buffer that is at least 48 code
1166 \fBpcre2_config()\fP with \fIwhere\fP set to NULL.) The buffer is filled with a
1167 string that contains the name of the architecture for which the JIT compiler is
1169 is not available, PCRE2_ERROR_BADOPTION is returned, otherwise the number of
1170 code units used is returned. This is the length of the string, plus one unit
1175 The output is a uint32_t integer that contains the number of bytes used for
1176 internal linkage in compiled regular expressions. When PCRE2 is configured, the
1177 value can be set to 2, 3, or 4, with the default being 2. This is the value
1178 that is returned by \fBpcre2_config()\fP. However, when the 16-bit library is
1179 compiled, a value of 3 is rounded up to 4, and when the 32-bit library is
1180 compiled, internal linkages always use 4 bytes, so the configured value is not
1183 The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
1190 The output is a uint32_t integer that gives the default match limit for
1196 The output is a uint32_t integer whose value specifies the default character
1197 sequence that is recognized as meaning "newline". The values are:
1211 The output is a uint32_t integer that is set to one if the use of \eC was
1212 permanently disabled when PCRE2 was built; otherwise it is set to zero.
1216 The output is a uint32_t integer that gives the maximum depth of nesting
1217 of parentheses (of any kind) in a pattern. This limit is imposed to cap the
1218 amount of system stack used when a pattern is compiled. It is specified when
1219 PCRE2 is built; the default is 250. This limit does not take into account the
1225 This parameter is obsolete and should not be used in new code. The output is a
1226 uint32_t integer that is always set to zero.
1230 The output is a uint32_t integer that gives the length of PCRE2's character
1240 The \fIwhere\fP argument should point to a buffer that is at least 24 code
1243 without Unicode support, the buffer is filled with the text "Unicode not
1244 supported". Otherwise, the Unicode version string (for example, "8.0.0") is
1245 inserted. The number of code units used is returned. This is the length of the
1250 The output is a uint32_t integer that is set to one if Unicode support is
1251 available; otherwise it is set to zero. Unicode support implies UTF support.
1255 The \fIwhere\fP argument should point to a buffer that is at least 24 code
1257 \fBpcre2_config()\fP with \fIwhere\fP set to NULL.) The buffer is filled with
1258 the PCRE2 version string, zero-terminated. The number of code units used is
1259 returned. This is the length of the string plus one unit for the terminating
1280 The pattern is defined by a pointer to a string of code units and a length in
1281 code units. If the pattern is zero-terminated, the length can be specified as
1282 PCRE2_ZERO_TERMINATED. A NULL pattern pointer with a length of zero is treated
1287 If the compile context argument \fIccontext\fP is NULL, memory for the compiled
1288 pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from
1290 free the memory by calling \fBpcre2_code_free()\fP when it is no longer needed.
1291 If \fBpcre2_code_free()\fP is called with a NULL argument, it returns
1301 the JIT information cannot be copied (because it is position-dependent).
1303 passed to \fBpcre2_jit_compile()\fP if required. If \fBpcre2_code_copy()\fP is
1316 tables are used throughout, so this behaviour is appropriate. Nevertheless,
1320 the new tables. The memory for the new tables is automatically freed when
1321 \fBpcre2_code_free()\fP is called for the new copy of the compiled code. If
1322 \fBpcre2_code_copy_with_tables()\fP is called with a NULL argument, it returns
1325 NOTE: When one of the matching functions is called, pointers to the compiled
1335 PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
1366 If \fIerrorcode\fP or \fIerroroffset\fP is NULL, \fBpcre2_compile()\fP returns
1374 that are used for invalid UTF strings when validity checking is in force. These
1380 documentation. There is no separate documentation for the positive error codes,
1390 is successful \fIerrorcode\fP is set to a value that returns the message "no
1393 The value returned in \fIerroroffset\fP is an indication of where in the
1394 pattern an error occurred. When there is no error, zero is returned. A non-zero
1395 value is not necessarily the furthest point in the pattern that was read. For
1396 example, after the error "lookbehind assertion is not fixed length", the error
1398 UTF-16 string, the offset is that of the first code unit of the failing
1402 cases, the offset passed back is the length of the pattern. Note that the
1403 offset is in code units, not characters, even in a UTF mode. It may sometimes
1414 PCRE2_ZERO_TERMINATED, /* the pattern is zero-terminated */
1430 If this bit is set, the pattern is forced to be "anchored", that is, it is
1431 constrained to match only at the first matching point in the string that is
1433 appropriate constructs in the pattern itself, which is the only way to do it in
1439 immediately follows an opening one is treated as a data character for the
1440 class. When PCRE2_ALLOW_EMPTY_CLASS is set, it terminates the class, which
1446 makes PCRE2's behaviour more like ECMAScript (aka JavaScript). When it is set:
1451 (2) \eu matches a lower case "u" character unless it is followed by four
1456 (3) \ex matches a lower case "x" character unless it is followed by two
1458 to match. By default, as in Perl, a hexadecimal number is always expected after
1474 In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter
1475 matches at the start of the subject (unless PCRE2_NOTBOL is set), and also
1484 (*MARK:NAME) is any sequence of characters that does not include a closing
1485 parenthesis. The name is not processed in any way, and it is not possible to
1487 option is set, normal backslash processing is applied to verb names and only an
1490 or PCRE2_EXTENDED_MORE option is set with PCRE2_ALT_VERBNAMES, unescaped
1491 whitespace in verb names is skipped and #-comments are recognized, exactly as
1496 If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
1507 If this bit is set, letters in the pattern match both upper and lower case
1508 letters in the subject. It is equivalent to Perl's /i option, and it can be
1510 PCRE2_UCP is set, Unicode properties are used for all characters with more than
1517 For lower-valued characters with only one other case, a lookup table is used
1518 for speed. When neither PCRE2_UTF nor PCRE2_UCP is set, a lookup table is used
1524 If this bit is set, a dollar metacharacter in the pattern matches only at the
1527 newlines). The PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is
1528 set. There is no equivalent to this option in Perl, and no way to set it within
1533 If this bit is set, a dot metacharacter in the pattern matches any character,
1536 not match when the current position in the subject is at a newline. This option
1544 If this bit is set, names used to identify capture groups need not be unique.
1545 This can be helpful for certain types of pattern when it is known that only one
1555 If this bit is set, the end of any pattern match must be right at the end of
1568 achieved by appropriate constructs in the pattern itself, which is the only way
1572 to the first (that is, the longest) matched string. Other parallel matches,
1578 If this bit is set, most white space characters in the pattern are totally
1580 sequence. However, white space is not allowed within sequences such as (?> that
1582 as {1,3}. Ignorable white space is permitted between an item and a following
1584 possessiveness. PCRE2_EXTENDED is equivalent to Perl's /x option, and it can be
1587 When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
1589 flagged as white space in its low-character table. The table is normally
1599 When PCRE2 is compiled with Unicode support, in addition to these characters,
1603 separator). This set of characters is the same as recognized by Perl's /x
1610 complicated patterns. Note that the end of this type of comment is a literal
1615 the compile context that is passed to \fBpcre2_compile()\fP or by a special
1621 in the \fBpcre2pattern\fP documentation. A default is defined when PCRE2 is
1629 characters that are ignored outside a character class. PCRE2_EXTENDED_MORE is
1635 If this option is set, the start of an unanchored pattern match must be before
1637 though the matched text may continue over the newline. If \fIstartoffset\fP is
1638 non-zero, the limiting newline is not necessarily the first newline in the
1639 subject. For example, if the subject string is "abc\enxyz" (where \en
1641 PCRE2_FIRSTLINE if \fIstartoffset\fP is greater than 3. See also
1643 PCRE2_FIRSTLINE is set with an offset limit, a match must occur in the first
1645 first is used. This option has no effect for anchored patterns.
1649 If this option is set, all meta-characters in the pattern are disabled, and it
1651 expression engine is not the most efficient way of doing it. If you are doing a
1667 suitably aligned. This facility is not supported for DFA matching. For details,
1676 If this option is set, a backreference to an unset capture group matches an
1678 A pattern such as (\e1)(a) succeeds when this option is set (assuming it can
1690 (except when PCRE2_DOLLAR_ENDONLY is set). Note, however, that unless
1691 PCRE2_DOTALL is set, the "any character" metacharacter (.) does not match at a
1692 newline. This behaviour (for ^, $, and dot) is the same as Perl.
1694 When PCRE2_MULTILINE is set, the "start of line" and "end of line"
1706 This option locks out the use of \eC in the pattern that is being compiled.
1710 external sources. Note that there is also a build-time option that permanently
1725 UTF-32, depending on which library is in use. In particular, it prevents the
1733 If this option is set, it disables the use of numbered capturing parentheses in
1734 the pattern. Any opening parenthesis that is not followed by ? behaves as if it
1736 they acquire numbers in the usual way). This is the same as Perl's /n option.
1737 Note that, when this option is set, references to capture groups
1743 If this option is set, it disables "auto-possessification", which is an
1748 search and run all the callouts, but it is mainly provided for testing
1753 If this option is set, it disables an optimization that is applied when .* is
1755 other branches also start with .* or with \eA or \eG or ^. The optimization is
1756 automatically disabled for .* if it is inside an atomic group or a capture
1757 group that is the subject of a backreference, or if the pattern contains
1758 (*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
1759 automatically anchored if PCRE2_DOTALL is set for all the .* items and
1760 PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
1761 must start either at the start of the subject or following a newline is
1766 This is an option whose main effect is at matching time. It does not change
1771 order to speed up the process. For example, if it is known that an unanchored
1775 such as (*COMMIT) at the start of a pattern is not considered until after a
1778 skipped if the pattern is never actually used. The start-up optimizations are
1779 in effect a pre-scan of the subject that takes place before the pattern is run.
1783 result is "no match", the callouts do occur, and that items such as (*COMMIT)
1792 When this is compiled, PCRE2 records the fact that a match must start with the
1793 character "A". Suppose the subject string is "DEFABC". The start-up
1797 match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
1798 subject string does not happen. The first match attempt is run starting from
1800 the overall result is "no match".
1803 subject, which is recorded when possible. Consider the pattern
1807 The minimum length for a match is two characters. If the subject is "XXBB", the
1809 is long enough. In the process, (*MARK:2) is encountered and remembered. When
1810 the match attempt fails, the next "B" is found, but there is only one character
1811 left, so there are no more attempts, and "no match" is returned with the "last
1812 mark seen" set to "2". If PCRE2_NO_START_OPTIMIZE is set, however, matches are tried
1814 (*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
1815 returned is "1". In this case, the optimizations do not affect the overall
1816 match result, which is still "no match", but they do affect the auxiliary
1817 information that is returned.
1821 When PCRE2_UTF is set, the validity of the pattern as a UTF string is
1840 document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
1843 If you know that your pattern is a valid UTF string, and you want to skip this
1845 it is set, the effect of passing an invalid UTF string as a pattern is
1853 error that is given if an escape sequence for an invalid Unicode code point is
1862 However, this is possible only in UTF-8 and UTF-32 modes, because these values
1869 default, only ASCII characters are recognized, but if PCRE2_UCP is set, Unicode
1883 The second effect of PCRE2_UCP is to force the use of Unicode properties for
1884 upper/lower casing operations, even when PCRE2_UTF is not set. This makes it
1885 possible to process strings in the 16-bit UCS-2 code. This option is available
1886 only if PCRE2 has been compiled with Unicode support (which is the default).
1894 greedy by default, but become greedy if followed by "?". It is not compatible
1900 \fBpcre2_set_offset_limit()\fP is going to be used to set a non-default offset
1901 limit in a match context for matches that use this pattern. An error is
1902 generated if an offset limit is set without this option. For more details, see
1915 single-code-unit strings. It is available when PCRE2 is built to include
1916 Unicode support (which is the default). If Unicode support is not available,
1935 assertions, following Perl's lead. This option is provided to re-enable the
1937 case anybody is relying on it.
1941 This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is
1947 in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2.
1956 If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
1959 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
1967 character code, where hhh.. is any number of hexadecimal digits.
1971 This option forces \ed to match only ASCII digits, even when PCRE2_UCP is set.
1977 PCRE2_UCP is set. It can be changed within a pattern by means of the (?aS)
1989 match only ASCII digits, even when PCRE2_UCP is set. It can be changed within
1995 [:xdigit:], to match only ASCII characters, even when PCRE2_UCP is set. It can
2002 This is a dangerous option. Use with care. By default, an unrecognized escape
2004 detected by \fBpcre2_compile()\fP. Perl is somewhat inconsistent in handling
2005 such items: for example, \ej is treated as a literal "j", and non-hexadecimal
2007 Perl's warning switch is enabled. However, a malformed octal number after \eo{
2010 If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
2012 treated as single-character escapes. For example, \ej is a literal "j" and
2013 \ex{2z} is treated as the literal string "x{2z}". Setting this option means
2015 that a sequence such as [\eN{] is interpreted as a malformed attempt at
2016 [\eN{...}] and so is treated as [N{] whereas [\eN] gives an error because an
2017 unqualified \eN is a valid escape sequence but is not supported in a character
2018 class. To reiterate: this is a dangerous option. Use with great care.
2022 When either PCRE2_UCP or PCRE2_UTF is set, caseless matching follows Unicode
2025 characters. The ASCII letter S is case-equivalent to U+017f (long S) and the
2026 ASCII letter K is case-equivalent to U+212a (Kelvin sign). This option disables
2034 is expected to match a newline. If this option is set, \er in a pattern is
2041 This option is provided for use by the \fB-x\fP option of \fBpcre2grep\fP. It
2042 causes the pattern only to match complete lines. This is achieved by
2044 pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, the matched
2050 This option is provided for use by the \fB-w\fP option of \fBpcre2grep\fP. It
2052 and the end. This is achieved by automatically inserting the code for "\eb(?:"
2054 used with PCRE2_LITERAL. However, it is ignored if PCRE2_EXTRA_MATCH_LINE is
2082 compiler is available, further processes a compiled pattern into machine code
2090 JIT compilation is a heavyweight optimization. It can take some time for
2113 When PCRE2 is built with Unicode support (the default), certain Unicode
2115 PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and
2119 when PCRE2_UTF is not set. There are, however, some PCRE2_EXTRA options (see
2122 The use of locales with Unicode is discouraged. If you are handling characters
2128 recognize only ASCII characters. However, when PCRE2 is built, it is possible
2135 support is expected to die away.
2138 the relevant locale. The only argument to this function is a general context,
2139 which can be used to pass a custom memory allocator. If the argument is NULL,
2140 the system \fBmalloc()\fP is used. The result can be passed to
2154 The locale name "fr_FR" is used on Linux and other Unix-like systems; if you
2155 are using Windows, the name for the French locale is "french".
2157 The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP
2163 It is the caller's responsibility to ensure that the memory containing the
2175 processor is 32-bit or 64-bit. A copy of the result of \fBpcre2_maketables()\fP
2180 return this value. Note that the \fBpcre2_dftables\fP program, which is part of
2204 The first argument for \fBpcre2_pattern_info()\fP is a pointer to the compiled
2205 pattern. The second argument specifies which piece of information is required,
2206 and the third argument is a pointer to a variable to receive the data. If the
2207 third argument is NULL, the first argument is ignored, and the function returns
2208 the size in bytes of the variable that is required for the information
2209 requested. Otherwise, the yield of the function is zero for success, or one of
2215 PCRE2_ERROR_UNSET the requested field is not set
2217 The "magic number" is placed at the start of each compiled pattern as a simple
2218 check against passing an arbitrary memory pointer. Here is a typical call of
2225 PCRE2_INFO_SIZE, /* what is required */
2243 For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED
2244 option, the result for PCRE2_INFO_ALLOPTIONS is PCRE2_EXTENDED and PCRE2_UTF.
2249 A pattern compiled without PCRE2_ANCHORED is automatically anchored by PCRE2 if
2250 the first significant item in every top-level branch is one of the following:
2252 ^ unless PCRE2_MULTILINE is set
2257 When .* is the first significant item, anchoring is possible only when all the
2260 .* is not in an atomic group
2262 .* is not in a capture group that is the subject
2264 PCRE2_DOTALL is in force for .*
2266 PCRE2_NO_DOTSTAR_ANCHOR is not set
2268 For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
2278 group is set in a conditional group such as (?(3)a|b) is also a backreference.
2279 Zero is returned if there are no backreferences.
2283 The output is a uint32_t integer whose value indicates what character sequences
2291 is not used, this is also the total number of capture groups. The third
2297 (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
2300 limit will only be used during matching if it is less than the limit set or
2310 value 255 or above". If such a table was constructed, a pointer to it is
2311 returned. Otherwise NULL is returned. The third argument should point to a
2318 variable. If there is a fixed first value, for example, the letter "c" from a
2319 pattern such as (cat|cow|coyote), 1 is returned, and the value can be retrieved
2320 using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, but it is
2322 newline in the subject, 2 is returned. Otherwise, and for anchored patterns, 0
2330 value is always less than 256. In the 16-bit library the value can be up to
2337 backtracking positions when the pattern is processed by \fBpcre2_match()\fP
2351 explicit match is either a literal CR or LF character, or \er or \en or one of
2357 (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
2360 limit will only be used during matching if it is less than the limit set or
2365 Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise
2377 Returns 1 if there is a rightmost literal code unit that must exist in any
2379 \fBuint32_t\fP variable. If there is no such value, 0 is returned. When 1 is
2381 PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is
2383 pattern /^a\ed+z\ed+/ the returned value is 1 (with "z" returned from
2384 PCRE2_INFO_LASTCODEUNIT), but for /^a\edz\ed/ the returned value is 0.
2397 recursive subroutine calls it is not always possible to determine whether or
2404 (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
2407 limit will only be used during matching if it is less than the limit set or
2420 Note that this information is useful for multi-segment matching only
2422 (?<=a(?<=ba)c) returns a maximum lookbehind of 2, but when it is processed, the
2426 PCRE2_INFO_MAXLOOKBEHIND is really only useful as a debugging tool. See the
2434 If a minimum length for matching subject strings was computed, its value is
2435 returned. Otherwise the returned value is 0. This value is not computed when
2436 PCRE2_NO_START_OPTIMIZE is set. The value is a number of characters, which in
2438 should point to a \fBuint32_t\fP variable. The value is a lower bound to the
2440 do actually match, but every string that does match is at least that long.
2450 substrings by name. It is also possible to extract the data directly, by first
2453 you need to use the name-to-number map, which is described by these three
2461 PCRE2_INFO_NAMETABLE returns a pointer to the first entry of the table. This is
2467 the parenthesis number. The rest of the entry is the corresponding name, zero
2470 The names are in alphabetical order. If (?| is used to create multiple capture
2480 page, the groups may be given the same name, but there is only one entry in the
2484 only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
2485 they were found in the pattern. In the absence of (?| this is the order of
2486 increasing number; when (?| is used this is not necessarily the case because
2490 after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white
2491 space - including newlines - is ignored):
2498 entry in the table is eight bytes long. The table is as follows, with
2507 name-to-number map, remember that the length of the entries is likely to be
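The standalone C below shows one way to read the name-to-number map directly. It does not call PCRE2, so it compiles without the library. Instead it hard-codes an 8-bit table of the shape described above: eight-byte entries, the group number in the first two bytes (most significant byte first), then the zero-terminated name. The entries are assumed to mirror the man page's date example (date, day, month, year), with the undefined padding bytes set to zero. In real code, pcre2_pattern_info() would supply the table, entry count and entry size, using PCRE2_INFO_NAMETABLE, PCRE2_INFO_NAMECOUNT and PCRE2_INFO_NAMEENTRYSIZE. The names example_table and lookup_group are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-built 8-bit name table: 4 entries of 8 bytes.
   Bytes 0-1: group number, most significant byte first.
   Bytes 2..: zero-terminated name; unused padding bytes are zero. */
static const unsigned char example_table[] = {
    0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
    0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00,
};

/* Return the group number stored for name, or -1 if it is absent.
   Linear scan over count entries of entry_size bytes each. */
static int lookup_group(const unsigned char *table, unsigned count,
                        unsigned entry_size, const char *name)
{
    for (unsigned i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];   /* decode big-endian pair */
    }
    return -1;
}
```

The entries are in alphabetical order, so a binary search would also work. With PCRE2_DUPNAMES, this linear scan returns only the first of several identical names.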
2512 The output is one of the following \fBuint32_t\fP values:
2529 pattern itself. The value that is used when \fBpcre2_compile()\fP is getting
2548 be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a
2550 the third is arbitrary user data. The callback function is called for every
2551 callout in the pattern in the order in which they appear. Its first argument is
2552 a pointer to a callout enumeration block, and its second argument is the
2564 It is possible to save compiled patterns on disc or elsewhere, and reload them
2569 "serialized" form, which in the case of PCRE2 is really just a bytecode dump.
2593 Information about a successful or unsuccessful match is placed in a match
2594 data block, which is an opaque structure that is accessed by function calls. In
2596 string that define the matched parts of the subject. This is known as the
2602 argument is the number of pairs of offsets in the \fIovector\fP.
2604 When using \fBpcre2_match()\fP, one pair of offsets is required to identify the
2613 A minimum of 1 pair is imposed by \fBpcre2_match_data_create()\fP, so
2614 it is always possible to return the overall matched string in the case of
2616 \fBpcre2_dfa_match()\fP. The maximum number of pairs is 65535; if the first
2617 argument of \fBpcre2_match_data_create()\fP is greater than this, 65535 is
2620 The second argument of \fBpcre2_match_data_create()\fP is a pointer to a
2625 For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a
2626 pointer to a compiled pattern. The ovector is created to be exactly the right
2629 \fBpcre2_dfa_match()\fP. The second argument is again a pointer to a general
2630 context, but in this case if NULL is passed, the memory is obtained using the
2647 When a call of \fBpcre2_match()\fP fails, valid data is available in the match
2648 block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one
2649 of the error codes for an invalid UTF string. Exactly what is available depends
2650 on the error, and is detailed below.
2652 When one of the matching functions is called, pointers to the compiled pattern
2658 PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
2665 When a match data block itself is no longer needed, it should be freed by
2666 calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL
2682 bytes, of the block that is its argument.
2684 When \fBpcre2_match()\fP runs interpretively (that is, without using JIT), it
2695 Heap memory is used for the frames vector; if the initial memory block turns
2696 out to be too small during matching, it is automatically expanded. When
2697 \fBpcre2_match()\fP returns, the memory is not freed, but remains attached to
2699 block. It is automatically freed when the match data block itself is freed.
2705 memory is constrained can check this and free the match data block if the heap
2719 The function \fBpcre2_match()\fP is called to match a subject string against a
2720 compiled pattern, which is passed in the \fIcode\fP argument. You can call
2725 This function is the main matching facility of the library, and it operates in
2726 a Perl-like manner. For specialist use there is also an alternative matching
2727 function, which is described
2734 Here is an example of a simple call to \fBpcre2_match()\fP:
2746 If the subject string is zero-terminated, the length can be given as
2759 The subject string is passed to \fBpcre2_match()\fP as a pointer in
2762 That is, they are in bytes for the 8-bit library, 16-bit code units for the
2764 UTF processing is enabled. As a special case, if \fIsubject\fP is NULL and
2765 \fIlength\fP is zero, the subject is assumed to be an empty string. If
2766 \fIlength\fP is non-zero, an error occurs if \fIsubject\fP is NULL.
2768 If \fIstartoffset\fP is greater than the length of the subject,
2769 \fBpcre2_match()\fP returns PCRE2_ERROR_BADOFFSET. When the starting offset is
2776 A non-zero starting offset is useful when searching for another match in the
2785 the current position in the subject is not a word boundary.) When applied to
2787 occurrence. If \fBpcre2_match()\fP is called again with just the remainder of
2788 the subject, namely "issippi", it does not match, because \eB is always false
2789 at the start of the subject, which is deemed to be a word boundary. However, if
2790 \fBpcre2_match()\fP is passed the entire string again, but with
2792 is able to look behind the starting point to discover that it is preceded by a
2795 Finding all the matches in a subject is tricky when the pattern can match an
2796 empty string. It is possible to emulate Perl's /g behaviour by first trying the
2799 and trying an ordinary match again. There is some code that demonstrates how to
2806 character is CR followed by LF, advance the starting offset by two characters
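The advance step used when emulating Perl's /g can be sketched as a small standalone C helper. It rests on three assumptions. The subject uses 8-bit code units. The caller has already retried at the same offset with PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED, and that retry failed. The caller also knows whether CRLF is a valid newline. advance_after_empty_match is a hypothetical name, not a PCRE2 function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Where to resume an ordinary match after an empty match at offset
   whose anchored, non-empty retry failed. Returns a value greater
   than length when the subject is exhausted and the loop should stop. */
static size_t advance_after_empty_match(const unsigned char *subject,
                                        size_t length, size_t offset,
                                        bool crlf_is_newline, bool utf8)
{
    if (offset >= length)
        return length + 1;                    /* nothing left to scan */

    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;                    /* step over the CR LF pair */

    offset++;                                 /* step one code unit */
    if (utf8) {
        /* Finish the character: skip UTF-8 continuation bytes (10xxxxxx). */
        while (offset < length && (subject[offset] & 0xC0) == 0x80)
            offset++;
    }
    return offset;
}
```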
2809 If a non-zero starting offset is passed when the pattern is anchored, a single
2810 attempt to match at the given offset is made. This can only succeed if the
2825 Their action is described below.
2827 Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
2828 the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
2829 interpretive code in \fBpcre2_match()\fP is run.
2830 PCRE2_DISABLE_RECURSELOOP_CHECK is ignored by JIT, but apart from PCRE2_NO_JIT
2843 By default, a pointer to the subject is remembered in the match data block so
2847 lifetime of the subject string is not guaranteed, it may be necessary to make a
2848 copy of the subject string, but it is wasteful to do this unless the match is
2849 successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
2850 subject is copied and the new pointer is remembered in the match data block
2852 the match block itself is used. The copy is automatically freed when
2853 \fBpcre2_match_data_free()\fP is called to free the match data block. It is also
2854 automatically freed if the match data block is re-used for another match
2859 This option is relevant only to \fBpcre2_match()\fP for interpretive matching.
2860 It is ignored when JIT is used, and is forbidden for \fBpcre2_dfa_match()\fP.
2865 limits are large. There is therefore a check at the start of each recursion.
2866 If the same group is still active from a previous call, and the current subject
2867 pointer is the same as it was at the start of that group, and the furthest
2868 inspected character of the subject has not changed, an error is generated.
2871 this error. This option disables the check. It is provided mainly for testing
2876 If the PCRE2_ENDANCHORED option is set, any string that \fBpcre2_match()\fP
2882 This option specifies that the first character of the subject string is not the
2890 This option specifies that the end of the subject string is not the end of a
2899 An empty string is not considered to be a valid match if this option is set. If
2906 string at the start of the subject. With PCRE2_NOTEMPTY set, this match is not
2912 This is like PCRE2_NOTEMPTY, except that it locks out an empty string match
2913 only at the first matching position, that is, at the start of the subject plus
2914 the starting offset. An empty string match later in the subject is permitted.
2915 If the pattern is anchored, such a match can occur only if the pattern contains
2921 \fBpcre2_jit_compile()\fP, JIT is automatically used when \fBpcre2_match()\fP
2927 When PCRE2_UTF is set at compile time, the validity of the subject as a UTF
2928 string is checked unless PCRE2_NO_UTF_CHECK is passed to \fBpcre2_match()\fP or
2930 case is discussed in detail in the
2936 In the default case, if a non-zero starting offset is given, the check is
2938 matching, and there is a check that the starting offset points to the first
2946 The check is carried out before any other processing takes place, and a
2947 negative error code is returned if the check fails. There are several UTF error
2969 If you know that your subject is valid, and you want to skip this check for
2976 PCRE2_NO_UTF_CHECK is set at match time the effect of passing an invalid
2977 string as a subject, or an invalid value of \fIstartoffset\fP, is undefined.
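In UTF-8 mode, the starting-offset part of the default check can be approximated with a short standalone C function. An offset is acceptable if it equals the subject length or points at a byte that is not a UTF-8 continuation byte. This sketch assumes the subject is otherwise valid UTF-8. It ignores which portion of the subject the full check actually covers, and it does not model the distinct error codes PCRE2 returns. utf8_valid_start_offset is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if offset may start a UTF-8 match: either the end of the
   subject, or a byte that is not a continuation byte (10xxxxxx).
   An offset beyond the end is rejected (PCRE2 reports BADOFFSET). */
static bool utf8_valid_start_offset(const unsigned char *subject,
                                    size_t length, size_t offset)
{
    if (offset > length)
        return false;
    if (offset == length)
        return true;
    return (subject[offset] & 0xC0) != 0x80;
}
```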
2984 the end of the subject string is reached successfully, but there are not enough
2991 complete match can be found is PCRE2_ERROR_PARTIAL returned instead of
2993 caller is prepared to handle a partial match, but only if no complete match can
2996 If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if
2997 a partial match is found, \fBpcre2_match()\fP immediately returns
2999 words, when PCRE2_PARTIAL_HARD is set, a partial match is considered to be more
3002 There is a more detailed discussion of partial and multi-segment matching, with
3014 When PCRE2 is built, a default newline convention is set; this is usually the
3033 starting position is advanced after a match failure for an unanchored pattern.
3035 When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is set as
3037 when the current starting position is at a CRLF sequence, and the pattern
3038 contains no explicit matches for CR or LF characters, the match position is
3041 The above rule is a compromise that makes the most common cases work as
3042 expected. For example, if the pattern is .+A (and the PCRE2_DOTALL option is
3048 An explicit match for CR or LF is either a literal appearance of one of those
3053 Notwithstanding the above, anomalous effects may still occur when CRLF is a
3070 book, this is called "capturing" in what follows, and the phrase "capture
3071 group" (Perl terminology) is used for a fragment of a pattern that picks out a
3090 called the \fBovector\fP, which contains the offsets of captured strings. It is
3100 Within the ovector, the first in each pair of values is set to the offset of
3101 the first code unit of a substring, and the second is set to the offset of the
3103 offsets, not character offsets. That is, they are byte offsets in the 8-bit
3108 of offsets (that is, \fIovector[0]\fP and \fIovector[1]\fP) are set. They
3117 pair is used for the first captured substring, and so on. The value returned by
3118 \fBpcre2_match()\fP is one more than the highest numbered pair that has been
3119 set. For example, if two substrings have been captured, the returned value is
3121 match is 1, indicating that just the first pair of offsets has been set.
3125 For example, if the pattern (?=ab\eK) is matched against "ab", the start and
3128 If a capture group is matched repeatedly within a single match operation, it is
3129 the last portion of the subject that it matched that is returned.
3131 If the ovector is too small to hold all the captured substring offsets, as much
3132 as possible is filled in, and the function returns a value of zero. If captured
3134 data block whose ovector is of minimum length (that is, one pair).
3136 It is possible for capture group number \fIn+1\fP to match some part of the
3138 "abc" is matched against the pattern (a|(z))(bc) the return from the function
3139 is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
3144 also set to PCRE2_UNSET. For example, if the string "abc" is matched against
3146 function is 2, because the highest used capture group number is 1. The offsets
3147 for the second and third capture groups (assuming the vector is large enough,
3151 pattern are never changed. That is, if a pattern contains \fIn\fP capturing
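The ovector rules above can be illustrated without the library. Offsets come in pairs of code units, unused pairs are set to PCRE2_UNSET, and the return value is one more than the highest pair that was set. This standalone sketch uses size_t in place of PCRE2_SIZE and a local MODEL_UNSET in place of PCRE2_UNSET. substring_length is hypothetical. When the start offset is beyond the end offset, which happens with \K in a lookahead, the sketch reports length zero, consistent with the zero-length string the substring functions extract in that case. It does not handle a return value of zero (ovector too small).

```c
#include <stddef.h>

#define MODEL_UNSET (~(size_t)0)   /* stand-in for PCRE2_UNSET */

/* Store the length of substring n in *len and return 0, or return -1
   if n is not below the match return value rc or the pair is unset. */
static int substring_length(const size_t *ovector, int rc, int n,
                            size_t *len)
{
    if (n < 0 || n >= rc)
        return -1;
    size_t start = ovector[2 * n], end = ovector[2 * n + 1];
    if (start == MODEL_UNSET)
        return -1;                         /* group did not participate */
    *len = (start > end) ? 0 : end - start;
    return 0;
}
```

Matching "abc" against (a|(z))(bc) returns 4, and the ovector would hold {0,3, 0,1, unset,unset, 1,3}. Given that ovector, substring_length reports group 2 as unavailable and group 3 as length 2.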
3167 As well as the offsets in the ovector, other information about a match is
3169 appropriate circumstances. If they are called at other times, the result is
3177 the zero-terminated name, which is within the compiled pattern. If no name is
3178 available, NULL is returned. The length of the name (excluding the terminating
3179 zero) is stored in the code unit that precedes the name. You should use this
3183 After a successful match, the name that is returned is the last mark name
3186 contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
3187 partial match, the last encountered name is returned. For example, consider
3192 When it matches "bc", the returned name is A. The B mark is "seen" in the first
3193 branch of the group, but it is not on the matching path. On the other hand,
3194 when this pattern fails to match "bx", the returned name is B.
3198 is removed from the pattern above, there is an initial check for the presence
3209 escape sequence. After a partial match, however, this value is always the same
3232 with them. The codes are given names in the header file. If UTF checking is in
3233 force and an invalid UTF subject string is detected, one of a number of
3234 UTF-specific negative error codes is returned. Details are given in the
3256 catch the case when it is passed a junk pointer. This is the error that is
3257 returned when the magic number is not present.
3261 This error is given when a compiled pattern is passed to a function in a
3263 the 8-bit library is passed to a 16-bit or 32-bit library function.
3282 This error is never generated by \fBpcre2_match()\fP itself. It is provided for
3305 This error is returned when a pattern that was successfully studied using JIT
3307 stack is not large enough. See the
3319 Heap memory is used to remember backtracking points. This error is given when
3321 error, PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed exceeds
3322 the heap limit. PCRE2_ERROR_NOMEMORY is also returned if
3323 PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
3332 This error is returned when \fBpcre2_match()\fP detects a recursion loop within
3338 matching is attempted.
3353 code unit buffer and its length in code units, into which the text message is
3354 placed. The message is returned in code units of the appropriate width for the
3355 library that is being used.
3357 The returned message is terminated with a trailing zero, and the function
3359 error number is unknown, the negative error code PCRE2_ERROR_BADDATA is
3360 returned. If the buffer is too small, the message is truncated (but still with
3361 a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
3362 None of the messages are very long; a buffer size of 120 code units is ample.
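The buffer contract for error messages is sketched below as a standalone C function rather than a call to pcre2_get_error_message(). The message is always zero-terminated, truncated when the buffer is too small, and truncation yields a negative code. copy_message is hypothetical. Its -1 stands in for PCRE2_ERROR_NOMEMORY, whose actual value is defined in pcre2.h.

```c
#include <stddef.h>
#include <string.h>

/* Copy msg into buf of size units, always zero-terminating when
   size > 0. Returns the message length (excluding the zero), or -1
   if the buffer was too small and the message was truncated. */
static int copy_message(const char *msg, char *buf, size_t size)
{
    if (size == 0)
        return -1;
    size_t n = strlen(msg);
    if (n < size) {
        memcpy(buf, msg, n + 1);      /* fits, including the zero */
        return (int)n;
    }
    memcpy(buf, msg, size - 1);       /* keep as much as fits */
    buf[size - 1] = '\0';
    return -1;
}
```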
3391 a binary zero is correctly extracted and has a further zero added on the end,
3392 but the result is not, of course, a C string.
3397 substring zero is available. An attempt to extract any other substring gives
3403 For example, if the pattern (?=ab\eK) is matched against "ab", the start and
3409 argument is a pointer to the match data block, the second is the group number,
3410 and the third is a pointer to a variable into which the length is placed. If
3422 This is updated to contain the actual number of code units used for the
3428 zero. When the substring is no longer needed, the memory should be freed by
3431 The return value from all these functions is zero for success, or a negative
3432 error code. If the pattern match failed, the match failure code is returned.
3433 If a substring number greater than zero is used after a partial match,
3434 PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
3443 There is no substring with that number in the pattern, that is, the number is
3449 pattern, is greater than the number of slots in the ovector, so the substring
3454 The substring did not participate in the match. For example, if the pattern is
3455 (abc)|(def) and the subject is "def", and the ovector contains at least two
3456 capturing slots, substring number 1 is unset.
3472 that is added to each of them. All this is done in a single block of memory
3473 that is obtained using the same memory allocation function that was used to get
3477 partial match, the error code PCRE2_ERROR_PARTIAL is returned.
3479 The address of the memory block is returned via \fIlistptr\fP, which is also
3480 the start of the list of string pointers. The end of the list is marked by a
3481 NULL pointer. The address of the list of lengths is returned via
3485 function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the memory block
3486 could not be obtained. When the list is no longer needed, it should be freed by
3489 If this function encounters a substring that is unset, which can happen when
3522 the number of the capture group called "xxx" is 2. If the name is known to be
3524 calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
3525 compiled pattern, and the second is the name. The yield of the function is the
3526 group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
3527 PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
3532 "bynumber" functions, the only difference being that the second argument is a
3533 name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
3535 captured substring from the first named group that is set.
3537 If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
3539 number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there
3540 is at least one group with a slot in the ovector, but no group is found to be
3541 set, PCRE2_ERROR_UNSET is returned.
3574 the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP, which
3576 special case, if \fIreplacement\fP is NULL and \fIrlength\fP is zero, the
3577 replacement is assumed to be an empty string. If \fIrlength\fP is non-zero, an
3578 error occurs if \fIreplacement\fP is NULL.
3580 There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just
3581 the replacement string(s). The default action is to perform just one
3582 replacement if the pattern matches, but there is an option that requests
3586 that were carried out. This may be zero if no match was found, and is never
3587 greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
3588 returned if an error is detected.
3599 data block is obtained and freed within this function, using memory management
3603 If \fImatch_data\fP is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
3604 provided block is used for all calls to \fBpcre2_match()\fP, and its contents
3611 One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
3620 PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
3621 \fBpcre2_match()\fP is called after the first substitution to check for further
3622 matches, but this is done using an internally obtained match data block, thus
3625 The \fIcode\fP argument is not used for matching before the first substitution
3626 when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when
3627 PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the
3630 The default action of \fBpcre2_substitute()\fP is to return a copy of the
3632 PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
3643 function is successful, the value is updated to contain the length in code
3644 units of the new string, excluding the trailing zero that is automatically
3647 If the function is not successful, the value set via \fIoutlengthptr\fP depends
3648 on the type of error. For syntax errors in the replacement string, the value is
3650 errors, the value is PCRE2_UNSET by default. This includes the case of the
3651 output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
3653 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
3654 too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
3655 this option is set, however, \fBpcre2_substitute()\fP continues to go through
3657 in order to compute the size of buffer that is needed. This value is passed
3661 Passing a buffer size of zero is a permitted way of finding out how much memory
3663 operation is carried out twice. Depending on the application, it may be more
3667 The replacement string, which is interpreted as a UTF string in UTF mode, is
3668 checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF
3671 If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted
3672 in any way. By default, however, a dollar character is an escape character that
3674 (*MARK) or other control verbs in the pattern. Dollar is the only escape
3675 character (backslash is treated as literal). The following forms are always
3685 For example, if the pattern a(b)c is matched with "=abc=" and the replacement
3686 string "+$1$0$1+", the result is "=+babcb+=".
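The standalone C sketch below makes the dollar syntax concrete. It expands only the $$ and $n forms, reading n as a run of decimal digits, against a subject and its ovector. It is not pcre2_substitute(), and expand_replacement and MODEL_UNSET are hypothetical. The braced and mark-name forms are not handled. Any group at or beyond the match's return value is treated as an error rather than as unset.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_UNSET (~(size_t)0)   /* stand-in for PCRE2_UNSET */

/* Expand "$$" and "$n" in tmpl using the rc offset pairs in ovector
   (offsets into subject). Writes a zero-terminated result to out and
   returns its length, or -1 for an unsupported "$" form, a missing or
   unset group, or insufficient space in out (capacity cap). */
static long expand_replacement(const char *tmpl, const char *subject,
                               const size_t *ovector, int rc,
                               char *out, size_t cap)
{
    size_t used = 0;
    const char *p = tmpl;

    if (cap == 0)
        return -1;
    while (*p != '\0') {
        const char *piece = p;              /* text to append next */
        size_t piece_len = 1;

        if (*p == '$') {
            p++;
            if (*p == '$') {                /* "$$" inserts one dollar */
                piece = p++;
            } else if (*p >= '0' && *p <= '9') {
                int n = 0;
                while (*p >= '0' && *p <= '9') {   /* read all digits */
                    n = n * 10 + (*p++ - '0');
                    if (n > 65535)
                        return -1;
                }
                if (n >= rc || ovector[2 * n] == MODEL_UNSET ||
                    ovector[2 * n] > ovector[2 * n + 1])
                    return -1;              /* missing, unset or reversed */
                piece = subject + ovector[2 * n];
                piece_len = ovector[2 * n + 1] - ovector[2 * n];
            } else {
                return -1;                  /* other "$" forms not modelled */
            }
        } else {
            p++;                            /* ordinary character */
        }
        if (used + piece_len >= cap)
            return -1;                      /* keep room for the final zero */
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }
    out[used] = '\0';
    return (long)used;
}
```

For a(b)c matched in "=abc=", the ovector is {1,4, 2,3}, and "+$1$0$1+" expands to "+babcb+". Copying the unmatched "=" before and after the match gives "=+babcb+=", the result quoted above.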
3691 inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
3700 replacing every matching substring. If this option is not set, only the first
3701 matching substring is replaced. The search for matches takes place in the
3702 original subject string (that is, previous replacements do not affect it).
3703 Iteration is implemented by advancing the \fIstartoffset\fP value for each
3704 search, which is always passed the entire subject string. If an offset limit is
3705 set in the match context, searching stops when that limit is reached.
3709 limit. Here is a \fBpcre2test\fP example:
3716 length, an attempt to find a non-empty match at the same offset is performed.
3717 If this is not successful, the offset is advanced by one character except when
3718 CRLF is a valid newline sequence and the next two characters are CR, LF. In
3719 this case, the offset is advanced by two characters.
3727 groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
3728 strings when inserted as described above. If this option is not set, an attempt
3733 replacement string. Without this option, only the dollar character is special,
3735 PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
3737 Firstly, backslash in a replacement string is interpreted as an escape
3748 \eu and \el force the next character (if it is a letter) to upper or lower
3757 the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
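The case-forcing escapes available with PCRE2_SUBSTITUTE_EXTENDED can be modelled on literal text alone. This standalone C sketch handles only \U, \L, \E, \u and \l, and it inserts no groups. Any other backslash pair simply copies the escaped character, which simplifies the real extended syntax. apply_case_escapes is hypothetical, and the code assumes ASCII letters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Apply \U, \L, \E (persistent) and \u, \l (next character only) to
   literal text. out must hold at least strlen(in) + 1 bytes. */
static void apply_case_escapes(const char *in, char *out)
{
    int mode = 0;   /* 0 = unchanged, 1 = upper, -1 = lower */
    int once = 0;   /* pending \u (+1) or \l (-1) for the next char */

    for (; *in != '\0'; in++) {
        if (*in == '\\' && in[1] != '\0') {
            in++;                           /* look at the escape letter */
            switch (*in) {
            case 'U': mode = 1;  continue;  /* forms do not nest */
            case 'L': mode = -1; continue;
            case 'E': mode = 0;  continue;
            case 'u': once = 1;  continue;
            case 'l': once = -1; continue;
            default:  break;                /* copy the escaped character */
            }
        }
        int c = (unsigned char)*in;
        int force = once ? once : mode;     /* one-shot overrides mode */
        once = 0;
        if (force > 0)
            c = toupper(c);
        else if (force < 0)
            c = tolower(c);
        *out++ = (char)c;
    }
    *out = '\0';
}
```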
3761 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
3762 flexibility to capture group substitution. The syntax is similar to that used
3769 default value. If group <n> is set, its value is inserted; if not, <string> is
3771 expanded and inserted when group <n> is set or unset, respectively. The first
3772 form is just a convenient shorthand for
3790 If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
3799 code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
3802 PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
3803 unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
3805 PCRE2_ERROR_UNSET is returned for an unset substring insertion (including an
3806 unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) when the simple
3807 (non-extended) syntax is used and PCRE2_SUBSTITUTE_UNSET_EMPTY is not set.
3809 PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough. If the
3810 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is
3811 needed is returned via \fIoutlengthptr\fP. Note that this does not happen by
3814 PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
3815 \fImatch_data\fP argument is NULL or if the \fIsubject\fP or \fIreplacement\fP
3816 arguments are NULL. For backward compatibility reasons an exception is made for
3817 the \fIreplacement\fP argument if the \fIrlength\fP argument is also 0.
3819 PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
3825 subject, which can happen if \eK is used in an assertion).
3847 callout function for \fBpcre2_substitute()\fP. This information is passed in
3848 a match context. The callout function is called after each substitution has
3850 function is not called for simulated substitutions that happen as a result of
3853 The first argument of the callout function is a pointer to a substitute callout
3866 current version is 0. The version number will increase in future if more fields
3867 are added, but the intention is never to remove any of the existing fields.
3869 The \fIsubscount\fP field is the number of the current match. It is 1 for the
3875 are set in the ovector, and is always greater than zero.
3881 The second argument of the callout function is the value passed as
3883 callout function is interpreted as follows:
3885 If the value is zero, the replacement is accepted, and, if
3886 PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
3887 match. If the value is not zero, the current replacement is not accepted. If
3888 the value is greater than zero, processing continues when
3889 PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
3890 PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
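The rules for a substitute callout's return value reduce to two yes/no decisions, sketched below in C. The struct and function names are hypothetical. They only restate the text above and are not part of the PCRE2 API.

```c
#include <stdbool.h>

/* Outcome of one substitute-callout return value. */
struct callout_outcome {
    bool accept_replacement;   /* keep the replacement just made? */
    bool keep_searching;       /* search for the next match? */
};

/* rc is the callout's return value; global mirrors whether
   PCRE2_SUBSTITUTE_GLOBAL is set. */
static struct callout_outcome interpret_callout_result(int rc, bool global)
{
    struct callout_outcome o;
    o.accept_replacement = (rc == 0);   /* only zero accepts */
    /* Zero or positive continues only when global is set; a negative
       value always stops, and the rest of the input is copied. */
    o.keep_searching = global && rc >= 0;
    return o;
}
```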
3903 When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
3909 only one of each set of identically-named groups participates. An example is
3918 the given name that is set. Only if none are set is PCRE2_ERROR_UNSET
3924 argument is the compiled pattern, and the second is the name. If the third and
3932 PCRE2_ERROR_NOSUBSTRING is returned if there are no entries for the given name.
3934 The format of the name table is described
3952 callout facility, which is described in the
3958 What you have to do is to insert a callout right at the end of the pattern.
3959 When your callout function is called, extract and save the current matched
3977 The function \fBpcre2_dfa_match()\fP is called to match a subject string
3981 characteristics to the normal algorithm, and is not compatible with Perl. Some
3993 is used in a different way, and this is described below. The other common
3995 description is not repeated here.
3998 vector should contain at least 20 elements. It is used for keeping track of
3999 multiple paths through the pattern tree. More workspace is needed for patterns
4002 Here is an example of a simple call to \fBpcre2_dfa_match()\fP:
4026 description is not repeated here.
4032 details are slightly different. When PCRE2_PARTIAL_HARD is set for
4034 subject is reached and there is still at least one matching possibility that
4036 already been found. When PCRE2_PARTIAL_SOFT is set, the return code
4037 PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL if the end of the
4038 subject is reached, there have been no complete matches, but there is still at
4040 when the longest partial match was found is set as the first matching string in
4041 both cases. There is a more detailed discussion of partial and multi-segment
4052 works, this is necessarily the shortest possible match at the first possible
4057 When \fBpcre2_dfa_match()\fP returns a partial match, it is possible to call it
4059 match. The PCRE2_DFA_RESTART option requests this action; when it is set, the
4061 before because data about the match so far is left in them after a partial
4062 match. There is more discussion of this facility in the
4081 This is <something> <something else> <something further> no more
4089 On success, the yield of the function is a number greater than zero, which is
4101 is, the longest matching string is first. If there were too many matches to fit
4102 into the ovector, the yield of the function is zero, and the vector is filled
4107 pattern "a\ed+" is compiled as if it were "a\ed++". For DFA matching, this
4108 means that only one possible match is found. If you really do want multiple
4127 This return is given if \fBpcre2_dfa_match()\fP encounters an item in the
4133 This return is given if \fBpcre2_dfa_match()\fP encounters a condition item
4139 This return is given if \fBpcre2_dfa_match()\fP is called for a pattern that
4140 was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for DFA
4145 This return is given if \fBpcre2_dfa_match()\fP runs out of space in the
4150 When a recursion or subroutine call is processed, the matching function calls
4152 This error is given if the internal ovector is not large enough. This should be
4153 extremely rare, as a vector of size 1000 is used.
4157 When \fBpcre2_dfa_match()\fP is called with the \fBPCRE2_DFA_RESTART\fP option,
4160 fail, this error is given.