pcre2api.3 - OpenGrok cross reference for /aosp_15

Lines Matching full:are
288 These functions became obsolete at release 10.30 and are retained only for
322 POSIX basic and extended patterns can be converted. Details are given in the
332 There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
336 systems the libraries are called \fBlibpcre2-8\fP, \fBlibpcre2-16\fP, and
345 There are also three different sets of data types:
352 The SPTR types are pointers to constants of the equivalent UCHAR types,
353 that is, they are pointers to vectors of unsigned code units.
355 Character strings are passed to a PCRE2 library as sequences of unsigned
361 are defined whose names are the generic forms such as \fBpcre2_compile()\fP and
385 PCRE2 documents, functions and data types are described using their generic
392 PCRE2 has its own native API, which is described in this document. There are
395 functionality of PCRE2 and they are not thread-safe. They are described in the
402 codes are defined in the header file \fBpcre2.h\fP, which also contains
411 The functions \fBpcre2_compile()\fP and \fBpcre2_match()\fP are used for
425 The compiling and matching functions recognize various options that are passed
426 as bits in an options argument. There are also some more complicated parameters
427 such as custom memory management functions and resource limits that are passed
428 in "contexts" (which are just memory blocks, described below). Simple
445 checking. The JIT-specific functions are discussed in the
454 point in the subject), and scans the subject just once (unless there are
463 In addition to the main compiling and matching functions, there are convenience
465 been matched by \fBpcre2_match()\fP. They are:
477 \fBpcre2_substring_free()\fP and \fBpcre2_substring_list_free()\fP are also
486 Functions whose names begin with \fBpcre2_serialize_\fP are used for saving
489 Finally, there are functions for finding out information about a compiled
493 Functions with names ending with \fB_free()\fP are used for freeing memory
502 several places. These values are always of type PCRE2_SIZE, which is an
507 maximum. Note that string lengths are always given in code units. Only in the
518 Unicode newline sequence. The Unicode newline sequences are the three just
560 There are several different blocks of data that are used to pass information
580 In a more complicated situation, where patterns are compiled only when they are
581 first needed, but are still shared between threads, pointers to compiled
638 functions are called. A context is nothing more than a collection of parameters
641 using lots of arguments. The parameters that are stored in contexts are in some
645 In a multithreaded application, if the parameters in a context are values that
663 Some PCRE2 functions have a lot of parameters, many of which are used only by
667 parameters are passed to certain functions in a \fBcontext\fP instead of
672 There are three different types of context: a general context that is relevant
680 memory management functions that are called from several places in the PCRE2
693 prototypes are:
701 \fImalloc()\fP and \fIfree()\fP are used. (This is not currently useful, as
702 there are no other fields in a general context, but in future there might be.)
704 storing the context, and all three values are saved as part of the context.
742 A compile context is also required if you are using custom memory management.
787 As PCRE2 has developed, almost all the 32 option bits that are available in
789 running out, the compile context contains a set of extra option bits which are
792 setting. The available options are defined in the section entitled "Extra
830 without a bounding length are not supported.
837 This specifies which characters or character sequences are to be recognized as
925 during a matching operation. Details are given in the
938 made by \fBpcre2_substitute()\fP. Details are given in the section entitled
1006 \fBpcre2_match()\fP uses the heap are given in the
1023 up too many computing resources when processing patterns that are not going to
1031 patterns that are not anchored, the count restarts from zero for each position
1077 workspace vectors are allocated on the heap from version 10.32 onwards, so the
1141 lookarounds, and atomic groups in \fBpcre2_dfa_match()\fP. Further details are
1148 \fBpcre2_dfa_match()\fP. Further details are given with
1191 \fBpcre2_match()\fP. Further details are given with
1197 sequence that is recognized as meaning "newline". The values are:
1316 tables are used throughout, so this behaviour is appropriate. Nevertheless,
1317 there are occasions when a copy of a compiled pattern and the relevant tables
1319 Copies of both the code and the tables are made, with the new code pointing to
1326 pattern and the subject string are set in the match data block so that they can
1343 settings that affect the compilation. It should be zero if none of them are
1344 required. The available options are described below. Some of them (in
1345 particular, those that are compatible with Perl, but some others as well) can
1367 NULL immediately. Otherwise, the variables to which these point are set to an
1372 There are nearly 100 positive error codes that \fBpcre2_compile()\fP may return
1373 if it finds an error in the pattern. There are also some negative error codes
1374 that are used for invalid UTF strings when validity checking is in force. These
1381 because the textual error messages that are obtained by calling the
1388 should be self-explanatory. Macro names starting with PCRE2_ERROR_ are defined
1401 Some errors are not detected until the whole pattern has been scanned; in these
1425 The following names for option bits are defined in the \fBpcre2.h\fP header
1491 whitespace in verb names is skipped and #-comments are recognized, exactly as
1510 PCRE2_UCP is set, Unicode properties are used for all characters with more than
1511 one other case, and for all characters whose code points are greater than
1512 U+007F. Note that there are two ASCII characters, K and S, that, in addition to
1513 their lower case ASCII equivalents, are case-equivalent with U+212A (Kelvin
1520 16-bit or 32-bit mode) are treated as not having another case.
1535 character, even if newlines are coded as CRLF. Without this option, a dot does
1546 instance of the named group can ever be matched. There are more details of
1573 which are necessarily substrings of the first one, must obviously end before
1578 If this bit is set, most white space characters in the pattern are totally
1588 white space only those characters with code points less than 256 that are
1595 ASCII environments, the relevant characters are those with code points 0x0009
1600 five more Unicode "Pattern White Space" characters are recognized by
1601 PCRE2_EXTENDED. These are U+0085 (next line), U+200E (left-to-right mark),
1604 option. Note that the horizontal and vertical space characters that are matched
1605 by the \eh and \ev escapes in patterns are a much bigger set.
1614 Which characters are interpreted as newlines can be specified by a setting in
1627 and horizontal tab characters are ignored inside a character class. Note: only
1628 these two characters are ignored, not the full set of pattern white space
1629 characters that are ignored outside a character class. PCRE2_EXTENDED_MORE is
1649 If this option is set, all meta-characters in the pattern are disabled, and it
1651 expression engine is not the most efficient way of doing it. If you are doing a
1652 lot of literal matching and are worried about efficiency, you should consider
1653 using other approaches. The only other main options that are allowed with
1654 PCRE2_LITERAL are: PCRE2_ANCHORED, PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT,
1658 PCRE2_EXTRA_MATCH_WORD are also supported. Any other options cause an error.
1666 sequences within an arbitrary string of bytes unless such sequences are
1701 there are no newlines in a subject string, or no occurrences of ^ or $ in a
1745 backtracks into a+ that can never be successful. However, if callouts are in
1746 use, auto-possessification means that some callouts are never taken. You can
1770 There are a number of optimizations that may occur at the start of a match, in
1777 (*MARK) items are in use, these "start-up" optimizations can cause them to be
1778 skipped if the pattern is never actually used. The start-up optimizations are
1784 and (*MARK) are considered at every possible starting position in the subject
1811 left, so there are no more attempts, and "no match" is returned with the "last
1812 mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
1822 automatically checked. There are discussions about the validity of
1855 points (0xd800 to 0xdfff) are invalid. If you want to allow escape sequences
1869 default, only ASCII characters are recognized, but if PCRE2_UCP is set, Unicode
1870 properties are used to classify characters. There are some PCRE2_EXTRA
1871 options (see below) that add finer control to this behaviour. More details are
1893 This option inverts the "greediness" of the quantifiers so that they are not
1914 that are subsequently processed as strings of UTF characters instead of
1918 behaviour of PCRE2 are given in the
1930 \fBpcre2_set_compile_extra_options()\fP function are as follows:
1943 code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode
1946 UTF-32, but are defined as invalid code points, and cause errors if encountered
1957 point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
2006 digits in \ex{} are just ignored, though warnings are given in both cases if
2011 \fBpcre2_compile()\fP, all unrecognized or malformed escape sequences are
2023 rules, which allow for more than two cases per character. There are two
2033 There are some legacy applications where the escape sequence \er in a pattern
2084 function. Full details are given in the
2107 PCRE2 handles caseless matching, and determines whether characters are letters,
2109 point. However, this applies only to characters whose code points are less than
2119 when PCRE2_UTF is not set. There are, however, some PCRE2_EXTRA options (see
2122 The use of locales with Unicode is discouraged. If you are handling characters
2126 PCRE2 contains a built-in set of character tables that are used by default.
2127 These are sufficient for many applications. Normally, the internal tables
2137 External tables are built by calling the \fBpcre2_maketables()\fP function, in
2144 For example, to build and use tables that are appropriate for the French locale
2145 (where accented characters with values greater than 127 are treated as
2158 is saved with the compiled pattern, and the same tables are used by the
2164 tables remains available while they are still in use. When they are no longer
2173 The tables described above are just a sequence of binary bytes, which makes
2228 The possible values for the second argument are defined in \fBpcre2.h\fP, and
2258 following are true:
2268 For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
2279 Zero is returned if there are no backreferences.
2309 greater than 255 are supported, the flag bit for 255 means "any code unit of
2336 Return the size (in bytes) of the data frames that are used to remember
2447 names are just an additional way of identifying the parentheses, which still
2449 \fBpcre2_substring_get_byname()\fP are provided for extracting captured
2463 two bytes of each entry are the number of the capturing parenthesis, most
2470 The names are in alphabetical order. If (?| is used to create multiple capture
2481 table. Different names for groups of the same number are not permitted.
2483 Duplicate names for capture groups with different numbers are permitted, but
2497 There are four named capture groups, so the table has four entries, and each
2531 value returned by this option, because there are cases where the code that
2554 contents of the callout enumeration block are described in the
2565 later, subject to a number of restrictions. The host on which the patterns are
2570 The functions whose names begin with \fBpcre2_serialize_\fP are used for
2571 converting to and from the serialized form. They are described in the
2611 large enough to hold as many as are expected.
2622 memory for the match data block. If you are not using custom memory management,
2635 operation has finished, using functions that are described in the sections on
2653 and the subject string are set in the match data block so that they can be
2748 matching parameters are to be changed. For details, see the section on
2761 \fIstartoffset\fP. The length and offset are in code units, not characters.
2762 That is, they are in bytes for the 8-bit library, 16-bit code units for the
2773 mode, one code unit equals one character, so all offsets are valid). Like the
2821 zero. The only bits that may be set are PCRE2_ANCHORED,
2831 (obviously), the remaining options are supported for JIT matching.
2846 until all such operations are complete. For some applications where the
2865 limits are large. There is therefore a check at the start of each recursion.
2870 There are rare cases of matches that would complete, but nevertheless trigger
2900 there are alternatives in the pattern, they are tried. If all the alternatives
2939 code unit of a character or to the end of the subject. If there are no
2942 starting offset, or at the start of the subject if there are not that many
2943 characters before the starting offset. Note that the sequences \eb and \eB are
2947 negative error code is returned if the check fails. There are several UTF error
2949 code unit sequence. There are discussions about the validity of
2972 calls to \fBpcre2_match()\fP if you are making repeated calls to find multiple
2984 the end of the subject string is reached successfully, but there are not enough
3074 can be used to find out how many capture groups there are in a compiled
3102 first code unit after the end of a substring. These values are always code unit
3103 offsets, not character offsets. That is, they are byte offsets in the 8-bit
3108 of offsets (that is, \fIovector[0]\fP and \fIovector[1]\fP) are set. They
3120 3. If there are no captured substrings, the return value from a successful
3126 end offset values for the match are 2 and 0.
3133 substrings are not of interest, \fBpcre2_match()\fP may be called with a match
3139 is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
3140 values in the offset pairs corresponding to unused groups are set to
3143 Offset values that correspond to unused groups at the end of the expression are
3145 the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the
3148 of course) are set to PCRE2_UNSET.
3151 pattern are never changed. That is, if a pattern contains \fIn\fP capturing
3152 parentheses, no more than \fIovector[0]\fP to \fIovector[2n+1]\fP are set by
3154 had. After a failed match attempt, the contents of the ovector are unchanged.
3169 appropriate circumstances. If they are called at other times, the result is
3196 \fBWarning:\fP By default, certain start-of-match optimizations are used to
3213 the code unit offset of the invalid UTF character. Details are given in the
3231 Negative error codes are also returned by other functions, and are documented
3232 with them. The codes are given names in the header file. If UTF checking is in
3234 UTF-specific negative error codes is returned. Details are given in the
3238 page. The following are the other errors that may be returned by
3335 position in the subject string. Some simple patterns that might do this are
3362 None of the messages are very long; a buffer size of 120 code units is ample.
3389 For convenience, auxiliary functions are provided for extracting captured
3404 end offset values for the match are 2 and 0. In this situation, calling these
3417 used for the match data block. The first two arguments of these functions are a
3420 The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to
3426 to variables that are updated with a pointer to the new memory and the number
3434 PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
3531 For convenience, there are also "byname" functions that correspond to the
3533 name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
3537 If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
3538 returned. If all groups with the name have numbers that are greater than the
3554 names are not included in the compiled code. The matching process uses only
3591 end before it starts are not supported, and give rise to an error return. For
3593 start earlier than the point that was reached in the previous iteration are
3596 The first seven arguments of \fBpcre2_substitute()\fP are the same as for
3597 \fBpcre2_match()\fP, except that the partial matching options are not
3605 afterwards are the result of the final call. For global changes, this will
3619 The contents of the externally supplied match data block are not changed when
3632 PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
3633 returned. In the global case, multiple replacements are concatenated in the
3675 character (backslash is treated as literal). The following forms are always
3682 Either a group number or a group name can be given for <n>. Curly brackets are
3718 CRLF is a valid newline sequence and the next two characters are CR, LF. In
3734 and only the group insertion forms listed above are valid. When
3743 There are also four escape sequences for forcing the case of inserted letters.
3753 properties are used for case forcing characters whose code points are greater
3770 expanded and the result inserted. The second form specifies strings that are
3791 PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
3800 \fBpcre2_match()\fP are passed straight back.
3816 arguments are NULL. For backward compatibility reasons an exception is made for
3871 pointers are copies of the values passed to \fBpcre2_substitute()\fP.
3904 groups are not required to be unique. Duplicate names are always allowed for
3906 groups are named, they are required to use the same names.
3908 Normally, patterns that use duplicate names are such that in any one match,
3916 When duplicates are present, \fBpcre2_substring_copy_byname()\fP and
3918 the given name that is set. Only if none are set is PCRE2_ERROR_UNSET
3920 error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate names.
3925 fourth arguments are NULL, the function returns a group number for a unique
3928 When the third and fourth arguments are not NULL, they must be pointers to
3929 variables that are updated by the function. After it has run, they point to the
3932 PCRE2_ERROR_NOSUBSTRING is returned if there are no entries for the given name.
3982 of the features of PCRE2 patterns are not supported. Nevertheless, there are
3991 The arguments for the \fBpcre2_dfa_match()\fP function are the same as for
3994 arguments are used in the same way as for \fBpcre2_match()\fP, so their
4000 and subjects where there are a lot of potential matches.
4021 be zero. The only bits that may be set are PCRE2_ANCHORED,
4025 four of these are exactly the same as for \fBpcre2_match()\fP, so their
4032 details are slightly different. When PCRE2_PARTIAL_HARD is set for
4074 the function start at the same point in the subject. The shorter matches are
4083 the three matched strings are
4090 the number of matched substrings. The offsets of the substrings are returned in
4100 The matched strings are stored in the ovector in reverse order of length; that
4117 Many of the errors are the same as for \fBpcre2_match()\fP, as described
4122 There are in addition the following errors that are specific to
4135 specific capture group. These are not supported.
4158 some plausibility checks are made on the contents of the workspace, which
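
The name-table hits above (file lines 2463 to 2497) describe the 8-bit layout: each entry starts with the two-byte group number, most significant byte first, followed by the zero-terminated name, and entries are in alphabetical order. A minimal sketch of decoding such a table, assuming a hand-built sample rather than one returned by the library; name_table_lookup() and sample_table are hypothetical names that are not part of PCRE2, and the group numbers are illustrative:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE2): search an 8-bit name table.
   Each entry is entry_size bytes long: a 2-byte group number, most
   significant byte first, then the zero-terminated group name.
   Returns the group number, or -1 if the name is not in the table. */
int name_table_lookup(const uint8_t *table, size_t count,
                      size_t entry_size, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        const uint8_t *entry = table + i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* Illustrative sample data: four 8-byte entries in alphabetical order.
   Unused trailing bytes are zero here; in a real table their content
   should not be relied on. */
static const uint8_t sample_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0, 0,
    0, 5, 'd', 'a', 'y', 0,   0, 0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0, 0,
};
```

With this sample, name_table_lookup(sample_table, 4, 8, "month") yields 4. In a real program the table pointer, entry count, and entry size would come from pcre2_pattern_info() with PCRE2_INFO_NAMETABLE, PCRE2_INFO_NAMECOUNT, and PCRE2_INFO_NAMEENTRYSIZE; the linear scan is kept for brevity even though the alphabetical ordering would allow a binary search.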