<html>
<head>
<title>pcre2convert specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2convert man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a>
<li><a name="TOC2" href="#SEC2">THE CONVERT CONTEXT</a>
<li><a name="TOC3" href="#SEC3">THE CONVERSION FUNCTION</a>
<li><a name="TOC4" href="#SEC4">CONVERTING GLOBS</a>
<li><a name="TOC5" href="#SEC5">CONVERTING POSIX PATTERNS</a>
<li><a name="TOC6" href="#SEC6">AUTHOR</a>
<li><a name="TOC7" href="#SEC7">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a><br>
<P>
This document describes a set of functions that can be used to convert
"foreign" patterns into PCRE2 regular expressions. This facility is currently
experimental, and may be changed in future releases. Two kinds of pattern,
globs and POSIX patterns, are supported.
</P>
<br><a name="SEC2" href="#TOC1">THE CONVERT CONTEXT</a><br>
<P>
<b>pcre2_convert_context *pcre2_convert_context_create(</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_convert_context *pcre2_convert_context_copy(</b>
<b>  pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_convert_context_free(pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_escape(pcre2_convert_context *<i>cvcontext</i>,</b>
<b>  uint32_t <i>escape_char</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_separator(pcre2_convert_context *<i>cvcontext</i>,</b>
<b>  uint32_t <i>separator_char</i>);</b>
<br>
<br>
A convert context is used to hold parameters that affect the way that pattern
conversion works. As with all PCRE2 contexts, you need to use one only if you
want to override the defaults. There are the usual create, copy, and free
functions. If custom memory management functions are set in a general context
that is passed to <b>pcre2_convert_context_create()</b>, they are used for all
memory management within the conversion functions.
</P>
<P>
There are only two parameters in the convert context at present. Both apply
only to glob conversions. The escape character defaults to grave accent under
Windows and to backslash otherwise. It can be set to zero, meaning no escape
character, or to any punctuation character with a code point less than 256.
The separator character defaults to backslash under Windows and to forward
slash otherwise. It can be set to forward slash, backslash, or dot.
</P>
<P>
The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if
their second argument is invalid.
</P>
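<P>
The validity rules above can be summarized as two small checks. The following is a hypothetical sketch that mirrors the documented rules; it is not PCRE2's source code. The function names and the BADDATA_SKETCH value are invented for illustration (the real setting functions return PCRE2_ERROR_BADDATA, whose value is defined in pcre2.h), and "punctuation" is assumed to mean what <b>ispunct()</b> reports in the C locale.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Hypothetical stand-in for PCRE2_ERROR_BADDATA; the real value is in pcre2.h. */
#define BADDATA_SKETCH (-1)

/* Documented rule for the escape character: zero (no escape character) or a
   punctuation character with a code point below 256. Assumption: punctuation
   is tested with ispunct() in the C locale. */
int sketch_check_glob_escape(uint32_t escape_char)
{
  if (escape_char == 0) return 0;
  /* The range test comes first, so ispunct() only sees values 1..255. */
  if (escape_char > 255 || !ispunct((int)escape_char)) return BADDATA_SKETCH;
  return 0;
}

/* Documented rule for the separator: forward slash, backslash, or dot. */
int sketch_check_glob_separator(uint32_t separator_char)
{
  switch (separator_char)
    {
    case '/':
    case '\\':
    case '.':
      return 0;
    default:
      return BADDATA_SKETCH;
    }
}
```

<P>
For example, sketch_check_glob_separator(':') yields the error value, while sketch_check_glob_escape(0) succeeds because zero simply disables escaping.
</P>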
<br><a name="SEC3" href="#TOC1">THE CONVERSION FUNCTION</a><br>
<P>
<b>int pcre2_pattern_convert(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b>  uint32_t <i>options</i>, PCRE2_UCHAR **<i>buffer</i>,</b>
<b>  PCRE2_SIZE *<i>blength</i>, pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_converted_pattern_free(PCRE2_UCHAR *<i>converted_pattern</i>);</b>
<br>
<br>
The first two arguments of <b>pcre2_pattern_convert()</b> define the foreign
pattern that is to be converted. The length may be given as
PCRE2_ZERO_TERMINATED. The <b>options</b> argument defines how the pattern is to
be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set.
PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid.
To define the type of conversion that is required, either one or more of the
glob options or one of the POSIX options in the following list must be set:
<pre>
  PCRE2_CONVERT_GLOB
  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
  PCRE2_CONVERT_GLOB_NO_STARSTAR
  PCRE2_CONVERT_POSIX_BASIC
  PCRE2_CONVERT_POSIX_EXTENDED
</pre>
Details of the conversions are given below. The <b>buffer</b> and <b>blength</b>
arguments define how the output is handled:
</P>
<P>
If <b>buffer</b> is NULL, the function just returns the length of the converted
pattern via <b>blength</b>. This is one less than the length of buffer needed,
because a terminating zero is always added to the output.
</P>
<P>
If <b>buffer</b> points to a NULL pointer, an output buffer is obtained using
the allocator in the context, or <b>malloc()</b> if no context is supplied. A
pointer to this buffer is placed in the variable to which <b>buffer</b> points.
When it is no longer needed, the output buffer must be freed by calling
<b>pcre2_converted_pattern_free()</b>. If this function is called with a NULL
argument, it returns immediately without doing anything.
</P>
<P>
If <b>buffer</b> points to a non-NULL pointer, <b>blength</b> must be set to the
actual length of the buffer provided (in code units).
</P>
<P>
In all cases, after successful conversion, the variable pointed to by
<b>blength</b> is updated to the length actually used (in code units), excluding
the terminating zero that is always added.
</P>
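<P>
The three ways of handling output can be modelled in plain C. This sketch assumes the 8-bit library, so <b>char</b> and <b>size_t</b> stand in for PCRE2_UCHAR and PCRE2_SIZE. The function sketch_emit() and its error values are hypothetical: it takes a pattern that has already been converted and applies the <b>buffer</b>/<b>blength</b> contract to it. The man page does not say which error code is returned when a caller-supplied buffer is too small, so NOSPACE_SKETCH is only a placeholder, and a buffer obtained here is released with <b>free()</b> rather than <b>pcre2_converted_pattern_free()</b>.
</P>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error values; the man page does not name the codes used here. */
#define NOMEM_SKETCH   (-1)   /* allocation failed */
#define NOSPACE_SKETCH (-2)   /* caller's buffer is too small */

/* Delivers an already-converted pattern (outlen code units at out) using the
   buffer/blength contract of pcre2_pattern_convert(). char and size_t stand in
   for PCRE2_UCHAR and PCRE2_SIZE (8-bit library assumed). */
int sketch_emit(const char *out, size_t outlen, char **buffer, size_t *blength)
{
  assert(out != NULL && blength != NULL);

  /* buffer == NULL: report the length only, excluding the terminating zero. */
  if (buffer == NULL)
    {
    *blength = outlen;
    return 0;
    }

  if (*buffer == NULL)
    {
    /* *buffer == NULL: obtain outlen + 1 code units. Real PCRE2 uses the
       context's allocator or malloc(), and the caller frees the result with
       pcre2_converted_pattern_free(); this sketch uses malloc()/free(). */
    char *p = malloc(outlen + 1);
    if (p == NULL) return NOMEM_SKETCH;
    *buffer = p;
    }
  else if (*blength < outlen + 1)
    {
    /* Caller's buffer: *blength is its size, which must include the zero. */
    return NOSPACE_SKETCH;
    }

  memcpy(*buffer, out, outlen);
  (*buffer)[outlen] = '\0';   /* a terminating zero is always added */
  *blength = outlen;          /* length used, excluding that zero */
  return 0;
}
```

<P>
With the real function, an application that prefers to manage memory itself could call <b>pcre2_pattern_convert()</b> once with <b>buffer</b> set to NULL, allocate <b>blength</b> + 1 code units, and then call it again with <b>buffer</b> pointing to a variable holding that allocation and <b>blength</b> set to its size.
</P>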
<P>
If an error occurs, the length (via <b>blength</b>) is set to the offset
within the input pattern where the error was detected. Only gross syntax errors
are caught; many other errors are passed on for <b>pcre2_compile()</b> to
discover.
</P>
<P>
The return from <b>pcre2_pattern_convert()</b> is zero on success or a non-zero
PCRE2 error code. Note that PCRE2 error codes may be positive or negative:
<b>pcre2_compile()</b> uses mostly positive codes and <b>pcre2_match()</b>
negative ones; <b>pcre2_pattern_convert()</b> uses existing codes of both kinds.
A textual error message can be obtained by calling
<b>pcre2_get_error_message()</b>.
</P>
<br><a name="SEC4" href="#TOC1">CONVERTING GLOBS</a><br>
<P>
Globs are used to match file names, and consequently have the concept of a
"path separator", which defaults to backslash under Windows and forward slash
otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not
permitted to match separator characters, but the double-star (**) feature
(which does match separators) is supported.
</P>
<P>
PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR converts globs with the wildcards allowed
to match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR converts globs
with the double-star feature disabled. These options may be given together.
</P>
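<P>
To make the effect of these options concrete, here is a toy glob-to-regex converter. It is an illustration only: PCRE2's real conversion produces different, more elaborate output, and this sketch ignores the escape character and bracket expressions, and treats every ** as crossing separators. TOY_NO_WILD_SEPARATOR and TOY_NO_STARSTAR are hypothetical flags that play the roles of PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR and PCRE2_CONVERT_GLOB_NO_STARSTAR.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOY_NO_WILD_SEPARATOR 0x1u  /* hypothetical: * and ? may match sep */
#define TOY_NO_STARSTAR       0x2u  /* hypothetical: ** is just two * */

/* Appends s at out + *pos; returns 0, or -1 if it (plus a zero) won't fit. */
static int toy_put(char *out, size_t cap, size_t *pos, const char *s)
{
  size_t n = strlen(s);
  if (*pos + n + 1 > cap) return -1;
  memcpy(out + *pos, s, n + 1);     /* copies the terminating zero too */
  *pos += n;
  return 0;
}

/* Toy converter: returns the regex length, or -1 if out is too small. */
long toy_glob_to_regex(const char *glob, char sep, unsigned flags,
                       char *out, size_t cap)
{
  char notsep[6] = { '[', '^', '\\', sep, ']', '\0' };  /* e.g. [^\/] */
  char lit[3] = { '\\', 0, 0 };     /* "\c", or "c" when used from lit + 1 */
  size_t pos = 0;

  assert(glob != NULL && out != NULL);
  if (toy_put(out, cap, &pos, "^") < 0) return -1;

  for (const char *p = glob; *p != '\0'; p++)
    {
    int rc;
    if (p[0] == '*' && p[1] == '*' && !(flags & TOY_NO_STARSTAR))
      {
      rc = toy_put(out, cap, &pos, ".*");   /* ** may cross separators */
      p++;
      }
    else if (*p == '*')
      {
      if (flags & TOY_NO_WILD_SEPARATOR) rc = toy_put(out, cap, &pos, ".*");
      else
        {
        rc = toy_put(out, cap, &pos, notsep);
        if (rc == 0) rc = toy_put(out, cap, &pos, "*");
        }
      }
    else if (*p == '?')
      rc = toy_put(out, cap, &pos,
                   (flags & TOY_NO_WILD_SEPARATOR) ? "." : notsep);
    else
      {
      /* Escape ASCII punctuation so it is literal; copy anything else. */
      unsigned char c = (unsigned char)*p;
      lit[1] = *p;
      rc = toy_put(out, cap, &pos, (c < 128 && ispunct(c)) ? lit : lit + 1);
      }
    if (rc < 0) return -1;
    }
  if (toy_put(out, cap, &pos, "$") < 0) return -1;
  return (long)pos;
}
```

<P>
With separator '/', the glob *.c becomes ^[^\/]*\.c$, which matches main.c but not src/main.c, while a/**/b becomes ^a\/.*\/b$, in which the double star can span several directories.
</P>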
<br><a name="SEC5" href="#TOC1">CONVERTING POSIX PATTERNS</a><br>
<P>
POSIX defines two kinds of regular expression pattern: basic and extended.
These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or
PCRE2_CONVERT_POSIX_EXTENDED, respectively.
</P>
<P>
In POSIX patterns, backslash is not special in a character class. Unmatched
closing parentheses are treated as literals.
</P>
<P>
In basic patterns, ? + | {} and () must be escaped to be recognized
as metacharacters outside a character class. If the first character in the
pattern is *, it is treated as a literal. ^ is a metacharacter only at the start
of a branch.
</P>
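<P>
The escaping rules for basic patterns amount to swapping the meaning of escaped and unescaped characters. The following toy translator (a hypothetical function, not part of PCRE2) applies only two of the rules stated above: escaped ? + | { } ( ) become PCRE2 metacharacters while unescaped ones become literals, and a leading * is made literal. It does not handle character classes, the position of ^, or unmatched parentheses.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy BRE-to-PCRE2 translation. Returns the output length, or -1 if out
   (capacity cap, including the terminating zero) is too small. */
long toy_bre_to_pcre(const char *bre, char *out, size_t cap)
{
  static const char swapped[] = "?+|{}()";  /* metacharacters only if escaped */
  size_t pos = 0;

  assert(bre != NULL && out != NULL);
  for (const char *p = bre; *p != '\0'; p++)
    {
    char emit[3] = { 0, 0, 0 };
    size_t n;

    if (p[0] == '\\' && p[1] != '\0' && strchr(swapped, p[1]) != NULL)
      {
      emit[0] = p[1];              /* BRE \( is the PCRE2 metacharacter ( */
      p++;
      }
    else if (p[0] == '\\' && p[1] != '\0')
      {
      emit[0] = '\\';              /* other escapes, e.g. \. or \1, are kept */
      emit[1] = p[1];
      p++;
      }
    else if (strchr(swapped, *p) != NULL || (*p == '*' && p == bre))
      {
      emit[0] = '\\';              /* a bare ( or a leading * is a literal */
      emit[1] = *p;
      }
    else
      emit[0] = *p;

    n = strlen(emit);
    if (pos + n + 1 > cap) return -1;   /* keep room for the terminating zero */
    memcpy(out + pos, emit, n);
    pos += n;
    }
  if (pos + 1 > cap) return -1;
  out[pos] = '\0';
  return (long)pos;
}
```

<P>
For example, the basic pattern \(ab\)* becomes (ab)*, and a+(b) becomes a\+\(b\), in which every character is a literal.
</P>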
<P>
In extended patterns, a backslash not in a character class always
makes the next character literal, whatever it is. There are no backreferences.
</P>
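<P>
Because PCRE2 gives meaning to sequences such as \d and \w, an escape in an extended pattern cannot simply be copied. One way to honour the "next character is literal" rule, shown below with a hypothetical function, is to drop the backslash before an alphanumeric character and keep it before anything else. This relies on the PCRE2 rule that a backslash followed by a non-alphanumeric character matches that character literally. The sketch ignores character classes and is not necessarily how PCRE2 itself performs the conversion.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Rewrites backslash escapes outside character classes so that the escaped
   character is matched literally by PCRE2; everything else is copied
   unchanged. Returns the output length, or -1 if out (capacity cap) is too small. */
long toy_ere_escapes(const char *ere, char *out, size_t cap)
{
  size_t pos = 0;

  assert(ere != NULL && out != NULL);
  for (const char *p = ere; *p != '\0'; p++)
    {
    unsigned char c;

    if (*p != '\\' || p[1] == '\0')
      {
      if (pos + 2 > cap) return -1;   /* one character plus the zero */
      out[pos++] = *p;
      continue;
      }

    c = (unsigned char)*++p;          /* the escaped character */
    if (c < 128 && !isalnum(c))
      {
      /* \. \* \\ etc.: an escaped non-alphanumeric is literal in PCRE2 too */
      if (pos + 3 > cap) return -1;
      out[pos++] = '\\';
      }
    else if (pos + 2 > cap)
      return -1;
    /* Emit the character itself; for alphanumerics (and non-ASCII bytes)
       the backslash has been dropped, so ERE \d becomes a plain 'd'. */
    out[pos++] = (char)c;
    }
  if (pos + 1 > cap) return -1;
  out[pos] = '\0';
  return (long)pos;
}
```

<P>
For example, the extended pattern a\d+ becomes ad+, which matches a literal "a" followed by one or more "d" characters, whereas 1\.5 is left as it is.
</P>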
<P>
Note: POSIX mandates that the longest possible match at the first matching
position must be found. This is not what <b>pcre2_match()</b> does; it yields
the first match that is found. An application can use <b>pcre2_dfa_match()</b>
to find the longest match, but that function does not support backreferences
(though neither do POSIX extended patterns).
</P>
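<P>
The difference between the two matching rules shows up with the alternation a|ab applied to the subject "abc": taking the first alternative that succeeds gives "a", whereas POSIX requires the longer "ab". The toy model below compares both rules for an alternation of literal strings tried at a single position. The function names are hypothetical, and the code models neither <b>pcre2_match()</b> nor <b>pcre2_dfa_match()</b> themselves, nor the search for the first matching position.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: an alternation of literal strings tried at the start of subject.
   Both functions return the length of the match, or -1 if nothing matches. */

/* First-match rule: the earliest alternative that matches wins. */
long toy_first_match(const char *const alts[], size_t nalts, const char *subject)
{
  assert(alts != NULL && subject != NULL);
  for (size_t i = 0; i < nalts; i++)
    {
    size_t n = strlen(alts[i]);
    if (strncmp(subject, alts[i], n) == 0) return (long)n;
    }
  return -1;
}

/* POSIX rule: of all alternatives that match here, the longest wins. */
long toy_longest_match(const char *const alts[], size_t nalts, const char *subject)
{
  long best = -1;

  assert(alts != NULL && subject != NULL);
  for (size_t i = 0; i < nalts; i++)
    {
    size_t n = strlen(alts[i]);
    if (strncmp(subject, alts[i], n) == 0 && (long)n > best) best = (long)n;
    }
  return best;
}
```

<P>
With the alternatives "a" and "ab" and the subject "abc", toy_first_match() returns 1 and toy_longest_match() returns 2.
</P>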
<br><a name="SEC6" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC7" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 June 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>