<html>
<head>
<title>pcre2convert specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2convert man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a>
<li><a name="TOC2" href="#SEC2">THE CONVERT CONTEXT</a>
<li><a name="TOC3" href="#SEC3">THE CONVERSION FUNCTION</a>
<li><a name="TOC4" href="#SEC4">CONVERTING GLOBS</a>
<li><a name="TOC5" href="#SEC5">CONVERTING POSIX PATTERNS</a>
<li><a name="TOC6" href="#SEC6">AUTHOR</a>
<li><a name="TOC7" href="#SEC7">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a><br>
<P>
This document describes a set of functions that can be used to convert
"foreign" patterns into PCRE2 regular expressions. This facility is currently
experimental, and may be changed in future releases. Two kinds of pattern,
globs and POSIX patterns, are supported.
</P>
<br><a name="SEC2" href="#TOC1">THE CONVERT CONTEXT</a><br>
<P>
<b>pcre2_convert_context *pcre2_convert_context_create(</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_convert_context *pcre2_convert_context_copy(</b>
<b>  pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_convert_context_free(pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_escape(pcre2_convert_context *<i>cvcontext</i>,</b>
<b>  uint32_t <i>escape_char</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_separator(pcre2_convert_context *<i>cvcontext</i>,</b>
<b>  uint32_t <i>separator_char</i>);</b>
<br>
<br>
A convert context is used to hold parameters that affect the way that pattern
conversion works. As with all PCRE2 contexts, a convert context is needed only
if you want to override the defaults. There are the usual create, copy, and free
functions. If custom memory management functions are set in a general context
that is passed to <b>pcre2_convert_context_create()</b>, they are used for all
memory management within the conversion functions.
</P>
<P>
There are only two parameters in the convert context at present.
Both apply
only to glob conversions. The escape character defaults to grave accent under
Windows, otherwise backslash. It can be set to zero, meaning no escape
character, or to any punctuation character with a code point less than 256.
The separator character defaults to backslash under Windows, otherwise forward
slash. It can be set to forward slash, backslash, or dot.
</P>
<P>
The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if
their second argument is invalid.
</P>
<br><a name="SEC3" href="#TOC1">THE CONVERSION FUNCTION</a><br>
<P>
<b>int pcre2_pattern_convert(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b>  uint32_t <i>options</i>, PCRE2_UCHAR **<i>buffer</i>,</b>
<b>  PCRE2_SIZE *<i>blength</i>, pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_converted_pattern_free(PCRE2_UCHAR *<i>converted_pattern</i>);</b>
<br>
<br>
The first two arguments of <b>pcre2_pattern_convert()</b> define the foreign
pattern that is to be converted. The length may be given as
PCRE2_ZERO_TERMINATED. The <b>options</b> argument defines how the pattern is to
be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set.
PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid.
One or more of the glob options, or one of the POSIX options, from the
following list must be set to define the type of conversion that is required:
<pre>
  PCRE2_CONVERT_GLOB
  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
  PCRE2_CONVERT_GLOB_NO_STARSTAR
  PCRE2_CONVERT_POSIX_BASIC
  PCRE2_CONVERT_POSIX_EXTENDED
</pre>
Details of the conversions are given below. The <b>buffer</b> and <b>blength</b>
arguments define how the output is handled:
</P>
<P>
If <b>buffer</b> is NULL, the function just returns the length of the converted
pattern via <b>blength</b>. This is one less than the length of buffer needed,
because a terminating zero is always added to the output.
</P>
<P>
If <b>buffer</b> points to a NULL pointer, an output buffer is obtained using
the allocator in the context or <b>malloc()</b> if no context is supplied. A
pointer to this buffer is placed in the variable to which <b>buffer</b> points.
When no longer needed the output buffer must be freed by calling
<b>pcre2_converted_pattern_free()</b>. If this function is called with a NULL
argument, it returns immediately without doing anything.
</P>
<P>
If <b>buffer</b> points to a non-NULL pointer, <b>blength</b> must be set to the
actual length of the buffer provided (in code units).
</P>
<P>
In all cases, after successful conversion, the variable pointed to by
<b>blength</b> is updated to the length actually used (in code units), excluding
the terminating zero that is always added.
</P>
<P>
If an error occurs, the length (via <b>blength</b>) is set to the offset
within the input pattern where the error was detected. Only gross syntax errors
are caught; there are plenty of errors that will get passed on for
<b>pcre2_compile()</b> to discover.
</P>
<P>
The return from <b>pcre2_pattern_convert()</b> is zero on success or a non-zero
PCRE2 error code. Note that PCRE2 error codes may be positive or negative:
<b>pcre2_compile()</b> uses mostly positive codes and <b>pcre2_match()</b>
negative ones; <b>pcre2_pattern_convert()</b> uses existing codes of both kinds. A
textual error message can be obtained by calling
<b>pcre2_get_error_message()</b>.
</P>
<br><a name="SEC4" href="#TOC1">CONVERTING GLOBS</a><br>
<P>
Globs are used to match file names, and consequently have the concept of a
"path separator", which defaults to backslash under Windows and forward slash
otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ?
are not
permitted to match separator characters, but the double-star (**) feature
(which does match separators) is supported.
</P>
<P>
PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to
match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR matches globs with
the double-star feature disabled. These options may be given together.
</P>
<br><a name="SEC5" href="#TOC1">CONVERTING POSIX PATTERNS</a><br>
<P>
POSIX defines two kinds of regular expression pattern: basic and extended.
These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or
PCRE2_CONVERT_POSIX_EXTENDED, respectively.
</P>
<P>
In POSIX patterns, backslash is not special in a character class. Unmatched
closing parentheses are treated as literals.
</P>
<P>
In basic patterns, ? + | {} and () must be escaped to be recognized
as metacharacters outside a character class. If the first character in the
pattern is * it is treated as a literal. ^ is a metacharacter only at the start
of a branch.
</P>
<P>
In extended patterns, a backslash not in a character class always
makes the next character literal, whatever it is. There are no backreferences.
</P>
<P>
Note: POSIX mandates that the longest possible match at the first matching
position must be found. This is not what <b>pcre2_match()</b> does; it yields
the first match that is found. An application can use <b>pcre2_dfa_match()</b>
to find the longest match, but that does not support backreferences (but then
neither do POSIX extended patterns).
</P>
<br><a name="SEC6" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC7" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 June 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>