<html>
<head>
<title>pcre2api specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2api man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE2 NATIVE API BASIC FUNCTIONS</a>
<li><a name="TOC2" href="#SEC2">PCRE2 NATIVE API AUXILIARY MATCH FUNCTIONS</a>
<li><a name="TOC3" href="#SEC3">PCRE2 NATIVE API GENERAL CONTEXT FUNCTIONS</a>
<li><a name="TOC4" href="#SEC4">PCRE2 NATIVE API COMPILE CONTEXT FUNCTIONS</a>
<li><a name="TOC5" href="#SEC5">PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS</a>
<li><a name="TOC6" href="#SEC6">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a>
<li><a name="TOC7" href="#SEC7">PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION</a>
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
<li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
<li><a name="TOC11" href="#SEC11">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a>
<li><a
name="TOC12" href="#SEC12">PCRE2 EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a>
<li><a name="TOC13" href="#SEC13">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
<li><a name="TOC14" href="#SEC14">PCRE2 API OVERVIEW</a>
<li><a name="TOC15" href="#SEC15">STRING LENGTHS AND OFFSETS</a>
<li><a name="TOC16" href="#SEC16">NEWLINES</a>
<li><a name="TOC17" href="#SEC17">MULTITHREADING</a>
<li><a name="TOC18" href="#SEC18">PCRE2 CONTEXTS</a>
<li><a name="TOC19" href="#SEC19">CHECKING BUILD-TIME OPTIONS</a>
<li><a name="TOC20" href="#SEC20">COMPILING A PATTERN</a>
<li><a name="TOC21" href="#SEC21">JUST-IN-TIME (JIT) COMPILATION</a>
<li><a name="TOC22" href="#SEC22">LOCALE SUPPORT</a>
<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A COMPILED PATTERN</a>
<li><a name="TOC24" href="#SEC24">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
<li><a name="TOC25" href="#SEC25">SERIALIZATION AND PRECOMPILING</a>
<li><a name="TOC26" href="#SEC26">THE MATCH DATA BLOCK</a>
<li><a name="TOC27" href="#SEC27">MEMORY USE FOR MATCH DATA BLOCKS</a>
<li><a name="TOC28" href="#SEC28">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
<li><a name="TOC29" href="#SEC29">NEWLINE HANDLING WHEN MATCHING</a>
<li><a name="TOC30" href="#SEC30">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
<li><a name="TOC31" href="#SEC31">OTHER INFORMATION ABOUT A MATCH</a>
<li><a name="TOC32" href="#SEC32">ERROR RETURNS FROM <b>pcre2_match()</b></a>
<li><a name="TOC33"
href="#SEC33">OBTAINING A TEXTUAL ERROR MESSAGE</a>
<li><a name="TOC34" href="#SEC34">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
<li><a name="TOC35" href="#SEC35">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
<li><a name="TOC36" href="#SEC36">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
<li><a name="TOC37" href="#SEC37">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
<li><a name="TOC38" href="#SEC38">DUPLICATE CAPTURE GROUP NAMES</a>
<li><a name="TOC39" href="#SEC39">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
<li><a name="TOC40" href="#SEC40">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
<li><a name="TOC41" href="#SEC41">SEE ALSO</a>
<li><a name="TOC42" href="#SEC42">AUTHOR</a>
<li><a name="TOC43" href="#SEC43">REVISION</a>
</ul>
<P>
<b>#include &lt;pcre2.h&gt;</b>
<br>
<br>
PCRE2 is a new API for PCRE, starting at release 10.0. This document contains a
description of all its native functions. See the
<a href="pcre2.html"><b>pcre2</b></a>
document for an overview of all the PCRE2 documentation.
</P>
<br><a name="SEC1" href="#TOC1">PCRE2 NATIVE API BASIC FUNCTIONS</a><br>
<P>
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset</i>,</b>
<b> pcre2_compile_context *<i>ccontext</i>);</b>
<br>
<br>
<b>void pcre2_code_free(pcre2_code *<i>code</i>);</b>
<br>
<br>
<b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_match_data *pcre2_match_data_create_from_pattern(</b>
<b> const pcre2_code *<i>code</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
<b> uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>);</b>
<br>
<br>
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
<b> uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>,</b>
<b> int
*<i>workspace</i>, PCRE2_SIZE <i>wscount</i>);</b>
<br>
<br>
<b>void pcre2_match_data_free(pcre2_match_data *<i>match_data</i>);</b>
</P>
<br><a name="SEC2" href="#TOC1">PCRE2 NATIVE API AUXILIARY MATCH FUNCTIONS</a><br>
<P>
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
<br>
<br>
<b>PCRE2_SIZE pcre2_get_match_data_size(pcre2_match_data *<i>match_data</i>);</b>
<br>
<br>
<b>PCRE2_SIZE pcre2_get_match_data_heapframes_size(</b>
<b> pcre2_match_data *<i>match_data</i>);</b>
<br>
<br>
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
<br>
<br>
<b>PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *<i>match_data</i>);</b>
<br>
<br>
<b>PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *<i>match_data</i>);</b>
</P>
<br><a name="SEC3" href="#TOC1">PCRE2 NATIVE API GENERAL CONTEXT FUNCTIONS</a><br>
<P>
<b>pcre2_general_context *pcre2_general_context_create(</b>
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
<br>
<br>
<b>pcre2_general_context *pcre2_general_context_copy(</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_general_context_free(pcre2_general_context *<i>gcontext</i>);</b>
</P>
<br><a name="SEC4" href="#TOC1">PCRE2 NATIVE API COMPILE CONTEXT FUNCTIONS</a><br>
<P>
<b>pcre2_compile_context *pcre2_compile_context_create(</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_compile_context *pcre2_compile_context_copy(</b>
<b> pcre2_compile_context *<i>ccontext</i>);</b>
<br>
<br>
<b>void pcre2_compile_context_free(pcre2_compile_context *<i>ccontext</i>);</b>
<br>
<br>
<b>int pcre2_set_bsr(pcre2_compile_context *<i>ccontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_character_tables(pcre2_compile_context *<i>ccontext</i>,</b>
<b> const uint8_t *<i>tables</i>);</b>
<br>
<br>
<b>int pcre2_set_compile_extra_options(pcre2_compile_context *<i>ccontext</i>,</b>
<b> uint32_t <i>extra_options</i>);</b>
<br>
<br>
<b>int pcre2_set_max_pattern_length(pcre2_compile_context *<i>ccontext</i>,</b>
<b> PCRE2_SIZE <i>value</i>);</b>
<br>
<br>
<b>int
pcre2_set_max_pattern_compiled_length(</b>
<b> pcre2_compile_context *<i>ccontext</i>, PCRE2_SIZE <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_max_varlookbehind(pcre2_compile_context *<i>ccontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_parens_nest_limit(pcre2_compile_context *<i>ccontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_compile_recursion_guard(pcre2_compile_context *<i>ccontext</i>,</b>
<b> int (*<i>guard_function</i>)(uint32_t, void *), void *<i>user_data</i>);</b>
</P>
<br><a name="SEC5" href="#TOC1">PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS</a><br>
<P>
<b>pcre2_match_context *pcre2_match_context_create(</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_match_context *pcre2_match_context_copy(</b>
<b> pcre2_match_context *<i>mcontext</i>);</b>
<br>
<br>
<b>void pcre2_match_context_free(pcre2_match_context *<i>mcontext</i>);</b>
<br>
<br>
<b>int pcre2_set_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> int (*<i>callout_function</i>)(pcre2_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
<br>
<br>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
<br>
<br>
<b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> PCRE2_SIZE <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_heap_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
</P>
<br><a name="SEC6" href="#TOC1">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a><br>
<P>
<b>int pcre2_substring_copy_byname(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_UCHAR *<i>buffer</i>, PCRE2_SIZE *<i>bufflen</i>);</b>
<br>
<br>
<b>int pcre2_substring_copy_bynumber(pcre2_match_data *<i>match_data</i>,</b>
<b> uint32_t <i>number</i>, PCRE2_UCHAR
*<i>buffer</i>,</b>
<b> PCRE2_SIZE *<i>bufflen</i>);</b>
<br>
<br>
<b>void pcre2_substring_free(PCRE2_UCHAR *<i>buffer</i>);</b>
<br>
<br>
<b>int pcre2_substring_get_byname(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_UCHAR **<i>bufferptr</i>, PCRE2_SIZE *<i>bufflen</i>);</b>
<br>
<br>
<b>int pcre2_substring_get_bynumber(pcre2_match_data *<i>match_data</i>,</b>
<b> uint32_t <i>number</i>, PCRE2_UCHAR **<i>bufferptr</i>,</b>
<b> PCRE2_SIZE *<i>bufflen</i>);</b>
<br>
<br>
<b>int pcre2_substring_length_byname(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_SIZE *<i>length</i>);</b>
<br>
<br>
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
<br>
<br>
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
<br>
<br>
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>);</b>
<br>
<br>
<b>void pcre2_substring_list_free(PCRE2_UCHAR
**<i>list</i>);</b>
<br>
<br>
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
</P>
<br><a name="SEC7" href="#TOC1">PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION</a><br>
<P>
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
<b> uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>, PCRE2_SPTR <i>replacement</i>,</b>
<b> PCRE2_SIZE <i>rlength</i>, PCRE2_UCHAR *<i>outputbuffer</i>,</b>
<b> PCRE2_SIZE *<i>outlengthptr</i>);</b>
</P>
<br><a name="SEC8" href="#TOC1">PCRE2 NATIVE API JIT FUNCTIONS</a><br>
<P>
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
<br>
<br>
<b>int pcre2_jit_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
<b> uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>);</b>
<br>
<br>
<b>void pcre2_jit_free_unused_memory(pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_jit_stack *pcre2_jit_stack_create(size_t <i>startsize</i>,</b>
<b> size_t <i>maxsize</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_jit_stack_assign(pcre2_match_context *<i>mcontext</i>,</b>
<b> pcre2_jit_callback <i>callback_function</i>, void *<i>callback_data</i>);</b>
<br>
<br>
<b>void pcre2_jit_stack_free(pcre2_jit_stack *<i>jit_stack</i>);</b>
</P>
<br><a name="SEC9" href="#TOC1">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a><br>
<P>
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_encode(const pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
</P>
<br><a name="SEC10" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
<P>
<b>pcre2_code *pcre2_code_copy(const pcre2_code
*<i>code</i>);</b>
<br>
<br>
<b>pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *<i>code</i>);</b>
<br>
<br>
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
<b> PCRE2_SIZE <i>bufflen</i>);</b>
<br>
<br>
<b>const uint8_t *pcre2_maketables(pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_maketables_free(pcre2_general_context *<i>gcontext</i>,</b>
<b> const uint8_t *<i>tables</i>);</b>
<br>
<br>
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>,</b>
<b> void *<i>where</i>);</b>
<br>
<br>
<b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
<b> int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
<b> void *<i>user_data</i>);</b>
<br>
<br>
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
</P>
<br><a name="SEC11" href="#TOC1">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a><br>
<P>
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int
pcre2_set_recursion_memory_management(</b>
<b> pcre2_match_context *<i>mcontext</i>,</b>
<b> void *(*<i>private_malloc</i>)(size_t, void *),</b>
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
<br>
<br>
These functions became obsolete at release 10.30 and are retained only for
backward compatibility. They should not be used in new code. The first is
replaced by <b>pcre2_set_depth_limit()</b>; the second is no longer needed and
has no effect (it always returns zero).
</P>
<br><a name="SEC12" href="#TOC1">PCRE2 EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a><br>
<P>
<b>pcre2_convert_context *pcre2_convert_context_create(</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_convert_context *pcre2_convert_context_copy(</b>
<b> pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_convert_context_free(pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_escape(pcre2_convert_context *<i>cvcontext</i>,</b>
<b> uint32_t <i>escape_char</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_separator(pcre2_convert_context *<i>cvcontext</i>,</b>
<b> uint32_t <i>separator_char</i>);</b>
<br>
<br>
<b>int pcre2_pattern_convert(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b> uint32_t <i>options</i>, PCRE2_UCHAR **<i>buffer</i>,</b>
<b> PCRE2_SIZE *<i>blength</i>, pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_converted_pattern_free(PCRE2_UCHAR *<i>converted_pattern</i>);</b>
<br>
<br>
These functions provide a way of converting non-PCRE2 patterns into
patterns that can be processed by <b>pcre2_compile()</b>. This facility is
experimental and may be changed in future releases. At present, "globs" and
POSIX basic and extended patterns can be converted. Details are given in the
<a href="pcre2convert.html"><b>pcre2convert</b></a>
documentation.
</P>
<br><a name="SEC13" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
<P>
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
units, respectively. However, there is just one header file, <b>pcre2.h</b>.
This contains the function prototypes and other definitions for all three
libraries. One, two, or all three can be installed simultaneously. On Unix-like
systems the libraries are called <b>libpcre2-8</b>, <b>libpcre2-16</b>, and
<b>libpcre2-32</b>, and they can also co-exist with the original PCRE libraries.
Every PCRE2 function comes in three different forms, one for each library, for
example:
<pre>
  <b>pcre2_compile_8()</b>
  <b>pcre2_compile_16()</b>
  <b>pcre2_compile_32()</b>
</pre>
There are also three different sets of data types:
<pre>
  <b>PCRE2_UCHAR8, PCRE2_UCHAR16, PCRE2_UCHAR32</b>
  <b>PCRE2_SPTR8, PCRE2_SPTR16, PCRE2_SPTR32</b>
</pre>
The UCHAR types define unsigned code units of the appropriate widths.
For example, PCRE2_UCHAR16 is usually defined as "uint16_t".
The SPTR types are pointers to constants of the equivalent UCHAR types,
that is, they are pointers to vectors of unsigned code units.
</P>
<P>
Character strings are passed to a PCRE2 library as sequences of unsigned
integers in code units of the appropriate width. The length of a string may
be given as a number of code units, or the string may be specified as
zero-terminated.
</P>
<P>
Many applications use only one code unit width. For their convenience, macros
are defined whose names are the generic forms such as <b>pcre2_compile()</b> and
PCRE2_SPTR. These macros use the value of the macro PCRE2_CODE_UNIT_WIDTH to
generate the appropriate width-specific function and macro names.
PCRE2_CODE_UNIT_WIDTH is not defined by default.
An application must define it
to be 8, 16, or 32 before including <b>pcre2.h</b> in order to make use of the
generic names.
</P>
<P>
Applications that use more than one code unit width can be linked with more
than one PCRE2 library, but must define PCRE2_CODE_UNIT_WIDTH to be 0 before
including <b>pcre2.h</b>, and then use the real function names. Any code that is
to be included in an environment where the value of PCRE2_CODE_UNIT_WIDTH is
unknown should also use the real function names. (Unfortunately, it is not
possible in C code to save and restore the value of a macro.)
</P>
<P>
If PCRE2_CODE_UNIT_WIDTH is not defined before including <b>pcre2.h</b>, a
compiler error occurs.
</P>
<P>
When using multiple libraries in an application, you must take care when
processing any particular pattern to use only functions from a single library.
For example, if you want to run a match using a pattern that was compiled with
<b>pcre2_compile_16()</b>, you must do so with <b>pcre2_match_16()</b>, not
<b>pcre2_match_8()</b> or <b>pcre2_match_32()</b>.
</P>
<P>
In the function summaries above, and in the rest of this document and other
PCRE2 documents, functions and data types are described using their generic
names, without the _8, _16, or _32 suffix.
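</P>
<P>
As a rough illustration of how a width macro can select a suffixed name, the
following sketch uses token pasting in plain C. It is not taken from
<b>pcre2.h</b>: DEMO_CODE_UNIT_WIDTH, the <b>demo_compile</b> names, and the
helper macros are hypothetical stand-ins, and the real header may be organised
differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the three width-specific library functions. */
const char *demo_compile_8(void)  { return "8-bit"; }
const char *demo_compile_16(void) { return "16-bit"; }
const char *demo_compile_32(void) { return "32-bit"; }

/* The application picks the width before the generic names are set up. */
#define DEMO_CODE_UNIT_WIDTH 16

/* Mirror the rule that an undefined width is a compile-time error. */
#ifndef DEMO_CODE_UNIT_WIDTH
#error "DEMO_CODE_UNIT_WIDTH must be defined as 8, 16, or 32"
#endif

/* Two macro levels are needed so that DEMO_CODE_UNIT_WIDTH is expanded to
   its value before the ## operator pastes the tokens together. */
#define DEMO_JOIN(a, b) a ## b
#define DEMO_GLUE(a, b) DEMO_JOIN(a, b)
#define DEMO_SUFFIX(name) DEMO_GLUE(name, DEMO_CODE_UNIT_WIDTH)

/* The generic name; with the width above it expands to demo_compile_16. */
#define demo_compile DEMO_SUFFIX(demo_compile_)

/* Returns 1 if the generic name resolved to the function labelled
   `expected`, otherwise 0. */
int generic_name_selects(const char *expected)
{
    assert(expected != NULL);
    return strcmp(demo_compile(), expected) == 0;
}
```

Changing DEMO_CODE_UNIT_WIDTH to 8 or 32 makes the same generic name resolve to
a different function, which is why code written against the generic names must
be compiled with a single, known width.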
</P>
<br><a name="SEC14" href="#TOC1">PCRE2 API OVERVIEW</a><br>
<P>
PCRE2 has its own native API, which is described in this document. There are
also some wrapper functions for the 8-bit library that correspond to the
POSIX regular expression API, but they do not give access to all the
functionality of PCRE2 and they are not thread-safe. They are described in the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation. Both these APIs define a set of C function calls.
</P>
<P>
The native API C data types, function prototypes, option values, and error
codes are defined in the header file <b>pcre2.h</b>, which also contains
definitions of PCRE2_MAJOR and PCRE2_MINOR, the major and minor release numbers
for the library. Applications can use these to include support for different
releases of PCRE2.
</P>
<P>
In a Windows environment, if you want to statically link an application program
against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
<b>pcre2.h</b>.
</P>
<P>
The functions <b>pcre2_compile()</b> and <b>pcre2_match()</b> are used for
compiling and matching regular expressions in a Perl-compatible manner.
A sample program that demonstrates the simplest way of using them is provided in
the file called <i>pcre2demo.c</i> in the PCRE2 source distribution. A listing
of this program is given in the
<a href="pcre2demo.html"><b>pcre2demo</b></a>
documentation, and the
<a href="pcre2sample.html"><b>pcre2sample</b></a>
documentation describes how to compile and run it.
</P>
<P>
The compiling and matching functions recognize various options that are passed
as bits in an options argument. There are also some more complicated parameters
such as custom memory management functions and resource limits that are passed
in "contexts" (which are just memory blocks, described below). Simple
applications do not need to make use of contexts.
</P>
<P>
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
built in appropriate hardware environments. It greatly speeds up the matching
performance of many patterns. Programs can request that it be used if
available by calling <b>pcre2_jit_compile()</b> after a pattern has been
successfully compiled by <b>pcre2_compile()</b>. This does nothing if JIT
support is not available.
</P>
<P>
More complicated programs might need to make use of the specialist functions
<b>pcre2_jit_stack_create()</b>, <b>pcre2_jit_stack_free()</b>, and
<b>pcre2_jit_stack_assign()</b> in order to control the JIT code's memory usage.
</P>
<P>
JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
matching, which gives improved performance at the expense of less sanity
checking. The JIT-specific functions are discussed in the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation.
</P>
<P>
A second matching function, <b>pcre2_dfa_match()</b>, which is not
Perl-compatible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a given
point in the subject), and scans the subject just once (unless there are
lookaround assertions). However, this algorithm does not return captured
substrings. A description of the two matching algorithms and their advantages
and disadvantages is given in the
<a href="pcre2matching.html"><b>pcre2matching</b></a>
documentation. There is no JIT support for <b>pcre2_dfa_match()</b>.
</P>
<P>
In addition to the main compiling and matching functions, there are convenience
functions for extracting captured substrings from a subject string that has
been matched by <b>pcre2_match()</b>. They are:
<pre>
  <b>pcre2_substring_copy_byname()</b>
  <b>pcre2_substring_copy_bynumber()</b>
  <b>pcre2_substring_get_byname()</b>
  <b>pcre2_substring_get_bynumber()</b>
  <b>pcre2_substring_list_get()</b>
  <b>pcre2_substring_length_byname()</b>
  <b>pcre2_substring_length_bynumber()</b>
  <b>pcre2_substring_nametable_scan()</b>
  <b>pcre2_substring_number_from_name()</b>
</pre>
<b>pcre2_substring_free()</b> and <b>pcre2_substring_list_free()</b> are also
provided, to free memory used for extracted strings. If either of these
functions is called with a NULL argument, the function returns immediately
without doing anything.
</P>
<P>
The function <b>pcre2_substitute()</b> can be called to match a pattern and
return a copy of the subject string with substitutions for parts that were
matched.
</P>
<P>
Functions whose names begin with <b>pcre2_serialize_</b> are used for saving
compiled patterns on disc or elsewhere, and reloading them later.
</P>
<P>
Finally, there are functions for finding out information about a compiled
pattern (<b>pcre2_pattern_info()</b>) and about the configuration with which
PCRE2 was built (<b>pcre2_config()</b>).
</P>
<P>
Functions with names ending with <b>_free()</b> are used for freeing memory
blocks of various sorts. In all cases, if one of these functions is called with
a NULL argument, it does nothing.
</P>
<br><a name="SEC15" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
<P>
The PCRE2 API uses string lengths and offsets into strings of code units in
several places. These values are always of type PCRE2_SIZE, which is an
unsigned integer type, currently always defined as <i>size_t</i>. The largest
value that can be stored in such a type (that is ~(PCRE2_SIZE)0) is reserved
as a special indicator for zero-terminated strings and unset offsets.
Therefore, the longest string that can be handled is one less than this
maximum. Note that string lengths are always given in code units. Only in the
8-bit library is such a length the same as the number of bytes in the string.
<a name="newlines"></a></P>
<br><a name="SEC16" href="#TOC1">NEWLINES</a><br>
<P>
PCRE2 supports five different conventions for indicating line breaks in
strings: a single CR (carriage return) character, a single LF (linefeed)
character, the two-character sequence CRLF, any of the three preceding, or any
Unicode newline sequence. The Unicode newline sequences are the three just
mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed,
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
(paragraph separator, U+2029).
</P>
<P>
Each of the first three conventions is used by at least one operating system as
its standard newline sequence. When PCRE2 is built, a default can be specified.
If it is not, the default is set to LF, which is the Unix standard. However,
the newline convention can be changed by an application when calling
<b>pcre2_compile()</b>, or it can be specified by special text at the start of
the pattern itself; this overrides any other settings. See the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page for details of the special character sequences.
</P>
<P>
In the PCRE2 documentation the word "newline" is used to mean "the character or
pair of characters that indicate a line break".
The choice of newline
convention affects the handling of the dot, circumflex, and dollar
metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
recognized line ending sequence, the match position advancement for a
non-anchored pattern. There is more detail about this in the
<a href="#matchoptions">section on <b>pcre2_match()</b> options</a>
below.
</P>
<P>
The choice of newline convention does not affect the interpretation of
the \n or \r escape sequences, nor does it affect what \R matches; this has
its own separate convention.
</P>
<br><a name="SEC17" href="#TOC1">MULTITHREADING</a><br>
<P>
In a multithreaded application it is important to keep thread-specific data
separate from data that can be shared between threads. The PCRE2 library code
itself is thread-safe: it contains no static or global variables. The API is
designed to be fairly simple for non-threaded applications while at the same
time ensuring that multithreaded applications can use it.
</P>
<P>
There are several different blocks of data that are used to pass information
between the application and the PCRE2 libraries.
</P>
<br><b>
The compiled pattern
</b><br>
<P>
A pointer to the compiled form of a pattern is returned to the user when
<b>pcre2_compile()</b> is successful. The data in the compiled pattern is fixed,
and does not change when the pattern is matched. Therefore, it is thread-safe,
that is, the same compiled pattern can be used by more than one thread
simultaneously. For example, an application can compile all its patterns at the
start, before forking off multiple threads that use them. However, if the
just-in-time (JIT) optimization feature is being used, it needs separate memory
stack areas for each thread. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details.
</P>
<P>
In a more complicated situation, where patterns are compiled only when they are
first needed, but are still shared between threads, pointers to compiled
patterns must be protected from simultaneous writing by multiple threads. This
is somewhat tricky to do correctly. If you know that writing to a pointer is
atomic in your environment, you can use logic like this:
<pre>
  Get a read-only (shared) lock (mutex) for pointer
  if (pointer == NULL)
    {
    Get a write (unique) lock for pointer
    if (pointer == NULL) pointer = pcre2_compile(...
    }
  Release the lock
  Use pointer in pcre2_match()
</pre>
Of course, testing for compilation errors should also be included in the code.
</P>
<P>
The reason for checking the pointer a second time is as follows: Several
threads may have acquired the shared lock and tested the pointer for being
NULL, but only one of them will be given the write lock, with the rest kept
waiting. The winning thread will compile the pattern and store the result.
After this thread releases the write lock, another thread will get it, and if
it does not retest pointer for being NULL, will recompile the pattern and
overwrite the pointer, creating a memory leak and possibly causing other
issues.
</P>
<P>
In an environment where writing to a pointer may not be atomic, the above logic
is not sufficient. The thread that is doing the compiling may be descheduled
after writing only part of the pointer, which could cause other threads to use
an invalid value.
Instead of checking the pointer itself, a separate "pointer
is valid" flag (that can be updated atomically) must be used:
<pre>
  Get a read-only (shared) lock (mutex) for pointer
  if (!pointer_is_valid)
    {
    Get a write (unique) lock for pointer
    if (!pointer_is_valid)
      {
      pointer = pcre2_compile(...
      pointer_is_valid = TRUE
      }
    }
  Release the lock
  Use pointer in pcre2_match()
</pre>
If JIT is being used, but the JIT compilation is not being done immediately
(perhaps waiting to see if the pattern is used often enough), similar logic is
required. JIT compilation updates a value within the compiled code block, so a
thread must gain unique write access to the pointer before calling
<b>pcre2_jit_compile()</b>. Alternatively, <b>pcre2_code_copy()</b> or
<b>pcre2_code_copy_with_tables()</b> can be used to obtain a private copy of the
compiled code before calling the JIT compiler.
</P>
<br><b>
Context blocks
</b><br>
<P>
The next main section below introduces the idea of "contexts" in which PCRE2
functions are called. A context is nothing more than a collection of parameters
that control the way PCRE2 operates.
Grouping a number of parameters together
in a context is a convenient way of passing them to a PCRE2 function without
using lots of arguments. The parameters that are stored in contexts are in some
sense "advanced features" of the API. Many straightforward applications will
not need to use contexts.
</P>
<P>
In a multithreaded application, if the parameters in a context are values that
are never changed, the same context can be used by all the threads. However, if
any thread needs to change any value in a context, it must make its own
thread-specific copy.
</P>
<br><b>
Match blocks
</b><br>
<P>
The matching functions need a block of memory for storing the results of a
match. This includes details of what was matched, as well as additional
information such as the name of a (*MARK) setting. Each thread must provide its
own copy of this memory.
</P>
<br><a name="SEC18" href="#TOC1">PCRE2 CONTEXTS</a><br>
<P>
Some PCRE2 functions have a lot of parameters, many of which are used only by
specialist applications, for example, those that use custom memory management
or non-standard character tables.
To keep function argument lists at a
reasonable size, and at the same time to keep the API extensible, "uncommon"
parameters are passed to certain functions in a <b>context</b> instead of
directly. A context is just a block of memory that holds the parameter values.
Applications that do not need to adjust any of the context parameters can pass
NULL when a context pointer is required.
</P>
<P>
There are three different types of context: a general context that is relevant
for several PCRE2 operations, a compile-time context, and a match-time context.
</P>
<br><b>
The general context
</b><br>
<P>
At present, this context just contains pointers to (and data for) external
memory management functions that are called from several places in the PCRE2
library. The context is named "general" rather than specifically "memory"
because in future other fields may be added. If you do not want to supply your
own custom memory management functions, you do not need to bother with a
general context.
A general context is created by:
<br>
<br>
<b>pcre2_general_context *pcre2_general_context_create(</b>
<b>  void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
<b>  void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
<br>
<br>
The two function pointers specify custom memory management functions, whose
prototypes are:
<pre>
  <b>void *private_malloc(PCRE2_SIZE, void *);</b>
  <b>void  private_free(void *, void *);</b>
</pre>
Whenever code in PCRE2 calls these functions, the final argument is the value
of <i>memory_data</i>. Either of the first two arguments of the creation
function may be NULL, in which case the system memory management functions
<i>malloc()</i> and <i>free()</i> are used. (This is not currently useful, as
there are no other fields in a general context, but in future there might be.)
The <i>private_malloc()</i> function is used (if supplied) to obtain memory for
storing the context, and all three values are saved as part of the context.
</P>
<P>
Whenever PCRE2 creates a data block of any kind, the block contains a pointer
to the <i>free()</i> function that matches the <i>malloc()</i> function that was
used. When the time comes to free the block, this function is called.
</P>
<P>
A general context can be copied by calling:
<br>
<br>
<b>pcre2_general_context *pcre2_general_context_copy(</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
The memory used for a general context should be freed by calling:
<br>
<br>
<b>void pcre2_general_context_free(pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
If this function is passed a NULL argument, it returns immediately without
doing anything.
<a name="compilecontext"></a></P>
<br><b>
The compile context
</b><br>
<P>
A compile context is required if you want to provide an external function for
stack checking during compilation or to change the default values of any of the
following compile-time parameters:
<pre>
  What \R matches (Unicode newlines or CR, LF, CRLF only)
  PCRE2's character tables
  The newline character sequence
  The compile time nested parentheses limit
  The maximum length of the pattern string
  The extra options bits (none set by default)
</pre>
A compile context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of
<i>pcre2_compile()</i>.
</P>
<P>
A compile context is created, copied, and freed by the following functions:
<br>
<br>
<b>pcre2_compile_context *pcre2_compile_context_create(</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_compile_context *pcre2_compile_context_copy(</b>
<b>  pcre2_compile_context *<i>ccontext</i>);</b>
<br>
<br>
<b>void pcre2_compile_context_free(pcre2_compile_context *<i>ccontext</i>);</b>
<br>
<br>
A compile context is created with default values for its parameters. These can
be changed by calling the following functions, which return 0 on success, or
PCRE2_ERROR_BADDATA if invalid data is detected.
<br>
<br>
<b>int pcre2_set_bsr(pcre2_compile_context *<i>ccontext</i>,</b>
<b>  uint32_t <i>value</i>);</b>
<br>
<br>
The value must be PCRE2_BSR_ANYCRLF, to specify that \R matches only CR, LF,
or CRLF, or PCRE2_BSR_UNICODE, to specify that \R matches any Unicode line
ending sequence.
The value is used by the JIT compiler and by the two
interpreted matching functions, <i>pcre2_match()</i> and
<i>pcre2_dfa_match()</i>.
<br>
<br>
<b>int pcre2_set_character_tables(pcre2_compile_context *<i>ccontext</i>,</b>
<b>  const uint8_t *<i>tables</i>);</b>
<br>
<br>
The value must be the result of a call to <b>pcre2_maketables()</b>, whose only
argument is a general context. This function builds a set of character tables
in the current locale.
<br>
<br>
<b>int pcre2_set_compile_extra_options(pcre2_compile_context *<i>ccontext</i>,</b>
<b>  uint32_t <i>extra_options</i>);</b>
<br>
<br>
As PCRE2 has developed, almost all the 32 option bits that are available in
the <i>options</i> argument of <b>pcre2_compile()</b> have been used up. To avoid
running out, the compile context contains a set of extra option bits which are
used for some newer, assumed rarer, options. This function sets those bits. It
always sets all the bits (either on or off); the new value replaces any
existing setting rather than being combined with it.
The available options are defined in the section entitled "Extra
compile options"
<a href="#extracompileoptions">below.</a>
<br>
<br>
<b>int pcre2_set_max_pattern_length(pcre2_compile_context *<i>ccontext</i>,</b>
<b>  PCRE2_SIZE <i>value</i>);</b>
<br>
<br>
This sets a maximum length, in code units, for any pattern string that is
compiled with this context. If the pattern is longer, an error is generated.
This facility is provided so that applications that accept patterns from
external sources can limit their size. The default is the largest number that a
PCRE2_SIZE variable can hold, which is effectively unlimited.
<br>
<br>
<b>int pcre2_set_max_pattern_compiled_length(</b>
<b>  pcre2_compile_context *<i>ccontext</i>, PCRE2_SIZE <i>value</i>);</b>
<br>
<br>
This sets a maximum size, in bytes, for the memory needed to hold the compiled
version of a pattern that is compiled with this context. If the pattern needs
more memory, an error is generated. This facility is provided so that
applications that accept patterns from external sources can limit the amount of
memory they use. The default is the largest number that a PCRE2_SIZE variable
can hold, which is effectively unlimited.
<br>
<br>
<b>int pcre2_set_max_varlookbehind(pcre2_compile_context *<i>ccontext</i>,</b>
<b>  uint32_t <i>value</i>);</b>
<br>
<br>
This sets a maximum length for the number of characters matched by a
variable-length lookbehind assertion. The default is set when PCRE2 is built,
with the ultimate default being 255, the same as Perl. Lookbehind assertions
without a bounding length are not supported.
<br>
<br>
<b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
<b>  uint32_t <i>value</i>);</b>
<br>
<br>
This specifies which characters or character sequences are to be recognized as
newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only),
PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above),
PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the
NUL character, that is a binary zero).
</P>
<P>
A pattern can override the value set in the compile context by starting with a
sequence such as (*CRLF). See the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page for details.
</P>
<P>
When a pattern is compiled with the PCRE2_EXTENDED or PCRE2_EXTENDED_MORE
option, the newline convention affects the recognition of the end of internal
comments starting with #. The value is saved with the compiled pattern for
subsequent use by the JIT compiler and by the two interpreted matching
functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>.
<br>
<br>
<b>int pcre2_set_parens_nest_limit(pcre2_compile_context *<i>ccontext</i>,</b>
<b>  uint32_t <i>value</i>);</b>
<br>
<br>
This parameter adjusts the limit, set when PCRE2 is built (default 250), on the
depth of parenthesis nesting in a pattern. This limit stops rogue patterns
using up too much system stack when being compiled. The limit applies to
parentheses of all kinds, not just capturing parentheses.
<br>
<br>
<b>int pcre2_set_compile_recursion_guard(pcre2_compile_context *<i>ccontext</i>,</b>
<b>  int (*<i>guard_function</i>)(uint32_t, void *), void *<i>user_data</i>);</b>
<br>
<br>
There is at least one application that runs PCRE2 in threads with very limited
system stack, where running out of stack is to be avoided at all costs. The
parenthesis limit above cannot take account of how much stack is actually
available during compilation.
For finer control, you can supply a function
that is called whenever <b>pcre2_compile()</b> starts to compile a parenthesized
part of a pattern. This function can check the actual stack size (or anything
else that it wants to, of course).
</P>
<P>
The first argument to the guard function gives the current depth of
nesting, and the second is user data that is set up by the last argument of
<b>pcre2_set_compile_recursion_guard()</b>. The guard function should return
zero if all is well, or non-zero to force an error.
<a name="matchcontext"></a></P>
<br><b>
The match context
</b><br>
<P>
A match context is required if you want to:
<pre>
  Set up a callout function
  Set an offset limit for matching an unanchored pattern
  Change the limit on the amount of heap used when matching
  Change the backtracking match limit
  Change the backtracking depth limit
  Set custom memory management specifically for the match
</pre>
If none of these apply, just pass NULL as the context argument of
<b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>.
</P>
<P>
A match context is created, copied, and freed by the following functions:
<br>
<br>
<b>pcre2_match_context *pcre2_match_context_create(</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_match_context *pcre2_match_context_copy(</b>
<b>  pcre2_match_context *<i>mcontext</i>);</b>
<br>
<br>
<b>void pcre2_match_context_free(pcre2_match_context *<i>mcontext</i>);</b>
<br>
<br>
A match context is created with default values for its parameters. These can
be changed by calling the following functions, which return 0 on success, or
PCRE2_ERROR_BADDATA if invalid data is detected.
<br>
<br>
<b>int pcre2_set_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b>  int (*<i>callout_function</i>)(pcre2_callout_block *, void *),</b>
<b>  void *<i>callout_data</i>);</b>
<br>
<br>
This sets up a callout function for PCRE2 to call at specified points
during a matching operation. Details are given in the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
<br>
<br>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b>  int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b>  void *<i>callout_data</i>);</b>
<br>
<br>
This sets up a callout function for PCRE2 to call after each substitution
made by <b>pcre2_substitute()</b>. Details are given in the section entitled
"Creating a new string with substitutions"
<a href="#substitutions">below.</a>
<br>
<br>
<b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b>  PCRE2_SIZE <i>value</i>);</b>
<br>
<br>
The <i>offset_limit</i> parameter limits how far an unanchored search can
advance in the subject string. The default value is PCRE2_UNSET. The
<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> functions return
PCRE2_ERROR_NOMATCH if a match with a starting point before or at the given
offset is not found. The <b>pcre2_substitute()</b> function makes no more
substitutions.
</P>
<P>
For example, if the pattern /abc/ is matched against "123abc" with an offset
limit less than 3, the result is PCRE2_ERROR_NOMATCH.
A match can never be
found if the <i>startoffset</i> argument of <b>pcre2_match()</b>,
<b>pcre2_dfa_match()</b>, or <b>pcre2_substitute()</b> is greater than the offset
limit set in the match context.
</P>
<P>
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
calling <b>pcre2_compile()</b> so that when JIT is in use, different code can be
compiled. If a match is started with a non-default offset limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error (PCRE2_ERROR_BADOFFSETLIMIT) is
generated.
</P>
<P>
The offset limit facility can be used to track progress when searching large
subject strings or to limit the extent of global substitutions. See also the
PCRE2_FIRSTLINE option, which requires a match to start before or at the first
newline that follows the start of matching in the subject. If this is set with
an offset limit, a match must occur in the first line and also within the
offset limit. In other words, whichever limit comes first is used.
<br>
<br>
<b>int pcre2_set_heap_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b>  uint32_t <i>value</i>);</b>
<br>
<br>
The <i>heap_limit</i> parameter specifies, in units of kibibytes (1024 bytes),
the maximum amount of heap memory that <b>pcre2_match()</b> may use to hold
backtracking information when running an interpretive match. This limit also
applies to <b>pcre2_dfa_match()</b>, which may use the heap when processing
patterns with a lot of nested pattern recursion or lookarounds or atomic
groups. This limit does not apply to matching with the JIT optimization, which
has its own memory control arrangements (see the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details). If the limit is reached, the negative error
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
is built; if it is not, the default is set very large and is essentially
unlimited.
</P>
<P>
A value for the heap limit may also be supplied by an item at the start of a
pattern of the form
<pre>
  (*LIMIT_HEAP=ddd)
</pre>
where ddd is a decimal number.
However, such a setting is ignored unless ddd is
less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
limit is set, less than the default.
</P>
<P>
The <b>pcre2_match()</b> function always needs some heap memory, so setting a
value of zero guarantees a "heap limit exceeded" error. Details of how
<b>pcre2_match()</b> uses the heap are given in the
<a href="pcre2perform.html"><b>pcre2perform</b></a>
documentation.
</P>
<P>
For <b>pcre2_dfa_match()</b>, a vector on the system stack is used when
processing pattern recursions, lookarounds, or atomic groups, and only if this
is not big enough is heap memory used. In this case, setting a value of zero
disables the use of the heap.
<br>
<br>
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b>  uint32_t <i>value</i>);</b>
<br>
<br>
The <i>match_limit</i> parameter provides a means of preventing PCRE2 from using
up too many computing resources when processing patterns that are not going to
match, but which have a very large number of possibilities in their search
trees. The classic example is a pattern that uses nested unlimited repeats.
</P>
<P>
There is an internal counter in <b>pcre2_match()</b> that is incremented each
time round its main matching loop. If this value reaches the match limit,
<b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
the effect of limiting the amount of backtracking that can take place. For
patterns that are not anchored, the count restarts from zero for each position
in the subject string. This limit also applies to <b>pcre2_dfa_match()</b>,
though the counting is done in a different way.
</P>
<P>
When <b>pcre2_match()</b> is called with a pattern that was successfully
processed by <b>pcre2_jit_compile()</b>, the way in which matching is executed
is entirely different. However, there is still the possibility of runaway
matching that goes on for a very long time, and so the <i>match_limit</i> value
is also used in this case (but in a different way) to limit how long the
matching can continue.
</P>
<P>
The default value for the limit can be set when PCRE2 is built; the default is
10 million, which handles all but the most extreme cases. A value for the match
limit may also be supplied by an item at the start of a pattern of the form
<pre>
  (*LIMIT_MATCH=ddd)
</pre>
where ddd is a decimal number.
However, such a setting is ignored unless ddd is
less than the limit set by the caller of <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
<br>
<br>
<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b>  uint32_t <i>value</i>);</b>
<br>
<br>
This parameter limits the depth of nested backtracking in <b>pcre2_match()</b>.
Each time a nested backtracking point is passed, a new memory frame is used
to remember the state of matching at that point. Thus, this parameter
indirectly limits the amount of memory that is used in a match. However,
because the size of each memory frame depends on the number of capturing
parentheses, the actual memory limit varies from pattern to pattern. This limit
was more useful in versions before 10.30, where function recursion was used for
backtracking.
</P>
<P>
The depth limit is not relevant, and is ignored, when matching is done using
JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
uses it to limit the depth of nested internal recursive function calls that
implement atomic groups, lookaround assertions, and pattern recursions. This
limits, indirectly, the amount of system stack that is used.
It was more useful
in versions before 10.32, when stack memory was used for local workspace
vectors for recursive function calls. From version 10.32, only local variables
are allocated on the stack and, as each call uses only a few hundred bytes, even
a small stack can support quite a lot of recursion.
</P>
<P>
If the depth of internal recursive function calls is great enough, local
workspace vectors are allocated on the heap from version 10.32 onwards, so the
depth limit also indirectly limits the amount of heap memory that is used. A
recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
using <b>pcre2_dfa_match()</b>, can use a great deal of memory. However, it is
probably better to limit heap usage directly by calling
<b>pcre2_set_heap_limit()</b>.
</P>
<P>
The default value for the depth limit can be set when PCRE2 is built; if it is
not, the default is set to the same value as the default for the match limit.
If the limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
supplied by an item at the start of a pattern of the form
<pre>
  (*LIMIT_DEPTH=ddd)
</pre>
where ddd is a decimal number.
However, such a setting is ignored unless ddd is
less than the limit set by the caller of <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
</P>
<br><a name="SEC19" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
<P>
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
</P>
<P>
The function <b>pcre2_config()</b> makes it possible for a PCRE2 client to find
the value of certain configuration parameters and to discover which optional
features have been compiled into the PCRE2 library. The
<a href="pcre2build.html"><b>pcre2build</b></a>
documentation has more details about these features.
</P>
<P>
The first argument for <b>pcre2_config()</b> specifies which information is
required. The second argument is a pointer to memory into which the information
is placed. If NULL is passed, the function returns the amount of memory that is
needed for the requested information. For calls that return numerical values,
the value is in bytes; when requesting these values, <i>where</i> should point
to appropriately aligned memory. For calls that return strings, the required
length is given in code units, not counting the terminating zero.
</P>
<P>
When requesting information, the returned value from <b>pcre2_config()</b> is
non-negative on success, or the negative error code PCRE2_ERROR_BADOPTION if
the value in the first argument is not recognized. The following information is
available:
<pre>
  PCRE2_CONFIG_BSR
</pre>
The output is a uint32_t integer whose value indicates what character
sequences the \R escape sequence matches by default. A value of
PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a
value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The
default can be overridden when a pattern is compiled.
<pre>
  PCRE2_CONFIG_COMPILED_WIDTHS
</pre>
The output is a uint32_t integer whose lower bits indicate which code unit
widths were selected when PCRE2 was built. The 1-bit indicates 8-bit support,
and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively.
<pre>
  PCRE2_CONFIG_DEPTHLIMIT
</pre>
The output is a uint32_t integer that gives the default limit for the depth of
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions,
lookarounds, and atomic groups in <b>pcre2_dfa_match()</b>. Further details are
given with <b>pcre2_set_depth_limit()</b> above.
<pre>
  PCRE2_CONFIG_HEAPLIMIT
</pre>
The output is a uint32_t integer that gives, in kibibytes, the default limit
for the amount of heap memory used by <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b>. Further details are given with
<b>pcre2_set_heap_limit()</b> above.
<pre>
  PCRE2_CONFIG_JIT
</pre>
The output is a uint32_t integer that is set to one if support for just-in-time
compiling is included in the library; otherwise it is set to zero. Note that
having the support in the library does not guarantee that JIT will be used for
any given match. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details.
<pre>
  PCRE2_CONFIG_JITTARGET
</pre>
The <i>where</i> argument should point to a buffer that is at least 48 code
units long. (The exact length required can be found by calling
<b>pcre2_config()</b> with <i>where</i> set to NULL.) The buffer is filled with a
string that contains the name of the architecture for which the JIT compiler is
configured, for example "x86 32bit (little endian + unaligned)". If JIT support
is not available, PCRE2_ERROR_BADOPTION is returned, otherwise the number of
code units used is returned. This is the length of the string, plus one unit
for the terminating zero.
<pre>
  PCRE2_CONFIG_LINKSIZE
</pre>
The output is a uint32_t integer that contains the number of bytes used for
internal linkage in compiled regular expressions. When PCRE2 is configured, the
value can be set to 2, 3, or 4, with the default being 2. This is the value
that is returned by <b>pcre2_config()</b>. However, when the 16-bit library is
compiled, a value of 3 is rounded up to 4, and when the 32-bit library is
compiled, internal linkages always use 4 bytes, so the configured value is not
relevant.
</P>
<P>
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
but the most massive patterns, since it allows the size of the compiled pattern
to be up to 65535 code units. Larger values allow larger regular expressions to
be compiled by those two libraries, but at the expense of slower matching.
<pre>
  PCRE2_CONFIG_MATCHLIMIT
</pre>
The output is a uint32_t integer that gives the default match limit for
<b>pcre2_match()</b>. Further details are given with
<b>pcre2_set_match_limit()</b> above.
<pre>
  PCRE2_CONFIG_NEWLINE
</pre>
The output is a uint32_t integer whose value specifies the default character
sequence that is recognized as meaning "newline".
The values are:
<pre>
  PCRE2_NEWLINE_CR        Carriage return (CR)
  PCRE2_NEWLINE_LF        Linefeed (LF)
  PCRE2_NEWLINE_CRLF      Carriage return, linefeed (CRLF)
  PCRE2_NEWLINE_ANY       Any Unicode line ending
  PCRE2_NEWLINE_ANYCRLF   Any of CR, LF, or CRLF
  PCRE2_NEWLINE_NUL       The NUL character (binary zero)
</pre>
The default should normally correspond to the standard sequence for your
operating system.
<pre>
  PCRE2_CONFIG_NEVER_BACKSLASH_C
</pre>
The output is a uint32_t integer that is set to one if the use of \C was
permanently disabled when PCRE2 was built; otherwise it is set to zero.
<pre>
  PCRE2_CONFIG_PARENSLIMIT
</pre>
The output is a uint32_t integer that gives the maximum depth of nesting
of parentheses (of any kind) in a pattern. This limit is imposed to cap the
amount of system stack used when a pattern is compiled. It is specified when
PCRE2 is built; the default is 250. This limit does not take into account the
stack that may already be used by the calling application. For finer control
over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>.
<pre>
  PCRE2_CONFIG_STACKRECURSE
</pre>
This parameter is obsolete and should not be used in new code.
The output is a
uint32_t integer that is always set to zero.
<pre>
  PCRE2_CONFIG_TABLES_LENGTH
</pre>
The output is a uint32_t integer that gives the length of PCRE2's character
processing tables in bytes. For details of these tables see the
<a href="#localesupport">section on locale support</a>
below.
<pre>
  PCRE2_CONFIG_UNICODE_VERSION
</pre>
The <i>where</i> argument should point to a buffer that is at least 24 code
units long. (The exact length required can be found by calling
<b>pcre2_config()</b> with <i>where</i> set to NULL.) If PCRE2 has been compiled
without Unicode support, the buffer is filled with the text "Unicode not
supported". Otherwise, the Unicode version string (for example, "8.0.0") is
inserted. The number of code units used is returned. This is the length of the
string plus one unit for the terminating zero.
<pre>
  PCRE2_CONFIG_UNICODE
</pre>
The output is a uint32_t integer that is set to one if Unicode support is
available; otherwise it is set to zero. Unicode support implies UTF support.
<pre>
  PCRE2_CONFIG_VERSION
</pre>
The <i>where</i> argument should point to a buffer that is at least 24 code
units long.
(The exact length required can be found by calling
<b>pcre2_config()</b> with <i>where</i> set to NULL.) The buffer is filled with
the PCRE2 version string, zero-terminated. The number of code units used is
returned. This is the length of the string plus one unit for the terminating
zero.
<a name="compiling"></a></P>
<br><a name="SEC20" href="#TOC1">COMPILING A PATTERN</a><br>
<P>
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b>  uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset</i>,</b>
<b>  pcre2_compile_context *<i>ccontext</i>);</b>
<br>
<br>
<b>void pcre2_code_free(pcre2_code *<i>code</i>);</b>
<br>
<br>
<b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
<br>
<br>
<b>pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *<i>code</i>);</b>
</P>
<P>
The <b>pcre2_compile()</b> function compiles a pattern into an internal form.
The pattern is defined by a pointer to a string of code units and a length in
code units. If the pattern is zero-terminated, the length can be specified as
PCRE2_ZERO_TERMINATED. A NULL pattern pointer with a length of zero is treated
as an empty string (NULL with a non-zero length causes an error return).
The
function returns a pointer to a block of memory that contains the compiled
pattern and related data, or NULL if an error occurred.
</P>
<P>
If the compile context argument <i>ccontext</i> is NULL, memory for the compiled
pattern is obtained by calling <b>malloc()</b>. Otherwise, it is obtained from
the same memory function that was used for the compile context. The caller must
free the memory by calling <b>pcre2_code_free()</b> when it is no longer needed.
If <b>pcre2_code_free()</b> is called with a NULL argument, it returns
immediately, without doing anything.
</P>
<P>
The function <b>pcre2_code_copy()</b> makes a copy of the compiled code in new
memory, using the same memory allocator as was used for the original. However,
if the code has been processed by the JIT compiler (see
<a href="#jitcompiling">below),</a>
the JIT information cannot be copied (because it is position-dependent).
The new copy can initially be used only for non-JIT matching, though it can be
passed to <b>pcre2_jit_compile()</b> if required. If <b>pcre2_code_copy()</b> is
called with a NULL argument, it returns NULL.
</P>
<P>
The <b>pcre2_code_copy()</b> function provides a way for individual threads in a
multithreaded application to acquire a private copy of shared compiled code.
However, it does not make a copy of the character tables used by the compiled
pattern; the new pattern code points to the same tables as the original code.
(See
<a href="#localesupport">"Locale Support"</a>
below for details of these character tables.) In many applications the same
tables are used throughout, so this behaviour is appropriate. Nevertheless,
there are occasions when a copy of a compiled pattern and the relevant tables
are needed. The <b>pcre2_code_copy_with_tables()</b> function provides this
facility. Copies of both the code and the tables are made, with the new code
pointing to the new tables. The memory for the new tables is automatically
freed when <b>pcre2_code_free()</b> is called for the new copy of the compiled
code. If <b>pcre2_code_copy_with_tables()</b> is called with a NULL argument,
it returns NULL.
</P>
<P>
NOTE: When one of the matching functions is called, pointers to the compiled
pattern and the subject string are set in the match data block so that they can
be referenced by the substring extraction functions after a successful match.
After running a match, you must not free a compiled pattern or a subject string
until after all operations on the
<a href="#matchdatablock">match data block</a>
have taken place, unless, in the case of the subject string, you have used the
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
"Option bits for <b>pcre2_match()</b>"
<a href="#matchoptions">below.</a>
</P>
<P>
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
settings that affect the compilation. It should be zero if none of them are
required. The available options are described below. Some of them (in
particular, those that are compatible with Perl, but some others as well) can
also be set and unset from within the pattern (see the detailed description in
the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation).
</P>
<P>
For those options that can be different in different parts of the pattern, the
contents of the <i>options</i> argument specify their settings at the start of
compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
options can be set at the time of matching as well as at compile time.
</P>
<P>
Some additional options and less frequently required compile-time parameters
(for example, the newline setting) can be provided in a compile context (as
described
<a href="#compilecontext">above).</a>
</P>
<P>
If <i>errorcode</i> or <i>erroroffset</i> is NULL, <b>pcre2_compile()</b> returns
NULL immediately. Otherwise, the variables to which these point are set to an
error code and an offset (number of code units) within the pattern,
respectively, when <b>pcre2_compile()</b> returns NULL because a compilation
error has occurred.
</P>
<P>
There are nearly 100 positive error codes that <b>pcre2_compile()</b> may return
if it finds an error in the pattern. There are also some negative error codes
that are used for invalid UTF strings when validity checking is in force. These
are the same as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and
are described in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
documentation. There is no separate documentation for the positive error codes,
because the textual error messages that are obtained by calling the
<b>pcre2_get_error_message()</b> function (see "Obtaining a textual error
message"
<a href="#geterrormessage">below)</a>
should be self-explanatory.
Macro names starting with PCRE2_ERROR_ are defined
for both positive and negative error codes in <b>pcre2.h</b>. When compilation
is successful <i>errorcode</i> is set to a value that returns the message "no
error" if passed to <b>pcre2_get_error_message()</b>.
</P>
<P>
The value returned in <i>erroroffset</i> is an indication of where in the
pattern an error occurred. When there is no error, zero is returned. A non-zero
value is not necessarily the furthest point in the pattern that was read. For
example, after the error "lookbehind assertion is not fixed length", the error
offset points to the start of the failing assertion. For an invalid UTF-8 or
UTF-16 string, the offset is that of the first code unit of the failing
character.
</P>
<P>
Some errors are not detected until the whole pattern has been scanned; in these
cases, the offset passed back is the length of the pattern. Note that the
offset is in code units, not characters, even in a UTF mode. It may sometimes
point into the middle of a UTF-8 or UTF-16 character.
</P>
<P>
This code fragment shows a typical straightforward call to
<b>pcre2_compile()</b>:
<pre>
  pcre2_code *re;
  PCRE2_SIZE erroffset;
  int errorcode;
  re = pcre2_compile(
    (PCRE2_SPTR)"^A.*Z",    /* the pattern, cast to the code unit type */
    PCRE2_ZERO_TERMINATED,  /* the pattern is zero-terminated */
    0,                      /* default options */
    &errorcode,             /* for error code */
    &erroffset,             /* for error offset */
    NULL);                  /* no compile context */

</PRE>
</P>
<br><b>
Main compile options
</b><br>
<P>
The following names for option bits are defined in the <b>pcre2.h</b> header
file:
<pre>
  PCRE2_ANCHORED
</pre>
If this bit is set, the pattern is forced to be "anchored", that is, it is
constrained to match only at the first matching point in the string that is
being searched (the "subject string"). This effect can also be achieved by
appropriate constructs in the pattern itself, which is the only way to do it in
Perl.
<pre>
  PCRE2_ALLOW_EMPTY_CLASS
</pre>
By default, for compatibility with Perl, a closing square bracket that
immediately follows an opening one is treated as a data character for the
class. When PCRE2_ALLOW_EMPTY_CLASS is set, it terminates the class, which
therefore contains no characters and so can never match.
<pre>
  PCRE2_ALT_BSUX
</pre>
This option requests alternative handling of three escape sequences, which
makes PCRE2's behaviour more like ECMAscript (aka JavaScript). When it is set:
</P>
<P>
(1) \U matches an upper case "U" character; by default \U causes a compile
time error (Perl uses \U to upper case subsequent characters).
</P>
<P>
(2) \u matches a lower case "u" character unless it is followed by four
hexadecimal digits, in which case the hexadecimal number defines the code point
to match. By default, \u causes a compile time error (Perl uses it to upper
case the following character).
</P>
<P>
(3) \x matches a lower case "x" character unless it is followed by two
hexadecimal digits, in which case the hexadecimal number defines the code point
to match.
By default, as in Perl, a hexadecimal number is always expected after
\x, but it may have zero, one, or two digits (so, for example, \xz matches a
binary zero character followed by z).
</P>
<P>
ECMAscript 6 added additional functionality to \u. This can be accessed using
the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
<a href="#extracompileoptions">below).</a>
Note that this alternative escape handling applies only to patterns. Neither of
these options affects the processing of replacement strings passed to
<b>pcre2_substitute()</b>.
<pre>
  PCRE2_ALT_CIRCUMFLEX
</pre>
In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter
matches at the start of the subject (unless PCRE2_NOTBOL is set), and also
after any internal newline. However, it does not match after a newline at the
end of the subject, for compatibility with Perl. If you want a multiline
circumflex also to match after a terminating newline, you must set
PCRE2_ALT_CIRCUMFLEX.
<pre>
  PCRE2_ALT_VERBNAMES
</pre>
By default, for compatibility with Perl, the name in any verb sequence such as
(*MARK:NAME) is any sequence of characters that does not include a closing
parenthesis.
The name is not processed in any way, and it is not possible to
include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES
option is set, normal backslash processing is applied to verb names and only an
unescaped closing parenthesis terminates the name. A closing parenthesis can be
included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
or PCRE2_EXTENDED_MORE option is set with PCRE2_ALT_VERBNAMES, unescaped
whitespace in verb names is skipped and #-comments are recognized, exactly as
in the rest of the pattern.
<pre>
  PCRE2_AUTO_CALLOUT
</pre>
If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
all with number 255, before each pattern item, except immediately before or
after an explicit callout in the pattern. For discussion of the callout
facility, see the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
<pre>
  PCRE2_CASELESS
</pre>
If this bit is set, letters in the pattern match both upper and lower case
letters in the subject. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting.
If either PCRE2_UTF or
PCRE2_UCP is set, Unicode properties are used for all characters with more than
one other case, and for all characters whose code points are greater than
U+007F. Note that there are two ASCII characters, K and S, that, in addition to
their lower case ASCII equivalents, are case-equivalent with U+212A (Kelvin
sign) and U+017F (long S) respectively. If you do not want this case
equivalence, you can suppress it by setting PCRE2_EXTRA_CASELESS_RESTRICT.
</P>
<P>
For lower valued characters with only one other case, a lookup table is used
for speed. When neither PCRE2_UTF nor PCRE2_UCP is set, a lookup table is used
for all code points less than 256, and higher code points (available only in
16-bit or 32-bit mode) are treated as not having another case.
<pre>
  PCRE2_DOLLAR_ENDONLY
</pre>
If this bit is set, a dollar metacharacter in the pattern matches only at the
end of the subject string. Without this option, a dollar also matches
immediately before a newline at the end of the string (but not before any other
newlines). The PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is
set. There is no equivalent to this option in Perl, and no way to set it within
a pattern.
<pre>
  PCRE2_DOTALL
</pre>
If this bit is set, a dot metacharacter in the pattern matches any character,
including one that indicates a newline. However, it only ever matches one
character, even if newlines are coded as CRLF. Without this option, a dot does
not match when the current position in the subject is at a newline. This option
is equivalent to Perl's /s option, and it can be changed within a pattern by a
(?s) option setting. A negative class such as [^a] always matches newline
characters, and the \N escape sequence always matches a non-newline character,
independent of the setting of PCRE2_DOTALL.
<pre>
  PCRE2_DUPNAMES
</pre>
If this bit is set, names used to identify capture groups need not be unique.
This can be helpful for certain types of pattern when it is known that only one
instance of the named group can ever be matched. There are more details of
named capture groups below; see also the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation.
<pre>
  PCRE2_ENDANCHORED
</pre>
If this bit is set, the end of any pattern match must be right at the end of
the string being searched (the "subject string").
If the pattern match
succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the
match fails at the current starting point. For unanchored patterns, a new match
is then tried at the next starting point. However, if the match succeeds by
reaching the end of the pattern, but not the end of the subject, backtracking
occurs and an alternative match may be found. Consider these two patterns:
<pre>
  .(*ACCEPT)|..
  .|..
</pre>
If matched against "abc" with PCRE2_ENDANCHORED set, the first matches "c"
whereas the second matches "bc". The effect of PCRE2_ENDANCHORED can also be
achieved by appropriate constructs in the pattern itself, which is the only way
to do it in Perl.
</P>
<P>
For DFA matching with <b>pcre2_dfa_match()</b>, PCRE2_ENDANCHORED applies only
to the first (that is, the longest) matched string. Other parallel matches,
which are necessarily substrings of the first one, must obviously end before
the end of the subject.
<pre>
  PCRE2_EXTENDED
</pre>
If this bit is set, most white space characters in the pattern are totally
ignored except when escaped, inside a character class, or inside a \Q...\E
sequence.
However, white space is not allowed within sequences such as (?> that
introduce various parenthesized groups, nor within numerical quantifiers such
as {1,3}. Ignorable white space is permitted between an item and a following
quantifier and between a quantifier and a following + that indicates
possessiveness. PCRE2_EXTENDED is equivalent to Perl's /x option, and it can be
changed within a pattern by a (?x) option setting.
</P>
<P>
When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
white space only those characters with code points less than 256 that are
flagged as white space in its low-character table. The table is normally
created by
<a href="pcre2_maketables.html"><b>pcre2_maketables()</b>,</a>
which uses the <b>isspace()</b> function to identify space characters. In most
ASCII environments, the relevant characters are those with code points 0x0009
(tab), 0x000A (linefeed), 0x000B (vertical tab), 0x000C (formfeed), 0x000D
(carriage return), and 0x0020 (space).
</P>
<P>
When PCRE2 is compiled with Unicode support, in addition to these characters,
five more Unicode "Pattern White Space" characters are recognized by
PCRE2_EXTENDED. These are U+0085 (next line), U+200E (left-to-right mark),
U+200F (right-to-left mark), U+2028 (line separator), and U+2029 (paragraph
separator).
This set of characters is the same as recognized by Perl's /x
option. Note that the horizontal and vertical space characters that are matched
by the \h and \v escapes in patterns are a much bigger set.
</P>
<P>
As well as ignoring most white space, PCRE2_EXTENDED also causes characters
between an unescaped # outside a character class and the next newline,
inclusive, to be ignored, which makes it possible to include comments inside
complicated patterns. Note that the end of this type of comment is a literal
newline sequence in the pattern; escape sequences that happen to represent a
newline do not count.
</P>
<P>
Which characters are interpreted as newlines can be specified by a setting in
the compile context that is passed to <b>pcre2_compile()</b> or by a special
sequence at the start of the pattern, as described in the section entitled
<a href="pcre2pattern.html#newlines">"Newline conventions"</a>
in the <b>pcre2pattern</b> documentation. A default is defined when PCRE2 is
built.
<pre>
  PCRE2_EXTENDED_MORE
</pre>
This option has the effect of PCRE2_EXTENDED, but, in addition, unescaped space
and horizontal tab characters are ignored inside a character class.
Note: only
these two characters are ignored, not the full set of pattern white space
characters that are ignored outside a character class. PCRE2_EXTENDED_MORE is
equivalent to Perl's /xx option, and it can be changed within a pattern by a
(?xx) option setting.
<pre>
  PCRE2_FIRSTLINE
</pre>
If this option is set, the start of an unanchored pattern match must be before
or at the first newline in the subject string following the start of matching,
though the matched text may continue over the newline. If <i>startoffset</i> is
non-zero, the limiting newline is not necessarily the first newline in the
subject. For example, if the subject string is "abc\nxyz" (where \n
represents a single-character newline) a pattern match for "yz" succeeds with
PCRE2_FIRSTLINE if <i>startoffset</i> is greater than 3. See also
PCRE2_USE_OFFSET_LIMIT, which provides a more general limiting facility. If
PCRE2_FIRSTLINE is set with an offset limit, a match must occur in the first
line and also within the offset limit. In other words, whichever limit comes
first is used. This option has no effect for anchored patterns.
<pre>
  PCRE2_LITERAL
</pre>
If this option is set, all meta-characters in the pattern are disabled, and it
is treated as a literal string.
Matching literal strings with a regular
expression engine is not the most efficient way of doing it. If you are doing a
lot of literal matching and are worried about efficiency, you should consider
using other approaches. The only other main options that are allowed with
PCRE2_LITERAL are: PCRE2_ANCHORED, PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT,
PCRE2_CASELESS, PCRE2_FIRSTLINE, PCRE2_MATCH_INVALID_UTF,
PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK, PCRE2_UTF, and
PCRE2_USE_OFFSET_LIMIT. The extra options PCRE2_EXTRA_MATCH_LINE and
PCRE2_EXTRA_MATCH_WORD are also supported. Any other options cause an error.
<pre>
  PCRE2_MATCH_INVALID_UTF
</pre>
This option forces PCRE2_UTF (see below) and also enables support for matching
by <b>pcre2_match()</b> in subject strings that contain invalid UTF sequences.
Note, however, that the 16-bit and 32-bit PCRE2 libraries process strings as
sequences of uint16_t or uint32_t code points. They cannot find valid UTF
sequences within an arbitrary string of bytes unless such sequences are
suitably aligned. This facility is not supported for DFA matching. For details,
see the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
documentation.
<pre>
  PCRE2_MATCH_UNSET_BACKREF
</pre>
If this option is set, a backreference to an unset capture group matches an
empty string (by default this causes the current matching alternative to fail).
A pattern such as (\1)(a) succeeds when this option is set (assuming it can
find an "a" in the subject), whereas it fails by default, for Perl
compatibility. Setting this option makes PCRE2 behave more like ECMAscript (aka
JavaScript).
<pre>
  PCRE2_MULTILINE
</pre>
By default, for the purposes of matching "start of line" and "end of line",
PCRE2 treats the subject string as consisting of a single line of characters,
even if it actually contains newlines. The "start of line" metacharacter (^)
matches only at the start of the string, and the "end of line" metacharacter
($) matches only at the end of the string, or before a terminating newline
(except when PCRE2_DOLLAR_ENDONLY is set). Note, however, that unless
PCRE2_DOTALL is set, the "any character" metacharacter (.) does not match at a
newline. This behaviour (for ^, $, and dot) is the same as Perl.
</P>
<P>
When PCRE2_MULTILINE is set, the "start of line" and "end of line"
constructs match immediately following or immediately before internal newlines
in the subject string, respectively, as well as at the very start and end.
This
is equivalent to Perl's /m option, and it can be changed within a pattern by a
(?m) option setting. Note that the "start of line" metacharacter does not match
after a newline at the end of the subject, for compatibility with Perl.
However, you can change this by setting the PCRE2_ALT_CIRCUMFLEX option. If
there are no newlines in a subject string, or no occurrences of ^ or $ in a
pattern, setting PCRE2_MULTILINE has no effect.
<pre>
  PCRE2_NEVER_BACKSLASH_C
</pre>
This option locks out the use of \C in the pattern that is being compiled.
This escape can cause unpredictable behaviour in UTF-8 or UTF-16 modes, because
it may leave the current matching point in the middle of a multi-code-unit
character. This option may be useful in applications that process patterns from
external sources. Note that there is also a build-time option that permanently
locks out the use of \C.
<pre>
  PCRE2_NEVER_UCP
</pre>
This option locks out the use of Unicode properties for handling \B, \b, \D,
\d, \S, \s, \W, \w, and some of the POSIX character classes, as described
for the PCRE2_UCP option below. In particular, it prevents the creator of the
pattern from enabling this facility by starting the pattern with (*UCP). This
option may be useful in applications that process patterns from external
sources.
The option combination PCRE2_UCP and PCRE2_NEVER_UCP causes an error.
<pre>
  PCRE2_NEVER_UTF
</pre>
This option locks out interpretation of the pattern as UTF-8, UTF-16, or
UTF-32, depending on which library is in use. In particular, it prevents the
creator of the pattern from switching to UTF interpretation by starting the
pattern with (*UTF). This option may be useful in applications that process
patterns from external sources. The combination of PCRE2_UTF and
PCRE2_NEVER_UTF causes an error.
<pre>
  PCRE2_NO_AUTO_CAPTURE
</pre>
If this option is set, it disables the use of numbered capturing parentheses in
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). This is the same as Perl's /n option.
Note that, when this option is set, references to capture groups
(backreferences or recursion/subroutine calls) may only refer to named groups,
though the reference can be by name or by number.
<pre>
  PCRE2_NO_AUTO_POSSESS
</pre>
If this option is set, it disables "auto-possessification", which is an
optimization that, for example, turns a+b into a++b in order to avoid
backtracks into a+ that can never be successful.
However, if callouts are in
use, auto-possessification means that some callouts are never taken. You can
set this option if you want the matching functions to do a full unoptimized
search and run all the callouts, but it is mainly provided for testing
purposes.
<pre>
  PCRE2_NO_DOTSTAR_ANCHOR
</pre>
If this option is set, it disables an optimization that is applied when .* is
the first significant item in a top-level branch of a pattern, and all the
other branches also start with .* or with \A or \G or ^. The optimization is
automatically disabled for .* if it is inside an atomic group or a capture
group that is the subject of a backreference, or if the pattern contains
(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
automatically anchored if PCRE2_DOTALL is set for all the .* items and
PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
must start either at the start of the subject or following a newline is
remembered. Like other optimizations, this can cause callouts to be skipped.
<pre>
  PCRE2_NO_START_OPTIMIZE
</pre>
This is an option whose main effect is at matching time. It does not change
what <b>pcre2_compile()</b> generates, but it does affect the output of the JIT
compiler.
</P>
<P>
There are a number of optimizations that may occur at the start of a match, in
order to speed up the process. For example, if it is known that an unanchored
match must start with a specific code unit value, the matching code searches
the subject for that value, and fails immediately if it cannot find it, without
actually running the main matching function. This means that a special item
such as (*COMMIT) at the start of a pattern is not considered until after a
suitable starting point for the match has been found. Also, when callouts or
(*MARK) items are in use, these "start-up" optimizations can cause them to be
skipped if the pattern is never actually used. The start-up optimizations are
in effect a pre-scan of the subject that takes place before the pattern is run.
</P>
<P>
The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations,
possibly causing performance to suffer, but ensuring that in cases where the
result is "no match", the callouts do occur, and that items such as (*COMMIT)
and (*MARK) are considered at every possible starting position in the subject
string.
</P>
<P>
Setting PCRE2_NO_START_OPTIMIZE may change the outcome of a matching operation.
Consider the pattern
<pre>
  (*COMMIT)ABC
</pre>
When this is compiled, PCRE2 records the fact that a match must start with the
character "A". Suppose the subject string is "DEFABC". The start-up
optimization scans along the subject, finds "A" and runs the first match
attempt from there. The (*COMMIT) item means that the pattern must match the
current starting position, which, in this case, it does. However, if the same
match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
subject string does not happen. The first match attempt is run starting from
"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
the overall result is "no match".
</P>
<P>
Another start-up optimization makes use of a minimum length for a matching
subject, which is recorded when possible. Consider the pattern
<pre>
  (*MARK:1)B(*MARK:2)(X|Y)
</pre>
The minimum length for a match is two characters. If the subject is "XXBB", the
"starting character" optimization skips "XX", then tries to match "BB", which
is long enough. In the process, (*MARK:2) is encountered and remembered.
When
the match attempt fails, the next "B" is found, but there is only one character
left, so there are no more attempts, and "no match" is returned with the "last
mark seen" set to "2". If PCRE2_NO_START_OPTIMIZE is set, however, matches are
tried at every possible starting position, including at the end of the subject,
where (*MARK:1) is encountered, but there is no "B", so the "last mark seen"
that is returned is "1". In this case, the optimizations do not affect the
overall match result, which is still "no match", but they do affect the
auxiliary information that is returned.
<pre>
  PCRE2_NO_UTF_CHECK
</pre>
When PCRE2_UTF is set, the validity of the pattern as a UTF string is
automatically checked. There are discussions about the validity of
<a href="pcre2unicode.html#utf8strings">UTF-8 strings,</a>
<a href="pcre2unicode.html#utf16strings">UTF-16 strings,</a>
and
<a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
document. If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a
negative error code.
</P>
<P>
If you know that your pattern is a valid UTF string, and you want to skip this
check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option.
When
it is set, the effect of passing an invalid UTF string as a pattern is
undefined. It may cause your program to crash or loop.
</P>
<P>
Note that this option can also be passed to <b>pcre2_match()</b> and
<b>pcre2_dfa_match()</b>, to suppress UTF validity checking of the subject
string.
</P>
<P>
Note also that setting PCRE2_NO_UTF_CHECK at compile time does not disable the
error that is given if an escape sequence for an invalid Unicode code point is
encountered in the pattern. In particular, the so-called "surrogate" code
points (0xd800 to 0xdfff) are invalid. If you want to allow escape sequences
such as \x{d800} you can set the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra
option, as described in the section entitled "Extra compile options"
<a href="#extracompileoptions">below.</a>
However, this is possible only in UTF-8 and UTF-32 modes, because these values
are not representable in UTF-16.
<pre>
  PCRE2_UCP
</pre>
This option has two effects. Firstly, it changes the way PCRE2 processes \B,
\b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By
default, only ASCII characters are recognized, but if PCRE2_UCP is set, Unicode
properties are used to classify characters.
There are some PCRE2_EXTRA
options (see below) that add finer control to this behaviour. More details are
given in the section on
<a href="pcre2pattern.html#genericchartypes">generic character types</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page.
</P>
<P>
The second effect of PCRE2_UCP is to force the use of Unicode properties for
upper/lower casing operations, even when PCRE2_UTF is not set. This makes it
possible to process strings in the 16-bit UCS-2 code. This option is available
only if PCRE2 has been compiled with Unicode support (which is the default).
The PCRE2_EXTRA_CASELESS_RESTRICT option (see below) restricts caseless
matching such that ASCII characters match only ASCII characters and non-ASCII
characters match only non-ASCII characters.
<pre>
  PCRE2_UNGREEDY
</pre>
This option inverts the "greediness" of the quantifiers so that they are not
greedy by default, but become greedy if followed by "?". It is not compatible
with Perl. It can also be set by a (?U) option setting within the pattern.
<pre>
  PCRE2_USE_OFFSET_LIMIT
</pre>
This option must be set for <b>pcre2_compile()</b> if
<b>pcre2_set_offset_limit()</b> is going to be used to set a non-default offset
limit in a match context for matches that use this pattern. An error is
generated if an offset limit is set without this option. For more details, see
the description of <b>pcre2_set_offset_limit()</b> in the
<a href="#matchcontext">section</a>
that describes match contexts. See also the PCRE2_FIRSTLINE
option above.
<pre>
  PCRE2_UTF
</pre>
This option causes PCRE2 to regard both the pattern and the subject strings
that are subsequently processed as strings of UTF characters instead of
single-code-unit strings. It is available when PCRE2 is built to include
Unicode support (which is the default). If Unicode support is not available,
the use of this option provokes an error. Details of how PCRE2_UTF changes the
behaviour of PCRE2 are given in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
page. In particular, note that it changes the way PCRE2_CASELESS works.
<a name="extracompileoptions"></a></P>
<br><b>
Extra compile options
</b><br>
<P>
The option bits that can be set in a compile context by calling the
<b>pcre2_set_compile_extra_options()</b> function are as follows:
<pre>
  PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
</pre>
Since release 10.38 PCRE2 has forbidden the use of \K within lookaround
assertions, following Perl's lead. This option is provided to re-enable the
previous behaviour (act in positive lookarounds, ignore in negative ones) in
case anybody is relying on it.
<pre>
  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
</pre>
This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is
forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate"
code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode
code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot
therefore be represented in UTF-16. They can be represented in UTF-8 and
UTF-32, but are defined as invalid code points, and cause errors if encountered
in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2.
</P>
<P>
These values also cause errors if encountered in escape sequences such as
\x{d912} within a pattern. However, it seems that some applications, when
using PCRE2 to check for unwanted characters in UTF-8 strings, explicitly test
for the surrogates using escape sequences. The PCRE2_NO_UTF_CHECK option does
not disable the error that occurs, because it applies only to the testing of
input strings for UTF validity.
</P>
<P>
If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
incorporated in the compiled pattern. However, they can only match subject
characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
<pre>
  PCRE2_EXTRA_ALT_BSUX
</pre>
The original option PCRE2_ALT_BSUX causes PCRE2 to process \U, \u, and \x in
the way that ECMAscript (aka JavaScript) does. Additional functionality was
defined by ECMAscript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of
PCRE2_ALT_BSUX, but in addition it recognizes \u{hhh..} as a hexadecimal
character code, where hhh.. is any number of hexadecimal digits.
<pre>
  PCRE2_EXTRA_ASCII_BSD
</pre>
This option forces \d to match only ASCII digits, even when PCRE2_UCP is set.
It can be changed within a pattern by means of the (?aD) option setting.
<pre>
  PCRE2_EXTRA_ASCII_BSS
</pre>
This option forces \s to match only ASCII space characters, even when
PCRE2_UCP is set. It can be changed within a pattern by means of the (?aS)
option setting.
<pre>
  PCRE2_EXTRA_ASCII_BSW
</pre>
This option forces \w to match only ASCII word characters, even when PCRE2_UCP
is set. It can be changed within a pattern by means of the (?aW) option
setting.
<pre>
  PCRE2_EXTRA_ASCII_DIGIT
</pre>
This option forces the POSIX character classes [:digit:] and [:xdigit:] to
match only ASCII digits, even when PCRE2_UCP is set. It can be changed within
a pattern by means of the (?aT) option setting.
<pre>
  PCRE2_EXTRA_ASCII_POSIX
</pre>
This option forces all the POSIX character classes, including [:digit:] and
[:xdigit:], to match only ASCII characters, even when PCRE2_UCP is set. It can
be changed within a pattern by means of the (?aP) option setting, but note that
this also sets PCRE2_EXTRA_ASCII_DIGIT in order to ensure that (?-aP) unsets
all ASCII restrictions for POSIX classes.
<pre>
  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
</pre>
This is a dangerous option. Use with care. By default, an unrecognized escape
such as \j or a malformed one such as \x{2z} causes a compile-time error when
detected by <b>pcre2_compile()</b>. Perl is somewhat inconsistent in handling
such items: for example, \j is treated as a literal "j", and non-hexadecimal
digits in \x{} are just ignored, though warnings are given in both cases if
Perl's warning switch is enabled. However, a malformed octal number after \o{
always causes an error in Perl.
</P>
<P>
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
<b>pcre2_compile()</b>, all unrecognized or malformed escape sequences are
treated as single-character escapes. For example, \j is a literal "j" and
\x{2z} is treated as the literal string "x{2z}". Setting this option means
that typos in patterns may go undetected and have unexpected results. Also note
that a sequence such as [\N{] is interpreted as a malformed attempt at
[\N{...}] and so is treated as [N{] whereas [\N] gives an error because an
unqualified \N is a valid escape sequence but is not supported in a character
class. To reiterate: this is a dangerous option. Use with great care.
<pre>
  PCRE2_EXTRA_CASELESS_RESTRICT
</pre>
When either PCRE2_UCP or PCRE2_UTF is set, caseless matching follows Unicode
rules, which allow for more than two cases per character. There are two
case-equivalent character sets that contain both ASCII and non-ASCII
characters. The ASCII letter S is case-equivalent to U+017f (long S) and the
ASCII letter K is case-equivalent to U+212a (Kelvin sign). This option disables
recognition of case-equivalences that cross the ASCII/non-ASCII boundary. In a
caseless match, both characters must either both be ASCII or both be non-ASCII.
The option can be changed within a pattern by the (?r) option setting.
<pre>
  PCRE2_EXTRA_ESCAPED_CR_IS_LF
</pre>
There are some legacy applications where the escape sequence \r in a pattern
is expected to match a newline. If this option is set, \r in a pattern is
converted to \n so that it matches a LF (linefeed) instead of a CR (carriage
return) character. The option does not affect a literal CR in the pattern, nor
does it affect CR specified as an explicit code point such as \x{0D}.
<pre>
  PCRE2_EXTRA_MATCH_LINE
</pre>
This option is provided for use by the <b>-x</b> option of <b>pcre2grep</b>. It
causes the pattern only to match complete lines.
This is achieved by
automatically inserting the code for "^(?:" at the start of the compiled
pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, the matched
line may be in the middle of the subject string. This option can be used with
PCRE2_LITERAL.
<pre>
  PCRE2_EXTRA_MATCH_WORD
</pre>
This option is provided for use by the <b>-w</b> option of <b>pcre2grep</b>. It
causes the pattern only to match strings that have a word boundary at the start
and the end. This is achieved by automatically inserting the code for "\b(?:"
at the start of the compiled pattern and ")\b" at the end. The option may be
used with PCRE2_LITERAL. However, it is ignored if PCRE2_EXTRA_MATCH_LINE is
also set.
<a name="jitcompiling"></a></P>
<br><a name="SEC21" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
<P>
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
<br>
<br>
<b>int pcre2_jit_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
<b> uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>);</b>
<br>
<br>
<b>void pcre2_jit_free_unused_memory(pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_jit_stack *pcre2_jit_stack_create(size_t <i>startsize</i>,</b>
<b> size_t <i>maxsize</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_jit_stack_assign(pcre2_match_context *<i>mcontext</i>,</b>
<b> pcre2_jit_callback <i>callback_function</i>, void *<i>callback_data</i>);</b>
<br>
<br>
<b>void pcre2_jit_stack_free(pcre2_jit_stack *<i>jit_stack</i>);</b>
</P>
<P>
These functions provide support for JIT compilation, which, if the just-in-time
compiler is available, further processes a compiled pattern into machine code
that executes much faster than the <b>pcre2_match()</b>
interpretive matching
function. Full details are given in the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation.
</P>
<P>
JIT compilation is a heavyweight optimization. It can take some time for
patterns to be analyzed, and for one-off matches and simple patterns the
benefit of faster execution might be offset by a much slower compilation time.
Most (but not all) patterns can be optimized by the JIT compiler.
<a name="localesupport"></a></P>
<br><a name="SEC22" href="#TOC1">LOCALE SUPPORT</a><br>
<P>
<b>const uint8_t *pcre2_maketables(pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_maketables_free(pcre2_general_context *<i>gcontext</i>,</b>
<b> const uint8_t *<i>tables</i>);</b>
</P>
<P>
PCRE2 handles caseless matching, and determines whether characters are letters,
digits, or whatever, by reference to a set of tables, indexed by character code
point. However, this applies only to characters whose code points are less than
256. By default, higher-valued code points never match escapes such as \w or
\d.
</P>
<P>
When PCRE2 is built with Unicode support (the default), certain Unicode
character properties can be tested with \p and \P, or, alternatively, the
PCRE2_UCP option can be set when a pattern is compiled; this causes \w and
friends to use Unicode property support instead of the built-in tables.
PCRE2_UCP also causes upper/lower casing operations on characters with code
points greater than 127 to use Unicode properties. These effects apply even
when PCRE2_UTF is not set. There are, however, some PCRE2_EXTRA options (see
above) that can be used to modify or suppress them.
</P>
<P>
The use of locales with Unicode is discouraged. If you are handling characters
with code points greater than 127, you should either use Unicode support, or
use locales, but not try to mix the two.
</P>
<P>
PCRE2 contains a built-in set of character tables that are used by default.
These are sufficient for many applications. Normally, the internal tables
recognize only ASCII characters. However, when PCRE2 is built, it is possible
to cause the internal tables to be rebuilt in the default "C" locale of the
local system, which may cause them to be different.
</P>
<P>
The built-in tables can be overridden by tables supplied by the application
that calls PCRE2.
These may be created in a different locale from the default.
As more and more applications change to using Unicode, the need for this locale
support is expected to die away.
</P>
<P>
External tables are built by calling the <b>pcre2_maketables()</b> function, in
the relevant locale. The only argument to this function is a general context,
which can be used to pass a custom memory allocator. If the argument is NULL,
the system <b>malloc()</b> is used. The result can be passed to
<b>pcre2_compile()</b> as often as necessary, by creating a compile context and
calling <b>pcre2_set_character_tables()</b> to set the tables pointer therein.
</P>
<P>
For example, to build and use tables that are appropriate for the French locale
(where accented characters with values greater than 127 are treated as
letters), the following code could be used:
<pre>
  setlocale(LC_CTYPE, "fr_FR");
  tables = pcre2_maketables(NULL);
  ccontext = pcre2_compile_context_create(NULL);
  pcre2_set_character_tables(ccontext, tables);
  re = pcre2_compile(..., ccontext);
</pre>
The locale name "fr_FR" is used on Linux and other Unix-like systems; if you
are using Windows, the name for the French locale is "french".
</P>
<P>
The pointer that is passed (via the compile context) to <b>pcre2_compile()</b>
is saved with the compiled pattern, and the same tables are used by the
matching functions. Thus, for any single pattern, compilation and matching both
happen in the same locale, but different patterns can be processed in different
locales.
</P>
<P>
It is the caller's responsibility to ensure that the memory containing the
tables remains available while they are still in use. When they are no longer
needed, you can discard them using <b>pcre2_maketables_free()</b>, which should
be given as its first argument the same general context that was used to create
the tables.
</P>
<br><b>
Saving locale tables
</b><br>
<P>
The tables described above are just a sequence of binary bytes, which makes
them independent of hardware characteristics such as endianness or whether the
processor is 32-bit or 64-bit. A copy of the result of <b>pcre2_maketables()</b>
can therefore be saved in a file or elsewhere and re-used later, even in a
different program or on another computer.
The size of the tables (number of
bytes) must be obtained by calling <b>pcre2_config()</b> with the
PCRE2_CONFIG_TABLES_LENGTH option because <b>pcre2_maketables()</b> does not
return this value. Note that the <b>pcre2_dftables</b> program, which is part of
the PCRE2 build system, can be used stand-alone to create a file that contains
a set of binary tables. See the
<a href="pcre2build.html#createtables"><b>pcre2build</b></a>
documentation for details.
<a name="infoaboutpattern"></a></P>
<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
<P>
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
</P>
<P>
The <b>pcre2_pattern_info()</b> function returns general information about a
compiled pattern. For information about callouts, see the
<a href="#infoaboutcallouts">next section.</a>
The first argument for <b>pcre2_pattern_info()</b> is a pointer to the compiled
pattern. The second argument specifies which piece of information is required,
and the third argument is a pointer to a variable to receive the data. If the
third argument is NULL, the first argument is ignored, and the function returns
the size in bytes of the variable that is required for the information
requested.
Otherwise, the yield of the function is zero for success, or one of
the following negative numbers:
<pre>
  PCRE2_ERROR_NULL           the argument <i>code</i> was NULL
  PCRE2_ERROR_BADMAGIC       the "magic number" was not found
  PCRE2_ERROR_BADOPTION      the value of <i>what</i> was invalid
  PCRE2_ERROR_UNSET          the requested field is not set
</pre>
The "magic number" is placed at the start of each compiled pattern as a simple
check against passing an arbitrary memory pointer. Here is a typical call of
<b>pcre2_pattern_info()</b>, to obtain the length of the compiled pattern:
<pre>
  int rc;
  size_t length;
  rc = pcre2_pattern_info(
    re,               /* result of pcre2_compile() */
    PCRE2_INFO_SIZE,  /* what is required */
    &length);         /* where to put the data */
</pre>
The possible values for the second argument are defined in <b>pcre2.h</b>, and
are as follows:
<pre>
  PCRE2_INFO_ALLOPTIONS
  PCRE2_INFO_ARGOPTIONS
  PCRE2_INFO_EXTRAOPTIONS
</pre>
Return copies of the pattern's options. The third argument should point to a
<b>uint32_t</b> variable.
PCRE2_INFO_ARGOPTIONS returns exactly the options that
were passed to <b>pcre2_compile()</b>, whereas PCRE2_INFO_ALLOPTIONS returns
the compile options as modified by any top-level (*XXX) option settings such as
(*UTF) at the start of the pattern itself. PCRE2_INFO_EXTRAOPTIONS returns the
extra options that were set in the compile context by calling the
pcre2_set_compile_extra_options() function.
</P>
<P>
For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED
option, the result for PCRE2_INFO_ALLOPTIONS is PCRE2_EXTENDED and PCRE2_UTF.
Option settings such as (?i) that can change within a pattern do not affect the
result of PCRE2_INFO_ALLOPTIONS, even if they appear right at the start of the
pattern. (This was different in some earlier releases.)
</P>
<P>
A pattern compiled without PCRE2_ANCHORED is automatically anchored by PCRE2 if
the first significant item in every top-level branch is one of the following:
<pre>
  ^     unless PCRE2_MULTILINE is set
  \A    always
  \G    always
  .*    sometimes - see below
</pre>
When .* is the first significant item, anchoring is possible only when all the
following are true:
<pre>
  .* is not in an atomic group
  .* is not in a capture group that is the subject of a backreference
  PCRE2_DOTALL is in force for .*
  Neither (*PRUNE) nor (*SKIP) appears in the pattern
  PCRE2_NO_DOTSTAR_ANCHOR is not set
</pre>
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
options returned for PCRE2_INFO_ALLOPTIONS.
<pre>
  PCRE2_INFO_BACKREFMAX
</pre>
Return the number of the highest backreference in the pattern. The third
argument should point to a <b>uint32_t</b> variable. Named capture groups
acquire numbers as well as names, and these count towards the highest
backreference.
Backreferences such as \4 or \g{12} match the captured
characters of the given group, but in addition, the check that a capture
group is set in a conditional group such as (?(3)a|b) is also a backreference.
Zero is returned if there are no backreferences.
<pre>
  PCRE2_INFO_BSR
</pre>
The output is a uint32_t integer whose value indicates what character sequences
the \R escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R
matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
that \R matches only CR, LF, or CRLF.
<pre>
  PCRE2_INFO_CAPTURECOUNT
</pre>
Return the highest capture group number in the pattern. In patterns where (?|
is not used, this is also the total number of capture groups. The third
argument should point to a <b>uint32_t</b> variable.
<pre>
  PCRE2_INFO_DEPTHLIMIT
</pre>
If the pattern set a backtracking depth limit by including an item of the form
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
should point to a uint32_t integer. If no such value has been set, the call to
<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
limit will only be used during matching if it is less than the limit set or
defaulted by the caller of the match function.
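The rule in the last sentence (a limit embedded in the pattern can lower the
caller's limit but never raise it) can be modelled in a few lines of plain C.
This is only a sketch: <b>effective_limit()</b> and its parameters are
hypothetical names chosen for illustration and are not part of PCRE2.

```c
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): model which limit applies.
   pattern_has_limit is 0 when pcre2_pattern_info() would report
   PCRE2_ERROR_UNSET for the corresponding limit request. */
static uint32_t effective_limit(int pattern_has_limit,
                                uint32_t pattern_limit,
                                uint32_t caller_limit)
{
    if (pattern_has_limit && pattern_limit < caller_limit)
        return pattern_limit;   /* the pattern can only lower the limit */
    return caller_limit;        /* otherwise the caller's value stands */
}
```

Under this model, a pattern limit of 500 with a caller limit of 1000 gives 500,
whereas a pattern limit of 5000 leaves the caller's 1000 in force.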
<pre>
  PCRE2_INFO_FIRSTBITMAP
</pre>
In the absence of a single first code unit for a non-anchored pattern,
<b>pcre2_compile()</b> may construct a 256-bit table that defines a fixed set of
values for the first code unit in any match. For example, a pattern that starts
with [abc] results in a table with three bits set. When code unit values
greater than 255 are supported, the flag bit for 255 means "any code unit of
value 255 or above". If such a table was constructed, a pointer to it is
returned. Otherwise NULL is returned. The third argument should point to a
<b>const uint8_t *</b> variable.
<pre>
  PCRE2_INFO_FIRSTCODETYPE
</pre>
Return information about the first code unit of any matched string, for a
non-anchored pattern. The third argument should point to a <b>uint32_t</b>
variable. If there is a fixed first value, for example, the letter "c" from a
pattern such as (cat|cow|coyote), 1 is returned, and the value can be retrieved
using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, but it is
known that a match can occur only at the start of the subject or following a
newline in the subject, 2 is returned. Otherwise, and for anchored patterns, 0
is returned.
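As an illustration of how the 256-bit table described under
PCRE2_INFO_FIRSTBITMAP might be consulted, here is a minimal sketch in plain C.
The helper names are hypothetical, and the bit layout (bit <i>c</i> stored in
byte <i>c</i>/8, least significant bit first) is an assumption that the text
above does not state.

```c
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): test whether a code unit may
   start a match according to a 256-bit first-code-unit table (32 bytes).
   ASSUMPTION: bit c lives in byte c/8 at position c%8, least significant
   bit first; the documentation above does not specify the layout. */
static int may_start_match(const uint8_t *bitmap, uint32_t code_unit)
{
    /* In the 16-bit and 32-bit libraries, bit 255 stands for every code
       unit of value 255 or above. */
    uint32_t c = (code_unit > 255) ? 255 : code_unit;
    return (int)((bitmap[c / 8] >> (c % 8)) & 1u);
}

/* Set one bit, using the same assumed layout, to build a table by hand. */
static void set_bit(uint8_t *bitmap, uint32_t c)
{
    bitmap[c / 8] |= (uint8_t)(1u << (c % 8));
}
```

A table built by hand with the bits for "a", "b" and "c" set behaves like the
[abc] example: only those three code units pass the test.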
<pre>
  PCRE2_INFO_FIRSTCODEUNIT
</pre>
Return the value of the first code unit of any matched string for a pattern
where PCRE2_INFO_FIRSTCODETYPE returns 1; otherwise return 0. The third
argument should point to a <b>uint32_t</b> variable. In the 8-bit library, the
value is always less than 256. In the 16-bit library the value can be up to
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
and up to 0xffffffff when not using UTF-32 mode.
<pre>
  PCRE2_INFO_FRAMESIZE
</pre>
Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
without the use of JIT. The third argument should point to a <b>size_t</b>
variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capture group adds two PCRE2_SIZE variables.
<pre>
  PCRE2_INFO_HASBACKSLASHC
</pre>
Return 1 if the pattern contains any instances of \C, otherwise 0. The third
argument should point to a <b>uint32_t</b> variable.
<pre>
  PCRE2_INFO_HASCRORLF
</pre>
Return 1 if the pattern contains any explicit matches for CR or LF characters,
otherwise 0. The third argument should point to a <b>uint32_t</b> variable.
An
explicit match is either a literal CR or LF character, or \r or \n or one of
the equivalent hexadecimal or octal escape sequences.
<pre>
  PCRE2_INFO_HEAPLIMIT
</pre>
If the pattern set a heap memory limit by including an item of the form
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
should point to a uint32_t integer. If no such value has been set, the call to
<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
limit will only be used during matching if it is less than the limit set or
defaulted by the caller of the match function.
<pre>
  PCRE2_INFO_JCHANGED
</pre>
Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise
0. The third argument should point to a <b>uint32_t</b> variable. (?J) and
(?-J) set and unset the local PCRE2_DUPNAMES option, respectively.
<pre>
  PCRE2_INFO_JITSIZE
</pre>
If the compiled pattern was successfully processed by
<b>pcre2_jit_compile()</b>, return the size of the JIT compiled code, otherwise
return zero. The third argument should point to a <b>size_t</b> variable.
<pre>
  PCRE2_INFO_LASTCODETYPE
</pre>
Returns 1 if there is a rightmost literal code unit that must exist in any
matched string, other than at its start. The third argument should point to a
<b>uint32_t</b> variable. If there is no such value, 0 is returned. When 1 is
returned, the code unit value itself can be retrieved using
PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is
recorded only if it follows something of variable length. For example, for the
pattern /^a\d+z\d+/ the returned value is 1 (with "z" returned from
PCRE2_INFO_LASTCODEUNIT), but for /^a\dz\d/ the returned value is 0.
<pre>
  PCRE2_INFO_LASTCODEUNIT
</pre>
Return the value of the rightmost literal code unit that must exist in any
matched string, other than at its start, for a pattern where
PCRE2_INFO_LASTCODETYPE returns 1. Otherwise, return 0. The third argument
should point to a <b>uint32_t</b> variable.
<pre>
  PCRE2_INFO_MATCHEMPTY
</pre>
Return 1 if the pattern might match an empty string, otherwise 0. The third
argument should point to a <b>uint32_t</b> variable. When a pattern contains
recursive subroutine calls it is not always possible to determine whether or
not it can match an empty string. PCRE2 takes a cautious approach and returns 1
in such cases.
<pre>
  PCRE2_INFO_MATCHLIMIT
</pre>
If the pattern set a match limit by including an item of the form
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
should point to a uint32_t integer. If no such value has been set, the call to
<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
limit will only be used during matching if it is less than the limit set or
defaulted by the caller of the match function.
<pre>
  PCRE2_INFO_MAXLOOKBEHIND
</pre>
A lookbehind assertion moves back a certain number of characters (not code
units) when it starts to process each of its branches. This request returns the
largest of these backward moves. The third argument should point to a uint32_t
integer. The simple assertions \b and \B require a one-character lookbehind
and cause PCRE2_INFO_MAXLOOKBEHIND to return 1 in the absence of anything
longer. \A also registers a one-character lookbehind, though it does not
actually inspect the previous character.
</P>
<P>
Note that this information is useful for multi-segment matching only
if the pattern contains no nested lookbehinds.
For example, the pattern
(?&lt;=a(?&lt;=ba)c) returns a maximum lookbehind of 2, but when it is processed, the
first lookbehind moves back by two characters, matches one character, then the
nested lookbehind also moves back by two characters. This puts the matching
point three characters earlier than it was at the start.
PCRE2_INFO_MAXLOOKBEHIND is really only useful as a debugging tool. See the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation for a discussion of multi-segment matching.
<pre>
  PCRE2_INFO_MINLENGTH
</pre>
If a minimum length for matching subject strings was computed, its value is
returned. Otherwise the returned value is 0. This value is not computed when
PCRE2_NO_START_OPTIMIZE is set. The value is a number of characters, which in
UTF mode may be different from the number of code units. The third argument
should point to a <b>uint32_t</b> variable. The value is a lower bound to the
length of any matching string. There may not be any strings of that length that
do actually match, but every string that does match is at least that long.
<pre>
  PCRE2_INFO_NAMECOUNT
  PCRE2_INFO_NAMEENTRYSIZE
  PCRE2_INFO_NAMETABLE
</pre>
PCRE2 supports the use of named as well as numbered capturing parentheses.
The
names are just an additional way of identifying the parentheses, which still
acquire numbers. Several convenience functions such as
<b>pcre2_substring_get_byname()</b> are provided for extracting captured
substrings by name. It is also possible to extract the data directly, by first
converting the name to a number in order to access the correct pointers in the
output vector (described with <b>pcre2_match()</b> below). To do the conversion,
you need to use the name-to-number map, which is described by these three
values.
</P>
<P>
The map consists of a number of fixed-size entries. PCRE2_INFO_NAMECOUNT gives
the number of entries, and PCRE2_INFO_NAMEENTRYSIZE gives the size of each
entry in code units; both of these return a <b>uint32_t</b> value. The entry
size depends on the length of the longest name.
</P>
<P>
PCRE2_INFO_NAMETABLE returns a pointer to the first entry of the table. This is
a PCRE2_SPTR pointer to a block of code units. In the 8-bit library, the first
two bytes of each entry are the number of the capturing parenthesis, most
significant byte first. In the 16-bit library, the pointer points to 16-bit
code units, the first of which contains the parenthesis number. In the 32-bit
library, the pointer points to 32-bit code units, the first of which contains
the parenthesis number.
The rest of the entry is the corresponding name, zero
terminated.
</P>
<P>
The names are in alphabetical order. If (?| is used to create multiple capture
groups with the same number, as described in the
<a href="pcre2pattern.html#dupgroupnumber">section on duplicate group numbers</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page, the groups may be given the same name, but there is only one entry in the
table. Different names for groups of the same number are not permitted.
</P>
<P>
Duplicate names for capture groups with different numbers are permitted, but
only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
they were found in the pattern. In the absence of (?| this is the order of
increasing number; when (?| is used this is not necessarily the case because
later capture groups may have lower numbers.
</P>
<P>
As a simple example of the name/number table, consider the following pattern
after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white
space - including newlines - is ignored):
<pre>
  (?&lt;date&gt; (?&lt;year&gt;(\d\d)?\d\d) - (?&lt;month&gt;\d\d) - (?&lt;day&gt;\d\d) )
</pre>
There are four named capture groups, so the table has four entries, and each
entry in the table is eight bytes long. The table is as follows, with
non-printing bytes shown in hexadecimal, and undefined bytes shown as ??:
<pre>
  00 01 d  a  t  e  00 ??
  00 05 d  a  y  00 ?? ??
  00 04 m  o  n  t  h  00
  00 02 y  e  a  r  00 ??
</pre>
When writing code to extract data from named capture groups using the
name-to-number map, remember that the length of the entries is likely to be
different for each compiled pattern.
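The entry layout described above for the 8-bit library can be walked with a
short loop. The following is a minimal sketch, not library code:
<b>group_number()</b> and <b>example_table</b> are hypothetical names, and the
table is written out by hand from the example above (with the undefined bytes
set to zero) rather than obtained from <b>pcre2_pattern_info()</b>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): look up a group number in an
   8-bit name table. The three table arguments correspond to the values of
   PCRE2_INFO_NAMETABLE, PCRE2_INFO_NAMECOUNT and PCRE2_INFO_NAMEENTRYSIZE.
   Returns 0 if the name is absent (capture group numbers start at 1). */
static uint32_t group_number(const uint8_t *table, uint32_t name_count,
                             uint32_t entry_size, const char *name)
{
    for (uint32_t i = 0; i < name_count; i++) {
        const uint8_t *entry = table + (size_t)i * entry_size;
        /* Bytes 0-1: group number, most significant byte first. */
        uint32_t number = ((uint32_t)entry[0] << 8) | entry[1];
        /* Bytes 2 onwards: the zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return number;
    }
    return 0;
}

/* The four 8-byte entries from the example above; undefined bytes are 0. */
static const uint8_t example_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0,
};
```

Because the entry size is passed in rather than fixed at eight, the same loop
works for tables from other patterns. When PCRE2_DUPNAMES is in use, this
sketch returns only the first entry that carries the name.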
<pre>
  PCRE2_INFO_NEWLINE
</pre>
The output is one of the following <b>uint32_t</b> values:
<pre>
  PCRE2_NEWLINE_CR        Carriage return (CR)
  PCRE2_NEWLINE_LF        Linefeed (LF)
  PCRE2_NEWLINE_CRLF      Carriage return, linefeed (CRLF)
  PCRE2_NEWLINE_ANY       Any Unicode line ending
  PCRE2_NEWLINE_ANYCRLF   Any of CR, LF, or CRLF
  PCRE2_NEWLINE_NUL       The NUL character (binary zero)
</pre>
This identifies the character sequence that will be recognized as meaning
"newline" while matching.
<pre>
  PCRE2_INFO_SIZE
</pre>
Return the size of the compiled pattern in bytes (for all three libraries). The
third argument should point to a <b>size_t</b> variable. This value includes the
size of the general data block that precedes the code units of the compiled
pattern itself. The value that is used when <b>pcre2_compile()</b> is getting
memory in which to place the compiled pattern may be slightly larger than the
value returned by this option, because there are cases where the code that
calculates the size has to over-estimate. Processing a pattern with the JIT
compiler does not alter the value returned by this option.
<a name="infoaboutcallouts"></a></P>
<br><a name="SEC24" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
<P>
<b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
<b>  int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
<b>  void *<i>user_data</i>);</b>
<br>
<br>
A script language that supports the use of string arguments in callouts might
like to scan all the callouts in a pattern before running the match. This can
be done by calling <b>pcre2_callout_enumerate()</b>. The first argument is a
pointer to a compiled pattern, the second points to a callback function, and
the third is arbitrary user data. The callback function is called for every
callout in the pattern in the order in which they appear. Its first argument is
a pointer to a callout enumeration block, and its second argument is the
<i>user_data</i> value that was passed to <b>pcre2_callout_enumerate()</b>. The
contents of the callout enumeration block are described in the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation, which also gives further details about callouts.
</P>
<br><a name="SEC25" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
<P>
It is possible to save compiled patterns on disc or elsewhere, and reload them
later, subject to a number of restrictions.
The host on which the patterns are
reloaded must be running the same version of PCRE2, with the same code unit
width, and must also have the same endianness, pointer width, and PCRE2_SIZE
type. Before compiled patterns can be saved, they must be converted to a
"serialized" form, which in the case of PCRE2 is really just a bytecode dump.
The functions whose names begin with <b>pcre2_serialize_</b> are used for
converting to and from the serialized form. They are described in the
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
documentation. Note that PCRE2 serialization does not convert compiled patterns
to an abstract format like Java or .NET serialization.
<a name="matchdatablock"></a></P>
<br><a name="SEC26" href="#TOC1">THE MATCH DATA BLOCK</a><br>
<P>
<b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_match_data *pcre2_match_data_create_from_pattern(</b>
<b>  const pcre2_code *<i>code</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_match_data_free(pcre2_match_data *<i>match_data</i>);</b>
</P>
<P>
Information about a successful or unsuccessful match is placed in a match
data block, which is an opaque structure that is accessed by function calls.
In
particular, the match data block contains a vector of offsets into the subject
string that define the matched parts of the subject. This is known as the
<i>ovector</i>.
</P>
<P>
Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
<b>pcre2_jit_match()</b> you must create a match data block by calling one of
the creation functions above. For <b>pcre2_match_data_create()</b>, the first
argument is the number of pairs of offsets in the <i>ovector</i>.
</P>
<P>
When using <b>pcre2_match()</b>, one pair of offsets is required to identify the
string that matched the whole pattern, with an additional pair for each
captured substring. For example, a value of 4 creates enough space to record
the matched portion of the subject plus three captured substrings.
</P>
<P>
When using <b>pcre2_dfa_match()</b> there may be multiple matched substrings of
different lengths at the same point in the subject. The ovector should be made
large enough to hold as many as are expected.
</P>
<P>
A minimum of 1 pair is imposed by <b>pcre2_match_data_create()</b>, so
it is always possible to return the overall matched string in the case of
<b>pcre2_match()</b> or the longest match in the case of
<b>pcre2_dfa_match()</b>.
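The sizing rule for <b>pcre2_match()</b> (one pair for the whole match plus one
per capture group) can be written down as a small calculation. This is a sketch
only: <b>ovector_pairs()</b> is a hypothetical helper, not a PCRE2 function,
and it assumes its input is the value reported by PCRE2_INFO_CAPTURECOUNT. The
upper bound of 65535 pairs is the one documented for
<b>pcre2_match_data_create()</b>.

```c
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): number of offset pairs to
   request for pcre2_match(), given the highest capture group number
   (the value reported by PCRE2_INFO_CAPTURECOUNT). */
static uint32_t ovector_pairs(uint32_t capture_count)
{
    /* One pair for the whole match plus one per capture group; use a
       wider type so that the addition cannot wrap around. */
    uint64_t pairs = (uint64_t)capture_count + 1;
    /* Mirror the documented upper bound of pcre2_match_data_create(). */
    if (pairs > 65535)
        pairs = 65535;
    return (uint32_t)pairs;
}
```

For a pattern with three capture groups this gives 4, matching the example
above; the result is never less than 1.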
The maximum number of pairs is 65535; if the first
argument of <b>pcre2_match_data_create()</b> is greater than this, 65535 is
used.
</P>
<P>
The second argument of <b>pcre2_match_data_create()</b> is a pointer to a
general context, which can specify custom memory management for obtaining the
memory for the match data block. If you are not using custom memory management,
pass NULL, which causes <b>malloc()</b> to be used.
</P>
<P>
For <b>pcre2_match_data_create_from_pattern()</b>, the first argument is a
pointer to a compiled pattern. The ovector is created to be exactly the right
size to hold all the substrings a pattern might capture when matched using
<b>pcre2_match()</b>. You should not use this call when matching with
<b>pcre2_dfa_match()</b>. The second argument is again a pointer to a general
context, but in this case if NULL is passed, the memory is obtained using the
same allocator that was used for the compiled pattern (custom or default).
</P>
<P>
A match data block can be used many times, with the same or different compiled
patterns.
You can extract information from a match data block after a match
operation has finished, using functions that are described in the sections on
<a href="#matchedstrings">matched strings</a>
and
<a href="#matchotherdata">other match data</a>
below.
</P>
<P>
When a call of <b>pcre2_match()</b> fails, valid data is available in the match
block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one
of the error codes for an invalid UTF string. Exactly what is available depends
on the error, and is detailed below.
</P>
<P>
When one of the matching functions is called, pointers to the compiled pattern
and the subject string are set in the match data block so that they can be
referenced by the extraction functions after a successful match. After running
a match, you must not free a compiled pattern or a subject string until after
all operations on the match data block (for that match) have taken place,
unless, in the case of the subject string, you have used the
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
"Option bits for <b>pcre2_match()</b>"
<a href="#matchoptions">below.</a>
</P>
<P>
When a match data block itself is no longer needed, it should be freed by
calling <b>pcre2_match_data_free()</b>.
If this function is called with a NULL
argument, it returns immediately, without doing anything.
</P>
<br><a name="SEC27" href="#TOC1">MEMORY USE FOR MATCH DATA BLOCKS</a><br>
<P>
<b>PCRE2_SIZE pcre2_get_match_data_size(pcre2_match_data *<i>match_data</i>);</b>
<br>
<br>
<b>PCRE2_SIZE pcre2_get_match_data_heapframes_size(</b>
<b> pcre2_match_data *<i>match_data</i>);</b>
</P>
<P>
The size of a match data block depends on the size of the ovector that it
contains. The function <b>pcre2_get_match_data_size()</b> returns the size, in
bytes, of the block that is its argument.
</P>
<P>
When <b>pcre2_match()</b> runs interpretively (that is, without using JIT), it
makes use of a vector of data frames for remembering backtracking positions.
The size of each individual frame depends on the number of capturing
parentheses in the pattern and can be obtained by calling
<b>pcre2_pattern_info()</b> with the PCRE2_INFO_FRAMESIZE option (see the
section entitled "Information about a compiled pattern"
<a href="#infoaboutpattern">above).</a>
</P>
<P>
Heap memory is used for the frames vector; if the initial memory block turns
out to be too small during matching, it is automatically expanded.
When
<b>pcre2_match()</b> returns, the memory is not freed, but remains attached to
the match data block, for use by any subsequent matches that use the same
block. It is automatically freed when the match data block itself is freed.
</P>
<P>
You can find the current size of the frames vector that a match data block owns
by calling <b>pcre2_get_match_data_heapframes_size()</b>. For a newly created
match data block the size will be zero. Some types of match may require a lot
of frames and thus a large vector; applications that run in environments where
memory is constrained can check this and free the match data block if the heap
frames vector has become too big.
</P>
<br><a name="SEC28" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
<P>
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
<b> uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>);</b>
</P>
<P>
The function <b>pcre2_match()</b> is called to match a subject string against a
compiled pattern, which is passed in the <i>code</i> argument.
You can call
<b>pcre2_match()</b> with the same <i>code</i> argument as many times as you
like, in order to find multiple matches in the subject string or to match
different subject strings with the same pattern.
</P>
<P>
This function is the main matching facility of the library, and it operates in
a Perl-like manner. For specialist use there is also an alternative matching
function, which is described
<a href="#dfamatch">below</a>
in the section about the <b>pcre2_dfa_match()</b> function.
</P>
<P>
Here is an example of a simple call to <b>pcre2_match()</b>:
<pre>
  pcre2_match_data *md = pcre2_match_data_create(4, NULL);
  int rc = pcre2_match(
    re,                         /* result of pcre2_compile() */
    (PCRE2_SPTR)"some string",  /* the subject string */
    11,                         /* the length of the subject string */
    0,                          /* start at offset 0 in the subject */
    0,                          /* default options */
    md,                         /* the match data block */
    NULL);                      /* a match context; NULL means use defaults */
</pre>
If the subject string is zero-terminated, the length can be given as
PCRE2_ZERO_TERMINATED. A match context must be provided if certain less common
matching parameters are to be changed.
For details, see the section on
<a href="#matchcontext">the match context</a>
above.
</P>
<br><b>
The string to be matched by <b>pcre2_match()</b>
</b><br>
<P>
The subject string is passed to <b>pcre2_match()</b> as a pointer in
<i>subject</i>, a length in <i>length</i>, and a starting offset in
<i>startoffset</i>. The length and offset are in code units, not characters.
That is, they are in bytes for the 8-bit library, 16-bit code units for the
16-bit library, and 32-bit code units for the 32-bit library, whether or not
UTF processing is enabled. As a special case, if <i>subject</i> is NULL and
<i>length</i> is zero, the subject is assumed to be an empty string. If
<i>length</i> is non-zero, an error occurs if <i>subject</i> is NULL.
</P>
<P>
If <i>startoffset</i> is greater than the length of the subject,
<b>pcre2_match()</b> returns PCRE2_ERROR_BADOFFSET. When the starting offset is
zero, the search for a match starts at the beginning of the subject, and this
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
must point to the start of a character, or to the end of the subject (in UTF-32
mode, one code unit equals one character, so all offsets are valid). Like the
pattern string, the subject may contain binary zeros.
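</P>
<P>
A caller can test the UTF-8 starting offset rule before calling
<b>pcre2_match()</b>. The following is a minimal sketch, not part of the PCRE2
API: the helper name <i>utf8_offset_is_valid</i> is hypothetical, and it assumes
the 8-bit library and a subject that is otherwise valid UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): reports whether startoffset
 * is acceptable for a UTF-8 subject of the given length in code units. */
bool utf8_offset_is_valid(const uint8_t *subject, size_t length,
                          size_t startoffset)
{
    /* A NULL subject is only meaningful together with a zero length. */
    assert(subject != NULL || length == 0);

    if (startoffset > length)
        return false;            /* pcre2_match() reports PCRE2_ERROR_BADOFFSET */
    if (startoffset == length)
        return true;             /* the end of the subject is always allowed */

    /* UTF-8 continuation bytes have the bit pattern 10xxxxxx; an offset
     * that lands on one points into the middle of a character. */
    return (subject[startoffset] & 0xC0) != 0x80;
}
```

For the four-byte subject "a", U+00E9 (two bytes), "b", offsets 0, 1, 3 and 4
are accepted and offset 2 is rejected.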
</P>
<P>
A non-zero starting offset is useful when searching for another match in the
same subject by calling <b>pcre2_match()</b> again after a previous success.
Setting <i>startoffset</i> differs from passing over a shortened string and
setting PCRE2_NOTBOL in the case of a pattern that begins with any kind of
lookbehind. For example, consider the pattern
<pre>
  \Biss\B
</pre>
which finds occurrences of "iss" in the middle of words. (\B matches only if
the current position in the subject is not a word boundary.) When applied to
the string "Mississippi" the first call to <b>pcre2_match()</b> finds the first
occurrence. If <b>pcre2_match()</b> is called again with just the remainder of
the subject, namely "issippi", it does not match, because \B is always false
at the start of the subject, which is deemed to be a word boundary. However, if
<b>pcre2_match()</b> is passed the entire string again, but with
<i>startoffset</i> set to 4, it finds the second occurrence of "iss" because it
is able to look behind the starting point to discover that it is preceded by a
letter.
</P>
<P>
Finding all the matches in a subject is tricky when the pattern can match an
empty string.
It is possible to emulate Perl's /g behaviour by first trying the
match again at the same offset, with the PCRE2_NOTEMPTY_ATSTART and
PCRE2_ANCHORED options, and then if that fails, advancing the starting offset
and trying an ordinary match again. There is some code that demonstrates how to
do this in the
<a href="pcre2demo.html"><b>pcre2demo</b></a>
sample program. In the most general case, you have to check to see if the
newline convention recognizes CRLF as a newline, and if so, and the current
character is CR followed by LF, advance the starting offset by two characters
instead of one.
</P>
<P>
If a non-zero starting offset is passed when the pattern is anchored, a single
attempt to match at the given offset is made. This can only succeed if the
pattern does not require the match to be at the start of the subject. In other
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A.
<a name="matchoptions"></a></P>
<br><b>
Option bits for <b>pcre2_match()</b>
</b><br>
<P>
The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be
zero.
The only bits that may be set are PCRE2_ANCHORED,
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_DISABLE_RECURSELOOP_CHECK, PCRE2_ENDANCHORED,
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
Their action is described below.
</P>
<P>
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
interpretive code in <b>pcre2_match()</b> is run.
PCRE2_DISABLE_RECURSELOOP_CHECK is ignored by JIT, but apart from PCRE2_NO_JIT
(obviously), the remaining options are supported for JIT matching.
<pre>
  PCRE2_ANCHORED
</pre>
The PCRE2_ANCHORED option limits <b>pcre2_match()</b> to matching at the first
matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
to be anchored by virtue of its contents, it cannot be made unanchored at
matching time. Note that setting the option at match time disables JIT
matching.
<pre>
  PCRE2_COPY_MATCHED_SUBJECT
</pre>
By default, a pointer to the subject is remembered in the match data block so
that, after a successful match, it can be referenced by the substring
extraction functions.
This means that the subject's memory must not be freed
until all such operations are complete. For some applications where the
lifetime of the subject string is not guaranteed, it may be necessary to make a
copy of the subject string, but it is wasteful to do this unless the match is
successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
subject is copied and the new pointer is remembered in the match data block
instead of the original subject pointer. The memory allocator that was used for
the match block itself is used. The copy is automatically freed when
<b>pcre2_match_data_free()</b> is called to free the match data block. It is also
automatically freed if the match data block is re-used for another match
operation.
<pre>
  PCRE2_DISABLE_RECURSELOOP_CHECK
</pre>
This option is relevant only to <b>pcre2_match()</b> for interpretive matching.
It is ignored when JIT is used, and is forbidden for <b>pcre2_dfa_match()</b>.
</P>
<P>
The use of recursion in patterns can lead to infinite loops. In the
interpretive matcher these would be eventually caught by the match or heap
limits, but this could take a long time and/or use a lot of memory if the
limits are large. There is therefore a check at the start of each recursion.
If the same group is still active from a previous call, and the current subject
pointer is the same as it was at the start of that group, and the furthest
inspected character of the subject has not changed, an error is generated.
</P>
<P>
There are rare cases of matches that would complete, but nevertheless trigger
this error. This option disables the check. It is provided mainly for testing
when comparing JIT and interpretive behaviour.
<pre>
  PCRE2_ENDANCHORED
</pre>
If the PCRE2_ENDANCHORED option is set, any string that <b>pcre2_match()</b>
matches must be right at the end of the subject string. Note that setting the
option at match time disables JIT matching.
<pre>
  PCRE2_NOTBOL
</pre>
This option specifies that the first character of the subject string is not the
beginning of a line, so the circumflex metacharacter should not match before
it. Setting this without having set PCRE2_MULTILINE at compile time causes
circumflex never to match. This option affects only the behaviour of the
circumflex metacharacter. It does not affect \A.
<pre>
  PCRE2_NOTEOL
</pre>
This option specifies that the end of the subject string is not the end of a
line, so the dollar metacharacter should not match it nor (except in multiline
mode) a newline immediately before it. Setting this without having set
PCRE2_MULTILINE at compile time causes dollar never to match. This option
affects only the behaviour of the dollar metacharacter. It does not affect \Z
or \z.
<pre>
  PCRE2_NOTEMPTY
</pre>
An empty string is not considered to be a valid match if this option is set. If
there are alternatives in the pattern, they are tried. If all the alternatives
match the empty string, the entire match fails. For example, if the pattern
<pre>
  a?b?
</pre>
is applied to a string not beginning with "a" or "b", it matches an empty
string at the start of the subject. With PCRE2_NOTEMPTY set, this match is not
valid, so <b>pcre2_match()</b> searches further into the string for occurrences
of "a" or "b".
<pre>
  PCRE2_NOTEMPTY_ATSTART
</pre>
This is like PCRE2_NOTEMPTY, except that it locks out an empty string match
only at the first matching position, that is, at the start of the subject plus
the starting offset. An empty string match later in the subject is permitted.
If the pattern is anchored, such a match can occur only if the pattern contains
\K.
<pre>
  PCRE2_NO_JIT
</pre>
By default, if a pattern has been successfully processed by
<b>pcre2_jit_compile()</b>, JIT is automatically used when <b>pcre2_match()</b>
is called with options that JIT supports. Setting PCRE2_NO_JIT disables the use
of JIT; it forces matching to be done by the interpreter.
<pre>
  PCRE2_NO_UTF_CHECK
</pre>
When PCRE2_UTF is set at compile time, the validity of the subject as a UTF
string is checked unless PCRE2_NO_UTF_CHECK is passed to <b>pcre2_match()</b> or
PCRE2_MATCH_INVALID_UTF was passed to <b>pcre2_compile()</b>. The latter special
case is discussed in detail in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
documentation.
</P>
<P>
In the default case, if a non-zero starting offset is given, the check is
applied only to that part of the subject that could be inspected during
matching, and there is a check that the starting offset points to the first
code unit of a character or to the end of the subject. If there are no
lookbehind assertions in the pattern, the check starts at the starting offset.
Otherwise, it starts at the length of the longest lookbehind before the
starting offset, or at the start of the subject if there are not that many
characters before the starting offset. Note that the sequences \b and \B are
one-character lookbehinds.
</P>
<P>
The check is carried out before any other processing takes place, and a
negative error code is returned if the check fails. There are several UTF error
codes for each code unit width, corresponding to different problems with the
code unit sequence. There are discussions about the validity of
<a href="pcre2unicode.html#utf8strings">UTF-8 strings,</a>
<a href="pcre2unicode.html#utf16strings">UTF-16 strings,</a>
and
<a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
documentation.
</P>
<P>
If you know that your subject is valid, and you want to skip this check for
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
<b>pcre2_match()</b>. You might want to do this for the second and subsequent
calls to <b>pcre2_match()</b> if you are making repeated calls to find multiple
matches in the same subject string.
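</P>
<P>
The rule given above for where the validity check begins when the starting
offset is non-zero can be pictured with a short sketch. It is an illustration
under stated assumptions, not the library's code: the function name
<i>utf8_check_start</i> is hypothetical, the subject is assumed to be UTF-8 (8-bit
library), and <i>max_lookbehind</i> stands for the length in characters of the
longest lookbehind in the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): returns the code unit offset
 * at which a UTF-8 validity check would begin, by stepping back up to
 * max_lookbehind characters from startoffset. */
size_t utf8_check_start(const uint8_t *subject, size_t startoffset,
                        size_t max_lookbehind)
{
    assert(subject != NULL || startoffset == 0);

    size_t pos = startoffset;
    while (max_lookbehind > 0 && pos > 0) {
        pos--;
        /* Skip back over continuation bytes (10xxxxxx) to reach the first
         * code unit of the previous character. */
        while (pos > 0 && (subject[pos] & 0xC0) == 0x80)
            pos--;
        max_lookbehind--;
    }
    return pos;   /* stops at 0 if there are not enough characters */
}
```

With no lookbehind the result is the starting offset itself; with a lookbehind
longer than the text before the starting offset, the result is zero.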
</P>
<P>
<b>Warning:</b> Unless PCRE2_MATCH_INVALID_UTF was set at compile time, when
PCRE2_NO_UTF_CHECK is set at match time the effect of passing an invalid
string as a subject, or an invalid value of <i>startoffset</i>, is undefined.
Your program may crash or loop indefinitely or give wrong results.
<pre>
  PCRE2_PARTIAL_HARD
  PCRE2_PARTIAL_SOFT
</pre>
These options turn on the partial matching feature. A partial match occurs if
the end of the subject string is reached successfully, but there are not enough
subject characters to complete the match. In addition, either at least one
character must have been inspected or the pattern must contain a lookbehind, or
the pattern must be one that could match an empty string.
</P>
<P>
If this situation arises when PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD)
is set, matching continues by testing any remaining alternatives. Only if no
complete match can be found is PCRE2_ERROR_PARTIAL returned instead of
PCRE2_ERROR_NOMATCH. In other words, PCRE2_PARTIAL_SOFT specifies that the
caller is prepared to handle a partial match, but only if no complete match can
be found.
</P>
<P>
If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT.
In this case, if
a partial match is found, <b>pcre2_match()</b> immediately returns
PCRE2_ERROR_PARTIAL, without considering any other alternatives. In other
words, when PCRE2_PARTIAL_HARD is set, a partial match is considered to be more
important than an alternative complete match.
</P>
<P>
There is a more detailed discussion of partial and multi-segment matching, with
examples, in the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation.
</P>
<br><a name="SEC29" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
<P>
When PCRE2 is built, a default newline convention is set; this is usually the
standard convention for the operating system. The default can be overridden in
a
<a href="#compilecontext">compile context</a>
by calling <b>pcre2_set_newline()</b>. It can also be overridden by starting a
pattern string with, for example, (*CRLF), as described in the
<a href="pcre2pattern.html#newlines">section on newline conventions</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page. During matching, the newline choice affects the behaviour of the dot,
circumflex, and dollar metacharacters. It may also alter the way the match
starting position is advanced after a match failure for an unanchored pattern.
</P>
<P>
When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is set as
the newline convention, and a match attempt for an unanchored pattern fails
when the current starting position is at a CRLF sequence, and the pattern
contains no explicit matches for CR or LF characters, the match position is
advanced by two characters instead of one, in other words, to after the CRLF.
</P>
<P>
The above rule is a compromise that makes the most common cases work as
expected. For example, if the pattern is .+A (and the PCRE2_DOTALL option is
not set), it does not match the string "\r\nA" because, after failing at the
start, it skips both the CR and the LF before retrying. However, the pattern
[\r\n]A does match that string, because it contains an explicit CR or LF
reference, and so advances only by one character after the first failure.
</P>
<P>
An explicit match for CR or LF is either a literal appearance of one of those
characters in the pattern, or one of the \r or \n or equivalent octal or
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
does \s, even though it includes CR and LF in the characters that it matches.
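</P>
<P>
The advance rule described above can be sketched as follows. This is an
illustration only, not the library's implementation: the function name and both
boolean parameters are hypothetical stand-ins for information that PCRE2 derives
from the newline convention and the compiled pattern, and the sketch assumes the
8-bit library without UTF, so that one character is one code unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE2 function): returns the next starting
 * position after a failed match attempt at pos for an unanchored pattern. */
size_t next_start_after_failure(const char *subject, size_t length, size_t pos,
                                bool crlf_is_newline,
                                bool pattern_has_explicit_cr_or_lf)
{
    assert(subject != NULL && pos < length);

    /* Skip the whole CRLF only when the convention treats CRLF as a newline
     * and the pattern contains no explicit CR or LF match. */
    if (crlf_is_newline && !pattern_has_explicit_cr_or_lf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;

    return pos + 1;
}
```

For the subject "\r\nA", a pattern such as .+A falls in the first case and the
next attempt starts at offset 2, whereas [\r\n]A contains an explicit reference
and the next attempt starts at offset 1.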
</P>
<P>
Notwithstanding the above, anomalous effects may still occur when CRLF is a
valid newline sequence and explicit \r or \n escapes appear in the pattern.
<a name="matchedstrings"></a></P>
<br><a name="SEC30" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
<P>
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
<br>
<br>
<b>PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *<i>match_data</i>);</b>
</P>
<P>
In general, a pattern matches a certain portion of the subject, and in
addition, further substrings from the subject may be picked out by
parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's
book, this is called "capturing" in what follows, and the phrase "capture
group" (Perl terminology) is used for a fragment of a pattern that picks out a
substring. PCRE2 supports several other kinds of parenthesized group that do
not cause substrings to be captured. The <b>pcre2_pattern_info()</b> function
can be used to find out how many capture groups there are in a compiled
pattern.
</P>
<P>
You can use auxiliary functions for accessing captured substrings
<a href="#extractbynumber">by number</a>
or
<a href="#extractbyname">by name,</a>
as described in sections below.
</P>
<P>
Alternatively, you can make direct use of the vector of PCRE2_SIZE values,
called the <b>ovector</b>, which contains the offsets of captured strings. It is
part of the
<a href="#matchdatablock">match data block.</a>
The function <b>pcre2_get_ovector_pointer()</b> returns the address of the
ovector, and <b>pcre2_get_ovector_count()</b> returns the number of pairs of
values it contains.
</P>
<P>
Within the ovector, the first in each pair of values is set to the offset of
the first code unit of a substring, and the second is set to the offset of the
first code unit after the end of a substring. These values are always code unit
offsets, not character offsets. That is, they are byte offsets in the 8-bit
library, 16-bit offsets in the 16-bit library, and 32-bit offsets in the 32-bit
library.
</P>
<P>
After a partial match (error return PCRE2_ERROR_PARTIAL), only the first pair
of offsets (that is, <i>ovector[0]</i> and <i>ovector[1]</i>) is set.
They
identify the part of the subject that was partially matched. See the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation for details of partial matching.
</P>
<P>
After a fully successful match, the first pair of offsets identifies the
portion of the subject string that was matched by the entire pattern. The next
pair is used for the first captured substring, and so on. The value returned by
<b>pcre2_match()</b> is one more than the highest numbered pair that has been
set. For example, if two substrings have been captured, the returned value is
3. If there are no captured substrings, the return value from a successful
match is 1, indicating that just the first pair of offsets has been set.
</P>
<P>
If a pattern uses the \K escape sequence within a positive assertion, the
reported start of a successful match can be greater than the end of the match.
For example, if the pattern (?=ab\K) is matched against "ab", the start and
end offset values for the match are 2 and 0.
</P>
<P>
If a capture group is matched repeatedly within a single match operation, it is
the last portion of the subject that it matched that is returned.
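</P>
<P>
The relationship between the return value and the offset pairs can be sketched
as follows. This is a minimal illustration, not a PCRE2 function: the name
<i>captured_length</i> is hypothetical, a plain <i>size_t</i> array stands in for
the vector returned by <b>pcre2_get_ovector_pointer()</b>, and SIZE_MAX stands in
for PCRE2_UNSET, on the assumption that PCRE2_SIZE is <i>size_t</i>. A zero or
negative return from <b>pcre2_match()</b> is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): returns the length in code
 * units of pair n after a successful match that returned rc, or SIZE_MAX
 * if the pair cannot be used as a substring. */
size_t captured_length(const size_t *ovector, int rc, int n)
{
    assert(ovector != NULL);

    /* Only pairs 0 .. rc-1 are covered by the return value. */
    if (rc < 1 || n < 0 || n >= rc)
        return SIZE_MAX;

    size_t start = ovector[2 * n];
    size_t end = ovector[2 * n + 1];

    /* A group below rc may still be unset (both offsets hold the unset
     * value, SIZE_MAX in this sketch). */
    if (start == SIZE_MAX || end == SIZE_MAX)
        return SIZE_MAX;

    /* \K inside a positive assertion can report start > end. */
    if (start > end)
        return SIZE_MAX;

    return end - start;
}
```

The <b>pcre2demo</b> sample program performs a similar check for a start offset
that is greater than the end offset before using the first pair.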
</P>
<P>
If the ovector is too small to hold all the captured substring offsets, as much
as possible is filled in, and the function returns a value of zero. If captured
substrings are not of interest, <b>pcre2_match()</b> may be called with a match
data block whose ovector is of minimum length (that is, one pair).
</P>
<P>
It is possible for capture group number <i>n+1</i> to match some part of the
subject when group <i>n</i> has not been used at all. For example, if the string
"abc" is matched against the pattern (a|(z))(bc) the return from the function
is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
values in the offset pairs corresponding to unused groups are set to
PCRE2_UNSET.
</P>
<P>
Offset values that correspond to unused groups at the end of the expression are
also set to PCRE2_UNSET. For example, if the string "abc" is matched against
the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the
function is 2, because the highest used capture group number is 1. The offsets
for the second and third capture groups (assuming the vector is large enough,
of course) are set to PCRE2_UNSET.
</P>
<P>
Elements in the ovector that do not correspond to capturing parentheses in the
pattern are never changed.
That is, if a pattern contains <i>n</i> capturing
parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
<b>pcre2_match()</b>. The other elements retain whatever values they previously
had. After a failed match attempt, the contents of the ovector are unchanged.
<a name="matchotherdata"></a></P>
<br><a name="SEC31" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
<P>
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
<br>
<br>
<b>PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *<i>match_data</i>);</b>
</P>
<P>
As well as the offsets in the ovector, other information about a match is
retained in the match data block and can be retrieved by the above functions in
appropriate circumstances. If they are called at other times, the result is
undefined.
</P>
<P>
After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
to match (PCRE2_ERROR_NOMATCH), a mark name may be available. The function
<b>pcre2_get_mark()</b> can be called to access this name, which can be
specified in the pattern by any of the backtracking control verbs, not just
(*MARK). The same function applies to all the verbs. It returns a pointer to
the zero-terminated name, which is within the compiled pattern. If no name is
available, NULL is returned.
The length of the name (excluding the terminating
zero) is stored in the code unit that precedes the name. You should use this
length instead of relying on the terminating zero if the name might contain a
binary zero.
</P>
<P>
After a successful match, the name that is returned is the last mark name
encountered on the matching path through the pattern. Instances of backtracking
verbs without names do not count. Thus, for example, if the matching path
contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
partial match, the last encountered name is returned. For example, consider
this pattern:
<pre>
  ^(*MARK:A)((*MARK:B)a|b)c
</pre>
When it matches "bc", the returned name is A. The B mark is "seen" in the first
branch of the group, but it is not on the matching path. On the other hand,
when this pattern fails to match "bx", the returned name is B.
</P>
<P>
<b>Warning:</b> By default, certain start-of-match optimizations are used to
give a fast "no match" result in some situations. For example, if the anchoring
is removed from the pattern above, there is an initial check for the presence
of "c" in the subject before running the matching engine. This check fails for
"bx", causing a match failure without seeing any marks.
You can disable the
start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for
<b>pcre2_compile()</b> or by starting the pattern with (*NO_START_OPT).
</P>
<P>
After a successful match, a partial match, or one of the invalid UTF errors
(for example, PCRE2_ERROR_UTF8_ERR5), <b>pcre2_get_startchar()</b> can be
called. After a successful or partial match it returns the code unit offset of
the character at which the match started. For a non-partial match, this can be
different to the value of <i>ovector[0]</i> if the pattern contains the \K
escape sequence. After a partial match, however, this value is always the same
as <i>ovector[0]</i> because \K does not affect the result of a partial match.
</P>
<P>
After a UTF check failure, <b>pcre2_get_startchar()</b> can be used to obtain
the code unit offset of the invalid UTF character. Details are given in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
page.
<a name="errorlist"></a></P>
<br><a name="SEC32" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
<P>
If <b>pcre2_match()</b> fails, it returns a negative number.
This can be
converted to a text string by calling the <b>pcre2_get_error_message()</b>
function (see "Obtaining a textual error message"
<a href="#geterrormessage">below).</a>
Negative error codes are also returned by other functions, and are documented
with them. The codes are given names in the header file. If UTF checking is in
force and an invalid UTF subject string is detected, one of a number of
UTF-specific negative error codes is returned. Details are given in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
page. The following are the other errors that may be returned by
<b>pcre2_match()</b>:
<pre>
  PCRE2_ERROR_NOMATCH
</pre>
The subject string did not match the pattern.
<pre>
  PCRE2_ERROR_PARTIAL
</pre>
The subject string did not match, but it did match partially. See the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation for details of partial matching.
<pre>
  PCRE2_ERROR_BADMAGIC
</pre>
PCRE2 stores a 4-byte "magic number" at the start of the compiled code, to
catch the case when it is passed a junk pointer. This is the error that is
returned when the magic number is not present.
<pre>
  PCRE2_ERROR_BADMODE
</pre>
This error is given when a compiled pattern is passed to a function in a
library of a different code unit width, for example, a pattern compiled by
the 8-bit library is passed to a 16-bit or 32-bit library function.
<pre>
  PCRE2_ERROR_BADOFFSET
</pre>
The value of <i>startoffset</i> was greater than the length of the subject.
<pre>
  PCRE2_ERROR_BADOPTION
</pre>
An unrecognized bit was set in the <i>options</i> argument.
<pre>
  PCRE2_ERROR_BADUTFOFFSET
</pre>
The UTF code unit sequence that was passed as a subject was checked and found
to be valid (the PCRE2_NO_UTF_CHECK option was not set), but the value of
<i>startoffset</i> did not point to the beginning of a UTF character or the end
of the subject.
<pre>
  PCRE2_ERROR_CALLOUT
</pre>
This error is never generated by <b>pcre2_match()</b> itself. It is provided for
use by callout functions that want to cause <b>pcre2_match()</b> or
<b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation for details.
<pre>
  PCRE2_ERROR_DEPTHLIMIT
</pre>
The nested backtracking depth limit was reached.
<pre>
  PCRE2_ERROR_HEAPLIMIT
</pre>
The heap limit was reached.
<pre>
  PCRE2_ERROR_INTERNAL
</pre>
An unexpected internal error has occurred. This error could be caused by a bug
in PCRE2 or by overwriting of the compiled pattern.
<pre>
  PCRE2_ERROR_JIT_STACKLIMIT
</pre>
This error is returned when a pattern that was successfully studied using JIT
is being matched, but the memory available for the just-in-time processing
stack is not large enough. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details.
<pre>
  PCRE2_ERROR_MATCHLIMIT
</pre>
The backtracking match limit was reached.
<pre>
  PCRE2_ERROR_NOMEMORY
</pre>
Heap memory is used to remember backtracking points. This error is given when
the memory allocation function (default or custom) fails. Note that a different
error, PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed exceeds
the heap limit.
PCRE2_ERROR_NOMEMORY is also returned if
PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
<pre>
  PCRE2_ERROR_NULL
</pre>
Either the <i>code</i>, <i>subject</i>, or <i>match_data</i> argument was passed
as NULL.
<pre>
  PCRE2_ERROR_RECURSELOOP
</pre>
This error is returned when <b>pcre2_match()</b> detects a recursion loop within
the pattern. Specifically, it means that either the whole pattern or a
capture group has been called recursively for the second time at the same
position in the subject string. Some simple patterns that might do this are
detected and faulted at compile time, but more complicated cases, in particular
mutual recursions between two different groups, cannot be detected until
matching is attempted.
<a name="geterrormessage"></a></P>
<br><a name="SEC33" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
<P>
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
<b> PCRE2_SIZE <i>bufflen</i>);</b>
</P>
<P>
A text message for an error code from any PCRE2 function (compile, match, or
auxiliary) can be obtained by calling <b>pcre2_get_error_message()</b>.
The code
is passed as the first argument, with the remaining two arguments specifying a
code unit buffer and its length in code units, into which the text message is
placed. The message is returned in code units of the appropriate width for the
library that is being used.
</P>
<P>
The returned message is terminated with a trailing zero, and the function
returns the number of code units used, excluding the trailing zero. If the
error number is unknown, the negative error code PCRE2_ERROR_BADDATA is
returned. If the buffer is too small, the message is truncated (but still with
a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
None of the messages are very long; a buffer size of 120 code units is ample.
<a name="extractbynumber"></a></P>
<br><a name="SEC34" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
<P>
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
<br>
<br>
<b>int pcre2_substring_copy_bynumber(pcre2_match_data *<i>match_data</i>,</b>
<b> uint32_t <i>number</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
<b> PCRE2_SIZE *<i>bufflen</i>);</b>
<br>
<br>
<b>int pcre2_substring_get_bynumber(pcre2_match_data *<i>match_data</i>,</b>
<b> uint32_t <i>number</i>, PCRE2_UCHAR **<i>bufferptr</i>,</b>
<b> PCRE2_SIZE *<i>bufflen</i>);</b>
<br>
<br>
<b>void pcre2_substring_free(PCRE2_UCHAR *<i>buffer</i>);</b>
</P>
<P>
Captured substrings can be accessed directly by using the ovector as described
<a href="#matchedstrings">above.</a>
For convenience, auxiliary functions are provided for extracting captured
substrings as new, separate, zero-terminated strings. A substring that contains
a binary zero is correctly extracted and has a further zero added on the end,
but the result is not, of course, a C string.
</P>
<P>
The functions in this section identify substrings by number.
The number zero
refers to the entire matched substring, with higher numbers referring to
substrings captured by parenthesized groups. After a partial match, only
substring zero is available. An attempt to extract any other substring gives
the error PCRE2_ERROR_PARTIAL. The next section describes similar functions for
extracting captured substrings by name.
</P>
<P>
If a pattern uses the \K escape sequence within a positive assertion, the
reported start of a successful match can be greater than the end of the match.
For example, if the pattern (?=ab\K) is matched against "ab", the start and
end offset values for the match are 2 and 0. In this situation, calling these
functions with a zero substring number extracts a zero-length empty string.
</P>
<P>
You can find the length in code units of a captured substring without
extracting it by calling <b>pcre2_substring_length_bynumber()</b>. The first
argument is a pointer to the match data block, the second is the group number,
and the third is a pointer to a variable into which the length is placed. If
you just want to know whether or not the substring has been captured, you can
pass the third argument as NULL.
</P>
<P>
The <b>pcre2_substring_copy_bynumber()</b> function copies a captured substring
into a supplied buffer, whereas <b>pcre2_substring_get_bynumber()</b> copies it
into new memory, obtained using the same memory allocation function that was
used for the match data block. The first two arguments of these functions are a
pointer to the match data block and a capture group number.
</P>
<P>
The final arguments of <b>pcre2_substring_copy_bynumber()</b> are a pointer to
the buffer and a pointer to a variable that contains its length in code units.
This is updated to contain the actual number of code units used for the
extracted substring, excluding the terminating zero.
</P>
<P>
For <b>pcre2_substring_get_bynumber()</b> the third and fourth arguments point
to variables that are updated with a pointer to the new memory and the number
of code units that comprise the substring, again excluding the terminating
zero. When the substring is no longer needed, the memory should be freed by
calling <b>pcre2_substring_free()</b>.
</P>
<P>
The return value from all these functions is zero for success, or a negative
error code. If the pattern match failed, the match failure code is returned.
If a substring number greater than zero is used after a partial match,
PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
<pre>
  PCRE2_ERROR_NOMEMORY
</pre>
The buffer was too small for <b>pcre2_substring_copy_bynumber()</b>, or the
attempt to get memory failed for <b>pcre2_substring_get_bynumber()</b>.
<pre>
  PCRE2_ERROR_NOSUBSTRING
</pre>
There is no substring with that number in the pattern, that is, the number is
greater than the number of capturing parentheses.
<pre>
  PCRE2_ERROR_UNAVAILABLE
</pre>
The substring number, though not greater than the number of captures in the
pattern, is greater than the number of slots in the ovector, so the substring
could not be captured.
<pre>
  PCRE2_ERROR_UNSET
</pre>
The substring did not participate in the match. For example, if the pattern is
(abc)|(def) and the subject is "def", and the ovector contains at least two
capturing slots, substring number 1 is unset.
</P>
<br><a name="SEC35" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
<P>
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
<br>
<br>
<b>void pcre2_substring_list_free(PCRE2_UCHAR **<i>list</i>);</b>
</P>
<P>
The <b>pcre2_substring_list_get()</b> function extracts all available substrings
and builds a list of pointers to them. It also (optionally) builds a second
list that contains their lengths (in code units), excluding a terminating zero
that is added to each of them. All this is done in a single block of memory
that is obtained using the same memory allocation function that was used to get
the match data block.
</P>
<P>
This function must be called only after a successful match. If called after a
partial match, the error code PCRE2_ERROR_PARTIAL is returned.
</P>
<P>
The address of the memory block is returned via <i>listptr</i>, which is also
the start of the list of string pointers. The end of the list is marked by a
NULL pointer. The address of the list of lengths is returned via
<i>lengthsptr</i>.
If your strings do not contain binary zeros and you do not
therefore need the lengths, you may supply NULL as the <i>lengthsptr</i>
argument to disable the creation of a list of lengths. The yield of the
function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the memory block
could not be obtained. When the list is no longer needed, it should be freed by
calling <b>pcre2_substring_list_free()</b>.
</P>
<P>
If this function encounters a substring that is unset, which can happen when
capture group number <i>n+1</i> matches some part of the subject, but group
<i>n</i> has not been used at all, it returns an empty string. This can be
distinguished from a genuine zero-length substring by inspecting the
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
<a name="extractbyname"></a></P>
<br><a name="SEC36" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
<P>
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>);</b>
<br>
<br>
<b>int pcre2_substring_length_byname(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_SIZE *<i>length</i>);</b>
<br>
<br>
<b>int pcre2_substring_copy_byname(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_UCHAR *<i>buffer</i>, PCRE2_SIZE *<i>bufflen</i>);</b>
<br>
<br>
<b>int pcre2_substring_get_byname(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_UCHAR **<i>bufferptr</i>, PCRE2_SIZE *<i>bufflen</i>);</b>
<br>
<br>
<b>void pcre2_substring_free(PCRE2_UCHAR *<i>buffer</i>);</b>
</P>
<P>
To extract a substring by name, you first have to find the associated number.
For example, for this pattern:
<pre>
  (a+)b(?&lt;xxx&gt;\d+)...
</pre>
the number of the capture group called "xxx" is 2. If the name is known to be
unique (PCRE2_DUPNAMES was not set), you can find the number from the name by
calling <b>pcre2_substring_number_from_name()</b>.
The first argument is the
compiled pattern, and the second is the name. The yield of the function is the
group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
Given the number, you can extract the substring directly from the ovector, or
use one of the "bynumber" functions described above.
</P>
<P>
For convenience, there are also "byname" functions that correspond to the
"bynumber" functions, the only difference being that the second argument is a
name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
names, these functions scan all the groups with the given name, and return the
captured substring from the first named group that is set.
</P>
<P>
If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
returned. If all groups with the name have numbers that are greater than the
number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there
is at least one group with a slot in the ovector, but no group is found to be
set, PCRE2_ERROR_UNSET is returned.
</P>
<P>
<b>Warning:</b> If the pattern uses the (?| feature to set up multiple
capture groups with the same number, as described in the
<a href="pcre2pattern.html#dupgroupnumber">section on duplicate group numbers</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page, you cannot use names to distinguish the different capture groups, because
names are not included in the compiled code. The matching process uses only
numbers. For this reason, the use of different names for groups with the
same number causes an error at compile time.
<a name="substitutions"></a></P>
<br><a name="SEC37" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
<P>
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
<b> uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>, PCRE2_SPTR <i>replacement</i>,</b>
<b> PCRE2_SIZE <i>rlength</i>, PCRE2_UCHAR *<i>outputbuffer</i>,</b>
<b> PCRE2_SIZE *<i>outlengthptr</i>);</b>
</P>
<P>
This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
subject string in <i>outputbuffer</i>, replacing parts that were matched with
the <i>replacement</i> string, whose length is
supplied in <i>rlength</i>, which
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. As a
special case, if <i>replacement</i> is NULL and <i>rlength</i> is zero, the
replacement is assumed to be an empty string. If <i>rlength</i> is non-zero, an
error occurs if <i>replacement</i> is NULL.
</P>
<P>
There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just
the replacement string(s). The default action is to perform just one
replacement if the pattern matches, but there is an option that requests
multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below).
</P>
<P>
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
that were carried out. This may be zero if no match was found, and is never
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
returned if an error is detected.
</P>
<P>
Matches in which a \K item in a lookahead in the pattern causes the match to
end before it starts are not supported, and give rise to an error return. For
global replacements, matches in which \K in a lookbehind causes the match to
start earlier than the point that was reached in the previous iteration are
also not supported.
</P>
<P>
The first seven arguments of <b>pcre2_substitute()</b> are the same as for
<b>pcre2_match()</b>, except that the partial matching options are not
permitted, and <i>match_data</i> may be passed as NULL, in which case a match
data block is obtained and freed within this function, using memory management
functions from the match context, if provided, or else those that were used to
allocate memory for the compiled code.
</P>
<P>
If <i>match_data</i> is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
provided block is used for all calls to <b>pcre2_match()</b>, and its contents
afterwards are the result of the final call. For global changes, this will
always be a no-match error. The contents of the ovector within the match data
block may or may not have been changed.
</P>
<P>
As well as the usual options for <b>pcre2_match()</b>, a number of additional
options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
<i>match_data</i> block must be provided, and it must have already been used for
an external call to <b>pcre2_match()</b> with the same pattern and subject
arguments.
The data in the <i>match_data</i> block (return code, offset vector)
is then used for the first substitution instead of calling <b>pcre2_match()</b>
from within <b>pcre2_substitute()</b>. This allows an application to check for a
match before choosing to substitute, without having to repeat the match.
</P>
<P>
The contents of the externally supplied match data block are not changed when
PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
<b>pcre2_match()</b> is called after the first substitution to check for further
matches, but this is done using an internally obtained match data block, thus
always leaving the external block unchanged.
</P>
<P>
The <i>code</i> argument is not used for matching before the first substitution
when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when
PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the
UTF setting and the number of capturing parentheses in the pattern.
</P>
<P>
The default action of <b>pcre2_substitute()</b> is to return a copy of the
subject string with matched substrings replaced. However, if
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
returned. In the global case, multiple replacements are concatenated in the
output buffer.
Substitution callouts (see
<a href="#subcallouts">below</a>)
can be used to separate them if necessary.
</P>
<P>
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
variable that contains the length, in code units, of the output buffer. If the
function is successful, the value is updated to contain the length in code
units of the new string, excluding the trailing zero that is automatically
added.
</P>
<P>
If the function is not successful, the value set via <i>outlengthptr</i> depends
on the type of error. For syntax errors in the replacement string, the value is
the offset in the replacement string where the error was detected. For other
errors, the value is PCRE2_UNSET by default. This includes the case of the
output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
</P>
<P>
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
this option is set, however, <b>pcre2_substitute()</b> continues to go through
the motions of matching and substituting (without, of course, writing anything)
in order to compute the size of buffer that is needed.
This value is passed
back via the <i>outlengthptr</i> variable, with the result of the function still
being PCRE2_ERROR_NOMEMORY.
</P>
<P>
Passing a buffer size of zero is a permitted way of finding out how much memory
is needed for a given substitution. However, this does mean that the entire
operation is carried out twice. Depending on the application, it may be more
efficient to allocate a large buffer and free the excess afterwards, instead of
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
</P>
<P>
The replacement string, which is interpreted as a UTF string in UTF mode, is
checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF
replacement string causes an immediate return with the relevant UTF error code.
</P>
<P>
If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted
in any way. By default, however, a dollar character is an escape character that
can specify the insertion of characters from capture groups and names from
(*MARK) or other control verbs in the pattern. Dollar is the only escape
character (backslash is treated as literal).
The following forms are always
recognized:
<pre>
  $$                  insert a dollar character
  $<n> or ${<n>}      insert the contents of group <n>
  $*MARK or ${*MARK}  insert a control verb name
</pre>
Either a group number or a group name can be given for <n>. Curly brackets are
required only if the following character would be interpreted as part of the
number or name. The number may be zero to include the entire matched string.
For example, if the pattern a(b)c is matched with "=abc=" and the replacement
string "+$1$0$1+", the result is "=+babcb+=".
</P>
<P>
$*MARK inserts the name from the last encountered backtracking control verb on
the matching path that has a name. (*MARK) must always include a name, but the
other verbs need not. For example, in the case of (*MARK:A)(*PRUNE) the name
inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
facility can be used to perform simple simultaneous substitutions, as this
<b>pcre2test</b> example shows:
<pre>
  /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
      apple lemon
   2: pear orange
</pre>
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
replacing every matching substring.
If this option is not set, only the first
matching substring is replaced. The search for matches takes place in the
original subject string (that is, previous replacements do not affect it).
Iteration is implemented by advancing the <i>startoffset</i> value for each
search, which is always passed the entire subject string. If an offset limit is
set in the match context, searching stops when that limit is reached.
</P>
<P>
You can restrict the effect of a global substitution to a portion of the
subject string by setting either or both of <i>startoffset</i> and an offset
limit. Here is a <b>pcre2test</b> example:
<pre>
  /B/g,replace=!,use_offset_limit
  ABC ABC ABC ABC\=offset=3,offset_limit=12
   2: ABC A!C A!C ABC
</pre>
When continuing with global substitutions after matching a substring with zero
length, an attempt to find a non-empty match at the same offset is performed.
If this is not successful, the offset is advanced by one character except when
CRLF is a valid newline sequence and the next two characters are CR, LF. In
this case, the offset is advanced by two characters.
</P>
<P>
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
not appear in the pattern to be treated as unset groups.
This option should be
used with care, because it means that a typo in a group name or number no
longer causes the PCRE2_ERROR_NOSUBSTRING error.
</P>
<P>
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including unknown
groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
strings when inserted as described above. If this option is not set, an attempt
to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does
not influence the extended substitution syntax described below.
</P>
<P>
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
replacement string. Without this option, only the dollar character is special,
and only the group insertion forms listed above are valid. When
PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
</P>
<P>
Firstly, backslash in a replacement string is interpreted as an escape
character. The usual forms such as \n or \x{ddd} can be used to specify
particular character codes, and backslash followed by any non-alphanumeric
character quotes that character. Extended quoting can be coded using \Q...\E,
exactly as in pattern strings.
</P>
<P>
There are also four escape sequences for forcing the case of inserted letters.
The insertion mechanism has three states: no case forcing, force upper case,
and force lower case. The escape sequences change the current state: \U and
\L change to upper or lower case forcing, respectively, and \E (when not
terminating a \Q quoted sequence) reverts to no case forcing. The sequences
\u and \l force the next character (if it is a letter) to upper or lower
case, respectively, and then the state automatically reverts to no case
forcing. Case forcing applies to all inserted characters, including those from
capture groups and letters within \Q...\E quoted sequences. If either
PCRE2_UTF or PCRE2_UCP was set when the pattern was compiled, Unicode
properties are used for case forcing characters whose code points are greater
than 127.
</P>
<P>
Note that case forcing sequences such as \U...\E do not nest. For example,
the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final \E has no
effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do
not apply to replacement strings.
</P>
<P>
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
flexibility to capture group substitution.
The syntax is similar to that used
by Bash:
<pre>
  ${<n>:-<string>}
  ${<n>:+<string1>:<string2>}
</pre>
As before, <n> may be a group number or a name. The first form specifies a
default value. If group <n> is set, its value is inserted; if not, <string> is
expanded and the result inserted. The second form specifies strings that are
expanded and inserted when group <n> is set or unset, respectively. The first
form is just a convenient shorthand for
<pre>
  ${<n>:+${<n>}:<string>}
</pre>
Backslash can be used to escape colons and closing curly brackets in the
replacement strings. A change of the case forcing state within a replacement
string remains in force afterwards, as shown in this <b>pcre2test</b> example:
<pre>
  /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
      body
   1: hello
      somebody
   1: HELLO
</pre>
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
groups in the extended syntax forms to be treated as unset.
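The case forcing states that these examples rely on can be modelled in a few
lines of standalone C. This is only a sketch of the rules described above, not
the library's implementation: apply_case_forcing() is a hypothetical name, only
\U, \L, \E, \u and \l are handled (everything else is copied unchanged), the
input is assumed to be ASCII, and \u or \l is assumed to expire after the next
character whether or not it is a letter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum force { FORCE_NONE, FORCE_UPPER, FORCE_LOWER };

/* Copies in to out while applying \U, \L, \E, \u and \l.
   Returns the output length excluding the trailing zero,
   or (size_t)-1 if out is too small. */
static size_t apply_case_forcing(const char *in, char *out, size_t outsize)
{
  enum force state = FORCE_NONE;
  int one_shot = 0;   /* set by \u and \l: revert after one character */
  size_t n = 0;

  assert(in != NULL && out != NULL);
  for (; *in != '\0'; in++) {
    if (in[0] == '\\') {
      int handled = 1;
      switch (in[1]) {
        case 'U': state = FORCE_UPPER; one_shot = 0; break;
        case 'L': state = FORCE_LOWER; one_shot = 0; break;
        case 'E': state = FORCE_NONE;  one_shot = 0; break;
        case 'u': state = FORCE_UPPER; one_shot = 1; break;
        case 'l': state = FORCE_LOWER; one_shot = 1; break;
        default:  handled = 0; break;
      }
      if (handled) { in++; continue; }   /* skip the escape letter too */
    }

    if (n + 1 >= outsize) return (size_t)-1;  /* keep room for the zero */
    unsigned char c = (unsigned char)*in;
    if (state == FORCE_UPPER) c = (unsigned char)toupper(c);
    else if (state == FORCE_LOWER) c = (unsigned char)tolower(c);
    out[n++] = (char)c;

    if (one_shot) { state = FORCE_NONE; one_shot = 0; }
  }
  if (n >= outsize) return (size_t)-1;
  out[n] = '\0';
  return n;
}
```

Because the state is a single variable rather than a stack, a second \E after
the state has already been reset has nothing left to undo, which is why the
sequences do not nest.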
</P>
<P>
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
are ignored.
</P>
<br><b>
Substitution errors
</b><br>
<P>
In the event of an error, <b>pcre2_substitute()</b> returns a negative error
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
<b>pcre2_match()</b> are passed straight back.
</P>
<P>
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
</P>
<P>
PCRE2_ERROR_UNSET is returned for an unset substring insertion (including an
unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) when the simple
(non-extended) syntax is used and PCRE2_SUBSTITUTE_UNSET_EMPTY is not set.
</P>
<P>
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough. If the
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is
needed is returned via <i>outlengthptr</i>. Note that this does not happen by
default.
</P>
<P>
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
<i>match_data</i> argument is NULL or if the <i>subject</i> or <i>replacement</i>
arguments are NULL. For backward compatibility reasons an exception is made for
the <i>replacement</i> argument if the <i>rlength</i> argument is also 0.
</P>
<P>
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
not found), PCRE2_ERROR_BADSUBSTITUTION (syntax error in extended group
substitution), and PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before
it started or the match started earlier than the current position in the
subject, which can happen if \K is used in an assertion).
</P>
<P>
As for all PCRE2 errors, a text message that describes the error can be
obtained by calling the <b>pcre2_get_error_message()</b> function (see
"Obtaining a textual error message"
<a href="#geterrormessage">above</a>).
<a name="subcallouts"></a></P>
<br><b>
Substitution callouts
</b><br>
<P>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
<br>
<br>
The <b>pcre2_set_substitute_callout()</b> function can be used to specify a
callout function for <b>pcre2_substitute()</b>. This information is passed in
a match context. The callout function is called after each substitution has
been processed, but it can cause the replacement not to happen. The callout
function is not called for simulated substitutions that happen as a result of
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
</P>
<P>
The first argument of the callout function is a pointer to a substitute callout
block structure, which contains the following fields, not necessarily in this
order:
<pre>
  uint32_t    <i>version</i>;
  uint32_t    <i>subscount</i>;
  PCRE2_SPTR  <i>input</i>;
  PCRE2_SPTR  <i>output</i>;
  PCRE2_SIZE <i>*ovector</i>;
  uint32_t    <i>oveccount</i>;
  PCRE2_SIZE  <i>output_offsets[2]</i>;
</pre>
The <i>version</i> field contains the version number of the block format. The
current version is 0. The version number will increase in future if more fields
are added, but the intention is never to remove any of the existing fields.
</P>
<P>
The <i>subscount</i> field is the number of the current match. It is 1 for the
first callout, 2 for the second, and so on. The <i>input</i> and <i>output</i>
pointers are copies of the values passed to <b>pcre2_substitute()</b>.
</P>
<P>
The <i>ovector</i> field points to the ovector, which contains the result of the
most recent match. The <i>oveccount</i> field contains the number of pairs that
are set in the ovector, and is always greater than zero.
</P>
<P>
The <i>output_offsets</i> vector contains the offsets of the replacement in the
output string. This has already been processed for dollar and (if requested)
backslash substitutions as described above.
</P>
<P>
The second argument of the callout function is the value passed as
<i>callout_data</i> when the function was registered. The value returned by the
callout function is interpreted as follows:
</P>
<P>
If the value is zero, the replacement is accepted, and, if
PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
match. If the value is not zero, the current replacement is not accepted. If
the value is greater than zero, processing continues when
PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
output and the call to <b>pcre2_substitute()</b> exits, returning the number of
matches so far.
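These rules reduce to a small decision table, sketched here in standalone C.
The enum sub_action and the function interpret_callout_result() are
hypothetical names used only to restate the paragraph above; they are not part
of the PCRE2 API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the four outcomes described in the text. */
enum sub_action {
  SUB_ACCEPT_AND_CONTINUE,  /* replacement kept, search for the next match */
  SUB_ACCEPT_AND_STOP,      /* replacement kept, not a global substitution */
  SUB_REJECT_AND_CONTINUE,  /* replacement dropped, search goes on */
  SUB_REJECT_AND_STOP       /* replacement dropped, rest of input copied */
};

/* rc:     value returned by the callout function
   global: whether PCRE2_SUBSTITUTE_GLOBAL is set */
static enum sub_action interpret_callout_result(int rc, bool global)
{
  if (rc == 0)
    return global ? SUB_ACCEPT_AND_CONTINUE : SUB_ACCEPT_AND_STOP;
  if (rc > 0 && global)
    return SUB_REJECT_AND_CONTINUE;
  return SUB_REJECT_AND_STOP;   /* rc < 0, or rc > 0 without global */
}
```

A callout that wants to skip one particular replacement in a global
substitution would therefore return a positive value, and one that wants to
end the whole operation early would return a negative value.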
</P>
<br><a name="SEC38" href="#TOC1">DUPLICATE CAPTURE GROUP NAMES</a><br>
<P>
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
</P>
<P>
When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
groups are not required to be unique. Duplicate names are always allowed for
groups with the same number, created by using the (?| feature. Indeed, if such
groups are named, they are required to use the same names.
</P>
<P>
Normally, patterns that use duplicate names are such that in any one match,
only one of each set of identically-named groups participates. An example is
shown in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation.
</P>
<P>
When duplicates are present, <b>pcre2_substring_copy_byname()</b> and
<b>pcre2_substring_get_byname()</b> return the first substring corresponding to
the given name that is set. Only if none are set is PCRE2_ERROR_UNSET
returned. The <b>pcre2_substring_number_from_name()</b> function returns the
error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate names.
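To reach every group that shares a name, rather than only the first one that is
set, an application can walk the name table entries for that name and read the
group number out of each. The standalone C sketch below shows only that
decoding step. It assumes the 8-bit library's entry layout (two bytes of group
number, most significant first, followed by the zero-terminated name) and that
the entries for one name are adjacent; collect_group_numbers() is a
hypothetical helper, and the pointers and entry size would in practice come
from the library rather than from a hand-built table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* first, last: first and last name table entries for one name
   entry_size:  size of each entry in code units (8-bit layout assumed)
   numbers:     receives the group numbers, at most max_numbers of them
   Returns how many group numbers were stored. */
static size_t collect_group_numbers(const uint8_t *first, const uint8_t *last,
  size_t entry_size, uint32_t *numbers, size_t max_numbers)
{
  size_t count = 0;

  assert(first != NULL && last != NULL && numbers != NULL);
  assert(entry_size >= 3);   /* two number bytes plus at least a zero */

  for (const uint8_t *entry = first; entry <= last; entry += entry_size) {
    if (count == max_numbers) break;
    /* The group number is stored most significant byte first. */
    numbers[count++] = ((uint32_t)entry[0] << 8) | entry[1];
  }
  return count;
}
```

Each number obtained this way indexes a pair in the ovector, so the captured
data for every duplicate can then be examined individually.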
</P>
<P>
If you want to get full details of all captured substrings for a given name,
you must use the <b>pcre2_substring_nametable_scan()</b> function. The first
argument is the compiled pattern, and the second is the name. If the third and
fourth arguments are NULL, the function returns a group number for a unique
name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.
</P>
<P>
When the third and fourth arguments are not NULL, they must be pointers to
variables that are updated by the function. After it has run, they point to the
first and last entries in the name-to-number table for the given name, and the
function returns the length of each entry in code units. In both cases,
PCRE2_ERROR_NOSUBSTRING is returned if there are no entries for the given name.
</P>
<P>
The format of the name table is described
<a href="#infoaboutpattern">above</a>
in the section entitled <i>Information about a pattern</i>. Given all the
relevant entries for the name, you can extract each of their numbers, and hence
the captured data.
</P>
<br><a name="SEC39" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
<P>
The traditional matching function uses a similar algorithm to Perl, which stops
when it finds the first match at a given point in the subject.
If you want to
find all possible matches, or the longest possible match at a given position,
consider using the alternative matching function (see below) instead. If you
cannot use the alternative function, you can kludge it up by making use of the
callout facility, which is described in the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
</P>
<P>
What you have to do is to insert a callout right at the end of the pattern.
When your callout function is called, extract and save the current matched
substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
other alternatives. Ultimately, when it runs out of matches,
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
<a name="dfamatch"></a></P>
<br><a name="SEC40" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
<P>
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
<b> uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>,</b>
<b> int *<i>workspace</i>, PCRE2_SIZE <i>wscount</i>);</b>
</P>
<P>
The function <b>pcre2_dfa_match()</b> is called to match a subject string
against a compiled pattern, using a matching algorithm that scans the subject
string just once (not counting lookaround assertions), and does not backtrack
(except when processing lookaround assertions). This has different
characteristics to the normal algorithm, and is not compatible with Perl. Some
of the features of PCRE2 patterns are not supported. Nevertheless, there are
times when this kind of matching can be useful. For a discussion of the two
matching algorithms, and a list of features that <b>pcre2_dfa_match()</b> does
not support, see the
<a href="pcre2matching.html"><b>pcre2matching</b></a>
documentation.
</P>
<P>
The arguments for the <b>pcre2_dfa_match()</b> function are the same as for
<b>pcre2_match()</b>, plus two extras.
The ovector within the match data block
is used in a different way, and this is described below. The other common
arguments are used in the same way as for <b>pcre2_match()</b>, so their
description is not repeated here.
</P>
<P>
The two additional arguments provide workspace for the function. The workspace
vector should contain at least 20 elements. It is used for keeping track of
multiple paths through the pattern tree. More workspace is needed for patterns
and subjects where there are a lot of potential matches.
</P>
<P>
Here is an example of a simple call to <b>pcre2_dfa_match()</b>:
<pre>
  int wspace[20];
  pcre2_match_data *md = pcre2_match_data_create(4, NULL);
  int rc = pcre2_dfa_match(
    re,                        /* result of pcre2_compile() */
    (PCRE2_SPTR)"some string", /* the subject string */
    11,                        /* the length of the subject string */
    0,                         /* start at offset 0 in the subject */
    0,                         /* default options */
    md,                        /* the match data block */
    NULL,                      /* a match context; NULL means use defaults */
    wspace,                    /* working space vector */
    20);                       /* number of elements (NOT size in bytes) */
</PRE>
</P>
<br><b>
Option bits for <b>pcre2_dfa_match()</b>
</b><br>
<P>
The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must
be zero. The only bits that may be set are PCRE2_ANCHORED,
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
four of these are exactly the same as for <b>pcre2_match()</b>, so their
description is not repeated here.
<pre>
  PCRE2_PARTIAL_HARD
  PCRE2_PARTIAL_SOFT
</pre>
These have the same general effect as they do for <b>pcre2_match()</b>, but the
details are slightly different. When PCRE2_PARTIAL_HARD is set for
<b>pcre2_dfa_match()</b>, it returns PCRE2_ERROR_PARTIAL if the end of the
subject is reached and there is still at least one matching possibility that
requires additional characters. This happens even if some complete matches have
already been found. When PCRE2_PARTIAL_SOFT is set, the return code
PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL if the end of the
subject is reached, there have been no complete matches, but there is still at
least one matching possibility. The portion of the string that was inspected
when the longest partial match was found is set as the first matching string in
both cases.
There is a more detailed discussion of partial and multi-segment
matching, with examples, in the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation.
<pre>
  PCRE2_DFA_SHORTEST
</pre>
Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to stop as
soon as it has found one match. Because of the way the alternative algorithm
works, this is necessarily the shortest possible match at the first possible
matching point in the subject string.
<pre>
  PCRE2_DFA_RESTART
</pre>
When <b>pcre2_dfa_match()</b> returns a partial match, it is possible to call it
again, with additional subject characters, and have it continue with the same
match. The PCRE2_DFA_RESTART option requests this action; when it is set, the
<i>workspace</i> and <i>wscount</i> arguments must reference the same vector as
before because data about the match so far is left in it after a partial
match. There is more discussion of this facility in the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation.
</P>
<br><b>
Successful returns from <b>pcre2_dfa_match()</b>
</b><br>
<P>
When <b>pcre2_dfa_match()</b> succeeds, it may have matched more than one
substring in the subject.
Note, however, that all the matches from one run of
the function start at the same point in the subject. The shorter matches are
all initial substrings of the longer matches. For example, if the pattern
<pre>
  &lt;.*&gt;
</pre>
is matched against the string
<pre>
  This is &lt;something&gt; &lt;something else&gt; &lt;something further&gt; no more
</pre>
the three matched strings are
<pre>
  &lt;something&gt; &lt;something else&gt; &lt;something further&gt;
  &lt;something&gt; &lt;something else&gt;
  &lt;something&gt;
</pre>
On success, the yield of the function is a number greater than zero, which is
the number of matched substrings. The offsets of the substrings are returned in
the ovector, and can be extracted by number in the same way as for
<b>pcre2_match()</b>, but the numbers bear no relation to any capture groups
that may exist in the pattern, because DFA matching does not support capturing.
</P>
<P>
Calls to the convenience functions that extract substrings by name
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
DFA match. The convenience functions that extract substrings by number never
return PCRE2_ERROR_NOSUBSTRING.
</P>
<P>
The matched strings are stored in the ovector in reverse order of length; that
is, the longest matching string is first. If there were too many matches to fit
into the ovector, the yield of the function is zero, and the vector is filled
with the longest matches.
</P>
<P>
NOTE: PCRE2's "auto-possessification" optimization usually applies to character
repeats at the end of a pattern (as well as internally). For example, the
pattern "a\d+" is compiled as if it were "a\d++". For DFA matching, this
means that only one possible match is found. If you really do want multiple
matches in such cases, either use an ungreedy repeat such as "a\d+?" or set
the PCRE2_NO_AUTO_POSSESS option when compiling.
</P>
<br><b>
Error returns from <b>pcre2_dfa_match()</b>
</b><br>
<P>
The <b>pcre2_dfa_match()</b> function returns a negative number when it fails.
Many of the errors are the same as for <b>pcre2_match()</b>, as described
<a href="#errorlist">above.</a>
There are in addition the following errors that are specific to
<b>pcre2_dfa_match()</b>:
<pre>
  PCRE2_ERROR_DFA_UITEM
</pre>
This return is given if <b>pcre2_dfa_match()</b> encounters an item in the
pattern that it does not support, for instance, the use of \C in a UTF mode or
a backreference.
<pre>
  PCRE2_ERROR_DFA_UCOND
</pre>
This return is given if <b>pcre2_dfa_match()</b> encounters a condition item
that uses a backreference for the condition, or a test for recursion in a
specific capture group. These are not supported.
<pre>
  PCRE2_ERROR_DFA_UINVALID_UTF
</pre>
This return is given if <b>pcre2_dfa_match()</b> is called for a pattern that
was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for DFA
matching.
<pre>
  PCRE2_ERROR_DFA_WSSIZE
</pre>
This return is given if <b>pcre2_dfa_match()</b> runs out of space in the
<i>workspace</i> vector.
<pre>
  PCRE2_ERROR_DFA_RECURSE
</pre>
When a recursion or subroutine call is processed, the matching function calls
itself recursively, using private memory for the ovector and <i>workspace</i>.
This error is given if the internal ovector is not large enough. This should be
extremely rare, as a vector of size 1000 is used.
<pre>
  PCRE2_ERROR_DFA_BADRESTART
</pre>
When <b>pcre2_dfa_match()</b> is called with the <b>PCRE2_DFA_RESTART</b> option,
some plausibility checks are made on the contents of the workspace, which
should contain data about the previous partial match. If any of these checks
fail, this error is given.
</P>
<br><a name="SEC41" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3),
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
<b>pcre2sample</b>(3), <b>pcre2unicode</b>(3).
</P>
<br><a name="SEC42" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC43" href="#TOC1">REVISION</a><br>
<P>
Last updated: 24 April 2024
<br>
Copyright © 1997-2024 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>