<html>
<head>
<title>pcre2partial specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2partial man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE2</a>
<li><a name="TOC2" href="#SEC2">REQUIREMENTS FOR A PARTIAL MATCH</a>
<li><a name="TOC3" href="#SEC3">PARTIAL MATCHING USING pcre2_match()</a>
<li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre2_match()</a>
<li><a name="TOC5" href="#SEC5">PARTIAL MATCHING USING pcre2_dfa_match()</a>
<li><a name="TOC6" href="#SEC6">MULTI-SEGMENT MATCHING WITH pcre2_dfa_match()</a>
<li><a name="TOC7" href="#SEC7">AUTHOR</a>
<li><a name="TOC8" href="#SEC8">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE2</a><br>
<P>
In normal use of PCRE2, if there is a match up to the end of a subject string,
but more characters are needed to match the entire pattern, PCRE2_ERROR_NOMATCH
is returned, just like
any other failing match. There are circumstances where
it might be helpful to distinguish this "partial match" case.
</P>
<P>
One example is an application where the subject string is very long, and not
all available at once. The requirement here is to be able to do the matching
segment by segment, but special action is needed when a matched substring spans
the boundary between two segments.
</P>
<P>
Another example is checking a user input string as it is typed, to ensure that
it conforms to a required format. Invalid characters can be immediately
diagnosed and rejected, giving instant feedback.
</P>
<P>
Partial matching is a PCRE2-specific feature; it is not Perl-compatible. It is
requested by setting one of the PCRE2_PARTIAL_HARD or PCRE2_PARTIAL_SOFT
options when calling a matching function. The difference between the two
options is whether or not a partial match is preferred to an alternative
complete match, though the details differ between the two types of matching
function. If both options are set, PCRE2_PARTIAL_HARD takes precedence.
</P>
<P>
If you want to use partial matching with just-in-time optimized code, as well
as setting a partial match option for the matching function, you must also call
<b>pcre2_jit_compile()</b> with one or both of these options:
<pre>
  PCRE2_JIT_PARTIAL_HARD
  PCRE2_JIT_PARTIAL_SOFT
</pre>
PCRE2_JIT_COMPLETE should also be set if you are going to run non-partial
matches on the same pattern. Separate code is compiled for each mode. If the
appropriate JIT mode has not been compiled, interpretive matching code is used.
</P>
<P>
Setting a partial matching option disables two of PCRE2's standard
optimization hints. PCRE2 remembers the last literal code unit in a pattern,
and abandons matching immediately if it is not present in the subject string.
This optimization cannot be used for a subject string that might match only
partially. PCRE2 also remembers a minimum length of a matching string, and does
not bother to run the matching function on shorter strings. This optimization
is also disabled for partial matching.
</P>
<br><a name="SEC2" href="#TOC1">REQUIREMENTS FOR A PARTIAL MATCH</a><br>
<P>
A possible partial match occurs during matching when the end of the subject
string is reached successfully, but either more characters are needed to
complete the match, or the addition of more characters might change what is
matched.
</P>
<P>
Example 1: if the pattern is /abc/ and the subject is "ab", more characters are
definitely needed to complete a match. In this case both hard and soft matching
options yield a partial match.
</P>
<P>
Example 2: if the pattern is /ab+/ and the subject is "ab", a complete match
can be found, but the addition of more characters might change what is
matched. In this case, only PCRE2_PARTIAL_HARD returns a partial match;
PCRE2_PARTIAL_SOFT returns the complete match.
</P>
<P>
On reaching the end of the subject, when PCRE2_PARTIAL_HARD is set, if the next
pattern item is \z, \Z, \b, \B, or $ there is always a partial match.
Otherwise, for both options, the next pattern item must be one that inspects a
character, and at least one of the following must be true:
</P>
<P>
(1) At least one character has already been inspected.
An inspected character
need not form part of the final matched string; lookbehind assertions and the
\K escape sequence provide ways of inspecting characters before the start of a
matched string.
</P>
<P>
(2) The pattern contains one or more lookbehind assertions. This condition
exists in case there is a lookbehind that inspects characters before the start
of the match.
</P>
<P>
(3) There is a special case when the whole pattern can match an empty string.
When the starting point is at the end of the subject, the empty string match is
a possibility, and if PCRE2_PARTIAL_SOFT is set and neither of the above
conditions is true, it is returned. However, because adding more characters
might result in a non-empty match, PCRE2_PARTIAL_HARD returns a partial match,
which in this case means "there is going to be a match at this point, but until
some more characters are added, we do not know if it will be an empty string or
something longer".
</P>
<br><a name="SEC3" href="#TOC1">PARTIAL MATCHING USING pcre2_match()</a><br>
<P>
When a partial matching option is set, the result of calling
<b>pcre2_match()</b> can be one of the following:
</P>
<P>
<b>A successful match</b>
A complete match has been found, starting and ending within this subject.
</P>
<P>
<b>PCRE2_ERROR_NOMATCH</b>
No match can start anywhere in this subject.
</P>
<P>
<b>PCRE2_ERROR_PARTIAL</b>
Adding more characters may result in a complete match that uses one or more
characters from the end of this subject.
</P>
<P>
When a partial match is returned, the first two elements in the ovector point
to the portion of the subject that was matched, but the values in the rest of
the ovector are undefined. The appearance of \K in the pattern has no effect
for a partial match. Consider this pattern:
<pre>
  /abc\K123/
</pre>
If it is matched against "456abc123xyz" the result is a complete match, and the
ovector defines the matched string as "123", because \K resets the "start of
match" point. However, if a partial match is requested and the subject string
is "456abc12", a partial match is found for the string "abc12", because all
these characters are needed for a subsequent re-match with additional
characters.
</P>
<P>
If there is more than one partial match, the first one that was found provides
the data that is returned.
Consider this pattern:
<pre>
  /123\w+X|dogY/
</pre>
If this is matched against the subject string "abc123dog", both alternatives
fail to match, but the end of the subject is reached during matching, so
PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, identifying
"123dog" as the first partial match. (In this example, there are two partial
matches, because "dog" on its own partially matches the second alternative.)
</P>
<br><b>
How a partial match is processed by pcre2_match()
</b><br>
<P>
What happens when a partial match is identified depends on which of the two
partial matching options is set.
</P>
<P>
If PCRE2_PARTIAL_HARD is set, PCRE2_ERROR_PARTIAL is returned as soon as a
partial match is found, without continuing to search for possible complete
matches. This option is "hard" because it prefers an earlier partial match over
a later complete match. For this reason, the assumption is made that the end of
the supplied subject string is not the true end of the available data, which is
why \z, \Z, \b, \B, and $ always give a partial match.
</P>
<P>
If PCRE2_PARTIAL_SOFT is set, the partial match is remembered, but matching
continues as normal, and other alternatives in the pattern are tried.
If no
complete match can be found, PCRE2_ERROR_PARTIAL is returned instead of
PCRE2_ERROR_NOMATCH. This option is "soft" because it prefers a complete match
over a partial match. All the various matching items in a pattern behave as if
the subject string is potentially complete; \z, \Z, and $ match at the end of
the subject, as normal, and for \b and \B the end of the subject is treated
as a non-alphanumeric.
</P>
<P>
The difference between the two partial matching options can be illustrated by a
pattern such as:
<pre>
  /dog(sbody)?/
</pre>
This matches either "dog" or "dogsbody", greedily (that is, it prefers the
longer string if possible). If it is matched against the string "dog" with
PCRE2_PARTIAL_SOFT, it yields a complete match for "dog". However, if
PCRE2_PARTIAL_HARD is set, the result is PCRE2_ERROR_PARTIAL. On the other
hand, if the pattern is made ungreedy the result is different:
<pre>
  /dog(sbody)??/
</pre>
In this case the result is always a complete match because that is found first,
and matching never continues after finding a complete match.
It might be easier
to follow this explanation by thinking of the two patterns like this:
<pre>
  /dog(sbody)?/    is the same as  /dogsbody|dog/
  /dog(sbody)??/   is the same as  /dog|dogsbody/
</pre>
The second pattern will never match "dogsbody", because it will always find the
shorter match first.
</P>
<br><b>
Example of partial matching using pcre2test
</b><br>
<P>
The <b>pcre2test</b> data modifiers <b>partial_hard</b> (or <b>ph</b>) and
<b>partial_soft</b> (or <b>ps</b>) set PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT,
respectively, when calling <b>pcre2_match()</b>. Here is a run of
<b>pcre2test</b> using a pattern that matches the whole subject in the form of a
date:
<pre>
    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 25dec3\=ph
  Partial match: 25dec3
  data> 3ju\=ph
  Partial match: 3ju
  data> 3juj\=ph
  No match
</pre>
This example gives the same results for both hard and soft partial matching
options.
Here is an example where there is a difference:
<pre>
    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 25jun04\=ps
   0: 25jun04
   1: jun
  data> 25jun04\=ph
  Partial match: 25jun04
</pre>
With PCRE2_PARTIAL_SOFT, the subject is matched completely. For
PCRE2_PARTIAL_HARD, however, the subject is assumed not to be complete, so
there is only a partial match.
</P>
<br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre2_match()</a><br>
<P>
PCRE was not originally designed with multi-segment matching in mind. However,
over time, features (including partial matching) that make multi-segment
matching possible have been added. A very long string can be searched segment
by segment by calling <b>pcre2_match()</b> repeatedly, with the aim of achieving
the same results that would happen if the entire string was available for
searching all the time. Normally, the strings that are being sought are much
shorter than each individual segment, and are in the middle of very long
strings, so the pattern is normally not anchored.
</P>
<P>
Special logic must be implemented to handle a matched substring that spans a
segment boundary.
PCRE2_PARTIAL_HARD should be used, because it returns a
partial match at the end of a segment whenever there is the possibility of
changing the match by adding more characters. The PCRE2_NOTBOL option should
also be set for all but the first segment.
</P>
<P>
When a partial match occurs, the next segment must be added to the current
subject and the match re-run, using the <i>startoffset</i> argument of
<b>pcre2_match()</b> to begin at the point where the partial match started.
For example:
<pre>
    re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
  data> ...the date is 23ja\=ph
  Partial match: 23ja
  data> ...the date is 23jan19 and on that day...\=offset=15
   0: 23jan19
   1: jan
</pre>
Note the use of the <b>offset</b> modifier to start the new match where the
partial match was found. In this example, the next segment was added to the one
in which the partial match was found. This is the most straightforward
approach, typically using a memory buffer that is twice the size of each
segment. After a partial match, the first half of the buffer is discarded, the
second half is moved to the start of the buffer, and a new segment is added
before repeating the match as in the example above. After a "no match" result,
the entire buffer can be discarded.
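</P>
<P>
The double-size buffer scheme described above can be sketched as follows. This
is a minimal illustration in Python, not PCRE2 code: the function search() is a
hypothetical stand-in that mimics the three outcomes of a PCRE2_PARTIAL_HARD
match for a literal byte string only, and SEG, find_in_segments() and the
variable names are assumptions made for the example. It also assumes that the
string being sought is no longer than one segment.

```python
SEG = 8  # segment size; deliberately tiny for the demo (assumption)

NO_MATCH, PARTIAL, MATCH = range(3)


def search(buf, needle, start):
    """Hypothetical stand-in for a PCRE2_PARTIAL_HARD match of a literal.

    Returns (status, offset). It only mimics the three possible outcomes
    so that the buffer handling below can be exercised without PCRE2.
    """
    for i in range(start, len(buf)):
        tail = bytes(buf[i:i + len(needle)])
        if tail == needle:
            return MATCH, i
        if len(tail) < len(needle) and needle.startswith(tail):
            return PARTIAL, i  # the data ran out part-way through the literal
    return NO_MATCH, -1


def find_in_segments(segments, needle):
    """Return the absolute offset of the first match in the stream, or -1."""
    buf = bytearray()  # holds at most two segments
    base = 0           # absolute offset of buf[0] in the whole stream
    start = 0          # where the next search begins (the startoffset)
    for seg in segments:
        assert len(seg) <= SEG
        buf += seg
        status, pos = search(buf, needle, start)
        if status == MATCH:
            return base + pos
        if status == PARTIAL:
            # Discard the older half, but never text from the partial match.
            drop = min(pos, len(buf) - len(seg))
            start = pos - drop
        else:
            drop = len(buf)  # no match: the entire buffer can be discarded
            start = 0
        del buf[:drop]
        base += drop
    return -1


print(find_in_segments([b"the date", b" is 23ja", b"n19 and "], b"23jan19"))  # 12
```

In a real application the call to search() would be a call to
<b>pcre2_match()</b> with PCRE2_PARTIAL_HARD set (and PCRE2_NOTBOL for all but
the first segment), with the value of start passed as the <i>startoffset</i>
argument.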
</P>
<P>
If there are memory constraints, you may want to discard text that precedes a
partial match before adding the next segment. Unfortunately, this is not at
present straightforward. In cases such as the above, where the pattern does not
contain any lookbehinds, it is sufficient to retain only the partially matched
substring. However, if the pattern contains a lookbehind assertion, characters
that precede the start of the partial match may have been inspected during the
matching process. When <b>pcre2test</b> displays a partial match, it indicates
these characters with '&lt;' if the <b>allusedtext</b> modifier is set:
<pre>
    re> "(?&lt;=123)abc"
  data> xx123ab\=ph,allusedtext
  Partial match: 123ab
                 &lt;&lt;&lt;
</pre>
However, the <b>allusedtext</b> modifier is not available for JIT matching,
because JIT matching does not record the first (or last) consulted characters.
For this reason, this information is not available via the API. It is therefore
not possible in general to obtain the exact number of characters that must be
retained in order to get the right match result. If you cannot retain the
entire segment, you must find some heuristic way of choosing.
</P>
<P>
If you know the approximate length of the matching substrings, you can use that
to decide how much text to retain.
The only lookbehind information that is
currently available via the API is the length of the longest individual
lookbehind in a pattern, but this can be misleading if there are nested
lookbehinds. The value returned by calling <b>pcre2_pattern_info()</b> with the
PCRE2_INFO_MAXLOOKBEHIND option is the maximum number of characters (not code
units) that any individual lookbehind moves back when it is processed. A
pattern such as "(?&lt;=(?&lt;!b)a)" has a maximum lookbehind value of one, but
inspects two characters before its starting point.
</P>
<P>
In a non-UTF or a 32-bit case, moving back is just a subtraction, but in
UTF-8 or UTF-16 you have to count characters while moving back through the code
units.
</P>
<br><a name="SEC5" href="#TOC1">PARTIAL MATCHING USING pcre2_dfa_match()</a><br>
<P>
The DFA function moves along the subject string character by character, without
backtracking, searching for all possible matches simultaneously. If the end of
the subject is reached before the end of the pattern, there is the possibility
of a partial match.
</P>
<P>
When PCRE2_PARTIAL_SOFT is set, PCRE2_ERROR_PARTIAL is returned only if there
have been no complete matches. Otherwise, the complete matches are returned.
If PCRE2_PARTIAL_HARD is set, a partial match takes precedence over any
complete matches.
The portion of the string that was matched when the longest
partial match was found is set as the first matching string.
</P>
<P>
Because the DFA function always searches for all possible matches, and there is
no difference between greedy and ungreedy repetition, its behaviour is
different from that of <b>pcre2_match()</b>. Consider the string "dog" matched
against this ungreedy pattern:
<pre>
  /dog(sbody)??/
</pre>
Whereas the standard function stops as soon as it finds the complete match for
"dog", the DFA function also finds the partial match for "dogsbody", and so
returns that when PCRE2_PARTIAL_HARD is set.
</P>
<br><a name="SEC6" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre2_dfa_match()</a><br>
<P>
When a partial match has been found using the DFA matching function, it is
possible to continue the match by providing additional subject data and calling
the function again with the same compiled regular expression, this time setting
the PCRE2_DFA_RESTART option. You must pass the same working space as before,
because this is where details of the previous partial match are stored. You can
set the PCRE2_PARTIAL_SOFT or PCRE2_PARTIAL_HARD options with PCRE2_DFA_RESTART
to continue partial matching over multiple segments.
Here is an example using
<b>pcre2test</b>:
<pre>
    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 23ja\=dfa,ps
  Partial match: 23ja
  data> n05\=dfa,dfa_restart
   0: n05
</pre>
The first call has "23ja" as the subject, and requests partial matching; the
second call has "n05" as the subject for the continued (restarted) match.
Notice that when the match is complete, only the last part is shown; PCRE2 does
not retain the previously partially-matched string. It is up to the calling
program to do that if it needs to. This means that, for an unanchored pattern,
if a continued match fails, it is not possible to try again at a new starting
point. All this facility is capable of doing is continuing with the previous
match attempt. For example, consider this pattern:
<pre>
  1234|3789
</pre>
If the first part of the subject is "ABC123", a partial match of the first
alternative is found at offset 3. There is no partial match for the second
alternative, because such a match does not start at the same point in the
subject string. Attempting to continue with the string "7890" does not yield a
match because only those alternatives that match at one point in the subject
are remembered. Depending on the application, this may or may not be what you
want.
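</P>
<P>
The "1234|3789" case can be reproduced with a toy model. The sketch below is
written in Python and is not the PCRE2 implementation: dfa_scan() is a
hypothetical function that handles literal alternatives only, advances all of
them in parallel from each starting point, and on a restart does nothing except
continue the remembered attempt. The return values and the layout of the saved
state are assumptions made for the example.

```python
def dfa_scan(alternatives, segment, restart=None):
    """Toy model of DFA-style matching over literal alternatives.

    Returns ("match", offset, None), ("partial", offset, state) or
    ("nomatch", -1, None). The state is the set of
    (alternative, characters matched so far) pairs that were still alive
    when the segment ran out.
    """
    if restart is not None:
        # Restart: continue the remembered attempt only; no new start points.
        starts = [(0, restart)]
    else:
        starts = [(i, {(alt, 0) for alt in alternatives})
                  for i in range(len(segment))]
    for offset, alive in starts:
        for ch in segment[offset:]:
            # Advance every live alternative by one character.
            alive = {(alt, n + 1) for alt, n in alive if alt[n] == ch}
            if any(n == len(alt) for alt, n in alive):
                return "match", offset, None
            if not alive:
                break  # every alternative failed at this starting point
        else:
            # The segment ended while some alternative was still alive.
            return "partial", offset, alive
    return "nomatch", -1, None


alts = ["1234", "3789"]
status, offset, state = dfa_scan(alts, "ABC123")
print(status, offset, state)                    # partial 3 {('1234', 3)}
print(dfa_scan(alts, "7890", restart=state))    # ('nomatch', -1, None)
print(dfa_scan(alts, "ABC1237890"))             # ('match', 5, None)
```

The saved state holds only the first alternative, so the continuation with
"7890" fails, even though a match for "3789" exists at offset 5 when the two
segments are scanned as one string.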
</P>
<P>
If you do want to allow for starting again at the next character, one way of
doing it is to retain some or all of the segment and try a new complete match,
as described for <b>pcre2_match()</b> above. Another possibility is to work with
two buffers. If a partial match at offset <i>n</i> in the first buffer is
followed by "no match" when PCRE2_DFA_RESTART is used on the second buffer, you
can then try a new match starting at offset <i>n+1</i> in the first buffer.
</P>
<br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P>
Last updated: 04 September 2019
<br>
Copyright © 1997-2019 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>