# These test special UTF and UCP features of DFA matching. The output is
# different for the different widths.

#subject dfa

# ----------------------------------------------------
# These are a selection of the more comprehensive tests that are run for
# non-DFA matching.

/X/utf
    XX\x{d800}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
    XX\x{d800}\=offset=3
Error -36 (bad UTF-8 offset)
    XX\x{d800}\=no_utf_check
 0: X
    XX\x{da00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
    XX\x{da00}\=no_utf_check
 0: X
    XX\x{dc00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
    XX\x{dc00}\=no_utf_check
 0: X
    XX\x{de00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
    XX\x{de00}\=no_utf_check
 0: X
    XX\x{dfff}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
    XX\x{dfff}\=no_utf_check
 0: X
    XX\x{110000}
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2
    XX\x{d800}\x{1234}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2

/badutf/utf
    X\xdf
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1
    XX\xef
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
    XXX\xef\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
    X\xf7
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1
    XX\xf7\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
    XXX\xf7\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3

/shortutf/utf
    XX\xdf\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
    XX\xef\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
    XX\xef\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
    \xf7\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
    \xf7\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0

# ----------------------------------------------------
# UCP and casing tests - except for the first two, these will all fail in 8-bit
# mode because they are testing UCP without UTF and use characters > 255.

/\x{c1}/i,no_start_optimize
\= Expect no match
    \x{e1}
No match

/\x{c1}+\x{e1}/iB,ucp
------------------------------------------------------------------
        Bra
     /i \x{c1}+
     /i \x{e1}
        Ket
        End
------------------------------------------------------------------
    \x{c1}\x{c1}\x{c1}
 0: \xc1\xc1\xc1
 1: \xc1\xc1
    \x{e1}\x{e1}\x{e1}
 0: \xe1\xe1\xe1
 1: \xe1\xe1

/\x{120}\x{c1}/i,ucp,no_start_optimize
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
    \x{121}\x{e1}

/\x{120}\x{c1}/i,ucp
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
    \x{121}\x{e1}

/[^\x{120}]/i,no_start_optimize
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
    \x{121}

/[^\x{120}]/i,ucp,no_start_optimize
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\= Expect no match
    \x{121}

/[^\x{120}]/i
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
    \x{121}

/[^\x{120}]/i,ucp
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\= Expect no match
    \x{121}

/\x{120}{2}/i,ucp
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
    \x{121}\x{121}

/[^\x{120}]{2}/i,ucp
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\= Expect no match
    \x{121}\x{121}

# ----------------------------------------------------

# ----------------------------------------------------
# Tests for handling 0xffffffff in caseless UCP mode. They only apply to 32-bit
# mode; for the other widths they will fail.

/k*\x{ffffffff}/caseless,ucp
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
    \x{ffffffff}

/k+\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
    K\x{ffffffff}
\= Expect no match
    \x{ffffffff}\x{ffffffff}

/k{2}\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 15: character code point value in \x{} or \o{} is too large
\= Expect no match
    \x{ffffffff}\x{ffffffff}\x{ffffffff}

/k\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
    K\x{ffffffff}
\= Expect no match
    \x{ffffffff}\x{ffffffff}\x{ffffffff}

/k{2,}?Z/caseless,ucp,no_start_optimize,no_auto_possess
\= Expect no match
    Kk\x{ffffffff}\x{ffffffff}\x{ffffffff}Z
** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
No match

# ----------------------------------------------------

# End of testinput14