1*cc02d7e2SAndroid Build Coastguard Worker[](https://travis-ci.com/cyb70289/utf8) 2*cc02d7e2SAndroid Build Coastguard Worker 3*cc02d7e2SAndroid Build Coastguard Worker# Fast UTF-8 validation with Range algorithm (NEON+SSE4+AVX2) 4*cc02d7e2SAndroid Build Coastguard Worker 5*cc02d7e2SAndroid Build Coastguard WorkerThis is a brand new algorithm to leverage SIMD for fast UTF-8 string validation. Both **NEON**(armv8a) and **SSE4** versions are implemented. **AVX2** implementation contributed by [ioioioio](https://github.com/ioioioio). 6*cc02d7e2SAndroid Build Coastguard Worker 7*cc02d7e2SAndroid Build Coastguard WorkerFour UTF-8 validation methods are compared on both x86 and Arm platforms. Benchmark result shows range base algorithm is the best solution on Arm, and achieves same performance as [Lemire's approach](https://lemire.me/blog/2018/05/16/validating-utf-8-strings-using-as-little-as-0-7-cycles-per-byte/) on x86. 8*cc02d7e2SAndroid Build Coastguard Worker 9*cc02d7e2SAndroid Build Coastguard Worker* Range based algorithm 10*cc02d7e2SAndroid Build Coastguard Worker * range-neon.c: NEON version 11*cc02d7e2SAndroid Build Coastguard Worker * range-sse.c: SSE4 version 12*cc02d7e2SAndroid Build Coastguard Worker * range-avx2.c: AVX2 version 13*cc02d7e2SAndroid Build Coastguard Worker * range2-neon.c, range2-sse.c: Process two blocks in one iteration 14*cc02d7e2SAndroid Build Coastguard Worker* [Lemire's SIMD implementation](https://github.com/lemire/fastvalidate-utf-8) 15*cc02d7e2SAndroid Build Coastguard Worker * lemire-sse.c: SSE4 version 16*cc02d7e2SAndroid Build Coastguard Worker * lemire-avx2.c: AVX2 version 17*cc02d7e2SAndroid Build Coastguard Worker * lemire-neon.c: NEON porting 18*cc02d7e2SAndroid Build Coastguard Worker* naive.c: Naive UTF-8 validation byte by byte 19*cc02d7e2SAndroid Build Coastguard Worker* lookup.c: [Lookup-table method](http://bjoern.hoehrmann.de/utf-8/decoder/dfa/) 20*cc02d7e2SAndroid Build Coastguard Worker 21*cc02d7e2SAndroid Build Coastguard Worker## About the code 22*cc02d7e2SAndroid Build Coastguard Worker 23*cc02d7e2SAndroid Build Coastguard Worker* Run "make" to build. Built and tested with gcc-7.3. 24*cc02d7e2SAndroid Build Coastguard Worker* Run "./utf8" to see all command line options. 25*cc02d7e2SAndroid Build Coastguard Worker* Benchmark 26*cc02d7e2SAndroid Build Coastguard Worker * Run "./utf8 bench" to bechmark all algorithms with [default test file](https://raw.githubusercontent.com/cyb70289/utf8/master/UTF-8-demo.txt). 27*cc02d7e2SAndroid Build Coastguard Worker * Run "./utf8 bench size NUM" to benchmark specified string size. 28*cc02d7e2SAndroid Build Coastguard Worker* Run "./utf8 test" to test all algorithms with positive and negative test cases. 29*cc02d7e2SAndroid Build Coastguard Worker* To benchmark or test specific algorithm, run something like "./utf8 bench range". 30*cc02d7e2SAndroid Build Coastguard Worker 31*cc02d7e2SAndroid Build Coastguard Worker## Benchmark result (MB/s) 32*cc02d7e2SAndroid Build Coastguard Worker 33*cc02d7e2SAndroid Build Coastguard Worker### Method 34*cc02d7e2SAndroid Build Coastguard Worker1. Generate UTF-8 test buffer per [test file](https://raw.githubusercontent.com/cyb70289/utf8/master/UTF-8-demo.txt) or buffer size. 35*cc02d7e2SAndroid Build Coastguard Worker1. Call validation sub-routines in a loop until 1G bytes are checked. 36*cc02d7e2SAndroid Build Coastguard Worker1. Calculate speed(MB/s) of validating UTF-8 strings. 37*cc02d7e2SAndroid Build Coastguard Worker 38*cc02d7e2SAndroid Build Coastguard Worker### NEON(armv8a) 39*cc02d7e2SAndroid Build Coastguard WorkerTest case | naive | lookup | lemire | range | range2 40*cc02d7e2SAndroid Build Coastguard Worker:-------- | :---- | :----- | :----- | :---- | :----- 41*cc02d7e2SAndroid Build Coastguard Worker[UTF-demo.txt](https://raw.githubusercontent.com/cyb70289/utf8/master/UTF-8-demo.txt) | 562.25 | 412.84 | 1198.50 | 1411.72 | **1579.85** 42*cc02d7e2SAndroid Build Coastguard Worker32 bytes | 651.55 | 441.70 | 891.38 | 1003.95 | **1043.58** 43*cc02d7e2SAndroid Build Coastguard Worker33 bytes | 660.00 | 446.78 | 588.77 | 1009.31 | **1048.12** 44*cc02d7e2SAndroid Build Coastguard Worker129 bytes | 771.89 | 402.55 | 938.07 | 1283.77 | **1401.76** 45*cc02d7e2SAndroid Build Coastguard Worker1K bytes | 811.92 | 411.58 | 1188.96 | 1398.15 | **1560.23** 46*cc02d7e2SAndroid Build Coastguard Worker8K bytes | 812.25 | 412.74 | 1198.90 | 1412.18 | **1580.65** 47*cc02d7e2SAndroid Build Coastguard Worker64K bytes | 817.35 | 412.24 | 1200.20 | 1415.11 | **1583.86** 48*cc02d7e2SAndroid Build Coastguard Worker1M bytes | 815.70 | 411.93 | 1200.93 | 1415.65 | **1585.40** 49*cc02d7e2SAndroid Build Coastguard Worker 50*cc02d7e2SAndroid Build Coastguard Worker### SSE4(E5-2650) 51*cc02d7e2SAndroid Build Coastguard WorkerTest case | naive | lookup | lemire | range | range2 52*cc02d7e2SAndroid Build Coastguard Worker:-------- | :---- | :----- | :----- | :---- | :----- 53*cc02d7e2SAndroid Build Coastguard Worker[UTF-demo.txt](https://raw.githubusercontent.com/cyb70289/utf8/master/UTF-8-demo.txt) | 753.70 | 310.41 | 3954.74 | 3945.60 | **3986.13** 54*cc02d7e2SAndroid Build Coastguard Worker32 bytes | 1135.76 | 364.07 | **2890.52** | 2351.81 | 2173.02 55*cc02d7e2SAndroid Build Coastguard Worker33 bytes | 1161.85 | 376.29 | 1352.95 | **2239.55** | 2041.43 56*cc02d7e2SAndroid Build Coastguard Worker129 bytes | 1161.22 | 322.47 | 2742.49 | **3315.33** | 3249.35 57*cc02d7e2SAndroid Build Coastguard Worker1K bytes | 1310.95 | 310.72 | 3755.88 | 3781.23 | **3874.17** 58*cc02d7e2SAndroid Build Coastguard Worker8K bytes | 1348.32 | 307.93 | 3860.71 | 3922.81 | **3968.93** 59*cc02d7e2SAndroid Build Coastguard Worker64K bytes | 1301.34 | 308.39 | 3935.15 | 3973.50 | **3983.44** 60*cc02d7e2SAndroid Build Coastguard Worker1M bytes | 1279.78 | 309.06 | 3923.51 | 3953.00 | **3960.49** 61*cc02d7e2SAndroid Build Coastguard Worker 62*cc02d7e2SAndroid Build Coastguard Worker## Range algorithm analysis 63*cc02d7e2SAndroid Build Coastguard Worker 64*cc02d7e2SAndroid Build Coastguard WorkerBasic idea: 65*cc02d7e2SAndroid Build Coastguard Worker* Load 16 bytes 66*cc02d7e2SAndroid Build Coastguard Worker* Leverage SIMD to calculate value range for each byte efficiently 67*cc02d7e2SAndroid Build Coastguard Worker* Validate 16 bytes at once 68*cc02d7e2SAndroid Build Coastguard Worker 69*cc02d7e2SAndroid Build Coastguard Worker### UTF-8 coding format 70*cc02d7e2SAndroid Build Coastguard Worker 71*cc02d7e2SAndroid Build Coastguard Workerhttp://www.unicode.org/versions/Unicode6.0.0/ch03.pdf, page 94 72*cc02d7e2SAndroid Build Coastguard Worker 73*cc02d7e2SAndroid Build Coastguard WorkerTable 3-7. Well-Formed UTF-8 Byte Sequences 74*cc02d7e2SAndroid Build Coastguard Worker 75*cc02d7e2SAndroid Build Coastguard WorkerCode Points | First Byte | Second Byte | Third Byte | Fourth Byte | 76*cc02d7e2SAndroid Build Coastguard Worker:---------- | :--------- | :---------- | :--------- | :---------- | 77*cc02d7e2SAndroid Build Coastguard WorkerU+0000..U+007F | 00..7F | | | | 78*cc02d7e2SAndroid Build Coastguard WorkerU+0080..U+07FF | C2..DF | 80..BF | | | 79*cc02d7e2SAndroid Build Coastguard WorkerU+0800..U+0FFF | E0 | ***A0***..BF| 80..BF | | 80*cc02d7e2SAndroid Build Coastguard WorkerU+1000..U+CFFF | E1..EC | 80..BF | 80..BF | | 81*cc02d7e2SAndroid Build Coastguard WorkerU+D000..U+D7FF | ED | 80..***9F***| 80..BF | | 82*cc02d7e2SAndroid Build Coastguard WorkerU+E000..U+FFFF | EE..EF | 80..BF | 80..BF | | 83*cc02d7e2SAndroid Build Coastguard WorkerU+10000..U+3FFFF | F0 | ***90***..BF| 80..BF | 80..BF | 84*cc02d7e2SAndroid Build Coastguard WorkerU+40000..U+FFFFF | F1..F3 | 80..BF | 80..BF | 80..BF | 85*cc02d7e2SAndroid Build Coastguard WorkerU+100000..U+10FFFF | F4 | 80..***8F***| 80..BF | 80..BF | 86*cc02d7e2SAndroid Build Coastguard Worker 87*cc02d7e2SAndroid Build Coastguard WorkerTo summarise UTF-8 encoding: 88*cc02d7e2SAndroid Build Coastguard Worker* Depending on First Byte, one legal character can be 1, 2, 3, 4 bytes 89*cc02d7e2SAndroid Build Coastguard Worker * For First Byte within C0..DF, character length = 2 90*cc02d7e2SAndroid Build Coastguard Worker * For First Byte within E0..EF, character length = 3 91*cc02d7e2SAndroid Build Coastguard Worker * For First Byte within F0..F4, character length = 4 92*cc02d7e2SAndroid Build Coastguard Worker* C0, C1, F5..FF are not allowed 93*cc02d7e2SAndroid Build Coastguard Worker* Second,Third,Fourth Bytes must lie in 80..BF. 94*cc02d7e2SAndroid Build Coastguard Worker* There are four **special cases** for Second Byte, shown ***bold italic*** in above table. 95*cc02d7e2SAndroid Build Coastguard Worker 96*cc02d7e2SAndroid Build Coastguard Worker### Range table 97*cc02d7e2SAndroid Build Coastguard Worker 98*cc02d7e2SAndroid Build Coastguard WorkerRange table maps range index 0 ~ 15 to minimal and maximum values allowed. Our task is to observe input string, find the pattern and set correct range index for each byte, then validate input string. 99*cc02d7e2SAndroid Build Coastguard Worker 100*cc02d7e2SAndroid Build Coastguard WorkerIndex | Min | Max | Byte type 101*cc02d7e2SAndroid Build Coastguard Worker:---- | :-- | :-- | :-------- 102*cc02d7e2SAndroid Build Coastguard Worker0 | 00 | 7F | First Byte, ASCII 103*cc02d7e2SAndroid Build Coastguard Worker1,2,3 | 80 | BF | Second, Third, Fourth Bytes 104*cc02d7e2SAndroid Build Coastguard Worker4 | A0 | BF | Second Byte after E0 105*cc02d7e2SAndroid Build Coastguard Worker5 | 80 | 9F | Second Byte after ED 106*cc02d7e2SAndroid Build Coastguard Worker6 | 90 | BF | Second Byte after F0 107*cc02d7e2SAndroid Build Coastguard Worker7 | 80 | 8F | Second Byte after F4 108*cc02d7e2SAndroid Build Coastguard Worker8 | C2 | F4 | First Byte, non-ASCII 109*cc02d7e2SAndroid Build Coastguard Worker9..15(NEON) | FF | 00 | Illegal: unsigned char >= 255 && unsigned char <= 0 110*cc02d7e2SAndroid Build Coastguard Worker9..15(SSE) | 7F | 80 | Illegal: signed char >= 127 && signed char <= -128 111*cc02d7e2SAndroid Build Coastguard Worker 112*cc02d7e2SAndroid Build Coastguard Worker### Calculate byte ranges (ignore special cases) 113*cc02d7e2SAndroid Build Coastguard Worker 114*cc02d7e2SAndroid Build Coastguard WorkerIgnoring the four special cases(E0,ED,F0,F4), how should we set range index for each byte? 115*cc02d7e2SAndroid Build Coastguard Worker 116*cc02d7e2SAndroid Build Coastguard Worker* Set range index to 0(00..7F) for all bytes by default 117*cc02d7e2SAndroid Build Coastguard Worker* Find non-ASCII First Byte (C0..FF), set their range index to 8(C2..F4) 118*cc02d7e2SAndroid Build Coastguard Worker* For First Byte within C0..DF, set next byte's range index to 1(80..BF) 119*cc02d7e2SAndroid Build Coastguard Worker* For First Byte within E0..EF, set next two byte's range index to 2,1(80..BF) in sequence 120*cc02d7e2SAndroid Build Coastguard Worker* For First Byte within F0..FF, set next three byte's range index to 3,2,1(80..BF) in sequence 121*cc02d7e2SAndroid Build Coastguard Worker 122*cc02d7e2SAndroid Build Coastguard WorkerTo implement above operations efficiently with SIMD: 123*cc02d7e2SAndroid Build Coastguard Worker* For 16 input bytes, use lookup table to map C0..DF to 1, E0..EF to 2, F0..FF to 3, others to 0. Save to first_len. 124*cc02d7e2SAndroid Build Coastguard Worker* Map C0..FF to 8, we get range indices for First Byte. 125*cc02d7e2SAndroid Build Coastguard Worker* Shift first_len one byte, we get range indices for Second Byte. 126*cc02d7e2SAndroid Build Coastguard Worker* Saturate substract first_len by one(3->2, 2->1, 1->0, 0->0), then shift two bytes, we get range indices for Third Byte. 127*cc02d7e2SAndroid Build Coastguard Worker* Saturate substract first_len by two(3->1, 2->0, 1->0, 0->0), then shift three bytes, we get range indices for Fourth Byte. 128*cc02d7e2SAndroid Build Coastguard Worker 129*cc02d7e2SAndroid Build Coastguard WorkerExample(assume no previous data) 130*cc02d7e2SAndroid Build Coastguard Worker 131*cc02d7e2SAndroid Build Coastguard WorkerInput | F1 | 80 | 80 | 80 | 80 | C2 | 80 | 80 | ... 132*cc02d7e2SAndroid Build Coastguard Worker:---- | :- | :- | :- | :- | :- | :- | :- | :- | :-- 133*cc02d7e2SAndroid Build Coastguard Worker*first_len* |*3* |*0* |*0* |*0* |*0* |*1* |*0* |*0* |*...* 134*cc02d7e2SAndroid Build Coastguard WorkerFirst Byte | 8 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | ... 135*cc02d7e2SAndroid Build Coastguard WorkerSecond Byte | 0 | 3 | 0 | 0 | 0 | 0 | 1 | 0 | ... 136*cc02d7e2SAndroid Build Coastguard WorkerThird Byte | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | ... 137*cc02d7e2SAndroid Build Coastguard WorkerFourth Byte | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... 138*cc02d7e2SAndroid Build Coastguard WorkerRange index | 8 | 3 | 2 | 1 | 0 | 8 | 1 | 0 | ... 139*cc02d7e2SAndroid Build Coastguard Worker 140*cc02d7e2SAndroid Build Coastguard Worker```c 141*cc02d7e2SAndroid Build Coastguard WorkerRange_index = First_Byte | Second_Byte | Third_Byte | Fourth_Byte 142*cc02d7e2SAndroid Build Coastguard Worker``` 143*cc02d7e2SAndroid Build Coastguard Worker 144*cc02d7e2SAndroid Build Coastguard Worker#### Error handling 145*cc02d7e2SAndroid Build Coastguard Worker 146*cc02d7e2SAndroid Build Coastguard Worker* C0,C1,F5..FF are not included in range table and will always be detected. 147*cc02d7e2SAndroid Build Coastguard Worker* Illegal 80..BF will have range index 0(00..7F) and be detected. 148*cc02d7e2SAndroid Build Coastguard Worker* Based on First Byte, according Second, Third and Fourth Bytes will have range index 1/2/3, to make sure they must lie in 80..BF. 149*cc02d7e2SAndroid Build Coastguard Worker* If non-ASCII First Byte overlaps, above algorithm will set range index of the latter First Byte to 9,10,11, which are illegal ranges. E.g, Input = F1 80 C2 90 --> Range index = 8 3 10 1, where 10 indicates error. See table below. 150*cc02d7e2SAndroid Build Coastguard Worker 151*cc02d7e2SAndroid Build Coastguard WorkerOverlapped non-ASCII First Byte 152*cc02d7e2SAndroid Build Coastguard Worker 153*cc02d7e2SAndroid Build Coastguard WorkerInput | F1 | 80 | C2 | 90 154*cc02d7e2SAndroid Build Coastguard Worker:---- | :- | :- | :- | :- 155*cc02d7e2SAndroid Build Coastguard Worker*first_len* |*3* |*0* |*1* |*0* 156*cc02d7e2SAndroid Build Coastguard WorkerFirst Byte | 8 | 0 | 8 | 0 157*cc02d7e2SAndroid Build Coastguard WorkerSecond Byte | 0 | 3 | 0 | 1 158*cc02d7e2SAndroid Build Coastguard WorkerThird Byte | 0 | 0 | 2 | 0 159*cc02d7e2SAndroid Build Coastguard WorkerFourth Byte | 0 | 0 | 0 | 1 160*cc02d7e2SAndroid Build Coastguard WorkerRange index | 8 | 3 |***10***| 1 161*cc02d7e2SAndroid Build Coastguard Worker 162*cc02d7e2SAndroid Build Coastguard Worker### Adjust Second Byte range for special cases 163*cc02d7e2SAndroid Build Coastguard Worker 164*cc02d7e2SAndroid Build Coastguard WorkerRange index adjustment for four special cases 165*cc02d7e2SAndroid Build Coastguard Worker 166*cc02d7e2SAndroid Build Coastguard WorkerFirst Byte | Second Byte | Before adjustment | Correct index | Adjustment | 167*cc02d7e2SAndroid Build Coastguard Worker:--------- | :---------- | :---------------- | :------------ | :--------- 168*cc02d7e2SAndroid Build Coastguard WorkerE0 | A0..BF | 2 | 4 | **2** 169*cc02d7e2SAndroid Build Coastguard WorkerED | 80..9F | 2 | 5 | **3** 170*cc02d7e2SAndroid Build Coastguard WorkerF0 | 90..BF | 3 | 6 | **3** 171*cc02d7e2SAndroid Build Coastguard WorkerF4 | 80..8F | 3 | 7 | **4** 172*cc02d7e2SAndroid Build Coastguard Worker 173*cc02d7e2SAndroid Build Coastguard WorkerRange index adjustment can be reduced to below problem: 174*cc02d7e2SAndroid Build Coastguard Worker 175*cc02d7e2SAndroid Build Coastguard Worker***Given 16 bytes, replace E0 with 2, ED with 3, F0 with 3, F4 with 4, others with 0.*** 176*cc02d7e2SAndroid Build Coastguard Worker 177*cc02d7e2SAndroid Build Coastguard WorkerA naive SIMD approach: 178*cc02d7e2SAndroid Build Coastguard Worker1. Compare 16 bytes with E0, get the mask for eacy byte (FF if equal, 00 otherwise) 179*cc02d7e2SAndroid Build Coastguard Worker1. And the mask with 2 to get adjustment for E0 180*cc02d7e2SAndroid Build Coastguard Worker1. Repeat step 1,2 for ED,F0,F4 181*cc02d7e2SAndroid Build Coastguard Worker 182*cc02d7e2SAndroid Build Coastguard WorkerAt least **eight** operations are required for naive approach. 183*cc02d7e2SAndroid Build Coastguard Worker 184*cc02d7e2SAndroid Build Coastguard WorkerObserving special bytes(E0,ED,F0,F4) are close to each other, we can do much better using lookup table. 185*cc02d7e2SAndroid Build Coastguard Worker 186*cc02d7e2SAndroid Build Coastguard Worker#### NEON 187*cc02d7e2SAndroid Build Coastguard Worker 188*cc02d7e2SAndroid Build Coastguard WorkerNEON ```tbl``` instruction is very convenient for table lookup: 189*cc02d7e2SAndroid Build Coastguard Worker* Table can be up to 16x4 bytes in size 190*cc02d7e2SAndroid Build Coastguard Worker* Return zero if index is out of range 191*cc02d7e2SAndroid Build Coastguard Worker 192*cc02d7e2SAndroid Build Coastguard WorkerLeverage these features, we can solve the problem with as few as **two** operations: 193*cc02d7e2SAndroid Build Coastguard Worker* Precreate a 16x2 lookup table, where table[0]=2, table[13]=3, table[16]=3, table[20]=4, table[others]=0. 194*cc02d7e2SAndroid Build Coastguard Worker* Substract input bytes with E0 (E0 -> 0, ED -> 13, F0 -> 16, F4 -> 20). 195*cc02d7e2SAndroid Build Coastguard Worker* Use the substracted byte as index of lookup table and get range adjustment directly. 196*cc02d7e2SAndroid Build Coastguard Worker * For indices less than 32, we get zero or required adjustment value per input byte 197*cc02d7e2SAndroid Build Coastguard Worker * For out of bound indices, we get zero per ```tbl``` behaviour 198*cc02d7e2SAndroid Build Coastguard Worker 199*cc02d7e2SAndroid Build Coastguard Worker#### SSE 200*cc02d7e2SAndroid Build Coastguard Worker 201*cc02d7e2SAndroid Build Coastguard WorkerSSE ```pshufb``` instruction is not as friendly as NEON ```tbl``` in this case: 202*cc02d7e2SAndroid Build Coastguard Worker* Table can only be 16 bytes in size 203*cc02d7e2SAndroid Build Coastguard Worker* Out of bound indices are handled this way: 204*cc02d7e2SAndroid Build Coastguard Worker * If 7-th bit of index is 0, least four bits are used as index (E.g, index 0x73 returns 3rd element) 205*cc02d7e2SAndroid Build Coastguard Worker * If 7-th bit of index is 1, return 0 (E.g, index 0x83 returns 0) 206*cc02d7e2SAndroid Build Coastguard Worker 207*cc02d7e2SAndroid Build Coastguard WorkerWe can still leverage these features to solve the problem in **five** operations: 208*cc02d7e2SAndroid Build Coastguard Worker* Precreate two tables: 209*cc02d7e2SAndroid Build Coastguard Worker * table_df[1] = 2, table_df[14] = 3, table_df[others] = 0 210*cc02d7e2SAndroid Build Coastguard Worker * table_ef[1] = 3, table_ef[5] = 4, table_ef[others] = 0 211*cc02d7e2SAndroid Build Coastguard Worker* Substract input bytes with EF (E0 -> 241, ED -> 254, F0 -> 1, F4 -> 5) to get the temporary indices 212*cc02d7e2SAndroid Build Coastguard Worker* Get range index for E0,ED 213*cc02d7e2SAndroid Build Coastguard Worker * Saturate substract temporary indices with 240 (E0 -> 1, ED -> 14, all values below 240 becomes 0) 214*cc02d7e2SAndroid Build Coastguard Worker * Use substracted indices to look up table_df, get the correct adjustment 215*cc02d7e2SAndroid Build Coastguard Worker* Get range index for F0,F4 216*cc02d7e2SAndroid Build Coastguard Worker * Saturate add temporary indices with 112(0x70) (F0 -> 0x71, F4 -> 0x75, all values above 16 will be larger than 128(7-th bit set)) 217*cc02d7e2SAndroid Build Coastguard Worker * Use added indices to look up table_ef, get the correct adjustment (index 0x71,0x75 returns 1st,5th elements, per ```pshufb``` behaviour) 218*cc02d7e2SAndroid Build Coastguard Worker 219*cc02d7e2SAndroid Build Coastguard Worker#### Error handling 220*cc02d7e2SAndroid Build Coastguard Worker 221*cc02d7e2SAndroid Build Coastguard Worker* For overlapped non-ASCII First Byte, range index before adjustment is 9,10,11. After adjustment (adds 2,3,4 or 0), the range index will be 9 to 15, which is still illegal in range table. So the error will be detected. 222*cc02d7e2SAndroid Build Coastguard Worker 223*cc02d7e2SAndroid Build Coastguard Worker### Handling remaining bytes 224*cc02d7e2SAndroid Build Coastguard Worker 225*cc02d7e2SAndroid Build Coastguard WorkerFor remaining input less than 16 bytes, we will fallback to naive byte by byte approach to validate them, which is actually faster than SIMD processing. 226*cc02d7e2SAndroid Build Coastguard Worker* Look back last 16 bytes buffer to find First Byte. At most three bytes need to look back. Otherwise we either happen to be at character boundray, or there are some errors we already detected. 227*cc02d7e2SAndroid Build Coastguard Worker* Validate string byte by byte starting from the First Byte. 228*cc02d7e2SAndroid Build Coastguard Worker 229*cc02d7e2SAndroid Build Coastguard Worker## Tests 230*cc02d7e2SAndroid Build Coastguard Worker 231*cc02d7e2SAndroid Build Coastguard WorkerIt's necessary to design test cases to cover corner cases as more as possible. 232*cc02d7e2SAndroid Build Coastguard Worker 233*cc02d7e2SAndroid Build Coastguard Worker### Positive cases 234*cc02d7e2SAndroid Build Coastguard Worker 235*cc02d7e2SAndroid Build Coastguard Worker1. Prepare correct characters 236*cc02d7e2SAndroid Build Coastguard Worker2. Validate correct characters 237*cc02d7e2SAndroid Build Coastguard Worker3. Validate long strings 238*cc02d7e2SAndroid Build Coastguard Worker * Round concatenate characters starting from first character to 1024 bytes 239*cc02d7e2SAndroid Build Coastguard Worker * Validate 1024 bytes string 240*cc02d7e2SAndroid Build Coastguard Worker * Shift 1 byte, validate 1025 bytes string 241*cc02d7e2SAndroid Build Coastguard Worker * Shift 2 bytes, Validate 1026 bytes string 242*cc02d7e2SAndroid Build Coastguard Worker * ... 243*cc02d7e2SAndroid Build Coastguard Worker * Shift 16 bytes, validate 1040 bytes string 244*cc02d7e2SAndroid Build Coastguard Worker4. Repeat step3, test buffer starting from second character 245*cc02d7e2SAndroid Build Coastguard Worker5. Repeat step3, test buffer starting from third character 246*cc02d7e2SAndroid Build Coastguard Worker6. ... 247*cc02d7e2SAndroid Build Coastguard Worker 248*cc02d7e2SAndroid Build Coastguard Worker### Negative cases 249*cc02d7e2SAndroid Build Coastguard Worker 250*cc02d7e2SAndroid Build Coastguard Worker1. Prepare bad characters and bad strings 251*cc02d7e2SAndroid Build Coastguard Worker * Bad character 252*cc02d7e2SAndroid Build Coastguard Worker * Bad character cross 16 bytes boundary 253*cc02d7e2SAndroid Build Coastguard Worker * Bad character cross last 16 bytes and remaining bytes boundary 254*cc02d7e2SAndroid Build Coastguard Worker2. Test long strings 255*cc02d7e2SAndroid Build Coastguard Worker * Prepare correct long strings same as positive cases 256*cc02d7e2SAndroid Build Coastguard Worker * Append bad characters 257*cc02d7e2SAndroid Build Coastguard Worker * Shift one byte for each iteration 258*cc02d7e2SAndroid Build Coastguard Worker * Validate each shift 259*cc02d7e2SAndroid Build Coastguard Worker 260*cc02d7e2SAndroid Build Coastguard Worker## Code breakdown 261*cc02d7e2SAndroid Build Coastguard Worker 262*cc02d7e2SAndroid Build Coastguard WorkerBelow table shows how 16 bytes input are processed step by step. See [range-neon.c](range-neon.c) for according code. 263*cc02d7e2SAndroid Build Coastguard Worker 264*cc02d7e2SAndroid Build Coastguard Worker 265