xref: /aosp_15_r20/external/grpc-grpc/third_party/utf8_range/README.md (revision cc02d7e222339f7a4f6ba5f422e6413f4bd931f2)
1*cc02d7e2SAndroid Build Coastguard Worker[![Build Status](https://travis-ci.com/cyb70289/utf8.svg?branch=master)](https://travis-ci.com/cyb70289/utf8)
2*cc02d7e2SAndroid Build Coastguard Worker
3*cc02d7e2SAndroid Build Coastguard Worker# Fast UTF-8 validation with Range algorithm (NEON+SSE4+AVX2)
4*cc02d7e2SAndroid Build Coastguard Worker
5*cc02d7e2SAndroid Build Coastguard WorkerThis is a brand new algorithm to leverage SIMD for fast UTF-8 string validation. Both **NEON**(armv8a) and **SSE4** versions are implemented. **AVX2** implementation contributed by [ioioioio](https://github.com/ioioioio).
6*cc02d7e2SAndroid Build Coastguard Worker
7*cc02d7e2SAndroid Build Coastguard WorkerFour UTF-8 validation methods are compared on both x86 and Arm platforms. Benchmark result shows range base algorithm is the best solution on Arm, and achieves same performance as [Lemire's approach](https://lemire.me/blog/2018/05/16/validating-utf-8-strings-using-as-little-as-0-7-cycles-per-byte/) on x86.
8*cc02d7e2SAndroid Build Coastguard Worker
9*cc02d7e2SAndroid Build Coastguard Worker* Range based algorithm
10*cc02d7e2SAndroid Build Coastguard Worker  * range-neon.c: NEON version
11*cc02d7e2SAndroid Build Coastguard Worker  * range-sse.c: SSE4 version
12*cc02d7e2SAndroid Build Coastguard Worker  * range-avx2.c: AVX2 version
13*cc02d7e2SAndroid Build Coastguard Worker  * range2-neon.c, range2-sse.c: Process two blocks in one iteration
14*cc02d7e2SAndroid Build Coastguard Worker* [Lemire's SIMD implementation](https://github.com/lemire/fastvalidate-utf-8)
15*cc02d7e2SAndroid Build Coastguard Worker  * lemire-sse.c: SSE4 version
16*cc02d7e2SAndroid Build Coastguard Worker  * lemire-avx2.c: AVX2 version
17*cc02d7e2SAndroid Build Coastguard Worker  * lemire-neon.c: NEON porting
18*cc02d7e2SAndroid Build Coastguard Worker* naive.c: Naive UTF-8 validation byte by byte
19*cc02d7e2SAndroid Build Coastguard Worker* lookup.c: [Lookup-table method](http://bjoern.hoehrmann.de/utf-8/decoder/dfa/)
20*cc02d7e2SAndroid Build Coastguard Worker
21*cc02d7e2SAndroid Build Coastguard Worker## About the code
22*cc02d7e2SAndroid Build Coastguard Worker
23*cc02d7e2SAndroid Build Coastguard Worker* Run "make" to build. Built and tested with gcc-7.3.
24*cc02d7e2SAndroid Build Coastguard Worker* Run "./utf8" to see all command line options.
25*cc02d7e2SAndroid Build Coastguard Worker* Benchmark
26*cc02d7e2SAndroid Build Coastguard Worker  * Run "./utf8 bench" to bechmark all algorithms with [default test file](https://raw.githubusercontent.com/cyb70289/utf8/master/UTF-8-demo.txt).
27*cc02d7e2SAndroid Build Coastguard Worker  * Run "./utf8 bench size NUM" to benchmark specified string size.
28*cc02d7e2SAndroid Build Coastguard Worker* Run "./utf8 test" to test all algorithms with positive and negative test cases.
29*cc02d7e2SAndroid Build Coastguard Worker* To benchmark or test specific algorithm, run something like "./utf8 bench range".
30*cc02d7e2SAndroid Build Coastguard Worker
31*cc02d7e2SAndroid Build Coastguard Worker## Benchmark result (MB/s)
32*cc02d7e2SAndroid Build Coastguard Worker
33*cc02d7e2SAndroid Build Coastguard Worker### Method
34*cc02d7e2SAndroid Build Coastguard Worker1. Generate UTF-8 test buffer per [test file](https://raw.githubusercontent.com/cyb70289/utf8/master/UTF-8-demo.txt) or buffer size.
35*cc02d7e2SAndroid Build Coastguard Worker1. Call validation sub-routines in a loop until 1G bytes are checked.
36*cc02d7e2SAndroid Build Coastguard Worker1. Calculate speed(MB/s) of validating UTF-8 strings.
37*cc02d7e2SAndroid Build Coastguard Worker
38*cc02d7e2SAndroid Build Coastguard Worker### NEON(armv8a)
39*cc02d7e2SAndroid Build Coastguard WorkerTest case | naive | lookup | lemire | range | range2
40*cc02d7e2SAndroid Build Coastguard Worker:-------- | :---- | :----- | :----- | :---- | :-----
41*cc02d7e2SAndroid Build Coastguard Worker[UTF-demo.txt](https://raw.githubusercontent.com/cyb70289/utf8/master/UTF-8-demo.txt) | 562.25 | 412.84 | 1198.50 | 1411.72 | **1579.85**
42*cc02d7e2SAndroid Build Coastguard Worker32 bytes | 651.55 | 441.70 | 891.38 | 1003.95 | **1043.58**
43*cc02d7e2SAndroid Build Coastguard Worker33 bytes | 660.00 | 446.78 | 588.77 | 1009.31 | **1048.12**
44*cc02d7e2SAndroid Build Coastguard Worker129 bytes | 771.89 | 402.55 | 938.07 | 1283.77 | **1401.76**
45*cc02d7e2SAndroid Build Coastguard Worker1K bytes | 811.92 | 411.58 | 1188.96 | 1398.15 | **1560.23**
46*cc02d7e2SAndroid Build Coastguard Worker8K bytes | 812.25  | 412.74 | 1198.90 | 1412.18 | **1580.65**
47*cc02d7e2SAndroid Build Coastguard Worker64K bytes | 817.35 | 412.24 | 1200.20 | 1415.11 | **1583.86**
48*cc02d7e2SAndroid Build Coastguard Worker1M bytes | 815.70  | 411.93 | 1200.93 | 1415.65 | **1585.40**
49*cc02d7e2SAndroid Build Coastguard Worker
50*cc02d7e2SAndroid Build Coastguard Worker### SSE4(E5-2650)
51*cc02d7e2SAndroid Build Coastguard WorkerTest case | naive | lookup | lemire | range | range2
52*cc02d7e2SAndroid Build Coastguard Worker:-------- | :---- | :----- | :----- | :---- | :-----
53*cc02d7e2SAndroid Build Coastguard Worker[UTF-demo.txt](https://raw.githubusercontent.com/cyb70289/utf8/master/UTF-8-demo.txt) | 753.70 | 310.41 | 3954.74 | 3945.60 | **3986.13**
54*cc02d7e2SAndroid Build Coastguard Worker32 bytes | 1135.76 | 364.07 | **2890.52** | 2351.81 | 2173.02
55*cc02d7e2SAndroid Build Coastguard Worker33 bytes | 1161.85 | 376.29 | 1352.95 | **2239.55** | 2041.43
56*cc02d7e2SAndroid Build Coastguard Worker129 bytes | 1161.22 | 322.47 | 2742.49 | **3315.33** | 3249.35
57*cc02d7e2SAndroid Build Coastguard Worker1K bytes | 1310.95 | 310.72 | 3755.88 | 3781.23 | **3874.17**
58*cc02d7e2SAndroid Build Coastguard Worker8K bytes | 1348.32 | 307.93 | 3860.71 | 3922.81 | **3968.93**
59*cc02d7e2SAndroid Build Coastguard Worker64K bytes | 1301.34 | 308.39 | 3935.15 | 3973.50 | **3983.44**
60*cc02d7e2SAndroid Build Coastguard Worker1M bytes | 1279.78 | 309.06 | 3923.51 | 3953.00 | **3960.49**
61*cc02d7e2SAndroid Build Coastguard Worker
62*cc02d7e2SAndroid Build Coastguard Worker## Range algorithm analysis
63*cc02d7e2SAndroid Build Coastguard Worker
64*cc02d7e2SAndroid Build Coastguard WorkerBasic idea:
65*cc02d7e2SAndroid Build Coastguard Worker* Load 16 bytes
66*cc02d7e2SAndroid Build Coastguard Worker* Leverage SIMD to calculate value range for each byte efficiently
67*cc02d7e2SAndroid Build Coastguard Worker* Validate 16 bytes at once
68*cc02d7e2SAndroid Build Coastguard Worker
69*cc02d7e2SAndroid Build Coastguard Worker### UTF-8 coding format
70*cc02d7e2SAndroid Build Coastguard Worker
71*cc02d7e2SAndroid Build Coastguard Workerhttp://www.unicode.org/versions/Unicode6.0.0/ch03.pdf, page 94
72*cc02d7e2SAndroid Build Coastguard Worker
73*cc02d7e2SAndroid Build Coastguard WorkerTable 3-7. Well-Formed UTF-8 Byte Sequences
74*cc02d7e2SAndroid Build Coastguard Worker
75*cc02d7e2SAndroid Build Coastguard WorkerCode Points        | First Byte | Second Byte | Third Byte | Fourth Byte |
76*cc02d7e2SAndroid Build Coastguard Worker:----------        | :--------- | :---------- | :--------- | :---------- |
77*cc02d7e2SAndroid Build Coastguard WorkerU+0000..U+007F     | 00..7F     |             |            |             |
78*cc02d7e2SAndroid Build Coastguard WorkerU+0080..U+07FF     | C2..DF     | 80..BF      |            |             |
79*cc02d7e2SAndroid Build Coastguard WorkerU+0800..U+0FFF     | E0         | ***A0***..BF| 80..BF     |             |
80*cc02d7e2SAndroid Build Coastguard WorkerU+1000..U+CFFF     | E1..EC     | 80..BF      | 80..BF     |             |
81*cc02d7e2SAndroid Build Coastguard WorkerU+D000..U+D7FF     | ED         | 80..***9F***| 80..BF     |             |
82*cc02d7e2SAndroid Build Coastguard WorkerU+E000..U+FFFF     | EE..EF     | 80..BF      | 80..BF     |             |
83*cc02d7e2SAndroid Build Coastguard WorkerU+10000..U+3FFFF   | F0         | ***90***..BF| 80..BF     | 80..BF      |
84*cc02d7e2SAndroid Build Coastguard WorkerU+40000..U+FFFFF   | F1..F3     | 80..BF      | 80..BF     | 80..BF      |
85*cc02d7e2SAndroid Build Coastguard WorkerU+100000..U+10FFFF | F4         | 80..***8F***| 80..BF     | 80..BF      |
86*cc02d7e2SAndroid Build Coastguard Worker
87*cc02d7e2SAndroid Build Coastguard WorkerTo summarise UTF-8 encoding:
88*cc02d7e2SAndroid Build Coastguard Worker* Depending on First Byte, one legal character can be 1, 2, 3, 4 bytes
89*cc02d7e2SAndroid Build Coastguard Worker  * For First Byte within C0..DF, character length = 2
90*cc02d7e2SAndroid Build Coastguard Worker  * For First Byte within E0..EF, character length = 3
91*cc02d7e2SAndroid Build Coastguard Worker  * For First Byte within F0..F4, character length = 4
92*cc02d7e2SAndroid Build Coastguard Worker* C0, C1, F5..FF are not allowed
93*cc02d7e2SAndroid Build Coastguard Worker* Second,Third,Fourth Bytes must lie in 80..BF.
94*cc02d7e2SAndroid Build Coastguard Worker* There are four **special cases** for Second Byte, shown ***bold italic*** in above table.
95*cc02d7e2SAndroid Build Coastguard Worker
96*cc02d7e2SAndroid Build Coastguard Worker### Range table
97*cc02d7e2SAndroid Build Coastguard Worker
98*cc02d7e2SAndroid Build Coastguard WorkerRange table maps range index 0 ~ 15 to minimal and maximum values allowed. Our task is to observe input string, find the pattern and set correct range index for each byte, then validate input string.
99*cc02d7e2SAndroid Build Coastguard Worker
100*cc02d7e2SAndroid Build Coastguard WorkerIndex      | Min | Max | Byte type
101*cc02d7e2SAndroid Build Coastguard Worker:----      | :-- | :-- | :--------
102*cc02d7e2SAndroid Build Coastguard Worker0          | 00  | 7F  | First Byte, ASCII
103*cc02d7e2SAndroid Build Coastguard Worker1,2,3      | 80  | BF  | Second, Third, Fourth Bytes
104*cc02d7e2SAndroid Build Coastguard Worker4          | A0  | BF  | Second Byte after E0
105*cc02d7e2SAndroid Build Coastguard Worker5          | 80  | 9F  | Second Byte after ED
106*cc02d7e2SAndroid Build Coastguard Worker6          | 90  | BF  | Second Byte after F0
107*cc02d7e2SAndroid Build Coastguard Worker7          | 80  | 8F  | Second Byte after F4
108*cc02d7e2SAndroid Build Coastguard Worker8          | C2  | F4  | First Byte, non-ASCII
109*cc02d7e2SAndroid Build Coastguard Worker9..15(NEON) | FF  | 00  | Illegal: unsigned char >= 255 && unsigned char <= 0
110*cc02d7e2SAndroid Build Coastguard Worker9..15(SSE)  | 7F  | 80  | Illegal: signed char >= 127 && signed char <= -128
111*cc02d7e2SAndroid Build Coastguard Worker
112*cc02d7e2SAndroid Build Coastguard Worker### Calculate byte ranges (ignore special cases)
113*cc02d7e2SAndroid Build Coastguard Worker
114*cc02d7e2SAndroid Build Coastguard WorkerIgnoring the four special cases(E0,ED,F0,F4), how should we set range index for each byte?
115*cc02d7e2SAndroid Build Coastguard Worker
116*cc02d7e2SAndroid Build Coastguard Worker* Set range index to 0(00..7F) for all bytes by default
117*cc02d7e2SAndroid Build Coastguard Worker* Find non-ASCII First Byte (C0..FF), set their range index to 8(C2..F4)
118*cc02d7e2SAndroid Build Coastguard Worker* For First Byte within C0..DF, set next byte's range index to 1(80..BF)
119*cc02d7e2SAndroid Build Coastguard Worker* For First Byte within E0..EF, set next two byte's range index to 2,1(80..BF) in sequence
120*cc02d7e2SAndroid Build Coastguard Worker* For First Byte within F0..FF, set next three byte's range index to 3,2,1(80..BF) in sequence
121*cc02d7e2SAndroid Build Coastguard Worker
122*cc02d7e2SAndroid Build Coastguard WorkerTo implement above operations efficiently with SIMD:
123*cc02d7e2SAndroid Build Coastguard Worker* For 16 input bytes, use lookup table to map C0..DF to 1, E0..EF to 2, F0..FF to 3, others to 0. Save to first_len.
124*cc02d7e2SAndroid Build Coastguard Worker* Map C0..FF to 8, we get range indices for First Byte.
125*cc02d7e2SAndroid Build Coastguard Worker* Shift first_len one byte, we get range indices for Second Byte.
126*cc02d7e2SAndroid Build Coastguard Worker* Saturate substract first_len by one(3->2, 2->1, 1->0, 0->0), then shift two bytes, we get range indices for Third Byte.
127*cc02d7e2SAndroid Build Coastguard Worker* Saturate substract first_len by two(3->1, 2->0, 1->0, 0->0), then shift three bytes, we get range indices for Fourth Byte.
128*cc02d7e2SAndroid Build Coastguard Worker
129*cc02d7e2SAndroid Build Coastguard WorkerExample(assume no previous data)
130*cc02d7e2SAndroid Build Coastguard Worker
131*cc02d7e2SAndroid Build Coastguard WorkerInput       | F1 | 80 | 80 | 80 | 80 | C2 | 80 | 80 | ...
132*cc02d7e2SAndroid Build Coastguard Worker:----       | :- | :- | :- | :- | :- | :- | :- | :- | :--
133*cc02d7e2SAndroid Build Coastguard Worker*first_len* |*3* |*0* |*0* |*0* |*0* |*1* |*0* |*0* |*...*
134*cc02d7e2SAndroid Build Coastguard WorkerFirst Byte  | 8  | 0  | 0  | 0  | 0  | 8  | 0  | 0  | ...
135*cc02d7e2SAndroid Build Coastguard WorkerSecond Byte | 0  | 3  | 0  | 0  | 0  | 0  | 1  | 0  | ...
136*cc02d7e2SAndroid Build Coastguard WorkerThird Byte  | 0  | 0  | 2  | 0  | 0  | 0  | 0  | 0  | ...
137*cc02d7e2SAndroid Build Coastguard WorkerFourth Byte | 0  | 0  | 0  | 1  | 0  | 0  | 0  | 0  | ...
138*cc02d7e2SAndroid Build Coastguard WorkerRange index | 8  | 3  | 2  | 1  | 0  | 8  | 1  | 0  | ...
139*cc02d7e2SAndroid Build Coastguard Worker
140*cc02d7e2SAndroid Build Coastguard Worker```c
141*cc02d7e2SAndroid Build Coastguard WorkerRange_index = First_Byte | Second_Byte | Third_Byte | Fourth_Byte
142*cc02d7e2SAndroid Build Coastguard Worker```
143*cc02d7e2SAndroid Build Coastguard Worker
144*cc02d7e2SAndroid Build Coastguard Worker#### Error handling
145*cc02d7e2SAndroid Build Coastguard Worker
146*cc02d7e2SAndroid Build Coastguard Worker* C0,C1,F5..FF are not included in range table and will always be detected.
147*cc02d7e2SAndroid Build Coastguard Worker* Illegal 80..BF will have range index 0(00..7F) and be detected.
148*cc02d7e2SAndroid Build Coastguard Worker* Based on First Byte, according Second, Third and Fourth Bytes will have range index 1/2/3, to make sure they must lie in 80..BF.
149*cc02d7e2SAndroid Build Coastguard Worker* If non-ASCII First Byte overlaps, above algorithm will set range index of the latter First Byte to 9,10,11, which are illegal ranges. E.g, Input = F1 80 C2 90 --> Range index = 8 3 10 1, where 10 indicates error. See table below.
150*cc02d7e2SAndroid Build Coastguard Worker
151*cc02d7e2SAndroid Build Coastguard WorkerOverlapped non-ASCII First Byte
152*cc02d7e2SAndroid Build Coastguard Worker
153*cc02d7e2SAndroid Build Coastguard WorkerInput       | F1 | 80 | C2 | 90
154*cc02d7e2SAndroid Build Coastguard Worker:----       | :- | :- | :- | :-
155*cc02d7e2SAndroid Build Coastguard Worker*first_len* |*3* |*0* |*1* |*0*
156*cc02d7e2SAndroid Build Coastguard WorkerFirst Byte  | 8  | 0  | 8  | 0
157*cc02d7e2SAndroid Build Coastguard WorkerSecond Byte | 0  | 3  | 0  | 1
158*cc02d7e2SAndroid Build Coastguard WorkerThird Byte  | 0  | 0  | 2  | 0
159*cc02d7e2SAndroid Build Coastguard WorkerFourth Byte | 0  | 0  | 0  | 1
160*cc02d7e2SAndroid Build Coastguard WorkerRange index | 8  | 3  |***10***| 1
161*cc02d7e2SAndroid Build Coastguard Worker
162*cc02d7e2SAndroid Build Coastguard Worker### Adjust Second Byte range for special cases
163*cc02d7e2SAndroid Build Coastguard Worker
164*cc02d7e2SAndroid Build Coastguard WorkerRange index adjustment for four special cases
165*cc02d7e2SAndroid Build Coastguard Worker
166*cc02d7e2SAndroid Build Coastguard WorkerFirst Byte | Second Byte | Before adjustment | Correct index | Adjustment |
167*cc02d7e2SAndroid Build Coastguard Worker:--------- | :---------- | :---------------- | :------------ | :---------
168*cc02d7e2SAndroid Build Coastguard WorkerE0         | A0..BF      | 2                 | 4             | **2**
169*cc02d7e2SAndroid Build Coastguard WorkerED         | 80..9F      | 2                 | 5             | **3**
170*cc02d7e2SAndroid Build Coastguard WorkerF0         | 90..BF      | 3                 | 6             | **3**
171*cc02d7e2SAndroid Build Coastguard WorkerF4         | 80..8F      | 3                 | 7             | **4**
172*cc02d7e2SAndroid Build Coastguard Worker
173*cc02d7e2SAndroid Build Coastguard WorkerRange index adjustment can be reduced to below problem:
174*cc02d7e2SAndroid Build Coastguard Worker
175*cc02d7e2SAndroid Build Coastguard Worker***Given 16 bytes, replace E0 with 2, ED with 3, F0 with 3, F4 with 4, others with 0.***
176*cc02d7e2SAndroid Build Coastguard Worker
177*cc02d7e2SAndroid Build Coastguard WorkerA naive SIMD approach:
178*cc02d7e2SAndroid Build Coastguard Worker1. Compare 16 bytes with E0, get the mask for eacy byte (FF if equal, 00 otherwise)
179*cc02d7e2SAndroid Build Coastguard Worker1. And the mask with 2 to get adjustment for E0
180*cc02d7e2SAndroid Build Coastguard Worker1. Repeat step 1,2 for ED,F0,F4
181*cc02d7e2SAndroid Build Coastguard Worker
182*cc02d7e2SAndroid Build Coastguard WorkerAt least **eight** operations are required for naive approach.
183*cc02d7e2SAndroid Build Coastguard Worker
184*cc02d7e2SAndroid Build Coastguard WorkerObserving special bytes(E0,ED,F0,F4) are close to each other, we can do much better using lookup table.
185*cc02d7e2SAndroid Build Coastguard Worker
186*cc02d7e2SAndroid Build Coastguard Worker#### NEON
187*cc02d7e2SAndroid Build Coastguard Worker
188*cc02d7e2SAndroid Build Coastguard WorkerNEON ```tbl``` instruction is very convenient for table lookup:
189*cc02d7e2SAndroid Build Coastguard Worker* Table can be up to 16x4 bytes in size
190*cc02d7e2SAndroid Build Coastguard Worker* Return zero if index is out of range
191*cc02d7e2SAndroid Build Coastguard Worker
192*cc02d7e2SAndroid Build Coastguard WorkerLeverage these features, we can solve the problem with as few as **two** operations:
193*cc02d7e2SAndroid Build Coastguard Worker* Precreate a 16x2 lookup table, where table[0]=2, table[13]=3, table[16]=3, table[20]=4, table[others]=0.
194*cc02d7e2SAndroid Build Coastguard Worker* Substract input bytes with E0 (E0 -> 0, ED -> 13, F0 -> 16, F4 -> 20).
195*cc02d7e2SAndroid Build Coastguard Worker* Use the substracted byte as index of lookup table and get range adjustment directly.
196*cc02d7e2SAndroid Build Coastguard Worker  * For indices less than 32, we get zero or required adjustment value per input byte
197*cc02d7e2SAndroid Build Coastguard Worker  * For out of bound indices, we get zero per ```tbl``` behaviour
198*cc02d7e2SAndroid Build Coastguard Worker
199*cc02d7e2SAndroid Build Coastguard Worker#### SSE
200*cc02d7e2SAndroid Build Coastguard Worker
201*cc02d7e2SAndroid Build Coastguard WorkerSSE ```pshufb``` instruction is not as friendly as NEON ```tbl``` in this case:
202*cc02d7e2SAndroid Build Coastguard Worker* Table can only be 16 bytes in size
203*cc02d7e2SAndroid Build Coastguard Worker* Out of bound indices are handled this way:
204*cc02d7e2SAndroid Build Coastguard Worker  * If 7-th bit of index is 0, least four bits are used as index (E.g, index 0x73 returns 3rd element)
205*cc02d7e2SAndroid Build Coastguard Worker  * If 7-th bit of index is 1, return 0 (E.g, index 0x83 returns 0)
206*cc02d7e2SAndroid Build Coastguard Worker
207*cc02d7e2SAndroid Build Coastguard WorkerWe can still leverage these features to solve the problem in **five** operations:
208*cc02d7e2SAndroid Build Coastguard Worker* Precreate two tables:
209*cc02d7e2SAndroid Build Coastguard Worker  * table_df[1] = 2, table_df[14] = 3, table_df[others] = 0
210*cc02d7e2SAndroid Build Coastguard Worker  * table_ef[1] = 3, table_ef[5] = 4, table_ef[others] = 0
211*cc02d7e2SAndroid Build Coastguard Worker* Substract input bytes with EF (E0 -> 241, ED -> 254, F0 -> 1, F4 -> 5) to get the temporary indices
212*cc02d7e2SAndroid Build Coastguard Worker* Get range index for E0,ED
213*cc02d7e2SAndroid Build Coastguard Worker  * Saturate substract temporary indices with 240 (E0 -> 1, ED -> 14, all values below 240 becomes 0)
214*cc02d7e2SAndroid Build Coastguard Worker  * Use substracted indices to look up table_df, get the correct adjustment
215*cc02d7e2SAndroid Build Coastguard Worker* Get range index for F0,F4
216*cc02d7e2SAndroid Build Coastguard Worker  * Saturate add temporary indices with 112(0x70) (F0 -> 0x71, F4 -> 0x75, all values above 16 will be larger than 128(7-th bit set))
217*cc02d7e2SAndroid Build Coastguard Worker  * Use added indices to look up table_ef, get the correct adjustment (index 0x71,0x75 returns 1st,5th elements, per ```pshufb``` behaviour)
218*cc02d7e2SAndroid Build Coastguard Worker
219*cc02d7e2SAndroid Build Coastguard Worker#### Error handling
220*cc02d7e2SAndroid Build Coastguard Worker
221*cc02d7e2SAndroid Build Coastguard Worker* For overlapped non-ASCII First Byte, range index before adjustment is 9,10,11. After adjustment (adds 2,3,4 or 0), the range index will be 9 to 15, which is still illegal in range table. So the error will be detected.
222*cc02d7e2SAndroid Build Coastguard Worker
223*cc02d7e2SAndroid Build Coastguard Worker### Handling remaining bytes
224*cc02d7e2SAndroid Build Coastguard Worker
225*cc02d7e2SAndroid Build Coastguard WorkerFor remaining input less than 16 bytes, we will fallback to naive byte by byte approach to validate them, which is actually faster than SIMD processing.
226*cc02d7e2SAndroid Build Coastguard Worker* Look back last 16 bytes buffer to find First Byte. At most three bytes need to look back. Otherwise we either happen to be at character boundray, or there are some errors we already detected.
227*cc02d7e2SAndroid Build Coastguard Worker* Validate string byte by byte starting from the First Byte.
228*cc02d7e2SAndroid Build Coastguard Worker
229*cc02d7e2SAndroid Build Coastguard Worker## Tests
230*cc02d7e2SAndroid Build Coastguard Worker
231*cc02d7e2SAndroid Build Coastguard WorkerIt's necessary to design test cases to cover corner cases as more as possible.
232*cc02d7e2SAndroid Build Coastguard Worker
233*cc02d7e2SAndroid Build Coastguard Worker### Positive cases
234*cc02d7e2SAndroid Build Coastguard Worker
235*cc02d7e2SAndroid Build Coastguard Worker1. Prepare correct characters
236*cc02d7e2SAndroid Build Coastguard Worker2. Validate correct characters
237*cc02d7e2SAndroid Build Coastguard Worker3. Validate long strings
238*cc02d7e2SAndroid Build Coastguard Worker   * Round concatenate characters starting from first character to 1024 bytes
239*cc02d7e2SAndroid Build Coastguard Worker   * Validate 1024 bytes string
240*cc02d7e2SAndroid Build Coastguard Worker   * Shift 1 byte, validate 1025 bytes string
241*cc02d7e2SAndroid Build Coastguard Worker   * Shift 2 bytes, Validate 1026 bytes string
242*cc02d7e2SAndroid Build Coastguard Worker   * ...
243*cc02d7e2SAndroid Build Coastguard Worker   * Shift 16 bytes, validate 1040 bytes string
244*cc02d7e2SAndroid Build Coastguard Worker4. Repeat step3, test buffer starting from second character
245*cc02d7e2SAndroid Build Coastguard Worker5. Repeat step3, test buffer starting from third character
246*cc02d7e2SAndroid Build Coastguard Worker6. ...
247*cc02d7e2SAndroid Build Coastguard Worker
248*cc02d7e2SAndroid Build Coastguard Worker### Negative cases
249*cc02d7e2SAndroid Build Coastguard Worker
250*cc02d7e2SAndroid Build Coastguard Worker1. Prepare bad characters and bad strings
251*cc02d7e2SAndroid Build Coastguard Worker   * Bad character
252*cc02d7e2SAndroid Build Coastguard Worker   * Bad character cross 16 bytes boundary
253*cc02d7e2SAndroid Build Coastguard Worker   * Bad character cross last 16 bytes and remaining bytes boundary
254*cc02d7e2SAndroid Build Coastguard Worker2. Test long strings
255*cc02d7e2SAndroid Build Coastguard Worker   * Prepare correct long strings same as positive cases
256*cc02d7e2SAndroid Build Coastguard Worker   * Append bad characters
257*cc02d7e2SAndroid Build Coastguard Worker   * Shift one byte for each iteration
258*cc02d7e2SAndroid Build Coastguard Worker   * Validate each shift
259*cc02d7e2SAndroid Build Coastguard Worker
260*cc02d7e2SAndroid Build Coastguard Worker## Code breakdown
261*cc02d7e2SAndroid Build Coastguard Worker
262*cc02d7e2SAndroid Build Coastguard WorkerBelow table shows how 16 bytes input are processed step by step. See [range-neon.c](range-neon.c) for according code.
263*cc02d7e2SAndroid Build Coastguard Worker
264*cc02d7e2SAndroid Build Coastguard Worker![Range based UTF-8 validation algorithm](https://raw.githubusercontent.com/cyb70289/utf8/master/range.png)
265