Lines Matching +full:utf +full:- +full:8

49     if (NonUtf)          s.Add_OptSpaced("non-UTF8");  in PrintStatus()
84 if (allowReduced == false) - all UTF-8 character sequences must be finished.
85 if (allowReduced == true) - it allows truncated last character-Utf8-sequence
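
The two `allowReduced` modes above can be illustrated with a minimal sketch. The function name `CheckUtf8Structure` is hypothetical and is not taken from the matched source. The check is deliberately simplified: it only counts lead and continuation bytes and does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified structural check (not the library's function).
// Returns true if every multi-byte sequence is complete, or, when
// allowReduced is true, if only the LAST sequence is cut short by end of input.
static bool CheckUtf8Structure(const std::string &s, bool allowReduced)
{
  size_t i = 0;
  while (i < s.size())
  {
    const unsigned char c = (unsigned char)s[i++];
    unsigned numTail;
    if (c < 0x80) numTail = 0;                    // ASCII
    else if (c >= 0xC0 && c < 0xE0) numTail = 1;  // 2-byte lead
    else if (c >= 0xE0 && c < 0xF0) numTail = 2;  // 3-byte lead
    else if (c >= 0xF0 && c < 0xF8) numTail = 3;  // 4-byte lead
    else return false;  // stray continuation byte or invalid lead byte
    for (; numTail != 0; numTail--)
    {
      if (i == s.size())
        return allowReduced;  // input ended inside a sequence
      if (((unsigned char)s[i] & 0xC0) != 0x80)
        return false;         // expected a continuation byte
      i++;
    }
  }
  return true;
}
```

With this sketch, the truncated input `"\xE2\x82"` is accepted only when `allowReduced` is true, while the complete sequence `"\xE2\x82\xAC"` is accepted in both modes.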
100 it processes SINGLE-SURROGATE-8 as valid Unicode point.
101 it converts SINGLE-SURROGATE-8 to SINGLE-SURROGATE-16
102 Note: some sequences of two SINGLE-SURROGATE-8 points
103 will generate correct SURROGATE-16-PAIR, and
104 that SURROGATE-16-PAIR later will be converted to correct
105 UTF8-SURROGATE-21 point. So we don't restore original
106 STR-8 sequence in that case.
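
The note above (lines 102-106) can be traced with a small worked example. This is a sketch under stated assumptions, not the code from the matched file: the helper names `Decode3`, `CombineSurrogates`, `Encode4` and `RoundTrip` are hypothetical. It shows why two 3-byte surrogate encodings (6 bytes) come back as one 4-byte sequence after passing through UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode one 3-byte UTF-8 sequence into a 16-bit unit.
// Surrogate values are deliberately let through without validation.
static uint16_t Decode3(const std::string &s, size_t pos)
{
  assert(pos + 3 <= s.size());
  const unsigned b0 = (unsigned char)s[pos];
  const unsigned b1 = (unsigned char)s[pos + 1];
  const unsigned b2 = (unsigned char)s[pos + 2];
  return (uint16_t)(((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
}

// Hypothetical helper: combine a high and a low surrogate into a 21-bit point.
static uint32_t CombineSurrogates(uint16_t hi, uint16_t lo)
{
  assert(hi >= 0xD800 && hi < 0xDC00 && lo >= 0xDC00 && lo < 0xE000);
  return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

// Hypothetical helper: encode a code point above U+FFFF as 4-byte UTF-8.
static std::string Encode4(uint32_t c)
{
  assert(c >= 0x10000 && c <= 0x10FFFF);
  std::string s;
  s += (char)(0xF0 | (c >> 18));
  s += (char)(0x80 | ((c >> 12) & 0x3F));
  s += (char)(0x80 | ((c >> 6) & 0x3F));
  s += (char)(0x80 | (c & 0x3F));
  return s;
}

// Two SINGLE-SURROGATE-8 points -> SURROGATE-16-PAIR -> one 4-byte UTF-8 point.
static std::string RoundTrip(const std::string &twoSurrogates8)
{
  const uint16_t hi = Decode3(twoSurrogates8, 0);
  const uint16_t lo = Decode3(twoSurrogates8, 3);
  return Encode4(CombineSurrogates(hi, lo));
}
```

For the input bytes `ED A0 BD ED B8 80` (U+D83D followed by U+DE00), `RoundTrip` yields `F0 9F 98 80` (U+1F600), so the original 6-byte sequence is not restored.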
112 it generates ESCAPE for SINGLE-SURROGATE-8,
114 it generates U+fffd for SINGLE-SURROGATE-8,
121 it generates (U+fffd) code for non-UTF-8 (invalid) characters
125 It generates (ESCAPE) codes for NON-UTF-8 (invalid) characters.
126 And later we can restore original UTF-8-RAW characters from (ESCAPE-16-21) codes.
133 it processes ESCAPE-8 points as other Unicode points.
134 In Linux: ESCAPE-16 will mean two different ESCAPE-8 sequences,
135 so we need HIGH-ESCAPE-PLANE-21 to restore UTF-8-RAW -> UTF-16 -> UTF-8-RAW
140 it generates ESCAPE-16-21 for ESCAPE-8 points
141 so we can restore UTF-8-RAW -> UTF-16 -> UTF-8-RAW without HIGH-ESCAPE-PLANE-21.
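
The ESCAPE idea described above (lines 121-141) amounts to a reversible mapping between an invalid raw byte and a reserved code point. The sketch below only illustrates that mapping. The base value `kEscapeBase` and the function names are assumptions made for this example; the actual constant and helpers are defined in UTFConvert.cpp and do not appear in this listing.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: illustrative base inside the BMP Private Use Area.
// The real value used by the library is not shown in this listing.
static const uint32_t kEscapeBase = 0xEF00;

// Map an invalid (non-UTF-8) raw byte to its escape code point.
// Only bytes >= 0x80 can be invalid, so the escape range is 128 points wide.
static uint32_t ByteToEscape(unsigned char b)
{
  assert(b >= 0x80);
  return kEscapeBase + b;
}

// True if the code point lies in the escape range of this sketch.
static bool IsEscape(uint32_t c)
{
  return c >= kEscapeBase + 0x80 && c <= kEscapeBase + 0xFF;
}

// Restore the original raw byte from an escape code point.
static unsigned char EscapeToByte(uint32_t c)
{
  assert(IsEscape(c));
  return (unsigned char)(c - kEscapeBase);
}
```

Because the mapping is one-to-one, a raw byte such as `0xFF` survives the UTF-8-RAW -> UTF-16 -> UTF-8-RAW trip unchanged. The ambiguity mentioned at line 134 arises when valid UTF-8 input already contains a code point from the same range, which is why the comments discuss a separate high escape plane.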
145 Main USE CASES with UTF-8 <-> UTF-16 conversions:
147 WIN32: UTF-16-RAW -> UTF-8 (Archive) -> UTF-16-RAW
153 So we restore original SINGLE-SURROGATE-16 from single SINGLE-SURROGATE-8.
156 Linux: UTF-8-RAW -> UTF-16 (Intermediate / Archive) -> UTF-8-RAW
158 we want to restore original UTF-8-RAW sequence later from that ESCAPE-16.
165 MacOS: UTF-8-RAW -> UTF-16 (Intermediate / Archive) -> UTF-8-RAW
167 we want to restore correct UTF-8 without any BMP processing:
181 #define Z7_UTF_FLAG_TO_UTF8_SURROGATE_ERROR (1 << 8)
190 we extract SINGLE-SURROGATE as normal UTF-8
192 In Windows : for UTF-16-RAW <-> UTF-8 (archive) <-> UTF-16-RAW in .
195 use-case-1: UTF-8 -> UTF-16 -> UTF-8 doesn't generate UTF-16 SINGLE-SURROGATE,
197 use-case 2: UTF-16-7z (with SINGLE-SURROGATE from Windows) -> UTF-8 (Linux)
198 will generate SINGLE-SURROGATE-UTF-8 here.
204 it can be used for compatibility mode with WIN32 UTF function
205 or if we want UTF-8 stream without any errors
211 if (flag is NOT set) it doesn't extract raw 8-bit symbol from Escape-Plane-16
212 if (flag is set) it extracts raw 8-bit symbol from Escape-Plane-16
214 in Linux we need some way to extract NON-UTF8 RAW 8-bits from BMP (UTF-16 7z archive):
215 if (we use High-Escape-Plane), we can transfer BMP escapes to High-Escape-Plane.
216 if (we don't use High-Escape-Plane), we must use Z7_UTF_FLAG_TO_UTF8_EXTRACT_BMP_ESCAPE.
220 // that flag affects the code only if (wchar_t is 32-bit)
221 // that mode with high-escape can be disabled now in UTFConvert.cpp
223 it doesn't extract raw 8-bit symbol from High-Escape-Plane
225 it extracts raw 8-bit symbol from High-Escape-Plane
229 WIN32 : UTF-16-RAW -> UTF-8 (archive) -> UTF-16-RAW
233 So we restore original UTF-16-RAW.
236 Linux : UTF-8 with Escapes -> UTF-16 (7z archive) -> UTF-8 with Escapes
237 set Z7_UTF_FLAG_TO_UTF8_EXTRACT_BMP_ESCAPE to extract non-UTF from 7z archive
238 set Z7_UTF_FLAG_TO_UTF8_PARSE_HIGH_ESCAPE for intermediate UTF-16.
242 the system doesn't support incorrect UTF-8 in file names.
273 // ---------- Utf16 Little endian functions ----------
275 // We store 16-bit surrogates even in 32-bit WCHARs in Linux.
331 // printf("\nSurrogate : %4x %4x -> ", (int)c, (int)c2);