UTFConvert.h - OpenGrok cross reference for /aosp_15_r20/external/lzma/CPP/Common/UTFConvert.h

Lines Matching +full:utf +full:- +full:8
49     if (NonUtf)          s.Add_OptSpaced("non-UTF8");  in PrintStatus()
84 if (allowReduced == false) - all UTF-8 character sequences must be finished.
85 if (allowReduced == true)  - it allows truncated last character-Utf8-sequence
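
The allowReduced rule described in the two lines above can be illustrated with a minimal sketch. CheckUtf8Sketch is a hypothetical helper written for this note, not the function declared in UTFConvert.h. It checks only the lead-byte and continuation-byte structure, and does not reject overlong 3- and 4-byte forms or surrogate code points.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only (not the library's implementation).
// Checks lead/continuation byte structure of a UTF-8 buffer.
// allowReduced == false: every multi-byte sequence must be finished.
// allowReduced == true : a truncated sequence at the very end is accepted.
bool CheckUtf8Sketch(const char *s, size_t n, bool allowReduced)
{
  size_t i = 0;
  while (i < n)
  {
    const unsigned char c = (unsigned char)s[i++];
    if (c < 0x80)
      continue;                                   // ASCII byte
    unsigned extra;                               // continuation bytes expected
    if (c >= 0xC2 && c < 0xE0) extra = 1;         // 0xC0 and 0xC1 are always overlong
    else if (c >= 0xE0 && c < 0xF0) extra = 2;
    else if (c >= 0xF0 && c < 0xF5) extra = 3;
    else return false;                            // stray continuation or invalid lead
    for (unsigned k = 0; k < extra; k++)
    {
      if (i == n)
        return allowReduced;                      // input ended inside a sequence
      if (((unsigned char)s[i] & 0xC0) != 0x80)
        return false;                             // not a continuation byte
      i++;
    }
  }
  return true;
}
```

With this sketch, the two-byte prefix E2 82 of a three-byte sequence is rejected when allowReduced is false and accepted when it is true.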
100      it processes SINGLE-SURROGATE-8 as valid Unicode point.
101      it converts  SINGLE-SURROGATE-8 to SINGLE-SURROGATE-16
102      Note: some sequences of two SINGLE-SURROGATE-8 points
103            will generate correct SURROGATE-16-PAIR, and
104            that SURROGATE-16-PAIR later will be converted to correct
105            UTF8-SURROGATE-21 point. So we don't restore original
106            STR-8 sequence in that case.
112         it generates ESCAPE for SINGLE-SURROGATE-8,
114         it generates U+fffd for SINGLE-SURROGATE-8,
121      it generates (U+fffd) code for non-UTF-8 (invalid) characters
125      It generates (ESCAPE) codes for NON-UTF-8 (invalid) characters.
126      And later we can restore original UTF-8-RAW characters from (ESCAPE-16-21) codes.
133      it processes ESCAPE-8 points like any other Unicode points.
134      In Linux: ESCAPE-16 will mean two different ESCAPE-8 sequences,
135        so we need HIGH-ESCAPE-PLANE-21 to restore UTF-8-RAW -> UTF-16 -> UTF-8-RAW
140      it generates ESCAPE-16-21 for ESCAPE-8 points
141      so we can restore UTF-8-RAW -> UTF-16 -> UTF-8-RAW without HIGH-ESCAPE-PLANE-21.
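
The escape idea in the comments above (an invalid raw byte becomes an ESCAPE code, and the original byte is restored from that code later) can be sketched as follows. The base value 0xEF00 and all three function names are assumptions made for this illustration. The excerpt does not state the actual escape range used by UTFConvert.cpp.

```cpp
#include <cstdint>

// Assumed escape base, for illustration only; not taken from the excerpt.
const uint32_t kEscapeBaseSketch = 0xEF00;

// Map a raw non-UTF-8 byte (0x80..0xFF) to an escape code point.
inline uint32_t EscapeRawByte(uint8_t b)
{
  return kEscapeBaseSketch + b;
}

// True if c lies in the range produced by EscapeRawByte for bytes 0x80..0xFF.
inline bool IsEscapeCode(uint32_t c)
{
  return c >= kEscapeBaseSketch + 0x80 && c <= kEscapeBaseSketch + 0xFF;
}

// Recover the original raw byte from an escape code point.
inline uint8_t UnescapeToRawByte(uint32_t c)
{
  return (uint8_t)(c - kEscapeBaseSketch);
}
```

The sketch also shows why the comments distinguish ESCAPE-8 input: if the input already contains a valid UTF-8 encoding of a code point inside the escape range, a later unescape step cannot tell it apart from an escaped raw byte unless such points are themselves escaped or moved to a separate plane.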
145 Main USE CASES with UTF-8 <-> UTF-16 conversions:
147  WIN32:   UTF-16-RAW -> UTF-8 (Archive) -> UTF-16-RAW
153      So we restore original SINGLE-SURROGATE-16 from single SINGLE-SURROGATE-8.
156  Linux:   UTF-8-RAW -> UTF-16 (Intermediate / Archive) -> UTF-8-RAW
158      we want to restore original UTF-8-RAW sequence later from that ESCAPE-16.
165  MacOS:   UTF-8-RAW -> UTF-16 (Intermediate / Archive) -> UTF-8-RAW
167      we want to restore correct UTF-8 without any BMP processing:
181 #define Z7_UTF_FLAG_TO_UTF8_SURROGATE_ERROR    (1 << 8)
190      we extract SINGLE-SURROGATE as normal UTF-8
192      In Windows : for UTF-16-RAW <-> UTF-8 (archive) <-> UTF-16-RAW in .
195        use-case-1: UTF-8 -> UTF-16 -> UTF-8  doesn't generate UTF-16 SINGLE-SURROGATE,
197        use-case 2: UTF-16-7z (with SINGLE-SURROGATE from Windows) -> UTF-8 (Linux)
198                    will generate SINGLE-SURROGATE-UTF-8 here.
204      it can be used for compatibility mode with WIN32 UTF function
205      or if we want UTF-8 stream without any errors
211   if (flag is NOT set) it doesn't extract  raw 8-bit symbol from Escape-Plane-16
212   if (flag is set)     it         extracts raw 8-bit symbol from Escape-Plane-16
214   in Linux we need some way to extract NON-UTF8 RAW 8-bits from BMP (UTF-16 7z archive):
215   if (we       use High-Escape-Plane), we can transfer BMP escapes to High-Escape-Plane.
216   if (we don't use High-Escape-Plane), we must use Z7_UTF_FLAG_TO_UTF8_EXTRACT_BMP_ESCAPE.
220   // that flag affects the code only if (wchar_t is 32-bit)
221   // that mode with high-escape can be disabled now in UTFConvert.cpp
223      it doesn't extract raw 8-bit symbol from High-Escape-Plane
225      it        extracts raw 8-bit symbol from High-Escape-Plane
229 WIN32 : UTF-16-RAW -> UTF-8 (archive) -> UTF-16-RAW
233      So we restore original UTF-16-RAW.
236 Linux : UTF-8 with Escapes -> UTF-16 (7z archive) -> UTF-8 with Escapes
237      set Z7_UTF_FLAG_TO_UTF8_EXTRACT_BMP_ESCAPE to extract non-UTF from 7z archive
238      set Z7_UTF_FLAG_TO_UTF8_PARSE_HIGH_ESCAPE for intermediate UTF-16.
242      the system doesn't support incorrect UTF-8 in file names.
273 // ---------- Utf16 Little endian functions ----------
275 // We store 16-bit surrogates even in 32-bit WCHARs in Linux.
331         // printf("\nSurrogate : %4x %4x -> ", (int)c, (int)c2);
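
For the surrogate handling mentioned in the last two hits, the standard UTF-16 arithmetic for joining a high surrogate c and a low surrogate c2 into one code point is sketched below. The function names are hypothetical and the code is not taken from UTFConvert.h. Only the formula itself is fixed by the UTF-16 definition.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.
inline bool IsHighSurrogate(uint32_t c) { return c >= 0xD800 && c < 0xDC00; }
inline bool IsLowSurrogate(uint32_t c)  { return c >= 0xDC00 && c < 0xE000; }

// Combine a surrogate pair into a code point in U+10000..U+10FFFF.
// The caller must first check IsHighSurrogate(c) and IsLowSurrogate(c2).
inline uint32_t CombineSurrogates(uint32_t c, uint32_t c2)
{
  return 0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00);
}
```

A value that passes only one of the two checks is a single surrogate, which is the case the flags quoted earlier in this listing decide how to handle.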