| Name | | Date | Size | #Lines | LOC |
| clinic/ | | 25-Apr-2025 | - | 574 | 473 |
| README | D | 25-Apr-2025 | 2.6 KiB | 78 | 52 |
| _codecs_cn.c | D | 25-Apr-2025 | 11 KiB | 471 | 354 |
| _codecs_hk.c | D | 25-Apr-2025 | 5 KiB | 192 | 145 |
| _codecs_iso2022.c | D | 25-Apr-2025 | 33 KiB | 1,144 | 982 |
| _codecs_jp.c | D | 25-Apr-2025 | 19.9 KiB | 761 | 625 |
| _codecs_kr.c | D | 25-Apr-2025 | 12.7 KiB | 469 | 373 |
| _codecs_tw.c | D | 25-Apr-2025 | 2.3 KiB | 144 | 100 |
| alg_jisx0201.h | D | 25-Apr-2025 | 3.1 KiB | 66 | 60 |
| cjkcodecs.h | D | 25-Apr-2025 | 13.9 KiB | 431 | 360 |
| emu_jisx0213_2000.h | D | 25-Apr-2025 | 2.7 KiB | 55 | 44 |
| mappings_cn.h | D | 25-Apr-2025 | 312.4 KiB | 4,105 | 4,092 |
| mappings_hk.h | D | 25-Apr-2025 | 179.4 KiB | 2,379 | 2,369 |
| mappings_jisx0213_pair.h | D | 25-Apr-2025 | 3.8 KiB | 61 | 57 |
| mappings_jp.h | D | 25-Apr-2025 | 357 KiB | 4,767 | 4,743 |
| mappings_kr.h | D | 25-Apr-2025 | 247.9 KiB | 3,254 | 3,246 |
| mappings_tw.h | D | 25-Apr-2025 | 198.8 KiB | 2,634 | 2,626 |
| multibytecodec.c | D | 25-Apr-2025 | 59.5 KiB | 2,082 | 1,604 |
| multibytecodec.h | D | 25-Apr-2025 | 4.5 KiB | 140 | 106 |
README
To generate or modify mapping headers
-------------------------------------
Mapping headers are generated by the Tools/unicode/genmap_*.py scripts.



Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

  The big5 codec maps the following characters the way cp950 does,
  rather than following the Unicode.org mapping, which maps them to
  0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped
  to other big5 codes, round-trip compatibility is not guaranteed for
  them.
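
  The round-trip caveat can be checked with a short Python sketch. It
  assumes a CPython build that ships the big5 codec; the helper name
  roundtrips is hypothetical and only used for illustration.

```python
def roundtrips(raw: bytes, encoding: str = "big5") -> bool:
    """Return True if decoding and re-encoding yields the same bytes."""
    return raw.decode(encoding).encode(encoding) == raw


# Two of the codes from the table above (the HANGZHOU NUMERAL rows).
for raw in (b"\xa2\xcc", b"\xa2\xce"):
    text = raw.decode("big5")
    print(
        raw.hex(),
        "->", f"U+{ord(text):04X}",
        "->", text.encode("big5").hex(),
        "round-trips:", roundtrips(raw),
    )
```

  The encoder picks the other big5 code that is already assigned to the
  same Unicode character, so the original bytes are not recovered.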


2) cp932 codec

  To conform to the mapping Windows actually uses, the cp932 codec maps
  the following code points in addition to the official cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED
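
  As a rough illustration, the table above can be checked from Python.
  The dictionary and helper names below are hypothetical; only the
  cp932 codec name comes from the standard library.

```python
# Single-byte values and the code points listed in the table above.
WINDOWS_EXTRAS = {
    0x80: 0x0080,
    0xA0: 0xF8F0,
    0xFD: 0xF8F1,
    0xFE: 0xF8F2,
    0xFF: 0xF8F3,
}


def decode_single(byte_value: int, encoding: str = "cp932") -> int:
    """Decode one byte and return the resulting code point."""
    return ord(bytes([byte_value]).decode(encoding))


for byte_value, expected in WINDOWS_EXTRAS.items():
    actual = decode_single(byte_value)
    print(f"0x{byte_value:02X} -> U+{actual:04X} (table says U+{expected:04X})")
```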


3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
  Unicode U+FF3C instead of U+005C as in unicode.org's mapping.
  Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and 0xA1C0
  is displayed as a full-width character, mapping it to U+FF3C makes
  more sense.

  The euc-jisx0213 codec can also decode JIS X 0212 codes on
  codeset 2.  Because JIS X 0212 and JIS X 0213 Plane 2 do not
  overlap each other, this does not break standard conformance
  (and JIS X 0213 Plane 2 is intended to be used this way).  When
  encoding, the codec tries to encode kanji characters in this
  order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212
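
  A minimal sketch of the U+FF3C mapping, assuming the Python codec
  name euc_jisx0213.  The variable names are illustrative only.  The
  EUC byte pair is the JIS code with the high bit set on each byte.

```python
ENC = "euc_jisx0213"

# JIS X 0213 Plane 1 code 0x2140 in EUC form: set the high bit of each byte.
euc_bytes = bytes(b | 0x80 for b in (0x21, 0x40))

fullwidth = euc_bytes.decode(ENC)   # the double-byte backslash
ascii_bs = b"\x5c".decode(ENC)      # the single-byte backslash

print(euc_bytes.hex(), "->", f"U+{ord(fullwidth):04X}")
print("5c", "->", f"U+{ord(ascii_bs):04X}")
```

  Keeping the two code points distinct means both byte sequences
  survive a decode and encode cycle.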


4) euc-jp codec

  The euc-jp codec deviates from the standard mapping for compatibility
  on these points:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (and vice versa)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)
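
  The "one way" entries can be seen in a short sketch, assuming the
  Python codec name euc_jp.  Encoding accepts the yen sign and the
  overline, but decoding the resulting bytes gives back plain ASCII.

```python
ENC = "euc_jp"

# Encoding: both characters collapse onto ASCII byte values.
yen_bytes = "\u00a5".encode(ENC)        # YEN SIGN
overline_bytes = "\u203e".encode(ENC)   # OVERLINE

# Decoding: the same bytes come back as ASCII, not as the originals.
yen_back = yen_bytes.decode(ENC)
overline_back = overline_bytes.decode(ENC)

# The full-width backslash maps in both directions.
fw_bytes = "\uff3c".encode(ENC)
fw_back = fw_bytes.decode(ENC)

print(yen_bytes, repr(yen_back))
print(overline_bytes, repr(overline_back))
print(fw_bytes.hex(), f"U+{ord(fw_back):04X}")
```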


5) shift-jis codec

  For compatibility, the shift-jis codec maps the 0x20-0x7e area
  directly to U+20-U+7E instead of using JIS X 0201.  The differences
  are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.
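
  The three points above can be sketched as follows, assuming the
  Python codec name shift_jis.  The table name is hypothetical.

```python
ENC = "shift_jis"

# Character -> bytes, as listed in the points above.
EXPECTED = {
    "\u005c": b"\x5c",      # REVERSE SOLIDUS
    "\u007e": b"\x7e",      # TILDE
    "\uff3c": b"\x81\x5f",  # FULLWIDTH REVERSE SOLIDUS
}

for char, expected in EXPECTED.items():
    encoded = char.encode(ENC)
    decoded = encoded.decode(ENC)
    print(f"U+{ord(char):04X} -> {encoded.hex()} -> U+{ord(decoded):04X}",
          "(matches the list above)" if encoded == expected else "(differs)")
```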