Name: icu
URL: https://github.com/unicode-org/icu
Version: 74-2
CPEPrefix: cpe:/a:icu-project:international_components_for_unicode:74.2
License: MIT
License File: LICENSE
Security Critical: yes
Shipped: yes

Description:
This directory contains the source code of ICU 74.2 for C/C++.

A. How to update ICU

1. Run "scripts/update.sh <version>" (e.g. 74-2).
   This will download ICU from the upstream git repository.
   It preserves Chrome-specific build files and
   converter files (see section C).

   source.gni and icu.gyp* files are automatically updated, too.

2. Review and apply patches/changes in "D. Local Modifications" if
   necessary/applicable. Update patch files in patches/.

3. Follow the instructions in section B on building ICU data files.
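
The same release is written in different ways in this file: "74-2" as the
update.sh argument and Version field, "74.2" in the Description and CPEPrefix
lines. Upstream git tags use the form release-74-2. The sketch below only
converts between these spellings. The function names are hypothetical and are
not part of scripts/update.sh.

```shell
# Sketch only: helper names are made up for illustration.

# 74-2 -> release-74-2 (form used by upstream ICU git tags)
icu_release_tag() {
  printf 'release-%s\n' "$1"
}

# 74-2 -> 74.2 (dotted form used in the Description and CPEPrefix lines)
icu_dotted_version() {
  printf '%s\n' "$1" | sed 's/-/./g'
}

icu_release_tag 74-2
icu_dotted_version 74-2
```

This can help when checking that the Version, CPEPrefix and Description lines
were all bumped consistently after an update.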

B. How to build ICU data files

Pre-built data files are generated and checked in with the following steps:

1. ICU data files for Chrome OS, Linux, Mac and Windows

  a. Make an ICU data build directory outside the Chromium source tree
     and cd to that directory (say, $ICUBUILDIR).

  b. Run
       ${CHROME_ICU_TREE_TOP}/scripts/make_data_all.sh

     This script takes the following steps:

     i) Run
        ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout --disable-tests

     ii) Run make

     iii) (cd data && make clean)

     iv) scripts/config_data.sh common
       This configures the build with the filter for common.

     v) Run make

     vi) scripts/copy_data.sh common
       This copies the ICU data files for non-Android platforms
       (both Little and Big Endian) to the following locations:

       common/icudtl.dat
       common/icudtb.dat

     vii) Repeat steps iii) - vi) for chromeos to produce chromeos/icudtl.dat

     viii) cast/patch_locale.sh
       Modify the file for cast, android, ios and flutter.

     ix) Repeat steps iii) - vi) for cast, android and ios to produce
       cast/icudtl.dat
       android/icudtl.dat
       ios/icudtl.dat

     x) flutter/patch_brkitr.sh
       On top of cast/patch_locale.sh (step viii)), further patch
       the code for flutter.

     xi) Repeat steps iii) - vi) for flutter to produce
       flutter/icudtl.dat

     xii) scripts/clean_up_data_source.sh

     This reverts the result of cast/patch_locale.sh and flutter/patch_brkitr.sh
     and makes the tree ready for committing updated ICU data files for
     non-Android and Android platforms.

  c. Whenever data is updated (e.g. timezone update), take step b as long
     as the ICU build directory used in a. is kept.
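
The order of steps i) to xii) can be sketched as a dry run that prints each
command instead of executing it. This is not the contents of
make_data_all.sh. The function names plan_target and plan_all are
hypothetical, and the script paths are copied as they are written in the
steps above.

```shell
# Dry-run sketch: nothing is built, every command is only printed.

# Print steps iii) to vi) for one filter target (common, chromeos, ...).
plan_target() {
  target="$1"
  printf '%s\n' "(cd data && make clean)"
  printf '%s\n' "scripts/config_data.sh ${target}"
  printf '%s\n' "make"
  printf '%s\n' "scripts/copy_data.sh ${target}"
}

# Print the whole sequence in the order described in step b.
plan_all() {
  # Steps i) and ii): configure and do the initial build.
  printf '%s\n' '${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout --disable-tests'
  printf '%s\n' "make"
  # Steps iii) to vii): unpatched targets.
  for t in common chromeos; do plan_target "$t"; done
  # Steps viii) and ix): targets built after the locale patch.
  printf '%s\n' "cast/patch_locale.sh"
  for t in cast android ios; do plan_target "$t"; done
  # Steps x) and xi): flutter needs a further patch.
  printf '%s\n' "flutter/patch_brkitr.sh"
  plan_target flutter
  # Step xii): revert the temporary patches.
  printf '%s\n' "scripts/clean_up_data_source.sh"
}

plan_all
```

The point of the ordering is that each patch script is applied on top of the
previous state, so the unpatched targets must be built first and the cleanup
must run last.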

2. Note on the locale data customization

  - filter/chromeos.json
      a. Filter the locale data for ChromeOS's UI languages:
         locales, lang, region, currency, zone
      b. Filter the locale data for non-UI languages to the bare minimum:
         ExemplarCharacters, LocaleScript, layout, and the name of the
         language for a locale in its native language.
      c. Filter the legacy Chinese character set-based collation
         (big5han/gb2312han) that doesn't make any sense and nobody uses.

  - filter/common.json
      Same as above in filter/chromeos.json, AND
      d. Filter exemplar cities in timezone data (data/zone).

  - filter/android.json and filter/ios.json
      a. Filter the locale data for Android / iOS UI languages:
         locales, lang, region, currency, zone
      b. Filter the locale data for non-UI languages to the bare minimum:
         ExemplarCharacters, LocaleScript, layout, and the name of the
         language for a locale in its native language.
      c. Filter the legacy Chinese character set-based collation
      d. Filter source/data/{region,lang} to exclude these data
         except the language and script names of zh_Hans and zh_Hant.
      e. Keep only the minimal calendar data in data/locales.
      f. Include currency display names for a smaller subset of currencies.
      g. Minimize the locale data for 9 locales to which Chrome on Android
         is not localized.
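
To give a rough idea of what such a filter file looks like, the sketch below
prints a small illustrative example. It is NOT a copy of any checked-in
filter/*.json, and the locale list is made up. The key names (localeFilter,
filterType, includelist, resourceFilters, categories, rules) follow the ICU
data build tool's filter format as understood here and should be checked
against the upstream ICU documentation and the real filter files.

```shell
# Prints an illustrative ICU data filter. The locale list is a made-up
# example, and the rules only mirror item c above (drop big5han/gb2312han).
print_example_filter() {
  cat <<'EOF'
{
  "localeFilter": {
    "filterType": "language",
    "includelist": ["en", "ja", "zh"]
  },
  "resourceFilters": [
    {
      "categories": ["coll_tree"],
      "rules": [
        "-/collations/big5han",
        "-/collations/gb2312han"
      ]
    }
  ]
}
EOF
}

print_example_filter
```

In this format a rule starting with "-" removes a resource path and a rule
starting with "+" keeps it, which is how a filter can strip most of a data
category while keeping a few entries.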


C. Chromium-specific data build files and converters

They're preserved in step A.1 above. In general, there's no need to touch
them when updating ICU.

1. source/data/mappings
  - convrtrs.txt : Lists encodings and aliases required by the WHATWG
    Encoding spec plus a few extras (see the file as to why).

  - ucmlocal.txt : Lists only the converters we need.

  - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP,
    Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
    They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.

  - gb18030.ucm and windows-936.ucm
    gb_table.patch was applied for the following changes. No need
    to apply it again. The patch is kept for the record.
    a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
       the encoding spec (one-way mapping in toUnicode direction).
    b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
       from U+1E3F to \xA8\xBC (windows-936/GBK).
       See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
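
In .ucm files the direction of a mapping is expressed by the precision flag
at the end of each line: |0 is a round trip, |1 is a fallback used only from
Unicode to bytes, and |3 is a reverse fallback used only from bytes to
Unicode. The sketch below formats lines in that notation for the mappings
described above. The helper name ucm_line is hypothetical, and the printed
lines illustrate the notation only; they are not quoted from the checked-in
.ucm files.

```shell
# Format one UCM mapping line from a code point, a space-separated list of
# hex bytes, and a precision flag. Hypothetical helper for illustration.
ucm_line() {
  cp="$1"
  bytes="$2"
  flag="$3"
  out=""
  for b in $bytes; do
    out="${out}\\x${b}"
  done
  printf '<U%s> %s |%s\n' "$cp" "$out" "$flag"
}

# One-way, bytes to Unicode (toUnicode direction): reverse fallback.
ucm_line 3000 "A3 A0" 3
# One-way, Unicode to bytes (fromUnicode direction): fallback.
ucm_line 1E3F "A8 BC" 1
```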

2. source/data/brkitr
  - dictionaries/khmerdict.txt: Abridged Khmer dictionary. See
    https://unicode-org.atlassian.net/browse/ICU-9451
  - dictionaries/laodict.txt: Abridged Lao dictionary. We keep using the smaller
    old version from ICU69-1.
  - rules/word_ja.txt (used only on Android)
    Added for Japanese-specific word-breaking without the C+J dictionary.
  - rules/{root,zh,zh_Hant}.txt
    a. Use line_normal by default.
    b. Drop local patches we used to have for the following issues. They'll
       be dealt with in the upstream (Unicode/CLDR).
       http://unicode.org/cldr/trac/ticket/6557
       http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)

3. Add {an,ku,tg,wa}.txt to source/data/{locale,lang}
   with the minimal locale data necessary for spellchecker and
   language menus.

D. Local Modifications

1. Applied locale data patches from Google obtained by diff'ing
   the upstream copy and Google's internal copy for source/data

  - patches/locale_google.patch:
    * Google's internal ICU locale changes
    * Simpler region names for Hong Kong and Macau in all locales
    * Currency signs in ru and uk locales (do not include 'tr' locale changes)
    * AM/PM, midnight, noon formatting for a few Indian locales
    * Timezone name changes in Korean and Chinese locales
    * Default digit for Arabic locale is European digits.

  - patches/locale1.patch: Minor fixes for Korean

  - patches/name_5_langs.patch: add the native names of 5 languages not currently
    supported by CLDR/ICU. When updating the ICU to a new version,
    source/data/lang/{ay,dv,ilo,lus,ts}.txt have to be checked and if they are
    present with their display names populated, this patch has to be adjusted
    or discarded as necessary.

2. Breakiterator patches
  - patches/wordbrk.patch for word.txt, word_POSIX.txt, and word_fi_sv.txt
    a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
       FQDN labels can be split at '.'
    b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
       See http://unicode.org/cldr/trac/ticket/6555
    c. Restore pre-ICU 72 behavior of breaking at '@'. The new upstream behavior
       of not breaking at '@' interacted badly with the local change to break at
       '.' (D.2.a above): although not breaking at '@' is intended to not break
       within e-mail addresses, this is not possible with Chromium's
       break-at-'.' behavior.

  - patches/khmer-dictbe.patch
    Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
    https://unicode-org.atlassian.net/browse/ICU-9451

  - Add several common Chinese words that were dropped previously to
    source/data/brkitr/dictionaries/cjdict.txt
    patch: patches/cjdict.patch
    upstream bug: https://unicode-org.atlassian.net/browse/ICU-10888

3. Timezone data update
  Run scripts/update_tz.sh to grab the latest version of the
  following timezone data files and put them in source/data/misc:

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

  As of Mar 5, 2024, the latest version is 2024a
  and the above files are available at the ICU GitHub repos.
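
After a timezone update, a quick sanity check is to confirm that all four
files listed above are present in source/data/misc. The sketch below does
only that. The function name missing_tz_files and the ICU_TOP variable are
assumptions for illustration and are not part of scripts/update_tz.sh.

```shell
# Print the name of each expected timezone data file that is missing from
# the given directory. Prints nothing when all four files are present.
missing_tz_files() {
  dir="$1"
  for f in metaZones.txt timezoneTypes.txt windowsZones.txt zoneinfo64.txt; do
    [ -f "${dir}/${f}" ] || printf '%s\n' "$f"
  done
}

# ICU_TOP is an assumed variable pointing at the top of the ICU tree;
# it defaults to the current directory here.
missing_tz_files "${ICU_TOP:-.}/source/data/misc"
```

An empty result only means the files exist. It does not verify that they
were actually refreshed to the intended tz release.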

4. Build-related changes

  - patches/configure.patch:
    * Remove a section of configure that will cause breakage while
      running runConfigureICU.

  - patches/wpo.patch (only needed when icudata dll is used).
    upstream bugs : https://unicode-org.atlassian.net/browse/ICU-8043
                    https://unicode-org.atlassian.net/browse/ICU-5701

  - patches/data_symb.patch :
      Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
      the icu data file or icudt.dll

5. ISO-2022-JP encoding (fromUnicode) change per WHATWG encoding spec.
  - patches/iso2022jp.patch
  - upstream bug:
    https://unicode-org.atlassian.net/browse/ICU-20251

6. Enable tracing of files but not resources, only for Chromium,
   to reduce performance impact/risk.
  - patches/restrace.patch

7. Patch Arabic date time pattern back to 67 value to avoid test
   breakage in
   third_party/blink/web_tests/fast/forms/datetimelocal/datetimelocal-appearance-l10n.html
  - patches/ardatepattern.patch
  - https://bugs.chromium.org/p/chromium/issues/detail?id=1139186

8.  Remove explicit std::atomic<NumberRangeFormatterImpl*> template
    instantiation
  - patches/atomic_template_instantiation.patch
  - The explicit instantiation was added to silence MSVC C4251 warnings:
    https://unicode-org.atlassian.net/browse/ICU-20157
    Small test cases show that it is generally an error to instantiate
    std::atomic<T*> with an incomplete type T with MSVC, clang, and GCC, so this
    instantiation never should have worked:
    https://gcc.godbolt.org/z/34xx8h
    At this time, it's not clear if this particular instantiation with
    NumberRangeFormatterImpl* was ever necessary for MSVC. Further testing with
    MSVC is required to upstream this patch.
  - https://unicode-org.atlassian.net/browse/ICU-21482

9.  Patch source/common/uposixdefs.h so it compiles on Fuchsia on Macs.
  - patches/fuchsia.patch
  - context bug: https://bugs.chromium.org/p/chromium/issues/detail?id=1184527

10. Fix Etc/Unknown being returned for
    Intl.DateTimeFormat().resolvedOptions().timeZone on macOS 14.
  - patches/revert_realpath.patch
  - https://bugs.chromium.org/p/chromium/issues/detail?id=1473422
  - https://unicode-org.atlassian.net/browse/ICU-22541