Name: icu
URL: https://github.com/unicode-org/icu
Version: 74-2
CPEPrefix: cpe:/a:icu-project:international_components_for_unicode:74.2
License: MIT
License File: LICENSE
Security Critical: yes
Shipped: yes

Description:
This directory contains the source code of ICU 74.2 for C/C++.

A. How to update ICU

1. Run "scripts/update.sh <version>" (e.g. 74-2).
   This downloads ICU from the upstream git repository.
   It preserves the Chrome-specific build files and
   converter files (see section C).

   source.gni and icu.gyp* files are automatically updated, too.

2. Review and apply the patches/changes in "D. Local Modifications" if
   necessary/applicable. Update the patch files in patches/.

3. Follow the instructions in section B on building ICU data files.

B. How to build ICU data files

Pre-built data files are generated and checked in with the following steps:

1. ICU data files for Chrome OS, Linux, Mac and Windows

  a. Make an ICU data build directory outside the Chromium source tree
     and cd to that directory (say, $ICUBUILDIR).

  b. Run
       ${CHROME_ICU_TREE_TOP}/scripts/make_data_all.sh

     This script takes the following steps:

     i) Run
          ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout --disable-tests

     ii) Run make

     iii) (cd data && make clean)

     iv) scripts/config_data.sh common
         This configures the build with the filter for common.

     v) Run make

     vi) scripts/copy_data.sh common
         This copies the ICU data files for non-Android platforms
         (both Little and Big Endian) to the following locations:

           common/icudtl.dat
           common/icudtb.dat

     vii) Repeat steps iii) - vi) for chromeos to produce chromeos/icudtl.dat

     viii) cast/patch_locale.sh
           Modify the data source for cast, android, ios and flutter.

     ix) Repeat steps iii) - vi) for cast, android and ios to produce
           cast/icudtl.dat
           android/icudtl.dat
           ios/icudtl.dat

     x) flutter/patch_brkitr.sh
        On top of cast/patch_locale.sh (step viii)), further patch
        the data source for flutter.

     xi) Repeat steps iii) - vi) for flutter to produce
           flutter/icudtl.dat

     xii) scripts/clean_up_data_source.sh

          This reverts the changes made by cast/patch_locale.sh and
          flutter/patch_brkitr.sh and makes the tree ready for committing
          the updated ICU data files for non-Android and Android platforms.

  c. Whenever the data is updated (e.g. a timezone update), repeating
     step b is enough as long as the ICU build directory made in step a
     is kept.

2. Notes on the locale data customization

  - filter/chromeos.json
    a. Filter the locale data for ChromeOS's UI languages:
       locales, lang, region, currency, zone
    b. Filter the locale data for non-UI languages to the bare minimum:
       ExemplarCharacters, LocaleScript, layout, and the name of the
       language for a locale in its native language.
    c. Filter out the legacy Chinese character set-based collations
       (big5han/gb2312han) that don't make any sense and that nobody uses.

  - filter/common.json
    Same as filter/chromeos.json above, AND
    d. Filter exemplar cities in the timezone data (data/zone).

  - filter/android.json and filter/ios.json
    a. Filter the locale data for Android / iOS UI languages:
       locales, lang, region, currency, zone
    b. Filter the locale data for non-UI languages to the bare minimum:
       ExemplarCharacters, LocaleScript, layout, and the name of the
       language for a locale in its native language.
    c. Filter out the legacy Chinese character set-based collations.
    d. Filter source/data/{region,lang} to exclude this data,
       except the language and script names of zh_Hans and zh_Hant.
    e. Keep only the minimal calendar data in data/locales.
    f. Include currency display names for a smaller subset of currencies.
    g. Minimize the locale data for 9 locales to which Chrome on Android
       is not localized.


C. Chromium-specific data build files and converters

They're preserved in step A.1 above. In general, there's no need to touch
them when updating ICU.

1. source/data/mappings
  - convrtrs.txt: Lists the encodings and aliases required by the WHATWG
    Encoding spec plus a few extra (see the file as to why).

  - ucmlocal.txt: Lists only the converters we need.

  - *html.ucm: Mapping files per the WHATWG Encoding Standard for EUC-JP,
    Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single-byte encodings.
    They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.

  - gb18030.ucm and windows-936.ucm
    gb_table.patch was applied for the following changes. There is no need
    to apply it again; the patch is kept for the record.
    a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
       the encoding spec (one-way mapping in the toUnicode direction).
    b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add a one-way mapping
       from U+1E3F to \xA8\xBC (windows-936/GBK).
    See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3

2. source/data/brkitr
  - dictionaries/khmerdict.txt: Abridged Khmer dictionary. See
    https://unicode-org.atlassian.net/browse/ICU-9451
  - dictionaries/laodict.txt: Abridged Lao dictionary. We keep using the smaller,
    older version from ICU 69-1.
  - rules/word_ja.txt (used only on Android)
    Added for Japanese-specific word-breaking without the C+J dictionary.
  - rules/{root,zh,zh_Hant}.txt
    a. Use line_normal by default.
    b. Drop the local patches we used to have for the following issues. They'll
       be dealt with upstream (Unicode/CLDR).
         http://unicode.org/cldr/trac/ticket/6557
         http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)

3. Add {an,ku,tg,wa}.txt to source/data/{locale,lang}
   with the minimal locale data necessary for the spellchecker and
   language menus.

D. Local Modifications

1. Applied locale data patches from Google, obtained by diff'ing
   the upstream copy and Google's internal copy of source/data.

  - patches/locale_google.patch:
    * Google's internal ICU locale changes
    * Simpler region names for Hong Kong and Macau in all locales
    * Currency signs in ru and uk locales (do not include 'tr' locale changes)
    * AM/PM, midnight, noon formatting for a few Indian locales
    * Timezone name changes in Korean and Chinese locales
    * The default digits for the Arabic locale are European digits.

  - patches/locale1.patch: Minor fixes for Korean

  - patches/name_5_langs.patch: Add the native names of 5 languages not currently
    supported by CLDR/ICU. When updating ICU to a new version, check
    source/data/lang/{ay,dv,ilo,lus,ts}.txt; if they are present with their
    display names populated, adjust or discard this patch as necessary.

2. Break iterator patches
  - patches/wordbrk.patch for word.txt, word_POSIX.txt, and word_fi_sv.txt
    a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
       FQDN labels can be split at '.'
    b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
       See http://unicode.org/cldr/trac/ticket/6555
    c. Restore the pre-ICU 72 behavior of breaking at '@'. The new upstream
       behavior of not breaking at '@' interacted badly with the local change
       to break at '.' (D.2.a above): although not breaking at '@' is intended
       to avoid breaking within e-mail addresses, this is not possible with
       Chromium's break-at-'.' behavior.

  - patches/khmer-dictbe.patch
    Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
    https://unicode-org.atlassian.net/browse/ICU-9451

  - Add several common Chinese words that were dropped previously to
    source/data/cjdict/brkitr/cjdict.txt
      patch: patches/cjdict.patch
      upstream bug: https://unicode-org.atlassian.net/browse/ICU-10888

3. Timezone data update
   Run scripts/update_tz.sh to grab the latest version of the
   following timezone data files and put them in source/data/misc:

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

   As of Mar 5, 2024, the latest version is 2024a,
   and the above files are available in the ICU GitHub repository.

4. Build-related changes

  - patches/configure.patch:
    * Remove a section of configure that causes breakage when
      running runConfigureICU.

  - patches/wpo.patch (only needed when the icudata DLL is used).
    upstream bugs: https://unicode-org.atlassian.net/browse/ICU-8043
                   https://unicode-org.atlassian.net/browse/ICU-5701

  - patches/data_symb.patch:
    Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
    the ICU data file or icudt.dll.

5. ISO-2022-JP encoding (fromUnicode) change per the WHATWG Encoding spec.
  - patches/iso2022jp.patch
  - upstream bug:
    https://unicode-org.atlassian.net/browse/ICU-20251

6. Enable tracing of files but not of resources, only for Chromium,
   to reduce the performance impact/risk.
  - patches/restrace.patch

7. Patch the Arabic date-time pattern back to its ICU 67 value to avoid test
   breakage in
   third_party/blink/web_tests/fast/forms/datetimelocal/datetimelocal-appearance-l10n.html
  - patches/ardatepattern.patch
  - https://bugs.chromium.org/p/chromium/issues/detail?id=1139186

8. Remove the explicit std::atomic<NumberRangeFormatterImpl*> template
   instantiation
   patches/atomic_template_instantiation.patch
  - The explicit instantiation was added to silence MSVC C4251 warnings:
    https://unicode-org.atlassian.net/browse/ICU-20157
    Small test cases show that it is generally an error to instantiate
    std::atomic<T*> with an incomplete type T with MSVC, clang, and GCC, so this
    instantiation should never have worked:
    https://gcc.godbolt.org/z/34xx8h
    At this time, it's not clear whether this particular instantiation with
    NumberRangeFormatterImpl* was ever necessary for MSVC. Further testing with
    MSVC is required to upstream this patch.
  - https://unicode-org.atlassian.net/browse/ICU-21482

9. Patch source/common/uposixdefs.h so it compiles for Fuchsia on Macs.
   patches/fuchsia.patch
  - context bug: https://bugs.chromium.org/p/chromium/issues/detail?id=1184527

10. Fix Etc/Unknown being returned for
    Intl.DateTimeFormat().resolvedOptions().timeZone on macOS 14.
    patches/revert_realpath.patch
  - https://bugs.chromium.org/p/chromium/issues/detail?id=1473422
  - https://unicode-org.atlassian.net/browse/ICU-22541
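For the macOS 14 symptom in item 10, a quick manual check is to compare the
host time zone reported by Intl against ICU's "unknown zone" placeholder ID,
Etc/Unknown. The TypeScript below is a minimal sketch under that assumption;
the helper names (hostTimeZone, isKnownTimeZone) are hypothetical and are not
part of Chromium or its test suite.

```typescript
// Hypothetical regression check for the macOS 14 issue (not part of Chromium).
// ICU reports a host zone it cannot identify as the placeholder "Etc/Unknown".
const UNKNOWN_ZONE_ID = "Etc/Unknown";

// Host time zone as reported by the ICU-backed Intl API.
function hostTimeZone(): string {
  return new Intl.DateTimeFormat().resolvedOptions().timeZone;
}

// True if `tz` is a concrete zone ID rather than ICU's "unknown" placeholder.
function isKnownTimeZone(tz: string): boolean {
  return tz.length > 0 && tz !== UNKNOWN_ZONE_ID;
}

const tz = hostTimeZone();
console.log(`host time zone: ${tz} (${isKnownTimeZone(tz) ? "ok" : "unresolved"})`);
```

Run under Node, this exercises Node's bundled ICU rather than Chromium's copy;
to check a Chromium build, the same Intl expression can be evaluated in the
browser's DevTools console.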