<!--
© 2019 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Basic instructions for running the LdmlConverter via Maven

> Note: While this document provides useful background information about the
  LdmlConverter, the complete process for integrating CLDR data into ICU is
  described in `../../../docs/processes/cldr-icu.md`, which is best viewed as
  [CLDR-ICU integration](https://unicode-org.github.io/icu/processes/cldr-icu.html).

## Requirements

* A CLDR release for supplying CLDR data and the CLDR API.
* The Maven build tool.
* The Ant build tool (using JDK 11+).

## Important directories

| Directory       | Description |
|-----------------|-------------|
| `TOOLS_ROOT`    | Path to the root of the ICU tools directory, below which are (e.g.) the `cldr/` and `unicodetools/` directories. |
| `CLDR_DIR`      | Path to the root of the standard CLDR sources, below which are the `common/` and `tools/` directories. |
| `CLDR_DATA_DIR` | The top-level directory for the CLDR production data (typically the "production" directory in the staging repository). Usually generated locally or obtained from: https://github.com/unicode-org/cldr-staging/tree/main/production |

On POSIX systems, it's best to set these as exported shell variables. The
instructions that follow assume they have been set accordingly:

```
$ export TOOLS_ROOT=/path/to/icu/tools
$ export CLDR_DIR=/path/to/cldr
$ export CLDR_DATA_DIR=/path/to/cldr-staging/production
```

Note that you should not attempt to use data from the CLDR project directory
(where the CLDR API code exists) for conversion into ICU data. The process now
relies on a pre-processing step, and the CLDR data must come from the separate
"staging" repository (i.e. https://github.com/unicode-org/cldr-staging) or be
pre-processed locally into a different directory.

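Before running any of the later steps, it can help to confirm that the three
variables are set and point where the table says they should. The following is
a minimal sketch, not part of the ICU tools: `check_cldr_env` is a hypothetical
helper name, and the layout checks only encode what this document states (a
`cldr/` directory under `TOOLS_ROOT`, `common/` and `tools/` under `CLDR_DIR`,
and a `CLDR_DATA_DIR` that is a different directory from `CLDR_DIR`).

```shell
#!/bin/sh
# Hypothetical helper (not part of the ICU tools): sanity-check the three
# directory variables described above. Returns 0 if all checks pass, 1 otherwise.
check_cldr_env() {
  status=0
  for name in TOOLS_ROOT CLDR_DIR CLDR_DATA_DIR; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "error: $name is not a directory: $value" >&2
      status=1
    fi
  done
  # Stop here if any variable is missing; the layout checks need all three.
  [ "$status" -eq 0 ] || return 1

  # Layout expectations taken from the "Important directories" table.
  [ -d "$TOOLS_ROOT/cldr" ] || { echo "error: no cldr/ below TOOLS_ROOT" >&2; status=1; }
  [ -d "$CLDR_DIR/common" ] || { echo "error: no common/ below CLDR_DIR" >&2; status=1; }
  [ -d "$CLDR_DIR/tools" ] || { echo "error: no tools/ below CLDR_DIR" >&2; status=1; }

  # The data must not be taken straight from the CLDR project directory.
  if [ "$CLDR_DATA_DIR" = "$CLDR_DIR" ]; then
    echo "error: CLDR_DATA_DIR must differ from CLDR_DIR" >&2
    status=1
  fi
  return "$status"
}
```

With the function defined (for example by sourcing a file that contains it),
`check_cldr_env && echo "environment looks usable"` reports any problems on
standard error and returns a non-zero status if a check fails.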

## Initial Setup

This project relies on the Maven build tool for managing dependencies and uses
Ant for configuration purposes, so both will need to be installed. On a
Debian-based system, this should be as simple as:

```
$ sudo apt-get install maven ant
```

You must also install an additional CLDR JAR file into the local Maven
repository at `$TOOLS_ROOT/cldr/lib` (see the `README.txt` in that directory
for more information).

```
$ cd "$TOOLS_ROOT/cldr/lib"
$ ./install-cldr-jars.sh "$CLDR_DIR"
```

## Generating all ICU data and source code

```
$ cd "$TOOLS_ROOT/cldr/cldr-to-icu"
$ ant -f build-icu-data.xml
```

## Other Examples

* Outputting a subset of the supplemental data into a specified directory:
  ```
  $ ant -f build-icu-data.xml -DoutDir=/tmp/cldr -DoutputTypes=plurals,dayPeriods -DdontGenCode=true
  ```
  Note: Output types can be listed with mixedCase, lower_underscore or UPPER_UNDERSCORE.
  Pass `-DoutputTypes=help` to see the full list.


* Outputting only a subset of locale IDs (and all the supplemental data):
  ```
  $ ant -f build-icu-data.xml -DoutDir=/tmp/cldr -DlocaleIdFilter='(zh|yue).*' -DdontGenCode=true
  ```

* Overriding the default CLDR version string (which normally matches the CLDR library code):
  ```
  $ ant -f build-icu-data.xml -DcldrVersion="36.1"
  ```

* Using alternate CLDR values (e.g. use `alt="ascii"` values from the CLDR XML):

  First, edit the `build-icu-data.xml` file where it mentions `ALTERNATE VALUES`,
  adding the correctly annotated source path, target path, and locales list:
  ```diff
  @@ -384,6 +399,20 @@
            <!-- ALTERNATE VALUES -->

            <!-- The following elements configure alternate values for some special case paths.
                 The target path will only be replaced if both it, and the source path, exist in
                 the CLDR data (paths will not be modified if only the source path exists).

                 Since the paths must represent the same semantic type of data, they must be in the
                 same "namespace" (same element names) and must not contain value attributes. Thus
                 they can only differ by distinguishing attributes (either added or modified).

                 This feature is typically used to select alternate translations (e.g. short forms)
                 for certain paths. -->
             <!-- <altPath target="//path/to/value[@attr='foo']"
                           source="//path/to/value[@attr='bar']"
                           locales="xx,yy_ZZ"/> -->
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehm']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehm'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehms']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehms'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='h']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='h'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hm']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hm'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hmsv']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hmsv'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hmv']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hmv'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='full']/timeFormat[@type='standard']/pattern[@type='standard']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='full']/timeFormat[@type='standard']/pattern[@alt='ascii'][@type='standard']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='long']/timeFormat[@type='standard']/pattern[@type='standard']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='long']/timeFormat[@type='standard']/pattern[@alt='ascii'][@type='standard']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='medium']/timeFormat[@type='standard']/pattern[@type='standard']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='medium']/timeFormat[@type='standard']/pattern[@alt='ascii'][@type='standard']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='short']/timeFormat[@type='standard']/pattern[@type='standard']"
  +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='short']/timeFormat[@type='standard']/pattern[@alt='ascii'][@type='standard']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehm']"
  +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehm'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehms']"
  +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehms'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='h']"
  +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='h'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hm']"
  +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hm'][@alt='ascii']"
  +                     locales="en"/>
  +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms']"
  +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms'][@alt='ascii']"
  +                     locales="en"/>
  ```
  Then run the generator:
  ```
  $ ant -f build-icu-data.xml <options>
  ```

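The `<altPath>` elements for `availableFormats` items in the diff above all
follow one pattern: the source path is the target path with `[@alt='ascii']`
appended. Such elements could be generated rather than typed by hand. The
sketch below assumes that pattern holds for the items you need. `gen_alt_paths`
is a hypothetical helper, not part of the ICU tools, and it does not cover the
`timeFormats` entries, whose paths have a different shape. For example,
`gen_alt_paths generic en h hm` prints two elements, which still have to be
pasted into `build-icu-data.xml` by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ICU tools): print <altPath> elements
# for availableFormats date format items, following the pattern in the diff.
# Usage: gen_alt_paths CALENDAR LOCALES ID...
gen_alt_paths() {
  calendar=$1
  locales=$2
  shift 2
  for id in "$@"; do
    target="//ldml/dates/calendars/calendar[@type='$calendar']/dateTimeFormats/availableFormats/dateFormatItem[@id='$id']"
    # The source path is the target path plus the alt attribute.
    src="${target}[@alt='ascii']"
    printf '<altPath target="%s"\n' "$target"
    printf '         source="%s"\n' "$src"
    printf '         locales="%s"/>\n' "$locales"
  done
}
```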
See `build-icu-data.xml` for documentation of all options and additional customization.

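Because `-DlocaleIdFilter` takes a regular expression, it is easy to select
more or fewer locales than intended. The sketch below previews which IDs from a
hand-written candidate list a filter would select. It rests on two assumptions
that this document does not confirm: that the filter must match the whole
locale ID, and that the expression behaves the same under `grep -E` as it does
in the converter (the two regular expression dialects differ for anything
beyond simple patterns). `preview_locale_filter` is a hypothetical helper, not
part of the ICU tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ICU tools): list the candidate locale
# IDs that a filter expression matches.
# Assumption: the converter requires the filter to match the whole ID, which
# is modelled here with grep's -x (whole-line) option.
# Usage: preview_locale_filter REGEX ID...
preview_locale_filter() {
  regex=$1
  shift
  # One candidate per line; keep only the lines the expression fully matches.
  printf '%s\n' "$@" | grep -E -x -e "$regex"
}
```

For example, `preview_locale_filter '(zh|yue).*' en zh zh_Hant yue zu` prints
`zh`, `zh_Hant` and `yue`, one per line. The single quotes keep the shell from
interpreting the parentheses, `|` and `*`, just as in the `ant` example above.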

## Running unit tests

```
$ mvn test -DCLDR_DIR="$CLDR_DATA_DIR"
```


## Importing and running from an IDE

This project should be easy to import into an IDE that supports Maven development, such
as IntelliJ or Eclipse. It uses a local Maven repository directory for the unpublished
CLDR libraries (which are included in the project), but otherwise gets all dependencies
via Maven's public repositories.