## Proposed Update Unicode Technical Standard #35

# Unicode Locale Data Markup Language (LDML)

|Version|45        |
|-------|----------|
|Editors|Mark Davis and <a href="tr35.md#Acknowledgments">other CLDR committee members</a>|
|Date|2024-04-16|
|This Version|<a href="https://www.unicode.org/reports/tr35/tr35-72/tr35.html">https://www.unicode.org/reports/tr35/tr35-72/tr35.html</a>|
|Previous Version|<a href="https://www.unicode.org/reports/tr35/tr35-71/tr35.html">https://www.unicode.org/reports/tr35/tr35-71/tr35.html</a>|
|Latest Version|<a href="https://www.unicode.org/reports/tr35/">https://www.unicode.org/reports/tr35/</a>|
|Corrigenda|<a href="https://cldr.unicode.org/index/corrigenda">https://cldr.unicode.org/index/corrigenda</a>|
|Latest Proposed Update|<a href="https://www.unicode.org/reports/tr35/proposed.html">https://www.unicode.org/reports/tr35/proposed.html</a>|
|Namespace|<a href="https://www.unicode.org/cldr/">https://www.unicode.org/cldr/</a>|
|DTDs|<a href="https://www.unicode.org/cldr/dtd/45/">https://www.unicode.org/cldr/dtd/45/</a>|
|Revision|<a href="#Modifications">72</a>|

### _Summary_

This document describes an XML format (_vocabulary_) for the exchange of structured locale data. This format is used in the [Unicode Common Locale Data Repository](https://www.unicode.org/cldr/).

_Note:_
Some links may lead to in-development or older
versions of the data files.
See <https://cldr.unicode.org> for up-to-date CLDR release data.

### _Status_

<!-- _This is a draft document which may be updated, replaced, or superseded by other documents at any time.
Publication does not imply endorsement by the Unicode Consortium.
This is not a stable document; it is inappropriate to cite this document as other than a work in progress._ -->

_This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium.
This is a stable document and may be used as reference material or cited as a normative reference by other specifications._

> _**A Unicode Technical Standard (UTS)** is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS._

_Please submit corrigenda and other comments with the CLDR bug reporting form [[Bugs](https://cldr.unicode.org/index/bug-reports)]. Related information that is useful in understanding this document is found in the [References](#References). For the latest version of the Unicode Standard see [[Unicode](https://www.unicode.org/versions/latest/)]. For a list of current Unicode Technical Reports see [[Reports](https://www.unicode.org/reports/)]. For more information about versions of the Unicode Standard, see [[Versions](https://www.unicode.org/versions/)]._

>**_NOTE: The source for the LDML specification has been converted to GitHub Markdown (GFM) instead of HTML. The formatting is now simpler, but some features, such as formatting for table captions, may not be complete by the release date. Improvements in the formatting for the specification may be done after the release, but no substantive changes will be made to the content._**

## <a name="Parts" href="#Parts">Parts</a>

The LDML specification is divided into the following parts:

*   Part 1: [Core](tr35.md#Contents) (languages, locales, basic structure)
*   Part 2: [General](tr35-general.md#Contents) (display names & transforms, etc.)
*   Part 3: [Numbers](tr35-numbers.md#Contents) (number & currency formatting)
*   Part 4: [Dates](tr35-dates.md#Contents) (date, time, time zone formatting)
*   Part 5: [Collation](tr35-collation.md#Contents) (sorting, searching, grouping)
*   Part 6: [Supplemental](tr35-info.md#Contents) (supplemental data)
*   Part 7: [Keyboards](tr35-keyboards.md#Contents) (keyboard mappings)
*   Part 8: [Person Names](tr35-personNames.md#Contents) (person names)
*   Part 9: [MessageFormat](tr35-messageFormat.md#Contents) (message format)

## <a name="Contents" href="#Contents">Contents of Part 1, Core</a>

* [Introduction](#Introduction)
  * [Conformance](#Conformance)
  * [EBNF](#ebnf)
* [What is a Locale?](#Locale)
* [Unicode Language and Locale Identifiers](#Unicode_Language_and_Locale_Identifiers)
  * _[Unicode Language Identifier](#Unicode_language_identifier)_
  * _[Unicode Locale Identifier](#Unicode_locale_identifier)_
    * [Canonical Unicode Locale Identifiers](#Canonical_Unicode_Locale_Identifiers)
  * [BCP 47 Conformance](#BCP_47_Conformance)
    * [BCP 47 Language Tag Conversion](#BCP_47_Language_Tag_Conversion)
      * Table: [BCP 47 Language Tag to Unicode BCP 47 Locale Identifier](#Language_Tag_to_Locale_Identifier) Examples
      * [Unicode Locale Identifier: CLDR to BCP 47](#Unicode_Locale_Identifier_CLDR_to_BCP_47)
      * [Unicode Locale Identifier: BCP 47 to CLDR](#Unicode_Locale_Identifier_BCP_47_to_CLDR)
      * [Truncation](#truncation)
  * [Language Identifier Field Definitions](#Field_Definitions)
    * [`unicode_language_subtag`](#unicode_language_subtag_validity) (also known as a _Unicode base language code_)
    * [`unicode_script_subtag`](#unicode_script_subtag_validity) (also known as a _Unicode script code_)
    * [`unicode_region_subtag`](#unicode_region_subtag_validity) (also known as a _Unicode region code,_ or a _Unicode territory code_)
    * [`unicode_variant_subtag`](#unicode_variant_subtag_validity) (also known as a _Unicode language variant code_)
  * [Special Codes](#Special_Codes)
    * [Unknown or Invalid Identifiers](#Unknown_or_Invalid_Identifiers)
    * [Numeric Codes](#Numeric_Codes)
    * [Private Use Codes](#Private_Use_Codes)
      * Table: [Private Use Codes in CLDR](#Private_Use_CLDR)
  * [Special Script Codes](#special-script-codes)
  * [Unicode BCP 47 U Extension](#u_Extension)
    * [Key And Type Definitions](#Key_And_Type_Definitions_)
      * Table: [Key/Type Definitions](#Key_Type_Definitions)
    * [Numbering System Data](#Numbering%20System%20Data)
    * [Time Zone Identifiers](#Time_Zone_Identifiers)
    * [U Extension Data Files](#Unicode_Locale_Extension_Data_Files)
    * [Subdivision Codes](#Unicode_Subdivision_Codes)
      * [Validity](#Validity)
  * [Unicode BCP 47 T Extension](#BCP47_T_Extension)
    * [T Extension Data Files](#Transformed_Content_Data_File)
  * [Compatibility with Older Identifiers](#Compatibility_with_Older_Identifiers)
    * [Old Locale Extension Syntax](#Old_Locale_Extension_Syntax)
      * Table: [Locale Extension Mappings](#Locale_Extension_Mappings)
    * [Legacy Variants](#Legacy_Variants)
      * Table: [Legacy Variant Mappings](#Legacy_Variant_Mappings)
    * [Relation to OpenI18n](#Relation_to_OpenI18n)
  * [Transmitting Locale Information](#Transmitting_Locale_Information)
    * [Message Formatting and Exceptions](#Message_Formatting_and_Exceptions)
  * [Unicode Language and Locale IDs](#Language_and_Locale_IDs)
    * [Written Language](#Written_Language)
    * [Hybrid Locale Identifiers](#Hybrid_Locale)
  * [Validity Data](#Validity_Data)
* [Locale Inheritance and Matching](#Locale_Inheritance)
  * [Lookup](#Lookup)
    * [Bundle vs Item Lookup](#Bundle_vs_Item_Lookup)
      * Table: [Lookup Differences](#Lookup-Differences)
    * [Lateral Inheritance](#Lateral_Inheritance)
      * Table: [Count Fallback: normal](#Count_Fallback_normal)
      * Table: [Count Fallback: currency](#Count_Fallback_currency)
    * [Parent Locales](#Parent_Locales)
    * [Region-Priority Inheritance](#Region_Priority_Inheritance)
  * [Inheritance and Validity](#Inheritance_and_Validity)
    * [Definitions](#Definitions)
    * [Resolved Data File](#Resolved_Data_File)
    * [Valid Data](#Valid_Data)
    * [Checking for Draft Status](#Checking_for_Draft_Status)
    * [Keyword and Default Resolution](#Keyword_and_Default_Resolution)
    * [Inheritance vs Related Information](#Inheritance_vs_Related)
  * [Likely Subtags](#Likely_Subtags)
  * [Language Matching](#LanguageMatching)
    * [Enhanced Language Matching](#EnhancedLanguageMatching)
* [XML Format](#XML_Format)
  * [Common Elements](#Common_Elements)
    * [Element special](#special)
      * [Sample Special Elements](#Sample_Special_Elements)
    * [Element alias](#Alias_Elements)
      * Table: [Inheritance with `source="locale"`](#Inheritance_with_source_locale_)
    * [Element displayName](#Element_displayName)
    * [Escaping Characters](#Escaping_Characters)
  * [Common Attributes](#Common_Attributes)
    * [Attribute type](#Attribute_type)
    * [Attribute draft](#Attribute_draft)
    * [Attribute alt](#alt_attribute)
    * [Attribute references](#references_attribute)
  * [Common Structures](#Common_Structures)
    * [Date and Date Ranges](#Date_Ranges)
    * [Text Directionality](#Text_Directionality)
    * [Unicode Sets](#Unicode_Sets)
      * [UnicodeSet syntax](#unicodeset-syntax)
        * [Syntax Special Case Examples](#syntax-special-case-examples)
      * [Lists of Code Points](#Lists_of_Code_Points)
      * [Backslash Escapes](#Backslash_Escapes)
      * [Unicode Properties](#Unicode_Properties)
      * [Boolean Operations](#Boolean_Operations)
      * [Variables in UnicodeSets](#Variables_in_UnicodeSets)
      * [UnicodeSet Examples](#UnicodeSet_Examples)
    * [String Range](#String_Range)
  * [Identity Elements](#Identity_Elements)
  * [Valid Attribute Values](#Valid_Attribute_Values)
  * [Canonical Form](#Canonical_Form)
    * [Content](#Content)
    * [Ordering](#Ordering)
    * [Comments](#Comments)
  * [DTD Annotations](#DTD_Annotations)
    * [Attribute Value Constraints](#match_expressions)
* [Property Data](#Property_Data)
  * [Script Metadata](#Script_Metadata)
  * [Extended Pictographic](#Extended_Pictographic)
  * [Labels.txt](#Labels.txt)
  * [Segmentation Tests](#Segmentation_Tests)
* [Issues in Formatting and Parsing](#Format_Parse_Issues)
  * [Lenient Parsing](#Lenient_Parsing)
    * [Motivation](#Motivation)
    * [Loose Matching](#Loose_Matching)
  * [Handling Invalid Patterns](#Invalid_Patterns)
* [Data Size Reduction](#Data_Size)
  * [Vertical Slicing](#Vertical_Slicing)
  * [Horizontal Slicing](#Horizontal_Slicing)
* [Annex A Deprecated Structure](#Deprecated_Structure)
  * [A.1 Element fallback](#Fallback_Elements)
  * [A.2 BCP 47 Keyword Mapping](#BCP47_Keyword_Mapping)
  * [A.3 Choice Patterns](#Choice_Patterns)
  * [A.4 Element default](#Element_default)
  * [A.5 Deprecated Common Attributes](#Deprecated_Common_Attributes)
    * [A.5.1 Attribute standard](#Attribute_standard)
    * [A.5.2 Attribute draft in non-leaf elements](#Attribute_draft_nonLeaf)
  * [A.6 Element base](#Element_base)
  * [A.7 Element rules](#Element_rules)
  * [A.8 Deprecated subelements of `<dates>`](#Deprecated_subelements_of_dates)
  * [A.9 Deprecated subelements of `<calendars>`](#Deprecated_subelements_of_calendars)
  * [A.10 Deprecated subelements of `<timeZoneNames>`](#Deprecated_subelements_of_timeZoneNames)
  * [A.11 Deprecated subelements of `<zone>` and `<metazone>`](#Deprecated_subelements_of_zone_metazone)
  * [A.12 Renamed attribute values for `<contextTransformUsage>` element](#Renamed_attribute_values_for_contextTransformUsage)
  * [A.13 Deprecated subelements of `<segmentations>`](#Deprecated_subelements_of_segmentations)
  * [A.14 Element cp](#Element_cp)
  * [A.15 Attribute validSubLocales](#validSubLocales)
  * [A.16 Elements postalCodeData, postCodeRegex](#postCodeElements)
  * [A.17 Element telephoneCodeData](#telephoneCodeData)
* [Annex B Links to Other Parts](#Links_to_Other_Parts)
  * Table: [Part 2 Links](#Part_2_Links): [General](tr35-general.md) (display names & transforms, etc.)
  * Table: [Part 3 Links](#Part_3_Links): [Numbers](tr35-numbers.md) (number & currency formatting)
  * Table: [Part 4 Links](#Part_4_Links): [Dates](tr35-dates.md) (date, time, time zone formatting)
  * Table: [Part 5 Links](#Part_5_Links): [Collation](tr35-collation.md) (sorting, searching, grouping)
  * Table: [Part 6 Links](#Part_6_Links): [Supplemental](tr35-info.md) (supplemental data)
  * Table: [Part 7 Links](#Part_7_Links): [Keyboards](tr35-keyboards.md) (keyboard mappings)
* [Annex C. LocaleId Canonicalization](#LocaleId_Canonicalization)
  * [LocaleId Definitions](#LocaleId_Definitions)
    * [1. Multimap interpretation](#1.-multimap-interpretation)
    * [2. Alias elements](#2.-alias-elements)
    * [3. Matches](#3.-matches)
    * [4. Replacement](#4.-replacement)
      * [Territory Exception](#territory-exception)
    * [5. Canonicalizing Syntax](#5.-canonicalizing-syntax)
  * [Preprocessing](#preprocessing)
  * [Processing LanguageIds](#processing-languageids)
  * [Processing LocaleIds](#processing-localeids)
  * [Optimizations](#optimizations)
* [References](#References)
* [Acknowledgments](#Acknowledgments)
* [Modifications](#Modifications)

## <a name="Introduction" href="#Introduction">Introduction</a>

Not long ago, computer systems were like separate worlds, isolated from one another. The internet and related events have changed all that. A single system can be built of many different components, hardware and software, all needing to work together. Many different technologies have been important in bridging the gaps; in the internationalization arena, Unicode has provided a lingua franca for communicating textual data. However, there remain differences in the locale data used by different systems.

The best practice for internationalization is to store and communicate language-neutral data, and format that data for the client. This formatting can take place on any of a number of the components in a system; a server might format data based on the user's locale, or it could be that a client machine does the formatting. The same goes for parsing data, and locale-sensitive analysis of data.

But there remain significant differences across systems and applications in the locale-sensitive data used for such formatting, parsing, and analysis. Many of those differences are simply gratuitous: all within acceptable limits for human beings, but yielding different results. In many other cases there are outright errors. Whatever the cause, the differences can cause discrepancies to creep into a heterogeneous system. This is especially serious in the case of collation (sort-order), where different collations cause not only ordering differences, but also different results of queries! That is, with a query for customers with names between "Abbot, Cosmo" and "Arnold, James", systems with different sort orders will return different lists. (For comparisons across systems formatted as HTML tables, see [[Comparisons](#Comparisons)].)
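
The range-query effect described above can be reproduced in a few lines. The following is a minimal sketch, not part of the specification: the customer names (other than the two bounds), the helper `between`, and the two toy orderings are all assumptions made for illustration. Code point order and case-folded order merely stand in for two systems with different collations.

```python
# Hypothetical customer names; only the two query bounds come from the text.
customers = ["Abbot, Cosmo", "abel, Tom", "Adams, Ann", "Arnold, James", "Zed, Al"]


def binary_key(name):
    """System A (assumed): raw code point order, so lowercase sorts after 'Z'."""
    return name


def caseless_key(name):
    """System B (assumed): a crude case-insensitive order, not a real collator."""
    return name.casefold()


def between(names, low, high, key):
    """Return the names whose sort key lies in the closed range [low, high]."""
    lo, hi = key(low), key(high)
    return sorted((n for n in names if lo <= key(n) <= hi), key=key)


result_a = between(customers, "Abbot, Cosmo", "Arnold, James", binary_key)
result_b = between(customers, "Abbot, Cosmo", "Arnold, James", caseless_key)

print(result_a)  # "abel, Tom" is missing under code point order
print(result_b)  # "abel, Tom" is included under the case-insensitive order
```

The same query over the same data returns three names on one system and four on the other, which is the kind of discrepancy a shared collation specification is meant to remove.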

> **Note:** There are many different equally valid ways in which data can be judged to be "correct" for a particular locale. The goal for the common locale data is to make it as consistent as possible with existing locale data, and acceptable to users in that locale.

This document specifies an XML format for the communication of locale data: the Unicode Locale Data Markup Language (LDML). This provides a common format for systems to interchange locale data so that they can get the same results in the services provided by internationalization libraries. It also provides a standard format that can allow users to customize the behavior of a system. With it, for example, collation (sorting) rules can be exchanged, allowing two implementations to exchange a specification of tailored collation rules. Using the same specification, the two implementations will achieve the same results in comparing strings. Unicode LDML can also be used to let a user encapsulate specialized sorting behavior for a specific domain, or create a customized locale for a minority language. Unicode LDML is also used in the Unicode Common Locale Data Repository (CLDR). CLDR uses an open process for reconciling differences between the locale data used on different systems and validating the data, to produce a useful, common, consistent base of locale data.

For more information, see the Common Locale Data Repository project page [[LocaleProject](#localeProject)].

As LDML is an interchange format, it was designed for ease of maintenance and simplicity of transformation into other formats, rather than for efficiency of run-time lookup and use. Implementations should consider converting LDML data into a more compact format prior to use.
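
As a rough illustration of such a conversion, the sketch below flattens the language display names from a small LDML fragment into a plain dictionary for fast lookup. The fragment is hand-written for this example rather than copied from CLDR, and the function name `compact_language_names` and the choice of a dictionary as the "compact format" are assumptions; real implementations typically use their own binary resource formats.

```python
import xml.etree.ElementTree as ET

# A hand-written LDML fragment for illustration only (not actual CLDR data).
LDML_SNIPPET = """
<ldml>
  <identity>
    <language type="en"/>
  </identity>
  <localeDisplayNames>
    <languages>
      <language type="fr">French</language>
      <language type="de">German</language>
    </languages>
  </localeDisplayNames>
</ldml>
"""


def compact_language_names(ldml_text):
    """Flatten <language type="..."> display names into a {code: name} dict."""
    root = ET.fromstring(ldml_text.strip())
    return {
        element.get("type"): element.text
        for element in root.findall("./localeDisplayNames/languages/language")
    }


language_names = compact_language_names(LDML_SNIPPET)
print(language_names["fr"])
```

The XML is parsed once, ahead of time; run-time lookups then touch only the flattened structure.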

### <a name="Conformance" href="#Conformance">Conformance</a>

There are many ways to use the Unicode LDML format and the data in CLDR, and the Unicode Consortium does not restrict the ways in which the format or data are used. However, an implementation may also claim conformance to LDML or to CLDR, as follows:

_**UAX35-C1.**_ An implementation that claims conformance to this specification shall:

1. Identify the sections of the specification that it conforms to.
   * For example, an implementation might claim conformance to all LDML features except for _transforms_ and _segments_.
2. Interpret the relevant elements and attributes of LDML documents in accordance with the descriptions in those sections.
   * For example, an implementation that claims conformance to the date format patterns must interpret the characters in such patterns according to [Date Field Symbol Table](tr35-dates.md#Date_Field_Symbol_Table).
3. Declare which types of CLDR data it uses.
   * For example, an implementation might declare that it only uses language names, and those with a _draft_ status of _contributed_ or _approved_.

_**UAX35-C2.**_ An implementation that claims conformance to Unicode locale or language identifiers shall:

1. Specify whether Unicode locale extensions are allowed.
2. Specify the canonical form used for identifiers in terms of casing and field separator characters.

External specifications may also reference particular components of Unicode locale or language identifiers, such as:

> _Field X can contain any Unicode region subtag values as given in Unicode Technical Standard #35: Unicode Locale Data Markup Language (LDML), excluding grouping codes._

### EBNF
The BNF syntax used in LDML is a variant of the Extended Backus-Naur Form (EBNF) notation used in [W3C XML Notation](https://www.w3.org/TR/REC-xml/#sec-notation). The main differences are:

1. Bounded repetition following Perl regex syntax is allowed, such as `alphanum{3,8}`.
2. Whitespace inside bracketed enumerations and ranges is ignored.
   * e.g., `[A-Z a-z]` is the same as `[A-Za-z]`
3. A backslash may be used to escape a following "x"-prefixed hexadecimal code point or the immediately following character.
   * e.g., `\x20` is the same as `#x20` and `[\&\-]` is the same as `[#x26#x2D]`
4. Constraints (well-formedness or validity) may use separate notes, and/or the W3C notations:
   * [ wfc: ... ]
   * [ vc: ... ]

In the text, this is sometimes referred to as "EBNF (Perl-based)".
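
The first three differences map fairly directly onto ordinary regular expressions. The sketch below is an informal illustration using Python's `re` module, not a parser for the LDML grammar. It assumes `alphanum` stands for `[0-9 A-Z a-z]`, and the helper `strip_bracket_whitespace` is a hypothetical name that handles only simple bracket expressions without an escaped `]`.

```python
import re


def strip_bracket_whitespace(expr):
    """Remove whitespace inside [...] enumerations (difference 2).

    Simplification: does not handle an escaped ']' inside the brackets.
    """
    return re.sub(
        r"\[[^\]]*\]",
        lambda match: re.sub(r"\s+", "", match.group(0)),
        expr,
    )


# Difference 2: `[A-Z a-z]` is the same as `[A-Za-z]`.
letters = strip_bracket_whitespace("[A-Z a-z]")

# Difference 1: bounded repetition, with `alphanum` assumed to be [0-9 A-Z a-z].
alphanum = strip_bracket_whitespace("[0-9 A-Z a-z]")
alphanum_3_8 = re.compile(alphanum + "{3,8}")

# Difference 3: backslash escapes for a hex code point or the next character.
space = re.compile(r"\x20")          # same as #x20
amp_or_hyphen = re.compile(r"[\&\-]")  # same as [#x26#x2D]

print(letters)
print(bool(alphanum_3_8.fullmatch("abc123")), bool(alphanum_3_8.fullmatch("ab")))
```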
265*912701f9SAndroid Build Coastguard Worker
266*912701f9SAndroid Build Coastguard Worker## <a name="Locale" href="#Locale">What is a Locale?</a>
267*912701f9SAndroid Build Coastguard Worker
268*912701f9SAndroid Build Coastguard WorkerBefore diving into the XML structure, it is helpful to describe the model behind the structure. People do not have to subscribe to this model to use data in LDML, but they do need to understand it so that the data can be correctly translated into whatever model their implementation uses.
269*912701f9SAndroid Build Coastguard Worker
270*912701f9SAndroid Build Coastguard WorkerThe first issue is basic: _what is a locale?_ In this model, a locale is an identifier (id) that refers to a set of user preferences that tend to be shared across significant swaths of the world. Traditionally, the data associated with this id provides support for formatting and parsing of dates, times, numbers, and currencies; for measurement units, for sort-order (collation), plus translated names for time zones, languages, countries, and scripts. The data can also include support for text boundaries (character, word, line, and sentence), text transformations (including transliterations), and other services.
271*912701f9SAndroid Build Coastguard Worker
272*912701f9SAndroid Build Coastguard WorkerLocale data is not cast in stone: the data used on someone's machine generally may reflect the US format, for example, but preferences can typically set to override particular items, such as setting the date format for 2002.03.15, or using metric or Imperial measurement units. In the abstract, locales are simply one of many sets of preferences that, say, a website may want to remember for a particular user. Depending on the application, it may want to also remember the user's time zone, preferred currency, preferred character set, smoker/non-smoker preference, meal preference (vegetarian, kosher, and so on), music preference, religion, party affiliation, favorite charity, and so on.
273*912701f9SAndroid Build Coastguard Worker
Locale data in a system may also change over time: country boundaries change; governments (and currencies) come and go; committees impose new standards; bugs are found and fixed in the source data; and so on. Thus the data needs to be versioned for stability over time.
275*912701f9SAndroid Build Coastguard Worker
In general terms, the locale id is a parameter that is supplied to a particular service (date formatting, sorting, spell-checking, and so on). The format in this document does not attempt to represent all the data that could conceivably be used by all possible services. Instead, it collects together data that is in common use in systems and internationalization libraries for basic services. The main difference among locales is in terms of language; there may also be some differences according to different countries or regions. However, the line between _locales_ and _languages_, as commonly used in the industry, is rather fuzzy. Note also that the vast majority of the locale data in CLDR is in fact language data; all non-linguistic data is separated out into a separate tree. For more information, see _[Language and Locale IDs](#Language_and_Locale_IDs)_.
277*912701f9SAndroid Build Coastguard Worker
278*912701f9SAndroid Build Coastguard WorkerWe will speak of data as being "in locale X". That does not imply that a locale _is_ a collection of data; it is simply shorthand for "the set of data associated with the locale id X". Each individual piece of data is called a _resource_ or _field_, and a tag indicating the key of the resource is called a _resource tag._
279*912701f9SAndroid Build Coastguard Worker
280*912701f9SAndroid Build Coastguard Worker
281*912701f9SAndroid Build Coastguard Worker<a name="Identifiers"></a>
282*912701f9SAndroid Build Coastguard Worker## <a name="Unicode_Language_and_Locale_Identifiers" href="#Unicode_Language_and_Locale_Identifiers">Unicode Language and Locale Identifiers</a>
283*912701f9SAndroid Build Coastguard Worker
284*912701f9SAndroid Build Coastguard WorkerUnicode LDML uses stable identifiers based on [[BCP47](#BCP47)] for distinguishing among languages, locales, regions, currencies, time zones, transforms, and so on. There are many systems for identifiers for these entities. The Unicode LDML identifiers may not match the identifiers used on a particular target system. If so, some process of identifier translation may be required when using LDML data.
285*912701f9SAndroid Build Coastguard Worker
286*912701f9SAndroid Build Coastguard WorkerThe BCP 47 extensions (-u- and -t-) are described in _[Unicode BCP 47 U Extension](#u_Extension)_ and _[Unicode BCP 47 T Extension](#BCP47_T_Extension)_.
287*912701f9SAndroid Build Coastguard Worker
288*912701f9SAndroid Build Coastguard Worker### _<a name="Unicode_language_identifier" href="#Unicode_language_identifier">Unicode Language Identifier</a>_
289*912701f9SAndroid Build Coastguard Worker
290*912701f9SAndroid Build Coastguard WorkerA _Unicode language identifier_ has the following structure (provided in EBNF (Perl-based)). The following table defines syntactically well-formed identifiers: they are not necessarily valid identifiers. For additional validity criteria, see the links on the right.
291*912701f9SAndroid Build Coastguard Worker
292*912701f9SAndroid Build Coastguard Worker<table>
293*912701f9SAndroid Build Coastguard Worker<tbody>
294*912701f9SAndroid Build Coastguard Worker   <tr><th></th><th>EBNF</th><th>Validity / Comments</th></tr>
295*912701f9SAndroid Build Coastguard Worker<tr>
296*912701f9SAndroid Build Coastguard Worker    <td><a name="unicode_language_id" href="#unicode_language_id"><code>unicode_language_id</code></a></td>
297*912701f9SAndroid Build Coastguard Worker    <td><pre><code>= "root"
298*912701f9SAndroid Build Coastguard Worker| (unicode_language_subtag
299*912701f9SAndroid Build Coastguard Worker    (sep unicode_script_subtag)?
300*912701f9SAndroid Build Coastguard Worker  | unicode_script_subtag)
301*912701f9SAndroid Build Coastguard Worker  (sep unicode_region_subtag)?
302*912701f9SAndroid Build Coastguard Worker  (sep unicode_variant_subtag)* ;</code></pre></td>
303*912701f9SAndroid Build Coastguard Worker    <td>"root" is treated as a special <code>unicode_language_subtag</code></td>
304*912701f9SAndroid Build Coastguard Worker</tr>
305*912701f9SAndroid Build Coastguard Worker<tr>
306*912701f9SAndroid Build Coastguard Worker    <td><a name="unicode_language_subtag" href="#unicode_language_subtag"><code>unicode_language_subtag</code></a></td>
307*912701f9SAndroid Build Coastguard Worker    <td><pre>= alpha{2,3} | alpha{5,8};</pre></td>
308*912701f9SAndroid Build Coastguard Worker    <td><a href="#unicode_language_subtag_validity">validity</a><br/>
309*912701f9SAndroid Build Coastguard Worker        <a href="https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/language.xml">latest-data</a></td>
310*912701f9SAndroid Build Coastguard Worker</tr>
311*912701f9SAndroid Build Coastguard Worker<tr>
312*912701f9SAndroid Build Coastguard Worker    <td><a name="unicode_script_subtag" href="#unicode_script_subtag"><code>unicode_script_subtag</code></a></td>
313*912701f9SAndroid Build Coastguard Worker    <td><pre>= alpha{4} ;</pre></td>
314*912701f9SAndroid Build Coastguard Worker    <td><a href="#unicode_script_subtag_validity">validity</a><br/>
315*912701f9SAndroid Build Coastguard Worker        <a href="https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/script.xml">latest-data</a></td>
316*912701f9SAndroid Build Coastguard Worker</tr>
317*912701f9SAndroid Build Coastguard Worker<tr>
    <td><a name="unicode_region_subtag" href="#unicode_region_subtag"><code>unicode_region_subtag</code></a></td>
319*912701f9SAndroid Build Coastguard Worker    <td><pre>= (alpha{2} | digit{3}) ;</pre></td>
320*912701f9SAndroid Build Coastguard Worker    <td><a href="#unicode_region_subtag_validity">validity</a><br/>
321*912701f9SAndroid Build Coastguard Worker        <a href="https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/region.xml">latest-data</a></td>
322*912701f9SAndroid Build Coastguard Worker</tr>
323*912701f9SAndroid Build Coastguard Worker<tr>
    <td><a name="unicode_variant_subtag" href="#unicode_variant_subtag"><code>unicode_variant_subtag</code></a></td>
325*912701f9SAndroid Build Coastguard Worker    <td><pre>= (alphanum{5,8}<br/>| digit alphanum{3}) ;</pre></td>
326*912701f9SAndroid Build Coastguard Worker    <td><a href="#unicode_variant_subtag_validity">validity</a><br/>
327*912701f9SAndroid Build Coastguard Worker        <a href="https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/variant.xml">latest-data</a></td>
328*912701f9SAndroid Build Coastguard Worker</tr>
329*912701f9SAndroid Build Coastguard Worker   <tr><td><code>sep</code></td>     <td><pre>= [-_] ;</pre></td></tr>
330*912701f9SAndroid Build Coastguard Worker<tr><td><code>digit</code></td>   <td><pre>= [0-9] ;</pre></td></tr>
331*912701f9SAndroid Build Coastguard Worker<tr><td><code>alpha</code></td>   <td><pre>= [A-Z a-z] ;</pre></td></tr>
332*912701f9SAndroid Build Coastguard Worker<tr><td><code>alphanum</code></td><td><pre>= [0-9 A-Z a-z] ;</pre></td></tr>
333*912701f9SAndroid Build Coastguard Worker</tbody></table>
334*912701f9SAndroid Build Coastguard Worker
335*912701f9SAndroid Build Coastguard WorkerThe following is an additional well-formedness constraint:
336*912701f9SAndroid Build Coastguard Worker  1. [ wfc: The sequence of variant subtags must not have any duplicates (eg, de-1996-fonipa-1996 is not syntactically well-formed). ]
337*912701f9SAndroid Build Coastguard Worker
The semantics of the various subtags is explained in _[Language Identifier Field Definitions](#Field_Definitions)_; there are also direct links from [`unicode_language_subtag`](#unicode_language_subtag), etc. While theoretically the [`unicode_language_subtag`](#unicode_language_subtag) may have more than 3 letters through the IANA registration process, in practice that has not occurred. The [`unicode_language_subtag`](#unicode_language_subtag) "und" may be omitted when there is a [`unicode_script_subtag`](#unicode_script_subtag); for that reason [`unicode_language_subtag`](#unicode_language_subtag) values with 4 letters are not permitted. However, [`unicode_language_id`](#unicode_language_id) values that begin with a script subtag are not intended for general interchange, because they are not valid BCP 47 tags. Instead, they are intended for certain protocols such as the identification of transliterators or font ScriptLangTag values. For more information on language subtags with 4 letters, see [BCP 47 Language Tag to Unicode BCP 47 Locale Identifier](#Language_Tag_to_Locale_Identifier).
339*912701f9SAndroid Build Coastguard Worker
340*912701f9SAndroid Build Coastguard WorkerFor example, "en-US" (American English), "en_GB" (British English), "es-419" (Latin American Spanish), and "uz-Cyrl" (Uzbek in Cyrillic) are all valid Unicode language identifiers.
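
The grammar and the duplicate-variant constraint above can be checked mechanically. The following Python sketch is illustrative only and is not part of the specification: the function name `is_well_formed_language_id` and the regular-expression constants are hypothetical, and the check covers well-formedness only, not validity against the CLDR validity data.

```python
import re

# Building blocks transcribed from the EBNF table above.
SEP = r"[-_]"
LANGUAGE = r"(?:[A-Za-z]{2,3}|[A-Za-z]{5,8})"
SCRIPT = r"[A-Za-z]{4}"
REGION = r"(?:[A-Za-z]{2}|[0-9]{3})"
VARIANT = r"(?:[0-9A-Za-z]{5,8}|[0-9][0-9A-Za-z]{3})"

# unicode_language_id: "root", or language (with optional script) or a bare
# script, followed by an optional region and any number of variants.
LANGUAGE_ID = re.compile(
    rf"(?:root|(?:{LANGUAGE}(?:{SEP}{SCRIPT})?|{SCRIPT})"
    rf"(?:{SEP}{REGION})?"
    rf"(?P<variants>(?:{SEP}{VARIANT})*))",
    re.IGNORECASE,
)


def is_well_formed_language_id(tag: str) -> bool:
    """Return True if tag matches the grammar and has no duplicate variants."""
    match = LANGUAGE_ID.fullmatch(tag)
    if match is None:
        return False
    # Variants are compared case-insensitively, since case carries no meaning.
    variants = [v.lower() for v in re.split(SEP, match.group("variants") or "") if v]
    return len(variants) == len(set(variants))
```

Under these assumptions, the four identifiers in the example above are accepted, while `de-1996-fonipa-1996` is rejected because the variant `1996` is repeated.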
341*912701f9SAndroid Build Coastguard Worker
342*912701f9SAndroid Build Coastguard Worker### _<a name="Unicode_locale_identifier" href="#Unicode_locale_identifier">Unicode Locale Identifier</a>_
343*912701f9SAndroid Build Coastguard Worker
344*912701f9SAndroid Build Coastguard WorkerA _Unicode locale identifier_ is composed of a Unicode language identifier plus (optional) locale extensions. It has the following structure. The semantics of the U and T extensions are explained in _[Unicode BCP 47 U Extension](#u_Extension)_ and _[Unicode BCP 47 T Extension](#BCP47_T_Extension)_. Other extensions and private use extensions are supported for pass-through. The following table defines syntactically _well-formed_ identifiers: they are not necessarily _valid_ identifiers. For additional validity criteria, see the links on the right.
345*912701f9SAndroid Build Coastguard Worker
346*912701f9SAndroid Build Coastguard Worker|                                                                                                       | EBNF                                            | Validity / Comments |
347*912701f9SAndroid Build Coastguard Worker| ----------------------------------------------------------------------------------------------------- | ----------------------------------------------- | ------------------- |
348*912701f9SAndroid Build Coastguard Worker| <a name="unicode_locale_id" href="#unicode_locale_id">`unicode_locale_id`</a>                         | `= unicode_language_id`<br/>  `extensions*`<br/>  `pu_extensions? ;` |
349*912701f9SAndroid Build Coastguard Worker| <a name="extensions" href="#extensions">`extensions`</a>                                              | `= unicode_locale_extensions`<br/>`\| transformed_extensions`<br/>` \| other_extensions ;` |
350*912701f9SAndroid Build Coastguard Worker| <a name="unicode_locale_extensions" href="#unicode_locale_extensions">`unicode_locale_extensions`</a> | `= sep [uU]`<br/>  `((sep keyword)+`<br/>  `\|(sep attribute)+ (sep keyword)*) ;` |
351*912701f9SAndroid Build Coastguard Worker| <a name="transformed_extensions" href="#transformed_extensions">`transformed_extensions`</a>          | `= sep [tT]`<br/>  `((sep tlang (sep tfield)*)`<br/>  `\| (sep tfield)+) ;` |
352*912701f9SAndroid Build Coastguard Worker| <a name="pu_extensions" href="#pu_extensions">`pu_extensions`</a>                                     | `= sep [xX]`<br/>`  (sep alphanum{1,8})+ ;` |
353*912701f9SAndroid Build Coastguard Worker| <a name="other_extensions" href="#other_extensions">`other_extensions`</a>                            | `= sep [alphanum-[tTuUxX]]`<br/>`  (sep alphanum{2,8})+ ;` |
354*912701f9SAndroid Build Coastguard Worker| `keyword`<br/>(Also known as `ufield`)                                                                | `= key (sep type)? ;` |
355*912701f9SAndroid Build Coastguard Worker| `key`<br/>(Also known as `ukey`)                                                                      | `= alphanum alpha ;`<br/>(Note that this is narrower than in [[RFC6067](https://www.ietf.org/rfc/rfc6067.txt)], so that it is disjoint with tkey.) | [`validity`](#Key_Type_Definitions)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47) |
356*912701f9SAndroid Build Coastguard Worker| `type`<br/>(Also known as `uvalue`)                                                                   | `= alphanum{3,8}`<br/>`  (sep alphanum{3,8})* ;` | [`validity`](#Key_Type_Definitions)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47) |
357*912701f9SAndroid Build Coastguard Worker| `attribute`                                                                                           | `= alphanum{3,8} ;` |
358*912701f9SAndroid Build Coastguard Worker| <a name="unicode_subdivision_id" href="#unicode_subdivision_id">`unicode_subdivision_id`</a>          | `= `[`unicode_region_subtag`](#unicode_region_subtag)` unicode_subdivision_suffix ;` | [`validity`](#unicode_subdivision_subtag_validity)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/subdivision.xml) |
359*912701f9SAndroid Build Coastguard Worker| `unicode_subdivision_suffix`                                                                          | `= alphanum{1,4} ;` |
360*912701f9SAndroid Build Coastguard Worker| <a name="unicode_measure_unit" href="#unicode_measure_unit">`unicode_measure_unit`</a>                | `= alphanum{3,8}`<br/>`  (sep alphanum{3,8})* ;` | [`validity`](#Validity_Data)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/unit.xml) |
361*912701f9SAndroid Build Coastguard Worker| `tlang`                                                                                               | `= unicode_language_subtag`<br/>`  (sep unicode_script_subtag)?`<br/>`  (sep unicode_region_subtag)?`<br/>`  (sep unicode_variant_subtag)* ;` | same as in unicode_language_id |
362*912701f9SAndroid Build Coastguard Worker| `tfield`                                                                                              | `= tkey tvalue;` | [`validity`](#BCP47_T_Extension)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47) |
363*912701f9SAndroid Build Coastguard Worker| `tkey`                                                                                                | `= alpha digit ;` |
364*912701f9SAndroid Build Coastguard Worker| `tvalue`                                                                                              | `= (sep alphanum{3,8})+ ;` |
365*912701f9SAndroid Build Coastguard Worker
366*912701f9SAndroid Build Coastguard WorkerThe following are additional well-formedness constraints:
  1. [ wfc: There cannot be more than one extension with the same singleton. For example, en-u-ca-buddhist-u-cf-standard is ill-formed. ]
  2. [ wfc: The same ukey or tkey cannot occur more than once. For example, en-u-ca-buddhist-ca-islamic is ill-formed. ]
  3. [ wfc: The sequence of variant subtags in a tlang must not have any duplicates. ]
  4. [ wfc: The private use extension (-x-) must come after all other extensions. ]
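
The first two constraints can be sketched in Python as follows. This is a hypothetical helper (`extension_constraint_violations` is not a name from the specification), and it assumes the input already matches the EBNF above, so that every one-character subtag before `-x-` is an extension singleton.

```python
import re


def extension_constraint_violations(locale_id: str) -> list[str]:
    """Report duplicate singletons and duplicate ukeys/tkeys in a locale id."""
    subtags = re.split(r"[-_]", locale_id.lower())
    problems = []
    seen_singletons = set()
    seen_keys = set()
    current = None  # singleton of the extension currently being scanned
    for subtag in subtags[1:]:  # subtags[0] is the language (or script) subtag
        if current == "x":
            break  # everything after -x- is private use and is not inspected
        if len(subtag) == 1:
            current = subtag
            if subtag in seen_singletons:
                problems.append(f"duplicate singleton: {subtag}")
            seen_singletons.add(subtag)
            seen_keys = set()  # keys are tracked per extension
        elif (current == "u" and re.fullmatch(r"[0-9a-z][a-z]", subtag)) or (
            current == "t" and re.fullmatch(r"[a-z][0-9]", subtag)
        ):
            # ukey = alphanum alpha, tkey = alpha digit (see the table above)
            if subtag in seen_keys:
                problems.append(f"duplicate key: {subtag}")
            seen_keys.add(subtag)
    return problems
```

The sketch does not test the ordering of `-x-`: because the grammar lets a private use extension absorb all following subtags, that constraint cannot be violated by an identifier that already parses.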
371*912701f9SAndroid Build Coastguard Worker
372*912701f9SAndroid Build Coastguard WorkerFor historical reasons, this is called a Unicode locale identifier. However, it really functions (with few exceptions) as a language identifier, and accesses language-based data. Except where it would be unclear, this document uses the term "locale" data loosely to encompass both types of data: for more information, see _[Language and Locale IDs](#Language_and_Locale_IDs)_.
373*912701f9SAndroid Build Coastguard Worker
374*912701f9SAndroid Build Coastguard WorkerAs of the release of this specification, there were no other_extensions defined. The other_extensions are present in the syntax to allow implementations to preserve that information.
375*912701f9SAndroid Build Coastguard Worker
376*912701f9SAndroid Build Coastguard WorkerAs for terminology, the term _code_ may also be used instead of "subtag", and "territory" instead of "region". The primary language subtag is also called the _base language code_. For example, the base language code for "en-US" (American English) is "en" (English). The _type_ may also be referred to as a _value_ or _key-value_.
377*912701f9SAndroid Build Coastguard Worker
378*912701f9SAndroid Build Coastguard WorkerThe identifiers can vary in case and in the separator characters. The "-" and "\_" separators are treated as equivalent, although "-" is preferred.
379*912701f9SAndroid Build Coastguard Worker
380*912701f9SAndroid Build Coastguard WorkerAll identifier field values are case-insensitive. Although case distinctions do not carry any special meaning, an implementation of LDML should use the casing recommendations in [[BCP47](#BCP47)], especially when a Unicode locale identifier is used for locale data exchange in software protocols.
381*912701f9SAndroid Build Coastguard Worker
382*912701f9SAndroid Build Coastguard Worker#### <a name="Canonical_Unicode_Locale_Identifiers" href="#Canonical_Unicode_Locale_Identifiers">Canonical Unicode Locale Identifiers</a>
383*912701f9SAndroid Build Coastguard Worker
384*912701f9SAndroid Build Coastguard WorkerA [`unicode_locale_id`](#unicode_locale_id) has _canonical syntax_ when:
385*912701f9SAndroid Build Coastguard Worker
386*912701f9SAndroid Build Coastguard Worker* It starts with a language subtag (those beginning with a script subtag are only for specialized use)
387*912701f9SAndroid Build Coastguard Worker* Casing
388*912701f9SAndroid Build Coastguard Worker  * Any script subtag inside unicode_language_id is in title case (eg, Hant)
389*912701f9SAndroid Build Coastguard Worker  * Any region subtag inside unicode_language_id is in uppercase (eg, DE)
390*912701f9SAndroid Build Coastguard Worker  * All other subtags are in lowercase (eg, en, fonipa)
391*912701f9SAndroid Build Coastguard Worker* Order
392*912701f9SAndroid Build Coastguard Worker  * Any variants are in alphabetical order (eg, en-fonipa-scouse, not en-scouse-fonipa)
393*912701f9SAndroid Build Coastguard Worker  * Any extensions are in alphabetical order by their singleton (eg, en-t-xxx-u-yyy, not en-u-yyy-t-xxx)
394*912701f9SAndroid Build Coastguard Worker  * All attributes are sorted in alphabetical order.
395*912701f9SAndroid Build Coastguard Worker  * All keywords and tfields are sorted by alphabetical order of their keys, within their respective extensions.
396*912701f9SAndroid Build Coastguard Worker  * Any type or tfield value "true" is removed.
397*912701f9SAndroid Build Coastguard Worker
398*912701f9SAndroid Build Coastguard WorkerFor example, the canonical form of "en-u-foo-bar-nu-thai-ca-buddhist-kk-true" is "en-u-bar-foo-ca-buddhist-kk-nu-thai". The attributes `"foo"` and `"bar"` in this example are provided only for illustration; no attribute subtags are defined by the current CLDR specification.
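
As an illustration of the casing and ordering rules above, here is a minimal Python sketch. The names `canonical_syntax` and `_sort_fields` are hypothetical. It assumes a well-formed identifier that starts with a language subtag, does not resolve aliases (so it yields canonical syntax, not canonical form), and leaves the order of subtags inside a tlang unchanged.

```python
import re


def _sort_fields(body, is_key):
    """Split body into leading subtags and key-led fields; sort fields by key."""
    first_key = next((i for i, s in enumerate(body) if is_key(s)), len(body))
    prefix, fields = body[:first_key], []
    for s in body[first_key:]:
        if is_key(s):
            fields.append([s])
        else:
            fields[-1].append(s)
    fields.sort(key=lambda f: f[0])
    # A value of exactly "true" is dropped, leaving the bare key.
    fields = [[f[0]] if f[1:] == ["true"] else f for f in fields]
    return prefix, [s for f in fields for s in f]


def canonical_syntax(locale_id: str) -> str:
    subtags = re.split(r"[-_]", locale_id.lower())

    # The unicode_language_id is everything before the first singleton.
    first_ext = next((i for i, s in enumerate(subtags) if len(s) == 1), len(subtags))
    lang_part, rest = subtags[:first_ext], subtags[first_ext:]

    head, variants = [lang_part[0]], []      # language subtag: lowercase
    for s in lang_part[1:]:
        if len(s) == 4 and s.isalpha():
            head.append(s.title())           # script subtag: title case
        elif (len(s) == 2 and s.isalpha()) or (len(s) == 3 and s.isdigit()):
            head.append(s.upper())           # region subtag: uppercase
        else:
            variants.append(s)               # variant subtag: lowercase

    # Group the remaining subtags by singleton; -x- keeps everything after it.
    extensions, private_use = {}, []
    i = 0
    while i < len(rest):
        if rest[i] == "x":
            private_use = rest[i:]
            break
        j = i + 1
        while j < len(rest) and len(rest[j]) != 1:
            j += 1
        extensions[rest[i]] = rest[i + 1:j]
        i = j

    parts = head + sorted(variants)
    for singleton in sorted(extensions):
        body = extensions[singleton]
        if singleton == "u":
            attributes, keywords = _sort_fields(body, lambda s: len(s) == 2)
            body = sorted(attributes) + keywords
        elif singleton == "t":
            tlang, tfields = _sort_fields(
                body, lambda s: re.fullmatch(r"[a-z][0-9]", s) is not None
            )
            body = tlang + tfields
        parts += [singleton] + body
    return "-".join(parts + private_use)
```

Applied to the example above, `canonical_syntax("en-u-foo-bar-nu-thai-ca-buddhist-kk-true")` returns `"en-u-bar-foo-ca-buddhist-kk-nu-thai"`.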
399*912701f9SAndroid Build Coastguard Worker
NOTE: Some people may wonder why CLDR uses alphabetical order for variants, rather than the ordering in [Section 4.1](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.1) of BCP 47. Here are the considerations that led to that decision:
  * The ordering in Section 4.1 is recommended, but not required for conformance. In particular, use of and ordering by Prefix is recommended but not required.
402*912701f9SAndroid Build Coastguard Worker  * Moreover, [Section 4.5](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.5) states that “If more than one variant appears within a tag, processors MAY reorder the variants to obtain better matching behavior or more consistent presentation.”
403*912701f9SAndroid Build Coastguard Worker  * The best practices for internationalization have moved well beyond some of the guidelines and recommendations in BCP 47, especially for language matching and language fallback.
404*912701f9SAndroid Build Coastguard Worker  * Robust implementations will accept the variants in any order, just as they accept extensions in any order.
405*912701f9SAndroid Build Coastguard Worker  * A canonical order allows for determination of identity of identifiers via string comparison.
  * The ordering in Section 4.1 does not result in a deterministic order for canonicalization, since the mechanism for determining “importance” is not specified: ca-valencia-fonipa and ca-fonipa-valencia could both be ‘canonical’ variants of one another.
  * Pure alphabetical order is deterministic and simple to implement, while the ordering in Section 4.1 is nondeterministic, more complex, and provides no significant benefit in modern applications.
408*912701f9SAndroid Build Coastguard Worker
409*912701f9SAndroid Build Coastguard Worker**Note:** The current version of CLDR data uses some non-preferred _syntax_ for backward compatibility. This might be changed in future CLDR releases.
410*912701f9SAndroid Build Coastguard Worker
411*912701f9SAndroid Build Coastguard Worker  * It uses uppercase letters for variant subtags, while the preferred forms are all lowercase.
412*912701f9SAndroid Build Coastguard Worker  * It uses "\_" as the separator, while the preferred form of the separator is "-".
413*912701f9SAndroid Build Coastguard Worker  * It uses "root", while the preferred form is "und".
414*912701f9SAndroid Build Coastguard Worker
415*912701f9SAndroid Build Coastguard WorkerA [`unicode_locale_id`](#unicode_locale_id) is in _canonical form_ when it has canonical syntax and contains no aliased subtags. A [`unicode_locale_id`](#unicode_locale_id) can be transformed into canonical form according to [Annex C. LocaleId Canonicalization](#LocaleId_Canonicalization).
416*912701f9SAndroid Build Coastguard Worker
417*912701f9SAndroid Build Coastguard WorkerA [`unicode_locale_id`](#unicode_locale_id) is _maximal_ when the [`unicode_language_id`](#unicode_language_id) and tlang (if any) have been transformed by the Add Likely Subtags operation in _[Likely Subtags](#Likely_Subtags)_, excluding "und".
418*912701f9SAndroid Build Coastguard Worker
> _Example:_ the maximal form of ja-Kana-t-it is ja-Kana-JP-t-it-latn-it
420*912701f9SAndroid Build Coastguard Worker
421*912701f9SAndroid Build Coastguard WorkerNote that the _latn_ and final _it_ don't use any uppercase characters, since they are not inside unicode_language_id.
422*912701f9SAndroid Build Coastguard Worker
423*912701f9SAndroid Build Coastguard WorkerTwo [`unicode_locale_ids`](#unicode_locale_id) are _equivalent_ when their maximal canonical forms are identical.
424*912701f9SAndroid Build Coastguard Worker
425*912701f9SAndroid Build Coastguard Worker> _Example:_ "IW-HEBR-u-ms-imperial" ~ "he-u-ms-uksystem"
426*912701f9SAndroid Build Coastguard Worker
427*912701f9SAndroid Build Coastguard WorkerThe equivalence relationship may change over time, such as when subtags are deprecated or likely subtag mappings change. For example, if two countries were to merge, then various subtags would become deprecated. These kinds of changes are generally very infrequent.
428*912701f9SAndroid Build Coastguard Worker
429*912701f9SAndroid Build Coastguard Worker
430*912701f9SAndroid Build Coastguard Worker### <a name="BCP_47_Conformance" href="#BCP_47_Conformance">BCP 47 Conformance</a>
431*912701f9SAndroid Build Coastguard Worker
432*912701f9SAndroid Build Coastguard WorkerUnicode language and locale identifiers inherit the design and the repertoire of subtags from [[BCP47](#BCP47)] Language Tags. There are some extensions and restrictions made for the use of the Unicode locale identifier in CLDR:
433*912701f9SAndroid Build Coastguard Worker
434*912701f9SAndroid Build Coastguard Worker* It does not allow for the full syntax of [[BCP47](#BCP47)]:
435*912701f9SAndroid Build Coastguard Worker  * No extlang subtags are allowed (as in the BCP 47 canonical form, see BCP 47 [Section 4.5](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.5) and [Section 3.1.7](https://www.rfc-editor.org/rfc/rfc5646.html#section-3.1.7))
436*912701f9SAndroid Build Coastguard Worker  * No irregular BCP 47 legacy language tags (marked as “Type: grandfathered” in BCP 47) are allowed (these are all deprecated in BCP 47)
437*912701f9SAndroid Build Coastguard Worker  * A tag must not start with the subtag "x": thus a _privateuse_ (eg x-abc) can only be after a language subtag, like "und"
438*912701f9SAndroid Build Coastguard Worker* It allows for certain semantic additions and constraints:
439*912701f9SAndroid Build Coastguard Worker  * Certain codes that are private-use in BCP 47 and ISO are given semantics by LDML
440*912701f9SAndroid Build Coastguard Worker  * Each macrolanguage has an identified primary encompassed language, which is treated as an alias for the macrolanguage, and thus is replaced when canonicalizing (as allowed by BCP 47, see [Section 4.1.2](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.1.2))
441*912701f9SAndroid Build Coastguard Worker* It allows certain syntax for backwards compatibility (not BCP 47-compatible):
442*912701f9SAndroid Build Coastguard Worker  * The "\_" character for field separator characters, as well as the "-" used in [[BCP47](#BCP47)] (however, the canonical form is with "-")
443*912701f9SAndroid Build Coastguard Worker  * The subtag "root" to indicate the generic locale used as the parent of all languages in the CLDR data model ("und" can be used instead)
444*912701f9SAndroid Build Coastguard Worker  * The language tag may begin with a script subtag rather than a language subtag. This is specialized use only, and not required for CLDR conformance.
445*912701f9SAndroid Build Coastguard Worker
446*912701f9SAndroid Build Coastguard WorkerThere are thus two subtypes of Unicode locale identifiers:
447*912701f9SAndroid Build Coastguard Worker
448*912701f9SAndroid Build Coastguard Worker* the term _Unicode CLDR locale identifier_ applies where the backwards compatibility syntax is used.
449*912701f9SAndroid Build Coastguard Worker* the term _Unicode BCP 47 locale identifier_ applies otherwise. A _Unicode BCP 47 locale identifier_ is also a valid BCP 47 language tag.
450*912701f9SAndroid Build Coastguard Worker
451*912701f9SAndroid Build Coastguard Worker#### <a name="BCP_47_Language_Tag_Conversion" href="#BCP_47_Language_Tag_Conversion">BCP 47 Language Tag Conversion</a>
452*912701f9SAndroid Build Coastguard Worker
453*912701f9SAndroid Build Coastguard WorkerThe different identifiers can be converted to one another as described in this section.
454*912701f9SAndroid Build Coastguard Worker
455*912701f9SAndroid Build Coastguard WorkerA valid [[BCP47](#BCP47)] language tag can be converted to a valid Unicode BCP 47 locale identifier according to [Annex C. LocaleId Canonicalization](#LocaleId_Canonicalization).
456*912701f9SAndroid Build Coastguard Worker
457*912701f9SAndroid Build Coastguard WorkerThe result is a Unicode BCP 47 locale identifier, in canonical form. It is both a BCP 47 language tag and a Unicode locale identifier. Because the process maps from all BCP 47 language tags into a subset of BCP 47 language tags, the format changes are not reversible, much as a lowercase transformation of the string “McGowan” is not reversible.
458*912701f9SAndroid Build Coastguard Worker
459*912701f9SAndroid Build Coastguard Worker###### Table: <a name="Language_Tag_to_Locale_Identifier" href="#Language_Tag_to_Locale_Identifier">BCP 47 Language Tag to Unicode BCP 47 Locale Identifier</a> Examples
460*912701f9SAndroid Build Coastguard Worker
461*912701f9SAndroid Build Coastguard Worker| BCP 47 language tag | Unicode BCP 47 locale identifier | Comments |
462*912701f9SAndroid Build Coastguard Worker| ------------------- | -------------------------------- | -------- |
463*912701f9SAndroid Build Coastguard Worker| `en-US`             | `en-US`                          | no changes |
464*912701f9SAndroid Build Coastguard Worker| `iw-FX`             | `he-FR`                          | BCP 47 canonicalization  |
465*912701f9SAndroid Build Coastguard Worker| `cmn-TW`            | `zh-TW`                          | language alias  |
466*912701f9SAndroid Build Coastguard Worker| `zh-cmn-TW`         | `zh-TW`                          | BCP 47 canonicalization, then language alias  |
467*912701f9SAndroid Build Coastguard Worker| `sr-CS`             | `sr-RS`                          | territory alias  |
468*912701f9SAndroid Build Coastguard Worker| `sh`                | `sr-Latn`                        | multiple replacement subtags  |
| `sh-Cyrl`           | `sr-Cyrl`                        | multiple replacement subtags; the replacement script `Latn` is not applied because the source tag already has a script subtag |
470*912701f9SAndroid Build Coastguard Worker| `hy-SU`             | `hy-AM`                          | multiple territory values <br/>`<territoryAlias type="SU" replacement="RU AM AZ BY EE GE KZ KG LV LT MD TJ TM UA UZ" …/>` |
471*912701f9SAndroid Build Coastguard Worker| `i-enochian`        | `und-x-i-enochian`               | prefix any legacy language tags (marked as “Type: grandfathered” in BCP 47) with "und-x-"  |
472*912701f9SAndroid Build Coastguard Worker| `x-abc`             | `und-x-abc`                      | prefix with "und-", so that there is always a base language subtag  |
473*912701f9SAndroid Build Coastguard Worker
474*912701f9SAndroid Build Coastguard Worker##### <a name="Unicode_Locale_Identifier_CLDR_to_BCP_47" href="#Unicode_Locale_Identifier_CLDR_to_BCP_47">Unicode Locale Identifier: CLDR to BCP 47</a>
475*912701f9SAndroid Build Coastguard Worker
476*912701f9SAndroid Build Coastguard WorkerA Unicode CLDR locale identifier can be converted to a valid [[BCP47](#BCP47)] language tag (which is also a Unicode BCP 47 locale identifier) by performing the following transformation.
477*912701f9SAndroid Build Coastguard Worker
478*912701f9SAndroid Build Coastguard Worker1.  Replace the "\_" separators with "-"
479*912701f9SAndroid Build Coastguard Worker2.  Replace the special language identifier "root" with the BCP 47 primary language tag "und"
480*912701f9SAndroid Build Coastguard Worker3.  Add an initial "und" primary language subtag if the first subtag is a script.
481*912701f9SAndroid Build Coastguard Worker
482*912701f9SAndroid Build Coastguard Worker_Examples:_
483*912701f9SAndroid Build Coastguard Worker
484*912701f9SAndroid Build Coastguard Worker| Unicode CLDR locale identifier | BCP 47 language tag  | Comments               |
485*912701f9SAndroid Build Coastguard Worker| ------------------------------ | -------------------- | ---------------------- |
486*912701f9SAndroid Build Coastguard Worker| `en_US`                        | `en-US`              | change separator       |
487*912701f9SAndroid Build Coastguard Worker| `de_DE_u_co_phonebk`           | `de-DE-u-co-phonebk` | change separator       |
488*912701f9SAndroid Build Coastguard Worker| `root`                         | `und`                | change to "und"        |
489*912701f9SAndroid Build Coastguard Worker| `root_u_cu_usd`                | `und-u-cu-usd`       | change to "und"        |
490*912701f9SAndroid Build Coastguard Worker| `Latn_DE`                      | `und-Latn-DE`        | add "und"              |
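
The three steps above can be sketched in Python as follows. The function name `cldr_to_bcp47` is hypothetical, and the sketch assumes a well-formed Unicode CLDR locale identifier as input; it does not canonicalize casing or resolve aliases.

```python
import re


def cldr_to_bcp47(cldr_id: str) -> str:
    # Step 1: replace the "_" separators with "-".
    subtags = cldr_id.replace("_", "-").split("-")
    if subtags[0].lower() == "root":
        # Step 2: "root" becomes the primary language subtag "und".
        subtags[0] = "und"
    elif re.fullmatch(r"[A-Za-z]{4}", subtags[0]):
        # Step 3: a leading four-letter subtag is a script, so prepend "und".
        subtags.insert(0, "und")
    return "-".join(subtags)
```

The check for "root" comes first because "root" also has the shape of a four-letter script subtag.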
491*912701f9SAndroid Build Coastguard Worker
492*912701f9SAndroid Build Coastguard Worker##### <a name="Unicode_Locale_Identifier_BCP_47_to_CLDR" href="#Unicode_Locale_Identifier_BCP_47_to_CLDR">Unicode Locale Identifier: BCP 47 to CLDR</a>
493*912701f9SAndroid Build Coastguard Worker
494*912701f9SAndroid Build Coastguard WorkerA Unicode BCP 47 locale identifier can be transformed into a Unicode CLDR locale identifier by performing the following transformation.
495*912701f9SAndroid Build Coastguard Worker
1.  Change the separator to "\_".
2.  Replace the primary language subtag "und" with "root" if no script, region, or variant subtags are present.
498*912701f9SAndroid Build Coastguard Worker
499*912701f9SAndroid Build Coastguard Worker_Examples:_
500*912701f9SAndroid Build Coastguard Worker
501*912701f9SAndroid Build Coastguard Worker| BCP 47 language tag | Unicode CLDR locale identifier | Comments |
502*912701f9SAndroid Build Coastguard Worker| ------------------- | ------------------------------ | -------- |
503*912701f9SAndroid Build Coastguard Worker| `en-US`             | `en_US`                        | changes separator |
| `und`               | `root`                         | changes to "root", because no script, region, or variant subtag is present |
| `und-US`            | `und_US`                       | no change to "und", because a region subtag is present |
| `und-u-cu-USD`      | `root_u_cu_usd`                | changes to "root", because no script, region, or variant subtag is present |
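
The reverse transformation can be sketched in the same way. The function name `bcp47_to_cldr` is hypothetical. The sketch assumes that extensions (and private use) begin at the first single-character subtag, and it lowercases the extension subtags so that the result matches the last row of the table; that lowercasing is an assumption based on the canonical casing of extension subtags, not one of the two listed steps.

```python
def bcp47_to_cldr(tag: str) -> str:
    """Convert a Unicode BCP 47 locale identifier to a CLDR locale identifier (sketch)."""
    subtags = tag.split("-")
    # Assumption: extensions start at the first single-character subtag.
    ext_start = next(
        (i for i, s in enumerate(subtags) if i > 0 and len(s) == 1),
        len(subtags),
    )
    base, extensions = subtags[:ext_start], subtags[ext_start:]
    # "und" becomes "root" only when no script, region, or variant follows it.
    if [s.lower() for s in base] == ["und"]:
        base = ["root"]
    # Assumption: extension subtags are emitted in lowercase.
    return "_".join(base + [s.lower() for s in extensions])
```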
507*912701f9SAndroid Build Coastguard Worker
508*912701f9SAndroid Build Coastguard Worker##### Truncation
509*912701f9SAndroid Build Coastguard Worker
BCP 47 requires that implementations allow for language tags of at least 35 characters, in [Section 4.4.1](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.4.1).
511*912701f9SAndroid Build Coastguard WorkerTo allow for use of extensions, CLDR extends that minimum to 255 for Unicode locale identifiers.
512*912701f9SAndroid Build Coastguard WorkerTheoretically, a language tag could be far longer, due to the possibility of a large number of variants and extensions.
513*912701f9SAndroid Build Coastguard WorkerIn practice, the typical size of a locale or language identifier will be much smaller, so implementations can optimize for smaller sizes, as long as there is an escape mechanism allowing for up to 255.
514*912701f9SAndroid Build Coastguard Worker
515*912701f9SAndroid Build Coastguard Worker### <a name="Field_Definitions" href="#Field_Definitions">Language Identifier Field Definitions</a>
516*912701f9SAndroid Build Coastguard Worker
517*912701f9SAndroid Build Coastguard WorkerUnicode language and locale identifier field values are provided in the following table. Note that some private-use BCP 47 field values are given specific meanings in CLDR. While field values are based on [[BCP47](#BCP47)] subtag values, their validity status in CLDR is specified by means of machine-readable files in the [common/validity/](https://github.com/unicode-org/cldr-staging/tree/main/production/common/validity) subdirectory, such as language.xml. For the format of those files and more information, see _[Validity Data](#Validity_Data)_.
518*912701f9SAndroid Build Coastguard Worker
519*912701f9SAndroid Build Coastguard Worker#### <a name="unicode_language_subtag_validity" href="#unicode_language_subtag_validity">`unicode_language_subtag`</a> (also known as a _Unicode base language code_)
520*912701f9SAndroid Build Coastguard Worker
521*912701f9SAndroid Build Coastguard WorkerSubtags in the language.xml file (see _[Validity Data](#Validity_Data)_ ). These are based on [[BCP47](#BCP47)] subtag values marked as **Type: language**
522*912701f9SAndroid Build Coastguard Worker
523*912701f9SAndroid Build Coastguard WorkerISO 639-3 introduces the notion of "macrolanguages", where certain ISO 639-1 or ISO 639-2 codes are given broad semantics, and additional codes are given for the narrower semantics. For backwards compatibility, Unicode language identifiers retain use of the narrower semantics for these codes. For example:
524*912701f9SAndroid Build Coastguard Worker
525*912701f9SAndroid Build Coastguard Worker| For                         | Use   | _Not_ |
526*912701f9SAndroid Build Coastguard Worker| --------------------------- | ----- | ----- |
527*912701f9SAndroid Build Coastguard Worker| Standard Chinese (Mandarin) | `zh`  | `cmn` |
528*912701f9SAndroid Build Coastguard Worker| Standard Arabic             | `ar`  | `arb` |
529*912701f9SAndroid Build Coastguard Worker| Standard Malay              | `ms`  | `zsm` |
530*912701f9SAndroid Build Coastguard Worker| Standard Swahili            | `sw`  | `swh` |
531*912701f9SAndroid Build Coastguard Worker| Standard Uzbek              | `uz`  | `uzn` |
532*912701f9SAndroid Build Coastguard Worker| Standard Konkani            | `kok` | `knn` |
533*912701f9SAndroid Build Coastguard Worker| Northern Kurdish            | `ku`  | `kmr` |
534*912701f9SAndroid Build Coastguard Worker
535*912701f9SAndroid Build Coastguard WorkerIf a language subtag matches the `type` attribute of a `languageAlias` element, then the replacement value is used instead. For example, because "swh" occurs in `<languageAlias type="swh" replacement="sw" />` , "sw" must be used instead of "swh". Thus Unicode language identifiers use "ar-EG" for Standard Arabic (Egypt), not "arb-EG"; they use "zh-TW" for Mandarin Chinese (Taiwan), not "cmn-TW".
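
The alias replacement described above can be illustrated with a small lookup. The `LANGUAGE_ALIASES` dictionary below is a hypothetical three-entry excerpt built from the examples in this section; the real `languageAlias` data is much larger and also contains multi-subtag rules that this sketch does not handle.

```python
# Hypothetical excerpt of languageAlias data: type -> replacement.
LANGUAGE_ALIASES = {"swh": "sw", "arb": "ar", "cmn": "zh"}


def replace_language_alias(tag: str, aliases=LANGUAGE_ALIASES) -> str:
    """Replace the leading language subtag if it matches an alias type (sketch)."""
    # Split off the first subtag; `sep` is "" when the tag has only one subtag.
    language, sep, rest = tag.partition("-")
    return aliases.get(language.lower(), language) + sep + rest
```

For instance, `replace_language_alias("arb-EG")` gives `"ar-EG"`, while a tag whose language subtag has no alias, such as `"en-US"`, is returned unchanged.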
536*912701f9SAndroid Build Coastguard Worker
537*912701f9SAndroid Build Coastguard WorkerThe private use codes listed as **excluded** in _[Private Use Codes](#Private_Use_Codes)_ will never be given specific semantics in Unicode identifiers, and are thus safe for use for other purposes by other applications.
538*912701f9SAndroid Build Coastguard Worker
539*912701f9SAndroid Build Coastguard WorkerThe CLDR provides data for normalizing language/locale codes, including mapping overlong codes like "eng-840" or "eng-USA" to the correct code "en-US"; see the **[Aliases](https://unicode-org.github.io/cldr-staging/charts/38/supplemental/aliases.html)** Chart.
540*912701f9SAndroid Build Coastguard Worker
541*912701f9SAndroid Build Coastguard WorkerThe following are special language subtags:
542*912701f9SAndroid Build Coastguard Worker
543*912701f9SAndroid Build Coastguard Worker|       | Name                  | Comment |
544*912701f9SAndroid Build Coastguard Worker| ----- | --------------------- | ------- |
545*912701f9SAndroid Build Coastguard Worker| `mis` | Uncoded languages     | The content is in a language that doesn't yet have an ISO 639 code. |
546*912701f9SAndroid Build Coastguard Worker| `mul` | Multiple languages    | The content contains more than one language or text that is simultaneously in multiple languages (such as brand names). |
| `zxx` | No linguistic content | The content is not in any particular language (such as images, symbols, etc.) |
548*912701f9SAndroid Build Coastguard Worker
549*912701f9SAndroid Build Coastguard Worker#### <a name="unicode_script_subtag_validity" href="#unicode_script_subtag_validity">`unicode_script_subtag`</a> (also known as a _Unicode script code_)
550*912701f9SAndroid Build Coastguard Worker
551*912701f9SAndroid Build Coastguard WorkerSubtags in the script.xml file (see _[Validity Data](#Validity_Data)_). These are based on [[BCP47](#BCP47)] subtag values marked as **Type: script**
552*912701f9SAndroid Build Coastguard Worker
In most cases the script is not necessary, since the language is customarily written in only a single script. Examples of cases where it is used are:
554*912701f9SAndroid Build Coastguard Worker
555*912701f9SAndroid Build Coastguard Worker| Subtag    | Description |
556*912701f9SAndroid Build Coastguard Worker| --------- | ----------- |
557*912701f9SAndroid Build Coastguard Worker| `az_Arab` | Azerbaijani in Arabic script |
558*912701f9SAndroid Build Coastguard Worker| `az_Cyrl` | Azerbaijani in Cyrillic script |
559*912701f9SAndroid Build Coastguard Worker| `az_Latn` | Azerbaijani in Latin script |
560*912701f9SAndroid Build Coastguard Worker| `zh_Hans` | Chinese, in simplified script (=zh, zh-Hans, zh-CN, zh-Hans-CN) |
561*912701f9SAndroid Build Coastguard Worker| `zh_Hant` | Chinese, in traditional script |
562*912701f9SAndroid Build Coastguard Worker
563*912701f9SAndroid Build Coastguard WorkerUnicode identifiers give specific semantics to certain Unicode Script values. For more information, see also [[UAX24](https://www.unicode.org/reports/tr41/#UAX24)]:
564*912701f9SAndroid Build Coastguard Worker
<!-- HTML: rowspan, colspan -->
566*912701f9SAndroid Build Coastguard Worker<table><tbody>
567*912701f9SAndroid Build Coastguard Worker<tr><td><code>Qaag</code></td>
568*912701f9SAndroid Build Coastguard Worker    <td>Zawgyi</td>
569*912701f9SAndroid Build Coastguard Worker    <td colspan="2">Qaag is a special script code for identifying the non-standard use of Myanmar characters for display with the Zawgyi font. The purpose of the code is to enable migration to standard, interoperable use of Unicode by providing an identifier for Zawgyi for tagging text, applications, input methods, font tables, transformations, and other mechanisms used for migration.</td></tr>
570*912701f9SAndroid Build Coastguard Worker<tr><td><code>Qaai</code></td>
571*912701f9SAndroid Build Coastguard Worker    <td>Inherited</td>
572*912701f9SAndroid Build Coastguard Worker    <td colspan="2"><b>deprecated</b>: the <i>canonicalized</i> form is Zinh</td></tr>
573*912701f9SAndroid Build Coastguard Worker<tr><td><code>Zinh</code></td>
574*912701f9SAndroid Build Coastguard Worker    <td>Inherited</td>
575*912701f9SAndroid Build Coastguard Worker    <td colspan="2">&nbsp;</td></tr>
576*912701f9SAndroid Build Coastguard Worker<tr><td><code>Zsye</code></td>
577*912701f9SAndroid Build Coastguard Worker    <td>Emoji Style</td>
578*912701f9SAndroid Build Coastguard Worker    <td colspan="2">Prefer emoji style for characters that have both text and emoji styles available.</td></tr>
579*912701f9SAndroid Build Coastguard Worker<tr><td><code>Zsym</code></td>
580*912701f9SAndroid Build Coastguard Worker    <td>Text Style</td>
581*912701f9SAndroid Build Coastguard Worker    <td colspan="2">Prefer text style for characters that have both text and emoji styles available.</td></tr>
582*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="7"><code>Zxxx</code></td>
583*912701f9SAndroid Build Coastguard Worker    <td rowspan="7">Unwritten</td>
584*912701f9SAndroid Build Coastguard Worker    <td colspan="2">Indicates spoken or otherwise unwritten content. For example:</td></tr>
585*912701f9SAndroid Build Coastguard Worker
586*912701f9SAndroid Build Coastguard Worker<tr><th>Sample(s)</th><th>Description</th></tr>
587*912701f9SAndroid Build Coastguard Worker<tr><td>uz</td><td>either written or spoken content</td></tr>
588*912701f9SAndroid Build Coastguard Worker<tr><td>uz-Latn <i>or</i> uz-Arab</td><td>written-only content (particular script)</td></tr>
589*912701f9SAndroid Build Coastguard Worker<tr><td>uz-Zyyy</td><td>written-only content (unspecified script)</td></tr>
590*912701f9SAndroid Build Coastguard Worker<tr><td>uz-Zxxx</td><td>spoken-only content</td></tr>
591*912701f9SAndroid Build Coastguard Worker<tr><td>uz-Latn, uz-Zxxx</td><td>both specific written and spoken content (using a <i>language list</i>)</td></tr>
592*912701f9SAndroid Build Coastguard Worker
593*912701f9SAndroid Build Coastguard Worker<tr><td><code>Zyyy</code></td>
594*912701f9SAndroid Build Coastguard Worker    <td>Common</td>
595*912701f9SAndroid Build Coastguard Worker    <td colspan="2">&nbsp;</td></tr>
596*912701f9SAndroid Build Coastguard Worker<tr><td><code>Zzzz</code></td>
597*912701f9SAndroid Build Coastguard Worker    <td>Unknown</td>
598*912701f9SAndroid Build Coastguard Worker<td colspan="2">&nbsp;</td></tr>
599*912701f9SAndroid Build Coastguard Worker</tbody></table>
600*912701f9SAndroid Build Coastguard Worker
601*912701f9SAndroid Build Coastguard WorkerThe private use subtags listed as **excluded** in _[Private Use Codes](#Private_Use_Codes)_ will never be given specific semantics in Unicode identifiers, and are thus safe for use for other purposes by other applications.
602*912701f9SAndroid Build Coastguard Worker
603*912701f9SAndroid Build Coastguard Worker#### <a name="unicode_region_subtag_validity" href="#unicode_region_subtag_validity">`unicode_region_subtag`</a> (also known as a _Unicode region code,_ or a _Unicode territory code_)
604*912701f9SAndroid Build Coastguard Worker
605*912701f9SAndroid Build Coastguard WorkerSubtags in the region.xml file (see _[Validity Data](#Validity_Data)_). These are based on [[BCP47](#BCP47)] subtag values marked as **Type: region**
606*912701f9SAndroid Build Coastguard Worker
607*912701f9SAndroid Build Coastguard WorkerUnicode identifiers give specific semantics to the following subtags.
608*912701f9SAndroid Build Coastguard Worker(The alpha2 codes are used as Unicode region subtags. The alpha3 and numeric codes are derived according to _[Numeric Codes](#Numeric_Codes)_ and listed here for additional documentation.)
609*912701f9SAndroid Build Coastguard Worker
610*912701f9SAndroid Build Coastguard Worker| alpha2 | alpha3 | num | Name                         | Comment | ISO 3166-1 status |
611*912701f9SAndroid Build Coastguard Worker| ------ | ------ | --- | ---------------------------- | ------- | ----------------- |
612*912701f9SAndroid Build Coastguard Worker| `QO`   | `QOO`  | 961 | Outlying Oceania             | countries in Oceania [009] that do not have a [subcontinent](https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_containment_un_m_49.html). | private use |
613*912701f9SAndroid Build Coastguard Worker| `QU`   | `QUU`  | 967 | European Union               | **deprecated**: the _canonicalized_ form is EU | private use |
614*912701f9SAndroid Build Coastguard Worker| `UK`   | -      | -   | United Kingdom               | **deprecated**: the _canonicalized_ form is GB | exceptionally reserved |
| `XA`   | `XAA`  | 973 | Pseudo-Accents               | special code indicating a derived testing locale: English with added accents and lengthened text | private use |
616*912701f9SAndroid Build Coastguard Worker| `XB`   | `XBB`  | 974 | Pseudo-Bidi                  | special code indicating derived testing locale with forced RTL English | private use |
617*912701f9SAndroid Build Coastguard Worker| `XK`   | `XKK`  | 983 | Kosovo                       | industry practice | private use |
618*912701f9SAndroid Build Coastguard Worker| `ZZ`   | `ZZZ`  | 999 | Unknown or Invalid Territory | used in APIs or as replacement for invalid code | private use |
619*912701f9SAndroid Build Coastguard Worker
620*912701f9SAndroid Build Coastguard Worker
621*912701f9SAndroid Build Coastguard WorkerThe private use subtags listed as **excluded** in _[Private Use Codes](#Private_Use_Codes)_ will normally never be given specific semantics in Unicode identifiers, and are thus safe for use for other purposes by other applications. However, LDML may follow widespread industry practice in the use of some of these codes, such as for XK.
622*912701f9SAndroid Build Coastguard Worker
623*912701f9SAndroid Build Coastguard WorkerThe CLDR provides data for normalizing territory/region codes, including mapping overlong codes like "eng-840" or "eng-USA" to the correct code "en-US".
624*912701f9SAndroid Build Coastguard Worker
625*912701f9SAndroid Build Coastguard WorkerSpecial Codes:
626*912701f9SAndroid Build Coastguard Worker
627*912701f9SAndroid Build Coastguard Worker* The territory code 'UK' has a special status in ISO, and is used for the domain name instead of GB. It is thus recognized by CLDR as being an alternate (unnormalized) form of 'GB'.
628*912701f9SAndroid Build Coastguard Worker* The territory code '001' (the World) is used to indicate a standardized form, such as "ar-001" for Modern Standard Arabic.
629*912701f9SAndroid Build Coastguard Worker
630*912701f9SAndroid Build Coastguard Worker#### <a name="unicode_variant_subtag_validity" href="#unicode_variant_subtag_validity">`unicode_variant_subtag`</a> (also known as a _Unicode language variant code_)
631*912701f9SAndroid Build Coastguard Worker
632*912701f9SAndroid Build Coastguard WorkerSubtags in the variant.xml file (see _[Validity Data](#Validity_Data)_). These are based on [[BCP47](#BCP47)] subtag values marked as **Type: variant**. The sequence of variant tags must not have any duplicates: thus de-1996-fonipa-1996 is invalid, while de-1996-fonipa and de-fonipa-1996 are both valid.
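
The no-duplicates rule is easy to check once the variant subtags have been separated from the rest of the identifier. The helper below is a hypothetical sketch that takes an already-parsed list of variant subtags and compares them case-insensitively, since subtags are case-insensitive.

```python
def has_duplicate_variants(variants):
    """Return True if any variant subtag occurs more than once (case-insensitive)."""
    lowered = [v.lower() for v in variants]
    return len(set(lowered)) != len(lowered)
```

With this helper, the variants of de-1996-fonipa-1996 are rejected, while those of de-1996-fonipa and de-fonipa-1996 pass.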
633*912701f9SAndroid Build Coastguard Worker
634*912701f9SAndroid Build Coastguard WorkerCLDR provides data for normalizing variant codes. About handling of the "POSIX" variant see _[Legacy Variants](#Legacy_Variants)_.
635*912701f9SAndroid Build Coastguard Worker
636*912701f9SAndroid Build Coastguard Worker_Examples:_
637*912701f9SAndroid Build Coastguard Worker
638*912701f9SAndroid Build Coastguard Worker```
639*912701f9SAndroid Build Coastguard Workeren
640*912701f9SAndroid Build Coastguard Workerfr_BE
641*912701f9SAndroid Build Coastguard Workerzh-Hant-HK
642*912701f9SAndroid Build Coastguard Worker```
643*912701f9SAndroid Build Coastguard Worker
644*912701f9SAndroid Build Coastguard Worker_Deprecated_ codes—such as QU above—are valid, but strongly discouraged.
645*912701f9SAndroid Build Coastguard Worker
A locale that only has a language subtag (and optionally a script subtag) is called a _language locale_; one with both language and territory subtags is called a _territory locale_ (or _country locale_).
647*912701f9SAndroid Build Coastguard Worker
648*912701f9SAndroid Build Coastguard Worker### <a name="Special_Codes" href="#Special_Codes">Special Codes</a>
649*912701f9SAndroid Build Coastguard Worker
650*912701f9SAndroid Build Coastguard Worker#### <a name="Unknown_or_Invalid_Identifiers" href="#Unknown_or_Invalid_Identifiers">Unknown or Invalid Identifiers</a>
651*912701f9SAndroid Build Coastguard Worker
652*912701f9SAndroid Build Coastguard WorkerThe following identifiers are used to indicate an unknown or invalid code in Unicode language and locale identifiers. For Unicode identifiers, the region code uses a private use ISO 3166 code, and Time Zone code uses an additional code; the others are defined by the relevant standards. When these codes are used in APIs connected with Unicode identifiers, the meaning is that either there was no identifier available, or that at some point an input identifier value was determined to be invalid or ill-formed.
653*912701f9SAndroid Build Coastguard Worker
654*912701f9SAndroid Build Coastguard Worker| Code Type   | Value  | Description in Referenced Standards |
655*912701f9SAndroid Build Coastguard Worker| ----------- | ------ | ----------------------------------- |
656*912701f9SAndroid Build Coastguard Worker| Language    | `und`  | Undetermined language, also used for “root” |
657*912701f9SAndroid Build Coastguard Worker| Script      | `Zzzz` | Code for uncoded script, Unknown [[UAX24](https://www.unicode.org/reports/tr41/#UAX24)] |
658*912701f9SAndroid Build Coastguard Worker| Region      | `ZZ`   | Unknown or Invalid Territory |
659*912701f9SAndroid Build Coastguard Worker| Currency    | `XXX`  | The codes assigned for transactions where no currency is involved |
660*912701f9SAndroid Build Coastguard Worker| Time Zone   | `unk`  | Unknown or Invalid Time Zone |
661*912701f9SAndroid Build Coastguard Worker| Subdivision | _\<region>zzzz_ | Unknown or Invalid Subdivision |
662*912701f9SAndroid Build Coastguard Worker
When only the script or region is known, a locale ID will use "und" as the language subtag portion. Thus the locale tag "und_Grek" represents the Greek script; "und_US" represents the US territory.
664*912701f9SAndroid Build Coastguard Worker
665*912701f9SAndroid Build Coastguard Worker#### <a name="Numeric_Codes" href="#Numeric_Codes">Numeric Codes</a>
666*912701f9SAndroid Build Coastguard Worker
For region codes, ISO and the UN establish a mapping to three-letter codes and numeric codes. However, this does not extend to the private use codes, which are the numeric codes 900-999 (total: 100) and the three-letter codes AAA-AAZ, QMA-QZZ, XAA-XZZ, and ZZA-ZZZ (total: 1092). Unicode identifiers supply a standard mapping to these: for the numeric codes, it uses the top of the numeric private use range; for the 3-letter codes it doubles the final letter. These are the resulting mappings for all of the private use region codes:
668*912701f9SAndroid Build Coastguard Worker
669*912701f9SAndroid Build Coastguard Worker| Region   | UN/ISO Numeric | ISO 3-Letter |
670*912701f9SAndroid Build Coastguard Worker| -------- | -------------- | ------------ |
671*912701f9SAndroid Build Coastguard Worker| `AA`     | `958`          | `AAA`        |
672*912701f9SAndroid Build Coastguard Worker| `QM..QZ` | `959..972`     | `QMM..QZZ`   |
673*912701f9SAndroid Build Coastguard Worker| `XA..XZ` | `973..998`     | `XAA..XZZ`   |
674*912701f9SAndroid Build Coastguard Worker| `ZZ`     | `999`          | `ZZZ`        |
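
The table can be reproduced with a few lines of arithmetic. The function below is a hypothetical sketch: it doubles the final letter to form the three-letter code and assigns numeric codes consecutively within each row of the table, which is how the ranges line up (for example, `QO` is the third code in `QM..QZ`, giving 961).

```python
from typing import Tuple


def private_use_region_codes(alpha2: str) -> Tuple[int, str]:
    """Map a private use region code to its (numeric, three-letter) codes (sketch)."""
    code = alpha2.upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not a two-letter region code: {alpha2!r}")
    alpha3 = code + code[1]  # double the final letter
    if code == "AA":
        numeric = 958
    elif code[0] == "Q" and "M" <= code[1] <= "Z":
        numeric = 959 + ord(code[1]) - ord("M")  # QM..QZ -> 959..972
    elif code[0] == "X":
        numeric = 973 + ord(code[1]) - ord("A")  # XA..XZ -> 973..998
    elif code == "ZZ":
        numeric = 999
    else:
        raise ValueError(f"not a private use region code: {alpha2!r}")
    return numeric, alpha3
```

The results agree with the alpha3 and num columns given for `QO`, `QU`, `XA`, `XB`, `XK`, and `ZZ` in the region subtag table earlier in this section.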
675*912701f9SAndroid Build Coastguard Worker
676*912701f9SAndroid Build Coastguard WorkerFor script codes, ISO 15924 supplies a mapping (however, the numeric codes are not in common use):
677*912701f9SAndroid Build Coastguard Worker
678*912701f9SAndroid Build Coastguard Worker| Script       | Numeric    |
679*912701f9SAndroid Build Coastguard Worker| ------------ | ---------- |
680*912701f9SAndroid Build Coastguard Worker| `Qaaa..Qabx` | `900..949` |
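
As an illustration, the range above can be turned into a lookup. The table gives only the endpoints, so this hypothetical sketch assumes that the 50 codes are numbered consecutively in alphabetical order (26 codes `Qaaa..Qaaz` followed by 24 codes `Qaba..Qabx`).

```python
def private_use_script_number(script: str) -> int:
    """Map a private use script code Qaaa..Qabx to 900..949 (sketch)."""
    s = script.lower()
    if len(s) != 4 or not (s.isascii() and s.isalpha()) or not s.startswith("qa"):
        raise ValueError(f"not a private use script code: {script!r}")
    # Assumption: codes are numbered consecutively in alphabetical order.
    offset = (ord(s[2]) - ord("a")) * 26 + (ord(s[3]) - ord("a"))
    if offset > 49:
        raise ValueError(f"outside Qaaa..Qabx: {script!r}")
    return 900 + offset
```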
681*912701f9SAndroid Build Coastguard Worker
682*912701f9SAndroid Build Coastguard Worker#### <a name="Private_Use_Codes" href="#Private_Use_Codes">Private Use Codes</a>
683*912701f9SAndroid Build Coastguard Worker
684*912701f9SAndroid Build Coastguard WorkerPrivate use codes fall into three groups.
685*912701f9SAndroid Build Coastguard Worker
686*912701f9SAndroid Build Coastguard Worker*   **defined:** those that are given particular semantics currently in CLDR
687*912701f9SAndroid Build Coastguard Worker*   **reserved:** those that may be given particular semantics in future versions of CLDR
688*912701f9SAndroid Build Coastguard Worker*   **excluded:** those that will never be given particular CLDR semantics in the future, and thus can normally be used by applications without worrying about collisions. However, CLDR may follow widespread industry practice in the use of some of these codes, such as for XA, XB, and XK.
689*912701f9SAndroid Build Coastguard Worker
690*912701f9SAndroid Build Coastguard Worker###### Table: <a name="Private_Use_CLDR" href="#Private_Use_CLDR">Private Use Codes in CLDR</a>
691*912701f9SAndroid Build Coastguard Worker
692*912701f9SAndroid Build Coastguard Worker| category      | status   | codes |
693*912701f9SAndroid Build Coastguard Worker| ------------- | -------- | ----- |
694*912701f9SAndroid Build Coastguard Worker| base language | defined  | none  |
695*912701f9SAndroid Build Coastguard Worker|               | reserved | qaa..qfy |
696*912701f9SAndroid Build Coastguard Worker|               | excluded | qfz..qtz |
697*912701f9SAndroid Build Coastguard Worker| script        | defined  | Qaai (obsolete), Qaag |
698*912701f9SAndroid Build Coastguard Worker|               | reserved | Qaaa..Qaaf Qaah Qaaj..Qaap |
699*912701f9SAndroid Build Coastguard Worker|               | excluded | Qaaq..Qabx |
700*912701f9SAndroid Build Coastguard Worker| region        | defined  | QO, QU, UK, XA, XB, XK, ZZ |
701*912701f9SAndroid Build Coastguard Worker|               | reserved | AA QM..QN QP..QT QV..QZ |
702*912701f9SAndroid Build Coastguard Worker|               | excluded | XC..XJ, XL..XZ |
703*912701f9SAndroid Build Coastguard Worker| timezone      | defined  | IANA: Etc/Unknown<br/>bcp47: as listed in bcp47/timezone.xml |
704*912701f9SAndroid Build Coastguard Worker|               | reserved | bcp47: all non-5 letter codes not starting with x |
705*912701f9SAndroid Build Coastguard Worker|               | excluded | bcp47: all non-5 letter codes starting with x |
706*912701f9SAndroid Build Coastguard Worker
707*912701f9SAndroid Build Coastguard WorkerSee also _[Unknown or Invalid Identifiers](#Unknown_or_Invalid_Identifiers)_.
708*912701f9SAndroid Build Coastguard Worker
709*912701f9SAndroid Build Coastguard Worker### Special Script Codes
Certain valid script codes require special handling.
711*912701f9SAndroid Build Coastguard WorkerThese are the codes in [Script Codes](https://www.unicode.org/iso15924/iso15924-codes.html) with the words "variant" or "alias" within parentheses,
712*912701f9SAndroid Build Coastguard Workerexcluding Zsye.
713*912701f9SAndroid Build Coastguard WorkerThe Compound codes include characters in multiple scripts;
714*912701f9SAndroid Build Coastguard Workerthe Visual variants are distinct in appearance, but otherwise encompass a single script;
715*912701f9SAndroid Build Coastguard Workerand the Subsets exclude certain characters from a script.
716*912701f9SAndroid Build Coastguard WorkerThe Equivalents for Subsets are not as well defined, so the "Equivalents" are marked as approximate.
717*912701f9SAndroid Build Coastguard Worker
718*912701f9SAndroid Build Coastguard Worker| Group | Script | Equivalent |
719*912701f9SAndroid Build Coastguard Worker| --- | --- | --- |
720*912701f9SAndroid Build Coastguard Worker| Compounds | Jpan | ≡ Hani ∪ Hira ∪ Kana |
721*912701f9SAndroid Build Coastguard Worker|  | Hrkt | ≡ Hira ∪ Kana |
722*912701f9SAndroid Build Coastguard Worker|  | Kore | ≡ Hani ∪ Hang |
723*912701f9SAndroid Build Coastguard Worker|  | Hanb | ≡ Hani ∪ Bopo |
724*912701f9SAndroid Build Coastguard Worker| Visual variants | Aran | ≡ Arab (Nastaliq variant) |
725*912701f9SAndroid Build Coastguard Worker|  | Cyrs | ≡ Cyrl (Old Church Slavonic variant) |
726*912701f9SAndroid Build Coastguard Worker|  | Latf | ≡ Latn (Fraktur variant) |
727*912701f9SAndroid Build Coastguard Worker|  | Latg | ≡ Latn (Gaelic variant) |
728*912701f9SAndroid Build Coastguard Worker|  | Syrn | ≡ Syrc (Eastern variant) |
729*912701f9SAndroid Build Coastguard Worker|  | Syre | ≡ Syrc (Estrangelo variant) |
730*912701f9SAndroid Build Coastguard Worker|  | Syrj | ≡ Syrc (Western variant) |
731*912701f9SAndroid Build Coastguard Worker| Subsets (approximate) | Jamo | ≃ Hang - LVT - LV |
732*912701f9SAndroid Build Coastguard Worker|  | Hans | ≃ Hani - Traditional-only |
733*912701f9SAndroid Build Coastguard Worker|  | Hant | ≃ Hani - Simplified-only |
734*912701f9SAndroid Build Coastguard Worker
735*912701f9SAndroid Build Coastguard WorkerThe special codes most frequently used are in the locale identifiers zh-Hans, zh-Hant, ja-Jpan, and ko-Kore.
736*912701f9SAndroid Build Coastguard WorkerThese are used, for example, in [Likely Subtags](#Likely_Subtags) in LDML.
737*912701f9SAndroid Build Coastguard WorkerSome of the special codes are used in other specifications,
738*912701f9SAndroid Build Coastguard Workersuch as in [Mixed_Script_Detection](https://unicode.org/reports/tr39/#Mixed_Script_Detection).
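
One way to work with the Compound codes is to expand them into their component scripts before comparing. The sketch below encodes only the four Compound rows of the table; the names `COMPOUND_SCRIPTS`, `expand_script`, and `covers` are hypothetical, and the Visual variants and Subsets are deliberately left out because their equivalents are not plain unions.

```python
# Compound script codes and their component scripts, from the table above.
COMPOUND_SCRIPTS = {
    "Jpan": frozenset({"Hani", "Hira", "Kana"}),
    "Hrkt": frozenset({"Hira", "Kana"}),
    "Kore": frozenset({"Hani", "Hang"}),
    "Hanb": frozenset({"Hani", "Bopo"}),
}


def expand_script(code: str) -> frozenset:
    """Return the component scripts of a code; other codes expand to themselves."""
    return COMPOUND_SCRIPTS.get(code, frozenset({code}))


def covers(declared: str, used: str) -> bool:
    """True if every component script of `used` is also a component of `declared`."""
    return expand_script(used) <= expand_script(declared)
```

Under these definitions `covers("Jpan", "Hrkt")` holds, because Hira and Kana are both components of Jpan, while `covers("Kore", "Kana")` does not.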
739*912701f9SAndroid Build Coastguard Worker
740*912701f9SAndroid Build Coastguard Worker<a name="Locale_Extension_Key_and_Type_Data"></a>
741*912701f9SAndroid Build Coastguard Worker### <a name="u_Extension" href="#u_Extension">Unicode BCP 47 U Extension</a>
742*912701f9SAndroid Build Coastguard Worker
743*912701f9SAndroid Build Coastguard Worker[[BCP47](#BCP47)] Language Tags provides a mechanism for extending language tags for use in various applications by extension subtags. Each extension subtag is identified by a single alphanumeric character subtag assigned by IANA.
744*912701f9SAndroid Build Coastguard Worker
745*912701f9SAndroid Build Coastguard WorkerThe Unicode Consortium has registered and is the maintaining authority for two BCP 47 language tag extensions: the extension 'u' for Unicode locale extension [[RFC6067](#RFC6067)] and extension 't' for transformed content [[RFC6497](#RFC6497)]. The Unicode BCP 47 extension data defines the complete list of valid subtags.
746*912701f9SAndroid Build Coastguard Worker
These subtags are all in lowercase, which is their canonical casing; however, subtags are case-insensitive, and casing does not carry any specific meaning. All subtags within the Unicode extensions consist of two to eight alphanumeric characters and meet the rule `extension` in [[BCP47](#BCP47)].
748*912701f9SAndroid Build Coastguard Worker
**The -u- Extension.** The syntax of 'u' extension subtags is defined by the rule `unicode_locale_extensions` in [Unicode locale identifier](#Unicode_locale_identifier), except that the subtag separator `sep` must always be a hyphen '-' when the extension is used as part of a BCP 47 language tag.
750*912701f9SAndroid Build Coastguard Worker
A 'u' extension may contain multiple `attribute`s or `keyword`s as defined in [Unicode locale identifier](#Unicode_locale_identifier). The canonical syntax is defined in [Canonical Unicode Locale Identifiers](#Canonical_Unicode_Locale_Identifiers).
752*912701f9SAndroid Build Coastguard Worker
753*912701f9SAndroid Build Coastguard Worker_See also [Unicode Extensions for BCP 47](https://cldr.unicode.org/index/bcp47-extension) on the CLDR site._
754*912701f9SAndroid Build Coastguard Worker
755*912701f9SAndroid Build Coastguard Worker#### <a name="Key_And_Type_Definitions_" href="#Key_And_Type_Definitions_">Key And Type Definitions</a>
756*912701f9SAndroid Build Coastguard Worker
757*912701f9SAndroid Build Coastguard WorkerThe following chart contains a set of U extension key values that are currently available, with a description or sampling of the U extension type values. Each category is associated with an XML file in the bcp47 directory.
758*912701f9SAndroid Build Coastguard Worker
759*912701f9SAndroid Build Coastguard WorkerFor the complete list of valid keys and types defined for Unicode locale extensions, see [U Extension Data Files](#Unicode_Locale_Extension_Data_Files). For information on the process for adding new _key_/_type_, see [[LocaleProject](#localeProject)].
760*912701f9SAndroid Build Coastguard Worker
Most type values are represented by a single subtag in the current version of CLDR. There are exceptions, such as types used for key "ca" (calendar) and "kr" (collation reordering). If the type is not included, then the type value "true" is assumed. Note that the default for a key with a possible "true" value is often "false", but may not always be. Note also that "true"/"True" is not a valid script code, since [the ISO 15924 Registration Authority has exceptionally reserved it](https://www.unicode.org/iso15924/codelists.html), which means that it will not be assigned for any purpose.
762*912701f9SAndroid Build Coastguard Worker
763*912701f9SAndroid Build Coastguard WorkerNote that canonicalization does not change invalid locales to valid locales. For example, und-u-ka canonicalizes to und-u-ka-true, but:
764*912701f9SAndroid Build Coastguard Worker
1. "und-u-ka-true" is invalid, since ‘true’ is not a valid value for ka
2. "und-u-ka" is invalid, since the value “true” is assumed whenever there is no value, and ‘true’ is not a valid value for ka
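
The rule that a missing type is read as "true" can be sketched as a small keyword parser. The function `parse_u_keywords` is hypothetical; it takes the subtags that follow the `u` singleton, assumes that every two-character subtag is a key, ignores any attributes before the first key, and performs no validity checking against the bcp47 data files.

```python
def parse_u_keywords(u_subtags):
    """Group -u- extension subtags into {key: type}; a missing type means "true"."""
    keywords = {}
    key = None
    for subtag in (s.lower() for s in u_subtags):
        if len(subtag) == 2:
            # Assumption: a two-character subtag starts a new keyword.
            key = subtag
            keywords[key] = []
        elif key is not None:
            # Longer subtags belong to the type of the current key.
            keywords[key].append(subtag)
    # A key with no type subtags is given the assumed type "true".
    return {k: "-".join(v) if v else "true" for k, v in keywords.items()}
```

Parsing the subtags of und-u-ka therefore reports the type "true" for key "ka", which is why that identifier is no more valid than und-u-ka-true.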
767*912701f9SAndroid Build Coastguard Worker
768*912701f9SAndroid Build Coastguard WorkerThe BCP 47 form for keys and types is the canonical form, and recommended. Other aliases are included for backwards compatibility.

###### Table: <a name="Key_Type_Definitions" href="#Key_Type_Definitions">Key/Type Definitions</a>

<!-- HTML: rowspan, colspan -->
<table><tbody>
<tr><th>key<br>(old key name)</th><th>key description</th><th>example type<br>(old type name)</th><th>type description</th></tr>

<tr><td colspan="4"><b>A <a name="UnicodeCalendarIdentifier" id="UnicodeCalendarIdentifier" href="#UnicodeCalendarIdentifier">Unicode Calendar Identifier</a>
        defines a type of calendar. The valid values are those <i>name</i> attribute values in the <i>type</i> elements of key name="ca"
        in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/calendar.xml" target="_blank">calendar.xml</a></b>.<br>
        This selects calendar-specific data within a locale used for formatting and parsing, such as date/time symbols and patterns; it also selects supplemental
        calendarData used for calendrical calculations.
        The value can affect the computation of the first day of the week: see <a href='tr35-dates.md#first-day-overrides'>First Day Overrides</a>.
        </td></tr>
<tr><td rowspan="10">"ca"<br>(calendar)</td>
    <td rowspan="10">Calendar algorithm<br><br><i>(For information on the calendar algorithms associated with the data used with these, see [<a href="#Calendars">Calendars</a>].)</i></td>
            <td>"buddhist"</td>
            <td>Thai Buddhist calendar (same as Gregorian except for the year)</td></tr>
        <tr><td>"chinese"</td>
            <td>Traditional Chinese calendar</td></tr>
        <tr><td colspan="2">…</td></tr>
        <tr><td>"gregory"<br>(gregorian)</td>
            <td>Gregorian calendar</td></tr>
        <tr><td colspan="2">…</td></tr>
        <tr><td>"islamic"</td>
            <td>Islamic calendar</td></tr>
        <tr><td>"islamic-civil"</td>
            <td>Islamic calendar, tabular (intercalary years [2,5,7,10,13,16,18,21,24,26,29] - civil epoch)</td></tr>
        <tr><td>"islamic-umalqura"</td>
            <td>Islamic calendar, Umm al-Qura</td></tr>
        <tr><td colspan="2">…</td></tr>
        <tr><td colspan="2"><b>Note:</b> <i>Some calendar types are represented by two subtags. In such cases, the first subtag specifies a generic calendar type and the second subtag specifies a calendar algorithm variant. The CLDR uses generic calendar types (single subtag types) for tagging data when calendar algorithm variations within a generic calendar type are irrelevant. For example, type "islamic" is used for specifying Islamic calendar formatting data for all Islamic calendar types, including "islamic-civil" and "islamic-umalqura".</i></td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeCurrencyFormatIdentifier" id="UnicodeCurrencyFormatIdentifier" href="#UnicodeCurrencyFormatIdentifier">Unicode Currency Format Identifier</a>
        defines a style for currency formatting. The valid values are those <i>name</i> attribute values in the <i>type</i> elements of key name="cf" in
        bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/currency.xml" target="_blank">currency.xml</a></b>.<br>
        This selects the specific type of currency formatting pattern within a locale.
        </td></tr>
<tr><td rowspan="2">"cf"</td>
    <td rowspan="2">Currency Format style</td>
        <td>"standard"</td><td>Negative numbers use the minusSign symbol (the default).</td></tr>
        <tr><td>"account"</td><td>Negative numbers use parentheses or equivalent.</td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeCollationIdentifier" id="UnicodeCollationIdentifier" href="#UnicodeCollationIdentifier">Unicode Collation Identifier</a> defines a type of collation (sort order). The valid values are those <i>name</i> attribute values in the <i>type</i> elements of bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/collation.xml" target="_blank">collation.xml</a></b>.</td></tr>
<tr><td colspan="4"><i>For information on each collation setting parameter, from <b>ka</b> to <b>vt</b>, see <a href="tr35-collation.md#Setting_Options">Setting Options</a></i></td></tr>
<tr><td rowspan="8">"co"<br>(collation)</td>
    <td rowspan="8">Collation type</td>
            <td>"standard"</td>
            <td>The default ordering for each language. For root it is based on the [<a href="#DUCET">DUCET</a>] (Default Unicode Collation Element Table): see <i><a href="tr35-collation.md#Root_Collation">Root Collation</a></i>. Each other locale is based on that, except for appropriate modifications to certain characters for that language.</td></tr>
        <tr><td>"search"</td>
            <td>A special collation type dedicated to string search: it is not used to determine the relative order of two strings, but only to determine whether they should be considered equivalent for the specified strength, using the string search matching rules appropriate for the language. Compared to the normal collator for the language, this may add or remove primary equivalences, may make additional characters ignorable or change secondary equivalences, and may modify contractions to allow matching within them, depending on the desired behavior. For example, in Czech, the distinction between ‘a’ and ‘á’ is secondary for normal collation, but primary for search; a search for ‘a’ should never match ‘á’ and vice versa. A search collator is normally used with strength set to PRIMARY or SECONDARY (should be SECONDARY if using “asymmetric” search as described in the [<a href="https://www.unicode.org/reports/tr41/#UTS10">UCA</a>] section Asymmetric Search). The search collator in root supplies matching rules that are appropriate for most languages (and which are different from the root collation behavior); language-specific search collators may be provided to override the matching rules for a given language as necessary.</td></tr>
        <tr><td colspan="2"><p>Other keywords provide additional choices for certain locales; <i>they only have effect in certain locales.</i></p></td></tr>
        <tr><td colspan="2">…</td></tr>
        <tr><td>"phonetic"</td>
            <td>Requests a phonetic variant if available, where text is sorted based on pronunciation. It may interleave different scripts, if multiple scripts are in common use.</td></tr>
        <tr><td>"pinyin"</td>
            <td>Pinyin ordering for Latin and for CJK characters; that is, an ordering for CJK characters based on a character-by-character transliteration into pinyin. (used in Chinese)</td></tr>
        <tr><td>"searchjl"</td>
            <td>Special collation type for a modified string search in which a pattern consisting of a sequence of Hangul initial consonants (jamo lead consonants) will match a sequence of Hangul syllable characters whose initial consonants match the pattern. The jamo lead consonants can be represented using conjoining or compatibility jamo. This search collator is best used at SECONDARY strength with an "asymmetric" search as described in the [<a href="https://www.unicode.org/reports/tr41/#UTS10">UCA</a>] section Asymmetric Search and obtained, for example, using ICU4C's usearch facility with attribute USEARCH_ELEMENT_COMPARISON set to value USEARCH_PATTERN_BASE_WEIGHT_IS_WILDCARD; this ensures that a full Hangul syllable in the search pattern will only match the same syllable in the searched text (instead of matching any syllable with the same initial consonant), while a Hangul initial consonant in the search pattern will match any Hangul syllable in the searched text with the same initial consonant.</td></tr>
        <tr><td colspan="2">…</td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeCurrencyIdentifier" id="UnicodeCurrencyIdentifier" href="#UnicodeCurrencyIdentifier">Unicode Currency Identifier</a> defines a type of currency. The valid values are those <i>name</i> attribute values in the <i>type</i> elements of key name="cu" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/currency.xml" target="_blank">currency.xml</a>.</b></td></tr>
<tr><td>"cu"<br>(currency)</td>
    <td>Currency type</td>
    <td><i>ISO 4217 code,</i><p><i>plus others in common use</i></p></td>
    <td><p>Codes consisting of 3 ASCII letters that are or have been valid in ISO 4217, plus certain additional codes that are or have been in common use. The list of countries and time periods associated with each currency value is available in <a href="tr35-numbers.md#Supplemental_Currency_Data">Supplemental Currency Data</a>, plus the default number of decimals.</p><p>The XXX code is given a broader interpretation as <i>Unknown or Invalid Currency</i>.</p></td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeDictionaryBreakExclusionIdentifier" id="UnicodeDictionaryBreakExclusionIdentifier"
        href="#UnicodeDictionaryBreakExclusionIdentifier">Unicode Dictionary Break Exclusion Identifier</a> specifies scripts to be excluded from dictionary-based text break
        (for words and lines). The valid values consist of one or more items of type SCRIPT_CODE as specified in the <i>name</i> attribute value in the <i>type</i> element of
        key name="dx" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/segmentation.xml" target="_blank">segmentation.xml</a></b>.<br>
        This affects break iteration regardless of locale.
        </td></tr>
<tr><td>"dx"</td>
    <td>Dictionary break script exclusions</td>
    <td><i><code><a href="#unicode_script_subtag">unicode_script_subtag</a></code> values</i></td>
    <td><ul><li>One or more items of type SCRIPT_CODE (as usual, separated by hyphens), which are valid <code><a href="#unicode_script_subtag">unicode_script_subtag</a></code> values.</li>
        <li>Each of the values for the DX key must be a short script property value in the UCD, or one of the compound script values like jpan. The compound script values are expanded when interpreted, e.g., -dx-jpan = -dx-hani-hira-kata</li>
        <li>The values may be in any order, e.g., -dx-thai-hani = -dx-hani-thai. However, the canonical order for the bcp47 subtag is alphabetical, e.g., -dx-hani-thai</li>
        <li>Dictionary-based break iterators will ignore each character whose Script_Extension value set intersects with the DX value set.</li>
        <li>The code Zyyy (Common) can be specified to exclude all scripts, if and only if it is the only SCRIPT_CODE value specified. If it is not the only script code, Zyyy has the normal meaning: excluding Script_Extension=Common.</li></ul>
        </td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeEmojiPresentationStyleIdentifier" id="UnicodeEmojiPresentationStyleIdentifier" href="#UnicodeEmojiPresentationStyleIdentifier">Unicode Emoji Presentation Style Identifier</a> specifies a request for the preferred emoji presentation style. This can be used as part of the value for an HTML lang attribute, for example <code>&lt;html lang="sr-Latn-u-em-emoji"&gt;</code>. The valid values are those <i>name</i> attribute values in the <i>type</i> elements of key name="em" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/variant.xml" target="_blank">variant.xml</a></b>.</td></tr>
<tr><td rowspan="3">"em"</td>
    <td rowspan="3">Emoji presentation style</td>
            <td>"emoji"</td>
            <td>Use an emoji presentation for emoji characters if possible.</td></tr>
        <tr><td>"text"</td>
            <td>Use a text presentation for emoji characters if possible.</td></tr>
        <tr><td>"default"</td><td>Use the default presentation for emoji characters as specified in UTS #51 <a href="https://www.unicode.org/reports/tr51/#Presentation_Style">Presentation Style</a>.</td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeFirstDayIdentifier" id="UnicodeFirstDayIdentifier" href="#UnicodeFirstDayIdentifier">Unicode First Day Identifier</a>
        defines the preferred first day of the week for calendar display. Specifying "fw" in a locale identifier overrides the default value specified by supplemental
        week data for the region (see Part 4 Dates, <a href="tr35-dates.md#Week_Data">Week Data</a>).
        The valid values are those <i>name</i> attribute values in the <i>type</i> elements
        of key name="fw" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/calendar.xml" target="_blank">calendar.xml</a></b>.
        The value can affect the computation of the first day of the week: see <a href='tr35-dates.md#first-day-overrides'>First Day Overrides</a>.
        </td></tr>
<tr><td rowspan="4">"fw"</td>
    <td rowspan="4">First day of week</td>
            <td>"sun"</td>
            <td>Sunday</td></tr>
        <tr><td>"mon"</td>
            <td>Monday</td></tr>
        <tr><td colspan="2">…</td></tr>
        <tr><td>"sat"</td>
            <td>Saturday</td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeHourCycleIdentifier" id="UnicodeHourCycleIdentifier" href="#UnicodeHourCycleIdentifier">Unicode Hour Cycle Identifier</a>
        defines the preferred time cycle. Specifying "hc" in a locale identifier overrides the default value specified by supplemental time data for the region
        (see Part 4 Dates, <a href="tr35-dates.md#Time_Data">Time Data</a>). The valid values are those <i>name</i> attribute values in the <i>type</i> elements of
        key name="hc" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/calendar.xml" target="_blank">calendar.xml</a></b>.
        </td></tr>
<tr><td rowspan="4">"hc"</td>
    <td rowspan="4">Hour cycle</td>
            <td>"h12"</td>
            <td>Hour system using 1–12; corresponds to 'h' in patterns</td></tr>
        <tr><td>"h23"</td>
            <td>Hour system using 0–23; corresponds to 'H' in patterns</td></tr>
        <tr><td>"h11"</td>
            <td>Hour system using 0–11; corresponds to 'K' in patterns</td></tr>
        <tr><td>"h24"</td>
            <td>Hour system using 1–24; corresponds to 'k' in patterns</td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeLineBreakStyleIdentifier" id="UnicodeLineBreakStyleIdentifier" href="#UnicodeLineBreakStyleIdentifier">Unicode Line Break Style Identifier</a>
        defines a preferred line break style corresponding to the CSS level 3 <a href="https://drafts.csswg.org/css-text/#line-break-property">line-break option</a>.
        Specifying "lb" in a locale identifier overrides the locale’s default style (which may correspond to "normal" or "strict"). The valid values are those <i>name</i>
        attribute values in the <i>type</i> elements of key name="lb" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/segmentation.xml" target="_blank">segmentation.xml</a></b>.
        </td></tr>
<tr><td rowspan="3">"lb"</td>
    <td rowspan="3">Line break style</td>
            <td>"strict"</td>
            <td>CSS level 3 line-break=strict, e.g. treat CJ as NS</td></tr>
        <tr><td>"normal"</td>
            <td>CSS level 3 line-break=normal, e.g. treat CJ as ID, break before hyphens for ja,zh</td></tr>
        <tr><td>"loose"</td>
            <td>CSS level 3 line-break=loose</td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeLineBreakWordIdentifier" id="UnicodeLineBreakWordIdentifier" href="#UnicodeLineBreakWordIdentifier">Unicode Line Break Word Identifier</a>
        defines preferred line break word handling behavior corresponding to the CSS level 3 <a href="https://drafts.csswg.org/css-text/#word-break-property">word-break option</a>.
        Specifying "lw" in a locale identifier overrides the locale’s default style (which may correspond to "normal" or "keepall"). The valid values are those <i>name</i>
        attribute values in the <i>type</i> elements of key name="lw" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/segmentation.xml" target="_blank">segmentation.xml</a></b>.
        </td></tr>
<tr><td rowspan="4">"lw"</td>
    <td rowspan="4">Line break word handling</td>
            <td>"normal"</td>
            <td>CSS level 3 word-break=normal, normal script/language behavior for midword breaks</td></tr>
        <tr><td>"breakall"</td>
            <td>CSS level 3 word-break=break-all, allow midword breaks unless forbidden by lb setting</td></tr>
        <tr><td>"keepall"</td>
            <td>CSS level 3 word-break=keep-all, prohibit midword breaks except for dictionary breaks</td></tr>
        <tr><td>"phrase"</td>
            <td>Prioritize keeping natural phrases (of multiple words) together when breaking, used in short text like title and headline</td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeMeasurementSystemIdentifier" id="UnicodeMeasurementSystemIdentifier" href="#UnicodeMeasurementSystemIdentifier">Unicode Measurement System Identifier</a>
        defines a preferred measurement system. Specifying "ms" in a locale identifier overrides the default value specified by supplemental measurement system data for the region
        (see Part 2 General, <a href="tr35-general.md#Measurement_System_Data">Measurement System Data</a>). The valid values are those <i>name</i> attribute values in the
        <i>type</i> elements of key name="ms" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/measure.xml" target="_blank">measure.xml</a></b>.
        The determination of preferred units depends on the locale identifier: the keys ms, mu, rg, the base locale (language, script, region) and the user preferences.
        <i>For information about preferred units and unit conversion, see <a href="tr35-info.md#Unit_Conversion">Unit Conversion</a> and <a href="tr35-info.md#Unit_Preferences">Unit Preferences</a>.</i>
        </td></tr>
<tr><td rowspan="3">"ms"</td>
    <td rowspan="3">Measurement system</td>
            <td>"metric"</td>
            <td>Metric System</td></tr>
        <tr><td>"ussystem"</td>
            <td>US System of measurement: feet, pints, etc.; pints are 16oz</td></tr>
        <tr><td>"uksystem"</td>
            <td>UK System of measurement: feet, pints, etc.; pints are 20oz</td></tr>

<tr><td colspan="4"><b>A <a name="MeasurementUnitPreferenceOverride" id="MeasurementUnitPreferenceOverride" href="#MeasurementUnitPreferenceOverride">Measurement Unit Preference Override</a>
        defines an override for measurement unit preference. The valid values are those <i>name</i> attribute values in the <i>type</i> elements of key name="mu" in
        bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/measure.xml" target="_blank">measure.xml</a></b>.
        <i>For information about preferred units and unit conversion, see <a href="tr35-info.md#Unit_Conversion">Unit Conversion</a> and <a href="tr35-info.md#Unit_Preferences">Unit Preferences</a>.</i>
        </td></tr>
<tr><td rowspan="3">"mu"</td>
    <td rowspan="3">Measurement unit override</td>
            <td>"celsius"</td>
            <td>Celsius as temperature unit</td></tr>
        <tr><td>"kelvin"</td>
            <td>Kelvin as temperature unit</td></tr>
        <tr><td>"fahrenhe"</td>
            <td>Fahrenheit as temperature unit</td></tr>

<tr><td colspan="4"><b>A <a name="UnicodeNumberSystemIdentifier" id="UnicodeNumberSystemIdentifier" href="#UnicodeNumberSystemIdentifier">Unicode Number System Identifier</a> defines a type of number system. The valid values are those <i>name</i> attribute values in the <i>type</i> elements of bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/number.xml" target="_blank">number.xml</a>.</b></td></tr>
<tr><td rowspan="7">"nu"<br>(numbers)</td>
    <td rowspan="7">Numbering system</td>
            <td><i>Unicode script subtag</i></td>
            <td><p>Four-letter types indicating the primary numbering system for the corresponding script represented in Unicode. Unless otherwise specified, it is a decimal numbering system using digits [:GeneralCategory=Nd:]. For example, "latn" refers to the ASCII / Western digits 0-9, while "taml" is an algorithmic (non-decimal) numbering system. (The code "tamldec" indicates the "modern Tamil decimal digits".)</p>
                <p class="note">For more information, see <a href="tr35-numbers.md#Numbering_Systems">Numbering Systems</a>.</p></td></tr>
        <tr><td>"arabext"</td>
            <td>Extended Arabic-Indic digits ("arab" means the base Arabic-Indic digits)</td></tr>
        <tr><td>"armnlow"</td>
            <td>Armenian lowercase numerals</td></tr>
        <tr><td colspan="2">…</td></tr>
        <tr><td>"roman"</td>
            <td>Roman numerals</td></tr>
        <tr><td>"romanlow"</td>
            <td>Roman lowercase numerals</td></tr>
        <tr><td>"tamldec"</td>
            <td>Modern Tamil decimal digits</td></tr>

<tr><td colspan="4"><b>A <a name="RegionOverride" id="RegionOverride" href="#RegionOverride">Region Override</a> specifies an alternate region to use for obtaining
        certain region-specific default values (those specified by the <a href="tr35-info.md#rgScope">&lt;rgScope&gt;</a> element), instead of using the region
        specified by the <a href="#unicode_region_subtag">unicode_region_subtag</a> in the Unicode Language Identifier (or inferred from the
        <a href="#unicode_language_subtag">unicode_language_subtag</a>)</b>.
        </td></tr>
<tr><td rowspan="2">"rg"</td>
    <td rowspan="2">Region Override</td><td>"uszzzz"<br><br></td><td rowspan="2">The value is a <a href="#unicode_subdivision_id">unicode_subdivision_id</a> of type “unknown” or “regular”; this consists of a <a href="#unicode_region_subtag">unicode_region_subtag</a> for a regular region (not a macroregion), suffixed either by “zzzz” (case is not significant) to designate the region as a whole, or by a unicode_subdivision_suffix to provide more specificity. For example, “en-GB-u-rg-uszzzz” represents a locale for British English but with region-specific defaults set to US for items such as default currency, default calendar and week data, default time cycle, and default measurement system and unit preferences.
        The determination of preferred units depends on the locale identifier: the keys ms, mu, rg, the base locale (language, script, region) and the user preferences.
        The value can affect the computation of the first day of the week: see <a href='tr35-dates.md#first-day-overrides'>First Day Overrides</a>.
<i>For information about preferred units and unit conversion, see <a href="tr35-info.md#Unit_Conversion">Unit Conversion</a> and <a href="tr35-info.md#Unit_Preferences">Unit Preferences</a>.</i>
        </td></tr>
        <tr><td>…</td></tr>

985*912701f9SAndroid Build Coastguard Worker<tr><td colspan="4"><b>A <a name="unicode_subdivision_subtag_validity"></a><a name="UnicodeSubdivisionIdentifier" id="UnicodeSubdivisionIdentifier" href="#UnicodeSubdivisionIdentifier">Unicode Subdivision Identifier</a> defines a regional subdivision used for locales. The valid values are based on the <i>subdivisionContainment</i> element as described in <i>Section <a href="#Unicode_Subdivision_Codes">3.6.5 Subdivision Codes</a></i>.</b></td></tr>
986*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="2">"sd"</td>
987*912701f9SAndroid Build Coastguard Worker    <td rowspan="2">Regional Subdivision</td>
988*912701f9SAndroid Build Coastguard Worker            <td>"gbsct"</td>
989*912701f9SAndroid Build Coastguard Worker            <td rowspan="2">A <a href="#unicode_subdivision_id">unicode_subdivision_id</a>, which is a <a href="#unicode_region_subtag">unicode_region_subtag</a> concatenated with a unicode_subdivision_suffix.<br>For example, <i>gbsct</i> is “gb”+“sct” (where sct represents the subdivision code for Scotland). Thus “en-GB-u-sd-gbsct” represents the language variant “English as used in Scotland”. And both “en-u-sd-usca” and “en-US-u-sd-usca” represent “English as used in California”. See <b><i><a href="#Unicode_Subdivision_Codes">3.6.5 Subdivision Codes</a></i></b>.
990*912701f9SAndroid Build Coastguard Worker			The value can affect the computation of the first day of the week: see <a href='tr35-dates.md#first-day-overrides'>First Day Overrides</a>.
991*912701f9SAndroid Build Coastguard Worker		</td></tr>
992*912701f9SAndroid Build Coastguard Worker        <tr><td>…</td></tr>
993*912701f9SAndroid Build Coastguard Worker
<tr><td colspan="4"><b>A <a name="UnicodeSentenceBreakSuppressionsIdentifier" id="UnicodeSentenceBreakSuppressionsIdentifier" href="#UnicodeSentenceBreakSuppressionsIdentifier">Unicode Sentence Break Suppressions Identifier</a> defines a set of data to be used for suppressing certain sentence breaks that would otherwise be found by UAX #29 rules. The valid values are those <i>name</i> attribute values in the <i>type</i> elements of key name="ss" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/segmentation.xml" target="_blank">segmentation.xml</a></b>.</td></tr>
995*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="2">"ss"</td>
996*912701f9SAndroid Build Coastguard Worker    <td rowspan="2">Sentence break suppressions</td>
997*912701f9SAndroid Build Coastguard Worker            <td>"none"</td>
998*912701f9SAndroid Build Coastguard Worker            <td>Don’t use sentence break suppressions data (the default).</td></tr>
999*912701f9SAndroid Build Coastguard Worker        <tr><td>"standard"</td>
1000*912701f9SAndroid Build Coastguard Worker            <td>Use sentence break suppressions data of type "standard"</td></tr>
1001*912701f9SAndroid Build Coastguard Worker
1002*912701f9SAndroid Build Coastguard Worker<tr><td colspan="4"><b>A <a name="UnicodeTimezoneIdentifier" id="UnicodeTimezoneIdentifier" href="#UnicodeTimezoneIdentifier">Unicode Timezone Identifier</a> defines a timezone. The valid values are those name attribute values in the <i>type</i> elements of bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/timezone.xml" target="_blank">timezone.xml</a>.</b></td></tr>
1003*912701f9SAndroid Build Coastguard Worker<tr><td>"tz"<br>(timezone)</td>
1004*912701f9SAndroid Build Coastguard Worker    <td>Time zone</td>
1005*912701f9SAndroid Build Coastguard Worker    <td><i>Unicode short time zone IDs</i></td>
1006*912701f9SAndroid Build Coastguard Worker    <td><p>Short identifiers defined in terms of a TZ time zone database [<a href="#Olson">Olson</a>] identifier in the common/bcp47/timezone.xml file, plus a few extra values.</p>
1007*912701f9SAndroid Build Coastguard Worker        <p>For more information, see <a href="#Time_Zone_Identifiers">Time Zone Identifiers</a>.</p>
1008*912701f9SAndroid Build Coastguard Worker        <p>CLDR provides data for normalizing timezone codes.</p></td></tr>
1009*912701f9SAndroid Build Coastguard Worker
1010*912701f9SAndroid Build Coastguard Worker<tr><td colspan="4"><b>A <a name="UnicodeVariantIdentifier" id="UnicodeVariantIdentifier" href="#UnicodeVariantIdentifier">Unicode Variant Identifier</a> defines a special variant used for locales. The valid values are those name attribute values in the <i>type</i> elements of bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/variant.xml" target="_blank">variant.xml</a>.</b></td></tr>
1011*912701f9SAndroid Build Coastguard Worker<tr><td>"va"</td>
1012*912701f9SAndroid Build Coastguard Worker    <td>Common variant type</td>
1013*912701f9SAndroid Build Coastguard Worker    <td>"posix"</td>
    <td>POSIX style locale variant. For handling of the "POSIX" variant, see <i><a href="#Legacy_Variants">Legacy Variants</a></i>.</td></tr>
1015*912701f9SAndroid Build Coastguard Worker
1016*912701f9SAndroid Build Coastguard Worker</tbody></table>
1017*912701f9SAndroid Build Coastguard Worker
1018*912701f9SAndroid Build Coastguard WorkerFor more information on the allowed keys and types, see the specific elements below, and [U Extension Data Files](#Unicode_Locale_Extension_Data_Files).
1019*912701f9SAndroid Build Coastguard Worker
Additional keys or types might be added in future versions. Implementations of LDML should be robust enough to handle any syntactically valid key or type values.
1021*912701f9SAndroid Build Coastguard Worker
1022*912701f9SAndroid Build Coastguard Worker#### <a name="Numbering%20System%20Data" href="#Numbering%20System%20Data">Numbering System Data</a>
1023*912701f9SAndroid Build Coastguard Worker
1024*912701f9SAndroid Build Coastguard WorkerLDML supports multiple numbering systems. The identifiers for those numbering systems are defined in the file **bcp47/number.xml**. For example, for the latest version of the data see [bcp47/number.xml](https://github.com/unicode-org/cldr/blob/main/common/bcp47/number.xml).
1025*912701f9SAndroid Build Coastguard Worker
1026*912701f9SAndroid Build Coastguard WorkerDetails about those numbering systems are defined in **supplemental/numberingSystems.xml**. For example, for the latest version of the data see [supplemental/numberingSystems.xml](https://github.com/unicode-org/cldr/blob/main/common/supplemental/numberingSystems.xml).
1027*912701f9SAndroid Build Coastguard Worker
1028*912701f9SAndroid Build Coastguard WorkerLDML makes certain stability guarantees on this data:
1029*912701f9SAndroid Build Coastguard Worker
1030*912701f9SAndroid Build Coastguard Worker1.  Like other BCP 47 identifiers, once a numeric identifier is added to **bcp47/number.xml** or **numberingSystems.xml**, it will never be removed from either of those files.
1031*912701f9SAndroid Build Coastguard Worker2.  If an identifier has type="numeric" in numberingSystems.xml, then
1032*912701f9SAndroid Build Coastguard Worker    1.  It is a decimal, positional numbering system with an attribute `digits=X`, where `X` is a string with the 10 digits in order used by the numbering system.
1033*912701f9SAndroid Build Coastguard Worker    2.  The values of the type and digits will never change.
1034*912701f9SAndroid Build Coastguard Worker
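The guarantee for `type="numeric"` systems means a digit string can be used directly as a lookup table. The following is a minimal sketch under that assumption; the `DIGITS` table and the `format_integer` helper are hypothetical names, and real digit strings should be read from supplemental/numberingSystems.xml rather than hard-coded.

```python
# Illustrative digit strings only; real data comes from numberingSystems.xml.
DIGITS = {
    "latn": "0123456789",
    # Arabic-Indic digits U+0660..U+0669, built from code points for clarity.
    "arab": "".join(chr(0x0660 + i) for i in range(10)),
}


def format_integer(value: int, system: str) -> str:
    """Render a non-negative integer with the digits of a numeric numbering system."""
    digits = DIGITS[system]
    if len(digits) != 10:
        raise ValueError("a numeric numbering system has exactly 10 digits")
    if value < 0:
        raise ValueError("sign handling is out of scope for this sketch")
    # Decimal and positional: each ASCII digit maps to the digit at the same index.
    return "".join(digits[int(ch)] for ch in str(value))
```

Because the digits of a numeric system never change, a table built this way stays valid across CLDR versions; only new systems need to be added.
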
1035*912701f9SAndroid Build Coastguard Worker#### <a name="Time_Zone_Identifiers" href="#Time_Zone_Identifiers">Time Zone Identifiers</a>
1036*912701f9SAndroid Build Coastguard Worker
LDML inherits time zone IDs from the tz database [[Olson](#Olson)]. Because these IDs from the tz database do not satisfy the BCP 47 language subtag syntax requirements, CLDR defines short identifiers for use in the Unicode locale extension. The short identifiers are defined in the file **common/bcp47/timezone.xml**.
1038*912701f9SAndroid Build Coastguard Worker
1039*912701f9SAndroid Build Coastguard WorkerThe short identifiers use UN/LOCODE [[LOCODE](#LOCODE)] (excluding a space character) codes where possible. For example, the short identifier for "America/Los_Angeles" is "uslax" (the LOCODE for Los Angeles, US is "US LAX"). Identifiers of length not equal to 5 are used where there is no corresponding UN/LOCODE, such as "usnavajo" for "America/Shiprock", or "utcw01" for "Etc/GMT+1", so that they do not overlap with future UN/LOCODE.
1040*912701f9SAndroid Build Coastguard Worker
1041*912701f9SAndroid Build Coastguard WorkerAlthough the first two letters of a short identifier may match an ISO 3166 two-letter country code, a user should not assume that the time zone belongs to the country. The first two letters in an identifier of length not equal to 5 have no meaning. Also, the identifiers are stabilized, meaning that they will not change no matter what changes happen in the base standard. So if Hawaii leaves the US and joins Canada as a new province, the short time zone identifier "ushnl" would not change in CLDR even if the UN/LOCODE changes to "cahnl" or something else.
1042*912701f9SAndroid Build Coastguard Worker
1043*912701f9SAndroid Build Coastguard WorkerThere is a special code "unk" for an Unknown or Invalid time zone. This can be expressed in the tz database style ID "Etc/Unknown", although it is not defined in the tz database.
1044*912701f9SAndroid Build Coastguard Worker
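As a rough illustration of how the short identifiers are used, the sketch below maps the identifiers quoted above to their tz-style IDs and attaches one to a locale. The function names are hypothetical, the table holds only the pairs mentioned in this section, and falling back to "Etc/Unknown" for unrecognized input is an assumption of this sketch, not a requirement stated here.

```python
# Pairs quoted in the text above; the full table lives in common/bcp47/timezone.xml.
SHORT_TO_LONG = {
    "uslax": "America/Los_Angeles",
    "usnavajo": "America/Shiprock",
    "utcw01": "Etc/GMT+1",
    "unk": "Etc/Unknown",
}


def long_id_for(short_id: str) -> str:
    """Map a short identifier to a tz-style ID (assumed fallback: Etc/Unknown)."""
    # Always look the identifier up; never infer a country from its first two letters.
    return SHORT_TO_LONG.get(short_id.lower(), SHORT_TO_LONG["unk"])


def locale_with_time_zone(base: str, short_id: str) -> str:
    """Attach a short time zone identifier to a base locale that has no extensions yet."""
    return f"{base}-u-tz-{short_id.lower()}"
```
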
1045*912701f9SAndroid Build Coastguard Worker**Stability of Time Zone Identifiers**
1046*912701f9SAndroid Build Coastguard Worker
Although the short time zone identifiers are guaranteed to be stable, the preferred IDs in the tz database (such as those found in the **zone.tab** file) might change from time to time. For example, "Asia/Calcutta" was replaced with "Asia/Kolkata" and moved to the **backward** file in the tz database. Because CLDR contains locale data that uses a time zone ID from the tz database as the key, stability of the IDs is critical.
1048*912701f9SAndroid Build Coastguard Worker
To maintain the stability of "long" IDs (those inherited from the tz database), a special rule applies to the `alias` attribute in the `<type>` element for "tz": the first "long" ID is the CLDR canonical "long" time zone ID. In addition, the `iana` attribute specifies the preferred ID in the tz database if it differs from the CLDR canonical "long" ID.
1050*912701f9SAndroid Build Coastguard Worker
1051*912701f9SAndroid Build Coastguard WorkerFor example:
1052*912701f9SAndroid Build Coastguard Worker
1053*912701f9SAndroid Build Coastguard Worker```xml
1054*912701f9SAndroid Build Coastguard Worker<type name="inccu" description="Kolkata, India" alias="Asia/Calcutta Asia/Kolkata" iana="Asia/Kolkata"/>
1055*912701f9SAndroid Build Coastguard Worker```
1056*912701f9SAndroid Build Coastguard Worker
The `<type>` element above defines the short time zone ID "inccu" (for use in the Unicode locale extension), the corresponding _CLDR canonical "long" ID_ "Asia/Calcutta", and an alias "Asia/Kolkata". In the tz database, the preferred ID for this time zone is "Asia/Kolkata".
1058*912701f9SAndroid Build Coastguard Worker
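The rule for the `alias` and `iana` attributes can be sketched with Python's standard `xml.etree.ElementTree` module. The function name `read_tz_type` and the shape of the returned dictionary are illustrative choices, not part of CLDR.

```python
import xml.etree.ElementTree as ET

TYPE_XML = (
    '<type name="inccu" description="Kolkata, India" '
    'alias="Asia/Calcutta Asia/Kolkata" iana="Asia/Kolkata"/>'
)


def read_tz_type(xml_text: str) -> dict:
    """Split one tz <type> element into its short, canonical, alias and IANA IDs."""
    elem = ET.fromstring(xml_text)
    long_ids = elem.get("alias", "").split()
    # The first "long" ID in the alias attribute is the CLDR canonical one.
    canonical = long_ids[0] if long_ids else None
    return {
        "short": elem.get("name"),
        "cldr_canonical": canonical,
        "aliases": long_ids[1:],
        # iana is only given when the tz database prefers a different ID.
        "iana_preferred": elem.get("iana", canonical),
    }
```
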
1059*912701f9SAndroid Build Coastguard Worker**Links in the tz database**
1060*912701f9SAndroid Build Coastguard Worker
1061*912701f9SAndroid Build Coastguard WorkerNot all TZDB links are in CLDR aliases.
1062*912701f9SAndroid Build Coastguard WorkerCLDR purposefully does not exactly match the Link structure in the TZDB.
1063*912701f9SAndroid Build Coastguard Worker
1064*912701f9SAndroid Build Coastguard Worker1. The links are maintained in the TZDB, and it would duplicate information that could fall out of sync (especially because the TZDB can be updated many times in a single month).
2. The TZDB went through a change a few years ago where it dropped the mappings to countries, whereas CLDR still maintains that distinction.
3. Because several different time zones all link together in the TZDB, mirroring those links would make a single long ID an alias for several different short identifiers.
1067*912701f9SAndroid Build Coastguard Worker
1068*912701f9SAndroid Build Coastguard WorkerCLDR doesn't alias across country boundaries because countries are useful for timezone selection.
1069*912701f9SAndroid Build Coastguard WorkerEven if, for example, Serbia and Croatia share the same rules, CLDR maintains the difference so that the user can either pick "Serbia time" or "Croatia time".
1070*912701f9SAndroid Build Coastguard WorkerThe Croat is not forced to pick "Serbia time" (Europe/Belgrade) nor the Serb forced to pick “Croatia time” (Europe/Zagreb).
1071*912701f9SAndroid Build Coastguard Worker
1072*912701f9SAndroid Build Coastguard Worker#### <a name="Unicode_Locale_Extension_Data_Files" href="#Unicode_Locale_Extension_Data_Files">U Extension Data Files</a>
1073*912701f9SAndroid Build Coastguard Worker
The 'u' extension data is stored in multiple XML files located under the common/bcp47 directory in CLDR. Each file contains the locale extension key/type values and their backward compatibility mappings appropriate for a particular domain. [common/bcp47/collation.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/collation.xml) contains key/type values for collation, including optional collation parameters and valid type values for each key.
1075*912701f9SAndroid Build Coastguard Worker
1076*912701f9SAndroid Build Coastguard WorkerThe 't' extension data is stored in [common/bcp47/transform.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform.xml).
1077*912701f9SAndroid Build Coastguard Worker
1078*912701f9SAndroid Build Coastguard Worker```xml
1079*912701f9SAndroid Build Coastguard Worker<!ELEMENT keyword ( key* )>
1080*912701f9SAndroid Build Coastguard Worker
1081*912701f9SAndroid Build Coastguard Worker<!ELEMENT key ( type* )>
1082*912701f9SAndroid Build Coastguard Worker<!ATTLIST key extension NMTOKEN #IMPLIED>
1083*912701f9SAndroid Build Coastguard Worker<!ATTLIST key name NMTOKEN #REQUIRED>
1084*912701f9SAndroid Build Coastguard Worker<!ATTLIST key description CDATA #IMPLIED>
1085*912701f9SAndroid Build Coastguard Worker<!ATTLIST key deprecated ( true | false ) "false">
1086*912701f9SAndroid Build Coastguard Worker<!ATTLIST key preferred NMTOKEN #IMPLIED>
1087*912701f9SAndroid Build Coastguard Worker<!ATTLIST key alias NMTOKEN #IMPLIED>
1088*912701f9SAndroid Build Coastguard Worker<!ATTLIST key valueType (single | multiple | incremental | any) #IMPLIED >
1089*912701f9SAndroid Build Coastguard Worker<!ATTLIST key since CDATA #IMPLIED>
1090*912701f9SAndroid Build Coastguard Worker
1091*912701f9SAndroid Build Coastguard Worker<!ELEMENT type EMPTY>
1092*912701f9SAndroid Build Coastguard Worker<!ATTLIST type name NMTOKEN #REQUIRED>
1093*912701f9SAndroid Build Coastguard Worker<!ATTLIST type description CDATA #IMPLIED>
1094*912701f9SAndroid Build Coastguard Worker<!ATTLIST type deprecated ( true | false ) "false">
1095*912701f9SAndroid Build Coastguard Worker<!ATTLIST type preferred NMTOKEN #IMPLIED>
1096*912701f9SAndroid Build Coastguard Worker<!ATTLIST type alias CDATA #IMPLIED>
1097*912701f9SAndroid Build Coastguard Worker<!ATTLIST type since CDATA #IMPLIED>
1098*912701f9SAndroid Build Coastguard Worker<!ATTLIST type iana CDATA #IMPLIED >
1099*912701f9SAndroid Build Coastguard Worker
1100*912701f9SAndroid Build Coastguard Worker<!ELEMENT attribute EMPTY>
1101*912701f9SAndroid Build Coastguard Worker<!ATTLIST attribute name NMTOKEN #REQUIRED>
1102*912701f9SAndroid Build Coastguard Worker<!ATTLIST attribute description CDATA #IMPLIED>
1103*912701f9SAndroid Build Coastguard Worker<!ATTLIST attribute deprecated ( true | false ) "false">
1104*912701f9SAndroid Build Coastguard Worker<!ATTLIST attribute preferred NMTOKEN #IMPLIED>
1105*912701f9SAndroid Build Coastguard Worker<!ATTLIST attribute since CDATA #IMPLIED>
1106*912701f9SAndroid Build Coastguard Worker```
1107*912701f9SAndroid Build Coastguard Worker
The extension attribute in the `<key>` element specifies the BCP 47 language tag extension type. The default value of the extension attribute is "u" (Unicode locale extension). The `<type>` element is only applicable to the enclosing `<key>`.
1109*912701f9SAndroid Build Coastguard Worker
1110*912701f9SAndroid Build Coastguard WorkerIn the Unicode locale extension 'u' and 't' data files, the common attributes for the `<key>`, `<type>` and `<attribute>` elements are as follows:
1111*912701f9SAndroid Build Coastguard Worker
1112*912701f9SAndroid Build Coastguard Worker**name**
1113*912701f9SAndroid Build Coastguard Worker
1114*912701f9SAndroid Build Coastguard Worker> The key or type name used by Unicode locale extension with ['u' extension syntax](#Unicode_locale_identifier) or the 't' extensions syntax. When _alias_ below is absent, this name can be also used with the old style ["@key=type" syntax](#Old_Locale_Extension_Syntax).
1115*912701f9SAndroid Build Coastguard Worker>
1116*912701f9SAndroid Build Coastguard Worker> Most type names are **literal type names**, which match exactly the same value. All of these have at least one lowercase letter, such as "buddhist". There are a small number of **indirect type names**, such as "RG_KEY_VALUE". These have no lowercase letters. The interpretation of each one is listed below.
1117*912701f9SAndroid Build Coastguard Worker>
1118*912701f9SAndroid Build Coastguard Worker> ##### <a name="CODEPOINTS" href="#CODEPOINTS">CODEPOINTS</a>
1119*912701f9SAndroid Build Coastguard Worker>
1120*912701f9SAndroid Build Coastguard Worker> The type name **"CODEPOINTS"** is reserved for a variable representing Unicode code point(s). The syntax is:
1121*912701f9SAndroid Build Coastguard Worker>
1122*912701f9SAndroid Build Coastguard Worker> |            | EBNF |
1123*912701f9SAndroid Build Coastguard Worker> | ---------- | ---- |
1124*912701f9SAndroid Build Coastguard Worker> | codepoints | `= codepoint (sep codepoint)?` |
1125*912701f9SAndroid Build Coastguard Worker> | codepoint  | `= [0-9 A-F a-f]{4,6}` |
1126*912701f9SAndroid Build Coastguard Worker>
1127*912701f9SAndroid Build Coastguard Worker> In addition, no codepoint may exceed 10FFFF. For example, "00A0", "300b", "10D40C" and "00C1-00E1" are valid, but "A0", "U060C" and "110000" are not.
1128*912701f9SAndroid Build Coastguard Worker>
> In the current version of CLDR, the type "CODEPOINTS" is only used for the deprecated locale extension key "vt" (variableTop). The subtags forming the type for "vt" represent an arbitrary string of characters. There is no formal limit on the number of characters, although practically anything above 1 will be rare, and anything longer than 4 might be useless. Repetition is allowed, for example, 0061-0061 ("aa") is a valid type value for "vt", since the sequence may be a collating element. Order is vital: 0061-0062 ("ab") is different from 0062-0061 ("ba"). Note that for variableTop any character sequence must be a contraction which yields exactly one primary weight.
1130*912701f9SAndroid Build Coastguard Worker>
1131*912701f9SAndroid Build Coastguard Worker> For example,
1132*912701f9SAndroid Build Coastguard Worker>
1133*912701f9SAndroid Build Coastguard Worker> > **en-u-vt-00A4** : this indicates English, with any characters sorting at or below " ¤" (at a primary level) considered Variable.
1134*912701f9SAndroid Build Coastguard Worker>
> By default in UCA, variable characters are ignored in sorting at a primary, secondary, and tertiary level. But in CLDR, they are not ignorable by default. For more information, see [Collation: _Setting Options_](tr35-collation.md#Setting_Options).
1136*912701f9SAndroid Build Coastguard Worker>
1137*912701f9SAndroid Build Coastguard Worker> ##### <a name="REORDER_CODE" href="#REORDER_CODE">REORDER_CODE</a>
1138*912701f9SAndroid Build Coastguard Worker>
1139*912701f9SAndroid Build Coastguard Worker> The type name **"REORDER_CODE"** is reserved for reordering block names (e.g. "latn", "digit" and "others") defined in the _[Root Collation](tr35-collation.md#Root_Collation)_. The type "REORDER_CODE" is used for locale extension key "kr" (colReorder). The value of type for "kr" is represented by one or more reordering block names such as "latn-digit". For more information, see [Collation: _Collation Reordering_](tr35-collation.md#Script_Reordering) .
1140*912701f9SAndroid Build Coastguard Worker>
1141*912701f9SAndroid Build Coastguard Worker> ##### <a name="RG_KEY_VALUE" href="#RG_KEY_VALUE">RG_KEY_VALUE</a>
1142*912701f9SAndroid Build Coastguard Worker>
1143*912701f9SAndroid Build Coastguard Worker> The type name **"RG_KEY_VALUE"** is reserved for region codes in the format required by the "rg" key; this is a subdivision code with idStatus='unknown' or 'regular' from the idValidity data in common/validity/subdivision.xml.
1144*912701f9SAndroid Build Coastguard Worker>
1145*912701f9SAndroid Build Coastguard Worker> ##### <a name="SCRIPT_CODE" href="#SCRIPT_CODE">SCRIPT_CODE</a>
1146*912701f9SAndroid Build Coastguard Worker>
1147*912701f9SAndroid Build Coastguard Worker> The type name **"SCRIPT_CODE"** is reserved for [`unicode_script_subtag`](#unicode_script_subtag) values (e.g. "thai", "laoo"). The type "SCRIPT_CODE" is used for locale extension key "dx". The value of type for "dx" is represented by one or more SCRIPT_CODEs, such as "thai-laoo".
1148*912701f9SAndroid Build Coastguard Worker>
1149*912701f9SAndroid Build Coastguard Worker> ##### <a name="SUBDIVISION_CODE" href="#SUBDIVISION_CODE">SUBDIVISION_CODE</a>
1150*912701f9SAndroid Build Coastguard Worker>
1151*912701f9SAndroid Build Coastguard Worker> The type name **"SUBDIVISION_CODE"** is reserved for subdivision codes in the format required by the "sd" key; this is a subdivision code from the idValidity data in common/validity/subdivision.xml, excluding those with idStatus='unknown'. Codes with idStatus='deprecated' should not be generated, and those with idStatus='private_use' are only to be used with prior agreement.
1152*912701f9SAndroid Build Coastguard Worker>
1153*912701f9SAndroid Build Coastguard Worker> ##### <a name="PRIVATE_USE" href="#PRIVATE_USE">PRIVATE_USE</a>
1154*912701f9SAndroid Build Coastguard Worker>
1155*912701f9SAndroid Build Coastguard Worker> The type name **"PRIVATE_USE"** is reserved for private use types. A valid type value is composed of one or more subtags separated by hyphens and each subtag consists of three to eight ASCII alphanumeric characters. In the current version of CLDR, **"PRIVATE_USE"** is only used for transform extension "x0".
1156*912701f9SAndroid Build Coastguard Worker
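Two of the indirect type names described above, CODEPOINTS and PRIVATE_USE, are purely syntactic and can be checked with regular expressions. The sketch below is one way to do that; the function names are hypothetical, the separator is assumed to be a hyphen, and `is_codepoints` accepts any number of code points, following the prose on "vt" ("no formal limit") rather than capping the sequence at two.

```python
import re

# One code point: 4 to 6 hexadecimal digits, in either case.
_CODEPOINT = re.compile(r"[0-9A-Fa-f]{4,6}")
# One private-use subtag: 3 to 8 ASCII alphanumeric characters.
_PRIVATE_USE_SUBTAG = re.compile(r"[0-9A-Za-z]{3,8}")


def is_codepoint(subtag: str) -> bool:
    """True for a well-formed code point subtag that does not exceed 10FFFF."""
    return bool(_CODEPOINT.fullmatch(subtag)) and int(subtag, 16) <= 0x10FFFF


def is_codepoints(value: str) -> bool:
    """True if every hyphen-separated part of the value is a valid code point."""
    return all(is_codepoint(part) for part in value.split("-"))


def is_private_use(value: str) -> bool:
    """True if the value is one or more hyphen-separated private-use subtags."""
    return all(_PRIVATE_USE_SUBTAG.fullmatch(part) for part in value.split("-"))
```
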
1157*912701f9SAndroid Build Coastguard Worker**valueType**
1158*912701f9SAndroid Build Coastguard Worker
1159*912701f9SAndroid Build Coastguard Worker> The `valueType` attribute indicates how many subtags are valid for a given key:
1160*912701f9SAndroid Build Coastguard Worker>
1161*912701f9SAndroid Build Coastguard Worker> | Value         | Description |
1162*912701f9SAndroid Build Coastguard Worker> | ------------- | ----------- |
1163*912701f9SAndroid Build Coastguard Worker> | `single`      | Either exactly one type value, or no type value (but only if the value of "true" would be valid). This is the default if no valueType attribute is present. |
1164*912701f9SAndroid Build Coastguard Worker> | `incremental` | Multiple type values are allowed, but only if a prefix is also present, and the sequence is explicitly listed. Each successive type value indicates a refinement of its prefix. For example:<br/>`<key name="ca" description="Calendar algorithm key" valueType="incremental">`<br/>`    <type name="islamic" description="Islamic calendar"/>`<br/>`    <type name="islamic-umalqura" description="Islamic calendar, Umm al-Qura"/>`<br/>Thus _ca-islamic-umalqura_ is valid. However, _ca-gregory-japanese_ is not valid, because "gregory-japanese" is not listed as a type. |
1165*912701f9SAndroid Build Coastguard Worker> | `multiple`    | Multiple type values are allowed, but each may only occur once. For example:<br/>`<key name="kr" description="Collation reorder codes" valueType="multiple">`<br/>`    <type name="REORDER_CODE" …/>` |
1166*912701f9SAndroid Build Coastguard Worker> | `any`         | Any number of type values are allowed, with none of the above restrictions. For example:<br/>`<key extension="t" name="x0" description="Private use transform type key." valueType="any">`<br/>`    <type name="PRIVATE_USE" …/>` |
1167*912701f9SAndroid Build Coastguard Worker
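The four `valueType` settings can be read as four small validation rules. The sketch below is a simplified interpretation: `is_valid_value` is a hypothetical helper, it handles literal type names only (indirect names such as REORDER_CODE would need their own checks), it tests only the explicit listing for `incremental`, and requiring at least one subtag for `multiple` and `any` is an assumption.

```python
def is_valid_value(value_type: str, listed: set, subtags: list) -> bool:
    """Check the subtags that follow a key against the key's listed literal types."""
    if value_type == "single":
        # No subtag at all stands for "true", so "true" must itself be listed.
        if not subtags:
            return "true" in listed
        return len(subtags) == 1 and subtags[0] in listed
    if value_type == "incremental":
        # The whole hyphen-joined sequence has to be listed explicitly.
        return bool(subtags) and "-".join(subtags) in listed
    if value_type == "multiple":
        # Several values are fine, but each may occur only once.
        no_repeats = len(set(subtags)) == len(subtags)
        return bool(subtags) and no_repeats and all(s in listed for s in subtags)
    if value_type == "any":
        # Repetition is allowed here.
        return bool(subtags) and all(s in listed for s in subtags)
    raise ValueError(f"unknown valueType: {value_type!r}")
```
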
1168*912701f9SAndroid Build Coastguard Worker**description**
1169*912701f9SAndroid Build Coastguard Worker
1170*912701f9SAndroid Build Coastguard Worker> The description of the `key`, `type` or `attribute` element. There is also some informative text about certain keys and types in the [Key And Type Definitions](#Key_And_Type_Definitions_).
1171*912701f9SAndroid Build Coastguard Worker
1172*912701f9SAndroid Build Coastguard Worker**deprecated**
1173*912701f9SAndroid Build Coastguard Worker
1174*912701f9SAndroid Build Coastguard Worker> The deprecation status of the `key`, `type` or `attribute` element. The value `"true"` indicates the element is deprecated and no longer used in the version of CLDR. The default value is `"false"`.
1175*912701f9SAndroid Build Coastguard Worker
1176*912701f9SAndroid Build Coastguard Worker**preferred**
1177*912701f9SAndroid Build Coastguard Worker
1178*912701f9SAndroid Build Coastguard Worker> The preferred value of the deprecated `key`, `type` or `attribute` element. When a `key`, `type` or `attribute` element is deprecated, this attribute is used for specifying a new canonical form if available.
1179*912701f9SAndroid Build Coastguard Worker
1180*912701f9SAndroid Build Coastguard Worker**alias** (Not applicable to `<attribute>`)
1181*912701f9SAndroid Build Coastguard Worker
1182*912701f9SAndroid Build Coastguard Worker> The BCP 47 form is the canonical form, and recommended. Other aliases are included only for backwards compatibility.
1183*912701f9SAndroid Build Coastguard Worker>
1184*912701f9SAndroid Build Coastguard Worker> _Example:_
1185*912701f9SAndroid Build Coastguard Worker>
1186*912701f9SAndroid Build Coastguard Worker> ```xml
1187*912701f9SAndroid Build Coastguard Worker> <type name="phonebk" alias="phonebook" description="Phonebook style ordering (such as in German)"/>
1188*912701f9SAndroid Build Coastguard Worker> ```
1189*912701f9SAndroid Build Coastguard Worker>
1190*912701f9SAndroid Build Coastguard Worker> The preferred term, and the only one to be used in BCP 47, is the name: in this example, "phonebk".
1191*912701f9SAndroid Build Coastguard Worker>
1192*912701f9SAndroid Build Coastguard Worker> The alias is a key or type name used by Unicode locale extensions with the old ["@key=type" syntax](#Old_Locale_Extension_Syntax). The attribute value for type may contain multiple names delimited by ASCII space characters. Of those aliases, the first name is the preferred value.
1193*912701f9SAndroid Build Coastguard Worker
1194*912701f9SAndroid Build Coastguard Worker**since**
1195*912701f9SAndroid Build Coastguard Worker
1196*912701f9SAndroid Build Coastguard Worker> The version of CLDR in which this key or type was introduced. Absence of this attribute value implies the key or type was available in CLDR 1.7.2.
1197*912701f9SAndroid Build Coastguard Worker
1198*912701f9SAndroid Build Coastguard Worker_Note: There are no values defined for the locale extension attribute in the current CLDR release._
1199*912701f9SAndroid Build Coastguard Worker
1200*912701f9SAndroid Build Coastguard WorkerFor example,
1201*912701f9SAndroid Build Coastguard Worker
1202*912701f9SAndroid Build Coastguard Worker```xml
1203*912701f9SAndroid Build Coastguard Worker<key name="co" alias="collation" description="Collation type key">
1204*912701f9SAndroid Build Coastguard Worker  <type name="pinyin" description="Pinyin ordering for Latin and for CJK characters (used in Chinese)"/>
1205*912701f9SAndroid Build Coastguard Worker</key>
1206*912701f9SAndroid Build Coastguard Worker
1207*912701f9SAndroid Build Coastguard Worker<key name="ka" alias="colAlternate" description="Collation parameter key for alternate handling">
1208*912701f9SAndroid Build Coastguard Worker  <type name="noignore" alias="non-ignorable" description="Variable collation elements are not reset to ignorable"/>
1209*912701f9SAndroid Build Coastguard Worker  <type name="shifted" description="Variable collation elements are reset to zero at levels one through three"/>
1210*912701f9SAndroid Build Coastguard Worker</key>
1211*912701f9SAndroid Build Coastguard Worker
1212*912701f9SAndroid Build Coastguard Worker<key name="tz" alias="timezone">
1213*912701f9SAndroid Build Coastguard Worker  ...
1214*912701f9SAndroid Build Coastguard Worker  <type name="aumel" alias="Australia/Melbourne Australia/Victoria" description="Melbourne, Australia"/>
1215*912701f9SAndroid Build Coastguard Worker  <type name="aumqi" alias="Antarctica/Macquarie" description="Macquarie Island Station, Macquarie Island" since="1.8.1"/>
1216*912701f9SAndroid Build Coastguard Worker  ...
1217*912701f9SAndroid Build Coastguard Worker</key>
1218*912701f9SAndroid Build Coastguard Worker```
1219*912701f9SAndroid Build Coastguard Worker
1220*912701f9SAndroid Build Coastguard WorkerThe data above indicates:
1221*912701f9SAndroid Build Coastguard Worker
1222*912701f9SAndroid Build Coastguard Worker* type "pinyin" is valid for key "co", thus "u-co-pinyin" is a valid Unicode locale extension.
1223*912701f9SAndroid Build Coastguard Worker* type "pinyin" is not valid for key "ka", thus "u-ka-pinyin" is not a valid Unicode locale extension.
1224*912701f9SAndroid Build Coastguard Worker* type "pinyin" has no _alias_, so "zh@collation=pinyin" is a valid Unicode locale identifier according to the old syntax.
1225*912701f9SAndroid Build Coastguard Worker* type "noignore" has an alias attribute, so "en@colAlternate=noignore" is not a valid Unicode locale identifier according to the old syntax.
1226*912701f9SAndroid Build Coastguard Worker* type "aumel" is valid for key "tz", supported by CLDR 1.7.2 (default value) or later versions.
1227*912701f9SAndroid Build Coastguard Worker* type "aumqi" is valid for key "tz", supported by CLDR 1.8.1 or later versions.
1228*912701f9SAndroid Build Coastguard Worker
1229*912701f9SAndroid Build Coastguard WorkerIt is strongly recommended that all API methods accept all possible aliases for keywords and types, but generate the canonical form. For example, "ar-u-ca-islamicc" would be equivalent to "ar-u-ca-islamic-civil" on input, but the latter should be output. The one exception is where an alias would only be well-formed with the old syntax, such as "gregorian" (for "gregory").
1230*912701f9SAndroid Build Coastguard Worker
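The recommendation to accept aliases on input but emit the canonical form can be sketched as a lookup table built from the data files. In the example below, `build_tables` and `to_u_extension` are hypothetical helpers, the XML is a trimmed copy of the entries shown in this section, and the case-insensitive lookup is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

KEYS_XML = """
<keyword>
  <key name="co" alias="collation">
    <type name="pinyin"/>
    <type name="phonebk" alias="phonebook"/>
  </key>
  <key name="ka" alias="colAlternate">
    <type name="noignore" alias="non-ignorable"/>
    <type name="shifted"/>
  </key>
</keyword>
"""


def build_tables(xml_text: str) -> dict:
    """Map every accepted key spelling to (canonical key, {type spelling: canonical type})."""
    tables = {}
    for key in ET.fromstring(xml_text).iter("key"):
        types = {}
        for t in key.iter("type"):
            name = t.get("name")
            # Accept the canonical name and each space-delimited alias.
            for spelling in [name] + t.get("alias", "").split():
                types[spelling.lower()] = name
        for spelling in [key.get("name")] + key.get("alias", "").split():
            tables[spelling.lower()] = (key.get("name"), types)
    return tables


def to_u_extension(tables: dict, key: str, type_: str) -> str:
    """Return the canonical 'u' extension for a key/type pair in either spelling."""
    canonical_key, types = tables[key.lower()]
    return f"u-{canonical_key}-{types[type_.lower()]}"
```

A lenient lookup like this deliberately does not report whether a given old-style spelling was strictly valid; it only normalizes what it recognizes and raises `KeyError` for anything else.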
1231*912701f9SAndroid Build Coastguard Worker
In the Unicode locale extension 'u' data files, the `<type>` element has the following optional attribute:
1233*912701f9SAndroid Build Coastguard Worker
1234*912701f9SAndroid Build Coastguard Worker**iana**
1235*912701f9SAndroid Build Coastguard Worker
This attribute is used by `tz` types for specifying the preferred zone ID in the IANA time zone database.
1237*912701f9SAndroid Build Coastguard Worker
1238*912701f9SAndroid Build Coastguard Worker#### <a name="Unicode_Subdivision_Codes" href="#Unicode_Subdivision_Codes">Subdivision Codes</a>
1239*912701f9SAndroid Build Coastguard Worker
1240*912701f9SAndroid Build Coastguard WorkerThe subdivision codes designate a subdivision of a country or region. They are called various names, such as a _state_ in the United States, or a _province_ in Canada. The codes in CLDR are based on ISO 3166-2 subdivision codes. The ISO codes have a region code followed by a hyphen, then a suffix consisting of 1..3 ASCII letters or digits.
1241*912701f9SAndroid Build Coastguard Worker
1242*912701f9SAndroid Build Coastguard WorkerThe CLDR codes are designed to work in a [unicode_locale_id](#unicode_locale_id) (BCP 47), and are thus all lowercase, with no hyphen. For example, the following are valid, and mean “English as used in California, USA”.
1243*912701f9SAndroid Build Coastguard Worker
1244*912701f9SAndroid Build Coastguard Worker* en-u-sd-**usca**
1245*912701f9SAndroid Build Coastguard Worker* en-US-u-sd-**usca**
1246*912701f9SAndroid Build Coastguard Worker
1247*912701f9SAndroid Build Coastguard WorkerCLDR has additional subdivision codes. These may start with a 3-digit region code or use a suffix of 4 ASCII letters or digits, so they will not collide with the ISO codes. Subdivision codes for unknown values are the region code plus "zzzz", such as "uszzzz" for an unknown subdivision of the US. Other codes may be added for stability.
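
As a rough sketch of the relationship between the two formats, the following converts an ISO 3166-2 style code into the lowercase, hyphen-free form used in a locale identifier, and builds the "unknown subdivision" code for a region. The function names are hypothetical, and neither function checks that the result is actually listed in the CLDR validity data.

```python
def iso_to_cldr_subdivision(iso_code):
    """Convert an ISO 3166-2 style code such as "US-CA" to the form "usca"."""
    region, hyphen, suffix = iso_code.partition("-")
    if not hyphen or not region or not suffix:
        raise ValueError(f"expected REGION-SUFFIX, got {iso_code!r}")
    return (region + suffix).lower()


def unknown_subdivision(region):
    """Build the code for an unknown subdivision of a region, e.g. "uszzzz"."""
    return region.lower() + "zzzz"
```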
1248*912701f9SAndroid Build Coastguard Worker
Like BCP 47, CLDR requires stable codes, which are not guaranteed for ISO 3166-2 (nor have the ISO 3166-2 codes been stable in the past). If an ISO 3166-2 code is removed, it remains valid (though marked as deprecated) in CLDR. If an ISO 3166-2 code is reused (for the same region), then CLDR will define a new equivalent code that uses a 4-character suffix.
1250*912701f9SAndroid Build Coastguard Worker
1251*912701f9SAndroid Build Coastguard Worker##### <a name="Validity" href="#Validity">Validity</a>
1252*912701f9SAndroid Build Coastguard Worker
1253*912701f9SAndroid Build Coastguard WorkerA [unicode_subdivision_id](#unicode_subdivision_id) is only valid when it is present in the subdivision.xml file as described in _[Validity Data](#Validity_Data)_. The data is in a compressed form, and thus needs to be expanded before such a test is made.
1254*912701f9SAndroid Build Coastguard Worker
1255*912701f9SAndroid Build Coastguard Worker_Examples:_
1256*912701f9SAndroid Build Coastguard Worker
1257*912701f9SAndroid Build Coastguard Worker* **usca** is valid — there is an `id` element `<id type="subdivision"…>… usca …</id>`
1258*912701f9SAndroid Build Coastguard Worker* **ussct** is invalid — there is no `id` element `<id type="subdivision"…>… ussct …</id>`
1259*912701f9SAndroid Build Coastguard Worker
1260*912701f9SAndroid Build Coastguard WorkerIf a [unicode_locale_id](#unicode_locale_id) contains both a [unicode_region_subtag](#unicode_region_subtag) and a [unicode_subdivision_id](#unicode_subdivision_id), it is only valid if the [unicode_subdivision_id](#unicode_subdivision_id) starts with the [unicode_region_subtag](#unicode_region_subtag) (case-insensitively).
1261*912701f9SAndroid Build Coastguard Worker
1262*912701f9SAndroid Build Coastguard WorkerIt is recommended that a [unicode_locale_id](#unicode_locale_id) contain a [unicode_region_subtag](#unicode_region_subtag) if it contains a [unicode_subdivision_id](#unicode_subdivision_id) and the region would not be added by adding likely subtags. That produces better behavior if the [unicode_subdivision_id](#unicode_subdivision_id) is ignored by an implementation or if the language tag is truncated.
1263*912701f9SAndroid Build Coastguard Worker
1264*912701f9SAndroid Build Coastguard WorkerExamples:
1265*912701f9SAndroid Build Coastguard Worker
1266*912701f9SAndroid Build Coastguard Worker* en-**US**-u-sd-**us**ca is valid — the region "US" matches the first part of "usca"
1267*912701f9SAndroid Build Coastguard Worker* en-u-sd-**us**ca is valid — it still works after adding likely subtags.
1268*912701f9SAndroid Build Coastguard Worker* en-**CA**-u-sd-**gb**sct is invalid — the region "CA" does not match the first part of "gbsct". An implementation should disregard the subdivision id (or return an error).
1269*912701f9SAndroid Build Coastguard Worker* en-u-sd-**gb**sct is valid but not recommended — an implementation that ignores the [unicode_subdivision_id](#unicode_subdivision_id) can get the wrong fallback behavior, or could add likely subtags and get the invalid en-**Latn-US**-u-sd-**gb**sct
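
The region/subdivision consistency rule illustrated by these examples can be sketched as below. This is a simplified illustration under stated assumptions: the helper names are hypothetical, the parser only looks for a region subtag and an `sd` key, and it does not check the subdivision against the validity data or add likely subtags.

```python
import re


def region_and_subdivision(locale_id):
    """Return (region, subdivision) in lowercase; either may be None."""
    subtags = re.split(r"[-_]", locale_id.lower())

    # The region, if present, is a 2-letter or 3-digit subtag that appears
    # after the language subtag and before the first singleton.
    region = None
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break
        if re.fullmatch(r"[a-z]{2}|[0-9]{3}", subtag):
            region = subtag
            break

    # The subdivision is the value that follows the "sd" key after the
    # "u" singleton.
    subdivision = None
    if "u" in subtags:
        extension = subtags[subtags.index("u") + 1:]
        for key, value in zip(extension, extension[1:]):
            if key == "sd":
                subdivision = value
                break
    return region, subdivision


def subdivision_matches_region(locale_id):
    """True unless both a region and a subdivision are present and disagree."""
    region, subdivision = region_and_subdivision(locale_id)
    if region is None or subdivision is None:
        return True
    return subdivision.startswith(region)
```

Because the comparison is done on lowercased subtags, it is case-insensitive, as the rule requires.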
1270*912701f9SAndroid Build Coastguard Worker
1271*912701f9SAndroid Build Coastguard WorkerIn version 28.0, the subdivisions in the validity files used the ISO format, uppercase with a hyphen separating two components, instead of the BCP 47 format.
1272*912701f9SAndroid Build Coastguard Worker
1273*912701f9SAndroid Build Coastguard Worker<a name="t_Extension"></a>
1274*912701f9SAndroid Build Coastguard Worker### <a name="BCP47_T_Extension" href="#BCP47_T_Extension">Unicode BCP 47 T Extension</a>
1275*912701f9SAndroid Build Coastguard Worker
1276*912701f9SAndroid Build Coastguard WorkerThe Unicode Consortium has registered and is the maintaining authority for two BCP 47 language tag extensions: the extension 'u' for Unicode locale extension [[RFC6067](#RFC6067)] and extension 't' for transformed content [[RFC6497](#RFC6497)]. The Unicode BCP 47 extension data defines the complete list of valid subtags. While the title of the RFC is “Transformed Content”, the abstract makes it clear that the scope is broader than the term "transformed" might indicate to a casual reader: “including content that has been transliterated, transcribed, or translated, or _in some other way influenced by the source. It also provides for additional information used for identification._”
1277*912701f9SAndroid Build Coastguard Worker
**The -t- Extension.** The syntax of 't' extension subtags is defined by the rule `transformed_extensions` in [_Unicode locale identifier_](#Unicode_locale_identifier), except that the subtag separator `sep` must always be a hyphen '-' when the extension is used as part of a BCP 47 language tag. For information about the registration process, meaning, and usage of the 't' extension, see [[RFC6497](#RFC6497)].
1279*912701f9SAndroid Build Coastguard Worker
These subtags are all lowercase, which is their canonical casing; however, subtags are case-insensitive and casing does not carry any specific meaning. All subtags within the Unicode extensions consist of two to eight alphanumeric characters and conform to the rule `extension` in [[BCP47](#BCP47)].
1281*912701f9SAndroid Build Coastguard Worker
1282*912701f9SAndroid Build Coastguard WorkerThe following keys are defined for the -t- extension:
1283*912701f9SAndroid Build Coastguard Worker
1284*912701f9SAndroid Build Coastguard Worker| Keys   | Description | Values in latest release |
1285*912701f9SAndroid Build Coastguard Worker| ------ | ----------- | ------------------------ |
1286*912701f9SAndroid Build Coastguard Worker| m0     | **Transform extension mechanism:** to reference an authority or rules for a type of transformation | [​transform.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform.xml) |
1287*912701f9SAndroid Build Coastguard Worker| s0, d0 | **Transform source/destination:** for non-languages/scripts, such as fullwidth-halfwidth conversion. | [​transform-destination.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform-destination.xml) |
1288*912701f9SAndroid Build Coastguard Worker| i0     | **Input Method Engine transform:** Used to indicate an input method transformation, such as one used by a client-side input method. The first subfield in a sequence would typically be a 'platform' or vendor designation. | [​transform_ime.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform_ime.xml) |
1289*912701f9SAndroid Build Coastguard Worker| k0     | **Keyboard transform:** Used to indicate a keyboard transformation, such as one used by a client-side virtual keyboard. The first subfield in a sequence would typically be a 'platform' designation, representing the platform that the keyboard is intended for. The keyboard might or might not correspond to a keyboard mapping shipped by the vendor for the platform. One or more subsequent fields may occur, but are only added where needed to distinguish from others. | [​transform_keyboard.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform_keyboard.xml) |
1290*912701f9SAndroid Build Coastguard Worker| t0     | **Machine Translation:** Used to indicate content that has been machine translated, or a request for a particular type of machine translation of content. The first subfield in a sequence would typically be a 'platform' or vendor designation. | [​transform_mt.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform_mt.xml) |
1291*912701f9SAndroid Build Coastguard Worker| h0     | **Hybrid Locale Identifiers:** h0 with the value 'hybrid' indicates that the -t- value is a language that is mixed into the main language tag to form a hybrid. For more information, and examples, see _[Hybrid Locale Identifiers](#Hybrid_Locale)._ | [​transform_hybrid.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform_hybrid.xml) |
1292*912701f9SAndroid Build Coastguard Worker| x0     | **Private use transform** | [​transform_private_use.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform_private_use.xml) |
1293*912701f9SAndroid Build Coastguard Worker
1294*912701f9SAndroid Build Coastguard Worker#### <a name="Transformed_Content_Data_File" href="#Transformed_Content_Data_File">T Extension Data Files</a>
1295*912701f9SAndroid Build Coastguard Worker
The overall structure of the data files is similar to that of the U Extension, with the following exceptions.
1297*912701f9SAndroid Build Coastguard Worker
1298*912701f9SAndroid Build Coastguard WorkerIn the transformed content 't' data file, the `name` attribute in a `<key>` element defines a valid field separator subtag. The `name` attribute in an enclosed `<type>` element defines a valid field subtag for the field separator subtag. For example:
1299*912701f9SAndroid Build Coastguard Worker
1300*912701f9SAndroid Build Coastguard Worker```xml
1301*912701f9SAndroid Build Coastguard Worker<key extension="t" name="m0" description="Transform extension mechanism">
1302*912701f9SAndroid Build Coastguard Worker    <type name="ungegn" description="United Nations Group of Experts on Geographical Names" since="21"/>
1303*912701f9SAndroid Build Coastguard Worker</key>
1304*912701f9SAndroid Build Coastguard Worker```
1305*912701f9SAndroid Build Coastguard Worker
1306*912701f9SAndroid Build Coastguard WorkerThe data above indicates:
1307*912701f9SAndroid Build Coastguard Worker
1308*912701f9SAndroid Build Coastguard Worker* "m0" is a valid field separator for the transformed content extension 't'.
1309*912701f9SAndroid Build Coastguard Worker* field subtag "ungegn" is valid for field separator "m0".
1310*912701f9SAndroid Build Coastguard Worker* field subtag "ungegn" was introduced in CLDR 21.
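
To illustrate how such a data fragment might be read, the sketch below parses the sample `<key>` element with Python's standard `xml.etree.ElementTree` module. The function name `read_key` and the shape of the returned dictionary are choices made for this example only.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<key extension="t" name="m0" description="Transform extension mechanism">
    <type name="ungegn" description="United Nations Group of Experts on Geographical Names" since="21"/>
</key>
"""


def read_key(xml_text):
    """Return (extension, field separator, {field subtag: attributes})."""
    key = ET.fromstring(xml_text)
    types = {}
    for type_element in key.findall("type"):
        types[type_element.get("name")] = {
            "description": type_element.get("description"),
            "since": type_element.get("since"),
            "alias": type_element.get("alias"),  # None when the attribute is absent
        }
    return key.get("extension"), key.get("name"), types
```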
1311*912701f9SAndroid Build Coastguard Worker
1312*912701f9SAndroid Build Coastguard WorkerThe attributes are:
1313*912701f9SAndroid Build Coastguard Worker
1314*912701f9SAndroid Build Coastguard Worker**name**
1315*912701f9SAndroid Build Coastguard Worker
1316*912701f9SAndroid Build Coastguard Worker> The name of the mechanism, limited to 3-8 characters (or sequences of them). Any indirect type names are listed in 3.6.4 [U Extension Data Files](#Unicode_Locale_Extension_Data_Files).
1317*912701f9SAndroid Build Coastguard Worker
1318*912701f9SAndroid Build Coastguard Worker**description**
1319*912701f9SAndroid Build Coastguard Worker
1320*912701f9SAndroid Build Coastguard Worker> A description of the name, with all and only that information necessary to distinguish one name from others with which it might be confused. Descriptions are not intended to provide general background information.
1321*912701f9SAndroid Build Coastguard Worker
1322*912701f9SAndroid Build Coastguard Worker**since**
1323*912701f9SAndroid Build Coastguard Worker
1324*912701f9SAndroid Build Coastguard Worker> Indicates the first version of CLDR where the name appears. (Required for new items.)
1325*912701f9SAndroid Build Coastguard Worker
1326*912701f9SAndroid Build Coastguard Worker**alias**
1327*912701f9SAndroid Build Coastguard Worker
1328*912701f9SAndroid Build Coastguard Worker> Alternative name, not limited in number of characters. Aliases are intended for compatibility, not to provide all possible alternate names or designations. _(Optional)_
1329*912701f9SAndroid Build Coastguard Worker
1330*912701f9SAndroid Build Coastguard WorkerFor information about the registration process, meaning, and usage of the 't' extension, see [[RFC6497](#RFC6497)].
1331*912701f9SAndroid Build Coastguard Worker
1332*912701f9SAndroid Build Coastguard Worker### <a name="Compatibility_with_Older_Identifiers" href="#Compatibility_with_Older_Identifiers">Compatibility with Older Identifiers</a>
1333*912701f9SAndroid Build Coastguard Worker
LDML versions before 1.7.2 used a slightly different syntax for variant subtags and locale extensions. Implementations of LDML may provide backward-compatible identifier support as described in the following sections.
1335*912701f9SAndroid Build Coastguard Worker
1336*912701f9SAndroid Build Coastguard Worker#### <a name="Old_Locale_Extension_Syntax" href="#Old_Locale_Extension_Syntax">Old Locale Extension Syntax</a>
1337*912701f9SAndroid Build Coastguard Worker
The LDML specification, version 1.7 and older, used a different syntax for representing Unicode locale extensions. The previous definition of Unicode locale extensions had the following structure:
1339*912701f9SAndroid Build Coastguard Worker
1340*912701f9SAndroid Build Coastguard Worker|                               | EBNF |
1341*912701f9SAndroid Build Coastguard Worker| ----------------------------- | ---- |
1342*912701f9SAndroid Build Coastguard Worker| `old_unicode_locale_extensions` | `= "@" old_key "=" old_type`<br/>`(";" old_key "=" old_type)*` |
1343*912701f9SAndroid Build Coastguard Worker
The new specification requires keys to be two alphanumeric characters and types to be three to eight alphanumeric characters. As a result, new codes were assigned to all existing keys and some types. For example, the new key "co" replaced the previous key "collation", and the new type "phonebk" replaced the previous type "phonebook". However, the existing collation type "big5han" already satisfied the new requirement, so no new type code was assigned to it. All keys and types introduced after LDML 1.7 satisfy the new requirement, so they do not have aliases dedicated to the old syntax, except for time zone types. The conversion between old types and new types can be done regardless of key, with one known exception (old type "traditional" is mapped to new type "trad" for collation and "traditio" for numbering system), and this relationship will be maintained in future versions unless otherwise noted.
1345*912701f9SAndroid Build Coastguard Worker
The new specification introduced a new field `attribute` in addition to key/type pairs in the Unicode locale extension. When it is necessary to map a new Unicode locale identifier with an `attribute` field to a well-formed old locale identifier, a special key name _attribute_ is used, whose value is the entire sequence of `attribute` subtags in the new identifier. For example, a new identifier `ja-u-xxx-yyy-ca-japanese` is mapped to an old identifier `ja@attribute=xxx-yyy;calendar=japanese`.
1347*912701f9SAndroid Build Coastguard Worker
1348*912701f9SAndroid Build Coastguard WorkerThe chart below shows some example mappings between the new syntax and the old syntax.
1349*912701f9SAndroid Build Coastguard Worker
1350*912701f9SAndroid Build Coastguard Worker###### Table: <a name="Locale_Extension_Mappings" href="#Locale_Extension_Mappings">Locale Extension Mappings</a>
1351*912701f9SAndroid Build Coastguard Worker
1352*912701f9SAndroid Build Coastguard Worker| Old (LDML 1.7 or older)                    | New                          |
1353*912701f9SAndroid Build Coastguard Worker| ------------------------------------------ | ---------------------------- |
1354*912701f9SAndroid Build Coastguard Worker| `de_DE@collation=phonebook`                | `de_DE_u_co_phonebk`         |
1355*912701f9SAndroid Build Coastguard Worker| `zh_Hant_TW@collation=big5han`             | `zh_Hant_TW_u_co_big5han`    |
1356*912701f9SAndroid Build Coastguard Worker| `th_TH@calendar=gregorian;numbers=thai`    | `th_TH_u_ca_gregory_nu_thai` |
1357*912701f9SAndroid Build Coastguard Worker| `en_US_POSIX@timezone=America/Los_Angeles` | `en_US_u_tz_uslax_va_posix`  |
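
The mappings in the table can be reproduced with a small conversion routine. The sketch below is illustrative only: `KEY_MAP` and `TYPE_MAP` are hypothetical tables containing just the entries that appear in the table above, the `attribute` key is not handled, and keys are emitted in alphabetical order, which matches the rows shown.

```python
# Hypothetical excerpts of the old-to-new key and type mappings.
KEY_MAP = {"collation": "co", "calendar": "ca", "numbers": "nu", "timezone": "tz"}
TYPE_MAP = {
    "phonebook": "phonebk",
    "gregorian": "gregory",
    "America/Los_Angeles": "uslax",
}


def old_to_new(old_id, sep="_"):
    """Convert an old-syntax identifier to the new -u- extension syntax."""
    base, _, keyword_part = old_id.partition("@")
    base_subtags = base.split("_")
    keywords = {}

    # The legacy POSIX variant becomes the "va" keyword with type "posix".
    if len(base_subtags) > 1 and base_subtags[-1].upper() == "POSIX":
        base_subtags.pop()
        keywords["va"] = "posix"

    if keyword_part:
        for pair in keyword_part.split(";"):
            old_key, _, old_type = pair.partition("=")
            new_key = KEY_MAP[old_key]  # raises KeyError for keys not in the excerpt
            keywords[new_key] = TYPE_MAP.get(old_type, old_type.lower())

    if not keywords:
        return sep.join(base_subtags)
    parts = base_subtags + ["u"]
    for key in sorted(keywords):
        parts += [key, keywords[key]]
    return sep.join(parts)
```

Passing `sep="-"` yields the hyphen-separated form required when the identifier is used as a BCP 47 language tag.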
1358*912701f9SAndroid Build Coastguard Worker
Where an old API is supplied a BCP 47 language code, or vice versa, the recommendation is to:
1360*912701f9SAndroid Build Coastguard Worker
1361*912701f9SAndroid Build Coastguard Worker1. Have all methods that take the old syntax also take the new syntax, interpreted correctly. For example, "zh-TW-u-co-pinyin" and "zh_TW@collation=pinyin" would both be interpreted as meaning the same.
1362*912701f9SAndroid Build Coastguard Worker2. Have all methods (both for old and new syntax) accept all possible aliases for keywords and types. For example, "ar-u-ca-islamicc" would be equivalent to "ar-u-ca-islamic-civil".
1363*912701f9SAndroid Build Coastguard Worker   * The one exception is where an alias would only be well-formed with the old syntax, such as "gregorian" (for "gregory").
1364*912701f9SAndroid Build Coastguard Worker3. Where an API cannot successfully accept the alternate syntax, throw an exception (or otherwise indicate an error) so that people can detect that they are using the wrong method (or wrong input).
1365*912701f9SAndroid Build Coastguard Worker4. Provide a method that tests a purported locale ID string to determine its status:
1366*912701f9SAndroid Build Coastguard Worker   1. **well-formed** - syntactically correct
1367*912701f9SAndroid Build Coastguard Worker   2. **valid** - well-formed and only uses registered language subtags, extensions, keywords, types...
1368*912701f9SAndroid Build Coastguard Worker   3. **canonical** - valid and no deprecated codes or structure.
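
The three status levels in item 4 can be illustrated on a deliberately small scale. The sketch below classifies only the type of the `ca` key rather than a whole locale ID. The names `Status`, `CALENDAR_TYPES`, and `calendar_type_status` are hypothetical, the registry excerpt contains only values mentioned in this document, and treating a registered alias as valid but not canonical is an assumption of this example.

```python
import re
from enum import Enum


class Status(Enum):
    ILL_FORMED = 0
    WELL_FORMED = 1
    VALID = 2
    CANONICAL = 3


# Hypothetical registry excerpt for key "ca": canonical type -> aliases.
CALENDAR_TYPES = {"gregory": [], "islamic-civil": ["islamicc"]}

# A type is one or more subtags of 3 to 8 alphanumerics joined by hyphens.
TYPE_SYNTAX = re.compile(r"[a-z0-9]{3,8}(-[a-z0-9]{3,8})*")


def calendar_type_status(value):
    """Classify a purported type for the "ca" key."""
    value = value.lower()
    if not TYPE_SYNTAX.fullmatch(value):
        return Status.ILL_FORMED
    if value in CALENDAR_TYPES:
        return Status.CANONICAL
    if any(value in aliases for aliases in CALENDAR_TYPES.values()):
        return Status.VALID
    return Status.WELL_FORMED
```

Note that "gregorian" is reported as ill-formed here because it has nine characters, which is why it is only usable with the old syntax.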
1369*912701f9SAndroid Build Coastguard Worker
1370*912701f9SAndroid Build Coastguard Worker#### <a name="Legacy_Variants" href="#Legacy_Variants">Legacy Variants</a>
1371*912701f9SAndroid Build Coastguard Worker
The old LDML specification allowed codes other than registered [[BCP47](#BCP47)] variant subtags to be used in Unicode language and locale identifiers to represent variations of locale data. Unicode locale identifiers that include such variant codes can be converted to the new [[BCP47](#BCP47)]-compatible identifiers as described below:
1373*912701f9SAndroid Build Coastguard Worker
1374*912701f9SAndroid Build Coastguard Worker###### Table: <a name="Legacy_Variant_Mappings" href="#Legacy_Variant_Mappings">Legacy Variant Mappings</a>
1375*912701f9SAndroid Build Coastguard Worker
1376*912701f9SAndroid Build Coastguard Worker| Variant Code | Description |
1377*912701f9SAndroid Build Coastguard Worker| ------------ | ----------- |
1378*912701f9SAndroid Build Coastguard Worker| `AALAND`     | Åland, variant of "`sv`" Swedish used in Finland. Use `sv_AX` to indicate this. |
1379*912701f9SAndroid Build Coastguard Worker| `BOKMAL`     | Bokmål, variant of "`no`" Norwegian. Use primary language subtag "`nb`" to indicate this. |
1380*912701f9SAndroid Build Coastguard Worker| `NYNORSK`    | Nynorsk, variant of "`no`" Norwegian. Use primary language subtag "`nn`" to indicate this. |
1381*912701f9SAndroid Build Coastguard Worker| `POSIX`      | POSIX variation of locale data. Use Unicode locale extension `-u-va-posix` to indicate this. |
1382*912701f9SAndroid Build Coastguard Worker| `POLYTONI`   | Polytonic, variant of "`el`" Greek. Use [[BCP47](#BCP47)] variant subtag `polyton` to indicate this. |
1383*912701f9SAndroid Build Coastguard Worker| `SAAHO`      | The Saaho variant of Afar. Use primary language subtag "`ssy`" to indicate this. |
1384*912701f9SAndroid Build Coastguard Worker
1385*912701f9SAndroid Build Coastguard WorkerWhen converting to old syntax, the Unicode locale extension "`-u-va-posix`" should be converted to the "`POSIX`" variant, _not_ to old extension syntax like "`@va=posix`". This is an exception: The other mappings above should not be reversed.
1386*912701f9SAndroid Build Coastguard Worker
1387*912701f9SAndroid Build Coastguard WorkerExamples:
1388*912701f9SAndroid Build Coastguard Worker
1389*912701f9SAndroid Build Coastguard Worker* `en_US_POSIX` ↔ `en-US-u-va-posix`
1390*912701f9SAndroid Build Coastguard Worker* `en_US_POSIX@colNumeric=yes` ↔ `en-US-u-kn-va-posix`
1391*912701f9SAndroid Build Coastguard Worker* `en-US-POSIX-u-kn-true` → `en-US-u-kn-va-posix`
1392*912701f9SAndroid Build Coastguard Worker* `en-US-POSIX-u-kn-va-posix` → `en-US-u-kn-va-posix`
1393*912701f9SAndroid Build Coastguard Worker
> Note that the mapping between `en_US_POSIX` and `en-US-u-va-posix` is a conversion process, not a canonicalization process.
1395*912701f9SAndroid Build Coastguard Worker
1396*912701f9SAndroid Build Coastguard Worker#### <a name="Relation_to_OpenI18n" href="#Relation_to_OpenI18n">Relation to OpenI18n</a>
1397*912701f9SAndroid Build Coastguard Worker
1398*912701f9SAndroid Build Coastguard WorkerThe locale id format generally follows the description in the _OpenI18N Locale Naming Guideline_ [[NamingGuideline](#NamingGuideline)], with some enhancements. The main differences from those guidelines are that the locale id:
1399*912701f9SAndroid Build Coastguard Worker
1. does not include a charset, since the data in LDML format always provides a representation of all Unicode characters. (The repository is stored in UTF-8, although that can be transcoded to other encodings as well.)
1401*912701f9SAndroid Build Coastguard Worker2. adds the ability to have a variant, as in Java
1402*912701f9SAndroid Build Coastguard Worker3. adds the ability to discriminate the written language by script (or script variant).
1403*912701f9SAndroid Build Coastguard Worker4. is a superset of [[BCP47](#BCP47)] codes.
1404*912701f9SAndroid Build Coastguard Worker
1405*912701f9SAndroid Build Coastguard Worker### <a name="Transmitting_Locale_Information" href="#Transmitting_Locale_Information">Transmitting Locale Information</a>
1406*912701f9SAndroid Build Coastguard Worker
1407*912701f9SAndroid Build Coastguard WorkerIn a world of on-demand software components, with arbitrary connections between those components, it is important to get a sense of where localization should be done, and how to transmit enough information so that it can be done at that appropriate place. End-users need to get messages localized to their languages, messages that not only contain a translation of text, but also contain variables such as date, time, number formats, and currencies formatted according to the users' conventions. The strategy for doing the so-called _JIT localization_ is made up of two parts:
1408*912701f9SAndroid Build Coastguard Worker
1409*912701f9SAndroid Build Coastguard Worker1. Store and transmit _neutral-format_ data wherever possible.
1410*912701f9SAndroid Build Coastguard Worker   * Neutral-format data is data that is kept in a standard format, no matter what the local user's environment is. Neutral-format is also (loosely) called _binary data_, even though it actually could be represented in many different ways, including a textual representation such as in XML.
1411*912701f9SAndroid Build Coastguard Worker   * Such data should use accepted standards where possible, such as for currency codes.
1412*912701f9SAndroid Build Coastguard Worker   * Textual data should also be in a uniform character set (Unicode/10646) to avoid possible data corruption problems when converting between encodings.
1413*912701f9SAndroid Build Coastguard Worker2. Localize that data as "_close_" to the end-user as possible.
1414*912701f9SAndroid Build Coastguard Worker
1415*912701f9SAndroid Build Coastguard WorkerThere are a number of advantages to this strategy. The longer the data is kept in a neutral format, the more flexible the entire system is. On a practical level, if transmitted data is neutral-format, then it is much easier to manipulate the data, debug the processing of the data, and maintain the software connections between components.
1416*912701f9SAndroid Build Coastguard Worker
1417*912701f9SAndroid Build Coastguard WorkerOnce data has been localized into a given language, it can be quite difficult to programmatically convert that data into another format, if required. This is especially true if the data contains a mixture of translated text and formatted variables. Once information has been localized into, say, Romanian, it is much more difficult to localize that data into, say, French. Parsing is more difficult than formatting, and may run up against different ambiguities in interpreting text that has been localized, even if the original translated message text is available (which it may not be).
1418*912701f9SAndroid Build Coastguard Worker
Moreover, the closer we are to the end-user, the more we know about that user's preferred formats. If we format dates, for example, at the user's machine, then the formatting can easily take into account any customizations that the user has specified. If the formatting is done elsewhere, either we have to transmit whatever user customizations are in play, or we only transmit the user's locale code, which may only approximate the desired format. Thus the closer the localization is to the end user, the less we need to ship all of the user's preferences around to all the places that localization could possibly need to be done.
1420*912701f9SAndroid Build Coastguard Worker
1421*912701f9SAndroid Build Coastguard WorkerEven though localization should be done as close to the end-user as possible, there will be cases where different components need to be aware of whatever settings are appropriate for doing the localization. Thus information such as a locale code or time zone needs to be communicated between different components.
1422*912701f9SAndroid Build Coastguard Worker
1423*912701f9SAndroid Build Coastguard Worker#### <a name="Message_Formatting_and_Exceptions" href="#Message_Formatting_and_Exceptions">Message Formatting and Exceptions</a>
1424*912701f9SAndroid Build Coastguard Worker
1425*912701f9SAndroid Build Coastguard WorkerWindows ([FormatMessage](https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessage), [String.Format](https://learn.microsoft.com/en-us/dotnet/api/system.string.format?view=net-6.0)), Java ([MessageFormat](https://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html)) and ICU ([MessageFormat](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classMessageFormat.html), [umsg](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/umsg_8h.html)) all provide methods of formatting variables (dates, times, etc) and inserting them at arbitrary positions in a string. This avoids the manual string concatenation that causes severe problems for localization. The question is, where to do this? It is especially important since the original code site that originates a particular message may be far down in the bowels of a component, and passed up to the top of the component with an exception. So we will take that case as representative of this class of issues.
1426*912701f9SAndroid Build Coastguard Worker
There are circumstances where the message can be communicated with a language-neutral code, such as a numeric error code or mnemonic string key, that is understood outside of the component. If there are arguments that need to accompany that message, such as a number of files or a datetime, those need to accompany the numeric code so that when the localization is finally done at some point, the full information can be presented to the end-user. This is the best case for localization.
1428*912701f9SAndroid Build Coastguard Worker
1429*912701f9SAndroid Build Coastguard WorkerMore often, the exact messages that could originate from within the component are not known outside of the component itself; or at least they may not be known by the component that is finally displaying text to the user. In such a case, the information as to the user's locale needs to be communicated in some way to the component that is doing the localization. That locale information does not necessarily need to be communicated deep within the component; ideally, any exceptions should bundle up some language-neutral message ID, plus the arguments needed to format the message (for example, datetime), but not do the localization at the throw site. This approach has the advantages noted above for JIT localization.
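
The approach described above can be sketched as follows. Everything named here (`LocalizableError`, `CATALOGS`, `present_error`, and the message ID `files_missing`) is hypothetical, and plain `str.format` stands in for a real message formatter, which would also handle plurals and locale-specific number formats.

```python
class LocalizableError(Exception):
    """Carries a language-neutral message ID plus unformatted arguments."""

    def __init__(self, message_id, **arguments):
        super().__init__(message_id)
        self.message_id = message_id
        self.arguments = arguments


# Hypothetical message catalogs, available only to the component that
# finally displays text to the user.
CATALOGS = {
    "en": {"files_missing": "{count} files could not be found."},
    "fr": {"files_missing": "{count} fichiers sont introuvables."},
}


def low_level_operation():
    # No localization at the throw site: only the ID and the raw arguments.
    raise LocalizableError("files_missing", count=3)


def present_error(error, locale):
    """Format the message close to the end-user, using the user's locale."""
    pattern = CATALOGS[locale][error.message_id]
    return pattern.format(**error.arguments)
```

If the exception is caught and handled at a higher level without being shown to anyone, `present_error` is never called and no formatting work is done.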
1430*912701f9SAndroid Build Coastguard Worker
In addition, exceptions are often caught at a higher level; they do not end up being displayed to any end-user at all. Avoiding localization at the throw site saves the cost of formatting when that formatting is not really necessary. In fact, in many running programs most of the exceptions that are thrown at a low level never end up being presented to an end-user, so this can have considerable performance benefits.
1432*912701f9SAndroid Build Coastguard Worker
1433*912701f9SAndroid Build Coastguard Worker### <a name="Language_and_Locale_IDs" href="#Language_and_Locale_IDs">Unicode Language and Locale IDs</a>
1434*912701f9SAndroid Build Coastguard Worker
People have very slippery notions of what distinguishes a language code from a locale code. The problem is that both are somewhat nebulous concepts.
1436*912701f9SAndroid Build Coastguard Worker
1437*912701f9SAndroid Build Coastguard WorkerIn practice, many people use [[BCP47](#BCP47)] codes to mean locale codes instead of strictly language codes. It is easy to see why this came about; because [[BCP47](#BCP47)] includes an explicit region (territory) code, for most people it was sufficient for use as a locale code as well. For example, when typical web software receives a [[BCP47](#BCP47)] code, it will use it as a locale code. Other typical software will do the same: in practice, language codes and locale codes are treated interchangeably. Some people recommend distinguishing on the basis of "-" versus "\_" (for example, _zh-TW_ for language code, _zh_TW_ for locale code), but in practice that does not work because of the free variation out in the world in the use of these separators. Notice that Windows, for example, uses "-" as a separator in its locale codes. So pragmatically one is forced to treat "-" and "\_" as equivalent when interpreting either one on input.
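
A minimal sketch of treating the two separators as equivalent on input is shown below. The function names are hypothetical. The part after an `@` is left untouched, on the assumption that old-syntax keyword values such as time zone IDs may legitimately contain underscores.

```python
def normalize_separators(identifier):
    """Replace "_" with "-" in the part of the identifier before any "@"."""
    base, at_sign, keywords = identifier.partition("@")
    return base.replace("_", "-") + at_sign + keywords


def same_identifier(first, second):
    """Compare two identifiers, ignoring separator choice and letter case."""
    return normalize_separators(first).lower() == normalize_separators(second).lower()
```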
1438*912701f9SAndroid Build Coastguard Worker
1439*912701f9SAndroid Build Coastguard WorkerAnother reason for the conflation of these codes is that _very_ little data in most systems is distinguished by region alone; currency codes and measurement systems being some of the few. Sometimes date or number formats are mentioned as regional, but that really does not make much sense. If people see the sentence "You will have to adjust the value to १,२३४.५६७ from ૭૧,૨૩૪.૫૬" (using Indic digits), they would say that sentence is simply not English. Number format is far more closely associated with language than it is with region. The same is true for date formats: people would never expect to see intermixed a date in the format "2003年4月1日" (using Kanji) in text purporting to be purely English. There are regional differences in date and number format — differences which can be important — but those are different in kind than other language differences between regions.
1440*912701f9SAndroid Build Coastguard Worker
1441*912701f9SAndroid Build Coastguard WorkerAs far as we are concerned — _as a completely practical matter_ — two languages are different if they require substantially different localized resources. Distinctions according to spoken form are important in some contexts, but the written form is by far and away the most important issue for data interchange. Unfortunately, this is not the principle used in [[ISO639](#ISO639)], which has the fairly unproductive notion (for data interchange) that only spoken language matters (it is also not completely consistent about this, however).
1442*912701f9SAndroid Build Coastguard Worker
[[BCP47](#BCP47)] _**can**_ express a difference if the use of written languages happens to correspond to region boundaries expressed as [[ISO3166](#ISO3166)] region codes, and has recently added codes that allow it to express some important cases that are not distinguished by [[ISO3166](#ISO3166)] codes. These written languages include simplified and traditional Chinese (both used in Hong Kong S.A.R.); Serbian in Latin script; Azerbaijani in Arabic script, and so on.
1444*912701f9SAndroid Build Coastguard Worker
1445*912701f9SAndroid Build Coastguard WorkerNotice also that _currency codes_ are different than _currency localizations_. The currency localizations should largely be in the language-based resource bundles, not in the territory-based resource bundles. Thus, the resource bundle _en_ contains the localized mappings in English for a range of different currency codes: USD → US$, RUR → Rub, AUD → $A and so on. Of course, some currency symbols are used for more than one currency, and in such cases specializations appear in the territory-based bundles. Continuing the example, _en_US_ would have USD → $, while _en_AU_ would have AUD → $. (In protocols, the currency codes should always accompany any currency amounts; otherwise the data is ambiguous, and software is forced to use the user's territory to guess at the currency. For some informal discussion of this, see [JIT Localization](https://unicode-org.github.io/icu-docs/design/jit_localization.html).)
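
As a rough sketch of how the language-based and territory-based bundles interact, the following Python looks up a currency symbol by walking from the territory bundle to the language bundle. The `BUNDLES` table and the `currency_symbol` function are hypothetical; the symbols are the ones used in the example above and are not copied from the CLDR data files.

```python
# Illustrative data only, mirroring the example in the text.
BUNDLES = {
    "en": {"USD": "US$", "AUD": "$A"},
    "en_US": {"USD": "$"},
    "en_AU": {"AUD": "$"},
}


def currency_symbol(locale_id: str, currency_code: str) -> str:
    """Return the most specific symbol, falling back by truncation."""
    parts = locale_id.split("_")
    for n in range(len(parts), 0, -1):
        bundle = BUNDLES.get("_".join(parts[:n]), {})
        if currency_code in bundle:
            return bundle[currency_code]
    # No localization found: fall back to the currency code itself.
    return currency_code


print(currency_symbol("en_US", "USD"))
print(currency_symbol("en_US", "AUD"))
```

The territory bundle only overrides the entries that differ; every other currency is still resolved from _en_.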
1446*912701f9SAndroid Build Coastguard Worker
#### <a name="Written_Language" href="#Written_Language">Written Language</a>
1448*912701f9SAndroid Build Coastguard Worker
Criteria for what makes a written language should be purely pragmatic; _what would copy-editors say?_ If one gave them text like the following, they would respond that it is far from acceptable English for publication, and ask for it to be redone:
1450*912701f9SAndroid Build Coastguard Worker
1451*912701f9SAndroid Build Coastguard Worker1. "Theatre Center News: The date of the last version of this document was 2003年3月20日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."
1452*912701f9SAndroid Build Coastguard Worker
So one would change it to either 2 or 3 below, depending on which orthographic variant of English was the target for the publication:
1454*912701f9SAndroid Build Coastguard Worker
1455*912701f9SAndroid Build Coastguard Worker2. "Theater Center News: The date of the last version of this document was 3/20/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."
1456*912701f9SAndroid Build Coastguard Worker3. "Theatre Centre News: The date of the last version of this document was 20/3/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."
1457*912701f9SAndroid Build Coastguard Worker
1458*912701f9SAndroid Build Coastguard WorkerClearly there are many acceptable variations on this text. For example, copy editors might still quibble with the use of first versus last name sorting in the list, but clearly the first list was _not_ acceptable English alphabetical order. And in quoting a name, like "Theatre Centre News", one may leave it in the source orthography even if it differs from the publication target orthography. And so on. However, just as clearly, there are limits on what is acceptable English, and "2003年3月20日", for example, is _not_.
1459*912701f9SAndroid Build Coastguard Worker
1460*912701f9SAndroid Build Coastguard WorkerNote that the language of locale data may differ from the language of localized software or web sites, when those latter are not localized into the user's preferred language. In such cases, the kind of incongruous juxtapositions described above may well appear, but this situation is usually preferable to forcing unfamiliar date or number formats on the user as well.
1461*912701f9SAndroid Build Coastguard Worker
#### <a name="Hybrid_Locale" href="#Hybrid_Locale">Hybrid Locale Identifiers</a>
1463*912701f9SAndroid Build Coastguard Worker
Hybrid locales have intermixed content from two (or more) languages, often with one language's grammatical structure applied to words in another. These are commonly referred to with portmanteau words such as _Franglais, [Spanglish](https://en.wikipedia.org/wiki/Spanglish)_ or _Denglish_. Hybrid locales do _not_ refer to text that simply contains two languages: a book of parallel text containing English and French, such as the following, is not Franglais:
1465*912701f9SAndroid Build Coastguard Worker
1466*912701f9SAndroid Build Coastguard Worker<!-- HTML: no header -->
1467*912701f9SAndroid Build Coastguard Worker<table><tbody><tr>
1468*912701f9SAndroid Build Coastguard Worker    <td>On the 24th of May, 1863, my uncle, Professor Liedenbrock, rushed into his little house, No. 19 Königstrasse, one of the oldest streets in the oldest portion of the city of Hamburg…</td>
1469*912701f9SAndroid Build Coastguard Worker    <td>Le 24 mai 1863, un dimanche, mon oncle, le professeur Lidenbrock, revint précipitamment vers sa petite maison située au numéro 19 de Königstrasse, l’une des plus anciennes rues du vieux quartier de Hambourg…</td>
1470*912701f9SAndroid Build Coastguard Worker</tr></tbody></table>
1471*912701f9SAndroid Build Coastguard Worker
While text in a document can be tagged as partly in one language and partly in another, that is not the same as having a hybrid locale. There is a difference between having a Spanglish document, and a Spanish document that has some passages quoted in English. Fine-grained tagging doesn't handle grammatical combinations like Tanglish “Enna matteru?” (_What’s the matter?_), which is neither standard Tamil nor standard English. More importantly, it doesn’t work for the very common use case for a [unicode_locale_id](#unicode_locale_id): _locale selection_.
1473*912701f9SAndroid Build Coastguard Worker
1474*912701f9SAndroid Build Coastguard WorkerTo communicate requests for localized content and internationalization services, locales are used. When people pick a language from a menu, internally they are picking a locale (en-GB, es-419, etc.). To allow an application to support Spanglish or Hinglish locale selection, [unicode_locale_id](#unicode_locale_id)s can represent hybrid locales using the T Extension key-value 'h0-hybrid'. (For more information on the T extension, see _[Unicode BCP 47 T Extension](#t_Extension)._)
1475*912701f9SAndroid Build Coastguard Worker
_However, if users typically expect their language in a non-default script to contain a significant amount of text from another language due to lexical borrowing, then the -t- and hybrid subtags may be omitted. For example, because Romanized Hindi typically contains a significant amount of English text, ‘hi-Latn’ can be used instead of ‘hi-Latn-t-en-h0-hybrid’._
This tends to work better in implementations that don't yet handle the -t- extension.
1478*912701f9SAndroid Build Coastguard Worker
1479*912701f9SAndroid Build Coastguard WorkerExamples:
1480*912701f9SAndroid Build Coastguard Worker
| Locale ID | Base script | Hybrid name | Description |
|---|---|---|---|
| hi-t-***en-h0-hybrid*** | Deva | Hinglish | Hindi-English hybrid where the script is Devanagari\* |
| hi-Latn-t-***en-h0-hybrid*** | Latin | Hinglish | Hindi-English hybrid where the script is Latin\* |
| hi-Latn | Latin | Hinglish | Hindi written in Latin script; in practice usually a hybrid with English |
| ta-t-***en-h0-hybrid*** | Tamil | Tanglish | Tamil-English hybrid where the script is Tamil\* |
| ... | | | |
| en-t-***hi-h0-hybrid*** | Latin | Hinglish | English-Hindi hybrid where the script is Latin\* |
| en-t-***zh-h0-hybrid*** | Latin | Chinglish | English-Chinese hybrid where the script is Latin\* |
| ... | | | |
1491*912701f9SAndroid Build Coastguard Worker
1492*912701f9SAndroid Build Coastguard Worker\* When used as a request for international services (such as date formatting), the request is for everything to be in the base script if possible. When used to tag arbitrary content on a coarse level, the expectation is that it be the predominant script — that is, there may be certain passages or phrases that are in the other script but are not tagged on a fine-grained level.
1493*912701f9SAndroid Build Coastguard Worker
1494*912701f9SAndroid Build Coastguard Worker> _Note: The [unicode_language_id](#unicode_language_id) should be the language used as the ‘scaffold’: for the fallback locale for internationalization services, typically used for more of the core vocabulary/structure in the content. Thus where Hindi is the scaffold, Hinglish should be represented as hi-t-en-h0-hybrid (when written in Devanagari script) or hi-Latn-t-en-h0-hybrid (when written in Latin characters). Where English is the scaffold, Hinglish should be represented as en-t-hi-h0-hybrid (or possibly en-Deva-t-hi-h0-hybrid)._
1495*912701f9SAndroid Build Coastguard Worker
1496*912701f9SAndroid Build Coastguard WorkerThe value of -t- is a full _[unicode_language_id](#unicode_language_id)_, and can contain a subtag for the region where it is important to include it, as in the following. The value can also include the script, although that is not normally included: the only instance where it should be is where the content of the source text varies by script. So because zh-Hant has different vocabulary and expressions, it could make sense to have en-t-zh-hant to make that distinction.
1497*912701f9SAndroid Build Coastguard Worker
1498*912701f9SAndroid Build Coastguard Worker> Note: The default script for the language is computed without reference to the hybrid subtags. Thus the default script for 'ru' is “Cyrl”, no matter what the source is in the -t- tag.
1499*912701f9SAndroid Build Coastguard Worker
| Locale ID | Base script | Hybrid name | Description |
|---|---|---|---|
| ru-t-***en***-h0-hybrid | Cyrillic | Runglish | Russian with an admixture of ***American English*** |
| ru-t-***en-gb***-h0-hybrid | Cyrillic | Runglish | Russian with an admixture of ***British English*** |
| ru-***Latn***-t-en-gb-h0-hybrid | Latin | Runglish | Russian with an admixture of British English |
| en-t-***zh-h0-hybrid*** | Latin | Chinglish | American English with an admixture of ***Chinese (Simplified Mandarin Chinese)*** |
| en-t-***zh-hant-h0-hybrid*** | Latin | Chinglish | American English with an admixture of ***Chinese (Traditional Mandarin Chinese)*** |
1507*912701f9SAndroid Build Coastguard Worker
1508*912701f9SAndroid Build Coastguard WorkerShould there ever be strong need for hybrids of more than two languages or for other purposes such as hybrid languages as the source of translated content, additional structure could be added.
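
To make the structure of these identifiers concrete, here is a minimal Python sketch that extracts the -t- source language from a hybrid locale identifier. It assumes a well-formed identifier with at most one -t- extension and no other extensions before it; the function name `hybrid_source` is hypothetical, and the sketch does no validation of the subtags themselves.

```python
import re

# A T-extension field key is one letter followed by one digit, e.g. "h0".
TKEY = re.compile(r"^[a-z][0-9]$")


def hybrid_source(locale_id: str):
    """Return the -t- source language of a hybrid locale id, or None."""
    subtags = locale_id.replace("_", "-").lower().split("-")
    if "t" not in subtags:
        return None
    after_t = subtags[subtags.index("t") + 1:]

    # The source language id runs up to the first field key or singleton.
    source = []
    i = 0
    while i < len(after_t) and len(after_t[i]) > 1 and not TKEY.match(after_t[i]):
        source.append(after_t[i])
        i += 1

    # Only report a source when the field "h0-hybrid" is present.
    fields = after_t[i:]
    for key, value in zip(fields, fields[1:]):
        if key == "h0" and value == "hybrid":
            return "-".join(source) or None
    return None


print(hybrid_source("hi-Latn-t-en-h0-hybrid"))
print(hybrid_source("ru-t-en-gb-h0-hybrid"))
```

An identifier such as hi-Latn, which omits the -t- and hybrid subtags, yields `None` here even though its content may in practice be a hybrid.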
1509*912701f9SAndroid Build Coastguard Worker
### <a name="Validity_Data" href="#Validity_Data">Validity Data</a>
1511*912701f9SAndroid Build Coastguard Worker
```xml
<!ELEMENT idValidity (id*) >
<!ELEMENT id ( #PCDATA ) >
<!ATTLIST id type NMTOKEN #REQUIRED >
<!ATTLIST id idStatus NMTOKEN #REQUIRED >
```
1518*912701f9SAndroid Build Coastguard Worker
1519*912701f9SAndroid Build Coastguard WorkerThe directory [common/validity](https://github.com/unicode-org/cldr/blob/main/common/validity/) contains machine-readable data for validating the language, region, script, and variant subtags, as well as currency, subdivisions and measure units. Each file contains a number of subtags with the following **idStatus** values:
1520*912701f9SAndroid Build Coastguard Worker
1521*912701f9SAndroid Build Coastguard Worker* **regular** — the standard codes used for the specific type of subtag
1522*912701f9SAndroid Build Coastguard Worker* **special** — certain exceptional language codes like 'mul' _(languages only)_
1523*912701f9SAndroid Build Coastguard Worker* **unknown** — the code used to indicate the "unknown", "undetermined" or "invalid" values. For more information, see _[Unknown or Invalid Identifiers](#Unknown_or_Invalid_Identifiers)_.
1524*912701f9SAndroid Build Coastguard Worker* **macroregion** — the standard codes that are macroregions _(for regions only)._
1525*912701f9SAndroid Build Coastguard Worker  * Note that some two-letter region codes are macroregions, and (in the future) some three-digit codes may be regular codes.
1526*912701f9SAndroid Build Coastguard Worker  * For details as to which regions are contained within which macroregions, see the `<containment>` element of the supplemental data.
1527*912701f9SAndroid Build Coastguard Worker* **deprecated** — codes that should not be used. The `<alias>` element in the supplementalMeta file contains more information about these codes, and which codes should be used instead.
1528*912701f9SAndroid Build Coastguard Worker* **private_use** — codes that, for CLDR, are considered private use. Note that some private-use codes in a source standard such as BCP 47 have defined CLDR semantics, and are considered regular codes. For more information, see _[Private Use Codes](#Private_Use_Codes)._
1529*912701f9SAndroid Build Coastguard Worker* **reserved** — codes that are private use in a source standard, but are reserved for future use as regular codes by CLDR.
1530*912701f9SAndroid Build Coastguard Worker
The list of subtags for each idStatus uses a compact format: a space-delimited list of StringRanges, as defined in _Section [String Range](#String_Range)._ The separator for each StringRange is a "~".
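
The compact format can be expanded with a short routine. The Python sketch below assumes the expansion rule from the String Range section: the end string may be shorter than the start string, in which case the missing prefix is taken from the start, and each remaining code point position ranges independently from its start value to its end value. The function names are hypothetical.

```python
from itertools import product


def expand_string_range(token: str, sep: str = "~"):
    """Expand one StringRange such as "ab~d" into a list of strings."""
    if sep not in token:
        return [token]
    start, end = token.split(sep, 1)
    if not end or len(end) > len(start):
        raise ValueError(f"malformed range: {token!r}")
    # Carry over the part of the start string that the end string omits.
    prefix = start[: len(start) - len(end)]
    tail = start[len(prefix):]
    if any(lo > hi for lo, hi in zip(tail, end)):
        raise ValueError(f"descending range: {token!r}")
    # Each position ranges over its own span of code points.
    spans = [
        [chr(cp) for cp in range(ord(lo), ord(hi) + 1)]
        for lo, hi in zip(tail, end)
    ]
    return [prefix + "".join(chars) for chars in product(*spans)]


def expand_list(text: str):
    """Expand a space-delimited list of StringRanges."""
    result = []
    for token in text.split():
        result.extend(expand_string_range(token))
    return result


print(expand_list("aa ab~d zz"))
```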
1532*912701f9SAndroid Build Coastguard Worker
1533*912701f9SAndroid Build Coastguard WorkerEach measure unit is a sequence of subtags, such as “angle-arc-minute”. The first subtag provides a general “category” of the unit.
1534*912701f9SAndroid Build Coastguard Worker
1535*912701f9SAndroid Build Coastguard WorkerIn version 28.0, the subdivisions in the validity files used the ISO format, uppercase with a hyphen separating two components, instead of the BCP 47 format.
1536*912701f9SAndroid Build Coastguard Worker
1537*912701f9SAndroid Build Coastguard Worker
1538*912701f9SAndroid Build Coastguard Worker
## <a name="Locale_Inheritance" href="#Locale_Inheritance">Locale Inheritance and Matching</a>
1540*912701f9SAndroid Build Coastguard Worker
The XML format relies on an inheritance model, whereby the resources are collected into _bundles_, and the bundles organized into a tree. Data for the many Spanish locales does not need to be duplicated across all of the countries having Spanish as a national language. Instead, common data is collected in the Spanish language locale, and territory locales only need to supply differences. The parent of all of the language locales is a generic locale known as _root_. Wherever possible, the resources in the root are language & territory neutral. For example, the collation (sorting) order in the root is based on the [[DUCET](#DUCET)] (see _[Root Collation](tr35-collation.md#Root_Collation)_). Since English language collation has the same ordering as the root locale, the 'en' locale data does not need to supply any collation data, nor do 'en_US', 'en_GB' or any of the various other locales that use English.
1542*912701f9SAndroid Build Coastguard Worker
1543*912701f9SAndroid Build Coastguard WorkerGiven a particular locale id "en_US_someVariant", the default search chain for a particular resource is the following.
1544*912701f9SAndroid Build Coastguard Worker
```
en_US_someVariant
en_US
en
root
```
1551*912701f9SAndroid Build Coastguard Worker
1552*912701f9SAndroid Build Coastguard Worker_The inheritance is often not simple truncation, as will be seen later in this section._
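
For the simple case shown above, the default search chain can be produced by plain truncation. The following Python is a minimal sketch under that assumption only; it ignores parentLocale data and multiple variants, and the function name `truncation_chain` is hypothetical.

```python
def truncation_chain(locale_id: str):
    """Build the search chain by dropping trailing subtags, ending at root."""
    parts = locale_id.split("_")
    chain = ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]
    chain.append("root")
    return chain


print(truncation_chain("en_US_someVariant"))
```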
1553*912701f9SAndroid Build Coastguard Worker
The default search chain is slightly different for multiple variants.
In that case, the inheritance chain covers all combinations of variants, with the largest number of variants first, and otherwise in alphabetical order.
For example, where the requested locale ID is en_GB_fonipa_scouse, the inheritance chain is as follows:
1557*912701f9SAndroid Build Coastguard Worker
```
en_GB_fonipa_scouse
en_GB_scouse_fonipa // extra step, only needed if not canonical
en_GB_fonipa
en_GB_scouse // extra step
en_GB
en
```
1566*912701f9SAndroid Build Coastguard Worker
1567*912701f9SAndroid Build Coastguard Worker
1568*912701f9SAndroid Build Coastguard WorkerIf the data for the implementation performing the inheritance doesn't require canonical locale identifiers, then extra locale IDs need to be inserted in the chain.
1569*912701f9SAndroid Build Coastguard WorkerThat is indicated in the example above, marked with "only needed if not canonical".
These would include all combinations of variants that are not in canonical order, inserted in alphabetical order.
1571*912701f9SAndroid Build Coastguard WorkerNote that the order of multiple variants in canonical locale identifiers is alphabetical, as per [5. Canonicalizing Syntax](#5-canonicalizing-syntax) in [Annex C. LocaleId Canonicalization](#annex-c-localeid-canonicalization).
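
The ordering rule for multiple variants can be sketched as follows. This Python example assumes canonical identifiers, so it omits the step marked "only needed if not canonical", and it truncates the base without consulting parentLocale data. The function name `variant_chain` and its split of the input into a base and a variant list are hypothetical conveniences, not part of the specification.

```python
from itertools import combinations


def variant_chain(base: str, variants):
    """List the chain for a base such as "en_GB" plus several variants."""
    ordered = sorted(variants)  # canonical order of variants is alphabetical
    chain = []
    # Largest number of variants first; combinations() keeps each group
    # in alphabetical order because the input is sorted.
    for size in range(len(ordered), 0, -1):
        for combo in combinations(ordered, size):
            chain.append("_".join([base, *combo]))
    # Then the base itself and its truncations.
    parts = base.split("_")
    chain.extend("_".join(parts[:n]) for n in range(len(parts), 0, -1))
    return chain


for step in variant_chain("en_GB", ["scouse", "fonipa"]):
    print(step)
```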
1572*912701f9SAndroid Build Coastguard Worker
1573*912701f9SAndroid Build Coastguard WorkerIf a type and key are supplied in the locale id, then logically the chain from that id to the root is searched for a resource tag with a given type, all the way up to root. If no resource is found with that tag and type, then the chain is searched again without the type.
1574*912701f9SAndroid Build Coastguard Worker
Thus the data for any given locale will only contain resources that are different from the parent locale. For example, most territory locales will inherit the bulk of their data from the language locale: "en" will contain the bulk of the data, while "en_IE" will only contain a few items like currency. All data that is inherited from a parent is presumed to be valid, just as valid as if it were physically present in the file. This provides for much smaller resource bundles, and much simpler (and less error-prone) maintenance. At the script or region level, the "primary" child locale will be empty, since its parent will contain all of the appropriate resources for it. For more information see _CLDR Information: [Default Content](tr35-info.md#Default_Content)._
1576*912701f9SAndroid Build Coastguard Worker
1577*912701f9SAndroid Build Coastguard WorkerCertain data items depend only on the region specified in a locale id (by a [unicode_region_subtag](#unicode_region_subtag_validity) or an “rg” [Region Override](#RegionOverride) key), and are obtained from supplemental data rather than through locale resources. For example:
1578*912701f9SAndroid Build Coastguard Worker
1579*912701f9SAndroid Build Coastguard Worker* The currency for the specified region (see [Supplemental Currency Data](tr35-numbers.md#Supplemental_Currency_Data))
1580*912701f9SAndroid Build Coastguard Worker* The measurement system for the specified region (see [Measurement System Data](tr35-general.md#Measurement_System_Data))
1581*912701f9SAndroid Build Coastguard Worker* The week conventions for the specified region (see [Week Data](tr35-dates.md#Week_Data))
1582*912701f9SAndroid Build Coastguard Worker
1583*912701f9SAndroid Build Coastguard Worker(For more information on the specific items handled this way, see [Territory-Based Preferences](tr35-info.md#Territory_Based_Preferences).) These items will be correct for the specified region regardless of whether a locale bundle actually exists with the same combination of language and region as in the locale id. For example, suppose data is requested for the locale id "fr_US" and there is no bundle for that combination. Data obtained via locale inheritance, such as currency patterns and currency symbols, will be obtained from the parent locale "fr". However, currency amounts would be formatted by default using US dollars, just displayed in the manner governed by the locale "fr". When a locale id does not specify a region, the region-specific items such as those above are obtained from the likely region for the locale (obtained via [Likely Subtags](#Likely_Subtags)).
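
The "fr_US" case above can be sketched in Python. All three tables below are small illustrative stand-ins (for supplemental currency data, likely subtags, and locale bundles respectively), not copies of CLDR data, and the function name `resolve_currency_format` is hypothetical. The sketch handles only `language` and `language_region` identifiers and ignores the “rg” key.

```python
REGION_CURRENCY = {"US": "USD", "FR": "EUR"}  # stand-in for supplemental data
LIKELY_REGION = {"fr": "FR", "en": "US"}      # stand-in for likely subtags
BUNDLES = {                                   # stand-in for locale bundles
    "root": {"currencyPattern": "¤#,##0.00"},
    "fr": {"currencyPattern": "#,##0.00 ¤"},
}


def resolve_currency_format(locale_id: str):
    """Return (currency code, pattern) for a locale id."""
    parts = locale_id.split("_")
    language = parts[0]
    # The region comes from the id, or else from the likely region.
    region = parts[1] if len(parts) > 1 else LIKELY_REGION[language]
    # Region-only data: looked up directly, with no bundle needed.
    currency = REGION_CURRENCY[region]
    # Locale data: found by inheritance, here plain truncation to root.
    chain = ["_".join(parts[:n]) for n in range(len(parts), 0, -1)] + ["root"]
    for candidate in chain:
        pattern = BUNDLES.get(candidate, {}).get("currencyPattern")
        if pattern is not None:
            return currency, pattern
    raise LookupError(locale_id)


print(resolve_currency_format("fr_US"))
```

No "fr_US" bundle exists in the table, yet the currency is still US dollars, while the pattern is inherited from "fr".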
1584*912701f9SAndroid Build Coastguard Worker
1585*912701f9SAndroid Build Coastguard WorkerFor the relationship between Inheritance, DefaultContent, LikelySubtags, and LocaleMatching, see [Inheritance vs Related Information](tr35.md#Inheritance_vs_Related).
1586*912701f9SAndroid Build Coastguard Worker
### <a name="Lookup" href="#Lookup">Lookup</a>
1588*912701f9SAndroid Build Coastguard Worker
1589*912701f9SAndroid Build Coastguard WorkerIf a language has more than one script in customary modern use, then the CLDR file structure in common/main follows the following model:
1590*912701f9SAndroid Build Coastguard Worker
```
lang
lang_script
lang_script_region
lang_region (aliases to lang_script_region based on likely subtags)
```
1597*912701f9SAndroid Build Coastguard Worker
#### <a name="Bundle_vs_Item_Lookup" href="#Bundle_vs_Item_Lookup">Bundle vs Item Lookup</a>
1599*912701f9SAndroid Build Coastguard Worker
There are actually two different kinds of inheritance fallback: _resource bundle lookup_ and _resource item lookup_. For the former, a process is looking to find the first, best resource bundle it can; for the latter, it is fallback within bundles on individual items, like the translated name for the region "CN" in Breton.
1601*912701f9SAndroid Build Coastguard Worker
1602*912701f9SAndroid Build Coastguard WorkerThese are closely related, but distinct, processes. They are illustrated in the table [Lookup Differences](#Lookup-Differences), where "key" stands for zero or more key/type pairs. Logically speaking, when looking up an item for a given locale, you first do a resource bundle lookup to find the best bundle for the locale, then you do an inherited item lookup starting with that resource bundle.
1603*912701f9SAndroid Build Coastguard Worker
1604*912701f9SAndroid Build Coastguard WorkerThe table [Lookup Differences](#Lookup-Differences) uses the naïve resource bundle lookup for illustration. More sophisticated systems will get far better results for resource bundle lookup if they use the algorithm described in _[Language Matching](#LanguageMatching)_. That algorithm takes into account both the user’s desired locale(s) and the application’s supported locales, in order to get the best match.
1605*912701f9SAndroid Build Coastguard Worker
1606*912701f9SAndroid Build Coastguard WorkerIf the naïve resource bundle lookup is used, the desired locale needs to be canonicalized using 4.3 [Likely Subtags](#Likely_Subtags) and the supplemental alias information, so that locales that CLDR considers identical are treated as such. Thus eng-Latn-GB should be mapped to en-GB, and cmn-TW mapped to zh-Hant-TW.
1607*912701f9SAndroid Build Coastguard Worker
1608*912701f9SAndroid Build Coastguard WorkerThe initial bundle accessed during resource bundle lookup should not contain a script subtag unless, according to likely subtags, the script is required to disambiguate the locale. For example, `zh-Hant-TW` should start lookup at `zh-TW` (since `zh-TW` implies `Hant`), and `de-Latn-LI` should start at `de-LI` (since `de` implies `Latn` and `de-LI` does not have its own entry in likely subtags).
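
One way to pick the initial bundle is to drop the script whenever likely subtags would supply the same script anyway. The Python sketch below assumes the identifier has already been split into language, script, and region, and uses a three-entry `LIKELY` table as a stand-in for the full likely subtags data; the function name `initial_bundle` is hypothetical.

```python
# Small stand-in for the likely subtags data.
LIKELY = {
    "zh": "zh-Hans-CN",
    "zh-TW": "zh-Hant-TW",
    "de": "de-Latn-DE",
}


def initial_bundle(language: str, script: str, region: str) -> str:
    """Return the locale id at which resource bundle lookup should start."""
    # Prefer a language+region entry; otherwise fall back to the language.
    likely = LIKELY.get(f"{language}-{region}") or LIKELY.get(language)
    if likely is not None and likely.split("-")[1] == script:
        # The script is implied, so it is not needed to disambiguate.
        return f"{language}-{region}"
    return f"{language}-{script}-{region}"


print(initial_bundle("zh", "Hant", "TW"))
print(initial_bundle("de", "Latn", "LI"))
```

With this table, `zh-Hant-CN` keeps its script subtag, because the only applicable entry (`zh`) implies `Hans` rather than `Hant`.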
1609*912701f9SAndroid Build Coastguard Worker
1610*912701f9SAndroid Build Coastguard WorkerFor the purposes of CLDR, everything with the `<ldml>` dtd is treated logically as if it is one resource bundle, even if the implementation separates data into separate physical resource bundles. For example, suppose that there is a main XML file for Nama (naq), but there are no `<unit>` elements for it because the units are all inherited from root. If the `<unit>` elements are separated into a separate data tree for modularity in the implementation, the Nama `<unit>` resource bundle would be empty. However, for purposes of resource-bundle lookup the resource bundle lookup still stops at naq.xml.
1611*912701f9SAndroid Build Coastguard Worker
###### Table: <a name="Lookup-Differences" href="#Lookup-Differences">Lookup Differences</a>
1613*912701f9SAndroid Build Coastguard Worker
1614*912701f9SAndroid Build Coastguard Worker
1615*912701f9SAndroid Build Coastguard Worker<!-- HTML: readability -->
1616*912701f9SAndroid Build Coastguard Worker<table><thead>
1617*912701f9SAndroid Build Coastguard Worker<tr>
1618*912701f9SAndroid Build Coastguard Worker    <th>Lookup Type</th>
1619*912701f9SAndroid Build Coastguard Worker    <th>Example</th>
1620*912701f9SAndroid Build Coastguard Worker    <th>Comments</th>
1621*912701f9SAndroid Build Coastguard Worker</tr>
1622*912701f9SAndroid Build Coastguard Worker</thead><tbody>
1623*912701f9SAndroid Build Coastguard Worker<tr>
1624*912701f9SAndroid Build Coastguard Worker    <td><b>Resource bundle</b> lookup</td>
1625*912701f9SAndroid Build Coastguard Worker    <td>
1626*912701f9SAndroid Build Coastguard Worker        se-FI →                 <br/>
1627*912701f9SAndroid Build Coastguard Worker        se →                    <br/>
1628*912701f9SAndroid Build Coastguard Worker        <i>default‑locale* →</i><br/>
1629*912701f9SAndroid Build Coastguard Worker        root
1630*912701f9SAndroid Build Coastguard Worker    </td>
    <td><p>* The default-locale may have its own inheritance chain; for example, it may be "en-GB → en". In that case, the chain is expanded by inserting the chain, resulting in:</p>
1632*912701f9SAndroid Build Coastguard Worker        <p>
1633*912701f9SAndroid Build Coastguard Worker            se-FI →             <br/>
1634*912701f9SAndroid Build Coastguard Worker            se →                <br/>
1635*912701f9SAndroid Build Coastguard Worker            fi →                <br/>
1636*912701f9SAndroid Build Coastguard Worker            <i>en-GB →</i>      <br/>
1637*912701f9SAndroid Build Coastguard Worker            <i>en →</i>         <br/>
1638*912701f9SAndroid Build Coastguard Worker            root
1639*912701f9SAndroid Build Coastguard Worker        </p>
    </td>
</tr>
1641*912701f9SAndroid Build Coastguard Worker<tr>
1642*912701f9SAndroid Build Coastguard Worker    <td><b>Inherited item</b> lookup</td>
1643*912701f9SAndroid Build Coastguard Worker    <td>
1644*912701f9SAndroid Build Coastguard Worker        se-FI+key →             <br/>
1645*912701f9SAndroid Build Coastguard Worker        se+key →                <br/>
1646*912701f9SAndroid Build Coastguard Worker        <i>root_alias*+key</i>  <br/>
1647*912701f9SAndroid Build Coastguard Worker        → root+key
1648*912701f9SAndroid Build Coastguard Worker    </td>
1649*912701f9SAndroid Build Coastguard Worker    <td><p>* If there is a root_alias to another key or locale, then insert that entire chain. For example, suppose that months for another calendar system have a root alias to Gregorian months. In that case, the root alias would change the key, and retry from se-FI downward. This can happen multiple times.</p>
1650*912701f9SAndroid Build Coastguard Worker        <p>
1651*912701f9SAndroid Build Coastguard Worker            se-FI+key →         <br/>
1652*912701f9SAndroid Build Coastguard Worker            se+key →            <br/>
1653*912701f9SAndroid Build Coastguard Worker            root_alias*+key →   <br/>
1654*912701f9SAndroid Build Coastguard Worker            <i>se-FI+key2 →</i> <br/>
1655*912701f9SAndroid Build Coastguard Worker            <i>se+key2 →</i>    <br/>
1656*912701f9SAndroid Build Coastguard Worker            root_alias*+key2 →  <br/>
1657*912701f9SAndroid Build Coastguard Worker            root+key2
1658*912701f9SAndroid Build Coastguard Worker        </p>
1659*912701f9SAndroid Build Coastguard Worker    </td>
1660*912701f9SAndroid Build Coastguard Worker</tr>
1661*912701f9SAndroid Build Coastguard Worker</tbody></table>
1662*912701f9SAndroid Build Coastguard Worker
1663*912701f9SAndroid Build Coastguard Worker_Both the resource bundle inheritance and the inherited item inheritance use the parentLocale data, where available, instead of simple truncation._
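
The inherited item lookup in the table above can be sketched roughly as follows. This is an illustrative Python sketch, not a normative algorithm: the `DATA` and `ROOT_ALIASES` tables, the key names, and the `parent` helper (simple truncation that ignores parentLocale data) are all assumptions made for the example.

```python
# Hypothetical in-memory data: locale -> {key: value}.
DATA = {
    "se-FI": {},
    "se": {},
    "root": {"months-gregorian": "M01..M12"},
}
# Hypothetical root aliases: key -> replacement key.
ROOT_ALIASES = {"months-buddhist": "months-gregorian"}


def parent(locale):
    """Simple truncation; parentLocale overrides are ignored in this sketch."""
    head, sep, _ = locale.rpartition("-")
    return head if sep else "root"


def lookup_item(locale, key, max_alias_hops=10):
    """Return (locale, key, value) for the first hit, or None if nothing is found."""
    for _ in range(max_alias_hops + 1):
        current = locale
        # Walk the requested locale and its ancestors, stopping before root.
        while current != "root":
            value = DATA.get(current, {}).get(key)
            if value is not None:
                return current, key, value
            current = parent(current)
        # At root: an alias changes the key and restarts from the requested locale.
        if key in ROOT_ALIASES:
            key = ROOT_ALIASES[key]
            continue
        value = DATA["root"].get(key)
        return ("root", key, value) if value is not None else None
    return None
```

With this toy data, `lookup_item("se-FI", "months-buddhist")` finds nothing in `se-FI` or `se`, follows the root alias to `months-gregorian`, retries from `se-FI`, and finally resolves at root.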
1664*912701f9SAndroid Build Coastguard Worker
The fallback differs between these two cases: internal aliases and keys are not involved in the bundle lookup, and the default locale is not involved in the item lookup. If the default locale were used in the resource-item lookup, strange results would occur. For example, suppose that the default locale is Swedish, and there is a Nama locale but no specific inherited item for collation. Falling back through the default locale would then apply Swedish data to Nama sorting, which is odd and unexpected.
1666*912701f9SAndroid Build Coastguard Worker
The default locale is not always used even in resource bundle inheritance. For the following services, the fallback is always directly to the root locale rather than through the default locale.
1668*912701f9SAndroid Build Coastguard Worker
1669*912701f9SAndroid Build Coastguard Worker*   collation
1670*912701f9SAndroid Build Coastguard Worker*   break iteration
1671*912701f9SAndroid Build Coastguard Worker*   case mapping
1672*912701f9SAndroid Build Coastguard Worker*   transliteration
1673*912701f9SAndroid Build Coastguard Worker    *   The lookup for transliteration is yet more complicated because of the interplay of source and target locales: see _Part 2 General, [Inheritance.](tr35-general.md#Inheritance)_
1674*912701f9SAndroid Build Coastguard Worker
1675*912701f9SAndroid Build Coastguard WorkerThus if there is no Akan locale, for example, asking for a collation for Akan should produce the root collation, _not the Swedish collation._
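
A minimal sketch of this bundle lookup, assuming a hypothetical set of available bundles (`AVAILABLE`) and simple truncation in place of parentLocale data. The service names in `ROOT_ONLY_SERVICES` come from the list above; `"number formatting"` in the usage note is only a placeholder for any other service. Transliteration, whose lookup is more involved, is treated like the others here purely for brevity.

```python
# Hypothetical set of locales that have a resource bundle; root always exists.
AVAILABLE = {"sv", "fi", "en", "en-GB"}

# Services that fall back directly to root, never through the default locale.
ROOT_ONLY_SERVICES = {"collation", "break iteration", "case mapping", "transliteration"}


def truncation_chain(locale):
    """'en-GB' -> ['en-GB', 'en'] (root is handled by the caller)."""
    parts = locale.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def find_bundle(requested, default_locale, service):
    """Return the first available bundle on the fallback chain for a service."""
    candidates = truncation_chain(requested)
    if service not in ROOT_ONLY_SERVICES:
        # Only some services are allowed to consult the default locale.
        candidates += truncation_chain(default_locale)
    for candidate in candidates:
        if candidate in AVAILABLE:
            return candidate
    return "root"
```

With a Swedish default locale and no Akan bundle, `find_bundle("ak", "sv", "collation")` gives `"root"`, while `find_bundle("ak", "sv", "number formatting")` gives `"sv"`.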
1676*912701f9SAndroid Build Coastguard Worker
1677*912701f9SAndroid Build Coastguard WorkerThe inherited item lookup must remain stable, because the resources are built with a certain fallback in mind; changing the core fallback order can render the bundle structure incoherent.
1678*912701f9SAndroid Build Coastguard Worker
Resource bundle lookup, on the other hand, is more flexible: changes in the view of the "best" match between the input request and the output bundle are better tolerated when they represent overall improvements for users. For more information, see _[A.1 Element fallback](#Fallback_Elements)_.
1680*912701f9SAndroid Build Coastguard Worker
1681*912701f9SAndroid Build Coastguard WorkerWhere the LDML inheritance relationship does not match a target system, such as POSIX, the data logically should be fully resolved in converting to a format for use by that system, by adding _all_ inherited data to each locale data set.
1682*912701f9SAndroid Build Coastguard Worker
1683*912701f9SAndroid Build Coastguard WorkerFor a more complete description of how inheritance applies to data, and the use of keywords, see _[Inheritance](#Inheritance_and_Validity)_ .
1684*912701f9SAndroid Build Coastguard Worker
1685*912701f9SAndroid Build Coastguard WorkerThe locale data does not contain general character properties that are derived from the _Unicode Character Database_ [[UAX44](https://www.unicode.org/reports/tr41/#UAX44)]. That data being common across locales, it is not duplicated in the bundles. Constructing a POSIX locale from the CLDR data requires use of UCD data. In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.
1686*912701f9SAndroid Build Coastguard Worker
1687*912701f9SAndroid Build Coastguard Worker**Warning:** If a locale has a different script than its parent (for example, sr_Latn), then special attention must be paid to make sure that all inheritance is covered. For example, auxiliary exemplar characters may need to be empty ("[]") to block inheritance.
1688*912701f9SAndroid Build Coastguard Worker
1689*912701f9SAndroid Build Coastguard Worker**Empty Override:** There is one special value reserved in LDML to indicate that a child locale is to have no value for a path, even if the parent locale has a value for that path. That value is "∅∅∅". For example, if there is no phrase for "two days ago" in a language, that can be indicated with:
1690*912701f9SAndroid Build Coastguard Worker
1691*912701f9SAndroid Build Coastguard Worker```xml
<field type="day">
  <relative type="-2">∅∅∅</relative>
</field>
1694*912701f9SAndroid Build Coastguard Worker```
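
The effect of the empty override on resolution can be sketched as follows. The bundle representation (one dict per locale, ordered from child to root) and the shortened path string are assumptions made for this example only.

```python
NO_VALUE = "\u2205\u2205\u2205"  # the reserved value "∅∅∅"


def resolve(path, chain):
    """Resolve a path against bundles ordered from the child locale to root.

    Returns None when there is no value, including when a locale on the
    chain blocks inheritance with the empty override.
    """
    for bundle in chain:
        if path in bundle:
            value = bundle[path]
            return None if value == NO_VALUE else value
    return None


# Hypothetical data: the parent has a phrase, the child explicitly has none.
PATH = 'field[@type="day"]/relative[@type="-2"]'
parent_bundle = {PATH: "two days ago"}
child_bundle = {PATH: NO_VALUE}
```

Here `resolve(PATH, [child_bundle, parent_bundle])` yields no value, whereas a child without any entry for the path would inherit the parent's phrase.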
1695*912701f9SAndroid Build Coastguard Worker
1696*912701f9SAndroid Build Coastguard Worker<a name="Multiple_Inheritance"></a>
1697*912701f9SAndroid Build Coastguard Worker#### <a name="Lateral_Inheritance" href="#Lateral_Inheritance">Lateral Inheritance</a>
1698*912701f9SAndroid Build Coastguard Worker
1699*912701f9SAndroid Build Coastguard Worker__Lateral Inheritance__ is where resources are inherited from within the same locale, _before inheriting from the parent_. This is used for the following element@attribute instances:
1700*912701f9SAndroid Build Coastguard Worker
1701*912701f9SAndroid Build Coastguard Worker| Element @Attribute          | Source | Context |
1702*912701f9SAndroid Build Coastguard Worker| ---------------- | ------ | ------- |
1703*912701f9SAndroid Build Coastguard Worker| `currency` `@pattern` | `currencyFormat`   | `numberSystem` = `defaultNumberingSystem`, unless otherwise specified*<br/>`currencyFormatLength` type=none, unless otherwise specified<br/>`currencyFormat` `type="standard"`, unless otherwise specified |
1704*912701f9SAndroid Build Coastguard Worker| `currency` `@decimal` | `symbols` `@decimal`  | `numberSystem` = `defaultNumberingSystem`, unless otherwise specified |
1705*912701f9SAndroid Build Coastguard Worker| `currency` `@group`   | `symbols` `@group`    | `numberSystem` = `defaultNumberingSystem`, unless otherwise specified |
1706*912701f9SAndroid Build Coastguard Worker
1707*912701f9SAndroid Build Coastguard Worker>\* The "unless otherwise specified" clause is for when an API or other context indicates a different choice, such as currencyFormat type="accounting".
1708*912701f9SAndroid Build Coastguard Worker
For example, with `/currency[@type="CVE"]`, the decimal symbol for almost all locales is the value from symbols/decimal, but for pt_CV it is explicitly `<decimal>$</decimal>`.
1710*912701f9SAndroid Build Coastguard Worker
1711*912701f9SAndroid Build Coastguard WorkerThe following attributes use lateral inheritance for **all elements** with the DTD root = ldml, except where otherwise noted. The process is applied recursively.
1712*912701f9SAndroid Build Coastguard Worker
1713*912701f9SAndroid Build Coastguard Worker| Attribute  | Fallback                               | Exception Elements          |
1714*912701f9SAndroid Build Coastguard Worker| ---------- | -------------------------------------- | --------------------------- |
1715*912701f9SAndroid Build Coastguard Worker| `alt`        | __no alt attribute__                   | _none_                      |
1716*912701f9SAndroid Build Coastguard Worker| `case`       | "nominative" → ∅                       | `caseMinimalPairs`            |
1717*912701f9SAndroid Build Coastguard Worker| `gender`     | default_gender(locale) → ∅             | `genderMinimalPairs`          |
1718*912701f9SAndroid Build Coastguard Worker| `count`      | plural_rules(locale, x) → "other" → ∅  | `minDays`, `pluralMinimalPairs` |
1719*912701f9SAndroid Build Coastguard Worker| `ordinal`    | plural_rules(locale, x) → "other" → ∅  | `ordinalMinimalPairs`         |
1720*912701f9SAndroid Build Coastguard Worker
1721*912701f9SAndroid Build Coastguard WorkerThe gender fallback is to neuter if the locale has a neuter gender, otherwise masculine. This may be extended in the future if necessary. See also [Part 2, Grammatical Features](tr35-general.md#Grammatical_Features).
1722*912701f9SAndroid Build Coastguard Worker
1723*912701f9SAndroid Build Coastguard WorkerFor example, if there is no value for a path, and that path has a [@count="x"] attribute and value, then:
1724*912701f9SAndroid Build Coastguard Worker
1. If "x" is numeric, the path falls back to the path with [@count=«the plural rules category for x for that locale»], within the same locale.
   1. For example, [@count="0"] for English falls back to [@count="other"], while for French it falls back to [@count="one"].
2. If "x" is anything but "other", it falls back to a path [@count="other"], within the same locale.
3. If "x" is "other", it falls back to the path that is completely missing the count item, within the same locale.
4. If there is no value for that path in the same locale, the same process is used for the **original path** in the parent locale.
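
The steps above can be sketched as follows. The two plural-category functions are deliberately simplified stand-ins for the real plural rules (integers only, and only the "one"/"other" distinction), and all function names are hypothetical.

```python
def english_category(n):
    """Toy stand-in for English plural rules (integers only)."""
    return "one" if n == 1 else "other"


def french_category(n):
    """Toy stand-in for French plural rules (integers only)."""
    return "one" if n in (0, 1) else "other"


def lateral_count_chain(count, plural_category):
    """Count values tried within one locale; None means 'no count attribute'."""
    chain = [count]
    if count.isdigit():
        # Step 1: a numeric count maps to its plural category for the locale.
        count = plural_category(int(count))
        chain.append(count)
    if count != "other":
        # Step 2: anything but "other" falls back to "other".
        chain.append("other")
    # Step 3: "other" falls back to the path without a count attribute.
    chain.append(None)
    return chain


def full_count_chain(locales, count, category_for):
    """Step 4: repeat the process with the original count in each parent locale."""
    return [(locale, c)
            for locale in locales
            for c in lateral_count_chain(count, category_for(locale))]
```

For instance, `lateral_count_chain("0", english_category)` tries `"0"`, then `"other"`, then the path without a count attribute, while the French variant tries `"one"` before `"other"`.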
1730*912701f9SAndroid Build Coastguard Worker
1731*912701f9SAndroid Build Coastguard WorkerA path may have multiple attributes with lateral inheritance. In such a case, all of the combinations are tried, and in the order supplied above. For example (this is an extreme case):
1732*912701f9SAndroid Build Coastguard Worker
1733*912701f9SAndroid Build Coastguard Worker```
/compoundUnitPattern1[@count="few"][@gender="feminine"][@case="accusative"] →
/compoundUnitPattern1[@count="few"][@gender="feminine"][@case="nominative"] →
/compoundUnitPattern1[@count="few"][@gender="feminine"] →
/compoundUnitPattern1[@count="few"][@gender="neuter"][@case="accusative"] →
/compoundUnitPattern1[@count="few"][@gender="neuter"][@case="nominative"] →
/compoundUnitPattern1[@count="few"][@gender="neuter"] →
/compoundUnitPattern1[@count="few"][@case="accusative"] →
/compoundUnitPattern1[@count="few"][@case="nominative"] →
/compoundUnitPattern1[@count="few"] →

/compoundUnitPattern1[@count="other"][@gender="feminine"][@case="accusative"] →
/compoundUnitPattern1[@count="other"][@gender="feminine"][@case="nominative"] →
/compoundUnitPattern1[@count="other"][@gender="feminine"] →
/compoundUnitPattern1[@count="other"][@gender="neuter"][@case="accusative"] →
/compoundUnitPattern1[@count="other"][@gender="neuter"][@case="nominative"] →
/compoundUnitPattern1[@count="other"][@gender="neuter"] →
/compoundUnitPattern1[@count="other"][@case="accusative"] →
/compoundUnitPattern1[@count="other"][@case="nominative"] →
/compoundUnitPattern1[@count="other"] →

/compoundUnitPattern1[@gender="feminine"][@case="accusative"] →
/compoundUnitPattern1[@gender="feminine"][@case="nominative"] →
/compoundUnitPattern1[@gender="feminine"] →
/compoundUnitPattern1[@gender="neuter"][@case="accusative"] →
/compoundUnitPattern1[@gender="neuter"][@case="nominative"] →
/compoundUnitPattern1[@gender="neuter"] →
/compoundUnitPattern1[@case="accusative"] →
/compoundUnitPattern1[@case="nominative"] →
/compoundUnitPattern1
1763*912701f9SAndroid Build Coastguard Worker```
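
The ordering of such a combined chain can be generated mechanically. The following sketch assumes a non-numeric count, a locale whose default gender is neuter, and hypothetical helper names; it varies `case` fastest, then `gender`, then `count`, as in the listing above.

```python
from itertools import product


def value_steps(value, default):
    """Values tried for one attribute; None means the attribute is absent."""
    steps = [value] if value == default else [value, default]
    return steps + [None]


def lateral_paths(element, count, gender, case, default_gender="neuter"):
    """List the lateral fallback paths for an element with three attributes."""
    paths = []
    # product() varies its last argument fastest: case, then gender, then count.
    for c, g, k in product(value_steps(count, "other"),
                           value_steps(gender, default_gender),
                           value_steps(case, "nominative")):
        path = "/" + element
        for name, val in (("count", c), ("gender", g), ("case", k)):
            if val is not None:
                path += f'[@{name}="{val}"]'
        paths.append(path)
    return paths
```

Calling `lateral_paths("compoundUnitPattern1", "few", "feminine", "accusative")` produces 27 paths, starting with all three attributes present and ending with the bare `/compoundUnitPattern1`.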
1764*912701f9SAndroid Build Coastguard Worker
1765*912701f9SAndroid Build Coastguard Worker_Examples:_
1766*912701f9SAndroid Build Coastguard Worker
1767*912701f9SAndroid Build Coastguard Worker###### Table: <a name="Count_Fallback_normal" href="#Count_Fallback_normal">Count Fallback: normal</a>
1768*912701f9SAndroid Build Coastguard Worker
1769*912701f9SAndroid Build Coastguard Worker| Locale | Path |
1770*912701f9SAndroid Build Coastguard Worker| ------ | ---- |
1771*912701f9SAndroid Build Coastguard Worker| fr-CA  | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="x"]`     |
1772*912701f9SAndroid Build Coastguard Worker| fr-CA  | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="other"]` |
1773*912701f9SAndroid Build Coastguard Worker| fr     | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="x"]`     |
1774*912701f9SAndroid Build Coastguard Worker| fr     | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="other"]` |
1775*912701f9SAndroid Build Coastguard Worker| root   | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="x"]`     |
1776*912701f9SAndroid Build Coastguard Worker| root   | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="other"]` |
1777*912701f9SAndroid Build Coastguard Worker
1778*912701f9SAndroid Build Coastguard Worker> Note that there may also be an alias in root that changes the path and starts again from the requested locale, such as:
1779*912701f9SAndroid Build Coastguard Worker
1780*912701f9SAndroid Build Coastguard Worker```xml
1781*912701f9SAndroid Build Coastguard Worker<unitLength type="narrow">
1782*912701f9SAndroid Build Coastguard Worker   <alias source="locale" path="../unitLength[@type='short']"/>
1783*912701f9SAndroid Build Coastguard Worker</unitLength>
1784*912701f9SAndroid Build Coastguard Worker```
1785*912701f9SAndroid Build Coastguard Worker
1786*912701f9SAndroid Build Coastguard Worker###### Table: <a name="Count_Fallback_currency" href="#Count_Fallback_currency">Count Fallback: currency</a>
1787*912701f9SAndroid Build Coastguard Worker
1788*912701f9SAndroid Build Coastguard Worker| Locale | Path |
1789*912701f9SAndroid Build Coastguard Worker| ------ | ---- |
1790*912701f9SAndroid Build Coastguard Worker| fr-CA | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName[@count="x"]`     |
1791*912701f9SAndroid Build Coastguard Worker| fr-CA | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName[@count="other"]` |
1792*912701f9SAndroid Build Coastguard Worker| fr-CA | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName`                 |
1793*912701f9SAndroid Build Coastguard Worker| fr    | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName[@count="x"]`     |
1794*912701f9SAndroid Build Coastguard Worker| fr    | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName[@count="other"]` |
1795*912701f9SAndroid Build Coastguard Worker| fr    | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName`                 |
1796*912701f9SAndroid Build Coastguard Worker| root  | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName[@count="x"]`     |
1797*912701f9SAndroid Build Coastguard Worker| root  | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName[@count="other"]` |
1798*912701f9SAndroid Build Coastguard Worker| root  | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName`                 |
1799*912701f9SAndroid Build Coastguard Worker
1800*912701f9SAndroid Build Coastguard Worker
1801*912701f9SAndroid Build Coastguard Worker#### <a name="Parent_Locales" href="#Parent_Locales">Parent Locales</a>
1802*912701f9SAndroid Build Coastguard Worker
1803*912701f9SAndroid Build Coastguard Worker```xml
1804*912701f9SAndroid Build Coastguard Worker<!ELEMENT parentLocales ( parentLocale* ) >
1805*912701f9SAndroid Build Coastguard Worker<!ATTLIST parentLocales component NMTOKENS #IMPLIED >
1806*912701f9SAndroid Build Coastguard Worker<!ELEMENT parentLocale EMPTY >
1807*912701f9SAndroid Build Coastguard Worker<!ATTLIST parentLocale parent NMTOKEN #REQUIRED >
1808*912701f9SAndroid Build Coastguard Worker<!ATTLIST parentLocale localeRules NMTOKENS #IMPLIED >
1809*912701f9SAndroid Build Coastguard Worker<!ATTLIST parentLocale locales NMTOKENS #REQUIRED >
1810*912701f9SAndroid Build Coastguard Worker```
1811*912701f9SAndroid Build Coastguard Worker
When the `component` attribute is absent, the element applies to what is referred to as the ‘main’ component.
1813*912701f9SAndroid Build Coastguard WorkerOtherwise the component value typically corresponds to elements and their children, such as ‘collations’ or ‘plurals’.
1814*912701f9SAndroid Build Coastguard WorkerThere may be more than one component value (space separated):
1815*912701f9SAndroid Build Coastguard Workerin that case the information applies to all the components listed.
1816*912701f9SAndroid Build Coastguard Worker
1817*912701f9SAndroid Build Coastguard WorkerThe basic inheritance model for locales of the form `lang_script_region_variant1_…variantN` is to truncate from the end.
1818*912701f9SAndroid Build Coastguard WorkerThat is,
1819*912701f9SAndroid Build Coastguard Workerremove the _u and _t extensions, then remove the last _ and following tag, then restore the extensions.
1820*912701f9SAndroid Build Coastguard Worker
For example:
1822*912701f9SAndroid Build Coastguard Worker```
sr_Cyrl_ME
sr_Cyrl
sr
1829*912701f9SAndroid Build Coastguard Worker```
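
A rough sketch of this truncation step, assuming underscore-separated identifiers in which a single-letter `u` or `t` subtag starts the extensions. Returning `"root"` once only the language remains, without restoring the extensions, is a simplification made for this example.

```python
def truncation_parent(locale):
    """Return the parent of a locale by simple truncation."""
    subtags = locale.split("_")
    # Index of the first extension singleton (u or t), if any.
    ext_at = next((i for i, s in enumerate(subtags) if i > 0 and s in ("u", "t")),
                  len(subtags))
    core, extensions = subtags[:ext_at], subtags[ext_at:]
    if len(core) == 1:
        return "root"
    # Drop the last core subtag, then restore the extensions.
    return "_".join(core[:-1] + extensions)
```

Applied repeatedly, this turns `sr_Cyrl_ME` into `sr_Cyrl`, then `sr`, then `root`.
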
1830*912701f9SAndroid Build Coastguard WorkerIn some cases, the normal truncation inheritance does not function well.
For example, if truncation would produce a parent in a different script,
then the combination of child and parent textual data would be a mishmash of scripts.
1833*912701f9SAndroid Build Coastguard Worker
1834*912701f9SAndroid Build Coastguard WorkerThus there are two cases where the truncation inheritance needs to be overridden:
1835*912701f9SAndroid Build Coastguard Worker
1836*912701f9SAndroid Build Coastguard Worker1.  When the parent locale would have a different script, and text would be mixed.
1837*912701f9SAndroid Build Coastguard Worker2.  In certain exceptional circumstances where the 'truncation' parent needs to be adjusted.
1838*912701f9SAndroid Build Coastguard Worker
1839*912701f9SAndroid Build Coastguard WorkerThe `parentLocale` element is used to override the normal inheritance when accessing CLDR data.
1840*912701f9SAndroid Build Coastguard Worker
1841*912701f9SAndroid Build Coastguard WorkerFor case 1, there is a special attribute and value, `localeRules="nonlikelyScript"`,
1842*912701f9SAndroid Build Coastguard Workerwhich specifies **all locales** of the form `lang_script`,
1843*912701f9SAndroid Build Coastguard Workerwherever the `script` is **not** the likely script for `lang`.
1844*912701f9SAndroid Build Coastguard WorkerFor migration, the previous short list of locales (a subset of the nonlikelyScript locales) is retained,
1845*912701f9SAndroid Build Coastguard Workerbut those locales are slated for removal in the future.
1846*912701f9SAndroid Build Coastguard WorkerFor example, `ru_Latn` is not included in the short list but is included (programmatically) in the rule.
1847*912701f9SAndroid Build Coastguard Worker
1848*912701f9SAndroid Build Coastguard Worker```xml
<parentLocale parent="root" localeRules="nonlikelyScript" locales="az_Arab az_Cyrl bal_Latn … yue_Hans zh_Hant"/>
1850*912701f9SAndroid Build Coastguard Worker```
1851*912701f9SAndroid Build Coastguard Worker
The `localeRules` attribute is used, for example, for the main component.
It is not used for components where text is not mixed,
such as the collations component or the plurals component.
1855*912701f9SAndroid Build Coastguard Worker
1856*912701f9SAndroid Build Coastguard WorkerFor case 2, the children and parent share the same primary language, but the region is changed.
1857*912701f9SAndroid Build Coastguard WorkerFor example:
1858*912701f9SAndroid Build Coastguard Worker
1859*912701f9SAndroid Build Coastguard Worker```xml
1860*912701f9SAndroid Build Coastguard Worker<parentLocale parent="es_419" locales="es_AR es_BO … es_UY es_VE"/>
1861*912701f9SAndroid Build Coastguard Worker```
1862*912701f9SAndroid Build Coastguard Worker
1863*912701f9SAndroid Build Coastguard WorkerThere are certain components that require addenda to the common parent fallback rules.
1864*912701f9SAndroid Build Coastguard WorkerFor a locale like `zh_Hant` in the example above,
1865*912701f9SAndroid Build Coastguard Workerthe `parentLocale` element would dictate the parent as `root` when referring to main locale data,
1866*912701f9SAndroid Build Coastguard Workerbut for collation data, the parent locale should still be `zh`,
1867*912701f9SAndroid Build Coastguard Workereven though the `parentLocale` element is present for that locale.
1868*912701f9SAndroid Build Coastguard WorkerTo address this, components can have their own fallback rules that inherit from the common rules
1869*912701f9SAndroid Build Coastguard Workerand add additional parents that supplement or override the common rules:
1870*912701f9SAndroid Build Coastguard Worker
1871*912701f9SAndroid Build Coastguard Worker```xml
1872*912701f9SAndroid Build Coastguard Worker<parentLocales component="segmentations">
1873*912701f9SAndroid Build Coastguard Worker  <parentLocale parent="zh" locales="zh_Hant"/>
1874*912701f9SAndroid Build Coastguard Worker</parentLocales>
1875*912701f9SAndroid Build Coastguard Worker```
1876*912701f9SAndroid Build Coastguard Worker
Note: When components were first introduced, the component-specific parent locales were merged with the main parent locales.
1878*912701f9SAndroid Build Coastguard WorkerThis was determined to be an error, and the component-specific parent locales are now not merged,
1879*912701f9SAndroid Build Coastguard Workerbut instead are treated as stand-alone.
1880*912701f9SAndroid Build Coastguard Worker
1881*912701f9SAndroid Build Coastguard WorkerSince parentLocale information is not localizable on a per locale basis,
1882*912701f9SAndroid Build Coastguard Workerthe parentLocale information is contained in CLDR’s [supplemental data.](tr35-info.md)
1883*912701f9SAndroid Build Coastguard Worker
1884*912701f9SAndroid Build Coastguard WorkerWhen a `parentLocale` element is used to override normal inheritance, the following guidelines apply in most cases:
1885*912701f9SAndroid Build Coastguard Worker
1.  If X is the parentLocale of Y, then either X is the root locale, or X has the same base language code as Y. For example, the parent of `en` cannot be `fr`, and the parent of `en_YY` cannot be `fr` or `fr_XX`.
1888*912701f9SAndroid Build Coastguard Worker2.  If X is the parentLocale of Y, Y must not be a base language locale. For example, the parent of `en` cannot be `en_XX`.
1889*912701f9SAndroid Build Coastguard Worker
1890*912701f9SAndroid Build Coastguard WorkerThere may be specific exceptions to these for certain closely-related languages or language-script combinations, for example:
1891*912701f9SAndroid Build Coastguard Worker* `no` may be the parent of `nb` and `nn`.
1892*912701f9SAndroid Build Coastguard Worker* `en_IN` may be the parent of `hi_Latn` (the parent is one of the languages for a child that is effectively a hybrid of two languages in `Latn` script)
1893*912701f9SAndroid Build Coastguard Worker
1894*912701f9SAndroid Build Coastguard WorkerThere are certain invariants that must always be true:
1895*912701f9SAndroid Build Coastguard Worker
1896*912701f9SAndroid Build Coastguard Worker3. The parent must either be the root locale or have the same script as the child. This rule applies to component=main.
1897*912701f9SAndroid Build Coastguard Worker4. There must never be cycles, such as: X parent of Y ... parent of X.
1898*912701f9SAndroid Build Coastguard Worker5. Following the inheritance path, using parentLocale where available and otherwise truncating the locale, must always lead eventually to the root locale.
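
The interaction of `parentLocale` overrides with truncation, together with the cycle and root invariants, can be sketched as follows. The `overrides` mapping is a hypothetical excerpt built from the examples in this section, and extensions are ignored.

```python
def parent_of(locale, overrides):
    """Use an explicit parentLocale entry if present, otherwise truncate."""
    if locale in overrides:
        return overrides[locale]
    head, sep, _ = locale.rpartition("_")
    return head if sep else "root"


def inheritance_chain(locale, overrides):
    """Follow parents up to root, rejecting cycles."""
    chain = [locale]
    seen = {locale}
    while chain[-1] != "root":
        nxt = parent_of(chain[-1], overrides)
        if nxt in seen:
            raise ValueError(f"cycle in parent locales at {nxt}")
        seen.add(nxt)
        chain.append(nxt)
    return chain


# Hypothetical excerpt of main-component overrides taken from the examples above.
MAIN_OVERRIDES = {"es_AR": "es_419", "zh_Hant": "root"}
```

With these overrides, `es_AR` inherits through `es_419` and `es` before reaching root, while `zh_Hant_HK` reaches root directly after `zh_Hant`, skipping `zh`.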
1899*912701f9SAndroid Build Coastguard Worker
1900*912701f9SAndroid Build Coastguard Worker#### <a name="Region_Priority_Inheritance" href="#Region_Priority_Inheritance">Region-Priority Inheritance</a>
1901*912701f9SAndroid Build Coastguard Worker
1902*912701f9SAndroid Build Coastguard WorkerCertain data may be more appropriate to store with the region as the primary key instead of language. This is often needed for regional user preferences, such as week info, calendar system, and measurement system. All resources matched by an entry in <a href="tr35-info.md#rgScope">&lt;rgScope&gt;</a> should use this type of inheritance.
1903*912701f9SAndroid Build Coastguard Worker
1904*912701f9SAndroid Build Coastguard WorkerThe default search chain for region-priority inheritance removes the language subtag before the region subtag, as follows:
1905*912701f9SAndroid Build Coastguard Worker
1906*912701f9SAndroid Build Coastguard Worker```
1907*912701f9SAndroid Build Coastguard Workeren_US_someVariant
1908*912701f9SAndroid Build Coastguard Workeren_US
1909*912701f9SAndroid Build Coastguard WorkerUS
1910*912701f9SAndroid Build Coastguard Worker001
1911*912701f9SAndroid Build Coastguard Worker```
1912*912701f9SAndroid Build Coastguard Worker
1913*912701f9SAndroid Build Coastguard WorkerEquivalently as BCP-47:
1914*912701f9SAndroid Build Coastguard Worker
1915*912701f9SAndroid Build Coastguard Worker```
1916*912701f9SAndroid Build Coastguard Workeren-US-variant
1917*912701f9SAndroid Build Coastguard Workeren-US
1918*912701f9SAndroid Build Coastguard Workerund-US
1919*912701f9SAndroid Build Coastguard Workerund
1920*912701f9SAndroid Build Coastguard Worker```
1921*912701f9SAndroid Build Coastguard Worker
1922*912701f9SAndroid Build Coastguard WorkerBefore running region-priority inheritance, the locale should be normalized as follows:
1923*912701f9SAndroid Build Coastguard Worker
1924*912701f9SAndroid Build Coastguard Worker1. If the locale contains the `-u-rg` Unicode BCP-47 locale extension, the region subtag should be set to the `-u-rg` region. For example, `en-US-u-rg-gbzzzz` should normalize to `en-GB` when running region-priority inheritance.
1925*912701f9SAndroid Build Coastguard Worker2. If, after performing step 1, the locale is missing the region subtag (`language` or `language_script`), the region subtag should be filled in from likely subtags data. For example, `en` should become `en-US` before running region-priority inheritance.
1926*912701f9SAndroid Build Coastguard Worker
1927*912701f9SAndroid Build Coastguard WorkerNote that region-priority inheritance does not currently make use of parent locales or territory containment, but it may in the future.
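
A simplified sketch of the normalization and search chain, in BCP 47 form. It assumes tags without a script subtag, treats the `-u-rg` value as a region code followed by `zzzz`, peels variants off one at a time, and drops the `-u-` extension from the resulting chain. The `likely_region` mapping is a hypothetical stand-in for likely subtags data.

```python
def region_priority_chain(tag, likely_region):
    """Return the region-priority search chain for a BCP 47 tag (no script subtag)."""
    main, _, extension = tag.partition("-u-")
    language, *rest = main.split("-")
    region = None
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        region = rest.pop(0).upper()
    variants = rest

    # Step 1: a -u-rg value such as "gbzzzz" overrides the region subtag.
    keys = extension.split("-") if extension else []
    if "rg" in keys[:-1]:
        rg_value = keys[keys.index("rg") + 1]
        if rg_value.endswith("zzzz"):
            region = rg_value[:-4].upper()

    # Step 2: fill in a missing region from (hypothetical) likely subtags data.
    if region is None:
        region = likely_region[language]

    # Language-specific entries first, then the region alone, then the fallback.
    chain = ["-".join([language, region] + variants[:n])
             for n in range(len(variants), -1, -1)]
    chain += ["und-" + region, "und"]
    return chain
```

For example, `region_priority_chain("en-US-u-rg-gbzzzz", {})` starts from `en-GB`, and `region_priority_chain("en", {"en": "US"})` starts from `en-US`.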
1928*912701f9SAndroid Build Coastguard Worker
1929*912701f9SAndroid Build Coastguard Worker### <a name="Inheritance_and_Validity" href="#Inheritance_and_Validity">Inheritance and Validity</a>
1930*912701f9SAndroid Build Coastguard Worker
1931*912701f9SAndroid Build Coastguard WorkerThe following describes in more detail how to determine the exact inheritance of elements, and the validity of a given element in LDML.
1932*912701f9SAndroid Build Coastguard Worker
1933*912701f9SAndroid Build Coastguard Worker#### <a name="Definitions" href="#Definitions">Definitions</a>
1934*912701f9SAndroid Build Coastguard Worker
1935*912701f9SAndroid Build Coastguard Worker_Blocking_ elements are those whose subelements do not inherit from parent locales. For example, a `<collation>` element is a blocking element: everything in a `<collation>` element is treated as a single lump of data, as far as inheritance is concerned. For more information, see [Valid Attribute Values](#Valid_Attribute_Values).
1936*912701f9SAndroid Build Coastguard Worker
1937*912701f9SAndroid Build Coastguard WorkerAttributes that serve to distinguish multiple elements at the same level are called _distinguishing_ attributes. For example, the `type` attribute distinguishes different elements in lists of translations, such as:
1938*912701f9SAndroid Build Coastguard Worker
1939*912701f9SAndroid Build Coastguard Worker```xml
1940*912701f9SAndroid Build Coastguard Worker<language type="aa">Afar</language>
1941*912701f9SAndroid Build Coastguard Worker<language type="ab">Abkhazian</language>
1942*912701f9SAndroid Build Coastguard Worker```
1943*912701f9SAndroid Build Coastguard Worker
1944*912701f9SAndroid Build Coastguard WorkerDistinguishing attributes affect inheritance; two elements with different distinguishing attributes are treated as different for purposes of inheritance. For more information, see [Valid Attribute Values](#Valid_Attribute_Values). Other attributes are called value attributes. Value attributes do not affect inheritance, and elements with value attributes may not have child elements (see [XML Format](#XML_Format)).
1945*912701f9SAndroid Build Coastguard Worker
1946*912701f9SAndroid Build Coastguard WorkerNon-distinguishing attributes are identified by [DTD Annotations](#DTD_Annotations) such as `@VALUE`.
1947*912701f9SAndroid Build Coastguard Worker
1948*912701f9SAndroid Build Coastguard WorkerFor any element in an XML file, _an element chain_ is a resolved [[XPath](#XPath)] leading from the root to an element, with attributes on each element in alphabetical order. So in, say, [https://github.com/unicode-org/cldr/blob/main/common/main/el.xml](https://github.com/unicode-org/cldr/blob/main/common/main/el.xml) we may have:
1949*912701f9SAndroid Build Coastguard Worker
1950*912701f9SAndroid Build Coastguard Worker```xml
1951*912701f9SAndroid Build Coastguard Worker<ldml>
1952*912701f9SAndroid Build Coastguard Worker    <identity>
1953*912701f9SAndroid Build Coastguard Worker        <version number="1.1" />
1954*912701f9SAndroid Build Coastguard Worker        <language type="el" />
1955*912701f9SAndroid Build Coastguard Worker    </identity>
1956*912701f9SAndroid Build Coastguard Worker    <localeDisplayNames>
1957*912701f9SAndroid Build Coastguard Worker        <languages>
1958*912701f9SAndroid Build Coastguard Worker            <language type="ar">Αραβικά</language>
1959*912701f9SAndroid Build Coastguard Worker...
1960*912701f9SAndroid Build Coastguard Worker```
1961*912701f9SAndroid Build Coastguard Worker
This gives the following element chains (among others):
1963*912701f9SAndroid Build Coastguard Worker
1964*912701f9SAndroid Build Coastguard Worker* `//ldml/identity/version[@number="1.1"]`
1965*912701f9SAndroid Build Coastguard Worker* `//ldml/localeDisplayNames/languages/language[@type="ar"]`
1966*912701f9SAndroid Build Coastguard Worker
1967*912701f9SAndroid Build Coastguard WorkerAn element chain A is an _extension_ of an element chain B if B is equivalent to an initial portion of A. For example, #2 below is an extension of #1. (Equivalent, depending on the tree, may not be "identical to". See below for an example.)
1968*912701f9SAndroid Build Coastguard Worker
1969*912701f9SAndroid Build Coastguard Worker1. `//ldml/localeDisplayNames`
1970*912701f9SAndroid Build Coastguard Worker2. `//ldml/localeDisplayNames/languages/language[@type="ar"]`
1971*912701f9SAndroid Build Coastguard Worker
1972*912701f9SAndroid Build Coastguard WorkerAn LDML file can be thought of as an ordered list of _element pairs_: <element chain, data>, where the element chains are all the chains for the end-nodes. (This works because of restrictions on the structure of LDML, including that it does not allow mixed content.) The ordering is the ordering that the element chains are found in the file, and thus determined by the DTD.
1973*912701f9SAndroid Build Coastguard Worker
1974*912701f9SAndroid Build Coastguard WorkerFor example, some of those pairs would be the following. Notice that the first has the null string as element contents.
1975*912701f9SAndroid Build Coastguard Worker
* <`//ldml/identity/version[@number="1.1"]`, `""`>
1977*912701f9SAndroid Build Coastguard Worker* <`//ldml/localeDisplayNames/languages/language[@type="ar"]`, `"Αραβικά"`>
1978*912701f9SAndroid Build Coastguard Worker
1979*912701f9SAndroid Build Coastguard Worker> Note: There are two exceptions to this:
1980*912701f9SAndroid Build Coastguard Worker>
1981*912701f9SAndroid Build Coastguard Worker> 1. Blocking nodes and their contents are treated as a single end node.
1982*912701f9SAndroid Build Coastguard Worker> 2. In terms of computing inheritance, the element pair consists of the element chain plus all distinguishing attributes; the value consists of the value (if any) plus any nondistinguishing attributes.
1983*912701f9SAndroid Build Coastguard Worker>
> > Thus instead of the element pair being (1) below, it is (2):
1985*912701f9SAndroid Build Coastguard Worker> >
1986*912701f9SAndroid Build Coastguard Worker> > 1. <`//ldml/dates/calendars/calendar[@type='gregorian']/week/weekendStart[@day='sun'][@time='00:00']`,`""`>
1987*912701f9SAndroid Build Coastguard Worker> > 2. <`//ldml/dates/calendars/calendar[@type='gregorian']/week/weekendStart`,`[@day='sun'][@time='00:00']`>
1988*912701f9SAndroid Build Coastguard Worker
1989*912701f9SAndroid Build Coastguard WorkerTwo LDML element chains are _equivalent_ when they would be identical if all attributes and their values were removed — except for distinguishing attributes. Thus the following are equivalent:
1990*912701f9SAndroid Build Coastguard Worker
1991*912701f9SAndroid Build Coastguard Worker* `//ldml/localeDisplayNames/languages/language[@type="ar"]`
1992*912701f9SAndroid Build Coastguard Worker* `//ldml/localeDisplayNames/languages/language[@type="ar"][@draft="unconfirmed"]`
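
The equivalence and extension tests above can be sketched in a few lines of Python. This is only an illustration: the `NON_DISTINGUISHING` set is a hand-picked assumption (real tooling would derive it from the DTD annotations), and the helper names `distinguishing_steps`, `equivalent`, and `is_extension` are hypothetical. The sketch also treats an extension as strictly longer than the chain it extends, which is one reading of "initial portion".

```python
import re

# Assumption: a hand-picked set of non-distinguishing attribute names.
# Real tooling would derive this from the DTD annotations instead.
NON_DISTINGUISHING = {"draft", "references"}

# One [@name="value"] or [@name='value'] predicate.
PREDICATE = re.compile(r"\[@([\w.:-]+)=(\"[^\"]*\"|'[^']*')\]")
# One path step: an element name followed by zero or more predicates.
STEP = re.compile(r"[^/\[]+(?:\[[^\]]*\])*")


def distinguishing_steps(chain: str) -> list[str]:
    """Split a chain into steps, dropping non-distinguishing predicates."""
    def keep(match: re.Match) -> str:
        return "" if match.group(1) in NON_DISTINGUISHING else match.group(0)
    return STEP.findall(PREDICATE.sub(keep, chain))


def equivalent(a: str, b: str) -> bool:
    """True if the chains match once non-distinguishing attributes are removed."""
    return distinguishing_steps(a) == distinguishing_steps(b)


def is_extension(a: str, b: str) -> bool:
    """True if chain b is equivalent to a proper initial portion of chain a."""
    steps_a, steps_b = distinguishing_steps(a), distinguishing_steps(b)
    return len(steps_b) < len(steps_a) and steps_a[:len(steps_b)] == steps_b
```

Splitting on whole steps rather than on every `/` keeps attribute values that themselves contain a slash inside a single step.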
1993*912701f9SAndroid Build Coastguard Worker
1994*912701f9SAndroid Build Coastguard WorkerFor any locale ID, a _locale chain_ is an ordered list starting with the root and leading down to the ID. For example:
1995*912701f9SAndroid Build Coastguard Worker
1996*912701f9SAndroid Build Coastguard Worker> <root, de, de_DE, de_DE_xxx>
1997*912701f9SAndroid Build Coastguard Worker
1998*912701f9SAndroid Build Coastguard Worker#### <a name="Resolved_Data_File" href="#Resolved_Data_File">Resolved Data File</a>
1999*912701f9SAndroid Build Coastguard Worker
To produce a fully resolved locale data file from CLDR for a locale ID L, you start with L, and successively add unique items from the parent locales until you get up to root. More formally, this can be expressed as the following procedure.
2001*912701f9SAndroid Build Coastguard Worker
2002*912701f9SAndroid Build Coastguard Worker1. Let Result be initially L.
2003*912701f9SAndroid Build Coastguard Worker2. For each Li in the locale chain for L, starting at L and going up to root:
2004*912701f9SAndroid Build Coastguard Worker   1. Let Temp be a copy of the pairs in the LDML file for Li
2005*912701f9SAndroid Build Coastguard Worker   2. Replace each alias in Temp by the resolved list of pairs it points to.
2006*912701f9SAndroid Build Coastguard Worker      1. The resolved list of pairs is obtained by recursively applying this procedure.
2007*912701f9SAndroid Build Coastguard Worker      2. That alias now blocks any inheritance from the parent. (See _[Common Elements](#Common_Elements)_ for an example.)
2008*912701f9SAndroid Build Coastguard Worker   3. For each element pair P in Temp:
2009*912701f9SAndroid Build Coastguard Worker      1. If P does not contain a blocking element, and Result does not have an element pair Q with an equivalent element chain, add P to Result.
2010*912701f9SAndroid Build Coastguard Worker
2011*912701f9SAndroid Build Coastguard Worker**Notes:**
2012*912701f9SAndroid Build Coastguard Worker
2013*912701f9SAndroid Build Coastguard Worker* When adding an element pair to a result, it has to go in the right order for it to be valid according to the DTD.
2014*912701f9SAndroid Build Coastguard Worker* The identity element and its children are unaffected by resolution.
2015*912701f9SAndroid Build Coastguard Worker* The LDML data must be constructed so as to avoid circularity in step 2.2.
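
As a rough illustration of the procedure above, the following Python sketch merges element pairs along a locale chain. It rests on several simplifying assumptions that are not part of the specification: parents are found by plain truncation at the last underscore, each file is modelled as a dict from element chain to value, chains already contain only distinguishing attributes (so equality stands in for equivalence), and aliases, blocking elements, and DTD ordering are ignored. The names `locale_chain`, `resolve`, and `FILES`, and the toy data, are hypothetical.

```python
def locale_chain(locale_id: str) -> list[str]:
    """Return [locale_id, parent, ..., 'root'], most specific first.

    Assumption: the parent is found by truncating at the last underscore.
    """
    chain = [locale_id]
    while chain[-1] != "root":
        head, sep, _tail = chain[-1].rpartition("_")
        chain.append(head if sep else "root")
    return chain


def resolve(locale_id: str, files: dict[str, dict[str, str]]) -> dict[str, str]:
    """Collect element pairs from L up to root, keeping the most specific value."""
    result: dict[str, str] = {}
    for loc in locale_chain(locale_id):
        for chain, value in files.get(loc, {}).items():
            # Add the pair only if no equivalent chain is already present.
            result.setdefault(chain, value)
    return result


# Toy data for illustration only; these are not real CLDR paths or values.
FILES = {
    "root": {"//ldml/x": "root-x", "//ldml/y": "root-y"},
    "de": {"//ldml/x": "de-x"},
    "de_DE": {"//ldml/z": "de_DE-z"},
}
```

With this toy data, `resolve("de_DE", FILES)` takes `//ldml/z` from de_DE, `//ldml/x` from de, and `//ldml/y` from root.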
2016*912701f9SAndroid Build Coastguard Worker
2017*912701f9SAndroid Build Coastguard Worker#### <a name="Valid_Data" href="#Valid_Data">Valid Data</a>
2018*912701f9SAndroid Build Coastguard Worker
The attribute `draft="x"` in LDML means that the data has not been approved by the subcommittee. (For more information, see [Process](https://cldr.unicode.org/index/process).) However, some data that is not explicitly marked as `draft` may be implicitly `draft`, either because it inherits that status from a parent or from an enclosing element.
2020*912701f9SAndroid Build Coastguard Worker
2021*912701f9SAndroid Build Coastguard Worker**Example 2.** Suppose that new locale data is added for af (Afrikaans). To indicate that all of the data is _unconfirmed_, the attribute can be added to the top level.
2022*912701f9SAndroid Build Coastguard Worker
2023*912701f9SAndroid Build Coastguard Worker```xml
2024*912701f9SAndroid Build Coastguard Worker<ldml version="1.1" draft="unconfirmed">
2025*912701f9SAndroid Build Coastguard Worker    <identity>
2026*912701f9SAndroid Build Coastguard Worker        <version number="1.1" />
2027*912701f9SAndroid Build Coastguard Worker        <language type="af" />
2028*912701f9SAndroid Build Coastguard Worker    </identity>
2029*912701f9SAndroid Build Coastguard Worker    <characters>...</characters>
2030*912701f9SAndroid Build Coastguard Worker    <localeDisplayNames>...</localeDisplayNames>
2031*912701f9SAndroid Build Coastguard Worker</ldml>
2032*912701f9SAndroid Build Coastguard Worker```
2033*912701f9SAndroid Build Coastguard Worker
2034*912701f9SAndroid Build Coastguard WorkerAny data can be added to that file, and the status will all be `draft="unconfirmed"`. Once an item is vetted—_whether it is inherited or explicitly in the file_—then its status can be changed to _approved_. This can be done either by leaving `draft="unconfirmed"` on the enclosing element and marking the child with `draft="approved"`, such as:
2035*912701f9SAndroid Build Coastguard Worker
2036*912701f9SAndroid Build Coastguard Worker```xml
2037*912701f9SAndroid Build Coastguard Worker<ldml version="1.1" draft="unconfirmed">
2038*912701f9SAndroid Build Coastguard Worker    <identity>
2039*912701f9SAndroid Build Coastguard Worker        <version number="1.1" />
2040*912701f9SAndroid Build Coastguard Worker        <language type="af" />
2041*912701f9SAndroid Build Coastguard Worker    </identity>
2042*912701f9SAndroid Build Coastguard Worker    <characters draft="approved">...</characters>
2043*912701f9SAndroid Build Coastguard Worker    <localeDisplayNames>...</localeDisplayNames>
2044*912701f9SAndroid Build Coastguard Worker    <dates />
2045*912701f9SAndroid Build Coastguard Worker    <numbers />
2046*912701f9SAndroid Build Coastguard Worker    <collations />
2047*912701f9SAndroid Build Coastguard Worker</ldml>
2048*912701f9SAndroid Build Coastguard Worker```
2049*912701f9SAndroid Build Coastguard Worker
2050*912701f9SAndroid Build Coastguard WorkerHowever, normally the draft attributes should be canonicalized, which means they are pushed down to leaf nodes as described in _[Canonical Form](#Canonical_Form)_. If an LDML file does have draft attributes that are not on leaf nodes, the file should be interpreted as if it were the canonicalized version of that file.
2051*912701f9SAndroid Build Coastguard Worker
More formally, here is how to determine whether data for an element chain E is implicitly or explicitly draft, given a locale L. Items 1, 2, and 4 simply formalize what is already in LDML. Item 3 adds the new element.
2053*912701f9SAndroid Build Coastguard Worker
2054*912701f9SAndroid Build Coastguard Worker#### <a name="Checking_for_Draft_Status" href="#Checking_for_Draft_Status">Checking for Draft Status</a>
2055*912701f9SAndroid Build Coastguard Worker
2056*912701f9SAndroid Build Coastguard Worker1. **Parent Locale Inheritance**
2057*912701f9SAndroid Build Coastguard Worker   1. Walk through the locale chain until you find a locale ID L' with a data file D. (L' may equal L).
2058*912701f9SAndroid Build Coastguard Worker   2. Produce the fully resolved data file D' for D.
2059*912701f9SAndroid Build Coastguard Worker   3. In D', find the first element pair whose element chain E' is either equivalent to or an extension of E.
2060*912701f9SAndroid Build Coastguard Worker   4. If there is no such E', return _true_
2061*912701f9SAndroid Build Coastguard Worker   5. If E' is not equivalent to E, truncate E' to the length of E.
2062*912701f9SAndroid Build Coastguard Worker2. **Enclosing Element Inheritance**
2063*912701f9SAndroid Build Coastguard Worker   1. Walk through the elements in E', from back to front.
2064*912701f9SAndroid Build Coastguard Worker      1. If you ever encounter draft=_x_, return _x_
2065*912701f9SAndroid Build Coastguard Worker   2. If L' = L, return _false_
2066*912701f9SAndroid Build Coastguard Worker3. **Missing File Inheritance**
2067*912701f9SAndroid Build Coastguard Worker   1. Otherwise, walk again through the elements in E', from back to front.
2068*912701f9SAndroid Build Coastguard Worker      1. If you encounter a `validSubLocales` attribute (deprecated):
2069*912701f9SAndroid Build Coastguard Worker         1. If L is in the attribute value, return _false_
2070*912701f9SAndroid Build Coastguard Worker         2. Otherwise return _true_
2071*912701f9SAndroid Build Coastguard Worker4. **Otherwise**
2072*912701f9SAndroid Build Coastguard Worker   1.  Return _true_
2073*912701f9SAndroid Build Coastguard Worker
2074*912701f9SAndroid Build Coastguard WorkerThe `validSubLocales` in the most specific (farthest from root file) locale file "wins" through the full resolution step (data from more specific files replacing data from less specific ones).
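
Step 2.1 of the procedure (enclosing element inheritance) can be sketched as follows. This covers only that one step, and it assumes the element chain is available as a string that still carries its `draft` attributes. The function name `enclosing_draft` and the sample chains in the usage note are illustrative, not taken from CLDR tooling.

```python
import re
from typing import Optional

# One path step: an element name followed by zero or more predicates.
STEP = re.compile(r"[^/\[]+(?:\[[^\]]*\])*")
# A draft predicate with either quote style.
DRAFT = re.compile(r"\[@draft=[\"']([^\"']*)[\"']\]")


def enclosing_draft(chain: str) -> Optional[str]:
    """Walk the elements from back to front and return the first draft value.

    Returns None when no element in the chain carries a draft attribute,
    in which case the remaining steps of the procedure decide the result.
    """
    for step in reversed(STEP.findall(chain)):
        match = DRAFT.search(step)
        if match:
            return match.group(1)
    return None
```

For a chain such as `//ldml[@draft="unconfirmed"]/characters[@draft="approved"]/exemplarCharacters`, the innermost draft value is found first, so the function returns `"approved"`.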
2075*912701f9SAndroid Build Coastguard Worker
2076*912701f9SAndroid Build Coastguard Worker#### <a name="Keyword_and_Default_Resolution" href="#Keyword_and_Default_Resolution">Keyword and Default Resolution</a>
2077*912701f9SAndroid Build Coastguard Worker
2078*912701f9SAndroid Build Coastguard WorkerWhen accessing data based on keywords, the following process is used. Consider the following example:
2079*912701f9SAndroid Build Coastguard Worker
2080*912701f9SAndroid Build Coastguard Worker* The locale 'de' has collation types A, B, C, and no `<default>` element
2081*912701f9SAndroid Build Coastguard Worker* The locale 'de_CH' has `<default type='B'>`
2082*912701f9SAndroid Build Coastguard Worker
2083*912701f9SAndroid Build Coastguard WorkerHere are the searches for various combinations.
2084*912701f9SAndroid Build Coastguard Worker
2085*912701f9SAndroid Build Coastguard Worker<!-- HTML: rowspan -->
2086*912701f9SAndroid Build Coastguard Worker<table><thead>
2087*912701f9SAndroid Build Coastguard Worker<tr><th>User Input</th>                                 <th>Lookup in Locale</th>   <th>For</th>                        <th>Comment</th></tr>
2088*912701f9SAndroid Build Coastguard Worker</thead><tbody>
2089*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="3">de_CH<br/><i>no keyword</i></td>    <td>de_CH</td>              <td>default collation type</td>     <td>finds "B"</td></tr>
2090*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>de_CH</td>              <td>collation type=B</td>           <td>not found</td></tr>
2091*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>de</td>                 <td>collation type=B</td>           <td><i>found</i></td></tr>
2092*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="4">de<br/><i>no keyword</i></td>       <td>de</td>                 <td>default collation type</td>     <td>not found</td></tr>
2093*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>root</td>               <td>default collation type</td>	    <td>finds "standard"</td></tr>
2094*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>de</td>                 <td>collation type=standard</td>    <td>not found</td></tr>
2095*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>root</td>               <td>collation type=standard</td>    <td><i>found</i></td></tr>
2096*912701f9SAndroid Build Coastguard Worker<tr><td>de_u_co_A</td>                                  <td>de</td>                 <td>collation type=A</td>           <td><i>found</i></td></tr>
2097*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="2">de_u_co_standard</td>	            <td>de</td>	                <td>collation type=standard</td>    <td>not found</td></tr>
2098*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>root</td>               <td>collation type=standard</td>    <td><i>found</i></td></tr>
2099*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="6">de_u_co_foobar</td>	                <td>de</td>	                <td>collation type=foobar</td>      <td>not found</td></tr>
2100*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>root</td>               <td>collation type=foobar</td>      <td>not found, starts looking for default</td></tr>
2101*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>de</td>	                <td>default collation type</td>     <td>not found</td></tr>
2102*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>root</td>               <td>default collation type</td>     <td>finds "standard"</td></tr>
2103*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>de</td>	                <td>collation type=standard</td>    <td>not found</td></tr>
2104*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>root</td>               <td>collation type=standard</td>    <td><i>found</i></td></tr>
2105*912701f9SAndroid Build Coastguard Worker</tbody></table>
2106*912701f9SAndroid Build Coastguard Worker
2107*912701f9SAndroid Build Coastguard WorkerExamples of "search" collator lookup; 'de' has a language-specific version, but 'en' does not:
2108*912701f9SAndroid Build Coastguard Worker
2109*912701f9SAndroid Build Coastguard Worker<!-- HTML: rowspan -->
2110*912701f9SAndroid Build Coastguard Worker<table><thead>
2111*912701f9SAndroid Build Coastguard Worker<tr><th>User Input</th>                                 <th>Lookup in Locale</th>   <th>For</th>                        <th>Comment</th></tr>
2112*912701f9SAndroid Build Coastguard Worker</thead><tbody>
2113*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="2">de_CH_u_co_search</td>              <td>de_CH</td>              <td>collation type=search</td>      <td>not found</td></tr>
2114*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>de</td>                 <td>collation type=search</td>      <td><i>found</i></td></tr>
2115*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="3">en_US_u_co_search</td>              <td>en_US</td>              <td>collation type=search</td>      <td>not found</td></tr>
2116*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>en</td>                 <td>collation type=search</td>      <td>not found</td></tr>
2117*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>root</td>               <td>collation type=search</td>      <td><i>found</i></td></tr>
2118*912701f9SAndroid Build Coastguard Worker</tbody></table>
2119*912701f9SAndroid Build Coastguard Worker
2120*912701f9SAndroid Build Coastguard WorkerExamples of lookup for Chinese collation types. Note:
2121*912701f9SAndroid Build Coastguard Worker
2122*912701f9SAndroid Build Coastguard Worker* All of the Chinese-specific collation types are provided in the 'zh' locale
2123*912701f9SAndroid Build Coastguard Worker* For 'zh' the `<default>` element specifies "pinyin"; for 'zh_Hant' the `<default>` element specifies "stroke". However any of the available Chinese collation types can be explicitly requested for any Chinese locale.
2124*912701f9SAndroid Build Coastguard Worker
2125*912701f9SAndroid Build Coastguard Worker<!-- HTML: rowspan -->
2126*912701f9SAndroid Build Coastguard Worker<table><thead>
2127*912701f9SAndroid Build Coastguard Worker<tr><th>User Input</th>                                 <th>Lookup in Locale</th>   <th>For</th>                        <th>Comment</th></tr>
2128*912701f9SAndroid Build Coastguard Worker</thead><tbody>
2129*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="3">zh_Hant<br/><i>no keyword</i></td>  <td>zh_Hant</td>            <td>default collation type</td>     <td>finds "stroke"</td></tr>
2130*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>zh_Hant</td>            <td>collation type=stroke</td>      <td>not found</td></tr>
2131*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>zh</td>                 <td>collation type=stroke</td>      <td><i>found</i></td></tr>
2132*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="3">zh_Hant_HK_u_co_pinyin</td>         <td>zh_Hant_HK</td>         <td>collation type=pinyin</td>      <td>not found</td></tr>
2133*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>zh_Hant</td>            <td>collation type=pinyin</td>      <td>not found</td></tr>
2134*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>zh</td>                 <td>collation type=pinyin</td>      <td><i>found</i></td></tr>
2135*912701f9SAndroid Build Coastguard Worker<tr><td rowspan="2">zh<br/><i>no keyword</i></td>       <td>zh</td>                 <td>default collation type</td>     <td>finds "pinyin"</td></tr>
2136*912701f9SAndroid Build Coastguard Worker<tr>                                                    <td>zh</td>                 <td>collation type=pinyin</td>      <td><i>found</i></td></tr>
2137*912701f9SAndroid Build Coastguard Worker</tbody></table>
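
The lookup sequences in the three tables above can be reproduced with a small Python sketch. It is a model under stated assumptions rather than a reference implementation: the `DATA` dict encodes only the facts given in the examples, parents are found by simple truncation, and the locale and the requested collation type are passed separately instead of being parsed from an identifier such as `de_u_co_foobar`. All names here are hypothetical.

```python
from typing import Optional

# Assumption: only the facts stated in the examples above are modelled.
DATA = {
    "root": {"types": {"standard", "search"}, "default": "standard"},
    "de": {"types": {"A", "B", "C", "search"}},
    "de_CH": {"default": "B"},
    "zh": {"types": {"pinyin", "stroke"}, "default": "pinyin"},
    "zh_Hant": {"default": "stroke"},
}


def locale_chain(locale_id: str) -> list[str]:
    """Most specific locale first, ending at root (truncation is an assumption)."""
    chain = [locale_id]
    while chain[-1] != "root":
        head, sep, _tail = chain[-1].rpartition("_")
        chain.append(head if sep else "root")
    return chain


def find_type(locale: str, ctype: str, trace: list) -> Optional[str]:
    """Return the first locale in the chain that provides the collation type."""
    for loc in locale_chain(locale):
        trace.append((loc, f"type={ctype}"))
        if ctype in DATA.get(loc, {}).get("types", ()):
            return loc
    return None


def find_default(locale: str, trace: list) -> str:
    """Return the first default collation type found in the chain."""
    for loc in locale_chain(locale):
        trace.append((loc, "default"))
        default = DATA.get(loc, {}).get("default")
        if default is not None:
            return default
    raise LookupError("root is expected to define a default")


def lookup_collation(locale: str, requested: Optional[str] = None,
                     trace: Optional[list] = None) -> tuple:
    """Return (locale where found, collation type), recording each search step."""
    trace = [] if trace is None else trace
    if requested is not None:
        found = find_type(locale, requested, trace)
        if found is not None:
            return found, requested
    # No keyword, or the requested type does not exist: fall back to the default.
    default = find_default(locale, trace)
    return find_type(locale, default, trace), default
```

Passing a list as `trace` records the search steps, for example `steps = []; lookup_collation("de", "foobar", steps)` leaves the six lookups of the `de_u_co_foobar` rows in `steps`.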
2138*912701f9SAndroid Build Coastguard Worker
2139*912701f9SAndroid Build Coastguard Worker> **Note:** It is an invariant that the default in root for a given element must
> always be a value that exists in root. So you cannot have the following in root:
2141*912701f9SAndroid Build Coastguard Worker
```xml
2143*912701f9SAndroid Build Coastguard Worker<someElements>
2144*912701f9SAndroid Build Coastguard Worker    <default type='a'/>
2145*912701f9SAndroid Build Coastguard Worker    <someElement type='b'>...</someElement>
2146*912701f9SAndroid Build Coastguard Worker    <someElement type='c'>...</someElement>
2147*912701f9SAndroid Build Coastguard Worker    <!-- no 'a' -->
2148*912701f9SAndroid Build Coastguard Worker</someElements>
2149*912701f9SAndroid Build Coastguard Worker```
2150*912701f9SAndroid Build Coastguard Worker
2151*912701f9SAndroid Build Coastguard WorkerFor identifiers, such as language codes, script codes, region codes, variant codes, types, keywords, currency symbols or currency display names, the default value is the identifier itself whenever no value is found in the root. Thus if there is no display name for the region code 'QA' in root, then the display name is simply 'QA'.
2152*912701f9SAndroid Build Coastguard Worker
2153*912701f9SAndroid Build Coastguard Worker#### <a name="Inheritance_vs_Related" href="#Inheritance_vs_Related">Inheritance vs Related Information</a>
2154*912701f9SAndroid Build Coastguard Worker
2155*912701f9SAndroid Build Coastguard WorkerThere are related types of data and processing that are easy to confuse:
2156*912701f9SAndroid Build Coastguard Worker
2157*912701f9SAndroid Build Coastguard Worker<!-- HTML: rowspan, colspan, col th -->
2158*912701f9SAndroid Build Coastguard Worker<table class="simple"><tbody>
2159*912701f9SAndroid Build Coastguard Worker<tr><th rowspan="4">Inheritance</th>
2160*912701f9SAndroid Build Coastguard Worker        <td colspan="2">Part of the internal mechanism used by CLDR to organize and manage locale data. This is used to share common resources, and ease maintenance, and provide the best fallback behavior in the absence of data. <i>Should not be used for locale matching or likely subtags.</i></td></tr>
2161*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Example:</i></td>
2162*912701f9SAndroid Build Coastguard Worker            <td>parent(en_AU) ⇒ en_001<br/>
2163*912701f9SAndroid Build Coastguard Worker                parent(en_001) ⇒ en<br/>
2164*912701f9SAndroid Build Coastguard Worker                parent(en) ⇒ root</td></tr>
2165*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Data:</i></td>
2166*912701f9SAndroid Build Coastguard Worker            <td>supplementalData.xml &lt;parentLocale&gt;</td></tr>
2167*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Spec:</i></td>
2168*912701f9SAndroid Build Coastguard Worker            <td><b>Section <a href="#Inheritance_and_Validity">4.2 Inheritance and Validity</a></b></td></tr>
2169*912701f9SAndroid Build Coastguard Worker
2170*912701f9SAndroid Build Coastguard Worker<tr><th rowspan="4">DefaultContent</th>
2171*912701f9SAndroid Build Coastguard Worker    <td colspan="2">Part of the internal mechanism used by CLDR to manage locale data. A particular sublocale is designated the defaultContent for a parent, so that the parent exhibits consistent behavior. <i>Should not be used for locale matching or likely subtags.</i></td></tr>
2172*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Example:</i></td>
2173*912701f9SAndroid Build Coastguard Worker            <td>addLikelySubtags(sr-ME) ⇒ sr-Latn-ME, minimize(de-Latn-DE) ⇒ de</td></tr>
2174*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Data:</i></td>
2175*912701f9SAndroid Build Coastguard Worker            <td>supplementalMetadata.xml &lt;defaultContent&gt;</td></tr>
        <tr><td><i>Spec:</i></td>
            <td><b>Part 6: Section 9.3&nbsp;<a href="tr35-info.md#Default_Content">Default Content</a></b></td></tr>
2178*912701f9SAndroid Build Coastguard Worker
2179*912701f9SAndroid Build Coastguard Worker<tr><th rowspan="4">LikelySubtags</th>
2180*912701f9SAndroid Build Coastguard Worker    <td colspan="2">Provides most likely full subtag (script and region) in the absence of other information. A core component of LocaleMatching.</td></tr>
2181*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Example:</i></td>
2182*912701f9SAndroid Build Coastguard Worker            <td>addLikelySubtags(zh) ⇒ zh-Hans-CN<br/>
2183*912701f9SAndroid Build Coastguard Worker				addLikelySubtags(zh-TW) ⇒ zh-Hant-TW<br/>
2184*912701f9SAndroid Build Coastguard Worker				addLikelySubtags(zh-Hant) ⇒ zh-Hant-TW<br/>
2185*912701f9SAndroid Build Coastguard Worker				minimize(zh-Hans-CN, favorRegion|favorScript) ⇒ zh<br/>
2186*912701f9SAndroid Build Coastguard Worker				minimize(zh-Hant-TW, favorRegion) ⇒ zh-TW<br/>
2187*912701f9SAndroid Build Coastguard Worker				minimize(zh-Hant-TW, favorScript) ⇒ zh-Hant
2188*912701f9SAndroid Build Coastguard Worker			</td></tr>
2189*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Data:</i></td>
2190*912701f9SAndroid Build Coastguard Worker            <td>likelySubtags.xml &lt;likelySubtags&gt;</td></tr>
2191*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Spec:</i></td>
2192*912701f9SAndroid Build Coastguard Worker            <td><b>Section <a href="#Likely_Subtags">4.3 Likely Subtags</a></b></td></tr>
2193*912701f9SAndroid Build Coastguard Worker
2194*912701f9SAndroid Build Coastguard Worker<tr><th rowspan="4">LocaleMatching</th>
2195*912701f9SAndroid Build Coastguard Worker    <td colspan="2">Provides the best match for the user’s language(s) among an application’s supported languages.</td></tr>
2196*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Example:</i></td>
2197*912701f9SAndroid Build Coastguard Worker            <td>bestLocale(userLangs=&lt;en, fr&gt;, appLangs=&lt;fr-CA, ru&gt;) ⇒ fr-CA</td></tr>
2198*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Data:</i></td>
2199*912701f9SAndroid Build Coastguard Worker            <td>languageInfo.xml &lt;languageMatching&gt;</td></tr>
2200*912701f9SAndroid Build Coastguard Worker        <tr><td><i>Spec:</i></td>
2201*912701f9SAndroid Build Coastguard Worker            <td><b>Section <a href="#LanguageMatching">4.4 Language Matching</a></b></td></tr>
2202*912701f9SAndroid Build Coastguard Worker
2203*912701f9SAndroid Build Coastguard Worker</tbody></table>
2204*912701f9SAndroid Build Coastguard Worker
2205*912701f9SAndroid Build Coastguard Worker### <a name="Likely_Subtags" href="#Likely_Subtags">Likely Subtags</a>
2206*912701f9SAndroid Build Coastguard Worker
2207*912701f9SAndroid Build Coastguard Worker```xml
2208*912701f9SAndroid Build Coastguard Worker<!ELEMENT likelySubtag EMPTY >
2209*912701f9SAndroid Build Coastguard Worker<!ATTLIST likelySubtag from NMTOKEN #REQUIRED>
2210*912701f9SAndroid Build Coastguard Worker<!ATTLIST likelySubtag to NMTOKEN #REQUIRED>
2211*912701f9SAndroid Build Coastguard Worker```
2212*912701f9SAndroid Build Coastguard Worker
2213*912701f9SAndroid Build Coastguard WorkerThere are a number of situations where it is useful to be able to find the most likely language, script, or region. For example, given the language "zh" and the region "TW", what is the most likely script? Given the script "Thai" what is the most likely language or region? Given the region TW, what is the most likely language and script?
2214*912701f9SAndroid Build Coastguard Worker
2215*912701f9SAndroid Build Coastguard WorkerConversely, given a locale, it is useful to find out which fields (language, script, or region) may be superfluous, in the sense that they contain the likely tags. For example, "en_Latn" can be simplified down to "en" since "Latn" is the likely script for "en"; "ja_Jpan_JP" can be simplified down to "ja".
2216*912701f9SAndroid Build Coastguard Worker
2217*912701f9SAndroid Build Coastguard WorkerThe _likelySubtag_ supplemental data provides default information for computing these values. This data is based on the default content data, the population data, and the suppress-script data in [[BCP47](#BCP47)]. It is heuristically derived, and may change over time.
2218*912701f9SAndroid Build Coastguard Worker
2219*912701f9SAndroid Build Coastguard WorkerFor the relationship between Inheritance, DefaultContent, LikelySubtags, and LocaleMatching, see **_[Inheritance vs Related Information](tr35.md#Inheritance_vs_Related)_**.
2220*912701f9SAndroid Build Coastguard Worker
2221*912701f9SAndroid Build Coastguard WorkerTo look up data in the table, see if a locale matches one of the `from` attribute values. If so, fetch the corresponding `to` attribute value. For example, the Chinese data looks like the following:
2222*912701f9SAndroid Build Coastguard Worker
2223*912701f9SAndroid Build Coastguard Worker```xml
2224*912701f9SAndroid Build Coastguard Worker<likelySubtag from="zh" to="zh_Hans_CN" />
2225*912701f9SAndroid Build Coastguard Worker<likelySubtag from="zh_HK" to="zh_Hant_HK" />
2226*912701f9SAndroid Build Coastguard Worker<likelySubtag from="zh_Hani" to="zh_Hani_CN" />
2227*912701f9SAndroid Build Coastguard Worker<likelySubtag from="zh_Hant" to="zh_Hant_TW" />
2228*912701f9SAndroid Build Coastguard Worker<likelySubtag from="zh_MO" to="zh_Hant_MO" />
2229*912701f9SAndroid Build Coastguard Worker<likelySubtag from="zh_TW" to="zh_Hant_TW" />
2230*912701f9SAndroid Build Coastguard Worker```
2231*912701f9SAndroid Build Coastguard Worker
2232*912701f9SAndroid Build Coastguard WorkerSo looking up "zh_TW" returns "zh_Hant_TW", while looking up "zh" returns "zh_Hans_CN".
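
The table lookup just described amounts to a dictionary access once the `from` and `to` attributes are loaded. The sketch below parses the Chinese entries shown above with Python's standard `xml.etree.ElementTree`. The `<likelySubtags>` wrapper element is added only so the fragment parses on its own, and `load_likely_subtags` and `lookup_likely` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The wrapper element is an assumption made so that the fragment is well-formed.
SNIPPET = """
<likelySubtags>
    <likelySubtag from="zh" to="zh_Hans_CN" />
    <likelySubtag from="zh_HK" to="zh_Hant_HK" />
    <likelySubtag from="zh_Hani" to="zh_Hani_CN" />
    <likelySubtag from="zh_Hant" to="zh_Hant_TW" />
    <likelySubtag from="zh_MO" to="zh_Hant_MO" />
    <likelySubtag from="zh_TW" to="zh_Hant_TW" />
</likelySubtags>
"""


def load_likely_subtags(xml_text: str) -> dict[str, str]:
    """Build a from -> to mapping out of the likelySubtag elements."""
    root = ET.fromstring(xml_text)
    return {el.get("from"): el.get("to") for el in root.iter("likelySubtag")}


LIKELY = load_likely_subtags(SNIPPET)


def lookup_likely(locale: str) -> Optional[str]:
    """Return the `to` value for an exact `from` match, or None if absent."""
    return LIKELY.get(locale)
```

This shows only the exact-match table access. The full Add Likely Subtags operation described below also canonicalizes its input and tries several fallback keys.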
2233*912701f9SAndroid Build Coastguard Worker
2234*912701f9SAndroid Build Coastguard WorkerIn more detail, the data is designed to be used in the following operations.
2235*912701f9SAndroid Build Coastguard WorkerLike other CLDR operations, these operations can also be used with language tags having [[BCP47](#BCP47)] syntax, with the appropriate changes to the data.
2236*912701f9SAndroid Build Coastguard Worker
2237*912701f9SAndroid Build Coastguard WorkerAn implementation may choose to exclude language tags with the language subtag "und" from the following operation. In such a case, only the canonicalization is done. An implementation can declare that it is doing the exclusion, or can take a parameter that controls whether or not to do it.
2238*912701f9SAndroid Build Coastguard Worker
2239*912701f9SAndroid Build Coastguard Worker_**Add Likely Subtags:**_ _Given a source locale X, to return a locale Y where the empty subtags have been filled in by the most likely subtags._ This is written as X ⇒ Y ("X maximizes to Y").
2240*912701f9SAndroid Build Coastguard Worker
A subtag is called _empty_ if it is a missing script or region subtag, or it is a base language subtag with the value "und". In the description below, a subscript on a subtag _x_ indicates which tag it is from: _x<sub>s</sub>_ is in the source, _x<sub>m</sub>_ is in a match, and _x<sub>r</sub>_ is in the final result.
2242*912701f9SAndroid Build Coastguard Worker
2243*912701f9SAndroid Build Coastguard WorkerThis operation is performed in the following way.
2244*912701f9SAndroid Build Coastguard Worker
2245*912701f9SAndroid Build Coastguard Worker1. **Canonicalize.**
2246*912701f9SAndroid Build Coastguard Worker   1. Make sure the input locale is in canonical form: uses the right separator, and has the right casing.
2247*912701f9SAndroid Build Coastguard Worker   2. Replace any deprecated subtags with their canonical values using the `<alias>` data in supplemental metadata. Use the first value in the replacement list, if it exists.
2248*912701f9SAndroid Build Coastguard Worker      Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is
2249*912701f9SAndroid Build Coastguard Worker      one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
2250*912701f9SAndroid Build Coastguard Worker      * There are certain exceptions to this: some implementations still use three obsolete language subtags: iw, in, and yi.
2251*912701f9SAndroid Build Coastguard Worker        The likely subtags data currently supports those implementations by providing elements that handle them,
        with the deprecated code on both sides: `<likelySubtag from="iw" to="iw_Hebr_IL"/>`
2253*912701f9SAndroid Build Coastguard Worker        Such implementations may refrain from replacing those deprecated tags.
2254*912701f9SAndroid Build Coastguard Worker   3. If the tag is a legacy language tag (marked as “Type: grandfathered” in BCP 47; see `<variable id="$grandfathered" type="choice">` in the supplemental data), then return it.
2255*912701f9SAndroid Build Coastguard Worker   4. Remove the script code 'Zzzz' and the region code 'ZZ' if they occur.
2256*912701f9SAndroid Build Coastguard Worker   5. Get the components of the cleaned-up source tag _(language<sub>s</sub>, script<sub>s</sub>,_ and _region<sub>s</sub>_), plus any variants and extensions.
2257*912701f9SAndroid Build Coastguard Worker   6. If the language is not 'und' and the other two components are not empty, return the language tag composed of _language<sub>s</sub>\_script<sub>s</sub>\_region<sub>s</sub>_ + variants + extensions.
2258*912701f9SAndroid Build Coastguard Worker2. **Lookup.** Look up each of the following in order, and stop on the first match:
2259*912701f9SAndroid Build Coastguard Worker   1. _language<sub>s</sub>\_script<sub>s</sub>\_region<sub>s</sub>_
2260*912701f9SAndroid Build Coastguard Worker   2. _language<sub>s</sub>\_script<sub>s</sub>_
2261*912701f9SAndroid Build Coastguard Worker   3. _language<sub>s</sub>\_region<sub>s</sub>_
2262*912701f9SAndroid Build Coastguard Worker   4. _language<sub>s</sub>_
2263*912701f9SAndroid Build Coastguard Worker3. **Return**
2264*912701f9SAndroid Build Coastguard Worker   1. If there is no match, signal an error and stop.
2265*912701f9SAndroid Build Coastguard Worker   2. Otherwise there is a match = _language<sub>m</sub>\_script<sub>m</sub>\_region<sub>m</sub>_
2266*912701f9SAndroid Build Coastguard Worker   3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor 'und', and x<sub>m</sub> otherwise.
2267*912701f9SAndroid Build Coastguard Worker   4. Return the language tag composed of _language<sub>r</sub>\_script<sub>r</sub>\_region<sub>r</sub>_ + variants + extensions.
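
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the `LIKELY` table is a small hypothetical excerpt standing in for the real likely subtags data, the names `parse` and `add_likely_subtags` are invented for this sketch, alias replacement (step 1.2) and legacy tags (step 1.3) are skipped, and variants and extensions are not handled.

```python
# Hypothetical excerpt of likely-subtags data (keys are 'from', values are 'to').
LIKELY = {
    "zh": "zh_Hans_CN",
    "zh_TW": "zh_Hant_TW",
    "zh_Hant": "zh_Hant_TW",
    "und_TW": "zh_Hant_TW",
    "und_AF": "fa_Arab_AF",
    "fa": "fa_Arab_IR",
    "und": "en_Latn_US",
}


def parse(tag):
    """Split a tag into (language, script, region); empty subtags become ''."""
    parts = tag.replace("-", "_").split("_")          # step 1.1: separator
    lang, script, region = parts[0].lower(), "", ""   # step 1.1: casing
    for p in parts[1:]:
        if len(p) == 4 and p.isalpha():
            script = p.title()
        elif (len(p) == 2 and p.isalpha()) or (len(p) == 3 and p.isdigit()):
            region = p.upper()
    # Step 1.4: 'Zzzz' and 'ZZ' are treated as empty.
    if script == "Zzzz":
        script = ""
    if region == "ZZ":
        region = ""
    return lang, script, region


def add_likely_subtags(tag, table=LIKELY):
    lang, script, region = parse(tag)
    # Step 1.6: nothing is empty, so there is nothing to add.
    if lang != "und" and script and region:
        return "_".join((lang, script, region))
    # Step 2: try the four keys in order and stop on the first match.
    candidates = [
        (lang, script, region),
        (lang, script, ""),
        (lang, "", region),
        (lang, "", ""),
    ]
    for cand in candidates:
        key = "_".join(s for s in cand if s)
        if key in table:
            m_lang, m_script, m_region = parse(table[key])
            break
    else:
        # Step 3.1: here the error is signalled by raising an exception.
        raise LookupError(f"no likely subtags for {tag!r}")
    # Step 3.3: keep non-empty source subtags, fill the rest from the match.
    return "_".join((
        lang if lang != "und" else m_lang,
        script or m_script,
        region or m_region,
    ))
```

With this toy table, `add_likely_subtags("ZH-ZZZZ-SG")` gives `zh_Hans_SG` and `add_likely_subtags("und_TW")` gives `zh_Hant_TW`. Some candidate keys repeat when subtags are empty; a real implementation could skip the duplicates.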
2268*912701f9SAndroid Build Coastguard Worker
2269*912701f9SAndroid Build Coastguard WorkerSignalling an error can be done in various ways, depending on the most consistent approach for APIs in the module. For example:
2270*912701f9SAndroid Build Coastguard Worker   1. raise an exception
2271*912701f9SAndroid Build Coastguard Worker   2. return an error value (such as null)
2272*912701f9SAndroid Build Coastguard Worker   3. return the input (with missing fields)
   4. return the input, but with "Zzzz" and/or "ZZ" substituted for empty fields
   5. return "und"
2275*912701f9SAndroid Build Coastguard Worker
One by-product of this algorithm is that an element such as `<likelySubtag from="fr_IR" to="en_Arab"/>` would be misleading: the 'fr' can never be replaced by 'en'.
2277*912701f9SAndroid Build Coastguard WorkerThe only subtags that can be replaced are deprecated ones, empty, und, Zzzz, and ZZ.
2278*912701f9SAndroid Build Coastguard Worker
2279*912701f9SAndroid Build Coastguard WorkerThe lookup can be optimized. For example, if any of the tags in Step 2 are the same as previous ones in that list, they do not need to be tested.
2280*912701f9SAndroid Build Coastguard Worker
_Example:_
2282*912701f9SAndroid Build Coastguard Worker
2283*912701f9SAndroid Build Coastguard Worker* Input is ZH-ZZZZ-SG.
2284*912701f9SAndroid Build Coastguard Worker* Normalize to zh_SG.
* Look up zh_SG in the table. No match.
2286*912701f9SAndroid Build Coastguard Worker* Look up zh, and get the match (zh_Hans_CN). Substitute SG, and return zh_Hans_SG.
2287*912701f9SAndroid Build Coastguard Worker
2288*912701f9SAndroid Build Coastguard WorkerTo find the most likely language for a country, or language for a script, use "und" as the language subtag. For example, looking up "und_TW" returns zh_Hant_TW.
2289*912701f9SAndroid Build Coastguard Worker
A general goal of the algorithm is that any non-empty subtag present in the 'from' field is also present in the 'to' field, so a non-empty input subtag will not change in the "Add Likely Subtags" operation.
2291*912701f9SAndroid Build Coastguard WorkerThat is, when X ⇒ Y, and X' results from replacing an empty subtag in X by the corresponding subtag in Y, then X' ⇒ Y.
2292*912701f9SAndroid Build Coastguard WorkerFor example, if und_AF ⇒ fa_Arab_AF, then:
2293*912701f9SAndroid Build Coastguard Worker
2294*912701f9SAndroid Build Coastguard Worker* fa_Arab_AF ⇒ fa_Arab_AF
2295*912701f9SAndroid Build Coastguard Worker* und_Arab_AF ⇒ fa_Arab_AF
2296*912701f9SAndroid Build Coastguard Worker* fa_AF ⇒ fa_Arab_AF
2297*912701f9SAndroid Build Coastguard Worker
2298*912701f9SAndroid Build Coastguard WorkerThere are a few exceptions to this goal:
2299*912701f9SAndroid Build Coastguard Worker* A 'denormalized' subtag changes to the normalized form, except for certain denormalized language subtags such as 'iw' (for 'he' = Hebrew) which may occur in both the 'from' and 'to' fields of the data.
2300*912701f9SAndroid Build Coastguard WorkerThis allows for implementations that use those denormalized subtags to use the data with only minor changes to the operations.
2301*912701f9SAndroid Build Coastguard Worker* A macroregion (such as West Africa = 011) _may_ change to a specific country (Nigeria = NG).
2302*912701f9SAndroid Build Coastguard Worker
2303*912701f9SAndroid Build Coastguard Worker**_Remove_** _**Likely Subtags:** Given a locale, remove any fields that Add Likely Subtags would add._
2304*912701f9SAndroid Build Coastguard Worker
2305*912701f9SAndroid Build Coastguard WorkerThe reverse operation removes fields that could be added by the first operation.
2306*912701f9SAndroid Build Coastguard Worker
2307*912701f9SAndroid Build Coastguard Worker1. First get max = AddLikelySubtags(inputLocale).
2308*912701f9SAndroid Build Coastguard Worker2. If an error is signaled in AddLikelySubtags, signal that same error and stop.
2309*912701f9SAndroid Build Coastguard Worker3. Remove the variants and extensions from max.
4. Get the components of the max (_language<sub>max</sub>_, _script<sub>max</sub>_, _region<sub>max</sub>_).
5. Then for _trial_ in {_language<sub>max</sub>_, _language<sub>max</sub>\_region<sub>max</sub>_, _language<sub>max</sub>\_script<sub>max</sub>_}
2312*912701f9SAndroid Build Coastguard Worker   * If AddLikelySubtags(_trial_) = max, then return _trial_ + variants + extensions.
2313*912701f9SAndroid Build Coastguard Worker6. If there is no match, return max + variants + extensions.
2314*912701f9SAndroid Build Coastguard Worker
2315*912701f9SAndroid Build Coastguard WorkerExample:
2316*912701f9SAndroid Build Coastguard Worker
2317*912701f9SAndroid Build Coastguard Worker* Input is zh_Hant or zh_TW.
2318*912701f9SAndroid Build Coastguard Worker* Maximize to get zh_Hant_TW.
2319*912701f9SAndroid Build Coastguard Worker* zh => zh_Hans_CN. No match, so continue.
2320*912701f9SAndroid Build Coastguard Worker* zh_TW => zh_Hant_TW. Matches, so return **zh_TW**.
2321*912701f9SAndroid Build Coastguard Worker
2322*912701f9SAndroid Build Coastguard Worker**_Remove_** _**Likely Subtags, favoring script:** Given a locale, remove any fields that Add Likely Subtags would add, but favor script over region._
2323*912701f9SAndroid Build Coastguard Worker
A variant of this favors the script over the region, thus using {language, language_script, language_region} in step 5 above.
This variant is much less commonly used, only when the script relationship is more significant to users.
2326*912701f9SAndroid Build Coastguard WorkerHere is the difference:
2327*912701f9SAndroid Build Coastguard Worker
2328*912701f9SAndroid Build Coastguard WorkerExample:
2329*912701f9SAndroid Build Coastguard Worker
2330*912701f9SAndroid Build Coastguard Worker* Input is zh_Hant or zh_TW.
2331*912701f9SAndroid Build Coastguard Worker* Maximize to get zh_Hant_TW.
2332*912701f9SAndroid Build Coastguard Worker* zh => zh_Hans_CN. No match, so continue.
2333*912701f9SAndroid Build Coastguard Worker* zh_Hant => zh_Hant_TW. Matches, so return **zh_Hant**.
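
Both forms of Remove Likely Subtags can be sketched as a single Python function. This is an illustration under stated assumptions: `remove_likely_subtags`, `demo_maximize` and the `favor_script` flag are names invented here, the `maximize` argument stands in for any Add Likely Subtags implementation, and the small table is hypothetical. Variants and extensions are left out for brevity.

```python
def remove_likely_subtags(tag, maximize, favor_script=False):
    """Return the shortest trial tag that maximizes to the same result as `tag`."""
    max_tag = maximize(tag)                      # steps 1-2: errors propagate
    lang, script, region = max_tag.split("_")    # step 4
    second, third = f"{lang}_{region}", f"{lang}_{script}"
    if favor_script:
        # Script-favoring variant: try language_script before language_region.
        second, third = third, second
    for trial in (lang, second, third):          # step 5
        if maximize(trial) == max_tag:
            return trial
    return max_tag                               # step 6


# Hypothetical stand-in for Add Likely Subtags, limited to the tags used below.
_DEMO_TABLE = {
    "zh": "zh_Hans_CN",
    "zh_TW": "zh_Hant_TW",
    "zh_Hant": "zh_Hant_TW",
    "zh_Hant_TW": "zh_Hant_TW",
    "zh_Hans_CN": "zh_Hans_CN",
}


def demo_maximize(tag):
    return _DEMO_TABLE[tag]
```

With this stand-in, `remove_likely_subtags("zh_Hant", demo_maximize)` returns `zh_TW`, while passing `favor_script=True` returns `zh_Hant`, mirroring the two examples above.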
2334*912701f9SAndroid Build Coastguard Worker
2335*912701f9SAndroid Build Coastguard Worker### <a name="LanguageMatching" href="#LanguageMatching">Language Matching</a>
2336*912701f9SAndroid Build Coastguard Worker
2337*912701f9SAndroid Build Coastguard Worker```xml
2338*912701f9SAndroid Build Coastguard Worker<!ELEMENT languageMatching ( languageMatches* ) >
2339*912701f9SAndroid Build Coastguard Worker<!ELEMENT languageMatches ( paradigmLocales*, matchVariable*, languageMatch* ) >
2340*912701f9SAndroid Build Coastguard Worker<!ATTLIST languageMatches type NMTOKEN #REQUIRED >
2341*912701f9SAndroid Build Coastguard Worker
2342*912701f9SAndroid Build Coastguard Worker<!ELEMENT languageMatch EMPTY >
2343*912701f9SAndroid Build Coastguard Worker<!ATTLIST languageMatch desired CDATA #REQUIRED >
2344*912701f9SAndroid Build Coastguard Worker<!ATTLIST languageMatch supported CDATA #REQUIRED >
2345*912701f9SAndroid Build Coastguard Worker<!ATTLIST languageMatch percent NMTOKEN #REQUIRED >
2346*912701f9SAndroid Build Coastguard Worker<!ATTLIST languageMatch distance NMTOKEN #IMPLIED >
2347*912701f9SAndroid Build Coastguard Worker<!ATTLIST languageMatch oneway ( true | false ) #IMPLIED >
2348*912701f9SAndroid Build Coastguard Worker
2352*912701f9SAndroid Build Coastguard Worker<!ELEMENT paradigmLocales EMPTY >
2353*912701f9SAndroid Build Coastguard Worker<!ATTLIST paradigmLocales locales NMTOKENS #REQUIRED >
2354*912701f9SAndroid Build Coastguard Worker```
2355*912701f9SAndroid Build Coastguard Worker
2356*912701f9SAndroid Build Coastguard WorkerImplementers are often faced with the issue of how to match the user's requested languages with their product's supported languages. For example, suppose that a product supports \{ja-JP, de, zh-TW}. If the user understands written American English, German, French, Swiss German, and Italian, then **de** would be the best match; if s/he understands only Chinese (zh), then zh-TW would be the best match.
2357*912701f9SAndroid Build Coastguard Worker
2358*912701f9SAndroid Build Coastguard WorkerThe standard truncation-fallback algorithm does not work well when faced with the complexities of natural language. The language matching data is designed to fill that gap. Stated in those terms, language matching can have the effect of a more complex fallback, such as:
2359*912701f9SAndroid Build Coastguard Worker
2360*912701f9SAndroid Build Coastguard Worker```
2361*912701f9SAndroid Build Coastguard Workersr-Cyrl-RS
2362*912701f9SAndroid Build Coastguard Workersr-Cyrl
2363*912701f9SAndroid Build Coastguard Workersr-Latn-RS
2364*912701f9SAndroid Build Coastguard Workersr-Latn
2365*912701f9SAndroid Build Coastguard Workersr
2366*912701f9SAndroid Build Coastguard Workerhr-Latn
2367*912701f9SAndroid Build Coastguard Workerhr
2368*912701f9SAndroid Build Coastguard Worker```
2369*912701f9SAndroid Build Coastguard Worker
2370*912701f9SAndroid Build Coastguard WorkerLanguage matching is used to find the best supported locale ID given a requested list of languages. The requested list could come from different sources, such as the user's list of preferred languages in the OS Settings, or from a browser Accept-Language list. For example, if my native tongue is English, I can understand Swiss German and German, my French is rusty but usable, and Italian basic, ideally an implementation would allow me to select {gsw, de, fr} as my preferred list of languages, skipping Italian because my comprehension is not good enough for arbitrary content.
2371*912701f9SAndroid Build Coastguard Worker
Language Matching can also be used to get fallback data elements. In many cases, there may not be full data for a particular locale. For example, for a Breton speaker, the best fallback if data is unavailable might be French. That is, suppose we have found a Breton bundle, but it does not contain a translation for the key "CN" (for the country China). It is best to return "chine", rather than falling back to the value in a default language such as Russian and getting "Китай". The language matching data can be used to get the closest fallback locales (of those supported) to a given language.
2373*912701f9SAndroid Build Coastguard Worker
2374*912701f9SAndroid Build Coastguard WorkerFor the relationship between Inheritance, DefaultContent, LikelySubtags, and LocaleMatching, see **_[Inheritance vs Related Information](tr35.md#Inheritance_vs_Related)_**.
2375*912701f9SAndroid Build Coastguard Worker
When such fallback is used for inherited item lookup, the normal order of inheritance applies, except that before using any data from **root**, the data for the fallback locales would be used if available. Language matching does not interact with the fallback of resources _within the locale-parent chain_. For example, suppose that we are looking for the value for a particular path **P** in **nb-NO**. In the absence of aliases, normally the following lookup is used.
2377*912701f9SAndroid Build Coastguard Worker
2378*912701f9SAndroid Build Coastguard Worker> **nb-NO** → **nb** → **root**
2379*912701f9SAndroid Build Coastguard Worker
2380*912701f9SAndroid Build Coastguard WorkerThat is, we first look in **nb-NO**. If there is no value for **P** there, then we look in **nb**. If there is no value for **P** there, we return the value for **P** in root (or a code value, if there is nothing there). Remember that if there is an `alias` element along this path, then the lookup may restart with a different path in **nb-NO** (or another locale).
2381*912701f9SAndroid Build Coastguard Worker
2382*912701f9SAndroid Build Coastguard WorkerHowever, suppose that **nb-NO** has the fallback values **[nn da sv en]**, derived from language matching. In that case, an implementation _may_ progressively look up each of the listed locales, with the appropriate substitutions, returning the first value that is not found in **root**. This follows roughly the following pseudocode:
2383*912701f9SAndroid Build Coastguard Worker
2384*912701f9SAndroid Build Coastguard Worker```c
2385*912701f9SAndroid Build Coastguard Workervalue = lookup(P, nb-NO); if (locationFound != root) return value;
2386*912701f9SAndroid Build Coastguard Workervalue = lookup(P, nn-NO); if (locationFound != root) return value;
2387*912701f9SAndroid Build Coastguard Workervalue = lookup(P, da-NO); if (locationFound != root) return value;
2388*912701f9SAndroid Build Coastguard Workervalue = lookup(P, sv-NO); if (locationFound != root) return value;
2389*912701f9SAndroid Build Coastguard Workervalue = lookup(P, en-NO); return value;
2390*912701f9SAndroid Build Coastguard Worker```
2391*912701f9SAndroid Build Coastguard Worker
2392*912701f9SAndroid Build Coastguard WorkerThe locales in the fallback list are not used recursively. For example, for the lookup of a path in nb-NO, if **fr** were a fallback value for **da**, it would not matter for the above process. Only the original language matters.
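
The pseudocode above can be turned into a small runnable Python sketch. Everything named here is an assumption made for illustration: `lookup_with_fallback`, the `lookup(path, locale)` callback that returns a `(value, location_found)` pair, and the tiny `DATA` store with its sample value. Aliases are ignored.

```python
def lookup_with_fallback(path, locale, fallback_languages, lookup):
    """Try the locale, then each fallback language with the same region.

    The fallback list is used as given and is not expanded recursively.
    """
    _, _, rest = locale.partition("-")
    candidates = [locale] + [
        f"{fb}-{rest}" if rest else fb for fb in fallback_languages
    ]
    value = None
    for candidate in candidates:
        value, found_in = lookup(path, candidate)
        if found_in != "root":
            return value
    # Nothing better than root was found: return the last value looked up.
    return value


# Hypothetical data store: locale -> {path: value}.
DATA = {
    "root": {},
    "nb-NO": {},
    "nb": {},
    "nn": {},
    "da": {"CN": "Kina"},
}


def demo_lookup(path, locale):
    """Walk the normal parent chain, e.g. nb-NO -> nb -> root."""
    chain = [locale]
    while "-" in chain[-1]:
        chain.append(chain[-1].rsplit("-", 1)[0])
    chain.append("root")
    for loc in chain:
        if path in DATA.get(loc, {}):
            return DATA[loc][path], loc
    return path, "root"   # fall back to the code value
```

Here `lookup_with_fallback("CN", "nb-NO", ["nn", "da", "sv", "en"], demo_lookup)` returns the value found in **da**, because neither the **nb-NO** chain nor the **nn-NO** chain has a value outside root.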
2393*912701f9SAndroid Build Coastguard Worker
2394*912701f9SAndroid Build Coastguard WorkerThe language matching data is intended to be used according to the following algorithm. This is a logical description, and can be optimized for production in many ways. In this algorithm, the languageMatching data is interpreted as an ordered list.
2395*912701f9SAndroid Build Coastguard Worker
Distances between a given pair of subtags can be larger or smaller than the typical distances. For example, the distance between en and en-GB can be greater than that between en-GB and en-IE. In some cases, language and/or script differences can be as small as the typical region difference (for example, sr-Latn vs. sr-Cyrl).
2397*912701f9SAndroid Build Coastguard Worker
2398*912701f9SAndroid Build Coastguard WorkerThe distances resulting from the table are not linear, but are rather chosen to produce expected results. So a distance of 10 is not necessarily twice as "bad" as a distance of 5. Implementations may want to have a mode where script distances should swamp language distances. The tables are built such that this can be accomplished by multiplying the language distance by 0.25.
2399*912701f9SAndroid Build Coastguard Worker
2400*912701f9SAndroid Build Coastguard WorkerThe language matching algorithm takes a list of a user’s desired languages, and a list of the application’s supported languages.
2401*912701f9SAndroid Build Coastguard Worker
2402*912701f9SAndroid Build Coastguard Worker* Set the best weighted distance BWD to ∞
2403*912701f9SAndroid Build Coastguard Worker* Set the best desired language BD to null
2404*912701f9SAndroid Build Coastguard Worker* Set the best supported language BS to null
2405*912701f9SAndroid Build Coastguard Worker* For each desired language D
2406*912701f9SAndroid Build Coastguard Worker  * Compute a demotion value F, based on the position in the list.
2407*912701f9SAndroid Build Coastguard Worker    * This demotion value is up to the implementation, but is typically a positive value that increases according to how far D is from the start of the desired language list.
2408*912701f9SAndroid Build Coastguard Worker  * For each supported language S
2409*912701f9SAndroid Build Coastguard Worker    * Find the matching distance MD as described below.
    * Compute the weighted distance WD as F + MD
    * If WD < BWD
2412*912701f9SAndroid Build Coastguard Worker      * BWD = WD
2413*912701f9SAndroid Build Coastguard Worker      * BD = D
2414*912701f9SAndroid Build Coastguard Worker      * BS = S
2415*912701f9SAndroid Build Coastguard Worker* If the BWD is less than a threshold, return <BD, BS>
2416*912701f9SAndroid Build Coastguard Worker  * The threshold is implementation-defined, typically set to greater than a default region difference, and less than a default script difference.
2417*912701f9SAndroid Build Coastguard Worker* Otherwise BD = the default supported language (like English); return <BD, null>
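
The loop above might be sketched in Python as follows. The names `best_match` and `toy_distance`, the linear demotion `i * demotion_step`, and all numeric values are assumptions chosen for illustration; the matching distance itself is passed in as a function so that any implementation can be plugged in.

```python
import math


def best_match(desired, supported, distance, threshold, demotion_step, default):
    """Return (best desired, best supported), or (default, None) if nothing is close enough."""
    best_wd, best_d, best_s = math.inf, None, None
    for i, d in enumerate(desired):
        f = i * demotion_step              # demotion value F grows with list position
        for s in supported:
            wd = f + distance(d, s)        # weighted distance WD = F + MD
            if wd < best_wd:
                best_wd, best_d, best_s = wd, d, s
    if best_wd < threshold:
        return best_d, best_s
    return default, None


def toy_distance(d, s):
    """Toy stand-in for MD: 0 if equal, 4 if only the base language matches, else 99."""
    if d == s:
        return 0
    if d.split("-")[0] == s.split("-")[0]:
        return 4
    return 99
```

With these toy numbers, `best_match(["de-AT", "fr"], ["de", "fr", "ja"], toy_distance, threshold=40, demotion_step=5, default="en")` returns `("de-AT", "de")`. Lowering `demotion_step` to 3 makes the second-choice pair `("fr", "fr")` win instead, which shows why the demotion step matters.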
2418*912701f9SAndroid Build Coastguard Worker
2419*912701f9SAndroid Build Coastguard WorkerTo find the matching distance MD between any two languages, perform the following steps.
2420*912701f9SAndroid Build Coastguard Worker
2421*912701f9SAndroid Build Coastguard Worker1. Maximize each language using [Likely Subtags](#Likely_Subtags).
2422*912701f9SAndroid Build Coastguard Worker   * und is a special case: see below.
2423*912701f9SAndroid Build Coastguard Worker2. Set the match-distance MD to 0
2424*912701f9SAndroid Build Coastguard Worker3. For each subtag in {language, script, region}
2425*912701f9SAndroid Build Coastguard Worker   1. If respective subtags in each language tag are identical, remove the subtag from each (logically) and continue.
2426*912701f9SAndroid Build Coastguard Worker   2. Traverse the languageMatching data until a match is found.
2427*912701f9SAndroid Build Coastguard Worker      * \* matches any field.
2428*912701f9SAndroid Build Coastguard Worker      * If the oneway flag is false, then the match is symmetric; otherwise only match one direction.
2429*912701f9SAndroid Build Coastguard Worker      * For region matching, use the mechanisms in **[Enhanced Language Matching](#EnhancedLanguageMatching)**.
2430*912701f9SAndroid Build Coastguard Worker   3. Add the `distance` attribute value to MD.
2431*912701f9SAndroid Build Coastguard Worker      * This used to be a `percent` attribute value, which was 100 - the `distance` attribute value.
2432*912701f9SAndroid Build Coastguard Worker   4. Remove the subtag from each (logically)
2433*912701f9SAndroid Build Coastguard Worker4. Return MD
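
A minimal Python sketch of these steps follows. It assumes the inputs are already maximized `(language, script, region)` tuples, works from the region level down to the language level, and uses a small hypothetical `RULES` list in distance form. The three wildcard defaults are the percent defaults from the sample data converted with distance = 100 - percent; the nn/nb rule is assumed for illustration. The enhanced region matching (match variables and paradigm locales) is not modelled.

```python
# (desired, supported, distance, oneway): hypothetical ordered rule list.
RULES = [
    ("nn", "nb", 4, False),
    ("*", "*", 99, False),          # default language difference
    ("*-*", "*-*", 80, False),      # default script difference
    ("*-*-*", "*-*-*", 4, False),   # default region difference
]


def _fields_match(pattern, fields):
    """True if every pattern field is '*' or equals the corresponding subtag."""
    return all(p == "*" or p == f for p, f in zip(pattern.split("-"), fields))


def matching_distance(desired, supported, rules=RULES):
    d, s = list(desired), list(supported)
    md = 0                                         # step 2
    while d:                                       # step 3: region, script, language
        if d[-1] != s[-1]:                         # step 3.1: identical subtags add nothing
            for want, have, dist, oneway in rules:  # step 3.2: first match wins
                if want.count("-") + 1 != len(d):
                    continue                       # rule is for a different level
                forward = _fields_match(want, d) and _fields_match(have, s)
                backward = (not oneway
                            and _fields_match(want, s)
                            and _fields_match(have, d))
                if forward or backward:
                    md += dist                     # step 3.3
                    break
        d.pop()                                    # step 3.4: drop the subtag from each
        s.pop()
    return md                                      # step 4
```

For `("nn", "Latn", "DE")` against `("nb", "Latn", "FR")` this yields 4 for the region plus 4 for the language, a distance of 8, which corresponds to 92% in the older percent form.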
2434*912701f9SAndroid Build Coastguard Worker
It is typically useful to set the demotion value F so that the step between successive elements of the desired languages list is slightly greater than the default region difference. That avoids the following problem:
2436*912701f9SAndroid Build Coastguard Worker
2437*912701f9SAndroid Build Coastguard Worker_Supported languages:_ "de, fr, ja"
2438*912701f9SAndroid Build Coastguard Worker
2439*912701f9SAndroid Build Coastguard Worker_User's desired languages:_ "de-AT, fr"
2440*912701f9SAndroid Build Coastguard Worker
This user would expect to get "de", not "fr". In practice, when a user selects a list of preferred languages, they don't include all the regional variants ahead of their second base language. Yet while the user's list of desired languages doesn't really tell us the priority ranking among their languages, normally the fall-off between the user's languages is substantially greater than that between regional variants. But unless F is greater than the distance between de-AT and de-DE, the user’s second-choice language would be returned.
2442*912701f9SAndroid Build Coastguard Worker
2443*912701f9SAndroid Build Coastguard WorkerThe base language subtag "und" is a special case. Suppose we have the following situation:
2444*912701f9SAndroid Build Coastguard Worker
2445*912701f9SAndroid Build Coastguard Worker* desired languages: \{und, it}
2446*912701f9SAndroid Build Coastguard Worker* supported languages: \{en, it}
2447*912701f9SAndroid Build Coastguard Worker* resulting language: en
2448*912701f9SAndroid Build Coastguard Worker
The result is en because applying likely subtags to und yields en-Latn-US, which then matches the supported en exactly, even though the user supplied no base language at all. Part of this is because 'und' has a special function in BCP 47; it stands in for 'no supplied base language'. To prevent this from happening, if the desired base language is und, the language matcher should not apply likely subtags to it.
2450*912701f9SAndroid Build Coastguard Worker
2451*912701f9SAndroid Build Coastguard WorkerExamples:
2452*912701f9SAndroid Build Coastguard Worker
2453*912701f9SAndroid Build Coastguard WorkerFor example, suppose that nn-DE and nb-FR are being compared. They are first maximized to nn-Latn-DE and nb-Latn-FR, respectively. The list is searched. The first match is with "\*-\*-\*", for a match of 96%. The languages are truncated to nn-Latn and nb-Latn, then to nn and nb. The first match is also for a value of 96%, so the result is 92%.
2454*912701f9SAndroid Build Coastguard Worker
Note that language matching is orthogonal to how closely two languages are related linguistically. For example, Breton is more closely related to Welsh than to French, but French is the better match (because it is more likely that a Breton reader will understand French than Welsh). This also illustrates that the matches are often asymmetric: it is not likely that a French reader will understand Breton.
2456*912701f9SAndroid Build Coastguard Worker
2457*912701f9SAndroid Build Coastguard WorkerThe "\*" acts as a wild card, as shown in the following example:
2458*912701f9SAndroid Build Coastguard Worker
2459*912701f9SAndroid Build Coastguard Worker```xml
2460*912701f9SAndroid Build Coastguard Worker<languageMatch desired="es-*-ES" supported="es-*-ES" percent="100" />
2461*912701f9SAndroid Build Coastguard Worker<!-- Latin American Spanishes are closer to each other. Approximate by having es-ES be further from everything else. -->
2462*912701f9SAndroid Build Coastguard Worker
2463*912701f9SAndroid Build Coastguard Worker<languageMatch desired="es-*-ES" supported="es-*-*" percent="93" />
2464*912701f9SAndroid Build Coastguard Worker
2465*912701f9SAndroid Build Coastguard Worker<languageMatch desired="*" supported="*" percent="1" />
2466*912701f9SAndroid Build Coastguard Worker<!-- [Default value - must be at end!] Normally there is no comprehension of different languages. -->
2467*912701f9SAndroid Build Coastguard Worker
2468*912701f9SAndroid Build Coastguard Worker<languageMatch desired="*-*" supported="*-*" percent="20" />
2469*912701f9SAndroid Build Coastguard Worker<!-- [Default value - must be at end!] Normally there is little comprehension of different scripts. -->
2470*912701f9SAndroid Build Coastguard Worker
2471*912701f9SAndroid Build Coastguard Worker<languageMatch desired="*-*-*" supported="*-*-*" percent="96" />
2472*912701f9SAndroid Build Coastguard Worker<!-- [Default value - must be at end!] Normally there are small differences across regions. -->
2473*912701f9SAndroid Build Coastguard Worker```
2474*912701f9SAndroid Build Coastguard Worker
When the language+region is not matched, and there is otherwise no reason to pick among the supported regions for that language, then some measure of geographic "closeness" can be used. The results may be more understandable by users. Looking for en-SK, for example, should fall back to something within Europe (e.g., en-GB) in preference to something far away and unrelated (e.g., en-SG). Such a closeness metric does not need to be exact; a small amount of data can be used to give an approximate distance between any two regions. However, any such data must be used carefully; although Hong Kong is closer to India than to the UK, it is unlikely that en-IN would be a better match to en-HK than en-GB would.
2476*912701f9SAndroid Build Coastguard Worker
2477*912701f9SAndroid Build Coastguard Worker#### <a name="EnhancedLanguageMatching" href="#EnhancedLanguageMatching">Enhanced Language Matching</a>
2478*912701f9SAndroid Build Coastguard Worker
The enhanced format for language matching adds structure to enable better matching of languages. It is distinguished by having a suffix "\_new" on the type, as in the example below. The extended structure allows matching to take into account broad similarities that would give better results. For example, for English the regions that are or inherit from US (AS|GU|MH|MP|PR|UM|VI|US) form a “cluster”. The regions in that cluster should be closer to each other than to any other region. And a region outside the cluster should be closer to another region outside that cluster than to one inside. We get this issue with the “world languages” like English, Spanish, Portuguese, Arabic, etc.
2480*912701f9SAndroid Build Coastguard Worker
2481*912701f9SAndroid Build Coastguard Worker_Example:_
2482*912701f9SAndroid Build Coastguard Worker
2483*912701f9SAndroid Build Coastguard Worker```xml
2484*912701f9SAndroid Build Coastguard Worker<languageMatches type="written_new">
2485*912701f9SAndroid Build Coastguard Worker    <paradigmLocales locales="en en-GB es es-419 pt-BR pt-PT" />
2486*912701f9SAndroid Build Coastguard Worker    <matchVariable id="$enUS" value="AS+GU+MH+MP+PR+UM+US+VI" />
2487*912701f9SAndroid Build Coastguard Worker    <matchVariable id="$cnsar" value="HK+MO" />
2488*912701f9SAndroid Build Coastguard Worker    <matchVariable id="$americas" value="019" />
2489*912701f9SAndroid Build Coastguard Worker    <matchVariable id="$maghreb" value="MA+DZ+TN+LY+MR+EH" />
2490*912701f9SAndroid Build Coastguard Worker    <languageMatch desired="no" supported="nb" distance="1" /><!-- no ⇒ nb -->
    <languageMatch desired="ar_*_$maghreb" supported="ar_*_$maghreb" distance="4" />
2493*912701f9SAndroid Build Coastguard Worker    <!-- ar; *; $maghreb ⇒ ar; *; $maghreb -->
2494*912701f9SAndroid Build Coastguard Worker    <languageMatch desired="ar_*_$!maghreb" supported="ar_*_$!maghreb" distance="4" />
2495*912701f9SAndroid Build Coastguard Worker    <!-- ar; *; $!maghreb ⇒ ar; *; $!maghreb -->
```
2498*912701f9SAndroid Build Coastguard Worker
2499*912701f9SAndroid Build Coastguard WorkerThe **matchVariable** allows for a rule to match to multiple regions, as illustrated by **\$maghreb**. The syntax is simple: it allows for + for _union_ and - for _set difference_, but no precedence. So A+B-A+D is interpreted as (((A+B)-A)+D), not as (A+B)-(A+D). The variable **id** has a value of the form [$][a-zA-Z0-9]+. If $X is defined, then $!X automatically means all those regions that are not in $X.
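
The left-to-right reading of a **matchVariable** value can be sketched in Python as below. The function names are invented for this illustration, each operand is treated as an atomic code (macroregion expansion is not applied), and the universe of regions passed to `complement` is an assumed input.

```python
import re


def eval_match_variable(expr):
    """Evaluate a value such as 'A+B-A+D' strictly left to right, with no precedence."""
    tokens = re.findall(r"[+-]|[^+-]+", expr)
    result = {tokens[0]}
    # Tokens alternate operand, operator, operand, ...
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        if op == "+":
            result.add(operand)        # union
        else:
            result.discard(operand)    # set difference
    return result


def complement(all_regions, variable_set):
    """Regions denoted by $!X, given the regions in $X and the full set of regions."""
    return set(all_regions) - set(variable_set)
```

`eval_match_variable("A+B-A+D")` gives `{"B", "D"}`, the result of (((A+B)-A)+D); the other grouping, (A+B)-(A+D), would give only `{"B"}`.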
2500*912701f9SAndroid Build Coastguard Worker
When the set is interpreted, macroregions are (logically) transformed into a list of their contents, so “053+GB” → “AU+GB+NF+NZ”. This is done recursively, so 009 → “053+054+057+061+QO” → “AU+NF+NZ+FJ+NC+PG+SB+VU...”. Note that we use 019 for all of the Americas in the variables above, because en-US should be in the same cluster as es-419 and its contents.
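
The recursive expansion can be illustrated with a short Python sketch. The `CONTAINS` table is a deliberately truncated, hypothetical excerpt of territory containment (009 is shown with only two of its subregions), and `expand` and `expand_set` are names made up for this example.

```python
# Hypothetical, truncated excerpt of territory containment data.
CONTAINS = {
    "009": ["053", "054"],                    # real data lists more subregions
    "053": ["AU", "NF", "NZ"],
    "054": ["FJ", "NC", "PG", "SB", "VU"],
}


def expand(code, contains=CONTAINS):
    """Recursively replace a containing region by the regions it contains."""
    if code not in contains:
        return [code]                         # an ordinary region: keep as is
    regions = []
    for child in contains[code]:
        regions.extend(expand(child, contains))
    return regions


def expand_set(expr, contains=CONTAINS):
    """Expand every operand of a '+'-joined value and return the sorted union."""
    return sorted({r for code in expr.split("+") for r in expand(code, contains)})
```

With this excerpt, `expand_set("053+GB")` returns `["AU", "GB", "NF", "NZ"]`.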
2502*912701f9SAndroid Build Coastguard Worker
In the rules, the percent value (100..0) is replaced by a **distance** value, which is its complement (0..100): distance = 100 - percent.
2504*912701f9SAndroid Build Coastguard Worker
2505*912701f9SAndroid Build Coastguard WorkerThese new variables and rules divide up the world into clusters, where items in the same clusters (for specific languages) get the normal regional difference, and items in different clusters get different weights.
2506*912701f9SAndroid Build Coastguard Worker
2507*912701f9SAndroid Build Coastguard WorkerEach cluster can have one or more associated **paradigmLocales**. These are locales that are preferred within a cluster. So when matching desired=[en-SA] against [en-GU en en-IN en-GB], the value en-GB is returned. Both of \{en-GU en} are in a different cluster. While \{en-IN en-GB} are in the same cluster, and the same distance from en-SA, the preference is given to en-GB because it is in the paradigm locales. It would be possible to express this in rules, but using this mechanism handles these very common cases without bulking up the tables.
2508*912701f9SAndroid Build Coastguard Worker
2509*912701f9SAndroid Build Coastguard WorkerThe **paradigmLocales** also allow matching to macroregions. For example, desired=[es-419] should match to \{es-MX} more closely than to \{es}, and vice versa: \{es-MX} should match more closely to \{es-419} than to \{es}. But es-MX should match more closely to es-419 than to any of the other es-419 sublocales. In general, in the absence of other distance data, there is a ‘paradigm’ in each cluster that the others should match more closely to: en(-US), en-GB, es(-ES), es-419, ru(-RU)...



## <a name="XML_Format" href="#XML_Format">XML Format</a>

There are two kinds of data that can be expressed in LDML: language-dependent data and supplementary data. In either case, data can be split across multiple files, which can be in multiple directory trees.

For example, the language-dependent data for Japanese in CLDR is present in the following files:

* common/collation/ja.xml
* common/main/ja.xml
* common/rbnf/ja.xml
* common/segmentations/ja.xml

Data for cased languages such as French are in files like:

* common/casing/fr.xml

The status of the data is the same, whether or not data is split. That is, for the purpose of validation and lookup, all of the data for the above ja.xml files is treated as if it were in a single file. These files have the `<ldml>` root element and use ldml.dtd. The file name must match the identity element. For example, the `<ldml>` file pa_Arab_PK.xml must contain the following elements:

```xml
<ldml>
    <identity>
        <language type="pa" />
        <script type="Arab" />
        <territory type="PK" />
    </identity>
```
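A consistency check between a file name and its identity element might look like the following sketch. It assumes the locale ID is formed by joining the `type` values of `language`, `script`, `territory`, and `variant` (when present) with underscores; `locale_from_identity` and `file_name_matches` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

def locale_from_identity(ldml_xml):
    """Build a locale ID such as 'pa_Arab_PK' from the <identity> element."""
    identity = ET.fromstring(ldml_xml).find("identity")
    parts = []
    for tag in ("language", "script", "territory", "variant"):
        elem = identity.find(tag)
        if elem is not None:
            parts.append(elem.get("type"))
    return "_".join(parts)

def file_name_matches(file_name, ldml_xml):
    """True if the file name agrees with the identity element."""
    return file_name == locale_from_identity(ldml_xml) + ".xml"

SAMPLE = """<ldml>
    <identity>
        <language type="pa" />
        <script type="Arab" />
        <territory type="PK" />
    </identity>
</ldml>"""

print(file_name_matches("pa_Arab_PK.xml", SAMPLE))  # True
```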

Supplemental data can have different root elements, currently: `ldmlBCP47`, `supplementalData`, `keyboard`, and `platform`. Keyboard and platform files are considered distinct. The ldmlBCP47 files and supplementalData files that have the same root are all logically part of the same file; they are simply split into separate files for convenience. Implementations may split the files in different ways, also for their convenience. The files in /properties are also supplemental data files, but are structured like UCD properties.

For example, supplemental data relating to Japan or the Japanese writing are in:

* common/supplemental/ (in many files, such as supplementalData.xml)
* common/transforms/Hiragana-Katakana.xml
* common/transforms/Hiragana-Latin.xml
* common/properties/scriptMetadata.txt
* common/bcp47/calendar.xml
* uca/allkeys_CLDR.txt (sorting)
* /keyboards/chromeos/ja-t-k0-chromeos.xml
* ...

Like the `<ldml>` files, the keyboard file names must match internal data: in particular, the `locale` attribute on the keyboard element must have a value that corresponds to the file name, such as `<keyboard locale="af-t-k0-android">` for the file af-t-k0-android.xml.

The following sections describe the structure of the XML format for language-dependent data. The more precise syntax is in the ldml.dtd file; _however, the DTD does not describe all the constraints on the structure._

To start with, the root element is `<ldml>`, with the following DTD entry:

```xml
<!ELEMENT ldml (identity,(alias|(fallback*,localeDisplayNames?,layout?,contextTransforms?,characters?,
delimiters?,measurement?,dates?,numbers?,units?,listPatterns?,collations?,posix?,
segmentations?,rbnf?,annotations?,metadata?,references?,special*)))>
```

The XML structure is stable over releases. Elements and attributes may be deprecated: they are retained in the DTD but their usage is strongly discouraged. In most cases, an alternate structure is provided for expressing the information. There is only one exception: newer DTDs cannot be used with version 1.1 files, without some modification.

In general, all translatable text in this format is in element contents, while attributes are reserved for types and non-translated information (such as numbers or dates). The reason that attributes are not used for translatable text is that spaces are not preserved, and we cannot predict where spaces may be significant in translated material.

There are two kinds of elements in LDML: _rule_ elements and _structure_ elements.

For structure elements, there are restrictions to allow for effective inheritance and processing:

1.  There is no ["mixed" content](https://www.w3.org/TR/xml/#sec-mixed-content): if an element has textual content, then it cannot contain any elements.
2.  The [[XPath](#XPath)] leading to the content is unique; no two different pieces of textual content have the same [[XPath](#XPath)].
3.  An element that has [value attributes](#Definitions) MUST NOT also have child elements.

To illustrate these restrictions, consider the following chunk of XML:

```xml
<!-- Not correct LDML -->
<unit type="duration-day"
      displayName="days"> <!-- #3: @VALUE attribute AND children -->
  {0} per day <!-- #1: Mixed content -->
  <unitPattern>{0} day</unitPattern>  <!-- #2 same XPath /unit[@type="duration-day"]/unitPattern -->
  <unitPattern>{0} days</unitPattern> <!-- #2 same XPath /unit[@type="duration-day"]/unitPattern -->
</unit>
```

The actual LDML structure is as follows (from `en.xml`):

```xml
<unit type="duration-day">  <!-- OK: "type" is distinguishing -->
  <displayName>days</displayName>
  <unitPattern count="one">{0} day</unitPattern> <!-- "count" is distinguishing -->
  <unitPattern count="other">{0} days</unitPattern>
  <perUnitPattern>{0} per day</perUnitPattern> <!-- the formerly mixed text now has its own element -->
</unit>
```
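The three restrictions can be checked mechanically. The sketch below is a simplified illustration rather than CLDR tooling: it treats every attribute not listed in `value_attrs` as distinguishing, and the caller must supply the value attribute names, since that information lives in the DTD annotations. `check_structure` and `step` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def step(elem, value_attrs):
    """XPath-like step: tag plus every attribute that is not a value attribute."""
    keys = sorted((k, v) for k, v in elem.attrib.items() if k not in value_attrs)
    return elem.tag + "".join(f'[@{k}="{v}"]' for k, v in keys)

def check_structure(root, value_attrs=frozenset()):
    """Return a list of violations of the three structure-element restrictions."""
    problems = []

    def visit(elem, path):
        children = list(elem)
        texts = [elem.text] + [child.tail for child in children]
        if children and any(t and t.strip() for t in texts):
            problems.append(f"{path}: mixed content")                      # rule 1
        if children and any(name in value_attrs for name in elem.attrib):
            problems.append(f"{path}: value attribute with child elements")  # rule 3
        seen = set()
        for child in children:
            child_path = f"{path}/{step(child, value_attrs)}"
            if child_path in seen:
                problems.append(f"{child_path}: duplicate path")           # rule 2
            seen.add(child_path)
            visit(child, child_path)

    visit(root, "/" + step(root, value_attrs))
    return problems

BAD = """<unit type="duration-day" displayName="days">
  {0} per day
  <unitPattern>{0} day</unitPattern>
  <unitPattern>{0} days</unitPattern>
</unit>"""

for problem in check_structure(ET.fromstring(BAD), value_attrs={"displayName"}):
    print(problem)
```

Run against the incorrect example, this reports one violation of each rule; the corrected structure from `en.xml` produces no reports.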

Rule elements do not have these restrictions, but also do not inherit, except as an entire block. Items which are ordered have the DTD Annotation `@ORDERED`. See [_DTD Annotations_](#DTD_Annotations) and _[Inheritance and Validity](#Inheritance_and_Validity)_. For more technical details, see [Updating-DTDs](https://cldr.unicode.org/development/updating-dtds).

Note that the data in examples given below is purely illustrative, and does not match any particular language. For a more detailed example of this format, see [[Example](#LDML)]. There is also a DTD for this format, but _remember that the DTD alone is not sufficient to understand the semantics, the constraints, or the interrelationships between the different elements and attributes_. You may wish to have copies of each of these to hand as you proceed through the rest of this document.

In particular, all elements allow for draft versions to coexist in the file at the same time. Thus most elements are marked in the DTD as allowing multiple instances. However, unless an element is annotated as `@ORDERED`, or has a distinguishing attribute, it can only occur once as a subelement of a given element. Thus, for example, the following is illegal even though allowed by the DTD:

```xml
<languages>
    <language type="aa">...</language>
    <language type="aa">..</language>
```

There must be only one instance of these per parent, unless there are other distinguishing attributes (such as an `alt` attribute).

In general, LDML data should be in NFC format. Normalization forms are defined by [[UAX15](https://www.unicode.org/reports/tr41/#UAX15)]. However, certain elements may need to contain characters that are not in NFC, including exemplars, transforms, segmentations, and p/s/t/i/pc/sc/tc/ic rules in collation. These elements must not be normalized (either to NFC or NFD), or their meaning may be changed. Thus LDML documents must not be normalized as a whole. To prevent problems with normalization, no element value can start with a combining slash (U+0338 COMBINING LONG SOLIDUS OVERLAY).
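The reason for the U+0338 restriction can be seen with a short experiment: under NFC, a `>` followed by U+0338 composes into U+226F NOT GREATER-THAN, which would corrupt the preceding tag if a document were normalized as a whole. The sketch below also shows a per-value check; `value_problems` and its `may_be_non_nfc` flag are hypothetical, and `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata

COMBINING_SOLIDUS = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY

# '>' + U+0338 composes to U+226F, so the tag delimiter would be lost.
print(unicodedata.normalize("NFC", ">" + COMBINING_SOLIDUS) == "\u226f")  # True

def value_problems(value, may_be_non_nfc=False):
    """Report normalization-related problems for one element value."""
    problems = []
    if value.startswith(COMBINING_SOLIDUS):
        problems.append("starts with U+0338")
    if not may_be_non_nfc and not unicodedata.is_normalized("NFC", value):
        problems.append("not in NFC")
    return problems

print(value_problems("e\u0301"))        # ['not in NFC']
print(value_problems("\u00e9"))         # []
print(value_problems("e\u0301", True))  # []
```

The `may_be_non_nfc` flag stands in for the elements (exemplars, transforms, and so on) whose content is exempt from the NFC expectation.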

Lists, such as `singleCountries`, are space-delimited: their items are separated by one or more XML whitespace characters. Examples include:

* singleCountries
* preferenceOrdering
* references
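Splitting such a list value can be sketched as follows. XML defines whitespace as space, tab, carriage return, and line feed, so this example splits on exactly those four characters rather than relying on `str.split()`, which also treats other Unicode whitespace as separators. `split_list` is a hypothetical helper, and the sample value is made up.

```python
import re

def split_list(value):
    """Split a space-delimited LDML list on runs of XML whitespace."""
    return [item for item in re.split(r"[ \t\r\n]+", value) if item]

print(split_list("  AT BE\n\tCA  "))  # ['AT', 'BE', 'CA']
```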

### <a name="Common_Elements" href="#Common_Elements">Common Elements</a>

At any level in any element, two special elements are allowed.

#### <a name="special" href="#special">Element special</a>

This element is designed to allow for arbitrary additional annotation and data that is product-specific. It has one required attribute `xmlns`, which specifies the XML [namespace](https://www.w3.org/TR/REC-xml-names/) of the special data. For example, the following used the version 1.0 POSIX special element.

```xml
<!DOCTYPE ldml SYSTEM "https://www.unicode.org/cldr/dtd/1.0/ldml.dtd" [
    <!ENTITY % posix SYSTEM "https://www.unicode.org/cldr/dtd/1.0/ldmlPOSIX.dtd">
%posix;
]>
<ldml>
...
    <special xmlns:posix="https://www.opengroup.org/regproducts/xu.htm">
        <!-- old abbreviations for pre-GUI days -->
        <posix:messages>
            <posix:yesstr>Yes</posix:yesstr>
            <posix:nostr>No</posix:nostr>
            <posix:yesexpr>^[Yy].*</posix:yesexpr>
            <posix:noexpr>^[Nn].*</posix:noexpr>
        </posix:messages>
    </special>
</ldml>
```

##### <a name="Sample_Special_Elements" href="#Sample_Special_Elements">Sample Special Elements</a>

The elements in this section are _**not**_ part of the Locale Data Markup Language 1.0 specification. Instead, they are special elements used for application-specific data to be stored in the Common Locale Repository. They may change or be removed in future versions of this document, and are present here more as examples of how to extend the format. (Some of these items may move into a future version of the Locale Data Markup Language specification.)

* [https://www.unicode.org/cldr/dtd/1.1/ldmlICU.dtd](https://www.unicode.org/cldr/dtd/1.1/ldmlICU.dtd)
* [https://www.unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd](https://www.unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd)

The above examples are old versions: consult the documentation for the specific application to see which should be used.

These DTDs use namespaces and the special element. To include one or more, use the following pattern to import the special DTDs that are used in the file:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ldml SYSTEM "https://www.unicode.org/cldr/dtd/1.1/ldml.dtd" [
    <!ENTITY % icu SYSTEM "https://www.unicode.org/cldr/dtd/1.1/ldmlICU.dtd">
    <!ENTITY % openOffice SYSTEM "https://www.unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd">
%icu;
%openOffice; ]>
```

Thus to include just the ICU DTD, one uses:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ldml SYSTEM "https://www.unicode.org/cldr/dtd/1.1/ldml.dtd" [
    <!ENTITY % icu SYSTEM "https://www.unicode.org/cldr/dtd/1.1/ldmlICU.dtd">
%icu; ]>
```

> **Note:** A previous version of this document contained a special element for [ISO TR 14652](https://www.open-std.org/jtc1/sc22/wg20/docs/n897-14652w25.pdf) compatibility data. That element has been withdrawn, pending further investigation, since 14652 is a Type 1 TR: "when the required support cannot be obtained for the publication of an International Standard, despite repeated effort". See the ballot comments on [14652 Comments](https://www.open-std.org/jtc1/sc22/wg20/docs/n948-J1N6769-14652.pdf) for details on the 14652 defects. For example, most of these patterns make little provision for substantial changes in format when elements are empty, so are not particularly useful in practice. Compare, for example, the mail-merge capabilities of production software such as Microsoft Word or OpenOffice.
>
> **Note:** While the CLDR specification guarantees backwards compatibility, the definition of specials is up to other organizations. Any assurance of backwards compatibility is up to those organizations.

A number of the elements above can have extra information for <a name="OpenOffice" href="#OpenOffice">openoffice.org</a>, such as the following example:

```xml
<special xmlns:openOffice="https://www.openoffice.org">
    <openOffice:search>
        <openOffice:searchOptions>
            <openOffice:transliterationModules>IGNORE_CASE</openOffice:transliterationModules>
        </openOffice:searchOptions>
    </openOffice:search>
</special>
```

#### <a name="Alias_Elements" href="#Alias_Elements">Element alias</a>

```xml
<!ELEMENT alias (special*) >
<!ATTLIST alias source NMTOKEN #REQUIRED >
<!ATTLIST alias path CDATA #IMPLIED>
```

The contents of any element in root can be replaced by an alias, which points to the path where the data can be found.

Aliases will only ever appear in root with the form `//ldml/.../alias[@source="locale"][@path="..."]`.

Consider the following example in root:

```xml
<calendar type="gregorian">
    <months>
        <default choice="format" />
        <monthContext type="format">
            <default choice="wide" />
            <monthWidth type="abbreviated">
                <alias source="locale" path="../monthWidth[@type='wide']"/>
            </monthWidth>
```

If the locale "de_DE" is being accessed for a month name for format/abbreviated, then a resource bundle at "de_DE" will be searched for a resource element at that path. If not found there, then the resource bundle at "de" will be searched, and so on. When the alias is found in root, then the search is restarted, but searching for the format/**wide** element instead of format/abbreviated.

If the `path` attribute is present, then its value is an [[XPath](#XPath)] that points to a different node in the tree. For example:

```xml
<alias source="locale" path="../monthWidth[@type='wide']"/>
```

The default value if the path is not present is the same position in the tree. All of the attributes in the [[XPath](#XPath)] must be _distinguishing_ attributes. For more details, see [Inheritance and Validity](#Inheritance_and_Validity).

There is a special value for the source attribute, the constant `source="locale"`. This special value is equivalent to the locale being resolved. For example, consider the following table, where locale data for 'de' is being resolved:

###### Table: <a name="Inheritance_with_source_locale_" href="#Inheritance_with_source_locale_">Inheritance with `source="locale"`</a>

<!-- HTML: multiline, readability -->
<table><thead>
<tr><th>Root</th><th>de</th><th>Resolved</th></tr>
</thead><tbody>
<tr>
<td>

```xml
<x>
  <a>1</a>
  <b>2</b>
  <c>3</c>

</x>
```
</td><td>

```xml
<x>
 <a>11</a>
 <b>12</b>

 <d>14</d>
</x>
```
</td><td>

```xml
<x>
 <a>11</a>
 <b>12</b>
 <c>3</c>
 <d>14</d>
</x>
```
</td></tr>
<tr><td>

```xml
<y>
 <alias source="locale" path="../x"/>
</y>





```
</td><td>

```xml
<y>

 <b>22</b>


 <e>25</e>
</y>
```
</td><td>

```xml
<y>
 <a>11</a>
 <b>22</b>
 <c>3</c>
 <d>14</d>
 <e>25</e>
</y>
```
</td></tr>
</tbody></table>

The first row shows the inheritance within the `<x>` element, whereby `<c>` is inherited from root. The second row shows the inheritance within the `<y>` element, whereby `<a>`, `<c>`, and `<d>` are also inherited from root, but through the alias there. The alias in root is logically replaced not by the elements in root itself, but by elements in the 'target' locale.

For more details on data resolution, see [Inheritance and Validity](#Inheritance_and_Validity).

Aliases must be resolved recursively. An alias may point to another path that results in another alias being found, and so on. For example, looking up Thai Buddhist abbreviated months for the locale **xx-YY** may result in the following chain of aliases being followed:

> `../../calendar[@type="buddhist"]/months/monthContext[@type="format"]/monthWidth[@type="abbreviated"]`
>
> xx-YY → xx → root // finds alias that changes path to:
>
> `../../calendar[@type="gregorian"]/months/monthContext[@type="format"]/monthWidth[@type="abbreviated"]`
>
> xx-YY → xx → root // finds alias that changes path to:
>
> `../../calendar[@type="gregorian"]/months/monthContext[@type="format"]/monthWidth[@type="wide"]`
>
> xx-YY → xx // finds value here


It is an error to have a circular chain of aliases. That is, a collection of LDML XML documents must not have situations where a sequence of alias lookups (including inheritance and lateral inheritance) can be followed indefinitely without terminating.
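The lookup behavior described in this section (search the locale chain, restart from the original locale when an alias is found in root, and reject circular chains) can be sketched with toy data mirroring the `<x>`/`<y>` table. This is a simplified model, not the CLDR algorithm: paths are flattened to `element/child` strings, lateral inheritance is ignored, and `PARENT`, `DATA`, `ROOT_ALIASES`, and `resolve` are all hypothetical names.

```python
# Toy data mirroring the <x>/<y> table; paths are simplified to "element/child".
PARENT = {"de_DE": "de", "de": "root", "root": None}
DATA = {
    "root": {"x/a": "1", "x/b": "2", "x/c": "3"},
    "de": {"x/a": "11", "x/b": "12", "x/d": "14", "y/b": "22", "y/e": "25"},
}
# In root: <y><alias source="locale" path="../x"/></y>
ROOT_ALIASES = {"y": "x"}

def resolve(locale, path, seen=None):
    """Look up `path` for `locale`, following parent locales and root aliases."""
    seen = set() if seen is None else seen
    if path in seen:
        raise ValueError(f"circular alias chain at {path}")
    seen.add(path)

    current = locale
    while current is not None:  # locale -> parent -> ... -> root
        value = DATA.get(current, {}).get(path)
        if value is not None:
            return value
        current = PARENT[current]

    for source, target in ROOT_ALIASES.items():
        if path == source or path.startswith(source + "/"):
            # Restart from the original locale with the redirected path.
            return resolve(locale, target + path[len(source):], seen)
    return None

print([resolve("de", f"y/{name}") for name in "abcde"])
# ['11', '22', '3', '14', '25']
```

The printed values match the resolved `<y>` column of the table: `<b>` and `<e>` come from 'de' directly, while `<a>`, `<c>`, and `<d>` are found only after the alias redirects the search to `<x>`.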

#### <a name="Element_displayName" href="#Element_displayName">Element displayName</a>

Many elements can have a display name. This is a translated name that can be presented to users when discussing the particular service. For example, a number format, used to format numbers using the conventions of that locale, can have a translated name for presentation in GUIs.

```xml
<numberFormat>
    <displayName>Prozentformat</displayName>
    ...
</numberFormat>
```

Where present, the display names must be unique; that is, two distinct codes must not get the same display name. (There is one exception to this: in time zones, where parsing results would give the same GMT offset, the standard and daylight display names can be the same across different time zone IDs.) Any translations should follow customary practice for the locale in question. For more information, see [[Data Formats](#DataFormats)].

#### <a name="Escaping_Characters" href="#Escaping_Characters">Escaping Characters</a>

Unfortunately, XML does not have the capability to contain all Unicode code points. Due to this, in certain instances extra syntax is required to represent those code points that cannot be otherwise represented in element content. The escaping syntax is only defined on a few types of elements, such as in collation or exemplar sets, and uses the appropriate syntax for that type.

The element `<cp>`, which was formerly used for this purpose, has been deprecated.

### <a name="Common_Attributes" href="#Common_Attributes">Common Attributes</a>

#### <a name="Attribute_type" href="#Attribute_type">Attribute type</a>

The attribute `type` is also used to indicate an alternate resource that can be selected with a matching `type=option` in the locale id modifiers, or be referenced by a default element. For example:

```xml
<ldml>
    ...
    <currencies>
        <currency>...</currency>
        <currency type="preEuro">...</currency>
    </currencies>
</ldml>
```

#### <a name="Attribute_draft" href="#Attribute_draft">Attribute draft</a>

If this attribute is present, it indicates the status of all the data in this element and any subelements (unless they have a contrary `draft` value), as per the following:

* `approved`: fully approved by the technical committee (equals the CLDR 1.3 value of `false`, or an absent `draft` attribute). This does not mean that the data is guaranteed to be error-free; it reflects the best judgment of the committee.
* `contributed`: partially approved by the technical committee.
* `provisional`: partially confirmed. Implementations may choose to accept the provisional data, especially if there is no translated alternative.
* `unconfirmed`: no confirmation available.
2871*912701f9SAndroid Build Coastguard Worker
2872*912701f9SAndroid Build Coastguard WorkerFor more information on precisely how these values are computed for any given release, see [Data Submission and Vetting Process](https://cldr.unicode.org/index/process#h.krygv7y7jkk9) on the CLDR website.
2873*912701f9SAndroid Build Coastguard Worker
2874*912701f9SAndroid Build Coastguard WorkerThe `draft` attribute should only occur on "leaf" elements, and is deprecated elsewhere. For a more formal description of how elements are inherited, and what their draft status is, see _[Inheritance and Validity](#Inheritance_and_Validity)_.
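
As an illustration of how a consumer might filter data by draft status, here is a minimal Python sketch. The ranking of the four values (weakest to strongest) is an assumption taken from the order of the list above, and the names `DRAFT_ORDER`, `draft_status`, and `is_acceptable` are hypothetical. It does not model inheritance of `draft` from parent elements or the older boolean values.

```python
# Draft levels, weakest first. The ranking is an assumption of this sketch,
# based on the order in which the values are described above.
DRAFT_ORDER = ["unconfirmed", "provisional", "contributed", "approved"]


def draft_status(attrs):
    """Return the draft status for an element's attributes (a dict)."""
    # An absent draft attribute is equivalent to "approved".
    return attrs.get("draft", "approved")


def is_acceptable(attrs, minimum="contributed"):
    """True if the element's draft status is at least `minimum`."""
    return DRAFT_ORDER.index(draft_status(attrs)) >= DRAFT_ORDER.index(minimum)
```

An implementation that chooses to accept provisional data, as permitted above, would call `is_acceptable(attrs, minimum="provisional")`.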
2875*912701f9SAndroid Build Coastguard Worker
2876*912701f9SAndroid Build Coastguard Worker#### <a name="alt_attribute" href="#alt_attribute">Attribute alt</a>
2877*912701f9SAndroid Build Coastguard Worker
This attribute labels an alternative value for an element. The value is a _descriptor_ that indicates what kind of alternative it is, and takes one of the following forms:
2879*912701f9SAndroid Build Coastguard Worker
2880*912701f9SAndroid Build Coastguard Worker* `variantname` means that the value is a variant of the normal value, and may be used in its place in certain circumstances. If a variant value is absent for a particular locale, the normal value is used. The variant mechanism should only be used when such a fallback is acceptable.
2881*912701f9SAndroid Build Coastguard Worker* `proposed`, optionally followed by a number, indicating that the value is a proposed replacement for an existing value.
2882*912701f9SAndroid Build Coastguard Worker* `variantname-proposed`, optionally followed by a number, indicating that the value is a proposed replacement variant value.
2883*912701f9SAndroid Build Coastguard Worker
`proposed` should only be present if the draft status is not `approved`. It indicates that the data is proposed replacement data that has been added provisionally until the differences between it and the other data can be vetted. For example, suppose that the translation for September for some language is "Settembru", and a bug report is filed stating that it should be "Settembro". The new data can be entered, but marked as `alt="proposed"` until it is vetted.
2885*912701f9SAndroid Build Coastguard Worker
2886*912701f9SAndroid Build Coastguard Worker```xml
2887*912701f9SAndroid Build Coastguard Worker...
2888*912701f9SAndroid Build Coastguard Worker<month type="9">Settembru</month>
2889*912701f9SAndroid Build Coastguard Worker<month type="9" draft="unconfirmed" alt="proposed">Settembro</month>
2890*912701f9SAndroid Build Coastguard Worker<month type="10">...
2891*912701f9SAndroid Build Coastguard Worker```
2892*912701f9SAndroid Build Coastguard Worker
2893*912701f9SAndroid Build Coastguard WorkerNow assume another bug report comes in, saying that the correct form is actually "Settembre". Another alternative can be added:
2894*912701f9SAndroid Build Coastguard Worker
2895*912701f9SAndroid Build Coastguard Worker```xml
2896*912701f9SAndroid Build Coastguard Worker...
2897*912701f9SAndroid Build Coastguard Worker<month type="9" draft="unconfirmed" alt="proposed2">Settembre</month>
2898*912701f9SAndroid Build Coastguard Worker...
2899*912701f9SAndroid Build Coastguard Worker```
2900*912701f9SAndroid Build Coastguard Worker
2901*912701f9SAndroid Build Coastguard WorkerThe values for _variantname_ at this time include "variant", "list", "email", "www", "short", and "secondary".
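
The three forms of the `alt` value can be taken apart mechanically. The following Python sketch is illustrative only: the function name `parse_alt` and its return shape are hypothetical, and the set of variant names is simply the list given above, which may change between releases.

```python
import re

# variantname values listed in the text above.
VARIANT_NAMES = {"variant", "list", "email", "www", "short", "secondary"}


def parse_alt(value):
    """Split an alt value into (variantname, is_proposed, proposal_number)."""
    variant, sep, tail = value.partition("-")
    if not sep:
        # No hyphen: either a bare variantname or a bare "proposed[N]".
        if value.startswith("proposed"):
            variant, tail = None, value
        else:
            tail = ""
    if variant is not None and variant not in VARIANT_NAMES:
        raise ValueError(f"unknown variantname: {variant!r}")
    if not tail:
        return variant, False, None
    match = re.fullmatch(r"proposed(\d*)", tail)
    if match is None:
        raise ValueError(f"malformed alt value: {value!r}")
    number = int(match.group(1)) if match.group(1) else None
    return variant, True, number
```

For instance, `parse_alt("proposed2")` separates the proposal number from the keyword, while `parse_alt("short")` reports a plain variant with no proposal.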
2902*912701f9SAndroid Build Coastguard Worker
2903*912701f9SAndroid Build Coastguard WorkerFor a more complete description of how draft applies to data, see _[Inheritance and Validity](#Inheritance_and_Validity)_.
2904*912701f9SAndroid Build Coastguard Worker
2905*912701f9SAndroid Build Coastguard Worker#### <a name="references_attribute" href="#references_attribute">Attribute references</a>
2906*912701f9SAndroid Build Coastguard Worker
The value of this attribute is a token representing a reference for the information in the element, including standards that it may conform to. The token identifies a reference listed in the `<references>` element. (In older versions of CLDR, the value of the attribute was freeform text. That format is deprecated.)
2908*912701f9SAndroid Build Coastguard Worker
2909*912701f9SAndroid Build Coastguard Worker_Example:_
2910*912701f9SAndroid Build Coastguard Worker
2911*912701f9SAndroid Build Coastguard Worker```xml
2912*912701f9SAndroid Build Coastguard Worker<territory type="UM" references="R222">USAs yttre öar</territory>
2913*912701f9SAndroid Build Coastguard Worker```
2914*912701f9SAndroid Build Coastguard Worker
2915*912701f9SAndroid Build Coastguard WorkerThe `reference` element may be inherited. Thus, for example, R222 may be used in sv_SE.xml even though it is not defined there, if it is defined in sv.xml.
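
A lookup that honors this inheritance might look like the following Python sketch. It assumes, for illustration only, that the parent of a locale is found by dropping its last `_`-separated subtag and that the references of each file have already been loaded into a dictionary; the function name, the data layout, and the citation text are all hypothetical. The actual inheritance rules are described elsewhere in this specification.

```python
def find_reference(token, locale, references_by_locale):
    """Look up a reference token in `locale`, then in its truncation parents."""
    while True:
        refs = references_by_locale.get(locale, {})
        if token in refs:
            return refs[token]
        if "_" not in locale:
            return None  # no further parent to consult in this sketch
        locale = locale.rsplit("_", 1)[0]


# Hypothetical data: R222 is defined only in sv.xml.
references = {
    "sv": {"R222": "hypothetical citation text"},
    "sv_SE": {},
}
```

With this data, `find_reference("R222", "sv_SE", references)` falls back from `sv_SE` to `sv` and returns the entry defined there.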
2916*912701f9SAndroid Build Coastguard Worker
2917*912701f9SAndroid Build Coastguard Worker```xml
2918*912701f9SAndroid Build Coastguard Worker<... allow="verbatim" ...> (deprecated)
2919*912701f9SAndroid Build Coastguard Worker```
2920*912701f9SAndroid Build Coastguard Worker
2921*912701f9SAndroid Build Coastguard WorkerThis attribute was originally intended for use in marking display names whose capitalization differed from what was indicated by the now-deprecated `<inText>` element (perhaps, for example, because the names included a proper noun). It was never supported in the dtd and is not needed for use with the new `<contextTransforms>` element.
2922*912701f9SAndroid Build Coastguard Worker
2923*912701f9SAndroid Build Coastguard Worker### <a name="Common_Structures" href="#Common_Structures">Common Structures</a>
2924*912701f9SAndroid Build Coastguard Worker
2925*912701f9SAndroid Build Coastguard Worker#### <a name="Date_Ranges" href="#Date_Ranges">Date and Date Ranges</a>
2926*912701f9SAndroid Build Coastguard Worker
When attributes specify date ranges, they usually do so with the attributes `from` and `to`. The `from` attribute specifies the starting point, and the `to` attribute specifies the end point. The deprecated `time` attribute was formerly used to specify time with the deprecated `weekEndStart` and `weekEndEnd` elements, which were themselves inherently `from` or `to`.
2928*912701f9SAndroid Build Coastguard Worker
The data format is a restricted ISO 8601 format, restricted to the fields `year`, `month`, `day`, `hour`, `minute`, and `second` in that order, with "-" used as a separator between date fields, a space used as the separator between the date and the time fields, and `:` used as a separator between the time fields. If the `second`, or both the `minute` and the `second`, are absent, the missing fields are interpreted as zero. If the `hour` is also missing, then it is interpreted based on whether the attribute is `from` or `to`.
2930*912701f9SAndroid Build Coastguard Worker
2931*912701f9SAndroid Build Coastguard Worker* `from` defaults to "00:00:00" (midnight at the start of the day).
2932*912701f9SAndroid Build Coastguard Worker* `to` defaults to "24:00:00" (midnight at the end of the day).
2933*912701f9SAndroid Build Coastguard Worker
2934*912701f9SAndroid Build Coastguard WorkerThat is, Friday at 24:00:00 is the same time as Saturday at 00:00:00. Thus when the `hour` is missing, the `from` and `to` are interpreted inclusively: the range includes all of the day mentioned.
2935*912701f9SAndroid Build Coastguard Worker
2936*912701f9SAndroid Build Coastguard WorkerFor example, the following are equivalent:
2937*912701f9SAndroid Build Coastguard Worker
2938*912701f9SAndroid Build Coastguard Worker```xml
2939*912701f9SAndroid Build Coastguard Worker<usesMetazone from="1991-10-27" to="2006-04-02" .../>
2940*912701f9SAndroid Build Coastguard Worker<usesMetazone from="1991-10-27 00:00:00" to="2006-04-02 24:00:00" .../>
2941*912701f9SAndroid Build Coastguard Worker<usesMetazone from="1991-10-26 24:00:00" to="2006-04-03 00:00:00" .../>
2942*912701f9SAndroid Build Coastguard Worker```
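
The defaulting rules and the equivalence shown above can be checked with a short Python sketch. The function name `parse_range_endpoint` is hypothetical, and the sketch ignores the question of which time zone the values are expressed in. Because Python's `datetime` has no hour 24, the time of day is applied as an offset from midnight.

```python
from datetime import datetime, timedelta


def parse_range_endpoint(value, kind):
    """Parse a `from` or `to` attribute value into a naive datetime.

    `kind` is "from" or "to"; it only matters when the time part is absent.
    """
    date_part, _, time_part = value.partition(" ")
    if not time_part:
        time_part = "00:00:00" if kind == "from" else "24:00:00"
    fields = [int(f) for f in time_part.split(":")]
    fields += [0] * (3 - len(fields))  # absent minute/second count as zero
    hour, minute, second = fields
    day = datetime.strptime(date_part, "%Y-%m-%d")
    # 24:00:00 lands on 00:00:00 of the following day.
    return day + timedelta(hours=hour, minutes=minute, seconds=second)


a = (parse_range_endpoint("1991-10-27", "from"),
     parse_range_endpoint("2006-04-02", "to"))
b = (parse_range_endpoint("1991-10-27 00:00:00", "from"),
     parse_range_endpoint("2006-04-02 24:00:00", "to"))
c = (parse_range_endpoint("1991-10-26 24:00:00", "from"),
     parse_range_endpoint("2006-04-03 00:00:00", "to"))
```

All three pairs `a`, `b`, and `c` compare equal, matching the three equivalent `usesMetazone` forms.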
2943*912701f9SAndroid Build Coastguard Worker
If the `from` attribute is missing, the range is assumed to start as far back in time as there is data for; if the `to` attribute is missing, the range extends from the starting point onwards, with no known end point.
2945*912701f9SAndroid Build Coastguard Worker
The dates and times are specified in local time, unless otherwise noted. (In particular, the metazone values are in UTC, also known as GMT.)
2947*912701f9SAndroid Build Coastguard Worker
2948*912701f9SAndroid Build Coastguard Worker#### <a name="Text_Directionality" href="#Text_Directionality">Text Directionality</a>
2949*912701f9SAndroid Build Coastguard Worker
2950*912701f9SAndroid Build Coastguard WorkerThe content of certain elements, such as date or number formats, may consist of several sub-elements with an inherent order (for example, the year, month, and day for dates). In some cases, the order of these sub-elements may be changed depending on the bidirectional context in which the element is embedded.
2951*912701f9SAndroid Build Coastguard Worker
2952*912701f9SAndroid Build Coastguard WorkerFor example, short date formats in languages such as Arabic may contain neutral or weak characters at the beginning or end of the element content. In such a case, the overall order of the sub-elements may change depending on the surrounding text.
2953*912701f9SAndroid Build Coastguard Worker
2954*912701f9SAndroid Build Coastguard WorkerElement content whose display may be affected in this way should include an explicit direction mark, such as U+200E LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK, at the beginning or end of the element content, or both.
2955*912701f9SAndroid Build Coastguard Worker
2956*912701f9SAndroid Build Coastguard Worker#### <a name="Unicode_Sets" href="#Unicode_Sets">Unicode Sets</a>
2957*912701f9SAndroid Build Coastguard Worker
2958*912701f9SAndroid Build Coastguard WorkerSome attribute values or element contents use _UnicodeSet_ notation.
2959*912701f9SAndroid Build Coastguard WorkerA UnicodeSet represents a finite set of Unicode code points and strings, and is defined by lists of code points and strings, Unicode property sets, and set operators, with square brackets for groupings.
2960*912701f9SAndroid Build Coastguard WorkerIn this context, a code point means a string consisting of exactly one code point.
2961*912701f9SAndroid Build Coastguard Worker
2962*912701f9SAndroid Build Coastguard WorkerA UnicodeSet implements the semantics in _UTS #18: Unicode Regular Expressions_ [[UTS18](https://www.unicode.org/reports/tr41/#UTS18)] Levels 1 & 2 that are relevant to determining sets of characters.
2963*912701f9SAndroid Build Coastguard WorkerNote however that it may deviate from the syntax provided in [[UTS18](https://www.unicode.org/reports/tr41/#UTS18)].
2964*912701f9SAndroid Build Coastguard WorkerIn particular, Section [RL2.6](https://www.unicode.org/reports/tr18/#RL2.6) _Wildcards in Property Values_ is not supported.
2965*912701f9SAndroid Build Coastguard WorkerHowever, that feature can be supported in clients such as ICU by implementing a “hook” as is done in the [online UnicodeSet utilities](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bname%3D%2FAPPLE%2F%7D).
2966*912701f9SAndroid Build Coastguard Worker
2967*912701f9SAndroid Build Coastguard WorkerA UnicodeSet may be cited in specifications outside of the domain of LDML.
2968*912701f9SAndroid Build Coastguard WorkerIn such a case, that specification may specify a subset or superset of the syntax provided here.
2969*912701f9SAndroid Build Coastguard Worker
2970*912701f9SAndroid Build Coastguard Worker##### UnicodeSet syntax
2971*912701f9SAndroid Build Coastguard Worker
2972*912701f9SAndroid Build Coastguard Worker| Symbol         | Expression                                                     | Examples                                |
2973*912701f9SAndroid Build Coastguard Worker| -------------- | -------------------------------------------------------------- | --------------------------------------- |
2974*912701f9SAndroid Build Coastguard Worker| `unicodeSet`         | <pre>= prop<br/>\| '\[' '^'? s '-'? s seq\* \[\\$ \\-\]? s '\]' <br/>\| var</pre> | \\p\{x=y\},<br/>[abc],<br/>$myset                     |
2975*912701f9SAndroid Build Coastguard Worker| `seq`          | <pre>= unicodeSet \(s \[\\&\\\-\] s unicodeSet\)\* s<br/>\| range s</pre>        | \[abc\]\-\[cde\], a                          |
| `range`        | <pre>= element \('\-' element\)?</pre>       | a, a\-c, \{abc\}, a\-\{z\}  <br/> _note: in ranges, elements must resolve to exactly one code point._  |
2977*912701f9SAndroid Build Coastguard Worker| `element`      | <pre>= char \| string \| var </pre> | %, b, \{hello\}, \{\}, \\x\{61 62\} |
2978*912701f9SAndroid Build Coastguard Worker| `prop`         | <pre>= '\\' \[pP\] '\{' propName \(\[≠=\] s pValuePerl\+\)? '\}'<br/>\| '\[:' '^'? propName \(\[≠=\] s pValuePosix\+\)? ':\]'</pre> | \\p\{x=y\}, \[:x=y:\]<br/> |
2979*912701f9SAndroid Build Coastguard Worker| `propName`     | <pre>= s \[A\-Za\-z0\-9\] \[A\-Za\-z0\-9\_\\x20\]\* s</pre>                | General\_Category,<br/>General Category  |
2980*912701f9SAndroid Build Coastguard Worker| `pValuePerl`       | <pre>= \[^\\\}\]<br/>\| '\\' quoted</pre>                         | Lm,<br/>\\n,<br/>\\\}                    |
2981*912701f9SAndroid Build Coastguard Worker| `pValuePosix`       | <pre>= \[^:\]<br/>\| '\\' quoted</pre>                           | Lm,<br/>\\n,<br/>\\:                    |
2982*912701f9SAndroid Build Coastguard Worker| `string`      | <pre>= '\{' \(s charInString\)\* s '\}' </pre> | \{hello\} |
2983*912701f9SAndroid Build Coastguard Worker| `char`         | <pre>= \[^ \\^ \\& \\\- \\\[ \\\] \\\\ \\\{ \\$ \[:Pat_WS:\]\]<br/>\| '\\' quoted</pre> | a, b, c, \\n, \\\{, \\$ |
2984*912701f9SAndroid Build Coastguard Worker| `charInString` | <pre>= \[^ \\\\ \\\} \[:Pat_WS:\]\]<br/>\| '\\' quoted</pre> | a, b, c, \\n, \{, $ |
| `quoted`       | <pre>= 'u' \(hex\{4\} \| bracketedHex\)<br/>\| 'x' \(hex\{2\} \| bracketedHex\)<br/>\| 'U00' \('0' hex\{5\} \| '10' hex\{4\}\)<br/>\| 'N\{' charName '\}'<br/>\| \[\[\\u0000\-\\U0010FFFF\]\-\[uxUN\]\]</pre> | n, U0000FFFE, \{, $, \] <br/> _note: lengths are exact_ |
2986*912701f9SAndroid Build Coastguard Worker| `charName`     | <pre>= s \[A\-Za\-z0\-9\] \[\-A\-Za\-z0\-9\_\\x20\]\* s</pre>                | TIBETAN LETTER \-A                       |
2987*912701f9SAndroid Build Coastguard Worker| `bracketedHex` | <pre>= '\{' s hexCodePoint \(sRequired hexCodePoint\)\* s '\}'</pre>        | \{61 2019 62\}, \{61\}  |
2988*912701f9SAndroid Build Coastguard Worker| `hexCodePoint` | <pre>= hex\{1,5\} \| '10' hex\{4\}</pre>                           |                                         |
2989*912701f9SAndroid Build Coastguard Worker| `hex`          | <pre>= \[0\-9A\-Fa\-f\]</pre>                                       |                                         |
2990*912701f9SAndroid Build Coastguard Worker| `var`          | <pre>= '$' \[:XID_Start:\] \[:XID_Continue:\]\*</pre>                 | $a, $elt5  (optional support)            |
2991*912701f9SAndroid Build Coastguard Worker| `s`            | <pre>= \[:Pattern_White_Space:\]\*</pre>                          | optional whitespace                     |
2992*912701f9SAndroid Build Coastguard Worker| `sRequired`    | <pre>= \[:Pattern_White_Space:\]\+</pre>                  | required whitespace                     |
2993*912701f9SAndroid Build Coastguard Worker
2994*912701f9SAndroid Build Coastguard WorkerThe following are additional well-formedness and validity constraints:
2995*912701f9SAndroid Build Coastguard Worker1. [ wfc: Ranges (**X**-**Y**) are only well-formed in the case that elements **X** and **Y** resolve to single code points. That is, **\[a-b\]** and **\[\{a\}-\{b\}\]** are well-formed because single-codepoint-strings are equivalent to that code point, while **\[a-{bz}\]** and **\[\{ax\}-\{bz\}\]** are ill-formed. ]
2996*912701f9SAndroid Build Coastguard Worker2. [ vc: Property names and values are restricted to those supported by the implementation, and have additional constraints imposed by [[UAX44](https://www.unicode.org/reports/tr41/#UAX44)]. ]
2997*912701f9SAndroid Build Coastguard Worker
2998*912701f9SAndroid Build Coastguard WorkerNote also that:
2999*912701f9SAndroid Build Coastguard Worker1. Escapes that use multiple code points are equivalent to their flattened representation, i.e., `\x{61 62}` is equivalent to `\x{61}\x{62}`. These can also occur in strings, so **\[\{\\x\{ 061 62 0063\}\}\]** is equivalent to **\[\{abc\}\]**.
3000*912701f9SAndroid Build Coastguard Worker2. If **\[…\]** starts with \[:, then it begins a prop, and must also terminate with :\]. Thus **\[:di:\]** is a valid property expression, **\[di:\]** is a 3 code-point set, and **\[:di\]** raises an error.
3001*912701f9SAndroid Build Coastguard Worker3. Whitespace is significant when initiating/terminating a POSIX property expression, so **\[ :\]** is syntactically valid and equivalent to **\[\\:\]**.
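
The first note, together with the `bracketedHex` and `hexCodePoint` productions, can be illustrated with a small Python sketch. The function name `expand_bracketed_hex` is hypothetical; it takes only the text between the braces of `\x{…}` or `\u{…}` and returns the flattened string.

```python
import re

# hexCodePoint = hex{1,5} | '10' hex{4}
_HEX_CODE_POINT = re.compile(r"[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4}")
# Pattern_White_Space code points.
_WHITE_SPACE = r"[\u0009-\u000D\u0020\u0085\u200E\u200F\u2028\u2029]+"


def expand_bracketed_hex(body):
    """Expand the inside of a bracketed hex escape into a string."""
    tokens = [t for t in re.split(_WHITE_SPACE, body) if t]
    if not tokens:
        raise ValueError("at least one code point is required")
    chars = []
    for token in tokens:
        if not _HEX_CODE_POINT.fullmatch(token):
            raise ValueError(f"not a valid hex code point: {token!r}")
        chars.append(chr(int(token, 16)))
    return "".join(chars)
```

Under this sketch, `expand_bracketed_hex(" 061 62 0063")` yields `abc`, and `expand_bracketed_hex("10FFFF1")` is rejected because the token is not a valid `hexCodePoint`.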
3002*912701f9SAndroid Build Coastguard Worker
3003*912701f9SAndroid Build Coastguard WorkerThe syntax characters are listed in the table below:
3004*912701f9SAndroid Build Coastguard Worker
3005*912701f9SAndroid Build Coastguard Worker| Char | Hex    | Name                 | Usage                                      |
3006*912701f9SAndroid Build Coastguard Worker| ---- | ------ | -------------------- | ------------------------------------------ |
3007*912701f9SAndroid Build Coastguard Worker|  $   | U+0024 | DOLLAR SIGN          | Equivalent to \\uFFFF when followed by '\]', initiator for variable identifiers otherwise  |
3008*912701f9SAndroid Build Coastguard Worker|  &   | U+0026 | AMPERSAND            | Intersecting UnicodeSets                   |
3009*912701f9SAndroid Build Coastguard Worker|  -  | U+002D | HYPHEN-MINUS         | Ranges of characters; also set difference. |
3010*912701f9SAndroid Build Coastguard Worker|  :   | U+003A | COLON                | POSIX-style property syntax                |
3011*912701f9SAndroid Build Coastguard Worker|  [  | U+005B | LEFT SQUARE BRACKET  | Grouping; POSIX property syntax            |
3012*912701f9SAndroid Build Coastguard Worker|  ]  | U+005D | RIGHT SQUARE BRACKET | Grouping; POSIX property syntax            |
3013*912701f9SAndroid Build Coastguard Worker|  \\  | U+005C | REVERSE SOLIDUS      | Escaping                                   |
|  ^   | U+005E | CIRCUMFLEX ACCENT    | POSIX negation syntax                      |
3015*912701f9SAndroid Build Coastguard Worker|  {   | U+007B | LEFT CURLY BRACKET   | Strings in set; Perl property syntax       |
3016*912701f9SAndroid Build Coastguard Worker|  }   | U+007D | RIGHT CURLY BRACKET  | Strings in set; Perl property syntax       |
3017*912701f9SAndroid Build Coastguard Worker|      | U+0020 U+0009..U+000D U+0085<br/>U+200E U+200F<br/>U+2028 U+2029 | ASCII whitespace,<br/>LRM, RLM,<br/>LINE/PARAGRAPH SEPARATOR | Ignored except when escaped |
3018*912701f9SAndroid Build Coastguard Worker
3019*912701f9SAndroid Build Coastguard WorkerNote that some syntax characters only have a special meaning in a certain context. In particular:
3020*912701f9SAndroid Build Coastguard Worker* Out of all above syntax characters, only \\, \}, and whitespace have a special meaning inside strings (**\[\{\[a-z\]\}\]** is the set of the string '\[a-z\]', **\[\{\$blah\}\]** is the set of the string '\$blah').
3021*912701f9SAndroid Build Coastguard Worker* \$ is equivalent to \uFFFF when appearing at the very end of a set with or without trailing whitespace (**[a-z\$]**, **[a-z\$ ]**), and used as starting indicator for a variable reference elsewhere, in which case the variable name will be the longest match on the `var` nonterminal (such as **[\$my_set]**).
* \- is equivalent to the literal character \\- when occurring at the very beginning of a set, after a \^ at the beginning of a set, or at the very end of a set, in all cases with or without whitespace (**[-abc]**, **[^ -abc]**, **[abc-]**), and is used as the set difference or range operator elsewhere (**[[abc]-[bc]]**, **[a-z]**).
* \: initiates a POSIX property set when directly after a \[ without whitespace in between (**[:L:]**), ends a POSIX property set when directly before a \] without whitespace in between (**[:L:]**), and is equivalent to the literal character \\\: in any other place (**[ \:]**, **[L\:]**).
* \} ends a string when occurring inside a string (**[{hello}]**), and is equivalent to the literal character \\\} in any other place (**[}a]**).
3025*912701f9SAndroid Build Coastguard Worker
3026*912701f9SAndroid Build Coastguard Worker###### Syntax Special Case Examples
The following table gives examples, including common sources of confusion in the UnicodeSet syntax:

| Expression | Contained Elements | Syntax Errors |
3029*912701f9SAndroid Build Coastguard Worker| - | - | - |
3030*912701f9SAndroid Build Coastguard Worker| **\[^a\]** | All Unicode code points except 'a' | **\[ ^a\]**, **\[a^\]** |
3031*912701f9SAndroid Build Coastguard Worker| **\[\\^a\]** | 'a' and '^' | |
3032*912701f9SAndroid Build Coastguard Worker| **\[:L:\]** | All code points with Unicode property 'General_Category' equal to 'Letter' | **\[:L\]**, **\[:\]** |
3033*912701f9SAndroid Build Coastguard Worker| **\[ :\]** | ':' | |
3034*912701f9SAndroid Build Coastguard Worker| **\[L:\]** | 'L' and ':' | |
|  **\[-\]** | '-' | |
3036*912701f9SAndroid Build Coastguard Worker|  **\[  - \]** | '-' | |
3037*912701f9SAndroid Build Coastguard Worker|  **\[a-\]**, **\[-a\]** | 'a' and '-' | |
3038*912701f9SAndroid Build Coastguard Worker|  **\[a -b\]** | All code points between 'a' and 'b' (inclusive) | |
3039*912701f9SAndroid Build Coastguard Worker|  **\[\[a-b\] -\[b\]\]**, **\[\[a\]-\[b\]-\[c\]\]** | 'a' | **\[a-b-c\]** |
3040*912701f9SAndroid Build Coastguard Worker|  **\[^  - \]** | All Unicode code points except '-' | **\[ ^  - \]** |
3041*912701f9SAndroid Build Coastguard Worker|  **\[\$\]**, **\[ \$  \]** | U+FFFF | |
3042*912701f9SAndroid Build Coastguard Worker|  **\[\$a\]** | The value of the variable '\$a' | **\[\$ a\]**, **\[\$und\]** |
3043*912701f9SAndroid Build Coastguard Worker|  **\[\$a\$\]** | U+FFFF and the value of the variable '\$a' | |
3044*912701f9SAndroid Build Coastguard Worker|  **\[a\$\]** | 'a' and U+FFFF | |
3045*912701f9SAndroid Build Coastguard Worker|  **\[\}\]** | '\}' | **\[\{\]** |
3046*912701f9SAndroid Build Coastguard Worker|  **\[\{\}\]** | the empty string, '' | |
3047*912701f9SAndroid Build Coastguard Worker|  **\[\{\}\}\]** | '\}' and the empty string, '' | |
3048*912701f9SAndroid Build Coastguard Worker|  **\[\{\{\}\]** | '\{' | |
3049*912701f9SAndroid Build Coastguard Worker|  **\[\{\$var\}\]** | the string '\$var' | |
3050*912701f9SAndroid Build Coastguard Worker|  **\[\{\[a-z\}\]**, **\[\{ \[ a - z\}\]** | the string '\[a-z' | |
3051*912701f9SAndroid Build Coastguard Worker|  **\[\\x\{10FFFF 1\}\]** | U+10FFFF and U+1 | **\[\\x\{10FFFF1\}\]** |
3052*912701f9SAndroid Build Coastguard Worker|  **\[\\x\{61\}-d\]** | 'a', 'b', 'c', and 'd' | **\[\\x\{61 63\}-d\]**, **\[\\x\{61 63\}-\\x\{62 64\}\]** |
3053*912701f9SAndroid Build Coastguard Worker
3054*912701f9SAndroid Build Coastguard Worker*Note: the above assumes that variables are supported, \$a is defined as a full UnicodeSet, a string, or a char, and \$und is not defined at all.*
3055*912701f9SAndroid Build Coastguard Worker
3056*912701f9SAndroid Build Coastguard Worker
3057*912701f9SAndroid Build Coastguard Worker
3058*912701f9SAndroid Build Coastguard Worker
3059*912701f9SAndroid Build Coastguard Worker
3060*912701f9SAndroid Build Coastguard Worker##### <a name="Lists_of_Code_Points" href="#Lists_of_Code_Points">Lists of Code Points</a>
3061*912701f9SAndroid Build Coastguard Worker
3062*912701f9SAndroid Build Coastguard WorkerLists are a sequence of strings that may include ranges, which are indicated by a '-' between two code points, as in "a-z". The sequence _start-end_ specifies the range of all code points from the start to end, inclusive, in Unicode order. For example, **[a c d-f m]** is equivalent to **[a c d e f m]**. Whitespace can be freely used for clarity, as **[a c d-f m]** means the same as **[acd-fm]**.
3063*912701f9SAndroid Build Coastguard Worker
3064*912701f9SAndroid Build Coastguard WorkerA string with multiple code points is represented in a list by being surrounded by curly braces, such as in **[a-z \{ch}]**. It can be used with the range notation, with the restriction that each string contains exactly one code point. Thus **\[\{ab\}-\{c\}\]**, **\[\{ax\}-\{bz\}\]**, and **\[\{ab\}-c\]** are invalid. A string consisting of a single code point is equivalent to that code point, that is, **[\{a}-c]** is valid and equivalent to **[a b c]**.
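
To make the range and string rules concrete, here is a deliberately small Python sketch that expands a list into a set of strings. It is not a UnicodeSet parser: it handles only literal code points, ranges, `{strings}`, and whitespace, uses Python's `str.isspace` as a stand-in for Pattern_White_Space, and ignores escapes, properties, nested sets, variables, and negation. The function name `expand_simple_list` is hypothetical.

```python
_RANGE = object()  # marker for an unquoted '-'


def expand_simple_list(pattern):
    """Expand a list such as "[a c d-f m {ch}]" into a set of strings."""
    if not (pattern.startswith("[") and pattern.endswith("]")):
        raise ValueError("expected [...]")
    body = pattern[1:-1]
    items = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch.isspace():
            i += 1
        elif ch == "{":
            end = body.index("}", i)
            # Unescaped whitespace inside a string is ignored.
            items.append("".join(body[i + 1:end].split()))
            i = end + 1
        elif ch == "-":
            items.append(_RANGE)
            i += 1
        else:
            items.append(ch)
            i += 1
    result = set()
    j = 0
    while j < len(items):
        item = items[j]
        if item is _RANGE:
            # A '-' at the very start or end of the list is a literal.
            if j not in (0, len(items) - 1):
                raise ValueError("unexpected '-'")
            result.add("-")
            j += 1
        elif (j + 2 < len(items) and items[j + 1] is _RANGE
              and items[j + 2] is not _RANGE):
            start, end = item, items[j + 2]
            if len(start) != 1 or len(end) != 1:
                raise ValueError("range endpoints must be single code points")
            result.update(chr(c) for c in range(ord(start), ord(end) + 1))
            j += 3
        else:
            result.add(item)
            j += 1
    return result
```

In this sketch `[a c d-f m]` and `[acd-fm]` expand to the same set, `[{a}-c]` is accepted, and `[{ab}-c]` is rejected because a range endpoint has more than one code point.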
3065*912701f9SAndroid Build Coastguard Worker
3066*912701f9SAndroid Build Coastguard Worker##### <a name="Backslash_Escapes" href="#Backslash_Escapes">Backslash Escapes</a>
3067*912701f9SAndroid Build Coastguard WorkerCertain backslashed code point sequences can be used to quote code points:
3068*912701f9SAndroid Build Coastguard Worker
3069*912701f9SAndroid Build Coastguard Worker| Sequence        | Code point                           |
3070*912701f9SAndroid Build Coastguard Worker| --------------- | ------------------------------------ |
| \\x\{h...h}<br/>\\u\{h...h} | List of one or more code points, each written as 1-6 hex digits ([0-9A-Fa-f]), separated by spaces |
| \\xhh           | Exactly 2 hex digits                 |
3073*912701f9SAndroid Build Coastguard Worker| \\uhhhh         | Exactly 4 hex digits                 |
3074*912701f9SAndroid Build Coastguard Worker| \\Uhhhhhhhh     | Exactly 8 hex digits                 |
3075*912701f9SAndroid Build Coastguard Worker| \\a             | U+0007 (BEL / ALERT)                 |
3076*912701f9SAndroid Build Coastguard Worker| \\b             | U+0008 (BACKSPACE)                   |
3077*912701f9SAndroid Build Coastguard Worker| \\t             | U+0009 (TAB / CHARACTER TABULATION)  |
3078*912701f9SAndroid Build Coastguard Worker| \\n             | U+000A (LINE FEED)                   |
3079*912701f9SAndroid Build Coastguard Worker| \\v             | U+000B (LINE TABULATION)             |
3080*912701f9SAndroid Build Coastguard Worker| \\f             | U+000C (FORM FEED)                   |
3081*912701f9SAndroid Build Coastguard Worker| \\r             | U+000D (CARRIAGE RETURN)             |
3082*912701f9SAndroid Build Coastguard Worker| \\\\            | U+005C (BACKSLASH / REVERSE SOLIDUS) |
3083*912701f9SAndroid Build Coastguard Worker| \\N\{name}      | The Unicode code point named "name". |
3084*912701f9SAndroid Build Coastguard Worker| \\p\{…},\\P\{…} | Unicode property (see below)         |
3085*912701f9SAndroid Build Coastguard Worker
3086*912701f9SAndroid Build Coastguard WorkerAnything else following a backslash is mapped to itself, except the property syntax described below, or in an environment where it is defined to have some special meaning.
3087*912701f9SAndroid Build Coastguard Worker
3088*912701f9SAndroid Build Coastguard WorkerAny code point formed as the result of a backslash escape loses any special meaning and is treated as a literal. In particular, note that \\x, \\u and \\U escapes create literal code points. (In contrast, Java treats Unicode escapes as just a way to represent arbitrary code points in an ASCII source file, and any resulting code points are _**not**_ tagged as literals.)
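
The fixed-length and single-letter escapes in the table can be decoded as in the following Python sketch. The function name `unescape_one` is hypothetical. The bracketed forms and the property syntax are left out, and the `\N{name}` lookup relies on the character names known to Python's `unicodedata` module, which may differ from the Unicode version an implementation targets. A caller would still need to record that the returned code point is a literal.

```python
import string
import unicodedata

# Single-letter escapes from the table above.
_CONTROL = {"a": "\u0007", "b": "\u0008", "t": "\u0009", "n": "\u000A",
            "v": "\u000B", "f": "\u000C", "r": "\u000D"}
# Escapes followed by an exact number of hex digits.
_FIXED_HEX = {"x": 2, "u": 4, "U": 8}


def unescape_one(text, pos):
    """Decode the escape whose backslash is at text[pos].

    Returns (decoded character, index just past the escape).
    """
    kind = text[pos + 1]
    if kind in ("p", "P"):
        raise NotImplementedError("property syntax is outside this sketch")
    if kind in _FIXED_HEX:
        if text[pos + 2:pos + 3] == "{":
            raise NotImplementedError("bracketed forms are outside this sketch")
        width = _FIXED_HEX[kind]
        digits = text[pos + 2:pos + 2 + width]
        if len(digits) != width or any(c not in string.hexdigits for c in digits):
            raise ValueError(f"expected exactly {width} hex digits")
        return chr(int(digits, 16)), pos + 2 + width
    if kind == "N":
        if text[pos + 2:pos + 3] != "{":
            raise ValueError("expected '{' after N")
        end = text.index("}", pos)
        return unicodedata.lookup(text[pos + 3:end]), end + 1
    if kind in _CONTROL:
        return _CONTROL[kind], pos + 2
    # Anything else, including a second backslash, maps to itself.
    return kind, pos + 2
```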
3089*912701f9SAndroid Build Coastguard Worker
3090*912701f9SAndroid Build Coastguard WorkerUnicode property sets are defined as described in _UTS #18: Unicode Regular Expressions_ [[UTS18](https://www.unicode.org/reports/tr41/#UTS18)], Level 1 and RL2.5, including the syntax where given. For an example of a concrete implementation of this, see [[ICUUnicodeSet](#ICUUnicodeSet)].
3091*912701f9SAndroid Build Coastguard Worker
3092*912701f9SAndroid Build Coastguard Worker##### <a name="Unicode_Properties" href="#Unicode_Properties">Unicode Properties</a>
3093*912701f9SAndroid Build Coastguard Worker
3094*912701f9SAndroid Build Coastguard WorkerBriefly, Unicode property sets are specified by any Unicode property and a value of that property, such as **[:General_Category=Letter:]** for Unicode letters or **\\p\{uppercase}** for the set of upper case letters in Unicode. The property names are defined by the PropertyAliases.txt file and the property values by the PropertyValueAliases.txt file. For more information, see [[UAX44](https://www.unicode.org/reports/tr41/#UAX44)]. The syntax for specifying the property sets is an extension of either POSIX or Perl syntax, by the addition of `"=<value>"`. For example, you can match letters by using the POSIX-style syntax:
3095*912701f9SAndroid Build Coastguard Worker
3096*912701f9SAndroid Build Coastguard Worker**[:General_Category=Letter:]**
3097*912701f9SAndroid Build Coastguard Worker
3098*912701f9SAndroid Build Coastguard Workeror by using the Perl-style syntax
3099*912701f9SAndroid Build Coastguard Worker
3100*912701f9SAndroid Build Coastguard Worker**\\p\{General_Category=Letter}**.
3101*912701f9SAndroid Build Coastguard Worker
3102*912701f9SAndroid Build Coastguard WorkerProperty names and values are case-insensitive, and whitespace, "-", and "\_" are ignored. The property name can be omitted for the **General_Category** and **Script** properties, but is required for other properties. If the property value is omitted, it is assumed to represent a boolean property with the value "true". Thus **[:Letter:]** is equivalent to **[:General_Category=Letter:]**, and **[:Wh-ite-s pa_ce:]** is equivalent to **[:Whitespace=true:]**.
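
The loose matching of property names and values described above can be illustrated with a small Python sketch. It is not a normative algorithm: the helper name `loose_key` is hypothetical, and it folds only what the paragraph names (case, whitespace, `-`, and `_`).

```python
def loose_key(name: str) -> str:
    """Fold a property name or value for loose comparison.

    Case is ignored, and whitespace, '-' and '_' are dropped.
    (Hypothetical helper, for illustration only.)
    """
    return "".join(
        ch for ch in name.casefold() if not (ch.isspace() or ch in "-_")
    )


# Both spellings fold to the same key, so they name the same property.
print(loose_key("Wh-ite-s pa_ce") == loose_key("Whitespace"))          # True
print(loose_key("General_Category") == loose_key("general category"))  # True
```

A parser would typically look up the folded key in tables built from PropertyAliases.txt and PropertyValueAliases.txt; that lookup is omitted here.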
3103*912701f9SAndroid Build Coastguard Worker
3104*912701f9SAndroid Build Coastguard WorkerThe table below shows the two kinds of syntax: POSIX and Perl style. Also, the table shows the "Negative" version, which is a property that excludes all code points of a given kind. For example, **[:^Letter:]** matches all code points that are not **[:Letter:]**.
3105*912701f9SAndroid Build Coastguard Worker
3106*912701f9SAndroid Build Coastguard Worker|                    | Positive         | Negative          |
3107*912701f9SAndroid Build Coastguard Worker| ------------------ | ---------------- | ----------------- |
3108*912701f9SAndroid Build Coastguard Worker| POSIX-style Syntax | [:type=value:]   | [:^type=value:]   |
3109*912701f9SAndroid Build Coastguard Worker| Perl-style Syntax  | \\p\{type=value} | \\P\{type=value}  |
3110*912701f9SAndroid Build Coastguard Worker
3111*912701f9SAndroid Build Coastguard Worker##### <a name="Boolean_Operations" href="#Boolean_Operations">Boolean Operations</a>
3112*912701f9SAndroid Build Coastguard Worker
3113*912701f9SAndroid Build Coastguard WorkerThe low-level lists or properties then can be freely combined with the normal set operations (union, inverse, difference, and intersection):
3114*912701f9SAndroid Build Coastguard Worker
3115*912701f9SAndroid Build Coastguard Worker* To union two sets, simply concatenate them. For example, **[[:letter:] [:number:]]**
3116*912701f9SAndroid Build Coastguard Worker* To intersect two sets, use the '&' operator. For example, **[[:letter:] & [a-z]]**
3117*912701f9SAndroid Build Coastguard Worker* To take the set-difference of two sets, use the '-' operator. For example, **[[:letter:] - [a-z]]**
3118*912701f9SAndroid Build Coastguard Worker* To invert a set, place a '\^' immediately after the opening '['. For example, **[\^a-z]**. In any other location, the '\^' does not have a special meaning. The inversion [\^X] is equivalent to [[\\x{0}-\\x{10FFFF}]-[X]]. Thus multi-code point strings are discarded.
3119*912701f9SAndroid Build Coastguard Worker* Symmetric difference (~) is not supported.
3120*912701f9SAndroid Build Coastguard Worker
3121*912701f9SAndroid Build Coastguard WorkerThe binary operators '&', '-', and the implicit union have equal precedence and bind left-to-right. Thus **[[:letter:]-[a-z]-[\\u0100-\\u01FF]]** is equal to **[[[:letter:]-[a-z]]-[\\u0100-\\u01FF]]**. Another example is the set **[[ace][bdf] - [abc][def]]**, which is not the empty set, but instead equal to **[[[[ace] [bdf]] - [abc]] [def]]**, which equals **[[[abcdef] - [abc]] [def]]**, which equals **[[def] [def]]**, which equals **[def]**.
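
The left-to-right evaluation can be checked with ordinary Python sets. This is only a sketch: the function `evaluate` and the `'|'` marker for the implicit union are illustrative choices, not part of the UnicodeSet syntax.

```python
def evaluate(first, *steps):
    """Apply (operator, operand) steps strictly left to right.

    '|' stands for the implicit union, '&' for intersection,
    '-' for set difference. All three have equal precedence.
    """
    ops = {
        "|": frozenset.union,
        "&": frozenset.intersection,
        "-": frozenset.difference,
    }
    result = frozenset(first)
    for op, operand in steps:
        result = ops[op](result, frozenset(operand))
    return result


# [[ace][bdf] - [abc][def]]
result = evaluate("ace", ("|", "bdf"), ("-", "abc"), ("|", "def"))
print("".join(sorted(result)))  # def
```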
3122*912701f9SAndroid Build Coastguard Worker
3123*912701f9SAndroid Build Coastguard Worker**One caution:** the '&' and '-' operators operate between sets. That is, they must be immediately preceded and immediately followed by a set. For example, the pattern **[[:Lu:]-A]** is illegal, since it is interpreted as the set **[:Lu:]** followed by the incomplete range **-A**. To specify the set of upper case letters except for 'A', enclose the 'A' in brackets: **[[:Lu:]-[A]]**.
3124*912701f9SAndroid Build Coastguard Worker
3125*912701f9SAndroid Build Coastguard Worker##### <a name="Variables_in_UnicodeSets" href="#Variables_in_UnicodeSets">Variables in UnicodeSets</a>
3126*912701f9SAndroid Build Coastguard Worker
3127*912701f9SAndroid Build Coastguard WorkerSupport for variable identifiers (var) is optional.
3128*912701f9SAndroid Build Coastguard WorkerThey are used in certain contexts such as in [Transforms](tr35-general.md#Transforms).
3129*912701f9SAndroid Build Coastguard WorkerWhen they are used, they are defined as follows:
3130*912701f9SAndroid Build Coastguard Worker
UnicodeSets may contain variables (`$my_char`, `$the_set`, ...) in place of full UnicodeSets and strings/characters. If variable support is enabled, every variable that is used must be defined; how variables are defined is out of scope for the UnicodeSet syntax. In particular, referring to an undefined variable is an error.
3132*912701f9SAndroid Build Coastguard Worker
3133*912701f9SAndroid Build Coastguard WorkerNot all variable maps are valid for a given expression in UnicodeSet syntax.
3134*912701f9SAndroid Build Coastguard WorkerFor instance, consider **[$a-$b]**; this may be a range of characters if both **$a** and **$b** are characters,
3135*912701f9SAndroid Build Coastguard Workeror a difference of sets if they are both sets; but given the map `{ a => '0', b => [:L:] }`, it is invalid.
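
The dependence of parsing on the variable map can be sketched as follows. The model is an assumption made for illustration: a single character is a one-code-point `str`, a UnicodeSet is a `frozenset`, and multi-code point strings are not handled. The function name `classify_dash` is hypothetical.

```python
def classify_dash(a, b):
    """Decide how [$a-$b] parses, given the values bound to $a and $b.

    Model (an assumption of this sketch): a character is a str of
    length 1, a UnicodeSet is a frozenset of elements.
    """
    if isinstance(a, str) and isinstance(b, str) and len(a) == len(b) == 1:
        return "range"
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        return "difference"
    # Mixed kinds (e.g. a character and a set) cannot be parsed;
    # multi-code point strings are outside this sketch.
    raise ValueError("invalid or unsupported variable map for [$a-$b]")


print(classify_dash("0", "9"))                            # range
print(classify_dash(frozenset("abc"), frozenset("a")))    # difference
```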
3136*912701f9SAndroid Build Coastguard Worker
3137*912701f9SAndroid Build Coastguard Worker**Note:** In particular, the variable map is needed not just to compute the actual set of characters and strings represented by the UnicodeSet,
3138*912701f9SAndroid Build Coastguard Workerbut also to parse the UnicodeSet syntax: if **$a** and **$b** were unknown, the parsing of **[$a-$b]** would be ambiguous.
3139*912701f9SAndroid Build Coastguard Worker
3140*912701f9SAndroid Build Coastguard WorkerVariables are replaced by value, that is, **[a \$minus z]** with a variable map `{ minus => '-' }` is equivalent to **[-az]**, not **[a-z]** (i.e., cardinality of 3 instead of 26).
3141*912701f9SAndroid Build Coastguard WorkerThe full `var` nonterminal is replaced, i.e., the variable name together with the prefixed \$.
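
Replacement by value, as opposed to textual substitution, can be sketched in Python. This sketch assumes a flat set whose tokens have already been split; the function `members` and its token model are hypothetical.

```python
def members(tokens, variables):
    """Collect the elements of a flat set from pre-split tokens.

    A token starting with '$' is looked up in `variables`. Its value is
    added as a literal element and is never re-read as syntax.
    """
    result = set()
    for token in tokens:
        if token.startswith("$"):
            name = token[1:]  # the whole `var`, minus the '$' prefix
            if name not in variables:
                raise KeyError(f"undefined variable: {token}")
            result.add(variables[name])
        else:
            result.add(token)
    return result


by_value = members(["a", "$minus", "z"], {"minus": "-"})
print(sorted(by_value))  # ['-', 'a', 'z']
print(len(by_value))     # 3
```

A textual substitution would instead have produced the pattern `[a-z]`, a range with 26 elements.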
3142*912701f9SAndroid Build Coastguard Worker
3143*912701f9SAndroid Build Coastguard WorkerThe variable syntax implements UAX31-R1-2 with XID_Start and XID_Continue. For more information, see [[UAX31](https://www.unicode.org/reports/tr41/#UAX31)].
Variable names are treated as equivalent normalized identifiers under Normalization Form C, implementing UAX31-R4. Furthermore, variable names are case-sensitive.
3145*912701f9SAndroid Build Coastguard Worker
3146*912701f9SAndroid Build Coastguard Worker
3147*912701f9SAndroid Build Coastguard WorkerNotes:
3148*912701f9SAndroid Build Coastguard Worker1. The 'type' of a variable value is not specified syntactically.
Thus \[\$a\-\$b\] can be resolved whether \$a and \$b are characters/strings (e.g., \$a=δ, \$b=θ) or full UnicodeSets (e.g., \$a=\\p\{script=greek\}, \$b=\\p\{general_category=letter\}).
The only restriction is that the result be syntactically valid; thus (\$a=w, \$b=xy) would raise an error.
3151*912701f9SAndroid Build Coastguard Worker2. Variable substitution is currently disallowed inside of property expressions.
3152*912701f9SAndroid Build Coastguard WorkerThus \\p{gc=\$blah} raises an error.
3153*912701f9SAndroid Build Coastguard Worker3. '\$' when followed by '\]' is interpreted as \\uFFFF, and is used to match before the start of a string or after the end.
3154*912701f9SAndroid Build Coastguard WorkerThus \[ab\$\] matches the string "xaby" in the locations (marked with '()'): "()xaby", "x(a)by", "xa(b)y", "xaby()".
3155*912701f9SAndroid Build Coastguard Worker4. If an unescaped '\$' is neither followed by a character of type \[:XID_Start:\] nor a '\]', it is a syntax error.
3156*912701f9SAndroid Build Coastguard Worker
3157*912701f9SAndroid Build Coastguard Worker**Backwards compatibility**: In prior versions of this document, the character \$ was a valid element of the `char` nonterminal with the special meaning of `\uFFFF`.
3158*912701f9SAndroid Build Coastguard WorkerIn current versions, the \$ character may only appear by itself at the end of a UnicodeSet, e.g., **[a-z\$]**, where it keeps that interpretation.
In any other location, \$ is allowed only as the prefix of a variable.
3160*912701f9SAndroid Build Coastguard WorkerThe previous behavior of allowing \$ in the `char` nonterminal is considered obsolete and must be avoided by new implementations.
3161*912701f9SAndroid Build Coastguard Worker
3162*912701f9SAndroid Build Coastguard Worker##### <a name="UnicodeSet_Examples" href="#UnicodeSet_Examples">UnicodeSet Examples</a>
3163*912701f9SAndroid Build Coastguard Worker
3164*912701f9SAndroid Build Coastguard WorkerThe following table summarizes the syntax that can be used.
3165*912701f9SAndroid Build Coastguard Worker
3166*912701f9SAndroid Build Coastguard Worker| Example              | Description |
3167*912701f9SAndroid Build Coastguard Worker| -------------------- | ----------- |
3168*912701f9SAndroid Build Coastguard Worker| [a]                  | The set containing 'a' alone |
3169*912701f9SAndroid Build Coastguard Worker| [a-z]                | The set containing 'a' through 'z' and all letters in between, in Unicode order.<br/>Thus it is the same as [\\u0061-\\u007A]. |
3170*912701f9SAndroid Build Coastguard Worker| [^a-z]               | The set containing all code points but 'a' through 'z'.<br/>Thus it is the same as [\\u0000-\\u0060 \\u007B-\\x{10FFFF}]. |
3171*912701f9SAndroid Build Coastguard Worker| [[pat1][pat2]]       | The union of sets specified by pat1 and pat2 |
3172*912701f9SAndroid Build Coastguard Worker| [[pat1]&[pat2]]      | The intersection of sets specified by pat1 and pat2 |
3173*912701f9SAndroid Build Coastguard Worker| [[pat1]-[pat2]]      | The asymmetric difference of sets specified by pat1 and pat2 |
3174*912701f9SAndroid Build Coastguard Worker| [a \{ab} \{ac}]      | The code point 'a' and the multi-code point strings "ab" and "ac" |
3175*912701f9SAndroid Build Coastguard Worker| [x\\u\{61 2019 62}y] | Equivalent to [x\\u0061\\u2019\\u0062y] (= [xa’by]) |
3176*912701f9SAndroid Build Coastguard Worker| [:Lu:]               | The set of code points with a given property value, as defined by PropertyValueAliases.txt. In this case, these are the Unicode upper case letters. The long form for this is **[:General_Category=Uppercase_Letter:]**. |
3177*912701f9SAndroid Build Coastguard Worker| [:L:]                | The set of code points belonging to all Unicode categories starting with 'L', that is, **[[:Lu:][:Ll:][:Lt:][:Lm:][:Lo:]]**. The long form for this is **[:General_Category=Letter:]**. |
3178*912701f9SAndroid Build Coastguard Worker
3179*912701f9SAndroid Build Coastguard Worker#### <a name="String_Range" href="#String_Range">String Range</a>
3180*912701f9SAndroid Build Coastguard Worker
3181*912701f9SAndroid Build Coastguard WorkerA String Range is a compact format for specifying a list of strings.
3182*912701f9SAndroid Build Coastguard Worker
3183*912701f9SAndroid Build Coastguard Worker**Syntax:**
3184*912701f9SAndroid Build Coastguard Worker
3185*912701f9SAndroid Build Coastguard Worker> X _sep_ Y
3186*912701f9SAndroid Build Coastguard Worker
3187*912701f9SAndroid Build Coastguard WorkerThe separator and the format of strings X, Y may vary depending on the domain. For example,
3188*912701f9SAndroid Build Coastguard Worker
3189*912701f9SAndroid Build Coastguard Worker* for the validity files the separator is ~,
3190*912701f9SAndroid Build Coastguard Worker* for UnicodeSet the separator is -, and any multi-codepoint string is enclosed in {…}.
3191*912701f9SAndroid Build Coastguard Worker
3192*912701f9SAndroid Build Coastguard Worker**Validity:**
3193*912701f9SAndroid Build Coastguard Worker
3194*912701f9SAndroid Build Coastguard Worker> A string range X _sep_ Y is valid iff len(X) ≥ len(Y) > 0, where len(X) is the length of X in code points.
3195*912701f9SAndroid Build Coastguard Worker>
3196*912701f9SAndroid Build Coastguard Worker> _There may be additional, domain-specific requirements for validity of the expansion of the string range._
3197*912701f9SAndroid Build Coastguard Worker
3198*912701f9SAndroid Build Coastguard Worker**Interpretation:**
3199*912701f9SAndroid Build Coastguard Worker
3200*912701f9SAndroid Build Coastguard Worker1. Break X into P and S, where len(S) = len(Y)
3201*912701f9SAndroid Build Coastguard Worker   * Note that P will be an empty string if the lengths of X and Y are equal.
3202*912701f9SAndroid Build Coastguard Worker2. Form the combinations of all P+(s₀..y₀)+(s₁..y₁)+...(sₙ..yₙ)
3203*912701f9SAndroid Build Coastguard Worker   * s₀ is the first code point in S, etc.
3204*912701f9SAndroid Build Coastguard Worker
3205*912701f9SAndroid Build Coastguard Worker**Examples:**
3206*912701f9SAndroid Build Coastguard Worker
3207*912701f9SAndroid Build Coastguard Worker<!-- HTML: no th -->
3208*912701f9SAndroid Build Coastguard Worker<table><tbody>
3209*912701f9SAndroid Build Coastguard Worker<tr><td>ab-ad</td><td>→</td><td>ab ac ad</td></tr>
3210*912701f9SAndroid Build Coastguard Worker<tr><td>ab-d</td><td>→</td><td>ab ac ad</td></tr>
3211*912701f9SAndroid Build Coastguard Worker<tr><td>ab-cd</td><td>→</td><td>ab ac ad bb bc bd cb cc cd</td></tr>
<tr><td>👦🏻-👦🏿</td><td>→</td><td>👦🏻 👦🏼 👦🏽 👦🏾 👦🏿</td></tr>
<tr><td>👦🏻-🏿</td><td>→</td><td>👦🏻 👦🏼 👦🏽 👦🏾 👦🏿</td></tr>
3214*912701f9SAndroid Build Coastguard Worker</tbody></table>
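
The validity check and the interpretation steps above can be sketched in Python, where `len()` on a `str` already counts code points. The function name `expand_string_range` is hypothetical, and the sketch leaves out any domain-specific validity requirements.

```python
from itertools import product


def expand_string_range(x: str, y: str) -> list[str]:
    """Expand the string range X sep Y into its list of strings."""
    if not (len(x) >= len(y) > 0):
        raise ValueError("invalid string range: need len(X) >= len(Y) > 0")
    # Step 1: break X into prefix P and suffix S, with len(S) == len(Y).
    split = len(x) - len(y)
    prefix, suffix = x[:split], x[split:]
    # Step 2: one column of code points per position, s_i .. y_i inclusive.
    # (If s_i > y_i the column is empty and so is the result; the text
    # above does not define that case.)
    columns = [
        [chr(cp) for cp in range(ord(s), ord(e) + 1)]
        for s, e in zip(suffix, y)
    ]
    return [prefix + "".join(combo) for combo in product(*columns)]


print(" ".join(expand_string_range("ab", "ad")))  # ab ac ad
print(" ".join(expand_string_range("ab", "d")))   # ab ac ad
print(" ".join(expand_string_range("ab", "cd")))  # ab ac ad bb bc bd cb cc cd
```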
3215*912701f9SAndroid Build Coastguard Worker
3216*912701f9SAndroid Build Coastguard Worker### <a name="Identity_Elements" href="#Identity_Elements">Identity Elements</a>
3217*912701f9SAndroid Build Coastguard Worker
3218*912701f9SAndroid Build Coastguard Worker```xml
3219*912701f9SAndroid Build Coastguard Worker<!ELEMENT identity (alias | (version, generation?, language, script?, territory?, variant?, special*) ) >
3220*912701f9SAndroid Build Coastguard Worker```
3221*912701f9SAndroid Build Coastguard Worker
3222*912701f9SAndroid Build Coastguard WorkerThe `identity` element contains information identifying the target locale for this data, and general information about the version of this data.
3223*912701f9SAndroid Build Coastguard Worker
3224*912701f9SAndroid Build Coastguard Worker```xml
3225*912701f9SAndroid Build Coastguard Worker<version number="$Revision: 1.227 $">
3226*912701f9SAndroid Build Coastguard Worker```
3227*912701f9SAndroid Build Coastguard Worker
3228*912701f9SAndroid Build Coastguard WorkerThe `version` element provides, in an attribute, the version of this file.  The contents of the element can contain textual notes about the changes between this version and the last. For example:
3229*912701f9SAndroid Build Coastguard Worker
3230*912701f9SAndroid Build Coastguard Worker> ```xml
3231*912701f9SAndroid Build Coastguard Worker> <version number="1.1">Various notes and changes in version 1.1</version>
3232*912701f9SAndroid Build Coastguard Worker> ```
3233*912701f9SAndroid Build Coastguard Worker>
3234*912701f9SAndroid Build Coastguard Worker> This is not to be confused with the `version` attribute on the `ldml` element, which tracks the dtd version.
3235*912701f9SAndroid Build Coastguard Worker
3236*912701f9SAndroid Build Coastguard Worker```xml
3237*912701f9SAndroid Build Coastguard Worker<generation date="$Date: 2007/07/17 23:41:16 $" />
3238*912701f9SAndroid Build Coastguard Worker```
3239*912701f9SAndroid Build Coastguard Worker
3240*912701f9SAndroid Build Coastguard WorkerThe `generation` element is now deprecated. It was used to contain the last modified date for the data. This could be in two formats: ISO 8601 format, or CVS format (illustrated by the example above).
3241*912701f9SAndroid Build Coastguard Worker
3242*912701f9SAndroid Build Coastguard Worker```xml
3243*912701f9SAndroid Build Coastguard Worker<language type="en" />
3244*912701f9SAndroid Build Coastguard Worker```
3245*912701f9SAndroid Build Coastguard Worker
3246*912701f9SAndroid Build Coastguard WorkerThe language code is the primary part of the specification of the locale id, with values as described above.
3247*912701f9SAndroid Build Coastguard Worker
3248*912701f9SAndroid Build Coastguard Worker```xml
3249*912701f9SAndroid Build Coastguard Worker<script type="Latn" />
3250*912701f9SAndroid Build Coastguard Worker```
3251*912701f9SAndroid Build Coastguard Worker
3252*912701f9SAndroid Build Coastguard WorkerThe script code may be used in the identification of written languages, with values described above.
3253*912701f9SAndroid Build Coastguard Worker
3254*912701f9SAndroid Build Coastguard Worker```xml
3255*912701f9SAndroid Build Coastguard Worker<territory type="US" />
3256*912701f9SAndroid Build Coastguard Worker```
3257*912701f9SAndroid Build Coastguard Worker
3258*912701f9SAndroid Build Coastguard WorkerThe territory code is a common part of the specification of the locale id, with values as described above.
3259*912701f9SAndroid Build Coastguard Worker
3260*912701f9SAndroid Build Coastguard Worker```xml
3261*912701f9SAndroid Build Coastguard Worker<variant type="NYNORSK" />
3262*912701f9SAndroid Build Coastguard Worker```
3263*912701f9SAndroid Build Coastguard Worker
3264*912701f9SAndroid Build Coastguard WorkerThe variant code is the tertiary part of the specification of the locale id, with values as described above.
3265*912701f9SAndroid Build Coastguard Worker
3266*912701f9SAndroid Build Coastguard WorkerWhen combined according to the rules described in _[Unicode Language and Locale Identifiers](#Unicode_Language_and_Locale_Identifiers)_, the `language` element, along with any of the optional `script`, `territory`, and `variant` elements, must identify a known, stable locale identifier. Otherwise, it is an error.
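
As an illustration of how the `identity` children combine, the following Python sketch reads the elements in DTD order and joins their `type` values. The function name `locale_id` and the `_` separator are assumptions of the sketch. It does not check that the result is a known, stable locale identifier.

```python
import xml.etree.ElementTree as ET

# Sample data assembled from the element examples above.
IDENTITY = """
<identity>
    <version number="1.1"/>
    <language type="en"/>
    <script type="Latn"/>
    <territory type="US"/>
</identity>
"""


def locale_id(identity_xml: str, sep: str = "_") -> str:
    """Join language, script, territory and variant into one identifier."""
    identity = ET.fromstring(identity_xml)
    if identity.find("language") is None:
        raise ValueError("identity requires a language element")
    parts = []
    for tag in ("language", "script", "territory", "variant"):
        element = identity.find(tag)
        if element is not None:  # script, territory, variant are optional
            parts.append(element.get("type"))
    return sep.join(parts)


print(locale_id(IDENTITY))  # en_Latn_US
```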
3267*912701f9SAndroid Build Coastguard Worker
3268*912701f9SAndroid Build Coastguard Worker### <a name="Valid_Attribute_Values" href="#Valid_Attribute_Values">Valid Attribute Values</a>
3269*912701f9SAndroid Build Coastguard Worker
The [DTD Annotations](#DTD_Annotations) are used to determine whether elements, attributes, or attribute values are valid (or deprecated).
3271*912701f9SAndroid Build Coastguard Worker
3272*912701f9SAndroid Build Coastguard Worker### <a name="Canonical_Form" href="#Canonical_Form">Canonical Form</a>
3273*912701f9SAndroid Build Coastguard Worker
3274*912701f9SAndroid Build Coastguard WorkerThe following are restrictions on the format of LDML files to allow for easier parsing and comparison of files.
3275*912701f9SAndroid Build Coastguard Worker
3276*912701f9SAndroid Build Coastguard WorkerPeer elements have consistent order. That is, if the DTD or this specification requires the following order in an element `foo`:
3277*912701f9SAndroid Build Coastguard Worker
3278*912701f9SAndroid Build Coastguard Worker```xml
3279*912701f9SAndroid Build Coastguard Worker<foo>
3280*912701f9SAndroid Build Coastguard Worker    <pattern>
3281*912701f9SAndroid Build Coastguard Worker    <somethingElse>
3282*912701f9SAndroid Build Coastguard Worker</foo>
3283*912701f9SAndroid Build Coastguard Worker```
3284*912701f9SAndroid Build Coastguard Worker
3285*912701f9SAndroid Build Coastguard WorkerIt can never require the reverse order in a different element `bar`.
3286*912701f9SAndroid Build Coastguard Worker
3287*912701f9SAndroid Build Coastguard Worker```xml
3288*912701f9SAndroid Build Coastguard Worker<bar>
3289*912701f9SAndroid Build Coastguard Worker    <somethingElse>
3290*912701f9SAndroid Build Coastguard Worker    <pattern>
3291*912701f9SAndroid Build Coastguard Worker</bar>
3292*912701f9SAndroid Build Coastguard Worker```
3293*912701f9SAndroid Build Coastguard Worker
3294*912701f9SAndroid Build Coastguard WorkerNote that there was one case that had to be corrected in order to make this true. For that reason, pattern occurs twice under currency:
3295*912701f9SAndroid Build Coastguard Worker
3296*912701f9SAndroid Build Coastguard Worker```xml
3297*912701f9SAndroid Build Coastguard Worker<!ELEMENT currency (alias | (pattern*, displayName?, symbol?, pattern*, decimal?, group?, special*)) >
3298*912701f9SAndroid Build Coastguard Worker```
3299*912701f9SAndroid Build Coastguard Worker
[XML](https://www.w3.org/TR/REC-xml/) files can have a wide variation in textual form, while representing precisely the same data. Putting the LDML files in the repository into a canonical form allows the simple diff tools in wide use (including those in CVS) to detect differences when vetting changes, without those tools being confused. This is not a requirement on other uses of LDML; it is simply a way to manage repository data more easily.
3301*912701f9SAndroid Build Coastguard Worker
3302*912701f9SAndroid Build Coastguard Worker#### <a name="Content" href="#Content">Content</a>
3303*912701f9SAndroid Build Coastguard Worker
3304*912701f9SAndroid Build Coastguard Worker1.  All start elements are on their own line, indented by _depth_ tabs.
3305*912701f9SAndroid Build Coastguard Worker2.  All end elements (except for leaf nodes) are on their own line, indented by _depth_ tabs.
3306*912701f9SAndroid Build Coastguard Worker3.  Any leaf node with empty content is in the form `<foo/>`.
3307*912701f9SAndroid Build Coastguard Worker4.  There are no blank lines except within comments or content.
3308*912701f9SAndroid Build Coastguard Worker5.  Spaces are used within a start element. There are no extra spaces within elements.
3309*912701f9SAndroid Build Coastguard Worker    * `<version number="1.2"/>`, not `<version  number = "1.2" />`
3310*912701f9SAndroid Build Coastguard Worker    * `</identity>`, not `</identity >`
3311*912701f9SAndroid Build Coastguard Worker6.  All attribute values use double quote ("), not single (').
3312*912701f9SAndroid Build Coastguard Worker7.  There are no CDATA sections, and no escapes except those absolutely required.
3313*912701f9SAndroid Build Coastguard Worker    * no `&apos;` since it is not necessary
3314*912701f9SAndroid Build Coastguard Worker    * no `'&#x61;'`, it would be just `'a'`
3315*912701f9SAndroid Build Coastguard Worker8.  All attributes with defaulted values are suppressed.
3316*912701f9SAndroid Build Coastguard Worker9.  The draft and `alt="proposed.*"` attributes are only on leaf elements.
3317*912701f9SAndroid Build Coastguard Worker10. The tzid are canonicalized in the following way:
3318*912701f9SAndroid Build Coastguard Worker    * All tzids as of CLDR 1.1 (2004.06.08) in zone.tab are canonical.
3319*912701f9SAndroid Build Coastguard Worker    * After that point, the first time a tzid is introduced, that is the canonical form.
3320*912701f9SAndroid Build Coastguard Worker
3321*912701f9SAndroid Build Coastguard Worker    That is, new IDs are added, but existing ones keep the original form. The _TZ_ timezone database keeps a set of equivalences in the "backward" file. These are used to map other tzids to the canonical form. For example, when `America/Argentina/Catamarca` was introduced as the new name for the previous `America/Catamarca` , a link was added in the backward file.
3322*912701f9SAndroid Build Coastguard Worker
3323*912701f9SAndroid Build Coastguard Worker    `Link America/Argentina/Catamarca America/Catamarca`
3324*912701f9SAndroid Build Coastguard Worker
3325*912701f9SAndroid Build Coastguard Worker_Example:_
3326*912701f9SAndroid Build Coastguard Worker
3327*912701f9SAndroid Build Coastguard Worker```xml
3328*912701f9SAndroid Build Coastguard Worker<ldml draft="unconfirmed" >
3329*912701f9SAndroid Build Coastguard Worker    <identity>
3330*912701f9SAndroid Build Coastguard Worker        <version number="1.2" />
3331*912701f9SAndroid Build Coastguard Worker        <language type="en" />
3332*912701f9SAndroid Build Coastguard Worker        <territory type="AS" />
3333*912701f9SAndroid Build Coastguard Worker    </identity>
3334*912701f9SAndroid Build Coastguard Worker    <numbers>
3335*912701f9SAndroid Build Coastguard Worker        <currencyFormats>
3336*912701f9SAndroid Build Coastguard Worker            <currencyFormatLength>
3337*912701f9SAndroid Build Coastguard Worker                <currencyFormat>
3338*912701f9SAndroid Build Coastguard Worker                    <pattern>¤#,##0.00;(¤#,##0.00)</pattern>
3339*912701f9SAndroid Build Coastguard Worker                </currencyFormat>
3340*912701f9SAndroid Build Coastguard Worker            </currencyFormatLength>
3341*912701f9SAndroid Build Coastguard Worker        </currencyFormats>
3342*912701f9SAndroid Build Coastguard Worker    </numbers>
3343*912701f9SAndroid Build Coastguard Worker</ldml>
3344*912701f9SAndroid Build Coastguard Worker```
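
Rule 10 above (tzid canonicalization) can be sketched in Python. The sketch assumes that the set of canonical tzids is already known from elsewhere, and it reads only `Link` lines of the shape shown in that rule. The function name `alias_map` is hypothetical.

```python
def alias_map(backward_lines, canonical_ids):
    """Map non-canonical tzids to canonical ones using 'Link' lines.

    Each link names two equivalent tzids; whichever one is in
    `canonical_ids` (assumed to be supplied by the caller) is the target.
    """
    mapping = {}
    for line in backward_lines:
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) != 3 or fields[0] != "Link":
            continue
        first, second = fields[1], fields[2]
        if first in canonical_ids and second not in canonical_ids:
            mapping[second] = first
        elif second in canonical_ids and first not in canonical_ids:
            mapping[first] = second
    return mapping


links = ["Link America/Argentina/Catamarca America/Catamarca"]
canonical = {"America/Catamarca"}  # the form that was introduced first
print(alias_map(links, canonical))
# {'America/Argentina/Catamarca': 'America/Catamarca'}
```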
3345*912701f9SAndroid Build Coastguard Worker
3346*912701f9SAndroid Build Coastguard Worker#### <a name="Ordering" href="#Ordering">Ordering</a>
3347*912701f9SAndroid Build Coastguard Worker
3348*912701f9SAndroid Build Coastguard WorkerAn element is ordered first by the element name, and then if the element names are identical, by the sorted set of attribute-value pairs. For the latter, compare the first pair in each (in sorted order by attribute pair). If not identical, go to the second pair, and so on.
3349*912701f9SAndroid Build Coastguard Worker
3350*912701f9SAndroid Build Coastguard WorkerElements and attributes are ordered according to their order in the respective DTDs. Attribute value comparison is a bit more complicated, and may depend on the attribute and type. This is currently done with specific ordering tables.
3351*912701f9SAndroid Build Coastguard Worker
3352*912701f9SAndroid Build Coastguard WorkerAny future additions to the DTD must be structured so as to allow compatibility with this ordering. See also [Valid Attribute Values.](#Valid_Attribute_Values)
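
The ordering rule can be sketched as a Python sort key. The order lists below are hypothetical stand-ins for the order given by a DTD. Attribute values are compared as plain strings here, which is a simplification of the specific ordering tables mentioned above.

```python
# Hypothetical DTD orders, for illustration only.
ELEMENT_ORDER = ["language", "script", "territory", "variant"]
ATTRIBUTE_ORDER = ["type", "alt", "draft"]


def sort_key(element):
    """Order by element name, then by the sorted attribute-value pairs."""
    name, attributes = element
    pairs = sorted(
        (ATTRIBUTE_ORDER.index(attr), value)
        for attr, value in attributes.items()
    )
    return (ELEMENT_ORDER.index(name), pairs)


elements = [
    ("territory", {"type": "US"}),
    ("language", {"type": "fr"}),
    ("language", {"alt": "short", "type": "en"}),
    ("language", {"type": "en"}),
]
for name, attributes in sorted(elements, key=sort_key):
    print(name, attributes)
```

Pairs are compared one at a time, so two elements that share a first pair are separated by their second pair, and an element with no further pairs sorts first.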
3353*912701f9SAndroid Build Coastguard Worker
3354*912701f9SAndroid Build Coastguard Worker#### <a name="Comments" href="#Comments">Comments</a>
3355*912701f9SAndroid Build Coastguard Worker
3356*912701f9SAndroid Build Coastguard Worker1. Comments are of the form `<!-- stuff -->`.
3357*912701f9SAndroid Build Coastguard Worker2. They are logically attached to a node. There are 4 kinds:
   1. Inline comments always appear after a leaf node, at the end of the same line. They are a single line.
   2. Preblock comments always precede the attachment node, and are indented on the same level.
   3. Postblock comments always follow the attachment node, and are indented on the same level.
   4. A final comment appears after `</ldml>`.
3362*912701f9SAndroid Build Coastguard Worker3. Multiline comments (except the final comment) have each line after the first indented to one deeper level.
3363*912701f9SAndroid Build Coastguard Worker
3364*912701f9SAndroid Build Coastguard Worker**Examples:**
3365*912701f9SAndroid Build Coastguard Worker
3366*912701f9SAndroid Build Coastguard Worker```xml
3367*912701f9SAndroid Build Coastguard Worker<eraAbbr>
3368*912701f9SAndroid Build Coastguard Worker    <era type="0">BC</era> <!-- might add alternate BDE in the future -->
3369*912701f9SAndroid Build Coastguard Worker...
3370*912701f9SAndroid Build Coastguard Worker<timeZoneNames>
3371*912701f9SAndroid Build Coastguard Worker    <!-- Note: zones that do not use daylight time need further work -->
3372*912701f9SAndroid Build Coastguard Worker    <zone type="America/Los_Angeles">
3373*912701f9SAndroid Build Coastguard Worker    ...
3374*912701f9SAndroid Build Coastguard Worker    <!-- Note: the following is known to be sparse,
3375*912701f9SAndroid Build Coastguard Worker            and needs to be improved in the future -->
3376*912701f9SAndroid Build Coastguard Worker    <zone type="Asia/Jerusalem">
3377*912701f9SAndroid Build Coastguard Worker```
3378*912701f9SAndroid Build Coastguard Worker
3379*912701f9SAndroid Build Coastguard Worker### <a name="DTD_Annotations" href="#DTD_Annotations">DTD Annotations</a>
3380*912701f9SAndroid Build Coastguard Worker
3381*912701f9SAndroid Build Coastguard WorkerThe information in a standard DTD is insufficient for use in CLDR. To make up for that, DTD annotations are added. These are of the form
3382*912701f9SAndroid Build Coastguard Worker
3383*912701f9SAndroid Build Coastguard Worker```xml
<!--@...-->
3385*912701f9SAndroid Build Coastguard Worker```
3386*912701f9SAndroid Build Coastguard Worker
3387*912701f9SAndroid Build Coastguard Workerand are included below the !ELEMENT or !ATTLIST line that they apply to. The current annotations are:
3388*912701f9SAndroid Build Coastguard Worker
3389*912701f9SAndroid Build Coastguard Worker| Type                 | Description |
3390*912701f9SAndroid Build Coastguard Worker| ---------------------| ----------- |
3391*912701f9SAndroid Build Coastguard Worker| `<!--@VALUE-->`      | The attribute is not distinguishing, and is treated like an element value |
3392*912701f9SAndroid Build Coastguard Worker| `<!--@METADATA-->`   | The attribute is a “comment” on the data, like the draft status. It is not typically used in implementations. |
3393*912701f9SAndroid Build Coastguard Worker| `<!--@ALLOWS_UESC-->`   | The attribute value can be escaped using the `\u` notation. Does not require this notation to be used. |
3394*912701f9SAndroid Build Coastguard Worker| `<!--@ORDERED-->`    | The element's children are ordered, and do not inherit. |
3395*912701f9SAndroid Build Coastguard Worker| `<!--@DEPRECATED-->` | The element or attribute is deprecated, and should not be used. |
3396*912701f9SAndroid Build Coastguard Worker| `<!--@DEPRECATED: attribute-value1, attribute-value2-->` | The attribute values are deprecated, and should not be used. Spaces between tokens are not significant. |
3397*912701f9SAndroid Build Coastguard Worker| `<!--@MATCH:{attribute value constraint}-->` | Requires the attribute value to match the constraint. |
3398*912701f9SAndroid Build Coastguard Worker| `<!--@TECHPREVIEW-->` | The element is a technical preview of a feature and may be changed or removed at any time. |
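
A tool that reads these annotations might extract them with a regular expression, as in the Python sketch below. The DTD fragment is hypothetical and is not taken from a CLDR DTD. The pattern and the function name `annotations` are assumptions of the sketch.

```python
import re

# "<!--@NAME-->" or "<!--@NAME: argument-->"
ANNOTATION = re.compile(r"<!--@([A-Z_]+)(?::\s*(.*?))?\s*-->")

# Hypothetical DTD fragment, for illustration only.
DTD_FRAGMENT = """
<!ELEMENT example ( #PCDATA ) >
<!ATTLIST example type NMTOKEN #REQUIRED >
    <!--@MATCH:literal/a, b-->
<!ATTLIST example draft NMTOKEN #IMPLIED >
    <!--@METADATA-->
    <!--@DEPRECATED: old1, old2-->
"""


def annotations(dtd_text):
    """Return (name, argument) pairs; argument is None when absent."""
    return [(m.group(1), m.group(2)) for m in ANNOTATION.finditer(dtd_text)]


print(annotations(DTD_FRAGMENT))
# [('MATCH', 'literal/a, b'), ('METADATA', None), ('DEPRECATED', 'old1, old2')]
```

Associating each annotation with the `!ELEMENT` or `!ATTLIST` line above it is left out of the sketch.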
3399*912701f9SAndroid Build Coastguard Worker
There is additional information in the attributeValueValidity.xml file that is used internally for testing. For example, the following line indicates that the `type` attribute of the 'currency' element in the ldml dtd must have values from the bcp47 'cu' type.
3401*912701f9SAndroid Build Coastguard Worker
3402*912701f9SAndroid Build Coastguard Worker```xml
3403*912701f9SAndroid Build Coastguard Worker<attributeValues dtds='ldml' elements='currency' attributes='type'>$_bcp47_cu</attributeValues>
3404*912701f9SAndroid Build Coastguard Worker```
3405*912701f9SAndroid Build Coastguard Worker
3406*912701f9SAndroid Build Coastguard WorkerThe element values may be literals, regular expressions, or variables (some of which are set programmatically according to other CLDR data, such as the above). However, the information at this point does not cover all attribute values, is used only for testing, and should not be used in implementations since the structure may change without notice.
3407*912701f9SAndroid Build Coastguard Worker
3408*912701f9SAndroid Build Coastguard Worker#### <a name="match_expressions" href="#match_expressions">Attribute Value Constraints</a>
3409*912701f9SAndroid Build Coastguard Worker
3410*912701f9SAndroid Build Coastguard WorkerThe following are constraints on the attribute values. Note: in future versions, the format may change, and/or the constraints may be tightened.
3411*912701f9SAndroid Build Coastguard Worker
3412*912701f9SAndroid Build Coastguard Worker| Constraint                | Comments |
3413*912701f9SAndroid Build Coastguard Worker| ------------------------- | -------- |
3414*912701f9SAndroid Build Coastguard Worker| any                       | any string value |
3415*912701f9SAndroid Build Coastguard Worker| any/TODO                  | placeholder for future constraints |
3416*912701f9SAndroid Build Coastguard Worker| bcp47/anykey              | any bcp47 key or tkey |
3417*912701f9SAndroid Build Coastguard Worker| bcp47/anyvalue            | any bcp47 value (type) or tvalue |
3418*912701f9SAndroid Build Coastguard Worker| literal/\{literal values} | comma separated |
3419*912701f9SAndroid Build Coastguard Worker| regex/\{regex expression} | valid regex expression |
3420*912701f9SAndroid Build Coastguard Worker| bcp47/\{key or tkey}      | matches possible values for that key or tkey |
3421*912701f9SAndroid Build Coastguard Worker| metazone                  | valid metazone |
| range/\{start_number}~\{end_number} | number between start and end (inclusive) |
| time/\{time or date or date-time pattern} | e.g., HH:mm |
3424*912701f9SAndroid Build Coastguard Worker| unicodeset/\{unicodeset pattern} | valid unicodeset |
3425*912701f9SAndroid Build Coastguard Worker| validity/\{field}         | currency, language, locale, region, script, subdivision, short-unit, unit, variant<br/>The field can be qualified by particular enums, such as:<br/>`validity/unit/regular deprecated`: matches only _deprecated_ and _regular_<br/>`validity/unit/!deprecated`: matches all but _deprecated_ |
| version                   | version with 1 to 4 numeric fields, such as 35.3.9 |
3427*912701f9SAndroid Build Coastguard Worker| set/\{match}              | set of elements that match \{match} |
3428*912701f9SAndroid Build Coastguard Worker| or/\{match1}\|\|\{match2}…  | matches at least one of \{match1}, etc |
3429*912701f9SAndroid Build Coastguard Worker
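As an informal illustration of how a few of these constraints might be evaluated, the following Java sketch handles only `any`, `literal`, `regex`, and `range`. The class and method names are hypothetical, and the tokenization details (whitespace trimmed around literal values, `~` separating the range bounds) are assumptions rather than a description of the CLDR tooling.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the CLDR tooling.
final class ConstraintSketch {
    /** Returns true if value satisfies the constraint, for the few kinds covered here. */
    static boolean matches(String constraint, String value) {
        int slash = constraint.indexOf('/');
        String kind = slash < 0 ? constraint : constraint.substring(0, slash);
        String arg = slash < 0 ? "" : constraint.substring(slash + 1);
        switch (kind) {
            case "any": // also covers the any/TODO placeholder
                return true;
            case "literal": // comma-separated literal values (trimming is an assumption)
                return Arrays.stream(arg.split(",")).map(String::trim).anyMatch(value::equals);
            case "regex": // the whole value must match the expression
                return Pattern.matches(arg, value);
            case "range": { // start~end, inclusive at both ends
                String[] bounds = arg.split("~");
                try {
                    double n = Double.parseDouble(value);
                    return Double.parseDouble(bounds[0]) <= n && n <= Double.parseDouble(bounds[1]);
                } catch (NumberFormatException e) {
                    return false; // a non-numeric value is never in range
                }
            }
            default:
                throw new IllegalArgumentException("Not covered by this sketch: " + constraint);
        }
    }
}
```

For instance, `ConstraintSketch.matches("range/1~5", "3")` is true, while the same constraint rejects `"6"`. Constraints that depend on CLDR data (such as `bcp47/…`, `metazone`, or `validity/…`) are deliberately left out, since they cannot be checked without that data.
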
3430*912701f9SAndroid Build Coastguard Worker
3431*912701f9SAndroid Build Coastguard Worker
3432*912701f9SAndroid Build Coastguard Worker## <a name="Property_Data" href="#Property_Data">Property Data</a>
3433*912701f9SAndroid Build Coastguard Worker
3434*912701f9SAndroid Build Coastguard WorkerSome data in CLDR does not use an XML format, but rather a semicolon-delimited format derived from that of the Unicode Character Database. That is because the data is more likely to be parsed by implementations that already parse UCD data. Those files are present in the common/properties directory.
3435*912701f9SAndroid Build Coastguard Worker
3436*912701f9SAndroid Build Coastguard WorkerEach file has a header that explains the format and usage of the data.
3437*912701f9SAndroid Build Coastguard Worker
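A minimal Java sketch of reading such a file, assuming the usual UCD conventions (`#` starts a comment, `;` separates fields, and surrounding whitespace is not significant). The class name is hypothetical, and the meaning of each field must be taken from the header of the specific file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for semicolon-delimited, UCD-style data lines.
final class PropertyFileSketch {
    /** Returns one trimmed field array per data line, skipping comments and blank lines. */
    static List<String[]> parse(List<String> lines) {
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            // Drop the trailing comment, if any, then surrounding whitespace.
            String data = (hash < 0 ? line : line.substring(0, hash)).trim();
            if (data.isEmpty()) {
                continue; // comment-only or blank line
            }
            String[] fields = data.split(";", -1); // -1 keeps trailing empty fields
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim();
            }
            records.add(fields);
        }
        return records;
    }
}
```
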
3438*912701f9SAndroid Build Coastguard Worker### <a name="Script_Metadata" href="#Script_Metadata">Script Metadata</a>
3439*912701f9SAndroid Build Coastguard Worker
3440*912701f9SAndroid Build Coastguard Worker`scriptMetadata.txt`
3441*912701f9SAndroid Build Coastguard Worker
This file provides general information about scripts that may be useful to implementations processing text. The information is the best currently available, and may change between versions of CLDR. The format is similar to that of a Unicode Character Database property file, and is documented in the header of the data file.
3443*912701f9SAndroid Build Coastguard Worker
3444*912701f9SAndroid Build Coastguard Worker### <a name="Extended_Pictographic" href="#Extended_Pictographic">Extended Pictographic</a>
3445*912701f9SAndroid Build Coastguard Worker
3446*912701f9SAndroid Build Coastguard Worker`ExtendedPictographic.txt`
3447*912701f9SAndroid Build Coastguard Worker
3448*912701f9SAndroid Build Coastguard WorkerThis file was used to define the ExtendedPictographic data used for “future-proofing” emoji behavior, especially in segmentation. As of Emoji version 11.0, the set of Extended_Pictographic is incorporated into the emoji data files found at [unicode.org/Public/emoji/](https://www.unicode.org/Public/emoji/).
3449*912701f9SAndroid Build Coastguard Worker
3450*912701f9SAndroid Build Coastguard Worker### <a name="Labels.txt" href="#Labels.txt">Labels.txt</a>
3451*912701f9SAndroid Build Coastguard Worker
3452*912701f9SAndroid Build Coastguard Worker`labels.txt`
3453*912701f9SAndroid Build Coastguard Worker
This file provides general information about associations of labels to characters that may be useful to implementations of character-picking applications. The information is the best currently available, and may change between versions of CLDR. The format is similar to that of a Unicode Character Database property file, and is documented in the header of the data file.
3455*912701f9SAndroid Build Coastguard Worker
3456*912701f9SAndroid Build Coastguard WorkerInitially, the contents are focused on emoji, but may be expanded in the future to other types of characters. Note that a character may have multiple labels.
3457*912701f9SAndroid Build Coastguard Worker
3458*912701f9SAndroid Build Coastguard Worker### <a name="Segmentation_Tests" href="#Segmentation_Tests">Segmentation Tests</a>
3459*912701f9SAndroid Build Coastguard Worker
3460*912701f9SAndroid Build Coastguard WorkerCLDR provides a tailoring to the [Grapheme Cluster Break (gcb)](https://www.unicode.org/reports/tr29/) algorithm to avoid splitting Indic aksaras. The corresponding test files for that are located in common/properties/segments/, along with a readme.txt that provides more details. There are also specific test files for the supported Indic scripts in the unittest directory.
3461*912701f9SAndroid Build Coastguard Worker
3462*912701f9SAndroid Build Coastguard Worker
3463*912701f9SAndroid Build Coastguard Worker
3464*912701f9SAndroid Build Coastguard Worker## <a name="Format_Parse_Issues" href="#Format_Parse_Issues">Issues in Formatting and Parsing</a>
3465*912701f9SAndroid Build Coastguard Worker
3466*912701f9SAndroid Build Coastguard Worker### <a name="Lenient_Parsing" href="#Lenient_Parsing">Lenient Parsing</a>
3467*912701f9SAndroid Build Coastguard Worker
3468*912701f9SAndroid Build Coastguard Worker#### <a name="Motivation" href="#Motivation">Motivation</a>
3469*912701f9SAndroid Build Coastguard Worker
3470*912701f9SAndroid Build Coastguard WorkerUser input is frequently messy. Attempting to parse it by matching it exactly against a pattern is likely to be unsuccessful, even when the meaning of the input is clear to a human being. For example, for a date pattern of "MM/dd/yy", the input "June 1, 2006" will fail.
3471*912701f9SAndroid Build Coastguard Worker
3472*912701f9SAndroid Build Coastguard WorkerThe goal of lenient parsing is to accept user input whenever it is possible to decipher what the user intended. Doing so requires using patterns as data to guide the parsing process, rather than an exact template that must be matched. This informative section suggests some heuristics that may be useful for lenient parsing of dates, times, and numbers.
3473*912701f9SAndroid Build Coastguard Worker
3474*912701f9SAndroid Build Coastguard Worker#### <a name="Loose_Matching" href="#Loose_Matching">Loose Matching</a>
3475*912701f9SAndroid Build Coastguard Worker
3476*912701f9SAndroid Build Coastguard WorkerLoose matching ignores attributes of the strings being compared that are not important to matching. It involves the following steps:
3477*912701f9SAndroid Build Coastguard Worker
3478*912701f9SAndroid Build Coastguard Worker* Remove "." from currency symbols and other fields used for matching, and also from the input string unless:
3479*912701f9SAndroid Build Coastguard Worker  * "." is in the decimal set, and
3480*912701f9SAndroid Build Coastguard Worker  * its position in the input string is immediately before a decimal digit
3481*912701f9SAndroid Build Coastguard Worker* Ignore all format characters: in particular, ignore any RLM, LRM or ALM used to control BIDI formatting.
3482*912701f9SAndroid Build Coastguard Worker* Ignore all characters in [:Zs:] unless they occur between letters. (In the heuristics below, even those between letters are ignored except to delimit fields)
3483*912701f9SAndroid Build Coastguard Worker* Map all characters in [:Dash:] to U+002D HYPHEN-MINUS
3484*912701f9SAndroid Build Coastguard Worker* Use the data in the `<character-fallback>` element to map equivalent characters (for example, curly to straight apostrophes). Other apostrophe-like characters should also be treated as equivalent, especially if the character actually used in a format may be unavailable on some keyboards. For example:
3485*912701f9SAndroid Build Coastguard Worker  * U+02BB MODIFIER LETTER TURNED COMMA (ʻ) might be typed instead as U+2018 LEFT SINGLE QUOTATION MARK (‘).
3486*912701f9SAndroid Build Coastguard Worker  * U+02BC MODIFIER LETTER APOSTROPHE (ʼ) might be typed instead as U+2019 RIGHT SINGLE QUOTATION MARK (’), U+0027 APOSTROPHE, etc.
3487*912701f9SAndroid Build Coastguard Worker  * U+05F3 HEBREW PUNCTUATION GERESH (‎׳) might be typed instead as U+0027 APOSTROPHE.
3488*912701f9SAndroid Build Coastguard Worker* Apply mappings particular to the domain (i.e., for dates or for numbers, discussed in more detail below)
3489*912701f9SAndroid Build Coastguard Worker* Apply case folding (possibly including language-specific mappings such as Turkish i)
3490*912701f9SAndroid Build Coastguard Worker* Normalize to NFKC; thus _no-break space_ will map to _space_; half-width _katakana_ will map to full-width.
3491*912701f9SAndroid Build Coastguard Worker
3492*912701f9SAndroid Build Coastguard WorkerLoose matching involves (logically) applying the above transform to both the input text and to each of the field elements used in matching, before applying the specific heuristics below. For example, if the input number text is " - NA f. 1,000.00", then it is mapped to "-naf1,000.00" before processing. The currency signs are also transformed, so "NA f." is converted to "naf" for purposes of matching. As with other Unicode algorithms, this is a logical statement of the process; actual implementations can optimize, such as by applying the transform incrementally during matching.
3493*912701f9SAndroid Build Coastguard Worker
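The loose-matching transform can be sketched in Java as follows. This is only an approximation under stated assumptions: the class and method names are hypothetical; `[:Dash:]` is approximated by general category Pd plus U+2212 MINUS SIGN; the `<character-fallback>` data is replaced by a small hard-coded apostrophe mapping; all `[:Zs:]` characters are dropped (as in the heuristics, where even those between letters are ignored); domain-specific mappings are omitted; and case folding is approximated by locale-independent lowercasing.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of the loose-matching transform; not a reference implementation.
final class LooseMatchSketch {
    /** Applies the transform to a string; dotIsDecimal says whether "." is in the decimal set. */
    static String normalize(String input, boolean dotIsDecimal) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            int next = i + Character.charCount(cp);
            int type = Character.getType(cp);
            if (cp == '.') {
                // Keep "." only as a decimal separator immediately before a digit in the input.
                boolean beforeDigit = next < input.length()
                        && Character.isDigit(input.codePointAt(next));
                if (dotIsDecimal && beforeDigit) {
                    sb.append('.');
                }
            } else if (type == Character.FORMAT || type == Character.SPACE_SEPARATOR) {
                // Ignore format characters (RLM, LRM, ...) and all Zs characters.
            } else if (type == Character.DASH_PUNCTUATION || cp == 0x2212) {
                sb.append('-'); // approximation of [:Dash:] mapped to U+002D
            } else if (cp == 0x2018 || cp == 0x2019 || cp == 0x02BB
                    || cp == 0x02BC || cp == 0x05F3) {
                sb.append('\''); // treat apostrophe-like characters as equivalent
            } else {
                sb.appendCodePoint(cp);
            }
            i = next;
        }
        // Lowercasing stands in for case folding; then normalize to NFKC.
        String folded = sb.toString().toLowerCase(Locale.ROOT);
        return Normalizer.normalize(folded, Normalizer.Form.NFKC);
    }
}
```

Applied to the example above, `LooseMatchSketch.normalize(" - NA f. 1,000.00", true)` yields `-naf1,000.00`, and the currency sign `NA f.` becomes `naf`. The "." test is made against the original input before spaces are dropped, which is why the "." after "f" is removed while the decimal "." is kept.
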
3494*912701f9SAndroid Build Coastguard Worker### <a name="Invalid_Patterns" href="#Invalid_Patterns">Handling Invalid Patterns</a>
3495*912701f9SAndroid Build Coastguard Worker
3496*912701f9SAndroid Build Coastguard WorkerProcesses sometimes encounter invalid number or date patterns, such as a number pattern with “¤¤¤¤¤” (valid pattern character but invalid length in current CLDR), a date pattern with “nn” (invalid pattern character in current CLDR), or a date pattern with “MMMMMM” (invalid length in current CLDR). The recommended behavior for handling such an invalid pattern field is:
3497*912701f9SAndroid Build Coastguard Worker
3498*912701f9SAndroid Build Coastguard Worker* For a field using a currently-invalid length for a valid pattern character:
3499*912701f9SAndroid Build Coastguard Worker  * In **formatting,** emit U+FFFD REPLACEMENT CHARACTER for the invalid field.
3500*912701f9SAndroid Build Coastguard Worker  * In **parsing,** the field may be parsed as if it had a valid length.
3501*912701f9SAndroid Build Coastguard Worker* For a pattern that contains a currently-invalid pattern character (applies only to date patterns, for which A-Za-z are reserved as pattern characters but not all defined as valid):
3502*912701f9SAndroid Build Coastguard Worker  * Produce an error (set an error code or throw an exception) when an attempt is made to create a formatter with such a pattern or to apply such a pattern to an existing formatter.
3503*912701f9SAndroid Build Coastguard Worker
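A rough Java sketch of the recommended behavior for date patterns, under several assumptions: the class name and the small table of maximum field lengths are illustrative and far from complete, quoted literal text in patterns is not handled, and the returned string merely stands in for formatter output by echoing valid fields unchanged.

```java
import java.util.Map;

// Hypothetical sketch; the table below covers only three pattern characters.
final class InvalidPatternSketch {
    // Pattern character -> maximum valid field length (illustrative subset).
    static final Map<Character, Integer> MAX_LENGTH =
            Map.of('y', Integer.MAX_VALUE, 'M', 5, 'd', 2);

    /** Echoes the pattern, replacing each field of invalid length with U+FFFD. */
    static String checkFields(String pattern) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            int j = i;
            while (j < pattern.length() && pattern.charAt(j) == c) {
                j++; // extend over the run of identical characters
            }
            boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!letter) {
                out.append(pattern, i, j); // literal text such as "/" or " "
            } else if (!MAX_LENGTH.containsKey(c)) {
                // Invalid pattern character: refuse the whole pattern.
                throw new IllegalArgumentException("Invalid pattern character: " + c);
            } else if (j - i > MAX_LENGTH.get(c)) {
                out.append('\uFFFD'); // valid character, invalid length
            } else {
                out.append(pattern, i, j);
            }
            i = j;
        }
        return out.toString();
    }
}
```

With this table, `"MMMMMM d"` produces U+FFFD followed by `" d"`, while `"nn"` raises an exception when the pattern is checked, before any formatting takes place.
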
3504*912701f9SAndroid Build Coastguard Worker## <a name="Data_Size" href="#Data_Size">Data Size Reduction</a>
3505*912701f9SAndroid Build Coastguard WorkerSoftware implementations may have constrained memory requirements.
3506*912701f9SAndroid Build Coastguard WorkerThe following outlines some techniques for filtering out CLDR data for a particular implementation.
3507*912701f9SAndroid Build Coastguard WorkerThe exact filtering would depend on the particular requirements of the implementation in question, of course.
3508*912701f9SAndroid Build Coastguard Worker
3509*912701f9SAndroid Build Coastguard WorkerLocale data can be _sliced_ to exclude data not needed by a particular implementation.
This can be _vertical slicing_: excluding a locale and all the locales inheriting from it, or _horizontal slicing_: excluding particular types of data from all locales.
3511*912701f9SAndroid Build Coastguard WorkerFor example:
3512*912701f9SAndroid Build Coastguard Worker  * A vertical slice could retain only those locales used in a particular set of markets, such as EU locales.
3513*912701f9SAndroid Build Coastguard Worker  * A horizontal slice could remove all data in the emoji/ directory, which are annotations for emoji and symbols.
3514*912701f9SAndroid Build Coastguard Worker
3515*912701f9SAndroid Build Coastguard WorkerOf course, both of these techniques can be applied.
3516*912701f9SAndroid Build Coastguard Worker
3517*912701f9SAndroid Build Coastguard Worker### <a name="Vertical_Slicing" href="#Vertical_Slicing">Vertical Slicing</a>
3518*912701f9SAndroid Build Coastguard Worker
3519*912701f9SAndroid Build Coastguard WorkerThe choice of locales to include depends very much upon particular implementations.
3520*912701f9SAndroid Build Coastguard WorkerSome information that might be useful for determining the choice is found in the
3521*912701f9SAndroid Build Coastguard Worker [Supplemental Territory Information](tr35-info.md#Supplemental_Territory_Information),
3522*912701f9SAndroid Build Coastguard Workerwhich provides information on the use of languages in different countries/regions.
3523*912701f9SAndroid Build Coastguard Worker(For a human-readable chart, see [Territory-Language Information](https://unicode-org.github.io/cldr-staging/charts/latest/supplemental/territory_language_information.html).)
3524*912701f9SAndroid Build Coastguard Worker
3525*912701f9SAndroid Build Coastguard WorkerIt is important to note that if a particular locale is in a vertical slice, then all of its parents should be as well, because of inheritance.
3526*912701f9SAndroid Build Coastguard WorkerThis is not a factor if the data is fully resolved, as in the JSON format data.
3527*912701f9SAndroid Build Coastguard Worker
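The parent requirement can be sketched in Java as a closure over the chosen locales. This sketch assumes that the parent of a locale is found by simple truncation at the last underscore, ending at `root`; it ignores any explicit parent overrides in the supplemental data, which a real implementation would need to consult. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: truncation-only inheritance, no explicit parent overrides.
final class VerticalSliceSketch {
    /** Parent by truncation: "de_CH" -> "de", "de" -> "root". */
    static String truncationParent(String locale) {
        int i = locale.lastIndexOf('_');
        return i < 0 ? "root" : locale.substring(0, i);
    }

    /** Returns the wanted locales together with every locale they inherit from. */
    static Set<String> withParents(Set<String> wanted) {
        Set<String> result = new TreeSet<>();
        for (String locale : wanted) {
            String current = locale;
            // Stop at root, or as soon as a locale (and thus its chain) is already present.
            while (result.add(current) && !current.equals("root")) {
                current = truncationParent(current);
            }
        }
        return result;
    }
}
```

For example, a slice requested as `de_CH` and `sr_Cyrl_RS` expands to `de`, `de_CH`, `root`, `sr`, `sr_Cyrl`, and `sr_Cyrl_RS`.
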
3528*912701f9SAndroid Build Coastguard WorkerSlicing can also remove related supplemental data.
3529*912701f9SAndroid Build Coastguard WorkerFor example, the likely subtags data includes a large number of languages that may not be of interest for all implementations.
Where an implementation includes only (say) the CLDR locales at Basic coverage in [Unicode CLDR - Coverage Levels](https://cldr.unicode.org/index/cldr-spec/coverage-levels)
3531*912701f9SAndroid Build Coastguard Worker(and locales inheriting from them), the likely subtag data that doesn’t match can be filtered out.
3532*912701f9SAndroid Build Coastguard Worker
3533*912701f9SAndroid Build Coastguard Worker### <a name="Horizontal_Slicing" href="#Horizontal_Slicing">Horizontal Slicing</a>
3534*912701f9SAndroid Build Coastguard Worker
3535*912701f9SAndroid Build Coastguard WorkerThe main reason to perform horizontal slicing is when a particular feature is not used, so the implementation wants to remove the data required for powering that feature.
3536*912701f9SAndroid Build Coastguard WorkerFor example, if an application isn't performing date formatting, it can remove all date formatting data (transitively).
3537*912701f9SAndroid Build Coastguard WorkerIt must take care to retain data used by other features: in the previous example, the number formatting data where currencies are being formatted.
3538*912701f9SAndroid Build Coastguard Worker
3539*912701f9SAndroid Build Coastguard WorkerLocales may also have data on a field-by-field basis that is reasonable to filter out.
3540*912701f9SAndroid Build Coastguard WorkerFor example, locales that meet the Modern level of coverage typically also include some data at a Comprehensive level.
3541*912701f9SAndroid Build Coastguard WorkerThat data is not typically needed for most implementations, and can typically be filtered out.
3542*912701f9SAndroid Build Coastguard WorkerFor example, in CLDR version 43, 58% of the script names (`//ldml/localeDisplayNames/scripts/script[@type="*"]`) are at the Comprehensive level;
in fact, ~20% of all values for the Modern level locales are at the Comprehensive level.
3544*912701f9SAndroid Build Coastguard Worker
3545*912701f9SAndroid Build Coastguard WorkerThe easiest way to do that is to use the CLDR Java tooling (the `cldr-code` package) to filter the data before generating the implementation's data format.
That approach gives the implementation direct access to the CoverageLevel code, which can determine the coverage level for a given locale and path.
3547*912701f9SAndroid Build Coastguard WorkerOnce the data is transformed, such as to the JSON format, the CoverageLevel code is no longer accessible.
3548*912701f9SAndroid Build Coastguard WorkerFor example, here is a code snippet:
3549*912701f9SAndroid Build Coastguard Worker
```java
// Supplemental data, loaded once through the CLDR tooling.
private static final SupplementalDataInfo SUPPLEMENTAL_DATA_INFO = CLDRConfig.getInstance().getSupplementalDataInfo();
...
    // Keep the path only if its coverage level is at or below the chosen level.
    Level pathLevel = SUPPLEMENTAL_DATA_INFO.getCoverageLevel(path, locale);
    if (minimumPathCoverage.compareTo(pathLevel) >= 0) {
        include(path);
    }
```
3558*912701f9SAndroid Build Coastguard Worker
3559*912701f9SAndroid Build Coastguard WorkerSimilarly, the subdivision translations represent a large body of data that may not be needed for many implementations.
3560*912701f9SAndroid Build Coastguard Worker
3561*912701f9SAndroid Build Coastguard Worker* * *
3562*912701f9SAndroid Build Coastguard Worker
3563*912701f9SAndroid Build Coastguard Worker## <a name="Deprecated_Structure" href="#Deprecated_Structure">Annex A Deprecated Structure</a>
3564*912701f9SAndroid Build Coastguard Worker
The [DTD Annotations](#DTD_Annotations) are used to determine whether DTD items such as elements, attributes, or attribute values are deprecated.
3566*912701f9SAndroid Build Coastguard Worker
3567*912701f9SAndroid Build Coastguard WorkerThough such deprecated items are still valid LDML, they are strongly discouraged, and are no longer used in CLDR.
3568*912701f9SAndroid Build Coastguard Worker
3569*912701f9SAndroid Build Coastguard WorkerThe CLDR [DTD Deltas](https://unicode-org.github.io/cldr-staging/charts/latest/supplemental/dtd_deltas.html) chart shows which DTD items have been deprecated in which version of CLDR.
3570*912701f9SAndroid Build Coastguard Worker
3571*912701f9SAndroid Build Coastguard WorkerThe remainder of this section describes selected cases of deprecated structure, and what (if any) should be used instead.
3572*912701f9SAndroid Build Coastguard Worker
3573*912701f9SAndroid Build Coastguard Worker### <a name="Fallback_Elements" href="#Fallback_Elements">A.1 Element fallback</a>
3574*912701f9SAndroid Build Coastguard Worker
Implementations should instead use the information in [Language Matching](#LanguageMatching) for language fallback.
3576*912701f9SAndroid Build Coastguard Worker
3577*912701f9SAndroid Build Coastguard Worker### <a name="BCP47_Keyword_Mapping" href="#BCP47_Keyword_Mapping">A.2 BCP 47 Keyword Mapping</a>
3578*912701f9SAndroid Build Coastguard Worker
Instead use the mechanisms described in [U Extension Data Files](#Unicode_Locale_Extension_Data_Files).
3580*912701f9SAndroid Build Coastguard Worker
3581*912701f9SAndroid Build Coastguard Worker### <a name="Choice_Patterns" href="#Choice_Patterns">A.3 Choice Patterns</a>
3582*912701f9SAndroid Build Coastguard Worker
3583*912701f9SAndroid Build Coastguard WorkerInstead use `count` attributes.
3584*912701f9SAndroid Build Coastguard Worker
3585*912701f9SAndroid Build Coastguard Worker### <a name="Element_default" href="#Element_default">A.4 Element default</a>
3586*912701f9SAndroid Build Coastguard Worker
3587*912701f9SAndroid Build Coastguard WorkerInstead use replacement structure, for example:
3588*912701f9SAndroid Build Coastguard Worker
3589*912701f9SAndroid Build Coastguard Worker* For `<collations>`, now use the `<defaultCollation>` element.
3590*912701f9SAndroid Build Coastguard Worker* For `<calendars>`, the default calendar type for a locale is now specified by _[Calendar Preference Data](tr35-dates.md#Calendar_Preference_Data)_.
3591*912701f9SAndroid Build Coastguard Worker
3592*912701f9SAndroid Build Coastguard Worker### <a name="Deprecated_Common_Attributes" href="#Deprecated_Common_Attributes">A.5 Deprecated Common Attributes</a>
3593*912701f9SAndroid Build Coastguard Worker
3594*912701f9SAndroid Build Coastguard Worker#### <a name="Attribute_standard" href="#Attribute_standard">A.5.1 Attribute standard</a>
3595*912701f9SAndroid Build Coastguard Worker
3596*912701f9SAndroid Build Coastguard WorkerInstead, use a `reference` element with the attribute `standard="true"`.
3597*912701f9SAndroid Build Coastguard Worker
3598*912701f9SAndroid Build Coastguard Worker#### <a name="Attribute_draft_nonLeaf" href="#Attribute_draft_nonLeaf">A.5.2 Attribute draft in non-leaf elements</a>
3599*912701f9SAndroid Build Coastguard Worker
The `draft` attribute is deprecated except in leaf elements (elements that do not have any subelements).
3601*912701f9SAndroid Build Coastguard Worker
3602*912701f9SAndroid Build Coastguard Worker### <a name="Element_base" href="#Element_base">A.6 Element base</a>
3603*912701f9SAndroid Build Coastguard Worker
3604*912701f9SAndroid Build Coastguard WorkerInstead use the collation `<import>` element.
3605*912701f9SAndroid Build Coastguard Worker
3606*912701f9SAndroid Build Coastguard Worker### <a name="Element_rules" href="#Element_rules">A.7 Element rules</a>
3607*912701f9SAndroid Build Coastguard Worker
3608*912701f9SAndroid Build Coastguard WorkerInstead use the basic collation syntax with the [`<cr>` element](tr35-collation.md#Rules).
3609*912701f9SAndroid Build Coastguard Worker
3610*912701f9SAndroid Build Coastguard Worker### <a name="Deprecated_subelements_of_dates" href="#Deprecated_subelements_of_dates">A.8 Deprecated subelements of `<dates>`</a>
3611*912701f9SAndroid Build Coastguard Worker
3612*912701f9SAndroid Build Coastguard Worker* `<localizedPatternChars>`
3613*912701f9SAndroid Build Coastguard Worker* `<dateRangePattern>`, replaced by `<intervalFormats>`.
3614*912701f9SAndroid Build Coastguard Worker
3615*912701f9SAndroid Build Coastguard Worker### <a name="Deprecated_subelements_of_calendars" href="#Deprecated_subelements_of_calendars">A.9 Deprecated subelements of `<calendars>`</a>
3616*912701f9SAndroid Build Coastguard Worker
3617*912701f9SAndroid Build Coastguard Worker* The deprecated `<monthNames>` and `<monthAbbr>` are replaced by the `months` element with the context `type="format"` and the width `type="wide"` (for ...Names) and `type="narrow"` (for ...Abbr), respectively.
3618*912701f9SAndroid Build Coastguard Worker* The deprecated `<dayNames>` and `<dayAbbr>` are replaced by the `days` element with the context `type="format"` and the width `type="wide"` (for ...Names) and `type="narrow"` (for ...Abbr), respectively.
* <a name="week" href="#week">`<week>`</a> is deprecated in the main LDML files, because the data is more appropriately organized as connected to territories, not to linguistic data. Use the supplemental `<weekData>` element instead.
* The standalone `<am>` and `<pm>` are deprecated, and the data are instead included as part of the `<dayPeriods>` element.
* `<fields>` is deprecated as a subelement of `<calendars>`; instead, a `<fields>` element should be located just under a `<dates>` element. See [Calendar Fields](tr35-dates.md#Calendar_Fields).
3622*912701f9SAndroid Build Coastguard Worker
3623*912701f9SAndroid Build Coastguard Worker### <a name="Deprecated_subelements_of_timeZoneNames" href="#Deprecated_subelements_of_timeZoneNames">A.10 Deprecated subelements of `<timeZoneNames>`</a>
3624*912701f9SAndroid Build Coastguard Worker
3625*912701f9SAndroid Build Coastguard Worker* `<preferenceOrdering>`: use metazones instead.
* `<singleCountries>`: use [Primary Zones](tr35-dates.md#Primary_Zones) instead.
3627*912701f9SAndroid Build Coastguard Worker* `<hoursFormat>`, <a name="fallbackRegionFormat" href="#fallbackRegionFormat">`<fallbackRegionFormat>`</a>, `<abbreviationFallback>`
3628*912701f9SAndroid Build Coastguard Worker
3629*912701f9SAndroid Build Coastguard Worker### <a name="Deprecated_subelements_of_zone_metazone" href="#Deprecated_subelements_of_zone_metazone">A.11 Deprecated subelements of `<zone>` and `<metazone>`</a>
3630*912701f9SAndroid Build Coastguard Worker
3631*912701f9SAndroid Build Coastguard Worker* `<commonlyUsed>`, formerly used to indicate whether a zone was commonly used in the locale.
3632*912701f9SAndroid Build Coastguard Worker
3633*912701f9SAndroid Build Coastguard Worker### <a name="Renamed_attribute_values_for_contextTransformUsage" href="#Renamed_attribute_values_for_contextTransformUsage">A.12 Renamed attribute values for `<contextTransformUsage>` element</a>
3634*912701f9SAndroid Build Coastguard Worker
3635*912701f9SAndroid Build Coastguard WorkerThe `<contextTransformUsage>` element was introduced in CLDR 21. The values for its `type` attribute are documented in [`<contextTransformUsage>` type attribute values](tr35-general.md#contextTransformUsage_type_attribute_values). In CLDR 25, some of these values were renamed from their previous values for improved clarity:
3636*912701f9SAndroid Build Coastguard Worker
3637*912701f9SAndroid Build Coastguard Worker* `type` was renamed to `keyValue`
3638*912701f9SAndroid Build Coastguard Worker* `displayName` was renamed to `currencyName`
3639*912701f9SAndroid Build Coastguard Worker* `displayName-count` was renamed to `currencyName-count`
3640*912701f9SAndroid Build Coastguard Worker* `tense` was renamed to `relative`
3641*912701f9SAndroid Build Coastguard Worker
3642*912701f9SAndroid Build Coastguard Worker### <a name="Deprecated_subelements_of_segmentations" href="#Deprecated_subelements_of_segmentations">A.13 Deprecated subelements of `<segmentations>`</a>
3643*912701f9SAndroid Build Coastguard Worker
3644*912701f9SAndroid Build Coastguard Worker* `<exceptions>` and `<exception>`: Replaced with `<suppressions>` and `<suppression>`.
3645*912701f9SAndroid Build Coastguard Worker
3646*912701f9SAndroid Build Coastguard Worker### <a name="Element_cp" href="#Element_cp">A.14 Element cp</a>
3647*912701f9SAndroid Build Coastguard Worker
3648*912701f9SAndroid Build Coastguard WorkerThe `cp` element was used in certain elements to escape characters that cannot be represented in XML, even with NCRs. This mechanism was replaced by specialized syntax:
3649*912701f9SAndroid Build Coastguard Worker
3650*912701f9SAndroid Build Coastguard Worker| Code Point | XML Example    |
3651*912701f9SAndroid Build Coastguard Worker| ---------- | -------------- |
3652*912701f9SAndroid Build Coastguard Worker| `U+0000`   | `<cp hex="0">` |
3653*912701f9SAndroid Build Coastguard Worker
3654*912701f9SAndroid Build Coastguard Worker### <a name="validSubLocales" href="#validSubLocales">A.15 Attribute validSubLocales</a>
3655*912701f9SAndroid Build Coastguard Worker
Instead of using `validSubLocales`, it is recommended to simply add empty files to specify which sublocales are valid. This convention is used throughout CLDR.
3657*912701f9SAndroid Build Coastguard Worker
3658*912701f9SAndroid Build Coastguard Worker### <a name="postCodeElements" href="#postCodeElements">A.16 Elements postalCodeData, postCodeRegex</a>
3659*912701f9SAndroid Build Coastguard Worker
Instead, please see other services that are kept up to date, such as <https://github.com/google/libaddressinput>.
3661*912701f9SAndroid Build Coastguard Worker
3662*912701f9SAndroid Build Coastguard Worker### <a name="telephoneCodeData" href="#telephoneCodeData">A.17 Element telephoneCodeData</a>
3663*912701f9SAndroid Build Coastguard Worker
3664*912701f9SAndroid Build Coastguard WorkerThe element `<telephoneCodeData>` and its subelements have been deprecated and the data removed.
3665*912701f9SAndroid Build Coastguard Worker
3666*912701f9SAndroid Build Coastguard Worker* * *
3667*912701f9SAndroid Build Coastguard Worker
3668*912701f9SAndroid Build Coastguard Worker## <a name="Links_to_Other_Parts" href="#Links_to_Other_Parts">Annex B Links to Other Parts</a>
3669*912701f9SAndroid Build Coastguard Worker
3670*912701f9SAndroid Build Coastguard WorkerThe LDML specification is split into several [parts](#Parts) by topic, with one HTML document per part. The following tables provide redirects for links to specific topics. Please update your links and bookmarks.
3671*912701f9SAndroid Build Coastguard Worker
3672*912701f9SAndroid Build Coastguard WorkerPart 1 Links: Core (this document): No redirects needed.
3673*912701f9SAndroid Build Coastguard Worker
###### Table: <a name="Part_2_Links" href="#Part_2_Links">Part 2 Links</a>: [General](tr35-general.md) (display names & transforms, etc.)

| Old section                                                                                                 | Section in new part |
| ----------------------------------------------------------------------------------------------------------- | ------------------- |
| 5.4 <a name="Display_Name_Elements" href="#Display_Name_Elements">Display Name Elements</a>                 | 1 [Display Name Elements](tr35-general.md#Display_Name_Elements) |
| 5.5 <a name="Layout_Elements" href="#Layout_Elements">Layout Elements</a>                                   | 2 [Layout Elements](tr35-general.md#Layout_Elements) |
| 5.6 <a name="Character_Elements" href="#Character_Elements">Character Elements</a>                          | 3 [Character Elements](tr35-general.md#Character_Elements) |
| 5.6.1 <a name="ExemplarSyntax" href="#ExemplarSyntax">Exemplar Syntax</a>                                   | 3.1 [Exemplar Syntax](tr35-general.md#ExemplarSyntax) |
| 5.6.2 Restrictions                                                                                          | 3.1 [Exemplar Syntax](tr35-general.md#ExemplarSyntax) |
| 5.6.3 Mapping                                                                                               | 3.2 [Mapping](tr35-general.md#Character_Mapping) |
| 5.6.4 <a name="IndexLabels" href="#IndexLabels">Index Labels</a>                                            | 3.3 [Index Labels](tr35-general.md#IndexLabels) |
| 5.6.5 Ellipsis                                                                                              | 3.4 [Ellipsis](tr35-general.md#Ellipsis) |
| 5.6.6 More Information                                                                                      | 3.5 [More Information](tr35-general.md#Character_More_Info) |
| 5.7 <a name="Delimiter_Elements" href="#Delimiter_Elements">Delimiter Elements</a>                          | 4 [Delimiter Elements](tr35-general.md#Delimiter_Elements) |
| C.6 <a name="Measurement_System_Data" href="#Measurement_System_Data">Measurement System Data</a>           | 5 [Measurement System Data](tr35-general.md#Measurement_System_Data) |
| 5.8 <a name="Measurement_Elements" href="#Measurement_Elements">Measurement Elements (deprecated)</a>       | 5.1 [Measurement Elements (deprecated)](tr35-general.md#Measurement_Elements) |
| 5.11 <a name="Unit_Elements" href="#Unit_Elements">Unit Elements</a>                                        | 6 [Unit Elements](tr35-general.md#Unit_Elements) |
| 5.12 <a name="POSIX_Elements" href="#POSIX_Elements">POSIX Elements</a>                                     | 7 [POSIX Elements](tr35-general.md#POSIX_Elements) |
| 5.13 <a name="Reference_Elements" href="#Reference_Elements">Reference Element</a>                          | 8 [Reference Element](tr35-general.md#Reference_Elements) |
| 5.15 <a name="Segmentations" href="#Segmentations">Segmentations</a>                                        | 9 [Segmentations](tr35-general.md#Segmentations) |
| 5.15.1 <a name="Segmentation_Inheritance" href="#Segmentation_Inheritance">Segmentation Inheritance</a>     | 9.1 [Segmentation Inheritance](tr35-general.md#Segmentation_Inheritance) |
| 5.16 <a name="Transforms" href="#Transforms">Transforms</a>                                                 | 10 [Transforms](tr35-general.md#Transforms) |
| N <a name="Transform_Rules" href="#Transform_Rules">Transform Rules</a>                                     | 10.3 [Transform Rules Syntax](tr35-general.md#Transform_Rules_Syntax) |
| 5.18 <a name="ListPatterns" href="#ListPatterns">List Patterns</a>                                          | 11 [List Patterns](tr35-general.md#ListPatterns) |
| C.20 <a name="List_Gender" href="#List_Gender">Gender of Lists</a>                                          | 11.1 [Gender of Lists](tr35-general.md#List_Gender) |
| 5.19 <a name="Context_Transform_Elements" href="#Context_Transform_Elements">ContextTransform Elements</a>  | 12 [ContextTransform Elements](tr35-general.md#Context_Transform_Elements) |

###### Table: <a name="Part_3_Links" href="#Part_3_Links">Part 3 Links</a>: [Numbers](tr35-numbers.md) (number & currency formatting)

| Old section                                                                                                       | Section in new part |
| ----------------------------------------------------------------------------------------------------------------- | ------------------- |
| C.13 <a name="Numbering_Systems" href="#Numbering_Systems">Numbering Systems</a>                                  | 1 [Numbering Systems](tr35-numbers.md#Numbering_Systems) |
| 5.10 <a name="Number_Elements" href="#Number_Elements">Number Elements</a>                                        | 2 [Number Elements](tr35-numbers.md#Number_Elements) |
| 5.10.1 <a name="Number_Symbols" href="#Number_Symbols">Number Symbols</a>                                         | 2.3 [Number Symbols](tr35-numbers.md#Number_Symbols) |
| G <a name="Number_Format_Patterns" href="#Number_Format_Patterns">Number Format Patterns</a>                      | 3 [Number Format Patterns](tr35-numbers.md#Number_Format_Patterns) |
| 5.10.2 <a name="Currencies" href="#Currencies">Currencies</a>                                                     | 4 [Currencies](tr35-numbers.md#Currencies) |
| C.1 <a name="Supplemental_Currency_Data" href="#Supplemental_Currency_Data">Supplemental Currency Data</a>        | 4.1 [Supplemental Currency Data](tr35-numbers.md#Supplemental_Currency_Data) |
| C.11 <a name="Language_Plural_Rules" href="#Language_Plural_Rules">Language Plural Rules</a>                      | 5 [Language Plural Rules](tr35-numbers.md#Language_Plural_Rules) |
| 5.17 <a name="Rule-Based_Number_Formatting" href="#Rule-Based_Number_Formatting">Rule-Based Number Formatting</a> | 6 [Rule-Based Number Formatting](tr35-numbers.md#Rule-Based_Number_Formatting) |

###### Table: <a name="Part_4_Links" href="#Part_4_Links">Part 4 Links</a>: [Dates](tr35-dates.md) (date, time, time zone formatting)

| Old section                                                                                                                   | Section in new part |
| ----------------------------------------------------------------------------------------------------------------------------- | ------------------- |
| <a name="Date_Elements" href="#Date_Elements">5.9 Date Elements</a>                                                           | 1 [Overview: Dates Element, Supplemental Date and Calendar Information](tr35-dates.md#Overview_Dates_Element_Supplemental) |
| <a name="Calendar_Elements" href="#Calendar_Elements">5.9.1 Calendar Elements</a>                                             | 2 [Calendar Elements](tr35-dates.md#Calendar_Elements) |
| <a name="months_days_quarters_eras" href="#months_days_quarters_eras">Elements months, days, quarters, eras</a>               | 2.1 [Elements months, days, quarters, eras](tr35-dates.md#months_days_quarters_eras) |
| <a name="monthPatterns_cyclicNameSets" href="#monthPatterns_cyclicNameSets">Elements monthPatterns, cyclicNameSets</a>        | 2.2 [Elements monthPatterns, cyclicNameSets](tr35-dates.md#monthPatterns_cyclicNameSets) |
| <a name="dayPeriods" href="#dayPeriods">Element dayPeriods</a>                                                                | 2.3 [Element dayPeriods](tr35-dates.md#dayPeriods) |
| <a name="dateFormats" href="#dateFormats">Element dateFormats</a>                                                             | 2.4 [Element dateFormats](tr35-dates.md#dateFormats) |
| <a name="timeFormats" href="#timeFormats">Element timeFormats</a>                                                             | 2.5 [Element timeFormats](tr35-dates.md#timeFormats) |
| <a name="dateTimeFormats" href="#dateTimeFormats">Element dateTimeFormats</a>                                                 | 2.6 [Element dateTimeFormats](tr35-dates.md#dateTimeFormats) |
| <a name="Calendar_Fields" href="#Calendar_Fields">5.9.2 Calendar Fields</a>                                                   | 3 [Calendar Fields](tr35-dates.md#Calendar_Fields) |
| 5.9.3 <a name="Timezone_Names" href="#Timezone_Names">Time Zone Names</a>                                                     | 5 [Time Zone Names](tr35-dates.md#Time_Zone_Names) |
| <a name="Supplemental_Calendar_Data" href="#Supplemental_Calendar_Data">C.5 Supplemental Calendar Data</a>                    | 4 [Supplemental Calendar Data](tr35-dates.md#Supplemental_Calendar_Data) |
| <a name="Supplemental_Timezone_Data" href="#Supplemental_Timezone_Data">C.7 Supplemental Time Zone Data</a>                   | 6 [Supplemental Time Zone Data](tr35-dates.md#Supplemental_Time_Zone_Data) |
| <a name="Calendar_Preference_Data" href="#Calendar_Preference_Data">C.15 Calendar Preference Data</a>                         | 4.2 [Calendar Preference Data](tr35-dates.md#Calendar_Preference_Data) |
| <a name="DayPeriodRules" href="#DayPeriodRules">C.17 DayPeriod Rules</a>                                                      | 4.5 [Day Period Rules](tr35-dates.md#Day_Period_Rules) |
| <a name="Date_Format_Patterns" href="#Date_Format_Patterns">Appendix F: Date Format Patterns</a>                              | 8 [Date Format Patterns](tr35-dates.md#Date_Format_Patterns) |
| <a name="Date_Field_Symbol_Table" href="#Date_Field_Symbol_Table">Date Field Symbol Table</a>                                 | [Date Field Symbol Table](tr35-dates.md#Date_Field_Symbol_Table) |
| <a name="Localized_Pattern_Characters" href="#Localized_Pattern_Characters">F.1 Localized Pattern Characters (deprecated)</a> | 8.1 [Localized Pattern Characters (deprecated)](tr35-dates.md#Localized_Pattern_Characters) |
| <a name="Time_Zone_Fallback" href="#Time_Zone_Fallback">Appendix J: Time Zone Display Names</a>                               | 7 [Using Time Zone Names](tr35-dates.md#Using_Time_Zone_Names) |
| <a name="fallbackFormat" href="#fallbackFormat">**fallbackFormat**:</a>                                                       | [**fallbackFormat**:](tr35-dates.md#fallbackFormat) |
| O.4 Parsing Dates and Times                                                                                                   | 9 [Parsing Dates and Times](tr35-dates.md#Parsing_Dates_Times) |

###### Table: <a name="Part_5_Links" href="#Part_5_Links">Part 5 Links</a>: [Collation](tr35-collation.md) (sorting, searching, grouping)

| Old section                                                                                                                     | Section in new part |
| ------------------------------------------------------------------------------------------------------------------------------- | ------------------- |
| 5.14 <a name="Collation_Elements" href="#Collation_Elements">Collation Elements</a>                                             | 3 [Collation Tailorings](tr35-collation.md#Collation_Tailorings) |
| 5.14.1 <a name="Collation_Version" href="#Collation_Version">Version</a>                                                        | 3.1 [Version](tr35-collation.md#Collation_Version) |
| 5.14.2 <a name="Collation_Element" href="#Collation_Element">Collation Element</a>                                              | 3.2 [Collation Element](tr35-collation.md#Collation_Element) |
| 5.14.3 <a name="Setting_Options" href="#Setting_Options">Setting Options</a>                                                    | 3.3 [Setting Options](tr35-collation.md#Setting_Options) |
| Table <a name="Collation_Settings" href="#Collation_Settings">Collation Settings</a>                                            | Table [Collation Settings](tr35-collation.md#Collation_Settings) |
| 5.14.4 <a name="Rules" href="#Rules">Collation Rule Syntax</a>                                                                  | 3.4 [Collation Rule Syntax](tr35-collation.md#Rules) |
| 5.14.5 <a name="Orderings" href="#Orderings">Orderings</a>                                                                      | 3.5 [Orderings](tr35-collation.md#Orderings) |
| 5.14.6 <a name="Contractions" href="#Contractions">Contractions</a>                                                             | 3.6 [Contractions](tr35-collation.md#Contractions) |
| 5.14.7 <a name="Expansions" href="#Expansions">Expansions</a>                                                                   | 3.7 [Expansions](tr35-collation.md#Expansions) |
| 5.14.8 <a name="Context_Before" href="#Context_Before">Context Before</a>                                                       | 3.8 [Context Before](tr35-collation.md#Context_Before) |
| 5.14.9 <a name="Placing_Characters_Before_Others" href="#Placing_Characters_Before_Others">Placing Characters Before Others</a> | 3.9 [Placing Characters Before Others](tr35-collation.md#Placing_Characters_Before_Others) |
| 5.14.10 <a name="Logical_Reset_Positions" href="#Logical_Reset_Positions">Logical Reset Positions</a>                           | 3.10 [Logical Reset Positions](tr35-collation.md#Logical_Reset_Positions) |
| 5.14.11 <a name="Special_Purpose_Commands" href="#Special_Purpose_Commands">Special-Purpose Commands</a>                        | 3.11 [Special-Purpose Commands](tr35-collation.md#Special_Purpose_Commands) |
| 5.14.12 <a name="Script_Reordering" href="#Script_Reordering">Collation Reordering</a>                                          | 3.12 [Collation Reordering](tr35-collation.md#Script_Reordering) |
| 5.14.13 <a name="Case_Parameters" href="#Case_Parameters">Case Parameters</a>                                                   | 3.13 [Case Parameters](tr35-collation.md#Case_Parameters) |
| Definition: <a name="UncasedExceptions" href="#UncasedExceptions">UncasedExceptions</a>                                         | removed: see 3.13 [Case Parameters](tr35-collation.md#Case_Parameters) |
| Definition: <a name="LowerExceptions" href="#LowerExceptions">LowerExceptions</a>                                               | removed: see 3.13 [Case Parameters](tr35-collation.md#Case_Parameters) |
| Definition: <a name="UpperExceptions" href="#UpperExceptions">UpperExceptions</a>                                               | removed: see 3.13 [Case Parameters](tr35-collation.md#Case_Parameters) |
| 5.14.14 <a name="Visibility" href="#Visibility">Visibility</a>                                                                  | 3.14 [Visibility](tr35-collation.md#Visibility) |

###### Table: <a name="Part_6_Links" href="#Part_6_Links">Part 6 Links</a>: [Supplemental](tr35-info.md) (supplemental data)

| Old section                                                                                                                              | Section in new part |
| ---------------------------------------------------------------------------------------------------------------------------------------- | ------------------- |
| C <a name="Supplemental_Data" href="#Supplemental_Data">Supplemental Data</a>                                                            | Introduction [Supplemental Data](tr35-info.md#Supplemental_Data) |
| C.2 <a name="Supplemental_Territory_Containment" href="#Supplemental_Territory_Containment">Supplemental Territory Containment</a>       | 1.1 [Supplemental Territory Containment](tr35-info.md#Supplemental_Territory_Containment) |
| C.4 <a name="Supplemental_Territory_Information" href="#Supplemental_Territory_Information">Supplemental Territory Information</a>       | 1.2 [Supplemental Territory Information](tr35-info.md#Supplemental_Territory_Information) |
| C.3 <a name="Supplemental_Language_Data" href="#Supplemental_Language_Data">Supplemental Language Data</a>                               | 2 [Supplemental Language Data](tr35-info.md#Supplemental_Language_Data) |
| C.9 <a name="Supplemental_Code_Mapping" href="#Supplemental_Code_Mapping">Supplemental Code Mapping</a>                                  | 4 [Supplemental Code Mapping](tr35-info.md#Supplemental_Code_Mapping) |
| C.12 <a name="Telephone_Code_Data" href="#Telephone_Code_Data">Telephone Code Data</a>                                                   | 5 [Telephone Code Data](tr35-info.md#Telephone_Code_Data) |
| C.14 <a name="Postal_Code_Validation" href="#Postal_Code_Validation">Postal Code Validation</a>                                          | 6 [Postal Code Validation](tr35-info.md#Postal_Code_Validation) |
| C.8 <a name="Supplemental_Character_Fallback_Data" href="#Supplemental_Character_Fallback_Data">Supplemental Character Fallback Data</a> | 7 [Supplemental Character Fallback Data](tr35-info.md#Supplemental_Character_Fallback_Data) |
| M <a name="Coverage_Levels" href="#Coverage_Levels">Coverage Levels</a>                                                                  | 8 [Coverage Levels](tr35-info.md#Coverage_Levels) |
| 5.20 [Metadata Elements](tr35-info.md#Metadata_Elements)                                                                                 | 10 [Locale Metadata Element](tr35-info.md#Metadata_Elements) |
| P [Supplemental Metadata](tr35-info.md#Appendix_Supplemental_Metadata)                                                                   | 9 [Supplemental Metadata](tr35-info.md#Appendix_Supplemental_Metadata) |
| P.1 [Supplemental Alias Information](tr35-info.md#Supplemental_Alias_Information)                                                        | 9.1 [Supplemental Alias Information](tr35-info.md#Supplemental_Alias_Information) |
| P.2 [Supplemental Deprecated Information](tr35-info.md#Supplemental_Deprecated_Information)                                              | 9.2 [Supplemental Deprecated Information](tr35-info.md#Supplemental_Deprecated_Information) |
| P.3 [Default Content](tr35-info.md#Default_Content)                                                                                      | 9.3 [Default Content](tr35-info.md#Default_Content) |

###### Table: <a name="Part_7_Links" href="#Part_7_Links">Part 7 Links</a>: [Keyboards](tr35-keyboards.md) (keyboard mappings)

[Part 7](tr35-keyboards.md) has been extensively rewritten. The prior link anchors within this file are no longer valid.

* * *

## <a name="LocaleId_Canonicalization" href="#LocaleId_Canonicalization">Annex C. LocaleId Canonicalization</a>

The `languageAlias`, `scriptAlias`, `territoryAlias`, and `variantAlias` elements are used as rules to transform an input _source localeId_. The first step is to transform the _languageId_ portion of the localeId.

> Note: in the following discussion, the separator '-' is used. The same separator appears in the examples of XML alias data, even though, for compatibility reasons, the actual alias data uses '\_' as a separator. The processing can also be applied to syntax while maintaining the separator '\_', _mutatis mutandis_. CLDR also uses “territory” and “region” interchangeably.

> Also note that the discussion of canonicalization assumes BCP 47
> input data. If input data is a CLDR or ICU locale ID such
> as `en_US_POSIX`, a conversion step must be done prior to
> canonicalization.
> See §3.8.2 [Legacy Variants](#Legacy_Variants).

### <a name="LocaleId_Definitions">LocaleId Definitions</a>

#### <a name="1.-multimap-interpretation" href="#1.-multimap-interpretation">1. Multimap interpretation</a>

Interpret each languageId as a multimap from a _fieldId_ (language, script, region, variants) to a **sorted set** of field values.

_Examples:_

| Source                    | Language | Script | Region | Variants          |
|---------------------------|----------|--------|--------|-------------------|
| en-GB                     | {en}     | {}     | {GB}   | {}                |
| und-GB                    | {}       | {}     | {GB}   | {}                |
| ja-Latn-YU-hepburn-heploc | {ja}     | {Latn} | {YU}   | {hepburn, heploc} |

* This can be represented as an abbreviated format: \{L=\{ja}, S=\{Latn}, R=\{YU}, V=\{hepburn, heploc}}, skipping empty sets.
* “und” is a special language code that is treated as an empty set.
* Of course, only the Variants can contain more than one item: the others are either empty or contain exactly 1 item.

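
As a rough illustration of the multimap interpretation, the following Python sketch splits a well-formed languageId into its four fields. The function name `to_multimap` and the subtag-shape checks are assumptions made for this example and are not defined by this specification; extensions and malformed input are not handled.

```python
def to_multimap(language_id: str) -> dict[str, list[str]]:
    """Split a well-formed languageId into sorted L/S/R/V value lists."""
    subtags = language_id.replace("_", "-").split("-")
    fields: dict[str, list[str]] = {"L": [], "S": [], "R": [], "V": []}
    # "und" is treated as the empty set for the language field.
    if subtags[0].lower() != "und":
        fields["L"].append(subtags[0].lower())
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            fields["S"].append(subtag.title())   # script, such as Latn
        elif (len(subtag) == 2 and subtag.isalpha()) or (
            len(subtag) == 3 and subtag.isdigit()
        ):
            fields["R"].append(subtag.upper())   # region, such as GB or 419
        else:
            fields["V"].append(subtag.lower())   # variant, such as hepburn
    fields["V"].sort()                           # variants form a sorted set
    return fields


print(to_multimap("ja-Latn-YU-hepburn-heploc"))
print(to_multimap("und-GB"))
```

Lists are used here only so that the sorted order is visible when printed; a set-based representation would serve equally well for matching.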
#### <a name="2.-alias-elements" href="#2.-alias-elements">2. Alias elements</a>

For the `languageAlias` elements, the _type_ and _replacements_ are languageIds.

For the `scriptAlias`, `territoryAlias` (aka region), and `variantAlias` elements, the _type_ and _replacements_ are interpreted as languageIds _after_ prefixing each with “und-”. Thus:

```xml
<territoryAlias type="AN" replacement="CW SX BQ" reason="deprecated" />
```

is interpreted as:

```xml
<territoryAlias type="und-AN" replacement="und-CW und-SX und-BQ" reason="deprecated" />
```

Note that a `territoryAlias` may have multiple replacement values, separated by spaces (such as `replacement="und-CW und-SX und-BQ"`); other rules only ever have a single replacement value.

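
The prefixing step can be sketched in a few lines of Python. The helper name `expand_alias` and its argument layout are hypothetical, chosen only for this illustration; the sketch assumes the attribute values have already been read from the XML.

```python
def expand_alias(element: str, type_: str, replacement: str) -> tuple[str, list[str]]:
    """Return (type, replacements) as languageIds for one alias rule."""
    # A territoryAlias may list several space-separated replacement values.
    replacements = replacement.split()
    if element == "languageAlias":
        return type_, replacements  # already languageIds
    # scriptAlias, territoryAlias, variantAlias: prefix each value with "und-".
    return "und-" + type_, ["und-" + value for value in replacements]


print(expand_alias("territoryAlias", "AN", "CW SX BQ"))
```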
#### <a name="3.-matches" href="#3.-matches">3. Matches</a>

A rule matches a source if and only if, for every field, the _source_ field ⊇ the _type_ field.

_Examples:_

`source="ja-heploc-hepburn"` and `type="und-hepburn"`

<table class="simple"><tbody>
<tr><td>{ja} ⊇ {}</td><td>success, und = {}</td></tr>
<tr><td>{hepburn, heploc} ⊇ {hepburn}</td><td><b>success</b></td></tr>
</tbody></table>

so the rule matches the source. (Note that the order of variants is immaterial to matching.)

`source="ja-hepburn"` and `type="und-hepburn-heploc"`

<table class="simple"><tbody>
<tr><td>{ja} ⊇ {}</td><td>success, und = {}</td></tr>
<tr><td>{hepburn} ⊉ {hepburn, heploc}</td><td><b>failure</b></td></tr>
</tbody></table>

so the rule does not match the source.

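
The superset test can be expressed directly with set comparison. This is a minimal sketch: the names `FIELDS` and `matches`, and the use of plain dictionaries of sets for the multimaps, are assumptions made for the example.

```python
FIELDS = ("L", "S", "R", "V")


def matches(source: dict[str, set[str]], rule_type: dict[str, set[str]]) -> bool:
    """True if every source field is a superset of the rule's type field."""
    return all(
        source.get(field, set()) >= rule_type.get(field, set())
        for field in FIELDS
    )


# source="ja-heploc-hepburn", type="und-hepburn": the rule matches.
print(matches({"L": {"ja"}, "V": {"hepburn", "heploc"}}, {"V": {"hepburn"}}))
# source="ja-hepburn", type="und-hepburn-heploc": the rule does not match.
print(matches({"L": {"ja"}, "V": {"hepburn"}}, {"V": {"hepburn", "heploc"}}))
```

Missing fields are treated as empty sets, which is how “und” and absent subtags are interpreted above.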
#### <a name="4.-replacement" href="#4.-replacement">4. Replacement</a>

A matching rule can be used to transform the source fields as follows:

* if type.field ≠ \{}
  * source.field = (source.field - type.field) ∪ replacement.field
* else if source.field = \{} and replacement.field ≠ \{}
  * source.field = replacement.field

_Example:_

> source="ja-Latn-fonipa-hepburn-heploc"
>
> rule  =`<languageAlias type="und-hepburn-heploc" replacement="und-alalc97">`
>
> result="ja-Latn-alalc97-fonipa"
>
> (note that CLDR canonical order of variants is alphabetical)

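
The replacement step above can be sketched as follows. The function name `apply_rule` and the dictionary-based multimaps are assumptions for illustration; the sketch presumes the rule has already been found to match and does not cover rules with more than one replacement territory.

```python
FIELDS = ("L", "S", "R", "V")


def apply_rule(source, rule_type, replacement):
    """Apply a matching rule; each argument maps a field ID to a set of values."""
    result = {}
    for field in FIELDS:
        src = set(source.get(field, ()))
        typ = set(rule_type.get(field, ()))
        rep = set(replacement.get(field, ()))
        if typ:
            # Remove the matched values, then add the replacement values.
            src = (src - typ) | rep
        elif not src and rep:
            # Fill a field that is empty in the source.
            src = rep
        result[field] = sorted(src)  # canonical order of variants is alphabetical
    return result


source = {"L": {"ja"}, "S": {"Latn"}, "V": {"fonipa", "hepburn", "heploc"}}
print(apply_rule(source, {"V": {"hepburn", "heploc"}}, {"V": {"alalc97"}}))
```

The printed fields correspond to the result `ja-Latn-alalc97-fonipa` in the example above.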
##### Territory Exception

If the field = territory, and the replacement.field has more than one value, then look up the most likely territory for the base language code (and script, if there is one). If that likely territory is in the list of replacements, use it. Otherwise, use the first territory in the list.

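
A minimal sketch of this selection is shown below. The function name `pick_territory` is hypothetical, and the likely territory is passed in by the caller; looking it up from likely-subtags data is outside the scope of the sketch.

```python
from typing import Optional


def pick_territory(replacements: list[str], likely_territory: Optional[str]) -> str:
    """Choose one territory from a multi-valued territoryAlias replacement."""
    # Prefer the most likely territory for the language (and script), if listed.
    if likely_territory in replacements:
        return likely_territory
    # Otherwise fall back to the first territory in the list.
    return replacements[0]


# Replacement list taken from the AN example above; the likely territories
# passed here are illustrative values only.
print(pick_territory(["CW", "SX", "BQ"], "SX"))
print(pick_territory(["CW", "SX", "BQ"], "NL"))
```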
#### <a name="5.-canonicalizing-syntax" href="#5.-canonicalizing-syntax">5. Canonicalizing Syntax</a>

To canonicalize the syntax of _source_:

* Initial Script Subtag
  * If the first subtag has 4 letters, prepend the source with "und-"
  * Note: These are only for specialized use.
* Casing
  * Put any script subtag inside unicode_language_id into title case (eg, Hant)
  * Put any region subtag inside unicode_language_id into uppercase (eg, DE)
  * Put all other subtags into lowercase (eg, en, fonipa)
* Order
  * Put any variants into alphabetical order (eg, en-fonipa-scouse, not en-scouse-fonipa)
  * Put any extensions into alphabetical order by their singleton (eg, en-t-xxx-u-yyy, not en-u-yyy-t-xxx)
  * Put all attributes into alphabetical order.
  * Put all ufields (<ukey, uvalue>) and tfields (<tkey, tvalue>) into alphabetical order according to their keys (ukey or tkey), within their respective extensions.
  * Remove any uvalue (aka type) equal to "true". Note that "true" values cannot be removed from tvalues.
* Separator
  * Replace '\_' by '-'

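
The following Python sketch covers only the parts of the steps above that concern the unicode_language_id: the separator, the initial script subtag, casing, and variant order. The function name is an assumption, and extensions (including attribute, ufield, and tfield ordering) are deliberately left out, so input containing a singleton is not supported.

```python
def canonicalize_language_id_syntax(source: str) -> str:
    """Canonicalize separator, casing, and variant order of a languageId."""
    subtags = source.replace("_", "-").split("-")   # Separator
    # Initial Script Subtag: a leading 4-letter subtag gets "und" prepended.
    if len(subtags[0]) == 4 and subtags[0].isalpha():
        subtags.insert(0, "und")
    language = subtags[0].lower()
    script, region, variants = [], [], []
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            script.append(subtag.title())           # title case, such as Hant
        elif (len(subtag) == 2 and subtag.isalpha()) or (
            len(subtag) == 3 and subtag.isdigit()
        ):
            region.append(subtag.upper())           # uppercase, such as DE
        else:
            variants.append(subtag.lower())         # lowercase, such as fonipa
    # Order: variants are put into alphabetical order.
    return "-".join([language] + script + region + sorted(variants))


print(canonicalize_language_id_syntax("en_scouse_fonipa"))
print(canonicalize_language_id_syntax("ZH-hant-tw"))
```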
### <a name="preprocessing" href="#preprocessing">Preprocessing</a>

The data from supplementalMetadata is (logically) preprocessed as follows:

1. Load the rules from supplementalMetadata.xml, replacing '\_' by '-', and adding “und-” as described in _Definition 2. Alias Elements_.
2. Capture all languageAlias rules where the _type_ is an invalid languageId into a set of **BCP47 LegacyRules**. Example:
   1. `<languageAlias type="i-mingo" replacement="see-x-i-mingo" reason="legacy" />`
3. Discard all rules where the _type_ is an invalid languageId. Examples:
   1. `<languageAlias type="i-mingo" replacement="see-x-i-mingo" reason="legacy" />`
   2. `<territoryAlias type="und-AAA" replacement="und-AA" reason="overlong" />`
4. Change the _type_ and _replacement_ values in the remaining rules into multimap rules, as per _Definition 1. Multimap Interpretation_.
   1. Note that the “und” value disappears.
5. Order the set of rules using the following comparison logic:
   1. For each rule, count the number of items in each field value set (L, S, R, V) and sum the four counts.
      If two rules have differing sums, order the rule with the greater sum before the rule with the smaller sum.
        * For example:
        * {V={hepburn,heploc}} is tied with
        * {L={en}, R={GB}} (because both have 2 total field value items) and both precede
        * {R={CA}} (which has 1).
   2. For rule pairs that are not differentiated by the previous step, consider the value set for each field in the order L, then S, then R, then V.
      If one rule has a non-empty value set for that field and the other rule does not,
      then order the rule with the non-empty value set for that field before the other rule and disregard all later fields.
      Otherwise, consider the next field.
        * For example:
        * {L={zh}, S={Hant}, R={CN}} is tied with
        * {L={en}, S={Latn}, R={GB}} (because both have non-empty sets for L, S, and R but not for V),
          and both precede
        * {L={zh}, S={Hans}, V={pinyin}} (because it lacks values for R),
          which precedes
        * {L={en}, R={GB}, V={scouse}} (because it lacks values for S),
          which precedes
        * {V={fonipa,hepburn,heploc}} (because it lacks values for L),
          which is tied with
        * {V={hepburn,heploc,simple}} (because both have non-empty sets for V but not for L, S, or R).
   3. For rule pairs that are not differentiated by the previous step,
      consider the value set for each field in the order L, then S, then R, then V as a sequence of subtags.
      If those lists for the same field of two rules differ,
      then consider the first position of difference in the two lists and order the rules by code-point order
      of the field value at that position and disregard all later fields.
      Otherwise, consider the next field.
        * For example:
        * {L={ja}, V={hepburn, heploc}} precedes
        * {L={zh}, V={1996, pinyin}}
          (because it has a different field value set for L and "ja" precedes "zh" at the first position of difference),
          which precedes
        * {L={zh}, V={hepburn, heploc}}
          (because it has the same field value set for L and a different field value set for V in which "1996" precedes "hepburn" at the first position of difference),
          which precedes
        * {L={zh}, V={hepburn, simple}}
          (because it has the same field value set for L and a different field value set for V in which "heploc" precedes "simple" at the first position of difference).
6. The result is the set of **Alias Rules**.

Using the examples above, the resulting order is as follows. Each row is annotated with how it compares to the row before it:

| languageId | 5.1 total field value set item count | 5.2 non-empty field value set | 5.3 field value set items |
| --- | --- | --- | --- |
| {L={en}, S={Latn}, R={GB}} | 3 | n/a | n/a |
| {L={zh}, S={Hant}, R={CN}} | 3 | match (L, S, R) | in L, “en” before “zh” |
| {L={zh}, S={Hans}, V={pinyin}} | 3 | (L, S, R, …) before (L, S, V) |  |
| {L={en}, R={GB}, V={scouse}} | 3 | (L, S, …) before (L, R, …) |  |
| {L={ja}, V={hepburn,heploc}} | 3 | (L, R, …) before (L, V) |  |
| {L={zh}, V={1996,pinyin}} | 3 | match (L, V) | in L, “ja” before “zh” |
| {L={zh}, V={hepburn,heploc}} | 3 | match (L, V) | in V, “1996” before “hepburn” |
| {L={zh}, V={hepburn,simple}} | 3 | match (L, V) | in V, “heploc” before “simple” |
| {V={fonipa,hepburn,heploc}} | 3 | (L, …) before (V) |  |
| {V={hepburn,heploc,simple}} | 3 | match (V) | in V, “fonipa” before “hepburn” |
| {L={en}, R={GB}} | 2 |  |  |
| {V={hepburn,heploc}} | 2 | (L, …) before (V) |  |
| {R={CA}} | 1 |  |  |

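The three-level comparison in step 5 can be expressed as a single sort key. The sketch below is one possible encoding, assuming each rule is represented as a dict from field name to an already sorted tuple of subtags. That representation and the names `rule_sort_key` and `order_rules` are illustrative, not part of the specification.

```python
FIELDS = ("L", "S", "R", "V")


def rule_sort_key(rule):
    """Sort key for step 5; `rule` maps a field name to a sorted tuple of subtags."""
    values = tuple(tuple(rule.get(f, ())) for f in FIELDS)
    total = sum(len(v) for v in values)               # 5.1: greater sums first
    empties = tuple(0 if v else 1 for v in values)    # 5.2: non-empty before empty, L to V
    # 5.3: Python compares tuples position by position and strings by code point.
    # (A shorter tuple that is a prefix of a longer one sorts first; that tie-break
    # is an assumption of this sketch.)
    return (-total, empties, values)


def order_rules(rules):
    return sorted(rules, key=rule_sort_key)


rules = [
    {"R": ("CA",)},
    {"V": ("hepburn", "heploc")},
    {"L": ("en",), "R": ("GB",)},
    {"L": ("zh",), "V": ("1996", "pinyin")},
    {"L": ("ja",), "V": ("hepburn", "heploc")},
]
for rule in order_rules(rules):
    print(rule)
```

Negating the total makes larger rules sort first while the remaining components keep their natural ascending order, so no custom comparator is needed.
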
### <a name="processing-languageids" href="#processing-languageids">Processing LanguageIds</a>

To canonicalize a given _source_:

1. Canonicalize the syntax of _source_ as per _Definition 5. Canonicalizing Syntax_.
2. Where the _source_ could be an arbitrary BCP 47 language tag, first process as follows:
   1. If the source is identical to one of the types in the BCP47 LegacyRules, replace the entire source by the replacement value.
   2. Else if there is an extlang subtag, then apply Step 3 of BCP 47 [Section 4.5](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.5) to remove the extlang subtag (possibly adjusting the language subtag).
      1. Don’t apply any of the other canonicalization steps in that section, however.
   3. Else if the first subtag is "x", prefix by "und-".
   4. **Note:** there are currently no valid 4-letter primary language subtags. While it is extremely unlikely that BCP 47 would ever register them, if so then _languageAlias_ mappings will be supplied for them, mapping to defined CLDR language subtags (from the `idStatus="reserved"` set).
3. Find the first matching rule in **Alias Rules** (from **Preprocessing**).
   1. If there are none, return _source_.
4. Transform _source_ according to that rule.
5. Loop (go to step 3).

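The control flow of steps 2.1 and 3 to 5 can be sketched as below. The matching and transformation logic is defined elsewhere in this specification, so the sketch takes them as parameters. The `subset_match` and `replace_matched` helpers are simplified, hypothetical stand-ins used only to make the example runnable, and the rules in `TOY_RULES` are invented for illustration and are not CLDR data.

```python
def apply_legacy_rule(source, legacy_rules):
    """Step 2.1: swap the whole tag if it equals a BCP47 LegacyRules type."""
    return legacy_rules.get(source, source)


def apply_alias_rules(fields, alias_rules, matches, transform):
    """Steps 3-5: apply the first matching rule until no rule matches."""
    while True:
        rule = next((r for r in alias_rules if matches(r, fields)), None)
        if rule is None:
            return fields                    # step 3.1: nothing matches
        fields = transform(rule, fields)     # step 4, then loop (step 5)


# Hypothetical stand-ins for the matching and transformation definitions.
def subset_match(rule, fields):
    type_fields, _replacement = rule
    return all(vals <= fields.get(f, set()) for f, vals in type_fields.items())


def replace_matched(rule, fields):
    type_fields, replacement = rule
    result = {f: set(v) for f, v in fields.items()}
    for f, vals in type_fields.items():
        result[f] = (result.get(f, set()) - vals) | replacement.get(f, set())
    return result


# Toy rules, assumed to be in preprocessed order already; not real CLDR data.
TOY_RULES = [
    ({"L": {"aaa"}}, {"L": {"bbb"}}),
    ({"L": {"bbb"}}, {"L": {"ccc"}}),
]
LEGACY = {"i-mingo": "see-x-i-mingo"}        # the legacy languageAlias example

result = apply_alias_rules({"L": {"aaa"}, "R": {"US"}}, TOY_RULES,
                           subset_match, replace_matched)
print(result)
```

The loop terminates only if the rule set contains no cycles. A defensive implementation might cap the number of iterations.
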
### <a name="processing-localeids" href="#processing-localeids">Processing LocaleIds</a>

The canonicalization of localeIds is done by first canonicalizing the languageId portion, then handling extensions in the following way:

1. Replace any _tlang_ languageId value by its canonicalization.
2. Use the bcp47 data to replace keys, types, tfields, and tvalues by their canonical forms. See **U Extension Data Files** and **T Extension Data Files**. The matches are in the `alias` attribute value, while the canonical replacement is in the `name` attribute value. For example:
   1. Because of the following bcp47 data:
      `<key name="ms"…>…<type name="uksystem" … alias="imperial" … />…</key>`
   2. We get the following transformation:
      `en-u-ms-imperial ⇒ en-u-ms-uksystem`
3. Replace any unicode_subdivision_id that is a subdivision alias by its replacement value in the same way, using subdivisionAlias data. This applies, for example, to the values for the 'sd' and 'rg' keys. However, where the replacement value is a two-letter region code, also append "zzzz" so that the result is syntactically correct. For example:
   1. Because of the following subdivisionAlias data:
      `<subdivisionAlias type="fi01" replacement="AX"…`
   2. We get the following transformation:
      `en-u-rg-fi01 ⇒ en-u-rg-axzzzz`

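Steps 2 and 3 amount to table lookups on each `<ukey, uvalue>` pair. The sketch below hard-codes only the two aliases quoted in the examples above. The table names, the `canonicalize_ufield` function, and the restriction to the 'rg' and 'sd' keys are assumptions of this illustration; a real implementation would load the tables from the bcp47 and subdivisionAlias data.

```python
# (ukey, alias) -> canonical type name, taken from the `ms` example above.
TYPE_ALIASES = {("ms", "imperial"): "uksystem"}
# Subdivision alias -> replacement, taken from the `fi01` example above.
SUBDIVISION_ALIASES = {"fi01": "AX"}
# Keys whose values are subdivision ids (the two keys named in step 3).
SUBDIVISION_KEYS = {"rg", "sd"}


def canonicalize_ufield(key, value):
    """Return the canonical (ukey, uvalue) pair for one u-extension field."""
    value = TYPE_ALIASES.get((key, value), value)         # step 2: type aliases
    if key in SUBDIVISION_KEYS and value in SUBDIVISION_ALIASES:
        replacement = SUBDIVISION_ALIASES[value].lower()  # step 3: subdivision aliases
        if len(replacement) == 2:
            replacement += "zzzz"   # pad a bare region code into a valid subdivision id
        value = replacement
    return key, value


print(canonicalize_ufield("ms", "imperial"))   # ('ms', 'uksystem')
print(canonicalize_ufield("rg", "fi01"))       # ('rg', 'axzzzz')
```
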
### <a name="optimizations" href="#optimizations">Optimizations</a>

The above algorithm is a logical statement of the process, but it is not directly suited to production code. Production-level code can use many optimizations for efficiency while achieving the same result. For example, the Alias Rules can be further preprocessed to avoid indefinite looping, instead doing a rule lookup once per subtag. As another example, the small number of **Territory Exceptions** can be preprocessed to avoid the likely subtags processing.

* * *

## <a name="References" href="#References">References</a>

| Ancillary Information                                    | To properly localize, parse, and format data requires ancillary information, which is not expressed in Locale Data Markup Language. Some of the formats for values used in Locale Data Markup Language are constructed according to external specifications. The sources for this data and/or formats include the following:  |
| -------------------------------------------------------- | --- |
| [<a name="Bugs" href="#Bugs">Bugs</a>]                   | CLDR Bug Reporting form<br/>[https://cldr.unicode.org/index/bug-reports](https://cldr.unicode.org/index/bug-reports) |
| [<a name="Charts" href="#Charts">Charts</a>]             | The online code charts can be found at [https://www.unicode.org/charts/](https://www.unicode.org/charts/). An index to character names with links to the corresponding chart is found at [https://www.unicode.org/charts/charindex.html](https://www.unicode.org/charts/charindex.html) |
| [<a name="DUCET" href="#DUCET">DUCET</a>]                | The Default Unicode Collation Element Table (DUCET)<br/>For the base-level collation, of which all the collation tables in this document are tailorings.<br/>[https://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table](https://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table) |
| [<a name="FAQ" href="#FAQ">FAQ</a>]                      | Unicode Frequently Asked Questions<br/>[https://www.unicode.org/faq/](https://www.unicode.org/faq/)<br/>_For answers to common questions on technical issues._ |
| [<a name="FCD" href="#FCD">FCD</a>]                      | As defined in UTN #5 Canonical Equivalences in Applications<br/>[https://www.unicode.org/notes/tn5/](https://www.unicode.org/notes/tn5/) |
| [<a name="Glossary" href="#Glossary">Glossary</a>]       | Unicode Glossary<br/>[https://www.unicode.org/glossary/](https://www.unicode.org/glossary/)<br/>_For explanations of terminology used in this and other documents._ |
| [<a name="JavaChoice" href="#JavaChoice">JavaChoice</a>] | Java ChoiceFormat<br/>[https://docs.oracle.com/javase/7/docs/api/java/text/ChoiceFormat.html](https://docs.oracle.com/javase/7/docs/api/java/text/ChoiceFormat.html) |
| [<a name="Olson" href="#Olson">Olson</a>]                | The TZID Database (aka Olson timezone database)<br/>Time zone and daylight savings information.<br/>[https://www.iana.org/time-zones](https://www.iana.org/time-zones)<br/>For archived data, see <br/>[ftp://ftp.iana.org/tz/releases/](ftp://ftp.iana.org/tz/releases/) |
| [<a name="Reports" href="#Reports">Reports</a>]          | Unicode Technical Reports<br/>[https://www.unicode.org/reports/](https://www.unicode.org/reports/)<br/>_For information on the status and development process for technical reports, and for a list of technical reports._ |
| [<a name="Unicode" href="#Unicode">Unicode</a>]          | The Unicode Consortium, _The Unicode Standard, Version 13.0.0_<br/>(Mountain View, CA: The Unicode Consortium, 2020. ISBN 978-1-936213-26-9)<br/>[https://www.unicode.org/versions/Unicode13.0.0/](https://www.unicode.org/versions/Unicode13.0.0/) |
| [<a name="Versions" href="#Versions">Versions</a>]       | Versions of the Unicode Standard<br/>[https://www.unicode.org/versions/](https://www.unicode.org/versions/)<br/>_For information on version numbering, and citing and referencing the Unicode Standard, the Unicode Character Database, and Unicode Technical Reports._ |
| [<a name="XPath" href="#XPath">XPath</a>]                | [https://www.w3.org/TR/xpath/](https://www.w3.org/TR/xpath/) |
| Other Standards                                          | _Various standards define codes that are used as keys or values in Locale Data Markup Language. These include:_ |
| [<a name="BCP47" href="#BCP47">BCP47</a>]                | [https://www.rfc-editor.org/rfc/bcp/bcp47.txt](https://www.rfc-editor.org/rfc/bcp/bcp47.txt)<br/>The Registry<br/>[https://www.iana.org/assignments/language-subtag-registry](https://www.iana.org/assignments/language-subtag-registry) |
| [<a name="ISO639" href="#ISO639">ISO639</a>]             | ISO Language Codes<br/>[https://www.loc.gov/standards/iso639-2/](https://www.loc.gov/standards/iso639-2/)<br/>Actual List<br/>[https://www.loc.gov/standards/iso639-2/langcodes.html](https://www.loc.gov/standards/iso639-2/langcodes.html) |
| [<a name="ISO1000" href="#ISO1000">ISO1000</a>]          | ISO 1000: SI units and recommendations for the use of their multiples and of certain other units, International Organization for Standardization, 1992.<br/>[https://www.iso.org/iso/catalogue_detail?csnumber=5448](https://www.iso.org/iso/catalogue_detail?csnumber=5448) |
| [<a name="ISO3166" href="#ISO3166">ISO3166</a>]          | ISO Region Codes<br/>[https://www.iso.org/iso-3166-country-codes.html](https://www.iso.org/iso-3166-country-codes.html)<br/>Actual List<br/>[https://www.iso.org/obp/ui/#search](https://www.iso.org/obp/ui/#search) |
| [<a name="ISO4217" href="#ISO4217">ISO4217</a>]          | ISO Currency Codes<br/>[https://www.iso.org/iso-4217-currency-codes.html](https://www.iso.org/iso-4217-currency-codes.html)<br/>_(Note that as of this point, there are significant problems with this list. The supplemental data file contains the best compendium of currency information available.)_ |
| [<a name="ISO8601" href="#ISO8601">ISO8601</a>]          | ISO Date and Time Format<br/>[https://www.iso.org/iso-8601-date-and-time-format.html](https://www.iso.org/iso-8601-date-and-time-format.html) |
| [<a name="ISO15924" href="#ISO15924">ISO15924</a>]       | ISO Script Codes<br/>[https://www.unicode.org/iso15924/index.html](https://www.unicode.org/iso15924/index.html)<br/>Actual List<br/>[https://www.unicode.org/iso15924/codelists.html](https://www.unicode.org/iso15924/codelists.html) |
| [<a name="LOCODE" href="#LOCODE">LOCODE</a>]             | United Nations Code for Trade and Transport Locations, commonly known as "UN/LOCODE"<br/>[https://unece.org/trade/uncefact/unlocode](https://unece.org/trade/uncefact/unlocode)<br/>Download at:  [https://unece.org/trade/cefact/UNLOCODE-Download](https://unece.org/trade/cefact/UNLOCODE-Download) |
| [<a name="RFC6067" href="#RFC6067">RFC6067</a>]          | BCP 47 Extension U<br/>[https://www.ietf.org/rfc/rfc6067.txt](https://www.ietf.org/rfc/rfc6067.txt) |
| [<a name="RFC6497" href="#RFC6497">RFC6497</a>]          | BCP 47 Extension T - Transformed Content<br/>[https://www.ietf.org/rfc/rfc6497.txt](https://www.ietf.org/rfc/rfc6497.txt) |
| [<a name="UNM49" href="#UNM49">UNM49</a>]                | UN M.49: UN Statistics Division<br/>Country or area & region codes<br/>[https://unstats.un.org/unsd/methods/m49/m49.htm](https://unstats.un.org/unsd/methods/m49/m49.htm)<br/>Composition of macro geographical (continental) regions, geographical sub-regions, and selected economic and other groupings<br/>[https://unstats.un.org/unsd/methods/m49/m49regin.htm](https://unstats.un.org/unsd/methods/m49/m49regin.htm) |
| [<a name="XMLSchema" href="#XMLSchema">XML Schema</a>]   | W3C XML Schema<br/>[https://www.w3.org/XML/Schema](https://www.w3.org/XML/Schema) |
| General                                                  | _The following are general references from the text:_ |
| [<a name="ByType" href="#ByType">ByType</a>]             | CLDR Comparison Charts<br/>[https://cldr.unicode.org/index/charts](https://cldr.unicode.org/index/charts) |
| [<a name="Calendars" href="#Calendars">Calendars</a>]    | Calendrical Calculations: The Millennium Edition by Edward M. Reingold, Nachum Dershowitz; Cambridge University Press; Book and CD-ROM edition (July 1, 2001); ISBN: 0521777526. Note that the algorithms given in this book are copyrighted. |
| [<a name="Comparisons" href="#Comparisons">Comparisons</a>]             | Comparisons between locale data from different sources<br/>[https://unicode-org.github.io/cldr-staging/charts/latest/by_type/index.html](https://unicode-org.github.io/cldr-staging/charts/latest/by_type/index.html) |
| [<a name="CurrencyInfo" href="#CurrencyInfo">CurrencyInfo</a>]          | UNECE Currency Data<br/>[https://www.iso.org/iso-4217-currency-codes.html](https://www.iso.org/iso-4217-currency-codes.html) |
| [<a name="DataFormats" href="#DataFormats">DataFormats</a>]             | CLDR Translation Guidelines<br/>[https://cldr.unicode.org/translation](https://cldr.unicode.org/translation) |
| [<a name="LDML" href="#LDML">Example</a>]                               | A sample in Locale Data Markup Language<br/>[https://www.unicode.org/cldr/dtd/1.1/ldml-example.xml](https://www.unicode.org/cldr/dtd/1.1/ldml-example.xml) |
| [<a name="ICUCollation" href="#ICUCollation">ICUCollation</a>]          | ICU rule syntax<br/>[https://unicode-org.github.io/icu/userguide/collation/customization/](https://unicode-org.github.io/icu/userguide/collation/customization/) |
| [<a name="ICUTransforms" href="#ICUTransforms">ICUTransforms</a>]       | Transforms<br/>[https://unicode-org.github.io/icu/userguide/transforms/](https://unicode-org.github.io/icu/userguide/transforms/)<br/>Transforms Demo<br/>[https://icu4c-demos.unicode.org/icu-bin/translit](https://icu4c-demos.unicode.org/icu-bin/translit) |
| [<a name="ICUUnicodeSet" href="#ICUUnicodeSet">ICUUnicodeSet</a>]       | ICU UnicodeSet<br/>[https://unicode-org.github.io/icu/userguide/strings/unicodeset.html](https://unicode-org.github.io/icu/userguide/strings/unicodeset.html)<br/>API<br/>[https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/UnicodeSet.html](https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/UnicodeSet.html) |
| [<a name="ITUE164" href="#ITUE164">ITUE164</a>]                         | International Telecommunication Union: List Of ITU Recommendation E.164 Assigned Country Codes<br/>available at [https://www.itu.int/opb/publications.aspx?parent=T-SP&view=T-SP2](https://www.itu.int/opb/publications.aspx?parent=T-SP&view=T-SP2) |
| [<a name="LocaleExplorer" href="#LocaleExplorer">LocaleExplorer</a>]    | ICU Locale Explorer<br/>[https://icu4c-demos.unicode.org/icu-bin/locexp](https://icu4c-demos.unicode.org/icu-bin/locexp) |
| [<a name="localeProject" href="#localeProject">LocaleProject</a>]       | Common Locale Data Repository Project<br/>[https://cldr.unicode.org](https://cldr.unicode.org) |
| [<a name="NamingGuideline" href="#NamingGuideline">NamingGuideline</a>] | OpenI18N Locale Naming Guideline<br/>formerly at https://www.openi18n.org/docs/text/LocNameGuide-V10.txt |
| [<a name="RBNF" href="#RBNF">RBNF</a>]                                  | Rule-Based Number Format<br/>[https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1RuleBasedNumberFormat.html](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1RuleBasedNumberFormat.html) |
| [<a name="RBBI" href="#RBBI">RBBI</a>]                                  | Rule-Based Break Iterator<br/>[https://unicode-org.github.io/icu/userguide/boundaryanalysis/](https://unicode-org.github.io/icu/userguide/boundaryanalysis/) |
| [<a name="UCAChart" href="#UCAChart">UCAChart</a>]                      | Collation Chart<br/>[https://www.unicode.org/charts/collation/](https://www.unicode.org/charts/collation/) |
| [<a name="UTCInfo" href="#UTCInfo">UTCInfo</a>]                         | NIST Time and Frequency Division Home Page<br/>[https://www.nist.gov/pml/time-and-frequency-division](https://www.nist.gov/pml/time-and-frequency-division)<br/>U.S. Naval Observatory: What is Universal Time?<br/><https://www.cnmoc.usff.navy.mil/Our-Commands/United-States-Naval-Observatory/Precise-Time-Department/The-USNO-Master-Clock/Definitions-of-Systems-of-Time/> |
| [<a name="WindowsCulture" href="#WindowsCulture">WindowsCulture</a>]    | Windows Culture Info (with mappings from [[BCP47](#BCP47)]-style codes to LCIDs)<br/>[https://learn.microsoft.com/en-us/dotnet/api/system.globalization.cultureinfo?view=net-6.0](https://learn.microsoft.com/en-us/dotnet/api/system.globalization.cultureinfo?view=net-6.0) |


## <a name="Acknowledgments" href="#Acknowledgments">Acknowledgments</a>

Special thanks to the following people for their continuing overall contributions to the CLDR project, and for their specific contributions in the following areas. These descriptions only touch on the many contributions that they have made.

* Mark Davis for creating the initial version of LDML, and adding to and maintaining this specification, and for his work on the LDML code and tests, much of the supplemental data and overall structure, and transforms and keyboards.
* John Emmons for the POSIX conversion tool and metazones.
* Deborah Goldsmith for her contributions to LDML architecture and this specification.
* Chris Hansten for coordinating and managing data submissions and vetting.
* Erkki Kolehmainen and his team for their work on Finnish.
* Steven R. Loomis for development of the survey tool and database management.
* Peter Nugent for his contributions to the POSIX tool and from Open Office, and for coordinating and managing data submissions and vetting.
* George Rhoten for his work on currencies.
* Roozbeh Pournader (روزبه پورنادر) for his work on South Asian countries.
* Ram Viswanadha (రఘురామ్ విశ్వనాధ) for all of his work on LDML code and data integration, and for coordinating and managing data submissions and vetting.
* Vladimir Weinstein (Владимир Вајнштајн) for his work on collation.
* Yoshito Umaoka (馬岡 由人) for his work on the timezone architecture.
* Rick McGowan for his work gathering language, script and region data.
* Xiaomei Ji (吉晓梅) for her work on time intervals and plural formatting.
* David Bertoni for his contributions to the conversion tools.
* Mike Tardif for reviewing this specification and for coordinating and vetting data submissions.
* Peter Edberg for work on this specification, monthPatterns, cyclicNameSets, contextTransforms and other items.
* Raymond Wainman and Cibu Johny for their work on keyboards.
* Jennifer Chye for her contributions to the conversion tools.
* Markus Scherer for a major rewrite of Part 5, Collation.
* [Shane Carr](https://www.sffc.xyz/) for his work on numbers and measurement units.
* Robin Leroy for his work on compact plurals: Part 3, [Language Plural Rules](tr35-numbers.md#Language_Plural_Rules).
* Rich Gillam for work on Person Names.
* Alex Kolisnychenko for work on Person Names.
* Mike McKenna for work on Person Names.


Other contributors to CLDR are listed on the [CLDR Project Page](https://www.unicode.org/cldr/).

4094*912701f9SAndroid Build Coastguard Worker## <a name="Modifications" href="#Modifications">Modifications</a>
4095*912701f9SAndroid Build Coastguard Worker
4096*912701f9SAndroid Build Coastguard Worker**Differences from LDML Version 44.1**
4097*912701f9SAndroid Build Coastguard Worker
4098*912701f9SAndroid Build Coastguard Worker* Part 1: [Core](tr35.md#Contents)
  * In [Parent Locales](#Parent_Locales), made substantial changes to the way that parentLocales work,
    including a new attribute for algorithmic handling of inheritance
    that avoids needing a long (and fragile) list of language-script codes
    to skip when falling back to root.
    That list was retained for migration, but will be withdrawn in the future.
  * In [Special Script Codes](#special-script-codes), added a description of special script codes,
    such as Jpan and Aran.
  * In [Lateral Inheritance](#Lateral_Inheritance), improved the formatting for clarity.
  * In [LocaleId Canonicalization:Preprocessing](#preprocessing), restructured the steps for clarity and added more examples.
  * In [Likely Subtags](#Likely_Subtags), clarified that language subtags iw, in, and yi are treated specially in the data,
    to allow for applications that use them as canonical language subtags.
    Also removed the substitution for macroregions,
    and noted that some elements could be NOOPs in customized data, but could be misleading.
  * In [EBNF](#ebnf), added more differences from W3C EBNF,
    and documented use of wfc: and vc: for well-formedness and validity constraints.
    Marked clauses with that format where appropriate, and grouped constraints after the relevant EBNF.

* Part 3: [Numbers](tr35-numbers.md#Contents)
  * In [Supplemental Currency Data](tr35-numbers.md#Supplemental_Currency_Data), for the `currency` element,
    added attributes `tz` and `to-tz` to clarify the `from` and `to` dates.

* Part 4: [Dates](tr35-dates.md#Contents)
  * In [Date Format Patterns](tr35-dates.md#Date_Format_Patterns), reserved date pattern field lengths greater than 16
    for private use.

* Part 6: [Supplemental](tr35-info.md#Contents)
  * In [Mixed Units](tr35-info.md#mixed-units), clarified many aspects of mixed units (such as foot-and-inch),
    including how to handle rounding and precision.
  * In [Testing](tr35-info.md#testing), listed the additional test files.
  * In [Unit Preferences Overrides](tr35-info.md#Unit_Preferences_Overrides), made substantial changes, including
    the handling of edge cases (such as where there is no quantity for a unit, or no preference data for a quantity),
    invalid subtags, and negative unit amounts;
    the usage of each of the subtags that affect unit preferences;
    and others.
  * In [Conversion Data](tr35-info.md#conversion-data), added the `special` attribute for `convertUnit`, used for handling beaufort.
  * In [Unit Prefixes](tr35-info.md#unit-prefixes), added the SI unit prefixes and the power of 10
    (or 2, for binary prefixes) that they represent.

* Part 7: [Keyboards](tr35-keyboards.md#Contents)
  * Made substantial changes from v44 to bring the Keyboard 3.0 specification out of Tech Preview, including:
    * New sections for Definitions, Notation, and Normalization.
    * Many clarifications and modifications in other sections.

* Part 9: [MessageFormat](tr35-messageFormat.md#Contents)
  * Added the completely new specification for MessageFormat 2.0 (in Tech Preview).

Note that small changes such as typos and link fixes are not listed above.
Modifications in previous versions are listed in those respective versions.
Click on **Previous Version** in the header until you get to the desired version.

* * *

Copyright © 2001–2024 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode [Terms of Use](https://www.unicode.org/copyright.html) apply.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.