Lines Matching full:collation

3 # Unicode Locale Data Markup Language (LDML)<br/>Part 5: Collation
15 This is a partial document, describing only those parts of the LDML that are relevant for collation
43 * Part 5: [Collation](tr35-collation.md#Contents) (sorting, searching, grouping)
49 ## <a name="Contents" href="#Contents">Contents of Part 5, Collation</a>
51 * [CLDR Collation](#CLDR_Collation)
52 * [CLDR Collation Algorithm](#CLDR_Collation_Algorithm)
58 * [Root Collation](#Root_Collation)
63 * [Root Collation Data Files](#Root_Data_Files)
64 * [Root Collation Data File Formats](#Root_Data_File_Formats)
68 * [Collation Tailorings](#Collation_Tailorings)
69 * [Collation Types](#Collation_Types)
70 * [Collation Type Fallback](#Collation_Type_Fallback)
71 …* Table: [Sample requested and actual collation locales and types](#Sample_requested_and_actual_co…
73 * [Collation Element](#Collation_Element)
75 * Table: [Collation Settings](#Collation_Settings)
79 * [Collation Rule Syntax](#Rules)
81 * Table: [Specifying Collation Ordering](#Specifying_Collation_Ordering)
93 * [Collation Reordering](#Script_Reordering)
98 * [Compute Modified Collation Elements](#Case_Weights)
101 * [Collation Indexes](#Collation_Indexes)
105 ## <a name="CLDR_Collation" href="#CLDR_Collation">CLDR Collation</a>
107 Collation is the general term for the process and function of determining the sorting order of stri…
109 Collation varies by language, by application (some languages use special phonebook sorting), and ot…
111 …collation data for many languages and styles. The data supports not only sorting but also language…
113 ### <a name="CLDR_Collation_Algorithm" href="#CLDR_Collation_Algorithm">CLDR Collation Algorithm</a>
115 The CLDR collation algorithm is an extension of the [Unicode Collation Algorithm](https://www.unico…
131 …slowed down by contraction matching starting with L/l. In the CLDR root collation, these contracti…
133 …collation elements for p followed by the collation elements for "x if after p". In the DUCET, L· m…
183 There is a root collation for "emoji" in CLDR. So use of "-u-co-emoji" in a Unicode locale identifi…
224 ## <a name="Root_Collation" href="#Root_Collation">Root Collation</a>
226 …collation order is based on the [Default Unicode Collation Element Table (DUCET)](https://www.unic…
228 Starting with CLDR 1.9, CLDR uses modified tables for the root collation order. The root locale ord…
264 * [https://www.unicode.org/charts/collation/](https://www.unicode.org/charts/collation/)
276 …que weights on primary and identical levels. For details see the _[CLDR Collation Algorithm](#Algo…
278 UCA (beginning with version 6.3) also maps **U+FFFD** to a special collation element with a very hi…
280 …ecial collation elements, **U+FFFD..U+FFFF** are not further tailorable, and nothing can tailor to…
290 …collation syntax, but has not been updated recently. It does not support any of the syntax marked …
292 ### <a name="Root_Data_Files" href="#Root_Data_Files">Root Collation Data Files</a>
294 The CLDR root collation data files are in the CLDR repository and release, under the path [common/u…
304 …ts, so that a number of characters can be tailored to sort between any two root collation elements.
305 …collation elements with primary weights at the boundaries between reordering groups and Unicode sc…
309 …collation order in the form of [tailoring rules](#Collation_Tailorings). This is only an approxima…
314 ### <a name="Root_Data_File_Formats" href="#Root_Data_File_Formats">Root Collation Data File Format…
320 This file defines CLDR’s tailoring of the DUCET, as described in _[Root Collation](#Root_Collation)…
338 …s the ranges of Unified_Ideograph characters in collation order. (New in CLDR 24.) They map to col…
349 …ing with CLDR 26, the CJK type="unihan" tailorings assume that the root collation order sorts Han …
351 The root collation radical-stroke order is derived from the first (normative) values of the [Unihan…
365 … hex codepoint sequence. The second field is a sequence of collation elements. Each collation elem…
375 … U+00B7 appears immediately after U+006C, it is given the corresponding collation element instead.…
401 Beginning with CLDR 27, some primary or secondary collation elements may have below-common tertiary…
404 # SPECIAL MAX/MIN COLLATION ELEMENTS
417 Some collation elements are specified by reference to other mappings. This is particularly useful f…
419 … [Unified_Ideograph] data line. The referenced character must map to exactly one collation element.
421 `[U+4E0D]` copies U+4E0D’s entire collation element. `[U+4E36, 10]` copies U+4E36’s primary and sec…
435 …ght per script or group. They can be enumerated for implementations of [Collation Indexes](#Collat…
437 …llowing characters as shown above. The reserved ranges are not used for collation elements and are…
445 …her primary weight, otherwise numeric sorting would generate ill-formed collation elements. Theref…
448 # HOMELESS COLLATION ELEMENTS
454 …collation element (necessary for certain implementations of tailoring), this requires the construc…
526 The format for this file uses the CLDR collation syntax, see _[Collation Tailorings](#Collation_Tai…
528 ## <a name="Collation_Tailorings" href="#Collation_Tailorings">Collation Tailorings</a>
531 <!ELEMENT collations (alias | (defaultCollation?, collation*, special*)) >
536 …ement of the LDML format contains one or more `collation` elements, distinguished by type. Each `c…
538 … **Note**: CLDR collation tailoring data should follow the [CLDR Collation Guidelines](https://cld…
540 ### <a name="Collation_Types" href="#Collation_Types">Collation Types</a>
548 …onments to use CJK sorting, there are also short forms of each of these collation sequences. These…
550 …collation type name that starts with "private-", for example, "private-kana", indicates an incompl…
552 …stration of collation at [[LocaleExplorer](tr35.md#LocaleExplorer)] that uses the same rule syntax…
554 …collation files used an XML format. Starting with CLDR 24, the XML collation syntax is deprecated …
556 #### <a name="Collation_Type_Fallback" href="#Collation_Type_Fallback">Collation Type Fallback</a>
561 2. If the request language tag specifies the collation type (keyword "co"), then map it to its alia…
562 3. Use the `<collation>` element with this type.
563 …th "search" but is longer, then set the type to "search" and use that `<collation>` element. (For …
564 … is not the default type, then set the type to the default type and use that `<collation>` element.
565 …and the type is not "standard", then set the type to "standard" and use that `<collation>` element.
566 7. If it does not exist, then use the CLDR root collation.
568 …collation/root.xml contains `<defaultCollation>standard</defaultCollation>`, `<collation type="sta…
570 For example, assume that we have collation data for the following tailorings. ("da/search" is short…
586 …ted_and_actual_collation_locales_and_types">Sample requested and actual collation locales and type…
606 ### <a name="Collation_Element" href="#Collation_Element">Collation Element</a>
609 <!ELEMENT collation (alias | (cr*, special*)) >
612 …n if the underlying weights change. The following illustrates the overall structure of a collation:
615 <collation type="phonebook">
620 </collation>
629 ###### Table: <a name="Collation_Settings" href="#Collation_Settings">Collation Settings</a>
664 …ints in <b>shifted</b>. That is, the normal Level 4 value for a regular collation element is FFFF,…
672 …ing and usage of the reorder codes, see <i><a href="#Script_Reordering">Collation Reordering</a>.<…
690 Some commonly used parametric collation settings are available via combinations of LDML settings at…
704 …’, ‘å’, and other characters. In the root collation (and in the DUCET), Cyrillic ‘ӛ’ maps to a sin…
710 …collation chart](https://www.unicode.org/charts/collation/)). Alternatively, someone could want mo…
714 ### <a name="Rules" href="#Rules">Collation Rule Syntax</a>
720 The goal for the collation rule syntax is to have clearly expressed rules with a concise format. Th…
722 …collation, the FractionalUCA.txt file defines all mappings for all of Unicode directly, and it als…
724 The ASCII [:P:] and [:S:] characters are reserved for collation syntax: `[\u0021-\u002F \u003A-\u00…
730 …nd demos before passing rule strings into the ICU library code. The ICU collation API does not une…
732 The ASCII double quote must be both escaped (so that the collation syntax can be enclosed in pairs …
736 The collation syntax is case-sensitive.
740 The root collation mappings form the initial state. Mappings are added and removed via a sequence o…
742 …more collation elements according to the current state. A relation consists of an operator and a s…
744 …cifying_Collation_Ordering" href="#Specifying_Collation_Ordering">Specifying Collation Ordering</a>
748 | `&` | `& Z` | Map Z to collation elements according to the current state. These wil…
765 Each relation uses and modifies the collation elements of the immediately preceding reset position …
775 …s its relation string to the current collation elements. Any other relation operator modifies the …
777 …collation element whose strength is at least as great as the strength of the operator. For example…
778 * Increment the collation element weight corresponding to the strength of the operator. For example…
779 …the next weight for the same combination of higher-level weights of any collation element accordin…
785 …collation elements. The first one is the same as for ‘a’, and the second one has a primary weight …
813 …le collation elements defines an expansion. This is normally the result of a reset position (and/o…
815 …wed by `/` and an _extension string_. The extension string is mapped to collation elements accordi…
819 …ntext-sensitive middle-dot-after-L (which is a secondary CE in the root collation). On the other h…
876 Each of these special reset positions always maps to a single collation element.
884 … reset position strings. For example, if a tailoring rule creates a new collation element after `&…
899 …collation. This allows for better maintenance and smaller rule sizes. The source is a BCP 47 langu…
926 …collation/ja.xml), [Chinese](https://github.com/unicode-org/cldr/blob/main/common/collation/zh.xml…
929 <collation>
945 </collation>
948 ### <a name="Script_Reordering" href="#Script_Reordering">Collation Reordering</a>
950 Collation reordering allows scripts and certain other defined blocks of characters to be moved rela…
978 4. The reordering list is now complete, and used to reorder characters in collation accordingly.
980 … identifier) completely replace what was there. To specify a version of collation that completely …
991 … the FractionalUCA.txt file, based on the primary weights of associated collation elements. The fi…
998 * The TRAILING group, the FIELD-SEPARATOR (associated with U+FFFE), and collation elements with onl…
1006 …ports this via reorder-reserved ranges of primary weights that are not used for collation elements.
1012 …ent in FractionalUCA.txt ( `[0312.0020.0002]` ) also shows the allkeys_CLDR.txt collation elements.
1014 The DUCET ordering of some characters is slightly different from the CLDR root collation order. The…
1016 …acters more strictly by category than the DUCET (as described in _[Root Collation](#Root_Collation…
1030 Each collation element of an expansion may be in a different reordering group, for example for pare…
1034 …all and large Kana characters are also included as case differences, to support Japanese collation.
1040 …ameters are set, the following describes the derivation of the modified collation elements. It ass…
1044 …collation, the case value for each collation element is computed from the tertiary weight listed i…
1046 Look up a case value for the tertiary weight x of each collation element:
1050 …r UNCASED and LOWERCASE, and 10 for UPPER. There is no MIXED case value (01) in the root collation.
1052 #### <a name="Case_Weights" href="#Case_Weights">Compute Modified Collation Elements</a>
1059 Compute a new collation element according to the following table. The notation _xt_ means that the …
1077 For primary+case, which is used for “ignore accents but not case” collation, primary ignorables are…
1085 Characters and strings that are tailored have case values computed from their root collation case b…
1099 …rable) root collation CEs, except for U+0345 Combining Ypogegrammeni which is Lowercase. All Upper…
1103 …collation options for users to choose from. A collation whose type name starts with "private-" is …
1105 ### <a name="Collation_Indexes" href="#Collation_Indexes">Collation Indexes</a>
1109 The main data includes `<exemplarCharacters>` for collation indexes. See _Part 2 General, [Characte…
1113 The index characters need to be used in conjunction with a collation for the locale, which will det…
1119 …collation order, and primary less than "B". The use of the list requires that the target list be s…
1123 1. The primary weight (according to the collation) is used to determine which bucket a string is in…
1124 …apted to the underlying collation: normally Ё is in the Е bucket for Russian, but if someone used …
1143 …script boundaries, _not_ multi-script-group boundaries. So if you had a collation that reordered H…
1174 The per-language index exemplar characters are not useful for collation indexes for CJK because for…