CLDR Segmentation data
#  Copyright © 1991-2020 Unicode, Inc.
#  For terms of use, see http://www.unicode.org/copyright.html
#  Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
#  CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
The segments directory contains files used to customize the default segmentation data in the UCD.

Currently this applies only to the Grapheme Cluster Break (GCB) algorithm (https://unicode.org/reports/tr29/).
CLDR 35..43 customized that algorithm to add support for not splitting Indic aksaras.
Unicode 15.1 has adopted these changes.
Starting with CLDR 44, the GraphemeBreakTest.* files are the same as in the UCD.

See the test files supplied by India in org.unicode.cldr.unittest.data.graphemeCluster/*:

  TestSegmenter-Bengali.txt
  TestSegmenter-Devanagari.txt
  TestSegmenter-Gujarati.txt
  TestSegmenter-Malayalam.txt
  TestSegmenter-Odia.txt
  TestSegmenter-Telugu.txt
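
As a rough illustration of what "not splitting an aksara" looks like in break-test data, here is a minimal Python sketch that reads one line in the UCD break-test format. It assumes the usual convention for those files: "÷" marks a permitted boundary, "×" marks no boundary, code points are written in hex, and "#" starts a comment. The function name parse_break_test_line and the sample line are hypothetical. The sample is written by hand in that format to show a Devanagari conjunct (KA, VIRAMA, TA) kept as one cluster, and is not quoted from any file.

```python
BREAK = "\u00f7"     # "÷": a boundary is expected at this position
NO_BREAK = "\u00d7"  # "×": no boundary is expected at this position


def parse_break_test_line(line):
    """Return the expected clusters for one test line.

    Returns None for blank or comment-only lines.
    Hypothetical helper; the line format is an assumption stated above.
    """
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    clusters = []
    current = ""
    for token in data.split():
        if token == BREAK:
            # Close the cluster collected so far, if any.
            if current:
                clusters.append(current)
                current = ""
        elif token == NO_BREAK:
            # Stay inside the current cluster.
            continue
        else:
            # Anything else is a hex code point.
            current += chr(int(token, 16))
    if current:
        clusters.append(current)
    return clusters


# Hand-written sample: KA + VIRAMA + TA stay together, then a space.
sample = "\u00f7 0915 \u00d7 094D \u00d7 0924 \u00f7 0020 \u00f7\t#  illustrative comment"
for cluster in parse_break_test_line(sample):
    print([f"U+{ord(ch):04X}" for ch in cluster])
```

A segmenter under test could then be checked by joining the clusters into one string, segmenting it, and comparing the result with the parsed list.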