Name                    Date         Size       Lines    LOC
GraphemeBreakTest.html  25-Apr-2025  61.7 KiB     232    231
GraphemeBreakTest.txt   25-Apr-2025  183.8 KiB  1,216  1,215
readme.txt              25-Apr-2025  1,011 B       22     17

readme.txt

CLDR Segmentation data
#  Copyright © 1991-2020 Unicode, Inc.
#  For terms of use, see http://www.unicode.org/copyright.html
#  Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
#  CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
The segments directory contains files used to customize the default segmentation data in the Unicode Character Database (UCD).

Currently this applies only to the Grapheme Cluster Break (GCB) algorithm (https://unicode.org/reports/tr29/).
CLDR 35 through 43 used this customization to add support for not splitting Indic aksaras.
Unicode 15.1 has adopted these changes.
Starting with CLDR 44, the GraphemeBreakTest.* files are the same as in the UCD.

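For readers unfamiliar with the GraphemeBreakTest.txt format, here is a minimal Python sketch of parsing one test line. It assumes the UCD test-file convention: ÷ marks a permitted break, × marks no break, code points are written in hex, and # begins a comment. The function name parse_break_test_line is hypothetical and is not part of CLDR or the UCD.

```python
def parse_break_test_line(line):
    """Parse one GraphemeBreakTest.txt-style line into its expected clusters.

    Assumed convention: '÷' = break allowed, '×' = no break,
    hex code points separated by whitespace, '#' starts a comment.
    Returns None for blank or comment-only lines.
    """
    # Drop the trailing comment and surrounding whitespace.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None

    clusters = []
    current = []
    for token in data.split():
        if token == "÷":
            # Break opportunity: close the cluster built so far, if any.
            if current:
                clusters.append("".join(current))
                current = []
        elif token == "×":
            # No break: keep accumulating into the same cluster.
            continue
        else:
            # Any other token is a hex code point.
            current.append(chr(int(token, 16)))

    if current:
        clusters.append("".join(current))
    return clusters
```

For example, the line `÷ 0915 × 094D × 0937 ÷` (the Devanagari conjunct क्ष, which the Unicode 15.1 rules keep as a single cluster) parses to a one-element list. A conformance check would then compare these expected clusters with the output of the segmenter under test.
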
See the test files supplied by India in org.unicode.cldr.unittest.data.graphemeCluster/*:

  TestSegmenter-Bengali.txt
  TestSegmenter-Devanagari.txt
  TestSegmenter-Gujarati.txt
  TestSegmenter-Malayalam.txt
  TestSegmenter-Odia.txt
  TestSegmenter-Telugu.txt