Name | Date | Size | #Lines | LOC
---|---|---|---|---
GraphemeBreakTest.html | 25-Apr-2025 | 61.7 KiB | 232 | 231
GraphemeBreakTest.txt | 25-Apr-2025 | 183.8 KiB | 1,216 | 1,215
readme.txt | 25-Apr-2025 | 1,011 B | 22 | 17
readme.txt
CLDR Segmentation data
# Copyright © 1991-2020 Unicode, Inc.
# For terms of use, see http://www.unicode.org/copyright.html
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
The segments directory contains files used to customize the default segmentation data in the UCD.

Currently this just applies to the Grapheme Cluster Break (GCB) (https://unicode.org/reports/tr29/) algorithm,
which was used in CLDR 35..43 to add support for not splitting Indic aksaras.
Unicode 15.1 has adopted these changes.
Starting with CLDR 44, the GraphemeBreakTest.* files are the same as in the UCD.

See the test files supplied by India to org.unicode.cldr.unittest.data.graphemeCluster/*

    TestSegmenter-Bengali.txt
    TestSegmenter-Devanagari.txt
    TestSegmenter-Gujarati.txt
    TestSegmenter-Malayalam.txt
    TestSegmenter-Odia.txt
    TestSegmenter-Telugu.txt
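
As an illustration of how the GraphemeBreakTest.txt data might be consumed, here is a minimal Python sketch. It assumes the usual UCD break-test line layout: space-separated hexadecimal code points, with ÷ marking a break and × marking no break, followed by an optional # comment. The function name parse_break_test_line is hypothetical and is not part of CLDR or the UCD, and the sample line in the usage note is only an illustrative line in that format.

```python
from typing import List, Optional


def parse_break_test_line(line: str) -> Optional[List[str]]:
    """Return the expected grapheme clusters for one test line.

    Hypothetical helper, not part of CLDR. Assumes the UCD break-test
    layout: hex code points separated by the markers
    '÷' (break here) and '×' (no break here), then an optional '#' comment.
    Returns None for blank or comment-only lines.
    """
    # Drop the trailing comment; it may itself contain '÷' and '×'.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None

    clusters: List[str] = []
    current: List[str] = []
    for token in data.split():
        if token == "÷":
            # A break closes the cluster collected so far, if any.
            if current:
                clusters.append("".join(current))
                current = []
        elif token == "×":
            # No break: the next code point joins the current cluster.
            continue
        else:
            current.append(chr(int(token, 16)))
    if current:
        clusters.append("".join(current))
    return clusters
```

For example, parse_break_test_line("÷ 0915 × 094D × 0937 ÷") returns a single cluster made of the three code points U+0915, U+094D, and U+0937, which is the kind of Indic aksara the readme describes as not being split.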