<?xml version="1.0"?>
<!--
Copyright (C) Electronic Dictionary Research and Development Group
Released under Creative Commons Attribution-ShareAlike Licence (V4.0)

This file only contains the kanjidic2 DTD without the actual database.

http://nihongo.monash.edu/kanjidic2/index.html
http://www.edrdg.org/edrdg/licence.html
-->
<!DOCTYPE kanjidic2 [
	<!-- Version 1.3
	This is the DTD of the XML-format kanji file combining information from
	the KANJIDIC and KANJD212 files. It is intended to be largely
	self-documenting, with each field being accompanied by an explanatory
	comment.

	The file covers the following kanji:
	(a) the 6,355 kanji from JIS X 0208;
	(b) the 5,801 kanji from JIS X 0212;
	(c) the 3,625 kanji from JIS X 0213 as follows:
		(i) the 2,741 kanji which are also in JIS X 0212 have
		JIS X 0213 code-points (kuten) added to the existing entry;
		(ii) the 884 "new" kanji have new entries.

	At the end of the explanation for a number of fields there is a tag
	with the format [N]. This indicates the leading letter(s) of the
	equivalent field in the KANJIDIC and KANJD212 files.

	The KANJIDIC documentation should also be read for additional
	details about the data in the file.
	-->
<!ELEMENT kanjidic2 (header,character*)>
<!ELEMENT header (file_version,database_version,date_of_creation)>
<!--
	The single header element will contain identification information
	about the version of the file.
	-->
<!ELEMENT file_version (#PCDATA)>
<!--
	This field denotes the version of the kanjidic2 structure, as more
	than one version may exist.
	-->
<!ELEMENT database_version (#PCDATA)>
<!--
	The version of the file, in the format YYYY-NN, where NN will be
	a number starting with 01 for the first version released in a
	calendar year, then increasing for each version in that year.
	-->
<!ELEMENT date_of_creation (#PCDATA)>
<!--
	The date the file was created in international format (YYYY-MM-DD).
	-->
<!ELEMENT character (literal,codepoint, radical, misc, dic_number?, query_code?, reading_meaning?,nanori?)*>
<!ELEMENT literal (#PCDATA)>
<!--
	The character itself in UTF-8 coding.
	-->
<!ELEMENT codepoint (cp_value+)>
	<!--
	The codepoint element states the code of the character in the various
	character set standards.
	-->
<!ELEMENT cp_value (#PCDATA)>
	<!--
	The cp_value contains the codepoint of the character in a particular
	standard. The standard will be identified in the cp_type attribute.
	-->
<!ATTLIST cp_value cp_type CDATA #REQUIRED>
	<!--
	The cp_type attribute states the coding standard applying to the
	element. The values assigned so far are:
		jis208 - JIS X 0208-1997 - kuten coding (nn-nn)
		jis212 - JIS X 0212-1990 - kuten coding (nn-nn)
		jis213 - JIS X 0213-2000 - kuten coding (p-nn-nn)
		ucs - Unicode 4.0 - hex coding (4 or 5 hexadecimal digits)
	-->
<!ELEMENT radical (rad_value+)>
<!ELEMENT rad_value (#PCDATA)>
	<!--
	The radical number, in the range 1 to 214. The particular
	classification type is stated in the rad_type attribute.
	-->
<!ATTLIST rad_value rad_type CDATA #REQUIRED>
	<!--
	The rad_type attribute states the type of radical classification.
		classical - as recorded in the KangXi Zidian.
		nelson - as used in the Nelson "Modern Japanese-English
		Character Dictionary" (i.e. the Classic, not the New Nelson).
		This will only be used where Nelson reclassified the kanji.
	-->
<!ELEMENT misc (grade?, stroke_count+, variant*, freq*, rad_name*)>
<!ELEMENT grade (#PCDATA)>
	<!--
	The Jouyou Kanji grade level. 1 through 6 indicate the grade in which
	the kanji is taught in Japanese schools. 8 indicates it is one of the
	remaining Jouyou Kanji to be learned in junior high school, and 9
	indicates it is a Jinmeiyou (for use in names) kanji. [G]
	-->
<!ELEMENT stroke_count (#PCDATA)>
	<!--
	The stroke count of the kanji, including the radical. If more than
	one, the first is considered the accepted count, while subsequent ones
	are common miscounts. (See Appendix E of the KANJIDIC documentation
	for some of the rules applied when counting strokes in some of the
	radicals.) [S]
	-->
<!ELEMENT variant (#PCDATA)>
	<!--
	A cross-reference code to another kanji, usually regarded as a variant.
	The type of cross-reference is given in the var_type attribute.
	-->
<!ATTLIST variant var_type CDATA #REQUIRED>
	<!--
	The var_type attribute indicates the type of variant code. The current
	values are:
		jis208 - in JIS X 0208 - kuten coding
		jis212 - in JIS X 0212 - kuten coding
		jis213 - in JIS X 0213 - kuten coding
		deroo - De Roo number - numeric
		njecd - Halpern NJECD index number - numeric
		s_h - The Kanji Dictionary (Spahn & Hadamitzky) - descriptor
		nelson - "Classic" Nelson - numeric
		oneill - Japanese Names (O'Neill) - numeric
	-->
<!ELEMENT freq (#PCDATA)>
	<!--
	A frequency-of-use ranking. The 2,500 most-used characters have a
	ranking; those characters that lack this field are not ranked. The
	frequency is a number from 1 to 2,500 that expresses the relative
	frequency of occurrence of a character in modern Japanese. This is
	based on a survey of newspapers, so it is biased towards kanji
	used in newspaper articles. The discrimination between the less
	frequently used kanji is not strong.
	-->
<!ELEMENT rad_name (#PCDATA)>
	<!--
	When the kanji is itself a radical and has a name, this element
	contains the name (in hiragana). [T2]
	-->
<!ELEMENT dic_number (dic_ref+)>
	<!--
	This element contains the index numbers and similar unstructured
	information, such as page numbers, in a number of published
	dictionaries and instructional books on kanji.
	-->
<!ELEMENT dic_ref (#PCDATA)>
	<!--
	Each dic_ref contains an index number. The particular dictionary,
	etc. is defined by the dr_type attribute.
	-->
<!ATTLIST dic_ref dr_type CDATA #REQUIRED>
	<!--
	The dr_type defines the dictionary or reference book, etc. to which
	the dic_ref element applies. The initial allocation is:
		nelson_c - "Modern Reader's Japanese-English Character Dictionary",
			edited by Andrew Nelson (now published as the "Classic"
			Nelson).
		nelson_n - "The New Nelson Japanese-English Character Dictionary",
			edited by John Haig.
		halpern_njecd - "New Japanese-English Character Dictionary",
			edited by Jack Halpern.
		halpern_kkld - "Kanji Learners Dictionary" (Kodansha) edited by
			Jack Halpern.
		heisig - "Remembering The Kanji" by James Heisig.
		gakken - "A New Dictionary of Kanji Usage" (Gakken)
		oneill_names - "Japanese Names", by P.G. O'Neill.
		oneill_kk - "Essential Kanji" by P.G. O'Neill.
		moro - "Daikanwajiten" compiled by Morohashi. For some kanji two
			additional attributes are used: m_vol: the volume of the
			dictionary in which the kanji is found, and m_page: the page
			number in the volume.
		henshall - "A Guide To Remembering Japanese Characters" by
			Kenneth G. Henshall.
		sh_kk - "Kanji and Kana" by Spahn and Hadamitzky.
		sakade - "A Guide To Reading and Writing Japanese" edited by
			Florence Sakade.
		henshall3 - "A Guide To Reading and Writing Japanese" 3rd
			edition, edited by Henshall, Seeley and De Groot.
		tutt_cards - Tuttle Kanji Cards, compiled by Alexander Kask.
		crowley - "The Kanji Way to Japanese Language Power" by
			Dale Crowley.
		kanji_in_context - "Kanji in Context" by Nishiguchi and Kono.
		busy_people - "Japanese For Busy People" vols I-III, published
			by the AJLT. The codes are the volume.chapter.
		kodansha_compact - the "Kodansha Compact Kanji Guide".
	-->
<!ATTLIST dic_ref m_vol CDATA #IMPLIED>
	<!--
	See above under "moro".
	-->
<!ATTLIST dic_ref m_page CDATA #IMPLIED>
	<!--
	See above under "moro".
	-->
<!ELEMENT query_code (q_code+)>
	<!--
	These codes contain information relating to the glyph, and can be used
	for finding a required kanji. The type of code is defined by the
	qc_type attribute.
	-->
<!ELEMENT q_code (#PCDATA)>
	<!--
	The q_code contains the actual query-code value, according to the
	qc_type attribute.
	-->
<!ATTLIST q_code qc_type CDATA #REQUIRED>
	<!--
	The qc_type attribute defines the type of query code. The current values
	are:
		skip - Halpern's SKIP (System of Kanji Indexing by Patterns)
			code. The format is n-nn-nn. See the KANJIDIC documentation
			for a description of the code and restrictions on the
			commercial use of this data. [P]

		sh_desc - the descriptor codes for The Kanji Dictionary (Tuttle
			1996) by Spahn and Hadamitzky. They are in the form nxnn.n,
			e.g. 3k11.2, where the kanji has 3 strokes in the
			identifying radical, it is radical "k" in the SH
			classification system, there are 11 other strokes, and it is
			the 2nd kanji in the 3k11 sequence. (I am very grateful to
			Mark Spahn for providing the list of these descriptor codes
			for the kanji in this file.) [I]
		four_corner - the "Four Corner" code for the kanji. This is a code
			invented by Wang Chen in 1928. See the KANJIDIC documentation
			for an overview of the Four Corner System. [Q]

		deroo - the codes developed by the late Father Joseph De Roo, and
			published in his book "2001 Kanji" (Bojinsha). Fr De Roo
			gave his permission for these codes to be included. [DR]
		misclass - a possible misclassification of the kanji according
			to one of the code types. (See the "Z" codes in the KANJIDIC
			documentation for more details.)

	-->
<!ELEMENT reading_meaning (rmgroup*, nanori*)>
	<!--
	The readings for the kanji in several languages, and the meanings, also
	in several languages. The readings and meanings are grouped to enable
	the handling of the situation where the meaning is differentiated by
	reading. [T1]
	-->
<!ELEMENT nanori (#PCDATA)>
	<!--
	Japanese readings that are now only associated with names.
	-->
<!ELEMENT rmgroup (reading*, meaning*)>
<!ELEMENT reading (#PCDATA)>
	<!--
	The reading element contains the reading or pronunciation
	of the kanji.
	-->
<!ATTLIST reading r_type CDATA #REQUIRED>
	<!--
	The r_type attribute defines the type of reading in the reading
	element. The current values are:
		pinyin - the modern PinYin romanization of the Chinese reading
			of the kanji. The tones are represented by a concluding
			digit. [Y]
		korean_r - the romanized form of the Korean reading(s) of the
			kanji. The readings are in the (Republic of Korea) Ministry
			of Education style of romanization. [W]
		korean_h - the Korean reading(s) of the kanji in hangul.
		ja_on - the "on" Japanese reading of the kanji, in katakana. A
			second attribute r_status, if present, will indicate with
			a value of "jy" whether the reading is approved for a
			"Jouyou kanji".
		ja_kun - the "kun" Japanese reading of the kanji, in hiragana.
			Where relevant the okurigana is also included separated by a
			".". Readings associated with prefixes and suffixes are
			marked with a "-". A second attribute r_status, if present,
			will indicate with a value of "jy" whether the reading is
			approved for a "Jouyou kanji".
	-->
<!ATTLIST reading r_status CDATA #IMPLIED>
	<!--
	See under ja_on and ja_kun above.
	-->
<!ELEMENT meaning (#PCDATA)>
	<!--
	The meaning associated with the kanji.
	-->
<!ATTLIST meaning m_lang CDATA #IMPLIED>
	<!--
	The m_lang attribute defines the target language of the meaning. It
	will be coded using the two-letter language code from the ISO 639
	standard. When absent, the value "en" (i.e. English) is implied. [{}]
	-->
]>
<kanjidic2>
</kanjidic2>
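<!--
	Illustrative sketch only; not part of the database or of the DTD.
	The fragment below shows roughly how one character entry could be laid
	out so that it follows the element order declared above (literal,
	codepoint, radical, misc, then the optional query_code and
	reading_meaning). It is kept inside a comment because this file
	deliberately ships without data. The field values are assumptions
	chosen for illustration and should be checked against the real
	kanjidic2 database before being relied upon.

	<character>
		<literal>亜</literal>
		<codepoint>
			<cp_value cp_type="ucs">4e9c</cp_value>
			<cp_value cp_type="jis208">16-01</cp_value>
		</codepoint>
		<radical>
			<rad_value rad_type="classical">7</rad_value>
		</radical>
		<misc>
			<grade>8</grade>
			<stroke_count>7</stroke_count>
			<freq>1509</freq>
		</misc>
		<query_code>
			<q_code qc_type="skip">4-7-1</q_code>
		</query_code>
		<reading_meaning>
			<rmgroup>
				<reading r_type="ja_on">ア</reading>
				<reading r_type="ja_kun">つ.ぐ</reading>
				<meaning>Asia</meaning>
			</rmgroup>
		</reading_meaning>
	</character>

	Note that the meaning element carries no m_lang attribute here, so
	"en" is implied, and that a header element would have to precede any
	character elements inside the kanjidic2 root.
-->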