1 2<!DOCTYPE html 3 PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> 4 5<html> 6 <head> 7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 8 9 <title>Extensible Markup Language (XML) 1.0</title> 10 <link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-REC"><style type="text/css">code { font-family: monospace }</style></head> 11 <body> 12 13 <div class="head"><a href="http://www.w3.org/"><img src="http://www.w3.org/Icons/WWW/w3c_home" alt="W3C" height="48" width="72"></a><h1>Extensible Markup Language (XML) 1.0<br></h1> 14 <h2>W3C Recommendation 10 February 1998</h2> 15 <dl> 16 <dt>This version:</dt> 17 <dd> 18 <a href="http://www.w3.org/TR/1998/REC-xml-19980210"> 19 http://www.w3.org/TR/1998/REC-xml-19980210 20 </a><br> 21 <a href="http://www.w3.org/TR/1998/REC-xml-19980210.xml"> 22 http://www.w3.org/TR/1998/REC-xml-19980210.xml 23 </a><br> 24 <a href="http://www.w3.org/TR/1998/REC-xml-19980210.html"> 25 http://www.w3.org/TR/1998/REC-xml-19980210.html 26 </a><br> 27 <a href="http://www.w3.org/TR/1998/REC-xml-19980210.pdf"> 28 http://www.w3.org/TR/1998/REC-xml-19980210.pdf 29 </a><br> 30 <a href="http://www.w3.org/TR/1998/REC-xml-19980210.ps"> 31 http://www.w3.org/TR/1998/REC-xml-19980210.ps 32 </a><br> 33 34 </dd> 35 <dt>Latest version:</dt> 36 <dd> 37 <a href="http://www.w3.org/TR/REC-xml"> 38 http://www.w3.org/TR/REC-xml 39 </a><br> 40 41 </dd> 42 <dt>Previous version:</dt> 43 <dd> 44 <a href="http://www.w3.org/TR/PR-xml-971208"> 45 http://www.w3.org/TR/PR-xml-971208 46 </a><br> 47 48 49 </dd> 50 <dt>Editors:</dt> 51 <dd> 52 Tim Bray 53 (Textuality and Netscape) 54 <a href="mailto:[email protected]"><[email protected]></a><br> 55 Jean Paoli 56 (Microsoft) 57 <a href="mailto:[email protected]"><[email protected]></a><br> 58 C. M. 
Sperberg-McQueen 59 (University of Illinois at Chicago) 60 <a href="mailto:[email protected]"><[email protected]></a><br> 61 62 </dd> 63 </dl> 64 <p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice.html#Copyright"> 65 Copyright 66 </a> © 1999 <a href="http://www.w3.org">W3C</a> 67 (<a href="http://www.lcs.mit.edu">MIT</a>, 68 <a href="http://www.inria.fr/">INRIA</a>, 69 <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C 70 <a href="http://www.w3.org/Consortium/Legal/ipr-notice.html#Legal_Disclaimer">liability</a>, 71 <a href="http://www.w3.org/Consortium/Legal/ipr-notice.html#W3C_Trademarks">trademark</a>, 72 <a href="http://www.w3.org/Consortium/Legal/copyright-documents.html">document use</a> and 73 <a href="http://www.w3.org/Consortium/Legal/copyright-software.html">software licensing</a> rules apply. 74 75 </p> 76 <hr title="Separator for header"> 77 </div> 78 <h2><a name="abstract">Abstract</a></h2> 79 80 <p>The Extensible Markup Language (XML) is a subset of 81 SGML that is completely described in this document. Its goal is to 82 enable generic SGML to be served, received, and processed on the Web 83 in the way that is now possible with HTML. XML has been designed for 84 ease of implementation and for interoperability with both SGML and 85 HTML. 86 </p> 87 88 <h2><a name="status">Status of this document</a></h2> 89 90 <p>This document has been reviewed by W3C Members and 91 other interested parties and has been endorsed by the 92 Director as a W3C Recommendation. It is a stable 93 document and may be used as reference material or cited 94 as a normative reference from another document. W3C's 95 role in making the Recommendation is to draw attention 96 to the specification and to promote its widespread 97 deployment. This enhances the functionality and 98 interoperability of the Web. 
99 </p> 100 101 <p> 102 This document specifies a syntax created by subsetting an existing, 103 widely used international text processing standard (Standard 104 Generalized Markup Language, ISO 8879:1986(E) as amended and 105 corrected) for use on the World Wide Web. It is a product of the W3C 106 XML Activity, details of which can be found at <a href="http://www.w3.org/XML">http://www.w3.org/XML</a>. A list of 107 current W3C Recommendations and other technical documents can be found 108 at <a href="http://www.w3.org/TR">http://www.w3.org/TR</a>. 109 110 </p> 111 112 <p>This specification uses the term URI, which is defined by <a href="#Berners-Lee">[Berners-Lee et al.]</a>, a work in progress expected to update <a href="#RFC1738">[IETF RFC1738]</a> and <a href="#RFC1808">[IETF RFC1808]</a>. 113 114 </p> 115 116 <p>The list of known errors in this specification is 117 available at 118 <a href="http://www.w3.org/XML/xml-19980210-errata">http://www.w3.org/XML/xml-19980210-errata</a>. 119 </p> 120 121 <p>Please report errors in this document to 122 <a href="mailto:[email protected]">[email protected]</a>. 
123 124 </p> 125 126 127 <h2><a name="contents">Table of contents</a></h2>1 <a href="#sec-intro">Introduction</a><br> 1.1 <a href="#sec-origin-goals">Origin and Goals</a><br> 1.2 <a href="#sec-terminology">Terminology</a><br>2 <a href="#sec-documents">Documents</a><br> 2.1 <a href="#sec-well-formed">Well-Formed XML Documents</a><br> 2.2 <a href="#charsets">Characters</a><br> 2.3 <a href="#sec-common-syn">Common Syntactic Constructs</a><br> 2.4 <a href="#syntax">Character Data and Markup</a><br> 2.5 <a href="#sec-comments">Comments</a><br> 2.6 <a href="#sec-pi">Processing Instructions</a><br> 2.7 <a href="#sec-cdata-sect">CDATA Sections</a><br> 2.8 <a href="#sec-prolog-dtd">Prolog and Document Type Declaration</a><br> 2.9 <a href="#sec-rmd">Standalone Document Declaration</a><br> 2.10 <a href="#sec-white-space">White Space Handling</a><br> 2.11 <a href="#sec-line-ends">End-of-Line Handling</a><br> 2.12 <a href="#sec-lang-tag">Language Identification</a><br>3 <a href="#sec-logical-struct">Logical Structures</a><br> 3.1 <a href="#sec-starttags">Start-Tags, End-Tags, and Empty-Element Tags</a><br> 3.2 <a href="#elemdecls">Element Type Declarations</a><br> 3.2.1 <a href="#sec-element-content">Element Content</a><br> 3.2.2 <a href="#sec-mixed-content">Mixed Content</a><br> 3.3 <a href="#attdecls">Attribute-List Declarations</a><br> 3.3.1 <a href="#sec-attribute-types">Attribute Types</a><br> 3.3.2 <a href="#sec-attr-defaults">Attribute Defaults</a><br> 3.3.3 <a href="#AVNormalize">Attribute-Value Normalization</a><br> 3.4 <a href="#sec-condition-sect">Conditional Sections</a><br>4 <a href="#sec-physical-struct">Physical Structures</a><br> 4.1 <a href="#sec-references">Character and Entity References</a><br> 4.2 <a href="#sec-entity-decl">Entity Declarations</a><br> 4.2.1 <a href="#sec-internal-ent">Internal Entities</a><br> 4.2.2 <a href="#sec-external-ent">External Entities</a><br> 4.3 <a href="#TextEntities">Parsed Entities</a><br> 4.3.1 <a href="#sec-TextDecl">The 
Text Declaration</a><br> 4.3.2 <a href="#wf-entities">Well-Formed Parsed Entities</a><br> 4.3.3 <a href="#charencoding">Character Encoding in Entities</a><br> 4.4 <a href="#entproc">XML Processor Treatment of Entities and References</a><br> 4.4.1 <a href="#not-recognized">Not Recognized</a><br> 4.4.2 <a href="#included">Included</a><br> 4.4.3 <a href="#include-if-valid">Included If Validating</a><br> 4.4.4 <a href="#forbidden">Forbidden</a><br> 4.4.5 <a href="#inliteral">Included in Literal</a><br> 4.4.6 <a href="#notify">Notify</a><br> 4.4.7 <a href="#bypass">Bypassed</a><br> 4.4.8 <a href="#as-PE">Included as PE</a><br> 4.5 <a href="#intern-replacement">Construction of Internal Entity Replacement Text</a><br> 4.6 <a href="#sec-predefined-ent">Predefined Entities</a><br> 4.7 <a href="#Notations">Notation Declarations</a><br> 4.8 <a href="#sec-doc-entity">Document Entity</a><br>5 <a href="#sec-conformance">Conformance</a><br> 5.1 <a href="#proc-types">Validating and Non-Validating Processors</a><br> 5.2 <a href="#safe-behavior">Using XML Processors</a><br>6 <a href="#sec-notation">Notation</a><br><h3>Appendices</h3>A <a href="#sec-bibliography">References</a><br> A.1 <a href="#sec-existing-stds">Normative References</a><br> A.2 <a href="#section-Other-References">Other References</a><br>B <a href="#CharClasses">Character Classes</a><br>C <a href="#sec-xml-and-sgml">XML and SGML</a> (Non-Normative)<br>D <a href="#sec-entexpand">Expansion of Entity and Character References</a> (Non-Normative)<br>E <a href="#determinism">Deterministic Content Models</a> (Non-Normative)<br>F <a href="#sec-guessing">Autodetection of Character Encodings</a> (Non-Normative)<br>G <a href="#sec-xml-wg">W3C XML Working Group</a> (Non-Normative)<br><hr> 128 129 130 <h2><a name="sec-intro"></a>1 Introduction 131 </h2> 132 133 <p>Extensible Markup Language, abbreviated XML, describes a class of 134 data objects called <a href="#dt-xml-doc">XML documents</a> and 135 partially describes the 
behavior of 136 computer programs which process them. XML is an application profile or 137 restricted form of SGML, the Standard Generalized Markup 138 Language <a href="#ISO8879">[ISO 8879]</a>. 139 By construction, XML documents 140 are conforming SGML documents. 141 142 </p> 143 144 <p>XML documents are made up of storage units called <a href="#dt-entity">entities</a>, which contain either parsed 145 or unparsed data. 146 Parsed data is made up of <a href="#dt-character">characters</a>, 147 some 148 of which form <a href="#dt-chardata">character data</a>, 149 and some of which form <a href="#dt-markup">markup</a>. 150 Markup encodes a description of the document's storage layout and 151 logical structure. XML provides a mechanism to impose constraints on 152 the storage layout and logical structure. 153 </p> 154 155 <p><a name="dt-xml-proc"></a>A software module 156 called an <b>XML processor</b> is used to read XML documents 157 and provide access to their content and structure. <a name="dt-app"></a>It is assumed that an XML processor is 158 doing its work on behalf of another module, called the 159 <b>application</b>. This specification describes the 160 required behavior of an XML processor in terms of how it must read XML 161 data and the information it must provide to the application. 162 </p> 163 164 165 166 <h3><a name="sec-origin-goals"></a>1.1 Origin and Goals 167 </h3> 168 169 <p>XML was developed by an XML Working Group (originally known as the 170 SGML Editorial Review Board) formed under the auspices of the World 171 Wide Web Consortium (W3C) in 1996. 172 It was chaired by Jon Bosak of Sun 173 Microsystems with the active participation of an XML Special 174 Interest Group (previously known as the SGML Working Group) also 175 organized by the W3C. The membership of the XML Working Group is given 176 in an appendix. Dan Connolly served as the WG's contact with the W3C. 
177 178 </p> 179 180 <p>The design goals for XML are: 181 <ol> 182 183 <li> 184 <p>XML shall be straightforwardly usable over the 185 Internet. 186 </p> 187 </li> 188 189 <li> 190 <p>XML shall support a wide variety of applications.</p> 191 </li> 192 193 <li> 194 <p>XML shall be compatible with SGML.</p> 195 </li> 196 197 <li> 198 <p>It shall be easy to write programs which process XML 199 documents. 200 </p> 201 </li> 202 203 <li> 204 <p>The number of optional features in XML is to be kept to the 205 absolute minimum, ideally zero. 206 </p> 207 </li> 208 209 <li> 210 <p>XML documents should be human-legible and reasonably 211 clear. 212 </p> 213 </li> 214 215 <li> 216 <p>The XML design should be prepared quickly.</p> 217 </li> 218 219 <li> 220 <p>The design of XML shall be formal and concise.</p> 221 </li> 222 223 <li> 224 <p>XML documents shall be easy to create.</p> 225 </li> 226 227 <li> 228 <p>Terseness in XML markup is of minimal importance.</p> 229 </li> 230 </ol> 231 232 </p> 233 234 <p>This specification, 235 together with associated standards 236 (Unicode and ISO/IEC 10646 for characters, 237 Internet RFC 1766 for language identification tags, 238 ISO 639 for language name codes, and 239 ISO 3166 for country name codes), 240 provides all the information necessary to understand 241 XML Version 1.0 242 and construct computer programs to process it. 243 </p> 244 245 <p>This version of the XML specification 246 247 may be distributed freely, as long as 248 all text and legal notices remain intact. 249 </p> 250 251 252 253 254 255 256 257 258 <h3><a name="sec-terminology"></a>1.2 Terminology 259 </h3> 260 261 262 <p>The terminology used to describe XML documents is defined in the body of 263 this specification. 
264 The terms defined in the following list are used in building those 265 definitions and in describing the actions of an XML processor: 266 267 <dl> 268 269 270 <dt><b>may</b></dt> 271 272 <dd> 273 <p><a name="dt-may"></a>Conforming documents and XML 274 processors are permitted to but need not behave as 275 described. 276 </p> 277 </dd> 278 279 280 281 <dt><b>must</b></dt> 282 283 <dd> 284 <p>Conforming documents and XML processors 285 are required to behave as described; otherwise they are in error. 286 287 288 </p> 289 </dd> 290 291 292 293 <dt><b>error</b></dt> 294 295 <dd> 296 <p><a name="dt-error"></a>A violation of the rules of this 297 specification; results are 298 undefined. Conforming software may detect and report an error and may 299 recover from it. 300 </p> 301 </dd> 302 303 304 305 <dt><b>fatal error</b></dt> 306 307 <dd> 308 <p><a name="dt-fatal"></a>An error 309 which a conforming <a href="#dt-xml-proc">XML processor</a> 310 must detect and report to the application. 311 After encountering a fatal error, the 312 processor may continue 313 processing the data to search for further errors and may report such 314 errors to the application. In order to support correction of errors, 315 the processor may make unprocessed data from the document (with 316 intermingled character data and markup) available to the application. 317 Once a fatal error is detected, however, the processor must not 318 continue normal processing (i.e., it must not 319 continue to pass character data and information about the document's 320 logical structure to the application in the normal way). 321 322 </p> 323 </dd> 324 325 326 327 <dt><b>at user option</b></dt> 328 329 <dd> 330 <p>Conforming software may or must (depending on the modal verb in the 331 sentence) behave as described; if it does, it must 332 provide users a means to enable or disable the behavior 333 described. 
334 </p> 335 </dd> 336 337 338 339 <dt><b>validity constraint</b></dt> 340 341 <dd> 342 <p>A rule which applies to all 343 <a href="#dt-valid">valid</a> XML documents. 344 Violations of validity constraints are errors; they must, at user option, 345 be reported by 346 <a href="#dt-validating">validating XML processors</a>. 347 </p> 348 </dd> 349 350 351 352 <dt><b>well-formedness constraint</b></dt> 353 354 <dd> 355 <p>A rule which applies to all <a href="#dt-wellformed">well-formed</a> XML documents. 356 Violations of well-formedness constraints are 357 <a href="#dt-fatal">fatal errors</a>. 358 </p> 359 </dd> 360 361 362 363 364 <dt><b>match</b></dt> 365 366 <dd> 367 <p><a name="dt-match"></a>(Of strings or names:) 368 Two strings or names being compared must be identical. 369 Characters with multiple possible representations in ISO/IEC 10646 (e.g. 370 characters with 371 both precomposed and base+diacritic forms) match only if they have the 372 same representation in both strings. 373 At user option, processors may normalize such characters to 374 some canonical form. 375 No case folding is performed. 376 (Of strings and rules in the grammar:) 377 A string matches a grammatical production if it belongs to the 378 language generated by that production. 379 (Of content and content models:) 380 An element matches its declaration when it conforms 381 in the fashion described in the constraint 382 <a href="#elementvalid">[<b>3 Element Valid</b>] 383 </a>. 384 385 386 </p> 387 </dd> 388 389 390 391 <dt><b>for compatibility</b></dt> 392 393 <dd> 394 <p><a name="dt-compat"></a>A feature of 395 XML included solely to ensure that XML remains compatible with SGML. 
396 397 </p> 398 </dd> 399 400 401 402 <dt><b>for interoperability</b></dt> 403 404 <dd> 405 <p><a name="dt-interop"></a>A 406 non-binding recommendation included to increase the chances that XML 407 documents can be processed by the existing installed base of SGML 408 processors which predate the 409 WebSGML Adaptations Annex to ISO 8879. 410 </p> 411 </dd> 412 413 414 </dl> 415 416 </p> 417 418 419 420 421 422 423 424 425 <h2><a name="sec-documents"></a>2 Documents 426 </h2> 427 428 429 <p><a name="dt-xml-doc"></a> 430 A data object is an 431 <b>XML document</b> if it is 432 <a href="#dt-wellformed">well-formed</a>, as 433 defined in this specification. 434 A well-formed XML document may in addition be 435 <a href="#dt-valid">valid</a> if it meets certain further 436 constraints. 437 </p> 438 439 440 <p>Each XML document has both a logical and a physical structure. 441 Physically, the document is composed of units called <a href="#dt-entity">entities</a>. An entity may <a href="#dt-entref">refer</a> to other entities to cause their 442 inclusion in the document. A document begins in a "root" or <a href="#dt-docent">document entity</a>. 443 Logically, the document is composed of declarations, elements, 444 comments, 445 character references, and 446 processing 447 instructions, all of which are indicated in the document by explicit 448 markup. 449 The logical and physical structures must nest properly, as described 450 in <a href="#wf-entities">[<b>4.3.2 Well-Formed Parsed Entities</b>] 451 </a>. 452 453 </p> 454 455 456 457 <h3><a name="sec-well-formed"></a>2.1 Well-Formed XML Documents 458 </h3> 459 460 461 <p><a name="dt-wellformed"></a> 462 A textual object is 463 a well-formed XML document if: 464 465 <ol> 466 467 <li> 468 <p>Taken as a whole, it 469 matches the production labeled <a href="#NT-document">document</a>. 470 </p> 471 </li> 472 473 <li> 474 <p>It 475 meets all the well-formedness constraints given in this specification. 
476 </p> 477 478 </li> 479 480 <li> 481 <p>Each of the <a href="#dt-parsedent">parsed entities</a> 482 which is referenced directly or indirectly within the document is 483 <a href="#wf-entities">well-formed</a>. 484 </p> 485 </li> 486 487 </ol> 488 </p> 489 490 <p> 491 492 <h5>Document</h5> 493 <table class="scrap"> 494 <tbody> 495 <tr valign="baseline"> 496 <td><a name="NT-document"></a>[1] 497 </td> 498 <td>document</td> 499 <td> ::= </td> 500 <td><a href="#NT-prolog">prolog</a> 501 <a href="#NT-element">element</a> 502 <a href="#NT-Misc">Misc</a>* 503 </td> 504 <td></td> 505 </tr> 506 </tbody> 507 </table> 508 509 </p> 510 511 <p>Matching the <a href="#NT-document">document</a> production 512 implies that: 513 514 <ol> 515 516 <li> 517 <p>It contains one or more 518 <a href="#dt-element">elements</a>. 519 </p> 520 521 </li> 522 523 524 <li> 525 <p><a name="dt-root"></a>There is exactly 526 one element, called the <b>root</b>, or document element, no 527 part of which appears in the <a href="#dt-content">content</a> of any other element. 528 For all other elements, if the start-tag is in the content of another 529 element, the end-tag is in the content of the same element. More 530 simply stated, the elements, delimited by start- and end-tags, nest 531 properly within each other. 532 533 </p> 534 </li> 535 536 </ol> 537 538 </p> 539 540 <p><a name="dt-parentchild"></a>As a consequence 541 of this, 542 for each non-root element 543 <code>C</code> in the document, there is one other element <code>P</code> 544 in the document such that 545 <code>C</code> is in the content of <code>P</code>, but is not in 546 the content of any other element that is in the content of 547 <code>P</code>. 548 <code>P</code> is referred to as the 549 <b>parent</b> of <code>C</code>, and <code>C</code> as a 550 <b>child</b> of <code>P</code>. 
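For illustration only, consider the following small document; the element names are invented for this sketch and are not defined by this specification:
<pre>&lt;memo&gt;
&lt;to&gt;Reader&lt;/to&gt;
&lt;body&gt;&lt;para&gt;Hello.&lt;/para&gt;&lt;/body&gt;
&lt;/memo&gt;</pre>
Here <code>memo</code> is the root. It is the parent of <code>to</code> and of <code>body</code>, and <code>body</code> is the parent of <code>para</code>. Although <code>para</code> is in the content of <code>memo</code>, it is not a child of <code>memo</code>, because it is also in the content of <code>body</code>, which is itself in the content of <code>memo</code>.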
551 </p> 552 553 554 555 <h3><a name="charsets"></a>2.2 Characters 556 </h3> 557 558 559 <p><a name="dt-text"></a>A parsed entity contains 560 <b>text</b>, a sequence of 561 <a href="#dt-character">characters</a>, 562 which may represent markup or character data. 563 <a name="dt-character"></a>A <b>character</b> 564 is an atomic unit of text as specified by 565 ISO/IEC 10646 <a href="#ISO10646">[ISO/IEC 10646]</a>. 566 Legal characters are tab, carriage return, line feed, and the legal 567 graphic characters of Unicode and ISO/IEC 10646. 568 The use of "compatibility characters", as defined in section 6.8 569 of <a href="#Unicode">[Unicode]</a>, is discouraged. 570 571 572 <h5>Character Range</h5> 573 <table class="scrap"> 574 <tbody> 575 576 <tr valign="baseline"> 577 <td><a name="NT-Char"></a>[2] 578 </td> 579 <td>Char</td> 580 <td> ::= </td> 581 <td>#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] 582 | [#x10000-#x10FFFF] 583 </td> 584 <td>/*any Unicode character, excluding the 585 surrogate blocks, FFFE, and FFFF.*/ 586 </td> 587 </tr> 588 589 </tbody> 590 </table> 591 592 </p> 593 594 595 <p>The mechanism for encoding character code points into bit patterns may 596 vary from entity to entity. All XML processors must accept the UTF-8 597 and UTF-16 encodings of 10646; the mechanisms for signaling which of 598 the two is in use, or for bringing other encodings into play, are 599 discussed later, in <a href="#charencoding">[<b>4.3.3 Character Encoding in Entities</b>] 600 </a>. 601 602 </p> 603 604 605 606 607 608 <h3><a name="sec-common-syn"></a>2.3 Common Syntactic Constructs 609 </h3> 610 611 612 <p>This section defines some symbols used widely in the grammar.</p> 613 614 <p><a href="#NT-S">S</a> (white space) consists of one or more space (#x20) 615 characters, carriage returns, line feeds, or tabs. 
616 617 618 <h5>White Space</h5> 619 <table class="scrap"> 620 <tbody> 621 622 <tr valign="baseline"> 623 <td><a name="NT-S"></a>[3] 624 </td> 625 <td>S</td> 626 <td> ::= </td> 627 <td>(#x20 | #x9 | #xD | #xA)+</td> 628 <td></td> 629 </tr> 630 631 </tbody> 632 </table> 633 </p> 634 635 <p>Characters are classified for convenience as letters, digits, or other 636 characters. Letters consist of an alphabetic or syllabic 637 base character possibly 638 followed by one or more combining characters, or of an ideographic 639 character. 640 Full definitions of the specific characters in each class 641 are given in <a href="#CharClasses">[<b>B Character Classes</b>] 642 </a>. 643 </p> 644 645 <p><a name="dt-name"></a>A <b>Name</b> is a token 646 beginning with a letter or one of a few punctuation characters, and continuing 647 with letters, digits, hyphens, underscores, colons, or full stops, together 648 known as name characters. 649 Names beginning with the string "<code>xml</code>", or any string 650 which would match <code>(('X'|'x') ('M'|'m') ('L'|'l'))</code>, are 651 reserved for standardization in this or future versions of this 652 specification. 653 654 </p> 655 656 <blockquote><b>NOTE: </b> 657 The colon character within XML names is reserved for experimentation with 658 name spaces. 659 Its meaning is expected to be 660 standardized at some future point, at which point those documents 661 using the colon for experimental purposes may need to be updated. 662 (There is no guarantee that any name-space mechanism 663 adopted for XML will in fact use the colon as a name-space delimiter.) 664 In practice, this means that authors should not use the colon in XML 665 names except as part of name-space experiments, but that XML processors 666 should accept the colon as a name character. 667 668 </blockquote> 669 670 <p>An 671 <a href="#NT-Nmtoken">Nmtoken</a> (name token) is any mixture of 672 name characters. 
673 674 <h5>Names and Tokens</h5> 675 <table class="scrap"> 676 <tbody> 677 <tr valign="baseline"> 678 <td><a name="NT-NameChar"></a>[4] 679 </td> 680 <td>NameChar</td> 681 <td> ::= </td> 682 <td><a href="#NT-Letter">Letter</a> 683 | <a href="#NT-Digit">Digit</a> 684 | '.' | '-' | '_' | ':' 685 | <a href="#NT-CombiningChar">CombiningChar</a> 686 | <a href="#NT-Extender">Extender</a></td> 687 <td></td> 688 </tr> 689 <tr valign="baseline"> 690 <td><a name="NT-Name"></a>[5] 691 </td> 692 <td>Name</td> 693 <td> ::= </td> 694 <td>(<a href="#NT-Letter">Letter</a> | '_' | ':') 695 (<a href="#NT-NameChar">NameChar</a>)* 696 </td> 697 <td></td> 698 </tr> 699 <tr valign="baseline"> 700 <td><a name="NT-Names"></a>[6] 701 </td> 702 <td>Names</td> 703 <td> ::= </td> 704 <td><a href="#NT-Name">Name</a> 705 (<a href="#NT-S">S</a> <a href="#NT-Name">Name</a>)* 706 </td> 707 <td></td> 708 </tr> 709 <tr valign="baseline"> 710 <td><a name="NT-Nmtoken"></a>[7] 711 </td> 712 <td>Nmtoken</td> 713 <td> ::= </td> 714 <td>(<a href="#NT-NameChar">NameChar</a>)+ 715 </td> 716 <td></td> 717 </tr> 718 <tr valign="baseline"> 719 <td><a name="NT-Nmtokens"></a>[8] 720 </td> 721 <td>Nmtokens</td> 722 <td> ::= </td> 723 <td><a href="#NT-Nmtoken">Nmtoken</a> (<a href="#NT-S">S</a> <a href="#NT-Nmtoken">Nmtoken</a>)* 724 </td> 725 <td></td> 726 </tr> 727 </tbody> 728 </table> 729 730 </p> 731 732 <p>Literal data is any quoted string not containing 733 the quotation mark used as a delimiter for that string. 734 Literals are used 735 for specifying the content of internal entities 736 (<a href="#NT-EntityValue">EntityValue</a>), 737 the values of attributes (<a href="#NT-AttValue">AttValue</a>), 738 and external identifiers 739 (<a href="#NT-SystemLiteral">SystemLiteral</a>). 740 Note that a <a href="#NT-SystemLiteral">SystemLiteral</a> 741 can be parsed without scanning for markup. 
742 743 <h5>Literals</h5> 744 <table class="scrap"> 745 <tbody> 746 <tr valign="baseline"> 747 <td><a name="NT-EntityValue"></a>[9] 748 </td> 749 <td>EntityValue</td> 750 <td> ::= </td> 751 <td>'"' 752 ([^%&"] 753 | <a href="#NT-PEReference">PEReference</a> 754 | <a href="#NT-Reference">Reference</a>)* 755 '"' 756 757 </td> 758 <td></td> 759 </tr> 760 <tr valign="baseline"> 761 <td></td> 762 <td></td> 763 <td></td> 764 <td>| 765 "'" 766 ([^%&'] 767 | <a href="#NT-PEReference">PEReference</a> 768 | <a href="#NT-Reference">Reference</a>)* 769 "'" 770 </td> 771 <td></td> 772 </tr> 773 <tr valign="baseline"> 774 <td><a name="NT-AttValue"></a>[10] 775 </td> 776 <td>AttValue</td> 777 <td> ::= </td> 778 <td>'"' 779 ([^<&"] 780 | <a href="#NT-Reference">Reference</a>)* 781 '"' 782 783 </td> 784 <td></td> 785 </tr> 786 <tr valign="baseline"> 787 <td></td> 788 <td></td> 789 <td></td> 790 <td>| 791 "'" 792 ([^<&'] 793 | <a href="#NT-Reference">Reference</a>)* 794 "'" 795 </td> 796 <td></td> 797 </tr> 798 <tr valign="baseline"> 799 <td><a name="NT-SystemLiteral"></a>[11] 800 </td> 801 <td>SystemLiteral</td> 802 <td> ::= </td> 803 <td>('"' [^"]* '"') | ("'" [^']* "'") 804 805 </td> 806 <td></td> 807 </tr> 808 <tr valign="baseline"> 809 <td><a name="NT-PubidLiteral"></a>[12] 810 </td> 811 <td>PubidLiteral</td> 812 <td> ::= </td> 813 <td>'"' <a href="#NT-PubidChar">PubidChar</a>* 814 '"' 815 | "'" (<a href="#NT-PubidChar">PubidChar</a> - "'")* "'" 816 </td> 817 <td></td> 818 </tr> 819 <tr valign="baseline"> 820 <td><a name="NT-PubidChar"></a>[13] 821 </td> 822 <td>PubidChar</td> 823 <td> ::= </td> 824 <td>#x20 | #xD | #xA 825 | [a-zA-Z0-9] 826 | [-'()+,./:=?;!*#@$_%] 827 </td> 828 <td></td> 829 </tr> 830 </tbody> 831 </table> 832 833 </p> 834 835 836 837 838 839 <h3><a name="syntax"></a>2.4 Character Data and Markup 840 </h3> 841 842 843 <p><a href="#dt-text">Text</a> consists of intermingled 844 <a href="#dt-chardata">character 845 data 846 </a> and markup. 
847 <a name="dt-markup"></a><b>Markup</b> takes the form of 848 <a href="#dt-stag">start-tags</a>, 849 <a href="#dt-etag">end-tags</a>, 850 <a href="#dt-empty">empty-element tags</a>, 851 <a href="#dt-entref">entity references</a>, 852 <a href="#dt-charref">character references</a>, 853 <a href="#dt-comment">comments</a>, 854 <a href="#dt-cdsection">CDATA section</a> delimiters, 855 <a href="#dt-doctype">document type declarations</a>, and 856 <a href="#dt-pi">processing instructions</a>. 857 858 859 </p> 860 861 <p><a name="dt-chardata"></a>All text that is not markup 862 constitutes the <b>character data</b> of 863 the document. 864 </p> 865 866 <p>The ampersand character (&) and the left angle bracket (<) 867 may appear in their literal form <i>only</i> when used as markup 868 delimiters, or within a <a href="#dt-comment">comment</a>, a 869 <a href="#dt-pi">processing instruction</a>, 870 or a <a href="#dt-cdsection">CDATA section</a>. 871 872 They are also legal within the <a href="#dt-litentval">literal entity 873 value 874 </a> of an internal entity declaration; see 875 <a href="#wf-entities">[<b>4.3.2 Well-Formed Parsed Entities</b>] 876 </a>. 877 878 If they are needed elsewhere, 879 they must be <a href="#dt-escape">escaped</a> 880 using either <a href="#dt-charref">numeric character references</a> 881 or the strings 882 "<code>&amp;</code>" and "<code>&lt;</code>" respectively. 883 The right angle 884 bracket (>) may be represented using the string 885 "<code>&gt;</code>", and must, <a href="#dt-compat">for 886 compatibility 887 </a>, 888 be escaped using 889 "<code>&gt;</code>" or a character reference 890 when it appears in the string 891 "<code>]]></code>" 892 in content, 893 when that string is not marking the end of 894 a <a href="#dt-cdsection">CDATA section</a>. 895 896 </p> 897 898 <p> 899 In the content of elements, character data 900 is any string of characters which does 901 not contain the start-delimiter of any markup. 
902 In a CDATA section, character data 903 is any string of characters not including the CDATA-section-close 904 delimiter, "<code>]]></code>". 905 </p> 906 907 <p> 908 To allow attribute values to contain both single and double quotes, the 909 apostrophe or single-quote character (') may be represented as 910 "<code>&apos;</code>", and the double-quote character (") as 911 "<code>&quot;</code>". 912 913 <h5>Character Data</h5> 914 <table class="scrap"> 915 <tbody> 916 <tr valign="baseline"> 917 <td><a name="NT-CharData"></a>[14] 918 </td> 919 <td>CharData</td> 920 <td> ::= </td> 921 <td>[^<&]* - ([^<&]* ']]>' [^<&]*)</td> 922 <td></td> 923 </tr> 924 </tbody> 925 </table> 926 927 </p> 928 929 930 931 932 <h3><a name="sec-comments"></a>2.5 Comments 933 </h3> 934 935 936 <p><a name="dt-comment"></a><b>Comments</b> may 937 appear anywhere in a document outside other 938 <a href="#dt-markup">markup</a>; in addition, 939 they may appear within the document type declaration 940 at places allowed by the grammar. 941 They are not part of the document's <a href="#dt-chardata">character 942 data 943 </a>; an XML 944 processor may, but need not, make it possible for an application to 945 retrieve the text of comments. 946 <a href="#dt-compat">For compatibility</a>, the string 947 "<code>--</code>" (double-hyphen) must not occur within 948 comments. 

<h5>Comments</h5>
<table class="scrap">
<tbody>
<tr valign="baseline">
<td><a name="NT-Comment"></a>[15]
</td>
<td>Comment</td>
<td> ::= </td>
<td>'&lt;!--'
((<a href="#NT-Char">Char</a> - '-')
| ('-' (<a href="#NT-Char">Char</a> - '-')))*
'--&gt;'
</td>
<td></td>
</tr>
</tbody>
</table>

</p>

<p>An example of a comment:
<pre>&lt;!-- declarations for &lt;head&gt; &amp; &lt;body&gt; --&gt;</pre>
</p>




<h3><a name="sec-pi"></a>2.6 Processing Instructions
</h3>


<p><a name="dt-pi"></a><b>Processing
instructions
</b> (PIs) allow documents to contain instructions
for applications.


<h5>Processing Instructions</h5>
<table class="scrap">
<tbody>
<tr valign="baseline">
<td><a name="NT-PI"></a>[16]
</td>
<td>PI</td>
<td> ::= </td>
<td>'&lt;?' <a href="#NT-PITarget">PITarget</a>
(<a href="#NT-S">S</a>
(<a href="#NT-Char">Char</a>* -
(<a href="#NT-Char">Char</a>* '?&gt;' <a href="#NT-Char">Char</a>*)))?
'?&gt;'
</td>
<td></td>
</tr>
<tr valign="baseline">
<td><a name="NT-PITarget"></a>[17]
</td>
<td>PITarget</td>
<td> ::= </td>
<td><a href="#NT-Name">Name</a> -
(('X' | 'x') ('M' | 'm') ('L' | 'l'))
</td>
<td></td>
</tr>
</tbody>
</table>
PIs are not part of the document's <a href="#dt-chardata">character
data
</a>, but must be passed through to the application. The
PI begins with a target (<a href="#NT-PITarget">PITarget</a>) used
to identify the application to which the instruction is directed.
The target names "<code>XML</code>", "<code>xml</code>", and so on are
reserved for standardization in this or future versions of this
specification.
The
XML <a href="#dt-notation">Notation</a> mechanism
may be used for
formal declaration of PI targets.
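As an illustrative sketch, a processing instruction addressed to some application might look like the following; the target name <code>render-hint</code> and its contents are hypothetical and are not defined by this specification:
<pre>&lt;?render-hint columns="2"?&gt;</pre>
Here <code>render-hint</code> matches <a href="#NT-PITarget">PITarget</a>, and the text after the white space, up to the closing <code>?&gt;</code>, is the instruction intended for that application.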
1027 1028 </p> 1029 1030 1031 1032 1033 <h3><a name="sec-cdata-sect"></a>2.7 CDATA Sections 1034 </h3> 1035 1036 1037 <p><a name="dt-cdsection"></a><b>CDATA sections</b> 1038 may occur 1039 anywhere character data may occur; they are 1040 used to escape blocks of text containing characters which would 1041 otherwise be recognized as markup. CDATA sections begin with the 1042 string "<code><![CDATA[</code>" and end with the string 1043 "<code>]]></code>": 1044 1045 <h5>CDATA Sections</h5> 1046 <table class="scrap"> 1047 <tbody> 1048 <tr valign="baseline"> 1049 <td><a name="NT-CDSect"></a>[18] 1050 </td> 1051 <td>CDSect</td> 1052 <td> ::= </td> 1053 <td><a href="#NT-CDStart">CDStart</a> 1054 <a href="#NT-CData">CData</a> 1055 <a href="#NT-CDEnd">CDEnd</a></td> 1056 <td></td> 1057 </tr> 1058 <tr valign="baseline"> 1059 <td><a name="NT-CDStart"></a>[19] 1060 </td> 1061 <td>CDStart</td> 1062 <td> ::= </td> 1063 <td>'<![CDATA['</td> 1064 <td></td> 1065 </tr> 1066 <tr valign="baseline"> 1067 <td><a name="NT-CData"></a>[20] 1068 </td> 1069 <td>CData</td> 1070 <td> ::= </td> 1071 <td>(<a href="#NT-Char">Char</a>* - 1072 (<a href="#NT-Char">Char</a>* ']]>' <a href="#NT-Char">Char</a>*)) 1073 1074 </td> 1075 <td></td> 1076 </tr> 1077 <tr valign="baseline"> 1078 <td><a name="NT-CDEnd"></a>[21] 1079 </td> 1080 <td>CDEnd</td> 1081 <td> ::= </td> 1082 <td>']]>'</td> 1083 <td></td> 1084 </tr> 1085 </tbody> 1086 </table> 1087 1088 Within a CDATA section, only the <a href="#NT-CDEnd">CDEnd</a> string is 1089 recognized as markup, so that left angle brackets and ampersands may occur in 1090 their literal form; they need not (and cannot) be escaped using 1091 "<code>&lt;</code>" and "<code>&amp;</code>". CDATA sections 1092 cannot nest. 
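Because a CDATA section cannot contain "<code>]]></code>", text that includes this string has to be split across more than one section. The sketch below shows one common way of doing so in Python; the helper name <code>to_cdata</code> and the sample text are assumptions made for illustration, and <code>xml.etree.ElementTree</code> is used only to check the round trip.

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    """Wrap text in CDATA sections, splitting every "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Sample text containing "<", "&" and the CDEnd string "]]>".
original = "if (a[b[0]]> 1 && c < d) { ... }"

root = ET.fromstring("<doc>" + to_cdata(original) + "</doc>")

# The parser reports the section contents as character data, not markup.
print(root.text == original, len(root))
```

The first section ends just before the "<code>></code>" of the original "<code>]]></code>", and the second section starts with that "<code>></code>", so neither section contains the CDEnd string.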
1093 1094 </p> 1095 1096 1097 <p>An example of a CDATA section, in which "<code><greeting></code>" and 1098 "<code></greeting></code>" 1099 are recognized as <a href="#dt-chardata">character data</a>, not 1100 <a href="#dt-markup">markup</a>: 1101 <pre><![CDATA[<greeting>Hello, world!</greeting>]]></pre> 1102 </p> 1103 1104 1105 1106 1107 <h3><a name="sec-prolog-dtd"></a>2.8 Prolog and Document Type Declaration 1108 </h3> 1109 1110 1111 <p><a name="dt-xmldecl"></a>XML documents 1112 may, and should, 1113 begin with an <b>XML declaration</b> which specifies 1114 the version of 1115 XML being used. 1116 For example, the following is a complete XML document, <a href="#dt-wellformed">well-formed</a> but not 1117 <a href="#dt-valid">valid</a>: 1118 <pre><?xml version="1.0"?> 1119<greeting>Hello, world!</greeting> 1120</pre> 1121 and so is this: 1122 <pre><greeting>Hello, world!</greeting> 1123</pre> 1124 </p> 1125 1126 1127 <p>The version number "<code>1.0</code>" should be used to indicate 1128 conformance to this version of this specification; it is an error 1129 for a document to use the value "<code>1.0</code>" 1130 if it does not conform to this version of this specification. 1131 It is the intent 1132 of the XML working group to give later versions of this specification 1133 numbers other than "<code>1.0</code>", but this intent does not 1134 indicate a 1135 commitment to produce any future versions of XML, nor if any are produced, to 1136 use any particular numbering scheme. 1137 Since future versions are not ruled out, this construct is provided 1138 as a means to allow the possibility of automatic version recognition, should 1139 it become necessary. 1140 Processors may signal an error if they receive documents labeled with 1141 versions they do not support. 1142 1143 </p> 1144 1145 <p>The function of the markup in an XML document is to describe its 1146 storage and logical structure and to associate attribute-value pairs 1147 with its logical structures. 
XML provides a mechanism, the <a href="#dt-doctype">document type declaration</a>, to define 1148 constraints on the logical structure and to support the use of 1149 predefined storage units. 1150 1151 <a name="dt-valid"></a>An XML document is 1152 <b>valid</b> if it has an associated document type 1153 declaration and if the document 1154 complies with the constraints expressed in it. 1155 </p> 1156 1157 <p>The document type declaration must appear before 1158 the first <a href="#dt-element">element</a> in the document. 1159 1160 <h5>Prolog</h5> 1161 <table class="scrap"> 1162 <tbody> 1163 1164 <tr valign="baseline"> 1165 <td><a name="NT-prolog"></a>[22] 1166 </td> 1167 <td>prolog</td> 1168 <td> ::= </td> 1169 <td><a href="#NT-XMLDecl">XMLDecl</a>? 1170 <a href="#NT-Misc">Misc</a>* 1171 (<a href="#NT-doctypedecl">doctypedecl</a> 1172 <a href="#NT-Misc">Misc</a>*)? 1173 </td> 1174 <td></td> 1175 </tr> 1176 1177 <tr valign="baseline"> 1178 <td><a name="NT-XMLDecl"></a>[23] 1179 </td> 1180 <td>XMLDecl</td> 1181 <td> ::= </td> 1182 <td>'<?xml' 1183 <a href="#NT-VersionInfo">VersionInfo</a> 1184 <a href="#NT-EncodingDecl">EncodingDecl</a>? 1185 <a href="#NT-SDDecl">SDDecl</a>? 1186 <a href="#NT-S">S</a>? 1187 '?>' 1188 </td> 1189 <td></td> 1190 </tr> 1191 1192 <tr valign="baseline"> 1193 <td><a name="NT-VersionInfo"></a>[24] 1194 </td> 1195 <td>VersionInfo</td> 1196 <td> ::= </td> 1197 <td><a href="#NT-S">S</a> 'version' <a href="#NT-Eq">Eq</a> 1198 (' <a href="#NT-VersionNum">VersionNum</a> ' 1199 | " <a href="#NT-VersionNum">VersionNum</a> ") 1200 </td> 1201 <td></td> 1202 </tr> 1203 1204 <tr valign="baseline"> 1205 <td><a name="NT-Eq"></a>[25] 1206 </td> 1207 <td>Eq</td> 1208 <td> ::= </td> 1209 <td><a href="#NT-S">S</a>? '=' <a href="#NT-S">S</a>? 
1210 </td> 1211 <td></td> 1212 </tr> 1213 1214 <tr valign="baseline"> 1215 <td><a name="NT-VersionNum"></a>[26] 1216 </td> 1217 <td>VersionNum</td> 1218 <td> ::= </td> 1219 <td>([a-zA-Z0-9_.:] | '-')+</td> 1220 <td></td> 1221 </tr> 1222 1223 <tr valign="baseline"> 1224 <td><a name="NT-Misc"></a>[27] 1225 </td> 1226 <td>Misc</td> 1227 <td> ::= </td> 1228 <td><a href="#NT-Comment">Comment</a> | <a href="#NT-PI">PI</a> | 1229 <a href="#NT-S">S</a></td> 1230 <td></td> 1231 </tr> 1232 1233 </tbody> 1234 </table> 1235 </p> 1236 1237 1238 <p><a name="dt-doctype"></a>The XML 1239 <b>document type declaration</b> 1240 contains or points to 1241 <a href="#dt-markupdecl">markup declarations</a> 1242 that provide a grammar for a 1243 class of documents. 1244 This grammar is known as a document type definition, 1245 or <b>DTD</b>. 1246 The document type declaration can point to an external subset (a 1247 special kind of 1248 <a href="#dt-extent">external entity</a>) containing markup 1249 declarations, or can 1250 contain the markup declarations directly in an internal subset, or can do 1251 both. 1252 The DTD for a document consists of both subsets taken 1253 together. 1254 1255 </p> 1256 1257 <p><a name="dt-markupdecl"></a> 1258 A <b>markup declaration</b> is 1259 an <a href="#dt-eldecl">element type declaration</a>, 1260 an <a href="#dt-attdecl">attribute-list declaration</a>, 1261 an <a href="#dt-entdecl">entity declaration</a>, or 1262 a <a href="#dt-notdecl">notation declaration</a>. 1263 1264 These declarations may be contained in whole or in part 1265 within <a href="#dt-PE">parameter entities</a>, 1266 as described in the well-formedness and validity constraints below. 1267 For fuller information, see 1268 <a href="#sec-physical-struct">[<b>4 Physical Structures</b>] 1269 </a>. 
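The four kinds of markup declaration can be observed with a non-validating parser. The following is a minimal sketch using Python's standard <code>xml.parsers.expat</code> module; the attribute <code>tone</code>, the entity <code>who</code>, and the notation <code>plain</code> are invented for the example.

```python
import xml.parsers.expat

# Hypothetical document with one declaration of each kind in its internal subset.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ATTLIST greeting tone CDATA "warm">
  <!ENTITY who "world">
  <!NOTATION plain SYSTEM "text/plain">
]>
<greeting>Hello, world!</greeting>
"""

def list_markup_declarations(xml_text):
    """Return (kind, name) pairs for the declarations, in document order."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = (
        lambda name, model: found.append(("element", name)))
    parser.AttlistDeclHandler = (
        lambda elname, attname, type_, default, required:
            found.append(("attlist", elname)))
    parser.EntityDeclHandler = (
        lambda name, is_pe, value, base, system_id, public_id, notation:
            found.append(("entity", name)))
    parser.NotationDeclHandler = (
        lambda name, base, system_id, public_id:
            found.append(("notation", name)))
    parser.Parse(xml_text, True)
    return found

declarations = list_markup_declarations(SOURCE)
print(declarations)
```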
1270 </p> 1271 1272 <h5>Document Type Definition</h5> 1273 <table class="scrap"> 1274 <tbody> 1275 1276 <tr valign="baseline"> 1277 <td><a name="NT-doctypedecl"></a>[28] 1278 </td> 1279 <td>doctypedecl</td> 1280 <td> ::= </td> 1281 <td>'<!DOCTYPE' <a href="#NT-S">S</a> 1282 <a href="#NT-Name">Name</a> (<a href="#NT-S">S</a> 1283 <a href="#NT-ExternalID">ExternalID</a>)? 1284 <a href="#NT-S">S</a>? ('[' 1285 (<a href="#NT-markupdecl">markupdecl</a> 1286 | <a href="#NT-PEReference">PEReference</a> 1287 | <a href="#NT-S">S</a>)* 1288 ']' 1289 <a href="#NT-S">S</a>?)? '>' 1290 </td> 1291 <td>[ VC: <a href="#vc-roottype">Root Element Type</a> ] 1292 </td> 1293 </tr> 1294 1295 <tr valign="baseline"> 1296 <td><a name="NT-markupdecl"></a>[29] 1297 </td> 1298 <td>markupdecl</td> 1299 <td> ::= </td> 1300 <td><a href="#NT-elementdecl">elementdecl</a> 1301 | <a href="#NT-AttlistDecl">AttlistDecl</a> 1302 | <a href="#NT-EntityDecl">EntityDecl</a> 1303 | <a href="#NT-NotationDecl">NotationDecl</a> 1304 | <a href="#NT-PI">PI</a> 1305 | <a href="#NT-Comment">Comment</a> 1306 1307 </td> 1308 <td>[ VC: <a href="#vc-PEinMarkupDecl">Proper Declaration/PE Nesting</a> ] 1309 </td> 1310 </tr> 1311 <tr valign="baseline"> 1312 <td></td> 1313 <td></td> 1314 <td></td> 1315 <td></td> 1316 <td>[ WFC: <a href="#wfc-PEinInternalSubset">PEs in Internal Subset</a> ] 1317 </td> 1318 </tr> 1319 1320 1321 </tbody> 1322 </table> 1323 1324 1325 <p>The markup declarations may be made up in whole or in part of 1326 the <a href="#dt-repltext">replacement text</a> of 1327 <a href="#dt-PE">parameter entities</a>. 1328 The productions later in this specification for 1329 individual nonterminals (<a href="#NT-elementdecl">elementdecl</a>, 1330 <a href="#NT-AttlistDecl">AttlistDecl</a>, and so on) describe 1331 the declarations <i>after</i> all the parameter entities have been 1332 <a href="#dt-include">included</a>. 
</p>

<a name="vc-roottype"></a><p><b>Validity Constraint: Root Element Type</b></p>

<p>
The <a href="#NT-Name">Name</a> in the document type declaration must
match the element type of the <a href="#dt-root">root element</a>.
</p>

<a name="vc-PEinMarkupDecl"></a><p><b>Validity Constraint: Proper Declaration/PE Nesting</b></p>

<p>Parameter-entity
<a href="#dt-repltext">replacement text</a> must be properly nested
with markup declarations.
That is to say, if either the first character
or the last character of a markup
declaration (<a href="#NT-markupdecl">markupdecl</a> above)
is contained in the replacement text for a
<a href="#dt-PERef">parameter-entity reference</a>,
both must be contained in the same replacement text.
</p>

<a name="wfc-PEinInternalSubset"></a><p><b>Well Formedness Constraint: PEs in Internal Subset</b></p>

<p>In the internal DTD subset,
<a href="#dt-PERef">parameter-entity references</a>
can occur only where markup declarations can occur, not
within markup declarations. (This does not apply to
references that occur in
external parameter entities or to the external subset.)
</p>

<p>
Like the internal subset, the external subset and
any external parameter entities referred to in the DTD
must consist of a series of complete markup declarations of the types
allowed by the non-terminal symbol
<a href="#NT-markupdecl">markupdecl</a>, interspersed with white space
or <a href="#dt-PERef">parameter-entity references</a>.
1379 However, portions of the contents 1380 of the 1381 external subset or of external parameter entities may conditionally be ignored 1382 by using 1383 the <a href="#dt-cond-section">conditional section</a> 1384 construct; this is not allowed in the internal subset. 1385 1386 1387 <h5>External Subset</h5> 1388 <table class="scrap"> 1389 <tbody> 1390 1391 <tr valign="baseline"> 1392 <td><a name="NT-extSubset"></a>[30] 1393 </td> 1394 <td>extSubset</td> 1395 <td> ::= </td> 1396 <td><a href="#NT-TextDecl">TextDecl</a>? 1397 <a href="#NT-extSubsetDecl">extSubsetDecl</a></td> 1398 <td></td> 1399 </tr> 1400 1401 <tr valign="baseline"> 1402 <td><a name="NT-extSubsetDecl"></a>[31] 1403 </td> 1404 <td>extSubsetDecl</td> 1405 <td> ::= </td> 1406 <td>( 1407 <a href="#NT-markupdecl">markupdecl</a> 1408 | <a href="#NT-conditionalSect">conditionalSect</a> 1409 | <a href="#NT-PEReference">PEReference</a> 1410 | <a href="#NT-S">S</a> 1411 )* 1412 </td> 1413 <td></td> 1414 </tr> 1415 1416 </tbody> 1417 </table> 1418 </p> 1419 1420 <p>The external subset and external parameter entities also differ 1421 from the internal subset in that in them, 1422 <a href="#dt-PERef">parameter-entity references</a> 1423 are permitted <i>within</i> markup declarations, 1424 not only <i>between</i> markup declarations. 1425 </p> 1426 1427 <p>An example of an XML document with a document type declaration: 1428 <pre><?xml version="1.0"?> 1429<!DOCTYPE greeting SYSTEM "hello.dtd"> 1430<greeting>Hello, world!</greeting> 1431</pre> 1432 The <a href="#dt-sysid">system identifier</a> 1433 "<code>hello.dtd</code>" gives the URI of a DTD for the document. 
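The Root Element Type constraint can be checked without reading the DTD itself. The sketch below, which assumes Python's standard <code>xml.dom.minidom</code>, reads the name and system identifier from the document type declaration and compares the name with the root element's type. This processor is non-validating and does not fetch <code>hello.dtd</code>, so the check covers only this one constraint, not validity as a whole.

```python
from xml.dom import minidom

SOURCE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE greeting SYSTEM "hello.dtd">\n'
    '<greeting>Hello, world!</greeting>\n'
)

def doctype_summary(xml_text):
    """Report the doctype name, its system identifier, and whether the root matches."""
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype
    return {
        "name": doctype.name,
        "system_id": doctype.systemId,
        "root_matches": doctype.name == doc.documentElement.tagName,
    }

summary = doctype_summary(SOURCE)
print(summary)
```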
1434 </p> 1435 1436 <p>The declarations can also be given locally, as in this 1437 example: 1438 <pre><?xml version="1.0" encoding="UTF-8" ?> 1439<!DOCTYPE greeting [ 1440 <!ELEMENT greeting (#PCDATA)> 1441]> 1442<greeting>Hello, world!</greeting> 1443</pre> 1444 If both the external and internal subsets are used, the 1445 internal subset is considered to occur before the external subset. 1446 1447 This has the effect that entity and attribute-list declarations in the 1448 internal subset take precedence over those in the external subset. 1449 </p> 1450 1451 1452 1453 1454 <h3><a name="sec-rmd"></a>2.9 Standalone Document Declaration 1455 </h3> 1456 1457 <p>Markup declarations can affect the content of the document, 1458 as passed from an <a href="#dt-xml-proc">XML processor</a> 1459 to an application; examples are attribute defaults and entity 1460 declarations. 1461 The standalone document declaration, 1462 which may appear as a component of the XML declaration, signals 1463 whether or not there are such declarations which appear external to 1464 the <a href="#dt-docent">document entity</a>. 
1465 1466 <h5>Standalone Document Declaration</h5> 1467 <table class="scrap"> 1468 <tbody> 1469 1470 <tr valign="baseline"> 1471 <td><a name="NT-SDDecl"></a>[32] 1472 </td> 1473 <td>SDDecl</td> 1474 <td> ::= </td> 1475 <td> 1476 <a href="#NT-S">S</a> 1477 'standalone' <a href="#NT-Eq">Eq</a> 1478 (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"')) 1479 1480 </td> 1481 <td>[ VC: <a href="#vc-check-rmd">Standalone Document Declaration</a> ] 1482 </td> 1483 </tr> 1484 1485 </tbody> 1486 </table> 1487 </p> 1488 1489 <p> 1490 In a standalone document declaration, the value "<code>yes</code>" indicates 1491 that there 1492 are no markup declarations external to the <a href="#dt-docent">document 1493 entity 1494 </a> (either in the DTD external subset, or in an 1495 external parameter entity referenced from the internal subset) 1496 which affect the information passed from the XML processor to 1497 the application. 1498 The value "<code>no</code>" indicates that there are or may be such 1499 external markup declarations. 1500 Note that the standalone document declaration only 1501 denotes the presence of external <i>declarations</i>; the presence, in a 1502 document, of 1503 references to external <i>entities</i>, when those entities are 1504 internally declared, 1505 does not change its standalone status. 1506 </p> 1507 1508 <p>If there are no external markup declarations, the standalone document 1509 declaration has no meaning. 1510 If there are external markup declarations but there is no standalone 1511 document declaration, the value "<code>no</code>" is assumed. 1512 </p> 1513 1514 <p>Any XML document for which <code>standalone="no"</code> holds can 1515 be converted algorithmically to a standalone document, 1516 which may be desirable for some network delivery applications. 
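An application that wants to know what a document declares about itself can ask the processor for the contents of the XML declaration. The following is a minimal sketch using Python's standard <code>xml.parsers.expat</code> module, whose <code>XmlDeclHandler</code> reports the standalone value as 1, 0, or -1 (omitted); the function name <code>read_xml_declaration</code> is hypothetical.

```python
import xml.parsers.expat

def read_xml_declaration(xml_text):
    """Return the version and standalone value from the XML declaration, if present."""
    seen = {}

    def on_xml_decl(version, encoding, standalone):
        seen["version"] = version
        # expat passes 1 for "yes", 0 for "no", and -1 when the clause is omitted.
        seen["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_text, True)
    return seen

declared = read_xml_declaration(
    "<?xml version=\"1.0\" standalone='yes'?><greeting>Hello, world!</greeting>"
)
print(declared)
```

When the clause is omitted the sketch reports <code>None</code>, leaving it to the caller to apply the rule that "<code>no</code>" is assumed if there are external markup declarations.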
</p>
<a name="vc-check-rmd"></a><p><b>Validity Constraint: Standalone Document Declaration</b></p>

<p>The standalone document declaration must have
the value "<code>no</code>" if any external markup declarations
contain declarations of:
</p>
<ul>

<li>
<p>attributes with <a href="#dt-default">default</a> values, if
elements to which
these attributes apply appear in the document without
specifications of values for these attributes, or
</p>
</li>

<li>
<p>entities (other than <code>amp</code>,
<code>lt</code>,
<code>gt</code>,
<code>apos</code>,
<code>quot</code>),
if <a href="#dt-entref">references</a> to those
entities appear in the document, or
</p>
</li>

<li>
<p>attributes with values subject to
<a href="#AVNormalize">normalization</a>, where the
attribute appears in the document with a value which will
change as a result of normalization, or
</p>
</li>

<li>
<p>element types with <a href="#dt-elemcontent">element content</a>,
if white space occurs
directly within any instance of those types.
</p>
</li>

</ul>

<p>An example XML declaration with a standalone document declaration:<pre><?xml version="1.0" standalone='yes'?></pre></p>

<h3><a name="sec-white-space"></a>2.10 White Space Handling
</h3>

<p>In editing XML documents, it is often convenient to use "white space"
(spaces, tabs, and blank lines, denoted by the nonterminal
<a href="#NT-S">S</a> in this specification) to
set apart the markup for greater readability. Such white space is typically
not intended for inclusion in the delivered version of the document.
On the other hand, "significant" white space that should be preserved in the
delivered version is common, for example in poetry and
source code.
</p>

<p>An <a href="#dt-xml-proc">XML processor</a>
must always pass all characters in a document that are not
markup through to the application. A <a href="#dt-validating">validating XML processor</a> must also inform the application
which of these characters constitute white space appearing
in <a href="#dt-elemcontent">element content</a>.
</p>

<p>A special <a href="#dt-attr">attribute</a>
named xml:space may be attached to an element
to signal an intention that in that element,
white space should be preserved by applications.
In valid documents, this attribute, like any other, must be
<a href="#dt-attdecl">declared</a> if it is used.
When declared, it must be given as an
<a href="#dt-enumerated">enumerated type</a> whose only
possible values are "<code>default</code>" and "<code>preserve</code>".
For example:<pre> <!ATTLIST poem xml:space (default|preserve) 'preserve'></pre></p>

<p>The value "<code>default</code>" signals that applications'
default white-space processing modes are acceptable for this element; the
value "<code>preserve</code>" indicates the intent that applications preserve
all the white space.
This declared intent is considered to apply to all elements within the content
of the element where it is specified, unless overridden with another instance
of the xml:space attribute.
</p>

<p>The <a href="#dt-root">root element</a> of any document
is considered to have signaled no intentions as regards application space
handling, unless it provides a value for
this attribute or the attribute is declared with a default value.
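The scoping rule for xml:space can be sketched as a walk over the element tree that carries the value in effect down to each child. The example assumes Python's standard <code>xml.etree.ElementTree</code>, which reports attributes with the reserved <code>xml</code> prefix under the name shown in <code>XML_SPACE</code>; the element types <code>stanza</code>, <code>line</code>, and <code>note</code> are invented, and the sketch assumes each type occurs only once so that it can serve as a dictionary key.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:space attribute (an implementation detail of
# that library, not something defined in this section).
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

SOURCE = (
    '<poem xml:space="preserve">'
    '<stanza><line>  Roses   are red </line></stanza>'
    '<note xml:space="default">an editorial note</note>'
    '</poem>'
)

def effective_space(root):
    """Map each element type to the xml:space value in effect, or None if none was signaled."""
    result = {}

    def walk(element, inherited):
        # An explicit attribute overrides the value inherited from the parent.
        current = element.get(XML_SPACE, inherited)
        result[element.tag] = current
        for child in element:
            walk(child, current)

    walk(root, None)  # the root starts with no signaled intention
    return result

spaces = effective_space(ET.fromstring(SOURCE))
print(spaces)
```

Only attributes written in the instance are considered here; defaults supplied by a declaration in an external DTD would not be seen by this non-validating approach.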
1622 1623 </p> 1624 1625 1626 1627 1628 <h3><a name="sec-line-ends"></a>2.11 End-of-Line Handling 1629 </h3> 1630 1631 <p>XML <a href="#dt-parsedent">parsed entities</a> are often stored in 1632 computer files which, for editing convenience, are organized into lines. 1633 These lines are typically separated by some combination of the characters 1634 carriage-return (#xD) and line-feed (#xA). 1635 </p> 1636 1637 <p>To simplify the tasks of <a href="#dt-app">applications</a>, 1638 wherever an external parsed entity or the literal entity value 1639 of an internal parsed entity contains either the literal 1640 two-character sequence "#xD#xA" or a standalone literal 1641 #xD, an <a href="#dt-xml-proc">XML processor</a> must 1642 pass to the application the single character #xA. 1643 (This behavior can 1644 conveniently be produced by normalizing all 1645 line breaks to #xA on input, before parsing.) 1646 1647 </p> 1648 1649 1650 1651 <h3><a name="sec-lang-tag"></a>2.12 Language Identification 1652 </h3> 1653 1654 <p>In document processing, it is often useful to 1655 identify the natural or formal language 1656 in which the content is 1657 written. 1658 A special <a href="#dt-attr">attribute</a> named 1659 xml:lang may be inserted in 1660 documents to specify the 1661 language used in the contents and attribute values 1662 of any element in an XML document. 1663 In valid documents, this attribute, like any other, must be 1664 <a href="#dt-attdecl">declared</a> if it is used. 
1665 The values of the attribute are language identifiers as defined 1666 by <a href="#RFC1766">[IETF RFC 1766]</a>, "Tags for the Identification of Languages": 1667 1668 <h5>Language Identification</h5> 1669 <table class="scrap"> 1670 <tbody> 1671 <tr valign="baseline"> 1672 <td><a name="NT-LanguageID"></a>[33] 1673 </td> 1674 <td>LanguageID</td> 1675 <td> ::= </td> 1676 <td><a href="#NT-Langcode">Langcode</a> 1677 ('-' <a href="#NT-Subcode">Subcode</a>)* 1678 </td> 1679 <td></td> 1680 </tr> 1681 <tr valign="baseline"> 1682 <td><a name="NT-Langcode"></a>[34] 1683 </td> 1684 <td>Langcode</td> 1685 <td> ::= </td> 1686 <td><a href="#NT-ISO639Code">ISO639Code</a> | 1687 <a href="#NT-IanaCode">IanaCode</a> | 1688 <a href="#NT-UserCode">UserCode</a></td> 1689 <td></td> 1690 </tr> 1691 <tr valign="baseline"> 1692 <td><a name="NT-ISO639Code"></a>[35] 1693 </td> 1694 <td>ISO639Code</td> 1695 <td> ::= </td> 1696 <td>([a-z] | [A-Z]) ([a-z] | [A-Z])</td> 1697 <td></td> 1698 </tr> 1699 <tr valign="baseline"> 1700 <td><a name="NT-IanaCode"></a>[36] 1701 </td> 1702 <td>IanaCode</td> 1703 <td> ::= </td> 1704 <td>('i' | 'I') '-' ([a-z] | [A-Z])+</td> 1705 <td></td> 1706 </tr> 1707 <tr valign="baseline"> 1708 <td><a name="NT-UserCode"></a>[37] 1709 </td> 1710 <td>UserCode</td> 1711 <td> ::= </td> 1712 <td>('x' | 'X') '-' ([a-z] | [A-Z])+</td> 1713 <td></td> 1714 </tr> 1715 <tr valign="baseline"> 1716 <td><a name="NT-Subcode"></a>[38] 1717 </td> 1718 <td>Subcode</td> 1719 <td> ::= </td> 1720 <td>([a-z] | [A-Z])+</td> 1721 <td></td> 1722 </tr> 1723 </tbody> 1724 </table> 1725 The <a href="#NT-Langcode">Langcode</a> may be any of the following: 1726 1727 <ul> 1728 1729 <li> 1730 <p>a two-letter language code as defined by 1731 <a href="#ISO639">[ISO 639]</a>, "Codes 1732 for the representation of names of languages" 1733 </p> 1734 </li> 1735 1736 <li> 1737 <p>a language identifier registered with the Internet 1738 Assigned Numbers Authority <a href="#IANA">[IANA]</a>; these begin with 
the 1739 prefix "<code>i-</code>" (or "<code>I-</code>") 1740 </p> 1741 </li> 1742 1743 <li> 1744 <p>a language identifier assigned by the user, or agreed on 1745 between parties in private use; these must begin with the 1746 prefix "<code>x-</code>" or "<code>X-</code>" in order to ensure that they do not conflict 1747 with names later standardized or registered with IANA 1748 </p> 1749 </li> 1750 1751 </ul> 1752 </p> 1753 1754 <p>There may be any number of <a href="#NT-Subcode">Subcode</a> segments; if 1755 the first 1756 subcode segment exists and the Subcode consists of two 1757 letters, then it must be a country code from 1758 <a href="#ISO3166">[ISO 3166]</a>, "Codes 1759 for the representation of names of countries." 1760 If the first 1761 subcode consists of more than two letters, it must be 1762 a subcode for the language in question registered with IANA, 1763 unless the <a href="#NT-Langcode">Langcode</a> begins with the prefix 1764 "<code>x-</code>" or 1765 "<code>X-</code>". 1766 </p> 1767 1768 <p>It is customary to give the language code in lower case, and 1769 the country code (if any) in upper case. 1770 Note that these values, unlike other names in XML documents, 1771 are case insensitive. 1772 </p> 1773 1774 <p>For example: 1775 <pre><p xml:lang="en">The quick brown fox jumps over the lazy dog.</p> 1776<p xml:lang="en-GB">What colour is it?</p> 1777<p xml:lang="en-US">What color is it?</p> 1778<sp who="Faust" desc='leise' xml:lang="de"> 1779 <l>Habe nun, ach! Philosophie,</l> 1780 <l>Juristerei, und Medizin</l> 1781 <l>und leider auch Theologie</l> 1782 <l>durchaus studiert mit heißem Bemüh'n.</l> 1783 </sp></pre></p> 1784 1785 1786 <p>The intent declared with xml:lang is considered to apply to 1787 all attributes and content of the element where it is specified, 1788 unless overridden with an instance of xml:lang 1789 on another element within that content. 
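The LanguageID grammar in productions [33] to [38], together with the note on case insensitivity, can be approximated in a few lines of Python. This sketch checks syntax only: it does not consult ISO 639, ISO 3166, or the IANA registry, and the function names are assumptions made for the example.

```python
import re

# Direct transcription of productions [33]-[38]; registry lookups are out of scope.
LANGUAGE_ID = re.compile(
    r"(?:[A-Za-z]{2}"      # ISO639Code [35]
    r"|[iI]-[A-Za-z]+"     # IanaCode [36]
    r"|[xX]-[A-Za-z]+)"    # UserCode [37]
    r"(?:-[A-Za-z]+)*"     # ('-' Subcode)* from [33] and [38]
)

def is_language_id(value):
    """True if value matches the LanguageID production syntactically."""
    return LANGUAGE_ID.fullmatch(value) is not None

def same_language_id(a, b):
    """Compare two identifiers without regard to case."""
    return a.lower() == b.lower()

for candidate in ["en", "en-GB", "i-navajo", "x-klingon", "eng", "en_US"]:
    print(candidate, is_language_id(candidate))
```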
1790 </p> 1791 1792 1793 <p>A simple declaration for xml:lang might take 1794 the form 1795 <pre>xml:lang NMTOKEN #IMPLIED</pre> 1796 but specific default values may also be given, if appropriate. In a 1797 collection of French poems for English students, with glosses and 1798 notes in English, the xml:lang attribute might be declared this way: 1799 <pre> <!ATTLIST poem xml:lang NMTOKEN 'fr'> 1800 <!ATTLIST gloss xml:lang NMTOKEN 'en'> 1801 <!ATTLIST note xml:lang NMTOKEN 'en'></pre> 1802 </p> 1803 1804 1805 1806 1807 1808 1809 1810 <h2><a name="sec-logical-struct"></a>3 Logical Structures 1811 </h2> 1812 1813 1814 <p><a name="dt-element"></a>Each <a href="#dt-xml-doc">XML document</a> contains one or more 1815 <b>elements</b>, the boundaries of which are 1816 either delimited by <a href="#dt-stag">start-tags</a> 1817 and <a href="#dt-etag">end-tags</a>, or, for <a href="#dt-empty">empty</a> elements, by an <a href="#dt-eetag">empty-element tag</a>. Each element has a type, 1818 identified by name, sometimes called its "generic 1819 identifier" (GI), and may have a set of 1820 attribute specifications. Each attribute specification 1821 has a <a href="#dt-attrname">name</a> and a <a href="#dt-attrval">value</a>. 
</p>

<h5>Element</h5>
<table class="scrap">
<tbody>
<tr valign="baseline">
<td><a name="NT-element"></a>[39]
</td>
<td>element</td>
<td> ::= </td>
<td><a href="#NT-EmptyElemTag">EmptyElemTag</a></td>
<td></td>
</tr>
<tr valign="baseline">
<td></td>
<td></td>
<td></td>
<td>| <a href="#NT-STag">STag</a> <a href="#NT-content">content</a>
<a href="#NT-ETag">ETag</a></td>
<td>[ WFC: <a href="#GIMatch">Element Type Match</a> ]
</td>
</tr>
<tr valign="baseline">
<td></td>
<td></td>
<td></td>
<td></td>
<td>[ VC: <a href="#elementvalid">Element Valid</a> ]
</td>
</tr>
</tbody>
</table>

<p>This specification does not constrain the semantics, use, or (beyond
syntax) names of the element types and attributes, except that names
beginning with a match to <code>(('X'|'x')('M'|'m')('L'|'l'))</code>
are reserved for standardization in this or future versions of this
specification.
</p>
<a name="GIMatch"></a><p><b>Well Formedness Constraint: Element Type Match</b></p>

<p>
The <a href="#NT-Name">Name</a> in an element's end-tag must match
the element type in
the start-tag.
</p>

<a name="elementvalid"></a><p><b>Validity Constraint: Element Valid</b></p>

<p>An element is
valid if
there is a declaration matching
<a href="#NT-elementdecl">elementdecl</a> where the
<a href="#NT-Name">Name</a> matches the element type, and
one of the following holds:
</p>

<ol>

<li>
<p>The declaration matches EMPTY and the element has no
<a href="#dt-content">content</a>.
1889 </p> 1890 </li> 1891 1892 <li> 1893 <p>The declaration matches <a href="#NT-children">children</a> and 1894 the sequence of 1895 <a href="#dt-parentchild">child elements</a> 1896 belongs to the language generated by the regular expression in 1897 the content model, with optional white space (characters 1898 matching the nonterminal <a href="#NT-S">S</a>) between each pair 1899 of child elements. 1900 </p> 1901 </li> 1902 1903 <li> 1904 <p>The declaration matches <a href="#NT-Mixed">Mixed</a> and 1905 the content consists of <a href="#dt-chardata">character 1906 data 1907 </a> and <a href="#dt-parentchild">child elements</a> 1908 whose types match names in the content model. 1909 </p> 1910 </li> 1911 1912 <li> 1913 <p>The declaration matches ANY, and the types 1914 of any <a href="#dt-parentchild">child elements</a> have 1915 been declared. 1916 </p> 1917 </li> 1918 1919 </ol> 1920 1921 1922 1923 1924 <h3><a name="sec-starttags"></a>3.1 Start-Tags, End-Tags, and Empty-Element Tags 1925 </h3> 1926 1927 1928 <p><a name="dt-stag"></a>The beginning of every 1929 non-empty XML element is marked by a <b>start-tag</b>. 1930 1931 <h5>Start-tag</h5> 1932 <table class="scrap"> 1933 <tbody> 1934 1935 <tr valign="baseline"> 1936 <td><a name="NT-STag"></a>[40] 1937 </td> 1938 <td>STag</td> 1939 <td> ::= </td> 1940 <td>'<' <a href="#NT-Name">Name</a> 1941 (<a href="#NT-S">S</a> <a href="#NT-Attribute">Attribute</a>)* 1942 <a href="#NT-S">S</a>? 
'>'
</td>
<td>[ WFC: <a href="#uniqattspec">Unique Att Spec</a> ]
</td>
</tr>

<tr valign="baseline">
<td><a name="NT-Attribute"></a>[41]
</td>
<td>Attribute</td>
<td> ::= </td>
<td><a href="#NT-Name">Name</a> <a href="#NT-Eq">Eq</a>
<a href="#NT-AttValue">AttValue</a></td>
<td>[ VC: <a href="#ValueType">Attribute Value Type</a> ]
</td>
</tr>
<tr valign="baseline">
<td></td>
<td></td>
<td></td>
<td></td>
<td>[ WFC: <a href="#NoExternalRefs">No External Entity References</a> ]
</td>
</tr>
<tr valign="baseline">
<td></td>
<td></td>
<td></td>
<td></td>
<td>[ WFC: <a href="#CleanAttrVals">No < in Attribute Values</a> ]
</td>
</tr>

</tbody>
</table>
The <a href="#NT-Name">Name</a> in
the start- and end-tags gives the
element's <b>type</b>.
<a name="dt-attr"></a>
The <a href="#NT-Name">Name</a>-<a href="#NT-AttValue">AttValue</a> pairs are
referred to as
the <b>attribute specifications</b> of the element,
<a name="dt-attrname"></a>with the
<a href="#NT-Name">Name</a> in each pair
referred to as the <b>attribute name</b> and
<a name="dt-attrval"></a>the content of the
<a href="#NT-AttValue">AttValue</a> (the text between the
<code>'</code> or <code>"</code> delimiters)
as the <b>attribute value</b>.
</p>
<a name="uniqattspec"></a><p><b>Well Formedness Constraint: Unique Att Spec</b></p>

<p>
No attribute name may appear more than once in the same start-tag
or empty-element tag.
</p>

<a name="ValueType"></a><p><b>Validity Constraint: Attribute Value Type</b></p>

<p>
The attribute must have been declared; the value must be of the type
declared for it.
(For attribute types, see <a href="#attdecls">[<b>3.3 Attribute-List Declarations</b>]</a>.)
</p>

<a name="NoExternalRefs"></a><p><b>Well Formedness Constraint: No External Entity References</b></p>

<p>
Attribute values cannot contain direct or indirect entity references
to external entities.
</p>

<a name="CleanAttrVals"></a><p><b>Well Formedness Constraint: No < in Attribute Values</b></p>

<p>The <a href="#dt-repltext">replacement text</a> of any entity
referred to directly or indirectly in an attribute
value (other than "<code>&lt;</code>") must not contain
a <code><</code>.
</p>

<p>An example of a start-tag:
<pre><termdef id="dt-dog" term="dog"></pre></p>

<p><a name="dt-etag"></a>The end of every element
that begins with a start-tag must
be marked by an <b>end-tag</b>
containing a name that echoes the element's type as given in the
start-tag:

<h5>End-tag</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-ETag"></a>[42]
</td>
<td>ETag</td>
<td> ::= </td>
<td>'</' <a href="#NT-Name">Name</a>
<a href="#NT-S">S</a>?
'&gt;'
</td>
<td></td>
</tr>

</tbody>
</table>

</p>

<p>An example of an end-tag:<pre>&lt;/termdef&gt;</pre></p>

<p><a name="dt-content"></a>The
<a href="#dt-text">text</a> between the start-tag and
end-tag is called the element's
<b>content</b>:

<h5>Content of Elements</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-content"></a>[43]
</td>
<td>content</td>
<td> ::= </td>
<td>(<a href="#NT-element">element</a> | <a href="#NT-CharData">CharData</a>
| <a href="#NT-Reference">Reference</a> | <a href="#NT-CDSect">CDSect</a>
| <a href="#NT-PI">PI</a> | <a href="#NT-Comment">Comment</a>)*
</td>
<td></td>
</tr>

</tbody>
</table>

</p>

<p><a name="dt-empty"></a>If an element is <b>empty</b>,
it must be represented either by a start-tag immediately followed
by an end-tag or by an empty-element tag.
<a name="dt-eetag"></a>An
<b>empty-element tag</b> takes a special form:

<h5>Tags for Empty Elements</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-EmptyElemTag"></a>[44]
</td>
<td>EmptyElemTag</td>
<td> ::= </td>
<td>'&lt;' <a href="#NT-Name">Name</a> (<a href="#NT-S">S</a>
<a href="#NT-Attribute">Attribute</a>)* <a href="#NT-S">S</a>?
'/&gt;'
</td>
<td>[ WFC: <a href="#uniqattspec">Unique Att Spec</a> ]
</td>
</tr>

</tbody>
</table>

</p>

<p>Empty-element tags may be used for any element which has no
content, whether or not it is declared using the keyword
EMPTY.
<a href="#dt-interop">For interoperability</a>, the empty-element
tag should be used, and should only be used, for elements which are
<a href="#dt-eldecl">declared</a> EMPTY.
</p>

<p>Examples of empty elements:
<pre>&lt;IMG align="left"
 src="http://www.w3.org/Icons/WWW/w3c_home" /&gt;
&lt;br&gt;&lt;/br&gt;
&lt;br/&gt;</pre></p>

<h3><a name="elemdecls"></a>3.2 Element Type Declarations
</h3>

<p>The <a href="#dt-element">element</a> structure of an
<a href="#dt-xml-doc">XML document</a> may, for
<a href="#dt-valid">validation</a> purposes,
be constrained
using element type and attribute-list declarations.
An element type declaration constrains the element's
<a href="#dt-content">content</a>.

</p>

<p>Element type declarations often constrain which element types can
appear as <a href="#dt-parentchild">children</a> of the element.
At user option, an XML processor may issue a warning
when a declaration mentions an element type for which no declaration
is provided, but this is not an error.
</p>

<p><a name="dt-eldecl"></a>An <b>element
type declaration
</b> takes the form:

<h5>Element Type Declaration</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-elementdecl"></a>[45]
</td>
<td>elementdecl</td>
<td> ::= </td>
<td>'&lt;!ELEMENT' <a href="#NT-S">S</a>
<a href="#NT-Name">Name</a>
<a href="#NT-S">S</a>
<a href="#NT-contentspec">contentspec</a>
<a href="#NT-S">S</a>? '&gt;'
</td>
<td>[ VC: <a href="#EDUnique">Unique Element Type Declaration</a> ]
</td>
</tr>

<tr valign="baseline">
<td><a name="NT-contentspec"></a>[46]
</td>
<td>contentspec</td>
<td> ::= </td>
<td>'EMPTY'
| 'ANY'
| <a href="#NT-Mixed">Mixed</a>
| <a href="#NT-children">children</a>

</td>
<td></td>
</tr>

</tbody>
</table>
where the <a href="#NT-Name">Name</a> gives the element type
being declared.

</p>

<a name="EDUnique"></a><p><b>Validity Constraint: Unique Element Type Declaration</b></p>

<p>
No element type may be declared more than once.

</p>

<p>Examples of element type declarations:
<pre>&lt;!ELEMENT br EMPTY&gt;
&lt;!ELEMENT p (#PCDATA|emph)* &gt;
&lt;!ELEMENT %name.para; %content.para; &gt;
&lt;!ELEMENT container ANY&gt;</pre></p>

<h4><a name="sec-element-content"></a>3.2.1 Element Content
</h4>

<p><a name="dt-elemcontent"></a>An element <a href="#dt-stag">type</a> has
<b>element content</b> when elements of that
type must contain only <a href="#dt-parentchild">child</a>
elements (no character data), optionally separated by
white space (characters matching the nonterminal
<a href="#NT-S">S</a>).

In this case, the
constraint includes a content model, a simple grammar governing
the allowed types of the child
elements and the order in which they are allowed to appear.
The grammar is built on
content particles (<a href="#NT-cp">cp</a>s), which consist of names,
choice lists of content particles, or
sequence lists of content particles:

<h5>Element-content Models</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-children"></a>[47]
</td>
<td>children</td>
<td> ::= </td>
<td>(<a href="#NT-choice">choice</a>
| <a href="#NT-seq">seq</a>)
('?' | '*' | '+')?
</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-cp"></a>[48]
</td>
<td>cp</td>
<td> ::= </td>
<td>(<a href="#NT-Name">Name</a>
| <a href="#NT-choice">choice</a>
| <a href="#NT-seq">seq</a>)
('?' | '*' | '+')?
2263 </td> 2264 <td></td> 2265 </tr> 2266 2267 <tr valign="baseline"> 2268 <td><a name="NT-choice"></a>[49] 2269 </td> 2270 <td>choice</td> 2271 <td> ::= </td> 2272 <td>'(' <a href="#NT-S">S</a>? cp 2273 ( <a href="#NT-S">S</a>? '|' <a href="#NT-S">S</a>? <a href="#NT-cp">cp</a> )* 2274 <a href="#NT-S">S</a>? ')' 2275 </td> 2276 <td>[ VC: <a href="#vc-PEinGroup">Proper Group/PE Nesting</a> ] 2277 </td> 2278 </tr> 2279 2280 <tr valign="baseline"> 2281 <td><a name="NT-seq"></a>[50] 2282 </td> 2283 <td>seq</td> 2284 <td> ::= </td> 2285 <td>'(' <a href="#NT-S">S</a>? cp 2286 ( <a href="#NT-S">S</a>? ',' <a href="#NT-S">S</a>? <a href="#NT-cp">cp</a> )* 2287 <a href="#NT-S">S</a>? ')' 2288 </td> 2289 <td>[ VC: <a href="#vc-PEinGroup">Proper Group/PE Nesting</a> ] 2290 </td> 2291 </tr> 2292 2293 2294 </tbody> 2295 </table> 2296 where each <a href="#NT-Name">Name</a> is the type of an element which may 2297 appear as a <a href="#dt-parentchild">child</a>. 2298 Any content 2299 particle in a choice list may appear in the <a href="#dt-elemcontent">element content</a> at the location where 2300 the choice list appears in the grammar; 2301 content particles occurring in a sequence list must each 2302 appear in the <a href="#dt-elemcontent">element content</a> in the 2303 order given in the list. 2304 The optional character following a name or list governs 2305 whether the element or the content particles in the list may occur one 2306 or more (<code>+</code>), zero or more (<code>*</code>), or zero or 2307 one times (<code>?</code>). 2308 The absence of such an operator means that the element or content particle 2309 must appear exactly once. 2310 This syntax 2311 and meaning are identical to those used in the productions in this 2312 specification. 
</p>

<p>
The content of an element matches a content model if and only if it is
possible to trace out a path through the content model, obeying the
sequence, choice, and repetition operators and matching each element in
the content against an element type in the content model. <a href="#dt-compat">For compatibility</a>, it is an error
if an element in the document can
match more than one occurrence of an element type in the content model.
For more information, see <a href="#determinism">[<b>E Deterministic Content Models</b>]
</a>.

</p>
<a name="vc-PEinGroup"></a><p><b>Validity Constraint: Proper Group/PE Nesting</b></p>

<p>Parameter-entity
<a href="#dt-repltext">replacement text</a> must be properly nested
with parenthesized groups.
That is to say, if either the opening or the closing parenthesis
in a <a href="#NT-choice">choice</a>, <a href="#NT-seq">seq</a>, or
<a href="#NT-Mixed">Mixed</a> construct
is contained in the replacement text for a
<a href="#dt-PERef">parameter entity</a>,
both must be contained in the same replacement text.
</p>

<p><a href="#dt-interop">For interoperability</a>,
if a parameter-entity reference appears in a
<a href="#NT-choice">choice</a>, <a href="#NT-seq">seq</a>, or
<a href="#NT-Mixed">Mixed</a> construct, its replacement text
should not be empty, and
neither the first nor last non-blank
character of the replacement text should be a connector
(<code>|</code> or <code>,</code>).
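As a hedged sketch of the nesting constraint and the interoperability
recommendation, assume the following declarations sit in an external DTD subset;
the entity and element names (<code>inline.mix</code>, <code>open.group</code>,
<code>extra.mix</code>, <code>para</code>, <code>doc</code>, <code>note</code>)
are hypothetical and chosen only for illustration:
<pre>&lt;!-- Acceptable: the replacement text holds only names and the connector between them --&gt;
&lt;!ENTITY % inline.mix "emph | code"&gt;
&lt;!ELEMENT para (#PCDATA | %inline.mix;)*&gt;

&lt;!-- Violates Proper Group/PE Nesting: the opening parenthesis is in the
     replacement text, but the matching closing parenthesis is not --&gt;
&lt;!ENTITY % open.group "(head, body"&gt;
&lt;!ELEMENT doc %open.group;, back)&gt;

&lt;!-- Discouraged for interoperability: the replacement text begins with a connector --&gt;
&lt;!ENTITY % extra.mix "| code"&gt;
&lt;!ELEMENT note (#PCDATA | emph %extra.mix;)*&gt;</pre>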

</p>

<p>Examples of element-content models:
<pre>&lt;!ELEMENT spec (front, body, back?)&gt;
&lt;!ELEMENT div1 (head, (p | list | note)*, div2*)&gt;
&lt;!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*&gt;</pre></p>

<h4><a name="sec-mixed-content"></a>3.2.2 Mixed Content
</h4>

<p><a name="dt-mixed"></a>An element
<a href="#dt-stag">type</a> has
<b>mixed content</b> when elements of that type may contain
character data, optionally interspersed with
<a href="#dt-parentchild">child</a> elements.
In this case, the types of the child elements
may be constrained, but not their order or their number of occurrences:

<h5>Mixed-content Declaration</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-Mixed"></a>[51]
</td>
<td>Mixed</td>
<td> ::= </td>
<td>'(' <a href="#NT-S">S</a>?
'#PCDATA'
(<a href="#NT-S">S</a>?
'|'
<a href="#NT-S">S</a>?
<a href="#NT-Name">Name</a>)*
<a href="#NT-S">S</a>?
')*'
</td>
<td></td>
</tr>
<tr valign="baseline">
<td></td>
<td></td>
<td></td>
<td>| '(' <a href="#NT-S">S</a>? '#PCDATA' <a href="#NT-S">S</a>? ')'

</td>
<td>[ VC: <a href="#vc-PEinGroup">Proper Group/PE Nesting</a> ]
</td>
</tr>
<tr valign="baseline">
<td></td>
<td></td>
<td></td>
<td></td>
<td>[ VC: <a href="#vc-MixedChildrenUnique">No Duplicate Types</a> ]
</td>
</tr>

</tbody>
</table>
where the <a href="#NT-Name">Name</a>s give the types of elements
that may appear as children.

</p>
<a name="vc-MixedChildrenUnique"></a><p><b>Validity Constraint: No Duplicate Types</b></p>

<p>The same name must not appear more than once in a single mixed-content
declaration.

</p>

<p>Examples of mixed content declarations:
<pre>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*&gt;
&lt;!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* &gt;
&lt;!ELEMENT b (#PCDATA)&gt;</pre></p>

<h3><a name="attdecls"></a>3.3 Attribute-List Declarations
</h3>

<p><a href="#dt-attr">Attributes</a> are used to associate
name-value pairs with <a href="#dt-element">elements</a>.
Attribute specifications may appear only within <a href="#dt-stag">start-tags</a>
and <a href="#dt-eetag">empty-element tags</a>;
thus, the productions used to
recognize them appear in <a href="#sec-starttags">[<b>3.1 Start-Tags, End-Tags, and Empty-Element Tags</b>]
</a>.
Attribute-list
declarations may be used:

<ul>

<li>
<p>To define the set of attributes pertaining to a given
element type.
</p>
</li>

<li>
<p>To establish type constraints for these
attributes.
</p>
</li>

<li>
<p>To provide <a href="#dt-default">default values</a>
for attributes.
</p>
</li>

</ul>

</p>

<p><a name="dt-attdecl"></a>
<b>Attribute-list declarations</b> specify the name, data type, and default
value (if any) of each attribute associated with a given element type:

<h5>Attribute-list Declaration</h5>
<table class="scrap">
<tbody>
<tr valign="baseline">
<td><a name="NT-AttlistDecl"></a>[52]
</td>
<td>AttlistDecl</td>
<td> ::= </td>
<td>'&lt;!ATTLIST' <a href="#NT-S">S</a>
<a href="#NT-Name">Name</a>
<a href="#NT-AttDef">AttDef</a>*
<a href="#NT-S">S</a>?
'>' 2491 </td> 2492 <td></td> 2493 </tr> 2494 <tr valign="baseline"> 2495 <td><a name="NT-AttDef"></a>[53] 2496 </td> 2497 <td>AttDef</td> 2498 <td> ::= </td> 2499 <td><a href="#NT-S">S</a> <a href="#NT-Name">Name</a> 2500 <a href="#NT-S">S</a> <a href="#NT-AttType">AttType</a> 2501 <a href="#NT-S">S</a> <a href="#NT-DefaultDecl">DefaultDecl</a></td> 2502 <td></td> 2503 </tr> 2504 </tbody> 2505 </table> 2506 The <a href="#NT-Name">Name</a> in the 2507 <a href="#NT-AttlistDecl">AttlistDecl</a> rule is the type of an element. At 2508 user option, an XML processor may issue a warning if attributes are 2509 declared for an element type not itself declared, but this is not an 2510 error. The <a href="#NT-Name">Name</a> in the 2511 <a href="#NT-AttDef">AttDef</a> rule is 2512 the name of the attribute. 2513 </p> 2514 2515 <p> 2516 When more than one <a href="#NT-AttlistDecl">AttlistDecl</a> is provided for a 2517 given element type, the contents of all those provided are merged. When 2518 more than one definition is provided for the same attribute of a 2519 given element type, the first declaration is binding and later 2520 declarations are ignored. 2521 <a href="#dt-interop">For interoperability,</a> writers of DTDs 2522 may choose to provide at most one attribute-list declaration 2523 for a given element type, at most one attribute definition 2524 for a given attribute name, and at least one attribute definition 2525 in each attribute-list declaration. 2526 For interoperability, an XML processor may at user option 2527 issue a warning when more than one attribute-list declaration is 2528 provided for a given element type, or more than one attribute definition 2529 is provided 2530 for a given attribute, but this is not an error. 2531 2532 </p> 2533 2534 2535 2536 <h4><a name="sec-attribute-types"></a>3.3.1 Attribute Types 2537 </h4> 2538 2539 2540 <p>XML attribute types are of three kinds: a string type, a 2541 set of tokenized types, and enumerated types. 
The string type may take 2542 any literal string as a value; the tokenized types have varying lexical 2543 and semantic constraints, as noted: 2544 2545 <h5>Attribute Types</h5> 2546 <table class="scrap"> 2547 <tbody> 2548 2549 <tr valign="baseline"> 2550 <td><a name="NT-AttType"></a>[54] 2551 </td> 2552 <td>AttType</td> 2553 <td> ::= </td> 2554 <td><a href="#NT-StringType">StringType</a> 2555 | <a href="#NT-TokenizedType">TokenizedType</a> 2556 | <a href="#NT-EnumeratedType">EnumeratedType</a> 2557 2558 </td> 2559 <td></td> 2560 </tr> 2561 2562 <tr valign="baseline"> 2563 <td><a name="NT-StringType"></a>[55] 2564 </td> 2565 <td>StringType</td> 2566 <td> ::= </td> 2567 <td>'CDATA'</td> 2568 <td></td> 2569 </tr> 2570 2571 <tr valign="baseline"> 2572 <td><a name="NT-TokenizedType"></a>[56] 2573 </td> 2574 <td>TokenizedType</td> 2575 <td> ::= </td> 2576 <td>'ID'</td> 2577 <td>[ VC: <a href="#id">ID</a> ] 2578 </td> 2579 </tr> 2580 <tr valign="baseline"> 2581 <td></td> 2582 <td></td> 2583 <td></td> 2584 <td></td> 2585 <td>[ VC: <a href="#one-id-per-el">One ID per Element Type</a> ] 2586 </td> 2587 </tr> 2588 <tr valign="baseline"> 2589 <td></td> 2590 <td></td> 2591 <td></td> 2592 <td></td> 2593 <td>[ VC: <a href="#id-default">ID Attribute Default</a> ] 2594 </td> 2595 </tr> 2596 <tr valign="baseline"> 2597 <td></td> 2598 <td></td> 2599 <td></td> 2600 <td>| 'IDREF'</td> 2601 <td>[ VC: <a href="#idref">IDREF</a> ] 2602 </td> 2603 </tr> 2604 <tr valign="baseline"> 2605 <td></td> 2606 <td></td> 2607 <td></td> 2608 <td>| 'IDREFS'</td> 2609 <td>[ VC: <a href="#idref">IDREF</a> ] 2610 </td> 2611 </tr> 2612 <tr valign="baseline"> 2613 <td></td> 2614 <td></td> 2615 <td></td> 2616 <td>| 'ENTITY'</td> 2617 <td>[ VC: <a href="#entname">Entity Name</a> ] 2618 </td> 2619 </tr> 2620 <tr valign="baseline"> 2621 <td></td> 2622 <td></td> 2623 <td></td> 2624 <td>| 'ENTITIES'</td> 2625 <td>[ VC: <a href="#entname">Entity Name</a> ] 2626 </td> 2627 </tr> 2628 <tr valign="baseline"> 2629 
<td></td> 2630 <td></td> 2631 <td></td> 2632 <td>| 'NMTOKEN'</td> 2633 <td>[ VC: <a href="#nmtok">Name Token</a> ] 2634 </td> 2635 </tr> 2636 <tr valign="baseline"> 2637 <td></td> 2638 <td></td> 2639 <td></td> 2640 <td>| 'NMTOKENS'</td> 2641 <td>[ VC: <a href="#nmtok">Name Token</a> ] 2642 </td> 2643 </tr> 2644 2645 </tbody> 2646 </table> 2647 2648 </p> 2649 <a name="id"></a><p><b>Validity Constraint: ID</b></p> 2650 ID 2651 2652 <p> 2653 Values of type ID must match the 2654 <a href="#NT-Name">Name</a> production. 2655 A name must not appear more than once in 2656 an XML document as a value of this type; i.e., ID values must uniquely 2657 identify the elements which bear them. 2658 2659 </p> 2660 2661 <a name="one-id-per-el"></a><p><b>Validity Constraint: One ID per Element Type</b></p> 2662 One ID per Element Type 2663 2664 <p>No element type may have more than one ID attribute specified.</p> 2665 2666 <a name="id-default"></a><p><b>Validity Constraint: ID Attribute Default</b></p> 2667 ID Attribute Default 2668 2669 <p>An ID attribute must have a declared default of #IMPLIED or 2670 #REQUIRED. 2671 </p> 2672 2673 <a name="idref"></a><p><b>Validity Constraint: IDREF</b></p> 2674 IDREF 2675 2676 <p> 2677 Values of type IDREF must match 2678 the <a href="#NT-Name">Name</a> production, and 2679 values of type IDREFS must match 2680 <a href="#NT-Names">Names</a>; 2681 each <a href="#NT-Name">Name</a> must match the value of an ID attribute on 2682 some element in the XML document; i.e. IDREF values must 2683 match the value of some ID attribute. 
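A minimal sketch of how the ID and IDREF constraints interact, using hypothetical
element and attribute names (<code>doc</code>, <code>section</code>,
<code>xref</code>, <code>id</code>, <code>target</code>) that are assumptions made
only for illustration:
<pre>&lt;!-- declarations --&gt;
&lt;!ELEMENT doc     (section | xref)*&gt;
&lt;!ELEMENT section EMPTY&gt;
&lt;!ELEMENT xref    EMPTY&gt;
&lt;!ATTLIST section id     ID    #REQUIRED&gt;
&lt;!ATTLIST xref    target IDREF #REQUIRED&gt;

&lt;!-- instance --&gt;
&lt;doc&gt;
  &lt;section id="intro"/&gt;
  &lt;xref target="intro"/&gt;   &lt;!-- valid: "intro" is the ID of an element in the document --&gt;
  &lt;xref target="scope"/&gt;   &lt;!-- invalid: no element bears the ID "scope" --&gt;
  &lt;section id="intro"/&gt;    &lt;!-- invalid: the ID value "intro" is already in use --&gt;
&lt;/doc&gt;</pre>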
2684 2685 </p> 2686 2687 <a name="entname"></a><p><b>Validity Constraint: Entity Name</b></p> 2688 Entity Name 2689 2690 <p> 2691 Values of type ENTITY 2692 must match the <a href="#NT-Name">Name</a> production, 2693 values of type ENTITIES must match 2694 <a href="#NT-Names">Names</a>; 2695 each <a href="#NT-Name">Name</a> must 2696 match the 2697 name of an <a href="#dt-unparsed">unparsed entity</a> declared in the 2698 <a href="#dt-doctype">DTD</a>. 2699 2700 </p> 2701 2702 <a name="nmtok"></a><p><b>Validity Constraint: Name Token</b></p> 2703 Name Token 2704 2705 <p> 2706 Values of type NMTOKEN must match the 2707 <a href="#NT-Nmtoken">Nmtoken</a> production; 2708 values of type NMTOKENS must 2709 match <a href="#NT-Nmtokens">Nmtokens</a>. 2710 2711 </p> 2712 2713 2714 2715 <p><a name="dt-enumerated"></a><b>Enumerated attributes</b> can take one 2716 of a list of values provided in the declaration. There are two 2717 kinds of enumerated types: 2718 2719 <h5>Enumerated Attribute Types</h5> 2720 <table class="scrap"> 2721 <tbody> 2722 <tr valign="baseline"> 2723 <td><a name="NT-EnumeratedType"></a>[57] 2724 </td> 2725 <td>EnumeratedType</td> 2726 <td> ::= </td> 2727 <td><a href="#NT-NotationType">NotationType</a> 2728 | <a href="#NT-Enumeration">Enumeration</a> 2729 2730 </td> 2731 <td></td> 2732 </tr> 2733 <tr valign="baseline"> 2734 <td><a name="NT-NotationType"></a>[58] 2735 </td> 2736 <td>NotationType</td> 2737 <td> ::= </td> 2738 <td>'NOTATION' 2739 <a href="#NT-S">S</a> 2740 '(' 2741 <a href="#NT-S">S</a>? 2742 <a href="#NT-Name">Name</a> 2743 (<a href="#NT-S">S</a>? '|' <a href="#NT-S">S</a>? 2744 <a href="#NT-Name">Name</a>)* 2745 <a href="#NT-S">S</a>? ')' 2746 2747 </td> 2748 <td>[ VC: <a href="#notatn">Notation Attributes</a> ] 2749 </td> 2750 </tr> 2751 <tr valign="baseline"> 2752 <td><a name="NT-Enumeration"></a>[59] 2753 </td> 2754 <td>Enumeration</td> 2755 <td> ::= </td> 2756 <td>'(' <a href="#NT-S">S</a>? 
2757 <a href="#NT-Nmtoken">Nmtoken</a> 2758 (<a href="#NT-S">S</a>? '|' 2759 <a href="#NT-S">S</a>? 2760 <a href="#NT-Nmtoken">Nmtoken</a>)* 2761 <a href="#NT-S">S</a>? 2762 ')' 2763 </td> 2764 <td>[ VC: <a href="#enum">Enumeration</a> ] 2765 </td> 2766 </tr> 2767 </tbody> 2768 </table> 2769 A NOTATION attribute identifies a 2770 <a href="#dt-notation">notation</a>, declared in the 2771 DTD with associated system and/or public identifiers, to 2772 be used in interpreting the element to which the attribute 2773 is attached. 2774 2775 </p> 2776 2777 <a name="notatn"></a><p><b>Validity Constraint: Notation Attributes</b></p> 2778 Notation Attributes 2779 2780 <p> 2781 Values of this type must match 2782 one of the <a href="#Notations">notation</a> names included in 2783 the declaration; all notation names in the declaration must 2784 be declared. 2785 2786 </p> 2787 2788 <a name="enum"></a><p><b>Validity Constraint: Enumeration</b></p> 2789 Enumeration 2790 2791 <p> 2792 Values of this type 2793 must match one of the <a href="#NT-Nmtoken">Nmtoken</a> tokens in the 2794 declaration. 2795 2796 </p> 2797 2798 2799 <p><a href="#dt-interop">For interoperability,</a> the same 2800 <a href="#NT-Nmtoken">Nmtoken</a> should not occur more than once in the 2801 enumerated attribute types of a single element type. 2802 2803 </p> 2804 2805 2806 2807 2808 <h4><a name="sec-attr-defaults"></a>3.3.2 Attribute Defaults 2809 </h4> 2810 2811 2812 <p>An <a href="#dt-attdecl">attribute declaration</a> provides 2813 information on whether 2814 the attribute's presence is required, and if not, how an XML processor should 2815 react if a declared attribute is absent in a document. 
2816 2817 <h5>Attribute Defaults</h5> 2818 <table class="scrap"> 2819 <tbody> 2820 2821 <tr valign="baseline"> 2822 <td><a name="NT-DefaultDecl"></a>[60] 2823 </td> 2824 <td>DefaultDecl</td> 2825 <td> ::= </td> 2826 <td>'#REQUIRED' 2827 | '#IMPLIED' 2828 </td> 2829 <td></td> 2830 </tr> 2831 <tr valign="baseline"> 2832 <td></td> 2833 <td></td> 2834 <td></td> 2835 <td>| (('#FIXED' S)? <a href="#NT-AttValue">AttValue</a>) 2836 </td> 2837 <td>[ VC: <a href="#RequiredAttr">Required Attribute</a> ] 2838 </td> 2839 </tr> 2840 <tr valign="baseline"> 2841 <td></td> 2842 <td></td> 2843 <td></td> 2844 <td></td> 2845 <td>[ VC: <a href="#defattrvalid">Attribute Default Legal</a> ] 2846 </td> 2847 </tr> 2848 <tr valign="baseline"> 2849 <td></td> 2850 <td></td> 2851 <td></td> 2852 <td></td> 2853 <td>[ WFC: <a href="#CleanAttrVals">No < in Attribute Values</a> ] 2854 </td> 2855 </tr> 2856 <tr valign="baseline"> 2857 <td></td> 2858 <td></td> 2859 <td></td> 2860 <td></td> 2861 <td>[ VC: <a href="#FixedAttr">Fixed Attribute Default</a> ] 2862 </td> 2863 </tr> 2864 2865 </tbody> 2866 </table> 2867 2868 2869 </p> 2870 2871 <p>In an attribute declaration, #REQUIRED means that the 2872 attribute must always be provided, #IMPLIED that no default 2873 value is provided. 2874 2875 <a name="dt-default"></a>If the 2876 declaration 2877 is neither #REQUIRED nor #IMPLIED, then the 2878 <a href="#NT-AttValue">AttValue</a> value contains the declared 2879 <b>default</b> value; the #FIXED keyword states that 2880 the attribute must always have the default value. 2881 If a default value 2882 is declared, when an XML processor encounters an omitted attribute, it 2883 is to behave as though the attribute were present with 2884 the declared default value. 
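As an illustrative sketch, the following reuses the <code>list</code> and
<code>form</code> declarations from the attribute-list examples in this section;
the instance fragments and the comments describing how a processor would report
them are assumptions added for illustration:
<pre>&lt;!ATTLIST list type   (bullets|ordered|glossary) "ordered"&gt;
&lt;!ATTLIST form method CDATA #FIXED "POST"&gt;

&lt;list/&gt;                 &lt;!-- treated as though written with type="ordered" --&gt;
&lt;list type="bullets"/&gt;  &lt;!-- the specified value is used; the default does not apply --&gt;
&lt;form/&gt;                 &lt;!-- treated as though written with method="POST" --&gt;
&lt;form method="GET"/&gt;    &lt;!-- invalid: the value differs from the #FIXED default --&gt;</pre>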
</p>
<a name="RequiredAttr"></a><p><b>Validity Constraint: Required Attribute</b></p>

<p>If the default declaration is the keyword #REQUIRED, then
the attribute must be specified for
all elements of the type in the attribute-list declaration.

</p>
<a name="defattrvalid"></a><p><b>Validity Constraint: Attribute Default Legal</b></p>

<p>
The declared
default value must meet the lexical constraints of the declared attribute type.

</p>

<a name="FixedAttr"></a><p><b>Validity Constraint: Fixed Attribute Default</b></p>

<p>If an attribute has a default value declared with the
#FIXED keyword, instances of that attribute must
match the default value.

</p>

<p>Examples of attribute-list declarations:
<pre>&lt;!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED&gt;
&lt;!ATTLIST list
          type    (bullets|ordered|glossary)  "ordered"&gt;
&lt;!ATTLIST form
          method  CDATA   #FIXED "POST"&gt;</pre></p>

<h4><a name="AVNormalize"></a>3.3.3 Attribute-Value Normalization
</h4>

<p>Before the value of an attribute is passed to the application
or checked for validity, the
XML processor must normalize it as follows:

<ul>

<li>
<p>a character reference is processed by appending the referenced
character to the attribute value
</p>
</li>

<li>
<p>an entity reference is processed by recursively processing the
replacement text of the entity
</p>
</li>

<li>
<p>a whitespace character (#x20, #xD, #xA, #x9) is processed by
appending #x20 to the normalized value, except that only a single #x20
is appended for a "#xD#xA" sequence that is part of an external
parsed entity or the literal entity value of an internal parsed
entity
</p>
</li>

<li>
<p>other characters are processed by appending them to the normalized
value
</p>

</li>
</ul>

</p>

<p>If the declared value is not CDATA, then the XML processor must
further process the normalized attribute value by discarding any
leading and trailing space (#x20) characters, and by replacing
sequences of space (#x20) characters by a single space (#x20)
character.
</p>

<p>
All attributes for which no declaration has been read should be treated
by a non-validating parser as if declared
CDATA.

</p>

<h3><a name="sec-condition-sect"></a>3.4 Conditional Sections
</h3>

<p><a name="dt-cond-section"></a>
<b>Conditional sections</b> are portions of the
<a href="#dt-doctype">document type declaration external subset</a>
which are
included in, or excluded from, the logical structure of the DTD based on
the keyword which governs them.

<h5>Conditional Section</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-conditionalSect"></a>[61]
</td>
<td>conditionalSect</td>
<td> ::= </td>
<td><a href="#NT-includeSect">includeSect</a>
| <a href="#NT-ignoreSect">ignoreSect</a>

</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-includeSect"></a>[62]
</td>
<td>includeSect</td>
<td> ::= </td>
<td>'&lt;![' S? 'INCLUDE' S? '['

<a href="#NT-extSubsetDecl">extSubsetDecl</a>
']]&gt;'

</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-ignoreSect"></a>[63]
</td>
<td>ignoreSect</td>
<td> ::= </td>
<td>'&lt;![' S? 'IGNORE' S?
'['
<a href="#NT-ignoreSectContents">ignoreSectContents</a>*
']]&gt;'
</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-ignoreSectContents"></a>[64]
</td>
<td>ignoreSectContents</td>
<td> ::= </td>
<td><a href="#NT-Ignore">Ignore</a>
('&lt;![' <a href="#NT-ignoreSectContents">ignoreSectContents</a> ']]&gt;'
<a href="#NT-Ignore">Ignore</a>)*
</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-Ignore"></a>[65]
</td>
<td>Ignore</td>
<td> ::= </td>
<td><a href="#NT-Char">Char</a>* -
(<a href="#NT-Char">Char</a>* ('&lt;![' | ']]&gt;')
<a href="#NT-Char">Char</a>*)

</td>
<td></td>
</tr>

</tbody>
</table>

</p>

<p>Like the internal and external DTD subsets, a conditional section
may contain one or more complete declarations,
comments, processing instructions,
or nested conditional sections, intermingled with white space.

</p>

<p>If the keyword of the
conditional section is INCLUDE, then the contents of the conditional
section are part of the DTD.
If the keyword of the conditional
section is IGNORE, then the contents of the conditional section are
not logically part of the DTD.
Note that for reliable parsing, the contents of even ignored
conditional sections must be read in order to
detect nested conditional sections and ensure that the end of the
outermost (ignored) conditional section is properly detected.
If a conditional section with a
keyword of INCLUDE occurs within a larger conditional
section with a keyword of IGNORE, both the outer and the
inner conditional sections are ignored.
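A small sketch of the nested case, assuming hypothetical element types
(<code>appendix</code>, <code>annex</code>, <code>colophon</code>) in an external
DTD subset. None of the three declarations becomes part of the DTD; the first
<code>]]&gt;</code> closes the inner section and the second closes the outer one,
which is why the ignored contents still have to be scanned:
<pre>&lt;![IGNORE[
&lt;!ELEMENT appendix (title, body)&gt;
&lt;![INCLUDE[
&lt;!ELEMENT annex (title, body)&gt;
]]&gt;
&lt;!ELEMENT colophon (#PCDATA)&gt;
]]&gt;</pre>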
</p>

<p>If the keyword of the conditional section is a
parameter-entity reference, the parameter entity must be replaced by its
content before the processor decides whether to
include or ignore the conditional section.
</p>

<p>An example:
<pre>&lt;!ENTITY % draft 'INCLUDE' &gt;
&lt;!ENTITY % final 'IGNORE' &gt;

&lt;![%draft;[
&lt;!ELEMENT book (comments*, title, body, supplements?)&gt;
]]&gt;
&lt;![%final;[
&lt;!ELEMENT book (title, body, supplements?)&gt;
]]&gt;
</pre>
</p>

<h2><a name="sec-physical-struct"></a>4 Physical Structures
</h2>

<p><a name="dt-entity"></a>An XML document may consist
of one or many storage units. These are called
<b>entities</b>; they all have <b>content</b> and are all
(except for the document entity, see below, and
the <a href="#dt-doctype">external DTD subset</a>)
identified by <b>name</b>.

Each XML document has one entity
called the <a href="#dt-docent">document entity</a>, which serves
as the starting point for the <a href="#dt-xml-proc">XML
processor
</a> and may contain the whole document.
</p>

<p>Entities may be either parsed or unparsed.
<a name="dt-parsedent"></a>A <b>parsed entity's</b>
contents are referred to as its
<a href="#dt-repltext">replacement text</a>;
this <a href="#dt-text">text</a> is considered an
integral part of the document.
</p>

<p><a name="dt-unparsed"></a>An
<b>unparsed entity</b>
is a resource whose contents may or may not be
<a href="#dt-text">text</a>, and if text, may not be XML.
Each unparsed entity
has an associated <a href="#dt-notation">notation</a>, identified by name.
3149 Beyond a requirement 3150 that an XML processor make the identifiers for the entity and 3151 notation available to the application, 3152 XML places no constraints on the contents of unparsed entities. 3153 3154 </p> 3155 3156 <p> 3157 Parsed entities are invoked by name using entity references; 3158 unparsed entities by name, given in the value of ENTITY 3159 or ENTITIES 3160 attributes. 3161 </p> 3162 3163 <p><a name="gen-entity"></a><b>General entities</b> 3164 are entities for use within the document content. 3165 In this specification, general entities are sometimes referred 3166 to with the unqualified term <i>entity</i> when this leads 3167 to no ambiguity. 3168 <a name="dt-PE"></a>Parameter entities 3169 are parsed entities for use within the DTD. 3170 These two types of entities use different forms of reference and 3171 are recognized in different contexts. 3172 Furthermore, they occupy different namespaces; a parameter entity and 3173 a general entity with the same name are two distinct entities. 3174 3175 </p> 3176 3177 3178 3179 <h3><a name="sec-references"></a>4.1 Character and Entity References 3180 </h3> 3181 3182 <p><a name="dt-charref"></a> 3183 A <b>character reference</b> refers to a specific character in the 3184 ISO/IEC 10646 character set, for example one not directly accessible from 3185 available input devices. 
3186 3187 <h5>Character Reference</h5> 3188 <table class="scrap"> 3189 <tbody> 3190 <tr valign="baseline"> 3191 <td><a name="NT-CharRef"></a>[66] 3192 </td> 3193 <td>CharRef</td> 3194 <td> ::= </td> 3195 <td>'&#' [0-9]+ ';' </td> 3196 <td></td> 3197 </tr> 3198 <tr valign="baseline"> 3199 <td></td> 3200 <td></td> 3201 <td></td> 3202 <td>| '&#x' [0-9a-fA-F]+ ';'</td> 3203 <td>[ WFC: <a href="#wf-Legalchar">Legal Character</a> ] 3204 </td> 3205 </tr> 3206 </tbody> 3207 </table> 3208 <a name="wf-Legalchar"></a><p><b>Well Formedness Constraint: Legal Character</b></p> 3209 Legal Character 3210 3211 <p>Characters referred to using character references must 3212 match the production for 3213 <a href="#NT-Char">Char</a>. 3214 </p> 3215 3216 If the character reference begins with "<code>&#x</code>", the digits and 3217 letters up to the terminating <code>;</code> provide a hexadecimal 3218 representation of the character's code point in ISO/IEC 10646. 3219 If it begins just with "<code>&#</code>", the digits up to the terminating 3220 <code>;</code> provide a decimal representation of the character's 3221 code point. 3222 3223 3224 </p> 3225 3226 <p><a name="dt-entref"></a>An <b>entity 3227 reference 3228 </b> refers to the content of a named entity. 3229 <a name="dt-GERef"></a>References to 3230 parsed general entities 3231 use ampersand (<code>&</code>) and semicolon (<code>;</code>) as 3232 delimiters. 3233 <a name="dt-PERef"></a> 3234 <b>Parameter-entity references</b> use percent-sign (<code>%</code>) and 3235 semicolon 3236 (<code>;</code>) as delimiters. 
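</p>

<p>As a non-normative illustration of the two forms of character reference,
each line below refers to the same character, LATIN SMALL LETTER E WITH ACUTE,
whose code point is 233 in decimal and E9 in hexadecimal. The surrounding
word is invented for this sketch.
<pre>caf&#233;   <!-- decimal form -->
caf&#xE9;  <!-- hexadecimal form -->
caf&#xe9;  <!-- hexadecimal digits may be written in either case --></pre>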
3237 3238 </p> 3239 3240 <h5>Entity Reference</h5> 3241 <table class="scrap"> 3242 <tbody> 3243 <tr valign="baseline"> 3244 <td><a name="NT-Reference"></a>[67] 3245 </td> 3246 <td>Reference</td> 3247 <td> ::= </td> 3248 <td><a href="#NT-EntityRef">EntityRef</a> 3249 | <a href="#NT-CharRef">CharRef</a></td> 3250 <td></td> 3251 </tr> 3252 <tr valign="baseline"> 3253 <td><a name="NT-EntityRef"></a>[68] 3254 </td> 3255 <td>EntityRef</td> 3256 <td> ::= </td> 3257 <td>'&' <a href="#NT-Name">Name</a> ';' 3258 </td> 3259 <td>[ WFC: <a href="#wf-entdeclared">Entity Declared</a> ] 3260 </td> 3261 </tr> 3262 <tr valign="baseline"> 3263 <td></td> 3264 <td></td> 3265 <td></td> 3266 <td></td> 3267 <td>[ VC: <a href="#vc-entdeclared">Entity Declared</a> ] 3268 </td> 3269 </tr> 3270 <tr valign="baseline"> 3271 <td></td> 3272 <td></td> 3273 <td></td> 3274 <td></td> 3275 <td>[ WFC: <a href="#textent">Parsed Entity</a> ] 3276 </td> 3277 </tr> 3278 <tr valign="baseline"> 3279 <td></td> 3280 <td></td> 3281 <td></td> 3282 <td></td> 3283 <td>[ WFC: <a href="#norecursion">No Recursion</a> ] 3284 </td> 3285 </tr> 3286 <tr valign="baseline"> 3287 <td><a name="NT-PEReference"></a>[69] 3288 </td> 3289 <td>PEReference</td> 3290 <td> ::= </td> 3291 <td>'%' <a href="#NT-Name">Name</a> ';' 3292 </td> 3293 <td>[ VC: <a href="#vc-entdeclared">Entity Declared</a> ] 3294 </td> 3295 </tr> 3296 <tr valign="baseline"> 3297 <td></td> 3298 <td></td> 3299 <td></td> 3300 <td></td> 3301 <td>[ WFC: <a href="#norecursion">No Recursion</a> ] 3302 </td> 3303 </tr> 3304 <tr valign="baseline"> 3305 <td></td> 3306 <td></td> 3307 <td></td> 3308 <td></td> 3309 <td>[ WFC: <a href="#indtd">In DTD</a> ] 3310 </td> 3311 </tr> 3312 </tbody> 3313 </table> 3314 3315 <a name="wf-entdeclared"></a><p><b>Well Formedness Constraint: Entity Declared</b></p> 3316 Entity Declared 3317 3318 <p>In a document without any DTD, a document with only an internal 3319 DTD subset which contains no parameter entity references, or a document 
with 3320 "<code>standalone='yes'</code>", 3321 the <a href="#NT-Name">Name</a> given in the entity reference must 3322 <a href="#dt-match">match</a> that in an 3323 <a href="#sec-entity-decl">entity declaration</a>, except that 3324 well-formed documents need not declare 3325 any of the following entities: <code>amp</code>, 3326 <code>lt</code>, 3327 <code>gt</code>, 3328 <code>apos</code>, 3329 <code>quot</code>. 3330 The declaration of a parameter entity must precede any reference to it. 3331 Similarly, the declaration of a general entity must precede any 3332 reference to it which appears in a default value in an attribute-list 3333 declaration. 3334 </p> 3335 3336 <p>Note that if entities are declared in the external subset or in 3337 external parameter entities, a non-validating processor is 3338 <a href="#include-if-valid">not obligated to</a> read 3339 and process their declarations; for such documents, the rule that 3340 an entity must be declared is a well-formedness constraint only 3341 if <a href="#sec-rmd">standalone='yes'</a>. 3342 </p> 3343 3344 <a name="vc-entdeclared"></a><p><b>Validity Constraint: Entity Declared</b></p> 3345 Entity Declared 3346 3347 <p>In a document with an external subset or external parameter 3348 entities with "<code>standalone='no'</code>", 3349 the <a href="#NT-Name">Name</a> given in the entity reference must <a href="#dt-match">match</a> that in an 3350 <a href="#sec-entity-decl">entity declaration</a>. 3351 For interoperability, valid documents should declare the entities 3352 <code>amp</code>, 3353 <code>lt</code>, 3354 <code>gt</code>, 3355 <code>apos</code>, 3356 <code>quot</code>, in the form 3357 specified in <a href="#sec-predefined-ent">[<b>4.6 Predefined Entities</b>] 3358 </a>. 3359 The declaration of a parameter entity must precede any reference to it. 3360 Similarly, the declaration of a general entity must precede any 3361 reference to it which appears in a default value in an attribute-list 3362 declaration. 
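</p>

<p>A non-normative sketch of the ordering rules stated in the two
constraints above. The names <code>book-decls</code>,
<code>publisher-name</code>, and <code>book</code> are invented for
this illustration and are not defined anywhere in this specification.
<pre><!-- the parameter entity is declared first, then referenced -->
<!ENTITY % book-decls "<!ELEMENT book (#PCDATA)>">
%book-decls;

<!-- the general entity is declared before the attribute default that uses it -->
<!ENTITY publisher-name "Example Press">
<!ATTLIST book publisher CDATA "&publisher-name;"></pre>
Reversing either pair, so that the reference came before its declaration,
would break the ordering rule.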
</p>

<a name="textent"></a><p><b>Well Formedness Constraint: Parsed Entity</b></p>

<p>
An entity reference must not contain the name of an <a href="#dt-unparsed">unparsed entity</a>. Unparsed entities may be referred
to only in <a href="#dt-attrval">attribute values</a> declared to
be of type ENTITY or ENTITIES.
</p>

<a name="norecursion"></a><p><b>Well Formedness Constraint: No Recursion</b></p>

<p>
A parsed entity must not contain a recursive reference to itself,
either directly or indirectly.
</p>

<a name="indtd"></a><p><b>Well Formedness Constraint: In DTD</b></p>

<p>
Parameter-entity references may only appear in the
<a href="#dt-doctype">DTD</a>.
</p>

<p>Examples of character and entity references:
<pre>Type <key>less-than</key> (&#x3C;) to save options.
This document was prepared on &docdate; and
is classified &security-level;.</pre></p>

<p>Example of a parameter-entity reference:
<pre><!-- declare the parameter entity "ISOLat2"... -->
<!ENTITY % ISOLat2
         SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" >
<!-- ... now reference it. -->
%ISOLat2;</pre></p>


<h3><a name="sec-entity-decl"></a>4.2 Entity Declarations
</h3>

<p><a name="dt-entdecl"></a>
Entities are declared thus:

<h5>Entity Declaration</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-EntityDecl"></a>[70]
</td>
<td>EntityDecl</td>
<td> ::= </td>
<td><a href="#NT-GEDecl">GEDecl</a> | <a href="#NT-PEDecl">PEDecl</a></td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-GEDecl"></a>[71]
</td>
<td>GEDecl</td>
<td> ::= </td>
<td>'<!ENTITY' <a href="#NT-S">S</a> <a href="#NT-Name">Name</a>
<a href="#NT-S">S</a> <a href="#NT-EntityDef">EntityDef</a>
<a href="#NT-S">S</a>? '>'
</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-PEDecl"></a>[72]
</td>
<td>PEDecl</td>
<td> ::= </td>
<td>'<!ENTITY' <a href="#NT-S">S</a> '%' <a href="#NT-S">S</a>
<a href="#NT-Name">Name</a> <a href="#NT-S">S</a>
<a href="#NT-PEDef">PEDef</a> <a href="#NT-S">S</a>? '>'
</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-EntityDef"></a>[73]
</td>
<td>EntityDef</td>
<td> ::= </td>
<td><a href="#NT-EntityValue">EntityValue</a>
| (<a href="#NT-ExternalID">ExternalID</a>
<a href="#NT-NDataDecl">NDataDecl</a>?)
</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-PEDef"></a>[74]
</td>
<td>PEDef</td>
<td> ::= </td>
<td><a href="#NT-EntityValue">EntityValue</a>
| <a href="#NT-ExternalID">ExternalID</a></td>
<td></td>
</tr>

</tbody>
</table>
The <a href="#NT-Name">Name</a> identifies the entity in an
<a href="#dt-entref">entity reference</a> or, in the case of an
unparsed entity, in the value of an ENTITY or ENTITIES
attribute.
3483 If the same entity is declared more than once, the first declaration 3484 encountered is binding; at user option, an XML processor may issue a 3485 warning if entities are declared multiple times. 3486 3487 </p> 3488 3489 3490 3491 <h4><a name="sec-internal-ent"></a>4.2.1 Internal Entities 3492 </h4> 3493 3494 3495 <p><a name="dt-internent"></a>If 3496 the entity definition is an 3497 <a href="#NT-EntityValue">EntityValue</a>, 3498 the defined entity is called an <b>internal entity</b>. 3499 There is no separate physical 3500 storage object, and the content of the entity is given in the 3501 declaration. 3502 Note that some processing of entity and character references in the 3503 <a href="#dt-litentval">literal entity value</a> may be required to 3504 produce the correct <a href="#dt-repltext">replacement 3505 text 3506 </a>: see <a href="#intern-replacement">[<b>4.5 Construction of Internal Entity Replacement Text</b>] 3507 </a>. 3508 3509 </p> 3510 3511 <p>An internal entity is a <a href="#dt-parsedent">parsed 3512 entity 3513 </a>. 
3514 </p> 3515 3516 <p>Example of an internal entity declaration: 3517 <pre><!ENTITY Pub-Status "This is a pre-release of the 3518 specification."></pre></p> 3519 3520 3521 3522 3523 <h4><a name="sec-external-ent"></a>4.2.2 External Entities 3524 </h4> 3525 3526 3527 <p><a name="dt-extent"></a>If the entity is not 3528 internal, it is an <b>external 3529 entity 3530 </b>, declared as follows: 3531 3532 <h5>External Entity Declaration</h5> 3533 <table class="scrap"> 3534 <tbody> 3535 <tr valign="baseline"> 3536 <td><a name="NT-ExternalID"></a>[75] 3537 </td> 3538 <td>ExternalID</td> 3539 <td> ::= </td> 3540 <td>'SYSTEM' <a href="#NT-S">S</a> 3541 <a href="#NT-SystemLiteral">SystemLiteral</a></td> 3542 <td></td> 3543 </tr> 3544 <tr valign="baseline"> 3545 <td></td> 3546 <td></td> 3547 <td></td> 3548 <td>| 'PUBLIC' <a href="#NT-S">S</a> 3549 <a href="#NT-PubidLiteral">PubidLiteral</a> 3550 <a href="#NT-S">S</a> 3551 <a href="#NT-SystemLiteral">SystemLiteral</a> 3552 3553 </td> 3554 <td></td> 3555 </tr> 3556 <tr valign="baseline"> 3557 <td><a name="NT-NDataDecl"></a>[76] 3558 </td> 3559 <td>NDataDecl</td> 3560 <td> ::= </td> 3561 <td><a href="#NT-S">S</a> 'NDATA' <a href="#NT-S">S</a> 3562 <a href="#NT-Name">Name</a></td> 3563 <td>[ VC: <a href="#not-declared">Notation Declared</a> ] 3564 </td> 3565 </tr> 3566 </tbody> 3567 </table> 3568 If the <a href="#NT-NDataDecl">NDataDecl</a> is present, this is a 3569 general <a href="#dt-unparsed">unparsed 3570 entity 3571 </a>; otherwise it is a parsed entity. 3572 </p> 3573 <a name="not-declared"></a><p><b>Validity Constraint: Notation Declared</b></p> 3574 Notation Declared 3575 3576 <p> 3577 The <a href="#NT-Name">Name</a> must match the declared name of a 3578 <a href="#dt-notation">notation</a>. 3579 3580 </p> 3581 3582 3583 <p><a name="dt-sysid"></a>The 3584 <a href="#NT-SystemLiteral">SystemLiteral</a> 3585 is called the entity's <b>system identifier</b>. It is a URI, 3586 which may be used to retrieve the entity. 
3587 Note that the hash mark (<code>#</code>) and fragment identifier 3588 frequently used with URIs are not, formally, part of the URI itself; 3589 an XML processor may signal an error if a fragment identifier is 3590 given as part of a system identifier. 3591 Unless otherwise provided by information outside the scope of this 3592 specification (e.g. a special XML element type defined by a particular 3593 DTD, or a processing instruction defined by a particular application 3594 specification), relative URIs are relative to the location of the 3595 resource within which the entity declaration occurs. 3596 A URI might thus be relative to the 3597 <a href="#dt-docent">document entity</a>, to the entity 3598 containing the <a href="#dt-doctype">external DTD subset</a>, 3599 or to some other <a href="#dt-extent">external parameter entity</a>. 3600 3601 </p> 3602 3603 <p>An XML processor should handle a non-ASCII character in a URI by 3604 representing the character in UTF-8 as one or more bytes, and then 3605 escaping these bytes with the URI escaping mechanism (i.e., by 3606 converting each byte to %HH, where HH is the hexadecimal notation of the 3607 byte value). 3608 </p> 3609 3610 <p><a name="dt-pubid"></a> 3611 In addition to a system identifier, an external identifier may 3612 include a <b>public identifier</b>. 3613 An XML processor attempting to retrieve the entity's content may use the public 3614 identifier to try to generate an alternative URI. If the processor 3615 is unable to do so, it must use the URI specified in the system 3616 literal. Before a match is attempted, all strings 3617 of white space in the public identifier must be normalized to single space characters (#x20), 3618 and leading and trailing white space must be removed. 
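</p>

<p>A non-normative worked example of the URI escaping and public-identifier
normalization rules above. The host <code>www.example.org</code>, the file
name, and the entity name <code>resume</code> are invented for this sketch.
The character é (#xE9) is represented in UTF-8 as the two bytes C3 A9, so a
processor following the recommendation would be expected to request the
escaped URI shown in the first comment. The public identifier in the second
declaration contains extra spaces and a line break; the second comment shows
the string that results after normalization.
<pre><!ENTITY resume SYSTEM "http://www.example.org/résumé.xml">
<!-- escaped form: http://www.example.org/r%C3%A9sum%C3%A9.xml -->

<!ENTITY open-hatch
         PUBLIC "  -//Textuality//TEXT   Standard
                 open-hatch boilerplate//EN "
         "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!-- normalized: "-//Textuality//TEXT Standard open-hatch boilerplate//EN" --></pre>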
</p>

<p>Examples of external entity declarations:
<pre><!ENTITY open-hatch
         SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY open-hatch
         PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
         "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY hatch-pic
         SYSTEM "../grafix/OpenHatch.gif"
         NDATA gif ></pre></p>


<h3><a name="TextEntities"></a>4.3 Parsed Entities
</h3>

<h4><a name="sec-TextDecl"></a>4.3.1 The Text Declaration
</h4>

<p>External parsed entities may each begin with a <b>text
declaration</b>.

<h5>Text Declaration</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-TextDecl"></a>[77]
</td>
<td>TextDecl</td>
<td> ::= </td>
<td>'<?xml'
<a href="#NT-VersionInfo">VersionInfo</a>?
<a href="#NT-EncodingDecl">EncodingDecl</a>
<a href="#NT-S">S</a>? '?>'
</td>
<td></td>
</tr>

</tbody>
</table>

</p>

<p>The text declaration must be provided literally, not
by reference to a parsed entity.
No text declaration may appear at any position other than the beginning of
an external parsed entity.
</p>


<h4><a name="wf-entities"></a>4.3.2 Well-Formed Parsed Entities
</h4>

<p>The document entity is well-formed if it matches the production labeled
<a href="#NT-document">document</a>.
An external general
parsed entity is well-formed if it matches the production labeled
<a href="#NT-extParsedEnt">extParsedEnt</a>.
An external parameter
entity is well-formed if it matches the production labeled
<a href="#NT-extPE">extPE</a>.
3688 3689 <h5>Well-Formed External Parsed Entity</h5> 3690 <table class="scrap"> 3691 <tbody> 3692 <tr valign="baseline"> 3693 <td><a name="NT-extParsedEnt"></a>[78] 3694 </td> 3695 <td>extParsedEnt</td> 3696 <td> ::= </td> 3697 <td><a href="#NT-TextDecl">TextDecl</a>? 3698 <a href="#NT-content">content</a></td> 3699 <td></td> 3700 </tr> 3701 <tr valign="baseline"> 3702 <td><a name="NT-extPE"></a>[79] 3703 </td> 3704 <td>extPE</td> 3705 <td> ::= </td> 3706 <td><a href="#NT-TextDecl">TextDecl</a>? 3707 <a href="#NT-extSubsetDecl">extSubsetDecl</a></td> 3708 <td></td> 3709 </tr> 3710 </tbody> 3711 </table> 3712 An internal general parsed entity is well-formed if its replacement text 3713 matches the production labeled 3714 <a href="#NT-content">content</a>. 3715 All internal parameter entities are well-formed by definition. 3716 3717 </p> 3718 3719 <p>A consequence of well-formedness in entities is that the logical 3720 and physical structures in an XML document are properly nested; no 3721 <a href="#dt-stag">start-tag</a>, 3722 <a href="#dt-etag">end-tag</a>, 3723 <a href="#dt-empty">empty-element tag</a>, 3724 <a href="#dt-element">element</a>, 3725 <a href="#dt-comment">comment</a>, 3726 <a href="#dt-pi">processing instruction</a>, 3727 <a href="#dt-charref">character 3728 reference 3729 </a>, or 3730 <a href="#dt-entref">entity reference</a> 3731 can begin in one entity and end in another. 3732 </p> 3733 3734 3735 3736 <h4><a name="charencoding"></a>4.3.3 Character Encoding in Entities 3737 </h4> 3738 3739 3740 <p>Each external parsed entity in an XML document may use a different 3741 encoding for its characters. All XML processors must be able to read 3742 entities in either UTF-8 or UTF-16. 3743 3744 3745 </p> 3746 3747 <p>Entities encoded in UTF-16 must 3748 begin with the Byte Order Mark described by ISO/IEC 10646 Annex E and 3749 Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). 
This is an encoding signature, not part of either the markup or the
character data of the XML document.
XML processors must be able to use this character to
differentiate between UTF-8 and UTF-16 encoded documents.
</p>

<p>Although an XML processor is required to read only entities in
the UTF-8 and UTF-16 encodings, it is recognized that other encodings are
used around the world, and it may be desired for XML processors
to read entities that use them.
Parsed entities which are stored in an encoding other than
UTF-8 or UTF-16 must begin with a <a href="#sec-TextDecl">text
declaration</a> containing an encoding declaration:

<h5>Encoding Declaration</h5>
<table class="scrap">
<tbody>
<tr valign="baseline">
<td><a name="NT-EncodingDecl"></a>[80]
</td>
<td>EncodingDecl</td>
<td> ::= </td>
<td><a href="#NT-S">S</a>
'encoding' <a href="#NT-Eq">Eq</a>
('"' <a href="#NT-EncName">EncName</a> '"' |
"'" <a href="#NT-EncName">EncName</a> "'" )
</td>
<td></td>
</tr>
<tr valign="baseline">
<td><a name="NT-EncName"></a>[81]
</td>
<td>EncName</td>
<td> ::= </td>
<td>[A-Za-z] ([A-Za-z0-9._] | '-')*</td>
<td>/*Encoding name contains only Latin characters*/</td>
</tr>
</tbody>
</table>
In the <a href="#dt-docent">document entity</a>, the encoding
declaration is part of the <a href="#dt-xmldecl">XML declaration</a>.
The <a href="#NT-EncName">EncName</a> is the name of the encoding used.

</p>


<p>In an encoding declaration, the values
"<code>UTF-8</code>",
"<code>UTF-16</code>",
"<code>ISO-10646-UCS-2</code>", and
"<code>ISO-10646-UCS-4</code>" should be
used for the various encodings and transformations of Unicode /
ISO/IEC 10646, the values
"<code>ISO-8859-1</code>",
"<code>ISO-8859-2</code>", ...
3807 "<code>ISO-8859-9</code>" should be used for the parts of ISO 8859, and 3808 the values 3809 "<code>ISO-2022-JP</code>", 3810 "<code>Shift_JIS</code>", and 3811 "<code>EUC-JP</code>" 3812 should be used for the various encoded forms of JIS X-0208-1997. XML 3813 processors may recognize other encodings; it is recommended that 3814 character encodings registered (as <i>charset</i>s) 3815 with the Internet Assigned Numbers 3816 Authority <a href="#IANA">[IANA]</a>, other than those just listed, should be 3817 referred to 3818 using their registered names. 3819 Note that these registered names are defined to be 3820 case-insensitive, so processors wishing to match against them 3821 should do so in a case-insensitive 3822 way. 3823 </p> 3824 3825 <p>In the absence of information provided by an external 3826 transport protocol (e.g. HTTP or MIME), 3827 it is an <a href="#dt-error">error</a> for an entity including 3828 an encoding declaration to be presented to the XML processor 3829 in an encoding other than that named in the declaration, 3830 for an encoding declaration to occur other than at the beginning 3831 of an external entity, or for 3832 an entity which begins with neither a Byte Order Mark nor an encoding 3833 declaration to use an encoding other than UTF-8. 3834 Note that since ASCII 3835 is a subset of UTF-8, ordinary ASCII entities do not strictly need 3836 an encoding declaration. 3837 </p> 3838 3839 3840 <p>It is a <a href="#dt-fatal">fatal error</a> when an XML processor 3841 encounters an entity with an encoding that it is unable to process. 
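</p>

<p>A non-normative sketch of the matching advice given earlier in this
section: because registered charset names are case-insensitive, a processor
that compares them case-insensitively would treat the two text declarations
below as naming the same encoding. The choice of ISO-8859-1 is arbitrary.
<pre><?xml version="1.0" encoding="ISO-8859-1"?>
<?xml version="1.0" encoding="iso-8859-1"?></pre>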
</p>

<p>Examples of encoding declarations:
<pre><?xml encoding='UTF-8'?>
<?xml encoding='EUC-JP'?></pre></p>


<h3><a name="entproc"></a>4.4 XML Processor Treatment of Entities and References
</h3>

<p>The table below summarizes the contexts in which character references,
entity references, and invocations of unparsed entities might appear and the
required behavior of an <a href="#dt-xml-proc">XML processor</a> in
each case.
The labels in the leftmost column describe the recognition context:

<dl>

<dt><b>Reference in Content</b></dt>

<dd>
<p>as a reference
anywhere after the <a href="#dt-stag">start-tag</a> and
before the <a href="#dt-etag">end-tag</a> of an element; corresponds
to the nonterminal <a href="#NT-content">content</a>.
</p>
</dd>

<dt><b>Reference in Attribute Value</b></dt>

<dd>
<p>as a reference within either the value of an attribute in a
<a href="#dt-stag">start-tag</a>, or a default
value in an <a href="#dt-attdecl">attribute declaration</a>;
corresponds to the nonterminal
<a href="#NT-AttValue">AttValue</a>.
</p>
</dd>

<dt><b>Occurs as Attribute Value</b></dt>

<dd>
<p>as a <a href="#NT-Name">Name</a>, not a reference, appearing either as
the value of an
attribute which has been declared as type ENTITY, or as one of
the space-separated tokens in the value of an attribute which has been
declared as type ENTITIES.
</p>

</dd>

<dt><b>Reference in Entity Value</b></dt>

<dd>
<p>as a reference
within a parameter or internal entity's
<a href="#dt-litentval">literal entity value</a> in
the entity's declaration; corresponds to the nonterminal
<a href="#NT-EntityValue">EntityValue</a>.
3906 </p> 3907 </dd> 3908 3909 <dt><b>Reference in DTD</b></dt> 3910 3911 <dd> 3912 <p>as a reference within either the internal or external subsets of the 3913 <a href="#dt-doctype">DTD</a>, but outside 3914 of an <a href="#NT-EntityValue">EntityValue</a> or 3915 <a href="#NT-AttValue">AttValue</a>. 3916 </p> 3917 </dd> 3918 3919 3920 </dl> 3921 </p> 3922 3923 <table border="1" cellpadding="7" align="center"> 3924 3925 <tbody> 3926 3927 <tr align="" valign=""> 3928 <td bgcolor="#c0d9c0" rowspan="2" colspan="1" align="" valign=""></td> 3929 3930 <td bgcolor="#c0d9c0" rowspan="1" colspan="4" align="center" valign="bottom">Entity Type</td> 3931 3932 <td bgcolor="#c0d9c0" rowspan="2" colspan="1" align="center" valign="">Character</td> 3933 3934 </tr> 3935 3936 <tr align="center" valign="bottom"> 3937 3938 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign="">Parameter</td> 3939 3940 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign="">Internal 3941 General 3942 </td> 3943 3944 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign="">External Parsed 3945 General 3946 </td> 3947 3948 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign="">Unparsed</td> 3949 3950 </tr> 3951 3952 <tr align="center" valign="middle"> 3953 3954 3955 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Reference 3956 in Content 3957 </td> 3958 3959 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#not-recognized">Not recognized</a></td> 3960 3961 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#included">Included</a></td> 3962 3963 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#include-if-valid">Included if validating</a></td> 3964 3965 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td> 3966 3967 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#included">Included</a></td> 3968 
</tr>

<tr align="center" valign="middle">

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Reference
in Attribute Value
</td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#not-recognized">Not recognized</a></td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#inliteral">Included in literal</a></td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#included">Included</a></td>

</tr>

<tr align="center" valign="middle">

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Occurs as
Attribute Value
</td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#not-recognized">Not recognized</a></td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#notify">Notify</a></td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#not-recognized">Not recognized</a></td>

</tr>

<tr align="center" valign="middle">

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Reference
in Entity Value
</td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#inliteral">Included in literal</a></td>

<td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#bypass">Bypassed</a></td>

<td
bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#bypass">Bypassed</a></td> 4018 4019 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td> 4020 4021 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#included">Included</a></td> 4022 4023 </tr> 4024 4025 <tr align="center" valign="middle"> 4026 4027 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Reference 4028 in DTD 4029 </td> 4030 4031 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#as-PE">Included as PE</a></td> 4032 4033 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td> 4034 4035 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td> 4036 4037 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td> 4038 4039 <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td> 4040 4041 </tr> 4042 4043 </tbody> 4044 4045 </table> 4046 4047 4048 <h4><a name="not-recognized"></a>4.4.1 Not Recognized 4049 </h4> 4050 4051 <p>Outside the DTD, the <code>%</code> character has no 4052 special significance; thus, what would be parameter entity references in the 4053 DTD are not recognized as markup in <a href="#NT-content">content</a>. 4054 Similarly, the names of unparsed entities are not recognized except 4055 when they appear in the value of an appropriately declared attribute. 4056 4057 </p> 4058 4059 4060 4061 <h4><a name="included"></a>4.4.2 Included 4062 </h4> 4063 4064 <p><a name="dt-include"></a>An entity is 4065 <b>included</b> when its 4066 <a href="#dt-repltext">replacement text</a> is retrieved 4067 and processed, in place of the reference itself, 4068 as though it were part of the document at the location the 4069 reference was recognized. 
4070 The replacement text may contain both 4071 <a href="#dt-chardata">character data</a> 4072 and (except for parameter entities) <a href="#dt-markup">markup</a>, 4073 which must be recognized in 4074 the usual way, except that the replacement text of entities used to escape 4075 markup delimiters (the entities <code>amp</code>, 4076 <code>lt</code>, 4077 <code>gt</code>, 4078 <code>apos</code>, 4079 <code>quot</code>) is always treated as 4080 data. (The string "<code>AT&amp;T;</code>" expands to 4081 "<code>AT&T;</code>" and the remaining ampersand is not recognized 4082 as an entity-reference delimiter.) 4083 A character reference is <b>included</b> when the indicated 4084 character is processed in place of the reference itself. 4085 4086 </p> 4087 4088 4089 4090 <h4><a name="include-if-valid"></a>4.4.3 Included If Validating 4091 </h4> 4092 4093 <p>When an XML processor recognizes a reference to a parsed entity, in order 4094 to <a href="#dt-valid">validate</a> 4095 the document, the processor must 4096 <a href="#dt-include">include</a> its 4097 replacement text. 4098 If the entity is external, and the processor is not 4099 attempting to validate the XML document, the 4100 processor <a href="#dt-may">may</a>, but need not, 4101 include the entity's replacement text. 4102 If a non-validating parser does not include the replacement text, 4103 it must inform the application that it recognized, but did not 4104 read, the entity. 4105 </p> 4106 4107 <p>This rule is based on the recognition that the automatic inclusion 4108 provided by the SGML and XML entity mechanism, primarily designed 4109 to support modularity in authoring, is not necessarily 4110 appropriate for other applications, in particular document browsing. 4111 Browsers, for example, when encountering an external parsed entity reference, 4112 might choose to provide a visual indication of the entity's 4113 presence and retrieve it for display only on demand. 
4114 4115 </p> 4116 4117 4118 4119 <h4><a name="forbidden"></a>4.4.4 Forbidden 4120 </h4> 4121 4122 <p>The following are forbidden, and constitute 4123 <a href="#dt-fatal">fatal</a> errors: 4124 4125 <ul> 4126 4127 <li> 4128 <p>the appearance of a reference to an 4129 <a href="#dt-unparsed">unparsed entity</a>. 4130 4131 </p> 4132 </li> 4133 4134 <li> 4135 <p>the appearance of any character or general-entity reference in the 4136 DTD except within an <a href="#NT-EntityValue">EntityValue</a> or 4137 <a href="#NT-AttValue">AttValue</a>. 4138 </p> 4139 </li> 4140 4141 <li> 4142 <p>a reference to an external entity in an attribute value.</p> 4143 4144 </li> 4145 4146 </ul> 4147 4148 </p> 4149 4150 4151 4152 <h4><a name="inliteral"></a>4.4.5 Included in Literal 4153 </h4> 4154 4155 <p>When an <a href="#dt-entref">entity reference</a> appears in an 4156 attribute value, or a parameter entity reference appears in a literal entity 4157 value, its <a href="#dt-repltext">replacement text</a> is 4158 processed in place of the reference itself as though it 4159 were part of the document at the location the reference was recognized, 4160 except that a single or double quote character in the replacement text 4161 is always treated as a normal data character and will not terminate the 4162 literal. 
For example, this is well-formed:
<pre><!ENTITY % YN '"Yes"' >
<!ENTITY WhatHeSaid "He said %YN;" ></pre>
while this is not:
<pre><!ENTITY EndAttr "27'" >
<element attribute='a-&EndAttr;></pre>
</p>

<h4><a name="notify"></a>4.4.6 Notify
</h4>

<p>When the name of an <a href="#dt-unparsed">unparsed
entity
</a> appears as a token in the
value of an attribute of declared type ENTITY or ENTITIES,
a validating processor must inform the
application of the <a href="#dt-sysid">system</a>
and <a href="#dt-pubid">public</a> (if any)
identifiers for both the entity and its associated
<a href="#dt-notation">notation</a>.
</p>

<h4><a name="bypass"></a>4.4.7 Bypassed
</h4>

<p>When a general entity reference appears in the
<a href="#NT-EntityValue">EntityValue</a> in an entity declaration,
it is bypassed and left as is.
</p>

<h4><a name="as-PE"></a>4.4.8 Included as PE
</h4>

<p>Just as with external parsed entities, parameter entities
need only be <a href="#include-if-valid">included if
validating
</a>.
When a parameter-entity reference is recognized in the DTD
and included, its
<a href="#dt-repltext">replacement
text
</a> is enlarged by the attachment of one leading and one following
space (#x20) character; the intent is to constrain the replacement
text of parameter
entities to contain an integral number of grammatical tokens in the DTD.

</p>

<h3><a name="intern-replacement"></a>4.5 Construction of Internal Entity Replacement Text
</h3>

<p>In discussing the treatment
of internal entities, it is
useful to distinguish two forms of the entity's value.
<a name="dt-litentval"></a>The <b>literal
entity value
</b> is the quoted string actually
present in the entity declaration, corresponding to the
non-terminal <a href="#NT-EntityValue">EntityValue</a>.
<a name="dt-repltext"></a>The <b>replacement
text
</b> is the content of the entity, after
replacement of character references and parameter-entity
references.

</p>

<p>The literal entity value
as given in an internal entity declaration
(<a href="#NT-EntityValue">EntityValue</a>) may contain character,
parameter-entity, and general-entity references.
Such references must be contained entirely within the
literal entity value.
The actual replacement text that is
<a href="#dt-include">included</a> as described above
must contain the <i>replacement text</i> of any
parameter entities referred to, and must contain the character
referred to, in place of any character references in the
literal entity value; however,
general-entity references must be left as-is, unexpanded.
For example, given the following declarations:

<pre><!ENTITY % pub "&#xc9;ditions Gallimard" >
<!ENTITY rights "All rights reserved" >
<!ENTITY book "La Peste: Albert Camus,
&#xA9; 1947 %pub;. &rights;" ></pre>
then the replacement text for the entity "<code>book</code>" is:
<pre>La Peste: Albert Camus,
© 1947 Éditions Gallimard. &rights;</pre>
The general-entity reference "<code>&rights;</code>" would be expanded
should the reference "<code>&book;</code>" appear in the document's
content or an attribute value.
</p>

<p>These simple rules may have complex interactions; for a detailed
discussion of a difficult example, see
<a href="#sec-entexpand">[<b>D Expansion of Entity and Character References</b>]
</a>.
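The construction rule in 4.5 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not a conforming processor: the function name <code>replacement_text</code> and the regular expression are hypothetical, the name pattern is a simplification of the <code>Name</code> production, and parameter entities are assumed to be declared before use. Character references and parameter-entity references are replaced in a single pass, and general-entity references are left untouched.

```python
import re

# Matches hexadecimal character references, decimal character references,
# and parameter-entity references. The name pattern is a simplification
# of the Name production (an assumption made for this sketch).
_REFERENCE = re.compile(
    r"&#x([0-9a-fA-F]+);|&#([0-9]+);|%([A-Za-z_:][\w.:-]*);"
)


def replacement_text(literal_value, parameter_entities):
    """Build replacement text from a literal entity value.

    parameter_entities maps parameter-entity names to their own
    replacement text. General-entity references such as &rights; are
    bypassed, and substituted text is not rescanned.
    """
    def expand(match):
        hex_code, dec_code, pe_name = match.groups()
        if hex_code is not None:
            return chr(int(hex_code, 16))
        if dec_code is not None:
            return chr(int(dec_code))
        return parameter_entities[pe_name]

    return _REFERENCE.sub(expand, literal_value)


# The declarations from the example above, processed in order.
pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text(
    "La Peste: Albert Camus,\n&#xA9; 1947 %pub;. &rights;",
    {"pub": pub},
)
print(book)
```

The printed value keeps "&rights;" as a reference, which matches the rule that general-entity references are expanded only when "book" itself is later referenced in content or in an attribute value.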

</p>

<h3><a name="sec-predefined-ent"></a>4.6 Predefined Entities
</h3>

<p><a name="dt-escape"></a>Entity and character
references can both be used to <b>escape</b> the left angle bracket,
ampersand, and other delimiters. A set of general entities
(<code>amp</code>,
<code>lt</code>,
<code>gt</code>,
<code>apos</code>,
<code>quot</code>) is specified for this purpose.
Numeric character references may also be used; they are
expanded immediately when recognized and must be treated as
character data, so the numeric character references
"<code>&#60;</code>" and "<code>&#38;</code>" may be used to
escape <code><</code> and <code>&</code> when they occur
in character data.
</p>

<p>All XML processors must recognize these entities whether they
are declared or not.
<a href="#dt-interop">For interoperability</a>,
valid XML documents should declare these
entities, like any others, before using them.
If the entities in question are declared, they must be declared
as internal entities whose replacement text is the single
character being escaped or a character reference to
that character, as shown below.
<pre><!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY amp "&#38;#38;">
<!ENTITY apos "&#39;">
<!ENTITY quot "&#34;">
</pre>
Note that the <code><</code> and <code>&</code> characters
in the declarations of "<code>lt</code>" and "<code>amp</code>"
are doubly escaped to meet the requirement that entity replacement
be well-formed.
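As a small check of the first rule above, the following sketch uses Python's standard <code>xml.etree.ElementTree</code> module (not part of this specification) to parse a document that uses all five predefined entities and two numeric character references without declaring anything. The element name <code>t</code> and attribute name <code>a</code> are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

# No DTD is present, so none of the five entities is declared;
# the parser is still expected to recognize them.
DOC = "<t a='&quot;x&apos;'>&lt;tag&gt; AT&amp;T &#60;&#38;</t>"

root = ET.fromstring(DOC)
text = root.text            # character data with every escape resolved
attribute = root.get("a")   # attribute value with every escape resolved

print(text)
print(attribute)
```

Each escaped delimiter arrives in the application as plain character data, so the angle brackets in the result do not start a tag.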

</p>

<h3><a name="Notations"></a>4.7 Notation Declarations
</h3>

<p><a name="dt-notation"></a><b>Notations</b> identify by
name the format of <a href="#dt-extent">unparsed
entities
</a>, the
format of elements which bear a notation attribute,
or the application to which
a <a href="#dt-pi">processing instruction</a> is
addressed.
</p>

<p><a name="dt-notdecl"></a>
<b>Notation declarations</b>
provide a name for the notation, for use in
entity and attribute-list declarations and in attribute specifications,
and an external identifier for the notation which may allow an XML
processor or its client application to locate a helper application
capable of processing data in the given notation.

<h5>Notation Declarations</h5>
<table class="scrap">
<tbody>
<tr valign="baseline">
<td><a name="NT-NotationDecl"></a>[82]
</td>
<td>NotationDecl</td>
<td> ::= </td>
<td>'<!NOTATION' <a href="#NT-S">S</a> <a href="#NT-Name">Name</a>
<a href="#NT-S">S</a>
(<a href="#NT-ExternalID">ExternalID</a> |
<a href="#NT-PublicID">PublicID</a>)
<a href="#NT-S">S</a>? '>'
</td>
<td></td>
</tr>
<tr valign="baseline">
<td><a name="NT-PublicID"></a>[83]
</td>
<td>PublicID</td>
<td> ::= </td>
<td>'PUBLIC' <a href="#NT-S">S</a>
<a href="#NT-PubidLiteral">PubidLiteral</a>

</td>
<td></td>
</tr>
</tbody>
</table>

</p>

<p>XML processors must provide applications with the name and external
identifier(s) of any notation declared and referred to in an attribute
value, attribute definition, or entity declaration.
They may
additionally resolve the external identifier into the
<a href="#dt-sysid">system identifier</a>,
file name, or other information needed to allow the
application to call a processor for data in the notation described. (It
is not an error, however, for XML documents to declare and refer to
notations for which notation-specific applications are not available on
the system where the XML processor or application is running.)
</p>

<h3><a name="sec-doc-entity"></a>4.8 Document Entity
</h3>

<p><a name="dt-docent"></a>The <b>document
entity
</b> serves as the root of the entity
tree and a starting-point for an <a href="#dt-xml-proc">XML
processor
</a>.
This specification does
not specify how the document entity is to be located by an XML
processor; unlike other entities, the document entity has no name and might
well appear on a processor input stream
without any identification at all.
</p>

<h2><a name="sec-conformance"></a>5 Conformance
</h2>

<h3><a name="proc-types"></a>5.1 Validating and Non-Validating Processors
</h3>

<p>Conforming <a href="#dt-xml-proc">XML processors</a> fall into two
classes: validating and non-validating.
</p>

<p>Validating and non-validating processors alike must report
violations of this specification's well-formedness constraints
in the content of the
<a href="#dt-docent">document entity</a> and any
other <a href="#dt-parsedent">parsed entities</a> that
they read.
</p>

<p><a name="dt-validating"></a>
<b>Validating processors</b> must report
violations of the constraints expressed by the declarations in the
<a href="#dt-doctype">DTD</a>, and
failures to fulfill the validity constraints given
in this specification.

To accomplish this, validating XML processors must read and process the entire
DTD and all external parsed entities referenced in the document.

</p>

<p>Non-validating processors are required to check only the
<a href="#dt-docent">document entity</a>, including
the entire internal DTD subset, for well-formedness.
<a name="dt-use-mdecl"></a>
While they are not required to check the document for validity,
they are required to
<b>process</b> all the declarations they read in the
internal DTD subset and in any parameter entity that they
read, up to the first reference
to a parameter entity that they do <i>not</i> read; that is to
say, they must
use the information in those declarations to
<a href="#AVNormalize">normalize</a> attribute values,
<a href="#included">include</a> the replacement text of
internal entities, and supply
<a href="#sec-attr-defaults">default attribute values</a>.

They must not <a href="#dt-use-mdecl">process</a>
<a href="#dt-entdecl">entity declarations</a> or
<a href="#dt-attdecl">attribute-list declarations</a>
encountered after a reference to a parameter entity that is not
read, since the entity may have contained overriding declarations.

</p>

<h3><a name="safe-behavior"></a>5.2 Using XML Processors
</h3>

<p>The behavior of a validating XML processor is highly predictable; it
must read every piece of a document and report all well-formedness and
validity violations.
Less is required of a non-validating processor; it need not read any
part of the document other than the document entity.
This has two effects that may be important to users of XML processors:

<ul>

<li>
<p>Certain well-formedness errors, specifically those that require
reading external entities, may not be detected by a non-validating processor.
Examples include the constraints entitled
<a href="#wf-entdeclared">Entity Declared</a>,
<a href="#wf-textent">Parsed Entity</a>, and
<a href="#wf-norecursion">No Recursion</a>, as well
as some of the cases described as
<a href="#forbidden">forbidden</a> in
<a href="#entproc">[<b>4.4 XML Processor Treatment of Entities and References</b>]
</a>.
</p>
</li>

<li>
<p>The information passed from the processor to the application may
vary, depending on whether the processor reads
parameter and external entities.
For example, a non-validating processor may not
<a href="#AVNormalize">normalize</a> attribute values,
<a href="#included">include</a> the replacement text of
internal entities, or supply
<a href="#sec-attr-defaults">default attribute values</a>,
where doing so depends on having read declarations in
external or parameter entities.
</p>
</li>

</ul>

</p>

<p>For maximum reliability in interoperating between different XML
processors, applications which use non-validating processors should not
rely on any behaviors not required of such processors.
Applications which require facilities such as the use of default
attributes or internal entities which are declared in external
entities should use validating XML processors.
</p>

<h2><a name="sec-notation"></a>6 Notation
</h2>

<p>The formal grammar of XML is given in this specification using a simple
Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines
one symbol, in the form
<pre>symbol ::= expression</pre></p>

<p>Symbols are written with an initial capital letter if they are
defined by a regular expression, or with an initial lower case letter
otherwise.
Literal strings are quoted.

</p>

<p>Within the expression on the right-hand side of a rule, the following
expressions are used to match strings of one or more characters:

<dl>

<dt><b><code>#xN</code></b></dt>

<dd>
<p>where <code>N</code> is a hexadecimal integer, the
expression matches the character in ISO/IEC 10646 whose canonical
(UCS-4)
code value, when interpreted as an unsigned binary number, has
the value indicated. The number of leading zeros in the
<code>#xN</code> form is insignificant; the number of leading
zeros in the corresponding code value
is governed by the character
encoding in use and is not significant for XML.
</p>
</dd>

<dt><b><code>[a-zA-Z]</code>, <code>[#xN-#xN]</code></b></dt>

<dd>
<p>matches any <a href="#dt-character">character</a>
with a value in the range(s) indicated (inclusive).
</p>
</dd>

<dt><b><code>[^a-z]</code>, <code>[^#xN-#xN]</code></b></dt>

<dd>
<p>matches any <a href="#dt-character">character</a>
with a value <i>outside</i> the
range indicated.
</p>
</dd>

<dt><b><code>[^abc]</code>, <code>[^#xN#xN#xN]</code></b></dt>

<dd>
<p>matches any <a href="#dt-character">character</a>
with a value not among the characters given.
</p>
</dd>

<dt><b><code>"string"</code></b></dt>

<dd>
<p>matches a literal string <a href="#dt-match">matching</a>
that given inside the double quotes.
</p>
</dd>

<dt><b><code>'string'</code></b></dt>

<dd>
<p>matches a literal string <a href="#dt-match">matching</a>
that given inside the single quotes.
</p>
</dd>

</dl>
These symbols may be combined to match more complex patterns as follows,
where <code>A</code> and <code>B</code> represent simple expressions:

<dl>

<dt><b>(<code>expression</code>)
</b>
</dt>

<dd>
<p><code>expression</code> is treated as a unit
and may be combined as described in this list.
</p>
</dd>

<dt><b><code>A?</code></b></dt>

<dd>
<p>matches <code>A</code> or nothing; optional <code>A</code>.
</p>
</dd>

<dt><b><code>A B</code></b></dt>

<dd>
<p>matches <code>A</code> followed by <code>B</code>.
</p>
</dd>

<dt><b><code>A | B</code></b></dt>

<dd>
<p>matches <code>A</code> or <code>B</code> but not both.
</p>
</dd>

<dt><b><code>A - B</code></b></dt>

<dd>
<p>matches any string that matches <code>A</code> but does not match
<code>B</code>.

</p>
</dd>

<dt><b><code>A+</code></b></dt>

<dd>
<p>matches one or more occurrences of <code>A</code>.
</p>
</dd>

<dt><b><code>A*</code></b></dt>

<dd>
<p>matches zero or more occurrences of <code>A</code>.
</p>
</dd>

</dl>
Other notations used in the productions are:

<dl>

<dt><b><code>/* ... */</code></b></dt>

<dd>
<p>comment.</p>
</dd>

<dt><b><code>[ wfc: ... ]</code></b></dt>

<dd>
<p>well-formedness constraint; this identifies by name a
constraint on
<a href="#dt-wellformed">well-formed</a> documents
associated with a production.
</p>
</dd>

<dt><b><code>[ vc: ...
]</code></b></dt>

<dd>
<p>validity constraint; this identifies by name a constraint on
<a href="#dt-valid">valid</a> documents associated with
a production.
</p>
</dd>

</dl>

</p>

<hr title="Separator from footer">

<h2><a name="sec-bibliography"></a>A References
</h2>

<h3><a name="sec-existing-stds"></a>A.1 Normative References
</h3>

<dl>

<dt><a name="IANA">IANA</a></dt>
<dd>
(Internet Assigned Numbers Authority) <i>Official Names for
Character Sets
</i>,
ed. Keld Simonsen et al.
See <a href="ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets">ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets</a>.

</dd>

<dt><a name="RFC1766">IETF RFC 1766</a></dt>
<dd>
IETF (Internet Engineering Task Force).
<i>RFC 1766: Tags for the Identification of Languages</i>,
ed. H. Alvestrand.
1995.

</dd>

<dt><a name="ISO639">ISO 639</a></dt>
<dd>
(International Organization for Standardization).
<i>ISO 639:1988 (E).
Code for the representation of names of languages.
</i>
[Geneva]: International Organization for
Standardization, 1988.
</dd>

<dt><a name="ISO3166">ISO 3166</a></dt>
<dd>
(International Organization for Standardization).
<i>ISO 3166-1:1997 (E).
Codes for the representation of names of countries and their subdivisions
-- Part 1: Country codes
</i>
[Geneva]: International Organization for
Standardization, 1997.
</dd>

<dt><a name="ISO10646">ISO/IEC 10646</a></dt>
<dd>ISO
(International Organization for Standardization).
<i>ISO/IEC 10646-1993 (E).
Information technology -- Universal
Multiple-Octet Coded Character Set (UCS) -- Part 1:
Architecture and Basic Multilingual Plane.
</i>
[Geneva]: International Organization for
Standardization, 1993 (plus amendments AM 1 through AM 7).

</dd>

<dt><a name="Unicode">Unicode</a></dt>
<dd>The Unicode Consortium.
<i>The Unicode Standard, Version 2.0.</i>
Reading, Mass.: Addison-Wesley Developers Press, 1996.
</dd>

</dl>

<h3><a name="section-Other-References"></a>A.2 Other References
</h3>

<dl>

<dt><a name="Aho">Aho/Ullman</a></dt>
<dd>Aho, Alfred V.,
Ravi Sethi, and Jeffrey D. Ullman.
<i>Compilers: Principles, Techniques, and Tools</i>.
Reading: Addison-Wesley, 1986, rpt. corr. 1988.
</dd>

<dt><a name="Berners-Lee">Berners-Lee et al.</a></dt>
<dd>
Berners-Lee, T., R. Fielding, and L. Masinter.
<i>Uniform Resource Identifiers (URI): Generic Syntax and
Semantics
</i>.
1997.
(Work in progress; see updates to RFC1738.)
</dd>

<dt><a name="ABK">Brüggemann-Klein</a></dt>
<dd>Brüggemann-Klein, Anne.
<i>Regular Expressions into Finite Automata</i>.
Extended abstract in I. Simon, Hrsg., LATIN 1992,
S. 97-98. Springer-Verlag, Berlin 1992.
Full Version in Theoretical Computer Science 120: 197-213, 1993.

</dd>

<dt><a name="ABKDW">Brüggemann-Klein and Wood</a></dt>
<dd>Brüggemann-Klein, Anne,
and Derick Wood.
<i>Deterministic Regular Languages</i>.
Universität Freiburg, Institut für Informatik,
Bericht 38, Oktober 1991.

</dd>

<dt><a name="Clark">Clark</a></dt>
<dd>James Clark.
Comparison of SGML and XML. See
<a href="http://www.w3.org/TR/NOTE-sgml-xml-971215">http://www.w3.org/TR/NOTE-sgml-xml-971215</a>.

</dd>

<dt><a name="RFC1738">IETF RFC1738</a></dt>
<dd>
IETF (Internet Engineering Task Force).
<i>RFC 1738: Uniform Resource Locators (URL)</i>,
ed. T. Berners-Lee, L. Masinter, M. McCahill.
1994.

</dd>

<dt><a name="RFC1808">IETF RFC1808</a></dt>
<dd>
IETF (Internet Engineering Task Force).
<i>RFC 1808: Relative Uniform Resource Locators</i>,
ed. R. Fielding.
1995.

</dd>

<dt><a name="RFC2141">IETF RFC2141</a></dt>
<dd>
IETF (Internet Engineering Task Force).
<i>RFC 2141: URN Syntax</i>,
ed. R. Moats.
1997.

</dd>

<dt><a name="ISO8879">ISO 8879</a></dt>
<dd>ISO
(International Organization for Standardization).
<i>ISO 8879:1986(E). Information processing -- Text and Office
Systems -- Standard Generalized Markup Language (SGML).
</i> First
edition -- 1986-10-15. [Geneva]: International Organization for
Standardization, 1986.

</dd>

<dt><a name="ISO10744">ISO/IEC 10744</a></dt>
<dd>ISO
(International Organization for Standardization).
<i>ISO/IEC 10744-1992 (E). Information technology --
Hypermedia/Time-based Structuring Language (HyTime).

</i>
[Geneva]: International Organization for
Standardization, 1992.
<i>Extended Facilities Annexe.</i>
[Geneva]: International Organization for
Standardization, 1996.

</dd>

</dl>

<h2><a name="CharClasses"></a>B Character Classes
</h2>

<p>Following the characteristics defined in the Unicode standard,
characters are classed as base characters (among others, these
contain the alphabetic characters of the Latin alphabet, without
diacritics), ideographic characters, and combining characters (among
others, this class contains most diacritics); the base and ideographic
classes together form the class of letters. Digits and extenders are
also distinguished.

<h5>Characters</h5>
<table class="scrap">
<tbody>

<tr valign="baseline">
<td><a name="NT-Letter"></a>[84]
</td>
<td>Letter</td>
<td> ::= </td>
<td><a href="#NT-BaseChar">BaseChar</a>
| <a href="#NT-Ideographic">Ideographic</a></td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-BaseChar"></a>[85]
</td>
<td>BaseChar</td>
<td> ::= </td>
<td>[#x0041-#x005A]
| [#x0061-#x007A]
| [#x00C0-#x00D6]
| [#x00D8-#x00F6]
| [#x00F8-#x00FF]
| [#x0100-#x0131]
| [#x0134-#x013E]
| [#x0141-#x0148]
| [#x014A-#x017E]
| [#x0180-#x01C3]
| [#x01CD-#x01F0]
| [#x01F4-#x01F5]
| [#x01FA-#x0217]
| [#x0250-#x02A8]
| [#x02BB-#x02C1]
| #x0386
| [#x0388-#x038A]
| #x038C
| [#x038E-#x03A1]
| [#x03A3-#x03CE]
| [#x03D0-#x03D6]
| #x03DA
| #x03DC
| #x03DE
| #x03E0
| [#x03E2-#x03F3]
| [#x0401-#x040C]
| [#x040E-#x044F]
| [#x0451-#x045C]
| [#x045E-#x0481]
| [#x0490-#x04C4]
| [#x04C7-#x04C8]
| [#x04CB-#x04CC]
| [#x04D0-#x04EB]
| [#x04EE-#x04F5]
| [#x04F8-#x04F9]
| [#x0531-#x0556]
| #x0559
| [#x0561-#x0586]
| [#x05D0-#x05EA]
| [#x05F0-#x05F2]
| [#x0621-#x063A]
| [#x0641-#x064A]
| [#x0671-#x06B7]
| [#x06BA-#x06BE]
| [#x06C0-#x06CE]
| [#x06D0-#x06D3]
| #x06D5
| [#x06E5-#x06E6]
| [#x0905-#x0939]
| #x093D
| [#x0958-#x0961]
| [#x0985-#x098C]
| [#x098F-#x0990]
| [#x0993-#x09A8]
| [#x09AA-#x09B0]
| #x09B2
| [#x09B6-#x09B9]
| [#x09DC-#x09DD]
| [#x09DF-#x09E1]
| [#x09F0-#x09F1]
| [#x0A05-#x0A0A]
| [#x0A0F-#x0A10]
| [#x0A13-#x0A28]
| [#x0A2A-#x0A30]
| [#x0A32-#x0A33]
| [#x0A35-#x0A36]
| [#x0A38-#x0A39]
| [#x0A59-#x0A5C]
| #x0A5E
| [#x0A72-#x0A74]
| [#x0A85-#x0A8B]
| #x0A8D
| [#x0A8F-#x0A91]
| [#x0A93-#x0AA8]
| [#x0AAA-#x0AB0]
| [#x0AB2-#x0AB3]
| [#x0AB5-#x0AB9]
| #x0ABD
| #x0AE0
| [#x0B05-#x0B0C]
| [#x0B0F-#x0B10]
| [#x0B13-#x0B28]
| [#x0B2A-#x0B30]
| [#x0B32-#x0B33]
| [#x0B36-#x0B39]
| #x0B3D
| [#x0B5C-#x0B5D]
| [#x0B5F-#x0B61]
| [#x0B85-#x0B8A]
| [#x0B8E-#x0B90]
| [#x0B92-#x0B95]
| [#x0B99-#x0B9A]
| #x0B9C
| [#x0B9E-#x0B9F]
| [#x0BA3-#x0BA4]
| [#x0BA8-#x0BAA]
| [#x0BAE-#x0BB5]
| [#x0BB7-#x0BB9]
| [#x0C05-#x0C0C]
| [#x0C0E-#x0C10]
| [#x0C12-#x0C28]
| [#x0C2A-#x0C33]
| [#x0C35-#x0C39]
| [#x0C60-#x0C61]
| [#x0C85-#x0C8C]
| [#x0C8E-#x0C90]
| [#x0C92-#x0CA8]
| [#x0CAA-#x0CB3]
| [#x0CB5-#x0CB9]
| #x0CDE
| [#x0CE0-#x0CE1]
| [#x0D05-#x0D0C]
| [#x0D0E-#x0D10]
| [#x0D12-#x0D28]
| [#x0D2A-#x0D39]
| [#x0D60-#x0D61]
| [#x0E01-#x0E2E]
| #x0E30
| [#x0E32-#x0E33]
| [#x0E40-#x0E45]
| [#x0E81-#x0E82]
| #x0E84
| [#x0E87-#x0E88]
| #x0E8A
| #x0E8D
| [#x0E94-#x0E97]
| [#x0E99-#x0E9F]
| [#x0EA1-#x0EA3]
| #x0EA5
| #x0EA7
| [#x0EAA-#x0EAB]
| [#x0EAD-#x0EAE]
| #x0EB0
| [#x0EB2-#x0EB3]
| #x0EBD
| [#x0EC0-#x0EC4]
| [#x0F40-#x0F47]
| [#x0F49-#x0F69]
| [#x10A0-#x10C5]
| [#x10D0-#x10F6]
| #x1100
| [#x1102-#x1103]
| [#x1105-#x1107]
| #x1109
| [#x110B-#x110C]
| [#x110E-#x1112]
| #x113C
| #x113E
| #x1140
| #x114C
| #x114E
| #x1150
| [#x1154-#x1155]
| #x1159
| [#x115F-#x1161]
| #x1163
| #x1165
| #x1167
| #x1169
| [#x116D-#x116E]
| [#x1172-#x1173]
| #x1175
| #x119E
| #x11A8
| #x11AB
| [#x11AE-#x11AF]
| [#x11B7-#x11B8]
| #x11BA
| [#x11BC-#x11C2]
| #x11EB
| #x11F0
| #x11F9
| [#x1E00-#x1E9B]
| [#x1EA0-#x1EF9]
| [#x1F00-#x1F15]
| [#x1F18-#x1F1D]
| [#x1F20-#x1F45]
| [#x1F48-#x1F4D]
| [#x1F50-#x1F57]
| #x1F59
| #x1F5B
| #x1F5D
| [#x1F5F-#x1F7D]
| [#x1F80-#x1FB4]
| [#x1FB6-#x1FBC]
| #x1FBE
| [#x1FC2-#x1FC4]
| [#x1FC6-#x1FCC]
| [#x1FD0-#x1FD3]
| [#x1FD6-#x1FDB]
| [#x1FE0-#x1FEC]
| [#x1FF2-#x1FF4]
| [#x1FF6-#x1FFC]
| #x2126
| [#x212A-#x212B]
| #x212E
| [#x2180-#x2182]
| [#x3041-#x3094]
| [#x30A1-#x30FA]
| [#x3105-#x312C]
| [#xAC00-#xD7A3]

</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-Ideographic"></a>[86]
</td>
<td>Ideographic</td>
<td> ::= </td>
<td>[#x4E00-#x9FA5]
| #x3007
| [#x3021-#x3029]

</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-CombiningChar"></a>[87]
</td>
<td>CombiningChar</td>
<td> ::= </td>
<td>[#x0300-#x0345]
| [#x0360-#x0361]
| [#x0483-#x0486]
| [#x0591-#x05A1]
| [#x05A3-#x05B9]
| [#x05BB-#x05BD]
| #x05BF
| [#x05C1-#x05C2]
| #x05C4
| [#x064B-#x0652]
| #x0670
| [#x06D6-#x06DC]
| [#x06DD-#x06DF]
| [#x06E0-#x06E4]
| [#x06E7-#x06E8]
| [#x06EA-#x06ED]
| [#x0901-#x0903]
|
#x093C
| [#x093E-#x094C]
| #x094D
| [#x0951-#x0954]
| [#x0962-#x0963]
| [#x0981-#x0983]
| #x09BC
| #x09BE
| #x09BF
| [#x09C0-#x09C4]
| [#x09C7-#x09C8]
| [#x09CB-#x09CD]
| #x09D7
| [#x09E2-#x09E3]
| #x0A02
| #x0A3C
| #x0A3E
| #x0A3F
| [#x0A40-#x0A42]
| [#x0A47-#x0A48]
| [#x0A4B-#x0A4D]
| [#x0A70-#x0A71]
| [#x0A81-#x0A83]
| #x0ABC
| [#x0ABE-#x0AC5]
| [#x0AC7-#x0AC9]
| [#x0ACB-#x0ACD]
| [#x0B01-#x0B03]
| #x0B3C
| [#x0B3E-#x0B43]
| [#x0B47-#x0B48]
| [#x0B4B-#x0B4D]
| [#x0B56-#x0B57]
| [#x0B82-#x0B83]
| [#x0BBE-#x0BC2]
| [#x0BC6-#x0BC8]
| [#x0BCA-#x0BCD]
| #x0BD7
| [#x0C01-#x0C03]
| [#x0C3E-#x0C44]
| [#x0C46-#x0C48]
| [#x0C4A-#x0C4D]
| [#x0C55-#x0C56]
| [#x0C82-#x0C83]
| [#x0CBE-#x0CC4]
| [#x0CC6-#x0CC8]
| [#x0CCA-#x0CCD]
| [#x0CD5-#x0CD6]
| [#x0D02-#x0D03]
| [#x0D3E-#x0D43]
| [#x0D46-#x0D48]
| [#x0D4A-#x0D4D]
| #x0D57
| #x0E31
| [#x0E34-#x0E3A]
| [#x0E47-#x0E4E]
| #x0EB1
| [#x0EB4-#x0EB9]
| [#x0EBB-#x0EBC]
| [#x0EC8-#x0ECD]
| [#x0F18-#x0F19]
| #x0F35
| #x0F37
| #x0F39
| #x0F3E
| #x0F3F
| [#x0F71-#x0F84]
| [#x0F86-#x0F8B]
| [#x0F90-#x0F95]
| #x0F97
| [#x0F99-#x0FAD]
| [#x0FB1-#x0FB7]
| #x0FB9
| [#x20D0-#x20DC]
| #x20E1
| [#x302A-#x302F]
| #x3099
| #x309A

</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-Digit"></a>[88]
</td>
<td>Digit</td>
<td> ::= </td>
<td>[#x0030-#x0039]
| [#x0660-#x0669]
| [#x06F0-#x06F9]
| [#x0966-#x096F]
| [#x09E6-#x09EF]
| [#x0A66-#x0A6F]
| [#x0AE6-#x0AEF]
| [#x0B66-#x0B6F]
| [#x0BE7-#x0BEF]
| [#x0C66-#x0C6F]
| [#x0CE6-#x0CEF]
| [#x0D66-#x0D6F]
| [#x0E50-#x0E59]
| [#x0ED0-#x0ED9]
| [#x0F20-#x0F29]

</td>
<td></td>
</tr>

<tr valign="baseline">
<td><a name="NT-Extender"></a>[89]
</td>
<td>Extender</td>
<td> ::= </td>
<td>#x00B7
| #x02D0
| #x02D1
| #x0387
| #x0640
| #x0E46
| #x0EC6
| #x3005
| [#x3031-#x3035]
| [#x309D-#x309E]
| [#x30FC-#x30FE]

</td>
<td></td>
</tr>

</tbody>
</table>

</p>

<p>The character classes defined here can be derived from the
Unicode character database as follows:

<ul>

<li>

<p>Name start characters must have one of the categories Ll, Lu,
Lo, Lt, Nl.
</p>

</li>

<li>

<p>Name characters other than Name-start characters
must have one of the categories Mc, Me, Mn, Lm, or Nd.
</p>

</li>

<li>

<p>Characters in the compatibility area (i.e. with character code
greater than #xF900 and less than #xFFFE) are not allowed in XML
names.
</p>

</li>

<li>

<p>Characters which have a font or compatibility decomposition (i.e. those
with a "compatibility formatting tag" in field 5 of the database --
marked by field 5 beginning with a "<") are not allowed.
</p>

</li>

<li>

<p>The following characters are treated as name-start characters
rather than name characters, because the property file classifies
them as Alphabetic: [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.
</p>

</li>

<li>

<p>Characters #x20DD-#x20E0 are excluded (in accordance with
Unicode, section 5.14).
</p>

</li>

<li>

<p>Character #x00B7 is classified as an extender, because the
property list so identifies it.
</p>

</li>

<li>

<p>Character #x0387 is added as a name character, because #x00B7
is its canonical equivalent.
</p>

</li>

<li>

<p>Characters ':' and '_' are allowed as name-start characters.</p>

</li>

<li>

<p>Characters '-' and '.' are allowed as name characters.</p>

</li>

</ul>

</p>

<h2><a name="sec-xml-and-sgml"></a>C XML and SGML (Non-Normative)
</h2>

<p>XML is designed to be a subset of SGML, in that every
<a href="#dt-valid">valid</a> XML document should also be a
conformant SGML document.
For a detailed comparison of the additional restrictions that XML places on
documents beyond those of SGML, see <a href="#Clark">[Clark]</a>.

</p>

<h2><a name="sec-entexpand"></a>D Expansion of Entity and Character References (Non-Normative)
</h2>

<p>This appendix contains some examples illustrating the
sequence of entity- and character-reference recognition and
expansion, as specified in <a href="#entproc">[<b>4.4 XML Processor Treatment of Entities and References</b>]
</a>.
</p>

<p>
If the DTD contains the declaration
<pre><!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped
numerically (&#38;#38;#38;) or with a general entity
(&amp;amp;).</p>" >
</pre>
then the XML processor will recognize the character references
when it parses the entity declaration, and resolve them before
storing the following string as the
value of the entity "<code>example</code>":
<pre><p>An ampersand (&#38;) may be escaped
numerically (&#38;#38;) or with a general entity
(&amp;amp;).</p>
</pre>
A reference in the document to "<code>&example;</code>"
will cause the text to be reparsed, at which time the
start- and end-tags of the "<code>p</code>" element will be recognized
and the three references will be recognized and expanded,
resulting in a "<code>p</code>" element with the following content
(all data, no delimiters or markup):
<pre>An ampersand (&) may be escaped
numerically (&#38;) or with a general entity
(&amp;).
</pre>
</p>

<p>A more complex example will illustrate the rules and their
effects fully. In the following example, the line numbers are
solely for reference.
<pre>1 <?xml version='1.0'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '&#37;zz;'>
5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
</pre>
This produces the following:
<ul>

<li>
<p>in line 4, the reference to character 37 is expanded immediately,
and the parameter entity "<code>xx</code>" is stored in the symbol
table with the value "<code>%zz;</code>". Since the replacement text
is not rescanned, the reference to parameter entity "<code>zz</code>"
is not recognized. (And it would be an error if it were, since
"<code>zz</code>" is not yet declared.)
</p>
</li>

<li>
<p>in line 5, the character reference "<code>&#60;</code>" is
expanded immediately and the parameter entity "<code>zz</code>" is
stored with the replacement text
"<code><!ENTITY tricky "error-prone" ></code>",
which is a well-formed entity declaration.
</p>
</li>

<li>
<p>in line 6, the reference to "<code>xx</code>" is recognized,
and the replacement text of "<code>xx</code>" (namely
"<code>%zz;</code>") is parsed. The reference to "<code>zz</code>"
is recognized in its turn, and its replacement text
("<code><!ENTITY tricky "error-prone" ></code>") is parsed.
The general entity "<code>tricky</code>" has now been
declared, with the replacement text "<code>error-prone</code>".
</p>
</li>

<li>
<p>
in line 8, the reference to the general entity "<code>tricky</code>" is
recognized, and it is expanded, so the full content of the
"<code>test</code>" element is the self-describing (and ungrammatical) string
<i>This sample shows a error-prone method.</i>

</p>
</li>

</ul>

</p>

<h2><a name="determinism"></a>E Deterministic Content Models (Non-Normative)
</h2>

<p><a href="#dt-compat">For compatibility</a>, it is
required
that content models in element type declarations be deterministic.

</p>

<p>SGML
requires deterministic content models (it calls them
"unambiguous"); XML processors built using SGML systems may
flag non-deterministic content models as errors.
</p>

<p>For example, the content model <code>((b, c) | (b, d))</code> is
non-deterministic, because given an initial <code>b</code> the parser
cannot know which <code>b</code> in the model is being matched without
looking ahead to see which element follows the <code>b</code>.
In this case, the two references to
<code>b</code> can be collapsed
into a single reference, making the model read
<code>(b, (c | d))</code>. An initial <code>b</code> now clearly
matches only a single name in the content model. The parser doesn't
need to look ahead to see what follows; either <code>c</code> or
<code>d</code> would be accepted.
</p>

<p>More formally: a finite state automaton may be constructed from the
content model using the standard algorithms, e.g. algorithm 3.5
in section 3.9
of Aho, Sethi, and Ullman <a href="#Aho">[Aho/Ullman]</a>.
In many such algorithms, a follow set is constructed for each
position in the regular expression (i.e., each leaf
node in the
syntax tree for the regular expression);
if any position has a follow set in which
more than one following position is
labeled with the same element type name,
then the content model is in error
and may be reported as an error.

</p>

<p>Algorithms exist which allow many but not all non-deterministic
content models to be reduced automatically to equivalent deterministic
models; see Brüggemann-Klein 1991 <a href="#ABK">[Brüggemann-Klein]</a>.
</p>

<h2><a name="sec-guessing"></a>F Autodetection of Character Encodings (Non-Normative)
</h2>

<p>The XML encoding declaration functions as an internal label on each
entity, indicating which character encoding is in use. Before an XML
processor can read the internal label, however, it apparently has to
know what character encoding is in use--which is what the internal label
is trying to indicate. In the general case, this is a hopeless
situation.
It is not entirely hopeless in XML, however, because XML
limits the general case in two ways: each implementation is assumed
to support only a finite set of character encodings, and the XML
encoding declaration is restricted in position and content in order to
make it feasible to autodetect the character encoding in use in each
entity in normal cases. Also, in many cases other sources of information
are available in addition to the XML data stream itself.
Two cases may be distinguished,
depending on whether the XML entity is presented to the
processor without, or with, any accompanying
(external) information. We consider the first case first.

</p>

<p>
Because each XML entity not in UTF-8 or UTF-16 format <i>must</i>
begin with an XML encoding declaration, in which the first characters
must be '<code><?xml</code>', any conforming processor can detect,
after two to four octets of input, which of the following cases apply.
In reading this list, it may help to know that in UCS-4, '<' is
"<code>#x0000003C</code>" and '?' is "<code>#x0000003F</code>", and the Byte
Order Mark required of UTF-16 data streams is "<code>#xFEFF</code>".
</p>

<p>

<ul>

<li>

<p><code>00 00 00 3C</code>: UCS-4, big-endian machine (1234 order)
</p>

</li>

<li>

<p><code>3C 00 00 00</code>: UCS-4, little-endian machine (4321 order)
</p>

</li>

<li>

<p><code>00 00 3C 00</code>: UCS-4, unusual octet order (2143)
</p>

</li>

<li>

<p><code>00 3C 00 00</code>: UCS-4, unusual octet order (3412)
</p>

</li>

<li>

<p><code>FE FF</code>: UTF-16, big-endian
</p>

</li>

<li>

<p><code>FF FE</code>: UTF-16, little-endian
</p>

</li>

<li>

<p><code>00 3C 00 3F</code>: UTF-16, big-endian, no Byte Order Mark
(and thus, strictly speaking, in error)
</p>

</li>

<li>

<p><code>3C 00 3F 00</code>: UTF-16, little-endian, no Byte Order Mark
(and thus, strictly speaking, in error)
</p>

</li>

<li>

<p><code>3C 3F 78 6D</code>: UTF-8, ISO 646, ASCII, some part of ISO 8859,
Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding
which ensures that the characters of ASCII have their normal positions,
width,
and values; the actual encoding declaration must be read to
detect which of these applies, but since all of these encodings
use the same bit patterns for the ASCII characters, the encoding
declaration itself may be read reliably

</p>

</li>

<li>

<p><code>4C 6F A7 94</code>: EBCDIC (in some flavor; the full
encoding declaration must be read to tell which code page is in
use)
</p>

</li>

<li>

<p>other: UTF-8 without an encoding declaration, or else
the data stream is corrupt, fragmentary, or enclosed in
a wrapper of some kind
</p>

</li>

</ul>

</p>

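<p>
As a non-normative illustration, the list above can be read as a lookup
table keyed on the first octets of the entity. The following Python sketch
is one possible rendering; the names <code>FOUR_OCTET_SIGNATURES</code>,
<code>BYTE_ORDER_MARKS</code>, and <code>guess_encoding_family</code> are
hypothetical and are not defined by this specification. It only identifies
the encoding family; the encoding declaration itself must still be read
to select a specific encoding.
</p>

```python
# Illustrative sketch only: maps the first octets of an entity to the
# encoding family named in the list above. All names are hypothetical.

# Four-octet signatures, written as hex exactly as they appear in the list.
FOUR_OCTET_SIGNATURES = [
    (bytes.fromhex("0000003C"), "UCS-4, big-endian (1234 order)"),
    (bytes.fromhex("3C000000"), "UCS-4, little-endian (4321 order)"),
    (bytes.fromhex("00003C00"), "UCS-4, unusual octet order (2143)"),
    (bytes.fromhex("003C0000"), "UCS-4, unusual octet order (3412)"),
    (bytes.fromhex("003C003F"), "UTF-16, big-endian, no Byte Order Mark"),
    (bytes.fromhex("3C003F00"), "UTF-16, little-endian, no Byte Order Mark"),
    (bytes.fromhex("3C3F786D"), "ASCII-compatible (read the encoding declaration)"),
    (bytes.fromhex("4C6FA794"), "EBCDIC (read the encoding declaration)"),
]

# Two-octet Byte Order Marks for UTF-16.
BYTE_ORDER_MARKS = [
    (bytes.fromhex("FEFF"), "UTF-16, big-endian"),
    (bytes.fromhex("FFFE"), "UTF-16, little-endian"),
]

FALLBACK = "other: UTF-8 without an encoding declaration, or corrupt/wrapped data"


def guess_encoding_family(data: bytes) -> str:
    """Return the encoding family suggested by the first two to four octets."""
    head = data[:4]
    # None of the four-octet signatures begins with a Byte Order Mark,
    # so the order of the two loops does not change the result.
    for signature, family in FOUR_OCTET_SIGNATURES:
        if head == signature:
            return family
    for mark, family in BYTE_ORDER_MARKS:
        if head.startswith(mark):
            return family
    return FALLBACK


if __name__ == "__main__":
    print(guess_encoding_family(b"<?xml version='1.0'?>"))
    print(guess_encoding_family("<?xml".encode("utf-16-le")))
```

<p>
A table-driven lookup is used here only because it mirrors the list
directly; a processor could equally well branch on the octets one at a time.
</p>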
<p>
This level of autodetection is enough to read the XML encoding
declaration and parse the character-encoding identifier, which is
still necessary to distinguish the individual members of each family
of encodings (e.g. to tell UTF-8 from 8859, and the parts of 8859
from each other, or to distinguish the specific EBCDIC code page in
use, and so on).

</p>

<p>
Because the contents of the encoding declaration are restricted to
ASCII characters, a processor can reliably read the entire encoding
declaration as soon as it has detected which family of encodings is in
use. Since in practice, all widely used character encodings fall into
one of the categories above, the XML encoding declaration allows
reasonably reliable in-band labeling of character encodings, even when
external sources of information at the operating-system or
transport-protocol level are unreliable.

</p>

<p>
Once the processor has detected the character encoding in use, it can
act appropriately, whether by invoking a separate input routine for
each case, or by calling the proper conversion function on each
character of input.

</p>

<p>
Like any self-labeling system, the XML encoding declaration will not
work if any software changes the entity's character set or encoding
without updating the encoding declaration. Implementors of
character-encoding routines should be careful to ensure the accuracy
of the internal and external information used to label the entity.

</p>

<p>The second possible case occurs when the XML entity is accompanied
by encoding information, as in some file systems and some network
protocols.
When multiple sources of information are available,
their relative
priority and the preferred method of handling conflict should be
specified as part of the higher-level protocol used to deliver XML.
Rules for the relative priority of the internal label and the
MIME-type label in an external header, for example, should be part of the
RFC document defining the text/xml and application/xml MIME types. In
the interests of interoperability, however, the following rules
are recommended.

<ul>

<li>
<p>If an XML entity is in a file, the Byte-Order Mark
and encoding-declaration PI are used (if present) to determine the
character encoding. All other heuristics and sources of information
are solely for error recovery.

</p>
</li>

<li>
<p>If an XML entity is delivered with a
MIME type of text/xml, then the <code>charset</code> parameter
on the MIME type determines the
character encoding method; all other heuristics and sources of
information are solely for error recovery.

</p>
</li>

<li>
<p>If an XML entity is delivered
with a
MIME type of application/xml, then the Byte-Order Mark and
encoding-declaration PI are used (if present) to determine the
character encoding. All other heuristics and sources of
information are solely for error recovery.

</p>
</li>

</ul>
These rules apply only in the absence of protocol-level documentation;
in particular, when the MIME types text/xml and application/xml are
defined, the recommendations of the relevant RFC will supersede
these rules.

</p>

<h2><a name="sec-xml-wg"></a>G W3C XML Working Group (Non-Normative)
</h2>

<p>This specification was prepared and approved for publication by the
W3C XML Working Group (WG).
WG approval of this specification does
not necessarily imply that all WG members voted for its approval.
The current and former members of the XML WG are:
</p>

Jon Bosak, Sun (Chair); James Clark (Technical Lead); Tim Bray, Textuality and Netscape (XML Co-editor); Jean Paoli, Microsoft (XML Co-editor); C. M. Sperberg-McQueen, U. of Ill. (XML
Co-editor); Dan Connolly, W3C (W3C Liaison); Paula Angerstein, Texcel; Steve DeRose, INSO; Dave Hollander, HP; Eliot Kimber, ISOGEN; Eve Maler, ArborText; Tom Magliery, NCSA; Murray Maloney, Muzmo and Grif; Makoto Murata, Fuji Xerox Information Systems; Joel Nava, Adobe; Conleth O'Connell, Vignette; Peter Sharpe, SoftQuad; John Tigue, DataChannel

</body>
</html>