<!DOCTYPE html
  PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <title>Extensible Markup Language (XML) 1.0</title>
      <link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-REC"><style type="text/css">code { font-family: monospace }</style></head>
   <body>

      <div class="head"><a href="http://www.w3.org/"><img src="http://www.w3.org/Icons/WWW/w3c_home" alt="W3C" height="48" width="72"></a><h1>Extensible Markup Language (XML) 1.0<br></h1>
         <h2>W3C Recommendation 10 February 1998</h2>
         <dl>
            <dt>This version:</dt>
            <dd>
               <a href="http://www.w3.org/TR/1998/REC-xml-19980210">
                  http://www.w3.org/TR/1998/REC-xml-19980210
               </a><br>
               <a href="http://www.w3.org/TR/1998/REC-xml-19980210.xml">
                  http://www.w3.org/TR/1998/REC-xml-19980210.xml
               </a><br>
               <a href="http://www.w3.org/TR/1998/REC-xml-19980210.html">
                  http://www.w3.org/TR/1998/REC-xml-19980210.html
               </a><br>
               <a href="http://www.w3.org/TR/1998/REC-xml-19980210.pdf">
                  http://www.w3.org/TR/1998/REC-xml-19980210.pdf
               </a><br>
               <a href="http://www.w3.org/TR/1998/REC-xml-19980210.ps">
                  http://www.w3.org/TR/1998/REC-xml-19980210.ps
               </a><br>

            </dd>
            <dt>Latest version:</dt>
            <dd>
               <a href="http://www.w3.org/TR/REC-xml">
                  http://www.w3.org/TR/REC-xml
               </a><br>

            </dd>
            <dt>Previous version:</dt>
            <dd>
               <a href="http://www.w3.org/TR/PR-xml-971208">
                  http://www.w3.org/TR/PR-xml-971208
               </a><br>

            </dd>
            <dt>Editors:</dt>
            <dd>
               Tim Bray
                (Textuality and Netscape)
               <a href="mailto:[email protected]">&lt;[email protected]></a><br>
               Jean Paoli
                (Microsoft)
               <a href="mailto:[email protected]">&lt;[email protected]></a><br>
               C. M. Sperberg-McQueen
                (University of Illinois at Chicago)
               <a href="mailto:[email protected]">&lt;[email protected]></a><br>

            </dd>
         </dl>
         <p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice.html#Copyright">
               Copyright
            </a> &nbsp;&copy;&nbsp; 1999 <a href="http://www.w3.org">W3C</a>
            (<a href="http://www.lcs.mit.edu">MIT</a>,
            <a href="http://www.inria.fr/">INRIA</a>,
            <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C
            <a href="http://www.w3.org/Consortium/Legal/ipr-notice.html#Legal_Disclaimer">liability</a>,
            <a href="http://www.w3.org/Consortium/Legal/ipr-notice.html#W3C_Trademarks">trademark</a>,
            <a href="http://www.w3.org/Consortium/Legal/copyright-documents.html">document use</a> and
            <a href="http://www.w3.org/Consortium/Legal/copyright-software.html">software licensing</a> rules apply.

         </p>
         <hr title="Separator for header">
      </div>
      <h2><a name="abstract">Abstract</a></h2>

      <p>The Extensible Markup Language (XML) is a subset of
         SGML that is completely described in this document. Its goal is to
         enable generic SGML to be served, received, and processed on the Web
         in the way that is now possible with HTML. XML has been designed for
         ease of implementation and for interoperability with both SGML and
         HTML.
      </p>

      <h2><a name="status">Status of this document</a></h2>

      <p>This document has been reviewed by W3C Members and
         other interested parties and has been endorsed by the
         Director as a W3C Recommendation. It is a stable
         document and may be used as reference material or cited
         as a normative reference from another document. W3C's
         role in making the Recommendation is to draw attention
         to the specification and to promote its widespread
         deployment. This enhances the functionality and
         interoperability of the Web.
      </p>

      <p>
         This document specifies a syntax created by subsetting an existing,
         widely used international text processing standard (Standard
         Generalized Markup Language, ISO 8879:1986(E) as amended and
         corrected) for use on the World Wide Web. It is a product of the W3C
         XML Activity, details of which can be found at <a href="http://www.w3.org/XML">http://www.w3.org/XML</a>. A list of
         current W3C Recommendations and other technical documents can be found
         at <a href="http://www.w3.org/TR">http://www.w3.org/TR</a>.
      </p>

      <p>This specification uses the term URI, which is defined by <a href="#Berners-Lee">[Berners-Lee et al.]</a>, a work in progress expected to update <a href="#RFC1738">[IETF RFC1738]</a> and <a href="#RFC1808">[IETF RFC1808]</a>.
      </p>

      <p>The list of known errors in this specification is
         available at
         <a href="http://www.w3.org/XML/xml-19980210-errata">http://www.w3.org/XML/xml-19980210-errata</a>.
      </p>

      <p>Please report errors in this document to
         <a href="mailto:[email protected]">[email protected]</a>.
      </p>


      <h2><a name="contents">Table of contents</a></h2>1 <a href="#sec-intro">Introduction</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;1.1 <a href="#sec-origin-goals">Origin and Goals</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;1.2 <a href="#sec-terminology">Terminology</a><br>
      2 <a href="#sec-documents">Documents</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.1 <a href="#sec-well-formed">Well-Formed XML Documents</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.2 <a href="#charsets">Characters</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.3 <a href="#sec-common-syn">Common Syntactic Constructs</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.4 <a href="#syntax">Character Data and Markup</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.5 <a href="#sec-comments">Comments</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.6 <a href="#sec-pi">Processing Instructions</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.7 <a href="#sec-cdata-sect">CDATA Sections</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.8 <a href="#sec-prolog-dtd">Prolog and Document Type Declaration</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.9 <a href="#sec-rmd">Standalone Document Declaration</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.10 <a href="#sec-white-space">White Space Handling</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.11 <a href="#sec-line-ends">End-of-Line Handling</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;2.12 <a href="#sec-lang-tag">Language Identification</a><br>
      3 <a href="#sec-logical-struct">Logical Structures</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;3.1 <a href="#sec-starttags">Start-Tags, End-Tags, and Empty-Element Tags</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;3.2 <a href="#elemdecls">Element Type Declarations</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.2.1 <a href="#sec-element-content">Element Content</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.2.2 <a href="#sec-mixed-content">Mixed Content</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;3.3 <a href="#attdecls">Attribute-List Declarations</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.3.1 <a href="#sec-attribute-types">Attribute Types</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.3.2 <a href="#sec-attr-defaults">Attribute Defaults</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.3.3 <a href="#AVNormalize">Attribute-Value Normalization</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;3.4 <a href="#sec-condition-sect">Conditional Sections</a><br>
      4 <a href="#sec-physical-struct">Physical Structures</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;4.1 <a href="#sec-references">Character and Entity References</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;4.2 <a href="#sec-entity-decl">Entity Declarations</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.2.1 <a href="#sec-internal-ent">Internal Entities</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.2.2 <a href="#sec-external-ent">External Entities</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;4.3 <a href="#TextEntities">Parsed Entities</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.3.1 <a href="#sec-TextDecl">The Text Declaration</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.3.2 <a href="#wf-entities">Well-Formed Parsed Entities</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.3.3 <a href="#charencoding">Character Encoding in Entities</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;4.4 <a href="#entproc">XML Processor Treatment of Entities and References</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.4.1 <a href="#not-recognized">Not Recognized</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.4.2 <a href="#included">Included</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.4.3 <a href="#include-if-valid">Included If Validating</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.4.4 <a href="#forbidden">Forbidden</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.4.5 <a href="#inliteral">Included in Literal</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.4.6 <a href="#notify">Notify</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.4.7 <a href="#bypass">Bypassed</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4.4.8 <a href="#as-PE">Included as PE</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;4.5 <a href="#intern-replacement">Construction of Internal Entity Replacement Text</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;4.6 <a href="#sec-predefined-ent">Predefined Entities</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;4.7 <a href="#Notations">Notation Declarations</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;4.8 <a href="#sec-doc-entity">Document Entity</a><br>
      5 <a href="#sec-conformance">Conformance</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;5.1 <a href="#proc-types">Validating and Non-Validating Processors</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;5.2 <a href="#safe-behavior">Using XML Processors</a><br>
      6 <a href="#sec-notation">Notation</a><br>
      <h3>Appendices</h3>A <a href="#sec-bibliography">References</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;A.1 <a href="#sec-existing-stds">Normative References</a><br>
      &nbsp;&nbsp;&nbsp;&nbsp;A.2 <a href="#section-Other-References">Other References</a><br>
      B <a href="#CharClasses">Character Classes</a><br>
      C <a href="#sec-xml-and-sgml">XML and SGML</a> (Non-Normative)<br>
      D <a href="#sec-entexpand">Expansion of Entity and Character References</a> (Non-Normative)<br>
      E <a href="#determinism">Deterministic Content Models</a> (Non-Normative)<br>
      F <a href="#sec-guessing">Autodetection of Character Encodings</a> (Non-Normative)<br>
      G <a href="#sec-xml-wg">W3C XML Working Group</a> (Non-Normative)<br>
      <hr>


      <h2><a name="sec-intro"></a>1 Introduction
      </h2>

      <p>Extensible Markup Language, abbreviated XML, describes a class of
         data objects called <a href="#dt-xml-doc">XML documents</a> and
         partially describes the behavior of
         computer programs which process them. XML is an application profile or
         restricted form of SGML, the Standard Generalized Markup
         Language <a href="#ISO8879">[ISO 8879]</a>.
         By construction, XML documents
         are conforming SGML documents.
      </p>

      <p>XML documents are made up of storage units called <a href="#dt-entity">entities</a>, which contain either parsed
         or unparsed data.
         Parsed data is made up of <a href="#dt-character">characters</a>,
         some
         of which form <a href="#dt-chardata">character data</a>,
         and some of which form <a href="#dt-markup">markup</a>.
         Markup encodes a description of the document's storage layout and
         logical structure. XML provides a mechanism to impose constraints on
         the storage layout and logical structure.
      </p>

      <p><a name="dt-xml-proc"></a>A software module
         called an <b>XML processor</b> is used to read XML documents
         and provide access to their content and structure. <a name="dt-app"></a>It is assumed that an XML processor is
         doing its work on behalf of another module, called the
         <b>application</b>. This specification describes the
         required behavior of an XML processor in terms of how it must read XML
         data and the information it must provide to the application.
      </p>

      <h3><a name="sec-origin-goals"></a>1.1 Origin and Goals
      </h3>

      <p>XML was developed by an XML Working Group (originally known as the
         SGML Editorial Review Board) formed under the auspices of the World
         Wide Web Consortium (W3C) in 1996.
         It was chaired by Jon Bosak of Sun
         Microsystems with the active participation of an XML Special
         Interest Group (previously known as the SGML Working Group) also
         organized by the W3C. The membership of the XML Working Group is given
         in an appendix. Dan Connolly served as the WG's contact with the W3C.
      </p>

      <p>The design goals for XML are:
         <ol>

            <li>
               <p>XML shall be straightforwardly usable over the
                  Internet.
               </p>
            </li>

            <li>
               <p>XML shall support a wide variety of applications.</p>
            </li>

            <li>
               <p>XML shall be compatible with SGML.</p>
            </li>

            <li>
               <p>It shall be easy to write programs which process XML
                  documents.
               </p>
            </li>

            <li>
               <p>The number of optional features in XML is to be kept to the
                  absolute minimum, ideally zero.
               </p>
            </li>

            <li>
               <p>XML documents should be human-legible and reasonably
                  clear.
               </p>
            </li>

            <li>
               <p>The XML design should be prepared quickly.</p>
            </li>

            <li>
               <p>The design of XML shall be formal and concise.</p>
            </li>

            <li>
               <p>XML documents shall be easy to create.</p>
            </li>

            <li>
               <p>Terseness in XML markup is of minimal importance.</p>
            </li>
         </ol>

      </p>

      <p>This specification,
         together with associated standards
         (Unicode and ISO/IEC 10646 for characters,
         Internet RFC 1766 for language identification tags,
         ISO 639 for language name codes, and
         ISO 3166 for country name codes),
         provides all the information necessary to understand
         XML Version 1.0
         and construct computer programs to process it.
      </p>

      <p>This version of the XML specification
         may be distributed freely, as long as
         all text and legal notices remain intact.
      </p>

      <h3><a name="sec-terminology"></a>1.2 Terminology
      </h3>

      <p>The terminology used to describe XML documents is defined in the body of
         this specification.
         The terms defined in the following list are used in building those
         definitions and in describing the actions of an XML processor:

         <dl>

            <dt><b>may</b></dt>

            <dd>
               <p><a name="dt-may"></a>Conforming documents and XML
                  processors are permitted to but need not behave as
                  described.
               </p>
            </dd>

            <dt><b>must</b></dt>

            <dd>
               <p>Conforming documents and XML processors
                  are required to behave as described; otherwise they are in error.
               </p>
            </dd>

            <dt><b>error</b></dt>

            <dd>
               <p><a name="dt-error"></a>A violation of the rules of this
                  specification; results are
                  undefined. Conforming software may detect and report an error and may
                  recover from it.
               </p>
            </dd>

            <dt><b>fatal error</b></dt>

            <dd>
               <p><a name="dt-fatal"></a>An error
                  which a conforming <a href="#dt-xml-proc">XML processor</a>
                  must detect and report to the application.
                  After encountering a fatal error, the
                  processor may continue
                  processing the data to search for further errors and may report such
                  errors to the application. In order to support correction of errors,
                  the processor may make unprocessed data from the document (with
                  intermingled character data and markup) available to the application.
                  Once a fatal error is detected, however, the processor must not
                  continue normal processing (i.e., it must not
                  continue to pass character data and information about the document's
                  logical structure to the application in the normal way).
               </p>
            </dd>

            <dt><b>at user option</b></dt>

            <dd>
               <p>Conforming software may or must (depending on the modal verb in the
                  sentence) behave as described; if it does, it must
                  provide users a means to enable or disable the behavior
                  described.
               </p>
            </dd>

            <dt><b>validity constraint</b></dt>

            <dd>
               <p>A rule which applies to all
                  <a href="#dt-valid">valid</a> XML documents.
                  Violations of validity constraints are errors; they must, at user option,
                  be reported by
                  <a href="#dt-validating">validating XML processors</a>.
               </p>
            </dd>

            <dt><b>well-formedness constraint</b></dt>

            <dd>
               <p>A rule which applies to all <a href="#dt-wellformed">well-formed</a> XML documents.
                  Violations of well-formedness constraints are
                  <a href="#dt-fatal">fatal errors</a>.
               </p>
            </dd>

            <dt><b>match</b></dt>

            <dd>
               <p><a name="dt-match"></a>(Of strings or names:)
                  Two strings or names being compared must be identical.
                  Characters with multiple possible representations in ISO/IEC 10646 (e.g.
                  characters with
                  both precomposed and base+diacritic forms) match only if they have the
                  same representation in both strings.
                  At user option, processors may normalize such characters to
                  some canonical form.
                  No case folding is performed.
                  (Of strings and rules in the grammar:)
                  A string matches a grammatical production if it belongs to the
                  language generated by that production.
                  (Of content and content models:)
                  An element matches its declaration when it conforms
                  in the fashion described in the constraint
                  <a href="#elementvalid">[<b>3 Element Valid</b>]
                  </a>.
               </p>
            </dd>

            <dt><b>for compatibility</b></dt>

            <dd>
               <p><a name="dt-compat"></a>A feature of
                  XML included solely to ensure that XML remains compatible with SGML.
               </p>
            </dd>

            <dt><b>for interoperability</b></dt>

            <dd>
               <p><a name="dt-interop"></a>A
                  non-binding recommendation included to increase the chances that XML
                  documents can be processed by the existing installed base of SGML
                  processors which predate the
                  WebSGML Adaptations Annex to ISO 8879.
               </p>
            </dd>

         </dl>

      </p>
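<p>In code, the <b>match</b> rule for strings and names above amounts to a code-point-by-code-point comparison with no case folding, plus an optional, user-enabled normalization step. The Python sketch below is illustrative only: the helper names <code>names_match</code> and <code>names_match_normalized</code> are hypothetical, and NFC is assumed as one example of "some canonical form"; the definition above does not name a particular normalization form.</p>

```python
import unicodedata


def names_match(a: str, b: str) -> bool:
    """Strict match: identical code point sequences, no case folding."""
    return a == b


def names_match_normalized(a: str, b: str, form: str = "NFC") -> bool:
    """User-enabled variant: normalize both strings before comparing.

    NFC is only an illustrative choice of canonical form.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(names_match(precomposed, decomposed))             # False
print(names_match_normalized(precomposed, decomposed))  # True
print(names_match("Para", "para"))                      # False (no case folding)
```

<p>Keeping strict comparison as the default and normalization as an explicit opt-in mirrors the "at user option" wording.</p>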

      <h2><a name="sec-documents"></a>2 Documents
      </h2>

      <p><a name="dt-xml-doc"></a>
         A data object is an
         <b>XML document</b> if it is
         <a href="#dt-wellformed">well-formed</a>, as
         defined in this specification.
         A well-formed XML document may in addition be
         <a href="#dt-valid">valid</a> if it meets certain further
         constraints.
      </p>

      <p>Each XML document has both a logical and a physical structure.
         Physically, the document is composed of units called <a href="#dt-entity">entities</a>. An entity may <a href="#dt-entref">refer</a> to other entities to cause their
         inclusion in the document. A document begins in a "root" or <a href="#dt-docent">document entity</a>.
         Logically, the document is composed of declarations, elements,
         comments,
         character references, and
         processing
         instructions, all of which are indicated in the document by explicit
         markup.
         The logical and physical structures must nest properly, as described
         in <a href="#wf-entities">[<b>4.3.2 Well-Formed Parsed Entities</b>]
         </a>.
      </p>

      <h3><a name="sec-well-formed"></a>2.1 Well-Formed XML Documents
      </h3>

      <p><a name="dt-wellformed"></a>
         A textual object is
         a well-formed XML document if:

         <ol>

            <li>
               <p>Taken as a whole, it
                  matches the production labeled <a href="#NT-document">document</a>.
               </p>
            </li>

            <li>
               <p>It
                  meets all the well-formedness constraints given in this specification.
               </p>
            </li>

            <li>
               <p>Each of the <a href="#dt-parsedent">parsed entities</a>
                  which is referenced directly or indirectly within the document is
                  <a href="#wf-entities">well-formed</a>.
               </p>
            </li>

         </ol>
      </p>
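<p>Because well-formedness violations are fatal errors, a conforming processor must report them to the application. As an illustration only, Python's standard-library <code>xml.etree.ElementTree</code> module, which is built on the non-validating expat parser, raises <code>ParseError</code> for such input. The wrapper <code>is_well_formed</code> is a hypothetical helper, and it reflects only the checks that this particular parser performs; it does not test validity.</p>

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text`; False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<doc><p>hello</p></doc>"))  # True
print(is_well_formed("<doc><p>hello</doc></p>"))  # False: improper nesting
print(is_well_formed("<a/><b/>"))                 # False: two top-level elements
```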

      <p>

         <h5>Document</h5>
         <table class="scrap">
            <tbody>
               <tr valign="baseline">
                  <td><a name="NT-document"></a>[1]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>document</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td><a href="#NT-prolog">prolog</a>
                     <a href="#NT-element">element</a>
                     <a href="#NT-Misc">Misc</a>*
                  </td>
                  <td></td>
               </tr>
            </tbody>
         </table>

      </p>

      <p>Matching the <a href="#NT-document">document</a> production
         implies that:

         <ol>

            <li>
               <p>It contains one or more
                  <a href="#dt-element">elements</a>.
               </p>
            </li>

            <li>
               <p><a name="dt-root"></a>There is exactly
                  one element, called the <b>root</b>, or document element, no
                  part of which appears in the <a href="#dt-content">content</a> of any other element.
                  For all other elements, if the start-tag is in the content of another
                  element, the end-tag is in the content of the same element. More
                  simply stated, the elements, delimited by start- and end-tags, nest
                  properly within each other.
               </p>
            </li>

         </ol>

      </p>
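<p>The two properties listed above (exactly one root element, and start- and end-tags that nest properly) can be checked with a stack over the sequence of tags. The following is a minimal sketch, assuming the input has already been tokenized into hypothetical <code>("start", name)</code>, <code>("end", name)</code>, and <code>("empty", name)</code> tuples; real tokenization, attributes, character data, and the prolog are deliberately ignored.</p>

```python
from typing import Iterable, Tuple

Tag = Tuple[str, str]  # (kind, name); kind is "start", "end", or "empty"


def nests_properly(tags: Iterable[Tag]) -> bool:
    """Check for exactly one root element and properly nested tags."""
    stack = []   # names of currently open elements, innermost last
    roots = 0    # number of elements that begin at depth 0
    for kind, name in tags:
        if kind in ("start", "empty") and not stack:
            roots += 1
            if roots > 1:
                return False      # a second top-level element
        if kind == "start":
            stack.append(name)
        elif kind == "end":
            if not stack or stack.pop() != name:
                return False      # end-tag does not close the innermost open element
        elif kind != "empty":
            raise ValueError(f"unknown tag kind: {kind!r}")
    return roots == 1 and not stack   # one root, and every start-tag was closed


ok = [("start", "doc"), ("start", "p"), ("empty", "br"), ("end", "p"), ("end", "doc")]
bad = [("start", "doc"), ("start", "p"), ("end", "doc"), ("end", "p")]
print(nests_properly(ok))   # True
print(nests_properly(bad))  # False
```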

      <p><a name="dt-parentchild"></a>As a consequence
         of this,
         for each non-root element
         <code>C</code> in the document, there is one other element <code>P</code>
         in the document such that
         <code>C</code> is in the content of <code>P</code>, but is not in
         the content of any other element that is in the content of
         <code>P</code>.
         <code>P</code> is referred to as the
         <b>parent</b> of <code>C</code>, and <code>C</code> as a
         <b>child</b> of <code>P</code>.
      </p>
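<p>Since every non-root element has exactly one parent, a parent lookup table can be built in a single pass over the tree. This is a sketch using Python's standard <code>xml.etree.ElementTree</code>, whose element objects expose their children but not their parent; the <code>parent_map</code> helper is hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def parent_map(root: ET.Element) -> dict:
    """Map each non-root element C to its parent P."""
    # root.iter() visits every element in the subtree; iterating an element
    # yields its direct children, so each child is recorded exactly once.
    return {child: parent for parent in root.iter() for child in parent}


root = ET.fromstring("<doc><sec><p/><p/></sec><note/></doc>")
parents = parent_map(root)

for elem in root.iter():
    p = parents.get(elem)
    print(elem.tag, "->", p.tag if p is not None else "(root)")
```

<p>The root is the only element absent from the map, which matches the definition above.</p>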

      <h3><a name="charsets"></a>2.2 Characters
      </h3>

      <p><a name="dt-text"></a>A parsed entity contains
         <b>text</b>, a sequence of
         <a href="#dt-character">characters</a>,
         which may represent markup or character data.
         <a name="dt-character"></a>A <b>character</b>
         is an atomic unit of text as specified by
         ISO/IEC 10646 <a href="#ISO10646">[ISO/IEC 10646]</a>.
         Legal characters are tab, carriage return, line feed, and the legal
         graphic characters of Unicode and ISO/IEC 10646.
         The use of "compatibility characters", as defined in section 6.8
         of <a href="#Unicode">[Unicode]</a>, is discouraged.

         <h5>Character Range</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-Char"></a>[2]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>Char</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
                     | [#x10000-#x10FFFF]
                  </td>
                  <td>/*any Unicode character, excluding the
                     surrogate blocks, FFFE, and FFFF.*/
                  </td>
               </tr>

            </tbody>
         </table>

      </p>
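<p>Production [2] translates directly into a range test on code points. Here is a minimal Python sketch, assuming the text has already been decoded into a Python <code>str</code>, so that each character is a single code point. <code>is_xml_char</code> and <code>first_illegal_char</code> are hypothetical helper names.</p>

```python
def is_xml_char(cp: int) -> bool:
    """Production [2] Char: the code points legal in XML 1.0 text."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(text: str):
    """Return the index of the first character outside Char, or None."""
    for i, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return i
    return None


print(first_illegal_char("tab\there"))  # None: tab (#x9) is allowed
print(first_illegal_char("bell\x07"))   # 4: BEL (#x7) is not a Char
print(is_xml_char(0xFFFE))              # False
```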

      <p>The mechanism for encoding character code points into bit patterns may
         vary from entity to entity. All XML processors must accept the UTF-8
         and UTF-16 encodings of 10646; the mechanisms for signaling which of
         the two is in use, or for bringing other encodings into play, are
         discussed later, in <a href="#charencoding">[<b>4.3.3 Character Encoding in Entities</b>]
         </a>.
      </p>
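<p>To illustrate this requirement, the sketch below feeds the same small document to Python's standard <code>xml.etree.ElementTree</code> parser (built on expat), encoded once as UTF-8 and once as UTF-16. Python's <code>"utf-16"</code> codec prepends a byte order mark, which the parser uses to detect the encoding. This demonstrates one processor's behavior and is not a conformance test.</p>

```python
import xml.etree.ElementTree as ET

source = '<?xml version="1.0" encoding="{enc}"?><greeting>caf\u00e9</greeting>'

for enc, codec in (("UTF-8", "utf-8"), ("UTF-16", "utf-16")):
    data = source.format(enc=enc).encode(codec)  # raw bytes, as stored in an entity
    root = ET.fromstring(data)                   # parser decodes the bytes itself
    print(enc, root.tag, root.text)
```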

      <h3><a name="sec-common-syn"></a>2.3 Common Syntactic Constructs
      </h3>

      <p>This section defines some symbols used widely in the grammar.</p>

      <p><a href="#NT-S">S</a> (white space) consists of one or more space (#x20)
         characters, carriage returns, line feeds, or tabs.

         <h5>White Space</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-S"></a>[3]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>S</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>(#x20 | #x9 | #xD | #xA)+</td>
                  <td></td>
               </tr>

            </tbody>
         </table>
      </p>
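<p>Production [3] is small enough to express as a regular expression. The Python sketch below uses a hypothetical helper named <code>is_s</code>. It also shows why general-purpose white-space tests are not a drop-in substitute: Python's <code>str.isspace()</code> accepts characters such as U+00A0 NO-BREAK SPACE, which are not <code>S</code>.</p>

```python
import re

# Production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
S = re.compile(r"[\x20\x09\x0D\x0A]+")


def is_s(text: str) -> bool:
    """True if all of `text` matches production S."""
    return S.fullmatch(text) is not None


print(is_s(" \t\r\n"))     # True
print(is_s(""))            # False: S needs at least one character
print(is_s("\u00a0"))      # False: NO-BREAK SPACE is not XML white space
print("\u00a0".isspace())  # True: Python's notion of white space is broader
```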

      <p>Characters are classified for convenience as letters, digits, or other
         characters. Letters consist of an alphabetic or syllabic
         base character possibly
         followed by one or more combining characters, or of an ideographic
         character.
         Full definitions of the specific characters in each class
         are given in <a href="#CharClasses">[<b>B Character Classes</b>]
         </a>.
      </p>

      <p><a name="dt-name"></a>A <b>Name</b> is a token
         beginning with a letter or one of a few punctuation characters, and continuing
         with letters, digits, hyphens, underscores, colons, or full stops, together
         known as name characters.
         Names beginning with the string "<code>xml</code>", or any string
         which would match <code>(('X'|'x') ('M'|'m') ('L'|'l'))</code>, are
         reserved for standardization in this or future versions of this
         specification.
      </p>

      <blockquote><b>NOTE: </b>
         The colon character within XML names is reserved for experimentation with
         name spaces.
         Its meaning is expected to be
         standardized at some future point, at which point those documents
         using the colon for experimental purposes may need to be updated.
         (There is no guarantee that any name-space mechanism
         adopted for XML will in fact use the colon as a name-space delimiter.)
         In practice, this means that authors should not use the colon in XML
         names except as part of name-space experiments, but that XML processors
         should accept the colon as a name character.
      </blockquote>
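<p>The reservation of names beginning with <code>xml</code> in any mix of case, together with the advice about colons, can be expressed as a lint-style check. This is a minimal Python sketch. The function name and the warning messages are hypothetical, and the input is assumed to be a syntactically valid Name already. The regular expression mirrors <code>(('X'|'x') ('M'|'m') ('L'|'l'))</code> exactly instead of lowercasing, so only ASCII letters are considered.</p>

```python
import re

# Mirrors (('X'|'x') ('M'|'m') ('L'|'l')) applied to the start of a name.
_RESERVED_PREFIX = re.compile(r"[Xx][Mm][Ll]")


def name_warnings(name: str) -> list:
    """Advisory warnings for a string assumed to be a valid Name already."""
    warnings = []
    if _RESERVED_PREFIX.match(name):
        warnings.append("reserved: begins with 'xml' in some mix of case")
    if ":" in name:
        warnings.append("colon: reserved for name-space experiments")
    return warnings


print(name_warnings("para"))     # []
print(name_warnings("XmlData"))  # ["reserved: begins with 'xml' in some mix of case"]
```

<p>Both checks are reported as warnings rather than errors, because the note above still requires processors to accept the colon as a name character.</p>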
669
670      <p>An
671         <a href="#NT-Nmtoken">Nmtoken</a> (name token) is any mixture of
672         name characters.
673
674         <h5>Names and Tokens</h5>
675         <table class="scrap">
676            <tbody>
677               <tr valign="baseline">
678                  <td><a name="NT-NameChar"></a>[4]&nbsp;&nbsp;&nbsp;
679                  </td>
680                  <td>NameChar</td>
681                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
682                  <td><a href="#NT-Letter">Letter</a>
683                     | <a href="#NT-Digit">Digit</a>
684                     | '.' | '-' | '_' | ':'
685                     | <a href="#NT-CombiningChar">CombiningChar</a>
686                     | <a href="#NT-Extender">Extender</a></td>
687                  <td></td>
688               </tr>
689               <tr valign="baseline">
690                  <td><a name="NT-Name"></a>[5]&nbsp;&nbsp;&nbsp;
691                  </td>
692                  <td>Name</td>
693                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
694                  <td>(<a href="#NT-Letter">Letter</a> | '_' | ':')
695                     (<a href="#NT-NameChar">NameChar</a>)*
696                  </td>
697                  <td></td>
698               </tr>
699               <tr valign="baseline">
700                  <td><a name="NT-Names"></a>[6]&nbsp;&nbsp;&nbsp;
701                  </td>
702                  <td>Names</td>
703                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
704                  <td><a href="#NT-Name">Name</a>
705                     (<a href="#NT-S">S</a> <a href="#NT-Name">Name</a>)*
706                  </td>
707                  <td></td>
708               </tr>
709               <tr valign="baseline">
710                  <td><a name="NT-Nmtoken"></a>[7]&nbsp;&nbsp;&nbsp;
711                  </td>
712                  <td>Nmtoken</td>
713                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
714                  <td>(<a href="#NT-NameChar">NameChar</a>)+
715                  </td>
716                  <td></td>
717               </tr>
718               <tr valign="baseline">
719                  <td><a name="NT-Nmtokens"></a>[8]&nbsp;&nbsp;&nbsp;
720                  </td>
721                  <td>Nmtokens</td>
722                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
723                  <td><a href="#NT-Nmtoken">Nmtoken</a> (<a href="#NT-S">S</a> <a href="#NT-Nmtoken">Nmtoken</a>)*
724                  </td>
725                  <td></td>
726               </tr>
727            </tbody>
728         </table>
729
730      </p>
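      <p>The Names and Tokens productions translate readily into regular expressions.
         The Python sketch below is illustrative only. Its function names are hypothetical, and it
         assumes ASCII input, whereas Letter, Digit, CombiningChar, and Extender also cover many
         non-ASCII characters.
      </p>
      <pre>
import re

# ASCII-only approximation (assumption): the real Letter, Digit,
# CombiningChar and Extender classes include many non-ASCII characters.
NAME_START = r"[A-Za-z_:]"          # first character of [5] Name
NAME_CHAR = r"[A-Za-z0-9._:-]"      # [4] NameChar
S = r"[ \t\r\n]+"                   # [3] S: #x20 | #x9 | #xD | #xA

NAME = NAME_START + NAME_CHAR + "*"  # [5] Name
NMTOKEN = NAME_CHAR + "+"            # [7] Nmtoken


def is_name(s):
    return re.fullmatch(NAME, s) is not None


def is_nmtoken(s):
    return re.fullmatch(NMTOKEN, s) is not None


def is_names(s):
    # [6] Names ::= Name (S Name)*; no leading or trailing white space
    return re.fullmatch(NAME + "(?:" + S + NAME + ")*", s) is not None


def is_nmtokens(s):
    # [8] Nmtokens ::= Nmtoken (S Nmtoken)*
    return re.fullmatch(NMTOKEN + "(?:" + S + NMTOKEN + ")*", s) is not None
</pre>
      <p>With these definitions, <code>is_nmtoken("2nd")</code> is true but
         <code>is_name("2nd")</code> is false, because a Name cannot begin with a digit.
      </p>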
731
732      <p>Literal data is any quoted string not containing
733         the quotation mark used as a delimiter for that string.
734         Literals are used
735         for specifying the content of internal entities
736         (<a href="#NT-EntityValue">EntityValue</a>),
737         the values of attributes (<a href="#NT-AttValue">AttValue</a>),
738         and external identifiers
739         (<a href="#NT-SystemLiteral">SystemLiteral</a>).
740         Note that a <a href="#NT-SystemLiteral">SystemLiteral</a>
741         can be parsed without scanning for markup.
742
743         <h5>Literals</h5>
744         <table class="scrap">
745            <tbody>
746               <tr valign="baseline">
747                  <td><a name="NT-EntityValue"></a>[9]&nbsp;&nbsp;&nbsp;
748                  </td>
749                  <td>EntityValue</td>
750                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
751                  <td>'"'
752                     ([^%&amp;"]
753                     | <a href="#NT-PEReference">PEReference</a>
754                     | <a href="#NT-Reference">Reference</a>)*
755                     '"'
756
757                  </td>
758                  <td></td>
759               </tr>
760               <tr valign="baseline">
761                  <td></td>
762                  <td></td>
763                  <td></td>
764                  <td>|&nbsp;
765                     "'"
766                     ([^%&amp;']
767                     | <a href="#NT-PEReference">PEReference</a>
768                     | <a href="#NT-Reference">Reference</a>)*
769                     "'"
770                  </td>
771                  <td></td>
772               </tr>
773               <tr valign="baseline">
774                  <td><a name="NT-AttValue"></a>[10]&nbsp;&nbsp;&nbsp;
775                  </td>
776                  <td>AttValue</td>
777                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
778                  <td>'"'
779                     ([^&lt;&amp;"]
780                     | <a href="#NT-Reference">Reference</a>)*
781                     '"'
782
783                  </td>
784                  <td></td>
785               </tr>
786               <tr valign="baseline">
787                  <td></td>
788                  <td></td>
789                  <td></td>
790                  <td>|&nbsp;
791                     "'"
792                     ([^&lt;&amp;']
793                     | <a href="#NT-Reference">Reference</a>)*
794                     "'"
795                  </td>
796                  <td></td>
797               </tr>
798               <tr valign="baseline">
799                  <td><a name="NT-SystemLiteral"></a>[11]&nbsp;&nbsp;&nbsp;
800                  </td>
801                  <td>SystemLiteral</td>
802                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
803                  <td>('"' [^"]* '"') |&nbsp;("'" [^']* "'")
804
805                  </td>
806                  <td></td>
807               </tr>
808               <tr valign="baseline">
809                  <td><a name="NT-PubidLiteral"></a>[12]&nbsp;&nbsp;&nbsp;
810                  </td>
811                  <td>PubidLiteral</td>
812                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
813                  <td>'"' <a href="#NT-PubidChar">PubidChar</a>*
814                     '"'
815                     | "'" (<a href="#NT-PubidChar">PubidChar</a> - "'")* "'"
816                  </td>
817                  <td></td>
818               </tr>
819               <tr valign="baseline">
820                  <td><a name="NT-PubidChar"></a>[13]&nbsp;&nbsp;&nbsp;
821                  </td>
822                  <td>PubidChar</td>
823                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
824                  <td>#x20 | #xD | #xA
825                     |&nbsp;[a-zA-Z0-9]
826                     |&nbsp;[-'()+,./:=?;!*#@$_%]
827                  </td>
828                  <td></td>
829               </tr>
830            </tbody>
831         </table>
832
833      </p>
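      <p>Because PubidChar is a short, fixed list of characters, a public identifier can be
         checked directly. The Python sketch below tests a complete quoted string against
         PubidLiteral. The function name is hypothetical, and the character set is copied from
         production [13].
      </p>
      <pre>
import string

# [13] PubidChar: space, CR, LF, ASCII letters and digits, and the
# punctuation characters listed in the production.
PUBID_CHARS = frozenset(" \r\n" + string.ascii_letters + string.digits
                        + "-'()+,./:=?;!*#@$_%")


def is_pubid_literal(lit):
    """Check a delimited string against [12] PubidLiteral."""
    if len(lit) >= 2 and lit[0] == lit[-1] and lit[0] in "\"'":
        quote = lit[0]
        body = lit[1:-1]
        # PubidChar never includes the double quote, and a single-quoted
        # literal also excludes the apostrophe, so rejecting the delimiter
        # character covers both alternatives of [12].
        return all(c in PUBID_CHARS and c != quote for c in body)
    return False
</pre>
      <p>A typical public identifier such as
         <code>"-//W3C//DTD HTML 4.0 Transitional//EN"</code> passes. A literal containing an
         ampersand fails, as does one with an apostrophe inside apostrophes.
      </p>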
834
835
836
837
838
839      <h3><a name="syntax"></a>2.4 Character Data and Markup
840      </h3>
841
842
843      <p><a href="#dt-text">Text</a> consists of intermingled
844         <a href="#dt-chardata">character
845            data
846         </a> and markup.
847         <a name="dt-markup"></a><b>Markup</b> takes the form of
848         <a href="#dt-stag">start-tags</a>,
849         <a href="#dt-etag">end-tags</a>,
850         <a href="#dt-empty">empty-element tags</a>,
851         <a href="#dt-entref">entity references</a>,
852         <a href="#dt-charref">character references</a>,
853         <a href="#dt-comment">comments</a>,
854         <a href="#dt-cdsection">CDATA section</a> delimiters,
855         <a href="#dt-doctype">document type declarations</a>, and
856         <a href="#dt-pi">processing instructions</a>.
857
858
859      </p>
860
861      <p><a name="dt-chardata"></a>All text that is not markup
862         constitutes the <b>character data</b> of
863         the document.
864      </p>
865
866      <p>The ampersand character (&amp;) and the left angle bracket (&lt;)
867         may appear in their literal form <i>only</i> when used as markup
868         delimiters, or within a <a href="#dt-comment">comment</a>, a
869         <a href="#dt-pi">processing instruction</a>,
870         or a <a href="#dt-cdsection">CDATA section</a>.
871
         They are also legal within the <a href="#dt-litentval">literal entity
            value</a> of an internal entity declaration; see
         <a href="#wf-entities">[<b>4.3.2 Well-Formed Parsed Entities</b>]</a>.
877
878         If they are needed elsewhere,
879         they must be <a href="#dt-escape">escaped</a>
880         using either <a href="#dt-charref">numeric character references</a>
881         or the strings
882         "<code>&amp;amp;</code>" and "<code>&amp;lt;</code>" respectively.
883         The right angle
884         bracket (>) may be represented using the string
         "<code>&amp;gt;</code>", and must, <a href="#dt-compat">for compatibility</a>,
888         be escaped using
889         "<code>&amp;gt;</code>" or a character reference
890         when it appears in the string
891         "<code>]]></code>"
892         in content,
893         when that string is not marking the end of
894         a <a href="#dt-cdsection">CDATA section</a>.
895
896      </p>
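      <p>A program that writes XML can meet these requirements with a single escaping pass over
         its text. The Python sketch below relies on the standard library function
         <code>xml.sax.saxutils.escape</code>. The wrapper name <code>to_char_data</code> is
         hypothetical. Because it replaces every right angle bracket, it escapes more than
         strictly required. This is allowed, since "<code>&amp;gt;</code>" may always be used
         for that character.
      </p>
      <pre>
from xml.sax.saxutils import escape


def to_char_data(text):
    """Escape arbitrary text so that it can appear as element content.

    escape() replaces the ampersand first, so the references it inserts
    afterwards are not escaped again, and then both angle brackets. The
    result contains no left angle bracket, every ampersand in it begins an
    inserted reference, and "]]>" cannot occur because every right angle
    bracket is replaced as well.
    """
    return escape(text)
</pre>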
897
898      <p>
899         In the content of elements, character data
900         is any string of characters which does
901         not contain the start-delimiter of any markup.
902         In a CDATA section, character data
903         is any string of characters not including the CDATA-section-close
904         delimiter, "<code>]]></code>".
905      </p>
906
907      <p>
908         To allow attribute values to contain both single and double quotes, the
909         apostrophe or single-quote character (') may be represented as
910         "<code>&amp;apos;</code>", and the double-quote character (") as
911         "<code>&amp;quot;</code>".
912
913         <h5>Character Data</h5>
914         <table class="scrap">
915            <tbody>
916               <tr valign="baseline">
917                  <td><a name="NT-CharData"></a>[14]&nbsp;&nbsp;&nbsp;
918                  </td>
919                  <td>CharData</td>
920                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
921                  <td>[^&lt;&amp;]* - ([^&lt;&amp;]* ']]>' [^&lt;&amp;]*)</td>
922                  <td></td>
923               </tr>
924            </tbody>
925         </table>
926
927      </p>
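      <p>An attribute writer can choose the delimiter automatically. It then needs
         "<code>&amp;quot;</code>" only when a value contains both kinds of quote. The sketch
         below uses the Python standard library function <code>xml.sax.saxutils.quoteattr</code>.
         The helper name <code>attribute</code> is hypothetical.
      </p>
      <pre>
from xml.sax.saxutils import quoteattr


def attribute(name, value):
    """Render one attribute specification: name="value" or name='value'.

    quoteattr() escapes the ampersand and both angle brackets, then wraps
    the value in a quote character that the value does not contain. Only
    a value holding both kinds of quote needs a reference: it is wrapped
    in double quotes, and each double quote inside it is replaced by a
    reference to the predefined quot entity.
    """
    return name + "=" + quoteattr(value)
</pre>
      <p>For example, <code>attribute("title", 'say "hi"')</code> produces
         <code>title='say "hi"'</code>, and no reference is needed.
      </p>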
928
929
930
931
932      <h3><a name="sec-comments"></a>2.5 Comments
933      </h3>
934
935
936      <p><a name="dt-comment"></a><b>Comments</b> may
937         appear anywhere in a document outside other
938         <a href="#dt-markup">markup</a>; in addition,
939         they may appear within the document type declaration
940         at places allowed by the grammar.
         They are not part of the document's <a href="#dt-chardata">character data</a>; an XML
944         processor may, but need not, make it possible for an application to
945         retrieve the text of comments.
946         <a href="#dt-compat">For compatibility</a>, the string
947         "<code>--</code>" (double-hyphen) must not occur within
948         comments.
949
950         <h5>Comments</h5>
951         <table class="scrap">
952            <tbody>
953               <tr valign="baseline">
954                  <td><a name="NT-Comment"></a>[15]&nbsp;&nbsp;&nbsp;
955                  </td>
956                  <td>Comment</td>
957                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
958                  <td>'&lt;!--'
959                     ((<a href="#NT-Char">Char</a> - '-')
960                     | ('-' (<a href="#NT-Char">Char</a> - '-')))*
961                     '-->'
962                  </td>
963                  <td></td>
964               </tr>
965            </tbody>
966         </table>
967
968      </p>
969
970      <p>An example of a comment:
971         <pre>&lt;!-- declarations for &lt;head> &amp; &lt;body> --></pre>
972         </p>
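      <p>The restriction on "<code>--</code>" is easy to check before a comment is written.
         The helper below is a hypothetical Python sketch of production [15]. It assumes the text
         already consists of legal characters (<a href="#NT-Char">Char</a>) and checks only the
         hyphen rule.
      </p>
      <pre>
def is_comment_body(text):
    """Check the text between the comment delimiters against [15] Comment.

    Every hyphen must be followed by a character other than a hyphen, so
    the body may neither contain "--" nor end with "-" (that final hyphen
    would run into the closing delimiter and form "--").
    """
    return "--" not in text and not text.endswith("-")
</pre>
      <p>The body of the example above,
         "<code> declarations for &lt;head> &amp; &lt;body> </code>", passes. The
         "<code>&lt;</code>" and "<code>&amp;</code>" characters need no escaping inside a comment.
      </p>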
973
974
975
976
977      <h3><a name="sec-pi"></a>2.6 Processing Instructions
978      </h3>
979
980
981      <p><a name="dt-pi"></a><b>Processing
982            instructions
983         </b> (PIs) allow documents to contain instructions
984         for applications.
985
986
987         <h5>Processing Instructions</h5>
988         <table class="scrap">
989            <tbody>
990               <tr valign="baseline">
991                  <td><a name="NT-PI"></a>[16]&nbsp;&nbsp;&nbsp;
992                  </td>
993                  <td>PI</td>
994                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
995                  <td>'&lt;?' <a href="#NT-PITarget">PITarget</a>
996                     (<a href="#NT-S">S</a>
997                     (<a href="#NT-Char">Char</a>* -
998                     (<a href="#NT-Char">Char</a>* '?>' <a href="#NT-Char">Char</a>*)))?
999                     '?>'
1000                  </td>
1001                  <td></td>
1002               </tr>
1003               <tr valign="baseline">
1004                  <td><a name="NT-PITarget"></a>[17]&nbsp;&nbsp;&nbsp;
1005                  </td>
1006                  <td>PITarget</td>
1007                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1008                  <td><a href="#NT-Name">Name</a> -
1009                     (('X' | 'x') ('M' | 'm') ('L' | 'l'))
1010                  </td>
1011                  <td></td>
1012               </tr>
1013            </tbody>
1014         </table>
         PIs are not part of the document's <a href="#dt-chardata">character data</a>, but must be passed through to the application. The
1018         PI begins with a target (<a href="#NT-PITarget">PITarget</a>) used
1019         to identify the application to which the instruction is directed.
1020         The target names "<code>XML</code>", "<code>xml</code>", and so on are
1021         reserved for standardization in this or future versions of this
1022         specification.
1023         The
1024         XML <a href="#dt-notation">Notation</a> mechanism
1025         may be used for
1026         formal declaration of PI targets.
1027
1028      </p>
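      <p>The two lexical rules on PIs can be checked separately. The target must be a Name other
         than "<code>xml</code>" in any combination of case, and the instruction text may not
         contain "<code>?></code>". The Python sketch below is illustrative only. Its function
         names are hypothetical, and it assumes ASCII names only.
      </p>
      <pre>
import re

# ASCII-only approximation of [5] Name (assumption: the full production
# also admits non-ASCII letters, digits, combining characters, extenders).
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_pi_target(target):
    # [17] PITarget excludes exactly the three-letter name "xml", in any
    # mix of upper and lower case; longer names such as "xml-stylesheet"
    # still match the production.
    return NAME.fullmatch(target) is not None and target.lower() != "xml"


def is_pi_text(text):
    # [16] PI: the text after the target may not contain "?>", which
    # would end the instruction early.
    return "?>" not in text
</pre>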
1029
1030
1031
1032
1033      <h3><a name="sec-cdata-sect"></a>2.7 CDATA Sections
1034      </h3>
1035
1036
1037      <p><a name="dt-cdsection"></a><b>CDATA sections</b>
1038         may occur
1039         anywhere character data may occur; they are
1040         used to escape blocks of text containing characters which would
1041         otherwise be recognized as markup.  CDATA sections begin with the
1042         string "<code>&lt;![CDATA[</code>" and end with the string
1043         "<code>]]></code>":
1044
1045         <h5>CDATA Sections</h5>
1046         <table class="scrap">
1047            <tbody>
1048               <tr valign="baseline">
1049                  <td><a name="NT-CDSect"></a>[18]&nbsp;&nbsp;&nbsp;
1050                  </td>
1051                  <td>CDSect</td>
1052                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1053                  <td><a href="#NT-CDStart">CDStart</a>
1054                     <a href="#NT-CData">CData</a>
1055                     <a href="#NT-CDEnd">CDEnd</a></td>
1056                  <td></td>
1057               </tr>
1058               <tr valign="baseline">
1059                  <td><a name="NT-CDStart"></a>[19]&nbsp;&nbsp;&nbsp;
1060                  </td>
1061                  <td>CDStart</td>
1062                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1063                  <td>'&lt;![CDATA['</td>
1064                  <td></td>
1065               </tr>
1066               <tr valign="baseline">
1067                  <td><a name="NT-CData"></a>[20]&nbsp;&nbsp;&nbsp;
1068                  </td>
1069                  <td>CData</td>
1070                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1071                  <td>(<a href="#NT-Char">Char</a>* -
1072                     (<a href="#NT-Char">Char</a>* ']]>' <a href="#NT-Char">Char</a>*))
1073
1074                  </td>
1075                  <td></td>
1076               </tr>
1077               <tr valign="baseline">
1078                  <td><a name="NT-CDEnd"></a>[21]&nbsp;&nbsp;&nbsp;
1079                  </td>
1080                  <td>CDEnd</td>
1081                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1082                  <td>']]>'</td>
1083                  <td></td>
1084               </tr>
1085            </tbody>
1086         </table>
1087
1088         Within a CDATA section, only the <a href="#NT-CDEnd">CDEnd</a> string is
1089         recognized as markup, so that left angle brackets and ampersands may occur in
1090         their literal form; they need not (and cannot) be escaped using
1091         "<code>&amp;lt;</code>" and "<code>&amp;amp;</code>".  CDATA sections
1092         cannot nest.
1093
1094      </p>
1095
1096
1097      <p>An example of a CDATA section, in which "<code>&lt;greeting></code>" and
1098         "<code>&lt;/greeting></code>"
1099         are recognized as <a href="#dt-chardata">character data</a>, not
1100         <a href="#dt-markup">markup</a>:
1101         <pre>&lt;![CDATA[&lt;greeting>Hello, world!&lt;/greeting>]]></pre>
1102         </p>
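      <p>A CDATA section ends at the first "<code>]]></code>", so text containing that string
         cannot be placed in a single section. A common workaround splits the string across two
         adjacent sections. The first section ends after "<code>]]</code>" and the next begins with
         "<code>></code>". It is shown here as a hypothetical Python helper, not as anything this
         specification defines.
      </p>
      <pre>
def to_cdata(text):
    """Wrap text in one or more adjacent CDATA sections.

    Each "]]>" in the input becomes "]]" + CDEnd + CDStart + ">", so no
    section contains CDEnd and the concatenated character data of all
    the sections equals the original text.
    """
    return "&lt;![CDATA[" + text.replace("]]>", "]]]]>&lt;![CDATA[>") + "]]>"
</pre>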
1103
1104
1105
1106
1107      <h3><a name="sec-prolog-dtd"></a>2.8 Prolog and Document Type Declaration
1108      </h3>
1109
1110
1111      <p><a name="dt-xmldecl"></a>XML documents
1112         may, and should,
1113         begin with an <b>XML declaration</b> which specifies
1114         the version of
1115         XML being used.
1116         For example, the following is a complete XML document, <a href="#dt-wellformed">well-formed</a> but not
1117         <a href="#dt-valid">valid</a>:
1118         <pre>&lt;?xml version="1.0"?>
1119&lt;greeting>Hello, world!&lt;/greeting>
1120</pre>
1121         and so is this:
1122         <pre>&lt;greeting>Hello, world!&lt;/greeting>
1123</pre>
1124         </p>
1125
1126
1127      <p>The version number "<code>1.0</code>" should be used to indicate
1128         conformance to this version of this specification; it is an error
1129         for a document to use the value "<code>1.0</code>"
1130         if it does not conform to this version of this specification.
1131         It is the intent
1132         of the XML working group to give later versions of this specification
         numbers other than "<code>1.0</code>", but this intent does not
         indicate a commitment to produce any future versions of XML, nor, if any
         are produced, to use any particular numbering scheme.
1137         Since future versions are not ruled out, this construct is provided
1138         as a means to allow the possibility of automatic version recognition, should
1139         it become necessary.
1140         Processors may signal an error if they receive documents labeled with
1141         versions they do not support.
1142
1143      </p>
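      <p>A processor that supports only some versions can read the declared version number
         through the <code>XmlDeclHandler</code> callback of Python's standard
         <code>xml.parsers.expat</code> module. In this sketch, <code>SUPPORTED_VERSIONS</code>
         and <code>check_version</code> are hypothetical. Raising <code>ValueError</code> stands
         in for whatever error reporting an application uses.
      </p>
      <pre>
import xml.parsers.expat

SUPPORTED_VERSIONS = {"1.0"}  # hypothetical policy for this sketch


def check_version(document):
    """Return the declared version, or None if there is no XML declaration.

    Raises ValueError for a declared version outside SUPPORTED_VERSIONS.
    """
    declared = {}

    def on_xml_decl(version, encoding, standalone):
        declared["version"] = version

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(document, True)
    version = declared.get("version")
    if version is not None and version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported XML version: " + version)
    return version
</pre>
      <p>This sketch parses the whole document before checking. A streaming processor could
         stop as soon as the declaration has been read.
      </p>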
1144
1145      <p>The function of the markup in an XML document is to describe its
1146         storage and logical structure and to associate attribute-value pairs
1147         with its logical structures.  XML provides a mechanism, the <a href="#dt-doctype">document type declaration</a>, to define
1148         constraints on the logical structure and to support the use of
1149         predefined storage units.
1150
1151         <a name="dt-valid"></a>An XML document is
1152         <b>valid</b> if it has an associated document type
1153         declaration and if the document
1154         complies with the constraints expressed in it.
1155      </p>
1156
1157      <p>The document type declaration must appear before
1158         the first <a href="#dt-element">element</a> in the document.
1159
1160         <h5>Prolog</h5>
1161         <table class="scrap">
1162            <tbody>
1163
1164               <tr valign="baseline">
1165                  <td><a name="NT-prolog"></a>[22]&nbsp;&nbsp;&nbsp;
1166                  </td>
1167                  <td>prolog</td>
1168                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1169                  <td><a href="#NT-XMLDecl">XMLDecl</a>?
1170                     <a href="#NT-Misc">Misc</a>*
1171                     (<a href="#NT-doctypedecl">doctypedecl</a>
1172                     <a href="#NT-Misc">Misc</a>*)?
1173                  </td>
1174                  <td></td>
1175               </tr>
1176
1177               <tr valign="baseline">
1178                  <td><a name="NT-XMLDecl"></a>[23]&nbsp;&nbsp;&nbsp;
1179                  </td>
1180                  <td>XMLDecl</td>
1181                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1182                  <td>'&lt;?xml'
1183                     <a href="#NT-VersionInfo">VersionInfo</a>
1184                     <a href="#NT-EncodingDecl">EncodingDecl</a>?
1185                     <a href="#NT-SDDecl">SDDecl</a>?
1186                     <a href="#NT-S">S</a>?
1187                     '?>'
1188                  </td>
1189                  <td></td>
1190               </tr>
1191
1192               <tr valign="baseline">
1193                  <td><a name="NT-VersionInfo"></a>[24]&nbsp;&nbsp;&nbsp;
1194                  </td>
1195                  <td>VersionInfo</td>
1196                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1197                  <td><a href="#NT-S">S</a> 'version' <a href="#NT-Eq">Eq</a>
                     ("'" <a href="#NT-VersionNum">VersionNum</a> "'"
                     | '"' <a href="#NT-VersionNum">VersionNum</a> '"')
1200                  </td>
1201                  <td></td>
1202               </tr>
1203
1204               <tr valign="baseline">
1205                  <td><a name="NT-Eq"></a>[25]&nbsp;&nbsp;&nbsp;
1206                  </td>
1207                  <td>Eq</td>
1208                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1209                  <td><a href="#NT-S">S</a>? '=' <a href="#NT-S">S</a>?
1210                  </td>
1211                  <td></td>
1212               </tr>
1213
1214               <tr valign="baseline">
1215                  <td><a name="NT-VersionNum"></a>[26]&nbsp;&nbsp;&nbsp;
1216                  </td>
1217                  <td>VersionNum</td>
1218                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1219                  <td>([a-zA-Z0-9_.:] | '-')+</td>
1220                  <td></td>
1221               </tr>
1222
1223               <tr valign="baseline">
1224                  <td><a name="NT-Misc"></a>[27]&nbsp;&nbsp;&nbsp;
1225                  </td>
1226                  <td>Misc</td>
1227                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1228                  <td><a href="#NT-Comment">Comment</a> | <a href="#NT-PI">PI</a> |
1229                     <a href="#NT-S">S</a></td>
1230                  <td></td>
1231               </tr>
1232
1233            </tbody>
1234         </table>
1235      </p>
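      <p>Productions [24] to [26] are regular, so VersionInfo can be recognized with a single
         regular expression. In the Python sketch below the names are hypothetical. The two
         alternatives of the final group ensure that the closing delimiter matches the opening one.
      </p>
      <pre>
import re

S = r"[ \t\r\n]+"                  # [3] S
EQ = r"[ \t\r\n]*=[ \t\r\n]*"      # [25] Eq ::= S? '=' S?
VERSION_NUM = r"[a-zA-Z0-9_.:-]+"  # [26] VersionNum

# [24] VersionInfo: the number is delimited by apostrophes or by double
# quotes, and the closing delimiter must match the opening one.
VERSION_INFO = re.compile(
    S + "version" + EQ
    + "(?:'(" + VERSION_NUM + ")'|\"(" + VERSION_NUM + ")\")"
)


def parse_version_info(text):
    """Return the VersionNum if text begins with a VersionInfo, else None."""
    m = VERSION_INFO.match(text)
    if m is None:
        return None
    return m.group(1) or m.group(2)
</pre>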
1236
1237
1238      <p><a name="dt-doctype"></a>The XML
1239         <b>document type declaration</b>
1240         contains or points to
1241         <a href="#dt-markupdecl">markup declarations</a>
1242         that provide a grammar for a
1243         class of documents.
1244         This grammar is known as a document type definition,
1245         or <b>DTD</b>.
1246         The document type declaration can point to an external subset (a
1247         special kind of
1248         <a href="#dt-extent">external entity</a>) containing markup
1249         declarations, or can
1250         contain the markup declarations directly in an internal subset, or can do
1251         both.
1252         The DTD for a document consists of both subsets taken
1253         together.
1254
1255      </p>
1256
1257      <p><a name="dt-markupdecl"></a>
1258         A <b>markup declaration</b> is
1259         an <a href="#dt-eldecl">element type declaration</a>,
1260         an <a href="#dt-attdecl">attribute-list declaration</a>,
1261         an <a href="#dt-entdecl">entity declaration</a>, or
1262         a <a href="#dt-notdecl">notation declaration</a>.
1263
1264         These declarations may be contained in whole or in part
1265         within <a href="#dt-PE">parameter entities</a>,
1266         as described in the well-formedness and validity constraints below.
1267         For fuller information, see
         <a href="#sec-physical-struct">[<b>4 Physical Structures</b>]</a>.
1270      </p>
1271
1272      <h5>Document Type Definition</h5>
1273      <table class="scrap">
1274         <tbody>
1275
1276            <tr valign="baseline">
1277               <td><a name="NT-doctypedecl"></a>[28]&nbsp;&nbsp;&nbsp;
1278               </td>
1279               <td>doctypedecl</td>
1280               <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1281               <td>'&lt;!DOCTYPE' <a href="#NT-S">S</a>
1282                  <a href="#NT-Name">Name</a> (<a href="#NT-S">S</a>
1283                  <a href="#NT-ExternalID">ExternalID</a>)?
1284                  <a href="#NT-S">S</a>? ('['
1285                  (<a href="#NT-markupdecl">markupdecl</a>
1286                  | <a href="#NT-PEReference">PEReference</a>
1287                  | <a href="#NT-S">S</a>)*
1288                  ']'
1289                  <a href="#NT-S">S</a>?)? '>'
1290               </td>
1291               <td>[&nbsp;VC:&nbsp;<a href="#vc-roottype">Root Element Type</a>&nbsp;]
1292               </td>
1293            </tr>
1294
1295            <tr valign="baseline">
1296               <td><a name="NT-markupdecl"></a>[29]&nbsp;&nbsp;&nbsp;
1297               </td>
1298               <td>markupdecl</td>
1299               <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1300               <td><a href="#NT-elementdecl">elementdecl</a>
1301                  | <a href="#NT-AttlistDecl">AttlistDecl</a>
1302                  | <a href="#NT-EntityDecl">EntityDecl</a>
1303                  | <a href="#NT-NotationDecl">NotationDecl</a>
1304                  | <a href="#NT-PI">PI</a>
1305                  | <a href="#NT-Comment">Comment</a>
1306
1307               </td>
1308               <td>[&nbsp;VC:&nbsp;<a href="#vc-PEinMarkupDecl">Proper Declaration/PE Nesting</a>&nbsp;]
1309               </td>
1310            </tr>
1311            <tr valign="baseline">
1312               <td></td>
1313               <td></td>
1314               <td></td>
1315               <td></td>
1316               <td>[&nbsp;WFC:&nbsp;<a href="#wfc-PEinInternalSubset">PEs in Internal Subset</a>&nbsp;]
1317               </td>
1318            </tr>
1319
1320
1321         </tbody>
1322      </table>
1323
1324
1325      <p>The markup declarations may be made up in whole or in part of
1326         the <a href="#dt-repltext">replacement text</a> of
1327         <a href="#dt-PE">parameter entities</a>.
1328         The productions later in this specification for
1329         individual nonterminals (<a href="#NT-elementdecl">elementdecl</a>,
1330         <a href="#NT-AttlistDecl">AttlistDecl</a>, and so on) describe
1331         the declarations <i>after</i> all the parameter entities have been
1332         <a href="#dt-include">included</a>.
1333      </p>
1334
1335      <a name="vc-roottype"></a><p><b>Validity Constraint: Root Element Type</b></p>
1337
1338      <p>
1339         The <a href="#NT-Name">Name</a> in the document type declaration must
1340         match the element type of the <a href="#dt-root">root element</a>.
1341
1342      </p>
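      <p>Python's standard <code>xml.parsers.expat</code> module wraps Expat, a non-validating
         parser, so it does not enforce this constraint. It does report both names involved. The
         hypothetical helper below compares them using the <code>StartDoctypeDeclHandler</code>
         and <code>StartElementHandler</code> callbacks. It is a sketch, not a complete validator.
      </p>
      <pre>
import xml.parsers.expat


def root_matches_doctype(document):
    """Check the Root Element Type constraint for one document.

    Returns True or False, or None when the document has no document
    type declaration (the constraint then does not apply).
    """
    seen = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = name

    def on_start(name, attributes):
        if seen["root"] is None:  # only the first element is the root
            seen["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    if seen["doctype"] is None:
        return None
    return seen["doctype"] == seen["root"]
</pre>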
1343
1344
1345      <a name="vc-PEinMarkupDecl"></a><p><b>Validity Constraint: Proper Declaration/PE Nesting</b></p>
1347
1348      <p>Parameter-entity
1349         <a href="#dt-repltext">replacement text</a> must be properly nested
1350         with markup declarations.
1351         That is to say, if either the first character
1352         or the last character of a markup
1353         declaration (<a href="#NT-markupdecl">markupdecl</a> above)
1354         is contained in the replacement text for a
1355         <a href="#dt-PERef">parameter-entity reference</a>,
1356         both must be contained in the same replacement text.
1357      </p>
1358
1359      <a name="wfc-PEinInternalSubset"></a><p><b>Well Formedness Constraint: PEs in Internal Subset</b></p>
1361
1362      <p>In the internal DTD subset,
1363         <a href="#dt-PERef">parameter-entity references</a>
1364         can occur only where markup declarations can occur, not
1365         within markup declarations.  (This does not apply to
1366         references that occur in
1367         external parameter entities or to the external subset.)
1368
1369      </p>
1370
1371
1372      <p>
1373         Like the internal subset, the external subset and
1374         any external parameter entities referred to in the DTD
1375         must consist of a series of complete markup declarations of the types
1376         allowed by the non-terminal symbol
1377         <a href="#NT-markupdecl">markupdecl</a>, interspersed with white space
1378         or <a href="#dt-PERef">parameter-entity references</a>.
1379         However, portions of the contents
1380         of the
1381         external subset or of external parameter entities may conditionally be ignored
1382         by using
1383         the <a href="#dt-cond-section">conditional section</a>
1384         construct; this is not allowed in the internal subset.
1385
1386
1387         <h5>External Subset</h5>
1388         <table class="scrap">
1389            <tbody>
1390
1391               <tr valign="baseline">
1392                  <td><a name="NT-extSubset"></a>[30]&nbsp;&nbsp;&nbsp;
1393                  </td>
1394                  <td>extSubset</td>
1395                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1396                  <td><a href="#NT-TextDecl">TextDecl</a>?
1397                     <a href="#NT-extSubsetDecl">extSubsetDecl</a></td>
1398                  <td></td>
1399               </tr>
1400
1401               <tr valign="baseline">
1402                  <td><a name="NT-extSubsetDecl"></a>[31]&nbsp;&nbsp;&nbsp;
1403                  </td>
1404                  <td>extSubsetDecl</td>
1405                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
1406                  <td>(
1407                     <a href="#NT-markupdecl">markupdecl</a>
1408                     | <a href="#NT-conditionalSect">conditionalSect</a>
1409                     | <a href="#NT-PEReference">PEReference</a>
1410                     | <a href="#NT-S">S</a>
1411                     )*
1412                  </td>
1413                  <td></td>
1414               </tr>
1415
1416            </tbody>
1417         </table>
1418      </p>
1419
1420      <p>The external subset and external parameter entities also differ
1421         from the internal subset in that in them,
1422         <a href="#dt-PERef">parameter-entity references</a>
1423         are permitted <i>within</i> markup declarations,
1424         not only <i>between</i> markup declarations.
1425      </p>
1426
1427      <p>An example of an XML document with a document type declaration:
1428         <pre>&lt;?xml version="1.0"?>
1429&lt;!DOCTYPE greeting SYSTEM "hello.dtd">
1430&lt;greeting>Hello, world!&lt;/greeting>
1431</pre>
1432         The <a href="#dt-sysid">system identifier</a>
1433         "<code>hello.dtd</code>" gives the URI of a DTD for the document.
1434      </p>
1435
1436      <p>The declarations can also be given locally, as in this
1437         example:
1438         <pre>&lt;?xml version="1.0" encoding="UTF-8" ?>
1439&lt;!DOCTYPE greeting [
1440  &lt;!ELEMENT greeting (#PCDATA)>
1441]>
1442&lt;greeting>Hello, world!&lt;/greeting>
1443</pre>
1444         If both the external and internal subsets are used, the
1445         internal subset is considered to occur before the external subset.
1446
1447         This has the effect that entity and attribute-list declarations in the
1448         internal subset take precedence over those in the external subset.
1449         </p>




      <h3><a name="sec-rmd"></a>2.9 Standalone Document Declaration
      </h3>

      <p>Markup declarations can affect the content of the document,
         as passed from an <a href="#dt-xml-proc">XML processor</a>
         to an application; examples are attribute defaults and entity
         declarations.
         The standalone document declaration,
         which may appear as a component of the XML declaration, signals
         whether or not there are such declarations which appear external to
         the <a href="#dt-docent">document entity</a>.

         <h5>Standalone Document Declaration</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-SDDecl"></a>[32]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>SDDecl</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>
                     <a href="#NT-S">S</a>
                     'standalone' <a href="#NT-Eq">Eq</a>
                     (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))

                  </td>
                  <td>[&nbsp;VC:&nbsp;<a href="#vc-check-rmd">Standalone Document Declaration</a>&nbsp;]
                  </td>
               </tr>

            </tbody>
         </table>
      </p>

      <p>
         In a standalone document declaration, the value "<code>yes</code>" indicates
         that there
         are no markup declarations external to the <a href="#dt-docent">document
            entity
         </a> (either in the DTD external subset, or in an
         external parameter entity referenced from the internal subset)
         which affect the information passed from the XML processor to
         the application.
         The value "<code>no</code>" indicates that there are or may be such
         external markup declarations.
         Note that the standalone document declaration only
         denotes the presence of external <i>declarations</i>; the presence, in a
         document, of
         references to external <i>entities</i>, when those entities are
         internally declared,
         does not change its standalone status.
      </p>

      <p>If there are no external markup declarations, the standalone document
         declaration has no meaning.
         If there are external markup declarations but there is no standalone
         document declaration, the value "<code>no</code>" is assumed.
      </p>
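<p>As an illustration, Python's <code>xml.parsers.expat</code> module reports the standalone document declaration through its <code>XmlDeclHandler</code> callback, passing 1 for <code>yes</code>, 0 for <code>no</code>, and -1 when the clause is omitted. The helper <code>standalone_value</code> below is a hypothetical sketch that applies the default described above; it does not check whether any external markup declarations actually exist, which is the only case in which the value has meaning.</p>

```python
import xml.parsers.expat


def standalone_value(document):
    """Return 'yes' or 'no', assuming 'no' when no SDDecl is present."""
    seen = {"standalone": -1}

    def on_xml_decl(version, encoding, standalone):
        # expat passes 1 for 'yes', 0 for 'no' and -1 if the clause is omitted
        seen["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(document, True)  # raises ExpatError if not well-formed
    return "yes" if seen["standalone"] == 1 else "no"


print(standalone_value("<?xml version='1.0' standalone='yes'?><greeting/>"))  # yes
print(standalone_value("<greeting/>"))  # no
```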

      <p>Any XML document for which <code>standalone="no"</code> holds can
         be converted algorithmically to a standalone document,
         which may be desirable for some network delivery applications.
      </p>
      <a name="vc-check-rmd"></a><p><b>Validity Constraint: Standalone Document Declaration</b></p>

      <p>The standalone document declaration must have
         the value "<code>no</code>" if any external markup declarations
         contain declarations of:
      </p>
      <ul>

         <li>
            <p>attributes with <a href="#dt-default">default</a> values, if
               elements to which
               these attributes apply appear in the document without
               specifications of values for these attributes, or
            </p>
         </li>

         <li>
            <p>entities (other than <code>amp</code>,
               <code>lt</code>,
               <code>gt</code>,
               <code>apos</code>,
               <code>quot</code>),
               if <a href="#dt-entref">references</a> to those
               entities appear in the document, or
            </p>

         </li>

         <li>
            <p>attributes with values subject to
               <a href="#AVNormalize">normalization</a>, where the
               attribute appears in the document with a value which will
               change as a result of normalization, or
            </p>

         </li>

         <li>

            <p>element types with <a href="#dt-elemcontent">element content</a>,
               if white space occurs
               directly within any instance of those types.

            </p>
         </li>

      </ul>



      <p>An example XML declaration with a standalone document declaration:<pre>&lt;?xml version="1.0" standalone='yes'?></pre></p>



      <h3><a name="sec-white-space"></a>2.10 White Space Handling
      </h3>


      <p>In editing XML documents, it is often convenient to use "white space"
         (spaces, tabs, and blank lines, denoted by the nonterminal
         <a href="#NT-S">S</a> in this specification) to
         set apart the markup for greater readability. Such white space is typically
         not intended for inclusion in the delivered version of the document.
         On the other hand, "significant" white space that should be preserved in the
         delivered version is common, for example in poetry and
         source code.
      </p>

      <p>An <a href="#dt-xml-proc">XML processor</a>
         must always pass all characters in a document that are not
         markup through to the application. A <a href="#dt-validating">
            validating XML processor
         </a> must also inform the application
         which of these characters constitute white space appearing
         in <a href="#dt-elemcontent">element content</a>.

      </p>

      <p>A special <a href="#dt-attr">attribute</a>
         named xml:space may be attached to an element
         to signal an intention that in that element,
         white space should be preserved by applications.
         In valid documents, this attribute, like any other, must be
         <a href="#dt-attdecl">declared</a> if it is used.
         When declared, it must be given as an
         <a href="#dt-enumerated">enumerated type</a> whose only
         possible values are "<code>default</code>" and "<code>preserve</code>".
         For example:<pre>    &lt;!ATTLIST poem   xml:space (default|preserve) 'preserve'></pre></p>

      <p>The value "<code>default</code>" signals that applications'
         default white-space processing modes are acceptable for this element; the
         value "<code>preserve</code>" indicates the intent that applications preserve
         all the white space.
         This declared intent is considered to apply to all elements within the content
         of the element where it is specified, unless overridden with another instance
         of the xml:space attribute.

      </p>

      <p>The <a href="#dt-root">root element</a> of any document
         is considered to have signaled no intentions regarding application
         white-space handling, unless it provides a value for
         this attribute or the attribute is declared with a default value.

      </p>
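<p>The inheritance rule can be sketched with Python's <code>xml.etree.ElementTree</code>, which exposes <code>xml:space</code> under the reserved namespace name <code>http://www.w3.org/XML/1998/namespace</code>. The generator <code>space_modes</code> is hypothetical; it reads only attribute values present in the document and does not apply defaults declared in a DTD, so <code>None</code> stands for "no intention signaled".</p>

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def space_modes(root, inherited=None):
    """Yield (element, effective xml:space value) pairs, top-down.

    An element's own xml:space wins; otherwise it inherits the value in
    effect on its parent. None means no intention has been signaled.
    """
    mode = root.get(XML_SPACE, inherited)
    yield root, mode
    for child in root:
        yield from space_modes(child, mode)


doc = ET.fromstring(
    '<anthology><poem xml:space="preserve"><l>  two  spaces </l>'
    '<note xml:space="default">n</note></poem></anthology>'
)
for elem, mode in space_modes(doc):
    print(elem.tag, mode)
# anthology None
# poem preserve
# l preserve
# note default
```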




      <h3><a name="sec-line-ends"></a>2.11 End-of-Line Handling
      </h3>

      <p>XML <a href="#dt-parsedent">parsed entities</a> are often stored in
         computer files which, for editing convenience, are organized into lines.
         These lines are typically separated by some combination of the characters
         carriage-return (#xD) and line-feed (#xA).
      </p>

      <p>To simplify the tasks of <a href="#dt-app">applications</a>,
         wherever an external parsed entity or the literal entity value
         of an internal parsed entity contains either the literal
         two-character sequence "#xD#xA" or a standalone literal
         #xD, an <a href="#dt-xml-proc">XML processor</a> must
         pass to the application the single character #xA.
         (This behavior can
         conveniently be produced by normalizing all
         line breaks to #xA on input, before parsing.)

      </p>
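<p>A minimal Python sketch of that input normalization follows; the function name is illustrative. The two-character sequence must be replaced before lone carriage returns, or each #xD#xA pair would become two line feeds. For comparison, Python's <code>open()</code> in text mode performs the same translation by default.</p>

```python
def normalize_line_ends(text):
    """Translate "#xD#xA" pairs and lone #xD characters to a single #xA."""
    # Replace the pair first so that it yields one #xA, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_line_ends("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
```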



      <h3><a name="sec-lang-tag"></a>2.12 Language Identification
      </h3>

      <p>In document processing, it is often useful to
         identify the natural or formal language
         in which the content is
         written.
         A special <a href="#dt-attr">attribute</a> named
         xml:lang may be inserted in
         documents to specify the
         language used in the contents and attribute values
         of any element in an XML document.
         In valid documents, this attribute, like any other, must be
         <a href="#dt-attdecl">declared</a> if it is used.
         The values of the attribute are language identifiers as defined
         by <a href="#RFC1766">[IETF RFC 1766]</a>, "Tags for the Identification of Languages":

         <h5>Language Identification</h5>
         <table class="scrap">
            <tbody>
               <tr valign="baseline">
                  <td><a name="NT-LanguageID"></a>[33]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>LanguageID</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td><a href="#NT-Langcode">Langcode</a>
                     ('-' <a href="#NT-Subcode">Subcode</a>)*
                  </td>
                  <td></td>
               </tr>
               <tr valign="baseline">
                  <td><a name="NT-Langcode"></a>[34]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>Langcode</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td><a href="#NT-ISO639Code">ISO639Code</a> |
                     <a href="#NT-IanaCode">IanaCode</a> |
                     <a href="#NT-UserCode">UserCode</a></td>
                  <td></td>
               </tr>
               <tr valign="baseline">
                  <td><a name="NT-ISO639Code"></a>[35]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>ISO639Code</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>([a-z] | [A-Z]) ([a-z] | [A-Z])</td>
                  <td></td>
               </tr>
               <tr valign="baseline">
                  <td><a name="NT-IanaCode"></a>[36]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>IanaCode</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>('i' | 'I') '-' ([a-z] | [A-Z])+</td>
                  <td></td>
               </tr>
               <tr valign="baseline">
                  <td><a name="NT-UserCode"></a>[37]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>UserCode</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>('x' | 'X') '-' ([a-z] | [A-Z])+</td>
                  <td></td>
               </tr>
               <tr valign="baseline">
                  <td><a name="NT-Subcode"></a>[38]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>Subcode</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>([a-z] | [A-Z])+</td>
                  <td></td>
               </tr>
            </tbody>
         </table>
         The <a href="#NT-Langcode">Langcode</a> may be any of the following:

         <ul>

            <li>
               <p>a two-letter language code as defined by
                  <a href="#ISO639">[ISO 639]</a>, "Codes
                  for the representation of names of languages"
               </p>
            </li>

            <li>
               <p>a language identifier registered with the Internet
                  Assigned Numbers Authority <a href="#IANA">[IANA]</a>; these begin with the
                  prefix "<code>i-</code>" (or "<code>I-</code>")
               </p>
            </li>

            <li>
               <p>a language identifier assigned by the user, or agreed on
                  between parties in private use; these must begin with the
                  prefix "<code>x-</code>" or "<code>X-</code>" in order to ensure that they do not conflict
                  with names later standardized or registered with IANA
               </p>
            </li>

         </ul>
      </p>

      <p>There may be any number of <a href="#NT-Subcode">Subcode</a> segments; if
         the first
         subcode segment exists and the Subcode consists of two
         letters, then it must be a country code from
         <a href="#ISO3166">[ISO 3166]</a>, "Codes
         for the representation of names of countries."
         If the first
         subcode consists of more than two letters, it must be
         a subcode for the language in question registered with IANA,
         unless the <a href="#NT-Langcode">Langcode</a> begins with the prefix
         "<code>x-</code>" or
         "<code>X-</code>".
      </p>

      <p>It is customary to give the language code in lower case, and
         the country code (if any) in upper case.
         Note that these values, unlike other names in XML documents,
         are case insensitive.
      </p>
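<p>Productions [33] to [38] translate directly into a regular expression. The Python sketch below (helper names are illustrative) checks only the syntax: it does not verify that a code actually appears in ISO 639, ISO 3166 or the IANA registry, nor does it enforce the country-code rule for a two-letter first subcode. Because the values are case insensitive, the comparison helper folds case first.</p>

```python
import re

# Productions [33]-[38]: a two-letter ISO 639 code, an i-/I- IANA code or an
# x-/X- user code, followed by any number of '-' Subcode segments.
LANGUAGE_ID = re.compile(
    r"(?:[a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]+)(?:-[a-zA-Z]+)*"
)


def is_language_id(value):
    """Check LanguageID syntax only; no registry is consulted."""
    return LANGUAGE_ID.fullmatch(value) is not None


def same_language(a, b):
    """Compare two language identifiers, which are case insensitive."""
    return a.lower() == b.lower()


print(is_language_id("en-GB"), is_language_id("i-klingon"), is_language_id("english"))
# True True False
print(same_language("en-GB", "EN-gb"))  # True
```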

      <p>For example:
         <pre>&lt;p xml:lang="en">The quick brown fox jumps over the lazy dog.&lt;/p>
&lt;p xml:lang="en-GB">What colour is it?&lt;/p>
&lt;p xml:lang="en-US">What color is it?&lt;/p>
&lt;sp who="Faust" desc='leise' xml:lang="de">
  &lt;l>Habe nun, ach! Philosophie,&lt;/l>
  &lt;l>Juristerei, und Medizin&lt;/l>
  &lt;l>und leider auch Theologie&lt;/l>
  &lt;l>durchaus studiert mit hei&szlig;em Bem&uuml;h'n.&lt;/l>
  &lt;/sp></pre></p>


      <p>The intent declared with xml:lang is considered to apply to
         all attributes and content of the element where it is specified,
         unless overridden with an instance of xml:lang
         on another element within that content.
      </p>
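<p>Finding the language that applies to a given element therefore means finding the nearest element, starting from the element itself and moving outward through its ancestors, that specifies <code>xml:lang</code>. The Python sketch below uses <code>xml.etree.ElementTree</code>, whose elements have no parent pointers, so the hypothetical helper <code>effective_language</code> first builds a child-to-parent map; it ignores defaults declared in a DTD.</p>

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_language(root, target):
    """Return the xml:lang value in scope for target, or None."""
    parent_of = {child: elem for elem in root.iter() for child in elem}
    node = target
    while node is not None:
        if XML_LANG in node.attrib:
            return node.attrib[XML_LANG]  # nearest specification wins
        node = parent_of.get(node)        # None once we pass the root
    return None


doc = ET.fromstring(
    '<poems xml:lang="fr"><poem><l>Demain</l></poem>'
    '<gloss xml:lang="en">Tomorrow</gloss></poems>'
)
print(effective_language(doc, doc.find("poem/l")))  # fr
print(effective_language(doc, doc.find("gloss")))   # en
```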


      <p>A simple declaration for xml:lang might take
         the form
         <pre>xml:lang  NMTOKEN  #IMPLIED</pre>
         but specific default values may also be given, if appropriate.  In a
         collection of French poems for English students, with glosses and
         notes in English, the xml:lang attribute might be declared this way:
         <pre>    &lt;!ATTLIST poem   xml:lang NMTOKEN 'fr'>
    &lt;!ATTLIST gloss  xml:lang NMTOKEN 'en'>
    &lt;!ATTLIST note   xml:lang NMTOKEN 'en'></pre>
         </p>







      <h2><a name="sec-logical-struct"></a>3 Logical Structures
      </h2>


      <p><a name="dt-element"></a>Each <a href="#dt-xml-doc">XML document</a> contains one or more
         <b>elements</b>, the boundaries of which are
         either delimited by <a href="#dt-stag">start-tags</a>
         and <a href="#dt-etag">end-tags</a>, or, for <a href="#dt-empty">empty</a> elements, by an <a href="#dt-eetag">empty-element tag</a>. Each element has a type,
         identified by name, sometimes called its "generic
         identifier" (GI), and may have a set of
         attribute specifications.  Each attribute specification
         has a <a href="#dt-attrname">name</a> and a <a href="#dt-attrval">value</a>.

      </p>

      <h5>Element</h5>
      <table class="scrap">
         <tbody>
            <tr valign="baseline">
               <td><a name="NT-element"></a>[39]&nbsp;&nbsp;&nbsp;
               </td>
               <td>element</td>
               <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
               <td><a href="#NT-EmptyElemTag">EmptyElemTag</a></td>
               <td></td>
            </tr>
            <tr valign="baseline">
               <td></td>
               <td></td>
               <td></td>
               <td>| <a href="#NT-STag">STag</a> <a href="#NT-content">content</a>
                  <a href="#NT-ETag">ETag</a></td>
               <td>[&nbsp;WFC:&nbsp;<a href="#GIMatch">Element Type Match</a>&nbsp;]
               </td>
            </tr>
            <tr valign="baseline">
               <td></td>
               <td></td>
               <td></td>
               <td></td>
               <td>[&nbsp;VC:&nbsp;<a href="#elementvalid">Element Valid</a>&nbsp;]
               </td>
            </tr>
         </tbody>
      </table>

      <p>This specification does not constrain the semantics, use, or (beyond
         syntax) names of the element types and attributes, except that names
         beginning with a match to <code>(('X'|'x')('M'|'m')('L'|'l'))</code>
         are reserved for standardization in this or future versions of this
         specification.

      </p>
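<p>A tool that wants to warn about names in the reserved range needs only a case-insensitive prefix test, since the pattern above matches "xml" in any mix of upper and lower case. The function below is a hypothetical Python sketch, not part of any standard library.</p>

```python
def is_reserved_name(name):
    """True if name starts with 'xml' in any combination of case."""
    return name[:3].lower() == "xml"


for name in ("xml-stylesheet", "XMLData", "html", "mxl"):
    print(name, is_reserved_name(name))
```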
      <a name="GIMatch"></a><p><b>Well Formedness Constraint: Element Type Match</b></p>

      <p>
         The <a href="#NT-Name">Name</a> in an element's end-tag must match
         the element type in
         the start-tag.

      </p>
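<p>The constraint amounts to keeping a stack of open element types: each start-tag pushes its name, each end-tag must pop the same name, and empty-element tags leave the stack unchanged. The Python sketch below illustrates this with a deliberately simplified tag scanner (the regular expression and function name are assumptions for the example). It does not handle comments, processing instructions, CDATA sections, or a <code>></code> inside attribute values, so a real XML parser should be used in practice.</p>

```python
import re

# Simplified tag scanner: ignores comments, PIs, CDATA sections and any
# '>' inside attribute values, so it is an illustration, not a parser.
TAG = re.compile(r"<(/?)([A-Za-z_:][\w.:-]*)[^>]*?(/?)>")


def end_tags_match(text):
    """Check that every end-tag names the element its start-tag opened."""
    open_elements = []
    for m in TAG.finditer(text):
        closing, name, self_closing = m.groups()
        if self_closing:            # empty-element tag: nothing to match
            continue
        if not closing:             # start-tag: remember the element type
            open_elements.append(name)
        elif not open_elements or open_elements.pop() != name:
            return False            # end-tag for the wrong (or no) element
    return not open_elements        # every start-tag was closed


print(end_tags_match("<greeting>Hello</greeting>"))  # True
print(end_tags_match("<greeting>Hello</Greeting>"))  # False: names are case-sensitive
```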

      <a name="elementvalid"></a><p><b>Validity Constraint: Element Valid</b></p>

      <p>An element is
         valid if
         there is a declaration matching
         <a href="#NT-elementdecl">elementdecl</a> where the
         <a href="#NT-Name">Name</a> matches the element type, and
         one of the following holds:
      </p>

      <ol>

         <li>
            <p>The declaration matches EMPTY and the element has no
               <a href="#dt-content">content</a>.
            </p>
         </li>

         <li>
            <p>The declaration matches <a href="#NT-children">children</a> and
               the sequence of
               <a href="#dt-parentchild">child elements</a>
               belongs to the language generated by the regular expression in
               the content model, with optional white space (characters
               matching the nonterminal <a href="#NT-S">S</a>) between each pair
               of child elements.
            </p>
         </li>

         <li>
            <p>The declaration matches <a href="#NT-Mixed">Mixed</a> and
               the content consists of <a href="#dt-chardata">character
                  data
               </a> and <a href="#dt-parentchild">child elements</a>
               whose types match names in the content model.
            </p>
         </li>

         <li>
            <p>The declaration matches ANY, and the types
               of any <a href="#dt-parentchild">child elements</a> have
               been declared.
            </p>
         </li>

      </ol>
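<p>The second case can be made concrete by translating a <a href="#NT-children">children</a> content model into an ordinary regular expression over the sequence of child element types. In the Python sketch below (all names are illustrative), each child type is written as <code>name,</code>, sequence commas become concatenation, and choice bars and the <code>?</code>, <code>*</code> and <code>+</code> indicators keep their regular-expression meaning. It takes the child types as a list, so white space between children plays no part; it handles neither <a href="#NT-Mixed">Mixed</a> models nor parameter-entity references, and it uses a simplified definition of name characters.</p>

```python
import re

# Tokens of a children model: names, parentheses, ',' and '|' connectors,
# and the occurrence indicators. Name characters are simplified here.
TOKEN = re.compile(r"[^\s(),|?*+]+|[(),|?*+]")


def content_model_to_regex(model):
    """Compile a children model such as "(head, (p | list)*, foot?)"."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")                # non-capturing group
        elif tok == ",":
            continue                           # sequence = concatenation
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)                  # same meaning as in re
        else:
            parts.append("(?:" + re.escape(tok) + ",)")  # one child element
    return re.compile("".join(parts))


def children_valid(model, child_types):
    """True if the sequence of child element types matches the model."""
    sequence = "".join(name + "," for name in child_types)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_valid("(head, body)", ["head", "body"]))          # True
print(children_valid("(a, (b | c)*, d?)", ["a", "c", "b", "d"]))  # True
print(children_valid("(a, (b | c)*, d?)", ["a", "d", "b"]))       # False
```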




      <h3><a name="sec-starttags"></a>3.1 Start-Tags, End-Tags, and Empty-Element Tags
      </h3>


      <p><a name="dt-stag"></a>The beginning of every
         non-empty XML element is marked by a <b>start-tag</b>.

         <h5>Start-tag</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-STag"></a>[40]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>STag</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>'&lt;' <a href="#NT-Name">Name</a>
                     (<a href="#NT-S">S</a> <a href="#NT-Attribute">Attribute</a>)*
                     <a href="#NT-S">S</a>? '>'
                  </td>
                  <td>[&nbsp;WFC:&nbsp;<a href="#uniqattspec">Unique Att Spec</a>&nbsp;]
                  </td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-Attribute"></a>[41]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>Attribute</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td><a href="#NT-Name">Name</a> <a href="#NT-Eq">Eq</a>
                     <a href="#NT-AttValue">AttValue</a></td>
                  <td>[&nbsp;VC:&nbsp;<a href="#ValueType">Attribute Value Type</a>&nbsp;]
                  </td>
               </tr>
               <tr valign="baseline">
                  <td></td>
                  <td></td>
                  <td></td>
                  <td></td>
                  <td>[&nbsp;WFC:&nbsp;<a href="#NoExternalRefs">No External Entity References</a>&nbsp;]
                  </td>
               </tr>
               <tr valign="baseline">
                  <td></td>
                  <td></td>
                  <td></td>
                  <td></td>
                  <td>[&nbsp;WFC:&nbsp;<a href="#CleanAttrVals">No &lt; in Attribute Values</a>&nbsp;]
                  </td>
               </tr>

            </tbody>
         </table>
         The <a href="#NT-Name">Name</a> in
         the start- and end-tags gives the
         element's <b>type</b>.
         <a name="dt-attr"></a>
         The <a href="#NT-Name">Name</a>-<a href="#NT-AttValue">AttValue</a> pairs are
         referred to as
         the <b>attribute specifications</b> of the element,
         <a name="dt-attrname"></a>with the
         <a href="#NT-Name">Name</a> in each pair
         referred to as the <b>attribute name</b> and
         <a name="dt-attrval"></a>the content of the
         <a href="#NT-AttValue">AttValue</a> (the text between the
         <code>'</code> or <code>"</code> delimiters)
         as the <b>attribute value</b>.

      </p>
      <a name="uniqattspec"></a><p><b>Well Formedness Constraint: Unique Att Spec</b></p>

      <p>
         No attribute name may appear more than once in the same start-tag
         or empty-element tag.

      </p>

      <a name="ValueType"></a><p><b>Validity Constraint: Attribute Value Type</b></p>

      <p>
         The attribute must have been declared; the value must be of the type
         declared for it.
         (For attribute types, see <a href="#attdecls">[<b>3.3 Attribute-List Declarations</b>]
         </a>.)

      </p>

      <a name="NoExternalRefs"></a><p><b>Well Formedness Constraint: No External Entity References</b></p>

      <p>
         Attribute values cannot contain direct or indirect entity references
         to external entities.

      </p>

      <a name="CleanAttrVals"></a><p><b>Well Formedness Constraint: No &lt; in Attribute Values</b></p>

      <p>The <a href="#dt-repltext">replacement text</a> of any entity
         referred to directly or indirectly in an attribute
         value (other than "<code>&amp;lt;</code>") must not contain
         a <code>&lt;</code>.

      </p>
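<p>Conforming parsers must reject start-tags and empty-element tags that break these rules. As an illustration, Python's <code>xml.etree.ElementTree</code> (built on expat) raises <code>ParseError</code> for a repeated attribute name and for a literal <code>&lt;</code> in an attribute value, while the escaped form <code>&amp;lt;</code> is accepted. The literal case is already excluded by the <a href="#NT-AttValue">AttValue</a> production; the constraint above extends the same prohibition to entity replacement text. The helper <code>accepted</code> is hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def accepted(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(accepted('<termdef id="dt-dog" term="dog"/>'))  # True
print(accepted('<termdef id="a" id="b"/>'))           # False: Unique Att Spec
print(accepted('<termdef term="a<b"/>'))              # False: literal < in value
print(accepted('<termdef term="a&lt;b"/>'))           # True: escaped as &lt;
```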

      <p>An example of a start-tag:
         <pre>&lt;termdef id="dt-dog" term="dog"></pre></p>

      <p><a name="dt-etag"></a>The end of every element
         that begins with a start-tag must
         be marked by an <b>end-tag</b>
         containing a name that echoes the element's type as given in the
         start-tag:

         <h5>End-tag</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-ETag"></a>[42]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>ETag</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>'&lt;/' <a href="#NT-Name">Name</a>
                     <a href="#NT-S">S</a>? '>'
                  </td>
                  <td></td>
               </tr>

            </tbody>
         </table>

      </p>

      <p>An example of an end-tag:<pre>&lt;/termdef></pre></p>

      <p><a name="dt-content"></a>The
         <a href="#dt-text">text</a> between the start-tag and
         end-tag is called the element's
         <b>content</b>:

         <h5>Content of Elements</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-content"></a>[43]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>content</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>(<a href="#NT-element">element</a> | <a href="#NT-CharData">CharData</a>
                     | <a href="#NT-Reference">Reference</a> | <a href="#NT-CDSect">CDSect</a>
                     | <a href="#NT-PI">PI</a> | <a href="#NT-Comment">Comment</a>)*
                  </td>
                  <td></td>
               </tr>

            </tbody>
         </table>

      </p>

      <p><a name="dt-empty"></a>If an element is <b>empty</b>,
         it must be represented either by a start-tag immediately followed
         by an end-tag or by an empty-element tag.
         <a name="dt-eetag"></a>An
         <b>empty-element tag</b> takes a special form:

         <h5>Tags for Empty Elements</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-EmptyElemTag"></a>[44]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>EmptyElemTag</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>'&lt;' <a href="#NT-Name">Name</a> (<a href="#NT-S">S</a>
                     <a href="#NT-Attribute">Attribute</a>)* <a href="#NT-S">S</a>?
                     '/>'
                  </td>
                  <td>[&nbsp;WFC:&nbsp;<a href="#uniqattspec">Unique Att Spec</a>&nbsp;]
                  </td>
               </tr>

            </tbody>
         </table>

      </p>

      <p>Empty-element tags may be used for any element which has no
         content, whether or not it is declared using the keyword
         EMPTY.
         <a href="#dt-interop">For interoperability</a>, the empty-element
         tag must be used, and can only be used, for elements which are
         <a href="#dt-eldecl">declared</a> EMPTY.
      </p>

      <p>Examples of empty elements:
         <pre>&lt;IMG align="left"
 src="http://www.w3.org/Icons/WWW/w3c_home" />
&lt;br>&lt;/br>
&lt;br/></pre></p>
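<p>To a parser, the two spellings of an empty element carry the same information; the interoperability rule above concerns which spelling to write, not how the element is interpreted. The Python sketch below parses the two <code>br</code> examples with <code>xml.etree.ElementTree</code> and compares the resulting trees.</p>

```python
import xml.etree.ElementTree as ET

with_end_tag = ET.fromstring("<br></br>")
empty_tag = ET.fromstring("<br/>")

# Both parse to an element with no children, no attributes and no text.
for elem in (with_end_tag, empty_tag):
    print(elem.tag, len(elem), elem.attrib, elem.text)  # br 0 {} None

# Identical trees also serialize identically.
print(ET.tostring(with_end_tag) == ET.tostring(empty_tag))  # True
```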




      <h3><a name="elemdecls"></a>3.2 Element Type Declarations
      </h3>


      <p>The <a href="#dt-element">element</a> structure of an
         <a href="#dt-xml-doc">XML document</a> may, for
         <a href="#dt-valid">validation</a> purposes,
         be constrained
         using element type and attribute-list declarations.
         An element type declaration constrains the element's
         <a href="#dt-content">content</a>.

      </p>


      <p>Element type declarations often constrain which element types can
         appear as <a href="#dt-parentchild">children</a> of the element.
         At user option, an XML processor may issue a warning
         when a declaration mentions an element type for which no declaration
         is provided, but this is not an error.
      </p>

      <p><a name="dt-eldecl"></a>An <b>element
            type declaration
         </b> takes the form:

         <h5>Element Type Declaration</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-elementdecl"></a>[45]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>elementdecl</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>'&lt;!ELEMENT' <a href="#NT-S">S</a>
                     <a href="#NT-Name">Name</a>
                     <a href="#NT-S">S</a>
                     <a href="#NT-contentspec">contentspec</a>
                     <a href="#NT-S">S</a>? '>'
                  </td>
                  <td>[&nbsp;VC:&nbsp;<a href="#EDUnique">Unique Element Type Declaration</a>&nbsp;]
                  </td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-contentspec"></a>[46]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>contentspec</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>'EMPTY'
                     | 'ANY'
                     | <a href="#NT-Mixed">Mixed</a>
                     | <a href="#NT-children">children</a>

                  </td>
                  <td></td>
               </tr>

            </tbody>
         </table>
         where the <a href="#NT-Name">Name</a> gives the element type
         being declared.

      </p>

      <a name="EDUnique"></a><p><b>Validity Constraint: Unique Element Type Declaration</b></p>

      <p>
         No element type may be declared more than once.

      </p>



      <p>Examples of element type declarations:
         <pre>&lt;!ELEMENT br EMPTY>
&lt;!ELEMENT p (#PCDATA|emph)* >
&lt;!ELEMENT %name.para; %content.para; >
&lt;!ELEMENT container ANY></pre></p>



      <h4><a name="sec-element-content"></a>3.2.1 Element Content
      </h4>


      <p><a name="dt-elemcontent"></a>An element <a href="#dt-stag">type</a> has
         <b>element content</b> when elements of that
         type must contain only <a href="#dt-parentchild">child</a>
         elements (no character data), optionally separated by
         white space (characters matching the nonterminal
         <a href="#NT-S">S</a>).

         In this case, the
         constraint includes a content model, a simple grammar governing
         the allowed types of the child
         elements and the order in which they are allowed to appear.
         The grammar is built on
         content particles (<a href="#NT-cp">cp</a>s), which consist of names,
         choice lists of content particles, or
         sequence lists of content particles:

         <h5>Element-content Models</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-children"></a>[47]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>children</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>(<a href="#NT-choice">choice</a>
                     | <a href="#NT-seq">seq</a>)
                     ('?' | '*' | '+')?
                  </td>
                  <td></td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-cp"></a>[48]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>cp</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>(<a href="#NT-Name">Name</a>
                     | <a href="#NT-choice">choice</a>
                     | <a href="#NT-seq">seq</a>)
                     ('?' | '*' | '+')?
                  </td>
                  <td></td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-choice"></a>[49]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>choice</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>'(' <a href="#NT-S">S</a>? cp
                     ( <a href="#NT-S">S</a>? '|' <a href="#NT-S">S</a>? <a href="#NT-cp">cp</a> )*
                     <a href="#NT-S">S</a>? ')'
                  </td>
                  <td>[&nbsp;VC:&nbsp;<a href="#vc-PEinGroup">Proper Group/PE Nesting</a>&nbsp;]
                  </td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-seq"></a>[50]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>seq</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>'(' <a href="#NT-S">S</a>? cp
                     ( <a href="#NT-S">S</a>? ',' <a href="#NT-S">S</a>? <a href="#NT-cp">cp</a> )*
                     <a href="#NT-S">S</a>? ')'
                  </td>
                  <td>[&nbsp;VC:&nbsp;<a href="#vc-PEinGroup">Proper Group/PE Nesting</a>&nbsp;]
                  </td>
               </tr>


            </tbody>
         </table>
         where each <a href="#NT-Name">Name</a> is the type of an element which may
         appear as a <a href="#dt-parentchild">child</a>.
         Any content
         particle in a choice list may appear in the <a href="#dt-elemcontent">element content</a> at the location where
         the choice list appears in the grammar;
         content particles occurring in a sequence list must each
         appear in the <a href="#dt-elemcontent">element content</a> in the
         order given in the list.
         The optional character following a name or list governs
2305         whether the element or the content particles in the list may occur one
2306         or more (<code>+</code>), zero or more (<code>*</code>), or zero or
2307         one times (<code>?</code>).
2308         The absence of such an operator means that the element or content particle
2309         must appear exactly once.
2310         This syntax
2311         and meaning are identical to those used in the productions in this
2312         specification.
2313      </p>
2314
2315      <p>
2316         The content of an element matches a content model if and only if it is
2317         possible to trace out a path through the content model, obeying the
2318         sequence, choice, and repetition operators and matching each element in
2319         the content against an element type in the content model.  <a href="#dt-compat">For compatibility</a>, it is an error
2320         if an element in the document can
2321         match more than one occurrence of an element type in the content model.
2322         For more information, see <a href="#determinism">[<b>E Deterministic Content Models</b>]
2323         </a>.
2324
2325
2326
2327      </p>
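<p>The matching rule above can be illustrated with a small sketch. It translates an element-content model into a Python regular expression over the sequence of child element types, so that "tracing a path through the content model" becomes a regex match. The function names <code>content_model_to_regex</code> and <code>matches</code> are hypothetical, and the sketch assumes that parameter-entity references (such as <code>%div.mix;</code>) have already been expanded. Because a backtracking regex engine also accepts ambiguous models, this sketch does not enforce the determinism requirement.</p>

```python
import re

# Tokens of an element-content model: names, or one of the punctuation
# characters used by the children, cp, choice, and seq productions.
TOKEN = re.compile(r"[^\s()|,?*+]+|[()|,?*+]")


def content_model_to_regex(model):
    """Translate a model such as '(front, body, back?)' into a regular
    expression over child element names, each followed by one space."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")  # non-capturing group
        elif tok == ",":
            continue  # a sequence list is plain concatenation
        elif tok in (")", "|", "?", "*", "+"):
            parts.append(tok)  # same meaning in regex syntax
        else:
            # An element type name; the trailing space keeps 'a' from matching 'ab'.
            parts.append("(?:" + re.escape(tok) + " )")
    return "".join(parts)


def matches(model, child_names):
    """True if the sequence of child element types matches the model."""
    text = "".join(name + " " for name in child_names)
    return re.fullmatch(content_model_to_regex(model), text) is not None


print(matches("(front, body, back?)", ["front", "body"]))
print(matches("(head, (p | list | note)*, div2*)", ["head", "div2", "p"]))
```

<p>White space and character data are deliberately absent from the input: in element content only child elements are significant.</p>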
2328      <a name="vc-PEinGroup"></a><p><b>Validity Constraint: Proper Group/PE Nesting</b></p>
2330
2331      <p>Parameter-entity
2332         <a href="#dt-repltext">replacement text</a> must be properly nested
         with parenthesized groups.
2334         That is to say, if either of the opening or closing parentheses
2335         in a <a href="#NT-choice">choice</a>, <a href="#NT-seq">seq</a>, or
2336         <a href="#NT-Mixed">Mixed</a> construct
2337         is contained in the replacement text for a
2338         <a href="#dt-PERef">parameter entity</a>,
2339         both must be contained in the same replacement text.
2340      </p>
2341
2342      <p><a href="#dt-interop">For interoperability</a>,
2343         if a parameter-entity reference appears in a
2344         <a href="#NT-choice">choice</a>, <a href="#NT-seq">seq</a>, or
2345         <a href="#NT-Mixed">Mixed</a> construct, its replacement text
2346         should not be empty, and
2347         neither the first nor last non-blank
2348         character of the replacement text should be a connector
2349         (<code>|</code> or <code>,</code>).
2350
2351      </p>
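<p>The nesting constraint and the interoperability recommendation can be checked on a replacement text in isolation. The following is a minimal sketch, assuming the replacement text is available as a string; the function name <code>check_pe_in_group</code> is hypothetical. A real processor would instead track group boundaries while parsing, and Python's <code>str.strip()</code> removes slightly more characters than the <code>S</code> production.</p>

```python
def check_pe_in_group(replacement_text):
    """Check a parameter entity's replacement text, used inside a choice,
    seq, or Mixed group. Returns (properly_nested, interoperable)."""
    depth = 0
    nested = True
    for ch in replacement_text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closes a group that was opened outside this text
                nested = False
                break
    # Every group opened in this text must also be closed in it.
    nested = nested and depth == 0
    body = replacement_text.strip()
    # Not empty, and neither the first nor last non-blank character is a connector.
    interoperable = bool(body) and body[0] not in "|," and body[-1] not in "|,"
    return nested, interoperable


print(check_pe_in_group(" p | list | note "))
print(check_pe_in_group("| note"))
```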
2352
2353
2354      <p>Examples of element-content models:
2355         <pre>&lt;!ELEMENT spec (front, body, back?)>
2356&lt;!ELEMENT div1 (head, (p | list | note)*, div2*)>
2357&lt;!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*></pre></p>
2358
2359
2360
2361
2362      <h4><a name="sec-mixed-content"></a>3.2.2 Mixed Content
2363      </h4>
2364
2365
2366      <p><a name="dt-mixed"></a>An element
2367         <a href="#dt-stag">type</a> has
2368         <b>mixed content</b> when elements of that type may contain
2369         character data, optionally interspersed with
2370         <a href="#dt-parentchild">child</a> elements.
2371         In this case, the types of the child elements
2372         may be constrained, but not their order or their number of occurrences:
2373
2374         <h5>Mixed-content Declaration</h5>
2375         <table class="scrap">
2376            <tbody>
2377
2378               <tr valign="baseline">
2379                  <td><a name="NT-Mixed"></a>[51]&nbsp;&nbsp;&nbsp;
2380                  </td>
2381                  <td>Mixed</td>
2382                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2383                  <td>'(' <a href="#NT-S">S</a>?
2384                     '#PCDATA'
2385                     (<a href="#NT-S">S</a>?
2386                     '|'
2387                     <a href="#NT-S">S</a>?
2388                     <a href="#NT-Name">Name</a>)*
2389                     <a href="#NT-S">S</a>?
2390                     ')*'
2391                  </td>
2392                  <td></td>
2393               </tr>
2394               <tr valign="baseline">
2395                  <td></td>
2396                  <td></td>
2397                  <td></td>
2398                  <td>| '(' <a href="#NT-S">S</a>? '#PCDATA' <a href="#NT-S">S</a>? ')'
2399
2400                  </td>
2401                  <td>[&nbsp;VC:&nbsp;<a href="#vc-PEinGroup">Proper Group/PE Nesting</a>&nbsp;]
2402                  </td>
2403               </tr>
2404               <tr valign="baseline">
2405                  <td></td>
2406                  <td></td>
2407                  <td></td>
2408                  <td></td>
2409                  <td>[&nbsp;VC:&nbsp;<a href="#vc-MixedChildrenUnique">No Duplicate Types</a>&nbsp;]
2410                  </td>
2411               </tr>
2412
2413
2414            </tbody>
2415         </table>
2416         where the <a href="#NT-Name">Name</a>s give the types of elements
2417         that may appear as children.
2418
2419      </p>
2420      <a name="vc-MixedChildrenUnique"></a><p><b>Validity Constraint: No Duplicate Types</b></p>
2422
2423      <p>The same name must not appear more than once in a single mixed-content
2424         declaration.
2425
2426      </p>
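<p>Since mixed content constrains only the types of child elements, not their order or number, checking it reduces to a set-membership test. The sketch below parses a <code>Mixed</code> declaration, applies the No Duplicate Types constraint, and checks a list of child element types; the function names are hypothetical, character data is not represented, and parameter-entity references are assumed to be expanded already.</p>

```python
def parse_mixed(model):
    """Return the set of child element types allowed by a Mixed declaration
    such as '(#PCDATA|a|ul|b|i|em)*' or '(#PCDATA)'."""
    text = model.strip()
    starred = text.endswith(")*")
    if starred:
        text = text[:-1]
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("not a parenthesized group")
    parts = [part.strip() for part in text[1:-1].split("|")]
    if parts[0] != "#PCDATA":
        raise ValueError("a Mixed declaration must begin with #PCDATA")
    names = parts[1:]
    if names and not starred:
        raise ValueError("child types require the closing ')*'")
    if len(names) != len(set(names)):
        raise ValueError("VC: No Duplicate Types")
    return set(names)


def mixed_children_ok(model, child_names):
    """Order and repetition are unconstrained; only the types are checked."""
    allowed = parse_mixed(model)
    return all(name in allowed for name in child_names)


print(mixed_children_ok("(#PCDATA|a|ul|b|i|em)*", ["em", "a", "em"]))
```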
2427
2428      <p>Examples of mixed content declarations:
2429         <pre>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
2430&lt;!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
2431&lt;!ELEMENT b (#PCDATA)></pre></p>
2432
2433
2434
2435
2436
2437      <h3><a name="attdecls"></a>3.3 Attribute-List Declarations
2438      </h3>
2439
2440
2441      <p><a href="#dt-attr">Attributes</a> are used to associate
2442         name-value pairs with <a href="#dt-element">elements</a>.
2443         Attribute specifications may appear only within <a href="#dt-stag">start-tags</a>
2444         and <a href="#dt-eetag">empty-element tags</a>;
2445         thus, the productions used to
2446         recognize them appear in <a href="#sec-starttags">[<b>3.1 Start-Tags, End-Tags, and Empty-Element Tags</b>]
2447         </a>.
2448         Attribute-list
2449         declarations may be used:
2450
2451         <ul>
2452
2453            <li>
2454               <p>To define the set of attributes pertaining to a given
2455                  element type.
2456               </p>
2457            </li>
2458
2459            <li>
2460               <p>To establish type constraints for these
2461                  attributes.
2462               </p>
2463            </li>
2464
2465            <li>
2466               <p>To provide <a href="#dt-default">default values</a>
2467                  for attributes.
2468               </p>
2469            </li>
2470
2471         </ul>
2472
2473      </p>
2474
2475      <p><a name="dt-attdecl"></a>
2476         <b>Attribute-list declarations</b> specify the name, data type, and default
2477         value (if any) of each attribute associated with a given element type:
2478
2479         <h5>Attribute-list Declaration</h5>
2480         <table class="scrap">
2481            <tbody>
2482               <tr valign="baseline">
2483                  <td><a name="NT-AttlistDecl"></a>[52]&nbsp;&nbsp;&nbsp;
2484                  </td>
2485                  <td>AttlistDecl</td>
2486                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2487                  <td>'&lt;!ATTLIST' <a href="#NT-S">S</a>
2488                     <a href="#NT-Name">Name</a>
2489                     <a href="#NT-AttDef">AttDef</a>*
2490                     <a href="#NT-S">S</a>? '>'
2491                  </td>
2492                  <td></td>
2493               </tr>
2494               <tr valign="baseline">
2495                  <td><a name="NT-AttDef"></a>[53]&nbsp;&nbsp;&nbsp;
2496                  </td>
2497                  <td>AttDef</td>
2498                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2499                  <td><a href="#NT-S">S</a> <a href="#NT-Name">Name</a>
2500                     <a href="#NT-S">S</a> <a href="#NT-AttType">AttType</a>
2501                     <a href="#NT-S">S</a> <a href="#NT-DefaultDecl">DefaultDecl</a></td>
2502                  <td></td>
2503               </tr>
2504            </tbody>
2505         </table>
2506         The <a href="#NT-Name">Name</a> in the
2507         <a href="#NT-AttlistDecl">AttlistDecl</a> rule is the type of an element.  At
2508         user option, an XML processor may issue a warning if attributes are
2509         declared for an element type not itself declared, but this is not an
2510         error.  The <a href="#NT-Name">Name</a> in the
2511         <a href="#NT-AttDef">AttDef</a> rule is
2512         the name of the attribute.
2513      </p>
2514
2515      <p>
2516         When more than one <a href="#NT-AttlistDecl">AttlistDecl</a> is provided for a
2517         given element type, the contents of all those provided are merged.  When
2518         more than one definition is provided for the same attribute of a
2519         given element type, the first declaration is binding and later
2520         declarations are ignored.
2521         <a href="#dt-interop">For interoperability,</a> writers of DTDs
2522         may choose to provide at most one attribute-list declaration
2523         for a given element type, at most one attribute definition
2524         for a given attribute name, and at least one attribute definition
2525         in each attribute-list declaration.
2526         For interoperability, an XML processor may at user option
2527         issue a warning when more than one attribute-list declaration is
2528         provided for a given element type, or more than one attribute definition
2529         is provided
2530         for a given attribute, but this is not an error.
2531
2532      </p>
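<p>The merging rule (all attribute-list declarations for an element type are combined, and the first definition of an attribute is binding) can be sketched as follows. The representation is an assumption made for illustration: each declaration is a pair of an element type and a list of <code>(attribute name, definition)</code> pairs in document order, and each definition is kept as an opaque string. The optional warnings correspond to the "at user option" warnings described above; <code>merge_attlists</code> is a hypothetical name.</p>

```python
def merge_attlists(declarations):
    """Merge attribute-list declarations given in document order.

    declarations: iterable of (element_type, [(attribute_name, definition), ...]).
    Returns (merged, warnings); the first definition of each attribute wins."""
    merged = {}
    warnings = []
    for element_type, attdefs in declarations:
        if element_type in merged:
            warnings.append("more than one ATTLIST for " + element_type)
        table = merged.setdefault(element_type, {})
        for name, definition in attdefs:
            if name in table:
                # Later definitions are ignored, but a warning may be issued.
                warnings.append("later definition of " + element_type + "/@" + name + " ignored")
                continue
            table[name] = definition
    return merged, warnings


merged, warnings = merge_attlists([
    ("list", [("type", "(bullets|ordered|glossary) 'ordered'")]),
    ("list", [("type", "CDATA #IMPLIED"), ("id", "ID #IMPLIED")]),
])
print(merged["list"])
print(warnings)
```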
2533
2534
2535
2536      <h4><a name="sec-attribute-types"></a>3.3.1 Attribute Types
2537      </h4>
2538
2539
2540      <p>XML attribute types are of three kinds:  a string type, a
2541         set of tokenized types, and enumerated types.  The string type may take
2542         any literal string as a value; the tokenized types have varying lexical
2543         and semantic constraints, as noted:
2544
2545         <h5>Attribute Types</h5>
2546         <table class="scrap">
2547            <tbody>
2548
2549               <tr valign="baseline">
2550                  <td><a name="NT-AttType"></a>[54]&nbsp;&nbsp;&nbsp;
2551                  </td>
2552                  <td>AttType</td>
2553                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2554                  <td><a href="#NT-StringType">StringType</a>
2555                     | <a href="#NT-TokenizedType">TokenizedType</a>
2556                     | <a href="#NT-EnumeratedType">EnumeratedType</a>
2557
2558                  </td>
2559                  <td></td>
2560               </tr>
2561
2562               <tr valign="baseline">
2563                  <td><a name="NT-StringType"></a>[55]&nbsp;&nbsp;&nbsp;
2564                  </td>
2565                  <td>StringType</td>
2566                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2567                  <td>'CDATA'</td>
2568                  <td></td>
2569               </tr>
2570
2571               <tr valign="baseline">
2572                  <td><a name="NT-TokenizedType"></a>[56]&nbsp;&nbsp;&nbsp;
2573                  </td>
2574                  <td>TokenizedType</td>
2575                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2576                  <td>'ID'</td>
2577                  <td>[&nbsp;VC:&nbsp;<a href="#id">ID</a>&nbsp;]
2578                  </td>
2579               </tr>
2580               <tr valign="baseline">
2581                  <td></td>
2582                  <td></td>
2583                  <td></td>
2584                  <td></td>
2585                  <td>[&nbsp;VC:&nbsp;<a href="#one-id-per-el">One ID per Element Type</a>&nbsp;]
2586                  </td>
2587               </tr>
2588               <tr valign="baseline">
2589                  <td></td>
2590                  <td></td>
2591                  <td></td>
2592                  <td></td>
2593                  <td>[&nbsp;VC:&nbsp;<a href="#id-default">ID Attribute Default</a>&nbsp;]
2594                  </td>
2595               </tr>
2596               <tr valign="baseline">
2597                  <td></td>
2598                  <td></td>
2599                  <td></td>
2600                  <td>| 'IDREF'</td>
2601                  <td>[&nbsp;VC:&nbsp;<a href="#idref">IDREF</a>&nbsp;]
2602                  </td>
2603               </tr>
2604               <tr valign="baseline">
2605                  <td></td>
2606                  <td></td>
2607                  <td></td>
2608                  <td>| 'IDREFS'</td>
2609                  <td>[&nbsp;VC:&nbsp;<a href="#idref">IDREF</a>&nbsp;]
2610                  </td>
2611               </tr>
2612               <tr valign="baseline">
2613                  <td></td>
2614                  <td></td>
2615                  <td></td>
2616                  <td>| 'ENTITY'</td>
2617                  <td>[&nbsp;VC:&nbsp;<a href="#entname">Entity Name</a>&nbsp;]
2618                  </td>
2619               </tr>
2620               <tr valign="baseline">
2621                  <td></td>
2622                  <td></td>
2623                  <td></td>
2624                  <td>| 'ENTITIES'</td>
2625                  <td>[&nbsp;VC:&nbsp;<a href="#entname">Entity Name</a>&nbsp;]
2626                  </td>
2627               </tr>
2628               <tr valign="baseline">
2629                  <td></td>
2630                  <td></td>
2631                  <td></td>
2632                  <td>| 'NMTOKEN'</td>
2633                  <td>[&nbsp;VC:&nbsp;<a href="#nmtok">Name Token</a>&nbsp;]
2634                  </td>
2635               </tr>
2636               <tr valign="baseline">
2637                  <td></td>
2638                  <td></td>
2639                  <td></td>
2640                  <td>| 'NMTOKENS'</td>
2641                  <td>[&nbsp;VC:&nbsp;<a href="#nmtok">Name Token</a>&nbsp;]
2642                  </td>
2643               </tr>
2644
2645            </tbody>
2646         </table>
2647
2648      </p>
2649      <a name="id"></a><p><b>Validity Constraint: ID</b></p>
2651
2652      <p>
2653         Values of type ID must match the
2654         <a href="#NT-Name">Name</a> production.
2655         A name must not appear more than once in
2656         an XML document as a value of this type; i.e., ID values must uniquely
2657         identify the elements which bear them.
2658
2659      </p>
2660
2661      <a name="one-id-per-el"></a><p><b>Validity Constraint: One ID per Element Type</b></p>
2663
2664      <p>No element type may have more than one ID attribute specified.</p>
2665
2666      <a name="id-default"></a><p><b>Validity Constraint: ID Attribute Default</b></p>
2668
2669      <p>An ID attribute must have a declared default of #IMPLIED or
2670         #REQUIRED.
2671      </p>
2672
2673      <a name="idref"></a><p><b>Validity Constraint: IDREF</b></p>
2675
2676      <p>
2677         Values of type IDREF must match
2678         the <a href="#NT-Name">Name</a> production, and
2679         values of type IDREFS must match
2680         <a href="#NT-Names">Names</a>;
2681         each <a href="#NT-Name">Name</a> must match the value of an ID attribute on
2682         some element in the XML document; i.e. IDREF values must
2683         match the value of some ID attribute.
2684
2685      </p>
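<p>The ID and IDREF constraints are document-wide, so a validator typically collects every ID first and resolves references afterwards, since an IDREF may point to an element that appears later. A minimal sketch, assuming the attribute values have already been normalized and tagged with their declared types; <code>check_id_references</code> is a hypothetical name.</p>

```python
def check_id_references(attributes):
    """attributes: (declared_type, normalized_value) pairs for every attribute
    in the document, in document order. Returns a list of validity errors."""
    errors = []
    ids = set()
    for att_type, value in attributes:
        if att_type == "ID":
            if value in ids:
                errors.append("duplicate ID " + repr(value))
            ids.add(value)
    # IDREF(S) values may point forward, so resolve them only after all IDs are known.
    for att_type, value in attributes:
        if att_type in ("IDREF", "IDREFS"):
            for ref in value.split(" "):  # normalized IDREFS use single spaces
                if ref not in ids:
                    errors.append("IDREF " + repr(ref) + " matches no ID")
    return errors


print(check_id_references([("ID", "a"), ("IDREFS", "a b")]))
```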
2686
2687      <a name="entname"></a><p><b>Validity Constraint: Entity Name</b></p>
2689
2690      <p>
2691         Values of type ENTITY
         must match the <a href="#NT-Name">Name</a> production, and
         values of type ENTITIES must match
         <a href="#NT-Names">Names</a>;
         each <a href="#NT-Name">Name</a> must
         match the name of an <a href="#dt-unparsed">unparsed entity</a> declared in the
         <a href="#dt-doctype">DTD</a>.
2699
2700      </p>
2701
2702      <a name="nmtok"></a><p><b>Validity Constraint: Name Token</b></p>
2704
2705      <p>
2706         Values of type NMTOKEN must match the
2707         <a href="#NT-Nmtoken">Nmtoken</a> production;
2708         values of type NMTOKENS must
2709         match <a href="#NT-Nmtokens">Nmtokens</a>.
2710
2711      </p>
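<p>The lexical halves of these constraints can be expressed as regular expressions over normalized values. The sketch below is an approximation: <code>NAME</code> and <code>NMTOKEN</code> accept only ASCII characters, whereas the real <a href="#NT-Name">Name</a> and <a href="#NT-Nmtoken">Nmtoken</a> productions also admit many non-ASCII letters, combining characters, and extenders. It checks only the lexical form; uniqueness of IDs and the existence of unparsed entities are separate checks. All names here are hypothetical.</p>

```python
import re

# ASCII-only approximations of the Name and Nmtoken productions.
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"
NMTOKEN = r"[A-Za-z0-9._:-]+"

# After normalization, list-valued types are separated by single #x20 characters.
PATTERNS = {
    "ID": NAME,
    "IDREF": NAME,
    "ENTITY": NAME,
    "IDREFS": NAME + "(?: " + NAME + ")*",
    "ENTITIES": NAME + "(?: " + NAME + ")*",
    "NMTOKEN": NMTOKEN,
    "NMTOKENS": NMTOKEN + "(?: " + NMTOKEN + ")*",
}


def lexically_valid(att_type, normalized_value):
    """Check only the lexical form required for a tokenized attribute type."""
    return re.fullmatch(PATTERNS[att_type], normalized_value) is not None


print(lexically_valid("ID", "3.3.2"), lexically_valid("NMTOKEN", "3.3.2"))
```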
2712
2713
2714
2715      <p><a name="dt-enumerated"></a><b>Enumerated attributes</b> can take one
2716         of a list of values provided in the declaration. There are two
2717         kinds of enumerated types:
2718
2719         <h5>Enumerated Attribute Types</h5>
2720         <table class="scrap">
2721            <tbody>
2722               <tr valign="baseline">
2723                  <td><a name="NT-EnumeratedType"></a>[57]&nbsp;&nbsp;&nbsp;
2724                  </td>
2725                  <td>EnumeratedType</td>
2726                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2727                  <td><a href="#NT-NotationType">NotationType</a>
2728                     | <a href="#NT-Enumeration">Enumeration</a>
2729
2730                  </td>
2731                  <td></td>
2732               </tr>
2733               <tr valign="baseline">
2734                  <td><a name="NT-NotationType"></a>[58]&nbsp;&nbsp;&nbsp;
2735                  </td>
2736                  <td>NotationType</td>
2737                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2738                  <td>'NOTATION'
2739                     <a href="#NT-S">S</a>
2740                     '('
2741                     <a href="#NT-S">S</a>?
2742                     <a href="#NT-Name">Name</a>
2743                     (<a href="#NT-S">S</a>? '|' <a href="#NT-S">S</a>?
2744                     <a href="#NT-Name">Name</a>)*
2745                     <a href="#NT-S">S</a>? ')'
2746
2747                  </td>
2748                  <td>[&nbsp;VC:&nbsp;<a href="#notatn">Notation Attributes</a>&nbsp;]
2749                  </td>
2750               </tr>
2751               <tr valign="baseline">
2752                  <td><a name="NT-Enumeration"></a>[59]&nbsp;&nbsp;&nbsp;
2753                  </td>
2754                  <td>Enumeration</td>
2755                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2756                  <td>'(' <a href="#NT-S">S</a>?
2757                     <a href="#NT-Nmtoken">Nmtoken</a>
2758                     (<a href="#NT-S">S</a>? '|'
2759                     <a href="#NT-S">S</a>?
2760                     <a href="#NT-Nmtoken">Nmtoken</a>)*
2761                     <a href="#NT-S">S</a>?
2762                     ')'
2763                  </td>
2764                  <td>[&nbsp;VC:&nbsp;<a href="#enum">Enumeration</a>&nbsp;]
2765                  </td>
2766               </tr>
2767            </tbody>
2768         </table>
2769         A NOTATION attribute identifies a
2770         <a href="#dt-notation">notation</a>, declared in the
2771         DTD with associated system and/or public identifiers, to
2772         be used in interpreting the element to which the attribute
2773         is attached.
2774
2775      </p>
2776
2777      <a name="notatn"></a><p><b>Validity Constraint: Notation Attributes</b></p>
2779
2780      <p>
2781         Values of this type must match
2782         one of the <a href="#Notations">notation</a> names included in
2783         the declaration; all notation names in the declaration must
2784         be declared.
2785
2786      </p>
2787
2788      <a name="enum"></a><p><b>Validity Constraint: Enumeration</b></p>
2790
2791      <p>
2792         Values of this type
2793         must match one of the <a href="#NT-Nmtoken">Nmtoken</a> tokens in the
2794         declaration.
2795
2796      </p>
2797
2798
2799      <p><a href="#dt-interop">For interoperability,</a> the same
2800         <a href="#NT-Nmtoken">Nmtoken</a> should not occur more than once in the
2801         enumerated attribute types of a single element type.
2802
2803      </p>
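<p>Both kinds of enumerated type reduce to checking that the value is one of the listed tokens; a NOTATION type additionally requires every listed notation to be declared. A minimal sketch, assuming the declared type is available as a string and the declared notation names as a set; the function names and the notation names <code>gif</code> and <code>jpeg</code> in the usage line are hypothetical.</p>

```python
def parse_enumeration(group):
    """Split a group such as '(bullets|ordered|glossary)' into its tokens."""
    text = group.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("not a parenthesized group")
    return [token.strip() for token in text[1:-1].split("|")]


def check_enumerated(att_type, value, declared_notations=()):
    """att_type is '(a|b|...)' or 'NOTATION (n1|n2|...)'.
    Returns a list of validity errors for one attribute value."""
    errors = []
    if att_type.startswith("NOTATION"):
        names = parse_enumeration(att_type[len("NOTATION"):])
        # VC: Notation Attributes - every listed notation must be declared.
        for name in names:
            if name not in declared_notations:
                errors.append("notation " + repr(name) + " is not declared")
    else:
        names = parse_enumeration(att_type)
    # VC: Notation Attributes / Enumeration - the value must be a listed token.
    if value not in names:
        errors.append(repr(value) + " is not one of " + repr(names))
    return errors


print(check_enumerated("NOTATION (gif | jpeg)", "gif", declared_notations={"gif"}))
```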
2804
2805
2806
2807
2808      <h4><a name="sec-attr-defaults"></a>3.3.2 Attribute Defaults
2809      </h4>
2810
2811
2812      <p>An <a href="#dt-attdecl">attribute declaration</a> provides
2813         information on whether
2814         the attribute's presence is required, and if not, how an XML processor should
2815         react if a declared attribute is absent in a document.
2816
2817         <h5>Attribute Defaults</h5>
2818         <table class="scrap">
2819            <tbody>
2820
2821               <tr valign="baseline">
2822                  <td><a name="NT-DefaultDecl"></a>[60]&nbsp;&nbsp;&nbsp;
2823                  </td>
2824                  <td>DefaultDecl</td>
2825                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
2826                  <td>'#REQUIRED'
2827                     |&nbsp;'#IMPLIED'
2828                  </td>
2829                  <td></td>
2830               </tr>
2831               <tr valign="baseline">
2832                  <td></td>
2833                  <td></td>
2834                  <td></td>
2835                  <td>| (('#FIXED' S)? <a href="#NT-AttValue">AttValue</a>)
2836                  </td>
2837                  <td>[&nbsp;VC:&nbsp;<a href="#RequiredAttr">Required Attribute</a>&nbsp;]
2838                  </td>
2839               </tr>
2840               <tr valign="baseline">
2841                  <td></td>
2842                  <td></td>
2843                  <td></td>
2844                  <td></td>
2845                  <td>[&nbsp;VC:&nbsp;<a href="#defattrvalid">Attribute Default Legal</a>&nbsp;]
2846                  </td>
2847               </tr>
2848               <tr valign="baseline">
2849                  <td></td>
2850                  <td></td>
2851                  <td></td>
2852                  <td></td>
2853                  <td>[&nbsp;WFC:&nbsp;<a href="#CleanAttrVals">No &lt; in Attribute Values</a>&nbsp;]
2854                  </td>
2855               </tr>
2856               <tr valign="baseline">
2857                  <td></td>
2858                  <td></td>
2859                  <td></td>
2860                  <td></td>
2861                  <td>[&nbsp;VC:&nbsp;<a href="#FixedAttr">Fixed Attribute Default</a>&nbsp;]
2862                  </td>
2863               </tr>
2864
2865            </tbody>
2866         </table>
2867
2868
2869      </p>
2870
2871      <p>In an attribute declaration, #REQUIRED means that the
2872         attribute must always be provided, #IMPLIED that no default
2873         value is provided.
2874
2875         <a name="dt-default"></a>If the
2876         declaration
2877         is neither #REQUIRED nor #IMPLIED, then the
2878         <a href="#NT-AttValue">AttValue</a> value contains the declared
2879         <b>default</b> value; the #FIXED keyword states that
2880         the attribute must always have the default value.
2881         If a default value
2882         is declared, when an XML processor encounters an omitted attribute, it
2883         is to behave as though the attribute were present with
2884         the declared default value.
2885      </p>
2886      <a name="RequiredAttr"></a><p><b>Validity Constraint: Required Attribute</b></p>
2888
2889      <p>If the default declaration is the keyword #REQUIRED, then
2890         the attribute must be specified for
2891         all elements of the type in the attribute-list declaration.
2892
2893      </p>
2894      <a name="defattrvalid"></a><p><b>Validity Constraint: Attribute Default Legal</b></p>
2896
2897      <p>
2898         The declared
2899         default value must meet the lexical constraints of the declared attribute type.
2900
2901      </p>
2902
2903      <a name="FixedAttr"></a><p><b>Validity Constraint: Fixed Attribute Default</b></p>
2905
2906      <p>If an attribute has a default value declared with the
2907         #FIXED keyword, instances of that attribute must
2908         match the default value.
2909
2910      </p>
2911
2912
2913      <p>Examples of attribute-list declarations:
2914         <pre>&lt;!ATTLIST termdef
2915          id      ID      #REQUIRED
2916          name    CDATA   #IMPLIED>
2917&lt;!ATTLIST list
2918          type    (bullets|ordered|glossary)  "ordered">
2919&lt;!ATTLIST form
2920          method  CDATA   #FIXED "POST"></pre></p>
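<p>The effect of the four kinds of default declaration on a single element can be sketched as a small function. This assumes a hypothetical representation in which each attribute definition is a <code>(keyword, default)</code> pair, with <code>None</code> as the keyword for a plain default value, and in which both specified and default values are already normalized. The three declarations from the example above are encoded in that representation.</p>

```python
def apply_defaults(specified, attdefs):
    """specified: attributes present on one element (already normalized).
    attdefs: {name: (keyword, default)} where keyword is '#REQUIRED',
    '#IMPLIED', '#FIXED', or None for a plain default value.
    Returns (attributes the application sees, validity errors)."""
    result = dict(specified)
    errors = []
    for name, (keyword, default) in attdefs.items():
        if name in specified:
            if keyword == "#FIXED" and specified[name] != default:
                errors.append(name + " must have the fixed value " + repr(default))
        elif keyword == "#REQUIRED":
            errors.append(name + " is required")
        elif keyword in ("#FIXED", None):
            result[name] = default  # behave as though the default were present
        # '#IMPLIED': no value is supplied
    return result, errors


# The declarations for termdef, list, and form from the example above.
TERMDEF = {"id": ("#REQUIRED", None), "name": ("#IMPLIED", None)}
LIST = {"type": (None, "ordered")}
FORM = {"method": ("#FIXED", "POST")}

print(apply_defaults({}, LIST))
print(apply_defaults({"method": "GET"}, FORM))
```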
2921
2922
2923
2924      <h4><a name="AVNormalize"></a>3.3.3 Attribute-Value Normalization
2925      </h4>
2926
2927      <p>Before the value of an attribute is passed to the application
2928         or checked for validity, the
2929         XML processor must normalize it as follows:
2930
2931         <ul>
2932
2933            <li>
2934               <p>a character reference is processed by appending the referenced
2935                  character to the attribute value
2936               </p>
2937            </li>
2938
2939            <li>
2940               <p>an entity reference is processed by recursively processing the
2941                  replacement text of the entity
2942               </p>
2943            </li>
2944
2945            <li>
2946               <p>a whitespace character (#x20, #xD, #xA, #x9) is processed by
2947                  appending #x20 to the normalized value, except that only a single #x20
2948                  is appended for a "#xD#xA" sequence that is part of an external
2949                  parsed entity or the literal entity value of an internal parsed
2950                  entity
2951               </p>
2952            </li>
2953
2954            <li>
2955               <p>other characters are processed by appending them to the normalized
2956                  value
2957               </p>
2958
2959            </li>
2960         </ul>
2961
2962      </p>
2963
2964      <p>If the declared value is not CDATA, then the XML processor must
2965         further process the normalized attribute value by discarding any
2966         leading and trailing space (#x20) characters, and by replacing
2967         sequences of space (#x20) characters by a single space (#x20)
2968         character.
2969      </p>
2970
2971      <p>
2972         All attributes for which no declaration has been read should be treated
2973         by a non-validating parser as if declared
2974         CDATA.
2975
2976      </p>
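<p>The normalization steps can be sketched as a single recursive function. This is a simplification under stated assumptions: <code>entities</code> is a hypothetical mapping from internal general entity names (including any predefined entities such as <code>lt</code> that the value uses) to replacement text; entity names are limited to an ASCII approximation of <a href="#NT-Name">Name</a>; and replacement text is processed with the same rules as the literal. Note that a character reference such as <code>&amp;#9;</code> appends the tab itself, so it survives normalization, while a literal tab becomes #x20.</p>

```python
import re

# Character references (hex or decimal) and general entity references.
REFERENCE = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][A-Za-z0-9._:-]*);")
WHITESPACE = "\x20\x0d\x0a\x09"


def _literal_chars(text):
    """Map each literal white-space character to #x20; keep other characters."""
    return "".join(" " if ch in WHITESPACE else ch for ch in text)


def normalize(literal, entities, att_type="CDATA", depth=0):
    """Normalize an attribute value literal (without its quotes)."""
    if depth > 20:
        raise ValueError("entity references nested too deeply")
    literal = literal.replace("\r\n", "\n")  # a #xD#xA pair yields a single #x20
    pieces = []
    position = 0
    for match in REFERENCE.finditer(literal):
        pieces.append(_literal_chars(literal[position:match.start()]))
        hex_digits, dec_digits, name = match.groups()
        if hex_digits is not None:
            pieces.append(chr(int(hex_digits, 16)))  # appended as-is, even #x9 or #xA
        elif dec_digits is not None:
            pieces.append(chr(int(dec_digits)))
        else:
            # Recursively process the entity's replacement text.
            pieces.append(normalize(entities[name], entities, "CDATA", depth + 1))
        position = match.end()
    pieces.append(_literal_chars(literal[position:]))
    value = "".join(pieces)
    if att_type != "CDATA":
        # Drop leading/trailing #x20 and collapse runs of #x20 to a single one.
        value = " ".join(part for part in value.split(" ") if part)
    return value


print(repr(normalize("a&#9;b\tc", {})))
```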
2977
2978
2979
2980
2981      <h3><a name="sec-condition-sect"></a>3.4 Conditional Sections
2982      </h3>
2983
2984      <p><a name="dt-cond-section"></a>
2985         <b>Conditional sections</b> are portions of the
2986         <a href="#dt-doctype">document type declaration external subset</a>
2987         which are
2988         included in, or excluded from, the logical structure of the DTD based on
2989         the keyword which governs them.
2990
2991         <h5>Conditional Section</h5>
2992         <table class="scrap">
2993            <tbody>
2994
2995               <tr valign="baseline">
2996                  <td><a name="NT-conditionalSect"></a>[61]&nbsp;&nbsp;&nbsp;
2997                  </td>
2998                  <td>conditionalSect</td>
2999                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3000                  <td><a href="#NT-includeSect">includeSect</a>
3001                     | <a href="#NT-ignoreSect">ignoreSect</a>
3002
3003                  </td>
3004                  <td></td>
3005               </tr>
3006
3007               <tr valign="baseline">
3008                  <td><a name="NT-includeSect"></a>[62]&nbsp;&nbsp;&nbsp;
3009                  </td>
3010                  <td>includeSect</td>
3011                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3012                  <td>'&lt;![' S? 'INCLUDE' S? '['
3013
3014                     <a href="#NT-extSubsetDecl">extSubsetDecl</a>
3015                     ']]>'
3016
3017                  </td>
3018                  <td></td>
3019               </tr>
3020
3021               <tr valign="baseline">
3022                  <td><a name="NT-ignoreSect"></a>[63]&nbsp;&nbsp;&nbsp;
3023                  </td>
3024                  <td>ignoreSect</td>
3025                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3026                  <td>'&lt;![' S? 'IGNORE' S? '['
3027                     <a href="#NT-ignoreSectContents">ignoreSectContents</a>*
3028                     ']]>'
3029                  </td>
3030                  <td></td>
3031               </tr>
3032
3033
3034               <tr valign="baseline">
3035                  <td><a name="NT-ignoreSectContents"></a>[64]&nbsp;&nbsp;&nbsp;
3036                  </td>
3037                  <td>ignoreSectContents</td>
3038                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3039                  <td><a href="#NT-Ignore">Ignore</a>
3040                     ('&lt;![' <a href="#NT-ignoreSectContents">ignoreSectContents</a> ']]>'
3041                     <a href="#NT-Ignore">Ignore</a>)*
3042                  </td>
3043                  <td></td>
3044               </tr>
3045
3046               <tr valign="baseline">
3047                  <td><a name="NT-Ignore"></a>[65]&nbsp;&nbsp;&nbsp;
3048                  </td>
3049                  <td>Ignore</td>
3050                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3051                  <td><a href="#NT-Char">Char</a>* -
3052                     (<a href="#NT-Char">Char</a>* ('&lt;![' | ']]>')
3053                     <a href="#NT-Char">Char</a>*)
3054
3055                  </td>
3056                  <td></td>
3057               </tr>
3058
3059
3060            </tbody>
3061         </table>
3062
3063      </p>
3064
3065      <p>Like the internal and external DTD subsets, a conditional section
3066         may contain one or more complete declarations,
3067         comments, processing instructions,
3068         or nested conditional sections, intermingled with white space.
3069
3070      </p>
3071
3072      <p>If the keyword of the
3073         conditional section is INCLUDE, then the contents of the conditional
3074         section are part of the DTD.
3075         If the keyword of the conditional
3076         section is IGNORE, then the contents of the conditional section are
3077         not logically part of the DTD.
3078         Note that for reliable parsing, the contents of even ignored
3079         conditional sections must be read in order to
3080         detect nested conditional sections and ensure that the end of the
3081         outermost (ignored) conditional section is properly detected.
3082         If a conditional section with a
3083         keyword of INCLUDE occurs within a larger conditional
3084         section with a keyword of IGNORE, both the outer and the
3085         inner conditional sections are ignored.
3086      </p>
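      <p>Because ignored sections may nest, a processor cannot simply stop at the first <code>]]&gt;</code> it meets. The Python sketch below counts nesting depth the way productions [63] to [65] require. The function name <code>skip_ignored_section</code> is hypothetical. The sketch assumes the DTD text is already a decoded string and that parameter-entity references are handled elsewhere.</p>

```python
def skip_ignored_section(text: str, start: int) -> int:
    """Return the index just past the ']]>' that closes an ignored
    conditional section whose body begins at text[start].

    Nested '<![' ... ']]>' pairs are counted but their contents are
    otherwise ignored. Raises ValueError if the section never closes.
    """
    depth = 1  # already inside the outermost ignored section
    i = start
    while i < len(text):
        if text.startswith("<![", i):
            depth += 1          # a nested conditional section opens
            i += 3
        elif text.startswith("]]>", i):
            depth -= 1          # the innermost open section closes
            i += 3
            if depth == 0:
                return i
        else:
            i += 1              # any other character is just Ignore text
    raise ValueError("unterminated conditional section")


dtd = "<![IGNORE[ <![INCLUDE[ <!ELEMENT x ANY> ]]> ]]><!ELEMENT y ANY>"
end = skip_ignored_section(dtd, len("<![IGNORE["))
print(dtd[end:])  # <!ELEMENT y ANY>
```

      <p>The nested INCLUDE section is skipped together with the outer IGNORE section, as the text above requires.</p>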
3087
3088      <p>If the keyword of the conditional section is a
3089         parameter-entity reference, the parameter entity must be replaced by its
3090         content before the processor decides whether to
3091         include or ignore the conditional section.
3092      </p>
3093
3094      <p>An example:
3095         <pre>&lt;!ENTITY % draft 'INCLUDE' >
3096&lt;!ENTITY % final 'IGNORE' >
3097
3098&lt;![%draft;[
3099&lt;!ELEMENT book (comments*, title, body, supplements?)>
3100]]>
3101&lt;![%final;[
3102&lt;!ELEMENT book (title, body, supplements?)>
3103]]>
3104</pre>
3105         </p>
3106
3107
3108
3109
3110
3111
3112
3113
3114
3115
3116      <h2><a name="sec-physical-struct"></a>4 Physical Structures
3117      </h2>
3118
3119
      <p><a name="dt-entity"></a>An XML document may consist
         of one or many storage units. These are called
         <b>entities</b>; they all have <b>content</b>, and all of them
         except the document entity (see below) and
         the <a href="#dt-doctype">external DTD subset</a>
         are identified by <b>name</b>.
3126
3127         Each XML document has one entity
3128         called the <a href="#dt-docent">document entity</a>, which serves
3129         as the starting point for the <a href="#dt-xml-proc">XML
3130            processor
3131         </a> and may contain the whole document.
3132      </p>
3133
3134      <p>Entities may be either parsed or unparsed.
3135         <a name="dt-parsedent"></a>A <b>parsed entity's</b>
3136         contents are referred to as its
3137         <a href="#dt-repltext">replacement text</a>;
3138         this <a href="#dt-text">text</a> is considered an
3139         integral part of the document.
3140      </p>
3141
3142
3143      <p><a name="dt-unparsed"></a>An
3144         <b>unparsed entity</b>
3145         is a resource whose contents may or may not be
3146         <a href="#dt-text">text</a>, and if text, may not be XML.
3147         Each unparsed entity
3148         has an associated <a href="#dt-notation">notation</a>, identified by name.
3149         Beyond a requirement
3150         that an XML processor make the identifiers for the entity and
3151         notation available to the application,
3152         XML places no constraints on the contents of unparsed entities.
3153
3154      </p>
3155
3156      <p>
         Parsed entities are invoked by name using entity references;
         unparsed entities are invoked by name, given in the value of an
         ENTITY or ENTITIES attribute.
3161      </p>
3162
3163      <p><a name="gen-entity"></a><b>General entities</b>
3164         are entities for use within the document content.
3165         In this specification, general entities are sometimes referred
3166         to with the unqualified term <i>entity</i> when this leads
3167         to no ambiguity.
         <a name="dt-PE"></a><b>Parameter entities</b>
3169         are parsed entities for use within the DTD.
3170         These two types of entities use different forms of reference and
3171         are recognized in different contexts.
3172         Furthermore, they occupy different namespaces; a parameter entity and
3173         a general entity with the same name are two distinct entities.
3174
3175      </p>
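      <p>The separate namespaces can be observed with any conforming parser. The sketch below uses Python's standard <code>xml.etree.ElementTree</code>, a non-validating parser built on expat, chosen here only for illustration. A parameter entity and a general entity both named <code>x</code> coexist, and a reference in content resolves to the general one.</p>

```python
import xml.etree.ElementTree as ET

# Parameter entity "x" and general entity "x" are two distinct entities.
doc = """<!DOCTYPE d [
  <!ENTITY % x "parameter value">
  <!ENTITY x "general value">
]>
<d>&x;</d>"""

root = ET.fromstring(doc)
print(root.text)  # general value
```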
3176
3177
3178
3179      <h3><a name="sec-references"></a>4.1 Character and Entity References
3180      </h3>
3181
3182      <p><a name="dt-charref"></a>
3183         A <b>character reference</b> refers to a specific character in the
3184         ISO/IEC 10646 character set, for example one not directly accessible from
3185         available input devices.
3186
3187         <h5>Character Reference</h5>
3188         <table class="scrap">
3189            <tbody>
3190               <tr valign="baseline">
3191                  <td><a name="NT-CharRef"></a>[66]&nbsp;&nbsp;&nbsp;
3192                  </td>
3193                  <td>CharRef</td>
3194                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3195                  <td>'&amp;#' [0-9]+ ';' </td>
3196                  <td></td>
3197               </tr>
3198               <tr valign="baseline">
3199                  <td></td>
3200                  <td></td>
3201                  <td></td>
3202                  <td>| '&amp;#x' [0-9a-fA-F]+ ';'</td>
3203                  <td>[&nbsp;WFC:&nbsp;<a href="#wf-Legalchar">Legal Character</a>&nbsp;]
3204                  </td>
3205               </tr>
3206            </tbody>
3207         </table>
3208         <a name="wf-Legalchar"></a><p><b>Well Formedness Constraint: Legal Character</b></p>
3210
3211         <p>Characters referred to using character references must
3212            match the production for
3213            <a href="#NT-Char">Char</a>.
3214         </p>
3215
3216         If the character reference begins with "<code>&amp;#x</code>", the digits and
3217         letters up to the terminating <code>;</code> provide a hexadecimal
3218         representation of the character's code point in ISO/IEC 10646.
         If it begins with just "<code>&amp;#</code>", the digits up to the terminating
3220         <code>;</code> provide a decimal representation of the character's
3221         code point.
3222
3223
3224      </p>
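      <p>Decoding a character reference therefore comes down to choosing the radix from the prefix and then checking the Legal Character constraint. The sketch below is a minimal illustration, not a full tokenizer. The helper names are hypothetical, and <code>is_xml_char</code> transcribes the <a href="#NT-Char">Char</a> production defined earlier in this specification.</p>

```python
import re

# Production [66]: '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
CHARREF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def is_xml_char(cp: int) -> bool:
    """Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def decode_charref(ref: str) -> str:
    """Return the character a complete reference such as '&#x3C;' denotes."""
    m = CHARREF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a character reference: {ref!r}")
    decimal, hexadecimal = m.groups()
    cp = int(decimal, 10) if decimal is not None else int(hexadecimal, 16)
    if not is_xml_char(cp):
        raise ValueError(f"WFC Legal Character violated: {ref!r}")
    return chr(cp)


print(decode_charref("&#x3C;"))  # <
print(decode_charref("&#60;"))   # <
```

      <p>Both forms of the reference to code point 60 decode to the same character. A reference such as <code>&amp;#0;</code> matches the syntax but fails the Legal Character check.</p>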
3225
3226      <p><a name="dt-entref"></a>An <b>entity
3227            reference
3228         </b> refers to the content of a named entity.
3229         <a name="dt-GERef"></a>References to
3230         parsed general entities
3231         use ampersand (<code>&amp;</code>) and semicolon (<code>;</code>) as
3232         delimiters.
3233         <a name="dt-PERef"></a>
3234         <b>Parameter-entity references</b> use percent-sign (<code>%</code>) and
3235         semicolon
3236         (<code>;</code>) as delimiters.
3237
3238      </p>
3239
3240      <h5>Entity Reference</h5>
3241      <table class="scrap">
3242         <tbody>
3243            <tr valign="baseline">
3244               <td><a name="NT-Reference"></a>[67]&nbsp;&nbsp;&nbsp;
3245               </td>
3246               <td>Reference</td>
3247               <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3248               <td><a href="#NT-EntityRef">EntityRef</a>
3249                  | <a href="#NT-CharRef">CharRef</a></td>
3250               <td></td>
3251            </tr>
3252            <tr valign="baseline">
3253               <td><a name="NT-EntityRef"></a>[68]&nbsp;&nbsp;&nbsp;
3254               </td>
3255               <td>EntityRef</td>
3256               <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3257               <td>'&amp;' <a href="#NT-Name">Name</a> ';'
3258               </td>
3259               <td>[&nbsp;WFC:&nbsp;<a href="#wf-entdeclared">Entity Declared</a>&nbsp;]
3260               </td>
3261            </tr>
3262            <tr valign="baseline">
3263               <td></td>
3264               <td></td>
3265               <td></td>
3266               <td></td>
3267               <td>[&nbsp;VC:&nbsp;<a href="#vc-entdeclared">Entity Declared</a>&nbsp;]
3268               </td>
3269            </tr>
3270            <tr valign="baseline">
3271               <td></td>
3272               <td></td>
3273               <td></td>
3274               <td></td>
3275               <td>[&nbsp;WFC:&nbsp;<a href="#textent">Parsed Entity</a>&nbsp;]
3276               </td>
3277            </tr>
3278            <tr valign="baseline">
3279               <td></td>
3280               <td></td>
3281               <td></td>
3282               <td></td>
3283               <td>[&nbsp;WFC:&nbsp;<a href="#norecursion">No Recursion</a>&nbsp;]
3284               </td>
3285            </tr>
3286            <tr valign="baseline">
3287               <td><a name="NT-PEReference"></a>[69]&nbsp;&nbsp;&nbsp;
3288               </td>
3289               <td>PEReference</td>
3290               <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3291               <td>'%' <a href="#NT-Name">Name</a> ';'
3292               </td>
3293               <td>[&nbsp;VC:&nbsp;<a href="#vc-entdeclared">Entity Declared</a>&nbsp;]
3294               </td>
3295            </tr>
3296            <tr valign="baseline">
3297               <td></td>
3298               <td></td>
3299               <td></td>
3300               <td></td>
3301               <td>[&nbsp;WFC:&nbsp;<a href="#norecursion">No Recursion</a>&nbsp;]
3302               </td>
3303            </tr>
3304            <tr valign="baseline">
3305               <td></td>
3306               <td></td>
3307               <td></td>
3308               <td></td>
3309               <td>[&nbsp;WFC:&nbsp;<a href="#indtd">In DTD</a>&nbsp;]
3310               </td>
3311            </tr>
3312         </tbody>
3313      </table>
3314
3315      <a name="wf-entdeclared"></a><p><b>Well Formedness Constraint: Entity Declared</b></p>
3317
3318      <p>In a document without any DTD, a document with only an internal
3319         DTD subset which contains no parameter entity references, or a document with
3320         "<code>standalone='yes'</code>",
3321         the <a href="#NT-Name">Name</a> given in the entity reference must
3322         <a href="#dt-match">match</a> that in an
3323         <a href="#sec-entity-decl">entity declaration</a>, except that
3324         well-formed documents need not declare
3325         any of the following entities: <code>amp</code>,
3326         <code>lt</code>,
3327         <code>gt</code>,
3328         <code>apos</code>,
3329         <code>quot</code>.
3330         The declaration of a parameter entity must precede any reference to it.
3331         Similarly, the declaration of a general entity must precede any
3332         reference to it which appears in a default value in an attribute-list
3333         declaration.
3334      </p>
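      <p>For a document with no DTD, the constraint can be checked with Python's <code>xml.etree.ElementTree</code> (expat-based and non-validating; used here purely as an example processor). The five predefined entities need no declaration, but any other undeclared general entity is rejected. The helper <code>is_well_formed</code> is hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if ElementTree accepts doc, False on a parse error."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# No DTD at all: the five predefined entities need no declaration ...
print(ET.fromstring("<d>&lt;&gt;&amp;&apos;&quot;</d>").text)  # <>&'"
# ... but any other undeclared general entity is a well-formedness error.
print(is_well_formed("<d>&docdate;</d>"))  # False
```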
3335
3336      <p>Note that if entities are declared in the external subset or in
3337         external parameter entities, a non-validating processor is
3338         <a href="#include-if-valid">not obligated to</a> read
3339         and process their declarations; for such documents, the rule that
3340         an entity must be declared is a well-formedness constraint only
3341         if <a href="#sec-rmd">standalone='yes'</a>.
3342      </p>
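      <p>Python's <code>xml.parsers.expat</code> module shows this distinction. By default, expat does not read the external subset. If the document is not declared standalone, expat reports an undeclared reference through <code>SkippedEntityHandler</code> instead of failing. With <code>standalone='yes'</code>, the same reference is an error. The file name <code>d.dtd</code> is hypothetical and is never opened, and <code>skipped_references</code> is an illustrative helper.</p>

```python
import xml.parsers.expat


def skipped_references(doc: str) -> list[str]:
    """Parse doc with expat (which does not fetch the external DTD subset
    unless asked to) and return the names of entity references it skipped
    because it never saw their declarations."""
    skipped = []
    parser = xml.parsers.expat.ParserCreate()
    # Called with (entityName, is_parameter_entity) for each skipped reference.
    parser.SkippedEntityHandler = lambda name, is_pe: skipped.append(name)
    parser.Parse(doc, True)
    return skipped


# External subset present, not standalone: no well-formedness error.
not_standalone = '<!DOCTYPE d SYSTEM "d.dtd"><d>&docdate;</d>'
print(skipped_references(not_standalone))  # ['docdate']

# The same document declared standalone: the undeclared entity is fatal.
standalone = ("<?xml version='1.0' standalone='yes'?>"
              '<!DOCTYPE d SYSTEM "d.dtd"><d>&docdate;</d>')
try:
    skipped_references(standalone)
except xml.parsers.expat.ExpatError as err:
    print("rejected:", err)
```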
3343
3344      <a name="vc-entdeclared"></a><p><b>Validity Constraint: Entity Declared</b></p>
3346
3347      <p>In a document with an external subset or external parameter
3348         entities with "<code>standalone='no'</code>",
3349         the <a href="#NT-Name">Name</a> given in the entity reference must <a href="#dt-match">match</a> that in an
3350         <a href="#sec-entity-decl">entity declaration</a>.
3351         For interoperability, valid documents should declare the entities
3352         <code>amp</code>,
3353         <code>lt</code>,
3354         <code>gt</code>,
3355         <code>apos</code>,
3356         <code>quot</code>, in the form
3357         specified in <a href="#sec-predefined-ent">[<b>4.6 Predefined Entities</b>]
3358         </a>.
3359         The declaration of a parameter entity must precede any reference to it.
3360         Similarly, the declaration of a general entity must precede any
3361         reference to it which appears in a default value in an attribute-list
3362         declaration.
3363      </p>
3364
3365
3366      <a name="textent"></a><p><b>Well Formedness Constraint: Parsed Entity</b></p>
3368
3369      <p>
3370         An entity reference must not contain the name of an <a href="#dt-unparsed">unparsed entity</a>. Unparsed entities may be referred
3371         to only in <a href="#dt-attrval">attribute values</a> declared to
3372         be of type ENTITY or ENTITIES.
3373
3374      </p>
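      <p>The sketch below contrasts the allowed and forbidden uses of an unparsed entity, using Python's <code>xml.etree.ElementTree</code> as an example processor. The notation and entity follow the <code>hatch-pic</code> example given under External Entities. The <code>pic</code> attribute and the <code>parses</code> helper are hypothetical. The parser does not validate the ENTITY-typed attribute, but it does reject the entity reference in content.</p>

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE d [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif>
  <!ATTLIST d pic ENTITY #IMPLIED>
]>"""


def parses(doc: str):
    """Return the root element, or None if the parser rejects doc."""
    try:
        return ET.fromstring(doc)
    except ET.ParseError:
        return None


# Allowed: naming the unparsed entity in an ENTITY-typed attribute.
ok = parses(DTD + '<d pic="hatch-pic"/>')
print(ok.get("pic"))                       # hatch-pic
# Not allowed: an entity reference to it in content.
print(parses(DTD + "<d>&hatch-pic;</d>"))  # None
```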
3375
3376      <a name="norecursion"></a><p><b>Well Formedness Constraint: No Recursion</b></p>
3378
3379      <p>
3380         A parsed entity must not contain a recursive reference to itself,
3381         either directly or indirectly.
3382
3383      </p>
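      <p>A processor can enforce this constraint by tracking which entities are currently being expanded. Meeting one of them again means a direct or indirect recursive reference. The Python sketch below assumes an already-built table of replacement texts and handles only <code>&amp;Name;</code> references with an ASCII-only approximation of <a href="#NT-Name">Name</a>. The function <code>expand</code> is hypothetical.</p>

```python
import re

# Simplified, ASCII-only stand-in for the Name production.
ENTITY_REF = re.compile(r"&([A-Za-z_:][-A-Za-z0-9._:]*);")


def expand(name: str, entities: dict[str, str], open_refs: tuple = ()) -> str:
    """Fully expand general entity `name` using the replacement texts in
    `entities`. `open_refs` holds the entities currently being expanded."""
    if name in open_refs:
        cycle = " -> ".join(open_refs + (name,))
        raise ValueError(f"WFC No Recursion violated: {cycle}")
    inner = open_refs + (name,)
    return ENTITY_REF.sub(lambda m: expand(m.group(1), entities, inner),
                          entities[name])


print(expand("a", {"a": "x&b;y", "b": "[&c;&c;]", "c": "z"}))  # x[zz]y
```

      <p>Referencing <code>c</code> twice is not recursion, because neither reference occurs inside the expansion of <code>c</code> itself. A table such as <code>{"p": "&amp;q;", "q": "&amp;p;"}</code> raises the error.</p>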
3384
3385      <a name="indtd"></a><p><b>Well Formedness Constraint: In DTD</b></p>
3387
3388      <p>
3389         Parameter-entity references may only appear in the
3390         <a href="#dt-doctype">DTD</a>.
3391
3392      </p>
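      <p>Outside the DTD, the percent sign has no special meaning. The short check below uses Python's <code>xml.etree.ElementTree</code> as an example processor and shows that a parameter-entity reference in content is just character data.</p>

```python
import xml.etree.ElementTree as ET

# In content, "%ISOLat2;" is plain character data, not a reference.
root = ET.fromstring("<d>%ISOLat2;</d>")
print(root.text)  # %ISOLat2;
```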
3393
3394
3395      <p>Examples of character and entity references:
3396         <pre>Type &lt;key>less-than&lt;/key> (&amp;#x3C;) to save options.
3397This document was prepared on &amp;docdate; and
3398is classified &amp;security-level;.</pre></p>
3399
3400      <p>Example of a parameter-entity reference:
3401         <pre>&lt;!-- declare the parameter entity "ISOLat2"... -->
3402&lt;!ENTITY % ISOLat2
3403         SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" >
3404&lt;!-- ... now reference it. -->
3405%ISOLat2;</pre></p>
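      <p>The three reference forms are distinguished by their opening delimiters. The Python sketch below classifies them with a regular expression. It assumes an ASCII-only approximation of <a href="#NT-Name">Name</a>, and it ignores recognition context: a real processor treats <code>%Name;</code> as a reference only inside the DTD. The names <code>REFERENCE</code> and <code>classify_references</code> are hypothetical.</p>

```python
import re

# Simplified, ASCII-only stand-in for the Name production; the real
# production also admits many non-ASCII letters.
NAME = r"[A-Za-z_:][-A-Za-z0-9._:]*"

REFERENCE = re.compile(
    r"(?P<charref>&#[0-9]+;|&#x[0-9a-fA-F]+;)"  # CharRef    [66]
    rf"|(?P<entityref>&{NAME};)"                # EntityRef  [68]
    rf"|(?P<peref>%{NAME};)"                    # PEReference [69]
)


def classify_references(text: str) -> list[tuple[str, str]]:
    """Return (kind, reference) pairs in document order."""
    return [(m.lastgroup, m.group()) for m in REFERENCE.finditer(text)]


sample = ("Type <key>less-than</key> (&#x3C;) to save options. "
          "This document was prepared on &docdate; and "
          "is classified &security-level;.")
print(classify_references(sample))
print(classify_references("%ISOLat2;"))  # [('peref', '%ISOLat2;')]
```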
3406
3407
3408
3409
3410      <h3><a name="sec-entity-decl"></a>4.2 Entity Declarations
3411      </h3>
3412
3413
3414      <p><a name="dt-entdecl"></a>
3415         Entities are declared thus:
3416
3417         <h5>Entity Declaration</h5>
3418         <table class="scrap">
3419            <tbody>
3420
3421               <tr valign="baseline">
3422                  <td><a name="NT-EntityDecl"></a>[70]&nbsp;&nbsp;&nbsp;
3423                  </td>
3424                  <td>EntityDecl</td>
3425                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3426                  <td><a href="#NT-GEDecl">GEDecl</a> | <a href="#NT-PEDecl">PEDecl</a></td>
3427                  <td></td>
3428               </tr>
3429
3430               <tr valign="baseline">
3431                  <td><a name="NT-GEDecl"></a>[71]&nbsp;&nbsp;&nbsp;
3432                  </td>
3433                  <td>GEDecl</td>
3434                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3435                  <td>'&lt;!ENTITY' <a href="#NT-S">S</a> <a href="#NT-Name">Name</a>
3436                     <a href="#NT-S">S</a> <a href="#NT-EntityDef">EntityDef</a>
3437                     <a href="#NT-S">S</a>? '>'
3438                  </td>
3439                  <td></td>
3440               </tr>
3441
3442               <tr valign="baseline">
3443                  <td><a name="NT-PEDecl"></a>[72]&nbsp;&nbsp;&nbsp;
3444                  </td>
3445                  <td>PEDecl</td>
3446                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3447                  <td>'&lt;!ENTITY' <a href="#NT-S">S</a> '%' <a href="#NT-S">S</a>
3448                     <a href="#NT-Name">Name</a> <a href="#NT-S">S</a>
3449                     <a href="#NT-PEDef">PEDef</a> <a href="#NT-S">S</a>? '>'
3450                  </td>
3451                  <td></td>
3452               </tr>
3453
3454               <tr valign="baseline">
3455                  <td><a name="NT-EntityDef"></a>[73]&nbsp;&nbsp;&nbsp;
3456                  </td>
3457                  <td>EntityDef</td>
3458                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3459                  <td><a href="#NT-EntityValue">EntityValue</a>
3460                     | (<a href="#NT-ExternalID">ExternalID</a>
3461                     <a href="#NT-NDataDecl">NDataDecl</a>?)
3462                  </td>
3463                  <td></td>
3464               </tr>
3465
3466
3467               <tr valign="baseline">
3468                  <td><a name="NT-PEDef"></a>[74]&nbsp;&nbsp;&nbsp;
3469                  </td>
3470                  <td>PEDef</td>
3471                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3472                  <td><a href="#NT-EntityValue">EntityValue</a>
3473                     | <a href="#NT-ExternalID">ExternalID</a></td>
3474                  <td></td>
3475               </tr>
3476
3477            </tbody>
3478         </table>
3479         The <a href="#NT-Name">Name</a> identifies the entity in an
3480         <a href="#dt-entref">entity reference</a> or, in the case of an
3481         unparsed entity, in the value of an ENTITY or ENTITIES
3482         attribute.
3483         If the same entity is declared more than once, the first declaration
3484         encountered is binding; at user option, an XML processor may issue a
3485         warning if entities are declared multiple times.
3486
3487      </p>
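      <p>The first-declaration rule can be seen with Python's <code>xml.etree.ElementTree</code>, whose underlying expat parser ignores later declarations of an already-declared entity and emits no warning. Warning is optional, as the text says. The entity name <code>status</code> is illustrative.</p>

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE d [
  <!ENTITY status "first declaration">
  <!ENTITY status "second declaration">
]>
<d>&status;</d>"""

print(ET.fromstring(doc).text)  # first declaration
```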
3488
3489
3490
3491      <h4><a name="sec-internal-ent"></a>4.2.1 Internal Entities
3492      </h4>
3493
3494
3495      <p><a name="dt-internent"></a>If
3496         the entity definition is an
3497         <a href="#NT-EntityValue">EntityValue</a>,
3498         the defined entity is called an <b>internal entity</b>.
3499         There is no separate physical
3500         storage object, and the content of the entity is given in the
3501         declaration.
3502         Note that some processing of entity and character references in the
3503         <a href="#dt-litentval">literal entity value</a> may be required to
3504         produce the correct <a href="#dt-repltext">replacement
3505            text
3506         </a>: see <a href="#intern-replacement">[<b>4.5 Construction of Internal Entity Replacement Text</b>]
3507         </a>.
3508
3509      </p>
3510
3511      <p>An internal entity is a <a href="#dt-parsedent">parsed
3512            entity
3513         </a>.
3514      </p>
3515
3516      <p>Example of an internal entity declaration:
3517         <pre>&lt;!ENTITY Pub-Status "This is a pre-release of the
3518 specification."></pre></p>
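      <p>The sketch below expands the internal entity from the example above, using Python's <code>xml.etree.ElementTree</code> as an example processor. The example text is placed on one line here. A second, hypothetical entity, <code>year-note</code>, shows a character reference in the literal entity value being replaced to produce the replacement text.</p>

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE d [
  <!ENTITY Pub-Status "This is a pre-release of the specification.">
  <!ENTITY year-note "&#169; 1998">
]>
<d><status>&Pub-Status;</status><note>&year-note;</note></d>"""

root = ET.fromstring(doc)
print(root.find("status").text)  # This is a pre-release of the specification.
print(root.find("note").text)    # © 1998
```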
3519
3520
3521
3522
3523      <h4><a name="sec-external-ent"></a>4.2.2 External Entities
3524      </h4>
3525
3526
3527      <p><a name="dt-extent"></a>If the entity is not
3528         internal, it is an <b>external
3529            entity
3530         </b>, declared as follows:
3531
3532         <h5>External Entity Declaration</h5>
3533         <table class="scrap">
3534            <tbody>
3535               <tr valign="baseline">
3536                  <td><a name="NT-ExternalID"></a>[75]&nbsp;&nbsp;&nbsp;
3537                  </td>
3538                  <td>ExternalID</td>
3539                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3540                  <td>'SYSTEM' <a href="#NT-S">S</a>
3541                     <a href="#NT-SystemLiteral">SystemLiteral</a></td>
3542                  <td></td>
3543               </tr>
3544               <tr valign="baseline">
3545                  <td></td>
3546                  <td></td>
3547                  <td></td>
3548                  <td>| 'PUBLIC' <a href="#NT-S">S</a>
3549                     <a href="#NT-PubidLiteral">PubidLiteral</a>
3550                     <a href="#NT-S">S</a>
3551                     <a href="#NT-SystemLiteral">SystemLiteral</a>
3552
3553                  </td>
3554                  <td></td>
3555               </tr>
3556               <tr valign="baseline">
3557                  <td><a name="NT-NDataDecl"></a>[76]&nbsp;&nbsp;&nbsp;
3558                  </td>
3559                  <td>NDataDecl</td>
3560                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3561                  <td><a href="#NT-S">S</a> 'NDATA' <a href="#NT-S">S</a>
3562                     <a href="#NT-Name">Name</a></td>
3563                  <td>[&nbsp;VC:&nbsp;<a href="#not-declared">Notation Declared</a>&nbsp;]
3564                  </td>
3565               </tr>
3566            </tbody>
3567         </table>
3568         If the <a href="#NT-NDataDecl">NDataDecl</a> is present, this is a
3569         general <a href="#dt-unparsed">unparsed
3570            entity
3571         </a>; otherwise it is a parsed entity.
3572      </p>
3573      <a name="not-declared"></a><p><b>Validity Constraint: Notation Declared</b></p>
3575
3576      <p>
3577         The <a href="#NT-Name">Name</a> must match the declared name of a
3578         <a href="#dt-notation">notation</a>.
3579
3580      </p>
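      <p>Whether an external entity is parsed or unparsed depends only on whether an <a href="#NT-NDataDecl">NDataDecl</a> is present. Python's <code>xml.parsers.expat</code> exposes this through <code>EntityDeclHandler</code>, whose last argument is the notation name, or <code>None</code> when there is no NDATA. The declarations below follow the examples later in this section. The <code>gif</code> notation's system literal is illustrative.</p>

```python
import xml.parsers.expat

doc = """<!DOCTYPE d [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY open-hatch
           SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
  <!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif>
]>
<d/>"""

kinds = {}


def on_entity_decl(name, is_pe, value, base, system_id, public_id, notation):
    # expat passes notation=None unless the declaration has an NDataDecl.
    kinds[name] = "unparsed" if notation else "parsed"


parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = on_entity_decl
parser.Parse(doc, True)
print(kinds)  # {'open-hatch': 'parsed', 'hatch-pic': 'unparsed'}
```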
3581
3582
3583      <p><a name="dt-sysid"></a>The
3584         <a href="#NT-SystemLiteral">SystemLiteral</a>
3585         is called the entity's <b>system identifier</b>. It is a URI,
3586         which may be used to retrieve the entity.
3587         Note that the hash mark (<code>#</code>) and fragment identifier
3588         frequently used with URIs are not, formally, part of the URI itself;
3589         an XML processor may signal an error if a fragment identifier is
3590         given as part of a system identifier.
3591         Unless otherwise provided by information outside the scope of this
3592         specification (e.g. a special XML element type defined by a particular
3593         DTD, or a processing instruction defined by a particular application
3594         specification), relative URIs are relative to the location of the
3595         resource within which the entity declaration occurs.
3596         A URI might thus be relative to the
3597         <a href="#dt-docent">document entity</a>, to the entity
3598         containing the <a href="#dt-doctype">external DTD subset</a>,
3599         or to some other <a href="#dt-extent">external parameter entity</a>.
3600
3601      </p>
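      <p>Resolving a relative system identifier is ordinary URI resolution against the resource that contains the declaration. The sketch below uses Python's <code>urllib.parse</code>. It chooses to reject fragment identifiers, which is one option the text allows. The DTD location and the function name <code>resolve_system_id</code> are hypothetical.</p>

```python
from urllib.parse import urldefrag, urljoin


def resolve_system_id(system_id: str, declaring_resource: str) -> str:
    """Resolve a possibly relative system identifier against the URI of the
    resource in which the entity declaration occurs."""
    uri, fragment = urldefrag(system_id)
    if fragment:
        # The text permits signalling an error here; this sketch does so.
        raise ValueError(f"fragment identifier in system identifier: {system_id!r}")
    return urljoin(declaring_resource, uri)


# Hypothetical location of the DTD that declares the entity.
dtd_uri = "http://www.textuality.com/boilerplate/catalog.dtd"
print(resolve_system_id("../grafix/OpenHatch.gif", dtd_uri))
# http://www.textuality.com/grafix/OpenHatch.gif
```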
3602
3603      <p>An XML processor should handle a non-ASCII character in a URI by
3604         representing the character in UTF-8 as one or more bytes, and then
3605         escaping these bytes with the URI escaping mechanism (i.e., by
3606         converting each byte to %HH, where HH is the hexadecimal notation of the
3607         byte value).
3608      </p>
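      <p>The escaping rule is mechanical: encode each non-ASCII character as UTF-8, then write each resulting byte as <code>%HH</code>. The Python sketch below leaves ASCII characters untouched, so it does not attempt any other URI escaping. The function name and the example URI are hypothetical.</p>

```python
def escape_non_ascii(uri: str) -> str:
    """Replace each non-ASCII character with the %HH escapes of its UTF-8
    bytes; ASCII characters are copied unchanged."""
    return "".join(
        ch if ch.isascii() else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in uri
    )


print(escape_non_ascii("http://example.org/café.xml"))
# http://example.org/caf%C3%A9.xml
```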
3609
3610      <p><a name="dt-pubid"></a>
3611         In addition to a system identifier, an external identifier may
3612         include a <b>public identifier</b>.
3613         An XML processor attempting to retrieve the entity's content may use the public
3614         identifier to try to generate an alternative URI.  If the processor
3615         is unable to do so, it must use the URI specified in the system
3616         literal.  Before a match is attempted, all strings
3617         of white space in the public identifier must be normalized to single space characters (#x20),
3618         and leading and trailing white space must be removed.
3619      </p>
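      <p>The normalization step can be sketched in Python as follows. The pattern matches only the four XML white-space characters (#x20, #x9, #xD, #xA). This is narrower than Python's <code>str.split()</code>, which would also treat characters such as U+00A0 as white space. The function name <code>normalize_pubid</code> is hypothetical.</p>

```python
import re

# Runs of XML white space: #x20, #x9, #xD, #xA.
XML_SPACE_RUN = re.compile(r"[\x20\x09\x0D\x0A]+")


def normalize_pubid(pubid: str) -> str:
    """Collapse each run of white space to one space, then drop leading and
    trailing white space, before the public identifier is matched."""
    return XML_SPACE_RUN.sub(" ", pubid).strip(" ")


print(normalize_pubid("  -//Textuality//TEXT Standard\n\topen-hatch   boilerplate//EN "))
# -//Textuality//TEXT Standard open-hatch boilerplate//EN
```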
3620
3621      <p>Examples of external entity declarations:
3622         <pre>&lt;!ENTITY open-hatch
3623         SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
3624&lt;!ENTITY open-hatch
3625         PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
3626         "http://www.textuality.com/boilerplate/OpenHatch.xml">
3627&lt;!ENTITY hatch-pic
3628         SYSTEM "../grafix/OpenHatch.gif"
3629         NDATA gif ></pre></p>
3630
3631
3632
3633
3634
3635
3636      <h3><a name="TextEntities"></a>4.3 Parsed Entities
3637      </h3>
3638
3639
3640      <h4><a name="sec-TextDecl"></a>4.3.1 The Text Declaration
3641      </h4>
3642
3643      <p>External parsed entities may each begin with a <b>text
3644            declaration
3645         </b>.
3646
3647         <h5>Text Declaration</h5>
3648         <table class="scrap">
3649            <tbody>
3650
3651               <tr valign="baseline">
3652                  <td><a name="NT-TextDecl"></a>[77]&nbsp;&nbsp;&nbsp;
3653                  </td>
3654                  <td>TextDecl</td>
3655                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3656                  <td>'&lt;?xml'
3657                     <a href="#NT-VersionInfo">VersionInfo</a>?
3658                     <a href="#NT-EncodingDecl">EncodingDecl</a>
3659                     <a href="#NT-S">S</a>? '?>'
3660                  </td>
3661                  <td></td>
3662               </tr>
3663
3664            </tbody>
3665         </table>
3666
3667      </p>
3668
3669      <p>The text declaration must be provided literally, not
3670         by reference to a parsed entity.
3671         No text declaration may appear at any position other than the beginning of
3672         an external parsed entity.
3673      </p>
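      <p>Because a text declaration may occur only at the very start of an external parsed entity, a recognizer can anchor its match at offset 0. The Python sketch below transcribes production [77] together with [80] and [81] into a regular expression. It also uses the S, Eq, and VersionNum productions defined earlier in this specification. Note that the encoding declaration is mandatory here, unlike in the XML declaration. All names in the code are hypothetical.</p>

```python
import re

S = r"[ \t\r\n]+"                       # white space
EQ = rf"(?:{S})?=(?:{S})?"              # Eq
VERSION_NUM = r"[a-zA-Z0-9_.:-]+"       # VersionNum
ENC_NAME = r"[A-Za-z][A-Za-z0-9._-]*"   # [81] EncName

VERSION_INFO = rf"""{S}version{EQ}(?:'{VERSION_NUM}'|"{VERSION_NUM}")"""
ENCODING_DECL = rf"""{S}encoding{EQ}(?:'(?P<sq>{ENC_NAME})'|"(?P<dq>{ENC_NAME})")"""
# [77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
TEXT_DECL = re.compile(rf"<\?xml(?:{VERSION_INFO})?{ENCODING_DECL}(?:{S})?\?>")


def parse_text_decl(entity_text: str):
    """Return the declared encoding if entity_text begins with a text
    declaration, otherwise None."""
    m = TEXT_DECL.match(entity_text)  # match() anchors at the beginning
    if m is None:
        return None
    return m.group("sq") or m.group("dq")


print(parse_text_decl("<?xml encoding='EUC-JP'?><p/>"))  # EUC-JP
```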
3674
3675
3676
3677      <h4><a name="wf-entities"></a>4.3.2 Well-Formed Parsed Entities
3678      </h4>
3679
3680      <p>The document entity is well-formed if it matches the production labeled
3681         <a href="#NT-document">document</a>.
3682         An external general
3683         parsed entity is well-formed if it matches the production labeled
3684         <a href="#NT-extParsedEnt">extParsedEnt</a>.
3685         An external parameter
3686         entity is well-formed if it matches the production labeled
3687         <a href="#NT-extPE">extPE</a>.
3688
3689         <h5>Well-Formed External Parsed Entity</h5>
3690         <table class="scrap">
3691            <tbody>
3692               <tr valign="baseline">
3693                  <td><a name="NT-extParsedEnt"></a>[78]&nbsp;&nbsp;&nbsp;
3694                  </td>
3695                  <td>extParsedEnt</td>
3696                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3697                  <td><a href="#NT-TextDecl">TextDecl</a>?
3698                     <a href="#NT-content">content</a></td>
3699                  <td></td>
3700               </tr>
3701               <tr valign="baseline">
3702                  <td><a name="NT-extPE"></a>[79]&nbsp;&nbsp;&nbsp;
3703                  </td>
3704                  <td>extPE</td>
3705                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3706                  <td><a href="#NT-TextDecl">TextDecl</a>?
3707                     <a href="#NT-extSubsetDecl">extSubsetDecl</a></td>
3708                  <td></td>
3709               </tr>
3710            </tbody>
3711         </table>
3712         An internal general parsed entity is well-formed if its replacement text
3713         matches the production labeled
3714         <a href="#NT-content">content</a>.
3715         All internal parameter entities are well-formed by definition.
3716
3717      </p>
3718
3719      <p>A consequence of well-formedness in entities is that the logical
3720         and physical structures in an XML document are properly nested; no
3721         <a href="#dt-stag">start-tag</a>,
3722         <a href="#dt-etag">end-tag</a>,
3723         <a href="#dt-empty">empty-element tag</a>,
3724         <a href="#dt-element">element</a>,
3725         <a href="#dt-comment">comment</a>,
3726         <a href="#dt-pi">processing instruction</a>,
3727         <a href="#dt-charref">character
3728            reference
3729         </a>, or
3730         <a href="#dt-entref">entity reference</a>
3731         can begin in one entity and end in another.
3732      </p>
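      <p>The nesting consequence can be seen with Python's <code>xml.etree.ElementTree</code>, used here as an example processor. An entity whose replacement text is a complete element is accepted. An entity that opens an element and leaves its end-tag to the surrounding document is rejected. The entity names are illustrative.</p>

```python
import xml.etree.ElementTree as ET

balanced = """<!DOCTYPE d [
  <!ENTITY bold "<b>bold</b>">
]>
<d>&bold;</d>"""
print(ET.fromstring(balanced).find("b").text)  # bold

# The start-tag comes from the entity, the end-tag from the document entity.
unbalanced = """<!DOCTYPE d [
  <!ENTITY open-b "<b>">
]>
<d>&open-b;bold</b></d>"""
try:
    ET.fromstring(unbalanced)
    accepted = True
except ET.ParseError as err:
    accepted = False
    print("rejected:", err)
```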
3733
3734
3735
3736      <h4><a name="charencoding"></a>4.3.3 Character Encoding in Entities
3737      </h4>
3738
3739
3740      <p>Each external parsed entity in an XML document may use a different
3741         encoding for its characters. All XML processors must be able to read
3742         entities in either UTF-8 or UTF-16.
3743
3744
3745      </p>
3746
3747      <p>Entities encoded in UTF-16 must
3748         begin with the Byte Order Mark described by ISO/IEC 10646 Annex E and
3749         Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF).
3750         This is an encoding signature, not part of either the markup or the
3751         character data of the XML document.
3752         XML processors must be able to use this character to
3753         differentiate between UTF-8 and UTF-16 encoded documents.
3754      </p>
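      <p>Checking for the Byte Order Mark is a simple prefix test on the raw bytes. The Python sketch below uses the BOM constants from the standard <code>codecs</code> module and handles only the two UTF-16 byte orders covered by this paragraph. The function name <code>sniff_bom</code> is hypothetical. It returns the BOM length so the caller can skip the signature, which is not part of the document.</p>

```python
import codecs


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a UTF-16 byte
    order mark (#xFEFF in either byte order), else (None, 0)."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "UTF-16BE", 2
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "UTF-16LE", 2
    return None, 0


data = codecs.BOM_UTF16_BE + "<d/>".encode("utf-16-be")
encoding, skip = sniff_bom(data)
print(encoding, data[skip:].decode(encoding))  # UTF-16BE <d/>
```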
3755
3756      <p>Although an XML processor is required to read only entities in
3757         the UTF-8 and UTF-16 encodings, it is recognized that other encodings are
         used around the world, and it may be desirable for XML processors
         to read entities that use them.
3760         Parsed entities which are stored in an encoding other than
3761         UTF-8 or UTF-16 must begin with a <a href="#TextDecl">text
3762            declaration
3763         </a> containing an encoding declaration:
3764
3765         <h5>Encoding Declaration</h5>
3766         <table class="scrap">
3767            <tbody>
3768               <tr valign="baseline">
3769                  <td><a name="NT-EncodingDecl"></a>[80]&nbsp;&nbsp;&nbsp;
3770                  </td>
3771                  <td>EncodingDecl</td>
3772                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3773                  <td><a href="#NT-S">S</a>
3774                     'encoding' <a href="#NT-Eq">Eq</a>
3775                     ('"' <a href="#NT-EncName">EncName</a> '"' |
3776                     "'" <a href="#NT-EncName">EncName</a> "'" )
3777
3778                  </td>
3779                  <td></td>
3780               </tr>
3781               <tr valign="baseline">
3782                  <td><a name="NT-EncName"></a>[81]&nbsp;&nbsp;&nbsp;
3783                  </td>
3784                  <td>EncName</td>
3785                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
3786                  <td>[A-Za-z] ([A-Za-z0-9._] | '-')*</td>
3787                  <td>/*Encoding name contains only Latin characters*/</td>
3788               </tr>
3789            </tbody>
3790         </table>
3791         In the <a href="#dt-docent">document entity</a>, the encoding
3792         declaration is part of the <a href="#dt-xmldecl">XML declaration</a>.
3793         The <a href="#NT-EncName">EncName</a> is the name of the encoding used.
3794
3795      </p>
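      <p>Production [81] is purely syntactic and can be checked with a single regular expression, as in the Python sketch below. A name that passes only has the right shape. Whether the processor can actually handle that encoding is a separate question. The function name is hypothetical.</p>

```python
import re

# [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_encname(name: str) -> bool:
    """True if name matches the EncName production exactly."""
    return ENC_NAME.fullmatch(name) is not None


print(is_valid_encname("Shift_JIS"), is_valid_encname("8859-1"))  # True False
```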
3796
3797
3798      <p>In an encoding declaration, the values
3799         "<code>UTF-8</code>",
3800         "<code>UTF-16</code>",
3801         "<code>ISO-10646-UCS-2</code>", and
3802         "<code>ISO-10646-UCS-4</code>" should be
3803         used for the various encodings and transformations of Unicode /
3804         ISO/IEC 10646, the values
3805         "<code>ISO-8859-1</code>",
3806         "<code>ISO-8859-2</code>", ...
3807         "<code>ISO-8859-9</code>" should be used for the parts of ISO 8859, and
3808         the values
3809         "<code>ISO-2022-JP</code>",
3810         "<code>Shift_JIS</code>", and
3811         "<code>EUC-JP</code>"
         should be used for the various encoded forms of JIS X-0208-1997. XML
         processors may recognize other encodings; it is recommended that
         character encodings registered (as <i>charset</i>s)
         with the Internet Assigned Numbers
         Authority <a href="#IANA">[IANA]</a>, other than those just listed,
         be referred to using their registered names.
3819         Note that these registered names are defined to be
3820         case-insensitive, so processors wishing to match against them
3821         should do so in a case-insensitive
3822         way.
3823      </p>
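      <p>In Python, an implementation might compare labels case-insensitively and then map a declared name to a codec with <code>codecs.lookup</code>. The lookup is itself case-insensitive and accepts many aliases, which is broader than IANA name matching. Treat the sketch below as an illustration under that assumption. Both function names are hypothetical.</p>

```python
import codecs


def same_encoding_label(declared: str, expected: str) -> bool:
    """Compare two registered encoding names case-insensitively."""
    return declared.casefold() == expected.casefold()


def python_codec_for(enc_name: str):
    """Return the canonical name of Python's codec for enc_name, or None
    if Python has no such codec."""
    try:
        return codecs.lookup(enc_name).name
    except LookupError:
        return None


print(same_encoding_label("shift_jis", "Shift_JIS"))                  # True
print(python_codec_for("ISO-8859-1") == python_codec_for("iso-8859-1"))  # True
```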
3824
3825      <p>In the absence of information provided by an external
3826         transport protocol (e.g. HTTP or MIME),
3827         it is an <a href="#dt-error">error</a> for an entity including
3828         an encoding declaration to be presented to the XML processor
3829         in an encoding other than that named in the declaration,
3830         for an encoding declaration to occur other than at the beginning
3831         of an external entity, or for
3832         an entity which begins with neither a Byte Order Mark nor an encoding
3833         declaration to use an encoding other than UTF-8.
3834         Note that since ASCII
3835         is a subset of UTF-8, ordinary ASCII entities do not strictly need
3836         an encoding declaration.
3837      </p>
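<p>A minimal Python sketch of the default rule just stated, assuming an ASCII-compatible byte stream: a UTF-16 Byte Order Mark selects UTF-16, an encoding declaration at the very start of the entity names the encoding, and otherwise the entity must be UTF-8. The function name <code>declared_or_default</code> and its regular expression are illustrative assumptions, not part of this specification. The sketch does not recognize a declaration written in UTF-16 without a Byte Order Mark, and it does not check that the bytes really are in the named encoding.</p>

```python
import re

# An encoding declaration must appear at the very beginning of the entity,
# hence the ^ anchor. The name follows the EncName production.
_DECL = re.compile(rb"^<\?xml[^>]*?\sencoding\s*=\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\1")


def declared_or_default(data: bytes) -> str:
    """Return the encoding an entity claims, falling back to UTF-8."""
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):  # UTF-16 Byte Order Mark
        return "UTF-16"
    match = _DECL.match(data)
    if match:
        return match.group(2).decode("ascii")
    # Neither a Byte Order Mark nor an encoding declaration: must be UTF-8.
    return "UTF-8"
```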
3838
3839
3840      <p>It is a <a href="#dt-fatal">fatal error</a> when an XML processor
3841         encounters an entity with an encoding that it is unable to process.
3842      </p>
3843
3844      <p>Examples of encoding declarations:
3845         <pre>&lt;?xml encoding='UTF-8'?>
3846&lt;?xml encoding='EUC-JP'?></pre></p>
3847
3848
3849
3850
3851      <h3><a name="entproc"></a>4.4 XML Processor Treatment of Entities and References
3852      </h3>
3853
3854      <p>The table below summarizes the contexts in which character references,
3855         entity references, and invocations of unparsed entities might appear and the
3856         required behavior of an <a href="#dt-xml-proc">XML processor</a> in
3857         each case.
3858         The labels in the leftmost column describe the recognition context:
3859
3860         <dl>
3861
3862            <dt><b>Reference in Content</b></dt>
3863
3864            <dd>
3865               <p>as a reference
3866                  anywhere after the <a href="#dt-stag">start-tag</a> and
3867                  before the <a href="#dt-etag">end-tag</a> of an element; corresponds
3868                  to the nonterminal <a href="#NT-content">content</a>.
3869               </p>
3870            </dd>
3871
3872
3873
3874            <dt><b>Reference in Attribute Value</b></dt>
3875
3876            <dd>
3877               <p>as a reference within either the value of an attribute in a
3878                  <a href="#dt-stag">start-tag</a>, or a default
3879                  value in an <a href="#dt-attdecl">attribute declaration</a>;
3880                  corresponds to the nonterminal
3881                  <a href="#NT-AttValue">AttValue</a>.
3882               </p>
3883            </dd>
3884
3885
3886            <dt><b>Occurs as Attribute Value</b></dt>
3887
3888            <dd>
3889               <p>as a <a href="#NT-Name">Name</a>, not a reference, appearing either as
3890                  the value of an
3891                  attribute which has been declared as type ENTITY, or as one of
3892                  the space-separated tokens in the value of an attribute which has been
3893                  declared as type ENTITIES.
3894               </p>
3895
3896            </dd>
3897
3898            <dt><b>Reference in Entity Value</b></dt>
3899
3900            <dd>
3901               <p>as a reference
3902                  within a parameter or internal entity's
3903                  <a href="#dt-litentval">literal entity value</a> in
3904                  the entity's declaration; corresponds to the nonterminal
3905                  <a href="#NT-EntityValue">EntityValue</a>.
3906               </p>
3907            </dd>
3908
3909            <dt><b>Reference in DTD</b></dt>
3910
3911            <dd>
3912               <p>as a reference within either the internal or external subsets of the
3913                  <a href="#dt-doctype">DTD</a>, but outside
3914                  of an <a href="#NT-EntityValue">EntityValue</a> or
3915                  <a href="#NT-AttValue">AttValue</a>.
3916               </p>
3917            </dd>
3918
3919
3920         </dl>
3921      </p>
3922
3923      <table border="1" cellpadding="7" align="center">
3924
3925         <tbody>
3926
3927            <tr align="" valign="">
3928               <td bgcolor="#c0d9c0" rowspan="2" colspan="1" align="" valign=""></td>
3929
3930               <td bgcolor="#c0d9c0" rowspan="1" colspan="4" align="center" valign="bottom">Entity Type</td>
3931
3932               <td bgcolor="#c0d9c0" rowspan="2" colspan="1" align="center" valign="">Character</td>
3933
3934            </tr>
3935
3936            <tr align="center" valign="bottom">
3937
3938               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign="">Parameter</td>
3939
3940               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign="">Internal
3941                  General
3942               </td>
3943
3944               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign="">External Parsed
3945                  General
3946               </td>
3947
3948               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign="">Unparsed</td>
3949
3950            </tr>
3951
3952            <tr align="center" valign="middle">
3953
3954
3955               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Reference
3956                  in Content
3957               </td>
3958
3959               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#not-recognized">Not recognized</a></td>
3960
3961               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#included">Included</a></td>
3962
3963               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#include-if-valid">Included if validating</a></td>
3964
3965               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>
3966
3967               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#included">Included</a></td>
3968
3969            </tr>
3970
3971            <tr align="center" valign="middle">
3972
3973               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Reference
3974                  in Attribute Value
3975               </td>
3976
3977               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#not-recognized">Not recognized</a></td>
3978
3979               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#inliteral">Included in literal</a></td>
3980
3981               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>
3982
3983               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>
3984
3985               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#included">Included</a></td>
3986
3987            </tr>
3988
3989            <tr align="center" valign="middle">
3990
3991               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Occurs as
3992                  Attribute Value
3993               </td>
3994
3995               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#not-recognized">Not recognized</a></td>
3996
               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>

               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>

               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#notify">Notify</a></td>

               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#not-recognized">Not recognized</a></td>
4004
4005            </tr>
4006
4007            <tr align="center" valign="middle">
4008
4009               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Reference
4010                  in EntityValue
4011               </td>
4012
4013               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#inliteral">Included in literal</a></td>
4014
4015               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#bypass">Bypassed</a></td>
4016
4017               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#bypass">Bypassed</a></td>
4018
4019               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>
4020
4021               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#included">Included</a></td>
4022
4023            </tr>
4024
4025            <tr align="center" valign="middle">
4026
4027               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="right" valign="">Reference
4028                  in DTD
4029               </td>
4030
4031               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#as-PE">Included as PE</a></td>
4032
4033               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>
4034
4035               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>
4036
4037               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>
4038
4039               <td bgcolor="#c0d9c0" rowspan="1" colspan="1" align="" valign=""><a href="#forbidden">Forbidden</a></td>
4040
4041            </tr>
4042
4043         </tbody>
4044
4045      </table>
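<p>For implementers, the table can be read as a lookup keyed on recognition context and entity type. The Python sketch below transcribes it cell by cell; the identifiers <code>ENTITY_TYPES</code>, <code>TREATMENT</code> and <code>treatment</code>, and the lower-case context labels taken from the leftmost column, are illustrative choices rather than names defined by this specification.</p>

```python
# Columns of the table, left to right.
ENTITY_TYPES = (
    "parameter",
    "internal general",
    "external parsed general",
    "unparsed",
    "character",
)

# Rows of the table: recognition context -> treatment for each entity type.
TREATMENT = {
    "reference in content": (
        "not recognized", "included", "included if validating", "forbidden", "included"),
    "reference in attribute value": (
        "not recognized", "included in literal", "forbidden", "forbidden", "included"),
    "occurs as attribute value": (
        "not recognized", "forbidden", "forbidden", "notify", "not recognized"),
    "reference in entity value": (
        "included in literal", "bypassed", "bypassed", "forbidden", "included"),
    "reference in DTD": (
        "included as PE", "forbidden", "forbidden", "forbidden", "forbidden"),
}


def treatment(context: str, entity_type: str) -> str:
    """Look up the required processor behavior for one cell of the table."""
    return TREATMENT[context][ENTITY_TYPES.index(entity_type)]
```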
4046
4047
4048      <h4><a name="not-recognized"></a>4.4.1 Not Recognized
4049      </h4>
4050
4051      <p>Outside the DTD, the <code>%</code> character has no
4052         special significance; thus, what would be parameter entity references in the
4053         DTD are not recognized as markup in <a href="#NT-content">content</a>.
4054         Similarly, the names of unparsed entities are not recognized except
4055         when they appear in the value of an appropriately declared attribute.
4056
4057      </p>
4058
4059
4060
4061      <h4><a name="included"></a>4.4.2 Included
4062      </h4>
4063
4064      <p><a name="dt-include"></a>An entity is
4065         <b>included</b> when its
4066         <a href="#dt-repltext">replacement text</a> is retrieved
4067         and processed, in place of the reference itself,
4068         as though it were part of the document at the location the
4069         reference was recognized.
4070         The replacement text may contain both
4071         <a href="#dt-chardata">character data</a>
4072         and (except for parameter entities) <a href="#dt-markup">markup</a>,
4073         which must be recognized in
4074         the usual way, except that the replacement text of entities used to escape
4075         markup delimiters (the entities <code>amp</code>,
4076         <code>lt</code>,
4077         <code>gt</code>,
4078         <code>apos</code>,
4079         <code>quot</code>) is always treated as
4080         data.  (The string "<code>AT&amp;amp;T;</code>" expands to
4081         "<code>AT&amp;T;</code>" and the remaining ampersand is not recognized
4082         as an entity-reference delimiter.)
4083         A character reference is <b>included</b> when the indicated
4084         character is processed in place of the reference itself.
4085
4086      </p>
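<p>The <code>AT&amp;amp;T;</code> example can be reproduced with a single left-to-right pass that never rescans the text it has just inserted. In this Python sketch (the function name <code>expand_predefined</code> is hypothetical) only the five predefined entities are handled. Python's <code>re.sub</code> resumes scanning after each match in the original string, so the ampersand produced by <code>&amp;amp;</code> is never taken as the start of another reference.</p>

```python
import re

# Replacement text of the five predefined entities; always treated as data.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}
_PREDEFINED_REF = re.compile(r"&(amp|lt|gt|apos|quot);")


def expand_predefined(text: str) -> str:
    """Replace predefined entity references in one pass, without rescanning."""
    return _PREDEFINED_REF.sub(lambda m: PREDEFINED[m.group(1)], text)
```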
4087
4088
4089
4090      <h4><a name="include-if-valid"></a>4.4.3 Included If Validating
4091      </h4>
4092
4093      <p>When an XML processor recognizes a reference to a parsed entity, in order
4094         to <a href="#dt-valid">validate</a>
4095         the document, the processor must
4096         <a href="#dt-include">include</a> its
4097         replacement text.
4098         If the entity is external, and the processor is not
4099         attempting to validate the XML document, the
4100         processor <a href="#dt-may">may</a>, but need not,
4101         include the entity's replacement text.
         If a non-validating processor does not include the replacement text,
         it must inform the application that it recognized, but did not
         read, the entity.
4105      </p>
4106
4107      <p>This rule is based on the recognition that the automatic inclusion
4108         provided by the SGML and XML entity mechanism, primarily designed
4109         to support modularity in authoring, is not necessarily
4110         appropriate for other applications, in particular document browsing.
4111         Browsers, for example, when encountering an external parsed entity reference,
4112         might choose to provide a visual indication of the entity's
4113         presence and retrieve it for display only on demand.
4114
4115      </p>
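<p>Python's standard <code>xml.parsers.expat</code> binding, a non-validating parser, offers one way to observe this behavior. When an <code>ExternalEntityRefHandler</code> is installed, Expat calls it for each reference to an external parsed entity instead of reading the entity itself. The sketch below only records the system identifier and returns 1 so that parsing continues, which corresponds to informing the application that the entity was recognized but not read. The document, the file name <code>chap1.xml</code> and the helper name are hypothetical.</p>

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY chap1 SYSTEM "chap1.xml">
]>
<doc>&chap1;</doc>
"""


def skipped_external_entities(doc: str) -> list:
    """Parse doc without reading external entities; return their system IDs."""
    skipped = []
    parser = xml.parsers.expat.ParserCreate()

    def on_external_ref(context, base, system_id, public_id):
        skipped.append(system_id)  # recognized, but not read
        return 1                   # nonzero tells Expat to carry on

    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(doc, True)
    return skipped
```

<p>A browser following the strategy described above could use the recorded identifiers to fetch the entity later, on demand.</p>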
4116
4117
4118
4119      <h4><a name="forbidden"></a>4.4.4 Forbidden
4120      </h4>
4121
4122      <p>The following are forbidden, and constitute
4123         <a href="#dt-fatal">fatal</a> errors:
4124
4125         <ul>
4126
4127            <li>
4128               <p>the appearance of a reference to an
4129                  <a href="#dt-unparsed">unparsed entity</a>.
4130
4131               </p>
4132            </li>
4133
4134            <li>
4135               <p>the appearance of any character or general-entity reference in the
4136                  DTD except within an <a href="#NT-EntityValue">EntityValue</a> or
4137                  <a href="#NT-AttValue">AttValue</a>.
4138               </p>
4139            </li>
4140
4141            <li>
4142               <p>a reference to an external entity in an attribute value.</p>
4143
4144            </li>
4145
4146         </ul>
4147
4148      </p>
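<p>The first and third cases can be checked against the standard <code>xml.parsers.expat</code> parser, which rejects both by raising <code>ExpatError</code>. The Python sketch below parses small hypothetical documents and returns Expat's error message, if any; the helper name <code>fatal_error</code> is illustrative, and the exact wording of the messages is Expat's, not this specification's.</p>

```python
import xml.parsers.expat
from typing import Optional


def fatal_error(doc: str) -> Optional[str]:
    """Return Expat's error message for doc, or None if it parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(doc, True)
    except xml.parsers.expat.ExpatError as exc:
        return xml.parsers.expat.ErrorString(exc.code)
    return None


# A reference to an unparsed entity in content.
UNPARSED_REF = """<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "gifviewer">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<doc>&logo;</doc>"""

# A reference to an external entity in an attribute value.
EXTERNAL_IN_ATTR = """<!DOCTYPE doc [
  <!ENTITY chap1 SYSTEM "chap1.xml">
]>
<doc title="&chap1;"/>"""

print(fatal_error(UNPARSED_REF))
print(fatal_error(EXTERNAL_IN_ATTR))
```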
4149
4150
4151
4152      <h4><a name="inliteral"></a>4.4.5 Included in Literal
4153      </h4>
4154
4155      <p>When an <a href="#dt-entref">entity reference</a> appears in an
4156         attribute value, or a parameter entity reference appears in a literal entity
4157         value, its <a href="#dt-repltext">replacement text</a> is
4158         processed in place of the reference itself as though it
4159         were part of the document at the location the reference was recognized,
4160         except that a single or double quote character in the replacement text
4161         is always treated as a normal data character and will not terminate the
4162         literal.
4163         For example, this is well-formed:
4164         <pre>&lt;!ENTITY % YN '"Yes"' >
&lt;!ENTITY WhatHeSaid "He said %YN;" ></pre>
4166         while this is not:
4167         <pre>&lt;!ENTITY EndAttr "27'" >
4168&lt;element attribute='a-&amp;EndAttr;></pre>
4169         </p>
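<p>The essential point is the order of operations: the closing delimiter of the literal is located in the raw text <i>before</i> any parameter-entity reference is replaced, so quote characters supplied by replacement text cannot end the literal. A minimal Python sketch, assuming a hypothetical <code>pe_texts</code> mapping from parameter-entity names to their already-constructed replacement text and a simplified pattern in place of the full <a href="#NT-Name">Name</a> production:</p>

```python
import re

# Simplified stand-in for the Name production (not its full character classes).
_PE_REF = re.compile(r"%([A-Za-z_:][\w.:-]*);")


def expand_literal(literal: str, pe_texts: dict) -> str:
    """Expand parameter-entity references inside a quoted literal entity value."""
    quote = literal[0]                # the opening ' or "
    end = literal.index(quote, 1)     # closing delimiter found in the raw text
    body = literal[1:end]
    # Replacement text is spliced in afterwards, so its quotes are only data.
    return _PE_REF.sub(lambda m: pe_texts[m.group(1)], body)
```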
4170
4171
4172      <h4><a name="notify"></a>4.4.6 Notify
4173      </h4>
4174
4175      <p>When the name of an <a href="#dt-unparsed">unparsed
4176            entity
4177         </a> appears as a token in the
4178         value of an attribute of declared type ENTITY or ENTITIES,
4179         a validating processor must inform the
4180         application of the <a href="#dt-sysid">system</a>
4181         and <a href="#dt-pubid">public</a> (if any)
4182         identifiers for both the entity and its associated
4183         <a href="#dt-notation">notation</a>.
4184      </p>
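<p>Expat, available through Python's <code>xml.parsers.expat</code>, is not a validating processor, but it reports the declarations an application needs in order to perform this notification itself. The sketch below (the document and the helper name <code>unparsed_entity_report</code> are hypothetical) records which attributes are declared ENTITY or ENTITIES, the identifiers of unparsed entities and of notations, and then, for each such attribute token in a start-tag, reports the identifiers of the entity and of its notation.</p>

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "gifviewer">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ATTLIST doc pic ENTITY #IMPLIED>
]>
<doc pic="logo"/>"""


def unparsed_entity_report(doc: str) -> list:
    entity_attrs = set()  # (element, attribute) pairs of type ENTITY/ENTITIES
    entities = {}         # unparsed entity name -> (system id, public id, notation)
    notations = {}        # notation name -> (system id, public id)
    report = []
    parser = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, att_type, default, required):
        if att_type in ("ENTITY", "ENTITIES"):
            entity_attrs.add((elname, attname))

    def on_entity(name, is_pe, value, base, system_id, public_id, notation):
        if notation is not None:  # only unparsed (NDATA) entities
            entities[name] = (system_id, public_id, notation)

    def on_notation(name, base, system_id, public_id):
        notations[name] = (system_id, public_id)

    def on_start(tag, attrs):
        for attname, value in attrs.items():
            if (tag, attname) not in entity_attrs:
                continue
            for token in value.split():
                if token in entities:
                    sys_id, pub_id, notation = entities[token]
                    report.append((token, sys_id, pub_id, notation)
                                  + notations.get(notation, (None, None)))

    parser.AttlistDeclHandler = on_attlist
    parser.EntityDeclHandler = on_entity
    parser.NotationDeclHandler = on_notation
    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return report
```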
4185
4186
4187
4188      <h4><a name="bypass"></a>4.4.7 Bypassed
4189      </h4>
4190
4191      <p>When a general entity reference appears in the
4192         <a href="#NT-EntityValue">EntityValue</a> in an entity declaration,
4193         it is bypassed and left as is.
4194      </p>
4195
4196
4197
4198      <h4><a name="as-PE"></a>4.4.8 Included as PE
4199      </h4>
4200
4201      <p>Just as with external parsed entities, parameter entities
4202         need only be <a href="#include-if-valid">included if
4203            validating
4204         </a>.
4205         When a parameter-entity reference is recognized in the DTD
4206         and included, its
4207         <a href="#dt-repltext">replacement
4208            text
4209         </a> is enlarged by the attachment of one leading and one following
4210         space (#x20) character; the intent is to constrain the replacement
4211         text of parameter
4212         entities to contain an integral number of grammatical tokens in the DTD.
4213
4214      </p>
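<p>The effect of the padding can be seen in a small Python sketch. The function name and the <code>pe_texts</code> mapping are hypothetical, and the Name pattern is simplified. Two adjacent references <code>%a;%b;</code> whose replacement texts are <code>x</code> and <code>y</code> yield two separate tokens rather than the single token <code>xy</code>.</p>

```python
import re

_PE_REF = re.compile(r"%([A-Za-z_:][\w.:-]*);")  # simplified Name pattern


def include_pes_in_dtd(dtd_text: str, pe_texts: dict) -> str:
    """Replace PE references in DTD text, adding one #x20 before and after each."""
    return _PE_REF.sub(lambda m: " " + pe_texts[m.group(1)] + " ", dtd_text)
```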
4215
4216
4217
4218
4219
4220      <h3><a name="intern-replacement"></a>4.5 Construction of Internal Entity Replacement Text
4221      </h3>
4222
4223      <p>In discussing the treatment
4224         of internal entities, it is
4225         useful to distinguish two forms of the entity's value.
4226         <a name="dt-litentval"></a>The <b>literal
4227            entity value
4228         </b> is the quoted string actually
4229         present in the entity declaration, corresponding to the
4230         non-terminal <a href="#NT-EntityValue">EntityValue</a>.
4231         <a name="dt-repltext"></a>The <b>replacement
4232            text
4233         </b> is the content of the entity, after
4234         replacement of character references and parameter-entity
4235         references.
4236
4237      </p>
4238
4239
4240      <p>The literal entity value
4241         as given in an internal entity declaration
4242         (<a href="#NT-EntityValue">EntityValue</a>) may contain character,
4243         parameter-entity, and general-entity references.
4244         Such references must be contained entirely within the
4245         literal entity value.
4246         The actual replacement text that is
4247         <a href="#dt-include">included</a> as described above
4248         must contain the <i>replacement text</i> of any
4249         parameter entities referred to, and must contain the character
4250         referred to, in place of any character references in the
4251         literal entity value; however,
4252         general-entity references must be left as-is, unexpanded.
4253         For example, given the following declarations:
4254
4255         <pre>&lt;!ENTITY % pub    "&amp;#xc9;ditions Gallimard" >
4256&lt;!ENTITY   rights "All rights reserved" >
4257&lt;!ENTITY   book   "La Peste: Albert Camus,
4258&amp;#xA9; 1947 %pub;. &amp;rights;" ></pre>
4259         then the replacement text for the entity "<code>book</code>" is:
4260         <pre>La Peste: Albert Camus,
4261&copy; 1947 &Eacute;ditions Gallimard. &amp;rights;</pre>
4262         The general-entity reference "<code>&amp;rights;</code>" would be expanded
4263         should the reference "<code>&amp;book;</code>" appear in the document's
4264         content or an attribute value.
4265      </p>
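<p>The rules of this section can be sketched in a few lines of Python, assuming (hypothetically) that the replacement text of each referenced parameter entity is already known and that names follow a simplified pattern. Character references and parameter-entity references are replaced. The pattern never matches a general-entity reference such as <code>&amp;rights;</code>, so such references are left as-is. Applied to the declarations above, it yields the replacement text shown for <code>book</code>.</p>

```python
import re

# Hex character reference, decimal character reference, or PE reference.
_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|%([A-Za-z_:][\w.:-]*);")


def replacement_text(literal_value: str, pe_texts: dict) -> str:
    """Build replacement text from a literal entity value (quotes removed)."""
    def expand(m):
        if m.group(1):
            return chr(int(m.group(1), 16))
        if m.group(2):
            return chr(int(m.group(2)))
        return pe_texts[m.group(3)]
    return _REF.sub(expand, literal_value)


pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text("La Peste: Albert Camus,\n&#xA9; 1947 %pub;. &rights;",
                        {"pub": pub})
print(book)
```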
4266
4267      <p>These simple rules may have complex interactions; for a detailed
4268         discussion of a difficult example, see
4269         <a href="#sec-entexpand">[<b>D Expansion of Entity and Character References</b>]
4270         </a>.
4271
4272      </p>
4273
4274
4275
4276
4277      <h3><a name="sec-predefined-ent"></a>4.6 Predefined Entities
4278      </h3>
4279
4280      <p><a name="dt-escape"></a>Entity and character
4281         references can both be used to <b>escape</b> the left angle bracket,
4282         ampersand, and other delimiters.   A set of general entities
4283         (<code>amp</code>,
4284         <code>lt</code>,
4285         <code>gt</code>,
4286         <code>apos</code>,
4287         <code>quot</code>) is specified for this purpose.
4288         Numeric character references may also be used; they are
4289         expanded immediately when recognized and must be treated as
4290         character data, so the numeric character references
4291         "<code>&amp;#60;</code>" and "<code>&amp;#38;</code>" may be used to
4292         escape <code>&lt;</code> and <code>&amp;</code> when they occur
4293         in character data.
4294      </p>
4295
4296      <p>All XML processors must recognize these entities whether they
4297         are declared or not.
4298         <a href="#dt-interop">For interoperability</a>,
4299         valid XML documents should declare these
4300         entities, like any others, before using them.
4301         If the entities in question are declared, they must be declared
4302         as internal entities whose replacement text is the single
4303         character being escaped or a character reference to
4304         that character, as shown below.
4305         <pre>&lt;!ENTITY lt     "&amp;#38;#60;">
4306&lt;!ENTITY gt     "&amp;#62;">
4307&lt;!ENTITY amp    "&amp;#38;#38;">
4308&lt;!ENTITY apos   "&amp;#39;">
4309&lt;!ENTITY quot   "&amp;#34;">
4310</pre>
4311         Note that the <code>&lt;</code> and <code>&amp;</code> characters
4312         in the declarations of "<code>lt</code>" and "<code>amp</code>"
4313         are doubly escaped to meet the requirement that entity replacement
4314         be well-formed.
4315
4316      </p>
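<p>Why <code>lt</code> and <code>amp</code> need two levels of escaping can be traced with a two-stage Python sketch (the helper name <code>expand_char_refs</code> is hypothetical). Character references in the literal are expanded once when the declaration is read, and the resulting replacement text is parsed again when the entity is referenced. For <code>lt</code> the first stage leaves the character reference <code>&amp;#60;</code>, which the second stage turns into a data character. A singly escaped <code>&amp;#60;</code> would instead leave a bare <code>&lt;</code> in the replacement text, which is not well-formed.</p>

```python
import re

_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def expand_char_refs(text: str) -> str:
    """Expand decimal and hexadecimal character references in one pass."""
    def one(m):
        ref = m.group(1)
        return chr(int(ref[1:], 16) if ref.startswith("x") else int(ref))
    return _CHAR_REF.sub(one, text)


# Stage 1: the literal in <!ENTITY lt "&#38;#60;"> becomes the replacement text.
lt_replacement = expand_char_refs("&#38;#60;")    # "&#60;"
# Stage 2: when &lt; is referenced, that text is parsed again.
lt_in_content = expand_char_refs(lt_replacement)  # "<", as character data

# For gt a single escape suffices: ">" is already well-formed replacement text.
gt_replacement = expand_char_refs("&#62;")        # ">"
```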
4317
4318
4319
4320
4321      <h3><a name="Notations"></a>4.7 Notation Declarations
4322      </h3>
4323
4324
4325      <p><a name="dt-notation"></a><b>Notations</b> identify by
4326         name the format of <a href="#dt-extent">unparsed
4327            entities
4328         </a>, the
4329         format of elements which bear a notation attribute,
4330         or the application to which
4331         a <a href="#dt-pi">processing instruction</a> is
4332         addressed.
4333      </p>
4334
4335      <p><a name="dt-notdecl"></a>
4336         <b>Notation declarations</b>
4337         provide a name for the notation, for use in
4338         entity and attribute-list declarations and in attribute specifications,
4339         and an external identifier for the notation which may allow an XML
4340         processor or its client application to locate a helper application
4341         capable of processing data in the given notation.
4342
4343         <h5>Notation Declarations</h5>
4344         <table class="scrap">
4345            <tbody>
4346               <tr valign="baseline">
4347                  <td><a name="NT-NotationDecl"></a>[82]&nbsp;&nbsp;&nbsp;
4348                  </td>
4349                  <td>NotationDecl</td>
4350                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
4351                  <td>'&lt;!NOTATION' <a href="#NT-S">S</a> <a href="#NT-Name">Name</a>
4352                     <a href="#NT-S">S</a>
4353                     (<a href="#NT-ExternalID">ExternalID</a> |
4354                     <a href="#NT-PublicID">PublicID</a>)
4355                     <a href="#NT-S">S</a>? '>'
4356                  </td>
4357                  <td></td>
4358               </tr>
4359               <tr valign="baseline">
4360                  <td><a name="NT-PublicID"></a>[83]&nbsp;&nbsp;&nbsp;
4361                  </td>
4362                  <td>PublicID</td>
4363                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
4364                  <td>'PUBLIC' <a href="#NT-S">S</a>
4365                     <a href="#NT-PubidLiteral">PubidLiteral</a>
4366
4367                  </td>
4368                  <td></td>
4369               </tr>
4370            </tbody>
4371         </table>
4372
4373      </p>
4374
4375      <p>XML processors must provide applications with the name and external
4376         identifier(s) of any notation declared and referred to in an attribute
4377         value, attribute definition, or entity declaration.  They may
4378         additionally resolve the external identifier into the
4379         <a href="#dt-sysid">system identifier</a>,
4380         file name, or other information needed to allow the
4381         application to call a processor for data in the notation described.  (It
4382         is not an error, however, for XML documents to declare and refer to
4383         notations for which notation-specific applications are not available on
4384         the system where the XML processor or application is running.)
4385      </p>
4386
4387
4388
4389
4390
4391      <h3><a name="sec-doc-entity"></a>4.8 Document Entity
4392      </h3>
4393
4394
4395      <p><a name="dt-docent"></a>The <b>document
4396            entity
4397         </b> serves as the root of the entity
4398         tree and a starting-point for an <a href="#dt-xml-proc">XML
4399            processor
4400         </a>.
4401         This specification does
4402         not specify how the document entity is to be located by an XML
4403         processor; unlike other entities, the document entity has no name and might
4404         well appear on a processor input stream
4405         without any identification at all.
4406      </p>
4407
4408
4409
4410
4411
4412
4413
4414
4415      <h2><a name="sec-conformance"></a>5 Conformance
4416      </h2>
4417
4418
4419
4420      <h3><a name="proc-types"></a>5.1 Validating and Non-Validating Processors
4421      </h3>
4422
4423      <p>Conforming <a href="#dt-xml-proc">XML processors</a> fall into two
4424         classes: validating and non-validating.
4425      </p>
4426
4427      <p>Validating and non-validating processors alike must report
4428         violations of this specification's well-formedness constraints
4429         in the content of the
4430         <a href="#dt-docent">document entity</a> and any
4431         other <a href="#dt-parsedent">parsed entities</a> that
4432         they read.
4433      </p>
4434
4435      <p><a name="dt-validating"></a>
4436         <b>Validating processors</b> must report
4437         violations of the constraints expressed by the declarations in the
4438         <a href="#dt-doctype">DTD</a>, and
4439         failures to fulfill the validity constraints given
4440         in this specification.
4441
4442         To accomplish this, validating XML processors must read and process the entire
4443         DTD and all external parsed entities referenced in the document.
4444
4445      </p>
4446
4447      <p>Non-validating processors are required to check only the
4448         <a href="#dt-docent">document entity</a>, including
4449         the entire internal DTD subset, for well-formedness.
4450         <a name="dt-use-mdecl"></a>
4451         While they are not required to check the document for validity,
4452         they are required to
4453         <b>process</b> all the declarations they read in the
4454         internal DTD subset and in any parameter entity that they
4455         read, up to the first reference
4456         to a parameter entity that they do <i>not</i> read; that is to
4457         say, they must
4458         use the information in those declarations to
4459         <a href="#AVNormalize">normalize</a> attribute values,
4460         <a href="#included">include</a> the replacement text of
4461         internal entities, and supply
4462         <a href="#sec-attr-defaults">default attribute values</a>.
4463
4464         They must not <a href="#dt-use-mdecl">process</a>
4465         <a href="#dt-entdecl">entity declarations</a> or
4466         <a href="#dt-attdecl">attribute-list declarations</a>
4467         encountered after a reference to a parameter entity that is not
4468         read, since the entity may have contained overriding declarations.
4469
4470      </p>
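<p>The rule about unread parameter entities amounts to a simple state change while scanning the DTD. In the Python sketch below, the DTD is modeled, hypothetically, as a sequence of <code>(kind, name)</code> events, where the kind <code>"PE-unread"</code> marks a reference to a parameter entity that the processor chose not to read. Once such an event is seen, later entity and attribute-list declarations are no longer processed.</p>

```python
def declarations_to_process(events):
    """Return the entity and attribute-list declarations a non-validating
    processor must still process, given (kind, name) events in document order."""
    processed = []
    unread_pe_seen = False
    for kind, name in events:
        if kind == "PE-unread":
            # Declarations inside the unread entity would take precedence
            # over any later ones, so later ones cannot be trusted.
            unread_pe_seen = True
        elif kind in ("ENTITY", "ATTLIST") and not unread_pe_seen:
            processed.append((kind, name))
    return processed
```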
4471
4472
4473
4474      <h3><a name="safe-behavior"></a>5.2 Using XML Processors
4475      </h3>
4476
4477      <p>The behavior of a validating XML processor is highly predictable; it
4478         must read every piece of a document and report all well-formedness and
4479         validity violations.
4480         Less is required of a non-validating processor; it need not read any
4481         part of the document other than the document entity.
4482         This has two effects that may be important to users of XML processors:
4483
4484         <ul>
4485
4486            <li>
4487               <p>Certain well-formedness errors, specifically those that require
4488                  reading external entities, may not be detected by a non-validating processor.
4489                  Examples include the constraints entitled
4490                  <a href="#wf-entdeclared">Entity Declared</a>,
4491                  <a href="#wf-textent">Parsed Entity</a>, and
4492                  <a href="#wf-norecursion">No Recursion</a>, as well
4493                  as some of the cases described as
4494                  <a href="#forbidden">forbidden</a> in
4495                  <a href="#entproc">[<b>4.4 XML Processor Treatment of Entities and References</b>]
4496                  </a>.
4497               </p>
4498            </li>
4499
4500            <li>
4501               <p>The information passed from the processor to the application may
4502                  vary, depending on whether the processor reads
4503                  parameter and external entities.
4504                  For example, a non-validating processor may not
4505                  <a href="#AVNormalize">normalize</a> attribute values,
4506                  <a href="#included">include</a> the replacement text of
4507                  internal entities, or supply
4508                  <a href="#sec-attr-defaults">default attribute values</a>,
4509                  where doing so depends on having read declarations in
4510                  external or parameter entities.
4511               </p>
4512            </li>
4513
4514         </ul>
4515
4516      </p>
4517
4518      <p>For maximum reliability in interoperating between different XML
4519         processors, applications which use non-validating processors should not
4520         rely on any behaviors not required of such processors.
4521         Applications which require facilities such as the use of default
4522         attributes or internal entities which are declared in external
4523         entities should use validating XML processors.
4524      </p>
4525
4526
4527
4528
4529
4530      <h2><a name="sec-notation"></a>6 Notation
4531      </h2>
4532
4533
4534      <p>The formal grammar of XML is given in this specification using a simple
4535         Extended Backus-Naur Form (EBNF) notation.  Each rule in the grammar defines
4536         one symbol, in the form
4537         <pre>symbol ::= expression</pre></p>
4538
4539      <p>Symbols are written with an initial capital letter if they are
4540         defined by a regular expression, or with an initial lower case letter
4541         otherwise.
4542         Literal strings are quoted.
4543
4544
4545      </p>
4546
4547
      <p>Within the expression on the right-hand side of a rule, the following
         expressions are used to match strings of one or more characters:

         <dl>

            <dt><b><code>#xN</code></b></dt>

            <dd>
               <p>where <code>N</code> is a hexadecimal integer, the
                  expression matches the character in ISO/IEC 10646 whose canonical
                  (UCS-4) code value, when interpreted as an unsigned binary number,
                  has the value indicated.  The number of leading zeros in the
                  <code>#xN</code> form is insignificant; the number of leading
                  zeros in the corresponding code value is governed by the character
                  encoding in use and is not significant for XML.
               </p>
            </dd>

            <dt><b><code>[a-zA-Z]</code>, <code>[#xN-#xN]</code></b></dt>

            <dd>
               <p>matches any <a href="#dt-character">character</a>
                  with a value in the range(s) indicated (inclusive).
               </p>
            </dd>

            <dt><b><code>[^a-z]</code>, <code>[^#xN-#xN]</code></b></dt>

            <dd>
               <p>matches any <a href="#dt-character">character</a>
                  with a value <i>outside</i> the
                  range indicated.
               </p>
            </dd>

            <dt><b><code>[^abc]</code>, <code>[^#xN#xN#xN]</code></b></dt>

            <dd>
               <p>matches any <a href="#dt-character">character</a>
                  with a value not among the characters given.
               </p>
            </dd>

            <dt><b><code>"string"</code></b></dt>

            <dd>
               <p>matches a literal string <a href="#dt-match">matching</a>
                  that given inside the double quotes.
               </p>
            </dd>

            <dt><b><code>'string'</code></b></dt>

            <dd>
               <p>matches a literal string <a href="#dt-match">matching</a>
                  that given inside the single quotes.
               </p>
            </dd>

         </dl>
         These symbols may be combined to match more complex patterns as follows,
         where <code>A</code> and <code>B</code> represent simple expressions:

         <dl>

            <dt><b>(<code>expression</code>)
               </b>
            </dt>

            <dd>
               <p><code>expression</code> is treated as a unit
                  and may be combined as described in this list.
               </p>
            </dd>

            <dt><b><code>A?</code></b></dt>

            <dd>
               <p>matches <code>A</code> or nothing; optional <code>A</code>.
               </p>
            </dd>

            <dt><b><code>A B</code></b></dt>

            <dd>
               <p>matches <code>A</code> followed by <code>B</code>.
               </p>
            </dd>

            <dt><b><code>A | B</code></b></dt>

            <dd>
               <p>matches <code>A</code> or <code>B</code> but not both.
               </p>
            </dd>

            <dt><b><code>A - B</code></b></dt>

            <dd>
               <p>matches any string that matches <code>A</code> but does not match
                  <code>B</code>.
               </p>
            </dd>

            <dt><b><code>A+</code></b></dt>

            <dd>
               <p>matches one or more occurrences of <code>A</code>.
               </p>
            </dd>

            <dt><b><code>A*</code></b></dt>

            <dd>
               <p>matches zero or more occurrences of <code>A</code>.
               </p>
            </dd>

         </dl>
         Other notations used in the productions are:

         <dl>

            <dt><b><code>/* ... */</code></b></dt>

            <dd>
               <p>comment.</p>
            </dd>

            <dt><b><code>[ wfc: ... ]</code></b></dt>

            <dd>
               <p>well-formedness constraint; this identifies by name a
                  constraint on
                  <a href="#dt-wellformed">well-formed</a> documents
                  associated with a production.
               </p>
            </dd>

            <dt><b><code>[ vc: ... ]</code></b></dt>

            <dd>
               <p>validity constraint; this identifies by name a constraint on
                  <a href="#dt-valid">valid</a> documents associated with
                  a production.
               </p>
            </dd>

         </dl>

      </p>
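      <p>As an illustration only, the character-matching terms listed above can be mapped onto Python regular expressions. The helper names <code>term_to_regex</code> and <code>matches</code> are hypothetical and are not defined by this specification; the sketch assumes each term is passed as its own string and handles only <code>#xN</code>, bracketed ranges and negations, and quoted literals.</p>

```python
import re

# Hypothetical helpers: translate the character-matching terms of the
# notation (#xN, [...], [^...], "string", 'string') into Python regex syntax.
_HEX = re.compile(r"#x([0-9A-Fa-f]+)")


def _hex_escape(match):
    # #xN names a code point; leading zeros in N are insignificant.
    return "\\U%08X" % int(match.group(1), 16)


def term_to_regex(term):
    """Translate one character-matching term into a regex string."""
    term = term.strip()
    if term[0] in "\"'":
        # "string" or 'string': match the quoted text literally.
        return re.escape(term[1:-1])
    # #xN, [#xN-#xN], [^#xN#xN], [a-z], [^abc]: the bracket syntax is already
    # regex syntax, so only the #xN forms need rewriting as \U escapes.
    return _HEX.sub(_hex_escape, term)


def matches(term, text):
    """True if the whole of text matches the term."""
    return re.fullmatch(term_to_regex(term), text) is not None


print(matches("[#x0041-#x005A]", "Q"))    # True
print(matches("[^#x0041-#x005A]", "Q"))   # False
print(matches("#x0041", "A"))             # True
print(matches("'xml'", "XML"))            # False
```

      <p>Bracket expressions are passed through unchanged because their meaning here coincides with Python's character classes; a fuller translator would also need to escape characters that are special only in Python's syntax.</p>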
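      <p>The combining forms in the second list map naturally onto regular-expression operators. In this hypothetical Python sketch each helper wraps its operands in a non-capturing group so that precedence is preserved; because regular expressions have no difference operator, <code>A - B</code> is checked by matching the two patterns separately. The demonstration uses the S, Eq, VersionNum and PITarget productions from the body of this specification, with an ASCII-only stand-in for Name (a simplifying assumption, not the real Name production).</p>

```python
import re

# Hypothetical helpers: each combining form of the notation returns a regex
# string. Operands are wrapped in non-capturing groups to keep precedence.

def group(a):          # ( expression )
    return "(?:%s)" % a

def optional(a):       # A?
    return group(a) + "?"

def seq(a, b):         # A B
    return group(a) + group(b)

def alt(a, b):         # A | B
    return "(?:%s|%s)" % (a, b)

def one_or_more(a):    # A+
    return group(a) + "+"

def zero_or_more(a):   # A*
    return group(a) + "*"

def matches(pattern, text):
    return re.fullmatch(pattern, text) is not None

def matches_except(a, b, text):  # A - B
    # Regexes have no difference operator, so test A and B separately.
    return matches(a, text) and not matches(b, text)

S = one_or_more(r"[\x20\t\r\n]")               # S ::= (#x20 | #x9 | #xD | #xA)+
EQ = seq(seq(optional(S), "="), optional(S))   # Eq ::= S? '=' S?
VERSION_NUM = one_or_more(alt("[a-zA-Z0-9_.:]", "-"))
# ASCII-only stand-in for Name (an assumption; real Names allow far more).
NAME = seq("[a-zA-Z_:]", zero_or_more("[a-zA-Z0-9._:-]"))
XML = seq(seq(alt("X", "x"), alt("M", "m")), alt("L", "l"))

print(matches(EQ, " = "), matches(EQ, "=="))       # True False
print(matches(VERSION_NUM, "1.0"))                  # True
# PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
print(matches_except(NAME, XML, "xml-stylesheet"))  # True
print(matches_except(NAME, XML, "XmL"))             # False
```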
      <hr title="Separator from footer">

      <h2><a name="sec-bibliography"></a>A References
      </h2>

      <h3><a name="sec-existing-stds"></a>A.1 Normative References
      </h3>

      <dl>

         <dt><a name="IANA">IANA</a></dt>
         <dd>
            IANA (Internet Assigned Numbers Authority). <i>Official Names for
               Character Sets
            </i>,
            ed. Keld Simonsen et al.
            See <a href="ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets">ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets</a>.
         </dd>

         <dt><a name="RFC1766">IETF RFC 1766</a></dt>
         <dd>
            IETF (Internet Engineering Task Force).
            <i>RFC 1766:  Tags for the Identification of Languages</i>,
            ed. H. Alvestrand.
            1995.
         </dd>

         <dt><a name="ISO639">ISO 639</a></dt>
         <dd>ISO
            (International Organization for Standardization).
            <i>ISO 639:1988 (E).
               Code for the representation of names of languages.
            </i>
            [Geneva]:  International Organization for
            Standardization, 1988.
         </dd>

         <dt><a name="ISO3166">ISO 3166</a></dt>
         <dd>ISO
            (International Organization for Standardization).
            <i>ISO 3166-1:1997 (E).
               Codes for the representation of names of countries and their subdivisions
               -- Part 1: Country codes.
            </i>
            [Geneva]:  International Organization for
            Standardization, 1997.
         </dd>

         <dt><a name="ISO10646">ISO/IEC 10646</a></dt>
         <dd>ISO
            (International Organization for Standardization).
            <i>ISO/IEC 10646-1993 (E).  Information technology -- Universal
               Multiple-Octet Coded Character Set (UCS) -- Part 1:
               Architecture and Basic Multilingual Plane.
            </i>
            [Geneva]:  International Organization for
            Standardization, 1993 (plus amendments AM 1 through AM 7).
         </dd>

         <dt><a name="Unicode">Unicode</a></dt>
         <dd>The Unicode Consortium.
            <i>The Unicode Standard, Version 2.0.</i>
            Reading, Mass.:  Addison-Wesley Developers Press, 1996.
         </dd>

      </dl>

      <h3><a name="section-Other-References"></a>A.2 Other References
      </h3>

      <dl>

         <dt><a name="Aho">Aho/Ullman</a></dt>
         <dd>Aho, Alfred V.,
            Ravi Sethi, and Jeffrey D. Ullman.
            <i>Compilers:  Principles, Techniques, and Tools</i>.
            Reading:  Addison-Wesley, 1986, rpt. corr. 1988.
         </dd>

         <dt><a name="Berners-Lee">Berners-Lee et al.</a></dt>
         <dd>
            Berners-Lee, T., R. Fielding, and L. Masinter.
            <i>Uniform Resource Identifiers (URI):  Generic Syntax and
               Semantics
            </i>.
            1997.
            (Work in progress; see updates to RFC1738.)
         </dd>

         <dt><a name="ABK">Br&uuml;ggemann-Klein</a></dt>
         <dd>Br&uuml;ggemann-Klein, Anne.
            <i>Regular Expressions into Finite Automata</i>.
            Extended abstract in I. Simon, ed., LATIN 1992,
            pp. 97-98. Springer-Verlag, Berlin 1992.
            Full version in Theoretical Computer Science 120: 197-213, 1993.
         </dd>

         <dt><a name="ABKDW">Br&uuml;ggemann-Klein and Wood</a></dt>
         <dd>Br&uuml;ggemann-Klein, Anne,
            and Derick Wood.
            <i>Deterministic Regular Languages</i>.
            Universit&auml;t Freiburg, Institut f&uuml;r Informatik,
            Report 38, October 1991.
         </dd>

         <dt><a name="Clark">Clark</a></dt>
         <dd>James Clark.
            Comparison of SGML and XML. See
            <a href="http://www.w3.org/TR/NOTE-sgml-xml-971215">http://www.w3.org/TR/NOTE-sgml-xml-971215</a>.
         </dd>

         <dt><a name="RFC1738">IETF RFC1738</a></dt>
         <dd>
            IETF (Internet Engineering Task Force).
            <i>RFC 1738:  Uniform Resource Locators (URL)</i>,
            ed. T. Berners-Lee, L. Masinter, M. McCahill.
            1994.
         </dd>

         <dt><a name="RFC1808">IETF RFC1808</a></dt>
         <dd>
            IETF (Internet Engineering Task Force).
            <i>RFC 1808:  Relative Uniform Resource Locators</i>,
            ed. R. Fielding.
            1995.
         </dd>

         <dt><a name="RFC2141">IETF RFC2141</a></dt>
         <dd>
            IETF (Internet Engineering Task Force).
            <i>RFC 2141:  URN Syntax</i>,
            ed. R. Moats.
            1997.
         </dd>

         <dt><a name="ISO8879">ISO 8879</a></dt>
         <dd>ISO
            (International Organization for Standardization).
            <i>ISO 8879:1986(E).  Information processing -- Text and Office
               Systems -- Standard Generalized Markup Language (SGML).
            </i>  First
            edition -- 1986-10-15.  [Geneva]:  International Organization for
            Standardization, 1986.
         </dd>

         <dt><a name="ISO10744">ISO/IEC 10744</a></dt>
         <dd>ISO
            (International Organization for Standardization).
            <i>ISO/IEC 10744-1992 (E).  Information technology --
               Hypermedia/Time-based Structuring Language (HyTime).
            </i>
            [Geneva]:  International Organization for
            Standardization, 1992.
            <i>Extended Facilities Annexe.</i>
            [Geneva]:  International Organization for
            Standardization, 1996.
         </dd>

      </dl>

      <h2><a name="CharClasses"></a>B Character Classes
      </h2>

      <p>Following the characteristics defined in the Unicode standard,
         characters are classed as base characters (among others, these
         include the letters of the Latin alphabet without diacritics),
         ideographic characters, and combining characters (among others,
         this class contains most diacritics).  Base characters and
         ideographic characters together form the class of letters.
         Digits and extenders are also distinguished.

         <h5>Characters</h5>
         <table class="scrap">
            <tbody>

               <tr valign="baseline">
                  <td><a name="NT-Letter"></a>[84]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>Letter</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td><a href="#NT-BaseChar">BaseChar</a>
                     | <a href="#NT-Ideographic">Ideographic</a></td>
                  <td></td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-BaseChar"></a>[85]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>BaseChar</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>[#x0041-#x005A]
                     |&nbsp;[#x0061-#x007A]
                     |&nbsp;[#x00C0-#x00D6]
                     |&nbsp;[#x00D8-#x00F6]
                     |&nbsp;[#x00F8-#x00FF]
                     |&nbsp;[#x0100-#x0131]
                     |&nbsp;[#x0134-#x013E]
                     |&nbsp;[#x0141-#x0148]
                     |&nbsp;[#x014A-#x017E]
                     |&nbsp;[#x0180-#x01C3]
                     |&nbsp;[#x01CD-#x01F0]
                     |&nbsp;[#x01F4-#x01F5]
                     |&nbsp;[#x01FA-#x0217]
                     |&nbsp;[#x0250-#x02A8]
                     |&nbsp;[#x02BB-#x02C1]
                     |&nbsp;#x0386
                     |&nbsp;[#x0388-#x038A]
                     |&nbsp;#x038C
                     |&nbsp;[#x038E-#x03A1]
                     |&nbsp;[#x03A3-#x03CE]
                     |&nbsp;[#x03D0-#x03D6]
                     |&nbsp;#x03DA
                     |&nbsp;#x03DC
                     |&nbsp;#x03DE
                     |&nbsp;#x03E0
                     |&nbsp;[#x03E2-#x03F3]
                     |&nbsp;[#x0401-#x040C]
                     |&nbsp;[#x040E-#x044F]
                     |&nbsp;[#x0451-#x045C]
                     |&nbsp;[#x045E-#x0481]
                     |&nbsp;[#x0490-#x04C4]
                     |&nbsp;[#x04C7-#x04C8]
                     |&nbsp;[#x04CB-#x04CC]
                     |&nbsp;[#x04D0-#x04EB]
                     |&nbsp;[#x04EE-#x04F5]
                     |&nbsp;[#x04F8-#x04F9]
                     |&nbsp;[#x0531-#x0556]
                     |&nbsp;#x0559
                     |&nbsp;[#x0561-#x0586]
                     |&nbsp;[#x05D0-#x05EA]
                     |&nbsp;[#x05F0-#x05F2]
                     |&nbsp;[#x0621-#x063A]
                     |&nbsp;[#x0641-#x064A]
                     |&nbsp;[#x0671-#x06B7]
                     |&nbsp;[#x06BA-#x06BE]
                     |&nbsp;[#x06C0-#x06CE]
                     |&nbsp;[#x06D0-#x06D3]
                     |&nbsp;#x06D5
                     |&nbsp;[#x06E5-#x06E6]
                     |&nbsp;[#x0905-#x0939]
                     |&nbsp;#x093D
                     |&nbsp;[#x0958-#x0961]
                     |&nbsp;[#x0985-#x098C]
                     |&nbsp;[#x098F-#x0990]
                     |&nbsp;[#x0993-#x09A8]
                     |&nbsp;[#x09AA-#x09B0]
                     |&nbsp;#x09B2
                     |&nbsp;[#x09B6-#x09B9]
                     |&nbsp;[#x09DC-#x09DD]
                     |&nbsp;[#x09DF-#x09E1]
                     |&nbsp;[#x09F0-#x09F1]
                     |&nbsp;[#x0A05-#x0A0A]
                     |&nbsp;[#x0A0F-#x0A10]
                     |&nbsp;[#x0A13-#x0A28]
                     |&nbsp;[#x0A2A-#x0A30]
                     |&nbsp;[#x0A32-#x0A33]
                     |&nbsp;[#x0A35-#x0A36]
                     |&nbsp;[#x0A38-#x0A39]
                     |&nbsp;[#x0A59-#x0A5C]
                     |&nbsp;#x0A5E
                     |&nbsp;[#x0A72-#x0A74]
                     |&nbsp;[#x0A85-#x0A8B]
                     |&nbsp;#x0A8D
                     |&nbsp;[#x0A8F-#x0A91]
                     |&nbsp;[#x0A93-#x0AA8]
                     |&nbsp;[#x0AAA-#x0AB0]
                     |&nbsp;[#x0AB2-#x0AB3]
                     |&nbsp;[#x0AB5-#x0AB9]
                     |&nbsp;#x0ABD
                     |&nbsp;#x0AE0
                     |&nbsp;[#x0B05-#x0B0C]
                     |&nbsp;[#x0B0F-#x0B10]
                     |&nbsp;[#x0B13-#x0B28]
                     |&nbsp;[#x0B2A-#x0B30]
                     |&nbsp;[#x0B32-#x0B33]
                     |&nbsp;[#x0B36-#x0B39]
                     |&nbsp;#x0B3D
                     |&nbsp;[#x0B5C-#x0B5D]
                     |&nbsp;[#x0B5F-#x0B61]
                     |&nbsp;[#x0B85-#x0B8A]
                     |&nbsp;[#x0B8E-#x0B90]
                     |&nbsp;[#x0B92-#x0B95]
                     |&nbsp;[#x0B99-#x0B9A]
                     |&nbsp;#x0B9C
                     |&nbsp;[#x0B9E-#x0B9F]
                     |&nbsp;[#x0BA3-#x0BA4]
                     |&nbsp;[#x0BA8-#x0BAA]
                     |&nbsp;[#x0BAE-#x0BB5]
                     |&nbsp;[#x0BB7-#x0BB9]
                     |&nbsp;[#x0C05-#x0C0C]
                     |&nbsp;[#x0C0E-#x0C10]
                     |&nbsp;[#x0C12-#x0C28]
                     |&nbsp;[#x0C2A-#x0C33]
                     |&nbsp;[#x0C35-#x0C39]
                     |&nbsp;[#x0C60-#x0C61]
                     |&nbsp;[#x0C85-#x0C8C]
                     |&nbsp;[#x0C8E-#x0C90]
                     |&nbsp;[#x0C92-#x0CA8]
                     |&nbsp;[#x0CAA-#x0CB3]
                     |&nbsp;[#x0CB5-#x0CB9]
                     |&nbsp;#x0CDE
                     |&nbsp;[#x0CE0-#x0CE1]
                     |&nbsp;[#x0D05-#x0D0C]
                     |&nbsp;[#x0D0E-#x0D10]
                     |&nbsp;[#x0D12-#x0D28]
                     |&nbsp;[#x0D2A-#x0D39]
                     |&nbsp;[#x0D60-#x0D61]
                     |&nbsp;[#x0E01-#x0E2E]
                     |&nbsp;#x0E30
                     |&nbsp;[#x0E32-#x0E33]
                     |&nbsp;[#x0E40-#x0E45]
                     |&nbsp;[#x0E81-#x0E82]
                     |&nbsp;#x0E84
                     |&nbsp;[#x0E87-#x0E88]
                     |&nbsp;#x0E8A
                     |&nbsp;#x0E8D
                     |&nbsp;[#x0E94-#x0E97]
                     |&nbsp;[#x0E99-#x0E9F]
                     |&nbsp;[#x0EA1-#x0EA3]
                     |&nbsp;#x0EA5
                     |&nbsp;#x0EA7
                     |&nbsp;[#x0EAA-#x0EAB]
                     |&nbsp;[#x0EAD-#x0EAE]
                     |&nbsp;#x0EB0
                     |&nbsp;[#x0EB2-#x0EB3]
                     |&nbsp;#x0EBD
                     |&nbsp;[#x0EC0-#x0EC4]
                     |&nbsp;[#x0F40-#x0F47]
                     |&nbsp;[#x0F49-#x0F69]
                     |&nbsp;[#x10A0-#x10C5]
                     |&nbsp;[#x10D0-#x10F6]
                     |&nbsp;#x1100
                     |&nbsp;[#x1102-#x1103]
                     |&nbsp;[#x1105-#x1107]
                     |&nbsp;#x1109
                     |&nbsp;[#x110B-#x110C]
                     |&nbsp;[#x110E-#x1112]
                     |&nbsp;#x113C
                     |&nbsp;#x113E
                     |&nbsp;#x1140
                     |&nbsp;#x114C
                     |&nbsp;#x114E
                     |&nbsp;#x1150
                     |&nbsp;[#x1154-#x1155]
                     |&nbsp;#x1159
                     |&nbsp;[#x115F-#x1161]
                     |&nbsp;#x1163
                     |&nbsp;#x1165
                     |&nbsp;#x1167
                     |&nbsp;#x1169
                     |&nbsp;[#x116D-#x116E]
                     |&nbsp;[#x1172-#x1173]
                     |&nbsp;#x1175
                     |&nbsp;#x119E
                     |&nbsp;#x11A8
                     |&nbsp;#x11AB
                     |&nbsp;[#x11AE-#x11AF]
                     |&nbsp;[#x11B7-#x11B8]
                     |&nbsp;#x11BA
                     |&nbsp;[#x11BC-#x11C2]
                     |&nbsp;#x11EB
                     |&nbsp;#x11F0
                     |&nbsp;#x11F9
                     |&nbsp;[#x1E00-#x1E9B]
                     |&nbsp;[#x1EA0-#x1EF9]
                     |&nbsp;[#x1F00-#x1F15]
                     |&nbsp;[#x1F18-#x1F1D]
                     |&nbsp;[#x1F20-#x1F45]
                     |&nbsp;[#x1F48-#x1F4D]
                     |&nbsp;[#x1F50-#x1F57]
                     |&nbsp;#x1F59
                     |&nbsp;#x1F5B
                     |&nbsp;#x1F5D
                     |&nbsp;[#x1F5F-#x1F7D]
                     |&nbsp;[#x1F80-#x1FB4]
                     |&nbsp;[#x1FB6-#x1FBC]
                     |&nbsp;#x1FBE
                     |&nbsp;[#x1FC2-#x1FC4]
                     |&nbsp;[#x1FC6-#x1FCC]
                     |&nbsp;[#x1FD0-#x1FD3]
                     |&nbsp;[#x1FD6-#x1FDB]
                     |&nbsp;[#x1FE0-#x1FEC]
                     |&nbsp;[#x1FF2-#x1FF4]
                     |&nbsp;[#x1FF6-#x1FFC]
                     |&nbsp;#x2126
                     |&nbsp;[#x212A-#x212B]
                     |&nbsp;#x212E
                     |&nbsp;[#x2180-#x2182]
                     |&nbsp;[#x3041-#x3094]
                     |&nbsp;[#x30A1-#x30FA]
                     |&nbsp;[#x3105-#x312C]
                     |&nbsp;[#xAC00-#xD7A3]

                  </td>
                  <td></td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-Ideographic"></a>[86]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>Ideographic</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>[#x4E00-#x9FA5]
                     |&nbsp;#x3007
                     |&nbsp;[#x3021-#x3029]

                  </td>
                  <td></td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-CombiningChar"></a>[87]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>CombiningChar</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>[#x0300-#x0345]
                     |&nbsp;[#x0360-#x0361]
                     |&nbsp;[#x0483-#x0486]
                     |&nbsp;[#x0591-#x05A1]
                     |&nbsp;[#x05A3-#x05B9]
                     |&nbsp;[#x05BB-#x05BD]
                     |&nbsp;#x05BF
                     |&nbsp;[#x05C1-#x05C2]
                     |&nbsp;#x05C4
                     |&nbsp;[#x064B-#x0652]
                     |&nbsp;#x0670
                     |&nbsp;[#x06D6-#x06DC]
                     |&nbsp;[#x06DD-#x06DF]
                     |&nbsp;[#x06E0-#x06E4]
                     |&nbsp;[#x06E7-#x06E8]
                     |&nbsp;[#x06EA-#x06ED]
                     |&nbsp;[#x0901-#x0903]
                     |&nbsp;#x093C
                     |&nbsp;[#x093E-#x094C]
                     |&nbsp;#x094D
                     |&nbsp;[#x0951-#x0954]
                     |&nbsp;[#x0962-#x0963]
                     |&nbsp;[#x0981-#x0983]
                     |&nbsp;#x09BC
                     |&nbsp;#x09BE
                     |&nbsp;#x09BF
                     |&nbsp;[#x09C0-#x09C4]
                     |&nbsp;[#x09C7-#x09C8]
                     |&nbsp;[#x09CB-#x09CD]
                     |&nbsp;#x09D7
                     |&nbsp;[#x09E2-#x09E3]
                     |&nbsp;#x0A02
                     |&nbsp;#x0A3C
                     |&nbsp;#x0A3E
                     |&nbsp;#x0A3F
                     |&nbsp;[#x0A40-#x0A42]
                     |&nbsp;[#x0A47-#x0A48]
                     |&nbsp;[#x0A4B-#x0A4D]
                     |&nbsp;[#x0A70-#x0A71]
                     |&nbsp;[#x0A81-#x0A83]
                     |&nbsp;#x0ABC
                     |&nbsp;[#x0ABE-#x0AC5]
                     |&nbsp;[#x0AC7-#x0AC9]
                     |&nbsp;[#x0ACB-#x0ACD]
                     |&nbsp;[#x0B01-#x0B03]
                     |&nbsp;#x0B3C
                     |&nbsp;[#x0B3E-#x0B43]
                     |&nbsp;[#x0B47-#x0B48]
                     |&nbsp;[#x0B4B-#x0B4D]
                     |&nbsp;[#x0B56-#x0B57]
                     |&nbsp;[#x0B82-#x0B83]
                     |&nbsp;[#x0BBE-#x0BC2]
                     |&nbsp;[#x0BC6-#x0BC8]
                     |&nbsp;[#x0BCA-#x0BCD]
                     |&nbsp;#x0BD7
                     |&nbsp;[#x0C01-#x0C03]
                     |&nbsp;[#x0C3E-#x0C44]
                     |&nbsp;[#x0C46-#x0C48]
                     |&nbsp;[#x0C4A-#x0C4D]
                     |&nbsp;[#x0C55-#x0C56]
                     |&nbsp;[#x0C82-#x0C83]
                     |&nbsp;[#x0CBE-#x0CC4]
                     |&nbsp;[#x0CC6-#x0CC8]
                     |&nbsp;[#x0CCA-#x0CCD]
                     |&nbsp;[#x0CD5-#x0CD6]
                     |&nbsp;[#x0D02-#x0D03]
                     |&nbsp;[#x0D3E-#x0D43]
                     |&nbsp;[#x0D46-#x0D48]
                     |&nbsp;[#x0D4A-#x0D4D]
                     |&nbsp;#x0D57
                     |&nbsp;#x0E31
                     |&nbsp;[#x0E34-#x0E3A]
                     |&nbsp;[#x0E47-#x0E4E]
                     |&nbsp;#x0EB1
                     |&nbsp;[#x0EB4-#x0EB9]
                     |&nbsp;[#x0EBB-#x0EBC]
                     |&nbsp;[#x0EC8-#x0ECD]
                     |&nbsp;[#x0F18-#x0F19]
                     |&nbsp;#x0F35
                     |&nbsp;#x0F37
                     |&nbsp;#x0F39
                     |&nbsp;#x0F3E
                     |&nbsp;#x0F3F
                     |&nbsp;[#x0F71-#x0F84]
                     |&nbsp;[#x0F86-#x0F8B]
                     |&nbsp;[#x0F90-#x0F95]
                     |&nbsp;#x0F97
                     |&nbsp;[#x0F99-#x0FAD]
                     |&nbsp;[#x0FB1-#x0FB7]
                     |&nbsp;#x0FB9
                     |&nbsp;[#x20D0-#x20DC]
                     |&nbsp;#x20E1
                     |&nbsp;[#x302A-#x302F]
                     |&nbsp;#x3099
                     |&nbsp;#x309A

                  </td>
                  <td></td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-Digit"></a>[88]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>Digit</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>[#x0030-#x0039]
                     |&nbsp;[#x0660-#x0669]
                     |&nbsp;[#x06F0-#x06F9]
                     |&nbsp;[#x0966-#x096F]
                     |&nbsp;[#x09E6-#x09EF]
                     |&nbsp;[#x0A66-#x0A6F]
                     |&nbsp;[#x0AE6-#x0AEF]
                     |&nbsp;[#x0B66-#x0B6F]
                     |&nbsp;[#x0BE7-#x0BEF]
                     |&nbsp;[#x0C66-#x0C6F]
                     |&nbsp;[#x0CE6-#x0CEF]
                     |&nbsp;[#x0D66-#x0D6F]
                     |&nbsp;[#x0E50-#x0E59]
                     |&nbsp;[#x0ED0-#x0ED9]
                     |&nbsp;[#x0F20-#x0F29]

                  </td>
                  <td></td>
               </tr>

               <tr valign="baseline">
                  <td><a name="NT-Extender"></a>[89]&nbsp;&nbsp;&nbsp;
                  </td>
                  <td>Extender</td>
                  <td>&nbsp;&nbsp;&nbsp;::=&nbsp;&nbsp;&nbsp;</td>
                  <td>#x00B7
                     |&nbsp;#x02D0
                     |&nbsp;#x02D1
                     |&nbsp;#x0387
                     |&nbsp;#x0640
                     |&nbsp;#x0E46
                     |&nbsp;#x0EC6
                     |&nbsp;#x3005
                     |&nbsp;[#x3031-#x3035]
                     |&nbsp;[#x309D-#x309E]
                     |&nbsp;[#x30FC-#x30FE]

                  </td>
                  <td></td>
               </tr>

            </tbody>
         </table>

      </p>
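      <p>Each production in the table is a union of non-overlapping code point ranges, so membership can be tested with a binary search over the range start points. The following Python sketch copies productions [86], [88] and [89] from the table, writing a single <code>#xN</code> as the one-element range <code>(N, N)</code>; <code>make_class</code> and the <code>is_*</code> names are illustrative and not part of XML. Because [86] does not list its ranges in code point order, <code>make_class</code> sorts them first.</p>

```python
from bisect import bisect_right

# Productions [86], [88] and [89] copied from the table above as
# (low, high) code point ranges; a single #xN is written as (N, N).
IDEOGRAPHIC = [(0x4E00, 0x9FA5), (0x3007, 0x3007), (0x3021, 0x3029)]
DIGIT = [(0x0030, 0x0039), (0x0660, 0x0669), (0x06F0, 0x06F9),
         (0x0966, 0x096F), (0x09E6, 0x09EF), (0x0A66, 0x0A6F),
         (0x0AE6, 0x0AEF), (0x0B66, 0x0B6F), (0x0BE7, 0x0BEF),
         (0x0C66, 0x0C6F), (0x0CE6, 0x0CEF), (0x0D66, 0x0D6F),
         (0x0E50, 0x0E59), (0x0ED0, 0x0ED9), (0x0F20, 0x0F29)]
EXTENDER = [(0x00B7, 0x00B7), (0x02D0, 0x02D0), (0x02D1, 0x02D1),
            (0x0387, 0x0387), (0x0640, 0x0640), (0x0E46, 0x0E46),
            (0x0EC6, 0x0EC6), (0x3005, 0x3005), (0x3031, 0x3035),
            (0x309D, 0x309E), (0x30FC, 0x30FE)]


def make_class(ranges):
    """Return a membership test for a union of non-overlapping ranges."""
    ranges = sorted(ranges)            # [86] is not listed in code point order
    lows = [low for low, _ in ranges]

    def contains(ch):
        cp = ord(ch)
        # Index of the last range whose low bound is not above cp.
        i = bisect_right(lows, cp) - 1
        return i >= 0 and cp <= ranges[i][1]

    return contains


is_ideographic = make_class(IDEOGRAPHIC)
is_digit = make_class(DIGIT)
is_extender = make_class(EXTENDER)

print(is_digit("7"), is_digit("\u0665"), is_digit("A"))    # True True False
print(is_ideographic("\u3007"), is_ideographic("\u3008"))  # True False
```

      <p>The same function would serve for BaseChar and CombiningChar; a test for Letter would then be the logical OR of the BaseChar and Ideographic tests, following production [84].</p>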

      <p>The character classes defined here can be derived from the
         Unicode character database as follows:

         <ul>

            <li>
               <p>Name-start characters must have one of the categories Ll, Lu,
                  Lo, Lt, or Nl.
               </p>
            </li>

            <li>
               <p>Name characters other than name-start characters
                  must have one of the categories Mc, Me, Mn, Lm, or Nd.
               </p>
            </li>

            <li>
               <p>Characters in the compatibility area (i.e., with a character code
                  greater than #xF900 and less than #xFFFE) are not allowed in XML
                  names.
               </p>
            </li>

            <li>
               <p>Characters that have a font or compatibility decomposition (i.e., those
                  with a "compatibility formatting tag" in field 5 of the database,
                  marked by field 5 beginning with a "&lt;") are not allowed.
               </p>
            </li>

            <li>
               <p>The following characters are treated as name-start characters
                  rather than name characters, because the property file classifies
                  them as Alphabetic:  [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.
               </p>
            </li>

            <li>
               <p>Characters #x20DD-#x20E0 are excluded (in accordance with
                  Unicode, section 5.14).
               </p>
            </li>

            <li>
               <p>Character #x00B7 is classified as an extender, because the
                  property list so identifies it.
               </p>
            </li>

            <li>
               <p>Character #x0387 is added as a name character, because #x00B7
                  is its canonical equivalent.
               </p>
            </li>

            <li>
               <p>Characters ':' and '_' are allowed as name-start characters.</p>
            </li>

            <li>
               <p>Characters '-' and '.' are allowed as name characters.</p>
            </li>

         </ul>

      </p>
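      <p>The derivation rules can be approximated with Python's standard <code>unicodedata</code> module. This is a rough sketch, not a replacement for the tables: <code>unicodedata</code> reflects a much later Unicode version than the Unicode 2.0 data the tables were built from, so its answers may differ for characters added or reclassified since. The function names are hypothetical.</p>

```python
import unicodedata

# Rough sketch of the derivation rules using Python's unicodedata module.
# Assumption: unicodedata follows a far newer Unicode version than 2.0, so
# results may differ from productions [84]-[89] for some characters.
START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
OTHER_NAME_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}
EXTRA_START = set(range(0x02BB, 0x02C2)) | {0x0559, 0x06E5, 0x06E6}
EXCLUDED = set(range(0x20DD, 0x20E1))     # #x20DD-#x20E0
EXTRA_NAME = {0x00B7, 0x0387}             # extender and its canonical equivalent


def _allowed(ch):
    cp = ord(ch)
    if 0xF900 < cp < 0xFFFE or cp in EXCLUDED:
        return False                      # compatibility area, explicit exclusions
    # Font or compatibility decompositions start with a tag such as "<compat>".
    return not unicodedata.decomposition(ch).startswith("<")


def is_name_start(ch):
    if ch in (":", "_"):
        return True
    if not _allowed(ch):
        return False
    return ord(ch) in EXTRA_START or unicodedata.category(ch) in START_CATEGORIES


def is_name_char(ch):
    if is_name_start(ch) or ch in ("-", "."):
        return True
    if not _allowed(ch):
        return False
    return ord(ch) in EXTRA_NAME or unicodedata.category(ch) in OTHER_NAME_CATEGORIES


print(is_name_start("A"), is_name_start("1"), is_name_char("1"))  # True False True
print(is_name_char("\u00B7"), is_name_start("\u00B7"))            # True False
```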
5439
5440
5441
5442      <h2><a name="sec-xml-and-sgml"></a>C XML and SGML (Non-Normative)
5443      </h2>
5444
5445
5446      <p>XML is designed to be a subset of SGML, in that every
5447         <a href="#dt-valid">valid</a> XML document should also be a
5448         conformant SGML document.
5449         For a detailed comparison of the additional restrictions that XML places on
5450         documents beyond those of SGML, see <a href="#Clark">[Clark]</a>.
5451
5452      </p>
5453
5454
5455
5456      <h2><a name="sec-entexpand"></a>D Expansion of Entity and Character References (Non-Normative)
5457      </h2>
5458
5459      <p>This appendix contains some examples illustrating the
5460         sequence of entity- and character-reference recognition and
5461         expansion, as specified in <a href="#entproc">[<b>4.4 XML Processor Treatment of Entities and References</b>]
5462         </a>.
5463      </p>
5464
      <p>
         If the DTD contains the declaration
         <pre>&lt;!ENTITY example "&lt;p>An ampersand (&amp;#38;#38;) may be escaped
numerically (&amp;#38;#38;#38;) or with a general entity
(&amp;amp;amp;).&lt;/p>" >
</pre>
         then the XML processor will recognize the character references
         when it parses the entity declaration, and resolve them before
         storing the following string as the
         value of the entity "<code>example</code>":
         <pre>&lt;p>An ampersand (&amp;#38;) may be escaped
numerically (&amp;#38;#38;) or with a general entity
(&amp;amp;amp;).&lt;/p>
</pre>
         A reference in the document to "<code>&amp;example;</code>"
         will cause the text to be reparsed, at which time the
         start- and end-tags of the "<code>p</code>" element will be recognized
         and the three references will be recognized and expanded,
         resulting in a "<code>p</code>" element with the following content
         (all data, no delimiters or markup):
         <pre>An ampersand (&amp;) may be escaped
numerically (&amp;#38;) or with a general entity
(&amp;amp;).
</pre>
         </p>
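<p>The two stages can be imitated in a few lines of Python. This is a minimal sketch under stated assumptions, not how any particular XML processor is built. The hypothetical <code>store_entity_value</code> expands only decimal character references, in a single pass, as at declaration time. The hypothetical <code>expand_reference</code> reparses the stored value, as at reference time, by handing it to the standard <code>xml.etree.ElementTree</code> parser inside an invented <code>doc</code> wrapper element.</p>
<pre>
```python
import re
import xml.etree.ElementTree as ET

# LT and AMP hold the less-than sign and the ampersand, written as escapes
# so that this listing stays HTML-safe.
LT, AMP = "\x3c", "\x26"

# The entity literal from the declaration above (the text between the quotes).
LITERAL = (
    "{lt}p>An ampersand ({amp}#38;#38;) may be escaped\n"
    "numerically ({amp}#38;#38;#38;) or with a general entity\n"
    "({amp}amp;amp;).{lt}/p>"
).format(lt=LT, amp=AMP)

# Decimal character references only; enough for this example.
CHAR_REF = re.compile(AMP + r"#([0-9]+);")


def store_entity_value(literal: str) -> str:
    """Declaration time: replace each character reference, in a single pass.

    re.sub does not rescan its own output, and general-entity references
    (the 'amp' references here) are left exactly as written.
    """
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), literal)


def expand_reference(stored: str) -> ET.Element:
    """Reference time: reparse the stored value as element content."""
    wrapper = ET.fromstring(LT + "doc>" + stored + LT + "/doc>")
    return wrapper[0]  # the reparsed 'p' element


stored = store_entity_value(LITERAL)
p = expand_reference(stored)
print(stored)
print(p.tag, repr(p.text))
```
</pre>
<p>Printing <code>stored</code> should reproduce the second string above, and <code>p.text</code> the third: each character reference is consumed exactly once per stage.</p>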

      <p>A more complex example will illustrate the rules and their
         effects fully. In the following example, the line numbers are
         solely for reference.
         <pre>1 &lt;?xml version='1.0'?>
2 &lt;!DOCTYPE test [
3 &lt;!ELEMENT test (#PCDATA) >
4 &lt;!ENTITY % xx '&amp;#37;zz;'>
5 &lt;!ENTITY % zz '&amp;#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 &lt;test>This sample shows a &amp;tricky; method.&lt;/test>
</pre>
         This produces the following:
         <ul>

            <li>
               <p>In line 4, the reference to character 37 is expanded immediately,
                  and the parameter entity "<code>xx</code>" is stored in the symbol
                  table with the value "<code>%zz;</code>". Since the replacement text
                  is not rescanned, the reference to parameter entity "<code>zz</code>"
                  is not recognized. (And it would be an error if it were, since
                  "<code>zz</code>" is not yet declared.)
               </p>
            </li>

            <li>
               <p>In line 5, the character reference "<code>&amp;#60;</code>" is
                  expanded immediately and the parameter entity "<code>zz</code>" is
                  stored with the replacement text
                  "<code>&lt;!ENTITY tricky "error-prone" ></code>",
                  which is a well-formed entity declaration.
               </p>
            </li>

            <li>
               <p>In line 6, the reference to "<code>xx</code>" is recognized,
                  and the replacement text of "<code>xx</code>" (namely
                  "<code>%zz;</code>") is parsed. The reference to "<code>zz</code>"
                  is recognized in its turn, and its replacement text
                  ("<code>&lt;!ENTITY tricky "error-prone" ></code>") is parsed.
                  The general entity "<code>tricky</code>" has now been
                  declared, with the replacement text "<code>error-prone</code>".
               </p>
            </li>

            <li>
               <p>
                  In line 8, the reference to the general entity "<code>tricky</code>" is
                  recognized, and it is expanded, so the full content of the
                  "<code>test</code>" element is the self-describing (and ungrammatical) string
                  <i>This sample shows a error-prone method.</i>
               </p>
            </li>

         </ul>

      </p>
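<p>The order of events in lines 4 to 8 can be traced with a deliberately tiny model. The sketch below is an illustration, not a DTD parser. The symbol tables, the helper names (<code>declare_param</code>, <code>parse_markup</code>, <code>expand_content</code>), and the regular expressions are all assumptions made for this example, and they understand only the constructs it uses. The distinction it models is that a declaration's literal is scanned once for character references, while a parameter-entity reference causes the replacement text itself to be parsed.</p>
<pre>
```python
import re

# AMP and LT hold the ampersand and the less-than sign, written as escapes
# so that this listing stays HTML-safe.
AMP, LT = "\x26", "\x3c"

CHAR_REF = re.compile(AMP + r"#([0-9]+);")
PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")
GE_REF = re.compile(AMP + r"([A-Za-z_][\w.-]*);")
ENTITY_DECL = re.compile(LT + r'!ENTITY\s+([A-Za-z_][\w.-]*)\s+"([^"]*)"\s*>')

param_entities: dict[str, str] = {}    # toy symbol table for parameter entities
general_entities: dict[str, str] = {}  # toy symbol table for general entities


def declare_param(name: str, literal: str) -> None:
    """Lines 4 and 5: expand character references once and store the result.

    The stored text is not rescanned, so a '%zz;' produced here stays inert.
    """
    param_entities[name] = CHAR_REF.sub(lambda m: chr(int(m.group(1))), literal)


def parse_markup(text: str) -> None:
    """Line 6: parse DTD text; a PE reference's replacement text is parsed in turn."""
    text = text.strip()
    ref = PE_REF.fullmatch(text)
    if ref:
        parse_markup(param_entities[ref.group(1)])  # KeyError if undeclared
        return
    decl = ENTITY_DECL.fullmatch(text)
    if decl:
        general_entities[decl.group(1)] = decl.group(2)
        return
    raise ValueError("markup not handled by this toy model: " + text)


def expand_content(text: str) -> str:
    """Line 8: replace general-entity references in character data."""
    return GE_REF.sub(lambda m: general_entities[m.group(1)], text)


declare_param("xx", AMP + "#37;zz;")                             # line 4
after_line_4 = dict(param_entities)
declare_param("zz", AMP + '#60;!ENTITY tricky "error-prone" >')  # line 5
parse_markup("%xx;")                                             # line 6
content = expand_content("This sample shows a " + AMP + "tricky; method.")  # line 8
print(after_line_4)
print(param_entities["zz"])
print(content)
```
</pre>
<p>Because <code>after_line_4</code> holds only <code>xx</code>, with the inert value <code>%zz;</code>, nothing about <code>zz</code> is looked up at that point. The general entity <code>tricky</code> exists only once line 6 has been processed.</p>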


      <h2><a name="determinism"></a>E Deterministic Content Models (Non-Normative)
      </h2>

      <p><a href="#dt-compat">For compatibility</a>, it is required
         that content models in element type declarations be deterministic.
      </p>


      <p>SGML
         requires deterministic content models (it calls them
         "unambiguous"); XML processors built using SGML systems may
         flag non-deterministic content models as errors.
      </p>

      <p>For example, the content model <code>((b, c) | (b, d))</code> is
         non-deterministic, because given an initial <code>b</code>, the parser
         cannot know which <code>b</code> in the model is being matched without
         looking ahead to see which element follows the <code>b</code>.
         In this case, the two references to <code>b</code> can be collapsed
         into a single reference, making the model read
         <code>(b, (c | d))</code>. An initial <code>b</code> now clearly
         matches only a single name in the content model. The parser doesn't
         need to look ahead to see what follows; either <code>c</code> or
         <code>d</code> would be accepted.
      </p>
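<p>Factoring the model changes its shape but not the set of child sequences it accepts. One way to check this is a brute-force comparison, sketched below under the assumption that every element name is a single letter, so that a sequence of children can be written as a string. Python's <code>re</code> engine serves only as an oracle for the two languages. Because it backtracks, it has no trouble with the non-deterministic form, which is a reminder that determinism is a property of the model, not of the set of sequences it accepts.</p>
<pre>
```python
import itertools
import re

# Sketch assumption: each element name is one letter, so a sequence of child
# elements can be written as a string such as "bc".
ORIGINAL = re.compile(r"bc|bd")   # ((b, c) | (b, d))
FACTORED = re.compile(r"b(c|d)")  # (b, (c | d))


def accepted(model: re.Pattern, alphabet: str = "bcd", max_len: int = 3) -> set:
    """Every child sequence of length 0..max_len that `model` matches in full."""
    return {
        "".join(seq)
        for n in range(max_len + 1)
        for seq in itertools.product(alphabet, repeat=n)
        if model.fullmatch("".join(seq))
    }


print(sorted(accepted(ORIGINAL)), sorted(accepted(FACTORED)))  # ['bc', 'bd'] ['bc', 'bd']
```
</pre>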

      <p>More formally, a finite state automaton may be constructed from the
         content model using the standard algorithms, e.g. algorithm 3.5
         in section 3.9
         of Aho, Sethi, and Ullman <a href="#Aho">[Aho/Ullman]</a>.
         In many such algorithms, a follow set is constructed for each
         position in the regular expression (i.e., each leaf node in the
         syntax tree for the regular expression);
         if any position has a follow set in which
         more than one following position is
         labeled with the same element type name,
         then the content model is in error
         and may be reported as an error.
      </p>
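<p>The follow-set test might look like this in Python. The sketch represents a content model as a small tuple tree, a representation invented here and built with the hypothetical helpers <code>name</code>, <code>seq</code>, <code>alt</code>, and <code>rep</code>. For that tree it computes the usual nullable, first, last, and follow sets over numbered positions. One step the prose leaves implicit is that the first set of the whole model must pass the same test, because the start of the model behaves like a position whose follow set is the first set. The ambiguity in <code>((b, c) | (b, d))</code> shows up exactly there. Mixed content and error reporting are left out.</p>
<pre>
```python
def name(element):
    return ("name", element)


def seq(*children):
    return ("seq", list(children))


def alt(*children):
    return ("alt", list(children))


def rep(child, op):  # op is "?", "*" or "+"
    return ("rep", child, op)


def analyse(node, labels, follow):
    """Number the leaves ("positions") left to right, fill follow[p] for each
    position p, and return (nullable, first, last) for `node`."""
    kind = node[0]
    if kind == "name":
        pos = len(labels)
        labels.append(node[1])
        follow[pos] = set()
        return False, {pos}, {pos}
    if kind == "alt":
        nullable, first, last = False, set(), set()
        for child in node[1]:
            c_null, c_first, c_last = analyse(child, labels, follow)
            nullable, first, last = nullable or c_null, first | c_first, last | c_last
        return nullable, first, last
    if kind == "seq":
        nullable, first, last = True, set(), set()
        for child in node[1]:
            c_null, c_first, c_last = analyse(child, labels, follow)
            for p in last:  # anything that can end the prefix so far ...
                follow[p] |= c_first  # ... can be followed by the child's first
            if nullable:
                first |= c_first
            last = (last | c_last) if c_null else c_last
            nullable = nullable and c_null
        return nullable, first, last
    if kind == "rep":
        c_null, c_first, c_last = analyse(node[1], labels, follow)
        if node[2] in ("*", "+"):
            for p in c_last:  # a repetition can loop back to its own start
                follow[p] |= c_first
        return c_null or node[2] in ("?", "*"), c_first, c_last
    raise ValueError(f"unknown node kind: {kind!r}")


def is_deterministic(model) -> bool:
    labels, follow = [], {}
    _, first, _ = analyse(model, labels, follow)
    # The start of the model acts like an extra position whose follow set is
    # `first`, so it is checked along with every real position.
    for candidates in [first, *follow.values()]:
        names = [labels[p] for p in candidates]
        if len(names) != len(set(names)):
            return False
    return True


print(is_deterministic(alt(seq(name("b"), name("c")), seq(name("b"), name("d")))))  # False
print(is_deterministic(seq(name("b"), alt(name("c"), name("d")))))                  # True
print(is_deterministic(seq(rep(name("a"), "*"), name("a"))))                        # False
```
</pre>
<p>The third call shows a case where the clash is found in a follow set rather than at the start: in <code>(a*, a)</code>, an <code>a</code> may be followed either by the starred <code>a</code> again or by the final <code>a</code>.</p>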

      <p>Algorithms exist that allow many, but not all, non-deterministic
         content models to be reduced automatically to equivalent deterministic
         models; see Br&uuml;ggemann-Klein 1991 <a href="#ABK">[Br&uuml;ggemann-Klein]</a>.
      </p>

      <h2><a name="sec-guessing"></a>F Autodetection of Character Encodings (Non-Normative)
      </h2>

      <p>The XML encoding declaration functions as an internal label on each
         entity, indicating which character encoding is in use. Before an XML
         processor can read the internal label, however, it apparently has to
         know what character encoding is in use, which is exactly what the internal
         label is trying to indicate. In the general case, this is a hopeless
         situation. It is not entirely hopeless in XML, however, because XML
         limits the general case in two ways: each implementation is assumed
         to support only a finite set of character encodings, and the XML
         encoding declaration is restricted in position and content in order to
         make it feasible to autodetect the character encoding in use in each
         entity in normal cases. Also, in many cases other sources of information
         are available in addition to the XML data stream itself.
         Two cases may be distinguished,
         depending on whether the XML entity is presented to the
         processor without, or with, any accompanying
         (external) information. We consider the first case first.
      </p>

      <p>
         Because each XML entity not in UTF-8 or UTF-16 format <i>must</i>
         begin with an XML encoding declaration, in which the first characters
         must be '<code>&lt;?xml</code>', any conforming processor can detect,
         after two to four octets of input, which of the following cases applies.
         In reading this list, it may help to know that in UCS-4, '&lt;' is
         "<code>#x0000003C</code>" and '?' is "<code>#x0000003F</code>", and the Byte
         Order Mark required of UTF-16 data streams is "<code>#xFEFF</code>".
      </p>

      <p>

         <ul>

            <li>
               <p><code>00 00 00 3C</code>: UCS-4, big-endian machine (1234 order)
               </p>
            </li>

            <li>
               <p><code>3C 00 00 00</code>: UCS-4, little-endian machine (4321 order)
               </p>
            </li>

            <li>
               <p><code>00 00 3C 00</code>: UCS-4, unusual octet order (2143)
               </p>
            </li>

            <li>
               <p><code>00 3C 00 00</code>: UCS-4, unusual octet order (3412)
               </p>
            </li>

            <li>
               <p><code>FE FF</code>: UTF-16, big-endian
               </p>
            </li>

            <li>
               <p><code>FF FE</code>: UTF-16, little-endian
               </p>
            </li>

            <li>
               <p><code>00 3C 00 3F</code>: UTF-16, big-endian, no Byte Order Mark
                  (and thus, strictly speaking, in error)
               </p>
            </li>

            <li>
               <p><code>3C 00 3F 00</code>: UTF-16, little-endian, no Byte Order Mark
                  (and thus, strictly speaking, in error)
               </p>
            </li>

            <li>
               <p><code>3C 3F 78 6D</code>: UTF-8, ISO 646, ASCII, some part of ISO 8859,
                  Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding
                  in which the characters of ASCII have their normal positions,
                  width, and values; the actual encoding declaration must be read to
                  detect which of these applies, but since all of these encodings
                  use the same bit patterns for the ASCII characters, the encoding
                  declaration itself may be read reliably.
               </p>
            </li>

            <li>
               <p><code>4C 6F A7 94</code>: EBCDIC (in some flavor; the full
                  encoding declaration must be read to tell which code page is in
                  use)
               </p>
            </li>

            <li>
               <p>other: UTF-8 without an encoding declaration, or else
                  the data stream is corrupt, fragmentary, or enclosed in
                  a wrapper of some kind
               </p>
            </li>

         </ul>

      </p>
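<p>The case analysis above amounts to a table lookup on the first two to four octets. The following Python sketch is one possible rendering. The function name <code>sniff_encoding_family</code> and the returned labels are invented for illustration, and, like the list, the sketch stops at the family level.</p>
<pre>
```python
# First four octets of each case in the list above. The labels are
# descriptive strings chosen for this sketch, not codec names.
FOUR_OCTET_CASES = {
    b"\x00\x00\x00\x3c": "UCS-4, big-endian (1234)",
    b"\x3c\x00\x00\x00": "UCS-4, little-endian (4321)",
    b"\x00\x00\x3c\x00": "UCS-4, unusual octet order (2143)",
    b"\x00\x3c\x00\x00": "UCS-4, unusual octet order (3412)",
    b"\x00\x3c\x00\x3f": "UTF-16, big-endian, no Byte Order Mark",
    b"\x3c\x00\x3f\x00": "UTF-16, little-endian, no Byte Order Mark",
    b"\x3c\x3f\x78\x6d": "ASCII-compatible (UTF-8, ISO 646, ISO 8859, Shift-JIS, EUC, ...)",
    b"\x4c\x6f\xa7\x94": "EBCDIC",
}


def sniff_encoding_family(data: bytes) -> str:
    """Classify an entity by its first octets, following the list above."""
    # A UTF-16 Byte Order Mark is recognizable from two octets alone.
    if data[:2] == b"\xfe\xff":
        return "UTF-16, big-endian"
    if data[:2] == b"\xff\xfe":
        return "UTF-16, little-endian"
    # Otherwise compare the first four octets against the table.
    return FOUR_OCTET_CASES.get(data[:4], "other")


head = "\x3c?xml version='1.0'?>"  # \x3c is the less-than sign
for codec in ("utf-8", "utf-16-le", "utf-32-be", "cp037"):
    print(codec, "->", sniff_encoding_family(head.encode(codec)))
print("BOM ->", sniff_encoding_family(b"\xfe\xff" + head.encode("utf-16-be")))
```
</pre>
<p>Here <code>cp037</code> stands in for "some flavor" of EBCDIC. Any code page that places <code>&lt;?xm</code> at the same octets would be classified the same way.</p>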

      <p>
         This level of autodetection is enough to read the XML encoding
         declaration and parse the character-encoding identifier, which is
         still necessary to distinguish the individual members of each family
         of encodings (e.g. to tell UTF-8 from 8859, and the parts of 8859
         from each other, or to distinguish the specific EBCDIC code page in
         use, and so on).
      </p>

      <p>
         Because the contents of the encoding declaration are restricted to
         ASCII characters, a processor can reliably read the entire encoding
         declaration as soon as it has detected which family of encodings is in
         use. Since, in practice, all widely used character encodings fall into
         one of the categories above, the XML encoding declaration allows
         reasonably reliable in-band labeling of character encodings, even when
         external sources of information at the operating-system or
         transport-protocol level are unreliable.
      </p>
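<p>Once the family is known, reading the declaration needs only a decoder that handles the ASCII repertoire correctly. The sketch below assumes the caller maps the detected family to some Python codec, for example <code>ascii</code> for the ASCII-compatible family or <code>utf-16-le</code> for little-endian UTF-16. The function name <code>read_encoding_declaration</code> is hypothetical, and the pattern for the name follows the <code>EncName</code> production.</p>
<pre>
```python
import re
from typing import Optional

# "\x3c" is the escape for the less-than sign (kept HTML-safe here).
XML_DECL_START = re.compile(r"\x3c\?xml\s")
# EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENCODING_PSEUDO_ATTR = re.compile(
    r"""\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def read_encoding_declaration(head: bytes, family_codec: str) -> Optional[str]:
    """Return the encoding name declared at the start of an entity, or None.

    `family_codec` is a Python codec assumed to decode the ASCII repertoire
    correctly for the detected family.
    """
    text = head.decode(family_codec, errors="replace").lstrip("\ufeff")
    if not XML_DECL_START.match(text):
        return None  # no XML declaration at the start of the entity
    end = text.find("?>")
    declaration = text if end == -1 else text[:end]
    match = ENCODING_PSEUDO_ATTR.search(declaration)
    return match.group(2) if match else None


latin1 = b"\x3c?xml version='1.0' encoding='ISO-8859-1'?>\x3cdoc/>"
print(read_encoding_declaration(latin1, "ascii"))
utf16 = '\x3c?xml version="1.0" encoding="UTF-16"?>'.encode("utf-16-le")
print(read_encoding_declaration(utf16, "utf-16-le"))
```
</pre>
<p>Decoding with <code>errors="replace"</code> keeps the sketch from failing on non-ASCII octets that follow the declaration. Those octets are then decoded properly once the declared encoding is known.</p>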

      <p>
         Once the processor has detected the character encoding in use, it can
         act appropriately, whether by invoking a separate input routine for
         each case, or by calling the proper conversion function on each
         character of input.
      </p>
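<p>For the second approach, a Python implementation would typically delegate conversion to the standard <code>codecs</code> module rather than handle octets itself. The minimal sketch below assumes the detected encoding has already been mapped to a Python codec name, and <code>decode_entity</code> is a hypothetical helper. An incremental decoder is used because input usually arrives in chunks, and a multi-octet character may straddle two of them.</p>
<pre>
```python
import codecs
from typing import Iterable, Iterator


def decode_entity(chunks: Iterable[bytes], encoding: str) -> Iterator[str]:
    """Yield the characters of an entity whose encoding has been determined.

    The incremental decoder carries a partial multi-octet sequence over from
    one chunk to the next, so a character split across two reads is still
    converted correctly; errors="strict" rejects malformed input.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # raises if input ended mid-character
    if tail:
        yield tail


data = "caf\u00e9".encode("utf-8")  # the last character takes two octets
print("".join(decode_entity([data[:4], data[4:]], "utf-8")))
```
</pre>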

      <p>
         Like any self-labeling system, the XML encoding declaration will not
         work if any software changes the entity's character set or encoding
         without updating the encoding declaration. Implementors of
         character-encoding routines should be careful to ensure the accuracy
         of the internal and external information used to label the entity.
      </p>

      <p>The second possible case occurs when the XML entity is accompanied
         by encoding information, as in some file systems and some network
         protocols.
         When multiple sources of information are available, their relative
         priority and the preferred method of handling conflicts should be
         specified as part of the higher-level protocol used to deliver XML.
         Rules for the relative priority of the internal label and the
         MIME-type label in an external header, for example, should be part of the
         RFC document defining the text/xml and application/xml MIME types. In
         the interests of interoperability, however, the following rules
         are recommended.

         <ul>

            <li>
               <p>If an XML entity is in a file, the Byte Order Mark
                  and encoding-declaration PI are used (if present) to determine the
                  character encoding. All other heuristics and sources of information
                  are solely for error recovery.
               </p>
            </li>

            <li>
               <p>If an XML entity is delivered with a
                  MIME type of text/xml, then the <code>charset</code> parameter
                  on the MIME type determines the
                  character encoding method; all other heuristics and sources of
                  information are solely for error recovery.
               </p>
            </li>

            <li>
               <p>If an XML entity is delivered with a
                  MIME type of application/xml, then the Byte Order Mark and
                  encoding-declaration PI are used (if present) to determine the
                  character encoding. All other heuristics and sources of
                  information are solely for error recovery.
               </p>
            </li>

         </ul>
         These rules apply only in the absence of protocol-level documentation;
         in particular, when the MIME types text/xml and application/xml are
         defined, the recommendations of the relevant RFC will supersede
         these rules.

      </p>
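<p>The three recommended rules can be summarized as a small decision function. In this Python sketch, <code>governing_encoding</code> and its parameters are hypothetical, and <code>internal_label</code> stands for whatever the Byte Order Mark and encoding declaration indicate. The sketch covers only the rules listed here. As the text notes, an RFC defining the MIME types would take precedence over this logic.</p>
<pre>
```python
from typing import Optional


def governing_encoding(mime_type: Optional[str],
                       charset: Optional[str],
                       internal_label: Optional[str]) -> Optional[str]:
    """Pick the source of encoding information the three rules recommend.

    mime_type      -- None when the entity is read from a file
    charset        -- the MIME charset parameter, if any
    internal_label -- what the Byte Order Mark / encoding declaration says
    Returns None when the chosen source is absent; other information is
    then only for error recovery, which this sketch does not attempt.
    """
    if mime_type is None:                      # rule 1: entity in a file
        return internal_label
    media_type = mime_type.split(";")[0].strip().lower()
    if media_type == "text/xml":               # rule 2: charset parameter decides
        return charset
    if media_type == "application/xml":        # rule 3: in-band label decides
        return internal_label
    raise ValueError(f"no recommended rule for {media_type!r}")


print(governing_encoding("text/xml", "ISO-8859-1", "UTF-8"))
print(governing_encoding("application/xml", "ISO-8859-1", "UTF-8"))
```
</pre>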


      <h2><a name="sec-xml-wg"></a>G W3C XML Working Group (Non-Normative)
      </h2>

      <p>This specification was prepared and approved for publication by the
         W3C XML Working Group (WG). WG approval of this specification does
         not necessarily imply that all WG members voted for its approval.
         The current and former members of the XML WG are:
      </p>

      Jon Bosak, Sun (Chair); James Clark (Technical Lead); Tim Bray, Textuality and Netscape (XML Co-editor); Jean Paoli, Microsoft (XML Co-editor); C. M. Sperberg-McQueen, U. of Ill. (XML Co-editor); Dan Connolly, W3C (W3C Liaison); Paula Angerstein, Texcel; Steve DeRose, INSO; Dave Hollander, HP; Eliot Kimber, ISOGEN; Eve Maler, ArborText; Tom Magliery, NCSA; Murray Maloney, Muzmo and Grif; Makoto Murata, Fuji Xerox Information Systems; Joel Nava, Adobe; Conleth O'Connell, Vignette; Peter Sharpe, SoftQuad; John Tigue, DataChannel

   </body>
</html>