## Unicode Technical Standard #35

# Unicode Locale Data Markup Language (LDML)<br/>Part 9: Message Format

|Version|45                      |
|-------|------------------------|
|Editors|Addison Phillips and [other CLDR committee members](tr35.md#Acknowledgments)|

For the full header, summary, and status, see [Part 1: Core](tr35.md).

### _Summary_

This specification defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages.

This is a partial document, describing only those parts of the LDML that are relevant for message format. For the other parts of the LDML see the [main LDML document](tr35.md) and the links above.

### _Status_

<!-- _This is a draft document which may be updated, replaced, or superseded by other documents at any time.
Publication does not imply endorsement by the Unicode Consortium.
This is not a stable document; it is inappropriate to cite this document as other than a work in progress._ -->

_This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium.
This is a stable document and may be used as reference material or cited as a normative reference by other specifications._

> _**A Unicode Technical Standard (UTS)** is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS._

_Please submit corrigenda and other comments with the CLDR bug reporting form [[Bugs](tr35.md#Bugs)]. Related information that is useful in understanding this document is found in the [References](tr35.md#References). For the latest version of the Unicode Standard see [[Unicode](tr35.md#Unicode)]. For a list of current Unicode Technical Reports see [[Reports](tr35.md#Reports)]. For more information about versions of the Unicode Standard, see [[Versions](tr35.md#Versions)]._

## Parts

The LDML specification is divided into the following parts:

*   Part 1: [Core](tr35.md#Contents) (languages, locales, basic structure)
*   Part 2: [General](tr35-general.md#Contents) (display names & transforms, etc.)
*   Part 3: [Numbers](tr35-numbers.md#Contents) (number & currency formatting)
*   Part 4: [Dates](tr35-dates.md#Contents) (date, time, time zone formatting)
*   Part 5: [Collation](tr35-collation.md#Contents) (sorting, searching, grouping)
*   Part 6: [Supplemental](tr35-info.md#Contents) (supplemental data)
*   Part 7: [Keyboards](tr35-keyboards.md#Contents) (keyboard mappings)
*   Part 8: [Person Names](tr35-personNames.md#Contents) (person names)
*   Part 9: [MessageFormat](tr35-messageFormat.md#Contents) (message format)

## <a name="Contents">Contents of Part 9, Message Format</a>

* [Introduction](#introduction)
  * [Conformance](#conformance)
  * [Terminology and Conventions](#terminology-and-conventions)
  * [Stability Policy](#stability-policy)
* [Syntax](#syntax)
  * [Design Goals](#design-goals)
  * [Design Restrictions](#design-restrictions)
  * [Messages and their Syntax](#messages-and-their-syntax)
    * [Well-formed vs. Valid Messages](#well-formed-vs.-valid-messages)
  * [The Message](#the-message)
    * [Declarations](#declarations)
      * [Reserved Statements](#reserved-statements)
    * [Complex Body](#complex-body)
  * [Pattern](#pattern)
    * [Quoted Pattern](#quoted-pattern)
    * [Text](#text)
    * [Placeholder](#placeholder)
  * [Matcher](#matcher)
    * [Selector](#selector)
    * [Variant](#variant)
      * [Key](#key)
  * [Expressions](#expressions)
    * [Annotation](#annotation)
      * [Function](#function)
        * [Options](#options)
      * [Private-Use Annotations](#private-use-annotations)
      * [Reserved Annotations](#reserved-annotations)
  * [Markup](#markup)
  * [Attributes](#attributes)
  * [Other Syntax Elements](#other-syntax-elements)
    * [Keywords](#keywords)
    * [Literals](#literals)
    * [Names and Identifiers](#names-and-identifiers)
    * [Escape Sequences](#escape-sequences)
    * [Whitespace](#whitespace)
* [Complete ABNF](#complete-abnf)
  * [`message.abnf`](#message.abnf)
* [Errors](#errors)
  * [Error Handling](#error-handling)
  * [Syntax Errors](#syntax-errors)
  * [Data Model Errors](#data-model-errors)
    * [Variant Key Mismatch](#variant-key-mismatch)
    * [Missing Fallback Variant](#missing-fallback-variant)
    * [Missing Selector Annotation](#missing-selector-annotation)
    * [Duplicate Declaration](#duplicate-declaration)
    * [Duplicate Option Name](#duplicate-option-name)
  * [Resolution Errors](#resolution-errors)
    * [Unresolved Variable](#unresolved-variable)
    * [Unknown Function](#unknown-function)
    * [Unsupported Expression](#unsupported-expression)
    * [Invalid Expression](#invalid-expression)
    * [Unsupported Statement](#unsupported-statement)
  * [Selection Errors](#selection-errors)
  * [Formatting Errors](#formatting-errors)
* [Function Registry](#function-registry)
  * [Goals](#goals)
  * [Conformance and Use](#conformance-and-use)
  * [Registry Data Model](#registry-data-model)
  * [Example](#example)
  * [Default Registry](#default-registry)
    * [String Value Selection and Formatting](#string-value-selection-and-formatting)
      * [The `:string` function](#the-string-function)
        * [Operands](#operands)
        * [Options](#options)
        * [Selection](#selection)
        * [Formatting](#formatting)
    * [Numeric Value Selection and Formatting](#numeric-value-selection-and-formatting)
      * [The `:number` function](#the-number-function)
        * [Operands](#operands)
        * [Options](#options)
        * [Default Value of `select` Option](#default-value-of-select-option)
        * [Percent Style](#percent-style)
        * [Selection](#selection)
      * [The `:integer` function](#the-integer-function)
        * [Operands](#operands)
        * [Options](#options)
        * [Default Value of `select` Option](#default-value-of-select-option)
        * [Percent Style](#percent-style)
        * [Selection](#selection)
      * [Number Operands](#number-operands)
      * [Digit Size Options](#digit-size-options)
      * [Number Selection](#number-selection)
        * [Rule Selection](#rule-selection)
        * [Determining Exact Literal Match](#determining-exact-literal-match)
    * [Date and Time Value Formatting](#date-and-time-value-formatting)
      * [The `:datetime` function](#the-datetime-function)
        * [Operands](#operands)
        * [Options](#options)
      * [The `:date` function](#the-date-function)
        * [Operands](#operands)
        * [Options](#options)
      * [The `:time` function](#the-time-function)
        * [Operands](#operands)
        * [Options](#options)
      * [Date and Time Operands](#date-and-time-operands)
* [Formatting](#formatting)
  * [Formatting Context](#formatting-context)
  * [Expression and Markup Resolution](#expression-and-markup-resolution)
    * [Literal Resolution](#literal-resolution)
    * [Variable Resolution](#variable-resolution)
    * [Function Resolution](#function-resolution)
      * [Option Resolution](#option-resolution)
    * [Markup Resolution](#markup-resolution)
    * [Fallback Resolution](#fallback-resolution)
  * [Pattern Selection](#pattern-selection)
    * [Resolve Selectors](#resolve-selectors)
    * [Resolve Preferences](#resolve-preferences)
    * [Filter Variants](#filter-variants)
    * [Sort Variants](#sort-variants)
    * [Examples](#examples)
      * [Example 1](#example-1)
      * [Example 2](#example-2)
      * [Example 3](#example-3)
  * [Formatting](#formatting)
    * [Examples](#examples)
    * [Formatting Fallback Values](#formatting-fallback-values)
    * [Handling Bidirectional Text](#handling-bidirectional-text)
* [Interchange Data Model](#interchange-data-model)
  * [Messages](#messages)
  * [Patterns](#patterns)
  * [Expressions](#expressions)
  * [Markup](#markup)
  * [Extensions](#extensions)
* [Appendices](#appendices)
  * [Security Considerations](#security-considerations)
  * [Acknowledgements](#acknowledgements)

## Introduction

One of the challenges in adapting software to work for
users with different languages and cultures is the need for **_dynamic messages_**.
Whenever a user interface needs to present data as part of a larger string,
that data needs to be formatted (and the message may need to be altered)
to make it culturally accepted and grammatically correct.

> For example, if your US English (`en-US`) interface has a message like:
>
> > Your item had 1,023 views on April 3, 2023
>
> You want the translated message to be appropriately formatted into French:
>
> > Votre article a eu 1 023 vues le 3 avril 2023
>
> Or Japanese:
>
> > あなたのアイテムは 2023 年 4 月 3 日に 1,023 回閲覧されました。

This specification defines the
data model, syntax, processing, and conformance requirements
for the next generation of _dynamic messages_.
It is intended for adoption by programming languages and APIs.
This will enable the integration of
existing internationalization APIs (such as the date and number formats shown above),
grammatical matching (such as plurals or genders),
as well as user-defined formats and message selectors.

The document is the successor to ICU MessageFormat,
henceforth called ICU MessageFormat 1.0.

### Conformance

Everything in this specification is normative except for:
sections marked as non-normative,
all authoring guidelines, diagrams, examples, and notes.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 \[[RFC2119](https://www.rfc-editor.org/rfc/rfc2119)\]
\[[RFC8174](https://www.rfc-editor.org/rfc/rfc8174)\] when, and only when, they
appear in all capitals, as shown here.

### Terminology and Conventions

A **_term_** looks like this when it is defined in this specification.

A reference to a _term_ looks like this.

> Examples are non-normative and styled like this.

### Stability Policy

> [!IMPORTANT]
> The provisions of the stability policy are not in effect until
> the conclusion of the technical preview and adoption of this specification.

Updates to this specification will not change
the syntactical meaning, the runtime output, or other behaviour
of valid messages written for earlier versions of this specification
that only use functions defined in this specification.
Updates to this specification will not remove any syntax provided in this version.
Future versions MAY add additional structure or meaning to existing syntax.

Updates to this specification will not remove any reserved keywords or sigils.

> [!NOTE]
> Future versions may define new keywords.

Updates to this specification will not reserve or assign meaning to
any character "sigils" except for those in the `reserved` production.

Updates to this specification
will not remove any functions defined in the default registry nor
will they remove any options or option values.
Additional options or option values MAY be defined.

> [!NOTE]
> This does not guarantee that the results of formatting will never change.
> Even when the specification doesn't change,
> the functions for date formatting, number formatting and so on
> will change their results over time.

Later specification versions MAY make previously invalid messages valid.

Updates to this specification will not introduce message syntax that,
when parsed according to earlier versions of this specification,
would produce syntax or data model errors.
Such messages MAY produce errors when formatted
according to an earlier version of this specification.

From version 2.0, MessageFormat will only reserve, define, or require
function names or function option names
consisting of characters in the ranges a-z, A-Z, and 0-9.
All other names in these categories are reserved for the use of implementations or users.

> [!NOTE]
> Users defining custom names SHOULD include at least one character outside these ranges
> to ensure that they will be compatible with future versions of this specification.

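The naming rule above can be checked mechanically. The Python sketch below tests whether a custom function or option name contains at least one character outside a-z, A-Z, and 0-9. The helper `is_future_safe_name` is hypothetical. It does not model the full name syntax: any namespace prefix is simply treated as part of the name.

```python
import re

# Function and option names made only of a-z, A-Z and 0-9 may be
# claimed by future versions of the specification.
_SPEC_RESERVED_NAME = re.compile(r"[A-Za-z0-9]+")


def is_future_safe_name(name: str) -> bool:
    """Return True if a custom function or option name contains at least one
    character outside a-z, A-Z and 0-9 (hypothetical helper; any namespace
    prefix is treated as part of the name)."""
    return bool(name) and _SPEC_RESERVED_NAME.fullmatch(name) is None


print(is_future_safe_name("hasCase"))   # False: only reserved-range characters
print(is_future_safe_name("has-case"))  # True: contains "-"
```

Under this check, a custom function named `hasCase`, like the one in the [Selector](#selector) examples, would not be future-safe. A name such as `has-case` would be.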
Later versions of this specification will not introduce changes
to the data model that would result in a data model representation
based on this version being invalid.

> For example, existing interfaces or fields will not be removed.

Later versions of this specification MAY introduce changes
to the data model that would result in future data model representations
not being valid for implementations of this version of the data model.

> For example, a future version could introduce a new keyword,
> whose data model representation would be a new interface
> that is not recognized by this version's data model.

Later specification versions will not introduce syntax that cannot be
represented by this version of the data model.

> For example, a future version could introduce a new keyword.
> The future version's data model would provide an interface for that keyword
> while this version of the data model would parse the value into
> the interface `UnsupportedStatement`.
> Both data models would be "valid" in their context,
> but this version's would be missing any functionality for the new statement type.

## Syntax

This section defines the formal grammar describing the syntax of a single message.

### Design Goals

_This section is non-normative._

The design goals of the syntax specification are as follows:

1. The syntax should leverage the familiarity with ICU MessageFormat 1.0
   in order to lower the barrier to entry and increase the chance of adoption.
   At the same time,
   the syntax should fix the [pain points of ICU MessageFormat 1.0](https://github.com/unicode-org/message-format-wg/blob/main/docs/why_mf_next.md).

   - _Non-Goal_: Be backwards-compatible with the ICU MessageFormat 1.0 syntax.

1. The syntax inside translatable content should be easy to understand for humans.
   This includes making it clear which parts of the message body _are_ translatable content,
   which parts inside it are placeholders for expressions,
   as well as making the selection logic predictable and easy to reason about.

   - _Non-Goal_: Make the syntax intuitive enough for non-technical translators to hand-edit.
     Instead, we assume that most translators will work with MessageFormat 2
     by means of GUI tooling, CAT workbenches etc.

1. The syntax surrounding translatable content should be easy for developers
   and localization engineers to write and edit, and easy for machines to parse.

1. The syntax should make a single message easily embeddable inside many container formats:
   `.properties`, YAML, XML, inlined as string literals in programming languages, etc.
   This includes a future _MessageResource_ specification.

   - _Non-Goal_: Support unnecessary escape sequences, which would themselves require
     additional escaping when embedded. Instead, we tolerate direct use of nearly all
     characters (including line breaks, control characters, etc.) and rely upon escaping
     in those outer formats to aid human comprehension (e.g., depending upon container
     format, a U+000A LINE FEED might be represented as `\n`, `\012`, `\x0A`, `\u000A`,
     `\U0000000A`, `&#xA;`, `&NewLine;`, `%0A`, `<LF>`, or something else entirely).

### Design Restrictions

_This section is non-normative._

The syntax specification takes into account the following design restrictions:

1. Whitespace outside the translatable content should be insignificant.
   It should be possible to define a message entirely on a single line with no ambiguity,
   as well as to format it over multiple lines for clarity.

1. The syntax should define as few special characters and sigils as possible.
   Note that this necessitates extra care when presenting messages for human consumption,
   because they may contain invisible characters such as U+200B ZERO WIDTH SPACE,
   control characters such as U+0000 NULL and U+0009 TAB, permanently reserved noncharacters
   (U+FDD0 through U+FDEF and U+<i>n</i>FFFE and U+<i>n</i>FFFF where <i>n</i> is 0x0 through 0x10),
   private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and
   U+100000 through U+10FFFD), unassigned code points, and other potentially confusing content.

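One way to take that extra care is to flag such code points before a message is shown to a translator or reviewer. The Python sketch below uses hypothetical helper names. It relies on the general categories that the standard `unicodedata` module reports, so which code points count as unassigned depends on the Unicode version bundled with the Python runtime. Noncharacters are also checked explicitly against the ranges listed above.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane n = 0x0..0x10.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def flag_hidden_chars(message: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") pairs for code points that deserve highlighting
    when a message is shown to a person: controls (Cc, including TAB and LF),
    format characters (Cf, e.g. U+200B), private-use (Co), unassigned (Cn),
    and noncharacters."""
    flagged = []
    for i, ch in enumerate(message):
        cp = ord(ch)
        if unicodedata.category(ch) in ("Cc", "Cf", "Co", "Cn") or is_noncharacter(cp):
            flagged.append((i, f"U+{cp:04X}"))
    return flagged


print(flag_hidden_chars("Hello\u200b {$name}"))  # [(5, 'U+200B')]
```

A display tool could use the returned positions to render visible markers in place of these code points.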
### Messages and their Syntax

The purpose of MessageFormat is to allow content to vary at runtime.
This variation might be due to placing a value into the content
or it might be due to selecting a different bit of content based on some data value
or it might be due to a combination of the two.

MessageFormat calls the template for a given formatting operation a _message_.

The values passed in at runtime (which are to be placed into the content or used
to select between different content items) are called _external variables_.
The author of a _message_ can also assign _local variables_, including
variables that modify _external variables_.

This part of the MessageFormat specification defines the syntax for a _message_,
along with the concepts and terminology needed when processing a _message_
during the [formatting](#formatting) of a _message_ at runtime.

The complete formal syntax of a _message_ is described by the [ABNF](#complete-abnf).

#### Well-formed vs. Valid Messages

A _message_ is **_<dfn>well-formed</dfn>_** if it satisfies all the rules of the grammar.
Attempting to parse a _message_ that is not _well-formed_ will result in a _Syntax Error_.

A _message_ is **_<dfn>valid</dfn>_** if it is _well-formed_ and
**also** meets the additional content restrictions
and semantic requirements about its structure defined below for
_declarations_, _matcher_ and _options_.
Attempting to parse a _message_ that is not _valid_ will result in a _Data Model Error_.

### The Message

A **_<dfn>message</dfn>_** is the complete template for a specific message formatting request.

> [!NOTE]
> This syntax is designed to be embeddable into many different programming languages and formats.
> As such, it avoids constructs, such as character escapes, that are specific to any given file
> format or processor.
> In particular, it avoids using quote characters common to many file formats and formal languages
> so that these do not need to be escaped in the body of a _message_.

> [!NOTE]
> In general (and except where required by the syntax), whitespace carries no meaning in the structure
> of a _message_. While many of the examples in this spec are written on multiple lines, the formatting
> shown is primarily for readability.
>
> > **Example** This _message_:
> >
> > ```
> > .local $foo   =   { |horse| }
> > {{You have a {$foo}!}}
> > ```
> >
> > Can also be written as:
> >
> > ```
> > .local $foo={|horse|}{{You have a {$foo}!}}
> > ```
> >
> > The exception is whitespace inside a _pattern_, which is **always** significant.

> [!NOTE]
> The syntax assumes that each _message_ will be displayed with a left-to-right display order
> and be processed in the logical character order.
> The syntax also permits the use of right-to-left characters in _identifiers_,
> _literals_, and other values.
> This can result in confusion when viewing the _message_.
>
> Additional restrictions or requirements,
> such as permitting the use of certain bidirectional control characters in the syntax,
> might be added during the Tech Preview to better manage bidirectional text.
> Feedback on the creation and management of _messages_
> containing bidirectional tokens is strongly desired.

A _message_ can be a _simple message_ or it can be a _complex message_.

```abnf
message = simple-message / complex-message
```

A **_<dfn>simple message</dfn>_** contains a single _pattern_,
with restrictions on its first character.
An empty string is a valid _simple message_.

```abnf
simple-message = [simple-start pattern]
simple-start   = simple-start-char / text-escape / placeholder
```

A **_<dfn>complex message</dfn>_** is any _message_ that contains _declarations_,
a _matcher_, or both.
A _complex message_ always begins with either a keyword that has a `.` prefix or a _quoted pattern_
and consists of:

1. an optional list of _declarations_, followed by
2. a _complex body_

```abnf
complex-message = *(declaration [s]) complex-body
```

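A parser can tell which branch of the `message` rule applies from the first characters alone. `simple-start-char` excludes `.`, and no _placeholder_ can begin with `{{`. Below is a minimal Python sketch of that check. The function name is hypothetical, and the check does not verify that the rest of the _message_ is _well-formed_.

```python
def is_complex_message(source: str) -> bool:
    """Decide which branch of the `message` rule applies (hypothetical helper).

    A complex message starts with a `.`-prefixed keyword or with the `{{` of a
    quoted pattern. A simple message can start with neither: `simple-start-char`
    excludes `.`, and a placeholder opens with a single `{` that is never
    immediately followed by another `{`. Well-formedness is not checked here.
    """
    return source.startswith((".", "{{"))


print(is_complex_message(".input {$n :number} {{{$n} items}}"))  # True
print(is_complex_message("{$n} items"))                           # False
print(is_complex_message(""))                                     # False: empty simple message
```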
#### Declarations

A **_<dfn>declaration</dfn>_** binds a _variable_ identifier to a value within the scope of a _message_.
This _variable_ can then be used in other _expressions_ within the same _message_.
_Declarations_ are optional: many messages will not contain any _declarations_.

An **_<dfn>input-declaration</dfn>_** binds a _variable_ to an external input value.
The _variable-expression_ of an _input-declaration_
MAY include an _annotation_ that is applied to the external value.

A **_<dfn>local-declaration</dfn>_** binds a _variable_ to the resolved value of an _expression_.

For compatibility with later MessageFormat 2 specification versions,
_declarations_ MAY also include _reserved statements_.

```abnf
declaration       = input-declaration / local-declaration / reserved-statement
input-declaration = input [s] variable-expression
local-declaration = local s variable [s] "=" [s] expression
```

_Variables_, once declared, MUST NOT be redeclared.
A _message_ that does any of the following is not _valid_ and will produce a
_Duplicate Declaration_ error during processing:
- A _declaration_ MUST NOT bind a _variable_
  that appears as a _variable_ anywhere within a previous _declaration_.
- An _input-declaration_ MUST NOT bind a _variable_
  that appears anywhere within the _annotation_ of its _variable-expression_.
- A _local-declaration_ MUST NOT bind a _variable_ that appears in its _expression_.

A _local-declaration_ MAY overwrite an external input value as long as the
external input value does not appear in a previous _declaration_.

> [!NOTE]
> These restrictions apply only to _declarations_.
> A _placeholder_ or _selector_ can apply a different annotation to a _variable_
> than the one applied to the same _variable_ named in a _declaration_.
> For example, this message is _valid_:
> ```
> .input {$var :number maximumFractionDigits=0}
> .match {$var :number maximumFractionDigits=2}
> 0 {{The selector can apply a different annotation to {$var} for the purposes of selection}}
> * {{A placeholder in a pattern can apply a different annotation to {$var :number maximumFractionDigits=3}}}
> ```
> (See the [Errors](#errors) section for examples of invalid messages.)

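The three _Duplicate Declaration_ rules can be applied in a single pass over the _declarations_ by remembering every _variable_ seen so far. The Python sketch below reduces each _declaration_ to a hypothetical `Declaration` record. The record holds the bound _variable_ and the other _variables_ the declaration mentions: those in the _annotation_ of an _input-declaration_, or those in the _expression_ of a _local-declaration_. The sketch assumes the _message_ has already been parsed into such records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Declaration:
    """Hypothetical, pre-parsed view of one declaration."""
    kind: str                      # "input" or "local"; kept for readability only
    name: str                      # the variable being bound, e.g. "$count"
    refs: frozenset = frozenset()  # other variables it mentions: in the annotation
                                   # (.input) or in the expression (.local)


def find_duplicate_declaration(decls: list[Declaration]) -> Optional[str]:
    """Return the first variable that violates the Duplicate Declaration rules,
    or None if there is no violation."""
    seen: set[str] = set()         # every variable named in an earlier declaration
    for decl in decls:
        # Rule 1: no rebinding of a variable seen in an earlier declaration.
        # Rules 2 and 3: no binding of a variable the declaration itself uses.
        if decl.name in seen or decl.name in decl.refs:
            return decl.name
        seen.add(decl.name)
        seen.update(decl.refs)
    return None


# .input {$x :number}  .local $x = {$x :integer}
print(find_duplicate_declaration([
    Declaration("input", "$x"),
    Declaration("local", "$x", frozenset({"$x"})),
]))  # $x
```

Referencing the same _variable_ from several _declarations_ is not a redeclaration. For example, `.local $a = {$x}` followed by `.local $b = {$x}` passes this check.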
##### Reserved Statements

A **_<dfn>reserved statement</dfn>_** reserves additional `.keywords`
for use by future versions of this specification.
Any such future keyword must start with `.`,
followed by two or more lower-case ASCII characters.

The rest of the statement supports
a similarly wide range of content as _reserved annotations_,
but it MUST end with one or more _expressions_.

```abnf
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
reserved-keyword   = "." name
```

> [!NOTE]
> The `reserved-keyword` ABNF rule is a simplification,
> as it MUST NOT be considered to match any of the existing keywords
> `.input`, `.local`, or `.match`.

This allows flexibility in future standardization,
as future definitions MAY define additional semantics and constraints
on the contents of these _reserved statements_.

Implementations MUST NOT assign meaning or semantics to a _reserved statement_:
these are reserved for future standardization.
Implementations MUST NOT remove or alter the contents of a _reserved statement_.

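Following the note above, a parser has to rule out the three existing keywords before treating a statement as a _reserved statement_, and it then has to keep that statement intact. The Python sketch below is only an approximation. Its regular expression stands in for `"." name`, and `.when` is a made-up keyword used for illustration. The `UnsupportedStatement` record borrows its interface name from the [Stability Policy](#stability-policy), but its field names are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Union

KNOWN_KEYWORDS = {".input", ".local", ".match"}
# Rough stand-in for `"." name`: a dot, then everything up to whitespace,
# "{", "}" or "|" (the real `name` rule is stricter).
_KEYWORD = re.compile(r"\.[^\s{}|]+")


@dataclass(frozen=True)
class UnsupportedStatement:
    keyword: str  # e.g. ".when"
    source: str   # the complete statement, kept exactly as written


def classify_statement(source: str) -> Union[str, UnsupportedStatement]:
    """Return "input", "local" or "match" for a known keyword; otherwise wrap
    the statement, unchanged and uninterpreted, as an UnsupportedStatement."""
    m = _KEYWORD.match(source)
    if m is None:
        raise ValueError("a statement must start with a .keyword")
    keyword = m.group(0)
    if keyword in KNOWN_KEYWORDS:
        return keyword[1:]
    return UnsupportedStatement(keyword, source)


print(classify_statement(".local $a = {1}"))  # local
print(classify_statement(".when {$x} {$y}"))
# UnsupportedStatement(keyword='.when', source='.when {$x} {$y}')
```

Keeping the full source text lets an implementation pass a _reserved statement_ through untouched, even though it cannot interpret it.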
#### Complex Body

The **_<dfn>complex body</dfn>_** of a _complex message_ is the part that will be formatted.
The _complex body_ consists of either a _quoted pattern_ or a _matcher_.

```abnf
complex-body = quoted-pattern / matcher
```

### Pattern

A **_<dfn>pattern</dfn>_** contains a sequence of _text_ and _placeholders_ to be formatted as a unit.
Unless there is an error, resolving a _message_ always results in the formatting
of a single _pattern_.

```abnf
pattern = *(text-char / text-escape / placeholder)
```
A _pattern_ MAY be empty.

A _pattern_ MAY contain an arbitrary number of _placeholders_ to be evaluated
during the formatting process.

#### Quoted Pattern

A **_<dfn>quoted pattern</dfn>_** is a _pattern_ that is "quoted" to prevent
interference with other parts of the _message_.
A _quoted pattern_ starts with a sequence of two U+007B LEFT CURLY BRACKET `{{`
and ends with a sequence of two U+007D RIGHT CURLY BRACKET `}}`.

```abnf
quoted-pattern = "{{" pattern "}}"
```

A _quoted pattern_ MAY be empty.

> An empty _quoted pattern_:
>
> ```
> {{}}
> ```

#### Text

**_<dfn>text</dfn>_** is the translatable content of a _pattern_.
Any Unicode code point is allowed, except for U+0000 NULL
and the surrogate code points U+D800 through U+DFFF inclusive.
The characters U+005C REVERSE SOLIDUS `\`,
U+007B LEFT CURLY BRACKET `{`, and U+007D RIGHT CURLY BRACKET `}`
MUST be escaped as `\\`, `\{`, and `\}` respectively.

In the ABNF, _text_ is represented by non-empty sequences of
`simple-start-char`, `text-char`, and `text-escape`.
The first of these is used at the start of a _simple message_,
and matches `text-char` except for not allowing U+002E FULL STOP `.`.
The ABNF uses `content-char` as a shared base for _text_ and _quoted literal_ characters.

Whitespace in _text_, including tabs, spaces, and newlines, is significant and MUST
be preserved during formatting.

```abnf
simple-start-char = content-char / s / "@" / "|"
text-char         = content-char / s / "." / "@" / "|"
quoted-char       = content-char / s / "." / "@" / "{" / "}"
reserved-char     = content-char / "."
content-char      = %x01-08        ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
                  / %x0B-0C        ; omit CR (%x0D)
                  / %x0E-1F        ; omit SP (%x20)
                  / %x21-2D        ; omit . (%x2E)
                  / %x2F-3F        ; omit @ (%x40)
                  / %x41-5B        ; omit \ (%x5C)
                  / %x5D-7A        ; omit { | } (%x7B-7D)
                  / %x7E-2FFF      ; omit IDEOGRAPHIC SPACE (%x3000)
                  / %x3001-D7FF    ; omit surrogates
                  / %xE000-10FFFF
```

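The escaping rules for _text_ are small enough to sketch directly. The Python helpers below are hypothetical:

- `escape_text` prepares an arbitrary string for use as _text_.
- `unescape_text` reverses the three `text-escape` sequences in a single run of _text_ (one without _placeholders_). It rejects any other backslash sequence but does not check for unescaped braces.

```python
def escape_text(s: str) -> str:
    """Escape backslash, "{" and "}" so the string can be used as pattern text."""
    # Escape backslashes first so the backslashes added below are not doubled.
    return s.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def unescape_text(s: str) -> str:
    """Reverse the three text-escape sequences in one run of text.
    Any other backslash sequence is rejected, as it would be a syntax error."""
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            nxt = s[i + 1] if i + 1 < len(s) else ""
            if nxt not in ("\\", "{", "}"):
                raise ValueError(f"invalid escape sequence at index {i}")
            out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(escape_text("Use {braces} and \\ here"))  # Use \{braces\} and \\ here
```

Escaping the backslash before the braces matters. If the order were reversed, the backslashes introduced by `\{` and `\}` would themselves be escaped a second time.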
When a _pattern_ is quoted by embedding the _pattern_ in curly brackets, the
resulting _message_ can be embedded into
various formats regardless of the container's whitespace trimming rules.
Otherwise, care must be taken to ensure that pattern-significant whitespace is preserved.

> **Example**
> In a Java `.properties` file, the values `hello` and `hello2` both contain
> an identical _message_ which consists of a single _pattern_.
> This _pattern_ consists of _text_ with exactly three spaces before and after the word "Hello":
>
> ```properties
> hello = {{   Hello   }}
> hello2=\   Hello  \
> ```

#### Placeholder

A **_<dfn>placeholder</dfn>_** is an _expression_ or _markup_ that appears inside of a _pattern_
and which will be replaced during the formatting of a _message_.

```abnf
placeholder = expression / markup
```

### Matcher

A **_<dfn>matcher</dfn>_** is the _complex body_ of a _message_ that allows runtime selection
of the _pattern_ to use for formatting.
This allows the form or content of a _message_ to vary based on values
determined at runtime.

A _matcher_ consists of the keyword `.match` followed by at least one _selector_
and at least one _variant_.

When the _matcher_ is processed, the result will be a single _pattern_ that serves
as the template for the formatting process.

A _message_ can only be considered _valid_ if the following requirements are
satisfied:

- The number of _keys_ on each _variant_ MUST be equal to the number of _selectors_.
- At least one _variant_ MUST exist whose _keys_ are all equal to the "catch-all" key `*`.
- Each _selector_ MUST have an _annotation_,
  or contain a _variable_ that directly or indirectly references a _declaration_ with an _annotation_.

```abnf
matcher         = match-statement 1*([s] variant)
match-statement = match 1*([s] selector)
```

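The first two requirements above depend only on the _keys_ of each _variant_, so they can be checked before any formatting happens. In this Python sketch (the function name is hypothetical), each _variant_ is given as a tuple of key strings, with the _catch-all key_ written as `"*"`. This is a simplification: a real data model must keep the catch-all distinct from a quoted literal `|*|`. The error names come from [Data Model Errors](#data-model-errors). The third requirement, about _annotations_, is not checked.

```python
CATCH_ALL = "*"  # simplification: a real data model keeps |*| distinct from the catch-all


def check_matcher(num_selectors: int, variant_keys: list[tuple[str, ...]]) -> list[str]:
    """Return the names of the data model errors found among the variant keys."""
    errors = []
    # Every variant needs exactly one key per selector.
    if any(len(keys) != num_selectors for keys in variant_keys):
        errors.append("Variant Key Mismatch")
    # At least one correctly sized variant must use only catch-all keys.
    if not any(
        len(keys) == num_selectors and all(k == CATCH_ALL for k in keys)
        for keys in variant_keys
    ):
        errors.append("Missing Fallback Variant")
    return errors


# .match {$count} one {{...}} * {{...}}
print(check_matcher(1, [("one",), ("*",)]))      # []
print(check_matcher(1, [("one",), ("other",)]))  # ['Missing Fallback Variant']
```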
661> A _message_ with a _matcher_:
662>
663> ```
664> .input {$count :number}
> .match {$count}
> one {{You have {$count} notification.}}
> *   {{You have {$count} notifications.}}
> ```

> A _message_ containing a _matcher_ formatted on a single line:
>
> ```
> .match {:platform} windows {{Settings}} * {{Preferences}}
> ```

#### Selector

A **_<dfn>selector</dfn>_** is an _expression_ that ranks or excludes the
_variants_ based on the value of the corresponding _key_ in each _variant_.
The combination of _selectors_ in a _matcher_ thus determines
which _pattern_ will be used during formatting.

```abnf
selector = expression
```

There MUST be at least one _selector_ in a _matcher_.
There MAY be any number of additional _selectors_.

> A _message_ with a single _selector_ that uses a custom _function_
> `:hasCase`, a _selector_ that lets the _message_ choose a _pattern_
> based on grammatical case:
>
> ```
> .match {$userName :hasCase}
> vocative {{Hello, {$userName :person case=vocative}!}}
> accusative {{Please welcome {$userName :person case=accusative}!}}
> * {{Hello!}}
> ```

> A _message_ with two _selectors_:
>
> ```
> .input {$numLikes :integer}
> .input {$numShares :integer}
> .match {$numLikes} {$numShares}
> 0   0   {{Your item has no likes and has not been shared.}}
> 0   one {{Your item has no likes and has been shared {$numShares} time.}}
> 0   *   {{Your item has no likes and has been shared {$numShares} times.}}
> one 0   {{Your item has {$numLikes} like and has not been shared.}}
> one one {{Your item has {$numLikes} like and has been shared {$numShares} time.}}
> one *   {{Your item has {$numLikes} like and has been shared {$numShares} times.}}
> *   0   {{Your item has {$numLikes} likes and has not been shared.}}
> *   one {{Your item has {$numLikes} likes and has been shared {$numShares} time.}}
> *   *   {{Your item has {$numLikes} likes and has been shared {$numShares} times.}}
> ```
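
The Python sketch below models how the two _selectors_ in the example above could rank and exclude _variants_. It is a simplified illustration, not this specification's pattern selection algorithm. It assumes each _selector_ has already been resolved to an ordered list of the _keys_ it matches (most preferred first), and it uses a hypothetical `plural_keys` helper with a deliberately simplified English plural rule: the exact value first, then `one` for 1 and `other` for anything else. A _selector_ that failed is modeled as an empty list, so only `*` can match it.

```python
from typing import Optional, Sequence

CATCH_ALL = "*"

def plural_keys(n: int) -> list[str]:
    """Hypothetical selector: exact value first, then a simplified
    English plural category ("one" for 1, "other" otherwise)."""
    return [str(n), "one" if n == 1 else "other"]

def key_rank(key: str, preferences: Sequence[str]) -> Optional[int]:
    """Rank one key against one selector (lower is better).

    Returns None when the key excludes the variant. The catch-all key
    matches every value but ranks after any key the selector matched."""
    if key == CATCH_ALL:
        return len(preferences)
    if key in preferences:
        return preferences.index(key)
    return None

def select_pattern(selector_prefs: Sequence[Sequence[str]],
                   variants: Sequence[tuple[tuple[str, ...], str]]) -> str:
    """Pick the pattern of the best-ranked variant.

    Earlier selectors take priority over later ones when ranks differ.
    A well-formed matcher always has an all-"*" fallback variant, so at
    least one candidate remains."""
    candidates = []
    for keys, pattern in variants:
        ranks = [key_rank(k, p) for k, p in zip(keys, selector_prefs)]
        if all(r is not None for r in ranks):
            candidates.append((ranks, pattern))
    # Lists compare lexicographically; sort() is stable, so ties keep
    # the order in which the variants were written.
    candidates.sort(key=lambda c: c[0])
    return candidates[0][1]

# The variants of the two-selector example, with shortened patterns.
VARIANTS = [
    (("0", "0"), "no likes, not shared"),
    (("0", "one"), "no likes, shared once"),
    (("0", "*"), "no likes, shared"),
    (("one", "0"), "one like, not shared"),
    (("one", "one"), "one like, shared once"),
    (("one", "*"), "one like, shared"),
    (("*", "0"), "likes, not shared"),
    (("*", "one"), "likes, shared once"),
    (("*", "*"), "likes, shared"),
]

print(select_pattern([plural_keys(1), plural_keys(5)], VARIANTS))  # one like, shared
```

Ranking the catch-all last is what lets a specific _key_ such as `one` win over `*`, while `*` still guarantees that some _variant_ matches.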

#### Variant

A **_<dfn>variant</dfn>_** is a _quoted pattern_ associated with a set of _keys_ in a _matcher_.
Each _variant_ MUST begin with a sequence of _keys_,
and terminate with a valid _quoted pattern_.
The number of _keys_ in each _variant_ MUST match the number of _selectors_ in the _matcher_.

_Keys_ are separated from each other by whitespace.
Whitespace is permitted but not required between the last _key_ and the _quoted pattern_.

```abnf
variant = key *(s key) [s] quoted-pattern
key     = literal / "*"
```
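
The Python sketch below shows one way to read this production: it splits the source text of a single _variant_ into its _keys_ and its _quoted pattern_, and rejects adjacent _keys_ that are not separated by whitespace. It is an illustration under simplifying assumptions: `split_variant` is a hypothetical helper, unquoted _keys_ are approximated as runs of characters other than whitespace, `{`, `}`, and `|`, and the _quoted pattern_ itself is not parsed.

```python
import re

# A key: a quoted literal (only \\ and \| escapes), the catch-all "*",
# or a loose approximation of an unquoted literal (no whitespace or
# syntax characters).
KEY = re.compile(r"\|(?:[^\\|]|\\[\\|])*\||\*|[^\s{}|]+")
# The `s` production: one or more of SP, HTAB, CR, LF, U+3000.
S = re.compile(r"[ \t\r\n\u3000]+")

def split_variant(source: str) -> tuple[list[str], str]:
    """Split one variant into its keys and its quoted pattern."""
    keys, pos = [], 0
    while True:
        m = KEY.match(source, pos)
        if m is None:
            raise SyntaxError(f"expected a key at offset {pos}")
        keys.append(m.group())
        pos = m.end()
        ws = S.match(source, pos)
        if ws:
            pos = ws.end()
        if source.startswith("{{", pos):
            break  # whitespace before the quoted pattern is optional
        if ws is None:
            raise SyntaxError(f"keys must be separated by whitespace (offset {pos})")
    pattern = source[pos:]
    if not pattern.endswith("}}"):
        raise SyntaxError("unterminated quoted pattern")
    return keys, pattern

print(split_variant("one *   {{You have {$count} notifications.}}"))
# (['one', '*'], '{{You have {$count} notifications.}}')
```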

##### Key

A **_<dfn>key</dfn>_** is a value in a _variant_ for use by a _selector_ when ranking
or excluding _variants_ during the _matcher_ process.
A _key_ can be either a _literal_ value or the "catch-all" key `*`.

The **_<dfn>catch-all key</dfn>_** is a special key, represented by `*`,
that matches all values for a given _selector_.

### Expressions

An **_<dfn>expression</dfn>_** is a part of a _message_ that will be determined
during the _message_'s formatting.

An _expression_ MUST begin with U+007B LEFT CURLY BRACKET `{`
and end with U+007D RIGHT CURLY BRACKET `}`.
An _expression_ MUST NOT be empty.
An _expression_ cannot contain another _expression_.
An _expression_ MAY contain one or more _attributes_.

A **_<dfn>literal-expression</dfn>_** contains a _literal_,
optionally followed by an _annotation_.

A **_<dfn>variable-expression</dfn>_** contains a _variable_,
optionally followed by an _annotation_.

An **_<dfn>annotation-expression</dfn>_** contains an _annotation_ without an _operand_.

```abnf
expression            = literal-expression
                      / variable-expression
                      / annotation-expression
literal-expression    = "{" [s] literal [s annotation] *(s attribute) [s] "}"
variable-expression   = "{" [s] variable [s annotation] *(s attribute) [s] "}"
annotation-expression = "{" [s] annotation *(s attribute) [s] "}"
```
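
Since the three forms differ only in what follows the opening `{` and any leading whitespace, a parser can choose between them by inspecting one character. The hypothetical `expression_kind` helper below is a Python sketch of that decision; it assumes the _expression_ is otherwise well-formed and does not validate the _literal_, _variable_, _annotation_, or _attributes_ that follow.

```python
# Characters matched by the `s` production.
S_CHARS = " \t\r\n\u3000"
# First characters of an annotation: ":" (function), "^" and "&"
# (private-use), and the reserved-annotation-start characters.
ANNOTATION_START = frozenset(":^&!%*+<>?~")

def expression_kind(expr: str) -> str:
    """Classify an expression by the first character after "{"."""
    if not (expr.startswith("{") and expr.endswith("}")):
        raise ValueError("an expression is delimited by { and }")
    body = expr[1:-1].strip(S_CHARS)
    if not body:
        raise ValueError("an expression must not be empty")
    first = body[0]
    if first == "$":
        return "variable-expression"
    if first in ANNOTATION_START:
        return "annotation-expression"
    if first in "#/":
        raise ValueError("this is markup, not an expression")
    # Otherwise the body must begin with a quoted or unquoted literal.
    return "literal-expression"

print(expression_kind("{$variable :function with=options}"))  # variable-expression
```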

There are several types of _expression_ that can appear in a _message_.
All _expressions_ share a common syntax. The types of _expression_ are:

1. The value of a _local-declaration_
2. A _selector_
3. A kind of _placeholder_ in a _pattern_

Additionally, an _input-declaration_ can contain a _variable-expression_.

> Examples of different types of _expression_:
>
> Declarations:
>
> ```
> .input {$x :function option=value}
> .local $y = {|This is an expression|}
> ```
>
> Selectors:
>
> ```
> .match {$selector :functionRequired}
> ```
>
> Placeholders:
>
> ```
> This placeholder contains a literal expression: {|literal|}
> This placeholder contains a variable expression: {$variable}
> This placeholder references a function on a variable: {$variable :function with=options}
> This placeholder contains a function expression with a variable-valued option: {:function option=$variable}
> ```

#### Annotation

An **_<dfn>annotation</dfn>_** is part of an _expression_ containing
a _function_ together with its associated _options_,
a _private-use annotation_, or a _reserved annotation_.

```abnf
annotation = function
           / private-use-annotation
           / reserved-annotation
```

An **_<dfn>operand</dfn>_** is the _literal_ of a _literal-expression_ or
the _variable_ of a _variable-expression_.

An _annotation_ can appear in an _expression_ by itself or following a single _operand_.
When following an _operand_, the _operand_ serves as input to the _annotation_.

##### Function

A **_<dfn>function</dfn>_** is named functionality in an _annotation_.
_Functions_ are used to evaluate, format, select, or otherwise process data
values during formatting.

Each _function_ is defined by the runtime's _function registry_.
A _function_'s entry in the _function registry_ will define
whether the _function_ is a _selector_ or formatter (or both),
whether an _operand_ is required,
what form the values of an _operand_ can take,
what _options_ and _option_ values are valid,
and what outputs might result.
See [function registry](#function-registry) for more information.

A _function_ starts with a prefix sigil `:` followed by an _identifier_.
The _identifier_ MAY be followed by one or more _options_.
_Options_ are not required.

```abnf
function = ":" identifier *(s option)
```

> A _message_ with a _function_ operating on the _variable_ `$now`:
>
> ```
> It is now {$now :datetime}.
> ```

###### Options

An **_<dfn>option</dfn>_** is a key-value pair
containing a named argument that is passed to a _function_.

An _option_ has an _identifier_ and a _value_.
The _identifier_ is separated from the _value_ by a U+003D EQUALS SIGN `=` along with
optional whitespace.
The value of an _option_ can be either a _literal_ or a _variable_.

Multiple _options_ are permitted in an _annotation_.
_Options_ are separated from the preceding _function_ _identifier_
and from each other by whitespace.
Each _option_'s _identifier_ MUST be unique within the _annotation_:
an _annotation_ with duplicate _option_ _identifiers_ is not valid.

The order of _options_ is not significant.

```abnf
option = identifier [s] "=" [s] (literal / variable)
```
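
Because _option_ order is not significant but _identifiers_ must be unique, a map keyed by _identifier_ is a natural in-memory representation. The Python sketch below collects the _options_ that follow a _function_ _identifier_ and rejects duplicates. It is hypothetical and simplified: `parse_options` is an illustrative name, values are kept as unparsed source text, and _identifiers_ and unquoted values are matched with loose character classes rather than the full `identifier` and `literal` productions.

```python
import re

# identifier [s] "=" [s] (literal / variable). The identifier and unquoted
# value classes are loose approximations of the real productions.
OPTION = re.compile(
    r"([^\s=|{}$]+)[ \t\r\n\u3000]*=[ \t\r\n\u3000]*"
    r"(\|(?:[^\\|]|\\[\\|])*\||\$?[^\s=|{}]+)"
)
S = re.compile(r"[ \t\r\n\u3000]*")

def parse_options(text: str) -> dict[str, str]:
    """Collect options into a dict, rejecting duplicate identifiers."""
    options: dict[str, str] = {}
    pos = S.match(text).end()
    while pos < len(text):
        m = OPTION.match(text, pos)
        if m is None:
            raise SyntaxError(f"malformed option at offset {pos}")
        name, value = m.groups()
        if name in options:
            # Corresponds to a Duplicate Option Name data model error.
            raise ValueError(f"duplicate option name: {name}")
        options[name] = value
        ws = S.match(text, m.end())
        if ws.end() < len(text) and ws.end() == m.end():
            raise SyntaxError(f"options must be separated by whitespace (offset {m.end()})")
        pos = ws.end()
    return options

print(parse_options("weekday=long minimumFractionDigits=$digits"))
# {'weekday': 'long', 'minimumFractionDigits': '$digits'}
```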

> Examples of _functions_ with _options_:
>
> A _message_ using the `:datetime` function.
> The _option_ `weekday` has the literal `long` as its value:
>
> ```
> Today is {$date :datetime weekday=long}!
> ```

> A _message_ using the `:datetime` function.
> The _option_ `weekday` has a variable `$dateStyle` as its value:
>
> ```
> Today is {$date :datetime weekday=$dateStyle}!
> ```

##### Private-Use Annotations

A **_<dfn>private-use annotation</dfn>_** is an _annotation_ whose syntax is reserved
for use by a specific implementation or by private agreement between multiple implementations.
Implementations MAY define their own meaning and semantics for _private-use annotations_.

A _private-use annotation_ starts with either U+0026 AMPERSAND `&` or U+005E CIRCUMFLEX ACCENT `^`.

Characters, including whitespace, are assigned meaning by the implementation.
The definition of escapes in the `reserved-body` production, used for the body of
a _private-use annotation_, is an affordance to implementations that
wish to use a syntax exactly like other functions. Specifically:

- The characters `\`, `{`, and `}` MUST be escaped as `\\`, `\{`, and `\}` respectively
  when they appear in the body of a _private-use annotation_.
- The character `|` is special: it SHOULD be escaped as `\|` in a _private-use annotation_,
  but can appear unescaped as long as it is paired with another `|`.
  This is an affordance to allow _literals_ to appear in the private-use syntax.

A _private-use annotation_ MAY be empty after its introducing sigil.

```abnf
private-use-annotation = private-start [[s] reserved-body]
private-start          = "^" / "&"
```

> [!NOTE]
> Users are cautioned that _private-use annotations_ cannot be reliably exchanged
> and can result in errors during formatting.
> It is generally a better idea to use the function registry
> to define additional formatting or annotation options.

> Here are some examples of what _private-use_ sequences might look like:
>
> ```
> Here's private use with an operand: {$foo &bar}
> Here's a placeholder that is entirely private-use: {&anything here}
> Here's a private-use function that uses normal function syntax: {$operand ^foo option=|literal|}
> The character | has to be paired or escaped: {&private || |something between| or isolated: \| }
> Stop {& "translate 'stop' as a verb" might be a translator instruction or comment }
> Protect stuff in {^ph}<a>{^/ph}private use{^ph}</a>{^/ph}
> ```

##### Reserved Annotations

A **_<dfn>reserved annotation</dfn>_** is an _annotation_ whose syntax is reserved
for future standardization.

A _reserved annotation_ starts with one of the `reserved-annotation-start` characters.
The remaining part of a _reserved annotation_, called a _reserved body_,
MAY be empty or contain arbitrary text that starts and ends with
a non-whitespace character.

This allows maximum flexibility in future standardization,
as future definitions MAY define additional semantics and constraints
on the contents of these _annotations_.

Implementations MUST NOT assign meaning or semantics to
an _annotation_ starting with `reserved-annotation-start`:
these are reserved for future standardization.
Whitespace before or after a _reserved body_ is not part of the _reserved body_.
Implementations MUST NOT remove or alter the contents of a _reserved body_,
including any interior whitespace,
but MAY remove or alter whitespace before or after the _reserved body_.

While a reserved sequence is technically "well-formed",
unrecognized _reserved annotations_ or _private-use annotations_ have no meaning.

```abnf
reserved-annotation       = reserved-annotation-start [[s] reserved-body]
reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~"

reserved-body             = reserved-body-part *([s] reserved-body-part)
reserved-body-part        = reserved-char / reserved-escape / quoted
```
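
An _annotation_'s kind is fixed by its first character, which makes the reserved sigils cheap to recognize and pass through unchanged. The Python sketch below is illustrative only: `annotation_kind` and `reserved_body` are hypothetical helpers, and the input is assumed to be the _annotation_ text with the surrounding _expression_ braces and any _operand_ already removed.

```python
# Characters matched by the `s` production.
S_CHARS = " \t\r\n\u3000"
PRIVATE_USE_START = frozenset("^&")
RESERVED_START = frozenset("!%*+<>?~")

def annotation_kind(annotation: str) -> str:
    """Classify an annotation by its leading sigil."""
    sigil = annotation[:1]
    if sigil == ":":
        return "function"
    if sigil in PRIVATE_USE_START:
        return "private-use"
    if sigil in RESERVED_START:
        return "reserved"
    raise ValueError(f"not an annotation: {annotation!r}")

def reserved_body(annotation: str) -> str:
    """Return the body of a private-use or reserved annotation.

    Whitespace around the body may be dropped; the body itself,
    including interior whitespace, is returned unchanged."""
    if annotation_kind(annotation) == "function":
        raise ValueError("a function annotation has no reserved body")
    return annotation[1:].strip(S_CHARS)
```

A processor that does not recognize a _reserved annotation_ might keep the body returned here verbatim, consistent with the requirement not to alter a _reserved body_.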

### Markup

**_<dfn>Markup</dfn>_** _placeholders_ are _pattern_ parts
that can be used to represent non-language parts of a _message_,
such as inline elements or styling that should apply to a span of parts.

_Markup_ MUST begin with U+007B LEFT CURLY BRACKET `{`
and end with U+007D RIGHT CURLY BRACKET `}`.
_Markup_ MAY contain one or more _attributes_.

_Markup_ comes in three forms:

**_<dfn>Markup-open</dfn>_** starts with U+0023 NUMBER SIGN `#` and
represents an opening element within the _message_,
such as markup used to start a span.
It MAY include _options_.

**_<dfn>Markup-standalone</dfn>_** starts with U+0023 NUMBER SIGN `#`
and has a U+002F SOLIDUS `/` immediately before its closing `}`,
representing a self-closing or standalone element within the _message_.
It MAY include _options_.

**_<dfn>Markup-close</dfn>_** starts with U+002F SOLIDUS `/` and
is a _pattern_ part ending a span.

```abnf
markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}"  ; open and standalone
       / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}"  ; close
```
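
Telling the three forms apart needs only the first sigil and the character before the closing `}`. The Python sketch below is a hypothetical helper that assumes its input is a single well-formed _markup_ _placeholder_; it does not parse the _identifier_, _options_, or _attributes_.

```python
# Characters matched by the `s` production.
S_CHARS = " \t\r\n\u3000"

def markup_form(markup: str) -> str:
    """Return "open", "standalone", or "close" for a markup placeholder."""
    if not (markup.startswith("{") and markup.endswith("}")):
        raise ValueError("markup is delimited by { and }")
    inner = markup[1:-1]
    sigil = inner.lstrip(S_CHARS)[:1]
    if sigil == "#":
        # The "/" of markup-standalone must come immediately before "}".
        return "standalone" if inner.endswith("/") else "open"
    if sigil == "/":
        return "close"
    raise ValueError("markup starts with '#' or '/' after '{'")
```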

> A _message_ with one `button` markup span and a standalone `img` markup element:
>
> ```
> {#button}Submit{/button} or {#img alt=|Cancel| /}.
> ```

> A _message_ with _options_ on its _markup-close_ placeholders:
>
> ```
> {#ansi attr=|bold,italic|}Bold and italic{/ansi attr=|bold|} italic only {/ansi attr=|italic|} no formatting.
> ```

A _markup-open_ can appear without a corresponding _markup-close_.
A _markup-close_ can appear without a corresponding _markup-open_.
_Markup_ _placeholders_ can appear in any order without making the _message_ invalid.
However, specifications or implementations defining _markup_ might impose requirements
on the pairing, ordering, or contents of _markup_ during _formatting_.

### Attributes

**_Attributes_ are reserved for standardization by future versions of this specification.**
Examples in this section are meant to be illustrative and
might not match future requirements or usage.

> [!NOTE]
> The Tech Preview does not provide a built-in mechanism for overriding
> values in the _formatting context_ (most notably the locale).
> Nor does it provide a mechanism for identifying specific expressions
> such as by assigning a name or id.
> The utility of these types of mechanisms has been debated.
> There are at least two proposed mechanisms for implementing support for
> these.
> Specifically, one mechanism would be to reserve specifically-named options,
> possibly using a Unicode namespace (e.g. `locale=xxx` or `u:locale=xxx`).
> Such options would be reserved for use in any and all functions or markup.
> The other mechanism would be to use the reserved "expression attribute" syntax
> for this purpose (e.g. `@locale=xxx` or `@id=foo`).
> Neither mechanism was included in this Tech Preview.
> Feedback on the preferred mechanism for managing these features
> is strongly desired.
>
> In the meantime, function authors and other implementers are cautioned to avoid creating
> function-specific or implementation-specific option values for this purpose.
> One workaround would be to use the implementation's namespace for these
> features to ensure later interoperability when such a mechanism is finalized
> during the Tech Preview period.
> Specifically:
>
> - Avoid specifying an option for setting the locale of an expression as different from
>   that of the overall _message_ locale, or use a namespace that later maps to the final
>   mechanism.
> - Avoid specifying options for the purpose of linking placeholders
>   (such as to pair opening markup to closing markup).
>   If such an option is created, the implementer should use an
>   implementation-specific namespace.
>   Users and implementers are cautioned that such options might be
>   replaced with a standard mechanism in a future version.
> - Avoid specifying generic options to communicate with translators and
>   translation tooling (e.g. implementation-specific options that apply to all
>   functions).
>
> The above are all desirable features.
> We welcome contributions to and proposals for such features during the
> Technical Preview.

An **_<dfn>attribute</dfn>_** is an _identifier_ with an optional value
that appears in an _expression_ or in _markup_.

_Attributes_ are prefixed by a U+0040 COMMERCIAL AT `@` sign,
followed by an _identifier_.
An _attribute_ MAY have a _value_ which is separated from the _identifier_
by a U+003D EQUALS SIGN `=` along with optional whitespace.
The _value_ of an _attribute_ can be either a _literal_ or a _variable_.

Multiple _attributes_ are permitted in an _expression_ or _markup_.
Each _attribute_ is preceded by whitespace.

The order of _attributes_ is not significant.

```abnf
attribute = "@" identifier [[s] "=" [s] (literal / variable)]
```

> Examples of _expressions_ and _markup_ with _attributes_:
>
> A _message_ including a _literal_ that should not be translated:
>
> ```
> In French, "{|bonjour| @translate=no}" is a greeting
> ```
>
> A _message_ with _markup_ that should not be copied:
>
> ```
> Have a {#span @can-copy}great and wonderful{/span @can-copy} birthday!
> ```

### Other Syntax Elements

This section defines common elements used to construct _messages_.

#### Keywords

A **_<dfn>keyword</dfn>_** is a reserved token that has a unique meaning in the _message_ syntax.

The following three keywords are defined: `.input`, `.local`, and `.match`.
Keywords are always lowercase and start with U+002E FULL STOP `.`.

```abnf
input = %s".input"
local = %s".local"
match = %s".match"
```
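
The `%s` prefix makes these rules case-sensitive, so a differently-cased spelling such as `.Match` is not one of the three keywords; the complete ABNF treats any other `.`-prefixed _name_ as a `reserved-keyword`. The Python sketch below illustrates that distinction. `classify_keyword` is a hypothetical helper, and `DOT_NAME` is a loose approximation of `"." name`, not the full `name` production.

```python
import re

KEYWORDS = frozenset({".input", ".local", ".match"})
# "." followed by a loose approximation of the `name` production.
DOT_NAME = re.compile(r"\.[^\s{}|$.:][^\s{}|$:]*")

def classify_keyword(token: str) -> str:
    """Classify a token that starts a statement in a complex message."""
    if token in KEYWORDS:  # exact, case-sensitive comparison
        return "keyword"
    if DOT_NAME.fullmatch(token):
        return "reserved-keyword"
    raise ValueError(f"not a keyword: {token!r}")
```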

#### Literals

A **_<dfn>literal</dfn>_** is a character sequence that appears outside
of _text_ in various parts of a _message_.
A _literal_ can appear
as a _key_ value,
as the _operand_ of a _literal-expression_,
or in the value of an _option_.
A _literal_ MAY include any Unicode code point
except for U+0000 NULL or the surrogate code points U+D800 through U+DFFF.

All code points are preserved.

A **_<dfn>quoted</dfn>_** literal begins and ends with U+007C VERTICAL BAR `|`.
The characters `\` and `|` within a _quoted_ literal MUST be
escaped as `\\` and `\|`.

An **_<dfn>unquoted</dfn>_** literal is a _literal_ that does not require the `|`
quotes around it to be distinct from the rest of the _message_ syntax.
An _unquoted_ literal MAY be used when the content of the _literal_
contains no whitespace and otherwise matches the `unquoted` production.
Any _unquoted_ literal MAY be _quoted_.
Implementations MUST NOT distinguish between _quoted_ and _unquoted_ literals
that have the same sequence of code points.

_Unquoted_ literals consist of either a _name_ or a _number-literal_.
A _number-literal_ uses the same syntax as JSON and is intended for the encoding
of number values in _operands_ or _options_, or as _keys_ for _variants_.

```abnf
literal        = quoted / unquoted
quoted         = "|" *(quoted-char / quoted-escape) "|"
unquoted       = name / number-literal
number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]
```
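
Because _quoted_ and _unquoted_ spellings of the same code points must be treated alike, an implementation can normalize every _literal_ to its value before comparing it, for example against a _key_. The Python sketch below shows that normalization together with a regular expression transcribed from the `number-literal` rule. The helper names are hypothetical, and `literal_value` assumes its input has already been recognized as a well-formed _literal_.

```python
import re

# The number-literal production, i.e. JSON number syntax. DIGIT is ASCII
# only, so [0-9] is used instead of \d (which also matches other digits).
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")

def is_number_literal(text: str) -> bool:
    return NUMBER_LITERAL.fullmatch(text) is not None

def literal_value(token: str) -> str:
    """Return the code points denoted by a well-formed literal token.

    Quoted and unquoted spellings of the same value compare equal."""
    if token.startswith("|"):
        if len(token) < 2 or not token.endswith("|"):
            raise ValueError("unterminated quoted literal")
        # Inside a quoted literal, only \\ and \| are escape sequences.
        return re.sub(r"\\([\\|])", r"\1", token[1:-1])
    return token
```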

#### Names and Identifiers

An **_<dfn>identifier</dfn>_** is a character sequence that
identifies a _function_, _markup_, or _option_.
Each _identifier_ consists of a _name_ optionally preceded by
a _namespace_.
When present, the _namespace_ is separated from the _name_ by a
U+003A COLON `:`.
Built-in _functions_ and their _options_ do not have a _namespace_ identifier.

The _namespace_ `u` (U+0075 LATIN SMALL LETTER U)
is reserved for future standardization.

_Function_ _identifiers_ are prefixed with `:`.
_Markup_ _identifiers_ are prefixed with `#` or `/`.
_Option_ _identifiers_ have no prefix.

A **_<dfn>name</dfn>_** is a character sequence used in an _identifier_
or as the name for a _variable_
or the value of an _unquoted_ _literal_.

_Variable_ names are prefixed with `$`.

Valid content for _names_ is based on <cite>Namespaces in XML 1.0</cite>'s
[NCName](https://www.w3.org/TR/xml-names/#NT-NCName).
This is different from XML's [Name](https://www.w3.org/TR/xml/#NT-Name)
in that it MUST NOT contain a U+003A COLON `:`.
Otherwise, the set of characters allowed in a _name_ is large.

> [!NOTE]
> _External variables_ can be passed in that are not valid _names_.
> Such variables cannot be referenced in a _message_,
> but are not otherwise errors.

Examples:

> A variable:
>
> ```
> This has a {$variable}
> ```
>
> A function:
>
> ```
> This has a {:function}
> ```
>
> An add-on function from the `icu` namespace:
>
> ```
> This has a {:icu:function}
> ```
>
> An option and an add-on option:
>
> ```
> This has {:options option=value icu:option=add_on}
> ```

Support for _namespaces_ and their interpretation is implementation-defined
in this release.

```abnf
variable   = "$" name
option     = identifier [s] "=" [s] (literal / variable)

identifier = [namespace ":"] name
namespace  = name
name       = name-start *name-char
name-start = ALPHA / "_"
           / %xC0-D6 / %xD8-F6 / %xF8-2FF
           / %x370-37D / %x37F-1FFF / %x200C-200D
           / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
           / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char  = name-start / DIGIT / "-" / "."
           / %xB7 / %x300-36F / %x203F-2040
```
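
The ranges above translate directly into code. The Python sketch below checks a string against the `name` and `identifier` rules; the function names are hypothetical, and the check is purely syntactic, saying nothing about whether a _namespace_ such as `icu` is supported.

```python
# Code point ranges transcribed from the name-start and name-char rules.
NAME_START = (
    (0x41, 0x5A), (0x61, 0x7A), (0x5F, 0x5F),  # ALPHA and "_"
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFC), (0x10000, 0xEFFFF),
)
NAME_CHAR_EXTRA = (
    (0x30, 0x39), (0x2D, 0x2E),  # DIGIT, "-" and "."
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
)

def _in_ranges(ch: str, ranges) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_name(text: str) -> bool:
    """True if text matches the `name` production."""
    return (bool(text)
            and _in_ranges(text[0], NAME_START)
            and all(_in_ranges(c, NAME_START) or _in_ranges(c, NAME_CHAR_EXTRA)
                    for c in text[1:]))

def is_identifier(text: str) -> bool:
    """True if text matches `identifier`: [namespace ":"] name."""
    namespace, colon, name = text.partition(":")
    if not colon:
        return is_name(text)
    return is_name(namespace) and is_name(name)
```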

#### Escape Sequences

An **_<dfn>escape sequence</dfn>_** is a two-character sequence starting with
U+005C REVERSE SOLIDUS `\`.

An _escape sequence_ allows the appearance of lexically meaningful characters
in the body of _text_, _quoted_, or _reserved_ (which includes, in this case,
_private-use_) sequences respectively:

```abnf
text-escape     = backslash ( backslash / "{" / "}" )
quoted-escape   = backslash ( backslash / "|" )
reserved-escape = backslash ( backslash / "{" / "|" / "}" )
backslash       = %x5C ; U+005C REVERSE SOLIDUS "\"
```
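
Tools that generate _messages_ need the inverse view: which characters to escape in each context. The Python sketch below derives both directions from the three escape rules. The `escape` and `unescape` helpers and their `context` names are hypothetical. Note that `|` cannot be escaped in _text_, where it may appear as-is, and that this sketch always escapes `|` in the `reserved` context, which satisfies the recommendation that `|` SHOULD be escaped in a _private-use annotation_.

```python
# Characters that may follow a backslash in each context, per the
# text-escape, quoted-escape, and reserved-escape rules.
ESCAPABLE = {
    "text": "\\{}",
    "quoted": "\\|",
    "reserved": "\\{|}",
}

def escape(value: str, context: str) -> str:
    """Backslash-escape every character that is special in context."""
    specials = ESCAPABLE[context]
    return "".join("\\" + ch if ch in specials else ch for ch in value)

def unescape(source: str, context: str) -> str:
    """Undo escape(), rejecting escape sequences invalid in context."""
    specials = ESCAPABLE[context]
    out, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            if i + 1 == len(source) or source[i + 1] not in specials:
                raise SyntaxError(f"invalid escape sequence at offset {i}")
            i += 1
            ch = source[i]
        out.append(ch)
        i += 1
    return "".join(out)
```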

#### Whitespace

**_<dfn>Whitespace</dfn>_** is defined as one or more of
U+0009 CHARACTER TABULATION (tab),
U+000A LINE FEED (new line),
U+000D CARRIAGE RETURN,
U+3000 IDEOGRAPHIC SPACE,
or U+0020 SPACE.

Inside _patterns_ and _quoted literals_,
whitespace is part of the content and is recorded and stored verbatim.
Whitespace is not significant outside translatable text, except where required by the syntax.

> [!NOTE]
> The character U+3000 IDEOGRAPHIC SPACE is included in whitespace for
> compatibility with certain East Asian keyboards and input methods,
> in which users might accidentally create these characters in a _message_.

```abnf
s = 1*( SP / HTAB / CR / LF / %x3000 )
```
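
This definition is narrower than the notion of whitespace in most programming languages, so relying on a general-purpose helper can silently accept characters this syntax rejects. A minimal Python sketch, with hypothetical helper names:

```python
# The five characters matched by the `s` production.
MF2_WHITESPACE = "\t\n\r \u3000"

def is_s(text: str) -> bool:
    """True if text matches `s`: one or more whitespace characters."""
    return bool(text) and all(ch in MF2_WHITESPACE for ch in text)

def trim_s(text: str) -> str:
    """Trim only the whitespace defined above.

    str.strip() with no argument would also remove characters such as
    U+00A0 NO-BREAK SPACE, which are not whitespace in this syntax."""
    return text.strip(MF2_WHITESPACE)
```

Such trimming applies only outside _patterns_ and _quoted literals_, where whitespace is not part of the content.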

## Complete ABNF

The grammar below uses the ABNF notation [[STD68](https://www.rfc-editor.org/info/std68)],
including the modifications found in [RFC 7405](https://www.rfc-editor.org/rfc/rfc7405).

RFC 7405 defines a variation of ABNF that is case-sensitive.
Some ABNF tools are only compatible with the specification found in
[RFC 5234](https://www.rfc-editor.org/rfc/rfc5234).
To make `message.abnf` compatible with that version of ABNF, replace
the rules of the same name with this block:

```abnf
input = %x2E.69.6E.70.75.74  ; ".input"
local = %x2E.6C.6F.63.61.6C  ; ".local"
match = %x2E.6D.61.74.63.68  ; ".match"
```

### `message.abnf`

```abnf
message           = simple-message / complex-message

simple-message    = [simple-start pattern]
simple-start      = simple-start-char / text-escape / placeholder
pattern           = *(text-char / text-escape / placeholder)
placeholder       = expression / markup

complex-message   = *(declaration [s]) complex-body
declaration       = input-declaration / local-declaration / reserved-statement
complex-body      = quoted-pattern / matcher

input-declaration = input [s] variable-expression
local-declaration = local s variable [s] "=" [s] expression

quoted-pattern    = "{{" pattern "}}"

matcher           = match-statement 1*([s] variant)
match-statement   = match 1*([s] selector)
selector          = expression
variant           = key *(s key) [s] quoted-pattern
key               = literal / "*"

; Expressions
expression            = literal-expression
                      / variable-expression
                      / annotation-expression
literal-expression    = "{" [s] literal [s annotation] *(s attribute) [s] "}"
variable-expression   = "{" [s] variable [s annotation] *(s attribute) [s] "}"
annotation-expression = "{" [s] annotation *(s attribute) [s] "}"

annotation            = function
                      / private-use-annotation
                      / reserved-annotation

markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}"  ; open and standalone
       / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}"  ; close

; Expression and literal parts
function       = ":" identifier *(s option)
option         = identifier [s] "=" [s] (literal / variable)
; Attributes are reserved for future standardization
attribute      = "@" identifier [[s] "=" [s] (literal / variable)]

variable       = "$" name
literal        = quoted / unquoted
quoted         = "|" *(quoted-char / quoted-escape) "|"
unquoted       = name / number-literal
; number-literal matches JSON number (https://www.rfc-editor.org/rfc/rfc8259#section-6)
number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]

; Keywords; Note that these are case-sensitive
input = %s".input"
local = %s".local"
match = %s".match"

; Reserve additional .keywords for use by future versions of this specification.
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
; Note that the following production is a simplification,
; as this rule MUST NOT be considered to match existing keywords
; (`.input`, `.local`, and `.match`).
reserved-keyword   = "." name

; Reserve additional sigils for use by future versions of this specification.
reserved-annotation       = reserved-annotation-start [[s] reserved-body]
reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~"

; Reserve sigils for private-use by implementations.
private-use-annotation    = private-start [[s] reserved-body]
private-start             = "^" / "&"
reserved-body             = reserved-body-part *([s] reserved-body-part)
reserved-body-part        = reserved-char / reserved-escape / quoted

; Names and identifiers
; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName
; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName
identifier = [namespace ":"] name
namespace  = name
name       = name-start *name-char
name-start = ALPHA / "_"
           / %xC0-D6 / %xD8-F6 / %xF8-2FF
           / %x370-37D / %x37F-1FFF / %x200C-200D
           / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
           / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char  = name-start / DIGIT / "-" / "."
           / %xB7 / %x300-36F / %x203F-2040

; Restrictions on characters in various contexts
simple-start-char = content-char / s / "@" / "|"
text-char         = content-char / s / "." / "@" / "|"
quoted-char       = content-char / s / "." / "@" / "{" / "}"
reserved-char     = content-char / "."
content-char      = %x01-08        ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
                  / %x0B-0C        ; omit CR (%x0D)
                  / %x0E-1F        ; omit SP (%x20)
                  / %x21-2D        ; omit . (%x2E)
                  / %x2F-3F        ; omit @ (%x40)
                  / %x41-5B        ; omit \ (%x5C)
                  / %x5D-7A        ; omit { | } (%x7B-7D)
                  / %x7E-2FFF      ; omit IDEOGRAPHIC SPACE (%x3000)
                  / %x3001-D7FF    ; omit surrogates
                  / %xE000-10FFFF

; Character escapes
text-escape     = backslash ( backslash / "{" / "}" )
quoted-escape   = backslash ( backslash / "|" )
reserved-escape = backslash ( backslash / "{" / "|" / "}" )
backslash       = %x5C ; U+005C REVERSE SOLIDUS "\"

; Whitespace
s = 1*( SP / HTAB / CR / LF / %x3000 )
```

## Errors

Errors in messages and their formatting MAY occur and be detected
at different stages of processing.
Where available,
the use of validation tools is recommended,
as early detection of errors makes their correction easier.

### Error Handling

_Syntax Errors_ and _Data Model Errors_ apply to all message processors,
and MUST be emitted as soon as possible.
The other error categories are only emitted during formatting,
but it might be possible to detect them with validation tools.

During selection, an _expression_ handler MUST only emit _Resolution Errors_ and _Selection Errors_.
During formatting, an _expression_ handler MUST only emit _Resolution Errors_ and _Formatting Errors_.

_Resolution Errors_ and _Formatting Errors_ in _expressions_ that are not used
in _pattern selection_ or _formatting_ MAY be ignored,
as they do not affect the output of the formatter.

In all cases, when encountering a runtime error,
a message formatter MUST provide some representation of the message.
An informative error or errors MUST also be separately provided.

When a message contains more than one error,
or contains some error which leads to further errors,
an implementation which does not emit all of the errors
SHOULD prioritise _Syntax Errors_ and _Data Model Errors_ over others.

When an error occurs within a _selector_,
the _selector_ MUST NOT match any _variant_ _key_ other than the catch-all `*`
and a _Resolution Error_ or a _Selection Error_ MUST be emitted.

### Syntax Errors

**_<dfn>Syntax Errors</dfn>_** occur when the syntax representation of a message is not well-formed.

> Example invalid messages resulting in a _Syntax Error_:
>
> ```
> {{Missing end braces
> ```
>
> ```
> {{Missing one end brace}
> ```
>
> ```
> Unknown {{expression}}
> ```
>
> ```
> .local $var = {|no message body|}
> ```

### Data Model Errors

**_<dfn>Data Model Errors</dfn>_** occur when a message is invalid due to
violating one of the semantic requirements on its structure.

#### Variant Key Mismatch

A **_<dfn>Variant Key Mismatch</dfn>_** occurs when the number of keys on a _variant_
does not equal the number of _selectors_.

> Example invalid messages resulting in a _Variant Key Mismatch_ error:
>
> ```
> .match {$one :func}
> 1 2 {{Too many}}
> * {{Otherwise}}
> ```
>
> ```
> .match {$one :func} {$two :func}
> 1 2 {{Two keys}}
> * {{Missing a key}}
> * * {{Otherwise}}
> ```

#### Missing Fallback Variant

A **_<dfn>Missing Fallback Variant</dfn>_** error occurs when the message
does not include a _variant_ with only catch-all keys.

> Example invalid messages resulting in a _Missing Fallback Variant_ error:
>
> ```
> .match {$one :func}
> 1 {{Value is one}}
> 2 {{Value is two}}
> ```
>
> ```
> .match {$one :func} {$two :func}
> 1 * {{First is one}}
> * 1 {{Second is one}}
> ```
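
Both of these checks look only at the _keys_, so they can run as soon as a _message_ is parsed, before any formatting. The Python sketch below applies them to the examples above. It assumes each _variant_ has already been reduced to a list of _key_ strings; `variant_errors` is a hypothetical name, and the error strings are placeholders rather than any standard error representation.

```python
from typing import Sequence

def variant_errors(selector_count: int,
                   variants: Sequence[Sequence[str]]) -> list[str]:
    """Report key-count and fallback errors for one matcher.

    Each variant is given as its list of key strings; "*" is the
    catch-all key."""
    errors = []
    for index, keys in enumerate(variants):
        if len(keys) != selector_count:
            errors.append(f"Variant Key Mismatch (variant {index})")
    has_fallback = any(
        len(keys) == selector_count and all(k == "*" for k in keys)
        for keys in variants
    )
    if not has_fallback:
        errors.append("Missing Fallback Variant")
    return errors
```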

#### Missing Selector Annotation

A **_<dfn>Missing Selector Annotation</dfn>_** error occurs when the _message_
contains a _selector_ that does not have an _annotation_
and whose _variable_, if any, does not directly or indirectly reference a _declaration_ with an _annotation_.

> Examples of invalid messages resulting in a _Missing Selector Annotation_ error:
>
> ```
> .match {$one}
> 1 {{Value is one}}
> * {{Value is not one}}
> ```
>
> ```
> .local $one = {|The one|}
> .match {$one}
> 1 {{Value is one}}
> * {{Value is not one}}
> ```
>
> ```
> .input {$one}
> .match {$one}
> 1 {{Value is one}}
> * {{Value is not one}}
> ```
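
The indirect case means a processor has to follow _variable_ references through the _declarations_. The Python sketch below does this over a hypothetical pre-parsed form in which each _expression_ is reduced to its _variable_ _operand_ (if any) and whether it has an _annotation_; a `seen` set stops the walk on self-references such as `.input {$one}`. All names are illustrative.

```python
from typing import Optional

# Hypothetical pre-parsed expression: (variable operand or None,
# whether the expression has an annotation).
Expr = tuple[Optional[str], bool]

def resolves_to_annotation(var: str, declarations: dict[str, Expr]) -> bool:
    """Follow var through declarations until an annotation is found."""
    seen: set[str] = set()
    while var in declarations and var not in seen:
        seen.add(var)
        operand, annotated = declarations[var]
        if annotated:
            return True
        if operand is None:  # e.g. .local $one = {|The one|}
            return False
        var = operand
    return False

def missing_selector_annotation(selector: Expr,
                                declarations: dict[str, Expr]) -> bool:
    """True if the selector would cause a Missing Selector Annotation error."""
    operand, annotated = selector
    if annotated:
        return False
    return operand is None or not resolves_to_annotation(operand, declarations)
```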

#### Duplicate Declaration

A **_<dfn>Duplicate Declaration</dfn>_** error occurs when a _variable_ is declared more than once.
Note that an input _variable_ is implicitly declared when it is first used,
so explicitly declaring it after such use is also an error.

> Examples of invalid messages resulting in a _Duplicate Declaration_ error:
>
> ```
> .input {$var :number maximumFractionDigits=0}
> .input {$var :number minimumFractionDigits=0}
> {{Redeclaration of the same variable}}
>
> .local $var = {$ext :number maximumFractionDigits=0}
> .input {$var :number minimumFractionDigits=0}
> {{Redeclaration of a local variable}}
>
> .input {$var :number minimumFractionDigits=0}
> .local $var = {$ext :number maximumFractionDigits=0}
> {{Redeclaration of an input variable}}
>
> .input {$var :number minimumFractionDigits=$var2}
> .input {$var2 :number}
> {{Redeclaration of the implicit input variable $var2}}
>
> .local $var = {$ext :someFunction}
> .local $var = {$error}
> .local $var2 = {$var2 :error}
> {{{$var} cannot be redefined. {$var2} cannot refer to itself}}
> ```
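
A minimal sketch of this check follows. It assumes each _declaration_ has already been reduced to a pair: the declared name and the set of variables its _expression_ references (operand and _option_ values). For `.input`, the declared operand itself is excluded from that set. This representation is hypothetical.

```python
def find_duplicate_declarations(
        declarations: list[tuple[str, set[str]]]) -> list[str]:
    """Return, in message order, each declared name that causes a Duplicate Declaration."""
    seen: set[str] = set()  # declared explicitly, or implicitly by an earlier use
    duplicates: list[str] = []
    for name, references in declarations:
        # Redeclaration, declaration after implicit use, or self-reference.
        if name in seen or name in references:
            duplicates.append(name)
        seen.add(name)
        seen |= references
    return duplicates


# The last example above: $var is redeclared and $var2 refers to itself.
print(find_duplicate_declarations([
    ("var", {"ext"}),
    ("var", {"error"}),
    ("var2", {"var2"}),
]))  # ['var', 'var2']
# `.input {$var :number minimumFractionDigits=$var2}` then `.input {$var2 :number}`
print(find_duplicate_declarations([("var", {"var2"}), ("var2", set())]))  # ['var2']
```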

#### Duplicate Option Name

A **_<dfn>Duplicate Option Name</dfn>_** error occurs when the same _identifier_
appears on the left-hand side of more than one _option_ in the same _expression_.

> Examples of invalid messages resulting in a _Duplicate Option Name_ error:
>
> ```
> Value is {42 :number style=percent style=decimal}
> ```
>
> ```
> .local $foo = {horse :func one=1 two=2 one=1}
> {{This is {$foo}}}
> ```
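
Detecting this error only requires counting _option_ identifiers. The sketch below assumes the _options_ of one _expression_ have been parsed into `(identifier, value)` pairs in source order. That representation is hypothetical.

```python
from collections import Counter


def duplicate_option_names(options: list[tuple[str, str]]) -> list[str]:
    """Return option identifiers used more than once in a single expression.

    Only the left-hand identifiers are compared, so repeating the same
    value (as in `one=1 ... one=1`) is still a duplicate.
    """
    counts = Counter(name for name, _value in options)
    return [name for name, count in counts.items() if count > 1]


print(duplicate_option_names([("style", "percent"), ("style", "decimal")]))  # ['style']
print(duplicate_option_names([("one", "1"), ("two", "2"), ("one", "1")]))    # ['one']
```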

### Resolution Errors

**_<dfn>Resolution Errors</dfn>_** occur when the runtime value of a part of a message
cannot be determined.

#### Unresolved Variable

An **_<dfn>Unresolved Variable</dfn>_** error occurs when a variable reference cannot be resolved.

> For example, attempting to format either of the following messages
> would result in an _Unresolved Variable_ error if done within a context that
> does not provide for the variable reference `$var` to be successfully resolved:
>
> ```
> The value is {$var}.
> ```
>
> ```
> .match {$var :func}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```

#### Unknown Function

An **_<dfn>Unknown Function</dfn>_** error occurs when an _expression_ includes
a reference to a function which cannot be resolved.

> For example, attempting to format either of the following messages
> would result in an _Unknown Function_ error if done within a context that
> does not provide for the function `:func` to be successfully resolved:
>
> ```
> The value is {horse :func}.
> ```
>
> ```
> .match {|horse| :func}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```

#### Unsupported Expression

An **_<dfn>Unsupported Expression</dfn>_** error occurs when an expression uses
syntax reserved for future standardization,
or for private implementation use that is not supported by the current implementation.

> For example, attempting to format this message
> would always result in an _Unsupported Expression_ error:
>
> ```
> The value is {!horse}.
> ```
>
> Attempting to format this message would result in an _Unsupported Expression_ error
> if done within a context that does not support the `^` private use sigil:
>
> ```
> .match {|horse| ^private}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```

#### Invalid Expression

An **_<dfn>Invalid Expression</dfn>_** error occurs when a _message_ includes an _expression_
whose implementation-defined internal requirements produce an error during _function resolution_
or when a _function_ returns a value (such as `null`) that the implementation does not support.

An **_<dfn>Operand Mismatch Error</dfn>_** is an _Invalid Expression_ error that occurs when
an _operand_ provided to a _function_ during _function resolution_ does not match one of the
expected implementation-defined types for that function,
or when a literal _operand_ value does not have the required format
and thus cannot be processed into one of the expected implementation-defined types
for that specific _function_.

> For example, the following _message_ produces an _Operand Mismatch Error_
> (a type of _Invalid Expression_ error)
> because the literal `|horse|` does not match the production `number-literal`,
> which is a requirement of the function `:number` for its operand:
>
> ```
> .local $horse = {horse :number}
> {{You have a {$horse}.}}
> ```
>
> The following _message_ might produce an _Invalid Expression_ error if
> the function `:function` threw an exception or otherwise emitted an error
> rather than returning a valid value:
>
> ```
> {{This has an invalid expression {$var :function} because it has a bug in it.}}
> ```

#### Unsupported Statement

An **_<dfn>Unsupported Statement</dfn>_** error occurs when a message includes a _reserved statement_.

> For example, attempting to format this message
> would always result in an _Unsupported Statement_ error:
>
> ```
> .some {|horse|}
> {{The message body}}
> ```

### Selection Errors

**_<dfn>Selection Errors</dfn>_** occur when message selection fails.

> For example, attempting to format either of the following messages
> might result in a _Selection Error_ if done within a context that
> uses a `:number` selector function which requires its input to be numeric:
>
> ```
> .match {|horse| :number}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```
>
> ```
> .local $sel = {|horse| :number}
> .match {$sel}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```

### Formatting Errors

**_<dfn>Formatting Errors</dfn>_** occur during the formatting of a resolved value,
for example when encountering a value with an unsupported type
or an internally inconsistent set of options.

> For example, attempting to format any of the following messages
> might result in a _Formatting Error_ if done within a context that
>
> 1. provides for the variable reference `$user` to resolve to
>    an object `{ name: 'Kat', id: 1234 }`,
> 2. provides for the variable reference `$field` to resolve to
>    a string `'address'`, and
> 3. uses a `:get` formatting function which requires its argument to be an object and
>    an option `field` to be provided with a string value:
>
> ```
> Hello, {horse :get field=name}!
> ```
>
> ```
> Hello, {$user :get}!
> ```
>
> ```
> .local $id = {$user :get field=id}
> {{Hello, {$id :get field=name}!}}
> ```
>
> ```
> Your {$field} is {$id :get field=$field}
> ```

## Function Registry

Implementations and tooling can greatly benefit from a
structured definition of formatting and matching functions available to messages at runtime.
This specification is intended to provide a mechanism for storing such declarations in a portable manner.

### Goals

_This section is non-normative._

The registry provides a machine-readable description of MessageFormat 2 extensions (custom functions),
in order to support the following goals and use-cases:

- Validate semantic properties of messages. For example:
  - Type-check values passed into functions.
  - Validate that matching functions are only called in selectors.
  - Validate that formatting functions are only called in placeholders.
  - Verify the exhaustiveness of variant keys given a selector.
- Support the localization roundtrip. For example:
  - Generate variant keys for a given locale during XLIFF extraction.
- Improve the authoring experience. For example:
  - Forbid edits to certain function options (e.g. currency options).
  - Autocomplete function and option names.
  - Display on-hover tooltips for function signatures with documentation.
  - Display/edit known message metadata.
  - Restrict input in a GUI by providing a dropdown with all viable option values.

### Conformance and Use

_This section is normative._

To be conformant with MessageFormat 2.0, an implementation MUST implement
the _functions_, _options_ and _option_ values, _operands_ and outputs
described in the section [Default Registry](#default-registry) below.

Implementations MAY implement additional _functions_ or additional _options_.
In particular, implementations are encouraged to provide feedback on proposed
_options_ and their values.

> [!IMPORTANT]
> In the Tech Preview, the [registry data model](#registry-data-model) should
> be regarded as experimental.
> Changes to the format are expected during this period.
> Feedback on the registry's format and implementation is encouraged!

Implementations are not required to provide a machine-readable registry
nor to read or interpret the registry data model in order to be conformant.

The MessageFormat 2.0 Registry was created to describe
the core set of formatting and selection _functions_,
including _operands_, _options_, and _option_ values.
This is the minimum set of functionality needed for conformance.
By using the same names and values, _messages_ can be used interchangeably
by different implementations,
regardless of programming language or runtime environment.
This ensures that developers do not have to relearn core MessageFormat syntax
and functionality when moving between platforms
and that translators do not need to know about the runtime environment for most
selection or formatting operations.

The registry provides a machine-readable description of _functions_
suitable for tools, such as those used in translation automation, so that
variant expansion and information about available _options_ and their effects
are available in the translation ecosystem.
To that end, implementations are strongly encouraged to provide appropriately
tailored versions of the registry for consumption by tools
(even if not included in software distributions)
and to encourage any add-on or plug-in functionality to provide
a registry to support localization tooling.

### Registry Data Model

_This section is non-normative._

> [!IMPORTANT]
> This part of the specification is not part of the Tech Preview.

The registry contains descriptions of function signatures.

The main building block of the registry is the `<function>` element.
It represents an implementation of a custom function available to translation at runtime.
A function defines a human-readable `<description>` of its behavior
and one or more machine-readable _signatures_ of how to call it.
Named `<validationRule>` elements can optionally define regex validation rules for
literals, option values, and variant keys.

MessageFormat 2 functions can be invoked in two contexts:

- inside placeholders, to produce a part of the message's formatted output;
  for example, a raw value of `|1.5|` may be formatted to `1,5` in a language which uses commas as decimal separators,
- inside selectors, to contribute to selecting the appropriate variant among all given variants.

A single _function name_ may be used in both contexts,
regardless of whether it's implemented as one or multiple functions.

A _signature_ defines one particular set of at most one argument and any number of named options
that can be used together in a single call to the function.
`<formatSignature>` corresponds to a function call inside a placeholder inside translatable text.
`<matchSignature>` corresponds to a function call inside a selector.

A signature may define the positional argument of the function with the `<input>` element.
If the `<input>` element is not present, the function is defined as a nullary function.
A signature may also define one or more `<option>` elements representing _named options_ to the function.
An option can be omitted in a call to the function,
unless the `required` attribute is present.
Options either accept a finite enumeration of values (the `values` attribute)
or validate their input with a regular expression (the `validationRule` attribute).
Read-only options (the `readonly` attribute) can be displayed to translators in CAT tools, but may not be edited.
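
A tool consuming the registry might check a call's options against a signature roughly as sketched below. `OptionSpec` is a hypothetical in-memory form of an `<option>` element, with the referenced `<validationRule>` already resolved to its pattern. Requiring the pattern to match the whole value is an assumption. The `readonly` attribute only affects editing in tools and is not modelled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionSpec:
    """Hypothetical in-memory form of an <option> element."""
    name: str
    values: Optional[list[str]] = None  # from the `values` attribute
    regex: Optional[str] = None         # pattern of the referenced <validationRule>
    required: bool = False


def check_options(specs: list[OptionSpec], options: dict[str, str]) -> list[str]:
    """Return human-readable problems found when checking `options` against `specs`."""
    problems: list[str] = []
    by_name = {spec.name: spec for spec in specs}
    for spec in specs:
        if spec.required and spec.name not in options:
            problems.append(f"missing required option {spec.name}")
    for name, value in options.items():
        spec = by_name.get(name)
        if spec is None:
            # A tool might only warn here, since implementations MAY add options.
            problems.append(f"unknown option {name}")
        elif spec.values is not None and value not in spec.values:
            problems.append(f"{name}={value} is not one of {spec.values}")
        elif spec.regex is not None and re.fullmatch(spec.regex, value) is None:
            problems.append(f"{name}={value} does not match {spec.regex}")
    return problems


specs = [
    OptionSpec("article", values=["definite", "indefinite"]),
    OptionSpec("minimumFractionDigits", regex="[0-9]+"),
]
print(check_options(specs, {"article": "definite", "minimumFractionDigits": "2"}))  # []
print(check_options(specs, {"article": "some"}))
```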

As the `<input>` and `<option>` rules may be locale-dependent,
each signature can include an `<override locales="...">` that extends and overrides
the corresponding input and options rules.
If multiple `<override>` elements would match the current locale,
only the first one is used.

Matching-function signatures additionally include one or more `<match>` elements
to define the keys against which they can match when used as selectors.

Functions may also include `<alias>` definitions,
which provide shorthands for commonly used option baskets.
An _alias name_ may be used equivalently to a _function name_ in messages.
Its `<setOption>` values are always set, and may not be overridden in message annotations.
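
Alias expansion can be sketched as rewriting the call into a call of the underlying function, with the alias's `<setOption>` values merged in. The description above says only that these values may not be overridden. Reporting an attempted override as an error is a choice made by this sketch. The function name `expand_alias` is hypothetical.

```python
def expand_alias(target: str, set_options: dict[str, str],
                 message_options: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Rewrite a call of an alias into a call of its underlying function.

    `set_options` are the alias's <setOption> values; they are always applied.
    """
    overridden = set_options.keys() & message_options.keys()
    if overridden:
        raise ValueError(f"cannot override alias options: {sorted(overridden)}")
    return target, {**message_options, **set_options}


# A hypothetical alias `integer` for `number` that fixes maximumFractionDigits=0.
print(expand_alias("number", {"maximumFractionDigits": "0"},
                   {"minimumIntegerDigits": "3"}))
# ('number', {'minimumIntegerDigits': '3', 'maximumFractionDigits': '0'})
```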

If a `<function>`, `<input>` or `<option>` includes multiple `<description>` elements,
each SHOULD have a different `xml:lang` attribute value.
This allows for the descriptions of these elements to be themselves localized
according to the preferred locale of the message authors and editors.

### Example

The following `registry.xml` is an example of a registry file
which may be provided by an implementation to describe its built-in functions.
For the sake of brevity, only `locales="en"` is considered.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">

<registry xml:lang="en">
    <function name="platform">
        <description>Match the current OS.</description>
        <matchSignature>
            <match values="windows linux macos android ios"/>
        </matchSignature>
    </function>

    <validationRule id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/>
    <validationRule id="positiveInteger" regex="[0-9]+"/>
    <validationRule id="currencyCode" regex="[A-Z]{3}"/>

    <function name="number">
        <description>
            Format a number.
            Match a **formatted** numerical value against CLDR plural categories or against a number literal.
        </description>

        <matchSignature>
            <input validationRule="anyNumber"/>
            <option name="type" values="cardinal ordinal"/>
            <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
            <option name="minimumFractionDigits" validationRule="positiveInteger"/>
            <option name="maximumFractionDigits" validationRule="positiveInteger"/>
            <option name="minimumSignificantDigits" validationRule="positiveInteger"/>
            <option name="maximumSignificantDigits" validationRule="positiveInteger"/>
            <!-- Since this applies to both cardinal and ordinal, all plural options are valid. -->
            <match locales="en" values="one two few other" validationRule="anyNumber"/>
            <match values="zero one two few many other" validationRule="anyNumber"/>
        </matchSignature>

        <formatSignature>
            <input validationRule="anyNumber"/>
            <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
            <option name="minimumFractionDigits" validationRule="positiveInteger"/>
            <option name="maximumFractionDigits" validationRule="positiveInteger"/>
            <option name="minimumSignificantDigits" validationRule="positiveInteger"/>
            <option name="maximumSignificantDigits" validationRule="positiveInteger"/>
            <option name="style" readonly="true" values="decimal currency percent unit" default="decimal"/>
            <option name="currency" readonly="true" validationRule="currencyCode"/>
        </formatSignature>

        <alias name="integer">
            <description>Locale-sensitive integral number formatting</description>
            <setOption name="maximumFractionDigits" value="0" />
            <setOption name="style" value="decimal" />
        </alias>
    </function>
</registry>
```

Given the above description, the `:number` function is defined to work in both a selector and a placeholder:

```
.match {$count :number}
1 {{One new message}}
* {{{$count :number} new messages}}
```

Furthermore, `:number`'s `<matchSignature>` contains two `<match>` elements,
which allow variant keys to be validated.
The element whose `locales` best matches the current locale
using resource item [lookup](tr35.md#Lookup) from LDML is used.
An element with no `locales` attribute is the default
(and is considered equivalent to the `root` locale).

- `<match locales="en" values="one two few other" .../>` can be used in locales like `en` and `en-GB`
  to validate a variant with the key `other` by verifying that `other` is present
  in the list of enumerated values: `one two few other`.
- `<match ... validationRule="anyNumber"/>` can be used to validate the `1` variant
  by testing the `1` key against the `anyNumber` regular expression defined in the registry file.
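
The selection and validation just described can be sketched with Python's standard `xml.etree.ElementTree`. Several simplifications apply:

- LDML resource item lookup (parent locales, aliases) is reduced here to truncating subtags, such as `en-GB` to `en` to `root`.
- The `locales` attribute is assumed to be space-separated.
- A key is assumed to be valid if its whole value matches the rule's pattern.
- The embedded `anyNumber` pattern makes the fractional part optional so that a key such as `1` is accepted.

```python
import re
import xml.etree.ElementTree as ET

# A trimmed copy of the `:number` <matchSignature> from the example registry.
REGISTRY = r"""
<registry>
    <validationRule id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/>
    <function name="number">
        <matchSignature>
            <match locales="en" values="one two few other" validationRule="anyNumber"/>
            <match values="zero one two few many other" validationRule="anyNumber"/>
        </matchSignature>
    </function>
</registry>
"""


def lookup_chain(locale: str) -> list[str]:
    """Simplified fallback by truncating subtags: en-GB -> en -> root."""
    subtags = locale.split("-")
    chain = ["-".join(subtags[:i]) for i in range(len(subtags), 0, -1)]
    return chain + ["root"]


def select_match(signature: ET.Element, locale: str) -> ET.Element:
    """Pick the <match> element whose `locales` best fits `locale`."""
    by_locale: dict[str, ET.Element] = {}
    for match in signature.findall("match"):
        # No `locales` attribute means root.
        for loc in match.get("locales", "root").split():
            by_locale.setdefault(loc, match)
    for candidate in lookup_chain(locale):
        if candidate in by_locale:
            return by_locale[candidate]
    raise LookupError(f"no <match> element applies to {locale}")


def key_is_valid(registry: ET.Element, match: ET.Element, key: str) -> bool:
    """A key is valid if it is an enumerated value or fully matches the validation rule."""
    if key in match.get("values", "").split():
        return True
    rule_id = match.get("validationRule")
    rule = registry.find(f"validationRule[@id='{rule_id}']") if rule_id else None
    return rule is not None and re.fullmatch(rule.get("regex"), key) is not None


registry = ET.fromstring(REGISTRY.strip())
signature = registry.find("function[@name='number']/matchSignature")
en_match = select_match(signature, "en-GB")
print(key_is_valid(registry, en_match, "other"))  # True (enumerated value)
print(key_is_valid(registry, en_match, "1"))      # True (matches anyNumber)
print(key_is_valid(registry, en_match, "many"))   # False for English
```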

---

A localization engineer can then extend the registry by defining the following `customRegistry.xml` file.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">

<registry xml:lang="en">
    <function name="noun">
        <description>Handle the grammar of a noun.</description>
        <formatSignature>
            <override locales="en">
                <input/>
                <option name="article" values="definite indefinite"/>
                <option name="plural" values="one other"/>
                <option name="case" values="nominative genitive" default="nominative"/>
            </override>
        </formatSignature>
    </function>

    <function name="adjective">
        <description>Handle the grammar of an adjective.</description>
        <formatSignature>
            <override locales="en">
                <input/>
                <option name="article" values="definite indefinite"/>
                <option name="plural" values="one other"/>
                <option name="case" values="nominative genitive" default="nominative"/>
            </override>
        </formatSignature>
        <formatSignature>
            <override locales="en">
                <input/>
                <option name="article" values="definite indefinite"/>
                <option name="accord"/>
            </override>
        </formatSignature>
    </function>
</registry>
```

Messages can now use the `:noun` and the `:adjective` functions.
The following message references the first signature of `:adjective`,
which expects the `plural` and `case` options:

> ```
> You see {$color :adjective article=indefinite plural=one case=nominative} {$object :noun case=nominative}!
> ```

The following message references the second signature of `:adjective`,
which expects the `accord` option in place of `plural` and `case`:

> ```
> .input {$object :noun case=nominative}
> {{You see {$color :adjective article=indefinite accord=$object} {$object}!}}
> ```

### Default Registry

> [!IMPORTANT]
> This part of the specification is part of the Tech Preview
> and is **_NORMATIVE_**.

This section describes the functions which each implementation MUST provide
to be conformant with this specification.

#### String Value Selection and Formatting

##### The `:string` function

The function `:string` provides string selection and formatting.

###### Operands

The _operand_ of `:string` is either any implementation-defined type
that is a string or for which conversion to a string is supported,
or any _literal_ value.
All other values produce an _Invalid Expression_ error.

> For example, in Java, implementations of the `java.lang.CharSequence` interface
> (such as `java.lang.String` or `java.lang.StringBuilder`),
> the type `char`, or the class `java.lang.Character` might be considered
> as the "implementation-defined types".
> Such an implementation might also support other classes via the method `toString()`.
> This might be used to enable selection of an `enum` value by name, for example.
>
> Other programming languages would define string and character sequence types or
> classes according to their local needs, including, where appropriate,
> coercion to string.

###### Options

The function `:string` has no options.

> [!NOTE]
> Proposals for string transformation options or implementation
> experience with user requirements are desired during the Tech Preview.

###### Selection

When implementing [`MatchSelectorKeys(resolvedSelector, keys)`](#resolve-preferences)
where `resolvedSelector` is the resolved value of a _selector_ _expression_
and `keys` is a list of strings,
the `:string` selector performs as described below.

1. Let `compare` be the string value of `resolvedSelector`.
1. Let `result` be a new empty list of strings.
1. For each string `key` in `keys`:
   1. If `key` and `compare` consist of the same sequence of Unicode code points, then
      1. Append `key` as the last element of the list `result`.
1. Return `result`.
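
The steps above translate almost line for line into the Python sketch below. It assumes that `str()` yields the "string value" of the resolved selector, which is a stand-in for whatever conversion an implementation actually uses. Python's string equality compares code point sequences, so no case folding or normalization takes place.

```python
def match_selector_keys_string(resolved_selector: object, keys: list[str]) -> list[str]:
    """Sketch of MatchSelectorKeys for :string, one statement per step above."""
    compare = str(resolved_selector)  # assumption: str() gives the "string value"
    result: list[str] = []
    for key in keys:
        # Python compares strings code point by code point.
        if key == compare:
            result.append(key)
    return result


print(match_selector_keys_string("windows", ["linux", "windows", "macos"]))  # ['windows']
print(match_selector_keys_string("Windows", ["windows"]))                    # []
```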

> [!NOTE]
> Matching of `key` and `compare` values is sensitive to the sequence of code points
> in each string.
> As a result, variations in how text can be encoded can affect the outcome of matching.
> The function `:string` does not perform case folding or Unicode Normalization of string values.
> Users SHOULD encode _messages_ and their parts (such as _keys_ and _operands_)
> in Unicode Normalization Form C (NFC) unless there is a very good reason
> not to.
> See also: [String Matching](https://www.w3.org/TR/charmod-norm)
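
The Python snippet below illustrates why the note recommends NFC. Two canonically equivalent spellings of the same word differ as code point sequences, so a code-point comparison treats them as different keys until both are normalized. It uses only the standard `unicodedata` module. A conforming `:string` implementation is not expected to perform this normalization on the user's behalf.

```python
import unicodedata

composed = "caf\u00e9"     # "café" using U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "café" as "e" followed by U+0301 COMBINING ACUTE ACCENT

# Code point comparison, as :string performs it, sees two different strings.
print(composed == decomposed)                                # False
# After normalizing to NFC, the two spellings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```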

> [!NOTE]
> Unquoted string literals in a _variant_ do not include spaces.
> If users wish to match strings that include whitespace
> (including U+3000 `IDEOGRAPHIC SPACE`)
> to a key, the `key` needs to be quoted.
>
> For example:
>
> ```
> .match {$string :string}
> | space key | {{Matches the string " space key "}}
> *             {{Matches the string "space key"}}
> ```

###### Formatting

The `:string` function returns the string value of the resolved value of the _operand_.

#### Numeric Value Selection and Formatting

##### The `:number` function

The function `:number` is a selector and formatter for numeric values.

###### Operands

The function `:number` requires a [Number Operand](#number-operands) as its _operand_.

###### Options

Some options do not have default values defined in this specification.
The defaults for these options are implementation-dependent.
In general, the default values for such options depend on the locale,
the value of other options, or both.

The following options and their values are required to be available on the function `:number`:

- `select`
  - `plural` (default; see [Default Value of `select` Option](#default-value-of-select-option) below)
  - `ordinal`
  - `exact`
- `compactDisplay` (this option only has meaning when combined with the option `notation=compact`)
  - `short` (default)
  - `long`
- `notation`
  - `standard` (default)
  - `scientific`
  - `engineering`
  - `compact`
- `numberingSystem`
  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
    (default is locale-specific)
- `signDisplay`
  - `auto` (default)
  - `always`
  - `exceptZero`
  - `negative`
  - `never`
- `style`
  - `decimal` (default)
  - `percent` (see [Percent Style](#percent-style) below)
- `useGrouping`
  - `auto` (default)
  - `always`
  - `never`
  - `min2`
- `minimumIntegerDigits`
  - ([digit size option](#digit-size-options), default: `1`)
- `minimumFractionDigits`
  - ([digit size option](#digit-size-options))
- `maximumFractionDigits`
  - ([digit size option](#digit-size-options))
- `minimumSignificantDigits`
  - ([digit size option](#digit-size-options))
- `maximumSignificantDigits`
  - ([digit size option](#digit-size-options))
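
An implementation might resolve these options by filling in the listed defaults and rejecting values outside each enumeration. The sketch below transcribes the enumerated options from the list above. For each of them, the first value in the list is the one marked "(default)". Other options, such as `numberingSystem`, digit size options, and implementation-specific additions, are passed through unchecked. Raising `ValueError` for a bad value is this sketch's choice and not an error category defined here.

```python
# Enumerated options required on :number, transcribed from the list above.
# The first value in each list is the one marked "(default)".
NUMBER_ENUM_OPTIONS = {
    "select": ["plural", "ordinal", "exact"],
    "compactDisplay": ["short", "long"],
    "notation": ["standard", "scientific", "engineering", "compact"],
    "signDisplay": ["auto", "always", "exceptZero", "negative", "never"],
    "style": ["decimal", "percent"],
    "useGrouping": ["auto", "always", "never", "min2"],
}


def resolve_number_options(options: dict[str, str]) -> dict[str, str]:
    """Fill in listed defaults and reject unknown values of enumerated options."""
    resolved = {name: values[0] for name, values in NUMBER_ENUM_OPTIONS.items()}
    resolved["minimumIntegerDigits"] = "1"
    for name, value in options.items():
        allowed = NUMBER_ENUM_OPTIONS.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value} is not one of {allowed}")
        resolved[name] = value  # non-enumerated options pass through unchecked
    return resolved


print(resolve_number_options({"style": "percent"})["select"])  # plural
```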

> [!NOTE]
> The following options and option values are being developed during the Technical Preview
> period.

The following values for the option `style` are _not_ part of the default registry.
Implementations SHOULD avoid creating option values that conflict with these, but
are encouraged to track development of these values during the Tech Preview:

- `currency`
- `unit`

The following options are _not_ part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during the Tech Preview:

- `currency`
  - valid [Unicode Currency Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCurrencyIdentifier)
    (no default)
- `currencyDisplay`
  - `symbol` (default)
  - `narrowSymbol`
  - `code`
  - `name`
- `currencySign`
  - `accounting`
  - `standard` (default)
- `unit`
  - (anything not empty)
- `unitDisplay`
  - `long`
  - `short` (default)
  - `narrow`

###### Default Value of `select` Option

The value `plural` is the default for the option `select`
because it is the most common use case for numeric selection.
It can be used for exact value matches but also allows for the grammatical needs of
languages using CLDR's plural rules.
This might not be noticeable in the source language (particularly English),
but can cause problems in target locales that the original developer is not considering.

> For example, a naive developer might use a special message for the value `1` without
> considering a locale's need for a `one` plural:
>
> ```
> .match {$var :number}
> 1   {{You have one last chance}}
> one {{You have {$var} chance remaining}}
> *   {{You have {$var} chances remaining}}
> ```
>
> The `one` variant is needed by languages such as Polish or Russian.
> Such locales typically also require other keywords such as `two`, `few`, and `many`.

###### Percent Style

When implementing `style=percent`, the numeric value of the _operand_
MUST be multiplied by 100 for the purposes of formatting.

> For example,
>
> ```
> The total was {0.5 :number style=percent}.
> ```
>
> should format in a manner similar to:
> > The total was 50%.
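
The multiplication step can be sketched in Python as follows. The sketch uses `decimal.Decimal` rather than binary floating point, because a product such as `0.07 * 100` is not exactly 7 as a Python float. Producing the localized digits and the placement of the percent sign is left to the locale-aware formatter and is not shown. The function name `percent_value` is hypothetical.

```python
from decimal import Decimal


def percent_value(operand: str) -> Decimal:
    """Return the value to be formatted for style=percent: the operand times 100."""
    # Decimal keeps the product exact, avoiding binary floating-point error.
    return Decimal(operand) * 100


print(percent_value("0.5"))   # 50.0
print(percent_value("0.07"))  # 7.00
```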

###### Selection

The _function_ `:number` performs selection as described in [Number Selection](#number-selection) below.

##### The `:integer` function

The function `:integer` is a selector and formatter for matching or formatting numeric
values as integers.

###### Operands

The function `:integer` requires a [Number Operand](#number-operands) as its _operand_.

###### Options

Some options do not have default values defined in this specification.
The defaults for these options are implementation-dependent.
In general, the default values for such options depend on the locale,
the value of other options, or both.

The following options and their values are required in the default registry to be available on the
function `:integer`:

- `select`
  - `plural` (default)
  - `ordinal`
  - `exact`
- `numberingSystem`
  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
    (default is locale-specific)
- `signDisplay`
  - `auto` (default)
  - `always`
  - `exceptZero`
  - `negative`
  - `never`
- `style`
  - `decimal` (default)
  - `percent` (see [Percent Style](#percent-style) below)
- `useGrouping`
  - `auto` (default)
  - `always`
  - `min2`
- `minimumIntegerDigits`
  - ([digit size option](#digit-size-options), default: `1`)
- `maximumSignificantDigits`
  - ([digit size option](#digit-size-options))

> [!NOTE]
> The following options and option values are being developed during the Technical Preview
> period.

The following values for the option `style` are _not_ part of the default registry.
Implementations SHOULD avoid creating option values that conflict with these, but
are encouraged to track development of these values during the Tech Preview:

- `currency`
- `unit`

The following options are _not_ part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during the Tech Preview:

- `currency`
  - valid [Unicode Currency Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCurrencyIdentifier)
    (no default)
- `currencyDisplay`
  - `symbol` (default)
  - `narrowSymbol`
  - `code`
  - `name`
- `currencySign`
  - `accounting`
  - `standard` (default)
- `unit`
  - (anything not empty)
- `unitDisplay`
  - `long`
  - `short` (default)
  - `narrow`

###### Default Value of `select` Option

The value `plural` is the default for the option `select`
because it is the most common use case for numeric selection.
It can be used for exact value matches but also allows for the grammatical needs of
languages using CLDR's plural rules.
This might not be noticeable in the source language (particularly English),
but can cause problems in target locales that the original developer is not considering.

> For example, a naive developer might use a special message for the value `1` without
> considering a locale's need for a `one` plural:
>
> ```
> .match {$var :integer}
> 1   {{You have one last chance}}
> one {{You have {$var} chance remaining}}
> *   {{You have {$var} chances remaining}}
> ```
>
> The `one` variant is needed by languages such as Polish or Russian.
> Such locales typically also require other keywords such as `two`, `few`, and `many`.

###### Percent Style

When implementing `style=percent`, the numeric value of the _operand_
MUST be multiplied by 100 for the purposes of formatting.

> For example,
>
> ```
> The total was {0.5 :number style=percent}.
> ```
>
> should format in a manner similar to:
> > The total was 50%.

###### Selection

The _function_ `:integer` performs selection as described in [Number Selection](#number-selection) below.

##### Number Operands

The _operand_ of a number function is either an implementation-defined type or
a literal whose contents match the `number-literal` production in the [ABNF](#complete-abnf).
All other values produce an _Invalid Expression_ error.

> For example, in Java, any subclass of `java.lang.Number` plus the primitive
> types (`byte`, `short`, `int`, `long`, `float`, `double`, etc.)
> might be considered as the "implementation-defined numeric types".
> Implementations in other programming languages would define different types
> or classes according to their local needs.

> [!NOTE]
> String values passed as variables in the _formatting context_'s
> _input mapping_ can be formatted as numeric values as long as their
> contents match the `number-literal` production in the [ABNF](#complete-abnf).
>
> For example, if the value of the variable `num` were the string
> `-1234.567`, it would behave identically to the local
> variable in this example:
>
> ```
> .local $example = {|-1234.567| :number}
> {{{$num :number} == {$example}}}
> ```
2329
2330> [!NOTE]
2331> Implementations are encouraged to provide support for compound types or data structures
2332> that provide additional semantic meaning to the formatting of number-like values.
2333> For example, in ICU4J, the type `com.ibm.icu.util.Measure` can be used to communicate
2334> a value that includes a unit
2335> or the type `com.ibm.icu.util.CurrencyAmount` can be used to set the currency and related
2336> options (such as the number of fraction digits).
2337
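The operand check can be sketched in Python as follows. The regular expression is an assumed hand transcription of the `number-literal` production (optional `-`, an integer part without leading zeros, an optional fraction, and an optional case-insensitive exponent). Check it against the complete ABNF before relying on it. In this sketch, the "implementation-defined numeric types" are taken to be Python's `int`, `float` and `Decimal`. `InvalidExpression` is a hypothetical exception standing in for the spec's error.

```python
import re
from decimal import Decimal

# Assumed transcription of the MF2 `number-literal` production.
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")


class InvalidExpression(Exception):
    """Hypothetical stand-in for the spec's Invalid Expression error."""


def resolve_number_operand(operand):
    """Return a Decimal for a valid number operand, or raise InvalidExpression."""
    # Implementation-defined numeric types (bool is deliberately excluded).
    if isinstance(operand, (int, float, Decimal)) and not isinstance(operand, bool):
        return Decimal(str(operand))
    # Strings, e.g. from the input mapping, must match `number-literal` exactly.
    if isinstance(operand, str) and NUMBER_LITERAL.fullmatch(operand):
        return Decimal(operand)
    raise InvalidExpression(f"not a number operand: {operand!r}")
```

With this sketch, the string `"-1234.567"` from the earlier note resolves to the same value as the literal `|-1234.567|`. Strings such as `"01"` or `"+1"` are rejected because the production allows neither leading zeros nor a plus sign.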
##### Digit Size Options

Some _options_ of number _functions_ are defined to take a "digit size option".
Implementations of number _functions_ use these _options_ to control aspects of numeric display
such as the number of fraction, integer, or significant digits.

A "digit size option" is an _option_ value that the _function_ interprets
as a small integer value greater than or equal to zero.
Implementations MAY define an upper limit on the resolved value
of a digit size option consistent with that implementation's practical limits.

In most cases, the value of a digit size option will be a string that
encodes the value as a decimal integer.
Implementations MAY also accept implementation-defined types as the value.
When provided as a string, the representation of a digit size option matches the following ABNF:
> ```abnf
> digit-size-option = "0" / (%x31-39 [DIGIT])
> ```

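A minimal Python sketch of reading a digit size option follows. The string form is checked against the ABNF above (`0` through `99`, no leading zeros), and a plain `int` is accepted as an example of an implementation-defined type. The `upper_limit` parameter stands in for the implementation-defined limit the text permits; its default of 20 is an arbitrary assumption. The function and exception names are also hypothetical.

```python
import re

# "0", or one or two digits without a leading zero (0 through 99).
DIGIT_SIZE_OPTION = re.compile(r"0|[1-9][0-9]?")


class InvalidOption(ValueError):
    """Hypothetical error type for a malformed digit size option."""


def parse_digit_size_option(value, upper_limit: int = 20) -> int:
    """Interpret an option value as a small non-negative integer."""
    if isinstance(value, int) and not isinstance(value, bool):
        size = value  # an implementation-defined (non-string) type
    elif isinstance(value, str) and DIGIT_SIZE_OPTION.fullmatch(value):
        size = int(value)
    else:
        raise InvalidOption(f"not a digit size option: {value!r}")
    if not 0 <= size <= upper_limit:
        raise InvalidOption(f"digit size {size} outside 0..{upper_limit}")
    return size
```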
##### Number Selection

Number selection has three modes:
- `exact` selection matches the operand to explicit numeric keys exactly
- `plural` selection matches the operand to explicit numeric keys exactly
  or to plural rule categories if there is no explicit match
- `ordinal` selection matches the operand to explicit numeric keys exactly
  or to ordinal rule categories if there is no explicit match

When implementing [`MatchSelectorKeys(resolvedSelector, keys)`](#resolve-preferences)
where `resolvedSelector` is the resolved value of a _selector_ _expression_
and `keys` is a list of strings,
numeric selectors behave as described below.

1. Let `exact` be the JSON string representation of the numeric value of `resolvedSelector`.
   (See [Determining Exact Literal Match](#determining-exact-literal-match) for details.)
1. Let `keyword` be a string which is the result of [rule selection](#rule-selection) on `resolvedSelector`.
1. Let `resultExact` be a new empty list of strings.
1. Let `resultKeyword` be a new empty list of strings.
1. For each string `key` in `keys`:
   1. If the value of `key` matches the production `number-literal`, then
      1. If `key` and `exact` consist of the same sequence of Unicode code points, then
         1. Append `key` as the last element of the list `resultExact`.
   1. Else if `key` is one of the keywords `zero`, `one`, `two`, `few`, `many`, or `other`, then
      1. If `key` and `keyword` consist of the same sequence of Unicode code points, then
         1. Append `key` as the last element of the list `resultKeyword`.
   1. Else, emit a _Selection Error_.
1. Return a new list whose elements are the concatenation of the elements (in order) of `resultExact` followed by the elements (in order) of `resultKeyword`.

> [!NOTE]
> Implementations are not required to implement this exactly as written.
> However, the observed behavior must be consistent with what is described here.

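The following Python sketch transcribes the steps above. It makes three assumptions. First, selectors are integers, which is the only case the text fully defines. Second, `json.dumps` serves as the JSON number serialization. Third, the result of rule selection arrives as a precomputed `keyword` argument. A _Selection Error_ is recorded in an `errors` list rather than aborting selection. `match_selector_keys` and `NUMBER_LITERAL` are illustrative names, not part of any published API.

```python
from __future__ import annotations

import json
import re

NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")
PLURAL_KEYWORDS = {"zero", "one", "two", "few", "many", "other"}


def match_selector_keys(value: int, keyword: str, keys: list[str],
                        errors: list[str]) -> list[str]:
    """Return matching keys: exact numeric matches first, then keyword matches.

    `keyword` is the result of rule selection ("" when select=exact).
    """
    exact = json.dumps(value)  # JSON number serialization, e.g. 1 -> "1"
    result_exact = []
    result_keyword = []
    for key in keys:
        if NUMBER_LITERAL.fullmatch(key):
            if key == exact:
                result_exact.append(key)
        elif key in PLURAL_KEYWORDS:
            if key == keyword:
                result_keyword.append(key)
        else:
            errors.append(f"Selection Error: invalid key {key!r}")
    return result_exact + result_keyword
```

Consider the English message in the default `select` example. A value of `1` has the cardinal keyword `one`, so keys `1` and `one` yield `["1", "one"]`, and the exact variant is preferred. A value of `5` (keyword `other`) matches neither key, which leaves the `*` variant. The catch-all `*` is assumed to be handled by pattern selection and not passed in `keys`.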
###### Rule Selection

If the option `select` is set to `exact`, rule-based selection is not used.
Return the empty string.

> [!NOTE]
> Since valid keys cannot be the empty string in a numeric expression, returning the
> empty string disables keyword selection.

If the option `select` is set to `plural`, selection should be based on CLDR plural rule data
of type `cardinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
for examples.

If the option `select` is set to `ordinal`, selection should be based on CLDR plural rule data
of type `ordinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
for examples.

Apply the rules defined by CLDR to the resolved value of the operand and the function options,
and return the resulting keyword.
If no rules match, return `other`.

> **Example.**
> In CLDR 44, the Czech (`cs`) plural rule set can be found in the
> [CLDR 44 language plural rules chart](https://www.unicode.org/cldr/charts/44/supplemental/language_plural_rules.html#cs).
>
> A message in Czech might be:
> ```
> .match {$numDays :number}
> one  {{{$numDays} den}}
> few  {{{$numDays} dny}}
> many {{{$numDays} dne}}
> *    {{{$numDays} dní}}
> ```
> Using the rules found above, the results of various _operand_ values might look like:
>
> | Operand value | Keyword | Formatted Message |
> |---|---|---|
> | 1 | `one` | 1 den |
> | 2 | `few` | 2 dny |
> | 5 | `other` | 5 dní |
> | 22 | `other` | 22 dní |
> | 27 | `other` | 27 dní |
> | 2.4 | `many` | 2,4 dne |

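To make the Czech example concrete, here is a Python sketch of rule selection. Instead of loading rules from CLDR data, it hard-codes the Czech cardinal rules: `one: i = 1 and v = 0`, `few: i = 2..4 and v = 0`, and `many: v != 0`. A real implementation would normally use a plural-rules library such as ICU's `PluralRules`. The helper names are illustrative, and ordinal rules are deliberately left out. Operand values are assumed to be plain decimal strings, so that visible fraction digits (`v`) are preserved.

```python
from __future__ import annotations


def plural_operands(value: str) -> tuple[int, int]:
    """Return the CLDR operands (i, v) for a plain decimal string such as "2.4".

    i is the absolute integer part; v is the number of visible fraction digits.
    """
    integer_part, _, fraction = value.lstrip("-").partition(".")
    return int(integer_part), len(fraction)


def czech_cardinal(value: str) -> str:
    """Cardinal plural category for Czech, hard-coded from the CLDR rules."""
    i, v = plural_operands(value)
    if i == 1 and v == 0:
        return "one"
    if 2 <= i <= 4 and v == 0:
        return "few"
    if v != 0:
        return "many"
    return "other"  # no rule matched


def rule_selection(value: str, select: str = "plural") -> str:
    """Keyword for number selection; "" disables keyword matching."""
    if select == "exact":
        return ""
    if select == "plural":
        return czech_cardinal(value)
    raise NotImplementedError("ordinal rules are not included in this sketch")
```

With these rules, `czech_cardinal("22")` returns `other`, because the CLDR `few` category for Czech covers only the integers 2 through 4.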
###### Determining Exact Literal Match

> [!IMPORTANT]
> The behavior of exact literal matching is only defined for non-zero-filled
> integer values.
> Annotations that use fraction digits or significant digits might work in specific
> implementation-defined ways.
> Users should avoid depending on these types of keys in message selection.

Number literals in the MessageFormat 2 syntax use the
[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6).
A `resolvedSelector` exactly matches a numeric literal `key`
if, when the numeric value of `resolvedSelector` is serialized using the format for a JSON number,
the two strings are equal.

> [!NOTE]
> Only integer matching is required in the Technical Preview.
> Feedback describing use cases for selection based on fractional or significant digits
> would be helpful.
> Until then, users should avoid matching on fractional numbers or significant digits.

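Python's `json` module produces a JSON number serialization, which makes the integer-only caveat easy to see. An integer serializes to its plain digits. A floating-point value that is numerically equal serializes with a fraction, so in this sketch it would not exactly match the key `1`. The function name `exactly_matches` is illustrative.

```python
import json


def exactly_matches(resolved_value, key: str) -> bool:
    """True if the JSON number serialization of the value equals the key."""
    return json.dumps(resolved_value) == key


print(exactly_matches(1, "1"))      # True
print(exactly_matches(1.0, "1"))    # False: json.dumps(1.0) == "1.0"
print(exactly_matches(-42, "-42"))  # True
```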
#### Date and Time Value Formatting

This subsection describes the functions and options for date/time formatting.
Selection based on date and time values is not required in this release.

> [!NOTE]
> Selection based on date/time types is not required by MF2.
> Implementations should use care when defining selectors based on date/time types.
> The types of queries found in implementations such as `java.time.TemporalAccessor`
> are complex and user expectations may be inconsistent with good I18N practices.

##### The `:datetime` function

The function `:datetime` is used to format date/time values, including
the ability to compose user-specified combinations of fields.

If no options are specified, this function defaults to the following:
- `{$d :datetime}` is the same as `{$d :datetime dateStyle=short timeStyle=short}`

> [!NOTE]
> The default formatting behavior of `:datetime` is inconsistent with `Intl.DateTimeFormat`
> in JavaScript and with `{d,date}` in ICU MessageFormat 1.0.
> This is because, unlike those implementations, `:datetime` is distinct from `:date` and `:time`.

###### Operands

The _operand_ of the `:datetime` function is either
an implementation-defined date/time type
or a _date/time literal value_, as defined in [Date and Time Operands](#date-and-time-operands).
All other _operand_ values produce an _Invalid Expression_ error.

###### Options

The `:datetime` function can use either the appropriate _style options_
or a collection of _field options_ (but not both) to control the formatted
output.

If both are specified, an _Invalid Expression_ error MUST be emitted
and a _fallback value_ used as the resolved value of the _expression_.

**Style Options**

The function `:datetime` has these _style options_:
- `dateStyle`
  - `full`
  - `long`
  - `medium`
  - `short`
- `timeStyle`
  - `full`
  - `long`
  - `medium`
  - `short`

**Field Options**

_Field options_ describe which fields to include in the formatted output
and what format to use for that field.
The implementation may use these _options_ to configure which fields
appear in the formatted output.

> [!NOTE]
> _Field options_ do not have default values because they are only to be used
> to compose the formatter.

> [!IMPORTANT]
> The value `2-digit` for some _field options_ **must** be quoted
> in the MessageFormat syntax because it starts with a digit
> but does not match the `number-literal` production in the ABNF.
> ```
> .local $correct = {$someDate :datetime year=|2-digit|}
> .local $syntaxError = {$someDate :datetime year=2-digit}
> ```

The function `:datetime` has the following _field options_:
- `weekday`
  - `long`
  - `short`
  - `narrow`
- `era`
  - `long`
  - `short`
  - `narrow`
- `year`
  - `numeric`
  - `2-digit`
- `month`
  - `numeric`
  - `2-digit`
  - `long`
  - `short`
  - `narrow`
- `day`
  - `numeric`
  - `2-digit`
- `hour`
  - `numeric`
  - `2-digit`
- `minute`
  - `numeric`
  - `2-digit`
- `second`
  - `numeric`
  - `2-digit`
- `fractionalSecondDigits`
  - `1`
  - `2`
  - `3`
- `hourCycle` (default is locale-specific)
  - `h11`
  - `h12`
  - `h23`
  - `h24`
- `timeZoneName`
  - `long`
  - `short`
  - `shortOffset`
  - `longOffset`
  - `shortGeneric`
  - `longGeneric`

> [!NOTE]
> The _field options_ above do not have default values because they are only to be used
> as overrides for locale- and value-dependent, implementation-defined defaults.

The following date/time options are **not** part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during the Technical Preview:
- `calendar` (default is locale-specific)
  - valid [Unicode Calendar Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCalendarIdentifier)
- `numberingSystem` (default is locale-specific)
  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
- `timeZone` (default is system default time zone or UTC)
  - valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)

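The option rules for `:datetime` can be sketched in Python. With no options, the sketch applies the documented default (`dateStyle=short timeStyle=short`). Mixing _style options_ with _field options_ is reported as an _Invalid Expression_ error, after which the caller would use a _fallback value_. The function and exception names are hypothetical, and the sketch checks only option names, not their values.

```python
from __future__ import annotations

STYLE_OPTIONS = {"dateStyle", "timeStyle"}
FIELD_OPTIONS = {
    "weekday", "era", "year", "month", "day", "hour", "minute", "second",
    "fractionalSecondDigits", "hourCycle", "timeZoneName",
}


class InvalidExpression(Exception):
    """Hypothetical stand-in for the spec's Invalid Expression error."""


def resolve_datetime_options(options: dict[str, str]) -> dict[str, str]:
    """Apply the `:datetime` default and reject mixed style/field options."""
    if not options:
        return {"dateStyle": "short", "timeStyle": "short"}
    has_style = any(name in STYLE_OPTIONS for name in options)
    has_field = any(name in FIELD_OPTIONS for name in options)
    if has_style and has_field:
        raise InvalidExpression("style options and field options cannot be combined")
    return dict(options)
```

The default is applied only when no options at all are given. For example, `{$d :datetime dateStyle=long}` keeps just `dateStyle` and gains no `timeStyle`.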
##### The `:date` function

The function `:date` is used to format the date portion of date/time values.

If no options are specified, this function defaults to the following:
- `{$d :date}` is the same as `{$d :date style=short}`

###### Operands

The _operand_ of the `:date` function is either
an implementation-defined date/time type
or a _date/time literal value_, as defined in [Date and Time Operands](#date-and-time-operands).
All other _operand_ values produce an _Invalid Expression_ error.

###### Options

The function `:date` has these _options_:
- `style`
  - `full`
  - `long`
  - `medium`
  - `short` (default)

##### The `:time` function

The function `:time` is used to format the time portion of date/time values.

If no options are specified, this function defaults to the following:
- `{$t :time}` is the same as `{$t :time style=short}`

###### Operands

The _operand_ of the `:time` function is either
an implementation-defined date/time type
or a _date/time literal value_, as defined in [Date and Time Operands](#date-and-time-operands).
All other _operand_ values produce an _Invalid Expression_ error.

###### Options

The function `:time` has these _options_:
- `style`
  - `full`
  - `long`
  - `medium`
  - `short` (default)

##### Date and Time Operands

The _operand_ of a date/time function is either
an implementation-defined date/time type
or a _date/time literal value_, as defined below.
All other _operand_ values produce an _Invalid Expression_ error.

A **_<dfn>date/time literal value</dfn>_** is a non-empty string consisting of an ISO 8601 date,
or an ISO 8601 datetime optionally followed by a timezone offset.
As implementations differ slightly in their parsing of such strings,
ISO 8601 date and datetime values not matching the following regular expression MAY also be supported.
Furthermore, matching this regular expression does not guarantee validity,
given the variable number of days in each month.

```regexp
(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?
```

When the time is not present, implementations SHOULD use `00:00:00` as the time.
When the offset is not present, implementations SHOULD use a floating time type
(such as Java's `java.time.LocalDateTime`) to represent the time value.
For more information, see [Working with Timezones](https://w3c.github.io/timezone).

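The following Python sketch applies the regular expression above and then builds a `datetime`, following the SHOULD recommendations. A missing time becomes `00:00:00`, and a missing offset yields a naive ("floating") `datetime`, Python's closest analogue to `java.time.LocalDateTime`. Named groups were added to the expression for readability without changing what it matches. The function name and the `InvalidExpression` exception are hypothetical. The `datetime` constructor validates the day of the month, so the sketch also rejects inputs such as `2024-02-30` that the regular expression accepts.

```python
import re
from datetime import datetime, timedelta, timezone

DATE_TIME_LITERAL = re.compile(
    r"(?!0000)(?P<year>[0-9]{4})-(?P<month>0[1-9]|1[0-2])"
    r"-(?P<day>0[1-9]|[12][0-9]|3[01])"
    r"(?:T(?P<hour>[01][0-9]|2[0-3]):(?P<minute>[0-5][0-9]):(?P<second>[0-5][0-9])"
    r"(?:\.(?P<fraction>[0-9]{1,3}))?"
    r"(?P<offset>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?"
)


class InvalidExpression(Exception):
    """Hypothetical stand-in for the spec's Invalid Expression error."""


def parse_date_time_literal(text: str) -> datetime:
    m = DATE_TIME_LITERAL.fullmatch(text)
    if m is None:
        raise InvalidExpression(f"not a date/time literal: {text!r}")
    g = m.groupdict()
    tz = None  # no offset: naive ("floating") datetime
    if g["offset"] == "Z":
        tz = timezone.utc
    elif g["offset"]:
        sign = -1 if g["offset"][0] == "-" else 1
        hours, minutes = int(g["offset"][1:3]), int(g["offset"][4:6])
        tz = timezone(sign * timedelta(hours=hours, minutes=minutes))
    # Up to three fraction digits, padded to microseconds ("5" -> 500000).
    micros = int((g["fraction"] or "0").ljust(6, "0"))
    try:
        return datetime(
            int(g["year"]), int(g["month"]), int(g["day"]),
            int(g["hour"] or 0), int(g["minute"] or 0), int(g["second"] or 0),
            micros, tzinfo=tz,
        )
    except ValueError as err:  # e.g. 2024-02-30 matches the regex but is invalid
        raise InvalidExpression(str(err)) from err
```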
> [!IMPORTANT]
> The [ABNF](#complete-abnf) and [syntax](#syntax) of MF2
> do not formally define date/time literals.
> This means that a _message_ can be syntactically valid but produce
> an _Operand Mismatch Error_ at runtime.

> [!NOTE]
> String values passed as variables in the _formatting context_'s
> _input mapping_ can be formatted as date/time values as long as their
> contents are date/time literals.
>
> For example, if the value of the variable `now` were the string
> `2024-02-06T16:40:00Z`, it would behave identically to the local
> variable in this example:
> ```
> .local $example = {|2024-02-06T16:40:00Z| :datetime}
> {{{$now :datetime} == {$example}}}
> ```

> [!NOTE]
> True time zone support in serializations is expected to coincide with the adoption
> of Temporal in JavaScript.
> The form of these serializations is known and is a de facto standard.
> Support for these extensions is expected to be required after the Technical Preview.
> See [draft-ietf-sedate-datetime-extended](https://datatracker.ietf.org/doc/draft-ietf-sedate-datetime-extended/).

## Formatting

This section defines the behavior of a MessageFormat 2.0 implementation
when formatting a message for display in a user interface, or for some later processing.

To start, we presume that a _message_ has either been parsed from its syntax
or created from a data model description.
If this construction has encountered any _Syntax Errors_ or _Data Model Errors_,
an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result.

Formatting of a _message_ is defined by the following operations:

- **_Expression and Markup Resolution_** determines the value of an _expression_ or _markup_,
  with reference to the current _formatting context_.
  This can include multiple steps,
  such as looking up the value of a variable and calling formatting functions.
  The form of the resolved value is implementation-defined, and the
  value might not be evaluated or formatted yet.
  However, it needs to be "formattable", that is, it contains everything required
  by the eventual formatting.

  The resolution of _text_ is rather straightforward,
  and is detailed under _literal resolution_.

> [!IMPORTANT]
>
> **This specification does not require either eager or lazy _expression resolution_ of _message_
> parts; do not construe any requirement in this document as requiring either.**
>
> Implementations are not required to evaluate all parts of a _message_ when
> parsing, processing, or formatting.
> In particular, an implementation MAY choose not to evaluate or resolve the
> value of a given _expression_ until it is actually used by a
> selection or formatting process.
> However, when an _expression_ is resolved, it MUST behave as if all preceding
> _declarations_ and _selectors_ affecting _variables_ referenced by that _expression_
> have already been evaluated in the order in which the relevant _declarations_
> and _selectors_ appear in the _message_.

- **_Pattern Selection_** determines which of a message's _patterns_ is formatted.
  For a message with no _selectors_, this is simple, as there is only one _pattern_.
  With _selectors_, this will depend on their resolution.

  At the start of _pattern selection_,
  if the _message_ contains any _reserved statements_,
  emit an _Unsupported Statement_ error.

- **_Formatting_** takes the resolved values of the selected _pattern_,
  and produces the formatted result for the _message_.
  Depending on the implementation, this result could be a single concatenated string,
  an array of objects, an attributed string, or some other locally appropriate data type.

Formatter implementations are not required to expose
the _expression resolution_ and _pattern selection_ operations to their users,
or even use them in their internal processing,
as long as the final _formatting_ result is made available to users
and the observable behavior of the formatter matches that described here.

### Formatting Context

A message's **_formatting context_** represents the data and procedures that are required
for the message's _expression resolution_, _pattern selection_ and _formatting_.

At a minimum, it includes:

- Information on the current **_locale_**,
  potentially including a fallback chain of locales.
  This will be passed on to formatting functions.

- Information on the base directionality of the _message_ and its _text_ tokens.
  This will be used by strategies for bidirectional isolation,
  and can be used to set the base direction of the _message_ upon display.

- An **_<dfn>input mapping</dfn>_** of string identifiers to values,
  defining variable values that are available during _variable resolution_.
  This is often determined by a user-provided argument of a formatting function call.

- The _function registry_,
  providing the implementations of the functions referred to by message _functions_.

- Optionally, a fallback string to use for the message
  if it contains any _Syntax Errors_ or _Data Model Errors_.

Implementations MAY include additional fields in their _formatting context_.

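One possible model of the formatting context is a Python data class that holds the minimum fields listed above. The following details are all assumptions made for illustration: the field names, the `"ltr"`/`"rtl"`/`"auto"` direction values, and a function registry represented as a dictionary of callables with a hypothetical `(locale, options, operand)` signature. The read-only view of the inputs reflects the requirement that function access to the context be minimal and read-only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Callable, Mapping, Optional

# Hypothetical signature: function(locale, options, operand) -> resolved value
MessageFunction = Callable[[str, Mapping[str, Any], Any], Any]


@dataclass(frozen=True)
class FormattingContext:
    locales: tuple[str, ...]  # current locale first, then any fallback locales
    direction: str = "auto"  # base direction: "ltr", "rtl", or "auto"
    input_mapping: Mapping[str, Any] = field(default_factory=dict)
    function_registry: Mapping[str, MessageFunction] = field(default_factory=dict)
    fallback_string: Optional[str] = None  # used on Syntax or Data Model Errors

    def read_only_inputs(self) -> Mapping[str, Any]:
        """Return a read-only view of a copy of the input mapping."""
        return MappingProxyType(dict(self.input_mapping))
```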
### Expression and Markup Resolution

_Expressions_ are used in _declarations_, _selectors_, and _patterns_.
_Markup_ is only used in _patterns_.

In a _declaration_, the resolved value of the _expression_ is bound to a _variable_,
which is available for use by later _expressions_.
Since a _variable_ can be referenced in different ways later,
implementations SHOULD NOT immediately fully format the value for output.

In an _input-declaration_, the _variable_ operand of the _variable-expression_
identifies not only the name of the external input value,
but also the _variable_ to which the resolved value of the _variable-expression_ is bound.

In _selectors_, the resolved value of an _expression_ is used for _pattern selection_.

In a _pattern_, the resolved value of an _expression_ or _markup_ is used in its _formatting_.

The form that resolved values take is implementation-dependent,
and different implementations MAY choose to perform different levels of resolution.

> For example, the resolved value of the _expression_ `{|0.40| :number style=percent}`
> could be an object such as
>
> ```
> { value: Number('0.40'),
>   formatter: NumberFormat(locale, { style: 'percent' }) }
> ```
>
> Alternatively, it could be an instance of an ICU4J `FormattedNumber`,
> or some other locally appropriate value.

Depending on the presence or absence of a _variable_ or _literal_ operand
and a _function_, _private-use annotation_, or _reserved annotation_,
the resolved value of the _expression_ is determined as follows:

If the _expression_ contains a _reserved annotation_,
an _Unsupported Expression_ error is emitted and
a _fallback value_ is used as the resolved value of the _expression_.

Else, if the _expression_ contains a _private-use annotation_,
its resolved value is defined according to the implementation's specification.

Else, if the _expression_ contains a _function_ _annotation_,
its resolved value is defined by _function resolution_.

Else, if the _expression_ consists of a _variable_,
its resolved value is defined by _variable resolution_.
An implementation MAY perform additional processing
when resolving the value of an _expression_
that consists only of a _variable_.

> For example, it could apply _function resolution_ using a _function_
> and a set of _options_ chosen based on the value or type of the _variable_.
> So, given a _message_ like this:
>
> ```
> Today is {$date}
> ```
>
> If the value passed in the _variable_ were a date object,
> such as a JavaScript `Date` or a Java `java.util.Date` or `java.time.Temporal`,
> the implementation could interpret the _placeholder_ `{$date}` as if
> the pattern included the function `:datetime` with some set of default options.

Else, the _expression_ consists of a _literal_.
Its resolved value is defined by _literal resolution_.

> [!NOTE]
> This means that a _literal_ value with no _annotation_
> is always treated as a string.
> To represent values that are not strings as a _literal_,
> an _annotation_ needs to be provided:
>
> ```
> .local $aNumber = {1234 :number}
> .local $aDate = {|2023-08-30| :datetime}
> .local $aFoo = {|some foo| :foo}
> {{You have {42 :number}}}
> ```

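The order of the checks above can be sketched in Python. Only the order of the branches comes from the text. The `Expression` shape, the resolver callables passed in as parameters, and the string-based error reporting are hypothetical simplifications.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Expression:
    operand_kind: Optional[str] = None  # "variable", "literal", or None
    operand: Optional[str] = None
    annotation_kind: Optional[str] = None  # "function", "private-use", "reserved", or None


def resolve_expression(
    expr: Expression,
    errors: list[str],
    fallback: Callable[[Expression], Any],
    private_use: Callable[[Expression], Any],
    function: Callable[[Expression], Any],
    variable: Callable[[str], Any],
    literal: Callable[[str], Any],
) -> Any:
    if expr.annotation_kind == "reserved":
        errors.append("Unsupported Expression")
        return fallback(expr)
    if expr.annotation_kind == "private-use":
        return private_use(expr)  # defined by the implementation
    if expr.annotation_kind == "function":
        return function(expr)  # function resolution
    if expr.operand_kind == "variable":
        return variable(expr.operand)  # variable resolution
    return literal(expr.operand)  # literal resolution: always a string
```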
#### Literal Resolution

The resolved value of a _text_ or a _literal_ is
the character sequence of the _text_ or _literal_
after any character escape has been converted to the escaped character.

When a _literal_ is used as an _operand_
or on the right-hand side of an _option_,
the formatting function MUST treat its resolved value the same
whether its value was originally _quoted_ or _unquoted_.

> For example,
> the _option_ `foo=42` and the _option_ `foo=|42|` are treated as identical.

The resolution of a _text_ or _literal_ MUST produce a string.

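A minimal Python sketch of literal resolution follows. It assumes that the escapable characters are backslash, `{`, `|` and `}`; confirm this against the [ABNF](#complete-abnf) for the version in use. The function name is illustrative. Each escape sequence is replaced by the escaped character, and the result is always a string.

```python
import re

# Assumption: a backslash may escape "\", "{", "|" or "}".
ESCAPE = re.compile(r"\\([\\{|}])")


def resolve_literal(source: str) -> str:
    """Resolve a text or literal: each escape becomes the escaped character."""
    return ESCAPE.sub(r"\1", source)


print(resolve_literal(r"C:\\"))  # C:\
```

Because resolution works on the character content alone, the option values `42` and `|42|` both resolve to the string `"42"`.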
#### Variable Resolution

To resolve the value of a _variable_,
its _name_ is used to identify either a local variable or an input variable.
If a _declaration_ exists for the _variable_, its resolved value is used.
Otherwise, the _variable_ is an implicit reference to an input value,
and its value is looked up from the _formatting context_ _input mapping_.

The resolution of a _variable_ MAY fail if no value is identified for its _name_.
If this happens, an _Unresolved Variable_ error MUST be emitted.
If a _variable_ would resolve to a _fallback value_,
this MUST also be considered a failure.

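In the following Python sketch, variable resolution checks local declarations before the input mapping. Both a missing name and a binding to a fallback value count as failures. `Fallback` is a hypothetical marker class for fallback values. `UnresolvedVariable` is a hypothetical exception standing in for the _Unresolved Variable_ error, which lets the caller substitute a _fallback value_.

```python
from typing import Any, Mapping


class Fallback(str):
    """Hypothetical marker type for a resolved value that is a fallback value."""


class UnresolvedVariable(LookupError):
    """Hypothetical stand-in for the spec's Unresolved Variable error."""


def resolve_variable(
    name: str,
    local_values: Mapping[str, Any],
    input_mapping: Mapping[str, Any],
) -> Any:
    """Resolve `$name`: local declarations first, then the input mapping."""
    if name in local_values:
        value = local_values[name]  # a declaration exists: use its resolved value
    elif name in input_mapping:
        value = input_mapping[name]  # implicit reference to an input value
    else:
        raise UnresolvedVariable(f"${name}")
    if isinstance(value, Fallback):
        # Resolving to a fallback value is also a failure.
        raise UnresolvedVariable(f"${name} resolved to a fallback value")
    return value
```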
#### Function Resolution

To resolve an _expression_ with a _function_ _annotation_,
the following steps are taken:

1. If the _expression_ includes an _operand_, resolve its value.
   If this fails, use a _fallback value_ for the _expression_.
2. Resolve the _identifier_ of the _function_ and, based on the starting sigil,
   find the appropriate function implementation to call.
   If the implementation cannot find the function,
   or if the _identifier_ includes a _namespace_ that the implementation does not support,
   emit an _Unknown Function_ error
   and use a _fallback value_ for the _expression_.

   Implementations are not required to implement _namespaces_ or installable
   _function registries_.

3. Perform _option resolution_.

4. Call the function implementation with the following arguments:

   - The current _locale_.
   - The resolved mapping of _options_.
   - If the _expression_ includes an _operand_, its resolved value.

   The form that resolved _operand_ and _option_ values take is implementation-defined.

   A _declaration_ binds the resolved value of an _expression_
   to a _variable_.
   Thus, the result of one _function_ is potentially the _operand_
   of another _function_,
   or the value of one of the _options_ for another function.
   For example, in
   ```
   .input {$n :number minIntegerDigits=3}
   .local $n1 = {$n :number maxFractionDigits=3}
   ```
   the value bound to `$n` is the
   resolved value used as the _operand_
   of the `:number` _function_
   when resolving the value of the _variable_ `$n1`.

   Implementations that provide a means for defining custom functions
   SHOULD provide a means for function implementations
   to return values that contain enough information
   (e.g. a representation of
   the resolved _operand_ and _option_ values
   that the function was called with)
   to be used as arguments to subsequent calls
   to the function implementations.
   For example, an implementation might define an interface that allows custom function implementation.
   Such an interface SHOULD define an implementation-specific
   argument type `T` and return type `U`
   for implementations of functions
   such that `U` can be coerced to `T`.
   Implementations of a _function_ SHOULD emit an
   _Invalid Expression_ error for _operands_ whose resolved value
   or type is not supported.

> [!NOTE]
> The behavior of the previous example is
> currently implementation-dependent. Suppose that
> the external input variable `n` is bound to the string `"1"`,
> and that the implementation formats to a string.
> The formatted result of the following message:
>
> ```
> .input {$n :number minIntegerDigits=3}
> .local $n1 = {$n :number maxFractionDigits=3}
> {{{$n1}}}
> ```
>
> depends on whether the options are preserved
> between the resolution of the first `:number` _annotation_
> and the resolution of the second `:number` _annotation_:
> a conformant implementation
> could produce either "001" or "1".
>
> Each function **specification** MAY have
> its own rules to preserve some options in the returned structure
> and discard others.
> In instances where a function specification does not determine whether an option is preserved or discarded,
> each function **implementation** of that specification MAY have
> its own rules to preserve some options in the returned structure
> and discard others.

> [!NOTE]
> During the Technical Preview,
> feedback on how the registry describes
> the flow of _resolved values_ and _options_
> from one _function_ to another,
> and on what requirements this specification should impose,
> is highly desired.

   An implementation MAY pass additional arguments to the function,
   as long as reasonable precautions are taken to keep the function interface
   simple and minimal, and avoid introducing potential security vulnerabilities.

   An implementation MAY define its own functions.
   An implementation MAY allow custom functions to be defined by users.

   Function access to the _formatting context_ MUST be minimal and read-only,
   and execution time SHOULD be limited.

   Implementation-defined _functions_ SHOULD use an implementation-defined _namespace_.

5. If the call succeeds,
   resolve the value of the _expression_ as the result of that function call.

   If the call fails or does not return a valid value,
   emit an _Invalid Expression_ error.

   Implementations MAY provide a mechanism for the _function_ to provide
   additional detail about internal failures.
   Specifically, if the cause of the failure was that the datatype, value, or format of the
   _operand_ did not match that expected by the _function_,
   the _function_ might cause an _Operand Mismatch Error_ to be emitted.

   In all failure cases, use the _fallback value_ for the _expression_ as the resolved value.

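Steps 2, 4 and 5 can be sketched in Python as follows. The sketch rests on three assumptions. The registry is a plain dictionary keyed by function name, without namespace support, which the text permits. The operand (step 1) and the options (step 3) have already been resolved by the caller. A return value of `None` counts as "not a valid value". The error strings, the `fallback` argument, and `OperandMismatch` are hypothetical simplifications of the spec's error and _fallback value_ handling.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping, Optional

# Hypothetical function signature: (locale, options, operand) -> resolved value
MessageFunction = Callable[[str, Mapping[str, Any], Optional[Any]], Any]


class OperandMismatch(Exception):
    """A function may raise this to request an Operand Mismatch Error."""


def resolve_function(
    name: str,
    operand: Optional[Any],
    options: Mapping[str, Any],
    locale: str,
    registry: Mapping[str, MessageFunction],
    fallback: str,
    errors: list[str],
) -> Any:
    impl = registry.get(name)  # step 2: find the implementation
    if impl is None:
        errors.append(f"Unknown Function: :{name}")
        return fallback
    try:
        # Step 4: pass a copy so the function cannot mutate the caller's options.
        result = impl(locale, dict(options), operand)
    except OperandMismatch:
        errors.append(f"Operand Mismatch Error in :{name}")
        return fallback
    except Exception:
        errors.append(f"Invalid Expression in :{name}")
        return fallback
    if result is None:  # step 5: no valid value was returned
        errors.append(f"Invalid Expression in :{name}")
        return fallback
    return result
```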
##### Option Resolution

The result of resolving _option_ values is an unordered mapping of string identifiers to values.

For each _option_:

- Resolve the _identifier_ of the _option_.
- If the _option_'s right-hand side successfully resolves to a value,
   bind the _identifier_ of the _option_ to the resolved value in the mapping.
- Otherwise, bind the _identifier_ of the _option_ to an unresolved value in the mapping.
   Implementations MAY later remove this value before calling the _function_.
   (Note that an _Unresolved Variable_ error will have been emitted.)

Errors MAY be emitted during _option resolution_,
but it always resolves to some mapping of string identifiers to values.
This mapping can be empty.

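Option resolution can be sketched in Python as building a dictionary from the _options_, with a marker recorded for any right-hand side that fails to resolve. Several details are assumptions made for this example: the `UNRESOLVED` sentinel, the `(identifier, kind, source)` option tuples, and a `resolve_variable` callable that raises `LookupError` for an unresolved variable. The separate `drop_unresolved` step illustrates the optional removal that the text permits but does not require.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Mapping

UNRESOLVED = object()  # hypothetical marker for an option value that failed to resolve


def resolve_options(
    options: Iterable[tuple[str, str, str]],
    resolve_variable: Callable[[str], Any],
    resolve_literal: Callable[[str], str],
) -> dict[str, Any]:
    """Map each option identifier to its resolved value (or UNRESOLVED).

    Each option is (identifier, kind, source), with kind "literal" or "variable".
    """
    mapping = {}
    for identifier, kind, source in options:
        try:
            if kind == "variable":
                mapping[identifier] = resolve_variable(source)
            else:
                mapping[identifier] = resolve_literal(source)
        except LookupError:  # assumed to signal an Unresolved Variable error
            mapping[identifier] = UNRESOLVED
    return mapping  # possibly empty, but always a mapping


def drop_unresolved(mapping: Mapping[str, Any]) -> dict[str, Any]:
    """Optional step: remove unresolved values before calling the function."""
    return {k: v for k, v in mapping.items() if v is not UNRESOLVED}
```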
#### Markup Resolution

Unlike _functions_, the resolution of _markup_ is not customizable.

The resolved value of _markup_ includes the following fields:

- The type of the _markup_: open, standalone, or close.
- The _identifier_ of the _markup_.
- The resolved _options_ values after _option resolution_.

The resolution of _markup_ MUST always succeed.

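The resolved value of _markup_ is a fixed record rather than the result of a customizable function. A Python sketch could look like the following, with hypothetical class and field names. It assumes the markup type comes from already-parsed syntax, so it is always one of the three valid values and resolution has no error path.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping


class MarkupKind(Enum):
    OPEN = "open"
    STANDALONE = "standalone"
    CLOSE = "close"


@dataclass(frozen=True)
class ResolvedMarkup:
    kind: MarkupKind
    identifier: str
    options: Mapping[str, Any] = field(default_factory=dict)  # after option resolution


def resolve_markup(kind: str, identifier: str, options: Mapping[str, Any]) -> ResolvedMarkup:
    """Build the resolved value; `kind` comes from the parser, so it is valid."""
    return ResolvedMarkup(MarkupKind(kind), identifier, dict(options))
```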
#### Fallback Resolution

A **_fallback value_** is the resolved value for an _expression_ that fails to resolve.

An _expression_ fails to resolve when:

- A _variable_ used as an _operand_ (with or without an _annotation_) fails to resolve.
  * Note that this does not include a _variable_ used as an _option_ value.
- A _function_ _annotation_ fails to resolve.
- A _private-use annotation_ is unsupported by the implementation,
  or a supported _private-use annotation_ fails to resolve.
- The _expression_ has a _reserved annotation_.

The _fallback value_ depends on the contents of the _expression_:

- _expression_ with _literal_ _operand_ (_quoted_ or _unquoted_):
  U+007C VERTICAL LINE `|`
  followed by the value of the _literal_
  with escaping applied to U+005C REVERSE SOLIDUS `\` and U+007C VERTICAL LINE `|`,
  and then by U+007C VERTICAL LINE `|`.

  > Examples:
  > In a context where `:func` fails to resolve,
  > `{42 :func}` resolves to the _fallback value_ `|42|` and
  > `{|C:\\| :func}` resolves to the _fallback value_ `|C:\\|`.
  > In any context, `{|| @reserved}` resolves to the _fallback value_ `||`.

- _expression_ with _variable_ _operand_ referring to a local _declaration_ (with or without an _annotation_):
  the _value_ to which it resolves (which may already be a _fallback value_)

  > Examples:
  > In a context where `:func` fails to resolve,
  > the _pattern_'s _expression_ in `.local $var={|val|} {{{$var :func}}}`
  > resolves to the _fallback value_ `|val|` and the message formats to `{|val|}`.
  > In a context where `:now` fails to resolve but `:datetime` does not,
  > the _pattern_'s _expression_ in
  > ```
  > .local $t = {:now format=iso8601}
  > .local $pretty_t = {$t :datetime}
  > {{{$pretty_t}}}
  > ```
  > (transitively) resolves to the _fallback value_ `:now` and
  > the message formats to `{:now}`.

- _expression_ with _variable_ _operand_ not referring to a local _declaration_ (with or without an _annotation_):
  U+0024 DOLLAR SIGN `$` followed by the _name_ of the _variable_

  > Examples:
  > In a context where `$var` fails to resolve, `{$var}` and `{$var :number}` and `{$var @reserved}`
  > all resolve to the _fallback value_ `$var`.
  > In a context where `:func` fails to resolve,
  > the _pattern_'s _expression_ in `.input {$arg} {{{$arg :func}}}`
  > resolves to the _fallback value_ `$arg` and
  > the message formats to `{$arg}`.

- _function_ _expression_ with no _operand_:
  U+003A COLON `:` followed by the _function_ _identifier_

  > Examples:
  > In a context where `:func` fails to resolve, `{:func}` resolves to the _fallback value_ `:func`.
  > In a context where `:ns:func` fails to resolve, `{:ns:func}` resolves to the _fallback value_ `:ns:func`.

- unsupported _private-use annotation_ or _reserved annotation_ with no _operand_:
  the _annotation_ starting sigil

  > Examples:
  > In any context, `{@reserved}` and `{@reserved |...|}` both resolve to the _fallback value_ `@`.

- supported _private-use annotation_ with no _operand_:
  the _annotation_ starting sigil, optionally followed by implementation-defined details
  conforming with patterns in the other cases (such as quoting literals).
  If details are provided, they SHOULD NOT leak potentially private information.

  > Examples:
  > In a context where `^` expressions are used for comments, `{^▽^}` might resolve to the _fallback value_ `^`.
  > In a context where `&` expressions are _function_-like macro invocations, `{&foo |...|}` might resolve to the _fallback value_ `&foo`.

- Otherwise: the U+FFFD REPLACEMENT CHARACTER `�`

  This is not currently used by any expression, but may apply in future revisions.

_Option_ _identifiers_ and values are not included in the _fallback value_.

_Pattern selection_ is not supported for _fallback values_.

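The rules above can be condensed into a single function. This TypeScript sketch has several assumptions and simplifications:

- It uses simplified, hypothetical `Operand`, `Annotation`, and `Expr` shapes rather than the interchange data model.
- It represents every unsupported _private-use_ or _reserved annotation_ by its sigil alone.
- It reads "the _value_ to which it resolves" for a local variable as the fallback of that variable's declared expression, which agrees with the examples above.
- It assumes a _valid_ message, so local declarations cannot form a cycle.

```typescript
type Operand =
  | { type: "literal"; value: string }
  | { type: "variable"; name: string };

type Annotation =
  | { type: "function"; name: string } // e.g. "func" or "ns:func", without ":"
  | { type: "unsupported"; sigil: string }; // private-use or reserved, e.g. "@"

interface Expr {
  arg?: Operand;
  annotation?: Annotation;
}

// |value| with "\" and "|" escaped by a backslash.
function literalFallback(value: string): string {
  return "|" + value.replace(/[\\|]/g, "\\$&") + "|";
}

// `locals` maps each local declaration's name to its declared expression.
function fallbackValue(expr: Expr, locals: Map<string, Expr>): string {
  const arg = expr.arg;
  if (arg !== undefined && arg.type === "literal") {
    return literalFallback(arg.value);
  }
  if (arg !== undefined && arg.type === "variable") {
    const decl = locals.get(arg.name);
    // A local variable falls back transitively to its declaration's fallback value;
    // any other variable falls back to "$" followed by its name.
    return decl !== undefined ? fallbackValue(decl, locals) : "$" + arg.name;
  }
  const ann = expr.annotation;
  if (ann !== undefined && ann.type === "function") return ":" + ann.name;
  if (ann !== undefined && ann.type === "unsupported") return ann.sigil;
  return "\uFFFD"; // U+FFFD REPLACEMENT CHARACTER
}
```

The function never inspects options. This matches the rule that _option_ identifiers and values are not part of the _fallback value_.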
### Pattern Selection

When a _message_ contains a _matcher_ with one or more _selectors_,
the implementation needs to determine which _variant_ will be used
to provide the _pattern_ for the formatting operation.
This is done by ordering and filtering the available _variant_ statements
according to their _key_ values and selecting the first one.

> [!NOTE]
> At least one _variant_ is required to have all of its _keys_ consist of
> the catch-all key `*`.
> Some _selectors_ might be implemented in a way that the key value `*`
> cannot be selected in a _valid_ _message_.
> In other cases, this key value might be unreachable only in certain locales.
> In some locales, this could require creating
> one or more _variants_ that do not make sense grammatically for that language.
> > For example, in the `pl` (Polish) locale, this _message_ cannot reach
> > the `*` _variant_:
> > ```
> > .match {$num :integer}
> > 0    {{ }}
> > one  {{ }}
> > few  {{ }}
> > many {{ }}
> > *    {{Only used by fractions in Polish.}}
> > ```
>
> In the Tech Preview, feedback from users and implementers is desired about
> whether to relax the requirement that such a "fallback _variant_" appear in
> every message, versus the potential for a _message_ to fail at runtime
> because no matching _variant_ is available.

The number of _keys_ in each _variant_ MUST equal the number of _selectors_.

Each _key_ corresponds to a _selector_ by its position in the _variant_.

> For example, in this message:
>
> ```
> .match {:one} {:two} {:three}
> 1 2 3 {{ ... }}
> ```
>
> The first _key_ `1` corresponds to the first _selector_ (`{:one}`),
> the second _key_ `2` to the second _selector_ (`{:two}`),
> and the third _key_ `3` to the third _selector_ (`{:three}`).

To determine which _variant_ best matches a given set of inputs,
each _selector_ is used in turn to order and filter the list of _variants_.

Each _variant_ with a _key_ that does not match its corresponding _selector_
is omitted from the list of _variants_.
The remaining _variants_ are sorted according to the _selector_'s _key_-ordering preference.
Earlier _selectors_ in the _matcher_'s list of _selectors_ have a higher priority than later ones.

When all of the _selectors_ have been processed,
the earliest-sorted _variant_ in the remaining list of _variants_ is selected.

> [!NOTE]
> A _selector_ is not a _declaration_.
> Even when the same _function_ can be used for both formatting and selection
> of a given _operand_,
> the _annotation_ that appears in a _selector_ has no effect on subsequent
> _selectors_ or on the formatting used in _placeholders_.
> To use the same value for selection and formatting,
> set its value with a `.input` or `.local` _declaration_.

This selection method is defined in more detail below.
An implementation MAY use any pattern selection method,
as long as its observable behavior matches the results of the method defined here.

If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
the result of pattern selection MUST be a pattern resolving to a single _fallback value_
using the message's fallback string defined in the _formatting context_,
or, if this is not available or is empty, the U+FFFD REPLACEMENT CHARACTER `�`.

#### Resolve Selectors

First, resolve the values of each _selector_:

1. Let `res` be a new empty list of resolved values that support selection.
1. For each _selector_ `sel`, in source order:
   1. Let `rv` be the resolved value of `sel`.
   1. If selection is supported for `rv`:
      1. Append `rv` as the last element of the list `res`.
   1. Else:
      1. Let `nomatch` be a resolved value for which selection always fails.
      1. Append `nomatch` as the last element of the list `res`.
      1. Emit a _Selection Error_.

The form of the resolved values is determined by each implementation,
along with the manner of determining their support for selection.

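A minimal TypeScript sketch of this step follows. It makes two hypothetical modeling choices:

- A resolved value supports selection exactly when it carries a `select` callback, which stands in for MatchSelectorKeys.
- `nomatch` is a value whose callback matches no keys, so only catch-all keys can match in that position.

```typescript
// Hypothetical resolved value; `select` is present only when selection is supported.
interface ResolvedValue {
  source: string;
  select?: (keys: string[]) => string[];
}

// A resolved value for which selection always fails: it matches no keys.
const nomatch: ResolvedValue = { source: "nomatch", select: () => [] };

// `resolved` holds the resolved value of each selector, in source order.
function resolveSelectors(resolved: ResolvedValue[], errors: string[]): ResolvedValue[] {
  const res: ResolvedValue[] = [];
  for (const rv of resolved) {
    if (rv.select !== undefined) {
      res.push(rv);
    } else {
      res.push(nomatch);
      errors.push(`Selection Error: ${rv.source}`);
    }
  }
  return res;
}
```

Appending `nomatch` instead of skipping the selector keeps each entry of `res` aligned with the key at the same position in every _variant_.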
#### Resolve Preferences

Next, using `res`, resolve the preferential order for all message keys:

1. Let `pref` be a new empty list of lists of strings.
1. For each index `i` in `res`:
   1. Let `keys` be a new empty list of strings.
   1. For each _variant_ `var` of the message:
      1. Let `key` be the `var` key at position `i`.
      1. If `key` is not the catch-all key `'*'`:
         1. Assert that `key` is a _literal_.
         1. Let `ks` be the resolved value of `key`.
         1. Append `ks` as the last element of the list `keys`.
   1. Let `rv` be the resolved value at index `i` of `res`.
   1. Let `matches` be the result of calling the method MatchSelectorKeys(`rv`, `keys`).
   1. Append `matches` as the last element of the list `pref`.

The method MatchSelectorKeys is determined by the implementation.
It takes as arguments a resolved _selector_ value `rv` and a list of string keys `keys`,
and returns a list of string keys in preferential order.
The returned list MUST contain only unique elements of the input list `keys`.
The returned list MAY be empty.
The most-preferred key is first,
with each successive key appearing in order by decreasing preference.

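The preference step can be sketched as follows. This hypothetical TypeScript model makes these assumptions:

- Each entry of `res` is a MatchSelectorKeys callback already bound to its resolved selector value.
- Keys mirror the `Literal | CatchallKey` shapes of the interchange data model.
- `stringMatcher` is a `:string`-style matcher that only accepts an exact match.

```typescript
type Key = { type: "literal"; value: string } | { type: "*" };

interface Variant {
  keys: Key[];
  pattern: string;
}

// MatchSelectorKeys bound to one resolved selector value.
type MatchSelectorKeys = (keys: string[]) => string[];

function resolvePreferences(res: MatchSelectorKeys[], variants: Variant[]): string[][] {
  const pref: string[][] = [];
  res.forEach((matchSelectorKeys, i) => {
    const keys: string[] = [];
    for (const v of variants) {
      const key = v.keys[i];
      if (key.type !== "*") keys.push(key.value); // catch-all keys are not passed in
    }
    pref.push(matchSelectorKeys(keys));
  });
  return pref;
}

// Hypothetical `:string`-like matcher: the only possible match is the value itself,
// so the result has no duplicates even if several variants share a key.
function stringMatcher(value: string): MatchSelectorKeys {
  return (keys) => (keys.indexOf(value) !== -1 ? [value] : []);
}
```

Take the variants of Example 1 below, with `$foo` bound to `'foo'` and `$bar` bound to `'bar'`. For these inputs, the sketch produces `pref` = « « `'foo'` », « `'bar'` » ».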
#### Filter Variants

Then, using the preferential key orders `pref`,
filter the list of _variants_ to the ones that match with some preference:

1. Let `vars` be a new empty list of _variants_.
1. For each _variant_ `var` of the message:
   1. For each index `i` in `pref`:
      1. Let `key` be the `var` key at position `i`.
      1. If `key` is the catch-all key `'*'`:
         1. Continue the inner loop on `pref`.
      1. Assert that `key` is a _literal_.
      1. Let `ks` be the resolved value of `key`.
      1. Let `matches` be the list of strings at index `i` of `pref`.
      1. If `matches` includes `ks`:
         1. Continue the inner loop on `pref`.
      1. Else:
         1. Continue the outer loop on message _variants_.
   1. Append `var` as the last element of the list `vars`.

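The nested loops above amount to a single `every` check per _variant_. The TypeScript sketch below repeats the hypothetical `Key` and `Variant` shapes used for illustration, and it takes a resolved literal key to be its string `value`.

```typescript
type Key = { type: "literal"; value: string } | { type: "*" };

interface Variant {
  keys: Key[];
  pattern: string;
}

function filterVariants(variants: Variant[], pref: string[][]): Variant[] {
  const vars: Variant[] = [];
  for (const v of variants) {
    // Keep the variant only if, at every position, its key is the catch-all
    // or appears in that selector's list of matching keys.
    const matchesAll = pref.every((matches, i) => {
      const key = v.keys[i];
      return key.type === "*" || matches.indexOf(key.value) !== -1;
    });
    if (matchesAll) vars.push(v);
  }
  return vars;
}
```

With the matches from Example 1 below, only the `* *` _variant_ survives. With those of Example 2, all four _variants_ do.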
#### Sort Variants

Finally, sort the list of variants `vars` and select the _pattern_:

1. Let `sortable` be a new empty list of (integer, _variant_) tuples.
1. For each _variant_ `var` of `vars`:
   1. Let `tuple` be a new tuple (-1, `var`).
   1. Append `tuple` as the last element of the list `sortable`.
1. Let `len` be the integer count of items in `pref`.
1. Let `i` be `len` - 1.
1. While `i` >= 0:
   1. Let `matches` be the list of strings at index `i` of `pref`.
   1. Let `minpref` be the integer count of items in `matches`.
   1. For each tuple `tuple` of `sortable`:
      1. Let `matchpref` be an integer with the value `minpref`.
      1. Let `key` be the `tuple` _variant_ key at position `i`.
      1. If `key` is not the catch-all key `'*'`:
         1. Assert that `key` is a _literal_.
         1. Let `ks` be the resolved value of `key`.
         1. Let `matchpref` be the integer position of `ks` in `matches`.
      1. Set the `tuple` integer value as `matchpref`.
   1. Set `sortable` to be the result of calling the method `SortVariants(sortable)`.
   1. Set `i` to be `i` - 1.
1. Let `var` be the _variant_ element of the first element of `sortable`.
1. Select the _pattern_ of `var`.

`SortVariants` is a method whose single argument is
a list of (integer, _variant_) tuples.
It returns a list of (integer, _variant_) tuples.
Any implementation of `SortVariants` is acceptable
as long as it satisfies the following requirements:

1. Let `sortable` be an arbitrary list of (integer, _variant_) tuples.
1. Let `sorted` be `SortVariants(sortable)`.
1. `sorted` is the result of sorting `sortable` using the following comparator:
   1. `(i1, v1)` <= `(i2, v2)` if and only if `i1 <= i2`.
1. The sort is stable (pairs of tuples from `sortable` that are equal
   in their first element have the same relative order in `sorted`).

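The sort can be sketched in TypeScript as below. It relies on `Array.prototype.sort` being stable, which ECMAScript has required since ES2019, so the built-in sort satisfies the `SortVariants` requirements. The `Key` and `Variant` shapes are the same hypothetical ones used for illustration. The sketch also assumes `vars` has already passed the filter step, so every literal key is found in `matches`.

```typescript
type Key = { type: "literal"; value: string } | { type: "*" };

interface Variant {
  keys: Key[];
  pattern: string;
}

function sortVariants(vars: Variant[], pref: string[][]): Variant[] {
  let sortable: Array<[number, Variant]> = vars.map((v): [number, Variant] => [-1, v]);
  // Walk the selectors from last to first, so the final (stable) sort
  // is by the first selector and earlier selectors take priority.
  for (let i = pref.length - 1; i >= 0; i--) {
    const matches = pref[i];
    const minpref = matches.length;
    sortable = sortable.map((tuple): [number, Variant] => {
      const v = tuple[1];
      const key = v.keys[i];
      // Catch-all keys rank after every matching key.
      const matchpref = key.type === "*" ? minpref : matches.indexOf(key.value);
      return [matchpref, v];
    });
    sortable.sort((a, b) => a[0] - b[0]); // stable sort by score
  }
  return sortable.map((tuple) => tuple[1]);
}
```

For the _variants_ and matches of Example 2 below, the returned order is `foo bar`, `foo *`, `* bar`, `* *`. The first element therefore selects the pattern `Foo and bar`.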
#### Examples

_This section is non-normative._

##### Example 1

Presuming a minimal implementation that only supports the `:string` _annotation_,
which matches keys using string comparison,
and a formatting context in which
the variable reference `$foo` resolves to the string `'foo'` and
the variable reference `$bar` resolves to the string `'bar'`,
pattern selection proceeds as follows for this message:

```
.match {$foo :string} {$bar :string}
bar bar {{All bar}}
foo foo {{All foo}}
* * {{Otherwise}}
```

1. For the first selector:<br>
   The value of the selector is resolved to be `'foo'`.<br>
   The available keys « `'bar'`, `'foo'` » are compared to `'foo'`,<br>
   resulting in a list « `'foo'` » of matching keys.

2. For the second selector:<br>
   The value of the selector is resolved to be `'bar'`.<br>
   The available keys « `'bar'`, `'foo'` » are compared to `'bar'`,<br>
   resulting in a list « `'bar'` » of matching keys.

3. Creating the list `vars` of variants matching all keys:<br>
   The first variant `bar bar` is discarded as its first key does not match the first selector.<br>
   The second variant `foo foo` is discarded as its second key does not match the second selector.<br>
   The catch-all keys of the third variant `* *` always match, so this variant is added to `vars`,<br>
   resulting in a list « `* *` » of variants.

4. As the list `vars` only has one entry, it does not need to be sorted.<br>
   The pattern `Otherwise` of the third variant is selected.

##### Example 2

Alternatively, with the same implementation and formatting context as in Example 1,
pattern selection would proceed as follows for this message:

```
.match {$foo :string} {$bar :string}
* bar {{Any and bar}}
foo * {{Foo and any}}
foo bar {{Foo and bar}}
* * {{Otherwise}}
```

1. For the first selector:<br>
   The value of the selector is resolved to be `'foo'`.<br>
   The available keys « `'foo'` » are compared to `'foo'`,<br>
   resulting in a list « `'foo'` » of matching keys.

2. For the second selector:<br>
   The value of the selector is resolved to be `'bar'`.<br>
   The available keys « `'bar'` » are compared to `'bar'`,<br>
   resulting in a list « `'bar'` » of matching keys.

3. Creating the list `vars` of variants matching all keys:<br>
   The keys of all variants either match each selector exactly, or via the catch-all key,<br>
   resulting in a list « `* bar`, `foo *`, `foo bar`, `* *` » of variants.

4. Sorting the variants:<br>
   The list `sortable` is first set with the variants in their source order
   and scores determined by the second selector:<br>
   « ( 0, `* bar` ), ( 1, `foo *` ), ( 0, `foo bar` ), ( 1, `* *` ) »<br>
   This is then sorted as:<br>
   « ( 0, `* bar` ), ( 0, `foo bar` ), ( 1, `foo *` ), ( 1, `* *` ) ».<br>
   To sort according to the first selector, the scores are updated to:<br>
   « ( 1, `* bar` ), ( 0, `foo bar` ), ( 0, `foo *` ), ( 1, `* *` ) ».<br>
   This is then sorted as:<br>
   « ( 0, `foo bar` ), ( 0, `foo *` ), ( 1, `* bar` ), ( 1, `* *` ) ».

5. The pattern `Foo and bar` of the most preferred `foo bar` variant is selected.

##### Example 3

A more-complex example is the matching found in selection APIs
such as ICU's `PluralFormat`.
Suppose that this API is represented here by the function `:number`.
This `:number` function can match a given numeric value to a specific number _literal_
and **_also_** to a plural category (`zero`, `one`, `two`, `few`, `many`, `other`)
according to locale rules defined in CLDR.

Given a variable reference `$count` whose value resolves to the number `1`
and an `en` (English) locale,
the pattern selection proceeds as follows for this message:

```
.input {$count :number}
.match {$count}
one {{Category match for {$count}}}
1   {{Exact match for {$count}}}
*   {{Other match for {$count}}}
```

1. For the selector:<br>
   The value of the selector is resolved to an implementation-defined value
   that is capable of performing English plural category selection on the value `1`.<br>
   The available keys « `'one'`, `'1'` » are passed to
   the implementation's MatchSelectorKeys method,<br>
   resulting in a list « `'1'`, `'one'` » of matching keys.

2. Creating the list `vars` of variants matching all keys:<br>
   The keys of all variants are included in the list of matching keys, or use the catch-all key,<br>
   resulting in a list « `one`, `1`, `*` » of variants.

3. Sorting the variants:<br>
   The list `sortable` is first set with the variants in their source order
   and scores determined by the selector key order:<br>
   « ( 1, `one` ), ( 0, `1` ), ( 2, `*` ) »<br>
   This is then sorted as:<br>
   « ( 0, `1` ), ( 1, `one` ), ( 2, `*` ) »

4. The pattern `Exact match for {$count}` of the most preferred `1` variant is selected.

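A MatchSelectorKeys for a `:number`-like selector can be sketched with the ECMAScript `Intl.PluralRules` API. That API returns a locale's plural category; common engines take it from CLDR data. This is an illustration under stated assumptions, not the behavior of ICU's `PluralFormat`. The hypothetical `matchNumberKeys` function makes three simplifications:

- It treats any key that parses to the same number as an exact match.
- It ranks all exact matches ahead of the plural-category match.
- It ignores number formatting options.

```typescript
// Return matching keys, most preferred first: exact numeric matches, then
// the key equal to the value's plural category in `locale`.
function matchNumberKeys(value: number, locale: string, keys: string[]): string[] {
  const category = new Intl.PluralRules(locale).select(value);
  const exact: string[] = [];
  const byCategory: string[] = [];
  for (const key of keys) {
    // The result must not contain duplicates.
    if (exact.indexOf(key) !== -1 || byCategory.indexOf(key) !== -1) continue;
    // Number("") is 0, so reject blank keys before comparing numerically.
    if (key.trim() !== "" && Number(key) === value) {
      exact.push(key);
    } else if (key === category) {
      byCategory.push(key);
    }
  }
  return exact.concat(byCategory);
}
```

With `$count` equal to `1` in the `en` locale, this returns « `'1'`, `'one'` » for the keys of Example 3, which matches step 1 above.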
### Formatting

After _pattern selection_,
each _text_ and _placeholder_ part of the selected _pattern_ is resolved and formatted.

Resolved values cannot always be formatted by a given implementation.
When such an error occurs during _formatting_,
an implementation SHOULD emit a _Formatting Error_ and produce a
_fallback value_ for the _placeholder_ that produced the error.
A formatting function MAY substitute a value to use instead of a _fallback value_.

Implementations MAY represent the result of _formatting_ using the most
appropriate data type or structure. Some examples of these include:

- A single string concatenated from the parts of the resolved _pattern_.
- A string with associated attributes for portions of its text.
- A flat sequence of objects corresponding to each resolved value.
- A hierarchical structure of objects that group spans of resolved values,
  such as sequences delimited by _markup-open_ and _markup-close_ _placeholders_.

Implementations SHOULD provide _formatting_ result types that match user needs,
including situations that require further processing of formatted messages.
Implementations SHOULD encourage users to consider a formatted localised string
as an opaque data structure, suitable only for presentation.

When formatting to a string, the default representation of all _markup_
MUST be an empty string.
Implementations MAY offer functionality for customizing this,
such as by emitting XML-ish tags for each _markup_.

_Attributes_ are reserved for future standardization.
Other than checking for valid syntax, they SHOULD NOT
affect the processing or output of a _message_.

#### Examples

_This section is non-normative._

1. An implementation might choose to return an interstitial object
   so that the caller can "decorate" portions of the formatted value.
   In ICU4J, the `NumberFormatter` class returns a `FormattedNumber` object,
   so a _pattern_ such as `This is my number {42 :number}` might return
   the character sequence `This is my number `
   followed by a `FormattedNumber` object representing the value `42` in the current locale.

2. A formatter in a web browser could format a message as a DOM fragment
   rather than as a representation of its HTML source.

#### Formatting Fallback Values

If the resolved _pattern_ includes any _fallback values_
and the formatting result is a concatenated string or a sequence of strings,
the string representation of each _fallback value_ MUST be the concatenation of
a U+007B LEFT CURLY BRACKET `{`,
the _fallback value_ as a string,
and a U+007D RIGHT CURLY BRACKET `}`.

> For example,
> a message with a _Syntax Error_ and no fallback string
> defined in the _formatting context_ would format to a string as `{�}`.

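When the result is a single string, two rules from the formatting text above apply: _markup_ produces an empty string, and each _fallback value_ is wrapped in braces. Both can be sketched as follows. The `Part` shape and the `formatToString` helper are hypothetical; implementations choose their own part types.

```typescript
// Hypothetical formatted parts of a resolved pattern.
type Part =
  | { type: "text"; value: string }
  | { type: "value"; value: string } // a successfully formatted placeholder
  | { type: "markup"; kind: "open" | "standalone" | "close"; name: string }
  | { type: "fallback"; source: string }; // e.g. "$var", "|42|", ":func"

function formatToString(parts: Part[]): string {
  let out = "";
  for (const part of parts) {
    switch (part.type) {
      case "text":
      case "value":
        out += part.value;
        break;
      case "markup":
        // The default string representation of markup is the empty string.
        break;
      case "fallback":
        out += "{" + part.source + "}";
        break;
    }
  }
  return out;
}
```

For instance, a single `fallback` part whose source is U+FFFD formats to `{�}`. That is the result described above for a message with a _Syntax Error_.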
#### Handling Bidirectional Text

_Messages_ contain text. Any text can be
[bidirectional text](https://www.w3.org/TR/i18n-glossary/#dfn-bidirectional-text).
That is, the text can consist of a mixture of left-to-right and right-to-left spans of text.
The display of bidirectional text is defined by the
[Unicode Bidirectional Algorithm](http://www.unicode.org/reports/tr9/) [UAX9].

The directionality of the message as a whole is provided by the _formatting context_.

When a _message_ is formatted, _placeholders_ are replaced
with their formatted representation.
Applying the Unicode Bidirectional Algorithm to the text of a formatted _message_
(including its formatted parts)
can result in unexpected or undesirable
[spillover effects](https://www.w3.org/TR/i18n-glossary/#dfn-spillover-effects).
Applying [bidi isolation](https://www.w3.org/TR/i18n-glossary/#dfn-bidi-isolation)
to each affected formatted value helps avoid this spillover in a formatted _message_.

Note that both the _message_ and, separately, each _placeholder_ need to have
direction metadata for this to work.
If an implementation supports formatting to something other than a string
(such as a sequence of parts),
the directionality of each formatted _placeholder_ needs to be available to the caller.

If a formatted _expression_ itself contains spans with differing directionality,
its formatter SHOULD perform any necessary processing, such as inserting controls or
isolating such parts to ensure that the formatted value displays correctly in a plain text context.

> For example, an implementation could provide a `:currency` formatting function
> which inserts strongly directional characters, such as U+200F RIGHT-TO-LEFT MARK (RLM),
> U+200E LEFT-TO-RIGHT MARK (LRM), or U+061C ARABIC LETTER MARK (ALM),
> to coerce proper display of the sign and currency symbol next to a formatted number.
> An example of this is formatting the value `-1234.56` as the currency `AED`
> in the `ar-AE` locale. The formatted value appears like this:
> ```
> ‎-1,234.56 د.إ.‏
> ```
> The code point sequence for this string, as produced by the ICU4J `NumberFormat` function,
> includes **U+200F U+200E** at the start and **U+200F** at the end of the string.
> If it did not do this, the same string would appear like this instead:
>
> ![image](images/messageFormatCurrencyExample.png)

A **_bidirectional isolation strategy_** is functionality in the formatter's
processing of a _message_ that produces bidirectional output text that is ready for display.

The **_Default Bidi Strategy_** is a _bidirectional isolation strategy_ that uses
isolating Unicode control characters around each _placeholder_'s formatted value.
It is primarily intended for use in plain-text strings, where markup or other mechanisms
are not available.
Implementations MUST provide the _Default Bidi Strategy_ as one of the
_bidirectional isolation strategies_.

Implementations MAY provide other _bidirectional isolation strategies_.

Implementations MAY supply a _bidirectional isolation strategy_ that performs no processing.

The _Default Bidi Strategy_ is defined as follows:

1. Let `msgdir` be the directionality of the whole message,
   one of « `'LTR'`, `'RTL'`, `'unknown'` ».
   These correspond to the message having left-to-right directionality,
   right-to-left directionality, and to the message's directionality not being known.
1. For each _expression_ `exp` in _pattern_:
   1. Let `fmt` be the formatted string representation of the resolved value of `exp`.
   1. Let `dir` be the directionality of `fmt`,
      one of « `'LTR'`, `'RTL'`, `'unknown'` », with the same meanings as for `msgdir`.
   1. If `dir` is `'LTR'`:
      1. If `msgdir` is `'LTR'`,
         in the formatted output, let `fmt` be itself.
      1. Else, in the formatted output,
         prefix `fmt` with U+2066 LEFT-TO-RIGHT ISOLATE
         and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
   1. Else, if `dir` is `'RTL'`:
      1. In the formatted output,
         prefix `fmt` with U+2067 RIGHT-TO-LEFT ISOLATE
         and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
   1. Else:
      1. In the formatted output,
         prefix `fmt` with U+2068 FIRST STRONG ISOLATE
         and postfix it with U+2069 POP DIRECTIONAL ISOLATE.

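The strategy above can be sketched in TypeScript. How `msgdir` and each placeholder's `dir` are determined is left to the implementation. This hypothetical `applyDefaultBidiStrategy` helper therefore takes them as inputs, and it passes literal text through unchanged as plain strings.

```typescript
type Dir = "LTR" | "RTL" | "unknown";

const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Wrap one formatted placeholder value as the Default Bidi Strategy requires.
function isolate(msgdir: Dir, fmt: string, dir: Dir): string {
  if (dir === "LTR") {
    return msgdir === "LTR" ? fmt : LRI + fmt + PDI;
  }
  if (dir === "RTL") {
    return RLI + fmt + PDI;
  }
  return FSI + fmt + PDI;
}

// Text runs are plain strings; placeholders carry their formatted string and direction.
type Piece = string | { fmt: string; dir: Dir };

function applyDefaultBidiStrategy(msgdir: Dir, pieces: Piece[]): string {
  return pieces
    .map((p) => (typeof p === "string" ? p : isolate(msgdir, p.fmt, p.dir)))
    .join("");
}
```

Only an `'LTR'` value inside an `'LTR'` message is left unwrapped. Every other combination gets an isolate pair, which keeps the placeholder's direction from spilling into the surrounding text.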
## Interchange Data Model

This section defines a data model representation of MessageFormat 2 _messages_.

Implementations are not required to use this data model for their internal representation of messages.
Neither are they required to provide an interface that accepts or produces
representations of this data model.

The major reason this specification provides a data model is to allow interchange of
the logical representation of a _message_ between different implementations.
This includes mapping legacy formatting syntaxes (such as MessageFormat 1)
to a MessageFormat 2 implementation.
Another use would be in converting to or from translation formats without
the need to continually parse and serialize all or part of a message.

Implementations that expose APIs supporting the production, consumption, or transformation of a
_message_ as a data structure are encouraged to use this data model.

This data model provides these capabilities:
- any MessageFormat 2 message (including future versions)
  can be parsed into this representation
- this data model representation can be serialized as a well-formed
  MessageFormat 2 message
- parsing a MessageFormat 2 message into a data model representation
  and then serializing it results in an equivalently functional message

This data model might also be used to:
- parse a non-MessageFormat 2 message into a data model
  (and therefore re-serialize it as MessageFormat 2).
  Note that this depends on compatibility between the two syntaxes.
- re-serialize a MessageFormat 2 message into some other format
  including (but not limited to) other formatting syntaxes
  or translation formats.

To ensure compatibility across all platforms,
this interchange data model is defined here using TypeScript notation.
Two equivalent definitions of the data model are also provided:

- `common/dtd/messageFormat/message.json` is a JSON Schema definition,
  for use with message data encoded as JSON or compatible formats, such as YAML.
- `common/dtd/messageFormat/message.dtd` is a document type definition (DTD),
  for use with message data encoded as XML.

Note that while the data model description below is the canonical one,
the JSON and DTD definitions are intended for interchange between systems and processors.
To that end, they relax some aspects of the data model, such as allowing
declarations, options, and attributes to be optional rather than required properties.

> [!NOTE]
> Users relying on XML representations of messages should note that
> XML 1.0 does not allow for the representation of all C0 control characters (U+0000-U+001F).
> Except for U+0000 NULL, these characters are allowed in MessageFormat 2 messages,
> so systems and users relying on this XML representation for interchange
> might need to supply an alternate escape mechanism to support messages
> that contain these characters.

> [!IMPORTANT]
> The data model uses the field name `name` to denote various interface identifiers.
> In the MessageFormat 2 [syntax](#syntax), the source for these `name` fields
> sometimes uses the production `identifier`.
> This happens when the named item, such as a _function_, supports namespacing.
>
> In the Tech Preview, feedback on whether to separate the `namespace` from the `name`
> and represent both separately, or just, as here, use an opaque single field `name`
> is desired.

### Messages

A `SelectMessage` corresponds to a syntax message that includes _selectors_.
A message without _selectors_ and with a single _pattern_ is represented by a `PatternMessage`.

In the syntax,
a `PatternMessage` may be represented either as a _simple message_ or as a _complex message_,
depending on whether it has declarations and whether its `pattern` is allowed in a _simple message_.

```ts
type Message = PatternMessage | SelectMessage;

interface PatternMessage {
  type: "message";
  declarations: Declaration[];
  pattern: Pattern;
}

interface SelectMessage {
  type: "select";
  declarations: Declaration[];
  selectors: Expression[];
  variants: Variant[];
}
```

Each message _declaration_ is represented by a `Declaration`,
which connects the `name` of a _variable_
with its _expression_ `value`.
The `name` does not include the initial `$` of the _variable_.

The `name` of an `InputDeclaration` MUST be the same
as the `name` in the `VariableRef` of its `VariableExpression` `value`.

An `UnsupportedStatement` represents a statement not supported by the implementation.
Its `keyword` is a non-empty string name (i.e. not including the initial `.`).
If not empty, the `body` is the "raw" value (i.e. escape sequences are not processed)
starting after the keyword and up to the first _expression_,
not including leading or trailing whitespace.
The non-empty `expressions` correspond to the trailing _expressions_ of the _reserved statement_.

> [!NOTE]
> Be aware that future versions of this specification
> might assign meaning to _reserved statement_ values.
> This would result in new interfaces being added to
> this data model.

```ts
type Declaration = InputDeclaration | LocalDeclaration | UnsupportedStatement;

interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}

interface LocalDeclaration {
  type: "local";
  name: string;
  value: Expression;
}

interface UnsupportedStatement {
  type: "unsupported-statement";
  keyword: string;
  body?: string;
  expressions: Expression[];
}
```

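For illustration, the declaration `.input {$count :number}` would be represented as the object below. This TypeScript sketch makes two simplifications:

- It declares local copies of only the interfaces involved.
- It types options and attributes loosely as `unknown[]`, because their definitions are not repeated here.

It also adds a hypothetical `isValidInput` check for the `name` requirement stated above.

```typescript
interface VariableRef {
  type: "variable";
  name: string;
}

interface FunctionAnnotation {
  type: "function";
  name: string;
  options: unknown[];
}

interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  annotation?: FunctionAnnotation;
  attributes: unknown[];
}

interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}

// `.input {$count :number}`: names carry no `$` or `:` sigils.
const decl: InputDeclaration = {
  type: "input",
  name: "count",
  value: {
    type: "expression",
    arg: { type: "variable", name: "count" },
    annotation: { type: "function", name: "number", options: [] },
    attributes: [],
  },
};

// The declaration's name MUST equal the name of the variable it wraps.
function isValidInput(d: InputDeclaration): boolean {
  return d.name === d.value.arg.name;
}
```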
In a `SelectMessage`,
each _variant_ is represented by a `Variant` holding its `keys` and its `value` pattern,
and the `variants` field is an array of these.
For the `CatchallKey`, a string `value` may be provided to retain an identifier.
This is always `'*'` in MessageFormat 2 syntax, but may vary in other formats.

```ts
interface Variant {
  keys: Array<Literal | CatchallKey>;
  value: Pattern;
}

interface CatchallKey {
  type: "*";
  value?: string;
}
```

3718### Patterns
3719
3720Each `Pattern` contains a linear sequence of text and placeholders corresponding to potential output of a message.
3721
3722Each element of the `Pattern` MUST either be a non-empty string, an `Expression`, or a `Markup` object.
3723String values represent literal _text_.
3724String values include all processing of the underlying _text_ values,
3725including escape sequence processing.
3726`Expression` wraps each of the potential _expression_ shapes.
3727`Markup` wraps each of the potential _markup_ shapes.
3728
3729Implementations MUST NOT rely on the set of `Expression` and
3730`Markup` interfaces defined in this document being exhaustive.
3731Future versions of this specification might define additional
3732expressions or markup.
3733
3734```ts
3735type Pattern = Array<string | Expression | Markup>;
3736
3737type Expression =
3738  | LiteralExpression
3739  | VariableExpression
3740  | FunctionExpression
3741  | UnsupportedExpression;
3742
3743interface LiteralExpression {
3744  type: "expression";
3745  arg: Literal;
3746  annotation?: FunctionAnnotation | UnsupportedAnnotation;
3747  attributes: Attribute[];
3748}
3749
3750interface VariableExpression {
3751  type: "expression";
3752  arg: VariableRef;
3753  annotation?: FunctionAnnotation | UnsupportedAnnotation;
3754  attributes: Attribute[];
3755}
3756
3757interface FunctionExpression {
3758  type: "expression";
3759  arg?: never;
3760  annotation: FunctionAnnotation;
3761  attributes: Attribute[];
3762}
3763
3764interface UnsupportedExpression {
3765  type: "expression";
3766  arg?: never;
3767  annotation: UnsupportedAnnotation;
3768  attributes: Attribute[];
3769}
3770
3771interface Attribute {
3772  name: string;
3773  value?: Literal | VariableRef;
3774}
3775```
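
To show how the parts of a `Pattern` fit together, here is a toy formatter, `formatPatternToy` (a hypothetical name), that walks a pattern in order. It is not the formatting procedure defined by this specification: it ignores annotations, emits nothing for markup, and uses `{$name}` and a replacement-character placeholder as its own stand-ins for values it cannot resolve. For brevity, the four `Expression` interfaces are collapsed into a single interface with an optional `arg`.

```ts
// Self-contained, simplified copies of the data model interfaces.
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface Option { name: string; value: Literal | VariableRef }
interface FunctionAnnotation { type: "function"; name: string; options: Option[] }
interface UnsupportedAnnotation { type: "unsupported-annotation"; source: string }
interface Attribute { name: string; value?: Literal | VariableRef }

// Simplification: the four Expression interfaces are collapsed into one.
interface Expression {
  type: "expression";
  arg?: Literal | VariableRef;
  annotation?: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}

interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Option[];
  attributes: Attribute[];
}

type Pattern = Array<string | Expression | Markup>;

// Hypothetical toy formatter: annotations are ignored, markup produces
// no output, and a missing variable is shown as {$name}.
function formatPatternToy(pattern: Pattern, args: Record<string, string>): string {
  let out = "";
  for (const part of pattern) {
    if (typeof part === "string") {
      out += part; // text is already "cooked"; no escapes remain
      continue;
    }
    if (part.type === "markup") continue; // this sketch drops markup
    const arg = part.arg;
    if (arg?.type === "literal") {
      out += arg.value;
    } else if (arg?.type === "variable") {
      const known = Object.prototype.hasOwnProperty.call(args, arg.name);
      out += known ? args[arg.name] : `{$${arg.name}}`;
    } else {
      // Function-only or unsupported expression: nothing to resolve here.
      out += "{\uFFFD}";
    }
  }
  return out;
}

// Hello, {$name}! {#b}Welcome{/b}
const pattern: Pattern = [
  "Hello, ",
  { type: "expression", arg: { type: "variable", name: "name" }, attributes: [] },
  "! ",
  { type: "markup", kind: "open", name: "b", options: [], attributes: [] },
  "Welcome",
  { type: "markup", kind: "close", name: "b", options: [], attributes: [] },
];

console.log(formatPatternToy(pattern, { name: "Ana" })); // Hello, Ana! Welcome
console.log(formatPatternToy(pattern, {}));              // Hello, {$name}! Welcome
```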

### Expressions

The `Literal` and `VariableRef` correspond to the _literal_ and _variable_ syntax rules.
When they are used as the `arg` of an `Expression`,
they represent _expression_ values with no _annotation_.

`Literal` represents all literal values, both _quoted_ and _unquoted_.
The presence or absence of quotes is not preserved by the data model.
The `value` of `Literal` is the "cooked" value (i.e. escape sequences are processed).

In a `VariableRef`, the `name` does not include the initial `$` of the _variable_.

```ts
interface Literal {
  type: "literal";
  value: string;
}

interface VariableRef {
  type: "variable";
  name: string;
}
```
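
The following sketch maps the source text of an operand onto `Literal` or `VariableRef`, dropping the `$` from variable names and the quotes from quoted literals. It assumes well-formed input and, as a simplification, treats a backslash inside a quoted literal as escaping whatever character follows; the syntax section gives the exact escape rules. `toOperand` is a hypothetical helper.

```ts
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }

// Hypothetical helper mapping the source text of an operand to the data model.
// It assumes well-formed input and does not validate names or unquoted literals.
function toOperand(source: string): Literal | VariableRef {
  if (source.startsWith("$")) {
    // The data model name omits the leading `$`.
    return { type: "variable", name: source.slice(1) };
  }
  if (source.length >= 2 && source.startsWith("|") && source.endsWith("|")) {
    // Assumption for this sketch: inside a quoted literal, a backslash
    // escapes the character that follows it. The quotes are dropped.
    const body = source.slice(1, -1);
    return { type: "literal", value: body.replace(/\\([\s\S])/g, "$1") };
  }
  // Unquoted literal: already in its "cooked" form.
  return { type: "literal", value: source };
}

console.log(toOperand("$count"));  // { type: 'variable', name: 'count' }
console.log(toOperand("|a\\|b|")); // { type: 'literal', value: 'a|b' }
console.log(JSON.stringify(toOperand("|42|")) === JSON.stringify(toOperand("42"))); // true
```

The last line illustrates that `|42|` and `42` produce identical `Literal` objects, since quoting is not preserved.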

A `FunctionAnnotation` represents a _function_ _annotation_.
The `name` does not include the `:` starting sigil.

Each _option_ is represented by an `Option`.

```ts
interface FunctionAnnotation {
  type: "function";
  name: string;
  options: Option[];
}

interface Option {
  name: string;
  value: Literal | VariableRef;
}
```
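
For example, the _expression_ `{$amount :number minimumFractionDigits=2 maximumFractionDigits=$max}` could carry the `FunctionAnnotation` built below. The hypothetical `resolveOptions` helper shows one way an implementation might turn `Option` values into strings before calling a function: literal values are used directly and variable values are looked up. Leaving out options whose variable is unbound is a choice made for this sketch only.

```ts
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface Option { name: string; value: Literal | VariableRef }
interface FunctionAnnotation { type: "function"; name: string; options: Option[] }

// Hypothetical helper: resolves each option value to a string.
// Literal values are used as-is; variable values are looked up in `vars`.
// Options whose variable is missing are left out (a choice of this sketch).
function resolveOptions(
  annotation: FunctionAnnotation,
  vars: ReadonlyMap<string, string>,
): Map<string, string> {
  const resolved = new Map<string, string>();
  for (const option of annotation.options) {
    if (option.value.type === "literal") {
      resolved.set(option.name, option.value.value);
    } else {
      const found = vars.get(option.value.name);
      if (found !== undefined) resolved.set(option.name, found);
    }
  }
  return resolved;
}

// {$amount :number minimumFractionDigits=2 maximumFractionDigits=$max}
const annotation: FunctionAnnotation = {
  type: "function",
  name: "number", // no leading `:`
  options: [
    { name: "minimumFractionDigits", value: { type: "literal", value: "2" } },
    { name: "maximumFractionDigits", value: { type: "variable", name: "max" } },
  ],
};

const options = resolveOptions(annotation, new Map([["max", "4"]]));
// options maps minimumFractionDigits to "2" and maximumFractionDigits to "4"
console.log(options.get("minimumFractionDigits"), options.get("maximumFractionDigits")); // 2 4
```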

An `UnsupportedAnnotation` represents a
_private-use annotation_ not supported by the implementation or a _reserved annotation_.
The `source` is the "raw" value (i.e. escape sequences are not processed),
including the starting sigil.

When parsing the syntax of a _message_ that includes a _private-use annotation_
supported by the implementation,
the implementation SHOULD represent it in the data model
using an interface appropriate for the semantics and meaning
that the implementation attaches to that _annotation_.

```ts
interface UnsupportedAnnotation {
  type: "unsupported-annotation";
  source: string;
}
```
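
A tool that encounters an `UnsupportedAnnotation` has only its raw `source` to work with. The sketch below, using the hypothetical helpers `isUnsupported` and `unsupportedDiagnostics`, collects those sources for diagnostics. The `^private` source assumes `^` is one of the private-use sigils of this version's syntax; an implementation that does support a given private-use annotation would instead map it to an interface of its own, as described above.

```ts
interface Option { name: string; value: unknown } // simplified
interface FunctionAnnotation { type: "function"; name: string; options: Option[] }
interface UnsupportedAnnotation { type: "unsupported-annotation"; source: string }
type Annotation = FunctionAnnotation | UnsupportedAnnotation;

function isUnsupported(annotation: Annotation | undefined): annotation is UnsupportedAnnotation {
  return annotation !== undefined && annotation.type === "unsupported-annotation";
}

// Hypothetical diagnostic: one message per unsupported annotation, quoting
// its raw source (sigil included, escapes not processed).
function unsupportedDiagnostics(annotations: Array<Annotation | undefined>): string[] {
  return annotations
    .filter(isUnsupported)
    .map((a) => `unsupported annotation: ${a.source}`);
}

// {$x :number} {$y ^private} {$z}
const annotations: Array<Annotation | undefined> = [
  { type: "function", name: "number", options: [] },
  { type: "unsupported-annotation", source: "^private" }, // `^` assumed to be a private-use sigil
  undefined,
];

console.log(unsupportedDiagnostics(annotations)); // [ 'unsupported annotation: ^private' ]
```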

### Markup

A `Markup` object has a `kind` of either `"open"`, `"standalone"`, or `"close"`,
corresponding respectively to _open_, _standalone_, and _close_ _markup_.
The `name` does not include the starting sigils `#` and `/`
or the ending sigil `/`.
The `options` for markup, which may be empty, use the same `Option` interface as `FunctionAnnotation`.

```ts
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Option[];
  attributes: Attribute[];
}
```
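
Nothing in the `Markup` interface ties an _open_ element to its _close_ counterpart. A tool that wants to warn about unpaired markup might run a stack-based check like the hypothetical `unmatchedMarkup` below, which reports close markup with no open partner as `/name` and unclosed open markup as `#name`. Whether unpaired markup is a problem is a policy of that tool, not a requirement stated in this section.

```ts
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: unknown[];    // simplified
  attributes: unknown[]; // simplified
}
interface ExpressionStub { type: "expression" } // stands in for Expression
type Pattern = Array<string | ExpressionStub | Markup>;

// Hypothetical lint: reports close markup with no matching open ("/name")
// and open markup that is never closed ("#name"). Standalone markup is ignored.
function unmatchedMarkup(pattern: Pattern): string[] {
  const open: string[] = [];
  const problems: string[] = [];
  for (const part of pattern) {
    if (typeof part === "string" || part.type !== "markup") continue;
    if (part.kind === "open") {
      open.push(part.name);
    } else if (part.kind === "close") {
      if (open.length > 0 && open[open.length - 1] === part.name) {
        open.pop();
      } else {
        problems.push(`/${part.name}`);
      }
    }
  }
  return problems.concat(open.map((name) => `#${name}`));
}

const m = (kind: Markup["kind"], name: string): Markup =>
  ({ type: "markup", kind, name, options: [], attributes: [] });

// {#b}bold{/b} {#i}never closed
console.log(unmatchedMarkup([m("open", "b"), "bold", m("close", "b"), " ", m("open", "i"), "never closed"]));
// [ '#i' ]
```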

### Extensions

Implementations MAY extend this data model with additional interfaces,
as well as with new fields on existing interfaces.
When encountering an unfamiliar field, an implementation MUST ignore it.
For example, an implementation could include a `span` field on all interfaces
encoding the corresponding start and end positions in its source syntax.
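
The `span` extension mentioned above might look like the following sketch. `Span`, `LiteralWithSpan`, and `sourceText` are hypothetical names, and the choice of half-open UTF-16 offsets is an assumption. The point is that `literalValue`, written only against `Literal`, ignores the extra field, while extension-aware tooling can use it to recover the raw source, which differs from the cooked `value`.

```ts
interface Literal { type: "literal"; value: string }

// Hypothetical extension: half-open [start, end) offsets into the source
// text, counted in UTF-16 code units (an assumption of this sketch).
interface Span { start: number; end: number }
interface LiteralWithSpan extends Literal { span: Span }

// A consumer written against the base interface. It never reads `span`,
// so it works unchanged on extended nodes.
function literalValue(literal: Literal): string {
  return literal.value;
}

// Extension-aware tooling can use the span to point back into the source.
function sourceText(source: string, node: { span?: Span }): string | undefined {
  return node.span ? source.slice(node.span.start, node.span.end) : undefined;
}

const source = "{|a\\|b|}"; // the message text: {|a\|b|}
const node: LiteralWithSpan = { type: "literal", value: "a|b", span: { start: 1, end: 7 } };

console.log(literalValue(node));       // a|b
console.log(sourceText(source, node)); // |a\|b|
```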

In general,
implementations MUST NOT extend the sets of values for any defined field or type
when representing a valid message.
However, when using this data model to represent an invalid message,
an implementation MAY do so.
This is intended to allow for the representation of "junk" or invalid content within messages.
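
As an illustration of this allowance, a parser might add a hypothetical `junk` value to the `type` field of pattern elements while representing an invalid message, keeping the unparsed source. The sketch uses simplified stand-ins for `Expression` and `Markup`, and `junkSources` is a hypothetical helper a tool could use to find such content.

```ts
// Simplified stand-ins for the data model's Expression and Markup.
interface ExpressionStub { type: "expression" }
interface MarkupStub { type: "markup" }

// Hypothetical: an extra `type` value that a parser might use only while
// representing an invalid message, keeping the unparsed source text.
interface Junk { type: "junk"; source: string }
type LenientPattern = Array<string | ExpressionStub | MarkupStub | Junk>;

// Returns the raw source of every junk element; an empty result means
// no junk was recorded in this pattern.
function junkSources(pattern: LenientPattern): string[] {
  const found: string[] = [];
  for (const part of pattern) {
    if (typeof part !== "string" && part.type === "junk") {
      found.push(part.source);
    }
  }
  return found;
}

// Hello, {$name   (unterminated placeholder, so the message is invalid)
const pattern: LenientPattern = ["Hello, ", { type: "junk", source: "{$name" }];
console.log(junkSources(pattern)); // [ '{$name' ]
```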

## Appendices

### Security Considerations

MessageFormat 2.0 _patterns_ are meant to allow a _message_ to include any string value
which users might normally wish to use in their environment.
Programming languages and other environments vary in what characters are permitted
to appear in a valid string.
In many cases, certain types of characters, such as invisible control characters,
require escaping by these host formats.
In other cases, strings are not permitted to contain certain characters at all.
Since _messages_ are subject to the restrictions and limitations of their
host environments, their serializations, and their resource formats,
those restrictions might be sufficient to prevent most problems.
However, MessageFormat itself does not supply such a restriction.

MessageFormat _messages_ permit nearly all Unicode code points,
with the exception of surrogates,
to appear in _literals_, including the text portions of a _pattern_.
This means that a _message_ can contain invisible characters
(such as bidirectional controls,
ASCII control characters in the range U+0000 to U+001F,
or characters that might be interpreted as escapes or syntax in the host format)
that abnormally affect the display of the _message_
when viewed as source code, or in resource formats or translation tools,
but do not generate errors from MessageFormat parsers or processing APIs.
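
A translation or review tool might scan the text portions of a _pattern_ and the values of _literals_ for the invisible characters described above. The hypothetical `findInvisibleControls` below flags the ASCII control characters U+0000 to U+001F and the characters with the Unicode `Bidi_Control` property. It is a sketch rather than a complete list of problematic characters, and MessageFormat does not require such a check. It also flags tab and line feed, which a real tool might choose to allow.

```ts
// Bidi_Control characters: ALM, LRM, RLM, LRE..RLO, LRI..PDI.
const BIDI_CONTROLS = [
  0x061c, 0x200e, 0x200f, 0x202a, 0x202b, 0x202c, 0x202d, 0x202e,
  0x2066, 0x2067, 0x2068, 0x2069,
];

interface Finding {
  index: number;     // UTF-16 offset into the text
  codePoint: string; // e.g. "U+202E"
}

// Hypothetical lint for the text of a pattern or a literal value.
// Every flagged character is in the BMP and none is a surrogate, so
// scanning one UTF-16 code unit at a time cannot split a flagged character.
function findInvisibleControls(text: string): Finding[] {
  const findings: Finding[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit <= 0x1f || BIDI_CONTROLS.indexOf(unit) !== -1) {
      const hex = unit.toString(16).toUpperCase();
      findings.push({ index: i, codePoint: "U+" + ("0000" + hex).slice(-4) });
    }
  }
  return findings;
}

// Reports two findings: U+202E at index 7 and U+202C at index 11.
console.log(findInvisibleControls("Price: \u202Eabc\u202C"));
```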

Bidirectional text containing right-to-left characters (such as those used for Arabic or Hebrew)
also poses a potential source of confusion for users.
Since MessageFormat 2.0's syntax makes use of
keywords and symbols that are left-to-right or consist of neutral characters
(including characters subject to mirroring under the Unicode Bidirectional Algorithm),
it is possible to create messages that,
when displayed in source code, or in resource formats or translation tools,
have a misleading appearance or are difficult to parse visually.

For more information, see \[[UTS#55](https://unicode.org/reports/tr55/)\]
<cite>Unicode Source Code Handling</cite>.

MessageFormat 2.0 implementations might allow end-users to install
_selectors_, _functions_, or _markup_ from third-party sources.
Such functionality can be a vector for various exploits,
including buffer overflow, code injection, user tracking,
fingerprinting, and other types of bad behavior.
Any installed code needs to be appropriately sandboxed.
In addition, end-users need to be aware of the risks involved.

### Acknowledgements

Special thanks to the following people for their contributions to making MessageFormat v2.
The following people contributed to our GitHub repository and are listed in order by contribution size:

Addison Phillips,
Eemeli Aro,
Romulo Cintra,
Stanisław Małolepszy,
Elango Cheran,
Richard Gibson,
Tim Chevalier,
Mihai Niță,
Shane F. Carr,
Mark Davis,
Steven R. Loomis,
Caleb Maclennan,
David Filip,
Daniel Minor,
Christopher Dieringer,
George Rhoten,
Ujjwal Sharma,
Daniel Ehrenberg,
Markus Scherer,
Zibi Braniecki,
Matt Radbourne,
Bruno Haible,
and Rafael Xavier de Souza.

Addison Phillips was chair of the working group from January 2023.
Prior to 2023, the group was governed by a chair group, consisting of
Romulo Cintra,
Elango Cheran,
Mihai Niță,
David Filip,
Nicolas Bouvrette,
Stanisław Małolepszy,
Rafael Xavier de Souza,
Addison Phillips,
and Daniel Minor.
Romulo Cintra chaired the chair group.

* * *

Copyright © 2001–2024 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode [Terms of Use](https://www.unicode.org/copyright.html) apply.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.
