## Unicode Technical Standard #35

# Unicode Locale Data Markup Language (LDML)<br/>Part 9: Message Format

|Version|45                      |
|-------|------------------------|
|Editors|Addison Phillips and [other CLDR committee members](tr35.md#Acknowledgments)|

For the full header, summary, and status, see [Part 1: Core](tr35.md).

### _Summary_

This specification defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages.

This is a partial document, describing only those parts of the LDML that are relevant for message format. For the other parts of the LDML see the [main LDML document](tr35.md) and the links above.

### _Status_

<!-- _This is a draft document which may be updated, replaced, or superseded by other documents at any time.
Publication does not imply endorsement by the Unicode Consortium.
This is not a stable document; it is inappropriate to cite this document as other than a work in progress._ -->

_This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium.
This is a stable document and may be used as reference material or cited as a normative reference by other specifications._

> _**A Unicode Technical Standard (UTS)** is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS._

_Please submit corrigenda and other comments with the CLDR bug reporting form [[Bugs](tr35.md#Bugs)]. Related information that is useful in understanding this document is found in the [References](tr35.md#References). For the latest version of the Unicode Standard see [[Unicode](tr35.md#Unicode)]. For a list of current Unicode Technical Reports see [[Reports](tr35.md#Reports)].
For more information about versions of the Unicode Standard, see [[Versions](tr35.md#Versions)]._

## Parts

The LDML specification is divided into the following parts:

* Part 1: [Core](tr35.md#Contents) (languages, locales, basic structure)
* Part 2: [General](tr35-general.md#Contents) (display names & transforms, etc.)
* Part 3: [Numbers](tr35-numbers.md#Contents) (number & currency formatting)
* Part 4: [Dates](tr35-dates.md#Contents) (date, time, time zone formatting)
* Part 5: [Collation](tr35-collation.md#Contents) (sorting, searching, grouping)
* Part 6: [Supplemental](tr35-info.md#Contents) (supplemental data)
* Part 7: [Keyboards](tr35-keyboards.md#Contents) (keyboard mappings)
* Part 8: [Person Names](tr35-personNames.md#Contents) (person names)
* Part 9: [MessageFormat](tr35-messageFormat.md#Contents) (message format)

## <a name="Contents">Contents of Part 9, Message Format</a>

* [Introduction](#introduction)
  * [Conformance](#conformance)
  * [Terminology and Conventions](#terminology-and-conventions)
  * [Stability Policy](#stability-policy)
* [Syntax](#syntax)
  * [Design Goals](#design-goals)
  * [Design Restrictions](#design-restrictions)
  * [Messages and their Syntax](#messages-and-their-syntax)
    * [Well-formed vs. Valid Messages](#well-formed-vs.-valid-messages)
  * [The Message](#the-message)
    * [Declarations](#declarations)
      * [Reserved Statements](#reserved-statements)
    * [Complex Body](#complex-body)
  * [Pattern](#pattern)
    * [Quoted Pattern](#quoted-pattern)
    * [Text](#text)
    * [Placeholder](#placeholder)
  * [Matcher](#matcher)
    * [Selector](#selector)
    * [Variant](#variant)
      * [Key](#key)
  * [Expressions](#expressions)
    * [Annotation](#annotation)
      * [Function](#function)
        * [Options](#options)
      * [Private-Use Annotations](#private-use-annotations)
      * [Reserved Annotations](#reserved-annotations)
  * [Markup](#markup)
  * [Attributes](#attributes)
  * [Other Syntax Elements](#other-syntax-elements)
    * [Keywords](#keywords)
    * [Literals](#literals)
    * [Names and Identifiers](#names-and-identifiers)
    * [Escape Sequences](#escape-sequences)
    * [Whitespace](#whitespace)
* [Complete ABNF](#complete-abnf)
  * [`message.abnf`](#message.abnf)
* [Errors](#errors)
  * [Error Handling](#error-handling)
  * [Syntax Errors](#syntax-errors)
  * [Data Model Errors](#data-model-errors)
    * [Variant Key Mismatch](#variant-key-mismatch)
    * [Missing Fallback Variant](#missing-fallback-variant)
    * [Missing Selector Annotation](#missing-selector-annotation)
    * [Duplicate Declaration](#duplicate-declaration)
    * [Duplicate Option Name](#duplicate-option-name)
  * [Resolution Errors](#resolution-errors)
    * [Unresolved Variable](#unresolved-variable)
    * [Unknown Function](#unknown-function)
    * [Unsupported Expression](#unsupported-expression)
    * [Invalid Expression](#invalid-expression)
    * [Unsupported Statement](#unsupported-statement)
  * [Selection Errors](#selection-errors)
  * [Formatting Errors](#formatting-errors)
* [Function Registry](#function-registry)
  * [Goals](#goals)
  * [Conformance and Use](#conformance-and-use)
  * [Registry Data Model](#registry-data-model)
    * [Example](#example)
  * [Default Registry](#default-registry)
  * [String Value Selection and Formatting](#string-value-selection-and-formatting)
    * [The `:string` function](#the-string-function)
      * [Operands](#operands)
      * [Options](#options)
      * [Selection](#selection)
      * [Formatting](#formatting)
  * [Numeric Value Selection and Formatting](#numeric-value-selection-and-formatting)
    * [The `:number` function](#the-number-function)
      * [Operands](#operands)
      * [Options](#options)
        * [Default Value of `select` Option](#default-value-of-select-option)
        * [Percent Style](#percent-style)
      * [Selection](#selection)
    * [The `:integer` function](#the-integer-function)
      * [Operands](#operands)
      * [Options](#options)
        * [Default Value of `select` Option](#default-value-of-select-option)
        * [Percent Style](#percent-style)
      * [Selection](#selection)
    * [Number Operands](#number-operands)
    * [Digit Size Options](#digit-size-options)
    * [Number Selection](#number-selection)
      * [Rule Selection](#rule-selection)
      * [Determining Exact Literal Match](#determining-exact-literal-match)
  * [Date and Time Value Formatting](#date-and-time-value-formatting)
    * [The `:datetime` function](#the-datetime-function)
      * [Operands](#operands)
      * [Options](#options)
    * [The `:date` function](#the-date-function)
      * [Operands](#operands)
      * [Options](#options)
    * [The `:time` function](#the-time-function)
      * [Operands](#operands)
      * [Options](#options)
    * [Date and Time Operands](#date-and-time-operands)
* [Formatting](#formatting)
  * [Formatting Context](#formatting-context)
  * [Expression and Markup Resolution](#expression-and-markup-resolution)
    * [Literal Resolution](#literal-resolution)
    * [Variable Resolution](#variable-resolution)
    * [Function Resolution](#function-resolution)
      * [Option Resolution](#option-resolution)
    * [Markup Resolution](#markup-resolution)
    * [Fallback Resolution](#fallback-resolution)
  * [Pattern Selection](#pattern-selection)
    * [Resolve Selectors](#resolve-selectors)
    * [Resolve Preferences](#resolve-preferences)
    * [Filter Variants](#filter-variants)
    * [Sort Variants](#sort-variants)
    * [Examples](#examples)
      * [Example 1](#example-1)
      * [Example 2](#example-2)
      * [Example 3](#example-3)
  * [Formatting](#formatting)
    * [Examples](#examples)
    * [Formatting Fallback Values](#formatting-fallback-values)
    * [Handling Bidirectional Text](#handling-bidirectional-text)
* [Interchange Data Model](#interchange-data-model)
  * [Messages](#messages)
  * [Patterns](#patterns)
  * [Expressions](#expressions)
  * [Markup](#markup)
  * [Extensions](#extensions)
* [Appendices](#appendices)
  * [Security Considerations](#security-considerations)
  * [Acknowledgements](#acknowledgements)

## Introduction

One of the challenges in adapting software to work for
users with different languages and cultures is the need for **_dynamic messages_**.
Whenever a user interface needs to present data as part of a larger string,
that data needs to be formatted (and the message may need to be altered)
to make it culturally accepted and grammatically correct.

> For example, if your US English (`en-US`) interface has a message like:
>
> > Your item had 1,023 views on April 3, 2023
>
> You want the translated message to be appropriately formatted into French:
>
> > Votre article a eu 1 023 vues le 3 avril 2023
>
> Or Japanese:
>
> > あなたのアイテムは 2023 年 4 月 3 日に 1,023 回閲覧されました。

This specification defines the
data model, syntax, processing, and conformance requirements
for the next generation of _dynamic messages_.
It is intended for adoption by programming languages and APIs.
This will enable the integration of
existing internationalization APIs (such as the date and number formats shown above),
grammatical matching (such as plurals or genders),
as well as user-defined formats and message selectors.

The document is the successor to ICU MessageFormat,
henceforth called ICU MessageFormat 1.0.

### Conformance

Everything in this specification is normative except for:
sections marked as non-normative,
all authoring guidelines, diagrams, examples, and notes.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 \[[RFC2119](https://www.rfc-editor.org/rfc/rfc2119)\]
\[[RFC8174](https://www.rfc-editor.org/rfc/rfc8174)\] when, and only when, they
appear in all capitals, as shown here.

### Terminology and Conventions

A **_term_** looks like this when it is defined in this specification.

A reference to a _term_ looks like this.

> Examples are non-normative and styled like this.

### Stability Policy

> [!IMPORTANT]
> The provisions of the stability policy are not in effect until
> the conclusion of the technical preview and adoption of this specification.

Updates to this specification will not change
the syntactical meaning, the runtime output, or other behaviour
of valid messages written for earlier versions of this specification
that only use functions defined in this specification.
Updates to this specification will not remove any syntax provided in this version.
Future versions MAY add additional structure or meaning to existing syntax.

Updates to this specification will not remove any reserved keywords or sigils.

> [!NOTE]
> Future versions may define new keywords.

Updates to this specification will not reserve or assign meaning to
any character "sigils" except for those in the `reserved` production.

Updates to this specification
will not remove any functions defined in the default registry nor
will they remove any options or option values.
Additional options or option values MAY be defined.

> [!NOTE]
> This does not guarantee that the results of formatting will never change.
> Even when the specification doesn't change,
> the functions for date formatting, number formatting and so on
> will change their results over time.

Later specification versions MAY make previously invalid messages valid.

Updates to this specification will not introduce message syntax that,
when parsed according to earlier versions of this specification,
would produce syntax or data model errors.
Such messages MAY produce errors when formatted
according to an earlier version of this specification.

From version 2.0, MessageFormat will only reserve, define, or require
function names or function option names
consisting of characters in the ranges a-z, A-Z, and 0-9.
All other names in these categories are reserved for the use of implementations or users.

> [!NOTE]
> Users defining custom names SHOULD include at least one character outside these ranges
> to ensure that they will be compatible with future versions of this specification.

Later versions of this specification will not introduce changes
to the data model that would result in a data model representation
based on this version being invalid.

> For example, existing interfaces or fields will not be removed.

Later versions of this specification MAY introduce changes
to the data model that would result in future data model representations
not being valid for implementations of this version of the data model.

> For example, a future version could introduce a new keyword,
> whose data model representation would be a new interface
> that is not recognized by this version's data model.

Later specification versions will not introduce syntax that cannot be
represented by this version of the data model.

> For example, a future version could introduce a new keyword.
> The future version's data model would provide an interface for that keyword
> while this version of the data model would parse the value into
> the interface `UnsupportedStatement`.
> Both data models would be "valid" in their context,
> but this version's would be missing any functionality for the new statement type.

## Syntax

This section defines the formal grammar describing the syntax of a single message.

### Design Goals

_This section is non-normative._

The design goals of the syntax specification are as follows:

1. The syntax should leverage the familiarity with ICU MessageFormat 1.0
   in order to lower the barrier to entry and increase the chance of adoption.
   At the same time,
   the syntax should fix the [pain points of ICU MessageFormat 1.0](https://github.com/unicode-org/message-format-wg/blob/main/docs/why_mf_next.md).

   - _Non-Goal_: Be backwards-compatible with the ICU MessageFormat 1.0 syntax.

1. The syntax inside translatable content should be easy to understand for humans.
   This includes making it clear which parts of the message body _are_ translatable content,
   which parts inside it are placeholders for expressions,
   as well as making the selection logic predictable and easy to reason about.

   - _Non-Goal_: Make the syntax intuitive enough for non-technical translators to hand-edit.
     Instead, we assume that most translators will work with MessageFormat 2
     by means of GUI tooling, CAT workbenches etc.

1. The syntax surrounding translatable content should be easy to write and edit
   for developers, localization engineers, and easy to parse by machines.

1. The syntax should make a single message easily embeddable inside many container formats:
   `.properties`, YAML, XML, inlined as string literals in programming languages, etc.
   This includes a future _MessageResource_ specification.

   - _Non-Goal_: Support unnecessary escape sequences, which would themselves require
     additional escaping when embedded. Instead, we tolerate direct use of nearly all
     characters (including line breaks, control characters, etc.) and rely upon escaping
     in those outer formats to aid human comprehension (e.g., depending upon container
     format, a U+000A LINE FEED might be represented as `\n`, `\012`, `\x0A`, `\u000A`,
     `\U0000000A`, `&#xA;`, `&#10;`, `%0A`, `<LF>`, or something else entirely).

### Design Restrictions

_This section is non-normative._

The syntax specification takes into account the following design restrictions:

1. Whitespace outside the translatable content should be insignificant.
   It should be possible to define a message entirely on a single line with no ambiguity,
   as well as to format it over multiple lines for clarity.

1. The syntax should define as few special characters and sigils as possible.
   Note that this necessitates extra care when presenting messages for human consumption,
   because they may contain invisible characters such as U+200B ZERO WIDTH SPACE,
   control characters such as U+0000 NULL and U+0009 TAB, permanently reserved noncharacters
   (U+FDD0 through U+FDEF and U+<i>n</i>FFFE and U+<i>n</i>FFFF where <i>n</i> is 0x0 through 0x10),
   private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and
   U+100000 through U+10FFFD), unassigned code points, and other potentially confusing content.

### Messages and their Syntax

The purpose of MessageFormat is to allow content to vary at runtime.
This variation might be due to placing a value into the content
or it might be due to selecting a different bit of content based on some data value
or it might be due to a combination of the two.

MessageFormat calls the template for a given formatting operation a _message_.

The values passed in at runtime (which are to be placed into the content or used
to select between different content items) are called _external variables_.
The author of a _message_ can also assign _local variables_, including
variables that modify _external variables_.

This part of the MessageFormat specification defines the syntax for a _message_,
along with the concepts and terminology needed when processing a _message_
during the [formatting](#formatting) of a _message_ at runtime.

The complete formal syntax of a _message_ is described by the [ABNF](#complete-abnf).

#### Well-formed vs. Valid Messages

A _message_ is **_<dfn>well-formed</dfn>_** if it satisfies all the rules of the grammar.
Attempting to parse a _message_ that is not _well-formed_ will result in a _Syntax Error_.

A _message_ is **_<dfn>valid</dfn>_** if it is _well-formed_ and
**also** meets the additional content restrictions
and semantic requirements about its structure defined below for
_declarations_, _matcher_ and _options_.
Attempting to parse a _message_ that is not _valid_ will result in a _Data Model Error_.

### The Message

A **_<dfn>message</dfn>_** is the complete template for a specific message formatting request.

> [!NOTE]
> This syntax is designed to be embeddable into many different programming languages and formats.
> As such, it avoids constructs, such as character escapes, that are specific to any given file
> format or processor.
> In particular, it avoids using quote characters common to many file formats and formal languages
> so that these do not need to be escaped in the body of a _message_.

> [!NOTE]
> In general (and except where required by the syntax), whitespace carries no meaning in the structure
> of a _message_. While many of the examples in this spec are written on multiple lines, the formatting
> shown is primarily for readability.
>
> > **Example** This _message_:
> >
> > ```
> > .local $foo   =   { |horse| }
> > {{You have a {$foo}!}}
> > ```
> >
> > Can also be written as:
> >
> > ```
> > .local $foo={|horse|}{{You have a {$foo}!}}
> > ```
> >
> > An exception to this is: whitespace inside a _pattern_ is **always** significant.

> [!NOTE]
> The syntax assumes that each _message_ will be displayed with a left-to-right display order
> and be processed in the logical character order.
> The syntax also permits the use of right-to-left characters in _identifiers_,
> _literals_, and other values.
> This can result in confusion when viewing the _message_.
>
> Additional restrictions or requirements,
> such as permitting the use of certain bidirectional control characters in the syntax,
> might be added during the Tech Preview to better manage bidirectional text.
> Feedback on the creation and management of _messages_
> containing bidirectional tokens is strongly desired.

A _message_ can be a _simple message_ or it can be a _complex message_.

```abnf
message = simple-message / complex-message
```

A **_<dfn>simple message</dfn>_** contains a single _pattern_,
with restrictions on its first character.
An empty string is a valid _simple message_.

```abnf
simple-message = [simple-start pattern]
simple-start   = simple-start-char / text-escape / placeholder
```

A **_<dfn>complex message</dfn>_** is any _message_ that contains _declarations_,
a _matcher_, or both.
A _complex message_ always begins with either a keyword that has a `.` prefix or a _quoted pattern_
and consists of:

1. an optional list of _declarations_, followed by
2. a _complex body_

```abnf
complex-message = *(declaration [s]) complex-body
```

#### Declarations

A **_<dfn>declaration</dfn>_** binds a _variable_ identifier to a value within the scope of a _message_.
This _variable_ can then be used in other _expressions_ within the same _message_.
_Declarations_ are optional: many messages will not contain any _declarations_.

An **_<dfn>input-declaration</dfn>_** binds a _variable_ to an external input value.
The _variable-expression_ of an _input-declaration_
MAY include an _annotation_ that is applied to the external value.

A **_<dfn>local-declaration</dfn>_** binds a _variable_ to the resolved value of an _expression_.

For compatibility with later MessageFormat 2 specification versions,
_declarations_ MAY also include _reserved statements_.

```abnf
declaration       = input-declaration / local-declaration / reserved-statement
input-declaration = input [s] variable-expression
local-declaration = local s variable [s] "=" [s] expression
```

_Variables_, once declared, MUST NOT be redeclared.
A _message_ that does any of the following is not _valid_ and will produce a
_Duplicate Declaration_ error during processing:
- A _declaration_ MUST NOT bind a _variable_
  that appears as a _variable_ anywhere within a previous _declaration_.
- An _input-declaration_ MUST NOT bind a _variable_
  that appears anywhere within the _annotation_ of its _variable-expression_.
- A _local-declaration_ MUST NOT bind a _variable_ that appears in its _expression_.

A _local-declaration_ MAY overwrite an external input value as long as the
external input value does not appear in a previous _declaration_.

> [!NOTE]
> These restrictions only apply to _declarations_.
> A _placeholder_ or _selector_ can apply a different annotation to a _variable_
> than one applied to the same _variable_ named in a _declaration_.
> For example, this message is _valid_:
> ```
> .input {$var :number maximumFractionDigits=0}
> .match {$var :number maximumFractionDigits=2}
> 0 {{The selector can apply a different annotation to {$var} for the purposes of selection}}
> * {{A placeholder in a pattern can apply a different annotation to {$var :number maximumFractionDigits=3}}}
> ```
> (See the [Errors](#errors) section for examples of invalid messages)

##### Reserved Statements

A **_<dfn>reserved statement</dfn>_** reserves additional `.keywords`
for use by future versions of this specification.
Any such future keyword must start with `.`,
followed by two or more lower-case ASCII characters.

The rest of the statement supports
a similarly wide range of content as _reserved annotations_,
but it MUST end with one or more _expressions_.

```abnf
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
reserved-keyword   = "." name
```

> [!NOTE]
> The `reserved-keyword` ABNF rule is a simplification,
> as it MUST NOT be considered to match any of the existing keywords
> `.input`, `.local`, or `.match`.

This allows flexibility in future standardization,
as future definitions MAY define additional semantics and constraints
on the contents of these _reserved statements_.

Implementations MUST NOT assign meaning or semantics to a _reserved statement_:
these are reserved for future standardization.
Implementations MUST NOT remove or alter the contents of a _reserved statement_.

#### Complex Body

The **_<dfn>complex body</dfn>_** of a _complex message_ is the part that will be formatted.
The _complex body_ consists of either a _quoted pattern_ or a _matcher_.

```abnf
complex-body = quoted-pattern / matcher
```

### Pattern

A **_<dfn>pattern</dfn>_** contains a sequence of _text_ and _placeholders_ to be formatted as a unit.
Unless there is an error, resolving a _message_ always results in the formatting
of a single _pattern_.

```abnf
pattern = *(text-char / text-escape / placeholder)
```
A _pattern_ MAY be empty.

A _pattern_ MAY contain an arbitrary number of _placeholders_ to be evaluated
during the formatting process.

#### Quoted Pattern

A **_<dfn>quoted pattern</dfn>_** is a _pattern_ that is "quoted" to prevent
interference with other parts of the _message_.
A _quoted pattern_ starts with a sequence of two U+007B LEFT CURLY BRACKET `{{`
and ends with a sequence of two U+007D RIGHT CURLY BRACKET `}}`.

```abnf
quoted-pattern = "{{" pattern "}}"
```

A _quoted pattern_ MAY be empty.

> An empty _quoted pattern_:
>
> ```
> {{}}
> ```

#### Text

**_<dfn>text</dfn>_** is the translatable content of a _pattern_.
Any Unicode code point is allowed, except for U+0000 NULL
and the surrogate code points U+D800 through U+DFFF inclusive.
The characters U+005C REVERSE SOLIDUS `\`,
U+007B LEFT CURLY BRACKET `{`, and U+007D RIGHT CURLY BRACKET `}`
MUST be escaped as `\\`, `\{`, and `\}` respectively.

In the ABNF, _text_ is represented by non-empty sequences of
`simple-start-char`, `text-char`, and `text-escape`.
The first of these is used at the start of a _simple message_,
and matches `text-char` except for not allowing U+002E FULL STOP `.`.
The ABNF uses `content-char` as a shared base for _text_ and _quoted literal_ characters.

Whitespace in _text_, including tabs, spaces, and newlines is significant and MUST
be preserved during formatting.

```abnf
simple-start-char = content-char / s / "@" / "|"
text-char         = content-char / s / "." / "@" / "|"
quoted-char       = content-char / s / "." / "@" / "{" / "}"
reserved-char     = content-char / "."
content-char      = %x01-08        ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
                  / %x0B-0C        ; omit CR (%x0D)
                  / %x0E-1F        ; omit SP (%x20)
                  / %x21-2D        ; omit . (%x2E)
                  / %x2F-3F        ; omit @ (%x40)
                  / %x41-5B        ; omit \ (%x5C)
                  / %x5D-7A        ; omit { | } (%x7B-7D)
                  / %x7E-2FFF      ; omit IDEOGRAPHIC SPACE (%x3000)
                  / %x3001-D7FF    ; omit surrogates
                  / %xE000-10FFFF
```

When a _pattern_ is quoted by embedding the _pattern_ in curly brackets, the
resulting _message_ can be embedded into
various formats regardless of the container's whitespace trimming rules.
Otherwise, care must be taken to ensure that pattern-significant whitespace is preserved.
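
The escaping rules for _text_ given above can be illustrated with a short, non-normative Python sketch.
The helper names `escape_text` and `quote_pattern` are hypothetical and are not defined by this specification or by any library;
the sketch only assumes the three required escapes and the two excluded code point ranges stated in this section.

```python
# Non-normative sketch. `escape_text` and `quote_pattern` are hypothetical
# helper names used for illustration only.

ESCAPED = {"\\", "{", "}"}  # characters that MUST be escaped in text


def escape_text(raw: str) -> str:
    """Return `raw` with backslash and curly brackets escaped for use as text."""
    out = []
    for ch in raw:
        code = ord(ch)
        # U+0000 NULL and the surrogate code points are not allowed in text.
        if code == 0 or 0xD800 <= code <= 0xDFFF:
            raise ValueError(f"code point U+{code:04X} is not allowed in text")
        out.append("\\" + ch if ch in ESCAPED else ch)
    return "".join(out)


def quote_pattern(raw: str) -> str:
    """Wrap escaped text in {{ }} so leading and trailing whitespace is kept."""
    return "{{" + escape_text(raw) + "}}"


if __name__ == "__main__":
    print(escape_text(r"Use {braces} and a \ backslash"))
    print(quote_pattern("   Hello   "))
```

Quoting the _pattern_ in this way also sidesteps the restriction on the first character of a _simple message_, at the cost of turning the result into a _complex message_.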

> **Example**
> In a Java `.properties` file, the values `hello` and `hello2` both contain
> an identical _message_ which consists of a single _pattern_.
> This _pattern_ consists of _text_ with exactly three spaces before and after the word "Hello":
>
> ```properties
> hello = {{   Hello   }}
> hello2=\   Hello  \ 
> ```

#### Placeholder

A **_<dfn>placeholder</dfn>_** is an _expression_ or _markup_ that appears inside of a _pattern_
and which will be replaced during the formatting of a _message_.

```abnf
placeholder = expression / markup
```

### Matcher

A **_<dfn>matcher</dfn>_** is the _complex body_ of a _message_ that allows runtime selection
of the _pattern_ to use for formatting.
This allows the form or content of a _message_ to vary based on values
determined at runtime.

A _matcher_ consists of the keyword `.match` followed by at least one _selector_
and at least one _variant_.

When the _matcher_ is processed, the result will be a single _pattern_ that serves
as the template for the formatting process.

A _message_ can only be considered _valid_ if the following requirements are
satisfied:

- The number of _keys_ on each _variant_ MUST be equal to the number of _selectors_.
- At least one _variant_ MUST exist whose _keys_ are all equal to the "catch-all" key `*`.
- Each _selector_ MUST have an _annotation_,
  or contain a _variable_ that directly or indirectly references a _declaration_ with an _annotation_.
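
The first two requirements above can be checked mechanically once a _matcher_ has been parsed.
The following non-normative Python sketch shows one way to do so.
The `Variant` class and the `check_matcher` function are hypothetical names, and representing the catch-all key as `None` (so that it cannot be confused with a _literal_ key such as `|*|`) is an assumption of this sketch, not a requirement of the data model.
The error names are taken from the [Errors](#errors) section; the third requirement (selector annotations) is not covered here.

```python
# Non-normative sketch. `Variant` and `check_matcher` are hypothetical names;
# a key of None stands for the catch-all key `*` in this sketch.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    keys: List[Optional[str]]  # literal keys as strings, None for catch-all
    pattern: str


def check_matcher(selector_count: int, variants: List[Variant]) -> List[str]:
    """Return the names of the data model errors found (empty list if none)."""
    errors = []
    # Every variant needs exactly one key per selector.
    if any(len(v.keys) != selector_count for v in variants):
        errors.append("Variant Key Mismatch")
    # At least one variant must consist solely of catch-all keys.
    has_fallback = any(
        len(v.keys) == selector_count and all(k is None for k in v.keys)
        for v in variants
    )
    if not has_fallback:
        errors.append("Missing Fallback Variant")
    return errors


if __name__ == "__main__":
    variants = [
        Variant(["one"], "You have {$count} notification."),
        Variant([None], "You have {$count} notifications."),
    ]
    print(check_matcher(1, variants))
```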

```abnf
matcher         = match-statement 1*([s] variant)
match-statement = match 1*([s] selector)
```

> A _message_ with a _matcher_:
>
> ```
> .input {$count :number}
> .match {$count}
> one {{You have {$count} notification.}}
> *   {{You have {$count} notifications.}}
> ```

> A _message_ containing a _matcher_ formatted on a single line:
>
> ```
> .match {:platform} windows {{Settings}} * {{Preferences}}
> ```

#### Selector

A **_<dfn>selector</dfn>_** is an _expression_ that ranks or excludes the
_variants_ based on the value of the corresponding _key_ in each _variant_.
The combination of _selectors_ in a _matcher_ thus determines
which _pattern_ will be used during formatting.

```abnf
selector = expression
```

There MUST be at least one _selector_ in a _matcher_.
There MAY be any number of additional _selectors_.

> A _message_ with a single _selector_ that uses a custom _function_
> `:hasCase` which is a _selector_ that allows the _message_ to choose a _pattern_
> based on grammatical case:
>
> ```
> .match {$userName :hasCase}
> vocative {{Hello, {$userName :person case=vocative}!}}
> accusative {{Please welcome {$userName :person case=accusative}!}}
> * {{Hello!}}
> ```

> A message with two _selectors_:
>
> ```
> .input {$numLikes :integer}
> .input {$numShares :integer}
> .match {$numLikes} {$numShares}
> 0   0   {{Your item has no likes and has not been shared.}}
> 0   one {{Your item has no likes and has been shared {$numShares} time.}}
> 0   *   {{Your item has no likes and has been shared {$numShares} times.}}
> one 0   {{Your item has {$numLikes} like and has not been shared.}}
> one one {{Your item has {$numLikes} like and has been shared {$numShares} time.}}
> one *   {{Your item has {$numLikes} like and has been shared {$numShares} times.}}
> *   0   {{Your item has {$numLikes} likes and has not been shared.}}
> *   one {{Your item has {$numLikes} likes and has been shared {$numShares} time.}}
> *   *   {{Your item has {$numLikes} likes and has been shared {$numShares} times.}}
> ```

#### Variant

A **_<dfn>variant</dfn>_** is a _quoted pattern_ associated with a set of _keys_ in a _matcher_.
Each _variant_ MUST begin with a sequence of _keys_,
and terminate with a valid _quoted pattern_.
The number of _keys_ in each _variant_ MUST match the number of _selectors_ in the _matcher_.

_Keys_ are separated from each other by whitespace.
Whitespace is permitted but not required between the last _key_ and the _quoted pattern_.

```abnf
variant = key *(s key) [s] quoted-pattern
key     = literal / "*"
```

##### Key

A **_<dfn>key</dfn>_** is a value in a _variant_ for use by a _selector_ when ranking
or excluding _variants_ during the _matcher_ process.
A _key_ can be either a _literal_ value or the "catch-all" key `*`.

The **_<dfn>catch-all key</dfn>_** is a special key, represented by `*`,
that matches all values for a given _selector_.

### Expressions

An **_<dfn>expression</dfn>_** is a part of a _message_ that will be determined
during the _message_'s formatting.

An _expression_ MUST begin with U+007B LEFT CURLY BRACKET `{`
and end with U+007D RIGHT CURLY BRACKET `}`.
An _expression_ MUST NOT be empty.
An _expression_ cannot contain another _expression_.
An _expression_ MAY contain one or more _attributes_.

A **_<dfn>literal-expression</dfn>_** contains a _literal_,
optionally followed by an _annotation_.

A **_<dfn>variable-expression</dfn>_** contains a _variable_,
optionally followed by an _annotation_.

An **_<dfn>annotation-expression</dfn>_** contains an _annotation_ without an _operand_.

```abnf
expression            = literal-expression
                      / variable-expression
                      / annotation-expression
literal-expression    = "{" [s] literal [s annotation] *(s attribute) [s] "}"
variable-expression   = "{" [s] variable [s annotation] *(s attribute) [s] "}"
annotation-expression = "{" [s] annotation *(s attribute) [s] "}"
```

There are several types of _expression_ that can appear in a _message_.
All _expressions_ share a common syntax. The types of _expression_ are:

1. The value of a _local-declaration_
2. A _selector_
3. A kind of _placeholder_ in a _pattern_

Additionally, an _input-declaration_ can contain a _variable-expression_.

> Examples of different types of _expression_
>
> Declarations:
>
> ```
> .input {$x :function option=value}
> .local $y = {|This is an expression|}
> ```
>
> Selectors:
>
> ```
> .match {$selector :functionRequired}
> ```
>
> Placeholders:
>
> ```
> This placeholder contains a literal expression: {|literal|}
> This placeholder contains a variable expression: {$variable}
> This placeholder references a function on a variable: {$variable :function with=options}
> This placeholder contains a function expression with a variable-valued option: {:function option=$variable}
> ```

#### Annotation

An **_<dfn>annotation</dfn>_** is part of an _expression_ containing either
a _function_ together with its associated _options_, or
a _private-use annotation_ or a _reserved annotation_.

```abnf
annotation = function
           / private-use-annotation
           / reserved-annotation
```

An **_<dfn>operand</dfn>_** is the _literal_ of a _literal-expression_ or
the _variable_ of a _variable-expression_.

An _annotation_ can appear in an _expression_ by itself or following a single _operand_.
When following an _operand_, the _operand_ serves as input to the _annotation_.
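
As a rough illustration of how the three kinds of _annotation_ can be told apart,
the following Python sketch classifies an _annotation_ by its first character.
The sigil sets are copied from the `function`, `private-start`, and
`reserved-annotation-start` productions of this specification;
the function name `classify_annotation` is hypothetical and is not defined anywhere in this specification.
The sketch looks only at the leading sigil and does not validate the rest of the _annotation_.

```python
# Sigils taken from the grammar: ":" starts a function,
# "^" / "&" start a private-use annotation,
# and the remaining characters start a reserved annotation.
FUNCTION_SIGIL = ":"
PRIVATE_USE_SIGILS = frozenset("^&")
RESERVED_SIGILS = frozenset("!%*+<>?~")


def classify_annotation(annotation: str) -> str:
    """Return the kind of annotation, judged only by its leading sigil.

    Hypothetical helper for illustration; it does not parse the
    identifier, options, or reserved body that follow the sigil.
    """
    if not annotation:
        raise ValueError("an annotation cannot be empty")
    sigil = annotation[0]
    if sigil == FUNCTION_SIGIL:
        return "function"
    if sigil in PRIVATE_USE_SIGILS:
        return "private-use-annotation"
    if sigil in RESERVED_SIGILS:
        return "reserved-annotation"
    raise ValueError(f"{sigil!r} does not start an annotation")
```

A full parser would go on to check the remainder of the _annotation_ against the matching production.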

##### Function

A **_<dfn>function</dfn>_** is named functionality in an _annotation_.
_Functions_ are used to evaluate, format, select, or otherwise process data
values during formatting.

Each _function_ is defined by the runtime's _function registry_.
A _function_'s entry in the _function registry_ will define
whether the _function_ is a _selector_ or formatter (or both),
whether an _operand_ is required,
what form the values of an _operand_ can take,
what _options_ and _option_ values are valid,
and what outputs might result.
See [function registry](#function-registry) for more information.

A _function_ starts with a prefix sigil `:` followed by an _identifier_.
The _identifier_ MAY be followed by one or more _options_.
_Options_ are not required.

```abnf
function = ":" identifier *(s option)
```

> A _message_ with a _function_ operating on the _variable_ `$now`:
>
> ```
> It is now {$now :datetime}.
> ```

###### Options

An **_<dfn>option</dfn>_** is a key-value pair
containing a named argument that is passed to a _function_.

An _option_ has an _identifier_ and a _value_.
The _identifier_ is separated from the _value_ by a U+003D EQUALS SIGN `=` along with
optional whitespace.
The value of an _option_ can be either a _literal_ or a _variable_.

Multiple _options_ are permitted in an _annotation_.
_Options_ are separated from the preceding _function_ _identifier_
and from each other by whitespace.
Each _option_'s _identifier_ MUST be unique within the _annotation_:
an _annotation_ with duplicate _option_ _identifiers_ is not valid.

The order of _options_ is not significant.

```abnf
option = identifier [s] "=" [s] (literal / variable)
```

> Examples of _functions_ with _options_
>
> A _message_ using the `:datetime` function.
> The _option_ `weekday` has the literal `long` as its value:
>
> ```
> Today is {$date :datetime weekday=long}!
> ```

> A _message_ using the `:datetime` function.
> The _option_ `weekday` has a variable `$dateStyle` as its value:
>
> ```
> Today is {$date :datetime weekday=$dateStyle}!
> ```

##### Private-Use Annotations

A **_<dfn>private-use annotation</dfn>_** is an _annotation_ whose syntax is reserved
for use by a specific implementation or by private agreement between multiple implementations.
Implementations MAY define their own meaning and semantics for _private-use annotations_.

A _private-use annotation_ starts with either U+0026 AMPERSAND `&` or U+005E CIRCUMFLEX ACCENT `^`.

Characters, including whitespace, are assigned meaning by the implementation.
The definition of escapes in the `reserved-body` production, used for the body of
a _private-use annotation_, is an affordance to implementations that
wish to use a syntax exactly like other functions. Specifically:

- The characters `\`, `{`, and `}` MUST be escaped as `\\`, `\{`, and `\}` respectively
  when they appear in the body of a _private-use annotation_.
- The character `|` is special: it SHOULD be escaped as `\|` in a _private-use annotation_,
  but can appear unescaped as long as it is paired with another `|`.
  This is an affordance to allow _literals_ to appear in the private use syntax.

A _private-use annotation_ MAY be empty after its introducing sigil.

```abnf
private-use-annotation = private-start [[s] reserved-body]
private-start          = "^" / "&"
```

> [!NOTE]
> Users are cautioned that _private-use annotations_ cannot be reliably exchanged
> and can result in errors during formatting.
> It is generally a better idea to use the function registry
> to define additional formatting or annotation options.
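
The escaping rules for the body of a _private-use annotation_ can be sketched in Python as follows.
This is only an illustration under stated assumptions:
the helper name `escape_reserved_body` is hypothetical,
and the sketch takes the conservative route of always escaping `|`
rather than detecting whether each occurrence is paired with another `|`.

```python
def escape_reserved_body(raw: str) -> str:
    """Escape the characters that are special in a private-use annotation body.

    Hypothetical helper for illustration. The backslash and the curly
    brackets MUST be escaped; the vertical line is escaped as well, which
    follows the SHOULD-level recommendation and avoids pairing analysis.
    """
    escaped = []
    for ch in raw:
        if ch in "\\{}|":
            escaped.append("\\" + ch)
        else:
            escaped.append(ch)
    return "".join(escaped)
```

An implementation that wants to allow _quoted_ literals inside the body would leave paired `|` characters unescaped instead.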

> Here are some examples of what _private-use_ sequences might look like:
>
> ```
> Here's private use with an operand: {$foo &bar}
> Here's a placeholder that is entirely private-use: {&anything here}
> Here's a private-use function that uses normal function syntax: {$operand ^foo option=|literal|}
> The character \| has to be paired or escaped: {&private || |something between| or isolated: \| }
> Stop {& "translate 'stop' as a verb" might be a translator instruction or comment }
> Protect stuff in {^ph}<a>{^/ph}private use{^ph}</a>{^/ph}
> ```

##### Reserved Annotations

A **_<dfn>reserved annotation</dfn>_** is an _annotation_ whose syntax is reserved
for future standardization.

A _reserved annotation_ starts with a reserved character.
The remaining part of a _reserved annotation_, called a _reserved body_,
MAY be empty or contain arbitrary text that starts and ends with
a non-whitespace character.

This allows maximum flexibility in future standardization,
as future definitions MAY define additional semantics and constraints
on the contents of these _annotations_.

Implementations MUST NOT assign meaning or semantics to
an _annotation_ starting with `reserved-annotation-start`:
these are reserved for future standardization.
Whitespace before or after a _reserved body_ is not part of the _reserved body_.
Implementations MUST NOT remove or alter the contents of a _reserved body_,
including any interior whitespace,
but MAY remove or alter whitespace before or after the _reserved body_.

While a reserved sequence is technically "well-formed",
unrecognized _reserved-annotations_ or _private-use-annotations_ have no meaning.

```abnf
reserved-annotation       = reserved-annotation-start [[s] reserved-body]
reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?"
                          / "~"

reserved-body             = reserved-body-part *([s] reserved-body-part)
reserved-body-part        = reserved-char / reserved-escape / quoted
```

### Markup

**_<dfn>Markup</dfn>_** _placeholders_ are _pattern_ parts
that can be used to represent non-language parts of a _message_,
such as inline elements or styling that should apply to a span of parts.

_Markup_ MUST begin with U+007B LEFT CURLY BRACKET `{`
and end with U+007D RIGHT CURLY BRACKET `}`.
_Markup_ MAY contain one or more _attributes_.

_Markup_ comes in three forms:

**_<dfn>Markup-open</dfn>_** starts with U+0023 NUMBER SIGN `#` and
represents an opening element within the _message_,
such as markup used to start a span.
It MAY include _options_.

**_<dfn>Markup-standalone</dfn>_** starts with U+0023 NUMBER SIGN `#`
and has a U+002F SOLIDUS `/` immediately before its closing `}`
representing a self-closing or standalone element within the _message_.
It MAY include _options_.

**_<dfn>Markup-close</dfn>_** starts with U+002F SOLIDUS `/` and
is a _pattern_ part ending a span.

```abnf
markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}"  ; open and standalone
       / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}"  ; close
```

> A _message_ with one `button` markup span and a standalone `img` markup element:
>
> ```
> {#button}Submit{/button} or {#img alt=|Cancel| /}.
> ```

> A _message_ with _options_ in the closing tag:
>
> ```
> {#ansi attr=|bold,italic|}Bold and italic{/ansi attr=|bold|} italic only {/ansi attr=|italic|} no formatting.
> ```

A _markup-open_ can appear without a corresponding _markup-close_.
A _markup-close_ can appear without a corresponding _markup-open_.
_Markup_ _placeholders_ can appear in any order without making the _message_ invalid.
However, specifications or implementations defining _markup_ might impose requirements
on the pairing, ordering, or contents of _markup_ during _formatting_.

### Attributes

**_Attributes_ are reserved for standardization by future versions of this specification.**
Examples in this section are meant to be illustrative and
might not match future requirements or usage.

> [!NOTE]
> The Tech Preview does not provide a built-in mechanism for overriding
> values in the _formatting context_ (most notably the locale).
> Nor does it provide a mechanism for identifying specific expressions
> such as by assigning a name or id.
> The utility of these types of mechanisms has been debated.
> There are at least two proposed mechanisms for implementing support for
> these.
> Specifically, one mechanism would be to reserve specifically-named options,
> possibly using a Unicode namespace (i.e. `locale=xxx` or `u:locale=xxx`).
> Such options would be reserved for use in any and all functions or markup.
> The other mechanism would be to use the reserved "expression attribute" syntax
> for this purpose (i.e. `@locale=xxx` or `@id=foo`).
> Neither mechanism was included in this Tech Preview.
> Feedback on the preferred mechanism for managing these features
> is strongly desired.
>
> In the meantime, function authors and other implementers are cautioned to avoid creating
> function-specific or implementation-specific option values for this purpose.
> One workaround would be to use the implementation's namespace for these
> features to ensure later interoperability when such a mechanism is finalized
> during the Tech Preview period.
> Specifically:
> - Avoid specifying an option for setting the locale of an expression as different from
>   that of the overall _message_ locale, or use a namespace that later maps to the final
>   mechanism.
> - Avoid specifying options for the purpose of linking placeholders
>   (such as to pair opening markup to closing markup).
>   If such an option is created, the implementer should use an
>   implementation-specific namespace.
>   Users and implementers are cautioned that such options might be
>   replaced with a standard mechanism in a future version.
> - Avoid specifying generic options to communicate with translators and
>   translation tooling (i.e. implementation-specific options that apply to all
>   functions).
>
> The above are all desirable features.
> We welcome contributions to and proposals for such features during the
> Technical Preview.

An **_<dfn>attribute</dfn>_** is an _identifier_ with an optional value
that appears in an _expression_ or in _markup_.

_Attributes_ are prefixed by a U+0040 COMMERCIAL AT `@` sign,
followed by an _identifier_.
An _attribute_ MAY have a _value_ which is separated from the _identifier_
by a U+003D EQUALS SIGN `=` along with optional whitespace.
The _value_ of an _attribute_ can be either a _literal_ or a _variable_.

Multiple _attributes_ are permitted in an _expression_ or _markup_.
Each _attribute_ is separated by whitespace.

The order of _attributes_ is not significant.

```abnf
attribute = "@" identifier [[s] "=" [s] (literal / variable)]
```

> Examples of _expressions_ and _markup_ with _attributes_:
>
> A _message_ including a _literal_ that should not be translated:
>
> ```
> In French, "{|bonjour| @translate=no}" is a greeting
> ```
>
> A _message_ with _markup_ that should not be copied:
>
> ```
> Have a {#span @can-copy}great and wonderful{/span @can-copy} birthday!
> ```

### Other Syntax Elements

This section defines common elements used to construct _messages_.

#### Keywords

A **_<dfn>keyword</dfn>_** is a reserved token that has a unique meaning in the _message_ syntax.

The following three keywords are defined: `.input`, `.local`, and `.match`.
Keywords are always lowercase and start with U+002E FULL STOP `.`.

```abnf
input = %s".input"
local = %s".local"
match = %s".match"
```

#### Literals

A **_<dfn>literal</dfn>_** is a character sequence that appears outside
of _text_ in various parts of a _message_.
A _literal_ can appear
as a _key_ value,
as the _operand_ of a _literal-expression_,
or in the value of an _option_.
A _literal_ MAY include any Unicode code point
except for U+0000 NULL or the surrogate code points U+D800 through U+DFFF.

All code points are preserved.

A **_<dfn>quoted</dfn>_** literal begins and ends with U+007C VERTICAL LINE `|`.
The characters `\` and `|` within a _quoted_ literal MUST be
escaped as `\\` and `\|`.

An **_<dfn>unquoted</dfn>_** literal is a _literal_ that does not require the `|`
quotes around it to be distinct from the rest of the _message_ syntax.
An _unquoted_ literal MAY be used when the content of the _literal_
contains no whitespace and otherwise matches the `unquoted` production.
Any _unquoted_ literal MAY be _quoted_.
Implementations MUST NOT distinguish between _quoted_ and _unquoted_ literals
that have the same sequence of code points.

_Unquoted_ literals can contain a _name_ or consist of a _number-literal_.
A _number-literal_ uses the same syntax as JSON and is intended for the encoding
of number values in _operands_ or _options_, or as _keys_ for _variants_.

```abnf
literal        = quoted / unquoted
quoted         = "|" *(quoted-char / quoted-escape) "|"
unquoted       = name / number-literal
number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["."
                 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]
```

#### Names and Identifiers

An **_<dfn>identifier</dfn>_** is a character sequence that
identifies a _function_, _markup_, or _option_.
Each _identifier_ consists of a _name_ optionally preceded by
a _namespace_.
When present, the _namespace_ is separated from the _name_ by a
U+003A COLON `:`.
Built-in _functions_ and their _options_ do not have a _namespace_ identifier.

The _namespace_ `u` (U+0075 LATIN SMALL LETTER U)
is reserved for future standardization.

_Function_ _identifiers_ are prefixed with `:`.
_Markup_ _identifiers_ are prefixed with `#` or `/`.
_Option_ _identifiers_ have no prefix.

A **_<dfn>name</dfn>_** is a character sequence used in an _identifier_
or as the name for a _variable_
or the value of an _unquoted_ _literal_.

_Variable_ names are prefixed with `$`.

Valid content for _names_ is based on <cite>Namespaces in XML 1.0</cite>'s
[NCName](https://www.w3.org/TR/xml-names/#NT-NCName).
This is different from XML's [Name](https://www.w3.org/TR/xml/#NT-Name)
in that it MUST NOT contain a U+003A COLON `:`.
Otherwise, the set of characters allowed in a _name_ is large.

> [!NOTE]
> _External variables_ can be passed in that are not valid _names_.
> Such variables cannot be referenced in a _message_,
> but are not otherwise errors.

Examples:
> A variable:
> ```
> This has a {$variable}
> ```
> A function:
> ```
> This has a {:function}
> ```
> An add-on function from the `icu` namespace:
> ```
> This has a {:icu:function}
> ```
> An option and an add-on option:
> ```
> This has {:options option=value icu:option=add_on}
> ```

Support for _namespaces_ and their interpretation is implementation-defined
in this release.
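
To make the character restrictions on _names_ and _identifiers_ concrete,
here is a minimal Python sketch that tests a string against the
`name` and `identifier` productions of this specification's grammar.
The code point ranges are copied from the `name-start` and `name-char` productions;
the function names `is_name` and `is_identifier` are hypothetical
and chosen only for this example.

```python
# Code point ranges from the `name-start` production (ALPHA and "_" included).
NAME_START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A), (0x5F, 0x5F),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFC), (0x10000, 0xEFFFF),
]

# Extra code points that `name-char` allows on top of `name-start`:
# DIGIT, "-", ".", U+00B7, U+0300-036F, and U+203F-2040.
NAME_CHAR_EXTRA_RANGES = [
    (0x30, 0x39), (0x2D, 0x2D), (0x2E, 0x2E),
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch: str, ranges: list) -> bool:
    """Return True if the code point of `ch` falls in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_name(text: str) -> bool:
    """Check `text` against: name = name-start *name-char."""
    if not text or not _in_ranges(text[0], NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ch, NAME_START_RANGES) or _in_ranges(ch, NAME_CHAR_EXTRA_RANGES)
        for ch in text[1:]
    )


def is_identifier(text: str) -> bool:
    """Check `text` against: identifier = [namespace ":"] name."""
    namespace, colon, name = text.partition(":")
    if not colon:
        return is_name(text)
    # A second colon ends up in `name` and is rejected by is_name().
    return is_name(namespace) and is_name(name)
```

Because U+003A COLON is not in either range list, `is_name` rejects any string containing `:`,
which is the difference from XML's `Name` noted above.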

```abnf
variable   = "$" name
option     = identifier [s] "=" [s] (literal / variable)

identifier = [namespace ":"] name
namespace  = name
name       = name-start *name-char
name-start = ALPHA / "_"
           / %xC0-D6 / %xD8-F6 / %xF8-2FF
           / %x370-37D / %x37F-1FFF / %x200C-200D
           / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
           / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char  = name-start / DIGIT / "-" / "."
           / %xB7 / %x300-36F / %x203F-2040
```

#### Escape Sequences

An **_<dfn>escape sequence</dfn>_** is a two-character sequence starting with
U+005C REVERSE SOLIDUS `\`.

An _escape sequence_ allows the appearance of lexically meaningful characters
in the body of _text_, _quoted_, or _reserved_ (which includes, in this case,
_private-use_) sequences respectively:

```abnf
text-escape     = backslash ( backslash / "{" / "}" )
quoted-escape   = backslash ( backslash / "|" )
reserved-escape = backslash ( backslash / "{" / "|" / "}" )
backslash       = %x5C ; U+005C REVERSE SOLIDUS "\"
```

#### Whitespace

**_<dfn>Whitespace</dfn>_** is defined as one or more of
U+0009 CHARACTER TABULATION (tab),
U+000A LINE FEED (new line),
U+000D CARRIAGE RETURN,
U+3000 IDEOGRAPHIC SPACE,
or U+0020 SPACE.

Inside _patterns_ and _quoted literals_,
whitespace is part of the content and is recorded and stored verbatim.
Whitespace is not significant outside translatable text, except where required by the syntax.

> [!NOTE]
> The character U+3000 IDEOGRAPHIC SPACE is included in whitespace for
> compatibility with certain East Asian keyboards and input methods,
> in which users might accidentally create these characters in a _message_.

```abnf
s = 1*( SP / HTAB / CR / LF / %x3000 )
```

## Complete ABNF

The grammar below uses the ABNF notation [[STD68](https://www.rfc-editor.org/info/std68)],
including the modifications found in [RFC 7405](https://www.rfc-editor.org/rfc/rfc7405).

RFC 7405 defines a variation of ABNF that is case-sensitive.
Some ABNF tools are only compatible with the specification found in
[RFC 5234](https://www.rfc-editor.org/rfc/rfc5234).
To make `message.abnf` compatible with that version of ABNF, replace
the rules of the same name with this block:

```abnf
input = %x2E.69.6E.70.75.74  ; ".input"
local = %x2E.6C.6F.63.61.6C  ; ".local"
match = %x2E.6D.61.74.63.68  ; ".match"
```

### `message.abnf`

```abnf
message           = simple-message / complex-message

simple-message    = [simple-start pattern]
simple-start      = simple-start-char / text-escape / placeholder
pattern           = *(text-char / text-escape / placeholder)
placeholder       = expression / markup

complex-message   = *(declaration [s]) complex-body
declaration       = input-declaration / local-declaration / reserved-statement
complex-body      = quoted-pattern / matcher

input-declaration = input [s] variable-expression
local-declaration = local s variable [s] "=" [s] expression

quoted-pattern    = "{{" pattern "}}"

matcher           = match-statement 1*([s] variant)
match-statement   = match 1*([s] selector)
selector          = expression
variant           = key *(s key) [s] quoted-pattern
key               = literal / "*"

; Expressions
expression            = literal-expression
                      / variable-expression
                      / annotation-expression
literal-expression    = "{" [s] literal [s annotation] *(s attribute) [s] "}"
variable-expression   = "{" [s] variable [s annotation] *(s attribute) [s] "}"
annotation-expression = "{" [s] annotation *(s attribute) [s] "}"

annotation            = function
                      /
                        private-use-annotation
                      / reserved-annotation

markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}"  ; open and standalone
       / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}"  ; close

; Expression and literal parts
function       = ":" identifier *(s option)
option         = identifier [s] "=" [s] (literal / variable)
; Attributes are reserved for future standardization
attribute      = "@" identifier [[s] "=" [s] (literal / variable)]

variable       = "$" name
literal        = quoted / unquoted
quoted         = "|" *(quoted-char / quoted-escape) "|"
unquoted       = name / number-literal
; number-literal matches JSON number (https://www.rfc-editor.org/rfc/rfc8259#section-6)
number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]

; Keywords; Note that these are case-sensitive
input = %s".input"
local = %s".local"
match = %s".match"

; Reserve additional .keywords for use by future versions of this specification.
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
; Note that the following production is a simplification,
; as this rule MUST NOT be considered to match existing keywords
; (`.input`, `.local`, and `.match`).
reserved-keyword   = "." name

; Reserve additional sigils for use by future versions of this specification.
reserved-annotation       = reserved-annotation-start [[s] reserved-body]
reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~"

; Reserve sigils for private-use by implementations.
private-use-annotation = private-start [[s] reserved-body]
private-start          = "^" / "&"
reserved-body          = reserved-body-part *([s] reserved-body-part)
reserved-body-part     = reserved-char / reserved-escape / quoted

; Names and identifiers
; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName
; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName
identifier = [namespace ":"] name
namespace  = name
name       = name-start *name-char
name-start = ALPHA / "_"
           / %xC0-D6 / %xD8-F6 / %xF8-2FF
           / %x370-37D / %x37F-1FFF / %x200C-200D
           / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
           / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char  = name-start / DIGIT / "-" / "."
           / %xB7 / %x300-36F / %x203F-2040

; Restrictions on characters in various contexts
simple-start-char = content-char / s / "@" / "|"
text-char         = content-char / s / "." / "@" / "|"
quoted-char       = content-char / s / "." / "@" / "{" / "}"
reserved-char     = content-char / "."
content-char      = %x01-08        ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
                  / %x0B-0C        ; omit CR (%x0D)
                  / %x0E-1F        ; omit SP (%x20)
                  / %x21-2D        ; omit . (%x2E)
                  / %x2F-3F        ; omit @ (%x40)
                  / %x41-5B        ; omit \ (%x5C)
                  / %x5D-7A        ; omit { | } (%x7B-7D)
                  / %x7E-2FFF      ; omit IDEOGRAPHIC SPACE (%x3000)
                  / %x3001-D7FF    ; omit surrogates
                  / %xE000-10FFFF

; Character escapes
text-escape     = backslash ( backslash / "{" / "}" )
quoted-escape   = backslash ( backslash / "|" )
reserved-escape = backslash ( backslash / "{" / "|" / "}" )
backslash       = %x5C ; U+005C REVERSE SOLIDUS "\"

; Whitespace
s = 1*( SP / HTAB / CR / LF / %x3000 )
```

## Errors

Errors in messages and their formatting MAY occur and be detected
at different stages of processing.
Where available,
the use of validation tools is recommended,
as early detection of errors makes their correction easier.

### Error Handling

_Syntax Errors_ and _Data Model Errors_ apply to all message processors,
and MUST be emitted as soon as possible.
The other error categories are only emitted during formatting,
but it might be possible to detect them with validation tools.

During selection, an _expression_ handler MUST only emit _Resolution Errors_ and _Selection Errors_.
During formatting, an _expression_ handler MUST only emit _Resolution Errors_ and _Formatting Errors_.

_Resolution Errors_ and _Formatting Errors_ in _expressions_ that are not used
in _pattern selection_ or _formatting_ MAY be ignored,
as they do not affect the output of the formatter.

In all cases, when encountering a runtime error,
a message formatter MUST provide some representation of the message.
An informative error or errors MUST also be separately provided.

When a message contains more than one error,
or contains some error which leads to further errors,
an implementation which does not emit all of the errors
SHOULD prioritise _Syntax Errors_ and _Data Model Errors_ over others.

When an error occurs within a _selector_,
the _selector_ MUST NOT match any _variant_ _key_ other than the catch-all `*`
and a _Resolution Error_ or a _Selection Error_ MUST be emitted.

### Syntax Errors

**_<dfn>Syntax Errors</dfn>_** occur when the syntax representation of a message is not well-formed.

> Example invalid messages resulting in a _Syntax Error_:
>
> ```
> {{Missing end braces
> ```
>
> ```
> {{Missing one end brace}
> ```
>
> ```
> Unknown {{expression}}
> ```
>
> ```
> .local $var = {|no message body|}
> ```

### Data Model Errors

**_<dfn>Data Model Errors</dfn>_** occur when a message is invalid due to
violating one of the semantic requirements on its structure.
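
To illustrate the kind of structural check that detects _Data Model Errors_,
here is a minimal Python sketch covering two of the errors defined in the subsections below:
a _variant_ whose number of _keys_ differs from the number of _selectors_,
and a _matcher_ that has no _variant_ made up only of catch-all keys.
The representation of a _matcher_ as a selector count plus a list of key lists,
the `CATCH_ALL` sentinel, and the function name `check_matcher`
are all assumptions made for this example; they are not part of this specification.

```python
# Sentinel for the catch-all key `*`. A distinct object is used so that the
# quoted literal |*| (an ordinary literal key) is never mistaken for it.
CATCH_ALL = object()


def check_matcher(selector_count: int, variants: list) -> list:
    """Return the names of data model errors found in a simplified matcher.

    `variants` holds one list of keys per variant; each key is either a
    string (a literal) or the CATCH_ALL sentinel. Hypothetical helper.
    """
    errors = []

    # Every variant needs exactly one key per selector.
    if any(len(keys) != selector_count for keys in variants):
        errors.append("Variant Key Mismatch")

    # At least one variant must consist only of catch-all keys.
    has_fallback = any(
        len(keys) == selector_count and all(key is CATCH_ALL for key in keys)
        for keys in variants
    )
    if not has_fallback:
        errors.append("Missing Fallback Variant")

    return errors
```

For instance, a _matcher_ with one _selector_ and the two _variants_ `1` and `2`
would be passed as `check_matcher(1, [["1"], ["2"]])`.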

#### Variant Key Mismatch

A **_<dfn>Variant Key Mismatch</dfn>_** occurs when the number of keys on a _variant_
does not equal the number of _selectors_.

> Example invalid messages resulting in a _Variant Key Mismatch_ error:
>
> ```
> .match {$one :func}
> 1 2 {{Too many}}
> * {{Otherwise}}
> ```
>
> ```
> .match {$one :func} {$two :func}
> 1 2 {{Two keys}}
> * {{Missing a key}}
> * * {{Otherwise}}
> ```

#### Missing Fallback Variant

A **_<dfn>Missing Fallback Variant</dfn>_** error occurs when the message
does not include a _variant_ with only catch-all keys.

> Example invalid messages resulting in a _Missing Fallback Variant_ error:
>
> ```
> .match {$one :func}
> 1 {{Value is one}}
> 2 {{Value is two}}
> ```
>
> ```
> .match {$one :func} {$two :func}
> 1 * {{First is one}}
> * 1 {{Second is one}}
> ```

#### Missing Selector Annotation

A **_<dfn>Missing Selector Annotation</dfn>_** error occurs when the _message_
contains a _selector_ that does not have an _annotation_,
or contains a _variable_ that does not directly or indirectly reference a _declaration_ with an _annotation_.

> Examples of invalid messages resulting in a _Missing Selector Annotation_ error:
>
> ```
> .match {$one}
> 1 {{Value is one}}
> * {{Value is not one}}
> ```
>
> ```
> .local $one = {|The one|}
> .match {$one}
> 1 {{Value is one}}
> * {{Value is not one}}
> ```
>
> ```
> .input {$one}
> .match {$one}
> 1 {{Value is one}}
> * {{Value is not one}}
> ```

#### Duplicate Declaration

A **_<dfn>Duplicate Declaration</dfn>_** error occurs when a _variable_ is declared more than once.
Note that an input _variable_ is implicitly declared when it is first used,
so explicitly declaring it after such use is also an error.

> Examples of invalid messages resulting in a _Duplicate Declaration_ error:
>
> ```
> .input {$var :number maximumFractionDigits=0}
> .input {$var :number minimumFractionDigits=0}
> {{Redeclaration of the same variable}}
>
> .local $var = {$ext :number maximumFractionDigits=0}
> .input {$var :number minimumFractionDigits=0}
> {{Redeclaration of a local variable}}
>
> .input {$var :number minimumFractionDigits=0}
> .local $var = {$ext :number maximumFractionDigits=0}
> {{Redeclaration of an input variable}}
>
> .input {$var :number minimumFractionDigits=$var2}
> .input {$var2 :number}
> {{Redeclaration of the implicit input variable $var2}}
>
> .local $var = {$ext :someFunction}
> .local $var = {$error}
> .local $var2 = {$var2 :error}
> {{{$var} cannot be redefined. {$var2} cannot refer to itself}}
> ```

#### Duplicate Option Name

A **_<dfn>Duplicate Option Name</dfn>_** error occurs when the same _identifier_
appears on the left-hand side of more than one _option_ in the same _expression_.

> Examples of invalid messages resulting in a _Duplicate Option Name_ error:
>
> ```
> Value is {42 :number style=percent style=decimal}
> ```
>
> ```
> .local $foo = {horse :func one=1 two=2 one=1}
> {{This is {$foo}}}
> ```

### Resolution Errors

**_<dfn>Resolution Errors</dfn>_** occur when the runtime value of a part of a message
cannot be determined.

#### Unresolved Variable

An **_<dfn>Unresolved Variable</dfn>_** error occurs when a variable reference cannot be resolved.

> For example, attempting to format either of the following messages
> would result in an _Unresolved Variable_ error if done within a context that
> does not provide for the variable reference `$var` to be successfully resolved:
>
> ```
> The value is {$var}.
> ```
>
> ```
> .match {$var :func}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```

#### Unknown Function

An **_<dfn>Unknown Function</dfn>_** error occurs when an _expression_ includes
a reference to a function which cannot be resolved.

> For example, attempting to format either of the following messages
> would result in an _Unknown Function_ error if done within a context that
> does not provide for the function `:func` to be successfully resolved:
>
> ```
> The value is {horse :func}.
> ```
>
> ```
> .match {|horse| :func}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```

#### Unsupported Expression

An **_<dfn>Unsupported Expression</dfn>_** error occurs when an expression uses
syntax reserved for future standardization,
or for private implementation use that is not supported by the current implementation.

> For example, attempting to format this message
> would always result in an _Unsupported Expression_ error:
>
> ```
> The value is {!horse}.
> ```
>
> Attempting to format this message would result in an _Unsupported Expression_ error
> if done within a context that does not support the `^` private use sigil:
>
> ```
> .match {|horse| ^private}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```

#### Invalid Expression

An **_<dfn>Invalid Expression</dfn>_** error occurs when a _message_ includes an _expression_
whose implementation-defined internal requirements produce an error during _function resolution_
or when a _function_ returns a value (such as `null`) that the implementation does not support.

An **_<dfn>Operand Mismatch Error</dfn>_** is an _Invalid Expression_ error that occurs when
an _operand_ provided to a _function_ during _function resolution_ does not match one of the
expected implementation-defined types for that function;
or in which a literal _operand_ value does not have the required format
and thus cannot be processed into one of the expected implementation-defined types
for that specific _function_.

> For example, the following _message_ produces an _Operand Mismatch Error_
> (a type of _Invalid Expression_ error)
> because the literal `|horse|` does not match the production `number-literal`,
> which is a requirement of the function `:number` for its operand:
> ```
> .local $horse = {horse :number}
> {{You have a {$horse}.}}
> ```
> The following _message_ might produce an _Invalid Expression_ error if
> the function `:function` threw an exception or otherwise emitted an error
> rather than returning a valid value:
> ```
> {{This has an invalid expression {$var :function} because it has a bug in it.}}
> ```

#### Unsupported Statement

An **_<dfn>Unsupported Statement</dfn>_** error occurs when a message includes a _reserved statement_.

> For example, attempting to format this message
> would always result in an _Unsupported Statement_ error:
>
> ```
> .some {|horse|}
> {{The message body}}
> ```

### Selection Errors

**_<dfn>Selection Errors</dfn>_** occur when message selection fails.

> For example, attempting to format either of the following messages
> might result in a _Selection Error_ if done within a context that
> uses a `:number` selector function which requires its input to be numeric:
>
> ```
> .match {|horse| :number}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```
>
> ```
> .local $sel = {|horse| :number}
> .match {$sel}
> 1 {{The value is one.}}
> * {{The value is not one.}}
> ```

### Formatting Errors

**_<dfn>Formatting Errors</dfn>_** occur during the formatting of a resolved value,
for example when encountering a value with an unsupported type
or an internally inconsistent set of options.

> For example, attempting to format any of the following messages
> might result in a _Formatting Error_ if done within a context that
>
> 1. provides for the variable reference `$user` to resolve to
>    an object `{ name: 'Kat', id: 1234 }`,
> 2. provides for the variable reference `$field` to resolve to
>    a string `'address'`, and
> 3. uses a `:get` formatting function which requires its argument to be an object and
>    an option `field` to be provided with a string value,
>
> ```
> Hello, {horse :get field=name}!
> ```
>
> ```
> Hello, {$user :get}!
> ```
>
> ```
> .local $id = {$user :get field=id}
> {{Hello, {$id :get field=name}!}}
> ```
>
> ```
> Your {$field} is {$id :get field=$field}
> ```

## Function Registry

Implementations and tooling can greatly benefit from a
structured definition of formatting and matching functions available to messages at runtime.
This specification is intended to provide a mechanism for storing such declarations in a portable manner.

### Goals

_This section is non-normative._

The registry provides a machine-readable description of MessageFormat 2 extensions (custom functions),
in order to support the following goals and use-cases:

- Validate semantic properties of messages. For example:
  - Type-check values passed into functions.
  - Validate that matching functions are only called in selectors.
  - Validate that formatting functions are only called in placeholders.
  - Verify the exhaustiveness of variant keys given a selector.
- Support the localization roundtrip. For example:
  - Generate variant keys for a given locale during XLIFF extraction.
- Improve the authoring experience. For example:
  - Forbid edits to certain function options (e.g. currency options).
  - Autocomplete function and option names.
  - Display on-hover tooltips for function signatures with documentation.
  - Display/edit known message metadata.
  - Restrict input in GUI by providing a dropdown with all viable option values.

### Conformance and Use

_This section is normative._

To be conformant with MessageFormat 2.0, an implementation MUST implement
the _functions_, _options_ and _option_ values, _operands_ and outputs
described in the section [Default Registry](#default-registry) below.

Implementations MAY implement additional _functions_ or additional _options_.
In particular, implementations are encouraged to provide feedback on proposed
_options_ and their values.

> [!IMPORTANT]
> In the Tech Preview, the [registry data model](#registry-data-model) should
> be regarded as experimental.
> Changes to the format are expected during this period.
> Feedback on the registry's format and implementation is encouraged!

Implementations are not required to provide a machine-readable registry
nor to read or interpret the registry data model in order to be conformant.

The MessageFormat 2.0 Registry was created to describe
the core set of formatting and selection _functions_,
including _operands_, _options_, and _option_ values.
This is the minimum set of functionality needed for conformance.
By using the same names and values, _messages_ can be used interchangeably
by different implementations,
regardless of programming language or runtime environment.
This ensures that developers do not have to relearn core MessageFormat syntax
and functionality when moving between platforms
and that translators do not need to know about the runtime environment for most
selection or formatting operations.

The registry provides a machine-readable description of _functions_
suitable for tools, such as those used in translation automation, so that
variant expansion and information about available _options_ and their effects
are available in the translation ecosystem.
To that end, implementations are strongly encouraged to provide appropriately
tailored versions of the registry for consumption by tools
(even if not included in software distributions)
and to encourage any add-on or plug-in functionality to provide
a registry to support localization tooling.

### Registry Data Model

_This section is non-normative._

> [!IMPORTANT]
> This part of the specification is not part of the Tech Preview.

The registry contains descriptions of function signatures.

The main building block of the registry is the `<function>` element.
It represents an implementation of a custom function available to translation at runtime.
A function defines a human-readable `<description>` of its behavior
and one or more machine-readable _signatures_ of how to call it.
Named `<validationRule>` elements can optionally define regex validation rules for
literals, option values, and variant keys.

MessageFormat 2 functions can be invoked in two contexts:

- inside placeholders, to produce a part of the message's formatted output;
  for example, a raw value of `|1.5|` may be formatted to `1,5` in a language which uses commas as decimal separators,
- inside selectors, to contribute to selecting the appropriate variant among all given variants.

A single _function name_ may be used in both contexts,
regardless of whether it's implemented as one or multiple functions.

A _signature_ defines one particular set of at most one argument and any number of named options
that can be used together in a single call to the function.
`<formatSignature>` corresponds to a function call inside a placeholder inside translatable text.
`<matchSignature>` corresponds to a function call inside a selector.

A signature may define the positional argument of the function with the `<input>` element.
If the `<input>` element is not present, the function is defined as a nullary function.
A signature may also define one or more `<option>` elements representing _named options_ to the function.
An option can be omitted in a call to the function,
unless the `required` attribute is present.
Options accept either a finite enumeration of values (the `values` attribute)
or validate their input with a regular expression (the `validationRule` attribute).
Read-only options (the `readonly` attribute) can be displayed to translators in CAT tools, but may not be edited.

As the `<input>` and `<option>` rules may be locale-dependent,
each signature can include an `<override locales="...">` that extends and overrides
the corresponding input and options rules.
If multiple `<override>` elements would match the current locale,
only the first one is used.

Matching-function signatures additionally include one or more `<match>` elements
to define the keys against which they can match when used as selectors.

Functions may also include `<alias>` definitions,
which provide shorthands for commonly used option baskets.
An _alias name_ may be used equivalently to a _function name_ in messages.
Its `<setOption>` values are always set, and may not be overridden in message annotations.

If a `<function>`, `<input>` or `<option>` includes multiple `<description>` elements,
each SHOULD have a different `xml:lang` attribute value.
This allows for the descriptions of these elements to be themselves localized
according to the preferred locale of the message authors and editors.

### Example

The following `registry.xml` is an example of a registry file
which may be provided by an implementation to describe its built-in functions.
For the sake of brevity, only `locales="en"` is considered.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">

<registry xml:lang="en">
  <function name="platform">
    <description>Match the current OS.</description>
    <matchSignature>
      <match values="windows linux macos android ios"/>
    </matchSignature>
  </function>

  <validationRule id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/>
  <validationRule id="positiveInteger" regex="[0-9]+"/>
  <validationRule id="currencyCode" regex="[A-Z]{3}"/>

  <function name="number">
    <description>
      Format a number.
      Match a **formatted** numerical value against CLDR plural categories or against a number literal.
    </description>

    <matchSignature>
      <input validationRule="anyNumber"/>
      <option name="type" values="cardinal ordinal"/>
      <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
      <option name="minimumFractionDigits" validationRule="positiveInteger"/>
      <option name="maximumFractionDigits" validationRule="positiveInteger"/>
      <option name="minimumSignificantDigits" validationRule="positiveInteger"/>
      <option name="maximumSignificantDigits" validationRule="positiveInteger"/>
      <!-- Since this applies to both cardinal and ordinal, all plural options are valid.
      -->
      <match locales="en" values="one two few other" validationRule="anyNumber"/>
      <match values="zero one two few many other" validationRule="anyNumber"/>
    </matchSignature>

    <formatSignature>
      <input validationRule="anyNumber"/>
      <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
      <option name="minimumFractionDigits" validationRule="positiveInteger"/>
      <option name="maximumFractionDigits" validationRule="positiveInteger"/>
      <option name="minimumSignificantDigits" validationRule="positiveInteger"/>
      <option name="maximumSignificantDigits" validationRule="positiveInteger"/>
      <option name="style" readonly="true" values="decimal currency percent unit" default="decimal"/>
      <option name="currency" readonly="true" validationRule="currencyCode"/>
    </formatSignature>

    <alias name="integer">
      <description>Locale-sensitive integral number formatting</description>
      <setOption name="maximumFractionDigits" value="0" />
      <setOption name="style" value="decimal" />
    </alias>
  </function>
</registry>
```

Given the above description, the `:number` function is defined to work both in a selector and a placeholder:

```
.match {$count :number}
1 {{One new message}}
* {{{$count :number} new messages}}
```

Furthermore,
`:number`'s `<matchSignature>` contains two `<match>` elements
which allow the validation of variant keys.
The element whose `locales` best matches the current locale
using resource item [lookup](tr35.md#Lookup) from LDML is used.
An element with no `locales` attribute is the default
(and is considered equivalent to the `root` locale).

- `<match locales="en" values="one two few other" .../>` can be used in locales like `en` and `en-GB`
  to validate a variant with the key `other` by verifying that the `other` key is present
  in the list of enumerated values: `one two few other`.
- `<match ... validationRule="anyNumber"/>` can be used to validate a variant with the key `1`
  by testing the `1` key against the `anyNumber` regular expression defined in the registry file.

---

A localization engineer can then extend the registry by defining the following `customRegistry.xml` file.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">

<registry xml:lang="en">
  <function name="noun">
    <description>Handle the grammar of a noun.</description>
    <formatSignature>
      <override locales="en">
        <input/>
        <option name="article" values="definite indefinite"/>
        <option name="plural" values="one other"/>
        <option name="case" values="nominative genitive" default="nominative"/>
      </override>
    </formatSignature>
  </function>

  <function name="adjective">
    <description>Handle the grammar of an adjective.</description>
    <formatSignature>
      <override locales="en">
        <input/>
        <option name="article" values="definite indefinite"/>
        <option name="plural" values="one other"/>
        <option name="case" values="nominative genitive" default="nominative"/>
      </override>
    </formatSignature>
    <formatSignature>
      <override locales="en">
        <input/>
        <option name="article" values="definite indefinite"/>
        <option name="accord"/>
      </override>
    </formatSignature>
  </function>
</registry>
```

Messages can now use the `:noun` and the `:adjective` functions.
The following message references the first signature of `:adjective`,
which expects the `plural` and `case` options:

> ```
> You see {$color :adjective article=indefinite plural=one case=nominative} {$object :noun case=nominative}!
> ```

The following message references the second signature of `:adjective`,
which only expects the `accord` option:

> ```
> .input {$object :noun case=nominative}
> {{You see {$color :adjective article=indefinite accord=$object} {$object}!}}
> ```

### Default Registry

> [!IMPORTANT]
> This part of the specification is part of the Tech Preview
> and is **_NORMATIVE_**.

This section describes the functions which each implementation MUST provide
to be conformant with this specification.

#### String Value Selection and Formatting

##### The `:string` function

The function `:string` provides string selection and formatting.

###### Operands

The _operand_ of `:string` is either any implementation-defined type
that is a string or for which conversion to a string is supported,
or any _literal_ value.
All other values produce an _Invalid Expression_ error.

> For example, in Java, implementations of the `java.lang.CharSequence` interface
> (such as `java.lang.String` or `java.lang.StringBuilder`),
> the type `char`, or the class `java.lang.Character` might be considered
> as the "implementation-defined types".
> Such an implementation might also support other classes via the method `toString()`.
> This might be used to enable selection of an `enum` value by name, for example.
>
> Other programming languages would define string and character sequence types or
> classes according to their local needs, including, where appropriate,
> coercion to string.

###### Options

The function `:string` has no options.

> [!NOTE]
> Proposals for string transformation options or implementation
> experience with user requirements are desired during the Tech Preview.

###### Selection

When implementing [`MatchSelectorKeys(resolvedSelector, keys)`](#resolve-preferences)
where `resolvedSelector` is the resolved value of a _selector_ _expression_
and `keys` is a list of strings,
the `:string` selector performs as described below.

1. Let `compare` be the string value of `resolvedSelector`.
1. Let `result` be a new empty list of strings.
1. For each string `key` in `keys`:
   1. If `key` and `compare` consist of the same sequence of Unicode code points, then
      1. Append `key` as the last element of the list `result`.
1. Return `result`.

> [!NOTE]
> Matching of `key` and `compare` values is sensitive to the sequence of code points
> in each string.
> As a result, variations in how text can be encoded can affect the performance of matching.
> The function `:string` does not perform case folding or Unicode Normalization of string values.
> Users SHOULD encode _messages_ and their parts (such as _keys_ and _operands_),
> in Unicode Normalization Form C (NFC) unless there is a very good reason
> not to.
> See also: [String Matching](https://www.w3.org/TR/charmod-norm)

> [!NOTE]
> Unquoted string literals in a _variant_ do not include spaces.
> If users wish to match strings that include whitespace
> (including U+3000 `IDEOGRAPHIC SPACE`)
> to a key, the `key` needs to be quoted.
>
> For example:
> ```
> .match {$string :string}
> | space key | {{Matches the string " space key "}}
> * {{Matches the string "space key"}}
> ```

###### Formatting

The `:string` function returns the string value of the resolved value of the _operand_.

#### Numeric Value Selection and Formatting

##### The `:number` function

The function `:number` is a selector and formatter for numeric values.

###### Operands

The function `:number` requires a [Number Operand](#number-operands) as its _operand_.

###### Options

Some options do not have default values defined in this specification.
The defaults for these options are implementation-dependent.
In general, the default values for such options depend on the locale,
the value of other options, or both.

The following options and their values are required to be available on the function `:number`:
- `select`
  - `plural` (default; see [Default Value of `select` Option](#default-value-of-select-option) below)
  - `ordinal`
  - `exact`
- `compactDisplay` (this option only has meaning when combined with the option `notation=compact`)
  - `short` (default)
  - `long`
- `notation`
  - `standard` (default)
  - `scientific`
  - `engineering`
  - `compact`
- `numberingSystem`
  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
    (default is locale-specific)
- `signDisplay`
  - `auto` (default)
  - `always`
  - `exceptZero`
  - `negative`
  - `never`
- `style`
  - `decimal` (default)
  - `percent` (see [Percent Style](#percent-style) below)
- `useGrouping`
  - `auto` (default)
  - `always`
  - `never`
  - `min2`
- `minimumIntegerDigits`
  - ([digit size option](#digit-size-options), default: `1`)
- `minimumFractionDigits`
  - ([digit size option](#digit-size-options))
- `maximumFractionDigits`
  - ([digit size option](#digit-size-options))
- `minimumSignificantDigits`
  - ([digit size option](#digit-size-options))
- `maximumSignificantDigits`
  - ([digit size option](#digit-size-options))

> [!NOTE]
> The following options and option values are being developed during the Technical Preview
> period.

The following values for the option `style` are _not_ part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during Tech Preview:
- `currency`
- `unit`

The following options are _not_ part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during Tech Preview:
- `currency`
  - valid [Unicode Currency Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCurrencyIdentifier)
    (no default)
- `currencyDisplay`
  - `symbol` (default)
  - `narrowSymbol`
  - `code`
  - `name`
- `currencySign`
  - `accounting`
  - `standard` (default)
- `unit`
  - (anything not empty)
- `unitDisplay`
  - `long`
  - `short` (default)
  - `narrow`

###### Default Value of `select` Option

The value `plural` is the default for the option `select`
because it is the most common use case for numeric selection.
It can be used for exact value matches but also allows for the grammatical needs of
languages using CLDR's plural rules.
This might not be noticeable in the source language (particularly English),
but can cause problems in target locales that the original developer is not considering.

> For example, a naive developer might use a special message for the value `1` without
> considering a locale's need for a `one` plural:
> ```
> .match {$var :number}
> 1 {{You have one last chance}}
> one {{You have {$var} chance remaining}}
> * {{You have {$var} chances remaining}}
> ```
>
> The `one` variant is needed by languages such as Polish or Russian.
> Such locales typically also require other keywords such as `two`, `few`, and `many`.

###### Percent Style
When implementing `style=percent`, the numeric value of the _operand_
MUST be multiplied by 100 for the purposes of formatting.

> For example,
> ```
> The total was {0.5 :number style=percent}.
> ```
> should format in a manner similar to:
> > The total was 50%.

###### Selection

The _function_ `:number` performs selection as described in [Number Selection](#number-selection) below.

##### The `:integer` function

The function `:integer` is a selector and formatter for matching or formatting numeric
values as integers.

###### Operands

The function `:integer` requires a [Number Operand](#number-operands) as its _operand_.

###### Options

Some options do not have default values defined in this specification.
The defaults for these options are implementation-dependent.
In general, the default values for such options depend on the locale,
the value of other options, or both.

The following options and their values are required in the default registry to be available on the
function `:integer`:
- `select`
  - `plural` (default)
  - `ordinal`
  - `exact`
- `numberingSystem`
  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
    (default is locale-specific)
- `signDisplay`
  - `auto` (default)
  - `always`
  - `exceptZero`
  - `negative`
  - `never`
- `style`
  - `decimal` (default)
  - `percent` (see [Percent Style](#percent-style) below)
- `useGrouping`
  - `auto` (default)
  - `always`
  - `min2`
- `minimumIntegerDigits`
  - ([digit size option](#digit-size-options), default: `1`)
- `maximumSignificantDigits`
  - ([digit size option](#digit-size-options))

> [!NOTE]
> The following options and option values are being developed during the Technical Preview
> period.

The following values for the option `style` are _not_ part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during Tech Preview:
- `currency`
- `unit`

The following options are _not_ part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during Tech Preview:
- `currency`
  - valid [Unicode Currency Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCurrencyIdentifier)
    (no default)
- `currencyDisplay`
  - `symbol` (default)
  - `narrowSymbol`
  - `code`
  - `name`
- `currencySign`
  - `accounting`
  - `standard` (default)
- `unit`
  - (anything not empty)
- `unitDisplay`
  - `long`
  - `short` (default)
  - `narrow`

###### Default Value of `select` Option

The value `plural` is the default for the option `select`
because it is the most common use case for numeric selection.
It can be used for exact value matches but also allows for the grammatical needs of
languages using CLDR's plural rules.
This might not be noticeable in the source language (particularly English),
but can cause problems in target locales that the original developer is not considering.

> For example, a naive developer might use a special message for the value `1` without
> considering a locale's need for a `one` plural:
> ```
> .match {$var :integer}
> 1 {{You have one last chance}}
> one {{You have {$var} chance remaining}}
> * {{You have {$var} chances remaining}}
> ```
>
> The `one` variant is needed by languages such as Polish or Russian.
> Such locales typically also require other keywords such as `two`, `few`, and `many`.

###### Percent Style
When implementing `style=percent`, the numeric value of the _operand_
MUST be multiplied by 100 for the purposes of formatting.

> For example,
> ```
> The total was {0.5 :number style=percent}.
> ```
> should format in a manner similar to:
> > The total was 50%.
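
The scaling step required by `style=percent` can be illustrated with a minimal, non-normative Python sketch.
The helper name `format_percent` is hypothetical, and the output deliberately ignores locale data
(digit grouping, decimal separator, and the position of the percent sign all vary by locale);
it only shows that the operand is multiplied by 100 before it is rendered.

```python
from decimal import Decimal


def format_percent(operand: str) -> str:
    """Hypothetical helper: scale a number literal by 100 and append '%'.

    Decimal is used (an assumption of this sketch) so that the scaling
    does not introduce binary floating-point rounding artifacts.
    """
    scaled = Decimal(operand) * 100
    # normalize() drops trailing zeros; the 'f' format avoids exponent notation.
    return f"{scaled.normalize():f}%"


print(format_percent("0.5"))
```

A real implementation would hand the scaled value to a locale-aware number formatter rather than concatenating the sign.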

###### Selection

The _function_ `:integer` performs selection as described in [Number Selection](#number-selection) below.

##### Number Operands

The _operand_ of a number function is either an implementation-defined type or
a literal whose contents match the `number-literal` production in the [ABNF](#complete-abnf).
All other values produce an _Invalid Expression_ error.

> For example, in Java, any subclass of `java.lang.Number` plus the primitive
> types (`byte`, `short`, `int`, `long`, `float`, `double`, etc.)
> might be considered as the "implementation-defined numeric types".
> Implementations in other programming languages would define different types
> or classes according to their local needs.

> [!NOTE]
> String values passed as variables in the _formatting context_'s
> _input mapping_ can be formatted as numeric values as long as their
> contents match the `number-literal` production in the [ABNF](#complete-abnf).
>
> For example, if the value of the variable `num` were the string
> `-1234.567`, it would behave identically to the local
> variable in this example:
> ```
> .local $example = {|-1234.567| :number}
> {{{$num :number} == {$example}}}
> ```

> [!NOTE]
> Implementations are encouraged to provide support for compound types or data structures
> that provide additional semantic meaning to the formatting of number-like values.
> For example, in ICU4J, the type `com.ibm.icu.util.Measure` can be used to communicate
> a value that includes a unit
> or the type `com.ibm.icu.util.CurrencyAmount` can be used to set the currency and related
> options (such as the number of fraction digits).

##### Digit Size Options

Some _options_ of number _functions_ are defined to take a "digit size option".
Implementations of number _functions_ use these _options_ to control aspects of numeric display
such as the number of fraction, integer, or significant digits.

A "digit size option" is an _option_ value that the _function_ interprets
as a small integer value greater than or equal to zero.
Implementations MAY define an upper limit on the resolved value
of a digit size option consistent with that implementation's practical limits.

In most cases, the value of a digit size option will be a string that
encodes the value as a decimal integer.
Implementations MAY also accept implementation-defined types as the value.
When provided as a string, the representation of a digit size option matches the following ABNF:
> ```abnf
> digit-size-option = "0" / (("1"-"9") [DIGIT])
> ```

##### Number Selection

Number selection has three modes:
- `exact` selection matches the operand to explicit numeric keys exactly
- `plural` selection matches the operand to explicit numeric keys exactly
  or to plural rule categories if there is no explicit match
- `ordinal` selection matches the operand to explicit numeric keys exactly
  or to ordinal rule categories if there is no explicit match

When implementing [`MatchSelectorKeys(resolvedSelector, keys)`](#resolve-preferences)
where `resolvedSelector` is the resolved value of a _selector_ _expression_
and `keys` is a list of strings,
numeric selectors perform as described below.

1. Let `exact` be the JSON string representation of the numeric value of `resolvedSelector`.
   (See [Determining Exact Literal Match](#determining-exact-literal-match) for details)
1. Let `keyword` be a string which is the result of [rule selection](#rule-selection) on `resolvedSelector`.
1. Let `resultExact` be a new empty list of strings.
1. Let `resultKeyword` be a new empty list of strings.
1. For each string `key` in `keys`:
   1. If the value of `key` matches the production `number-literal`, then
      1. If `key` and `exact` consist of the same sequence of Unicode code points, then
         1. Append `key` as the last element of the list `resultExact`.
   1. Else if `key` is one of the keywords `zero`, `one`, `two`, `few`, `many`, or `other`, then
      1. If `key` and `keyword` consist of the same sequence of Unicode code points, then
         1. Append `key` as the last element of the list `resultKeyword`.
   1. Else, emit a _Selection Error_.
1. Return a new list whose elements are the concatenation of the elements (in order) of `resultExact` followed by the elements (in order) of `resultKeyword`.

> [!NOTE]
> Implementations are not required to implement this exactly as written.
> However, the observed behavior must be consistent with what is described here.

###### Rule Selection

If the option `select` is set to `exact`, rule-based selection is not used.
Return the empty string.

> [!NOTE]
> Since valid keys cannot be the empty string in a numeric expression, returning the
> empty string disables keyword selection.

If the option `select` is set to `plural`, selection should be based on CLDR plural rule data
of type `cardinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
for examples.

If the option `select` is set to `ordinal`, selection should be based on CLDR plural rule data
of type `ordinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
for examples.

Apply the rules defined by CLDR to the resolved value of the operand and the function options,
and return the resulting keyword.
If no rules match, return `other`.
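
The number selection steps and rule selection described above can be sketched in Python as follows.
This is a non-normative illustration under stated assumptions:
the names `match_selector_keys`, `english_cardinal`, and `SelectionError` are hypothetical,
the regular expression is an informal transcription of the `number-literal` production,
and the plural rule is a simplified English-only stand-in for CLDR data.
The sketch raises an exception where the algorithm emits a _Selection Error_;
an implementation might instead report the error and continue.

```python
import json
import re

# Informal transcription of the `number-literal` production (an assumption
# of this sketch; the ABNF in the specification is authoritative).
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")
PLURAL_KEYWORDS = frozenset({"zero", "one", "two", "few", "many", "other"})


class SelectionError(Exception):
    """Hypothetical error type standing in for a Selection Error."""


def english_cardinal(value):
    """Hypothetical, simplified stand-in for CLDR English cardinal rules."""
    return "one" if isinstance(value, int) and value == 1 else "other"


def match_selector_keys(value, keys, select="plural", rule=english_cardinal):
    # Serialize the numeric value using the format for a JSON number.
    exact = json.dumps(value)
    # With select=exact, rule selection returns the empty string,
    # which disables keyword matching.
    keyword = "" if select == "exact" else rule(value)
    result_exact, result_keyword = [], []
    for key in keys:
        if NUMBER_LITERAL.fullmatch(key):
            if key == exact:
                result_exact.append(key)
        elif key in PLURAL_KEYWORDS:
            if key == keyword:
                result_keyword.append(key)
        else:
            raise SelectionError(f"invalid key for a numeric selector: {key!r}")
    # Exact matches are preferred over keyword matches.
    return result_exact + result_keyword


print(match_selector_keys(1, ["one", "1", "other"]))
```

Passing a different `rule` callable (for instance one backed by CLDR ordinal data) is how this sketch would model `select=ordinal`.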

> **Example.**
> In CLDR 44, the Czech (`cs`) plural rule set can be found
> [here](https://www.unicode.org/cldr/charts/44/supplemental/language_plural_rules.html#cs).
>
> A message in Czech might be:
> ```
> .match {$numDays :number}
> one {{{$numDays} den}}
> few {{{$numDays} dny}}
> many {{{$numDays} dne}}
> * {{{$numDays} dní}}
> ```
> Using the rules found above, the results of various _operand_ values might look like:
> | Operand value | Keyword | Formatted Message |
> |---|---|---|
> | 1 | `one` | 1 den |
> | 2 | `few` | 2 dny |
> | 5 | `other` | 5 dní |
> | 22 | `other` | 22 dní |
> | 27 | `other` | 27 dní |
> | 2.4 | `many` | 2,4 dne |

###### Determining Exact Literal Match

> [!IMPORTANT]
> The exact behavior of exact literal match is only defined for non-zero-filled
> integer values.
> Annotations that use fraction digits or significant digits might work in specific
> implementation-defined ways.
> Users should avoid depending on these types of keys in message selection.


Number literals in the MessageFormat 2 syntax use the
[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6).
A `resolvedSelector` exactly matches a numeric literal `key`
if, when the numeric value of `resolvedSelector` is serialized using the format for a JSON number,
the two strings are equal.

> [!NOTE]
> Only integer matching is required in the Technical Preview.
> Feedback describing use cases for fractional and significant digits-based
> selection would be helpful.
> Otherwise, users should avoid using matching with fractional numbers or significant digits.

#### Date and Time Value Formatting

This subsection describes the functions and options for date/time formatting.
Selection based on date and time values is not required in this release.
2460 2461> [!NOTE] 2462> Selection based on date/time types is not required by MF2. 2463> Implementations should use care when defining selectors based on date/time types. 2464> The types of queries found in implementations such as `java.time.TemporalAccessor` 2465> are complex and user expectations may be inconsistent with good I18N practices. 2466 2467##### The `:datetime` function 2468 2469The function `:datetime` is used to format date/time values, including 2470the ability to compose user-specified combinations of fields. 2471 2472If no options are specified, this function defaults to the following: 2473- `{$d :datetime}` is the same as `{$d :datetime dateStyle=short timeStyle=short}` 2474 2475> [!NOTE] 2476> The default formatting behavior of `:datetime` is inconsistent with `Intl.DateTimeFormat` 2477> in JavaScript and with `{d,date}` in ICU MessageFormat 1.0. 2478> This is because, unlike those implementations, `:datetime` is distinct from `:date` and `:time`. 2479 2480###### Operands 2481 2482The _operand_ of the `:datetime` function is either 2483an implementation-defined date/time type 2484or a _date/time literal value_, as defined in [Date and Time Operand](#date-and-time-operands). 2485All other _operand_ values produce an _Invalid Expression_ error. 2486 2487###### Options 2488 2489The `:datetime` function can use either the appropriate _style options_ 2490or can use a collection of _field options_ (but not both) to control the formatted 2491output. 2492 2493If both are specified, an _Invalid Expression_ error MUST be emitted 2494and a _fallback value_ used as the resolved value of the _expression_. 2495 2496**Style Options** 2497 2498The function `:datetime` has these _style options_. 
2499- `dateStyle` 2500 - `full` 2501 - `long` 2502 - `medium` 2503 - `short` 2504- `timeStyle` 2505 - `full` 2506 - `long` 2507 - `medium` 2508 - `short` 2509 2510**Field Options** 2511 2512_Field options_ describe which fields to include in the formatted output 2513and what format to use for that field. 2514The implementation may use this _annotation_ to configure which fields 2515appear in the formatted output. 2516 2517> [!NOTE] 2518> _Field options_ do not have default values because they are only to be used 2519> to compose the formatter. 2520 2521The _field options_ are defined as follows: 2522 2523> [!IMPORTANT] 2524> The value `2-digit` for some _field options_ **must** be quoted 2525> in the MessageFormat syntax because it starts with a digit 2526> but does not match the `number-literal` production in the ABNF. 2527> ``` 2528> .local $correct = {$someDate :datetime year=|2-digit|} 2529> .local $syntaxError = {$someDate :datetime year=2-digit} 2530> ``` 2531 2532The function `:datetime` has the following options: 2533- `weekday` 2534 - `long` 2535 - `short` 2536 - `narrow` 2537- `era` 2538 - `long` 2539 - `short` 2540 - `narrow` 2541- `year` 2542 - `numeric` 2543 - `2-digit` 2544- `month` 2545 - `numeric` 2546 - `2-digit` 2547 - `long` 2548 - `short` 2549 - `narrow` 2550- `day` 2551 - `numeric` 2552 - `2-digit` 2553- `hour` 2554 - `numeric` 2555 - `2-digit` 2556- `minute` 2557 - `numeric` 2558 - `2-digit` 2559- `second` 2560 - `numeric` 2561 - `2-digit` 2562- `fractionalSecondDigits` 2563 - `1` 2564 - `2` 2565 - `3` 2566- `hourCycle` (default is locale-specific) 2567 - `h11` 2568 - `h12` 2569 - `h23` 2570 - `h24` 2571- `timeZoneName` 2572 - `long` 2573 - `short` 2574 - `shortOffset` 2575 - `longOffset` 2576 - `shortGeneric` 2577 - `longGeneric` 2578 2579> [!NOTE] 2580> The following options do not have default values because they are only to be used 2581> as overrides for locale-and-value dependent implementation-defined defaults. 
2582 2583The following date/time options are **not** part of the default registry. 2584Implementations SHOULD avoid creating options that conflict with these, but 2585are encouraged to track development of these options during Tech Preview: 2586- `calendar` (default is locale-specific) 2587 - valid [Unicode Calendar Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCalendarIdentifier) 2588- `numberingSystem` (default is locale-specific) 2589 - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier) 2590- `timeZone` (default is system default time zone or UTC) 2591 - valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557) 2592 2593##### The `:date` function 2594 2595The function `:date` is used to format the date portion of date/time values. 2596 2597If no options are specified, this function defaults to the following: 2598- `{$d :date}` is the same as `{$d :date style=short}` 2599 2600###### Operands 2601 2602The _operand_ of the `:date` function is either 2603an implementation-defined date/time type 2604or a _date/time literal value_, as defined in [Date and Time Operand](#date-and-time-operands). 2605All other _operand_ values produce an _Invalid Expression_ error. 2606 2607###### Options 2608 2609The function `:date` has these _options_: 2610- `style` 2611 - `full` 2612 - `long` 2613 - `medium` 2614 - `short` (default) 2615 2616##### The `:time` function 2617 2618The function `:time` is used to format the time portion of date/time values. 2619 2620If no options are specified, this function defaults to the following: 2621- `{$t :time}` is the same as `{$t :time style=short}` 2622 2623###### Operands 2624 2625The _operand_ of the `:time` function is either 2626an implementation-defined date/time type 2627or a _date/time literal value_, as defined in [Date and Time Operand](#date-and-time-operands). 
2628All other _operand_ values produce an _Invalid Expression_ error. 2629 2630###### Options 2631 2632The function `:time` has these _options_: 2633- `style` 2634 - `full` 2635 - `long` 2636 - `medium` 2637 - `short` (default) 2638 2639 2640##### Date and Time Operands 2641 2642The _operand_ of a date/time function is either 2643an implementation-defined date/time type 2644or a _date/time literal value_, as defined below. 2645All other _operand_ values produce an _Invalid Expression_ error. 2646 2647A **_<dfn>date/time literal value</dfn>_** is a non-empty string consisting of an ISO 8601 date, 2648or an ISO 8601 datetime optionally followed by a timezone offset. 2649As implementations differ slightly in their parsing of such strings, 2650ISO 8601 date and datetime values not matching the following regular expression MAY also be supported. 2651Furthermore, matching this regular expression does not guarantee validity, 2652given the variable number of days in each month. 2653 2654```regexp 2655(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)? 2656``` 2657 2658When the time is not present, implementations SHOULD use `00:00:00` as the time. 2659When the offset is not present, implementations SHOULD use a floating time type 2660(such as Java's `java.time.LocalDateTime`) to represent the time value. 2661For more information, see [Working with Timezones](https://w3c.github.io/timezone). 2662 2663> [!IMPORTANT] 2664> The [ABNF](#complete-abnf) and [syntax](#syntax) of MF2 2665> do not formally define date/time literals. 2666> This means that a _message_ can be syntactically valid but produce 2667> an _Operand Mismatch Error_ at runtime. 2668 2669> [!NOTE] 2670> String values passed as variables in the _formatting context_'s 2671> _input mapping_ can be formatted as date/time values as long as their 2672> contents are date/time literals. 
2673> 2674> For example, if the value of the variable `now` were the string 2675> `2024-02-06T16:40:00Z`, it would behave identically to the local 2676> variable in this example: 2677> ``` 2678> .local $example = {|2024-02-06T16:40:00Z| :datetime} 2679> {{{$now :datetime} == {$example}}} 2680> ``` 2681 2682> [!NOTE] 2683> True time zone support in serializations is expected to coincide with the adoption 2684> of Temporal in JavaScript. 2685> The form of these serializations is known and is a de facto standard. 2686> Support for these extensions is expected to be required in the post-tech preview. 2687> See: https://datatracker.ietf.org/doc/draft-ietf-sedate-datetime-extended/ 2688 2689 2690 2691 2692## Formatting 2693 2694This section defines the behavior of a MessageFormat 2.0 implementation 2695when formatting a message for display in a user interface, or for some later processing. 2696 2697To start, we presume that a _message_ has either been parsed from its syntax 2698or created from a data model description. 2699If this construction has encountered any _Syntax Errors_ or _Data Model Errors_, 2700an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result. 2701 2702Formatting of a _message_ is defined by the following operations: 2703 2704- **_Expression and Markup Resolution_** determines the value of an _expression_ or _markup_, 2705 with reference to the current _formatting context_. 2706 This can include multiple steps, 2707 such as looking up the value of a variable and calling formatting functions. 2708 The form of the resolved value is implementation defined and the 2709 value might not be evaluated or formatted yet. 2710 However, it needs to be "formattable", i.e. it contains everything required 2711 by the eventual formatting. 2712 2713 The resolution of _text_ is rather straightforward, 2714 and is detailed under _literal resolution_. 
2715 2716> [!IMPORTANT] 2717> 2718> **This specification does not require either eager or lazy _expression resolution_ of _message_ 2719> parts; do not construe any requirement in this document as requiring either.** 2720> 2721> Implementations are not required to evaluate all parts of a _message_ when 2722> parsing, processing, or formatting. 2723> In particular, an implementation MAY choose not to evaluate or resolve the 2724> value of a given _expression_ until it is actually used by a 2725> selection or formatting process. 2726> However, when an _expression_ is resolved, it MUST behave as if all preceding 2727> _declarations_ and _selectors_ affecting _variables_ referenced by that _expression_ 2728> have already been evaluated in the order in which the relevant _declarations_ 2729> and _selectors_ appear in the _message_. 2730 2731- **_Pattern Selection_** determines which of a message's _patterns_ is formatted. 2732 For a message with no _selectors_, this is simple as there is only one _pattern_. 2733 With _selectors_, this will depend on their resolution. 2734 2735 At the start of _pattern selection_, 2736 if the _message_ contains any _reserved statements_, 2737 emit an _Unsupported Statement_ error. 2738 2739- **_Formatting_** takes the resolved values of the selected _pattern_, 2740 and produces the formatted result for the _message_. 2741 Depending on the implementation, this result could be a single concatenated string, 2742 an array of objects, an attributed string, or some other locally appropriate data type. 2743 2744Formatter implementations are not required to expose 2745the _expression resolution_ and _pattern selection_ operations to their users, 2746or even use them in their internal processing, 2747as long as the final _formatting_ result is made available to users 2748and the observable behavior of the formatter matches that described here. 
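
The freedom to resolve _expressions_ lazily, while still behaving as if preceding _declarations_ had been evaluated in order, can be illustrated with memoized thunks. This is only a sketch under stated assumptions: `LazyValue` and `bind_declarations` are hypothetical names, and each declaration is modeled as a `(name, resolver)` pair whose resolver receives the bindings visible at that point in the _message_.

```python
class LazyValue:
    """Defers a resolution step until first use, then caches the result."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def get(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


def bind_declarations(declarations, inputs):
    """Bind declarations in source order without evaluating any of them.

    `declarations` is a list of (name, resolver) pairs; each resolver takes
    a dict of the bindings that precede it and returns a resolved value.
    """
    env = {name: LazyValue(lambda v=value: v) for name, value in inputs.items()}
    for name, resolver in declarations:
        visible = dict(env)  # snapshot: only earlier bindings are visible
        env[name] = LazyValue(lambda r=resolver, scope=visible: r(scope))
    return env
```

Because every resolver captures a snapshot of the earlier bindings, deferring evaluation does not change which values a later _expression_ observes; nothing runs until `get()` is called by selection or formatting.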
2749 2750### Formatting Context 2751 2752A message's **_formatting context_** represents the data and procedures that are required 2753for the message's _expression resolution_, _pattern selection_ and _formatting_. 2754 2755At a minimum, it includes: 2756 2757- Information on the current **_locale_**, 2758 potentially including a fallback chain of locales. 2759 This will be passed on to formatting functions. 2760 2761- Information on the base directionality of the _message_ and its _text_ tokens. 2762 This will be used by strategies for bidirectional isolation, 2763 and can be used to set the base direction of the _message_ upon display. 2764 2765- An **_<dfn>input mapping</dfn>_** of string identifiers to values, 2766 defining variable values that are available during _variable resolution_. 2767 This is often determined by a user-provided argument of a formatting function call. 2768 2769- The _function registry_, 2770 providing the implementations of the functions referred to by message _functions_. 2771 2772- Optionally, a fallback string to use for the message 2773 if it contains any _Syntax Errors_ or _Data Model Errors_. 2774 2775Implementations MAY include additional fields in their _formatting context_. 2776 2777### Expression and Markup Resolution 2778 2779_Expressions_ are used in _declarations_, _selectors_, and _patterns_. 2780_Markup_ is only used in _patterns_. 2781 2782In a _declaration_, the resolved value of the _expression_ is bound to a _variable_, 2783which is available for use by later _expressions_. 2784Since a _variable_ can be referenced in different ways later, 2785implementations SHOULD NOT immediately fully format the value for output. 2786 2787In an _input-declaration_, the _variable_ operand of the _variable-expression_ 2788identifies not only the name of the external input value, 2789but also the _variable_ to which the resolved value of the _variable-expression_ is bound. 
2790 2791In _selectors_, the resolved value of an _expression_ is used for _pattern selection_. 2792 2793In a _pattern_, the resolved value of an _expression_ or _markup_ is used in its _formatting_. 2794 2795The form that resolved values take is implementation-dependent, 2796and different implementations MAY choose to perform different levels of resolution. 2797 2798> For example, the resolved value of the _expression_ `{|0.40| :number style=percent}` 2799> could be an object such as 2800> 2801> ``` 2802> { value: Number('0.40'), 2803> formatter: NumberFormat(locale, { style: 'percent' }) } 2804> ``` 2805> 2806> Alternatively, it could be an instance of an ICU4J `FormattedNumber`, 2807> or some other locally appropriate value. 2808 2809Depending on the presence or absence of a _variable_ or _literal_ operand 2810and a _function_, _private-use annotation_, or _reserved annotation_, 2811the resolved value of the _expression_ is determined as follows: 2812 2813If the _expression_ contains a _reserved annotation_, 2814an _Unsupported Expression_ error is emitted and 2815a _fallback value_ is used as the resolved value of the _expression_. 2816 2817Else, if the _expression_ contains a _private-use annotation_, 2818its resolved value is defined according to the implementation's specification. 2819 2820Else, if the _expression_ contains an _annotation_, 2821its resolved value is defined by _function resolution_. 2822 2823Else, if the _expression_ consists of a _variable_, 2824its resolved value is defined by _variable resolution_. 2825An implementation MAY perform additional processing 2826when resolving the value of an _expression_ 2827that consists only of a _variable_. 2828 2829> For example, it could apply _function resolution_ using a _function_ 2830> and a set of _options_ chosen based on the value or type of the _variable_. 
2831> So, given a _message_ like this: 2832> 2833> ``` 2834> Today is {$date} 2835> ``` 2836> 2837> If the value passed in the _variable_ were a date object, 2838> such as a JavaScript `Date` or a Java `java.util.Date` or `java.time.Temporal`, 2839> the implementation could interpret the _placeholder_ `{$date}` as if 2840> the pattern included the function `:datetime` with some set of default options. 2841 2842Else, the _expression_ consists of a _literal_. 2843Its resolved value is defined by _literal resolution_. 2844 2845> **Note** 2846> This means that a _literal_ value with no _annotation_ 2847> is always treated as a string. 2848> To represent values that are not strings as a _literal_, 2849> an _annotation_ needs to be provided: 2850> 2851> ``` 2852> .local $aNumber = {1234 :number} 2853> .local $aDate = {|2023-08-30| :datetime} 2854> .local $aFoo = {|some foo| :foo} 2855> {{You have {42 :number}}} 2856> ``` 2857 2858#### Literal Resolution 2859 2860The resolved value of a _text_ or a _literal_ is 2861the character sequence of the _text_ or _literal_ 2862after any character escape has been converted to the escaped character. 2863 2864When a _literal_ is used as an _operand_ 2865or on the right-hand side of an _option_, 2866the formatting function MUST treat its resolved value the same 2867whether its value was originally _quoted_ or _unquoted_. 2868 2869> For example, 2870> the _option_ `foo=42` and the _option_ `foo=|42|` are treated as identical. 2871 2872The resolution of a _text_ or _literal_ MUST resolve to a string. 2873 2874#### Variable Resolution 2875 2876To resolve the value of a _variable_, 2877its _name_ is used to identify either a local variable or an input variable. 2878If a _declaration_ exists for the _variable_, its resolved value is used. 2879Otherwise, the _variable_ is an implicit reference to an input value, 2880and its value is looked up from the _formatting context_ _input mapping_. 
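
The lookup order described above can be sketched as follows. The function name `resolve_variable`, the exception class `UnresolvedVariableError`, and the use of plain dictionaries for local declarations and the _input mapping_ are all assumptions made for illustration; they are not defined by this specification.

```python
class UnresolvedVariableError(Exception):
    """Hypothetical stand-in for an Unresolved Variable error."""


def resolve_variable(name, declarations, input_mapping):
    """Resolve `name` against local declarations first, then input values.

    `declarations` maps declared variable names to their resolved values;
    `input_mapping` maps externally supplied names to values.
    """
    if name in declarations:
        return declarations[name]
    if name in input_mapping:
        # No declaration exists: implicit reference to an input value.
        return input_mapping[name]
    raise UnresolvedVariableError(name)
```

For example, `resolve_variable("x", {"x": "local"}, {"x": "input"})` returns `"local"`, because a _declaration_ takes precedence over the _input mapping_.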
2881 2882The resolution of a _variable_ MAY fail if no value is identified for its _name_. 2883If this happens, an _Unresolved Variable_ error MUST be emitted. 2884If a _variable_ would resolve to a _fallback value_, 2885this MUST also be considered a failure. 2886 2887#### Function Resolution 2888 2889To resolve an _expression_ with a _function_ _annotation_, 2890the following steps are taken: 2891 28921. If the _expression_ includes an _operand_, resolve its value. 2893 If this fails, use a _fallback value_ for the _expression_. 28942. Resolve the _identifier_ of the _function_ and, based on the starting sigil, 2895 find the appropriate function implementation to call. 2896 If the implementation cannot find the function, 2897 or if the _identifier_ includes a _namespace_ that the implementation does not support, 2898 emit an _Unknown Function_ error 2899 and use a _fallback value_ for the _expression_. 2900 2901 Implementations are not required to implement _namespaces_ or installable 2902 _function registries_. 2903 29043. Perform _option resolution_. 2905 29064. Call the function implementation with the following arguments: 2907 2908 - The current _locale_. 2909 - The resolved mapping of _options_. 2910 - If the _expression_ includes an _operand_, its resolved value. 2911 2912 The form that resolved _operand_ and _option_ values take is implementation-defined. 2913 2914 A _declaration_ binds the resolved value of an _expression_ 2915 to a _variable_. 2916 Thus, the result of one _function_ is potentially the _operand_ 2917 of another _function_, 2918 or the value of one of the _options_ for another function. 2919 For example, in 2920 ``` 2921 .input {$n :number minIntegerDigits=3} 2922 .local $n1 = {$n :number maxFractionDigits=3} 2923 ``` 2924 the value bound to `$n` is the 2925 resolved value used as the _operand_ 2926 of the `:number` _function_ 2927 when resolving the value of the _variable_ `$n1`. 

   Implementations that provide a means for defining custom functions
   SHOULD provide a means for function implementations
   to return values that contain enough information
   (e.g. a representation of
   the resolved _operand_ and _option_ values
   that the function was called with)
   to be used as arguments to subsequent calls
   to the function implementations.
   For example, an implementation might define an interface that allows custom function implementation.
   Such an interface SHOULD define an implementation-specific
   argument type `T` and return type `U`
   for implementations of functions
   such that `U` can be coerced to `T`.
   Implementations of a _function_ SHOULD emit an
   _Invalid Expression_ error for _operands_ whose resolved value
   or type is not supported.

> [!NOTE]
> The behavior of the previous example is
> currently implementation-dependent. Supposing that
> the external input variable `n` is bound to the string `"1"`,
> and that the implementation formats to a string,
> the formatted result of the following message:
>
> ```
> .input {$n :number minIntegerDigits=3}
> .local $n1 = {$n :number maxFractionDigits=3}
> {{{$n1}}}
> ```
>
> is currently implementation-dependent.
> Depending on whether the options are preserved
> between the resolution of the first `:number` _annotation_
> and the resolution of the second `:number` _annotation_,
> a conformant implementation
> could produce either "001.000" or "1.000".
>
> Each function **specification** MAY have
> its own rules to preserve some options in the returned structure
> and discard others.
> In instances where a function specification does not determine whether an option is preserved or discarded,
> each function **implementation** of that specification MAY have
> its own rules to preserve some options in the returned structure
> and discard others.
>

> [!NOTE]
> During the Technical Preview,
> feedback on how the registry describes
> the flow of _resolved values_ and _options_
> from one _function_ to another,
> and on what requirements this specification should impose,
> is highly desired.

   An implementation MAY pass additional arguments to the function,
   as long as reasonable precautions are taken to keep the function interface
   simple and minimal, and avoid introducing potential security vulnerabilities.

   An implementation MAY define its own functions.
   An implementation MAY allow custom functions to be defined by users.

   Function access to the _formatting context_ MUST be minimal and read-only,
   and execution time SHOULD be limited.

   Implementation-defined _functions_ SHOULD use an implementation-defined _namespace_.

5. If the call succeeds,
   resolve the value of the _expression_ as the result of that function call.

   If the call fails or does not return a valid value,
   emit an _Invalid Expression_ error.

   Implementations MAY provide a mechanism for the _function_ to provide
   additional detail about internal failures.
   Specifically, if the cause of the failure was that the datatype, value, or format of the
   _operand_ did not match that expected by the _function_,
   the _function_ might cause an _Operand Mismatch Error_ to be emitted.

   In all failure cases, use the _fallback value_ for the _expression_ as the resolved value.

##### Option Resolution

The result of resolving _option_ values is an unordered mapping of string identifiers to values.
3012 3013For each _option_: 3014 3015- Resolve the _identifier_ of the _option_. 3016- If the _option_'s right-hand side successfully resolves to a value, 3017 bind the _identifier_ of the _option_ to the resolved value in the mapping. 3018- Otherwise, bind the _identifier_ of the _option_ to an unresolved value in the mapping. 3019 Implementations MAY later remove this value before calling the _function_. 3020 (Note that an _Unresolved Variable_ error will have been emitted.) 3021 3022Errors MAY be emitted during _option resolution_, 3023but it always resolves to some mapping of string identifiers to values. 3024This mapping can be empty. 3025 3026#### Markup Resolution 3027 3028Unlike _functions_, the resolution of _markup_ is not customizable. 3029 3030The resolved value of _markup_ includes the following fields: 3031 3032- The type of the markup: open, standalone, or close 3033- The _identifier_ of the _markup_ 3034- The resolved _options_ values after _option resolution_. 3035 3036The resolution of _markup_ MUST always succeed. 3037 3038#### Fallback Resolution 3039 3040A **_fallback value_** is the resolved value for an _expression_ that fails to resolve. 3041 3042An _expression_ fails to resolve when: 3043 3044- A _variable_ used as an _operand_ (with or without an _annotation_) fails to resolve. 3045 * Note that this does not include a _variable_ used as an _option_ value. 3046- A _function_ _annotation_ fails to resolve. 3047- A _private-use annotation_ is unsupported by the implementation or if 3048 a _private-use annotation_ fails to resolve. 3049- The _expression_ has a _reserved annotation_. 3050 3051The _fallback value_ depends on the contents of the _expression_: 3052 3053- _expression_ with _literal_ _operand_ (_quoted_ or _unquoted_): 3054 U+007C VERTICAL LINE `|` 3055 followed by the value of the _literal_ 3056 with escaping applied to U+005C REVERSE SOLIDUS `\` and U+007C VERTICAL LINE `|`, 3057 and then by U+007C VERTICAL LINE `|`. 

  > Examples:
  > In a context where `:func` fails to resolve,
  > `{42 :func}` resolves to the _fallback value_ `|42|` and
  > `{|C:\\| :func}` resolves to the _fallback value_ `|C:\\|`.
  > In any context, `{|| @reserved}` resolves to the _fallback value_ `||`.

- _expression_ with _variable_ _operand_ referring to a local _declaration_ (with or without an _annotation_):
  the _value_ to which it resolves (which may already be a _fallback value_)

  > Examples:
  > In a context where `:func` fails to resolve,
  > the _pattern_'s _expression_ in `.local $var={|val|} {{{$var :func}}}`
  > resolves to the _fallback value_ `|val|` and the message formats to `{|val|}`.
  > In a context where `:now` fails to resolve but `:datetime` does not,
  > the _pattern_'s _expression_ in
  > ```
  > .local $t = {:now format=iso8601}
  > .local $pretty_t = {$t :datetime}
  > {{{$pretty_t}}}
  > ```
  > (transitively) resolves to the _fallback value_ `:now` and
  > the message formats to `{:now}`.

- _expression_ with _variable_ _operand_ not referring to a local _declaration_ (with or without an _annotation_):
  U+0024 DOLLAR SIGN `$` followed by the _name_ of the _variable_

  > Examples:
  > In a context where `$var` fails to resolve, `{$var}` and `{$var :number}` and `{$var @reserved}`
  > all resolve to the _fallback value_ `$var`.
  > In a context where `:func` fails to resolve,
  > the _pattern_'s _expression_ in `.input {$arg} {{{$arg :func}}}`
  > resolves to the _fallback value_ `$arg` and
  > the message formats to `{$arg}`.

- _function_ _expression_ with no _operand_:
  U+003A COLON `:` followed by the _function_ _identifier_

  > Examples:
  > In a context where `:func` fails to resolve, `{:func}` resolves to the _fallback value_ `:func`.
3098 > In a context where `:ns:func` fails to resolve, `{:ns:func}` resolves to the _fallback value_ `:ns:func`. 3099 3100- unsupported _private-use annotation_ or _reserved annotation_ with no _operand_: 3101 the _annotation_ starting sigil 3102 3103 > Examples: 3104 > In any context, `{@reserved}` and `{@reserved |...|}` both resolve to the _fallback value_ `@`. 3105 3106- supported _private-use annotation_ with no _operand_: 3107 the _annotation_ starting sigil, optionally followed by implementation-defined details 3108 conforming with patterns in the other cases (such as quoting literals). 3109 If details are provided, they SHOULD NOT leak potentially private information. 3110 3111 > Examples: 3112 > In a context where `^` expressions are used for comments, `{^▽^}` might resolve to the _fallback value_ `^`. 3113 > In a context where `&` expressions are _function_-like macro invocations, `{&foo |...|}` might resolve to the _fallback value_ `&foo`. 3114 3115- Otherwise: the U+FFFD REPLACEMENT CHARACTER `�` 3116 3117 This is not currently used by any expression, but may apply in future revisions. 3118 3119_Option_ _identifiers_ and values are not included in the _fallback value_. 3120 3121_Pattern selection_ is not supported for _fallback values_. 3122 3123### Pattern Selection 3124 3125When a _message_ contains a _matcher_ with one or more _selectors_, 3126the implementation needs to determine which _variant_ will be used 3127to provide the _pattern_ for the formatting operation. 3128This is done by ordering and filtering the available _variant_ statements 3129according to their _key_ values and selecting the first one. 3130 3131> [!NOTE] 3132> At least one _variant_ is required to have all of its _keys_ consist of 3133> the fallback value `*`. 3134> Some _selectors_ might be implemented in a way that the key value `*` 3135> cannot be selected in a _valid_ _message_. 3136> In other cases, this key value might be unreachable only in certain locales. 
3137> This could result in the need in some locales to create 3138> one or more _variants_ that do not make sense grammatically for that language. 3139> > For example, in the `pl` (Polish) locale, this _message_ cannot reach 3140> > the `*` _variant_: 3141> > ``` 3142> > .match {$num :integer} 3143> > 0 {{ }} 3144> > one {{ }} 3145> > few {{ }} 3146> > many {{ }} 3147> > * {{Only used by fractions in Polish.}} 3148> > ``` 3149> 3150> In the Tech Preview, feedback from users and implementers is desired about 3151> whether to relax the requirement that such a "fallback _variant_" appear in 3152> every message, versus the potential for a _message_ to fail at runtime 3153> because no matching _variant_ is available. 3154 3155The number of _keys_ in each _variant_ MUST equal the number of _selectors_. 3156 3157Each _key_ corresponds to a _selector_ by its position in the _variant_. 3158 3159> For example, in this message: 3160> 3161> ``` 3162> .match {:one} {:two} {:three} 3163> 1 2 3 {{ ... }} 3164> ``` 3165> 3166> The first _key_ `1` corresponds to the first _selector_ (`{:one}`), 3167> the second _key_ `2` to the second _selector_ (`{:two}`), 3168> and the third _key_ `3` to the third _selector_ (`{:three}`). 3169 3170To determine which _variant_ best matches a given set of inputs, 3171each _selector_ is used in turn to order and filter the list of _variants_. 3172 3173Each _variant_ with a _key_ that does not match its corresponding _selector_ 3174is omitted from the list of _variants_. 3175The remaining _variants_ are sorted according to the _selector_'s _key_-ordering preference. 3176Earlier _selectors_ in the _matcher_'s list of _selectors_ have a higher priority than later ones. 3177 3178When all of the _selectors_ have been processed, 3179the earliest-sorted _variant_ in the remaining list of _variants_ is selected. 3180 3181> [!NOTE] 3182> A _selector_ is not a _declaration_. 
> Even when the same _function_ can be used for both formatting and selection
> of a given _operand_,
> the _annotation_ that appears in a _selector_ has no effect on subsequent
> _selectors_ nor on the formatting used in _placeholders_.
> To use the same value for selection and formatting,
> set its value with a `.input` or `.local` _declaration_.

This selection method is defined in more detail below.
An implementation MAY use any pattern selection method,
as long as its observable behavior matches the results of the method defined here.

If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
the result of pattern selection MUST be a pattern resolving to a single _fallback value_
using the message's fallback string defined in the _formatting context_
or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`.

#### Resolve Selectors

First, resolve the values of each _selector_:

1. Let `res` be a new empty list of resolved values that support selection.
1. For each _selector_ `sel`, in source order:
   1. Let `rv` be the resolved value of `sel`.
   1. If selection is supported for `rv`:
      1. Append `rv` as the last element of the list `res`.
   1. Else:
      1. Let `nomatch` be a resolved value for which selection always fails.
      1. Append `nomatch` as the last element of the list `res`.
      1. Emit a _Selection Error_.

The form of the resolved values is determined by each implementation,
along with the manner of determining their support for selection.

#### Resolve Preferences

Next, using `res`, resolve the preferential order for all message keys:

1. Let `pref` be a new empty list of lists of strings.
1. For each index `i` in `res`:
   1. Let `keys` be a new empty list of strings.
   1. For each _variant_ `var` of the message:
      1. Let `key` be the `var` key at position `i`.
      1. If `key` is not the catch-all key `'*'`:
         1. Assert that `key` is a _literal_.
         1. Let `ks` be the resolved value of `key`.
         1. Append `ks` as the last element of the list `keys`.
   1. Let `rv` be the resolved value at index `i` of `res`.
   1. Let `matches` be the result of calling the method MatchSelectorKeys(`rv`, `keys`).
   1. Append `matches` as the last element of the list `pref`.

The method MatchSelectorKeys is determined by the implementation.
It takes as arguments a resolved _selector_ value `rv` and a list of string keys `keys`,
and returns a list of string keys in preferential order.
The returned list MUST contain only unique elements of the input list `keys`.
The returned list MAY be empty.
The most-preferred key is first,
with each successive key appearing in order by decreasing preference.

#### Filter Variants

Then, using the preferential key orders `pref`,
filter the list of _variants_ to the ones that match with some preference:

1. Let `vars` be a new empty list of _variants_.
1. For each _variant_ `var` of the message:
   1. For each index `i` in `pref`:
      1. Let `key` be the `var` key at position `i`.
      1. If `key` is the catch-all key `'*'`:
         1. Continue the inner loop on `pref`.
      1. Assert that `key` is a _literal_.
      1. Let `ks` be the resolved value of `key`.
      1. Let `matches` be the list of strings at index `i` of `pref`.
      1. If `matches` includes `ks`:
         1. Continue the inner loop on `pref`.
      1. Else:
         1. Continue the outer loop on message _variants_.
   1. Append `var` as the last element of the list `vars`.

#### Sort Variants

Finally, sort the list of variants `vars` and select the _pattern_:

1. Let `sortable` be a new empty list of (integer, _variant_) tuples.
1. For each _variant_ `var` of `vars`:
   1. Let `tuple` be a new tuple (-1, `var`).
   1. Append `tuple` as the last element of the list `sortable`.
1. Let `len` be the integer count of items in `pref`.
1. Let `i` be `len` - 1.
1. While `i` >= 0:
   1. Let `matches` be the list of strings at index `i` of `pref`.
   1. Let `minpref` be the integer count of items in `matches`.
   1. For each tuple `tuple` of `sortable`:
      1. Let `matchpref` be an integer with the value `minpref`.
      1. Let `key` be the `tuple` _variant_ key at position `i`.
      1. If `key` is not the catch-all key `'*'`:
         1. Assert that `key` is a _literal_.
         1. Let `ks` be the resolved value of `key`.
         1. Let `matchpref` be the integer position of `ks` in `matches`.
      1. Set the `tuple` integer value as `matchpref`.
   1. Set `sortable` to be the result of calling the method `SortVariants(sortable)`.
   1. Set `i` to be `i` - 1.
1. Let `var` be the _variant_ element of the first element of `sortable`.
1. Select the _pattern_ of `var`.

`SortVariants` is a method whose single argument is
a list of (integer, _variant_) tuples.
It returns a list of (integer, _variant_) tuples.
Any implementation of `SortVariants` is acceptable
as long as it satisfies the following requirements:

1. Let `sortable` be an arbitrary list of (integer, _variant_) tuples.
1. Let `sorted` be `SortVariants(sortable)`.
1. `sorted` is the result of sorting `sortable` using the following comparator:
   1. `(i1, v1)` <= `(i2, v2)` if and only if `i1 <= i2`.
1. The sort is stable (pairs of tuples from `sortable` that are equal
   in their first element have the same relative order in `sorted`).
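
_The following sketch is non-normative._
It shows one way the Resolve Preferences, Filter Variants, and Sort Variants steps above
could be combined in TypeScript.
The names `SketchVariant`, `KeyMatcher`, `stringMatcher`, and `selectPattern` are hypothetical
and are not defined by this specification.
The sketch assumes that every _key_ is already resolved to a string,
that the string `"*"` stands for the catch-all key,
and that each `KeyMatcher` plays the role of MatchSelectorKeys for one already-resolved _selector_.

```ts
// Hypothetical, simplified variant: keys are plain strings, "*" is the catch-all key.
interface SketchVariant {
  keys: string[];
  pattern: string;
}

// Stand-in for MatchSelectorKeys, bound to one resolved selector value.
type KeyMatcher = (keys: string[]) => string[];

// A `:string`-like matcher: only a key equal to the value matches.
// Returns at most one element, so the result contains no duplicates.
const stringMatcher = (value: string): KeyMatcher => (keys) =>
  keys.indexOf(value) !== -1 ? [value] : [];

function selectPattern(
  matchers: KeyMatcher[],
  variants: SketchVariant[],
): string | undefined {
  // Resolve Preferences: one preference-ordered list of matching keys per selector.
  const pref: string[][] = matchers.map((match, i) =>
    match(variants.map((v) => v.keys[i]).filter((k) => k !== "*")),
  );

  // Filter Variants: keep variants whose every key is "*" or a matching key.
  const vars = variants.filter((v) =>
    pref.every((matches, i) => v.keys[i] === "*" || matches.indexOf(v.keys[i]) !== -1),
  );

  // Sort Variants: one stable sort per selector, starting with the last selector.
  let sortable = vars.map((variant) => ({ score: -1, variant }));
  for (let i = pref.length - 1; i >= 0; i--) {
    const matches = pref[i];
    for (const tuple of sortable) {
      const key = tuple.variant.keys[i];
      tuple.score = key === "*" ? matches.length : matches.indexOf(key);
    }
    // Array.prototype.sort is stable in ECMAScript 2019 and later.
    sortable = sortable.slice().sort((a, b) => a.score - b.score);
  }
  return sortable.length > 0 ? sortable[0].variant.pattern : undefined;
}

const selected = selectPattern(
  [stringMatcher("foo"), stringMatcher("bar")],
  [
    { keys: ["*", "bar"], pattern: "Any and bar" },
    { keys: ["foo", "*"], pattern: "Foo and any" },
    { keys: ["foo", "bar"], pattern: "Foo and bar" },
    { keys: ["*", "*"], pattern: "Otherwise" },
  ],
);
console.log(selected); // "Foo and bar"
```

Sorting from the last _selector_ to the first with a stable sort is what gives
earlier _selectors_ priority: the final pass orders by the first _selector_,
and ties keep the order established by the later ones.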

#### Examples

_This section is non-normative._

##### Example 1

Presuming a minimal implementation which only supports the `:string` annotation,
which matches keys by using string comparison,
and a formatting context in which
the variable reference `$foo` resolves to the string `'foo'` and
the variable reference `$bar` resolves to the string `'bar'`,
pattern selection proceeds as follows for this message:

```
.match {$foo :string} {$bar :string}
bar bar {{All bar}}
foo foo {{All foo}}
* * {{Otherwise}}
```

1. For the first selector:<br>
   The value of the selector is resolved to be `'foo'`.<br>
   The available keys « `'bar'`, `'foo'` » are compared to `'foo'`,<br>
   resulting in a list « `'foo'` » of matching keys.

2. For the second selector:<br>
   The value of the selector is resolved to be `'bar'`.<br>
   The available keys « `'bar'`, `'foo'` » are compared to `'bar'`,<br>
   resulting in a list « `'bar'` » of matching keys.

3. Creating the list `vars` of variants matching all keys:<br>
   The first variant `bar bar` is discarded as its first key does not match the first selector.<br>
   The second variant `foo foo` is discarded as its second key does not match the second selector.<br>
   The catch-all keys of the third variant `* *` always match, and this is added to `vars`,<br>
   resulting in a list « `* *` » of variants.

4. As the list `vars` only has one entry, it does not need to be sorted.<br>
   The pattern `Otherwise` of the third variant is selected.

##### Example 2

Alternatively, with the same implementation and formatting context as in Example 1,
pattern selection would proceed as follows for this message:

```
.match {$foo :string} {$bar :string}
* bar {{Any and bar}}
foo * {{Foo and any}}
foo bar {{Foo and bar}}
* * {{Otherwise}}
```

1. For the first selector:<br>
   The value of the selector is resolved to be `'foo'`.<br>
   The available keys « `'foo'` » are compared to `'foo'`,<br>
   resulting in a list « `'foo'` » of matching keys.

2. For the second selector:<br>
   The value of the selector is resolved to be `'bar'`.<br>
   The available keys « `'bar'` » are compared to `'bar'`,<br>
   resulting in a list « `'bar'` » of matching keys.

3. Creating the list `vars` of variants matching all keys:<br>
   The keys of all variants either match each selector exactly, or via the catch-all key,<br>
   resulting in a list « `* bar`, `foo *`, `foo bar`, `* *` » of variants.

4. Sorting the variants:<br>
   The list `sortable` is first set with the variants in their source order
   and scores determined by the second selector:<br>
   « ( 0, `* bar` ), ( 1, `foo *` ), ( 0, `foo bar` ), ( 1, `* *` ) »<br>
   This is then sorted as:<br>
   « ( 0, `* bar` ), ( 0, `foo bar` ), ( 1, `foo *` ), ( 1, `* *` ) ».<br>
   To sort according to the first selector, the scores are updated to:<br>
   « ( 1, `* bar` ), ( 0, `foo bar` ), ( 0, `foo *` ), ( 1, `* *` ) ».<br>
   This is then sorted as:<br>
   « ( 0, `foo bar` ), ( 0, `foo *` ), ( 1, `* bar` ), ( 1, `* *` ) ».<br>

5. The pattern `Foo and bar` of the most preferred `foo bar` variant is selected.

##### Example 3

A more-complex example is the matching found in selection APIs
such as ICU's `PluralFormat`.
Suppose that this API is represented here by the function `:number`.
This `:number` function can match a given numeric value to a specific number _literal_
and **_also_** to a plural category (`zero`, `one`, `two`, `few`, `many`, `other`)
according to locale rules defined in CLDR.
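
One way such a `:number` matcher could implement MatchSelectorKeys
is sketched here in TypeScript, using the ECMAScript `Intl.PluralRules` API
as the source of plural categories.
The function name `matchNumberKeys` and its parameters are hypothetical,
and the sketch assumes that an exact match compares a key
with the default string form of the number.

```ts
// Hypothetical MatchSelectorKeys for a `:number`-like selector.
// Returns the matching keys in preferential order, without duplicates.
function matchNumberKeys(value: number, locale: string, keys: string[]): string[] {
  const exact = String(value);
  // Plural category for the value, such as "one" or "other".
  const category: string = new Intl.PluralRules(locale).select(value);
  const result: string[] = [];
  // An exact numeric match is preferred over a plural category match.
  for (const candidate of [exact, category]) {
    if (keys.indexOf(candidate) !== -1 && result.indexOf(candidate) === -1) {
      result.push(candidate);
    }
  }
  return result;
}

console.log(matchNumberKeys(1, "en", ["one", "1"]));
```

Listing the exact key before the category key is the design choice
that lets a `1` _variant_ win over a `one` _variant_ for the same value.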

Given a variable reference `$count` whose value resolves to the number `1`
and an `en` (English) locale,
the pattern selection proceeds as follows for this message:

```
.input {$count :number}
.match {$count}
one {{Category match for {$count}}}
1 {{Exact match for {$count}}}
* {{Other match for {$count}}}
```

1. For the selector:<br>
   The value of the selector is resolved to an implementation-defined value
   that is capable of performing English plural category selection on the value `1`.<br>
   The available keys « `'one'`, `'1'` » are passed to
   the implementation's MatchSelectorKeys method,<br>
   resulting in a list « `'1'`, `'one'` » of matching keys.

2. Creating the list `vars` of variants matching all keys:<br>
   The keys of all variants are included in the list of matching keys, or use the catch-all key,<br>
   resulting in a list « `one`, `1`, `*` » of variants.

3. Sorting the variants:<br>
   The list `sortable` is first set with the variants in their source order
   and scores determined by the selector key order:<br>
   « ( 1, `one` ), ( 0, `1` ), ( 2, `*` ) »<br>
   This is then sorted as:<br>
   « ( 0, `1` ), ( 1, `one` ), ( 2, `*` ) »<br>

4. The pattern `Exact match for {$count}` of the most preferred `1` variant is selected.

### Formatting

After _pattern selection_,
each _text_ and _placeholder_ part of the selected _pattern_ is resolved and formatted.

Resolved values cannot always be formatted by a given implementation.
When such an error occurs during _formatting_,
an implementation SHOULD emit a _Formatting Error_ and produce a
_fallback value_ for the _placeholder_ that produced the error.
A formatting function MAY substitute a value to use instead of a _fallback value_.

Implementations MAY represent the result of _formatting_ using the most
appropriate data type or structure. Some examples of these include:

- A single string concatenated from the parts of the resolved _pattern_.
- A string with associated attributes for portions of its text.
- A flat sequence of objects corresponding to each resolved value.
- A hierarchical structure of objects that group spans of resolved values,
  such as sequences delimited by _markup-open_ and _markup-close_ _placeholders_.

Implementations SHOULD provide _formatting_ result types that match user needs,
including situations that require further processing of formatted messages.
Implementations SHOULD encourage users to consider a formatted localised string
as an opaque data structure, suitable only for presentation.

When formatting to a string, the default representation of all _markup_
MUST be an empty string.
Implementations MAY offer functionality for customizing this,
such as by emitting XML-ish tags for each _markup_.

_Attributes_ are reserved for future standardization.
Other than checking for valid syntax, they SHOULD NOT
affect the processing or output of a _message_.

#### Examples

_This section is non-normative._

1. An implementation might choose to return an interstitial object
   so that the caller can "decorate" portions of the formatted value.
   In ICU4J, the `NumberFormatter` class returns a `FormattedNumber` object,
   so a _pattern_ such as `This is my number {42 :number}` might return
   the character sequence `This is my number `
   followed by a `FormattedNumber` object representing the value `42` in the current locale.

2. A formatter in a web browser could format a message as a DOM fragment
   rather than as a representation of its HTML source.
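
As a further illustration, the following TypeScript sketch shows
how a flat sequence of formatted parts could be concatenated into a string,
with every _markup_ part contributing an empty string by default
and an optional callback for emitting XML-ish tags instead.
The part shapes and the names `partsToString` and `xmlishMarkup` are hypothetical;
this specification does not define a formatted-parts interface.

```ts
// Hypothetical formatted-part shapes; this specification does not define them.
interface TextPart { type: "text"; value: string }
interface ValuePart { type: "value"; value: string }
interface MarkupPart { type: "markup"; kind: "open" | "standalone" | "close"; name: string }
type FormattedPart = TextPart | ValuePart | MarkupPart;

// Concatenate parts into a string. Unless a renderer is supplied,
// every markup part is represented by an empty string.
function partsToString(
  parts: FormattedPart[],
  renderMarkup: (part: MarkupPart) => string = () => "",
): string {
  return parts
    .map((part) => (part.type === "markup" ? renderMarkup(part) : part.value))
    .join("");
}

// An optional customization that emits XML-ish tags for each markup part.
function xmlishMarkup(part: MarkupPart): string {
  if (part.kind === "open") return `<${part.name}>`;
  if (part.kind === "close") return `</${part.name}>`;
  return `<${part.name}/>`;
}

// Parts that might result from the pattern: Click {#b}here{/b}, {$name}
const parts: FormattedPart[] = [
  { type: "text", value: "Click " },
  { type: "markup", kind: "open", name: "b" },
  { type: "text", value: "here" },
  { type: "markup", kind: "close", name: "b" },
  { type: "text", value: ", " },
  { type: "value", value: "Ada" },
];

const plain = partsToString(parts); // "Click here, Ada"
const tagged = partsToString(parts, xmlishMarkup); // "Click <b>here</b>, Ada"
```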

#### Formatting Fallback Values

If the resolved _pattern_ includes any _fallback values_
and the formatting result is a concatenated string or a sequence of strings,
the string representation of each _fallback value_ MUST be the concatenation of
a U+007B LEFT CURLY BRACKET `{`,
the _fallback value_ as a string,
and a U+007D RIGHT CURLY BRACKET `}`.

> For example,
> a message with a _Syntax Error_ and no fallback string
> defined in the _formatting context_ would format to a string as `{�}`.

#### Handling Bidirectional Text

_Messages_ contain text. Any text can be
[bidirectional text](https://www.w3.org/TR/i18n-glossary/#dfn-bidirectional-text).
That is, the text can consist of a mixture of left-to-right and right-to-left spans of text.
The display of bidirectional text is defined by the
[Unicode Bidirectional Algorithm](http://www.unicode.org/reports/tr9/) [UAX9].

The directionality of the message as a whole is provided by the _formatting context_.

When a _message_ is formatted, _placeholders_ are replaced
with their formatted representation.
Applying the Unicode Bidirectional Algorithm to the text of a formatted _message_
(including its formatted parts)
can result in unexpected or undesirable
[spillover effects](https://www.w3.org/TR/i18n-glossary/#dfn-spillover-effects).
Applying [bidi isolation](https://www.w3.org/TR/i18n-glossary/#dfn-bidi-isolation)
to each affected formatted value helps avoid this spillover in a formatted _message_.

Note that both the _message_ and, separately, each _placeholder_ need to have
direction metadata for this to work.
If an implementation supports formatting to something other than a string
(such as a sequence of parts),
the directionality of each formatted _placeholder_ needs to be available to the caller.

If a formatted _expression_ itself contains spans with differing directionality,
its formatter SHOULD perform any necessary processing, such as inserting controls or
isolating such parts to ensure that the formatted value displays correctly in a plain text context.

> For example, an implementation could provide a `:currency` formatting function
> which inserts strongly directional characters, such as U+200F RIGHT-TO-LEFT MARK (RLM),
> U+200E LEFT-TO-RIGHT MARK (LRM), or U+061C ARABIC LETTER MARKER (ALM),
> to coerce proper display of the sign and currency symbol next to a formatted number.
> An example of this is formatting the value `-1234.56` as the currency `AED`
> in the `ar-AE` locale. The formatted value appears like this:
> ```
> -1,234.56 د.إ.
> ```
> The code point sequence for this string, as produced by the ICU4J `NumberFormat` function,
> includes **U+200F U+200E** at the start and **U+200F** at the end of the string.
> If it did not do this, the same string would appear like this instead:
>
> 

A **_bidirectional isolation strategy_** is functionality in the formatter's
processing of a _message_ that produces bidirectional output text that is ready for display.

The **_Default Bidi Strategy_** is a _bidirectional isolation strategy_ that uses
isolating Unicode control characters around each _placeholder_'s formatted value.
It is primarily intended for use in plain-text strings, where markup or other mechanisms
are not available.
Implementations MUST provide the _Default Bidi Strategy_ as one of the
_bidirectional isolation strategies_.

Implementations MAY provide other _bidirectional isolation strategies_.

Implementations MAY supply a _bidirectional isolation strategy_ that performs no processing.

The _Default Bidi Strategy_ is defined as follows:

1. Let `msgdir` be the directionality of the whole message,
   one of « `'LTR'`, `'RTL'`, `'unknown'` ».
   These correspond to the message having left-to-right directionality,
   right-to-left directionality, and to the message's directionality not being known.
1. For each _expression_ `exp` in _pattern_:
   1. Let `fmt` be the formatted string representation of the resolved value of `exp`.
   1. Let `dir` be the directionality of `fmt`,
      one of « `'LTR'`, `'RTL'`, `'unknown'` », with the same meanings as for `msgdir`.
   1. If `dir` is `'LTR'`:
      1. If `msgdir` is `'LTR'`,
         in the formatted output, let `fmt` be itself.
      1. Else, in the formatted output,
         prefix `fmt` with U+2066 LEFT-TO-RIGHT ISOLATE
         and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
   1. Else, if `dir` is `'RTL'`:
      1. In the formatted output,
         prefix `fmt` with U+2067 RIGHT-TO-LEFT ISOLATE
         and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
   1. Else:
      1. In the formatted output,
         prefix `fmt` with U+2068 FIRST STRONG ISOLATE
         and postfix it with U+2069 POP DIRECTIONAL ISOLATE.


## Interchange Data Model

This section defines a data model representation of MessageFormat 2 _messages_.

Implementations are not required to use this data model for their internal representation of messages.
Neither are they required to provide an interface that accepts or produces
representations of this data model.

The major reason this specification provides a data model is to allow interchange of
the logical representation of a _message_ between different implementations.
This includes mapping legacy formatting syntaxes (such as MessageFormat 1)
to a MessageFormat 2 implementation.
Another use would be in converting to or from translation formats without
the need to continually parse and serialize all or part of a message.

Implementations that expose APIs supporting the production, consumption, or transformation of a
_message_ as a data structure are encouraged to use this data model.

This data model provides these capabilities:
- any MessageFormat 2 message (including future versions)
  can be parsed into this representation
- this data model representation can be serialized as a well-formed
  MessageFormat 2 message
- parsing a MessageFormat 2 message into a data model representation
  and then serializing it results in an equivalently functional message

This data model might also be used to:
- parse a non-MessageFormat 2 message into a data model
  (and therefore re-serialize it as MessageFormat 2).
  Note that this depends on compatibility between the two syntaxes.
- re-serialize a MessageFormat 2 message into some other format
  including (but not limited to) other formatting syntaxes
  or translation formats.

To ensure compatibility across all platforms,
this interchange data model is defined here using TypeScript notation.
Two equivalent definitions of the data model are also provided:

- `common/dtd/messageFormat/message.json` is a JSON Schema definition,
  for use with message data encoded as JSON or compatible formats, such as YAML.
- `common/dtd/messageFormat/message.dtd` is a document type definition (DTD),
  for use with message data encoded as XML.

Note that while the data model description below is the canonical one,
the JSON and DTD definitions are intended for interchange between systems and processors.
To that end, they relax some aspects of the data model, such as allowing
declarations, options, and attributes to be optional rather than required properties.
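
As a non-normative illustration of that relaxation,
the following TypeScript sketch fills in the omitted `declarations`, `options`,
and `attributes` of a message read from JSON,
so that the result has the required properties of the canonical data model.
The `Loose…` interfaces and the function `fillDefaults` are hypothetical
and cover only a message with a single _pattern_ made of text and _expressions_;
they are not the definitions found in the JSON Schema.

```ts
// Hypothetical "relaxed" shapes, as might be read from JSON interchange data.
interface LooseOption { name: string; value: unknown }
interface LooseAnnotation { type: "function"; name: string; options?: LooseOption[] }
interface LooseExpression {
  type: "expression";
  arg?: unknown;
  annotation?: LooseAnnotation;
  attributes?: unknown[];
}
interface LoosePatternMessage {
  type: "message";
  declarations?: unknown[];
  pattern: Array<string | LooseExpression>;
}

// Return a copy in which every omitted list is present as an empty array.
function fillDefaults(msg: LoosePatternMessage) {
  return {
    type: msg.type,
    declarations: msg.declarations ?? [],
    pattern: msg.pattern.map((el) => {
      if (typeof el === "string") return el; // literal text is kept as-is
      const filled = { ...el, attributes: el.attributes ?? [] };
      if (el.annotation) {
        filled.annotation = { ...el.annotation, options: el.annotation.options ?? [] };
      }
      return filled;
    }),
  };
}

// A relaxed message roughly corresponding to: Hello, {$user :string}
const loose = JSON.parse(
  '{"type":"message","pattern":["Hello, ",{"type":"expression",' +
    '"arg":{"type":"variable","name":"user"},' +
    '"annotation":{"type":"function","name":"string"}}]}',
) as LoosePatternMessage;

const canonical = fillDefaults(loose);
console.log(JSON.stringify(canonical, null, 2));
```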

> [!NOTE]
> Users relying on XML representations of messages should note that
> XML 1.0 does not allow for the representation of all C0 control characters (U+0000-U+001F).
> Except for U+0000 NULL, these characters are allowed in MessageFormat 2 messages,
> so systems and users relying on this XML representation for interchange
> might need to supply an alternate escape mechanism to support messages
> that contain these characters.

> [!IMPORTANT]
> The data model uses the field name `name` to denote various interface identifiers.
> In the MessageFormat 2 [syntax](#syntax), the source for these `name` fields
> sometimes uses the production `identifier`.
> This happens when the named item, such as a _function_, supports namespacing.
>
> In the Tech Preview, feedback on whether to separate the `namespace` from the `name`
> and represent both separately, or just, as here, use an opaque single field `name`
> is desired.

### Messages

A `SelectMessage` corresponds to a syntax message that includes _selectors_.
A message without _selectors_ and with a single _pattern_ is represented by a `PatternMessage`.

In the syntax,
a `PatternMessage` may be represented either as a _simple message_ or as a _complex message_,
depending on whether it has declarations and if its `pattern` is allowed in a _simple message_.

```ts
type Message = PatternMessage | SelectMessage;

interface PatternMessage {
  type: "message";
  declarations: Declaration[];
  pattern: Pattern;
}

interface SelectMessage {
  type: "select";
  declarations: Declaration[];
  selectors: Expression[];
  variants: Variant[];
}
```

Each message _declaration_ is represented by a `Declaration`,
which connects the `name` of a _variable_
with its _expression_ `value`.
The `name` does not include the initial `$` of the _variable_.

The `name` of an `InputDeclaration` MUST be the same
as the `name` in the `VariableRef` of its `VariableExpression` `value`.

An `UnsupportedStatement` represents a statement not supported by the implementation.
Its `keyword` is a non-empty string name (i.e. not including the initial `.`).
If not empty, the `body` is the "raw" value (i.e. escape sequences are not processed)
starting after the keyword and up to the first _expression_,
not including leading or trailing whitespace.
The non-empty `expressions` correspond to the trailing _expressions_ of the _reserved statement_.

> [!NOTE]
> Be aware that future versions of this specification
> might assign meaning to _reserved statement_ values.
> This would result in new interfaces being added to
> this data model.

```ts
type Declaration = InputDeclaration | LocalDeclaration | UnsupportedStatement;

interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}

interface LocalDeclaration {
  type: "local";
  name: string;
  value: Expression;
}

interface UnsupportedStatement {
  type: "unsupported-statement";
  keyword: string;
  body?: string;
  expressions: Expression[];
}
```

In a `SelectMessage`,
the `keys` and `value` of each _variant_ are represented as an array of `Variant`.
For the `CatchallKey`, a string `value` may be provided to retain an identifier.
This is always `'*'` in MessageFormat 2 syntax, but may vary in other formats.

```ts
interface Variant {
  keys: Array<Literal | CatchallKey>;
  value: Pattern;
}

interface CatchallKey {
  type: "*";
  value?: string;
}
```

### Patterns

Each `Pattern` contains a linear sequence of text and placeholders corresponding to potential output of a message.

Each element of the `Pattern` MUST either be a non-empty string, an `Expression`, or a `Markup` object.
String values represent literal _text_.
String values include all processing of the underlying _text_ values,
including escape sequence processing.
`Expression` wraps each of the potential _expression_ shapes.
`Markup` wraps each of the potential _markup_ shapes.

Implementations MUST NOT rely on the set of `Expression` and
`Markup` interfaces defined in this document being exhaustive.
Future versions of this specification might define additional
expressions or markup.

```ts
type Pattern = Array<string | Expression | Markup>;

type Expression =
  | LiteralExpression
  | VariableExpression
  | FunctionExpression
  | UnsupportedExpression;

interface LiteralExpression {
  type: "expression";
  arg: Literal;
  annotation?: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}

interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  annotation?: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}

interface FunctionExpression {
  type: "expression";
  arg?: never;
  annotation: FunctionAnnotation;
  attributes: Attribute[];
}

interface UnsupportedExpression {
  type: "expression";
  arg?: never;
  annotation: UnsupportedAnnotation;
  attributes: Attribute[];
}

interface Attribute {
  name: string;
  value?: Literal | VariableRef;
}
```

### Expressions

The `Literal` and `VariableRef` correspond to the _literal_ and _variable_ syntax rules.
When they are used as the `arg` of an `Expression`,
they represent _expression_ values with no _annotation_.

`Literal` represents all literal values, both _quoted_ and _unquoted_.
The presence or absence of quotes is not preserved by the data model.
The `value` of `Literal` is the "cooked" value (i.e. escape sequences are processed).

In a `VariableRef`, the `name` does not include the initial `$` of the _variable_.

```ts
interface Literal {
  type: "literal";
  value: string;
}

interface VariableRef {
  type: "variable";
  name: string;
}
```

A `FunctionAnnotation` represents a _function_ _annotation_.
The `name` does not include the `:` starting sigil.

Each _option_ is represented by an `Option`.

```ts
interface FunctionAnnotation {
  type: "function";
  name: string;
  options: Option[];
}

interface Option {
  name: string;
  value: Literal | VariableRef;
}
```

An `UnsupportedAnnotation` represents a
_private-use annotation_ not supported by the implementation or a _reserved annotation_.
The `source` is the "raw" value (i.e. escape sequences are not processed),
including the starting sigil.

When parsing the syntax of a _message_ that includes a _private-use annotation_
supported by the implementation,
the implementation SHOULD represent it in the data model
using an interface appropriate for the semantics and meaning
that the implementation attaches to that _annotation_.

```ts
interface UnsupportedAnnotation {
  type: "unsupported-annotation";
  source: string;
}
```

### Markup

A `Markup` object has a `kind` of either `"open"`, `"standalone"`, or `"close"`,
each corresponding to _open_, _standalone_, and _close_ _markup_.
The `name` in these does not include the starting sigils `#` and `/`
or the ending sigil `/`.
The optional `options` for markup use the same `Option` as `FunctionAnnotation`.

```ts
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Option[];
  attributes: Attribute[];
}
```

### Extensions

Implementations MAY extend this data model with additional interfaces,
as well as adding new fields to existing interfaces.
When encountering an unfamiliar field, an implementation MUST ignore it.
For example, an implementation could include a `span` field on all interfaces
encoding the corresponding start and end positions in its source syntax.

In general,
implementations MUST NOT extend the sets of values for any defined field or type
when representing a valid message.
However, when using this data model to represent an invalid message,
an implementation MAY do so.
This is intended to allow for the representation of "junk" or invalid content within messages.

## Appendices

### Security Considerations

MessageFormat 2.0 _patterns_ are meant to allow a _message_ to include any string value
which users might normally wish to use in their environment.
Programming languages and other environments vary in what characters are permitted
to appear in a valid string.
In many cases, certain types of characters, such as invisible control characters,
require escaping by these host formats.
In other cases, strings are not permitted to contain certain characters at all.
Since _messages_ are subject to the restrictions and limitations of their
host environments, their serializations and resource formats,
that might be sufficient to prevent most problems.
However, MessageFormat itself does not supply such a restriction.

MessageFormat _messages_ permit nearly all Unicode code points,
with the exception of surrogates,
to appear in _literals_, including the text portions of a _pattern_.
This means that it can be possible for a _message_ to contain invisible characters
(such as bidirectional controls,
ASCII control characters in the range U+0000 to U+001F,
or characters that might be interpreted as escapes or syntax in the host format)
that abnormally affect the display of the _message_
when viewed as source code, or in resource formats or translation tools,
but do not generate errors from MessageFormat parsers or processing APIs.

Bidirectional text containing right-to-left characters (such as used for Arabic or Hebrew)
also poses a potential source of confusion for users.
Since MessageFormat 2.0's syntax makes use of
keywords and symbols that are left-to-right or consist of neutral characters
(including characters subject to mirroring under the Unicode Bidirectional Algorithm),
it is possible to create messages that,
when displayed in source code, or in resource formats or translation tools,
have a misleading appearance or are difficult to parse visually.

For more information, see \[[UTS#55](https://unicode.org/reports/tr55/)\]
<cite>Unicode Source Code Handling</cite>.

MessageFormat 2.0 implementations might allow end-users to install
_selectors_, _functions_, or _markup_ from third-party sources.
Such functionality can be a vector for various exploits,
including buffer overflow, code injection, user tracking,
fingerprinting, and other types of bad behavior.
Any installed code needs to be appropriately sandboxed.
In addition, end-users need to be aware of the risks involved.

### Acknowledgements

Special thanks to the following people for their contributions to making MessageFormat v2.
The following people contributed to our GitHub repo and are listed in order by contribution size:

Addison Phillips,
Eemeli Aro,
Romulo Cintra,
Stanisław Małolepszy,
Elango Cheran,
Richard Gibson,
Tim Chevalier,
Mihai Niță,
Shane F. Carr,
Mark Davis,
Steven R. Loomis,
Caleb Maclennan,
David Filip,
Daniel Minor,
Christopher Dieringer,
George Rhoten,
Ujjwal Sharma,
Daniel Ehrenberg,
Markus Scherer,
Zibi Braniecki,
Matt Radbourne,
Bruno Haible,
and Rafael Xavier de Souza.

Addison Phillips was chair of the working group from January 2023.
Prior to 2023, the group was governed by a chair group, consisting of
Romulo Cintra,
Elango Cheran,
Mihai Niță,
David Filip,
Nicolas Bouvrette,
Stanisław Małolepszy,
Rafael Xavier de Souza,
Addison Phillips,
and Daniel Minor.
Romulo Cintra chaired the chair group.

* * *

Copyright © 2001–2024 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode [Terms of Use](https://www.unicode.org/copyright.html) apply.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.