## Unicode Technical Standard #35

# Unicode Locale Data Markup Language (LDML)<br/>Part 8: Person Names

|Version|45                      |
|-------|------------------------|
|Editors|Mark Davis, Peter Edberg, Rich Gillam, Alex Kolisnychenko, Mike McKenna and [other CLDR committee members](tr35.md#Acknowledgments)|

For the full header, summary, and status, see [Part 1: Core](tr35.md).

### _Summary_

This document describes parts of an XML format (_vocabulary_) for the exchange of structured locale data. This format is used in the [Unicode Common Locale Data Repository](https://www.unicode.org/cldr/).

This is a partial document, describing only those parts of the LDML that are relevant for person names (name structure, formats, sorting). For the other parts of the LDML see the [main LDML document](tr35.md) and the links above.

### _Status_

<!-- _This is a draft document which may be updated, replaced, or superseded by other documents at any time.
Publication does not imply endorsement by the Unicode Consortium.
This is not a stable document; it is inappropriate to cite this document as other than a work in progress._ -->

_This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium.
This is a stable document and may be used as reference material or cited as a normative reference by other specifications._

> _**A Unicode Technical Standard (UTS)** is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS._

_Please submit corrigenda and other comments with the CLDR bug reporting form [[Bugs](tr35.md#Bugs)]. Related information that is useful in understanding this document is found in the [References](tr35.md#References). For the latest version of the Unicode Standard see [[Unicode](tr35.md#Unicode)]. For a list of current Unicode Technical Reports see [[Reports](tr35.md#Reports)]. For more information about versions of the Unicode Standard, see [[Versions](tr35.md#Versions)]._

## Parts

The LDML specification is divided into the following parts:

*   Part 1: [Core](tr35.md#Contents) (languages, locales, basic structure)
*   Part 2: [General](tr35-general.md#Contents) (display names & transforms, etc.)
*   Part 3: [Numbers](tr35-numbers.md#Contents) (number & currency formatting)
*   Part 4: [Dates](tr35-dates.md#Contents) (date, time, time zone formatting)
*   Part 5: [Collation](tr35-collation.md#Contents) (sorting, searching, grouping)
*   Part 6: [Supplemental](tr35-info.md#Contents) (supplemental data)
*   Part 7: [Keyboards](tr35-keyboards.md#Contents) (keyboard mappings)
*   Part 8: [Person Names](tr35-personNames.md#Contents) (person names)
*   Part 9: [MessageFormat](tr35-messageFormat.md#Contents) (message format)

## <a name="Contents">Contents of Part 8, Person Names</a>

* [CLDR Person Names](#cldr-person-names)
  * [Introduction](#introduction)
    * [Not in scope](#not-in-scope)
  * [API Implementation](#api-implementation)
  * [Person Name Formatting Overview](#person-name-formatting-overview)
  * [Example Usage](#example-usage)
* [XML Structure](#xml-structure)
  * [personNames Element](#personnames-element)
  * [personName Element](#personname-element)
  * [nameOrderLocales Element](#nameorderlocales-element)
  * [parameterDefault Element](#parameterdefault-element)
  * [foreignSpaceReplacement Element](#foreignspacereplacement-element)
  * [nativeSpaceReplacement Element](#nativespacereplacement-element)
  * [initialPattern Element](#initialpattern-element)
    * [Syntax](#syntax)
* [Person Name Object](#person-name-object)
* [Person Name Attributes](#person-name-attributes)
  * [order](#order)
  * [length](#length)
  * [usage](#usage)
  * [formality](#formality)
* [namePattern Syntax](#namepattern-syntax)
  * [Fields](#fields)
  * [Modifiers](#modifiers)
    * [Grammatical Modifiers for Names](#grammatical-modifiers-for-names)
    * [Future Modifiers](#future-modifiers)
* [Formatting Process](#formatting-process)
  * [Derive the name locale](#derive-the-name-locale)
  * [Derive the formatting locale](#derive-the-formatting-locale)
    * [Switch the formatting locale if necessary](#switch-the-formatting-locale-if-necessary)
  * [Derive the name order](#derive-the-name-order)
  * [Choose a personName element](#choose-a-personname-element)
  * [Choose a namePattern](#choose-a-namepattern)
  * [Access PersonName object](#access-personname-object)
    * [Handle missing surname](#handle-missing-surname)
    * [Handle core and prefix](#handle-core-and-prefix)
    * [Derive initials](#derive-initials)
  * [Process a namePattern](#process-a-namepattern)
    * [Handling foreign names](#handling-foreign-names)
    * [Setting the spaceReplacement](#setting-the-spacereplacement)
    * [Examples of space replacement](#examples-of-space-replacement)
  * [Formatting examples](#formatting-examples)
* [Sample Name](#sample-name)
  * [Syntax](#syntax)
  * [Expected values](#expected-values)
* [PersonName Data Interface Examples](#personname-data-interface-examples)
  * [Example 1](#example-1)
  * [Example 2](#example-2)

## CLDR Person Names

### Introduction

CLDR provides formatting for person names, such as John Smith or 宮崎駿. These use patterns to show how a name object (for example, from a database) should be formatted for a particular locale. Name data has fields for the parts of people’s names, such as a **given** field with a value of “Maria”, and a **surname** field value of “Schmidt”.

There is a wide variety in the way that people’s names appear in different languages.

* People may have a different number of names, depending on their culture; they might have only one name (“Zendaya”), two (“Albert Einstein”), or three or more.
* People may have multiple words in a particular name field, e.g., “Mary Beth” as a given name, or “van Berg” as a surname.
* Some languages, such as Spanish, have two surnames (where each can be composed of multiple words).
* The ordering of name fields can be different across languages, as well as the spacing (or lack thereof) and punctuation.
* Name formatting needs to be adapted to different circumstances, such as the need for a shorter or longer presentation; a formal or informal context; or whether the name is used when talking about someone, when talking to someone, or as a monogram (JFK).

This document provides the [LDML](tr35.md) specification for formatting of personal names, using data, structure, and examples.

The CLDR functionality is targeted at formatting names for typical usage on computers (e.g. contact names, automated greetings, etc.), rather than being designed for special circumstances or protocol, such as addressing royalty. However, the structure may be enhanced in the future when it becomes clear that additional features are needed for some languages.

This addition to CLDR is based on review of current standards and practices that exist in LDAP, OECD, S42, hCard, HTML and various other international standards and commercial implementations.

Additions to those structures were made to accommodate known issues in large population groups, such as mononyms in Indonesia, patronymic and matronymic naming structure in Iceland and India, the need for a second surname in Spanish-speaking regions and the common case of chains of patronymic names in Arabic-speaking locales. The formatting patterns allow for specifying different “input parameters” to account for different contexts.

#### Not in scope

The following features are currently out of scope for Person Names formatting:

* Grammatical inflection of formatted names.
* Context-specific cultural aspects, such as when to use “-san” vs “-sama” when addressing a Japanese person.
* Providing locale-specific lists of titles, generation terms, and credentials for use in pull-down menus or validation (Mr, Ms., Mx., Dr., Jr., M.D., etc.).
* Validation of input, such as which fields are required, and what characters are allowed.
* Combining alternative names, such as multicultural names in Hong Kong “[Jackie Chan Kong-Sang](https://en.wikipedia.org/wiki/Jackie_Chan)”, or ‘Dwayne “The Rock” Johnson’.
* More than two levels of formality for names.
* Parsing of names:
  * Parsing of name strings into specific name parts such as given and given2. A name like "Mary Beth Estrella" could conceivably be any of the following.

    | given     | given2    | surname       | surname2 |
    | --------- | --------- | ------------- | -------- |
    | Mary      | Beth      | Estrella      |          |
    | Mary Beth |           | Estrella      |          |
    | Mary      |           | Beth Estrella |          |
    | Mary      |           | Beth          | Estrella |

  * Parsing out the other components of a name in a string, such as surname prefixes ([Tussenvoegsel](https://en.wikipedia.org/wiki/Tussenvoegsel) in Dutch).

### API Implementation

In addition to the settings in this document, it is recommended that implementations provide some additional features in their APIs to allow more control for clients, notably:

1. forceGivenFirst: no matter what the values are in nameOrderLocales or in the NameObject, display the name as givenFirst.
2. forceSurnameFirst: no matter what the values are in nameOrderLocales or in the NameObject, display the name as surnameFirst.
3. forceNativeOrdering: no matter what the values are in nameOrderLocales or in the NameObject, display the name with the same ordering as the native locale.
4. surnameFirstAllCaps: display the surname and surname2 fields in all caps **if** not using native order. Thus where the foreign name ordering is surnameFirst, the name {given=Shinzo, surname=Abe} would display as “ABE Shinzo”.

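As an illustration of option 4, the following minimal Python sketch upper-cases the surname fields when the name is not displayed in native order. The function name `apply_surname_first_all_caps`, the dictionary representation of a name, and the `using_native_order` flag are assumptions made for this example; they are not part of CLDR or of any particular API. It also uses Python's locale-independent `str.upper()`, which may not be appropriate for every language.

```python
def apply_surname_first_all_caps(name: dict, using_native_order: bool) -> dict:
    """Return a copy of the name with surname fields upper-cased for foreign order."""
    if using_native_order:
        # Native ordering: leave the name untouched.
        return dict(name)
    caps_fields = {"surname", "surname2"}
    return {
        field: value.upper() if field in caps_fields else value
        for field, value in name.items()
    }


name = {"given": "Shinzo", "surname": "Abe"}
foreign = apply_surname_first_all_caps(name, using_native_order=False)
# Hypothetical surnameFirst pattern "{surname} {given}":
display = f"{foreign['surname']} {foreign['given']}"
print(display)  # ABE Shinzo
```
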
### Person Name Formatting Overview

Logically, the model used for applying the CLDR data is the following:

![diagram showing relationship of components involved in person name formatting](images/personNamesFormatModel.png)

Conceptually, CLDR person name formatting depends on data supplied by a PersonName Data Interface. That could be a very thin interface that simply accesses a database record, or it could be a more sophisticated interface that can modify the raw data before presenting it to be formatted. For example, based on the formatting locale a PersonName data interface could transliterate names that are in another script, or supply equivalent titles in different languages.

The specification below will talk about a “PersonName object” as an entity that is logically accessed via such an interface. If multiple formatted names are needed, such as in different scripts or with alternate names, or pronunciations (e.g., kana), the presumption is that those are logically separate PersonName objects. See [[Person Name Object](#person-name-object)].

The following summarizes the name data supplied via the PersonName Data Interface:

* Name data is composed of one or more name parts, which are categorized in this standard as
    * _title_ - a string that represents one or more honorifics or titles, such as “Mr.”, or “Herr Doctor”.
    * _given_ - usually a name given to someone that is not passed to a person by way of parentage
    * _given2_ - name or names that may appear between the first given name string and the surname. In the West, this may be a middle name, in Slavic regions it may be a patronymic name, and in parts of the Middle East, it may be the _nasab (نسب)_ or series of patronymics.
    * _surname_ - usually the family name passed to a person that indicates their family, tribe, or community. In most Western languages, this is known as the last name.
    * _surname2_ - in some cultures, both parents’ surnames are used and need to be handled separately for formatting in different contexts.
    * _generation_ - a string that represents a generation marker, such as “Jr.” or “III”.
    * _credentials_ - a string that represents one or more credentials or accreditations, such as “M.D.”, or “MBA”.
    * _See the section on [[Fields](#fields)] for more details._
* Name data may have additional attributes that this specification accommodates.
    * _-informal_ - A name may have a formal and an informal presentation form, for example “Bob” vs “Robert” or “Са́ша” vs “Алекса́ндра”. This is accomplished by using the simple construct _given-informal_.
    * _-prefix_ and _-core_ - In some languages the surname may have a prefix that needs to be treated differently, for example “van den Berg”. The data can refer to “van den” as _surname-prefix_ and “Berg” with _surname-core_ and the PersonNames formatters will format them correctly in Dutch and many other languages.
    * _See the section on [[Modifiers](#modifiers)] for more details._

To format a name correctly, the correct context needs to be known. The context is composed of:

* **The formatting locale.** This is used to choose the primary set of patterns to format name data.
* **The name locale.** If the name data comes from a locale different from the formatting locale, it may need to be handled differently. If the name locale is not known, an inferred name locale is derived from the information in the name and the formatting locale.
* **Input parameters.**
    * **_order_** - indicates whether the given name comes first or the surname. This is normally specified in the CLDR data for the locale. This feature is also used for the sorting format.
    * **_length_** - used to select patterns for common short, medium, and long formatted names.
    * **_usage_** - this is used to select the correct pattern to format a name when a program is _addressing_ or talking to a person or it is _referring_ to or talking about another person.
    * **_formality_** - This is used to select the formal or informal formatting of a name.
    * _See [[Person Name Attributes](#person-name-attributes)] for more details._

### Example Usage

As an example, consider a person’s name that may contain:

| `title`  | `given`  | `given2` | `surname` | `credentials` |
| -------- | -------- | -------- | --------- | --------      |
|          | Robin    | Finley   | Wang      | Ph.D.         |

If the selected personName data has the following formatting pattern:

> `{title} {given} {given2-initial} {surname}, {credentials}`

Then the output is:

> Robin F. Wang, Ph.D.

The _title_ field is empty, so both it and the space that follows it in the formatting pattern are omitted from the output, the _given2_ field is formatted as an initial, and a comma is placed before the _credentials_.

Sections below specify the precise manner in which a pattern is selected, and how the pattern is modified for missing fields.

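The example above can be sketched in Python. This is a simplified illustration rather than the normative algorithm, which is given in [Formatting Process](#formatting-process). The function name `format_simple`, the rule of dropping the literal text that follows an empty field, and the hard-coded “first letter plus period” initial are assumptions made for this sketch only.

```python
import re

# Matches one "{field}" or "{field-modifier}" placeholder.
FIELD = re.compile(r"\{([^{}]+)\}")


def format_simple(pattern: str, name: dict) -> str:
    """Fill a pattern, dropping empty fields together with the literal after them."""
    # re.split with a capturing group yields:
    # [literal, field, literal, field, ..., literal]
    parts = FIELD.split(pattern)
    out = parts[0]
    for i in range(1, len(parts), 2):
        field_id, literal = parts[i], parts[i + 1]
        base, _, modifier = field_id.partition("-")
        value = name.get(base, "")
        if value and modifier == "initial":
            value = value[0] + "."  # assumed English-style initial
        if value:
            out += value + literal
        # An empty field contributes neither its value nor the following literal.
    return out


pattern = "{title} {given} {given2-initial} {surname}, {credentials}"
name = {"given": "Robin", "given2": "Finley", "surname": "Wang", "credentials": "Ph.D."}
print(format_simple(pattern, name))  # Robin F. Wang, Ph.D.
```

This sketch does not clean up a literal left dangling when the last field of a pattern is empty; the full process described later in this document covers such cases.
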
## XML Structure

Person name formatting data is stored as LDML with schema defined as follows. Each element has a brief description of the usage, but the exact algorithms for using these elements are provided in [Formatting Process](#formatting-process).

### personNames Element

```xml
<!ELEMENT personNames ( nameOrderLocales*, parameterDefault*, nativeSpaceReplacement*, foreignSpaceReplacement*, initialPattern*, personName*, sampleName* ) >
```

The LDML top-level `<personNames>` element contains information regarding the formatting of person names, and the formatting of person names in specific contexts for a specific locale.

### personName Element

The `<personName>` element contains the format patterns, or `<namePattern>` elements, for a specific context.

The `<namePattern>` syntax is described in [[namePattern Syntax](#namepattern-syntax)]; how a pattern is selected and applied is described in [[Formatting Process](#formatting-process)].

```xml
<!ELEMENT personName ( namePattern+ ) >
<!ATTLIST personName order NMTOKEN #IMPLIED >
```

* `NMTOKEN` is one of `( surnameFirst | givenFirst | sorting )`

```xml
<!ATTLIST personName length NMTOKEN #IMPLIED >
```

* `NMTOKEN` is one of `( long | medium | short )`

```xml
<!ATTLIST personName usage NMTOKEN #IMPLIED >
```

* `NMTOKEN` is one of `( addressing | referring | monogram )`

```xml
<!ATTLIST personName formality NMTOKEN #IMPLIED >
```

* `NMTOKEN` is one of `( formal | informal )`

The `<personName>` element has attributes of `order`, `length`, `usage`, and `formality`, and contains one or more `<namePattern>` elements.

A missing attribute matches all valid values for that attribute. For example, if `formality=...` is missing, it is equivalent to multiple lines, one for each possible `formality` attribute.

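The “missing attribute matches all valid values” rule can be pictured as an expansion step. The following Python sketch is illustrative only: the function name `expand` and the dictionary representation of a `<personName>` element's attributes are assumptions, while the attribute values are the ones listed above.

```python
from itertools import product

# Valid values for each personName attribute, as listed in the DTD notes above.
VALUES = {
    "order": ("givenFirst", "surnameFirst", "sorting"),
    "length": ("long", "medium", "short"),
    "usage": ("addressing", "referring", "monogram"),
    "formality": ("formal", "informal"),
}


def expand(attrs: dict) -> list:
    """List every fully specified attribute combination that an element matches."""
    choices = [(attrs[key],) if key in attrs else VALUES[key] for key in VALUES]
    return [dict(zip(VALUES, combo)) for combo in product(*choices)]


# formality is missing, so the element matches both formal and informal.
matches = expand({"order": "givenFirst", "length": "long", "usage": "referring"})
for match in matches:
    print(match["formality"])
```
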
```xml
<!ELEMENT namePattern ( #PCDATA ) >
```

A `namePattern` contains a list of PersonName fields enclosed in curly braces, separated by literals, such as:

> `<namePattern>{surname}, {given} {given2}</namePattern>`

which produces output like _“Smith, Robert James”_. See [[namePattern Syntax](#namepattern-syntax)] for more details.

### nameOrderLocales Element

The `<nameOrderLocales>` element is optional, and contains information about selecting patterns based on the locale of a passed-in PersonName object to determine the order of elements in a formatted name. For more information see [[NameOrder](#derive-the-name-order)]. It has a structure as follows:

```xml
<!ELEMENT nameOrderLocales ( #PCDATA ) >
<!ATTLIST nameOrderLocales order ( givenFirst | surnameFirst ) #REQUIRED >
```

* `#PCDATA` is a space-delimited list of one or more [unicode_locale_id](tr35.md#unicode_locale_id)s. Normally each locale is limited to language, script, and region. The _und_ locale ID may only occur once, either in _surnameFirst_ or _givenFirst_, but not both, and matches all base locales not explicitly listed.

An example from English may look like the following:

> `<nameOrderLocales order="givenFirst">und en</nameOrderLocales>`<br/>
> `<nameOrderLocales order="surnameFirst">ko vi yue zh</nameOrderLocales>`

This would tell the formatting code, when handling person name data from an English locale, to use patterns with the `givenFirst` order attribute for all data except name data from Korean, Vietnamese, Cantonese, and Chinese locales, where the `surnameFirst` patterns should be used.

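The lookup described above can be sketched as follows. This is a simplified illustration that matches on the bare language code only; the complete procedure is given in [Derive the name order](#derive-the-name-order). The function name `derive_order` and the dictionary holding the two lists are assumptions for this example.

```python
def derive_order(name_language: str, order_locales: dict) -> str:
    """Pick an order for a name language; 'und' matches anything not listed."""
    fallback = None
    for order, locales in order_locales.items():
        if name_language in locales:
            return order  # an explicit listing wins
        if "und" in locales:
            fallback = order  # remember where 'und' was listed
    return fallback


# The English example from above, with each #PCDATA list split on spaces.
english = {
    "givenFirst": "und en".split(),
    "surnameFirst": "ko vi yue zh".split(),
}

print(derive_order("ko", english))  # surnameFirst
print(derive_order("fr", english))  # givenFirst, via the 'und' entry
```
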
### parameterDefault Element

```xml
<!ELEMENT parameterDefault ( #PCDATA ) >
<!ATTLIST parameterDefault parameter (length | formality) #REQUIRED >
```

Many clients of the person-names functionality don’t really care about formal versus informal; they just want whatever the “normal” formality level is for the user’s language. The same goes for the default length.

This parameter provides that information, so that APIs can allow users to use default values for the formality and length. The exact form that this takes depends on the API conventions, of course.

### foreignSpaceReplacement Element

The `<foreignSpaceReplacement>` element is used to specify how spaces should be handled when the name language is **different from** the formatting language. It is used in languages that don't normally require spaces between words. For example, Japanese and Chinese have the value of a middle dot (‘·’ U+00B7 MIDDLE DOT or ‘・’ U+30FB KATAKANA MIDDLE DOT), so that it is used between words in a foreign name; most other languages have the value of SPACE.

```xml
<!ELEMENT foreignSpaceReplacement ( #PCDATA ) >
<!ATTLIST foreignSpaceReplacement xml:space preserve #REQUIRED >
```

* `xml:space` must be set to `'preserve'` so that actual spaces in the pattern are preserved. See [W3C XML White Space Handling](https://www.w3.org/TR/xml/#sec-white-space).
* The `#PCDATA` is the character sequence used to replace spaces when postprocessing a pattern.

### nativeSpaceReplacement Element

The `<nativeSpaceReplacement>` element is used to specify how spaces should be handled when the name language is **the same as** the formatting language. It is used in languages that don't normally require spaces between words, but may use spaces within names. For example, Japanese and Chinese have the value of an empty string between words in a native name; most other languages have the value of SPACE.

```xml
<!ELEMENT nativeSpaceReplacement ( #PCDATA ) >
<!ATTLIST nativeSpaceReplacement xml:space preserve #REQUIRED >
```

* `xml:space` must be set to `'preserve'` so that actual spaces in the pattern are preserved. See [W3C XML White Space Handling](https://www.w3.org/TR/xml/#sec-white-space).
* The `#PCDATA` is the character sequence used to replace spaces when postprocessing a pattern.

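The effect of the two elements can be sketched in Python as a postprocessing step. This is a simplification that compares language codes only; the rules for deciding whether a name counts as foreign are given in [Setting the spaceReplacement](#setting-the-spacereplacement). The function name `replace_spaces` and its parameters are assumptions for this example, and the replacement values are the Japanese ones mentioned above.

```python
def replace_spaces(formatted: str, name_language: str, formatting_language: str,
                   native: str, foreign: str) -> str:
    """Replace each space with the native or foreign replacement string."""
    same_language = name_language == formatting_language
    replacement = native if same_language else foreign
    return formatted.replace(" ", replacement)


# Japanese formatting locale: empty string for native names,
# U+30FB KATAKANA MIDDLE DOT for foreign names.
JA_NATIVE, JA_FOREIGN = "", "\u30fb"

print(replace_spaces("宮崎 駿", "ja", "ja", JA_NATIVE, JA_FOREIGN))
print(replace_spaces("Albert Einstein", "en", "ja", JA_NATIVE, JA_FOREIGN))
```
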
### initialPattern Element

The `<initialPattern>` element is used to specify how to format initials of name parts.

**_initial_** is a pattern used to display a single initial in the locale, while **_initialSequence_** is a pattern used to “glue” together multiple initials for multiword fields, for example with the given name “Mary Beth” in English.

#### Syntax

```xml
<!ELEMENT initialPattern ( #PCDATA ) >
<!ATTLIST initialPattern type ( initial | initialSequence) #REQUIRED >
```

The `type="initial"` is used to specify the pattern for how single initials are created, for example “Wolcott” => “W.” would have an entry of

> `<initialPattern type="initial">{0}.</initialPattern>`

`type="initialSequence"` is used to specify how a series of initials should appear, for example “Wolcott Janus” => “W. J.”, with spaces between each initial, would have a specifier of

> `<initialPattern type="initialSequence">{0} {1}</initialPattern>`

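The way the two patterns combine can be sketched as follows. This is an illustration under stated assumptions, not the normative procedure (see [Derive initials](#derive-initials)): the function name `make_initials` is hypothetical, words are split on whitespace, and the first code point of each word is taken as its initial, which is not sufficient for every script.

```python
def make_initials(value: str, initial: str = "{0}.",
                  initial_sequence: str = "{0} {1}") -> str:
    """Build initials for a possibly multiword field value."""
    words = value.split()
    if not words:
        return ""
    # Apply the 'initial' pattern to the first character of each word.
    initials = [initial.replace("{0}", word[0]) for word in words]
    # Glue the initials together pairwise with the 'initialSequence' pattern.
    result = initials[0]
    for following in initials[1:]:
        result = initial_sequence.replace("{0}", result).replace("{1}", following)
    return result


print(make_initials("Wolcott"))        # W.
print(make_initials("Wolcott Janus"))  # W. J.
```
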
## Person Name Object

The information that is to be formatted logically consists of a data object containing a number of fields. This data object is a construct for the purpose of formatting, and doesn’t represent the source of the name data. That is, the original source may contain more information. The PersonName object is merely a logical ‘transport’ of information to formatting; it may in actuality consist of, for example, an API that fetches fields from a database.

Note that an application might have more than one set of name data for a given person, such as data for both a legal name and a nickname or preferred name. Or the source data may contain two whole sets of name data for a person from an Eastern Slavic region, one in Cyrillic characters and one in Latin characters. Or it might contain phonetic data for a name (commonly used in Japan). The additional application-specific information in person’s names is out of scope for the CLDR Person Name formatting data. Thus a calling application may produce more than one PersonName object to format depending on the purpose.

For illustration, the following is a sample PersonName object.

| Field            | Value        | Comment                                |
| ---------------- | ------------ | -------------------------------------- |
| `title`          | “Dr.”        |                                        |
| `given`          | “William”    |                                        |
| `given-informal` | “Bill”       | example inclusion of "nickname"        |
| `given2`         | “Torval”     |                                        |
| `surname`        | “Brown”      |                                        |
| `nameLocale`     | “und-US”     | this is just for illustration          |
| `preferredOrder` | “givenFirst” | values are givenFirst and surnameFirst |

A PersonName object is logically composed of the fields above plus other possible variations. See [[Fields](#fields)]. There must be at least one field present: either a `given` or `surname` field. Other fields are optional, and some of them can be constructed from other fields if necessary.

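One possible in-memory shape for such an object is sketched below in Python. The class name `SimplePersonName`, its `get` method, and the fallback from `given-informal` to `given` when no informal value is supplied are assumptions for this illustration; the specification only requires that the fields be logically accessible and that a `given` or `surname` field be present.

```python
class SimplePersonName:
    """Hypothetical PersonName object backed by a plain dictionary."""

    def __init__(self, fields, name_locale=None, preferred_order=None):
        # At least one of 'given' or 'surname' must be present.
        if not (fields.get("given") or fields.get("surname")):
            raise ValueError("a PersonName needs a given or surname field")
        self._fields = dict(fields)
        self.name_locale = name_locale
        self.preferred_order = preferred_order

    def get(self, field, informal=False):
        """Return a field value, preferring the '-informal' variant on request."""
        if informal:
            value = self._fields.get(field + "-informal")
            if value:
                return value
        return self._fields.get(field)


bill = SimplePersonName(
    {"title": "Dr.", "given": "William", "given-informal": "Bill",
     "given2": "Torval", "surname": "Brown"},
    name_locale="und-US",
    preferred_order="givenFirst",
)
print(bill.get("given"), bill.get("given", informal=True))
```
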
A modifier is supplied, _-informal_, which can be used to indicate which data element to choose when formatting informal cases which might include nicknames or preferred names. For more details, see section on [_[Modifiers](#modifiers)_] in [namePattern Syntax](#namepattern-syntax) below.

## Person Name Attributes

A person name pattern may have any of four attributes: order, length, usage, and formality. LDML specifies that all the values for these attributes are unique. For example, because length=long is valid, usage=long cannot also be valid. That allows the pattern labels to be simple, because the attribute names can be skipped. That is,

> `{order=givenFirst, length=long, usage=referring, formality=formal}`

can be abbreviated without loss of information as:

> _givenFirst-long-referring-formal._

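Because every value belongs to exactly one attribute, an abbreviated label can be expanded mechanically. The Python sketch below illustrates this; the names `ATTRIBUTE_VALUES`, `VALUE_TO_ATTRIBUTE`, and `parse_label` are assumptions for this example, while the values themselves are those defined for the `<personName>` element.

```python
ATTRIBUTE_VALUES = {
    "order": ("givenFirst", "surnameFirst", "sorting"),
    "length": ("long", "medium", "short"),
    "usage": ("addressing", "referring", "monogram"),
    "formality": ("formal", "informal"),
}

# Reverse index: works only because no value is shared between attributes.
VALUE_TO_ATTRIBUTE = {
    value: attribute
    for attribute, values in ATTRIBUTE_VALUES.items()
    for value in values
}


def parse_label(label: str) -> dict:
    """Expand an abbreviated label such as 'givenFirst-long-referring-formal'."""
    attributes = {}
    for value in label.split("-"):
        if value not in VALUE_TO_ATTRIBUTE:
            raise ValueError(f"unknown attribute value: {value!r}")
        attributes[VALUE_TO_ATTRIBUTE[value]] = value
    return attributes


print(parse_label("givenFirst-long-referring-formal"))
```
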
Each of these attributes is described below using sample PersonName objects as examples.

### order

The order attribute is used for patterns with different orders of fields. The order=sorting patterns are chosen based on input parameters, while the choice between givenFirst and surnameFirst is based on features of the PersonName object to be formatted and the `nameOrderLocales` element values.

| Parameter      | Description                                  |
| -------------- | -------------------------------------------- |
| `givenFirst`   | The given name precedes the surname.         |
| `surnameFirst` | The surname precedes the given name.         |
| `sorting`      | Used to format names for a sorted list.<br/>example: “Brown, William”  [medium, informal] |

For example, when the display language is Japanese, it is customary to use _surnameFirst_ for names of people from Japan and Hungary, but use _givenFirst_ for names of people from the United States and France. Although the English pattern for sorting is distinct from the other patterns (except for unusual names), that is not necessarily the case in other languages.

### length

The `length` attribute specifies the relative length of a formatted name depending on context. For example, a `long` formal name in English might include title, given, given2, surname plus generation and credentials; whereas a `short` informal name may only be the given name.

Note that the formats may be the same for different lengths depending on the formality, usage, and cultural conventions for the locale. For example, medium and short may be the same for a particular context.

| Parameter | Description |
| --------- | ----------- |
| `long`    | A `long` length would usually include all parts needed for a legal name or identification.<br/>Example: `usage="referring", formality="formal"`<br/>_“Mr. Robert John Smith, PhD”_ |
| `medium`  | A `medium` length is between long and short.<br/>Example: `usage="referring", formality="formal"`<br/>_“Robert Smith”_ |
| `short`   | A `short` length uses a minimum set of names.<br/>Example: `usage="referring", formality="formal"`<br/>_“Mr. Smith”_ |

### usage

The usage indicates if the formatted name is being used to address someone, refer to someone, or present their name in an abbreviated form.

The pattern for `usage="referring"` may be the same as the pattern for `usage="addressing"`.

| Parameter    | Description |
| ------------ | ----------- |
| `addressing` | Used when speaking “to” a person, or “vocative” case. This may also have an effect on the formality.<br/>example: “Welcome, **Robert**” |
| `referring`  | Used when speaking “about” a person, or “nominative” case.<br/>example: “**Robert Smith** joined your group” |
| `monogram`   | The `monogram` usage is for a specific abbreviated form for computer UI.<br/>Example: a monogram for Robert James Smith may be **RS** or **RJS**.|

Slavic languages provide a good example of `addressing` vs `referring`. An example _uk-Cyrl_ PersonName object:

| Field            | Value        | Comment                         |
| ---------------- | ------------ | ------------------------------- |
| `title`          | “г-н”        | “Mr.”                           |
| `given`          | “Иван”       | “Ivan”                          |
| `given2`         | “Петрович”   | “Petrovich”                     |
| `surname`        | “Васильев”   | “Vasiliev”                      |

In Slavic languages, when _`addressing`_ a person (with `length="long"`), it might be

* г-н Иван Петрович Васильев `// "Mr Ivan Petrovich Vasiliev"`

And when _`referring`_ to a person, it might place the surname first:

* Васильев Иван Петрович `// "Vasiliev Ivan Petrovich"`

The `monogram` usage is for very short abbreviated names, such as might be found in online messaging text avatars or other annotations. Ideally, a `monogram` format should result in something that could fit in an em square. Some emoji provide examples of this: 🅰️ 🆎 🆘

When used with `length`, for many alphabetic locales a `monogram` would resolve to one, two, or three characters for short, medium, and long respectively. But that may vary depending on the usage in a locale.

### formality

The `formality` indicates the formality of usage. A name on a badge for an informal gathering may be much different from an award announcement at the Nobel Prize Ceremonies.

Note that the formats may be the same for different formality scenarios depending on the length, usage, and cultural conventions for the locale. For example, short formal and short informal may both be just the given name.

| Parameter  | Description |
| ---------- | ----------- |
| `formal`   | A more formal name for the individual. The composition depends upon the language. For example, a particular locale might include the title, generation, credentials and a full middle name (given2) in the long form.<br/><br/>`length="medium", formality="formal"`<br/>“Robert J. Smith” |
| `informal` | A less formal name for the individual. The composition depends upon the language. For example, a language might exclude the title, credentials and given2 (middle) name. Depending on the length, it may also exclude the surname. The formatting algorithm should choose any passed-in name data that has an _informal_ attribute, if available.<br/><br/>`length="medium", formality="informal"`<br/>“Bob Smith” |

437## namePattern Syntax
438
439A _namePattern_  is composed of a sequence of field IDs, each enclosed in curly braces, and separated by zero or more literal characters (eg, space or comma + space). An Extended Backus Normal Form (EBNF) is used to describe the namePattern format for a specific set of attributes. It has the following structure. This is the `( #PCDATA )` reference in the element specification above.
440
|              | EBNF                          | Comments |
| ------------ | ----------------------------- | -------- |
| namePattern  | = literal?<br/><span style="white-space:nowrap">( modField  literal? )+;</span> | Two literals cannot be adjacent |
| modField     | <span style="white-space:nowrap">= '{' field modifierList? '}';</span> | A name field, optionally modified |
| field        | = 'title'<br/>\| 'given'<br/>\| 'given2'<br/>\| 'surname'<br/>\| 'surname2'<br/>\|  'generation'<br/>\| 'credentials' ; | See [Fields](#fields) |
| modifierList | = '-informal'?<br/><span style="white-space:nowrap">( '-allCaps' \| '-initialCap' )?</span><br/><span style="white-space:nowrap">( '-initial'  \| '-monogram' )?</span><br/><span style="white-space:nowrap">( '-prefix' \| '-core' )? ;</span> | Optional modifiers that can be applied to name parts, see [Modifiers](#modifiers). Note that some modifiers are exclusive: only `prefix` or `core`, only `initial` or `monogram`, only `allCaps` or `initialCap`. |
| literal      | = codepoint+ ; | One or more Unicode codepoints. |
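
As an illustration of the grammar above, the following Python sketch splits a namePattern into literals and modified fields. The function name `parse_name_pattern` and the tuple layout are assumptions made for this example and are not part of LDML. The sketch checks field names but does not enforce the modifier ordering given by the `modifierList` rule.

```python
import re

# Field IDs taken from the 'field' rule in the table above.
FIELDS = {"title", "given", "given2", "surname", "surname2",
          "generation", "credentials"}

_MOD_FIELD = re.compile(r"\{([^{}]*)\}")


def parse_name_pattern(pattern):
    """Return a list of ("literal", text) and ("field", name, modifiers) tuples."""
    parts = []
    pos = 0
    for match in _MOD_FIELD.finditer(pattern):
        if match.start() > pos:
            # Text between two fields is a literal.
            parts.append(("literal", pattern[pos:match.start()]))
        name, *modifiers = match.group(1).split("-")
        if name not in FIELDS:
            raise ValueError(f"unknown field: {name!r}")
        parts.append(("field", name, tuple(modifiers)))
        pos = match.end()
    if pos < len(pattern):
        parts.append(("literal", pattern[pos:]))
    if not any(part[0] == "field" for part in parts):
        raise ValueError("a namePattern needs at least one field")
    return parts
```

For instance, `parse_name_pattern("{surname-core}, {given}")` yields a `surname` field with the `core` modifier, the literal `", "`, and an unmodified `given` field.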
448
449### Fields
450
451The Person Name formatting data assumes that the name data to be formatted consists of the fields in the table below. All of the fields may contain multiple words. Field IDs are lowercase ASCII alphanumeric, and start with an alphabetic character.
452
453When determining how a full name is to be placed into name fields, the data to be formatted should be organized functionally. That is, if a name part is on the dividing line between `given2` and `given`, the key feature is whether it would always occur with the rest of the given name. For example, in _“Mary Jean Smith”_, if _“Mary”_ never occurs without the _“Jean”_, then the given name should be _“Mary Jean”_. If _“Smith”_ never occurs without the _“Jean”_, the `surname` should be _“Jean Smith”_. Otherwise, _“Jean”_ would be the `given2` field.
454
For example, a patronymic would be treated as a `given2` name in most Slavic languages.
456
457In some cultures, two surnames are used to indicate the paternal and maternal family names or generational names indicating father, grandfather. The `surname2` field is used to indicate this. The CLDR PersonName formatting data assumes that if a PersonName object to be formatted does not have two surnames, then the `surname2` field is not populated. (That is, no pattern should have a `surname2` field without a surname field.) Order of fields in a pattern can vary arbitrarily by locale.
458
459In most cultures, there is a concept of nickname or preferred name, which is used in informal settings or sometimes to represent a “public” or “stage name”. The nickname or preferred name may be submitted as a separate PersonName object to be formatted, or included with a modifier such as `given-informal`.
460
461| Field      | Description<br/>Note: The values for each are as supplied by the PersonName object, via the PersonName data interface. |
462| ---------- | ----------- |
| `title`   | A title or honorific qualifier.<br/>Example: ‘Ms.’, ‘Mr.’, ‘Dr’, ‘President’<br/><br/>Note that CLDR PersonName formats data does not define regional or locale-specific lists of titles or honorifics such as “Mr”, “Ms”, “Mx”, “Prof”, etc. |
464| `given`    | The “given” name. Can be multiple words such as “Mary Ann”.<br/>Examples:  “Janus”, “Mary Jean”, or “Jean-Louis”|
| `given2`   | Additional given name or names or middle name, usually name(s) written between the given and surname. Can be multiple words. In some references, also known as a “second” or “additional” given name or patronymic. This field is separate from the “given” field because it is often optional in various presentation forms.<br/>Examples:  “Horatio Wallace” as in<br/>`{ given: "Janus", `**`given2: "Horatio Wallace"`**`, surname: "Young" }`<br/><br/>“S.” as in “Harry S. Truman”. Yes, his full middle name was legally just “S.”.|
466| `surname`  | The “family name”. Can be more than one word.<br/><br/>Example: “van Gogh” as in<br/>`{ given: "Vincent", given2: "Willem", `**`surname: "van Gogh"`**` }`<br/><br/>Other examples: “Heathcote-Drummond-Willoughby” as in “William Emanuel Heathcote-Drummond-Willoughby III”|
467| `surname2` | Secondary surname (used in some cultures), such as second or maternal surname in Mexico and Spain. This field is separate from the “surname” field because it is often optional in various presentation forms, and is considered a separate distinct name in some cultures.<br/><br/>Example: “Barrientos” in “Diego Rivera Barrientos”;<br/>`{ given: "Diego", surname: "Rivera", `**`surname2: "Barrientos"`**` }`<br/><br/>Example: if "Mary Jane Smith" moves to Spain the new name may be<br/>`{ given: "Mary", given2: "Jane", surname: "Smith", `**`surname2: "Jones"`**` }`|
468| `credentials`   | A credential or accreditation qualifier.<br/>Example: “PhD”, “MBA”<br/><br/>Example: “Salvatore Jarvis MBA”<br/>`{ given: "Salvatore", given2: "Blinken", surname: "Jarvis", `**`credentials: "MBA"`**` }`<br/><br/>An alternate PersonName object may be presented for formatting using the “stage” name from the application’s data:<br/>`{ given: "Salvatore", given-informal: "Salvatore", given2: "", surname: "Jarvis", `**`credentials: "MBA"`**` }` |
469| `generation`   | A generation qualifier.<br/>Example: “III”, “Jr.”<br/><br/>Example: “Sonny Jarvis Jr.”<br/>`{ given: "Salvatore", given2: "Blinken", surname: "Jarvis", `**`generation: "Jr."`**` }` |
470
471Some other examples:
472
* British name: _John Ronald Reuel Tolkien_: the `given` name is "John", the `given2` name is "Ronald Reuel", and the `surname` is "Tolkien".
474* Dutch name: _Anneliese Louise van der Pol_: `given` name: "Anneliese", `given2` name: "Louise", `surname`: "van der Pol"
475    * Also surname-prefix: “van der”, surname-core: “Pol” — see below.
476* French name: “Jean-Louis Trintignant” would _not_ be Jean (`given`) Louis (`given2`) Trintignant (`surname`), since “Louis” wouldn’t be discarded when formatting. Instead it would be Jean-Louis (`given`) Trintignant (`surname`)
477
478Note: If the legal name, stage name, etc. are substantially different, then that information can be logically in a separate PersonName object. That is, it is up to the implementation to maintain any distinctions that are important to it: CLDR PersonName formats is focusing on formatting a PersonName object that is given to it.
479
480`surname2` would only be asked for in certain locales, and where it is considered a separate, divisible name, such as in Mexico or Spain. For instance, in Mexico, the first and second surname are used for the legal name and in formal settings, and sometimes only the first surname is used in familiar or informal contexts.
481
482* Heathcote-Drummond is a single surname and would not be `{surname}-{surname2}` because we would never discard part of the name when formatting.
* Spanish name: “Jose Luis Garcia Barrientos”: the `given` name is “Jose”, the `given2` name is “Luis”, the `surname` is “Garcia”, and the `surname2` is “Barrientos”.
484
485How names get placed into fields to be formatted is beyond the scope of CLDR PersonName formats; this document just lays out the assumptions the formatting code makes when formatting the names.
486
487### Modifiers
488
489Each field in a pattern can have one or more modifiers. The modifiers can be appended to any field name, such as `{given-initial}` for the first grapheme of the given name. If more than one modifier is applied, they must be structured as in the EBNF.
490
491The modifiers transform the input data as described in the following table:
492
493| Modifier   | Description |
494| ---------- | ----------- |
495| informal   | Requests an informal version of the name if available. For example, {given} might be “Thomas”, and {given-informal} might be “Tom”. If there is no informal version, then the normal one is returned. An informal version should not be generated, because they vary too much: Beth, Betty, Betsy, Bette, Liz, … |
| prefix     | Return the “prefix” name, or the “tussenvoegsel” if present. For example, “van der Poel” becomes “van der”, “bint Fadi” becomes “bint”, “di Santis” becomes “di”. Note that what constitutes the prefix is language- and locale-sensitive. It may be passed in as part of the PersonName object, similar to the _“-informal”_ modifier, e.g. as _“surname-prefix”_.<br/><br/>The implementation of this modifier depends on the PersonName object. CLDR does not currently provide support for automatic identification of tussenvoegsels, but may in the future.<br/><br/>If the resulting _“-prefix”_ value is empty, it defaults to an empty string.<br/><br/>An example sorting pattern for “Johannes van den Berg” may be<br/>{surname-core}, {given} {given2} {surname-prefix}<br/><br/>Only the _“-prefix”_ or the _“-core”_ modifier may be used, but not both. They are mutually exclusive. |
497| core       | Return the “core” name, removing any tussenvoegsel. For example, “van der Poel” becomes “Poel”, “bint Fadi” becomes “Fadi”, “di Santis” becomes “Santis”. Note that what constitutes the core is language- and locale-sensitive.<br/><br/>The implementation of this modifier depends on the PersonName object. CLDR does not currently provide support for identification of tussenvoegsel, but may in the future.<br/><br/>If the resulting _“-core”_ value is empty, it defaults to the field it modifies. E.g., if _“surname-core”_ is empty in the PersonName object to be formatted, it will default to the _“surname”_ field.<br/><br/>Vice-versa, if the _surname_ field is empty, the formatter will attempt to use _surname-prefix_ and _surname-core_, if present, to format the name.<br/><br/>Only the _“-prefix”_ or the _“-core”_ modifier may be used, but not both. They are mutually exclusive. |
| allCaps    | Requests the element in all caps, which is desired in some contexts. For example, a new guideline in Japan is that for the Latin representation of Japanese names, the family name comes first and is presented in all capitals. This would be represented as<br/>“{surname-allCaps} {given}”<br/><br/>Hayao Miyazaki (宮崎 駿) would be represented in Latin characters in Japan (ja-Latn-JP) as _“MIYAZAKI Hayao”_<br/><br/>_The default implementation uses the default Unicode uppercase algorithm; if the PersonName object being formatted has a locale, and CLDR supports a locale-specific algorithm for that locale, then that algorithm is used. The PersonName object can override this, as detailed below._<br/><br/>Only the _“-allCaps”_ or the _“-initialCap”_ modifier may be used, but not both. They are mutually exclusive. |
| initialCap | Requests the element with the first grapheme capitalized, and remaining characters unchanged. This is used in cases where an element is usually in lower case but may need to be modified. For example in Dutch, the name<br/>{ title: “dhr.”, given: “Johannes”, surname: “van den Berg” },<br/>when addressed formally, would need to be “dhr. Van den Berg”. This would be represented as<br/>“{title} {surname-initialCap}”<br/><br/>Only the _“-allCaps”_ or the _“-initialCap”_ modifier may be used, but not both. They are mutually exclusive. |
500| initial    | Requests the initial grapheme cluster of each word in a field. The `initialPattern` patterns for the locale are used to create the format and layout for lists of initials. For example, if the initialPattern types are<br/>`<initialPattern type="initial">{0}.</initialPattern>`<br/>`<initialPattern type="initialSequence">{0} {1}</initialPattern>`<br/>then a name such as<br/>{ given: “John”, given2: “Ronald Reuel”, surname: “Tolkien” }<br/>could be represented as<br/>“{given-initial-allCaps} {given2-initial-allCaps} {surname}”<br/>and will format to “**J. R. R. Tolkien**”<br/><br/>_The default implementation uses the first grapheme cluster of each word for the value for the field; if the PersonName object has a locale, and CLDR supports a locale-specific grapheme cluster algorithm for that locale, then that algorithm is used. The PersonName object can override this, as detailed below._<br/><br/>Only the _“-initial”_ or the _“-monogram”_ modifier may be used, but not both. They are mutually exclusive. |
| monogram   | Requests the initial grapheme. Example: A name such as<br/>{ given: “Landon”, given2: “Bainard Crawford”, surname: “Johnson” }<br/>could be represented as<br/>“{given-monogram-allCaps}{given2-monogram-allCaps}{surname-monogram-allCaps}”<br/>or “**LBJ**”<br/><br/>_The default implementation uses the first grapheme cluster of the value for the field; if the PersonName object has a locale, and CLDR supports a locale-specific grapheme cluster algorithm for that locale, then that algorithm is used. The PersonName object can override this, as detailed below. The difference between monogram and initial is that monogram only returns one element, not one element per word._<br/><br/>Only the _“-initial”_ or the _“-monogram”_ modifier may be used, but not both. They are mutually exclusive. |
502| retain | This is needed in languages that preserve punctuation when forming initials. For example, normally the name {given=Anne-Marie} is converted into initials with {given-initialCaps} as “A. M.”. However, where a language preserves the hyphen, the pattern should use {given-initialCaps**-retain**} instead. In that case, the result is “A.-M.”. (The periods are added by the pattern-initialSequence.) |
503| genitive, vocative | Patterns can use these modifiers so that better results can be obtained for inflected languages. However, see the details below. |
504
505#### Grammatical Modifiers for Names
506
507The CLDR person name formatting does not itself support grammatical inflection.
508However, name sources (NameObject) can support inflections, either by having additional fields or by using an inflection engine that can handle personal name parts.
509
510In the current release, the focus is on supporting `referring` and `addressing` forms.
511Typically the `referring` forms will be in the most neutral (*nominative*) case, and the `addressing` forms will be in the *vocative* case.
512Some modifiers have been added to facilitate this, so that there can be patterns like: {given-vocative} {surname-vocative}.
513
514Notice that some **parts** of the formatted name may be in different grammatical cases, so the cases may not be consistent across the whole name.
515For example:
516
517| English Pattern | Examples | Latvian Pattern | Examples |
518| ---- | ---- | ---- | ---- |
519| {given} {surname} | John Smith | {given} {surname} | Kārlis Ozoliņš |
520| {title} {surname} | Mr Smith | {surname} {title} | Ozoliņa kungs |
521
522Notice that the `surname` in Latvian needs to change to the genitive case with that pattern:
523
524Ozoliņš ➡︎ **Ozoliņa**
525
526That is accomplished by changing the pattern to be {surname<b>-genitive</b>} {title}. In this case the {surname} should only be genitive if followed by the {title}.
527
528#### Future Modifiers
529
530Additional modifiers may be added in future versions of CLDR.
531
532Examples:
533
5341. For the initial of the surname **_“de Souza”_**, in a language that treats the “de” as a tussenvoegsel, the PersonName object can automatically recast `{surname-initial}` to:<br/>`{surname-prefix-initial}{surname-core-initial-allCaps} `to get “dS” instead of “d”.
2. If the locale expects a surname prefix to be sorted after a surname, then both `{surname-core}` and then `{surname-prefix}` would be used, as in<br/>`{surname-core}, {given} {given2} {surname-prefix}`
5363. Only the grammatical modifiers requested by translators for `referring` or `addressing` have been added as yet, but additional grammatical modifiers may be added in the future.
537
538## Formatting Process
539
540The patterns are in **personName** elements, which are themselves in a **personNames** container element. The following describes how the formatter's locale interacts with the personName's locale, how the name patterns are chosen, and how they are processed.
541
542The details of the XML structure behind the data referenced here are in [XML Structure](#xml-structure).
543
544The formatting process may be refined in the future. In particular, additional data may be added to allow further customization.
545
546The term **maximal likely locale** used below is the result of using the [Likely Subtags](tr35.md#Likely_Subtags) data to map from a locale to a full representation that includes the base language, script, and region.
547
548### Derive the name locale
549
550Construct the **name script** in the following way.
5511. Iterate through the characters of the surname, then through the given name.
552    1. Find the script of that character using the Script property.
553    2. If the script is not Common, Inherited, nor Unknown, return that script as the **name script**
5542. If nothing is found during the iteration, return Zzzz (Unknown Script)
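
The name script iteration above can be sketched in Python as follows. The Python standard library does not expose the Unicode Script property, so this example uses `SCRIPT_RANGES`, a hypothetical and heavily truncated excerpt of that data; a real implementation would consult the full Script property (for example through a Unicode library). Characters outside the table are skipped as if they were Common, Inherited, or Unknown, which is a simplification.

```python
# Hypothetical, truncated excerpt of Script property ranges (assumption:
# a real implementation would use the complete Unicode data instead).
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"), (0x0061, 0x007A, "Latn"),
    (0x00C0, 0x00D6, "Latn"), (0x00D8, 0x00F6, "Latn"),
    (0x00F8, 0x024F, "Latn"),
    (0x0391, 0x03A1, "Grek"), (0x03A3, 0x03C9, "Grek"),
    (0x0410, 0x044F, "Cyrl"),
    (0x3041, 0x3096, "Hira"),
    (0x30A1, 0x30FA, "Kana"),
    (0x4E00, 0x9FFF, "Hani"),
]


def name_script(surname, given):
    """Return the script of the first character with an explicit script."""
    # Surname characters are examined first, then the given name.
    for ch in surname + given:
        cp = ord(ch)
        for lo, hi, script in SCRIPT_RANGES:
            if lo <= cp <= hi:
                return script
    return "Zzzz"  # Unknown Script
```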
555
556Construct the **name base language** in the following way.
5571. If the PersonName object can provide a name locale, return its language.
5582. Otherwise, find the maximal likely locale for the name script and return its base language (first subtag).
559
560Construct the **name locale** in the following way:
5611. If the PersonName object can provide a name locale, return a locale formed from it by replacing its script by the name script.
5622. Otherwise, return the locale formed from the name base language plus name script.
563
564Construct the **name ordering locale** in the following way:
5651. If the PersonName object can provide a name locale, return it.
5662. Otherwise, return the maximal likely locale for “und-” + name script.
567
568### Derive the formatting locale
569
570Let the **full formatting locale** be the maximal likely locale for the formatter's locale. The **formatting base language** is the base language (first subtag) of the full formatting locale, and the **formatting script** is the script code of the full formatting locale.
571
572#### Switch the formatting locale if necessary
573
574A few script values represent a set of scripts, such as Jpan = {Hani, Kana, Hira}. Two script codes are said to _match_ when they are either identical, or one represents a set which contains the other, or they both represent sets which intersect. For example, Hani and Jpan match, because {Hani, Kana, Hira} contains Hani.
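
A minimal Python sketch of this matching rule follows. Only the `Jpan` set is taken from the text above; the names `SCRIPT_SETS` and `scripts_match` are assumptions for this example, and other set-valued script codes would need to be added to the table.

```python
# Script codes that stand for a set of scripts. Only Jpan is listed here,
# as given in the text; other set-valued codes are omitted in this sketch.
SCRIPT_SETS = {
    "Jpan": {"Hani", "Kana", "Hira"},
}


def _expand(script):
    """Return the set of scripts a code represents (itself if not a set)."""
    return SCRIPT_SETS.get(script, {script})


def scripts_match(a, b):
    """True if the codes are identical, one contains the other, or they intersect."""
    return a == b or bool(_expand(a) & _expand(b))
```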
575
576If the **name script** doesn't match the **formatting script**:
5771. If the name locale has name formatting data, then set the formatting locale to the name locale.
2. Otherwise, set the formatting locale to the maximal likely locale for the locale formed from und, plus the name script, plus the region of the name locale.
579
580For example, when a Hindi (Devanagari) formatter is called upon to format a name object that has the locale Ukrainian (Cyrillic):
581* If the name is written with Cyrillic letters, under the covers a Ukrainian (Cyrillic) formatter should be instantiated and used to format that name. 
582* If the name is written in Greek letters, then under the covers a Greek (Greek-script) formatter should be instantiated and used to format.
583
584To determine whether there is name formatting data for a locale, get the values for each of the following paths.
If at least one of them doesn’t inherit its value from root, then the locale has name formatting data.
586* //ldml/personNames/nameOrderLocales[@order="givenFirst"]
587* //ldml/personNames/nameOrderLocales[@order="surnameFirst"]
588
589### Derive the name order
590
591A PersonName object’s fields are used to derive an order, as follows:
592
5931. If the calling API requests sorting order, that is used.
5942. Otherwise, if the PersonName object to be formatted has a `preferredOrder` field, then return that field’s value
5953. Otherwise, use the nameOrderLocales elements to find the best match for the name locale, as follows.
596    1. For each locale L1 in the parent locale lookup chain* for the **name ordering locale**, do the following
        1. Create a locale L2 by replacing the language subtag by 'und'. (E.g., 'de_DE' ⇒ 'und_DE')
598        2. For each locale L in {L1, L2}, do the following
599             1. If there is a precise match among the givenFirst nameOrderLocales for L, then let the nameOrder be givenFirst, and stop.
600             2. Otherwise if there is a precise match among the surnameFirst nameOrderLocales for L, then let the nameOrder be surnameFirst, and stop.
601    2. Otherwise, let the nameOrder be givenFirst, and stop.
602
603\* For example, here is a parent locale lookup chain:
604
605    de_Latn_DE ⇒ de_Latn ⇒ de_DE ⇒ de ⇒ und
606
In other words, with the name locale of `de_Latn_DE` you'll check the givenFirst and surnameFirst resources for the following locales, in this order:

    de_Latn_DE, und_Latn_DE, de_Latn, und_Latn, de_DE, und_DE, de, und
610
611This process will always terminate, because there is always a und value in one of the two nameOrderLocales elements. Remember that the lookup chain requires use of the parentLocales elements: it is not just truncation.
612
613For example, the data for a particular locale might look like the following:
614
615```xml
616<nameOrderLocales order="surnameFirst">zh ja und-CN und-TW und-SG und-HK und-MO und-HU und-JP</nameOrderLocales>
617```
These nameOrderLocales will match any locale with a zh or ja [unicode_language_subtag](tr35.md#unicode_language_subtag) and any locale with a CN, TW, SG, HK, MO, HU, or JP [unicode_region_subtag](tr35.md#unicode_region_subtag).
619
620Here are some more examples. Note that if there is no order field or locale field in the PersonName object to be formatted, and the script of the PersonName data is different from that of the formatting locale, then the default result is givenFirst.
621
622| PersonName Object preferredOrder | PersonName Object Locale | Resulting Order |
623| -------------------------------- | ------------------------ | --------------- |
624| surnameFirst                     | ?                        | surnameFirst    |
625|                                  | zh                       | surnameFirst    |
626|                                  | und-JP                   | surnameFirst    |
627|                                  | fr                       | givenFirst      |
628|                                  |                          | givenFirst      |
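
The derivation steps and the table above can be sketched in Python as follows. This is an illustration under stated assumptions: the caller supplies the parent locale lookup chain (building it requires the parentLocales data and is not shown), the `GIVEN_FIRST` value `"und"` is hypothetical example data, and the function names are invented for this sketch.

```python
# Example data: SURNAME_FIRST is the sample element shown above;
# GIVEN_FIRST is a hypothetical value assumed for this sketch.
SURNAME_FIRST = "zh ja und-CN und-TW und-SG und-HK und-MO und-HU und-JP"
GIVEN_FIRST = "und"


def lookup_candidates(chain):
    """Interleave each locale L1 in the chain with its 'und' variant L2."""
    candidates = []
    for l1 in chain:
        subtags = l1.split("_")
        l2 = "_".join(["und"] + subtags[1:])
        for locale in (l1, l2):
            if locale not in candidates:
                candidates.append(locale)
    return candidates


def derive_name_order(chain, given_first, surname_first,
                      preferred_order=None, sorting_requested=False):
    if sorting_requested:          # step 1: the calling API asked for sorting
        return "sorting"
    if preferred_order:            # step 2: preferredOrder from the PersonName
        return preferred_order
    # Normalize separators so "und-CN" and "und_CN" compare equal.
    given = {loc.replace("-", "_") for loc in given_first.split()}
    surname = {loc.replace("-", "_") for loc in surname_first.split()}
    for locale in lookup_candidates(chain):   # step 3
        if locale in given:
            return "givenFirst"
        if locale in surname:
            return "surnameFirst"
    return "givenFirst"
```

With this example data, the chain `["zh", "und"]` resolves to surnameFirst and `["fr", "und"]` resolves to givenFirst, in line with the table.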
629
630### Choose a personName element
631
632The personName data in CLDR provides representations for how names are to be formatted across the different axes of _order_, _length_, _usage_, and _formality_. More than one `namePattern` can be associated with a single `personName` entry. An algorithm is then used to choose the best `namePattern` to use.
633
634As an example for English, this may look like:
635
636```xml
637<personNames>
638  <personName order="givenFirst" length="long" usage="referring" formality="formal">
639    <namePattern>{title} {given} {given2} {surname}, {credentials}</namePattern>
640  </personName>
641  <personName order="givenFirst" length="long" usage="referring" formality="informal">
642    <namePattern>{given} «{given2}» {surname}</namePattern>
643    <namePattern alt="2">«{given2}» {surname}</namePattern>
644  </personName>
645  <personName order="givenFirst" length="long" usage="sorting" formality="informal">
646    <namePattern>{surname}, {given} {given2}</namePattern>
647  </personName>
648  ...
649</personNames>
650```
651
652The task is to find the best personName for a given set of input attributes. Well-formed data will always cover all possible combinations of the input parameters, so the algorithm is simple: traverse the list of person names until the first match is found, then return it.
653
654In more detail:
655
656A set of input parameters { order=O length=L usage=U formality=F } matches a personName element when:
657
658* The order attribute values contain O or there is no order attribute, and
659* The length attribute values contain L or there is no length attribute, and
660* The usage attribute values contain U or there is no usage attribute, and
661* The formality attribute values contain F or there is no formality attribute
662
663Example for input parameters
664
665> `order = `**`givenFirst`**`, length = `**`long`**`, usage = `**`referring`**`, formality = `**`formal`**
666
667To match a personName, all four attributes in the personName must match (a missing attribute matches any value for that attribute):
668
669| Sample personName attributes                                 | Matches? | Comment |
670| :----------------------------------------------------------- | :------: | :------ |
671| `order=`_`"givenFirst"`_` length=`_`"long"`_` usage=`_`"referring"`_` formality=`_`"formal"`_ | Y | exact match |
672| `length=`_`"long"`_` usage=`_`"referring"`_` formality=`_`"informal"`_ | N | mismatch for formality |
673| `length=`_`"long"`_` formality=`_`"formal"`_                  | Y | missing usage = all! |
674
675To find the matching personName element, traverse all the personNames in order until the first one is found. This will always terminate since the data is well-formed in CLDR.
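
The matching rule can be sketched in Python as shown below. The data layout (a list of attribute dictionaries paired with their patterns) and the function names are assumptions made for this example; attribute values are treated as space-separated lists, and a missing attribute matches any value.

```python
ATTRIBUTES = ("order", "length", "usage", "formality")


def matches(person_name_attrs, params):
    """True if every attribute present on the element contains the input value."""
    for key in ATTRIBUTES:
        allowed = person_name_attrs.get(key)
        if allowed is not None and params[key] not in allowed.split():
            return False
    return True


def choose_person_name(person_names, params):
    """Return the patterns of the first matching personName, in document order."""
    for attrs, patterns in person_names:
        if matches(attrs, params):
            return patterns
    return None  # should not happen with well-formed data
```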
676
677### Choose a namePattern
678
679To format a name, the fields in a namePattern are replaced with fields fetched from the PersonName Data Interface. The personName element can contain multiple namePattern elements. Choose one based on the fields in the input PersonName object that are populated:
1. Find the set of patterns with the most populated fields.
2. If there is just one element in that set, use it.
3. Otherwise, among that set, find the set of patterns with the fewest unpopulated fields.
4. If there is just one element in that set, use it.
5. Otherwise, take the pattern that is alphabetically least. (This step should rarely happen, and is only for producing a deterministic result.)
685
686For example:
687
6881. Pattern A has 12 fields total, pattern B has 10 fields total, and pattern C has 8 fields total.
6892. Both patterns A and B can be populated with 7 fields from the input PersonName object, pattern C can be populated with only 3 fields from the input PersonName object.
3. Pattern C is discarded, because it has the fewest populated name fields.
4. Out of the remaining patterns A and B, pattern B wins, because it has only 3 unpopulated fields compared to pattern A's 5.
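
The selection steps can be expressed compactly as a single ordered comparison. The following Python sketch assumes that the PersonName data is a plain dictionary keyed by field ID, that modifiers are ignored when deciding whether a field is populated, and that "alphabetically least" means ordinary string comparison; none of these choices is mandated by the text above.

```python
import re

# Captures the field ID and skips any modifiers inside the braces.
_FIELD = re.compile(r"\{([a-z][a-z0-9]*)[^{}]*\}")


def choose_name_pattern(patterns, name):
    """Pick the pattern with the most populated fields, then the fewest
    unpopulated fields, then the alphabetically least pattern string."""
    def sort_key(pattern):
        fields = _FIELD.findall(pattern)
        populated = sum(1 for field in fields if name.get(field))
        unpopulated = len(fields) - populated
        return (-populated, unpopulated, pattern)

    return min(patterns, key=sort_key)
```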
692
693### Access PersonName object
694
695#### Handle missing surname
696
697All PersonName objects will have a given name (for mononyms the given name is used). However, there may not be a surname. In that case, the following process is followed so that formatted patterns produce reasonable results.
698
6991. If there is no surname from a PersonName P1 _and_ the pattern either doesn't include the given name or only shows an initial for the given name, then:
700    1. Construct and use a derived PersonName P2, whereby P2 behaves exactly as P1 except that:
701        1. Any request for a surname field (with any modifiers) returns P1's given name (with the same modifiers)
702        2. Any request for a given name field (with any modifiers) returns "" (empty string)
703
As always, this is a logical description and may be optimized in implementations. For example, an implementation may use an interface for P2 that just delegates calls to P1, with some redirection for accesses to surname and given name.
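
One possible shape for such a delegating P2 is sketched below in Python. The classes `SimplePersonName` and `GivenAsSurname`, the `get(field, modifiers)` method, and the representation of pattern fields as `(field, modifiers)` pairs are all assumptions for this example; the actual PersonName data interface is not defined here.

```python
class SimplePersonName:
    """Toy PersonName (P1): returns stored values and ignores modifiers."""

    def __init__(self, **fields):
        self.fields = fields

    def get(self, field, modifiers=()):
        return self.fields.get(field, "")


class GivenAsSurname:
    """Derived PersonName (P2) that delegates to P1 with two redirections."""

    def __init__(self, base):
        self.base = base

    def get(self, field, modifiers=()):
        if field == "surname":
            return self.base.get("given", modifiers)  # same modifiers
        if field == "given":
            return ""
        return self.base.get(field, modifiers)


def effective_name(name, pattern_fields):
    """Return P2 when there is no surname and the pattern omits the given
    name or shows it only as an initial; otherwise return the name as is."""
    if name.get("surname"):
        return name
    given_modifiers = [mods for field, mods in pattern_fields if field == "given"]
    # all() over an empty list is True, which covers "given is not in the pattern".
    if all("initial" in mods for mods in given_modifiers):
        return GivenAsSurname(name)
    return name
```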
705
706#### Handle core and prefix
707
A given field may have a core value, a prefix value, and/or a ‘plain’ value (neither core nor prefix). If one or more of them are missing, then the returned values should be adjusted according to the table below. In the three cells on the left, a ✓ indicates that a value is available and a ✖️ indicates that there is none. In the three cells on the right, = means the returned value is unchanged, ✖️ means the returned value is “empty”, and anything else is a description of what to change it to.
709
710| prefix | core | plain | | prefix | core  | plain |
711| ------ | ---- | ----- |-| ------ | ----  | -----    |
712| ✓      | ✓    | ✓     | | =      | =     | =        |
713| ✓      | ✖️   | ✓     | | ✖️     | plain | =        |
714| ✖️     | ✓    | ✓     | | =      | plain | =        |
715| ✖️     | ✖️   | ✓     | | =      | plain | =        |
716| ✓      | ✓    | ✖️    | | =      | =     | prefix + " " + core |
717| ✖️     | ✓    | ✖️    | | =     | =         | core |
718| ✓      | ✖️   | ✖️    | | ✖️    | =         | =        |
719| ✖️     | ✖️   | ✖️    | | =     | =         | =        |
720
721For example, if the surname-prefix is "von und zu" and the surname-core is "Stettbach" and there is no surname (plain), then the derived value for the (plain) surname is "von und zu Stettbach". (The cases where existing prefix values are changed should not be necessary with well-formed PersonName data.)
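
The table can be condensed into a few branches. The Python sketch below is one reading of it, using the empty string for a missing (✖️) value; the function name `resolve_core_prefix` is an assumption for this example.

```python
def resolve_core_prefix(prefix="", core="", plain=""):
    """Return the adjusted (prefix, core, plain) values for one field."""
    if plain:
        if prefix and core:
            return prefix, core, plain      # everything supplied: unchanged
        # Prefix or core is missing: core falls back to plain, and a
        # prefix without a core is dropped.
        return "", plain, plain
    if core:
        # No plain value: derive it from prefix + core, or core alone.
        derived = f"{prefix} {core}" if prefix else core
        return prefix, core, derived
    # Neither plain nor core: a lone prefix is dropped.
    return "", "", ""
```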
722
723#### Derive initials
724
725The following process is used to produce initials when they are not supplied by the PersonName object. Assuming the input example is “Mary Beth”:
726
727| Action              | Result |
728| ------------------- | ------ |
729| 1. Split into words | “Mary” and “Beth” |
730| 2. Fetch the first grapheme cluster of each word | “M” and “B” |
731| 3. The ***initial*** pattern is applied to each<br/>`  <initialPattern type="initial">{0}.</initialPattern>` | “M.” and “B.” |
732| 4. Finally recombined with ***initialSequence***<br/>`  <initialPattern type="initialSequence">{0} {1}</initialPattern>` | “M. B.” |
733
734See the “initial” modifier in the [Modifiers](#modifiers) section for more details.
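
The four steps in the table can be sketched in Python as follows. Two simplifications are assumed: the first grapheme cluster is approximated as the first code point plus any immediately following combining marks (a full implementation would use proper grapheme cluster segmentation), and the `{0}`/`{1}` placeholders are filled with `str.format`. The function names are invented for this example.

```python
import unicodedata


def first_grapheme(word):
    """Approximate the first grapheme cluster of a word."""
    end = 1
    while end < len(word) and unicodedata.combining(word[end]):
        end += 1
    return word[:end]


def derive_initials(value, initial="{0}.", initial_sequence="{0} {1}"):
    """Split into words, take initials, apply 'initial', then 'initialSequence'."""
    initials = [initial.format(first_grapheme(word)) for word in value.split()]
    if not initials:
        return ""
    result = initials[0]
    for following in initials[1:]:
        result = initial_sequence.format(result, following)
    return result
```

With the default patterns shown in the table, `derive_initials("Mary Beth")` produces “M. B.”.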
735
736### Process a namePattern
737
738The “winning” namePattern may still have fields that are unpopulated (empty) in the PersonName object. That namePattern is populated with field values with the following steps:
739
7401. If one or more fields at the start of the pattern are empty, all fields and literal text before the **first** populated field are omitted.
7412. If one or more fields at the end of the pattern are empty, all fields and literal text after the **last** populated field are omitted.
7423. Processing from the start of the remaining pattern:
743    1. If there are two or more empty fields separated only by literals, the fields and the literals between them are removed.
744    2. If there is a single empty field, it is removed.
7454. If the processing from step 3 results in two adjacent literals (call them A and B), they are coalesced into one literal as follows:
746    1. If either is empty the result is the other one.
747    2. If B matches the end of A, then the result is A. So xyz + yz ⇒ xyz, and xyz + xyz ⇒ xyz.
748    3. Otherwise the result is A + B, further modified by replacing any sequence of two or more white space characters by the first whitespace character.
7495. All of the fields are replaced by the corresponding values from the PersonName object.
750
751The result is the **formatted value**. However, there is one further step that might further modify that value.
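
The five steps above can be sketched in Python as shown below. This is an illustrative reading of the steps, not a reference implementation: it assumes the field values are supplied in a dictionary keyed by the exact text inside the braces (including any modifiers), and the names `coalesce` and `process_name_pattern` are invented for this example.

```python
import re


def coalesce(a, b):
    """Step 4: merge two adjacent literals A and B."""
    if not a or not b:
        return a or b
    if a.endswith(b):
        return a
    # Replace each run of two or more whitespace characters by its first one.
    return re.sub(r"(\s)\s+", r"\1", a + b)


def process_name_pattern(pattern, values):
    tokens = [("field", t[1:-1]) if t.startswith("{") else ("lit", t)
              for t in re.split(r"(\{[^{}]*\})", pattern) if t]

    def populated(token):
        return token[0] == "field" and bool(values.get(token[1]))

    hits = [i for i, token in enumerate(tokens) if populated(token)]
    if not hits:
        return ""
    tokens = tokens[hits[0]:hits[-1] + 1]        # steps 1 and 2
    out, i = [], 0
    while i < len(tokens):
        token = tokens[i]
        if token[0] == "field" and not populated(token):
            # Step 3: drop a run of empty fields and the literals between them.
            j = last_empty = i
            while not populated(tokens[j]):      # the last token is populated
                if tokens[j][0] == "field":
                    last_empty = j
                j += 1
            i = last_empty + 1
            continue
        if token[0] == "lit" and out and out[-1][0] == "lit":
            out[-1] = ("lit", coalesce(out[-1][1], token[1]))
        else:
            out.append(token)
        i += 1
    # Step 5: substitute the field values.
    return "".join(values[text] if kind == "field" else text for kind, text in out)
```

For example, with only `given` and `surname` populated, the pattern `{title} {given} {given2} {surname}, {credentials}` is reduced to the two populated fields separated by a single space.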
752
753#### Handling foreign names
754
There are two main challenges in formatting foreign names that need to be considered. One is the ordering, which is dealt with under the section [nameOrderLocales Element](#nameorderlocales-element). The other is spacing.
756
757Some writing systems require spaces (or some other non-letters) to separate words. For example, [Hayao Miyazaki](https://en.wikipedia.org/wiki/Hayao_Miyazaki) is written in English with given name first and with a space between the two name fields, while in Japanese there is no space with surname first: [宮崎駿](https://ja.wikipedia.org/wiki/%E5%AE%AE%E5%B4%8E%E9%A7%BF)
758
759If a locale requires spaces between words, the normal patterns for the formatting locale are used. On Wikipedia, for example, note the space within the Japanese name on pages from English and Korean (an ideographic space is used here for emphasis).
760
761* “​​[Hayao Miyazaki (宮崎<span style="background-color:aqua"> </span>駿, Miyazaki Hayao](https://en.wikipedia.org/wiki/Hayao_Miyazaki)…” or
762* “[미야자키<span style="background-color:aqua"> </span>하야오(일본어: 宮﨑<span style="background-color:aqua"> </span>駿 Miyazaki Hayao](https://ko.wikipedia.org/wiki/%EB%AF%B8%EC%95%BC%EC%9E%90%ED%82%A4_%ED%95%98%EC%95%BC%EC%98%A4)…”.
763
If a locale **doesn’t** require spaces between words, there are two cases, based on whether the name is foreign or not (as determined by the language subtag of the PersonName object's explicit or calculated locale). For example, the formatting locale might be Japanese, and the locale of the PersonName object might be de_CH, German (Switzerland), such as Albert Einstein. When the name locale is foreign, the **foreignSpaceReplacement** is substituted for each space in the formatted name. When the name locale is native, the **nativeSpaceReplacement** is substituted for each space in the formatted name. The precise algorithm is given below.
765
766Here are examples for Albert Einstein in Japanese and Chinese:
767* [アルベルト<span style="background-color:aqua">・</span>アインシュタイン](https://ja.wikipedia.org/wiki/%E3%82%A2%E3%83%AB%E3%83%99%E3%83%AB%E3%83%88%E3%83%BB%E3%82%A2%E3%82%A4%E3%83%B3%E3%82%B7%E3%83%A5%E3%82%BF%E3%82%A4%E3%83%B3)
768* [阿尔伯特<span style="background-color:aqua">·</span>爱因斯坦](https://zh.wikipedia.org/wiki/%E9%98%BF%E5%B0%94%E4%BC%AF%E7%89%B9%C2%B7%E7%88%B1%E5%9B%A0%E6%96%AF%E5%9D%A6)
769
770#### Setting the spaceReplacement
771
1. The foreignSpaceReplacement is provided by the value of the `foreignSpaceReplacement` element; the default value is SPACE (" ").
2. The nativeSpaceReplacement is provided by the value of the `nativeSpaceReplacement` element; the default value is SPACE (" ").
3. If the formatter base language matches the name base language, then let spaceReplacement = nativeSpaceReplacement; otherwise let spaceReplacement = foreignSpaceReplacement.
4. Replace each sequence of spaces in the formatted value string by the spaceReplacement.
776
777For the purposes of this algorithm, two base languages are said to __match__ when they are identical, or if both are in {ja, zh, yue}.
778
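The four steps above can be sketched in Python as follows. This is only an illustration, not part of the specification: the function and parameter names are hypothetical, "space" is assumed to mean U+0020 SPACE, and the base languages are assumed to have already been extracted from the formatter locale and the name locale.

```python
import re

# Languages that are treated as matching each other, per the definition above.
MATCHING_GROUP = {"ja", "zh", "yue"}


def base_languages_match(formatter_lang: str, name_lang: str) -> bool:
    """Two base languages match when identical, or when both are in {ja, zh, yue}."""
    return formatter_lang == name_lang or (
        formatter_lang in MATCHING_GROUP and name_lang in MATCHING_GROUP
    )


def apply_space_replacement(
    formatted_value: str,
    formatter_lang: str,
    name_lang: str,
    native_space_replacement: str = " ",   # step 2: default is SPACE
    foreign_space_replacement: str = " ",  # step 1: default is SPACE
) -> str:
    # Step 3: pick the replacement based on whether the base languages match.
    if base_languages_match(formatter_lang, name_lang):
        space_replacement = native_space_replacement
    else:
        space_replacement = foreign_space_replacement
    # Step 4: replace each run of spaces (a lambda avoids backslash escapes
    # in the replacement being interpreted by re.sub).
    return re.sub(r" +", lambda _: space_replacement, formatted_value)


# Japanese formatter, German name: the foreign replacement "・" is used.
print(apply_space_replacement("アルベルト アインシュタイン", "ja", "de", "", "・"))
```
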
779**Note:** in the future the plan is to make the specific languages and scripts used in this algorithm be data-driven.
780
781Remember that **a name in a different script** will use a different locale for formatting, as per [Switch the formatting locale if necessary](#switch-the-formatting-locale-if-necessary).
For example, when formatting a name for Japanese, if the name is in the Latin script, a Latin-based locale will be used to format it, such as when “Albert Einstein” appears in Latin characters as in the Wikipedia page [Albert Einstein](https://ja.wikipedia.org/wiki/Albert_Einstein).
783
784#### Examples of space replacement
785
To illustrate how foreign space replacement works, consider the following name data. For illustration, the name locale is given in the maximized form: in practice, `ja` would be used instead of `ja_Jpan_JP`, and so on. For more information, see [Likely Subtags](tr35.md#Likely_Subtags).
787
788| name locale   | given    | surname       |
789| ------------- | -------- | ------------- |
790| `de_Latn_CH`  | Albert   | Einstein      |
791| `de_Kata_CH`  | アルベルト | アインシュタイン |
792| `ja_Kata_CH`  | アルベルト | アインシュタイン |
793| `ja_Latn_JP`  | Hayao    | Miyazaki      |
794| `ja_Jpan_JP`  | 駿       | 宮崎           |
795
796Suppose the PersonNames formatting patterns for `ja_JP` and `de_CH` contained the following:
797
798**`ja_JP` formatting patterns**
799
800<pre>
801&lt;personNames&gt;
802   &lt;nameOrderLocales order="givenFirst"&gt;und&lt;/nameOrderLocales&gt;
803   &lt;<strong>nameOrderLocales</strong> order="<strong>surnameFirst</strong>"&gt;hu <strong>ja</strong> ko vi yue zh <strong>und_JP</strong>&lt;/nameOrderLocales&gt;
804   &lt;<strong>nativeSpaceReplacement</strong> xml:space="preserve"&gt;<span style="background-color:aqua"></span>&lt;/nativeSpaceReplacement&gt;
805   &lt;<strong>foreignSpaceReplacement</strong> xml:space="preserve"&gt;<span style="background-color:aqua">・</span>&lt;/foreignSpaceReplacement&gt;
806   . . .
807   &lt;personName order="<strong>givenFirst</strong>" length="medium" usage="referring" formality="formal"&gt;
808      &lt;namePattern&gt;{given}<span style="background-color:aqua"> </span>{given2}<span style="background-color:aqua"> </span>{surname}{generation}&lt;/namePattern&gt;
809   &lt;/personName&gt;
810   . . .
811   &lt;personName order="<strong>surnameFirst</strong>" length="medium" usage="referring" formality="formal"&gt;
812      &lt;namePattern&gt;{surname}{given2}{given}{generation}&lt;/namePattern&gt;
813   &lt;/personName&gt;
814   . . .
815&lt;/personNames&gt;
816</pre>
817
Note that in the `de_CH` locale, _ja_ is not listed in nameOrderLocales; it would therefore fall under _und_ and be formatted using the givenFirst order patterns if the name data is in the same script as the formatting locale.
819
820**`de_CH` formatting patterns**
821
822<pre>
823&lt;personNames&gt;
824   &lt;nameOrderLocales order="<strong>givenFirst</strong>"&gt;und <strong>de</strong>&lt;/nameOrderLocales&gt;
825   &lt;nameOrderLocales order="surnameFirst"&gt;ko vi yue zh&lt;/nameOrderLocales&gt;
   &lt;foreignSpaceReplacement xml:space="preserve"&gt;<span style="background-color:aqua"> </span>&lt;/foreignSpaceReplacement&gt;
827   . . .
828   &lt;personName order="givenFirst" length="medium" usage="referring" formality="formal"&gt;
829      &lt;namePattern&gt;{given}<span style="background-color:aqua"> </span>{given2-initial}<span style="background-color:aqua"> </span>{surname}, {generation}&lt;/namePattern&gt;
830   &lt;/personName&gt;
831   . . .
832   &lt;personName order="surnameFirst" length="medium" usage="referring" formality="formal"&gt;
833      &lt;namePattern&gt;{surname}<span style="background-color:aqua">, </span>{given}<span style="background-color:aqua"> </span>{given2-initial}<span style="background-color:aqua">,</span> {generation}&lt;/namePattern&gt;
834   &lt;/personName&gt;
835   . . .
&lt;/personNames&gt;
837</pre>
838
839The name data would resolve as follows:
840<!-- TODO Replace the following with a markdown table -->
841
842<table>
843  <tr>
844   <td colspan="7" ><strong>formatting locale: ja_JP, </strong>script is Jpan which includes Hani, Hira and Kana</td>
845  </tr>
846  <tr>
847   <td><strong>name locale</strong></td>
848   <td><strong>given</strong></td>
849   <td><strong>surname</strong></td>
850   <td><strong>same<br/>script</strong></td>
   <td><strong>formatting<br/>locale</strong></td>
852   <td><strong>order</strong></td>
853   <td><strong>foreign<br/>space</strong></td>
854  </tr>
855  <tr>
856   <td>de_Latn_CH</td>
857   <td>Albert</td>
858   <td><span style="text-decoration:underline;">Einstein</span></td>
859   <td>NO</td>
860   <td>de</td>
861   <td>given First</td>
862   <td></td>
863  </tr>
864  <tr>
865   <td colspan="7" style="text-align:center">“Albert <span style="text-decoration:underline;">Einstein</span>”</td>
866  </tr>
867  <tr>
868   <td>de_Jpan_CH</td>
869   <td>アルベルト</td>
870   <td><span style="text-decoration:underline;">アインシュタイン</span></td>
871   <td>YES</td>
872   <td>und</td>
873   <td>given First</td>
874   <td>“<span style="background-color:aqua">・</span>”</td>
875  </tr>
876  <tr>
877   <td colspan="7" style="text-align:center">“アルベルト<span style="background-color:aqua">・</span><span style="text-decoration:underline;">アインシュタイン</span>”</td>
878  </tr>
879  <tr>
880   <td>ja_Jpan_JP</td>
881   <td>駿</td>
882   <td><span style="text-decoration:underline;">宮崎</span></td>
883   <td>YES</td>
884   <td>ja</td>
885   <td>surname First</td>
886   <td></td>
887  </tr>
888  <tr>
889   <td colspan="7" style="text-align:center"><span style="text-decoration:underline;">宮崎</span>駿</td>
890  </tr>
891</table>
892<br/>
893
894<table>
895  <tr>
896   <td colspan="7" ><strong>formatting locale: de_CH</strong>, formatting locale script is Latn</td>
897  </tr>
898  <tr>
899   <td><strong>name locale</strong></td>
900   <td><strong>given</strong></td>
901   <td><strong>surname</strong></td>
902   <td><strong>same<br/>script</strong></td>
903   <td><strong>formatting<br/>locale</strong></td>
904   <td><strong>order</strong></td>
905   <td><strong>foreign<br/>space</strong></td>
906  </tr>
907  <tr>
908   <td>de_Latn_CH</td>
909   <td>Albert</td>
910   <td>Einstein</td>
911   <td>YES</td>
912   <td>de</td>
913   <td>given First</td>
914   <td></td>
915  </tr>
916  <tr>
917   <td colspan="7" style="text-align:center">“Albert Einstein”</td>
918  </tr>
919  <tr>
920   <td>de_Jpan_CH</td>
921   <td>アルベルト</td>
922   <td>アインシュタイン</td>
923   <td>NO</td>
924   <td>ja<br/>from script</td>
925   <td>given First</td>
926   <td>“<span style="background-color:aqua">・</span>”</td>
927  </tr>
928  <tr>
929   <td colspan="7" style="text-align:center">“アルベルト<span style="background-color:aqua">・</span>アインシュタイン”</td>
930  </tr>
931  <tr>
932   <td>und_Latn_JP</td>
933   <td>Hayao</td>
934   <td>Miyazaki</td>
935   <td>YES</td>
936   <td>und</td>
937   <td>given First</td>
938   <td>“<span style="background-color:aqua"> </span>”</td>
939  </tr>
940  <tr>
941   <td colspan="7" style="text-align:center">“Hayao<span style="background-color:aqua"> </span>Miyazaki”</td>
942  </tr>
943</table>
944<br/>
945
946### Formatting examples
947
948The personName element contains:
949
950
> `<namePattern>{title} {given} {given2} {surname}, {generation}</namePattern>`
952
953
954The input PersonName object contains:
955
956| `title` | `given` | `given2` | `surname` | `generation` |
957| -------- | ------- | -------- | --------- | --------      |
958|          | Raymond | J.       | Johnson   | Jr.           |
959
960The output is:
961
962> Raymond J. Johnson, Jr.
963
964The “title” field is empty, and so both it and the space that follows it are omitted from the output, according to rule 1 above.
965
966If, instead, the input PersonName object contains:
967
968| `title` | `given` | `given2` | `surname` | `generation` |
969| -------- | ------- | -------- | --------- | -------- |
970|          | Raymond | J.       | Johnson   |          |
971
972The output is:
973
974> Raymond J. Johnson
975
976The “title” field is empty, and so both it and the space that follows it are omitted from the output, according to rule 1 above.
977
978The “generation” field is also empty, so it and both the comma and the space that precede it are omitted from the output, according to rule 2 above.
979
To see how rule 3 interacts with the other rules, consider an imaginary language in which people generally have given and given2 (or middle) names, the given2 name is always written with parentheses around it, and the given name is usually written as an initial with a following period.
981
982The personName element contains:
983
984> `<namePattern>{given-initial}. ({given2}) {surname}</namePattern>`
985
986
987The input PersonName object contains:
988
989| `given` | `given2` | `surname` |
990| ------- | -------- | --------- |
991| Foo     | Bar      | Baz       |
992
993The output is:
994
995> F. (Bar) Baz
996
997If, instead, the input PersonName object contains:
998
999| `given` | `given2` | `surname` |
1000| ------- | -------- | --------- |
1001| Foo     |          | Baz       |
1002
1003The output is:
1004
1005> F. Baz
1006
The “given2” field is empty, so it and the surrounding parentheses are omitted from the output, as is one of the surrounding spaces, according to rule 3. The period after “{given-initial}” remains, because it is separated from the “{given2}” element by a space: punctuation around a missing field is only deleted up to the closest space in each direction.
1008
1009If there were no space between the period and the parentheses, as might happen if our hypothetical language didn’t use spaces:
1010
1011> `<namePattern>{given-initial}.({given2}) {surname}</namePattern>`
1012
1013The input PersonName object still contains:
1014
1015| `given` | `given2` | `surname` |
1016| ------- | -------- | --------- |
1017| Foo     |          | Baz       |
1018
1019The output is:
1020
1021> F Baz
1022
1023Both the period after “{given-initial}” _and_ the parentheses around “{given2}” are omitted from the output, because there was no space between them — instead, we delete punctuation all the way up to the neighboring field. To solve this (making sure the “{given-initial}” field always has a period after it), you would add another namePattern:
1024
1025> `<namePattern>{given-initial}.({given2}) {surname}</namePattern>`<br/>
> `<namePattern alt="2">{given-initial}. {surname}</namePattern>`
1027
1028The first pattern would be used when the “given2” field is populated, and the second pattern would be used when the “given2” field is empty.
1029
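One way such a choice between alternate patterns could be made is sketched below in Python. The selection criterion used here (prefer the pattern with the most populated fields, then the one with the fewest empty fields, then the first one listed) is an assumption of this sketch, chosen because it reproduces the choices described in this section; the normative selection rules are defined outside this section. `choose_pattern` and the dictionary standing in for a PersonName object are hypothetical.

```python
import re

# Matches a field reference such as {given} or {given-initial}.
FIELD = re.compile(r"\{([^{}]+)\}")


def choose_pattern(patterns, person):
    """Pick one namePattern string for a name given as a dict of field values."""

    def score(pattern):
        # Strip modifiers such as "-initial" to get the underlying field name.
        names = [name.partition("-")[0] for name in FIELD.findall(pattern)]
        populated = sum(1 for name in names if person.get(name))
        empty = len(names) - populated
        # min() keeps the first pattern on ties, so earlier patterns win.
        return (-populated, empty)

    return min(patterns, key=score)


patterns = [
    "{given-initial}.({given2}) {surname}",
    "{given-initial}. {surname}",
]
# With given2 populated the first pattern is chosen; without it, the second.
print(choose_pattern(patterns, {"given": "Foo", "given2": "Bar", "surname": "Baz"}))
print(choose_pattern(patterns, {"given": "Foo", "surname": "Baz"}))
```
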
1030Rules 1 and 3 can conflict in similar ways. If the personName element contains (there’s a space between the period and the opening parenthesis again):
1031
1032> `<namePattern>{given-initial}. ({given2}) {surname}</namePattern>`
1033
1034And the input PersonName object contains:
1035
1036| `given` | `given2` | `surname` |
1037| ------- | -------- | --------- |
1038|         | Bar      | Baz       |
1039
1040The output is:
1041
1042> Bar) Baz
1043
1044Because the “given” field is empty, rule 1 not only has us delete it, but also all punctuation up to “{given2}”. This includes _both_ the period _and_ the opening parenthesis. Again, to solve this, you’d supply two namePatterns:
1045
1046> `<namePattern>{given-initial}. ({given2}) {surname}</namePattern>`<br/>
> `<namePattern alt="2"> ({given2}) {surname}</namePattern>`
1048
1049The output would then be:
1050
1051> (Bar) Baz
1052
1053The first namePattern would be used if the “given” field was populated, and the second would be used if it was empty.
1054
1055If, instead, the input PersonName object contains:
1056
1057| `given` | `given2` | `surname` |
1058| ------- | -------- | --------- |
1059| Foo     |          | Baz       |
1060
1061The output is:
1062
1063> F. Baz
1064
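The behavior illustrated by the examples in this section can be modelled with a short Python sketch. It is an illustration under stated assumptions rather than the normative algorithm: the numbered rules themselves are defined earlier in this document, the helper names (`parse`, `field_value`, `format_name`) are hypothetical, a PersonName object is approximated by a dictionary, and the `-initial` modifier is approximated by taking the first character of the field.

```python
import re

FIELD = re.compile(r"\{([^{}]+)\}")


def parse(pattern):
    """Split a namePattern into ('field', name) and ('lit', text) tokens."""
    tokens, pos = [], 0
    for match in FIELD.finditer(pattern):
        if match.start() > pos:
            tokens.append(("lit", pattern[pos:match.start()]))
        tokens.append(("field", match.group(1)))
        pos = match.end()
    if pos < len(pattern):
        tokens.append(("lit", pattern[pos:]))
    return tokens


def field_value(name, person):
    """Look up a field; '-initial' is simplified to the first character."""
    base, _, modifier = name.partition("-")
    value = person.get(base, "")
    return value[0] if value and modifier == "initial" else value


def format_name(pattern, person):
    tokens = [(kind, text if kind == "lit" else field_value(text, person))
              for kind, text in parse(pattern)]
    filled = [i for i, (kind, text) in enumerate(tokens) if kind == "field" and text]
    if not filled:
        return ""
    # Rules 1 and 2: only text between the first and last populated fields
    # is visited, so leading and trailing empty fields drop out.
    out = []
    for prev, nxt in zip(filled, filled[1:]):
        out.append(tokens[prev][1])
        gap = tokens[prev + 1:nxt]
        if all(kind == "lit" for kind, _ in gap):
            out.extend(text for _, text in gap)
            continue
        # Rule 3: an empty field sits in this gap. Keep the left literal up to
        # its last space and the right literal from its first space onward.
        left = gap[0][1] if gap[0][0] == "lit" else ""
        right = gap[-1][1] if gap[-1][0] == "lit" else ""
        left = left[:left.rfind(" ") + 1]
        right = right[right.find(" "):] if " " in right else ""
        out.append(re.sub(r" {2,}", " ", left + right))
    out.append(tokens[filled[-1]][1])
    return "".join(out)


print(format_name("{given-initial}. ({given2}) {surname}",
                  {"given": "Foo", "surname": "Baz"}))   # F. Baz
print(format_name("{given-initial}.({given2}) {surname}",
                  {"given": "Foo", "surname": "Baz"}))   # F Baz
```
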
1065## Sample Name
1066
The sampleName element is used for test names in the personNames LDML data for each locale, to aid in testing and display in the CLDR Survey Tool. These names are not intended to be used in production software as prompts or placeholders, and should not be displayed in a user interface.
1068
1069### Syntax
1070
1071```xml
1072<!ELEMENT sampleName ( nameField+ )  >
1073<!ATTLIST sampleName item NMTOKEN #REQUIRED >
1074```
1075
1076* `NMTOKEN` must be one of `( nativeG, nativeGS, nativeGGS, nativeFull, foreignG, foreignGS, foreignGGS, foreignFull )`. However, these may change arbitrarily in the future.
1077
1078### Expected values
1079
1080The item values starting with "native" are expected to be native names, in native script.
1081The item values starting with "foreign" are expected to be foreign names, in native script.
1082There are no foreign names or native names in a foreign script, because those should be handled by a different locale's data.
1083
1084The rest of the item value indicates how many fields are present.
1085For the expected sample name items, assume a name such as Mr. Richard “Rich” Edward Smith Iglesias Ph.D.
1086
* `G` is for an example name with only the given name: “Richard” or “Rich” (informal)
* `GS` is for an example name with only the given name and surname: “Richard Smith” or “Rich Smith” (informal)
* `GGS` is for an example using both given and given2 names and a surname: “Richard Edward Smith” or “Rich E. Smith” (informal)
* `Full` is used to present a name using all possible fields: “Mr. Richard Edward Smith Iglesias, Ph.D.”

The `nameField` values and their modifiers are described in the [Person Name Object](#person-name-object) and [namePattern Syntax](#namepattern-syntax) sections.
1093
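As a small illustration of how the item values decompose, the following Python sketch splits an item into its origin ("native" or "foreign") and the fields it is expected to exercise. The function name and the returned structure are hypothetical, and because the set of item values may change in the future, the table here reflects only the values listed in this section.

```python
# Fields expected for each item suffix; None stands for "all possible fields".
FIELDS_BY_SUFFIX = {
    "G": ["given"],
    "GS": ["given", "surname"],
    "GGS": ["given", "given2", "surname"],
    "Full": None,
}


def describe_sample_item(item):
    """Return (origin, expected fields) for a sampleName item value."""
    for origin in ("native", "foreign"):
        suffix = item[len(origin):]
        if item.startswith(origin) and suffix in FIELDS_BY_SUFFIX:
            return origin, FIELDS_BY_SUFFIX[suffix]
    raise ValueError(f"unexpected sampleName item: {item!r}")


print(describe_sample_item("nativeGGS"))
print(describe_sample_item("foreignFull"))
```
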
1094## PersonName Data Interface Examples
1095
1096### Example 1
1097
1098Greek initials can be produced via the following process in the PersonName object, and returned to the formatter.
1099
1100* Include all letters up through the first consonant or digraph (including the consonant or digraph).<br/>
1101(This is a simplified version of the actual process.)
1102
1103Examples:
1104
1105* Χριστίνα Λόπεζ (Christina Lopez) ⟶ Χ. Λόπεζ (C. Lopez)
1106* Ντέιβιντ Λόπεζ (David Lopez) ⟶ Ντ. Λόπεζ (D. Lopez)<br/>Note that Ντ is a digraph representing the sound D.
1107
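The simplified rule above could be sketched in Python as follows. This is a rough illustration only: the function name is hypothetical, the list of consonant digraphs is an assumption that is probably incomplete, and the actual process is more involved than what is shown here.

```python
import unicodedata

GREEK_VOWELS = set("αεηιουω")
# Assumed, probably incomplete, list of consonant digraphs.
DIGRAPHS = ("ντ", "μπ", "γκ", "τσ", "τζ")


def base_letter(ch):
    """Lowercase letter with any accent removed."""
    return unicodedata.normalize("NFD", ch)[0].lower()


def greek_initial(name):
    """Return the letters up through the first consonant or digraph."""
    name = unicodedata.normalize("NFC", name)
    for i, ch in enumerate(name):
        if base_letter(ch) in GREEK_VOWELS:
            continue  # keep scanning past leading vowels
        pair = "".join(base_letter(c) for c in name[i:i + 2])
        end = i + 2 if pair in DIGRAPHS else i + 1
        return name[:end]
    return name  # no consonant found


print(greek_initial("Χριστίνα"))  # Χ
print(greek_initial("Ντέιβιντ"))  # Ντ
```
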
1108### Example 2
1109
1110To make an initial when there are multiple words, an implementation might produce the following:
1111
1112* A field containing multiple words might skip some of them, such as in “Mohammed bin Ali bin Osman” (“MAO”).
1113* The short version of "Son Heung-min" is "H. Son" and not "H. M. Son" or the like. Korean given-names have hyphens and the part after the hyphen is lower-case.
1114
1115
1116* * *
1117
1118Copyright © 2001–2024 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode [Terms of Use](https://www.unicode.org/copyright.html) apply.
1119
1120Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.
1121