<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="what-is-harfbuzz">
  <title>What is HarfBuzz?</title>
  <para>
    HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you
    give HarfBuzz a font and a string containing a sequence of Unicode
    codepoints, HarfBuzz selects and positions the corresponding
    glyphs from the font, applying all of the necessary layout rules
    and font features. HarfBuzz then returns the glyphs to you in a
    form that is correctly arranged for the language and writing
    system.
  </para>
  <para>
    HarfBuzz can properly shape all of the world's major writing
    systems. It runs on all major operating systems and software
    platforms and it supports the major font formats in use
    today.
  </para>
  <section id="what-is-text-shaping">
    <title>What is text shaping?</title>
    <para>
      Text shaping is the process of translating a string of character
      codes (such as Unicode codepoints) into a properly arranged
      sequence of glyphs that can be rendered onto a screen or into
      final output form for inclusion in a document.
    </para>
    <para>
      The shaping process depends on the input string, the active
      font, the script (or writing system) that the string is in, and
      the language that the string is in.
    </para>
    <para>
      Modern software systems generally only deal with strings in the
      Unicode encoding scheme (although legacy systems and documents may
      involve other encodings).
    </para>
    <para>
      There are several font formats that a program might
      encounter, each of which has a set of standard text-shaping
      rules.
    </para>
    <para>The dominant format is <ulink
      url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>.
      The
      OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for
      various scripts from around the world. These shaping models depend on
      the font incorporating certain features as
      <emphasis>lookups</emphasis> in its <literal>GSUB</literal>
      and <literal>GPOS</literal> tables.
    </para>
    <para>
      Alternatively, OpenType fonts can include shaping features for
      the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model.
    </para>
    <para>
      TrueType fonts can also include OpenType shaping
      features. Alternatively, TrueType fonts can include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple
      Advanced Typography</ulink> (AAT) tables to implement shaping
      support. AAT fonts are generally only found on macOS and iOS systems.
    </para>
    <para>
      Text strings will usually be tagged with script and language
      tags that provide the context needed to perform text shaping
      correctly. The necessary <ulink
      url="https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags">script</ulink>
      and <ulink
      url="https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags">language</ulink>
      tags are defined by OpenType.
    </para>
  </section>

  <section id="why-do-i-need-a-shaping-engine">
    <title>Why do I need a shaping engine?</title>
    <para>
      Text shaping is an integral part of preparing text for
      display. Before a Unicode sequence can be rendered, the
      codepoints in the sequence must be mapped to the corresponding
      glyphs provided in the font, and those glyphs must be positioned
      correctly relative to each other. For many of the scripts
      supported in Unicode, these steps involve script-specific layout
      rules, including complex joining, reordering, and positioning
      behavior. Implementing these rules is the job of the shaping engine.
    </para>
    <para>
      Text shaping is a fairly low-level operation. HarfBuzz is
      used directly by text-handling libraries like <ulink
      url="https://www.pango.org/">Pango</ulink>, as well as by the layout
      engines in Firefox, LibreOffice, and Chromium. Unless you are
      <emphasis>writing</emphasis> one of these layout engines
      yourself, you will probably not need to use HarfBuzz directly: normally,
      a layout engine, toolkit, or other library will turn text into
      glyphs for you.
    </para>
    <para>
      However, if you <emphasis>are</emphasis> writing a layout engine
      or graphics library yourself, then you will need to perform text
      shaping, and this is where HarfBuzz can help you.
    </para>
    <para>
      Here are some specific scenarios where a text-shaping engine
      like HarfBuzz helps you:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          OpenType fonts contain a set of glyphs (that is, shapes
          to represent the letters, numbers, punctuation marks, and
          all other symbols), which are indexed by a <literal>glyph ID</literal>.
        </para>
        <para>
          A particular glyph ID within the font does not necessarily
          correlate to a predictable Unicode codepoint. For instance,
          some fonts have the letter "a" as glyph ID 1, but
          many others do not. In order to retrieve the right glyph
          from the font to display "a", you need to consult
          the table inside the font (the <literal>cmap</literal>
          table) that maps Unicode codepoints to glyph IDs. In other
          words, <emphasis>text shaping turns codepoints into glyph
          IDs</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Many OpenType fonts contain ligatures: combinations of
          characters that are rendered as a single unit. For instance,
          it is common for the "f, i" letter
          sequence to appear in print as the single ligature glyph
          "ﬁ".
        </para>
        <para>
          Whether you should render an "f, i" sequence
          as the single ligature glyph <literal>ﬁ</literal> or as the
          separate glyphs "f" and "i" does not
          depend on the input text. Instead, it depends on whether
          or not the font includes an "ﬁ" glyph and on the
          level of ligature application you wish to perform. The font
          and the amount of ligature application used are under your
          control. In other words, <emphasis>text shaping involves
          querying the font's ligature tables and determining what
          substitutions should be made</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          While ligatures like "ﬁ" are optional typographic
          refinements, some languages <emphasis>require</emphasis> certain
          substitutions to be made in order to display text correctly.
        </para>
        <para>
          For example, in Tamil, when the letter "TTA" (ட)
          is followed by the vowel sign "U" (ு), the pair
          must be replaced by the single glyph "டு". The
          sequence of Unicode characters "ட,ு" needs to be
          substituted with a single "டு" glyph from the
          font.
        </para>
        <para>
          But "டு" does not have a Unicode codepoint. To
          find this glyph, you need to consult the table inside
          the font (the <literal>GSUB</literal> table) that contains
          substitution information. In other words, <emphasis>text shaping
          chooses the correct glyph for a sequence of characters
          provided</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Similarly, an Arabic letter can have up to four different
          variants, corresponding to the different positions in which it
          might appear within a sequence. Inside a font, there will be separate
          glyphs for the initial, medial, final, and isolated forms of
          a letter, each with a different glyph ID.
        </para>
        <para>
          Unicode only assigns one codepoint per character, so a
          Unicode string will not tell you which glyph variant to use
          for each character.
          To decide, you need to analyze the whole
          string and determine the appropriate glyph for each character
          based on its position. In other words, <emphasis>text
          shaping chooses the correct form of the letter by its
          position and returns the correct glyph from the font</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Other languages involve marks and accents that need to be
          rendered in specific positions relative to a base character. For
          instance, the Moldovan language includes the Cyrillic letter
          "zhe" (ж) with a breve accent, like so: "ӂ".
        </para>
        <para>
          Some fonts will provide this character as a single
          zhe-with-breve glyph, but other fonts will not and, instead,
          will expect the rendering engine to form the character by
          superimposing the separate "ж" and "˘"
          glyphs.
        </para>
        <para>
          But exactly where you should draw the breve depends on the
          height and width of the preceding zhe glyph. To find the
          right position, you need to consult the table inside
          the font (the <literal>GPOS</literal> table) that contains
          positioning information.
          In other words, <emphasis>text shaping tells you whether you
          have a precomposed glyph within your font or if you need to
          compose a glyph yourself out of combining marks and,
          if so, where to position those marks.</emphasis>
        </para>
      </listitem>
    </itemizedlist>
    <para>
      If tasks like these are something that you need to do, then you
      need a text shaping engine. You could use Uniscribe if you are
      writing Windows software; you could use CoreText on macOS; or
      you could use HarfBuzz.
    </para>
    <note>
      <para>
        In the rest of this manual, the text will assume that the reader
        is the implementor of a text-layout engine.
      </para>
    </note>
  </section>


  <section id="what-does-harfbuzz-do">
    <title>What does HarfBuzz do?</title>
    <para>
      HarfBuzz provides text shaping through a cross-platform
      C API that accepts sequences of Unicode codepoints as input. Currently,
      the following OpenType shaping models are supported:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          Indic (covering Devanagari, Bengali, Gujarati,
          Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu)
        </para>
      </listitem>
      <listitem>
        <para>
          Arabic (covering Arabic, N'Ko, Syriac, and Mongolian)
        </para>
      </listitem>
      <listitem>
        <para>
          Thai and Lao
        </para>
      </listitem>
      <listitem>
        <para>
          Khmer
        </para>
      </listitem>
      <listitem>
        <para>
          Myanmar
        </para>
      </listitem>

      <listitem>
        <para>
          Tibetan
        </para>
      </listitem>

      <listitem>
        <para>
          Hangul
        </para>
      </listitem>

      <listitem>
        <para>
          Hebrew
        </para>
      </listitem>
      <listitem>
        <para>
          The Universal Shaping Engine or <emphasis>USE</emphasis>
          (covering complex scripts not covered by the above shaping
          models)
        </para>
      </listitem>
      <listitem>
        <para>
          A default shaping model for non-complex scripts
          (covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh,
          and many others)
        </para>
      </listitem>
      <listitem>
        <para>
          Emoji (including emoji modifier sequences, flag sequences,
          and ZWJ sequences)
        </para>
      </listitem>
    </itemizedlist>

    <para>
      In addition to OpenType shaping, HarfBuzz supports the latest
      version of Graphite shaping (the "Graphite 2" model) and AAT
      shaping.
    </para>

    <para>
      HarfBuzz can read and understand TrueType fonts (.ttf), TrueType
      collections (.ttc), and OpenType fonts (.otf, including those
      fonts that contain TrueType-style outlines and those that
      contain PostScript CFF or CFF2 outlines).
    </para>

    <para>
      HarfBuzz is designed and tested to run on top of the FreeType
      font renderer. It can run on Linux, Android, Windows, macOS, and
      iOS systems.
    </para>

    <para>
      In addition to its core shaping functionality, HarfBuzz provides
      functions for accessing other font features, including optional
      GSUB and GPOS OpenType features, as well as
      all color-font formats (<literal>CBDT</literal>,
      <literal>sbix</literal>, <literal>COLR/CPAL</literal>, and
      <literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz
      also includes a font-subsetting feature. HarfBuzz can perform
      some low-level math-shaping operations, although it does not
      currently perform full shaping for mathematical typesetting.
    </para>

    <para>
      A suite of command-line utilities is also provided in the
      source-code tree, designed to help users test and debug
      HarfBuzz's features on real-world fonts and input.
    </para>
  </section>

  <section id="what-harfbuzz-doesnt-do">
    <title>What HarfBuzz doesn't do</title>
    <para>
      HarfBuzz will take a Unicode string, shape it, and give you the
      information required to lay it out correctly on a single
      horizontal (or vertical) line using the font provided. That is the
      extent of HarfBuzz's responsibility.
    </para>
    <para>
      It is important to note that if you are implementing a complete
      text-layout engine you may have other responsibilities that
      HarfBuzz will <emphasis>not</emphasis> help you with. For example:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          HarfBuzz won't help you with bidirectionality.
          If you want to
          lay out text that includes a mix of Hebrew and English, you
          will need to ensure that each buffer provided to HarfBuzz
          has all of its characters running in the same direction and that the
          directionality of the buffer is set correctly. This may mean
          segmenting the text before it is placed into HarfBuzz buffers. For
          example, the user will hit the keys in the following
          sequence:
        </para>
        <programlisting>
A B C [space] ג ב א [space] D E F
        </programlisting>
        <para>
          but will expect to see in the output:
        </para>
        <programlisting>
ABC אבג DEF
        </programlisting>
        <para>
          This reordering is called <emphasis>bidi processing</emphasis>
          ("bidi" is short for bidirectional). The Unicode Bidirectional
          Algorithm, published as Unicode Standard Annex #9, tells you how
          to process a string of mixed directionality.
          Before sending your string to HarfBuzz, you may need to apply the
          bidi algorithm to it. Libraries such as <ulink
          url="http://icu-project.org/">ICU</ulink> and <ulink
          url="http://fribidi.org/">fribidi</ulink> can do this for you.
        </para>
      </listitem>
      <listitem>
        <para>
          HarfBuzz won't help you with text that contains different font
          properties. For instance, if you have the string "a
          <emphasis>huge</emphasis> breakfast", and you expect
          "huge" to be italic, then you will need to send three
          strings to HarfBuzz: <literal>a</literal>, in your Roman font;
          <literal>huge</literal> using your italic font; and
          <literal>breakfast</literal> using your Roman font again.
        </para>
        <para>
          Similarly, if you change the font, font size, script,
          language, or direction within your string, then you will
          need to shape each run independently and output them
          independently. HarfBuzz expects to shape a run of characters
          that all share the same properties.
        </para>
      </listitem>
      <listitem>
        <para>
          HarfBuzz won't help you with line breaking, hyphenation, or
          justification. As mentioned above, HarfBuzz lays out the string
          along a <emphasis>single line</emphasis> of, notionally,
          infinite length. If you want to find out where the potential
          word, sentence, and line break points are in your text, you
          could use the ICU library's break iterator functions.
        </para>
        <para>
          HarfBuzz can tell you how wide a shaped piece of text is, which is
          useful input to a justification algorithm, but it knows nothing
          about paragraphs, lines, or line lengths. Nor will it adjust the
          space between words to fit them proportionally into a line.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      If you are a layout-engine implementor, HarfBuzz will help you with the
      interface between your text and your font, and that's something
      that you'll need. What you then do with the glyphs that your font
      returns is up to you.
    </para>
  </section>

  <section id="why-is-it-called-harfbuzz">
    <title>Why is it called HarfBuzz?</title>
    <para>
      HarfBuzz began its life as text-shaping code within the FreeType
      project (and you will see references to the FreeType authors
      within the source code copyright declarations), but was then
      extracted out to its own project. This project is maintained by
      Behdad Esfahbod, who named it HarfBuzz. Originally, it was a
      shaping engine for OpenType fonts: "HarfBuzz" is
      Persian for "open type".
    </para>
  </section>
</chapter>