<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="buffers-language-script-and-direction">
  <title>Buffers, language, script and direction</title>
  <para>
    The input to the HarfBuzz shaper is a series of Unicode characters, stored in a
    buffer. In this chapter, we'll look at how to set up a buffer with
    the text that we want and how to customize the properties of the
    buffer. We'll also look at a piece of lower-level machinery that
    you will need to understand before proceeding: the functions that
    HarfBuzz uses to retrieve Unicode information.
  </para>
  <para>
    After shaping is complete, HarfBuzz puts its output back
    into the buffer. But getting that output requires setting up a
    face and a font first, so we will look at that in the next chapter
    instead of here.
  </para>
  <section id="creating-and-destroying-buffers">
    <title>Creating and destroying buffers</title>
    <para>
      As we saw in our <emphasis>Getting Started</emphasis> example, a
      buffer is created and
      initialized with <function>hb_buffer_create()</function>. This
      produces a new, empty buffer object, instantiated with some
      default values and ready to accept your Unicode strings.
    </para>
    <para>
      HarfBuzz manages the memory of objects (such as buffers) that it
      creates, so you don't have to. When you have finished working on
      a buffer, you can call <function>hb_buffer_destroy()</function>:
    </para>
    <programlisting language="C">
      hb_buffer_t *buf = hb_buffer_create();
      ...
      hb_buffer_destroy(buf);
    </programlisting>
    <para>
      This will destroy the object and free its associated memory,
      unless some other part of the program holds a reference to this
      buffer.
      If you acquire a HarfBuzz buffer from another subsystem
      and want to ensure that it is not freed when someone
      else destroys it, you should increase its reference count:
    </para>
    <programlisting language="C">
      void somefunc(hb_buffer_t *buf) {
        buf = hb_buffer_reference(buf);
        ...
    </programlisting>
    <para>
      And then decrease it once you're done with it:
    </para>
    <programlisting language="C">
        hb_buffer_destroy(buf);
      }
    </programlisting>
    <para>
      While we are on the subject of reference-counting buffers, it is
      worth noting that an individual buffer can only meaningfully be
      used by one thread at a time.
    </para>
    <para>
      To throw away all the data in your buffer and start from scratch,
      call <function>hb_buffer_reset(buf)</function>. If you want to
      throw away the string in the buffer but keep the options, you can
      instead call <function>hb_buffer_clear_contents(buf)</function>.
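      As a minimal sketch of the difference, the hypothetical helper
      below (its name and return convention are illustrative only)
      tracks one option, the replacement code point, which the
      HarfBuzz API reference describes as preserved by
      <function>hb_buffer_clear_contents()</function>:
      <programlisting language="C">
      #include &lt;hb.h&gt;

      /* Hypothetical helper: returns 1 if both calls behave as described. */
      static int reset_vs_clear_contents (void)
      {
        hb_buffer_t *buf = hb_buffer_create ();
        int ok = 1;

        hb_buffer_set_replacement_codepoint (buf, '?');
        hb_buffer_add_utf8 (buf, "abc", -1, 0, -1);

        /* clear_contents drops the text but keeps the replacement code point. */
        hb_buffer_clear_contents (buf);
        ok = ok &amp;&amp; hb_buffer_get_length (buf) == 0;
        ok = ok &amp;&amp; hb_buffer_get_replacement_codepoint (buf) == '?';

        /* reset drops the text and restores the default replacement code point. */
        hb_buffer_add_utf8 (buf, "abc", -1, 0, -1);
        hb_buffer_reset (buf);
        ok = ok &amp;&amp; hb_buffer_get_length (buf) == 0;
        ok = ok &amp;&amp; hb_buffer_get_replacement_codepoint (buf)
                   == HB_BUFFER_REPLACEMENT_CODEPOINT_DEFAULT;

        hb_buffer_destroy (buf);
        return ok;
      }
      </programlisting>
      The API reference lists only the Unicode functions and the
      replacement code point as preserved by
      <function>hb_buffer_clear_contents()</function>, so it is
      safest to set the direction, script, and language again after
      either call.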
    </para>
  </section>

  <section id="adding-text-to-the-buffer">
    <title>Adding text to the buffer</title>
    <para>
      Now we have a brand new HarfBuzz buffer. Let's start filling it
      with text! From HarfBuzz's perspective, a buffer is just a stream
      of Unicode code points, but your input string is probably in one of
      the standard Unicode character encodings (UTF-8, UTF-16, or
      UTF-32). HarfBuzz provides convenience functions that accept
      each of these encodings:
      <function>hb_buffer_add_utf8()</function>,
      <function>hb_buffer_add_utf16()</function>, and
      <function>hb_buffer_add_utf32()</function>. Other than the
      character encoding they accept, they function identically.
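      As a small illustration, the hypothetical helper below (the
      function name and the sample word are assumptions made for this
      sketch) adds the same five-character word in either UTF-8 or
      UTF-16 and reports how many code points the buffer then holds:
      <programlisting language="C">
      #include &lt;hb.h&gt;
      #include &lt;stdint.h&gt;

      /* Hypothetical helper: adds "hello" with an e-acute to a fresh buffer in
         the requested encoding (8 or 16) and returns the stored code point count. */
      static unsigned int add_hello (int encoding)
      {
        /* UTF-8: six bytes, NUL-terminated. UTF-16: five code units. */
        static const char     utf8[]  = "h\xC3\xA9llo";
        static const uint16_t utf16[] = { 0x68, 0xE9, 0x6C, 0x6C, 0x6F };
        hb_buffer_t *buf = hb_buffer_create ();
        unsigned int length;

        if (encoding == 8)
          hb_buffer_add_utf8 (buf, utf8, -1, 0, -1);
        else
          hb_buffer_add_utf16 (buf, utf16, 5, 0, 5);

        length = hb_buffer_get_length (buf);
        hb_buffer_destroy (buf);
        return length;
      }
      </programlisting>
      Both calls leave five code points in the buffer, even though
      the UTF-8 input is six bytes long, because lengths inside the
      buffer are counted in code points rather than in code units.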
    </para>
    <para>
      You can add UTF-8 text to a buffer by passing in the text array,
      the array's length, an offset into the array for the first
      character to add, and the length of the segment to add:
    </para>
    <programlisting language="C">
      void
      hb_buffer_add_utf8 (hb_buffer_t *buf,
                          const char *text,
                          int text_length,
                          unsigned int item_offset,
                          int item_length);
    </programlisting>
    <para>
      So, in practice, you can say:
    </para>
    <programlisting language="C">
      hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
    </programlisting>
    <para>
      This will append your new characters to
      <parameter>buf</parameter>, not replace its existing
      contents. Also, note that you can use <literal>-1</literal> in
      place of the first instance of <function>strlen(text)</function>
      if your text array is NUL-terminated.
      Similarly, you can also use
      <literal>-1</literal> as the final argument if you want to add
      everything from the offset to the end of the text.
    </para>
    <para>
      Whatever <parameter>item_offset</parameter> and
      <parameter>item_length</parameter> you provide, HarfBuzz will also
      attempt to grab the five characters <emphasis>before</emphasis>
      the offset point and the five characters
      <emphasis>after</emphasis> the designated end. These are the
      before and after "context" segments, which HarfBuzz uses
      internally to make shaping decisions. They will not be part of
      the final output, but they ensure that HarfBuzz's
      script-specific shaping operations are correct. If there are
      fewer than five characters available for the before or after
      contexts, HarfBuzz will just grab what is there.
    </para>
    <para>
      For longer text runs, such as full paragraphs, it might be
      tempting to only add smaller sub-segments to a buffer and
      shape them in piecemeal fashion.
      Generally, however, this is not a good
      idea, because many shaping decisions
      depend on this context information. For example, in Arabic
      and other connected scripts, HarfBuzz needs to know the code
      points before and after each character in order to correctly
      determine which glyph to return.
    </para>
    <para>
      The safest approach is to add all of the text available (even
      if your text contains a mix of scripts, directions, languages
      and fonts), then use <parameter>item_offset</parameter> and
      <parameter>item_length</parameter> to indicate which characters you
      want shaped (which must all have the same script, direction,
      language and font), so that HarfBuzz has access to any context.
    </para>
    <para>
      You can also add Unicode code points directly with
      <function>hb_buffer_add_codepoints()</function>. The arguments
      to this function are the same as those for the UTF
      encodings. But it is particularly important to note that
      HarfBuzz does not do validity checking on the text that is added
      to a buffer.
      Invalid code points will be replaced, but it is up
      to you to do any deeper sanity checking that is necessary.
    </para>

  </section>

  <section id="setting-buffer-properties">
    <title>Setting buffer properties</title>
    <para>
      Buffers containing input characters still need several
      properties set before HarfBuzz can shape their text correctly.
    </para>
    <para>
      Initially, all buffers are set to the
      <literal>HB_BUFFER_CONTENT_TYPE_INVALID</literal> content
      type. After adding text, the buffer should be set to
      <literal>HB_BUFFER_CONTENT_TYPE_UNICODE</literal> instead, which
      indicates that it contains unshaped input
      characters. After shaping, the buffer will have the
      <literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal> content type.
    </para>
    <para>
      <function>hb_buffer_add_utf8()</function> and the
      other UTF functions set the content type of their buffer
      automatically.
      But if you are reusing a buffer, you may want to
      check its state with
      <function>hb_buffer_get_content_type(buf)</function>. If
      necessary, you can set the content type with
    </para>
    <programlisting language="C">
      hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
    </programlisting>
    <para>
      to prepare for shaping.
    </para>
    <para>
      Buffers also need to carry information about the script,
      language, and text direction of their contents.
      You can set
      these properties individually:
    </para>
    <programlisting language="C">
      hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
      hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
      hb_buffer_set_language(buf, hb_language_from_string("en", -1));
    </programlisting>
    <para>
      However, since these properties are often repeated for
      multiple text runs, you can also save them in a
      <literal>hb_segment_properties_t</literal> for reuse:
    </para>
    <programlisting language="C">
      /* Declare the struct itself; the getter fills it in through a pointer. */
      hb_segment_properties_t savedprops;
      hb_buffer_get_segment_properties (buf, &amp;savedprops);
      ...
      hb_buffer_set_segment_properties (buf2, &amp;savedprops);
    </programlisting>
    <para>
      HarfBuzz also provides getter functions to retrieve a buffer's
      direction, script, and language properties individually.
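      A minimal sketch of these getters follows. The helper name and
      the choice of Arabic are assumptions made for illustration; the
      sketch sets all three properties and checks that the getters
      report them back:
      <programlisting language="C">
      #include &lt;hb.h&gt;
      #include &lt;string.h&gt;

      /* Hypothetical helper: sets the three segment properties on a fresh
         buffer and returns 1 if the individual getters report the same values. */
      static int properties_round_trip (void)
      {
        hb_buffer_t *buf = hb_buffer_create ();
        int ok;

        hb_buffer_set_direction (buf, HB_DIRECTION_RTL);
        hb_buffer_set_script (buf, HB_SCRIPT_ARABIC);
        hb_buffer_set_language (buf, hb_language_from_string ("ar", -1));

        ok = hb_buffer_get_direction (buf) == HB_DIRECTION_RTL
          &amp;&amp; hb_buffer_get_script (buf) == HB_SCRIPT_ARABIC
          &amp;&amp; strcmp (hb_language_to_string (hb_buffer_get_language (buf)), "ar") == 0;

        hb_buffer_destroy (buf);
        return ok;
      }
      </programlisting>
      Because <type>hb_language_t</type> is an opaque handle, the
      sketch converts it back to a string with
      <function>hb_language_to_string()</function> before comparing.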
    </para>
    <para>
      HarfBuzz recognizes four text directions in
      <type>hb_direction_t</type>: left-to-right
      (<literal>HB_DIRECTION_LTR</literal>), right-to-left (<literal>HB_DIRECTION_RTL</literal>),
      top-to-bottom (<literal>HB_DIRECTION_TTB</literal>), and
      bottom-to-top (<literal>HB_DIRECTION_BTT</literal>). For the
      script property, HarfBuzz uses identifiers based on the
      <ulink url="https://unicode.org/iso15924/">ISO 15924
      standard</ulink>. For languages, HarfBuzz uses tags based on the
      <ulink url="https://tools.ietf.org/html/bcp47">IETF BCP 47</ulink> standard.
    </para>
    <para>
      Helper functions are provided to convert character strings into
      the necessary script and language tag types.
    </para>
    <para>
      Two additional buffer properties to be aware of are the
      "invisible glyph" and the replacement code point. The
      replacement code point is inserted into buffer output in place of
      any invalid code points encountered in the input.
      By default, it
      is the Unicode <literal>REPLACEMENT CHARACTER</literal> code
      point, <literal>U+FFFD</literal> "�". You can change this with
    </para>
    <programlisting language="C">
      hb_buffer_set_replacement_codepoint(buf, replacement);
    </programlisting>
    <para>
      passing in the replacement Unicode code point as the
      <parameter>replacement</parameter> parameter.
    </para>
    <para>
      The invisible glyph is used to replace all output glyphs that
      are invisible. By default, the standard space character
      <literal>U+0020</literal> is used; you can replace this (for
      example, when using a font that provides script-specific
      spaces) with
    </para>
    <programlisting language="C">
      hb_buffer_set_invisible_glyph(buf, replacement_glyph);
    </programlisting>
    <para>
      Do note that in the <parameter>replacement_glyph</parameter>
      parameter, you must provide the glyph ID of the replacement you
      wish to use, not the Unicode code point.
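      One way to obtain such a glyph ID, sketched here on the
      assumption that you already have an <type>hb_font_t</type>
      (fonts are covered in the next chapter), is to ask the font
      with <function>hb_font_get_nominal_glyph()</function>. The
      helper name and its return convention are hypothetical:
      <programlisting language="C">
      #include &lt;hb.h&gt;

      /* Hypothetical helper: looks up the glyph ID that `font` maps `space_cp`
         to and, if the font has one, installs it as the invisible glyph.
         Returns 1 on success, 0 if the font has no glyph for that code point. */
      static int set_invisible_glyph_from_codepoint (hb_buffer_t    *buf,
                                                     hb_font_t      *font,
                                                     hb_codepoint_t  space_cp)
      {
        hb_codepoint_t glyph;

        if (!hb_font_get_nominal_glyph (font, space_cp, &amp;glyph))
          return 0;

        hb_buffer_set_invisible_glyph (buf, glyph);
        return 1;
      }
      </programlisting>
      For example, passing <literal>U+3000</literal> (IDEOGRAPHIC
      SPACE) as <parameter>space_cp</parameter> would select that
      font's ideographic space, if it has one.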
    </para>
    <para>
      HarfBuzz supports a few additional flags you might want to set
      on your buffer under certain circumstances. The
      <literal>HB_BUFFER_FLAG_BOT</literal> and
      <literal>HB_BUFFER_FLAG_EOT</literal> flags tell HarfBuzz
      that the buffer represents the beginning or end (respectively)
      of a text element (such as a paragraph or other block). Knowing
      this allows HarfBuzz to apply certain contextual font features
      when shaping, such as initial or final variants in connected
      scripts.
    </para>
    <para>
      <literal>HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES</literal>
      tells HarfBuzz not to hide glyphs with the
      <literal>Default_Ignorable</literal> property in Unicode. This
      property designates control characters and other non-printing
      code points, such as joiners and variation selectors. Normally
      HarfBuzz replaces them in the output buffer with zero-width
      space glyphs (using the "invisible glyph" property discussed
      above); setting this flag causes them to be printed, which can
      be helpful for troubleshooting.
    </para>
    <para>
      Conversely, setting the
      <literal>HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES</literal> flag
      tells HarfBuzz to remove <literal>Default_Ignorable</literal>
      glyphs from the output buffer entirely. Finally, setting the
      <literal>HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE</literal>
      flag tells HarfBuzz not to insert the dotted-circle glyph
      (<literal>U+25CC</literal>, "◌"), which is normally
      inserted into buffer output when broken character sequences are
      encountered (such as combining marks that are not attached to a
      base character).
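      These flags are bit values that are combined with bitwise OR
      and applied with <function>hb_buffer_set_flags()</function>. As
      a minimal sketch (the helper name is hypothetical), the code
      below marks a buffer as holding a whole paragraph and then adds
      one more flag without losing the ones already set:
      <programlisting language="C">
      #include &lt;hb.h&gt;

      /* Hypothetical helper: returns the flags in effect after both calls. */
      static hb_buffer_flags_t configure_flags (hb_buffer_t *buf)
      {
        hb_buffer_set_flags (buf, (hb_buffer_flags_t) (HB_BUFFER_FLAG_BOT
                                                       | HB_BUFFER_FLAG_EOT));

        /* hb_buffer_set_flags() replaces the whole set, so OR the new flag in. */
        hb_buffer_set_flags (buf, (hb_buffer_flags_t) (hb_buffer_get_flags (buf)
                                                       | HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES));

        return hb_buffer_get_flags (buf);
      }
      </programlisting>
      Reading the current value with
      <function>hb_buffer_get_flags()</function> first is what keeps
      the earlier flags in place.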
    </para>
  </section>

  <section id="customizing-unicode-functions">
    <title>Customizing Unicode functions</title>
    <para>
      HarfBuzz requires some simple functions for accessing
      information from the Unicode Character Database (such as the
      <literal>General_Category</literal> (gc) and
      <literal>Script</literal> (sc) properties) that is useful
      for shaping, as well as some useful operations like composing and
      decomposing code points.
    </para>
    <para>
      HarfBuzz includes its own internal, lightweight set of Unicode
      functions. At build time, it is also possible to compile support
      for some other options, such as the Unicode functions provided
      by GLib or the International Components for Unicode (ICU)
      library. Generally, this option is only of interest for client
      programs that have specific integration requirements or that do
      a significant amount of customization.
    </para>
    <para>
      If your program has access to other Unicode functions, however,
      such as through a system library or application framework, you
      might prefer to use those instead of the built-in
      options. HarfBuzz supports this by implementing its Unicode
      functions as a set of virtual methods that you can replace,
      without otherwise affecting HarfBuzz's functionality.
    </para>
    <para>
      The Unicode functions are specified in a structure called
      <literal>unicode_funcs</literal> which is attached to each
      buffer. But even though <literal>unicode_funcs</literal> is
      associated with a <type>hb_buffer_t</type>, the functions
      themselves are called by other HarfBuzz APIs that access
      buffers, so it would be unwise for you to hook different
      functions into different buffers.
    </para>
    <para>
      In addition, you can mark your <literal>unicode_funcs</literal>
      as immutable by calling
      <function>hb_unicode_funcs_make_immutable (ufuncs)</function>.
      This is especially useful if your code is a
      library or framework that will have its own client programs. By
      marking your Unicode function choices as immutable, you prevent
      your own client programs from changing the
      <literal>unicode_funcs</literal> configuration and introducing
      inconsistencies and errors downstream.
    </para>
    <para>
      You can retrieve the Unicode-functions configuration for
      your buffer by calling <function>hb_buffer_get_unicode_funcs()</function>:
    </para>
    <programlisting language="C">
      hb_unicode_funcs_t *ufunctions;
      ufunctions = hb_buffer_get_unicode_funcs(buf);
    </programlisting>
    <para>
      The current version of <literal>unicode_funcs</literal> uses six functions:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          <function>hb_unicode_combining_class_func_t</function>:
          returns the Canonical Combining Class of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_general_category_func_t</function>:
          returns the General Category (gc) of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_mirroring_func_t</function>: returns
          the Mirroring Glyph code point (for bi-directional
          replacement) of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_script_func_t</function>: returns the
          Script (sc) property of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_compose_func_t</function>: returns the
          canonical composition of a sequence of two code points.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_decompose_func_t</function>: returns
          the canonical decomposition of a code point.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      Note, however, that future HarfBuzz releases may alter this set.
    </para>
    <para>
      Each Unicode function has a corresponding setter, with which you
      can install your replacement function as a callback. For example,
      to replace
      <function>hb_unicode_general_category_func_t</function>, you can call
    </para>
    <programlisting language="C">
      hb_unicode_funcs_set_general_category_func (ufuncs, func, user_data, destroy);
    </programlisting>
    <para>
      Virtualizing this set of Unicode functions is primarily intended
      to improve portability.
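      As a minimal sketch of the setter pattern, the code below
      creates a <literal>unicode_funcs</literal> object that inherits
      the defaults, overrides one callback, marks the result
      immutable, and attaches it to a buffer. The callback and helper
      names are hypothetical, and the callback's answer is
      deliberately trivial:
      <programlisting language="C">
      #include &lt;hb.h&gt;
      #include &lt;stddef.h&gt;

      /* Hypothetical callback: reports every code point as an uppercase letter.
         A real replacement would consult your own Unicode data instead. */
      static hb_unicode_general_category_t
      my_general_category (hb_unicode_funcs_t *ufuncs,
                           hb_codepoint_t      unicode,
                           void               *user_data)
      {
        (void) ufuncs; (void) unicode; (void) user_data;
        return HB_UNICODE_GENERAL_CATEGORY_UPPERCASE_LETTER;
      }

      /* Hypothetical helper: installs the callback above on `buf`. */
      static void install_custom_unicode_funcs (hb_buffer_t *buf)
      {
        /* Callbacks that are not overridden fall back to the parent's. */
        hb_unicode_funcs_t *ufuncs =
          hb_unicode_funcs_create (hb_unicode_funcs_get_default ());

        hb_unicode_funcs_set_general_category_func (ufuncs, my_general_category,
                                                    NULL, NULL);
        hb_unicode_funcs_make_immutable (ufuncs);

        hb_buffer_set_unicode_funcs (buf, ufuncs); /* buffer takes its own reference */
        hb_unicode_funcs_destroy (ufuncs);         /* so ours can be dropped */
      }
      </programlisting>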
      There is no need for every client
      program to make the effort to replace the default options, so if
      you are unsure, do not feel any pressure to customize
      <literal>unicode_funcs</literal>.
    </para>
  </section>

</chapter>