<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="buffers-language-script-and-direction">
  <title>Buffers, language, script and direction</title>
  <para>
    The input to the HarfBuzz shaper is a series of Unicode characters, stored in a
    buffer. In this chapter, we'll look at how to set up a buffer with
    the text that we want and how to customize the properties of the
    buffer. We'll also look at a piece of lower-level machinery that
    you will need to understand before proceeding: the functions that
    HarfBuzz uses to retrieve Unicode information.
  </para>
  <para>
    After shaping is complete, HarfBuzz puts its output back
    into the buffer. But getting that output requires setting up a
    face and a font first, so we will look at that in the next chapter
    instead of here.
  </para>
  <section id="creating-and-destroying-buffers">
    <title>Creating and destroying buffers</title>
    <para>
      As we saw in our <emphasis>Getting Started</emphasis> example, a
      buffer is created and
      initialized with <function>hb_buffer_create()</function>. This
      produces a new, empty buffer object, instantiated with some
      default values and ready to accept your Unicode strings.
    </para>
    <para>
      HarfBuzz reference-counts the objects (such as buffers) that it
      creates, so you do not free their memory yourself. When you have
      finished working with a buffer, release your reference to it by
      calling <function>hb_buffer_destroy()</function>:
    </para>
    <programlisting language="C">
      hb_buffer_t *buf = hb_buffer_create();
      ...
      hb_buffer_destroy(buf);
    </programlisting>
    <para>
      This will destroy the object and free its associated memory,
      unless some other part of the program still holds a reference to
      this buffer. If you acquire a HarfBuzz buffer from another
      subsystem and want to ensure that it is not freed when someone
      else destroys it, you should increase its reference count:
    </para>
    <programlisting language="C">
      void somefunc(hb_buffer_t *buf) {
        buf = hb_buffer_reference(buf);
        ...
    </programlisting>
    <para>
      And then decrease it once you're done with it:
    </para>
    <programlisting language="C">
        hb_buffer_destroy(buf);
      }
    </programlisting>
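    <para>
      Put together, the reference-counting pattern might look like the
      following minimal sketch. The function names
      <function>count_items()</function> and
      <function>lifecycle_demo()</function> are hypothetical and exist
      only for illustration; every <literal>hb_</literal> call is part
      of the public buffer API.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Hypothetical consumer: holds its own reference while it works. */
unsigned int
count_items (hb_buffer_t *buf)
{
  unsigned int n;

  buf = hb_buffer_reference (buf);   /* reference count goes up by one */
  n = hb_buffer_get_length (buf);
  hb_buffer_destroy (buf);           /* and back down; the caller's reference remains */
  return n;
}

/* Hypothetical owner: creates the buffer and releases it last. */
unsigned int
lifecycle_demo (void)
{
  hb_buffer_t *buf = hb_buffer_create ();
  unsigned int n;

  hb_buffer_add_utf8 (buf, "abc", -1, 0, -1);
  n = count_items (buf);
  hb_buffer_destroy (buf);           /* last reference: the buffer is freed here */
  return n;
}
]]></programlisting>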
    <para>
      While we are on the subject of reference-counting buffers, it is
      worth noting that an individual buffer can only meaningfully be
      used by one thread at a time.
    </para>
    <para>
      To throw away all the data in your buffer and start from scratch,
      call <function>hb_buffer_reset(buf)</function>. If you want to
      throw away the string in the buffer but keep the options, you can
      instead call <function>hb_buffer_clear_contents(buf)</function>.
    </para>
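    <para>
      The difference between the two calls can be observed with a
      small sketch like the one below. It checks only one setting, the
      replacement code point, and assumes that it is among the options
      that <function>hb_buffer_clear_contents()</function> preserves;
      consult the API reference for the full list of what each call
      resets. The function name
      <function>clear_versus_reset()</function> is hypothetical.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Returns 1 if the observations described in the comments hold. */
int
clear_versus_reset (void)
{
  hb_buffer_t *buf = hb_buffer_create ();
  int ok;

  hb_buffer_set_replacement_codepoint (buf, 0x003F);   /* '?' instead of U+FFFD */
  hb_buffer_add_utf8 (buf, "abc", -1, 0, -1);

  /* Text is discarded, the replacement code point is kept. */
  hb_buffer_clear_contents (buf);
  ok = hb_buffer_get_length (buf) == 0
       && hb_buffer_get_replacement_codepoint (buf) == 0x003F;

  /* A full reset returns the replacement code point to its default. */
  hb_buffer_reset (buf);
  ok = ok && hb_buffer_get_replacement_codepoint (buf) == 0xFFFD;

  hb_buffer_destroy (buf);
  return ok;
}
]]></programlisting>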
  </section>

  <section id="adding-text-to-the-buffer">
    <title>Adding text to the buffer</title>
    <para>
      Now we have a brand new HarfBuzz buffer. Let's start filling it
      with text! From HarfBuzz's perspective, a buffer is just a stream
      of Unicode code points, but your input string is probably in one of
      the standard Unicode character encodings (UTF-8, UTF-16, or
      UTF-32). HarfBuzz provides convenience functions that accept
      each of these encodings:
      <function>hb_buffer_add_utf8()</function>,
      <function>hb_buffer_add_utf16()</function>, and
      <function>hb_buffer_add_utf32()</function>. Other than the
      character encoding they accept, they function identically.
    </para>
    <para>
      You can add UTF-8 text to a buffer by passing in the text array,
      the array's length, an offset into the array for the first
      character to add, and the length of the segment to add. Lengths
      and offsets are counted in code units of the encoding (bytes,
      for UTF-8):
    </para>
    <programlisting language="C">
      void
      hb_buffer_add_utf8 (hb_buffer_t  *buf,
                          const char   *text,
                          int           text_length,
                          unsigned int  item_offset,
                          int           item_length);
    </programlisting>
    <para>
      So, in practice, you can say:
    </para>
    <programlisting language="C">
      hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
    </programlisting>
    <para>
      This will append your new characters to
      <parameter>buf</parameter>, not replace its existing
      contents. Also, note that you can use <literal>-1</literal> in
      place of the first instance of <function>strlen(text)</function>
      if your text array is NUL-terminated. Similarly, you can also use
      <literal>-1</literal> as the final argument if you want to add
      everything from <parameter>item_offset</parameter> to the end of
      the text.
    </para>
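    <para>
      As a small illustration of both points, the sketch below adds
      the same string twice, once with explicit lengths and once with
      <literal>-1</literal>, and reports how many items the buffer
      then holds. The function name <function>add_twice()</function>
      is hypothetical.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>
#include <string.h>

/* Adds text to a fresh buffer twice and returns the buffer length. */
unsigned int
add_twice (const char *text)
{
  hb_buffer_t *buf = hb_buffer_create ();
  unsigned int n;

  /* Explicit lengths, as in the call shown above. */
  hb_buffer_add_utf8 (buf, text, (int) strlen (text), 0, (int) strlen (text));

  /* Same effect for NUL-terminated text; this appends to the buffer. */
  hb_buffer_add_utf8 (buf, text, -1, 0, -1);

  n = hb_buffer_get_length (buf);   /* counted in code points, not bytes */
  hb_buffer_destroy (buf);
  return n;
}
]]></programlisting>
    <para>
      For the three-character string <literal>"abc"</literal>, the
      buffer ends up holding six items, because the second call
      appends rather than replaces.
    </para>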
    <para>
      Whatever <parameter>item_offset</parameter> and
      <parameter>item_length</parameter> you provide, HarfBuzz will also
      attempt to grab the five characters <emphasis>before</emphasis>
      the offset point and the five characters
      <emphasis>after</emphasis> the designated end. These are the
      before and after "context" segments, which HarfBuzz uses
      internally to make shaping decisions. They will not be part of
      the final output, but they ensure that HarfBuzz's
      script-specific shaping operations are correct. If there are
      fewer than five characters available for the before or after
      contexts, HarfBuzz will just grab what is there.
    </para>
    <para>
      For longer text runs, such as full paragraphs, it might be
      tempting to add only smaller sub-segments to a buffer and
      shape them in piecemeal fashion. Generally, however, this is not
      a good idea, because a lot of shaping decisions are
      dependent on this context information. For example, in Arabic
      and other connected scripts, HarfBuzz needs to know the code
      points before and after each character in order to correctly
      determine which glyph to return.
    </para>
    <para>
      The safest approach is to pass in all of the text available (even
      if your text contains a mix of scripts, directions, languages
      and fonts), then use <parameter>item_offset</parameter> and
      <parameter>item_length</parameter> to indicate which characters you
      want shaped (which must all have the same script, direction,
      language and font), so that HarfBuzz has access to any context.
    </para>
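    <para>
      A minimal sketch of this approach follows. The whole string is
      passed as <parameter>text</parameter>, while
      <parameter>item_offset</parameter> and
      <parameter>item_length</parameter> select a single word. The
      sample string and the function name
      <function>add_word_with_context()</function> are hypothetical;
      for UTF-8 input the offset and length are byte counts.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Adds only "world" to the buffer; the surrounding text is left
 * available to HarfBuzz as context. Returns the buffer length. */
unsigned int
add_word_with_context (void)
{
  const char *paragraph = "hello world again";
  hb_buffer_t *buf = hb_buffer_create ();
  unsigned int n;

  /* "world" starts at byte 6 and is 5 bytes long. */
  hb_buffer_add_utf8 (buf, paragraph, -1, 6, 5);

  n = hb_buffer_get_length (buf);   /* only the selected items are counted */
  hb_buffer_destroy (buf);
  return n;
}
]]></programlisting>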
    <para>
      You can also add Unicode code points directly with
      <function>hb_buffer_add_codepoints()</function>. The arguments
      to this function are the same as those for the UTF
      encodings. It is particularly important to note, however, that
      <function>hb_buffer_add_codepoints()</function> does not do
      validity checking on the text that is added to a buffer. The UTF
      functions replace invalid input with the replacement code
      point, but here it is up to you to do any sanity checking
      necessary.
    </para>
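    <para>
      For example, code points that are already decoded can be added
      as an array of <type>hb_codepoint_t</type>. This is a sketch
      only; the function name
      <function>add_raw_codepoints()</function> and the sample
      characters are illustrative.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Adds two code points directly and returns the buffer length. */
unsigned int
add_raw_codepoints (void)
{
  /* U+0627 ARABIC LETTER ALEF, U+0644 ARABIC LETTER LAM */
  const hb_codepoint_t text[] = { 0x0627, 0x0644 };
  hb_buffer_t *buf = hb_buffer_create ();
  unsigned int n;

  /* Lengths and offsets count array elements here, not bytes. */
  hb_buffer_add_codepoints (buf, text, 2, 0, 2);

  n = hb_buffer_get_length (buf);
  hb_buffer_destroy (buf);
  return n;
}
]]></programlisting>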

  </section>

  <section id="setting-buffer-properties">
    <title>Setting buffer properties</title>
    <para>
      Buffers containing input characters still need several
      properties set before HarfBuzz can shape their text correctly.
    </para>
    <para>
      Initially, all buffers are set to the
      <literal>HB_BUFFER_CONTENT_TYPE_INVALID</literal> content
      type. After adding text, the buffer should be set to
      <literal>HB_BUFFER_CONTENT_TYPE_UNICODE</literal> instead, which
      indicates that it contains un-shaped input
      characters. After shaping, the buffer will have the
      <literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal> content type.
    </para>
    <para>
      <function>hb_buffer_add_utf8()</function> and the
      other UTF functions set the content type of their buffer
      automatically. But if you are reusing a buffer you may want to
      check its state with
      <function>hb_buffer_get_content_type(buffer)</function>. If
      necessary you can set the content type with
    </para>
    <programlisting language="C">
      hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
    </programlisting>
    <para>
      to prepare for shaping.
    </para>
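    <para>
      The first two content-type states can be checked with a short
      sketch such as the following (the function name
      <function>content_type_transitions()</function> is
      hypothetical). The third state requires a font and a call to the
      shaper, so it is not shown here.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Returns 1 if a new buffer starts out INVALID and becomes UNICODE
 * once text has been added. */
int
content_type_transitions (void)
{
  hb_buffer_t *buf = hb_buffer_create ();
  int ok;

  ok = hb_buffer_get_content_type (buf) == HB_BUFFER_CONTENT_TYPE_INVALID;

  hb_buffer_add_utf8 (buf, "abc", -1, 0, -1);   /* sets the type for us */
  ok = ok && hb_buffer_get_content_type (buf) == HB_BUFFER_CONTENT_TYPE_UNICODE;

  hb_buffer_destroy (buf);
  return ok;
}
]]></programlisting>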
    <para>
      Buffers also need to carry information about the script,
      language, and text direction of their contents. You can set
      these properties individually:
    </para>
    <programlisting language="C">
      hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
      hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
      hb_buffer_set_language(buf, hb_language_from_string("en", -1));
    </programlisting>
    <para>
      However, since these properties are often repeated for
      multiple text runs, you can also save them in an
      <literal>hb_segment_properties_t</literal> for reuse:
    </para>
    <programlisting language="C">
      /* The caller owns the structure; HarfBuzz fills it in. */
      hb_segment_properties_t savedprops;
      hb_buffer_get_segment_properties (buf, &amp;savedprops);
      ...
      hb_buffer_set_segment_properties (buf2, &amp;savedprops);
    </programlisting>
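    <para>
      A self-contained version of this save-and-reuse pattern might
      look like the sketch below. The function name
      <function>copy_segment_properties()</function> is hypothetical;
      <function>hb_segment_properties_equal()</function> is used only
      to confirm that both buffers end up with the same properties.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Returns 1 if the properties of one buffer were carried over to another. */
int
copy_segment_properties (void)
{
  hb_buffer_t *buf = hb_buffer_create ();
  hb_buffer_t *buf2 = hb_buffer_create ();
  hb_segment_properties_t saved, copied;
  int ok;

  hb_buffer_set_direction (buf, HB_DIRECTION_LTR);
  hb_buffer_set_script (buf, HB_SCRIPT_LATIN);
  hb_buffer_set_language (buf, hb_language_from_string ("en", -1));

  /* Save from the first buffer, apply to the second. */
  hb_buffer_get_segment_properties (buf, &saved);
  hb_buffer_set_segment_properties (buf2, &saved);

  hb_buffer_get_segment_properties (buf2, &copied);
  ok = hb_segment_properties_equal (&saved, &copied)
       && hb_buffer_get_script (buf2) == HB_SCRIPT_LATIN;

  hb_buffer_destroy (buf2);
  hb_buffer_destroy (buf);
  return ok;
}
]]></programlisting>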
    <para>
      HarfBuzz also provides getter functions to retrieve a buffer's
      direction, script, and language properties individually.
    </para>
    <para>
      HarfBuzz recognizes four text directions in
      <type>hb_direction_t</type>: left-to-right
      (<literal>HB_DIRECTION_LTR</literal>), right-to-left (<literal>HB_DIRECTION_RTL</literal>),
      top-to-bottom (<literal>HB_DIRECTION_TTB</literal>), and
      bottom-to-top (<literal>HB_DIRECTION_BTT</literal>).  For the
      script property, HarfBuzz uses identifiers based on the
      <ulink
      url="https://unicode.org/iso15924/">ISO 15924
      standard</ulink>. For languages, HarfBuzz uses tags based on the
      <ulink url="https://tools.ietf.org/html/bcp47">IETF BCP 47</ulink> standard.
    </para>
    <para>
      Helper functions are provided to convert character strings into
      the necessary script and language tag types.
    </para>
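    <para>
      As a sketch of how those helpers fit together with the getters,
      the following example builds all three properties from strings
      and reads them back. The function name
      <function>props_from_strings()</function> and the sample values
      are illustrative.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>
#include <string.h>

/* Returns 1 if the string-derived properties read back as expected. */
int
props_from_strings (void)
{
  hb_buffer_t *buf = hb_buffer_create ();
  int ok;

  hb_buffer_set_direction (buf, hb_direction_from_string ("rtl", -1));
  hb_buffer_set_script (buf, hb_script_from_string ("Arab", -1));   /* ISO 15924 tag */
  hb_buffer_set_language (buf, hb_language_from_string ("ar", -1)); /* BCP 47 tag */

  ok = hb_buffer_get_direction (buf) == HB_DIRECTION_RTL
       && hb_buffer_get_script (buf) == HB_SCRIPT_ARABIC
       && strcmp (hb_language_to_string (hb_buffer_get_language (buf)), "ar") == 0;

  hb_buffer_destroy (buf);
  return ok;
}
]]></programlisting>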
    <para>
      Two additional buffer properties to be aware of are the
      "invisible glyph" and the replacement code point. The
      replacement code point is inserted into buffer output in place of
      any invalid code points encountered in the input. By default, it
      is the Unicode <literal>REPLACEMENT CHARACTER</literal> code
      point, <literal>U+FFFD</literal> "&#xFFFD;". You can change this with
    </para>
    <programlisting language="C">
      hb_buffer_set_replacement_codepoint(buf, replacement);
    </programlisting>
    <para>
      passing in the replacement Unicode code point as the
      <parameter>replacement</parameter> parameter.
    </para>
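    <para>
      The effect can be seen by feeding the buffer a byte that is
      never valid in UTF-8, as in this minimal sketch. The function
      name <function>first_codepoint_of_invalid_input()</function> is
      hypothetical, and <literal>U+003F</literal> ("?") is an
      arbitrary choice of replacement.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Returns the code point stored for a single invalid input byte. */
hb_codepoint_t
first_codepoint_of_invalid_input (void)
{
  hb_buffer_t *buf = hb_buffer_create ();
  hb_glyph_info_t *info;
  hb_codepoint_t cp;

  hb_buffer_set_replacement_codepoint (buf, 0x003F);   /* '?' */
  hb_buffer_add_utf8 (buf, "\xff", 1, 0, 1);           /* 0xFF cannot occur in UTF-8 */

  /* Before shaping, the codepoint field holds the Unicode code point. */
  info = hb_buffer_get_glyph_infos (buf, NULL);
  cp = info[0].codepoint;

  hb_buffer_destroy (buf);
  return cp;
}
]]></programlisting>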
    <para>
      The invisible glyph is used to replace all output glyphs that
      are invisible. By default, the glyph for the standard space
      character <literal>U+0020</literal> is used; you can replace
      this (for example, when using a font that provides
      script-specific spaces) with
    </para>
    <programlisting language="C">
      hb_buffer_set_invisible_glyph(buf, replacement_glyph);
    </programlisting>
    <para>
      Note that the <parameter>replacement_glyph</parameter>
      parameter must be the glyph ID of the replacement you
      wish to use, not its Unicode code point.
    </para>
    <para>
      HarfBuzz supports a few additional flags you might want to set
      on your buffer under certain circumstances. The
      <literal>HB_BUFFER_FLAG_BOT</literal> and
      <literal>HB_BUFFER_FLAG_EOT</literal> flags tell HarfBuzz
      that the buffer represents the beginning or end (respectively)
      of a text element (such as a paragraph or other block). Knowing
      this allows HarfBuzz to apply certain contextual font features
      when shaping, such as initial or final variants in connected
      scripts.
    </para>
    <para>
      <literal>HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES</literal>
      tells HarfBuzz not to hide glyphs with the
      <literal>Default_Ignorable</literal> property in Unicode. This
      property designates control characters and other non-printing
      code points, such as joiners and variation selectors. Normally
      HarfBuzz replaces them in the output buffer with zero-width
      space glyphs (using the "invisible glyph" property discussed
      above); setting this flag causes them to be printed, which can
      be helpful for troubleshooting.
    </para>
    <para>
      Conversely, setting the
      <literal>HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES</literal> flag
      tells HarfBuzz to remove <literal>Default_Ignorable</literal>
      glyphs from the output buffer entirely. Finally, setting the
      <literal>HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE</literal>
      flag tells HarfBuzz not to insert the dotted-circle glyph
      (<literal>U+25CC</literal>, "&#x25CC;"), which is normally
      inserted into buffer output when broken character sequences are
      encountered (such as combining marks that are not attached to a
      base character).
    </para>
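    <para>
      Flags are combined with bitwise OR and applied in a single call
      to <function>hb_buffer_set_flags()</function>. The sketch below
      marks a buffer as holding a complete paragraph and suppresses
      the dotted circle; the function name
      <function>set_paragraph_flags()</function> and the particular
      combination of flags are illustrative.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Returns 1 if the flags read back exactly as they were set. */
int
set_paragraph_flags (void)
{
  hb_buffer_t *buf = hb_buffer_create ();
  hb_buffer_flags_t flags;
  int ok;

  /* The cast keeps the expression valid when compiled as C++ too. */
  flags = (hb_buffer_flags_t) (HB_BUFFER_FLAG_BOT
                               | HB_BUFFER_FLAG_EOT
                               | HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE);

  hb_buffer_set_flags (buf, flags);   /* replaces any flags set earlier */
  ok = hb_buffer_get_flags (buf) == flags;

  hb_buffer_destroy (buf);
  return ok;
}
]]></programlisting>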
  </section>

  <section id="customizing-unicode-functions">
    <title>Customizing Unicode functions</title>
    <para>
      HarfBuzz requires some simple functions for accessing
      information from the Unicode Character Database (such as the
      <literal>General_Category</literal> (gc) and
      <literal>Script</literal> (sc) properties) that is useful
      for shaping, as well as some useful operations like composing and
      decomposing code points.
    </para>
    <para>
      HarfBuzz includes its own internal, lightweight set of Unicode
      functions. At build time, it is also possible to compile support
      for some other options, such as the Unicode functions provided
      by GLib or the International Components for Unicode (ICU)
      library. Generally, this option is only of interest for client
      programs that have specific integration requirements or that do
      a significant amount of customization.
    </para>
    <para>
      If your program has access to other Unicode functions, however,
      such as through a system library or application framework, you
      might prefer to use those instead of the built-in
      options. HarfBuzz supports this by implementing its Unicode
      functions as a set of virtual methods that you can replace
      without otherwise affecting HarfBuzz's functionality.
    </para>
    <para>
      The Unicode functions are specified in a structure called
      <literal>unicode_funcs</literal> which is attached to each
      buffer. But even though <literal>unicode_funcs</literal> is
      associated with a <type>hb_buffer_t</type>, the functions
      themselves are called by other HarfBuzz APIs that access
      buffers, so it would be unwise for you to hook different
      functions into different buffers.
    </para>
    <para>
      In addition, you can mark your <literal>unicode_funcs</literal>
      as immutable by calling
      <function>hb_unicode_funcs_make_immutable (ufuncs)</function>.
      This is especially useful if your code is a
      library or framework that will have its own client programs. By
      marking your Unicode function choices as immutable, you prevent
      your own client programs from changing the
      <literal>unicode_funcs</literal> configuration and introducing
      inconsistencies and errors downstream.
    </para>
    <para>
      You can retrieve the Unicode-functions configuration for
      your buffer by calling <function>hb_buffer_get_unicode_funcs()</function>:
    </para>
    <programlisting language="C">
      hb_unicode_funcs_t *ufunctions;
      ufunctions = hb_buffer_get_unicode_funcs(buf);
    </programlisting>
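    <para>
      Once retrieved, the structure can be queried through the
      corresponding accessor functions. The sketch below assumes a
      HarfBuzz build that includes the default built-in Unicode
      functions; the function name
      <function>query_default_unicode_funcs()</function> and the
      sample code points are illustrative.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Returns 1 if three well-known property lookups give the expected answers. */
int
query_default_unicode_funcs (void)
{
  hb_buffer_t *buf = hb_buffer_create ();
  /* Borrowed pointer: the buffer keeps ownership, so we do not destroy it. */
  hb_unicode_funcs_t *ufuncs = hb_buffer_get_unicode_funcs (buf);
  int ok;

  ok = hb_unicode_general_category (ufuncs, 0x0041)     /* 'A' */
         == HB_UNICODE_GENERAL_CATEGORY_UPPERCASE_LETTER
       && hb_unicode_script (ufuncs, 0x0627)            /* ARABIC LETTER ALEF */
         == HB_SCRIPT_ARABIC
       && hb_unicode_mirroring (ufuncs, 0x0028)         /* '(' mirrors to ')' */
         == 0x0029;

  hb_buffer_destroy (buf);
  return ok;
}
]]></programlisting>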
    <para>
      The current version of <literal>unicode_funcs</literal> uses six functions:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          <function>hb_unicode_combining_class_func_t</function>:
          returns the Canonical Combining Class of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_general_category_func_t</function>:
          returns the General Category (gc) of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_mirroring_func_t</function>: returns
          the Mirroring Glyph code point (for bi-directional
          replacement) of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_script_func_t</function>: returns the
          Script (sc) property of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_compose_func_t</function>: returns the
          canonical composition of a sequence of two code points.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_decompose_func_t</function>: returns
          the canonical decomposition of a code point.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      Note, however, that future HarfBuzz releases may alter this set.
    </para>
    <para>
      Each Unicode function has a corresponding setter, with which you
      can register your replacement function as a callback. For example,
      to replace
      <function>hb_unicode_general_category_func_t</function>, you can call
    </para>
    <programlisting language="C">
      hb_unicode_funcs_set_general_category_func (ufuncs, func, user_data, destroy);
    </programlisting>
    <para>
      Virtualizing this set of Unicode functions is primarily intended
      to improve portability. There is no need for every client
      program to make the effort to replace the default options, so if
      you are unsure, do not feel any pressure to customize
      <literal>unicode_funcs</literal>.
    </para>
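    <para>
      For readers who do need to customize, one possible arrangement
      is sketched below: a new <literal>unicode_funcs</literal> object
      is created with the defaults as its parent, a single callback is
      overridden, and the result is made immutable and attached to a
      buffer. The callback <function>my_general_category()</function>,
      the wrapper <function>category_with_custom_funcs()</function>,
      and the special treatment of <literal>U+E000</literal> are all
      hypothetical.
    </para>
    <programlisting language="C"><![CDATA[
#include <hb.h>

/* Hypothetical override: treat U+E000 as a letter, defer everything
 * else to the parent (default) functions. */
static hb_unicode_general_category_t
my_general_category (hb_unicode_funcs_t *ufuncs,
                     hb_codepoint_t      unicode,
                     void               *user_data)
{
  (void) user_data;   /* unused in this sketch */
  if (unicode == 0xE000)
    return HB_UNICODE_GENERAL_CATEGORY_OTHER_LETTER;
  return hb_unicode_general_category (hb_unicode_funcs_get_parent (ufuncs),
                                      unicode);
}

/* Looks up a General Category through a buffer's customized functions. */
hb_unicode_general_category_t
category_with_custom_funcs (hb_codepoint_t unicode)
{
  hb_buffer_t *buf = hb_buffer_create ();
  hb_unicode_funcs_t *ufuncs =
    hb_unicode_funcs_create (hb_unicode_funcs_get_default ());
  hb_unicode_general_category_t gc;

  /* No user data and no destroy callback are needed here. */
  hb_unicode_funcs_set_general_category_func (ufuncs, my_general_category,
                                              NULL, NULL);
  hb_unicode_funcs_make_immutable (ufuncs);   /* later setters are ignored */
  hb_buffer_set_unicode_funcs (buf, ufuncs);  /* the buffer takes a reference */

  gc = hb_unicode_general_category (hb_buffer_get_unicode_funcs (buf), unicode);

  hb_unicode_funcs_destroy (ufuncs);          /* drop our own reference */
  hb_buffer_destroy (buf);
  return gc;
}
]]></programlisting>
    <para>
      Creating the new object with the defaults as its parent means
      that the five functions left untouched keep their default
      behavior.
    </para>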
  </section>

</chapter>