xref: /aosp_15_r20/external/marisa-trie/docs/readme.en.html (revision ab8db090fce404b23716c4c9194221ee27efe31c)
1*ab8db090SAndroid Build Coastguard Worker<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
2*ab8db090SAndroid Build Coastguard Worker<html lang="en">
3*ab8db090SAndroid Build Coastguard Worker <head>
4*ab8db090SAndroid Build Coastguard Worker  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
5*ab8db090SAndroid Build Coastguard Worker  <title>MARISA: Matching Algorithm with Recursively Implemented StorAge</title>
6*ab8db090SAndroid Build Coastguard Worker  <link rel="stylesheet" type="text/css" href="./style.css">
7*ab8db090SAndroid Build Coastguard Worker </head>
8*ab8db090SAndroid Build Coastguard Worker <body>
9*ab8db090SAndroid Build Coastguard Worker  <div id="header">
10*ab8db090SAndroid Build Coastguard Worker   <div class="left">MARISA: Matching Algorithm with Recursively Implemented StorAge</div>
11*ab8db090SAndroid Build Coastguard Worker   <div class="right">Last modified: 14 Jun 2020</div>
12*ab8db090SAndroid Build Coastguard Worker   <div class="end"></div>
13*ab8db090SAndroid Build Coastguard Worker  </div><!-- header -->
14*ab8db090SAndroid Build Coastguard Worker  <div id="body" style="text-align: justify">
15*ab8db090SAndroid Build Coastguard Worker   <h1>MARISA: Matching Algorithm with Recursively Implemented StorAge</h1>
16*ab8db090SAndroid Build Coastguard Worker   <p id="abstract">
17*ab8db090SAndroid Build Coastguard Worker    <span id="heading">Abstract: </span>
18*ab8db090SAndroid Build Coastguard Worker     Matching Algorithm with Recursively Implemented StorAge (MARISA) is a space-efficient trie data structure. libmarisa is a C++ library for an implementation of MARISA. Users can build dictionaries and search keys from the dictionaries. The package also provides command line tools to test basic operations of libmarisa, and the tools are useful to test the performance.
19*ab8db090SAndroid Build Coastguard Worker   </p><!-- abstract -->
20*ab8db090SAndroid Build Coastguard Worker
21*ab8db090SAndroid Build Coastguard Worker   <div class="section">
22*ab8db090SAndroid Build Coastguard Worker    <h2><a name="introduction">Introduction</a></h2>
23*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
24*ab8db090SAndroid Build Coastguard Worker     <h3><a name="overview">Overview</a></h3>
25*ab8db090SAndroid Build Coastguard Worker     <p>
26*ab8db090SAndroid Build Coastguard Worker      Matching Algorithm with Recursively Implemented StorAge (MARISA) is a space-efficient, fairly fast, and static trie data structure. MARISA serves as a dictionary structure, and by definition, it supports exact match lookup, which is the basic operation of dictionary. In addition, MARISA supports reverse lookup, common prefix search, and predictive search.
27*ab8db090SAndroid Build Coastguard Worker     </p>
28*ab8db090SAndroid Build Coastguard Worker     <p>
29*ab8db090SAndroid Build Coastguard Worker      In most cases, MARISA is much more compact than a plain text which consists of the registered keys. This means that the traditional dictionary implementations, a binary tree (<code>std::map&lt;std::string, T&gt;</code>) and a hash table (<code>std::unordered_map&lt;std::string, T&gt;</code>), require more and more and more spaces than MARISA. Bloom Filter, a probabilistic data structure, is more space-efficient than MARISA but causes false positives and does not support reverse lookup, common prefix search, and predictive search.
30*ab8db090SAndroid Build Coastguard Worker     </p>
31*ab8db090SAndroid Build Coastguard Worker     <p>
32*ab8db090SAndroid Build Coastguard Worker      libmarisa is a C++ library for an implementation of MARISA. Users can build dictionaries and search keys from the dictionaries. The package also provides command line tools to test basic operations of libmarisa, and the tools are useful to test the performance.
33*ab8db090SAndroid Build Coastguard Worker     </p>
34*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
35*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
36*ab8db090SAndroid Build Coastguard Worker     <h3><a name="ability">Functionality</a></h3>
37*ab8db090SAndroid Build Coastguard Worker     <p>
38*ab8db090SAndroid Build Coastguard Worker      libmarisa associates string keys with unique IDs, from <var>0</var> to <var>(n - 1)</var>, where <var>n</var> is the number of keys. Note that users cannot specify the IDs because the mapping is automatically generated by MARISA. Every search function takes a string or an ID and returns the search result which is represented by a pair of the key and its ID.
39*ab8db090SAndroid Build Coastguard Worker     </p>
40*ab8db090SAndroid Build Coastguard Worker     <ul>
41*ab8db090SAndroid Build Coastguard Worker      <li>Lookup
42*ab8db090SAndroid Build Coastguard Worker       <ul>
43*ab8db090SAndroid Build Coastguard Worker        <li>checks whether or not a query string is registered.</li>
44*ab8db090SAndroid Build Coastguard Worker       </ul>
45*ab8db090SAndroid Build Coastguard Worker      </li>
46*ab8db090SAndroid Build Coastguard Worker      <li>Reverse lookup
47*ab8db090SAndroid Build Coastguard Worker       <ul>
48*ab8db090SAndroid Build Coastguard Worker        <li>restores a key from its ID.</li>
49*ab8db090SAndroid Build Coastguard Worker       </ul>
50*ab8db090SAndroid Build Coastguard Worker      </li>
51*ab8db090SAndroid Build Coastguard Worker      <li>Common prefix search
52*ab8db090SAndroid Build Coastguard Worker       <ul>
53*ab8db090SAndroid Build Coastguard Worker        <li>searches keys from the possible prefixes of a query string.</li>
54*ab8db090SAndroid Build Coastguard Worker       </ul>
55*ab8db090SAndroid Build Coastguard Worker      </li>
56*ab8db090SAndroid Build Coastguard Worker      <li>Predictive search
57*ab8db090SAndroid Build Coastguard Worker       <ul>
58*ab8db090SAndroid Build Coastguard Worker        <li>searches keys starting with a query string.</li>
59*ab8db090SAndroid Build Coastguard Worker       </ul>
60*ab8db090SAndroid Build Coastguard Worker      </li>
61*ab8db090SAndroid Build Coastguard Worker     </ul>
62*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
63*ab8db090SAndroid Build Coastguard Worker   </div><!-- section -->
64*ab8db090SAndroid Build Coastguard Worker   <div class="section">
65*ab8db090SAndroid Build Coastguard Worker    <h2><a name="source">Source</a></h2>
66*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
67*ab8db090SAndroid Build Coastguard Worker     <h3><a name="license">License</a></h3>
68*ab8db090SAndroid Build Coastguard Worker     <p>
69*ab8db090SAndroid Build Coastguard Worker      libmarisa and its command line tools are dual-licensed under the BSD 2-clause license and the LGPL.
70*ab8db090SAndroid Build Coastguard Worker     </p>
71*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
72*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
73*ab8db090SAndroid Build Coastguard Worker     <h3><a name="download">Download</a></h3>
74*ab8db090SAndroid Build Coastguard Worker     <p>
75*ab8db090SAndroid Build Coastguard Worker      The project is hosted on <a href="https://github.com/">GitHub</a>.
76*ab8db090SAndroid Build Coastguard Worker     </p>
77*ab8db090SAndroid Build Coastguard Worker     <ul>
78*ab8db090SAndroid Build Coastguard Worker      <li>Project
79*ab8db090SAndroid Build Coastguard Worker       <ul>
80*ab8db090SAndroid Build Coastguard Worker        <li><a href="https://github.com/s-yata/marisa-trie">https://github.com/s-yata/marisa-trie</a></li>
81*ab8db090SAndroid Build Coastguard Worker       </ul>
82*ab8db090SAndroid Build Coastguard Worker      </li>
83*ab8db090SAndroid Build Coastguard Worker      <li>Source
84*ab8db090SAndroid Build Coastguard Worker       <ul>
85*ab8db090SAndroid Build Coastguard Worker        <li><a href="https://github.com/s-yata/marisa-trie/archive/v0.2.6.tar.gz">marisa-0.2.6.tar.gz</a></li>
86*ab8db090SAndroid Build Coastguard Worker       </ul>
87*ab8db090SAndroid Build Coastguard Worker      </li>
88*ab8db090SAndroid Build Coastguard Worker     </ul>
89*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
90*ab8db090SAndroid Build Coastguard Worker   </div><!-- section -->
91*ab8db090SAndroid Build Coastguard Worker
92*ab8db090SAndroid Build Coastguard Worker   <div class="section">
93*ab8db090SAndroid Build Coastguard Worker    <h2><a name="install">Installation</a></h2>
94*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
95*ab8db090SAndroid Build Coastguard Worker     <h3><a name="gcc">GCC &amp; Clang</a></h3>
96*ab8db090SAndroid Build Coastguard Worker     <div class="float">
97*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ tar zxf marisa-0.2.6.tar.gz
98*ab8db090SAndroid Build Coastguard Worker$ cd marisa-0.2.6
99*ab8db090SAndroid Build Coastguard Worker$ ./configure
100*ab8db090SAndroid Build Coastguard Worker$ make
101*ab8db090SAndroid Build Coastguard Worker$ make check
102*ab8db090SAndroid Build Coastguard Worker$ make install</pre>
103*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
104*ab8db090SAndroid Build Coastguard Worker     <p>
105*ab8db090SAndroid Build Coastguard Worker      Users can install libmarisa by using <kbd>configure</kbd> and <kbd>make</kbd>. <kbd>make install</kbd> might require <kbd>sudo</kbd> to install libmarisa as the root user. Additionally, <kbd>ldconfig</kbd> might be required because libmarisa is installed as a shared library in default settings.
106*ab8db090SAndroid Build Coastguard Worker     </p>
107*ab8db090SAndroid Build Coastguard Worker     <p>
108*ab8db090SAndroid Build Coastguard Worker      If a POPCNT instruction is available on your environment, you can specify <kbd>--enable-popcnt</kbd>, when you run <kbd>configure</kbd>, to improve the performance of libmarisa. Likewise, <kbd>--enable-sse2</kbd>, <kbd>--enable-sse3</kbd>, <kbd>--enable-ssse3</kbd>, <kbd>--enable-sse4.1</kbd>, <kbd>--enable-sse4.2</kbd>, <kbd>--enable-sse4</kbd>, <kbd>--enable-sse4a</kbd>, <kbd>--enable-bmi</kbd>, <kbd>--enable-bmi2</kbd> are available. Note that <kbd>--enable-native-code</kbd> enables instructions available on the compilation environment. Also, if you need a static library, specify <kbd>--enable-static</kbd> to <kbd>configure</kbd>. For other options, see <kbd>./configure --help</kbd>.
109*ab8db090SAndroid Build Coastguard Worker     </p>
110*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
111*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
112*ab8db090SAndroid Build Coastguard Worker     <h3><a name="vc">Visual C++ 2008</a></h3>
113*ab8db090SAndroid Build Coastguard Worker     <p>
114*ab8db090SAndroid Build Coastguard Worker      There are project files for Visual C++ 2008 in <kbd>vs2008/</kbd>. Users can build a static library <kbd>libmarisa.lib</kbd> and the command line tools by using <kbd>vs2008/vs2008.sln</kbd>. If your Visual C++ is older than 2008. New projects are required to build libmarisa.
115*ab8db090SAndroid Build Coastguard Worker     </p>
116*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
117*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
118*ab8db090SAndroid Build Coastguard Worker     <h3><a name="vc">Perl Bindings</a></h3>
119*ab8db090SAndroid Build Coastguard Worker     <div class="float">
120*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ cd bindings/perl
121*ab8db090SAndroid Build Coastguard Worker$ perl Makefile.PL
122*ab8db090SAndroid Build Coastguard Worker$ make
123*ab8db090SAndroid Build Coastguard Worker$ make install</pre>
124*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
125*ab8db090SAndroid Build Coastguard Worker     <p>
126*ab8db090SAndroid Build Coastguard Worker      Users can find a Perl bindings in <kbd>bindings/perl/</kbd>, in which the wrapper was generated by <a href="http://www.swig.org/">SWIG</a>. To install the Perl bindings, run <kbd>perl Makefile.PL</kbd> and then <kbd>make install</kbd>. See also <kbd>bindings/perl/sample.pl</kbd>.
127*ab8db090SAndroid Build Coastguard Worker     </p>
128*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
129*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
130*ab8db090SAndroid Build Coastguard Worker     <h3><a name="vc">Python Bindings</a></h3>
131*ab8db090SAndroid Build Coastguard Worker     <div class="float">
132*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ cd bindings/python
133*ab8db090SAndroid Build Coastguard Worker$ python setup.py build
134*ab8db090SAndroid Build Coastguard Worker$ python setup.py install</pre>
135*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
136*ab8db090SAndroid Build Coastguard Worker     <p>
137*ab8db090SAndroid Build Coastguard Worker      Users can find a Python bindings in <kbd>bindings/python/</kbd>, in which the wrapper was generated by <a href="http://www.swig.org/">SWIG</a>. To install the Python bindings, run <kbd>python setup.py install</kbd>. See also <kbd>bindings/python/sample.py</kbd>.
138*ab8db090SAndroid Build Coastguard Worker     </p>
139*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
140*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
141*ab8db090SAndroid Build Coastguard Worker     <h3><a name="vc">Ruby Bindings</a></h3>
142*ab8db090SAndroid Build Coastguard Worker     <div class="float">
143*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ cd bindings/ruby
144*ab8db090SAndroid Build Coastguard Worker$ ruby extconf.rb
145*ab8db090SAndroid Build Coastguard Worker$ make
146*ab8db090SAndroid Build Coastguard Worker$ make install</pre>
147*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
148*ab8db090SAndroid Build Coastguard Worker     <p>
149*ab8db090SAndroid Build Coastguard Worker      Users can find a Ruby bindings in <kbd>bindings/ruby/</kbd>, in which the wrapper was generated by <a href="http://www.swig.org/">SWIG</a>. To install the Ruby bindings, run <kbd>ruby extconf.rb</kbd> and then <kbd>make install</kbd>. See also <kbd>bindings/ruby/sample.rb</kbd>.
150*ab8db090SAndroid Build Coastguard Worker     </p>
151*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
152*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
153*ab8db090SAndroid Build Coastguard Worker     <h3><a name="vc">Others</a></h3>
154*ab8db090SAndroid Build Coastguard Worker     <p>
155*ab8db090SAndroid Build Coastguard Worker      There are some other bindings.
156*ab8db090SAndroid Build Coastguard Worker     </p>
157*ab8db090SAndroid Build Coastguard Worker     <ul>
158*ab8db090SAndroid Build Coastguard Worker      <li>Python
159*ab8db090SAndroid Build Coastguard Worker       <ul>
160*ab8db090SAndroid Build Coastguard Worker        <li><a href="https://github.com/kmike/marisa-trie/">https://github.com/kmike/marisa-trie/</a> an alternative Cython-based pip-installable Python bindings which is faster than the above Python bindings.</li>
161*ab8db090SAndroid Build Coastguard Worker       </ul>
162*ab8db090SAndroid Build Coastguard Worker      </li>
163*ab8db090SAndroid Build Coastguard Worker      <li>Node.js</li>
164*ab8db090SAndroid Build Coastguard Worker       <ul>
165*ab8db090SAndroid Build Coastguard Worker        <li><a href="https://github.com/jakwings/iojs-marisa-trie">https://github.com/jakwings/iojs-marisa-trie</a> a wrapper of marisa-trie</li>
166*ab8db090SAndroid Build Coastguard Worker       </ul>
167*ab8db090SAndroid Build Coastguard Worker      </li>
168*ab8db090SAndroid Build Coastguard Worker     </ul>
169*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
170*ab8db090SAndroid Build Coastguard Worker   </div><!-- section -->
171*ab8db090SAndroid Build Coastguard Worker
172*ab8db090SAndroid Build Coastguard Worker   <div class="section">
173*ab8db090SAndroid Build Coastguard Worker    <h2><a name="tools">Command Line Tools</a></h2>
174*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
175*ab8db090SAndroid Build Coastguard Worker     <h3><a name="marisa-build">marisa-build</a></h3>
176*ab8db090SAndroid Build Coastguard Worker     <div class="float">
177*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ marisa-build &lt; keyset.txt &gt; keyset.dic
178*ab8db090SAndroid Build Coastguard Worker#keys: 9864459
179*ab8db090SAndroid Build Coastguard Worker#nodes: 13473881
180*ab8db090SAndroid Build Coastguard Workersize: 51044424</pre>
181*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
182*ab8db090SAndroid Build Coastguard Worker     <p>
183*ab8db090SAndroid Build Coastguard Worker      <kbd>marisa-build</kbd> is a tool to build a dictionary from a set of keys. This tool takes as input newline-delimited keys and writes the dictionary to the standard output.
184*ab8db090SAndroid Build Coastguard Worker     </p>
185*ab8db090SAndroid Build Coastguard Worker     <p>
186*ab8db090SAndroid Build Coastguard Worker      Users can specify parameters through command line options. See <kbd>marisa-build -h</kbd> for the list of options.
187*ab8db090SAndroid Build Coastguard Worker     </p>
188*ab8db090SAndroid Build Coastguard Worker     <p>
189*ab8db090SAndroid Build Coastguard Worker      If an input line contains horizontal tabs, the last one serves as the delimiter between a key and its weight which is used to optimize the order of nodes. Estimated frequency of each key, given as the weight, may improve the search performance.
190*ab8db090SAndroid Build Coastguard Worker     </p>
191*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
192*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
193*ab8db090SAndroid Build Coastguard Worker     <h3><a name="marisa-lookup">marisa-lookup</a></h3>
194*ab8db090SAndroid Build Coastguard Worker     <div class="float">
195*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ marisa-lookup keyset.dic
196*ab8db090SAndroid Build Coastguard WorkerMarisa
197*ab8db090SAndroid Build Coastguard Worker915465	Marisa
198*ab8db090SAndroid Build Coastguard WorkerWhat's_uuup
199*ab8db090SAndroid Build Coastguard Worker-1	What's_uuup</pre>
200*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
201*ab8db090SAndroid Build Coastguard Worker     <p>
202*ab8db090SAndroid Build Coastguard Worker      <kbd>marisa-lookup</kbd> is a tool to test exact match lookup. If a query string is registered, this tool prints the key and its ID. Otherwise, this tool prints the query with <var>-1</var>.
203*ab8db090SAndroid Build Coastguard Worker     </p>
204*ab8db090SAndroid Build Coastguard Worker     <p>
205*ab8db090SAndroid Build Coastguard Worker      See <kbd>marisa-lookup -h</kbd> for the list of options.
206*ab8db090SAndroid Build Coastguard Worker     </p>
207*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
208*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
209*ab8db090SAndroid Build Coastguard Worker     <h3><a name="marisa-reverse-lookup">marisa-reverse-lookup</a></h3>
210*ab8db090SAndroid Build Coastguard Worker     <div class="float">
211*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ marisa-reverse-lookup keyset.dic
212*ab8db090SAndroid Build Coastguard Worker1234567
213*ab8db090SAndroid Build Coastguard Worker1234567	Goma_International_Airport</pre>
214*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
215*ab8db090SAndroid Build Coastguard Worker     <p>
216*ab8db090SAndroid Build Coastguard Worker      <kbd>marisa-reverse-lookup</kbd> is a tool to test reverse lookup. If a given ID is not out-of-range, this tool restores the associated key and prints it. The available ID is <var>0</var> to <var>(n - 1)</var>, where <var>n</var> is the number of keys. Note that an out-of-range ID causes an error.
217*ab8db090SAndroid Build Coastguard Worker     </p>
218*ab8db090SAndroid Build Coastguard Worker     <p>
219*ab8db090SAndroid Build Coastguard Worker      See <kbd>marisa-reverse-lookup -h</kbd> for the list of options.
220*ab8db090SAndroid Build Coastguard Worker     </p>
221*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
222*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
223*ab8db090SAndroid Build Coastguard Worker     <h3><a name="marisa-common-prefix-search">marisa-common-prefix-search</a></h3>
224*ab8db090SAndroid Build Coastguard Worker     <div class="float">
225*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ marisa-common-prefix-search keyset.dic
226*ab8db090SAndroid Build Coastguard WorkerUSA
227*ab8db090SAndroid Build Coastguard Worker3 found
228*ab8db090SAndroid Build Coastguard Worker20	U	USA
229*ab8db090SAndroid Build Coastguard Worker1526	US	USA
230*ab8db090SAndroid Build Coastguard Worker37471	USA	USA</pre>
231*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
232*ab8db090SAndroid Build Coastguard Worker     <p>
233*ab8db090SAndroid Build Coastguard Worker      <kbd>marisa-common-prefix-search</kbd> is a tool to test common prefix search.  This tool searches keys from the possible prefixes of a query string and then prints the first <var>m</var> keys, where <var>m</var> is one of the parameters.
234*ab8db090SAndroid Build Coastguard Worker     </p>
235*ab8db090SAndroid Build Coastguard Worker     <p>
236*ab8db090SAndroid Build Coastguard Worker      See <kbd>marisa-common-prefix-search -h</kbd> for the list of options.
237*ab8db090SAndroid Build Coastguard Worker     </p>
238*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
239*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
240*ab8db090SAndroid Build Coastguard Worker     <h3><a name="marisa-predictive-search">marisa-predictive-search</a></h3>
241*ab8db090SAndroid Build Coastguard Worker     <div class="float">
242*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ marisa-predictive-search keyset.dic -n 2
243*ab8db090SAndroid Build Coastguard WorkerTouhou
244*ab8db090SAndroid Build Coastguard Worker15 found
245*ab8db090SAndroid Build Coastguard Worker975378	Touhou	Touhou
246*ab8db090SAndroid Build Coastguard Worker5508004	Touhou_Hisotensoku	Touhou</pre>
247*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
248*ab8db090SAndroid Build Coastguard Worker     <p>
249*ab8db090SAndroid Build Coastguard Worker      <kbd>marisa-predictive-search</kbd> is a tool to test predictive search. This tool searches keys starting with a query string and then prints the first <var>m</var> keys, where <var>m</var> is one of the parameters.
250*ab8db090SAndroid Build Coastguard Worker     </p>
251*ab8db090SAndroid Build Coastguard Worker     <p>
252*ab8db090SAndroid Build Coastguard Worker      See <kbd>marisa-predictive-search -h</kbd> for the list of options.
253*ab8db090SAndroid Build Coastguard Worker     </p>
254*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
255*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
256*ab8db090SAndroid Build Coastguard Worker     <h3><a name="marisa-benchmark">marisa-benchmark</a></h3>
257*ab8db090SAndroid Build Coastguard Worker     <div class="float">
258*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ marisa-benchmark keyset.txt
259*ab8db090SAndroid Build Coastguard WorkerNumber of tries: 1 - 5
260*ab8db090SAndroid Build Coastguard WorkerTAIL mode: Text mode
261*ab8db090SAndroid Build Coastguard WorkerNode order: Descending weight order
262*ab8db090SAndroid Build Coastguard WorkerCache level: Normal cache
263*ab8db090SAndroid Build Coastguard WorkerNumber of keys: 9864459
264*ab8db090SAndroid Build Coastguard WorkerTotal length: 191858227
265*ab8db090SAndroid Build Coastguard Worker------+----------+--------+--------+--------+--------+--------
266*ab8db090SAndroid Build Coastguard Worker#tries       size    build   lookup  reverse   prefix  predict
267*ab8db090SAndroid Build Coastguard Worker                                      lookup   search   search
268*ab8db090SAndroid Build Coastguard Worker          [bytes]    [K/s]    [K/s]    [K/s]    [K/s]    [K/s]
269*ab8db090SAndroid Build Coastguard Worker------+----------+--------+--------+--------+--------+--------
270*ab8db090SAndroid Build Coastguard Worker     1   69905816   334.84  1368.16  1304.82  1080.44   605.92
271*ab8db090SAndroid Build Coastguard Worker     2   53635744   284.03   762.91   773.68   662.04   244.35
272*ab8db090SAndroid Build Coastguard Worker     3   51044424   278.89   688.86   703.60   604.44   212.00
273*ab8db090SAndroid Build Coastguard Worker     4   50309000   277.01   669.23   680.78   588.57   204.23
274*ab8db090SAndroid Build Coastguard Worker     5   50042232   275.93   636.83   674.26   562.08   199.48
275*ab8db090SAndroid Build Coastguard Worker------+----------+--------+--------+--------+--------+--------</pre>
276*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
277*ab8db090SAndroid Build Coastguard Worker     <p>
278*ab8db090SAndroid Build Coastguard Worker      <kbd>marisa-benchmark</kbd> is a tool to benchmark libmarisa. This tool takes the same input as <kbd>marisa-build</kbd> and measures the performance of libmarisa for the given set of keys. This tool is useful to fix dictionary settings.
279*ab8db090SAndroid Build Coastguard Worker     </p>
280*ab8db090SAndroid Build Coastguard Worker     <p>
281*ab8db090SAndroid Build Coastguard Worker      For the search performance, <kbd>marisa-benchmark</kbd> measures the time to lookup or search keys in input order. When the keys are given in lexicographic order, few cache misses will occur in the benchmark. In contrast, when the keys are given in random order, many cache misses will occur in the benchmark.
282*ab8db090SAndroid Build Coastguard Worker     </p>
283*ab8db090SAndroid Build Coastguard Worker     <p>
284*ab8db090SAndroid Build Coastguard Worker      See <kbd>marisa-benchmark -h</kbd> for the list of options.
285*ab8db090SAndroid Build Coastguard Worker     </p>
286*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
287*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
288*ab8db090SAndroid Build Coastguard Worker     <h3><a name="marisa-dump">marisa-dump</a></h3>
289*ab8db090SAndroid Build Coastguard Worker     <div class="float">
290*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ marisa-dump keyset.dic | head -3
291*ab8db090SAndroid Build Coastguard Workerinput: keyset.dic
292*ab8db090SAndroid Build Coastguard WorkerS
293*ab8db090SAndroid Build Coastguard WorkerSt
294*ab8db090SAndroid Build Coastguard WorkerSta</pre>
295*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
296*ab8db090SAndroid Build Coastguard Worker     <p>
297*ab8db090SAndroid Build Coastguard Worker      <kbd>marisa-build</kbd> is a tool to dump a dictionary. This tool prints all the keys in a given dictionary.
298*ab8db090SAndroid Build Coastguard Worker     </p>
299*ab8db090SAndroid Build Coastguard Worker     <p>
300*ab8db090SAndroid Build Coastguard Worker      Users can specify the delimiter through command line options. See <kbd>marisa-dump -h</kbd> for the list of options.
301*ab8db090SAndroid Build Coastguard Worker     </p>
302*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
303*ab8db090SAndroid Build Coastguard Worker   </div><!-- section -->
304*ab8db090SAndroid Build Coastguard Worker
305*ab8db090SAndroid Build Coastguard Worker   <div class="section">
306*ab8db090SAndroid Build Coastguard Worker    <h2><a name="library">Library</a></h2>
307*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
308*ab8db090SAndroid Build Coastguard Worker     <h3><a name="howto">How to Use</a></h3>
309*ab8db090SAndroid Build Coastguard Worker     <div class="float">
310*ab8db090SAndroid Build Coastguard Worker      <pre class="code">// sample.cc
311*ab8db090SAndroid Build Coastguard Worker#include &lt;iostream&gt;
312*ab8db090SAndroid Build Coastguard Worker#include &lt;marisa.h&gt;
313*ab8db090SAndroid Build Coastguard Worker
314*ab8db090SAndroid Build Coastguard Workerint main() {
315*ab8db090SAndroid Build Coastguard Worker  marisa::Keyset keyset;
316*ab8db090SAndroid Build Coastguard Worker  keyset.push_back("a");
317*ab8db090SAndroid Build Coastguard Worker  keyset.push_back("app");
318*ab8db090SAndroid Build Coastguard Worker  keyset.push_back("apple");
319*ab8db090SAndroid Build Coastguard Worker
320*ab8db090SAndroid Build Coastguard Worker  marisa::Trie trie;
321*ab8db090SAndroid Build Coastguard Worker  trie.build(keyset);
322*ab8db090SAndroid Build Coastguard Worker
323*ab8db090SAndroid Build Coastguard Worker  marisa::Agent agent;
324*ab8db090SAndroid Build Coastguard Worker  agent.set_query("apple");
325*ab8db090SAndroid Build Coastguard Worker  while (trie.common_prefix_search(agent)) {
326*ab8db090SAndroid Build Coastguard Worker    std::cout.write(agent.key().ptr(), agent.key().length());
327*ab8db090SAndroid Build Coastguard Worker    std::cout &lt;&lt; ": " &lt;&lt; agent.key().id() &lt;&lt; std::endl;
328*ab8db090SAndroid Build Coastguard Worker  }
329*ab8db090SAndroid Build Coastguard Worker  return 0;
330*ab8db090SAndroid Build Coastguard Worker}</pre>
331*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
332*ab8db090SAndroid Build Coastguard Worker     <div class="float">
333*ab8db090SAndroid Build Coastguard Worker      <pre class="console">$ g++ sample.cc -lmarisa
334*ab8db090SAndroid Build Coastguard Worker$ ./a.out
335*ab8db090SAndroid Build Coastguard Workera: 0
336*ab8db090SAndroid Build Coastguard Workerapp: 1
337*ab8db090SAndroid Build Coastguard Workerapple: 2</pre>
338*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
339*ab8db090SAndroid Build Coastguard Worker     <p>
340*ab8db090SAndroid Build Coastguard Worker      libmarisa provides <kbd>marisa.h</kbd> in which all the headers are <code>#include</code>d. Also, libmarisa uses <code>namespace marisa</code>. All the classes and functions except enumeration types are given as members of this namespace. Note that <code>using namespace marisa</code> may cause a critical error. Finally, <kbd>gcc</kbd> and <kbd>clang</kbd> require an option, <kbd>-lmarisa</kbd>, to link libmarisa with an application.
341*ab8db090SAndroid Build Coastguard Worker     </p>
342*ab8db090SAndroid Build Coastguard Worker     <p>
343*ab8db090SAndroid Build Coastguard Worker      The core components of libmarisa are <a href="#keyset">Keyset</a>, <a href="#agent">Agent</a>, and <a href="#trie">Trie</a>. In addition, libmarisa provides an exception class, <a href="#exception">Exception</a>, and two more classes, <a href="#key">Key</a> and <a href="#query">Query</a>, as members of <code>Keyset</code> and <code>Agent</code>.
344*ab8db090SAndroid Build Coastguard Worker     </p>
345*ab8db090SAndroid Build Coastguard Worker     <ul>
346*ab8db090SAndroid Build Coastguard Worker      <li><code>Keyset</code>: A class to store a set of keys. This class is used to build a set of keys for building a dictionary. Also, this class is useful to store search results.</li>
347*ab8db090SAndroid Build Coastguard Worker      <li><code>Agent</code>: A class to store a query and a result of search operations. Every search function takes a reference to this class.</li>
348*ab8db090SAndroid Build Coastguard Worker      <li><code>Trie</code>: A dictionary class.</li>
349*ab8db090SAndroid Build Coastguard Worker     </ul>
350*ab8db090SAndroid Build Coastguard Worker     <p>
351*ab8db090SAndroid Build Coastguard Worker      For more examples, you can find the source code of the command line tools in <kbd>tools/</kbd>. The source code is useful as an example of error handling, predicive search, etc.
352*ab8db090SAndroid Build Coastguard Worker     </p>
353*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
354*ab8db090SAndroid Build Coastguard Worker
355*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
356*ab8db090SAndroid Build Coastguard Worker     <h3><a name="enum">Enumeration Constants</a></h3>
357*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
358*ab8db090SAndroid Build Coastguard Worker      <h4>Error Codes</h4>
359*ab8db090SAndroid Build Coastguard Worker      <div class="float">
360*ab8db090SAndroid Build Coastguard Worker       <pre class="code">typedef enum marisa_error_code_ {
361*ab8db090SAndroid Build Coastguard Worker  MARISA_OK           = 0,
362*ab8db090SAndroid Build Coastguard Worker  MARISA_STATE_ERROR  = 1,
363*ab8db090SAndroid Build Coastguard Worker  MARISA_NULL_ERROR   = 2,
364*ab8db090SAndroid Build Coastguard Worker  MARISA_BOUND_ERROR  = 3,
365*ab8db090SAndroid Build Coastguard Worker  MARISA_RANGE_ERROR  = 4,
366*ab8db090SAndroid Build Coastguard Worker  MARISA_CODE_ERROR   = 5,
367*ab8db090SAndroid Build Coastguard Worker  MARISA_RESET_ERROR  = 6,
368*ab8db090SAndroid Build Coastguard Worker  MARISA_SIZE_ERROR   = 7,
369*ab8db090SAndroid Build Coastguard Worker  MARISA_MEMORY_ERROR = 8,
370*ab8db090SAndroid Build Coastguard Worker  MARISA_IO_ERROR     = 9,
371*ab8db090SAndroid Build Coastguard Worker  MARISA_FORMAT_ERROR = 10,
372*ab8db090SAndroid Build Coastguard Worker} marisa_error_code;</pre>
373*ab8db090SAndroid Build Coastguard Worker      </div><!-- float -->
374*ab8db090SAndroid Build Coastguard Worker      <p>
375*ab8db090SAndroid Build Coastguard Worker       libmarisa throws an instance of <code>Exception</code> when an error occurs, such as a file I/O error (<var>MARISA_IO_ERROR</var>), a size limitation error (<var>MARISA_SIZE_ERROR</var>), etc. For details, see <kbd>marisa/base.h</kbd>.
376*ab8db090SAndroid Build Coastguard Worker      </p>
377*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
378*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
379*ab8db090SAndroid Build Coastguard Worker      <h4>Number of Tries</h4>
380*ab8db090SAndroid Build Coastguard Worker      <div class="float">
381*ab8db090SAndroid Build Coastguard Worker       <pre class="code">
382*ab8db090SAndroid Build Coastguard Workertypedef enum marisa_num_tries_ {
383*ab8db090SAndroid Build Coastguard Worker  MARISA_MIN_NUM_TRIES     = 0x00001,
384*ab8db090SAndroid Build Coastguard Worker  MARISA_MAX_NUM_TRIES     = 0x0007F,
385*ab8db090SAndroid Build Coastguard Worker  MARISA_DEFAULT_NUM_TRIES = 0x00003,
386*ab8db090SAndroid Build Coastguard Worker} marisa_num_tries;</pre>
387*ab8db090SAndroid Build Coastguard Worker      </div><!-- float -->
388*ab8db090SAndroid Build Coastguard Worker      <p>
389*ab8db090SAndroid Build Coastguard Worker       MARISA is a recursive data structure in which a patricia trie is used to represent another patricia trie. A deeper recursion makes a dictionary more compact but degrades the search performance. For this time-space tradeoff, libmarisa provides a parameter to limit the recursion depth, which is equivalent to the number of tries. <code>marisa_num_tries</code> gives the range and the default setting of this parameter.
390*ab8db090SAndroid Build Coastguard Worker      </p>
391*ab8db090SAndroid Build Coastguard Worker      <p>
392*ab8db090SAndroid Build Coastguard Worker       The best setting depends on the set of keys and the applications. In most cases, libmarisa works well with the default setting, <var>MARISA_DEFAULT_NUM_TRIES</var>, but if the application requires better search performance, <var>MARISA_MIN_NUM_TRIES</var> may be a better choice. Also, if the application uses long and complicated keys, a deeper recursion may achieve much higher spece-efficiency. <kbd>marisa-benchmark</kbd> is useful to find the best setting.
393*ab8db090SAndroid Build Coastguard Worker      </p>
394*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
395*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
396*ab8db090SAndroid Build Coastguard Worker      <h4>Cache Size</h4>
397*ab8db090SAndroid Build Coastguard Worker      <div class="float">
398*ab8db090SAndroid Build Coastguard Worker       <pre class="code">typedef enum marisa_cache_level_ {
399*ab8db090SAndroid Build Coastguard Worker  MARISA_HUGE_CACHE    = 0x00080,
400*ab8db090SAndroid Build Coastguard Worker  MARISA_LARGE_CACHE   = 0x00100,
401*ab8db090SAndroid Build Coastguard Worker  MARISA_NORMAL_CACHE  = 0x00200,
402*ab8db090SAndroid Build Coastguard Worker  MARISA_SMALL_CACHE   = 0x00400,
403*ab8db090SAndroid Build Coastguard Worker  MARISA_TINY_CACHE    = 0x00800,
404*ab8db090SAndroid Build Coastguard Worker  MARISA_DEFAULT_CACHE = MARISA_NORMAL_CACHE
405*ab8db090SAndroid Build Coastguard Worker} marisa_cache_level;</pre>
406*ab8db090SAndroid Build Coastguard Worker      </div><!-- float -->
407*ab8db090SAndroid Build Coastguard Worker      <p>
408*ab8db090SAndroid Build Coastguard Worker       libmarisa embeds a precomputed table to a dictionary. The table serves as transition cache which improves the search performance but increases the dictionary size. Cache size is the parameter of this time-space tradeoff.
409*ab8db090SAndroid Build Coastguard Worker      </p>
410*ab8db090SAndroid Build Coastguard Worker      <p>
411*ab8db090SAndroid Build Coastguard Worker       <code>marisa_cache_level</code> gives a list of available cache size. Compared with <var>MARISA_NORMAL_CACHE</var>, <var>MARISA_LARGE_CACHE</var> is 2 times larger, <var>MARISA_HUGE_CACHE</var> is 4 times larger, <var>MARISA_SMALL_CACHE</var> is 2 times smaller, and <var>MARISA_TINY_CACHE</var> is 4 times smaller.
412*ab8db090SAndroid Build Coastguard Worker      </p>
413*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
414*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
415*ab8db090SAndroid Build Coastguard Worker      <h4>TAIL Mode</h4>
416*ab8db090SAndroid Build Coastguard Worker      <div class="float">
417*ab8db090SAndroid Build Coastguard Worker       <pre class="code">typedef enum marisa_tail_mode_ {
418*ab8db090SAndroid Build Coastguard Worker  MARISA_TEXT_TAIL    = 0x01000,
419*ab8db090SAndroid Build Coastguard Worker  MARISA_BINARY_TAIL  = 0x02000,
420*ab8db090SAndroid Build Coastguard Worker  MARISA_DEFAULT_TAIL = MARISA_TEXT_TAIL,
421*ab8db090SAndroid Build Coastguard Worker} marisa_tail_mode;</pre>
422*ab8db090SAndroid Build Coastguard Worker      </div><!-- float -->
423*ab8db090SAndroid Build Coastguard Worker      <p>
424*ab8db090SAndroid Build Coastguard Worker       The last patricia trie of MARISA stores its multi-byte labels as strings and <code>marisa_tail_mode</code> gives a list of TAIL implementations.
425*ab8db090SAndroid Build Coastguard Worker      </p>
426*ab8db090SAndroid Build Coastguard Worker      <p>
427*ab8db090SAndroid Build Coastguard Worker       <var>MARISA_TEXT_TAIL</var> stores labels as zero-terminated strings. If the labels contain <var>'\0'</var>, the TAIL mode is automatically switched to <var>MARISA_BINARY_TAIL</var>.
428*ab8db090SAndroid Build Coastguard Worker      </p>
429*ab8db090SAndroid Build Coastguard Worker      <p>
430*ab8db090SAndroid Build Coastguard Worker       On the other hand, <var>MARISA_BINARY_TAIL</var> uses a bit vector, instead of <var>'\0'</var>, to detect the end of labels. This means that <var>MARISA_TEXT_TAIL</var> is more space-efficient than <var>MARISA_BINARY_TAIL</var> when the average length of multi-byte labels is longer than <var>8 bytes</var>.
431*ab8db090SAndroid Build Coastguard Worker      </p>
432*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
433*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
434*ab8db090SAndroid Build Coastguard Worker      <h4>Node Order</h4>
435*ab8db090SAndroid Build Coastguard Worker      <div class="float">
436*ab8db090SAndroid Build Coastguard Worker       <pre class="code">typedef enum marisa_node_order_ {
437*ab8db090SAndroid Build Coastguard Worker  MARISA_LABEL_ORDER   = 0x10000,
438*ab8db090SAndroid Build Coastguard Worker  MARISA_WEIGHT_ORDER  = 0x20000,
439*ab8db090SAndroid Build Coastguard Worker  MARISA_DEFAULT_ORDER = MARISA_WEIGHT_ORDER,
440*ab8db090SAndroid Build Coastguard Worker} marisa_node_order;</pre>
441*ab8db090SAndroid Build Coastguard Worker      </div><!-- float -->
442*ab8db090SAndroid Build Coastguard Worker      <p>
443*ab8db090SAndroid Build Coastguard Worker       A dictionary has one more parameter, which is the order of nodes. There are two choices, <var>MARISA_LABEL_ORDER</var> and <var>MARISA_WEIGHT_ORDER</var>. The former arranges nodes in ascending order of the label and the latter arranges nodes in descending order of the weight. Many trie implementations arrange nodes in the label order but libmarisa uses <var>MARISA_WEIGHT_ORDER</var> as the default setting.
444*ab8db090SAndroid Build Coastguard Worker      </p>
445*ab8db090SAndroid Build Coastguard Worker      <p>
446*ab8db090SAndroid Build Coastguard Worker       <var>MARISA_WEIGHT_ORDER</var> optimizes the node order for linear search performed in exact match lookup, common prefix search, and predictive search. In practice, experiments for English words/phrases showed that <var>MARISA_WEIGHT_ORDER</var> halved the average search time. On the other hand, <var>MARISA_LABEL_ORDER</var> enables predictive search to restore keys in lexicographic order.
447*ab8db090SAndroid Build Coastguard Worker      </p>
448*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
449*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
450*ab8db090SAndroid Build Coastguard Worker      <h4>Aliases</h4>
451*ab8db090SAndroid Build Coastguard Worker      <div class="float">
452*ab8db090SAndroid Build Coastguard Worker       <pre class="code">namespace marisa {
453*ab8db090SAndroid Build Coastguard Worker  typedef ::marisa_error_code ErrorCode;
454*ab8db090SAndroid Build Coastguard Worker  typedef ::marisa_cache_level CacheLevel;
455*ab8db090SAndroid Build Coastguard Worker  typedef ::marisa_tail_mode TailMode;
456*ab8db090SAndroid Build Coastguard Worker  typedef ::marisa_node_order NodeOrder;
457*ab8db090SAndroid Build Coastguard Worker}  // namespace marisa</pre>
458*ab8db090SAndroid Build Coastguard Worker      </div><!-- float -->
459*ab8db090SAndroid Build Coastguard Worker      <p>
460*ab8db090SAndroid Build Coastguard Worker       The above enumeration types are defined in the global namespace to avoid collisions of the enumeration constants with macros provided by other modules. libmarisa provides type aliases and users can choose the familiar one.
461*ab8db090SAndroid Build Coastguard Worker      </p>
462*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
463*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
464*ab8db090SAndroid Build Coastguard Worker
465*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
466*ab8db090SAndroid Build Coastguard Worker     <h3><a name="exception">class Exception</a></h3>
467*ab8db090SAndroid Build Coastguard Worker     <div class="float">
468*ab8db090SAndroid Build Coastguard Worker      <pre class="code">class Exception {
469*ab8db090SAndroid Build Coastguard Worker public:
470*ab8db090SAndroid Build Coastguard Worker  const char *filename() const;
471*ab8db090SAndroid Build Coastguard Worker  int line() const;
472*ab8db090SAndroid Build Coastguard Worker  ErrorCode error_code() const;
473*ab8db090SAndroid Build Coastguard Worker  const char *error_message() const;
474*ab8db090SAndroid Build Coastguard Worker
475*ab8db090SAndroid Build Coastguard Worker  const char *what() const;
476*ab8db090SAndroid Build Coastguard Worker};</pre>
477*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
478*ab8db090SAndroid Build Coastguard Worker     <p>
479*ab8db090SAndroid Build Coastguard Worker      <code>Exception</code> is an exception class. libmarisa throws an instance of <code>Exception</code> with the file name (<code>__FILE__</code>), the line number (<code>__LINE__</code>), and an error code (<code>ErrorCode</code>) when an error is detected. The instance also has an error message formatted <var>__FILE__:__LINE__: error_code: error_message</var>.
480*ab8db090SAndroid Build Coastguard Worker     </p>
481*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
482*ab8db090SAndroid Build Coastguard Worker
483*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
484*ab8db090SAndroid Build Coastguard Worker     <h3><a name="key">class Key</a></h3>
485*ab8db090SAndroid Build Coastguard Worker     <div class="float">
486*ab8db090SAndroid Build Coastguard Worker      <pre class="code">class Key {
487*ab8db090SAndroid Build Coastguard Worker public:
488*ab8db090SAndroid Build Coastguard Worker  char operator[](std::size_t i) const;
489*ab8db090SAndroid Build Coastguard Worker  const char *ptr() const;
490*ab8db090SAndroid Build Coastguard Worker  std::size_t length() const;
491*ab8db090SAndroid Build Coastguard Worker  std::size_t id() const;
492*ab8db090SAndroid Build Coastguard Worker};</pre>
493*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
494*ab8db090SAndroid Build Coastguard Worker     <p>
495*ab8db090SAndroid Build Coastguard Worker      <code>Key</code> is a member of <a href="#keyset">Keyset</a> and <a href="#agent">Agent</a>. Each key of <code>Keyset</code> is represented by this class. Also, the search result of <code>Agent</code> is represented by this class.
496*ab8db090SAndroid Build Coastguard Worker     </p>
497*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
498*ab8db090SAndroid Build Coastguard Worker
499*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
500*ab8db090SAndroid Build Coastguard Worker     <h3><a name="query">class Query</a></h3>
501*ab8db090SAndroid Build Coastguard Worker     <div class="float">
502*ab8db090SAndroid Build Coastguard Worker      <pre class="code">class Query {
503*ab8db090SAndroid Build Coastguard Worker public:
504*ab8db090SAndroid Build Coastguard Worker  char operator[](std::size_t i) const;
505*ab8db090SAndroid Build Coastguard Worker  const char *ptr() const;
506*ab8db090SAndroid Build Coastguard Worker  std::size_t length() const;
507*ab8db090SAndroid Build Coastguard Worker  std::size_t id() const;
508*ab8db090SAndroid Build Coastguard Worker};</pre>
509*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
510*ab8db090SAndroid Build Coastguard Worker     <p>
511*ab8db090SAndroid Build Coastguard Worker      <code>Query</code> is a member of <a href="#agent">Agent</a>. This class stores a query string and an ID as input for search functions. Users cannot make changes directly to <code>Query</code> because <code>Agent</code> provides a special interface to update its query.
512*ab8db090SAndroid Build Coastguard Worker     </p>
513*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
514*ab8db090SAndroid Build Coastguard Worker
515*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
516*ab8db090SAndroid Build Coastguard Worker     <h3><a name="keyset">class Keyset</a></h3>
517*ab8db090SAndroid Build Coastguard Worker     <div class="float">
518*ab8db090SAndroid Build Coastguard Worker      <pre class="code">class Keyset {
519*ab8db090SAndroid Build Coastguard Worker public:
520*ab8db090SAndroid Build Coastguard Worker  Keyset();
521*ab8db090SAndroid Build Coastguard Worker
522*ab8db090SAndroid Build Coastguard Worker  void push_back(const Key &amp;key);
523*ab8db090SAndroid Build Coastguard Worker  void push_back(const Key &amp;key, char end_marker);
524*ab8db090SAndroid Build Coastguard Worker
525*ab8db090SAndroid Build Coastguard Worker  void push_back(const char *str);
526*ab8db090SAndroid Build Coastguard Worker  void push_back(const char *ptr,
527*ab8db090SAndroid Build Coastguard Worker                 std::size_t length,
528*ab8db090SAndroid Build Coastguard Worker                 float weight = 1.0);
529*ab8db090SAndroid Build Coastguard Worker
530*ab8db090SAndroid Build Coastguard Worker  const Key &amp;operator[](std::size_t i) const;
531*ab8db090SAndroid Build Coastguard Worker  Key &amp;operator[](std::size_t i);
532*ab8db090SAndroid Build Coastguard Worker
533*ab8db090SAndroid Build Coastguard Worker  std::size_t num_keys();
534*ab8db090SAndroid Build Coastguard Worker
535*ab8db090SAndroid Build Coastguard Worker  bool empty() const;
536*ab8db090SAndroid Build Coastguard Worker  std::size_t size() const;
537*ab8db090SAndroid Build Coastguard Worker  std::size_t total_length() const;
538*ab8db090SAndroid Build Coastguard Worker
539*ab8db090SAndroid Build Coastguard Worker  void reset();
540*ab8db090SAndroid Build Coastguard Worker
541*ab8db090SAndroid Build Coastguard Worker  void clear();
542*ab8db090SAndroid Build Coastguard Worker  void swap(Keyset &amp;rhs);
543*ab8db090SAndroid Build Coastguard Worker};</pre>
544*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
545*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
546*ab8db090SAndroid Build Coastguard Worker      <h4>Overview</h4>
547*ab8db090SAndroid Build Coastguard Worker      <p>
548*ab8db090SAndroid Build Coastguard Worker       <code>Keyset</code> is used to store a set of keys for dictionary construction or to save the results of search functions.
549*ab8db090SAndroid Build Coastguard Worker      </p>
550*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
551*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
552*ab8db090SAndroid Build Coastguard Worker      <h4>Dictionary Source</h4>
553*ab8db090SAndroid Build Coastguard Worker      <p>
554*ab8db090SAndroid Build Coastguard Worker       For dictionary construction, users append keys to <code>Keyset</code> by using <code>push_back()</code> and then pass the keyset to <code>build()</code> of <a href="#trie">Trie</a>. <var>weight</var> is an argument to receive the frequency or possibility of each key. If there are same keys, the weights are accumulated in dictionary construction.
555*ab8db090SAndroid Build Coastguard Worker      </p>
556*ab8db090SAndroid Build Coastguard Worker      <p>
557*ab8db090SAndroid Build Coastguard Worker       After dictionary construction, users can read the associated IDs through <code>operator[]()</code>. Instead, the weights are overwritten by the IDs because <code>Key</code> uses a <code>union</code> to store a weight or an ID.
558*ab8db090SAndroid Build Coastguard Worker      </p>
559*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
560*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
561*ab8db090SAndroid Build Coastguard Worker      <h4>Search Result</h4>
562*ab8db090SAndroid Build Coastguard Worker      <p>
563*ab8db090SAndroid Build Coastguard Worker       Users can save a search result to <code>Keyset</code> by using <code>push_back()</code>. When <code>key()</code> of <a href="#agent">Agent</a> is given, a copy of the search result is stored in <code>Keyset</code>. If you want to append an end marker, such as <code>'\0'</code>, use <var>end_marker</var> of <code>push_back()</code>.
564*ab8db090SAndroid Build Coastguard Worker      </p>
565*ab8db090SAndroid Build Coastguard Worker      <p>
566*ab8db090SAndroid Build Coastguard Worker       If you want to reuse an instance of <code>Keyset</code>, <code>reset()</code> may be a better choice than <code>clear()</code> because <code>reset()</code> keeps allocated memory in order to reduce memory allocation overhead.
567*ab8db090SAndroid Build Coastguard Worker      </p>
568*ab8db090SAndroid Build Coastguard Worker      <p>
569*ab8db090SAndroid Build Coastguard Worker       <code>num_keys()</code> and <code>size()</code> return the number of keys. <code>empty()</code> checks whether the number of keys is <var>0</var> or not. <code>total_length()</code> returns the total length in byte.
570*ab8db090SAndroid Build Coastguard Worker      </p>
571*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
572*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
573*ab8db090SAndroid Build Coastguard Worker
574*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
575*ab8db090SAndroid Build Coastguard Worker     <h3><a name="agent">class Agent</a></h3>
576*ab8db090SAndroid Build Coastguard Worker     <div class="float">
577*ab8db090SAndroid Build Coastguard Worker      <pre class="code">class Agent {
578*ab8db090SAndroid Build Coastguard Worker public:
579*ab8db090SAndroid Build Coastguard Worker  Agent();
580*ab8db090SAndroid Build Coastguard Worker
581*ab8db090SAndroid Build Coastguard Worker  const Query &amp;query() const;
582*ab8db090SAndroid Build Coastguard Worker  const Key &amp;key() const;
583*ab8db090SAndroid Build Coastguard Worker
584*ab8db090SAndroid Build Coastguard Worker  void set_query(const char *str);
585*ab8db090SAndroid Build Coastguard Worker  void set_query(const char *ptr,
586*ab8db090SAndroid Build Coastguard Worker                 std::size_t length);
587*ab8db090SAndroid Build Coastguard Worker  void set_query(std::size_t key_id);
588*ab8db090SAndroid Build Coastguard Worker};</pre>
589*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
590*ab8db090SAndroid Build Coastguard Worker     <p>
591*ab8db090SAndroid Build Coastguard Worker      <code>Agent</code> is actually a tuple of <code>Query</code>, <code>Key</code>, and <code>State</code>. This class is used as I/O of search functions. Also, <code>State</code> is an incomplete type to keep the internal state of search operation.
592*ab8db090SAndroid Build Coastguard Worker     </p>
593*ab8db090SAndroid Build Coastguard Worker     <p>
594*ab8db090SAndroid Build Coastguard Worker      A lookup operation requires 3 steps as follows: 1. sets a query string by <code>set_query()</code> of <code>Agent</code>, 2. passes the agent to <code>lookup()</code> of <code>Trie</code>, and 3. gets the search result by <code>key()</code> of <code>Agent</code>. The other operations proceed in the same way.
595*ab8db090SAndroid Build Coastguard Worker     </p>
596*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
597*ab8db090SAndroid Build Coastguard Worker
598*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
599*ab8db090SAndroid Build Coastguard Worker     <h3><a name="trie">class Trie</a></h3>
600*ab8db090SAndroid Build Coastguard Worker     <div class="float">
601*ab8db090SAndroid Build Coastguard Worker      <pre class="code">class Trie {
602*ab8db090SAndroid Build Coastguard Worker public:
603*ab8db090SAndroid Build Coastguard Worker  Trie();
604*ab8db090SAndroid Build Coastguard Worker
605*ab8db090SAndroid Build Coastguard Worker  void build(Keyset &amp;keyset,
606*ab8db090SAndroid Build Coastguard Worker             int config_flags = 0);
607*ab8db090SAndroid Build Coastguard Worker
608*ab8db090SAndroid Build Coastguard Worker  void mmap(const char *filename);
609*ab8db090SAndroid Build Coastguard Worker  void map(const void *ptr,
610*ab8db090SAndroid Build Coastguard Worker           std::size_t size);
611*ab8db090SAndroid Build Coastguard Worker
612*ab8db090SAndroid Build Coastguard Worker  void load(const char *filename);
613*ab8db090SAndroid Build Coastguard Worker  void read(int fd);
614*ab8db090SAndroid Build Coastguard Worker
615*ab8db090SAndroid Build Coastguard Worker  void save(const char *filename) const;
616*ab8db090SAndroid Build Coastguard Worker  void write(int fd) const;
617*ab8db090SAndroid Build Coastguard Worker
618*ab8db090SAndroid Build Coastguard Worker  bool lookup(Agent &amp;agent) const;
619*ab8db090SAndroid Build Coastguard Worker  void reverse_lookup(Agent &amp;agent) const;
620*ab8db090SAndroid Build Coastguard Worker  bool common_prefix_search(Agent &amp;agent) const;
621*ab8db090SAndroid Build Coastguard Worker  bool predictive_search(Agent &amp;agent) const;
622*ab8db090SAndroid Build Coastguard Worker
623*ab8db090SAndroid Build Coastguard Worker  std::size_t num_tries() const;
624*ab8db090SAndroid Build Coastguard Worker  std::size_t num_keys() const;
625*ab8db090SAndroid Build Coastguard Worker  std::size_t num_nodes() const;
626*ab8db090SAndroid Build Coastguard Worker
627*ab8db090SAndroid Build Coastguard Worker  TailMode tail_mode() const;
628*ab8db090SAndroid Build Coastguard Worker  NodeOrder node_order() const;
629*ab8db090SAndroid Build Coastguard Worker
630*ab8db090SAndroid Build Coastguard Worker  bool empty() const;
631*ab8db090SAndroid Build Coastguard Worker  std::size_t size() const;
632*ab8db090SAndroid Build Coastguard Worker  std::size_t io_size() const;
633*ab8db090SAndroid Build Coastguard Worker
634*ab8db090SAndroid Build Coastguard Worker  void clear();
635*ab8db090SAndroid Build Coastguard Worker  void swap(Trie &amp;rhs);
636*ab8db090SAndroid Build Coastguard Worker};</pre>
637*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
638*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
639*ab8db090SAndroid Build Coastguard Worker      <h4>Overview</h4>
640*ab8db090SAndroid Build Coastguard Worker      <p>
641*ab8db090SAndroid Build Coastguard Worker       <code>Trie</code> is a dictionary class, which is the most important component of libmarisa. All the operations are performed through this class.
642*ab8db090SAndroid Build Coastguard Worker      </p>
643*ab8db090SAndroid Build Coastguard Worker      <p>
644*ab8db090SAndroid Build Coastguard Worker       In fact, <code>Trie</code> is a dictionary handle, and if the handle is invalid, functions other than <code>build()</code>, <code>mmap()</code>, <code>map()</code>, <code>load()</code>, <code>read()</code>, <code>clear()</code>, <code>swap()</code> throw an exception.
645*ab8db090SAndroid Build Coastguard Worker      </p>
646*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
647*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
648*ab8db090SAndroid Build Coastguard Worker      <h4>Construction</h4>
649*ab8db090SAndroid Build Coastguard Worker      <p>
650*ab8db090SAndroid Build Coastguard Worker       You can build a dictionary by using <code>build()</code>. The arguments are the above mentioned <a href="#keyset">Keyset</a> and a dictionary setting, <var>config_flags</var>, which is represented by a combination of flags. For example, <var>2 | MARISA_BINARY_TAIL</var> specifies the maximum number of tries (<var>2</var>) and a TAIL mode (<var>MARISA_BINARY_TAIL</var>). Also, in this case, the default settings, <var>MARISA_DEFAULT_ORDER</var> and <var>MARISA_DEFAULT_CACHE</var>, are used for the node order and the cache size.
651*ab8db090SAndroid Build Coastguard Worker      </p>
652*ab8db090SAndroid Build Coastguard Worker      <p>
653*ab8db090SAndroid Build Coastguard Worker       The IDs associated with the keys are available through <code>operator[]()</code> of <var>keyset</var>, and the IDs are useful to associate the keys with any data types.
654*ab8db090SAndroid Build Coastguard Worker      </p>
655*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
656*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
657*ab8db090SAndroid Build Coastguard Worker      <h4>File I/O</h4>
658*ab8db090SAndroid Build Coastguard Worker      <p>
659*ab8db090SAndroid Build Coastguard Worker       <code>mmap()</code> is an interface for memory mapped I/O. If an application performs a few search operations, it is unnecessary to read the whole dictionary, and in such a case, <code>mmap()</code> is useful. Also, memory mapped I/O is an easy way to share dictionary data among processes. On the other hand, if an application performs a lot of search operations, a memory mapped dictionary might cause a lot of random disk accesses which considerably increase the search time.
660*ab8db090SAndroid Build Coastguard Worker      </p>
661*ab8db090SAndroid Build Coastguard Worker      <p>
662*ab8db090SAndroid Build Coastguard Worker       <code>map()</code> restores an instance of <code>Trie</code> from dictionary data on memory. <code>load()</code> and <code>read()</code> read a dictionary from a file or a file descriptor. <code>save()</code> and <code>write()</code> write a dictionary to a file or a file descriptor.
663*ab8db090SAndroid Build Coastguard Worker      </p>
664*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
665*ab8db090SAndroid Build Coastguard Worker     <div class="subsubsection">
666*ab8db090SAndroid Build Coastguard Worker      <h4>Search</h4>
667*ab8db090SAndroid Build Coastguard Worker      <p>
668*ab8db090SAndroid Build Coastguard Worker       <code>Trie</code> provides 4 search functions <code>lookup()</code>, <code>reverse_lookup()</code>, <code>common_prefix_search()</code>, and <code>predictive_search()</code> as follows:
669*ab8db090SAndroid Build Coastguard Worker      </p>
670*ab8db090SAndroid Build Coastguard Worker      <ul>
671*ab8db090SAndroid Build Coastguard Worker       <li>
672*ab8db090SAndroid Build Coastguard Worker        <code>lookup()</code> checks whether a query string is registered or not, and if it is registered, <code>lookup()</code> returns <var>true</var>. In this case, the search result is available through <code>agent.key()</code>. Note that <code>lookup()</code> does not restore a key and <code>agent.key().ptr()</code> points to the query string because the two strings are the same.
673*ab8db090SAndroid Build Coastguard Worker       </li>
674*ab8db090SAndroid Build Coastguard Worker       <li>
675*ab8db090SAndroid Build Coastguard Worker        <code>reverse_lookup()</code> restores a key from its ID. This function has no return value and the key is available through <var>agent.key()</var>. The key is actually stored in <var>agent</var> and it is lost when <var>agent</var> is reset or used for another search operation. If a given ID is out-of-range, <code>reverse_lookup()</code> throws an exception.
676*ab8db090SAndroid Build Coastguard Worker       </li>
677*ab8db090SAndroid Build Coastguard Worker       <li>
678*ab8db090SAndroid Build Coastguard Worker        <code>common_prefix_search()</code> searches keys from the possible prefixes of a query string. If there are matching keys, this function returns <var>true</var>. In this case, the first key is available through <code>agent.key()</code>, and if there are more than one matching keys, the next key will be available after the next <code>common_prefix_search()</code> which returns <var>true</var> until there are no more matching keys. Note that <code>agent.key().ptr() == agent.query().ptr()</code> is always <var>true</var> when <code>common_prefix_search()</code> has returned <var>true</var>.
679*ab8db090SAndroid Build Coastguard Worker       </li>
680*ab8db090SAndroid Build Coastguard Worker       <li>
681*ab8db090SAndroid Build Coastguard Worker        <code>predictive_search()</code> searches keys starting with a query string, and similar to <code>common_prefix_search()</code>, this function returns <var>true</var> until there are no more matching keys.
682*ab8db090SAndroid Build Coastguard Worker       </li>
683*ab8db090SAndroid Build Coastguard Worker      </ul>
684*ab8db090SAndroid Build Coastguard Worker      <p>
685*ab8db090SAndroid Build Coastguard Worker       Note that <code>agent</code> keeps the internal state of <code>common_prefix_search()</code> and <code>predictive_search()</code> until <code>agent</code> is passed to another search function or <code>agent.set_query()</code> is called.
686*ab8db090SAndroid Build Coastguard Worker      </p>
687*ab8db090SAndroid Build Coastguard Worker      <p>
688*ab8db090SAndroid Build Coastguard Worker       <code>num_keys()</code> and <code>size()</code> return the number of keys. <code>empty()</code> checks whether the number of keys is <var>0</var> or not. <code>io_size()</code> returns the dictionary size in byte.
689*ab8db090SAndroid Build Coastguard Worker      </p>
690*ab8db090SAndroid Build Coastguard Worker     </div><!-- subsubsection -->
691*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
692*ab8db090SAndroid Build Coastguard Worker
693*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
694*ab8db090SAndroid Build Coastguard Worker     <h3><a name="stdio">stdio</a></h3>
695*ab8db090SAndroid Build Coastguard Worker     <div class="float">
696*ab8db090SAndroid Build Coastguard Worker      <pre class="code">void fread(std::FILE *file, Trie *trie);
697*ab8db090SAndroid Build Coastguard Workervoid fwrite(std::FILE *file, const Trie &amp;trie);</pre>
698*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
699*ab8db090SAndroid Build Coastguard Worker     <p>
700*ab8db090SAndroid Build Coastguard Worker      The functions for I/O using <code>std::FILE</code> are declared in <kbd>marisa/stdio.h</kbd>. If you don't want to <code>#include &lt;cstdio&gt;</code>, use <kbd>marisa/trie.h</kbd> instead of <kbd>marisa.h</kbd>.
701*ab8db090SAndroid Build Coastguard Worker     </p>
702*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
703*ab8db090SAndroid Build Coastguard Worker
704*ab8db090SAndroid Build Coastguard Worker    <div class="subsection">
705*ab8db090SAndroid Build Coastguard Worker     <h3><a name="iostream">iostream</a></h3>
706*ab8db090SAndroid Build Coastguard Worker     <div class="float">
707*ab8db090SAndroid Build Coastguard Worker      <pre class="code">std::istream &amp;read(std::istream &amp;stream, Trie *trie);
708*ab8db090SAndroid Build Coastguard Workerstd::ostream &amp;write(std::ostream &amp;stream, const Trie &amp;trie);
709*ab8db090SAndroid Build Coastguard Worker
710*ab8db090SAndroid Build Coastguard Workerstd::istream &amp;operator>>(std::istream &amp;stream, Trie &amp;trie);
711*ab8db090SAndroid Build Coastguard Workerstd::ostream &amp;operator<<(std::ostream &amp;stream, const Trie &amp;trie);</pre>
712*ab8db090SAndroid Build Coastguard Worker     </div><!-- float -->
713*ab8db090SAndroid Build Coastguard Worker     <p>
714*ab8db090SAndroid Build Coastguard Worker      The functions for I/O using <code>std::iostream</code> are declared in <kbd>marisa/iostream.h</kbd>. If you don't want to <code>#include &lt;iosfwd&gt;</code>, use <kbd>marisa/trie.h</kbd> instead of <kbd>marisa.h</kbd>.
715*ab8db090SAndroid Build Coastguard Worker     </p>
716*ab8db090SAndroid Build Coastguard Worker    </div><!-- subsection -->
717*ab8db090SAndroid Build Coastguard Worker   </div><!-- section -->
718*ab8db090SAndroid Build Coastguard Worker
719*ab8db090SAndroid Build Coastguard Worker   <div class="section">
720*ab8db090SAndroid Build Coastguard Worker    <h2><a name="compatibility">Cross-architecture compatibility</a></h2>
721*ab8db090SAndroid Build Coastguard Worker    <p>
722*ab8db090SAndroid Build Coastguard Worker     The dictionary format of libmarisa depends on the architecture. Dictionaries built on a little endian architecture don't work on a big endian architecture. Also, on a big endian architecture, dictionaries built on a 32-bit machine don't work on a 64-bit machine and vise versa. On a little endian architecture, dictionaries are compatible on 32/64-bit machines.
723*ab8db090SAndroid Build Coastguard Worker    </p>
724*ab8db090SAndroid Build Coastguard Worker   </div><!-- section -->
725*ab8db090SAndroid Build Coastguard Worker
726*ab8db090SAndroid Build Coastguard Worker   <div class="section">
727*ab8db090SAndroid Build Coastguard Worker    <h2><a name="references">References</a></h2>
728*ab8db090SAndroid Build Coastguard Worker    <ul>
729*ab8db090SAndroid Build Coastguard Worker     <li><a href="https://www.slideshare.net/s5yata/x86opti-05-s5yata">Remove Branches in BitVector Select Operations - marisa 0.2.2 -</a>
730*ab8db090SAndroid Build Coastguard Worker      <ul>
731*ab8db090SAndroid Build Coastguard Worker       <li>This PowerPoint presentation describes improvements in marisa 0.2.2.</li>
732*ab8db090SAndroid Build Coastguard Worker      </ul>
733*ab8db090SAndroid Build Coastguard Worker     </li>
734*ab8db090SAndroid Build Coastguard Worker    </ul>
735*ab8db090SAndroid Build Coastguard Worker   </div><!-- section -->
736*ab8db090SAndroid Build Coastguard Worker
737*ab8db090SAndroid Build Coastguard Worker   <div class="section">
738*ab8db090SAndroid Build Coastguard Worker    <h2><a name="conclusion">Last Spell</a></h2>
739*ab8db090SAndroid Build Coastguard Worker    <p>
740*ab8db090SAndroid Build Coastguard Worker     Feel free to contact me for any questions.
741*ab8db090SAndroid Build Coastguard Worker    </p>
742*ab8db090SAndroid Build Coastguard Worker   </div><!-- section -->
743*ab8db090SAndroid Build Coastguard Worker  </div><!-- body -->
744*ab8db090SAndroid Build Coastguard Worker  <div id="footer">
745*ab8db090SAndroid Build Coastguard Worker   <div class="left">MARISA: Matching Algorithm with Recursively Implemented StorAge</div>
746*ab8db090SAndroid Build Coastguard Worker   <div class="right">
747*ab8db090SAndroid Build Coastguard Worker[email protected]748*ab8db090SAndroid Build Coastguard Worker   </div>
749*ab8db090SAndroid Build Coastguard Worker   <div class="end"></div>
750*ab8db090SAndroid Build Coastguard Worker  </div><!-- footer -->
751*ab8db090SAndroid Build Coastguard Worker </body>
752*ab8db090SAndroid Build Coastguard Worker</html>
753