xref: /aosp_15_r20/external/giflib/doc/gifstandard/LZW-and-GIF-explained.html (revision 324bb76b8d05e2a05aa88511fff61cf3f9ca5892)
1*324bb76bSAndroid Build Coastguard Worker<html>
2*324bb76bSAndroid Build Coastguard Worker<head>
3*324bb76bSAndroid Build Coastguard Worker<title>An Introduction to Data Compression</title>
4*324bb76bSAndroid Build Coastguard Worker</head>
5*324bb76bSAndroid Build Coastguard Worker<body>
6*324bb76bSAndroid Build Coastguard Worker
7*324bb76bSAndroid Build Coastguard Worker<h1>LZW and GIF explained<br>
8*324bb76bSAndroid Build Coastguard Worker<font size="-1">by Steve Blackstock</font>
9*324bb76bSAndroid Build Coastguard Worker</h1>
10*324bb76bSAndroid Build Coastguard Worker
11*324bb76bSAndroid Build Coastguard Worker<p>I hope this little document will help enlighten those of you out there
12*324bb76bSAndroid Build Coastguard Workerwho want to know more about the Lempel-Ziv Welch (LZW) compression algorithm, and,
13*324bb76bSAndroid Build Coastguard Workerspecifically, the implementation that GIF uses.</p>
14*324bb76bSAndroid Build Coastguard Worker
15*324bb76bSAndroid Build Coastguard Worker<p>Before we start, here's a little terminology, for the purposes of this
16*324bb76bSAndroid Build Coastguard Workerdocument:</p>
17*324bb76bSAndroid Build Coastguard Worker
18*324bb76bSAndroid Build Coastguard Worker<ul>
19*324bb76bSAndroid Build Coastguard Worker<li>
20*324bb76bSAndroid Build Coastguard Worker      <strong>character</strong>: a fundamental data element. In normal text files, this is
21*324bb76bSAndroid Build Coastguard Workerjust a single byte. In raster images, which is what we're interested in, it's
22*324bb76bSAndroid Build Coastguard Workeran index that specifies the color of a given pixel. I'll refer to an arbitray
23*324bb76bSAndroid Build Coastguard Workercharacter as "K".
24*324bb76bSAndroid Build Coastguard Worker</li><li>
25*324bb76bSAndroid Build Coastguard Worker      <strong>charstream</strong>: a stream of characters, as in a data file.
26*324bb76bSAndroid Build Coastguard Worker</li><li>
27*324bb76bSAndroid Build Coastguard Worker      <strong>string</strong>: a number of continuous characters, anywhere from one to very
28*324bb76bSAndroid Build Coastguard Workermany characters in length. I can specify an arbitrary string as "[...]K".
29*324bb76bSAndroid Build Coastguard Worker</li><li>
30*324bb76bSAndroid Build Coastguard Worker      <strong>prefix</strong>: almost the same as a string, but with the implication that a
31*324bb76bSAndroid Build Coastguard Workerprefix immediately precedes a character, and a prefix can have a length of
32*324bb76bSAndroid Build Coastguard Workerzero. So, a prefix and a character make up a string. I will refer to an
33*324bb76bSAndroid Build Coastguard Workerarbitrary prefix as "[...]".
34*324bb76bSAndroid Build Coastguard Worker</li><li>
35*324bb76bSAndroid Build Coastguard Worker      <strong>root</strong>: a single-character string. For most purposes, this is a
36*324bb76bSAndroid Build Coastguard Workercharacter, but we may occasionally make a distinction. It is [...]K, where
37*324bb76bSAndroid Build Coastguard Worker[...] is empty.
38*324bb76bSAndroid Build Coastguard Worker</li><li>
39*324bb76bSAndroid Build Coastguard Worker      <strong>code</strong>: a number, specified by a known number of bits, which maps to a
40*324bb76bSAndroid Build Coastguard Workerstring.
41*324bb76bSAndroid Build Coastguard Worker</li><li>
42*324bb76bSAndroid Build Coastguard Worker      <strong>codestream</strong>: the output stream of codes, as in the "raster data"
43*324bb76bSAndroid Build Coastguard Worker</li><li>
44*324bb76bSAndroid Build Coastguard Worker      <strong>entry</strong>: a code and its string.
45*324bb76bSAndroid Build Coastguard Worker</li><li>
46*324bb76bSAndroid Build Coastguard Worker      <strong>string table</strong>: a list of entries; usually, but not necessarily, unique.
47*324bb76bSAndroid Build Coastguard Worker</li>
48*324bb76bSAndroid Build Coastguard Worker</ul>
49*324bb76bSAndroid Build Coastguard Worker
50*324bb76bSAndroid Build Coastguard Worker<p>
51*324bb76bSAndroid Build Coastguard Worker     LZW is a way of compressing data that takes advantage of repetition of
52*324bb76bSAndroid Build Coastguard Workerstrings in the data. Since raster data usually contains a lot of this
53*324bb76bSAndroid Build Coastguard Workerrepetition, LZW is a good way of compressing and decompressing it.
54*324bb76bSAndroid Build Coastguard Worker     For the moment, lets consider normal LZW encoding and decoding. GIF's
55*324bb76bSAndroid Build Coastguard Workervariation on the concept is just an extension from there.
56*324bb76bSAndroid Build Coastguard Worker
57*324bb76bSAndroid Build Coastguard Worker</p><p>
58*324bb76bSAndroid Build Coastguard Worker  LZW manipulates three objects in both compression and decompression: the
59*324bb76bSAndroid Build Coastguard Workercharstream, the codestream, and the string table. In compression, the
60*324bb76bSAndroid Build Coastguard Workercharstream is the input and the codestream is the output. In decompression,
61*324bb76bSAndroid Build Coastguard Workerthe codestream is the input and the charstream is the output. The string table
62*324bb76bSAndroid Build Coastguard Workeris a product of both compression and decompression, but is never passed from
63*324bb76bSAndroid Build Coastguard Workerone to the other.
64*324bb76bSAndroid Build Coastguard Worker
65*324bb76bSAndroid Build Coastguard Worker</p><h2>Compression</h2>
66*324bb76bSAndroid Build Coastguard Worker
67*324bb76bSAndroid Build Coastguard Worker<p>
68*324bb76bSAndroid Build Coastguard Worker     The first thing we do in LZW compression is initialize our string table.
69*324bb76bSAndroid Build Coastguard WorkerTo do this, we need to choose a code size (how many bits) and know how many
70*324bb76bSAndroid Build Coastguard Workervalues our characters can possibly take. Let's say our code size is 12 bits,
71*324bb76bSAndroid Build Coastguard Workermeaning we can store 0-&gt;FFF, or 4096 entries in our string table. Lets also
72*324bb76bSAndroid Build Coastguard Workersay that we have 32 possible different characters. (This corresponds to, say,
73*324bb76bSAndroid Build Coastguard Workera picture in which there are 32 different colors possible for each pixel.) To
74*324bb76bSAndroid Build Coastguard Workerinitialize the table, we set code#0 to character#0, code #1 to character#1,
75*324bb76bSAndroid Build Coastguard Workerand so on, until code#31 to character#31. Actually, we are specifying that
76*324bb76bSAndroid Build Coastguard Workereach code from 0 to 31 maps to a root. There will be no more entries in the
77*324bb76bSAndroid Build Coastguard Workertable that have this property.
78*324bb76bSAndroid Build Coastguard Worker
79*324bb76bSAndroid Build Coastguard Worker</p><p>
80*324bb76bSAndroid Build Coastguard Worker
81*324bb76bSAndroid Build Coastguard WorkerNow we start compressing data. Let's first define something
82*324bb76bSAndroid Build Coastguard Workercalled the "current prefix".  It's just a prefix that we'll store
83*324bb76bSAndroid Build Coastguard Workerthings in and compare things to now and then.  I will refer to it as
84*324bb76bSAndroid Build Coastguard Worker"[.c.]". Initially, the current prefix has nothing in it.  Let's also
85*324bb76bSAndroid Build Coastguard Workerdefine a "current string", which will be the current prefix plus the
86*324bb76bSAndroid Build Coastguard Workernext character in the charstream.  I will refer to the current string
87*324bb76bSAndroid Build Coastguard Workeras "[.c.]K", where K is some character.  OK, look at the first
88*324bb76bSAndroid Build Coastguard Workercharacter in the charstream.  Call it P.  Make [.c.]P the current
89*324bb76bSAndroid Build Coastguard Workerstring.  (At this point, of course, it's just the root P.)  Now search
90*324bb76bSAndroid Build Coastguard Workerthrough the string table to see if [.c.]P appears in it.  Of course, it
91*324bb76bSAndroid Build Coastguard Workerdoes now, because our string table is initialized to have all roots.
92*324bb76bSAndroid Build Coastguard WorkerSo we don't do anything.  Now make [.c.]P the current prefix.  Look at
93*324bb76bSAndroid Build Coastguard Workerthe next character in the charstream. Call it Q.  Add it to the
94*324bb76bSAndroid Build Coastguard Workercurrent prefix to form [.c.]Q, the current string.  Now search through
95*324bb76bSAndroid Build Coastguard Workerthe string table to see if [.c.]Q appears in it. In this case, of
96*324bb76bSAndroid Build Coastguard Workercourse, it doesn't.  Aha! Now we get to do something.  Add [.c.]Q
97*324bb76bSAndroid Build Coastguard Worker(which is PQ in this case) to the string table for code#32, and output
98*324bb76bSAndroid Build Coastguard Workerthe code for [.c.] to the codestream.  Now start over again with the
99*324bb76bSAndroid Build Coastguard Workercurrent prefix being just the root Q.  Keep adding characters to [.c.]
100*324bb76bSAndroid Build Coastguard Workerto form [.c.]K, until you can't find [.c.]K in the string table.  Then
101*324bb76bSAndroid Build Coastguard Workeroutput the code for [.c.] and add [.c.]K to the string table.  In
102*324bb76bSAndroid Build Coastguard Workerpseudo-code, the algorithm goes something like this:
103*324bb76bSAndroid Build Coastguard Worker
104*324bb76bSAndroid Build Coastguard Worker</p><pre>     [1] Initialize string table;
105*324bb76bSAndroid Build Coastguard Worker     [2] [.c.] &lt;- empty;
106*324bb76bSAndroid Build Coastguard Worker     [3] K &lt;- next character in charstream;
107*324bb76bSAndroid Build Coastguard Worker     [4] Is [.c.]K in string table?
108*324bb76bSAndroid Build Coastguard Worker         (yes: [.c.] &lt;- [.c.]K;
109*324bb76bSAndroid Build Coastguard Worker               go to [3];
110*324bb76bSAndroid Build Coastguard Worker         )
111*324bb76bSAndroid Build Coastguard Worker         (no: add [.c.]K to the string table;
112*324bb76bSAndroid Build Coastguard Worker              output the code for [.c.] to the codestream;
113*324bb76bSAndroid Build Coastguard Worker              [.c.] &lt;- K;
114*324bb76bSAndroid Build Coastguard Worker              go to [3];
115*324bb76bSAndroid Build Coastguard Worker         )
116*324bb76bSAndroid Build Coastguard Worker</pre>
117*324bb76bSAndroid Build Coastguard Worker
118*324bb76bSAndroid Build Coastguard Worker<p>
119*324bb76bSAndroid Build Coastguard Worker       It's as simple as that! Of course, when you get to step [3] and there
120*324bb76bSAndroid Build Coastguard Workeraren't any more characters left, you just output the code for [.c.] and throw
121*324bb76bSAndroid Build Coastguard Workerthe table away. You're done.
122*324bb76bSAndroid Build Coastguard Worker
123*324bb76bSAndroid Build Coastguard Worker</p><p>
124*324bb76bSAndroid Build Coastguard Worker      Wanna do an example? Let's pretend we have a four-character alphabet:
125*324bb76bSAndroid Build Coastguard WorkerA,B,C,D. The charstream looks like ABACABA. Let's compress it. First, we
126*324bb76bSAndroid Build Coastguard Workerinitialize our string table to: #0=A, #1=B, #2=C, #3=D. The first character is
127*324bb76bSAndroid Build Coastguard WorkerA, which is in the string table, so [.c.] becomes A. Next we get AB, which is
128*324bb76bSAndroid Build Coastguard Workernot in the table, so we output code #0 (for [.c.]),
129*324bb76bSAndroid Build Coastguard Worker     and add AB to the string table as code #4. [.c.] becomes B. Next we get
130*324bb76bSAndroid Build Coastguard Worker[.c.]A = BA, which is not in the string table, so output code #1, and add BA
131*324bb76bSAndroid Build Coastguard Workerto the string table as code #5. [.c.] becomes A. Next we get AC, which is not
132*324bb76bSAndroid Build Coastguard Workerin the string table. Output code #0, and add AC to the string table as code
133*324bb76bSAndroid Build Coastguard Worker#6. Now [.c.] becomes C. Next we get [.c.]A = CA, which is not in the table.
134*324bb76bSAndroid Build Coastguard WorkerOutput #2 for C, and add CA to table as code#7. Now [.c.] becomes A. Next we
135*324bb76bSAndroid Build Coastguard Workerget AB, which IS in the string table, so [.c.] gets AB, and we look at ABA,
136*324bb76bSAndroid Build Coastguard Workerwhich is not in the string table, so output the code for AB, which is #4, and
137*324bb76bSAndroid Build Coastguard Workeradd ABA to the string table as code #8. [.c.] becomes A. We can't get any more
138*324bb76bSAndroid Build Coastguard Workercharacters, so we just output #0 for the code for A, and we're done. So, the
139*324bb76bSAndroid Build Coastguard Workercodestream is #0#1#0#2#4#0.
140*324bb76bSAndroid Build Coastguard Worker
141*324bb76bSAndroid Build Coastguard Worker</p><p>
142*324bb76bSAndroid Build Coastguard Worker      A few words (four) should be said here about efficiency: use a hashing
143*324bb76bSAndroid Build Coastguard Workerstrategy. The search through the string table can be computationally
144*324bb76bSAndroid Build Coastguard Workerintensive, and some hashing is well worth the effort. Also, note that
145*324bb76bSAndroid Build Coastguard Worker"straight LZW" compression runs the risk of overflowing the string table -
146*324bb76bSAndroid Build Coastguard Workergetting to a code which can't be represented in the number of bits you've set
147*324bb76bSAndroid Build Coastguard Workeraside for codes. There are several ways of dealing with this problem, and GIF
148*324bb76bSAndroid Build Coastguard Workerimplements a very clever one, but we'll get to that.
149*324bb76bSAndroid Build Coastguard Worker
150*324bb76bSAndroid Build Coastguard Worker</p><p>
151*324bb76bSAndroid Build Coastguard Worker      An important thing to notice is that, at any point during the
152*324bb76bSAndroid Build Coastguard Workercompression, if [...]K is in the string table, [...] is there also. This fact
153*324bb76bSAndroid Build Coastguard Workersuggests an efficient method for storing strings in the table. Rather than
154*324bb76bSAndroid Build Coastguard Workerstore the entire string of K's in the table, realize that any string can be
155*324bb76bSAndroid Build Coastguard Workerexpressed as a prefix plus a character: [...]K. If we're about to store [...]K
156*324bb76bSAndroid Build Coastguard Workerin the table, we know that [...] is already there, so we can just store the
157*324bb76bSAndroid Build Coastguard Workercode for [...] plus the final character K.
158*324bb76bSAndroid Build Coastguard Worker
159*324bb76bSAndroid Build Coastguard Worker
160*324bb76bSAndroid Build Coastguard Worker</p><h2>Decompression</h2>
161*324bb76bSAndroid Build Coastguard Worker
162*324bb76bSAndroid Build Coastguard WorkerDecompression is perhaps more
163*324bb76bSAndroid Build Coastguard Workerdifficult conceptually, but it is really easier to program.
164*324bb76bSAndroid Build Coastguard WorkerWe again have to start with an initialized string
165*324bb76bSAndroid Build Coastguard Workertable. This table comes from what knowledge we have about the charstream that
166*324bb76bSAndroid Build Coastguard Workerwe will eventually get, like what possible values the characters can take. In
167*324bb76bSAndroid Build Coastguard WorkerGIF files, this information is in the header as the number of possible pixel
168*324bb76bSAndroid Build Coastguard Workervalues. The beauty of LZW, though, is that this is all we need to know. We
169*324bb76bSAndroid Build Coastguard Workerwill build the rest of the string table as we decompress the codestream. The
170*324bb76bSAndroid Build Coastguard Workercompression is done in such a way that we will never encounter a code in the
171*324bb76bSAndroid Build Coastguard Workercodestream that we can't translate into a string.
172*324bb76bSAndroid Build Coastguard Worker
173*324bb76bSAndroid Build Coastguard Worker<p>
174*324bb76bSAndroid Build Coastguard WorkerWe need to define something called a "current code", which I
175*324bb76bSAndroid Build Coastguard Workerwill refer to as "&lt;code&gt;", and an "old-code", which I will refer
176*324bb76bSAndroid Build Coastguard Workerto as "&lt;old&gt;".  To start things off, look at the first code.  This
177*324bb76bSAndroid Build Coastguard Workeris now &lt;code&gt;.  This code will be in the intialized string table as
178*324bb76bSAndroid Build Coastguard Workerthe code for a root.  Output the root to the charstream. Make this code
179*324bb76bSAndroid Build Coastguard Workerthe old-code &lt;old&gt;.  *Now look at the next code, and make it
180*324bb76bSAndroid Build Coastguard Worker&lt;code&gt;.  It is possible that this code will not be in the string
181*324bb76bSAndroid Build Coastguard Workertable, but let's assume for now that it is.  Output the string
182*324bb76bSAndroid Build Coastguard Workercorresponding to &lt;code&gt; to the codestream.  Now find the first
183*324bb76bSAndroid Build Coastguard Workercharacter in the string you just translated.  Call this K.  Add this to
184*324bb76bSAndroid Build Coastguard Workerthe prefix [...] generated by &lt;old&gt; to form a new string
185*324bb76bSAndroid Build Coastguard Worker[...]K. Add this string [...]K to the string table, and set the
186*324bb76bSAndroid Build Coastguard Workerold-code &lt;old&gt; to the current code &lt;code&gt;. Repeat from where I
187*324bb76bSAndroid Build Coastguard Workertyped the asterisk, and you're all set.
188*324bb76bSAndroid Build Coastguard WorkerThis is the most common case so you should understand this before going
189*324bb76bSAndroid Build Coastguard Workeron.
190*324bb76bSAndroid Build Coastguard Worker
191*324bb76bSAndroid Build Coastguard Worker</p><p>
192*324bb76bSAndroid Build Coastguard Worker
193*324bb76bSAndroid Build Coastguard WorkerNow let's consider the possibility that &lt;code&gt; is not in the
194*324bb76bSAndroid Build Coastguard Workerstring table, which as we will see can only occur for strings of the
195*324bb76bSAndroid Build Coastguard Workerform P[...]P (for any character P).  Think back to compression, and
196*324bb76bSAndroid Build Coastguard Workertry to understand what happens when you have a string like
197*324bb76bSAndroid Build Coastguard WorkerP[...]P[...]PQ appear in the charstream. Suppose P[...] is already in
198*324bb76bSAndroid Build Coastguard Workerthe string table, but P[...]P is not. The compressor will parse out
199*324bb76bSAndroid Build Coastguard WorkerP[...], and find that P[...]P is not in the string table. It will
200*324bb76bSAndroid Build Coastguard Workeroutput the code for P[...], and add P[...]P to the string table. Then
201*324bb76bSAndroid Build Coastguard Workerit will get up to P[...]P for the next string, and find that P[...]P
202*324bb76bSAndroid Build Coastguard Workeris in the table, as the code just added. So it will output the code
203*324bb76bSAndroid Build Coastguard Workerfor P[...]P if it finds that P[...]PQ is not in the table.  The
204*324bb76bSAndroid Build Coastguard Workerdecompressor is always "one step behind" the compressor. When the
205*324bb76bSAndroid Build Coastguard Workerdecompressor sees the code for P[...]P, it will not have added that
206*324bb76bSAndroid Build Coastguard Workercode to it's string table yet because it needed the beginning
207*324bb76bSAndroid Build Coastguard Workercharacter of P[...]P to add to the string for the last code, P[...],
208*324bb76bSAndroid Build Coastguard Workerto form the code for P[...]P. However, when a decompressor finds a
209*324bb76bSAndroid Build Coastguard Workercode that it doesn't know yet, it will always be the very next one to
210*324bb76bSAndroid Build Coastguard Workerbe added to the string table.  So it can guess at what the string for
211*324bb76bSAndroid Build Coastguard Workerthe code should be, and, in fact, it will always be correct. If I am a
212*324bb76bSAndroid Build Coastguard Workerdecompressor, and I see code#124, and yet my string table has entries
213*324bb76bSAndroid Build Coastguard Workeronly up to code#123, I can figure out what code#124 must be, add it to
214*324bb76bSAndroid Build Coastguard Workermy string table, and output the string. If code#123 generated the
215*324bb76bSAndroid Build Coastguard Workerstring [...], which I will refer to here as a prefix, then code#124,
216*324bb76bSAndroid Build Coastguard Workerin this special case, will be [...] plus the first character of [...].
217*324bb76bSAndroid Build Coastguard WorkerSo just add the first character of [...] to the end of itself.  Not
218*324bb76bSAndroid Build Coastguard Workertoo bad.
219*324bb76bSAndroid Build Coastguard Worker
220*324bb76bSAndroid Build Coastguard Worker
221*324bb76bSAndroid Build Coastguard Worker</p><p>
222*324bb76bSAndroid Build Coastguard Worker
223*324bb76bSAndroid Build Coastguard WorkerAs an example (and a very common one) of this special case, let's
224*324bb76bSAndroid Build Coastguard Workerassume we have a raster image in which the first three pixels have the
225*324bb76bSAndroid Build Coastguard Workersame color value.  That is, my charstream looks like: QQQ....  For the
226*324bb76bSAndroid Build Coastguard Workersake of argument, let's say we have 32 colors, and Q is the
227*324bb76bSAndroid Build Coastguard Workercolor#12. The compressor will generate the code sequence
228*324bb76bSAndroid Build Coastguard Worker12,32,.... (if you don't know why, take a minute to understand it.)
229*324bb76bSAndroid Build Coastguard WorkerRemember that #32 is not in the initial table, which goes from #0 to
230*324bb76bSAndroid Build Coastguard Worker#31. The decompressor will see #12 and translate it just fine as color
231*324bb76bSAndroid Build Coastguard WorkerQ. Then it will see #32 and not yet know what that means. But if it
232*324bb76bSAndroid Build Coastguard Workerthinks about it long enough, it can figure out that QQ should be
233*324bb76bSAndroid Build Coastguard Workerentry#32 in the table and QQ should be the next string output.  So the
234*324bb76bSAndroid Build Coastguard Workerdecompression pseudo-code goes something like:
235*324bb76bSAndroid Build Coastguard Worker
236*324bb76bSAndroid Build Coastguard Worker</p><pre>     [1] Initialize string table;
237*324bb76bSAndroid Build Coastguard Worker     [2] get first code: &lt;code&gt;
238*324bb76bSAndroid Build Coastguard Worker     [3] output the string for &lt;code&gt; to the charstream;
239*324bb76bSAndroid Build Coastguard Worker     [4] &lt;old&gt; = &lt;code&gt;
240*324bb76bSAndroid Build Coastguard Worker     [5] &lt;code&gt; &lt;- next code in codestream;
241*324bb76bSAndroid Build Coastguard Worker     [6] does &lt;code&gt; exist in the string table?
242*324bb76bSAndroid Build Coastguard Worker         (yes: output the string for &lt;code&gt; to the charstream;
243*324bb76bSAndroid Build Coastguard Worker            [...] &lt;- translation for &lt;old&gt;
244*324bb76bSAndroid Build Coastguard Worker            K &lt;- first character of translation for &lt;code&gt;
245*324bb76bSAndroid Build Coastguard Worker            add [...]K to the string table;
246*324bb76bSAndroid Build Coastguard Worker            &lt;old&gt; &lt;- &lt;code&gt;
247*324bb76bSAndroid Build Coastguard Worker         )
248*324bb76bSAndroid Build Coastguard Worker         (no: [...] &lt;- translation for &lt;old&gt;
249*324bb76bSAndroid Build Coastguard Worker            K &lt;- first character of [...];
250*324bb76bSAndroid Build Coastguard Worker            output [...]K to charstream and add it to string table;
251*324bb76bSAndroid Build Coastguard Worker            &lt;old&gt; &lt;- &lt;code&gt;
252*324bb76bSAndroid Build Coastguard Worker         )
253*324bb76bSAndroid Build Coastguard Worker     [7] go to [5];
254*324bb76bSAndroid Build Coastguard Worker</pre>
255*324bb76bSAndroid Build Coastguard Worker
256*324bb76bSAndroid Build Coastguard Worker<p>
257*324bb76bSAndroid Build Coastguard Worker      Again, when you get to step [5] and there are no more codes, you're
258*324bb76bSAndroid Build Coastguard Workerfinished.  Outputting of strings, and finding of initial characters in strings
259*324bb76bSAndroid Build Coastguard Workerare efficiency problems all to themselves, but I'm not going to suggest ways
260*324bb76bSAndroid Build Coastguard Workerto do them here. Half the fun of programming is figuring these things out!
261*324bb76bSAndroid Build Coastguard Worker
262*324bb76bSAndroid Build Coastguard Worker</p><h2>GIF variation</h2>
263*324bb76bSAndroid Build Coastguard Worker
264*324bb76bSAndroid Build Coastguard Worker<p>
265*324bb76bSAndroid Build Coastguard Worker
266*324bb76bSAndroid Build Coastguard WorkerNow for the GIF variations on the theme. In part of the header of a
267*324bb76bSAndroid Build Coastguard WorkerGIF file, there is a field, in the Raster Data stream, called "code
268*324bb76bSAndroid Build Coastguard Workersize". This is a very misleading name for the field, but we have to
269*324bb76bSAndroid Build Coastguard Workerlive with it. What it is really is the "root size". The actual size,
270*324bb76bSAndroid Build Coastguard Workerin bits, of the compression codes actually changes during
271*324bb76bSAndroid Build Coastguard Workercompression/decompression, and I will refer to that size here as the
272*324bb76bSAndroid Build Coastguard Worker"compression size".  The initial table is just the codes for all the
273*324bb76bSAndroid Build Coastguard Workerroots, as usual, but two special codes are added on top of those.  The
274*324bb76bSAndroid Build Coastguard Worker"code size" N is set to max(2,bits-per-pixel).  In the table the roots
275*324bb76bSAndroid Build Coastguard Workertake up slots #0 through #(2**N-1), and the special codes are (2**N)
276*324bb76bSAndroid Build Coastguard Workerand (2**N + 1).  The initial compression size will be N+1 bits per
277*324bb76bSAndroid Build Coastguard Workercode. If you're encoding, you output the codes (N+1) bits at a time to
278*324bb76bSAndroid Build Coastguard Workerstart with, and if you're decoding, you grab (N+1) bits from the
279*324bb76bSAndroid Build Coastguard Workercodestream at a time.  As for the special codes: &lt;CC&gt; or the clear
280*324bb76bSAndroid Build Coastguard Workercode, is (2**N), and &lt;EOI&gt;, or end-of-information, is (2**N +
281*324bb76bSAndroid Build Coastguard Worker1). &lt;CC&gt; tells the compressor to re- initialize the string table,
282*324bb76bSAndroid Build Coastguard Workerand to reset the compression size to (N+1). &lt;EOI&gt; means there's no
283*324bb76bSAndroid Build Coastguard Workermore in the codestream.
284*324bb76bSAndroid Build Coastguard Worker
285*324bb76bSAndroid Build Coastguard Worker</p><p>
286*324bb76bSAndroid Build Coastguard Worker
287*324bb76bSAndroid Build Coastguard WorkerIf you're encoding or decoding, you should
288*324bb76bSAndroid Build Coastguard Workerstart adding things to the string table at &lt;CC&gt; + 2. If you're
289*324bb76bSAndroid Build Coastguard Workerencoding, you should output &lt;CC&gt; as the very first code, and then
290*324bb76bSAndroid Build Coastguard Workerwhenever after that you reach code #4095 (hex FFF), because GIF does
291*324bb76bSAndroid Build Coastguard Workernot allow compression sizes to be greater than 12 bits. If you're
292*324bb76bSAndroid Build Coastguard Workerdecoding, you should reinitialize your string table when you observe
293*324bb76bSAndroid Build Coastguard Worker&lt;CC&gt;.  The variable compression sizes are really no big deal. If
294*324bb76bSAndroid Build Coastguard Workeryou're encoding, you start with a compression size of (N+1) bits, and,
295*324bb76bSAndroid Build Coastguard Workerwhenever you output the code (2**(compression size)-1), you bump the
296*324bb76bSAndroid Build Coastguard Workercompression size up one bit. So the next code you output will be one
297*324bb76bSAndroid Build Coastguard Workerbit longer. Remember that the largest compression size is 12 bits,
298*324bb76bSAndroid Build Coastguard Workercorresponding to a code of 4095. If you get that far, you must output
299*324bb76bSAndroid Build Coastguard Worker&lt;CC&gt; as the next code, and start over.  If you're decoding, you
300*324bb76bSAndroid Build Coastguard Workermust increase your compression size AS SOON AS YOU write entry
301*324bb76bSAndroid Build Coastguard Worker#(2**(compression size) - 1) to the string table. The next code you
302*324bb76bSAndroid Build Coastguard WorkerREAD will be one bit longer. Don't make the mistake of waiting until
303*324bb76bSAndroid Build Coastguard Workeryou need to add the code (2**compression size) to the table. You'll
304*324bb76bSAndroid Build Coastguard Workerhave already missed a bit from the last code.  The packaging of codes
305*324bb76bSAndroid Build Coastguard Workerinto a bitsream for the raster data is also a potential stumbling
306*324bb76bSAndroid Build Coastguard Workerblock for the novice encoder or decoder. The lowest order bit in the
307*324bb76bSAndroid Build Coastguard Workercode should coincide with the lowest available bit in the first
308*324bb76bSAndroid Build Coastguard Workeravailable byte in the codestream. For example, if you're starting with
309*324bb76bSAndroid Build Coastguard Worker5-bit compression codes, and your first three codes are, say,
310*324bb76bSAndroid Build Coastguard Worker&lt;abcde&gt;, &lt;fghij&gt;, &lt;klmno&gt;, where e, j, and o are bit#0,
311*324bb76bSAndroid Build Coastguard Workerthen your codestream will start off like:
312*324bb76bSAndroid Build Coastguard Worker
313*324bb76bSAndroid Build Coastguard Worker</p><pre>       byte#0: hijabcde
314*324bb76bSAndroid Build Coastguard Worker       byte#1: .klmnofg
315*324bb76bSAndroid Build Coastguard Worker</pre>
316*324bb76bSAndroid Build Coastguard Worker
317*324bb76bSAndroid Build Coastguard Worker<p>
318*324bb76bSAndroid Build Coastguard Worker
319*324bb76bSAndroid Build Coastguard Worker      So the differences between straight LZW and GIF LZW are: two additional
320*324bb76bSAndroid Build Coastguard Workerspecial codes and variable compression sizes. If you understand LZW, and you
321*324bb76bSAndroid Build Coastguard Workerunderstand those variations, you understand it all!
322*324bb76bSAndroid Build Coastguard Worker
323*324bb76bSAndroid Build Coastguard Worker</p><p>
324*324bb76bSAndroid Build Coastguard Worker      Just as sort of a P.S., you may have noticed that a compressor has a
325*324bb76bSAndroid Build Coastguard Workerlittle bit of flexibility at compression time. I specified a "greedy" approach
326*324bb76bSAndroid Build Coastguard Workerto the compression, grabbing as many characters as possible before outputting
327*324bb76bSAndroid Build Coastguard Workercodes. This is, in fact, the standard LZW way of doing things, and it will
328*324bb76bSAndroid Build Coastguard Workeryield the best compression ratio. But there's no rule saying you can't stop
329*324bb76bSAndroid Build Coastguard Workeranywhere along the line and just output the code for the current prefix,
330*324bb76bSAndroid Build Coastguard Workerwhether it's already in the table or not, and add that string plus the next
331*324bb76bSAndroid Build Coastguard Workercharacter to the string table. There are various reasons for wanting to do
332*324bb76bSAndroid Build Coastguard Workerthis, especially if the strings get extremely long and make hashing difficult.
333*324bb76bSAndroid Build Coastguard WorkerIf you need to, do it.
334*324bb76bSAndroid Build Coastguard Worker
335*324bb76bSAndroid Build Coastguard Worker</p><p>
336*324bb76bSAndroid Build Coastguard Worker      Hope this helps out.----steve blackstock
337*324bb76bSAndroid Build Coastguard Worker
338*324bb76bSAndroid Build Coastguard Worker</p><h3>Further information</h3>
339*324bb76bSAndroid Build Coastguard Worker
340*324bb76bSAndroid Build Coastguard WorkerThe original paper that describes the LZW algorithm is:
341*324bb76bSAndroid Build Coastguard Worker
342*324bb76bSAndroid Build Coastguard Worker<blockquote>
343*324bb76bSAndroid Build Coastguard WorkerTerry A. Welch.
344*324bb76bSAndroid Build Coastguard WorkerA Technique for High Performance Data Compression.
345*324bb76bSAndroid Build Coastguard WorkerIEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19.
346*324bb76bSAndroid Build Coastguard Worker</blockquote>
347*324bb76bSAndroid Build Coastguard Worker
348*324bb76bSAndroid Build Coastguard WorkerThe GIF format is described in more detail in the
349*324bb76bSAndroid Build Coastguard Worker<a href="gif87.txt">GIF87(5) - GIF 87</a> and
350*324bb76bSAndroid Build Coastguard Worker<a href="gif89.txt">GIF89a(5) - GIF 89a</a> standards.
351*324bb76bSAndroid Build Coastguard Worker</body>
352*324bb76bSAndroid Build Coastguard Worker</html>
353