# Building a quantization paradigm from first principles

**TLDR:** If you prefer example code over theory, look at
[doc/quantization_example.cc](quantization_example.cc).

## Overview

gemmlowp allows performing calculations on matrices of uint8 values, but these
matrices are only useful insofar as they somehow approximate matrices of real
numbers. By a _quantization paradigm_ we mean a correspondence between matrices
of quantized 8bit values and matrices of real numbers. The choice of a
quantization paradigm affects the calculations that gemmlowp itself needs to
perform; specifically, it affects how one goes from internal 32bit accumulators
to final 8bit outputs.

The part of gemmlowp transforming internal 32bit accumulators to final
8bit outputs is the "output pipeline" described in [output.md](output.md).

gemmlowp's `GemmWithOutputPipeline` entry point allows specifying an arbitrary
output pipeline, allowing the user to implement their own preferred quantized
arithmetic paradigm.

In the present document, our purpose is to show how, reasoning from first
principles and some domain-specific knowledge of neural networks, we can arrive
naturally at some specific quantization paradigm, and how that can be
implemented using a specific output pipeline.

We also aim to show how this paradigm differs from the older, legacy
quantization paradigm implemented by gemmlowp's legacy interfaces, and why the
change to the newer paradigm described in this document was useful for some
applications of gemmlowp.

## Quantization as an affine map

In order for arithmetic on real values to map directly to arithmetic on
quantized uint8 values, the mapping between real and quantized uint8 values must
be affine, which means it must be of the form

```
real_value = A * quantized_value + B             (1)
```

for some constants A, B, or equivalently, of the form

```
real_value = C * (quantized_value + D)           (2)
```

for some constants C, D. Indeed, anything other than such an affine map would
mean that the results of the quantized calculations would no longer readily
provide an approximation to the results of the real-numbers calculation.

## Domain-specific constraint: the real value 0 must be exactly representable

Here a domain-specific constraint from neural networks appears: for some neural
network layers, it is very useful for optimized implementations that the
real value 0 be exactly representable.

For instance, in a Convolutional or Pooling layer with padding, it is useful to
be able to implement the padding by zero-padding the input array, so that
optimized loops do not need to become more complex to avoid overrunning the
array bounds.

In order for such zero-padding to be feasible in a quantized implementation of
such layers, it is important that the real value '0' be exactly representable in
quantized form, i.e. that it correspond exactly to some quantized value, which
we call the _zero-point_.

Indeed, if '0' were not exactly representable, then we would have to use some
quantized value for padding that does not exactly correspond to the real value
'0'. That would typically introduce inaccuracy in the result. In fact, always
using the same such value would be worse: it would introduce _bias_ in the
result.

## The final form of the quantization equation

Now let us phrase what this constraint — that the real value 0 be exactly
representable — means in each of the quantization equations, (1) and (2).

In equation (1), plugging `real_value = 0` and `quantized_value = zero_point`,
we get:

```
0 = A * zero_point + B
```

equivalently:

```
zero_point = -B / A
```

We are thus left with a rather awkward constraint: the real number `-B / A` must
somehow be guaranteed to be exactly integral, so that the special uint8 value
`zero_point` can be exactly equal to it. Quite awkward!

Now let us look at equation (2). Plugging `real_value = 0` and
`quantized_value = zero_point`, we get:

```
0 = C * (zero_point + D)
```

Conveniently, the constant `C` plays no role anymore, so this equation
simplifies to:

```
0 = zero_point + D
```

In other words, `D = -zero_point`. This suggests rewriting the quantization
equation (2) into the following form (3), which will be the final form that we
will consistently use:

```
real_value = scale * (quantized_value - zero_point)        (3)
```

To go from (2) to (3), we merely renamed `C` to `scale` and `D` to
`-zero_point`.

With this quantization equation (3), the condition that 0 be exactly
representable is vacuously satisfied: `zero_point` is by definition one of the
possible `quantized_value`'s, and equation (3) maps it to a `real_value` of
exactly 0.

Note that the final quantization equation (3) depends on two constants, one
integral, the other an arbitrary positive real number:

*   `zero_point` is integral, more specifically is one of the possible quantized
    values (i.e. typically is a uint8 value).
*   `scale` is a positive real number. Thus at this stage we have not yet shown
    how to eliminate all usage of floating-point arithmetic. That will come
    below.
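
To make equation (3) concrete, here is a minimal C++ sketch of quantizing and
dequantizing a single value for given `scale` and `zero_point` parameters. The
function names are ours, chosen for illustration only; they are not part of
gemmlowp's API.

```
#include <algorithm>
#include <cmath>
#include <cstdint>

// Equation (3) solved for quantized_value, with rounding and with
// clamping to the representable uint8 range [0, 255].
std::uint8_t Quantize(float real_value, float scale, std::uint8_t zero_point) {
  const int q = zero_point + static_cast<int>(std::round(real_value / scale));
  return static_cast<std::uint8_t>(std::min(255, std::max(0, q)));
}

// Equation (3) as written: recovers the approximated real value.
float Dequantize(std::uint8_t quantized_value, float scale,
                 std::uint8_t zero_point) {
  return scale * (static_cast<int>(quantized_value) - zero_point);
}
```

For example, with `scale = 6.f / 255` and `zero_point = 0`, `Quantize(3.0f, ...)`
yields 128, and `Dequantize(128, ...)` returns approximately 3.01.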

## Quantizing a matrix multiplication

Now that we know — equation (3) — how real numbers are to correspond
to quantized values (typically uint8), we turn to applying this knowledge to
rewriting a multiplication of matrices of real numbers as the equivalent
multiplication of matrices of quantized values.

Say that we have two matrices of real values `lhs_real_matrix`,
`rhs_real_matrix`. Each entry of their product is the sum (accumulation) of many
products of individual matrix entries, say `lhs_real_value * rhs_real_value`.

Now suppose that we have already quantized these two matrices according to the
above equation (3), with some already-known quantization parameters `lhs_scale`,
`rhs_scale`, `lhs_zero_point`, `rhs_zero_point`, so that their matrix entries
are quantized as

```
lhs_real_value[i] = lhs_scale * (lhs_quantized_value[i] - lhs_zero_point)
rhs_real_value[i] = rhs_scale * (rhs_quantized_value[i] - rhs_zero_point)
```

We then rewrite the matrix product accumulator accordingly:

```
result_real_value
  = Sum_over_i(lhs_real_value[i] * rhs_real_value[i])
  = Sum_over_i(
        lhs_scale * (lhs_quantized_value[i] - lhs_zero_point) *
        rhs_scale * (rhs_quantized_value[i] - rhs_zero_point)
    )
  = lhs_scale * rhs_scale * Sum_over_i(
        (lhs_quantized_value[i] - lhs_zero_point) *
        (rhs_quantized_value[i] - rhs_zero_point)
    )                                                      (4)
```

Now our goal is to represent this result itself as a quantized matrix, i.e.
still according to equation (3), for some pre-established quantization
parameters `result_scale` and `result_zero_point`, as

```
result_real_value = result_scale *
    (result_quantized_value - result_zero_point)
```

Here we need to keep in mind that our goal is to specify what the quantized
matrix multiplication should do, i.e. how to compute `result_quantized_value`.
The last equation above is equivalent to

```
result_quantized_value = result_zero_point +
    result_real_value / result_scale
```

Now we can use equation (4) above to plug into this the expression of
`result_real_value` in terms of the quantized operands, and we obtain:

```
result_quantized_value = result_zero_point +
    (lhs_scale * rhs_scale / result_scale) *
        Sum_over_i(
            (lhs_quantized_value[i] - lhs_zero_point) *
            (rhs_quantized_value[i] - rhs_zero_point)
        )                                                  (5)
```

Equation (5) is the conclusion of this general discussion of how to specify what
"quantized matrix multiplication" should actually compute, in order to be able
to replace real matrix multiplications.
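
As an illustration, here is a naive C++ reference evaluation of equation (5)
for row-major matrices, using floating-point arithmetic for the
`(lhs_scale * rhs_scale / result_scale)` multiplier. The function name and
layout conventions are assumptions made for this sketch, not gemmlowp's API;
the following sections show how to replace the floating-point multiplier by
pure integer arithmetic.

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Naive reference for equation (5): lhs is rows x depth, rhs is
// depth x cols, result is rows x cols, all row-major uint8.
void QuantizedMatMulReference(
    const std::vector<std::uint8_t>& lhs, const std::vector<std::uint8_t>& rhs,
    std::vector<std::uint8_t>* result, int rows, int depth, int cols,
    std::int32_t lhs_zero_point, std::int32_t rhs_zero_point,
    std::int32_t result_zero_point, float lhs_scale, float rhs_scale,
    float result_scale) {
  const float real_multiplier = lhs_scale * rhs_scale / result_scale;
  for (int r = 0; r < rows; r++) {
    for (int c = 0; c < cols; c++) {
      // The inner-most accumulation of equation (5), on int32.
      std::int32_t acc = 0;
      for (int d = 0; d < depth; d++) {
        acc += (static_cast<std::int32_t>(lhs[r * depth + d]) - lhs_zero_point) *
               (static_cast<std::int32_t>(rhs[d * cols + c]) - rhs_zero_point);
      }
      // Scale by the real multiplier, add the result zero point, clamp.
      const std::int32_t q =
          result_zero_point +
          static_cast<std::int32_t>(std::lround(real_multiplier * acc));
      (*result)[r * cols + c] =
          static_cast<std::uint8_t>(std::min(255, std::max(0, q)));
    }
  }
}
```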

## Implementation of quantized matrix multiplication

Having obtained the mathematical form (5) of quantized matrix multiplication, we
now turn to its actual implementation.

The inner-most part of (5),

```
int32_accumulator =
    Sum_over_i(
        (lhs_quantized_value[i] - lhs_zero_point) *
        (rhs_quantized_value[i] - rhs_zero_point)
    )
```

is the "kernel" accumulation loop. It is where the bulk of the computational
cost goes. Luckily, it only involves integers: the quantized operands' matrix
entries, and their `zero_point` quantization parameters. Typically, all of these
values are uint8; the above differences of uint8 values would then be
represented as signed int16, and their products as signed int32.

It is out of the scope of the present document to discuss how to avoid the
overhead of having to subtract these `zero_point` constants in this inner loop;
refer to [this section of
low-precision.md](low-precision.md#efficient-handling-of-offsets) for that. The
gist of it is that a mathematical trick allows us to take the handling of these
`zero_point` constants out of this accumulation loop, so that it simplifies to

```
int32_accumulator =
    Sum_over_i(
      lhs_quantized_value[i] *
      rhs_quantized_value[i]
    )                                                      (6)
```
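
For the curious reader, the trick rests on nothing more than expanding the
product in the original accumulation: the cross terms only involve sums of the
quantized values and constants, so they can be computed outside of the inner
loop:

```
Sum_over_i((lhs_quantized_value[i] - lhs_zero_point) *
           (rhs_quantized_value[i] - rhs_zero_point))
  = Sum_over_i(lhs_quantized_value[i] * rhs_quantized_value[i])
  - lhs_zero_point * Sum_over_i(rhs_quantized_value[i])
  - rhs_zero_point * Sum_over_i(lhs_quantized_value[i])
  + N * lhs_zero_point * rhs_zero_point
```

where `N` is the number of accumulated terms (the depth of the matrix product).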

Anyway, the result is an `int32_accumulator` that we now plug back into the
rest of (5):

```
result_quantized_value = result_zero_point +
    (lhs_scale * rhs_scale / result_scale) * int32_accumulator       (7)
```

The difficulty here is of course that `(lhs_scale * rhs_scale / result_scale)`
is a positive real number, not an integer in general. It is a constant, though.
So what we have to implement here is the (approximate) scaling of an int32 value
by some arbitrary positive constant multiplier.

Moreover, it is safe to assume that this positive constant multiplier is smaller
than one — each of the `scale` values here is typically smaller than one,
as we are typically mapping the `[0..255]` quantized uint8 value range to an
interval of real values that is much narrower than that, typically within
`[-10,10]` in most neural networks. For example, a neural network using Relu6
activation functions will typically have real activation values in the interval
`[0,6]`.
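
In that Relu6 case, quantizing the interval `[0,6]` onto the uint8 range
`[0,255]` gives

```
scale = (6 - 0) / (255 - 0) ≈ 0.0235,    zero_point = 0
```

so each individual `scale` is indeed a small positive constant.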

So how do we implement the multiplication of an int32 value by a positive real
constant that is smaller than one? Typically, by multiplying by a fixed-point
constant multiplier in the normalized interval `[1/2,1)`, and right-shifting
the result to achieve the correct overall multiplier.
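
Here is a sketch of how such a normalized multiplier and shift amount can be
computed, closely following the helper in
[doc/quantization_example.cc](quantization_example.cc). The Q0.31 fixed-point
representation stores a real value `x` in `[1/2,1)` as the int32
`round(x * 2^31)`.

```
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decomposes a real multiplier in (0, 1) into a normalized fixed-point
// multiplier in [1/2, 1), stored as a Q0.31 int32, plus a right shift.
// The effective multiplier is quantized_multiplier / 2^(31 + right_shift).
void QuantizeMultiplierSmallerThanOne(float real_multiplier,
                                      std::int32_t* quantized_multiplier,
                                      int* right_shift) {
  assert(real_multiplier > 0.f);
  assert(real_multiplier < 1.f);
  int s = 0;
  // Bring the multiplier into [1/2, 1) by doubling it, recording how many
  // doublings were needed; the right shift compensates for them.
  while (real_multiplier < 0.5f) {
    real_multiplier *= 2.0f;
    s++;
  }
  // Encode the [1/2, 1) value as a Q0.31 fixed-point number.
  std::int64_t q = static_cast<std::int64_t>(
      std::round(static_cast<double>(real_multiplier) * (1ll << 31)));
  assert(q <= (1ll << 31));
  // If rounding landed exactly on 2^31 (a multiplier indistinguishable
  // from 1), halve it and decrement the shift to compensate.
  if (q == (1ll << 31)) {
    q /= 2;
    s--;
  }
  assert(s >= 0);
  assert(q <= std::numeric_limits<std::int32_t>::max());
  *quantized_multiplier = static_cast<std::int32_t>(q);
  *right_shift = s;
}
```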

At this point we have obtained the int32 value of the product

```
(lhs_scale * rhs_scale / result_scale) * int32_accumulator
```

Looking at (7), it only remains to add to it the integral value
`result_zero_point`, and we are done.
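
Putting the last two steps together, here is an illustrative sketch of the
whole requantization of one accumulator. It uses a plain 64-bit intermediate
and simple rounding shifts, whereas gemmlowp's actual output stage uses its own
saturating, round-to-nearest fixed-point primitives, so treat this only as a
model of the arithmetic.

```
#include <algorithm>
#include <cstdint>

// Applies equation (7) to a single accumulator value, entirely in
// integer arithmetic. quantized_multiplier and right_shift are as
// produced by QuantizeMultiplierSmallerThanOne above.
std::uint8_t RequantizeAccumulator(std::int32_t int32_accumulator,
                                   std::int32_t quantized_multiplier,
                                   int right_shift,
                                   std::int32_t result_zero_point) {
  // Fixed-point multiply: (accumulator * multiplier) / 2^31, with a
  // 64-bit intermediate so the product cannot overflow, and rounding.
  const std::int64_t product =
      static_cast<std::int64_t>(int32_accumulator) * quantized_multiplier;
  std::int32_t scaled =
      static_cast<std::int32_t>((product + (1ll << 30)) >> 31);
  // Rounding right shift by the remaining shift amount.
  if (right_shift > 0) {
    const std::int32_t rounding = 1 << (right_shift - 1);
    scaled = (scaled + rounding) >> right_shift;
  }
  // Add the zero point and saturate to the uint8 range, per equation (7).
  const std::int32_t result = scaled + result_zero_point;
  return static_cast<std::uint8_t>(std::min(255, std::max(0, result)));
}
```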

## How this is implemented in gemmlowp

The different parts of gemmlowp implementing aspects of the above discussion
are:

*   The packing stage (see [packing.md](packing.md)) implements the special
    mathematical trick to handle `lhs_offset`, `rhs_offset` that we alluded to
    above, see [this section of
    low-precision.md](low-precision.md#efficient-handling-of-offsets) for
    details. Thanks to it, the rest of the calculation can proceed as if
    `lhs_offset`, `rhs_offset` were 0.

*   The compute/kernel stage (see [kernel.md](kernel.md)) performs the core
    accumulation loop producing the `int32_accumulator`, see equation (6) above.

*   The unpacking stage feeds into the output pipeline (see
    [output.md](output.md)), which implements the rest of the evaluation of the
    above equation (5), that we discussed in the previous section.

Now, the point of gemmlowp's flexible output-pipelines mechanism (see
[output.md](output.md)) is to support different quantization paradigms, so we
now have to specify which particular flavor of output pipeline corresponds to
the particular quantization paradigm that we detailed above in this document.

The specific output pipeline stage implementing the present quantization
paradigm, i.e. implementing the precise computation detailed in the previous
section (equation (5)), is
`OutputStageQuantizeDownInt32ByFixedPoint`.

Please refer to the comment explaining it in
[public/output_stages.h](../public/output_stages.h).
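
As a rough sketch of how this stage is typically assembled into a pipeline
(see [doc/quantization_example.cc](quantization_example.cc) for the complete,
authoritative version), assuming `quantized_multiplier`, `right_shift` and
`result_zero_point` computed as above:

```
#include <tuple>

#include "public/gemmlowp.h"
#include "public/output_stages.h"

// Builds the output pipeline for the quantization paradigm of this
// document: a fixed-point quantize-down stage followed by a saturating
// cast to uint8.
auto MakeOutputPipeline(std::int32_t quantized_multiplier, int right_shift,
                        std::int32_t result_zero_point) {
  gemmlowp::OutputStageQuantizeDownInt32ByFixedPoint quantize_down_stage;
  quantize_down_stage.result_fixedpoint_multiplier = quantized_multiplier;
  quantize_down_stage.result_shift = right_shift;
  quantize_down_stage.result_offset_after_shift = result_zero_point;
  gemmlowp::OutputStageSaturatingCastToUint8 saturating_cast_stage;
  return std::make_tuple(quantize_down_stage, saturating_cast_stage);
}
```

The resulting tuple is what one passes as the output-pipeline argument to
`GemmWithOutputPipeline`.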

## How this differs from the older legacy gemmlowp quantization paradigm

The difference between the older legacy quantization paradigm described in
[low-precision.md](low-precision.md) and the newer one described in this
document boils down to the difference between the legacy output stage
implementing it, `OutputStageQuantizeDownInt32ToUint8Scale`, and the new output
stage implementing the new paradigm,
`OutputStageQuantizeDownInt32ByFixedPoint`.

Please refer to the comments in
[public/output_stages.h](../public/output_stages.h) for details about these two
output stages and how they differ.

Issues with the old output stage `OutputStageQuantizeDownInt32ToUint8Scale` are:

1.  The int32 accumulators (inputs to the output stage) undergo a plain int32
    multiplication with an int32 multiplier, which may overflow. By contrast, in
    the newer `OutputStageQuantizeDownInt32ByFixedPoint`, this
    integer multiplication becomes a fixed-point multiplication and cannot
    overflow.

    *   In practice, to limit the risk of overflow, this pushes users to choose
        smaller values for this integer multiplier, which means limited
        multiplicative accuracy, which may cause multiplicative bias depending
        on how it is used.

2.  Note how the order of multiplying by the multiplier and adding the
    `result_offset` is swapped. This reflects a quantization equation of the
    form (1) above, as opposed to the form (2)/(3) that the new quantization
    paradigm uses. As a result, it is essentially impossible to guarantee that 0
    is an exactly-representable value, which as discussed above is an issue at
    least in some convolutional neural network applications.

## Example code illustrating the new quantization paradigm

Example code showing how to perform a quantized matrix multiplication in the
quantization paradigm discussed here is in
[doc/quantization_example.cc](quantization_example.cc).