# Building a quantization paradigm from first principles

**TLDR:** If you prefer example code over theory, look at
[doc/quantization_example.cc](quantization_example.cc).

## Overview

gemmlowp allows performing calculations on matrices of uint8 values, but these
matrices are only useful insofar as they somehow approximate matrices of real
numbers. By a _quantization paradigm_ we mean a correspondence between matrices
of quantized 8bit values and matrices of real numbers. The choice of a
quantization paradigm affects the calculations that gemmlowp itself needs to
perform; specifically, it affects how one goes from internal 32bit accumulators
to final 8bit outputs.

The part of gemmlowp transforming internal 32bit accumulators to final
8bit outputs is the "output pipeline" described in [output.md](output.md).

gemmlowp's `GemmWithOutputPipeline` entry point allows specifying an arbitrary
output pipeline, allowing the user to implement their own preferred quantized
arithmetic paradigm.

In the present document, our purpose is to show how, reasoning from first
principles and some domain-specific knowledge of neural networks, we can arrive
naturally at some specific quantization paradigm, and how that can be
implemented using a specific output pipeline.

We also aim to show how that differs from the older, legacy quantization
paradigm implemented by gemmlowp's legacy interfaces, and why the change to the
newer quantization paradigm described in this document was useful as far as
some applications of gemmlowp were concerned.

## Quantization as an affine map.

In order for arithmetic on real values to map directly to arithmetic on
quantized uint8 values, the mapping between real and quantized uint8 values
must be affine, which means it must be of the form

```
real_value = A * quantized_value + B             (1)
```

for some constants A, B, or equivalently, of the form

```
real_value = C * (quantized_value + D)           (2)
```

for some constants C, D. Indeed, anything other than such an affine map would
mean that the result of the quantized calculations would no longer readily
provide an approximation to the result of the real-numbers calculation.
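For completeness, note that forms (1) and (2) are indeed equivalent:

```
real_value = A * quantized_value + B
           = A * (quantized_value + B / A)
```

so one passes from form (1) to form (2) by taking `C = A` and `D = B / A`.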
## Domain-specific constraint: the real value 0 must be exactly representable.

Here a domain-specific constraint from neural networks appears: for some
neural network layers, it is very useful for optimized implementations that
the real value 0 be exactly representable.

For instance, in a Convolutional or Pooling layer with padding, it is useful
to be able to implement the padding by zero-padding the input array, so that
optimized loops do not need to become more complex to avoid overrunning the
array bounds.

In order for such zero-padding to be feasible in a quantized implementation of
such layers, it is important that the real value '0' be exactly representable
in quantized form, i.e. that it correspond exactly to some quantized value,
which we call the _zero-point_.

Indeed, if '0' were not exactly representable, then we would have to use some
quantized value for padding that does not exactly correspond to the real value
'0'. That would typically introduce inaccuracy in the result. In fact, always
using the same such value would be worse: it would introduce _bias_ in the
result.

## The final form of the quantization equation

Now let us phrase what this constraint — that the real value 0 be exactly
representable — means in each of the quantization equations, (1) and (2).

In equation (1), plugging `real_value = 0` and `quantized_value = zero_point`,
we get:

```
0 = A * zero_point + B
```

equivalently:

```
zero_point = -B / A
```

We are thus left with a rather awkward constraint: the real number `-B / A`
must somehow be guaranteed to be exactly integral, so that the special uint8
value `zero_point` can be exactly equal to it. Quite awkward!

Now let us look at equation (2). Plugging `real_value = 0` and
`quantized_value = zero_point`, we get:

```
0 = C * (zero_point + D)
```

Conveniently, the constant `C` plays no role anymore, so this equation
simplifies to:

```
0 = zero_point + D
```

In other words, `D = -zero_point`.
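To make the contrast between the two forms concrete, here is a small made-up
numeric example (the constants are arbitrary):

```
Form (1) with A = 0.05, B = -1.2:   zero_point = -B / A = 24    (integral only by luck)
Form (1) with A = 0.05, B = -1.21:  zero_point = -B / A = 24.2  (not integral!)
Form (2) with C = 0.05, D = -24:    zero_point = -D = 24        (by construction)
```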
This suggests rewriting the quantization equation (2) into the following form
(3), which will be the final form that we will consistently use:

```
real_value = scale * (quantized_value - zero_point)        (3)
```

To go from (2) to (3), we merely renamed `C` to `scale` and `D` to
`-zero_point`.

With this quantization equation (3), the condition that 0 be exactly
representable is automatically satisfied: `zero_point` is by definition one of
the possible `quantized_value`'s, and equation (3) maps it to a `real_value` of
exactly 0.

Note that the final quantization equation (3) depends on two constants, one
integral, the other an arbitrary positive real number:

*   `zero_point` is integral; more specifically, it is one of the possible
    quantized values (i.e. typically a uint8 value).
*   `scale` is a positive real number. Thus at this stage we have not yet shown
    how to eliminate all usage of floating-point arithmetic. That will come
    below.
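To make equation (3) concrete, here is a minimal C++ sketch of corresponding
dequantize/quantize helpers. The names are ours, not part of gemmlowp's API,
and the sketch still uses a floating-point `scale`, per the remark above:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Equation (3): real_value = scale * (quantized_value - zero_point).
float Dequantize(std::uint8_t quantized_value, float scale, int zero_point) {
  return scale * (static_cast<int>(quantized_value) - zero_point);
}

// The inverse map: round to the nearest quantized value and clamp to the
// representable uint8 range.
std::uint8_t Quantize(float real_value, float scale, int zero_point) {
  const int unclamped =
      zero_point + static_cast<int>(std::round(real_value / scale));
  return static_cast<std::uint8_t>(std::min(255, std::max(0, unclamped)));
}
```

By construction, `Quantize(0.f, scale, zero_point)` returns exactly
`zero_point`, so the real value 0 is exactly representable, as required.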
## Quantizing a matrix multiplication

Now that we know — equation (3) — how real numbers are to correspond
to quantized values (typically uint8), we turn to applying this knowledge to
rewriting a multiplication of matrices of real numbers as the equivalent
multiplication of matrices of quantized values.

Say that we have two matrices of real values `lhs_real_matrix`,
`rhs_real_matrix`. Each entry of their product is the sum (accumulation) of
many products of individual matrix entries, say
`lhs_real_value * rhs_real_value`.

Now suppose that we have already quantized these two matrices according to the
above equation (3), with some already-known quantization parameters
`lhs_scale`, `rhs_scale`, `lhs_zero_point`, `rhs_zero_point`, so that their
matrix entries are quantized as

```
lhs_real_value[i] = lhs_scale * (lhs_quantized_value[i] - lhs_zero_point)
rhs_real_value[i] = rhs_scale * (rhs_quantized_value[i] - rhs_zero_point)
```

We then rewrite the matrix product accumulator accordingly:

```
result_real_value
  = Sum_over_i(lhs_real_value[i] * rhs_real_value[i])
  = Sum_over_i(
        lhs_scale * (lhs_quantized_value[i] - lhs_zero_point) *
        rhs_scale * (rhs_quantized_value[i] - rhs_zero_point)
    )
  = lhs_scale * rhs_scale * Sum_over_i(
        (lhs_quantized_value[i] - lhs_zero_point) *
        (rhs_quantized_value[i] - rhs_zero_point)
    )                                                      (4)
```

Now our goal is to represent this result itself as a quantized matrix, i.e.
still according to equation (3), for some pre-established quantization
parameters `result_scale` and `result_zero_point`, as

```
result_real_value = result_scale *
    (result_quantized_value - result_zero_point)
```

Here we need to keep in mind that our goal is to specify what the quantized
matrix multiplication should do, i.e. how to compute `result_quantized_value`.
The last equation above is equivalent to

```
result_quantized_value = result_zero_point +
    result_real_value / result_scale
```

Now we can use equation (4) above to plug into this the expression of
`result_real_value` in terms of the quantized operands, and we obtain:

```
result_quantized_value = result_zero_point +
    (lhs_scale * rhs_scale / result_scale) *
        Sum_over_i(
            (lhs_quantized_value[i] - lhs_zero_point) *
            (rhs_quantized_value[i] - rhs_zero_point)
        )                                                  (5)
```

Equation (5) is the conclusion of this general discussion of how to specify
what "quantized matrix multiplication" should actually compute, in order to be
able to replace real matrix multiplications.
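Before turning to the efficient implementation, here is a naive C++ reference
sketch of equation (5) for a single result entry. This is a hypothetical
helper, not part of gemmlowp, and it deliberately uses floating-point
arithmetic for the final scaling; the next section discusses how a real
implementation avoids that:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive reference for equation (5), computing one entry of the result.
std::uint8_t QuantizedDotProductReference(
    const std::vector<std::uint8_t>& lhs_quantized_values,
    const std::vector<std::uint8_t>& rhs_quantized_values,
    int lhs_zero_point, int rhs_zero_point, int result_zero_point,
    float lhs_scale, float rhs_scale, float result_scale) {
  // The integer-only "kernel" accumulation loop.
  std::int32_t int32_accumulator = 0;
  for (std::size_t i = 0; i < lhs_quantized_values.size(); i++) {
    int32_accumulator += (lhs_quantized_values[i] - lhs_zero_point) *
                         (rhs_quantized_values[i] - rhs_zero_point);
  }
  // Scale back into the result quantized space, per equation (5).
  const float real_multiplier = lhs_scale * rhs_scale / result_scale;
  const int result =
      result_zero_point +
      static_cast<int>(std::round(real_multiplier * int32_accumulator));
  // Clamp to the representable uint8 range.
  return static_cast<std::uint8_t>(std::min(255, std::max(0, result)));
}
```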
## Implementation of quantized matrix multiplication

Having obtained the mathematical form (5) of quantized matrix multiplication,
we now turn to its actual implementation.

The inner-most part of (5),

```
int32_accumulator =
    Sum_over_i(
        (lhs_quantized_value[i] - lhs_zero_point) *
        (rhs_quantized_value[i] - rhs_zero_point)
    )
```

is the "kernel" accumulation loop. It is where the bulk of the computational
cost goes. Luckily, it only involves integers: the quantized operand matrix
entries, and their `zero_point` quantization parameters. Typically, all of
these values are uint8, the above differences of uint8 values would be
represented as signed int16, and their products as signed int32.

It is out of the scope of the present doc to discuss how to avoid the overhead
of having to subtract these `zero_point` constants in this inner loop; refer
to [this section of
low-precision.md](low-precision.md#efficient-handling-of-offsets) for that.
The gist of it is that a mathematical trick allows us to take the handling of
these `zero_point` constants out of this accumulation loop, so that it
simplifies to

```
int32_accumulator =
    Sum_over_i(
        lhs_quantized_value[i] *
        rhs_quantized_value[i]
    )                                                      (6)
```

Anyway, the result is an `int32_accumulator` that we now plug back into the
rest of (5):

```
result_quantized_value = result_zero_point +
    (lhs_scale * rhs_scale / result_scale) * int32_accumulator    (7)
```

The difficulty here is of course that `(lhs_scale * rhs_scale / result_scale)`
is a positive real number, not an integer in general. It is a constant,
though. So what we have to implement here is the (approximate) scaling of an
int32 value by some arbitrary positive constant multiplier.

Moreover, it is safe to assume that this positive constant multiplier is
smaller than one — each of the `scale` values here is typically smaller than
one, as we are typically mapping the `[0..255]` quantized uint8 value range to
an interval of real values that is much narrower than that, typically within
`[-10,10]` in most neural networks. For example, a neural network using Relu6
activation functions will typically have real activation values in the
interval [0,6].

So how do we implement the multiplication of an int32 value by a positive real
constant that is smaller than one? Typically, by multiplying by a fixed-point
constant multiplier in the normalized interval `[1/2,1)`, and right-shifting
the result to achieve the correct overall multiplier.
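Here is a minimal C++ sketch of that decomposition, close in spirit to the
corresponding helper in
[doc/quantization_example.cc](quantization_example.cc):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decompose a real multiplier in (0, 1) as
//   real_multiplier ~= (quantized_multiplier / 2^31) * 2^(-right_shift)
// where quantized_multiplier is an int32 in [2^30, 2^31), i.e. a fixed-point
// representation of a value in the normalized interval [1/2, 1).
void QuantizeMultiplierSmallerThanOne(float real_multiplier,
                                      std::int32_t* quantized_multiplier,
                                      int* right_shift) {
  assert(real_multiplier > 0.f);
  assert(real_multiplier < 1.f);
  int s = 0;
  // Bring the multiplier into [1/2, 1) by doubling it, recording in s how
  // many doublings were needed; they are compensated by a right shift.
  while (real_multiplier < 0.5f) {
    real_multiplier *= 2.0f;
    s++;
  }
  // Convert the value in [1/2, 1) to fixed-point with 31 fractional bits.
  std::int64_t q =
      static_cast<std::int64_t>(std::round(real_multiplier * (1ll << 31)));
  assert(q <= (1ll << 31));
  // Handle the corner case where rounding lands exactly on 2^31.
  if (q == (1ll << 31)) {
    q /= 2;
    s--;
  }
  assert(s >= 0);
  assert(q <= std::numeric_limits<std::int32_t>::max());
  *quantized_multiplier = static_cast<std::int32_t>(q);
  *right_shift = s;
}
```

At runtime, applying the multiplier then amounts to a fixed-point
multiplication of the int32 accumulator by `quantized_multiplier`, followed by
a rounding right shift by `right_shift` bits.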
At this point we have obtained the int32 value of the product

```
(lhs_scale * rhs_scale / result_scale) * int32_accumulator
```

Looking at (7), it only remains to add to it the integral value
`result_zero_point`, and we are done.

## How this is implemented in gemmlowp

The different parts of gemmlowp implementing aspects of the above discussion
are:

*   The packing stage (see [packing.md](packing.md)) implements the special
    mathematical trick to handle `lhs_offset`, `rhs_offset` that we alluded to
    above; see [this section of
    low-precision.md](low-precision.md#efficient-handling-of-offsets) for
    details. Thanks to it, the rest of the calculation can proceed as if
    `lhs_offset`, `rhs_offset` were 0.

*   The compute/kernel stage (see [kernel.md](kernel.md)) performs the core
    accumulation loop producing the `int32_accumulator`; see equation (6)
    above.

*   The unpacking stage feeds into the output pipeline (see
    [output.md](output.md)), which implements the rest of the evaluation of
    the above equation (5), as discussed in the previous section.

Now, the point of gemmlowp's flexible output-pipelines mechanism (see
[output.md](output.md)) is to support different quantization paradigms, so we
now have to specify which particular flavor of output pipeline corresponds to
the particular quantization paradigm that we detailed above in this document.

The specific output pipeline stage implementing the present quantization
paradigm, i.e. implementing the precise computation detailed in the previous
section (equation (5)), is
`OutputStageQuantizeDownInt32ByFixedPoint`.

Please refer to the comment explaining it in
[public/output_stages.h](../public/output_stages.h).
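To connect this back to the API, here is a heavily abridged sketch of setting
up such an output pipeline. It follows the spirit of
[doc/quantization_example.cc](quantization_example.cc); consult that file for
the authoritative, compilable version, as the function and parameter names
below are ours and the include path depends on your setup:

```cpp
#include <cstdint>
#include <tuple>

#include "public/gemmlowp.h"  // Adjust the include path to your setup.

// Multiplies quantized matrices in the quantization paradigm of this
// document. quantized_multiplier and right_shift are assumed to encode
// lhs_scale * rhs_scale / result_scale, as discussed above.
void QuantizedMatMul(
    const gemmlowp::MatrixMap<const std::uint8_t,
                              gemmlowp::MapOrder::RowMajor>& lhs,
    const gemmlowp::MatrixMap<const std::uint8_t,
                              gemmlowp::MapOrder::ColMajor>& rhs,
    gemmlowp::MatrixMap<std::uint8_t, gemmlowp::MapOrder::ColMajor>* result,
    int lhs_zero_point, int rhs_zero_point, int result_zero_point,
    std::int32_t quantized_multiplier, int right_shift) {
  // The fixed-point rescaling stage implementing equation (7).
  gemmlowp::OutputStageQuantizeDownInt32ByFixedPoint quantize_down_stage;
  quantize_down_stage.result_offset_after_shift = result_zero_point;
  quantize_down_stage.result_fixedpoint_multiplier = quantized_multiplier;
  quantize_down_stage.result_shift = right_shift;
  // Saturating cast narrowing the rescaled int32 values down to uint8.
  gemmlowp::OutputStageSaturatingCastToUint8 saturating_cast_stage;
  const auto output_pipeline =
      std::make_tuple(quantize_down_stage, saturating_cast_stage);

  gemmlowp::GemmContext context;
  // gemmlowp's offsets are *added* to matrix entries, so they are the
  // opposites of the zero_points.
  gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::uint8_t,
                                   gemmlowp::DefaultL8R8BitDepthParams>(
      &context, lhs, rhs, result, -lhs_zero_point, -rhs_zero_point,
      output_pipeline);
}
```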
## How this differs from the older legacy gemmlowp quantization paradigm

The difference between the older legacy quantization paradigm described in
[low-precision.md](low-precision.md) and the newer one described in this
document boils down to the difference between the legacy output stage
implementing it, `OutputStageQuantizeDownInt32ToUint8Scale`, and the new
output stage implementing the new paradigm,
`OutputStageQuantizeDownInt32ByFixedPoint`.

Please refer to the comments in
[public/output_stages.h](../public/output_stages.h) for details about these
two output stages and how they differ.

The issues with the old output stage
`OutputStageQuantizeDownInt32ToUint8Scale` are:

1.  The int32 accumulators (inputs to the output stage) undergo a plain int32
    multiplication with an int32 multiplier, which may overflow. By contrast,
    in the newer `OutputStageQuantizeDownInt32ByFixedPoint`, this integer
    multiplication becomes a fixed-point multiplication and cannot overflow.

    *   In practice, to limit the risk of overflow, this pushes users to
        choose smaller values for this integer multiplier, which means limited
        multiplicative accuracy, which may cause multiplicative bias depending
        on how it is used.

2.  Note how the order of multiplying by the multiplier and adding the
    `result_offset` is swapped. This reflects a quantization equation of the
    form (1) above, as opposed to the form (2)/(3) that the new quantization
    paradigm uses. As a result, it is essentially impossible to guarantee that
    0 is an exactly-representable value, which, as discussed above, is an
    issue in at least some convolutional neural network applications.

## Example code illustrating the new quantization paradigm

Example code showing how to perform a quantized matrix multiplication in the
quantization paradigm discussed here is in
[doc/quantization_example.cc](quantization_example.cc).