# Building a quantization paradigm from first principles

**TLDR:** If you prefer example code over theory, look at
[doc/quantization_example.cc](quantization_example.cc).

## Overview

gemmlowp allows performing calculations on matrices of uint8 values, but these
matrices are only useful insofar as they somehow approximate matrices of real
numbers. By a _quantization paradigm_ we mean a correspondence between matrices
of quantized 8bit values and matrices of real numbers. The choice of a
quantization paradigm affects the calculations that gemmlowp itself needs to
perform; specifically, it affects how one goes from internal 32bit accumulators
to final 8bit outputs.

The part of gemmlowp transforming internal 32bit accumulators to final
8bit outputs is the "output pipeline" described in [output.md](output.md).

gemmlowp's `GemmWithOutputPipeline` entry point allows specifying an arbitrary
output pipeline, allowing the user to implement their own preferred quantized
arithmetic paradigm.

In the present document, our purpose is to show how, reasoning from first
principles and some domain-specific knowledge of neural networks, we can arrive
naturally at some specific quantization paradigm, and how that can be
implemented using a specific output pipeline.

We also aim to show how that differs from the older, legacy quantization
paradigm implemented by gemmlowp's legacy interfaces, and why the change to the
newer quantization paradigm described in this document was useful as far as
some applications of gemmlowp were concerned.

## Quantization as an affine map.

In order for arithmetic on real values to map directly to arithmetic on
quantized uint8 values, the mapping between real and quantized uint8 values
must be affine, which means it must be of the form

```
real_value = A * quantized_value + B             (1)
```

for some constants A, B, or equivalently, of the form

```
real_value = C * (quantized_value + D)           (2)
```

for some constants C, D. Indeed, anything other than such an affine map would
mean that the result of the quantized calculations would no longer readily
provide an approximation to the result of the real-numbers calculation.
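For completeness, note that forms (1) and (2) are indeed equivalent:

```
real_value = A * quantized_value + B
           = A * (quantized_value + B / A)
```

so one passes from form (1) to form (2) by taking `C = A` and `D = B / A`.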
## Domain-specific constraint: the real value 0 must be exactly representable.

Here a domain-specific constraint from neural networks appears: for some
neural network layers, it is very useful for optimized implementations that
the real value 0 be exactly representable.

For instance, in a Convolutional or Pooling layer with padding, it is useful
to be able to implement the padding by zero-padding the input array, so that
optimized loops do not need to become more complex to avoid overrunning the
array bounds.

In order for such zero-padding to be feasible in a quantized implementation of
such layers, it is important that the real value '0' be exactly representable
in quantized form, i.e. that it correspond exactly to some quantized value,
which we call the _zero-point_.

Indeed, if '0' were not exactly representable, then we would have to use some
quantized value for padding that does not exactly correspond to the real value
'0'. That would typically introduce inaccuracy in the result. In fact, always
using the same such value would be worse: it would introduce _bias_ in the
result.

## The final form of the quantization equation

Now let us phrase what this constraint — that the real value 0 be exactly
representable — means in each of the quantization equations, (1) and (2).

In equation (1), plugging `real_value = 0` and `quantized_value = zero_point`,
we get:

```
0 = A * zero_point + B
```

equivalently:

```
zero_point = -B / A
```

We are thus left with a rather awkward constraint: the real number `-B / A`
must somehow be guaranteed to be exactly integral, so that the special uint8
value `zero_point` can be exactly equal to it. Quite awkward!

Now let us look at equation (2). Plugging `real_value = 0` and
`quantized_value = zero_point`, we get:

```
0 = C * (zero_point + D)
```

Conveniently, the constant `C` plays no role anymore, so this equation
simplifies to:

```
0 = zero_point + D
```

In other words, `D = -zero_point`.
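To make the contrast between the two forms concrete, here is a small made-up
numeric example (the constants are arbitrary):

```
Form (1) with A = 0.05, B = -1.2:   zero_point = -B / A = 24    (integral only by luck)
Form (1) with A = 0.05, B = -1.21:  zero_point = -B / A = 24.2  (not integral!)
Form (2) with C = 0.05, D = -24:    zero_point = -D = 24        (by construction)
```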
This suggests rewriting the quantization equation (2) into the following form
(3), which will be the final form that we will consistently use:

```
real_value = scale * (quantized_value - zero_point)        (3)
```

To go from (2) to (3), we merely renamed `C` to `scale` and `D` to
`-zero_point`.

With this quantization equation (3), the condition that 0 be exactly
representable is automatically satisfied: `zero_point` is by definition one of
the possible `quantized_value`'s, and equation (3) maps it to a `real_value` of
exactly 0.

Note that the final quantization equation (3) depends on two constants, one
integral, the other an arbitrary positive real number:

*   `zero_point` is integral; more specifically, it is one of the possible
    quantized values (i.e. typically a uint8 value).
*   `scale` is a positive real number. Thus at this stage we have not yet shown
    how to eliminate all usage of floating-point arithmetic. That will come
    below.
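To make equation (3) concrete, here is a minimal C++ sketch of corresponding
dequantize/quantize helpers. The names are ours, not part of gemmlowp's API,
and the sketch still uses a floating-point `scale`, per the remark above:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Equation (3): real_value = scale * (quantized_value - zero_point).
float Dequantize(std::uint8_t quantized_value, float scale, int zero_point) {
  return scale * (static_cast<int>(quantized_value) - zero_point);
}

// The inverse map: round to the nearest quantized value and clamp to the
// representable uint8 range.
std::uint8_t Quantize(float real_value, float scale, int zero_point) {
  const int unclamped =
      zero_point + static_cast<int>(std::round(real_value / scale));
  return static_cast<std::uint8_t>(std::min(255, std::max(0, unclamped)));
}
```

By construction, `Quantize(0.f, scale, zero_point)` returns exactly
`zero_point`, so the real value 0 is exactly representable, as required.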
## Quantizing a matrix multiplication

Now that we know — equation (3) — how real numbers are to correspond
to quantized values (typically uint8), we turn to applying this knowledge to
rewriting a multiplication of matrices of real numbers as the equivalent
multiplication of matrices of quantized values.

Say that we have two matrices of real values `lhs_real_matrix`,
`rhs_real_matrix`. Each entry of their product is the sum (accumulation) of
many products of individual matrix entries, say
`lhs_real_value * rhs_real_value`.

Now suppose that we have already quantized these two matrices according to the
above equation (3), with some already-known quantization parameters
`lhs_scale`, `rhs_scale`, `lhs_zero_point`, `rhs_zero_point`, so that their
matrix entries are quantized as

```
lhs_real_value[i] = lhs_scale * (lhs_quantized_value[i] - lhs_zero_point)
rhs_real_value[i] = rhs_scale * (rhs_quantized_value[i] - rhs_zero_point)
```

We then rewrite the matrix product accumulator accordingly:

```
result_real_value
  = Sum_over_i(lhs_real_value[i] * rhs_real_value[i])
  = Sum_over_i(
        lhs_scale * (lhs_quantized_value[i] - lhs_zero_point) *
        rhs_scale * (rhs_quantized_value[i] - rhs_zero_point)
    )
  = lhs_scale * rhs_scale * Sum_over_i(
        (lhs_quantized_value[i] - lhs_zero_point) *
        (rhs_quantized_value[i] - rhs_zero_point)
    )                                                      (4)
```

Now our goal is to represent this result itself as a quantized matrix, i.e.
still according to equation (3), for some pre-established quantization
parameters `result_scale` and `result_zero_point`, as

```
result_real_value = result_scale *
    (result_quantized_value - result_zero_point)
```

Here we need to keep in mind that our goal is to specify what the quantized
matrix multiplication should do, i.e. how to compute `result_quantized_value`.
The last equation above is equivalent to

```
result_quantized_value = result_zero_point +
    result_real_value / result_scale
```

Now we can use equation (4) above to plug into this the expression of
`result_real_value` in terms of the quantized operands, and we obtain:

```
result_quantized_value = result_zero_point +
    (lhs_scale * rhs_scale / result_scale) *
        Sum_over_i(
            (lhs_quantized_value[i] - lhs_zero_point) *
            (rhs_quantized_value[i] - rhs_zero_point)
        )                                                  (5)
```

Equation (5) is the conclusion of this general discussion of how to specify
what "quantized matrix multiplication" should actually compute, in order to be
able to replace real matrix multiplications.
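Before turning to the efficient implementation, here is a naive C++ reference
sketch of equation (5) for a single result entry. This is a hypothetical
helper, not part of gemmlowp, and it deliberately uses floating-point
arithmetic for the final scaling; the next section discusses how a real
implementation avoids that:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive reference for equation (5), computing one entry of the result.
std::uint8_t QuantizedDotProductReference(
    const std::vector<std::uint8_t>& lhs_quantized_values,
    const std::vector<std::uint8_t>& rhs_quantized_values,
    int lhs_zero_point, int rhs_zero_point, int result_zero_point,
    float lhs_scale, float rhs_scale, float result_scale) {
  // The integer-only "kernel" accumulation loop.
  std::int32_t int32_accumulator = 0;
  for (std::size_t i = 0; i < lhs_quantized_values.size(); i++) {
    int32_accumulator += (lhs_quantized_values[i] - lhs_zero_point) *
                         (rhs_quantized_values[i] - rhs_zero_point);
  }
  // Scale back into the result quantized space, per equation (5).
  const float real_multiplier = lhs_scale * rhs_scale / result_scale;
  const int result =
      result_zero_point +
      static_cast<int>(std::round(real_multiplier * int32_accumulator));
  // Clamp to the representable uint8 range.
  return static_cast<std::uint8_t>(std::min(255, std::max(0, result)));
}
```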
## Implementation of quantized matrix multiplication

Having obtained the mathematical form (5) of quantized matrix multiplication,
we now turn to its actual implementation.

The inner-most part of (5),

```
int32_accumulator =
    Sum_over_i(
        (lhs_quantized_value[i] - lhs_zero_point) *
        (rhs_quantized_value[i] - rhs_zero_point)
    )
```

is the "kernel" accumulation loop. It is where the bulk of the computational
cost goes. Luckily, it only involves integers: the quantized operand matrix
entries, and their `zero_point` quantization parameters. Typically, all of
these values are uint8, the above differences of uint8 values would be
represented as signed int16, and their products as signed int32.

It is out of the scope of the present doc to discuss how to avoid the overhead
of having to subtract these `zero_point` constants in this inner loop; refer
to [this section of
low-precision.md](low-precision.md#efficient-handling-of-offsets) for that.
The gist of it is that a mathematical trick allows us to take the handling of
these `zero_point` constants out of this accumulation loop, so that it
simplifies to

```
int32_accumulator =
    Sum_over_i(
        lhs_quantized_value[i] *
        rhs_quantized_value[i]
    )                                                      (6)
```

Anyway, the result is an `int32_accumulator` that we now plug back into the
rest of (5):

```
result_quantized_value = result_zero_point +
    (lhs_scale * rhs_scale / result_scale) * int32_accumulator    (7)
```

The difficulty here is of course that `(lhs_scale * rhs_scale / result_scale)`
is a positive real number, not an integer in general. It is a constant,
though. So what we have to implement here is the (approximate) scaling of an
int32 value by some arbitrary positive constant multiplier.

Moreover, it is safe to assume that this positive constant multiplier is
smaller than one — each of the `scale` values here is typically smaller than
one, as we are typically mapping the `[0..255]` quantized uint8 value range to
an interval of real values that is much narrower than that, typically within
`[-10,10]` in most neural networks. For example, a neural network using Relu6
activation functions will typically have real activation values in the
interval [0,6].

So how do we implement the multiplication of an int32 value by a positive real
constant that is smaller than one? Typically, by multiplying by a fixed-point
constant multiplier in the normalized interval `[1/2,1)`, and right-shifting
the result to achieve the correct overall multiplier.
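Here is a minimal C++ sketch of that decomposition, close in spirit to the
corresponding helper in
[doc/quantization_example.cc](quantization_example.cc):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decompose a real multiplier in (0, 1) as
//   real_multiplier ~= (quantized_multiplier / 2^31) * 2^(-right_shift)
// where quantized_multiplier is an int32 in [2^30, 2^31), i.e. a fixed-point
// representation of a value in the normalized interval [1/2, 1).
void QuantizeMultiplierSmallerThanOne(float real_multiplier,
                                      std::int32_t* quantized_multiplier,
                                      int* right_shift) {
  assert(real_multiplier > 0.f);
  assert(real_multiplier < 1.f);
  int s = 0;
  // Bring the multiplier into [1/2, 1) by doubling it, recording in s how
  // many doublings were needed; they are compensated by a right shift.
  while (real_multiplier < 0.5f) {
    real_multiplier *= 2.0f;
    s++;
  }
  // Convert the value in [1/2, 1) to fixed-point with 31 fractional bits.
  std::int64_t q =
      static_cast<std::int64_t>(std::round(real_multiplier * (1ll << 31)));
  assert(q <= (1ll << 31));
  // Handle the corner case where rounding lands exactly on 2^31.
  if (q == (1ll << 31)) {
    q /= 2;
    s--;
  }
  assert(s >= 0);
  assert(q <= std::numeric_limits<std::int32_t>::max());
  *quantized_multiplier = static_cast<std::int32_t>(q);
  *right_shift = s;
}
```

At runtime, applying the multiplier then amounts to a fixed-point
multiplication of the int32 accumulator by `quantized_multiplier`, followed by
a rounding right shift by `right_shift` bits.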
At this point we have obtained the int32 value of the product

```
(lhs_scale * rhs_scale / result_scale) * int32_accumulator
```

Looking at (7), it only remains to add to it the integral value
`result_zero_point`, and we are done.

## How this is implemented in gemmlowp

The different parts of gemmlowp implementing aspects of the above discussion
are:

*   The packing stage (see [packing.md](packing.md)) implements the special
    mathematical trick to handle `lhs_offset`, `rhs_offset` that we alluded to
    above; see [this section of
    low-precision.md](low-precision.md#efficient-handling-of-offsets) for
    details. Thanks to it, the rest of the calculation can proceed as if
    `lhs_offset`, `rhs_offset` were 0.

*   The compute/kernel stage (see [kernel.md](kernel.md)) performs the core
    accumulation loop producing the `int32_accumulator`; see equation (6)
    above.

*   The unpacking stage feeds into the output pipeline (see
    [output.md](output.md)), which implements the rest of the evaluation of
    the above equation (5), as discussed in the previous section.

Now, the point of gemmlowp's flexible output-pipelines mechanism (see
[output.md](output.md)) is to support different quantization paradigms, so we
now have to specify which particular flavor of output pipeline corresponds to
the particular quantization paradigm that we detailed above in this document.

The specific output pipeline stage implementing the present quantization
paradigm, i.e. implementing the precise computation detailed in the previous
section (equation (5)), is
`OutputStageQuantizeDownInt32ByFixedPoint`.

Please refer to the comment explaining it in
[public/output_stages.h](../public/output_stages.h).
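To connect this back to the API, here is a heavily abridged sketch of setting
up such an output pipeline. It follows the spirit of
[doc/quantization_example.cc](quantization_example.cc); consult that file for
the authoritative, compilable version, as the function and parameter names
below are ours and the include path depends on your setup:

```cpp
#include <cstdint>
#include <tuple>

#include "public/gemmlowp.h"  // Adjust the include path to your setup.

// Multiplies quantized matrices in the quantization paradigm of this
// document. quantized_multiplier and right_shift are assumed to encode
// lhs_scale * rhs_scale / result_scale, as discussed above.
void QuantizedMatMul(
    const gemmlowp::MatrixMap<const std::uint8_t,
                              gemmlowp::MapOrder::RowMajor>& lhs,
    const gemmlowp::MatrixMap<const std::uint8_t,
                              gemmlowp::MapOrder::ColMajor>& rhs,
    gemmlowp::MatrixMap<std::uint8_t, gemmlowp::MapOrder::ColMajor>* result,
    int lhs_zero_point, int rhs_zero_point, int result_zero_point,
    std::int32_t quantized_multiplier, int right_shift) {
  // The fixed-point rescaling stage implementing equation (7).
  gemmlowp::OutputStageQuantizeDownInt32ByFixedPoint quantize_down_stage;
  quantize_down_stage.result_offset_after_shift = result_zero_point;
  quantize_down_stage.result_fixedpoint_multiplier = quantized_multiplier;
  quantize_down_stage.result_shift = right_shift;
  // Saturating cast narrowing the rescaled int32 values down to uint8.
  gemmlowp::OutputStageSaturatingCastToUint8 saturating_cast_stage;
  const auto output_pipeline =
      std::make_tuple(quantize_down_stage, saturating_cast_stage);

  gemmlowp::GemmContext context;
  // gemmlowp's offsets are *added* to matrix entries, so they are the
  // opposites of the zero_points.
  gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::uint8_t,
                                   gemmlowp::DefaultL8R8BitDepthParams>(
      &context, lhs, rhs, result, -lhs_zero_point, -rhs_zero_point,
      output_pipeline);
}
```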
## How this differs from the older legacy gemmlowp quantization paradigm

The difference between the older legacy quantization paradigm described in
[low-precision.md](low-precision.md) and the newer one described in this
document boils down to the difference between the legacy output stage
implementing it, `OutputStageQuantizeDownInt32ToUint8Scale`, and the new
output stage implementing the new paradigm,
`OutputStageQuantizeDownInt32ByFixedPoint`.

Please refer to the comments in
[public/output_stages.h](../public/output_stages.h) for details about these
two output stages and how they differ.

The issues with the old output stage
`OutputStageQuantizeDownInt32ToUint8Scale` are:

1.  The int32 accumulators (inputs to the output stage) undergo a plain int32
    multiplication with an int32 multiplier, which may overflow. By contrast,
    in the newer `OutputStageQuantizeDownInt32ByFixedPoint`, this integer
    multiplication becomes a fixed-point multiplication and cannot overflow.

    *   In practice, to limit the risk of overflow, this pushes users to
        choose smaller values for this integer multiplier, which means limited
        multiplicative accuracy, which may cause multiplicative bias depending
        on how it is used.

2.  Note how the order of multiplying by the multiplier and adding the
    `result_offset` is swapped. This reflects a quantization equation of the
    form (1) above, as opposed to the form (2)/(3) that the new quantization
    paradigm uses. As a result, it is essentially impossible to guarantee that
    0 is an exactly-representable value, which, as discussed above, is an
    issue in at least some convolutional neural network applications.

## Example code illustrating the new quantization paradigm

Example code showing how to perform a quantized matrix multiplication in the
quantization paradigm discussed here is in
[doc/quantization_example.cc](quantization_example.cc).