The quantized folder holds the implementation of the low-level quantized kernels.
The kernels are registered in the `torch::_ops` namespace and operate on the quantized `at::Tensor` data type.
You can learn more about quantized tensors on the [quantized tensor API wiki](https://github.com/pytorch/pytorch/wiki/Introducing-Quantized-Tensor) page.

This document serves as an entry point for quantized kernel implementation.

## Implementing native quantized ops

New quantized ops are almost always located under the `ATen/native/quantized/cpu` folder. For
the sake of an example, let us implement an element-wise quantized [logical XAND](https://en.wiktionary.org/wiki/XAND)
operation under `ATen/native/quantized/cpu/qxand.cpp`.

### Step 0. Implement the quantized function

Before writing the quantized kernel and registering it, let us implement a quantized function;
it will serve as a reference for the rest of the discussion.
The snippet below shows the implementation of a quantized XAND operator, with support for all implemented quantized types.

```c++
Tensor quantized_xand(Tensor qa, Tensor qb) {
  // Some type checks for qa and qb should be here...
  Tensor qc;
  double scale = qa.q_scale();
  int64_t zero_point = qa.q_zero_point();

  AT_DISPATCH_QINT_TYPES(qa.scalar_type(), "quantized_xand", [&]() {
    // Allocate the output with the same quantization parameters as `qa`,
    // then iterate over the inputs element-wise.
    qc = at::_empty_affine_quantized(
        qa.sizes(), at::device(kCPU).dtype(SCALAR_TYPE), scale, zero_point);
    auto iter = TensorIterator::binary_op(qc, qa, qb);
    cpu_kernel(iter, [&](scalar_t a_value, scalar_t b_value) -> scalar_t {
      return scalar_t(a_value.val_ & b_value.val_);
    });
  });
  return qc;
}
```

The code above is fairly straightforward:
it takes two quantized tensors `qa` and `qb`, and uses `cpu_kernel` over a
[`TensorIterator`](https://caffe2.ai/doxygen-c/html/structat_1_1_tensor_iterator.html) to produce a quantized tensor `qc`.
The only part that requires explicit explanation is `AT_DISPATCH_QINT_TYPES`.
This macro makes sure that the underlying code works with all quantized types.
It provides several useful "aliases":

- `SCALAR_TYPE` -- `ScalarType` of the quantized tensor (e.g. `kQInt8`)
- `scalar_t` -- quantized data type (dtype, e.g. `qint8`)
- `underlying_t` -- underlying POD data type (dtype, e.g. `int8_t`)

The macro takes three arguments:

1. Quantized data type. This defines what the "aliases" are.
In the example above, the resulting tensor will have the same data type as `qa.scalar_type()`.
2. Function name. This argument is currently used only for error reporting.
3. Implementation lambda. The main implementation should sit in the body of this lambda.
It should also use the aliases for the quantized data types instead of the explicit data types.

### Step 1. Define the schema

Update `aten/src/ATen/native/quantized/library.cpp` and add
a `def` for your new operator:

```c++
TORCH_LIBRARY(quantized, m) {
  // ... the existing definitions ...
  m.def("quantized::xand(Tensor qa, Tensor qb) -> Tensor");
}
```

`def` takes a **function schema string** that describes the usage of the op.
In the example above the schema is `"quantized::xand(Tensor qa, Tensor qb) -> Tensor"`.
This translates to a `torch._ops.ops.quantized.xand` function in Python with the corresponding signature.
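
As a quick illustration of that last point, here is a minimal sketch of what the defined schema looks like from the Python side. It assumes the hypothetical `quantized::xand` operator from this example has actually been built into PyTorch; it is not a real op.

```python
# Minimal sketch: once the schema above has been defined (and PyTorch rebuilt
# with it), the op is reachable through the torch.ops namespace.
import torch

# torch.ops is the same object as torch._ops.ops
print(torch.ops.quantized.xand)  # overload packet for "quantized::xand"
```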
### Step 2. Register the implementation

The registration is done using `TORCH_LIBRARY_IMPL`:

```c++
TORCH_LIBRARY_IMPL(quantized, QuantizedCPU, m) {
  m.impl("xand", TORCH_FN(quantized_xand));
}
```

### Step 2b. [Optional] Registering the operation with the `native_functions.yaml`

In some cases, if the signature of the quantized function and its non-quantized counterpart are the same, it is worth adding it to `ATen/native/native_functions.yaml`.
A detailed explanation of this file can be found [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md).

**If adding a new entry to the `native_functions.yaml`:**

```yaml
- func: quantized_xand(Tensor qa, Tensor qb) -> Tensor
  dispatch:
    QuantizedCPU: quantized_xand
```

**If adding to an existing entry in the `native_functions.yaml`:**

If you find an entry in the YAML file and would like to add a quantized kernel to it, you can just add a new dispatch entry.
For example, let's assume there already was an `xand` function in the YAML file.
In that case, the modification would look like this:

```yaml
- func: xand(Tensor a, Tensor b) -> Tensor
  dispatch:
    CPU: _xand_cpu   # Assume this existed
    CUDA: _xand_cuda # Assume this existed
    QuantizedCPU: quantized_xand
```

### Putting it all together

The final file `ATen/native/quantized/cpu/qxand.cpp` would look as follows:

```c++
#include <ATen/ATen.h>
#include <ATen/NativeFunctions.h> // Needed for the `native_functions.yaml` route
#include <ATen/core/Type.h>
#include <torch/library.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cpu/Loops.h>

namespace at {
  namespace native {
    Tensor quantized_xand(Tensor qa, Tensor qb) {
      // The awesome op implementation...
      return qc;
    }

    TORCH_LIBRARY_IMPL(quantized, QuantizedCPU, m) {
      m.impl("xand", TORCH_FN(quantized_xand));
    }
}}  // namespace at::native
```

### Step 3. Administrative stuff

Before the op can be used, it needs to be compiled.
If the op is placed under `native/quantized/cpu`, this is already done for you.
However, if the location is changed, two files must be updated:

- *`caffe2/aten/TARGETS`* -- You can follow the existing entries and add your path somewhere in that file. Notice that this file already lists the path to the quantized source files:
```bash
ATEN_NATIVE_CPP = glob([
#...
  "src/ATen/native/quantized/**/*.cpp",
])
```

- *`caffe2/aten/src/ATen/CMakeLists.txt`* -- Again, following the existing entries, you must add your paths.
The current quantization paths are added as

```bash
FILE(GLOB native_quantized_cpp
          "native/quantized/*.cpp"
          "native/quantized/cpu/*.cpp")
```

## Using quantized ops

### Python

Usage in Python is straightforward.
To implement a Python quantized function using our kernel, you can do the following:

```python
from torch._ops import ops

def quantized_xand(qa, qb):
    # Notice the schema name `quantized::xand` becomes `quantized.xand` here
    return ops.quantized.xand(qa, qb)
```

**Note:** If writing new PyTorch functions that use quantized kernels,
it is strongly encouraged to place them in `torch/ao/nn/quantized/functional.py`.
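
For completeness, here is a minimal usage sketch of the wrapper above. It assumes the `quantized_xand` wrapper just defined is in scope, and that the hypothetical `quantized::xand` kernel from this document has actually been compiled into PyTorch.

```python
# Usage sketch for the (hypothetical) quantized XAND wrapper defined above.
import torch

a, b = torch.rand(2, 3), torch.rand(2, 3)
# Quantize the inputs; the scale/zero_point values here are arbitrary.
qa = torch.quantize_per_tensor(a, scale=0.1, zero_point=10, dtype=torch.quint8)
qb = torch.quantize_per_tensor(b, scale=0.1, zero_point=10, dtype=torch.quint8)

qc = quantized_xand(qa, qb)  # calls ops.quantized.xand under the hood
print(qc.int_repr())         # underlying integer values of the quantized result
```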
### C++

You should not need to use the registered kernels in C++.
Although it is **officially not supported**, you can use the following:

```c++
Tensor quantized_xand(Tensor qa, Tensor qb) {
  // Look the operator up by its schema name and call it through the dispatcher.
  static const auto op = c10::Dispatcher::singleton()
      .findSchemaOrThrow("quantized::xand", "")
      .typed<Tensor(Tensor, Tensor)>();
  return op.call(qa, qb);
}
```
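
To close the loop on the running example, the sketch below shows how the kernel's behavior could be sanity-checked from Python against a plain integer reference. Everything here assumes the hypothetical `quantized::xand` kernel described in this document has actually been compiled in; it is not a real operator.

```python
# Sanity-check sketch for the hypothetical quantized::xand kernel.
import torch

qa = torch.quantize_per_tensor(torch.rand(8), scale=0.1, zero_point=10, dtype=torch.quint8)
qb = torch.quantize_per_tensor(torch.rand(8), scale=0.1, zero_point=10, dtype=torch.quint8)

qc = torch.ops.quantized.xand(qa, qb)

# The example kernel computes a bitwise AND on the underlying integer values,
# so the result's int_repr should match the AND of the inputs' int_reprs.
expected = qa.int_repr() & qb.int_repr()
assert torch.equal(qc.int_repr(), expected)

# The example kernel also reuses qa's quantization parameters for the output.
assert qc.q_scale() == qa.q_scale() and qc.q_zero_point() == qa.q_zero_point()
```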