# XNNPACK Backend

[XNNPACK](https://github.com/google/XNNPACK) is a library of optimized neural network operators for ARM and x86 CPU platforms. Our delegate lowers models to run using these highly optimized CPU operators. You can try out lowering and running the example models in this directory. Please refer to the following docs for more information on the XNNPACK delegate:
- [XNNPACK Backend Delegate Overview](https://pytorch.org/executorch/stable/native-delegates-executorch-xnnpack-delegate.html)
- [XNNPACK Delegate Export Tutorial](https://pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html)


## Directory structure

```bash
examples/xnnpack
├── quantization          # Scripts to illustrate PyTorch 2 Export Quantization workflow with XNNPACKQuantizer
│   └── example.py
├── aot_compiler.py       # The main script to illustrate the full AOT (export, quantization, delegation) workflow with XNNPACK delegate
└── README.md             # This file
```

## Delegating a Floating-point Model

The following command will produce a floating-point XNNPACK delegated model `mv2_xnnpack_fp32.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing which parts of the model have been lowered to XNNPACK via `executorch_call_delegate`.

```bash
# For MobileNet V2
python3 -m examples.xnnpack.aot_compiler --model_name="mv2" --delegate
```

Once we have the model binary (`.pte`) file, we can run it with the ExecuTorch runtime using the `xnn_executor_runner`. With CMake, first configure the build as follows:

```bash
# cd to the root of executorch repo
cd executorch

# Get a clean cmake-out directory
rm -rf cmake-out
mkdir cmake-out

# Configure cmake
cmake \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_ENABLE_LOGGING=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-out .
```

Then you can build the runtime components with:

```bash
cmake --build cmake-out -j9 --target install --config Release
```

Finally, you should be able to run the model with the following command:

```bash
./cmake-out/backends/xnnpack/xnn_executor_runner --model_path ./mv2_xnnpack_fp32.pte
```

## Quantization
If you are not familiar with it already, first learn about the generic PyTorch 2 Export Quantization workflow in the [Quantization Flow Docs](https://pytorch.org/executorch/stable/quantization-overview.html).

Here we will discuss quantizing a model suitable for XNNPACK delegation using XNNPACKQuantizer.

Though it is typical to run this quantized model via the XNNPACK delegate, we want to highlight that this is just another quantization flavor: the quantized model can also be run without the XNNPACK delegate, using only the standard quantized operators.

A shared library that registers the out variants of the quantized operators (e.g., `quantized_decomposed::add.out`) into EXIR is required. With CMake, follow the instructions in `test_quantize.sh` to build it; the default path is `cmake-out/kernels/quantized/libquantized_ops_lib.so`.
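Under the hood, `quantization/example.py` follows the standard PT2E quantization flow with `XNNPACKQuantizer`. The snippet below is a minimal sketch of that flow for MobileNet V2, not the exact contents of the script; the import paths and export API shown are assumptions that have moved between PyTorch/ExecuTorch releases (for example, `XNNPACKQuantizer` has also lived under `executorch.backends.xnnpack.quantizer`, and older releases used `capture_pre_autograd_graph` instead of `export_for_training`):

```python
import torch
import torchvision.models as models

# Assumed import paths; these may differ depending on your PyTorch/ExecuTorch version.
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Capture the model with the PyTorch 2 export API (newer releases; older ones
# used capture_pre_autograd_graph), then annotate it with the XNNPACK quantizer.
exported = torch.export.export_for_training(model, sample_inputs).module()

quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

prepared = prepare_pt2e(exported, quantizer)
prepared(*sample_inputs)            # calibrate with representative inputs
quantized = convert_pt2e(prepared)  # quantized GraphModule, ready to export
```

The resulting quantized module is then exported and serialized to a `.pte` file in the same way as the floating-point model.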
Then you can generate an XNNPACK quantized model with the following command by passing the path to the shared library into the script `quantization/example.py`:
```bash
python3 -m examples.xnnpack.quantization.example --model_name "mv2" --so_library "<path/to/so/lib>" # for MobileNetv2

# This should generate the ./mv2_quantized.pte file, if successful.
```
You can find more valid quantized example models by running:
```bash
python3 -m examples.xnnpack.quantization.example --help
```

## Running the XNNPACK Model with CMake
After exporting the XNNPACK delegated model, we can now try running it with example inputs using CMake. We can build and use the `xnn_executor_runner`, which is a sample wrapper for the ExecuTorch Runtime and XNNPACK Backend. We begin by configuring the CMake build as follows:
```bash
# cd to the root of executorch repo
cd executorch

# Get a clean cmake-out directory
rm -rf cmake-out
mkdir cmake-out

# Configure cmake
cmake \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_ENABLE_LOGGING=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-out .
```
Then you can build the runtime components with:

```bash
cmake --build cmake-out -j9 --target install --config Release
```

Now you should be able to find the executable built at `./cmake-out/backends/xnnpack/xnn_executor_runner`. You can run it with the model you generated as follows:
```bash
./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./mv2_quantized.pte
```

## Delegating a Quantized Model

The following command will produce an XNNPACK quantized and delegated model `mv2_xnnpack_q8.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing which parts of the model have been lowered to XNNPACK via `executorch_call_delegate`.

```bash
python3 -m examples.xnnpack.aot_compiler --model_name "mv2" --quantize --delegate
```
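
For reference, the delegation step in `aot_compiler.py` boils down to partitioning the exported graph with `XnnpackPartitioner` and lowering the supported subgraphs to the XNNPACK delegate. The snippet below is a minimal sketch of the floating-point lowering, not the script itself; the API names (`to_edge_transform_and_lower`, the partitioner import path) are assumptions that may differ between ExecuTorch releases. For the quantized variant, the PT2E quantization sketch above would run first and the quantized module would be exported instead.

```python
import torch
import torchvision.models as models

# Assumed import paths; check your installed ExecuTorch version.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Export to an ExportedProgram, lower XNNPACK-supported subgraphs to the
# delegate, and serialize the result to a .pte file.
exported = torch.export.export(model, sample_inputs)
et_program = to_edge_transform_and_lower(
    exported,
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("mv2_xnnpack_fp32.pte", "wb") as f:
    f.write(et_program.buffer)
```

The resulting `.pte` file can be run with the `xnn_executor_runner` exactly as shown in the sections above.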