# ExecuTorch Vulkan Delegate

The ExecuTorch Vulkan delegate is a native GPU delegate for ExecuTorch that is
built on top of the cross-platform Vulkan GPU API standard. It is primarily
designed to leverage the GPU to accelerate model inference on Android devices,
but can be used on any platform that supports an implementation of Vulkan:
laptops, servers, and edge devices.

::::{note}
The Vulkan delegate is currently under active development, and its components
are subject to change.
::::

## What is Vulkan?

Vulkan is a low-level GPU API specification developed as a successor to OpenGL.
It is designed to offer developers more explicit control over GPUs compared to
previous specifications in order to reduce overhead and maximize the
capabilities of modern graphics hardware.

Vulkan has been widely adopted among GPU vendors, and most modern GPUs (both
desktop and mobile) on the market support Vulkan. Vulkan has also been included
in Android since Android 7.0.

**Note that Vulkan is a GPU API, not a GPU Math Library**. That is to say, it
provides a way to execute compute and graphics operations on a GPU, but does not
come with a built-in library of performant compute kernels.

## The Vulkan Compute Library

The ExecuTorch Vulkan Delegate is a wrapper around a standalone runtime known as
the **Vulkan Compute Library**. The aim of the Vulkan Compute Library is to
provide GPU implementations for PyTorch operators via GLSL compute shaders.

The Vulkan Compute Library is a fork/iteration of the [PyTorch Vulkan Backend](https://pytorch.org/tutorials/prototype/vulkan_workflow.html).
The core components of the PyTorch Vulkan backend were forked into ExecuTorch
and adapted for an AOT graph-mode style of model inference (as opposed to
PyTorch, which adopted an eager execution style of model inference).

The components of the Vulkan Compute Library are contained in the
`executorch/backends/vulkan/runtime/` directory. The core components are listed
and described below:

```
runtime/
├── api/ .................... Wrapper API around Vulkan to manage Vulkan objects
└── graph/ .................. ComputeGraph class which implements graph mode inference
    └── ops/ ................ Base directory for operator implementations
        ├── glsl/ ........... GLSL compute shaders
        │   ├── *.glsl
        │   └── conv2d.glsl
        └── impl/ ........... C++ code to dispatch GPU compute shaders
            ├── *.cpp
            └── Conv2d.cpp
```

## Features

The Vulkan delegate currently supports the following features:

* **Memory Planning**
  * Intermediate tensors whose lifetimes do not overlap will share memory allocations. This reduces the peak memory usage of model inference.
* **Capability Based Partitioning**
  * A graph can be partially lowered to the Vulkan delegate via a partitioner, which identifies nodes (i.e. operators) that are supported by the Vulkan delegate and lowers only the supported subgraphs.
* **Support for upper-bound dynamic shapes**
  * Tensors can change shape between inferences as long as their current shapes are smaller than the bounds specified during lowering (see the sketch at the end of this section).

In addition to increasing operator coverage, the following features are
currently in development:

* **Quantization Support**
  * We are currently working on support for 8-bit dynamic quantization, with plans to extend to other quantization schemes in the future.
* **Memory Layout Management**
  * Memory layout is an important factor in optimizing performance. We plan to introduce graph passes that insert memory layout transitions throughout a graph to optimize memory-layout sensitive operators such as Convolution and Matrix Multiplication.
* **Selective Build**
  * We plan to make it possible to control build size by selecting which operators/shaders to include in the build.
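To make the upper-bound dynamic shape behavior concrete, the sketch below shows
how a shape bound might be declared at export time using `torch.export`'s `Dim`
API. This is a minimal, hypothetical example: the `MatMul` module, the dimension
name `K`, and the bound of 1024 are placeholders, and the rest of the lowering
flow is unchanged from the end-to-end example later on this page.

```python
import torch
from torch.export import Dim, export

class MatMul(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x @ y

# Hypothetical upper bound: the shared inner dimension may change between
# inferences, but may never exceed 1024 once the model is lowered.
K = Dim("K", max=1024)

sample_inputs = (torch.randn(8, 64), torch.randn(64, 8))
aten_dialect = export(
    MatMul(),
    sample_inputs,
    # Mark dim 1 of x and dim 0 of y as the same dynamic dimension K.
    dynamic_shapes={"x": {1: K}, "y": {0: K}},
)
```

After lowering, inputs whose `K` dimension is at most the declared bound should
be accepted without re-exporting the model.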
## End to End Example

To further understand the features of the Vulkan Delegate and how to use it,
consider the following end-to-end example with a simple, single-operator model.

### Compile and lower a model to the Vulkan Delegate

Once ExecuTorch has been set up and installed, the following script can be used
to generate a simple model and lower it to the Vulkan delegate.

```python
# Note: this script is the same as the script from the "Setting up ExecuTorch"
# page, with one minor addition to lower to the Vulkan backend.
import torch
from torch.export import export
from executorch.exir import to_edge

from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner

# Start with a PyTorch model that adds two input tensors (matrices)
class Add(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# 1. torch.export: Defines the program with the ATen operator set.
aten_dialect = export(Add(), (torch.ones(1), torch.ones(1)))

# 2. to_edge: Make optimizations for Edge devices
edge_program = to_edge(aten_dialect)
# 2.1 Lower to the Vulkan backend
edge_program = edge_program.to_backend(VulkanPartitioner())

# 3. to_executorch: Convert the graph to an ExecuTorch program
executorch_program = edge_program.to_executorch()

# 4. Save the compiled .pte program
with open("vk_add.pte", "wb") as file:
    file.write(executorch_program.buffer)
```

Like other ExecuTorch delegates, a model can be lowered to the Vulkan Delegate
using the `to_backend()` API. The Vulkan Delegate implements the
`VulkanPartitioner` class, which identifies nodes (i.e. operators) in the graph
that are supported by the Vulkan delegate and separates compatible sections of
the model to be executed on the GPU.

This means that a model can be lowered to the Vulkan delegate even if it
contains some unsupported operators; only the supported parts of the graph
will be executed on the GPU.

::::{note}
The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/partitioner/supported_ops.py)
in the Vulkan partitioner code can be inspected to examine which ops are
currently implemented in the Vulkan delegate.
::::
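To check how the partitioner split the model, the partitioned graph can be
printed before converting to an ExecuTorch program. This is a small sketch,
assuming it is run right after the lowering script above:

```python
# Assuming `edge_program` is the result of the lowering script above.
# Subgraphs claimed by the Vulkan partitioner are folded into lowered
# modules, which appear as executorch_call_delegate calls in the printed
# graph; any remaining ATen operators will run via the portable kernels.
print(edge_program.exported_program().graph_module)
```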
### Build Vulkan Delegate libraries

The easiest way to build and test the Vulkan Delegate is to build for Android
and test on a local Android device. Android devices have built-in support for
Vulkan, and the Android NDK ships with a GLSL compiler, which is needed to
compile the Vulkan Compute Library's GLSL compute shaders.

The Vulkan Delegate libraries can be built by setting `-DEXECUTORCH_BUILD_VULKAN=ON`
when building with CMake.

First, make sure that you have the Android NDK installed; any NDK version past
NDK r19c should work. Note that the examples in this doc have been validated with
NDK r27b. The Android SDK should also be installed so that you have access to `adb`.

The instructions on this page assume that the following environment variables
are set.

```shell
export ANDROID_NDK=<path_to_ndk>
# Select the appropriate Android ABI for your device
export ANDROID_ABI=arm64-v8a
# All subsequent commands should be performed from ExecuTorch repo root
cd <path_to_executorch_root>
# Make sure adb works
adb --version
```

To build and install ExecuTorch libraries (for Android) with the Vulkan
Delegate:

```shell
# From executorch root directory
(rm -rf cmake-android-out && \
  cmake . -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=$ANDROID_ABI \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-android-out && \
  cmake --build cmake-android-out -j16 --target install)
```

### Run the Vulkan model on device

::::{note}
Since operator support is currently limited, only binary arithmetic operators
will run on the GPU. Expect inference to be slow, as the majority of operators
are executed via Portable operators.
::::

Now, the partially delegated model can be executed on your device's GPU!

```shell
# Build a model runner binary linked with the Vulkan delegate libs
cmake --build cmake-android-out --target vulkan_executor_runner -j32

# Push model to device
adb push vk_add.pte /data/local/tmp/vk_add.pte
# Push binary to device
adb push cmake-android-out/backends/vulkan/vulkan_executor_runner /data/local/tmp/runner_bin

# Run the model
adb shell /data/local/tmp/runner_bin --model_path /data/local/tmp/vk_add.pte
```
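Before attributing any numerical differences seen on device to the delegate, it
can be useful to confirm on the host that the exported program matches eager
PyTorch. A minimal sketch, assuming the `Add` module from the lowering script
above is in scope:

```python
import torch
from torch.export import export

# Hypothetical host-side check: compare eager execution against the
# exported (pre-lowering) program on the same sample inputs.
model = Add()
sample_inputs = (torch.ones(1), torch.ones(1))

eager_output = model(*sample_inputs)
exported_output = export(model, sample_inputs).module()(*sample_inputs)

# If these match, any discrepancy observed on device was introduced at or
# after the lowering step rather than by the model itself.
assert torch.allclose(eager_output, exported_output)
```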