# ExecuTorch Vulkan Delegate

The ExecuTorch Vulkan delegate is a native GPU delegate for ExecuTorch that is
built on top of the cross-platform Vulkan GPU API standard. It is primarily
designed to leverage the GPU to accelerate model inference on Android devices,
but can be used on any platform that supports an implementation of Vulkan:
laptops, servers, and edge devices.

::::{note}
The Vulkan delegate is currently under active development, and its components
are subject to change.
::::

## What is Vulkan?

Vulkan is a low-level GPU API specification developed as a successor to OpenGL.
It is designed to offer developers more explicit control over GPUs compared to
previous specifications, in order to reduce overhead and maximize the
capabilities of modern graphics hardware.

Vulkan has been widely adopted among GPU vendors, and most modern GPUs (both
desktop and mobile) in the market support Vulkan. Vulkan is also included in
Android from Android 7.0 onwards.

**Note that Vulkan is a GPU API, not a GPU Math Library**. That is to say, it
provides a way to execute compute and graphics operations on a GPU, but does
not come with a built-in library of performant compute kernels.

## The Vulkan Compute Library

The ExecuTorch Vulkan Delegate is a wrapper around a standalone runtime known as
the **Vulkan Compute Library**. The aim of the Vulkan Compute Library is to
provide GPU implementations for PyTorch operators via GLSL compute shaders.

The Vulkan Compute Library is a fork/iteration of the [PyTorch Vulkan Backend](https://pytorch.org/tutorials/prototype/vulkan_workflow.html).
The core components of the PyTorch Vulkan backend were forked into ExecuTorch
and adapted for an AOT graph-mode style of model inference (as opposed to the
eager execution style of model inference used by PyTorch).

The components of the Vulkan Compute Library are contained in the
`executorch/backends/vulkan/runtime/` directory. The core components are listed
and described below:

```
runtime/
├── api/ .................... Wrapper API around Vulkan to manage Vulkan objects
└── graph/ .................. ComputeGraph class which implements graph mode inference
    └── ops/ ................ Base directory for operator implementations
        ├── glsl/ ........... GLSL compute shaders
        │   ├── *.glsl
        │   └── conv2d.glsl
        └── impl/ ........... C++ code to dispatch GPU compute shaders
            ├── *.cpp
            └── Conv2d.cpp
```

## Features

The Vulkan delegate currently supports the following features:

* **Memory Planning**
  * Intermediate tensors whose lifetimes do not overlap will share memory allocations. This reduces the peak memory usage of model inference.
* **Capability Based Partitioning**:
  * A graph can be partially lowered to the Vulkan delegate via a partitioner, which will identify nodes (i.e. operators) that are supported by the Vulkan delegate and lower only supported subgraphs.
* **Support for upper-bound dynamic shapes**:
  * Tensors can change shape between inferences as long as their current shapes are smaller than the bounds specified during lowering (see the export sketch below).

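To illustrate upper-bound dynamic shapes, the following is a minimal sketch of
what export looks like with a dynamic dimension, using the standard
`torch.export.Dim` API. The `Add` module and the bound of 16 are illustrative
assumptions, not part of the delegate itself:

```python
# A minimal sketch of exporting with upper-bound dynamic shapes.
# The Add module and the bound of 16 are illustrative assumptions.
import torch
from torch.export import Dim, export

class Add(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# Both inputs share a "batch" dimension that may vary between inferences,
# as long as it stays within the upper bound declared here.
batch = Dim("batch", max=16)
dynamic_shapes = {"x": {0: batch}, "y": {0: batch}}

aten_dialect = export(
    Add(), (torch.ones(4, 2), torch.ones(4, 2)), dynamic_shapes=dynamic_shapes
)
```

The resulting program can then be lowered to the Vulkan delegate exactly as in
the end-to-end example below; at runtime, the batch dimension may take any
size up to the declared bound.
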
In addition to increasing operator coverage, the following features are
currently in development:

* **Quantization Support**
  * We are currently working on support for 8-bit dynamic quantization, with plans to extend to other quantization schemes in the future.
* **Memory Layout Management**
  * Memory layout is an important factor in optimizing performance. We plan to introduce graph passes that insert memory layout transitions throughout a graph to optimize memory-layout sensitive operators such as Convolution and Matrix Multiplication.
* **Selective Build**
  * We plan to make it possible to control build size by selecting which operators/shaders to include in the build.

## End to End Example

To further understand the features of the Vulkan Delegate and how to use it,
consider the following end to end example with a simple single operator model.

### Compile and lower a model to the Vulkan Delegate

Once ExecuTorch has been set up and installed, the following script can be used
to generate a simple model and lower it to the Vulkan delegate.

```python
# Note: this script is the same as the script from the "Setting up ExecuTorch"
# page, with one minor addition to lower to the Vulkan backend.
import torch
from torch.export import export
from executorch.exir import to_edge

from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner

# Start with a PyTorch model that adds two input tensors
class Add(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# 1. torch.export: Defines the program with the ATen operator set.
aten_dialect = export(Add(), (torch.ones(1), torch.ones(1)))

# 2. to_edge: Make optimizations for Edge devices
edge_program = to_edge(aten_dialect)
# 2.1 Lower to the Vulkan backend
edge_program = edge_program.to_backend(VulkanPartitioner())

# 3. to_executorch: Convert the graph to an ExecuTorch program
executorch_program = edge_program.to_executorch()

# 4. Save the compiled .pte program
with open("vk_add.pte", "wb") as file:
    file.write(executorch_program.buffer)
```
123
124Like other ExecuTorch delegates, a model can be lowered to the Vulkan Delegate
125using the `to_backend()` API. The Vulkan Delegate implements the
126`VulkanPartitioner` class which identifies nodes (i.e. operators) in the graph
127that are supported by the Vulkan delegate, and separates compatible sections of
128the model to be executed on the GPU.
129
130This means the a model can be lowered to the Vulkan delegate even if it contains
131some unsupported operators. This will just mean that only parts of the graph
132will be executed on the GPU.
133
134
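To see how a graph was partitioned, the lowered program can be printed;
delegated subgraphs appear as `executorch_call_delegate` calls rather than as
individual operators. A minimal sketch, continuing from the script above:

```python
# Print the lowered program (a sketch, continuing from the script above).
# Sections that were lowered to Vulkan appear as executorch_call_delegate
# nodes; any unsupported operators remain as regular ATen/Edge ops.
print(edge_program.exported_program())
```
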
::::{note}
The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/partitioner/supported_ops.py)
in the Vulkan partitioner code can be inspected to examine which ops are
currently implemented in the Vulkan delegate.
::::

### Build Vulkan Delegate libraries

The easiest way to build and test the Vulkan Delegate is to build for Android
and test on a local Android device. Android devices have built-in support for
Vulkan, and the Android NDK ships with a GLSL compiler, which is needed to
compile the Vulkan Compute Library's GLSL compute shaders.

The Vulkan Delegate libraries can be built by setting `-DEXECUTORCH_BUILD_VULKAN=ON`
when building with CMake.

First, make sure that you have the Android NDK installed; any NDK version past
NDK r19c should work. Note that the examples in this doc have been validated with
NDK r27b. The Android SDK should also be installed so that you have access to `adb`.

The instructions on this page assume that the following environment variables
are set.

```shell
export ANDROID_NDK=<path_to_ndk>
# Select the appropriate Android ABI for your device
export ANDROID_ABI=arm64-v8a
# All subsequent commands should be performed from ExecuTorch repo root
cd <path_to_executorch_root>
# Make sure adb works
adb --version
```

To build and install ExecuTorch libraries (for Android) with the Vulkan
Delegate:

```shell
# From executorch root directory
(rm -rf cmake-android-out && \
  cmake . -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=$ANDROID_ABI \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-android-out && \
  cmake --build cmake-android-out -j16 --target install)
```

### Run the Vulkan model on device

::::{note}
Since operator support is currently limited, only binary arithmetic operators
will run on the GPU. Expect inference to be slow as the majority of operators
are being executed via Portable operators.
::::

Now, the partially delegated model can be executed on your device's GPU!

```shell
# Build a model runner binary linked with the Vulkan delegate libs
cmake --build cmake-android-out --target vulkan_executor_runner -j32

# Push model to device
adb push vk_add.pte /data/local/tmp/vk_add.pte
# Push binary to device
adb push cmake-android-out/backends/vulkan/vulkan_executor_runner /data/local/tmp/runner_bin

# Run the model
adb shell /data/local/tmp/runner_bin --model_path /data/local/tmp/vk_add.pte
```