# Running an ExecuTorch Model in C++ Tutorial

**Author:** [Jacob Szwejbka](https://github.com/JacobSzwejbka)

In this tutorial, we will cover how to run an ExecuTorch model in C++ using the more detailed, lower-level APIs: prepare the `MemoryManager`, set inputs, execute the model, and retrieve outputs. However, if you’re looking for a simpler interface that works out of the box, consider trying the [Module Extension Tutorial](extension-module.md).

For a high-level overview of the ExecuTorch Runtime please see [Runtime Overview](runtime-overview.md), and for more in-depth documentation on
each API please see the [Runtime API Reference](executorch-runtime-api-reference.rst).
[Here](https://github.com/pytorch/executorch/blob/main/examples/portable/executor_runner/executor_runner.cpp) is a fully functional C++ model runner, and the [Setting up ExecuTorch](getting-started-setup.md) doc shows how to build and run it.


## Prerequisites

You will need an ExecuTorch model to follow along. We will be using
the model `SimpleConv` generated from the [Exporting to ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).

## Model Loading

The first step towards running your model is to load it. ExecuTorch uses an abstraction called a `DataLoader` to handle the specifics of retrieving the `.pte` file data, and then `Program` represents the loaded state.

Users can define their own `DataLoader`s to fit the needs of their particular system. In this tutorial we will be using the `FileDataLoader`, but you can look under [Example Data Loader Implementations](https://github.com/pytorch/executorch/tree/main/extension/data_loader) to see other options provided by the ExecuTorch project.

For the `FileDataLoader`, all we need to do is provide a file path to the constructor.

``` cpp
using executorch::aten::ScalarType;
using executorch::aten::Tensor;
using executorch::aten::TensorImpl;
using executorch::extension::FileDataLoader;
using executorch::extension::MallocMemoryAllocator;
using executorch::runtime::Error;
using executorch::runtime::EValue;
using executorch::runtime::HierarchicalAllocator;
using executorch::runtime::MemoryManager;
using executorch::runtime::Method;
using executorch::runtime::MethodMeta;
using executorch::runtime::Program;
using executorch::runtime::Result;
using executorch::runtime::Span;

Result<FileDataLoader> loader =
        FileDataLoader::from("/tmp/model.pte");
assert(loader.ok());

Result<Program> program = Program::load(&loader.get());
assert(program.ok());
```
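
If the `.pte` data is already resident in memory, for example baked into flash or linked into the binary on an embedded target, a `BufferDataLoader` from the same [extension/data_loader](https://github.com/pytorch/executorch/tree/main/extension/data_loader) directory can be used instead of reading from a file. The following is only a rough sketch; the `model_data`/`model_data_len` symbols are hypothetical placeholders for wherever your program bytes actually live.

``` cpp
#include <executorch/extension/data_loader/buffer_data_loader.h>

using executorch::extension::BufferDataLoader;

// Hypothetical symbols describing a .pte file that is already in memory.
extern const uint8_t model_data[];
extern const size_t model_data_len;

// BufferDataLoader wraps an existing (data, size) pair; it does not copy or
// own the bytes, so they must outlive the loader and the Program.
BufferDataLoader buffer_loader(model_data, model_data_len);

// Program::load is agnostic to which DataLoader implementation backs it.
Result<Program> buffer_program = Program::load(&buffer_loader);
assert(buffer_program.ok());
```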

## Setting Up the MemoryManager

Next we will set up the `MemoryManager`.

One of the principles of ExecuTorch is giving users control over where the memory used by the runtime comes from. Today (late 2023) users need to provide two different allocators:

* Method Allocator: A `MemoryAllocator` used to allocate runtime structures at `Method` load time. Things like Tensor metadata, the internal chain of instructions, and other runtime state come from this.

* Planned Memory: A `HierarchicalAllocator` containing one or more memory arenas where internal mutable tensor data buffers are placed. At `Method` load time internal tensors have their data pointers assigned to various offsets within those arenas. The positions of those offsets and the sizes of the arenas are determined ahead of time by memory planning.

For this example we will retrieve the size of the planned memory arenas dynamically from the `Program`, but for heapless environments users could retrieve this information from the `Program` ahead of time and allocate the arenas statically. We will also be using a malloc-based allocator for the method allocator.

``` cpp
// Method names map back to Python nn.Module method names. Most users will only
// have the singular method "forward".
const char* method_name = "forward";

// MethodMeta is a lightweight structure that lets us gather metadata
// information about a specific method. In this case we are looking to get the
// required size of the memory planned buffers for the method "forward".
Result<MethodMeta> method_meta = program->method_meta(method_name);
assert(method_meta.ok());

std::vector<std::unique_ptr<uint8_t[]>> planned_buffers; // Owns the memory
std::vector<Span<uint8_t>> planned_arenas; // Passed to the allocator

size_t num_memory_planned_buffers = method_meta->num_memory_planned_buffers();

// It is possible to have multiple layers in our memory hierarchy; for example,
// SRAM and DRAM.
for (size_t id = 0; id < num_memory_planned_buffers; ++id) {
  // .get() will always succeed because id < num_memory_planned_buffers.
  size_t buffer_size =
      static_cast<size_t>(method_meta->memory_planned_buffer_size(id).get());
  planned_buffers.push_back(std::make_unique<uint8_t[]>(buffer_size));
  planned_arenas.push_back({planned_buffers.back().get(), buffer_size});
}
HierarchicalAllocator planned_memory(
    {planned_arenas.data(), planned_arenas.size()});

// Version of MemoryAllocator that uses malloc to handle allocations rather than
// a fixed buffer.
MallocMemoryAllocator method_allocator;

// Assemble all of the allocators into the MemoryManager that the Executor will
// use.
MemoryManager memory_manager(&method_allocator, &planned_memory);
```
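
For a heapless target, the same structure can be wired up without any dynamic allocation. The snippet below is a minimal sketch, not a drop-in recipe: the buffer sizes (`kMethodAllocatorPoolSize`, `kPlannedArenaSize`) are made-up constants that in practice you would derive offline from the model's `MethodMeta`, and the base `MemoryAllocator` is used over static buffers in place of `MallocMemoryAllocator`.

``` cpp
using executorch::runtime::MemoryAllocator;

// Illustrative sizes only; derive the real values from the Program/MethodMeta
// for your specific model ahead of time.
constexpr size_t kMethodAllocatorPoolSize = 4 * 1024U;  // runtime structures
constexpr size_t kPlannedArenaSize = 512 * 1024U;       // mutable tensor data

static uint8_t method_allocator_pool[kMethodAllocatorPoolSize];
static uint8_t planned_arena[kPlannedArenaSize];

// Fixed-buffer allocator for Method load-time structures.
MemoryAllocator static_method_allocator(
    sizeof(method_allocator_pool), method_allocator_pool);

// A single statically allocated arena for the planned memory.
Span<uint8_t> static_arenas[] = {{planned_arena, sizeof(planned_arena)}};
HierarchicalAllocator static_planned_memory({static_arenas, 1});

MemoryManager static_memory_manager(
    &static_method_allocator, &static_planned_memory);
```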

## Loading a Method

In ExecuTorch we load and initialize from the `Program` at a method granularity. Many programs will only have one method, `forward`. `load_method` is where initialization happens, from setting up tensor metadata to initializing delegates, and it is where the `MemoryManager` we just assembled gets used.

``` cpp
Result<Method> method = program->load_method(method_name, &memory_manager);
assert(method.ok());
```

## Setting Inputs

Now that we have our method we need to set up its inputs before we can
perform an inference. In this case we know our model takes a single (1, 3, 256, 256)
sized float tensor.
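
If you want to confirm that expectation programmatically rather than hard-coding it, the `MethodMeta` we retrieved earlier also describes the inputs. The check below is an optional sketch, not part of the minimal flow:

``` cpp
// Optional sanity check: the method should expect a single 4-dimensional
// float tensor before we construct and set the input.
assert(method_meta->num_inputs() == 1);

// input_tensor_meta() returns a Result<TensorInfo> describing dtype and sizes.
Result<executorch::runtime::TensorInfo> input_info =
    method_meta->input_tensor_meta(0);
assert(input_info.ok());
assert(input_info->sizes().size() == 4);            // e.g. (1, 3, 256, 256)
assert(input_info->scalar_type() == ScalarType::Float);
```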

Depending on how your model was memory planned, the planned memory may or may
not contain buffer space for your inputs and outputs.

If the outputs were not memory planned, then users will need to set up the output data pointer with `set_output_data_ptr`. In this case we will just assume our model was exported with inputs and outputs handled by the memory plan.

``` cpp
// Create our input tensor.
float data[1 * 3 * 256 * 256];
Tensor::SizesType sizes[] = {1, 3, 256, 256};
Tensor::DimOrderType dim_order[] = {0, 1, 2, 3};
TensorImpl impl(
    ScalarType::Float, // dtype
    4, // number of dimensions
    sizes,
    data,
    dim_order);
Tensor t(&impl);

// Implicitly casts t to EValue
Error set_input_error = method->set_input(t, 0);
assert(set_input_error == Error::Ok);
```
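
For completeness, if the outputs had *not* been memory planned, the output buffer would need to be attached by hand before execution. The following is a hedged sketch of that path; `kOutputNumel` is a made-up element count, and in a real program the required size would come from the method's output metadata.

``` cpp
// Only needed when an output is NOT covered by the memory plan.
constexpr size_t kOutputNumel = 1000;  // hypothetical output element count
static float output_buffer[kOutputNumel];

Error set_output_error = method->set_output_data_ptr(
    output_buffer, sizeof(output_buffer), /*output_idx=*/0);
assert(set_output_error == Error::Ok);
```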

## Perform an Inference

Now that our method is loaded and our inputs are set, we can perform an inference. We do this by calling `execute`.

``` cpp
Error execute_error = method->execute();
assert(execute_error == Error::Ok);
```
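
A loaded `Method` is not single-use: the expensive loading work is done once, and `execute` can then be called repeatedly, refreshing inputs between runs as needed. A small sketch (the iteration count here is arbitrary):

``` cpp
// Re-run the same loaded method several times. Inputs can be updated
// between iterations with set_input() if the data changes.
for (int i = 0; i < 10; ++i) {
  Error err = method->execute();
  assert(err == Error::Ok);
}
```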

## Retrieve Outputs

Once our inference completes we can retrieve our output. We know that our model only returns a single output tensor. One potential pitfall here is that the output we get back is owned by the `Method`. Users should take care to clone the output before mutating it, or whenever they need it to outlive the `Method`.

``` cpp
EValue output = method->get_output(0);
assert(output.isTensor());
```
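
One simple way to take ownership of the result is to copy the tensor's elements into storage you control, so that the data survives the `Method` (and its planned memory) being destroyed. A rough sketch, assuming the float dtype used throughout this tutorial:

``` cpp
// Copy the output elements out of Method-owned memory into a user-owned
// std::vector<float>.
Tensor output_tensor = output.toTensor();
const float* output_data = output_tensor.const_data_ptr<float>();
std::vector<float> owned_output(
    output_data, output_data + output_tensor.numel());
```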

## Conclusion

This tutorial demonstrated how to run an ExecuTorch model using low-level runtime APIs, which offer granular control over memory management and execution. However, for most use cases, we recommend using the Module APIs, which provide a more streamlined experience without sacrificing flexibility. For more details, check out the [Module Extension Tutorial](extension-module.md).