# Memory Planning

Audience: Backend integrators and embedded developers who are interested in customizing the regions of memory ExecuTorch programs operate in.

## Overview

Memory planning is the last step performed on an `ExportedProgram` before it is emitted as an ExecuTorch program. During this process, ExecuTorch examines the size and lifespan of each mutable tensor and plans out its location in fixed-size memory arenas.

Concretely, there are three passes related to memory planning:
* `SpecPropPass` computes a TensorSpec for each tensor in the graph (inputs, intermediates, and outputs). The most important field of the tensor spec is a symbolic expression of the tensor's shape: the initial set of symbols comes from the dimensions of input tensors, and the symbolic expressions for intermediate tensor shapes are propagated through tensor operations. Users can mark dimensions as either dynamic or static; when a dim is dynamic, they are required to annotate it with a ValueRange.

* `SymShapeEvalPass` evaluates the symbolic expressions to concrete integers, namely their upper bounds. There are two ways of doing the upper-bound specialization (a configuration sketch follows this list):
    * `HintBasedSymShapeEval` (to be deprecated) is the old way of evaluating the upper bound. It does not look at the ValueRange of the symbols, but instead uses the shapes of the example inputs to replace all the symbols. We call it "hint based" because the example inputs' shapes are just hints of what the input shapes might be at run time and are used for tracing only.
    * `ValueRangeBasedSymShapeEval` is the recommended way of doing upper-bound memory planning. It actually looks at the ValueRange of the symbols and does an inference over the ranges to obtain a real upper bound.

* `MemoryPlanningPass` does the actual memory planning, given that every tensor has a TensorSpec with concrete integer shapes.
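
For reference, the sketch below shows where these pieces surface in user code: a dynamic dim annotated with a value range at export time, and the upper-bound evaluator selected when lowering. This is a minimal sketch; `ConstraintBasedSymShapeEvalPass` is assumed to be the class implementing the value-range-based evaluation described above, and its import path may differ across versions.

```python
import torch
from executorch import exir
from executorch.exir import to_edge

# Assumption: the value-range-based evaluator lives here under this name in
# recent ExecuTorch versions; verify against your install.
from executorch.exir.passes.sym_shape_eval_pass import ConstraintBasedSymShapeEvalPass


class MulAdd(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2 + 1


# The first dim is dynamic and carries a ValueRange of [1, 16].
batch = torch.export.Dim("batch", min=1, max=16)
exported = torch.export.export(
    MulAdd(), (torch.randn(4, 8),), dynamic_shapes={"x": {0: batch}}
)

program = to_edge(exported).to_executorch(
    exir.ExecutorchBackendConfig(
        # Upper-bound each symbolic dim from its ValueRange rather than from
        # the example input's shape.
        sym_shape_eval_pass=ConstraintBasedSymShapeEvalPass(),
    )
)
```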

## Algorithms

ExecuTorch provides two memory planning algorithms out of the box, but users can define their own if the provided options are inappropriate or insufficient for their use case.

* The naive algorithm simply concatenates all the tensors together in a linear memory block without considering memory re-use. It serves as an upper bound for total memory consumption and as a baseline.

* The greedy algorithm tries to re-use already-allocated memory based on a best-fit criterion. Specifically: when there is no allocated buffer whose lifetime avoids overlapping with that of the tensor currently being planned, we allocate a new buffer with the same size and lifetime as that tensor. When there are one or more allocated buffers whose lifetimes do not overlap with the current tensor's, we pick the one whose size is closest to the current tensor's so as to reduce memory fragmentation. Finally, we lay these buffers out linearly in memory. A simplified sketch of this procedure follows.
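
To make the best-fit rule concrete, here is a small, self-contained sketch of the idea. It is illustrative only: `Buffer` and `plan_greedy` are hypothetical names, and the actual implementation in `exir/memory_planning.py` additionally handles specs, alignment, and shared storage.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Buffer:
    size: int
    # (start, end) lifetimes of the tensors already placed in this buffer.
    lifetimes: List[Tuple[int, int]] = field(default_factory=list)

    def overlaps(self, start: int, end: int) -> bool:
        return any(s <= end and start <= e for s, e in self.lifetimes)


def plan_greedy(tensors: List[Tuple[int, int, int]]) -> List[Buffer]:
    """tensors: one (size, lifetime_start, lifetime_end) triple per tensor."""
    buffers: List[Buffer] = []
    for size, start, end in tensors:
        # Buffers that are free for this tensor's entire lifetime.
        candidates = [b for b in buffers if not b.overlaps(start, end)]
        if not candidates:
            # No reusable buffer: allocate a new one.
            buffers.append(Buffer(size, [(start, end)]))
            continue
        # Best fit: re-use the candidate whose size is closest to ours.
        best = min(candidates, key=lambda b: abs(b.size - size))
        best.size = max(best.size, size)
        best.lifetimes.append((start, end))
    # The buffers are then laid out back-to-back in a single arena.
    return buffers
```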

## Method Inputs and Outputs

The `MemoryPlanningPass` exposes the option to not memory plan program inputs and outputs. If the IO is not planned, users are expected to provide data buffers to back these values at runtime. For example:

```python
program = edge_program.to_executorch(
    exir.ExecutorchBackendConfig(
        memory_planning_pass=MemoryPlanningPass(
            # Inputs are not memory planned; the data_ptr for input tensors
            # after model load will be nullptr.
            alloc_graph_input=False,
            # Outputs are memory planned; the data_ptr for output tensors
            # after model load will point into the planned memory.
            alloc_graph_output=True,
        )
    )
)
```

A common setup is a model whose outputs are provided as inputs to subsequent inferences. In that situation it is generally better not to memory plan the IO, and instead to provide the same buffer as both input and output at runtime, avoiding a copy.

## Custom Memory Plans

Users can write custom memory plans to take advantage of multiple memory locations (like SRAM and DRAM), to place the outputs of specific nodes in specific locations, or even to change the planning algorithm itself. The following example shows how you could reuse the provided planning algorithms but with multiple memory hierarchies, placing the outputs of specific ops in specific memory arenas.

```python
from typing import Optional

import torch
from executorch.exir.passes import MemoryPlanningPass
from torch.export import ExportGraphSignature
from torch.fx import GraphModule
from torch.fx.passes.infra.pass_base import PassResult


class CustomPoolMemoryPlanningPass(MemoryPlanningPass):
    def run(self, graph_module: GraphModule, graph_signature: Optional[ExportGraphSignature]) -> PassResult:
        for subgm in graph_module.modules():
            if not isinstance(subgm, GraphModule):
                continue
            for node in subgm.graph.nodes:
                # mem_id = 1 for placeholders and outputs of mul
                # mem_id = 2 for outputs of add
                # The parent class will copy the spec to the alloc nodes.
                if node.op == "placeholder":
                    node.meta["spec"].mem_id = 1
                    continue

                if node.op != "call_function":
                    continue

                if node.target == torch.ops.aten.add.out:
                    node.meta["spec"].mem_id = 2
                elif node.target == torch.ops.aten.mul.out:
                    node.meta["spec"].mem_id = 1

        return super().run(graph_module, graph_signature)
```

Then, when lowering to ExecuTorch, you can use your custom plan in the following way (`greedy` here is the provided algorithm, importable from `executorch.exir.memory_planning`):

```python
program = edge_program.to_executorch(
    exir.ExecutorchBackendConfig(
        memory_planning_pass=CustomPoolMemoryPlanningPass(
            memory_planning_algo=greedy,
        )
    )
)
```

Users attempting to write a custom memory planning algorithm should start by looking at [the greedy algorithm's implementation](https://github.com/pytorch/executorch/blob/d62c41ca86435e5316e7ed292b6d68aff27a2fb7/exir/memory_planning.py#L459C1-L459C12).
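
As a starting-point shape, the hypothetical algorithm below follows the calling convention that `greedy` appears to use at that revision. The signature, the `TensorSpec` fields touched here (`mem_id`, `mem_offset`, `allocated_memory`), and the returned list of per-`mem_id` buffer sizes are all assumptions to verify against the linked source. Since it never re-uses memory, it behaves like the naive algorithm described above.

```python
from typing import List, Optional

import torch
from torch.export import ExportGraphSignature


def naive_bump_allocator(
    graph_module: torch.fx.GraphModule,
    alignment: int,
    graph_signature: Optional[ExportGraphSignature] = None,
    alloc_graph_input: bool = True,
    alloc_graph_output: bool = True,
) -> List[int]:
    """Place every tensor at a fresh offset in mem_id 1, with no re-use.

    For brevity this sketch ignores alloc_graph_input/alloc_graph_output.
    """

    def align(n: int) -> int:
        return (n + alignment - 1) // alignment * alignment

    offset = 0
    seen = set()
    for node in graph_module.graph.nodes:
        spec = node.meta.get("spec")
        # Skip nodes without a single TensorSpec (e.g. multi-output nodes,
        # which carry a list of specs) and specs we have already placed.
        if spec is None or isinstance(spec, (list, tuple)) or id(spec) in seen:
            continue
        seen.add(id(spec))
        spec.mem_id = 1
        spec.mem_offset = offset
        # `allocated_memory` is assumed to be the tensor's size in bytes.
        offset += align(spec.allocated_memory)
    # Assumed convention: one size per mem_id, with index 0 unused.
    return [0, offset]
```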

## Debugging Tool

Please refer to [Memory Planning Inspection](./memory-planning-inspection.md) for a tool to inspect the result of memory planning.