# ExecuTorch Core ML Delegate

This subtree contains the Core ML Delegate implementation for ExecuTorch.
Core ML is an optimized framework for running machine learning models on Apple devices. The delegate is the mechanism for leveraging the Core ML framework to accelerate operators when running on Apple devices.

## Layout
- `compiler/`: Lowers a module to the Core ML backend.
- `partition/`: Partitions a module fully or partially for the Core ML backend.
- `quantizer/`: Quantizes a module using a Core ML-favored scheme.
- `scripts/`: Scripts for installing dependencies and running tests.
- `runtime/`: Core ML delegate runtime implementation.
  - `inmemoryfs`: In-memory filesystem implementation used to serialize/deserialize the AOT blob.
  - `kvstore`: Persistent key-value store implementation.
  - `delegate`: Runtime implementation.
  - `include`: Public headers.
  - `sdk`: SDK implementation.
  - `tests`: Unit tests.
  - `workspace`: Xcode workspace for the runtime.
- `third-party/`: External dependencies.

## Partition and Delegation

To delegate a Program to the **Core ML** backend, the client must call `to_backend` with the **CoreMLPartitioner**.

```python
import torch
import executorch.exir

from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.sin(x)

source_model = Model()
example_inputs = (torch.ones(1),)

# Export the source model to Edge IR representation
aten_program = torch.export.export(source_model, example_inputs)
edge_program_manager = executorch.exir.to_edge(aten_program)

# Delegate to Core ML backend
delegated_program_manager = edge_program_manager.to_backend(CoreMLPartitioner())

# Serialize delegated program
executorch_program = delegated_program_manager.to_executorch()
with open("model.pte", "wb") as f:
    f.write(executorch_program.buffer)
```

The module will be fully or partially delegated to **Core ML**, depending on whether all or only some of its ops are supported by the **Core ML** backend. Users may force certain ops to be skipped by passing `CoreMLPartitioner(skip_ops_for_coreml_delegation=...)`, as in the sketch below.
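
For example, a minimal sketch reusing `edge_program_manager` from the snippet above (the op name here is illustrative; pass the qualified names of whichever ops you want to keep off the Core ML delegate):

```python
# Hypothetical choice of op: keep `aten.sin.default` out of the
# Core ML partition so it runs on the default ExecuTorch kernels.
partitioner = CoreMLPartitioner(
    skip_ops_for_coreml_delegation=["aten.sin.default"]
)
delegated_program_manager = edge_program_manager.to_backend(partitioner)
```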

The `to_backend` implementation is a thin wrapper over [coremltools](https://apple.github.io/coremltools/docs-guides/); `coremltools` is responsible for converting an **ExportedProgram** to an **MLModel**. The converted **MLModel** data is saved, flattened, and returned as bytes to **ExecuTorch**.
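
The conversion can also be configured through compile specs, e.g. to pick the compute unit or the minimum deployment target. A minimal sketch, assuming the `CoreMLBackend.generate_compile_specs` helper and the keyword arguments shown here (check the sources under `compiler/` for the authoritative signature):

```python
import coremltools as ct

from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

# Assumed keyword arguments; adjust to match the actual signature.
compile_specs = CoreMLBackend.generate_compile_specs(
    compute_unit=ct.ComputeUnit.ALL,  # CPU, GPU, and Neural Engine
    minimum_deployment_target=ct.target.iOS17,
)
partitioner = CoreMLPartitioner(compile_specs=compile_specs)
```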

## Quantization

To quantize a Program in a Core ML-favored way, the client may utilize **CoreMLQuantizer**.

```python
import torch
import executorch.exir

from torch.export import export_for_training
from torch.ao.quantization.quantize_pt2e import (
    convert_pt2e,
    prepare_pt2e,
    prepare_qat_pt2e,
)

from executorch.backends.apple.coreml.quantizer import CoreMLQuantizer
from coremltools.optimize.torch.quantization.quantization_config import (
    LinearQuantizerConfig,
    QuantizationScheme,
)

class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.conv = torch.nn.Conv2d(
            in_channels=3, out_channels=16, kernel_size=3, padding=1
        )
        self.relu = torch.nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.conv(x)
        return self.relu(a)

source_model = Model()
example_inputs = (torch.randn((1, 3, 256, 256)),)

pre_autograd_aten_dialect = export_for_training(source_model, example_inputs).module()

quantization_config = LinearQuantizerConfig.from_dict(
    {
        "global_config": {
            "quantization_scheme": QuantizationScheme.symmetric,
            "activation_dtype": torch.uint8,
            "weight_dtype": torch.int8,
            "weight_per_channel": True,
        }
    }
)
quantizer = CoreMLQuantizer(quantization_config)

# For post-training quantization, use `prepare_pt2e`.
# For quantization-aware training, use `prepare_qat_pt2e`.
prepared_graph = prepare_pt2e(pre_autograd_aten_dialect, quantizer)

# Calibrate the prepared model with representative inputs, then convert.
prepared_graph(*example_inputs)
converted_graph = convert_pt2e(prepared_graph)
```

The `converted_graph` is the quantized torch model, and it can be delegated to **Core ML** through the **CoreMLPartitioner** just like the float model, as sketched below.
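
Continuing the example, a minimal sketch that reuses `converted_graph` and `example_inputs` from the snippet above, together with the export flow from the first example:

```python
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

# Export the quantized module and delegate it to the Core ML backend.
aten_program = torch.export.export(converted_graph, example_inputs)
edge_program_manager = executorch.exir.to_edge(aten_program)
delegated_program_manager = edge_program_manager.to_backend(CoreMLPartitioner())

# Serialize the delegated program.
executorch_program = delegated_program_manager.to_executorch()
with open("quantized_model.pte", "wb") as f:
    f.write(executorch_program.buffer)
```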

## Runtime

To execute a Core ML-delegated program, the application must link against the `coremldelegate` library. Once linked, no additional steps are required: when running the program, ExecuTorch calls the Core ML runtime to execute the Core ML-delegated parts of the program.

Please follow the instructions described in the [Core ML setup](/backends/apple/coreml/setup.md) to link the `coremldelegate` library.

## Help & Improvements
If you have problems or questions, or have suggestions for ways to make the implementation and testing better, please create an issue on [GitHub](https://www.github.com/pytorch/executorch/issues).