# CLI Tool for Compiling / Deploying Pre-Built QNN Artifacts

An easy-to-use tool for generating / executing .pte programs from pre-built model libraries / context binaries produced by Qualcomm AI Engine Direct. The tool is verified with this [host environment](../../../../docs/source/build-run-qualcomm-ai-engine-direct-backend.md#host-os).

## Description

This tool is aimed at users who want to leverage the ExecuTorch runtime framework with their existing artifacts generated by QNN. It lets them produce a .pte program in just a few steps.<br/>
If you are interested in well-known applications, [Qualcomm AI HUB](https://aihub.qualcomm.com/) is a great resource that provides plenty of optimized, state-of-the-art models ready for deployment. All of them can be downloaded in model library or context binary format.

* Model libraries (.so) from `qnn-model-lib-generator` / AI HUB, or context binaries (.bin) from `qnn-context-binary-generator` / AI HUB, can be fed to the tool directly:
  - To produce a .pte program:
    ```bash
    $ python export.py compile
    ```
  - To perform inference with the generated .pte program:
    ```bash
    $ python export.py execute
    ```

### Dependencies

* Register for Qualcomm AI HUB.
* Download the QNN SDK that your favorite model is compiled with via this [link](https://www.qualcomm.com/developer/software/qualcomm-ai-engine-direct-sdk). The link automatically downloads the latest version at this moment (users should be able to specify a version soon; please refer to [this](../../../../docs/source/build-run-qualcomm-ai-engine-direct-backend.md#software) for earlier releases).
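
After unpacking the SDK, it helps to point your environment at it before building or running the tool. A minimal sketch, assuming the SDK was unpacked under `/opt/qcom/aistack/qairt/<version>` and that ExecuTorch is checked out at `$EXECUTORCH_ROOT` (both paths are assumptions; adjust them to your setup):

```bash
# Hypothetical unpack location -- replace with your actual SDK directory
export QNN_SDK_ROOT=/opt/qcom/aistack/qairt/<version>
# Make the QNN host libraries discoverable
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang:$LD_LIBRARY_PATH
# Root of your ExecuTorch checkout, used by the commands below
export EXECUTORCH_ROOT=/path/to/executorch
```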

### Target Model

* Consider using a [virtual environment](https://app.aihub.qualcomm.com/docs/hub/getting_started.html) for AI HUB scripts to prevent package conflicts with ExecuTorch (see the sketch after this list). Please finish the [installation section](https://app.aihub.qualcomm.com/docs/hub/getting_started.html#installation) before proceeding with the following steps.
* Taking [QuickSRNetLarge-Quantized](https://aihub.qualcomm.com/models/quicksrnetlarge_quantized?searchTerm=quantized) as an example, please [install](https://huggingface.co/qualcomm/QuickSRNetLarge-Quantized#installation) the package as instructed.
* Create a workspace and export the pre-built model library:
  ```bash
  mkdir $MY_WS && cd $MY_WS
  # target chipset is `SM8650` (Snapdragon 8 Gen 3)
  python -m qai_hub_models.models.quicksrnetlarge_quantized.export --target-runtime qnn --chipset qualcomm-snapdragon-8gen3
  ```
* The compiled model library will be located at `$MY_WS/build/quicksrnetlarge_quantized/quicksrnetlarge_quantized.so`. This model library maps to the artifacts generated by the SDK tools mentioned in the `Integration workflow` section of the [Qualcomm AI Engine Direct document](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/overview.html).
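
The virtual-environment setup referenced above could look like the following sketch. The extras name and the `qai-hub configure` step follow the AI HUB / Hugging Face pages linked above; treat the exact package spec and token variable as assumptions and defer to those pages:

```bash
# Create and activate an isolated environment for AI HUB scripts
python -m venv aihub-env && source aihub-env/bin/activate
# Install the example model package (name per the Hugging Face page linked above)
pip install "qai_hub_models[quicksrnetlarge_quantized]"
# Authenticate against AI HUB with the API token from your account settings
qai-hub configure --api_token $MY_API_TOKEN
```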

### Compiling Program

* Compile the .pte program:
  ```bash
  # `pip install pydot` if the package is missing
  # Note that the device serial & hostname might not be required if the given artifact is in context binary format
  PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py compile -a $MY_WS/build/quicksrnetlarge_quantized/quicksrnetlarge_quantized.so -m SM8650 -s $DEVICE_SERIAL -b $EXECUTORCH_ROOT/build-android
  ```
* Artifacts for checking IO information:
  - `output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.json`
  - `output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.svg`
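
Per the note in the command above, compiling from a context binary is analogous; a sketch assuming a hypothetical `model.bin` produced by `qnn-context-binary-generator`, with the device serial omitted:

```bash
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py compile -a $MY_WS/model.bin -m SM8650 -b $EXECUTORCH_ROOT/build-android
```

If you prefer inspecting the IO information from a terminal rather than opening the .svg, pretty-printing the generated JSON works too (its schema comes from the tool, so this just dumps whatever is there):

```bash
python -m json.tool output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.json
```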

### Executing Program

* Prepare the test image:
  ```bash
  cd $MY_WS
  wget https://user-images.githubusercontent.com/12981474/40157448-eff91f06-5953-11e8-9a37-f6b5693fa03f.png -O baboon.png
  ```
  Execute the following Python script to generate the input data:
  ```python
  import torch
  import torchvision.transforms as transforms
  from PIL import Image

  # Load the test image and match the model's expected 128x128 input resolution
  img = Image.open('baboon.png').resize((128, 128))
  transform = transforms.Compose([transforms.PILToTensor()])
  # Convert (C, H, W) to (N, H, W, C)
  # IO tensor info can be checked with quicksrnetlarge_quantized.json | .svg
  img = transform(img).permute(1, 2, 0).unsqueeze(0)
  torch.save(img, 'baboon.pt')
  ```
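
  Optionally, a quick sanity check that the saved tensor has the NHWC layout described above (the expected shape here is inferred from the script, not from tool output):
  ```bash
  # Expect something like: torch.Size([1, 128, 128, 3]) torch.uint8
  python -c "import torch; t = torch.load('baboon.pt'); print(t.shape, t.dtype)"
  ```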
* Execute the .pte program:
  ```bash
  PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py execute -p output_pte/quicksrnetlarge_quantized -i baboon.pt -s $DEVICE_SERIAL -b $EXECUTORCH_ROOT/build-android
  ```
* Post-process the generated data:
  ```bash
  cd output_data
  ```
  Execute the following Python script to generate the output image:
  ```python
  import io
  import torch
  import torchvision.transforms as transforms

  # IO tensor info can be checked with quicksrnetlarge_quantized.json | .svg
  # Generally the input / output tensors share the same layout, e.g. either NHWC or NCHW,
  # but this might not hold under different converter configurations.
  # Learn more from the converter tool in the Qualcomm AI Engine Direct documentation:
  # https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/tools.html#model-conversion
  with open('output__142.pt', 'rb') as f:
      buffer = io.BytesIO(f.read())
  img = torch.load(buffer, weights_only=False)
  transform = transforms.Compose([transforms.ToPILImage()])
  img_pil = transform(img.squeeze(0))
  img_pil.save('baboon_upscaled.png')
  ```
  You can check the upscaled result now!
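
  As a final check, you can confirm the output resolution grew relative to the 128x128 input (the exact upscale factor depends on the model, so treat this as a sanity check only):
  ```bash
  python -c "from PIL import Image; print(Image.open('baboon_upscaled.png').size)"
  ```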

## Help

Please check the help messages for more information:
```bash
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py -h
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py compile -h
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py execute -h
```