# CLI Tool for Compile / Deploy Pre-Built QNN Artifacts

An easy-to-use tool for generating / executing .pte programs from pre-built model libraries / context binaries produced by Qualcomm AI Engine Direct. The tool is verified with this [host environment](../../../../docs/source/build-run-qualcomm-ai-engine-direct-backend.md#host-os).

## Description

This tool is aimed at users who want to leverage the ExecuTorch runtime framework with their existing artifacts generated by QNN. It makes it possible to produce a .pte program in a few steps.<br/>
If users are interested in well-known applications, [Qualcomm AI HUB](https://aihub.qualcomm.com/) is a great resource which provides plenty of optimized, state-of-the-art models ready for deployment. All of them can be downloaded in model library or context binary format.

* Model libraries (.so) generated by `qnn-model-lib-generator` | AI HUB, or context binaries (.bin) generated by `qnn-context-binary-generator` | AI HUB, can be fed to the tool directly:
  - To produce a .pte program:
    ```bash
    $ python export.py compile
    ```
  - To perform inference with the generated .pte program:
    ```bash
    $ python export.py execute
    ```

### Dependencies

* Register for Qualcomm AI HUB.
* Download the QNN SDK that your favorite model is compiled with via this [link](https://www.qualcomm.com/developer/software/qualcomm-ai-engine-direct-sdk). The link currently downloads the latest version (users should be able to specify a version soon; please refer to [this](../../../../docs/source/build-run-qualcomm-ai-engine-direct-backend.md#software) for earlier releases).

### Target Model

* Consider using a [virtual environment](https://app.aihub.qualcomm.com/docs/hub/getting_started.html) for AI HUB scripts to prevent package conflicts with ExecuTorch. Please finish the [installation section](https://app.aihub.qualcomm.com/docs/hub/getting_started.html#installation) before proceeding with the following steps.
* Take [QuickSRNetLarge-Quantized](https://aihub.qualcomm.com/models/quicksrnetlarge_quantized?searchTerm=quantized) as an example: please [install](https://huggingface.co/qualcomm/QuickSRNetLarge-Quantized#installation) the package as instructed.
* Create a workspace and export the pre-built model library:
  ```bash
  mkdir $MY_WS && cd $MY_WS
  # target chipset is `SM8650`
  python -m qai_hub_models.models.quicksrnetlarge_quantized.export --target-runtime qnn --chipset qualcomm-snapdragon-8gen3
  ```
* The compiled model library will be located at `$MY_WS/build/quicksrnetlarge_quantized/quicksrnetlarge_quantized.so`. This model library maps to the artifacts generated by the SDK tools mentioned in the `Integration workflow` section of the [Qualcomm AI Engine Direct document](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/overview.html).
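Before moving on to compilation, you can optionally verify that the exported artifact and the variables used throughout this tutorial are in place. The sketch below is a minimal sanity check, not part of the tool itself; it assumes the `MY_WS`, `EXECUTORCH_ROOT`, and `DEVICE_SERIAL` environment variables introduced in this tutorial are exported in your shell.

```python
# Optional sanity check before the `Compiling Program` step.
# Assumes $MY_WS, $EXECUTORCH_ROOT and $DEVICE_SERIAL are exported, as in this tutorial.
import os
import sys

model_lib = os.path.expandvars(
    '$MY_WS/build/quicksrnetlarge_quantized/quicksrnetlarge_quantized.so'
)
checks = {
    'model library exists': os.path.isfile(model_lib),
    'EXECUTORCH_ROOT is set': bool(os.environ.get('EXECUTORCH_ROOT')),
    # the device serial is needed for model libraries (see note in the next section)
    'DEVICE_SERIAL is set': bool(os.environ.get('DEVICE_SERIAL')),
}
for name, ok in checks.items():
    print(f"[{'PASS' if ok else 'FAIL'}] {name}")
sys.exit(0 if all(checks.values()) else 1)
```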
### Compiling Program

* Compile the .pte program:
  ```bash
  # `pip install pydot` if the package is missing
  # Note that the device serial & hostname might not be required if the given artifact is in context binary format
  PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py compile -a $MY_WS/build/quicksrnetlarge_quantized/quicksrnetlarge_quantized.so -m SM8650 -s $DEVICE_SERIAL -b $EXECUTORCH_ROOT/build-android
  ```
* Artifacts for checking IO information:
  - `output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.json`
  - `output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.svg`

### Executing Program

* Prepare the test image:
  ```bash
  cd $MY_WS
  wget https://user-images.githubusercontent.com/12981474/40157448-eff91f06-5953-11e8-9a37-f6b5693fa03f.png -O baboon.png
  ```
  Execute the following Python script to generate input data:
  ```python
  import torch
  import torchvision.transforms as transforms
  from PIL import Image
  img = Image.open('baboon.png').resize((128, 128))
  transform = transforms.Compose([transforms.PILToTensor()])
  # convert (C, H, W) to (N, H, W, C)
  # IO tensor info can be checked with quicksrnetlarge_quantized.json | .svg
  img = transform(img).permute(1, 2, 0).unsqueeze(0)
  torch.save(img, 'baboon.pt')
  ```
* Execute the .pte program:
  ```bash
  PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py execute -p output_pte/quicksrnetlarge_quantized -i baboon.pt -s $DEVICE_SERIAL -b $EXECUTORCH_ROOT/build-android
  ```
* Post-process the generated data:
  ```bash
  cd output_data
  ```
  Execute the following Python script to generate the output image:
  ```python
  import io
  import torch
  import torchvision.transforms as transforms
  # IO tensor info can be checked with quicksrnetlarge_quantized.json | .svg
  # generally the input / output tensors share the same layout: e.g. either NHWC or NCHW
  # this might not hold under different converter configurations
  # learn more with the converter tool from the Qualcomm AI Engine Direct documentation
  # https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/tools.html#model-conversion
  with open('output__142.pt', 'rb') as f:
      buffer = io.BytesIO(f.read())
  img = torch.load(buffer, weights_only=False)
  transform = transforms.Compose([transforms.ToPILImage()])
  img_pil = transform(img.squeeze(0))
  img_pil.save('baboon_upscaled.png')
  ```
  You can check the upscaled result now!

## Help

Please check the help messages for more information:
```bash
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py -h
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py compile -h
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py execute -h
```
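If you prefer checking the IO information programmatically rather than opening the generated .json / .svg by hand, the sketch below pretty-prints the parsed JSON. It is a minimal sketch assuming the compile step above has produced `output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.json`; the file's exact schema is tool-internal, so the script deliberately makes no assumptions about key names.

```python
# Pretty-print the IO information emitted alongside the compiled .pte program.
# The JSON schema is tool-internal and may change, so this sketch simply dumps
# the parsed structure instead of assuming specific key names.
import json
import pprint

with open('output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.json') as f:
    io_info = json.load(f)

pprint.pprint(io_info)
```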