# ExecuTorch Llama Android Demo App

**[UPDATE - 10/24]** We have added support for running quantized Llama 3.2 1B/3B models in the demo app on the [XNNPACK backend](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md). We currently support inference with the SpinQuant and QAT+LoRA quantization methods.

We're excited to share that the newly revamped Android demo app is live and includes many new updates to provide a more intuitive and smoother user experience with a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an Android app and how to exercise the many features ExecuTorch and Llama models have to offer.

This app serves as a valuable resource to inspire your creativity and provides foundational code that you can customize and adapt for your particular use case. Please dive in and start exploring our demo app today! We look forward to any feedback and are excited to see your innovative ideas.

## Key Concepts

From this demo app, you will learn many key concepts, such as:
* How to prepare Llama models, build the ExecuTorch library, and run model inference across delegates
* How to expose the ExecuTorch library via a JNI layer
* The current ExecuTorch app-facing capabilities

The goal is for you to see the kind of support ExecuTorch provides and feel comfortable leveraging it for your own use cases.

## Supported Models

The models this app supports (availability varies by delegate):
* Llama 3.2 Quantized 1B/3B
* Llama 3.2 1B/3B in BF16
* Llama Guard 3 1B
* Llama 3.1 8B
* Llama 3 8B
* Llama 2 7B
* LLaVA-1.5 vision model (XNNPACK only)

## Building the APK

First, it's important to note that ExecuTorch currently provides support across three delegates.
Once you identify the delegate of your choice, select the corresponding README link for complete end-to-end instructions, from environment setup to exporting the models, building the ExecuTorch libraries, and running the app on device:

| Delegate | Resource |
| ------------- | ------------- |
| XNNPACK (CPU-based library) | [link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md) |
| QNN (Qualcomm AI Accelerators) | [link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md) |
| MediaTek (MediaTek AI Accelerators) | [link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/mediatek_README.md) |

## How to Use the App

This section covers the main steps to use the app, along with a code snippet of the ExecuTorch API. For loading the app, development, and running on device, we recommend Android Studio:

1. Open Android Studio and select "Open an existing Android Studio project" to open `examples/demo-apps/android/LlamaDemo`.
2. Run the app (^R). This builds and launches the app on the phone.

### Opening the App

Below are the UI features of the app. Select the settings widget to get started with picking a model, its parameters, and any prompts.
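As background for the chat use case, apps like this one typically wrap each user message in the Llama 3 instruct prompt template before handing it to the on-device runner. The sketch below is illustrative only, not the demo app's actual code: the class and method names are assumptions, while the special tokens follow Meta's published Llama 3 prompt format.

```java
// Hypothetical sketch: build a single-turn Llama 3 instruct prompt.
// Class and method names are illustrative; the special tokens below
// come from Meta's Llama 3 prompt format.
public class PromptFormatter {

    private static final String BEGIN_OF_TEXT = "<|begin_of_text|>";
    private static final String START_HEADER = "<|start_header_id|>";
    private static final String END_HEADER = "<|end_header_id|>";
    private static final String EOT = "<|eot_id|>";

    // Wraps the user message and ends with an open assistant header,
    // cueing the model to generate the reply.
    public static String formatUserPrompt(String userMessage) {
        return BEGIN_OF_TEXT
                + START_HEADER + "user" + END_HEADER + "\n\n"
                + userMessage + EOT
                + START_HEADER + "assistant" + END_HEADER + "\n\n";
    }

    public static void main(String[] args) {
        System.out.println(formatUserPrompt("What is ExecuTorch?"));
    }
}
```

The trailing assistant header leaves the turn open so the model completes it with its reply; the delegate-specific READMEs above show how the formatted prompt is passed to the runner.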