# ExecuTorch Finetuning example

In this tutorial, we show how to fine-tune an LLM using ExecuTorch.

## Pre-requisites

You will need a model checkpoint in the Hugging Face format. For example:

```
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
```

You will need to install [torchtune](https://github.com/pytorch/torchtune) following [its installation instructions](https://github.com/pytorch/torchtune?tab=readme-ov-file#installation).

## Config Files

As mentioned in the previous section, we use `torchtune` APIs internally, so our config files follow `torchtune`'s structure. In the following sections we go through a working example, which can be found in the `phi3_config.yaml` config file.

### Tokenizer

We need to define the tokenizer. Let's suppose we would like to use the [Phi-3 Mini Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model from Microsoft. We need to define the tokenizer component:

```
tokenizer:
  _component_: torchtune.models.phi3.phi3_mini_tokenizer
  path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model
  max_seq_len: 1024
```

This will load the tokenizer and set the maximum sequence length to 1024. The class that will be instantiated is [`Phi3MiniTokenizer`](https://github.com/pytorch/torchtune/blob/ee343e61804f9942b2bd48243552bf17b5d0d553/torchtune/models/phi3/_tokenizer.py#L30).

### Dataset

In this example we use the [Alpaca-Cleaned dataset](https://huggingface.co/datasets/yahma/alpaca-cleaned). We need to define the following parameters:

```
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
seed: null
shuffle: True
batch_size: 1
```

torchtune supports datasets through Hugging Face dataloaders, so custom datasets can also be defined. For examples of defining your own datasets, see the [torchtune docs](https://pytorch.org/torchtune/stable/tutorials/datasets.html#hugging-face-datasets).

### Loss

For the loss function, we use PyTorch losses. In this example we use `CrossEntropyLoss`:

```
loss:
  _component_: torch.nn.CrossEntropyLoss
```

### Model

Model parameters can be set in this section. In this example we replicate the configuration used for the Phi-3 Mini Instruct benchmarks:

```
model:
  _component_: torchtune.models.phi3.lora_phi3_mini
  lora_attn_modules: ['q_proj', 'v_proj']
  apply_lora_to_mlp: False
  apply_lora_to_output: False
  lora_rank: 8
  lora_alpha: 16
```

### Checkpointer

Depending on how your model is defined, you will need to instantiate different components. In this example we use a checkpoint in the Hugging Face format, so we instantiate a `FullModelHFCheckpointer` object. We need to pass the checkpoint directory, the files containing the tensors, the output directory for training, and the model type:

```
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Phi-3-mini-4k-instruct
  checkpoint_files: [
    model-00001-of-00002.safetensors,
    model-00002-of-00002.safetensors
  ]
  recipe_checkpoint: null
  output_dir: /tmp/Phi-3-mini-4k-instruct/
  model_type: PHI3_MINI
```
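The sections above all follow torchtune's `_component_` convention: each `_component_` entry names a callable, and the sibling keys are passed to it as keyword arguments. As a rough sketch of how such a config is consumed (assuming `torchtune` and `omegaconf` are installed; the exact loading code in `model_exporter.py` and `runner.py` may differ):

```
# Sketch: resolving `_component_` entries the way torchtune recipes do.
from omegaconf import OmegaConf
from torchtune import config

cfg = OmegaConf.load("phi3_config.yaml")
tokenizer = config.instantiate(cfg.tokenizer)  # -> Phi3MiniTokenizer
model = config.instantiate(cfg.model)          # -> LoRA-adapted Phi-3 mini
loss_fn = config.instantiate(cfg.loss)         # -> torch.nn.CrossEntropyLoss
```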
### Device

torchtune supports `cuda` devices and `bf16` tensors. However, for ExecuTorch training we currently support only `cpu` and `fp32`:

```
device: cpu
dtype: fp32
```

## Running the example

### Step 1: Generate the ExecuTorch PTE (checkpoint)

The `model_exporter.py` script exports the LLM checkpoint into an ExecuTorch checkpoint (`.pte`). It takes two arguments:

* `cfg`: Configuration file
* `output_file`: The `.pte` output path

```
python model_exporter.py --cfg=phi3_config.yaml --output_file=phi3_mini_lora.pte
```

### Step 2: Run the fine-tuning job

To run the fine-tuning job:

```
python runner.py --cfg=phi3_config.yaml --model_file=phi3_mini_lora.pte
```

You need to use **the same** config file as in the previous step. The `model_file` argument is the `.pte` model generated in Step 1.

Example output:

```
Evaluating the model before training...
100%|██████████████████████████████████████████████████████████████████████████████████████| 3/3 [31:23<00:00, 627.98s/it]
Eval loss: tensor(2.3778)
100%|██████████████████████████████████████████████████████████████████████████████████████| 5/5 [52:29<00:00, 629.84s/it]
Losses: [2.7152762413024902, 0.7890686988830566, 2.249271869659424, 1.4777560234069824, 0.8378427624702454]
100%|██████████████████████████████████████████████████████████████████████████████████████| 3/3 [30:35<00:00, 611.90s/it]
Eval loss: tensor(0.8464)
```
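As an optional sanity check (not part of this example's scripts), you can verify that the `.pte` produced in Step 1 loads outside of `runner.py` using ExecuTorch's Python bindings, assuming the `executorch` pip package with pybindings is installed:

```
# Sketch: load the exported program with ExecuTorch's Python bindings.
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("phi3_mini_lora.pte")  # raises if the file is malformed
# module.forward(...) would execute the program, but the expected inputs
# depend on how model_exporter.py exported the methods.
```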