# Protected Virtual Machine Firmware

In the context of the [Android Virtualization Framework][AVF], a hypervisor
(_e.g._ [pKVM]) enforces full memory isolation between its virtual machines
(VMs) and the host. As a result, the host is only allowed to access memory that
has been explicitly shared back by a VM. Such _protected VMs_ (“pVMs”) are
therefore able to manipulate secrets without being at risk of an attacker
stealing them by compromising the Android host.

As pVMs are started dynamically by a _virtual machine manager_ (“VMM”) running
as a host process and as pVMs must not trust the host (see [_Why
AVF?_][why-avf]), the virtual machine that the VMM configures can't be trusted
either. Furthermore, even though the isolation mentioned above allows pVMs to
protect their secrets from the host, it does not help with provisioning them
during boot. In particular, the threat model would prohibit the host from ever
having access to those secrets, preventing the VMM from passing them to the pVM.

To address these concerns, the hypervisor securely loads the pVM firmware
(“pvmfw”) in the pVM from a protected memory region (this prevents the host or
any pVM from tampering with it), setting it as the entry point of the virtual
machine. As a result, pvmfw becomes the very first code that gets executed in
the pVM, allowing it to validate the environment and abort the boot sequence if
necessary. This process takes place whenever the VMM places a VM in protected
mode and can’t be prevented by the host.

Given the threat model, pvmfw is not allowed to trust the devices or device
layout provided by the virtual platform it is running on as those are configured
by the VMM. Instead, it performs all the necessary checks to ensure that the pVM
was set up as expected. For functional purposes, the interface with the
hypervisor, although trusted, is also validated.

Once it has been determined that the platform can be trusted, pvmfw derives
unique secrets for the guest through the [_DICE Chain_][android-dice] (see
[Open Profile for DICE][open-dice]) that can be used to prove the identity of
the pVM to local and remote actors. If any operation or check fails, or in case
of a missing prerequisite, pvmfw will abort the boot process of the pVM,
effectively preventing non-compliant pVMs and/or guests from running.
Otherwise, it hands over the pVM to the guest kernel by jumping to its first
instruction, similarly to a bootloader.

pvmfw currently only supports AArch64.

[AVF]: https://source.android.com/docs/core/virtualization
[why-avf]: https://source.android.com/docs/core/virtualization/whyavf
[android-dice]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/android.md
[pKVM]: https://source.android.com/docs/core/virtualization/architecture#hypervisor
[open-dice]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md

## Integration

### pvmfw Loading

When running pKVM, the physical memory from which the hypervisor loads pvmfw
into guest address space is not initially populated by the hypervisor itself.
Instead, it receives a pre-loaded memory region from a trusted pvmfw loader and
only then becomes responsible for protecting it. As a result, the hypervisor is
kept generic (beyond AVF) and small as it is not expected (nor necessary) for it
to know how to interpret or obtain the content of that region.

#### Android Bootloader (ABL) Support

Starting in Android T, the `PRODUCT_BUILD_PVMFW_IMAGE` build variable controls
the generation of `pvmfw.img`, a new [ABL partition][ABL-part] containing the
pvmfw binary (sometimes called "`pvmfw.bin`") and following the internal format
of the [`boot`][boot-img] partition, intended to be verified and loaded by ABL
on AVF-compatible devices.

Once ABL has verified the `pvmfw.img` chained static partition, the contained
[`boot.img` header][boot-img] may be used to obtain the size of the `pvmfw.bin`
image (recorded in the `kernel_size` field), as it already does for the kernel
itself. In accordance with the header format, the `kernel_size` bytes of the
partition following the header will be the `pvmfw.bin` image.

Note that when it gets executed in the context of a pVM, `pvmfw` expects to have
been loaded at a 4KiB-aligned intermediate physical address (IPA), so if ABL
loads the `pvmfw.bin` image without respecting this alignment, it is the
responsibility of the hypervisor to either reject the image or copy it into
guest address space with the right alignment.

To support pKVM, ABL is expected to describe the region using a reserved memory
device tree node where both address and size have been properly aligned to the
page size used by the hypervisor. This single region must include both the pvmfw
binary image and its configuration data (see below). For example, the following
node describes a region of size `0x40000` at address `0x80000000`:
```
reserved-memory {
    ...
    pkvm_guest_firmware {
        compatible = "linux,pkvm-guest-firmware-memory";
        reg = <0x0 0x80000000 0x40000>;
        no-map;
    };
};
```

[ABL-part]: https://source.android.com/docs/core/architecture/bootloader/partitions
[boot-img]: https://source.android.com/docs/core/architecture/bootloader/boot-image-header

### Configuration Data

As part of the process of loading pvmfw, the loader (typically the Android
Bootloader, "ABL") is expected to pass device-specific pvmfw configuration data
by appending it to the pvmfw binary and including it in the region passed to the
hypervisor. As a result, the hypervisor will give the same protection to this
data as it does to pvmfw and will transparently load it in guest memory, making
it available to pvmfw at runtime.
This enables pvmfw to be kept device-agnostic,
simplifying its adoption and distribution as a centralized signed binary, while
also being able to support device-specific details.

The configuration data will be read by pvmfw at the next 4KiB boundary from the
end of its loaded binary. Even if pvmfw is position-independent, it is also
expected to have been loaded at a 4KiB boundary. As a result, the location of
the configuration data is implicitly passed to pvmfw and known to it at build
time.

#### Configuration Data Format

The configuration data is described using the following [header]:

```
+===============================+
|           pvmfw.bin           |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|  (Padding to 4KiB alignment)  |
+===============================+ <-- HEAD
|     Magic (= 0x666d7670)      |
+-------------------------------+
|            Version            |
+-------------------------------+
|  Total Size = (TAIL - HEAD)   |
+-------------------------------+
|             Flags             |
+-------------------------------+
|           [Entry 0]           |
|  offset = (FIRST - HEAD)      |
|  size = (FIRST_END - FIRST)   |
+-------------------------------+
|           [Entry 1]           |
|  offset = (SECOND - HEAD)     |
|  size = (SECOND_END - SECOND) |
+-------------------------------+
|           [Entry 2]           | <-- Entry 2 is present since version 1.1
|  offset = (THIRD - HEAD)      |
|  size = (THIRD_END - THIRD)   |
+-------------------------------+
|           [Entry 3]           | <-- Entry 3 is present since version 1.2
|  offset = (FOURTH - HEAD)     |
|  size = (FOURTH_END - FOURTH) |
+-------------------------------+
|              ...              |
+-------------------------------+
|           [Entry n]           |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
| (Padding to 8-byte alignment) |
+===============================+ <-- FIRST
|   {First blob: DICE chain}    |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- FIRST_END
| (Padding to 8-byte alignment) |
+===============================+ <-- SECOND
|       {Second blob: DP}       |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- SECOND_END
| (Padding to 8-byte alignment) |
+===============================+ <-- THIRD
|     {Third blob: VM DTBO}     |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- THIRD_END
| (Padding to 8-byte alignment) |
+===============================+ <-- FOURTH
| {Fourth blob: VM reference DT}|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- FOURTH_END
| (Padding to 8-byte alignment) |
+===============================+
|              ...              |
+===============================+ <-- TAIL
```

Where the version number is encoded using a "`major.minor`" scheme as follows:

```
((major << 16) | (minor & 0xffff))
```

and defines the format of the header (which may change between major versions),
its size and, in particular, the expected number of appended blobs. Each blob is
referred to by its offset in the entry array and may be mandatory or optional
(as defined by this specification), where missing entries are denoted by a zero
size. It is therefore not allowed to trim missing optional entries from the end
of the array. The header uses the endianness of the virtual machine.

The header format itself is agnostic of the internal format of the individual
blobs it refers to.

##### Version 1.0 {#pvmfw-data-v1-0}

In version 1.0, it describes two blobs:

- entry 0 must point to a valid DICE chain handover (see below)
- entry 1 may point to a [DTBO] to be applied to the pVM device tree. See
  [debug policy][debug_policy] for an example.
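
The version encoding, the 4KiB placement rule, and the header layout described above can be sketched in Python. This is an illustration only, not the pvmfw implementation: it assumes every header field and every entry member is a little-endian 32-bit integer (the text only says the header uses the VM's endianness), and the helper names `align_up`, `build_config`, and `parse_config` are hypothetical.

```python
import struct

MAGIC = 0x666D7670   # magic value taken from the header diagram
HEADER_FMT = "<4I"   # ASSUMPTION: magic, version, total size, flags as LE u32
ENTRY_FMT = "<2I"    # ASSUMPTION: (offset, size) pairs as LE u32
# Number of entries per version, as listed in the version sections.
ENTRY_COUNT = {(1, 0): 2, (1, 1): 3, (1, 2): 4}


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def encode_version(major, minor):
    return (major << 16) | (minor & 0xFFFF)


def decode_version(version):
    return version >> 16, version & 0xFFFF


def build_config(version, blobs):
    """Serialize a header followed by 8-byte-aligned blobs.

    `blobs` holds bytes objects, or None for a missing optional entry.
    """
    header_size = struct.calcsize(HEADER_FMT) + len(blobs) * struct.calcsize(ENTRY_FMT)
    cursor = align_up(header_size, 8)
    entries, body = [], b""
    for blob in blobs:
        if not blob:
            entries.append((0, 0))  # missing entries are denoted by a zero size
            continue
        entries.append((cursor, len(blob)))
        padded = align_up(len(blob), 8)
        body += blob.ljust(padded, b"\0")
        cursor += padded
    out = struct.pack(HEADER_FMT, MAGIC, encode_version(*version), cursor, 0)
    for entry in entries:
        out += struct.pack(ENTRY_FMT, *entry)
    return out.ljust(align_up(header_size, 8), b"\0") + body


def parse_config(data):
    """Return ((major, minor), [blob or None, ...]) or raise ValueError."""
    magic, version, total_size, _flags = struct.unpack_from(HEADER_FMT, data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if total_size > len(data):
        raise ValueError("total size exceeds available data")
    major_minor = decode_version(version)
    if major_minor not in ENTRY_COUNT:
        raise ValueError("unsupported version")
    blobs = []
    for i in range(ENTRY_COUNT[major_minor]):
        position = struct.calcsize(HEADER_FMT) + i * struct.calcsize(ENTRY_FMT)
        offset, size = struct.unpack_from(ENTRY_FMT, data, position)
        if size == 0:
            blobs.append(None)
        elif offset + size > total_size:
            raise ValueError("entry out of bounds")
        else:
            blobs.append(data[offset:offset + size])
    return major_minor, blobs
```

With these helpers, a loader-side tool would place the output of `build_config` at `align_up(load_address + len(pvmfw_bin), 4096)`, which is where pvmfw expects to find the header.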

##### Version 1.1 {#pvmfw-data-v1-1}

In version 1.1, a third blob is added.

- entry 2 may point to a [DTBO] describing the VM DTBO used for
  [device assignment][device_assignment] (DA).
  pvmfw will provision assigned devices with the VM DTBO.

##### Version 1.2 {#pvmfw-data-v1-2}

In version 1.2, a fourth blob is added.

- entry 3, if present, contains the VM reference DT. This defines properties
  that may be included in the device tree passed to a protected VM. pvmfw
  validates that if any of these properties is included in the VM's device
  tree, the property value exactly matches what is in the VM reference DT.

  The bootloader should ensure that the same properties, with the same values,
  are added under the "/avf/reference" node in the host Android device tree.

  This provides a mechanism to allow configuration information to be securely
  passed to the VM via the host. pvmfw does not interpret the content of the VM
  reference DT, nor does it apply it to the VM's device tree; it just ensures
  that if matching properties are present in the VM device tree they contain the
  correct values.

  Use-cases of VM reference DT include:

  - Passing the [public key of the Secretkeeper][secretkeeper_key] HAL
    implementation to each VM.

  - Passing the [vendor hashtree digest][vendor_hashtree_digest] to run
    Microdroid with verified vendor image.
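
The matching rule applied to the VM reference DT can be sketched as follows. This is a simplified model, not the pvmfw code: it assumes each device tree has already been flattened into a Python dict mapping property names to raw byte values, and the function and property names are hypothetical.

```python
def mismatched_reference_properties(vm_props, reference_props):
    """Return the sorted names of VM properties that contradict the reference DT.

    Properties absent from the VM device tree are accepted, and properties
    unknown to the reference DT are ignored; only a property present in both
    with different values is a violation.
    """
    return sorted(
        name
        for name, expected in reference_props.items()
        if name in vm_props and vm_props[name] != expected
    )


# Hypothetical property names, for illustration only.
reference = {"example-key": b"\x01\x02", "example-digest": b"\xaa"}
print(mismatched_reference_properties({"example-key": b"\x01\x02"}, reference))
print(mismatched_reference_properties({"example-key": b"\xff"}, reference))
```

In this model, a non-empty result corresponds to the case where pvmfw would refuse to boot the pVM.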

[header]: src/config.rs
[DTBO]: https://android.googlesource.com/platform/external/dtc/+/refs/heads/main/Documentation/dt-object-internal.txt
[debug_policy]: ../../docs/debug/README.md#debug-policy
[device_assignment]: ../../docs/device_assignment.md
[secretkeeper_key]: https://android.googlesource.com/platform/system/secretkeeper/+/refs/heads/main/README.md#secretkeeper-public-key
[vendor_hashtree_digest]: ../../build/microdroid/README.md#verification-of-vendor-image

#### Virtual Platform DICE Chain Handover

The format of the DICE chain entry mentioned above, compatible with the
[`AndroidDiceHandover`][AndroidDiceHandover] defined by the Open Profile for
DICE reference implementation, is described by the following [CDDL][CDDL]:
```
PvmfwDiceHandover = {
  1 : bstr .size 32,     ; CDI_Attest
  2 : bstr .size 32,     ; CDI_Seal
  3 : DiceCertChain,     ; Android DICE chain
}
```

It contains the _Compound Device Identifiers_ (CDIs), used for deriving the
next-stage secret, and a certificate chain, necessary for building the full
[pVM DICE chain][pvm-dice-chain] required by features like
[pVM remote attestation][vm-attestation].

Note that it differs from the `AndroidDiceHandover` defined by the specification
in that its `DiceCertChain` field is mandatory (while optional in the original).

Devices that fully implement DICE should provide a certificate rooted at the
Unique Device Secret (UDS) in a boot stage preceding the pvmfw loader (typically
ABL), in such a way that it would receive a valid `AndroidDiceHandover`, that
can be passed to [`DiceAndroidHandoverMainFlow`][DiceAndroidHandoverMainFlow] along with
the inputs described below.

The recommended DICE inputs at this stage are:

- **Code**: hash of the pvmfw image, hypervisor (`boot.img`), and other target
  code relevant to the secure execution of pvmfw (_e.g._ `vendor_boot.img`)
- **Configuration Data**: any extra input relevant to pvmfw security
- **Authority Data**: must cover all the public keys used to sign and verify the
  code contributing to the **Code** input
- **Mode Decision**: Set according to the [specification][dice-mode]. In
  particular, should only be `Normal` if secure boot is being properly enforced
  (_e.g._ locked device in [Android Verified Boot][AVB])
- **Hidden Inputs**: Factory Reset Secret (FRS, stored in tamper-evident
  storage and changed during every factory reset) or a similar value that
  changes as part of the device lifecycle (_e.g._ reset)

The resulting `AndroidDiceHandover` is then used by pvmfw in a similar way to
derive another [DICE layer][Layering], passed to the guest through a
`/reserved-memory` device tree node marked as
[`compatible="google,open-dice"`][dice-dt].
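
The constraints that the `PvmfwDiceHandover` CDDL in this section places on the handover can be sketched in Python. This is only a structural check under stated assumptions: the CBOR map is assumed to have already been decoded into a dict with integer keys, the name `validate_handover` is hypothetical, and neither CBOR decoding nor validation of the certificate chain itself is shown.

```python
CDI_SIZE = 32  # both CDIs are `bstr .size 32` in the CDDL


def validate_handover(handover):
    """Return (cdi_attest, cdi_seal, cert_chain) or raise ValueError.

    `handover` is a dict standing in for the decoded CBOR map.
    """
    for key, label in ((1, "CDI_Attest"), (2, "CDI_Seal")):
        value = handover.get(key)
        if not isinstance(value, bytes) or len(value) != CDI_SIZE:
            raise ValueError(f"{label} must be a {CDI_SIZE}-byte bstr")
    # Unlike AndroidDiceHandover, the certificate chain is mandatory here.
    if 3 not in handover:
        raise ValueError("DiceCertChain is mandatory in PvmfwDiceHandover")
    return handover[1], handover[2], handover[3]
```

A handover carrying only the two CDIs, which the original `AndroidDiceHandover` would accept, is rejected by this check.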

[AVB]: https://source.android.com/docs/security/features/verifiedboot/boot-flow
[AndroidDiceHandover]: https://pigweed.googlesource.com/open-dice/+/42ae7760023/src/android.c#212
[DiceAndroidHandoverMainFlow]: https://pigweed.googlesource.com/open-dice/+/42ae7760023/src/android.c#221
[CDDL]: https://datatracker.ietf.org/doc/rfc8610
[dice-mode]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#Mode-Value-Details
[dice-dt]: https://www.kernel.org/doc/Documentation/devicetree/bindings/reserved-memory/google%2Copen-dice.yaml
[Layering]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#layering-details
[pvm-dice-chain]: ../../docs/pvm_dice_chain.md
[vm-attestation]: ../../docs/vm_remote_attestation.md

### Platform Requirements

pvmfw is intended to run in a virtualized environment according to the `crosvm`
[memory layout][crosvm-mem] for protected VMs and so it expects to have been
loaded at address `0x7fc0_0000` and uses the 2MiB region at address
`0x7fe0_0000` as scratch memory. It makes use of the virtual PCI bus to obtain a
virtio interface to the host and prints its logs through the 16550 UART (address
`0x3f8`).

At boot, pvmfw discovers the running hypervisor in order to select the
appropriate hypervisor calls to share/unshare memory, mark IPA regions as MMIO,
obtain trusted true entropy, and reboot the virtual machine.
In particular, it makes use of the following hypervisor calls:

- Arm [SMC Calling Convention][smccc] v1.1 or above:

    - `SMCCC_VERSION`
    - Vendor Specific Hypervisor Service Call UID Query

- Arm [Power State Coordination Interface][psci] v1.0 or above:

    - `PSCI_VERSION`
    - `PSCI_FEATURES`
    - `PSCI_SYSTEM_RESET`
    - `PSCI_SYSTEM_SHUTDOWN`

- Arm [True Random Number Generator Firmware Interface][smccc-trng] v1.0:

    - `TRNG_VERSION`
    - `TRNG_FEATURES`
    - `TRNG_RND`

- When running under KVM, the pKVM-specific hypervisor interface must provide:

    - `MEMINFO` (function ID `0xc6000002`)
    - `MEM_SHARE` (function ID `0xc6000003`)
    - `MEM_UNSHARE` (function ID `0xc6000004`)
    - `MMIO_GUARD_INFO` (function ID `0xc6000005`)
    - `MMIO_GUARD_ENROLL` (function ID `0xc6000006`)
    - `MMIO_GUARD_MAP` (function ID `0xc6000007`)
    - `MMIO_GUARD_UNMAP` (function ID `0xc6000008`)

[crosvm-mem]: https://crosvm.dev/book/appendix/memory_layout.html
[psci]: https://developer.arm.com/documentation/den0022
[smccc]: https://developer.arm.com/documentation/den0028
[smccc-trng]: https://developer.arm.com/documentation/den0098

## Booting Protected Virtual Machines

### Boot Protocol

As the hypervisor makes pvmfw the entry point of the VM, the initial value of
the registers it receives is configured by the VMM and is expected to follow the
[Linux ABI] _i.e._

- x0 = physical address of device tree blob (dtb) in system RAM.
- x1 = 0 (reserved for future use)
- x2 = 0 (reserved for future use)
- x3 = 0 (reserved for future use)

Images to be verified, which have been loaded to guest memory by the VMM prior
to booting the VM, are described to pvmfw using the device tree (x0):

- the kernel in the `/config` DT node _e.g._

    ```
    / {
        config {
            kernel-address = <0x80200000>;
            kernel-size = <0x1000000>;
        };
    };
    ```

- the (optional) ramdisk in the standard `/chosen` node _e.g._

    ```
    / {
        chosen {
            linux,initrd-start = <0x82000000>;
            linux,initrd-end = <0x82800000>;
        };
    };
    ```

[Linux ABI]: https://www.kernel.org/doc/Documentation/arm64/booting.txt

### Handover ABI

After verifying the guest kernel, pvmfw boots it using the Linux ABI described
above. It uses the device tree to pass [AVF-specific properties][dt.md] and the
DICE chain:

```
/ {
    reserved-memory {
        #address-cells = <0x02>;
        #size-cells = <0x02>;
        ranges;
        dice {
            compatible = "google,open-dice";
            no-map;
            reg = <0x0 0x7fe0000>, <0x0 0x1000>;
        };
    };
};
```

[dt.md]: ../../docs/device_trees.md#avf_specific-properties-and-nodes

### Guest Image Signing

pvmfw verifies the guest kernel image (loaded by the VMM) by re-using tools and
formats introduced by the Android Verified Boot. In particular, it expects the
kernel region (see `/config/kernel-{address,size}` described above) to contain
an appended VBMeta structure, which can be generated as follows:

```
avbtool add_hash_footer --image <kernel.bin> \
    --partition_name boot \
    --dynamic_partition_size \
    --key $KEY
```

In cases where a ramdisk is required by the guest, pvmfw must also verify it.
To do so, it must be covered by a hash descriptor in the VBMeta of the kernel:

```
cp <initrd.bin> /tmp/
avbtool add_hash_footer --image /tmp/<initrd.bin> \
    --partition_name $INITRD_NAME \
    --dynamic_partition_size \
    --key $KEY
avbtool add_hash_footer --image <kernel.bin> \
    --partition_name boot \
    --dynamic_partition_size \
    --include_descriptor_from_image /tmp/<initrd.bin> \
    --key $KEY
```

Note that the `/tmp/<initrd.bin>` file is only created to temporarily hold the
hash descriptor to be added to the kernel footer and that the unsigned
`<initrd.bin>` should be passed to the VMM when booting a pVM.

The name of the AVB "partition" for the ramdisk (`$INITRD_NAME`) can be used by
the signer to specify if pvmfw must consider the guest to be debuggable
(`initrd_debug`) or not (`initrd_normal`), which will be reflected in the
certificate of the guest and will affect the secrets being provisioned.

If pVM guest kernels are built and/or packaged using the Android Build system,
the signing described above is recommended to be done through an
`avb_add_hash_footer` Soong module (see [how we sign the Microdroid
kernel][soong-udroid]).

[soong-udroid]: https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/Virtualization/microdroid/Android.bp;l=425;drc=b94a5cf516307c4279f6c16a63803527a8affc6d

#### VBMeta Properties

AVF defines special keys for AVB VBMeta descriptor properties that pvmfw
recognizes, allowing VM owners to ensure that pvmfw performs its role in a way
that is compatible with their guest kernel.
These are:

- `"com.android.virt.cap"`: a `|`-separated list of "capabilities" from:
    - `remote_attest`: pvmfw uses a hard-coded index for rollback protection
    - `secretkeeper_protection`: pvmfw defers rollback protection to the guest
    - `supports_uefi_boot`: pvmfw boots the VM as an EFI payload (experimental)
    - `trusty_security_vm`: pvmfw skips rollback protection

## Development

For faster iteration, you can build pvmfw, adb-push it to the device, and use
it directly for a new pVM, without having to flash it to the physical
partition. To do that, the binary image composition performed by ABL described
above must be replicated to produce a single file containing the pvmfw binary
and its configuration data.

As a quick prototyping solution, a valid DICE chain (such as this [test
file][bcc.dat]) can be appended to the `pvmfw.bin` image with `pvmfw-tool`.

```shell
m pvmfw-tool pvmfw_bin
PVMFW_BIN=${ANDROID_PRODUCT_OUT}/system/etc/pvmfw.bin
DICE=${ANDROID_BUILD_TOP}/packages/modules/Virtualization/tests/pvmfw/assets/bcc.dat

pvmfw-tool custom_pvmfw ${PVMFW_BIN} ${DICE}
```

The result can then be pushed to the device. Pointing the system property
`hypervisor.pvmfw.path` to it will cause AVF to use that image as pvmfw:

```shell
adb push custom_pvmfw /data/local/tmp/pvmfw
adb root
adb shell setprop hypervisor.pvmfw.path /data/local/tmp/pvmfw
```

Then run a protected VM, for example:

```shell
adb shell /apex/com.android.virt/bin/vm run-microdroid --protected
```

Note: `adb root` is required to set the system property.

[bcc.dat]: https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/Virtualization/tests/pvmfw/assets/bcc.dat

### Running pVM without pvmfw

Sometimes, it might be useful to start a pVM without pvmfw, e.g. when debugging
early pVM boot issues.
You can achieve that by setting the `hypervisor.pvmfw.path`
property to the value `none`:

```shell
adb shell 'setprop hypervisor.pvmfw.path "none"'
```

Then run a protected VM:

```shell
adb shell /apex/com.android.virt/bin/vm run-microdroid --protected
```