# Architecture: Snapshotting

Snapshotting is a **highly experimental**, `x86_64`-only feature currently under development. It is
100% **not supported** and works with only a very limited set of devices. This page roughly
summarizes how the system works, and how device authors should think about it when writing new
devices.

## The snapshot & restore sequence

The data required for a snapshot is stored in several places, including guest memory and the
devices running on the host. To take an accurate snapshot, we need to capture all of this state at
a single point in time. Since there is no way to fetch this state atomically, we have to freeze the
guest (VCPUs) and the device backends. Similarly, on restore we must freeze in the same way to
prevent partially restored state from being modified.

## Snapshotting a running VM

In code, this is implemented by
[vm_control::do_snapshot](https://crosvm.dev/doc/vm_control/fn.do_snapshot.html). We always freeze
the VCPUs first
([vm_control::VcpuSuspendGuard](https://crosvm.dev/doc/vm_control/struct.VcpuSuspendGuard.html)).
This is done so that we can flush all pending interrupts to the irqchip (LAPIC) without triggering
further activity from the driver (which could in turn trigger more device activity). With the VCPUs
frozen, we freeze devices
([vm_control::DeviceSleepGuard](https://crosvm.dev/doc/vm_control/struct.DeviceSleepGuard.html)).
From here, it's just a matter of serializing VCPU state, guest memory, and device state.

### A word about interrupts

Interrupts come in two primary flavors from the snapshotting perspective: legacy interrupts (e.g.
IOAPIC interrupt lines), and MSIs.

#### Legacy interrupts

These are a little tricky because they are allocated as part of device creation, and device creation
happens **before** we snapshot or restore.
To avoid actually having to snapshot or restore the
`Event` object wiring for these interrupts, we rely on the fact that as long as the VM is created
with the right shape (e.g. devices), the interrupt `Event`s will be wired between the device & the
irqchip correctly. As part of restoring, we set the routing table, which ensures that those
events map to the right GSIs in the hypervisor.

#### MSIs

These are much simpler, because of how MSIs are implemented in crosvm. In `MsixConfig`, we save the
MSI routing information for every IRQ. At restore time, we just register these MSIs with the
hypervisor using the exact same mechanism that would be invoked on device activation (albeit
bypassing GSI allocation since we know from the saved state exactly which GSI must be used).

#### Flushing IRQs to the irqchip

IRQs sometimes pass through multiple host `Event`s before reaching the hypervisor (or VCPU loop) for
injection. Rather than trying to snapshot the `Event` state, we freeze all interrupt sources
(devices) and flush all pending interrupts into the irqchip. This way, snapshotting the irqchip
state is sufficient to capture all pending interrupts.

### Two-step snapshotting

Two-step snapshotting is performed in crosvm to ensure data retention.

Problem definition:

1. VMM Manager requests crosvm to suspend.
1. crosvm suspends, but host-side processes are still running.
1. VMM Manager requests that host-side processes suspend.
1. VMM Manager requests a snapshot from crosvm.
1. VMM Manager snapshots host-side processes.
1. VMM Manager requests host-side processes and crosvm to resume (or stop).

The problem is that data may be lost in steps 4 & 5, because of the time between steps 2 & 3. After
step 2, crosvm is suspended and host-side processes are still running, which means host-side
processes may send data to crosvm that the device in crosvm has not yet read.
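
The gap between steps 2 & 3 can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDevice`, `snapshot_naive`, `snapshot_after_drain`, and `demo_lost_bytes` are hypothetical names invented for illustration, and a standard-library channel stands in for whatever pipe or socket a real host-side process would use. It is not crosvm code. The naive snapshot sees only what the device has already read, while draining the channel first captures the in-flight bytes.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical device backend. `rx` stands in for a host-side pipe or socket.
struct ToyDevice {
    rx: Receiver<u8>,
    /// Bytes the device has already read. A snapshot can only serialize this.
    buffered: Vec<u8>,
}

impl ToyDevice {
    /// Naive snapshot: serializes only what the device has already read.
    fn snapshot_naive(&self) -> Vec<u8> {
        self.buffered.clone()
    }

    /// Drain-first snapshot: pull in everything queued so far, then serialize.
    fn snapshot_after_drain(&mut self) -> Vec<u8> {
        // try_iter() never blocks; it yields only what is already queued.
        self.buffered.extend(self.rx.try_iter());
        self.buffered.clone()
    }
}

/// Returns (bytes captured by the naive snapshot, bytes captured after draining).
fn demo_lost_bytes() -> (usize, usize) {
    let (tx, rx) = channel();
    let mut dev = ToyDevice { rx, buffered: Vec::new() };

    // Between steps 2 and 3: the device is asleep, but the host keeps writing.
    for byte in 1u8..=3 {
        tx.send(byte).unwrap();
    }

    let naive = dev.snapshot_naive().len();
    let drained = dev.snapshot_after_drain().len();
    (naive, drained)
}
```

In this model, restoring from the naive snapshot would lose the three queued bytes, because they live only in the channel and not in the serialized state.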

When the VM resumes, there are no issues, as the data gets read and processing continues normally.
However, when the VM restores, that data is lost as it was not saved.

The solution is two-step snapshotting. We modify step 4 to read any data coming from the host just
before snapshotting, to save that data in crosvm, and then process that data when the VM resumes.

## Restoring a VM in lieu of booting

Restoring onto a running VM is not supported, and may never be. Our preferred approach is to
instead create a new VM from a snapshot. This is why `vm_control::do_restore` can be invoked as part
of the VM creation process.

## Implications for device authors

New devices SHOULD be compatible with the `devices::Suspendable` trait, but MAY defer actual
implementation to the future. This trait's implementation defines how the device will sleep/wake,
and how its state will be saved & restored as part of snapshotting.

New virtio devices SHOULD implement the virtio device snapshot methods on
[VirtioDevice](https://crosvm.dev/doc/devices/virtio/virtio_device/trait.VirtioDevice.html):
`virtio_sleep`, `virtio_wake`, `virtio_snapshot`, and `virtio_restore`.
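
To make the sleep/wake and save/restore responsibilities concrete, here is a minimal sketch of the lifecycle. It rests on assumptions: `SuspendableSketch`, `CounterDevice`, and `round_trip` are hypothetical, the method signatures are invented for illustration, and none of this mirrors the real `devices::Suspendable` or `VirtioDevice` signatures. The point is the ordering: state is only saved or restored while the device is asleep, and restore targets a freshly created device.

```rust
/// Hypothetical stand-in for a suspend/snapshot trait.
/// These signatures are assumptions, not the real `devices::Suspendable` API.
trait SuspendableSketch {
    fn sleep(&mut self) -> Result<(), String>;
    fn wake(&mut self) -> Result<(), String>;
    fn snapshot(&mut self) -> Result<Vec<u8>, String>;
    fn restore(&mut self, data: &[u8]) -> Result<(), String>;
}

/// Toy device whose entire state is one counter.
struct CounterDevice {
    counter: u32,
    asleep: bool,
}

impl SuspendableSketch for CounterDevice {
    fn sleep(&mut self) -> Result<(), String> {
        // A real device would stop its worker threads here.
        self.asleep = true;
        Ok(())
    }

    fn wake(&mut self) -> Result<(), String> {
        self.asleep = false;
        Ok(())
    }

    fn snapshot(&mut self) -> Result<Vec<u8>, String> {
        // Refuse to serialize state that could still be changing.
        if !self.asleep {
            return Err("snapshot requires a sleeping device".to_string());
        }
        Ok(self.counter.to_le_bytes().to_vec())
    }

    fn restore(&mut self, data: &[u8]) -> Result<(), String> {
        // Restore is also done while frozen, so partial state is never observed.
        if !self.asleep {
            return Err("restore requires a sleeping device".to_string());
        }
        if data.len() != 4 {
            return Err("bad snapshot length".to_string());
        }
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(data);
        self.counter = u32::from_le_bytes(bytes);
        Ok(())
    }
}

/// Snapshot one device, then restore into a newly created one.
fn round_trip(counter: u32) -> Result<u32, String> {
    let mut old = CounterDevice { counter, asleep: false };
    old.sleep()?;
    let saved = old.snapshot()?;

    // Mirrors "create a new VM from a snapshot": restore into a fresh device.
    let mut fresh = CounterDevice { counter: 0, asleep: false };
    fresh.sleep()?;
    fresh.restore(&saved)?;
    fresh.wake()?;
    Ok(fresh.counter)
}
```

Calling `round_trip(42)` exercises the whole sequence, and calling `snapshot` on a device that has not been put to sleep returns an error in this sketch.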