# Architecture: Snapshotting

Snapshotting is a **highly experimental**, `x86_64`-only feature currently under development. It is
100% **not supported** and works with only a very limited set of devices. This page roughly
summarizes how the system works, and how device authors should think about it when writing new
devices.

## The snapshot & restore sequence

The data required for a snapshot is stored in several places, including guest memory and the
devices running on the host. An accurate snapshot must capture all of this state at a single point
in time. Since there is no way to fetch this state atomically, we have to freeze the guest (VCPUs)
and the device backends. Similarly, on restore we must freeze in the same way to prevent partially
restored state from being modified.

## Snapshotting a running VM

In code, this is implemented by
[vm_control::do_snapshot](https://crosvm.dev/doc/vm_control/fn.do_snapshot.html). We always freeze
the VCPUs first
([vm_control::VcpuSuspendGuard](https://crosvm.dev/doc/vm_control/struct.VcpuSuspendGuard.html)).
This is done so that we can flush all pending interrupts to the irqchip (LAPIC) without triggering
further activity from the driver (which could in turn trigger more device activity). With the VCPUs
frozen, we freeze devices
([vm_control::DeviceSleepGuard](https://crosvm.dev/doc/vm_control/struct.DeviceSleepGuard.html)).
From here, it's just a matter of serializing VCPU state, guest memory, and device state.

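The ordering described above can be illustrated with a small, self-contained sketch. The
`VcpuFreeze` and `DeviceFreeze` types here are hypothetical stand-ins, not the real
`VcpuSuspendGuard` and `DeviceSleepGuard`; they only record the order in which steps happen, and
the wake-up order shown is simply what Rust's drop order produces in this sketch.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared log used to observe the order of operations.
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical guard: VCPUs stay frozen for as long as this value is alive.
struct VcpuFreeze {
    log: Log,
}

/// Hypothetical guard: devices stay asleep for as long as this value is alive.
struct DeviceFreeze {
    log: Log,
}

impl VcpuFreeze {
    fn new(log: &Log) -> Self {
        log.borrow_mut().push("suspend vcpus");
        VcpuFreeze { log: Rc::clone(log) }
    }
}

impl Drop for VcpuFreeze {
    fn drop(&mut self) {
        self.log.borrow_mut().push("resume vcpus");
    }
}

impl DeviceFreeze {
    fn new(log: &Log) -> Self {
        log.borrow_mut().push("sleep devices");
        DeviceFreeze { log: Rc::clone(log) }
    }
}

impl Drop for DeviceFreeze {
    fn drop(&mut self) {
        self.log.borrow_mut().push("wake devices");
    }
}

/// Runs the sketched sequence and returns the recorded steps in order.
fn snapshot_sequence() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        // VCPUs are frozen first so the guest driver cannot trigger new activity.
        let _vcpus = VcpuFreeze::new(&log);
        // Devices are frozen second, while the VCPUs are already stopped.
        let _devices = DeviceFreeze::new(&log);
        log.borrow_mut().push("flush irqs to irqchip");
        log.borrow_mut().push("serialize vcpus, memory, devices");
        // Locals drop in reverse declaration order: devices wake, then VCPUs resume.
    }
    let steps = log.borrow().to_vec();
    steps
}
```

Tying each freeze to a guard value means that the VM is unfrozen even if serialization returns
early with an error, which is one plausible reason to structure the real code around guards.
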
### A word about interrupts

Interrupts come in two primary flavors from the snapshotting perspective: legacy interrupts (e.g.
IOAPIC interrupt lines), and MSIs.

#### Legacy interrupts

These are a little tricky because they are allocated as part of device creation, and device creation
happens **before** we snapshot or restore. To avoid actually having to snapshot or restore the
`Event` object wiring for these interrupts, we rely on the fact that as long as the VM is created
with the right shape (e.g. devices), the interrupt `Event`s will be wired between the device & the
irqchip correctly. As part of restoring, we will set the routing table, which ensures that those
events map to the right GSIs in the hypervisor.

#### MSIs

These are much simpler, because of how MSIs are implemented in crosvm. In `MsixConfig`, we save the
MSI routing information for every IRQ. At restore time, we just register these MSIs with the
hypervisor using the exact same mechanism that would be invoked on device activation (albeit
bypassing GSI allocation since we know from the saved state exactly which GSI must be used).

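As a rough illustration of the MSI restore path, the sketch below uses a single registration
function for both activation and restore, with restore skipping GSI allocation. `FakeIrqChip`,
`MsiRoute`, `activate_msi`, and `restore_msis` are all hypothetical names invented for this
example; they do not reflect the actual `MsixConfig` or irqchip APIs.

```rust
use std::collections::BTreeMap;

/// Hypothetical saved routing entry for one MSI vector.
#[derive(Clone, Debug, PartialEq)]
struct MsiRoute {
    gsi: u32,
    address: u64,
    data: u32,
}

/// Hypothetical stand-in for the hypervisor's routing state.
#[derive(Default)]
struct FakeIrqChip {
    next_gsi: u32,
    routes: BTreeMap<u32, (u64, u32)>,
}

impl FakeIrqChip {
    /// Hands out a fresh GSI; only used on normal device activation.
    fn allocate_gsi(&mut self) -> u32 {
        let gsi = self.next_gsi;
        self.next_gsi += 1;
        gsi
    }

    /// Registration step shared by activation and restore.
    fn register(&mut self, route: &MsiRoute) {
        self.routes.insert(route.gsi, (route.address, route.data));
    }
}

/// Normal activation: allocate a GSI, then register the route.
fn activate_msi(chip: &mut FakeIrqChip, address: u64, data: u32) -> MsiRoute {
    let route = MsiRoute {
        gsi: chip.allocate_gsi(),
        address,
        data,
    };
    chip.register(&route);
    route
}

/// Restore: reuse the saved GSIs and go straight to registration.
fn restore_msis(chip: &mut FakeIrqChip, saved: &[MsiRoute]) {
    for route in saved {
        chip.register(route);
    }
}
```

Because the saved entries already carry their GSIs, restoring them into a fresh `FakeIrqChip`
reproduces the original routing table without ever calling `allocate_gsi`.
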
#### Flushing IRQs to the irqchip

IRQs sometimes pass through multiple host `Event`s before reaching the hypervisor (or VCPU loop) for
injection. Rather than trying to snapshot the `Event` state, we freeze all interrupt sources
(devices) and flush all pending interrupts into the irqchip. This way, snapshotting the irqchip
state is sufficient to capture all pending interrupts.

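The idea can be modeled in a few lines. In this sketch, in-flight interrupts are represented as a
queue of GSI numbers that have been signaled but not yet delivered, and the irqchip as a set of
pending GSIs. Both `IrqChipState` and `flush_pending` are hypothetical simplifications; real
`Event` objects and irqchip state are considerably more involved.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical, heavily simplified irqchip state: the set of pending GSIs.
#[derive(Clone, Debug, Default, PartialEq)]
struct IrqChipState {
    pending: BTreeSet<u32>,
}

/// Moves every in-flight interrupt into the irqchip so that nothing remains
/// outside of the state that will be serialized.
fn flush_pending(in_flight: &mut VecDeque<u32>, chip: &mut IrqChipState) {
    while let Some(gsi) = in_flight.pop_front() {
        chip.pending.insert(gsi);
    }
}
```

Once `in_flight` is empty and the interrupt sources are frozen, a copy of `IrqChipState` alone
describes every pending interrupt in this model.
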
### Two-step snapshotting

Two-step snapshotting is performed in crosvm to ensure that data sent by host-side processes is
not lost when a snapshot is taken.

Problem definition:

1. VMM Manager requests crosvm to suspend.
1. Crosvm suspends, however host-side processes are still running.
1. VMM Manager requests host-side processes to suspend.
1. VMM Manager requests snapshot from crosvm.
1. VMM Manager snapshots host-side processes.
1. VMM Manager requests host-side processes and crosvm to resume (or stop).

The problem is that data may be lost in steps 4 & 5, because of the time between steps 2 & 3. After
step 2, crosvm is suspended and host-side processes are still running, which means host-side
processes may send data to crosvm but the device in crosvm has not read that data.

When the VM resumes, there are no issues, as the data gets read and processing continues normally.
However, when the VM restores, that data is lost as it was not saved.

The solution is two-step snapshotting. We modify step 4 to read any data coming from the host just
before snapshotting, to save that data in crosvm, and then process that data when the VM resumes.

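A minimal sketch of this approach follows, assuming a device that receives byte buffers from a
host-side process over a channel. The `Device` and `DeviceSnapshot` types and their methods are
hypothetical and exist only to illustrate the read-then-save idea; they are not crosvm APIs.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical serialized device state.
#[derive(Clone, Debug, PartialEq)]
struct DeviceSnapshot {
    /// Host data that was read during the snapshot but not yet processed.
    stashed: Vec<Vec<u8>>,
}

/// Hypothetical device fed by a host-side process.
struct Device {
    from_host: Receiver<Vec<u8>>,
    stashed: Vec<Vec<u8>>,
    processed: Vec<Vec<u8>>,
}

impl Device {
    fn new(from_host: Receiver<Vec<u8>>) -> Self {
        Device {
            from_host,
            stashed: Vec::new(),
            processed: Vec::new(),
        }
    }

    /// Step one: read whatever the host sent while the device was suspended.
    /// Step two: save device state, including the data just read.
    fn snapshot(&mut self) -> DeviceSnapshot {
        self.stashed.extend(self.from_host.try_iter());
        DeviceSnapshot {
            stashed: self.stashed.clone(),
        }
    }

    /// Rebuilds a device from saved state, attached to a new host channel.
    fn restore(from_host: Receiver<Vec<u8>>, snapshot: DeviceSnapshot) -> Self {
        Device {
            from_host,
            stashed: snapshot.stashed,
            processed: Vec::new(),
        }
    }

    /// On wake, the stashed data is processed before normal operation continues.
    fn wake(&mut self) {
        self.processed.append(&mut self.stashed);
    }
}
```

Without the `try_iter` drain in `snapshot`, a buffer sent between the suspend and the snapshot
would exist only in the channel and would be missing from the restored device.
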
## Restoring a VM in lieu of booting

Restoring onto a running VM is not supported, and may never be. Our preferred approach is to
instead create a new VM from a snapshot. This is why `vm_control::do_restore` can be invoked as part
of the VM creation process.

## Implications for device authors

New devices SHOULD be compatible with the `devices::Suspendable` trait, but MAY defer actual
implementation to the future. This trait's implementation defines how the device will sleep/wake,
and how its state will be saved & restored as part of snapshotting.

New virtio devices SHOULD implement the virtio device snapshot methods on
[VirtioDevice](https://crosvm.dev/doc/devices/virtio/virtio_device/trait.VirtioDevice.html):
`virtio_sleep`, `virtio_wake`, `virtio_snapshot`, and `virtio_restore`.

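To give a feel for what such an implementation involves, here is a sketch built around a
hypothetical `SuspendableSketch` trait. Its method names follow the sleep/wake and save/restore
responsibilities described above, but the signatures, the byte-vector snapshot format, and the
rule that a device must be asleep before it is snapshotted or restored are assumptions made for
this example; consult the real `devices::Suspendable` definition for the actual interface.

```rust
use std::convert::TryFrom;

/// Hypothetical trait for illustration only; not crosvm's `devices::Suspendable`.
trait SuspendableSketch {
    fn sleep(&mut self) -> Result<(), String>;
    fn wake(&mut self) -> Result<(), String>;
    fn snapshot(&mut self) -> Result<Vec<u8>, String>;
    fn restore(&mut self, data: &[u8]) -> Result<(), String>;
}

/// Toy device whose entire state is a single counter.
struct Counter {
    value: u32,
    asleep: bool,
}

impl SuspendableSketch for Counter {
    fn sleep(&mut self) -> Result<(), String> {
        // A real device would stop its worker threads here.
        self.asleep = true;
        Ok(())
    }

    fn wake(&mut self) -> Result<(), String> {
        // A real device would restart its worker threads here.
        self.asleep = false;
        Ok(())
    }

    fn snapshot(&mut self) -> Result<Vec<u8>, String> {
        // Assumption of this sketch: state is only stable while asleep.
        if !self.asleep {
            return Err("device must be asleep before snapshot".to_string());
        }
        Ok(self.value.to_le_bytes().to_vec())
    }

    fn restore(&mut self, data: &[u8]) -> Result<(), String> {
        if !self.asleep {
            return Err("device must be asleep before restore".to_string());
        }
        let bytes = <[u8; 4]>::try_from(data)
            .map_err(|_| "snapshot has unexpected length".to_string())?;
        self.value = u32::from_le_bytes(bytes);
        Ok(())
    }
}
```

A caller would put the device to sleep, take the snapshot, and later restore the bytes into a
freshly created, sleeping device before waking it.
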