# Cr50 And Chrome OS Verified Boot Troubleshooting

H1 is a Google security chip installed on most Chrome OS devices. Cr50 is the
firmware running on the H1. A high-level overview of the hardware and firmware
can be found in [this
presentation](https://2018.osfc.io/uploads/talk/paper/7/gsc_copy.pdf).

This write-up explains how Cr50 participates in the Chrome OS device boot
process, and lists possible reasons for the dreaded "Chrome OS Missing Or
Damaged" screen showing up when a Chrome OS device reboots.

## Basic overview

The H1 controls the reset lines of the EC (embedded controller) and the AP
(application processor, or SOC). During normal Chromebook operation the H1 is
always powered up as long as the battery retains even a minimal amount of
charge. In Chromeboxes the H1 powers on with the rest of the system.

One of the important functions of the H1 in the system is providing a subset
of TPM (Trusted Platform Module) functionality. The TPM stores verified boot
information, which is why any **problems communicating with the TPM during the
boot up process** result in the Chrome OS device falling into **recovery
mode**.

Another important function of the H1 in the system is CCD ([case closed
debugging](https://chromium.googlesource.com/chromiumos/platform/ec/+/fe6ca90e/docs/case_closed_debugging_gsc.md#)).

## H1 power states and CCD

During periods of inactivity the H1 can enter a *sleep* or *deep sleep* state.
In the *sleep* state most of the clocks are turned off and power consumption
is minimized, but SRAM contents and the CPU state are maintained. In the *deep
sleep* state the H1 is practically shut down.

The H1 never enters the *deep sleep* state during the Chrome OS boot process,
but it can enter the *sleep* state if the boot process is delayed for whatever
reason, and **only when CCD is not active**.
This could be one of the reasons why boot failures occur when CCD is not
connected, but go away when CCD is on (the debug cable is plugged in).

To make sure the H1 exits the *sleep* state, the AP triggers a wake-up event,
the details of which are described below.

## H1 communications with the AP

The H1 can be connected to the AP over the I2C or SPI bus. The same Cr50
firmware is used in both cases; the decision which of the two interfaces to
use is based on resistor straps the Cr50 reads at startup.

Neither the I2C nor the SPI interface fully complies with its respective bus
standard: the I2C controller does not support clock stretching, and the SPI
controller cannot be clocked faster than 2 MHz.

Look for a text line like the following in the Cr50 console output right after
power up

> [0.005657 Valid strap: 0xa properties: 0x41]

to confirm that the straps were read properly.

The `brdprop` Cr50 console command shows which interface is used to
communicate with the AP:

> \> brdprop<br>
> properties = 0x1141

If the least significant bit of the value is set, the H1 is using the SPI
interface; if the bit is cleared, the H1 is using the I2C interface.

Using the H1 imposes additional requirements on the AP interface: the H1 might
have to be woken up from sleep, and it flow controls the AP using an
additional `AP_INT_L` signal. Both are described in more detail below.

## TPM reset

The H1 stays up until power is removed, unless it falls into deep sleep. The
TPM is just one of the components of the Cr50 firmware, and the TPM must be
reset when the AP resets.

There are differences between ARM and X86 reset circuit architectures. ARM
SOCs have a bidirectional reset signal called `SYS_RST_L`. They (or, rather,
most of them, but let's not worry about the outliers) generate a pulse on this
line when the SOC reboots.
An external device can toggle this line to reset the SOC asynchronously, which
is what the Cr50 does to reset ARM SOCs.

The X86 SOCs have two separate signals: one output, `PLT_RST_L`, which is held
low while the AP is in reset or in a low power mode, and one input,
`SYS_RST_ODL`, which the Cr50 toggles to reset the SOC.

In the case of X86, when `PLT_RST_L` is held low for longer than a second, the
Cr50 considers this an indication of the AP going into a low power mode (S5 or
lower), which means that the AP will start from the reset vector when it wakes
up, so the Cr50 can take the H1 into *deep sleep* mode as well.

On top of that, ARM based Chrome OS devices have some additional logic which
forces `SYS_RST_L` to behave similarly to `PLT_RST_L`: it stays low while the
SOC is in a low power mode from which it will resume operation at the reset
vector. This allows the H1 to enter deep sleep on ARM devices as well.

Resistor straps tell the Cr50 which kind of reset architecture to expect. The
SOC reset indication is used both to reset the TPM component and to enter the
*deep sleep* mode as appropriate.

In the `brdprop` command output, bit D5, when set, signifies the `SYS_RST_L`
('regular' ARM devices) type of reset, and bit D6 the `PLT_RST_L` (X86 and
modified ARM) type.

Boot problems can arise when the AP reboots without the Cr50 seeing a pulse on
the `SYS_RST_L` or `PLT_RST_L` signal: in this case the very first TPM_Startup
command sent by coreboot returns an error, and the Chrome OS device falls into
recovery mode.

## Cr50 operations synchronization

The H1 microcontroller is slow (clocked at 24 MHz) and the AP is usually
hundreds of times faster, so the AP has to be slowed down when it talks to
the TPM during the boot up process. The issue is complicated by the inability
of the I2C controller to stretch the clock.

In both I2C and SPI modes the H1 output signal `AP_INT_L` is used to indicate
to the AP that the H1 is ready for the next I2C or SPI transaction. By default
this signal is a low pulse at least 4 us long. Some X86 platforms require a
pulse of at least 100 us; this pulse extension mode can be configured by
setting a bit in a TPM register (I2C register address 0x1c or SPI register
address 0xfe0).

In any case it is important that the AP firmware properly configures the pin
to which the `AP_INT_L` signal is connected as an edge sensitive GPIO, which
latches on either the falling or the rising edge of the signal.

When the AP firmware misses these synchronization pulses, the boot process
takes a very long time and the AP firmware log includes messages like

> Timeout wait for TPM IRQ!

in the case of SPI or

> Cr50 i2c TPM IRQ timeout!

in the case of I2C.

## Waking H1 up from sleep

The I2C Start sequence is sufficient for the H1 to resume operation; the AP
does not have to do anything special. In the case of SPI, matters are more
complicated.

Technically speaking, the assertion of the SPI bus CS signal is enough to wake
up the H1, but it takes time for the H1 to become fully operational, and the
AP could already be transmitting the message by the time the H1 SPI controller
is ready. This is why, if the previous SPI transaction was a second or more
ago, the SPI driver is required to first issue a CS pulse without transferring
any data, just to wake up the H1, then wait for 100 us to let the H1 wake up,
and only then continue with a regular SPI transaction.

If the AP does not follow this protocol and starts transmitting before the H1
is ready, communication failures are likely, resulting in the Chrome OS device
falling into recovery.
This often happens when the device took a long time to find the kernel to
boot: the AP then tries to lock the TPM state before starting the kernel, but
fails because the H1 had fallen asleep by this time and was not properly woken
up.

## SPI Message Synchronization

The SPI interface is synchronous, and each read or write access happens within
a single transaction. The Trusted Computing Group (TCG) came up with a
hardware protocol on top of the SPI specification to allow the slow device to
flow control the fast host controller.

The basic idea is that each time the AP wants to read or write a TPM register,
it sends a SPI packet, which consists of header and data fields.

The header field is always present. It is 4 bytes in size and includes the
operation type (read or write), the data length and the register address.

The header is sent out as soon as the SPI transaction starts, then the AP
starts monitoring the MISO line, one byte at a time, paying attention to bit
D0. The Cr50 keeps sending zeros in that bit until it is ready to proceed with
the operation requested in the transaction header. Once the Cr50 is ready, it
responds with a byte with bit D0 set to one. At this point the AP knows that
the actual data of the transaction starts with the next byte, so it either
sends the data in the case of a write or reads it from the TPM in the case of
a read.

This is described in detail in [TCG PC Client Platform TPM Profile (PTP)
Specification Family "2.0" Level 00 Revision
00.43](https://drive.google.com/file/d/16r1vDhf1fnggI4BkOBuTXPqOQt4LaFvk/view?usp=sharing)
in section "6.4 Spi Hardware Protocol".

The AP ignoring this flow control mechanism is yet another common problem
causing failures to boot, because the driver starts sending or receiving data
before the TPM is ready. This failure is more likely to happen when developing
new SPI drivers.
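
As a rough illustration of the flow control described above, the following
Python sketch models the AP side of a register read. The header bit layout
(top bit of the first byte set for a read, low six bits holding the length
minus one, followed by a 24-bit address) is not spelled out in this document;
it is assumed here from the TCG specification and should be checked against
it. `build_header`, `tpm_read`, `FakeCr50` and the byte-at-a-time `xfer`
callback are hypothetical names invented for the example, and the sample
values echo the register read discussed later in this document.

```python
from typing import Callable, List


def build_header(is_read: bool, addr: int, length: int) -> bytes:
    """Build the 4-byte transaction header (layout assumed from the TCG spec)."""
    if not 1 <= length <= 64:
        raise ValueError("length must be between 1 and 64 bytes")
    first = (0x80 if is_read else 0x00) | (length - 1)
    return bytes([first]) + addr.to_bytes(3, "big")


def tpm_read(xfer: Callable[[int], int], addr: int, length: int,
             max_wait: int = 1000) -> bytes:
    """Read `length` bytes from a TPM register, honoring flow control.

    `xfer` is a hypothetical callback that clocks one byte out on MOSI and
    returns the byte sampled on MISO, with CS held low by the caller.
    """
    for byte in build_header(True, addr, length):
        xfer(byte)
    # Poll MISO one byte at a time until the device sets bit D0.
    for _ in range(max_wait):
        if xfer(0x00) & 0x01:
            break
    else:
        raise TimeoutError("TPM never signaled ready")
    # Only now does the actual register data start flowing.
    return bytes(xfer(0x00) for _ in range(length))


class FakeCr50:
    """Toy device model: stalls for `stall` bytes, signals ready, sends data."""

    def __init__(self, value: bytes, stall: int) -> None:
        # MISO stream: zeros during the 4 header bytes and the stall bytes,
        # then one ready byte with D0 set, then the register contents.
        self.miso: List[int] = [0x00] * (4 + stall) + [0x01] + list(value)
        self.mosi: List[int] = []

    def xfer(self, tx: int) -> int:
        self.mosi.append(tx)
        return self.miso.pop(0)


dev = FakeCr50(bytes.fromhex("e01a2800"), stall=3)
data = tpm_read(dev.xfer, 0xD40F00, 4)
print(bytes(dev.mosi[:4]).hex(), data.hex())
```

A driver that skipped the polling loop would treat the stall bytes as register
data, which is the failure mode described above.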

## Boot up process examples

A trace of a typical Chrome OS device boot process was collected using the
[Saleae](https://www.saleae.com/) Logic Pro 16 logic analyzer.

The [full trace](./images/bobba_boot.sal) can be examined in detail using the
Saleae application in the trace analysis mode.

A few detailed snapshots of this trace are linked below:

### Full boot sequence

[This snapshot][1] shows communications between the AP and the H1 during a
typical Chrome OS boot: first a flurry of communications between coreboot and
the H1, then some time spent verifying and loading various firmware stages,
then a block of communications between Depthcharge and the H1.

### Typical read sequence

[This snapshot][2] shows the 4 byte header in which a read of four bytes from
register address 0xd40f00 is requested. The TPM is not ready and sends all
zeros on the MISO line for three cycles, then sends a byte of 01, and then the
AP reads the four bytes of the actual register value (0xe01a2800). Then, once
the H1 is ready to accept the next SPI transaction, it generates a pulse on
`AP_INT_L`.

### Read with wake pulse sequence

[This snapshot][3] is an example of a case where the AP toggles the CS line
first, without sending any data, and then, 100 us later, starts the actual SPI
transaction, which is completed by the `AP_INT_L` pulse.

[1]:https://drive.google.com/file/d/16Z_Nw1e6z5akUnyLZyI8ivfT5frxKPQh/view
[2]:https://drive.google.com/file/d/1weBd6kBiXoQ0I3TGmbpiHZm0dimByYnI/view
[3]:https://drive.google.com/file/d/13ZSP3up4leG0Etqo4A_gkFK1MeptGDCw/view
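
The wake-up rule from the "Waking H1 up from sleep" section could be sketched
in driver logic roughly as follows. This is a minimal model, not actual
firmware code: `WakeAwareSpi`, `pulse_cs` and `transfer` are hypothetical
names, the clock and sleep functions are injectable so the logic can be
exercised without hardware, and treating the very first transaction as
needing a wake pulse is an assumption made here for safety.

```python
import time
from typing import Callable, Optional

WAKE_IDLE_THRESHOLD_S = 1.0  # previous transaction "a second or more ago"
WAKE_DELAY_S = 100e-6        # 100 us for the H1 to become operational


class WakeAwareSpi:
    """Wraps a SPI transfer function with the H1 wake pulse rule."""

    def __init__(self, pulse_cs: Callable[[], None],
                 transfer: Callable[[bytes], bytes],
                 now: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._pulse_cs = pulse_cs    # hypothetical: toggles CS, sends no data
        self._transfer = transfer    # hypothetical: one full SPI transaction
        self._now = now
        self._sleep = sleep
        self._last: Optional[float] = None  # end time of the previous transaction

    def transaction(self, payload: bytes) -> bytes:
        idle_too_long = (self._last is None or
                         self._now() - self._last >= WAKE_IDLE_THRESHOLD_S)
        if idle_too_long:
            # The H1 may be asleep: wake it up and give it time to get ready.
            self._pulse_cs()
            self._sleep(WAKE_DELAY_S)
        result = self._transfer(payload)
        self._last = self._now()
        return result


# Exercise the logic with a fake clock and recording callbacks.
events = []
clock = [0.0]
spi = WakeAwareSpi(
    pulse_cs=lambda: events.append("cs_pulse"),
    transfer=lambda p: events.append("xfer") or p,
    now=lambda: clock[0],
    sleep=lambda s: events.append(("sleep", s)),
)
spi.transaction(b"\x00")  # first transaction: wake pulse issued
clock[0] = 0.5
spi.transaction(b"\x00")  # 0.5 s idle: no wake pulse
clock[0] = 2.0
spi.transaction(b"\x00")  # 1.5 s idle: wake pulse issued again
print(events)
```

Keeping the timestamp inside the driver means callers such as the code that
locks the TPM state just before the kernel starts do not need to know how long
the bus has been idle.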