This document explains the strategy that was used so far in starting the
migration to PSA Crypto and mentions future perspectives and open questions.

Goals
=====

Several benefits are expected from migrating to PSA Crypto:

G1. Use PSA Crypto drivers when available.
G2. Allow isolation of long-term secrets (for example, private keys).
G3. Allow isolation of short-term secrets (for example, TLS session keys).
G4. Have a clean, unified API for Crypto (retire the legacy API).
G5. Code size: compile out our implementation when a driver is available.

As of Mbed TLS 3.2, most of (G1) and all of (G2) are implemented when
`MBEDTLS_USE_PSA_CRYPTO` is enabled. For (G2) to take effect, the application
needs to be changed to use new APIs. For a more detailed account of what's
implemented, see `docs/use-psa-crypto.md`, where new APIs are about (G2), and
internal changes implement (G1).

As of early 2023, work towards G5 is in progress: Mbed TLS 3.3 and 3.4 saw
some improvements in this area, and more will be coming in future releases.

Generally speaking, the numbering above doesn't mean that each goal requires
the preceding ones to be completed.


Compile-time options
====================

We currently have a few compile-time options that are relevant to the migration:

- `MBEDTLS_PSA_CRYPTO_C` - enabled by default, controls the presence of the PSA
  Crypto APIs.
- `MBEDTLS_USE_PSA_CRYPTO` - disabled by default (enabled in "full" config),
  controls usage of PSA Crypto APIs to perform operations in X.509 and TLS
  (G1 above), as well as the availability of some new APIs (G2 above).
- `PSA_CRYPTO_CONFIG` - disabled by default, supports builds with drivers and
  without the corresponding software implementation (G5 above).

The reasons why `MBEDTLS_USE_PSA_CRYPTO` is optional and disabled by default
are:
- it's not fully compatible with `MBEDTLS_ECP_RESTARTABLE`: you can enable
  both, but then you won't get the full effect of RESTARTABLE (see the
  documentation of this option in `mbedtls_config.h`);
- to avoid a hard/default dependency of TLS, X.509 and PK on
  `MBEDTLS_PSA_CRYPTO_C`, for backward compatibility reasons:
    - When `MBEDTLS_PSA_CRYPTO_C` is enabled and used, applications need to call
      `psa_crypto_init()` before TLS/X.509 uses PSA functions. (This prevents us
      from even enabling the option by default.)
    - `MBEDTLS_PSA_CRYPTO_C` has a hard dependency on
      `MBEDTLS_ENTROPY_C || MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG` but it's
      currently possible to compile TLS and X.509 without either of these
      options. Also, we can't just auto-enable `MBEDTLS_ENTROPY_C` as it
      doesn't build out of the box on all platforms, and even less
      `MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG` as it requires a user-provided RNG
      function.

The downside of this approach is that until we are able to make
`MBEDTLS_USE_PSA_CRYPTO` non-optional (always enabled), we have to maintain
two versions of some parts of the code: one using PSA, the other using the
legacy APIs. However, see next section for strategies that can lower that
cost. The rest of this section explains the reasons for the
incompatibilities mentioned above.

At the time of writing (early 2022) it is unclear what could be done about the
backward compatibility issues, and in particular if the cost of implementing
solutions to these problems would be higher or lower than the cost of
maintaining dual code paths until the next major version. (Note: these
solutions would probably also solve other problems at the same time.)

### `MBEDTLS_ECP_RESTARTABLE`

Currently this option controls not only the presence of restartable APIs in
the crypto library, but also their use in the TLS and X.509 layers.
Since PSA
Crypto does not support restartable operations, there's a clear conflict: the
TLS and X.509 layers can't both use only PSA APIs and get restartable
behaviour.

Support for restartable (aka interruptible) ECDSA sign/verify operations was
added to PSA in Mbed TLS 3.4, but support for ECDH is not present yet.

It will then require follow-up work to make use of the new PSA APIs in
PK/X.509/TLS in all places where we currently allow restartable operations.

### Backward compatibility issues with making `MBEDTLS_USE_PSA_CRYPTO` always on

1. Existing applications may not be calling `psa_crypto_init()` before using
   TLS, X.509 or PK. We can try to work around that by calling (the relevant
   part of) it ourselves under the hood as needed, but that would likely require
   splitting init between the parts that can fail and the parts that can't (see
   <https://github.com/ARM-software/psa-crypto-api/pull/536> for that).
2. It's currently not possible to enable `MBEDTLS_PSA_CRYPTO_C` in
   configurations that don't have `MBEDTLS_ENTROPY_C`, and we can't just
   auto-enable the latter, as it won't build or work out of the box on all
   platforms. There are two kinds of things we'd need to do if we want to work
   around that:
    1. Make it possible to enable the parts of PSA Crypto that don't require an
       RNG (typically, public key operations, symmetric crypto, some key
       management functions (destroy etc)) in configurations that don't have
       `ENTROPY_C`. This requires going through the PSA code base to adjust
       dependencies. Risk: there may be annoying dependencies, some of which may be
       surprising.
    2. For operations that require an RNG, provide an alternative function
       accepting an explicit `f_rng` parameter (see #5238), that would be
       available in entropy-less builds.
       (Then code using those functions still needs
       to have one version using them, for entropy-less builds, and one version using
       the standard function, for driver support in builds with entropy.)

See <https://github.com/Mbed-TLS/mbedtls/issues/5156>.

Taking advantage of the existing abstraction layers - or not
============================================================

The Crypto library in Mbed TLS currently has 3 abstraction layers that offer
algorithm-agnostic APIs for a class of algorithms:

- MD for message digests aka hashes (including HMAC)
- Cipher for symmetric ciphers (including AEAD)
- PK for asymmetric (aka public-key) cryptography (excluding key exchange)

Note: key exchange (FFDH, ECDH) is not covered by an abstraction layer.

These abstraction layers typically provide, in addition to the API for crypto
operations, types and numerical identifiers for algorithms (for
example `mbedtls_cipher_mode_t` and its values). The
current strategy is to keep using those identifiers in most of the code, in
particular in existing structures and public APIs, even when
`MBEDTLS_USE_PSA_CRYPTO` is enabled. (This is not an issue for G1, G2, G3
above, and is only potentially relevant for G4.)

There are multiple strategies that can be used regarding the place of those
layers in the migration to PSA.

Silently call to PSA from the abstraction layer
-----------------------------------------------

- Provide a new definition (conditionally on `USE_PSA_CRYPTO`) of wrapper
  functions in the abstraction layer, that calls PSA instead of the legacy
  crypto API.
- Upside: changes contained to a single place, no need to change TLS or X.509
  code anywhere.
- Downside: tricky to implement if the PSA implementation is currently done on
  top of that layer (dependency loop).
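
As a rough illustration of the points above, the sketch below shows a wrapper whose body is selected at compile time while its signature stays the same. Every `toy_` name and the `TOY_USE_PSA_CRYPTO` macro are hypothetical stand-ins invented for this example, not Mbed TLS or PSA functions, and the "hash" is a trivial checksum so that the block stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration switch standing in for USE_PSA_CRYPTO. */
#define TOY_USE_PSA_CRYPTO

/* Hypothetical stand-in for a legacy low-level implementation. */
static uint32_t toy_legacy_checksum(const unsigned char *in, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc = acc * 31u + in[i];
    }
    return acc;
}

/* Hypothetical stand-in for a PSA entry point. It counts calls so that the
 * dispatch is observable. In this toy it sits on top of the low-level code,
 * not on top of the wrapper below, so there is no dependency loop. */
static unsigned toy_psa_calls = 0;

static uint32_t toy_psa_checksum(const unsigned char *in, size_t len)
{
    toy_psa_calls++;
    return toy_legacy_checksum(in, len);
}

/* The abstraction-layer wrapper: same signature in both builds, so callers
 * (TLS or X.509 in the real library) do not need to change. */
uint32_t toy_layer_checksum(const unsigned char *in, size_t len)
{
    assert(in != NULL || len == 0);
#if defined(TOY_USE_PSA_CRYPTO)
    return toy_psa_checksum(in, len);
#else
    return toy_legacy_checksum(in, len);
#endif
}
```

If `toy_psa_checksum()` were instead written in terms of `toy_layer_checksum()`, the two functions would call each other forever, which is the dependency loop mentioned in the downside above.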

This strategy is currently (early 2023) used for all operations in the PK
layer; the MD layer uses a variant where it dispatches to PSA if a driver is
available and the driver subsystem has been initialized, regardless of whether
`USE_PSA_CRYPTO` is enabled; see `md-cipher-dispatch.md` in the same directory
for details.

This strategy is not very well suited to the Cipher layer, as the PSA
implementation is currently done on top of that layer.

This strategy will probably be used for some time for the PK layer, while we
figure out what the future of that layer is: parts of it (parse/write, ECDSA
signatures in the format that X.509 & TLS want) are not covered by PSA, so
they will need to keep existing in some way. (Also, the PK layer is a good
place for dispatching to either PSA or `mbedtls_xxx_restartable` while that
part is not covered by PSA yet, if we decide to do that.)

Replace calls for each operation
--------------------------------

- For every operation that's done through this layer in TLS or X.509, just
  replace the function calls with calls to PSA (conditionally on
  `USE_PSA_CRYPTO`).
- Upside: conceptually simple, and if the PSA implementation is currently done
  on top of that layer, avoids concerns about dependency loops.
- Upside: opens the door to building TLS/X.509 without that layer, saving some
  code size.
- Downside: TLS/X.509 code has to be changed for each operation.

This strategy is currently (early 2023) used for the MD layer and the Cipher
layer in X.509 and TLS. Crypto modules however always call to MD which may
then dispatch to PSA, see `md-cipher-dispatch.md`.

Opt-in use of PSA from the abstraction layer
--------------------------------------------

- Provide a new way to set up a context that causes operations on that context
  to be done via PSA.
- Upside: changes mostly contained in one place, TLS/X.509 code only needs to
  be changed when setting up the context, but not when using it. In
  particular, no changes to/duplication of existing public APIs that expect a
  key to be passed as a context of this layer (e.g., `mbedtls_pk_context`).
- Upside: avoids dependency loop when PSA is implemented on top of that layer.
- Downside: when the context is typically set up by the application, requires
  changes in application code.

This strategy is not useful when no context is used, for example with the
one-shot function `mbedtls_md()`.

There are two variants of this strategy: one where using the new setup
function also allows for key isolation (the key is only held by PSA,
supporting both G1 and G2 in that area), and one without isolation (the key is
still stored outside of PSA most of the time, supporting only G1).

This strategy, with support for key isolation, is currently (early 2022) used for
private-key operations in the PK layer - see `mbedtls_pk_setup_opaque()`. This
allows use of PSA-held private ECDSA keys in TLS and X.509 with no change to
the TLS/X.509 code, but a contained change in the application.

This strategy, without key isolation, was also previously used (until 3.1
included) in the Cipher layer - see `mbedtls_cipher_setup_psa()`. This allowed
use of PSA for cipher operations in TLS with no change to the application
code, and a contained change in TLS code. (It only supported a subset of
ciphers.)

Note: for private key operations in the PK layer, both the "silent" and the
"opt-in" strategy can apply, and can complement each other, as one provides
support for key isolation, but at the (unavoidable) cost of a change in
application code, while the other requires no application change to get
support for drivers, but fails to provide isolation support.
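
The opt-in strategy with key isolation can be pictured with the following sketch, in which the way a context is set up decides, at run time, which back end later operations use. It is loosely inspired by the idea behind `mbedtls_pk_setup_opaque()` but does not reproduce its signature or behaviour: all `toy_` names, the XOR "signature" and the two-entry key store are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle to a key that is held only by the PSA side. */
typedef uint32_t toy_key_id_t;

/* Toy key store standing in for keys isolated inside PSA. */
static const uint32_t toy_psa_key_store[2] = { 0x11111111u, 0x22222222u };

/* Toy "sign" performed on the PSA side: the caller never sees the key. */
static uint32_t toy_psa_sign(toy_key_id_t id, uint32_t digest)
{
    assert(id < 2);
    return digest ^ toy_psa_key_store[id];
}

typedef enum { TOY_CTX_NONE, TOY_CTX_LEGACY, TOY_CTX_OPAQUE } toy_ctx_kind_t;

typedef struct {
    toy_ctx_kind_t kind;
    union {
        uint32_t raw_key;     /* legacy: key material lives in the context */
        toy_key_id_t key_id;  /* opaque: only an identifier is stored */
    } u;
} toy_pk_context;

/* Existing-style setup: the application hands over the key material. */
void toy_pk_setup_legacy(toy_pk_context *ctx, uint32_t raw_key)
{
    ctx->kind = TOY_CTX_LEGACY;
    ctx->u.raw_key = raw_key;
}

/* New opt-in setup: this is the only call the application has to change. */
void toy_pk_setup_opaque(toy_pk_context *ctx, toy_key_id_t key_id)
{
    ctx->kind = TOY_CTX_OPAQUE;
    ctx->u.key_id = key_id;
}

/* Code that uses the context (TLS/X.509 in the real library) is unchanged:
 * it calls the same function whichever way the context was set up. */
uint32_t toy_pk_sign(const toy_pk_context *ctx, uint32_t digest)
{
    assert(ctx != NULL && ctx->kind != TOY_CTX_NONE);
    if (ctx->kind == TOY_CTX_OPAQUE) {
        return toy_psa_sign(ctx->u.key_id, digest);
    }
    return digest ^ ctx->u.raw_key;
}
```

In this toy, only the opaque variant keeps the key material out of the context, which mirrors the trade-off between the "opt-in" and "silent" strategies described in the note above.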

Summary
-------

Strategies currently (early 2022) used with each abstraction layer:

- PK (for G1): silently call PSA
- PK (for G2): opt-in use of PSA (new key type)
- Cipher (G1): replace calls at each call site
- MD (G1, X.509 and TLS): replace calls at each call site (depending on
  `USE_PSA_CRYPTO`)
- MD (G5): silently call PSA when a driver is available, see
  `md-cipher-dispatch.md`.


Supporting builds with drivers without the software implementation
==================================================================

This section presents a plan towards G5: save code size by compiling out our
software implementation when a driver is available.

Let's expand a bit on the definition of the goal: in such a configuration
(driver used, software implementation and abstraction layer compiled out),
we want:

a. the library to build in a reasonably-complete configuration,
b. with all tests passing,
c. and no more tests skipped than the same configuration with software
   implementation.

Criterion (c) ensures not only test coverage, but that driver-based builds are
at feature parity with software-based builds.

We can roughly divide the work needed to get there in the following steps:

0. Have a working driver interface for the algorithms we want to replace.
1. Have users of these algorithms call to PSA or an abstraction layer that can
   dispatch to PSA, but not the low-level legacy API, for all operations.
   (This is G1, and for PK, X.509 and TLS this is controlled by
   `MBEDTLS_USE_PSA_CRYPTO`.) This needs to be done in the library and tests.
2. Have users of these algorithms not depend on the legacy API for information
   management (getting a size for a given algorithm, etc.)
3. Adapt compile-time guards used to query availability of a given algorithm;
   this needs to be done in the library (for crypto operations and data) and
   tests.

Note: the first two steps enable use of drivers, but not by themselves removal
of the software implementation.

Note: the fact that step 1 is not achieved for all of libmbedcrypto (see
below) is the reason why criterion (a) has "a reasonably-complete
configuration", to allow working around internal crypto dependencies when
working on other parts such as X.509 and TLS - for example, a configuration
without RSA PKCS#1 v2.1 still allows reasonable use of X.509 and TLS.

Note: this is a conceptual division that will sometimes translate to how the
work is divided into PRs, sometimes not. For example, in situations where it's
not possible to achieve good test coverage at the end of step 1 or step 2, it
is preferable to group with the next step(s) in the same PR until good test
coverage can be reached.

**Status as of end of March 2023 (shortly after 3.4):**

- Step 0 is achieved for most algorithms, with only a few gaps remaining.
- Step 1 is achieved for most of PK, X.509, and TLS when
  `MBEDTLS_USE_PSA_CRYPTO` is enabled with only a few gaps remaining (see
  docs/use-psa-crypto.md).
- Step 1 is achieved for the crypto library regarding hashes: everything uses
  MD (not low-level hash APIs), which then dispatches to PSA if applicable.
- Step 1 is not achieved for all of the crypto library when it comes to
  ciphers. For example, `ctr_drbg.c` calls the legacy API `mbedtls_aes`.
- Step 2 is achieved for most of X.509 and TLS (same gaps as step 1) when
  `MBEDTLS_USE_PSA_CRYPTO` is enabled.
- Step 3 is done for hashes and top-level ECC modules (ECDSA, ECDH, ECJPAKE).

**Strategy for step 1:**

Regarding PK, X.509, and TLS, this is mostly achieved with only a few gaps.
(The strategy was outlined in the previous section.)

Regarding libmbedcrypto:
- for hashes and ciphers, see `md-cipher-dispatch.md` in the same directory;
- for ECC, we have no internal uses of the top-level algorithms (ECDSA, ECDH,
  ECJPAKE), however they all depend on `ECP_C` which in turn depends on
  `BIGNUM_C`. So, direct calls from TLS, X.509 and PK to ECP and Bignum will
  need to be replaced; see <https://github.com/Mbed-TLS/mbedtls/issues/6839> and
  linked issues for a summary of intermediate steps and open points.

**Strategy for step 2:**

The most satisfying situation here is when we can just use the PSA Crypto API
for information management as well. However sometimes it may not be
convenient, for example in parts of the code that accept old-style identifiers
(such as `mbedtls_md_type_t`) in their API and can't assume PSA to be
compiled in (such as `rsa.c`).

When using an existing abstraction layer such as MD, it can provide
information management functions. In other cases, information that was in a
low-level module but logically belongs in a higher-level module can be moved
to that module (for example, TLS identifiers of curves and their conversion
to/from PSA or legacy identifiers belongs in TLS, not `ecp.c`).

**Strategy for step 3:**

There are currently two (complementary) ways for crypto-using code to check if a
particular algorithm is supported: using `MBEDTLS_xxx` macros, and using
`PSA_WANT_xxx` macros. For example, PSA-based code that wants to use SHA-256
will check for `PSA_WANT_ALG_SHA_256`, while legacy-based code that wants to
use SHA-256 will check for `MBEDTLS_SHA256_C` if using the `mbedtls_sha256`
API, or for `MBEDTLS_MD_C && MBEDTLS_SHA256_C` if using the `mbedtls_md` API.
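
The three checks named above can be written out as a small self-contained sketch. The macro names are the ones quoted in the text, but here they are defined by hand to mimic one hypothetical driver-only configuration (SHA-256 requested through PSA, MD present, built-in SHA-256 compiled out); in a real build they would come from the library's configuration headers. The `caller_kind_t` enum and `sha256_guard_is_satisfied()` are invented for this example.

```c
#include <assert.h>

/* Hypothetical configuration for this sketch only: SHA-256 is available via
 * PSA, the MD layer is built, and the software SHA-256 module is not. */
#define PSA_WANT_ALG_SHA_256 1
#define MBEDTLS_MD_C
/* MBEDTLS_SHA256_C is intentionally left undefined. */

/* Which API a piece of code uses decides which guard it has to check. */
typedef enum {
    CALLER_PSA,            /* uses the PSA API */
    CALLER_LEGACY_SHA256,  /* uses the mbedtls_sha256 API */
    CALLER_LEGACY_MD       /* uses the mbedtls_md API */
} caller_kind_t;

/* Returns 1 if the compile-time guard for that kind of caller holds. */
int sha256_guard_is_satisfied(caller_kind_t caller)
{
    switch (caller) {
        case CALLER_PSA:
#if defined(PSA_WANT_ALG_SHA_256)
            return 1;
#else
            return 0;
#endif
        case CALLER_LEGACY_SHA256:
#if defined(MBEDTLS_SHA256_C)
            return 1;
#else
            return 0;
#endif
        case CALLER_LEGACY_MD:
#if defined(MBEDTLS_MD_C) && defined(MBEDTLS_SHA256_C)
            return 1;
#else
            return 0;
#endif
    }
    assert(!"unknown caller kind");
    return 0;
}
```

With this particular configuration only the PSA-based guard holds, which shows why code guarded by the legacy macros would be compiled out (and its tests skipped) in such a build even though the algorithm itself is available.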

Code that obeys `MBEDTLS_USE_PSA_CRYPTO` will want to use one of the two
dependencies above depending on whether `MBEDTLS_USE_PSA_CRYPTO` is defined:
if it is, the code wants the algorithm available in PSA, otherwise, it wants it
available via the legacy API(s) it is using (MD and/or low-level).

As much as possible, we're trying to create for each algorithm a single new
macro that can be used to express dependencies everywhere (except pure PSA
code that should always use `PSA_WANT`). For example, for hashes this is the
`MBEDTLS_MD_CAN_xxx` family. For ECC algorithms, we have similar
`MBEDTLS_PK_CAN_xxx` macros.

Note that in order to achieve that goal, even for code that obeys
`USE_PSA_CRYPTO`, it is useful to impose that all algorithms that are
available via the legacy APIs are also available via PSA.

Executing step 3 will mostly consist of using the right dependency macros in
the right places (once the previous steps are done).

**Note on testing**

Since supporting driver-only builds is not about adding features, but about
supporting existing features in new types of builds, testing will not involve
adding cases to the test suites, but instead adding new components in `all.sh`
that build and run tests in newly-supported configurations. For example, if
we're making some part of the library work with hashes provided only by
drivers when `MBEDTLS_USE_PSA_CRYPTO` is defined, there should be a place in
`all.sh` that builds and runs tests in such a configuration.

There is however a risk, especially in step 3 where we change how dependencies
are expressed (sometimes in bulk), to get things wrong in a way that would
result in more tests being skipped, which is easy to miss. Care must be
taken to ensure this does not happen. The following criteria can be used:

1. The sets of tests skipped in the default config and the full config must be
   the same before and after the PR that implements step 3. This is tested
   manually for each PR that changes dependency declarations by using the script
   `outcome-analysis.sh` in the present directory.
2. The set of tests skipped in the driver-only build is the same as in an
   equivalent software-based configuration. This is tested automatically by the
   CI in the "Results analysis" stage, by running
   `tests/scripts/analyze_outcomes.py`. See the
   `analyze_driver_vs_reference_xxx` actions in the script and the comments above
   their declaration for how to do that locally.


Migrating away from the legacy API
==================================

This section briefly introduces questions and possible plans towards G4,
mainly as they relate to choices in previous stages.

The role of the PK/Cipher/MD APIs in user migration
---------------------------------------------------

We're currently taking advantage of the existing PK layer in order
to reduce the number of places where library code needs to be changed. It's
only natural to consider using the same strategy (with the PK, MD and Cipher
layers) for facilitating migration of application code.

Note: a necessary first step for that would be to make sure PSA is no longer
implemented on top of the concerned layers.

### Zero-cost compatibility layer?

The most favourable case is if we can have a zero-cost abstraction (no
runtime, RAM usage or code size penalty), for example just a bunch of
`#define`s, essentially mapping `mbedtls_` APIs to their `psa_` equivalent.

Unfortunately that's unlikely to fully work. For example, the MD layer uses the
same context type for hashes and HMACs, while the PSA API (rightfully) has
distinct operation types.
Similarly, the Cipher layer uses the same context
type for unauthenticated and AEAD ciphers, which again the PSA API
distinguishes.

It is unclear how much value, if any, a zero-cost compatibility layer that's
incomplete (for example, for MD covering only hashes, or for Cipher covering
only AEAD) or differs significantly from the existing API (for example,
introducing new context types) would provide to users.

### Low-cost compatibility layers?

Another possibility is to keep most or all of the existing API for the PK, MD
and Cipher layers, implemented on top of PSA, aiming for the lowest possible
cost. For example, `mbedtls_md_context_t` would be defined as a (tagged) union
of `psa_hash_operation_t` and `psa_mac_operation_t`, then `mbedtls_md_setup()`
would initialize the correct part, and the rest of the functions would be simple
wrappers around PSA functions. This would vastly reduce the complexity of the
layers compared to the existing ones (no need to dispatch through function
pointers, just call the corresponding PSA API).

Since this would still represent a non-zero cost, not only in terms of code
size, but also in terms of maintenance (testing, etc.) this would probably
be a temporary solution: for example keep the compatibility layers in 4.0 (and
make them optional), but remove them in 5.0.

Again, this provides the most value to users if we can manage to keep the
existing API unchanged. There might be conflicts between this goal and that of
reducing the cost, and judgment calls may need to be made.
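
To make the tagged-union idea from this subsection more concrete, here is a minimal sketch under heavy assumptions. The `toy_hash_operation_t` and `toy_mac_operation_t` types stand in for `psa_hash_operation_t` and `psa_mac_operation_t`, the arithmetic is a placeholder rather than a real hash or MAC, and the `toy_md_*` functions (including the simplified `toy_md_setup()` parameters) are hypothetical and do not match the real MD API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two distinct PSA operation types. */
typedef struct { uint32_t state; } toy_hash_operation_t;
typedef struct { uint32_t state; uint32_t key; } toy_mac_operation_t;

static void toy_hash_update(toy_hash_operation_t *op,
                            const unsigned char *in, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        op->state = op->state * 31u + in[i];
    }
}

static void toy_mac_update(toy_mac_operation_t *op,
                           const unsigned char *in, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        op->state = (op->state ^ op->key) * 31u + in[i];
    }
}

/* One context type for both uses, as in the MD layer: a tag records which
 * member of the union is active. */
typedef enum { TOY_MD_UNSET, TOY_MD_HASH, TOY_MD_HMAC } toy_md_tag_t;

typedef struct {
    toy_md_tag_t tag;
    union {
        toy_hash_operation_t hash;
        toy_mac_operation_t mac;
    } op;
} toy_md_context_t;

/* Setup initializes the correct part of the union. */
void toy_md_setup(toy_md_context_t *ctx, int hmac, uint32_t key)
{
    if (hmac) {
        ctx->tag = TOY_MD_HMAC;
        ctx->op.mac.state = 0;
        ctx->op.mac.key = key;
    } else {
        ctx->tag = TOY_MD_HASH;
        ctx->op.hash.state = 0;
    }
}

/* The other functions are thin wrappers: a plain branch on the tag, with no
 * dispatch through function pointers. */
void toy_md_update(toy_md_context_t *ctx, const unsigned char *in, size_t len)
{
    assert(ctx->tag != TOY_MD_UNSET);
    if (ctx->tag == TOY_MD_HMAC) {
        toy_mac_update(&ctx->op.mac, in, len);
    } else {
        toy_hash_update(&ctx->op.hash, in, len);
    }
}

uint32_t toy_md_finish(const toy_md_context_t *ctx)
{
    assert(ctx->tag != TOY_MD_UNSET);
    return ctx->tag == TOY_MD_HMAC ? ctx->op.mac.state : ctx->op.hash.state;
}
```

The cost of such a layer is the tag, the size of the largest union member and one branch per call, which is the kind of non-zero but small overhead discussed above.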

Note: when it comes to holding public keys in the PK layer, depending on how
the rest of the code is structured, it may be worth holding the key data in
memory controlled by the PK layer as opposed to a PSA key slot, moving it to a
slot only when needed (see current `ecdsa_verify_wrap` when
`MBEDTLS_USE_PSA_CRYPTO` is defined). For example, when parsing a large
number, N, of X.509 certificates (for example the list of trusted roots), it
might be undesirable to use N PSA key slots for their public keys as long as
the certs are loaded. OTOH, this could also be addressed by merging the "X.509
parsing on-demand" (#2478), and then the public key data would be held as
bytes in the X.509 CRT structure, and only moved to a PK context / PSA slot
when it's actually used.

Note: the PK layer actually consists of two relatively distinct parts: crypto
operations, which will be covered by PSA, and parsing/writing (exporting)
from/to various formats, which is currently not fully covered by the PSA
Crypto API.

### Algorithm identifiers and other identifiers

It should be easy to provide the user with a bunch of `#define`s for algorithm
identifiers, for example `#define MBEDTLS_MD_SHA256 PSA_ALG_SHA_256`; most of
those would be in the MD, Cipher and PK compatibility layers mentioned above,
but there might be some in other modules that may be worth considering, for
example identifiers for elliptic curves.

### Lower layers

Generally speaking, we would retire all of the low-level, non-generic modules,
such as AES, SHA-256, RSA, DHM, ECDH, ECP, bignum, etc, without providing
compatibility APIs for them. People would be encouraged to switch to the PSA
API. (The compatibility implementation of the existing PK, MD, Cipher APIs
would mostly benefit people who already used those generic APIs rather than
the low-level, alg-specific ones.)

### APIs in TLS and X.509

Public APIs in TLS and X.509 may be affected by the migration in at least two
ways:

1. APIs that rely on a legacy `mbedtls_` crypto type: for example
   `mbedtls_ssl_conf_own_cert()` to configure a (certificate and the
   associated) private key. Currently the private key is passed as a
   `mbedtls_pk_context` object, which would probably change to a `psa_key_id_t`.
   Since some users would probably still be using the compatibility PK layer, we
   would need a way to easily extract the PSA key ID from the PK context.

2. APIs that accept lists of identifiers: for example
   `mbedtls_ssl_conf_curves()` taking a list of `mbedtls_ecp_group_id`s. This
   could be changed to accept a list of pairs (`psa_ecc_family_t`, size) but we
   should probably take this opportunity to move to an identifier independent from
   the underlying crypto implementation and use TLS-specific identifiers instead
   (based on IANA values or custom enums), as is currently done in the new
   `mbedtls_ssl_conf_groups()` API (see #4859).

Testing
-------

A question that needs careful consideration when we come around to removing
the low-level crypto APIs and making PK, MD and Cipher optional compatibility
layers is how to preserve testing quality. A lot of the existing test
cases use the low-level crypto APIs; we would need to either keep using that
API for tests, or manually migrate tests to the PSA Crypto API. Perhaps a
combination of both, perhaps evolving gradually over time.