# PartitionAlloc Design

This document describes PartitionAlloc at a high level, with some architectural
details. For implementation details, see the comments in
`partition_alloc_constants.h`.

## Quick Links

* [Glossary](./glossary.md): Definitions of terms commonly used in
  PartitionAlloc. The present document largely avoids defining terms.

* [Build Config](./build_config.md): Pertinent GN args, buildflags, and
  macros.

* [Chrome-External Builds](./external_builds.md): Further considerations
  for standalone PartitionAlloc, plus an embedder's guide for some extra
  GN args.

## Overview

PartitionAlloc is a memory allocator optimized for space efficiency,
allocation latency, and security.

### Performance

PartitionAlloc is designed to be extremely fast in its fast paths. The fast
paths of allocation and deallocation require very few (reasonably predictable)
branches. The number of operations in the fast paths is minimal, leading to the
possibility of inlining.

However, even the fast path isn't as fast as it could be, because it requires
taking a per-partition lock. Although we optimized the lock, there was still
room for improvement; to this end, we introduced the thread cache. The thread
cache has been tailored to satisfy the vast majority of requests by allocating
from and releasing memory to the main allocator in batches, amortizing lock
acquisition and further improving locality while not trapping excess memory.

### Security

Security is one of the most important goals of PartitionAlloc.

PartitionAlloc guarantees that different partitions exist in different regions
of the process's address space. When the caller has freed all objects contained
in a page in a partition, PartitionAlloc returns the physical memory to the
operating system, but continues to reserve the region of address space.
PartitionAlloc will only reuse an address space region for the same partition.

Similarly, one page can contain only objects from the same bucket. When freed,
PartitionAlloc returns the physical memory, but continues to reserve the region
for this very bucket.

The above techniques help avoid type confusion attacks. Note, however, that
these apply only to normal buckets and not to the direct map, as the latter
would waste too much address space.

PartitionAlloc also guarantees that:

* Linear overflows/underflows cannot corrupt into, out of, or between
  partitions. There are guard pages at the beginning and the end of each memory
  region owned by a partition.

* Linear overflows/underflows cannot corrupt the allocation metadata.
  PartitionAlloc records metadata in a dedicated, out-of-line region (not
  adjacent to objects), surrounded by guard pages. (Freelist pointers are an
  exception.)

* A partial overwrite of a freelist pointer should fault.

* Direct map allocations have guard pages at the beginning and the end.

### Alignment

PartitionAlloc guarantees that returned pointers are aligned on a
`partition_alloc::internal::kAlignment` boundary (typically 16B on 64-bit
systems, and 8B on 32-bit ones).

PartitionAlloc also supports higher levels of alignment, which can be requested
via `PartitionAlloc::AlignedAlloc()` or platform-specific APIs (such as
`posix_memalign()`). The requested alignment has to be a power of two.
PartitionAlloc reserves the right to round up the requested size to the nearest
power of two greater than or equal to the requested alignment. This may be
wasteful, but it allows taking advantage of PartitionAlloc's natural alignment
guarantees. Allocations with an alignment requirement greater than
`partition_alloc::internal::kAlignment` are expected to be very rare.
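To make this rounding policy concrete, here is a minimal sketch under the
assumptions stated above; `RoundUpSizeForAlignment` is a hypothetical helper
used only for illustration and is not part of the PartitionAlloc API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a PartitionAlloc API): computes the size an
// allocator following the policy above may round an aligned request up to,
// i.e. the smallest power of two that is >= both the requested size and the
// requested alignment.
size_t RoundUpSizeForAlignment(size_t requested_size, size_t alignment) {
  // The requested alignment has to be a power of two.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  size_t rounded = 1;
  while (rounded < requested_size || rounded < alignment)
    rounded <<= 1;
  return rounded;
}

// Examples: a 40-byte request with 64-byte alignment may be rounded up to
// 64 bytes; a 100-byte request with 16-byte alignment may become 128 bytes.
```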
## Architecture

### Layout in Memory

PartitionAlloc handles normal buckets by reserving (not committing) 2MiB super
pages. Each super page is split into partition pages. The first and the last
partition page are permanently inaccessible and serve as guard pages, with the
exception of one system page in the middle of the first partition page that
holds metadata (a 32B struct per partition page).

*(A diagram of the super page layout, referenced by the notes below, is omitted
here.)*

* The slot span numbers provide a visual hint of their size (in partition
  pages).
* Colors provide a visual hint of the bucket to which the slot span belongs.
    * Although only five colors are shown, in reality, a super page holds
      tens of slot spans, some of which belong to the same bucket.
* The system page that holds metadata tracks each partition page with one 32B
  [`PartitionPageMetadata` struct][PartitionPage], which is either
    * a [`SlotSpanMetadata`][SlotSpanMetadata] ("v"s in the diagram) or
    * a [`SubsequentPageMetadata`][SubsequentPageMetadata] ("+"s in the
      diagram).
* Gray fill denotes guard pages (one partition page each at the head and tail
  of each super page).
* In some configurations, PartitionAlloc stores more metadata than can fit in
  the one system page at the front. These are the bitmaps for StarScan and
  `MTECheckedPtr<T>`, and they are relegated to the head of what would
  otherwise be usable space for slot spans. One, both, or none of these bitmaps
  may be present, depending on build configuration, runtime configuration, and
  type of allocation. See [`SuperPagePayloadBegin()`][payload-start] for
  details.

As allocation requests arrive, there is eventually a need to allocate a new
slot span. Address space for such a slot span is carved out from the last super
page. If there is not enough space, a new super page is allocated. Due to the
varying sizes of slot spans, this may leave some space unused (we never go back
to fill previous super pages), which is fine because this memory is merely
reserved, and reserved address space is far less precious than committed
memory. Note also that address space reserved for a slot span is never
released, even if the slot span isn't used for a long time.

All slots in a newly allocated slot span are *free*, i.e. available for
allocation.

### Freelist Pointers

All free slots within a slot span are chained into a singly-linked freelist by
writing the *next* pointer at the beginning of each slot; the head of the list
is stored in the metadata struct.

However, writing a pointer in each free slot of a newly allocated span would
require committing and faulting in physical pages upfront, which would be
unacceptable. Therefore, PartitionAlloc has a concept of *provisioning slots*.
Only provisioned slots are chained into the freelist. Once the provisioned
slots in a span are depleted, another page's worth of slots is provisioned
(note that a slot which crosses a page boundary only gets provisioned together
with the slots of the next page). See
`PartitionBucket::ProvisionMoreSlotsAndAllocOne()` for more details.

Freelist pointers are stored at the beginning of each free slot. As such, they
are the only metadata that is inline, i.e. stored among the objects. This makes
them prone to overruns. On little-endian systems, the pointers are encoded by
reversing byte order, so that a partial overrun will very likely destroy the
pointer, as opposed to forming a valid pointer to a nearby location.

Furthermore, a shadow of each freelist pointer is stored next to it, encoded in
a different manner. This helps PartitionAlloc detect corruption.
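The following is a rough, self-contained sketch of the encoding idea described
above (byte-order reversal plus a differently-encoded shadow). The names and
the exact shadow transform are invented for illustration and do not mirror
PartitionAlloc's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative only: encode the "next" freelist pointer by reversing its byte
// order, and keep a shadow copy under a second (here: bit-inverted) transform
// so that corruption can be detected on decode.
uintptr_t ReverseBytes(uintptr_t value) {
  uintptr_t reversed = 0;
  for (size_t i = 0; i < sizeof(uintptr_t); ++i)
    reversed = (reversed << 8) | ((value >> (8 * i)) & 0xff);
  return reversed;
}

struct FreelistEntrySketch {
  uintptr_t encoded_next;  // byte-swapped pointer to the next free slot
  uintptr_t shadow;        // the same value under a different encoding
};

void EncodeNext(FreelistEntrySketch& entry, void* next) {
  entry.encoded_next = ReverseBytes(reinterpret_cast<uintptr_t>(next));
  entry.shadow = ~entry.encoded_next;  // hypothetical second encoding
}

void* DecodeNext(const FreelistEntrySketch& entry) {
  // A partial overwrite of the low bytes of `encoded_next` lands in the high
  // bits of the decoded pointer, which is very unlikely to form a valid
  // nearby address.
  if (entry.shadow != ~entry.encoded_next)
    return nullptr;  // corruption detected; a real allocator would treat this as fatal
  return reinterpret_cast<void*>(ReverseBytes(entry.encoded_next));
}
```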
### Slot Span States

A slot span can be in any of four states:
* *Full*. A full span has no free slots.
* *Empty*. An empty span has no allocated slots, only free slots.
* *Active*. An active span is anything in between the above two.
* *Decommitted*. A decommitted span is a special case of an empty span, where
  all pages are decommitted from memory.

PartitionAlloc prioritizes getting an available slot from an active span over
an empty one, in the hope that the latter can soon be transitioned into the
decommitted state, thus releasing memory. There is no mechanism, however, to
prioritize selection of a slot span based on the number of already allocated
slots.

An empty span becomes decommitted either when there are too many empty spans
(FIFO), or when `PartitionRoot::PurgeMemory()` is invoked, which happens
periodically or under low memory pressure conditions. An allocation can be
satisfied from a decommitted span if there are no active or empty spans
available. The slot provisioning mechanism kicks back in, committing the pages
gradually as needed, and the span becomes active. (There is currently no way to
unprovision slots other than decommitting the entire span.)

As mentioned above, a bucket is a collection of slot spans containing slots of
the same size. In fact, each bucket has three linked lists, chaining active,
empty and decommitted spans (see `PartitionBucket::*_slot_spans_head`); there
is no need for a full span list. The lists are updated lazily. An empty,
decommitted or full span may stay on the active list for some time, until
`PartitionBucket::SetNewActiveSlotSpan()` encounters it. A decommitted span may
stay on the empty list for some time, until `PartitionBucket::SlowPathAlloc()`
encounters it. However, the inaccuracy can't happen in the other direction,
i.e. an active span can only be on the active list, and an empty span can only
be on the active or empty list.

[PartitionPage]: https://source.chromium.org/search?q=-file:third_party/(angle|dawn)%20class:PartitionPageMetadata%20file:partition_page.h&ss=chromium
[SlotSpanMetadata]: https://source.chromium.org/search?q=-file:third_party/(angle|dawn)%20class:SlotSpanMetadata%20file:partition_page.h&ss=chromium
[SubsequentPageMetadata]: https://source.chromium.org/search?q=-file:third_party/(angle|dawn)%20class:SubsequentPageMetadata%20file:partition_page.h&ss=chromium
[payload-start]: https://source.chromium.org/search?q=-file:third_party%2F(angle%7Cdawn)%20content:SuperPagePayloadBegin%20file:partition_page.h&ss=chromium
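As a rough illustration of the four slot span states described above, the
sketch below derives a span's state from simple counters. The type, field
names, and logic are invented for illustration and do not correspond to
PartitionAlloc's internal data structures.

```cpp
#include <cstdint>

// Hypothetical sketch only; names do not match PartitionAlloc types.
enum class SlotSpanState { kFull, kActive, kEmpty, kDecommitted };

struct SlotSpanSketch {
  uint32_t num_slots;            // total slot capacity of the span
  uint32_t num_allocated_slots;  // slots currently handed out to callers
  bool is_decommitted;           // physical pages returned to the OS
};

SlotSpanState GetState(const SlotSpanSketch& span) {
  if (span.is_decommitted)
    return SlotSpanState::kDecommitted;  // special case of an empty span
  if (span.num_allocated_slots == 0)
    return SlotSpanState::kEmpty;        // only free slots remain
  if (span.num_allocated_slots == span.num_slots)
    return SlotSpanState::kFull;         // no free slots left
  return SlotSpanState::kActive;         // anything in between
}
```

Keep in mind that, as noted above, the per-bucket lists are updated lazily, so
a span's current list membership may temporarily disagree with the state
computed this way.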