xref: /aosp_15_r20/external/cronet/base/allocator/partition_allocator/PartitionAlloc.md (revision 6777b5387eb2ff775bb5750e3f5d96f37fb7352b)
1# PartitionAlloc Design
2
3This document describes PartitionAlloc at a high level, with some architectural
4details. For implementation details, see the comments in
5`partition_alloc_constants.h`.
6
7## Quick Links
8
9* [Glossary](./glossary.md): Definitions of terms commonly used in
10  PartitionAlloc. The present document largely avoids defining terms.
11
12* [Build Config](./build_config.md): Pertinent GN args, buildflags, and
13  macros.
14
15* [Chrome-External Builds](./external_builds.md): Further considerations
16  for standalone PartitionAlloc, plus an embedder's guide for some extra
17  GN args.
18
19## Overview
20
21PartitionAlloc is a memory allocator optimized for space efficiency,
22allocation latency, and security.
23
24### Performance
25
26PartitionAlloc is designed to be extremely fast in its fast paths. The fast
27paths of allocation and deallocation require very few (reasonably predictable)
28branches. The number of operations in the fast paths is minimal, leading to the
29possibility of inlining.
30
31![The central allocator manages slots and spans. It is locked on a
32  per-partition basis. Separately, the thread cache consumes slots
33  from the central allocator, allowing it to hand out memory
34  quickly to individual threads.](./src/partition_alloc/dot/layers.png)
35
36However, even the fast path isn't the fastest, because it requires taking
37a per-partition lock. Although we optimized the lock, there was still room for
38improvement; to this end, we introduced the thread cache.
39The thread cache has been tailored to satisfy a vast majority of requests by
40allocating from and releasing memory to the main allocator in batches,
41amortizing lock acquisition and further improving locality while not trapping
42excess memory.
43
44### Security
45
46Security is one of the important goals of PartitionAlloc.
47
48PartitionAlloc guarantees that different partitions exist in different regions
49of the process's address space. When the caller has freed all objects contained
50in a page in a partition, PartitionAlloc returns the physical memory to the
51operating system, but continues to reserve the region of address space.
52PartitionAlloc will only reuse an address space region for the same partition.
53
54Similarly, one page can contain only objects from the same bucket.
55When freed, PartitionAlloc returns the physical memory, but continues to reserve
56the region for this very bucket.
57
58The above techniques help avoid type confusion attacks. Note, however, these
59apply only to normal buckets and not to direct map, as it'd waste too much
60address space.
61
62PartitionAlloc also guarantees that:
63
64* Linear overflows/underflows cannot corrupt into, out of, or between
65  partitions. There are guard pages at the beginning and the end of each memory
66  region owned by a partition.
67
68* Linear overflows/underflows cannot corrupt the allocation metadata.
69  PartitionAlloc records metadata in a dedicated, out-of-line region (not
70  adjacent to objects), surrounded by guard pages. (Freelist pointers are an
71  exception.)
72
73* Partial pointer overwrite of freelist pointer should fault.
74
75* Direct map allocations have guard pages at the beginning and the end.
76
77### Alignment
78
79PartitionAlloc guarantees that returned pointers are aligned on
80`partition_alloc::internal::kAlignment` boundary (typically 16B on
8164-bit systems, and 8B on 32-bit).
82
83PartitionAlloc also supports higher levels of alignment, that can be requested
84via `PartitionAlloc::AlignedAlloc()` or platform-specific APIs (such as
85`posix_memalign()`). The requested
86alignment has to be a power of two. PartitionAlloc reserves the right to round
87up the requested size to the nearest power of two, greater than or equal to the
88requested alignment. This may be wasteful, but allows taking advantage of
89natural PartitionAlloc alignment guarantees. Allocations with an alignment
90requirement greater than `partition_alloc::internal::kAlignment` are expected
91to be very rare.
92
93## Architecture
94
95### Layout in Memory
96
97PartitionAlloc handles normal buckets by reserving (not committing) 2MiB super
98pages. Each super page is split into partition pages.
99The first and the last partition page are permanently inaccessible and serve
100as guard pages, with the exception of one system page in the middle of the first
101partition page that holds metadata (32B struct per partition page).
102
103![A super page is shown full of slot spans. The slot spans are logically
104  strung together to form buckets. At both extremes of the super page
105  are guard pages. PartitionAlloc metadata is hidden inside the
106  guard pages at the "front."](./src/partition_alloc/dot/super-page.png)
107
108* The slot span numbers provide a visual hint of their size (in partition
109  pages).
110* Colors provide a visual hint of the bucket to which the slot span belongs.
111    * Although only five colors are shown, in reality, a super page holds
112      tens of slot spans, some of which belong to the same bucket.
113* The system page that holds metadata tracks each partition page with one 32B
114  [`PartitionPageMetadata` struct][PartitionPage], which is either
115    * a [`SlotSpanMetadata`][SlotSpanMetadata] ("v"s in the diagram) or
116    * a [`SubsequentPageMetadata`][SubsequentPageMetadata] ("+"s in the
117      diagram).
118* Gray fill denotes guard pages (one partition page each at the head and tail
119  of each super page).
120* In some configurations, PartitionAlloc stores more metadata than can
121  fit in the one system page at the front. These are the bitmaps for
122  StarScan and `MTECheckedPtr<T>`, and they are relegated to the head of
123  what would otherwise be usable space for slot spans. One, both, or
124  none of these bitmaps may be present, depending on build
125  configuration, runtime configuration, and type of allocation.
126  See [`SuperPagePayloadBegin()`][payload-start] for details.
127
128As allocation requests arrive, there is eventually a need to allocate a new slot
129span.
130Address space for such a slot span is carved out from the last super page. If
131not enough space, a new super page is allocated. Due to varying sizes of slot
132span, this may lead to leaving space unused (we never go back to fill previous
133super pages), which is fine because this memory is merely reserved, which is far
134less precious than committed memory. Note also that address space reserved for a
135slot span is never released, even if the slot span isn't used for a long time.
136
137All slots in a newly allocated slot span are *free*, i.e. available for
138allocation.
139
140### Freelist Pointers
141
142All free slots within a slot span are chained into a singly-linked free-list,
143by writing the *next* pointer at the beginning of each slot, and the head of the
144list is written in the metadata struct.
145
146However, writing a pointer in each free slot of a newly allocated span would
147require committing and faulting in physical pages upfront, which would be
148unacceptable. Therefore, PartitionAlloc has a concept of *provisioning slots*.
149Only provisioned slots are chained into the freelist.
150Once provisioned slots in a span are depleted, then another page worth of slots
151is provisioned (note, a slot that crosses a page boundary only gets
152provisioned with slots of the next page). See
153`PartitionBucket::ProvisionMoreSlotsAndAllocOne()` for more details.
154
155Freelist pointers are stored at the beginning of each free slot. As such, they
156are the only metadata that is inline, i.e. stored among the
157objects. This makes them prone to overruns. On little-endian systems, the
158pointers are encoded by reversing byte order, so that partial overruns will very
159likely result in destroying the pointer, as opposed to forming a valid pointer
160to a nearby location.
161
162Furthermore, a shadow of a freelist pointer is stored next to it, encoded in a
163different manner. This helps PartitionAlloc detect corruptions.
164
165### Slot Span States
166
167A slot span can be in any of 4 states:
168* *Full*. A full span has no free slots.
169* *Empty*. An empty span has no allocated slots, only free slots.
170* *Active*. An active span is anything in between the above two.
171* *Decommitted*. A decommitted span is a special case of an empty span, where
172  all pages are decommitted from memory.
173
174PartitionAlloc prioritizes getting an available slot from an active span, over
175an empty one, in hope that the latter can be soon transitioned into a
176decommitted state, thus releasing memory. There is no mechanism, however, to
177prioritize selection of a slot span based on the number of already allocated
178slots.
179
180An empty span becomes decommitted either when there are too many empty spans
181(FIFO), or when `PartitionRoot::PurgeMemory()` gets invoked periodically (or in
182low memory pressure conditions). An allocation can be satisfied from
183a decommitted span if there are no active or empty spans available. The slot
184provisioning mechanism kicks back in, committing the pages gradually as needed,
185and the span becomes active. (There is currently no other way
186to unprovision slots than decommitting the entire span).
187
188As mentioned above, a bucket is a collection of slot spans containing slots of
189the same size. In fact, each bucket has 3 linked-lists, chaining active, empty
190and decommitted spans (see `PartitionBucket::*_slot_spans_head`).
191There is no need for a full span list. The lists are updated lazily. An empty,
192decommitted or full span may stay on the active list for some time, until
193`PartitionBucket::SetNewActiveSlotSpan()` encounters it.
194A decommitted span may stay on the empty list for some time,
195until `PartitionBucket::SlowPathAlloc()` encounters it. However,
196the inaccuracy can't happen in the other direction, i.e. an active span can only
197be on the active list, and an empty span can only be on the active or empty
198list.
199
200[PartitionPage]: https://source.chromium.org/search?q=-file:third_party/(angle|dawn)%20class:PartitionPageMetadata%20file:partition_page.h&ss=chromium
201[SlotSpanMetadata]: https://source.chromium.org/search?q=-file:third_party/(angle|dawn)%20class:SlotSpanMetadata%20file:partition_page.h&ss=chromium
202[SubsequentPageMetadata]: https://source.chromium.org/search?q=-file:third_party/(angle|dawn)%20class:SubsequentPageMetadata%20file:partition_page.h&ss=chromium
203[payload-start]: https://source.chromium.org/search?q=-file:third_party%2F(angle%7Cdawn)%20content:SuperPagePayloadBegin%20file:partition_page.h&ss=chromium
204