This document summarizes the common approaches for performance fine tuning with
jemalloc (as of 5.1.0).  The default configuration of jemalloc tends to work
reasonably well in practice, and most applications should not have to tune any
options.  However, in order to cover a wide range of applications and avoid
pathological cases, the default setting is sometimes kept conservative and
suboptimal, even for many common workloads.  When jemalloc is properly tuned for
a specific application / workload, it is common to improve system level metrics
by a few percent, or make favorable trade-offs.


## Notable runtime options for performance tuning

Runtime options can be set via
[malloc_conf](http://jemalloc.net/jemalloc.3.html#tuning).
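
As a minimal sketch of one way to supply these options, jemalloc reads an
option string from the global variable `malloc_conf` during initialization, in
addition to the `MALLOC_CONF` environment variable.  The option values below
are only illustrative, and the symbol name assumes an unprefixed jemalloc build
(a prefixed build may export a prefixed name instead).

```c
/* Read by jemalloc during its initialization when the allocator is linked in.
 * It needs an initializer because it may be read before main() is entered.
 * Without jemalloc this is just an ordinary, unused global. */
const char *malloc_conf = "abort_conf:true,background_thread:true";
```

The `MALLOC_CONF` environment variable is interpreted after this string, so the
environment can still override a compiled-in value.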

* [background_thread](http://jemalloc.net/jemalloc.3.html#background_thread)

    Enabling jemalloc background threads generally improves the tail latency for
    application threads, since unused memory purging is shifted to the dedicated
    background threads.  In addition, unintended purging delay caused by
    application inactivity is avoided with background threads.

    Suggested: `background_thread:true` when jemalloc-managed threads can be
    allowed.

* [metadata_thp](http://jemalloc.net/jemalloc.3.html#opt.metadata_thp)

    Allowing jemalloc to utilize transparent huge pages for its internal
    metadata usually reduces TLB misses significantly, especially for programs
    with a large memory footprint and frequent allocation / deallocation
    activities.  Metadata memory usage may increase due to the use of huge
    pages.

    Suggested for allocation-intensive programs: `metadata_thp:auto` or
    `metadata_thp:always`, which is expected to improve CPU utilization at a
    small memory cost.

* [dirty_decay_ms](http://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms) and
  [muzzy_decay_ms](http://jemalloc.net/jemalloc.3.html#opt.muzzy_decay_ms)

    Decay time determines how fast jemalloc returns unused pages back to the
    operating system, and therefore provides a fairly straightforward trade-off
    between CPU and memory usage.  Shorter decay time purges unused pages faster
    to reduce memory usage (usually at the cost of more CPU cycles spent on
    purging), and vice versa.

    Suggested: tune the values based on the desired trade-offs.

* [narenas](http://jemalloc.net/jemalloc.3.html#opt.narenas)

    By default jemalloc uses multiple arenas to reduce internal lock contention.
    However, a high arena count may also increase overall memory fragmentation,
    since arenas manage memory independently.  When a high degree of parallelism
    is not expected at the allocator level, a lower number of arenas often
    improves memory usage.

    Suggested: if low parallelism is expected, try a lower arena count while
    monitoring CPU and memory usage.

* [percpu_arena](http://jemalloc.net/jemalloc.3.html#opt.percpu_arena)

    Enables dynamic thread-to-arena association based on the running CPU.  This
    has the potential to improve locality, e.g. when thread-to-CPU affinity is
    present.

    Suggested: try `percpu_arena:percpu` or `percpu_arena:phycpu` if
    thread migration between processors is expected to be infrequent.

Examples:

* High resource consumption application, prioritizing CPU utilization:

    `background_thread:true,metadata_thp:auto` combined with relaxed decay time
    (increased `dirty_decay_ms` and / or `muzzy_decay_ms`,
    e.g. `dirty_decay_ms:30000,muzzy_decay_ms:30000`).

* High resource consumption application, prioritizing memory usage:

    `background_thread:true` combined with shorter decay time (decreased
    `dirty_decay_ms` and / or `muzzy_decay_ms`,
    e.g. `dirty_decay_ms:5000,muzzy_decay_ms:5000`), and a lower arena count
    (e.g. number of CPUs).

* Low resource consumption application:

    `narenas:1,lg_tcache_max:13` combined with shorter decay time (decreased
    `dirty_decay_ms` and / or `muzzy_decay_ms`, e.g.
    `dirty_decay_ms:1000,muzzy_decay_ms:0`).

* Extremely conservative -- minimize memory usage at all costs, only suitable when
  allocation activity is very rare:

    `narenas:1,tcache:false,dirty_decay_ms:0,muzzy_decay_ms:0`

Note that it is recommended to combine the options with `abort_conf:true`, which
aborts immediately on illegal options.
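
To make the memory-prioritizing example above concrete, the following sketch
builds a `MALLOC_CONF` value with the arena count set to a caller-supplied CPU
count, prefixed with `abort_conf:true`.  The helper name `build_memory_conf`
and the idea of generating the string at launch time are assumptions for
illustration, not part of jemalloc.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the "prioritizing memory usage" option string
 * into buf, using ncpus as the arena count.  Returns 0 on success, or -1 if
 * ncpus is not positive or buf is too small to hold the full string. */
int build_memory_conf(char *buf, size_t len, long ncpus) {
    if (ncpus < 1) {
        return -1;
    }
    int n = snprintf(buf, len,
        "abort_conf:true,background_thread:true,"
        "dirty_decay_ms:5000,muzzy_decay_ms:5000,narenas:%ld", ncpus);
    /* snprintf reports the length it wanted to write; reject truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A launcher could pass `sysconf(_SC_NPROCESSORS_ONLN)` as `ncpus`, export the
result with `setenv("MALLOC_CONF", buf, 1)`, and then `exec` the target
program, since jemalloc reads the variable during its own initialization.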

## Beyond runtime options

In addition to the runtime options, there are a number of programmatic ways to
improve application performance with jemalloc.

* [Explicit arenas](http://jemalloc.net/jemalloc.3.html#arenas.create)

    Manually created arenas can help performance in various ways, e.g. by
    managing locality and contention for specific usages.  For example,
    applications can explicitly allocate frequently accessed objects from a
    dedicated arena with
    [mallocx()](http://jemalloc.net/jemalloc.3.html#MALLOCX_ARENA) to improve
    locality.  In addition, explicit arenas often benefit from individually
    tuned options, e.g. relaxed [decay
    time](http://jemalloc.net/jemalloc.3.html#arena.i.dirty_decay_ms) if
    frequent reuse is expected.

* [Extent hooks](http://jemalloc.net/jemalloc.3.html#arena.i.extent_hooks)

    Extent hooks allow customization for managing underlying memory.  One use
    case for performance purposes is to utilize huge pages -- for example,
    [HHVM](https://github.com/facebook/hhvm/blob/master/hphp/util/alloc.cpp)
    uses explicit arenas with customized extent hooks to manage 1GB huge pages
    for frequently accessed data, which reduces TLB misses significantly.

* [Explicit thread-to-arena
  binding](http://jemalloc.net/jemalloc.3.html#thread.arena)

    It is common for some threads in an application to have different memory
    access / allocation patterns.  Threads with heavy workloads often benefit
    from explicit binding, e.g. binding very active threads to dedicated arenas
    may reduce contention at the allocator level.
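
The explicit arena and thread binding ideas above can be sketched with
`mallctl()`: create an arena via `arenas.create`, relax its dirty decay time
via `arena.<i>.dirty_decay_ms`, and bind the calling thread via `thread.arena`.
This is a sketch under assumptions: the helper name `bind_thread_to_new_arena`
and the 30000 ms value are illustrative, an unprefixed jemalloc build is
assumed, and `mallctl` is declared as a weak symbol (a GCC / Clang extension on
ELF platforms) only so that the example also links when jemalloc is absent.
Real code would normally include `<jemalloc/jemalloc.h>` instead.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Weak declaration (assumption for this sketch): resolves to NULL when the
 * process is not linked against an unprefixed jemalloc. */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/* Hypothetical helper: creates an arena, relaxes its dirty decay time, and
 * binds the calling thread to it.  Returns 0 on success, -1 if jemalloc is
 * unavailable, or the nonzero error code reported by mallctl(). */
int bind_thread_to_new_arena(unsigned *arena_out) {
    if (mallctl == NULL) {
        return -1;
    }

    /* Reading "arenas.create" creates an arena and yields its index. */
    unsigned arena = 0;
    size_t sz = sizeof(arena);
    int err = mallctl("arenas.create", &arena, &sz, NULL, 0);
    if (err != 0) {
        return err;
    }

    /* Keep dirty pages around longer, on the assumption of frequent reuse. */
    char name[64];
    snprintf(name, sizeof(name), "arena.%u.dirty_decay_ms", arena);
    ssize_t decay_ms = 30000;
    err = mallctl(name, NULL, NULL, &decay_ms, sizeof(decay_ms));
    if (err != 0) {
        return err;
    }

    /* Route this thread's subsequent allocations to the new arena. */
    err = mallctl("thread.arena", NULL, NULL, &arena, sizeof(arena));
    if (err != 0) {
        return err;
    }

    *arena_out = arena;
    return 0;
}
```

Allocations that should come from a specific arena regardless of the calling
thread can instead pass `MALLOCX_ARENA(arena)` to `mallocx()`.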