This document summarizes the common approaches for performance fine tuning with
jemalloc (as of 5.1.0). The default configuration of jemalloc tends to work
reasonably well in practice, and most applications should not have to tune any
options. However, in order to cover a wide range of applications and avoid
pathological cases, the default setting is sometimes kept conservative and
suboptimal, even for many common workloads. When jemalloc is properly tuned for
a specific application / workload, it is common to improve system level metrics
by a few percent, or make favorable trade-offs.


## Notable runtime options for performance tuning

Runtime options can be set via
[malloc_conf](http://jemalloc.net/jemalloc.3.html#tuning).

* [background_thread](http://jemalloc.net/jemalloc.3.html#background_thread)

    Enabling jemalloc background threads generally improves the tail latency for
    application threads, since unused memory purging is shifted to the dedicated
    background threads. In addition, unintended purging delay caused by
    application inactivity is avoided with background threads.

    Suggested: `background_thread:true` when jemalloc managed threads can be
    allowed.

* [metadata_thp](http://jemalloc.net/jemalloc.3.html#opt.metadata_thp)

    Allowing jemalloc to utilize transparent huge pages for its internal
    metadata usually reduces TLB misses significantly, especially for programs
    with large memory footprint and frequent allocation / deallocation
    activities. Metadata memory usage may increase due to the use of huge
    pages.

    Suggested for allocation intensive programs: `metadata_thp:auto` or
    `metadata_thp:always`, which is expected to improve CPU utilization at a
    small memory cost.

* [dirty_decay_ms](http://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms) and
  [muzzy_decay_ms](http://jemalloc.net/jemalloc.3.html#opt.muzzy_decay_ms)

    Decay time determines how fast jemalloc returns unused pages back to the
    operating system, and therefore provides a fairly straightforward trade-off
    between CPU and memory usage.
    Shorter decay time purges unused pages faster
    to reduce memory usage (usually at the cost of more CPU cycles spent on
    purging), and vice versa.

    Suggested: tune the values based on the desired trade-offs.

* [narenas](http://jemalloc.net/jemalloc.3.html#opt.narenas)

    By default jemalloc uses multiple arenas to reduce internal lock contention.
    However, a high arena count may also increase overall memory fragmentation,
    since arenas manage memory independently. When a high degree of parallelism
    is not expected at the allocator level, a lower number of arenas often
    improves memory usage.

    Suggested: if low parallelism is expected, try a lower arena count while
    monitoring CPU and memory usage.

* [percpu_arena](http://jemalloc.net/jemalloc.3.html#opt.percpu_arena)

    Enable dynamic thread to arena association based on running CPU. This has
    the potential to improve locality, e.g. when thread to CPU affinity is
    present.

    Suggested: try `percpu_arena:percpu` or `percpu_arena:phycpu` if
    thread migration between processors is expected to be infrequent.

Examples:

* High resource consumption application, prioritizing CPU utilization:

    `background_thread:true,metadata_thp:auto` combined with relaxed decay time
    (increased `dirty_decay_ms` and / or `muzzy_decay_ms`,
    e.g. `dirty_decay_ms:30000,muzzy_decay_ms:30000`).

* High resource consumption application, prioritizing memory usage:

    `background_thread:true` combined with shorter decay time (decreased
    `dirty_decay_ms` and / or `muzzy_decay_ms`,
    e.g. `dirty_decay_ms:5000,muzzy_decay_ms:5000`), and lower arena count
    (e.g. number of CPUs).

* Low resource consumption application:

    `narenas:1,lg_tcache_max:13` combined with shorter decay time (decreased
    `dirty_decay_ms` and / or `muzzy_decay_ms`, e.g.
    `dirty_decay_ms:1000,muzzy_decay_ms:0`).

* Extremely conservative -- minimize memory usage at all costs, only suitable when
    allocation activity is very rare:

    `narenas:1,tcache:false,dirty_decay_ms:0,muzzy_decay_ms:0`

Note that it is recommended to combine the options with `abort_conf:true` which
aborts immediately on illegal options.

## Beyond runtime options

In addition to the runtime options, there are a number of programmatic ways to
improve application performance with jemalloc.

* [Explicit arenas](http://jemalloc.net/jemalloc.3.html#arenas.create)

    Manually created arenas can help performance in various ways, e.g. by
    managing locality and contention for specific usages. For example,
    applications can explicitly allocate frequently accessed objects from a
    dedicated arena with
    [mallocx()](http://jemalloc.net/jemalloc.3.html#MALLOCX_ARENA) to improve
    locality. In addition, explicit arenas often benefit from individually
    tuned options, e.g.
    relaxed [decay
    time](http://jemalloc.net/jemalloc.3.html#arena.i.dirty_decay_ms) if
    frequent reuse is expected.

* [Extent hooks](http://jemalloc.net/jemalloc.3.html#arena.i.extent_hooks)

    Extent hooks allow customization for managing underlying memory. One
    performance-oriented use case is to utilize huge pages -- for example,
    [HHVM](https://github.com/facebook/hhvm/blob/master/hphp/util/alloc.cpp)
    uses explicit arenas with customized extent hooks to manage 1GB huge pages
    for frequently accessed data, which reduces TLB misses significantly.

* [Explicit thread-to-arena
  binding](http://jemalloc.net/jemalloc.3.html#thread.arena)

    It is common for some threads in an application to have different memory
    access / allocation patterns. Threads with heavy workloads often benefit
    from explicit binding, e.g. binding very active threads to dedicated arenas
    may reduce contention at the allocator level.
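
As a rough illustration of explicit arenas, per-arena decay tuning and
thread-to-arena binding, the sketch below drives the `arenas.create`,
`arena.<i>.dirty_decay_ms` and `thread.arena` controls through `mallctl()`.
The helper name `bind_thread_to_new_arena` is hypothetical. The sketch assumes
jemalloc was built without a symbol prefix, and it declares `mallctl()` as a
weak symbol only so that it still links when jemalloc is absent; a real build
would more likely include `<jemalloc/jemalloc.h>` and link against jemalloc
directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Assumption: jemalloc exports its control interface as mallctl() (no symbol
 * prefix). The weak attribute (GCC/Clang) only lets this sketch link without
 * jemalloc; normally <jemalloc/jemalloc.h> provides the declaration.
 */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/*
 * Hypothetical helper: create a new arena, set its dirty decay time, and bind
 * the calling thread to it. Returns 0 on success, ENOSYS if jemalloc is not
 * linked in, or the error code reported by mallctl().
 */
int bind_thread_to_new_arena(unsigned *arena_out, ssize_t dirty_decay_ms)
{
    assert(arena_out != NULL);
    if (mallctl == NULL)
        return ENOSYS;

    /* Reading "arenas.create" creates an arena and yields its index. */
    unsigned arena = 0;
    size_t len = sizeof(arena);
    int rc = mallctl("arenas.create", &arena, &len, NULL, 0);
    if (rc != 0)
        return rc;

    /* Tune the decay time for this arena only. */
    char name[64];
    snprintf(name, sizeof(name), "arena.%u.dirty_decay_ms", arena);
    rc = mallctl(name, NULL, NULL, &dirty_decay_ms, sizeof(dirty_decay_ms));
    if (rc != 0)
        return rc;

    /* Associate the calling thread with the new arena. */
    rc = mallctl("thread.arena", NULL, NULL, &arena, sizeof(arena));
    if (rc != 0)
        return rc;

    *arena_out = arena;
    return 0;
}
```

A very active worker thread might call `bind_thread_to_new_arena(&arena, 30000)`
once at startup, where 30000 ms is only an illustrative relaxed decay time. The
returned index could also be passed to `mallocx()` via `MALLOCX_ARENA()` to
place specific objects in that arena.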