# Memory counters and events

Perfetto can gather a number of memory events and counters on
Android and Linux. These come from kernel interfaces, both ftrace and
/proc, and are of two types: counters polled at regular intervals and events
pushed by the kernel into the ftrace buffer.

## Per-process polled counters

The process stats data source polls `/proc/<pid>/status` and
`/proc/<pid>/oom_score_adj` at user-defined intervals.

See [`man 5 proc`][man-proc] for their semantics.

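As a rough illustration of what one poll of `/proc/<pid>/status` yields, the sketch below parses a status-formatted string into counters. The mapping from status keys to `mem.*` counter names is an assumption made for this example, and `parse_status_kb` is a hypothetical helper, not part of Perfetto.

```python
import re

# Assumed mapping from /proc/<pid>/status keys to mem.* counter names.
# It is illustrative only and not taken from the Perfetto sources.
STATUS_KEY_TO_COUNTER = {
    "VmSize": "mem.virt",
    "VmRSS": "mem.rss",
    "RssAnon": "mem.rss.anon",
    "RssFile": "mem.rss.file",
    "VmSwap": "mem.swap",
    "VmHWM": "mem.rss.watermark",
}

def parse_status_kb(text):
    """Return {counter_name: value_in_kB} for the memory lines in `text`."""
    counters = {}
    for line in text.splitlines():
        # Memory lines look like "VmRSS:     85592 kB"; other lines are skipped.
        match = re.match(r"^(\w+):\s+(\d+)\s+kB$", line.strip())
        if match and match.group(1) in STATUS_KEY_TO_COUNTER:
            counters[STATUS_KEY_TO_COUNTER[match.group(1)]] = int(match.group(2))
    return counters

# Sample text in the format of /proc/<pid>/status (only a few lines shown).
SAMPLE_STATUS = """\
Name:   com.android.vending
VmSize:  1326464 kB
VmHWM:    102856 kB
VmRSS:     85592 kB
RssAnon:   36948 kB
RssFile:   46560 kB
VmSwap:     6908 kB
"""

sample_counters = parse_status_kb(SAMPLE_STATUS)
```

On a real system the input would come from reading `/proc/<pid>/status` for each process at every polling interval.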
### UI

![](/docs/images/proc_stat.png "UI showing trace data collected by process stats pollers")

### SQL

```sql
select c.ts, t.name as counter_name, c.value / 1024 as value_kb, p.name as proc_name, p.pid
from counter as c left join process_counter_track as t on c.track_id = t.id
left join process as p using (upid)
where t.name like 'mem.%'
```

ts | counter_name | value_kb | proc_name | pid
---|--------------|----------|-----------|----
261187015027350 | mem.virt | 1326464 | com.android.vending | 28815
261187015027350 | mem.rss | 85592 | com.android.vending | 28815
261187015027350 | mem.rss.anon | 36948 | com.android.vending | 28815
261187015027350 | mem.rss.file | 46560 | com.android.vending | 28815
261187015027350 | mem.swap | 6908 | com.android.vending | 28815
261187015027350 | mem.rss.watermark | 102856 | com.android.vending | 28815
261187090251420 | mem.virt | 1326464 | com.android.vending | 28815

### TraceConfig

To collect process stat counters every X ms, set `proc_stats_poll_ms = X` in
your process stats config. X must be greater than 100ms to avoid excessive CPU
usage. Details about the specific counters being collected can be found in the
[ProcessStats reference](/docs/reference/trace-packet-proto.autogen#ProcessStats).

```protobuf
data_sources: {
    config {
        name: "linux.process_stats"
        process_stats_config {
            scan_all_processes_on_start: true
            proc_stats_poll_ms: 1000
        }
    }
}
```

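When configs are generated from scripts, the polling interval can be checked before the config is written out. The following is a minimal sketch: `process_stats_config` is a hypothetical helper, and reading "greater than 100ms" as a strict lower bound is an assumption.

```python
def process_stats_config(poll_ms, scan_all_on_start=True):
    """Render a linux.process_stats data source block in text-proto form."""
    # The text asks for an interval greater than 100 ms; this sketch reads
    # that literally and rejects anything at or below 100 (an assumption).
    if poll_ms <= 100:
        raise ValueError(f"proc_stats_poll_ms must be > 100, got {poll_ms}")
    return (
        "data_sources: {\n"
        "    config {\n"
        '        name: "linux.process_stats"\n'
        "        process_stats_config {\n"
        f"            scan_all_processes_on_start: {str(scan_all_on_start).lower()}\n"
        f"            proc_stats_poll_ms: {poll_ms}\n"
        "        }\n"
        "    }\n"
        "}\n"
    )

config_text = process_stats_config(1000)
```

`config_text` then holds the same block as the example above and can be written to a config file.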
## Per-process memory events (ftrace)

### rss_stat

Recent versions of the Linux kernel can emit ftrace events when the
Resident Set Size (RSS) mm counters change. This is the same counter available
in `/proc/<pid>/status` as `VmRSS`. The main advantage of this event is that,
being pushed by the kernel on every change, it can reveal very short bursts of
memory usage that polling /proc counters would miss.

Memory usage peaks of hundreds of MB can have a dramatically negative impact on
Android, even if they last only a few ms, as they can cause mass low memory kills
to reclaim memory.

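The difference between polling and event-driven collection can be shown with a toy simulation. The timeline below is hypothetical: a process that sits at 80 MB and spikes to 380 MB for 5 ms. A poller sampling once per second never sees the spike, while recording every change does.

```python
# Hypothetical RSS timeline as (timestamp_ms, rss_mb) change events: a 300 MB
# spike lasting 5 ms in an otherwise steady 80 MB process.
RSS_CHANGES = [(0, 80), (1203, 380), (1208, 80)]

def rss_at(changes, ts_ms):
    """RSS in effect at ts_ms: the value of the latest change at or before it."""
    value = changes[0][1]
    for change_ts, rss_mb in changes:
        if change_ts > ts_ms:
            break
        value = rss_mb
    return value

def peak_seen_by_polling(changes, poll_ms, end_ms):
    """Peak RSS observed when sampling every poll_ms, as a poller would."""
    return max(rss_at(changes, ts) for ts in range(0, end_ms + 1, poll_ms))

def peak_seen_by_events(changes):
    """Peak RSS observed when every change is recorded, as with rss_stat."""
    return max(rss_mb for _, rss_mb in changes)

# Samples land at 0, 1000, 2000 and 3000 ms, all outside the spike.
polled_peak = peak_seen_by_polling(RSS_CHANGES, poll_ms=1000, end_ms=3000)
event_peak = peak_seen_by_events(RSS_CHANGES)
```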
The kernel feature that supports this was introduced in the Linux kernel
in [b3d1411b6] and later improved by [e4dcad20]. Both are available upstream
since Linux v5.5-rc1. The feature has been backported to several Google Pixel
kernels running Android 10 (Q).

[b3d1411b6]: https://github.com/torvalds/linux/commit/b3d1411b6726ea6930222f8f12587d89762477c6
[e4dcad20]: https://github.com/torvalds/linux/commit/e4dcad204d3a281be6f8573e0a82648a4ad84e69

### mm_event

`mm_event` is an ftrace event that captures statistics about key memory events
(a subset of the ones exposed by `/proc/vmstat`). Unlike `rss_stat` counter
updates, mm events are extremely high volume and tracing them individually would
be infeasible. `mm_event` instead reports only periodic histograms in the trace,
significantly reducing the overhead.

`mm_event` is available only on some Google Pixel kernels running Android 10 (Q)
and beyond.

When `mm_event` is enabled, the following mm event types are recorded:

* mem.mm.min_flt: Minor page faults
* mem.mm.maj_flt: Major page faults
* mem.mm.swp_flt: Page faults served by swapcache
* mem.mm.read_io: Read page faults backed by I/O
* mem.mm.compaction: Memory compaction events
* mem.mm.reclaim: Memory reclaim events

For each event type, the event records:

* count: how many times the event happened since the previous event.
* avg_lat: the average latency (the duration of the mm event) recorded since
  the previous event.
* max_lat: the highest latency recorded since the previous event.

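The periodic summarization can be pictured as collapsing all occurrences of one event type within a period into a single record. The sketch below is only a model of that idea, not the kernel implementation: the latency values are hypothetical and in unspecified units, and the integer average is an assumption. The field names follow the `.count`, `.avg_lat` and `.max_lat` counter suffixes that appear in traces.

```python
def summarize_period(latencies):
    """Collapse the latencies of one event type, collected since the previous
    record, into the fields of a single summary record."""
    if not latencies:
        return None  # Nothing happened in this period, so nothing to report.
    return {
        "count": len(latencies),
        # Integer average; how the kernel rounds is not covered here.
        "avg_lat": sum(latencies) // len(latencies),
        "max_lat": max(latencies),
    }

# Five hypothetical minor page faults observed during one period.
minor_fault_latencies = [3, 8, 2, 4, 3]
record = summarize_period(minor_fault_latencies)
```

Five individual faults become one record, which is why the trace volume stays manageable even when faults are very frequent.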
### UI

![rss_stat and mm_event](/docs/images/rss_stat_and_mm_event.png)

### SQL

At the SQL level, these events are imported and exposed in the same way as
the corresponding polled events. This makes it possible to collect both types
of events (pushed and polled) and treat them uniformly in queries and scripts.

```sql
select c.ts, c.value, t.name as counter_name, p.name as proc_name, p.pid
from counter as c left join process_counter_track as t on c.track_id = t.id
left join process as p using (upid)
where t.name like 'mem.%'
```

ts | value | counter_name | proc_name | pid
---|-------|--------------|-----------|----
777227867975055 | 18358272 | mem.rss.anon | com.google.android.apps.safetyhub | 31386
777227865995315 | 5 | mem.mm.min_flt.count | com.google.android.apps.safetyhub | 31386
777227865995315 | 8 | mem.mm.min_flt.max_lat | com.google.android.apps.safetyhub | 31386
777227865995315 | 4 | mem.mm.min_flt.avg_lat | com.google.android.apps.safetyhub | 31386
777227865998023 | 3 | mem.mm.swp_flt.count | com.google.android.apps.safetyhub | 31386

### TraceConfig

```protobuf
data_sources: {
    config {
        name: "linux.ftrace"
        ftrace_config {
            ftrace_events: "kmem/rss_stat"
            ftrace_events: "mm_event/mm_event_record"
        }
    }
}

# This is for getting Thread<>Process associations and full process names.
data_sources: {
    config {
        name: "linux.process_stats"
    }
}
```

## System-wide polled counters

This data source allows periodic polling of system data from:

- `/proc/stat`
- `/proc/vmstat`
- `/proc/meminfo`

See [`man 5 proc`][man-proc] for their semantics.

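Since only the counters selected in the config end up in the trace, one poll can be thought of as parsing the file and keeping a chosen subset. The sketch below does this for text in `/proc/meminfo` format; `parse_meminfo_kb` and the choice of keys are hypothetical.

```python
# Sample text in the format of /proc/meminfo (only a few lines shown).
SAMPLE_MEMINFO = """\
MemAvailable:    1708956 kB
Buffers:            6208 kB
Cached:          1352960 kB
SwapCached:         8232 kB
Active:          1021108 kB
Inactive(file):   351496 kB
"""

def parse_meminfo_kb(text, wanted):
    """Return {key: value_in_kB} for the keys in `wanted` found in `text`."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or key not in wanted:
            continue  # Not a "Key: value" line, or a counter we did not select.
        values[key] = int(rest.split()[0])
    return values

selected = parse_meminfo_kb(SAMPLE_MEMINFO, {"MemAvailable", "Buffers"})
```

Keeping the selection small reduces both the polling cost and the size of the resulting trace.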
### UI

![System Memory Counters](/docs/images/sys_stat_counters.png "Example of system memory counters in the UI")

The polling period and specific counters to include in the trace can be set in the trace config.

### SQL

```sql
select c.ts, t.name, c.value / 1024 as value_kb
from counter as c left join counter_track as t on c.track_id = t.id
```

ts | name | value_kb
---|------|---------
775177736769834 | MemAvailable | 1708956
775177736769834 | Buffers | 6208
775177736769834 | Cached | 1352960
775177736769834 | SwapCached | 8232
775177736769834 | Active | 1021108
775177736769834 | Inactive(file) | 351496

### TraceConfig

The set of supported counters is available in the
[TraceConfig reference](/docs/reference/trace-config-proto.autogen#SysStatsConfig).

```protobuf
data_sources: {
    config {
        name: "linux.sys_stats"
        sys_stats_config {
            meminfo_period_ms: 1000
            meminfo_counters: MEMINFO_MEM_TOTAL
            meminfo_counters: MEMINFO_MEM_FREE
            meminfo_counters: MEMINFO_MEM_AVAILABLE

            vmstat_period_ms: 1000
            vmstat_counters: VMSTAT_NR_FREE_PAGES
            vmstat_counters: VMSTAT_NR_ALLOC_BATCH
            vmstat_counters: VMSTAT_NR_INACTIVE_ANON
            vmstat_counters: VMSTAT_NR_ACTIVE_ANON

            stat_period_ms: 1000
            stat_counters: STAT_CPU_TIMES
            stat_counters: STAT_FORK_COUNT
        }
    }
}
```

## Low-memory Kills (LMK)

### Background

The Android framework kills apps and services, especially background ones, to
make room for newly opened apps when memory is needed. These are known as low
memory kills (LMK).

Note that LMKs are not always the symptom of a performance problem. The rule of
thumb is that the severity (as in: user-perceived impact) is proportional to the
state of the app being killed. The app state can be derived in a trace from the
OOM adjustment score.

An LMK of a foreground app or service is typically a big concern. This is what
happens when the app the user was using disappears under their fingers, or their
favorite music player service suddenly stops playing music.

An LMK of a cached app or service, instead, is frequently business as usual and
in most cases won't be noticed by the end user until they try to go back to
the app, which will then cold-start.

The situation in between these extremes is more nuanced. LMKs of cached
apps/services can still be problematic if they happen in storms (i.e. most
processes get LMK-ed in a short time frame), and such storms are often the
symptom of some component of the system causing memory spikes.

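One way to look for storms is to group kill timestamps into bursts. The sketch below is a simple heuristic, not something Perfetto provides: `find_lmk_storms` is a hypothetical helper, and the 100 ms gap and three-kill threshold are arbitrary assumptions to tune per device.

```python
def find_lmk_storms(kill_ts_ns, window_ns, min_kills):
    """Return (start_ts, end_ts, kill_count) for each burst of kills.

    A burst is a maximal run of kills in which each kill happens within
    window_ns of the previous one. Only bursts with at least min_kills
    kills are reported.
    """
    storms = []
    run = []
    for ts in sorted(kill_ts_ns):
        if run and ts - run[-1] > window_ns:
            # The gap is too large: close the current run and start a new one.
            if len(run) >= min_kills:
                storms.append((run[0], run[-1], len(run)))
            run = []
        run.append(ts)
    if len(run) >= min_kills:
        storms.append((run[0], run[-1], len(run)))
    return storms

# Three kills within about 46 ms, plus one hypothetical isolated kill
# several seconds later.
kills = [442206415875043, 442206446142234, 442206462090204, 442216000000000]
storms = find_lmk_storms(kills, window_ns=100_000_000, min_kills=3)
```

Here the first three kills form one storm and the isolated kill is ignored; the input timestamps would typically come from a query on LMK events.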
### lowmemorykiller vs lmkd

#### In-kernel lowmemorykiller driver

In Android, LMK used to be handled by an ad-hoc kernel driver,
Linux's [drivers/staging/android/lowmemorykiller.c](https://github.com/torvalds/linux/blob/v3.8/drivers/staging/android/lowmemorykiller.c).
This driver emitted the ftrace event `lowmemorykiller/lowmemory_kill`
into the trace.

#### Userspace lmkd

Android 9 introduced a userspace native daemon that took over the LMK
responsibility: `lmkd`. Not all devices running Android 9 will
necessarily use `lmkd`, as the ultimate choice of in-kernel vs userspace is
up to the phone manufacturer, their kernel version and kernel config.

On Google Pixel phones, `lmkd`-side killing has been used since Pixel 2 running
Android 9.

See https://source.android.com/devices/tech/perf/lmkd for details.

`lmkd` emits a userspace atrace counter event called `kill_one_process`.

#### Android LMK vs Linux oomkiller

LMKs on Android, whether the old in-kernel `lowmemorykiller` or the newer `lmkd`,
use a completely different mechanism than the standard
[Linux kernel's OOM Killer](https://linux-mm.org/OOM_Killer).
Perfetto at the moment supports only Android LMK events (both in-kernel and
userspace) and does not support tracing of Linux kernel OOM Killer events.
Linux OOM Killer events are still theoretically possible on Android but extremely
unlikely to happen. If they happen, they are more likely the symptom of a
misconfigured BSP.

### UI

Newer userspace LMKs are available in the UI under the `lmkd` track
in the form of a counter. The counter value is the PID of the killed process
(in the example below, PID=27985).

![Userspace lmkd](/docs/images/lmk_lmkd.png "Example of an LMK caused by lmkd")

TODO: we are working on better UI support for LMKs.

### SQL

Both newer lmkd and legacy kernel-driven lowmemorykiller events are normalized
at import time and available under the `mem.lmk` key in the `instant` table.

```sql
SELECT ts, process.name, process.pid
FROM instant
JOIN process_track ON instant.track_id = process_track.id
JOIN process USING (upid)
WHERE instant.name = 'mem.lmk'
```

| ts | name | pid |
|----|------|-----|
| 442206415875043 | roid.apps.turbo | 27324 |
| 442206446142234 | android.process.acore | 27683 |
| 442206462090204 | com.google.process.gapps | 28198 |

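The normalization can be pictured as mapping both event flavors to one record shape. The sketch below is a conceptual model, not the trace processor's code: the dictionary layout of the input events is hypothetical, and treating a zero counter value as "no kill" is an assumption.

```python
def normalize_lmk(event):
    """Map either flavor of kill event to a (ts, "mem.lmk", pid) tuple.

    Returns None for events that do not describe a kill.
    """
    if event["source"] == "ftrace" and event["name"] == "lowmemory_kill":
        # Legacy in-kernel driver: the killed PID is a field of the event.
        return (event["ts"], "mem.lmk", event["pid"])
    if event["source"] == "atrace_counter" and event["name"] == "kill_one_process":
        # lmkd: the counter value is the PID of the killed process.
        # Assumption: a value of 0 does not correspond to a kill.
        if event["value"] > 0:
            return (event["ts"], "mem.lmk", event["value"])
    return None

# Hypothetical input events; which source produced which kill is made up.
kernel_event = {"source": "ftrace", "name": "lowmemory_kill",
                "ts": 442206415875043, "pid": 27324}
lmkd_event = {"source": "atrace_counter", "name": "kill_one_process",
              "ts": 442206446142234, "value": 27683}

normalized = [normalize_lmk(kernel_event), normalize_lmk(lmkd_event)]
```

After this step a query no longer needs to know which mechanism performed the kill.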
### TraceConfig

To enable tracing of low memory kills, add the following options to the trace config:

```protobuf
data_sources: {
    config {
        name: "linux.ftrace"
        ftrace_config {
            # For old in-kernel events.
            ftrace_events: "lowmemorykiller/lowmemory_kill"

            # For new userspace lmkds.
            atrace_apps: "lmkd"

            # This is not strictly required but is useful to know the state
            # of the process (FG, cached, ...) before it got killed.
            ftrace_events: "oom/oom_score_adj_update"
        }
    }
}
```

## {#oom-adj} App states and OOM adjustment score

The Android app state can be inferred in a trace from the process
`oom_score_adj`. The mapping is not 1:1: there are more states than
`oom_score_adj` value groups, and the `oom_score_adj` range for cached processes
spans from 900 to 999.

The mapping can be inferred from the
[ActivityManager's ProcessList sources](https://cs.android.com/android/platform/superproject/+/android10-release:frameworks/base/services/core/java/com/android/server/am/ProcessList.java;l=126):

```java
// This is a process only hosting activities that are not visible,
// so it can be killed without any disruption.
static final int CACHED_APP_MAX_ADJ = 999;
static final int CACHED_APP_MIN_ADJ = 900;

// This is the oom_adj level that we allow to die first. This cannot be equal to
// CACHED_APP_MAX_ADJ unless processes are actively being assigned an oom_score_adj of
// CACHED_APP_MAX_ADJ.
static final int CACHED_APP_LMK_FIRST_ADJ = 950;

// The B list of SERVICE_ADJ -- these are the old and decrepit
// services that aren't as shiny and interesting as the ones in the A list.
static final int SERVICE_B_ADJ = 800;

// This is the process of the previous application that the user was in.
// This process is kept above other things, because it is very common to
// switch back to the previous app.  This is important both for recent
// task switch (toggling between the two top recent apps) as well as normal
// UI flow such as clicking on a URI in the e-mail app to view in the browser,
// and then pressing back to return to e-mail.
static final int PREVIOUS_APP_ADJ = 700;

// This is a process holding the home application -- we want to try
// avoiding killing it, even if it would normally be in the background,
// because the user interacts with it so much.
static final int HOME_APP_ADJ = 600;

// This is a process holding an application service -- killing it will not
// have much of an impact as far as the user is concerned.
static final int SERVICE_ADJ = 500;

// This is a process with a heavy-weight application.  It is in the
// background, but we want to try to avoid killing it.  Value set in
// system/rootdir/init.rc on startup.
static final int HEAVY_WEIGHT_APP_ADJ = 400;

// This is a process currently hosting a backup operation.  Killing it
// is not entirely fatal but is generally a bad idea.
static final int BACKUP_APP_ADJ = 300;

// This is a process bound by the system (or other app) that's more important than services but
// not so perceptible that it affects the user immediately if killed.
static final int PERCEPTIBLE_LOW_APP_ADJ = 250;

// This is a process only hosting components that are perceptible to the
// user, and we really want to avoid killing them, but they are not
// immediately visible. An example is background music playback.
static final int PERCEPTIBLE_APP_ADJ = 200;

// This is a process only hosting activities that are visible to the
// user, so we'd prefer they don't disappear.
static final int VISIBLE_APP_ADJ = 100;

// This is a process that was recently TOP and moved to FGS. Continue to treat it almost
// like a foreground app for a while.
// @see TOP_TO_FGS_GRACE_PERIOD
static final int PERCEPTIBLE_RECENT_FOREGROUND_APP_ADJ = 50;

// This is the process running the current foreground app.  We'd really
// rather not kill it!
static final int FOREGROUND_APP_ADJ = 0;

// This is a process that the system or a persistent process has bound to,
// and indicated it is important.
static final int PERSISTENT_SERVICE_ADJ = -700;

// This is a system persistent process, such as telephony.  Definitely
// don't want to kill it, but doing so is not completely fatal.
static final int PERSISTENT_PROC_ADJ = -800;

// The system process runs at the default adjustment.
static final int SYSTEM_ADJ = -900;

// Special code for native processes that are not being managed by the system (so
// don't have an oom adj assigned by the system).
static final int NATIVE_ADJ = -1000;
```

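When analyzing LMKs in scripts, the constants above can be turned into a rough state lookup. The sketch below is one possible reading, not an official mapping: the labels are informal, and assigning an in-between score to the nearest constant at or below it is an assumption.

```python
# (minimum oom_score_adj, informal label) pairs, highest first, using the
# values of the ProcessList constants quoted above.
ADJ_BUCKETS = [
    (900, "cached"),
    (800, "service B"),
    (700, "previous app"),
    (600, "home"),
    (500, "service"),
    (400, "heavy-weight"),
    (300, "backup"),
    (250, "perceptible (low)"),
    (200, "perceptible"),
    (100, "visible"),
    (50, "perceptible recent foreground"),
    (0, "foreground"),
    (-700, "persistent service"),
    (-800, "persistent process"),
    (-900, "system"),
    (-1000, "native"),
]

def app_state(oom_score_adj):
    """Return the label of the first bucket whose minimum the score reaches."""
    for floor, label in ADJ_BUCKETS:
        if oom_score_adj >= floor:
            return label
    raise ValueError(f"oom_score_adj out of range: {oom_score_adj}")

# A kill at 950 hits a cached process; a kill at 0 hits the foreground app.
states = [app_state(950), app_state(200), app_state(0)]
```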
[man-proc]: https://manpages.debian.org/stretch/manpages/proc.5.en.html