# Inferno

![logo](./inferno_small.png)

[TOC]

## Description

Inferno is a flamegraph generator for native (C/C++) Android apps. It was
originally written to profile and improve the performance of surfaceflinger
(the Android compositor), but it can be used for any native Android application.
You can see a sample report generated with Inferno
[here](./report.html). Reports are self-contained HTML files, so they can be
exchanged easily.

Note that there is no concept of time in a flame graph, since all callstacks are
merged together. As a result, the width of a flamegraph represents 100% of
the samples, and the height reflects the number of functions on
the stack when sampling occurred.

![flamegraph sample](./main_thread_flamegraph.png)

The flamegraph above shows the main thread of SurfaceFlinger.
It is immediately apparent that most of the CPU time is spent processing messages
in `android::SurfaceFlinger::onMessageReceived`. The most expensive task is asking
the screen to be refreshed, as `android::DisplayDevice::prepare` shows in orange.
This graphical breakdown helps to see which parts of the program are costly and
where a developer's effort to improve performance should go.

## Example of bottleneck

A flamegraph gives you an instant view of where the CPU cycles are spent, but
it can also be used to find specific offenders. To find them, look for
plateaus: wide frames with little or nothing stacked on top of them. It is
easier to see with an example:

![flamegraph sample](./bottleneck.png)

In the flamegraph above, two
plateaus (due to `android::BufferQueueCore::validateConsistencyLocked`)
are immediately apparent.

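As a rough illustration of what a plateau means in terms of samples, the sketch below counts how often each function is the innermost frame of a callstack. The function names, the sample data and the `min_share` threshold are all hypothetical, and this is not how Inferno itself detects anything; it only shows why a wide frame with nothing above it points at a function that burns CPU in its own body.

```python
from collections import Counter


def self_samples(stacks):
    """Count, per function, the samples in which it was the innermost frame."""
    return Counter(stack[-1] for stack in stacks if stack)


def plateaus(stacks, min_share=0.2):
    """Return (function, share) pairs whose self time is at least min_share."""
    total = len(stacks)
    counts = self_samples(stacks)
    return [(name, n / total) for name, n in counts.most_common()
            if n / total >= min_share]


# Hypothetical callstacks, written root-first (outermost caller first).
STACKS = [
    ["main", "render", "validate"],
    ["main", "render", "validate"],
    ["main", "render", "validate"],
    ["main", "render", "upload"],
    ["main", "idle"],
]

for name, share in plateaus(STACKS, min_share=0.5):
    print(f"{name}: {share:.0%} of samples end here")
```

Note that this sketch sums self time per function name, so a function that shows up as several separate plateaus in the graph is reported once with the combined share.
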
## How it works

Inferno relies on simpleperf to record the callstack of a native application
thousands of times per second. Simpleperf takes care of unwinding the stack
using either frame pointers (recommended) or dwarf. At the end of the recording,
`simpleperf` also symbolizes all instruction pointers (IPs) automatically. The
records are aggregated and dumped to a file, `perf.data`. This file is pulled
from the Android device and processed on the host by Inferno. The callstacks are
merged together to visualize in which part of an app the CPU cycles are spent.

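The merging step can be pictured with a minimal sketch like the one below. It is not Inferno's actual code: the `Node` class, the helper names and the sample stacks are assumptions made up for illustration. It merges root-first callstacks into a tree and reports, for each frame, its depth and the fraction of all samples passing through it, which correspond to the height and width of a box in the flamegraph.

```python
class Node:
    """One frame in the merged call tree (hypothetical helper type)."""

    def __init__(self, name):
        self.name = name
        self.samples = 0      # samples whose stack passes through this frame
        self.children = {}    # callee name -> Node


def merge_callstacks(stacks):
    """Merge root-first callstacks into a single tree."""
    root = Node("<root>")
    for stack in stacks:
        root.samples += 1
        node = root
        for frame in stack:
            node = node.children.setdefault(frame, Node(frame))
            node.samples += 1
    return root


def widths(node, total, depth=0, out=None):
    """Return (depth, name, fraction of all samples) for every frame."""
    if out is None:
        out = []
    for child in node.children.values():
        out.append((depth, child.name, child.samples / total))
        widths(child, total, depth + 1, out)
    return out


# Hypothetical samples, written root-first.
SAMPLES = [
    ["_start", "main", "draw"],
    ["_start", "main", "draw"],
    ["_start", "main", "draw", "blit"],
    ["_start", "main", "poll"],
]

tree = merge_callstacks(SAMPLES)
for depth, name, fraction in widths(tree, tree.samples):
    print(f"{'  ' * depth}{name}: {fraction:.0%}")
```

Because identical prefixes collapse into the same nodes, the order in which samples were taken is lost, which is why the resulting graph carries no notion of time.
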
## How to use it

Open a terminal and, from the `simpleperf/scripts` directory, type:
```
./inferno.sh  (on Linux/Mac)
inferno.bat (on Windows)
```

Inferno will collect data, process it and automatically open your web browser
to display the HTML report.

## Parameters

You can select how long to sample for, the color of the nodes and many other
things. Use `-h` to get a list of all supported parameters.

```
./inferno.sh -h
```

## Troubleshooting

### Messy flame graph

A healthy flame graph features a single call site at its base (see [here](./report.html)).
If you don't see a unique call site like `_start` or `_start_thread` at the base
from which all flames originate, something went wrong: stack unwinding may
have failed to reach the root call site. These incomplete
callstacks are impossible to merge properly. By default Inferno asks
`simpleperf` to unwind the stack via the kernel and frame pointers. Try
unwinding with dwarf instead (`-du`); you can further tune this setting.

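One way to reason about this symptom is to check how many callstacks actually start at the same root frame. The sketch below does that on hypothetical, already-parsed stacks (root-first lists of function names); the helper name and the data are assumptions for illustration and are not part of Inferno or simpleperf.

```python
from collections import Counter


def root_report(stacks):
    """Return (most common root frame, share of stacks that start with it)."""
    roots = Counter(stack[0] for stack in stacks if stack)
    if not roots:
        return None, 0.0
    name, count = roots.most_common(1)[0]
    return name, count / sum(roots.values())


# Hypothetical stacks: two were unwound fully, two were truncated.
STACKS = [
    ["_start", "main", "draw"],
    ["_start", "main", "poll"],
    ["draw", "blit"],
    ["memcpy"],
]

name, share = root_report(STACKS)
print(f"most common root: {name} ({share:.0%} of stacks)")
```

A share well below 100% suggests that many stacks were cut short during unwinding, which matches the messy graph described above.
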
### No flames

If you see no flames at all, or a mess of one-level flames without a common base,
this may be because you compiled without frame pointers. Make sure there is no
`-fomit-frame-pointer` in your build config. Alternatively, ask simpleperf to
collect data with dwarf unwinding (`-du`).

### High percentage of lost samples

If simpleperf reports a lot of lost samples, it is probably because you are
unwinding with `dwarf`. Dwarf unwinding involves copying the stack before it is
processed. Try frame pointer unwinding instead, which can be done by the kernel
and is much faster.

The cost of frame pointers is negligible on arm64 but considerable
on 32-bit arm (due to register pressure). Use a 64-bit build for better
profiling.

### run-as: package not debuggable

If you cannot run as root, make sure the app is debuggable; otherwise simpleperf
will not be able to profile it.