# View the profile

[TOC]

## Introduction

After using `simpleperf record` or `app_profiler.py`, we get a profile data
file. The file contains a list of samples. Each sample has a timestamp, a thread
id, a callstack, events (like cpu-cycles or cpu-clock) used in this sample, etc.
We have many choices for viewing the profile. We can show samples in
chronological order, or show aggregated flamegraphs. We can show reports in text
format, or in some interactive UIs.

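To make the difference between a chronological view and an aggregated view concrete, here is a minimal Python sketch. The `Sample` class and its field names are hypothetical simplifications for illustration; they are not simpleperf's actual data structures.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Sample:
    """Hypothetical, simplified stand-in for one profile sample."""
    time_ns: int                 # timestamp of the sample
    tid: int                     # thread id
    callstack: Tuple[str, ...]   # outermost caller first
    period: int                  # event count attributed to this sample


def chronological(samples: Iterable[Sample]) -> List[Sample]:
    """Chronological view: keep every sample, ordered by timestamp."""
    return sorted(samples, key=lambda s: s.time_ns)


def aggregate(samples: Iterable[Sample]) -> Counter:
    """Aggregated view: sum event counts per identical callstack,
    which is the kind of data a flamegraph is drawn from."""
    totals: Counter = Counter()
    for s in samples:
        totals[s.callstack] += s.period
    return totals


samples = [
    Sample(300, 1, ("main", "draw"), 10),
    Sample(100, 1, ("main", "decode"), 20),
    Sample(200, 2, ("main", "draw"), 5),
]
print([s.time_ns for s in chronological(samples)])
print(aggregate(samples).most_common())
```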
The sections below show some recommended UIs for viewing the profile. Google
developers can find more examples in
[go/gmm-profiling](go/gmm-profiling?polyglot=linux-workstation#viewing-the-profile).

## Continuous PProf UI (great flamegraph UI, but only available internally)

[PProf](https://github.com/google/pprof) is a mature profiling technology used
extensively on Google servers. It has a powerful flamegraph UI with strong
drilldown, search, pivot, profile diff, and graph visualisation.

![Example](./pictures/continuous_pprof.png)

We can use `pprof_proto_generator.py` to convert profiles into pprof.profile
protobufs for use in pprof.

```
# Output all threads, broken down by threadpool.
./pprof_proto_generator.py

# Use proguard mapping.
./pprof_proto_generator.py --proguard-mapping-file proguard.map

# Just the main (UI) thread (query by thread name):
./pprof_proto_generator.py --comm com.example.android.displayingbitmaps
```

This will print some debug logs about `Failed to read symbols`; this is usually
OK, unless those symbols are hotspots.

The continuous pprof server has a file upload size limit of 50MB. To get around
this limit, compress the profile before uploading:

```
gzip pprof.profile
```
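If you prefer to script this step, the following Python sketch compresses a file and checks it against the limit. The helper names are hypothetical, and treating 50MB as 50 * 1024 * 1024 bytes is an assumption; unlike the `gzip` command, this sketch leaves the original file in place.

```python
import gzip
import os
import shutil

# Assumption: the 50MB limit is interpreted as 50 * 1024 * 1024 bytes.
UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024


def gzip_file(path: str) -> str:
    """Write a gzip-compressed copy of `path` next to it and return its name."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def fits_upload_limit(path: str, limit: int = UPLOAD_LIMIT_BYTES) -> bool:
    """Return True if the file is small enough to upload."""
    return os.path.getsize(path) <= limit


# Only runs when a profile is actually present in the current directory.
if __name__ == "__main__" and os.path.exists("pprof.profile"):
    compressed = gzip_file("pprof.profile")
    print(compressed, "fits limit:", fits_upload_limit(compressed))
```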

After compressing, you can upload the `pprof.profile.gz` file to http://pprof/.
The website has an 'Upload' tab for this purpose. Alternatively, you can use the
following `pprof` command to upload the compressed profile:

```
# Upload all threads in profile, grouped by threadpool.
# This is usually a good default, combining threads with similar names.
pprof --flame --tagroot threadpool pprof.profile.gz

# Upload all threads in profile, grouped by individual thread name.
pprof --flame --tagroot thread pprof.profile.gz

# Upload all threads in profile, without grouping by thread.
pprof --flame pprof.profile.gz

# This will output a URL, for example:
# https://pprof.corp.google.com/?id=589a60852306144c880e36429e10b166
```

## Perfetto (preferred chronological UI and flamegraph UI for public)

The [Perfetto UI](https://ui.perfetto.dev) is a web-based visualizer combining
the chronological view of the profile with a powerful flamegraph UI.

The Perfetto UI shows stack samples over time, exactly as collected by perf, and
allows selecting both a region of time and certain threads and/or processes to
analyse only the matching samples. Moreover, it has a flamegraph UI similar to
pprof's, with very similar drilldown, search and pivot functionality. Finally,
it also has an SQL query language (PerfettoSQL) which allows programmatic
queries on profiles.

![Example](./pictures/perfetto.png)

We can use `gecko_profile_generator.py` to convert raw perf.data files into the
Gecko format; while Perfetto supports opening raw perf.data files as well,
symbolization and deobfuscation do not work out of the box.

```
# Create Gecko format profile
./gecko_profile_generator.py > gecko_profile.json

# Create Gecko format profile with Proguard map for deobfuscation
./gecko_profile_generator.py --proguard-mapping-file proguard.map > gecko_profile.json
```

Then drag-and-drop `gecko_profile.json` into https://ui.perfetto.dev/.
Alternatively, to open from the command line, you can also do:

```
curl -L https://github.com/google/perfetto/raw/main/tools/open_trace_in_ui | python - -i gecko_profile.json
```

Note: if running the above on a remote machine over SSH, you need to first port
forward `9001` to your local machine. For example, you could do this by running:

```
ssh -fNT -L 9001:localhost:9001 <hostname>
```

## Firefox Profiler (great chronological UI)

We can view Android profiles using Firefox Profiler:
https://profiler.firefox.com/. This does not require installing Firefox:
Firefox Profiler is just a website that you can open in any browser. There is
also an internal Google-hosted Firefox Profiler, at go/profiler or
go/firefox-profiler.

![Example](./pictures/firefox_profiler.png)

Firefox Profiler has a great chronological view, as it doesn't pre-aggregate
similar stack traces like pprof does.

We can use `gecko_profile_generator.py` to convert raw perf.data files into a
Firefox Profile, with Proguard deobfuscation.

```
# Create Gecko Profile
./gecko_profile_generator.py | gzip > gecko_profile.json.gz

# Create Gecko Profile using Proguard map
./gecko_profile_generator.py --proguard-mapping-file proguard.map | gzip > gecko_profile.json.gz
```

Then drag-and-drop `gecko_profile.json.gz` into https://profiler.firefox.com/.

Firefox Profiler supports:

1.  Aggregated Flamegraphs
2.  Chronological Stackcharts

And allows filtering by:

1.  Individual threads
2.  Multiple threads (Ctrl+Click thread names to select many)
3.  Timeline period
4.  Stack frame text search

## FlameScope (great jank-finding UI)

[Netflix's FlameScope](https://github.com/Netflix/flamescope) is a rough,
proof-of-concept UI that lets you spot repeating patterns of work by laying out
the profile as a subsecond heatmap.

Below, each vertical stripe is one second, and each cell is 10ms. Redder cells
have more samples. See
https://www.brendangregg.com/blog/2018-11-08/flamescope-pattern-recognition.html
for how to spot patterns.

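The layout described above can be sketched in a few lines of Python. This is only an illustration of the bucketing idea, not FlameScope's actual code; the function name and the use of integer millisecond timestamps are assumptions, and the 10ms cell size follows the description above.

```python
from collections import Counter
from typing import Iterable, Tuple


def heatmap_cells(timestamps_ms: Iterable[int], cell_ms: int = 10) -> Counter:
    """Count samples per (column, row) cell of a subsecond heatmap.

    column = whole seconds since the first sample (one vertical stripe each)
    row    = which cell_ms-wide slice of that second the sample falls in
    """
    stamps = list(timestamps_ms)
    counts: Counter = Counter()
    if not stamps:
        return counts
    start = min(stamps)
    for t in stamps:
        offset = t - start
        cell: Tuple[int, int] = (offset // 1000, (offset % 1000) // cell_ms)
        counts[cell] += 1
    return counts


# Two samples land in the first cell, so that cell would be drawn "redder".
cells = heatmap_cells([0, 5, 12, 1003])
print(sorted(cells.items()))
```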
This is an example of a 60s DisplayBitmaps app Startup Profile.

![Example](./pictures/flamescope.png)

You can see:

The thick red vertical line on the left is startup. The long white vertical
sections on the left show that the app is mostly idle, waiting for commands from
instrumented tests. Then we see periodic red blocks, which show that the app is
periodically busy handling commands from instrumented tests.

Click the start and end cells of a duration:

![Example](./pictures/flamescope_click.png)

To see a flamegraph for that duration:

![Example](./pictures/flamescope_flamegraph.png)

Install and run FlameScope:

```
git clone https://github.com/Netflix/flamescope ~/flamescope
cd ~/flamescope
pip install -r requirements.txt
npm install
npm run webpack
python3 run.py
```

Then open FlameScope in-browser: http://localhost:5000/.

FlameScope can read gzipped perf script format profiles. Convert simpleperf
perf.data to this format with `report_sample.py`, and place it in FlameScope's
examples directory:

```
# Create `Linux perf script` format profile.
report_sample.py | gzip > ~/flamescope/examples/my_simpleperf_profile.gz

# Create `Linux perf script` format profile using Proguard map.
report_sample.py \
  --proguard-mapping-file proguard.map \
  | gzip > ~/flamescope/examples/my_simpleperf_profile.gz
```

Open the profile "as Linux Perf", and click start and end sections to get a
flamegraph of that timespan.

To investigate UI Thread Jank, filter to UI thread samples only:

```
# --comm selects the UI thread by name.
report_sample.py \
  --comm com.example.android.displayingbitmaps \
  | gzip > ~/flamescope/examples/uithread.gz
```

Once you've identified the timespan of interest, consider also zooming into that
section with Firefox Profiler, which has a more powerful flamegraph viewer.

## Differential FlameGraph

See Brendan Gregg's
[Differential Flame Graphs](https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html)
blog.

Use Simpleperf's `stackcollapse.py` to convert perf.data to Folded Stacks format
for the FlameGraph toolkit.

Consider diffing both directions: After minus Before, and Before minus After.

If you've recorded before and after your optimisation as perf_before.data and
perf_after.data, and you're only interested in the UI thread:

```
# Generate before and after folded stacks from perf.data files
./stackcollapse.py --kernel --jit -i perf_before.data \
  --proguard-mapping-file proguard_before.map \
  --comm com.example.android.displayingbitmaps \
  > perf_before.folded
./stackcollapse.py --kernel --jit -i perf_after.data \
  --proguard-mapping-file proguard_after.map \
  --comm com.example.android.displayingbitmaps \
  > perf_after.folded

# Generate diff reports
FlameGraph/difffolded.pl -n perf_before.folded perf_after.folded \
  | FlameGraph/flamegraph.pl > diff1.svg
# --negate is a flamegraph.pl option, so it goes after the pipe.
FlameGraph/difffolded.pl -n perf_after.folded perf_before.folded \
  | FlameGraph/flamegraph.pl --negate > diff2.svg
```
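To see what a diff of folded stacks means, here is a simplified Python sketch. It assumes the usual folded layout of one `frame;frame;frame count` entry per line, and it is not a reimplementation of `difffolded.pl`: it skips normalisation and does not reproduce that tool's output format. The function names and sample data are made up for illustration.

```python
from collections import Counter
from typing import Dict


def parse_folded(text: str) -> Counter:
    """Parse folded stacks: each line is 'frame1;frame2;... count'."""
    counts: Counter = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frames may contain spaces.
        stack, _, count = line.rpartition(" ")
        counts[stack] += int(count)
    return counts


def diff_folded(before: Counter, after: Counter) -> Dict[str, int]:
    """Per-stack change in sample count (after minus before)."""
    stacks = sorted(set(before) | set(after))
    return {s: after.get(s, 0) - before.get(s, 0) for s in stacks}


before = parse_folded("main;decode 30\nmain;draw 10\n")
after = parse_folded("main;decode 12\nmain;draw 10\nmain;cache 3\n")
delta = diff_folded(before, after)
print(delta)
```

A stack that exists in only one of the two profiles cannot be drawn on a flame graph built from the other, which is one reason to look at the diff in both directions.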

## Android Studio Profiler

Android Studio Profiler supports recording and reporting profiles of app
processes. It supports several recording methods, including one using simpleperf
as backend. You can use Android Studio Profiler for both recording and
reporting.

In Android Studio, open View -> Tool Windows -> Profiler, then click + -> Your
Device -> Profileable Processes -> Your App.

![Example](./pictures/android_studio_profiler_select_process.png)

Click into the "CPU" chart.

Choose Callstack Sample Recording. Even if you're using Java, this provides
better observability into ART, malloc, and the kernel.

![Example](./pictures/android_studio_profiler_select_recording_method.png)

Click Record, run your test on the device, then Stop when you're done.

Click on a thread track, and "Flame Chart" to see a chronological chart on the
left, and an aggregated flamechart on the right:

![Example](./pictures/android_studio_profiler_flame_chart.png)

If you want more flexibility in recording options, or want to add a Proguard
mapping file, you can record using simpleperf, and report using Android Studio
Profiler.

We can use `simpleperf report-sample` to convert perf.data to trace files for
Android Studio Profiler.

```
# Convert perf.data to perf.trace for Android Studio Profiler.
# If on Mac/Windows, use simpleperf host executable for those platforms instead.
bin/linux/x86_64/simpleperf report-sample --show-callchain --protobuf -i perf.data -o perf.trace

# Convert perf.data to perf.trace using proguard mapping file.
bin/linux/x86_64/simpleperf report-sample --show-callchain --protobuf -i perf.data -o perf.trace \
    --proguard-mapping-file proguard.map
```

In Android Studio: Open File -> Open -> Select perf.trace

![Example](./pictures/android_studio_profiler_open_perf_trace.png)

## Simpleperf HTML Report

Simpleperf can generate its own HTML Profile, which is able to show
Android-specific information and separate flamegraphs for all threads, with a
much rougher flamegraph UI.

![Example](./pictures/report_html.png)

This UI is fairly rough; we recommend using the Continuous PProf UI or Firefox
Profiler instead. But it's useful for a quick look at your data.

Each of the following commands takes ./perf.data as input and outputs
./report.html.

```
# Make an HTML report.
./report_html.py

# Make an HTML report with Proguard mapping.
./report_html.py --proguard-mapping-file proguard.map
```

This will print some debug logs about `Failed to read symbols`; this is usually
OK, unless those symbols are hotspots.

See also [report_html.py's README](scripts_reference.md#report_htmlpy) and
`report_html.py -h`.

## PProf Interactive Command Line

Unlike the Continuous PProf UI, the [PProf](https://github.com/google/pprof)
command line is publicly available, and allows drilldown, pivoting and
filtering.

The session below demonstrates filtering to stack frames containing
`processBitmap`.

```
$ pprof pprof.profile
(pprof) show=processBitmap
(pprof) top
Active filters:
   show=processBitmap
Showing nodes accounting for 2.45s, 11.44% of 21.46s total
      flat  flat%   sum%        cum   cum%
     2.45s 11.44% 11.44%      2.45s 11.44%  com.example.android.displayingbitmaps.util.ImageFetcher.processBitmap
```

And then showing the tags of those frames, to tell what threads they are running
on:

```
(pprof) tags
 pid: Total 2.5s
      2.5s (  100%): 31112

 thread: Total 2.5s
         1.4s (57.21%): AsyncTask #3
         1.1s (42.79%): AsyncTask #4

 threadpool: Total 2.5s
             2.5s (  100%): AsyncTask #%d

 tid: Total 2.5s
      1.4s (57.21%): 31174
      1.1s (42.79%): 31175
```
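The `threadpool` tag above groups `AsyncTask #3` and `AsyncTask #4` under `AsyncTask #%d`. One plausible way to derive such a label is to replace each run of digits in the thread name with `%d`; the sketch below assumes that rule, which may differ from what `pprof_proto_generator.py` actually does. The function name and the millisecond figures (rounded from the output above) are illustrative.

```python
import re
from collections import Counter


def threadpool_label(thread_name: str) -> str:
    """Collapse numbered thread names into one label (assumed rule)."""
    return re.sub(r"\d+", "%d", thread_name)


# Per-thread time in milliseconds, rounded from the `tags` output above.
per_thread_ms = {"AsyncTask #3": 1400, "AsyncTask #4": 1100}

per_pool_ms: Counter = Counter()
for name, ms in per_thread_ms.items():
    per_pool_ms[threadpool_label(name)] += ms

print(dict(per_pool_ms))
```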

Contrast with another method:

```
(pprof) show=addBitmapToCache
(pprof) top
Active filters:
   show=addBitmapToCache
Showing nodes accounting for 1.05s, 4.88% of 21.46s total
      flat  flat%   sum%        cum   cum%
     1.05s  4.88%  4.88%      1.05s  4.88%  com.example.android.displayingbitmaps.util.ImageCache.addBitmapToCache
```

For more information, see the
[pprof README](https://github.com/google/pprof/blob/main/doc/README.md#interactive-terminal-use).

## Simpleperf Report Command Line

The simpleperf report command reports profiles in text format.

![Example](./pictures/report_command.png)

You can call `simpleperf report` directly or call it via `report.py`.

```
# Report symbols in table format.
$ ./report.py --children

# Report call graph.
$ bin/linux/x86_64/simpleperf report -g -i perf.data
```

See also
[report command's README](executable_commands_reference.md#The-report-command)
and `report.py -h`.

## Custom Report Interface

If the UIs above don't meet your needs, you can use
`simpleperf_report_lib.py` to parse perf.data, extract sample information, and
feed it to any views you like.

See
[simpleperf_report_lib.py's README](scripts_reference.md#simpleperf_report_libpy)
for more details.

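As a rough starting point, the sketch below totals event counts per symbol. The `ReportLib` method and field names (`SetRecordFile`, `GetNextSample`, `GetSymbolOfCurrentSample`, `Close`, `sample.period`, `symbol.symbol_name`) are written from memory of the scripts reference and should be checked against it; `read_samples` and `total_period_by_symbol` are hypothetical helper names. The import is done lazily so that the aggregation helper can be tried without the simpleperf scripts on the Python path.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def total_period_by_symbol(samples: Iterable[Tuple[str, int]]) -> Counter:
    """Sum the event count (period) of every sample, keyed by symbol name."""
    totals: Counter = Counter()
    for symbol_name, period in samples:
        totals[symbol_name] += period
    return totals


def read_samples(record_file: str = "perf.data") -> Iterator[Tuple[str, int]]:
    """Yield (symbol_name, period) for each sample in a perf.data file.

    Assumption: this runs from the simpleperf scripts directory, and the
    ReportLib names below match the ones in the scripts reference.
    """
    from simpleperf_report_lib import ReportLib  # lazy import, see lead-in

    lib = ReportLib()
    lib.SetRecordFile(record_file)
    try:
        while True:
            sample = lib.GetNextSample()
            if sample is None:
                break
            symbol = lib.GetSymbolOfCurrentSample()
            yield symbol.symbol_name, sample.period
    finally:
        lib.Close()


# Example with made-up data; with a real profile, pass read_samples() instead.
demo = total_period_by_symbol([("decode", 2), ("draw", 1), ("decode", 3)])
print(demo.most_common())
```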
405