# View the profile

[TOC]

## Introduction

After using `simpleperf record` or `app_profiler.py`, we get a profile data
file. The file contains a list of samples. Each sample has a timestamp, a thread
id, a callstack, the events (like cpu-cycles or cpu-clock) used in this sample,
etc. We have many choices for viewing the profile. We can show samples in
chronological order, or show aggregated flamegraphs. We can show reports in text
format, or in interactive UIs.

Below are some recommended UIs for viewing the profile. Google developers can
find more examples in
[go/gmm-profiling](go/gmm-profiling?polyglot=linux-workstation#viewing-the-profile).

## Continuous PProf UI (great flamegraph UI, but only available internally)

[PProf](https://github.com/google/pprof) is a mature profiling technology used
extensively on Google servers. It has a powerful flamegraph UI with strong
drilldown, search, pivot, profile diff, and graph visualisation.

We can use `pprof_proto_generator.py` to convert profiles into `pprof.profile`
protobufs for use in pprof.

```
# Output all threads, broken down by threadpool.
./pprof_proto_generator.py

# Use proguard mapping.
./pprof_proto_generator.py --proguard-mapping-file proguard.map

# Just the main (UI) thread (query by thread name):
./pprof_proto_generator.py --comm com.example.android.displayingbitmaps
```

This prints some debug logs about `Failed to read symbols`. This is usually OK,
unless those symbols are hotspots.

The continuous pprof server has a file upload size limit of 50MB. To get around
this limit, compress the profile before uploading:

```
gzip pprof.profile
```

After compressing, you can upload the `pprof.profile.gz` file to http://pprof/.
The website has an 'Upload' tab for this purpose.
Alternatively, you can use the following `pprof` command to upload the
compressed profile:

```
# Upload all threads in profile, grouped by threadpool.
# This is usually a good default, combining threads with similar names.
pprof --flame --tagroot threadpool pprof.profile.gz

# Upload all threads in profile, grouped by individual thread name.
pprof --flame --tagroot thread pprof.profile.gz

# Upload all threads in profile, without grouping by thread.
pprof --flame pprof.profile.gz

# Each command outputs a URL, for example:
# https://pprof.corp.google.com/?id=589a60852306144c880e36429e10b166
```

## Perfetto (preferred public chronological and flamegraph UI)

The [Perfetto UI](https://ui.perfetto.dev) is a web-based visualizer combining
the chronological view of the profile with a powerful flamegraph UI.

The Perfetto UI shows stack samples over time, exactly as collected by perf. It
lets you select a region of time and specific threads and/or processes, so that
only matching samples are analysed. Moreover, it has a flamegraph UI similar to
pprof's, with very similar drilldown, search and pivot functionality. Finally,
it has an SQL query language (PerfettoSQL) which allows programmatic queries on
profiles.

We can use `gecko_profile_generator.py` to convert raw perf.data files into the
Gecko format. Perfetto can also open raw perf.data files, but symbolization and
deobfuscation do not work out of the box.

```
# Create Gecko format profile
./gecko_profile_generator.py > gecko_profile.json

# Create Gecko format profile with Proguard map for deobfuscation
./gecko_profile_generator.py --proguard-mapping-file proguard.map > gecko_profile.json
```

Then drag-and-drop `gecko_profile.json` into https://ui.perfetto.dev/.
Alternatively, to open it from the command line, you can run:

```
curl -L https://github.com/google/perfetto/raw/main/tools/open_trace_in_ui | python - -i gecko_profile.json
```

Note: if running the above on a remote machine over SSH, you first need to
forward port `9001` to your local machine. For example, you could do this by
running:

```
ssh -fNT -L 9001:localhost:9001 <hostname>
```

## Firefox Profiler (great chronological UI)

We can view Android profiles using Firefox Profiler:
https://profiler.firefox.com/. This does not require installing Firefox:
Firefox Profiler is just a website, which you can open in any browser. There is
also an internal Google-hosted Firefox Profiler, at go/profiler or
go/firefox-profiler.

Firefox Profiler has a great chronological view, as it doesn't pre-aggregate
similar stack traces like pprof does.

We can use `gecko_profile_generator.py` to convert raw perf.data files into a
Firefox Profile, optionally with Proguard deobfuscation.

```
# Create Gecko Profile
./gecko_profile_generator.py | gzip > gecko_profile.json.gz

# Create Gecko Profile using Proguard map
./gecko_profile_generator.py --proguard-mapping-file proguard.map | gzip > gecko_profile.json.gz
```

Then drag-and-drop `gecko_profile.json.gz` into https://profiler.firefox.com/.

Firefox Profiler supports:

1. Aggregated Flamegraphs
2. Chronological Stackcharts

And allows filtering by:

1. Individual threads
2. Multiple threads (Ctrl+Click thread names to select many)
3. Timeline period
4. Stack frame text search

## FlameScope (great jank-finding UI)

[Netflix's FlameScope](https://github.com/Netflix/flamescope) is a rough,
proof-of-concept UI that lets you spot repeating patterns of work by laying out
the profile as a subsecond heatmap.

In the heatmap, each vertical stripe is one second, and each cell is 10ms.
Redder cells have more samples. See
https://www.brendangregg.com/blog/2018-11-08/flamescope-pattern-recognition.html
for how to spot patterns.

For example, consider a 60s profile of DisplayBitmaps app startup. The thick red
vertical line on the left is startup. The long white vertical sections on the
left show that the app is mostly idle, waiting for commands from instrumented
tests. Then we see periodic red blocks, which show that the app is periodically
busy handling commands from instrumented tests.

Click the start and end cells of a duration to see a flamegraph for that
duration.

Install and run FlameScope:

```
git clone https://github.com/Netflix/flamescope ~/flamescope
cd ~/flamescope
pip install -r requirements.txt
npm install
npm run webpack
python3 run.py
```

Then open FlameScope in your browser: http://localhost:5000/.

FlameScope can read gzipped profiles in `perf script` format. Convert simpleperf
perf.data to this format with `report_sample.py`, and place the result in
FlameScope's examples directory:

```
# Create `Linux perf script` format profile.
report_sample.py | gzip > ~/flamescope/examples/my_simpleperf_profile.gz

# Create `Linux perf script` format profile using Proguard map.
report_sample.py \
    --proguard-mapping-file proguard.map \
    | gzip > ~/flamescope/examples/my_simpleperf_profile.gz
```

Open the profile "as Linux Perf", and click the start and end cells to get a
flamegraph of that timespan.

To investigate UI thread jank, filter to UI thread samples only:

```
# Keep only samples from the UI thread.
report_sample.py \
    --comm com.example.android.displayingbitmaps \
    | gzip > ~/flamescope/examples/uithread.gz
```

Once you've identified the timespan of interest, consider also zooming into that
section with Firefox Profiler, which has a more powerful flamegraph viewer.
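
To make the subsecond heatmap layout more concrete, here is a minimal Python
sketch of the binning step. It places each sample timestamp in a one-second
column and a 10ms row, measured from the first sample, and counts the samples
in each cell. This is not FlameScope's own code. The function name
`subsecond_heatmap` is hypothetical, and the sketch assumes timestamps are
integers in nanoseconds.

```python
from collections import Counter
from typing import Iterable, List

NS_PER_SEC = 1_000_000_000
NS_PER_MS = 1_000_000


def subsecond_heatmap(timestamps_ns: Iterable[int], cell_ms: int = 10) -> List[List[int]]:
    """Counts samples per (second, subsecond cell), FlameScope-style.

    Hypothetical helper. Returns grid[second][row]: one column per second since
    the first sample, with 1000 // cell_ms rows per column, each covering
    cell_ms milliseconds. Timestamps are assumed to be in nanoseconds.
    """
    if cell_ms <= 0 or 1000 % cell_ms != 0:
        raise ValueError("cell_ms must be a positive divisor of 1000")
    timestamps = sorted(timestamps_ns)
    if not timestamps:
        return []
    rows_per_second = 1000 // cell_ms
    start = timestamps[0]
    counts: Counter = Counter()
    for ts in timestamps:
        offset = ts - start
        second = offset // NS_PER_SEC                          # which vertical stripe
        row = (offset % NS_PER_SEC) // (cell_ms * NS_PER_MS)   # which cell in that stripe
        counts[(second, row)] += 1
    num_seconds = (timestamps[-1] - start) // NS_PER_SEC + 1
    return [[counts[(s, r)] for r in range(rows_per_second)]
            for s in range(num_seconds)]
```

Colouring each cell by its count, with redder cells for more samples, produces
the heatmap. Selecting a start cell and an end cell can be thought of as keeping
only the samples whose timestamps fall between them and building a flamegraph
from their callstacks.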

## Differential FlameGraph

See Brendan Gregg's
[Differential Flame Graphs](https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html)
blog post.

Use Simpleperf's `stackcollapse.py` to convert perf.data to the Folded Stacks
format used by the FlameGraph toolkit.

Consider diffing in both directions: After minus Before, and Before minus After.

If you've recorded before and after your optimisation as `perf_before.data` and
`perf_after.data`, and you're only interested in the UI thread:

```
# Generate before and after folded stacks from perf.data files
./stackcollapse.py --kernel --jit -i perf_before.data \
    --proguard-mapping-file proguard_before.map \
    --comm com.example.android.displayingbitmaps \
    > perf_before.folded
./stackcollapse.py --kernel --jit -i perf_after.data \
    --proguard-mapping-file proguard_after.map \
    --comm com.example.android.displayingbitmaps \
    > perf_after.folded

# Generate diff reports. --negate is a flamegraph.pl option that swaps the
# red/blue hues for the reversed diff.
FlameGraph/difffolded.pl -n perf_before.folded perf_after.folded \
    | FlameGraph/flamegraph.pl > diff1.svg
FlameGraph/difffolded.pl -n perf_after.folded perf_before.folded \
    | FlameGraph/flamegraph.pl --negate > diff2.svg
```

## Android Studio Profiler

Android Studio Profiler supports recording and reporting profiles of app
processes. It supports several recording methods, including one that uses
simpleperf as the backend. You can use Android Studio Profiler for both
recording and reporting.

In Android Studio, open View -> Tool Windows -> Profiler, then click + -> Your
Device -> Profileable Processes -> Your App.

Click into the "CPU" chart.

Choose Callstack Sample Recording. Even if you're using Java, this provides
better observability into ART, malloc, and the kernel.

Click Record, run your test on the device, then click Stop when you're done.

Click on a thread track, then on "Flame Chart", to see a chronological chart on
the left and an aggregated flame chart on the right.

If you want more flexibility in recording options, or want to add a proguard
mapping file, you can record using simpleperf and report using Android Studio
Profiler.

We can use `simpleperf report-sample` to convert perf.data to trace files for
Android Studio Profiler.

```
# Convert perf.data to perf.trace for Android Studio Profiler.
# If on Mac/Windows, use simpleperf host executable for those platforms instead.
bin/linux/x86_64/simpleperf report-sample --show-callchain --protobuf -i perf.data -o perf.trace

# Convert perf.data to perf.trace using proguard mapping file.
bin/linux/x86_64/simpleperf report-sample --show-callchain --protobuf -i perf.data -o perf.trace \
    --proguard-mapping-file proguard.map
```

In Android Studio, open File -> Open -> select perf.trace.

## Simpleperf HTML Report

Simpleperf can generate its own HTML profile. It can show Android-specific
information and separate flamegraphs for all threads, but its flamegraph UI is
much rougher.

Because this UI is fairly rough, we recommend using the Continuous PProf UI or
Firefox Profiler instead. It is still useful for a quick look at your data.

Each of the following commands takes `./perf.data` as input and outputs
`./report.html`.

```
# Make an HTML report.
./report_html.py

# Make an HTML report with Proguard mapping.
./report_html.py --proguard-mapping-file proguard.map
```

This prints some debug logs about `Failed to read symbols`. This is usually OK,
unless those symbols are hotspots.

See also [report_html.py's README](scripts_reference.md#report_htmlpy) and
`report_html.py -h`.

## PProf Interactive Command Line

Unlike the Continuous PProf UI, the [PProf](https://github.com/google/pprof)
command line is publicly available. It allows drilldown, pivoting and filtering.

The session below demonstrates filtering to stack frames containing
`processBitmap`.

```
$ pprof pprof.profile
(pprof) show=processBitmap
(pprof) top
Active filters:
   show=processBitmap
Showing nodes accounting for 2.45s, 11.44% of 21.46s total
      flat  flat%   sum%        cum   cum%
     2.45s 11.44% 11.44%      2.45s 11.44%  com.example.android.displayingbitmaps.util.ImageFetcher.processBitmap
```

Then show the tags of those frames to see which threads they are running on:

```
(pprof) tags
 pid: Total 2.5s
      2.5s ( 100%): 31112

 thread: Total 2.5s
      1.4s (57.21%): AsyncTask #3
      1.1s (42.79%): AsyncTask #4

 threadpool: Total 2.5s
      2.5s ( 100%): AsyncTask #%d

 tid: Total 2.5s
      1.4s (57.21%): 31174
      1.1s (42.79%): 31175
```

Contrast with another method:

```
(pprof) show=addBitmapToCache
(pprof) top
Active filters:
   show=addBitmapToCache
Showing nodes accounting for 1.05s, 4.88% of 21.46s total
      flat  flat%   sum%        cum   cum%
     1.05s  4.88%  4.88%      1.05s  4.88%  com.example.android.displayingbitmaps.util.ImageCache.addBitmapToCache
```

For more information, see the
[pprof README](https://github.com/google/pprof/blob/main/doc/README.md#interactive-terminal-use).

## Simpleperf Report Command Line

The `simpleperf report` command reports profiles in text format.

You can call `simpleperf report` directly or call it via `report.py`.

```
# Report symbols in table format.
$ ./report.py --children

# Report call graph.
$ bin/linux/x86_64/simpleperf report -g -i perf.data
```

See also
[report command's README](executable_commands_reference.md#The-report-command)
and `report.py -h`.
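
The `flat` and `cum` columns in pprof's `top` output attribute the same samples
in two different ways. The Self and Children columns that `report.py --children`
prints follow the same idea. The minimal Python sketch below illustrates the
difference on hypothetical in-memory callstacks. As an assumption of this
sketch, each callstack is a root-first list of function names. Flat/self credits
a sample only to the leaf frame that was executing. Cum/children credits it to
every distinct function on the stack, counting each function once per sample so
that recursion is not double-counted. This illustrates the concept and is not
code from pprof or simpleperf. `flat_and_cum` and `percent` are hypothetical
names.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def flat_and_cum(stacks: Sequence[Sequence[str]]) -> Dict[str, Tuple[int, int]]:
    """Returns {function: (flat, cum)} sample counts (hypothetical helper).

    Each stack is a root-first list of function names; the last entry is the
    leaf frame that was executing when the sample was taken.
    """
    flat: Counter = Counter()
    cum: Counter = Counter()
    for stack in stacks:
        if not stack:
            continue
        flat[stack[-1]] += 1        # self time: only the executing frame
        for func in set(stack):     # inclusive time: each function once per
            cum[func] += 1          # sample, even if it recurses
    return {func: (flat[func], cum[func]) for func in cum}


def percent(count: int, total: int) -> float:
    """Share of all samples, as in the flat% and cum% columns."""
    return 100.0 * count / total if total else 0.0
```

Take the stacks `[main, run, processBitmap]` (twice) and `[main, idle]` (once).
Here `processBitmap` gets flat = cum = 2, while `main` gets flat = 0 and cum = 3,
because `main` is on every stack but never the executing frame.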

## Custom Report Interface

If none of the UIs above meet your needs, you can use
`simpleperf_report_lib.py` to parse perf.data, extract sample information, and
feed it to any views you like.

See
[simpleperf_report_lib.py's README](scripts_reference.md#simpleperf_report_libpy)
for more details.
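
As a possible starting point, the sketch below counts samples per thread. It
assumes the `ReportLib` class from `simpleperf_report_lib.py`, with the
`SetRecordFile`, `GetNextSample` and `Close` methods and the `thread_comm`,
`tid` and `period` sample fields used by simpleperf's own scripts such as
`report_sample.py`. Check the README above for the exact API of your simpleperf
version. The sketch also assumes it runs from simpleperf's scripts directory, or
that the directory is on `sys.path`. The helper names `iter_thread_samples` and
`samples_per_thread` are hypothetical. The aggregation step is kept separate
from reading perf.data so that it can be reused with other views or tested
without a recording.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def iter_thread_samples(record_file: str = "perf.data") -> Iterator[Tuple[str, int, int]]:
    """Yields (thread_name, tid, period) for every sample in record_file.

    Hypothetical helper. Assumes simpleperf's scripts directory is importable,
    so that simpleperf_report_lib (and the native library it loads) is found.
    """
    # Imported lazily so the aggregation helper below works without simpleperf.
    from simpleperf_report_lib import ReportLib

    lib = ReportLib()
    try:
        lib.SetRecordFile(record_file)
        while True:
            sample = lib.GetNextSample()
            if sample is None:  # all samples have been read
                break
            yield sample.thread_comm, sample.tid, sample.period
    finally:
        lib.Close()


def samples_per_thread(samples: Iterable[Tuple[str, int, int]],
                       weight_by_period: bool = False) -> Counter:
    """Totals samples per (thread_name, tid) (hypothetical helper).

    With weight_by_period=True, sums each sample's period (roughly, the event
    count the sample stands for) instead of counting samples.
    """
    totals: Counter = Counter()
    for name, tid, period in samples:
        totals[(name, tid)] += period if weight_by_period else 1
    return totals
```

For example, `samples_per_thread(iter_thread_samples("perf.data")).most_common(10)`
would list the ten threads with the most samples. You could then pass those
totals to whatever view you like.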