1# Trace-based metrics 2 3_The metrics subsystem is a part of the 4[trace processor](/docs/analysis/trace-processor.md) which uses traces to 5compute reproducible metrics. It can be used in a wide range of situations; 6examples include benchmarks, lab tests and on large corpuses of traces._ 7 8 9 10## Quickstart 11 12The [quickstart](/docs/quickstart/trace-analysis.md) provides a quick overview 13on how to compute trace-based metrics traces using trace processor. 14 15## Introduction 16 17### Motivation 18 19Performance metrics are useful to monitor for the health of a system and ensure 20that a system does not regress over time as new features are added. 21 22However, metrics retrieved directly from the system have a downside: if there is 23a regression, it is difficult to root-cause the issue. Often, the problem may 24not be reproducible or may rely on a particular setup. 25 26Trace-based metrics are one possible solution to this problem. Instead of 27collecting metrics directly on the system, a trace is collected and metrics are 28computed from the trace. If a regression in the metric is spotted, the developer 29can look directly at the trace to understand why the regression has occurred 30instead of having to reproduce the issue. 31 32### Metric subsystem 33 34The metric subsystem is a part of the 35[trace processor](/docs/analysis/trace-processor.md) which executes SQL queries 36against traces and produces a metric which summarizes some performance attribute 37(e.g. CPU, memory, startup latency etc.). 38 39For example, generating the Android CPU metrics on a trace is as simple as: 40 41```python 42> ./trace_processor --run-metrics android_cpu <trace> 43android_cpu { 44 process_info { 45 name: "/system/bin/init" 46 threads { 47 name: "init" 48 core { 49 id: 1 50 metrics { 51 mcycles: 1 52 runtime_ns: 570365 53 min_freq_khz: 1900800 54 max_freq_khz: 1900800 55 avg_freq_khz: 1902017 56 } 57 } 58 ... 59 } 60 ... 61 } 62 ... 63} 64``` 65 66## Metric development guide 67 68As metric writing requires a lot of iterations to get right, there are several 69tips which make the experience a lot smoother. 70 71### Hot reloading metrics 72To obtain the fastest possible iteration time when developing metrics, 73it's possible to hot reload any changes to SQL; this will skip over both 74recompilation (for builtin metrics) and trace load (for both builtin and 75custom metrics). 76 77To do this, trace processor is started in *interactive mode* while 78still specifying command line flags about which metrics should be run and 79the paths of any extensions. Then, in the REPL shell, the commands 80`.load-metrics-sql` (which causes any SQL on disk to be re-read) and 81`.run-metrics` (to run the metrics and print the result). 82 83For example, suppose we want to iterate on the `android_startup` metric. We 84can run the following commands from a Perfetto checkout: 85```python 86> ./tools/trace_processor --interactive \ 87 --run-metrics android_startup \ 88 --metric-extension src/trace_processor/metrics@/ \ 89 --dev \ 90 <trace> 91android_startup { 92 <contents of startup metric> 93} 94 95# Now make any changes you want to the SQL files related to the startup 96# metric. Even adding new files in the src/trace_processor/metric works. 97 98# Then, we can reload the changes using `.load-metrics-sql`. 99> .load-metrics-sql 100 101# We can rerun the changed metric using `.run-metrics` 102> .run-metrics 103android_startup { 104 <contents of changed startup metric> 105} 106``` 107 108NOTE: see below about why `--dev` was required for this command. 109 110This also works for custom metrics specified on the command line: 111```python 112> ./tools/trace_processor -i --run-metrics /tmp/my_custom_metric.sql <trace> 113my_custom_metric { 114 <contents of my_custom_metric> 115} 116 117# Change the SQL file as before. 118 119> .load-metrics-sql 120> .run-metrics 121my_custom_metric { 122 <contents of changed my_custom_metric> 123} 124``` 125 126WARNING: it is currently not possible to reload protos in the same way. If 127protos are changed, a recompile (for built-in metrics) and reinvoking 128trace processor is necessary to pick up the changes. 129 130WARNING: Deleted files from `--metric-extension` folders are *not* removed 131and will remain available e.g. to RUN_METRIC invocations. 132 133### Modifying built-in metric SQL without recompiling 134It is possible to override the SQL of built-in metrics at runtime without 135needing to recompile trace processor. To do this, the flag `--metric-extension` 136needs to be specified with the disk path where the built-metrics live and the 137special string `/` for the virtual path. 138 139For example, from inside a Perfetto checkout: 140```python 141> ./tools/trace_processor \ 142 --run-metrics android_cpu \ 143 --metric-extension src/trace_processor/metrics@/ 144 --dev 145 <trace> 146``` 147This will run the CPU metric using the live SQL in the repo *not* the SQL 148defintion built into the binary. 149 150NOTE: protos are *not* overriden in the same way - if any proto messages are 151changed a recompile of trace processor is required for the changes to be 152available. 153 154NOTE: the `--dev` flag is required for the use of this feature. This 155flag ensures that this feature is not accidentally in production as it is only 156intended for local development. 157 158WARNING: protos are *not* overriden in the same way - if any proto messages are 159changed a recompile of trace processor is required for the changes to be 160available. 161 162## Metric helper functions 163 164### RUN_METRIC 165`RUN_METRIC` allows you to run another metric file. This allows you to use views 166or tables defined in that file without repeatition. 167 168Conceptually, `RUN_METRIC` adds *composability* for SQL queries to break a big SQL 169metric into smaller, reusable files. This is similar to how functions allow decomposing 170large chunks in traditional programming languages. 171 172A simple usage of `RUN_METRIC` would be as follows: 173 174In file android/foo.sql: 175```sql 176CREATE VIEW view_defined_in_foo AS 177SELECT * 178FROM slice 179LIMIT 1; 180``` 181 182In file android/bar.sql 183```sql 184SELECT RUN_METRIC('android/foo.sql'); 185 186CREATE VIEW view_depending_on_view_from_foo AS 187SELECT * 188FROM view_defined_in_foo 189LIMIT 1; 190``` 191 192`RUN_METRIC` also supports running *templated* metric files. Here's an example of 193what that looks like: 194 195In file android/slice_template.sql: 196```sql 197CREATE VIEW {{view_name}} AS 198SELECT * 199FROM slice 200WHERE slice.name = '{{slice_name}}'; 201``` 202 203In file android/metric.sql: 204```sql 205SELECT RUN_METRIC( 206 'android/slice_template.sql', 207 'view_name', 'choreographer_slices', 208 'slice_name', 'Chroeographer#doFrame' 209); 210 211CREATE VIEW long_choreographer_slices AS 212SELECT * 213FROM choreographer_slices 214WHERE dur > 1e6; 215``` 216 217When running `slice_template.sql`, trace processor will substitute the arguments 218passed to `RUN_METRIC` into the templated file *before* executing the file using 219SQLite. 220 221In other words, this is what SQLite sees and executes in practice for the above 222example: 223```sql 224CREATE VIEW choreographer_slices AS 225SELECT * 226FROM slice 227WHERE slice.name = 'Chroeographer#doFrame'; 228 229CREATE VIEW long_choreographer_slices AS 230SELECT * 231FROM choreographer_slices 232WHERE dur > 1e6; 233``` 234 235The syntax for templated metric files is essentially a highly simplified version of 236[Jinja's](https://jinja.palletsprojects.com/en/3.0.x/) syntax. 237 238## Walkthrough: prototyping a metric 239 240TIP: To see how to add a new metric to trace processor, see the checklist 241[here](/docs/contributing/common-tasks.md#new-metric) 242 243This walkthrough will outline how to prototype a metric locally without needing 244to compile trace processor. This metric will compute the CPU time for every 245process in the trace and list the names of the top 5 processes (by CPU time) and 246the number of threads created by the process. 247 248NOTE: See this [GitHub gist][gist] to see how the code should look at the end of 249 the walkthrough. The prerequisites and Step 4 below give instructions on 250 how to get trace processor and run the metrics code. 251 252[gist]: https://gist.github.com/LalitMaganti/c221cf0cae17e298dfa82b118edf9080 253 254### Prerequisites 255 256As a setup step, create a folder to act as a scratch workspace; this folder will be referred to using the env variable `$WORKSPACE` in Step 4. 257 258The other requirement is trace processor. This can downloaded from [here](https://get.perfetto.dev/trace_processor) or can be built from source 259using the instructions [here](trace-processor.md). Whichever method is chosen, $TRACE_PROCESSOR env variable will be used to refer to the location of the binary in Step 4. 260 261### Step 1 262 263As all metrics in the metrics platform are defined using protos, the metric needs to be structured as a proto. For this metric, there needs to be some notion of a process name along with its CPU time and number of threads. 264 265Starting off, in a file named `top_five_processes.proto` in our workspace, create a basic proto message called ProcessInfo with those three fields: 266 267```protobuf 268message ProcessInfo { 269 optional string process_name = 1; 270 optional uint64 cpu_time_ms = 2; 271 optional uint32 num_threads = 3; 272} 273``` 274 275Next , create a wrapping message which will hold the repeated field containing the top 5 processes. 276 277```protobuf 278message TopProcesses { 279 repeated ProcessInfo process_info = 1; 280} 281``` 282 283Finally, define an extension to the root proto for all metrics (the [TraceMetrics](https://android.googlesource.com/platform/external/perfetto/+/HEAD/protos/perfetto/metrics/metrics.proto#39) proto). 284 285```protobuf 286extend TraceMetrics { 287 optional TopProcesses top_five_processes = 450; 288} 289``` 290 291Adding this extension field allows trace processor to link the newly defined 292metric to the `TraceMetrics` proto. 293 294_Notes:_ 295 296- The field ids 450-500 are reserved for local development so any of them can be used as the field id for the extension field. 297- The choice of field name here is important as the SQL file and the final table generated in SQL will be based on this name. 298 299Putting everything together, along with some boilerplate preamble gives: 300 301```protobuf 302syntax = "proto2"; 303 304package perfetto.protos; 305 306import "protos/perfetto/metrics/metrics.proto"; 307 308message ProcessInfo { 309 optional string process_name = 1; 310 optional int64 cpu_time_ms = 2; 311 optional uint32 num_threads = 3; 312} 313 314message TopProcesses { 315 repeated ProcessInfo process_info = 1; 316} 317 318extend TraceMetrics { 319 optional TopProcesses top_five_processes = 450; 320} 321``` 322 323### Step 2 324 325Next, write the SQL to generate the table of the top 5 processes ordered by the 326sum of the CPU time they ran for and the number of threads which were associated 327with the process. 328 329The following SQL should be added to a file called `top_five_processes.sql` in 330the workspace: 331 332```sql 333CREATE VIEW top_five_processes_by_cpu AS 334SELECT 335 process.name as process_name, 336 CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms, 337 COUNT(DISTINCT utid) as num_threads 338FROM sched 339INNER JOIN thread USING(utid) 340INNER JOIN process USING(upid) 341GROUP BY process.name 342ORDER BY cpu_time_ms DESC 343LIMIT 5; 344``` 345 346Let's break this query down: 347 3481. The first table used is the `sched` table. This contains all the scheduling 349 data available in the trace. Each scheduling "slice" is associated with a 350 thread which is uniquely identified in Perfetto traces using its `utid`. The 351 two pieces of information needed from the sched table are the `dur` - 352 short for duration, this is the amount of time the slice lasted - and the 353 `utid` which will be used to join with the thread table. 3542. The next table is the thread table. This gives us a lot of information which 355 is not particularly interesting (including its thread name) but it does give 356 us the `upid`. Similar to `utid`, `upid` is the unique identifier for a 357 process in a Perfetto trace. In this case, `upid` will refer to the process 358 which hosts the thread given by `utid`. 3593. The final table is the process table. This gives the name of the process 360 associated with the original sched slice. 3614. With the process, thread and duration for each sched slice, all the slices 362 for a single processes are collected and their durations summed to get the 363 CPU time (dividing by 1e6 as sched's duration is in nanoseconds) and 364 the number of distinct threads. 3655. Finally, we order by the cpu time and limit to the top 5 results. 366 367### Step 3 368 369Now that the result of the metric has been expressed as an SQL table, it needs 370to be converted to a proto. The metrics platform has built-in support for emitting 371protos using SQL functions; something which is used extensively in this step. 372 373Let's look at how it works for our table above. 374 375```sql 376CREATE VIEW top_five_processes_output AS 377SELECT TopProcesses( 378 'process_info', ( 379 SELECT RepeatedField( 380 ProcessInfo( 381 'process_name', process_name, 382 'cpu_time_ms', cpu_time_ms, 383 'num_threads', num_threads 384 ) 385 ) 386 FROM top_five_processes_by_cpu 387 ) 388); 389``` 390 391Breaking this down again: 392 3931. Starting from the inner-most SELECT statement, there is what looks like 394 a function call to the ProcessInfo function; in fact this is no coincidence. 395 For each proto that the metrics platform knows about, an SQL function is 396 generated with the same name as the proto. This function takes key value 397 pairs with the key as the name of the proto field to fill and the value being 398 the data to store in the field. The output is the proto created by writing 399 the fields described in the function. (\*) 400 401 In this case, this function is called once for each row in the 402 `top_five_processes_by_cpu` table. The output will be the fully filled 403 ProcessInfo proto. 404 405 The call to the `RepeatedField` function is the most interesting part and 406 also the most important. In technical terms, `RepeatedField` is an aggregate 407 function. Practically, this means that it takes a full table of values and 408 generates a single array which contains all the values passed to it. 409 410 Therefore, the output of this whole SELECT statement is an array of 5 411 ProcessInfo protos. 412 4132. Next is creation of the `TopProcesses` proto. By now, the syntax should 414 already feel somewhat familiar; the proto builder function is called to fill 415 in the `process_info` field with the array of protos from the inner function. 416 417 The output of this SELECT is a single `TopProcesses` proto containing the 418 ProcessInfos as a repeated field. 419 4203. Finally, the view is created. This view is specially named to allow the 421 metrics platform to query it to obtain the root proto for each metric 422 (in this case `TopProcesses`). See the note below as to the pattern behind 423 this view's name. 424 425(\*) _This is not strictly true. To type-check the protos, some metadata 426is returned about the type of the proto but this is unimportant for metric 427authors._ 428 429NOTE: It is important that the views be named {name of TraceMetrics extension 430 field}\_output. This is the pattern used and expected by the metrics 431 platform for all metrics. 432 433The final file should look like so: 434 435```sql 436CREATE VIEW top_five_processes_by_cpu AS 437SELECT 438 process.name as process_name, 439 CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms, 440 COUNT(DISTINCT utid) as num_threads 441FROM sched 442INNER JOIN thread USING(utid) 443INNER JOIN process USING(upid) 444GROUP BY process.name 445ORDER BY cpu_time_ms DESC 446LIMIT 5; 447 448CREATE VIEW top_five_processes_output AS 449SELECT TopProcesses( 450 'process_info', ( 451 SELECT RepeatedField( 452 ProcessInfo( 453 'process_name', process_name, 454 'cpu_time_ms', cpu_time_ms, 455 'num_threads', num_threads 456 ) 457 ) 458 FROM top_five_processes_by_cpu 459 ) 460); 461``` 462 463NOTE: The name of the SQL file should be the same as the name of TraceMetrics 464 extension field. This is to allow the metrics platform to associated the 465 proto extension field with the SQL which needs to be run to generate it. 466 467### Step 4 468 469For this step, invoke trace processor shell to run the metrics (see the 470[Quickstart](/docs/quickstart/trace-analysis.md) for downloading instructions): 471 472```shell 473$TRACE_PROCESSOR --run-metrics $WORKSPACE/top_five_processes.sql $TRACE 2> /dev/null 474``` 475 476(For an example trace to test this on, see the Notes section below.) 477 478By passing the SQL file for the metric to be computed, trace processor uses the name of this file to find the proto and to figure out the name of the output table for the proto and the name of the extension field for `TraceMetrics`; this is the reason it was important to choose the names of these other objects carefully. 479 480_Notes:_ 481 482- If something doesn't work as intended, check that the workspace looks the same as the contents of this [GitHub gist](https://gist.github.com/LalitMaganti/c221cf0cae17e298dfa82b118edf9080). 483- A good example trace for this metric is the Android example trace used by the Perfetto UI found [here](https://storage.googleapis.com/perfetto-misc/example_android_trace_30s_1). 484- stderr is redirected to remove any noise from parsing the trace that trace processor generates. 485 486If everything went successfully, the following output should be visible (specifically this is the output for the Android example trace linked above): 487 488``` 489[perfetto.protos.top_five_processes] { 490 process_info { 491 process_name: "com.google.android.GoogleCamera" 492 cpu_time_ms: 15154 493 num_threads: 125 494 } 495 process_info { 496 process_name: "sugov:4" 497 cpu_time_ms: 6846 498 num_threads: 1 499 } 500 process_info { 501 process_name: "system_server" 502 cpu_time_ms: 6809 503 num_threads: 66 504 } 505 process_info { 506 process_name: "cds_ol_rx_threa" 507 cpu_time_ms: 6684 508 num_threads: 1 509 } 510 process_info { 511 process_name: "com.android.chrome" 512 cpu_time_ms: 5125 513 num_threads: 49 514 } 515} 516``` 517 518## Next steps 519 520* The [common tasks](/docs/contributing/common-tasks.md) page gives a list of 521 steps on how new metrics can be added to the trace processor. 522 523## Appendix: Case for upstreaming 524 525NOTE: Googlers: for internal usage of metrics in Google3 (i.e. metrics which are 526confidential), please see [this internal page](https://goto.google.com/viecd). 527 528Authors are strongly encouraged to add all metrics derived on Perfetto traces to 529the Perfetto repo unless there is a clear usecase (e.g. confidentiality) why 530these metrics should not be publicly available. 531 532In return for upstreaming metrics, authors will have first class support for 533running metrics locally and the confidence that their metrics will remain stable 534as trace processor is developed. 535 536As well as scaling upwards while developing from running on a single trace 537locally to running on a large set of traces, the reverse is also very useful. 538When an anomaly is observed in the metrics of a lab benchmark, a representative 539trace can be downloaded and the same metric can be run locally in trace 540processor. 541 542Since the same code is running locally and remotely, developers can be confident 543in reproducing the issue and use the trace processor and/or the Perfetto UI to 544identify the problem. 545