xref: /aosp_15_r20/external/perfetto/docs/analysis/metrics.md (revision 6dbdd20afdafa5e3ca9b8809fa73465d530080dc)
1# Trace-based metrics
2
3_The metrics subsystem is a part of the
4[trace processor](/docs/analysis/trace-processor.md) which uses traces to
5compute reproducible metrics. It can be used in a wide range of situations;
6examples include benchmarks, lab tests and on large corpuses of traces._
7
8![Block diagram of metrics](/docs/images/metrics-summary.png)
9
10## Quickstart
11
12The [quickstart](/docs/quickstart/trace-analysis.md) provides a quick overview
13on how to compute trace-based metrics traces using trace processor.
14
15## Introduction
16
17### Motivation
18
19Performance metrics are useful to monitor for the health of a system and ensure
20that a system does not regress over time as new features are added.
21
22However, metrics retrieved directly from the system have a downside: if there is
23a regression, it is difficult to root-cause the issue. Often, the problem may
24not be reproducible or may rely on a particular setup.
25
26Trace-based metrics are one possible solution to this problem. Instead of
27collecting metrics directly on the system, a trace is collected and metrics are
28computed from the trace. If a regression in the metric is spotted, the developer
29can look directly at the trace to understand why the regression has occurred
30instead of having to reproduce the issue.
31
32### Metric subsystem
33
34The metric subsystem is a part of the
35[trace processor](/docs/analysis/trace-processor.md) which executes SQL queries
36against traces and produces a metric which summarizes some performance attribute
37(e.g. CPU, memory, startup latency etc.).
38
39For example, generating the Android CPU metrics on a trace is as simple as:
40
41```python
42> ./trace_processor --run-metrics android_cpu <trace>
43android_cpu {
44  process_info {
45    name: "/system/bin/init"
46    threads {
47      name: "init"
48      core {
49        id: 1
50        metrics {
51          mcycles: 1
52          runtime_ns: 570365
53          min_freq_khz: 1900800
54          max_freq_khz: 1900800
55          avg_freq_khz: 1902017
56        }
57      }
58      ...
59    }
60    ...
61  }
62  ...
63}
64```
65
66## Metric development guide
67
68As metric writing requires a lot of iterations to get right, there are several
69tips which make the experience a lot smoother.
70
71### Hot reloading metrics
72To obtain the fastest possible iteration time when developing metrics,
73it's possible to hot reload any changes to SQL; this will skip over both
74recompilation (for builtin metrics) and trace load (for both builtin and
75custom metrics).
76
77To do this, trace processor is started in *interactive mode* while
78still specifying command line flags about which metrics should be run and
79the paths of any extensions. Then, in the REPL shell, the commands
80`.load-metrics-sql` (which causes any SQL on disk to be re-read) and
81`.run-metrics` (to run the metrics and print the result).
82
83For example, suppose we want to iterate on the `android_startup` metric. We
84can run the following commands from a Perfetto checkout:
85```python
86> ./tools/trace_processor --interactive \
87  --run-metrics android_startup \
88  --metric-extension src/trace_processor/metrics@/ \
89  --dev \
90  <trace>
91android_startup {
92  <contents of startup metric>
93}
94
95# Now make any changes you want to the SQL files related to the startup
96# metric. Even adding new files in the src/trace_processor/metric works.
97
98# Then, we can reload the changes using `.load-metrics-sql`.
99> .load-metrics-sql
100
101# We can rerun the changed metric using `.run-metrics`
102> .run-metrics
103android_startup {
104  <contents of changed startup metric>
105}
106```
107
108NOTE: see below about why `--dev` was required for this command.
109
110This also works for custom metrics specified on the command line:
111```python
112> ./tools/trace_processor -i --run-metrics /tmp/my_custom_metric.sql <trace>
113my_custom_metric {
114  <contents of my_custom_metric>
115}
116
117# Change the SQL file as before.
118
119> .load-metrics-sql
120> .run-metrics
121my_custom_metric {
122  <contents of changed my_custom_metric>
123}
124```
125
126WARNING: it is currently not possible to reload protos in the same way. If
127protos are changed, a recompile (for built-in metrics) and reinvoking
128trace processor is necessary to pick up the changes.
129
130WARNING: Deleted files from `--metric-extension` folders are *not* removed
131and will remain available e.g. to RUN_METRIC invocations.
132
133### Modifying built-in metric SQL without recompiling
134It is possible to override the SQL of built-in metrics at runtime without
135needing to recompile trace processor. To do this, the flag `--metric-extension`
136needs to be specified with the disk path where the built-metrics live and the
137special string `/` for the virtual path.
138
139For example, from inside a Perfetto checkout:
140```python
141> ./tools/trace_processor \
142  --run-metrics android_cpu \
143  --metric-extension src/trace_processor/metrics@/
144  --dev
145  <trace>
146```
147This will run the CPU metric using the live SQL in the repo *not* the SQL
148defintion built into the binary.
149
150NOTE: protos are *not* overriden in the same way - if any proto messages are
151changed a recompile of trace processor is required for the changes to be
152available.
153
154NOTE: the `--dev` flag is required for the use of this feature. This
155flag ensures that this feature is not accidentally in production as it is only
156intended for local development.
157
158WARNING: protos are *not* overriden in the same way - if any proto messages are
159changed a recompile of trace processor is required for the changes to be
160available.
161
162## Metric helper functions
163
164### RUN_METRIC
165`RUN_METRIC` allows you to run another metric file. This allows you to use views
166or tables defined in that file without repeatition.
167
168Conceptually, `RUN_METRIC` adds *composability* for SQL queries to break a big SQL
169metric into smaller, reusable files. This is similar to how functions allow decomposing
170large chunks in traditional programming languages.
171
172A simple usage of `RUN_METRIC` would be as follows:
173
174In file android/foo.sql:
175```sql
176CREATE VIEW view_defined_in_foo AS
177SELECT *
178FROM slice
179LIMIT 1;
180```
181
182In file android/bar.sql
183```sql
184SELECT RUN_METRIC('android/foo.sql');
185
186CREATE VIEW view_depending_on_view_from_foo AS
187SELECT *
188FROM view_defined_in_foo
189LIMIT 1;
190```
191
192`RUN_METRIC` also supports running *templated* metric files. Here's an example of
193what that looks like:
194
195In file android/slice_template.sql:
196```sql
197CREATE VIEW {{view_name}} AS
198SELECT *
199FROM slice
200WHERE slice.name = '{{slice_name}}';
201```
202
203In file android/metric.sql:
204```sql
205SELECT RUN_METRIC(
206  'android/slice_template.sql',
207  'view_name', 'choreographer_slices',
208  'slice_name', 'Chroeographer#doFrame'
209);
210
211CREATE VIEW long_choreographer_slices AS
212SELECT *
213FROM choreographer_slices
214WHERE dur > 1e6;
215```
216
217When running `slice_template.sql`, trace processor will substitute the arguments
218passed to `RUN_METRIC` into the templated file *before* executing the file using
219SQLite.
220
221In other words, this is what SQLite sees and executes in practice for the above
222example:
223```sql
224CREATE VIEW choreographer_slices AS
225SELECT *
226FROM slice
227WHERE slice.name = 'Chroeographer#doFrame';
228
229CREATE VIEW long_choreographer_slices AS
230SELECT *
231FROM choreographer_slices
232WHERE dur > 1e6;
233```
234
235The syntax for templated metric files is essentially a highly simplified version of
236[Jinja's](https://jinja.palletsprojects.com/en/3.0.x/) syntax.
237
238## Walkthrough: prototyping a metric
239
240TIP: To see how to add a new metric to trace processor, see the checklist
241[here](/docs/contributing/common-tasks.md#new-metric)
242
243This walkthrough will outline how to prototype a metric locally without needing
244to compile trace processor. This metric will compute the CPU time for every
245process in the trace and list the names of the top 5 processes (by CPU time) and
246the number of threads created by the process.
247
248NOTE: See this [GitHub gist][gist] to see how the code should look at the end of
249      the walkthrough. The prerequisites and Step 4 below give instructions on
250      how to get trace processor and run the metrics code.
251
252[gist]: https://gist.github.com/LalitMaganti/c221cf0cae17e298dfa82b118edf9080
253
254### Prerequisites
255
256As a setup step, create a folder to act as a scratch workspace; this folder will be referred to using the env variable `$WORKSPACE` in Step 4.
257
258The other requirement is trace processor. This can downloaded from [here](https://get.perfetto.dev/trace_processor) or can be built from source
259using the instructions [here](trace-processor.md). Whichever method is chosen, $TRACE_PROCESSOR env variable will be used to refer to the location of the binary in Step 4.
260
261### Step 1
262
263As all metrics in the metrics platform are defined using protos, the metric needs to be structured as a proto. For this metric, there needs to be some notion of a process name along with its CPU time and number of threads.
264
265Starting off, in a file named `top_five_processes.proto` in our workspace, create a basic proto message called ProcessInfo with those three fields:
266
267```protobuf
268message ProcessInfo {
269  optional string process_name = 1;
270  optional uint64 cpu_time_ms = 2;
271  optional uint32 num_threads = 3;
272}
273```
274
275Next , create a wrapping message which will hold the repeated field containing the top 5 processes.
276
277```protobuf
278message TopProcesses {
279  repeated ProcessInfo process_info = 1;
280}
281```
282
283Finally, define an extension to the root proto for all metrics (the [TraceMetrics](https://android.googlesource.com/platform/external/perfetto/+/HEAD/protos/perfetto/metrics/metrics.proto#39) proto).
284
285```protobuf
286extend TraceMetrics {
287  optional TopProcesses top_five_processes = 450;
288}
289```
290
291Adding this extension field allows trace processor to link the newly defined
292metric to the `TraceMetrics` proto.
293
294_Notes:_
295
296- The field ids 450-500 are reserved for local development so any of them can be used as the field id for the extension field.
297- The choice of field name here is important as the SQL file and the final table generated in SQL will be based on this name.
298
299Putting everything together, along with some boilerplate preamble gives:
300
301```protobuf
302syntax = "proto2";
303
304package perfetto.protos;
305
306import "protos/perfetto/metrics/metrics.proto";
307
308message ProcessInfo {
309  optional string process_name = 1;
310  optional int64 cpu_time_ms = 2;
311  optional uint32 num_threads = 3;
312}
313
314message TopProcesses {
315  repeated ProcessInfo process_info = 1;
316}
317
318extend TraceMetrics {
319  optional TopProcesses top_five_processes = 450;
320}
321```
322
323### Step 2
324
325Next, write the SQL to generate the table of the top 5 processes ordered by the
326sum of the CPU time they ran for and the number of threads which were associated
327with the process.
328
329The following SQL should be added to a file called `top_five_processes.sql` in
330the workspace:
331
332```sql
333CREATE VIEW top_five_processes_by_cpu AS
334SELECT
335  process.name as process_name,
336  CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms,
337  COUNT(DISTINCT utid) as num_threads
338FROM sched
339INNER JOIN thread USING(utid)
340INNER JOIN process USING(upid)
341GROUP BY process.name
342ORDER BY cpu_time_ms DESC
343LIMIT 5;
344```
345
346Let's break this query down:
347
3481. The first table used is the `sched` table. This contains all the scheduling
349   data available in the trace. Each scheduling "slice" is associated with a
350   thread which is uniquely identified in Perfetto traces using its `utid`. The
351   two pieces of information needed from the sched table are the `dur` -
352   short for duration, this is the amount of time the slice lasted - and the
353   `utid` which will be used to join with the thread table.
3542. The next table is the thread table. This gives us a lot of information which
355   is not particularly interesting (including its thread name) but it does give
356   us the `upid`. Similar to `utid`, `upid` is the unique identifier for a
357   process in a Perfetto trace. In this case, `upid` will refer to the process
358   which hosts the thread given by `utid`.
3593. The final table is the process table. This gives the name of the process
360   associated with the original sched slice.
3614. With the process, thread and duration for each sched slice, all the slices
362   for a single processes are collected and their durations summed to get the
363   CPU time (dividing by 1e6 as sched's duration is in nanoseconds) and
364   the number of distinct threads.
3655. Finally, we order by the cpu time and limit to the top 5 results.
366
367### Step 3
368
369Now that the result of the metric has been expressed as an SQL table, it needs
370to be converted to a proto. The metrics platform has built-in support for emitting
371protos using SQL functions; something which is used extensively in this step.
372
373Let's look at how it works for our table above.
374
375```sql
376CREATE VIEW top_five_processes_output AS
377SELECT TopProcesses(
378  'process_info', (
379    SELECT RepeatedField(
380      ProcessInfo(
381        'process_name', process_name,
382        'cpu_time_ms', cpu_time_ms,
383        'num_threads', num_threads
384      )
385    )
386    FROM top_five_processes_by_cpu
387  )
388);
389```
390
391Breaking this down again:
392
3931. Starting from the inner-most SELECT statement, there is what looks like
394   a function call to the ProcessInfo function; in fact this is no coincidence.
395   For each proto that the metrics platform knows about, an SQL function is
396   generated with the same name as the proto. This function takes key value
397   pairs with the key as the name of the proto field to fill and the value being
398   the data to store in the field. The output is the proto created by writing
399   the fields described in the function. (\*)
400
401   In this case, this function is called once for each row in the
402   `top_five_processes_by_cpu` table. The output will be the fully filled
403   ProcessInfo proto.
404
405   The call to the `RepeatedField` function is the most interesting part and
406   also the most important. In technical terms, `RepeatedField` is an aggregate
407   function. Practically, this means that it takes a full table of values and
408   generates a single array which contains all the values passed to it.
409
410   Therefore, the output of this whole SELECT statement is an array of 5
411   ProcessInfo protos.
412
4132. Next is creation of the `TopProcesses` proto. By now, the syntax should
414   already feel somewhat familiar; the proto builder function is called to fill
415   in the `process_info` field with the array of protos from the inner function.
416
417   The output of this SELECT is a single `TopProcesses` proto containing the
418   ProcessInfos as a repeated field.
419
4203. Finally, the view is created. This view is specially named to allow the
421   metrics platform to query it to obtain the root proto for each metric
422   (in this case `TopProcesses`). See the note below as to the pattern behind
423   this view's name.
424
425(\*) _This is not strictly true. To type-check the protos, some metadata
426is returned about the type of the proto but this is unimportant for metric
427authors._
428
429NOTE: It is important that the views be named {name of TraceMetrics extension
430      field}\_output. This is the pattern used and expected by the metrics
431      platform for all metrics.
432
433The final file should look like so:
434
435```sql
436CREATE VIEW top_five_processes_by_cpu AS
437SELECT
438  process.name as process_name,
439  CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms,
440  COUNT(DISTINCT utid) as num_threads
441FROM sched
442INNER JOIN thread USING(utid)
443INNER JOIN process USING(upid)
444GROUP BY process.name
445ORDER BY cpu_time_ms DESC
446LIMIT 5;
447
448CREATE VIEW top_five_processes_output AS
449SELECT TopProcesses(
450  'process_info', (
451    SELECT RepeatedField(
452      ProcessInfo(
453        'process_name', process_name,
454        'cpu_time_ms', cpu_time_ms,
455        'num_threads', num_threads
456      )
457    )
458    FROM top_five_processes_by_cpu
459  )
460);
461```
462
463NOTE: The name of the SQL file should be the same as the name of TraceMetrics
464      extension field. This is to allow the metrics platform to associated the
465      proto extension field with the SQL which needs to be run to generate it.
466
467### Step 4
468
469For this step, invoke trace processor shell to run the metrics (see the
470[Quickstart](/docs/quickstart/trace-analysis.md) for downloading instructions):
471
472```shell
473$TRACE_PROCESSOR --run-metrics $WORKSPACE/top_five_processes.sql $TRACE 2> /dev/null
474```
475
476(For an example trace to test this on, see the Notes section below.)
477
478By passing the SQL file for the metric to be computed, trace processor uses the name of this file to find the proto and to figure out the name of the output table for the proto and the name of the extension field for `TraceMetrics`; this is the reason it was important to choose the names of these other objects carefully.
479
480_Notes:_
481
482- If something doesn't work as intended, check that the workspace looks the same as the contents of this [GitHub gist](https://gist.github.com/LalitMaganti/c221cf0cae17e298dfa82b118edf9080).
483- A good example trace for this metric is the Android example trace used by the Perfetto UI found [here](https://storage.googleapis.com/perfetto-misc/example_android_trace_30s_1).
484- stderr is redirected to remove any noise from parsing the trace that trace processor generates.
485
486If everything went successfully, the following output should be visible (specifically this is the output for the Android example trace linked above):
487
488```
489[perfetto.protos.top_five_processes] {
490  process_info {
491    process_name: "com.google.android.GoogleCamera"
492    cpu_time_ms: 15154
493    num_threads: 125
494  }
495  process_info {
496    process_name: "sugov:4"
497    cpu_time_ms: 6846
498    num_threads: 1
499  }
500  process_info {
501    process_name: "system_server"
502    cpu_time_ms: 6809
503    num_threads: 66
504  }
505  process_info {
506    process_name: "cds_ol_rx_threa"
507    cpu_time_ms: 6684
508    num_threads: 1
509  }
510  process_info {
511    process_name: "com.android.chrome"
512    cpu_time_ms: 5125
513    num_threads: 49
514  }
515}
516```
517
518## Next steps
519
520* The [common tasks](/docs/contributing/common-tasks.md) page gives a list of
521  steps on how new metrics can be added to the trace processor.
522
523## Appendix: Case for upstreaming
524
525NOTE: Googlers: for internal usage of metrics in Google3 (i.e. metrics which are
526confidential), please see [this internal page](https://goto.google.com/viecd).
527
528Authors are strongly encouraged to add all metrics derived on Perfetto traces to
529the Perfetto repo unless there is a clear usecase (e.g. confidentiality) why
530these metrics should not be publicly available.
531
532In return for upstreaming metrics, authors will have first class support for
533running metrics locally and the confidence that their metrics will remain stable
534as trace processor is developed.
535
536As well as scaling upwards while developing from running on a single trace
537locally to running on a large set of traces, the reverse is also very useful.
538When an anomaly is observed in the metrics of a lab benchmark, a representative
539trace can be downloaded and the same metric can be run locally in trace
540processor.
541
542Since the same code is running locally and remotely, developers can be confident
543in reproducing the issue and use the trace processor and/or the Perfetto UI to
544identify the problem.
545