# Batch Trace Processor

_The Batch Trace Processor is a Python library wrapping the
[Trace Processor](/docs/analysis/trace-processor.md): it allows fast (<1s)
interactive queries on large sets (up to ~1000) of traces._

## Installation

Batch Trace Processor is part of the `perfetto` Python library and can be
installed by running:

```shell
pip3 install pandas       # prerequisite for Batch Trace Processor
pip3 install perfetto
```

## Loading traces
NOTE: if you are a Googler, have a look at
[go/perfetto-btp-load-internal](http://goto.corp.google.com/perfetto-btp-load-internal) for how to load traces from Google-internal sources.

The simplest way to load traces is to pass a list of file paths:
```python
from perfetto.batch_trace_processor.api import BatchTraceProcessor

files = [
  'traces/slow-start.pftrace',
  'traces/oom.pftrace',
  'traces/high-battery-drain.pftrace',
]
with BatchTraceProcessor(files) as btp:
  btp.query('...')
```

[glob](https://docs.python.org/3/library/glob.html) can be used to load
all traces in a directory:
```python
import glob

from perfetto.batch_trace_processor.api import BatchTraceProcessor

files = glob.glob('traces/*.pftrace')
with BatchTraceProcessor(files) as btp:
  btp.query('...')
```

NOTE: loading too many traces can cause out-of-memory issues: see
[this](/docs/analysis/batch-trace-processor#memory-usage) section for details.

A common requirement is to load traces located in the cloud or by sending
a request to a server. To support this use case, traces can also be loaded
using [trace URIs](/docs/analysis/batch-trace-processor#trace-uris):
```python
from perfetto.batch_trace_processor.api import BatchTraceProcessor
from perfetto.batch_trace_processor.api import BatchTraceProcessorConfig
from perfetto.trace_processor.api import TraceProcessorConfig
from perfetto.trace_uri_resolver.registry import ResolverRegistry
from perfetto.trace_uri_resolver.resolver import TraceUriResolver

class FooResolver(TraceUriResolver):
  # See "Trace URIs" section below for how to implement a URI resolver.
  ...

config = BatchTraceProcessorConfig(
  # See "Trace URIs" below
)
# Key-value pairs in a trace URI are separated by ';' (see "Trace URIs").
with BatchTraceProcessor('foo:bar=1;baz=abc', config=config) as btp:
  btp.query('...')
```

## Writing queries
Writing queries with batch trace processor works very similarly to the
[Python API](/docs/analysis/trace-processor#python-api).

For example, to get a count of the number of userspace slices:
```python
>>> btp.query('select count(1) from slice')
[  count(1)
0  2092592,   count(1)
0   156071,   count(1)
0   121431]
```
The return value of `query` is a list of [Pandas](https://pandas.pydata.org/)
dataframes, one for each trace loaded.

A common requirement is for all of the traces to be flattened into a
single dataframe instead of getting one dataframe per trace. To support this,
the `query_and_flatten` function can be used:
```python
>>> btp.query_and_flatten('select count(1) from slice')
  count(1)
0  2092592
1   156071
2   121431
```

`query_and_flatten` also implicitly adds columns indicating the originating
trace. The exact columns added depend on the resolver being used: consult your
resolver's documentation for more information.
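
The following minimal pandas sketch shows what flattening means conceptually. It assumes that the per-trace results are concatenated row-wise and that each row is tagged with its trace's metadata. The `flatten_results` helper and the `trace_name` key are hypothetical illustrations and are not part of the `perfetto` API. The columns that batch trace processor actually adds come from your resolver.

```python
from typing import Any, Dict, List

import pandas as pd


def flatten_results(
    dfs: List[pd.DataFrame], metadata: List[Dict[str, Any]]
) -> pd.DataFrame:
  """Hypothetical sketch: merge per-trace dataframes into one dataframe.

  Each per-trace dataframe gets one constant column per metadata key (its
  trace's metadata) before all frames are concatenated row-wise.
  """
  if len(dfs) != len(metadata):
    raise ValueError('need exactly one metadata dict per dataframe')
  tagged = [df.assign(**meta) for df, meta in zip(dfs, metadata)]
  return pd.concat(tagged, ignore_index=True)


# Hypothetical stand-ins for the per-trace results of btp.query(...).
per_trace = [
  pd.DataFrame({'count(1)': [2092592]}),
  pd.DataFrame({'count(1)': [156071]}),
]
meta = [{'trace_name': 'slow-start'}, {'trace_name': 'oom'}]
print(flatten_results(per_trace, meta))
```

Because the metadata is attached before concatenation, rows from different traces remain distinguishable after they are merged. This is what makes a flattened dataframe useful for `groupby`-style analysis across traces.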

## Trace URIs
Trace URIs are a powerful feature of the batch trace processor. URIs decouple
the notion of "paths" to traces from the filesystem. Instead, the URI
describes *how* a trace should be fetched (e.g. by sending an HTTP request
to a server, from cloud storage, etc.).

The syntax of trace URIs is similar to that of web
[URLs](https://en.wikipedia.org/wiki/URL). Formally, a trace URI has the
structure:
```
Trace URI = protocol:key1=val1(;keyn=valn)*
```

As an example:
```
gcs:bucket=foo;path=bar
```
would indicate that traces should be fetched using the protocol `gcs`
([Google Cloud Storage](https://cloud.google.com/storage)) with traces
located at bucket `foo` and path `bar` in the bucket.

NOTE: the `gcs` resolver is *not* actually included: it is given only as an
easy-to-understand example.
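
The URI grammar can be sketched as a small parser. `split_trace_uri` is a hypothetical helper written for illustration and is not the parser that the `perfetto` library uses. It assumes the protocol ends at the first `:` and that keys and values never contain `;`.

```python
from typing import Dict, Tuple


def split_trace_uri(uri: str) -> Tuple[str, Dict[str, str]]:
  """Hypothetical parser for `protocol:key1=val1(;keyn=valn)*`.

  Splits on the first ':' to get the protocol, then on ';' to get the
  key-value pairs. Each pair is split on its first '='.
  """
  protocol, sep, args = uri.partition(':')
  if not sep or not protocol:
    raise ValueError(f'not a trace URI: {uri!r}')
  pairs: Dict[str, str] = {}
  for arg in args.split(';'):
    key, eq, value = arg.partition('=')
    if not eq or not key:
      raise ValueError(f'malformed key-value pair {arg!r} in {uri!r}')
    pairs[key] = value
  return protocol, pairs


print(split_trace_uri('gcs:bucket=foo;path=bar'))
# ('gcs', {'bucket': 'foo', 'path': 'bar'})
```

The grammar requires at least one `key=value` pair, so the sketch rejects URIs such as `gcs:` and strings with no protocol at all.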

URIs are only part of the puzzle: ultimately, batch trace processor still needs
the bytes of the traces to be able to parse and query them. The job of
converting URIs to trace bytes is left to *resolvers*: Python classes, each
associated with a *protocol*, which use the key-value pairs in the URI to look
up the traces to be parsed.

By default, batch trace processor ships with only a single resolver, which knows
how to look up filesystem paths; however, custom resolvers can easily be
created and registered. See the documentation on the
[TraceUriResolver class](https://cs.android.com/android/platform/superproject/main/+/main:external/perfetto/python/perfetto/trace_uri_resolver/resolver.py;l=56?q=resolver.py)
for information on how to do this.
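
Conceptually, resolving a URI means picking the resolver registered for the URI's protocol and passing it the key-value pairs. The sketch below models that dispatch in plain Python. `RESOLVERS`, `register`, `resolve`, the `dir` protocol and its `path` key are all hypothetical stand-ins. Real resolvers are written and registered through the `TraceUriResolver` API linked above, whose interface differs from this sketch.

```python
import glob
import os
from typing import Callable, Dict, List

# Hypothetical resolver type: takes a URI's key-value pairs and returns the
# locations of the traces to load.
Resolver = Callable[[Dict[str, str]], List[str]]

# Hypothetical registry mapping each protocol to its resolver.
RESOLVERS: Dict[str, Resolver] = {}


def register(protocol: str) -> Callable[[Resolver], Resolver]:
  """Decorator that associates a resolver function with a protocol."""
  def wrap(fn: Resolver) -> Resolver:
    RESOLVERS[protocol] = fn
    return fn
  return wrap


@register('dir')
def resolve_local_dir(args: Dict[str, str]) -> List[str]:
  # Hypothetical resolver: every *.pftrace file in the directory `path`.
  return sorted(glob.glob(os.path.join(args['path'], '*.pftrace')))


def resolve(uri: str) -> List[str]:
  """Dispatches `uri` to the resolver registered for its protocol."""
  protocol, _, arg_str = uri.partition(':')
  if protocol not in RESOLVERS:
    raise KeyError(f'no resolver registered for protocol {protocol!r}')
  args = dict(pair.split('=', 1) for pair in arg_str.split(';'))
  return RESOLVERS[protocol](args)
```

With this design, supporting a new source of traces means registering one more function under a new protocol. The code that consumes the resolved traces stays unchanged.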

## Memory usage
When working with batch trace processor, it is very important to pay attention
to memory usage. Every loaded trace lives fully in memory: this is the magic
behind making queries fast (<1s) even on hundreds of traces.

This also means that the number of traces you can load is heavily limited by
the amount of available memory. As a rule of thumb, if your average trace size
is S and you are trying to load N traces, your memory usage will be roughly
2 * S * N. Note that this can vary significantly based on the exact contents
and sizes of your traces.
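
The rule of thumb can be turned into a quick check before loading. The helpers below are hypothetical. They encode only the 2 * S * N estimate, using the on-disk file size of each trace as its S, so treat the results as a rough guide rather than a guarantee.

```python
import os
from typing import Iterable


def estimate_btp_memory_bytes(trace_sizes: Iterable[int]) -> int:
  """Rule-of-thumb estimate: 2 * S * N, i.e. twice the total trace size."""
  return 2 * sum(trace_sizes)


def max_loadable_traces(available_bytes: int, avg_trace_size: int) -> int:
  """Largest N such that 2 * S * N still fits in `available_bytes`."""
  if avg_trace_size <= 0:
    raise ValueError('avg_trace_size must be positive')
  return available_bytes // (2 * avg_trace_size)


def estimate_for_files(paths: Iterable[str]) -> int:
  # Hypothetical helper: uses each file's on-disk size as its S.
  return estimate_btp_memory_bytes(os.path.getsize(p) for p in paths)


MiB = 1024 ** 2
# 500 traces averaging 100 MiB each: about 100,000 MiB (~98 GiB).
print(estimate_btp_memory_bytes([100 * MiB] * 500) // MiB)  # 100000
# 64 GiB of RAM with 100 MiB traces: at most about 327 traces.
print(max_loadable_traces(64 * 1024 * MiB, 100 * MiB))  # 327
```

For example, calling `estimate_for_files(glob.glob('traces/*.pftrace'))` before constructing `BatchTraceProcessor` gives an early warning that a directory may be too large to load in full.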

## Advanced features
### Sharing computations between TP and BTP
Sometimes it can be useful to parameterise code to work with either trace
processor or batch trace processor. `execute` or `execute_and_flatten`
can be used for this purpose:
```python
from perfetto.batch_trace_processor.api import BatchTraceProcessor
from perfetto.trace_processor.api import TraceProcessor

def some_complex_calculation(tp):
  res = tp.query('...').as_pandas_dataframe()
  # ... do some calculations with res
  return res

# |some_complex_calculation| can be called with a [TraceProcessor] object:
tp = TraceProcessor('/foo/bar.pftrace')
some_complex_calculation(tp)

# |some_complex_calculation| can also be passed to |execute| or
# |execute_and_flatten|
btp = BatchTraceProcessor(['...', '...', '...'])

# Like |query|, |execute| returns one result per trace. Note that the returned
# value *does not* have to be a Pandas dataframe.
[a, b, c] = btp.execute(some_complex_calculation)

# Like |query_and_flatten|, |execute_and_flatten| merges the Pandas dataframes
# returned per trace into a single dataframe, adding any columns requested by
# the resolver.
flattened_res = btp.execute_and_flatten(some_complex_calculation)
```
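
The sketch below models `execute` as mapping a function over one trace processor per trace and returning one result per trace. This is a simplified model under stated assumptions, not batch trace processor's implementation. `execute_sketch`, `FakeTraceProcessor` and `count_slices` are hypothetical. The thread pool only illustrates that the per-trace calls are independent of each other, and the model assumes results come back in the same order as the traces.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar('T')
R = TypeVar('R')


def execute_sketch(processors: Sequence[T], fn: Callable[[T], R]) -> List[R]:
  """Hypothetical model of |execute|: one result per trace, in trace order."""
  with ThreadPoolExecutor() as pool:
    # Executor.map yields results in input order, even if the calls
    # finish out of order.
    return list(pool.map(fn, processors))


class FakeTraceProcessor:
  """Hypothetical stand-in for a TraceProcessor loaded with one trace."""

  def __init__(self, slice_count: int):
    self.slice_count = slice_count


def count_slices(tp: FakeTraceProcessor) -> int:
  # A real function would call tp.query(...) here instead.
  return tp.slice_count


print(execute_sketch([FakeTraceProcessor(n) for n in (3, 1, 2)], count_slices))
# [3, 1, 2]
```

Because the function receives only a single trace processor, the same function works unchanged whether it is called directly or fanned out over many traces.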