# Batch Trace Processor

_The Batch Trace Processor is a Python library wrapping the
[Trace Processor](/docs/analysis/trace-processor.md): it allows fast (<1s)
interactive queries on large sets (up to ~1000) of traces._

## Installation

Batch Trace Processor is part of the `perfetto` Python library and can be
installed by running:

```shell
pip3 install pandas       # prerequisite for Batch Trace Processor
pip3 install perfetto
```

## Loading traces
NOTE: if you are a Googler, have a look at
[go/perfetto-btp-load-internal](http://goto.corp.google.com/perfetto-btp-load-internal) for how to load traces from Google-internal sources.

The simplest way to load traces is to pass a list of file paths:
```python
from perfetto.batch_trace_processor.api import BatchTraceProcessor

files = [
  'traces/slow-start.pftrace',
  'traces/oom.pftrace',
  'traces/high-battery-drain.pftrace',
]
with BatchTraceProcessor(files) as btp:
  btp.query('...')
```

[glob](https://docs.python.org/3/library/glob.html) can be used to load
all traces in a directory:
```python
import glob

from perfetto.batch_trace_processor.api import BatchTraceProcessor

files = glob.glob('traces/*.pftrace')
with BatchTraceProcessor(files) as btp:
  btp.query('...')
```

NOTE: loading too many traces can cause out-of-memory issues: see the
[Memory usage](/docs/analysis/batch-trace-processor#memory-usage) section for details.

A common requirement is to load traces stored in the cloud or fetched by sending
a request to a server.
To support this use case, traces can also be loaded
using [trace URIs](/docs/analysis/batch-trace-processor#trace-uris):
```python
from perfetto.batch_trace_processor.api import BatchTraceProcessor
from perfetto.batch_trace_processor.api import BatchTraceProcessorConfig
from perfetto.trace_processor.api import TraceProcessorConfig
from perfetto.trace_uri_resolver.registry import ResolverRegistry
from perfetto.trace_uri_resolver.resolver import TraceUriResolver

class FooResolver(TraceUriResolver):
  # See "Trace URIs" section below for how to implement a URI resolver.
  ...

config = BatchTraceProcessorConfig(
  # See "Trace URIs" below
)
with BatchTraceProcessor('foo:bar=1;baz=abc', config=config) as btp:
  btp.query('...')
```

## Writing queries
Writing queries with batch trace processor works very similarly to the
[Python API](/docs/analysis/batch-trace-processor#python-api).

For example, to get a count of the number of userspace slices:
```python
>>> btp.query('select count(1) from slice')
[  count(1)
0   2092592,    count(1)
0    156071,    count(1)
0    121431]
```
The return value of `query` is a list of [Pandas](https://pandas.pydata.org/)
dataframes, one for each trace loaded.

A common requirement is for all of the traces to be flattened into a
single dataframe instead of getting one dataframe per trace. To support this,
the `query_and_flatten` function can be used:
```python
>>> btp.query_and_flatten('select count(1) from slice')
  count(1)
0  2092592
1   156071
2   121431
```

`query_and_flatten` also implicitly adds columns indicating the originating
trace. The exact columns added depend on the resolver being used: consult your
resolver's documentation for more information.

## Trace URIs
Trace URIs are a powerful feature of the batch trace processor. URIs decouple
the notion of "paths" to traces from the filesystem.
Instead, the URI
describes *how* a trace should be fetched (e.g. by sending an HTTP request
to a server or by reading from cloud storage).

The syntax of trace URIs is similar to that of web
[URLs](https://en.wikipedia.org/wiki/URL). Formally, a trace URI has the
structure:
```
Trace URI = protocol:key1=val1(;keyn=valn)*
```

As an example:
```
gcs:bucket=foo;path=bar
```
would indicate that traces should be fetched using the protocol `gcs`
([Google Cloud Storage](https://cloud.google.com/storage)) with traces
located at bucket `foo` and path `bar` in the bucket.

NOTE: the `gcs` resolver is *not* actually included: it is given here simply as
an easy-to-understand example.

URIs are only part of the puzzle: ultimately batch trace processor still needs
the bytes of the traces to be able to parse and query them. The job of
converting URIs to trace bytes is left to *resolvers*: Python
classes, each associated with a *protocol*, which use the key-value pairs in
the URI to look up the traces to be parsed.

By default, batch trace processor only ships with a single resolver which knows
how to look up filesystem paths: however, custom resolvers can be easily
created and registered. See the documentation on the
[TraceUriResolver class](https://cs.android.com/android/platform/superproject/main/+/main:external/perfetto/python/perfetto/trace_uri_resolver/resolver.py;l=56?q=resolver.py)
for information on how to do this.

## Memory usage
Memory usage is very important to pay attention to when working with batch
trace processor. Every trace loaded lives fully in memory: this is the magic
behind making queries fast (<1s) even on hundreds of traces.

This also means that the number of traces you can load is heavily limited by
the amount of memory available.
As a rule of thumb, if your
average trace size is S and you are trying to load N traces, expect memory
usage of roughly 2 * S * N. Note that this can vary significantly based on the
exact contents and sizes of your traces.

## Advanced features
### Sharing computations between TP and BTP
Sometimes it can be useful to parameterise code to work with either trace
processor or batch trace processor. `execute` or `execute_and_flatten`
can be used for this purpose:
```python
from perfetto.batch_trace_processor.api import BatchTraceProcessor
from perfetto.trace_processor.api import TraceProcessor

def some_complex_calculation(tp):
  res = tp.query('...').as_pandas_dataframe()
  # ... do some calculations with res
  return res

# |some_complex_calculation| can be called with a [TraceProcessor] object:
tp = TraceProcessor('/foo/bar.pftrace')
some_complex_calculation(tp)

# |some_complex_calculation| can also be passed to |execute| or
# |execute_and_flatten|
btp = BatchTraceProcessor(['...', '...', '...'])

# Like |query|, |execute| returns one result per trace. Note that the returned
# value *does not* have to be a Pandas dataframe.
[a, b, c] = btp.execute(some_complex_calculation)

# Like |query_and_flatten|, |execute_and_flatten| merges the Pandas dataframes
# returned per trace into a single dataframe, adding any columns requested by
# the resolver.
flattened_res = btp.execute_and_flatten(some_complex_calculation)
```
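
### Estimating memory before loading
The 2 * S * N rule of thumb from the Memory usage section can be turned into a
quick check to run before constructing a `BatchTraceProcessor`. The snippet
below is a minimal sketch, not part of the `perfetto` library: the helper names
`estimate_btp_memory_bytes` and `fits_in_budget` are hypothetical, and the
factor of 2 is only the rough multiplier quoted above, so treat the result as
an order-of-magnitude guide.

```python
import os
from typing import Iterable

# Assumption: the 2x multiplier is the rule of thumb from this page, not a
# guarantee. Real usage depends on the contents of each trace.
MEMORY_MULTIPLIER = 2


def estimate_btp_memory_bytes(paths: Iterable[str]) -> int:
  """Hypothetical helper: rough memory estimate for loading |paths|."""
  # With S the average file size and N the number of files, S * N is simply
  # the sum of all file sizes, so 2 * S * N == 2 * sum(sizes).
  return MEMORY_MULTIPLIER * sum(os.path.getsize(p) for p in paths)


def fits_in_budget(paths: Iterable[str], budget_bytes: int) -> bool:
  """Hypothetical helper: True if the estimate is within |budget_bytes|."""
  return estimate_btp_memory_bytes(paths) <= budget_bytes
```

For example, `fits_in_budget(glob.glob('traces/*.pftrace'), 16 * 1024**3)`
would check a directory of traces against a 16 GiB budget; if it returns
`False`, loading a subset of the traces is the safer option.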