# Deploying Bigtrace on Kubernetes

NOTE: This doc is designed for administrators of Bigtrace services, NOT Bigtrace users. It is also designed for non-Googlers - Googlers should look at `go/bigtrace` instead.

## Overview of Bigtrace

Bigtrace is a tool which facilitates the processing of O(millions) of traces by distributing instances of TraceProcessor across a Kubernetes cluster.

The design of Bigtrace consists of four main parts:

![](/docs/images/bigtrace/bigtrace-diagram.png)

### Client
There are three clients for interacting with Bigtrace: a Python API, clickhouse-client and Apache Superset.
- The Python API exists in the Perfetto python library and can be used similarly to the TraceProcessor and BatchTraceProcessor APIs.
- Clickhouse is a data warehousing solution which provides a SQL-based interface for writing queries, which are sent through gRPC to the Orchestrator. It can be accessed natively using clickhouse-client, a CLI which allows the user to write queries against the DB.
- Superset is a GUI for Clickhouse which offers SQL Lab to run queries, with support for modern features such as multiple tabs, autocomplete and syntax highlighting, as well as data visualization tools to create charts easily from query results.

### Orchestrator
The Orchestrator is the central component of the service and is responsible for sharding traces to the various Worker pods and streaming the results to the Client.

### Worker
Each Worker runs an instance of TraceProcessor and executes the input query on a given trace. Each Worker runs on its own pod in the cluster.

### Object Store (GCS)
The object store contains the set of traces the service can query from and is accessed by the Worker.
Currently, there is support for GCS as the main object store, as well as the loading of traces stored locally on each machine for testing.

Additional integrations can be added by creating a new repository policy in src/bigtrace/worker/repository_policies.

## Deploying Bigtrace on GKE

### GKE
The recommended way to deploy Bigtrace is on Google Kubernetes Engine (GKE), and this guide explains the process.

**Prerequisites:**
- A GCP project
- GCS (see the bucket sketch after this list)
- GKE
- gcloud (https://cloud.google.com/sdk/gcloud)
- A clone of the Perfetto repository
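
If a GCS bucket for the traces does not exist yet, one can be created with gcloud. This is a minimal sketch - the bucket name, location and trace paths are placeholders, not values mandated by Bigtrace:

```bash
# Hypothetical bucket name and location - replace with your own values.
gcloud storage buckets create gs://[BUCKET_NAME] --location=[REGION]

# Upload traces so the Workers can read them from the object store.
gcloud storage cp path/to/traces/* gs://[BUCKET_NAME]/
```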

#### Service account permissions
In addition to the default API access of the Compute Engine service account, the following permissions are required:
- Storage Object User - to allow the Worker to retrieve GCS authentication tokens

These can be added on GCP through IAM & Admin > IAM > Permissions.
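
The same role can also be granted from the command line; a sketch, assuming the nodes run as the default Compute Engine service account (its email is a placeholder here):

```bash
# [SERVICE_ACCOUNT_EMAIL] is the Compute Engine default service account,
# e.g. [PROJECT_NUMBER]-compute@developer.gserviceaccount.com.
gcloud projects add-iam-policy-binding [PROJECT_NAME] \
  --member="serviceAccount:[SERVICE_ACCOUNT_EMAIL]" \
  --role="roles/storage.objectUser"
```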

---

### Setting up the cluster

#### Creating the cluster
1. Navigate to Kubernetes Engine within GCP
2. Create a Standard cluster (Create > Standard > Configure)
![](/docs/images/bigtrace/create_cluster_2.png)
3. In Cluster basics, select a location type - use zonal for the best load balancing performance
![](/docs/images/bigtrace/create_cluster_3.png)
4. In Node pools > default-pool > Nodes, select a VM type - preferably standard, e.g. e2-standard-8 or above
![](/docs/images/bigtrace/create_cluster_4.png)
5. In the Networking tab, enable subsetting for L4 internal load balancers (this is required for services using internal load balancing within the VPC)
![](/docs/images/bigtrace/create_cluster_5.png)
6. Create the cluster (an equivalent gcloud command is sketched below)
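
For scripted setups, the same cluster can be created with gcloud. This is a rough sketch under the assumptions above (zonal location, e2-standard-8 nodes, L4 ILB subsetting) - adjust the values to your needs:

```bash
gcloud container clusters create [CLUSTER_NAME] \
  --zone [ZONE] \
  --machine-type e2-standard-8 \
  --enable-l4-ilb-subsetting \
  --project [PROJECT_NAME]
```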

#### Accessing the cluster
To use kubectl to apply the yaml files for deployments and services you must first connect and authenticate with the cluster.

You can do this on your device or in Cloud Shell using the following command:

```bash
gcloud container clusters get-credentials [CLUSTER_NAME] --zone [ZONE] --project [PROJECT_NAME]
```
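
Once the credentials are fetched, a quick check that kubectl is pointed at the right cluster (standard kubectl commands, nothing Bigtrace-specific):

```bash
# Confirm the active context and that the nodes are reachable.
kubectl config current-context
kubectl get nodes
```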

---

### Deploying the Orchestrator
The deployment of the Orchestrator requires two main steps: building and pushing the images to Artifact Registry, and deploying to the cluster.

#### Building and uploading the Orchestrator image
To build the image and push it to Artifact Registry, first navigate to the perfetto directory and then run the following commands:

```bash
docker build -t bigtrace_orchestrator src/bigtrace/orchestrator

docker tag bigtrace_orchestrator [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_orchestrator

docker push [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_orchestrator
```
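
If the Artifact Registry repository does not exist yet, or docker is not yet authenticated against it, the following gcloud commands can help; the repository name matches the placeholder used above, and note that Artifact Registry locations are regions (e.g. us-central1) rather than zones:

```bash
# Create a Docker repository in Artifact Registry (one-time setup).
gcloud artifacts repositories create [REPO_NAME] \
  --repository-format=docker \
  --location=[REGION] \
  --project=[PROJECT_NAME]

# Let docker push to the registry host used in the tags above.
gcloud auth configure-docker [ZONE]-docker.pkg.dev
```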

#### Applying the yaml files
To use the image from the registry which was built in the previous step, the orchestrator-deployment.yaml file must be modified to point the image line at the pushed URI:

```yaml
image: [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_orchestrator
```

The CPU resources should also be set according to the vCPUs of the machine type chosen earlier.

```yaml
resources:
  requests:
    cpu: [VCPUS_PER_MACHINE]
  limits:
    cpu: [VCPUS_PER_MACHINE]
```

Then to deploy the Orchestrator, apply both orchestrator-deployment.yaml and orchestrator-ilb.yaml, for the deployment and the internal load balancing service respectively.

```bash
kubectl apply -f orchestrator-deployment.yaml
kubectl apply -f orchestrator-ilb.yaml
```

This deploys the Orchestrator as a single replica in a pod and exposes it as a service for access within the VPC by the client.
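
To confirm the rollout and find the internal address that clients will use to reach the Orchestrator, the standard kubectl queries are enough; the exact service name comes from orchestrator-ilb.yaml:

```bash
# Check that the Orchestrator pod is running.
kubectl get pods

# The internal load balancer's IP appears in the EXTERNAL-IP column
# once GKE has provisioned it.
kubectl get services
```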

### Deploying the Worker
Similarly to the Orchestrator, first build and push the image to Artifact Registry.

```bash
docker build -t bigtrace_worker src/bigtrace/worker

docker tag bigtrace_worker [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_worker

docker push [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_worker
```

Then modify the yaml files to reference this image, as well as to fit the required configuration for the use case.

```yaml
image: [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_worker

...

replicas: [DESIRED_REPLICA_COUNT]

...

resources:
  requests:
    cpu: [VCPUS_PER_MACHINE]
```

Then deploy the deployment and service as follows:

```bash
kubectl apply -f worker-deployment.yaml
kubectl apply -f worker-service.yaml
```
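
To check that the Workers have come up with the requested replica count (the deployment name below is a placeholder - use the name defined in worker-deployment.yaml):

```bash
# All Worker pods should eventually report STATUS Running.
kubectl get pods

# Waits until every requested replica is available.
kubectl rollout status deployment/[WORKER_DEPLOYMENT_NAME]
```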

### Deploying Clickhouse

#### Build and upload the Clickhouse deployment image
This image builds on top of the base Clickhouse image and provides the necessary Python libraries for gRPC to communicate with the Orchestrator.

```bash
docker build -t clickhouse src/bigtrace_clickhouse

docker tag clickhouse [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/clickhouse

docker push [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/clickhouse
```

To deploy this on a pod in the cluster, the provided yaml files must be applied using kubectl, e.g.:

```bash
kubectl apply -f src/bigtrace_clickhouse/config.yaml

kubectl apply -f src/bigtrace_clickhouse/pvc.yaml

kubectl apply -f src/bigtrace_clickhouse/pv.yaml

kubectl apply -f src/bigtrace_clickhouse/clickhouse-deployment.yaml

kubectl apply -f src/bigtrace_clickhouse/clickhouse-ilb.yaml
```

In clickhouse-deployment.yaml you must replace the image field with the URI of the image built in the previous step - which contains the Clickhouse image with the necessary Python files for gRPC installed on top.

The env variable BIGTRACE_ORCHESTRATOR_ADDRESS must also be changed to the address of the Orchestrator service given by GKE:

```yaml
containers:
- name: clickhouse
  image: # [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/clickhouse
  env:
  - name: BIGTRACE_ORCHESTRATOR_ADDRESS
    value: # Address of Orchestrator service
```
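
The address to put in BIGTRACE_ORCHESTRATOR_ADDRESS can be read from the Orchestrator's internal load balancer service; the service and port names depend on orchestrator-ilb.yaml, so treat the example form as a placeholder:

```bash
# Read the internal load balancer's address from the EXTERNAL-IP column and
# combine it with the Orchestrator's port from orchestrator-ilb.yaml,
# e.g. [ILB_IP]:[ORCHESTRATOR_PORT].
kubectl get services
```
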
### File summary

#### Deployment

Contains the image of the Clickhouse server and configures the necessary volumes and resources.

#### Internal Load Balancer Service (ILB)

This Internal Load Balancer allows the Clickhouse server pod to be reached from within the VPC in GKE. This means that VMs outside the cluster are able to access the Clickhouse server through clickhouse-client, without exposing the service to the public.

#### Persistent Volume and Persistent Volume Claim

These files create the volumes needed for the Clickhouse server to persist the databases in the event of pod failure.

#### Config

This is where Clickhouse config files can be specified to customize the server to the user's requirements (https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings).

### Accessing Clickhouse through clickhouse-client (CLI)
You can install the clickhouse binary, which provides the client, in a variety of ways by following:
https://clickhouse.com/docs/en/install

When running the client through the CLI it is important to specify the host and port of the service, as well as increased timeouts, since queries over large sets of traces can be long-running:

```bash
./clickhouse client --host [ADDRESS] --port [PORT] --receive-timeout=1000000 --send-timeout=100000 --idle_connection_timeout=1000000
```

### Deploying Superset

There are two methods of deploying Superset - one for development and one for production.

You can deploy an instance of Superset within a VM for development by following:
https://superset.apache.org/docs/quickstart

You can deploy a production-ready instance on Kubernetes across pods by following:
https://superset.apache.org/docs/installation/kubernetes
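
As a rough sketch of the Kubernetes route (per the Superset docs linked above; chart values such as the admin credentials and the clickhouse-connect driver still need to be configured in your own values file):

```bash
# Add the official Superset Helm repository and install the chart.
helm repo add superset https://apache.github.io/superset
helm repo update

# values.yaml is your own overrides file; see the Superset docs for options.
helm upgrade --install superset superset/superset --values values.yaml
```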

Superset can then be connected to Clickhouse via clickhouse-connect by following the instructions at this link, but replacing the first step with the connection details of the deployment: https://clickhouse.com/docs/en/integrations/superset
224