# Deploying Bigtrace on Kubernetes

NOTE: This doc is designed for administrators of Bigtrace services, NOT Bigtrace users. It is also designed for non-Googlers - Googlers should look at `go/bigtrace` instead.

## Overview of Bigtrace

Bigtrace is a tool which facilitates the processing of traces at the O(million) scale by distributing instances of TraceProcessor across a Kubernetes cluster.

The design of Bigtrace consists of four main parts:

### Client
There are three clients to interact with Bigtrace: a Python API, clickhouse-client and Apache Superset.
- The Python API exists in the Perfetto Python library and can be used similarly to the TraceProcessor and BatchTraceProcessor APIs.
- Clickhouse is a data warehousing solution which gives a SQL-based interface for the user to write queries, which are sent through gRPC to the Orchestrator. It can be accessed natively using clickhouse-client, a CLI which allows the user to write queries against the database.
- Superset is a GUI for Clickhouse which offers SQL Lab to run queries, with support for modern features such as multiple tabs, autocomplete and syntax highlighting, as well as data visualization tools to easily create charts from query results.

### Orchestrator
The Orchestrator is the central component of the service and is responsible for sharding traces to the various Worker pods and streaming the results to the Client.

### Worker
Each Worker runs an instance of TraceProcessor and performs the input query on a given trace. Each Worker runs on its own pod in the cluster.

### Object Store (GCS)
The object store contains the set of traces the service can query from and is accessed by the Worker.
Currently, GCS is supported as the main object store, and traces stored locally on each machine can be loaded for testing.

Additional integrations can be added by creating a new repository policy in `src/bigtrace/worker/repository_policies`.

## Deploying Bigtrace on GKE

### GKE
The recommended way to deploy Bigtrace is on Google Kubernetes Engine (GKE), and this guide explains the process.

**Prerequisites:**
- A GCP project
- GCS
- GKE
- gcloud (https://cloud.google.com/sdk/gcloud)
- A clone of the Perfetto repository

#### Service account permissions
In addition to the default API access of the Compute Engine service account, the following permissions are required:
- Storage Object User - allows the Worker to retrieve GCS authentication tokens

These can be added on GCP through IAM & Admin > IAM > Permissions.

---

### Setting up the cluster

#### Creating the cluster
1. Navigate to Kubernetes Engine within GCP
2. Create a Standard cluster (Create > Standard > Configure)
3. In Cluster basics, select a location type - use zonal for the best load balancing performance
4. In Node pools > default-pool > Nodes, select a VM type - preferably standard, e.g. e2-standard-8 or above
5. In the Networking tab, enable subsetting for L4 internal load balancers (this is required for services using internal load balancing within the VPC)
6. Create the cluster (an equivalent gcloud command is sketched below)
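
If you prefer the command line, the console steps above roughly correspond to a single gcloud command. A minimal sketch, assuming a zonal cluster named `bigtrace-cluster` with three e2-standard-8 nodes (the name, zone, node count and machine type are placeholders to adjust for your workload):

```bash
# Create a zonal Standard cluster with L4 ILB subsetting enabled.
gcloud container clusters create bigtrace-cluster \
  --zone [ZONE] \
  --machine-type e2-standard-8 \
  --num-nodes 3 \
  --enable-l4-ilb-subsetting
```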

#### Accessing the cluster
To use kubectl to apply the yaml files for deployments and services, you must first connect and authenticate with the cluster.

You can do this on your device or in Cloud Shell using the following command:

```bash
gcloud container clusters get-credentials [CLUSTER_NAME] --zone [ZONE] --project [PROJECT_NAME]
```

---

### Deploying the Orchestrator
Deploying the Orchestrator requires two main steps: building and pushing the image to Artifact Registry, and deploying it to the cluster.

#### Building and uploading the Orchestrator image
To build the image and push it to Artifact Registry, first navigate to the Perfetto directory and then run the following commands:

```bash
docker build -t bigtrace_orchestrator src/bigtrace/orchestrator

docker tag bigtrace_orchestrator [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_orchestrator

docker push [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_orchestrator
```

#### Applying the yaml files
To use the image pushed to the registry in the previous step, the orchestrator-deployment.yaml file must be modified to replace the image line:

```yaml
image: [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_orchestrator
```

The CPU resources should also be set according to the number of vCPUs per pod chosen earlier.

```yaml
resources:
  requests:
    cpu: [VCPUS_PER_MACHINE]
  limits:
    cpu: [VCPUS_PER_MACHINE]
```

Then, to deploy the Orchestrator, apply both orchestrator-deployment.yaml and orchestrator-ilb.yaml, for the deployment and the internal load balancing service respectively.

```bash
kubectl apply -f orchestrator-deployment.yaml
kubectl apply -f orchestrator-ilb.yaml
```

This deploys the Orchestrator as a single replica in a pod and exposes it as a service for access within the VPC by the client.

### Deploying the Worker
As with the Orchestrator, first build and push the image to Artifact Registry.

```bash
docker build -t bigtrace_worker src/bigtrace/worker

docker tag bigtrace_worker [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_worker

docker push [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_worker
```

Then modify the yaml files to reference the image, as well as to fit the required configuration for your use case.

```yaml
image: [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_worker
...

replicas: [DESIRED_REPLICA_COUNT]

...

resources:
  requests:
    cpu: [VCPUS_PER_MACHINE]
```

Then apply the deployment and service as follows:

```bash
kubectl apply -f worker-deployment.yaml
kubectl apply -f worker-service.yaml
```

### Deploying Clickhouse

#### Build and upload the Clickhouse deployment image
This image builds on top of the base Clickhouse image and provides the necessary Python libraries for gRPC to communicate with the Orchestrator.

```bash
docker build -t clickhouse src/bigtrace_clickhouse

docker tag clickhouse [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/clickhouse

docker push [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/clickhouse
```
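
At this point all three images (bigtrace_orchestrator, bigtrace_worker and clickhouse) should be present in the registry. If you want to double-check before applying the deployment files, a small sketch using the same placeholders as above:

```bash
# List the images pushed to the Artifact Registry repository.
gcloud artifacts docker images list [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]
```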
164 165``` 166kubectl apply -f src/bigtrace_clickhouse/config.yaml 167 168kubectl apply -f src/bigtrace_clickhouse/pvc.yaml 169 170kubectl apply -f src/bigtrace_clickhouse/pv.yaml 171 172kubectl apply -f src/bigtrace_clickhouse/clickhouse-deployment.yaml 173 174kubectl apply -f src/bigtrace_clickhouse/clickhouse-ilb.yaml 175``` 176With the clickhouse-deployment.yaml you must replace the image variable with the URI to the image built in the previous step - which contains the Clickhouse image with the necessary Python files for gRPC installed on top. 177 178The env variable BIGTRACE_ORCHESTRATOR_ADDRESS must also be changed to the address of the Orchestrator service given by GKE: 179 180``` 181 containers: 182 - name: clickhouse 183 image: # [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/clickhouse 184 env: 185 - name: BIGTRACE_ORCHESTRATOR_ADDRESS 186 value: # Address of Orchestrator service 187``` 188### File summary 189 190#### Deployment 191 192Contains the image of the Clickhouse server and configures the necessary volumes and resources. 193 194#### Internal Load Balancer Service (ILB) 195 196This Internal Load Balancer is used to allow for the Clickhouse server pod to be reached from within the VPC in GKE. This means that VMs outside the cluster are able to access the Clickhouse server through Clickhouse Client, without exposing the service to the public. 197 198#### Persistent Volume and Persistent Volume Claim 199 200These files create the volumes needed for the Clickhouse server to persist the databases in the event of pod failure. 201 202#### Config 203 204This is where Clickhouse config files can be specified to customize the server to the user's requirements. (https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings) 205 206### Accessing Clickhouse through clickhouse-client (CLI) 207You can deploy Clickhouse in a variety of ways by following: 208https://clickhouse.com/docs/en/install 209 210When running the client through CLI it is important to specify: 211./clickhouse client --host [ADDRESS] --port [PORT] --receive-timeout=1000000 --send-timeout=100000 --idle_connection_timeout=1000000 212 213### Deploying Superset 214 215There are two methods of deploying Superset - one for development and one for production. 216 217You can deploy an instance of Superset within a VM for development by following: 218https://superset.apache.org/docs/quickstart 219 220You can deploy a production ready instance on Kubernetes across pods by following: 221https://superset.apache.org/docs/installation/kubernetes 222 223Superset can then be connected to Clickhouse via clickhouse-connect by following the instructions at this link, but replacing the first step with the connection details of the deployment: https://clickhouse.com/docs/en/integrations/superset 224