1# Overview of performance test suite 2 3For design of the tests, see https://grpc.io/docs/guides/benchmarking. 4 5This document contains documentation of on how to run gRPC end-to-end benchmarks 6using the gRPC OSS benchmarks framework (recommended) or how to run them 7manually (for experts only). 8 9## Approach 1: Use gRPC OSS benchmarks framework (Recommended) 10 11### gRPC OSS benchmarks 12 13The scripts in this section generate LoadTest configurations for the GKE-based 14gRPC OSS benchmarks framework. This framework is stored in a separate 15repository, [grpc/test-infra]. 16 17These scripts, together with tools defined in [grpc/test-infra], are used in the 18continuous integration setup defined in [grpc_e2e_performance_gke.sh] and 19[grpc_e2e_performance_gke_experiment.sh]. 20 21#### Generating scenarios 22 23The benchmarks framework uses the same test scenarios as the legacy one. The 24script [scenario_config_exporter.py](./scenario_config_exporter.py) can be used 25to export these scenarios to files, and also to count and analyze existing 26scenarios. 27 28The language(s) and category of the scenarios are of particular importance to 29the tests. Continuous runs will typically run tests in the `scalable` category. 30 31The following example counts scenarios in the `scalable` category: 32 33``` 34$ ./tools/run_tests/performance/scenario_config_exporter.py --count_scenarios --category=scalable 35Scenario count for all languages (category: scalable): 36Count Language Client Server Categories 37 77 c++ scalable 38 19 python_asyncio scalable 39 16 java scalable 40 12 go scalable 41 12 node node scalable 42 12 node_purejs node scalable 43 9 csharp scalable 44 7 python scalable 45 5 ruby scalable 46 4 csharp c++ scalable 47 4 php7 c++ scalable 48 4 php7_protobuf_c c++ scalable 49 3 python_asyncio c++ scalable 50 2 ruby c++ scalable 51 2 python c++ scalable 52 1 csharp c++ scalable 53 54 189 total scenarios (category: scalable) 55``` 56 57Client and server languages are only set for cross-language scenarios, where the 58client or server language do not match the scenario language. 59 60#### Generating load test configurations 61 62The benchmarks framework uses LoadTest resources configured by YAML files. Each 63LoadTest resource specifies a driver, a server, and one or more clients to run 64the test. Each test runs one scenario. The scenario configuration is embedded in 65the LoadTest configuration. Example configurations for various languages can be 66found here: 67 68https://github.com/grpc/test-infra/tree/master/config/samples 69 70The script [loadtest_config.py](./loadtest_config.py) generates LoadTest 71configurations for tests running a set of scenarios. The configurations are 72written in multipart YAML format, either to a file or to stdout. Each 73configuration contains a single embedded scenario. 74 75The LoadTest configurations are generated from a template. Any configuration can 76be used as a template, as long as it contains the languages required by the set 77of scenarios we intend to run (for instance, if we are generating configurations 78to run go scenarios, the template must contain a go client and a go server; if 79we are generating configurations for cross-language scenarios that need a go 80client and a C++ server, the template must also contain a C++ server; and the 81same for all other languages). 82 83The LoadTests specified in the script output all have unique names and can be 84run by applying the test to a cluster running the LoadTest controller with 85`kubectl apply`: 86 87``` 88$ kubectl apply -f loadtest_config.yaml 89``` 90 91> Note: The most common way of running tests generated by this script is to use 92> a _test runner_. For details, see [running tests](#running-tests). 93 94A basic template for generating tests in various languages can be found here: 95[loadtest_template_basic_all_languages.yaml](./templates/loadtest_template_basic_all_languages.yaml). 96The following example generates configurations for C# and Java tests using this 97template, including tests against C++ clients and servers, and running each test 98twice: 99 100``` 101$ ./tools/run_tests/performance/loadtest_config.py -l go -l java \ 102 -t ./tools/run_tests/performance/templates/loadtest_template_basic_all_languages.yaml \ 103 -s client_pool=workers-8core -s driver_pool=drivers \ 104 -s server_pool=workers-8core \ 105 -s big_query_table=e2e_benchmarks.experimental_results \ 106 -s timeout_seconds=3600 --category=scalable \ 107 -d --allow_client_language=c++ --allow_server_language=c++ \ 108 --runs_per_test=2 -o ./loadtest.yaml 109``` 110 111The script `loadtest_config.py` takes the following options: 112 113- `-l`, `--language`<br> Language to benchmark. May be repeated. 114- `-t`, `--template`<br> Template file. A template is a configuration file that 115 may contain multiple client and server configuration, and may also include 116 substitution keys. 117- `-s`, `--substitution` Substitution keys, in the format `key=value`. These 118 keys are substituted while processing the template. Environment variables that 119 are set by the load test controller at runtime are ignored by default 120 (`DRIVER_PORT`, `KILL_AFTER`, `POD_TIMEOUT`). The user can override this 121 behavior by specifying these variables as keys. 122- `-p`, `--prefix`<br> Test names consist of a prefix_joined with a uuid with a 123 dash. Test names are stored in `metadata.name`. The prefix is also added as 124 the `prefix` label in `metadata.labels`. The prefix defaults to the user name 125 if not set. 126- `-u`, `--uniquifier_element`<br> Uniquifier elements may be passed to the test 127 to make the test name unique. This option may be repeated to add multiple 128 elements. The uniquifier elements (plus a date string and a run index, if 129 applicable) are joined with a dash to form a _uniquifier_. The test name uuid 130 is derived from the scenario name and the uniquifier. The uniquifier is also 131 added as the `uniquifier` annotation in `metadata.annotations`. 132- `-d`<br> This option is a shorthand for the addition of a date string as a 133 uniquifier element. 134- `-a`, `--annotation`<br> Metadata annotation to be stored in 135 `metadata.annotations`, in the form key=value. May be repeated. 136- `-r`, `--regex`<br> Regex to select scenarios to run. Each scenario is 137 embedded in a LoadTest configuration containing a client and server of the 138 language(s) required for the test. Defaults to `.*`, i.e., select all 139 scenarios. 140- `--category`<br> Select scenarios of a specified _category_, or of all 141 categories. Defaults to `all`. Continuous runs typically run tests in the 142 `scalable` category. 143- `--allow_client_language`<br> Allows cross-language scenarios where the client 144 is of a specified language, different from the scenario language. This is 145 typically `c++`. This flag may be repeated. 146- `--allow_server_language`<br> Allows cross-language scenarios where the server 147 is of a specified language, different from the scenario language. This is 148 typically `node` or `c++`. This flag may be repeated. 149- `--instances_per_client`<br> This option generates multiple instances of the 150 clients for each test. The instances are named with the name of the client 151 combined with an index (or only an index, if no name is specified). If the 152 template specifies more than one client for a given language, it must also 153 specify unique names for each client. In the most common case, the template 154 contains only one unnamed client for each language, and the instances will be 155 named `0`, `1`, ... 156- `--runs_per_test`<br> This option specifies that each test should be repeated 157 `n` times, where `n` is the value of the flag. If `n` > 1, the index of each 158 test run is added as a uniquifier element for that run. 159- `-o`, `--output`<br> Output file name. The LoadTest configurations are added 160 to this file, in multipart YAML format. Output is streamed to `sys.stdout` if 161 not set. 162 163The script adds labels and annotations to the metadata of each LoadTest 164configuration: 165 166The following labels are added to `metadata.labels`: 167 168- `language`<br> The language of the LoadTest scenario. 169- `prefix`<br> The prefix used in `metadata.name`. 170 171The following annotations are added to `metadata.annotations`: 172 173- `scenario`<br> The name of the LoadTest scenario. 174- `uniquifier`<br> The uniquifier used to generate the LoadTest name, including 175 the run index if applicable. 176 177[Labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/) 178can be used in selectors in resource queries. Adding the prefix, in particular, 179allows the user (or an automation script) to select the resources started from a 180given run of the config generator. 181 182[Annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) 183contain additional information that is available to the user (or an automation 184script) but is not indexed and cannot be used to select objects. Scenario name 185and uniquifier are added to provide the elements of the LoadTest name uuid in 186human-readable form. Additional annotations may be added later for automation. 187 188#### Concatenating load test configurations 189 190The LoadTest configuration generator can process multiple languages at a time, 191assuming that they are supported by the template. The convenience script 192[loadtest_concat_yaml.py](./loadtest_concat_yaml.py) is provided to concatenate 193several YAML files into one, so configurations generated by multiple generator 194invocations can be concatenated into one and run with a single command. The 195script can be invoked as follows: 196 197``` 198$ loadtest_concat_yaml.py -i infile1.yaml infile2.yaml -o outfile.yaml 199``` 200 201#### Generating load test examples 202 203The script [loadtest_examples.sh](./loadtest_examples.sh) is provided to 204generate example load test configurations in all supported languages. This 205script takes only one argument, which is the output directory where the 206configurations will be created. The script produces a set of basic 207configurations, as well as a set of template configurations intended to be used 208with prebuilt images. 209 210The [examples](https://github.com/grpc/test-infra/tree/master/config/samples) in 211the repository [grpc/test-infra] are generated by this script. 212 213#### Generating configuration templates 214 215The script [loadtest_template.py](./loadtest_template.py) generates a load test 216configuration template from a set of load test configurations. The source files 217may be load test configurations or load test configuration templates. The 218generated template supports all languages supported in any of the input 219configurations or templates. 220 221The example template in 222[loadtest_template_basic_template_all_languages.yaml](./templates/loadtest_template_basic_all_languages.yaml) 223was generated from the example configurations in [grpc/test-infra] by the 224following command: 225 226``` 227$ ./tools/run_tests/performance/loadtest_template.py \ 228 -i ../test-infra/config/samples/*_example_loadtest.yaml \ 229 --inject_client_pool --inject_server_pool \ 230 --inject_big_query_table --inject_timeout_seconds \ 231 -o ./tools/run_tests/performance/templates/loadtest_template_basic_all_languages.yaml \ 232 --name basic_all_languages 233``` 234 235The example template with prebuilt images in 236[loadtest_template_prebuilt_all_languages.yaml](./templates/loadtest_template_prebuilt_all_languages.yaml) 237was generated by the following command: 238 239``` 240$ ./tools/run_tests/performance/loadtest_template.py \ 241 -i ../test-infra/config/samples/templates/*_example_loadtest_with_prebuilt_workers.yaml \ 242 --inject_client_pool --inject_driver_image --inject_driver_pool \ 243 --inject_server_pool --inject_big_query_table --inject_timeout_seconds \ 244 -o ./tools/run_tests/performance/templates/loadtest_template_prebuilt_all_languages.yaml \ 245 --name prebuilt_all_languages 246``` 247 248The script `loadtest_template.py` takes the following options: 249 250- `-i`, `--inputs`<br> Space-separated list of the names of input files 251 containing LoadTest configurations. May be repeated. 252- `-o`, `--output`<br> Output file name. Outputs to `sys.stdout` if not set. 253- `--inject_client_pool`<br> If this option is set, the pool attribute of all 254 clients in `spec.clients` is set to `${client_pool}`, for later substitution. 255- `--inject_driver_image`<br> If this option is set, the image attribute of the 256 driver(s) in `spec.drivers` is set to `${driver_image}`, for later 257 substitution. 258- `--inject_driver_pool`<br> If this attribute is set, the pool attribute of the 259 driver(s) is set to `${driver_pool}`, for later substitution. 260- `--inject_server_pool`<br> If this option is set, the pool attribute of all 261 servers in `spec.servers` is set to `${server_pool}`, for later substitution. 262- `--inject_big_query_table`<br> If this option is set, 263 spec.results.bigQueryTable is set to `${big_query_table}`. 264- `--inject_timeout_seconds`<br> If this option is set, `spec.timeoutSeconds` is 265 set to `${timeout_seconds}`. 266- `--inject_ttl_seconds`<br> If this option is set, `spec.ttlSeconds` is set to 267 `${ttl_seconds}`. 268- `-n`, `--name`<br> Name to be set in `metadata.name`. 269- `-a`, `--annotation`<br> Metadata annotation to be stored in 270 `metadata.annotations`, in the form key=value. May be repeated. 271 272The options that inject substitution keys are the most useful for template 273reuse. When running tests on different node pools, it becomes necessary to set 274the pool, and usually also to store the data on a different table. When running 275as part of a larger collection of tests, it may also be necessary to adjust test 276timeout and time-to-live, to ensure that all tests have time to complete. 277 278The template name is replaced again by `loadtest_config.py`, and so is set only 279as a human-readable memo. 280 281Annotations, on the other hand, are passed on to the test configurations, and 282may be set to values or to substitution keys in themselves, allowing future 283automation scripts to process the tests generated from these configurations in 284different ways. 285 286#### Running tests 287 288Collections of tests generated by `loadtest_config.py` are intended to be run 289with a test runner. The code for the test runner is stored in a separate 290repository, [grpc/test-infra]. 291 292The test runner applies the tests to the cluster, and monitors the tests for 293completion while they are running. The test runner can also be set up to run 294collections of tests in parallel on separate node pools, and to limit the number 295of tests running in parallel on each pool. 296 297For more information, see the 298[tools README](https://github.com/grpc/test-infra/blob/master/tools/README.md) 299in [grpc/test-infra]. 300 301For usage examples, see the continuous integration setup defined in 302[grpc_e2e_performance_gke.sh] and [grpc_e2e_performance_gke_experiment.sh]. 303 304[grpc/test-infra]: https://github.com/grpc/test-infra 305[grpc_e2e_performance_gke.sh]: ../../internal_ci/linux/grpc_e2e_performance_gke.sh 306[grpc_e2e_performance_gke_experiment.sh]: ../../internal_ci/linux/grpc_e2e_performance_gke_experiment.sh 307 308## Approach 2: Running benchmarks locally via legacy tooling (still useful sometimes) 309 310This approach is much more involved than using the gRPC OSS benchmarks framework 311(see above), but can still be useful for hands-on low-level experiments 312(especially when you know what you are doing). 313 314### Prerequisites for running benchmarks manually: 315 316In general the benchmark workers and driver build scripts expect 317[linux_performance_worker_init.sh](../../gce/linux_performance_worker_init.sh) 318to have been ran already. 319 320### To run benchmarks locally: 321 322- From the grpc repo root, start the 323 [run_performance_tests.py](../run_performance_tests.py) runner script. 324 325### On remote machines, to start the driver and workers manually: 326 327The [run_performance_test.py](../run_performance_tests.py) top-level runner 328script can also be used with remote machines, but for e.g., profiling the 329server, it might be useful to run workers manually. 330 3311. You'll need a "driver" and separate "worker" machines. For example, you might 332 use one GCE "driver" machine and 3 other GCE "worker" machines that are in 333 the same zone. 334 3352. Connect to each worker machine and start up a benchmark worker with a 336 "driver_port". 337 338- For example, to start the grpc-go benchmark worker: 339 [grpc-go worker main.go](https://github.com/grpc/grpc-go/blob/master/benchmark/worker/main.go) 340 --driver_port <driver_port> 341 342#### Commands to start workers in different languages: 343 344- Note that these commands are what the top-level 345 [run_performance_test.py](../run_performance_tests.py) script uses to build 346 and run different workers through the 347 [build_performance.sh](./build_performance.sh) script and "run worker" scripts 348 (such as the [run_worker_java.sh](./run_worker_java.sh)). 349 350##### Running benchmark workers for C-core wrapped languages (C++, Python, C#, Node, Ruby): 351 352- These are more simple since they all live in the main grpc repo. 353 354``` 355$ cd <grpc_repo_root> 356$ tools/run_tests/performance/build_performance.sh 357$ tools/run_tests/performance/run_worker_<language>.sh 358``` 359 360- Note that there is one "run_worker" script per language, e.g., 361 [run_worker_csharp.sh](./run_worker_csharp.sh) for c#. 362 363##### Running benchmark workers for gRPC-Java: 364 365- You'll need the [grpc-java](https://github.com/grpc/grpc-java) repo. 366 367``` 368$ cd <grpc-java-repo> 369$ ./gradlew -PskipCodegen=true -PskipAndroid=true :grpc-benchmarks:installDist 370$ benchmarks/build/install/grpc-benchmarks/bin/benchmark_worker --driver_port <driver_port> 371``` 372 373##### Running benchmark workers for gRPC-Go: 374 375- You'll need the [grpc-go repo](https://github.com/grpc/grpc-go) 376 377``` 378$ cd <grpc-go-repo>/benchmark/worker && go install 379$ # if profiling, it might be helpful to turn off inlining by building with "-gcflags=-l" 380$ $GOPATH/bin/worker --driver_port <driver_port> 381``` 382 383#### Build the driver: 384 385- Connect to the driver machine (if using a remote driver) and from the grpc 386 repo root: 387 388``` 389$ tools/run_tests/performance/build_performance.sh 390``` 391 392#### Run the driver: 393 3941. Get the 'scenario_json' relevant for the scenario to run. Note that "scenario 395 json" configs are generated from [scenario_config.py](./scenario_config.py). 396 The [driver](../../../test/cpp/qps/qps_json_driver.cc) takes a list of these 397 configs as a json string of the form: `{scenario: <json_list_of_scenarios> }` 398 in its `--scenarios_json` command argument. One quick way to get a valid json 399 string to pass to the driver is by running the 400 [run_performance_tests.py](./run_performance_tests.py) locally and copying 401 the logged scenario json command arg. 402 4032. From the grpc repo root: 404 405- Set `QPS_WORKERS` environment variable to a comma separated list of worker 406 machines. Note that the driver will start the "benchmark server" on the first 407 entry in the list, and the rest will be told to run as clients against the 408 benchmark server. 409 410Example running and profiling of go benchmark server: 411 412``` 413$ export QPS_WORKERS=<host1>:<10000>,<host2>,10000,<host3>:10000 414$ bins/opt/qps_json_driver --scenario_json='<scenario_json_scenario_config_string>' 415``` 416 417### Example profiling commands 418 419While running the benchmark, a profiler can be attached to the server. 420 421Example to count syscalls in grpc-go server during a benchmark: 422 423- Connect to server machine and run: 424 425``` 426$ netstat -tulpn | grep <driver_port> # to get pid of worker 427$ perf stat -p <worker_pid> -e syscalls:sys_enter_write # stop after test complete 428``` 429 430Example memory profile of grpc-go server, with `go tools pprof`: 431 432- After a run is done on the server, see its alloc profile with: 433 434``` 435$ go tool pprof --text --alloc_space http://localhost:<pprof_port>/debug/heap 436``` 437 438### Configuration environment variables: 439 440- QPS_WORKER_CHANNEL_CONNECT_TIMEOUT 441 442 Consuming process: qps_worker 443 444 Type: integer (number of seconds) 445 446 This can be used to configure the amount of time that benchmark clients wait 447 for channels to the benchmark server to become ready. This is useful in 448 certain benchmark environments in which the server can take a long time to 449 become ready. Note: if setting this to a high value, then the scenario config 450 under test should probably also have a large "warmup_seconds". 451 452- QPS_WORKERS 453 454 Consuming process: qps_json_driver 455 456 Type: comma separated list of host:port 457 458 Set this to a comma separated list of QPS worker processes/machines. Each 459 scenario in a scenario config has specifies a certain number of servers, 460 `num_servers`, and the driver will start "benchmark servers"'s on the first 461 `num_server` `host:port` pairs in the comma separated list. The rest will be 462 told to run as clients against the benchmark server. 463