# Overview of performance test suite

For design of the tests, see https://grpc.io/docs/guides/benchmarking.

This document explains how to run gRPC end-to-end benchmarks using the gRPC OSS
benchmarks framework (recommended), or how to run them manually (for experts
only).

## Approach 1: Use gRPC OSS benchmarks framework (Recommended)

### gRPC OSS benchmarks

The scripts in this section generate LoadTest configurations for the GKE-based
gRPC OSS benchmarks framework. This framework is stored in a separate
repository, [grpc/test-infra].

These scripts, together with tools defined in [grpc/test-infra], are used in the
continuous integration setup defined in [grpc_e2e_performance_gke.sh] and
[grpc_e2e_performance_gke_experiment.sh].

#### Generating scenarios

The benchmarks framework uses the same test scenarios as the legacy one. The
script [scenario_config_exporter.py](./scenario_config_exporter.py) can be used
to export these scenarios to files, and also to count and analyze existing
scenarios.

The language(s) and category of the scenarios are of particular importance to
the tests. Continuous runs will typically run tests in the `scalable` category.
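As an illustration of what a scenario count computes, the sketch below tallies
scenarios per language within one category. The scenario list here is
hypothetical sample data standing in for the definitions in
[scenario_config.py](./scenario_config.py), not the real scenario set:

```python
from collections import Counter

# Hypothetical sample data standing in for the real scenario
# definitions; each entry is (scenario language, category).
scenarios = [
    ("c++", "scalable"),
    ("c++", "scalable"),
    ("go", "scalable"),
    ("java", "smoketest"),
]

# Tally scenarios per language within one category, largest count
# first, mirroring the shape of the exporter's output.
category = "scalable"
counts = Counter(lang for lang, cat in scenarios if cat == category)
for lang, n in counts.most_common():
    print(f"{n:>5}  {lang:<16} {category}")
print(f"\n  {sum(counts.values())} total scenarios (category: {category})")
```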
The following example counts scenarios in the `scalable` category:

```
$ ./tools/run_tests/performance/scenario_config_exporter.py --count_scenarios --category=scalable
Scenario count for all languages (category: scalable):
Count  Language         Client   Server   Categories
   56  c++                                scalable
   19  python_asyncio                     scalable
   16  java                               scalable
   12  go                                 scalable
   12  node                               scalable
    9  csharp                             scalable
    9  dotnet                             scalable
    7  python                             scalable
    5  ruby                               scalable
    4  csharp           c++               scalable
    4  dotnet           c++               scalable
    4  php7             c++               scalable
    4  php7_protobuf_c  c++               scalable
    3  python_asyncio   c++               scalable
    2  ruby             c++               scalable
    2  python           c++               scalable
    1  csharp                    c++      scalable
    1  dotnet                    c++      scalable

  170 total scenarios (category: scalable)
```

Client and server languages are only set for cross-language scenarios, where the
client or server language does not match the scenario language.

#### Generating load test configurations

The benchmarks framework uses LoadTest resources configured by YAML files. Each
LoadTest resource specifies a driver, a server, and one or more clients to run
the test. Each test runs one scenario. The scenario configuration is embedded in
the LoadTest configuration. Example configurations for various languages can be
found here:

https://github.com/grpc/test-infra/tree/master/config/samples

The script [loadtest_config.py](./loadtest_config.py) generates LoadTest
configurations for tests running a set of scenarios. The configurations are
written in multipart YAML format, either to a file or to stdout. Each
configuration contains a single embedded scenario.

The LoadTest configurations are generated from a template.
Any configuration can
be used as a template, as long as it contains the languages required by the set
of scenarios we intend to run (for instance, if we are generating configurations
to run go scenarios, the template must contain a go client and a go server; if
we are generating configurations for cross-language scenarios that need a go
client and a C++ server, the template must also contain a C++ server; and the
same for all other languages).

The LoadTests specified in the script output all have unique names and can be
run by applying the test to a cluster running the LoadTest controller with
`kubectl apply`:

```
$ kubectl apply -f loadtest_config.yaml
```

> Note: The most common way of running tests generated by this script is to use
> a _test runner_. For details, see [running tests](#running-tests).

A basic template for generating tests in various languages can be found here:
[loadtest_template_basic_all_languages.yaml](./templates/loadtest_template_basic_all_languages.yaml).
The following example generates configurations for Go and Java tests using this
template, including tests against C++ clients and servers, and running each test
twice:

```
$ ./tools/run_tests/performance/loadtest_config.py -l go -l java \
    -t ./tools/run_tests/performance/templates/loadtest_template_basic_all_languages.yaml \
    -s client_pool=workers-8core -s driver_pool=drivers \
    -s server_pool=workers-8core \
    -s big_query_table=e2e_benchmarks.experimental_results \
    -s timeout_seconds=3600 --category=scalable \
    -d --allow_client_language=c++ --allow_server_language=c++ \
    --runs_per_test=2 -o ./loadtest.yaml
```

The script `loadtest_config.py` takes the following options:

- `-l`, `--language`<br> Language to benchmark. May be repeated.
- `-t`, `--template`<br> Template file.
A template is a configuration file that
  may contain multiple client and server configurations, and may also include
  substitution keys.
- `-s`, `--substitution`<br> Substitution keys, in the format `key=value`. These
  keys are substituted while processing the template. Environment variables that
  are set by the load test controller at runtime are ignored by default
  (`DRIVER_PORT`, `KILL_AFTER`, `POD_TIMEOUT`). The user can override this
  behavior by specifying these variables as keys.
- `-p`, `--prefix`<br> Test names consist of a prefix joined with a uuid with a
  dash. Test names are stored in `metadata.name`. The prefix is also added as
  the `prefix` label in `metadata.labels`. The prefix defaults to the user name
  if not set.
- `-u`, `--uniquifier_element`<br> Uniquifier elements may be passed to the test
  to make the test name unique. This option may be repeated to add multiple
  elements. The uniquifier elements (plus a date string and a run index, if
  applicable) are joined with a dash to form a _uniquifier_. The test name uuid
  is derived from the scenario name and the uniquifier. The uniquifier is also
  added as the `uniquifier` annotation in `metadata.annotations`.
- `-d`<br> This option is a shorthand for the addition of a date string as a
  uniquifier element.
- `-a`, `--annotation`<br> Metadata annotation to be stored in
  `metadata.annotations`, in the form `key=value`. May be repeated.
- `-r`, `--regex`<br> Regex to select scenarios to run. Each scenario is
  embedded in a LoadTest configuration containing a client and server of the
  language(s) required for the test. Defaults to `.*`, i.e., select all
  scenarios.
- `--category`<br> Select scenarios of a specified _category_, or of all
  categories. Defaults to `all`. Continuous runs typically run tests in the
  `scalable` category.
- `--allow_client_language`<br> Allows cross-language scenarios where the client
  is of a specified language, different from the scenario language. This is
  typically `c++`. This flag may be repeated.
- `--allow_server_language`<br> Allows cross-language scenarios where the server
  is of a specified language, different from the scenario language. This is
  typically `node` or `c++`. This flag may be repeated.
- `--instances_per_client`<br> This option generates multiple instances of the
  clients for each test. The instances are named with the name of the client
  combined with an index (or only an index, if no name is specified). If the
  template specifies more than one client for a given language, it must also
  specify unique names for each client. In the most common case, the template
  contains only one unnamed client for each language, and the instances will be
  named `0`, `1`, ...
- `--runs_per_test`<br> This option specifies that each test should be repeated
  `n` times, where `n` is the value of the flag. If `n` > 1, the index of each
  test run is added as a uniquifier element for that run.
- `-o`, `--output`<br> Output file name. The LoadTest configurations are added
  to this file, in multipart YAML format. Output is streamed to `sys.stdout` if
  not set.

The script adds labels and annotations to the metadata of each LoadTest
configuration:

The following labels are added to `metadata.labels`:

- `language`<br> The language of the LoadTest scenario.
- `prefix`<br> The prefix used in `metadata.name`.

The following annotations are added to `metadata.annotations`:

- `scenario`<br> The name of the LoadTest scenario.
- `uniquifier`<br> The uniquifier used to generate the LoadTest name, including
  the run index if applicable.

[Labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/)
can be used in selectors in resource queries.
Adding the prefix, in particular,
allows the user (or an automation script) to select the resources started from a
given run of the config generator.

[Annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/)
contain additional information that is available to the user (or an automation
script) but is not indexed and cannot be used to select objects. Scenario name
and uniquifier are added to provide the elements of the LoadTest name uuid in
human-readable form. Additional annotations may be added later for automation.

#### Concatenating load test configurations

The LoadTest configuration generator can process multiple languages at a time,
assuming that they are supported by the template. The convenience script
[loadtest_concat_yaml.py](./loadtest_concat_yaml.py) is provided to concatenate
several YAML files into one, so configurations generated by multiple generator
invocations can be concatenated into one and run with a single command. The
script can be invoked as follows:

```
$ loadtest_concat_yaml.py -i infile1.yaml infile2.yaml -o outfile.yaml
```

#### Generating load test examples

The script [loadtest_examples.sh](./loadtest_examples.sh) is provided to
generate example load test configurations in all supported languages. This
script takes only one argument, which is the output directory where the
configurations will be created. The script produces a set of basic
configurations, as well as a set of template configurations intended to be used
with prebuilt images.

The [examples](https://github.com/grpc/test-infra/tree/master/config/samples) in
the repository [grpc/test-infra] are generated by this script.

#### Generating configuration templates

The script [loadtest_template.py](./loadtest_template.py) generates a load test
configuration template from a set of load test configurations.
The source files
may be load test configurations or load test configuration templates. The
generated template supports all languages supported in any of the input
configurations or templates.

The example template in
[loadtest_template_basic_all_languages.yaml](./templates/loadtest_template_basic_all_languages.yaml)
was generated from the example configurations in [grpc/test-infra] by the
following command:

```
$ ./tools/run_tests/performance/loadtest_template.py \
    -i ../test-infra/config/samples/*_example_loadtest.yaml \
    --inject_client_pool --inject_server_pool \
    --inject_big_query_table --inject_timeout_seconds \
    -o ./tools/run_tests/performance/templates/loadtest_template_basic_all_languages.yaml \
    --name basic_all_languages
```

The example template with prebuilt images in
[loadtest_template_prebuilt_all_languages.yaml](./templates/loadtest_template_prebuilt_all_languages.yaml)
was generated by the following command:

```
$ ./tools/run_tests/performance/loadtest_template.py \
    -i ../test-infra/config/samples/templates/*_example_loadtest_with_prebuilt_workers.yaml \
    --inject_client_pool --inject_driver_image --inject_driver_pool \
    --inject_server_pool --inject_big_query_table --inject_timeout_seconds \
    -o ./tools/run_tests/performance/templates/loadtest_template_prebuilt_all_languages.yaml \
    --name prebuilt_all_languages
```

The script `loadtest_template.py` takes the following options:

- `-i`, `--inputs`<br> Space-separated list of the names of input files
  containing LoadTest configurations. May be repeated.
- `-o`, `--output`<br> Output file name. Outputs to `sys.stdout` if not set.
- `--inject_client_pool`<br> If this option is set, the pool attribute of all
  clients in `spec.clients` is set to `${client_pool}`, for later substitution.
- `--inject_driver_image`<br> If this option is set, the image attribute of the
  driver(s) in `spec.drivers` is set to `${driver_image}`, for later
  substitution.
- `--inject_driver_pool`<br> If this option is set, the pool attribute of the
  driver(s) is set to `${driver_pool}`, for later substitution.
- `--inject_server_pool`<br> If this option is set, the pool attribute of all
  servers in `spec.servers` is set to `${server_pool}`, for later substitution.
- `--inject_big_query_table`<br> If this option is set,
  `spec.results.bigQueryTable` is set to `${big_query_table}`.
- `--inject_timeout_seconds`<br> If this option is set, `spec.timeoutSeconds` is
  set to `${timeout_seconds}`.
- `--inject_ttl_seconds`<br> If this option is set, `spec.ttlSeconds` is set to
  `${ttl_seconds}`.
- `-n`, `--name`<br> Name to be set in `metadata.name`.
- `-a`, `--annotation`<br> Metadata annotation to be stored in
  `metadata.annotations`, in the form `key=value`. May be repeated.

The options that inject substitution keys are the most useful for template
reuse. When running tests on different node pools, it becomes necessary to set
the pool, and usually also to store the data in a different table. When running
as part of a larger collection of tests, it may also be necessary to adjust the
test timeout and time-to-live, to ensure that all tests have time to complete.

The template name is replaced again by `loadtest_config.py`, and so is set only
as a human-readable memo.

Annotations, on the other hand, are passed on to the test configurations, and
may be set to values or to substitution keys in themselves, allowing future
automation scripts to process the tests generated from these configurations in
different ways.

#### Running tests

Collections of tests generated by `loadtest_config.py` are intended to be run
with a test runner.
The code for the test runner is stored in a separate
repository, [grpc/test-infra].

The test runner applies the tests to the cluster, and monitors the tests for
completion while they are running. The test runner can also be set up to run
collections of tests in parallel on separate node pools, and to limit the number
of tests running in parallel on each pool.

For more information, see the
[tools README](https://github.com/grpc/test-infra/blob/master/tools/README.md)
in [grpc/test-infra].

For usage examples, see the continuous integration setup defined in
[grpc_e2e_performance_gke.sh] and [grpc_e2e_performance_gke_experiment.sh].

[grpc/test-infra]: https://github.com/grpc/test-infra
[grpc_e2e_performance_gke.sh]: ../../internal_ci/linux/grpc_e2e_performance_gke.sh
[grpc_e2e_performance_gke_experiment.sh]: ../../internal_ci/linux/grpc_e2e_performance_gke_experiment.sh

## Approach 2: Running benchmarks locally via legacy tooling (still useful sometimes)

This approach is much more involved than using the gRPC OSS benchmarks framework
(see above), but can still be useful for hands-on low-level experiments
(especially when you know what you are doing).

### Prerequisites for running benchmarks manually:

In general, the benchmark workers and driver build scripts expect
[linux_performance_worker_init.sh](../../gce/linux_performance_worker_init.sh)
to have been run already.

### To run benchmarks locally:

- From the grpc repo root, start the
  [run_performance_tests.py](../run_performance_tests.py) runner script.

### On remote machines, to start the driver and workers manually:

The [run_performance_tests.py](../run_performance_tests.py) top-level runner
script can also be used with remote machines, but for e.g. profiling the
server, it might be useful to run workers manually.

1. You'll need a "driver" and separate "worker" machines.
For example, you might
   use one GCE "driver" machine and 3 other GCE "worker" machines that are in
   the same zone.

2. Connect to each worker machine and start up a benchmark worker with a
   "driver_port".

- For example, to start the grpc-go benchmark worker:
  [grpc-go worker main.go](https://github.com/grpc/grpc-go/blob/master/benchmark/worker/main.go)
  --driver_port <driver_port>

#### Commands to start workers in different languages:

- Note that these commands are what the top-level
  [run_performance_tests.py](../run_performance_tests.py) script uses to build
  and run different workers through the
  [build_performance.sh](./build_performance.sh) script and "run worker" scripts
  (such as [run_worker_java.sh](./run_worker_java.sh)).

##### Running benchmark workers for C-core wrapped languages (C++, Python, C#, Node, Ruby):

- These are simpler, since they all live in the main grpc repo.

```
$ cd <grpc_repo_root>
$ tools/run_tests/performance/build_performance.sh
$ tools/run_tests/performance/run_worker_<language>.sh
```

- Note that there is one "run_worker" script per language, e.g.,
  [run_worker_csharp.sh](./run_worker_csharp.sh) for C#.

##### Running benchmark workers for gRPC-Java:

- You'll need the [grpc-java](https://github.com/grpc/grpc-java) repo.
```
$ cd <grpc-java-repo>
$ ./gradlew -PskipCodegen=true -PskipAndroid=true :grpc-benchmarks:installDist
$ benchmarks/build/install/grpc-benchmarks/bin/benchmark_worker --driver_port <driver_port>
```

##### Running benchmark workers for gRPC-Go:

- You'll need the [grpc-go repo](https://github.com/grpc/grpc-go).

```
$ cd <grpc-go-repo>/benchmark/worker && go install
$ # if profiling, it might be helpful to turn off inlining by building with "-gcflags=-l"
$ $GOPATH/bin/worker --driver_port <driver_port>
```

#### Build the driver:

- Connect to the driver machine (if using a remote driver) and run the
  following from the grpc repo root:

```
$ tools/run_tests/performance/build_performance.sh
```

#### Run the driver:

1. Get the 'scenario_json' relevant for the scenario to run. Note that "scenario
   json" configs are generated from [scenario_config.py](./scenario_config.py).
   The [driver](../../../test/cpp/qps/qps_json_driver.cc) takes a list of these
   configs as a json string of the form: `{scenario: <json_list_of_scenarios>}`
   in its `--scenarios_json` command argument. One quick way to get a valid json
   string to pass to the driver is by running
   [run_performance_tests.py](../run_performance_tests.py) locally and copying
   the logged scenario json command arg.

2. From the grpc repo root:

- Set the `QPS_WORKERS` environment variable to a comma-separated list of worker
  machines. Note that the driver will start the "benchmark server" on the first
  entry in the list, and the rest will be told to run as clients against the
  benchmark server.
Example running and profiling of go benchmark server:

```
$ export QPS_WORKERS=<host1>:10000,<host2>:10000,<host3>:10000
$ bins/opt/qps_json_driver --scenario_json='<scenario_json_scenario_config_string>'
```

### Example profiling commands

While running the benchmark, a profiler can be attached to the server.

Example to count syscalls in grpc-go server during a benchmark:

- Connect to server machine and run:

```
$ netstat -tulpn | grep <driver_port>  # to get pid of worker
$ perf stat -p <worker_pid> -e syscalls:sys_enter_write  # stop after test complete
```

Example memory profile of grpc-go server, with `go tool pprof`:

- After a run is done on the server, see its alloc profile with:

```
$ go tool pprof --text --alloc_space http://localhost:<pprof_port>/debug/heap
```

### Configuration environment variables:

- `QPS_WORKER_CHANNEL_CONNECT_TIMEOUT`

  Consuming process: qps_worker

  Type: integer (number of seconds)

  This can be used to configure the amount of time that benchmark clients wait
  for channels to the benchmark server to become ready. This is useful in
  certain benchmark environments in which the server can take a long time to
  become ready. Note: if setting this to a high value, then the scenario config
  under test should probably also have a large "warmup_seconds".

- `QPS_WORKERS`

  Consuming process: qps_json_driver

  Type: comma-separated list of host:port

  Set this to a comma-separated list of QPS worker processes/machines. Each
  scenario in a scenario config specifies a certain number of servers,
  `num_servers`, and the driver will start "benchmark servers" on the first
  `num_servers` `host:port` pairs in the comma-separated list. The rest will be
  told to run as clients against the benchmark server.
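The `num_servers` split described for `QPS_WORKERS` can be sketched as follows.
This is a minimal illustration of the documented behavior, not code from
`qps_json_driver` itself:

```python
def split_workers(qps_workers: str, num_servers: int):
    """Split a QPS_WORKERS-style list into servers and clients.

    The first num_servers entries run as benchmark servers; the
    rest are told to run as clients against those servers.
    """
    workers = qps_workers.split(",")
    return workers[:num_servers], workers[num_servers:]

# With num_servers=1, the first host:port pair becomes the server.
servers, clients = split_workers("host1:10000,host2:10000,host3:10000", 1)
print("servers:", servers)
print("clients:", clients)
```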