# Overview of performance test suite

For design of the tests, see https://grpc.io/docs/guides/benchmarking.

This document explains how to run gRPC end-to-end benchmarks using the gRPC OSS
benchmarks framework (recommended) or how to run them manually (for experts
only).

## Approach 1: Use gRPC OSS benchmarks framework (Recommended)

### gRPC OSS benchmarks

The scripts in this section generate LoadTest configurations for the GKE-based
gRPC OSS benchmarks framework. This framework is stored in a separate
repository, [grpc/test-infra].

These scripts, together with tools defined in [grpc/test-infra], are used in the
continuous integration setup defined in [grpc_e2e_performance_gke.sh] and
[grpc_e2e_performance_gke_experiment.sh].

#### Generating scenarios

The benchmarks framework uses the same test scenarios as the legacy one. The
script [scenario_config_exporter.py](./scenario_config_exporter.py) can be used
to export these scenarios to files, and also to count and analyze existing
scenarios.

The language(s) and category of the scenarios are of particular importance to
the tests. Continuous runs will typically run tests in the `scalable` category.

The following example counts scenarios in the `scalable` category:

```
$ ./tools/run_tests/performance/scenario_config_exporter.py --count_scenarios --category=scalable
Scenario count for all languages (category: scalable):
Count  Language         Client   Server   Categories
   77  c++                                scalable
   19  python_asyncio                     scalable
   16  java                               scalable
   12  go                                 scalable
   12  node                      node     scalable
   12  node_purejs               node     scalable
    9  csharp                             scalable
    7  python                             scalable
    5  ruby                               scalable
    4  csharp                    c++      scalable
    4  php7                      c++      scalable
    4  php7_protobuf_c           c++      scalable
    3  python_asyncio            c++      scalable
    2  ruby                      c++      scalable
    2  python                    c++      scalable
    1  csharp           c++               scalable

  189  total scenarios (category: scalable)
```

Client and server languages are only set for cross-language scenarios, where the
client or server language does not match the scenario language.
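
For example, the following sketch exports the C++ scenarios in the `scalable`
category to files in the current directory (the `--export_scenarios` and `-l`
flags are assumptions; check the script's `--help` for the exact options):

```
$ ./tools/run_tests/performance/scenario_config_exporter.py --export_scenarios \
      -l c++ --category=scalable
```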

#### Generating load test configurations

The benchmarks framework uses LoadTest resources configured by YAML files. Each
LoadTest resource specifies a driver, a server, and one or more clients to run
the test. Each test runs one scenario. The scenario configuration is embedded in
the LoadTest configuration. Example configurations for various languages can be
found here:

https://github.com/grpc/test-infra/tree/master/config/samples

The script [loadtest_config.py](./loadtest_config.py) generates LoadTest
configurations for tests running a set of scenarios. The configurations are
written in multipart YAML format, either to a file or to stdout. Each
configuration contains a single embedded scenario.

The LoadTest configurations are generated from a template. Any configuration can
be used as a template, as long as it contains the languages required by the set
of scenarios we intend to run (for instance, if we are generating configurations
to run go scenarios, the template must contain a go client and a go server; if
we are generating configurations for cross-language scenarios that need a go
client and a C++ server, the template must also contain a C++ server; and the
same for all other languages).

The LoadTests specified in the script output all have unique names and can be
run by applying them to a cluster running the LoadTest controller with
`kubectl apply`:

```
$ kubectl apply -f loadtest_config.yaml
```
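
Once applied, progress can be checked with standard `kubectl` queries (a sketch,
assuming the LoadTest custom resource from [grpc/test-infra] is installed on the
cluster and exposed under the plural name `loadtests`):

```
$ kubectl get loadtests
$ kubectl describe loadtest <test-name>
```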

> Note: The most common way of running tests generated by this script is to use
> a _test runner_. For details, see [running tests](#running-tests).

A basic template for generating tests in various languages can be found here:
[loadtest_template_basic_all_languages.yaml](./templates/loadtest_template_basic_all_languages.yaml).
The following example generates configurations for Go and Java tests using this
template, including tests against C++ clients and servers, and running each test
twice:

```
$ ./tools/run_tests/performance/loadtest_config.py -l go -l java \
    -t ./tools/run_tests/performance/templates/loadtest_template_basic_all_languages.yaml \
    -s client_pool=workers-8core -s driver_pool=drivers \
    -s server_pool=workers-8core \
    -s big_query_table=e2e_benchmarks.experimental_results \
    -s timeout_seconds=3600 --category=scalable \
    -d --allow_client_language=c++ --allow_server_language=c++ \
    --runs_per_test=2 -o ./loadtest.yaml
```

The script `loadtest_config.py` takes the following options:

- `-l`, `--language`<br> Language to benchmark. May be repeated.
- `-t`, `--template`<br> Template file. A template is a configuration file that
  may contain multiple client and server configurations, and may also include
  substitution keys.
- `-s`, `--substitution`<br> Substitution keys, in the format `key=value`. These
  keys are substituted while processing the template. Environment variables that
  are set by the load test controller at runtime are ignored by default
  (`DRIVER_PORT`, `KILL_AFTER`, `POD_TIMEOUT`). The user can override this
  behavior by specifying these variables as keys.
- `-p`, `--prefix`<br> Test names consist of a prefix joined with a uuid by a
  dash. Test names are stored in `metadata.name`. The prefix is also added as
  the `prefix` label in `metadata.labels`. The prefix defaults to the user name
  if not set.
- `-u`, `--uniquifier_element`<br> Uniquifier elements may be passed to the test
  to make the test name unique. This option may be repeated to add multiple
  elements. The uniquifier elements (plus a date string and a run index, if
  applicable) are joined with a dash to form a _uniquifier_. The test name uuid
  is derived from the scenario name and the uniquifier. The uniquifier is also
  added as the `uniquifier` annotation in `metadata.annotations`.
- `-d`<br> This option is a shorthand for the addition of a date string as a
  uniquifier element.
- `-a`, `--annotation`<br> Metadata annotation to be stored in
  `metadata.annotations`, in the form `key=value`. May be repeated.
- `-r`, `--regex`<br> Regex to select scenarios to run. Each scenario is
  embedded in a LoadTest configuration containing a client and server of the
  language(s) required for the test. Defaults to `.*`, i.e., select all
  scenarios.
- `--category`<br> Select scenarios of a specified _category_, or of all
  categories. Defaults to `all`. Continuous runs typically run tests in the
  `scalable` category.
- `--allow_client_language`<br> Allows cross-language scenarios where the client
  is of a specified language, different from the scenario language. This is
  typically `c++`. This flag may be repeated.
- `--allow_server_language`<br> Allows cross-language scenarios where the server
  is of a specified language, different from the scenario language. This is
  typically `node` or `c++`. This flag may be repeated.
- `--instances_per_client`<br> This option generates multiple instances of the
  clients for each test. The instances are named with the name of the client
  combined with an index (or only an index, if no name is specified). If the
  template specifies more than one client for a given language, it must also
  specify unique names for each client. In the most common case, the template
  contains only one unnamed client for each language, and the instances will be
  named `0`, `1`, ...
- `--runs_per_test`<br> This option specifies that each test should be repeated
  `n` times, where `n` is the value of the flag. If `n` > 1, the index of each
  test run is added as a uniquifier element for that run.
- `-o`, `--output`<br> Output file name. The LoadTest configurations are added
  to this file, in multipart YAML format. Output is streamed to `sys.stdout` if
  not set.

The script adds labels and annotations to the metadata of each LoadTest
configuration:

The following labels are added to `metadata.labels`:

- `language`<br> The language of the LoadTest scenario.
- `prefix`<br> The prefix used in `metadata.name`.

The following annotations are added to `metadata.annotations`:

- `scenario`<br> The name of the LoadTest scenario.
- `uniquifier`<br> The uniquifier used to generate the LoadTest name, including
  the run index if applicable.

[Labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/)
can be used in selectors in resource queries. Adding the prefix, in particular,
allows the user (or an automation script) to select the resources started from a
given run of the config generator.

[Annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/)
contain additional information that is available to the user (or an automation
script) but is not indexed and cannot be used to select objects. Scenario name
and uniquifier are added to provide the elements of the LoadTest name uuid in
human-readable form. Additional annotations may be added later for automation.
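
For example, the `prefix` label can be used with `kubectl` label selectors to
list or clean up all tests generated in one run (a sketch, assuming the default
prefix of the user name described above):

```
$ kubectl get loadtests -l prefix=$USER
$ kubectl delete loadtests -l prefix=$USER
```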

#### Concatenating load test configurations

The LoadTest configuration generator can process multiple languages at a time,
assuming that they are supported by the template. The convenience script
[loadtest_concat_yaml.py](./loadtest_concat_yaml.py) is provided to concatenate
several YAML files into one, so configurations generated by multiple generator
invocations can be concatenated into one and run with a single command. The
script can be invoked as follows:

```
$ loadtest_concat_yaml.py -i infile1.yaml infile2.yaml -o outfile.yaml
```
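
For instance, configurations produced by separate generator runs can be merged
and applied together. A sketch, assuming earlier `loadtest_config.py`
invocations wrote `loadtest_go.yaml` and `loadtest_java.yaml`:

```
$ ./tools/run_tests/performance/loadtest_concat_yaml.py \
    -i loadtest_go.yaml loadtest_java.yaml -o loadtest_all.yaml
$ kubectl apply -f loadtest_all.yaml
```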

#### Generating load test examples

The script [loadtest_examples.sh](./loadtest_examples.sh) is provided to
generate example load test configurations in all supported languages. This
script takes only one argument, which is the output directory where the
configurations will be created. The script produces a set of basic
configurations, as well as a set of template configurations intended to be used
with prebuilt images.

The [examples](https://github.com/grpc/test-infra/tree/master/config/samples) in
the repository [grpc/test-infra] are generated by this script.
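
For example, to generate the examples into a local directory (a sketch,
assuming the output directory is the single argument described above):

```
$ ./tools/run_tests/performance/loadtest_examples.sh ./loadtest_examples
```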

#### Generating configuration templates

The script [loadtest_template.py](./loadtest_template.py) generates a load test
configuration template from a set of load test configurations. The source files
may be load test configurations or load test configuration templates. The
generated template supports all languages supported in any of the input
configurations or templates.

The example template in
[loadtest_template_basic_all_languages.yaml](./templates/loadtest_template_basic_all_languages.yaml)
was generated from the example configurations in [grpc/test-infra] by the
following command:

```
$ ./tools/run_tests/performance/loadtest_template.py \
    -i ../test-infra/config/samples/*_example_loadtest.yaml \
    --inject_client_pool --inject_server_pool \
    --inject_big_query_table --inject_timeout_seconds \
    -o ./tools/run_tests/performance/templates/loadtest_template_basic_all_languages.yaml \
    --name basic_all_languages
```

The example template with prebuilt images in
[loadtest_template_prebuilt_all_languages.yaml](./templates/loadtest_template_prebuilt_all_languages.yaml)
was generated by the following command:

```
$ ./tools/run_tests/performance/loadtest_template.py \
    -i ../test-infra/config/samples/templates/*_example_loadtest_with_prebuilt_workers.yaml \
    --inject_client_pool --inject_driver_image --inject_driver_pool \
    --inject_server_pool --inject_big_query_table --inject_timeout_seconds \
    -o ./tools/run_tests/performance/templates/loadtest_template_prebuilt_all_languages.yaml \
    --name prebuilt_all_languages
```

The script `loadtest_template.py` takes the following options:

- `-i`, `--inputs`<br> Space-separated list of the names of input files
  containing LoadTest configurations. May be repeated.
- `-o`, `--output`<br> Output file name. Outputs to `sys.stdout` if not set.
- `--inject_client_pool`<br> If this option is set, the pool attribute of all
  clients in `spec.clients` is set to `${client_pool}`, for later substitution.
- `--inject_driver_image`<br> If this option is set, the image attribute of the
  driver(s) in `spec.drivers` is set to `${driver_image}`, for later
  substitution.
- `--inject_driver_pool`<br> If this option is set, the pool attribute of the
  driver(s) is set to `${driver_pool}`, for later substitution.
- `--inject_big_query_table`<br> If this option is set,
  `spec.results.bigQueryTable` is set to `${big_query_table}`.
- `--inject_timeout_seconds`<br> If this option is set, `spec.timeoutSeconds` is
  set to `${timeout_seconds}`.
- `--inject_ttl_seconds`<br> If this option is set, `spec.ttlSeconds` is set to
  `${ttl_seconds}`.
- `-n`, `--name`<br> Name to be set in `metadata.name`.
- `-a`, `--annotation`<br> Metadata annotation to be stored in
  `metadata.annotations`, in the form `key=value`. May be repeated.

The options that inject substitution keys are the most useful for template
reuse. When running tests on different node pools, it becomes necessary to set
the pool, and usually also to store the data in a different table. When running
as part of a larger collection of tests, it may also be necessary to adjust test
timeout and time-to-live, to ensure that all tests have time to complete.

The template name is replaced again by `loadtest_config.py`, so it is set only
as a human-readable memo.

Annotations, on the other hand, are passed on to the test configurations, and
may themselves be set to values or to substitution keys, allowing future
automation scripts to process the tests generated from these configurations in
different ways.

#### Running tests

Collections of tests generated by `loadtest_config.py` are intended to be run
with a test runner. The code for the test runner is stored in a separate
repository, [grpc/test-infra].

The test runner applies the tests to the cluster and monitors them for
completion while they are running. The test runner can also be set up to run
collections of tests in parallel on separate node pools, and to limit the number
of tests running in parallel on each pool.

For more information, see the
[tools README](https://github.com/grpc/test-infra/blob/master/tools/README.md)
in [grpc/test-infra].

For usage examples, see the continuous integration setup defined in
[grpc_e2e_performance_gke.sh] and [grpc_e2e_performance_gke_experiment.sh].

[grpc/test-infra]: https://github.com/grpc/test-infra
[grpc_e2e_performance_gke.sh]: ../../internal_ci/linux/grpc_e2e_performance_gke.sh
[grpc_e2e_performance_gke_experiment.sh]: ../../internal_ci/linux/grpc_e2e_performance_gke_experiment.sh

## Approach 2: Running benchmarks locally via legacy tooling (still useful sometimes)

This approach is much more involved than using the gRPC OSS benchmarks framework
(see above), but can still be useful for hands-on low-level experiments
(especially when you know what you are doing).

### Prerequisites for running benchmarks manually:

In general, the benchmark workers and driver build scripts expect
[linux_performance_worker_init.sh](../../gce/linux_performance_worker_init.sh)
to have been run already.

### To run benchmarks locally:

- From the grpc repo root, start the
  [run_performance_tests.py](../run_performance_tests.py) runner script, for
  example as shown below.
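
For example, a small local run limited to one language and the `smoketest`
category might look as follows (a sketch; see the script's `--help` for the
authoritative flags):

```
$ ./tools/run_tests/run_performance_tests.py -l c++ --category=smoketest
```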

### On remote machines, to start the driver and workers manually:

The [run_performance_tests.py](../run_performance_tests.py) top-level runner
script can also be used with remote machines, but for tasks such as profiling
the server, it might be useful to run workers manually.

1. You'll need a "driver" and separate "worker" machines. For example, you might
   use one GCE "driver" machine and 3 other GCE "worker" machines that are in
   the same zone.

2. Connect to each worker machine and start up a benchmark worker with a
   "driver_port".

- For example, to start the grpc-go benchmark worker, build and run
  [grpc-go worker main.go](https://github.com/grpc/grpc-go/blob/master/benchmark/worker/main.go)
  with `--driver_port <driver_port>`.

#### Commands to start workers in different languages:

- Note that these commands are what the top-level
  [run_performance_tests.py](../run_performance_tests.py) script uses to build
  and run different workers through the
  [build_performance.sh](./build_performance.sh) script and "run worker" scripts
  (such as [run_worker_java.sh](./run_worker_java.sh)).

##### Running benchmark workers for C-core wrapped languages (C++, Python, C#, Node, Ruby):

- These are simpler since they all live in the main grpc repo.

```
$ cd <grpc_repo_root>
$ tools/run_tests/performance/build_performance.sh
$ tools/run_tests/performance/run_worker_<language>.sh
```

- Note that there is one "run_worker" script per language, e.g.,
  [run_worker_csharp.sh](./run_worker_csharp.sh) for C#.

##### Running benchmark workers for gRPC-Java:

- You'll need the [grpc-java](https://github.com/grpc/grpc-java) repo.

```
$ cd <grpc-java-repo>
$ ./gradlew -PskipCodegen=true -PskipAndroid=true :grpc-benchmarks:installDist
$ benchmarks/build/install/grpc-benchmarks/bin/benchmark_worker --driver_port <driver_port>
```

##### Running benchmark workers for gRPC-Go:

- You'll need the [grpc-go repo](https://github.com/grpc/grpc-go).

```
$ cd <grpc-go-repo>/benchmark/worker && go install
$ # if profiling, it might be helpful to turn off inlining by building with "-gcflags=-l"
$ $GOPATH/bin/worker --driver_port <driver_port>
```
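
As noted in the comment above, inlining can be turned off for profiling runs by
passing `-gcflags=-l` when installing the worker, for example:

```
$ cd <grpc-go-repo>/benchmark/worker && go install -gcflags=-l
$ $GOPATH/bin/worker --driver_port <driver_port>
```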

#### Build the driver:

- Connect to the driver machine (if using a remote driver) and run the following
  from the grpc repo root:

```
$ tools/run_tests/performance/build_performance.sh
```

#### Run the driver:

1. Get the 'scenario_json' relevant for the scenario to run. Note that "scenario
   json" configs are generated from [scenario_config.py](./scenario_config.py).
   The [driver](../../../test/cpp/qps/qps_json_driver.cc) takes a list of these
   configs as a json string of the form: `{scenarios: <json_list_of_scenarios>}`
   in its `--scenarios_json` command argument. One quick way to get a valid json
   string to pass to the driver is by running the
   [run_performance_tests.py](../run_performance_tests.py) locally and copying
   the logged scenario json command arg.

2. From the grpc repo root:

- Set the `QPS_WORKERS` environment variable to a comma-separated list of worker
  machines. Note that the driver will start the "benchmark server" on the first
  entry in the list, and the rest will be told to run as clients against the
  benchmark server.

Example of running the driver against go benchmark workers (the go server can
then be profiled as described below):

```
$ export QPS_WORKERS=<host1>:10000,<host2>:10000,<host3>:10000
$ bins/opt/qps_json_driver --scenarios_json='<scenario_json_scenario_config_string>'
```

### Example profiling commands

While running the benchmark, a profiler can be attached to the server.

Example to count syscalls in the grpc-go server during a benchmark:

- Connect to the server machine and run:

```
$ netstat -tulpn | grep <driver_port> # to get the pid of the worker
$ perf stat -p <worker_pid> -e syscalls:sys_enter_write # stop after the test completes
```

Example memory profile of the grpc-go server, with `go tool pprof`:

- After a run is done on the server, see its alloc profile with:

```
$ go tool pprof --text --alloc_space http://localhost:<pprof_port>/debug/heap
```
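
If the worker binary serves the standard `net/http/pprof` endpoints (an
assumption; the exact port and paths depend on how the worker was built), a CPU
profile can be captured during the run in a similar way:

```
$ go tool pprof --text http://localhost:<pprof_port>/debug/pprof/profile
```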

### Configuration environment variables:

- `QPS_WORKER_CHANNEL_CONNECT_TIMEOUT`

  Consuming process: `qps_worker`

  Type: integer (number of seconds)

  This can be used to configure the amount of time that benchmark clients wait
  for channels to the benchmark server to become ready. This is useful in
  certain benchmark environments in which the server can take a long time to
  become ready. Note: if setting this to a high value, then the scenario config
  under test should probably also have a large "warmup_seconds".

- `QPS_WORKERS`

  Consuming process: `qps_json_driver`

  Type: comma-separated list of `host:port` pairs

  Set this to a comma-separated list of QPS worker processes/machines. Each
  scenario in a scenario config specifies a certain number of servers,
  `num_servers`, and the driver will start "benchmark servers" on the first
  `num_servers` `host:port` pairs in the comma-separated list. The rest will be
  told to run as clients against the benchmark server.
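
For example, a manual run against slow-starting servers might be configured as
follows (a sketch; host names and values are placeholders):

```
$ # On each worker machine, before starting the worker:
$ export QPS_WORKER_CHANNEL_CONNECT_TIMEOUT=600
$ # On the driver machine, before starting the driver:
$ export QPS_WORKERS=<host1>:10000,<host2>:10000,<host3>:10000
```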