1*e07d83d3SAndroid Build Coastguard Workergrpc Benchmarks 2*e07d83d3SAndroid Build Coastguard Worker============================================== 3*e07d83d3SAndroid Build Coastguard Worker 4*e07d83d3SAndroid Build Coastguard Worker## QPS Benchmark 5*e07d83d3SAndroid Build Coastguard Worker 6*e07d83d3SAndroid Build Coastguard WorkerThe "Queries Per Second Benchmark" allows you to get a quick overview of the throughput and latency characteristics of grpc. 7*e07d83d3SAndroid Build Coastguard Worker 8*e07d83d3SAndroid Build Coastguard WorkerTo build the benchmark type 9*e07d83d3SAndroid Build Coastguard Worker 10*e07d83d3SAndroid Build Coastguard Worker``` 11*e07d83d3SAndroid Build Coastguard Worker$ ./gradlew :grpc-benchmarks:installDist 12*e07d83d3SAndroid Build Coastguard Worker``` 13*e07d83d3SAndroid Build Coastguard Worker 14*e07d83d3SAndroid Build Coastguard Workerfrom the grpc-java directory. 15*e07d83d3SAndroid Build Coastguard Worker 16*e07d83d3SAndroid Build Coastguard WorkerYou can now find the client and the server executables in `benchmarks/build/install/grpc-benchmarks/bin`. 17*e07d83d3SAndroid Build Coastguard Worker 18*e07d83d3SAndroid Build Coastguard WorkerThe `C++` counterpart can be found at https://github.com/grpc/grpc/tree/master/test/cpp/qps 19*e07d83d3SAndroid Build Coastguard Worker 20*e07d83d3SAndroid Build Coastguard WorkerThe [netty benchmark](src/jmh/java/io/grpc/benchmarks/netty) directory contains 21*e07d83d3SAndroid Build Coastguard Workerthe standard benchmarks used to assess the performance of GRPC. Since these 22*e07d83d3SAndroid Build Coastguard Workerbenchmarks run on localhost over loopback the performance of the underlying network is considerably 23*e07d83d3SAndroid Build Coastguard Workerdifferent to real networks and their behavior. To address this issue we recommend the use of 24*e07d83d3SAndroid Build Coastguard Workera network emulator to make loopback behave more like a real network. To this end the benchmark 25*e07d83d3SAndroid Build Coastguard Workercode looks for a loopback interface with 'benchmark' in its name and attempts to use the address 26*e07d83d3SAndroid Build Coastguard Workerbound to that interface when creating the client and server. If it cannot find such an interface it 27*e07d83d3SAndroid Build Coastguard Workerwill print a warning and continue with the default localhost address. 28*e07d83d3SAndroid Build Coastguard Worker 29*e07d83d3SAndroid Build Coastguard WorkerTo attempt to standardize benchmark behavior across machines we attempt to emulate a 10gbit 30*e07d83d3SAndroid Build Coastguard Workerethernet interface with a packet delay of 0.1ms. 31*e07d83d3SAndroid Build Coastguard Worker 32*e07d83d3SAndroid Build Coastguard Worker### Linux 33*e07d83d3SAndroid Build Coastguard Worker 34*e07d83d3SAndroid Build Coastguard WorkerOn Linux we can use [netem](https://www.linuxfoundation.org/collaborate/workgroups/networking/netem) to shape the traffic appropriately. 35*e07d83d3SAndroid Build Coastguard Worker 36*e07d83d3SAndroid Build Coastguard Worker```sh 37*e07d83d3SAndroid Build Coastguard Worker# Remove all traffic shaping from loopback 38*e07d83d3SAndroid Build Coastguard Workersudo tc qdisc del dev lo root 39*e07d83d3SAndroid Build Coastguard Worker# Add a priority traffic class to the root of loopback 40*e07d83d3SAndroid Build Coastguard Workersudo tc qdisc add dev lo root handle 1: prio 41*e07d83d3SAndroid Build Coastguard Worker# Add a qdisc under the new class with the appropriate shaping 42*e07d83d3SAndroid Build Coastguard Workersudo tc qdisc add dev lo parent 1:1 handle 2: netem delay 0.1ms rate 10gbit 43*e07d83d3SAndroid Build Coastguard Worker# Add a filter which selects the new qdisc class for traffic to 127.127.127.127 44*e07d83d3SAndroid Build Coastguard Workersudo tc filter add dev lo parent 1:0 protocol ip prio 1 u32 match ip dst 127.127.127.127 flowid 2:1 45*e07d83d3SAndroid Build Coastguard Worker# Create an interface alias call 'lo:benchmark' that maps 127.127.127.127 to loopback 46*e07d83d3SAndroid Build Coastguard Workersudo ip addr add dev lo 127.127.127.127/32 label lo:benchmark 47*e07d83d3SAndroid Build Coastguard Worker``` 48*e07d83d3SAndroid Build Coastguard Worker 49*e07d83d3SAndroid Build Coastguard Workerto remove this configuration 50*e07d83d3SAndroid Build Coastguard Worker 51*e07d83d3SAndroid Build Coastguard Worker```sh 52*e07d83d3SAndroid Build Coastguard Workersudo tc qdisc del dev lo root 53*e07d83d3SAndroid Build Coastguard Workersudo ip addr del dev lo 127.127.127.127/32 label lo:benchmark 54*e07d83d3SAndroid Build Coastguard Worker``` 55*e07d83d3SAndroid Build Coastguard Worker 56*e07d83d3SAndroid Build Coastguard Worker### Other Platforms 57*e07d83d3SAndroid Build Coastguard Worker 58*e07d83d3SAndroid Build Coastguard WorkerContributions are welcome! 59*e07d83d3SAndroid Build Coastguard Worker 60*e07d83d3SAndroid Build Coastguard Worker## Visualizing the Latency Distribution 61*e07d83d3SAndroid Build Coastguard Worker 62*e07d83d3SAndroid Build Coastguard WorkerThe QPS client comes with the option `--save_histogram=FILE`, if set it serializes the histogram to `FILE` which can then be used with a plotter to visualize the latency distribution. The histogram is stored in the file format of [HdrHistogram](https://hdrhistogram.org/). That way it can be plotted very easily using a browser based tool like https://hdrhistogram.github.io/HdrHistogram/plotFiles.html. Simply upload the generated file and it will generate a beautiful graph for you. It also allows you to plot two or more histograms on the same surface in order two easily compare latency distributions. 63*e07d83d3SAndroid Build Coastguard Worker 64*e07d83d3SAndroid Build Coastguard Worker## JVM Options 65*e07d83d3SAndroid Build Coastguard Worker 66*e07d83d3SAndroid Build Coastguard WorkerWhen running a benchmark it's often useful to adjust some JVM options to improve performance and to gain some insights into what's happening. Passing JVM options to the QPS server and client is as easy as setting the `JAVA_OPTS` environment variables. Below are some options that I find very useful: 67*e07d83d3SAndroid Build Coastguard Worker - `-Xms` gives a lower bound on the heap to allocate and `-Xmx` gives an upper bound. If your program uses more than what's specified in `-Xmx` the JVM will exit with an `OutOfMemoryError`. When setting those always set `Xms` and `Xmx` to the same value. The reason for this is that the young and old generation are sized according to the total available heap space. So if the total heap gets resized, they will also have to be resized and this will then trigger a full GC. 68*e07d83d3SAndroid Build Coastguard Worker - `-verbose:gc` prints some basic information about garbage collection. It will log to stdout whenever a GC happend and will tell you about the kind of GC, pause time and memory compaction. 69*e07d83d3SAndroid Build Coastguard Worker - `-XX:+PrintGCDetails` prints out very detailed GC and heap usage information before the program terminates. 70*e07d83d3SAndroid Build Coastguard Worker - `-XX:-HeapDumpOnOutOfMemoryError` and `-XX:HeapDumpPath=path` when you are pushing the JVM hard it sometimes happens that it will crash due to the lack of available heap space. This option will allow you to dive into the details of why it happened. The heap dump can be viewed with e.g. the [Eclipse Memory Analyzer](https://eclipse.org/mat/). 71*e07d83d3SAndroid Build Coastguard Worker - `-XX:+PrintCompilation` will give you a detailed overview of what gets compiled, when it gets compiled, by which HotSpot compiler it gets compiled and such. It's a lot of output. I usually just redirect it to file and look at it with `less` and `grep`. 72*e07d83d3SAndroid Build Coastguard Worker - `-XX:+PrintInlining` will give you a detailed overview of what gets inlined and why some methods didn't get inlined. The output is very verbose and like `-XX:+PrintCompilation` and useful to look at after some major changes or when a drop in performance occurs. 73*e07d83d3SAndroid Build Coastguard Worker - It sometimes happens that a benchmark just doesn't make any progress, that is no bytes are transferred over the network, there is hardly any CPU utilization and low memory usage but the benchmark is still running. In that case it's useful to get a thread dump and see what's going on. HotSpot ships with a tool called `jps` and `jstack`. `jps` tells you the process id of all running JVMs on the machine, which you can then pass to `jstack` and it will print a thread dump of this JVM. 74*e07d83d3SAndroid Build Coastguard Worker - Taking a heap dump of a running JVM is similarly straightforward. First get the process id with `jps` and then use `jmap` to take the heap dump. You will almost always want to run it with `-dump:live` in order to only dump live objects. If possible, try to size the heap of your JVM (`-Xmx`) as small as possible in order to also keep the heap dump small. Large heap dumps are very painful and slow to analyze. 75*e07d83d3SAndroid Build Coastguard Worker 76*e07d83d3SAndroid Build Coastguard Worker## Profiling 77*e07d83d3SAndroid Build Coastguard Worker 78*e07d83d3SAndroid Build Coastguard WorkerNewer JVMs come with a built-in profiler called `Java Flight Recorder`. It's an excellent profiler and it can be used to start a recording directly on the command line, from within `Java Mission Control` or 79*e07d83d3SAndroid Build Coastguard Workerwith jcmd. 80*e07d83d3SAndroid Build Coastguard Worker 81*e07d83d3SAndroid Build Coastguard WorkerA good introduction on how it works and how to use it are http://hirt.se/blog/?p=364 and http://hirt.se/blog/?p=370. 82