# User Guide

## Command Line

[Output Formats](#output-formats)

[Output Files](#output-files)

[Running Benchmarks](#running-benchmarks)

[Running a Subset of Benchmarks](#running-a-subset-of-benchmarks)

[Result Comparison](#result-comparison)

[Extra Context](#extra-context)

## Library

[Runtime and Reporting Considerations](#runtime-and-reporting-considerations)

[Setup/Teardown](#setupteardown)

[Passing Arguments](#passing-arguments)

[Custom Benchmark Name](#custom-benchmark-name)

[Calculating Asymptotic Complexity](#asymptotic-complexity)

[Templated Benchmarks](#templated-benchmarks)

[Templated Benchmarks that take arguments](#templated-benchmarks-with-arguments)

[Fixtures](#fixtures)

[Custom Counters](#custom-counters)

[Multithreaded Benchmarks](#multithreaded-benchmarks)

[CPU Timers](#cpu-timers)

[Manual Timing](#manual-timing)

[Setting the Time Unit](#setting-the-time-unit)

[Random Interleaving](random_interleaving.md)

[User-Requested Performance Counters](perf_counters.md)

[Preventing Optimization](#preventing-optimization)

[Reporting Statistics](#reporting-statistics)

[Custom Statistics](#custom-statistics)

[Memory Usage](#memory-usage)

[Using RegisterBenchmark](#using-register-benchmark)

[Exiting with an Error](#exiting-with-an-error)

[A Faster `KeepRunning` Loop](#a-faster-keep-running-loop)

## Benchmarking Tips

[Disabling CPU Frequency Scaling](#disabling-cpu-frequency-scaling)

[Reducing Variance in Benchmarks](reducing_variance.md)

<a name="output-formats" />

## Output Formats

The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag (or set the
`BENCHMARK_FORMAT=<console|json|csv>` environment variable) to set
the format type. `console` is the default format.

The Console format is intended to be a human readable format.
By default, the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:

```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human readable json split into two top level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example json
output looks like:

```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout.
Example CSV output looks like:

```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

<a name="output-files" />

## Output Files

Write benchmark results to a file with the `--benchmark_out=<filename>` option
(or set `BENCHMARK_OUT`). Specify the output format with
`--benchmark_out_format={json|console|csv}` (or set
`BENCHMARK_OUT_FORMAT={json|console|csv}`). Note that the 'csv' reporter is
deprecated and the saved `.csv` file
[is not parsable](https://github.com/google/benchmark/issues/794) by csv
parsers.

Specifying `--benchmark_out` does not suppress the console output.

<a name="running-benchmarks" />

## Running Benchmarks

Benchmarks are executed by running the produced binaries. Benchmark binaries,
by default, accept options that may be specified either through their command
line interface or by setting environment variables before execution. For every
`--option_flag=<value>` CLI switch, a corresponding environment variable
`OPTION_FLAG=<value>` exists and is used as the default if set (CLI switches
always prevail). A complete list of CLI options is available by running the
benchmark binary with the `--help` switch.

<a name="running-a-subset-of-benchmarks" />

## Running a Subset of Benchmarks

The `--benchmark_filter=<regex>` option (or `BENCHMARK_FILTER=<regex>`
environment variable) can be used to only run the benchmarks that match
the specified `<regex>`.
For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```

## Disabling Benchmarks

It is possible to temporarily disable benchmarks by renaming the benchmark
function to have the prefix "DISABLED_". This will cause the benchmark to
be skipped at runtime.

<a name="result-comparison" />

## Result Comparison

It is possible to compare the benchmarking results.
See [Additional Tooling Documentation](tools.md).

<a name="extra-context" />

## Extra Context

Sometimes it's useful to add extra context to the content printed before the
results. By default this section includes information about the CPU on which
the benchmarks are running. If you do want to add more context, you can use
the `benchmark_context` command line flag:

```bash
$ ./run_benchmarks --benchmark_context=pwd=`pwd`
Run on (1 x 2300 MHz CPU)
pwd: /home/user/benchmark/
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
```

You can get the same effect with the API:

```c++
  benchmark::AddCustomContext("foo", "bar");
```

Note that attempts to add a second value with the same key will fail with an
error message.

<a name="runtime-and-reporting-considerations" />

## Runtime and Reporting Considerations

When the benchmark binary is executed, each benchmark function is run serially.
The number of iterations to run is determined dynamically by running the
benchmark a few times and measuring the time taken and ensuring that the
ultimate result will be statistically stable. As such, faster benchmark
functions will be run for more iterations than slower benchmark functions, and
the number of iterations is thus reported.

In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the number of
iterations is at least one, not more than 1e9, until CPU time is greater than
the minimum time, or the wallclock time is 5x minimum time. The minimum time is
set per benchmark by calling `MinTime` on the registered benchmark object.

Furthermore, warming up a benchmark might be necessary in order to get
stable results because of, e.g., caching effects of the code under benchmark.
Warming up means running the benchmark for a given amount of time before
results are actually taken into account. The amount of time for which
the warmup should be run can be set per benchmark by calling
`MinWarmUpTime` on the registered benchmark object or for all benchmarks
using the `--benchmark_min_warmup_time` command-line option. Note that
`MinWarmUpTime` will overwrite the value of `--benchmark_min_warmup_time`
for the single benchmark. How many iterations the warmup run of each
benchmark takes is determined the same way as described in the paragraph
above. By default the warmup phase is set to 0 seconds and is therefore
disabled.

Average timings are then reported over the iterations run. If multiple
repetitions are requested using the `--benchmark_repetitions` command-line
option, or at registration time, the benchmark function will be run several
times and statistical results across these repetitions will also be reported.
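
The stopping rule for the iteration count described in this section can be
sketched as a small standalone predicate. This is only an illustration of the
documented conditions, not the library's implementation: the function
`ShouldStopIterating` and its parameter names are hypothetical.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the library): returns true once a run
// meets the documented stopping conditions for the iteration count.
bool ShouldStopIterating(std::int64_t iterations, double cpu_seconds,
                         double real_seconds, double min_time_seconds) {
  assert(min_time_seconds >= 0.0);
  constexpr std::int64_t kMaxIterations = 1000000000;  // "not more than 1e9"
  if (iterations < 1) return false;                    // "at least one"
  if (iterations >= kMaxIterations) return true;
  // Stop when CPU time exceeds the minimum time, or when the wallclock
  // time reaches five times the minimum time.
  return cpu_seconds > min_time_seconds ||
         real_seconds >= 5.0 * min_time_seconds;
}
```

With a minimum time of 0.5 seconds, for example, a run that has used 0.1 s of
CPU time keeps going until either its CPU time passes 0.5 s or 2.5 s of
wallclock time have elapsed.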

As well as the per-benchmark entries, a preamble in the report will include
information about the machine on which the benchmarks are run.

<a name="setup-teardown" />

## Setup/Teardown

Global setup/teardown specific to each benchmark can be done by
passing a callback to Setup/Teardown.

The setup/teardown callbacks will be invoked once for each benchmark. If the
benchmark is multi-threaded (will run in k threads), they will be invoked
exactly once before each run with k threads.

If the benchmark uses different size groups of threads, the above will be true
for each size group.

For example:

```c++
static void DoSetup(const benchmark::State& state) {
}

static void DoTeardown(const benchmark::State& state) {
}

static void BM_func(benchmark::State& state) {...}

BENCHMARK(BM_func)->Arg(1)->Arg(3)->Threads(16)->Threads(32)->Setup(DoSetup)->Teardown(DoTeardown);

```

In this example, `DoSetup` and `DoTeardown` will be invoked 4 times each,
specifically, once for each member of this family:
 - BM_func_Arg_1_Threads_16, BM_func_Arg_1_Threads_32
 - BM_func_Arg_3_Threads_16, BM_func_Arg_3_Threads_32

<a name="passing-arguments" />

## Passing Arguments

Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run.
For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(4<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```

Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

The preceding code shows a method of defining a sparse range. The following
example shows a method of defining a dense range. It is then used to benchmark
the performance of `std::vector` initialization for uniformly increasing sizes.

```c++
static void BM_DenseRange(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v(state.range(0), state.range(0));
    auto data = v.data();
    benchmark::DoNotOptimize(data);
    benchmark::ClobberMemory();
  }
}
BENCHMARK(BM_DenseRange)->DenseRange(0, 1024, 128);
```

Now arguments generated are [ 0, 128, 256, 384, 512, 640, 768, 896, 1024 ].
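
The argument lists quoted above can be reproduced with a short standalone
sketch. The helpers `SparseRangeSketch` and `DenseRangeSketch` are hypothetical
names used only for illustration; they mirror the listed results under the
assumption that a sparse range consists of the lower bound, every power of the
multiplier strictly between the bounds, and the upper bound. They are not the
library's implementation.

```c++
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: lo, then each power of `mult` strictly between
// lo and hi, then hi.
std::vector<std::int64_t> SparseRangeSketch(std::int64_t lo, std::int64_t hi,
                                            int mult) {
  assert(mult >= 2 && lo >= 0 && lo <= hi);
  const std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
  std::vector<std::int64_t> out{lo};
  for (std::int64_t p = 1; p < hi; p *= mult) {
    if (p > lo) out.push_back(p);
    if (p > kMax / mult) break;  // the next multiply would overflow
  }
  if (hi != lo) out.push_back(hi);
  return out;
}

// Hypothetical helper: start, start + step, ... up to and including limit.
std::vector<std::int64_t> DenseRangeSketch(std::int64_t start,
                                           std::int64_t limit, int step) {
  assert(step >= 1 && start <= limit);
  std::vector<std::int64_t> out;
  for (std::int64_t v = start; v <= limit; v += step) out.push_back(v);
  return out;
}
```

Under these assumptions, `SparseRangeSketch(8, 8<<10, 8)` yields the five
values listed for `Range(8, 8<<10)`, and `DenseRangeSketch(0, 1024, 128)`
yields the nine values listed for `DenseRange(0, 1024, 128)`.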

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

Some benchmarks may require specific argument values that cannot be expressed
with `Ranges`. In this case, `ArgsProduct` offers the ability to generate a
benchmark input for each combination in the product of the supplied vectors.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({{1<<10, 3<<10, 8<<10}, {20, 40, 60, 80}});
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 20})
    ->Args({3<<10, 20})
    ->Args({8<<10, 20})
    ->Args({1<<10, 40})
    ->Args({3<<10, 40})
    ->Args({8<<10, 40})
    ->Args({1<<10, 60})
    ->Args({3<<10, 60})
    ->Args({8<<10, 60})
    ->Args({1<<10, 80})
    ->Args({3<<10, 80})
    ->Args({8<<10, 80});
```
<!-- {% endraw %} -->

For the most common scenarios, helper methods for creating a list of
integers for a given sparse or dense range are provided.

```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      benchmark::CreateRange(8, 128, /*multi=*/2),
      benchmark::CreateDenseRange(1, 4, /*step=*/1)
    });
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      {8, 16, 32, 64, 128},
      {1, 2, 3, 4}
    });
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Passing Arbitrary Arguments to a Benchmark

In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...Args>
void BM_takes_args(benchmark::State& state, Args&&... args) {
  auto args_tuple = std::make_tuple(std::move(args)...);
  for (auto _ : state) {
    std::cout << std::get<0>(args_tuple) << ": " << std::get<1>(args_tuple)
              << '\n';
    [...]
  }
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));

// Registers the same benchmark "BM_takes_args/int_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_test, 42, 43);
```

Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

<a name="asymptotic-complexity" />

## Calculating Asymptotic Complexity (Big O)

Asymptotic complexity might be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    auto comparison_result = s1.compare(s2);
    benchmark::DoNotOptimize(comparison_result);
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity might also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code will specify asymptotic complexity with a lambda function,
that might be used to customize high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](benchmark::IterationCount n)->double{return n; });
```

<a name="custom-benchmark-name" />

## Custom Benchmark Name

You can change the benchmark's name as follows:

```c++
BENCHMARK(BM_memcpy)->Name("memcpy")->RangeMultiplier(2)->Range(8, 8<<10);
```

The invocation will execute the benchmark as before using `BM_memcpy` but changes
the prefix in the report to `memcpy`.

<a name="templated-benchmarks" />

## Templated Benchmarks

This example produces and consumes messages of size `sizeof(v)` `range_x`
times. It also outputs throughput in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
// C++03
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);

// C++11 or newer, you can use the BENCHMARK macro with template parameters:
BENCHMARK(BM_Sequential<WaitQueue<int>>)->Range(1<<0, 1<<10);

```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK(func<...>) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

<a name="templated-benchmarks-with-arguments" />

## Templated Benchmarks that take arguments

Sometimes there is a need to template benchmarks, and provide arguments to them.

```c++
template <class Q> void BM_Sequential_With_Step(benchmark::State& state, int step) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i-=step; )
      q.push(v);
    for (int e = state.range(0); e-=step; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}

// Register the function defined above, capturing `step = 1`.
BENCHMARK_TEMPLATE1_CAPTURE(BM_Sequential_With_Step, WaitQueue<int>, Step1, 1)->Range(1<<0, 1<<10);
```

<a name="fixtures" />

## Fixtures

Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {
public:
  void SetUp(::benchmark::State& state) {
  }

  void TearDown(::benchmark::State& state) {
  }
};

// Defines and registers `FooTest` using the class `MyFixture`.
BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}

// Only defines `BarTest` using the class `MyFixture`.
BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}
// `BarTest` is NOT registered.
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
// `BarTest` is now registered.
```

### Templated Fixtures

You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:

```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

// Defines and registers `IntTest` using the class template `MyFixture<int>`.
BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}

// Only defines `DoubleTest` using the class template `MyFixture<double>`.
BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}
// `DoubleTest` is NOT registered.
BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
// `DoubleTest` is now registered.
```

<a name="custom-counters" />

## Custom Counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.

The `Counter` constructor accepts three parameters: the value as a `double`;
a bit flag which allows you to show counters as rates, and/or as per-thread
iteration, and/or as per-thread averages, and/or iteration invariants,
and/or finally inverting the result; and a flag specifying the 'unit', i.e.
whether 1k is 1000 (default, `benchmark::Counter::OneK::kIs1000`) or 1024
(`benchmark::Counter::OneK::kIs1024`).

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  // Meaning: per one second, how many 'foo's are processed?
  state.counters["FooRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark, and the result inverted.
  // Meaning: how many seconds does it take to process one 'foo'?
  state.counters["FooInvRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate | benchmark::Counter::kInvert);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] = benchmark::Counter(state.range(0), benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024);
```

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

<!-- {% raw %} -->
```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```
<!-- {% endraw %} -->

### Counter Reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark.
Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks.
Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```

Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.
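
To make the counter semantics from the Custom Counters section concrete, the
following standalone sketch models how a reported value could be derived from
per-thread values: sum across threads, then optionally divide by the elapsed
time (rate), divide by the thread count (thread average), and invert. The
`SketchFlags` enum and `ReportedCounterValue` function are hypothetical and
exist only for this illustration; they are not `benchmark::Counter` or its
flags, and the order of operations is an assumption.

```c++
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical flag bits for this sketch only.
enum SketchFlags : unsigned {
  kNone = 0,
  kRate = 1u << 0,          // divide by the benchmark duration
  kAvgPerThread = 1u << 1,  // divide by the number of threads
  kInverted = 1u << 2       // report 1 / value
};

// Hypothetical helper: combines the value each thread stored in a counter
// into the single number shown in the report.
double ReportedCounterValue(const std::vector<double>& per_thread_values,
                            unsigned flags, double elapsed_seconds) {
  assert(!per_thread_values.empty() && elapsed_seconds > 0.0);
  // Counters from each thread are summed first.
  double value = std::accumulate(per_thread_values.begin(),
                                 per_thread_values.end(), 0.0);
  if (flags & kRate) value /= elapsed_seconds;
  if (flags & kAvgPerThread)
    value /= static_cast<double>(per_thread_values.size());
  if (flags & kInverted) value = 1.0 / value;
  return value;
}
```

For instance, four threads that each record 2 events give a plain counter of
8, a thread-averaged counter of 2, and, over a 2 second run, a rate of 4
events per second.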

<a name="multithreaded-benchmarks"/>

## Multithreaded Benchmarks

In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against the
thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index() == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index() == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

To run the benchmark across a range of thread counts, instead of `Threads`, use
`ThreadRange`. This takes two parameters (`min_threads` and `max_threads`) and
runs the benchmark once for values in the inclusive range. For example:

```c++
BENCHMARK(BM_MultiThreaded)->ThreadRange(1, 8);
```

will run `BM_MultiThreaded` with thread counts 1, 2, 4, and 8.

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

<a name="cpu-timers" />

## CPU Timers

By default, the CPU timer only measures the time spent by the main thread.
If the benchmark itself uses threads internally, this measurement may not
be what you are looking for. Instead, there is a way to measure the total
CPU usage of the process, by all the threads.

```c++
void callee(int i);

static void MyMain(int size) {
#pragma omp parallel for
  for(int i = 0; i < size; i++)
    callee(i);
}

static void BM_OpenMP(benchmark::State& state) {
  for (auto _ : state)
    MyMain(state.range(0));
}

// Measure the time spent by the main thread, use it to decide for how long to
// run the benchmark loop. Depending on internal implementation details, this
// may measure anywhere from near-zero (the overhead spent before/after work
// handoff to worker thread[s]) to the whole single-thread time.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10);

// Measure the user-visible time, the wall clock (literally, the time that
// has passed on the clock on the wall), use it to decide for how long to
// run the benchmark loop. This will always be meaningful, and will match the
// time spent by the main thread in single-threaded case, in general decreasing
// with the number of internal threads doing the work.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->UseRealTime();

// Measure the total CPU consumption, use it to decide for how long to
// run the benchmark loop. This will always measure to no less than the
// time spent by the main thread in single-threaded case.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime();

// A mixture of the last two. Measure the total CPU consumption, but use the
// wall clock to decide for how long to run the benchmark loop.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime()->UseRealTime();
```

### Controlling Timers

Normally, the entire duration of the work loop (`for (auto _ : state) {}`)
is measured. But sometimes, it is necessary to do some work inside of
that loop, every iteration, without counting that time toward the benchmark time.
That is possible, although it is not recommended, since it has high overhead.

<!-- {% raw %} -->
```c++
static void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured
    state.ResumeTiming(); // And resume timers. They are now counting again.
    // The rest will be measured.
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

<a name="manual-timing" />

## Manual Timing

For benchmarking something for which neither CPU time nor real-time are
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, they can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

<a name="setting-the-time-unit" />

## Setting the Time Unit

If a benchmark runs for a few milliseconds it may be hard to visually compare
the measured times, since the output data is given in nanoseconds by default.
To change the time unit, specify it explicitly:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

Additionally the default time unit can be set globally with the
`--benchmark_time_unit={ns|us|ms|s}` command line argument. The argument only
affects benchmarks where the time unit is not set explicitly.

<a name="preventing-optimization" />

## Preventing Optimization

To prevent a value or expression from being optimized away by the compiler
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
    int x = 0;
    for (int i=0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU based compilers it acts as a read/write barrier
for global memory.
More specifically it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  // Example 1: `<expr>` is removed entirely.
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  // Example 2: Result of '<expr>' is only reused.
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the below example
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    auto data = v.data();           // Allow v.data() to be clobbered. Pass as non-const
    benchmark::DoNotOptimize(data); // lvalue to avoid undesired compiler optimizations
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.

<a name="reporting-statistics" />

## Statistics: Reporting the Mean, Median and Standard Deviation / Coefficient of variation of Repeated Benchmarks

By default each benchmark is run once and that single result is reported.
However benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once the mean, median, standard deviation and coefficient of variation
of the runs will be reported.

Additionally the `--benchmark_report_aggregates_only={true|false}` and
`--benchmark_display_aggregates_only={true|false}` flags, or the
`ReportAggregatesOnly(bool)` and `DisplayAggregatesOnly(bool)` functions, can be
used to change how repeated tests are reported. By default the result of each
repeated run is reported. When the `report aggregates only` option is `true`,
only the aggregates (i.e. mean, median, standard deviation and coefficient
of variation, plus complexity measurements if they were requested) of the runs
are reported, to both reporters: standard output (console) and the file.
However, when only the `display aggregates only` option is `true`,
only the aggregates are displayed in the standard output, while the file
output still contains everything.
Calling `ReportAggregatesOnly(bool)` / `DisplayAggregatesOnly(bool)` on a
registered benchmark object overrides the value of the appropriate flag for that
benchmark.

<a name="custom-statistics" />

## Custom Statistics

While having these aggregates is nice, this may not be enough for everyone.
For example you may want to know what the largest observation is, e.g. because
you have some real-time constraints. This is easy. The following code will
specify a custom statistic to be calculated, defined by a lambda function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

While usually the statistics produce values in time units,
you can also produce percentages:

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("ratio", [](const std::vector<double>& v) -> double {
    // Ratio of the first observation to the last one.
    return v.front() / v.back();
  }, benchmark::StatisticUnit::kPercentage)
  ->Arg(512);
```

<a name="memory-usage" />

## Memory Usage

It's often useful to also track memory usage for benchmarks, alongside CPU
performance. For this reason, benchmark offers the `RegisterMemoryManager`
method that allows a custom `MemoryManager` to be injected.

If set, the `MemoryManager::Start` and `MemoryManager::Stop` methods will be
called at the start and end of benchmark runs to allow user code to fill out
a report on the number of allocations, bytes used, etc.

This data will then be reported alongside other performance data, currently
only when using JSON output.

<a name="profiling" />

## Profiling

It's often useful to also profile benchmarks in particular ways, in addition to
CPU performance. For this reason, benchmark offers the `RegisterProfilerManager`
method that allows a custom `ProfilerManager` to be injected.

If set, the `ProfilerManager::AfterSetupStart` and
`ProfilerManager::BeforeTeardownStop` methods will be called at the start and
end of a separate benchmark run to allow user code to collect and report
user-provided profile metrics.

Output collected from this profiling run must be reported separately.

<a name="using-register-benchmark" />

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at the global
scope, `RegisterBenchmark` can be called anywhere. This allows for
benchmark tests to be registered programmatically.

Additionally `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:

```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
    benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
  benchmark::Shutdown();
}
```

<a name="exiting-with-an-error" />

## Exiting with an Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const std::string& msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of the
`KeepRunning()` loop are skipped.
For the ranged-for version of the benchmark loop,
users must explicitly exit the loop; otherwise all iterations will be performed.
Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop. Moreover, if `SkipWithError(...)`
has been used, it is not required to reach the benchmark loop and one may return
from the benchmark function early.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State & state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    return; // Early return is allowed when SkipWithError() has been used.
  }
  for (auto _ : state) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // REQUIRED to prevent all further iterations.
    }
    do_stuff(data);
  }
}
```

<a name="a-faster-keep-running-loop" />

## A Faster KeepRunning Loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks.
For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The ranged-for loop is faster than `KeepRunning` because `KeepRunning`
requires a memory load and store of the iteration count every iteration,
whereas the ranged-for variant is able to keep the iteration count
in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the ranged-for variant of writing
the benchmark loop should be preferred.

<a name="disabling-cpu-frequency-scaling" />

## Disabling CPU Frequency Scaling

If you see this warning:

```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may
be noisy and will incur extra overhead.
```

you might want to disable the CPU frequency scaling while running the
benchmark, as well as consider other ways to stabilize the performance of
your system while benchmarking.

See [Reducing Variance](reducing_variance.md) for more information.