# Getting started with libfuzzer in Chromium

Our current best advice on how to start fuzzing is to use FuzzTest, which
has its own [getting started guide here]. If you're reading this page, it's
probably because you've run into limitations of FuzzTest and want to create
a libFuzzer fuzzer instead. This is a slightly older approach to fuzzing
Chrome, but it still works well - read on.

This document walks you through the basic steps to start fuzzing and offers
suggestions for improving your fuzz targets. If you're looking for more advanced
fuzzing topics, see the [main page](README.md).

[TOC]

## Getting started

### Simple Example

Before writing any code, let us look at a simple
example of a test that uses input fuzzing. The test is set up to exercise the
[`CreateFnmatchQuery`](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/ash/extensions/file_manager/search_by_pattern.h;drc=4bc4bcef0ab5581a5a27cea986296739582243a6)
function. The role of this function is to take a user query and produce
a case-insensitive pattern that matches file names containing the
query. For example, for the query "1abc" the function generates
"\*1[aA][bB][cC]\*". Unlike a traditional test, an input fuzzing test does not
care about the output of the tested function. Instead it verifies that no
matter what string the user enters, `CreateFnmatchQuery` does not do something
unexpected, such as crashing or overwriting a memory region. The test
[create_fnmatch_query_fuzzer.cc](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/ash/extensions/file_manager/create_fnmatch_query_fuzzer.cc;drc=1f5a5af3eb1bbdf9e4566c3e6d2051e68de112eb)
is shown below:

```cpp
#include <stddef.h>
#include <stdint.h>

#include <string>

#include "chrome/browser/ash/extensions/file_manager/search_by_pattern.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string str = std::string(reinterpret_cast<const char*>(data), size);
  extensions::CreateFnmatchQuery(str);
  return 0;
}
```

The code starts by including `stddef.h` for the `size_t` definition, `stdint.h`
for the `uint8_t` definition, `string` for the `std::string` definition and
finally the file where the `extensions::CreateFnmatchQuery` function is
declared. Next it defines the `LLVMFuzzerTestOneInput` function, which is
the function called by the testing framework. The function is supplied with two
arguments: a pointer to an array of bytes, and the size of the array. These
bytes are generated by the fuzzing test harness and their specific values
are irrelevant. The job of the test is to convert those bytes to input
parameters of the tested function. In our case the bytes are converted
to a `std::string` and given to the `CreateFnmatchQuery` function. If
the function completes its job and the code successfully returns, the
`LLVMFuzzerTestOneInput` function returns 0, signaling a successful execution.

The above pattern is typical of fuzzing tests. You create a
`LLVMFuzzerTestOneInput` function. You then write code that uses the provided
random bytes to form input parameters to the function you intend to test. Next,
you call the function, and if it successfully completes, return 0.
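
To try this pattern outside the Chromium tree, a self-contained sketch may
help. `MakeCaseInsensitivePattern` below is a hypothetical stand-in written for
this illustration. It is not Chromium's `CreateFnmatchQuery`; it only mimics the
"1abc" to "\*1[aA][bB][cC]\*" behavior described above, and assumes the default
"C" locale.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <cctype>
#include <string>

// Hypothetical function under test (an assumption for this sketch, NOT the
// Chromium implementation): wraps the query in '*' and expands each ASCII
// letter into a two-character bracket expression.
std::string MakeCaseInsensitivePattern(const std::string& query) {
  std::string pattern = "*";
  for (unsigned char c : query) {
    if (std::isalpha(c)) {
      pattern += '[';
      pattern += static_cast<char>(std::tolower(c));
      pattern += static_cast<char>(std::toupper(c));
      pattern += ']';
    } else {
      pattern += static_cast<char>(c);
    }
  }
  pattern += '*';
  return pattern;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Convert the raw bytes into the parameter type of the tested function.
  const std::string query(reinterpret_cast<const char*>(data), size);
  // The result is ignored: the fuzzer only watches for crashes and
  // sanitizer reports.
  MakeCaseInsensitivePattern(query);
  return 0;
}
```

Outside the Chromium build, a file like this can typically be built with
Clang's `-fsanitize=fuzzer,address` flags.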

To run this test we need to create a `fuzzer_test` target in the appropriate
`BUILD.gn` file. For the above example, the target is defined as

```python
fuzzer_test("create_fnmatch_query_fuzzer") {
  sources = [ "extensions/file_manager/create_fnmatch_query_fuzzer.cc" ]
  deps = [
    ":ash",
    "//base",
    "//chrome/browser",
    "//components/exo/wayland:ui_controls_protocol",
  ]
}
```
The `sources` field typically specifies just the file that contains the test.
The dependencies are specific to the tested function; they are listed here for
completeness. In your test, dependencies other than `//base` are unlikely to be
required.

### Creating your first fuzz target

Having seen a concrete example, let us describe the generic flow of steps to
create a new fuzzing test.

1. In the same directory as the code you are going to fuzz (or next to the tests
   for that code), create a new `<my_fuzzer>.cc` file.

   *** note
   **Note:** Do not use the `testing/libfuzzer/fuzzers` directory. This
   directory was used for initial sample fuzz targets but is no longer
   recommended for landing new targets.
   ***

2. In the new file, define a `LLVMFuzzerTestOneInput` function:

   ```cpp
   #include <stddef.h>
   #include <stdint.h>

   extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
     // Put your fuzzing code here and use |data| and |size| as input.
     return 0;
   }
   ```

3. In the `BUILD.gn` file, define a `fuzzer_test` GN target:

   ```python
   import("//testing/libfuzzer/fuzzer_test.gni")
   fuzzer_test("my_fuzzer") {
     sources = [ "my_fuzzer.cc" ]
     deps = [ ... ]
   }
   ```

*** note
**Note:** Most fuzz targets are small. They may perform one or a few API calls
using the data provided by the fuzzing engine as an argument. However, fuzz
targets may be more complex if a certain initialization procedure needs to be
performed. [quic_session_pool_fuzzer.cc] is a good example of a complex fuzz
target.
***

Once you have created your first fuzz target, you must set up your build
environment in order to run it. This is described next.

### Setting up your build environment

Generate build files by using the `use_libfuzzer` [GN] argument together with a
sanitizer. Rather than generating a GN build configuration by hand, we recommend
that you run the meta-builder tool using the [GN config] that corresponds to the
operating system of the device under test (DUT) you're deploying to:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Upload Windows ASan" out\libfuzzer
```

If you are testing locally, these are the recommended configurations:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Local Windows ASan" out\libfuzzer
```

[`tools/mb/mb.py`](https://source.chromium.org/chromium/chromium/src/+/main:tools/mb/mb.py;drc=c771c017eca9a6a859d245be54c511acafdc9867)
is "a wrapper script for GN that [..] generate[s] build files for sets of
canned configurations." The `-m` flag selects the builder group, while the
`-b` flag selects a specific builder in the builder group. `out/libfuzzer`
is the directory to which the GN configuration is written. If you wish, you can
inspect the generated config by running `gn args out/libfuzzer` once the
`mb.py` script is done.

*** note
**Note:** The above invocations may set `use_remoteexec` or `use_rbe` to true.
However, these args aren't yet supported on local workstations. So if you run
into reclient errors when building locally, remove both those args and set
`use_goma` instead.

You can also invoke [AFL] by using the `use_afl` GN argument, but we
recommend libFuzzer for local development. Running libFuzzer locally doesn't
require any special configuration and gives quick, meaningful output for speed,
coverage, and other parameters.
***

It’s possible to run fuzz targets without sanitizers, but not recommended, as
sanitizers help to detect errors which may not result in a crash otherwise.
`use_libfuzzer` is supported in the following sanitizer configurations.

| GN Argument | Description | Supported OS |
|-------------|-------------|--------------|
| `is_asan=true` | Enables [AddressSanitizer] to catch problems like buffer overruns. | Linux, Windows, Mac, Chrome OS |
| `is_msan=true` | Enables [MemorySanitizer] to catch problems like uninitialized reads<sup>\[[\*](reference.md#MSan)\]</sup>. | Linux |
| `is_ubsan_security=true` | Enables [UndefinedBehaviorSanitizer] to catch<sup>\[[\*](reference.md#UBSan)\]</sup> undefined behavior like integer overflow.| Linux |

For more on builder and sanitizer configurations, see the [Integration
Reference] page.

*** note
**Hint**: Fuzz targets are built with minimal symbols by default. You can adjust
the symbol level by setting the `symbol_level` attribute.
***

### Running the fuzz target

After you create your fuzz target, build it with autoninja and run it locally.
To make this example concrete, we are going to use the existing
`create_fnmatch_query_fuzzer` target.

```bash
# Build the fuzz target.
autoninja -C out/libfuzzer chrome/browser/ash:create_fnmatch_query_fuzzer
# Run the fuzz target.
./out/libfuzzer/create_fnmatch_query_fuzzer
```

Your fuzz target should produce output like this:

```
INFO: Seed: 1511722356
INFO: Loaded 2 modules   (115485 guards): 22572 [0x7fe8acddf560, 0x7fe8acdf5610), 92913 [0xaa05d0, 0xafb194),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2  INITED cov: 961 ft: 48 corp: 1/1b exec/s: 0 rss: 48Mb
#3  NEW    cov: 986 ft: 70 corp: 2/104b exec/s: 0 rss: 48Mb L: 103/103 MS: 1 InsertRepeatedBytes-
#4  NEW    cov: 989 ft: 74 corp: 3/106b exec/s: 0 rss: 48Mb L: 2/103 MS: 1 InsertByte-
#6  NEW    cov: 991 ft: 76 corp: 4/184b exec/s: 0 rss: 48Mb L: 78/103 MS: 2 CopyPart-InsertRepeatedBytes-
```

A `... NEW ...` line appears when libFuzzer finds new and interesting inputs. If
your fuzz target is efficient, it will find a lot of them quickly. A
`... pulse ...` line appears periodically to show the current status.

For more information about the output, see [libFuzzer's output documentation].

*** note
**Note:** If you observe an `odr-violation` error in the log, please try setting
the following environment variable: `ASAN_OPTIONS=detect_odr_violation=0` and
running the fuzz target again.
***

#### Symbolizing a stacktrace

If your fuzz target crashes when running locally and you see a non-symbolized
stack trace, make sure you add the `third_party/llvm-build/Release+Asserts/bin/`
directory from Chromium’s Clang package to `$PATH`. This directory contains the
`llvm-symbolizer` binary.

Alternatively, you can set an `external_symbolizer_path` via the `ASAN_OPTIONS`
environment variable:

```bash
ASAN_OPTIONS=external_symbolizer_path=/my/local/llvm/build/llvm-symbolizer \
  ./fuzzer ./crash-input
```

The same approach works with other sanitizers via `MSAN_OPTIONS`,
`UBSAN_OPTIONS`, etc.

### Submitting your fuzz target

ClusterFuzz and the build infrastructure automatically discover, build and
execute all `fuzzer_test` targets in the Chromium repository. Once you land your
fuzz target, ClusterFuzz will run it at scale. Check the [ClusterFuzz status]
page after a day or two.

If you want to better understand and optimize your fuzz target’s performance,
see the [Efficient Fuzzing Guide].

*** note
**Note:** It’s important to run fuzzers at scale, not just in your own
environment, because local fuzzing will catch fewer issues. If you run fuzz
targets at scale continuously, you’ll catch regressions and improve code
coverage over time.
***

## Optional improvements

### Common tricks

Your fuzz target may immediately discover interesting (i.e. crashing) inputs.
You can make it more effective with several easy steps:

* **Create a seed corpus**. You can guide the fuzzing engine to generate more
  relevant inputs by adding the `seed_corpus = "src/fuzz-testcases/"` attribute
  to your fuzz target and adding example files to the appropriate directory. For
  more, see the [Seed Corpus] section of the [Efficient Fuzzing Guide].

  *** note
  **Note:** make sure your corpus files are appropriately licensed.
  ***

* **Create a mutation dictionary**. You can make mutations more effective by
  providing the fuzzer with a `dict = "protocol.dict"` GN attribute and a
  dictionary file that contains interesting strings / byte sequences for the
  target API. For more, see the [Fuzzer Dictionary] section of the [Efficient
  Fuzzing Guide].

* **Specify testcase length limits**. Long inputs can be problematic, because
  they take longer for the fuzz target to process and increase the search
  space. By default, libFuzzer uses `-max_len=4096` or takes the longest
  testcase in the corpus if `-max_len` is not specified.

  ClusterFuzz uses different strategies for different fuzzing sessions,
  including different random values. Also, ClusterFuzz uses different fuzzing
  engines (e.g. AFL, which doesn't have a `-max_len` option). If your target has
  an input length limit that you would like to *strictly enforce*, add a sanity
  check to the beginning of your `LLVMFuzzerTestOneInput` function:

  ```cpp
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;
  ```

* **Generate a [code coverage report]**. See which code the fuzzer covered in
  recent runs, so you can gauge whether it hits the important code parts or not.

  **Note:** Since the code coverage of a fuzz target depends heavily on the
  corpus provided when running the target, we recommend running the fuzz target
  built with ASan locally for a little while (several minutes / hours) first.
  This will produce some corpus, which should be used for generating a code
  coverage report.
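
The length check from the list above can be sketched as a complete fuzz target.
The limit values, `ProcessMessage` and the `g_accepted_inputs` counter are
assumptions made up for this illustration; substitute the real limits and API
of the code you fuzz.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <string>

// Hypothetical limits, chosen only for illustration.
constexpr size_t kMinInputLength = 4;
constexpr size_t kMaxInputLength = 1024;

// Counts inputs that passed the length check. It exists only to make the
// effect of the check observable in this sketch.
size_t g_accepted_inputs = 0;

// Hypothetical stand-in for the API under test.
void ProcessMessage(const std::string& message) {
  (void)message;
  ++g_accepted_inputs;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Enforce the limits inside the target, so they hold no matter which
  // fuzzing engine or command-line options produced the input.
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;

  ProcessMessage(std::string(reinterpret_cast<const char*>(data), size));
  return 0;
}
```

Because the check lives in the target itself, it also applies to engines that
have no equivalent of `-max_len`.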

#### Disabling noisy error message logging

If the code you’re fuzzing generates a lot of error messages when encountering
incorrect or invalid data, the fuzz target will be slow and inefficient.

If the target uses Chromium logging APIs, you can silence errors by overriding
the environment used for logging in your fuzz target:

```cpp
#include <stddef.h>
#include <stdint.h>

#include "base/logging.h"

struct Environment {
  Environment() {
    // Only log fatal errors, so invalid inputs don't flood the output.
    logging::SetMinLogLevel(logging::LOGGING_FATAL);
  }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Constructed once, on the first call.
  static Environment env;

  // Put your fuzzing code here and use data+size as input.
  return 0;
}
```
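
The `static Environment env;` idiom relies on a function-local static being
constructed only on the first call. The sketch below demonstrates that property
without any Chromium dependency; the `g_environment_constructions` counter is
hypothetical instrumentation added only for this illustration.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical counter, present only to make the one-time setup observable.
int g_environment_constructions = 0;

struct Environment {
  Environment() {
    // One-time setup would go here, e.g. lowering the log verbosity.
    ++g_environment_constructions;
  }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Constructed during the first call only; later calls reuse the object.
  static Environment env;
  (void)env;
  (void)data;
  (void)size;
  return 0;
}
```

However many inputs the fuzzing engine feeds to the target, the setup cost is
paid once per process.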

### Mutating Multiple Inputs

By default, a fuzzing engine such as libFuzzer mutates a single input (`uint8_t*
data, size_t size`). However, APIs often accept multiple arguments of various
types, rather than a single buffer. You can use three different methods to
mutate multiple inputs at once.

#### libprotobuf-mutator (LPM)

If you need to mutate multiple inputs of various types and lengths, see [Getting
Started with libprotobuf-mutator in Chromium].

*** note
**Note:** This method works with APIs and data structures of any complexity, but
requires extra effort. You would need to write a `.proto` definition (unless you
fuzz an existing protobuf) and C++ code to pass the proto message to the API you
are fuzzing (you'll have a fuzzed protobuf message instead of a `data, size`
buffer).
***

#### FuzzedDataProvider (FDP)

[FuzzedDataProvider] is a class useful for splitting a fuzz input into multiple
parts of various types.

*** note
**Note:** FDP is much easier to use than LPM, but its downside is that the
format of the corpus becomes inconsistent. This doesn't matter if you don't have
a [Seed Corpus] (e.g. valid image files if you fuzz an image parser). FDP splits
your corpus files into several pieces to fuzz a broader range of input types, so
it can take longer to reach deeper code paths that surface more quickly if you
fuzz only a single input type.
***

To use FDP, add `#include <fuzzer/FuzzedDataProvider.h>` to your fuzz target
source file.

To learn more about `FuzzedDataProvider`, check out the [upstream documentation]
on it. It gives an overview of the available methods and links to a few example
fuzz targets.
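
To make the idea of splitting one buffer into several arguments concrete, here
is a toy, hand-rolled splitter. It is not `FuzzedDataProvider` and does not
reproduce its byte layout; `Split`, `SplitInput` and `ApiToBeFuzzed` are
hypothetical names invented for this sketch. In a real target you would include
the header and let FDP do this work through methods such as
`ConsumeIntegral<T>()` and `ConsumeRemainingBytesAsString()`.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#include <string>

// Hypothetical API under test: a flags word plus a text payload.
int ApiToBeFuzzed(uint32_t flags, const std::string& text) {
  return static_cast<int>(flags & 1u) + static_cast<int>(text.size());
}

struct SplitInput {
  uint32_t flags = 0;
  std::string text;
};

// Toy splitter: up to the first four bytes fill |flags|, and whatever
// remains becomes |text|.
SplitInput Split(const uint8_t* data, size_t size) {
  SplitInput out;
  const size_t header = size < sizeof(out.flags) ? size : sizeof(out.flags);
  if (header > 0)
    memcpy(&out.flags, data, header);
  if (size > header) {
    out.text.assign(reinterpret_cast<const char*>(data + header),
                    size - header);
  }
  return out;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  const SplitInput input = Split(data, size);
  ApiToBeFuzzed(input.flags, input.text);
  return 0;
}
```

The sketch also shows the corpus-format issue from the note above: a valid
sample file used as a seed would have its leading bytes consumed as `flags`
rather than reaching the API intact.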

#### Hash-based argument

If your API accepts a buffer with data and some integer value (e.g., a bitwise
combination of flags), you can calculate a hash value from (`data, size`) and
use it to fuzz an additional integer argument. For example:

```cpp
#include <stddef.h>
#include <stdint.h>

#include <functional>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string str = std::string(reinterpret_cast<const char*>(data), size);
  // Derive the extra integer argument from a hash of the input bytes.
  std::size_t data_hash = std::hash<std::string>()(str);
  APIToBeFuzzed(data, size, data_hash);
  return 0;
}
```

*** note
**Note:** The hash method doesn't have the corpus format issue mentioned in the
FDP section above, but it can lead to results that aren't as sophisticated as
LPM or FDP. The hash value derived from the data is a random value, rather than
a meaningful one controlled by the fuzzing engine. A single bit mutation might
lead to new code coverage, but the next mutation would generate a new hash
value and trigger another code path, without providing any real guidance to the
fuzzing engine.
***
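
When the extra argument is a set of flags, one possible refinement is to mask
the hash down to the bits the API actually defines. The sketch below assumes
three made-up flag bits and a hypothetical `APIToBeFuzzed` that only records the
flags it receives; none of these names come from a real API.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <functional>
#include <string>

// Hypothetical flag bits, invented for this sketch.
constexpr uint32_t kFlagStrict = 1u << 0;
constexpr uint32_t kFlagVerbose = 1u << 1;
constexpr uint32_t kFlagLegacy = 1u << 2;
constexpr uint32_t kAllFlags = kFlagStrict | kFlagVerbose | kFlagLegacy;

// Derives a flag set from the input bytes. The same bytes always yield the
// same flags within a run, which keeps crashes reproducible.
uint32_t FlagsFromInput(const uint8_t* data, size_t size) {
  const std::string str(reinterpret_cast<const char*>(data), size);
  const size_t hash = std::hash<std::string>()(str);
  return static_cast<uint32_t>(hash) & kAllFlags;
}

// Hypothetical stand-in for the API under test; it only records its flags.
uint32_t g_last_flags = 0;
void APIToBeFuzzed(const uint8_t* data, size_t size, uint32_t flags) {
  (void)data;
  (void)size;
  g_last_flags = flags;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  APIToBeFuzzed(data, size, FlagsFromInput(data, size));
  return 0;
}
```

Masking keeps every generated value inside the valid flag space, but it does not
change the limitation described in the note above: the engine still cannot steer
the flags directly.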

[AFL]: AFL_integration.md
[AddressSanitizer]: http://clang.llvm.org/docs/AddressSanitizer.html
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Efficient Fuzzing Guide]: efficient_fuzzing.md
[FuzzedDataProvider]: https://cs.chromium.org/chromium/src/third_party/re2/src/re2/fuzzing/compiler-rt/include/fuzzer/FuzzedDataProvider.h
[Fuzzer Dictionary]: efficient_fuzzing.md#Fuzzer-dictionary
[GN]: https://gn.googlesource.com/gn/+/master/README.md
[GN config]: https://cs.chromium.org/chromium/src/tools/mb/mb_config_expectations/chromium.fuzz.json
[Getting Started with libprotobuf-mutator in Chromium]: libprotobuf-mutator.md
[Integration Reference]: reference.md
[MemorySanitizer]: http://clang.llvm.org/docs/MemorySanitizer.html
[Seed Corpus]: efficient_fuzzing.md#Seed-corpus
[UndefinedBehaviorSanitizer]: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
[code coverage report]: efficient_fuzzing.md#Code-coverage
[upstream documentation]: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider
[libFuzzer's output documentation]: http://llvm.org/docs/LibFuzzer.html#output
[quic_session_pool_fuzzer.cc]: https://cs.chromium.org/chromium/src/net/quic/quic_session_pool_fuzzer.cc
[getting started guide here]: getting_started.md