# Getting started with libFuzzer in Chromium

Our current best advice on how to start fuzzing is to use FuzzTest, which
has its own [getting started guide here]. If you're reading this page, it's
probably because you've run into limitations of FuzzTest and want to create
a libFuzzer fuzz target instead. This is a slightly older approach to fuzzing
Chrome, but it still works well, so read on.

This document walks you through the basic steps to start fuzzing and offers
suggestions for improving your fuzz targets. If you're looking for more advanced
fuzzing topics, see the [main page](README.md).

[TOC]

## Getting started

### Simple Example

Before writing any code, let us look at a simple
example of a test that uses input fuzzing. The test is set up to exercise the
[`CreateFnmatchQuery`](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/ash/extensions/file_manager/search_by_pattern.h;drc=4bc4bcef0ab5581a5a27cea986296739582243a6)
function. This function takes a user query and produces
a case-insensitive pattern that matches file names containing the
query. For example, for the query "1abc" the function generates
"\*1[aA][bB][cC]\*". Unlike a traditional test, an input fuzzing test does not
check the output of the tested function. Instead, it verifies that no
matter what string the user enters, `CreateFnmatchQuery` does not do something
unexpected, such as crashing or overwriting a memory region. The test
[create_fnmatch_query_fuzzer.cc](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/ash/extensions/file_manager/create_fnmatch_query_fuzzer.cc;drc=1f5a5af3eb1bbdf9e4566c3e6d2051e68de112eb)
is shown below:

```cpp
#include <stddef.h>
#include <stdint.h>

#include <string>

#include "chrome/browser/ash/extensions/file_manager/search_by_pattern.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string str = std::string(reinterpret_cast<const char*>(data), size);
  extensions::CreateFnmatchQuery(str);
  return 0;
}
```

The code starts by including `stddef.h` for the definition of `size_t`,
`stdint.h` for `uint8_t`, `string` for `std::string`, and finally
the header for the `extensions::CreateFnmatchQuery` function. Next,
it defines the `LLVMFuzzerTestOneInput` function, which is
the function called by the testing framework. The function receives two
arguments: a pointer to an array of bytes and the size of the array. These
bytes are generated by the fuzzing test harness, and their specific values
are irrelevant. The job of the test is to convert those bytes to input
parameters of the tested function. In our case, the bytes are converted
to a `std::string` and passed to the `CreateFnmatchQuery` function. If
the function completes its job and the code successfully returns, the
`LLVMFuzzerTestOneInput` function returns 0, signaling a successful execution.

The above pattern is typical of fuzzing tests. You create a
`LLVMFuzzerTestOneInput` function. You then write code that uses the provided
random bytes to form input parameters to the function you intend to test. Next,
you call the function, and if it successfully completes, return 0.
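
To experiment with this pattern outside the Chromium tree, a self-contained
sketch such as the following may help. `MakeCaseInsensitivePattern` is a
hypothetical stand-in, not Chromium's `CreateFnmatchQuery`; it only mimics the
ASCII behavior described above so that the fuzz target has something to call.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the function under test. It wraps the query in
// '*' wildcards and expands each ASCII letter into a "[xX]" character class.
std::string MakeCaseInsensitivePattern(const std::string& query) {
  std::string pattern = "*";
  for (char c : query) {
    // Cast to unsigned char before calling the <cctype> functions, which
    // have undefined behavior for negative values.
    const unsigned char uc = static_cast<unsigned char>(c);
    if (std::isalpha(uc)) {
      pattern += '[';
      pattern += static_cast<char>(std::tolower(uc));
      pattern += static_cast<char>(std::toupper(uc));
      pattern += ']';
    } else {
      pattern += c;
    }
  }
  pattern += '*';
  return pattern;
}

// The fuzz target: turn the raw bytes into the argument type, call the
// function, and return 0 if it completes.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  const std::string query(reinterpret_cast<const char*>(data), size);
  MakeCaseInsensitivePattern(query);
  return 0;
}
```

With a recent Clang, a file like this can typically be built as a standalone
fuzzer with `-fsanitize=fuzzer,address`. Inside Chromium, use a `fuzzer_test`
target instead.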

To run this test, we need to create a `fuzzer_test` target in the appropriate
`BUILD.gn` file. For the above example, the target is defined as:

```python
fuzzer_test("create_fnmatch_query_fuzzer") {
  sources = [ "extensions/file_manager/create_fnmatch_query_fuzzer.cc" ]
  deps = [
    ":ash",
    "//base",
    "//chrome/browser",
    "//components/exo/wayland:ui_controls_protocol",
  ]
}
```
The `sources` field typically specifies just the file that contains the test.
The dependencies are specific to the tested function; they are listed here for
completeness. In your test, dependencies other than `//base` are unlikely to be
required.

### Creating your first fuzz target

Having seen a concrete example, let us describe the generic flow of steps to
create a new fuzzing test.

1. In the same directory as the code you are going to fuzz (or next to the tests
   for that code), create a new `<my_fuzzer>.cc` file.

   *** note
   **Note:** Do not use the `testing/libfuzzer/fuzzers` directory. This
   directory was used for initial sample fuzz targets but is no longer
   recommended for landing new targets.
   ***

2. In the new file, define a `LLVMFuzzerTestOneInput` function:

  ```cpp
  #include <stddef.h>
  #include <stdint.h>

  extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    // Put your fuzzing code here and use |data| and |size| as input.
    return 0;
  }
  ```

3. In the `BUILD.gn` file, define a `fuzzer_test` GN target:

  ```python
  import("//testing/libfuzzer/fuzzer_test.gni")
  fuzzer_test("my_fuzzer") {
    sources = [ "my_fuzzer.cc" ]
    deps = [ ... ]
  }
  ```

*** note
**Note:** Most of the targets are small. They may perform one or a few API calls
using the data provided by the fuzzing engine as an argument. However, fuzz
targets may be more complex if a certain initialization procedure needs to be
performed. [quic_session_pool_fuzzer.cc] is a good example of a complex fuzz
target.
***

Once you have created your first fuzz target, you must set up your build
environment in order to run it. This is described next.

### Setting up your build environment

Generate build files by using the `use_libfuzzer` [GN] argument together with a
sanitizer. Rather than generating a GN build configuration by hand, we recommend
that you run the meta-builder tool using the [GN config] that corresponds to the
operating system of the device under test (DUT) you're deploying to:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Upload Windows ASan" out\libfuzzer
```

If you are testing locally, these are the recommended configurations:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Local Windows ASan" out\libfuzzer
```

[`tools/mb/mb.py`](https://source.chromium.org/chromium/chromium/src/+/main:tools/mb/mb.py;drc=c771c017eca9a6a859d245be54c511acafdc9867)
is "a wrapper script for GN that [..] generate[s] build files for sets of
canned configurations." The `-m` flag selects the builder group, while the
`-b` flag selects a specific builder in the builder group. `out/libfuzzer`
is the directory to which the GN configuration is written. If you wish, you can
inspect the generated config by running `gn args out/libfuzzer` once the
`mb.py` script is done.

*** note
**Note:** The above invocations may set `use_remoteexec` or `use_rbe` to true.
However, these args aren't yet supported on local workstations. If you run
into reclient errors when building locally, remove both of those args and set
`use_goma` instead.

You can also invoke [AFL] by using the `use_afl` GN argument, but we
recommend libFuzzer for local development. Running libFuzzer locally doesn't
require any special configuration and gives quick, meaningful output for speed,
coverage, and other parameters.
***

It’s possible to run fuzz targets without sanitizers, but not recommended, as
sanitizers help to detect errors which may not result in a crash otherwise.
`use_libfuzzer` is supported in the following sanitizer configurations.

| GN Argument | Description | Supported OS |
|-------------|-------------|--------------|
| `is_asan=true` | Enables [AddressSanitizer] to catch problems like buffer overruns. | Linux, Windows, Mac, Chrome OS |
| `is_msan=true` | Enables [MemorySanitizer] to catch problems like uninitialized reads<sup>\[[\*](reference.md#MSan)\]</sup>. | Linux |
| `is_ubsan_security=true` | Enables [UndefinedBehaviorSanitizer] to catch undefined behavior like integer overflow<sup>\[[\*](reference.md#UBSan)\]</sup>. | Linux |

For more on builder and sanitizer configurations, see the [Integration
Reference] page.

*** note
**Hint**: Fuzz targets are built with minimal symbols by default. You can adjust
the symbol level by setting the `symbol_level` attribute.
***

### Running the fuzz target

After you create your fuzz target, build it with autoninja and run it locally.
To make this example concrete, we are going to use the existing
`create_fnmatch_query_fuzzer` target.

```bash
# Build the fuzz target.
autoninja -C out/libfuzzer chrome/browser/ash:create_fnmatch_query_fuzzer
# Run the fuzz target.
./out/libfuzzer/create_fnmatch_query_fuzzer
```

Your fuzz target should produce output like this:

```
INFO: Seed: 1511722356
INFO: Loaded 2 modules   (115485 guards): 22572 [0x7fe8acddf560, 0x7fe8acdf5610), 92913 [0xaa05d0, 0xafb194),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2  INITED cov: 961 ft: 48 corp: 1/1b exec/s: 0 rss: 48Mb
#3  NEW    cov: 986 ft: 70 corp: 2/104b exec/s: 0 rss: 48Mb L: 103/103 MS: 1 InsertRepeatedBytes-
#4  NEW    cov: 989 ft: 74 corp: 3/106b exec/s: 0 rss: 48Mb L: 2/103 MS: 1 InsertByte-
#6  NEW    cov: 991 ft: 76 corp: 4/184b exec/s: 0 rss: 48Mb L: 78/103 MS: 2 CopyPart-InsertRepeatedBytes-
```

A `... NEW ...` line appears when libFuzzer finds new and interesting inputs. If
your fuzz target is efficient, it will find a lot of them quickly. A `... pulse
...` line appears periodically to show the current status.

For more information about the output, see [libFuzzer's output documentation].

*** note
**Note:** If you observe an `odr-violation` error in the log, please try setting
the following environment variable: `ASAN_OPTIONS=detect_odr_violation=0` and
running the fuzz target again.
***

#### Symbolizing a stacktrace

If your fuzz target crashes when running locally and you see a non-symbolized
stack trace, make sure you add the `third_party/llvm-build/Release+Asserts/bin/`
directory from Chromium’s Clang package to `$PATH`. This directory contains the
`llvm-symbolizer` binary.

Alternatively, you can set an `external_symbolizer_path` via the `ASAN_OPTIONS`
environment variable:

```bash
ASAN_OPTIONS=external_symbolizer_path=/my/local/llvm/build/llvm-symbolizer \
  ./fuzzer ./crash-input
```

The same approach works with other sanitizers via `MSAN_OPTIONS`,
`UBSAN_OPTIONS`, etc.

### Submitting your fuzz target

ClusterFuzz and the build infrastructure automatically discover, build and
execute all `fuzzer_test` targets in the Chromium repository. Once you land your
fuzz target, ClusterFuzz will run it at scale. Check the [ClusterFuzz status]
page after a day or two.

If you want to better understand and optimize your fuzz target’s performance,
see the [Efficient Fuzzing Guide].

*** note
**Note:** It’s important to run fuzzers at scale, not just in your own
environment, because local fuzzing will catch fewer issues. If you run fuzz
targets at scale continuously, you’ll catch regressions and improve code
coverage over time.
***

## Optional improvements

### Common tricks

Your fuzz target may immediately discover interesting (i.e. crashing) inputs.
You can make it more effective with several easy steps:

* **Create a seed corpus**. You can guide the fuzzing engine to generate more
  relevant inputs by adding the `seed_corpus = "src/fuzz-testcases/"` attribute
  to your fuzz target and adding example files to the appropriate directory. For
  more, see the [Seed Corpus] section of the [Efficient Fuzzing Guide].

  *** note
  **Note:** make sure your corpus files are appropriately licensed.
  ***

* **Create a mutation dictionary**. You can make mutations more effective by
  providing the fuzzer with a `dict = "protocol.dict"` GN attribute and a
  dictionary file that contains interesting strings / byte sequences for the
  target API. For more, see the [Fuzzer Dictionary] section of the [Efficient
  Fuzzing Guide].

* **Specify testcase length limits**. Long inputs can be problematic, because
  the fuzz target processes them more slowly and they increase the search
  space. By default, libFuzzer uses `-max_len=4096` or takes the longest
  testcase in the corpus if `-max_len` is not specified.

  ClusterFuzz uses different strategies for different fuzzing sessions,
  including different random values. ClusterFuzz also uses different fuzzing
  engines (e.g. AFL, which doesn't have a `-max_len` option). If your target has
  an input length limit that you would like to *strictly enforce*, add a sanity
  check to the beginning of your `LLVMFuzzerTestOneInput` function:

  ```cpp
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;
  ```

* **Generate a [code coverage report]**. See which code the fuzzer covered in
  recent runs, so you can gauge whether it hits the important code parts or not.

  **Note:** Since the code coverage of a fuzz target depends heavily on the
  corpus provided when running the target, we recommend running the fuzz target
  built with ASan locally for a little while (several minutes / hours) first.
  This will produce some corpus, which should be used for generating a code
  coverage report.
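
The strict length check from the list above might look like this in a complete
target. This is only a sketch: the limits are arbitrary, and `HypotheticalParse`
with its call counter stands in for whatever API you are fuzzing.

```cpp
#include <cstddef>
#include <cstdint>

namespace {

// Illustrative limits; choose values that match what your API accepts.
constexpr size_t kMinInputLength = 4;
constexpr size_t kMaxInputLength = 1024;

// Counts how often the hypothetical API was reached, for demonstration only.
int g_parse_calls = 0;

// Hypothetical API under test.
void HypotheticalParse(const uint8_t* data, size_t size) {
  (void)data;
  (void)size;
  ++g_parse_calls;
}

}  // namespace

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Reject inputs outside the supported range before doing any work. This
  // holds regardless of which fuzzing engine or options produced the input.
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;

  HypotheticalParse(data, size);
  return 0;
}
```

Because the check lives in the target itself, it applies even when the engine
has no equivalent of `-max_len`.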

#### Disabling noisy error message logging

If the code you’re fuzzing generates a lot of error messages when encountering
incorrect or invalid data, the fuzz target will be slow and inefficient.

If the target uses Chromium logging APIs, you can silence errors by overriding
the environment used for logging in your fuzz target:

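```cpp
#include <stddef.h>
#include <stdint.h>

#include "base/logging.h"

// Raises the minimum log level so that only fatal messages are logged.
struct Environment {
  Environment() {
    logging::SetMinLogLevel(logging::LOGGING_FATAL);
  }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // A function-local static is constructed once, on the first call.
  static Environment env;

  // Put your fuzzing code here and use data+size as input.
  return 0;
}
```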

### Mutating Multiple Inputs

By default, a fuzzing engine such as libFuzzer mutates a single input (`uint8_t*
data, size_t size`). However, APIs often accept multiple arguments of various
types, rather than a single buffer. You can use three different methods to
mutate multiple inputs at once.

#### libprotobuf-mutator (LPM)

If you need to mutate multiple inputs of various types and lengths, see [Getting
Started with libprotobuf-mutator in Chromium].

*** note
**Note:** This method works with APIs and data structures of any complexity, but
requires extra effort. You would need to write a `.proto` definition (unless you
fuzz an existing protobuf) and C++ code to pass the proto message to the API you
are fuzzing (you'll have a fuzzed protobuf message instead of `data, size`
buffer).
***

#### FuzzedDataProvider (FDP)

[FuzzedDataProvider] is a class useful for splitting a fuzz input into multiple
parts of various types.

*** note
**Note:** FDP is much easier to use than LPM, but its downside is that the
format of the corpus becomes inconsistent. This doesn't matter if you don't have
a [Seed Corpus] (e.g. valid image files if you fuzz an image parser). FDP splits
your corpus files into several pieces to fuzz a broader range of input types, so
it can take longer to reach deeper code paths that surface more quickly if you
fuzz only a single input type.
***

To use FDP, add `#include <fuzzer/FuzzedDataProvider.h>` to your fuzz target
source file.

To learn more about `FuzzedDataProvider`, check out the [upstream documentation]
on it. It gives an overview of the available methods and links to a few example
fuzz targets.

#### Hash-based argument

If your API accepts a buffer with data and some integer value (e.g., a bitwise
combination of flags), you can calculate a hash value from (`data, size`) and
use it to fuzz an additional integer argument. For example:

```cpp
#include <stddef.h>
#include <stdint.h>

#include <functional>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string str = std::string(reinterpret_cast<const char*>(data), size);
  // Derive the extra integer argument from the input bytes themselves.
  std::size_t data_hash = std::hash<std::string>()(str);
  // `APIToBeFuzzed` is a placeholder for the API under test.
  APIToBeFuzzed(data, size, data_hash);
  return 0;
}
```

*** note
**Note:** The hash method doesn't have the corpus format issue mentioned in the
FDP section above, but it can lead to results that aren't as sophisticated as
LPM or FDP. The hash value derived from the data is a random value, rather than
a meaningful one controlled by the fuzzing engine. A single bit mutation might
lead to new code coverage, but the next mutation would generate a new hash
value and trigger another code path, without providing any real guidance to the
fuzzing engine.
***
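
When the integer argument is a set of flags, one option is to mask the hash
down to the bits the API defines. The sketch below assumes a hypothetical API
with three flag bits; `HypotheticalApi`, the `kFlag*` constants, and
`FlagsFromInput` are illustrative names, not part of Chromium or libFuzzer.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

namespace {

// Hypothetical flag bits accepted by the API under test.
constexpr uint32_t kFlagStrict = 1u << 0;
constexpr uint32_t kFlagVerbose = 1u << 1;
constexpr uint32_t kFlagRecursive = 1u << 2;
constexpr uint32_t kAllFlags = kFlagStrict | kFlagVerbose | kFlagRecursive;

// Records the flags of the most recent call, for demonstration only.
uint32_t g_last_flags = 0;

// Hypothetical API under test.
void HypotheticalApi(const uint8_t* data, size_t size, uint32_t flags) {
  (void)data;
  (void)size;
  g_last_flags = flags;
}

// Hashes the input and keeps only the bits that correspond to defined flags.
uint32_t FlagsFromInput(const uint8_t* data, size_t size) {
  const std::string str(reinterpret_cast<const char*>(data), size);
  const std::size_t hash = std::hash<std::string>()(str);
  return static_cast<uint32_t>(hash) & kAllFlags;
}

}  // namespace

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  HypotheticalApi(data, size, FlagsFromInput(data, size));
  return 0;
}
```

The caveat from the note above still applies: the flags change unpredictably
with every mutation of the input.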

[AFL]: AFL_integration.md
[AddressSanitizer]: http://clang.llvm.org/docs/AddressSanitizer.html
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Efficient Fuzzing Guide]: efficient_fuzzing.md
[FuzzedDataProvider]: https://cs.chromium.org/chromium/src/third_party/re2/src/re2/fuzzing/compiler-rt/include/fuzzer/FuzzedDataProvider.h
[Fuzzer Dictionary]: efficient_fuzzing.md#Fuzzer-dictionary
[GN]: https://gn.googlesource.com/gn/+/master/README.md
[GN config]: https://cs.chromium.org/chromium/src/tools/mb/mb_config_expectations/chromium.fuzz.json
[Getting Started with libprotobuf-mutator in Chromium]: libprotobuf-mutator.md
[Integration Reference]: reference.md
[MemorySanitizer]: http://clang.llvm.org/docs/MemorySanitizer.html
[Seed Corpus]: efficient_fuzzing.md#Seed-corpus
[UndefinedBehaviorSanitizer]: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
[code coverage report]: efficient_fuzzing.md#Code-coverage
[upstream documentation]: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider
[libFuzzer's output documentation]: http://llvm.org/docs/LibFuzzer.html#output
[quic_session_pool_fuzzer.cc]: https://cs.chromium.org/chromium/src/net/quic/quic_session_pool_fuzzer.cc
[getting started guide here]: getting_started.md