# Getting Started with libprotobuf-mutator (LPM) in Chromium

*** note
**Note:** Writing grammar fuzzers with libprotobuf-mutator requires greater
effort than writing fuzzers with libFuzzer alone. If you run into problems, send
an email to [[email protected]] for help.

**Prerequisites:** Knowledge of [libFuzzer in Chromium] and basic understanding
of [Protocol Buffers].
***

This document will walk you through:

* An overview of libprotobuf-mutator and how it's used.
* Writing and building your first fuzzer using libprotobuf-mutator.

[TOC]

## Overview of libprotobuf-mutator
libprotobuf-mutator is a package that allows libFuzzer’s mutation engine to
manipulate protobufs. This allows libFuzzer's mutations to be more specific
to the format it is fuzzing and less arbitrary. Below are some good use cases
for libprotobuf-mutator:

* Fuzzing targets that accept Protocol Buffers as input. See the next section
for how to do this.
* Fuzzing targets that accept input defined by a grammar. To do this you
must write code that converts data from a protobuf-based format that represents
the grammar to a format the target accepts. url_parse_proto_fuzzer is a working
example of this and is commented extensively. Readers may wish to consult its
code, which is located in `testing/libfuzzer/fuzzers/url_parse_proto_fuzzer.cc`
and `testing/libfuzzer/proto/url.proto`. Its build configuration can be found
in `testing/libfuzzer/fuzzers/BUILD.gn` and `testing/libfuzzer/proto/BUILD.gn`.
We also provide a walkthrough on how to do this in the section after the next.
* Fuzzing targets that accept more than one argument (such as data and flags).
In this case, you can define each argument as its own field in your protobuf
definition.

In the next section, we discuss building a fuzzer that targets code that accepts
an already existing protobuf definition. In the section after that, we discuss
how to write and build grammar-based fuzzers using libprotobuf-mutator.
Interested readers may also want to look at [this] example of a
libprotobuf-mutator fuzzer that is even more trivial than
url_parse_proto_fuzzer.

## Write a fuzz target for code that accepts protobufs

This is almost as easy as writing a standard libFuzzer-based fuzzer. You can
look at [lpm_test_fuzzer] for a working example (don't copy the line adding
"//testing/libfuzzer:no_clusterfuzz" to additional_configs), or you can follow
this walkthrough.

Start by creating a fuzz target. This is what the .cc file will look like:

```c++
// my_fuzzer.cc

#include "testing/libfuzzer/proto/lpm_interface.h"

// Assuming the .proto file is path/to/your/proto_file/my_proto.proto.
#include "path/to/your/proto_file/my_proto.pb.h"

DEFINE_PROTO_FUZZER(
  const my_proto::MyProtoMessage& my_proto_message) {
  targeted_function(my_proto_message);
}
```

The BUILD.gn definition for this target will be very similar to that of a
regular libFuzzer-based fuzzer_test. However, it will also have
libprotobuf-mutator in its deps. This is an example of what it will look like:

```python
# You must wrap the target in "use_fuzzing_engine_with_lpm" since trying to
# compile the target without a suitable fuzzing engine will fail (for reasons
# alluded to in the next step), which the commit queue will try.
if (use_fuzzing_engine_with_lpm) {
  fuzzer_test("my_fuzzer") {
    sources = [ "my_fuzzer.cc" ]
    deps = [
      # The proto library defining the message accepted by
      # DEFINE_PROTO_FUZZER().
      ":my_proto",

      "//third_party/libprotobuf-mutator",
      ...
    ]
  }
}
```

There's one more step, however. Because Chromium doesn't want to ship the full
protobuf library to users, all `.proto` files in Chromium that are used in
production contain this line: `option optimize_for = LITE_RUNTIME;`. However,
this line is incompatible with libprotobuf-mutator. Thus, we need to modify the
`proto_library` build target so that builds when fuzzing are compatible with
libprotobuf-mutator. To do this, change your `proto_library` to
`fuzzable_proto_library` (don't worry, this works just like `proto_library` when
`use_fuzzing_engine_with_lpm` is `false`) like so:

```python
import("//third_party/libprotobuf-mutator/fuzzable_proto_library.gni")

fuzzable_proto_library("my_proto") {
  ...
}
```

And with that we have completed writing a libprotobuf-mutator fuzz target for
Chromium code that accepts protobufs.


## Write a grammar-based fuzzer with libprotobuf-mutator

Once you have in mind the code you want to fuzz and the format it accepts, you
are ready to start writing a libprotobuf-mutator fuzzer. Writing the fuzzer
will have three steps:

* Define the fuzzed format (not required for protobuf formats, unless the
original definition is optimized for `LITE_RUNTIME`).
* Write the fuzz target and conversion code (for non-protobuf formats).
* Define the GN target

### Define the Fuzzed Format
Create a new .proto using `proto2` or `proto3` syntax and define a message that
you want libFuzzer to mutate.

```protocol-buffer
syntax = "proto2";

package my_fuzzer;

message MyFormat {
    // Define a format for libFuzzer to mutate here.
}
```

See `testing/libfuzzer/proto/url.proto` for an example of this in practice.
That example has extensive comments on URL syntax and how that influenced
the definition of the Url message.

### Write the Fuzz Target and Conversion Code
Create a new .cc and write a `DEFINE_PROTO_FUZZER` function:

```c++
// Needed since we use getenv().
#include <stdlib.h>

// Needed since we use std::cout.
#include <iostream>

// Needed since we use std::string.
#include <string>

#include "testing/libfuzzer/proto/lpm_interface.h"

// Assuming the .proto file is path/to/your/proto_file/my_format.proto.
#include "path/to/your/proto_file/my_format.pb.h"

// Put your conversion code here (if needed) and then pass the result to
// your fuzzing code (or just pass "my_proto_format", if your target accepts
// protobufs).

DEFINE_PROTO_FUZZER(const my_fuzzer::MyFormat& my_proto_format) {
    // Convert your protobuf to whatever format your targeted code accepts
    // if it doesn't accept protobufs.
    std::string native_input = convert_to_native_input(my_proto_format);

    // You should provide a way to easily retrieve the native input for
    // a given protobuf input. This is useful for debugging and for seeing
    // the inputs that cause targeted_function to crash (which is the reason we
    // are here!). Note how this is done before targeted_function is called
    // since we can't print after the program has crashed.
    if (getenv("LPM_DUMP_NATIVE_INPUT"))
        std::cout << native_input << std::endl;

    // Now test your targeted code using the converted protobuf input.
    targeted_function(native_input);
}
```

This is very similar to the same step in writing a standard libFuzzer fuzzer.
The only real differences are accepting protobufs rather than raw data and
converting them to the desired format. Conversion code can't really be
explored in this guide since it is format-specific. However, a good example
of conversion code (and a fuzz target) can be found in
`testing/libfuzzer/fuzzers/url_parse_proto_fuzzer.cc`. That example
thoroughly documents how it converts the Url protobuf message into a real URL
string.

A good convention is printing the native input when the
`LPM_DUMP_NATIVE_INPUT` env variable is set. This makes it easy to retrieve
the actual input that causes the code to crash instead of the protobuf version
of it (e.g. you can get the URL string that causes the target to crash rather
than a protobuf). Although this is only a convention and isn't required, it is
strongly recommended. You don't need to do this if the native input of
targeted_function is protobufs. Beware that printing a newline can make the
output invalid for some formats. In that case, print the input without the
newline and call `fflush(0)` afterwards, since otherwise the program may crash
before native_input is actually printed.


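As a rough illustration of the conversion and dump steps described above, the
sketch below uses a plain struct, `FakeUrl`, as a hypothetical stand-in for a
generated protobuf class so that it compiles without Chromium's headers.
`FakeUrl`, `ConvertToNativeInput`, `DumpNativeInput`, and
`MaybeDumpNativeInput` are names invented for this example; a real fuzzer would
use the accessors generated from its own .proto file. The dump helper writes no
trailing newline and flushes explicitly, which is one way to handle formats
where a newline would make the output invalid.

```c++
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a generated protobuf message. A real fuzzer would
// use the class generated from its own .proto definition instead.
struct FakeUrl {
  std::string scheme;
  std::string host;
  std::string path;  // May be empty.
};

// Converts the structured message into the string the target would parse.
std::string ConvertToNativeInput(const FakeUrl& url) {
  std::string result = url.scheme + "://" + url.host;
  if (!url.path.empty())
    result += "/" + url.path;
  return result;
}

// Writes the native input to `out` without a trailing newline, then flushes
// so the bytes are not lost if the target crashes right afterwards.
void DumpNativeInput(const std::string& native_input, std::FILE* out) {
  std::fwrite(native_input.data(), 1, native_input.size(), out);
  std::fflush(out);
}

// Dumps to stdout only when LPM_DUMP_NATIVE_INPUT is set. Returns whether the
// input was dumped.
bool MaybeDumpNativeInput(const std::string& native_input) {
  if (!std::getenv("LPM_DUMP_NATIVE_INPUT"))
    return false;
  DumpNativeInput(native_input, stdout);
  return true;
}
```

In a real fuzz target, `MaybeDumpNativeInput` would be called before the
targeted function, for the reason given in the comments of the example above.
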
### Define the GN Target
Define a fuzzer_test target and include your protobuf definition and
libprotobuf-mutator as dependencies.

```python
import("//testing/libfuzzer/fuzzer_test.gni")
import("//third_party/protobuf/proto_library.gni")

fuzzer_test("my_fuzzer") {
  sources = [ "my_fuzzer.cc" ]
  deps = [
    ":my_format_proto",
    "//third_party/libprotobuf-mutator",
    ...
  ]
}

proto_library("my_format_proto") {
  sources = [ "my_format.proto" ]
}
```

See `testing/libfuzzer/fuzzers/BUILD.gn` for an example of this in practice.

### Tips For Grammar Based Fuzzers
* If you have messages that are defined recursively (e.g. message `Foo` has a
field of type `Foo`), make sure to bound recursive calls to code converting
your message into native input. Otherwise you will (probably) end up with an
out of memory error. The code coverage benefits of allowing unlimited
recursion in a message are probably fairly low for most targets anyway.

* Remember that proto definitions can be changed in ways that are backwards
compatible (such as adding explicit values to an `enum`). This means that you
can make changes to your definitions while preserving the usefulness of your
corpus. In general adding fields will be backwards compatible but removing them
(particularly if they are `required`) is not.

* Make sure you understand the meaning of the different protobuf modifiers such
as `oneof` and `repeated` as they can be counter-intuitive. `oneof` means "At
most one of" while `repeated` means "At least zero". You can hack around these
meanings if you need "at least one of" or "exactly one of" something. For
example, this is the proto code for exactly one of: `MessageA` or `MessageB` or
`MessageC`:

```protocol-buffer
message MyFormat {
    oneof a_or_b {
      MessageA message_a = 1;
      MessageB message_b = 2;
    }
    required MessageC message_c = 3;
}
```

And here is the C++ code that converts it.

```c++
std::string Convert(const MyFormat& my_format) {
  if (my_format.has_message_a())
    return ConvertMessageA(my_format.message_a());
  else if (my_format.has_message_b())
    return ConvertMessageB(my_format.message_b());
  else // Fall through to the default case, message_c.
    return ConvertMessageC(my_format.message_c());
}
```

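The "at least one of" case mentioned above can be approached in a similar way.
One possible encoding, assumed here rather than taken from an existing Chromium
fuzzer, pairs a `required` field with a `repeated` field of the same type (for
example `required Item first = 1; repeated Item rest = 2;`). The sketch below
models that layout with plain structs so that it compiles on its own; `Item`,
`ItemList`, and `ConvertItemList` are hypothetical names.

```c++
#include <string>
#include <vector>

// Hypothetical stand-ins for generated protobuf classes.
struct Item {
  std::string text;
};

struct ItemList {
  Item first;              // Models a `required` field: always present.
  std::vector<Item> rest;  // Models a `repeated` field: zero or more.
};

// Joins all items with commas. Because `first` always exists, the result
// always contains at least one item.
std::string ConvertItemList(const ItemList& list) {
  std::string result = list.first.text;
  for (const Item& item : list.rest)
    result += "," + item.text;
  return result;
}
```
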
* libprotobuf-mutator supports both proto2 and proto3 syntax. Be aware though
that it handles strings differently in each because of differences in the way
the proto library handles strings in each syntax (in short, proto3 strings must
actually be UTF-8 while proto2 strings need not be). See [here] for more
details.

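The first tip above, bounding recursion, could look roughly like the following.
`Expr` is a hypothetical plain-struct stand-in for a recursively defined
protobuf message, and `kMaxDepth` is an arbitrary limit chosen for illustration;
neither comes from Chromium.

```c++
#include <memory>
#include <string>

// Hypothetical stand-in for a recursive message such as
// `message Expr { optional string atom = 1; optional Expr inner = 2; }`.
struct Expr {
  std::string atom;
  std::unique_ptr<Expr> inner;  // Null when the nested field is unset.
};

// Arbitrary illustrative limit on nesting depth.
constexpr int kMaxDepth = 3;

// Converts `expr` to a parenthesized string, ignoring anything nested deeper
// than kMaxDepth so that recursion depth and output size stay bounded.
std::string ConvertExpr(const Expr& expr, int depth = 0) {
  std::string result = expr.atom;
  if (expr.inner && depth < kMaxDepth)
    result += "(" + ConvertExpr(*expr.inner, depth + 1) + ")";
  return result;
}
```

Silently dropping the deeper levels is only one option; a converter could
instead return early or emit a fixed placeholder once the limit is reached.
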
## Write a fuzz target for code that accepts multiple inputs
LPM makes it straightforward to write a fuzzer for code that needs multiple
inputs. The steps are similar to those for writing a grammar-based fuzzer,
except that in this case the grammar is very simple.

Start by creating the proto file which will define the inputs you want:

```protocol-buffer
// my_fuzzer_input.proto

syntax = "proto2";

package my_fuzzer;

message FuzzerInput {
    required bool arg1 = 1;
    required string arg2 = 2;
    optional int32 arg3 = 3;
}
```

In this example, the function we are fuzzing requires a `bool` and a `string`
and takes an `int` as an optional argument. Let's define our fuzzer harness:

```c++
// my_fuzzer.cc

#include "testing/libfuzzer/proto/lpm_interface.h"

// Assuming the .proto file is path/to/your/proto_file/my_fuzzer_input.proto.
#include "path/to/your/proto_file/my_fuzzer_input.pb.h"

DEFINE_PROTO_FUZZER(
  const my_fuzzer::FuzzerInput& fuzzer_input) {
  if (fuzzer_input.has_arg3())
    targeted_function_1(fuzzer_input.arg1(), fuzzer_input.arg2(), fuzzer_input.arg3());
  else
    targeted_function_2(fuzzer_input.arg1(), fuzzer_input.arg2());
}
```

Then you must define build targets for your fuzzer harness and proto format in
GN, like so:

```python
import("//testing/libfuzzer/fuzzer_test.gni")
import("//third_party/protobuf/proto_library.gni")

fuzzer_test("my_fuzzer") {
  sources = [ "my_fuzzer.cc" ]
  deps = [
    ":my_fuzzer_input",
    "//third_party/libprotobuf-mutator",
    ...
  ]
}

proto_library("my_fuzzer_input") {
  sources = [ "my_fuzzer_input.proto" ]
}
```

### Tips for fuzz targets that accept multiple inputs
Protobuf has a field rule `repeated` that is useful when a fuzzer needs to
accept a non-fixed number of inputs (for an example, see
[mojo_parse_messages_proto_fuzzer], which accepts an unbounded number of mojo
messages). Protobuf version 2 also has `optional` and `required` field rules
that some may find useful.


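As a sketch of how a harness might consume a `repeated` field, the example
below models the input with plain structs so that it compiles on its own.
`FakeInput`, `ProcessAll`, and the stand-in `TargetedFunction` are hypothetical;
with a real generated class the loop would iterate over the repeated field's
accessor instead of a `std::vector`.

```c++
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a message with `repeated string messages = 1;`.
struct FakeInput {
  std::vector<std::string> messages;
};

// Placeholder for the code under test; here it just reports the input length.
std::size_t TargetedFunction(const std::string& message) {
  return message.size();
}

// Feeds every message to the target in order and returns the total number of
// bytes processed. An empty list is valid and processes nothing.
std::size_t ProcessAll(const FakeInput& input) {
  std::size_t total = 0;
  for (const std::string& message : input.messages)
    total += TargetedFunction(message);
  return total;
}
```
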
## Wrapping Up
Once you have written a fuzzer with libprotobuf-mutator, building and running
it is pretty much the same as if the fuzzer were a [standard libFuzzer-based
fuzzer] (with minor exceptions, such as your seed corpus needing to be in
protobuf format).

## General Tips
* Check out some of the [existing proto fuzzers]. Not only will they be helpful
examples, it is possible that the format you want to fuzz is already defined or
partially defined by an existing proto definition (if you are writing a grammar
fuzzer).

* `DEFINE_BINARY_PROTO_FUZZER` can be used instead of `DEFINE_PROTO_FUZZER` (or
  `DEFINE_TEXT_PROTO_FUZZER`) to use protobuf's binary format for the corpus.
  This will make it hard/impossible to modify the corpus manually (i.e. when not
  fuzzing). However, protobuf's text format (and by extension
  `DEFINE_PROTO_FUZZER`) is believed by some to come with a performance penalty
  compared to the binary format. We've never seen a case where this penalty
  was important, but if profiling reveals that protobuf deserialization is the
  bottleneck in your fuzzer, you may want to consider using the binary format.
  This will probably not be the case.

[libfuzzer in Chromium]: getting_started.md
[Protocol Buffers]: https://developers.google.com/protocol-buffers/docs/cpptutorial
[[email protected]]: mailto:[email protected]
[this]: https://github.com/google/libprotobuf-mutator/tree/master/examples/libfuzzer/libfuzzer_example.cc
[existing proto fuzzers]: https://cs.chromium.org/search/?q=DEFINE_(BINARY_%7CTEXT_)?PROTO_FUZZER+-file:src/third_party/libprotobuf-mutator/src/src/libfuzzer/libfuzzer_macro.h+lang:cpp&sq=package:chromium&type=cs
[here]: https://github.com/google/libprotobuf-mutator/blob/master/README.md#utf-8-strings
[lpm_test_fuzzer]: https://cs.chromium.org/#search&q=lpm_test_fuzzer+file:%5Esrc/third_party/libprotobuf-mutator/BUILD.gn
[mojo_parse_messages_proto_fuzzer]: https://cs.chromium.org/chromium/src/mojo/public/tools/fuzzers/mojo_parse_message_proto_fuzzer.cc?l=25
[standard libFuzzer-based fuzzer]: getting_started_with_libfuzzer.md