# Getting Started with libprotobuf-mutator (LPM) in Chromium

*** note
**Note:** Writing grammar fuzzers with libprotobuf-mutator requires greater
effort than writing fuzzers with libFuzzer alone. If you run into problems, send
an email to [[email protected]] for help.

**Prerequisites:** Knowledge of [libFuzzer in Chromium] and a basic
understanding of [Protocol Buffers].
***

This document will walk you through:

* An overview of libprotobuf-mutator and how it's used.
* Writing and building your first fuzzer using libprotobuf-mutator.

[TOC]

## Overview of libprotobuf-mutator
libprotobuf-mutator is a package that allows libFuzzer's mutation engine to
manipulate protobufs. This allows libFuzzer's mutations to be more specific
to the format it is fuzzing and less arbitrary. Below are some good use cases
for libprotobuf-mutator:

* Fuzzing targets that accept Protocol Buffers as input. See the next section
for how to do this.
* Fuzzing targets that accept input defined by a grammar. To do this you
must write code that converts data from a protobuf-based format that represents
the grammar to a format the target accepts. url_parse_proto_fuzzer is a working
example of this and is commented extensively. Readers may wish to consult its
code, which is located in `testing/libfuzzer/fuzzers/url_parse_proto_fuzzer.cc`
and `testing/libfuzzer/proto/url.proto`. Its build configuration can be found
in `testing/libfuzzer/fuzzers/BUILD.gn` and `testing/libfuzzer/proto/BUILD.gn`.
We also provide a walkthrough on how to do this in the section after the next.
* Fuzzing targets that accept more than one argument (such as data and flags).
In this case, you can define each argument as its own field in your protobuf
definition.

In the next section, we discuss building a fuzzer that targets code that accepts
an already existing protobuf definition.
In the section after that, we discuss
how to write and build grammar-based fuzzers using libprotobuf-mutator.
Interested readers may also want to look at [this] example of a
libprotobuf-mutator fuzzer that is even simpler than
url_parse_proto_fuzzer.

## Write a fuzz target for code that accepts protobufs

This is almost as easy as writing a standard libFuzzer-based fuzzer. You can
look at [lpm_test_fuzzer] for a working example of this (don't
copy the line adding "//testing/libfuzzer:no_clusterfuzz" to
additional_configs). Or you can follow this walkthrough:

Start by creating a fuzz target. This is what the .cc file will look like:

```c++
// my_fuzzer.cc

#include "testing/libfuzzer/proto/lpm_interface.h"

// Assuming the .proto file is path/to/your/proto_file/my_proto.proto.
#include "path/to/your/proto_file/my_proto.pb.h"

DEFINE_PROTO_FUZZER(
    const my_proto::MyProtoMessage& my_proto_message) {
  targeted_function(my_proto_message);
}
```

The BUILD.gn definition for this target will be very similar to that of a
regular libFuzzer-based fuzzer_test. However, it will also have
libprotobuf-mutator in its deps. This is an example of what it will look like:

```python
# You must wrap the target in "use_fuzzing_engine_with_lpm": the commit queue
# will try to compile the target without a suitable fuzzing engine, and that
# fails (for reasons alluded to in the next step).
if (use_fuzzing_engine_with_lpm) {
  fuzzer_test("my_fuzzer") {
    sources = [ "my_fuzzer.cc" ]
    deps = [
      # The proto library defining the message accepted by
      # DEFINE_PROTO_FUZZER().
      ":my_proto",

      "//third_party/libprotobuf-mutator",
      ...
    ]
  }
}
```

There's one more step, however.
Because Chromium doesn't want to ship the full protobuf library to users,
all `.proto` files in Chromium that are used in
production contain this line: `option optimize_for = LITE_RUNTIME;`. But this
line is incompatible with libprotobuf-mutator. Thus, we need to modify the
`proto_library` build target so that builds when fuzzing are compatible with
libprotobuf-mutator. To do this, change your `proto_library` to
`fuzzable_proto_library` (don't worry, this works just like `proto_library` when
`use_fuzzing_engine_with_lpm` is `false`) like so:

```python
import("//third_party/libprotobuf-mutator/fuzzable_proto_library.gni")

fuzzable_proto_library("my_proto") {
  ...
}
```

And with that we have completed writing a libprotobuf-mutator fuzz target for
Chromium code that accepts protobufs.


## Write a grammar-based fuzzer with libprotobuf-mutator

Once you have in mind the code you want to fuzz and the format it accepts, you
are ready to start writing a libprotobuf-mutator fuzzer. Writing the fuzzer
involves three steps:

* Define the fuzzed format (not required for protobuf formats, unless the
original definition is optimized for `LITE_RUNTIME`).
* Write the fuzz target and conversion code (for non-protobuf formats).
* Define the GN target.

### Define the Fuzzed Format
Create a new .proto using `proto2` or `proto3` syntax and define a message that
you want libFuzzer to mutate.

```protocol-buffer
syntax = "proto2";

package my_fuzzer;

message MyProtoFormat {
  // Define a format for libFuzzer to mutate here.
}
```

See `testing/libfuzzer/proto/url.proto` for an example of this in practice.
That example has extensive comments on URL syntax and how that influenced
the definition of the Url message.

### Write the Fuzz Target and Conversion Code
Create a new .cc and write a `DEFINE_PROTO_FUZZER` function:

```c++
// Needed since we use getenv().
#include <stdlib.h>

// Needed since we use std::cout.
#include <iostream>

// Needed since we use std::string.
#include <string>

#include "testing/libfuzzer/proto/lpm_interface.h"

// Assuming the .proto file is path/to/your/proto_file/my_format.proto.
#include "path/to/your/proto_file/my_format.pb.h"

// Put your conversion code here (if needed) and then pass the result to
// your fuzzing code (or just pass "my_proto_format", if your target accepts
// protobufs).

DEFINE_PROTO_FUZZER(const my_fuzzer::MyProtoFormat& my_proto_format) {
  // Convert your protobuf to whatever format your targeted code accepts
  // if it doesn't accept protobufs.
  std::string native_input = convert_to_native_input(my_proto_format);

  // You should provide a way to easily retrieve the native input for
  // a given protobuf input. This is useful for debugging and for seeing
  // the inputs that cause targeted_function to crash (which is the reason we
  // are here!). Note how this is done before targeted_function is called
  // since we can't print after the program has crashed.
  if (getenv("LPM_DUMP_NATIVE_INPUT"))
    std::cout << native_input << std::endl;

  // Now test your targeted code using the converted protobuf input.
  targeted_function(native_input);
}
```

This is very similar to the same step in writing a standard libFuzzer fuzzer.
The only real differences are accepting protobufs rather than raw data and
converting them to the desired format. Conversion code can't really be
explored in this guide since it is format-specific. However, a good example
of conversion code (and a fuzz target) can be found in
`testing/libfuzzer/fuzzers/url_parse_proto_fuzzer.cc`. That example
thoroughly documents how it converts the Url protobuf message into a real URL
string.
A good convention is printing the native input when the
`LPM_DUMP_NATIVE_INPUT` env variable is set. This will make it easy to
retrieve the actual input that causes the code to crash instead of the
protobuf version of it (e.g. you can get the URL string that causes the
crash rather than a protobuf). This is only a convention, so it isn't
required, but it is strongly recommended. You don't need to do
this if the native input of targeted_function is protobufs. Beware that
printing a newline can make the output invalid for some formats. In that case,
print the input without the newline and call `fflush(0)`, since otherwise the
program may crash before native_input is actually printed.


### Define the GN Target
Define a fuzzer_test target and include your protobuf definition and
libprotobuf-mutator as dependencies.

```python
import("//testing/libfuzzer/fuzzer_test.gni")
import("//third_party/protobuf/proto_library.gni")

fuzzer_test("my_fuzzer") {
  sources = [ "my_fuzzer.cc" ]
  deps = [
    ":my_format_proto",
    "//third_party/libprotobuf-mutator",
    ...
  ]
}

proto_library("my_format_proto") {
  sources = [ "my_format.proto" ]
}
```

See `testing/libfuzzer/fuzzers/BUILD.gn` for an example of this in practice.

### Tips For Grammar Based Fuzzers
* If you have messages that are defined recursively (e.g. message `Foo` has a
field of type `Foo`), make sure to bound recursive calls to code converting
your message into native input. Otherwise you will (probably) end up with an
out of memory error. The code coverage benefits of allowing unlimited
recursion in a message are probably fairly low for most targets anyway.

* Remember that proto definitions can be changed in ways that are backwards
compatible (such as adding explicit values to an `enum`). This means that you
can make changes to your definitions while preserving the usefulness of your
corpus.
In general, adding fields will be backwards compatible but removing them
(particularly if they are `required`) is not.

* Make sure you understand the meaning of the different protobuf modifiers such
as `oneof` and `repeated` as they can be counter-intuitive. `oneof` means "At
most one of" while `repeated` means "At least zero". You can hack around these
meanings if you need "at least one of" or "exactly one of" something. For
example, this is the proto code for exactly one of `MessageA`, `MessageB`, or
`MessageC`:

```protocol-buffer
message MyFormat {
  oneof a_or_b {
    MessageA message_a = 1;
    MessageB message_b = 2;
  }
  required MessageC message_c = 3;
}
```

And here is the C++ code that converts it.

```c++
std::string Convert(const MyFormat& my_format) {
  if (my_format.has_message_a())
    return ConvertMessageA(my_format.message_a());
  if (my_format.has_message_b())
    return ConvertMessageB(my_format.message_b());
  // Fall through to the default case, message_c.
  return ConvertMessageC(my_format.message_c());
}
```

* libprotobuf-mutator supports both proto2 and proto3 syntax. Be aware though
that it handles strings differently in each because of differences in the way
the proto library handles strings in each syntax (in short, proto3 strings must
actually be UTF-8 while proto2 strings need not be). See [here] for more
details.

## Write a fuzz target for code that accepts multiple inputs
LPM makes it straightforward to write a fuzzer for code that needs multiple
inputs. The steps for doing this are similar to those of writing a grammar based
fuzzer, except in this case the grammar is very simple. Thus instructions for
this use case are given below.
Start by creating the proto file which will define the inputs you want:

```protocol-buffer
// my_fuzzer_input.proto

syntax = "proto2";

package my_fuzzer;

message FuzzerInput {
  required bool arg1 = 1;
  required string arg2 = 2;
  optional int32 arg3 = 3;
}
```

In this example, the function we are fuzzing requires a `bool` and a `string`
and takes an `int` as an optional argument. Let's define our fuzzer harness:

```c++
// my_fuzzer.cc

#include "testing/libfuzzer/proto/lpm_interface.h"

// Assuming the .proto file is path/to/your/proto_file/my_fuzzer_input.proto.
#include "path/to/your/proto_file/my_fuzzer_input.pb.h"

DEFINE_PROTO_FUZZER(
    const my_fuzzer::FuzzerInput& fuzzer_input) {
  if (fuzzer_input.has_arg3()) {
    targeted_function_1(fuzzer_input.arg1(), fuzzer_input.arg2(),
                        fuzzer_input.arg3());
  } else {
    targeted_function_2(fuzzer_input.arg1(), fuzzer_input.arg2());
  }
}
```

Then you must define build targets for your fuzzer harness and proto format in
GN, like so:

```python
import("//testing/libfuzzer/fuzzer_test.gni")
import("//third_party/protobuf/proto_library.gni")

fuzzer_test("my_fuzzer") {
  sources = [ "my_fuzzer.cc" ]
  deps = [
    ":my_fuzzer_input",
    "//third_party/libprotobuf-mutator",
    ...
  ]
}

proto_library("my_fuzzer_input") {
  sources = [ "my_fuzzer_input.proto" ]
}
```

### Tips for fuzz targets that accept multiple inputs
Protobuf has a field rule `repeated` that is useful when a fuzzer needs to
accept a non-fixed number of inputs (see [mojo_parse_messages_proto_fuzzer],
which accepts an unbounded number of mojo messages, as an example).
Protobuf version 2 also has `optional` and `required` field rules that some may
find useful.
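
As a rough illustration of consuming a `repeated` field, the sketch below walks
every element and hands it to the target. Everything in it is hypothetical: the
`FuzzerInput` class is a hand-written stand-in for what protoc would generate
for `message FuzzerInput { repeated string chunks = 1; }`, mirroring only the
`chunks_size()`, `chunks(i)`, and `add_chunks()` accessors so that the sketch
compiles without protobuf. The `kMaxChunks` cap is likewise an assumption made
here to keep individual runs short, not something this guide requires.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a protoc-generated class for
//   message FuzzerInput { repeated string chunks = 1; }
// It mirrors only the accessors used below.
class FuzzerInput {
 public:
  void add_chunks(const std::string& value) { chunks_.push_back(value); }
  int chunks_size() const { return static_cast<int>(chunks_.size()); }
  const std::string& chunks(int index) const { return chunks_[index]; }

 private:
  std::vector<std::string> chunks_;
};

// Hypothetical cap on how many elements one run will process.
constexpr int kMaxChunks = 4;

// Hypothetical target: it only counts the bytes it was handed.
std::size_t g_bytes_seen = 0;
void targeted_function(const std::string& chunk) {
  g_bytes_seen += chunk.size();
}

// Feeds at most kMaxChunks elements to the target and returns how many
// elements were consumed.
int FeedChunks(const FuzzerInput& input) {
  int used = 0;
  for (int i = 0; i < input.chunks_size() && used < kMaxChunks; ++i) {
    targeted_function(input.chunks(i));
    ++used;
  }
  return used;
}
```

In a real harness the body of `FeedChunks` would sit inside
`DEFINE_PROTO_FUZZER`, operating on the generated message type instead of the
stand-in class.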


## Wrapping Up
Once you have written a fuzzer with libprotobuf-mutator, building and running
it is pretty much the same as if the fuzzer were a
[standard libFuzzer-based fuzzer] (with minor exceptions, such as the seed
corpus needing to be in protobuf format).

## General Tips
* Check out some of the [existing proto fuzzers]. Not only will they be helpful
examples, it is possible that the format you want to fuzz is already defined or
partially defined by an existing proto definition (if you are writing a grammar
fuzzer).

* `DEFINE_BINARY_PROTO_FUZZER` can be used instead of `DEFINE_PROTO_FUZZER` (or
 `DEFINE_TEXT_PROTO_FUZZER`) to use protobuf's binary format for the corpus.
 This will make it hard/impossible to modify the corpus manually (i.e. when not
 fuzzing). However, protobuf's text format (and by extension
 `DEFINE_PROTO_FUZZER`) is believed by some to come with a performance penalty
 compared to the binary format. We've never seen a case where this penalty
 was important, but if profiling reveals that protobuf deserialization is the
 bottleneck in your fuzzer, you may want to consider using the binary format.
 This will probably not be the case.
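
To make the earlier tip about recursively defined messages more concrete, here
is a minimal sketch of conversion code that bounds its own recursion. All names
in it (`Expr`, `MakeChain`, `Convert`, `kMaxDepth`) are hypothetical. `Expr` is
a hand-written stand-in for what protoc would generate for a message that
contains a field of its own type, mirroring only the `literal()`,
`has_child()`, and `child()` accessors so that the sketch compiles without
protobuf. The depth limit of 3 is an arbitrary value chosen for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a protoc-generated class for
//   message Expr { optional string literal = 1; optional Expr child = 2; }
class Expr {
 public:
  explicit Expr(std::string literal) : literal_(std::move(literal)) {}
  const std::string& literal() const { return literal_; }
  bool has_child() const { return child_ != nullptr; }
  const Expr& child() const { return *child_; }
  void set_child(Expr child) {
    child_ = std::make_unique<Expr>(std::move(child));
  }

 private:
  std::string literal_;
  std::unique_ptr<Expr> child_;
};

// Hypothetical bound on how many nesting levels are converted.
constexpr int kMaxDepth = 3;

// Converts expr to a string such as "0(1(2))". Children nested deeper than
// kMaxDepth levels are ignored, so the recursion is bounded no matter how
// deeply the mutator nests the message.
std::string Convert(const Expr& expr, int depth = 0) {
  std::string out = expr.literal();
  if (expr.has_child() && depth + 1 < kMaxDepth) {
    out += "(" + Convert(expr.child(), depth + 1) + ")";
  }
  return out;
}

// Builds a chain of `length` nested messages with literals "0", "1", ...
Expr MakeChain(int length) {
  assert(length >= 1);
  Expr node(std::to_string(length - 1));
  for (int i = length - 2; i >= 0; --i) {
    Expr parent(std::to_string(i));
    parent.set_child(std::move(node));
    node = std::move(parent);
  }
  return node;
}
```

Silently dropping the deeper levels, as this sketch does, is only one option;
another is to return early from the fuzz target when the input exceeds the
bound.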

[libfuzzer in Chromium]: getting_started.md
[Protocol Buffers]: https://developers.google.com/protocol-buffers/docs/cpptutorial
[[email protected]]: mailto:[email protected]
[this]: https://github.com/google/libprotobuf-mutator/tree/master/examples/libfuzzer/libfuzzer_example.cc
[existing proto fuzzers]: https://cs.chromium.org/search/?q=DEFINE_(BINARY_%7CTEXT_)?PROTO_FUZZER+-file:src/third_party/libprotobuf-mutator/src/src/libfuzzer/libfuzzer_macro.h+lang:cpp&sq=package:chromium&type=cs
[here]: https://github.com/google/libprotobuf-mutator/blob/master/README.md#utf-8-strings
[lpm_test_fuzzer]: https://cs.chromium.org/#search&q=lpm_test_fuzzer+file:%5Esrc/third_party/libprotobuf-mutator/BUILD.gn
[mojo_parse_messages_proto_fuzzer]: https://cs.chromium.org/chromium/src/mojo/public/tools/fuzzers/mojo_parse_message_proto_fuzzer.cc?l=25
[standard libFuzzer-based fuzzer]: getting_started_with_libfuzzer.md