# Fuzzing binary-only targets

AFL++, libFuzzer, and other fuzzers are great if you have the source code of
the target. This allows for very fast and coverage-guided fuzzing.

However, if only the binary program and no source code is available, then
standard `afl-fuzz -n` (non-instrumented mode) is not effective.

For fast, on-the-fly instrumentation of black-box binaries, AFL++ offers
various support. The following is a description of how these binaries can be
fuzzed with AFL++.

## TL;DR:

FRIDA mode and QEMU mode in persistent mode are the fastest - if persistent
mode is possible and the stability is high enough.

Otherwise, try ZAFL, RetroWrite, Dyninst, and if these fail, too, then try
standard FRIDA/QEMU mode with `AFL_ENTRYPOINT` set to where you need it.

If your target is non-Linux, then use unicorn_mode.

## Fuzzing binary-only targets with AFL++

### QEMU mode

QEMU mode is the "native" solution in AFL++. It is available in the
./qemu_mode/ directory and, once compiled, it can be accessed via the
`afl-fuzz -Q` command line option. It is the easiest alternative to use and
even works for cross-platform binaries.

For Linux programs and their libraries, this is accomplished with a version of
QEMU running in the lesser-known "user space emulation" mode.
QEMU is a project
separate from AFL++, but you can conveniently build the feature by doing:

```shell
cd qemu_mode
./build_qemu_support.sh
```

The following setup for QEMU mode is recommended:

* run 1 `afl-fuzz -Q` instance with CMPLOG (`-c 0` + `AFL_COMPCOV_LEVEL=2`)
* run 1 `afl-fuzz -Q` instance with QASAN (`AFL_USE_QASAN=1`)
* run 1 `afl-fuzz -Q` instance with LAF (`AFL_PRELOAD=libcmpcov.so` +
  `AFL_COMPCOV_LEVEL=2`); alternatively, you can use FRIDA mode - just switch
  `-Q` to `-O` and remove the LAF instance

Then run as many instances as you have cores left with either `-Q` mode or -
even better - use a binary rewriter like Dyninst, RetroWrite, ZAFL, etc.
The binary rewriters all have their own advantages and caveats.
ZAFL is the best but cannot be used in a business/commercial context.

If a binary rewriter works for your target, then you can use afl-fuzz normally
and it will be twice as fast as QEMU mode (though slower than QEMU persistent
mode).

The speed decrease of QEMU mode is about 50%. However, various options exist
to increase the speed:

- using `AFL_ENTRYPOINT` to move the forkserver entry to a later basic block
  in the binary (+5-10% speed)
- using persistent mode, see
  [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md); this
  will result in a 150-300% overall speed increase - so 2.5-4x the original
  QEMU mode speed!
- using `AFL_CODE_START`/`AFL_CODE_END` to only instrument specific parts

For additional instructions and caveats, see
[qemu_mode/README.md](../qemu_mode/README.md). If possible, you should use the
persistent mode, see
[qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md). The mode
is approximately 2-5x slower than compile-time instrumentation and is less
conducive to parallelization.
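
The recommended setup above can be sketched as concrete commands. This is a
minimal sketch, not a definitive invocation: `./target`, the `in`/`out`
directories, and the instance names (`cmplog`, `qasan`, `laf`) are placeholders
that you need to adapt to your own target and corpus.

```shell
# Main instance with CMPLOG (-c 0 uses the target itself as cmplog binary):
AFL_COMPCOV_LEVEL=2 afl-fuzz -Q -c 0 -M cmplog -i in -o out -- ./target @@

# Secondary instance with QASAN:
AFL_USE_QASAN=1 afl-fuzz -Q -S qasan -i in -o out -- ./target @@

# Secondary instance with LAF-style comparison coverage:
AFL_PRELOAD=libcmpcov.so AFL_COMPCOV_LEVEL=2 afl-fuzz -Q -S laf -i in -o out -- ./target @@
```

Any further instances are then plain `afl-fuzz -Q -S <name>` invocations, one
per remaining core.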

Note that there is also honggfuzz:
[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz),
which now has a QEMU mode, but its performance is just 1.5% ...

If you want to code a customized fuzzer without much work, we highly recommend
checking out our sister project LibAFL, which supports QEMU, too:
[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL)

### WINE+QEMU

Wine mode can run Win32 PE binaries with the QEMU instrumentation. It needs
Wine, python3, and the pefile python package installed.

It is included in AFL++.

For more information, see
[qemu_mode/README.wine.md](../qemu_mode/README.wine.md).

### FRIDA mode

In FRIDA mode, you can fuzz binary-only targets as easily as with QEMU mode.
FRIDA mode is, most of the time, slightly faster than QEMU mode. It is also
newer, and has the advantage that it works on macOS (both Intel and M1).

To build FRIDA mode:

```shell
cd frida_mode
gmake
```

For additional instructions and caveats, see
[frida_mode/README.md](../frida_mode/README.md).

If possible, you should use the persistent mode, see
[instrumentation/README.persistent_mode.md](../instrumentation/README.persistent_mode.md).
The mode is approximately 2-5x slower than compile-time instrumentation and is
less conducive to parallelization. But for binary-only fuzzing, it gives a
huge speed improvement if it is possible to use.

You can also perform remote fuzzing with FRIDA, e.g., if you want to fuzz on
iPhone or Android devices; for this you can use
[https://github.com/ttdennis/fpicker/](https://github.com/ttdennis/fpicker/)
as an intermediate that uses AFL++ for fuzzing.

If you want to code a customized fuzzer without much work, we highly recommend
checking out our sister project LibAFL, which supports FRIDA, too:
[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL).
Working examples already exist :-)

### Nyx mode

Nyx is a full-system emulation fuzzing environment with snapshot support that
is built upon KVM and QEMU. It is only available on Linux and currently
restricted to x86_64.

For binary-only fuzzing, a special 5.10 kernel is required.

See [nyx_mode/README.md](../nyx_mode/README.md).

### Unicorn

Unicorn is a fork of QEMU. The instrumentation is, therefore, very similar.
In contrast to QEMU, Unicorn does not offer a full system or even userland
emulation. Runtime environments and/or loaders have to be written from
scratch, if needed. On top of that, block chaining has been removed, which
means the speed boost introduced in the patched QEMU mode of AFL++ cannot be
ported over to Unicorn.

For non-Linux binaries, you can use AFL++'s unicorn_mode, which can emulate
anything you want - at the price of speed and user-written scripts.

To build unicorn_mode:

```shell
cd unicorn_mode
./build_unicorn_support.sh
```

For further information, check out
[unicorn_mode/README.md](../unicorn_mode/README.md).

### Shared libraries

If the goal is to fuzz a dynamic library, then there are two options
available. For both, you need to write a small harness that loads and calls
the library. Then you fuzz this with either FRIDA mode or QEMU mode and either
use `AFL_INST_LIBS=1` or `AFL_QEMU_INST_RANGES`/`AFL_FRIDA_INST_RANGES`.

Another, less precise option is to fuzz it with utils/afl_untracer/, using
afl-untracer.c as a template. It is slower than FRIDA mode.

For more information, see
[utils/afl_untracer/README.md](../utils/afl_untracer/README.md).

### Coresight

Coresight is ARM's answer to Intel's PT. With AFL++ v3.15, there is a
Coresight tracer implementation available in `coresight_mode/` which is
faster than QEMU, however, it cannot run in parallel.
Currently, only one
process can be traced; it is a work in progress.

For more information, see
[coresight_mode/README.md](../coresight_mode/README.md).

## Binary rewriters

An alternative solution is binary rewriters. They are faster than the
solutions native to AFL++ but do not always work.

### ZAFL

ZAFL is a static rewriting platform supporting x86-64 C/C++,
stripped/unstripped, and PIE/non-PIE binaries. Beyond conventional
instrumentation, ZAFL's API enables transformation passes (e.g., laf-Intel,
context sensitivity, InsTrim, etc.).

Its baseline instrumentation speed typically averages 90-95% of
afl-clang-fast's.

[https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)

### RetroWrite

RetroWrite is a static binary rewriter that can be combined with AFL++. If
you have an x86_64 or arm64 binary that does not contain C++ exceptions and -
if x86_64 - still has its symbols and was compiled with position-independent
code (PIC/PIE), then the RetroWrite solution might be for you.
It disassembles the binary into ASM files which can then be instrumented with
afl-gcc.

Binaries that are statically instrumented for fuzzing using RetroWrite are
close in performance to compiler-instrumented binaries and outperform the
QEMU-based instrumentation.

[https://github.com/HexHive/retrowrite](https://github.com/HexHive/retrowrite)

### Dyninst

Dyninst is a binary instrumentation framework similar to Pintool and
DynamoRIO. However, whereas Pintool and DynamoRIO work at runtime, Dyninst
instruments the target at load time and then lets it run - or saves the
binary with the changes. This is great for some things, e.g., fuzzing, and
not so effective for others, e.g., malware analysis.

So, what you can do with Dyninst is take every basic block, put AFL++'s
instrumentation code in there, and then save the binary.
Afterwards, just
fuzz the newly saved target binary with afl-fuzz. Sounds great? It is. The
catch, though: it is a non-trivial problem to insert instructions that change
addresses in the process space in such a way that everything still works
afterwards. Hence, more often than not, binaries crash when they are run.

The speed decrease is about 15-35%, depending on the optimization options
used with afl-dyninst.

[https://github.com/vanhauser-thc/afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst)

### McSema

Theoretically, you can also lift the binary to LLVM IR with McSema and then
use llvm_mode to instrument it. Good luck with that.

[https://github.com/lifting-bits/mcsema](https://github.com/lifting-bits/mcsema)

## Binary tracers

### Pintool & DynamoRIO

Pintool and DynamoRIO are dynamic instrumentation engines. They can be used
for getting basic block information at runtime. Pintool is only available for
Intel x32/x64 on Linux, Mac OS, and Windows, whereas DynamoRIO is additionally
available for ARM and AARCH64. DynamoRIO is also 10x faster than Pintool.

The big issue with DynamoRIO (and therefore Pintool, too) is speed. DynamoRIO
has a speed decrease of 98-99%, and Pintool has a speed decrease of 99.5%.

Hence, DynamoRIO is the option to go for if everything else fails, and
Pintool only if DynamoRIO fails, too.

DynamoRIO solutions:

* [https://github.com/vanhauser-thc/afl-dynamorio](https://github.com/vanhauser-thc/afl-dynamorio)
* [https://github.com/mxmssh/drAFL](https://github.com/mxmssh/drAFL)
* [https://github.com/googleprojectzero/winafl/](https://github.com/googleprojectzero/winafl/)
  <= very good, but Windows only

Pintool solutions:

* [https://github.com/vanhauser-thc/afl-pin](https://github.com/vanhauser-thc/afl-pin)
* [https://github.com/mothran/aflpin](https://github.com/mothran/aflpin)
* [https://github.com/spinpx/afl_pin_mode](https://github.com/spinpx/afl_pin_mode)
  <= only an old Pintool version is supported

### Intel PT

If you have a newer Intel CPU, you can make use of Intel's Processor Trace.
The big issue with Intel's PT is the small buffer size and the complex
encoding of the debug information collected through PT. This makes the
decoding very CPU intensive and hence slow. As a result, the overall speed
decrease is about 70-90% (depending on the implementation and other factors).

There are two AFL Intel PT implementations:

1. [https://github.com/junxzm1990/afl-pt](https://github.com/junxzm1990/afl-pt)
   => This needs Ubuntu 14.04.05 without any updates and the 4.4 kernel.

2. [https://github.com/hunter-ht-2018/ptfuzzer](https://github.com/hunter-ht-2018/ptfuzzer)
   => This needs a 4.14 or 4.15 kernel. The "nopti" kernel boot option must
   be used. This one is faster than the other.

Note that there is also honggfuzz:
[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz).
But its IPT performance is just 6%!

## Non-AFL++ solutions

There are many binary-only fuzzing frameworks. Some are great for CTFs but
don't work with large binaries, others are very slow but have good path
discovery, and some are very hard to set up...

* Jackalope:
  [https://github.com/googleprojectzero/Jackalope](https://github.com/googleprojectzero/Jackalope)
* Manticore:
  [https://github.com/trailofbits/manticore](https://github.com/trailofbits/manticore)
* QSYM:
  [https://github.com/sslab-gatech/qsym](https://github.com/sslab-gatech/qsym)
* S2E: [https://github.com/S2E](https://github.com/S2E)
* TinyInst:
  [https://github.com/googleprojectzero/TinyInst](https://github.com/googleprojectzero/TinyInst)
* ... please send us any good ones that are missing

## Closing words

That's it! News, corrections, updates? Send an email to [email protected].