# WebM Parser {#mainpage}

# Introduction

This WebM parser is a C++11-based parser that aims to be a safe and complete
parser for WebM. It supports all WebM elements (from the old deprecated ones to
the newest ones like `Colour`), including recursive elements like `ChapterAtom`
and `SimpleTag`. It supports incremental parsing; parsing may be stopped at any
point and resumed later as needed. It also supports starting at an arbitrary
WebM element, so parsing need not start from the beginning of the file.

The parser (`WebmParser`) works by being fed input data from a data source (an
instance of `Reader`) that represents a WebM file. The parser will parse the
WebM data into various data structures that represent the encoded WebM elements,
and then call corresponding `Callback` event methods as the data structures are
parsed.

# Building

CMake support has been added to the root libwebm `CMakeLists.txt` file. Simply
enable the `ENABLE_WEBM_PARSER` feature if using the interactive CMake builder,
or alternatively pass the `-DENABLE_WEBM_PARSER:BOOL=ON` flag from the command
line. By default, this parser is not enabled when building libwebm, so you must
explicitly enable it.

Alternatively, the following illustrates the minimal commands necessary to
compile the code into a static library without CMake:

```.sh
c++ -Iinclude -I. -std=c++11 -c src/*.cc
ar rcs libwebm.a *.o
```

# Using the parser

There are 3 basic components in the parser that are used: `Reader`, `Callback`,
and `WebmParser`.

## `Reader`

The `Reader` interface acts as a data source for the parser. You may subclass it
and implement your own data source if you wish. Alternatively, use the
`FileReader`, `IstreamReader`, or `BufferReader` if you wish to read from a
`FILE*`, `std::istream`, or `std::vector<std::uint8_t>`, respectively.

The parser supports `Reader` implementations that do short reads. If
`Reader::Skip()` or `Reader::Read()` do a partial read (returning
`Status::kOkPartial`), the parser will call them again in an attempt to read
more data. If no data is available, the `Reader` may return some other status
(like `Status::kWouldBlock`) to indicate that no data is available. In this
situation, the parser will stop parsing and return the status it received.
Parsing may be resumed later when more data is available.
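
The retry behavior described above can be sketched without the library. The
block below is only an illustration under stated assumptions: `DemoStatus`,
`ChunkedSource`, and `ReadFully` are hypothetical stand-ins written for this
example (they are not the real `webm::Status` or `webm::Reader`, and the
`Read()` signature is simplified). It shows a caller that keeps reading after a
partial read and hands any other status back so the work can be resumed later.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for webm::Status; only the codes named in the text are modeled,
// and the enumerator values are arbitrary.
enum class DemoStatus { kOkCompleted, kOkPartial, kWouldBlock, kEndOfFile };

// Hypothetical source that returns at most `chunk` bytes per call and reports
// kWouldBlock once the first `available` bytes have been consumed.
class ChunkedSource {
 public:
  ChunkedSource(std::vector<std::uint8_t> data, std::size_t chunk,
                std::size_t available)
      : data_(std::move(data)), chunk_(chunk), available_(available) {
    assert(chunk_ > 0);
  }

  DemoStatus Read(std::size_t num_to_read, std::uint8_t* buffer,
                  std::size_t* num_actually_read) {
    assert(num_to_read > 0 && buffer != nullptr &&
           num_actually_read != nullptr);
    *num_actually_read = 0;
    if (position_ >= data_.size()) return DemoStatus::kEndOfFile;
    if (position_ >= available_) return DemoStatus::kWouldBlock;
    const std::size_t limit = std::min(available_, data_.size());
    const std::size_t count =
        std::min({num_to_read, chunk_, limit - position_});
    std::copy_n(data_.begin() + position_, count, buffer);
    position_ += count;
    *num_actually_read = count;
    return count == num_to_read ? DemoStatus::kOkCompleted
                                : DemoStatus::kOkPartial;
  }

  // Simulates more input arriving (for example, from a network socket).
  void MakeAvailable(std::size_t available) { available_ = available; }

 private:
  std::vector<std::uint8_t> data_;
  std::size_t chunk_;
  std::size_t available_;
  std::size_t position_ = 0;
};

// Keeps calling Read() after partial reads and returns any other status to
// the caller. `*filled` persists across calls, so a read that stopped with
// kWouldBlock can be resumed where it left off.
DemoStatus ReadFully(ChunkedSource* source, std::vector<std::uint8_t>* out,
                     std::size_t* filled) {
  while (*filled < out->size()) {
    std::size_t got = 0;
    const DemoStatus status =
        source->Read(out->size() - *filled, out->data() + *filled, &got);
    *filled += got;
    if (status != DemoStatus::kOkCompleted &&
        status != DemoStatus::kOkPartial) {
      return status;
    }
  }
  return DemoStatus::kOkCompleted;
}
```

In this sketch the progress counter lives with the caller, which is what makes
resuming possible: calling `ReadFully()` again after `MakeAvailable()` simply
continues from `*filled`.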

When the `Reader` has reached the end of the WebM document and no more data is
available, it should return `Status::kEndOfFile`. This will cause parsing to
stop. If the file ends at a valid location (that is, there aren't any elements
that have specified a size that indicates the file ended prematurely), the
parser will translate `Status::kEndOfFile` into `Status::kOkCompleted` and
return it. If the file ends prematurely, the parser will return
`Status::kEndOfFile` to indicate that.

Note that if the WebM file contains elements that have an unknown size (or a
seek has been performed and the parser doesn't know the size of the root
element(s)), and the parser is parsing them and hits end-of-file, the parser may
still call `Reader::Read()`/`Reader::Skip()` multiple times (even though they've
already reported `Status::kEndOfFile`) as nested parsers terminate parsing.
Because of this, `Reader::Read()`/`Reader::Skip()` implementations should be
able to handle being called multiple times after the file's end has been
reached, and they should consistently return `Status::kEndOfFile`.
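
One way to satisfy the "consistently return `Status::kEndOfFile`" requirement
is to derive the end-of-file answer purely from the current position, so that
repeated calls cannot disagree. The following is a minimal sketch of that idea,
not the library's implementation: `SketchStatus` and `EofStableReader` are
hypothetical names invented for this example, and the method signatures are
simplified compared to the real `Reader` interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for webm::Status with arbitrary enumerator values.
enum class SketchStatus { kOkCompleted, kOkPartial, kEndOfFile };

// Hypothetical in-memory reader whose Read()/Skip() may be called any number
// of times after the data has been exhausted.
class EofStableReader {
 public:
  explicit EofStableReader(std::vector<std::uint8_t> data)
      : data_(std::move(data)) {}

  SketchStatus Read(std::size_t num_to_read, std::uint8_t* buffer,
                    std::size_t* num_actually_read) {
    assert(num_to_read > 0 && buffer != nullptr &&
           num_actually_read != nullptr);
    const std::size_t count = std::min(num_to_read, data_.size() - position_);
    std::copy_n(data_.begin() + position_, count, buffer);
    return Finish(num_to_read, count, num_actually_read);
  }

  SketchStatus Skip(std::size_t num_to_skip,
                    std::size_t* num_actually_skipped) {
    assert(num_to_skip > 0 && num_actually_skipped != nullptr);
    const std::size_t count = std::min(num_to_skip, data_.size() - position_);
    return Finish(num_to_skip, count, num_actually_skipped);
  }

 private:
  // Shared bookkeeping. Because `requested` is nonzero, `count` is zero only
  // when position_ has reached the end of the data, so every call made after
  // that point reports kEndOfFile and leaves the reader unchanged.
  SketchStatus Finish(std::size_t requested, std::size_t count,
                      std::size_t* actual) {
    position_ += count;
    *actual = count;
    if (count == 0) return SketchStatus::kEndOfFile;
    return count == requested ? SketchStatus::kOkCompleted
                              : SketchStatus::kOkPartial;
  }

  std::vector<std::uint8_t> data_;
  std::size_t position_ = 0;
};
```

A reader built this way holds no separate "already reported end-of-file" flag,
so there is no extra state that could fall out of sync with the position.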

The three provided readers (`FileReader`, `IstreamReader`, and `BufferReader`)
are blocking implementations (they won't return `Status::kWouldBlock`), so if
you're using them the parser will run until it entirely consumes all their data
(unless you ask the parser to stop via `Callback`; see the next section).

## `Callback`

As the parser progresses through the file, it builds objects (see
`webm/dom_types.h`) that represent parsed data structures. The parser then
notifies the `Callback` implementation as objects complete parsing. For some
data structures (like frames or Void elements), the parser notifies the
`Callback` and requests it to consume the data directly from the `Reader`. This
is done for structures that can be large or frequent binary blobs, so that you
can read the data directly into the object or type of your choice rather than
having it read into a `std::vector<std::uint8_t>` that you would then have to
copy into a different object.

The parser was designed to parse the data into objects that are small enough
that the `Callback` can be quickly and frequently notified as soon as the object
is ready, but large enough that the objects received by the `Callback` are still
useful. Having `Callback` events for every tiny integer/float/string/etc.
element would require too much assembly and work to be useful to most users, and
parsing the file into a single DOM tree (or a small handful of large conglomerate
structures) would unnecessarily delay video playback or consume too much memory
on smaller devices.

The parser may call the following methods nearly anywhere in the file:

- `Callback::OnElementBegin()`: This is called for every element that the
  parser encounters. This is primarily useful if you want to skip some
  elements or build a map of every element in the file.
- `Callback::OnUnknownElement()`: This is called when an element is either not
  a valid/recognized WebM element, or it is a WebM element but is improperly
  nested (e.g. an EBMLVersion element inside of a Segment element). The parser
  doesn't know how to handle the element; it could just skip it but instead
  defers to the `Callback` to decide how it should be handled. The default
  implementation just skips the element.
- `Callback::OnVoid()`: Void elements can appear anywhere in any master
  element. This method will be called to handle the Void element.

The parser may call the following methods in the proper nesting order, as shown
in the list. A `*Begin()` method will always be matched up with its
corresponding `*End()` method (unless a seek has been performed). The parser
will only call the methods in the proper nesting order as specified in the WebM
DOM. For example, `Callback::OnEbml()` will never be called in between
`Callback::OnSegmentBegin()`/`Callback::OnSegmentEnd()` (since the EBML element
is not a child of the Segment element), and `Callback::OnTrackEntry()` will only
ever be called in between
`Callback::OnSegmentBegin()`/`Callback::OnSegmentEnd()` (since the TrackEntry
element is a (grand-)child of the Segment element and must be contained by a
Segment element). `Callback::OnFrame()` is listed twice because it will be
called to handle frames contained in both SimpleBlock and Block elements.

- `Callback::OnEbml()`
- `Callback::OnSegmentBegin()`
    - `Callback::OnSeek()`
    - `Callback::OnInfo()`
    - `Callback::OnClusterBegin()`
        - `Callback::OnSimpleBlockBegin()`
            - `Callback::OnFrame()`
        - `Callback::OnSimpleBlockEnd()`
        - `Callback::OnBlockGroupBegin()`
            - `Callback::OnBlockBegin()`
                - `Callback::OnFrame()`
            - `Callback::OnBlockEnd()`
        - `Callback::OnBlockGroupEnd()`
    - `Callback::OnClusterEnd()`
    - `Callback::OnTrackEntry()`
    - `Callback::OnCuePoint()`
    - `Callback::OnEditionEntry()`
    - `Callback::OnTag()`
- `Callback::OnSegmentEnd()`

Only `Callback::OnFrame()` (and no other `Callback` methods) will be called in
between `Callback::OnSimpleBlockBegin()`/`Callback::OnSimpleBlockEnd()` or
`Callback::OnBlockBegin()`/`Callback::OnBlockEnd()`, since the SimpleBlock and
Block elements are not master elements and only contain frames.

Note that seeking into the middle of the file may cause the parser to skip some
`*Begin()` methods.
For example, if a seek is performed to a SimpleBlock
element, `Callback::OnSegmentBegin()` and `Callback::OnClusterBegin()` will not
be called. In this situation, the full sequence of callback events would be
(assuming the file ended after the SimpleBlock):
`Callback::OnSimpleBlockBegin()`, `Callback::OnFrame()` (for every frame in the
SimpleBlock), `Callback::OnSimpleBlockEnd()`, `Callback::OnClusterEnd()`, and
`Callback::OnSegmentEnd()`. Since the Cluster and Segment elements were skipped,
the `Cluster` DOM object may have some members marked as absent, and the
`*End()` events for the Cluster and Segment elements will have metadata with
unknown header position, header length, and body size (see `kUnknownHeaderSize`,
`kUnknownElementSize`, and `kUnknownElementPosition`).

When a `Callback` method has completed, it should return `Status::kOkCompleted`
to allow parsing to continue. If you would like parsing to stop, return any
other status code (except `Status::kEndOfFile`, since that's treated somewhat
specially and is intended for `Reader`s to use), which the parser will return.
If you return a non-parsing-error status code (e.g. `Status::kOkPartial`,
`Status::kWouldBlock`, etc. or your own status code with a value > 0), parsing
may be resumed again.
When parsing is resumed, the parser will call the same
callback method again (and once again, you may return `Status::kOkCompleted` to
let parsing continue or some other value to stop parsing).

You may subclass `Callback` and override the methods for which you are
interested in receiving events. By default, methods taking an `Action`
parameter will set it to `Action::kRead` so the entire file is parsed. The
`Callback::OnFrame()` method will just skip over the frame bytes by default.

## `WebmParser`

The actual parsing work is done with `WebmParser`. Simply construct a
`WebmParser` and call `WebmParser::Feed()` (providing it a `Callback` and
`Reader` instance) to parse a file. It will return `Status::kOkCompleted` when
the entire file has been successfully parsed. `WebmParser::Feed()` doesn't store
any internal references to the `Callback` or `Reader`.

If you wish to start parsing from the middle of a file, call
`WebmParser::DidSeek()` before calling `WebmParser::Feed()` to prepare the
parser to receive data starting at an arbitrary point in the file. When seeking,
you should seek to the beginning of a WebM element; seeking to a location that
is not the start of a WebM element (e.g. seeking to a frame, rather than its
containing SimpleBlock/Block element) will cause parsing to fail.
Calling
`WebmParser::DidSeek()` will reset the state of the parser and clear any
internal errors, so a `WebmParser` instance may be reused (even if it has
previously failed to parse a file).

## Building your program

The following small program completely parses a file from stdin:

```.cc
#include <cstdio>

#include <webm/callback.h>
#include <webm/file_reader.h>
#include <webm/webm_parser.h>

int main() {
  webm::Callback callback;
  webm::FileReader reader(std::freopen(nullptr, "rb", stdin));
  webm::WebmParser parser;
  parser.Feed(&callback, &reader);
}
```

It completely parses the input file, but we need to make a new class that
derives from `Callback` if we want to receive any parsing events.
So if we
change it to:

```.cc
#include <cstdint>
#include <cstdio>
#include <iomanip>
#include <iostream>

#include <webm/callback.h>
#include <webm/file_reader.h>
#include <webm/status.h>
#include <webm/webm_parser.h>

class MyCallback : public webm::Callback {
 public:
  webm::Status OnElementBegin(const webm::ElementMetadata& metadata,
                              webm::Action* action) override {
    std::cout << "Element ID = 0x"
              << std::hex << static_cast<std::uint32_t>(metadata.id);
    std::cout << std::dec;  // Reset to decimal mode.
    std::cout << " at position ";
    if (metadata.position == webm::kUnknownElementPosition) {
      // The position will only be unknown if we've done a seek. But since we
      // aren't seeking in this demo, this will never be the case. However, this
      // if-statement is included for completeness.
      std::cout << "<unknown>";
    } else {
      std::cout << metadata.position;
    }
    std::cout << " with header size ";
    if (metadata.header_size == webm::kUnknownHeaderSize) {
      // The header size will only be unknown if we've done a seek. But since we
      // aren't seeking in this demo, this will never be the case. However, this
      // if-statement is included for completeness.
      std::cout << "<unknown>";
    } else {
      std::cout << metadata.header_size;
    }
    std::cout << " and body size ";
    if (metadata.size == webm::kUnknownElementSize) {
      // WebM master elements may have an unknown size, though this is rare.
      std::cout << "<unknown>";
    } else {
      std::cout << metadata.size;
    }
    std::cout << '\n';

    *action = webm::Action::kRead;
    return webm::Status(webm::Status::kOkCompleted);
  }
};

int main() {
  MyCallback callback;
  webm::FileReader reader(std::freopen(nullptr, "rb", stdin));
  webm::WebmParser parser;
  webm::Status status = parser.Feed(&callback, &reader);
  if (status.completed_ok()) {
    std::cout << "Parsing successfully completed\n";
  } else {
    std::cout << "Parsing failed with status code: " << status.code << '\n';
  }
}
```

This will output information about every
element in the entire file: its ID,
position, header size, and body size. The status of the parse is also checked
and reported.

For a more complete example, see `demo/demo.cc`, which parses an entire file and
prints out all of its information. That example overrides every `Callback`
method to show exactly what information is available while parsing and how to
access it. The example is verbose, but that's primarily due to pretty-printing
and string formatting operations.

When compiling your program, add the `include` directory to your compiler's
header search paths and link to the compiled library. Be sure your compiler has
C++11 mode enabled (`-std=c++11` in clang++ or g++).

# Testing

Unit tests are located in the `tests` directory. Google Test and Google Mock are
used as testing frameworks. Building and running the tests will be supported in
the upcoming CMake scripts, but they can currently be built and run by manually
compiling them (and linking to Google Test and Google Mock).

# Fuzzing

The parser has been fuzzed with [AFL](http://lcamtuf.coredump.cx/afl/) and
[libFuzzer](http://llvm.org/docs/LibFuzzer.html).
If you wish to fuzz the parser
with AFL or libFuzzer but don't want to write an executable that exercises the
parsing API, you may use `fuzzing/webm_fuzzer.cc`.

When compiling for fuzzing, define the macro
`WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT` to be some integer in order to limit the
maximum size of ASCII/UTF-8/binary elements. It's too easy for the fuzzer to
generate elements that claim to have a ridiculously massive size, which will
cause allocations to fail or the program to allocate too much memory. AFL will
terminate the process if it allocates too much memory (by default, 50 MB), and
the [Address Sanitizer doesn't throw `std::bad_alloc` when an allocation fails](https://github.com/google/sanitizers/issues/295).
Defining
`WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT` to a low number (say, 1024) will cause the
ASCII/UTF-8/binary element parsers to return `Status::kNotEnoughMemory` if the
element's size exceeds `WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT`, which will avoid
false positives when fuzzing. The parser expects `std::string` and `std::vector`
to throw `std::bad_alloc` when an allocation fails, which doesn't necessarily
happen due to the fuzzers' limitations.

You may also define the macro `WEBM_FUZZER_SEEK_FIRST` to have
`fuzzing/webm_fuzzer.cc` call `WebmParser::DidSeek()` before doing any parsing.
This will test the seeking code paths.