# WebM Parser {#mainpage}

# Introduction

This WebM parser is a C++11-based parser that aims to be a safe and complete
parser for WebM. It supports all WebM elements (from the old deprecated ones to
the newest ones like `Colour`), including recursive elements like `ChapterAtom`
and `SimpleTag`. It supports incremental parsing; parsing may be stopped at any
point and resumed later as needed. It also supports starting at an arbitrary
WebM element, so parsing need not start from the beginning of the file.

The parser (`WebmParser`) works by being fed input data from a data source (an
instance of `Reader`) that represents a WebM file. The parser will parse the
WebM data into various data structures that represent the encoded WebM elements,
and then call corresponding `Callback` event methods as the data structures are
parsed.

# Building

CMake support has been added to the root libwebm `CMakeLists.txt` file. Simply
enable the `ENABLE_WEBM_PARSER` feature if using the interactive CMake builder,
or alternatively pass the `-DENABLE_WEBM_PARSER:BOOL=ON` flag from the command
line. By default, this parser is not enabled when building libwebm, so you must
explicitly enable it.

Alternatively, the following illustrates the minimal commands necessary to
compile the code into a static library without CMake:

```.sh
c++ -Iinclude -I. -std=c++11 -c src/*.cc
ar rcs libwebm.a *.o
```

# Using the parser

The parser has three basic components: `Reader`, `Callback`, and `WebmParser`.

## `Reader`

The `Reader` interface acts as a data source for the parser. You may subclass it
and implement your own data source. Alternatively, use `FileReader`,
`IstreamReader`, or `BufferReader` to read from a `FILE*`, `std::istream`, or
`std::vector<std::uint8_t>`, respectively.

The parser supports `Reader` implementations that do short reads. If
`Reader::Skip()` or `Reader::Read()` does a partial read (returning
`Status::kOkPartial`), the parser will call it again in an attempt to read more
data. If no data is currently available, the `Reader` may return some other
status (like `Status::kWouldBlock`) to say so. In this situation, the parser
will stop parsing and return the status it received. Parsing may be resumed
later when more data is available.
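
The sketch below shows one possible non-blocking data source that follows these rules. Bytes arrive over time through `Append()`. A short read reports `kOkPartial`. An empty buffer reports `kWouldBlock` until the stream is marked finished, and after that it reports `kEndOfFile` on every call.

Some parts are assumptions:

-   So that the snippet compiles without libwebm, it declares a simplified stand-in `Status` (only the codes discussed here, with illustrative values).
-   The `Read()`/`Skip()`/`Position()` signatures are modelled on `webm::Reader` as we understand it. With the real library you would derive from `webm::Reader` and return `webm::Status`.
-   `GrowingBufferReader`, `Append()`, and `MarkEndOfStream()` are hypothetical names.

```.cc
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for webm::Status (illustrative values, not libwebm's).
struct Status {
  enum Code : int { kOkCompleted = 0, kOkPartial, kWouldBlock, kEndOfFile };
  int code;
};

// Hypothetical non-blocking reader: bytes arrive over time (e.g. from a
// socket) via Append(), and Read()/Skip() hand out whatever is buffered.
class GrowingBufferReader {
 public:
  void Append(const std::vector<std::uint8_t>& bytes) {
    data_.insert(data_.end(), bytes.begin(), bytes.end());
  }

  // Call once the source is exhausted; afterwards an empty buffer means EOF.
  void MarkEndOfStream() { end_of_stream_ = true; }

  Status Read(std::size_t num_to_read, std::uint8_t* buffer,
              std::uint64_t* num_actually_read) {
    const std::size_t n = std::min(num_to_read, data_.size() - pos_);
    std::copy_n(data_.begin() + static_cast<std::ptrdiff_t>(pos_), n, buffer);
    pos_ += n;
    *num_actually_read = n;
    return Result(n, num_to_read);
  }

  Status Skip(std::uint64_t num_to_skip, std::uint64_t* num_actually_skipped) {
    const std::uint64_t available = data_.size() - pos_;
    const std::uint64_t n = std::min(num_to_skip, available);
    pos_ += static_cast<std::size_t>(n);
    *num_actually_skipped = n;
    return Result(n, num_to_skip);
  }

  std::uint64_t Position() const { return pos_; }

 private:
  // Maps "how much was transferred" onto the status codes the parser expects.
  Status Result(std::uint64_t got, std::uint64_t wanted) const {
    if (got == wanted) return {Status::kOkCompleted};
    if (got > 0) return {Status::kOkPartial};  // Short read; parser retries.
    // Nothing buffered: either wait for more data or report EOF. Once the
    // stream has ended, this keeps returning kEndOfFile on every call.
    return {end_of_stream_ ? Status::kEndOfFile : Status::kWouldBlock};
  }

  std::vector<std::uint8_t> data_;
  std::size_t pos_ = 0;
  bool end_of_stream_ = false;
};
```

A caller would keep appending incoming data to such a reader and call `WebmParser::Feed()` again whenever the previous call returned `Status::kWouldBlock`.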

When the `Reader` has reached the end of the WebM document and no more data is
available, it should return `Status::kEndOfFile`. This will cause parsing to
stop. If the file ends at a valid location (that is, no element has a declared
size indicating that the file ended prematurely), the parser will translate
`Status::kEndOfFile` into `Status::kOkCompleted` and return it. If the file ends
prematurely, the parser will return `Status::kEndOfFile` to indicate that.

Note that if the WebM file contains elements that have an unknown size (or a
seek has been performed and the parser doesn't know the size of the root
element(s)), and the parser hits end-of-file while parsing them, the parser may
still call `Reader::Read()`/`Reader::Skip()` multiple times (even though they've
already reported `Status::kEndOfFile`) as nested parsers terminate parsing.
Because of this, `Reader::Read()`/`Reader::Skip()` implementations should be
able to handle being called multiple times after the file's end has been
reached, and they should consistently return `Status::kEndOfFile`.

The three provided readers (`FileReader`, `IstreamReader`, and `BufferReader`)
are blocking implementations (they won't return `Status::kWouldBlock`), so if
you're using them the parser will run until it entirely consumes all their data
(unless, of course, you request the parser to stop via `Callback`; see the next
section).

## `Callback`

As the parser progresses through the file, it builds objects (see
`webm/dom_types.h`) that represent parsed data structures. The parser then
notifies the `Callback` implementation as objects complete parsing. For some
data structures (like frames or Void elements), the parser notifies the
`Callback` and requests it to consume the data directly from the `Reader`. This
is done for structures that can be large or frequent binary blobs, so that you
can read the data directly into the object/type of your choice rather than
having the parser read it into a `std::vector<std::uint8_t>` that you would then
have to copy into a different object if you wanted to work with something other
than `std::vector<std::uint8_t>`.
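
As a rough sketch of consuming frame data directly from the reader, the helper below copies `*bytes_remaining` bytes into a caller-owned buffer. It decrements the counter as it goes. It returns early on any status other than a full or partial read, so a later call with the same counter resumes the copy.

It assumes the frame callback receives a `Reader*` and a `std::uint64_t* bytes_remaining` counter, as we understand `Callback::OnFrame()` to work; check `webm/callback.h` for the exact contract. To stay self-contained, it uses stand-in `Status`/`Reader` types. `ReadFrameInto` and `ChunkedReader` are hypothetical names.

```.cc
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-ins for webm::Status and webm::Reader (illustrative only).
struct Status {
  enum Code : int { kOkCompleted = 0, kOkPartial, kWouldBlock, kEndOfFile };
  int code;
};

class Reader {
 public:
  virtual ~Reader() = default;
  virtual Status Read(std::size_t num_to_read, std::uint8_t* buffer,
                      std::uint64_t* num_actually_read) = 0;
};

// Hypothetical reader that never returns more than `chunk` bytes per call
// (exercising the short-read path) and reports kEndOfFile when drained.
class ChunkedReader : public Reader {
 public:
  ChunkedReader(std::vector<std::uint8_t> data, std::size_t chunk)
      : data_(std::move(data)), chunk_(chunk) {}

  Status Read(std::size_t num_to_read, std::uint8_t* buffer,
              std::uint64_t* num_actually_read) override {
    const std::size_t n = std::min({num_to_read, chunk_, data_.size() - pos_});
    std::copy_n(data_.begin() + static_cast<std::ptrdiff_t>(pos_), n, buffer);
    pos_ += n;
    *num_actually_read = n;
    if (n == num_to_read) return {Status::kOkCompleted};
    return {n > 0 ? Status::kOkPartial : Status::kEndOfFile};
  }

 private:
  std::vector<std::uint8_t> data_;
  std::size_t chunk_;
  std::size_t pos_ = 0;
};

// Appends the frame's remaining bytes to `out`. On kOkCompleted,
// *bytes_remaining is 0; on any other status it holds what is still unread,
// so calling again later resumes the copy.
Status ReadFrameInto(Reader* reader, std::uint64_t* bytes_remaining,
                     std::vector<std::uint8_t>* out) {
  std::uint8_t chunk[4096];
  while (*bytes_remaining > 0) {
    const std::size_t want = static_cast<std::size_t>(
        std::min<std::uint64_t>(*bytes_remaining, sizeof(chunk)));
    std::uint64_t got = 0;
    const Status status = reader->Read(want, chunk, &got);
    out->insert(out->end(), chunk, chunk + got);
    *bytes_remaining -= got;
    if (status.code != Status::kOkCompleted &&
        status.code != Status::kOkPartial) {
      return status;  // e.g. kWouldBlock or kEndOfFile: stop for now.
    }
  }
  return {Status::kOkCompleted};
}
```

Because the helper only ever appends to `out`, the destination could just as well be a ring buffer or a decoder's input queue, which is the flexibility this design is meant to allow.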

The parser was designed to parse the data into objects that are small enough
that the `Callback` can be quickly and frequently notified as soon as the object
is ready, but large enough that the objects received by the `Callback` are still
useful. Having `Callback` events for every tiny integer/float/string/etc.
element would require too much assembly and work to be useful to most users, and
parsing the file into a single DOM tree (or a small handful of large conglomerate
structures) would unnecessarily delay video playback or consume too much memory
on smaller devices.

The parser may call the following methods at nearly any point in the file:

-   `Callback::OnElementBegin()`: This is called for every element that the
    parser encounters. This is primarily useful if you want to skip some
    elements or build a map of every element in the file.
-   `Callback::OnUnknownElement()`: This is called when an element is either not
    a valid/recognized WebM element, or it is a WebM element but is improperly
    nested (e.g. an EBMLVersion element inside of a Segment element). The parser
    doesn't know how to handle the element; it could just skip it but instead
    defers to the `Callback` to decide how it should be handled. The default
    implementation just skips the element.
-   `Callback::OnVoid()`: Void elements can appear anywhere in any master
    element. This method will be called to handle the Void element.
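
An `OnElementBegin()` override can, for instance, record an element map and also opt out of parsing some subtrees by setting the `Action`.

The sketch below models that decision with simplified stand-ins for `ElementMetadata`, `Action`, and `Status`. Its field names follow the metadata members used in the example program in this README, but a plain `std::uint32_t` replaces the real `webm::Id` type. The Cluster ID `0x1F43B675` is the standard Matroska/WebM value. `ElementMapper` is a hypothetical name; in real code this logic would live in a `webm::Callback` subclass.

```.cc
#include <cstdint>
#include <vector>

// Simplified stand-ins for the libwebm types (illustrative only).
struct Status {
  enum Code : int { kOkCompleted = 0 };
  int code;
};
enum class Action { kRead, kSkip };
struct ElementMetadata {
  std::uint32_t id;
  std::uint32_t header_size;
  std::uint64_t size;
  std::uint64_t position;
};

// Hypothetical OnElementBegin() logic: remember every element seen, but skip
// the contents of Cluster elements (e.g. when only headers are of interest).
class ElementMapper {
 public:
  static constexpr std::uint32_t kClusterId = 0x1F43B675;

  Status OnElementBegin(const ElementMetadata& metadata, Action* action) {
    map_.push_back(metadata);
    *action = (metadata.id == kClusterId) ? Action::kSkip : Action::kRead;
    return {Status::kOkCompleted};
  }

  const std::vector<ElementMetadata>& map() const { return map_; }

 private:
  std::vector<ElementMetadata> map_;
};
```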

The parser may call the following methods, always in the nesting order shown in
the list (which follows the WebM DOM). A `*Begin()` method will always be
matched up with its corresponding `*End()` method (unless a seek has been
performed). For example, `Callback::OnEbml()` will never be called in between
`Callback::OnSegmentBegin()`/`Callback::OnSegmentEnd()` (since the EBML element
is not a child of the Segment element), and `Callback::OnTrackEntry()` will only
ever be called in between
`Callback::OnSegmentBegin()`/`Callback::OnSegmentEnd()` (since the TrackEntry
element is a (grand-)child of the Segment element and must be contained by a
Segment element). `Callback::OnFrame()` is listed twice because it is called to
handle frames contained in both SimpleBlock and Block elements.

-   `Callback::OnEbml()`
-   `Callback::OnSegmentBegin()`
    -   `Callback::OnSeek()`
    -   `Callback::OnInfo()`
    -   `Callback::OnClusterBegin()`
        -   `Callback::OnSimpleBlockBegin()`
            -   `Callback::OnFrame()`
        -   `Callback::OnSimpleBlockEnd()`
        -   `Callback::OnBlockGroupBegin()`
            -   `Callback::OnBlockBegin()`
                -   `Callback::OnFrame()`
            -   `Callback::OnBlockEnd()`
        -   `Callback::OnBlockGroupEnd()`
    -   `Callback::OnClusterEnd()`
    -   `Callback::OnTrackEntry()`
    -   `Callback::OnCuePoint()`
    -   `Callback::OnEditionEntry()`
    -   `Callback::OnTag()`
-   `Callback::OnSegmentEnd()`

Only `Callback::OnFrame()` (and no other `Callback` methods) will be called in
between `Callback::OnSimpleBlockBegin()`/`Callback::OnSimpleBlockEnd()` or
`Callback::OnBlockBegin()`/`Callback::OnBlockEnd()`, since the SimpleBlock and
Block elements are not master elements and only contain frames.

Note that seeking into the middle of the file may cause the parser to skip some
`*Begin()` methods. For example, if a seek is performed to a SimpleBlock
element, `Callback::OnSegmentBegin()` and `Callback::OnClusterBegin()` will not
be called. In this situation, the full sequence of callback events would be
(assuming the file ended after the SimpleBlock):
`Callback::OnSimpleBlockBegin()`, `Callback::OnFrame()` (for every frame in the
SimpleBlock), `Callback::OnSimpleBlockEnd()`, `Callback::OnClusterEnd()`, and
`Callback::OnSegmentEnd()`. Since the Cluster and Segment element headers were
skipped, the `Cluster` DOM object may have some members marked as absent, and
the `*End()` events for the Cluster and Segment elements will have metadata with
unknown position, header size, and body size (see `kUnknownElementPosition`,
`kUnknownHeaderSize`, and `kUnknownElementSize`).
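
A seek can produce `*End()` events without their `*Begin()` counterparts, so state kept in a `Callback` should not assume the two always balance.

The sketch below is a hypothetical cluster counter that tolerates an unmatched end event and notes when the metadata position is unknown. It uses stand-in types with simplified method signatures; the real `OnClusterBegin()`/`OnClusterEnd()` take more parameters. It assumes `kUnknownElementPosition` is the maximum `std::uint64_t` value, which should be confirmed against libwebm's headers.

```.cc
#include <cstdint>
#include <limits>

// Stand-in: assumed to mirror libwebm's sentinel; verify before relying on it.
constexpr std::uint64_t kUnknownElementPosition =
    std::numeric_limits<std::uint64_t>::max();

// Simplified stand-in for webm::ElementMetadata (only the field used here).
struct ElementMetadata {
  std::uint64_t position;
};

// Hypothetical bookkeeping for cluster begin/end events. After a seek, an
// end event may arrive with no matching begin, so the open count never goes
// negative and such clusters are counted separately.
class ClusterTracker {
 public:
  void OnClusterBegin(const ElementMetadata&) { ++open_; }

  void OnClusterEnd(const ElementMetadata& metadata) {
    if (open_ > 0) {
      --open_;
      ++complete_;
    } else {
      // Parsing started inside this cluster; its header was never parsed.
      ++entered_mid_cluster_;
      position_unknown_ = metadata.position == kUnknownElementPosition;
    }
  }

  int open() const { return open_; }
  int complete() const { return complete_; }
  int entered_mid_cluster() const { return entered_mid_cluster_; }
  bool position_unknown() const { return position_unknown_; }

 private:
  int open_ = 0;
  int complete_ = 0;
  int entered_mid_cluster_ = 0;
  bool position_unknown_ = false;
};
```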

When a `Callback` method has completed, it should return `Status::kOkCompleted`
to allow parsing to continue. If you would like parsing to stop, return any
other status code (except `Status::kEndOfFile`, since that's treated somewhat
specially and is intended for `Reader`s to use), which the parser will return.
If you return a non-parsing-error status code (e.g. `Status::kOkPartial`,
`Status::kWouldBlock`, etc. or your own status code with a value > 0), parsing
may be resumed later. When parsing is resumed, the parser will call the same
callback method again (and once again, you may return `Status::kOkCompleted` to
let parsing continue or some other value to stop parsing).
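
One way to act on these rules is to classify whatever status comes back. The sketch below does this and also shows a callback method that pauses parsing with an application-defined code until it is ready.

Some parts are assumptions:

-   It uses a stand-in `Status` whose numeric values are illustrative. In real code, compare against the `webm::Status` constants rather than literal values.
-   `kPausedByApp` is a hypothetical application code (any value > 0 should do).
-   `Classify`, `Outcome`, `PausingCallback`, and `OnEvent` are hypothetical names.

```.cc
#include <cstdint>

// Stand-in for webm::Status; values are illustrative, not libwebm's.
struct Status {
  enum Code : std::int32_t {
    kNotEnoughMemory = -1,  // Parsing errors are negative.
    kOkCompleted = 0,
    kOkPartial = 1,
    kWouldBlock = 2,
    kEndOfFile = 3,
  };
  std::int32_t code;
};

// Hypothetical application-defined code used by a Callback to pause parsing.
constexpr std::int32_t kPausedByApp = 100;

enum class Outcome { kDone, kTruncated, kResumable, kFailed };

// Interprets a status returned by WebmParser::Feed(), per the rules above.
Outcome Classify(Status status) {
  if (status.code == Status::kOkCompleted) return Outcome::kDone;
  if (status.code == Status::kEndOfFile) return Outcome::kTruncated;
  if (status.code > 0) return Outcome::kResumable;  // kWouldBlock, custom, ...
  return Outcome::kFailed;                          // Negative: parse error.
}

// Hypothetical stand-in for any Callback method that stops parsing until the
// application is ready; on resume, the parser calls the same method again.
struct PausingCallback {
  bool ready = false;
  Status OnEvent() {
    if (!ready) return {kPausedByApp};
    return {Status::kOkCompleted};
  }
};
```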

You may subclass the `Callback` class and override the methods for which you
want to receive events. By default, methods taking an `Action` parameter will
set it to `Action::kRead` so the entire file is parsed. The
`Callback::OnFrame()` method will just skip over the frame bytes by default.

## `WebmParser`

The actual parsing work is done with `WebmParser`. Simply construct a
`WebmParser` and call `WebmParser::Feed()` (providing it a `Callback` and
`Reader` instance) to parse a file. It will return `Status::kOkCompleted` when
the entire file has been successfully parsed. `WebmParser::Feed()` doesn't store
any internal references to the `Callback` or `Reader`.
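
With a non-blocking `Reader`, `WebmParser::Feed()` can return early with `Status::kWouldBlock`. A caller can then wait for more data and simply call it again.

The generic loop below sketches that pattern. `feed` and `wait_for_data` are hypothetical callables standing in for `parser.Feed(&callback, &reader)` and your own I/O wait. `Status` is a simplified stand-in with illustrative values.

```.cc
#include <cstdint>
#include <functional>

// Stand-in for webm::Status; values are illustrative, not libwebm's.
struct Status {
  enum Code : std::int32_t { kOkCompleted = 0, kOkPartial, kWouldBlock };
  std::int32_t code;
};

// Calls `feed` until it reports something other than kWouldBlock, invoking
// `wait_for_data` in between (e.g. poll a socket and append to the reader).
// Returns the final status and stores the number of feed calls in *calls.
Status FeedUntilSettled(const std::function<Status()>& feed,
                        const std::function<void()>& wait_for_data,
                        int* calls) {
  *calls = 0;
  for (;;) {
    const Status status = feed();
    ++*calls;
    if (status.code != Status::kWouldBlock) return status;
    wait_for_data();
  }
}
```

The final status still needs checking: it may be `kOkCompleted`, an application-defined pause code, or a parsing error.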

If you wish to start parsing from the middle of a file, call
`WebmParser::DidSeek()` before calling `WebmParser::Feed()` to prepare the
parser to receive data starting at an arbitrary point in the file. When seeking,
you should seek to the beginning of a WebM element; seeking to a location that
is not the start of a WebM element (e.g. seeking to a frame, rather than its
containing SimpleBlock/Block element) will cause parsing to fail. Calling
`WebmParser::DidSeek()` will reset the state of the parser and clear any
internal errors, so a `WebmParser` instance may be reused (even if it has
previously failed to parse a file).

## Building your program

The following small program completely parses a file read from stdin:

```.cc
#include <cstdio>

#include <webm/callback.h>
#include <webm/file_reader.h>
#include <webm/webm_parser.h>

int main() {
  webm::Callback callback;
  // Reopen stdin in binary mode and read from the resulting FILE*.
  webm::FileReader reader(std::freopen(nullptr, "rb", stdin));
  webm::WebmParser parser;
  parser.Feed(&callback, &reader);
}
```

It completely parses the input file, but we need to make a new class that
derives from `Callback` if we want to receive any parsing events. So if we
change it to:

```.cc
#include <cstdint>
#include <cstdio>
#include <iomanip>
#include <iostream>

#include <webm/callback.h>
#include <webm/file_reader.h>
#include <webm/status.h>
#include <webm/webm_parser.h>

class MyCallback : public webm::Callback {
 public:
  webm::Status OnElementBegin(const webm::ElementMetadata& metadata,
                              webm::Action* action) override {
    std::cout << "Element ID = 0x"
              << std::hex << static_cast<std::uint32_t>(metadata.id);
    std::cout << std::dec;  // Reset to decimal mode.
    std::cout << " at position ";
    if (metadata.position == webm::kUnknownElementPosition) {
      // The position will only be unknown if we've done a seek. But since we
      // aren't seeking in this demo, this will never be the case. However, this
      // if-statement is included for completeness.
      std::cout << "<unknown>";
    } else {
      std::cout << metadata.position;
    }
    std::cout << " with header size ";
    if (metadata.header_size == webm::kUnknownHeaderSize) {
      // The header size will only be unknown if we've done a seek. But since we
      // aren't seeking in this demo, this will never be the case. However, this
      // if-statement is included for completeness.
      std::cout << "<unknown>";
    } else {
      std::cout << metadata.header_size;
    }
    std::cout << " and body size ";
    if (metadata.size == webm::kUnknownElementSize) {
      // WebM master elements may have an unknown size, though this is rare.
      std::cout << "<unknown>";
    } else {
      std::cout << metadata.size;
    }
    std::cout << '\n';

    *action = webm::Action::kRead;
    return webm::Status(webm::Status::kOkCompleted);
  }
};

int main() {
  MyCallback callback;
  webm::FileReader reader(std::freopen(nullptr, "rb", stdin));
  webm::WebmParser parser;
  webm::Status status = parser.Feed(&callback, &reader);
  if (status.completed_ok()) {
    std::cout << "Parsing successfully completed\n";
  } else {
    std::cout << "Parsing failed with status code: " << status.code << '\n';
  }
}
```

This will output information about every element in the entire file: its ID,
position, header size, and body size. The status of the parse is also checked
and reported.

For a more complete example, see `demo/demo.cc`, which parses an entire file and
prints out all of its information. That example overrides every `Callback`
method to show exactly what information is available while parsing and how to
access it. The example is verbose, but that's primarily due to pretty-printing
and string formatting operations.

When compiling your program, add the `include` directory to your compiler's
header search paths and link to the compiled library. Be sure your compiler has
C++11 mode enabled (`-std=c++11` in clang++ or g++).

# Testing

Unit tests are located in the `tests` directory. Google Test and Google Mock are
used as testing frameworks. Building and running the tests will be supported in
the upcoming CMake scripts, but they can currently be built and run by manually
compiling them (and linking to Google Test and Google Mock).

# Fuzzing

The parser has been fuzzed with [AFL](http://lcamtuf.coredump.cx/afl/) and
[libFuzzer](http://llvm.org/docs/LibFuzzer.html). If you wish to fuzz the parser
with AFL or libFuzzer but don't want to write an executable that exercises the
parsing API, you may use `fuzzing/webm_fuzzer.cc`.

When compiling for fuzzing, define the macro
`WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT` to be some integer in order to limit the
maximum size of ASCII/UTF-8/binary elements. It's too easy for the fuzzer to
generate elements that claim to have a ridiculously massive size, which will
cause allocations to fail or the program to allocate too much memory. AFL will
terminate the process if it allocates too much memory (by default, 50 MB), and
the [Address Sanitizer doesn't throw `std::bad_alloc` when an allocation
fails](https://github.com/google/sanitizers/issues/295). Defining
`WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT` to a low number (say, 1024) will cause the
ASCII/UTF-8/binary element parsers to return `Status::kNotEnoughMemory` if the
element's size exceeds `WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT`, which will avoid
false positives when fuzzing. The parser expects `std::string` and `std::vector`
to throw `std::bad_alloc` when an allocation fails, which doesn't necessarily
happen due to the fuzzers' limitations.
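
To illustrate the idea, the sketch below shows how a byte-element parser could honour such a limit before allocating. It is not libwebm's actual implementation.

-   `PrepareByteElement` is a hypothetical helper.
-   `Status` is a stand-in with illustrative values.
-   The macro default of 1024 exists only so the snippet demonstrates the limit when compiled on its own. Normally you would pass `-DWEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT=1024` to the compiler.

```.cc
#include <cstddef>
#include <cstdint>
#include <string>

// For this demo only; normally supplied on the compiler command line.
#ifndef WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT
#define WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT 1024
#endif

// Stand-in for webm::Status; values are illustrative, not libwebm's.
struct Status {
  enum Code : std::int32_t { kNotEnoughMemory = -1, kOkCompleted = 0 };
  std::int32_t code;
};

// Hypothetical helper: reserves storage for an ASCII/UTF-8/binary element of
// `size` bytes, refusing oversized elements when the fuzzing limit is set.
Status PrepareByteElement(std::uint64_t size, std::string* value) {
#if defined(WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT)
  if (size > WEBM_FUZZER_BYTE_ELEMENT_SIZE_LIMIT) {
    return {Status::kNotEnoughMemory};  // Fail fast instead of allocating.
  }
#endif
  value->resize(static_cast<std::size_t>(size));  // May throw std::bad_alloc.
  return {Status::kOkCompleted};
}
```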

You may also define the macro `WEBM_FUZZER_SEEK_FIRST` to have
`fuzzing/webm_fuzzer.cc` call `WebmParser::DidSeek()` before doing any parsing.
This will test the seeking code paths.