<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
<title>Ogg Documentation</title>

<style type="text/css">
body {
  margin: 0 18px 0 18px;
  padding-bottom: 30px;
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #333333;
  font-size: .8em;
}

a {
  color: #3366cc;
}

img {
  border: 0;
}

#xiphlogo {
  margin: 30px 0 16px 0;
}

#content p {
  line-height: 1.4;
}

h1, h1 a, h2, h2 a, h3, h3 a {
  font-weight: bold;
  color: #ff9900;
  margin: 1.3em 0 8px 0;
}

h1 {
  font-size: 1.3em;
}

h2 {
  font-size: 1.2em;
}

h3 {
  font-size: 1.1em;
}

li {
  line-height: 1.4;
}

#copyright {
  margin-top: 30px;
  line-height: 1.5em;
  text-align: center;
  font-size: .8em;
  color: #888888;
  clear: both;
}

.caption {
  color: #000000;
  background-color: #aabbff;
  margin: 1em;
  margin-left: 2em;
  margin-right: 2em;
  padding: 1em;
  padding-bottom: 0em;
  overflow: hidden;
}

.caption p {
  clear: none;
}

.caption img {
  display: block;
  margin: 0px;
  margin-left: auto;
  margin-right: auto;
  margin-bottom: 1.5em;
  background-color: #ffffff;
  padding: 10px;
}

#thepage {
  margin-left: auto;
  margin-right: auto;
  width: 840px;
}

</style>

</head>

<body>
<div id="thepage">

<div id="xiphlogo">
  <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
</div>

<h1>Ogg bitstream overview</h1>

<p>This document serves as a starting point for understanding the design
and implementation of the Ogg container format. If you're new to Ogg
or merely want a high-level technical overview, start reading here.
Other documents linked from the <a href="index.html">index page</a>
give distilled technical descriptions and references of the container
mechanisms. This document is intended to aid understanding.

<h2>Container format design points</h2>

<p>Ogg is intended to be a simplest-possible container, concerned only
with framing, ordering, and interleave. It can be used as a stream delivery
mechanism, for media file storage, or as a building block toward
implementing a more complex, non-linear container (for example, see
the <a href="skeleton.html">Skeleton</a> or <a
href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).

<p>The Ogg container is not intended to be a monolithic
'kitchen-sink'. It exists only to frame and deliver in-order stream
data and as such is vastly simpler than most other containers.
Elementary and multiplexed streams are both constructed entirely from a
single building block (an Ogg page) comprised of eight fields
totalling twenty-seven bytes (the page header), a list of packet lengths
(up to 255 bytes), and payload data (up to 65025 bytes). The structure
of every page is the same. There are no optional fields or alternate
encodings.

<p>Stream and media metadata is contained in Ogg and not built into
the Ogg container itself. Metadata is thus compartmentalized and
layered rather than part of a monolithic design, an especially good
idea as no two groups seem able to agree on what a complete or
complete-enough metadata set should be. In this way, the container and
container implementation are isolated from unnecessary metadata design
flux.

<h3>Streaming</h3>

<p>The Ogg container is primarily a streaming format,
encapsulating chronological, time-linear mixed media into a single
delivery stream or file.
The design is such that an application can
always encode and/or decode all features of a bitstream in one pass
with no seeking and minimal buffering. Seeking to provide optimized
encoding (such as two-pass encoding) or interactive decoding (such as
scrubbing or instant replay) is not disallowed or discouraged; however,
no container feature requires nonlinear access of the bitstream.

<h3>Variable Bit Rate, Variable Payload Size</h3>

<p>Ogg is designed to contain any size data payload with bounded,
predictable efficiency. Ogg packets have no maximum size and a
zero-byte minimum size. There is no restriction on size changes from
packet to packet. Variable size packets do not require the use of any
optional or additional container features. There is no optimal
suggested packet size, though special consideration was paid to make
sure 50-200 byte packets were no less efficient than larger packet
sizes. The original design criterion was a 2% overhead at 50 byte
packets, dropping to a maximum working overhead of 1% with larger
packets, and a typical working overhead of 0.5-0.7% for most practical
uses.

<h3>Simple pagination</h3>

<p>Ogg is a byte-aligned container with no context-dependent, optional
or variable-length fields. Ogg requires no repacking of codec data.
The page structure is written out in-line as packet data is submitted
to the streaming abstraction. In addition, it is possible to
implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
is done in the Tremor sourcebase).

<h3>Capture</h3>

<p>Ogg is designed for efficient and immediate stream capture with
high confidence. Although packets have no size limit in Ogg, pages
are a maximum of just under 64kB, meaning that any Ogg stream can be
captured with confidence after seeing 128kB of data or less [worst
case; typical figure is 6kB] from any random starting point in the
stream.
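<p>The capture guarantee can be illustrated with a minimal sketch. It
assumes the 27-byte fixed page header layout, the 'OggS' capture
pattern, and the CRC parameters given in the framing specification;
the function names (<tt>ogg_crc</tt>, <tt>build_page</tt>,
<tt>capture</tt>) are hypothetical and are not part of libogg. The
sketch scans forward from an arbitrary offset, and accepts a candidate
page only if its checksum verifies.

```python
import struct

CAPTURE = b"OggS"          # capture pattern that opens every page
HEADER_LEN = 27            # fixed part of the page header, before the segment table
MAX_PAGE = HEADER_LEN + 255 + 255 * 255  # header + segment table + payload


def ogg_crc(data: bytes) -> int:
    """CRC-32, polynomial 0x04c11db7, zero initial value, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def build_page(packet: bytes, serial: int, sequence: int, granule: int, flags: int = 0) -> bytes:
    """Frame one complete packet as a single page (illustration only)."""
    lacing = [255] * (len(packet) // 255) + [len(packet) % 255]
    if len(lacing) > 255:
        raise ValueError("packet too large for one page in this sketch")
    header = struct.pack("<4sBBqIIIB", CAPTURE, 0, flags, granule,
                         serial, sequence, 0, len(lacing))
    page = header + bytes(lacing) + packet
    # The checksum covers the whole page with the CRC field (bytes 22..25) zeroed.
    return page[:22] + struct.pack("<I", ogg_crc(page)) + page[26:]


def capture(buf: bytes, start: int = 0):
    """Return (offset, length) of the first checksum-verified page at or after start."""
    pos = buf.find(CAPTURE, start)
    while pos != -1:
        if pos + HEADER_LEN <= len(buf):
            table_end = pos + HEADER_LEN + buf[pos + 26]
            if table_end <= len(buf):
                end = table_end + sum(buf[pos + HEADER_LEN:table_end])
                if end <= len(buf):
                    page = bytearray(buf[pos:end])
                    stored = struct.unpack_from("<I", page, 22)[0]
                    page[22:26] = b"\0\0\0\0"
                    if ogg_crc(bytes(page)) == stored:
                        return pos, end - pos
        pos = buf.find(CAPTURE, pos + 1)   # false sync: keep scanning
    return None


page_a = build_page(b"hello", serial=1, sequence=0, granule=0, flags=0x02)
page_b = build_page(b"x" * 300, serial=1, sequence=1, granule=960)
stream = page_a + page_b

print(MAX_PAGE)                  # 65307, just under 64kB
print(capture(stream, start=3))  # (33, 329): dropped mid-page, captured at the next page
```

<p>Two maximum-size pages total 130614 bytes, which is under 128kB;
this is consistent with the worst-case figure quoted above, since a
reader dropped just after the start of one page needs at most the
remainder of that page plus the whole of the next.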

<h3>Seeking</h3>

<p>Ogg implements simple coarse- and fine-grained seeking by design.

<p>Coarse seeking may be performed by simply 'moving the tone arm' to a
new position and 'dropping the needle'. Rapid capture with
accompanying timecode from any location in an Ogg file is guaranteed
by the stream design. From the acquisition of the first timecode,
all data needed to play back from that time code forward is ahead of
the stream cursor.

<p>Ogg implements full sample-granularity seeking using an
interpolated bisection search built on the capture and timecode
mechanisms used by coarse seeking. As above, once a search finds
the desired timecode, all data needed to play back from that time code
forward is ahead of the stream cursor.

<p>Both coarse and fine seeking use the page structure and sequencing
inherent to the Ogg format. All Ogg streams are fully seekable from
creation; seekability is unaffected by truncation or missing data, and
is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor
heuristic.

<p>Seeking without use of an index is a major point of the Ogg
design. There are two primary reasons why Ogg transport forgoes an index:

<ol>

<li>An index is only marginally useful in Ogg for the complexity
added; it adds no new functionality and seldom improves performance
noticeably. Empirical testing shows that indexless interpolation
search does not require many more seeks in practice than using an
index would.

<li>'Optional' indexes encourage lazy implementations that can seek
only when indexes are present, or that implement indexless seeking
only by building an internal index after reading the entire file
beginning to end. This has been the fate of other containers that
specify optional indexing.

</ol>

<p>In addition, it must be possible to create an Ogg stream in a
single pass.
Although an optional index can simply be tacked on the
end of the created stream, some software groups object to
end-positioned indexes and claim to be unwilling to support indexes
not located at the stream beginning.

<p><i>All this said, it's become clear that an optional index is a
demanded feature. For this reason, the <a
href="http://wiki.xiph.org/Ogg_Index">OggSkeleton now defines a
proposed index.</a></i>

<h3>Simple multiplexing</h3>

<p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
multiplexed stream in time order. The multiplexed pages are not
altered. Muxing an Ogg AV stream out of separate audio,
video and data streams is akin to shuffling several decks of cards
together into a single deck; the cards themselves remain unchanged.
Demultiplexing is similarly simple (as the cards are marked).

<p>The goal of this design is to make the mux/demux operation as
trivial as possible to allow live streaming systems to build and
rebuild streams on the fly with minimal CPU usage and no additional
storage or latency requirements.

<h3>Continuous and Discontinuous Media</h3>

<p>Ogg streams belong to one of two categories, "Continuous" streams and
"Discontinuous" streams.

<p>A stream that provides a gapless, time-continuous media type with a
fine-grained timebase is considered to be 'Continuous'. A continuous
stream should never be starved of data. Examples of continuous data
types include broadcast audio and video.

<p>A stream that delivers data in a potentially irregular pattern or
with widely spaced timing gaps is considered to be 'Discontinuous'. A
discontinuous stream may be best thought of as data representing
scattered events; although they happen in order, they are typically
unconnected data often located far apart.
One example of a
discontinuous stream type would be captioning such as <a
href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
possible to design captions as a continuous stream type, it's most
natural to think of captions as widely spaced pieces of text with
little happening between.

<p>The fundamental reason for distinction between continuous and
discontinuous streams concerns buffering.

<h3>Buffering</h3>

<p>A continuous stream is, by definition, gapless. Ogg buffering is based
on the simple premise of never allowing an active continuous stream
to starve for data during decode; buffering works ahead until all
continuous streams in a physical stream have data ready and no further.

<p>Discontinuous stream data is not assumed to be predictable. The
buffering design takes discontinuous data 'as it comes' rather than
working ahead to look for future discontinuous data for a potentially
unbounded period. Thus, the buffering process makes no attempt to fill
discontinuous stream buffers; their pages simply 'fall out' of the
stream when continuous streams are handled properly.

<p>Buffering requirements in this design need not be explicitly
declared or managed in the encoded stream. The decoder simply reads as
much data as is necessary to keep all continuous stream types gapless
and no more, with discontinuous data processed as it arrives in the
continuous data. Buffering is implicitly optimal for the given
stream. Because all pages of all data types are stamped with absolute
timing information within the stream, inter-stream synchronization
timing is always maintained without the need for explicitly declared
buffer-ahead hinting.

<h3>Codec metadata</h3>

<p>Ogg does not replicate codec-specific metadata into the mux layer
in an attempt to make the mux and codec layer implementations 'fully
separable'.
Things like specific timebase, keyframing strategy, frame
duration, etc., do not appear in the Ogg container. The mux layer is,
instead, expected to query a codec through a centralized interface,
left to the implementation, for this data when it is needed.

<p>Though modern design wisdom usually prefers to predict all possible
needs of current and future codecs then embed these dependencies and
the required metadata into the container itself, this strategy
increases container specification complexity, fragility, and rigidity.
The mux and codec code becomes more independent, but the
specifications become logically less independent. A codec can't do
what a container hasn't already provided for. Novel codecs are harder
to support, and you can do fewer useful things with the ones you've
already got (e.g., try to make a good splitter without using any codecs.
Such a splitter is limited to splitting at keyframes only, or building
yet another new mechanism into the container layer to mark what frames
to skip displaying).

<p>Ogg's design goes the opposite direction, where the specification
is to be as simple, easy to understand, and 'proofed' against novel
codecs as possible. When an Ogg mux layer requires codec-specific
information, it queries the codec (or a codec stub). This trades a
more complex implementation for a simpler, more flexible
specification.

<h3>Stream structure metadata</h3>

<p>The Ogg container itself does not define a metadata system for
declaring the structure and interrelations between multiple media
types in a muxed stream. That is, the Ogg container itself does not
specify data like 'which stream is the subtitle stream?' or 'which
video stream is the primary angle?'. This metadata still exists, but
is stored by the Ogg container rather than being built into the Ogg
container itself.
Xiph specifies the 'Skeleton' metadata format for Ogg
streams, but this decoupling of container and stream structure
metadata means it is possible to use Ogg with any metadata
specification without altering the container itself, or without stream
structure metadata at all.

<h3>Frame accurate absolute position</h3>

<p>Every Ogg page is stamped with a 64-bit 'granule position' that
serves as an absolute timestamp for mux and seeking. A few nifty
little tricks are usually also embedded in the granpos state, but
we'll leave those aside for the moment (strictly speaking, they're
part of each codec's mapping, not Ogg).

<p>As mentioned above, granule positions are mapped into
absolute timestamps by the codec, rather than being a hard timestamp.
This allows maximally efficient use of the available 64 bits to
address every sample/frame position without approximation while
supporting new and previously unknown timebase encodings without
needing to extend or update the mux layer. When a codec needs a novel
timebase, it simply brings the code for that mapping along with it.
This is not a theoretical curiosity; new, wholly novel timebases were
deployed with the adoption of both Theora and Dirac. "Rolling INTRA"
(keyframeless video) also benefits from novel use of the granule
position.

<h2>Ogg stream arrangement</h2>

<h3>Packets, pages, and bitstreams</h3>

<p>Ogg codecs place raw compressed data into <em>packets</em>.
Packets are octet payloads containing the data needed for a single
decompressed unit, e.g., one video frame. Packets have no maximum size
and may be zero length. They do not generally have any framing
information; strung together, the unframed packets form a <em>logical
bitstream</em> of codec data with no internal landmarks.
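<p>The loss of packet boundaries, and the way a list of packet lengths
restores them, can be sketched briefly. This assumes the lacing scheme
described in the framing specification, where each packet length is
written as a run of 255-valued bytes ended by a value below 255. The
names <tt>lace</tt> and <tt>unlace</tt> are hypothetical, and the
sketch ignores packets that continue onto a following page.

```python
def lace(packets):
    """Return (segment_table, body) for packets that all end within one page."""
    table = []
    for packet in packets:
        # A length of 255*n + r becomes n bytes of 255 followed by r (r may be 0).
        table += [255] * (len(packet) // 255) + [len(packet) % 255]
    if len(table) > 255:
        raise ValueError("segment table is limited to 255 entries per page")
    return bytes(table), b"".join(packets)


def unlace(table, body):
    """Recover the original packets; a lacing value below 255 ends a packet."""
    packets, start, length = [], 0, 0
    for value in table:
        length += value
        if value < 255:
            packets.append(body[start:start + length])
            start += length
            length = 0
    return packets


packets = [b"", b"a" * 10, b"b" * 255, b"c" * 600]
table, body = lace(packets)

print(list(table))                     # [0, 10, 255, 0, 255, 255, 90]
print(unlace(table, body) == packets)  # True
```

<p>The body alone is an undifferentiated run of 865 bytes; only the
seven-byte table makes the four packets, including the zero-length
one, recoverable.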

<div class="caption">
  <img src="packets.png">

  <p> Packets of raw codec data are not typically internally framed.
  When they are strung together into a stream without any container to
  provide framing, they lose their individual boundaries. Seek and
  capture are not possible within an unframed stream, and for many
  codecs with variable length payloads and/or early-packet termination
  (such as Vorbis), it may become impossible to recover the original
  frame boundaries even if the stream is scanned linearly from
  beginning to end.

</div>

<p>Logical bitstream packets are grouped and framed into Ogg pages
along with a unique stream <em>serial number</em> to produce a
<em>physical bitstream</em>. An <em>elementary stream</em> is a
physical bitstream containing only a single logical bitstream. Each
page is a self contained entity, although a packet may be split and
encoded across one or more pages. The page decode mechanism is
designed to recognize, verify and handle single pages at a time from
the overall bitstream.

<div class="caption">
  <img src="pages.png">

  <p> The primary purpose of a container is to provide framing for raw
  packets, marking the packet boundaries so the exact packets can be
  retrieved for decode later. The container also provides secondary
  functions such as capture, timestamping, sequencing, stream
  identification and so on. Not all of these functions are represented in the diagram.

  <p>In the Ogg container, pages do not necessarily contain
  integer numbers of packets. Packets may span across page boundaries
  or even multiple pages. This is necessary as pages have a maximum
  possible size in order to provide capture guarantees, but packet
  size is unbounded.
</div>


<p><a href="framing.html">Ogg Bitstream Framing</a> specifies
the page format of an Ogg bitstream, the packet coding process
and elementary bitstreams in detail.

<h3>Multiplexed bitstreams</h3>

<p>Multiple logical/elementary bitstreams can be combined into a single
<em>multiplexed bitstream</em> by interleaving whole pages from each
contributing elementary stream in time order. The result is a single
physical stream that multiplexes and frames multiple logical streams.
Each logical stream is identified by the unique stream serial number
stamped in its pages. A physical stream may include a 'meta-header'
(such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
own Ogg page at the beginning of the physical stream. A decoder
recovers the original logical/elementary bitstreams out of the
physical bitstream by taking the pages in order from the physical
bitstream and redirecting them into the appropriate logical decoding
entity.

<div class="caption">
  <img src="multiplex1.png">

<p>Multiple media types are multiplexed into a single Ogg stream by
interleaving the pages from each elementary physical stream.

</div>

<p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
proper multiplexing of an Ogg bitstream in detail.

<h3>Chaining</h3>

<p>Multiple Ogg physical bitstreams may be concatenated into a single new
stream; this is <em>chaining</em>.
The bitstreams do not overlap; the
final page of a given logical bitstream is immediately followed by the
initial page of the next.</p>

<p>Each logical bitstream in a chain must have a unique serial number
within the scope of the full physical bitstream, not only within a
particular <em>link</em> or <em>segment</em> of the chain.</p>

<h3>Continuous and discontinuous streams</h3>

<p>Within Ogg, each stream must be declared (by the codec) to be
continuous- or discontinuous-time. Most codecs treat all streams they
use as either inherently continuous- or discontinuous-time, although
this is not a requirement. A codec may, as part of its mapping, choose
according to data in the initial header.

<p>Continuous-time pages are stamped by end-time, discontinuous pages
are stamped by begin-time. Pages in a multiplexed stream are
interleaved in order of the time stamp regardless of stream type.
Both continuous and discontinuous logical streams are used to seek
within a physical stream; however, only continuous streams are used to
determine buffering depth; because discontinuous streams are stamped
by start time, they will always 'fall out' at the proper time when
buffering the continuous streams. See 'Examples' for an illustration
of the buffering mechanism.

<h2>Multiplexing Requirements</h2>

<p>Multiplexing requirements within Ogg are straightforward. When
constructing a single-link (unchained) physical bitstream consisting
of multiple elementary streams:

<ol>

<li><p> The initial header for each stream appears in sequence, each
header on a single page. All initial headers must appear with no
intervening data (no auxiliary header pages or packets, no data pages
or packets). Order of the initial headers is unspecified. The
'beginning of stream' flag is set on each initial header.

<li><p> All auxiliary headers for all streams must follow.
Order
is unspecified. The final auxiliary header of each stream must flush
its page.

<li><p>Data pages for each stream follow, interleaved in time order.

<li><p>The final page of each stream sets the 'end of stream' flag.
Unlike initial pages, terminal pages for the logical bitstreams need
not occur contiguously; indeed it may not be possible for them to do so.
</ol>

<p>Each grouped bitstream must have a unique serial number within the
scope of the physical bitstream.</p>

<h3>Chaining and multiplexing</h3>

<p>Multiplexed and/or unmultiplexed bitstreams may be chained
consecutively. Such a physical bitstream obeys all the rules of both
chained and multiplexed streams. Each link, when unchained, must
stand on its own as a valid physical bitstream. Chained streams do
not mix or interleave; a new segment may not begin until all streams
in the preceding segment have terminated. </p>

<h2>Codec Mapping Requirements</h2>

<p>Each codec is allowed some freedom in deciding how its logical
bitstream is encapsulated into an Ogg bitstream (even if it is a
trivial mapping, e.g., 'plop the packets in and go'). This is the
codec's <em>mapping</em>. Ogg imposes a few mapping requirements
on any codec.

<ol>

<li><p>The <a href="framing.html">framing specification</a> defines
'beginning of stream' and 'end of stream' page markers via a header
flag (it is possible for a stream to consist of a single page). A
correct stream always consists of an integer number of pages, an easy
requirement given the variable size nature of pages.</p>

<li><p>The first page of an elementary Ogg bitstream consists of a single,
small 'initial header' packet that must include sufficient information
to identify the exact CODEC type.
From this initial header, the codec
must also be able to determine its timebase and whether or not it is a
continuous- or discontinuous-time stream. The initial header must fit
on a single page. If a codec makes use of auxiliary headers (for
example, Vorbis uses two auxiliary headers), these headers must follow
the initial header immediately. The last header finishes its page;
data begins on a fresh page.

<p>As an example, Ogg Vorbis places the name and revision of the
Vorbis CODEC, the audio rate and the audio quality into this initial
header. Vorbis comments and detailed codec setup appear in the larger
auxiliary headers.</p>

<li><p>Granule positions must be translatable to an exact absolute
time value. As described above, the mux layer is permitted to query a
codec or codec stub plugin to perform this mapping. It is not
necessary for an absolute time to be mappable into a single unique
granule position value.

<li><p>Codecs are not required to use a fixed duration-per-packet (for
example, Vorbis does not). The mux layer is permitted to query a
codec or codec stub plugin for the time duration of a packet.

<li><p>Although an absolute time need not be translatable to a unique
granule position, a codec must be able to determine the unique granule
position of the current packet using the granule position of a
preceding packet.

<li><p>Packets and pages must be arranged in ascending
granule-position and time order.

</ol>

<h2>Examples</h2>

<em>[More to come shortly; this section is currently being revised and expanded]</em>

<p>Below, we present an example of a multiplexed and chained bitstream:</p>

<p><img src="stream.png" alt="stream"/></p>

<p>In this example, we see pages from five total logical bitstreams
multiplexed into a physical bitstream.
Note the following
characteristics:</p>

<ol>
<li>Multiplexed bitstreams in a given link begin together; all of the
initial pages must appear before any data pages. When concurrently
multiplexed groups are chained, the new group does not begin until all
the bitstreams in the previous group have terminated.</li>

<li>The ordering of pages of concurrently multiplexed bitstreams is
governed by timestamp (not shown here); there is no regular
interleaving order. Pages within a logical bitstream appear in
sequence order.</li>
</ol>

<div id="copyright">
  The Xiph Fish Logo is a
  trademark (™) of Xiph.Org.<br/>

  These pages © 1994 - 2010 Xiph.Org. All rights reserved.
</div>

</div>
</body>
</html>