<?% config.freshness.owner = 'brandtr' %?>
<?% config.freshness.reviewed = '2021-04-15' %?>

# Video coding in WebRTC

## Introduction to layered video coding

[Video coding][video-coding-wiki] is the process of encoding a stream of
uncompressed video frames into a compressed bitstream, whose bitrate is lower
than that of the original stream.

### Block-based hybrid video coding

All video codecs in WebRTC are based on the block-based hybrid video coding
paradigm, which entails prediction of the original video frame using either
[information from previously encoded frames][motion-compensation-wiki] or
information from previously encoded portions of the current frame, subtraction
of the prediction from the original video, and
[transform][transform-coding-wiki] and [quantization][quantization-wiki] of the
resulting difference. The output of the quantization process, quantized
transform coefficients, is losslessly [entropy coded][entropy-coding-wiki] along
with other encoder parameters (e.g., those related to the prediction process).
The encoder then builds a reconstruction by inverse quantizing and inverse
transforming the quantized transform coefficients and adding the result to the
prediction. Finally, in-loop filtering is applied and the resulting
reconstruction is stored as a reference frame to be used to develop predictions
for future frames.

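The prediction and reconstruction loop described above can be sketched in a few
lines. This is a toy illustration and not WebRTC code: it works on
one-dimensional "frames" of integers, predicts each frame from the previous
reconstruction, and omits the transform, entropy coding and in-loop filtering
steps. The step size `QSTEP` and all function names are assumptions introduced
for the example.

```python
# Toy closed-loop predictive coder on 1-D "frames" (lists of ints).
# Assumption: transform, entropy coding and in-loop filtering are omitted.

QSTEP = 4  # hypothetical quantizer step size


def quantize(residual, qstep=QSTEP):
    # Uniform quantization: the only lossy step in this sketch.
    return [round(r / qstep) for r in residual]


def dequantize(levels, qstep=QSTEP):
    return [level * qstep for level in levels]


def decode_frame(levels, reference):
    # Reconstruction = prediction + inverse-quantized residual.
    return [p + d for p, d in zip(reference, dequantize(levels))]


def encode_frame(frame, reference):
    # Predict from the previous reconstruction and code the difference.
    residual = [x - p for x, p in zip(frame, reference)]
    levels = quantize(residual)
    # The encoder reconstructs exactly as the decoder will, so both sides
    # use the same reference for the next frame.
    return levels, decode_frame(levels, reference)


def run(frames):
    # Encode and decode a sequence; return the largest per-sample error.
    enc_ref = [0] * len(frames[0])
    dec_ref = [0] * len(frames[0])
    max_error = 0
    for frame in frames:
        levels, enc_ref = encode_frame(frame, enc_ref)
        dec_ref = decode_frame(levels, dec_ref)
        assert dec_ref == enc_ref  # encoder and decoder stay in sync
        max_error = max(max_error,
                        max(abs(a - b) for a, b in zip(frame, dec_ref)))
    return max_error
```

Because each prediction is formed from the reconstruction rather than from the
original frame, the quantization error in this sketch stays within half a
quantizer step and does not accumulate from frame to frame.
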
### Frame types

When an encoded frame depends on previously encoded frames (i.e., it has one or
more inter-frame dependencies), the prior frames must be available at the
receiver before the current frame can be decoded. In order for a receiver to
start decoding an encoded bitstream, a frame which has no prior dependencies is
required. Such a frame is called a "key frame". For real-time-communications
encoding, key frames typically compress less efficiently than "delta frames"
(i.e., frames whose predictions are derived from previously encoded frames).

### Single-layer coding

In 1:1 calls, the encoded bitstream has a single recipient. Using end-to-end
bandwidth estimation, the target bitrate can thus be well tailored for the
intended recipient. The number of key frames can be kept to a minimum and the
compressibility of the stream can be maximized. One way of achieving this is by
using "single-layer coding", where each delta frame only depends on the frame
that was most recently encoded.

### Scalable video coding

In multiway conferences, on the other hand, the encoded bitstream has multiple
recipients, each of whom may have a different downlink bandwidth. In order to
tailor the encoded bitstreams to a heterogeneous network of receivers,
[scalable video coding][svc-wiki] can be used. The idea is to introduce
structure into the dependency graph of the encoded bitstream, such that _layers_ of
the full stream can be decoded using only available lower layers. This structure
allows a [selective forwarding unit][sfu-webrtc-glossary] to discard upper
layers of the bitstream in order to achieve the intended downlink
bandwidth.

There are multiple types of scalability:

* _Temporal scalability_ provides layers whose framerate (and bitrate) is lower than that of the upper layer(s)
* _Spatial scalability_ provides layers whose resolution (and bitrate) is lower than that of the upper layer(s)
* _Quality scalability_ provides layers whose bitrate is lower than that of the upper layer(s)

WebRTC supports temporal scalability for `VP8`, `VP9` and `AV1`, and spatial
scalability for `VP9` and `AV1`.

### Simulcast

Simulcast is another approach for multiway conferencing, where multiple
_independent_ bitstreams are produced by the encoder.

In cases where multiple encodings of the same source are required (e.g., uplink
transmission in a multiway call), spatial scalability with inter-layer
prediction generally offers superior coding efficiency compared with simulcast.
When a single encoding is required (e.g., downlink transmission in any call),
simulcast generally provides better coding efficiency for the upper spatial
layers. The `K-SVC` concept, where spatial inter-layer dependencies are only
used to encode key frames, for which inter-layer prediction is typically
significantly more effective than it is for delta frames, can be seen as a
compromise between full spatial scalability and simulcast.

## Overview of implementation in `modules/video_coding`

Given the general introduction to video coding above, we now describe some
specifics of the [`modules/video_coding`][modules-video-coding] folder in WebRTC.

### Built-in software codecs in [`modules/video_coding/codecs`][modules-video-coding-codecs]

This folder contains WebRTC-specific classes that wrap software codec
implementations for different video coding standards:

* [libaom][libaom-src] for [AV1][av1-spec]
* [libvpx][libvpx-src] for [VP8][vp8-spec] and [VP9][vp9-spec]
* [OpenH264][openh264-src] for [H.264 constrained baseline profile][h264-spec]

Users of the library can also inject their own codecs, using the
[VideoEncoderFactory][video-encoder-factory-interface] and
[VideoDecoderFactory][video-decoder-factory-interface] interfaces. This is how
platform-supported codecs, such as hardware-backed codecs, are implemented.

### Video codec test framework in [`modules/video_coding/codecs/test`][modules-video-coding-codecs-test]

This folder contains a test framework that can be used to evaluate video quality
performance of different video codec implementations.

### SVC helper classes in [`modules/video_coding/svc`][modules-video-coding-svc]

*   [`ScalabilityStructure*`][scalabilitystructure] - different
    [standardized scalability structures][scalability-structure-spec]
*   [`ScalableVideoController`][scalablevideocontroller] - instructs the video encoder how
    to create a scalable stream
*   [`SvcRateAllocator`][svcrateallocator] - bitrate allocation to different spatial and temporal
    layers

### Utility classes in [`modules/video_coding/utility`][modules-video-coding-utility]

*   [`FrameDropper`][framedropper] - drops incoming frames when the encoder systematically
    overshoots its target bitrate
*   [`FramerateController`][frameratecontroller] - drops incoming frames to achieve a target framerate
*   [`QpParser`][qpparser] - parses the quantization parameter from a bitstream
*   [`QualityScaler`][qualityscaler] - signals when an encoder generates encoded frames whose
    quantization parameter is outside the window of acceptable values
*   [`SimulcastRateAllocator`][simulcastrateallocator] - bitrate allocation to simulcast layers

### General helper classes in [`modules/video_coding`][modules-video-coding]

*   [`FecControllerDefault`][feccontrollerdefault] - provides a default implementation for rate
    allocation to [forward error correction][fec-wiki]
*   [`VideoCodecInitializer`][videocodecinitializer] - converts between different encoder configuration
    structs

### Receiver buffer classes in [`modules/video_coding`][modules-video-coding]

*   [`PacketBuffer`][packetbuffer] - (re-)combines RTP packets into frames
*   [`RtpFrameReferenceFinder`][rtpframereferencefinder] - determines dependencies between frames based on information in the RTP header, payload header and RTP extensions
*   [`FrameBuffer`][framebuffer] - orders frames based on their dependencies to be fed to the decoder

[video-coding-wiki]: https://en.wikipedia.org/wiki/Video_coding_format
[motion-compensation-wiki]: https://en.wikipedia.org/wiki/Motion_compensation
[transform-coding-wiki]: https://en.wikipedia.org/wiki/Transform_coding
[motion-vector-wiki]: https://en.wikipedia.org/wiki/Motion_vector
[mpeg-wiki]: https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group
[svc-wiki]: https://en.wikipedia.org/wiki/Scalable_Video_Coding
[sfu-webrtc-glossary]: https://webrtcglossary.com/sfu/
[libvpx-src]: https://chromium.googlesource.com/webm/libvpx/
[libaom-src]: https://aomedia.googlesource.com/aom/
[openh264-src]: https://github.com/cisco/openh264
[vp8-spec]: https://tools.ietf.org/html/rfc6386
[vp9-spec]: https://storage.googleapis.com/downloads.webmproject.org/docs/vp9/vp9-bitstream-specification-v0.6-20160331-draft.pdf
[av1-spec]: https://aomediacodec.github.io/av1-spec/
[h264-spec]: https://www.itu.int/rec/T-REC-H.264-201906-I/en
[video-encoder-factory-interface]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/api/video_codecs/video_encoder_factory.h;l=27;drc=afadfb24a5e608da6ae102b20b0add53a083dcf3
[video-decoder-factory-interface]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/api/video_codecs/video_decoder_factory.h;l=27;drc=49c293f03d8f593aa3aca282577fcb14daa63207
[scalability-structure-spec]: https://w3c.github.io/webrtc-svc/#scalabilitymodes*
[fec-wiki]: https://en.wikipedia.org/wiki/Error_correction_code#Forward_error_correction
[entropy-coding-wiki]: https://en.wikipedia.org/wiki/Entropy_encoding
[modules-video-coding]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/
[modules-video-coding-codecs]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/codecs/
[modules-video-coding-codecs-test]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/codecs/test/
[modules-video-coding-svc]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/
[modules-video-coding-utility]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/
[scalabilitystructure]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/create_scalability_structure.h?q=CreateScalabilityStructure
[scalablevideocontroller]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/scalable_video_controller.h?q=ScalableVideoController
[svcrateallocator]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/svc_rate_allocator.h?q=SvcRateAllocator
[framedropper]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/frame_dropper.h?q=FrameDropper
[frameratecontroller]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/framerate_controller.h?q=FramerateController
[qpparser]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/qp_parser.h?q=QpParser
[qualityscaler]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/quality_scaler.h?q=QualityScaler
[simulcastrateallocator]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/simulcast_rate_allocator.h?q=SimulcastRateAllocator
[feccontrollerdefault]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/fec_controller_default.h?q=FecControllerDefault
[videocodecinitializer]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/include/video_codec_initializer.h?q=VideoCodecInitializer
[packetbuffer]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/packet_buffer.h?q=PacketBuffer
[rtpframereferencefinder]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/rtp_frame_reference_finder.h?q=RtpFrameReferenceFinder
[framebuffer]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/frame_buffer2.h?q=FrameBuffer
[quantization-wiki]: https://en.wikipedia.org/wiki/Quantization_(signal_processing)