1*a58d3d2aSXin Li<?xml version="1.0" encoding="utf-8"?>
2*a58d3d2aSXin Li<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
3*a58d3d2aSXin Li<?rfc toc="yes" symrefs="yes" ?>
4*a58d3d2aSXin Li
5*a58d3d2aSXin Li<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-opus-14">
6*a58d3d2aSXin Li
7*a58d3d2aSXin Li<front>
8*a58d3d2aSXin Li<title abbrev="Interactive Audio Codec">Definition of the Opus Audio Codec</title>
9*a58d3d2aSXin Li
10*a58d3d2aSXin Li
11*a58d3d2aSXin Li<author initials="JM" surname="Valin" fullname="Jean-Marc Valin">
12*a58d3d2aSXin Li<organization>Mozilla Corporation</organization>
13*a58d3d2aSXin Li<address>
14*a58d3d2aSXin Li<postal>
15*a58d3d2aSXin Li<street>650 Castro Street</street>
16*a58d3d2aSXin Li<city>Mountain View</city>
17*a58d3d2aSXin Li<region>CA</region>
18*a58d3d2aSXin Li<code>94041</code>
19*a58d3d2aSXin Li<country>USA</country>
20*a58d3d2aSXin Li</postal>
21*a58d3d2aSXin Li<phone>+1 650 903-0800</phone>
22*a58d3d2aSXin Li<email>[email protected]</email>
23*a58d3d2aSXin Li</address>
24*a58d3d2aSXin Li</author>
25*a58d3d2aSXin Li
26*a58d3d2aSXin Li<author initials="K." surname="Vos" fullname="Koen Vos">
27*a58d3d2aSXin Li<organization>Skype Technologies S.A.</organization>
28*a58d3d2aSXin Li<address>
29*a58d3d2aSXin Li<postal>
30*a58d3d2aSXin Li<street>Soder Malarstrand 43</street>
31*a58d3d2aSXin Li<city>Stockholm</city>
32*a58d3d2aSXin Li<region></region>
33*a58d3d2aSXin Li<code>11825</code>
34*a58d3d2aSXin Li<country>SE</country>
35*a58d3d2aSXin Li</postal>
36*a58d3d2aSXin Li<phone>+46 73 085 7619</phone>
37*a58d3d2aSXin Li<email>[email protected]</email>
38*a58d3d2aSXin Li</address>
39*a58d3d2aSXin Li</author>
40*a58d3d2aSXin Li
41*a58d3d2aSXin Li<author initials="T." surname="Terriberry" fullname="Timothy B. Terriberry">
42*a58d3d2aSXin Li<organization>Mozilla Corporation</organization>
43*a58d3d2aSXin Li<address>
44*a58d3d2aSXin Li<postal>
45*a58d3d2aSXin Li<street>650 Castro Street</street>
46*a58d3d2aSXin Li<city>Mountain View</city>
47*a58d3d2aSXin Li<region>CA</region>
48*a58d3d2aSXin Li<code>94041</code>
49*a58d3d2aSXin Li<country>USA</country>
50*a58d3d2aSXin Li</postal>
51*a58d3d2aSXin Li<phone>+1 650 903-0800</phone>
52*a58d3d2aSXin Li<email>[email protected]</email>
53*a58d3d2aSXin Li</address>
54*a58d3d2aSXin Li</author>
55*a58d3d2aSXin Li
56*a58d3d2aSXin Li<date day="17" month="May" year="2012" />
57*a58d3d2aSXin Li
58*a58d3d2aSXin Li<area>General</area>
59*a58d3d2aSXin Li
60*a58d3d2aSXin Li<workgroup></workgroup>
61*a58d3d2aSXin Li
62*a58d3d2aSXin Li<abstract>
63*a58d3d2aSXin Li<t>
64*a58d3d2aSXin LiThis document defines the Opus interactive speech and audio codec.
65*a58d3d2aSXin LiOpus is designed to handle a wide range of interactive audio applications,
66*a58d3d2aSXin Li including Voice over IP, videoconferencing, in-game chat, and even live,
67*a58d3d2aSXin Li distributed music performances.
68*a58d3d2aSXin LiIt scales from low bitrate narrowband speech at 6 kb/s to very high quality
69*a58d3d2aSXin Li stereo music at 510 kb/s.
70*a58d3d2aSXin LiOpus uses both linear prediction (LP) and the Modified Discrete Cosine
71*a58d3d2aSXin Li Transform (MDCT) to achieve good compression of both speech and music.
72*a58d3d2aSXin Li</t>
73*a58d3d2aSXin Li</abstract>
74*a58d3d2aSXin Li</front>
75*a58d3d2aSXin Li
76*a58d3d2aSXin Li<middle>
77*a58d3d2aSXin Li
78*a58d3d2aSXin Li<section anchor="introduction" title="Introduction">
79*a58d3d2aSXin Li<t>
80*a58d3d2aSXin LiThe Opus codec is a real-time interactive audio codec designed to meet the requirements
81*a58d3d2aSXin Lidescribed in <xref target="requirements"></xref>.
82*a58d3d2aSXin LiIt is composed of a linear
83*a58d3d2aSXin Li prediction (LP)-based <xref target="LPC"/> layer and a Modified Discrete Cosine Transform
84*a58d3d2aSXin Li (MDCT)-based <xref target="MDCT"/> layer.
85*a58d3d2aSXin LiThe main idea behind using two layers is that in speech, linear prediction
86*a58d3d2aSXin Li techniques (such as Code-Excited Linear Prediction, or CELP) code low frequencies more efficiently than transform
87*a58d3d2aSXin Li (e.g., MDCT) domain techniques, while the situation is reversed for music and
88*a58d3d2aSXin Li higher speech frequencies.
89*a58d3d2aSXin LiThus a codec with both layers available can operate over a wider range than
90*a58d3d2aSXin Li either one alone and, by combining them, achieve better quality than either
91*a58d3d2aSXin Li one individually.
92*a58d3d2aSXin Li</t>
93*a58d3d2aSXin Li
94*a58d3d2aSXin Li<t>
95*a58d3d2aSXin LiThe primary normative part of this specification is provided by the source code
96*a58d3d2aSXin Li in <xref target="ref-implementation"></xref>.
97*a58d3d2aSXin LiOnly the decoder portion of this software is normative, though a
98*a58d3d2aSXin Li significant amount of code is shared by both the encoder and decoder.
99*a58d3d2aSXin Li<xref target="conformance"/> provides a decoder conformance test.
The decoder contains a great deal of integer and fixed-point arithmetic that
 must be performed exactly, including all rounding considerations, so any
 useful specification requires a domain-specific symbolic language to adequately
 define these operations.
Additionally, any
conflict between the symbolic representation and the included reference
implementation must be resolved. For practical reasons of compatibility and
testability, it is advantageous to give the reference implementation
priority in any disagreement. The C language is also one of the most
109*a58d3d2aSXin Liwidely understood human-readable symbolic representations for machine
110*a58d3d2aSXin Libehavior.
111*a58d3d2aSXin LiFor these reasons this RFC uses the reference implementation as the sole
112*a58d3d2aSXin Li symbolic representation of the codec.
113*a58d3d2aSXin Li</t>
114*a58d3d2aSXin Li
<t>While the symbolic representation is unambiguous and complete, it is not
116*a58d3d2aSXin Lialways the easiest way to understand the codec's operation. For this reason
117*a58d3d2aSXin Lithis document also describes significant parts of the codec in English and
118*a58d3d2aSXin Litakes the opportunity to explain the rationale behind many of the more
119*a58d3d2aSXin Lisurprising elements of the design. These descriptions are intended to be
120*a58d3d2aSXin Liaccurate and informative, but the limitations of common English sometimes
121*a58d3d2aSXin Liresult in ambiguity, so it is expected that the reader will always read
122*a58d3d2aSXin Lithem alongside the symbolic representation. Numerous references to the
123*a58d3d2aSXin Liimplementation are provided for this purpose. The descriptions sometimes
124*a58d3d2aSXin Lidiffer from the reference in ordering or through mathematical simplification
125*a58d3d2aSXin Liwherever such deviation makes an explanation easier to understand.
126*a58d3d2aSXin LiFor example, the right shift and left shift operations in the reference
127*a58d3d2aSXin Liimplementation are often described using division and multiplication in the text.
128*a58d3d2aSXin LiIn general, the text is focused on the "what" and "why" while the symbolic
129*a58d3d2aSXin Lirepresentation most clearly provides the "how".
130*a58d3d2aSXin Li</t>
131*a58d3d2aSXin Li
132*a58d3d2aSXin Li<section anchor="notation" title="Notation and Conventions">
133*a58d3d2aSXin Li<t>
134*a58d3d2aSXin LiThe key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
135*a58d3d2aSXin Li "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
136*a58d3d2aSXin Li interpreted as described in RFC 2119 <xref target="rfc2119"></xref>.
137*a58d3d2aSXin Li</t>
138*a58d3d2aSXin Li<t>
Various operations in the codec require bit-exact fixed-point behavior, even
 when writing a floating-point implementation.
The notation "Q&lt;n&gt;", where n is an integer, denotes the number of binary
 digits to the right of the binary point in a fixed-point number.
143*a58d3d2aSXin LiFor example, a signed Q14 value in a 16-bit word can represent values from
144*a58d3d2aSXin Li -2.0 to 1.99993896484375, inclusive.
145*a58d3d2aSXin LiThis notation is for informational purposes only.
146*a58d3d2aSXin LiArithmetic, when described, always operates on the underlying integer.
147*a58d3d2aSXin LiE.g., the text will explicitly indicate any shifts required after a
148*a58d3d2aSXin Li multiplication.
149*a58d3d2aSXin Li</t>
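<t>
As an informal illustration of the Q notation, the following Python sketch
 converts between real values and a signed Q14 value held in a 16-bit word, and
 shows the explicit shift needed after multiplying two Q14 integers.
It is not taken from the reference implementation; the helper names to_q,
 from_q, and mul_q14 are hypothetical and exist only for this example.
</t>
<figure>
<artwork><![CDATA[
```python
Q = 14                                           # fractional bits (Q14)
WORD_MIN, WORD_MAX = -(1 << 15), (1 << 15) - 1   # signed 16-bit range

def to_q(value, q=Q):
    """Convert a real value to a Q<q> integer, saturating to 16 bits."""
    n = int(round(value * (1 << q)))
    return max(WORD_MIN, min(WORD_MAX, n))

def from_q(n, q=Q):
    """Interpret the underlying integer n as a Q<q> real value."""
    return n / (1 << q)

def mul_q14(a, b):
    """Multiply two Q14 integers.

    The raw product is Q28, so it is shifted right by 14 to return to Q14.
    """
    return (a * b) >> Q

print(from_q(WORD_MIN))   # -2.0
print(from_q(WORD_MAX))   # 1.99993896484375
print(from_q(mul_q14(to_q(0.5), to_q(1.5))))   # 0.75
```
]]></artwork>
</figure>
<t>
The arithmetic is carried out entirely on the underlying integers; the Q value
 only tells the reader how to interpret the result.
</t>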
150*a58d3d2aSXin Li<t>
151*a58d3d2aSXin LiExpressions, where included in the text, follow C operator rules and
152*a58d3d2aSXin Li precedence, with the exception that the syntax "x**y" indicates x raised to
153*a58d3d2aSXin Li the power y.
154*a58d3d2aSXin LiThe text also makes use of the following functions:
155*a58d3d2aSXin Li</t>
156*a58d3d2aSXin Li
157*a58d3d2aSXin Li<section anchor="min" toc="exclude" title="min(x,y)">
158*a58d3d2aSXin Li<t>
The smaller of two values x and y.
160*a58d3d2aSXin Li</t>
161*a58d3d2aSXin Li</section>
162*a58d3d2aSXin Li
163*a58d3d2aSXin Li<section anchor="max" toc="exclude" title="max(x,y)">
164*a58d3d2aSXin Li<t>
The larger of two values x and y.
166*a58d3d2aSXin Li</t>
167*a58d3d2aSXin Li</section>
168*a58d3d2aSXin Li
169*a58d3d2aSXin Li<section anchor="clamp" toc="exclude" title="clamp(lo,x,hi)">
170*a58d3d2aSXin Li<figure align="center">
171*a58d3d2aSXin Li<artwork align="center"><![CDATA[
172*a58d3d2aSXin Liclamp(lo,x,hi) = max(lo,min(x,hi))
173*a58d3d2aSXin Li]]></artwork>
174*a58d3d2aSXin Li</figure>
175*a58d3d2aSXin Li<t>
176*a58d3d2aSXin LiWith this definition, if lo&nbsp;&gt;&nbsp;hi, the lower bound is the one that
177*a58d3d2aSXin Li is enforced.
178*a58d3d2aSXin Li</t>
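<t>
A minimal Python sketch of this definition, given only to make the
 lo&nbsp;&gt;&nbsp;hi case concrete:
</t>
<figure>
<artwork><![CDATA[
```python
def clamp(lo, x, hi):
    """clamp(lo,x,hi) = max(lo,min(x,hi)), exactly as defined above."""
    return max(lo, min(x, hi))

print(clamp(0, 5, 10))    # 5   (already in range)
print(clamp(0, -3, 10))   # 0   (raised to the lower bound)
print(clamp(0, 42, 10))   # 10  (lowered to the upper bound)
print(clamp(8, 5, 3))     # 8   (lo > hi: the lower bound wins)
```
]]></artwork>
</figure>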
179*a58d3d2aSXin Li</section>
180*a58d3d2aSXin Li
181*a58d3d2aSXin Li<section anchor="sign" toc="exclude" title="sign(x)">
182*a58d3d2aSXin Li<t>
183*a58d3d2aSXin LiThe sign of x, i.e.,
184*a58d3d2aSXin Li<figure align="center">
185*a58d3d2aSXin Li<artwork align="center"><![CDATA[
186*a58d3d2aSXin Li          ( -1,  x < 0 ,
187*a58d3d2aSXin Lisign(x) = <  0,  x == 0 ,
188*a58d3d2aSXin Li          (  1,  x > 0 .
189*a58d3d2aSXin Li]]></artwork>
190*a58d3d2aSXin Li</figure>
191*a58d3d2aSXin Li</t>
192*a58d3d2aSXin Li</section>
193*a58d3d2aSXin Li
194*a58d3d2aSXin Li<section anchor="abs" toc="exclude" title="abs(x)">
195*a58d3d2aSXin Li<t>
196*a58d3d2aSXin LiThe absolute value of x, i.e.,
197*a58d3d2aSXin Li<figure align="center">
198*a58d3d2aSXin Li<artwork align="center"><![CDATA[
199*a58d3d2aSXin Liabs(x) = sign(x)*x .
200*a58d3d2aSXin Li]]></artwork>
201*a58d3d2aSXin Li</figure>
202*a58d3d2aSXin Li</t>
203*a58d3d2aSXin Li</section>
204*a58d3d2aSXin Li
205*a58d3d2aSXin Li<section anchor="floor" toc="exclude" title="floor(f)">
206*a58d3d2aSXin Li<t>
207*a58d3d2aSXin LiThe largest integer z such that z &lt;= f.
208*a58d3d2aSXin Li</t>
209*a58d3d2aSXin Li</section>
210*a58d3d2aSXin Li
211*a58d3d2aSXin Li<section anchor="ceil" toc="exclude" title="ceil(f)">
212*a58d3d2aSXin Li<t>
213*a58d3d2aSXin LiThe smallest integer z such that z &gt;= f.
214*a58d3d2aSXin Li</t>
215*a58d3d2aSXin Li</section>
216*a58d3d2aSXin Li
217*a58d3d2aSXin Li<section anchor="round" toc="exclude" title="round(f)">
218*a58d3d2aSXin Li<t>
219*a58d3d2aSXin LiThe integer z nearest to f, with ties rounded towards negative infinity,
220*a58d3d2aSXin Li i.e.,
221*a58d3d2aSXin Li<figure align="center">
222*a58d3d2aSXin Li<artwork align="center"><![CDATA[
223*a58d3d2aSXin Li round(f) = ceil(f - 0.5) .
224*a58d3d2aSXin Li]]></artwork>
225*a58d3d2aSXin Li</figure>
226*a58d3d2aSXin Li</t>
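<t>
The tie-breaking rule differs from the rounding functions built into many
 languages, so a short Python sketch may help.
The name round_opus is hypothetical, chosen to avoid shadowing Python's own
 round(), which rounds ties to the nearest even integer instead.
</t>
<figure>
<artwork><![CDATA[
```python
import math

def round_opus(f):
    """round(f) as defined above: ties go toward negative infinity."""
    return math.ceil(f - 0.5)

# Print each value with this document's rounding and Python's built-in one.
for f in (2.4, 2.5, 2.6, 3.5, -2.5):
    print(f, round_opus(f), round(f))
```
]]></artwork>
</figure>
<t>
For 3.5 and -2.5 the two functions disagree: round_opus gives 3 and -3, while
 Python's built-in round() gives 4 and -2.
</t>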
227*a58d3d2aSXin Li</section>
228*a58d3d2aSXin Li
229*a58d3d2aSXin Li<section anchor="log2" toc="exclude" title="log2(f)">
230*a58d3d2aSXin Li<t>
231*a58d3d2aSXin LiThe base-two logarithm of f.
232*a58d3d2aSXin Li</t>
233*a58d3d2aSXin Li</section>
234*a58d3d2aSXin Li
235*a58d3d2aSXin Li<section anchor="ilog" toc="exclude" title="ilog(n)">
236*a58d3d2aSXin Li<t>
The minimum number of bits required to store a positive integer n in binary,
 or 0 for a non-positive integer n.
239*a58d3d2aSXin Li<figure align="center">
240*a58d3d2aSXin Li<artwork align="center"><![CDATA[
241*a58d3d2aSXin Li          ( 0,                 n <= 0,
242*a58d3d2aSXin Liilog(n) = <
243*a58d3d2aSXin Li          ( floor(log2(n))+1,  n > 0
244*a58d3d2aSXin Li]]></artwork>
245*a58d3d2aSXin Li</figure>
246*a58d3d2aSXin LiExamples:
247*a58d3d2aSXin Li<list style="symbols">
248*a58d3d2aSXin Li<t>ilog(-1) = 0</t>
249*a58d3d2aSXin Li<t>ilog(0) = 0</t>
250*a58d3d2aSXin Li<t>ilog(1) = 1</t>
251*a58d3d2aSXin Li<t>ilog(2) = 2</t>
252*a58d3d2aSXin Li<t>ilog(3) = 2</t>
253*a58d3d2aSXin Li<t>ilog(4) = 3</t>
254*a58d3d2aSXin Li<t>ilog(7) = 3</t>
255*a58d3d2aSXin Li</list>
256*a58d3d2aSXin Li</t>
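<t>
The formula above can be sketched in Python with a simple shift loop.
This is an illustration only, not the routine used by the reference
 implementation.
</t>
<figure>
<artwork><![CDATA[
```python
def ilog(n):
    """floor(log2(n)) + 1 for n > 0, and 0 for n <= 0."""
    bits = 0
    while n > 0:
        n >>= 1       # drop the least significant bit
        bits += 1     # and count it
    return bits

# Reproduce the examples listed above.
for n in (-1, 0, 1, 2, 3, 4, 7):
    print(n, ilog(n))
```
]]></artwork>
</figure>
<t>
For positive n, the result matches Python's built-in int.bit_length().
</t>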
257*a58d3d2aSXin Li</section>
258*a58d3d2aSXin Li
259*a58d3d2aSXin Li</section>
260*a58d3d2aSXin Li
261*a58d3d2aSXin Li</section>
262*a58d3d2aSXin Li
263*a58d3d2aSXin Li<section anchor="overview" title="Opus Codec Overview">
264*a58d3d2aSXin Li
265*a58d3d2aSXin Li<t>
266*a58d3d2aSXin LiThe Opus codec scales from 6&nbsp;kb/s narrowband mono speech to 510&nbsp;kb/s
267*a58d3d2aSXin Li fullband stereo music, with algorithmic delays ranging from 5&nbsp;ms to
268*a58d3d2aSXin Li 65.2&nbsp;ms.
269*a58d3d2aSXin LiAt any given time, either the LP layer, the MDCT layer, or both, may be active.
270*a58d3d2aSXin LiIt can seamlessly switch between all of its various operating modes, giving it
271*a58d3d2aSXin Li a great deal of flexibility to adapt to varying content and network
272*a58d3d2aSXin Li conditions without renegotiating the current session.
273*a58d3d2aSXin LiThe codec allows input and output of various audio bandwidths, defined as
274*a58d3d2aSXin Li follows:
275*a58d3d2aSXin Li</t>
276*a58d3d2aSXin Li<texttable anchor="audio-bandwidth">
277*a58d3d2aSXin Li<ttcol>Abbreviation</ttcol>
278*a58d3d2aSXin Li<ttcol align="right">Audio Bandwidth</ttcol>
279*a58d3d2aSXin Li<ttcol align="right">Sample Rate (Effective)</ttcol>
280*a58d3d2aSXin Li<c>NB (narrowband)</c>       <c>4&nbsp;kHz</c>  <c>8&nbsp;kHz</c>
281*a58d3d2aSXin Li<c>MB (medium-band)</c>      <c>6&nbsp;kHz</c> <c>12&nbsp;kHz</c>
282*a58d3d2aSXin Li<c>WB (wideband)</c>         <c>8&nbsp;kHz</c> <c>16&nbsp;kHz</c>
283*a58d3d2aSXin Li<c>SWB (super-wideband)</c> <c>12&nbsp;kHz</c> <c>24&nbsp;kHz</c>
284*a58d3d2aSXin Li<c>FB (fullband)</c>        <c>20&nbsp;kHz (*)</c> <c>48&nbsp;kHz</c>
285*a58d3d2aSXin Li</texttable>
286*a58d3d2aSXin Li<t>
287*a58d3d2aSXin Li(*) Although the sampling theorem allows a bandwidth as large as half the
288*a58d3d2aSXin Li sampling rate, Opus never codes audio above 20&nbsp;kHz, as that is the
289*a58d3d2aSXin Li generally accepted upper limit of human hearing.
290*a58d3d2aSXin Li</t>
291*a58d3d2aSXin Li
292*a58d3d2aSXin Li<t>
293*a58d3d2aSXin LiOpus defines super-wideband (SWB) with an effective sample rate of 24&nbsp;kHz,
294*a58d3d2aSXin Li unlike some other audio coding standards that use 32&nbsp;kHz.
295*a58d3d2aSXin LiThis was chosen for a number of reasons.
296*a58d3d2aSXin LiThe band layout in the MDCT layer naturally allows skipping coefficients for
297*a58d3d2aSXin Li frequencies over 12&nbsp;kHz, but does not allow cleanly dropping just those
298*a58d3d2aSXin Li frequencies over 16&nbsp;kHz.
299*a58d3d2aSXin LiA sample rate of 24&nbsp;kHz also makes resampling in the MDCT layer easier,
300*a58d3d2aSXin Li as 24 evenly divides 48, and when 24&nbsp;kHz is sufficient, it can save
301*a58d3d2aSXin Li computation in other processing, such as Acoustic Echo Cancellation (AEC).
302*a58d3d2aSXin LiExperimental changes to the band layout to allow a 16&nbsp;kHz cutoff
303*a58d3d2aSXin Li (32&nbsp;kHz effective sample rate) showed potential quality degradations at
304*a58d3d2aSXin Li other sample rates, and at typical bitrates the number of bits saved by using
305*a58d3d2aSXin Li such a cutoff instead of coding in fullband (FB) mode is very small.
306*a58d3d2aSXin LiTherefore, if an application wishes to process a signal sampled at 32&nbsp;kHz,
307*a58d3d2aSXin Li it should just use FB.
308*a58d3d2aSXin Li</t>
309*a58d3d2aSXin Li
310*a58d3d2aSXin Li<t>
311*a58d3d2aSXin LiThe LP layer is based on the SILK codec
312*a58d3d2aSXin Li <xref target="SILK"></xref>.
313*a58d3d2aSXin LiIt supports NB, MB, or WB audio and frame sizes from 10&nbsp;ms to 60&nbsp;ms,
314*a58d3d2aSXin Li and requires an additional 5&nbsp;ms look-ahead for noise shaping estimation.
315*a58d3d2aSXin LiA small additional delay (up to 1.5 ms) may be required for sampling rate
316*a58d3d2aSXin Li conversion.
317*a58d3d2aSXin LiLike Vorbis <xref target='Vorbis-website'/> and many other modern codecs, SILK is inherently designed for
318*a58d3d2aSXin Li variable-bitrate (VBR) coding, though the encoder can also produce
319*a58d3d2aSXin Li constant-bitrate (CBR) streams.
320*a58d3d2aSXin LiThe version of SILK used in Opus is substantially modified from, and not
321*a58d3d2aSXin Li compatible with, the stand-alone SILK codec previously deployed by Skype.
322*a58d3d2aSXin LiThis document does not serve to define that format, but those interested in the
323*a58d3d2aSXin Li original SILK codec should see <xref target="SILK"/> instead.
324*a58d3d2aSXin Li</t>
325*a58d3d2aSXin Li
326*a58d3d2aSXin Li<t>
327*a58d3d2aSXin LiThe MDCT layer is based on the CELT  codec <xref target="CELT"></xref>.
328*a58d3d2aSXin LiIt supports NB, WB, SWB, or FB audio and frame sizes from 2.5&nbsp;ms to
329*a58d3d2aSXin Li 20&nbsp;ms, and requires an additional 2.5&nbsp;ms look-ahead due to the
330*a58d3d2aSXin Li overlapping MDCT windows.
331*a58d3d2aSXin LiThe CELT codec is inherently designed for CBR coding, but unlike many CBR
332*a58d3d2aSXin Li codecs it is not limited to a set of predetermined rates.
333*a58d3d2aSXin LiIt internally allocates bits to exactly fill any given target budget, and an
334*a58d3d2aSXin Li encoder can produce a VBR stream by varying the target on a per-frame basis.
335*a58d3d2aSXin LiThe MDCT layer is not used for speech when the audio bandwidth is WB or less,
336*a58d3d2aSXin Li as it is not useful there.
337*a58d3d2aSXin LiOn the other hand, non-speech signals are not always adequately coded using
338*a58d3d2aSXin Li linear prediction, so for music only the MDCT layer should be used.
339*a58d3d2aSXin Li</t>
340*a58d3d2aSXin Li
341*a58d3d2aSXin Li<t>
342*a58d3d2aSXin LiA "Hybrid" mode allows the use of both layers simultaneously with a frame size
343*a58d3d2aSXin Li of 10&nbsp;or 20&nbsp;ms and a SWB or FB audio bandwidth.
344*a58d3d2aSXin LiThe LP layer codes the low frequencies by resampling the signal down to WB.
345*a58d3d2aSXin LiThe MDCT layer follows, coding the high frequency portion of the signal.
346*a58d3d2aSXin LiThe cutoff between the two lies at 8&nbsp;kHz, the maximum WB audio bandwidth.
347*a58d3d2aSXin LiIn the MDCT layer, all bands below 8&nbsp;kHz are discarded, so there is no
348*a58d3d2aSXin Li coding redundancy between the two layers.
349*a58d3d2aSXin Li</t>
350*a58d3d2aSXin Li
351*a58d3d2aSXin Li<t>
352*a58d3d2aSXin LiThe sample rate (in contrast to the actual audio bandwidth) can be chosen
353*a58d3d2aSXin Li independently on the encoder and decoder side, e.g., a fullband signal can be
354*a58d3d2aSXin Li decoded as wideband, or vice versa.
355*a58d3d2aSXin LiThis approach ensures a sender and receiver can always interoperate, regardless
356*a58d3d2aSXin Li of the capabilities of their actual audio hardware.
357*a58d3d2aSXin LiInternally, the LP layer always operates at a sample rate of twice the audio
358*a58d3d2aSXin Li bandwidth, up to a maximum of 16&nbsp;kHz, which it continues to use for SWB
359*a58d3d2aSXin Li and FB.
360*a58d3d2aSXin LiThe decoder simply resamples its output to support different sample rates.
361*a58d3d2aSXin LiThe MDCT layer always operates internally at a sample rate of 48&nbsp;kHz.
Since all the supported sample rates evenly divide this rate, and since the
 decoder may easily zero out the high frequency portion of the spectrum in
364*a58d3d2aSXin Li the frequency domain, it can simply decimate the MDCT layer output to achieve
365*a58d3d2aSXin Li the other supported sample rates very cheaply.
366*a58d3d2aSXin Li</t>
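<t>
The internal rates described above can be summarized in a small Python sketch.
The bandwidth values come from the audio bandwidth table in this section; the
 function and constant names are hypothetical and do not correspond to
 identifiers in the reference implementation.
</t>
<figure>
<artwork><![CDATA[
```python
# Audio bandwidths in Hz, from the bandwidth table in this section.
AUDIO_BANDWIDTH_HZ = {
    "NB": 4000, "MB": 6000, "WB": 8000, "SWB": 12000, "FB": 20000,
}

LP_MAX_RATE_HZ = 16000    # the LP layer never runs above 16 kHz
MDCT_RATE_HZ = 48000      # the MDCT layer always runs at 48 kHz

def lp_internal_rate(bandwidth):
    """Twice the audio bandwidth, capped at 16 kHz (kept for SWB and FB)."""
    return min(2 * AUDIO_BANDWIDTH_HZ[bandwidth], LP_MAX_RATE_HZ)

def mdct_decimation_factor(output_rate_hz):
    """Integer factor by which the 48 kHz MDCT output is decimated."""
    if MDCT_RATE_HZ % output_rate_hz != 0:
        raise ValueError("output rate must evenly divide 48 kHz")
    return MDCT_RATE_HZ // output_rate_hz

for bw in AUDIO_BANDWIDTH_HZ:
    print(bw, lp_internal_rate(bw))
for rate in (8000, 12000, 16000, 24000, 48000):
    print(rate, mdct_decimation_factor(rate))
```
]]></artwork>
</figure>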
367*a58d3d2aSXin Li
368*a58d3d2aSXin Li<t>
369*a58d3d2aSXin LiAfter conversion to the common, desired output sample rate, the decoder simply
370*a58d3d2aSXin Li adds the output from the two layers together.
371*a58d3d2aSXin LiTo compensate for the different look-ahead required by each layer, the CELT
372*a58d3d2aSXin Li encoder input is delayed by an additional 2.7&nbsp;ms.
373*a58d3d2aSXin LiThis ensures that low frequencies and high frequencies arrive at the same time.
374*a58d3d2aSXin LiThis extra delay may be reduced by an encoder by using less look-ahead for noise
375*a58d3d2aSXin Li shaping or using a simpler resampler in the LP layer, but this will reduce
376*a58d3d2aSXin Li quality.
377*a58d3d2aSXin LiHowever, the base 2.5&nbsp;ms look-ahead in the CELT layer cannot be reduced in
378*a58d3d2aSXin Li the encoder because it is needed for the MDCT overlap, whose size is fixed by
379*a58d3d2aSXin Li the decoder.
380*a58d3d2aSXin Li</t>
381*a58d3d2aSXin Li
382*a58d3d2aSXin Li<t>
383*a58d3d2aSXin LiBoth layers use the same entropy coder, avoiding any waste from "padding bits"
384*a58d3d2aSXin Li between them.
385*a58d3d2aSXin LiThe hybrid approach makes it easy to support both CBR and VBR coding.
386*a58d3d2aSXin LiAlthough the LP layer is VBR, the bit allocation of the MDCT layer can produce
387*a58d3d2aSXin Li a final stream that is CBR by using all the bits left unused by the LP layer.
388*a58d3d2aSXin Li</t>
389*a58d3d2aSXin Li
390*a58d3d2aSXin Li<section title="Control Parameters">
391*a58d3d2aSXin Li<t>
The Opus codec includes a number of control parameters that can be changed dynamically during
regular operation of the codec, without interrupting the audio stream from the encoder to the decoder.
These parameters affect only the encoder, since any impact they have on the bit-stream is signaled
395*a58d3d2aSXin Liin-band such that a decoder can decode any Opus stream without any out-of-band signaling. Any Opus
396*a58d3d2aSXin Liimplementation can add or modify these control parameters without affecting interoperability. The most
397*a58d3d2aSXin Liimportant encoder control parameters in the reference encoder are listed below.
398*a58d3d2aSXin Li</t>
399*a58d3d2aSXin Li
<section title="Bitrate" toc="exclude">
401*a58d3d2aSXin Li<t>
402*a58d3d2aSXin LiOpus supports all bitrates from 6&nbsp;kb/s to 510&nbsp;kb/s. All other parameters being
403*a58d3d2aSXin Liequal, higher bitrate results in higher quality. For a frame size of 20&nbsp;ms, these
404*a58d3d2aSXin Liare the bitrate "sweet spots" for Opus in various configurations:
405*a58d3d2aSXin Li<list style="symbols">
406*a58d3d2aSXin Li<t>8-12 kb/s for NB speech,</t>
407*a58d3d2aSXin Li<t>16-20 kb/s for WB speech,</t>
408*a58d3d2aSXin Li<t>28-40 kb/s for FB speech,</t>
409*a58d3d2aSXin Li<t>48-64 kb/s for FB mono music, and</t>
410*a58d3d2aSXin Li<t>64-128 kb/s for FB stereo music.</t>
411*a58d3d2aSXin Li</list>
412*a58d3d2aSXin Li</t>
413*a58d3d2aSXin Li</section>
414*a58d3d2aSXin Li
<section title="Number of Channels (Mono/Stereo)" toc="exclude">
416*a58d3d2aSXin Li<t>
417*a58d3d2aSXin LiOpus can transmit either mono or stereo frames within a single stream.
418*a58d3d2aSXin LiWhen decoding a mono frame in a stereo decoder, the left and right channels are
419*a58d3d2aSXin Li identical, and when decoding a stereo frame in a mono decoder, the mono output
420*a58d3d2aSXin Li is the average of the left and right channels.
421*a58d3d2aSXin LiIn some cases, it is desirable to encode a stereo input stream in mono (e.g.,
422*a58d3d2aSXin Li because the bitrate is too low to encode stereo with sufficient quality).
423*a58d3d2aSXin LiThe number of channels encoded can be selected in real-time, but by default the
424*a58d3d2aSXin Li reference encoder attempts to make the best decision possible given the
425*a58d3d2aSXin Li current bitrate.
426*a58d3d2aSXin Li</t>
427*a58d3d2aSXin Li</section>
428*a58d3d2aSXin Li
<section title="Audio Bandwidth" toc="exclude">
<t>
The audio bandwidths supported by Opus are listed in
 <xref target="audio-bandwidth"/>.
As with the number of channels, any decoder can decode audio encoded at
434*a58d3d2aSXin Li any bandwidth.
435*a58d3d2aSXin LiFor example, any Opus decoder operating at 8&nbsp;kHz can decode a FB Opus
436*a58d3d2aSXin Li frame, and any Opus decoder operating at 48&nbsp;kHz can decode a NB frame.
437*a58d3d2aSXin LiSimilarly, the reference encoder can take a 48&nbsp;kHz input signal and
438*a58d3d2aSXin Li encode it as NB.
439*a58d3d2aSXin LiThe higher the audio bandwidth, the higher the required bitrate to achieve
440*a58d3d2aSXin Li acceptable quality.
441*a58d3d2aSXin LiThe audio bandwidth can be explicitly specified in real-time, but by default
442*a58d3d2aSXin Li the reference encoder attempts to make the best bandwidth decision possible
443*a58d3d2aSXin Li given the current bitrate.
444*a58d3d2aSXin Li</t>
445*a58d3d2aSXin Li</section>
446*a58d3d2aSXin Li
447*a58d3d2aSXin Li
<section title="Frame Duration" toc="exclude">
449*a58d3d2aSXin Li<t>
450*a58d3d2aSXin LiOpus can encode frames of 2.5, 5, 10, 20, 40 or 60&nbsp;ms.
451*a58d3d2aSXin LiIt can also combine multiple frames into packets of up to 120&nbsp;ms.
452*a58d3d2aSXin LiFor real-time applications, sending fewer packets per second reduces the
453*a58d3d2aSXin Li bitrate, since it reduces the overhead from IP, UDP, and RTP headers.
454*a58d3d2aSXin LiHowever, it increases latency and sensitivity to packet losses, as losing one
455*a58d3d2aSXin Li packet constitutes a loss of a bigger chunk of audio.
456*a58d3d2aSXin LiIncreasing the frame duration also slightly improves coding efficiency, but the
457*a58d3d2aSXin Li gain becomes small for frame sizes above 20&nbsp;ms.
458*a58d3d2aSXin LiFor this reason, 20&nbsp;ms frames are a good choice for most applications.
459*a58d3d2aSXin Li</t>
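<t>
To give a rough sense of the header overhead involved, the following Python
 sketch computes the bitrate consumed by headers alone for several packet
 durations.
The header sizes are assumptions that are not stated in this document: a
 20-byte IPv4 header without options, an 8-byte UDP header, and the 12-byte
 fixed RTP header without CSRC entries or extensions.
The function name header_overhead_kbps is hypothetical.
</t>
<figure>
<artwork><![CDATA[
```python
# Assumed header sizes in bytes: IPv4 without options, UDP, and the fixed
# RTP header without CSRC entries or extensions.
IP_BYTES, UDP_BYTES, RTP_BYTES = 20, 8, 12
HEADER_BITS = 8 * (IP_BYTES + UDP_BYTES + RTP_BYTES)

def header_overhead_kbps(packet_duration_ms):
    """Header bitrate in kb/s with one packet per packet_duration_ms."""
    packets_per_second = 1000.0 / packet_duration_ms
    return HEADER_BITS * packets_per_second / 1000.0

for duration_ms in (2.5, 5, 10, 20, 40, 60, 120):
    print(duration_ms, round(header_overhead_kbps(duration_ms), 2))
```
]]></artwork>
</figure>
<t>
Under these assumptions, 20&nbsp;ms packets carry 16&nbsp;kb/s of header
 overhead, while 2.5&nbsp;ms packets carry 128&nbsp;kb/s.
</t>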
460*a58d3d2aSXin Li</section>
461*a58d3d2aSXin Li
<section title="Complexity" toc="exclude">
463*a58d3d2aSXin Li<t>
464*a58d3d2aSXin LiThere are various aspects of the Opus encoding process where trade-offs
465*a58d3d2aSXin Lican be made between CPU complexity and quality/bitrate. In the reference
466*a58d3d2aSXin Liencoder, the complexity is selected using an integer from 0 to 10, where
467*a58d3d2aSXin Li0 is the lowest complexity and 10 is the highest. Examples of
468*a58d3d2aSXin Licomputations for which such trade-offs may occur are:
469*a58d3d2aSXin Li<list style="symbols">
470*a58d3d2aSXin Li<t>The order of the pitch analysis whitening filter <xref target="Whitening"/>,</t>
471*a58d3d2aSXin Li<t>The order of the short-term noise shaping filter,</t>
472*a58d3d2aSXin Li<t>The number of states in delayed decision quantization of the
473*a58d3d2aSXin Liresidual signal, and</t>
474*a58d3d2aSXin Li<t>The use of certain bit-stream features such as variable time-frequency
475*a58d3d2aSXin Liresolution and the pitch post-filter.</t>
476*a58d3d2aSXin Li</list>
477*a58d3d2aSXin Li</t>
478*a58d3d2aSXin Li</section>
479*a58d3d2aSXin Li
<section title="Packet Loss Resilience" toc="exclude">
<t>
Audio codecs often exploit inter-frame correlations to reduce the
bitrate at a cost in error propagation: after losing one packet,
several packets need to be received before the decoder is able to
accurately reconstruct the speech signal.  The extent to which Opus
486*a58d3d2aSXin Liexploits inter-frame dependencies can be adjusted on the fly to
487*a58d3d2aSXin Lichoose a trade-off between bitrate and amount of error propagation.
488*a58d3d2aSXin Li</t>
489*a58d3d2aSXin Li</section>
490*a58d3d2aSXin Li
<section title="Forward Error Correction (FEC)" toc="exclude">
492*a58d3d2aSXin Li<t>
493*a58d3d2aSXin Li   Another mechanism providing robustness against packet loss is the in-band
494*a58d3d2aSXin Li   Forward Error Correction (FEC).  Packets that are determined to
495*a58d3d2aSXin Li   contain perceptually important speech information, such as onsets or
496*a58d3d2aSXin Li   transients, are encoded again at a lower bitrate and this re-encoded
497*a58d3d2aSXin Li   information is added to a subsequent packet.
498*a58d3d2aSXin Li</t>
499*a58d3d2aSXin Li</section>
500*a58d3d2aSXin Li
<section title="Constant/Variable Bitrate" toc="exclude">
<t>
Opus is more efficient when operating with variable bitrate (VBR), which is
the default. However, in some (rare) applications, constant bitrate (CBR)
is required. There are two main reasons to operate in CBR mode:
<list style="symbols">
<t>When the transport only supports a fixed size for each compressed frame, or</t>
<t>When encryption is used for an audio stream that is either highly constrained
   (e.g., yes/no, recorded prompts) or highly sensitive <xref target="SRTP-VBR"></xref>.</t>
</list>

When low-latency transmission is required over a relatively slow connection,
constrained VBR can also be used. This uses VBR in a way that simulates a
"bit reservoir" and is equivalent to what MP3 (MPEG 1, Layer 3) and
AAC (Advanced Audio Coding) call CBR (i.e., not true
CBR due to the bit reservoir).
517*a58d3d2aSXin Li</t>
518*a58d3d2aSXin Li</section>
519*a58d3d2aSXin Li
<section title="Discontinuous Transmission (DTX)" toc="exclude">
521*a58d3d2aSXin Li<t>
522*a58d3d2aSXin Li   Discontinuous Transmission (DTX) reduces the bitrate during silence
523*a58d3d2aSXin Li   or background noise.  When DTX is enabled, only one frame is encoded
524*a58d3d2aSXin Li   every 400 milliseconds.
525*a58d3d2aSXin Li</t>
526*a58d3d2aSXin Li</section>
527*a58d3d2aSXin Li
528*a58d3d2aSXin Li</section>
529*a58d3d2aSXin Li
530*a58d3d2aSXin Li</section>
531*a58d3d2aSXin Li
532*a58d3d2aSXin Li<section anchor="modes" title="Internal Framing">
533*a58d3d2aSXin Li
534*a58d3d2aSXin Li<t>
535*a58d3d2aSXin LiThe Opus encoder produces "packets", which are each a contiguous set of bytes
536*a58d3d2aSXin Li meant to be transmitted as a single unit.
537*a58d3d2aSXin LiThe packets described here do not include such things as IP, UDP, or RTP
538*a58d3d2aSXin Li headers which are normally found in a transport-layer packet.
539*a58d3d2aSXin LiA single packet may contain multiple audio frames, so long as they share a
540*a58d3d2aSXin Li common set of parameters, including the operating mode, audio bandwidth, frame
541*a58d3d2aSXin Li size, and channel count (mono vs. stereo).
542*a58d3d2aSXin LiThis section describes the possible combinations of these parameters and the
543*a58d3d2aSXin Li internal framing used to pack multiple frames into a single packet.
544*a58d3d2aSXin LiThis framing is not self-delimiting.
545*a58d3d2aSXin LiInstead, it assumes that a higher layer (such as UDP or RTP <xref target='RFC3550'/>
546*a58d3d2aSXin Lior Ogg <xref target='RFC3533'/> or Matroska <xref target='Matroska-website'/>)
547*a58d3d2aSXin Li will communicate the length, in bytes, of the packet, and it uses this
548*a58d3d2aSXin Li information to reduce the framing overhead in the packet itself.
549*a58d3d2aSXin LiA decoder implementation MUST support the framing described in this section.
550*a58d3d2aSXin LiAn alternative, self-delimiting variant of the framing is described in
551*a58d3d2aSXin Li <xref target="self-delimiting-framing"/>.
552*a58d3d2aSXin LiSupport for that variant is OPTIONAL.
553*a58d3d2aSXin Li</t>
554*a58d3d2aSXin Li
555*a58d3d2aSXin Li<t>
556*a58d3d2aSXin LiAll bit diagrams in this document number the bits so that bit 0 is the most
557*a58d3d2aSXin Li significant bit of the first byte, and bit 7 is the least significant.
558*a58d3d2aSXin LiBit 8 is thus the most significant bit of the second byte, etc.
559*a58d3d2aSXin LiWell-formed Opus packets obey certain requirements, marked [R1] through [R7]
560*a58d3d2aSXin Li below.
561*a58d3d2aSXin LiThese are summarized in <xref target="malformed-packets"/> along with
562*a58d3d2aSXin Li appropriate means of handling malformed packets.
563*a58d3d2aSXin Li</t>
564*a58d3d2aSXin Li
565*a58d3d2aSXin Li<section anchor="toc_byte" title="The TOC Byte">
566*a58d3d2aSXin Li<t anchor="R1">
567*a58d3d2aSXin LiA well-formed Opus packet MUST contain at least one byte&nbsp;[R1].
568*a58d3d2aSXin LiThis byte forms a table-of-contents (TOC) header that signals which of the
569*a58d3d2aSXin Li various modes and configurations a given packet uses.
570*a58d3d2aSXin LiIt is composed of a configuration number, "config", a stereo flag, "s", and a
571*a58d3d2aSXin Li frame count code, "c", arranged as illustrated in
572*a58d3d2aSXin Li <xref target="toc_byte_fig"/>.
573*a58d3d2aSXin LiA description of each of these fields follows.
574*a58d3d2aSXin Li</t>
575*a58d3d2aSXin Li
576*a58d3d2aSXin Li<figure anchor="toc_byte_fig" title="The TOC Byte">
577*a58d3d2aSXin Li<artwork align="center"><![CDATA[
578*a58d3d2aSXin Li 0
579*a58d3d2aSXin Li 0 1 2 3 4 5 6 7
580*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+
581*a58d3d2aSXin Li| config  |s| c |
582*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+
583*a58d3d2aSXin Li]]></artwork>
584*a58d3d2aSXin Li</figure>
585*a58d3d2aSXin Li
586*a58d3d2aSXin Li<t>
587*a58d3d2aSXin LiThe top five bits of the TOC byte, labeled "config", encode one of 32 possible
588*a58d3d2aSXin Li configurations of operating mode, audio bandwidth, and frame size.
589*a58d3d2aSXin LiAs described, the LP (SILK) layer and MDCT (CELT) layer can be combined in three possible
590*a58d3d2aSXin Li operating modes:
591*a58d3d2aSXin Li<list style="numbers">
592*a58d3d2aSXin Li<t>A SILK-only mode for use in low bitrate connections with an audio bandwidth
593*a58d3d2aSXin Li of WB or less,</t>
594*a58d3d2aSXin Li<t>A Hybrid (SILK+CELT) mode for SWB or FB speech at medium bitrates, and</t>
595*a58d3d2aSXin Li<t>A CELT-only mode for very low delay speech transmission as well as music
596*a58d3d2aSXin Li transmission (NB to FB).</t>
597*a58d3d2aSXin Li</list>
598*a58d3d2aSXin LiThe 32 possible configurations each identify which one of these operating modes
599*a58d3d2aSXin Li the packet uses, as well as the audio bandwidth and the frame size.
600*a58d3d2aSXin Li<xref target="config_bits"/> lists the parameters for each configuration.
601*a58d3d2aSXin Li</t>
602*a58d3d2aSXin Li<texttable anchor="config_bits" title="TOC Byte Configuration Parameters">
603*a58d3d2aSXin Li<ttcol>Configuration Number(s)</ttcol>
604*a58d3d2aSXin Li<ttcol>Mode</ttcol>
605*a58d3d2aSXin Li<ttcol>Bandwidth</ttcol>
606*a58d3d2aSXin Li<ttcol>Frame Sizes</ttcol>
607*a58d3d2aSXin Li<c>0...3</c>   <c>SILK-only</c> <c>NB</c>  <c>10, 20, 40, 60&nbsp;ms</c>
608*a58d3d2aSXin Li<c>4...7</c>   <c>SILK-only</c> <c>MB</c>  <c>10, 20, 40, 60&nbsp;ms</c>
609*a58d3d2aSXin Li<c>8...11</c>  <c>SILK-only</c> <c>WB</c>  <c>10, 20, 40, 60&nbsp;ms</c>
610*a58d3d2aSXin Li<c>12...13</c> <c>Hybrid</c>    <c>SWB</c> <c>10, 20&nbsp;ms</c>
611*a58d3d2aSXin Li<c>14...15</c> <c>Hybrid</c>    <c>FB</c>  <c>10, 20&nbsp;ms</c>
612*a58d3d2aSXin Li<c>16...19</c> <c>CELT-only</c> <c>NB</c>  <c>2.5, 5, 10, 20&nbsp;ms</c>
613*a58d3d2aSXin Li<c>20...23</c> <c>CELT-only</c> <c>WB</c>  <c>2.5, 5, 10, 20&nbsp;ms</c>
614*a58d3d2aSXin Li<c>24...27</c> <c>CELT-only</c> <c>SWB</c> <c>2.5, 5, 10, 20&nbsp;ms</c>
615*a58d3d2aSXin Li<c>28...31</c> <c>CELT-only</c> <c>FB</c>  <c>2.5, 5, 10, 20&nbsp;ms</c>
616*a58d3d2aSXin Li</texttable>
617*a58d3d2aSXin Li<t>
618*a58d3d2aSXin LiThe configuration numbers in each range (e.g., 0...3 for NB SILK-only)
619*a58d3d2aSXin Li correspond to the various choices of frame size, in the same order.
620*a58d3d2aSXin LiFor example, configuration 0 has a 10&nbsp;ms frame size and configuration 3
621*a58d3d2aSXin Li has a 60&nbsp;ms frame size.
622*a58d3d2aSXin Li</t>
623*a58d3d2aSXin Li
624*a58d3d2aSXin Li<t>
625*a58d3d2aSXin LiOne additional bit, labeled "s", signals mono vs. stereo, with 0 indicating
626*a58d3d2aSXin Li mono and 1 indicating stereo.
627*a58d3d2aSXin Li</t>
628*a58d3d2aSXin Li
629*a58d3d2aSXin Li<t>
630*a58d3d2aSXin LiThe remaining two bits of the TOC byte, labeled "c", code the number of frames
631*a58d3d2aSXin Li per packet (codes 0 to 3) as follows:
632*a58d3d2aSXin Li<list style="symbols">
633*a58d3d2aSXin Li<t>0:    1 frame in the packet</t>
634*a58d3d2aSXin Li<t>1:    2 frames in the packet, each with equal compressed size</t>
635*a58d3d2aSXin Li<t>2:    2 frames in the packet, with different compressed sizes</t>
636*a58d3d2aSXin Li<t>3:    an arbitrary number of frames in the packet</t>
637*a58d3d2aSXin Li</list>
638*a58d3d2aSXin LiThis draft refers to a packet as a code 0 packet, code 1 packet, etc., based on
639*a58d3d2aSXin Li the value of "c".
640*a58d3d2aSXin Li</t>
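<t>
As a non-normative illustration, the field layout and the configuration table
 above can be sketched in Python.
The function name parse_toc and the shape of its return value are hypothetical
 choices made for this example; they are not part of this specification or of
 the reference implementation.
</t>
<figure>
<artwork><![CDATA[
```python
def parse_toc(toc):
    """Split a TOC byte (0..255) into its fields.

    Returns (mode, bandwidth, frame_ms, stereo, code).
    """
    if not 0 <= toc <= 0xFF:
        raise ValueError("the TOC must be a single byte")
    config = toc >> 3               # top five bits: "config"
    stereo = bool((toc >> 2) & 1)   # next bit: "s"
    code = toc & 0x03               # low two bits: "c"
    if config < 12:
        mode = "SILK-only"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:
        mode = "CELT-only"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]
    return mode, bandwidth, frame_ms, stereo, code
```
]]></artwork>
</figure>
<t>
Under these assumptions, a TOC byte of 0x08 (config 1, mono, code 0) would be
 reported as a single NB SILK-only frame of 20 ms.
</t>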
641*a58d3d2aSXin Li
642*a58d3d2aSXin Li</section>
643*a58d3d2aSXin Li
644*a58d3d2aSXin Li<section title="Frame Packing">
645*a58d3d2aSXin Li
646*a58d3d2aSXin Li<t>
647*a58d3d2aSXin LiThis section describes how frames are packed according to each possible value
648*a58d3d2aSXin Li of "c" in the TOC byte.
649*a58d3d2aSXin Li</t>
650*a58d3d2aSXin Li
651*a58d3d2aSXin Li<section anchor="frame-length-coding" title="Frame Length Coding">
652*a58d3d2aSXin Li<t>
653*a58d3d2aSXin LiWhen a packet contains multiple VBR frames (i.e., code 2 or 3), the compressed
654*a58d3d2aSXin Li length of one or more of these frames is indicated with a one- or two-byte
655*a58d3d2aSXin Li sequence, with the meaning of the first byte as follows:
656*a58d3d2aSXin Li<list style="symbols">
657*a58d3d2aSXin Li<t>0:          No frame (discontinuous transmission (DTX) or lost packet)</t>
658*a58d3d2aSXin Li<t>1...251:    Length of the frame in bytes</t>
659*a58d3d2aSXin Li<t>252...255:  A second byte is needed. The total length is (second_byte*4)+first_byte</t>
660*a58d3d2aSXin Li</list>
661*a58d3d2aSXin Li</t>
662*a58d3d2aSXin Li
663*a58d3d2aSXin Li<t>
664*a58d3d2aSXin LiThe special length 0 indicates that no frame is available, either because it
665*a58d3d2aSXin Li was dropped during transmission by some intermediary or because the encoder
666*a58d3d2aSXin Li chose not to transmit it.
667*a58d3d2aSXin LiAny Opus frame in any mode MAY have a length of 0.
668*a58d3d2aSXin Li</t>
669*a58d3d2aSXin Li
670*a58d3d2aSXin Li<t>
671*a58d3d2aSXin LiThe maximum representable length is 255*4+255=1275&nbsp;bytes.
672*a58d3d2aSXin LiFor 20&nbsp;ms frames, this represents a bitrate of 510&nbsp;kb/s, which is
673*a58d3d2aSXin Li approximately the highest useful rate for lossily compressed fullband stereo
674*a58d3d2aSXin Li music.
675*a58d3d2aSXin LiBeyond this point, lossless codecs are more appropriate.
676*a58d3d2aSXin LiIt is also roughly the maximum useful rate of the MDCT layer, as shortly
677*a58d3d2aSXin Li thereafter quality no longer improves with additional bits due to limitations
678*a58d3d2aSXin Li on the codebook sizes.
679*a58d3d2aSXin Li</t>
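<t>
The one- or two-byte length coding described above can be sketched as follows.
This is an illustrative Python example only; the helper names
 decode_frame_length and encode_frame_length are hypothetical.
The encoder shown picks the first byte so that the remainder is divisible by
 four, which is one straightforward way to satisfy the formula.
</t>
<figure>
<artwork><![CDATA[
```python
MAX_FRAME_BYTES = 255 * 4 + 255  # 1275, the largest representable length


def decode_frame_length(data, pos):
    """Decode the length field at data[pos]; return (length, next_pos)."""
    if pos >= len(data):
        raise ValueError("frame length field is missing")
    first = data[pos]
    if first < 252:
        # 0 means "no frame"; 1...251 is the length itself.
        return first, pos + 1
    if pos + 1 >= len(data):
        raise ValueError("second length byte is missing")
    return data[pos + 1] * 4 + first, pos + 2


def encode_frame_length(length):
    """Encode a frame length (0...1275) as one or two bytes."""
    if not 0 <= length <= MAX_FRAME_BYTES:
        raise ValueError("length is not representable")
    if length < 252:
        return bytes([length])
    # 252 is a multiple of 4, so this makes (length - first) divisible by 4.
    first = 252 + (length & 3)
    second = (length - first) // 4
    return bytes([first, second])
```
]]></artwork>
</figure>
<t>
With 50 frames of 20 ms per second, the maximum length corresponds to
 1275*8*50 = 510000 bits per second, matching the 510 kb/s figure above.
</t>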
680*a58d3d2aSXin Li
681*a58d3d2aSXin Li<t anchor="R2">
682*a58d3d2aSXin LiNo length is transmitted for the last frame in a VBR packet, or for any of the
683*a58d3d2aSXin Li frames in a CBR packet, as it can be inferred from the total size of the
684*a58d3d2aSXin Li packet and the size of all other data in the packet.
685*a58d3d2aSXin LiHowever, the length of any individual frame MUST NOT exceed
686*a58d3d2aSXin Li 1275&nbsp;bytes&nbsp;[R2], to allow for repacketization by gateways,
687*a58d3d2aSXin Li conference bridges, or other software.
688*a58d3d2aSXin Li</t>
689*a58d3d2aSXin Li</section>
690*a58d3d2aSXin Li
691*a58d3d2aSXin Li<section title="Code 0: One Frame in the Packet">
692*a58d3d2aSXin Li
693*a58d3d2aSXin Li<t>
694*a58d3d2aSXin LiFor code&nbsp;0 packets, the TOC byte is immediately followed by N-1&nbsp;bytes
695*a58d3d2aSXin Li of compressed data for a single frame (where N is the size of the packet),
696*a58d3d2aSXin Li as illustrated in <xref target="code0_packet"/>.
697*a58d3d2aSXin Li</t>
698*a58d3d2aSXin Li<figure anchor="code0_packet" title="A Code 0 Packet" align="center">
699*a58d3d2aSXin Li<artwork align="center"><![CDATA[
700*a58d3d2aSXin Li 0                   1                   2                   3
701*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
702*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
703*a58d3d2aSXin Li| config  |s|0|0|                                               |
704*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+                                               |
705*a58d3d2aSXin Li|                    Compressed frame 1 (N-1 bytes)...          :
706*a58d3d2aSXin Li:                                                               |
707*a58d3d2aSXin Li|                                                               |
708*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
709*a58d3d2aSXin Li]]></artwork>
710*a58d3d2aSXin Li</figure>
711*a58d3d2aSXin Li</section>
712*a58d3d2aSXin Li
713*a58d3d2aSXin Li<section title="Code 1: Two Frames in the Packet, Each with Equal Compressed Size">
714*a58d3d2aSXin Li<t anchor="R3">
715*a58d3d2aSXin LiFor code 1 packets, the TOC byte is immediately followed by the
716*a58d3d2aSXin Li (N-1)/2&nbsp;bytes of compressed data for the first frame, followed by
717*a58d3d2aSXin Li (N-1)/2&nbsp;bytes of compressed data for the second frame, as illustrated in
718*a58d3d2aSXin Li <xref target="code1_packet"/>.
719*a58d3d2aSXin LiThe number of payload bytes available for compressed data, N-1, MUST be even
720*a58d3d2aSXin Li for all code 1 packets&nbsp;[R3].
721*a58d3d2aSXin Li</t>
722*a58d3d2aSXin Li<figure anchor="code1_packet" title="A Code 1 Packet" align="center">
723*a58d3d2aSXin Li<artwork align="center"><![CDATA[
724*a58d3d2aSXin Li 0                   1                   2                   3
725*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
726*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
727*a58d3d2aSXin Li| config  |s|0|1|                                               |
728*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+                                               :
729*a58d3d2aSXin Li|             Compressed frame 1 ((N-1)/2 bytes)...             |
730*a58d3d2aSXin Li:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
731*a58d3d2aSXin Li|                               |                               |
732*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
733*a58d3d2aSXin Li|             Compressed frame 2 ((N-1)/2 bytes)...             |
734*a58d3d2aSXin Li:                                               +-+-+-+-+-+-+-+-+
735*a58d3d2aSXin Li|                                               |
736*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
737*a58d3d2aSXin Li]]></artwork>
738*a58d3d2aSXin Li</figure>
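<t>
The code 0 and code 1 layouts are simple enough to be handled together.
The following Python sketch is illustrative only; the function name
 split_code0_or_code1 is hypothetical, and the packet is assumed to be a bytes
 object whose length N was supplied by the transport layer.
</t>
<figure>
<artwork><![CDATA[
```python
def split_code0_or_code1(packet):
    """Return the list of compressed frames in a code 0 or code 1 packet."""
    if len(packet) < 1:
        raise ValueError("[R1] a packet must contain at least one byte")
    code = packet[0] & 0x03
    payload = packet[1:]            # the N-1 bytes after the TOC byte
    if code == 0:
        frames = [payload]
    elif code == 1:
        if len(payload) % 2:
            raise ValueError("[R3] N-1 must be even for code 1 packets")
        half = len(payload) // 2
        frames = [payload[:half], payload[half:]]
    else:
        raise ValueError("not a code 0 or code 1 packet")
    if any(len(frame) > 1275 for frame in frames):
        raise ValueError("[R2] a frame exceeds 1275 bytes")
    return frames
```
]]></artwork>
</figure>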
739*a58d3d2aSXin Li</section>
740*a58d3d2aSXin Li
741*a58d3d2aSXin Li<section title="Code 2: Two Frames in the Packet, with Different Compressed Sizes">
742*a58d3d2aSXin Li<t anchor="R4">
743*a58d3d2aSXin LiFor code 2 packets, the TOC byte is followed by a one- or two-byte sequence
744*a58d3d2aSXin Li indicating the length of the first frame (marked N1 in <xref target='code2_packet'/>),
745*a58d3d2aSXin Li followed by N1 bytes of compressed data for the first frame.
746*a58d3d2aSXin LiThe remaining N-N1-2 or N-N1-3&nbsp;bytes are the compressed data for the
747*a58d3d2aSXin Li second frame.
748*a58d3d2aSXin LiThis is illustrated in <xref target="code2_packet"/>.
749*a58d3d2aSXin LiA code 2 packet MUST contain enough bytes to represent a valid length.
750*a58d3d2aSXin LiFor example, a 1-byte code 2 packet is always invalid, and a 2-byte code 2
751*a58d3d2aSXin Li packet whose second byte is in the range 252...255 is also invalid.
752*a58d3d2aSXin LiThe length of the first frame, N1, MUST also be no larger than the size of the
753*a58d3d2aSXin Li payload remaining after decoding that length for all code 2 packets&nbsp;[R4].
754*a58d3d2aSXin LiThis makes, for example, a 2-byte code 2 packet with a second byte in the range
755*a58d3d2aSXin Li 1...251 invalid as well (the only valid 2-byte code 2 packet is one where the
756*a58d3d2aSXin Li length of both frames is zero).
757*a58d3d2aSXin Li</t>
758*a58d3d2aSXin Li<figure anchor="code2_packet" title="A Code 2 Packet" align="center">
759*a58d3d2aSXin Li<artwork align="center"><![CDATA[
760*a58d3d2aSXin Li 0                   1                   2                   3
761*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
762*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
763*a58d3d2aSXin Li| config  |s|1|0| N1 (1-2 bytes):                               |
764*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
765*a58d3d2aSXin Li|               Compressed frame 1 (N1 bytes)...                |
766*a58d3d2aSXin Li:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
767*a58d3d2aSXin Li|                               |                               |
768*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
769*a58d3d2aSXin Li|                     Compressed frame 2...                     :
770*a58d3d2aSXin Li:                                                               |
771*a58d3d2aSXin Li|                                                               |
772*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
773*a58d3d2aSXin Li]]></artwork>
774*a58d3d2aSXin Li</figure>
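<t>
A minimal Python sketch of code 2 parsing, including the [R4] checks, is given
 below for illustration.
The function name split_code2 is hypothetical, and the length decoding is
 inlined so that the example is self-contained.
</t>
<figure>
<artwork><![CDATA[
```python
def split_code2(packet):
    """Return [frame1, frame2] for a code 2 packet, or raise ValueError."""
    if len(packet) < 1:
        raise ValueError("[R1] a packet must contain at least one byte")
    if packet[0] & 0x03 != 2:
        raise ValueError("not a code 2 packet")
    if len(packet) < 2:
        raise ValueError("[R4] no room for the length of the first frame")
    first = packet[1]
    pos = 2
    if first >= 252:
        # A second length byte is required.
        if len(packet) < 3:
            raise ValueError("[R4] second length byte is missing")
        n1 = packet[2] * 4 + first
        pos = 3
    else:
        n1 = first
    if n1 > len(packet) - pos:
        raise ValueError("[R4] N1 is larger than the remaining payload")
    frame1 = packet[pos:pos + n1]
    frame2 = packet[pos + n1:]      # N-N1-2 or N-N1-3 bytes
    if len(frame2) > 1275:
        raise ValueError("[R2] second frame exceeds 1275 bytes")
    return [frame1, frame2]
```
]]></artwork>
</figure>
<t>
In this sketch, the only 2-byte code 2 packet that is accepted is one whose
 second byte is zero, which yields two zero-length frames.
</t>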
775*a58d3d2aSXin Li</section>
776*a58d3d2aSXin Li
777*a58d3d2aSXin Li<section title="Code 3: A Signaled Number of Frames in the Packet">
778*a58d3d2aSXin Li<t anchor="R5">
779*a58d3d2aSXin LiCode 3 packets signal the number of frames, as well as additional
780*a58d3d2aSXin Li padding, called "Opus padding" to indicate that this padding is added at the
781*a58d3d2aSXin Li Opus layer, rather than at the transport layer.
782*a58d3d2aSXin LiCode 3 packets MUST have at least 2 bytes&nbsp;[R6,R7].
783*a58d3d2aSXin LiThe TOC byte is followed by a byte encoding the number of frames in the packet
784*a58d3d2aSXin Li in bits 2 to 7 (marked "M" in <xref target='frame_count_byte'/>), with bit 1 indicating whether
785*a58d3d2aSXin Li or not Opus padding is inserted (marked "p" in <xref target='frame_count_byte'/>), and bit 0
786*a58d3d2aSXin Li indicating VBR (marked "v" in <xref target='frame_count_byte'/>).
787*a58d3d2aSXin LiM MUST NOT be zero, and the audio duration contained within a packet MUST NOT
788*a58d3d2aSXin Li exceed 120&nbsp;ms&nbsp;[R5].
789*a58d3d2aSXin LiThis limits the maximum frame count for any frame size to 48 (for 2.5&nbsp;ms
790*a58d3d2aSXin Li frames), with lower limits for longer frame sizes.
791*a58d3d2aSXin Li<xref target="frame_count_byte"/> illustrates the layout of the frame count
792*a58d3d2aSXin Li byte.
793*a58d3d2aSXin Li</t>
794*a58d3d2aSXin Li<figure anchor="frame_count_byte" title="The frame count byte">
795*a58d3d2aSXin Li<artwork align="center"><![CDATA[
796*a58d3d2aSXin Li 0
797*a58d3d2aSXin Li 0 1 2 3 4 5 6 7
798*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+
799*a58d3d2aSXin Li|v|p|     M     |
800*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+
801*a58d3d2aSXin Li]]></artwork>
802*a58d3d2aSXin Li</figure>
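<t>
The frame count byte and the [R5] limits can be sketched in Python as follows.
This is illustrative only: the function name parse_frame_count_byte is
 hypothetical, and the frame duration in milliseconds is assumed to have been
 obtained from the "config" field of the TOC byte beforehand.
</t>
<figure>
<artwork><![CDATA[
```python
def parse_frame_count_byte(value, frame_ms):
    """Return (vbr, padding, m) for a code 3 frame count byte.

    frame_ms is the frame duration implied by the TOC "config" field.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("the frame count byte must be a single byte")
    vbr = bool(value & 0x80)        # bit 0 of the diagram (the MSB): "v"
    padding = bool(value & 0x40)    # bit 1: "p"
    m = value & 0x3F                # bits 2 to 7: "M"
    if m == 0:
        raise ValueError("[R5] M must not be zero")
    if m * frame_ms > 120:
        raise ValueError("[R5] packet would exceed 120 ms of audio")
    return vbr, padding, m
```
]]></artwork>
</figure>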
803*a58d3d2aSXin Li<t>
804*a58d3d2aSXin LiWhen Opus padding is used, the number of bytes of padding is encoded in the
805*a58d3d2aSXin Li bytes following the frame count byte.
806*a58d3d2aSXin LiValues from 0...254 indicate that 0...254&nbsp;bytes of padding are included,
807*a58d3d2aSXin Li in addition to the byte(s) used to indicate the size of the padding.
808*a58d3d2aSXin LiIf the value is 255, then the size of the additional padding is 254&nbsp;bytes,
809*a58d3d2aSXin Li plus the padding value encoded in the next byte.
810*a58d3d2aSXin LiThere MUST be at least one more byte in the packet in this case&nbsp;[R6,R7].
811*a58d3d2aSXin LiThe additional padding bytes appear at the end of the packet, and MUST be set
812*a58d3d2aSXin Li to zero by the encoder to avoid creating a covert channel.
813*a58d3d2aSXin LiThe decoder MUST accept any value for the padding bytes, however.
814*a58d3d2aSXin Li</t>
815*a58d3d2aSXin Li<t>
816*a58d3d2aSXin LiAlthough this encoding provides multiple ways to indicate a given number of
817*a58d3d2aSXin Li padding bytes, each uses a different number of bytes to indicate the padding
818*a58d3d2aSXin Li size, and thus will increase the total packet size by a different amount.
819*a58d3d2aSXin LiFor example, to add 255 bytes to a packet, set the padding bit, p, to 1, insert
820*a58d3d2aSXin Li a single byte after the frame count byte with a value of 254, and append 254
821*a58d3d2aSXin Li padding bytes with the value zero to the end of the packet.
822*a58d3d2aSXin LiTo add 256 bytes to a packet, set the padding bit to 1, insert two bytes after
823*a58d3d2aSXin Li the frame count byte with the values 255 and 0, respectively, and append 254
824*a58d3d2aSXin Li padding bytes with the value zero to the end of the packet.
825*a58d3d2aSXin LiBy using the value 255 multiple times, it is possible to create a packet of any
826*a58d3d2aSXin Li specific, desired size.
827*a58d3d2aSXin LiLet P be the number of header bytes used to indicate the padding size plus the
828*a58d3d2aSXin Li number of padding bytes themselves (i.e., P is the total number of bytes added
829*a58d3d2aSXin Li to the packet).
830*a58d3d2aSXin LiThen P MUST be no more than N-2&nbsp;[R6,R7].
831*a58d3d2aSXin Li</t>
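<t>
The padding length coding can be illustrated with the following Python sketch.
The function names decode_padding_length and encode_padding are hypothetical.
The encoder takes P, the total number of bytes to add to the packet, and
 returns the padding length bytes together with the number of trailing zero
 bytes to append; it assumes that the padding bit has been set to 1.
</t>
<figure>
<artwork><![CDATA[
```python
def decode_padding_length(data, pos):
    """Decode the padding size at data[pos]; return (padding, next_pos)."""
    total = 0
    while True:
        if pos >= len(data):
            raise ValueError("[R6,R7] padding length is truncated")
        value = data[pos]
        pos += 1
        if value == 255:
            total += 254            # 254 bytes, plus whatever follows
        else:
            return total + value, pos


def encode_padding(total_added):
    """Return (length_bytes, zero_bytes) that add total_added bytes (P)."""
    if total_added < 1:
        raise ValueError("at least one padding length byte is needed")
    # Each 255 byte accounts for itself plus 254 zero bytes (255 in all);
    # the last byte accounts for itself plus 'last' zero bytes.
    full, last = divmod(total_added - 1, 255)
    return bytes([255] * full + [last]), 254 * full + last
```
]]></artwork>
</figure>
<t>
Under these assumptions, encode_padding(255) yields the single byte 254 with
 254 zero bytes, and encode_padding(256) yields the bytes 255 and 0 with 254
 zero bytes, matching the two examples in the text.
</t>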
832*a58d3d2aSXin Li<t anchor="R6">
833*a58d3d2aSXin LiIn the CBR case, let R=N-2-P be the number of bytes remaining in the packet
834*a58d3d2aSXin Li after subtracting the (optional) padding.
835*a58d3d2aSXin LiThen the compressed length of each frame in bytes is equal to R/M.
836*a58d3d2aSXin LiThe value R MUST be a non-negative integer multiple of M&nbsp;[R6].
837*a58d3d2aSXin LiThe compressed data for all M frames follows, each of size
838*a58d3d2aSXin Li R/M&nbsp;bytes, as illustrated in <xref target="code3cbr_packet"/>.
839*a58d3d2aSXin Li</t>
840*a58d3d2aSXin Li
841*a58d3d2aSXin Li<figure anchor="code3cbr_packet" title="A CBR Code 3 Packet" align="center">
842*a58d3d2aSXin Li<artwork align="center"><![CDATA[
843*a58d3d2aSXin Li 0                   1                   2                   3
844*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
845*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
846*a58d3d2aSXin Li| config  |s|1|1|0|p|     M     |  Padding length (Optional)    :
847*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
848*a58d3d2aSXin Li|                                                               |
849*a58d3d2aSXin Li:               Compressed frame 1 (R/M bytes)...               :
850*a58d3d2aSXin Li|                                                               |
851*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
852*a58d3d2aSXin Li|                                                               |
853*a58d3d2aSXin Li:               Compressed frame 2 (R/M bytes)...               :
854*a58d3d2aSXin Li|                                                               |
855*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
856*a58d3d2aSXin Li|                                                               |
857*a58d3d2aSXin Li:                              ...                              :
858*a58d3d2aSXin Li|                                                               |
859*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
860*a58d3d2aSXin Li|                                                               |
861*a58d3d2aSXin Li:               Compressed frame M (R/M bytes)...               :
862*a58d3d2aSXin Li|                                                               |
863*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
864*a58d3d2aSXin Li:                  Opus Padding (Optional)...                   |
865*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
866*a58d3d2aSXin Li]]></artwork>
867*a58d3d2aSXin Li</figure>
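<t>
The CBR computation of R and R/M can be sketched in Python as shown below.
This is an illustrative example with a hypothetical function name,
 split_code3_cbr.
It does not check the 120 ms duration limit of [R5], since that requires the
 frame size from the TOC "config" field.
</t>
<figure>
<artwork><![CDATA[
```python
def split_code3_cbr(packet):
    """Return the M equally sized frames of a CBR code 3 packet."""
    n = len(packet)
    if n < 2:
        raise ValueError("[R6] code 3 packets need at least 2 bytes")
    if packet[0] & 0x03 != 3 or packet[1] & 0x80:
        raise ValueError("not a CBR code 3 packet")
    has_padding = bool(packet[1] & 0x40)
    m = packet[1] & 0x3F
    if m == 0:
        raise ValueError("[R5] M must not be zero")
    pos = 2
    pad = 0
    if has_padding:
        while True:
            if pos >= n:
                raise ValueError("[R6] padding length is truncated")
            value = packet[pos]
            pos += 1
            pad += 254 if value == 255 else value
            if value != 255:
                break
    # P = (pos - 2) + pad, so R = N - 2 - P = n - pos - pad.
    r = n - pos - pad
    if r < 0 or r % m:
        raise ValueError("[R6] R is not a non-negative multiple of M")
    size = r // m
    if size > 1275:
        raise ValueError("[R2] a frame exceeds 1275 bytes")
    return [packet[pos + i * size:pos + (i + 1) * size] for i in range(m)]
```
]]></artwork>
</figure>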
868*a58d3d2aSXin Li
869*a58d3d2aSXin Li<t anchor="R7">
870*a58d3d2aSXin LiIn the VBR case, the (optional) padding length is followed by M-1 frame
871*a58d3d2aSXin Li lengths (indicated by "N1" to "N[M-1]" in <xref target='code3vbr_packet'/>), each encoded in a
872*a58d3d2aSXin Li one- or two-byte sequence as described above.
873*a58d3d2aSXin LiThe packet MUST contain enough data for the M-1 lengths after removing the
874*a58d3d2aSXin Li (optional) padding, and the sum of these lengths MUST be no larger than the
875*a58d3d2aSXin Li number of bytes remaining in the packet after decoding them&nbsp;[R7].
876*a58d3d2aSXin LiThe compressed data for all M frames follows, each frame consisting of the
877*a58d3d2aSXin Li indicated number of bytes, with the final frame consuming any remaining bytes
 before the final padding, as illustrated in <xref target="code3vbr_packet"/>.
879*a58d3d2aSXin LiThe number of header bytes (TOC byte, frame count byte, padding length bytes,
880*a58d3d2aSXin Li and frame length bytes), plus the signaled length of the first M-1 frames themselves,
881*a58d3d2aSXin Li plus the signaled length of the padding MUST be no larger than N, the total size of the
882*a58d3d2aSXin Li packet.
883*a58d3d2aSXin Li</t>
884*a58d3d2aSXin Li
885*a58d3d2aSXin Li<figure anchor="code3vbr_packet" title="A VBR Code 3 Packet" align="center">
886*a58d3d2aSXin Li<artwork align="center"><![CDATA[
887*a58d3d2aSXin Li 0                   1                   2                   3
888*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
889*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
890*a58d3d2aSXin Li| config  |s|1|1|1|p|     M     | Padding length (Optional)     :
891*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
892*a58d3d2aSXin Li: N1 (1-2 bytes): N2 (1-2 bytes):     ...       :     N[M-1]    |
893*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
894*a58d3d2aSXin Li|                                                               |
895*a58d3d2aSXin Li:               Compressed frame 1 (N1 bytes)...                :
896*a58d3d2aSXin Li|                                                               |
897*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
898*a58d3d2aSXin Li|                                                               |
899*a58d3d2aSXin Li:               Compressed frame 2 (N2 bytes)...                :
900*a58d3d2aSXin Li|                                                               |
901*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
902*a58d3d2aSXin Li|                                                               |
903*a58d3d2aSXin Li:                              ...                              :
904*a58d3d2aSXin Li|                                                               |
905*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
906*a58d3d2aSXin Li|                                                               |
907*a58d3d2aSXin Li:                     Compressed frame M...                     :
908*a58d3d2aSXin Li|                                                               |
909*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
910*a58d3d2aSXin Li:                  Opus Padding (Optional)...                   |
911*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
912*a58d3d2aSXin Li]]></artwork>
913*a58d3d2aSXin Li</figure>
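<t>
The VBR layout can be sketched in Python in the same way.
The function name split_code3_vbr is hypothetical, and the example is
 illustrative rather than normative.
As with the CBR sketch, the 120 ms limit of [R5] is not checked because the
 frame size is not examined here.
</t>
<figure>
<artwork><![CDATA[
```python
def split_code3_vbr(packet):
    """Return the M frames of a VBR code 3 packet, or raise ValueError."""
    n = len(packet)
    if n < 2:
        raise ValueError("[R7] code 3 packets need at least 2 bytes")
    if packet[0] & 0x03 != 3 or not packet[1] & 0x80:
        raise ValueError("not a VBR code 3 packet")
    has_padding = bool(packet[1] & 0x40)
    m = packet[1] & 0x3F
    if m == 0:
        raise ValueError("[R5] M must not be zero")
    pos = 2
    pad = 0
    if has_padding:
        while True:
            if pos >= n:
                raise ValueError("[R7] padding length is truncated")
            value = packet[pos]
            pos += 1
            pad += 254 if value == 255 else value
            if value != 255:
                break
    end = n - pad                   # index where the trailing padding starts
    lengths = []
    for _ in range(m - 1):          # N1 ... N[M-1]
        if pos >= end:
            raise ValueError("[R7] frame length is missing")
        length = packet[pos]
        pos += 1
        if length >= 252:
            if pos >= end:
                raise ValueError("[R7] second length byte is missing")
            length += packet[pos] * 4
            pos += 1
        lengths.append(length)
    if pos + sum(lengths) > end:
        raise ValueError("[R7] signaled lengths exceed the packet size")
    lengths.append(end - pos - sum(lengths))   # the last frame takes the rest
    if lengths[-1] > 1275:
        raise ValueError("[R2] last frame exceeds 1275 bytes")
    frames = []
    for length in lengths:
        frames.append(packet[pos:pos + length])
        pos += length
    return frames
```
]]></artwork>
</figure>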
914*a58d3d2aSXin Li</section>
915*a58d3d2aSXin Li</section>
916*a58d3d2aSXin Li
917*a58d3d2aSXin Li<section anchor="examples" title="Examples">
918*a58d3d2aSXin Li<t>
919*a58d3d2aSXin LiSimplest case, one NB mono 20&nbsp;ms SILK frame:
920*a58d3d2aSXin Li</t>
921*a58d3d2aSXin Li
922*a58d3d2aSXin Li<figure anchor='framing_example_1'>
923*a58d3d2aSXin Li<artwork><![CDATA[
924*a58d3d2aSXin Li 0                   1                   2                   3
925*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
926*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
927*a58d3d2aSXin Li|    1    |0|0|0|               compressed data...              :
928*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
929*a58d3d2aSXin Li]]></artwork>
930*a58d3d2aSXin Li</figure>
931*a58d3d2aSXin Li
932*a58d3d2aSXin Li<t>
933*a58d3d2aSXin LiTwo FB mono 5&nbsp;ms CELT frames of the same compressed size:
934*a58d3d2aSXin Li</t>
935*a58d3d2aSXin Li
936*a58d3d2aSXin Li<figure anchor='framing_example_2'>
937*a58d3d2aSXin Li<artwork><![CDATA[
938*a58d3d2aSXin Li 0                   1                   2                   3
939*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
940*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
941*a58d3d2aSXin Li|   29    |0|0|1|               compressed data...              :
942*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
943*a58d3d2aSXin Li]]></artwork>
944*a58d3d2aSXin Li</figure>
945*a58d3d2aSXin Li
946*a58d3d2aSXin Li<t>
947*a58d3d2aSXin LiTwo FB mono 20&nbsp;ms Hybrid frames of different compressed size:
948*a58d3d2aSXin Li</t>
949*a58d3d2aSXin Li
950*a58d3d2aSXin Li<figure anchor='framing_example_3'>
951*a58d3d2aSXin Li<artwork><![CDATA[
952*a58d3d2aSXin Li 0                   1                   2                   3
953*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
954*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
955*a58d3d2aSXin Li|   15    |0|1|1|1|0|     2     |      N1       |               |
956*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
957*a58d3d2aSXin Li|                       compressed data...                      :
958*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
959*a58d3d2aSXin Li]]></artwork>
960*a58d3d2aSXin Li</figure>
961*a58d3d2aSXin Li
962*a58d3d2aSXin Li<t>
963*a58d3d2aSXin LiFour FB stereo 20&nbsp;ms CELT frames of the same compressed size:
964*a58d3d2aSXin Li</t>
965*a58d3d2aSXin Li
966*a58d3d2aSXin Li<figure anchor='framing_example_4'>
967*a58d3d2aSXin Li<artwork><![CDATA[
968*a58d3d2aSXin Li 0                   1                   2                   3
969*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
970*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
971*a58d3d2aSXin Li|   31    |1|1|1|0|0|     4     |      compressed data...       :
972*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
973*a58d3d2aSXin Li]]></artwork>
974*a58d3d2aSXin Li</figure>
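<t>
The header bytes of the four examples above can be reproduced with a short
 Python sketch.
The helper names toc_byte and frame_count_byte are hypothetical and are used
 here only to make the bit packing explicit.
</t>
<figure>
<artwork><![CDATA[
```python
def toc_byte(config, stereo, code):
    """Pack the "config", "s", and "c" fields into a TOC byte."""
    if not (0 <= config <= 31 and 0 <= code <= 3):
        raise ValueError("field out of range")
    return (config << 3) | (int(bool(stereo)) << 2) | code


def frame_count_byte(vbr, padding, m):
    """Pack the "v", "p", and "M" fields into a frame count byte."""
    if not 1 <= m <= 48:
        raise ValueError("M must be in the range 1...48")
    return (int(bool(vbr)) << 7) | (int(bool(padding)) << 6) | m


# Header bytes for the four examples, in order of appearance.
example_1 = bytes([toc_byte(1, False, 0)])
example_2 = bytes([toc_byte(29, False, 1)])
example_3 = bytes([toc_byte(15, False, 3), frame_count_byte(True, False, 2)])
example_4 = bytes([toc_byte(31, True, 3), frame_count_byte(False, False, 4)])
```
]]></artwork>
</figure>
<t>
In the third example, the header bytes are followed by the N1 length field
 and then the compressed data; in the other examples, the compressed data
 follows the header bytes directly.
</t>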
975*a58d3d2aSXin Li</section>
976*a58d3d2aSXin Li
977*a58d3d2aSXin Li<section anchor="malformed-packets" title="Receiving Malformed Packets">
978*a58d3d2aSXin Li<t>
979*a58d3d2aSXin LiA receiver MUST NOT process packets which violate any of the rules above as
980*a58d3d2aSXin Li normal Opus packets.
981*a58d3d2aSXin LiThey are reserved for future applications, such as in-band headers (containing
982*a58d3d2aSXin Li metadata, etc.).
983*a58d3d2aSXin LiPackets which violate these constraints may cause implementations of
984*a58d3d2aSXin Li <spanx style="emph">this</spanx> specification to treat them as malformed, and
985*a58d3d2aSXin Li discard them.
986*a58d3d2aSXin Li</t>
987*a58d3d2aSXin Li<t>
988*a58d3d2aSXin LiThese constraints are summarized here for reference:
989*a58d3d2aSXin Li<list style="format [R%d]">
990*a58d3d2aSXin Li<t>Packets are at least one byte.</t>
991*a58d3d2aSXin Li<t>No implicit frame length is larger than 1275 bytes.</t>
992*a58d3d2aSXin Li<t>Code 1 packets have an odd total length, N, so that (N-1)/2 is an
993*a58d3d2aSXin Li integer.</t>
994*a58d3d2aSXin Li<t>Code 2 packets have enough bytes after the TOC for a valid frame
995*a58d3d2aSXin Li length, and that length is no larger than the number of bytes remaining in the
996*a58d3d2aSXin Li packet.</t>
997*a58d3d2aSXin Li<t>Code 3 packets contain at least one frame, but no more than 120&nbsp;ms
998*a58d3d2aSXin Li of audio total.</t>
999*a58d3d2aSXin Li<t>The length of a CBR code 3 packet, N, is at least two bytes, the number of
1000*a58d3d2aSXin Li bytes added to indicate the padding size plus the trailing padding bytes
1001*a58d3d2aSXin Li themselves, P, is no more than N-2, and the frame count, M, satisfies
1002*a58d3d2aSXin Li the constraint that (N-2-P) is a non-negative integer multiple of M.</t>
1003*a58d3d2aSXin Li<t>VBR code 3 packets are large enough to contain all the header bytes (TOC
1004*a58d3d2aSXin Li byte, frame count byte, any padding length bytes, and any frame length bytes),
1005*a58d3d2aSXin Li plus the length of the first M-1 frames, plus any trailing padding bytes.</t>
1006*a58d3d2aSXin Li</list>
1007*a58d3d2aSXin Li</t>
1008*a58d3d2aSXin Li</section>
1009*a58d3d2aSXin Li
1010*a58d3d2aSXin Li</section>
1011*a58d3d2aSXin Li
1012*a58d3d2aSXin Li<section title="Opus Decoder">
1013*a58d3d2aSXin Li<t>
1014*a58d3d2aSXin LiThe Opus decoder consists of two main blocks: the SILK decoder and the CELT
1015*a58d3d2aSXin Li decoder.
1016*a58d3d2aSXin LiAt any given time, one or both of the SILK and CELT decoders may be active.
The output of the Opus decoder is the sum of the outputs from the SILK and CELT
1018*a58d3d2aSXin Li decoders with proper sample rate conversion and delay compensation on the SILK
1019*a58d3d2aSXin Li side, and optional decimation (when decoding to sample rates less than
1020*a58d3d2aSXin Li 48&nbsp;kHz) on the CELT side, as illustrated in the block diagram below.
1021*a58d3d2aSXin Li</t>
1022*a58d3d2aSXin Li<figure>
1023*a58d3d2aSXin Li<artwork>
1024*a58d3d2aSXin Li<![CDATA[
1025*a58d3d2aSXin Li                         +---------+    +------------+
1026*a58d3d2aSXin Li                         |  SILK   |    |   Sample   |
1027*a58d3d2aSXin Li                      +->| Decoder |--->|    Rate    |----+
1028*a58d3d2aSXin LiBit-    +---------+   |  |         |    | Conversion |    v
1029*a58d3d2aSXin Listream  |  Range  |---+  +---------+    +------------+  /---\  Audio
1030*a58d3d2aSXin Li------->| Decoder |                                     | + |------>
1031*a58d3d2aSXin Li        |         |---+  +---------+    +------------+  \---/
1032*a58d3d2aSXin Li        +---------+   |  |  CELT   |    | Decimation |    ^
1033*a58d3d2aSXin Li                      +->| Decoder |--->| (Optional) |----+
1034*a58d3d2aSXin Li                         |         |    |            |
1035*a58d3d2aSXin Li                         +---------+    +------------+
1036*a58d3d2aSXin Li]]>
1037*a58d3d2aSXin Li</artwork>
1038*a58d3d2aSXin Li</figure>
1039*a58d3d2aSXin Li
1040*a58d3d2aSXin Li<section anchor="range-decoder" title="Range Decoder">
1041*a58d3d2aSXin Li<t>
1042*a58d3d2aSXin LiOpus uses an entropy coder based on range coding <xref target="range-coding"></xref>
1043*a58d3d2aSXin Li<xref target="Martin79"></xref>,
1044*a58d3d2aSXin Liwhich is itself a rediscovery of the FIFO arithmetic code introduced by <xref target="coding-thesis"></xref>.
1045*a58d3d2aSXin LiIt is very similar to arithmetic encoding, except that encoding is done with
1046*a58d3d2aSXin Lidigits in any base instead of with bits,
1047*a58d3d2aSXin Liso it is faster when using larger bases (i.e., a byte). All of the
1048*a58d3d2aSXin Licalculations in the range coder must use bit-exact integer arithmetic.
1049*a58d3d2aSXin Li</t>
1050*a58d3d2aSXin Li<t>
1051*a58d3d2aSXin LiSymbols may also be coded as "raw bits" packed directly into the bitstream,
1052*a58d3d2aSXin Li bypassing the range coder.
1053*a58d3d2aSXin LiThese are packed backwards starting at the end of the frame, as illustrated in
1054*a58d3d2aSXin Li <xref target="rawbits-example"/>.
1055*a58d3d2aSXin LiThis reduces complexity and makes the stream more resilient to bit errors, as
1056*a58d3d2aSXin Li corruption in the raw bits will not desynchronize the decoding process, unlike
1057*a58d3d2aSXin Li corruption in the input to the range decoder.
1058*a58d3d2aSXin LiRaw bits are only used in the CELT layer.
1059*a58d3d2aSXin Li</t>
1060*a58d3d2aSXin Li
1061*a58d3d2aSXin Li<figure anchor="rawbits-example" title="Illustrative example of packing range
1062*a58d3d2aSXin Li coder and raw bits data">
1063*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1064*a58d3d2aSXin Li 0                   1                   2                   3
1065*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1066*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1067*a58d3d2aSXin Li| Range coder data (packed MSB to LSB) ->                       :
1068*a58d3d2aSXin Li+                                                               +
1069*a58d3d2aSXin Li:                                                               :
1070*a58d3d2aSXin Li+     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1071*a58d3d2aSXin Li:     | <- Boundary occurs at an arbitrary bit position         :
1072*a58d3d2aSXin Li+-+-+-+                                                         +
1073*a58d3d2aSXin Li:                          <- Raw bits data (packed LSB to MSB) |
1074*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1075*a58d3d2aSXin Li]]></artwork>
1076*a58d3d2aSXin Li</figure>
1077*a58d3d2aSXin Li
1078*a58d3d2aSXin Li<t>
1079*a58d3d2aSXin LiEach symbol coded by the range coder is drawn from a finite alphabet and coded
1080*a58d3d2aSXin Li in a separate "context", which describes the size of the alphabet and the
1081*a58d3d2aSXin Li relative frequency of each symbol in that alphabet.
1082*a58d3d2aSXin Li</t>
1083*a58d3d2aSXin Li<t>
1084*a58d3d2aSXin LiSuppose there is a context with n symbols, identified with an index that ranges
1085*a58d3d2aSXin Li from 0 to n-1.
1086*a58d3d2aSXin LiThe parameters needed to encode or decode symbol k in this context are
1087*a58d3d2aSXin Li represented by a three-tuple (fl[k],&nbsp;fh[k],&nbsp;ft), with
1088*a58d3d2aSXin Li 0&nbsp;&lt;=&nbsp;fl[k]&nbsp;&lt;&nbsp;fh[k]&nbsp;&lt;=&nbsp;ft&nbsp;&lt;=&nbsp;65535.
1089*a58d3d2aSXin LiThe values of this tuple are derived from the probability model for the
1090*a58d3d2aSXin Li symbol, represented by traditional "frequency counts".
Because Opus uses static contexts, these are not updated as symbols are decoded.
1092*a58d3d2aSXin LiLet f[i] be the frequency of symbol i.
1093*a58d3d2aSXin LiThen the three-tuple corresponding to symbol k is given by
1094*a58d3d2aSXin Li</t>
1095*a58d3d2aSXin Li<figure align="center">
1096*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1097*a58d3d2aSXin Li        k-1                                   n-1
1098*a58d3d2aSXin Li        __                                    __
1099*a58d3d2aSXin Lifl[k] = \  f[i],  fh[k] = fl[k] + f[k],  ft = \  f[i]
1100*a58d3d2aSXin Li        /_                                    /_
1101*a58d3d2aSXin Li        i=0                                   i=0
1102*a58d3d2aSXin Li]]></artwork>
1103*a58d3d2aSXin Li</figure>
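<t>
The sums above can be sketched in C as follows.
This is an illustration only, not part of the reference implementation: the
 type freq_tuple and the function tuple_from_pdf() are hypothetical names
 introduced here, and the frequency counts are assumed to be stored in an
 array of 16-bit values.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the tuple (fl[k], fh[k], ft). */
typedef struct {
  uint32_t fl;
  uint32_t fh;
  uint32_t ft;
} freq_tuple;

/* Derives the tuple for symbol k from the counts f[0..n-1]. */
static freq_tuple tuple_from_pdf(const uint16_t *f, size_t n,
                                 size_t k) {
  freq_tuple t = {0, 0, 0};
  size_t i;
  assert(k < n);
  for (i = 0; i < n; i++) {
    if (i < k) t.fl += f[i]; /* sum of f[0..k-1] */
    t.ft += f[i];            /* sum of f[0..n-1] */
  }
  t.fh = t.fl + f[k];
  assert(t.ft <= 65535);     /* limit stated in the text */
  return t;
}
```
]]></artwork>
</figure>
<t>
For the uniform context {4, 4, 4, 4}/16, symbol 2 yields the tuple
 (8,&nbsp;12,&nbsp;16).
</t>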
1104*a58d3d2aSXin Li<t>
1105*a58d3d2aSXin LiThe range decoder extracts the symbols and integers encoded using the range
1106*a58d3d2aSXin Li encoder in <xref target="range-encoder"/>.
1107*a58d3d2aSXin LiThe range decoder maintains an internal state vector composed of the two-tuple
1108*a58d3d2aSXin Li (val,&nbsp;rng), representing the difference between the high end of the
1109*a58d3d2aSXin Li current range and the actual coded value, minus one, and the size of the
1110*a58d3d2aSXin Li current range, respectively.
1111*a58d3d2aSXin LiBoth val and rng are 32-bit unsigned integer values.
1112*a58d3d2aSXin Li</t>
1113*a58d3d2aSXin Li
1114*a58d3d2aSXin Li<section anchor="range-decoder-init" title="Range Decoder Initialization">
1115*a58d3d2aSXin Li<t>
1116*a58d3d2aSXin LiLet b0 be the first input byte (or zero if there are no bytes in this Opus
1117*a58d3d2aSXin Li frame).
1118*a58d3d2aSXin LiThe decoder initializes rng to 128 and initializes val to
1119*a58d3d2aSXin Li (127&nbsp;-&nbsp;(b0&gt;&gt;1)), where (b0&gt;&gt;1) is the top 7 bits of the
1120*a58d3d2aSXin Li first input byte.
1121*a58d3d2aSXin LiIt saves the remaining bit, (b0&amp;1), for use in the renormalization
1122*a58d3d2aSXin Li procedure described in <xref target="range-decoder-renorm"/>, which the
1123*a58d3d2aSXin Li decoder invokes immediately after initialization to read additional bits and
1124*a58d3d2aSXin Li establish the invariant that rng&nbsp;&gt;&nbsp;2**23.
1125*a58d3d2aSXin Li</t>
1126*a58d3d2aSXin Li</section>
1127*a58d3d2aSXin Li
1128*a58d3d2aSXin Li<section anchor="decoding-symbols" title="Decoding Symbols">
1129*a58d3d2aSXin Li<t>
1130*a58d3d2aSXin LiDecoding a symbol is a two-step process.
1131*a58d3d2aSXin LiThe first step determines a 16-bit unsigned value fs, which lies within the
1132*a58d3d2aSXin Li range of some symbol in the current context.
1133*a58d3d2aSXin LiThe second step updates the range decoder state with the three-tuple
1134*a58d3d2aSXin Li (fl[k],&nbsp;fh[k],&nbsp;ft) corresponding to that symbol.
1135*a58d3d2aSXin Li</t>
1136*a58d3d2aSXin Li<t>
1137*a58d3d2aSXin LiThe first step is implemented by ec_decode() (entdec.c), which computes
1138*a58d3d2aSXin Li<figure align="center">
1139*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1140*a58d3d2aSXin Li               val
1141*a58d3d2aSXin Lifs = ft - min(------ + 1, ft) .
1142*a58d3d2aSXin Li              rng/ft
1143*a58d3d2aSXin Li]]></artwork>
1144*a58d3d2aSXin Li</figure>
Both divisions here are integer divisions.
1146*a58d3d2aSXin Li</t>
1147*a58d3d2aSXin Li<t>
1148*a58d3d2aSXin LiThe decoder then identifies the symbol in the current context corresponding to
1149*a58d3d2aSXin Li fs; i.e., the value of k whose three-tuple (fl[k],&nbsp;fh[k],&nbsp;ft)
1150*a58d3d2aSXin Li satisfies fl[k]&nbsp;&lt;=&nbsp;fs&nbsp;&lt;&nbsp;fh[k].
1151*a58d3d2aSXin LiIt uses this tuple to update val according to
1152*a58d3d2aSXin Li<figure align="center">
1153*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1154*a58d3d2aSXin Li            rng
1155*a58d3d2aSXin Lival = val - --- * (ft - fh[k]) .
1156*a58d3d2aSXin Li            ft
1157*a58d3d2aSXin Li]]></artwork>
1158*a58d3d2aSXin Li</figure>
1159*a58d3d2aSXin LiIf fl[k] is greater than zero, then the decoder updates rng using
1160*a58d3d2aSXin Li<figure align="center">
1161*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1162*a58d3d2aSXin Li      rng
1163*a58d3d2aSXin Lirng = --- * (fh[k] - fl[k]) .
1164*a58d3d2aSXin Li      ft
1165*a58d3d2aSXin Li]]></artwork>
1166*a58d3d2aSXin Li</figure>
1167*a58d3d2aSXin LiOtherwise, it updates rng using
1168*a58d3d2aSXin Li<figure align="center">
1169*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1170*a58d3d2aSXin Li            rng
1171*a58d3d2aSXin Lirng = rng - --- * (ft - fh[k]) .
1172*a58d3d2aSXin Li            ft
1173*a58d3d2aSXin Li]]></artwork>
1174*a58d3d2aSXin Li</figure>
1175*a58d3d2aSXin Li</t>
1176*a58d3d2aSXin Li<t>
1177*a58d3d2aSXin LiUsing a special case for the first symbol (rather than the last symbol, as is
1178*a58d3d2aSXin Li commonly done in other arithmetic coders) ensures that all the truncation
1179*a58d3d2aSXin Li error from the finite precision arithmetic accumulates in symbol 0.
1180*a58d3d2aSXin LiThis makes the cost of coding a 0 slightly smaller, on average, than its
1181*a58d3d2aSXin Li estimated probability indicates and makes the cost of coding any other symbol
1182*a58d3d2aSXin Li slightly larger.
1183*a58d3d2aSXin LiWhen contexts are designed so that 0 is the most probable symbol, which is
1184*a58d3d2aSXin Li often the case, this strategy minimizes the inefficiency introduced by the
1185*a58d3d2aSXin Li finite precision.
1186*a58d3d2aSXin LiIt also makes some of the special-case decoding routines in
1187*a58d3d2aSXin Li <xref target="decoding-alternate"/> particularly simple.
1188*a58d3d2aSXin Li</t>
1189*a58d3d2aSXin Li<t>
1190*a58d3d2aSXin LiAfter the updates, implemented by ec_dec_update() (entdec.c), the decoder
1191*a58d3d2aSXin Li normalizes the range using the procedure in the next section, and returns the
1192*a58d3d2aSXin Li index k.
1193*a58d3d2aSXin Li</t>
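<t>
As a rough illustration of the two steps, the following C sketch decodes one
 symbol directly from a list of frequency counts.
The names rc_state, rc_decode(), rc_update(), and rc_decode_symbol() are
 hypothetical and do not appear in the reference implementation; the sketch
 also assumes the caller performs the renormalization afterwards, which is
 deliberately left out here.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state: the two-tuple (val, rng). */
typedef struct {
  uint32_t val;
  uint32_t rng;
} rc_state;

/* Step 1: fs = ft - min(val/(rng/ft) + 1, ft). */
static uint32_t rc_decode(const rc_state *s, uint32_t ft) {
  uint32_t r = s->rng / ft;
  uint32_t q = s->val / r + 1;
  return ft - (q < ft ? q : ft);
}

/* Step 2: update (val, rng) with the tuple (fl, fh, ft). */
static void rc_update(rc_state *s, uint32_t fl, uint32_t fh,
                      uint32_t ft) {
  uint32_t r = s->rng / ft;
  s->val -= r * (ft - fh);
  if (fl > 0) {
    s->rng = r * (fh - fl);
  } else {
    /* Symbol 0 absorbs the truncation error of rng/ft. */
    s->rng -= r * (ft - fh);
  }
}

/* Decodes one symbol from the counts f[0..n-1].
   Renormalization is NOT performed by this sketch. */
static size_t rc_decode_symbol(rc_state *s, const uint16_t *f,
                               size_t n) {
  uint32_t ft = 0, fl = 0, fs;
  size_t i, k = 0;
  for (i = 0; i < n; i++) ft += f[i];
  assert(ft > 0 && ft <= 65535 && s->rng > (1u << 23));
  fs = rc_decode(s, ft);
  /* Find k such that fl[k] <= fs < fh[k]. */
  while (fl + f[k] <= fs) fl += f[k++];
  rc_update(s, fl, fl + f[k], ft);
  return k;
}
```
]]></artwork>
</figure>
<t>
The linear search is chosen for clarity; since fs is always less than ft, it
 terminates before running past the last symbol.
</t>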
1194*a58d3d2aSXin Li
1195*a58d3d2aSXin Li<section anchor="range-decoder-renorm" title="Renormalization">
1196*a58d3d2aSXin Li<t>
1197*a58d3d2aSXin LiTo normalize the range, the decoder repeats the following process, implemented
1198*a58d3d2aSXin Li by ec_dec_normalize() (entdec.c), until rng&nbsp;&gt;&nbsp;2**23.
1199*a58d3d2aSXin LiIf rng is already greater than 2**23, the entire process is skipped.
1200*a58d3d2aSXin LiFirst, it sets rng to (rng&lt;&lt;8).
1201*a58d3d2aSXin LiThen it reads the next byte of the Opus frame and forms an 8-bit value sym,
1202*a58d3d2aSXin Li using the left-over bit buffered from the previous byte as the high bit
1203*a58d3d2aSXin Li and the top 7 bits of the byte just read as the other 7 bits of sym.
1204*a58d3d2aSXin LiThe remaining bit in the byte just read is buffered for use in the next
1205*a58d3d2aSXin Li iteration.
1206*a58d3d2aSXin LiIf no more input bytes remain, it uses zero bits instead.
1207*a58d3d2aSXin LiSee <xref target="range-decoder-init"/> for the initialization used to process
1208*a58d3d2aSXin Li the first byte.
1209*a58d3d2aSXin LiThen, it sets
1210*a58d3d2aSXin Li<figure align="center">
1211*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1212*a58d3d2aSXin Lival = ((val<<8) + (255-sym)) & 0x7FFFFFFF .
1213*a58d3d2aSXin Li]]></artwork>
1214*a58d3d2aSXin Li</figure>
1215*a58d3d2aSXin Li</t>
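<t>
The initialization and renormalization steps might be sketched in C as shown
 below.
The structure rc_reader and the functions rc_read_byte(), rc_normalize(), and
 rc_init() are hypothetical names chosen for this example, and keeping the
 whole previous byte in a field called rem (of which only the low bit is
 used) is an assumption of the sketch rather than a requirement.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state; field names are illustrative. */
typedef struct {
  const unsigned char *buf; /* bytes of the current Opus frame */
  size_t size;              /* number of bytes in buf */
  size_t offs;              /* next byte to read */
  unsigned rem;             /* last byte read; bit 0 is unused */
  uint32_t val;
  uint32_t rng;
} rc_reader;

/* Returns the next frame byte, or zero once none remain. */
static unsigned rc_read_byte(rc_reader *d) {
  return d->offs < d->size ? d->buf[d->offs++] : 0;
}

/* Repeats the renormalization step until rng > 2**23. */
static void rc_normalize(rc_reader *d) {
  while (d->rng <= (1u << 23)) {
    unsigned sym;
    d->rng <<= 8;
    /* High bit: left-over bit of the previous byte. */
    sym = (d->rem & 1) << 7;
    d->rem = rc_read_byte(d);
    /* Low 7 bits: top 7 bits of the byte just read. */
    sym |= d->rem >> 1;
    d->val = ((d->val << 8) + (255 - sym)) & 0x7FFFFFFFu;
  }
}

/* Initializes rng to 128 and val to 127 - (b0>>1), then
   renormalizes to establish rng > 2**23. */
static void rc_init(rc_reader *d, const unsigned char *buf,
                    size_t size) {
  assert(buf != NULL || size == 0);
  d->buf = buf;
  d->size = size;
  d->offs = 0;
  d->rem = rc_read_byte(d); /* b0, or zero for an empty frame */
  d->rng = 128;
  d->val = 127 - (d->rem >> 1);
  rc_normalize(d);
}
```
]]></artwork>
</figure>
<t>
Starting from rng&nbsp;=&nbsp;128, the loop runs three times during
 initialization, so four bytes of the frame are consumed in total and rng
 ends at 2**31.
</t>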
1216*a58d3d2aSXin Li<t>
1217*a58d3d2aSXin LiIt is normal and expected that the range decoder will read several bytes
1218*a58d3d2aSXin Li into the raw bits data (if any) at the end of the packet by the time the frame
1219*a58d3d2aSXin Li is completely decoded, as illustrated in <xref target="finalize-example"/>.
1220*a58d3d2aSXin LiThis same data MUST also be returned as raw bits when requested.
1221*a58d3d2aSXin LiThe encoder is expected to terminate the stream in such a way that the decoder
1222*a58d3d2aSXin Li will decode the intended values regardless of the data contained in the raw
1223*a58d3d2aSXin Li bits.
1224*a58d3d2aSXin Li<xref target="encoder-finalizing"/> describes a procedure for doing this.
1225*a58d3d2aSXin LiIf the range decoder consumes all of the bytes belonging to the current frame,
1226*a58d3d2aSXin Li it MUST continue to use zero when any further input bytes are required, even
1227*a58d3d2aSXin Li if there is additional data in the current packet from padding or other
1228*a58d3d2aSXin Li frames.
1229*a58d3d2aSXin Li</t>
1230*a58d3d2aSXin Li
1231*a58d3d2aSXin Li<figure anchor="finalize-example" title="Illustrative example of raw bits
1232*a58d3d2aSXin Li overlapping range coder data">
1233*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1234*a58d3d2aSXin Li n              n+1             n+2             n+3
1235*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
1236*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1237*a58d3d2aSXin Li:     | <----------- Overlap region ------------> |             :
1238*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1239*a58d3d2aSXin Li      ^                                           ^
1240*a58d3d2aSXin Li      |   End of data buffered by the range coder |
1241*a58d3d2aSXin Li...-----------------------------------------------+
1242*a58d3d2aSXin Li      |
1243*a58d3d2aSXin Li      | End of data consumed by raw bits
1244*a58d3d2aSXin Li      +-------------------------------------------------------...
1245*a58d3d2aSXin Li]]></artwork>
1246*a58d3d2aSXin Li</figure>
1247*a58d3d2aSXin Li</section>
1248*a58d3d2aSXin Li</section>
1249*a58d3d2aSXin Li
1250*a58d3d2aSXin Li<section anchor="decoding-alternate" title="Alternate Decoding Methods">
1251*a58d3d2aSXin Li<t>
1252*a58d3d2aSXin LiThe reference implementation uses three additional decoding methods that are
1253*a58d3d2aSXin Li exactly equivalent to the above, but make assumptions and simplifications that
1254*a58d3d2aSXin Li allow for a more efficient implementation.
1255*a58d3d2aSXin Li</t>
1256*a58d3d2aSXin Li<section anchor="ec_decode_bin" title="ec_decode_bin()">
1257*a58d3d2aSXin Li<t>
1258*a58d3d2aSXin LiThe first is ec_decode_bin() (entdec.c), defined using the parameter ftb
1259*a58d3d2aSXin Li instead of ft.
1260*a58d3d2aSXin LiIt is mathematically equivalent to calling ec_decode() with
1261*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;ftb), but avoids one of the divisions.
1262*a58d3d2aSXin Li</t>
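<t>
To make the saving concrete, the sketch below places the general computation
 of fs next to a power-of-two variant in which rng/ft becomes a right shift.
The function names fs_general() and fs_bin() are hypothetical, and the bound
 ftb&nbsp;&lt;=&nbsp;15 is an assumption derived from the limit
 ft&nbsp;&lt;=&nbsp;65535 given earlier.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* General form: fs = ft - min(val/(rng/ft) + 1, ft). */
static uint32_t fs_general(uint32_t val, uint32_t rng,
                           uint32_t ft) {
  uint32_t q = val / (rng / ft) + 1;
  return ft - (q < ft ? q : ft);
}

/* Power-of-two form: rng/ft is replaced by rng >> ftb, so
   only one division remains. */
static uint32_t fs_bin(uint32_t val, uint32_t rng,
                       unsigned ftb) {
  uint32_t ft = (uint32_t)1 << ftb;
  uint32_t q;
  assert(ftb <= 15 && rng > (1u << 23));
  q = val / (rng >> ftb) + 1;
  return ft - (q < ft ? q : ft);
}
```
]]></artwork>
</figure>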
1263*a58d3d2aSXin Li</section>
1264*a58d3d2aSXin Li<section anchor="ec_dec_bit_logp" title="ec_dec_bit_logp()">
1265*a58d3d2aSXin Li<t>
1266*a58d3d2aSXin LiThe next is ec_dec_bit_logp() (entdec.c), which decodes a single binary symbol,
1267*a58d3d2aSXin Li replacing both the ec_decode() and ec_dec_update() steps.
1268*a58d3d2aSXin LiThe context is described by a single parameter, logp, which is the absolute
1269*a58d3d2aSXin Li value of the base-2 logarithm of the probability of a "1".
1270*a58d3d2aSXin LiIt is mathematically equivalent to calling ec_decode() with
1271*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;logp), followed by ec_dec_update() with
1272*a58d3d2aSXin Li the 3-tuple (fl[k]&nbsp;=&nbsp;0,
1273*a58d3d2aSXin Li fh[k]&nbsp;=&nbsp;(1&lt;&lt;logp)&nbsp;-&nbsp;1,
1274*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;logp)) if the returned value
1275*a58d3d2aSXin Li of fs is less than (1&lt;&lt;logp)&nbsp;-&nbsp;1 (a "0" was decoded), and with
1276*a58d3d2aSXin Li (fl[k]&nbsp;=&nbsp;(1&lt;&lt;logp)&nbsp;-&nbsp;1,
1277*a58d3d2aSXin Li fh[k]&nbsp;=&nbsp;ft&nbsp;=&nbsp;(1&lt;&lt;logp)) otherwise (a "1" was
1278*a58d3d2aSXin Li decoded).
1279*a58d3d2aSXin LiThe implementation requires no multiplications or divisions.
1280*a58d3d2aSXin Li</t>
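<t>
One way to see why no multiplication or division is needed is the following
 C sketch.
It is not the reference code: rc_state and rc_dec_bit_logp() are hypothetical
 names, the range 1&nbsp;&lt;=&nbsp;logp&nbsp;&lt;=&nbsp;15 is an assumption
 of the example, and the renormalization that must follow is omitted.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder state: the two-tuple (val, rng). */
typedef struct {
  uint32_t val;
  uint32_t rng;
} rc_state;

/* Decodes one binary symbol whose "1" has probability
   2**(-logp). Renormalization is NOT performed here. */
static int rc_dec_bit_logp(rc_state *s, unsigned logp) {
  uint32_t r;
  int bit;
  assert(logp >= 1 && logp <= 15 && s->rng > (1u << 23));
  r = s->rng >> logp; /* rng/ft with ft = 1<<logp */
  bit = s->val < r;   /* fs == ft-1 exactly when val < r */
  if (bit) {
    s->rng = r;       /* tuple (ft-1, ft, ft) */
  } else {
    s->val -= r;      /* tuple (0, ft-1, ft) */
    s->rng -= r;
  }
  return bit;
}
```
]]></artwork>
</figure>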
1281*a58d3d2aSXin Li</section>
1282*a58d3d2aSXin Li<section anchor="ec_dec_icdf" title="ec_dec_icdf()">
1283*a58d3d2aSXin Li<t>
1284*a58d3d2aSXin LiThe last is ec_dec_icdf() (entdec.c), which decodes a single symbol with a
1285*a58d3d2aSXin Li table-based context of up to 8 bits, also replacing both the ec_decode() and
1286*a58d3d2aSXin Li ec_dec_update() steps, as well as the search for the decoded symbol in between.
1287*a58d3d2aSXin LiThe context is described by two parameters, an icdf
1288*a58d3d2aSXin Li ("inverse" cumulative distribution function) table and ftb.
1289*a58d3d2aSXin LiAs with ec_decode_bin(), (1&lt;&lt;ftb) is equivalent to ft.
icdf[k], on the other hand, stores (1&lt;&lt;ftb)-fh[k], which is equal to
1291*a58d3d2aSXin Li (1&lt;&lt;ftb)&nbsp;-&nbsp;fl[k+1].
1292*a58d3d2aSXin Lifl[0] is assumed to be 0, and the table is terminated by a value of 0 (where
1293*a58d3d2aSXin Li fh[k]&nbsp;==&nbsp;ft).
1294*a58d3d2aSXin Li</t>
1295*a58d3d2aSXin Li<t>
1296*a58d3d2aSXin LiThe function is mathematically equivalent to calling ec_decode() with
1297*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;ftb), using the returned value fs to search the table
1298*a58d3d2aSXin Li for the first entry where fs&nbsp;&lt;&nbsp;(1&lt;&lt;ftb)-icdf[k], and
1299*a58d3d2aSXin Li calling ec_dec_update() with
1300*a58d3d2aSXin Li fl[k]&nbsp;=&nbsp;(1&lt;&lt;ftb)&nbsp;-&nbsp;icdf[k-1] (or 0
 if k&nbsp;==&nbsp;0), fh[k]&nbsp;=&nbsp;(1&lt;&lt;ftb)&nbsp;-&nbsp;icdf[k],
1302*a58d3d2aSXin Li and ft&nbsp;=&nbsp;(1&lt;&lt;ftb).
1303*a58d3d2aSXin LiCombining the search with the update allows the division to be replaced by a
1304*a58d3d2aSXin Li series of multiplications (which are usually much cheaper), and using an
1305*a58d3d2aSXin Li inverse CDF allows the use of an ftb as large as 8 in an 8-bit table without
1306*a58d3d2aSXin Li any special cases.
1307*a58d3d2aSXin LiThis is the primary interface with the range decoder in the SILK layer, though
1308*a58d3d2aSXin Li it is used in a few places in the CELT layer as well.
1309*a58d3d2aSXin Li</t>
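<t>
The combined search and update can be sketched in C as follows.
The names rc_state and rc_dec_icdf() are hypothetical, the table is assumed
 to hold 8-bit entries ending in 0, and renormalization is again left to the
 caller.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder state: the two-tuple (val, rng). */
typedef struct {
  uint32_t val;
  uint32_t rng;
} rc_state;

/* Decodes one symbol using icdf[k] = (1<<ftb) - fh[k].
   The table must end with 0. No renormalization here. */
static int rc_dec_icdf(rc_state *s, const uint8_t *icdf,
                       unsigned ftb) {
  uint32_t r, prev, cur;
  int k = -1;
  assert(ftb <= 8 && s->rng > (1u << 23));
  r = s->rng >> ftb;
  cur = s->rng;
  do {
    prev = cur;            /* rng for k == 0, else r*(ft-fl[k]) */
    cur = r * icdf[++k];   /* r*(ft-fh[k]) */
  } while (s->val < cur);
  s->val -= cur;
  s->rng = prev - cur;
  return k;
}
```
]]></artwork>
</figure>
<t>
Because prev starts out as rng itself, the special case for symbol 0 needs
 no separate branch in this formulation.
The table {12, 8, 4, 0} with ftb&nbsp;=&nbsp;4 corresponds to the uniform
 context {4, 4, 4, 4}/16.
</t>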
1310*a58d3d2aSXin Li<t>
1311*a58d3d2aSXin LiAlthough icdf[k] is more convenient for the code, the frequency counts, f[k],
1312*a58d3d2aSXin Li are a more natural representation of the probability distribution function
1313*a58d3d2aSXin Li (PDF) for a given symbol.
Therefore, this draft lists the latter, not the former, when describing the
1315*a58d3d2aSXin Li context in which a symbol is coded as a list, e.g., {4, 4, 4, 4}/16 for a
1316*a58d3d2aSXin Li uniform context with four possible values and ft&nbsp;=&nbsp;16.
1317*a58d3d2aSXin LiThe value of ft after the slash is always the sum of the entries in the PDF,
1318*a58d3d2aSXin Li but is included for convenience.
1319*a58d3d2aSXin LiContexts with identical probabilities, f[k]/ft, but different values of ft
1320*a58d3d2aSXin Li (or equivalently, ftb) are not the same, and cannot, in general, be used in
1321*a58d3d2aSXin Li place of one another.
1322*a58d3d2aSXin LiAn icdf table is also not capable of representing a PDF where the first symbol
1323*a58d3d2aSXin Li has 0 probability.
1324*a58d3d2aSXin LiIn such contexts, ec_dec_icdf() can decode the symbol by using a table that
1325*a58d3d2aSXin Li drops the entries for any initial zero-probability values and adding the
1326*a58d3d2aSXin Li constant offset of the first value with a non-zero probability to its return
1327*a58d3d2aSXin Li value.
1328*a58d3d2aSXin Li</t>
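<t>
The relationship between a listed PDF and its icdf table can be illustrated
 with a small conversion routine.
The function icdf_from_pdf() is a hypothetical helper written for this
 example; it assumes the counts sum to exactly (1&lt;&lt;ftb) and rejects a
 PDF whose first symbol has zero probability.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills icdf[0..n-1] from the counts f[0..n-1].
   Returns 0 on success, or -1 if the PDF cannot be
   represented (first count is zero, or the counts do not
   sum to 1<<ftb). */
static int icdf_from_pdf(const uint16_t *f, size_t n,
                         unsigned ftb, uint8_t *icdf) {
  uint32_t ft = (uint32_t)1 << ftb;
  uint32_t fh = 0;
  size_t k;
  assert(ftb <= 8);
  if (n == 0 || f[0] == 0) return -1;
  for (k = 0; k < n; k++) {
    fh += f[k];
    if (fh > ft) return -1;
    icdf[k] = (uint8_t)(ft - fh); /* at most 255 here */
  }
  return fh == ft ? 0 : -1;
}
```
]]></artwork>
</figure>
<t>
With ftb&nbsp;=&nbsp;8, a first entry of 256 would not fit in 8 bits, which
 is one reason the first symbol must have a non-zero count.
</t>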
1329*a58d3d2aSXin Li</section>
1330*a58d3d2aSXin Li</section>
1331*a58d3d2aSXin Li
1332*a58d3d2aSXin Li<section anchor="decoding-bits" title="Decoding Raw Bits">
1333*a58d3d2aSXin Li<t>
1334*a58d3d2aSXin LiThe raw bits used by the CELT layer are packed at the end of the packet, with
1335*a58d3d2aSXin Li the least significant bit of the first value packed in the least significant
1336*a58d3d2aSXin Li bit of the last byte, filling up to the most significant bit in the last byte,
1337*a58d3d2aSXin Li continuing on to the least significant bit of the penultimate byte, and so on.
1338*a58d3d2aSXin LiThe reference implementation reads them using ec_dec_bits() (entdec.c).
1339*a58d3d2aSXin LiBecause the range decoder must read several bytes ahead in the stream, as
1340*a58d3d2aSXin Li described in <xref target="range-decoder-renorm"/>, the input consumed by the
1341*a58d3d2aSXin Li raw bits may overlap with the input consumed by the range coder, and a decoder
1342*a58d3d2aSXin Li MUST allow this.
1343*a58d3d2aSXin LiThe format should render it impossible to attempt to read more raw bits than
1344*a58d3d2aSXin Li there are actual bits in the frame, though a decoder may wish to check for
1345*a58d3d2aSXin Li this and report an error.
1346*a58d3d2aSXin Li</t>
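<t>
The packing order can be illustrated with a simplified reader.
This sketch is not ec_dec_bits(): the structure raw_reader and the function
 raw_read_bits() are hypothetical, requests are assumed to be between 1 and
 25 bits, and returning zero bits once the frame is exhausted is an
 assumption of the example (a decoder may instead report an error).
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw bit reader working from the frame's end. */
typedef struct {
  const unsigned char *buf;
  size_t size;
  size_t end_offs; /* bytes already taken from the end */
  uint32_t window; /* buffered bits; next bit is bit 0 */
  unsigned nbits;  /* number of valid bits in window */
} raw_reader;

/* Reads 'bits' raw bits, least significant bit first,
   starting with the last byte of the frame. */
static uint32_t raw_read_bits(raw_reader *r, unsigned bits) {
  uint32_t ret;
  assert(bits >= 1 && bits <= 25);
  while (r->nbits < bits) {
    unsigned b = 0; /* zero bits past the start of the frame */
    if (r->end_offs < r->size) {
      b = r->buf[r->size - ++r->end_offs];
    }
    r->window |= (uint32_t)b << r->nbits;
    r->nbits += 8;
  }
  ret = r->window & (((uint32_t)1 << bits) - 1);
  r->window >>= bits;
  r->nbits -= bits;
  return ret;
}
```
]]></artwork>
</figure>
<t>
For a two-byte frame {0x12, 0x34}, reading 4, 8, and 4 bits in turn yields
 0x4, 0x23, and 0x1.
</t>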
1347*a58d3d2aSXin Li</section>
1348*a58d3d2aSXin Li
1349*a58d3d2aSXin Li<section anchor="ec_dec_uint" title="Decoding Uniformly Distributed Integers">
1350*a58d3d2aSXin Li<t>
1351*a58d3d2aSXin LiThe function ec_dec_uint() (entdec.c) decodes one of ft equiprobable values in
1352*a58d3d2aSXin Li the range 0 to (ft&nbsp;-&nbsp;1), inclusive, each with a frequency of 1,
1353*a58d3d2aSXin Li where ft may be as large as (2**32&nbsp;-&nbsp;1).
1354*a58d3d2aSXin LiBecause ec_decode() is limited to a total frequency of (2**16&nbsp;-&nbsp;1),
1355*a58d3d2aSXin Li it splits up the value into a range coded symbol representing up to 8 of the
1356*a58d3d2aSXin Li high bits, and, if necessary, raw bits representing the remainder of the
1357*a58d3d2aSXin Li value.
1358*a58d3d2aSXin LiThe limit of 8 bits in the range coded symbol is a trade-off between
1359*a58d3d2aSXin Li implementation complexity, modeling error (since the symbols no longer truly
1360*a58d3d2aSXin Li have equal coding cost), and rounding error introduced by the range coder
1361*a58d3d2aSXin Li itself (which gets larger as more bits are included).
1362*a58d3d2aSXin LiUsing raw bits reduces the maximum number of divisions required in the worst
1363*a58d3d2aSXin Li case, but means that it may be possible to decode a value outside the range
1364*a58d3d2aSXin Li 0 to (ft&nbsp;-&nbsp;1), inclusive.
1365*a58d3d2aSXin Li</t>
1366*a58d3d2aSXin Li
1367*a58d3d2aSXin Li<t>
1368*a58d3d2aSXin Liec_dec_uint() takes a single, positive parameter, ft, which is not necessarily
1369*a58d3d2aSXin Li a power of two, and returns an integer, t, whose value lies between 0 and
1370*a58d3d2aSXin Li (ft&nbsp;-&nbsp;1), inclusive.
1371*a58d3d2aSXin LiLet ftb&nbsp;=&nbsp;ilog(ft&nbsp;-&nbsp;1), i.e., the number of bits required
1372*a58d3d2aSXin Li to store (ft&nbsp;-&nbsp;1) in two's complement notation.
1373*a58d3d2aSXin LiIf ftb is 8 or less, then t is decoded with t&nbsp;=&nbsp;ec_decode(ft), and
1374*a58d3d2aSXin Li the range coder state is updated using the three-tuple (t, t&nbsp;+&nbsp;1,
1375*a58d3d2aSXin Li ft).
1376*a58d3d2aSXin Li</t>
1377*a58d3d2aSXin Li<t>
1378*a58d3d2aSXin LiIf ftb is greater than 8, then the top 8 bits of t are decoded using
1379*a58d3d2aSXin Li<figure align="center">
1380*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1381*a58d3d2aSXin Lit = ec_decode(((ft - 1) >> (ftb - 8)) + 1) ,
1382*a58d3d2aSXin Li]]></artwork>
1383*a58d3d2aSXin Li</figure>
1384*a58d3d2aSXin Li the decoder state is updated using the three-tuple
1385*a58d3d2aSXin Li (t, t&nbsp;+&nbsp;1,
1386*a58d3d2aSXin Li ((ft&nbsp;-&nbsp;1)&nbsp;&gt;&gt;&nbsp;(ftb&nbsp;-&nbsp;8))&nbsp;+&nbsp;1),
1387*a58d3d2aSXin Li and the remaining bits are decoded as raw bits, setting
1388*a58d3d2aSXin Li<figure align="center">
1389*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1390*a58d3d2aSXin Lit = (t << (ftb - 8)) | ec_dec_bits(ftb - 8) .
1391*a58d3d2aSXin Li]]></artwork>
1392*a58d3d2aSXin Li</figure>
If, at this point, t&nbsp;&gt;=&nbsp;ft, then the current frame is corrupt.
1394*a58d3d2aSXin LiIn that case, the decoder should assume there has been an error in the coding,
1395*a58d3d2aSXin Li decoding, or transmission and SHOULD take measures to conceal the
1396*a58d3d2aSXin Li error and/or report to the application that the error has occurred.
1397*a58d3d2aSXin Li</t>
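<t>
The arithmetic of the split can be sketched separately from the range coder
 itself.
In the C fragment below, uint_split, split_uint(), and join_uint() are
 hypothetical names; the fragment only computes which total frequency would
 be passed to ec_decode() and how many raw bits follow, and then recombines
 two already-decoded parts, checking for the corrupt case.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* ilog(n): 0 for n == 0, otherwise floor(log2(n)) + 1. */
static int ilog(uint32_t n) {
  int bits = 0;
  while (n) {
    bits++;
    n >>= 1;
  }
  return bits;
}

/* Hypothetical description of how a value is split. */
typedef struct {
  uint32_t coded_ft; /* total frequency for ec_decode() */
  int raw_bits;      /* raw bits that follow, or 0 */
} uint_split;

static uint_split split_uint(uint32_t ft) {
  uint_split s;
  int ftb;
  assert(ft >= 1);
  ftb = ilog(ft - 1);
  if (ftb <= 8) {
    s.coded_ft = ft;
    s.raw_bits = 0;
  } else {
    s.raw_bits = ftb - 8;
    s.coded_ft = ((ft - 1) >> s.raw_bits) + 1;
  }
  return s;
}

/* Recombines the range coded part 'high' with the raw bits.
   Returns 0 and stores the value in *t, or returns -1 if
   the result is >= ft (corrupt frame). */
static int join_uint(uint32_t ft, uint32_t high, uint32_t raw,
                     uint32_t *t) {
  uint_split s = split_uint(ft);
  uint32_t v = (high << s.raw_bits) | raw;
  if (v >= ft) return -1;
  *t = v;
  return 0;
}
```
]]></artwork>
</figure>
<t>
For ft&nbsp;=&nbsp;1000, the range coded symbol has a total frequency of 250
 and is followed by 2 raw bits.
For ft&nbsp;=&nbsp;1001, a high part of 250 combined with raw bits 3 gives
 1003, which is out of range and therefore indicates a corrupt frame.
</t>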
1398*a58d3d2aSXin Li
1399*a58d3d2aSXin Li</section>
1400*a58d3d2aSXin Li
1401*a58d3d2aSXin Li<section anchor="decoder-tell" title="Current Bit Usage">
1402*a58d3d2aSXin Li<t>
1403*a58d3d2aSXin LiThe bit allocation routines in the CELT decoder need a conservative upper bound
1404*a58d3d2aSXin Li on the number of bits that have been used from the current frame thus far,
1405*a58d3d2aSXin Li including both range coder bits and raw bits.
1406*a58d3d2aSXin LiThis drives allocation decisions that must match those made in the encoder.
1407*a58d3d2aSXin LiThe upper bound is computed in the reference implementation to whole-bit
1408*a58d3d2aSXin Li precision by the function ec_tell() (entcode.h) and to fractional 1/8th bit
1409*a58d3d2aSXin Li precision by the function ec_tell_frac() (entcode.c).
1410*a58d3d2aSXin LiLike all operations in the range coder, it must be implemented in a bit-exact
1411*a58d3d2aSXin Li manner, and must produce exactly the same value returned by the same functions
1412*a58d3d2aSXin Li in the encoder after encoding the same symbols.
1413*a58d3d2aSXin Li</t>
1414*a58d3d2aSXin Li<t>
1415*a58d3d2aSXin Liec_tell() is guaranteed to return ceil(ec_tell_frac()/8.0).
1416*a58d3d2aSXin LiIn various places the codec will check to ensure there is enough room to
1417*a58d3d2aSXin Li contain a symbol before attempting to decode it.
1418*a58d3d2aSXin LiIn practice, although the number of bits used so far is an upper bound,
1419*a58d3d2aSXin Li decoding a symbol whose probability model suggests it has a worst-case cost of
1420*a58d3d2aSXin Li p 1/8th bits may actually advance the return value of ec_tell_frac() by
1421*a58d3d2aSXin Li p-1, p, or p+1 1/8th bits, due to approximation error in that upper bound,
1422*a58d3d2aSXin Li truncation error in the range coder, and for large values of ft, modeling
1423*a58d3d2aSXin Li error in ec_dec_uint().
1424*a58d3d2aSXin Li</t>
1425*a58d3d2aSXin Li<t>
1426*a58d3d2aSXin LiHowever, this error is bounded, and periodic calls to ec_tell() or
1427*a58d3d2aSXin Li ec_tell_frac() at precisely defined points in the decoding process prevent it
1428*a58d3d2aSXin Li from accumulating.
1429*a58d3d2aSXin LiFor a range coder symbol that requires a whole number of bits (i.e.,
1430*a58d3d2aSXin Li for which ft/(fh[k]&nbsp;-&nbsp;fl[k]) is a power of two), where there are at
1431*a58d3d2aSXin Li least p 1/8th bits available, decoding the symbol will never cause ec_tell() or
1432*a58d3d2aSXin Li ec_tell_frac() to exceed the size of the frame ("bust the budget").
1433*a58d3d2aSXin LiIn this case the return value of ec_tell_frac() will only advance by more than
1434*a58d3d2aSXin Li p 1/8th bits if there was an additional, fractional number of bits remaining,
1435*a58d3d2aSXin Li and it will never advance beyond the next whole-bit boundary, which is safe,
1436*a58d3d2aSXin Li since frames always contain a whole number of bits.
1437*a58d3d2aSXin LiHowever, when p is not a whole number of bits, an extra 1/8th bit is required
1438*a58d3d2aSXin Li to ensure that decoding the symbol will not bust the budget.
1439*a58d3d2aSXin Li</t>
1440*a58d3d2aSXin Li<t>
1441*a58d3d2aSXin LiThe reference implementation keeps track of the total number of whole bits that
1442*a58d3d2aSXin Li have been processed by the decoder so far in the variable nbits_total,
1443*a58d3d2aSXin Li including the (possibly fractional) number of bits that are currently
1444*a58d3d2aSXin Li buffered, but not consumed, inside the range coder.
1445*a58d3d2aSXin Linbits_total is initialized to 9 just before the initial range renormalization
1446*a58d3d2aSXin Li process completes (or equivalently, it can be initialized to 33 after the
1447*a58d3d2aSXin Li first renormalization).
The extra two bits over the actual amount buffered by the range coder
 guarantee that it is an upper bound and that there is enough room for the
 encoder to terminate the stream.
1451*a58d3d2aSXin LiEach iteration through the range coder's renormalization loop increases
1452*a58d3d2aSXin Li nbits_total by 8.
1453*a58d3d2aSXin LiReading raw bits increases nbits_total by the number of raw bits read.
1454*a58d3d2aSXin Li</t>
1455*a58d3d2aSXin Li
1456*a58d3d2aSXin Li<section anchor="ec_tell" title="ec_tell()">
1457*a58d3d2aSXin Li<t>
1458*a58d3d2aSXin LiThe whole number of bits buffered in rng may be estimated via lg = ilog(rng).
1459*a58d3d2aSXin Liec_tell() then becomes a simple matter of removing these bits from the total.
1460*a58d3d2aSXin LiIt returns (nbits_total - lg).
1461*a58d3d2aSXin Li</t>
1462*a58d3d2aSXin Li<t>
1463*a58d3d2aSXin LiIn a newly initialized decoder, before any symbols have been read, this reports
1464*a58d3d2aSXin Li that 1 bit has been used.
1465*a58d3d2aSXin LiThis is the bit reserved for termination of the encoder.
1466*a58d3d2aSXin Li</t>
1467*a58d3d2aSXin Li</section>
1468*a58d3d2aSXin Li
1469*a58d3d2aSXin Li<section anchor="ec_tell_frac" title="ec_tell_frac()">
1470*a58d3d2aSXin Li<t>
1471*a58d3d2aSXin Liec_tell_frac() estimates the number of bits buffered in rng to fractional
1472*a58d3d2aSXin Li precision.
1473*a58d3d2aSXin LiSince rng must be greater than 2**23 after renormalization, lg must be at least
1474*a58d3d2aSXin Li 24.
1475*a58d3d2aSXin LiLet
1476*a58d3d2aSXin Li<figure align="center">
1477*a58d3d2aSXin Li<artwork align="center">
1478*a58d3d2aSXin Li<![CDATA[
1479*a58d3d2aSXin Lir_Q15 = rng >> (lg-16) ,
1480*a58d3d2aSXin Li]]></artwork>
1481*a58d3d2aSXin Li</figure>
1482*a58d3d2aSXin Li so that 32768 &lt;= r_Q15 &lt; 65536, an unsigned Q15 value representing the
1483*a58d3d2aSXin Li fractional part of rng.
1484*a58d3d2aSXin LiThen the following procedure can be used to add one bit of precision to lg.
1485*a58d3d2aSXin LiFirst, update
1486*a58d3d2aSXin Li<figure align="center">
1487*a58d3d2aSXin Li<artwork align="center">
1488*a58d3d2aSXin Li<![CDATA[
1489*a58d3d2aSXin Lir_Q15 = (r_Q15*r_Q15) >> 15 .
1490*a58d3d2aSXin Li]]></artwork>
1491*a58d3d2aSXin Li</figure>
1492*a58d3d2aSXin LiThen add the 16th bit of r_Q15 to lg via
1493*a58d3d2aSXin Li<figure align="center">
1494*a58d3d2aSXin Li<artwork align="center">
1495*a58d3d2aSXin Li<![CDATA[
1496*a58d3d2aSXin Lilg = 2*lg + (r_Q15 >> 16) .
1497*a58d3d2aSXin Li]]></artwork>
1498*a58d3d2aSXin Li</figure>
1499*a58d3d2aSXin LiFinally, if this bit was a 1, reduce r_Q15 by a factor of two via
1500*a58d3d2aSXin Li<figure align="center">
1501*a58d3d2aSXin Li<artwork align="center">
1502*a58d3d2aSXin Li<![CDATA[
1503*a58d3d2aSXin Lir_Q15 = r_Q15 >> 1 ,
1504*a58d3d2aSXin Li]]></artwork>
1505*a58d3d2aSXin Li</figure>
1506*a58d3d2aSXin Li so that it once again lies in the range 32768 &lt;= r_Q15 &lt; 65536.
1507*a58d3d2aSXin Li</t>
1508*a58d3d2aSXin Li<t>
1509*a58d3d2aSXin LiThis procedure is repeated three times to extend lg to 1/8th bit precision.
1510*a58d3d2aSXin Liec_tell_frac() then returns (nbits_total*8 - lg).
1511*a58d3d2aSXin Li</t>
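<t>
Both estimates can be sketched in C as follows.
The functions rc_tell() and rc_tell_frac() are hypothetical stand-ins that
 take nbits_total and rng as explicit arguments instead of reading them from
 a decoder structure, which is an assumption made to keep the example
 self-contained.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* ilog(n): 0 for n == 0, otherwise floor(log2(n)) + 1. */
static int ilog(uint32_t n) {
  int bits = 0;
  while (n) {
    bits++;
    n >>= 1;
  }
  return bits;
}

/* Whole-bit estimate: nbits_total - lg. */
static int rc_tell(int nbits_total, uint32_t rng) {
  return nbits_total - ilog(rng);
}

/* 1/8th bit estimate: nbits_total*8 - lg, with lg extended
   by three fractional bits. */
static uint32_t rc_tell_frac(int nbits_total, uint32_t rng) {
  uint32_t r_q15;
  int lg, i;
  assert(rng > (1u << 23));
  lg = ilog(rng);
  r_q15 = rng >> (lg - 16);   /* 32768 <= r_q15 < 65536 */
  for (i = 0; i < 3; i++) {
    unsigned b;
    r_q15 = (r_q15 * r_q15) >> 15;
    b = r_q15 >> 16;          /* the 16th bit */
    lg = 2 * lg + (int)b;
    r_q15 >>= b;              /* back into [32768, 65536) */
  }
  return (uint32_t)nbits_total * 8 - (uint32_t)lg;
}
```
]]></artwork>
</figure>
<t>
For a newly initialized decoder (nbits_total&nbsp;=&nbsp;33,
 rng&nbsp;=&nbsp;2**31), these return 1 and 8, respectively.
</t>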
1512*a58d3d2aSXin Li</section>
1513*a58d3d2aSXin Li
1514*a58d3d2aSXin Li</section>
1515*a58d3d2aSXin Li
1516*a58d3d2aSXin Li</section>
1517*a58d3d2aSXin Li
1518*a58d3d2aSXin Li<section anchor="silk_decoder_outline" title="SILK Decoder">
1519*a58d3d2aSXin Li<t>
1520*a58d3d2aSXin LiThe decoder's LP layer uses a modified version of the SILK codec (herein simply
1521*a58d3d2aSXin Li called "SILK"), which runs a decoded excitation signal through adaptive
1522*a58d3d2aSXin Li long-term and short-term prediction synthesis filters.
1523*a58d3d2aSXin LiIt runs at NB, MB, and WB sample rates internally.
1524*a58d3d2aSXin LiWhen used in a SWB or FB Hybrid frame, the LP layer itself still only runs in
1525*a58d3d2aSXin Li WB.
1526*a58d3d2aSXin Li</t>
1527*a58d3d2aSXin Li
1528*a58d3d2aSXin Li<section title="SILK Decoder Modules">
1529*a58d3d2aSXin Li<t>
1530*a58d3d2aSXin LiAn overview of the decoder is given in <xref target="silk_decoder_figure"/>.
1531*a58d3d2aSXin Li</t>
1532*a58d3d2aSXin Li<figure align="center" anchor="silk_decoder_figure" title="SILK Decoder">
1533*a58d3d2aSXin Li<artwork align="center">
1534*a58d3d2aSXin Li<![CDATA[
1535*a58d3d2aSXin Li   +---------+    +------------+
1536*a58d3d2aSXin Li-->| Range   |--->| Decode     |---------------------------+
1537*a58d3d2aSXin Li 1 | Decoder | 2  | Parameters |----------+       5        |
1538*a58d3d2aSXin Li   +---------+    +------------+     4    |                |
1539*a58d3d2aSXin Li                       3 |                |                |
1540*a58d3d2aSXin Li                        \/               \/               \/
1541*a58d3d2aSXin Li                  +------------+   +------------+   +------------+
1542*a58d3d2aSXin Li                  | Generate   |-->| LTP        |-->| LPC        |
1543*a58d3d2aSXin Li                  | Excitation |   | Synthesis  |   | Synthesis  |
1544*a58d3d2aSXin Li                  +------------+   +------------+   +------------+
1545*a58d3d2aSXin Li                                          ^                |
1546*a58d3d2aSXin Li                                          |                |
1547*a58d3d2aSXin Li                      +-------------------+----------------+
1548*a58d3d2aSXin Li                      |                                      6
1549*a58d3d2aSXin Li                      |   +------------+   +-------------+
1550*a58d3d2aSXin Li                      +-->| Stereo     |-->| Sample Rate |-->
1551*a58d3d2aSXin Li                          | Unmixing   | 7 | Conversion  | 8
1552*a58d3d2aSXin Li                          +------------+   +-------------+
1553*a58d3d2aSXin Li
1554*a58d3d2aSXin Li1: Range encoded bitstream
1555*a58d3d2aSXin Li2: Coded parameters
1556*a58d3d2aSXin Li3: Pulses, LSBs, and signs
1557*a58d3d2aSXin Li4: Pitch lags, Long-Term Prediction (LTP) coefficients
1558*a58d3d2aSXin Li5: Linear Predictive Coding (LPC) coefficients and gains
1559*a58d3d2aSXin Li6: Decoded signal (mono or mid-side stereo)
1560*a58d3d2aSXin Li7: Unmixed signal (mono or left-right stereo)
1561*a58d3d2aSXin Li8: Resampled signal
1562*a58d3d2aSXin Li]]>
1563*a58d3d2aSXin Li</artwork>
1564*a58d3d2aSXin Li</figure>
1565*a58d3d2aSXin Li
1566*a58d3d2aSXin Li<t>
1567*a58d3d2aSXin LiThe decoder feeds the bitstream (1) to the range decoder from
1568*a58d3d2aSXin Li <xref target="range-decoder"/>, and then decodes the parameters in it (2)
1569*a58d3d2aSXin Li using the procedures detailed in
1570*a58d3d2aSXin Li Sections&nbsp;<xref format="counter" target="silk_header_bits"/>
1571*a58d3d2aSXin Li through&nbsp;<xref format="counter" target="silk_signs"/>.
1572*a58d3d2aSXin LiThese parameters (3, 4, 5) are used to generate an excitation signal (see
1573*a58d3d2aSXin Li <xref target="silk_excitation_reconstruction"/>), which is fed to an optional
1574*a58d3d2aSXin Li long-term prediction (LTP) filter (voiced frames only, see
1575*a58d3d2aSXin Li <xref target="silk_ltp_synthesis"/>) and then a short-term prediction filter
1576*a58d3d2aSXin Li (see <xref target="silk_lpc_synthesis"/>), producing the decoded signal (6).
1577*a58d3d2aSXin LiFor stereo streams, the mid-side representation is converted to separate left
1578*a58d3d2aSXin Li and right channels (7).
1579*a58d3d2aSXin LiThe result is finally resampled to the desired output sample rate (e.g.,
1580*a58d3d2aSXin Li 48&nbsp;kHz) so that the resampled signal (8) can be mixed with the CELT
1581*a58d3d2aSXin Li layer.
1582*a58d3d2aSXin Li</t>
1583*a58d3d2aSXin Li
1584*a58d3d2aSXin Li</section>
1585*a58d3d2aSXin Li
1586*a58d3d2aSXin Li<section anchor="silk_layer_organization" title="LP Layer Organization">
1587*a58d3d2aSXin Li
1588*a58d3d2aSXin Li<t>
1589*a58d3d2aSXin LiInternally, the LP layer of a single Opus frame is composed of either a single
1590*a58d3d2aSXin Li 10&nbsp;ms regular SILK frame or between one and three 20&nbsp;ms regular SILK
1591*a58d3d2aSXin Li frames.
1592*a58d3d2aSXin LiA stereo Opus frame may double the number of regular SILK frames (up to a total
1593*a58d3d2aSXin Li of six), since it includes separate frames for a mid channel and, optionally,
1594*a58d3d2aSXin Li a side channel.
1595*a58d3d2aSXin LiOptional Low Bit-Rate Redundancy (LBRR) frames, which are reduced-bitrate
1596*a58d3d2aSXin Li encodings of previous SILK frames, may be included to aid in recovery from
1597*a58d3d2aSXin Li packet loss.
1598*a58d3d2aSXin LiIf present, these appear before the regular SILK frames.
1599*a58d3d2aSXin LiThey are in most respects identical to regular, active SILK frames, except that
1600*a58d3d2aSXin Li they are usually encoded with a lower bitrate.
1601*a58d3d2aSXin LiThis draft uses "SILK frame" to refer to either one and "regular SILK frame" if
1602*a58d3d2aSXin Li it needs to draw a distinction between the two.
1603*a58d3d2aSXin Li</t>
1604*a58d3d2aSXin Li<t>
1605*a58d3d2aSXin LiLogically, each SILK frame is in turn composed of either two or four 5&nbsp;ms
1606*a58d3d2aSXin Li subframes.
Various parameters, such as the quantization gain of the excitation, the
 pitch lag, and the filter coefficients, can vary on a subframe-by-subframe
 basis.
1609*a58d3d2aSXin LiPhysically, the parameters for each subframe are interleaved in the bitstream,
1610*a58d3d2aSXin Li as described in the relevant sections for each parameter.
1611*a58d3d2aSXin Li</t>
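<t>
As a non-normative illustration, the frame and subframe counts described above
 can be derived from the duration of the LP layer of an Opus frame.
The structure and function names in this sketch are hypothetical and do not
 appear in the reference implementation.
</t>
<figure>
<artwork><![CDATA[
/* Hypothetical summary of the SILK layout for one channel. */
struct silk_layout {
    int silk_frames;   /* regular SILK frames per channel */
    int silk_frame_ms; /* duration of each SILK frame */
    int subframes;     /* 5 ms subframes per SILK frame */
};

/* Fills *out for an LP layer lasting frame_ms milliseconds.
 * Returns 0 on success, -1 for an unsupported duration. */
static int silk_layout_for(int frame_ms, struct silk_layout *out)
{
    if (frame_ms == 10) {
        /* A single 10 ms SILK frame. */
        out->silk_frames = 1;
        out->silk_frame_ms = 10;
    } else if (frame_ms == 20 || frame_ms == 40
               || frame_ms == 60) {
        /* One to three 20 ms SILK frames. */
        out->silk_frames = frame_ms / 20;
        out->silk_frame_ms = 20;
    } else {
        return -1;
    }
    out->subframes = out->silk_frame_ms / 5;
    return 0;
}
]]></artwork>
</figure>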
1612*a58d3d2aSXin Li<t>
1613*a58d3d2aSXin LiAll of these frames and subframes are decoded from the same range coder, with
1614*a58d3d2aSXin Li no padding between them.
Thus, packing multiple SILK frames in a single Opus frame saves, on average,
1616*a58d3d2aSXin Li half a byte per SILK frame.
1617*a58d3d2aSXin LiIt also allows some parameters to be predicted from prior SILK frames in the
1618*a58d3d2aSXin Li same Opus frame, since this does not degrade packet loss robustness (beyond
1619*a58d3d2aSXin Li any penalty for merely using fewer, larger packets to store multiple frames).
1620*a58d3d2aSXin Li</t>
1621*a58d3d2aSXin Li
1622*a58d3d2aSXin Li<t>
1623*a58d3d2aSXin LiStereo support in SILK uses a variant of mid-side coding, allowing a mono
1624*a58d3d2aSXin Li decoder to simply decode the mid channel.
1625*a58d3d2aSXin LiHowever, the data for the two channels is interleaved, so a mono decoder must
1626*a58d3d2aSXin Li still unpack the data for the side channel.
1627*a58d3d2aSXin LiIt would be required to do so anyway for Hybrid Opus frames, or to support
1628*a58d3d2aSXin Li decoding individual 20&nbsp;ms frames.
1629*a58d3d2aSXin Li</t>
1630*a58d3d2aSXin Li
1631*a58d3d2aSXin Li<t>
1632*a58d3d2aSXin Li<xref target="silk_symbols"/> summarizes the overall grouping of the contents of
1633*a58d3d2aSXin Li the LP layer.
1634*a58d3d2aSXin LiFigures&nbsp;<xref format="counter" target="silk_mono_60ms_frame"/>
1635*a58d3d2aSXin Li and&nbsp;<xref format="counter" target="silk_stereo_60ms_frame"/> illustrate
 the ordering of the various SILK frames for a 60&nbsp;ms Opus frame, for
 mono and stereo, respectively.
1638*a58d3d2aSXin Li</t>
1639*a58d3d2aSXin Li
1640*a58d3d2aSXin Li<texttable anchor="silk_symbols"
1641*a58d3d2aSXin Li title="Organization of the SILK layer of an Opus frame">
1642*a58d3d2aSXin Li<ttcol align="center">Symbol(s)</ttcol>
1643*a58d3d2aSXin Li<ttcol align="center">PDF(s)</ttcol>
1644*a58d3d2aSXin Li<ttcol align="center">Condition</ttcol>
1645*a58d3d2aSXin Li
1646*a58d3d2aSXin Li<c>Voice Activity Detection (VAD) flags</c>
1647*a58d3d2aSXin Li<c>{1, 1}/2</c>
1648*a58d3d2aSXin Li<c/>
1649*a58d3d2aSXin Li
1650*a58d3d2aSXin Li<c>LBRR flag</c>
1651*a58d3d2aSXin Li<c>{1, 1}/2</c>
1652*a58d3d2aSXin Li<c/>
1653*a58d3d2aSXin Li
1654*a58d3d2aSXin Li<c>Per-frame LBRR flags</c>
1655*a58d3d2aSXin Li<c><xref target="silk_lbrr_flag_pdfs"/></c>
1656*a58d3d2aSXin Li<c><xref target="silk_lbrr_flags"/></c>
1657*a58d3d2aSXin Li
1658*a58d3d2aSXin Li<c>LBRR Frame(s)</c>
1659*a58d3d2aSXin Li<c><xref target="silk_frame"/></c>
1660*a58d3d2aSXin Li<c><xref target="silk_lbrr_flags"/></c>
1661*a58d3d2aSXin Li
1662*a58d3d2aSXin Li<c>Regular SILK Frame(s)</c>
1663*a58d3d2aSXin Li<c><xref target="silk_frame"/></c>
1664*a58d3d2aSXin Li<c/>
1665*a58d3d2aSXin Li
1666*a58d3d2aSXin Li</texttable>
1667*a58d3d2aSXin Li
1668*a58d3d2aSXin Li<figure align="center" anchor="silk_mono_60ms_frame"
1669*a58d3d2aSXin Li title="A 60&nbsp;ms Mono Frame">
1670*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1671*a58d3d2aSXin Li+---------------------------------+
1672*a58d3d2aSXin Li|            VAD Flags            |
1673*a58d3d2aSXin Li+---------------------------------+
1674*a58d3d2aSXin Li|            LBRR Flag            |
1675*a58d3d2aSXin Li+---------------------------------+
1676*a58d3d2aSXin Li| Per-Frame LBRR Flags (Optional) |
1677*a58d3d2aSXin Li+---------------------------------+
1678*a58d3d2aSXin Li|     LBRR Frame 1 (Optional)     |
1679*a58d3d2aSXin Li+---------------------------------+
1680*a58d3d2aSXin Li|     LBRR Frame 2 (Optional)     |
1681*a58d3d2aSXin Li+---------------------------------+
1682*a58d3d2aSXin Li|     LBRR Frame 3 (Optional)     |
1683*a58d3d2aSXin Li+---------------------------------+
1684*a58d3d2aSXin Li|      Regular SILK Frame 1       |
1685*a58d3d2aSXin Li+---------------------------------+
1686*a58d3d2aSXin Li|      Regular SILK Frame 2       |
1687*a58d3d2aSXin Li+---------------------------------+
1688*a58d3d2aSXin Li|      Regular SILK Frame 3       |
1689*a58d3d2aSXin Li+---------------------------------+
1690*a58d3d2aSXin Li]]></artwork>
1691*a58d3d2aSXin Li</figure>
1692*a58d3d2aSXin Li
1693*a58d3d2aSXin Li<figure align="center" anchor="silk_stereo_60ms_frame"
1694*a58d3d2aSXin Li title="A 60&nbsp;ms Stereo Frame">
1695*a58d3d2aSXin Li<artwork align="center"><![CDATA[
1696*a58d3d2aSXin Li+---------------------------------------+
1697*a58d3d2aSXin Li|             Mid VAD Flags             |
1698*a58d3d2aSXin Li+---------------------------------------+
1699*a58d3d2aSXin Li|             Mid LBRR Flag             |
1700*a58d3d2aSXin Li+---------------------------------------+
1701*a58d3d2aSXin Li|             Side VAD Flags            |
1702*a58d3d2aSXin Li+---------------------------------------+
1703*a58d3d2aSXin Li|             Side LBRR Flag            |
1704*a58d3d2aSXin Li+---------------------------------------+
1705*a58d3d2aSXin Li|  Mid Per-Frame LBRR Flags (Optional)  |
1706*a58d3d2aSXin Li+---------------------------------------+
1707*a58d3d2aSXin Li| Side Per-Frame LBRR Flags (Optional)  |
1708*a58d3d2aSXin Li+---------------------------------------+
1709*a58d3d2aSXin Li|     Mid LBRR Frame 1 (Optional)       |
1710*a58d3d2aSXin Li+---------------------------------------+
1711*a58d3d2aSXin Li|     Side LBRR Frame 1 (Optional)      |
1712*a58d3d2aSXin Li+---------------------------------------+
1713*a58d3d2aSXin Li|     Mid LBRR Frame 2 (Optional)       |
1714*a58d3d2aSXin Li+---------------------------------------+
1715*a58d3d2aSXin Li|     Side LBRR Frame 2 (Optional)      |
1716*a58d3d2aSXin Li+---------------------------------------+
1717*a58d3d2aSXin Li|     Mid LBRR Frame 3 (Optional)       |
1718*a58d3d2aSXin Li+---------------------------------------+
1719*a58d3d2aSXin Li|     Side LBRR Frame 3 (Optional)      |
1720*a58d3d2aSXin Li+---------------------------------------+
1721*a58d3d2aSXin Li|      Mid Regular SILK Frame 1         |
1722*a58d3d2aSXin Li+---------------------------------------+
1723*a58d3d2aSXin Li| Side Regular SILK Frame 1 (Optional)  |
1724*a58d3d2aSXin Li+---------------------------------------+
1725*a58d3d2aSXin Li|      Mid Regular SILK Frame 2         |
1726*a58d3d2aSXin Li+---------------------------------------+
1727*a58d3d2aSXin Li| Side Regular SILK Frame 2 (Optional)  |
1728*a58d3d2aSXin Li+---------------------------------------+
1729*a58d3d2aSXin Li|      Mid Regular SILK Frame 3         |
1730*a58d3d2aSXin Li+---------------------------------------+
1731*a58d3d2aSXin Li| Side Regular SILK Frame 3 (Optional)  |
1732*a58d3d2aSXin Li+---------------------------------------+
1733*a58d3d2aSXin Li]]></artwork>
1734*a58d3d2aSXin Li</figure>
1735*a58d3d2aSXin Li
1736*a58d3d2aSXin Li</section>
1737*a58d3d2aSXin Li
1738*a58d3d2aSXin Li<section anchor="silk_header_bits" title="Header Bits">
1739*a58d3d2aSXin Li<t>
1740*a58d3d2aSXin LiThe LP layer begins with two to eight header bits, decoded in silk_Decode()
1741*a58d3d2aSXin Li (dec_API.c).
1742*a58d3d2aSXin LiThese consist of one Voice Activity Detection (VAD) bit per frame (up to 3),
1743*a58d3d2aSXin Li followed by a single flag indicating the presence of LBRR frames.
1744*a58d3d2aSXin LiFor a stereo packet, these first flags correspond to the mid channel, and a
1745*a58d3d2aSXin Li second set of flags is included for the side channel.
1746*a58d3d2aSXin Li</t>
1747*a58d3d2aSXin Li<t>
1748*a58d3d2aSXin LiBecause these are the first symbols decoded by the range coder and because they
1749*a58d3d2aSXin Li are coded as binary values with uniform probability, they can be extracted
1750*a58d3d2aSXin Li directly from the most significant bits of the first byte of compressed data.
1751*a58d3d2aSXin LiThus, a receiver can determine if an Opus frame contains any active SILK frames
1752*a58d3d2aSXin Li without the overhead of using the range decoder.
1753*a58d3d2aSXin Li</t>
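<t>
As a non-normative illustration, a receiver could read the header bits
 directly from the first byte of the LP layer as sketched below.
The structure and function names are hypothetical.
The sketch assumes that the bits appear most significant bit first, in the
 order given above (VAD flags, then the LBRR flag, for the mid channel and
 then for the side channel), and that a 1 bit denotes a set flag.
</t>
<figure>
<artwork><![CDATA[
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the SILK header bits. */
struct silk_header_bits {
    int vad[2][3]; /* [channel][SILK frame]; 0 = mid, 1 = side */
    int lbrr[2];   /* global LBRR flag for each channel */
};

/* Reads the header bits from the first byte, MSB first, without
 * running the range decoder.
 * channels is 1 or 2; frames is 1, 2, or 3 (per channel). */
static void silk_peek_header_bits(uint8_t first_byte,
                                  int channels, int frames,
                                  struct silk_header_bits *out)
{
    int bit = 7;
    int c, f;
    memset(out, 0, sizeof(*out));
    for (c = 0; c < channels; c++) {
        for (f = 0; f < frames; f++) {
            out->vad[c][f] = (first_byte >> bit) & 1;
            bit--;
        }
        out->lbrr[c] = (first_byte >> bit) & 1;
        bit--;
    }
}

/* Nonzero if any regular SILK frame is marked active. */
static int silk_any_active(const struct silk_header_bits *h,
                           int channels, int frames)
{
    int c, f;
    for (c = 0; c < channels; c++) {
        for (f = 0; f < frames; f++) {
            if (h->vad[c][f]) {
                return 1;
            }
        }
    }
    return 0;
}
]]></artwork>
</figure>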
1754*a58d3d2aSXin Li</section>
1755*a58d3d2aSXin Li
1756*a58d3d2aSXin Li<section anchor="silk_lbrr_flags" title="Per-Frame LBRR Flags">
1757*a58d3d2aSXin Li<t>
1758*a58d3d2aSXin LiFor Opus frames longer than 20&nbsp;ms, a set of LBRR flags is
1759*a58d3d2aSXin Li decoded for each channel that has its LBRR flag set.
1760*a58d3d2aSXin LiEach set contains one flag per 20&nbsp;ms SILK frame.
1761*a58d3d2aSXin Li40&nbsp;ms Opus frames use the 2-frame LBRR flag PDF from
1762*a58d3d2aSXin Li <xref target="silk_lbrr_flag_pdfs"/>, and 60&nbsp;ms Opus frames use the
1763*a58d3d2aSXin Li 3-frame LBRR flag PDF.
1764*a58d3d2aSXin LiFor each channel, the resulting 2- or 3-bit integer contains the corresponding
1765*a58d3d2aSXin Li LBRR flag for each frame, packed in order from the LSB to the MSB.
1766*a58d3d2aSXin Li</t>
1767*a58d3d2aSXin Li
1768*a58d3d2aSXin Li<texttable anchor="silk_lbrr_flag_pdfs" title="LBRR Flag PDFs">
1769*a58d3d2aSXin Li<ttcol>Frame Size</ttcol>
1770*a58d3d2aSXin Li<ttcol>PDF</ttcol>
1771*a58d3d2aSXin Li<c>40&nbsp;ms</c> <c>{0, 53, 53, 150}/256</c>
1772*a58d3d2aSXin Li<c>60&nbsp;ms</c> <c>{0, 41, 20, 29, 41, 15, 28, 82}/256</c>
1773*a58d3d2aSXin Li</texttable>
1774*a58d3d2aSXin Li
1775*a58d3d2aSXin Li<t>
1776*a58d3d2aSXin LiA 10&nbsp;or 20&nbsp;ms Opus frame does not contain any per-frame LBRR flags,
1777*a58d3d2aSXin Li as there may be at most one LBRR frame per channel.
1778*a58d3d2aSXin LiThe global LBRR flag in the header bits (see <xref target="silk_header_bits"/>)
1779*a58d3d2aSXin Li is already sufficient to indicate the presence of that single LBRR frame.
1780*a58d3d2aSXin Li</t>
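<t>
As a non-normative illustration, the following sketch expands the LBRR
 information for one channel into one flag per SILK frame.
The function name and its parameters are hypothetical; the symbol argument
 stands for the 2- or 3-bit integer already decoded with the PDF from
 <xref target="silk_lbrr_flag_pdfs"/>.
</t>
<figure>
<artwork><![CDATA[
/* global_flag: LBRR flag from the header bits for this channel.
 * symbol: integer decoded with the 2- or 3-frame PDF (ignored
 *         when frames == 1 or global_flag == 0).
 * frames: number of SILK frames per channel (1, 2, or 3).
 * flags:  receives one LBRR flag per SILK frame. */
static void silk_unpack_lbrr_flags(int global_flag, int symbol,
                                   int frames, int flags[3])
{
    int i;
    for (i = 0; i < 3; i++) {
        flags[i] = 0;
    }
    if (!global_flag) {
        return;
    }
    if (frames == 1) {
        /* 10 or 20 ms: the global flag is the only flag. */
        flags[0] = 1;
        return;
    }
    /* Frame i uses bit i, counting from the LSB. */
    for (i = 0; i < frames; i++) {
        flags[i] = (symbol >> i) & 1;
    }
}
]]></artwork>
</figure>
<t>
For example, in a 60&nbsp;ms Opus frame, a decoded value of 5 (binary 101)
 indicates LBRR frames for the first and third 20&nbsp;ms intervals.
</t>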
1781*a58d3d2aSXin Li
1782*a58d3d2aSXin Li</section>
1783*a58d3d2aSXin Li
1784*a58d3d2aSXin Li<section anchor="silk_lbrr_frames" title="LBRR Frames">
1785*a58d3d2aSXin Li<t>
1786*a58d3d2aSXin LiThe LBRR frames, if present, contain an encoded representation of the signal
1787*a58d3d2aSXin Li immediately prior to the current Opus frame as if it were encoded with the
1788*a58d3d2aSXin Li current mode, frame size, audio bandwidth, and channel count, even if those
1789*a58d3d2aSXin Li differ from the prior Opus frame.
1790*a58d3d2aSXin LiWhen one of these parameters changes from one Opus frame to the next, this
1791*a58d3d2aSXin Li implies that the LBRR frames of the current Opus frame may not be simple
1792*a58d3d2aSXin Li drop-in replacements for the contents of the previous Opus frame.
1793*a58d3d2aSXin Li</t>
1794*a58d3d2aSXin Li
1795*a58d3d2aSXin Li<t>
1796*a58d3d2aSXin LiFor example, when switching from 20&nbsp;ms to 60&nbsp;ms, the 60&nbsp;ms Opus
1797*a58d3d2aSXin Li frame may contain LBRR frames covering up to three prior 20&nbsp;ms Opus
1798*a58d3d2aSXin Li frames, even if those frames already contained LBRR frames covering some of
1799*a58d3d2aSXin Li the same time periods.
1800*a58d3d2aSXin LiWhen switching from 20&nbsp;ms to 10&nbsp;ms, the 10&nbsp;ms Opus frame can
1801*a58d3d2aSXin Li contain an LBRR frame covering at most half the prior 20&nbsp;ms Opus frame,
1802*a58d3d2aSXin Li potentially leaving a hole that needs to be concealed from even a single
1803*a58d3d2aSXin Li packet loss (see <xref target="Packet Loss Concealment"/>).
1804*a58d3d2aSXin LiWhen switching from mono to stereo, the LBRR frames in the first stereo Opus
1805*a58d3d2aSXin Li frame MAY contain a non-trivial side channel.
1806*a58d3d2aSXin Li</t>
1807*a58d3d2aSXin Li
1808*a58d3d2aSXin Li<t>
1809*a58d3d2aSXin LiIn order to properly produce LBRR frames under all conditions, an encoder might
1810*a58d3d2aSXin Li need to buffer up to 60&nbsp;ms of audio and re-encode it during these
1811*a58d3d2aSXin Li transitions.
1812*a58d3d2aSXin LiHowever, the reference implementation opts to disable LBRR frames at the
1813*a58d3d2aSXin Li transition point for simplicity.
1814*a58d3d2aSXin LiSince transitions are relatively infrequent in normal usage, this does not have
1815*a58d3d2aSXin Li a significant impact on packet loss robustness.
1816*a58d3d2aSXin Li</t>
1817*a58d3d2aSXin Li
1818*a58d3d2aSXin Li<t>
1819*a58d3d2aSXin LiThe LBRR frames immediately follow the LBRR flags, prior to any regular SILK
1820*a58d3d2aSXin Li frames.
1821*a58d3d2aSXin Li<xref target="silk_frame"/> describes their exact contents.
1822*a58d3d2aSXin LiLBRR frames do not include their own separate VAD flags.
LBRR frames are only meant to be transmitted for active speech, so all LBRR
 frames are treated as active.
1825*a58d3d2aSXin Li</t>
1826*a58d3d2aSXin Li
1827*a58d3d2aSXin Li<t>
1828*a58d3d2aSXin LiIn a stereo Opus frame longer than 20&nbsp;ms, although the per-frame LBRR
1829*a58d3d2aSXin Li flags for the mid channel are coded as a unit before the per-frame LBRR flags
1830*a58d3d2aSXin Li for the side channel, the LBRR frames themselves are interleaved.
1831*a58d3d2aSXin LiThe decoder parses an LBRR frame for the mid channel of a given 20&nbsp;ms
1832*a58d3d2aSXin Li interval (if present) and then immediately parses the corresponding LBRR
1833*a58d3d2aSXin Li frame for the side channel (if present), before proceeding to the next
1834*a58d3d2aSXin Li 20&nbsp;ms interval.
1835*a58d3d2aSXin Li</t>
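<t>
As a non-normative illustration, the interleaved parsing order can be
 sketched as a loop over the 20&nbsp;ms intervals.
The function name and the encoding of the output entries are hypothetical;
 the per-frame LBRR flags of both channels are assumed to have been decoded
 already.
</t>
<figure>
<artwork><![CDATA[
/* Lists the LBRR frames of a stereo Opus frame in bitstream
 * order. mid[] and side[] hold the per-frame LBRR flags.
 * Each entry of order[] is 2*interval + channel, where channel
 * is 0 for mid and 1 for side.
 * Returns the number of LBRR frames present. */
static int silk_lbrr_parse_order(const int mid[3],
                                 const int side[3],
                                 int frames, int order[6])
{
    int count = 0;
    int i;
    for (i = 0; i < frames; i++) {
        if (mid[i]) {
            order[count++] = 2 * i;     /* mid, interval i */
        }
        if (side[i]) {
            order[count++] = 2 * i + 1; /* side, interval i */
        }
    }
    return count;
}
]]></artwork>
</figure>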
1836*a58d3d2aSXin Li</section>
1837*a58d3d2aSXin Li
1838*a58d3d2aSXin Li<section anchor="silk_regular_frames" title="Regular SILK Frames">
1839*a58d3d2aSXin Li<t>
1840*a58d3d2aSXin LiThe regular SILK frame(s) follow the LBRR frames (if any).
1841*a58d3d2aSXin Li<xref target="silk_frame"/> describes their contents, as well.
1842*a58d3d2aSXin LiUnlike the LBRR frames, a regular SILK frame is coded for each time interval in
1843*a58d3d2aSXin Li an Opus frame, even if the corresponding VAD flags are unset.
1844*a58d3d2aSXin LiFor stereo Opus frames longer than 20&nbsp;ms, the regular mid and side SILK
1845*a58d3d2aSXin Li frames for each 20&nbsp;ms interval are interleaved, just as with the LBRR
1846*a58d3d2aSXin Li frames.
1847*a58d3d2aSXin LiThe side frame may be skipped by coding an appropriate flag, as detailed in
1848*a58d3d2aSXin Li <xref target="silk_mid_only_flag"/>.
1849*a58d3d2aSXin Li</t>
1850*a58d3d2aSXin Li</section>
1851*a58d3d2aSXin Li
1852*a58d3d2aSXin Li<section anchor="silk_frame" title="SILK Frame Contents">
1853*a58d3d2aSXin Li<t>
1854*a58d3d2aSXin LiEach SILK frame includes a set of side information that encodes
1855*a58d3d2aSXin Li<list style="symbols">
1856*a58d3d2aSXin Li<t>The frame type and quantization type (<xref target="silk_frame_type"/>),</t>
1857*a58d3d2aSXin Li<t>Quantization gains (<xref target="silk_gains"/>),</t>
1858*a58d3d2aSXin Li<t>Short-term prediction filter coefficients (<xref target="silk_nlsfs"/>),</t>
1859*a58d3d2aSXin Li<t>A Line Spectral Frequencies (LSF) interpolation weight (<xref target="silk_nlsf_interpolation"/>),</t>
1860*a58d3d2aSXin Li<t>
1861*a58d3d2aSXin LiLong-term prediction filter lags and gains (<xref target="silk_ltp_params"/>),
1862*a58d3d2aSXin Li and
1863*a58d3d2aSXin Li</t>
1864*a58d3d2aSXin Li<t>A linear congruential generator (LCG) seed (<xref target="silk_seed"/>).</t>
1865*a58d3d2aSXin Li</list>
1866*a58d3d2aSXin LiThe quantized excitation signal (see <xref target="silk_excitation"/>) follows
1867*a58d3d2aSXin Li these at the end of the frame.
1868*a58d3d2aSXin Li<xref target="silk_frame_symbols"/> details the overall organization of a
1869*a58d3d2aSXin Li SILK frame.
1870*a58d3d2aSXin Li</t>
1871*a58d3d2aSXin Li
1872*a58d3d2aSXin Li<texttable anchor="silk_frame_symbols"
1873*a58d3d2aSXin Li title="Order of the symbols in an individual SILK frame">
1874*a58d3d2aSXin Li<ttcol align="center">Symbol(s)</ttcol>
1875*a58d3d2aSXin Li<ttcol align="center">PDF(s)</ttcol>
1876*a58d3d2aSXin Li<ttcol align="center">Condition</ttcol>
1877*a58d3d2aSXin Li
1878*a58d3d2aSXin Li<c>Stereo Prediction Weights</c>
1879*a58d3d2aSXin Li<c><xref target="silk_stereo_pred_pdfs"/></c>
1880*a58d3d2aSXin Li<c><xref target="silk_stereo_pred"/></c>
1881*a58d3d2aSXin Li
1882*a58d3d2aSXin Li<c>Mid-only Flag</c>
1883*a58d3d2aSXin Li<c><xref target="silk_mid_only_pdf"/></c>
1884*a58d3d2aSXin Li<c><xref target="silk_mid_only_flag"/></c>
1885*a58d3d2aSXin Li
1886*a58d3d2aSXin Li<c>Frame Type</c>
1887*a58d3d2aSXin Li<c><xref target="silk_frame_type"/></c>
1888*a58d3d2aSXin Li<c/>
1889*a58d3d2aSXin Li
1890*a58d3d2aSXin Li<c>Subframe Gains</c>
1891*a58d3d2aSXin Li<c><xref target="silk_gains"/></c>
1892*a58d3d2aSXin Li<c/>
1893*a58d3d2aSXin Li
1894*a58d3d2aSXin Li<c>Normalized LSF Stage-1 Index</c>
1895*a58d3d2aSXin Li<c><xref target="silk_nlsf_stage1_pdfs"/></c>
1896*a58d3d2aSXin Li<c/>
1897*a58d3d2aSXin Li
1898*a58d3d2aSXin Li<c>Normalized LSF Stage-2 Residual</c>
1899*a58d3d2aSXin Li<c><xref target="silk_nlsf_stage2"/></c>
1900*a58d3d2aSXin Li<c/>
1901*a58d3d2aSXin Li
1902*a58d3d2aSXin Li<c>Normalized LSF Interpolation Weight</c>
1903*a58d3d2aSXin Li<c><xref target="silk_nlsf_interp_pdf"/></c>
1904*a58d3d2aSXin Li<c>20&nbsp;ms frame</c>
1905*a58d3d2aSXin Li
1906*a58d3d2aSXin Li<c>Primary Pitch Lag</c>
1907*a58d3d2aSXin Li<c><xref target="silk_ltp_lags"/></c>
1908*a58d3d2aSXin Li<c>Voiced frame</c>
1909*a58d3d2aSXin Li
1910*a58d3d2aSXin Li<c>Subframe Pitch Contour</c>
1911*a58d3d2aSXin Li<c><xref target="silk_pitch_contour_pdfs"/></c>
1912*a58d3d2aSXin Li<c>Voiced frame</c>
1913*a58d3d2aSXin Li
1914*a58d3d2aSXin Li<c>Periodicity Index</c>
1915*a58d3d2aSXin Li<c><xref target="silk_perindex_pdf"/></c>
1916*a58d3d2aSXin Li<c>Voiced frame</c>
1917*a58d3d2aSXin Li
1918*a58d3d2aSXin Li<c>LTP Filter</c>
1919*a58d3d2aSXin Li<c><xref target="silk_ltp_filter_pdfs"/></c>
1920*a58d3d2aSXin Li<c>Voiced frame</c>
1921*a58d3d2aSXin Li
1922*a58d3d2aSXin Li<c>LTP Scaling</c>
1923*a58d3d2aSXin Li<c><xref target="silk_ltp_scaling_pdf"/></c>
1924*a58d3d2aSXin Li<c><xref target="silk_ltp_scaling"/></c>
1925*a58d3d2aSXin Li
1926*a58d3d2aSXin Li<c>LCG Seed</c>
1927*a58d3d2aSXin Li<c><xref target="silk_seed_pdf"/></c>
1928*a58d3d2aSXin Li<c/>
1929*a58d3d2aSXin Li
1930*a58d3d2aSXin Li<c>Excitation Rate Level</c>
1931*a58d3d2aSXin Li<c><xref target="silk_rate_level_pdfs"/></c>
1932*a58d3d2aSXin Li<c/>
1933*a58d3d2aSXin Li
1934*a58d3d2aSXin Li<c>Excitation Pulse Counts</c>
1935*a58d3d2aSXin Li<c><xref target="silk_pulse_count_pdfs"/></c>
1936*a58d3d2aSXin Li<c/>
1937*a58d3d2aSXin Li
1938*a58d3d2aSXin Li<c>Excitation Pulse Locations</c>
1939*a58d3d2aSXin Li<c><xref target="silk_pulse_locations"/></c>
1940*a58d3d2aSXin Li<c>Non-zero pulse count</c>
1941*a58d3d2aSXin Li
1942*a58d3d2aSXin Li<c>Excitation LSBs</c>
1943*a58d3d2aSXin Li<c><xref target="silk_shell_lsb_pdf"/></c>
1944*a58d3d2aSXin Li<c><xref target="silk_pulse_counts"/></c>
1945*a58d3d2aSXin Li
1946*a58d3d2aSXin Li<c>Excitation Signs</c>
1947*a58d3d2aSXin Li<c><xref target="silk_sign_pdfs"/></c>
1948*a58d3d2aSXin Li<c/>
1949*a58d3d2aSXin Li
1950*a58d3d2aSXin Li</texttable>
1951*a58d3d2aSXin Li
1952*a58d3d2aSXin Li<section anchor="silk_stereo_pred" toc="include"
1953*a58d3d2aSXin Li title="Stereo Prediction Weights">
1954*a58d3d2aSXin Li<t>
1955*a58d3d2aSXin LiA SILK frame corresponding to the mid channel of a stereo Opus frame begins
1956*a58d3d2aSXin Li with a pair of side channel prediction weights, designed such that zeros
1957*a58d3d2aSXin Li indicate normal mid-side coupling.
1958*a58d3d2aSXin LiSince these weights can change on every frame, the first portion of each frame
1959*a58d3d2aSXin Li linearly interpolates between the previous weights and the current ones, using
1960*a58d3d2aSXin Li zeros for the previous weights if none are available.
1961*a58d3d2aSXin LiThese prediction weights are never included in a mono Opus frame, and the
1962*a58d3d2aSXin Li previous weights are reset to zeros on any transition from mono to stereo.
1963*a58d3d2aSXin LiThey are also not included in an LBRR frame for the side channel, even if the
1964*a58d3d2aSXin Li LBRR flags indicate the corresponding mid channel was not coded.
1965*a58d3d2aSXin LiIn that case, the previous weights are used, again substituting in zeros if no
1966*a58d3d2aSXin Li previous weights are available since the last decoder reset
1967*a58d3d2aSXin Li (see <xref target="decoder-reset"/>).
1968*a58d3d2aSXin Li</t>
1969*a58d3d2aSXin Li
1970*a58d3d2aSXin Li<t>
1971*a58d3d2aSXin LiTo summarize, these weights are coded if and only if
1972*a58d3d2aSXin Li<list style="symbols">
1973*a58d3d2aSXin Li<t>This is a stereo Opus frame (<xref target="toc_byte"/>), and</t>
1974*a58d3d2aSXin Li<t>The current SILK frame corresponds to the mid channel.</t>
1975*a58d3d2aSXin Li</list>
1976*a58d3d2aSXin Li</t>
1977*a58d3d2aSXin Li
1978*a58d3d2aSXin Li<t>
1979*a58d3d2aSXin LiThe prediction weights are coded in three separate pieces, which are decoded
1980*a58d3d2aSXin Li by silk_stereo_decode_pred() (decode_stereo_pred.c).
1981*a58d3d2aSXin LiThe first piece jointly codes the high-order part of a table index for both
1982*a58d3d2aSXin Li weights.
1983*a58d3d2aSXin LiThe second piece codes the low-order part of each table index.
1984*a58d3d2aSXin LiThe third piece codes an offset used to linearly interpolate between table
1985*a58d3d2aSXin Li indices.
1986*a58d3d2aSXin LiThe details are as follows.
1987*a58d3d2aSXin Li</t>
1988*a58d3d2aSXin Li
1989*a58d3d2aSXin Li<t>
1990*a58d3d2aSXin LiLet n be an index decoded with the 25-element stage-1 PDF in
1991*a58d3d2aSXin Li <xref target="silk_stereo_pred_pdfs"/>.
1992*a58d3d2aSXin LiThen let i0 and i1 be indices decoded with the stage-2 and stage-3 PDFs in
1993*a58d3d2aSXin Li <xref target="silk_stereo_pred_pdfs"/>, respectively, and let i2 and i3
1994*a58d3d2aSXin Li be two more indices decoded with the stage-2 and stage-3 PDFs, all in that
1995*a58d3d2aSXin Li order.
1996*a58d3d2aSXin Li</t>
1997*a58d3d2aSXin Li
1998*a58d3d2aSXin Li<texttable anchor="silk_stereo_pred_pdfs" title="Stereo Weight PDFs">
1999*a58d3d2aSXin Li<ttcol align="left">Stage</ttcol>
2000*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
2001*a58d3d2aSXin Li<c>Stage 1</c>
2002*a58d3d2aSXin Li<c>{7,  2,  1,  1,  1,
2003*a58d3d2aSXin Li   10, 24,  8,  1,  1,
2004*a58d3d2aSXin Li    3, 23, 92, 23,  3,
2005*a58d3d2aSXin Li    1,  1,  8, 24, 10,
2006*a58d3d2aSXin Li    1,  1,  1,  2,  7}/256</c>
2007*a58d3d2aSXin Li
2008*a58d3d2aSXin Li<c>Stage 2</c>
2009*a58d3d2aSXin Li<c>{85, 86, 85}/256</c>
2010*a58d3d2aSXin Li
2011*a58d3d2aSXin Li<c>Stage 3</c>
2012*a58d3d2aSXin Li<c>{51, 51, 52, 51, 51}/256</c>
2013*a58d3d2aSXin Li</texttable>
2014*a58d3d2aSXin Li
2015*a58d3d2aSXin Li<t>
2016*a58d3d2aSXin LiThen use n, i0, and i2 to form two table indices, wi0 and wi1, according to
2017*a58d3d2aSXin Li<figure align="center">
2018*a58d3d2aSXin Li<artwork align="center"><![CDATA[
2019*a58d3d2aSXin Liwi0 = i0 + 3*(n/5)
2020*a58d3d2aSXin Liwi1 = i2 + 3*(n%5)
2021*a58d3d2aSXin Li]]></artwork>
2022*a58d3d2aSXin Li</figure>
2023*a58d3d2aSXin Li where the division is integer division.
2024*a58d3d2aSXin LiThe range of these indices is 0 to 14, inclusive.
Let w_Q13[i] be the i'th weight from <xref target="silk_stereo_weights_table"/>.
2026*a58d3d2aSXin LiThen the two prediction weights, w0_Q13 and w1_Q13, are
2027*a58d3d2aSXin Li<figure align="center">
2028*a58d3d2aSXin Li<artwork align="center"><![CDATA[
w1_Q13 = w_Q13[wi1]
         + (((w_Q13[wi1+1] - w_Q13[wi1])*6554) >> 16)*(2*i3 + 1)

w0_Q13 = w_Q13[wi0]
         + (((w_Q13[wi0+1] - w_Q13[wi0])*6554) >> 16)*(2*i1 + 1)
         - w1_Q13
2035*a58d3d2aSXin Li]]></artwork>
2036*a58d3d2aSXin Li</figure>
2037*a58d3d2aSXin LiN.b., w1_Q13 is computed first here, because w0_Q13 depends on it.
2038*a58d3d2aSXin LiThe constant 6554 is approximately 0.1 in Q16.
2039*a58d3d2aSXin LiAlthough wi0 and wi1 only have 15 possible values,
2040*a58d3d2aSXin Li <xref target="silk_stereo_weights_table"/> contains 16 entries to allow
2041*a58d3d2aSXin Li interpolation between entry wi0 and (wi0&nbsp;+&nbsp;1) (and likewise for wi1).
2042*a58d3d2aSXin Li</t>
2043*a58d3d2aSXin Li
2044*a58d3d2aSXin Li<texttable anchor="silk_stereo_weights_table"
2045*a58d3d2aSXin Li title="Stereo Weight Table">
2046*a58d3d2aSXin Li<ttcol align="left">Index</ttcol>
2047*a58d3d2aSXin Li<ttcol align="right">Weight (Q13)</ttcol>
2048*a58d3d2aSXin Li <c>0</c> <c>-13732</c>
2049*a58d3d2aSXin Li <c>1</c> <c>-10050</c>
2050*a58d3d2aSXin Li <c>2</c>  <c>-8266</c>
2051*a58d3d2aSXin Li <c>3</c>  <c>-7526</c>
2052*a58d3d2aSXin Li <c>4</c>  <c>-6500</c>
2053*a58d3d2aSXin Li <c>5</c>  <c>-5000</c>
2054*a58d3d2aSXin Li <c>6</c>  <c>-2950</c>
2055*a58d3d2aSXin Li <c>7</c>   <c>-820</c>
2056*a58d3d2aSXin Li <c>8</c>    <c>820</c>
2057*a58d3d2aSXin Li <c>9</c>   <c>2950</c>
2058*a58d3d2aSXin Li<c>10</c>   <c>5000</c>
2059*a58d3d2aSXin Li<c>11</c>   <c>6500</c>
2060*a58d3d2aSXin Li<c>12</c>   <c>7526</c>
2061*a58d3d2aSXin Li<c>13</c>   <c>8266</c>
2062*a58d3d2aSXin Li<c>14</c>  <c>10050</c>
2063*a58d3d2aSXin Li<c>15</c>  <c>13732</c>
2064*a58d3d2aSXin Li</texttable>
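<t>
As a non-normative illustration, the dequantization of the two prediction
 weights can be sketched in C as follows.
The function names interp_weight() and stereo_weights_sketch() are
 hypothetical; the indices n, i0, i1, i2, and i3 are assumed to have been
 decoded already with the PDFs in <xref target="silk_stereo_pred_pdfs"/>.
</t>
<figure>
<artwork><![CDATA[
#include <stdint.h>

/* Stereo weight table (Q13), 16 entries. */
static const int16_t w_Q13[16] = {
    -13732, -10050, -8266, -7526, -6500, -5000, -2950, -820,
       820,   2950,  5000,  6500,  7526,  8266, 10050, 13732
};

/* Starts at entry wi and moves (2*k + 1)/10 of the way toward
 * entry wi + 1. The constant 6554 is about 0.1 in Q16. The
 * table is increasing, so the shifted value is never negative. */
static int32_t interp_weight(int wi, int k)
{
    int32_t diff = w_Q13[wi + 1] - w_Q13[wi];
    int32_t step = (diff * 6554) >> 16;
    return w_Q13[wi] + step * (2 * k + 1);
}

/* n: stage-1 index (0..24).
 * i0, i2: stage-2 indices (0..2).
 * i1, i3: stage-3 indices (0..4). */
static void stereo_weights_sketch(int n, int i0, int i1,
                                  int i2, int i3,
                                  int32_t *w0_out,
                                  int32_t *w1_out)
{
    int wi0 = i0 + 3 * (n / 5);
    int wi1 = i2 + 3 * (n % 5);
    /* w1 is computed first because w0 depends on it. */
    *w1_out = interp_weight(wi1, i3);
    *w0_out = interp_weight(wi0, i1) - *w1_out;
}
]]></artwork>
</figure>
<t>
For example, n&nbsp;=&nbsp;12 with i0&nbsp;=&nbsp;i2&nbsp;=&nbsp;1 and
 i1&nbsp;=&nbsp;i3&nbsp;=&nbsp;2 selects the midpoint between table entries 7
 and 8 for both indices, so both weights evaluate to zero.
</t>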
2065*a58d3d2aSXin Li
2066*a58d3d2aSXin Li</section>
2067*a58d3d2aSXin Li
2068*a58d3d2aSXin Li<section anchor="silk_mid_only_flag" toc="include" title="Mid-only Flag">
2069*a58d3d2aSXin Li<t>
2070*a58d3d2aSXin LiA flag appears after the stereo prediction weights that indicates if only the
2071*a58d3d2aSXin Li mid channel is coded for this time interval.
2072*a58d3d2aSXin LiIt appears only when
2073*a58d3d2aSXin Li<list style="symbols">
2074*a58d3d2aSXin Li<t>This is a stereo Opus frame (see <xref target="toc_byte"/>),</t>
2075*a58d3d2aSXin Li<t>The current SILK frame corresponds to the mid channel, and</t>
2076*a58d3d2aSXin Li<t>Either
2077*a58d3d2aSXin Li<list style="symbols">
<t>This is a regular SILK frame where the VAD flags
 (see <xref target="silk_header_bits"/>) indicate that the corresponding side
 channel is not active, or</t>
2081*a58d3d2aSXin Li<t>
2082*a58d3d2aSXin LiThis is an LBRR frame where the LBRR flags
2083*a58d3d2aSXin Li (see <xref target="silk_header_bits"/> and <xref target="silk_lbrr_flags"/>)
2084*a58d3d2aSXin Li indicate that the corresponding side channel is not coded.
2085*a58d3d2aSXin Li</t>
2086*a58d3d2aSXin Li</list>
2087*a58d3d2aSXin Li</t>
2088*a58d3d2aSXin Li</list>
2089*a58d3d2aSXin LiIt is omitted when there are no stereo weights, for all of the same reasons.
2090*a58d3d2aSXin LiIt is also omitted for a regular SILK frame when the VAD flag of the
2091*a58d3d2aSXin Li corresponding side channel frame is set (indicating it is active).
2092*a58d3d2aSXin LiThe side channel must be coded in this case, making the mid-only flag
2093*a58d3d2aSXin Li redundant.
2094*a58d3d2aSXin LiIt is also omitted for an LBRR frame when the corresponding LBRR flags
2095*a58d3d2aSXin Li indicate the side channel is coded.
2096*a58d3d2aSXin Li</t>
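<t>
As a non-normative illustration, the conditions above reduce to two small
 predicates.
The function and parameter names are hypothetical.
The side_coded argument stands for the side channel's VAD flag for this time
 interval when parsing a regular SILK frame, or the side channel's LBRR flag
 for this time interval when parsing an LBRR frame.
</t>
<figure>
<artwork><![CDATA[
/* Stereo prediction weights are coded only at the start of a
 * mid-channel SILK frame in a stereo Opus frame. */
static int silk_has_stereo_weights(int stereo, int mid_channel)
{
    return stereo && mid_channel;
}

/* The mid-only flag follows the weights only when the header or
 * LBRR flags do not already say that the side channel is coded. */
static int silk_has_mid_only_flag(int stereo, int mid_channel,
                                  int side_coded)
{
    return silk_has_stereo_weights(stereo, mid_channel)
        && !side_coded;
}
]]></artwork>
</figure>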

<t>
When the flag is present, the decoder reads a single value using the PDF in
 <xref target="silk_mid_only_pdf"/>, as implemented in
 silk_stereo_decode_mid_only() (decode_stereo_pred.c).
If the flag is set, then there is no corresponding SILK frame for the side
 channel, the entire decoding process for the side channel is skipped, and
 zeros are fed to the stereo unmixing process (see
 <xref target="silk_stereo_unmixing"/>) instead.
As stated above, LBRR frames still include this flag when the LBRR flag
 indicates that the side channel is not coded.
In that case, if this flag is zero (indicating that there should be a side
 channel), then Packet Loss Concealment (PLC, see
 <xref target="Packet Loss Concealment"/>) SHOULD be invoked to recover a
 side channel signal.
Otherwise, the stereo image will collapse.
</t>

<texttable anchor="silk_mid_only_pdf" title="Mid-only Flag PDF">
<ttcol align="left">PDF</ttcol>
<c>{192, 64}/256</c>
</texttable>
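<t>
As a non-normative illustration, the presence conditions for the mid-only flag
 listed above can be sketched in Python as follows.
The function and parameter names are hypothetical and do not appear in the
 reference implementation; the sketch assumes the caller has already decoded
 the relevant VAD and LBRR flags for the side channel.
</t>
<figure>
<artwork><![CDATA[
```python
# PDF used to decode the mid-only flag when it is present.
MID_ONLY_PDF = (192, 64)  # out of 256


def mid_only_flag_present(is_stereo, is_mid_channel, is_lbrr_frame,
                          side_vad_active=False, side_lbrr_coded=False):
    """Return True if the mid-only flag is coded for this SILK frame.

    All parameter names are illustrative assumptions:
      is_stereo        -- the Opus frame is stereo
      is_mid_channel   -- the current SILK frame belongs to the mid channel
      is_lbrr_frame    -- the current SILK frame is an LBRR frame
      side_vad_active  -- VAD flag of the corresponding side channel frame
      side_lbrr_coded  -- LBRR flag of the corresponding side channel frame
    """
    if not (is_stereo and is_mid_channel):
        return False
    if is_lbrr_frame:
        # LBRR frame: flag is present only if the side channel is not coded.
        return not side_lbrr_coded
    # Regular frame: flag is present only if the side channel is inactive.
    return not side_vad_active
```
]]></artwork>
</figure>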

</section>

<section anchor="silk_frame_type" toc="include" title="Frame Type">
<t>
Each SILK frame contains a single "frame type" symbol that jointly codes the
 signal type and quantization offset type of the corresponding frame.
If the current frame is a regular SILK frame whose VAD flag was not set (an
 "inactive" frame), then the frame type symbol takes on a value of either 0 or
 1 and is decoded using the first PDF in <xref target="silk_frame_type_pdfs"/>.
If the frame is an LBRR frame or a regular SILK frame whose VAD flag was set
 (an "active" frame), then the value of the symbol may range from 2 to 5,
 inclusive, and is decoded using the second PDF in
 <xref target="silk_frame_type_pdfs"/>.
<xref target="silk_frame_type_table"/> translates between the value of the
 frame type symbol and the corresponding signal type and quantization offset
 type.
</t>

<texttable anchor="silk_frame_type_pdfs" title="Frame Type PDFs">
<ttcol>VAD Flag</ttcol>
<ttcol>PDF</ttcol>
<c>Inactive</c> <c>{26, 230, 0, 0, 0, 0}/256</c>
<c>Active</c>   <c>{0, 0, 24, 74, 148, 10}/256</c>
</texttable>

<texttable anchor="silk_frame_type_table"
 title="Signal Type and Quantization Offset Type from Frame Type">
<ttcol>Frame Type</ttcol>
<ttcol>Signal Type</ttcol>
<ttcol align="right">Quantization Offset Type</ttcol>
<c>0</c> <c>Inactive</c> <c>Low</c>
<c>1</c> <c>Inactive</c> <c>High</c>
<c>2</c> <c>Unvoiced</c> <c>Low</c>
<c>3</c> <c>Unvoiced</c> <c>High</c>
<c>4</c> <c>Voiced</c>   <c>Low</c>
<c>5</c> <c>Voiced</c>   <c>High</c>
</texttable>
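<t>
The PDF selection and the mapping in the two tables above can be sketched in
 Python as follows.
This is only an illustration: the names are hypothetical, and the range
 decoder call that would actually produce the symbol is not shown.
</t>
<figure>
<artwork><![CDATA[
```python
# PDFs from the "Frame Type PDFs" table (each sums to 256).
FRAME_TYPE_PDF_INACTIVE = (26, 230, 0, 0, 0, 0)
FRAME_TYPE_PDF_ACTIVE = (0, 0, 24, 74, 148, 10)

# Frame type symbol -> (signal type, quantization offset type).
FRAME_TYPE_TABLE = {
    0: ("Inactive", "Low"),
    1: ("Inactive", "High"),
    2: ("Unvoiced", "Low"),
    3: ("Unvoiced", "High"),
    4: ("Voiced", "Low"),
    5: ("Voiced", "High"),
}


def frame_type_pdf(is_lbrr_frame, vad_flag):
    """Pick the PDF: LBRR frames and VAD-active frames use the second one."""
    if is_lbrr_frame or vad_flag:
        return FRAME_TYPE_PDF_ACTIVE
    return FRAME_TYPE_PDF_INACTIVE


def split_frame_type(symbol):
    """Translate a decoded frame type symbol (0..5) into its two fields."""
    return FRAME_TYPE_TABLE[symbol]
```
]]></artwork>
</figure>
<t>
Because the inactive PDF assigns zero probability to symbols 2 through 5, and
 the active PDF assigns zero probability to symbols 0 and 1, a single 6-entry
 table suffices for both cases.
</t>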

</section>

<section anchor="silk_gains" toc="include" title="Subframe Gains">
<t>
A separate quantization gain is coded for each 5&nbsp;ms subframe.
These gains control the step size between quantization levels of the excitation
 signal and, therefore, the quality of the reconstruction.
They are independent of and unrelated to the pitch contours coded for voiced
 frames.
The quantization gains are themselves uniformly quantized to 6&nbsp;bits on a
 log scale, giving them a resolution of approximately 1.369&nbsp;dB and a range
 of approximately 1.94&nbsp;dB to 88.21&nbsp;dB.
</t>
<t>
The subframe gains are either coded independently, or relative to the gain from
 the most recent coded subframe in the same channel.
Independent coding is used if and only if
<list style="symbols">
<t>
This is the first subframe in the current SILK frame, and
</t>
<t>Either
<list style="symbols">
<t>
This is the first SILK frame of its type (LBRR or regular) for this channel in
 the current Opus frame, or
 </t>
<t>
The previous SILK frame of the same type (LBRR or regular) for this channel in
 the same Opus frame was not coded.
</t>
</list>
</t>
</list>
</t>
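<t>
The rule above reduces to a small predicate, sketched here in Python for
 illustration only.
The function and parameter names are hypothetical; the caller is assumed to
 track, per channel and per frame type (LBRR or regular), whether the previous
 SILK frame in the same Opus frame was coded.
</t>
<figure>
<artwork><![CDATA[
```python
def uses_independent_gain(subframe_index, is_first_frame_of_type,
                          previous_frame_of_type_coded):
    """Return True if this subframe gain is coded independently.

    subframe_index               -- 0 for the first subframe of the SILK frame
    is_first_frame_of_type       -- first SILK frame of its type (LBRR or
                                    regular) for this channel in the Opus frame
    previous_frame_of_type_coded -- previous SILK frame of the same type for
                                    this channel in the same Opus frame was
                                    coded (ignored for the first frame)
    """
    if subframe_index != 0:
        return False
    return is_first_frame_of_type or not previous_frame_of_type_coded
```
]]></artwork>
</figure>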

<t>
In an independently coded subframe gain, the 3 most significant bits of the
 quantization gain are decoded using a PDF selected from
 <xref target="silk_independent_gain_msb_pdfs"/> based on the decoded signal
 type (see <xref target="silk_frame_type"/>).
</t>

<texttable anchor="silk_independent_gain_msb_pdfs"
 title="PDFs for Independent Quantization Gain MSB Coding">
<ttcol align="left">Signal Type</ttcol>
<ttcol align="left">PDF</ttcol>
<c>Inactive</c> <c>{32, 112, 68, 29, 12,  1,  1, 1}/256</c>
<c>Unvoiced</c>  <c>{2,  17, 45, 60, 62, 47, 19, 4}/256</c>
<c>Voiced</c>    <c>{1,   3, 26, 71, 94, 50,  9, 2}/256</c>
</texttable>

<t>
The 3 least significant bits are decoded using a uniform PDF:
</t>
<texttable anchor="silk_independent_gain_lsb_pdf"
 title="PDF for Independent Quantization Gain LSB Coding">
<ttcol align="left">PDF</ttcol>
<c>{32, 32, 32, 32, 32, 32, 32, 32}/256</c>
</texttable>

<t>
These 6 bits are combined to form a value, gain_index, between 0 and 63.
When the gain for the previous subframe is available, then the current gain is
 limited as follows:
<figure align="center">
<artwork align="center"><![CDATA[
log_gain = max(gain_index, previous_log_gain - 16) .
]]></artwork>
</figure>
This may help some implementations limit the change in precision of their
 internal LTP history.
The indices to which this clamp applies cannot simply be removed from the
 codebook, because previous_log_gain will not be available after packet loss.
The clamping is skipped after a decoder reset, and in the side channel if the
 previous frame in the side channel was not coded, since there is no value for
 previous_log_gain available.
It MAY also be skipped after packet loss.
</t>
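<t>
As an illustrative sketch (not part of the reference implementation), the
 combination of the two 3-bit fields and the clamp against the previous gain
 might look as follows in Python.
The function name is hypothetical, and passing None for previous_log_gain is an
 assumption used here to model the cases in which no previous value is
 available.
</t>
<figure>
<artwork><![CDATA[
```python
def independent_log_gain(msb, lsb, previous_log_gain=None):
    """Combine the MSB and LSB symbols and apply the optional clamp.

    msb, lsb          -- decoded 3-bit symbols, each in 0..7
    previous_log_gain -- log_gain of the previous subframe, or None when it
                         is unavailable (e.g., after a decoder reset)
    """
    gain_index = (msb << 3) | lsb          # 6-bit value in 0..63
    if previous_log_gain is None:
        return gain_index                  # clamp is skipped
    return max(gain_index, previous_log_gain - 16)
```
]]></artwork>
</figure>
<t>
For example, with msb = 0, lsb = 0, and previous_log_gain = 40, the result is
 24 rather than 0, since the gain may not drop by more than 16 steps.
</t>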

<t>
For subframes which do not have an independent gain (including the first
 subframe of frames not listed as using independent coding above), the
 quantization gain is coded relative to the gain from the previous subframe (in
 the same channel).
The PDF in <xref target="silk_delta_gain_pdf"/> yields a delta_gain_index value
 between 0 and 40, inclusive.
</t>
<texttable anchor="silk_delta_gain_pdf"
 title="PDF for Delta Quantization Gain Coding">
<ttcol align="left">PDF</ttcol>
<c>{6,   5,  11,  31, 132,  21,   8,   4,
    3,   2,   2,   2,   1,   1,   1,   1,
    1,   1,   1,   1,   1,   1,   1,   1,
    1,   1,   1,   1,   1,   1,   1,   1,
    1,   1,   1,   1,   1,   1,   1,   1,   1}/256</c>
</texttable>
<t>
The following formula translates this index into a quantization gain for the
 current subframe using the gain from the previous subframe:
<figure align="center">
<artwork align="center"><![CDATA[
log_gain = clamp(0, max(2*delta_gain_index - 16,
                   previous_log_gain + delta_gain_index - 4), 63) .
]]></artwork>
</figure>
</t>
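<t>
The delta-coding formula above can be exercised with a short Python sketch.
The function name is hypothetical; clamp(lo, x, hi) is assumed to mean limiting
 x to the range lo to hi, inclusive.
</t>
<figure>
<artwork><![CDATA[
```python
def delta_log_gain(delta_gain_index, previous_log_gain):
    """Apply the delta gain formula and clamp the result to 0..63.

    delta_gain_index  -- decoded symbol in 0..40
    previous_log_gain -- log_gain of the previous subframe in this channel
    """
    candidate = max(2 * delta_gain_index - 16,
                    previous_log_gain + delta_gain_index - 4)
    return min(max(candidate, 0), 63)
```
]]></artwork>
</figure>
<t>
A delta_gain_index of 4 (the most probable symbol) leaves the gain unchanged,
 while large indices let the gain rise at twice the rate via the first term of
 the max().
</t>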
<t>
silk_gains_dequant() (gain_quant.c) dequantizes log_gain for the k'th subframe
 and converts it into a linear Q16 scale factor via
<figure align="center">
<artwork align="center"><![CDATA[
gain_Q16[k] = silk_log2lin((0x1D1C71*log_gain>>16) + 2090)
]]></artwork>
</figure>
</t>
<t>
The function silk_log2lin() (log2lin.c) computes an approximation of
 2**(inLog_Q7/128.0), where inLog_Q7 is its Q7 input.
Let i = inLog_Q7&gt;&gt;7 be the integer part of inLog_Q7 and
 f = inLog_Q7&amp;127 be the fractional part.
Then
<figure align="center">
<artwork align="center"><![CDATA[
(1<<i) + ((-174*f*(128-f)>>16)+f)*((1<<i)>>7)
]]></artwork>
</figure>
 yields the approximate exponential.
The final Q16 gain values lie between 81920 and 1686110208, inclusive
 (representing scale factors of 1.25 to 25728, respectively).
</t>
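<t>
The two formulas above can be checked against the stated range with a short
 Python transcription.
This is a sketch of the arithmetic as written in this section, not a copy of
 the reference C code; the Python function names are hypothetical.
It relies on Python's right shift of negative integers rounding toward
 negative infinity, which is assumed to match the arithmetic shift used in the
 formulas.
</t>
<figure>
<artwork><![CDATA[
```python
def silk_log2lin(in_log_q7):
    """Approximate 2**(in_log_q7 / 128.0) using integer arithmetic."""
    i = in_log_q7 >> 7        # integer part
    f = in_log_q7 & 127       # fractional part
    return (1 << i) + (((-174 * f * (128 - f)) >> 16) + f) * ((1 << i) >> 7)


def gain_q16(log_gain):
    """Dequantize a log_gain in 0..63 into a linear Q16 scale factor."""
    return silk_log2lin(((0x1D1C71 * log_gain) >> 16) + 2090)


print(gain_q16(0))    # 81920
print(gain_q16(63))   # 1686110208
```
]]></artwork>
</figure>
<t>
Dividing these two results by 65536 gives the scale factors 1.25 and 25728
 quoted above.
</t>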
</section>

<section anchor="silk_nlsfs" toc="include" title="Normalized Line Spectral
 Frequency (LSF) and Linear Predictive Coding (LPC) Coefficients">
<t>
A set of normalized Line Spectral Frequency (LSF) coefficients follows the
 quantization gains in the bitstream and represents the Linear Predictive
 Coding (LPC) coefficients for the current SILK frame.
Once decoded, the normalized LSFs form an increasing list of Q15 values between
 0 and 1.
These represent the interleaved zeros on the upper half of the unit circle
 (between 0 and pi, hence "normalized") in the standard decomposition
 <xref target="line-spectral-pairs"/> of the LPC filter into a symmetric part
 and an anti-symmetric part (P and Q in <xref target="silk_nlsf2lpc"/>).
Because of non-linear effects in the decoding process, an implementation SHOULD
 match the fixed-point arithmetic described in this section exactly.
An encoder SHOULD also use the same process.
</t>
<t>
The normalized LSFs are coded using a two-stage vector quantizer (VQ)
 (<xref target="silk_nlsf_stage1"/> and <xref target="silk_nlsf_stage2"/>).
NB and MB frames use an order-10 predictor, while WB frames use an order-16
 predictor, and thus have different sets of tables.
After reconstructing the normalized LSFs
 (<xref target="silk_nlsf_reconstruction"/>), the decoder runs them through a
 stabilization process (<xref target="silk_nlsf_stabilization"/>), interpolates
 them between frames (<xref target="silk_nlsf_interpolation"/>), converts them
 back into LPC coefficients (<xref target="silk_nlsf2lpc"/>), and then runs
 them through further processes to limit the range of the coefficients
 (<xref target="silk_lpc_range_limit"/>) and the gain of the filter
 (<xref target="silk_lpc_gain_limit"/>).
All of this is necessary to ensure the reconstruction process is stable.
</t>

<section anchor="silk_nlsf_stage1" title="Normalized LSF Stage 1 Decoding">
<t>
The first VQ stage uses a 32-element codebook, coded with one of the PDFs in
 <xref target="silk_nlsf_stage1_pdfs"/>, depending on the audio bandwidth and
 the signal type of the current SILK frame.
This yields a single index, I1, for the entire frame, which
<list style="numbers">
<t>Indexes an element in a coarse codebook,</t>
<t>Selects the PDFs for the second stage of the VQ, and</t>
<t>Selects the prediction weights used to remove intra-frame redundancy from
 the second stage.</t>
</list>
The actual codebook elements are listed in
 <xref target="silk_nlsf_nbmb_codebook"/> and
 <xref target="silk_nlsf_wb_codebook"/>, but they are not needed until the last
 stages of reconstructing the LSF coefficients.
</t>

<texttable anchor="silk_nlsf_stage1_pdfs"
 title="PDFs for Normalized LSF Stage-1 Index Decoding">
<ttcol align="left">Audio Bandwidth</ttcol>
<ttcol align="left">Signal Type</ttcol>
<ttcol align="left">PDF</ttcol>
<c>NB or MB</c> <c>Inactive or unvoiced</c>
<c>
{44, 34, 30, 19, 21, 12, 11,  3,
  3,  2, 16,  2,  2,  1,  5,  2,
  1,  3,  3,  1,  1,  2,  2,  2,
  3,  1,  9,  9,  2,  7,  2,  1}/256
</c>
<c>NB or MB</c> <c>Voiced</c>
<c>
{1, 10,  1,  8,  3,  8,  8, 14,
13, 14,  1, 14, 12, 13, 11, 11,
12, 11, 10, 10, 11,  8,  9,  8,
 7,  8,  1,  1,  6,  1,  6,  5}/256
</c>
<c>WB</c> <c>Inactive or unvoiced</c>
<c>
{31, 21,  3, 17,  1,  8, 17,  4,
  1, 18, 16,  4,  2,  3,  1, 10,
  1,  3, 16, 11, 16,  2,  2,  3,
  2, 11,  1,  4,  9,  8,  7,  3}/256
</c>
<c>WB</c> <c>Voiced</c>
<c>
{1,  4, 16,  5, 18, 11,  5, 14,
15,  1,  3, 12, 13, 14, 14,  6,
14, 12,  2,  6,  1, 12, 12, 11,
10,  3, 10,  5,  1,  1,  1,  3}/256
</c>
</texttable>
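<t>
As a non-normative illustration, the selection of the stage-1 PDF by audio
 bandwidth and signal type can be sketched in Python as a simple lookup.
The dictionary layout and the function name are hypothetical, and the range
 decoder call that consumes the selected PDF to produce I1 is not shown.
</t>
<figure>
<artwork><![CDATA[
```python
# Stage-1 PDFs keyed by (is_wideband, is_voiced); values copied from the
# table above. "Inactive or unvoiced" maps to is_voiced == False.
STAGE1_PDFS = {
    (False, False): (44, 34, 30, 19, 21, 12, 11, 3,
                     3, 2, 16, 2, 2, 1, 5, 2,
                     1, 3, 3, 1, 1, 2, 2, 2,
                     3, 1, 9, 9, 2, 7, 2, 1),
    (False, True): (1, 10, 1, 8, 3, 8, 8, 14,
                    13, 14, 1, 14, 12, 13, 11, 11,
                    12, 11, 10, 10, 11, 8, 9, 8,
                    7, 8, 1, 1, 6, 1, 6, 5),
    (True, False): (31, 21, 3, 17, 1, 8, 17, 4,
                    1, 18, 16, 4, 2, 3, 1, 10,
                    1, 3, 16, 11, 16, 2, 2, 3,
                    2, 11, 1, 4, 9, 8, 7, 3),
    (True, True): (1, 4, 16, 5, 18, 11, 5, 14,
                   15, 1, 3, 12, 13, 14, 14, 6,
                   14, 12, 2, 6, 1, 12, 12, 11,
                   10, 3, 10, 5, 1, 1, 1, 3),
}


def stage1_pdf(bandwidth, signal_type):
    """Select the stage-1 PDF.

    bandwidth   -- "NB", "MB", or "WB"
    signal_type -- "Inactive", "Unvoiced", or "Voiced"
    """
    return STAGE1_PDFS[(bandwidth == "WB", signal_type == "Voiced")]
```
]]></artwork>
</figure>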

</section>

<section anchor="silk_nlsf_stage2" title="Normalized LSF Stage 2 Decoding">
<t>
A total of 16 PDFs are available for the LSF residual in the second stage: the
 8 (a...h) for NB and MB frames given in
 <xref target="silk_nlsf_stage2_nbmb_pdfs"/>, and the 8 (i...p) for WB frames
 given in <xref target="silk_nlsf_stage2_wb_pdfs"/>.
Which PDF is used for which coefficient is driven by the index, I1,
 decoded in the first stage.
<xref target="silk_nlsf_nbmb_stage2_cb_sel"/> lists the letter of the
 corresponding PDF for each normalized LSF coefficient for NB and MB, and
 <xref target="silk_nlsf_wb_stage2_cb_sel"/> lists the same information for WB.
</t>

<texttable anchor="silk_nlsf_stage2_nbmb_pdfs"
 title="PDFs for NB/MB Normalized LSF Stage-2 Index Decoding">
<ttcol align="left">Codebook</ttcol>
<ttcol align="left">PDF</ttcol>
<c>a</c> <c>{1,   1,   1,  15, 224,  11,   1,   1,   1}/256</c>
<c>b</c> <c>{1,   1,   2,  34, 183,  32,   1,   1,   1}/256</c>
<c>c</c> <c>{1,   1,   4,  42, 149,  55,   2,   1,   1}/256</c>
<c>d</c> <c>{1,   1,   8,  52, 123,  61,   8,   1,   1}/256</c>
<c>e</c> <c>{1,   3,  16,  53, 101,  74,   6,   1,   1}/256</c>
<c>f</c> <c>{1,   3,  17,  55,  90,  73,  15,   1,   1}/256</c>
<c>g</c> <c>{1,   7,  24,  53,  74,  67,  26,   3,   1}/256</c>
<c>h</c> <c>{1,   1,  18,  63,  78,  58,  30,   6,   1}/256</c>
</texttable>

<texttable anchor="silk_nlsf_stage2_wb_pdfs"
 title="PDFs for WB Normalized LSF Stage-2 Index Decoding">
<ttcol align="left">Codebook</ttcol>
<ttcol align="left">PDF</ttcol>
<c>i</c> <c>{1,   1,   1,   9, 232,   9,   1,   1,   1}/256</c>
<c>j</c> <c>{1,   1,   2,  28, 186,  35,   1,   1,   1}/256</c>
<c>k</c> <c>{1,   1,   3,  42, 152,  53,   2,   1,   1}/256</c>
<c>l</c> <c>{1,   1,  10,  49, 126,  65,   2,   1,   1}/256</c>
<c>m</c> <c>{1,   4,  19,  48, 100,  77,   5,   1,   1}/256</c>
<c>n</c> <c>{1,   1,  14,  54, 100,  72,  12,   1,   1}/256</c>
<c>o</c> <c>{1,   1,  15,  61,  87,  61,  25,   4,   1}/256</c>
<c>p</c> <c>{1,   7,  21,  50,  77,  81,  17,   1,   1}/256</c>
</texttable>

<texttable anchor="silk_nlsf_nbmb_stage2_cb_sel"
 title="Codebook Selection for NB/MB Normalized LSF Stage-2 Index Decoding">
<ttcol>I1</ttcol>
<ttcol>Coefficient</ttcol>
<c/>
<c><spanx style="vbare">0&nbsp;1&nbsp;2&nbsp;3&nbsp;4&nbsp;5&nbsp;6&nbsp;7&nbsp;8&nbsp;9</spanx></c>
<c> 0</c>
<c><spanx style="vbare">a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a</spanx></c>
<c> 1</c>
<c><spanx style="vbare">b&nbsp;d&nbsp;b&nbsp;c&nbsp;c&nbsp;b&nbsp;c&nbsp;b&nbsp;b&nbsp;b</spanx></c>
<c> 2</c>
<c><spanx style="vbare">c&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b</spanx></c>
<c> 3</c>
<c><spanx style="vbare">b&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;b&nbsp;c&nbsp;b&nbsp;b&nbsp;b</spanx></c>
<c> 4</c>
<c><spanx style="vbare">c&nbsp;d&nbsp;d&nbsp;d&nbsp;d&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c</spanx></c>
<c> 5</c>
<c><spanx style="vbare">a&nbsp;f&nbsp;d&nbsp;d&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;b&nbsp;b</spanx></c>
<c> 6</c>
<c><spanx style="vbare">a&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;b</spanx></c>
<c> 7</c>
<c><spanx style="vbare">c&nbsp;d&nbsp;g&nbsp;e&nbsp;e&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f</spanx></c>
<c> 8</c>
<c><spanx style="vbare">c&nbsp;e&nbsp;f&nbsp;f&nbsp;e&nbsp;f&nbsp;e&nbsp;g&nbsp;e&nbsp;e</spanx></c>
<c> 9</c>
<c><spanx style="vbare">c&nbsp;e&nbsp;e&nbsp;h&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f&nbsp;e</spanx></c>
<c>10</c>
<c><spanx style="vbare">e&nbsp;d&nbsp;d&nbsp;d&nbsp;c&nbsp;d&nbsp;c&nbsp;c&nbsp;c&nbsp;c</spanx></c>
<c>11</c>
<c><spanx style="vbare">b&nbsp;f&nbsp;f&nbsp;g&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f&nbsp;f</spanx></c>
<c>12</c>
<c><spanx style="vbare">c&nbsp;h&nbsp;e&nbsp;g&nbsp;f&nbsp;f&nbsp;f&nbsp;f&nbsp;f&nbsp;f</spanx></c>
<c>13</c>
<c><spanx style="vbare">c&nbsp;h&nbsp;f&nbsp;f&nbsp;f&nbsp;f&nbsp;f&nbsp;g&nbsp;f&nbsp;e</spanx></c>
<c>14</c>
<c><spanx style="vbare">d&nbsp;d&nbsp;f&nbsp;e&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;e&nbsp;e</spanx></c>
<c>15</c>
<c><spanx style="vbare">c&nbsp;d&nbsp;d&nbsp;f&nbsp;f&nbsp;e&nbsp;e&nbsp;e&nbsp;e&nbsp;e</spanx></c>
<c>16</c>
<c><spanx style="vbare">c&nbsp;e&nbsp;e&nbsp;g&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f&nbsp;f</spanx></c>
<c>17</c>
<c><spanx style="vbare">c&nbsp;f&nbsp;e&nbsp;g&nbsp;f&nbsp;f&nbsp;f&nbsp;e&nbsp;f&nbsp;e</spanx></c>
<c>18</c>
<c><spanx style="vbare">c&nbsp;h&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f&nbsp;f</spanx></c>
<c>19</c>
<c><spanx style="vbare">c&nbsp;f&nbsp;e&nbsp;g&nbsp;h&nbsp;g&nbsp;f&nbsp;g&nbsp;f&nbsp;e</spanx></c>
<c>20</c>
<c><spanx style="vbare">d&nbsp;g&nbsp;h&nbsp;e&nbsp;g&nbsp;f&nbsp;f&nbsp;g&nbsp;e&nbsp;f</spanx></c>
<c>21</c>
<c><spanx style="vbare">c&nbsp;h&nbsp;g&nbsp;e&nbsp;e&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f</spanx></c>
<c>22</c>
<c><spanx style="vbare">e&nbsp;f&nbsp;f&nbsp;e&nbsp;g&nbsp;g&nbsp;f&nbsp;g&nbsp;f&nbsp;e</spanx></c>
<c>23</c>
<c><spanx style="vbare">c&nbsp;f&nbsp;f&nbsp;g&nbsp;f&nbsp;g&nbsp;e&nbsp;g&nbsp;e&nbsp;e</spanx></c>
<c>24</c>
<c><spanx style="vbare">e&nbsp;f&nbsp;f&nbsp;f&nbsp;d&nbsp;h&nbsp;e&nbsp;f&nbsp;f&nbsp;e</spanx></c>
<c>25</c>
<c><spanx style="vbare">c&nbsp;d&nbsp;e&nbsp;f&nbsp;f&nbsp;g&nbsp;e&nbsp;f&nbsp;f&nbsp;e</spanx></c>
<c>26</c>
<c><spanx style="vbare">c&nbsp;d&nbsp;c&nbsp;d&nbsp;d&nbsp;e&nbsp;c&nbsp;d&nbsp;d&nbsp;d</spanx></c>
<c>27</c>
<c><spanx style="vbare">b&nbsp;b&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;d&nbsp;c&nbsp;c</spanx></c>
<c>28</c>
<c><spanx style="vbare">e&nbsp;f&nbsp;f&nbsp;g&nbsp;g&nbsp;g&nbsp;f&nbsp;g&nbsp;e&nbsp;f</spanx></c>
<c>29</c>
<c><spanx style="vbare">d&nbsp;f&nbsp;f&nbsp;e&nbsp;e&nbsp;e&nbsp;e&nbsp;d&nbsp;d&nbsp;c</spanx></c>
<c>30</c>
<c><spanx style="vbare">c&nbsp;f&nbsp;d&nbsp;h&nbsp;f&nbsp;f&nbsp;e&nbsp;e&nbsp;f&nbsp;e</spanx></c>
<c>31</c>
<c><spanx style="vbare">e&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;g&nbsp;f&nbsp;g&nbsp;f&nbsp;e</spanx></c>
</texttable>

<texttable anchor="silk_nlsf_wb_stage2_cb_sel"
 title="Codebook Selection for WB Normalized LSF Stage-2 Index Decoding">
<ttcol>I1</ttcol>
<ttcol>Coefficient</ttcol>
<c/>
<c><spanx style="vbare">0&nbsp;&nbsp;1&nbsp;&nbsp;2&nbsp;&nbsp;3&nbsp;&nbsp;4&nbsp;&nbsp;5&nbsp;&nbsp;6&nbsp;&nbsp;7&nbsp;&nbsp;8&nbsp;&nbsp;9&nbsp;10&nbsp;11&nbsp;12&nbsp;13&nbsp;14&nbsp;15</spanx></c>
<c> 0</c>
<c><spanx style="vbare">i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
<c> 1</c>
<c><spanx style="vbare">k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;l</spanx></c>
<c> 2</c>
<c><spanx style="vbare">k&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;k&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l</spanx></c>
<c> 3</c>
<c><spanx style="vbare">i&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j</spanx></c>
<c> 4</c>
<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l</spanx></c>
<c> 5</c>
<c><spanx style="vbare">i&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;m</spanx></c>
<c> 6</c>
<c><spanx style="vbare">i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
<c> 7</c>
<c><spanx style="vbare">i&nbsp;&nbsp;k&nbsp;&nbsp;o&nbsp;&nbsp;l&nbsp;&nbsp;p&nbsp;&nbsp;k&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l</spanx></c>
<c> 8</c>
<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;k&nbsp;&nbsp;o&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;o&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l</spanx></c>
<c> 9</c>
<c><spanx style="vbare">k&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
<c>10</c>
<c><spanx style="vbare">i&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j</spanx></c>
<c>11</c>
<c><spanx style="vbare">k&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;l</spanx></c>
<c>12</c>
<c><spanx style="vbare">k&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;l</spanx></c>
<c>13</c>
<c><spanx style="vbare">l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;m</spanx></c>
<c>14</c>
<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;n&nbsp;&nbsp;k&nbsp;&nbsp;o&nbsp;&nbsp;n&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l</spanx></c>
<c>15</c>
<c><spanx style="vbare">i&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;i</spanx></c>
<c>16</c>
<c><spanx style="vbare">j&nbsp;&nbsp;o&nbsp;&nbsp;n&nbsp;&nbsp;p&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;m</spanx></c>
<c>17</c>
<c><spanx style="vbare">j&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m</spanx></c>
<c>18</c>
<c><spanx style="vbare">k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;m</spanx></c>
<c>19</c>
<c><spanx style="vbare">i&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
<c>20</c>
<c><spanx style="vbare">l&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;m</spanx></c>
<c>21</c>
<c><spanx style="vbare">k&nbsp;&nbsp;o&nbsp;&nbsp;l&nbsp;&nbsp;p&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;l</spanx></c>
<c>22</c>
<c><spanx style="vbare">k&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;o&nbsp;&nbsp;o&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;m</spanx></c>
<c>23</c>
<c><spanx style="vbare">j&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j</spanx></c>
<c>24</c>
<c><spanx style="vbare">k&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;o&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l</spanx></c>
<c>25</c>
<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
<c>26</c>
<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;o&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;k&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m</spanx></c>
<c>27</c>
<c><spanx style="vbare">l&nbsp;&nbsp;l&nbsp;&nbsp;p&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l</spanx></c>
<c>28</c>
<c><spanx style="vbare">i&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j</spanx></c>
<c>29</c>
<c><spanx style="vbare">i&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j</spanx></c>
<c>30</c>
<c><spanx style="vbare">l&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;i</spanx></c>
<c>31</c>
<c><spanx style="vbare">k&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
</texttable>
2562*a58d3d2aSXin Li
<t>
Decoding the second stage residual proceeds as follows.
For each coefficient, the decoder reads a symbol using the PDF corresponding to
 I1 from either <xref target="silk_nlsf_nbmb_stage2_cb_sel"/> or
 <xref target="silk_nlsf_wb_stage2_cb_sel"/>, and subtracts 4 from the result
 to give an index in the range -4 to 4, inclusive.
If the index is either -4 or 4, it reads a second symbol using the PDF in
 <xref target="silk_nlsf_ext_pdf"/>, and adds the value of this second symbol
 to the magnitude of the index, keeping its sign.
Because the second symbol ranges from 0 to 6, this gives the index, I2[k], a
 total range of -10 to 10, inclusive.
</t>
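<t>
As an illustration only, the index decoding described above can be sketched in
 C as follows.
The symbol_source structure and the next_symbol() and decode_stage2_index()
 helpers are hypothetical names introduced for this sketch; they stand in for
 the range decoder and are not part of the reference implementation.
The sketch assumes each symbol has already been decoded with the appropriate
 PDF.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the range decoder: a queue of symbols
   assumed to have been decoded already with the right PDFs. */
typedef struct {
    const int *symbols;
    size_t     count;
    size_t     pos;
} symbol_source;

static int next_symbol(symbol_source *src)
{
    assert(src->pos < src->count);
    return src->symbols[src->pos++];
}

/* Decode one stage-2 index I2[k], in the range -10 to 10. */
static int decode_stage2_index(symbol_source *src)
{
    /* First symbol (0..8), read with the PDF selected by I1. */
    int index = next_symbol(src) - 4;

    /* At either extreme, an extension symbol (0..6) follows and
       moves the index further away from zero. */
    if (index == -4) {
        index -= next_symbol(src);
    } else if (index == 4) {
        index += next_symbol(src);
    }
    return index;
}
```
]]></artwork>
</figure>
<t>
Under these assumptions, the symbol pair (0, 6) yields -10, the pair (8, 0)
 yields 4, and the single symbol 5 yields 1.
</t>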
2574*a58d3d2aSXin Li
2575*a58d3d2aSXin Li<texttable anchor="silk_nlsf_ext_pdf"
2576*a58d3d2aSXin Li title="PDF for Normalized LSF Index Extension Decoding">
2577*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
2578*a58d3d2aSXin Li<c>{156, 60, 24,  9,  4,  2,  1}/256</c>
2579*a58d3d2aSXin Li</texttable>
2580*a58d3d2aSXin Li
<t>
The decoded indices from both stages are translated back into normalized LSF
 coefficients in silk_NLSF_decode() (NLSF_decode.c).
The stage-2 indices represent residuals after both the first stage of the VQ
 and a separate backwards-prediction step.
The backwards prediction process in the encoder subtracts a prediction from
 each residual, formed by multiplying the residual of the coefficient that
 follows it by a prediction weight.
The decoder must undo this process.
<xref target="silk_nlsf_pred_weights"/> contains lists of prediction weights
 for each coefficient.
There are two lists for NB and MB (A and B), and another two lists for WB
 (C and D), giving two possible prediction weights for each coefficient.
The last coefficient has no successor, so it has no prediction weight.
</t>
2594*a58d3d2aSXin Li
2595*a58d3d2aSXin Li<texttable anchor="silk_nlsf_pred_weights"
2596*a58d3d2aSXin Li title="Prediction Weights for Normalized LSF Decoding">
2597*a58d3d2aSXin Li<ttcol align="left">Coefficient</ttcol>
2598*a58d3d2aSXin Li<ttcol align="right">A</ttcol>
2599*a58d3d2aSXin Li<ttcol align="right">B</ttcol>
2600*a58d3d2aSXin Li<ttcol align="right">C</ttcol>
2601*a58d3d2aSXin Li<ttcol align="right">D</ttcol>
2602*a58d3d2aSXin Li <c>0</c> <c>179</c> <c>116</c> <c>175</c>  <c>68</c>
2603*a58d3d2aSXin Li <c>1</c> <c>138</c>  <c>67</c> <c>148</c>  <c>62</c>
2604*a58d3d2aSXin Li <c>2</c> <c>140</c>  <c>82</c> <c>160</c>  <c>66</c>
2605*a58d3d2aSXin Li <c>3</c> <c>148</c>  <c>59</c> <c>176</c>  <c>60</c>
2606*a58d3d2aSXin Li <c>4</c> <c>151</c>  <c>92</c> <c>178</c>  <c>72</c>
2607*a58d3d2aSXin Li <c>5</c> <c>149</c>  <c>72</c> <c>173</c> <c>117</c>
2608*a58d3d2aSXin Li <c>6</c> <c>153</c> <c>100</c> <c>174</c>  <c>85</c>
2609*a58d3d2aSXin Li <c>7</c> <c>151</c>  <c>89</c> <c>164</c>  <c>90</c>
2610*a58d3d2aSXin Li <c>8</c> <c>163</c>  <c>92</c> <c>177</c> <c>118</c>
2611*a58d3d2aSXin Li <c>9</c> <c/>        <c/>      <c>174</c> <c>136</c>
2612*a58d3d2aSXin Li<c>10</c> <c/>        <c/>      <c>196</c> <c>151</c>
2613*a58d3d2aSXin Li<c>11</c> <c/>        <c/>      <c>182</c> <c>142</c>
2614*a58d3d2aSXin Li<c>12</c> <c/>        <c/>      <c>198</c> <c>160</c>
2615*a58d3d2aSXin Li<c>13</c> <c/>        <c/>      <c>192</c> <c>142</c>
2616*a58d3d2aSXin Li<c>14</c> <c/>        <c/>      <c>182</c> <c>155</c>
2617*a58d3d2aSXin Li</texttable>
2618*a58d3d2aSXin Li
<t>
The prediction is undone using the procedure implemented in
 silk_NLSF_residual_dequant() (NLSF_decode.c), which is as follows.
Each coefficient selects its prediction weight from one of the two lists based
 on the stage-1 index, I1.
<xref target="silk_nlsf_nbmb_weight_sel"/> gives the selections for each
 coefficient for NB and MB, and <xref target="silk_nlsf_wb_weight_sel"/> gives
 the selections for WB.
Let d_LPC be the order of the codebook, i.e., 10 for NB and MB, and 16 for WB,
 and let pred_Q8[k] be the weight for the k'th coefficient selected by this
 process for 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d_LPC-1.
Then, the stage-2 residual for each coefficient is computed via
<figure align="center">
<artwork align="center"><![CDATA[
res_Q10[k] = (k+1 < d_LPC ? (res_Q10[k+1]*pred_Q8[k])>>8 : 0)
             + ((((I2[k]<<10) - sign(I2[k])*102)*qstep)>>16) ,
]]></artwork>
</figure>
 where qstep is the Q16 quantization step size, which is 11796 for NB and MB
 and 9830 for WB (representing step sizes of approximately 0.18 and 0.15,
 respectively).
Because res_Q10[k] depends on res_Q10[k+1], the coefficients are processed in
 order of decreasing k, starting from k&nbsp;=&nbsp;d_LPC-1.
</t>
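<t>
The recursion above can be sketched in C as follows.
This is a minimal illustration rather than the reference code: the names
 asr32() and nlsf_residual_dequant() and the MAX_LPC_ORDER bound are
 assumptions made for this example, and the asr32() helper is used so that
 right shifts of negative values round towards minus infinity regardless of
 the compiler.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

#define MAX_LPC_ORDER 16

/* Flooring right shift, well defined for negative x. */
static int32_t asr32(int32_t x, int shift)
{
    return x >= 0 ? x >> shift : ~(~x >> shift);
}

/* Undo the backwards prediction.
   i2[]      : d_lpc stage-2 indices, each in -10..10
   pred_q8[] : d_lpc-1 selected prediction weights
   qstep_q16 : 11796 for NB/MB, 9830 for WB */
static void nlsf_residual_dequant(int32_t res_q10[],
                                  const int i2[],
                                  const uint8_t pred_q8[],
                                  int32_t qstep_q16,
                                  int d_lpc)
{
    int32_t next = 0; /* res_Q10[k+1]; unused for the last k */
    int k;

    assert(d_lpc > 0 && d_lpc <= MAX_LPC_ORDER);
    for (k = d_lpc - 1; k >= 0; k--) {
        int32_t pred = 0;
        int32_t q = i2[k] * 1024;      /* I2[k] << 10 */

        if (k + 1 < d_lpc) {
            pred = asr32(next * pred_q8[k], 8);
        }
        /* Subtract sign(I2[k])*102, i.e. about 0.1 in Q10. */
        if (q > 0) {
            q -= 102;
        } else if (q < 0) {
            q += 102;
        }
        res_q10[k] = pred + asr32(q * qstep_q16, 16);
        next = res_q10[k];
    }
}
```
]]></artwork>
</figure>
<t>
As a worked check on a hypothetical three-coefficient vector (shorter than any
 real codebook order), I2 = {1, 0, -2} with weights {179, 138} and
 qstep = 11796 gives res_Q10 = {32, -190, -351}.
</t>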
2641*a58d3d2aSXin Li
2642*a58d3d2aSXin Li<texttable anchor="silk_nlsf_nbmb_weight_sel"
2643*a58d3d2aSXin Li title="Prediction Weight Selection for NB/MB Normalized LSF Decoding">
2644*a58d3d2aSXin Li<ttcol>I1</ttcol>
2645*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
2646*a58d3d2aSXin Li<c/>
2647*a58d3d2aSXin Li<c><spanx style="vbare">0&nbsp;1&nbsp;2&nbsp;3&nbsp;4&nbsp;5&nbsp;6&nbsp;7&nbsp;8</spanx></c>
2648*a58d3d2aSXin Li<c> 0</c>
2649*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
2650*a58d3d2aSXin Li<c> 1</c>
2651*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
2652*a58d3d2aSXin Li<c> 2</c>
2653*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
2654*a58d3d2aSXin Li<c> 3</c>
2655*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;A</spanx></c>
2656*a58d3d2aSXin Li<c> 4</c>
2657*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
2658*a58d3d2aSXin Li<c> 5</c>
2659*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
2660*a58d3d2aSXin Li<c> 6</c>
2661*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;A</spanx></c>
2662*a58d3d2aSXin Li<c> 7</c>
2663*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;A</spanx></c>
2664*a58d3d2aSXin Li<c> 8</c>
2665*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;B</spanx></c>
2666*a58d3d2aSXin Li<c> 9</c>
2667*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;B</spanx></c>
2668*a58d3d2aSXin Li<c>10</c>
2669*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
2670*a58d3d2aSXin Li<c>11</c>
2671*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
2672*a58d3d2aSXin Li<c>12</c>
2673*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
2674*a58d3d2aSXin Li<c>13</c>
2675*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
2676*a58d3d2aSXin Li<c>14</c>
2677*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B</spanx></c>
2678*a58d3d2aSXin Li<c>15</c>
2679*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
2680*a58d3d2aSXin Li<c>16</c>
2681*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
2682*a58d3d2aSXin Li<c>17</c>
2683*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B</spanx></c>
2684*a58d3d2aSXin Li<c>18</c>
2685*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
2686*a58d3d2aSXin Li<c>19</c>
2687*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
2688*a58d3d2aSXin Li<c>20</c>
2689*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
2690*a58d3d2aSXin Li<c>21</c>
2691*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;A</spanx></c>
2692*a58d3d2aSXin Li<c>22</c>
2693*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B</spanx></c>
2694*a58d3d2aSXin Li<c>23</c>
2695*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;B</spanx></c>
2696*a58d3d2aSXin Li<c>24</c>
2697*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B</spanx></c>
2698*a58d3d2aSXin Li<c>25</c>
2699*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
2700*a58d3d2aSXin Li<c>26</c>
2701*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
2702*a58d3d2aSXin Li<c>27</c>
2703*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
2704*a58d3d2aSXin Li<c>28</c>
2705*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
2706*a58d3d2aSXin Li<c>29</c>
2707*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;A&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
2708*a58d3d2aSXin Li<c>30</c>
2709*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A&nbsp;B</spanx></c>
2710*a58d3d2aSXin Li<c>31</c>
2711*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B</spanx></c>
2712*a58d3d2aSXin Li</texttable>
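<t>
For illustration, the NB/MB weight selection can be sketched in C as follows.
The array and function names are hypothetical, and the sketch assumes that one
 row of the selection table above is passed in as a nine-character string of
 'A' and 'B' with the spacing removed.
The two arrays are lists A and B from
 <xref target="silk_nlsf_pred_weights"/>.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Lists A and B of the prediction weight table (NB/MB). */
static const uint8_t pred_list_a[9] = {
    179, 138, 140, 148, 151, 149, 153, 151, 163
};
static const uint8_t pred_list_b[9] = {
    116,  67,  82,  59,  92,  72, 100,  89,  92
};

/* sel is one row of the NB/MB selection table without spaces,
   e.g. "ABAAAAAAA" for I1 = 0. */
static void select_pred_weights(uint8_t pred_q8[9], const char *sel)
{
    int k;

    for (k = 0; k < 9; k++) {
        assert(sel[k] == 'A' || sel[k] == 'B');
        pred_q8[k] = (sel[k] == 'A') ? pred_list_a[k]
                                     : pred_list_b[k];
    }
}
```
]]></artwork>
</figure>
<t>
For I1 = 0, the row "ABAAAAAAA" selects the weights
 {179, 67, 140, 148, 151, 149, 153, 151, 163}.
</t>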
2713*a58d3d2aSXin Li
2714*a58d3d2aSXin Li<texttable anchor="silk_nlsf_wb_weight_sel"
2715*a58d3d2aSXin Li title="Prediction Weight Selection for WB Normalized LSF Decoding">
2716*a58d3d2aSXin Li<ttcol>I1</ttcol>
2717*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
2718*a58d3d2aSXin Li<c/>
2719*a58d3d2aSXin Li<c><spanx style="vbare">0&nbsp;&nbsp;1&nbsp;&nbsp;2&nbsp;&nbsp;3&nbsp;&nbsp;4&nbsp;&nbsp;5&nbsp;&nbsp;6&nbsp;&nbsp;7&nbsp;&nbsp;8&nbsp;&nbsp;9&nbsp;10&nbsp;11&nbsp;12&nbsp;13&nbsp;14</spanx></c>
2720*a58d3d2aSXin Li<c> 0</c>
2721*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
2722*a58d3d2aSXin Li<c> 1</c>
2723*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2724*a58d3d2aSXin Li<c> 2</c>
2725*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2726*a58d3d2aSXin Li<c> 3</c>
2727*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2728*a58d3d2aSXin Li<c> 4</c>
2729*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
2730*a58d3d2aSXin Li<c> 5</c>
2731*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2732*a58d3d2aSXin Li<c> 6</c>
2733*a58d3d2aSXin Li<c><spanx style="vbare">D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
2734*a58d3d2aSXin Li<c> 7</c>
2735*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
2736*a58d3d2aSXin Li<c> 8</c>
2737*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
2738*a58d3d2aSXin Li<c> 9</c>
2739*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
2740*a58d3d2aSXin Li<c>10</c>
2741*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2742*a58d3d2aSXin Li<c>11</c>
2743*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2744*a58d3d2aSXin Li<c>12</c>
2745*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2746*a58d3d2aSXin Li<c>13</c>
2747*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2748*a58d3d2aSXin Li<c>14</c>
2749*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
2750*a58d3d2aSXin Li<c>15</c>
2751*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
2752*a58d3d2aSXin Li<c>16</c>
2753*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2754*a58d3d2aSXin Li<c>17</c>
2755*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2756*a58d3d2aSXin Li<c>18</c>
2757*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
2758*a58d3d2aSXin Li<c>19</c>
2759*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2760*a58d3d2aSXin Li<c>20</c>
2761*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2762*a58d3d2aSXin Li<c>21</c>
2763*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
2764*a58d3d2aSXin Li<c>22</c>
2765*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2766*a58d3d2aSXin Li<c>23</c>
2767*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
2768*a58d3d2aSXin Li<c>24</c>
2769*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
2770*a58d3d2aSXin Li<c>25</c>
2771*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
2772*a58d3d2aSXin Li<c>26</c>
2773*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
2774*a58d3d2aSXin Li<c>27</c>
2775*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
2776*a58d3d2aSXin Li<c>28</c>
2777*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
2778*a58d3d2aSXin Li<c>29</c>
2779*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
2780*a58d3d2aSXin Li<c>30</c>
2781*a58d3d2aSXin Li<c><spanx style="vbare">D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
2782*a58d3d2aSXin Li<c>31</c>
2783*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
2784*a58d3d2aSXin Li</texttable>
2785*a58d3d2aSXin Li
2786*a58d3d2aSXin Li</section>
2787*a58d3d2aSXin Li
2788*a58d3d2aSXin Li<section anchor="silk_nlsf_reconstruction"
2789*a58d3d2aSXin Li title="Reconstructing the Normalized LSF Coefficients">
<t>
Once the stage-1 index I1 and the stage-2 residual res_Q10[] have been decoded,
 the final normalized LSF coefficients can be reconstructed.
</t>
<t>
The spectral distortion introduced by the quantization of each LSF coefficient
 varies, so the stage-2 residual is weighted accordingly, using the
 low-complexity Inverse Harmonic Mean Weighting (IHMW) function proposed in
 <xref target="laroia-icassp"/>.
The weights are derived directly from the stage-1 codebook vector.
Let cb1_Q8[k] be the k'th entry of the stage-1 codebook vector from
 <xref target="silk_nlsf_nbmb_codebook"/> or
 <xref target="silk_nlsf_wb_codebook"/>.
Then, for 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d_LPC, the following expression
 computes the square of the weight as a Q18 value:
<figure align="center">
<artwork align="center">
<![CDATA[
w2_Q18[k] = (1024/(cb1_Q8[k] - cb1_Q8[k-1])
             + 1024/(cb1_Q8[k+1] - cb1_Q8[k])) << 16 ,
]]>
</artwork>
</figure>
 where cb1_Q8[-1]&nbsp;=&nbsp;0 and cb1_Q8[d_LPC]&nbsp;=&nbsp;256, and the
 division is integer division.
This is reduced to an unsquared, Q9 value using the following square-root
 approximation:
<figure align="center">
<artwork align="center"><![CDATA[
i = ilog(w2_Q18[k])
f = (w2_Q18[k]>>(i-8)) & 127
y = ((i&1) ? 32768 : 46214) >> ((32-i)>>1)
w_Q9[k] = y + ((213*f*y)>>16)
]]></artwork>
</figure>
The constant 46214 here is approximately the square root of 2 in Q15.
The cb1_Q8[] vector completely determines these weights, and they may be
 tabulated and stored as 13-bit unsigned values (with a range of 1819 to 5227,
 inclusive) to avoid computing them when decoding.
The reference implementation already requires code to compute these weights on
 unquantized coefficients in the encoder, in silk_NLSF_VQ_weights_laroia()
 (NLSF_VQ_weights_laroia.c) and its callers.
To reduce the amount of ROM required, it reuses that code in the decoder
 instead of using a pre-computed table.
</t>
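<t>
The two expressions above can be combined into a short C sketch, shown here
 for illustration only.
It follows the formulas in the text directly rather than the reference
 implementation's code path; the function name nlsf_weights() is hypothetical,
 and ilog(n) is taken to be the number of bits needed to represent n (0 for
 n&nbsp;=&nbsp;0).
The sketch also assumes that the codebook entries are strictly increasing, so
 that neither division has a zero divisor.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent n; 0 for n == 0. */
static int ilog(uint32_t n)
{
    int bits = 0;

    while (n != 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}

/* Compute w_Q9[k], 0 <= k < d_lpc, from a stage-1 codebook vector. */
static void nlsf_weights(int32_t w_q9[], const uint8_t cb1_q8[],
                         int d_lpc)
{
    int k;

    assert(d_lpc > 0);
    for (k = 0; k < d_lpc; k++) {
        /* Boundary values: cb1_Q8[-1] = 0, cb1_Q8[d_LPC] = 256. */
        int prev = (k > 0) ? cb1_q8[k - 1] : 0;
        int next = (k + 1 < d_lpc) ? cb1_q8[k + 1] : 256;
        uint32_t w2_q18;
        int32_t f, y;
        int i;

        /* Assumed: entries strictly increasing (no zero divisor). */
        assert(prev < cb1_q8[k] && cb1_q8[k] < next);
        w2_q18 = (uint32_t)(1024 / (cb1_q8[k] - prev)
                            + 1024 / (next - cb1_q8[k])) << 16;

        /* Square-root approximation, giving a Q9 result. */
        i = ilog(w2_q18);
        f = (int32_t)((w2_q18 >> (i - 8)) & 127);
        y = ((i & 1) ? 32768 : 46214) >> ((32 - i) >> 1);
        w_q9[k] = y + ((213 * f * y) >> 16);
    }
}
```
]]></artwork>
</figure>
<t>
For the NB/MB codebook vector with I1 = 0, this sketch gives
 w_Q9[0] = 2897, w_Q9[1] = 2314, and w_Q9[9] = 2287, all within the range
 stated above.
</t>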
2835*a58d3d2aSXin Li
2836*a58d3d2aSXin Li<texttable anchor="silk_nlsf_nbmb_codebook"
2837*a58d3d2aSXin Li           title="NB/MB Normalized LSF Stage-1 Codebook Vectors">
2838*a58d3d2aSXin Li<ttcol>I1</ttcol>
2839*a58d3d2aSXin Li<ttcol>Codebook (Q8)</ttcol>
2840*a58d3d2aSXin Li<c/>
2841*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;&nbsp;7&nbsp;&nbsp;&nbsp;8&nbsp;&nbsp;&nbsp;9</spanx></c>
2842*a58d3d2aSXin Li<c>0</c>
2843*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;&nbsp;35&nbsp;&nbsp;60&nbsp;&nbsp;83&nbsp;108&nbsp;132&nbsp;157&nbsp;180&nbsp;206&nbsp;228</spanx></c>
2844*a58d3d2aSXin Li<c>1</c>
2845*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;&nbsp;32&nbsp;&nbsp;55&nbsp;&nbsp;77&nbsp;101&nbsp;125&nbsp;151&nbsp;175&nbsp;201&nbsp;225</spanx></c>
2846*a58d3d2aSXin Li<c>2</c>
2847*a58d3d2aSXin Li<c><spanx style="vbare">19&nbsp;&nbsp;42&nbsp;&nbsp;66&nbsp;&nbsp;89&nbsp;114&nbsp;137&nbsp;162&nbsp;184&nbsp;209&nbsp;230</spanx></c>
2848*a58d3d2aSXin Li<c>3</c>
2849*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;&nbsp;25&nbsp;&nbsp;50&nbsp;&nbsp;72&nbsp;&nbsp;97&nbsp;120&nbsp;147&nbsp;172&nbsp;200&nbsp;223</spanx></c>
2850*a58d3d2aSXin Li<c>4</c>
2851*a58d3d2aSXin Li<c><spanx style="vbare">26&nbsp;&nbsp;44&nbsp;&nbsp;69&nbsp;&nbsp;90&nbsp;114&nbsp;135&nbsp;159&nbsp;180&nbsp;205&nbsp;225</spanx></c>
2852*a58d3d2aSXin Li<c>5</c>
2853*a58d3d2aSXin Li<c><spanx style="vbare">13&nbsp;&nbsp;22&nbsp;&nbsp;53&nbsp;&nbsp;80&nbsp;106&nbsp;130&nbsp;156&nbsp;180&nbsp;205&nbsp;228</spanx></c>
2854*a58d3d2aSXin Li<c>6</c>
2855*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;&nbsp;25&nbsp;&nbsp;44&nbsp;&nbsp;64&nbsp;&nbsp;90&nbsp;115&nbsp;142&nbsp;168&nbsp;196&nbsp;222</spanx></c>
2856*a58d3d2aSXin Li<c>7</c>
2857*a58d3d2aSXin Li<c><spanx style="vbare">19&nbsp;&nbsp;24&nbsp;&nbsp;62&nbsp;&nbsp;82&nbsp;100&nbsp;120&nbsp;145&nbsp;168&nbsp;190&nbsp;214</spanx></c>
2858*a58d3d2aSXin Li<c>8</c>
2859*a58d3d2aSXin Li<c><spanx style="vbare">22&nbsp;&nbsp;31&nbsp;&nbsp;50&nbsp;&nbsp;79&nbsp;103&nbsp;120&nbsp;151&nbsp;170&nbsp;203&nbsp;227</spanx></c>
2860*a58d3d2aSXin Li<c>9</c>
2861*a58d3d2aSXin Li<c><spanx style="vbare">21&nbsp;&nbsp;29&nbsp;&nbsp;45&nbsp;&nbsp;65&nbsp;106&nbsp;124&nbsp;150&nbsp;171&nbsp;196&nbsp;224</spanx></c>
2862*a58d3d2aSXin Li<c>10</c>
2863*a58d3d2aSXin Li<c><spanx style="vbare">30&nbsp;&nbsp;49&nbsp;&nbsp;75&nbsp;&nbsp;97&nbsp;121&nbsp;142&nbsp;165&nbsp;186&nbsp;209&nbsp;229</spanx></c>
2864*a58d3d2aSXin Li<c>11</c>
2865*a58d3d2aSXin Li<c><spanx style="vbare">19&nbsp;&nbsp;25&nbsp;&nbsp;52&nbsp;&nbsp;70&nbsp;&nbsp;93&nbsp;116&nbsp;143&nbsp;166&nbsp;192&nbsp;219</spanx></c>
2866*a58d3d2aSXin Li<c>12</c>
2867*a58d3d2aSXin Li<c><spanx style="vbare">26&nbsp;&nbsp;34&nbsp;&nbsp;62&nbsp;&nbsp;75&nbsp;&nbsp;97&nbsp;118&nbsp;145&nbsp;167&nbsp;194&nbsp;217</spanx></c>
2868*a58d3d2aSXin Li<c>13</c>
2869*a58d3d2aSXin Li<c><spanx style="vbare">25&nbsp;&nbsp;33&nbsp;&nbsp;56&nbsp;&nbsp;70&nbsp;&nbsp;91&nbsp;113&nbsp;143&nbsp;165&nbsp;196&nbsp;223</spanx></c>
2870*a58d3d2aSXin Li<c>14</c>
2871*a58d3d2aSXin Li<c><spanx style="vbare">21&nbsp;&nbsp;34&nbsp;&nbsp;51&nbsp;&nbsp;72&nbsp;&nbsp;97&nbsp;117&nbsp;145&nbsp;171&nbsp;196&nbsp;222</spanx></c>
2872*a58d3d2aSXin Li<c>15</c>
2873*a58d3d2aSXin Li<c><spanx style="vbare">20&nbsp;&nbsp;29&nbsp;&nbsp;50&nbsp;&nbsp;67&nbsp;&nbsp;90&nbsp;117&nbsp;144&nbsp;168&nbsp;197&nbsp;221</spanx></c>
2874*a58d3d2aSXin Li<c>16</c>
2875*a58d3d2aSXin Li<c><spanx style="vbare">22&nbsp;&nbsp;31&nbsp;&nbsp;48&nbsp;&nbsp;66&nbsp;&nbsp;95&nbsp;117&nbsp;146&nbsp;168&nbsp;196&nbsp;222</spanx></c>
2876*a58d3d2aSXin Li<c>17</c>
2877*a58d3d2aSXin Li<c><spanx style="vbare">24&nbsp;&nbsp;33&nbsp;&nbsp;51&nbsp;&nbsp;77&nbsp;116&nbsp;134&nbsp;158&nbsp;180&nbsp;200&nbsp;224</spanx></c>
2878*a58d3d2aSXin Li<c>18</c>
2879*a58d3d2aSXin Li<c><spanx style="vbare">21&nbsp;&nbsp;28&nbsp;&nbsp;70&nbsp;&nbsp;87&nbsp;106&nbsp;124&nbsp;149&nbsp;170&nbsp;194&nbsp;217</spanx></c>
2880*a58d3d2aSXin Li<c>19</c>
2881*a58d3d2aSXin Li<c><spanx style="vbare">26&nbsp;&nbsp;33&nbsp;&nbsp;53&nbsp;&nbsp;64&nbsp;&nbsp;83&nbsp;117&nbsp;152&nbsp;173&nbsp;204&nbsp;225</spanx></c>
2882*a58d3d2aSXin Li<c>20</c>
2883*a58d3d2aSXin Li<c><spanx style="vbare">27&nbsp;&nbsp;34&nbsp;&nbsp;65&nbsp;&nbsp;95&nbsp;108&nbsp;129&nbsp;155&nbsp;174&nbsp;210&nbsp;225</spanx></c>
2884*a58d3d2aSXin Li<c>21</c>
2885*a58d3d2aSXin Li<c><spanx style="vbare">20&nbsp;&nbsp;26&nbsp;&nbsp;72&nbsp;&nbsp;99&nbsp;113&nbsp;131&nbsp;154&nbsp;176&nbsp;200&nbsp;219</spanx></c>
2886*a58d3d2aSXin Li<c>22</c>
2887*a58d3d2aSXin Li<c><spanx style="vbare">34&nbsp;&nbsp;43&nbsp;&nbsp;61&nbsp;&nbsp;78&nbsp;&nbsp;93&nbsp;114&nbsp;155&nbsp;177&nbsp;205&nbsp;229</spanx></c>
2888*a58d3d2aSXin Li<c>23</c>
2889*a58d3d2aSXin Li<c><spanx style="vbare">23&nbsp;&nbsp;29&nbsp;&nbsp;54&nbsp;&nbsp;97&nbsp;124&nbsp;138&nbsp;163&nbsp;179&nbsp;209&nbsp;229</spanx></c>
2890*a58d3d2aSXin Li<c>24</c>
2891*a58d3d2aSXin Li<c><spanx style="vbare">30&nbsp;&nbsp;38&nbsp;&nbsp;56&nbsp;&nbsp;89&nbsp;118&nbsp;129&nbsp;158&nbsp;178&nbsp;200&nbsp;231</spanx></c>
2892*a58d3d2aSXin Li<c>25</c>
2893*a58d3d2aSXin Li<c><spanx style="vbare">21&nbsp;&nbsp;29&nbsp;&nbsp;49&nbsp;&nbsp;63&nbsp;&nbsp;85&nbsp;111&nbsp;142&nbsp;163&nbsp;193&nbsp;222</spanx></c>
2894*a58d3d2aSXin Li<c>26</c>
2895*a58d3d2aSXin Li<c><spanx style="vbare">27&nbsp;&nbsp;48&nbsp;&nbsp;77&nbsp;103&nbsp;133&nbsp;158&nbsp;179&nbsp;196&nbsp;215&nbsp;232</spanx></c>
2896*a58d3d2aSXin Li<c>27</c>
2897*a58d3d2aSXin Li<c><spanx style="vbare">29&nbsp;&nbsp;47&nbsp;&nbsp;74&nbsp;&nbsp;99&nbsp;124&nbsp;151&nbsp;176&nbsp;198&nbsp;220&nbsp;237</spanx></c>
2898*a58d3d2aSXin Li<c>28</c>
2899*a58d3d2aSXin Li<c><spanx style="vbare">33&nbsp;&nbsp;42&nbsp;&nbsp;61&nbsp;&nbsp;76&nbsp;&nbsp;93&nbsp;121&nbsp;155&nbsp;174&nbsp;207&nbsp;225</spanx></c>
2900*a58d3d2aSXin Li<c>29</c>
2901*a58d3d2aSXin Li<c><spanx style="vbare">29&nbsp;&nbsp;53&nbsp;&nbsp;87&nbsp;112&nbsp;136&nbsp;154&nbsp;170&nbsp;188&nbsp;208&nbsp;227</spanx></c>
2902*a58d3d2aSXin Li<c>30</c>
2903*a58d3d2aSXin Li<c><spanx style="vbare">24&nbsp;&nbsp;30&nbsp;&nbsp;52&nbsp;&nbsp;84&nbsp;131&nbsp;150&nbsp;166&nbsp;186&nbsp;203&nbsp;229</spanx></c>
2904*a58d3d2aSXin Li<c>31</c>
2905*a58d3d2aSXin Li<c><spanx style="vbare">37&nbsp;&nbsp;48&nbsp;&nbsp;64&nbsp;&nbsp;84&nbsp;104&nbsp;118&nbsp;156&nbsp;177&nbsp;201&nbsp;230</spanx></c>
2906*a58d3d2aSXin Li</texttable>
2907*a58d3d2aSXin Li
2908*a58d3d2aSXin Li<texttable anchor="silk_nlsf_wb_codebook"
2909*a58d3d2aSXin Li           title="WB Normalized LSF Stage-1 Codebook Vectors">
2910*a58d3d2aSXin Li<ttcol>I1</ttcol>
2911*a58d3d2aSXin Li<ttcol>Codebook (Q8)</ttcol>
2912*a58d3d2aSXin Li<c/>
2913*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;2&nbsp;&nbsp;3&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;&nbsp;7&nbsp;&nbsp;&nbsp;8&nbsp;&nbsp;&nbsp;9&nbsp;&nbsp;10&nbsp;&nbsp;11&nbsp;&nbsp;12&nbsp;&nbsp;13&nbsp;&nbsp;14&nbsp;&nbsp;15</spanx></c>
2914*a58d3d2aSXin Li<c>0</c>
2915*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;7&nbsp;23&nbsp;38&nbsp;54&nbsp;69&nbsp;&nbsp;85&nbsp;100&nbsp;116&nbsp;131&nbsp;147&nbsp;162&nbsp;178&nbsp;193&nbsp;208&nbsp;223&nbsp;239</spanx></c>
2916*a58d3d2aSXin Li<c>1</c>
2917*a58d3d2aSXin Li<c><spanx style="vbare">13&nbsp;25&nbsp;41&nbsp;55&nbsp;69&nbsp;&nbsp;83&nbsp;&nbsp;98&nbsp;112&nbsp;127&nbsp;142&nbsp;157&nbsp;171&nbsp;187&nbsp;203&nbsp;220&nbsp;236</spanx></c>
2918*a58d3d2aSXin Li<c>2</c>
2919*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;21&nbsp;34&nbsp;51&nbsp;61&nbsp;&nbsp;78&nbsp;&nbsp;92&nbsp;106&nbsp;126&nbsp;136&nbsp;152&nbsp;167&nbsp;185&nbsp;205&nbsp;225&nbsp;240</spanx></c>
2920*a58d3d2aSXin Li<c>3</c>
2921*a58d3d2aSXin Li<c><spanx style="vbare">10&nbsp;21&nbsp;36&nbsp;50&nbsp;63&nbsp;&nbsp;79&nbsp;&nbsp;95&nbsp;110&nbsp;126&nbsp;141&nbsp;157&nbsp;173&nbsp;189&nbsp;205&nbsp;221&nbsp;237</spanx></c>
2922*a58d3d2aSXin Li<c>4</c>
2923*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;20&nbsp;37&nbsp;51&nbsp;59&nbsp;&nbsp;78&nbsp;&nbsp;89&nbsp;107&nbsp;123&nbsp;134&nbsp;150&nbsp;164&nbsp;184&nbsp;205&nbsp;224&nbsp;240</spanx></c>
2924*a58d3d2aSXin Li<c>5</c>
2925*a58d3d2aSXin Li<c><spanx style="vbare">10&nbsp;15&nbsp;32&nbsp;51&nbsp;67&nbsp;&nbsp;81&nbsp;&nbsp;96&nbsp;112&nbsp;129&nbsp;142&nbsp;158&nbsp;173&nbsp;189&nbsp;204&nbsp;220&nbsp;236</spanx></c>
2926*a58d3d2aSXin Li<c>6</c>
2927*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;8&nbsp;21&nbsp;37&nbsp;51&nbsp;65&nbsp;&nbsp;79&nbsp;&nbsp;98&nbsp;113&nbsp;126&nbsp;138&nbsp;155&nbsp;168&nbsp;179&nbsp;192&nbsp;209&nbsp;218</spanx></c>
2928*a58d3d2aSXin Li<c>7</c>
2929*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;15&nbsp;34&nbsp;55&nbsp;63&nbsp;&nbsp;78&nbsp;&nbsp;87&nbsp;108&nbsp;118&nbsp;131&nbsp;148&nbsp;167&nbsp;185&nbsp;203&nbsp;219&nbsp;236</spanx></c>
2930*a58d3d2aSXin Li<c>8</c>
2931*a58d3d2aSXin Li<c><spanx style="vbare">16&nbsp;19&nbsp;32&nbsp;36&nbsp;56&nbsp;&nbsp;79&nbsp;&nbsp;91&nbsp;108&nbsp;118&nbsp;136&nbsp;154&nbsp;171&nbsp;186&nbsp;204&nbsp;220&nbsp;237</spanx></c>
2932*a58d3d2aSXin Li<c>9</c>
2933*a58d3d2aSXin Li<c><spanx style="vbare">11&nbsp;28&nbsp;43&nbsp;58&nbsp;74&nbsp;&nbsp;89&nbsp;105&nbsp;120&nbsp;135&nbsp;150&nbsp;165&nbsp;180&nbsp;196&nbsp;211&nbsp;226&nbsp;241</spanx></c>
2934*a58d3d2aSXin Li<c>10</c>
2935*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;6&nbsp;16&nbsp;33&nbsp;46&nbsp;60&nbsp;&nbsp;75&nbsp;&nbsp;92&nbsp;107&nbsp;123&nbsp;137&nbsp;156&nbsp;169&nbsp;185&nbsp;199&nbsp;214&nbsp;225</spanx></c>
2936*a58d3d2aSXin Li<c>11</c>
2937*a58d3d2aSXin Li<c><spanx style="vbare">11&nbsp;19&nbsp;30&nbsp;44&nbsp;57&nbsp;&nbsp;74&nbsp;&nbsp;89&nbsp;105&nbsp;121&nbsp;135&nbsp;152&nbsp;169&nbsp;186&nbsp;202&nbsp;218&nbsp;234</spanx></c>
2938*a58d3d2aSXin Li<c>12</c>
2939*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;19&nbsp;29&nbsp;46&nbsp;57&nbsp;&nbsp;71&nbsp;&nbsp;88&nbsp;100&nbsp;120&nbsp;132&nbsp;148&nbsp;165&nbsp;182&nbsp;199&nbsp;216&nbsp;233</spanx></c>
2940*a58d3d2aSXin Li<c>13</c>
2941*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;23&nbsp;35&nbsp;46&nbsp;56&nbsp;&nbsp;77&nbsp;&nbsp;92&nbsp;106&nbsp;123&nbsp;134&nbsp;152&nbsp;167&nbsp;185&nbsp;204&nbsp;222&nbsp;237</spanx></c>
2942*a58d3d2aSXin Li<c>14</c>
2943*a58d3d2aSXin Li<c><spanx style="vbare">14&nbsp;17&nbsp;45&nbsp;53&nbsp;63&nbsp;&nbsp;75&nbsp;&nbsp;89&nbsp;107&nbsp;115&nbsp;132&nbsp;151&nbsp;171&nbsp;188&nbsp;206&nbsp;221&nbsp;240</spanx></c>
2944*a58d3d2aSXin Li<c>15</c>
2945*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;9&nbsp;16&nbsp;29&nbsp;40&nbsp;56&nbsp;&nbsp;71&nbsp;&nbsp;88&nbsp;103&nbsp;119&nbsp;137&nbsp;154&nbsp;171&nbsp;189&nbsp;205&nbsp;222&nbsp;237</spanx></c>
2946*a58d3d2aSXin Li<c>16</c>
2947*a58d3d2aSXin Li<c><spanx style="vbare">16&nbsp;19&nbsp;36&nbsp;48&nbsp;57&nbsp;&nbsp;76&nbsp;&nbsp;87&nbsp;105&nbsp;118&nbsp;132&nbsp;150&nbsp;167&nbsp;185&nbsp;202&nbsp;218&nbsp;236</spanx></c>
2948*a58d3d2aSXin Li<c>17</c>
2949*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;17&nbsp;29&nbsp;54&nbsp;71&nbsp;&nbsp;81&nbsp;&nbsp;94&nbsp;104&nbsp;126&nbsp;136&nbsp;149&nbsp;164&nbsp;182&nbsp;201&nbsp;221&nbsp;237</spanx></c>
2950*a58d3d2aSXin Li<c>18</c>
2951*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;28&nbsp;47&nbsp;62&nbsp;79&nbsp;&nbsp;97&nbsp;115&nbsp;129&nbsp;142&nbsp;155&nbsp;168&nbsp;180&nbsp;194&nbsp;208&nbsp;223&nbsp;238</spanx></c>
2952*a58d3d2aSXin Li<c>19</c>
2953*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;8&nbsp;14&nbsp;30&nbsp;45&nbsp;62&nbsp;&nbsp;78&nbsp;&nbsp;94&nbsp;111&nbsp;127&nbsp;143&nbsp;159&nbsp;175&nbsp;192&nbsp;207&nbsp;223&nbsp;239</spanx></c>
2954*a58d3d2aSXin Li<c>20</c>
2955*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;30&nbsp;49&nbsp;62&nbsp;79&nbsp;&nbsp;92&nbsp;107&nbsp;119&nbsp;132&nbsp;145&nbsp;160&nbsp;174&nbsp;190&nbsp;204&nbsp;220&nbsp;235</spanx></c>
2956*a58d3d2aSXin Li<c>21</c>
2957*a58d3d2aSXin Li<c><spanx style="vbare">14&nbsp;19&nbsp;36&nbsp;45&nbsp;61&nbsp;&nbsp;76&nbsp;&nbsp;91&nbsp;108&nbsp;121&nbsp;138&nbsp;154&nbsp;172&nbsp;189&nbsp;205&nbsp;222&nbsp;238</spanx></c>
2958*a58d3d2aSXin Li<c>22</c>
2959*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;18&nbsp;31&nbsp;45&nbsp;60&nbsp;&nbsp;76&nbsp;&nbsp;91&nbsp;107&nbsp;123&nbsp;138&nbsp;154&nbsp;171&nbsp;187&nbsp;204&nbsp;221&nbsp;236</spanx></c>
2960*a58d3d2aSXin Li<c>23</c>
2961*a58d3d2aSXin Li<c><spanx style="vbare">13&nbsp;17&nbsp;31&nbsp;43&nbsp;53&nbsp;&nbsp;70&nbsp;&nbsp;83&nbsp;103&nbsp;114&nbsp;131&nbsp;149&nbsp;167&nbsp;185&nbsp;203&nbsp;220&nbsp;237</spanx></c>
2962*a58d3d2aSXin Li<c>24</c>
2963*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;22&nbsp;35&nbsp;42&nbsp;58&nbsp;&nbsp;78&nbsp;&nbsp;93&nbsp;110&nbsp;125&nbsp;139&nbsp;155&nbsp;170&nbsp;188&nbsp;206&nbsp;224&nbsp;240</spanx></c>
2964*a58d3d2aSXin Li<c>25</c>
2965*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;8&nbsp;15&nbsp;34&nbsp;50&nbsp;67&nbsp;&nbsp;83&nbsp;&nbsp;99&nbsp;115&nbsp;131&nbsp;146&nbsp;162&nbsp;178&nbsp;193&nbsp;209&nbsp;224&nbsp;239</spanx></c>
2966*a58d3d2aSXin Li<c>26</c>
2967*a58d3d2aSXin Li<c><spanx style="vbare">13&nbsp;16&nbsp;41&nbsp;66&nbsp;73&nbsp;&nbsp;86&nbsp;&nbsp;95&nbsp;111&nbsp;128&nbsp;137&nbsp;150&nbsp;163&nbsp;183&nbsp;206&nbsp;225&nbsp;241</spanx></c>
2968*a58d3d2aSXin Li<c>27</c>
2969*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;25&nbsp;37&nbsp;52&nbsp;63&nbsp;&nbsp;75&nbsp;&nbsp;92&nbsp;102&nbsp;119&nbsp;132&nbsp;144&nbsp;160&nbsp;175&nbsp;191&nbsp;212&nbsp;231</spanx></c>
2970*a58d3d2aSXin Li<c>28</c>
2971*a58d3d2aSXin Li<c><spanx style="vbare">19&nbsp;31&nbsp;49&nbsp;65&nbsp;83&nbsp;100&nbsp;117&nbsp;133&nbsp;147&nbsp;161&nbsp;174&nbsp;187&nbsp;200&nbsp;213&nbsp;227&nbsp;242</spanx></c>
2972*a58d3d2aSXin Li<c>29</c>
2973*a58d3d2aSXin Li<c><spanx style="vbare">18&nbsp;31&nbsp;52&nbsp;68&nbsp;88&nbsp;103&nbsp;117&nbsp;126&nbsp;138&nbsp;149&nbsp;163&nbsp;177&nbsp;192&nbsp;207&nbsp;223&nbsp;239</spanx></c>
2974*a58d3d2aSXin Li<c>30</c>
2975*a58d3d2aSXin Li<c><spanx style="vbare">16&nbsp;29&nbsp;47&nbsp;61&nbsp;76&nbsp;&nbsp;90&nbsp;106&nbsp;119&nbsp;133&nbsp;147&nbsp;161&nbsp;176&nbsp;193&nbsp;209&nbsp;224&nbsp;240</spanx></c>
2976*a58d3d2aSXin Li<c>31</c>
2977*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;21&nbsp;35&nbsp;50&nbsp;61&nbsp;&nbsp;73&nbsp;&nbsp;86&nbsp;&nbsp;97&nbsp;110&nbsp;119&nbsp;129&nbsp;141&nbsp;175&nbsp;198&nbsp;218&nbsp;237</spanx></c>
2978*a58d3d2aSXin Li</texttable>
2979*a58d3d2aSXin Li
2980*a58d3d2aSXin Li<t>
2981*a58d3d2aSXin LiGiven the stage-1 codebook entry cb1_Q8[], the stage-2 residual res_Q10[], and
2982*a58d3d2aSXin Li their corresponding weights, w_Q9[], the reconstructed normalized LSF
2983*a58d3d2aSXin Li coefficients are
2984*a58d3d2aSXin Li<figure align="center">
2985*a58d3d2aSXin Li<artwork align="center"><![CDATA[
2986*a58d3d2aSXin LiNLSF_Q15[k] = clamp(0,
2987*a58d3d2aSXin Li               (cb1_Q8[k]<<7) + (res_Q10[k]<<14)/w_Q9[k], 32767) ,
2988*a58d3d2aSXin Li]]></artwork>
2989*a58d3d2aSXin Li</figure>
2990*a58d3d2aSXin Li where the division is integer division.
2991*a58d3d2aSXin LiHowever, nothing in either the reconstruction process or the
2992*a58d3d2aSXin Li quantization process in the encoder thus far guarantees that the coefficients
2993*a58d3d2aSXin Li are monotonically increasing and separated well enough to ensure a stable
2994*a58d3d2aSXin Li filter <xref target="Kabal86"/>.
2995*a58d3d2aSXin LiWhen using the reference encoder, roughly 2% of frames violate this constraint.
2996*a58d3d2aSXin LiThe next section describes a stabilization procedure used to make these
2997*a58d3d2aSXin Li guarantees.
2998*a58d3d2aSXin Li</t>
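<t>
As a rough illustration of the reconstruction formula above, the following C
 sketch computes a single coefficient.
The names clamp32() and nlsf_reconstruct() are hypothetical and are not taken
 from the reference implementation.
The sketch assumes that "integer division" means C's truncating division, and
 it multiplies by powers of two instead of left-shifting so that a negative
 residual does not invoke undefined behavior.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp x to the range [lo, hi]. */
static int32_t clamp32(int32_t lo, int32_t x, int32_t hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Reconstruct one normalized LSF coefficient (Q15) from its stage-1
   codebook entry (Q8), stage-2 residual (Q10), and weight (Q9). */
static int32_t nlsf_reconstruct(int32_t cb1_Q8, int32_t res_Q10,
                                int32_t w_Q9)
{
    int32_t v;

    assert(w_Q9 > 0);
    /* cb1_Q8*128 is cb1_Q8<<7; res_Q10*16384 is res_Q10<<14.
       C division truncates toward zero (an assumption of this sketch). */
    v = cb1_Q8 * 128 + (res_Q10 * 16384) / w_Q9;
    return clamp32(0, v, 32767);
}
```
]]></artwork>
</figure>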
2999*a58d3d2aSXin Li
3000*a58d3d2aSXin Li</section>
3001*a58d3d2aSXin Li
3002*a58d3d2aSXin Li<section anchor="silk_nlsf_stabilization" title="Normalized LSF Stabilization">
3003*a58d3d2aSXin Li<t>
3004*a58d3d2aSXin LiThe normalized LSF stabilization procedure is implemented in
3005*a58d3d2aSXin Li silk_NLSF_stabilize() (NLSF_stabilize.c).
3006*a58d3d2aSXin LiThis process ensures that consecutive values of the normalized LSF
3007*a58d3d2aSXin Li coefficients, NLSF_Q15[], are spaced some minimum distance apart
3008*a58d3d2aSXin Li (predetermined to be the 0.01 percentile of a large training set).
<xref target="silk_nlsf_min_spacing"/> gives the minimum spacings for NB and MB
 and those for WB, where row k is the minimum allowed value of
 NLSF_Q15[k]-NLSF_Q15[k-1].
3012*a58d3d2aSXin LiFor the purposes of computing this spacing for the first and last coefficient,
3013*a58d3d2aSXin Li NLSF_Q15[-1] is taken to be 0, and NLSF_Q15[d_LPC] is taken to be 32768.
3014*a58d3d2aSXin Li</t>
3015*a58d3d2aSXin Li
3016*a58d3d2aSXin Li<texttable anchor="silk_nlsf_min_spacing"
3017*a58d3d2aSXin Li           title="Minimum Spacing for Normalized LSF Coefficients">
3018*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
3019*a58d3d2aSXin Li<ttcol align="right">NB and MB</ttcol>
3020*a58d3d2aSXin Li<ttcol align="right">WB</ttcol>
3021*a58d3d2aSXin Li <c>0</c> <c>250</c> <c>100</c>
3022*a58d3d2aSXin Li <c>1</c>   <c>3</c>   <c>3</c>
3023*a58d3d2aSXin Li <c>2</c>   <c>6</c>  <c>40</c>
3024*a58d3d2aSXin Li <c>3</c>   <c>3</c>   <c>3</c>
3025*a58d3d2aSXin Li <c>4</c>   <c>3</c>   <c>3</c>
3026*a58d3d2aSXin Li <c>5</c>   <c>3</c>   <c>3</c>
3027*a58d3d2aSXin Li <c>6</c>   <c>4</c>   <c>5</c>
3028*a58d3d2aSXin Li <c>7</c>   <c>3</c>  <c>14</c>
3029*a58d3d2aSXin Li <c>8</c>   <c>3</c>  <c>14</c>
3030*a58d3d2aSXin Li <c>9</c>   <c>3</c>  <c>10</c>
3031*a58d3d2aSXin Li<c>10</c> <c>461</c>  <c>11</c>
3032*a58d3d2aSXin Li<c>11</c>       <c/>   <c>3</c>
3033*a58d3d2aSXin Li<c>12</c>       <c/>   <c>8</c>
3034*a58d3d2aSXin Li<c>13</c>       <c/>   <c>9</c>
3035*a58d3d2aSXin Li<c>14</c>       <c/>   <c>7</c>
3036*a58d3d2aSXin Li<c>15</c>       <c/>   <c>3</c>
3037*a58d3d2aSXin Li<c>16</c>       <c/> <c>347</c>
3038*a58d3d2aSXin Li</texttable>
3039*a58d3d2aSXin Li
3040*a58d3d2aSXin Li<t>
3041*a58d3d2aSXin LiThe procedure starts off by trying to make small adjustments which attempt to
3042*a58d3d2aSXin Li minimize the amount of distortion introduced.
3043*a58d3d2aSXin LiAfter 20 such adjustments, it falls back to a more direct method which
3044*a58d3d2aSXin Li guarantees the constraints are enforced but may require large adjustments.
3045*a58d3d2aSXin Li</t>
3046*a58d3d2aSXin Li<t>
3047*a58d3d2aSXin LiLet NDeltaMin_Q15[k] be the minimum required spacing for the current audio
3048*a58d3d2aSXin Li bandwidth from <xref target="silk_nlsf_min_spacing"/>.
3049*a58d3d2aSXin LiFirst, the procedure finds the index i where
3050*a58d3d2aSXin Li NLSF_Q15[i]&nbsp;-&nbsp;NLSF_Q15[i-1]&nbsp;-&nbsp;NDeltaMin_Q15[i] is the
3051*a58d3d2aSXin Li smallest, breaking ties by using the lower value of i.
3052*a58d3d2aSXin LiIf this value is non-negative, then the stabilization stops; the coefficients
3053*a58d3d2aSXin Li satisfy all the constraints.
3054*a58d3d2aSXin LiOtherwise, if i&nbsp;==&nbsp;0, it sets NLSF_Q15[0] to NDeltaMin_Q15[0], and if
3055*a58d3d2aSXin Li i&nbsp;==&nbsp;d_LPC, it sets NLSF_Q15[d_LPC-1] to
3056*a58d3d2aSXin Li (32768&nbsp;-&nbsp;NDeltaMin_Q15[d_LPC]).
3057*a58d3d2aSXin LiFor all other values of i, both NLSF_Q15[i-1] and NLSF_Q15[i] are updated as
3058*a58d3d2aSXin Li follows:
3059*a58d3d2aSXin Li<figure align="center">
3060*a58d3d2aSXin Li<artwork align="center"><![CDATA[
                                          i-1
                                          __
 min_center_Q15 = (NDeltaMin_Q15[i]>>1) + \  NDeltaMin_Q15[k]
                                          /_
                                          k=0
                                                 d_LPC
                                                  __
 max_center_Q15 = 32768 - (NDeltaMin_Q15[i]>>1) - \  NDeltaMin_Q15[k]
                                                  /_
                                                 k=i+1
center_freq_Q15 = clamp(min_center_Q15,
                        (NLSF_Q15[i-1] + NLSF_Q15[i] + 1)>>1,
                        max_center_Q15)

 NLSF_Q15[i-1] = center_freq_Q15 - (NDeltaMin_Q15[i]>>1)

   NLSF_Q15[i] = NLSF_Q15[i-1] + NDeltaMin_Q15[i] .
3078*a58d3d2aSXin Li]]></artwork>
3079*a58d3d2aSXin Li</figure>
The procedure then repeats until it has either executed 20 times or has
 stopped because the coefficients satisfy all the constraints.
3082*a58d3d2aSXin Li</t>
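<t>
One pass of the iterative adjustment described above might be sketched in C as
 follows.
The function name nlsf_stabilize_step() and its signature are hypothetical;
 this is not the code of silk_NLSF_stabilize().
The sketch stores the Q15 values in 32-bit integers for headroom and assumes
 the input coefficients are non-negative, as they are after the clamped
 reconstruction.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* One pass of the iterative adjustment (illustrative sketch).
   nlsf_Q15 holds d_lpc coefficients; ndelta_min_Q15 holds d_lpc+1
   minimum spacings.  Returns 1 if every constraint already holds,
   or 0 if a coefficient (or pair) was adjusted. */
static int nlsf_stabilize_step(int32_t *nlsf_Q15,
                               const int32_t *ndelta_min_Q15, int d_lpc)
{
    int32_t min_diff = 0, half, min_center, max_center, center;
    int i = 0, k;

    assert(d_lpc >= 2);
    /* Find the smallest slack; strict '<' keeps the lowest index on
       ties.  NLSF_Q15[-1] is taken as 0, NLSF_Q15[d_LPC] as 32768. */
    for (k = 0; k <= d_lpc; k++) {
        int32_t lo = (k == 0) ? 0 : nlsf_Q15[k - 1];
        int32_t hi = (k == d_lpc) ? 32768 : nlsf_Q15[k];
        int32_t diff = hi - lo - ndelta_min_Q15[k];
        if (k == 0 || diff < min_diff) {
            min_diff = diff;
            i = k;
        }
    }
    if (min_diff >= 0)
        return 1;
    if (i == 0) {
        nlsf_Q15[0] = ndelta_min_Q15[0];
        return 0;
    }
    if (i == d_lpc) {
        nlsf_Q15[d_lpc - 1] = 32768 - ndelta_min_Q15[d_lpc];
        return 0;
    }
    /* Interior case: re-center the pair (i-1, i). */
    half = ndelta_min_Q15[i] >> 1;
    min_center = half;
    for (k = 0; k < i; k++)
        min_center += ndelta_min_Q15[k];
    max_center = 32768 - half;
    for (k = i + 1; k <= d_lpc; k++)
        max_center -= ndelta_min_Q15[k];
    center = (nlsf_Q15[i - 1] + nlsf_Q15[i] + 1) >> 1;
    if (center < min_center)
        center = min_center;
    if (center > max_center)
        center = max_center;
    nlsf_Q15[i - 1] = center - half;
    nlsf_Q15[i] = nlsf_Q15[i - 1] + ndelta_min_Q15[i];
    return 0;
}
```
]]></artwork>
</figure>
<t>
A caller would invoke this step up to 20 times, stopping early as soon as it
 returns 1.
</t>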
3083*a58d3d2aSXin Li<t>
3084*a58d3d2aSXin LiAfter the 20th repetition of the above procedure, the following fallback
3085*a58d3d2aSXin Li procedure executes once.
3086*a58d3d2aSXin LiFirst, the values of NLSF_Q15[k] for 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d_LPC
3087*a58d3d2aSXin Li are sorted in ascending order.
3088*a58d3d2aSXin LiThen for each value of k from 0 to d_LPC-1, NLSF_Q15[k] is set to
3089*a58d3d2aSXin Li<figure align="center">
3090*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3091*a58d3d2aSXin Limax(NLSF_Q15[k], NLSF_Q15[k-1] + NDeltaMin_Q15[k]) .
3092*a58d3d2aSXin Li]]></artwork>
3093*a58d3d2aSXin Li</figure>
3094*a58d3d2aSXin LiNext, for each value of k from d_LPC-1 down to 0, NLSF_Q15[k] is set to
3095*a58d3d2aSXin Li<figure align="center">
3096*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3097*a58d3d2aSXin Limin(NLSF_Q15[k], NLSF_Q15[k+1] - NDeltaMin_Q15[k+1]) .
3098*a58d3d2aSXin Li]]></artwork>
3099*a58d3d2aSXin Li</figure>
3100*a58d3d2aSXin Li</t>
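<t>
The fallback procedure above can be sketched in C as shown below.
The function name nlsf_stabilize_fallback() is hypothetical, and the
 insertion sort is simply one convenient way to obtain ascending order; the
 text does not mandate a particular sorting method.
Values are held in 32-bit integers so that the forward pass cannot overflow.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Fallback stabilization (illustrative sketch).
   nlsf_Q15 holds d_lpc coefficients; ndelta_min_Q15 holds d_lpc+1
   minimum spacings. */
static void nlsf_stabilize_fallback(int32_t *nlsf_Q15,
                                    const int32_t *ndelta_min_Q15,
                                    int d_lpc)
{
    int j, k;

    assert(d_lpc >= 1);
    /* Sort into ascending order (insertion sort). */
    for (k = 1; k < d_lpc; k++) {
        int32_t v = nlsf_Q15[k];
        for (j = k - 1; j >= 0 && nlsf_Q15[j] > v; j--)
            nlsf_Q15[j + 1] = nlsf_Q15[j];
        nlsf_Q15[j + 1] = v;
    }
    /* Forward pass: push each value up to its minimum distance from
       the previous one, with NLSF_Q15[-1] taken as 0. */
    for (k = 0; k < d_lpc; k++) {
        int32_t prev = (k == 0) ? 0 : nlsf_Q15[k - 1];
        int32_t lower = prev + ndelta_min_Q15[k];
        if (nlsf_Q15[k] < lower)
            nlsf_Q15[k] = lower;
    }
    /* Backward pass: pull each value down to its minimum distance
       from the next one, with NLSF_Q15[d_LPC] taken as 32768. */
    for (k = d_lpc - 1; k >= 0; k--) {
        int32_t next = (k == d_lpc - 1) ? 32768 : nlsf_Q15[k + 1];
        int32_t upper = next - ndelta_min_Q15[k + 1];
        if (nlsf_Q15[k] > upper)
            nlsf_Q15[k] = upper;
    }
}
```
]]></artwork>
</figure>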
3101*a58d3d2aSXin Li
3102*a58d3d2aSXin Li</section>
3103*a58d3d2aSXin Li
3104*a58d3d2aSXin Li<section anchor="silk_nlsf_interpolation" title="Normalized LSF Interpolation">
3105*a58d3d2aSXin Li<t>
3106*a58d3d2aSXin LiFor 20&nbsp;ms SILK frames, the first half of the frame (i.e., the first two
3107*a58d3d2aSXin Li subframes) may use normalized LSF coefficients that are interpolated between
3108*a58d3d2aSXin Li the decoded LSFs for the most recent coded frame (in the same channel) and the
3109*a58d3d2aSXin Li current frame.
A Q2 interpolation factor, decoded using the PDF in
 <xref target="silk_nlsf_interp_pdf"/>, follows the LSF coefficient indices in
 the bitstream.
3112*a58d3d2aSXin LiThis happens in silk_decode_indices() (decode_indices.c).
3113*a58d3d2aSXin LiAfter either
3114*a58d3d2aSXin Li<list style="symbols">
3115*a58d3d2aSXin Li<t>An uncoded regular SILK frame in the side channel, or</t>
3116*a58d3d2aSXin Li<t>A decoder reset (see <xref target="decoder-reset"/>),</t>
3117*a58d3d2aSXin Li</list>
3118*a58d3d2aSXin Li the decoder still decodes this factor, but ignores its value and always uses
3119*a58d3d2aSXin Li 4 instead.
3120*a58d3d2aSXin LiFor 10&nbsp;ms SILK frames, this factor is not stored at all.
3121*a58d3d2aSXin Li</t>
3122*a58d3d2aSXin Li
3123*a58d3d2aSXin Li<texttable anchor="silk_nlsf_interp_pdf"
3124*a58d3d2aSXin Li           title="PDF for Normalized LSF Interpolation Index">
3125*a58d3d2aSXin Li<ttcol>PDF</ttcol>
3126*a58d3d2aSXin Li<c>{13, 22, 29, 11, 181}/256</c>
3127*a58d3d2aSXin Li</texttable>
3128*a58d3d2aSXin Li
3129*a58d3d2aSXin Li<t>
3130*a58d3d2aSXin LiLet n2_Q15[k] be the normalized LSF coefficients decoded by the procedure in
3131*a58d3d2aSXin Li <xref target="silk_nlsfs"/>, n0_Q15[k] be the LSF coefficients
3132*a58d3d2aSXin Li decoded for the prior frame, and w_Q2 be the interpolation factor.
3133*a58d3d2aSXin LiThen the normalized LSF coefficients used for the first half of a 20&nbsp;ms
3134*a58d3d2aSXin Li frame, n1_Q15[k], are
3135*a58d3d2aSXin Li<figure align="center">
3136*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3137*a58d3d2aSXin Lin1_Q15[k] = n0_Q15[k] + (w_Q2*(n2_Q15[k] - n0_Q15[k]) >> 2) .
3138*a58d3d2aSXin Li]]></artwork>
3139*a58d3d2aSXin Li</figure>
3140*a58d3d2aSXin LiThis interpolation is performed in silk_decode_parameters()
3141*a58d3d2aSXin Li (decode_parameters.c).
3142*a58d3d2aSXin Li</t>
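<t>
The interpolation formula above could be written in C roughly as follows.
The function name nlsf_interpolate() is hypothetical.
The sketch assumes that right-shifting a negative value is an arithmetic
 shift, which the C standard leaves implementation-defined but which common
 compilers provide.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Interpolate between the prior frame's coefficients (n0) and the
   current frame's coefficients (n2) using the Q2 factor w_Q2
   (illustrative sketch).  A factor of 4 reproduces n2 exactly. */
static void nlsf_interpolate(int32_t *n1_Q15, const int32_t *n0_Q15,
                             const int32_t *n2_Q15, int w_Q2, int d_lpc)
{
    int k;

    assert(w_Q2 >= 0 && w_Q2 <= 4);
    for (k = 0; k < d_lpc; k++) {
        /* Assumes an arithmetic right shift for negative products. */
        n1_Q15[k] = n0_Q15[k] + ((w_Q2 * (n2_Q15[k] - n0_Q15[k])) >> 2);
    }
}
```
]]></artwork>
</figure>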
3143*a58d3d2aSXin Li</section>
3144*a58d3d2aSXin Li
3145*a58d3d2aSXin Li<section anchor="silk_nlsf2lpc"
3146*a58d3d2aSXin Li title="Converting Normalized LSFs to LPC Coefficients">
3147*a58d3d2aSXin Li<t>
3148*a58d3d2aSXin LiAny LPC filter A(z) can be split into a symmetric part P(z) and an
3149*a58d3d2aSXin Li anti-symmetric part Q(z) such that
3150*a58d3d2aSXin Li<figure align="center">
3151*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3152*a58d3d2aSXin Li          d_LPC
3153*a58d3d2aSXin Li           __         -k   1
3154*a58d3d2aSXin LiA(z) = 1 - \  a[k] * z   = - * (P(z) + Q(z))
3155*a58d3d2aSXin Li           /_              2
3156*a58d3d2aSXin Li           k=1
3157*a58d3d2aSXin Li]]></artwork>
3158*a58d3d2aSXin Li</figure>
3159*a58d3d2aSXin Liwith
3160*a58d3d2aSXin Li<figure align="center">
3161*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3162*a58d3d2aSXin Li               -d_LPC-1      -1
3163*a58d3d2aSXin LiP(z) = A(z) + z         * A(z  )
3164*a58d3d2aSXin Li
3165*a58d3d2aSXin Li               -d_LPC-1      -1
3166*a58d3d2aSXin LiQ(z) = A(z) - z         * A(z  ) .
3167*a58d3d2aSXin Li]]></artwork>
3168*a58d3d2aSXin Li</figure>
3169*a58d3d2aSXin LiThe even normalized LSF coefficients correspond to a pair of conjugate roots of
3170*a58d3d2aSXin Li P(z), while the odd coefficients correspond to a pair of conjugate roots of
3171*a58d3d2aSXin Li Q(z), all of which lie on the unit circle.
3172*a58d3d2aSXin LiIn addition, P(z) has a root at pi and Q(z) has a root at 0.
3173*a58d3d2aSXin LiThus, they may be reconstructed mathematically from a set of normalized LSF
3174*a58d3d2aSXin Li coefficients, n[k], as
3175*a58d3d2aSXin Li<figure align="center">
3176*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3177*a58d3d2aSXin Li                 d_LPC/2-1
3178*a58d3d2aSXin Li             -1     ___                        -1    -2
3179*a58d3d2aSXin LiP(z) = (1 + z  ) *  | |  (1 - 2*cos(pi*n[2*k])*z  + z  )
3180*a58d3d2aSXin Li                    k=0
3181*a58d3d2aSXin Li
3182*a58d3d2aSXin Li                 d_LPC/2-1
3183*a58d3d2aSXin Li             -1     ___                          -1    -2
3184*a58d3d2aSXin LiQ(z) = (1 - z  ) *  | |  (1 - 2*cos(pi*n[2*k+1])*z  + z  )
3185*a58d3d2aSXin Li                    k=0
3186*a58d3d2aSXin Li]]></artwork>
3187*a58d3d2aSXin Li</figure>
3188*a58d3d2aSXin Li</t>
3189*a58d3d2aSXin Li<t>
3190*a58d3d2aSXin LiHowever, SILK performs this reconstruction using a fixed-point approximation so
3191*a58d3d2aSXin Li that all decoders can reproduce it in a bit-exact manner to avoid prediction
3192*a58d3d2aSXin Li drift.
3193*a58d3d2aSXin LiThe function silk_NLSF2A() (NLSF2A.c) implements this procedure.
3194*a58d3d2aSXin Li</t>
3195*a58d3d2aSXin Li<t>
3196*a58d3d2aSXin LiTo start, it approximates cos(pi*n[k]) using a table lookup with linear
3197*a58d3d2aSXin Li interpolation.
3198*a58d3d2aSXin LiThe encoder SHOULD use the inverse of this piecewise linear approximation,
3199*a58d3d2aSXin Li rather than the true inverse of the cosine function, when deriving the
3200*a58d3d2aSXin Li normalized LSF coefficients.
3201*a58d3d2aSXin LiThese values are also re-ordered to improve numerical accuracy when
3202*a58d3d2aSXin Li constructing the LPC polynomials.
3203*a58d3d2aSXin Li</t>
3204*a58d3d2aSXin Li
3205*a58d3d2aSXin Li<texttable anchor="silk_nlsf_orderings"
3206*a58d3d2aSXin Li           title="LSF Ordering for Polynomial Evaluation">
3207*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
3208*a58d3d2aSXin Li<ttcol align="right">NB and MB</ttcol>
3209*a58d3d2aSXin Li<ttcol align="right">WB</ttcol>
3210*a58d3d2aSXin Li <c>0</c>  <c>0</c>  <c>0</c>
3211*a58d3d2aSXin Li <c>1</c>  <c>9</c> <c>15</c>
3212*a58d3d2aSXin Li <c>2</c>  <c>6</c>  <c>8</c>
3213*a58d3d2aSXin Li <c>3</c>  <c>3</c>  <c>7</c>
3214*a58d3d2aSXin Li <c>4</c>  <c>4</c>  <c>4</c>
3215*a58d3d2aSXin Li <c>5</c>  <c>5</c> <c>11</c>
3216*a58d3d2aSXin Li <c>6</c>  <c>8</c> <c>12</c>
3217*a58d3d2aSXin Li <c>7</c>  <c>1</c>  <c>3</c>
3218*a58d3d2aSXin Li <c>8</c>  <c>2</c>  <c>2</c>
3219*a58d3d2aSXin Li <c>9</c>  <c>7</c> <c>13</c>
3220*a58d3d2aSXin Li<c>10</c>      <c/> <c>10</c>
3221*a58d3d2aSXin Li<c>11</c>      <c/>  <c>5</c>
3222*a58d3d2aSXin Li<c>12</c>      <c/>  <c>6</c>
3223*a58d3d2aSXin Li<c>13</c>      <c/>  <c>9</c>
3224*a58d3d2aSXin Li<c>14</c>      <c/> <c>14</c>
3225*a58d3d2aSXin Li<c>15</c>      <c/>  <c>1</c>
3226*a58d3d2aSXin Li</texttable>
3227*a58d3d2aSXin Li
3228*a58d3d2aSXin Li<t>
3229*a58d3d2aSXin LiThe top 7 bits of each normalized LSF coefficient index a value in the table,
3230*a58d3d2aSXin Li and the next 8 bits interpolate between it and the next value.
3231*a58d3d2aSXin LiLet i&nbsp;=&nbsp;(n[k]&nbsp;&gt;&gt;&nbsp;8) be the integer index and
3232*a58d3d2aSXin Li f&nbsp;=&nbsp;(n[k]&nbsp;&amp;&nbsp;255) be the fractional part of a given
3233*a58d3d2aSXin Li coefficient.
3234*a58d3d2aSXin LiThen the re-ordered, approximated cosine, c_Q17[ordering[k]], is
3235*a58d3d2aSXin Li<figure align="center">
3236*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3237*a58d3d2aSXin Lic_Q17[ordering[k]] = (cos_Q12[i]*256
3238*a58d3d2aSXin Li                      + (cos_Q12[i+1]-cos_Q12[i])*f + 4) >> 3 ,
3239*a58d3d2aSXin Li]]></artwork>
3240*a58d3d2aSXin Li</figure>
3241*a58d3d2aSXin Li where ordering[k] is the k'th entry of the column of
3242*a58d3d2aSXin Li <xref target="silk_nlsf_orderings"/> corresponding to the current audio
3243*a58d3d2aSXin Li bandwidth and cos_Q12[i] is the i'th entry of <xref target="silk_cos_table"/>.
3244*a58d3d2aSXin Li</t>
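<t>
A minimal C sketch of the table lookup with linear interpolation described
 above follows.
The function name nlsf_to_cos() is hypothetical; the cosine table and the
 ordering column are passed in by the caller rather than embedded, purely to
 keep the example short.
The sketch assumes an arithmetic right shift for negative values.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Compute the re-ordered, approximated cosines (illustrative sketch).
   cos_Q12 must point at the 129-entry Q12 cosine table (or at least
   at enough entries to cover index (n_Q15[k]>>8)+1), and ordering at
   the column of the ordering table for the current bandwidth. */
static void nlsf_to_cos(int32_t *c_Q17, const int32_t *n_Q15,
                        const int *ordering, const int16_t *cos_Q12,
                        int d_lpc)
{
    int k;

    for (k = 0; k < d_lpc; k++) {
        int32_t i, f, v;

        assert(n_Q15[k] >= 0 && n_Q15[k] <= 32767);
        i = n_Q15[k] >> 8;   /* top 7 bits: table index */
        f = n_Q15[k] & 255;  /* next 8 bits: fraction   */
        v = cos_Q12[i] * 256 + (cos_Q12[i + 1] - cos_Q12[i]) * f + 4;
        /* Assumes an arithmetic right shift when v is negative. */
        c_Q17[ordering[k]] = v >> 3;
    }
}
```
]]></artwork>
</figure>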
3245*a58d3d2aSXin Li
3246*a58d3d2aSXin Li<texttable anchor="silk_cos_table"
3247*a58d3d2aSXin Li           title="Q12 Cosine Table for LSF Conversion">
3248*a58d3d2aSXin Li<ttcol align="right">i</ttcol>
3249*a58d3d2aSXin Li<ttcol align="right">+0</ttcol>
3250*a58d3d2aSXin Li<ttcol align="right">+1</ttcol>
3251*a58d3d2aSXin Li<ttcol align="right">+2</ttcol>
3252*a58d3d2aSXin Li<ttcol align="right">+3</ttcol>
3253*a58d3d2aSXin Li<c>0</c>
3254*a58d3d2aSXin Li <c>4096</c> <c>4095</c> <c>4091</c> <c>4085</c>
3255*a58d3d2aSXin Li<c>4</c>
3256*a58d3d2aSXin Li <c>4076</c> <c>4065</c> <c>4052</c> <c>4036</c>
3257*a58d3d2aSXin Li<c>8</c>
3258*a58d3d2aSXin Li <c>4017</c> <c>3997</c> <c>3973</c> <c>3948</c>
3259*a58d3d2aSXin Li<c>12</c>
3260*a58d3d2aSXin Li <c>3920</c> <c>3889</c> <c>3857</c> <c>3822</c>
3261*a58d3d2aSXin Li<c>16</c>
3262*a58d3d2aSXin Li <c>3784</c> <c>3745</c> <c>3703</c> <c>3659</c>
3263*a58d3d2aSXin Li<c>20</c>
3264*a58d3d2aSXin Li <c>3613</c> <c>3564</c> <c>3513</c> <c>3461</c>
3265*a58d3d2aSXin Li<c>24</c>
3266*a58d3d2aSXin Li <c>3406</c> <c>3349</c> <c>3290</c> <c>3229</c>
3267*a58d3d2aSXin Li<c>28</c>
3268*a58d3d2aSXin Li <c>3166</c> <c>3102</c> <c>3035</c> <c>2967</c>
3269*a58d3d2aSXin Li<c>32</c>
3270*a58d3d2aSXin Li <c>2896</c> <c>2824</c> <c>2751</c> <c>2676</c>
3271*a58d3d2aSXin Li<c>36</c>
3272*a58d3d2aSXin Li <c>2599</c> <c>2520</c> <c>2440</c> <c>2359</c>
3273*a58d3d2aSXin Li<c>40</c>
3274*a58d3d2aSXin Li <c>2276</c> <c>2191</c> <c>2106</c> <c>2019</c>
3275*a58d3d2aSXin Li<c>44</c>
3276*a58d3d2aSXin Li <c>1931</c> <c>1842</c> <c>1751</c> <c>1660</c>
3277*a58d3d2aSXin Li<c>48</c>
3278*a58d3d2aSXin Li <c>1568</c> <c>1474</c> <c>1380</c> <c>1285</c>
3279*a58d3d2aSXin Li<c>52</c>
3280*a58d3d2aSXin Li <c>1189</c> <c>1093</c>  <c>995</c>  <c>897</c>
3281*a58d3d2aSXin Li<c>56</c>
3282*a58d3d2aSXin Li  <c>799</c>  <c>700</c>  <c>601</c>  <c>501</c>
3283*a58d3d2aSXin Li<c>60</c>
3284*a58d3d2aSXin Li  <c>401</c>  <c>301</c>  <c>201</c>  <c>101</c>
3285*a58d3d2aSXin Li<c>64</c>
3286*a58d3d2aSXin Li    <c>0</c> <c>-101</c> <c>-201</c> <c>-301</c>
3287*a58d3d2aSXin Li<c>68</c>
3288*a58d3d2aSXin Li <c>-401</c> <c>-501</c> <c>-601</c> <c>-700</c>
3289*a58d3d2aSXin Li<c>72</c>
3290*a58d3d2aSXin Li <c>-799</c> <c>-897</c> <c>-995</c> <c>-1093</c>
3291*a58d3d2aSXin Li<c>76</c>
3292*a58d3d2aSXin Li<c>-1189</c><c>-1285</c><c>-1380</c><c>-1474</c>
3293*a58d3d2aSXin Li<c>80</c>
3294*a58d3d2aSXin Li<c>-1568</c><c>-1660</c><c>-1751</c><c>-1842</c>
3295*a58d3d2aSXin Li<c>84</c>
3296*a58d3d2aSXin Li<c>-1931</c><c>-2019</c><c>-2106</c><c>-2191</c>
3297*a58d3d2aSXin Li<c>88</c>
3298*a58d3d2aSXin Li<c>-2276</c><c>-2359</c><c>-2440</c><c>-2520</c>
3299*a58d3d2aSXin Li<c>92</c>
3300*a58d3d2aSXin Li<c>-2599</c><c>-2676</c><c>-2751</c><c>-2824</c>
3301*a58d3d2aSXin Li<c>96</c>
3302*a58d3d2aSXin Li<c>-2896</c><c>-2967</c><c>-3035</c><c>-3102</c>
3303*a58d3d2aSXin Li<c>100</c>
3304*a58d3d2aSXin Li<c>-3166</c><c>-3229</c><c>-3290</c><c>-3349</c>
3305*a58d3d2aSXin Li<c>104</c>
3306*a58d3d2aSXin Li<c>-3406</c><c>-3461</c><c>-3513</c><c>-3564</c>
3307*a58d3d2aSXin Li<c>108</c>
3308*a58d3d2aSXin Li<c>-3613</c><c>-3659</c><c>-3703</c><c>-3745</c>
3309*a58d3d2aSXin Li<c>112</c>
3310*a58d3d2aSXin Li<c>-3784</c><c>-3822</c><c>-3857</c><c>-3889</c>
3311*a58d3d2aSXin Li<c>116</c>
3312*a58d3d2aSXin Li<c>-3920</c><c>-3948</c><c>-3973</c><c>-3997</c>
3313*a58d3d2aSXin Li<c>120</c>
3314*a58d3d2aSXin Li<c>-4017</c><c>-4036</c><c>-4052</c><c>-4065</c>
3315*a58d3d2aSXin Li<c>124</c>
3316*a58d3d2aSXin Li<c>-4076</c><c>-4085</c><c>-4091</c><c>-4095</c>
3317*a58d3d2aSXin Li<c>128</c>
3318*a58d3d2aSXin Li<c>-4096</c>        <c/>        <c/>        <c/>
3319*a58d3d2aSXin Li</texttable>
3320*a58d3d2aSXin Li
3321*a58d3d2aSXin Li<t>
3322*a58d3d2aSXin LiGiven the list of cosine values, silk_NLSF2A_find_poly() (NLSF2A.c)
3323*a58d3d2aSXin Li computes the coefficients of P and Q, described here via a simple recurrence.
3324*a58d3d2aSXin LiLet p_Q16[k][j] and q_Q16[k][j] be the coefficients of the products of the
3325*a58d3d2aSXin Li first (k+1) root pairs for P and Q, with j indexing the coefficient number.
3326*a58d3d2aSXin LiOnly the first (k+2) coefficients are needed, as the products are symmetric.
3327*a58d3d2aSXin LiLet p_Q16[0][0]&nbsp;=&nbsp;q_Q16[0][0]&nbsp;=&nbsp;1&lt;&lt;16,
3328*a58d3d2aSXin Li p_Q16[0][1]&nbsp;=&nbsp;-c_Q17[0], q_Q16[0][1]&nbsp;=&nbsp;-c_Q17[1], and
3329*a58d3d2aSXin Li d2&nbsp;=&nbsp;d_LPC/2.
3330*a58d3d2aSXin LiAs boundary conditions, assume
3331*a58d3d2aSXin Li p_Q16[k][j]&nbsp;=&nbsp;q_Q16[k][j]&nbsp;=&nbsp;0 for all
3332*a58d3d2aSXin Li j&nbsp;&lt;&nbsp;0.
3333*a58d3d2aSXin LiAlso, assume p_Q16[k][k+2]&nbsp;=&nbsp;p_Q16[k][k] and
3334*a58d3d2aSXin Li q_Q16[k][k+2]&nbsp;=&nbsp;q_Q16[k][k] (because of the symmetry).
3335*a58d3d2aSXin LiThen, for 0&nbsp;&lt;&nbsp;k&nbsp;&lt;&nbsp;d2 and 0&nbsp;&lt;=&nbsp;j&nbsp;&lt;=&nbsp;k+1,
3336*a58d3d2aSXin Li<figure align="center">
3337*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3338*a58d3d2aSXin Lip_Q16[k][j] = p_Q16[k-1][j] + p_Q16[k-1][j-2]
3339*a58d3d2aSXin Li              - ((c_Q17[2*k]*p_Q16[k-1][j-1] + 32768)>>16) ,
3340*a58d3d2aSXin Li
3341*a58d3d2aSXin Liq_Q16[k][j] = q_Q16[k-1][j] + q_Q16[k-1][j-2]
3342*a58d3d2aSXin Li              - ((c_Q17[2*k+1]*q_Q16[k-1][j-1] + 32768)>>16) .
3343*a58d3d2aSXin Li]]></artwork>
3344*a58d3d2aSXin Li</figure>
3345*a58d3d2aSXin LiThe use of Q17 values for the cosine terms in an otherwise Q16 expression
3346*a58d3d2aSXin Li implicitly scales them by a factor of 2.
3347*a58d3d2aSXin LiThe multiplications in this recurrence may require up to 48 bits of precision
3348*a58d3d2aSXin Li in the result to avoid overflow.
3349*a58d3d2aSXin LiIn practice, each row of the recurrence only depends on the previous row, so an
3350*a58d3d2aSXin Li implementation does not need to store all of them.
3351*a58d3d2aSXin Li</t>
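<t>
The recurrence above can be sketched directly in C, keeping only the previous
 row as the text suggests.
The function below is a hypothetical illustration written from the recurrence
 as stated here; it is not the code of silk_NLSF2A_find_poly(), and the name
 nlsf_find_poly() and the bound NLSF_MAX_D2 are assumptions of this sketch.
It uses a 64-bit intermediate for the product and assumes an arithmetic right
 shift for negative values.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

#define NLSF_MAX_D2 8  /* d_LPC/2 for the largest supported order */

/* Compute row d2-1 of the recurrence (illustrative sketch).
   Pass c_Q17 for P, or c_Q17+1 for Q: successive root pairs are two
   entries apart.  out_Q16 receives d2+1 coefficients. */
static void nlsf_find_poly(int32_t *out_Q16, const int32_t *c_Q17, int d2)
{
    int32_t prev[NLSF_MAX_D2 + 1], cur[NLSF_MAX_D2 + 1];
    int j, k;

    assert(d2 >= 1 && d2 <= NLSF_MAX_D2);
    prev[0] = 65536;      /* 1.0 in Q16 */
    prev[1] = -c_Q17[0];
    for (k = 1; k < d2; k++) {
        /* Symmetry of the previous row: entry k+1 equals entry k-1. */
        prev[k + 1] = prev[k - 1];
        for (j = 0; j <= k + 1; j++) {
            /* Entries with a negative index are zero. */
            int32_t older = (j >= 2) ? prev[j - 2] : 0;
            int64_t prod = (j >= 1)
                         ? (int64_t)c_Q17[2 * k] * prev[j - 1] : 0;
            cur[j] = prev[j] + older - (int32_t)((prod + 32768) >> 16);
        }
        for (j = 0; j <= k + 1; j++)
            prev[j] = cur[j];
    }
    for (j = 0; j <= d2; j++)
        out_Q16[j] = prev[j];
}
```
]]></artwork>
</figure>
<t>
For example, with both cosines equal to 1.0 (131072 in Q17), the product
 (1 - 2*z**-1 + z**-2)**2 yields leading coefficients 1, -4, and 6 in Q16.
</t>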
3352*a58d3d2aSXin Li<t>
3353*a58d3d2aSXin Lisilk_NLSF2A() uses the values from the last row of this recurrence to
3354*a58d3d2aSXin Li reconstruct a 32-bit version of the LPC filter (without the leading 1.0
3355*a58d3d2aSXin Li coefficient), a32_Q17[k], 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d2:
3356*a58d3d2aSXin Li<figure align="center">
3357*a58d3d2aSXin Li<artwork align="center"><![CDATA[
a32_Q17[k]         = -(q_Q16[d2-1][k+1] - q_Q16[d2-1][k])
                     - (p_Q16[d2-1][k+1] + p_Q16[d2-1][k]) ,

a32_Q17[d_LPC-k-1] =  (q_Q16[d2-1][k+1] - q_Q16[d2-1][k])
                     - (p_Q16[d2-1][k+1] + p_Q16[d2-1][k]) .
3363*a58d3d2aSXin Li]]></artwork>
3364*a58d3d2aSXin Li</figure>
3365*a58d3d2aSXin LiThe sum and difference of two terms from each of the p_Q16 and q_Q16
3366*a58d3d2aSXin Li coefficient lists reflect the (1&nbsp;+&nbsp;z**-1) and
3367*a58d3d2aSXin Li (1&nbsp;-&nbsp;z**-1) factors of P and Q, respectively.
3368*a58d3d2aSXin LiThe promotion of the expression from Q16 to Q17 implicitly scales the result
3369*a58d3d2aSXin Li by 1/2.
3370*a58d3d2aSXin Li</t>
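<t>
Combining the last rows of the two recurrences into the LPC coefficients, as
 described above, might look like this in C.
The function name nlsf_combine() is hypothetical; p_Q16 and q_Q16 stand for
 the last rows, p_Q16[d2-1][] and q_Q16[d2-1][], each holding d2+1 values.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Build a32_Q17[0..d_lpc-1] from the last rows of the P and Q
   recurrences (illustrative sketch). */
static void nlsf_combine(int32_t *a32_Q17, const int32_t *p_Q16,
                         const int32_t *q_Q16, int d_lpc)
{
    int d2 = d_lpc / 2;
    int k;

    assert(d_lpc >= 2 && d_lpc % 2 == 0);
    for (k = 0; k < d2; k++) {
        /* The sum applies the (1 + z**-1) factor of P and the
           difference applies the (1 - z**-1) factor of Q. */
        int32_t ptmp = p_Q16[k + 1] + p_Q16[k];
        int32_t qtmp = q_Q16[k + 1] - q_Q16[k];
        a32_Q17[k] = -qtmp - ptmp;
        a32_Q17[d_lpc - k - 1] = qtmp - ptmp;
    }
}
```
]]></artwork>
</figure>
<t>
As a small check, with d_LPC equal to 2 and both cosines zero,
 A(z) = 1 + z**-2, so the two coefficients are 0 and -1.0 (-131072 in Q17).
</t>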
3371*a58d3d2aSXin Li</section>
3372*a58d3d2aSXin Li
3373*a58d3d2aSXin Li<section anchor="silk_lpc_range_limit"
3374*a58d3d2aSXin Li title="Limiting the Range of the LPC Coefficients">
3375*a58d3d2aSXin Li<t>
3376*a58d3d2aSXin LiThe a32_Q17[] coefficients are too large to fit in a 16-bit value, which
3377*a58d3d2aSXin Li significantly increases the cost of applying this filter in fixed-point
3378*a58d3d2aSXin Li decoders.
3379*a58d3d2aSXin LiReducing them to Q12 precision doesn't incur any significant quality loss,
3380*a58d3d2aSXin Li but still does not guarantee they will fit.
3381*a58d3d2aSXin Lisilk_NLSF2A() applies up to 10 rounds of bandwidth expansion to limit
3382*a58d3d2aSXin Li the dynamic range of these coefficients.
3383*a58d3d2aSXin LiEven floating-point decoders SHOULD perform these steps, to avoid mismatch.
3384*a58d3d2aSXin Li</t>
3385*a58d3d2aSXin Li<t>
For each round, the process first finds the index k such that abs(a32_Q17[k])
 is largest, breaking ties by choosing the lowest value of k.
Let maxabs_Q17 be that largest absolute value.
Then, it computes the corresponding Q12 precision value, maxabs_Q12, subject to
 an upper bound to avoid overflow in subsequent computations:
3390*a58d3d2aSXin Li<figure align="center">
3391*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3392*a58d3d2aSXin Limaxabs_Q12 = min((maxabs_Q17 + 16) >> 5, 163838) .
3393*a58d3d2aSXin Li]]></artwork>
3394*a58d3d2aSXin Li</figure>
3395*a58d3d2aSXin LiIf this is larger than 32767, the procedure derives the chirp factor,
3396*a58d3d2aSXin Li sc_Q16[0], to use in the bandwidth expansion as
3397*a58d3d2aSXin Li<figure align="center">
3398*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3399*a58d3d2aSXin Li                    (maxabs_Q12 - 32767) << 14
3400*a58d3d2aSXin Lisc_Q16[0] = 65470 - -------------------------- ,
3401*a58d3d2aSXin Li                    (maxabs_Q12 * (k+1)) >> 2
3402*a58d3d2aSXin Li]]></artwork>
3403*a58d3d2aSXin Li</figure>
3404*a58d3d2aSXin Li where the division here is integer division.
3405*a58d3d2aSXin LiThis is an approximation of the chirp factor needed to reduce the target
3406*a58d3d2aSXin Li coefficient to 32767, though it is both less than 0.999 and, for
3407*a58d3d2aSXin Li k&nbsp;&gt;&nbsp;0 when maxabs_Q12 is much greater than 32767, still slightly
3408*a58d3d2aSXin Li too large.
3409*a58d3d2aSXin LiThe upper bound on maxabs_Q12, 163838, was chosen because it is equal to
3410*a58d3d2aSXin Li ((2**31&nbsp;-&nbsp;1)&nbsp;&gt;&gt;&nbsp;14)&nbsp;+&nbsp;32767, i.e., the
3411*a58d3d2aSXin Li largest value of maxabs_Q12 that would not overflow the numerator in the
3412*a58d3d2aSXin Li equation above when stored in a signed 32-bit integer.
3413*a58d3d2aSXin Li</t>
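<t>
The search for the largest coefficient and the derivation of the chirp factor
 described above can be sketched in C as follows.
The function name lpc_chirp_factor() and its return convention are
 hypothetical.
The sketch takes the absolute value in 64 bits to sidestep overflow for
 extreme inputs, which is a choice made here rather than something the text
 specifies.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Derive the chirp factor for one round (illustrative sketch).
   Returns 1 and stores the factor in *sc_Q16 when bandwidth
   expansion is needed, or returns 0 when maxabs_Q12 <= 32767. */
static int lpc_chirp_factor(const int32_t *a32_Q17, int d_lpc,
                            int32_t *sc_Q16)
{
    int64_t maxabs_Q17 = 0, q12;
    int32_t maxabs_Q12;
    int idx = 0, k;

    assert(d_lpc >= 1 && sc_Q16 != 0);
    /* Strict '>' keeps the lowest index on ties. */
    for (k = 0; k < d_lpc; k++) {
        int64_t v = a32_Q17[k];
        if (v < 0)
            v = -v;
        if (v > maxabs_Q17) {
            maxabs_Q17 = v;
            idx = k;
        }
    }
    q12 = (maxabs_Q17 + 16) >> 5;
    maxabs_Q12 = (q12 > 163838) ? 163838 : (int32_t)q12;
    if (maxabs_Q12 <= 32767)
        return 0;
    /* The numerator fits in 32 bits because of the 163838 bound. */
    *sc_Q16 = 65470 - ((maxabs_Q12 - 32767) << 14)
                      / ((maxabs_Q12 * (idx + 1)) >> 2);
    return 1;
}
```
]]></artwork>
</figure>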
3414*a58d3d2aSXin Li<t>
3415*a58d3d2aSXin Lisilk_bwexpander_32() (bwexpander_32.c) performs the bandwidth expansion (again,
3416*a58d3d2aSXin Li only when maxabs_Q12 is greater than 32767) using the following recurrence:
3417*a58d3d2aSXin Li<figure align="center">
3418*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3419*a58d3d2aSXin Li a32_Q17[k] = (a32_Q17[k]*sc_Q16[k]) >> 16
3420*a58d3d2aSXin Li
3421*a58d3d2aSXin Lisc_Q16[k+1] = (sc_Q16[0]*sc_Q16[k] + 32768) >> 16
3422*a58d3d2aSXin Li]]></artwork>
3423*a58d3d2aSXin Li</figure>
3424*a58d3d2aSXin LiThe first multiply may require up to 48 bits of precision in the result to
3425*a58d3d2aSXin Li avoid overflow.
3426*a58d3d2aSXin LiThe second multiply must be unsigned to avoid overflow with only 32 bits of
3427*a58d3d2aSXin Li precision.
3428*a58d3d2aSXin LiThe reference implementation uses a slightly more complex formulation that
3429*a58d3d2aSXin Li avoids the 32-bit overflow using signed multiplication, but is otherwise
3430*a58d3d2aSXin Li equivalent.
3431*a58d3d2aSXin Li</t>
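<t>
A minimal sketch of this recurrence is given below.
It is not the formulation used by the reference implementation: it simply
 widens both products to 64 bits, which sidesteps the two overflow concerns
 noted above.
The function name bwexpand_32() is hypothetical, and the sketch assumes that
 right shifts of negative values are arithmetic.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Applies one round of bandwidth expansion in place.
 * sc0_Q16 is the chirp factor sc_Q16[0]. */
void bwexpand_32(int32_t *a32_Q17, int d_LPC, int32_t sc0_Q16)
{
    int64_t sc_Q16 = sc0_Q16;
    int k;

    assert(d_LPC > 0);
    assert(sc0_Q16 >= 0 && sc0_Q16 <= 65536);
    for (k = 0; k < d_LPC; k++) {
        /* First multiply: up to 48 bits before the shift. */
        int64_t prod = (int64_t)a32_Q17[k] * sc_Q16;
        a32_Q17[k] = (int32_t)(prod >> 16);
        /* Second multiply: rounded update of the running chirp. */
        sc_Q16 = ((int64_t)sc0_Q16 * sc_Q16 + 32768) >> 16;
    }
}
```
]]></artwork>
</figure>
With a chirp factor of 32768 (0.5 in Q16), the coefficients {65536, -65536}
 become {32768, -16384}.
</t>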
3432*a58d3d2aSXin Li<t>
After 10 rounds of bandwidth expansion have been performed, the coefficients
 are simply saturated to 16 bits:
3435*a58d3d2aSXin Li<figure align="center">
3436*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3437*a58d3d2aSXin Lia32_Q17[k] = clamp(-32768, (a32_Q17[k] + 16) >> 5, 32767) << 5 .
3438*a58d3d2aSXin Li]]></artwork>
3439*a58d3d2aSXin Li</figure>
3440*a58d3d2aSXin LiBecause this performs the actual saturation in the Q12 domain, but converts the
3441*a58d3d2aSXin Li coefficients back to the Q17 domain for the purposes of prediction gain
3442*a58d3d2aSXin Li limiting, this step must be performed after the 10th round of bandwidth
3443*a58d3d2aSXin Li expansion, regardless of whether or not the Q12 version of any coefficient
3444*a58d3d2aSXin Li still overflows a 16-bit integer.
3445*a58d3d2aSXin LiThis saturation is not performed if maxabs_Q12 drops to 32767 or less prior to
3446*a58d3d2aSXin Li the 10th round.
3447*a58d3d2aSXin Li</t>
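<t>
The saturation step can be sketched as follows.
The function name saturate_Q17_via_Q12() is hypothetical; the sketch assumes
 arithmetic right shifts of negative values and inputs small enough that
 adding 16 does not overflow.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Saturates each coefficient in the Q12 domain and stores the
 * result back in the Q17 domain. */
void saturate_Q17_via_Q12(int32_t *a32_Q17, int d_LPC)
{
    int k;

    assert(d_LPC > 0);
    for (k = 0; k < d_LPC; k++) {
        int32_t q12 = (a32_Q17[k] + 16) >> 5;
        if (q12 < -32768)
            q12 = -32768;
        if (q12 > 32767)
            q12 = 32767;
        /* Multiplying by 32 equals << 5 without shifting a
         * negative value. */
        a32_Q17[k] = q12 * 32;
    }
}
```
]]></artwork>
</figure>
Note that a coefficient that is already in range is still rounded to a
 multiple of 32 by this step.
</t>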
3448*a58d3d2aSXin Li</section>
3449*a58d3d2aSXin Li
3450*a58d3d2aSXin Li<section anchor="silk_lpc_gain_limit"
3451*a58d3d2aSXin Li title="Limiting the Prediction Gain of the LPC Filter">
3452*a58d3d2aSXin Li<t>
The prediction gain of an LPC synthesis filter is the square root of the output
 energy when the filter is excited by a unit-energy impulse.
3455*a58d3d2aSXin LiEven if the Q12 coefficients would fit, the resulting filter may still have a
3456*a58d3d2aSXin Li significant gain (especially for voiced sounds), making the filter unstable.
3457*a58d3d2aSXin Lisilk_NLSF2A() applies up to 18 additional rounds of bandwidth expansion to
3458*a58d3d2aSXin Li limit the prediction gain.
3459*a58d3d2aSXin LiInstead of controlling the amount of bandwidth expansion using the prediction
3460*a58d3d2aSXin Li gain itself (which may diverge to infinity for an unstable filter),
3461*a58d3d2aSXin Li silk_NLSF2A() uses silk_LPC_inverse_pred_gain_QA() (LPC_inv_pred_gain.c) to
3462*a58d3d2aSXin Li compute the reflection coefficients associated with the filter.
3463*a58d3d2aSXin LiThe filter is stable if and only if the magnitude of these coefficients is
3464*a58d3d2aSXin Li sufficiently less than one.
3465*a58d3d2aSXin LiThe reflection coefficients, rc[k], can be computed using a simple Levinson
3466*a58d3d2aSXin Li recurrence, initialized with the LPC coefficients
3467*a58d3d2aSXin Li a[d_LPC-1][n]&nbsp;=&nbsp;a[n], and then updated via
3468*a58d3d2aSXin Li<figure align="center">
3469*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3470*a58d3d2aSXin Li    rc[k] = -a[k][k] ,
3471*a58d3d2aSXin Li
3472*a58d3d2aSXin Li            a[k][n] - a[k][k-n-1]*rc[k]
3473*a58d3d2aSXin Lia[k-1][n] = --------------------------- .
3474*a58d3d2aSXin Li                             2
3475*a58d3d2aSXin Li                    1 - rc[k]
3476*a58d3d2aSXin Li]]></artwork>
3477*a58d3d2aSXin Li</figure>
3478*a58d3d2aSXin Li</t>
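<t>
For reference, the recurrence above can be written in floating point as in the
 following sketch.
This is only an illustration of the mathematics, not what the decoder does.
The name lpc_stable_float(), the bound of 16 on d_LPC, and the use of a strict
 magnitude test against 1.0 are assumptions of the sketch.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

#define MAX_ORDER 16  /* assumed upper bound on d_LPC */

/* Returns 1 if every reflection coefficient has magnitude
 * below 1, and 0 otherwise. */
int lpc_stable_float(const double *a, int d_LPC)
{
    double row[MAX_ORDER];
    double next[MAX_ORDER];
    int k, n;

    assert(d_LPC > 0 && d_LPC <= MAX_ORDER);
    for (n = 0; n < d_LPC; n++)
        row[n] = a[n];                 /* a[d_LPC-1][n] = a[n] */
    for (k = d_LPC - 1; k >= 0; k--) {
        double rc = -row[k];           /* rc[k] = -a[k][k] */
        if (rc >= 1.0 || rc <= -1.0)
            return 0;
        for (n = 0; n < k; n++)
            next[n] = (row[n] - row[k - n - 1] * rc)
                      / (1.0 - rc * rc);
        for (n = 0; n < k; n++)
            row[n] = next[n];
    }
    return 1;
}
```
]]></artwork>
</figure>
For example, the second-order filter with coefficients {1.8, -0.9} is reported
 as stable, while {1.95, -0.9} is not.
</t>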
3479*a58d3d2aSXin Li<t>
3480*a58d3d2aSXin LiHowever, silk_LPC_inverse_pred_gain_QA() approximates this using fixed-point
3481*a58d3d2aSXin Li arithmetic to guarantee reproducible results across platforms and
3482*a58d3d2aSXin Li implementations.
3483*a58d3d2aSXin LiSince small changes in the coefficients can make a stable filter unstable, it
3484*a58d3d2aSXin Li takes the real Q12 coefficients that will be used during reconstruction as
3485*a58d3d2aSXin Li input.
3486*a58d3d2aSXin LiThus, let
3487*a58d3d2aSXin Li<figure align="center">
3488*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3489*a58d3d2aSXin Lia32_Q12[n] = (a32_Q17[n] + 16) >> 5
3490*a58d3d2aSXin Li]]></artwork>
3491*a58d3d2aSXin Li</figure>
3492*a58d3d2aSXin Li be the Q12 version of the LPC coefficients that will eventually be used.
3493*a58d3d2aSXin LiAs a simple initial check, the decoder computes the DC response as
3494*a58d3d2aSXin Li<figure align="center">
3495*a58d3d2aSXin Li<artwork align="center"><![CDATA[
        d_LPC-1
          __
DC_resp = \   a32_Q12[n]
          /_
          n=0
3501*a58d3d2aSXin Li]]></artwork>
3502*a58d3d2aSXin Li</figure>
3503*a58d3d2aSXin Li and if DC_resp&nbsp;&gt;&nbsp;4096, the filter is unstable.
3504*a58d3d2aSXin Li</t>
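<t>
The initial check can be sketched as follows, noting that 4096 is 1.0 in Q12.
The function name dc_response_unstable() and the bound of 16 on d_LPC are
 assumptions of this sketch, as is the use of arithmetic right shifts.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the DC-response check alone marks the filter as
 * unstable.  A return of 0 means the full recurrence must still
 * be evaluated. */
int dc_response_unstable(const int32_t *a32_Q17, int d_LPC)
{
    int32_t DC_resp = 0;
    int n;

    assert(d_LPC > 0 && d_LPC <= 16);
    for (n = 0; n < d_LPC; n++)
        DC_resp += (a32_Q17[n] + 16) >> 5;  /* a32_Q12[n] */
    return DC_resp > 4096;
}
```
]]></artwork>
</figure>
A sum of exactly 4096 does not trigger this check, since the comparison is
 strict.
</t>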
3505*a58d3d2aSXin Li<t>
3506*a58d3d2aSXin LiIncreasing the precision of these Q12 coefficients to Q24 for intermediate
3507*a58d3d2aSXin Li computations allows more accurate computation of the reflection coefficients,
3508*a58d3d2aSXin Li so the decoder initializes the recurrence via
3509*a58d3d2aSXin Li<figure align="center">
3510*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3511*a58d3d2aSXin Lia32_Q24[d_LPC-1][n] = a32_Q12[n] << 12 .
3512*a58d3d2aSXin Li]]></artwork>
3513*a58d3d2aSXin Li</figure>
3514*a58d3d2aSXin LiThen for each k from d_LPC-1 down to 0, if
3515*a58d3d2aSXin Li abs(a32_Q24[k][k])&nbsp;&gt;&nbsp;16773022, the filter is unstable and the
3516*a58d3d2aSXin Li recurrence stops.
3517*a58d3d2aSXin LiThe constant 16773022 here is approximately 0.99975 in Q24.
3518*a58d3d2aSXin LiOtherwise, row k-1 of a32_Q24 is computed from row k as
3519*a58d3d2aSXin Li<figure align="center">
3520*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3521*a58d3d2aSXin Li      rc_Q31[k] = -a32_Q24[k][k] << 7 ,
3522*a58d3d2aSXin Li
3523*a58d3d2aSXin Li     div_Q30[k] = (1<<30) - (rc_Q31[k]*rc_Q31[k] >> 32) ,
3524*a58d3d2aSXin Li
3525*a58d3d2aSXin Li          b1[k] = ilog(div_Q30[k]) ,
3526*a58d3d2aSXin Li
3527*a58d3d2aSXin Li          b2[k] = b1[k] - 16 ,
3528*a58d3d2aSXin Li
3529*a58d3d2aSXin Li                        (1<<29) - 1
3530*a58d3d2aSXin Li     inv_Qb2[k] = ----------------------- ,
3531*a58d3d2aSXin Li                  div_Q30[k] >> (b2[k]+1)
3532*a58d3d2aSXin Li
3533*a58d3d2aSXin Li     err_Q29[k] = (1<<29)
3534*a58d3d2aSXin Li                  - ((div_Q30[k]<<(15-b2[k]))*inv_Qb2[k] >> 16) ,
3535*a58d3d2aSXin Li
3536*a58d3d2aSXin Li    gain_Qb1[k] = ((inv_Qb2[k] << 16)
3537*a58d3d2aSXin Li                   + (err_Q29[k]*inv_Qb2[k] >> 13)) ,
3538*a58d3d2aSXin Li
3539*a58d3d2aSXin Linum_Q24[k-1][n] = a32_Q24[k][n]
3540*a58d3d2aSXin Li                  - ((a32_Q24[k][k-n-1]*rc_Q31[k] + (1<<30)) >> 31) ,
3541*a58d3d2aSXin Li
3542*a58d3d2aSXin Lia32_Q24[k-1][n] = (num_Q24[k-1][n]*gain_Qb1[k]
3543*a58d3d2aSXin Li                   + (1<<(b1[k]-1))) >> b1[k] ,
3544*a58d3d2aSXin Li]]></artwork>
3545*a58d3d2aSXin Li</figure>
3546*a58d3d2aSXin Li where 0&nbsp;&lt;=&nbsp;n&nbsp;&lt;&nbsp;k.
Here, rc_Q31[k] are the reflection coefficients.
div_Q30[k] is the denominator for each iteration, and gain_Qb1[k] is its
 multiplicative inverse (with b1[k] fractional bits, where b1[k] ranges from
 20 to 31).
inv_Qb2[k], which ranges from 16384 to 32767, is a low-precision version of
 that inverse (with b2[k] fractional bits).
err_Q29[k] is the residual error, ranging from -32763 to 32392, which is used
 to improve the accuracy.
The values num_Q24[k-1][n] for each n are the numerators for the next row of
 coefficients in the recursion, and a32_Q24[k-1][n] is the final version of
 that row.
3558*a58d3d2aSXin LiEvery multiply in this procedure except the one used to compute gain_Qb1[k]
3559*a58d3d2aSXin Li requires more than 32 bits of precision, but otherwise all intermediate
3560*a58d3d2aSXin Li results fit in 32 bits or less.
3561*a58d3d2aSXin LiIn practice, because each row only depends on the next one, an implementation
3562*a58d3d2aSXin Li does not need to store them all.
3563*a58d3d2aSXin Li</t>
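<t>
The fixed-point recurrence can be sketched in C as shown below.
This is an illustration under stated assumptions and not the reference
 implementation: the names lpc_stable_fixed() and ilog() are local to the
 sketch, all products are widened to 64 bits, right shifts of negative values
 are assumed to be arithmetic, and a row value that does not fit in 32 bits is
 treated as an unstable filter, which is a choice made by the sketch.
The input is assumed to hold Q12 coefficients that fit in 16 bits.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent x (0 for x == 0). */
static int ilog(uint32_t x)
{
    int bits = 0;
    while (x != 0) {
        bits++;
        x >>= 1;
    }
    return bits;
}

/* Returns 1 if the filter is considered stable, 0 otherwise. */
int lpc_stable_fixed(const int32_t *a32_Q12, int d_LPC)
{
    int32_t row[16], next[16];
    int k, n;

    assert(d_LPC > 0 && d_LPC <= 16);
    for (n = 0; n < d_LPC; n++)
        row[n] = a32_Q12[n] * 4096;          /* Q12 to Q24 */
    for (k = d_LPC - 1; k >= 0; k--) {
        int32_t rc_Q31, div_Q30, inv_Qb2, err_Q29;
        int64_t gain_Qb1, t;
        int b1, b2;

        if (row[k] > 16773022 || row[k] < -16773022)
            return 0;
        rc_Q31 = -row[k] * 128;              /* Q24 to Q31 */
        t = ((int64_t)rc_Q31 * rc_Q31) >> 32;
        div_Q30 = (1 << 30) - (int32_t)t;
        b1 = ilog((uint32_t)div_Q30);
        b2 = b1 - 16;
        inv_Qb2 = ((1 << 29) - 1) / (div_Q30 >> (b2 + 1));
        t = ((int64_t)div_Q30 << (15 - b2)) * inv_Qb2;
        err_Q29 = (1 << 29) - (int32_t)(t >> 16);
        t = (int64_t)err_Q29 * inv_Qb2;
        gain_Qb1 = ((int64_t)inv_Qb2 << 16) + (t >> 13);
        for (n = 0; n < k; n++) {
            int64_t num_Q24, v;
            t = (int64_t)row[k - n - 1] * rc_Q31 + (1 << 30);
            num_Q24 = row[n] - (t >> 31);
            if (num_Q24 > INT32_MAX || num_Q24 < INT32_MIN)
                return 0;                    /* sketch-only choice */
            v = num_Q24 * gain_Qb1 + ((int64_t)1 << (b1 - 1));
            v >>= b1;
            if (v > INT32_MAX || v < INT32_MIN)
                return 0;                    /* sketch-only choice */
            next[n] = (int32_t)v;
        }
        for (n = 0; n < k; n++)
            row[n] = next[n];                /* only one row is kept */
    }
    return 1;
}
```
]]></artwork>
</figure>
With the Q12 inputs {7373, -3686} (about {1.8, -0.9}) the sketch reports a
 stable filter, and with {7987, -3686} (about {1.95, -0.9}) it reports an
 unstable one.
</t>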
3564*a58d3d2aSXin Li<t>
3565*a58d3d2aSXin LiIf abs(a32_Q24[k][k])&nbsp;&lt;=&nbsp;16773022 for
3566*a58d3d2aSXin Li 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d_LPC, then the filter is considered stable.
3567*a58d3d2aSXin LiHowever, the problem of determining stability is ill-conditioned when the
3568*a58d3d2aSXin Li filter contains several reflection coefficients whose magnitude is very close
3569*a58d3d2aSXin Li to one.
3570*a58d3d2aSXin LiThis fixed-point algorithm is not mathematically guaranteed to correctly
3571*a58d3d2aSXin Li classify filters as stable or unstable in this case, though it does very well
3572*a58d3d2aSXin Li in practice.
3573*a58d3d2aSXin Li</t>
3574*a58d3d2aSXin Li<t>
3575*a58d3d2aSXin LiOn round i, 1&nbsp;&lt;=&nbsp;i&nbsp;&lt;=&nbsp;18, if the filter passes these
3576*a58d3d2aSXin Li stability checks, then this procedure stops, and the final LPC coefficients to
3577*a58d3d2aSXin Li use for reconstruction in <xref target="silk_lpc_synthesis"/> are
3578*a58d3d2aSXin Li<figure align="center">
3579*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3580*a58d3d2aSXin Lia_Q12[k] = (a32_Q17[k] + 16) >> 5 .
3581*a58d3d2aSXin Li]]></artwork>
3582*a58d3d2aSXin Li</figure>
3583*a58d3d2aSXin LiOtherwise, a round of bandwidth expansion is applied using the same procedure
3584*a58d3d2aSXin Li as in <xref target="silk_lpc_range_limit"/>, with
3585*a58d3d2aSXin Li<figure align="center">
3586*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3587*a58d3d2aSXin Lisc_Q16[0] = 65536 - (2<<i) .
3588*a58d3d2aSXin Li]]></artwork>
3589*a58d3d2aSXin Li</figure>
3590*a58d3d2aSXin LiDuring the 15th round, sc_Q16[0] becomes 0 in the above equation, so a_Q12[k]
3591*a58d3d2aSXin Li is set to 0 for all k, guaranteeing a stable filter.
3592*a58d3d2aSXin Li</t>
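<t>
The schedule of chirp factors can be tabulated with the following sketch.
The function name gain_limit_chirp_Q16() is hypothetical, and the sketch only
 accepts rounds 1 through 15, since the formula reaches zero at i = 15 and
 would turn negative beyond that.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Chirp factor sc_Q16[0] used on round i of gain limiting. */
int32_t gain_limit_chirp_Q16(int i)
{
    assert(i >= 1 && i <= 15);
    return 65536 - (int32_t)(2 << i);
}
```
]]></artwork>
</figure>
The first round therefore uses 65532, the 14th round uses 32768 (0.5 in Q16),
 and the 15th round uses 0.
</t>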
3593*a58d3d2aSXin Li</section>
3594*a58d3d2aSXin Li
3595*a58d3d2aSXin Li</section>
3596*a58d3d2aSXin Li
3597*a58d3d2aSXin Li<section anchor="silk_ltp_params" toc="include"
3598*a58d3d2aSXin Li title="Long-Term Prediction (LTP) Parameters">
3599*a58d3d2aSXin Li<t>
3600*a58d3d2aSXin LiAfter the normalized LSF indices and, for 20&nbsp;ms frames, the LSF
3601*a58d3d2aSXin Li interpolation index, voiced frames (see <xref target="silk_frame_type"/>)
3602*a58d3d2aSXin Li include additional LTP parameters.
3603*a58d3d2aSXin LiThere is one primary lag index for each SILK frame, but this is refined to
3604*a58d3d2aSXin Li produce a separate lag index per subframe using a vector quantizer.
3605*a58d3d2aSXin LiEach subframe also gets its own prediction gain coefficient.
3606*a58d3d2aSXin Li</t>
3607*a58d3d2aSXin Li
3608*a58d3d2aSXin Li<section anchor="silk_ltp_lags" title="Pitch Lags">
3609*a58d3d2aSXin Li<t>
3610*a58d3d2aSXin LiThe primary lag index is coded either relative to the primary lag of the prior
3611*a58d3d2aSXin Li frame in the same channel, or as an absolute index.
3612*a58d3d2aSXin LiAbsolute coding is used if and only if
3613*a58d3d2aSXin Li<list style="symbols">
3614*a58d3d2aSXin Li<t>
3615*a58d3d2aSXin LiThis is the first SILK frame of its type (LBRR or regular) for this channel in
3616*a58d3d2aSXin Li the current Opus frame,
3617*a58d3d2aSXin Li</t>
3618*a58d3d2aSXin Li<t>
3619*a58d3d2aSXin LiThe previous SILK frame of the same type (LBRR or regular) for this channel in
3620*a58d3d2aSXin Li the same Opus frame was not coded, or
3621*a58d3d2aSXin Li</t>
3622*a58d3d2aSXin Li<t>
3623*a58d3d2aSXin LiThat previous SILK frame was coded, but was not voiced (see
3624*a58d3d2aSXin Li <xref target="silk_frame_type"/>).
3625*a58d3d2aSXin Li</t>
3626*a58d3d2aSXin Li</list>
3627*a58d3d2aSXin Li</t>
3628*a58d3d2aSXin Li
3629*a58d3d2aSXin Li<t>
3630*a58d3d2aSXin LiWith absolute coding, the primary pitch lag may range from 2&nbsp;ms
3631*a58d3d2aSXin Li (inclusive) up to 18&nbsp;ms (exclusive), corresponding to pitches from
3632*a58d3d2aSXin Li 500&nbsp;Hz down to 55.6&nbsp;Hz, respectively.
It consists of a high part and a low part, where the decoder reads the high
 part using the 32-entry codebook in <xref target="silk_abs_pitch_high_pdf"/>
 and the low part using the codebook corresponding to the current audio
 bandwidth from <xref target="silk_abs_pitch_low_pdf"/>.
3637*a58d3d2aSXin LiThe final primary pitch lag is then
3638*a58d3d2aSXin Li<figure align="center">
3639*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3640*a58d3d2aSXin Lilag = lag_high*lag_scale + lag_low + lag_min
3641*a58d3d2aSXin Li]]></artwork>
3642*a58d3d2aSXin Li</figure>
3643*a58d3d2aSXin Li where lag_high is the high part, lag_low is the low part, and lag_scale
3644*a58d3d2aSXin Li and lag_min are the values from the "Scale" and "Minimum Lag" columns of
3645*a58d3d2aSXin Li <xref target="silk_abs_pitch_low_pdf"/>, respectively.
3646*a58d3d2aSXin Li</t>
3647*a58d3d2aSXin Li
3648*a58d3d2aSXin Li<texttable anchor="silk_abs_pitch_high_pdf"
3649*a58d3d2aSXin Li title="PDF for High Part of Primary Pitch Lag">
3650*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
3651*a58d3d2aSXin Li<c>{3,   3,   6,  11,  21,  30,  32,  19,
3652*a58d3d2aSXin Li   11,  10,  12,  13,  13,  12,  11,   9,
3653*a58d3d2aSXin Li    8,   7,   6,   4,   2,   2,   2,   1,
3654*a58d3d2aSXin Li    1,   1,   1,   1,   1,   1,   1,   1}/256</c>
3655*a58d3d2aSXin Li</texttable>
3656*a58d3d2aSXin Li
3657*a58d3d2aSXin Li<texttable anchor="silk_abs_pitch_low_pdf"
3658*a58d3d2aSXin Li title="PDF for Low Part of Primary Pitch Lag">
3659*a58d3d2aSXin Li<ttcol>Audio Bandwidth</ttcol>
3660*a58d3d2aSXin Li<ttcol>PDF</ttcol>
3661*a58d3d2aSXin Li<ttcol>Scale</ttcol>
3662*a58d3d2aSXin Li<ttcol>Minimum Lag</ttcol>
3663*a58d3d2aSXin Li<ttcol>Maximum Lag</ttcol>
3664*a58d3d2aSXin Li<c>NB</c> <c>{64, 64, 64, 64}/256</c>                 <c>4</c> <c>16</c> <c>144</c>
3665*a58d3d2aSXin Li<c>MB</c> <c>{43, 42, 43, 43, 42, 43}/256</c>         <c>6</c> <c>24</c> <c>216</c>
3666*a58d3d2aSXin Li<c>WB</c> <c>{32, 32, 32, 32, 32, 32, 32, 32}/256</c> <c>8</c> <c>32</c> <c>288</c>
3667*a58d3d2aSXin Li</texttable>
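<t>
Assembling the absolute lag from the two decoded parts can be sketched as
 follows.
The enumeration and array names are hypothetical; the scale, minimum, and
 maximum values are copied from the table above, and the lag is expressed in
 samples at the SILK internal sampling rate.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

enum bandwidth { BW_NB = 0, BW_MB = 1, BW_WB = 2 };

static const int LAG_SCALE[3] = {   4,   6,   8 };
static const int LAG_MIN[3]   = {  16,  24,  32 };
static const int LAG_MAX[3]   = { 144, 216, 288 };

/* lag_high is the symbol from the 32-entry codebook; lag_low is
 * the symbol from the bandwidth-specific codebook. */
int absolute_lag(enum bandwidth bw, int lag_high, int lag_low)
{
    int lag;

    assert(lag_high >= 0 && lag_high < 32);
    assert(lag_low >= 0 && lag_low < LAG_SCALE[bw]);
    lag = lag_high * LAG_SCALE[bw] + lag_low + LAG_MIN[bw];
    assert(lag < LAG_MAX[bw]);  /* upper end is exclusive */
    return lag;
}
```
]]></artwork>
</figure>
For NB, the smallest result is 16 samples and the largest is 143 samples, one
 less than the maximum lag listed in the table.
</t>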
3668*a58d3d2aSXin Li
3669*a58d3d2aSXin Li<t>
3670*a58d3d2aSXin LiAll frames that do not use absolute coding for the primary lag index use
3671*a58d3d2aSXin Li relative coding instead.
3672*a58d3d2aSXin LiThe decoder reads a single delta value using the 21-entry PDF in
3673*a58d3d2aSXin Li <xref target="silk_rel_pitch_pdf"/>.
If the resulting value is zero, it falls back to the absolute coding procedure
 from the prior paragraph.
Otherwise, the final primary pitch lag is
3677*a58d3d2aSXin Li<figure align="center">
3678*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3679*a58d3d2aSXin Lilag = previous_lag + (delta_lag_index - 9)
3680*a58d3d2aSXin Li]]></artwork>
3681*a58d3d2aSXin Li</figure>
3682*a58d3d2aSXin Li where previous_lag is the primary pitch lag from the most recent frame in the
3683*a58d3d2aSXin Li same channel and delta_lag_index is the value just decoded.
3684*a58d3d2aSXin LiThis allows a per-frame change in the pitch lag of -8 to +11 samples.
3685*a58d3d2aSXin LiThe decoder does no clamping at this point, so this value can fall outside the
3686*a58d3d2aSXin Li range of 2&nbsp;ms to 18&nbsp;ms, and the decoder must use this unclamped
3687*a58d3d2aSXin Li value when using relative coding in the next SILK frame (if any).
3688*a58d3d2aSXin LiHowever, because an Opus frame can use relative coding for at most two
3689*a58d3d2aSXin Li consecutive SILK frames, integer overflow should not be an issue.
3690*a58d3d2aSXin Li</t>
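<t>
The relative coding rule can be sketched as follows.
The function name relative_lag() and its return convention are assumptions of
 this sketch; reading the symbol from the range decoder is left to the caller.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* delta_lag_index is the symbol decoded with the 21-entry PDF.
 * Returns 1 and stores the new lag in *lag, or returns 0 when
 * the symbol is zero, in which case *lag is left untouched and
 * the caller must decode an absolute lag instead. */
int relative_lag(int previous_lag, int delta_lag_index, int *lag)
{
    assert(delta_lag_index >= 0 && delta_lag_index <= 20);
    if (delta_lag_index == 0)
        return 0;
    /* Change of -8 to +11 samples; deliberately not clamped. */
    *lag = previous_lag + (delta_lag_index - 9);
    return 1;
}
```
]]></artwork>
</figure>
With a previous lag of 100, the symbols 1 and 20 give lags of 92 and 111,
 respectively.
</t>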
3691*a58d3d2aSXin Li
3692*a58d3d2aSXin Li<texttable anchor="silk_rel_pitch_pdf"
3693*a58d3d2aSXin Li title="PDF for Primary Pitch Lag Change">
3694*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
3695*a58d3d2aSXin Li<c>{46,  2,  2,  3,  4,  6, 10, 15,
3696*a58d3d2aSXin Li    26, 38, 30, 22, 15, 10,  7,  6,
3697*a58d3d2aSXin Li     4,  4,  2,  2,  2}/256</c>
3698*a58d3d2aSXin Li</texttable>
3699*a58d3d2aSXin Li
3700*a58d3d2aSXin Li<t>
3701*a58d3d2aSXin LiAfter the primary pitch lag, a "pitch contour", stored as a single entry from
3702*a58d3d2aSXin Li one of four small VQ codebooks, gives lag offsets for each subframe in the
3703*a58d3d2aSXin Li current SILK frame.
3704*a58d3d2aSXin LiThe codebook index is decoded using one of the PDFs in
3705*a58d3d2aSXin Li <xref target="silk_pitch_contour_pdfs"/> depending on the current frame size
3706*a58d3d2aSXin Li and audio bandwidth.
3707*a58d3d2aSXin LiTables&nbsp;<xref format="counter" target="silk_pitch_contour_cb_nb10ms"/>
3708*a58d3d2aSXin Li through&nbsp;<xref format="counter" target="silk_pitch_contour_cb_mbwb20ms"/>
3709*a58d3d2aSXin Li give the corresponding offsets to apply to the primary pitch lag for each
3710*a58d3d2aSXin Li subframe given the decoded codebook index.
3711*a58d3d2aSXin Li</t>
3712*a58d3d2aSXin Li
3713*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_pdfs"
3714*a58d3d2aSXin Li title="PDFs for Subframe Pitch Contour">
3715*a58d3d2aSXin Li<ttcol>Audio Bandwidth</ttcol>
3716*a58d3d2aSXin Li<ttcol>SILK Frame Size</ttcol>
3717*a58d3d2aSXin Li<ttcol align="right">Codebook Size</ttcol>
3718*a58d3d2aSXin Li<ttcol>PDF</ttcol>
3719*a58d3d2aSXin Li<c>NB</c>       <c>10&nbsp;ms</c>  <c>3</c>
3720*a58d3d2aSXin Li<c>{143, 50, 63}/256</c>
3721*a58d3d2aSXin Li<c>NB</c>       <c>20&nbsp;ms</c> <c>11</c>
3722*a58d3d2aSXin Li<c>{68, 12, 21, 17, 19, 22, 30, 24,
3723*a58d3d2aSXin Li    17, 16, 10}/256</c>
3724*a58d3d2aSXin Li<c>MB or WB</c> <c>10&nbsp;ms</c> <c>12</c>
3725*a58d3d2aSXin Li<c>{91, 46, 39, 19, 14, 12,  8,  7,
3726*a58d3d2aSXin Li     6,  5,  5,  4}/256</c>
3727*a58d3d2aSXin Li<c>MB or WB</c> <c>20&nbsp;ms</c> <c>34</c>
3728*a58d3d2aSXin Li<c>{33, 22, 18, 16, 15, 14, 14, 13,
3729*a58d3d2aSXin Li    13, 10,  9,  9,  8,  6,  6,  6,
3730*a58d3d2aSXin Li     5,  4,  4,  4,  3,  3,  3,  2,
3731*a58d3d2aSXin Li     2,  2,  2,  2,  2,  2,  1,  1,
3732*a58d3d2aSXin Li     1,  1}/256</c>
3733*a58d3d2aSXin Li</texttable>
3734*a58d3d2aSXin Li
3735*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_cb_nb10ms"
3736*a58d3d2aSXin Li title="Codebook Vectors for Subframe Pitch Contour: NB, 10&nbsp;ms Frames">
3737*a58d3d2aSXin Li<ttcol>Index</ttcol>
3738*a58d3d2aSXin Li<ttcol align="right">Subframe Offsets</ttcol>
3739*a58d3d2aSXin Li<c>0</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0</spanx></c>
3740*a58d3d2aSXin Li<c>1</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0</spanx></c>
3741*a58d3d2aSXin Li<c>2</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;1</spanx></c>
3742*a58d3d2aSXin Li</texttable>
3743*a58d3d2aSXin Li
3744*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_cb_nb20ms"
3745*a58d3d2aSXin Li title="Codebook Vectors for Subframe Pitch Contour: NB, 20&nbsp;ms Frames">
3746*a58d3d2aSXin Li<ttcol>Index</ttcol>
3747*a58d3d2aSXin Li<ttcol align="right">Subframe Offsets</ttcol>
3748*a58d3d2aSXin Li <c>0</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
3749*a58d3d2aSXin Li <c>1</c> <c><spanx style="vbare">&nbsp;2&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;-1</spanx></c>
3750*a58d3d2aSXin Li <c>2</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;2</spanx></c>
3751*a58d3d2aSXin Li <c>3</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1</spanx></c>
3752*a58d3d2aSXin Li <c>4</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
3753*a58d3d2aSXin Li <c>5</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1</spanx></c>
3754*a58d3d2aSXin Li <c>6</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;1</spanx></c>
3755*a58d3d2aSXin Li <c>7</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
3756*a58d3d2aSXin Li <c>8</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
3757*a58d3d2aSXin Li <c>9</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-1</spanx></c>
3758*a58d3d2aSXin Li<c>10</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-1</spanx></c>
3759*a58d3d2aSXin Li</texttable>
3760*a58d3d2aSXin Li
3761*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_cb_mbwb10ms"
3762*a58d3d2aSXin Li title="Codebook Vectors for Subframe Pitch Contour: MB or WB, 10&nbsp;ms Frames">
3763*a58d3d2aSXin Li<ttcol>Index</ttcol>
3764*a58d3d2aSXin Li<ttcol align="right">Subframe Offsets</ttcol>
3765*a58d3d2aSXin Li <c>0</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0</spanx></c>
3766*a58d3d2aSXin Li <c>1</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;1</spanx></c>
3767*a58d3d2aSXin Li <c>2</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0</spanx></c>
3768*a58d3d2aSXin Li <c>3</c> <c><spanx style="vbare">-1&nbsp;&nbsp;1</spanx></c>
3769*a58d3d2aSXin Li <c>4</c> <c><spanx style="vbare">&nbsp;1&nbsp;-1</spanx></c>
3770*a58d3d2aSXin Li <c>5</c> <c><spanx style="vbare">-1&nbsp;&nbsp;2</spanx></c>
3771*a58d3d2aSXin Li <c>6</c> <c><spanx style="vbare">&nbsp;2&nbsp;-1</spanx></c>
3772*a58d3d2aSXin Li <c>7</c> <c><spanx style="vbare">-2&nbsp;&nbsp;2</spanx></c>
3773*a58d3d2aSXin Li <c>8</c> <c><spanx style="vbare">&nbsp;2&nbsp;-2</spanx></c>
3774*a58d3d2aSXin Li <c>9</c> <c><spanx style="vbare">-2&nbsp;&nbsp;3</spanx></c>
3775*a58d3d2aSXin Li<c>10</c> <c><spanx style="vbare">&nbsp;3&nbsp;-2</spanx></c>
3776*a58d3d2aSXin Li<c>11</c> <c><spanx style="vbare">-3&nbsp;&nbsp;3</spanx></c>
3777*a58d3d2aSXin Li</texttable>
3778*a58d3d2aSXin Li
3779*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_cb_mbwb20ms"
3780*a58d3d2aSXin Li title="Codebook Vectors for Subframe Pitch Contour: MB or WB, 20&nbsp;ms Frames">
3781*a58d3d2aSXin Li<ttcol>Index</ttcol>
3782*a58d3d2aSXin Li<ttcol align="right">Subframe Offsets</ttcol>
3783*a58d3d2aSXin Li <c>0</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
3784*a58d3d2aSXin Li <c>1</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;1</spanx></c>
3785*a58d3d2aSXin Li <c>2</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
3786*a58d3d2aSXin Li <c>3</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
3787*a58d3d2aSXin Li <c>4</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1</spanx></c>
3788*a58d3d2aSXin Li <c>5</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
3789*a58d3d2aSXin Li <c>6</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1</spanx></c>
3790*a58d3d2aSXin Li <c>7</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-1</spanx></c>
3791*a58d3d2aSXin Li <c>8</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;2</spanx></c>
3792*a58d3d2aSXin Li <c>9</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-1</spanx></c>
3793*a58d3d2aSXin Li<c>10</c> <c><spanx style="vbare">-2&nbsp;-1&nbsp;&nbsp;1&nbsp;&nbsp;2</spanx></c>
3794*a58d3d2aSXin Li<c>11</c> <c><spanx style="vbare">&nbsp;2&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;-1</spanx></c>
3795*a58d3d2aSXin Li<c>12</c> <c><spanx style="vbare">-2&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;2</spanx></c>
3796*a58d3d2aSXin Li<c>13</c> <c><spanx style="vbare">-2&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;3</spanx></c>
3797*a58d3d2aSXin Li<c>14</c> <c><spanx style="vbare">&nbsp;2&nbsp;&nbsp;1&nbsp;-1&nbsp;-2</spanx></c>
3798*a58d3d2aSXin Li<c>15</c> <c><spanx style="vbare">-3&nbsp;-1&nbsp;&nbsp;1&nbsp;&nbsp;3</spanx></c>
3799*a58d3d2aSXin Li<c>16</c> <c><spanx style="vbare">&nbsp;2&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-2</spanx></c>
3800*a58d3d2aSXin Li<c>17</c> <c><spanx style="vbare">&nbsp;3&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;-2</spanx></c>
3801*a58d3d2aSXin Li<c>18</c> <c><spanx style="vbare">-3&nbsp;-1&nbsp;&nbsp;2&nbsp;&nbsp;4</spanx></c>
3802*a58d3d2aSXin Li<c>19</c> <c><spanx style="vbare">-4&nbsp;-1&nbsp;&nbsp;1&nbsp;&nbsp;4</spanx></c>
3803*a58d3d2aSXin Li<c>20</c> <c><spanx style="vbare">&nbsp;3&nbsp;&nbsp;1&nbsp;-1&nbsp;-3</spanx></c>
3804*a58d3d2aSXin Li<c>21</c> <c><spanx style="vbare">-4&nbsp;-1&nbsp;&nbsp;2&nbsp;&nbsp;5</spanx></c>
3805*a58d3d2aSXin Li<c>22</c> <c><spanx style="vbare">&nbsp;4&nbsp;&nbsp;2&nbsp;-1&nbsp;-3</spanx></c>
3806*a58d3d2aSXin Li<c>23</c> <c><spanx style="vbare">&nbsp;4&nbsp;&nbsp;1&nbsp;-1&nbsp;-4</spanx></c>
3807*a58d3d2aSXin Li<c>24</c> <c><spanx style="vbare">-5&nbsp;-1&nbsp;&nbsp;2&nbsp;&nbsp;6</spanx></c>
3808*a58d3d2aSXin Li<c>25</c> <c><spanx style="vbare">&nbsp;5&nbsp;&nbsp;2&nbsp;-1&nbsp;-4</spanx></c>
3809*a58d3d2aSXin Li<c>26</c> <c><spanx style="vbare">-6&nbsp;-2&nbsp;&nbsp;2&nbsp;&nbsp;6</spanx></c>
3810*a58d3d2aSXin Li<c>27</c> <c><spanx style="vbare">-5&nbsp;-2&nbsp;&nbsp;2&nbsp;&nbsp;5</spanx></c>
3811*a58d3d2aSXin Li<c>28</c> <c><spanx style="vbare">&nbsp;6&nbsp;&nbsp;2&nbsp;-1&nbsp;-5</spanx></c>
3812*a58d3d2aSXin Li<c>29</c> <c><spanx style="vbare">-7&nbsp;-2&nbsp;&nbsp;3&nbsp;&nbsp;8</spanx></c>
3813*a58d3d2aSXin Li<c>30</c> <c><spanx style="vbare">&nbsp;6&nbsp;&nbsp;2&nbsp;-2&nbsp;-6</spanx></c>
3814*a58d3d2aSXin Li<c>31</c> <c><spanx style="vbare">&nbsp;5&nbsp;&nbsp;2&nbsp;-2&nbsp;-5</spanx></c>
3815*a58d3d2aSXin Li<c>32</c> <c><spanx style="vbare">&nbsp;8&nbsp;&nbsp;3&nbsp;-2&nbsp;-7</spanx></c>
3816*a58d3d2aSXin Li<c>33</c> <c><spanx style="vbare">-9&nbsp;-3&nbsp;&nbsp;3&nbsp;&nbsp;9</spanx></c>
3817*a58d3d2aSXin Li</texttable>
3818*a58d3d2aSXin Li
3819*a58d3d2aSXin Li<t>
3820*a58d3d2aSXin LiThe final pitch lag for each subframe is assembled in silk_decode_pitch()
3821*a58d3d2aSXin Li (decode_pitch.c).
Let lag be the primary pitch lag for the current SILK frame, contour_index be
 the index into the VQ codebook, and lag_cb[contour_index][k] be the
 corresponding entry of the codebook from the appropriate table given above
 for the k'th subframe.
3826*a58d3d2aSXin LiThen the final pitch lag for that subframe is
3827*a58d3d2aSXin Li<figure align="center">
3828*a58d3d2aSXin Li<artwork align="center"><![CDATA[
3829*a58d3d2aSXin Lipitch_lags[k] = clamp(lag_min, lag + lag_cb[contour_index][k],
3830*a58d3d2aSXin Li                      lag_max)
3831*a58d3d2aSXin Li]]></artwork>
3832*a58d3d2aSXin Li</figure>
3833*a58d3d2aSXin Li where lag_min and lag_max are the values from the "Minimum Lag" and
3834*a58d3d2aSXin Li "Maximum Lag" columns of <xref target="silk_abs_pitch_low_pdf"/>,
3835*a58d3d2aSXin Li respectively.
3836*a58d3d2aSXin Li</t>
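<t>
A sketch of this final step is given below.
It is not the code of silk_decode_pitch(); the function name subframe_lags()
 and its parameter list are hypothetical, and the caller is assumed to pass the
 codebook row selected by contour_index.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* offsets points at one codebook row; nb_subfr is 2 for 10 ms
 * frames and 4 for 20 ms frames. */
void subframe_lags(int lag, const int *offsets, int nb_subfr,
                   int lag_min, int lag_max, int *pitch_lags)
{
    int k;

    assert(nb_subfr == 2 || nb_subfr == 4);
    assert(lag_min <= lag_max);
    for (k = 0; k < nb_subfr; k++) {
        int v = lag + offsets[k];
        if (v < lag_min)
            v = lag_min;
        if (v > lag_max)
            v = lag_max;
        pitch_lags[k] = v;
    }
}
```
]]></artwork>
</figure>
For NB 20 ms frames, a primary lag of 143 combined with the offsets
 {2, 1, 0, -1} gives subframe lags of {144, 144, 143, 142}.
</t>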
3837*a58d3d2aSXin Li
3838*a58d3d2aSXin Li</section>
3839*a58d3d2aSXin Li
3840*a58d3d2aSXin Li<section anchor="silk_ltp_filter" title="LTP Filter Coefficients">
3841*a58d3d2aSXin Li<t>
3842*a58d3d2aSXin LiSILK uses a separate 5-tap pitch filter for each subframe, selected from one
3843*a58d3d2aSXin Li of three codebooks.
3844*a58d3d2aSXin LiThe three codebooks each represent different rate-distortion trade-offs, with
3845*a58d3d2aSXin Li average rates of 1.61&nbsp;bits/subframe, 3.68&nbsp;bits/subframe, and
3846*a58d3d2aSXin Li 4.85&nbsp;bits/subframe, respectively.
3847*a58d3d2aSXin Li</t>
3848*a58d3d2aSXin Li
3849*a58d3d2aSXin Li<t>
The importance of the filter coefficients generally depends on two factors: the
 periodicity of the signal and the relative energy between the current subframe
 and the signal from one period earlier.
Greater periodicity and decaying energy both make the filter coefficients more
 important, so they should be coded with lower distortion at a higher rate.
3855*a58d3d2aSXin LiThese properties are relatively stable over the duration of a single SILK
3856*a58d3d2aSXin Li frame, hence all of the subframes in a SILK frame choose their filter from the
3857*a58d3d2aSXin Li same codebook.
The codebook choice is signaled with an explicitly coded "periodicity index",
 which immediately follows the subframe pitch lags and is coded using the
 3-entry PDF from <xref target="silk_perindex_pdf"/>.
3861*a58d3d2aSXin Li</t>
3862*a58d3d2aSXin Li
3863*a58d3d2aSXin Li<texttable anchor="silk_perindex_pdf" title="Periodicity Index PDF">
3864*a58d3d2aSXin Li<ttcol>PDF</ttcol>
3865*a58d3d2aSXin Li<c>{77, 80, 99}/256</c>
3866*a58d3d2aSXin Li</texttable>
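<t>
Table-driven range decoders typically consume such a PDF as a cumulative
 table.
The following sketch converts a PDF with a total of 256 into an inverse CDF,
 where each entry is 256 minus the running sum.
The function name pdf_to_icdf() is hypothetical, and the assumption that an
 implementation stores its tables in this form is made for illustration only.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Converts an n-entry PDF whose entries sum to 256 into an
 * inverse CDF: icdf[i] = 256 - (pdf[0] + ... + pdf[i]).
 * The last entry is therefore always 0. */
void pdf_to_icdf(const int *pdf, int n, unsigned char *icdf)
{
    int acc = 0;
    int i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        assert(pdf[i] > 0);
        acc += pdf[i];
        assert(acc <= 256);
        icdf[i] = (unsigned char)(256 - acc);
    }
    assert(acc == 256);
}
```
]]></artwork>
</figure>
Applied to the periodicity index PDF {77, 80, 99}, this gives {179, 99, 0}.
</t>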
3867*a58d3d2aSXin Li
3868*a58d3d2aSXin Li<t>
3869*a58d3d2aSXin LiThe indices of the filters for each subframe follow.
3870*a58d3d2aSXin LiThey are all coded using the PDF from <xref target="silk_ltp_filter_pdfs"/>
3871*a58d3d2aSXin Li corresponding to the periodicity index.
3872*a58d3d2aSXin LiTables&nbsp;<xref format="counter" target="silk_ltp_filter_coeffs0"/>
3873*a58d3d2aSXin Li through&nbsp;<xref format="counter" target="silk_ltp_filter_coeffs2"/>
3874*a58d3d2aSXin Li contain the corresponding filter taps as signed Q7 integers.
3875*a58d3d2aSXin Li</t>
3876*a58d3d2aSXin Li
3877*a58d3d2aSXin Li<texttable anchor="silk_ltp_filter_pdfs" title="LTP Filter PDFs">
3878*a58d3d2aSXin Li<ttcol>Periodicity Index</ttcol>
3879*a58d3d2aSXin Li<ttcol align="right">Codebook Size</ttcol>
3880*a58d3d2aSXin Li<ttcol>PDF</ttcol>
3881*a58d3d2aSXin Li<c>0</c>  <c>8</c> <c>{185, 15, 13, 13, 9, 9, 6, 6}/256</c>
3882*a58d3d2aSXin Li<c>1</c> <c>16</c> <c>{57, 34, 21, 20, 15, 13, 12, 13,
3883*a58d3d2aSXin Li                       10, 10,  9, 10,  9,  8,  7,  8}/256</c>
3884*a58d3d2aSXin Li<c>2</c> <c>32</c> <c>{15, 16, 14, 12, 12, 12, 11, 11,
3885*a58d3d2aSXin Li                       11, 10,  9,  9,  9,  9,  8,  8,
3886*a58d3d2aSXin Li                        8,  8,  7,  7,  6,  6,  5,  4,
3887*a58d3d2aSXin Li                        5,  4,  4,  4,  3,  4,  3,  2}/256</c>
3888*a58d3d2aSXin Li</texttable>
3889*a58d3d2aSXin Li
3890*a58d3d2aSXin Li<texttable anchor="silk_ltp_filter_coeffs0"
3891*a58d3d2aSXin Li title="Codebook Vectors for LTP Filter, Periodicity Index 0">
3892*a58d3d2aSXin Li<ttcol>Index</ttcol>
3893*a58d3d2aSXin Li<ttcol align="right">Filter Taps (Q7)</ttcol>
3894*a58d3d2aSXin Li <c>0</c>
3895*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;24&nbsp;&nbsp;&nbsp;7&nbsp;&nbsp;&nbsp;5</spanx></c>
3896*a58d3d2aSXin Li <c>1</c>
3897*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;0&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;&nbsp;0</spanx></c>
3898*a58d3d2aSXin Li <c>2</c>
3899*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;12&nbsp;&nbsp;28&nbsp;&nbsp;41&nbsp;&nbsp;13&nbsp;&nbsp;-4</spanx></c>
3900*a58d3d2aSXin Li <c>3</c>
3901*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-9&nbsp;&nbsp;15&nbsp;&nbsp;42&nbsp;&nbsp;25&nbsp;&nbsp;14</spanx></c>
3902*a58d3d2aSXin Li <c>4</c>
3903*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;1&nbsp;&nbsp;-2&nbsp;&nbsp;62&nbsp;&nbsp;41&nbsp;&nbsp;-9</spanx></c>
3904*a58d3d2aSXin Li <c>5</c>
3905*a58d3d2aSXin Li<c><spanx style="vbare">-10&nbsp;&nbsp;37&nbsp;&nbsp;65&nbsp;&nbsp;-4&nbsp;&nbsp;&nbsp;3</spanx></c>
3906*a58d3d2aSXin Li <c>6</c>
3907*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-6&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;66&nbsp;&nbsp;&nbsp;7&nbsp;&nbsp;-8</spanx></c>
3908*a58d3d2aSXin Li <c>7</c>
3909*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;16&nbsp;&nbsp;14&nbsp;&nbsp;38&nbsp;&nbsp;-3&nbsp;&nbsp;33</spanx></c>
3910*a58d3d2aSXin Li</texttable>
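<t>
As an illustration of the Q7 format, the sketch below stores the codebook for
 periodicity index 0 and converts a tap to a real value by dividing by 128.
The array and function names are hypothetical; the tap values are copied from
 the table above.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Codebook for periodicity index 0, signed Q7 taps. */
static const signed char LTP_TAPS0_Q7[8][5] = {
    {   4,   6,  24,   7,   5 },
    {   0,   0,   2,   0,   0 },
    {  12,  28,  41,  13,  -4 },
    {  -9,  15,  42,  25,  14 },
    {   1,  -2,  62,  41,  -9 },
    { -10,  37,  65,  -4,   3 },
    {  -6,   4,  66,   7,  -8 },
    {  16,  14,  38,  -3,  33 }
};

/* Returns tap j of filter idx as a real value.  Q7 means that
 * the stored integer carries 7 fractional bits. */
double ltp_tap0(int idx, int j)
{
    assert(idx >= 0 && idx < 8);
    assert(j >= 0 && j < 5);
    return LTP_TAPS0_Q7[idx][j] / 128.0;
}
```
]]></artwork>
</figure>
For example, the center tap of filter 0 is 24, which corresponds to 0.1875.
</t>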
3911*a58d3d2aSXin Li
3912*a58d3d2aSXin Li<texttable anchor="silk_ltp_filter_coeffs1"
3913*a58d3d2aSXin Li title="Codebook Vectors for LTP Filter, Periodicity Index 1">
3914*a58d3d2aSXin Li<ttcol>Index</ttcol>
3915*a58d3d2aSXin Li<ttcol align="right">Filter Taps (Q7)</ttcol>
3916*a58d3d2aSXin Li
3917*a58d3d2aSXin Li <c>0</c>
3918*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;13&nbsp;&nbsp;22&nbsp;&nbsp;39&nbsp;&nbsp;23&nbsp;&nbsp;12</spanx></c>
3919*a58d3d2aSXin Li <c>1</c>
3920*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-1&nbsp;&nbsp;36&nbsp;&nbsp;64&nbsp;&nbsp;27&nbsp;&nbsp;-6</spanx></c>
3921*a58d3d2aSXin Li <c>2</c>
3922*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-7&nbsp;&nbsp;10&nbsp;&nbsp;55&nbsp;&nbsp;43&nbsp;&nbsp;17</spanx></c>
3923*a58d3d2aSXin Li <c>3</c>
3924*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;8&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;1</spanx></c>
3925*a58d3d2aSXin Li <c>4</c>
3926*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;6&nbsp;-11&nbsp;&nbsp;74&nbsp;&nbsp;53&nbsp;&nbsp;-9</spanx></c>
3927*a58d3d2aSXin Li <c>5</c>
3928*a58d3d2aSXin Li<c><spanx style="vbare">-12&nbsp;&nbsp;55&nbsp;&nbsp;76&nbsp;-12&nbsp;&nbsp;&nbsp;8</spanx></c>
3929*a58d3d2aSXin Li <c>6</c>
3930*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-3&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;93&nbsp;&nbsp;27&nbsp;&nbsp;-4</spanx></c>
3931*a58d3d2aSXin Li <c>7</c>
3932*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;26&nbsp;&nbsp;39&nbsp;&nbsp;59&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;-8</spanx></c>
3933*a58d3d2aSXin Li <c>8</c>
3934*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;77&nbsp;&nbsp;11&nbsp;&nbsp;&nbsp;9</spanx></c>
3935*a58d3d2aSXin Li <c>9</c>
3936*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-8&nbsp;&nbsp;22&nbsp;&nbsp;44&nbsp;&nbsp;-6&nbsp;&nbsp;&nbsp;7</spanx></c>
3937*a58d3d2aSXin Li<c>10</c>
3938*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;40&nbsp;&nbsp;&nbsp;9&nbsp;&nbsp;26&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;9</spanx></c>
3939*a58d3d2aSXin Li<c>11</c>
3940*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-7&nbsp;&nbsp;20&nbsp;101&nbsp;&nbsp;-7&nbsp;&nbsp;&nbsp;4</spanx></c>
3941*a58d3d2aSXin Li<c>12</c>
3942*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;-8&nbsp;&nbsp;42&nbsp;&nbsp;26&nbsp;&nbsp;&nbsp;0</spanx></c>
3943*a58d3d2aSXin Li<c>13</c>
3944*a58d3d2aSXin Li<c><spanx style="vbare">-15&nbsp;&nbsp;33&nbsp;&nbsp;68&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;23</spanx></c>
3945*a58d3d2aSXin Li<c>14</c>
3946*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-2&nbsp;&nbsp;55&nbsp;&nbsp;46&nbsp;&nbsp;-2&nbsp;&nbsp;15</spanx></c>
3947*a58d3d2aSXin Li<c>15</c>
3948*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;-1&nbsp;&nbsp;21&nbsp;&nbsp;16&nbsp;&nbsp;41</spanx></c>
3949*a58d3d2aSXin Li</texttable>
3950*a58d3d2aSXin Li
3951*a58d3d2aSXin Li<texttable anchor="silk_ltp_filter_coeffs2"
3952*a58d3d2aSXin Li title="Codebook Vectors for LTP Filter, Periodicity Index 2">
3953*a58d3d2aSXin Li<ttcol>Index</ttcol>
3954*a58d3d2aSXin Li<ttcol align="right">Filter Taps (Q7)</ttcol>
3955*a58d3d2aSXin Li <c>0</c>
3956*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-6&nbsp;&nbsp;27&nbsp;&nbsp;61&nbsp;&nbsp;39&nbsp;&nbsp;&nbsp;5</spanx></c>
3957*a58d3d2aSXin Li <c>1</c>
3958*a58d3d2aSXin Li<c><spanx style="vbare">-11&nbsp;&nbsp;42&nbsp;&nbsp;88&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;1</spanx></c>
3959*a58d3d2aSXin Li <c>2</c>
3960*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-2&nbsp;&nbsp;60&nbsp;&nbsp;65&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;-4</spanx></c>
3961*a58d3d2aSXin Li <c>3</c>
3962*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-1&nbsp;&nbsp;-5&nbsp;&nbsp;73&nbsp;&nbsp;56&nbsp;&nbsp;&nbsp;1</spanx></c>
3963*a58d3d2aSXin Li <c>4</c>
3964*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-9&nbsp;&nbsp;19&nbsp;&nbsp;94&nbsp;&nbsp;29&nbsp;&nbsp;-9</spanx></c>
3965*a58d3d2aSXin Li <c>5</c>
3966*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;0&nbsp;&nbsp;12&nbsp;&nbsp;99&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;&nbsp;4</spanx></c>
3967*a58d3d2aSXin Li <c>6</c>
3968*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;8&nbsp;-19&nbsp;102&nbsp;&nbsp;46&nbsp;-13</spanx></c>
3969*a58d3d2aSXin Li <c>7</c>
3970*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;13&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;2</spanx></c>
3971*a58d3d2aSXin Li <c>8</c>
3972*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;9&nbsp;-21&nbsp;&nbsp;84&nbsp;&nbsp;72&nbsp;-18</spanx></c>
3973*a58d3d2aSXin Li <c>9</c>
3974*a58d3d2aSXin Li<c><spanx style="vbare">-11&nbsp;&nbsp;46&nbsp;104&nbsp;-22&nbsp;&nbsp;&nbsp;8</spanx></c>
3975*a58d3d2aSXin Li<c>10</c>
3976*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;18&nbsp;&nbsp;38&nbsp;&nbsp;48&nbsp;&nbsp;23&nbsp;&nbsp;&nbsp;0</spanx></c>
3977*a58d3d2aSXin Li<c>11</c>
3978*a58d3d2aSXin Li<c><spanx style="vbare">-16&nbsp;&nbsp;70&nbsp;&nbsp;83&nbsp;-21&nbsp;&nbsp;11</spanx></c>
3979*a58d3d2aSXin Li<c>12</c>
3980*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;5&nbsp;-11&nbsp;117&nbsp;&nbsp;22&nbsp;&nbsp;-8</spanx></c>
3981*a58d3d2aSXin Li<c>13</c>
3982*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-6&nbsp;&nbsp;23&nbsp;117&nbsp;-12&nbsp;&nbsp;&nbsp;3</spanx></c>
3983*a58d3d2aSXin Li<c>14</c>
3984*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;-8&nbsp;&nbsp;95&nbsp;&nbsp;28&nbsp;&nbsp;&nbsp;4</spanx></c>
3985*a58d3d2aSXin Li<c>15</c>
3986*a58d3d2aSXin Li<c><spanx style="vbare">-10&nbsp;&nbsp;15&nbsp;&nbsp;77&nbsp;&nbsp;60&nbsp;-15</spanx></c>
3987*a58d3d2aSXin Li<c>16</c>
3988*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-1&nbsp;&nbsp;&nbsp;4&nbsp;124&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;-4</spanx></c>
3989*a58d3d2aSXin Li<c>17</c>
3990*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;38&nbsp;&nbsp;84&nbsp;&nbsp;24&nbsp;-25</spanx></c>
3991*a58d3d2aSXin Li<c>18</c>
3992*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;2&nbsp;&nbsp;13&nbsp;&nbsp;42&nbsp;&nbsp;13&nbsp;&nbsp;31</spanx></c>
3993*a58d3d2aSXin Li<c>19</c>
3994*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;21&nbsp;&nbsp;-4&nbsp;&nbsp;56&nbsp;&nbsp;46&nbsp;&nbsp;-1</spanx></c>
3995*a58d3d2aSXin Li<c>20</c>
3996*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-1&nbsp;&nbsp;35&nbsp;&nbsp;79&nbsp;-13&nbsp;&nbsp;19</spanx></c>
3997*a58d3d2aSXin Li<c>21</c>
3998*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-7&nbsp;&nbsp;65&nbsp;&nbsp;88&nbsp;&nbsp;-9&nbsp;-14</spanx></c>
3999*a58d3d2aSXin Li<c>22</c>
4000*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;20&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;81&nbsp;&nbsp;49&nbsp;-29</spanx></c>
4001*a58d3d2aSXin Li<c>23</c>
4002*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;20&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;75&nbsp;&nbsp;&nbsp;3&nbsp;-17</spanx></c>
4003*a58d3d2aSXin Li<c>24</c>
4004*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;5&nbsp;&nbsp;-9&nbsp;&nbsp;44&nbsp;&nbsp;92&nbsp;&nbsp;-8</spanx></c>
4005*a58d3d2aSXin Li<c>25</c>
4006*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;1&nbsp;&nbsp;-3&nbsp;&nbsp;22&nbsp;&nbsp;69&nbsp;&nbsp;31</spanx></c>
4007*a58d3d2aSXin Li<c>26</c>
4008*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-6&nbsp;&nbsp;95&nbsp;&nbsp;41&nbsp;-12&nbsp;&nbsp;&nbsp;5</spanx></c>
4009*a58d3d2aSXin Li<c>27</c>
4010*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;39&nbsp;&nbsp;67&nbsp;&nbsp;16&nbsp;&nbsp;-4&nbsp;&nbsp;&nbsp;1</spanx></c>
4011*a58d3d2aSXin Li<c>28</c>
4012*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;0&nbsp;&nbsp;-6&nbsp;120&nbsp;&nbsp;55&nbsp;-36</spanx></c>
4013*a58d3d2aSXin Li<c>29</c>
4014*a58d3d2aSXin Li<c><spanx style="vbare">-13&nbsp;&nbsp;44&nbsp;122&nbsp;&nbsp;&nbsp;4&nbsp;-24</spanx></c>
4015*a58d3d2aSXin Li<c>30</c>
4016*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;81&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;11&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;7</spanx></c>
4017*a58d3d2aSXin Li<c>31</c>
4018*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;&nbsp;9&nbsp;&nbsp;10&nbsp;&nbsp;88</spanx></c>
4019*a58d3d2aSXin Li</texttable>
4020*a58d3d2aSXin Li
4021*a58d3d2aSXin Li</section>
4022*a58d3d2aSXin Li
4023*a58d3d2aSXin Li<section anchor="silk_ltp_scaling" title="LTP Scaling Parameter">
<t>
An LTP scaling parameter appears after the LTP filter coefficients if and only
 if
<list style="symbols">
<t>This is a voiced frame (see <xref target="silk_frame_type"/>), and</t>
<t>Either
<list style="symbols">
<t>
This SILK frame corresponds to the first time interval of the
 current Opus frame for its type (LBRR or regular), or
</t>
<t>
This is an LBRR frame where the LBRR flags (see
 <xref target="silk_lbrr_flags"/>) indicate the previous LBRR frame in the same
 channel is not coded.
</t>
</list>
</t>
</list>
This allows the encoder to trade off the prediction gain between
 packets against the recovery time after packet loss.
Unlike absolute coding for pitch lags, regular SILK frames that are not at the
 start of an Opus frame (i.e., that do not correspond to the first 20&nbsp;ms
 time interval in Opus frames of 40&nbsp;or 60&nbsp;ms) do not include this
 field, even if the prior frame was not voiced, or (in the case of the side
 channel) not even coded.
After an uncoded frame in the side channel, the LTP buffer (see
 <xref target="silk_ltp_synthesis"/>) is cleared to zero, and is thus in a
 known state.
In contrast, LBRR frames do include this field when the prior frame was not
 coded, since the LTP buffer contains the output of the PLC, which is
 non-normative.
</t>
<t>
If present, the decoder reads a value using the 3-entry PDF in
 <xref target="silk_ltp_scaling_pdf"/>.
The three possible values represent Q14 scale factors of 15565, 12288, and
 8192, respectively (corresponding to approximately 0.95, 0.75, and 0.5).
Frames that do not code the scaling parameter use the default factor of 15565
 (approximately 0.95).
</t>

<texttable anchor="silk_ltp_scaling_pdf"
 title="PDF for LTP Scaling Parameter">
<ttcol align="left">PDF</ttcol>
<c>{128, 64, 64}/256</c>
</texttable>
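<t>
As a rough illustration of the rules in this section, the following C sketch
 expresses the presence condition and the mapping from the decoded index to a
 Q14 scale factor.
The names ltp_scaling_present(), ltp_scale_q14(), and LTP_SCALES_Q14 are
 hypothetical and are not taken from the reference implementation.
The sketch assumes the caller has already determined the frame type, the
 position of the SILK frame within the Opus frame, and the relevant LBRR flag.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Q14 scale factors listed in the text, indexed by the decoded symbol. */
static const int16_t LTP_SCALES_Q14[3] = { 15565, 12288, 8192 };

/* Nonzero if the bitstream codes an LTP scaling parameter for this SILK
 * frame. All four flags are assumed to be supplied by the caller. */
static int ltp_scaling_present(int voiced, int first_interval,
                               int is_lbrr, int prev_lbrr_coded)
{
    return voiced && (first_interval || (is_lbrr && !prev_lbrr_coded));
}

/* Q14 scale factor for a frame. `index` is ignored when the parameter
 * is not coded, in which case the default factor of 15565 applies. */
static int16_t ltp_scale_q14(int present, int index)
{
    if (!present) {
        return LTP_SCALES_Q14[0];
    }
    assert(index >= 0 && index < 3);
    return LTP_SCALES_Q14[index];
}
```
]]></artwork>
</figure>
<t>
Dividing a Q14 value by 16384 gives the corresponding real-valued factor,
 e.g., 12288/16384 = 0.75.
</t>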
4071*a58d3d2aSXin Li
4072*a58d3d2aSXin Li</section>
4073*a58d3d2aSXin Li
4074*a58d3d2aSXin Li</section>
4075*a58d3d2aSXin Li
<section anchor="silk_seed" toc="include"
 title="Linear Congruential Generator (LCG) Seed">
<t>
As described in <xref target="silk_excitation_reconstruction"/>, SILK uses a
 linear congruential generator (LCG) to inject pseudorandom noise into the
 quantized excitation.
To ensure synchronization of this process between the encoder and decoder, each
 SILK frame stores a 2-bit seed after the LTP parameters (if any).
The encoder may consider the choice of seed during quantization, and the
 flexibility of this choice lets it reduce distortion, helping to pay for the
 bit cost required to signal it.
The decoder reads the seed using the uniform 4-entry PDF in
 <xref target="silk_seed_pdf"/>, yielding a value between 0 and 3, inclusive.
</t>

<texttable anchor="silk_seed_pdf"
 title="PDF for LCG Seed">
<ttcol align="left">PDF</ttcol>
<c>{64, 64, 64, 64}/256</c>
</texttable>

</section>
4098*a58d3d2aSXin Li
4099*a58d3d2aSXin Li<section anchor="silk_excitation" toc="include" title="Excitation">
<t>
SILK codes the excitation using a modified version of the Pyramid Vector
 Quantization (PVQ) codebook <xref target="PVQ"/>.
The PVQ codebook is designed for Laplace-distributed values and consists of all
 sums of K signed, unit pulses in a vector of dimension N, where two pulses at
 the same position are required to have the same sign.
Thus the codebook includes all integer codevectors y of dimension N that
 satisfy
<figure align="center">
<artwork align="center"><![CDATA[
N-1
__
\  abs(y[j]) = K .
/_
j=0
]]></artwork>
</figure>
Unlike regular PVQ, SILK uses a variable-length, rather than fixed-length,
 encoding.
This encoding is better suited to the more Gaussian-like distribution of the
 coefficient magnitudes and the non-uniform distribution of their signs (caused
 by the quantization offset described below).
SILK also handles large codebooks by coding the least significant bits (LSBs)
 of each coefficient directly.
This adds a small coding efficiency loss, but greatly reduces the computation
 time and ROM size required for decoding, as implemented in
 silk_decode_pulses() (decode_pulses.c).
</t>
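<t>
The codebook membership condition above can be checked directly.
The following C sketch is illustrative only; pvq_contains() is a hypothetical
 helper, not a function of the reference implementation.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

/* Returns nonzero if the integer vector y of dimension n belongs to the
 * PVQ codebook with k pulses, i.e., if the absolute values of its
 * entries sum to exactly k. */
static int pvq_contains(const int *y, int n, int k)
{
    int sum = 0;
    assert(n > 0 && k >= 0);
    for (int j = 0; j < n; j++) {
        sum += abs(y[j]);
    }
    return sum == k;
}
```
]]></artwork>
</figure>
<t>
For example, with N&nbsp;=&nbsp;4 the vector {2, -1, 0, 1} belongs to the
 codebook for K&nbsp;=&nbsp;4, but not to the one for K&nbsp;=&nbsp;3.
</t>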
4128*a58d3d2aSXin Li
<t>
SILK fixes the dimension of the codebook to N&nbsp;=&nbsp;16.
The excitation is made up of a number of "shell blocks", each 16 samples in
 size.
<xref target="silk_shell_block_table"/> lists the number of shell blocks
 required for a SILK frame for each possible audio bandwidth and frame size.
10&nbsp;ms MB frames nominally contain 120&nbsp;samples (10&nbsp;ms at
 12&nbsp;kHz), which is not a multiple of 16.
This is handled by coding 8 shell blocks (128 samples) and discarding the final
 8 samples of the last block.
The decoder contains no special case that prevents an encoder from placing
 pulses in these samples, and they must be correctly parsed from the bitstream
 if present, but they are otherwise ignored.
</t>

<texttable anchor="silk_shell_block_table"
 title="Number of Shell Blocks Per SILK Frame">
<ttcol>Audio Bandwidth</ttcol>
<ttcol>Frame Size</ttcol>
<ttcol align="right">Number of Shell Blocks</ttcol>
<c>NB</c> <c>10&nbsp;ms</c>  <c>5</c>
<c>MB</c> <c>10&nbsp;ms</c>  <c>8</c>
<c>WB</c> <c>10&nbsp;ms</c> <c>10</c>
<c>NB</c> <c>20&nbsp;ms</c> <c>10</c>
<c>MB</c> <c>20&nbsp;ms</c> <c>15</c>
<c>WB</c> <c>20&nbsp;ms</c> <c>20</c>
</texttable>
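<t>
The entries of <xref target="silk_shell_block_table"/> follow from rounding
 the frame length up to a whole number of 16-sample blocks.
The C sketch below reproduces them, assuming internal sample rates of 8, 12,
 and 16&nbsp;kHz for NB, MB, and WB, respectively.
The function names are hypothetical.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

#define SHELL_BLOCK_SIZE 16

/* Number of shell blocks for a SILK frame. fs_khz is the assumed
 * internal sample rate in kHz (NB = 8, MB = 12, WB = 16). */
static int shell_block_count(int fs_khz, int frame_ms)
{
    int samples;
    assert(fs_khz == 8 || fs_khz == 12 || fs_khz == 16);
    assert(frame_ms == 10 || frame_ms == 20);
    samples = fs_khz * frame_ms;
    /* Round up to a whole number of blocks. */
    return (samples + SHELL_BLOCK_SIZE - 1) / SHELL_BLOCK_SIZE;
}

/* Number of samples at the end of the last block that are parsed
 * from the bitstream but otherwise ignored. */
static int discarded_samples(int fs_khz, int frame_ms)
{
    return shell_block_count(fs_khz, frame_ms) * SHELL_BLOCK_SIZE
           - fs_khz * frame_ms;
}
```
]]></artwork>
</figure>
<t>
Only the 10&nbsp;ms MB case yields a non-zero number of discarded samples
 (8); every other combination is an exact multiple of 16.
</t>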
4156*a58d3d2aSXin Li
4157*a58d3d2aSXin Li<section anchor="silk_rate_level" title="Rate Level">
<t>
The first symbol in the excitation is a "rate level", which is an index from 0
 to 8, inclusive, coded using the PDF in <xref target="silk_rate_level_pdfs"/>
 corresponding to the signal type of the current frame (from
 <xref target="silk_frame_type"/>).
The rate level selects the PDF used to decode the number of pulses in
 the individual shell blocks.
It does not directly convey any information about the bitrate or the number of
 pulses itself, but merely changes the probability of the symbols in
 <xref target="silk_pulse_counts"/>.
Level&nbsp;0 generally provides a more efficient encoding at low rates, and
 level&nbsp;8 generally provides a more efficient encoding at high rates,
 though the most efficient level for a particular SILK frame may depend on the
 exact distribution of the coded symbols.
An encoder should, but is not required to, use the most efficient rate level.
</t>

<texttable anchor="silk_rate_level_pdfs"
 title="PDFs for the Rate Level">
<ttcol>Signal Type</ttcol>
<ttcol>PDF</ttcol>
<c>Inactive or Unvoiced</c>
<c>{15, 51, 12, 46, 45, 13, 33, 27, 14}/256</c>
<c>Voiced</c>
<c>{33, 30, 36, 17, 34, 49, 18, 21, 18}/256</c>
</texttable>
4184*a58d3d2aSXin Li
4185*a58d3d2aSXin Li</section>
4186*a58d3d2aSXin Li
4187*a58d3d2aSXin Li<section anchor="silk_pulse_counts" title="Pulses Per Shell Block">
<t>
The total number of pulses in each of the shell blocks follows the rate level.
The pulse counts for all of the shell blocks are coded consecutively, before
 the content of any of the blocks.
Each block may have anywhere from 0 to 16 pulses, inclusive, coded using the
 18-entry PDF in <xref target="silk_pulse_count_pdfs"/> corresponding to the
 rate level from <xref target="silk_rate_level"/>.
The special value 17 indicates that this block has one or more additional
 LSBs to decode for each coefficient.
If the decoder encounters this value, it decodes another value for the actual
 pulse count of the block, but uses the PDF corresponding to the special rate
 level&nbsp;9 instead of the normal rate level.
This process repeats until the decoder reads a value less than 17, and it then
 sets the number of extra LSBs used to the number of 17's decoded for that
 block.
If it reads the value 17 ten times, then the next iteration uses the special
 rate level&nbsp;10 instead of 9.
The probability of decoding a 17 when using the PDF for rate level&nbsp;10 is
 zero, ensuring that the number of LSBs for a block will not exceed 10.
The cumulative distribution for rate level&nbsp;10 is just a shifted version of
 that for 9 and thus does not require any additional storage.
</t>

<texttable anchor="silk_pulse_count_pdfs"
 title="PDFs for the Pulse Count">
<ttcol>Rate Level</ttcol>
<ttcol>PDF</ttcol>
<c>0</c>
<c>{131, 74, 25, 8, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
<c>1</c>
<c>{58, 93, 60, 23, 7, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
<c>2</c>
<c>{43, 51, 46, 33, 24, 16, 11, 8, 6, 3, 3, 3, 2, 1, 1, 2, 1, 2}/256</c>
<c>3</c>
<c>{17, 52, 71, 57, 31, 12, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
<c>4</c>
<c>{6, 21, 41, 53, 49, 35, 21, 11, 6, 3, 2, 2, 1, 1, 1, 1, 1, 1}/256</c>
<c>5</c>
<c>{7, 14, 22, 28, 29, 28, 25, 20, 17, 13, 11, 9, 7, 5, 4, 4, 3, 10}/256</c>
<c>6</c>
<c>{2, 5, 14, 29, 42, 46, 41, 31, 19, 11, 6, 3, 2, 1, 1, 1, 1, 1}/256</c>
<c>7</c>
<c>{1, 2, 4, 10, 19, 29, 35, 37, 34, 28, 20, 14, 8, 5, 4, 2, 2, 2}/256</c>
<c>8</c>
<c>{1, 2, 2, 5, 9, 14, 20, 24, 27, 28, 26, 23, 20, 15, 11, 8, 6, 15}/256</c>
<c>9</c>
<c>{1, 1, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2}/256</c>
<c>10</c>
<c>{2, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2, 0}/256</c>
</texttable>
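<t>
The escape mechanism for the value 17 can be sketched in C as follows.
This is not the reference decoder: the symbol_source type and next_symbol()
 are hypothetical stand-ins for the range decoder that simply return
 pre-decoded symbols and record which rate level's PDF was requested.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#define ESCAPE_SYMBOL 17
#define MAX_LSBS      10

/* Hypothetical stand-in for the range decoder. */
typedef struct {
    const int *symbols;  /* pre-decoded symbol values, in order */
    size_t count;        /* number of entries in symbols */
    size_t pos;          /* next entry to return */
    int last_level;      /* rate level requested by the latest read */
} symbol_source;

static int next_symbol(symbol_source *src, int rate_level)
{
    assert(src->pos < src->count);
    src->last_level = rate_level;
    return src->symbols[src->pos++];
}

/* Reads the pulse count for one shell block and stores the number of
 * extra LSBs per coefficient in *lsb_count. */
static int decode_pulse_count(symbol_source *src, int rate_level,
                              int *lsb_count)
{
    int extra = 0;
    int value;
    assert(rate_level >= 0 && rate_level <= 8);
    value = next_symbol(src, rate_level);
    while (value == ESCAPE_SYMBOL) {
        /* Level 10 assigns zero probability to 17, so a conforming
         * stream cannot escape more than ten times. */
        assert(extra < MAX_LSBS);
        extra++;
        /* Levels 9 and 10 are the special levels used after a 17. */
        value = next_symbol(src, extra == MAX_LSBS ? 10 : 9);
    }
    *lsb_count = extra;
    return value;
}
```
]]></artwork>
</figure>
<t>
For instance, the symbol sequence 17, 17, 5 yields a block with 5 pulses and
 2 extra LSBs per coefficient, with the last two symbols read using the PDF
 for rate level&nbsp;9.
</t>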
4238*a58d3d2aSXin Li
4239*a58d3d2aSXin Li</section>
4240*a58d3d2aSXin Li
4241*a58d3d2aSXin Li<section anchor="silk_pulse_locations" title="Pulse Location Decoding">
<t>
The locations of the pulses in each shell block follow the pulse counts,
 as decoded by silk_shell_decoder() (shell_coder.c).
As with the pulse counts, these locations are coded for all the shell blocks
 before any of the remaining information for each block.
Unlike many other codecs, SILK places no restriction on the distribution of
 pulses within a shell block.
All of the pulses may be placed in a single location, or each one in a unique
 location, or anything in between.
</t>

<t>
The location of pulses is coded by recursively partitioning each block into
 halves, and coding how many pulses fall on the left side of the split.
All remaining pulses must fall on the right side of the split.
The process then recurses into the left half, and after that returns, the
 right half (preorder traversal).
The PDF to use is chosen by the size of the current partition (16, 8, 4, or 2)
 and the number of pulses in the partition (1 to 16, inclusive).
Tables&nbsp;<xref format="counter" target="silk_shell_code3_pdfs"/>
 through&nbsp;<xref format="counter" target="silk_shell_code0_pdfs"/> list the
 PDFs used for each partition size and pulse count.
This process skips partitions without any pulses, i.e., where the initial pulse
 count from <xref target="silk_pulse_counts"/> was zero, or where the split in
 the prior level indicated that all of the pulses fell on the other side.
These partitions have nothing to code, so they require no PDF.
</t>
4269*a58d3d2aSXin Li
4270*a58d3d2aSXin Li<texttable anchor="silk_shell_code3_pdfs"
4271*a58d3d2aSXin Li title="PDFs for Pulse Count Split, 16 Sample Partitions">
4272*a58d3d2aSXin Li<ttcol>Pulse Count</ttcol>
4273*a58d3d2aSXin Li<ttcol>PDF</ttcol>
4274*a58d3d2aSXin Li <c>1</c> <c>{126, 130}/256</c>
4275*a58d3d2aSXin Li <c>2</c> <c>{56, 142, 58}/256</c>
4276*a58d3d2aSXin Li <c>3</c> <c>{25, 101, 104, 26}/256</c>
4277*a58d3d2aSXin Li <c>4</c> <c>{12, 60, 108, 64, 12}/256</c>
4278*a58d3d2aSXin Li <c>5</c> <c>{7, 35, 84, 87, 37, 6}/256</c>
4279*a58d3d2aSXin Li <c>6</c> <c>{4, 20, 59, 86, 63, 21, 3}/256</c>
4280*a58d3d2aSXin Li <c>7</c> <c>{3, 12, 38, 72, 75, 42, 12, 2}/256</c>
4281*a58d3d2aSXin Li <c>8</c> <c>{2, 8, 25, 54, 73, 59, 27, 7, 1}/256</c>
4282*a58d3d2aSXin Li <c>9</c> <c>{2, 5, 17, 39, 63, 65, 42, 18, 4, 1}/256</c>
4283*a58d3d2aSXin Li<c>10</c> <c>{1, 4, 12, 28, 49, 63, 54, 30, 11, 3, 1}/256</c>
4284*a58d3d2aSXin Li<c>11</c> <c>{1, 4, 8, 20, 37, 55, 57, 41, 22, 8, 2, 1}/256</c>
4285*a58d3d2aSXin Li<c>12</c> <c>{1, 3, 7, 15, 28, 44, 53, 48, 33, 16, 6, 1, 1}/256</c>
4286*a58d3d2aSXin Li<c>13</c> <c>{1, 2, 6, 12, 21, 35, 47, 48, 40, 25, 12, 5, 1, 1}/256</c>
4287*a58d3d2aSXin Li<c>14</c> <c>{1, 1, 4, 10, 17, 27, 37, 47, 43, 33, 21, 9, 4, 1, 1}/256</c>
4288*a58d3d2aSXin Li<c>15</c> <c>{1, 1, 1, 8, 14, 22, 33, 40, 43, 38, 28, 16, 8, 1, 1, 1}/256</c>
4289*a58d3d2aSXin Li<c>16</c> <c>{1, 1, 1, 1, 13, 18, 27, 36, 41, 41, 34, 24, 14, 1, 1, 1, 1}/256</c>
4290*a58d3d2aSXin Li</texttable>
4291*a58d3d2aSXin Li
4292*a58d3d2aSXin Li<texttable anchor="silk_shell_code2_pdfs"
4293*a58d3d2aSXin Li title="PDFs for Pulse Count Split, 8 Sample Partitions">
4294*a58d3d2aSXin Li<ttcol>Pulse Count</ttcol>
4295*a58d3d2aSXin Li<ttcol>PDF</ttcol>
4296*a58d3d2aSXin Li <c>1</c> <c>{127, 129}/256</c>
4297*a58d3d2aSXin Li <c>2</c> <c>{53, 149, 54}/256</c>
4298*a58d3d2aSXin Li <c>3</c> <c>{22, 105, 106, 23}/256</c>
4299*a58d3d2aSXin Li <c>4</c> <c>{11, 61, 111, 63, 10}/256</c>
4300*a58d3d2aSXin Li <c>5</c> <c>{6, 35, 86, 88, 36, 5}/256</c>
4301*a58d3d2aSXin Li <c>6</c> <c>{4, 20, 59, 87, 62, 21, 3}/256</c>
4302*a58d3d2aSXin Li <c>7</c> <c>{3, 13, 40, 71, 73, 41, 13, 2}/256</c>
4303*a58d3d2aSXin Li <c>8</c> <c>{3, 9, 27, 53, 70, 56, 28, 9, 1}/256</c>
4304*a58d3d2aSXin Li <c>9</c> <c>{3, 8, 19, 37, 57, 61, 44, 20, 6, 1}/256</c>
4305*a58d3d2aSXin Li<c>10</c> <c>{3, 7, 15, 28, 44, 54, 49, 33, 17, 5, 1}/256</c>
4306*a58d3d2aSXin Li<c>11</c> <c>{1, 7, 13, 22, 34, 46, 48, 38, 28, 14, 4, 1}/256</c>
4307*a58d3d2aSXin Li<c>12</c> <c>{1, 1, 11, 22, 27, 35, 42, 47, 33, 25, 10, 1, 1}/256</c>
4308*a58d3d2aSXin Li<c>13</c> <c>{1, 1, 6, 14, 26, 37, 43, 43, 37, 26, 14, 6, 1, 1}/256</c>
4309*a58d3d2aSXin Li<c>14</c> <c>{1, 1, 4, 10, 20, 31, 40, 42, 40, 31, 20, 10, 4, 1, 1}/256</c>
4310*a58d3d2aSXin Li<c>15</c> <c>{1, 1, 3, 8, 16, 26, 35, 38, 38, 35, 26, 16, 8, 3, 1, 1}/256</c>
4311*a58d3d2aSXin Li<c>16</c> <c>{1, 1, 2, 6, 12, 21, 30, 36, 38, 36, 30, 21, 12, 6, 2, 1, 1}/256</c>
4312*a58d3d2aSXin Li</texttable>
4313*a58d3d2aSXin Li
4314*a58d3d2aSXin Li<texttable anchor="silk_shell_code1_pdfs"
4315*a58d3d2aSXin Li title="PDFs for Pulse Count Split, 4 Sample Partitions">
4316*a58d3d2aSXin Li<ttcol>Pulse Count</ttcol>
4317*a58d3d2aSXin Li<ttcol>PDF</ttcol>
4318*a58d3d2aSXin Li <c>1</c> <c>{127, 129}/256</c>
4319*a58d3d2aSXin Li <c>2</c> <c>{49, 157, 50}/256</c>
4320*a58d3d2aSXin Li <c>3</c> <c>{20, 107, 109, 20}/256</c>
4321*a58d3d2aSXin Li <c>4</c> <c>{11, 60, 113, 62, 10}/256</c>
4322*a58d3d2aSXin Li <c>5</c> <c>{7, 36, 84, 87, 36, 6}/256</c>
4323*a58d3d2aSXin Li <c>6</c> <c>{6, 24, 57, 82, 60, 23, 4}/256</c>
4324*a58d3d2aSXin Li <c>7</c> <c>{5, 18, 39, 64, 68, 42, 16, 4}/256</c>
4325*a58d3d2aSXin Li <c>8</c> <c>{6, 14, 29, 47, 61, 52, 30, 14, 3}/256</c>
4326*a58d3d2aSXin Li <c>9</c> <c>{1, 15, 23, 35, 51, 50, 40, 30, 10, 1}/256</c>
4327*a58d3d2aSXin Li<c>10</c> <c>{1, 1, 21, 32, 42, 52, 46, 41, 18, 1, 1}/256</c>
4328*a58d3d2aSXin Li<c>11</c> <c>{1, 6, 16, 27, 36, 42, 42, 36, 27, 16, 6, 1}/256</c>
4329*a58d3d2aSXin Li<c>12</c> <c>{1, 5, 12, 21, 31, 38, 40, 38, 31, 21, 12, 5, 1}/256</c>
4330*a58d3d2aSXin Li<c>13</c> <c>{1, 3, 9, 17, 26, 34, 38, 38, 34, 26, 17, 9, 3, 1}/256</c>
4331*a58d3d2aSXin Li<c>14</c> <c>{1, 3, 7, 14, 22, 29, 34, 36, 34, 29, 22, 14, 7, 3, 1}/256</c>
4332*a58d3d2aSXin Li<c>15</c> <c>{1, 2, 5, 11, 18, 25, 31, 35, 35, 31, 25, 18, 11, 5, 2, 1}/256</c>
4333*a58d3d2aSXin Li<c>16</c> <c>{1, 1, 4, 9, 15, 21, 28, 32, 34, 32, 28, 21, 15, 9, 4, 1, 1}/256</c>
4334*a58d3d2aSXin Li</texttable>
4335*a58d3d2aSXin Li
4336*a58d3d2aSXin Li<texttable anchor="silk_shell_code0_pdfs"
4337*a58d3d2aSXin Li title="PDFs for Pulse Count Split, 2 Sample Partitions">
4338*a58d3d2aSXin Li<ttcol>Pulse Count</ttcol>
4339*a58d3d2aSXin Li<ttcol>PDF</ttcol>
4340*a58d3d2aSXin Li <c>1</c> <c>{128, 128}/256</c>
4341*a58d3d2aSXin Li <c>2</c> <c>{42, 172, 42}/256</c>
4342*a58d3d2aSXin Li <c>3</c> <c>{21, 107, 107, 21}/256</c>
4343*a58d3d2aSXin Li <c>4</c> <c>{12, 60, 112, 61, 11}/256</c>
4344*a58d3d2aSXin Li <c>5</c> <c>{8, 34, 86, 86, 35, 7}/256</c>
4345*a58d3d2aSXin Li <c>6</c> <c>{8, 23, 55, 90, 55, 20, 5}/256</c>
4346*a58d3d2aSXin Li <c>7</c> <c>{5, 15, 38, 72, 72, 36, 15, 3}/256</c>
4347*a58d3d2aSXin Li <c>8</c> <c>{6, 12, 27, 52, 77, 47, 20, 10, 5}/256</c>
4348*a58d3d2aSXin Li <c>9</c> <c>{6, 19, 28, 35, 40, 40, 35, 28, 19, 6}/256</c>
4349*a58d3d2aSXin Li<c>10</c> <c>{4, 14, 22, 31, 37, 40, 37, 31, 22, 14, 4}/256</c>
4350*a58d3d2aSXin Li<c>11</c> <c>{3, 10, 18, 26, 33, 38, 38, 33, 26, 18, 10, 3}/256</c>
4351*a58d3d2aSXin Li<c>12</c> <c>{2, 8, 13, 21, 29, 36, 38, 36, 29, 21, 13, 8, 2}/256</c>
4352*a58d3d2aSXin Li<c>13</c> <c>{1, 5, 10, 17, 25, 32, 38, 38, 32, 25, 17, 10, 5, 1}/256</c>
4353*a58d3d2aSXin Li<c>14</c> <c>{1, 4, 7, 13, 21, 29, 35, 36, 35, 29, 21, 13, 7, 4, 1}/256</c>
4354*a58d3d2aSXin Li<c>15</c> <c>{1, 2, 5, 10, 17, 25, 32, 36, 36, 32, 25, 17, 10, 5, 2, 1}/256</c>
4355*a58d3d2aSXin Li<c>16</c> <c>{1, 2, 4, 7, 13, 21, 28, 34, 36, 34, 28, 21, 13, 7, 4, 2, 1}/256</c>
4356*a58d3d2aSXin Li</texttable>
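<t>
The recursive split described in this section can be sketched in C as shown
 below.
The split_source type and next_left_count() are hypothetical stand-ins for the
 range decoder: they return pre-decoded left-half counts rather than selecting
 a PDF by partition size and pulse count as a real decoder would.
The recursive form is chosen for clarity and is not meant to mirror the
 structure of silk_shell_decoder().
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the range decoder. */
typedef struct {
    const int *symbols;  /* pre-decoded left-half pulse counts */
    size_t count;
    size_t pos;
} split_source;

/* A real decoder would pick the PDF from (size, pulses) here. */
static int next_left_count(split_source *src, int size, int pulses)
{
    int left;
    (void)size;
    assert(src->pos < src->count);
    left = src->symbols[src->pos++];
    assert(left >= 0 && left <= pulses);
    return left;
}

/* Distributes `pulses` pulses over out[0..size-1], where size is a
 * power of two, visiting partitions in preorder. */
static void decode_partition(split_source *src, int *out,
                             int size, int pulses)
{
    int half, left;
    if (size == 1) {
        out[0] = pulses;
        return;
    }
    if (pulses == 0) {
        /* Nothing is coded for an empty partition. */
        for (int i = 0; i < size; i++) {
            out[i] = 0;
        }
        return;
    }
    half = size / 2;
    left = next_left_count(src, size, pulses);
    decode_partition(src, out, half, left);
    decode_partition(src, out + half, half, pulses - left);
}
```
]]></artwork>
</figure>
<t>
With 3 pulses in a 16-sample block and the split symbols 2, 2, 0, 1, 0, 1, 0,
 this sketch places one pulse each at positions 2, 3, and 13, reading exactly
 seven symbols because every empty partition is skipped.
</t>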
4357*a58d3d2aSXin Li
4358*a58d3d2aSXin Li</section>
4359*a58d3d2aSXin Li
4360*a58d3d2aSXin Li<section anchor="silk_shell_lsb" title="LSB Decoding">
<t>
After the decoder reads the pulse locations for all blocks, it reads the LSBs
 (if any) for each block in turn.
Inside each block, it reads all the LSBs for each coefficient in turn, even
 those where no pulses were allocated, before proceeding to the next one.
For 10&nbsp;ms MB frames, it reads LSBs even for the extra 8&nbsp;samples in
 the last block.
The LSBs are coded from most significant to least significant, and they all use
 the PDF in <xref target="silk_shell_lsb_pdf"/>.
</t>

<texttable anchor="silk_shell_lsb_pdf" title="PDF for Excitation LSBs">
<ttcol>PDF</ttcol>
<c>{136, 120}/256</c>
</texttable>

<t>
The number of LSBs read for each coefficient in a block is determined in
 <xref target="silk_pulse_counts"/>.
The magnitude of the coefficient is initially equal to the number of pulses
 placed at that location in <xref target="silk_pulse_locations"/>.
As each LSB is decoded, the magnitude is doubled, and then the value of the LSB
 is added to it, to obtain an updated magnitude.
</t>
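<t>
The magnitude update can be written as a short loop.
The following C sketch assumes the LSB symbols for one coefficient have
 already been decoded into an array in bitstream order; apply_lsbs() is a
 hypothetical helper name.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Combines the pulse count at one location with its decoded LSBs.
 * bits[] holds lsb_count symbols, most significant first. */
static int apply_lsbs(int pulses, const int *bits, int lsb_count)
{
    int magnitude = pulses;
    assert(pulses >= 0 && pulses <= 16);
    assert(lsb_count >= 0 && lsb_count <= 10);
    for (int i = 0; i < lsb_count; i++) {
        assert(bits[i] == 0 || bits[i] == 1);
        /* Double the magnitude, then add the newly decoded bit. */
        magnitude = 2 * magnitude + bits[i];
    }
    return magnitude;
}
```
]]></artwork>
</figure>
<t>
For example, 3 pulses followed by the LSBs 1 and 0 give a magnitude of
 2*(2*3 + 1) + 0 = 14, and a location with no pulses can still reach a
 non-zero magnitude through its LSBs alone.
</t>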
4385*a58d3d2aSXin Li</section>
4386*a58d3d2aSXin Li
4387*a58d3d2aSXin Li<section anchor="silk_signs" title="Sign Decoding">
<t>
After decoding the pulse locations and the LSBs, the decoder knows the
 magnitude of each coefficient in the excitation.
It then decodes a sign for all coefficients with a non-zero magnitude, using
 one of the PDFs from <xref target="silk_sign_pdfs"/>.
If the value decoded is 0, then the coefficient magnitude is negated.
Otherwise, it remains positive.
</t>

<t>
The decoder chooses the PDF for the sign based on the signal type and
 quantization offset type (from <xref target="silk_frame_type"/>) and the
 number of pulses in the block (from <xref target="silk_pulse_counts"/>).
The number of pulses in the block does not take into account any LSBs.
Most PDFs are skewed towards negative signs because of the quantization offset,
 but the PDFs for zero pulses are highly skewed towards positive signs.
If a block contains many positive coefficients, it is sometimes beneficial to
 code it solely using LSBs (i.e., with zero pulses), since the encoder may be
 able to save enough bits on the signs to justify the less efficient
 coefficient magnitude encoding.
</t>

<texttable anchor="silk_sign_pdfs"
 title="PDFs for Excitation Signs">
<ttcol>Signal Type</ttcol>
<ttcol>Quantization Offset Type</ttcol>
<ttcol>Pulse Count</ttcol>
<ttcol>PDF</ttcol>
<c>Inactive</c> <c>Low</c>  <c>0</c>         <c>{2, 254}/256</c>
<c>Inactive</c> <c>Low</c>  <c>1</c>         <c>{207, 49}/256</c>
<c>Inactive</c> <c>Low</c>  <c>2</c>         <c>{189, 67}/256</c>
<c>Inactive</c> <c>Low</c>  <c>3</c>         <c>{179, 77}/256</c>
<c>Inactive</c> <c>Low</c>  <c>4</c>         <c>{174, 82}/256</c>
<c>Inactive</c> <c>Low</c>  <c>5</c>         <c>{163, 93}/256</c>
<c>Inactive</c> <c>Low</c>  <c>6 or more</c> <c>{157, 99}/256</c>
<c>Inactive</c> <c>High</c> <c>0</c>         <c>{58, 198}/256</c>
<c>Inactive</c> <c>High</c> <c>1</c>         <c>{245, 11}/256</c>
<c>Inactive</c> <c>High</c> <c>2</c>         <c>{238, 18}/256</c>
<c>Inactive</c> <c>High</c> <c>3</c>         <c>{232, 24}/256</c>
<c>Inactive</c> <c>High</c> <c>4</c>         <c>{225, 31}/256</c>
<c>Inactive</c> <c>High</c> <c>5</c>         <c>{220, 36}/256</c>
<c>Inactive</c> <c>High</c> <c>6 or more</c> <c>{211, 45}/256</c>
<c>Unvoiced</c> <c>Low</c>  <c>0</c>         <c>{1, 255}/256</c>
<c>Unvoiced</c> <c>Low</c>  <c>1</c>         <c>{210, 46}/256</c>
<c>Unvoiced</c> <c>Low</c>  <c>2</c>         <c>{190, 66}/256</c>
<c>Unvoiced</c> <c>Low</c>  <c>3</c>         <c>{178, 78}/256</c>
<c>Unvoiced</c> <c>Low</c>  <c>4</c>         <c>{169, 87}/256</c>
<c>Unvoiced</c> <c>Low</c>  <c>5</c>         <c>{162, 94}/256</c>
<c>Unvoiced</c> <c>Low</c>  <c>6 or more</c> <c>{152, 104}/256</c>
<c>Unvoiced</c> <c>High</c> <c>0</c>         <c>{48, 208}/256</c>
<c>Unvoiced</c> <c>High</c> <c>1</c>         <c>{242, 14}/256</c>
<c>Unvoiced</c> <c>High</c> <c>2</c>         <c>{235, 21}/256</c>
<c>Unvoiced</c> <c>High</c> <c>3</c>         <c>{224, 32}/256</c>
<c>Unvoiced</c> <c>High</c> <c>4</c>         <c>{214, 42}/256</c>
<c>Unvoiced</c> <c>High</c> <c>5</c>         <c>{205, 51}/256</c>
<c>Unvoiced</c> <c>High</c> <c>6 or more</c> <c>{190, 66}/256</c>
<c>Voiced</c>   <c>Low</c>  <c>0</c>         <c>{1, 255}/256</c>
<c>Voiced</c>   <c>Low</c>  <c>1</c>         <c>{162, 94}/256</c>
<c>Voiced</c>   <c>Low</c>  <c>2</c>         <c>{152, 104}/256</c>
<c>Voiced</c>   <c>Low</c>  <c>3</c>         <c>{147, 109}/256</c>
<c>Voiced</c>   <c>Low</c>  <c>4</c>         <c>{144, 112}/256</c>
<c>Voiced</c>   <c>Low</c>  <c>5</c>         <c>{141, 115}/256</c>
<c>Voiced</c>   <c>Low</c>  <c>6 or more</c> <c>{138, 118}/256</c>
<c>Voiced</c>   <c>High</c> <c>0</c>         <c>{8, 248}/256</c>
<c>Voiced</c>   <c>High</c> <c>1</c>         <c>{203, 53}/256</c>
<c>Voiced</c>   <c>High</c> <c>2</c>         <c>{187, 69}/256</c>
<c>Voiced</c>   <c>High</c> <c>3</c>         <c>{176, 80}/256</c>
<c>Voiced</c>   <c>High</c> <c>4</c>         <c>{168, 88}/256</c>
<c>Voiced</c>   <c>High</c> <c>5</c>         <c>{161, 95}/256</c>
<c>Voiced</c>   <c>High</c> <c>6 or more</c> <c>{154, 102}/256</c>
</texttable>
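<t>
The PDF selection described above amounts to a three-way table lookup, which
 the following non-normative C sketch illustrates.
The identifiers and the integer numbering of the signal types and quantization
 offset types are assumptions made for this example.
Only the first entry of each PDF is stored, since the second is 256 minus the
 first.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* First entry of each PDF in the table above, i.e., the probability
   (out of 256) of decoding a 0, which means a negative sign.
   Indices: [signal type][quantization offset type][min(pulses, 6)]. */
static const unsigned char silk_example_sign_p0[3][2][7] = {
    { {  2, 207, 189, 179, 174, 163, 157 },   /* Inactive, Low  */
      { 58, 245, 238, 232, 225, 220, 211 } }, /* Inactive, High */
    { {  1, 210, 190, 178, 169, 162, 152 },   /* Unvoiced, Low  */
      { 48, 242, 235, 224, 214, 205, 190 } }, /* Unvoiced, High */
    { {  1, 162, 152, 147, 144, 141, 138 },   /* Voiced,   Low  */
      {  8, 203, 187, 176, 168, 161, 154 } }  /* Voiced,   High */
};

/* signal_type: 0 = Inactive, 1 = Unvoiced, 2 = Voiced (example only).
   qoffset_type: 0 = Low, 1 = High (example only).
   pulses: pulse count of the block, not counting any LSBs. */
int silk_example_sign_pdf_p0(int signal_type, int qoffset_type,
                             int pulses)
{
    assert(signal_type >= 0 && signal_type < 3);
    assert(qoffset_type >= 0 && qoffset_type < 2);
    assert(pulses >= 0);
    if (pulses > 6) pulses = 6; /* "6 or more" shares one PDF */
    return silk_example_sign_p0[signal_type][qoffset_type][pulses];
}
```
]]></artwork>
</figure>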

</section>

<section anchor="silk_excitation_reconstruction"
 title="Reconstructing the Excitation">

<t>
After the signs have been read, there is enough information to reconstruct the
 complete excitation signal.
This requires adding a constant quantization offset to each non-zero sample,
 and then pseudorandomly inverting and offsetting every sample.
The constant quantization offset varies depending on the signal type and
 quantization offset type (see <xref target="silk_frame_type"/>).
</t>

<texttable anchor="silk_quantization_offsets"
 title="Excitation Quantization Offsets">
<ttcol align="left">Signal Type</ttcol>
<ttcol align="left">Quantization Offset Type</ttcol>
<ttcol align="right">Quantization Offset (Q23)</ttcol>
<c>Inactive</c> <c>Low</c>  <c>25</c>
<c>Inactive</c> <c>High</c> <c>60</c>
<c>Unvoiced</c> <c>Low</c>  <c>25</c>
<c>Unvoiced</c> <c>High</c> <c>60</c>
<c>Voiced</c>   <c>Low</c>   <c>8</c>
<c>Voiced</c>   <c>High</c> <c>25</c>
</texttable>

<t>
Let e_raw[i] be the raw excitation value at position i, with a magnitude
 composed of the pulses at that location (see
 <xref target="silk_pulse_locations"/>) combined with any additional LSBs (see
 <xref target="silk_shell_lsb"/>), and with the corresponding sign decoded in
 <xref target="silk_signs"/>.
Additionally, let seed be the current pseudorandom seed, which is initialized
 to the value decoded from <xref target="silk_seed"/> for the first sample in
 the current SILK frame, and updated for each subsequent sample according to
 the procedure below.
Finally, let offset_Q23 be the quantization offset from
 <xref target="silk_quantization_offsets"/>.
Then the following procedure produces the final reconstructed excitation value,
 e_Q23[i]:
<figure align="center">
<artwork align="center"><![CDATA[
e_Q23[i] = (e_raw[i] << 8) - sign(e_raw[i])*20 + offset_Q23;
    seed = (196314165*seed + 907633515) & 0xFFFFFFFF;
e_Q23[i] = (seed & 0x80000000) ? -e_Q23[i] : e_Q23[i];
    seed = (seed + e_raw[i]) & 0xFFFFFFFF;
]]></artwork>
</figure>
When e_raw[i] is zero, sign() returns 0 by the definition in
 <xref target="sign"/>, so the factor of 20 does not get added.
The final e_Q23[i] value may require more than 16 bits per sample, but will not
 require more than 23, including the sign.
</t>
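<t>
The procedure above is already close to C.
The following non-normative sketch wraps it in a function for one sample,
 assuming a platform where int is at most 32 bits wide.
The function names are hypothetical, and the multiplication by 256 replaces
 the left shift only to avoid shifting a negative value in C.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Returns -1, 0, or 1, like the sign() function used in the text. */
int silk_example_sign(int32_t x)
{
    return (x > 0) - (x < 0);
}

/* Reconstructs one excitation sample from e_raw and the quantization
   offset, updating *seed for the next sample. */
int32_t silk_example_excitation(int32_t e_raw, int32_t offset_Q23,
                                uint32_t *seed)
{
    int32_t e_Q23;
    assert(seed != 0);
    /* e_raw*256 stands in for (e_raw << 8): shifting a negative value
       left is undefined behavior in C. */
    e_Q23 = e_raw*256 - silk_example_sign(e_raw)*20 + offset_Q23;
    /* uint32_t arithmetic wraps modulo 2**32, which has the same
       effect as the "& 0xFFFFFFFF" in the text. */
    *seed = 196314165u*(*seed) + 907633515u;
    if (*seed & 0x80000000u) e_Q23 = -e_Q23;
    *seed += (uint32_t)e_raw;
    return e_Q23;
}
```
]]></artwork>
</figure>
<t>
With seed equal to 7, e_raw equal to 2, and an offset of 25, the
 pre-inversion value is 512 - 20 + 25 = 517; the updated seed has its top bit
 set, so the result is -517.
</t>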

</section>

</section>

<section anchor="silk_frame_reconstruction" toc="include"
 title="SILK Frame Reconstruction">

<t>
The remainder of the reconstruction process for the frame does not need to be
 bit-exact, as small errors should only introduce proportionally small
 distortions.
Although the reference implementation only includes a fixed-point version of
 the remaining steps, this section describes them in terms of a floating-point
 version for simplicity.
This produces a signal with a nominal range of -1.0 to 1.0.
</t>

<t>
silk_decode_core() (decode_core.c) contains the code for the main
 reconstruction process.
It proceeds subframe-by-subframe, since quantization gains, LTP parameters, and
 (in 20&nbsp;ms SILK frames) LPC coefficients can vary from one subframe to the
 next.
</t>

<t>
Let a_Q12[k] be the LPC coefficients for the current subframe.
If this is the first or second subframe of a 20&nbsp;ms SILK frame and the LSF
 interpolation factor, w_Q2 (see <xref target="silk_nlsf_interpolation"/>), is
 less than 4, then these correspond to the final LPC coefficients produced by
 <xref target="silk_lpc_gain_limit"/> from the interpolated LSF coefficients,
 n1_Q15[k] (computed in <xref target="silk_nlsf_interpolation"/>).
Otherwise, they correspond to the final LPC coefficients produced from the
 uninterpolated LSF coefficients for the current frame, n2_Q15[k].
</t>

<t>
Also, let n be the number of samples in a subframe (40 for NB, 60 for MB, and
 80 for WB), s be the index of the current subframe in this SILK frame (0 or 1
 for 10&nbsp;ms frames, or 0 to 3 for 20&nbsp;ms frames), and j be the index of
 the first sample in the residual corresponding to the current subframe.
</t>

<section anchor="silk_ltp_synthesis" title="LTP Synthesis">
<t>
Voiced SILK frames (see <xref target="silk_frame_type"/>) pass the excitation
 through an LTP filter using the parameters decoded in
 <xref target="silk_ltp_params"/> to produce an LPC residual.
The LTP filter requires LPC residual values from before the current subframe as
 input.
However, since the LPC coefficients may have changed, it obtains this residual
 by "rewhitening" the corresponding output signal using the LPC coefficients
 from the current subframe.
Let out[i] for
 (j&nbsp;-&nbsp;pitch_lags[s]&nbsp;-&nbsp;d_LPC&nbsp;-&nbsp;2)&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;j
 be the fully reconstructed output signal from the last
 (pitch_lags[s]&nbsp;+&nbsp;d_LPC&nbsp;+&nbsp;2) samples of previous subframes
 (see <xref target="silk_lpc_synthesis"/>), where pitch_lags[s] is the pitch
 lag for the current subframe from <xref target="silk_ltp_lags"/>.
During reconstruction of the first subframe for this channel after either
<list style="symbols">
<t>An uncoded regular SILK frame (if this is the side channel), or</t>
<t>A decoder reset (see <xref target="decoder-reset"/>),</t>
</list>
 out[] is rewhitened into an LPC residual,
 res[i], via
<figure align="center">
<artwork align="center"><![CDATA[
         4.0*LTP_scale_Q14
res[i] = ----------------- * clamp(-1.0,
            gain_Q16[s]

                                   d_LPC-1
                                     __              a_Q12[k]
                            out[i] - \  out[i-k-1] * --------, 1.0) .
                                     /_               4096.0
                                     k=0
]]></artwork>
</figure>
This requires storage to buffer up to 306 values of out[i] from previous
 subframes.
This corresponds to WB with a maximum pitch lag of
 18&nbsp;ms&nbsp;*&nbsp;16&nbsp;kHz samples, plus 16 samples for d_LPC, plus 2
 samples for the width of the LTP filter.
</t>
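<t>
A floating-point reading of the rewhitening formula for a single sample might
 look like the following non-normative C sketch.
The function names and the pointer-based calling convention are assumptions
 made for this example; the reference implementation uses fixed-point
 arithmetic instead.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

double silk_example_clamp(double lo, double x, double hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Rewhitens one sample. out points at out[i]; the d_LPC samples before
   it (out[-1] ... out[-d_LPC]) must be valid. gain_Q16 is the gain of
   the current subframe, i.e., gain_Q16[s] in the text. */
double silk_example_rewhiten(const double *out, const int *a_Q12,
                             int d_LPC, int LTP_scale_Q14,
                             long gain_Q16)
{
    double pred = 0.0;
    int k;
    assert(d_LPC > 0 && gain_Q16 > 0);
    /* Short-term prediction from the previous d_LPC output samples. */
    for (k = 0; k < d_LPC; k++)
        pred += out[-k-1]*(a_Q12[k]/4096.0);
    /* Clamp the prediction error, then scale it. */
    return 4.0*LTP_scale_Q14/gain_Q16
         * silk_example_clamp(-1.0, out[0] - pred, 1.0);
}
```
]]></artwork>
</figure>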

<t>
Let e_Q23[i] for j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n) be the
 excitation for the current subframe, and b_Q7[k] for
 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;5 be the coefficients of the LTP filter
 taken from the codebook entry in one of
 Tables&nbsp;<xref format="counter" target="silk_ltp_filter_coeffs0"/>
 through&nbsp;<xref format="counter" target="silk_ltp_filter_coeffs2"/>
 corresponding to the index decoded for the current subframe in
 <xref target="silk_ltp_filter"/>.
Then for i such that j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n),
 the LPC residual is
<figure align="center">
<artwork align="center"><![CDATA[
                      4
          e_Q23[i]   __                                  b_Q7[k]
res[i] = --------- + \  res[i - pitch_lags[s] + 2 - k] * ------- .
          2.0**23    /_                                   128.0
                     k=0
]]></artwork>
</figure>
</t>
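<t>
The 5-tap LTP filter above can be sketched in floating-point C as follows.
This is a non-normative illustration: the function name and argument layout
 are assumptions, and the caller is assumed to have already filled res[] with
 the rewhitened history that the filter reads.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Runs the 5-tap LTP filter over one voiced subframe. res[] must
   already hold the rewhitened residual for indices
   j - pitch_lag - 2 through j - 1; e_Q23[] is indexed like res[]. */
void silk_example_ltp(double *res, const int *e_Q23, int j, int n,
                      int pitch_lag, const int *b_Q7)
{
    int i, k;
    assert(pitch_lag > 2 && j >= pitch_lag + 2 && n > 0);
    for (i = j; i < j + n; i++) {
        double sum = e_Q23[i]/8388608.0; /* 2.0**23 */
        /* The taps are centered on the sample one pitch lag back. */
        for (k = 0; k < 5; k++)
            sum += res[i - pitch_lag + 2 - k]*(b_Q7[k]/128.0);
        res[i] = sum;
    }
}
```
]]></artwork>
</figure>
<t>
Because res[] is updated in place, samples late in a subframe can draw on
 residual values computed earlier in the same subframe when the pitch lag is
 short.
</t>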

<t>
For unvoiced frames, the LPC residual for
 j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n) is simply a normalized
 copy of the excitation signal, i.e.,
<figure align="center">
<artwork align="center"><![CDATA[
          e_Q23[i]
res[i] = ---------
          2.0**23
]]></artwork>
</figure>
</t>
</section>

<section anchor="silk_lpc_synthesis" title="LPC Synthesis">
<t>
LPC synthesis uses the short-term LPC filter to predict the next output
 coefficient.
For i such that (j&nbsp;-&nbsp;d_LPC)&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;j, let
 lpc[i] be the result of LPC synthesis from the last d_LPC samples of the
 previous subframe, or zeros in the first subframe for this channel after
 either
<list style="symbols">
<t>An uncoded regular SILK frame (if this is the side channel), or</t>
<t>A decoder reset (see <xref target="decoder-reset"/>).</t>
</list>
Then for i such that j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n), the
 result of LPC synthesis for the current subframe is
<figure align="center">
<artwork align="center"><![CDATA[
                              d_LPC-1
         gain_Q16[s]            __              a_Q12[k]
lpc[i] = ----------- * res[i] + \  lpc[i-k-1] * -------- .
           65536.0              /_               4096.0
                                k=0
]]></artwork>
</figure>
The decoder saves the final d_LPC values, i.e., lpc[i] such that
 (j&nbsp;+&nbsp;n&nbsp;-&nbsp;d_LPC)&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n),
 to feed into the LPC synthesis of the next subframe.
This requires storage for up to 16 values of lpc[i] (for WB frames).
</t>

<t>
Then, the signal is clamped into the final nominal range:
<figure align="center">
<artwork align="center"><![CDATA[
out[i] = clamp(-1.0, lpc[i], 1.0) .
]]></artwork>
</figure>
This clamping occurs entirely after the LPC synthesis filter has run.
The decoder saves the unclamped values, lpc[i], to feed into the LPC filter for
 the next subframe, but saves the clamped values, out[i], for rewhitening in
 voiced frames.
</t>
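<t>
The following non-normative C sketch combines the synthesis filter and the
 final clamp for one subframe in floating point.
The function name and argument layout are assumptions for this example, and
 gain_Q16 denotes the quantization gain of the current subframe.
It keeps lpc[] and out[] separate to show that the filter feeds on the
 unclamped values.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Synthesizes samples j .. j+n-1. lpc[j-d_LPC .. j-1] must hold the
   unclamped history (or zeros after a reset). */
void silk_example_lpc_synthesis(double *lpc, double *out,
                                const double *res, int j, int n,
                                const int *a_Q12, int d_LPC,
                                long gain_Q16)
{
    int i, k;
    assert(d_LPC > 0 && j >= d_LPC && n > 0);
    for (i = j; i < j + n; i++) {
        double sum = gain_Q16/65536.0*res[i];
        for (k = 0; k < d_LPC; k++)
            sum += lpc[i-k-1]*(a_Q12[k]/4096.0);
        lpc[i] = sum; /* unclamped: feeds later samples */
        out[i] = sum < -1.0 ? -1.0 : (sum > 1.0 ? 1.0 : sum);
    }
}
```
]]></artwork>
</figure>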
</section>

</section>

</section>

<section anchor="silk_stereo_unmixing" title="Stereo Unmixing">
<t>
For stereo streams, after decoding a frame from each channel, the decoder must
 convert the mid-side (MS) representation into a left-right (LR)
 representation.
The function silk_stereo_MS_to_LR (stereo_MS_to_LR.c) implements this process.
In it, the decoder predicts the side channel using a) a simple low-passed
 version of the mid channel, and b) the unfiltered mid channel, using the
 prediction weights decoded in <xref target="silk_stereo_pred"/>.
This simple low-pass filter imposes a one-sample delay, and the unfiltered
 mid channel is also delayed by one sample.
In order to allow seamless switching between stereo and mono, mono streams must
 also impose the same one-sample delay.
The encoder requires an additional one-sample delay for both mono and stereo
 streams, though an encoder may omit the delay for mono if it knows it will
 never switch to stereo.
</t>

<t>
The unmixing process operates in two phases.
The first phase lasts for 8&nbsp;ms, during which it interpolates the
 prediction weights from the previous frame, prev_w0_Q13 and prev_w1_Q13, to
 the values for the current frame, w0_Q13 and w1_Q13.
The second phase simply uses these weights for the remainder of the frame.
</t>

<t>
Let mid[i] and side[i] be the contents of out[i] (from
 <xref target="silk_lpc_synthesis"/>) for the current mid and side channels,
 respectively, and let left[i] and right[i] be the corresponding stereo output
 channels.
If the side channel is not coded (see <xref target="silk_mid_only_flag"/>),
 then side[i] is set to zero.
Also let j be defined as in <xref target="silk_frame_reconstruction"/>, n1 be
 the number of samples in phase&nbsp;1 (64 for NB, 96 for MB, and 128 for WB),
 and n2 be the total number of samples in the frame.
Then for i such that j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n2),
 the left and right channel output is
<figure align="center">
<artwork align="center"><![CDATA[
              prev_w0_Q13                  (w0_Q13 - prev_w0_Q13)
        w0 =  ----------- + min(i - j, n1)*---------------------- ,
                8192.0                           8192.0*n1

              prev_w1_Q13                  (w1_Q13 - prev_w1_Q13)
        w1 =  ----------- + min(i - j, n1)*---------------------- ,
                8192.0                           8192.0*n1

             mid[i-2] + 2*mid[i-1] + mid[i]
        p0 = ------------------------------ ,
                          4.0

 left[i] = clamp(-1.0, (1 + w1)*mid[i-1] + side[i-1] + w0*p0, 1.0) ,

right[i] = clamp(-1.0, (1 - w1)*mid[i-1] - side[i-1] - w0*p0, 1.0) .
]]></artwork>
</figure>
These formulas require two samples prior to index&nbsp;j, the start of the
 frame, for the mid channel, and one prior sample for the side channel.
For the first frame after a decoder reset, zeros are used instead.
</t>
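<t>
The two-phase unmixing can be sketched in floating-point C as follows.
This is a non-normative illustration rather than a transcription of
 silk_stereo_MS_to_LR; the function name and argument order are assumptions,
 and the caller is assumed to provide the two history samples before index j.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Converts samples j .. j+n2-1 from mid/side to left/right. mid[j-2],
   mid[j-1], and side[j-1] must be valid (zeros after a reset). */
void silk_example_ms_to_lr(const double *mid, const double *side,
                           double *left, double *right,
                           int j, int n1, int n2,
                           int prev_w0_Q13, int prev_w1_Q13,
                           int w0_Q13, int w1_Q13)
{
    int i;
    assert(j >= 2 && n1 > 0 && n2 >= n1);
    for (i = j; i < j + n2; i++) {
        /* Phase 1 interpolates for n1 samples; phase 2 holds. */
        int t = (i - j < n1) ? i - j : n1;
        double w0 = prev_w0_Q13/8192.0
                  + t*(w0_Q13 - prev_w0_Q13)/(8192.0*n1);
        double w1 = prev_w1_Q13/8192.0
                  + t*(w1_Q13 - prev_w1_Q13)/(8192.0*n1);
        double p0 = (mid[i-2] + 2*mid[i-1] + mid[i])/4.0;
        double l = (1 + w1)*mid[i-1] + side[i-1] + w0*p0;
        double r = (1 - w1)*mid[i-1] - side[i-1] - w0*p0;
        left[i]  = l < -1.0 ? -1.0 : (l > 1.0 ? 1.0 : l);
        right[i] = r < -1.0 ? -1.0 : (r > 1.0 ? 1.0 : r);
    }
}
```
]]></artwork>
</figure>
<t>
With all weights equal to zero, this reduces to left[i] = mid[i-1] + side[i-1]
 and right[i] = mid[i-1] - side[i-1], i.e., plain mid-side unmixing with a
 one-sample delay.
</t>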

</section>

<section title="Resampling">
<t>
After stereo unmixing (if any), the decoder applies resampling to convert the
 decoded SILK output to the sample rate desired by the application.
This is necessary when decoding a Hybrid frame at SWB or FB sample rates, or
 whenever the decoder wants the output at a different sample rate than the
 internal SILK sampling rate (e.g., to allow a constant sample rate when the
 audio bandwidth changes, or to allow mixing with audio from other
 applications).
The resampler itself is non-normative, and a decoder can use any method it
 wants to perform the resampling.
</t>

<t>
However, a minimum amount of delay is imposed to allow the resampler to
 operate, and this delay is normative, so that the corresponding delay can be
 applied to the MDCT layer in the encoder.
A decoder is always free to use a resampler which requires more delay than
 allowed for here (e.g., to improve quality), but it must then delay the output
 of the MDCT layer by this extra amount.
Keeping as much delay as possible on the encoder side allows an encoder which
 knows it will never use any of the SILK or Hybrid modes to skip this delay.
By contrast, if it were all applied by the decoder, then a decoder which
 processes audio in fixed-size blocks would be forced to delay the output of
 CELT frames just in case of a later switch to a SILK or Hybrid mode.
</t>

<t>
<xref target="silk_resampler_delay_alloc"/> gives the maximum resampler delay
 in milliseconds for each SILK audio bandwidth.
Because these delays are generally not a whole number of samples at the actual
 output rate, it may not be possible to achieve them exactly while using a
 whole number of input or output samples.
The reference implementation is able to resample to any of the supported
 output sampling rates (8, 12, 16, 24, or 48&nbsp;kHz) within or near this
 delay constraint.
Some resampling filters (including those used by the reference implementation)
 may add a delay that is not an exact integer, or is not linear-phase, and so
 cannot be represented by a single delay at all frequencies.
However, such deviations are unlikely to be perceptible, and the comparison
 tool described in <xref target="conformance"/> is designed to be relatively
 insensitive to them.
The delays listed here are the ones that should be targeted by the encoder.
</t>

<texttable anchor="silk_resampler_delay_alloc"
 title="SILK Resampler Delay Allocations">
<ttcol>Audio Bandwidth</ttcol>
<ttcol>Delay in milliseconds</ttcol>
<c>NB</c> <c>0.538</c>
<c>MB</c> <c>0.692</c>
<c>WB</c> <c>0.706</c>
</texttable>

<t>
NB is given a smaller decoder delay allocation than MB and WB to allow a
 higher-order filter when resampling to 8&nbsp;kHz in both the encoder and
 decoder.
This implies that the audio content of two SILK frames operating at different
 bandwidths is not perfectly aligned in time.
This is not an issue for any transitions described in
 <xref target="switching"/>, because they all involve a SILK decoder reset.
When the decoder is reset, any samples remaining in the resampling buffer
 are discarded, and the resampler is re-initialized with silence.
</t>
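<t>
To see why the delay allocations cannot always be met with a whole number of
 samples, it can help to convert them to samples at a given rate.
The helper below is a non-normative sketch with a hypothetical name; it takes
 the delay in milliseconds, as listed in
 <xref target="silk_resampler_delay_alloc"/>.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Converts a delay in milliseconds to a (possibly fractional) number
   of samples at the given sampling rate in Hz. */
double silk_example_delay_samples(double delay_ms, long rate_hz)
{
    assert(delay_ms >= 0.0 && rate_hz > 0);
    return delay_ms*rate_hz/1000.0;
}
```
]]></artwork>
</figure>
<t>
For example, the NB allocation of 0.538&nbsp;ms is about 25.8 samples at
 48&nbsp;kHz, and the WB allocation of 0.706&nbsp;ms is about 11.3 samples at
 16&nbsp;kHz.
</t>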

</section>

</section>


<section title="CELT Decoder">

<t>
The CELT layer of Opus is based on the Modified Discrete Cosine Transform
<xref target='MDCT'/> with partially overlapping windows of 5 to 22.5&nbsp;ms.
The main principle behind CELT is that the MDCT spectrum is divided into
bands that (roughly) follow the Bark scale, i.e., the scale of the ear's
critical bands&nbsp;<xref target="Zwicker61"/>.
The normal CELT layer uses 21 of those bands, though Opus Custom (see
<xref target="opus-custom"/>) may use a different number of bands.
In Hybrid mode, the first 17 bands (up to 8&nbsp;kHz) are not coded.
A band can contain as few as one MDCT bin per channel, and as many as 176
bins per channel, as detailed in <xref target="celt_band_sizes"/>.
In each band, the gain (energy) is coded separately from
the shape of the spectrum. Coding the gain explicitly makes it easy to
preserve the spectral envelope of the signal. The remaining unit-norm shape
vector is encoded using a Pyramid Vector Quantizer (PVQ)&nbsp;<xref target='PVQ-decoder'/>.
</t>

<texttable anchor="celt_band_sizes"
 title="MDCT Bins Per Channel Per Band for Each Frame Size">
<ttcol>Frame Size:</ttcol>
<ttcol align="right">2.5&nbsp;ms</ttcol>
<ttcol align="right">5&nbsp;ms</ttcol>
<ttcol align="right">10&nbsp;ms</ttcol>
<ttcol align="right">20&nbsp;ms</ttcol>
<ttcol align="right">Start Frequency</ttcol>
<ttcol align="right">Stop Frequency</ttcol>
<c>Band</c> <c>Bins:</c> <c/> <c/> <c/> <c/> <c/>
 <c>0</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>     <c>0&nbsp;Hz</c>   <c>200&nbsp;Hz</c>
 <c>1</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>200&nbsp;Hz</c>   <c>400&nbsp;Hz</c>
 <c>2</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>400&nbsp;Hz</c>   <c>600&nbsp;Hz</c>
 <c>3</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>600&nbsp;Hz</c>   <c>800&nbsp;Hz</c>
 <c>4</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>800&nbsp;Hz</c>  <c>1000&nbsp;Hz</c>
 <c>5</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>  <c>1000&nbsp;Hz</c>  <c>1200&nbsp;Hz</c>
 <c>6</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>  <c>1200&nbsp;Hz</c>  <c>1400&nbsp;Hz</c>
 <c>7</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>  <c>1400&nbsp;Hz</c>  <c>1600&nbsp;Hz</c>
 <c>8</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>1600&nbsp;Hz</c>  <c>2000&nbsp;Hz</c>
 <c>9</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>2000&nbsp;Hz</c>  <c>2400&nbsp;Hz</c>
<c>10</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>2400&nbsp;Hz</c>  <c>2800&nbsp;Hz</c>
<c>11</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>2800&nbsp;Hz</c>  <c>3200&nbsp;Hz</c>
<c>12</c>  <c>4</c>  <c>8</c> <c>16</c>  <c>32</c>  <c>3200&nbsp;Hz</c>  <c>4000&nbsp;Hz</c>
<c>13</c>  <c>4</c>  <c>8</c> <c>16</c>  <c>32</c>  <c>4000&nbsp;Hz</c>  <c>4800&nbsp;Hz</c>
<c>14</c>  <c>4</c>  <c>8</c> <c>16</c>  <c>32</c>  <c>4800&nbsp;Hz</c>  <c>5600&nbsp;Hz</c>
<c>15</c>  <c>6</c> <c>12</c> <c>24</c>  <c>48</c>  <c>5600&nbsp;Hz</c>  <c>6800&nbsp;Hz</c>
<c>16</c>  <c>6</c> <c>12</c> <c>24</c>  <c>48</c>  <c>6800&nbsp;Hz</c>  <c>8000&nbsp;Hz</c>
<c>17</c>  <c>8</c> <c>16</c> <c>32</c>  <c>64</c>  <c>8000&nbsp;Hz</c>  <c>9600&nbsp;Hz</c>
<c>18</c> <c>12</c> <c>24</c> <c>48</c>  <c>96</c>  <c>9600&nbsp;Hz</c> <c>12000&nbsp;Hz</c>
<c>19</c> <c>18</c> <c>36</c> <c>72</c> <c>144</c> <c>12000&nbsp;Hz</c> <c>15600&nbsp;Hz</c>
<c>20</c> <c>22</c> <c>44</c> <c>88</c> <c>176</c> <c>15600&nbsp;Hz</c> <c>20000&nbsp;Hz</c>
</texttable>
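<t>
The table above has a regular structure: each frame size doubles the bin
 counts of the previous one, and the band edges follow from the cumulative
 2.5&nbsp;ms bin counts at 200&nbsp;Hz per bin.
The following non-normative C sketch reproduces the table from the
 2.5&nbsp;ms column alone; the identifiers and the frame-size index lm are
 assumptions made for this example.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Bins per band for a 2.5 ms frame, copied from the table above. */
static const int celt_example_bins[21] = {
    1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 4, 6, 6, 8, 12, 18, 22
};

/* Bins per channel in a band; lm is 0, 1, 2, or 3 for 2.5, 5, 10, or
   20 ms frames (an indexing convention chosen for this example). */
int celt_example_band_bins(int band, int lm)
{
    assert(band >= 0 && band < 21 && lm >= 0 && lm <= 3);
    return celt_example_bins[band] << lm;
}

/* Lower edge of a band in Hz; band == 21 gives the upper edge of the
   last band. Each 2.5 ms bin spans 200 Hz in the table. */
int celt_example_band_start_hz(int band)
{
    int b, bins = 0;
    assert(band >= 0 && band <= 21);
    for (b = 0; b < band; b++)
        bins += celt_example_bins[b];
    return 200*bins;
}
```
]]></artwork>
</figure>
<t>
For example, band 17 starts at 8000&nbsp;Hz, which is why skipping the first
 17 bands in Hybrid mode leaves everything up to 8&nbsp;kHz uncoded by CELT.
</t>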
4869*a58d3d2aSXin Li
4870*a58d3d2aSXin Li<t>
4871*a58d3d2aSXin LiTransients are notoriously difficult for transform codecs to code.
4872*a58d3d2aSXin LiCELT uses two different strategies for them:
4873*a58d3d2aSXin Li<list style="numbers">
4874*a58d3d2aSXin Li<t>Using multiple smaller MDCTs instead of a single large MDCT, and</t>
4875*a58d3d2aSXin Li<t>Dynamic time-frequency resolution changes (See <xref target='tf-change'/>).</t>
4876*a58d3d2aSXin Li</list>
4877*a58d3d2aSXin LiTo improve quality on highly tonal and periodic signals, CELT includes
4878*a58d3d2aSXin Lia prefilter/postfilter combination. The prefilter on the encoder side
4879*a58d3d2aSXin Liattenuates the signal's harmonics. The postfilter on the decoder side
4880*a58d3d2aSXin Lirestores the original gain of the harmonics, while shaping the coding noise
4881*a58d3d2aSXin Lito roughly follow the harmonics. Such noise shaping reduces the perception
4882*a58d3d2aSXin Liof the noise.
4883*a58d3d2aSXin Li</t>
4884*a58d3d2aSXin Li
4885*a58d3d2aSXin Li<t>
4886*a58d3d2aSXin LiWhen coding a stereo signal, three coding methods are available:
4887*a58d3d2aSXin Li<list style="symbols">
4888*a58d3d2aSXin Li<t>mid-side stereo: encodes the mean and the difference of the left and right channels,</t>
4889*a58d3d2aSXin Li<t>intensity stereo: only encodes the mean of the left and right channels (discards the difference),</t>
4890*a58d3d2aSXin Li<t>dual stereo: encodes the left and right channels separately.</t>
4891*a58d3d2aSXin Li</list>
4892*a58d3d2aSXin Li</t>
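<t>
As a conceptual illustration only, the mean/difference representation that
mid-side and intensity stereo rely on can be sketched as follows. The function
names are hypothetical, the half-difference scaling is an assumption made so
that the transform inverts exactly, and the sketch is not the procedure used by
the reference implementation.
</t>
<figure>
<artwork><![CDATA[
```python
def to_mid_side(left, right):
    """Return (mid, side): per-sample mean and half-difference (assumed scaling)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side=None):
    """Rebuild (left, right); side=None models intensity stereo,
    where the difference has been discarded."""
    if side is None:
        side = [0.0] * len(mid)
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```
]]></artwork>
</figure>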
4893*a58d3d2aSXin Li
4894*a58d3d2aSXin Li<t>
4895*a58d3d2aSXin LiAn overview of the decoder is given in <xref target="celt-decoder-overview"/>.
4896*a58d3d2aSXin Li</t>
4897*a58d3d2aSXin Li
4898*a58d3d2aSXin Li<figure anchor="celt-decoder-overview" title="Structure of the CELT decoder">
4899*a58d3d2aSXin Li<artwork align="center"><![CDATA[
4900*a58d3d2aSXin Li               +---------+
4901*a58d3d2aSXin Li               | Coarse  |
4902*a58d3d2aSXin Li            +->| decoder |----+
4903*a58d3d2aSXin Li            |  +---------+    |
4904*a58d3d2aSXin Li            |                 |
4905*a58d3d2aSXin Li            |  +---------+    v
4906*a58d3d2aSXin Li            |  |  Fine   |  +---+
4907*a58d3d2aSXin Li            +->| decoder |->| + |
4908*a58d3d2aSXin Li            |  +---------+  +---+
4909*a58d3d2aSXin Li            |       ^         |
4910*a58d3d2aSXin Li+---------+ |       |         |
4911*a58d3d2aSXin Li|  Range  | | +----------+    v
4912*a58d3d2aSXin Li| Decoder |-+ |   Bit    | +------+
4913*a58d3d2aSXin Li+---------+ | |Allocation| | 2**x |
4914*a58d3d2aSXin Li            | +----------+ +------+
4915*a58d3d2aSXin Li            |       |         |
4916*a58d3d2aSXin Li            |       v         v               +--------+
4917*a58d3d2aSXin Li            |  +---------+  +---+  +-------+  | pitch  |
4918*a58d3d2aSXin Li            +->|   PVQ   |->| * |->| IMDCT |->| post-  |--->
4919*a58d3d2aSXin Li            |  | decoder |  +---+  +-------+  | filter |
4920*a58d3d2aSXin Li            |  +---------+                    +--------+
4921*a58d3d2aSXin Li            |                                      ^
4922*a58d3d2aSXin Li            +--------------------------------------+
4923*a58d3d2aSXin Li]]></artwork>
4924*a58d3d2aSXin Li</figure>
4925*a58d3d2aSXin Li
4926*a58d3d2aSXin Li<t>
4927*a58d3d2aSXin LiThe decoder is based on the following symbols and sets of symbols:
4928*a58d3d2aSXin Li</t>
4929*a58d3d2aSXin Li
4930*a58d3d2aSXin Li<texttable anchor="celt_symbols"
4931*a58d3d2aSXin Li title="Order of the Symbols in the CELT Section of the Bitstream">
4932*a58d3d2aSXin Li<ttcol align="center">Symbol(s)</ttcol>
4933*a58d3d2aSXin Li<ttcol align="center">PDF</ttcol>
4934*a58d3d2aSXin Li<ttcol align="center">Condition</ttcol>
4935*a58d3d2aSXin Li<c>silence</c>      <c>{32767, 1}/32768</c> <c></c>
4936*a58d3d2aSXin Li<c>post-filter</c>  <c>{1, 1}/2</c> <c></c>
4937*a58d3d2aSXin Li<c>octave</c>       <c>uniform (6)</c><c>post-filter</c>
4938*a58d3d2aSXin Li<c>period</c>       <c>raw bits (4+octave)</c><c>post-filter</c>
4939*a58d3d2aSXin Li<c>gain</c>         <c>raw bits (3)</c><c>post-filter</c>
4940*a58d3d2aSXin Li<c>tapset</c>       <c>{2, 1, 1}/4</c><c>post-filter</c>
4941*a58d3d2aSXin Li<c>transient</c>    <c>{7, 1}/8</c><c></c>
4942*a58d3d2aSXin Li<c>intra</c>        <c>{7, 1}/8</c><c></c>
4943*a58d3d2aSXin Li<c>coarse energy</c><c><xref target="energy-decoding"/></c><c></c>
4944*a58d3d2aSXin Li<c>tf_change</c>    <c><xref target="transient-decoding"/></c><c></c>
4945*a58d3d2aSXin Li<c>tf_select</c>    <c>{1, 1}/2</c><c><xref target="transient-decoding"/></c>
4946*a58d3d2aSXin Li<c>spread</c>       <c>{7, 2, 21, 2}/32</c><c></c>
4947*a58d3d2aSXin Li<c>dyn. alloc.</c>  <c><xref target="allocation"/></c><c></c>
4948*a58d3d2aSXin Li<c>alloc. trim</c>  <c>{2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2}/128</c><c></c>
4949*a58d3d2aSXin Li<c>skip</c>         <c>{1, 1}/2</c><c><xref target="allocation"/></c>
4950*a58d3d2aSXin Li<c>intensity</c>    <c>uniform</c><c><xref target="allocation"/></c>
4951*a58d3d2aSXin Li<c>dual</c>         <c>{1, 1}/2</c><c></c>
4952*a58d3d2aSXin Li<c>fine energy</c>  <c><xref target="energy-decoding"/></c><c></c>
4953*a58d3d2aSXin Li<c>residual</c>     <c><xref target="PVQ-decoder"/></c><c></c>
4954*a58d3d2aSXin Li<c>anti-collapse</c><c>{1, 1}/2</c><c><xref target="anti-collapse"/></c>
4955*a58d3d2aSXin Li<c>finalize</c>     <c><xref target="energy-decoding"/></c><c></c>
4956*a58d3d2aSXin Li</texttable>
4957*a58d3d2aSXin Li
4958*a58d3d2aSXin Li<t>
4959*a58d3d2aSXin LiThe decoder extracts information from the range-coded bitstream in the order
4960*a58d3d2aSXin Lidescribed in <xref target='celt_symbols'/>. In some circumstances, it is
4961*a58d3d2aSXin Lipossible for a decoded value to be out of range due to a very small amount of redundancy
4962*a58d3d2aSXin Liin the encoding of large integers by the range coder.
4963*a58d3d2aSXin LiIn that case, the decoder should assume there has been an error in the coding,
4964*a58d3d2aSXin Lidecoding, or transmission and SHOULD take measures to conceal the error and/or report
4965*a58d3d2aSXin Lito the application that a problem has occurred. Such out of range errors cannot occur
4966*a58d3d2aSXin Liin the SILK layer.
4967*a58d3d2aSXin Li</t>
4968*a58d3d2aSXin Li
4969*a58d3d2aSXin Li<section anchor="transient-decoding" title="Transient Decoding">
4970*a58d3d2aSXin Li<t>
The "transient" flag indicates whether the frame uses a single long MDCT or several short MDCTs.
When it is set, the MDCT coefficients represent multiple
short MDCTs in the frame. When it is not set, the coefficients represent a single
long MDCT for the frame. The flag is coded in the bitstream with a probability of 1/8 of being set.
In addition to the global transient flag, a per-band
binary flag (tf_change) changes the time-frequency (tf) resolution independently in each band. The
change in tf resolution is defined in tf_select_table[][] in celt.c and depends
on the frame size, whether the transient flag is set, and the value of tf_select.
The tf_select flag uses a 1/2 probability, but is only decoded
if it can have an impact on the result, given the values of all per-band
tf_change flags.
4982*a58d3d2aSXin Li</t>
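<t>
To make the probabilities above concrete, the following sketch computes the
ideal coding cost of a symbol from its PDF entry, such as the {7, 1}/8 entry
listed for the transient flag in <xref target='celt_symbols'/>. The function
name is hypothetical, and the result is the information-theoretic cost, which a
real range coder only approximates.
</t>
<figure>
<artwork><![CDATA[
```python
import math


def symbol_cost_bits(freq, total):
    """Ideal cost in bits of a symbol whose probability is freq/total."""
    return -math.log2(freq / total)
```
]]></artwork>
</figure>
<t>
With these numbers, a set transient flag costs 3 bits, an unset one costs about
0.19 bits, and either value of tf_select costs 1 bit.
</t>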
4983*a58d3d2aSXin Li</section>
4984*a58d3d2aSXin Li
4985*a58d3d2aSXin Li<section anchor="energy-decoding" title="Energy Envelope Decoding">
4986*a58d3d2aSXin Li
4987*a58d3d2aSXin Li<t>
4988*a58d3d2aSXin LiIt is important to quantize the energy with sufficient resolution because
4989*a58d3d2aSXin Liany energy quantization error cannot be compensated for at a later
4990*a58d3d2aSXin Listage. Regardless of the resolution used for encoding the spectral shape of a band,
4991*a58d3d2aSXin Liit is perceptually important to preserve the energy in each band. CELT uses a
4992*a58d3d2aSXin Lithree-step coarse-fine-fine strategy for encoding the energy in the base-2 log
domain, as implemented in quant_bands.c.</t>
4994*a58d3d2aSXin Li
4995*a58d3d2aSXin Li<section anchor="coarse-energy-decoding" title="Coarse energy decoding">
4996*a58d3d2aSXin Li<t>
4997*a58d3d2aSXin LiCoarse quantization of the energy uses a fixed resolution of 6 dB
4998*a58d3d2aSXin Li(integer part of base-2 log). To minimize the bitrate, prediction is applied
4999*a58d3d2aSXin Liboth in time (using the previous frame) and in frequency (using the previous
5000*a58d3d2aSXin Libands). The part of the prediction that is based on the
5001*a58d3d2aSXin Liprevious frame can be disabled, creating an "intra" frame where the energy
5002*a58d3d2aSXin Liis coded without reference to prior frames. The decoder first reads the intra flag
5003*a58d3d2aSXin Lito determine what prediction is used.
5004*a58d3d2aSXin LiThe 2-D z-transform <xref target='z-transform'/> of
5005*a58d3d2aSXin Lithe prediction filter is:
5006*a58d3d2aSXin Li<figure align="center">
5007*a58d3d2aSXin Li<artwork align="center"><![CDATA[
5008*a58d3d2aSXin Li                            -1          -1
5009*a58d3d2aSXin Li              (1 - alpha*z_l  )*(1 - z_b  )
5010*a58d3d2aSXin LiA(z_l, z_b) = -----------------------------
5011*a58d3d2aSXin Li                                 -1
5012*a58d3d2aSXin Li                     1 - beta*z_b
5013*a58d3d2aSXin Li]]></artwork>
5014*a58d3d2aSXin Li</figure>
where b is the band index and l is the frame index. When not using intra energy,
the prediction coefficients depend on the frame size in use; when using intra energy,
they are alpha=0 and beta=4915/32768.
5018*a58d3d2aSXin LiThe time-domain prediction is based on the final fine quantization of the previous
5019*a58d3d2aSXin Liframe, while the frequency domain (within the current frame) prediction is based
5020*a58d3d2aSXin Lion coarse quantization only (because the fine quantization has not been computed
5021*a58d3d2aSXin Liyet). The prediction is clamped internally so that fixed point implementations with
5022*a58d3d2aSXin Lilimited dynamic range always remain in the same state as floating point implementations.
5023*a58d3d2aSXin LiWe approximate the ideal
5024*a58d3d2aSXin Liprobability distribution of the prediction error using a Laplace distribution
5025*a58d3d2aSXin Liwith separate parameters for each frame size in intra- and inter-frame modes. These
5026*a58d3d2aSXin Liparameters are held in the e_prob_model table in quant_bands.c.
The
coarse energy decoding is performed by unquant_coarse_energy() and
unquant_coarse_energy_impl() (quant_bands.c). The decoding of the Laplace-distributed values is
5030*a58d3d2aSXin Liimplemented in ec_laplace_decode() (laplace.c).
5031*a58d3d2aSXin Li</t>
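<t>
One way to read the prediction filter A(z_l, z_b) is as a recursion over bands
that the decoder inverts. The sketch below is derived from that transfer
function only: it works in floating point, omits the clamping and the Laplace
decoding described above, and uses hypothetical names. It is not the reference
code, and alpha and beta are passed in rather than taken from any table.
</t>
<figure>
<artwork><![CDATA[
```python
def decode_coarse_energy(residuals, prev_frame_energy, alpha, beta):
    """Invert A(z_l, z_b) = (1 - alpha/z_l)(1 - 1/z_b) / (1 - beta/z_b).

    residuals: decoded prediction errors, one per band
    prev_frame_energy: final energies of the previous frame, one per band
    """
    energy = []
    prev = 0.0  # frequency-domain prediction carried from lower bands
    for q, old in zip(residuals, prev_frame_energy):
        # time prediction + frequency prediction + decoded residual
        energy.append(alpha * old + prev + q)
        # the (1 - beta/z_b) numerator term removes beta*q from the carry
        prev += (1 - beta) * q
    return energy
```
]]></artwork>
</figure>
<t>
Setting alpha to zero removes every reference to the previous frame, which is
the behavior described above for intra frames.
</t>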
5032*a58d3d2aSXin Li
5033*a58d3d2aSXin Li</section>
5034*a58d3d2aSXin Li
5035*a58d3d2aSXin Li<section anchor="fine-energy-decoding" title="Fine energy quantization">
5036*a58d3d2aSXin Li<t>
5037*a58d3d2aSXin LiThe number of bits assigned to fine energy quantization in each band is determined
5038*a58d3d2aSXin Liby the bit allocation computation described in <xref target="allocation"></xref>.
Let B_i be the number of fine energy bits
for band i; the refinement is an integer f in the range [0,2**B_i-1]. The correction
applied to the coarse energy for a given f is (f+1/2)/2**B_i - 1/2. Fine
energy decoding is implemented in unquant_fine_energy() (quant_bands.c).
5043*a58d3d2aSXin Li</t>
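<t>
The mapping from the refinement f to the energy correction can be checked with
a short sketch. Exact fractions are used for clarity; the function name is
hypothetical.
</t>
<figure>
<artwork><![CDATA[
```python
from fractions import Fraction


def fine_energy_correction(f, bits):
    """Return (f + 1/2) / 2**bits - 1/2 as an exact fraction."""
    if not 0 <= f < (1 << bits):
        raise ValueError("f must lie in [0, 2**bits - 1]")
    # (f + 1/2) / 2**bits == (2f + 1) / 2**(bits + 1)
    return Fraction(2 * f + 1, 2 << bits) - Fraction(1, 2)
```
]]></artwork>
</figure>
<t>
With one bit, the two possible corrections are -1/4 and +1/4, so the
corrections are spread symmetrically around zero.
</t>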
5044*a58d3d2aSXin Li<t>
5045*a58d3d2aSXin LiWhen some bits are left "unused" after all other flags have been decoded, these bits
5046*a58d3d2aSXin Liare assigned to a "final" step of fine allocation. In effect, these bits are used
5047*a58d3d2aSXin Lito add one extra fine energy bit per band per channel. The allocation process
5048*a58d3d2aSXin Lidetermines two "priorities" for the final fine bits.
5049*a58d3d2aSXin LiAny remaining bits are first assigned only to bands of priority 0, starting
5050*a58d3d2aSXin Lifrom band 0 and going up. If all bands of priority 0 have received one bit per
5051*a58d3d2aSXin Lichannel, then bands of priority 1 are assigned an extra bit per channel,
5052*a58d3d2aSXin Listarting from band 0. If any bits are left after this, they are left unused.
5053*a58d3d2aSXin LiThis is implemented in unquant_energy_finalise() (quant_bands.c).
5054*a58d3d2aSXin Li</t>
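<t>
The two-pass priority order described above can be sketched as follows. This is
a simplified illustration with hypothetical names: it assumes a band is only
served while at least one bit per channel remains, and it ignores any per-band
limit on fine energy bits that an implementation may apply.
</t>
<figure>
<artwork><![CDATA[
```python
def assign_final_fine_bits(priorities, bits_left, channels):
    """Return (extra, bits_left); extra[b] is 1 if band b gets one more
    fine energy bit per channel."""
    extra = [0] * len(priorities)
    for prio in (0, 1):                      # priority 0 first, then 1
        for band, band_prio in enumerate(priorities):
            if bits_left < channels:         # not enough for every channel
                break
            if band_prio != prio:
                continue
            extra[band] = 1
            bits_left -= channels
    return extra, bits_left
```
]]></artwork>
</figure>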
5055*a58d3d2aSXin Li
5056*a58d3d2aSXin Li</section> <!-- fine energy -->
5057*a58d3d2aSXin Li
5058*a58d3d2aSXin Li</section> <!-- Energy decode -->
5059*a58d3d2aSXin Li
5060*a58d3d2aSXin Li<section anchor="allocation" title="Bit Allocation">
5061*a58d3d2aSXin Li
5062*a58d3d2aSXin Li<t>Because the bit allocation drives the decoding of the range-coder
5063*a58d3d2aSXin Listream, it MUST be recovered exactly so that identical coding decisions are
5064*a58d3d2aSXin Limade in the encoder and decoder. Any deviation from the reference's resulting
5065*a58d3d2aSXin Libit allocation will result in corrupted output, though implementers are
5066*a58d3d2aSXin Lifree to implement the procedure in any way which produces identical results.</t>
5067*a58d3d2aSXin Li
5068*a58d3d2aSXin Li<t>The per-band gain-shape structure of the CELT layer ensures that using
5069*a58d3d2aSXin Li the same number of bits for the spectral shape of a band in every frame will
5070*a58d3d2aSXin Li result in a roughly constant signal-to-noise ratio in that band.
5071*a58d3d2aSXin LiThis results in coding noise that has the same spectral envelope as the signal.
5072*a58d3d2aSXin LiThe masking curve produced by a standard psychoacoustic model also closely
5073*a58d3d2aSXin Li follows the spectral envelope of the signal.
5074*a58d3d2aSXin LiThis structure means that the ideal allocation is more consistent from frame to
5075*a58d3d2aSXin Li frame than it is for other codecs without an equivalent structure, and that a
5076*a58d3d2aSXin Li fixed allocation provides fairly consistent perceptual
5077*a58d3d2aSXin Li performance&nbsp;<xref target='Valin2010'/>.</t>
5078*a58d3d2aSXin Li
5079*a58d3d2aSXin Li<t>Many codecs transmit significant amounts of side information to control the
5080*a58d3d2aSXin Li bit allocation within a frame.
5081*a58d3d2aSXin LiOften this control is only indirect, and must be exercised carefully to
5082*a58d3d2aSXin Li achieve the desired rate constraints.
5083*a58d3d2aSXin LiThe CELT layer, however, can adapt over a very wide range of rates, and thus
5084*a58d3d2aSXin Li has a large number of codebook sizes to choose from for each band.
5085*a58d3d2aSXin LiExplicitly signaling the size of each of these codebooks would impose
5086*a58d3d2aSXin Li considerable overhead, even though the allocation is relatively static from
5087*a58d3d2aSXin Li frame to frame.
5088*a58d3d2aSXin LiThis is because all of the information required to compute these codebook sizes
5089*a58d3d2aSXin Li must be derived from a single frame by itself, in order to retain robustness
5090*a58d3d2aSXin Li to packet loss, so the signaling cannot take advantage of knowledge of the
5091*a58d3d2aSXin Li allocation in neighboring frames.
5092*a58d3d2aSXin LiThis problem is exacerbated in low-latency (small frame size) applications,
5093*a58d3d2aSXin Li which would include this overhead in every frame.</t>
5094*a58d3d2aSXin Li
5095*a58d3d2aSXin Li<t>For this reason, in the MDCT mode Opus uses a primarily implicit bit
5096*a58d3d2aSXin Liallocation. The available bitstream capacity is known in advance to both
5097*a58d3d2aSXin Lithe encoder and decoder without additional signaling, ultimately from the
5098*a58d3d2aSXin Lipacket sizes expressed by a higher-level protocol. Using this information,
5099*a58d3d2aSXin Lithe codec interpolates an allocation from a hard-coded table.</t>
5100*a58d3d2aSXin Li
5101*a58d3d2aSXin Li<t>While the band-energy structure effectively models intra-band masking,
5102*a58d3d2aSXin Liit ignores the weaker inter-band masking, band-temporal masking, and
5103*a58d3d2aSXin Liother less significant perceptual effects. While these effects can
5104*a58d3d2aSXin Lioften be ignored, they can become significant for particular samples. One
5105*a58d3d2aSXin Limechanism available to encoders would be to simply increase the overall
5106*a58d3d2aSXin Lirate for these frames, but this is not possible in a constant rate mode
5107*a58d3d2aSXin Liand can be fairly inefficient. As a result three explicitly signaled
5108*a58d3d2aSXin Limechanisms are provided to alter the implicit allocation:</t>
5109*a58d3d2aSXin Li
5110*a58d3d2aSXin Li<t>
5111*a58d3d2aSXin Li<list style="symbols">
5112*a58d3d2aSXin Li<t>Band boost</t>
5113*a58d3d2aSXin Li<t>Allocation trim</t>
5114*a58d3d2aSXin Li<t>Band skipping</t>
5115*a58d3d2aSXin Li</list>
5116*a58d3d2aSXin Li</t>
5117*a58d3d2aSXin Li
5118*a58d3d2aSXin Li<t>The first of these mechanisms, band boost, allows an encoder to boost
5119*a58d3d2aSXin Lithe allocation in specific bands. The second, allocation trim, works by
5120*a58d3d2aSXin Libiasing the overall allocation towards higher or lower frequency bands. The third, band
5121*a58d3d2aSXin Liskipping, selects which low-precision high frequency bands
5122*a58d3d2aSXin Liwill be allocated no shape bits at all.</t>
5123*a58d3d2aSXin Li
5124*a58d3d2aSXin Li<t>In stereo mode there are two additional parameters
5125*a58d3d2aSXin Lipotentially coded as part of the allocation procedure: a parameter to allow the
5126*a58d3d2aSXin Liselective elimination of allocation for the 'side' (i.e., intensity stereo) in jointly coded bands,
5127*a58d3d2aSXin Liand a flag to deactivate joint coding (i.e., dual stereo). These values are not signaled if
5128*a58d3d2aSXin Lithey would be meaningless in the overall context of the allocation.</t>
5129*a58d3d2aSXin Li
5130*a58d3d2aSXin Li<t>Because every signaled adjustment increases overhead and implementation
5131*a58d3d2aSXin Licomplexity, none were included speculatively: the reference encoder makes use
5132*a58d3d2aSXin Liof all of these mechanisms. While the decision logic in the reference was
5133*a58d3d2aSXin Lifound to be effective enough to justify the overhead and complexity, further
5134*a58d3d2aSXin Lianalysis techniques may be discovered which increase the effectiveness of these
5135*a58d3d2aSXin Liparameters. As with other signaled parameters, an encoder is free to choose the
5136*a58d3d2aSXin Livalues in any manner, but unless a technique is known to deliver superior
5137*a58d3d2aSXin Liperceptual results the methods used by the reference implementation should be
5138*a58d3d2aSXin Liused.</t>
5139*a58d3d2aSXin Li
5140*a58d3d2aSXin Li<t>The allocation process consists of the following steps: determining the per-band
5141*a58d3d2aSXin Limaximum allocation vector, decoding the boosts, decoding the tilt, determining
5142*a58d3d2aSXin Lithe remaining capacity of the frame, searching the mode table for the
5143*a58d3d2aSXin Lientry nearest but not exceeding the available space (subject to the tilt, boosts, band
5144*a58d3d2aSXin Limaximums, and band minimums), linear interpolation, reallocation of
5145*a58d3d2aSXin Liunused bits with concurrent skip decoding, determination of the
5146*a58d3d2aSXin Lifine-energy vs. shape split, and final reallocation. This process results
5147*a58d3d2aSXin Liin a per-band shape allocation (in 1/8th bit units), a per-band fine-energy
5148*a58d3d2aSXin Liallocation (in 1 bit per channel units), a set of band priorities for
5149*a58d3d2aSXin Licontrolling the use of remaining bits at the end of the frame, and a
5150*a58d3d2aSXin Liremaining balance of unallocated space, which is usually zero except
5151*a58d3d2aSXin Liat very high rates.</t>
5152*a58d3d2aSXin Li
5153*a58d3d2aSXin Li<t>
5154*a58d3d2aSXin LiThe "static" bit allocation (in 1/8 bits) for a quality q, excluding the minimums, maximums,
5155*a58d3d2aSXin Litilt and boosts, is equal to channels*N*alloc[band][q]&lt;&lt;LM&gt;&gt;2, where
5156*a58d3d2aSXin Lialloc[][] is given in <xref target="static_alloc"/> and LM=log2(frame_size/120). The allocation
5157*a58d3d2aSXin Liis obtained by linearly interpolating between two values of q (in steps of 1/64) to find the
5158*a58d3d2aSXin Lihighest allocation that does not exceed the number of bits remaining.
5159*a58d3d2aSXin Li</t>
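<t>
The static allocation expression can be evaluated directly, as in the sketch
below. The function names are hypothetical, N is simply taken as an argument,
and the interpolation between adjacent values of q is not shown.
</t>
<figure>
<artwork><![CDATA[
```python
def lm_from_frame_size(frame_size):
    """LM = log2(frame_size / 120) for the frame sizes 120, 240, 480, 960."""
    if frame_size not in (120, 240, 480, 960):
        raise ValueError("unsupported frame size")
    return (frame_size // 120).bit_length() - 1


def static_allocation(channels, n_bins, alloc_entry, lm):
    """channels*N*alloc[band][q] << LM >> 2, in 1/8 bit units."""
    return ((channels * n_bins * alloc_entry) << lm) >> 2
```
]]></artwork>
</figure>
<t>
For example, a table entry of 110 with two channels, N=4 and LM=3 gives
1760 eighth bits, that is, 220 bits.
</t>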
5160*a58d3d2aSXin Li
5161*a58d3d2aSXin Li<texttable anchor="static_alloc"
5162*a58d3d2aSXin Li title="CELT Static Allocation Table">
5163*a58d3d2aSXin Li <preamble>Rows indicate the MDCT bands, columns are the different quality (q) parameters. The units are 1/32 bit per MDCT bin.</preamble>
5164*a58d3d2aSXin Li<ttcol align="right">0</ttcol>
5165*a58d3d2aSXin Li<ttcol align="right">1</ttcol>
5166*a58d3d2aSXin Li<ttcol align="right">2</ttcol>
5167*a58d3d2aSXin Li<ttcol align="right">3</ttcol>
5168*a58d3d2aSXin Li<ttcol align="right">4</ttcol>
5169*a58d3d2aSXin Li<ttcol align="right">5</ttcol>
5170*a58d3d2aSXin Li<ttcol align="right">6</ttcol>
5171*a58d3d2aSXin Li<ttcol align="right">7</ttcol>
5172*a58d3d2aSXin Li<ttcol align="right">8</ttcol>
5173*a58d3d2aSXin Li<ttcol align="right">9</ttcol>
5174*a58d3d2aSXin Li<ttcol align="right">10</ttcol>
5175*a58d3d2aSXin Li<c>0</c><c>90</c><c>110</c><c>118</c><c>126</c><c>134</c><c>144</c><c>152</c><c>162</c><c>172</c><c>200</c>
5176*a58d3d2aSXin Li<c>0</c><c>80</c><c>100</c><c>110</c><c>119</c><c>127</c><c>137</c><c>145</c><c>155</c><c>165</c><c>200</c>
5177*a58d3d2aSXin Li<c>0</c><c>75</c><c>90</c><c>103</c><c>112</c><c>120</c><c>130</c><c>138</c><c>148</c><c>158</c><c>200</c>
5178*a58d3d2aSXin Li<c>0</c><c>69</c><c>84</c><c>93</c><c>104</c><c>114</c><c>124</c><c>132</c><c>142</c><c>152</c><c>200</c>
5179*a58d3d2aSXin Li<c>0</c><c>63</c><c>78</c><c>86</c><c>95</c><c>103</c><c>113</c><c>123</c><c>133</c><c>143</c><c>200</c>
5180*a58d3d2aSXin Li<c>0</c><c>56</c><c>71</c><c>80</c><c>89</c><c>97</c><c>107</c><c>117</c><c>127</c><c>137</c><c>200</c>
5181*a58d3d2aSXin Li<c>0</c><c>49</c><c>65</c><c>75</c><c>83</c><c>91</c><c>101</c><c>111</c><c>121</c><c>131</c><c>200</c>
5182*a58d3d2aSXin Li<c>0</c><c>40</c><c>58</c><c>70</c><c>78</c><c>85</c><c>95</c><c>105</c><c>115</c><c>125</c><c>200</c>
5183*a58d3d2aSXin Li<c>0</c><c>34</c><c>51</c><c>65</c><c>72</c><c>78</c><c>88</c><c>98</c><c>108</c><c>118</c><c>198</c>
5184*a58d3d2aSXin Li<c>0</c><c>29</c><c>45</c><c>59</c><c>66</c><c>72</c><c>82</c><c>92</c><c>102</c><c>112</c><c>193</c>
5185*a58d3d2aSXin Li<c>0</c><c>20</c><c>39</c><c>53</c><c>60</c><c>66</c><c>76</c><c>86</c><c>96</c><c>106</c><c>188</c>
5186*a58d3d2aSXin Li<c>0</c><c>18</c><c>32</c><c>47</c><c>54</c><c>60</c><c>70</c><c>80</c><c>90</c><c>100</c><c>183</c>
5187*a58d3d2aSXin Li<c>0</c><c>10</c><c>26</c><c>40</c><c>47</c><c>54</c><c>64</c><c>74</c><c>84</c><c>94</c><c>178</c>
5188*a58d3d2aSXin Li<c>0</c><c>0</c><c>20</c><c>31</c><c>39</c><c>47</c><c>57</c><c>67</c><c>77</c><c>87</c><c>173</c>
5189*a58d3d2aSXin Li<c>0</c><c>0</c><c>12</c><c>23</c><c>32</c><c>41</c><c>51</c><c>61</c><c>71</c><c>81</c><c>168</c>
5190*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>15</c><c>25</c><c>35</c><c>45</c><c>55</c><c>65</c><c>75</c><c>163</c>
5191*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>4</c><c>17</c><c>29</c><c>39</c><c>49</c><c>59</c><c>69</c><c>158</c>
5192*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>0</c><c>12</c><c>23</c><c>33</c><c>43</c><c>53</c><c>63</c><c>153</c>
5193*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>0</c><c>1</c><c>16</c><c>26</c><c>36</c><c>46</c><c>56</c><c>148</c>
5194*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>0</c><c>0</c><c>10</c><c>15</c><c>20</c><c>30</c><c>45</c><c>129</c>
5195*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>0</c><c>0</c><c>1</c><c>1</c><c>1</c><c>1</c><c>20</c><c>104</c>
5196*a58d3d2aSXin Li</texttable>
5197*a58d3d2aSXin Li
5198*a58d3d2aSXin Li<t>The maximum allocation vector is an approximation of the maximum space
5199*a58d3d2aSXin Lithat can be used by each band for a given mode. The value is
5200*a58d3d2aSXin Liapproximate because the shape encoding is variable rate (due
5201*a58d3d2aSXin Lito entropy coding of splitting parameters). Setting the maximum too low reduces the
5202*a58d3d2aSXin Limaximum achievable quality in a band while setting it too high
5203*a58d3d2aSXin Limay result in waste: bitstream capacity available at the end
of the frame which cannot be put to any use. The maximums
5205*a58d3d2aSXin Lispecified by the codec reflect the average maximum. In the reference
5206*a58d3d2aSXin Liimplementation, the maximums in bits/sample are precomputed in a static table
5207*a58d3d2aSXin Li(see cache_caps50[] in static_modes_float.h) for each band,
5208*a58d3d2aSXin Lifor each value of LM, and for both mono and stereo.
5209*a58d3d2aSXin Li
5210*a58d3d2aSXin LiImplementations are expected
5211*a58d3d2aSXin Lito simply use the same table data, but the procedure for generating
5212*a58d3d2aSXin Lithis table is included in rate.c as part of compute_pulse_cache().</t>
5213*a58d3d2aSXin Li
<t>To convert the values in cache.caps into the actual maximums: first
set nbBands to the maximum number of bands for this mode, and stereo to
zero if stereo is not in use and one otherwise. For each band, set N
to the number of MDCT bins covered by the band (for one channel), set LM
to the shift value for the frame size,
and set i to nbBands*(2*LM+stereo) plus the index of the band. Then take
the i-th entry of cache.caps, add 64, multiply by the number of channels
in the current frame (one or two) and by N, and divide the result by 4
using integer division. The resulting vector will be called
cap[]. The elements fit in signed 16-bit integers but do not fit in 8 bits.
This procedure is implemented in the reference in the function init_caps() in celt.c.
5225*a58d3d2aSXin Li</t>
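<t>
A minimal sketch of this conversion is shown below. It assumes that the table
is laid out as consecutive runs of nbBands entries, so that the band index is
added to nbBands*(2*LM+stereo); the function name is hypothetical, and the
table contents must come from the reference implementation.
</t>
<figure>
<artwork><![CDATA[
```python
def band_cap(caps, nb_bands, lm, stereo, band, n_bins):
    """Maximum allocation for one band.

    caps: flat table of per-band cap entries (contents from the reference)
    stereo: True if the frame has two channels
    n_bins: MDCT bins covered by the band, for one channel
    """
    channels = 2 if stereo else 1
    # assumed layout: one run of nb_bands entries per (LM, stereo) pair
    i = nb_bands * (2 * lm + (1 if stereo else 0)) + band
    return (caps[i] + 64) * channels * n_bins // 4
```
]]></artwork>
</figure>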
5226*a58d3d2aSXin Li
5227*a58d3d2aSXin Li<t>The band boosts are represented by a series of binary symbols which
5228*a58d3d2aSXin Liare entropy coded with very low probability. Each band can potentially be boosted
5229*a58d3d2aSXin Limultiple times, subject to the frame actually having enough room to obey
5230*a58d3d2aSXin Lithe boost and having enough room to code the boost symbol. The default
5231*a58d3d2aSXin Licoding cost for a boost starts out at six bits (probability p=1/64), but subsequent boosts
5232*a58d3d2aSXin Liin a band cost only a single bit and every time a band is boosted the
5233*a58d3d2aSXin Liinitial cost is reduced (down to a minimum of two bits, or p=1/4). Since the initial
5234*a58d3d2aSXin Licost of coding a boost is 6 bits, the coding cost of the boost symbols when
5235*a58d3d2aSXin Licompletely unused is 0.48 bits/frame for a 21 band mode (21*-log2(1-1/2**6)).</t>
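<t>
The idle cost quoted above follows from the probability of the "no boost"
symbol, as the following sketch shows. The function name is hypothetical.
</t>
<figure>
<artwork><![CDATA[
```python
import math


def idle_boost_cost_bits(num_bands, logp=6):
    """Cost in bits of signaling "no boost" once in each band, when a
    boost has probability 2**-logp."""
    return num_bands * -math.log2(1 - 2.0 ** -logp)
```
]]></artwork>
</figure>
<t>
For 21 bands and an initial cost of six bits, this evaluates to roughly 0.48
bits per frame.
</t>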
5236*a58d3d2aSXin Li
5237*a58d3d2aSXin Li<t>To decode the band boosts: First set 'dynalloc_logp' to 6, the initial
5238*a58d3d2aSXin Liamount of storage required to signal a boost in bits, 'total_bits' to the
5239*a58d3d2aSXin Lisize of the frame in 8th bits, 'total_boost' to zero, and 'tell' to the total number
5240*a58d3d2aSXin Liof 8th bits decoded
5241*a58d3d2aSXin Liso far. For each band from the coding start (0 normally, but 17 in Hybrid mode)
5242*a58d3d2aSXin Lito the coding end (which changes depending on the signaled bandwidth), the boost quanta
5243*a58d3d2aSXin Liin units of 1/8 bit is calculated as quanta = min(8*N, max(48, N)).
5244*a58d3d2aSXin LiThis represents a boost step size of six bits, subject to a lower limit of
5245*a58d3d2aSXin Li1/8th&nbsp;bit/sample and an upper limit of 1&nbsp;bit/sample.
Set 'boost' to zero and 'dynalloc_loop_logp'
to dynalloc_logp. While dynalloc_loop_logp (the current worst case symbol cost) in
8th bits plus tell is less than total_bits plus total_boost and boost is less than cap[] for this
band: decode a bit from the bitstream with dynalloc_loop_logp as the cost
of a one and update tell to reflect the current used capacity; if the decoded value
is zero, break the loop; otherwise, add quanta to boost and total_boost, subtract quanta from
total_bits, and set dynalloc_loop_logp to 1. When the while loop finishes,
boost contains the boost for this band. If boost is non-zero and dynalloc_logp
is greater than 2, decrease dynalloc_logp by one. Once this process has been
5255*a58d3d2aSXin Liexecuted on all bands, the band boosts have been decoded. This procedure
5256*a58d3d2aSXin Liis implemented around line 2474 of celt.c.</t>
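<t>
Two pieces of this procedure do not depend on the range decoder and can be
sketched in isolation: the boost step size for a band and the update of the
initial symbol cost after a band has been processed. The function names are
hypothetical, and the decoding loop itself is not shown.
</t>
<figure>
<artwork><![CDATA[
```python
def boost_quanta(n_bins):
    """Boost step in 1/8 bit units: six bits, but at least 1/8 bit per
    sample and at most 1 bit per sample."""
    return min(8 * n_bins, max(48, n_bins))


def next_dynalloc_logp(dynalloc_logp, boost):
    """Lower the initial boost cost by one bit after a boosted band,
    down to a minimum of two bits."""
    if boost > 0 and dynalloc_logp > 2:
        return dynalloc_logp - 1
    return dynalloc_logp
```
]]></artwork>
</figure>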
5257*a58d3d2aSXin Li
5258*a58d3d2aSXin Li<t>At very low rates it is possible that there won't be enough available
5259*a58d3d2aSXin Lispace to execute the inner loop even once. In these cases band boost
5260*a58d3d2aSXin Liis not possible but its overhead is completely eliminated. Because of the
5261*a58d3d2aSXin Lihigh cost of band boost when activated, a reasonable encoder should not be
5262*a58d3d2aSXin Liusing it at very low rates. The reference implements its dynalloc decision
5263*a58d3d2aSXin Lilogic around line 1304 of celt.c.</t>
5264*a58d3d2aSXin Li
<t>The allocation trim is an integer value from 0 to 10. The default value of
5266*a58d3d2aSXin Li5 indicates no trim. The trim parameter is entropy coded in order to
5267*a58d3d2aSXin Lilower the coding cost of less extreme adjustments. Values lower than
5268*a58d3d2aSXin Li5 bias the allocation towards lower frequencies and values above 5
5269*a58d3d2aSXin Libias it towards higher frequencies. Like other signaled parameters, signaling
5270*a58d3d2aSXin Liof the trim is gated so that it is not included if there is insufficient space
5271*a58d3d2aSXin Liavailable in the bitstream. To decode the trim, first set
5272*a58d3d2aSXin Lithe trim value to 5, then if and only if the count of decoded 8th bits so far (ec_tell_frac)
5273*a58d3d2aSXin Liplus 48 (6 bits) is less than or equal to the total frame size in 8th
5274*a58d3d2aSXin Libits minus total_boost (a product of the above band boost procedure),
5275*a58d3d2aSXin Lidecode the trim value using the PDF in <xref target="celt_trim_pdf"/>.</t>
5276*a58d3d2aSXin Li
5277*a58d3d2aSXin Li<texttable anchor="celt_trim_pdf" title="PDF for the Trim">
5278*a58d3d2aSXin Li<ttcol>PDF</ttcol>
<c>{2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2}/128</c>
5280*a58d3d2aSXin Li</texttable>
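<t>
The gating condition for the trim and the cost of each trim value can be
sketched as follows. The PDF is the eleven-entry one listed for "alloc. trim"
in <xref target='celt_symbols'/>; the function names are hypothetical, and the
costs are ideal values rather than exact range coder behavior.
</t>
<figure>
<artwork><![CDATA[
```python
import math

# PDF for trim values 0..10, out of 128
TRIM_PDF = [2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2]


def trim_is_coded(tell_frac, frame_bits_8th, total_boost):
    """True if the trim symbol is present; otherwise the trim stays at 5.
    All arguments are in 1/8 bit units."""
    return tell_frac + 48 <= frame_bits_8th - total_boost


def trim_cost_bits(trim):
    """Ideal cost in bits of coding a given trim value."""
    return -math.log2(TRIM_PDF[trim] / 128)
```
]]></artwork>
</figure>
<t>
The most extreme trim values cost six bits each under this PDF, which is
consistent with the 48 eighth bits required by the gating condition, while the
default value of 5 costs about 1.5 bits.
</t>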
5281*a58d3d2aSXin Li
<t>For 10 ms and 20 ms frames that use short blocks and have at least LM+2 bits left prior to
the allocation process, one anti-collapse bit is reserved in the allocation process so it can
be decoded later. Following the anti-collapse reservation, one bit is reserved for skip if available.</t>
5285*a58d3d2aSXin Li
5286*a58d3d2aSXin Li<t>For stereo frames, bits are reserved for intensity stereo and for dual stereo. Intensity stereo
requires ilog2(end-start) bits. Those bits are reserved if there are enough bits left. Following this, one
5288*a58d3d2aSXin Libit is reserved for dual stereo if available.</t>
5289*a58d3d2aSXin Li
5290*a58d3d2aSXin Li
5291*a58d3d2aSXin Li<t>The allocation computation begins by setting up some initial conditions.
5292*a58d3d2aSXin Li'total' is set to the remaining available 8th bits, computed by taking the
5293*a58d3d2aSXin Lisize of the coded frame times 8 and subtracting ec_tell_frac(). From this value, one (8th bit)
5294*a58d3d2aSXin Liis subtracted to ensure that the resulting allocation will be conservative. 'anti_collapse_rsv'
5295*a58d3d2aSXin Liis set to 8 (8th bits) if and only if the frame is a transient, LM is greater than 1, and total is
5296*a58d3d2aSXin Ligreater than or equal to (LM+2) * 8. Total is then decremented by anti_collapse_rsv and clamped
5297*a58d3d2aSXin Lito be equal to or greater than zero. 'skip_rsv' is set to 8 (8th bits) if total is greater than
5298*a58d3d2aSXin Li8, otherwise it is zero. Total is then decremented by skip_rsv. This reserves space for the
5299*a58d3d2aSXin Lifinal skipping flag.</t>
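<t>
The initial conditions above can be sketched as follows. The comparisons follow
the wording of this section literally, the function and argument names are
hypothetical, and the reference implementation remains the authority on the
exact boundary cases.
</t>
<figure>
<artwork><![CDATA[
```python
def reserve_anti_collapse_and_skip(frame_bits_8th, tell_frac,
                                   is_transient, lm):
    """Return (total, anti_collapse_rsv, skip_rsv), all in 1/8 bits.

    frame_bits_8th: size of the coded frame in 1/8 bit units
    tell_frac: 1/8 bits consumed so far
    """
    total = frame_bits_8th - tell_frac - 1   # one 1/8 bit safety margin
    anti_collapse_rsv = 0
    if is_transient and lm > 1 and total >= (lm + 2) * 8:
        anti_collapse_rsv = 8
    total = max(total - anti_collapse_rsv, 0)
    skip_rsv = 8 if total > 8 else 0         # "greater than 8", as worded
    total -= skip_rsv
    return total, anti_collapse_rsv, skip_rsv
```
]]></artwork>
</figure>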
5300*a58d3d2aSXin Li
5301*a58d3d2aSXin Li<t>If the current frame is stereo, intensity_rsv is set to the conservative log2 in 8th bits
5302*a58d3d2aSXin Liof the number of coded bands for this frame (given by the table LOG2_FRAC_TABLE in rate.c). If
5303*a58d3d2aSXin Liintensity_rsv is greater than total then intensity_rsv is set to zero. Otherwise total is
5304*a58d3d2aSXin Lidecremented by intensity_rsv, and if total is still greater than 8, dual_stereo_rsv is
5305*a58d3d2aSXin Liset to 8 and total is decremented by dual_stereo_rsv.</t>
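<t>
The stereo reservations can be sketched in the same style. The cost of the
intensity parameter is passed in as an argument rather than looked up, because
the contents of LOG2_FRAC_TABLE are not reproduced here; the names are
hypothetical and the comparisons again follow the wording above.
</t>
<figure>
<artwork><![CDATA[
```python
def reserve_stereo(total, intensity_cost):
    """Return (total, intensity_rsv, dual_stereo_rsv), all in 1/8 bits.

    intensity_cost: conservative log2 of the number of coded bands,
                    in 1/8 bits, as given by the reference table
    """
    intensity_rsv = intensity_cost
    dual_stereo_rsv = 0
    if intensity_rsv > total:
        intensity_rsv = 0                    # cannot afford the parameter
    else:
        total -= intensity_rsv
        if total > 8:                        # "still greater than 8"
            dual_stereo_rsv = 8
            total -= dual_stereo_rsv
    return total, intensity_rsv, dual_stereo_rsv
```
]]></artwork>
</figure>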
5306*a58d3d2aSXin Li
<t>The allocation process then computes a vector representing the hard minimum allocation that
5308*a58d3d2aSXin Liany band will receive for shape. This minimum is higher than the technical limit of the PVQ
5309*a58d3d2aSXin Liprocess, but very low rate allocations produce an excessively sparse spectrum and these bands
5310*a58d3d2aSXin Liare better served by having no allocation at all. For each coded band, set thresh[band] to
5311*a58d3d2aSXin Litwenty-four times the number of MDCT bins in the band and divide by 16. If 8 times the number
5312*a58d3d2aSXin Liof channels is greater, use that instead. This sets the minimum allocation to one bit per channel
5313*a58d3d2aSXin Lior 48 128th bits per MDCT bin, whichever is greater. The band-size dependent part of this
5314*a58d3d2aSXin Livalue is not scaled by the channel count, because at the very low rates where this limit is
5315*a58d3d2aSXin Liapplicable there will usually be no bits allocated to the side.</t>
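<t>The threshold rule above can be sketched in a few lines of Python. The name shape_thresholds and the band_bins argument (the number of MDCT bins in each coded band) are hypothetical, and truncating integer division is an assumption about how "divide by 16" is rounded.</t>
<figure>
<artwork><![CDATA[
```python
def shape_thresholds(band_bins, channels):
    """Per-band minimum shape allocation, in 1/8 bit units (sketch)."""
    thresh = []
    for bins in band_bins:
        size_part = (24 * bins) // 16   # band-size dependent part
        channel_part = 8 * channels     # one bit per channel
        thresh.append(max(size_part, channel_part))
    return thresh
```
]]></artwork>
</figure>
<t>For narrow bands the per-channel floor dominates, while for wide bands the band-size term takes over.</t>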
5316*a58d3d2aSXin Li
5317*a58d3d2aSXin Li<t>The previously decoded allocation trim is used to derive a vector of per-band adjustments,
5318*a58d3d2aSXin Li'trim_offsets[]'. For each coded band take the alloc_trim and subtract 5 and LM. Then multiply
5319*a58d3d2aSXin Lithe result by the number of channels, the number of MDCT bins in the shortest frame size for this mode,
5320*a58d3d2aSXin Lithe number of remaining bands, 2**LM, and 8. Then divide this value by 64. Finally, if the
5321*a58d3d2aSXin Linumber of MDCT bins in the band per channel is only one, 8 times the number of channels is subtracted
5322*a58d3d2aSXin Liin order to diminish the allocation by one bit, because width 1 bands receive greater benefit
5323*a58d3d2aSXin Lifrom the coarse energy coding.</t>
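<t>A minimal Python sketch of the trim offset computation follows. The function name and arguments are hypothetical; short_band_bins is assumed to hold the number of MDCT bins of each coded band at the shortest frame size, the "number of remaining bands" is assumed to mean the bands after the current one, and the division by 64 is assumed to round toward negative infinity.</t>
<figure>
<artwork><![CDATA[
```python
def trim_offsets(alloc_trim, lm, channels, short_band_bins):
    """Per-band allocation adjustments in 1/8 bit units (sketch)."""
    n_bands = len(short_band_bins)
    offsets = []
    for j, bins in enumerate(short_band_bins):
        remaining = n_bands - j - 1  # assumed meaning of "remaining bands"
        value = (alloc_trim - 5 - lm) * channels * bins * remaining
        value = value * (1 << lm) * 8
        offset = value >> 6          # divide by 64, flooring negatives
        # Width-1 bands give up one bit per channel.
        if (bins << lm) == 1:
            offset -= 8 * channels
        offsets.append(offset)
    return offsets
```
]]></artwork>
</figure>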
5324*a58d3d2aSXin Li
5325*a58d3d2aSXin Li
5326*a58d3d2aSXin Li</section>
5327*a58d3d2aSXin Li
5328*a58d3d2aSXin Li<section anchor="PVQ-decoder" title="Shape Decoding">
5329*a58d3d2aSXin Li<t>
5330*a58d3d2aSXin LiIn each band, the normalized "shape" is encoded
5331*a58d3d2aSXin Liusing a vector quantization scheme called a "pyramid vector quantizer".
5332*a58d3d2aSXin Li</t>
5333*a58d3d2aSXin Li
5334*a58d3d2aSXin Li<t>In
5335*a58d3d2aSXin Lithe simplest case, the number of bits allocated in
5336*a58d3d2aSXin Li<xref target="allocation"></xref> is converted to a number of pulses as described
5337*a58d3d2aSXin Liby <xref target="bits-pulses"></xref>. Knowing the number of pulses and the
5338*a58d3d2aSXin Linumber of samples in the band, the decoder calculates the size of the codebook
5339*a58d3d2aSXin Lias detailed in <xref target="cwrs-decoder"></xref>. The size is used to decode
5340*a58d3d2aSXin Lian unsigned integer (uniform probability model), which is the codeword index.
5341*a58d3d2aSXin LiThis index is converted into the corresponding vector as explained in
5342*a58d3d2aSXin Li<xref target="cwrs-decoder"></xref>. This vector is then scaled to unit norm.
5343*a58d3d2aSXin Li</t>
5344*a58d3d2aSXin Li
5345*a58d3d2aSXin Li<section anchor="bits-pulses" title="Bits to Pulses">
5346*a58d3d2aSXin Li<t>
5347*a58d3d2aSXin LiAlthough the allocation is performed in 1/8th bit units, the quantization requires
5348*a58d3d2aSXin Lian integer number of pulses K. To do this, the encoder searches for the value
5349*a58d3d2aSXin Liof K that produces the number of bits nearest to the allocated value
5350*a58d3d2aSXin Li(rounding down if exactly halfway between two values), not to exceed
5351*a58d3d2aSXin Lithe total number of bits available. For efficiency reasons, the search is performed against a
5352*a58d3d2aSXin Liprecomputed allocation table which only permits some K values for each N. The number of
5353*a58d3d2aSXin Licodebook entries can be computed as explained in <xref target="cwrs-decoder"></xref>. The difference
5354*a58d3d2aSXin Libetween the number of bits allocated and the number of bits used is accumulated to a
5355*a58d3d2aSXin Li"balance" (initialized to zero) that helps adjust the
5356*a58d3d2aSXin Liallocation for the next bands. One third of the balance is applied to the
5357*a58d3d2aSXin Libit allocation of each band to help achieve the target allocation. The only
5358*a58d3d2aSXin Liexceptions are the band before the last and the last band, for which half the balance
5359*a58d3d2aSXin Liand the whole balance are applied, respectively.
5360*a58d3d2aSXin Li</t>
5361*a58d3d2aSXin Li</section>
5362*a58d3d2aSXin Li
5363*a58d3d2aSXin Li<section anchor="cwrs-decoder" title="PVQ Decoding">
5364*a58d3d2aSXin Li
5365*a58d3d2aSXin Li<t>
5366*a58d3d2aSXin LiDecoding of PVQ vectors is implemented in decode_pulses() (cwrs.c).
5367*a58d3d2aSXin LiThe unique codeword index is decoded as a uniformly-distributed integer value between 0 and
5368*a58d3d2aSXin LiV(N,K)-1, where V(N,K) is the number of possible combinations of K pulses in
5369*a58d3d2aSXin LiN samples. The index is then converted to a vector in the same way specified in
5370*a58d3d2aSXin Li<xref target="PVQ"></xref>. The indexing is based on the calculation of V(N,K)
5371*a58d3d2aSXin Li(denoted N(L,K) in <xref target="PVQ"></xref>).
5372*a58d3d2aSXin Li</t>
5373*a58d3d2aSXin Li
5374*a58d3d2aSXin Li<t>
5375*a58d3d2aSXin Li The number of combinations can be computed recursively as
5376*a58d3d2aSXin LiV(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0, K != 0.
5377*a58d3d2aSXin LiThere are many different ways to compute V(N,K), including precomputed tables and direct
5378*a58d3d2aSXin Liuse of the recursive formulation. The reference implementation applies the recursive
5379*a58d3d2aSXin Liformulation one line (or column) at a time to save on memory use,
5380*a58d3d2aSXin Lialong with an alternate,
5381*a58d3d2aSXin Liunivariate recurrence to initialize an arbitrary line, and direct
5382*a58d3d2aSXin Lipolynomial solutions for small N. All of these methods are
5383*a58d3d2aSXin Liequivalent, and have different trade-offs in speed, memory usage, and
5384*a58d3d2aSXin Licode size. Implementations MAY use any methods they like, as long as
5385*a58d3d2aSXin Lithey are equivalent to the mathematical definition.
5386*a58d3d2aSXin Li</t>
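<t>For illustration, the recursive definition can be written directly in Python with memoization. This is only one of the equivalent methods mentioned above and is not how the reference implementation computes V(N,K); the function name pvq_codebook_size is hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
from functools import lru_cache


@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """V(N,K): number of combinations of K pulses in N samples."""
    if k == 0:
        return 1   # V(N,0) = 1
    if n == 0:
        return 0   # V(0,K) = 0 for K != 0
    return (pvq_codebook_size(n - 1, k)
            + pvq_codebook_size(n, k - 1)
            + pvq_codebook_size(n - 1, k - 1))
```
]]></artwork>
</figure>
<t>Python integers do not overflow, so this sketch ignores the 32-bit codebook size limit that a fixed-width implementation has to respect.</t>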
5387*a58d3d2aSXin Li
5388*a58d3d2aSXin Li<t>
5389*a58d3d2aSXin LiThe decoded vector X is recovered as follows.
5390*a58d3d2aSXin LiLet i be the index decoded with the procedure in <xref target="ec_dec_uint"/>
5391*a58d3d2aSXin Li with ft&nbsp;=&nbsp;V(N,K), so that 0&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;V(N,K).
5392*a58d3d2aSXin LiLet k&nbsp;=&nbsp;K.
5393*a58d3d2aSXin LiThen for j&nbsp;=&nbsp;0 to (N&nbsp;-&nbsp;1), inclusive, do:
5394*a58d3d2aSXin Li<list style="numbers">
5395*a58d3d2aSXin Li<t>Let p&nbsp;=&nbsp;(V(N-j-1,k)&nbsp;+&nbsp;V(N-j,k))/2.</t>
5396*a58d3d2aSXin Li<t>
5397*a58d3d2aSXin LiIf i&nbsp;&lt;&nbsp;p, then let sgn&nbsp;=&nbsp;1, else let sgn&nbsp;=&nbsp;-1
5398*a58d3d2aSXin Li and set i&nbsp;=&nbsp;i&nbsp;-&nbsp;p.
5399*a58d3d2aSXin Li</t>
5400*a58d3d2aSXin Li<t>Let k0&nbsp;=&nbsp;k and set p&nbsp;=&nbsp;p&nbsp;-&nbsp;V(N-j-1,k).</t>
5401*a58d3d2aSXin Li<t>
5402*a58d3d2aSXin LiWhile p&nbsp;&gt;&nbsp;i, set k&nbsp;=&nbsp;k&nbsp;-&nbsp;1 and
5403*a58d3d2aSXin Li p&nbsp;=&nbsp;p&nbsp;-&nbsp;V(N-j-1,k).
5404*a58d3d2aSXin Li</t>
5405*a58d3d2aSXin Li<t>
5406*a58d3d2aSXin LiSet X[j]&nbsp;=&nbsp;sgn*(k0&nbsp;-&nbsp;k) and i&nbsp;=&nbsp;i&nbsp;-&nbsp;p.
5407*a58d3d2aSXin Li</t>
5408*a58d3d2aSXin Li</list>
5409*a58d3d2aSXin Li</t>
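<t>The five steps above translate almost line for line into the following Python sketch. It takes the already decoded index i as input instead of reading it from the range decoder, and the names V and decode_pvq_vector are hypothetical. It is meant as an aid to reading the procedure, not as a substitute for decode_pulses().</t>
<figure>
<artwork><![CDATA[
```python
from functools import lru_cache


@lru_cache(maxsize=None)
def V(n, k):
    """Number of combinations of k pulses in n samples."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return V(n - 1, k) + V(n, k - 1) + V(n - 1, k - 1)


def decode_pvq_vector(i, n, big_k):
    """Convert a codeword index into an integer pulse vector X."""
    assert 0 <= i < V(n, big_k)
    k = big_k
    x = []
    for j in range(n):
        # Step 1
        p = (V(n - j - 1, k) + V(n - j, k)) // 2
        # Step 2
        if i < p:
            sgn = 1
        else:
            sgn = -1
            i -= p
        # Step 3
        k0 = k
        p -= V(n - j - 1, k)
        # Step 4
        while p > i:
            k -= 1
            p -= V(n - j - 1, k)
        # Step 5
        x.append(sgn * (k0 - k))
        i -= p
    return x
```
]]></artwork>
</figure>
<t>Every index in the valid range maps to a distinct vector whose absolute values sum to K.</t>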
5410*a58d3d2aSXin Li
5411*a58d3d2aSXin Li<t>
5412*a58d3d2aSXin LiThe decoded vector X is then normalized such that its
5413*a58d3d2aSXin LiL2-norm equals one.
5414*a58d3d2aSXin Li</t>
5415*a58d3d2aSXin Li</section>
5416*a58d3d2aSXin Li
5417*a58d3d2aSXin Li<section anchor="spreading" title="Spreading">
5418*a58d3d2aSXin Li<t>
5419*a58d3d2aSXin LiThe normalized vector decoded in <xref target="cwrs-decoder"/> is then rotated
5420*a58d3d2aSXin Lifor the purpose of avoiding tonal artifacts. The rotation gain is equal to
5421*a58d3d2aSXin Li<figure align="center">
5422*a58d3d2aSXin Li<artwork align="center"><![CDATA[
5423*a58d3d2aSXin Lig_r = N / (N + f_r*K)
5424*a58d3d2aSXin Li]]></artwork>
5425*a58d3d2aSXin Li</figure>
5426*a58d3d2aSXin Li
5427*a58d3d2aSXin Liwhere N is the number of dimensions, K is the number of pulses, and f_r depends on
5428*a58d3d2aSXin Lithe value of the "spread" parameter in the bit-stream.
5429*a58d3d2aSXin Li</t>
5430*a58d3d2aSXin Li
5431*a58d3d2aSXin Li<texttable anchor="spread values" title="Spreading Values">
5432*a58d3d2aSXin Li<ttcol>Spread value</ttcol>
5433*a58d3d2aSXin Li<ttcol>f_r</ttcol>
5434*a58d3d2aSXin Li <c>0</c> <c>infinite (no rotation)</c>
5435*a58d3d2aSXin Li <c>1</c> <c>15</c>
5436*a58d3d2aSXin Li <c>2</c> <c>10</c>
5437*a58d3d2aSXin Li <c>3</c> <c>5</c>
5438*a58d3d2aSXin Li</texttable>
5439*a58d3d2aSXin Li
5440*a58d3d2aSXin Li<t>
5441*a58d3d2aSXin LiThe rotation angle is then calculated as
5442*a58d3d2aSXin Li<figure align="center">
5443*a58d3d2aSXin Li<artwork align="center"><![CDATA[
5444*a58d3d2aSXin Li                 2
5445*a58d3d2aSXin Li        pi *  g_r
5446*a58d3d2aSXin Litheta = ----------
5447*a58d3d2aSXin Li            4
5448*a58d3d2aSXin Li]]></artwork>
5449*a58d3d2aSXin Li</figure>
5450*a58d3d2aSXin LiA 2-D rotation R(i,j) between points x_i and x_j is defined as:
5451*a58d3d2aSXin Li<figure align="center">
5452*a58d3d2aSXin Li<artwork align="center"><![CDATA[
5453*a58d3d2aSXin Lix_i' =  cos(theta)*x_i + sin(theta)*x_j
5454*a58d3d2aSXin Lix_j' = -sin(theta)*x_i + cos(theta)*x_j
5455*a58d3d2aSXin Li]]></artwork>
5456*a58d3d2aSXin Li</figure>
5457*a58d3d2aSXin Li
5458*a58d3d2aSXin LiAn N-D rotation is then achieved by applying a series of 2-D rotations back and forth, in the
5459*a58d3d2aSXin Lifollowing order: R(x_1, x_2), R(x_2, x_3), ..., R(x_N-2, x_N-1), R(x_N-1, x_N),
5460*a58d3d2aSXin LiR(x_N-2, x_N-1), ..., R(x_1, x_2).
5461*a58d3d2aSXin Li</t>
5462*a58d3d2aSXin Li
5463*a58d3d2aSXin Li<t>
5464*a58d3d2aSXin LiIf the decoded vector represents more
5465*a58d3d2aSXin Lithan one time block, then this spreading process is applied separately on each time block.
5466*a58d3d2aSXin LiAlso, if each block represents 8 samples or more, then another N-D rotation, by
5467*a58d3d2aSXin Li(pi/2-theta), is applied <spanx style="emph">before</spanx> the rotation described above. This
5468*a58d3d2aSXin Liextra rotation is applied in an interleaved manner with a stride equal to round(sqrt(N/nb_blocks)),
5469*a58d3d2aSXin Lii.e., it is applied independently for each set of samples S_k = {stride*n + k}, n=0..N/stride-1.
5470*a58d3d2aSXin Li</t>
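<t>The basic rotation for a single time block can be sketched in Python as follows. The names are hypothetical, the code works in floating point, and it covers only the back-and-forth series of 2-D rotations; the extra interleaved rotation for blocks of 8 samples or more is left out.</t>
<figure>
<artwork><![CDATA[
```python
import math

# f_r for spread values 1..3; spread value 0 means no rotation.
SPREAD_F_R = {1: 15, 2: 10, 3: 5}


def rotation_angle(n, k, spread):
    """theta = pi * g_r^2 / 4, with g_r = N / (N + f_r*K)."""
    if spread == 0:
        return 0.0
    g_r = n / (n + SPREAD_F_R[spread] * k)
    return math.pi * g_r * g_r / 4


def rotate_pair(x, i, j, theta):
    """Apply the 2-D rotation R(i,j) in place."""
    c, s = math.cos(theta), math.sin(theta)
    xi, xj = x[i], x[j]
    x[i] = c * xi + s * xj
    x[j] = -s * xi + c * xj


def spread_rotate(x, theta):
    """Apply the N-D rotation: a forward pass, then a backward pass."""
    n = len(x)
    for i in range(n - 1):            # R(x_1,x_2) ... R(x_N-1,x_N)
        rotate_pair(x, i, i + 1, theta)
    for i in range(n - 3, -1, -1):    # R(x_N-2,x_N-1) ... R(x_1,x_2)
        rotate_pair(x, i, i + 1, theta)
    return x
```
]]></artwork>
</figure>
<t>Because each step is a plane rotation, the L2-norm of the vector is unchanged by the whole sequence.</t>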
5471*a58d3d2aSXin Li</section>
5472*a58d3d2aSXin Li
5473*a58d3d2aSXin Li<section anchor="split" title="Split decoding">
5474*a58d3d2aSXin Li<t>
5475*a58d3d2aSXin LiTo avoid the need for multi-precision calculations when decoding PVQ codevectors,
5476*a58d3d2aSXin Lithe maximum size allowed for codebooks is 32 bits. When larger codebooks are
5477*a58d3d2aSXin Lineeded, the vector is instead split into two sub-vectors of size N/2.
5478*a58d3d2aSXin LiA quantized gain parameter with precision
5479*a58d3d2aSXin Liderived from the current allocation is entropy coded to represent the relative
5480*a58d3d2aSXin Ligains of each side of the split, and the entire decoding process is recursively
5481*a58d3d2aSXin Liapplied. Multiple levels of splitting may be applied up to a limit of LM+1 splits.
5482*a58d3d2aSXin LiThe same recursive mechanism is applied for the joint coding
5483*a58d3d2aSXin Liof stereo audio.
5484*a58d3d2aSXin Li</t>
5485*a58d3d2aSXin Li
5486*a58d3d2aSXin Li</section>
5487*a58d3d2aSXin Li
5488*a58d3d2aSXin Li<section anchor="tf-change" title="Time-Frequency change">
5489*a58d3d2aSXin Li<t>
5490*a58d3d2aSXin LiThe time-frequency (TF) parameters are used to control the time-frequency resolution tradeoff
5491*a58d3d2aSXin Liin each coded band. For each band, there are two possible TF choices. For the first
5492*a58d3d2aSXin Liband coded, the PDF is {3, 1}/4 for frames marked as transient and {15, 1}/16 for
5493*a58d3d2aSXin Lithe other frames. For subsequent bands, the TF choice is coded relative to the
5494*a58d3d2aSXin Liprevious TF choice with probability {15, 1}/16 for transient frames and {31, 1}/32
5495*a58d3d2aSXin Liotherwise. The mapping between the decoded TF choices and the adjustment in TF
5496*a58d3d2aSXin Liresolution is shown in the tables below.
5497*a58d3d2aSXin Li</t>
5498*a58d3d2aSXin Li
5499*a58d3d2aSXin Li<texttable anchor='tf_00'
5500*a58d3d2aSXin Li title="TF Adjustments for Non-transient Frames and tf_select=0">
5501*a58d3d2aSXin Li<ttcol align='center'>Frame size (ms)</ttcol>
5502*a58d3d2aSXin Li<ttcol align='center'>0</ttcol>
5503*a58d3d2aSXin Li<ttcol align='center'>1</ttcol>
5504*a58d3d2aSXin Li<c>2.5</c>      <c>0</c> <c>-1</c>
5505*a58d3d2aSXin Li<c>5</c>      <c>0</c> <c>-1</c>
5506*a58d3d2aSXin Li<c>10</c>      <c>0</c> <c>-2</c>
5507*a58d3d2aSXin Li<c>20</c>      <c>0</c> <c>-2</c>
5508*a58d3d2aSXin Li</texttable>
5509*a58d3d2aSXin Li
5510*a58d3d2aSXin Li<texttable anchor='tf_01'
5511*a58d3d2aSXin Li title="TF Adjustments for Non-transient Frames and tf_select=1">
5512*a58d3d2aSXin Li<ttcol align='center'>Frame size (ms)</ttcol>
5513*a58d3d2aSXin Li<ttcol align='center'>0</ttcol>
5514*a58d3d2aSXin Li<ttcol align='center'>1</ttcol>
5515*a58d3d2aSXin Li<c>2.5</c>      <c>0</c> <c>-1</c>
5516*a58d3d2aSXin Li<c>5</c>      <c>0</c> <c>-2</c>
5517*a58d3d2aSXin Li<c>10</c>      <c>0</c> <c>-3</c>
5518*a58d3d2aSXin Li<c>20</c>      <c>0</c> <c>-3</c>
5519*a58d3d2aSXin Li</texttable>
5520*a58d3d2aSXin Li
5521*a58d3d2aSXin Li
5522*a58d3d2aSXin Li<texttable anchor='tf_10'
5523*a58d3d2aSXin Li title="TF Adjustments for Transient Frames and tf_select=0">
5524*a58d3d2aSXin Li<ttcol align='center'>Frame size (ms)</ttcol>
5525*a58d3d2aSXin Li<ttcol align='center'>0</ttcol>
5526*a58d3d2aSXin Li<ttcol align='center'>1</ttcol>
5527*a58d3d2aSXin Li<c>2.5</c>      <c>0</c> <c>-1</c>
5528*a58d3d2aSXin Li<c>5</c>      <c>1</c> <c>0</c>
5529*a58d3d2aSXin Li<c>10</c>      <c>2</c> <c>0</c>
5530*a58d3d2aSXin Li<c>20</c>      <c>3</c> <c>0</c>
5531*a58d3d2aSXin Li</texttable>
5532*a58d3d2aSXin Li
5533*a58d3d2aSXin Li<texttable anchor='tf_11'
5534*a58d3d2aSXin Li title="TF Adjustments for Transient Frames and tf_select=1">
5535*a58d3d2aSXin Li<ttcol align='center'>Frame size (ms)</ttcol>
5536*a58d3d2aSXin Li<ttcol align='center'>0</ttcol>
5537*a58d3d2aSXin Li<ttcol align='center'>1</ttcol>
5538*a58d3d2aSXin Li<c>2.5</c>      <c>0</c> <c>-1</c>
5539*a58d3d2aSXin Li<c>5</c>      <c>1</c> <c>-1</c>
5540*a58d3d2aSXin Li<c>10</c>      <c>1</c> <c>-1</c>
5541*a58d3d2aSXin Li<c>20</c>      <c>1</c> <c>-1</c>
5542*a58d3d2aSXin Li</texttable>
5543*a58d3d2aSXin Li
5544*a58d3d2aSXin Li<t>
5545*a58d3d2aSXin LiA negative TF adjustment means that the temporal resolution is increased,
5546*a58d3d2aSXin Liwhile a positive TF adjustment means that the frequency resolution is increased.
5547*a58d3d2aSXin LiChanges in TF resolution are implemented using the Hadamard transform <xref target="Hadamard"/>. To increase
5548*a58d3d2aSXin Lithe time resolution by N, N "levels" of the Hadamard transform are applied to the
5549*a58d3d2aSXin Lidecoded vector for each interleaved MDCT vector. To increase the frequency resolution
5550*a58d3d2aSXin Li(which assumes a transient frame), N levels of the Hadamard transform are applied
5551*a58d3d2aSXin Li<spanx style="emph">across</spanx> the interleaved MDCT vector. In the case of increased
5552*a58d3d2aSXin Litime resolution the decoder uses the "sequency order" because the input vector
5553*a58d3d2aSXin Liis sorted in time.
5554*a58d3d2aSXin Li</t>
5555*a58d3d2aSXin Li</section>
5556*a58d3d2aSXin Li
5557*a58d3d2aSXin Li
5558*a58d3d2aSXin Li</section>
5559*a58d3d2aSXin Li
5560*a58d3d2aSXin Li<section anchor="anti-collapse" title="Anti-Collapse Processing">
5561*a58d3d2aSXin Li<t>
5562*a58d3d2aSXin LiThe anti-collapse feature is designed to avoid the situation where the use of multiple
5563*a58d3d2aSXin Lishort MDCTs causes the energy in one or more of the MDCTs to be zero for
5564*a58d3d2aSXin Lisome bands, causing unpleasant artifacts.
5565*a58d3d2aSXin LiWhen the frame has the transient bit set, an anti-collapse bit is decoded.
5566*a58d3d2aSXin LiWhen anti-collapse is set, the energy in each small MDCT is prevented
5567*a58d3d2aSXin Lifrom collapsing to zero. For each band of each MDCT where a collapse is
5568*a58d3d2aSXin Lidetected, a pseudo-random signal is inserted with an energy corresponding
5569*a58d3d2aSXin Lito the minimum energy over the two previous frames. A renormalization step is
5570*a58d3d2aSXin Lithen required to ensure that the anti-collapse step did not alter the
5571*a58d3d2aSXin Lienergy preservation property.
5572*a58d3d2aSXin Li</t>
5573*a58d3d2aSXin Li</section>
5574*a58d3d2aSXin Li
5575*a58d3d2aSXin Li<section anchor="denormalization" title="Denormalization">
5576*a58d3d2aSXin Li<t>
5577*a58d3d2aSXin LiJust as each band was normalized in the encoder, the last step of the decoder before
5578*a58d3d2aSXin Lithe inverse MDCT is to denormalize the bands. Each decoded normalized band is
5579*a58d3d2aSXin Limultiplied by the square root of the decoded energy. This is done by denormalise_bands()
5580*a58d3d2aSXin Li(bands.c).
5581*a58d3d2aSXin Li</t>
5582*a58d3d2aSXin Li</section>
5583*a58d3d2aSXin Li
5584*a58d3d2aSXin Li<section anchor="inverse-mdct" title="Inverse MDCT">
5585*a58d3d2aSXin Li
5586*a58d3d2aSXin Li
5587*a58d3d2aSXin Li<t>The inverse MDCT implementation has no special characteristics. The
5588*a58d3d2aSXin Liinput is N frequency-domain samples and the output is 2*N time-domain
5589*a58d3d2aSXin Lisamples, while scaling by 1/2. A "low-overlap" window reduces the algorithmic delay.
5590*a58d3d2aSXin LiIt is derived from a basic (full overlap) 240-sample version of the window used by the Vorbis codec:
5591*a58d3d2aSXin Li<figure align="center">
5592*a58d3d2aSXin Li<artwork align="center"><![CDATA[
5594*a58d3d2aSXin Li          /pi      2/pi   n + 1/2\ \
5595*a58d3d2aSXin LiW(n) = sin|-- * sin |-- * -------| | .
5596*a58d3d2aSXin Li          \2        \2       L   / /
5597*a58d3d2aSXin Li]]></artwork>
5598*a58d3d2aSXin Li</figure>
5599*a58d3d2aSXin LiThe low-overlap window is created by zero-padding the basic window and inserting ones in the
5600*a58d3d2aSXin Limiddle, such that the resulting window still satisfies power complementarity <xref target='Princen86'/>.
5601*a58d3d2aSXin LiThe IMDCT and
5602*a58d3d2aSXin Liwindowing are performed by mdct_backward (mdct.c).
5603*a58d3d2aSXin Li</t>
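<t>As a small numerical illustration, the basic window can be generated in Python as below. The sketch assumes the Vorbis form in which the square applies to the inner sine, since that is the form that satisfies power complementarity; the function name basic_window is hypothetical, and the zero-padding and insertion of ones that produce the low-overlap window are not shown.</t>
<figure>
<artwork><![CDATA[
```python
import math


def basic_window(length):
    """W(n) = sin(pi/2 * sin^2(pi/2 * (n + 1/2) / L)) for n = 0..L-1."""
    window = []
    for n in range(length):
        inner = math.sin(0.5 * math.pi * (n + 0.5) / length)
        window.append(math.sin(0.5 * math.pi * inner * inner))
    return window
```
]]></artwork>
</figure>
<t>With this form, W(n)^2 + W(L-1-n)^2 equals one for every n, which is the power complementarity property referred to above.</t>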
5604*a58d3d2aSXin Li
5605*a58d3d2aSXin Li<section anchor="post-filter" title="Post-filter">
5606*a58d3d2aSXin Li<t>
5607*a58d3d2aSXin LiThe output of the inverse MDCT (after weighted overlap-add) is sent to the
5608*a58d3d2aSXin Lipost-filter. Although the post-filter is applied at the end, the post-filter
5609*a58d3d2aSXin Liparameters are encoded at the beginning, just after the silence flag.
5610*a58d3d2aSXin LiThe post-filter can be switched on or off using one bit (logp=1).
5611*a58d3d2aSXin LiIf the post-filter is enabled, then the octave is decoded as an integer value
5612*a58d3d2aSXin Libetween 0 and 6 of uniform probability. Once the octave is known, the fine pitch
5613*a58d3d2aSXin Liwithin the octave is decoded using 4+octave raw bits. The final pitch period
5614*a58d3d2aSXin Liis equal to (16&lt;&lt;octave)+fine_pitch-1 so it is bounded between 15 and 1022,
5615*a58d3d2aSXin Liinclusive. Next, the gain is decoded as three raw bits and is equal to
5616*a58d3d2aSXin LiG=3*(int_gain+1)/32. The set of post-filter taps is decoded last, using
5617*a58d3d2aSXin Lia pdf equal to {2, 1, 1}/4. Tapset zero corresponds to the filter coefficients
5618*a58d3d2aSXin Lig0 = 0.3066406250, g1 = 0.2170410156, g2 = 0.1296386719. Tapset one
5619*a58d3d2aSXin Licorresponds to the filter coefficients g0 = 0.4638671875, g1 = 0.2680664062,
5620*a58d3d2aSXin Lig2 = 0, and tapset two uses filter coefficients g0 = 0.7998046875,
5621*a58d3d2aSXin Lig1 = 0.1000976562, g2 = 0.
5622*a58d3d2aSXin Li</t>
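<t>The mapping from decoded symbols to post-filter parameters can be summarized with the following Python sketch. The function names are hypothetical. The octave is assumed to take the values 0 through 5, which is the range consistent with the stated pitch period bounds of 15 and 1022.</t>
<figure>
<artwork><![CDATA[
```python
# Filter coefficients (g0, g1, g2) for tapsets 0, 1 and 2.
TAPSETS = (
    (0.3066406250, 0.2170410156, 0.1296386719),
    (0.4638671875, 0.2680664062, 0.0),
    (0.7998046875, 0.1000976562, 0.0),
)


def postfilter_pitch_period(octave, fine_pitch):
    """Pitch period T = (16 << octave) + fine_pitch - 1."""
    assert 0 <= octave <= 5              # assumed range, see lead-in
    assert 0 <= fine_pitch < (1 << (4 + octave))
    return (16 << octave) + fine_pitch - 1


def postfilter_gain(int_gain):
    """Gain G = 3*(int_gain+1)/32 from three raw bits."""
    assert 0 <= int_gain <= 7
    return 3 * (int_gain + 1) / 32
```
]]></artwork>
</figure>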
5623*a58d3d2aSXin Li
5624*a58d3d2aSXin Li<t>
5625*a58d3d2aSXin LiThe post-filter response is thus computed as:
5626*a58d3d2aSXin Li              <figure align="center">
5627*a58d3d2aSXin Li                <artwork align="center">
5628*a58d3d2aSXin Li                  <![CDATA[
5629*a58d3d2aSXin Li   y(n) = x(n) + G*(g0*y(n-T) + g1*(y(n-T+1)+y(n-T-1))
5630*a58d3d2aSXin Li                              + g2*(y(n-T+2)+y(n-T-2)))
5631*a58d3d2aSXin Li]]>
5632*a58d3d2aSXin Li                </artwork>
5633*a58d3d2aSXin Li              </figure>
5634*a58d3d2aSXin Li
5635*a58d3d2aSXin LiDuring a transition between different gains, a smooth transition is calculated
5636*a58d3d2aSXin Liusing the square of the MDCT window. It is important that values of y(n) be
5637*a58d3d2aSXin Liinterpolated one at a time such that the past value of y(n) used is interpolated.
5638*a58d3d2aSXin Li</t>
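<t>A floating-point Python sketch of the post-filter recursion with constant parameters is given below. It assumes taps placed symmetrically around the pitch lag, at T-2 through T+2, and it does not implement the windowed interpolation used during gain transitions. The function name comb_filter and the history argument (past output samples, oldest first) are hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
def comb_filter(x, history, period, gain, taps):
    """Filter x given past outputs; returns the new output samples."""
    g0, g1, g2 = taps
    h = len(history)
    # The oldest sample referenced is y(n - T - 2).
    assert h >= period + 2
    y = list(history) + [0.0] * len(x)
    for n in range(len(x)):
        m = h + n  # position of y(n) inside the combined buffer
        y[m] = x[n] + gain * (
            g0 * y[m - period]
            + g1 * (y[m - period + 1] + y[m - period - 1])
            + g2 * (y[m - period + 2] + y[m - period - 2])
        )
    return y[h:]
```
]]></artwork>
</figure>
<t>Since the pitch period is at least 15, every sample on the right-hand side has already been computed when y(n) is evaluated.</t>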
5639*a58d3d2aSXin Li</section>
5640*a58d3d2aSXin Li
5641*a58d3d2aSXin Li<section anchor="deemphasis" title="De-emphasis">
5642*a58d3d2aSXin Li<t>
5643*a58d3d2aSXin LiAfter the post-filter,
5644*a58d3d2aSXin Lithe signal is de-emphasized using the inverse of the pre-emphasis filter
5645*a58d3d2aSXin Liused in the encoder:
5646*a58d3d2aSXin Li<figure align="center">
5647*a58d3d2aSXin Li<artwork align="center"><![CDATA[
5648*a58d3d2aSXin Li 1            1
5649*a58d3d2aSXin Li---- = --------------- ,
5650*a58d3d2aSXin LiA(z)                -1
5651*a58d3d2aSXin Li       1 - alpha_p*z
5652*a58d3d2aSXin Li]]></artwork>
5653*a58d3d2aSXin Li</figure>
5654*a58d3d2aSXin Liwhere alpha_p=0.8500061035.
5655*a58d3d2aSXin Li</t>
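<t>In the time domain, the transfer function above corresponds to the first-order recursion y(n) = x(n) + alpha_p*y(n-1). A floating-point Python sketch, with a hypothetical function name and an explicit state argument carrying the last output across frames, might look like this:</t>
<figure>
<artwork><![CDATA[
```python
ALPHA_P = 0.8500061035


def deemphasis(x, state=0.0):
    """Apply 1/A(z); returns the output and the state for the next call."""
    y = []
    for sample in x:
        state = sample + ALPHA_P * state
        y.append(state)
    return y, state
```
]]></artwork>
</figure>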
5656*a58d3d2aSXin Li</section>
5657*a58d3d2aSXin Li
5658*a58d3d2aSXin Li</section>
5659*a58d3d2aSXin Li
5660*a58d3d2aSXin Li</section>
5661*a58d3d2aSXin Li
5662*a58d3d2aSXin Li<section anchor="Packet Loss Concealment" title="Packet Loss Concealment (PLC)">
5663*a58d3d2aSXin Li<t>
5664*a58d3d2aSXin LiPacket loss concealment (PLC) is an optional decoder-side feature that
5665*a58d3d2aSXin LiSHOULD be included when receiving from an unreliable channel. Because
5666*a58d3d2aSXin LiPLC is not part of the bitstream, there are many acceptable ways to
5667*a58d3d2aSXin Liimplement PLC with different complexity/quality trade-offs.
5668*a58d3d2aSXin Li</t>
5669*a58d3d2aSXin Li
5670*a58d3d2aSXin Li<t>
5671*a58d3d2aSXin LiThe PLC in
5672*a58d3d2aSXin Lithe reference implementation depends on the mode of the last packet received.
5673*a58d3d2aSXin LiIn CELT mode, the PLC finds a periodicity in the decoded
5674*a58d3d2aSXin Lisignal and repeats the windowed waveform using the pitch offset. The windowed
5675*a58d3d2aSXin Liwaveform is overlapped in such a way as to preserve the time-domain aliasing
5676*a58d3d2aSXin Licancellation with the previous frame and the next frame. This is implemented
5677*a58d3d2aSXin Liin celt_decode_lost() (celt.c).  In SILK mode, the PLC uses LPC extrapolation
5678*a58d3d2aSXin Lifrom the previous frame, implemented in silk_PLC() (PLC.c).
5679*a58d3d2aSXin Li</t>
5680*a58d3d2aSXin Li
5681*a58d3d2aSXin Li<section anchor="clock-drift" title="Clock Drift Compensation">
5682*a58d3d2aSXin Li<t>
5683*a58d3d2aSXin LiClock drift refers to the gradual desynchronization of two endpoints
5684*a58d3d2aSXin Liwhose sample clocks run at different frequencies while they are streaming
5685*a58d3d2aSXin Lilive audio.  Differences in clock frequencies are generally attributable to
5686*a58d3d2aSXin Limanufacturing variation in the endpoints' clock hardware.  For long-lived
5687*a58d3d2aSXin Listreams, the time difference between sender and receiver can grow without
5688*a58d3d2aSXin Libound.
5689*a58d3d2aSXin Li</t>
5690*a58d3d2aSXin Li
5691*a58d3d2aSXin Li<t>
5692*a58d3d2aSXin LiWhen the sender's clock runs slower than the receiver's, the effect is similar
5693*a58d3d2aSXin Lito packet loss: too few packets are received.  The receiver can distinguish
5694*a58d3d2aSXin Libetween drift and loss if the transport provides packet timestamps.  A receiver
5695*a58d3d2aSXin Lifor live streams SHOULD conceal the effects of drift, and MAY do so by invoking
5696*a58d3d2aSXin Lithe PLC.
5697*a58d3d2aSXin Li</t>
5698*a58d3d2aSXin Li
5699*a58d3d2aSXin Li<t>
5700*a58d3d2aSXin LiWhen the sender's clock runs faster than the receiver's, too many packets will
5701*a58d3d2aSXin Libe received.  The receiver MAY respond by skipping any packet (i.e., not
5702*a58d3d2aSXin Lisubmitting the packet for decoding).  This is likely to produce a less severe
5703*a58d3d2aSXin Liartifact than if the frame were dropped after decoding.
5704*a58d3d2aSXin Li</t>
5705*a58d3d2aSXin Li
5706*a58d3d2aSXin Li<t>
5707*a58d3d2aSXin LiA decoder MAY employ a more sophisticated drift compensation method. For
5708*a58d3d2aSXin Liexample, the
5709*a58d3d2aSXin Li<xref target='Google-NetEQ'>NetEQ component</xref>
5710*a58d3d2aSXin Liof the
5711*a58d3d2aSXin Li<xref target='Google-WebRTC'>Google WebRTC codebase</xref>
5712*a58d3d2aSXin Licompensates for drift by adding or removing
5713*a58d3d2aSXin Lione period when the signal is highly periodic. The reference implementation of
5714*a58d3d2aSXin LiOpus allows a caller to learn whether the current frame's signal is highly
5715*a58d3d2aSXin Liperiodic, and if so what the period is, using the OPUS_GET_PITCH() request.
5716*a58d3d2aSXin Li</t>
5717*a58d3d2aSXin Li</section>
5718*a58d3d2aSXin Li
5719*a58d3d2aSXin Li</section>
5720*a58d3d2aSXin Li
5721*a58d3d2aSXin Li<section anchor="switching" title="Configuration Switching">
5722*a58d3d2aSXin Li
5723*a58d3d2aSXin Li<t>
5724*a58d3d2aSXin LiSwitching between the Opus coding modes, audio bandwidths, and channel counts
5725*a58d3d2aSXin Li requires careful consideration to avoid audible glitches.
5726*a58d3d2aSXin LiSwitching between any two configurations of the CELT-only mode, any two
5727*a58d3d2aSXin Li configurations of the Hybrid mode, or from WB SILK to Hybrid mode does not
5728*a58d3d2aSXin Li require any special treatment in the decoder, as the MDCT overlap will smooth
5729*a58d3d2aSXin Li the transition.
5730*a58d3d2aSXin LiSwitching from Hybrid mode to WB SILK requires adding in the final contents
5731*a58d3d2aSXin Li of the CELT overlap buffer to the first SILK-only packet.
5732*a58d3d2aSXin LiThis can be done by decoding a 2.5&nbsp;ms silence frame with the CELT decoder
5733*a58d3d2aSXin Li using the channel count of the SILK-only packet (and any choice of audio
5734*a58d3d2aSXin Li bandwidth), which will correctly handle the cases when the channel count
5735*a58d3d2aSXin Li changes as well.
5736*a58d3d2aSXin Li</t>
5737*a58d3d2aSXin Li
5738*a58d3d2aSXin Li<t>
5739*a58d3d2aSXin LiWhen changing the channel count for SILK-only or Hybrid packets, the encoder
5740*a58d3d2aSXin Li can avoid glitches by smoothly varying the stereo width of the input signal
5741*a58d3d2aSXin Li before or after the transition, and SHOULD do so.
5742*a58d3d2aSXin LiHowever, other transitions between SILK-only packets or between NB or MB SILK
5743*a58d3d2aSXin Li and Hybrid packets may cause glitches, because neither the LSF coefficients
5744*a58d3d2aSXin Li nor the LTP, LPC, stereo unmixing, and resampler buffers are available at the
5745*a58d3d2aSXin Li new sample rate.
5746*a58d3d2aSXin LiThese switches SHOULD be delayed by the encoder until quiet periods or
5747*a58d3d2aSXin Li transients, where the inevitable glitches will be less audible. Additionally,
5748*a58d3d2aSXin Li the bit-stream MAY include redundant side information ("redundancy"), in the
5749*a58d3d2aSXin Li form of additional CELT frames embedded in each of the Opus frames around the
5750*a58d3d2aSXin Li transition.
5751*a58d3d2aSXin Li</t>
5752*a58d3d2aSXin Li
5753*a58d3d2aSXin Li<t>
5754*a58d3d2aSXin LiThe other transitions that cannot be easily handled are those where the lower
5755*a58d3d2aSXin Li frequencies switch between the SILK LP-based model and the CELT MDCT model.
5756*a58d3d2aSXin LiHowever, an encoder may not have an opportunity to delay such a switch to a
5757*a58d3d2aSXin Li convenient point.
5758*a58d3d2aSXin LiFor example, if the content switches from speech to music, and the encoder does
5759*a58d3d2aSXin Li not have enough latency in its analysis to detect this in advance, there may
5760*a58d3d2aSXin Li be no convenient silence period during which to make the transition for quite
5761*a58d3d2aSXin Li some time.
5762*a58d3d2aSXin LiTo avoid or reduce glitches during these problematic mode transitions, and
5763*a58d3d2aSXin Li also between audio bandwidth changes in the SILK-only modes, transitions MAY
5764*a58d3d2aSXin Li include redundant side information ("redundancy"), in the form of an
5765*a58d3d2aSXin Li additional CELT frame embedded in the Opus frame.
5766*a58d3d2aSXin Li</t>
5767*a58d3d2aSXin Li
5768*a58d3d2aSXin Li<t>
5769*a58d3d2aSXin LiA transition between coding the lower frequencies with the LP model and the
5770*a58d3d2aSXin Li MDCT model or a transition that involves changing the SILK bandwidth
5771*a58d3d2aSXin Li is only normatively specified when it includes redundancy.
5772*a58d3d2aSXin LiFor those without redundancy, it is RECOMMENDED that the decoder use a
5773*a58d3d2aSXin Li concealment technique (e.g., make use of a PLC algorithm) to "fill in" the
5774*a58d3d2aSXin Li gap or discontinuity caused by the mode transition.
5775*a58d3d2aSXin LiTherefore, PLC MUST NOT be applied during any normative transition, i.e., when
5776*a58d3d2aSXin Li<list style="symbols">
5777*a58d3d2aSXin Li<t>A packet includes redundancy for this transition (as described below),</t>
5778*a58d3d2aSXin Li<t>The transition is between any WB SILK packet and any Hybrid packet, or vice
5779*a58d3d2aSXin Li versa,</t>
5780*a58d3d2aSXin Li<t>The transition is between any two Hybrid mode packets, or</t>
5781*a58d3d2aSXin Li<t>The transition is between any two CELT mode packets,</t>
5782*a58d3d2aSXin Li</list>
5783*a58d3d2aSXin Li unless there is actual packet loss.
5784*a58d3d2aSXin Li</t>
5785*a58d3d2aSXin Li
5786*a58d3d2aSXin Li<section anchor="side-info" title="Transition Side Information (Redundancy)">
5787*a58d3d2aSXin Li<t>
5788*a58d3d2aSXin LiTransitions with side information include an extra 5&nbsp;ms "redundant" CELT
5789*a58d3d2aSXin Li frame within the Opus frame.
5790*a58d3d2aSXin LiThis frame is designed to fill in the gap or discontinuity in the different
5791*a58d3d2aSXin Li layers without requiring the decoder to conceal it.
5792*a58d3d2aSXin LiFor transitions from CELT-only to SILK-only or Hybrid, the redundant frame is
5793*a58d3d2aSXin Li inserted in the first Opus frame after the transition (i.e., the first
5794*a58d3d2aSXin Li SILK-only or Hybrid frame).
5795*a58d3d2aSXin LiFor transitions from SILK-only or Hybrid to CELT-only, the redundant frame is
5796*a58d3d2aSXin Li inserted in the last Opus frame before the transition (i.e., the last
5797*a58d3d2aSXin Li SILK-only or Hybrid frame).
5798*a58d3d2aSXin Li</t>
5799*a58d3d2aSXin Li
5800*a58d3d2aSXin Li<section anchor="opus_redundancy_flag" title="Redundancy Flag">
5801*a58d3d2aSXin Li<t>
5802*a58d3d2aSXin LiThe presence of redundancy is signaled in all SILK-only and Hybrid frames, not
5803*a58d3d2aSXin Li just those involved in a mode transition.
5804*a58d3d2aSXin LiThis allows the frames to be decoded correctly even if an adjacent frame is
5805*a58d3d2aSXin Li lost.
5806*a58d3d2aSXin LiFor SILK-only frames, this signaling is implicit, based on the size of the
5807*a58d3d2aSXin Li Opus frame and the number of bits consumed decoding the SILK portion of
5808*a58d3d2aSXin Li it.
5809*a58d3d2aSXin LiAfter decoding the SILK portion of the Opus frame, the decoder uses ec_tell()
5810*a58d3d2aSXin Li (see <xref target="ec_tell"/>) to check if there are at least 17 bits
5811*a58d3d2aSXin Li remaining.
5812*a58d3d2aSXin LiIf so, then the frame contains redundancy.
5813*a58d3d2aSXin Li</t>
5814*a58d3d2aSXin Li
5815*a58d3d2aSXin Li<t>
5816*a58d3d2aSXin LiFor Hybrid frames, this signaling is explicit.
5817*a58d3d2aSXin LiAfter decoding the SILK portion of the Opus frame, the decoder uses ec_tell()
 (see <xref target="ec_tell"/>) to check whether there are at least 37 bits remaining.
5819*a58d3d2aSXin LiIf so, it reads a symbol with the PDF in
5820*a58d3d2aSXin Li <xref target="opus_redundancy_flag_pdf"/>, and if the value is 1, then the
5821*a58d3d2aSXin Li frame contains redundancy.
5822*a58d3d2aSXin LiOtherwise (if there were fewer than 37 bits left or the value was 0), the frame
5823*a58d3d2aSXin Li does not contain redundancy.
5824*a58d3d2aSXin Li</t>
5825*a58d3d2aSXin Li
5826*a58d3d2aSXin Li<texttable anchor="opus_redundancy_flag_pdf" title="Redundancy Flag PDF">
5827*a58d3d2aSXin Li<ttcol>PDF</ttcol>
5828*a58d3d2aSXin Li<c>{4095, 1}/4096</c>
5829*a58d3d2aSXin Li</texttable>
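<t>
The two checks above can be summarized in a short sketch.
It is illustrative only and is not taken from the reference implementation:
 the names has_redundancy(), frame_mode, and read_flag_fn are hypothetical,
 and the callback stands in for decoding one symbol with the PDF in
 <xref target="opus_redundancy_flag_pdf"/>.
</t>
<figure align="center">
<artwork><![CDATA[
```c
#include <assert.h>

enum frame_mode { MODE_SILK_ONLY, MODE_HYBRID };

/* Hypothetical callback: decodes one symbol with the
   {4095, 1}/4096 PDF and returns 0 or 1. */
typedef int (*read_flag_fn)(void *ctx);

/* frame_bits: size of the Opus frame payload in bits.
   tell: bits consumed so far, as reported by ec_tell(). */
static int has_redundancy(enum frame_mode mode, int frame_bits,
                          int tell, read_flag_fn read_flag,
                          void *ctx)
{
    int remaining = frame_bits - tell;
    assert(tell >= 0 && frame_bits >= 0);
    if (mode == MODE_SILK_ONLY)
        return remaining >= 17;      /* implicit signaling */
    if (remaining < 37)
        return 0;                    /* no room for the flag */
    return read_flag(ctx) != 0;      /* explicit signaling */
}

/* Stub decoders used only to exercise the sketch. */
static int flag_is_one(void *ctx)  { (void)ctx; return 1; }
static int flag_is_zero(void *ctx) { (void)ctx; return 0; }
```
]]></artwork>
</figure>
<t>
Note that in the Hybrid case the flag is only read when at least 37 bits
 remain, so a frame with fewer bits left never consumes a symbol for it.
</t>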
5830*a58d3d2aSXin Li</section>
5831*a58d3d2aSXin Li
5832*a58d3d2aSXin Li<section anchor="opus_redundancy_pos" title="Redundancy Position Flag">
5833*a58d3d2aSXin Li<t>
5834*a58d3d2aSXin LiSince the current frame is a SILK-only or a Hybrid frame, it must be at least
5835*a58d3d2aSXin Li 10&nbsp;ms.
5836*a58d3d2aSXin LiTherefore, it needs an additional flag to indicate whether the redundant
5837*a58d3d2aSXin Li 5&nbsp;ms CELT frame should be mixed into the beginning of the current frame,
5838*a58d3d2aSXin Li or the end.
5839*a58d3d2aSXin LiAfter determining that a frame contains redundancy, the decoder reads a
5840*a58d3d2aSXin Li 1&nbsp;bit symbol with a uniform PDF
5841*a58d3d2aSXin Li (<xref target="opus_redundancy_pos_pdf"/>).
5842*a58d3d2aSXin Li</t>
5843*a58d3d2aSXin Li
5844*a58d3d2aSXin Li<texttable anchor="opus_redundancy_pos_pdf" title="Redundancy Position PDF">
5845*a58d3d2aSXin Li<ttcol>PDF</ttcol>
5846*a58d3d2aSXin Li<c>{1, 1}/2</c>
5847*a58d3d2aSXin Li</texttable>
5848*a58d3d2aSXin Li
5849*a58d3d2aSXin Li<t>
5850*a58d3d2aSXin LiIf the value is zero, this is the first frame in the transition, and the
5851*a58d3d2aSXin Li redundancy belongs at the end.
5852*a58d3d2aSXin LiIf the value is one, this is the second frame in the transition, and the
5853*a58d3d2aSXin Li redundancy belongs at the beginning.
5854*a58d3d2aSXin LiThere is no way to specify that an Opus frame contains separate redundant CELT
5855*a58d3d2aSXin Li frames at both the beginning and the end.
5856*a58d3d2aSXin Li</t>
5857*a58d3d2aSXin Li</section>
5858*a58d3d2aSXin Li
5859*a58d3d2aSXin Li<section anchor="opus_redundancy_size" title="Redundancy Size">
5860*a58d3d2aSXin Li<t>
5861*a58d3d2aSXin LiUnlike the CELT portion of a Hybrid frame, the redundant CELT frame does not
5862*a58d3d2aSXin Li use the same entropy coder state as the rest of the Opus frame, because this
5863*a58d3d2aSXin Li would break the CELT bit allocation mechanism in Hybrid frames.
5864*a58d3d2aSXin LiThus, a redundant CELT frame always starts and ends on a byte boundary, even in
5865*a58d3d2aSXin Li SILK-only frames, where this is not strictly necessary.
5866*a58d3d2aSXin Li</t>
5867*a58d3d2aSXin Li
5868*a58d3d2aSXin Li<t>
5869*a58d3d2aSXin LiFor SILK-only frames, the number of bytes in the redundant CELT frame is simply
5870*a58d3d2aSXin Li the number of whole bytes remaining, which must be at least 2, due to the
5871*a58d3d2aSXin Li space check in <xref target="opus_redundancy_flag"/>.
5872*a58d3d2aSXin LiFor Hybrid frames, the number of bytes is equal to 2, plus a decoded unsigned
5873*a58d3d2aSXin Li integer less than 256 (see <xref target="ec_dec_uint"/>).
5874*a58d3d2aSXin LiThis may be more than the number of whole bytes remaining in the Opus frame,
5875*a58d3d2aSXin Li in which case the frame is invalid.
5876*a58d3d2aSXin LiHowever, a decoder is not required to ignore the entire frame, as this may be
5877*a58d3d2aSXin Li the result of a bit error that desynchronized the range coder.
5878*a58d3d2aSXin LiThere may still be useful data before the error, and a decoder MAY keep any
5879*a58d3d2aSXin Li audio decoded so far instead of invoking the PLC, but it is RECOMMENDED that
5880*a58d3d2aSXin Li the decoder stop decoding and discard the rest of the current Opus frame.
5881*a58d3d2aSXin Li</t>
5882*a58d3d2aSXin Li
5883*a58d3d2aSXin Li<t>
5884*a58d3d2aSXin LiIt would have been possible to avoid these invalid states in the design of Opus
5885*a58d3d2aSXin Li by limiting the range of the explicit length decoded from Hybrid frames by the
5886*a58d3d2aSXin Li actual number of whole bytes remaining.
5887*a58d3d2aSXin LiHowever, this would require an encoder to determine the rate allocation for the
5888*a58d3d2aSXin Li MDCT layer up front, before it began encoding that layer.
5889*a58d3d2aSXin LiBy allowing some invalid sizes, the encoder is able to defer that decision
5890*a58d3d2aSXin Li until much later.
5891*a58d3d2aSXin LiWhen encoding Hybrid frames which do not include redundancy, the encoder must
5892*a58d3d2aSXin Li still decide up-front if it wishes to use the minimum 37 bits required to
5893*a58d3d2aSXin Li trigger encoding of the redundancy flag, but this is a much looser
5894*a58d3d2aSXin Li restriction.
5895*a58d3d2aSXin Li</t>
5896*a58d3d2aSXin Li
5897*a58d3d2aSXin Li<t>
5898*a58d3d2aSXin LiAfter determining the size of the redundant CELT frame, the decoder reduces
5899*a58d3d2aSXin Li the size of the buffer currently in use by the range coder by that amount.
The CELT layer reads any raw bits from the end of this reduced buffer, and all
5901*a58d3d2aSXin Li calculations of the number of bits remaining in the buffer must be done using
5902*a58d3d2aSXin Li this new, reduced size, rather than the original size of the Opus frame.
5903*a58d3d2aSXin Li</t>
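<t>
The size computation and the buffer reduction described in this section can be
 sketched as follows.
This is a simplified illustration under stated assumptions, not the reference
 decoder: redundancy_size() is a hypothetical helper, the value of ec_tell()
 and the integer decoded with ec_dec_uint() are passed in as plain arguments,
 and an invalid frame is reported by returning -1.
</t>
<figure align="center">
<artwork><![CDATA[
```c
#include <assert.h>

/* Returns the size in bytes of the redundant CELT frame, or -1
   if the frame is invalid. On success, *len (the number of
   bytes available to the range coder) is reduced accordingly.
   is_hybrid:  nonzero for Hybrid frames.
   tell:       bits consumed so far, as reported by ec_tell()
               (for Hybrid frames, after decoding the size).
   coded_size: for Hybrid frames, the integer in [0, 255]
               already decoded with ec_dec_uint(). */
static int redundancy_size(int is_hybrid, int *len, int tell,
                           int coded_size)
{
    int whole_bytes_left = *len - ((tell + 7) >> 3);
    int size;
    assert(coded_size >= 0 && coded_size < 256);
    if (is_hybrid)
        size = 2 + coded_size;       /* explicit size */
    else
        size = whole_bytes_left;     /* all remaining bytes */
    /* The size < 2 test is purely defensive in this sketch. */
    if (size > whole_bytes_left || size < 2)
        return -1;
    *len -= size;    /* range coder now sees a shorter buffer */
    return size;
}
```
]]></artwork>
</figure>
<t>
After a successful call, any computation of the number of bits remaining would
 use the reduced length rather than the original size of the Opus frame.
</t>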
5904*a58d3d2aSXin Li</section>
5905*a58d3d2aSXin Li
5906*a58d3d2aSXin Li<section anchor="opus_redundancy_decoding" title="Decoding the Redundancy">
5907*a58d3d2aSXin Li<t>
5908*a58d3d2aSXin LiThe redundant frame is decoded like any other CELT-only frame, with the
5909*a58d3d2aSXin Li exception that it does not contain a TOC byte.
5910*a58d3d2aSXin LiThe frame size is fixed at 5&nbsp;ms, the channel count is set to that of the
5911*a58d3d2aSXin Li current frame, and the audio bandwidth is also set to that of the current
5912*a58d3d2aSXin Li frame, with the exception that for MB SILK frames, it is set to WB.
5913*a58d3d2aSXin Li</t>
5914*a58d3d2aSXin Li
5915*a58d3d2aSXin Li<t>
5916*a58d3d2aSXin LiIf the redundancy belongs at the beginning (in a CELT-only to SILK-only or
5917*a58d3d2aSXin Li Hybrid transition), the final reconstructed output uses the first 2.5&nbsp;ms
5918*a58d3d2aSXin Li of audio output by the decoder for the redundant frame as-is, discarding
5919*a58d3d2aSXin Li the corresponding output from the SILK-only or Hybrid portion of the frame.
5920*a58d3d2aSXin LiThe remaining 2.5&nbsp;ms is cross-lapped with the decoded SILK/Hybrid signal
 using CELT's power-complementary MDCT window to ensure a smooth
5922*a58d3d2aSXin Li transition.
5923*a58d3d2aSXin Li</t>
5924*a58d3d2aSXin Li
5925*a58d3d2aSXin Li<t>
5926*a58d3d2aSXin LiIf the redundancy belongs at the end (in a SILK-only or Hybrid to CELT-only
5927*a58d3d2aSXin Li transition), only the second half (2.5&nbsp;ms) of the audio output by the
5928*a58d3d2aSXin Li decoder for the redundant frame is used.
5929*a58d3d2aSXin LiIn that case, the second half of the redundant frame is cross-lapped with the
5930*a58d3d2aSXin Li end of the SILK/Hybrid signal, again using CELT's power-complementary MDCT
5931*a58d3d2aSXin Li window to ensure a smooth transition.
5932*a58d3d2aSXin Li</t>
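<t>
One plausible way to realize such a cross-lap is sketched below.
It assumes that the squared window value is used as the weight of the incoming
 signal and its complement as the weight of the outgoing signal; the function
 name cross_lap() and its arguments are hypothetical, and the window samples
 are supplied by the caller rather than computed here.
</t>
<figure align="center">
<artwork><![CDATA[
```c
#include <assert.h>

/* Cross-laps n samples from signal a into signal b.
   window[i] rises from 0 to 1; because it is assumed to be
   power-complementary, its square is used as the weight of b
   and (1 - square) as the weight of a. */
static void cross_lap(const float *a, const float *b,
                      const float *window, float *out, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        float w = window[i] * window[i];
        out[i] = w * b[i] + (1.0f - w) * a[i];
    }
}
```
]]></artwork>
</figure>
<t>
When the redundancy belongs at the beginning, a would be the second
 2.5&nbsp;ms of the redundant frame and b the decoded SILK/Hybrid signal;
 when it belongs at the end, a would be the end of the SILK/Hybrid signal and
 b the second half of the redundant frame.
</t>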
5933*a58d3d2aSXin Li</section>
5934*a58d3d2aSXin Li
5935*a58d3d2aSXin Li</section>
5936*a58d3d2aSXin Li
5937*a58d3d2aSXin Li<section anchor="decoder-reset" title="State Reset">
5938*a58d3d2aSXin Li<t>
5939*a58d3d2aSXin LiWhen a transition occurs, the state of the SILK or the CELT decoder (or both)
5940*a58d3d2aSXin Li may need to be reset before decoding a frame in the new mode.
5941*a58d3d2aSXin LiThis avoids reusing "out of date" memory, which may not have been updated in
5942*a58d3d2aSXin Li some time or may not be in a well-defined state due to, e.g., PLC.
5943*a58d3d2aSXin LiThe SILK state is reset before every SILK-only or Hybrid frame where the
5944*a58d3d2aSXin Li previous frame was CELT-only.
5945*a58d3d2aSXin LiThe CELT state is reset every time the operating mode changes and the new mode
5946*a58d3d2aSXin Li is either Hybrid or CELT-only, except when the transition uses redundancy as
5947*a58d3d2aSXin Li described above.
5948*a58d3d2aSXin LiWhen switching from SILK-only or Hybrid to CELT-only with redundancy, the CELT
5949*a58d3d2aSXin Li state is reset before decoding the redundant CELT frame embedded in the
5950*a58d3d2aSXin Li SILK-only or Hybrid frame, but it is not reset before decoding the following
5951*a58d3d2aSXin Li CELT-only frame.
5952*a58d3d2aSXin LiWhen switching from CELT-only mode to SILK-only or Hybrid mode with redundancy,
5953*a58d3d2aSXin Li the CELT decoder is not reset for decoding the redundant CELT frame.
5954*a58d3d2aSXin Li</t>
5955*a58d3d2aSXin Li</section>
5956*a58d3d2aSXin Li
5957*a58d3d2aSXin Li<section title="Summary of Transitions">
5958*a58d3d2aSXin Li
5959*a58d3d2aSXin Li<t>
5960*a58d3d2aSXin Li<xref target="normative_transitions"/> illustrates all of the normative
5961*a58d3d2aSXin Li transitions involving a mode change, an audio bandwidth change, or both.
5962*a58d3d2aSXin LiEach one uses an S, H, or C to represent an Opus frame in the corresponding
5963*a58d3d2aSXin Li mode.
5964*a58d3d2aSXin LiIn addition, an R indicates the presence of redundancy in the Opus frame it is
5965*a58d3d2aSXin Li cross-lapped with.
5966*a58d3d2aSXin LiIts location in the first or last 5&nbsp;ms is assumed to correspond to whether
5967*a58d3d2aSXin Li it is the frame before or after the transition.
5968*a58d3d2aSXin LiOther uses of redundancy are non-normative.
5969*a58d3d2aSXin LiFinally, a c indicates the contents of the CELT overlap buffer after the
5970*a58d3d2aSXin Li previously decoded frame (i.e., as extracted by decoding a silence frame).
5971*a58d3d2aSXin Li<figure align="center" anchor="normative_transitions"
5972*a58d3d2aSXin Li title="Normative Transitions">
5973*a58d3d2aSXin Li<artwork align="center"><![CDATA[
5974*a58d3d2aSXin LiSILK to SILK with Redundancy:             S -> S -> S
5975*a58d3d2aSXin Li                                                    &
5976*a58d3d2aSXin Li                                                   !R -> R
5977*a58d3d2aSXin Li                                                         &
5978*a58d3d2aSXin Li                                                        ;S -> S -> S
5979*a58d3d2aSXin Li
5980*a58d3d2aSXin LiNB or MB SILK to Hybrid with Redundancy:  S -> S -> S
5981*a58d3d2aSXin Li                                                    &
5982*a58d3d2aSXin Li                                                   !R ->;H -> H -> H
5983*a58d3d2aSXin Li
5984*a58d3d2aSXin LiWB SILK to Hybrid:                        S -> S -> S ->!H -> H -> H
5985*a58d3d2aSXin Li
5986*a58d3d2aSXin LiSILK to CELT with Redundancy:             S -> S -> S
5987*a58d3d2aSXin Li                                                    &
5988*a58d3d2aSXin Li                                                   !R -> C -> C -> C
5989*a58d3d2aSXin Li
5990*a58d3d2aSXin LiHybrid to NB or MB SILK with Redundancy:  H -> H -> H
5991*a58d3d2aSXin Li                                                    &
5992*a58d3d2aSXin Li                                                   !R -> R
5993*a58d3d2aSXin Li                                                         &
5994*a58d3d2aSXin Li                                                        ;S -> S -> S
5995*a58d3d2aSXin Li
5996*a58d3d2aSXin LiHybrid to WB SILK:                        H -> H -> H -> c
5997*a58d3d2aSXin Li                                                      \  +
5998*a58d3d2aSXin Li                                                       > S -> S -> S
5999*a58d3d2aSXin Li
6000*a58d3d2aSXin LiHybrid to CELT with Redundancy:           H -> H -> H
6001*a58d3d2aSXin Li                                                    &
6002*a58d3d2aSXin Li                                                   !R -> C -> C -> C
6003*a58d3d2aSXin Li
6004*a58d3d2aSXin LiCELT to SILK with Redundancy:             C -> C -> C -> R
6005*a58d3d2aSXin Li                                                         &
6006*a58d3d2aSXin Li                                                        ;S -> S -> S
6007*a58d3d2aSXin Li
6008*a58d3d2aSXin LiCELT to Hybrid with Redundancy:           C -> C -> C -> R
6009*a58d3d2aSXin Li                                                         &
6010*a58d3d2aSXin Li                                                        |H -> H -> H
6011*a58d3d2aSXin Li
6012*a58d3d2aSXin LiKey:
6013*a58d3d2aSXin LiS   SILK-only frame                 ;   SILK decoder reset
6014*a58d3d2aSXin LiH   Hybrid frame                    |   CELT and SILK decoder resets
6015*a58d3d2aSXin LiC   CELT-only frame                 !   CELT decoder reset
6016*a58d3d2aSXin Lic   CELT overlap                    +   Direct mixing
6017*a58d3d2aSXin LiR   Redundant CELT frame            &   Windowed cross-lap
6018*a58d3d2aSXin Li]]></artwork>
6019*a58d3d2aSXin Li</figure>
6020*a58d3d2aSXin LiThe first two and the last two Opus frames in each example are illustrative,
6021*a58d3d2aSXin Li i.e., there is no requirement that a stream remain in the same configuration
6022*a58d3d2aSXin Li for three consecutive frames before or after a switch.
6023*a58d3d2aSXin Li</t>
6024*a58d3d2aSXin Li
6025*a58d3d2aSXin Li<t>
6026*a58d3d2aSXin LiThe behavior of transitions without redundancy where PLC is allowed is non-normative.
6027*a58d3d2aSXin LiAn encoder might still wish to use these transitions if, for example, it
6028*a58d3d2aSXin Li doesn't want to add the extra bitrate required for redundancy or if it makes
6029*a58d3d2aSXin Li a decision to switch after it has already transmitted the frame that would
6030*a58d3d2aSXin Li have had to contain the redundancy.
6031*a58d3d2aSXin Li<xref target="nonnormative_transitions"/> illustrates the recommended
6032*a58d3d2aSXin Li cross-lapping and decoder resets for these transitions.
6033*a58d3d2aSXin Li<figure align="center" anchor="nonnormative_transitions"
6034*a58d3d2aSXin Li title="Recommended Non-Normative Transitions">
6035*a58d3d2aSXin Li<artwork align="center"><![CDATA[
6036*a58d3d2aSXin LiSILK to SILK (audio bandwidth change):    S -> S -> S   ;S -> S -> S
6037*a58d3d2aSXin Li
6038*a58d3d2aSXin LiNB or MB SILK to Hybrid:                  S -> S -> S   |H -> H -> H
6039*a58d3d2aSXin Li
6040*a58d3d2aSXin LiSILK to CELT without Redundancy:          S -> S -> S -> P
6041*a58d3d2aSXin Li                                                         &
6042*a58d3d2aSXin Li                                                        !C -> C -> C
6043*a58d3d2aSXin Li
6044*a58d3d2aSXin LiHybrid to NB or MB SILK:                  H -> H -> H -> c
6045*a58d3d2aSXin Li                                                         +
6046*a58d3d2aSXin Li                                                        ;S -> S -> S
6047*a58d3d2aSXin Li
6048*a58d3d2aSXin LiHybrid to CELT without Redundancy:        H -> H -> H -> P
6049*a58d3d2aSXin Li                                                         &
6050*a58d3d2aSXin Li                                                        !C -> C -> C
6051*a58d3d2aSXin Li
6052*a58d3d2aSXin LiCELT to SILK without Redundancy:          C -> C -> C -> P
6053*a58d3d2aSXin Li                                                         &
6054*a58d3d2aSXin Li                                                        ;S -> S -> S
6055*a58d3d2aSXin Li
6056*a58d3d2aSXin LiCELT to Hybrid without Redundancy:        C -> C -> C -> P
6057*a58d3d2aSXin Li                                                         &
6058*a58d3d2aSXin Li                                                        |H -> H -> H
6059*a58d3d2aSXin Li
6060*a58d3d2aSXin LiKey:
6061*a58d3d2aSXin LiS   SILK-only frame                 ;   SILK decoder reset
6062*a58d3d2aSXin LiH   Hybrid frame                    |   CELT and SILK decoder resets
6063*a58d3d2aSXin LiC   CELT-only frame                 !   CELT decoder reset
6064*a58d3d2aSXin Lic   CELT overlap                    +   Direct mixing
6065*a58d3d2aSXin LiP   Packet Loss Concealment         &   Windowed cross-lap
6066*a58d3d2aSXin Li]]></artwork>
6067*a58d3d2aSXin Li</figure>
6068*a58d3d2aSXin LiEncoders SHOULD NOT use other transitions, e.g., those that involve redundancy
6069*a58d3d2aSXin Li in ways not illustrated in <xref target="normative_transitions"/>.
6070*a58d3d2aSXin Li</t>
6071*a58d3d2aSXin Li
6072*a58d3d2aSXin Li</section>
6073*a58d3d2aSXin Li
6074*a58d3d2aSXin Li</section>
6075*a58d3d2aSXin Li
6076*a58d3d2aSXin Li</section>
6077*a58d3d2aSXin Li
6078*a58d3d2aSXin Li
6079*a58d3d2aSXin Li<!--  ******************************************************************* -->
6080*a58d3d2aSXin Li<!--  **************************   OPUS ENCODER   *********************** -->
6081*a58d3d2aSXin Li<!--  ******************************************************************* -->
6082*a58d3d2aSXin Li
6083*a58d3d2aSXin Li<section title="Opus Encoder">
6084*a58d3d2aSXin Li<t>
6085*a58d3d2aSXin LiJust like the decoder, the Opus encoder also normally consists of two main blocks: the
6086*a58d3d2aSXin LiSILK encoder and the CELT encoder. However, unlike the case of the decoder, a valid
6087*a58d3d2aSXin Li(though potentially suboptimal) Opus encoder is not required to support all modes and
6088*a58d3d2aSXin Limay thus only include a SILK encoder module or a CELT encoder module.
The output bit-stream of the Opus encoder contains bits from the SILK and CELT
6090*a58d3d2aSXin Li encoders, though these are not separable due to the use of a range coder.
6091*a58d3d2aSXin LiA block diagram of the encoder is illustrated below.
6092*a58d3d2aSXin Li
6093*a58d3d2aSXin Li<figure align="center" anchor="opus-encoder-figure" title="Opus Encoder">
6094*a58d3d2aSXin Li<artwork>
6095*a58d3d2aSXin Li<![CDATA[
6096*a58d3d2aSXin Li                    +------------+    +---------+
6097*a58d3d2aSXin Li                    |   Sample   |    |  SILK   |------+
6098*a58d3d2aSXin Li                 +->|    Rate    |--->| Encoder |      V
6099*a58d3d2aSXin Li  +-----------+  |  | Conversion |    |         | +---------+
6100*a58d3d2aSXin Li  | Optional  |  |  +------------+    +---------+ |  Range  |
6101*a58d3d2aSXin Li->| High-pass |--+                                | Encoder |---->
6102*a58d3d2aSXin Li  |  Filter   |  |  +--------------+  +---------+ |         | Bit-
6103*a58d3d2aSXin Li  +-----------+  |  |    Delay     |  |  CELT   | +---------+ stream
6104*a58d3d2aSXin Li                 +->| Compensation |->| Encoder |      ^
6105*a58d3d2aSXin Li                    |              |  |         |------+
6106*a58d3d2aSXin Li                    +--------------+  +---------+
6107*a58d3d2aSXin Li]]>
6108*a58d3d2aSXin Li</artwork>
6109*a58d3d2aSXin Li</figure>
6110*a58d3d2aSXin Li</t>
6111*a58d3d2aSXin Li
6112*a58d3d2aSXin Li<t>
6113*a58d3d2aSXin LiFor a normal encoder where both the SILK and the CELT modules are included, an optimal
6114*a58d3d2aSXin Liencoder should select which coding mode to use at run-time depending on the conditions.
In the reference implementation, the frame size is selected by the application, but the
other configuration parameters (number of channels, bandwidth, mode) are automatically
selected (unless explicitly overridden by the application) and depend on the following:
6118*a58d3d2aSXin Li<list style="symbols">
6119*a58d3d2aSXin Li<t>Requested bitrate</t>
6120*a58d3d2aSXin Li<t>Input sampling rate</t>
6121*a58d3d2aSXin Li<t>Type of signal (speech vs music)</t>
6122*a58d3d2aSXin Li<t>Frame size in use</t>
6123*a58d3d2aSXin Li</list>
6124*a58d3d2aSXin Li
6125*a58d3d2aSXin LiThe type of signal currently needs to be provided by the application (though it can be
6126*a58d3d2aSXin Lichanged in real-time). An Opus encoder implementation could also do automatic detection,
6127*a58d3d2aSXin Libut since Opus is an interactive codec, such an implementation would likely have to either
6128*a58d3d2aSXin Lidelay the signal (for non-interactive applications) or delay the mode switching decisions (for
6129*a58d3d2aSXin Liinteractive applications).
6130*a58d3d2aSXin Li</t>
6131*a58d3d2aSXin Li
6132*a58d3d2aSXin Li<t>
6133*a58d3d2aSXin LiWhen the encoder is configured for voice over IP applications, the input signal is
6134*a58d3d2aSXin Lifiltered by a high-pass filter to remove the lowest part of the spectrum
that contains little speech energy and may contain background noise. This is a second-order
autoregressive moving-average (ARMA) filter (i.e., one with both poles and zeros) with a cut-off frequency around 50&nbsp;Hz.
6137*a58d3d2aSXin LiIn the future, a music detector may also be used to lower the cut-off frequency when the
6138*a58d3d2aSXin Liinput signal is detected to be music rather than speech.
6139*a58d3d2aSXin Li</t>
6140*a58d3d2aSXin Li
6141*a58d3d2aSXin Li<section anchor="range-encoder" title="Range Encoder">
6142*a58d3d2aSXin Li<t>
6143*a58d3d2aSXin LiThe range coder acts as the bit-packer for Opus.
6144*a58d3d2aSXin LiIt is used in three different ways: to encode
6145*a58d3d2aSXin Li<list style="symbols">
6146*a58d3d2aSXin Li<t>
6147*a58d3d2aSXin LiEntropy-coded symbols with a fixed probability model using ec_encode()
6148*a58d3d2aSXin Li (entenc.c),
6149*a58d3d2aSXin Li</t>
6150*a58d3d2aSXin Li<t>
6151*a58d3d2aSXin LiIntegers from 0 to (2**M&nbsp;-&nbsp;1) using ec_enc_uint() or ec_enc_bits()
6152*a58d3d2aSXin Li (entenc.c),</t>
6153*a58d3d2aSXin Li<t>
6154*a58d3d2aSXin LiIntegers from 0 to (ft&nbsp;-&nbsp;1) (where ft is not a power of two) using
6155*a58d3d2aSXin Li ec_enc_uint() (entenc.c).
6156*a58d3d2aSXin Li</t>
6157*a58d3d2aSXin Li</list>
6158*a58d3d2aSXin Li</t>
6159*a58d3d2aSXin Li
6160*a58d3d2aSXin Li<t>
6161*a58d3d2aSXin LiThe range encoder maintains an internal state vector composed of the four-tuple
6162*a58d3d2aSXin Li (val,&nbsp;rng,&nbsp;rem,&nbsp;ext) representing the low end of the current
6163*a58d3d2aSXin Li range, the size of the current range, a single buffered output byte, and a
6164*a58d3d2aSXin Li count of additional carry-propagating output bytes.
Both val and rng are 32-bit unsigned integer values, rem is a byte value
 less than 255 or the special value -1, and ext is an unsigned integer with at
 least 11 bits.
This state vector is initialized at the start of each frame to the value
6169*a58d3d2aSXin Li (0,&nbsp;2**31,&nbsp;-1,&nbsp;0).
6170*a58d3d2aSXin LiAfter encoding a sequence of symbols, the value of rng in the encoder should
6171*a58d3d2aSXin Li exactly match the value of rng in the decoder after decoding the same sequence
6172*a58d3d2aSXin Li of symbols.
6173*a58d3d2aSXin LiThis is a powerful tool for detecting errors in either an encoder or decoder
6174*a58d3d2aSXin Li implementation.
6175*a58d3d2aSXin LiThe value of val, on the other hand, represents different things in the encoder
6176*a58d3d2aSXin Li and decoder, and is not expected to match.
6177*a58d3d2aSXin Li</t>
6178*a58d3d2aSXin Li
6179*a58d3d2aSXin Li<t>
6180*a58d3d2aSXin LiThe decoder has no analog for rem and ext.
6181*a58d3d2aSXin LiThese are used to perform carry propagation in the renormalization loop below.
6182*a58d3d2aSXin LiEach iteration of this loop produces 9 bits of output, consisting of 8 data
6183*a58d3d2aSXin Li bits and a carry flag.
6184*a58d3d2aSXin LiThe encoder cannot determine the final value of the output bytes until it
6185*a58d3d2aSXin Li propagates these carry flags.
6186*a58d3d2aSXin LiTherefore the reference implementation buffers a single non-propagating output
6187*a58d3d2aSXin Li byte (i.e., one less than 255) in rem and keeps a count of additional
6188*a58d3d2aSXin Li propagating (i.e., 255) output bytes in ext.
6189*a58d3d2aSXin LiAn implementation may choose to use any mathematically equivalent scheme to
6190*a58d3d2aSXin Li perform carry propagation.
6191*a58d3d2aSXin Li</t>
6192*a58d3d2aSXin Li
6193*a58d3d2aSXin Li<section anchor="encoding-symbols" title="Encoding Symbols">
6194*a58d3d2aSXin Li<t>
6195*a58d3d2aSXin LiThe main encoding function is ec_encode() (entenc.c), which encodes symbol k in
6196*a58d3d2aSXin Li the current context using the same three-tuple (fl[k],&nbsp;fh[k],&nbsp;ft)
6197*a58d3d2aSXin Li as the decoder to describe the range of the symbol (see
6198*a58d3d2aSXin Li <xref target="range-decoder"/>).
6199*a58d3d2aSXin Li</t>
6200*a58d3d2aSXin Li<t>
6201*a58d3d2aSXin Liec_encode() updates the state of the encoder as follows.
6202*a58d3d2aSXin LiIf fl[k] is greater than zero, then
6203*a58d3d2aSXin Li<figure align="center">
6204*a58d3d2aSXin Li<artwork align="center"><![CDATA[
6205*a58d3d2aSXin Li                  rng
6206*a58d3d2aSXin Lival = val + rng - --- * (ft - fl) ,
6207*a58d3d2aSXin Li                  ft
6208*a58d3d2aSXin Li
6209*a58d3d2aSXin Li      rng
6210*a58d3d2aSXin Lirng = --- * (fh - fl) .
6211*a58d3d2aSXin Li      ft
6212*a58d3d2aSXin Li]]></artwork>
6213*a58d3d2aSXin Li</figure>
6214*a58d3d2aSXin LiOtherwise, val is unchanged and
6215*a58d3d2aSXin Li<figure align="center">
6216*a58d3d2aSXin Li<artwork align="center"><![CDATA[
            rng
rng = rng - --- * (ft - fh) .
            ft
6220*a58d3d2aSXin Li]]></artwork>
6221*a58d3d2aSXin Li</figure>
6222*a58d3d2aSXin LiThe divisions here are integer division.
6223*a58d3d2aSXin Li</t>
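<t>
The update can be written compactly in C.
The following is a minimal sketch rather than the code in entenc.c: the
 enc_range structure and the enc_update() name are hypothetical, only val and
 rng are modeled, and renormalization is omitted.
</t>
<figure align="center">
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t val;  /* low end of the current range */
    uint32_t rng;  /* size of the current range */
} enc_range;

/* Applies the range update for a symbol described by
   (fl, fh, ft). Renormalization is intentionally left out. */
static void enc_update(enc_range *s, uint32_t fl, uint32_t fh,
                       uint32_t ft)
{
    uint32_t r;
    assert(fl < fh && fh <= ft && ft > 0);
    r = s->rng / ft;               /* integer division */
    if (fl > 0) {
        s->val += s->rng - r * (ft - fl);
        s->rng = r * (fh - fl);
    } else {
        /* First symbol: keeps whatever the symbols above it
           do not claim, including the rounding remainder. */
        s->rng -= r * (ft - fh);
    }
}
```
]]></artwork>
</figure>
<t>
For example, starting from (val,&nbsp;rng)&nbsp;=&nbsp;(0,&nbsp;2**31) and
 encoding the tuple (1,&nbsp;3,&nbsp;4) gives
 (val,&nbsp;rng)&nbsp;=&nbsp;(2**29,&nbsp;2**30).
</t>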
6224*a58d3d2aSXin Li
6225*a58d3d2aSXin Li<section anchor="range-encoder-renorm" title="Renormalization">
6226*a58d3d2aSXin Li<t>
6227*a58d3d2aSXin LiAfter this update, the range is normalized using a procedure very similar to
6228*a58d3d2aSXin Li that of <xref target="range-decoder-renorm"/>, implemented by
6229*a58d3d2aSXin Li ec_enc_normalize() (entenc.c).
6230*a58d3d2aSXin LiThe following process is repeated until rng&nbsp;&gt;&nbsp;2**23.
6231*a58d3d2aSXin LiFirst, the top 9 bits of val, (val&gt;&gt;23), are sent to the carry buffer,
6232*a58d3d2aSXin Li described in <xref target="ec_enc_carry_out"/>.
6233*a58d3d2aSXin LiThen, the encoder sets
6234*a58d3d2aSXin Li<figure align="center">
6235*a58d3d2aSXin Li<artwork align="center"><![CDATA[
6236*a58d3d2aSXin Lival = (val<<8) & 0x7FFFFFFF ,
6237*a58d3d2aSXin Li
6238*a58d3d2aSXin Lirng = rng<<8 .
6239*a58d3d2aSXin Li]]></artwork>
6240*a58d3d2aSXin Li</figure>
6241*a58d3d2aSXin Li</t>
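<t>
The loop can be sketched as follows.
This is an illustration only: enc_normalize() here is a hypothetical
 stand-alone function, and instead of passing each 9-bit value to the carry
 buffer it simply stores the values in a caller-supplied array.
</t>
<figure align="center">
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Renormalizes (val, rng). Each 9-bit value that would be sent
   to the carry buffer is stored in sym instead; the return
   value is the number of values produced (at most max_sym). */
static int enc_normalize(uint32_t *val, uint32_t *rng,
                         unsigned *sym, int max_sym)
{
    int n = 0;
    assert(*rng > 0);
    while (*rng <= (UINT32_C(1) << 23)) {
        assert(n < max_sym);
        sym[n++] = *val >> 23;      /* carry + 8 data bits */
        *val = (*val << 8) & UINT32_C(0x7FFFFFFF);
        *rng <<= 8;
    }
    return n;
}
```
]]></artwork>
</figure>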
6242*a58d3d2aSXin Li</section>
6243*a58d3d2aSXin Li
6244*a58d3d2aSXin Li<section anchor="ec_enc_carry_out"
6245*a58d3d2aSXin Li title="Carry Propagation and Output Buffering">
6246*a58d3d2aSXin Li<t>
6247*a58d3d2aSXin LiThe function ec_enc_carry_out() (entenc.c) implements carry propagation and
6248*a58d3d2aSXin Li output buffering.
6249*a58d3d2aSXin LiIt takes as input a 9-bit value, c, consisting of 8 data bits and an additional
6250*a58d3d2aSXin Li carry bit.
6251*a58d3d2aSXin LiIf c is equal to the value 255, then ext is simply incremented, and no other
6252*a58d3d2aSXin Li state updates are performed.
6253*a58d3d2aSXin LiOtherwise, let b&nbsp;=&nbsp;(c&gt;&gt;8) be the carry bit.
6254*a58d3d2aSXin LiThen,
6255*a58d3d2aSXin Li<list style="symbols">
6256*a58d3d2aSXin Li<t>
6257*a58d3d2aSXin LiIf the buffered byte rem contains a value other than -1, the encoder outputs
6258*a58d3d2aSXin Li the byte (rem&nbsp;+&nbsp;b).
6259*a58d3d2aSXin LiOtherwise, if rem is -1, no byte is output.
6260*a58d3d2aSXin Li</t>
6261*a58d3d2aSXin Li<t>
6262*a58d3d2aSXin LiIf ext is non-zero, then the encoder outputs ext bytes---all with a value of 0
6263*a58d3d2aSXin Li if b is set, or 255 if b is unset---and sets ext to 0.
6264*a58d3d2aSXin Li</t>
6265*a58d3d2aSXin Li<t>
6266*a58d3d2aSXin Lirem is set to the 8 data bits:
6267*a58d3d2aSXin Li<figure align="center">
6268*a58d3d2aSXin Li<artwork align="center"><![CDATA[
6269*a58d3d2aSXin Lirem = c & 255 .
6270*a58d3d2aSXin Li]]></artwork>
6271*a58d3d2aSXin Li</figure>
6272*a58d3d2aSXin Li</t>
6273*a58d3d2aSXin Li</list>
6274*a58d3d2aSXin Li</t>
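<t>
A minimal sketch of this buffering scheme is given below.
The carry_buf structure, put_byte(), and carry_out() are hypothetical names
 chosen for illustration; the sketch writes into a fixed-size array and makes
 no attempt to mirror the layout of the reference encoder state.
</t>
<figure align="center">
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int rem;              /* buffered byte, or -1 if none */
    unsigned ext;         /* count of pending 255 bytes */
    unsigned char *buf;   /* output buffer */
    size_t len;           /* bytes written so far */
    size_t cap;           /* capacity of buf */
} carry_buf;

static void put_byte(carry_buf *s, int byte)
{
    assert(s->len < s->cap);
    s->buf[s->len++] = (unsigned char)(byte & 255);
}

/* c is a 9-bit value: a carry bit followed by 8 data bits. */
static void carry_out(carry_buf *s, int c)
{
    int b;
    if (c == 255) {       /* might still be changed by a carry */
        s->ext++;
        return;
    }
    b = c >> 8;
    if (s->rem >= 0)
        put_byte(s, s->rem + b);
    /* Pending 255s become 0 on a carry, else stay 255. */
    for (; s->ext > 0; s->ext--)
        put_byte(s, b ? 0 : 255);
    s->rem = c & 255;
}
```
]]></artwork>
</figure>
<t>
Because a value of 255 is never stored in rem, adding the carry bit to the
 buffered byte cannot overflow it.
</t>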
6275*a58d3d2aSXin Li</section>
6276*a58d3d2aSXin Li
6277*a58d3d2aSXin Li</section>
6278*a58d3d2aSXin Li
6279*a58d3d2aSXin Li<section anchor="encoding-alternate" title="Alternate Encoding Methods">
6280*a58d3d2aSXin Li<t>
6281*a58d3d2aSXin LiThe reference implementation uses three additional encoding methods that are
6282*a58d3d2aSXin Li exactly equivalent to the above, but make assumptions and simplifications that
6283*a58d3d2aSXin Li allow for a more efficient implementation.
6284*a58d3d2aSXin Li</t>
6285*a58d3d2aSXin Li
6286*a58d3d2aSXin Li<section anchor="ec_encode_bin" title="ec_encode_bin()">
6287*a58d3d2aSXin Li<t>
6288*a58d3d2aSXin LiThe first is ec_encode_bin() (entenc.c), defined using the parameter ftb
6289*a58d3d2aSXin Li instead of ft.
6290*a58d3d2aSXin LiIt is mathematically equivalent to calling ec_encode() with
6291*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;ftb), but avoids using division.
6292*a58d3d2aSXin Li</t>
6293*a58d3d2aSXin Li</section>
6294*a58d3d2aSXin Li
6295*a58d3d2aSXin Li<section anchor="ec_enc_bit_logp" title="ec_enc_bit_logp()">
6296*a58d3d2aSXin Li<t>
6297*a58d3d2aSXin LiThe next is ec_enc_bit_logp() (entenc.c), which encodes a single binary symbol.
6298*a58d3d2aSXin LiThe context is described by a single parameter, logp, which is the absolute
6299*a58d3d2aSXin Li value of the base-2 logarithm of the probability of a "1".
6300*a58d3d2aSXin LiIt is mathematically equivalent to calling ec_encode() with the 3-tuple
6301*a58d3d2aSXin Li (fl[k]&nbsp;=&nbsp;0, fh[k]&nbsp;=&nbsp;(1&lt;&lt;logp)&nbsp;-&nbsp;1,
6302*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;logp)) if k is 0 and with
6303*a58d3d2aSXin Li (fl[k]&nbsp;=&nbsp;(1&lt;&lt;logp)&nbsp;-&nbsp;1,
6304*a58d3d2aSXin Li fh[k]&nbsp;=&nbsp;ft&nbsp;=&nbsp;(1&lt;&lt;logp)) if k is 1.
6305*a58d3d2aSXin LiThe implementation requires no multiplications or divisions.
6306*a58d3d2aSXin Li</t>
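<t>
The equivalence can be made concrete with a small helper that builds the
 tuple for a given bit and logp.
The sym_range type and bit_logp_range() are hypothetical and exist only to
 illustrate the mapping; the bound on logp merely keeps the shift well
 defined in this sketch.
</t>
<figure align="center">
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t fl, fh, ft; } sym_range;

/* Returns the (fl, fh, ft) tuple that ec_encode() would need
   in order to code bit k when P(1) = 2**(-logp). */
static sym_range bit_logp_range(int k, unsigned logp)
{
    sym_range r;
    assert(logp >= 1 && logp <= 31 && (k == 0 || k == 1));
    r.ft = UINT32_C(1) << logp;
    r.fl = k ? r.ft - 1 : 0;
    r.fh = k ? r.ft : r.ft - 1;
    return r;
}
```
]]></artwork>
</figure>
<t>
With logp&nbsp;=&nbsp;12, this yields the tuples for a PDF of
 {4095,&nbsp;1}/4096, and with logp&nbsp;=&nbsp;1 those for a uniform
 1&nbsp;bit symbol.
</t>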
6307*a58d3d2aSXin Li</section>
6308*a58d3d2aSXin Li
6309*a58d3d2aSXin Li<section anchor="ec_enc_icdf" title="ec_enc_icdf()">
6310*a58d3d2aSXin Li<t>
The last is ec_enc_icdf() (entenc.c), which encodes a single symbol with
6312*a58d3d2aSXin Li a table-based context of up to 8 bits.
6313*a58d3d2aSXin LiThis uses the same icdf table as ec_dec_icdf() from
6314*a58d3d2aSXin Li <xref target="ec_dec_icdf"/>.
6315*a58d3d2aSXin LiThe function is mathematically equivalent to calling ec_encode() with
6316*a58d3d2aSXin Li fl[k]&nbsp;=&nbsp;(1&lt;&lt;ftb)&nbsp;-&nbsp;icdf[k-1] (or 0 if
6317*a58d3d2aSXin Li k&nbsp;==&nbsp;0), fh[k]&nbsp;=&nbsp;(1&lt;&lt;ftb)&nbsp;-&nbsp;icdf[k], and
6318*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;ftb).
6319*a58d3d2aSXin LiThis only saves a few arithmetic operations over ec_encode_bin(), but allows
6320*a58d3d2aSXin Li the encoder to use the same icdf tables as the decoder.
6321*a58d3d2aSXin Li</t>
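<t>
The mapping from an icdf table to the equivalent ec_encode() tuple is sketched
 below.
The sym_range type and icdf_range() are hypothetical names, and the example
 table in the note that follows was made up for illustration.
</t>
<figure align="center">
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t fl, fh, ft; } sym_range;

/* icdf[k] holds ft minus the cumulative frequency of symbols
   0..k, so the table is decreasing and ends in 0. */
static sym_range icdf_range(int k, const unsigned char *icdf,
                            unsigned ftb)
{
    sym_range r;
    assert(k >= 0 && ftb >= 1 && ftb <= 8);
    r.ft = UINT32_C(1) << ftb;
    r.fl = k > 0 ? r.ft - icdf[k - 1] : 0;
    r.fh = r.ft - icdf[k];
    return r;
}
```
]]></artwork>
</figure>
<t>
For instance, a PDF of {64,&nbsp;128,&nbsp;64}/256 corresponds to the table
 {192,&nbsp;64,&nbsp;0} with ftb&nbsp;=&nbsp;8, and symbol 1 maps to the tuple
 (64,&nbsp;192,&nbsp;256).
</t>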
6322*a58d3d2aSXin Li</section>
6323*a58d3d2aSXin Li
6324*a58d3d2aSXin Li</section>
6325*a58d3d2aSXin Li
6326*a58d3d2aSXin Li<section anchor="encoding-bits" title="Encoding Raw Bits">
6327*a58d3d2aSXin Li<t>
6328*a58d3d2aSXin LiThe raw bits used by the CELT layer are packed at the end of the buffer using
6329*a58d3d2aSXin Li ec_enc_bits() (entenc.c).
6330*a58d3d2aSXin LiBecause the raw bits may continue into the last byte output by the range coder
6331*a58d3d2aSXin Li if there is room in the low-order bits, the encoder must be prepared to merge
6332*a58d3d2aSXin Li these values into a single byte.
6333*a58d3d2aSXin LiThe procedure in <xref target="encoder-finalizing"/> does this in a way that
6334*a58d3d2aSXin Li ensures both the range coded data and the raw bits can be decoded
6335*a58d3d2aSXin Li successfully.
6336*a58d3d2aSXin Li</t>
6337*a58d3d2aSXin Li</section>
6338*a58d3d2aSXin Li
6339*a58d3d2aSXin Li<section anchor="encoding-ints" title="Encoding Uniformly Distributed Integers">
6340*a58d3d2aSXin Li<t>
6341*a58d3d2aSXin LiThe function ec_enc_uint() (entenc.c) encodes one of ft equiprobable symbols in
6342*a58d3d2aSXin Li the range 0 to (ft&nbsp;-&nbsp;1), inclusive, each with a frequency of 1,
6343*a58d3d2aSXin Li where ft may be as large as (2**32&nbsp;-&nbsp;1).
6344*a58d3d2aSXin LiLike the decoder (see <xref target="ec_dec_uint"/>), it splits up the
6345*a58d3d2aSXin Li value into a range coded symbol representing up to 8 of the high bits, and, if
6346*a58d3d2aSXin Li necessary, raw bits representing the remainder of the value.
6347*a58d3d2aSXin Li</t>
6348*a58d3d2aSXin Li<t>
6349*a58d3d2aSXin Liec_enc_uint() takes a two-tuple (t,&nbsp;ft), where t is the value to be
6350*a58d3d2aSXin Li encoded, 0&nbsp;&lt;=&nbsp;t&nbsp;&lt;&nbsp;ft, and ft is not necessarily a
6351*a58d3d2aSXin Li power of two.
6352*a58d3d2aSXin LiLet ftb&nbsp;=&nbsp;ilog(ft&nbsp;-&nbsp;1), i.e., the number of bits required
6353*a58d3d2aSXin Li to store (ft&nbsp;-&nbsp;1) in two's complement notation.
6354*a58d3d2aSXin LiIf ftb is 8 or less, then t is encoded directly using ec_encode() with the
6355*a58d3d2aSXin Li three-tuple (t, t&nbsp;+&nbsp;1, ft).
6356*a58d3d2aSXin Li</t>
6357*a58d3d2aSXin Li<t>
6358*a58d3d2aSXin LiIf ftb is greater than 8, then the top 8 bits of t are encoded using the
6359*a58d3d2aSXin Li three-tuple (t&gt;&gt;(ftb&nbsp;-&nbsp;8),
6360*a58d3d2aSXin Li (t&gt;&gt;(ftb&nbsp;-&nbsp;8))&nbsp;+&nbsp;1,
6361*a58d3d2aSXin Li ((ft&nbsp;-&nbsp;1)&gt;&gt;(ftb&nbsp;-&nbsp;8))&nbsp;+&nbsp;1), and the
6362*a58d3d2aSXin Li remaining bits,
 (t&nbsp;&amp;&nbsp;((1&lt;&lt;(ftb&nbsp;-&nbsp;8))&nbsp;-&nbsp;1)),
6364*a58d3d2aSXin Li are encoded as raw bits with ec_enc_bits().
6365*a58d3d2aSXin Li</t>
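<t>
The split described above can be sketched in C as follows.
This is an illustration under stated assumptions rather than the code of
ec_enc_uint(): the struct uint_split and the function split_uint() are
hypothetical, ft is assumed to be greater than 1, and the actual calls to
ec_encode() and ec_enc_bits() are left to the caller.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to store n: 0 for n == 0, else floor(log2(n))+1. */
int ilog(uint32_t n)
{
    int bits = 0;
    while (n != 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}

/* Hypothetical container for the two parts of an encoded integer. */
typedef struct {
    uint32_t fl, fh, ft;  /* three-tuple for ec_encode() */
    uint32_t raw;         /* value for ec_enc_bits(), if raw_bits > 0 */
    int raw_bits;         /* number of raw bits, 0 if none */
} uint_split;

uint_split split_uint(uint32_t t, uint32_t ft)
{
    uint_split s;
    int ftb;
    assert(ft > 1 && t < ft);
    ftb = ilog(ft - 1);
    if (ftb <= 8) {
        /* Small alphabet: encode t directly. */
        s.fl = t;
        s.fh = t + 1;
        s.ft = ft;
        s.raw = 0;
        s.raw_bits = 0;
    } else {
        /* Range code the top 8 bits, send the rest as raw bits. */
        int shift = ftb - 8;
        s.fl = t >> shift;
        s.fh = s.fl + 1;
        s.ft = ((ft - 1) >> shift) + 1;
        s.raw = t & ((UINT32_C(1) << shift) - 1);
        s.raw_bits = shift;
    }
    return s;
}
```
]]></artwork>
</figure>
<t>
For example, with ft&nbsp;=&nbsp;1000 and t&nbsp;=&nbsp;999, ftb is 10, so the
range coded symbol is (249,&nbsp;250,&nbsp;250) and the two remaining bits
carry the value 3.
</t>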
6366*a58d3d2aSXin Li</section>
6367*a58d3d2aSXin Li
6368*a58d3d2aSXin Li<section anchor="encoder-finalizing" title="Finalizing the Stream">
6369*a58d3d2aSXin Li<t>
6370*a58d3d2aSXin LiAfter all symbols are encoded, the stream must be finalized by outputting a
6371*a58d3d2aSXin Li value inside the current range.
6372*a58d3d2aSXin LiLet end be the integer in the interval [val,&nbsp;val&nbsp;+&nbsp;rng) with the
6373*a58d3d2aSXin Li largest number of trailing zero bits, b, such that
6374*a58d3d2aSXin Li (end&nbsp;+&nbsp;(1&lt;&lt;b)&nbsp;-&nbsp;1) is also in the interval
6375*a58d3d2aSXin Li [val,&nbsp;val&nbsp;+&nbsp;rng).
6376*a58d3d2aSXin LiThis choice of end allows the maximum number of trailing bits to be set to
6377*a58d3d2aSXin Li arbitrary values while still ensuring the range coded part of the buffer can
6378*a58d3d2aSXin Li be decoded correctly.
6379*a58d3d2aSXin LiThen, while end is not zero, the top 9 bits of end, i.e., (end&gt;&gt;23), are
6380*a58d3d2aSXin Li passed to the carry buffer in accordance with the procedure in
6381*a58d3d2aSXin Li <xref target="ec_enc_carry_out"/>, and end is updated via
6382*a58d3d2aSXin Li<figure align="center">
6383*a58d3d2aSXin Li<artwork align="center"><![CDATA[
6384*a58d3d2aSXin Liend = (end<<8) & 0x7FFFFFFF .
6385*a58d3d2aSXin Li]]></artwork>
6386*a58d3d2aSXin Li</figure>
6387*a58d3d2aSXin LiFinally, if the buffered output byte, rem, is neither zero nor the special
6388*a58d3d2aSXin Li value -1, or the carry count, ext, is greater than zero, then 9 zero bits are
6389*a58d3d2aSXin Li sent to the carry buffer to flush it to the output buffer.
6390*a58d3d2aSXin LiWhen outputting the final byte from the range coder, if it would overlap any
6391*a58d3d2aSXin Li raw bits already packed into the end of the output buffer, they should be ORed
6392*a58d3d2aSXin Li into the same byte.
6393*a58d3d2aSXin LiThe bit allocation routines in the CELT layer should ensure that this can be
6394*a58d3d2aSXin Li done without corrupting the range coder data so long as end is chosen as
6395*a58d3d2aSXin Li described above.
6396*a58d3d2aSXin LiIf there is any space between the end of the range coder data and the end of
6397*a58d3d2aSXin Li the raw bits, it is padded with zero bits.
6398*a58d3d2aSXin LiThis entire process is implemented by ec_enc_done() (entenc.c).
6399*a58d3d2aSXin Li</t>
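<t>
The choice of end can be illustrated with a plain search over b, from the
largest candidate downward.
This is a sketch only and is not how ec_enc_done() is written: the function
name choose_end() is hypothetical, and the code assumes that
val&nbsp;+&nbsp;rng does not exceed 2**32.
The carry-buffer output that follows the selection is not shown.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the value end in [val, val + rng) that
   allows the largest number b of trailing bits to be set arbitrarily,
   i.e., such that end + (1<<b) - 1 is still inside the interval.
   Stores b in *b_out. 64-bit arithmetic avoids overflow in val + rng. */
uint32_t choose_end(uint32_t val, uint32_t rng, int *b_out)
{
    uint64_t lo = val;
    uint64_t hi = (uint64_t)val + rng;  /* exclusive upper bound */
    int b;
    assert(rng > 0 && hi <= (UINT64_C(1) << 32));
    for (b = 31; b > 0; b--) {
        uint64_t msk = (UINT64_C(1) << b) - 1;
        /* Smallest multiple of 2**b that is not below val. */
        uint64_t end = (lo + msk) & ~msk;
        if (end + msk < hi) {
            *b_out = b;
            return (uint32_t)end;
        }
    }
    *b_out = 0;
    return val;  /* b == 0: val itself is always inside the interval */
}
```
]]></artwork>
</figure>
<t>
For the interval [5,&nbsp;15), the search settles on end&nbsp;=&nbsp;8 with
b&nbsp;=&nbsp;2, since 8 through 11 all lie inside the interval while 8
through 15 do not.
</t>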
6400*a58d3d2aSXin Li</section>
6401*a58d3d2aSXin Li
6402*a58d3d2aSXin Li<section anchor="encoder-tell" title="Current Bit Usage">
6403*a58d3d2aSXin Li<t>
6404*a58d3d2aSXin Li   The bit allocation routines in Opus need to be able to determine a
6405*a58d3d2aSXin Li   conservative upper bound on the number of bits that have been used
6406*a58d3d2aSXin Li   to encode the current frame thus far. This drives allocation
6407*a58d3d2aSXin Li   decisions and ensures that the range coder and raw bits will not
6408*a58d3d2aSXin Li   overflow the output buffer. This is computed in the
6409*a58d3d2aSXin Li   reference implementation to whole-bit precision by
6410*a58d3d2aSXin Li   the function ec_tell() (entcode.h) and to fractional 1/8th bit
6411*a58d3d2aSXin Li   precision by the function ec_tell_frac() (entcode.c).
6412*a58d3d2aSXin Li   Like all operations in the range coder, it must be implemented in a
6413*a58d3d2aSXin Li   bit-exact manner, and must produce exactly the same value returned by
6414*a58d3d2aSXin Li   the same functions in the decoder after decoding the same symbols.
6415*a58d3d2aSXin Li</t>
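<t>
A whole-bit estimate can be sketched as follows, assuming the coder keeps a
running count, nbits_total, of the bits accounted for so far and subtracts
the bits still buffered in rng, as in the decoder-side description of
ec_tell().
The name tell_bits() is hypothetical, and the fractional variant is not
shown.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to store n: 0 for n == 0, else floor(log2(n))+1. */
int ilog(uint32_t n)
{
    int bits = 0;
    while (n != 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}

/* Sketch of a whole-bit usage estimate: the total number of bits
   accounted for so far minus the bits still buffered in rng.
   nbits_total is an assumed running counter kept by the coder. */
int tell_bits(int nbits_total, uint32_t rng)
{
    assert(rng > 0);
    return nbits_total - ilog(rng);
}
```
]]></artwork>
</figure>
<t>
Because the result depends only on integer state that the encoder and decoder
update identically, both sides obtain the same value after processing the
same symbols.
</t>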
6416*a58d3d2aSXin Li</section>
6417*a58d3d2aSXin Li
6418*a58d3d2aSXin Li</section>
6419*a58d3d2aSXin Li
6420*a58d3d2aSXin Li<section title='SILK Encoder'>
6421*a58d3d2aSXin Li  <t>
6422*a58d3d2aSXin Li    In many respects the SILK encoder mirrors the SILK decoder described
6423*a58d3d2aSXin Li    in <xref target='silk_decoder_outline'/>.
6424*a58d3d2aSXin Li    Details such as the quantization and range coder tables can be found
6425*a58d3d2aSXin Li    there, while this section describes the high-level design choices that
6426*a58d3d2aSXin Li    were made.
6427*a58d3d2aSXin Li    The diagram below shows the basic modules of the SILK encoder.
6428*a58d3d2aSXin Li<figure align="center" anchor="silk_encoder_figure" title="SILK Encoder">
6429*a58d3d2aSXin Li<artwork>
6430*a58d3d2aSXin Li<![CDATA[
6431*a58d3d2aSXin Li       +----------+    +--------+    +---------+
6432*a58d3d2aSXin Li       |  Sample  |    | Stereo |    |  SILK   |
6433*a58d3d2aSXin Li------>|   Rate   |--->| Mixing |--->|  Core   |---------->
6434*a58d3d2aSXin LiInput  |Conversion|    |        |    | Encoder |  Bitstream
6435*a58d3d2aSXin Li       +----------+    +--------+    +---------+
6436*a58d3d2aSXin Li]]>
6437*a58d3d2aSXin Li</artwork>
6438*a58d3d2aSXin Li</figure>
6439*a58d3d2aSXin Li</t>
6440*a58d3d2aSXin Li
6441*a58d3d2aSXin Li<section title='Sample Rate Conversion'>
6442*a58d3d2aSXin Li<t>
6443*a58d3d2aSXin LiThe input signal's sampling rate is adjusted by a sample rate conversion
6444*a58d3d2aSXin Limodule so that it matches the SILK internal sampling rate.
6445*a58d3d2aSXin LiThe input to the sample rate converter is delayed by a number of samples
6446*a58d3d2aSXin Lidepending on the sample rate ratio, such that the overall delay is constant
6447*a58d3d2aSXin Lifor all input and output sample rates.
6448*a58d3d2aSXin Li</t>
6449*a58d3d2aSXin Li</section>
6450*a58d3d2aSXin Li
6451*a58d3d2aSXin Li<section title='Stereo Mixing'>
6452*a58d3d2aSXin Li<t>
6453*a58d3d2aSXin LiThe stereo mixer is only used for stereo input signals.
6454*a58d3d2aSXin LiIt converts a stereo left/right signal into an adaptive
6455*a58d3d2aSXin Limid/side representation.
6456*a58d3d2aSXin LiThe first step is to compute non-adaptive mid/side signals
6457*a58d3d2aSXin Lias half the sum and difference between left and right signals.
6458*a58d3d2aSXin LiThe side signal is then minimized in energy by subtracting a
6459*a58d3d2aSXin Liprediction of it based on the mid signal.
6460*a58d3d2aSXin LiThis prediction works well when the left and right signals
6461*a58d3d2aSXin Liexhibit linear dependency, for instance for an amplitude-panned
6462*a58d3d2aSXin Liinput signal.
As in the decoder, the prediction coefficients are linearly
6464*a58d3d2aSXin Liinterpolated during the first 8&nbsp;ms of the frame.
6465*a58d3d2aSXin Li  The mid signal is always encoded, whereas the residual
6466*a58d3d2aSXin Li  side signal is only encoded if it has sufficient
6467*a58d3d2aSXin Li  energy compared to the mid signal's energy.
  Otherwise, the "mid_only_flag" is set and the side signal is not encoded.
6470*a58d3d2aSXin Li</t>
6471*a58d3d2aSXin Li<t>
6472*a58d3d2aSXin LiThe predictor coefficients are coded regardless of whether
6473*a58d3d2aSXin Lithe side signal is encoded.
6474*a58d3d2aSXin LiFor each frame, two predictor coefficients are computed, one
6475*a58d3d2aSXin Lithat predicts between low-passed mid and side channels, and
6476*a58d3d2aSXin Lione that predicts between high-passed mid and side channels.
6477*a58d3d2aSXin LiThe low-pass filter is a simple three-tap filter
6478*a58d3d2aSXin Liand creates a delay of one sample.
6479*a58d3d2aSXin LiThe high-pass filtered signal is the difference between
6480*a58d3d2aSXin Lithe mid signal delayed by one sample and the low-passed
6481*a58d3d2aSXin Lisignal.  Instead of explicitly computing the high-passed
6482*a58d3d2aSXin Lisignal, it is computationally more efficient to transform
6483*a58d3d2aSXin Lithe prediction coefficients before applying them to the
6484*a58d3d2aSXin Lifiltered mid signal, as follows
6485*a58d3d2aSXin Li<figure align="center">
6486*a58d3d2aSXin Li<artwork align="center">
6487*a58d3d2aSXin Li<![CDATA[
6488*a58d3d2aSXin Lipred(n) = LP(n) * w0 + HP(n) * w1
6489*a58d3d2aSXin Li        = LP(n) * w0 + (mid(n-1) - LP(n)) * w1
6490*a58d3d2aSXin Li        = LP(n) * (w0 - w1) + mid(n-1) * w1
6491*a58d3d2aSXin Li]]>
6492*a58d3d2aSXin Li</artwork>
6493*a58d3d2aSXin Li</figure>
6494*a58d3d2aSXin Liwhere w0 and w1 are the low-pass and high-pass prediction
6495*a58d3d2aSXin Licoefficients, mid(n-1) is the mid signal delayed by one sample,
6496*a58d3d2aSXin LiLP(n) and HP(n) are the low-passed and high-passed
6497*a58d3d2aSXin Lisignals and pred(n) is the prediction signal that is subtracted
6498*a58d3d2aSXin Lifrom the side signal.
6499*a58d3d2aSXin Li</t>
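<t>
The following floating-point C sketch illustrates the non-adaptive mid/side
computation and checks that the transformed-coefficient form of the
prediction matches the direct form.
It is not the fixed-point code of the reference implementation, and the
function names are hypothetical.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Non-adaptive mid/side: half the sum and half the difference. */
void to_mid_side(double left, double right, double *mid, double *side)
{
    assert(mid && side);
    *mid = 0.5 * (left + right);
    *side = 0.5 * (left - right);
}

/* Prediction computed the direct way, with an explicit high-passed
   signal. lp is LP(n) and mid_d is mid(n-1). */
double pred_direct(double lp, double mid_d, double w0, double w1)
{
    double hp = mid_d - lp;
    return lp * w0 + hp * w1;
}

/* The same prediction with transformed coefficients, which avoids
   computing the high-passed signal. */
double pred_transformed(double lp, double mid_d, double w0, double w1)
{
    return lp * (w0 - w1) + mid_d * w1;
}
```
]]></artwork>
</figure>
<t>
The transformed form needs only the low-passed signal and the delayed mid
signal, so the high-passed signal never has to be stored.
</t>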
6500*a58d3d2aSXin Li</section>
6501*a58d3d2aSXin Li
6502*a58d3d2aSXin Li<section title='SILK Core Encoder'>
6503*a58d3d2aSXin Li<t>
6504*a58d3d2aSXin LiWhat follows is a description of the core encoder and its components.
For brevity, the core encoder is referred to simply as the encoder in
6506*a58d3d2aSXin Lithe remainder of this section. An overview of the encoder is given in
6507*a58d3d2aSXin Li<xref target="encoder_figure" />.
6508*a58d3d2aSXin Li</t>
6509*a58d3d2aSXin Li<figure align="center" anchor="encoder_figure" title="SILK Core Encoder">
6510*a58d3d2aSXin Li<artwork align="center">
6511*a58d3d2aSXin Li<![CDATA[
6512*a58d3d2aSXin Li                                                             +---+
6513*a58d3d2aSXin Li                          +--------------------------------->|   |
6514*a58d3d2aSXin Li     +---------+          |      +---------+                 |   |
6515*a58d3d2aSXin Li     |Voice    |          |      |LTP      |12               |   |
6516*a58d3d2aSXin Li +-->|Activity |--+       +----->|Scaling  |-----------+---->|   |
6517*a58d3d2aSXin Li |   |Detector |3 |       |      |Control  |<--+       |     |   |
6518*a58d3d2aSXin Li |   +---------+  |       |      +---------+   |       |     |   |
6519*a58d3d2aSXin Li |                |       |      +---------+   |       |     |   |
6520*a58d3d2aSXin Li |                |       |      |Gains    |   |       |     |   |
6521*a58d3d2aSXin Li |                |       |  +-->|Processor|---|---+---|---->| R |
6522*a58d3d2aSXin Li |                |       |  |   |         |11 |   |   |     | a |
6523*a58d3d2aSXin Li |               \/       |  |   +---------+   |   |   |     | n |
6524*a58d3d2aSXin Li |          +---------+   |  |   +---------+   |   |   |     | g |
6525*a58d3d2aSXin Li |          |Pitch    |   |  |   |LSF      |   |   |   |     | e |
6526*a58d3d2aSXin Li |       +->|Analysis |---+  |   |Quantizer|---|---|---|---->|   |
6527*a58d3d2aSXin Li |       |  |         |4  |  |   |         |8  |   |   |     | E |-->
6528*a58d3d2aSXin Li |       |  +---------+   |  |   +---------+   |   |   |     | n | 2
6529*a58d3d2aSXin Li |       |                |  |    9/\  10|     |   |   |     | c |
6530*a58d3d2aSXin Li |       |                |  |     |    \/     |   |   |     | o |
6531*a58d3d2aSXin Li |       |  +---------+   |  |   +----------+  |   |   |     | d |
6532*a58d3d2aSXin Li |       |  |Noise    |   +--|-->|Prediction|--+---|---|---->| e |
6533*a58d3d2aSXin Li |       +->|Shaping  |---|--+   |Analysis  |7 |   |   |     | r |
6534*a58d3d2aSXin Li |       |  |Analysis |5  |  |   |          |  |   |   |     |   |
6535*a58d3d2aSXin Li |       |  +---------+   |  |   +----------+  |   |   |     |   |
6536*a58d3d2aSXin Li |       |                |  |        /\       |   |   |     |   |
6537*a58d3d2aSXin Li |       |     +----------|--|--------+        |   |   |     |   |
6538*a58d3d2aSXin Li |       |     |         \/  \/               \/  \/  \/     |   |
6539*a58d3d2aSXin Li |       |     |       +---------+          +------------+   |   |
6540*a58d3d2aSXin Li |       |     |       |         |          |Noise       |   |   |
6541*a58d3d2aSXin Li-+-------+-----+------>|Prefilter|--------->|Shaping     |-->|   |
6542*a58d3d2aSXin Li1                      |         | 6        |Quantization|13 |   |
6543*a58d3d2aSXin Li                       +---------+          +------------+   +---+
6544*a58d3d2aSXin Li
6545*a58d3d2aSXin Li1:  Input speech signal
6546*a58d3d2aSXin Li2:  Range encoded bitstream
6547*a58d3d2aSXin Li3:  Voice activity estimate
6548*a58d3d2aSXin Li4:  Pitch lags (per 5 ms) and voicing decision (per 20 ms)
6549*a58d3d2aSXin Li5:  Noise shaping quantization coefficients
6550*a58d3d2aSXin Li  - Short term synthesis and analysis
6551*a58d3d2aSXin Li    noise shaping coefficients (per 5 ms)
6552*a58d3d2aSXin Li  - Long term synthesis and analysis noise
6553*a58d3d2aSXin Li    shaping coefficients (per 5 ms and for voiced speech only)
6554*a58d3d2aSXin Li  - Noise shaping tilt (per 5 ms)
6555*a58d3d2aSXin Li  - Quantizer gain/step size (per 5 ms)
6556*a58d3d2aSXin Li6:  Input signal filtered with analysis noise shaping filters
6557*a58d3d2aSXin Li7:  Short and long term prediction coefficients
6558*a58d3d2aSXin Li    LTP (per 5 ms) and LPC (per 20 ms)
6559*a58d3d2aSXin Li8:  LSF quantization indices
6560*a58d3d2aSXin Li9:  LSF coefficients
6561*a58d3d2aSXin Li10: Quantized LSF coefficients
6562*a58d3d2aSXin Li11: Processed gains, and synthesis noise shape coefficients
6563*a58d3d2aSXin Li12: LTP state scaling coefficient. Controlling error propagation
6564*a58d3d2aSXin Li   / prediction gain trade-off
6565*a58d3d2aSXin Li13: Quantized signal
6566*a58d3d2aSXin Li]]>
6567*a58d3d2aSXin Li</artwork>
6568*a58d3d2aSXin Li</figure>
6569*a58d3d2aSXin Li
6570*a58d3d2aSXin Li<section title='Voice Activity Detection'>
6571*a58d3d2aSXin Li<t>
6572*a58d3d2aSXin LiThe input signal is processed by a Voice Activity Detector (VAD) to produce
6573*a58d3d2aSXin Lia measure of voice activity, spectral tilt, and signal-to-noise estimates for
6574*a58d3d2aSXin Lieach frame. The VAD uses a sequence of half-band filterbanks to split the
6575*a58d3d2aSXin Lisignal into four subbands: 0...Fs/16, Fs/16...Fs/8, Fs/8...Fs/4, and
6576*a58d3d2aSXin LiFs/4...Fs/2, where Fs is the sampling frequency (8, 12, 16, or 24&nbsp;kHz).
The lowest subband, from 0 to Fs/16, is high-pass filtered with a first-order
6578*a58d3d2aSXin Limoving average (MA) filter (with transfer function H(z) = 1-z**(-1)) to
6579*a58d3d2aSXin Lireduce the energy at the lowest frequencies. For each frame, the signal
6580*a58d3d2aSXin Lienergy per subband is computed.
6581*a58d3d2aSXin LiIn each subband, a noise level estimator tracks the background noise level
6582*a58d3d2aSXin Liand a Signal-to-Noise Ratio (SNR) value is computed as the logarithm of the
6583*a58d3d2aSXin Liratio of energy to noise level.
6584*a58d3d2aSXin LiUsing these intermediate variables, the following parameters are calculated
6585*a58d3d2aSXin Lifor use in other SILK modules:
6586*a58d3d2aSXin Li<list style="symbols">
6587*a58d3d2aSXin Li<t>
6588*a58d3d2aSXin LiAverage SNR. The average of the subband SNR values.
6589*a58d3d2aSXin Li</t>
6590*a58d3d2aSXin Li
6591*a58d3d2aSXin Li<t>
6592*a58d3d2aSXin LiSmoothed subband SNRs. Temporally smoothed subband SNR values.
6593*a58d3d2aSXin Li</t>
6594*a58d3d2aSXin Li
6595*a58d3d2aSXin Li<t>
6596*a58d3d2aSXin LiSpeech activity level. Based on the average SNR and a weighted average of the
6597*a58d3d2aSXin Lisubband energies.
6598*a58d3d2aSXin Li</t>
6599*a58d3d2aSXin Li
6600*a58d3d2aSXin Li<t>
6601*a58d3d2aSXin LiSpectral tilt. A weighted average of the subband SNRs, with positive weights
6602*a58d3d2aSXin Lifor the low subbands and negative weights for the high subbands.
6603*a58d3d2aSXin Li</t>
6604*a58d3d2aSXin Li</list>
6605*a58d3d2aSXin Li</t>
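<t>
As an illustration of the first-order MA high-pass filter and the per-frame
energy computation, consider the following floating-point sketch.
The function names are hypothetical, and the subband split, noise tracking,
and weighting steps are not shown.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* First-order MA high-pass, H(z) = 1 - z**(-1): y(n) = x(n) - x(n-1).
   *state carries x(n-1) across calls and is 0 at the start. */
void ma_highpass(const double *x, double *y, size_t n, double *state)
{
    size_t i;
    assert(x && y && state);
    for (i = 0; i < n; i++) {
        y[i] = x[i] - *state;
        *state = x[i];
    }
}

/* Energy of one subband signal over a frame. */
double frame_energy(const double *x, size_t n)
{
    double e = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        e += x[i] * x[i];
    return e;
}
```
]]></artwork>
</figure>
<t>
A constant input is removed after the first sample, which shows how the
filter reduces the energy at the lowest frequencies.
</t>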
6606*a58d3d2aSXin Li</section>
6607*a58d3d2aSXin Li
6608*a58d3d2aSXin Li<section title='Pitch Analysis' anchor='pitch_estimator_overview_section'>
6609*a58d3d2aSXin Li<t>
6610*a58d3d2aSXin LiThe input signal is processed by the open loop pitch estimator shown in
6611*a58d3d2aSXin Li<xref target='pitch_estimator_figure' />.
6612*a58d3d2aSXin Li<figure align="center" anchor="pitch_estimator_figure"
6613*a58d3d2aSXin Li title="Block diagram of the pitch estimator">
6614*a58d3d2aSXin Li<artwork align="center">
6615*a58d3d2aSXin Li<![CDATA[
6616*a58d3d2aSXin Li                                 +--------+  +----------+
6617*a58d3d2aSXin Li                                 |2 x Down|  |Time-     |
6618*a58d3d2aSXin Li                              +->|sampling|->|Correlator|     |
6619*a58d3d2aSXin Li                              |  |        |  |          |     |4
6620*a58d3d2aSXin Li                              |  +--------+  +----------+    \/
6621*a58d3d2aSXin Li                              |                    | 2    +-------+
6622*a58d3d2aSXin Li                              |                    |  +-->|Speech |5
6623*a58d3d2aSXin Li    +---------+    +--------+ |                   \/  |   |Type   |->
6624*a58d3d2aSXin Li    |LPC      |    |Down    | |              +----------+ |       |
6625*a58d3d2aSXin Li +->|Analysis | +->|sample  |-+------------->|Time-     | +-------+
6626*a58d3d2aSXin Li |  |         | |  |to 8 kHz|                |Correlator|----------->
6627*a58d3d2aSXin Li |  +---------+ |  +--------+                |__________|          6
6628*a58d3d2aSXin Li |       |      |                                  |3
6629*a58d3d2aSXin Li |      \/      |                                 \/
6630*a58d3d2aSXin Li |  +---------+ |                            +----------+
6631*a58d3d2aSXin Li |  |Whitening| |                            |Time-     |
6632*a58d3d2aSXin Li-+->|Filter   |-+--------------------------->|Correlator|----------->
6633*a58d3d2aSXin Li1   |         |                              |          |          7
6634*a58d3d2aSXin Li    +---------+                              +----------+
6635*a58d3d2aSXin Li
6636*a58d3d2aSXin Li1: Input signal
6637*a58d3d2aSXin Li2: Lag candidates from stage 1
6638*a58d3d2aSXin Li3: Lag candidates from stage 2
6639*a58d3d2aSXin Li4: Correlation threshold
6640*a58d3d2aSXin Li5: Voiced/unvoiced flag
6641*a58d3d2aSXin Li6: Pitch correlation
6642*a58d3d2aSXin Li7: Pitch lags
6643*a58d3d2aSXin Li]]>
6644*a58d3d2aSXin Li</artwork>
6645*a58d3d2aSXin Li</figure>
6646*a58d3d2aSXin LiThe pitch analysis finds a binary voiced/unvoiced classification, and, for
6647*a58d3d2aSXin Liframes classified as voiced, four pitch lags per frame - one for each
6648*a58d3d2aSXin Li5&nbsp;ms subframe - and a pitch correlation indicating the periodicity of
6649*a58d3d2aSXin Lithe signal.
6650*a58d3d2aSXin LiThe input is first whitened using a Linear Prediction (LP) whitening filter,
6651*a58d3d2aSXin Liwhere the coefficients are computed through standard Linear Prediction Coding
6652*a58d3d2aSXin Li(LPC) analysis. The order of the whitening filter is 16 for best results, but
6653*a58d3d2aSXin Liis reduced to 12 for medium complexity and 8 for low complexity modes.
6654*a58d3d2aSXin LiThe whitened signal is analyzed to find pitch lags for which the time
6655*a58d3d2aSXin Licorrelation is high.
To reduce complexity, the analysis is carried out in three stages:
6657*a58d3d2aSXin Li<list style="symbols">
6658*a58d3d2aSXin Li<t>In the first stage, the whitened signal is downsampled to 4&nbsp;kHz
6659*a58d3d2aSXin Li(from 8&nbsp;kHz) and the current frame is correlated to a signal delayed
6660*a58d3d2aSXin Liby a range of lags, starting from a shortest lag corresponding to
6661*a58d3d2aSXin Li500&nbsp;Hz, to a longest lag corresponding to 56&nbsp;Hz.</t>
6662*a58d3d2aSXin Li
6663*a58d3d2aSXin Li<t>
6664*a58d3d2aSXin LiThe second stage operates on an 8&nbsp;kHz signal (downsampled from 12, 16,
6665*a58d3d2aSXin Lior 24&nbsp;kHz) and measures time correlations only near the lags
6666*a58d3d2aSXin Licorresponding to those that had sufficiently high correlations in the first
6667*a58d3d2aSXin Listage. The resulting correlations are adjusted for a small bias towards
6668*a58d3d2aSXin Lishort lags to avoid ending up with a multiple of the true pitch lag.
6669*a58d3d2aSXin LiThe highest adjusted correlation is compared to a threshold depending on:
6670*a58d3d2aSXin Li<list style="symbols">
6671*a58d3d2aSXin Li<t>
6672*a58d3d2aSXin LiWhether the previous frame was classified as voiced
6673*a58d3d2aSXin Li</t>
6674*a58d3d2aSXin Li<t>
6675*a58d3d2aSXin LiThe speech activity level
6676*a58d3d2aSXin Li</t>
6677*a58d3d2aSXin Li<t>
6678*a58d3d2aSXin LiThe spectral tilt.
6679*a58d3d2aSXin Li</t>
6680*a58d3d2aSXin Li</list>
6681*a58d3d2aSXin LiIf the threshold is exceeded, the current frame is classified as voiced and
6682*a58d3d2aSXin Lithe lag with the highest adjusted correlation is stored for a final pitch
6683*a58d3d2aSXin Lianalysis of the highest precision in the third stage.
6684*a58d3d2aSXin Li</t>
6685*a58d3d2aSXin Li<t>
6686*a58d3d2aSXin LiThe last stage operates directly on the whitened input signal to compute time
6687*a58d3d2aSXin Licorrelations for each of the four subframes independently in a narrow range
6688*a58d3d2aSXin Liaround the lag with highest correlation from the second stage.
6689*a58d3d2aSXin Li</t>
6690*a58d3d2aSXin Li</list>
6691*a58d3d2aSXin Li</t>
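<t>
A single time-correlation stage can be sketched as a search over a range of
lags.
The code below is an illustration only: best_lag() is a hypothetical name,
the score is the squared normalized correlation (which avoids a square
root), and ties are resolved in favor of the shorter lag, which is a
simplification of the bias adjustment described above.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Hypothetical sketch of one time-correlation stage. x holds hist
   samples of history followed by an n-sample frame. Returns the lag in
   [min_lag, max_lag] whose delayed signal correlates best with the
   frame; ties go to the shorter lag. */
int best_lag(const double *x, int hist, int n, int min_lag, int max_lag)
{
    int lag, i;
    int best = min_lag;
    double best_score = -1.0;
    assert(min_lag > 0 && min_lag <= max_lag && max_lag <= hist && n > 0);
    for (lag = min_lag; lag <= max_lag; lag++) {
        double xy = 0.0, xx = 0.0, yy = 0.0, score = 0.0;
        for (i = 0; i < n; i++) {
            double a = x[hist + i];        /* current frame */
            double b = x[hist + i - lag];  /* delayed signal */
            xy += a * b;
            xx += a * a;
            yy += b * b;
        }
        /* Squared normalized correlation; negative correlations score 0. */
        if (xy > 0.0 && xx > 0.0 && yy > 0.0)
            score = (xy * xy) / (xx * yy);
        if (score > best_score) {
            best_score = score;
            best = lag;
        }
    }
    return best;
}
```
]]></artwork>
</figure>
<t>
At a 4&nbsp;kHz sampling rate, a lag range of 8 to 71 samples corresponds
approximately to the 500&nbsp;Hz to 56&nbsp;Hz range searched in the first
stage.
</t>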
6692*a58d3d2aSXin Li</section>
6693*a58d3d2aSXin Li
6694*a58d3d2aSXin Li<section title='Noise Shaping Analysis' anchor='noise_shaping_analysis_overview_section'>
6695*a58d3d2aSXin Li<t>
6696*a58d3d2aSXin LiThe noise shaping analysis finds gains and filter coefficients used in the
6697*a58d3d2aSXin Liprefilter and noise shaping quantizer. These parameters are chosen such that
6698*a58d3d2aSXin Lithey will fulfill several requirements:
6699*a58d3d2aSXin Li<list style="symbols">
6700*a58d3d2aSXin Li<t>
6701*a58d3d2aSXin LiBalancing quantization noise and bitrate.
6702*a58d3d2aSXin LiThe quantization gains determine the step size between reconstruction levels
6703*a58d3d2aSXin Liof the excitation signal. Therefore, increasing the quantization gain
6704*a58d3d2aSXin Liamplifies quantization noise, but also reduces the bitrate by lowering
6705*a58d3d2aSXin Lithe entropy of the quantization indices.
6706*a58d3d2aSXin Li</t>
6707*a58d3d2aSXin Li<t>
Spectral shaping of the quantization noise. The noise shaping quantizer is
6709*a58d3d2aSXin Licapable of reducing quantization noise in some parts of the spectrum at the
6710*a58d3d2aSXin Licost of increased noise in other parts without substantially changing the
6711*a58d3d2aSXin Libitrate.
6712*a58d3d2aSXin LiBy shaping the noise such that it follows the signal spectrum, it becomes
6713*a58d3d2aSXin Liless audible. In practice, best results are obtained by making the shape
6714*a58d3d2aSXin Liof the noise spectrum slightly flatter than the signal spectrum.
6715*a58d3d2aSXin Li</t>
6716*a58d3d2aSXin Li<t>
De-emphasizing spectral valleys. By using different coefficients in the
6718*a58d3d2aSXin Lianalysis and synthesis part of the prefilter and noise shaping quantizer,
6719*a58d3d2aSXin Lithe levels of the spectral valleys can be decreased relative to the levels
6720*a58d3d2aSXin Liof the spectral peaks such as speech formants and harmonics.
6721*a58d3d2aSXin LiThis reduces the entropy of the signal, which is the difference between the
6722*a58d3d2aSXin Licoded signal and the quantization noise, thus lowering the bitrate.
6723*a58d3d2aSXin Li</t>
6724*a58d3d2aSXin Li<t>
6725*a58d3d2aSXin LiMatching the levels of the decoded speech formants to the levels of the
original speech formants. An adjustment gain and a first order tilt
6727*a58d3d2aSXin Licoefficient are computed to compensate for the effect of the noise
6728*a58d3d2aSXin Lishaping quantization on the level and spectral tilt.
6729*a58d3d2aSXin Li</t>
6730*a58d3d2aSXin Li</list>
6731*a58d3d2aSXin Li</t>
6732*a58d3d2aSXin Li<t>
6733*a58d3d2aSXin Li<figure align="center" anchor="noise_shape_analysis_spectra_figure"
6734*a58d3d2aSXin Li title="Noise shaping and spectral de-emphasis illustration">
6735*a58d3d2aSXin Li<artwork align="center">
6736*a58d3d2aSXin Li<![CDATA[
6737*a58d3d2aSXin Li  / \   ___
6738*a58d3d2aSXin Li   |   // \\
6739*a58d3d2aSXin Li   |  //   \\     ____
6740*a58d3d2aSXin Li   |_//     \\___//  \\         ____
6741*a58d3d2aSXin Li   | /  ___  \   /    \\       //  \\
6742*a58d3d2aSXin Li P |/  /   \  \_/      \\_____//    \\
6743*a58d3d2aSXin Li o |  /     \     ____  \     /      \\
6744*a58d3d2aSXin Li w | /       \___/    \  \___/  ____  \\___ 1
6745*a58d3d2aSXin Li e |/                  \       /    \  \
6746*a58d3d2aSXin Li r |                    \_____/      \  \__ 2
6747*a58d3d2aSXin Li   |                                  \
6748*a58d3d2aSXin Li   |                                   \___ 3
6749*a58d3d2aSXin Li   |
6750*a58d3d2aSXin Li   +---------------------------------------->
6751*a58d3d2aSXin Li                    Frequency
6752*a58d3d2aSXin Li
6753*a58d3d2aSXin Li1: Input signal spectrum
6754*a58d3d2aSXin Li2: De-emphasized and level matched spectrum
6755*a58d3d2aSXin Li3: Quantization noise spectrum
6756*a58d3d2aSXin Li]]>
6757*a58d3d2aSXin Li</artwork>
6758*a58d3d2aSXin Li</figure>
6759*a58d3d2aSXin Li<xref target='noise_shape_analysis_spectra_figure' /> shows an example of an
6760*a58d3d2aSXin Liinput signal spectrum (1).
6761*a58d3d2aSXin LiAfter de-emphasis and level matching, the spectrum has deeper valleys (2).
6762*a58d3d2aSXin LiThe quantization noise spectrum (3) more or less follows the input signal
6763*a58d3d2aSXin Lispectrum, while having slightly less pronounced peaks.
6764*a58d3d2aSXin LiThe entropy, which provides a lower bound on the bitrate for encoding the
6765*a58d3d2aSXin Liexcitation signal, is proportional to the area between the de-emphasized
spectrum (2) and the quantization noise spectrum (3). Without de-emphasis,
the entropy would be proportional to the area between the input spectrum (1)
and the quantization noise spectrum (3), which is clearly larger.
6769*a58d3d2aSXin Li</t>
6770*a58d3d2aSXin Li
6771*a58d3d2aSXin Li<t>
6772*a58d3d2aSXin LiThe transformation from input signal to de-emphasized signal can be
6773*a58d3d2aSXin Lidescribed as a filtering operation with a filter
6774*a58d3d2aSXin Li<figure align="center">
6775*a58d3d2aSXin Li<artwork align="center">
6776*a58d3d2aSXin Li<![CDATA[
6777*a58d3d2aSXin Li                           -1    Wana(z)
6778*a58d3d2aSXin LiH(z) = G * ( 1 - c_tilt * z  ) * -------
6779*a58d3d2aSXin Li                                 Wsyn(z),
6780*a58d3d2aSXin Li]]>
6781*a58d3d2aSXin Li</artwork>
6782*a58d3d2aSXin Li</figure>
6783*a58d3d2aSXin Lihaving an adjustment gain G, a first order tilt adjustment filter with
6784*a58d3d2aSXin Litilt coefficient c_tilt, and where
6785*a58d3d2aSXin Li<figure align="center">
6786*a58d3d2aSXin Li<artwork align="center">
6787*a58d3d2aSXin Li<![CDATA[
               16                            d
               __             -k        -L  __            -k
Wana(z) = (1 - \  a_ana(k) * z  )*(1 - z  * \ b_ana(k) * z  ),
               /_                           /_
               k=1                          k=-d
6793*a58d3d2aSXin Li]]>
6794*a58d3d2aSXin Li</artwork>
6795*a58d3d2aSXin Li</figure>
6796*a58d3d2aSXin Liis the analysis part of the de-emphasis filter, consisting of the short-term
6797*a58d3d2aSXin Lishaping filter with coefficients a_ana(k), and the long-term shaping filter
6798*a58d3d2aSXin Liwith coefficients b_ana(k) and pitch lag L.
6799*a58d3d2aSXin LiThe parameter d determines the number of long-term shaping filter taps.
6800*a58d3d2aSXin Li</t>
6801*a58d3d2aSXin Li
6802*a58d3d2aSXin Li<t>
6803*a58d3d2aSXin LiSimilarly, but without the tilt adjustment, the synthesis part can be written as
6804*a58d3d2aSXin Li<figure align="center">
6805*a58d3d2aSXin Li<artwork align="center">
6806*a58d3d2aSXin Li<![CDATA[
               16                            d
               __             -k        -L  __            -k
Wsyn(z) = (1 - \  a_syn(k) * z  )*(1 - z  * \ b_syn(k) * z  ).
               /_                           /_
               k=1                          k=-d
6812*a58d3d2aSXin Li            ]]>
6813*a58d3d2aSXin Li</artwork>
6814*a58d3d2aSXin Li</figure>
6815*a58d3d2aSXin Li</t>
6816*a58d3d2aSXin Li<t>
6817*a58d3d2aSXin LiAll noise shaping parameters are computed and applied per subframe of 5&nbsp;ms.
6818*a58d3d2aSXin LiFirst, an LPC analysis is performed on a windowed signal block of 15&nbsp;ms.
6819*a58d3d2aSXin LiThe signal block has a look-ahead of 5&nbsp;ms relative to the current subframe,
and the window is an asymmetric sine window. The LPC analysis is done with the
autocorrelation method, with an order between 8 (in the lowest-complexity mode)
and 16 (for best quality).
6823*a58d3d2aSXin Li</t>
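<t>
The text does not state how the normal equations of the autocorrelation
method are solved.
One common choice is the Levinson-Durbin recursion, sketched below in
floating point.
The function name and the MAX_LPC_ORDER constant are hypothetical, and the
windowing and autocorrelation steps are assumed to have been done by the
caller.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

#define MAX_LPC_ORDER 16

/* Levinson-Durbin recursion. r[0..order] are autocorrelation values.
   On return, a[1..order] are predictor coefficients for
   x_hat(n) = sum over k of a[k] * x(n-k), and a[0] is unused.
   Returns the residual (prediction error) energy. This sketch does not
   guard against a residual energy of exactly zero. */
double lpc_from_autocorr(const double *r, int order, double *a)
{
    double tmp[MAX_LPC_ORDER + 1];
    double err;
    int i, j;
    assert(order >= 1 && order <= MAX_LPC_ORDER && r[0] > 0.0);
    err = r[0];
    for (i = 1; i <= order; i++) {
        double acc = r[i];
        double k;
        for (j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        k = acc / err;  /* reflection coefficient */
        for (j = 1; j < i; j++)
            tmp[j] = a[j] - k * a[i - j];
        for (j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= 1.0 - k * k;
    }
    return err;
}
```
]]></artwork>
</figure>
<t>
The returned residual energy is the kind of quantity from which a
quantization gain can be derived by taking its square root.
</t>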
6824*a58d3d2aSXin Li<t>
Optionally, the LPC analysis and noise shaping filters are warped by replacing
the delay elements with first-order allpass filters.
This increases the frequency resolution at low frequencies and reduces it at
high ones, which better matches the human auditory system and improves
quality.
The warped analysis and filtering come at a cost in complexity
and are therefore only done in higher complexity modes.
6832*a58d3d2aSXin Li</t>
6833*a58d3d2aSXin Li<t>
6834*a58d3d2aSXin LiThe quantization gain is found by taking the square root of the residual energy
6835*a58d3d2aSXin Lifrom the LPC analysis and multiplying it by a value inversely proportional
6836*a58d3d2aSXin Lito the coding quality control parameter and the pitch correlation.
6837*a58d3d2aSXin Li</t>
6838*a58d3d2aSXin Li<t>
Next, the two sets of short-term noise shaping coefficients a_ana(k) and
6840*a58d3d2aSXin Lia_syn(k) are obtained by applying different amounts of bandwidth expansion to the
6841*a58d3d2aSXin Licoefficients found in the LPC analysis.
6842*a58d3d2aSXin LiThis bandwidth expansion moves the roots of the LPC polynomial towards the
6843*a58d3d2aSXin Liorigin, using the formulas
6844*a58d3d2aSXin Li<figure align="center">
6845*a58d3d2aSXin Li<artwork align="center">
6846*a58d3d2aSXin Li<![CDATA[
6847*a58d3d2aSXin Li                      k
6848*a58d3d2aSXin Li a_ana(k) = a(k)*g_ana , and
6849*a58d3d2aSXin Li
6850*a58d3d2aSXin Li                      k
6851*a58d3d2aSXin Li a_syn(k) = a(k)*g_syn ,
6852*a58d3d2aSXin Li]]>
6853*a58d3d2aSXin Li</artwork>
6854*a58d3d2aSXin Li</figure>
6855*a58d3d2aSXin Liwhere a(k) is the k'th LPC coefficient, and the bandwidth expansion factors
6856*a58d3d2aSXin Lig_ana and g_syn are calculated as
6857*a58d3d2aSXin Li<figure align="center">
6858*a58d3d2aSXin Li<artwork align="center">
6859*a58d3d2aSXin Li<![CDATA[
6860*a58d3d2aSXin Lig_ana = 0.95 - 0.01*C, and
6861*a58d3d2aSXin Li
6862*a58d3d2aSXin Lig_syn = 0.95 + 0.01*C,
6863*a58d3d2aSXin Li]]>
6864*a58d3d2aSXin Li</artwork>
6865*a58d3d2aSXin Li</figure>
6866*a58d3d2aSXin Liwhere C is the coding quality control parameter between 0 and 1.
6867*a58d3d2aSXin LiApplying more bandwidth expansion to the analysis part than to the synthesis
6868*a58d3d2aSXin Lipart gives the desired de-emphasis of spectral valleys in between formants.
6869*a58d3d2aSXin Li</t>
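<t>
The two bandwidth expansions above can be sketched as follows. This is a
minimal Python illustration, not the reference implementation: the function
names are hypothetical, and the list a is assumed to hold a(1)..a(K), so that
its first element is scaled by g to the power 1.
</t>
<figure>
<artwork><![CDATA[
```python
def bandwidth_expand(a, g):
    """Scale the k'th LPC coefficient (k = 1..K) by g**k."""
    return [a_k * g ** k for k, a_k in enumerate(a, start=1)]


def shaping_coefficients(a, C):
    """Return (a_ana, a_syn) for a quality control parameter C in [0, 1].

    Assumption: a holds a(1)..a(K); a(0) is not included.
    """
    g_ana = 0.95 - 0.01 * C
    g_syn = 0.95 + 0.01 * C
    return bandwidth_expand(a, g_ana), bandwidth_expand(a, g_syn)
```
]]></artwork>
</figure>
<t>
For any C above 0, g_ana is smaller than g_syn, so the analysis coefficients
are pulled further towards the origin than the synthesis coefficients.
</t>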
6870*a58d3d2aSXin Li
6871*a58d3d2aSXin Li<t>
6872*a58d3d2aSXin LiThe long-term shaping is applied only during voiced frames.
6873*a58d3d2aSXin LiIt uses three filter taps, described by
6874*a58d3d2aSXin Li<figure align="center">
6875*a58d3d2aSXin Li<artwork align="center">
6876*a58d3d2aSXin Li  <![CDATA[
6877*a58d3d2aSXin Lib_ana = F_ana * [0.25, 0.5, 0.25], and
6878*a58d3d2aSXin Li
6879*a58d3d2aSXin Lib_syn = F_syn * [0.25, 0.5, 0.25].
6880*a58d3d2aSXin Li]]>
6881*a58d3d2aSXin Li</artwork>
6882*a58d3d2aSXin Li</figure>
6883*a58d3d2aSXin LiFor unvoiced frames these coefficients are set to 0. The multiplication factors
6884*a58d3d2aSXin LiF_ana and F_syn are chosen between 0 and 1, depending on the coding quality
6885*a58d3d2aSXin Licontrol parameter, as well as the calculated pitch correlation and smoothed
6886*a58d3d2aSXin Lisubband SNR of the lowest subband. By having F_ana less than F_syn,
6887*a58d3d2aSXin Lithe pitch harmonics are emphasized relative to the valleys in between the
6888*a58d3d2aSXin Liharmonics.
6889*a58d3d2aSXin Li</t>
6890*a58d3d2aSXin Li
6891*a58d3d2aSXin Li<t>
For unvoiced frames, the tilt coefficient c_tilt is chosen as
6893*a58d3d2aSXin Li<figure align="center">
6894*a58d3d2aSXin Li<artwork align="center">
6895*a58d3d2aSXin Li<![CDATA[
6896*a58d3d2aSXin Lic_tilt = 0.25,
6897*a58d3d2aSXin Li]]>
6898*a58d3d2aSXin Li</artwork>
6899*a58d3d2aSXin Li</figure>
6900*a58d3d2aSXin Liand as
6901*a58d3d2aSXin Li<figure align="center">
6902*a58d3d2aSXin Li<artwork align="center">
6903*a58d3d2aSXin Li<![CDATA[
6904*a58d3d2aSXin Lic_tilt = 0.25 + 0.2625 * V
6905*a58d3d2aSXin Li]]>
6906*a58d3d2aSXin Li</artwork>
6907*a58d3d2aSXin Li</figure>
6908*a58d3d2aSXin Lifor voiced frames, where V is the voice activity level between 0 and 1.
6909*a58d3d2aSXin Li</t>
6910*a58d3d2aSXin Li<t>
6911*a58d3d2aSXin LiThe adjustment gain G serves to correct any level mismatch between the original
6912*a58d3d2aSXin Liand decoded signals that might arise from the noise shaping and de-emphasis.
This gain is computed as the ratio of the prediction gains of the short-term
6914*a58d3d2aSXin Lianalysis and synthesis filter coefficients. The prediction gain of an LPC
6915*a58d3d2aSXin Lisynthesis filter is the square root of the output energy when the filter is
6916*a58d3d2aSXin Liexcited by a unit-energy impulse on the input.
6917*a58d3d2aSXin LiAn efficient way to compute the prediction gain is by first computing the
6918*a58d3d2aSXin Lireflection coefficients from the LPC coefficients through the step-down
6919*a58d3d2aSXin Lialgorithm, and extracting the prediction gain from the reflection coefficients
6920*a58d3d2aSXin Lias
6921*a58d3d2aSXin Li<figure align="center">
6922*a58d3d2aSXin Li<artwork align="center">
6923*a58d3d2aSXin Li<![CDATA[
6924*a58d3d2aSXin Li               K
6925*a58d3d2aSXin Li              ___          2  -0.5
6926*a58d3d2aSXin Li predGain = ( | | 1 - (r_k)  )    ,
6927*a58d3d2aSXin Li              k=1
6928*a58d3d2aSXin Li]]>
6929*a58d3d2aSXin Li</artwork>
6930*a58d3d2aSXin Li</figure>
6931*a58d3d2aSXin Liwhere r_k is the k'th reflection coefficient.
6932*a58d3d2aSXin Li</t>
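<t>
As an illustration of the prediction gain computation, the following Python
sketch runs a floating-point step-down recursion and then evaluates the
product formula above. It is not the fixed-point reference code: the function
names are hypothetical, and the sign convention A(z) = 1 + a(1)*z^-1 + ... +
a(K)*z^-K is an assumption (the opposite convention only flips the sign of
the reflection coefficients, which does not change the result).
</t>
<figure>
<artwork><![CDATA[
```python
def reflection_coefficients(a):
    """Step-down recursion from LPC to reflection coefficients.

    Assumption: a[i] multiplies z**-(i+1) in A(z) = 1 + sum(...).
    """
    cur = list(a)
    refl = [0.0] * len(cur)
    for m in range(len(cur), 0, -1):
        k = cur[m - 1]              # last coefficient of the order-m filter
        if abs(k) >= 1.0:
            raise ValueError("filter is not stable")
        refl[m - 1] = k
        # Coefficients of the order-(m-1) filter.
        cur = [(cur[i] - k * cur[m - 2 - i]) / (1.0 - k * k)
               for i in range(m - 1)]
    return refl


def prediction_gain(a):
    """predGain = (prod_k (1 - r_k**2)) ** -0.5"""
    prod = 1.0
    for r in reflection_coefficients(a):
        prod *= 1.0 - r * r
    return prod ** -0.5
```
]]></artwork>
</figure>
<t>
For a first-order filter with a(1) = -0.5, the synthesis filter has impulse
response 0.5^n with energy 4/3, and the sketch returns the square root of
that value.
</t>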
6933*a58d3d2aSXin Li
6934*a58d3d2aSXin Li<t>
6935*a58d3d2aSXin LiInitial values for the quantization gains are computed as the square-root of
6936*a58d3d2aSXin Lithe residual energy of the LPC analysis, adjusted by the coding quality control
6937*a58d3d2aSXin Liparameter.
6938*a58d3d2aSXin LiThese quantization gains are later adjusted based on the results of the
6939*a58d3d2aSXin Liprediction analysis.
6940*a58d3d2aSXin Li</t>
6941*a58d3d2aSXin Li</section>
6942*a58d3d2aSXin Li
6943*a58d3d2aSXin Li<section title='Prediction Analysis' anchor='pred_ana_overview_section'>
6944*a58d3d2aSXin Li<t>
6945*a58d3d2aSXin LiThe prediction analysis is performed in one of two ways depending on how
6946*a58d3d2aSXin Lithe pitch estimator classified the frame.
6947*a58d3d2aSXin LiThe processing for voiced and unvoiced speech is described in
6948*a58d3d2aSXin Li<xref target='pred_ana_voiced_overview_section' /> and
6949*a58d3d2aSXin Li  <xref target='pred_ana_unvoiced_overview_section' />, respectively.
6950*a58d3d2aSXin Li  Inputs to this function include the pre-whitened signal from the
6951*a58d3d2aSXin Li  pitch estimator (see <xref target='pitch_estimator_overview_section'/>).
6952*a58d3d2aSXin Li</t>
6953*a58d3d2aSXin Li
6954*a58d3d2aSXin Li<section title='Voiced Speech' anchor='pred_ana_voiced_overview_section'>
6955*a58d3d2aSXin Li<t>
6956*a58d3d2aSXin Li  For a frame of voiced speech the pitch pulses will remain dominant in the
6957*a58d3d2aSXin Li  pre-whitened input signal.
6958*a58d3d2aSXin Li  Further whitening is desirable as it leads to higher quality at the same
6959*a58d3d2aSXin Li  available bitrate.
6960*a58d3d2aSXin Li  To achieve this, a Long-Term Prediction (LTP) analysis is carried out to
6961*a58d3d2aSXin Li  estimate the coefficients of a fifth-order LTP filter for each of four
6962*a58d3d2aSXin Li  subframes.
6963*a58d3d2aSXin Li  The LTP coefficients are quantized using the method described in
6964*a58d3d2aSXin Li  <xref target='ltp_quantizer_overview_section'/>, and the quantized LTP
6965*a58d3d2aSXin Li  coefficients are used to compute the LTP residual signal.
6966*a58d3d2aSXin Li  This LTP residual signal is the input to an LPC analysis where the LPC coefficients are
6967*a58d3d2aSXin Li  estimated using Burg's method <xref target="Burg"/>, such that the residual energy is minimized.
6968*a58d3d2aSXin Li  The estimated LPC coefficients are converted to a Line Spectral Frequency (LSF) vector
6969*a58d3d2aSXin Li  and quantized as described in <xref target='lsf_quantizer_overview_section'/>.
6970*a58d3d2aSXin LiAfter quantization, the quantized LSF vector is converted back to LPC
6971*a58d3d2aSXin Licoefficients using the full procedure in <xref target="silk_nlsfs"/>.
6972*a58d3d2aSXin LiBy using quantized LTP coefficients and LPC coefficients derived from the
6973*a58d3d2aSXin Liquantized LSF coefficients, the encoder remains fully synchronized with the
6974*a58d3d2aSXin Lidecoder.
6975*a58d3d2aSXin LiThe quantized LPC and LTP coefficients are also used to filter the input
6976*a58d3d2aSXin Lisignal and measure residual energy for each of the four subframes.
6977*a58d3d2aSXin Li</t>
6978*a58d3d2aSXin Li</section>
6979*a58d3d2aSXin Li<section title='Unvoiced Speech' anchor='pred_ana_unvoiced_overview_section'>
6980*a58d3d2aSXin Li<t>
6981*a58d3d2aSXin LiFor a speech signal that has been classified as unvoiced, there is no need
6982*a58d3d2aSXin Lifor LTP filtering, as it has already been determined that the pre-whitened
6983*a58d3d2aSXin Liinput signal is not periodic enough within the allowed pitch period range
6984*a58d3d2aSXin Lifor LTP analysis to be worth the cost in terms of complexity and bitrate.
6985*a58d3d2aSXin LiThe pre-whitened input signal is therefore discarded, and instead the input
6986*a58d3d2aSXin Lisignal is used for LPC analysis using Burg's method.
6987*a58d3d2aSXin LiThe resulting LPC coefficients are converted to an LSF vector and quantized
as described in <xref target='lsf_quantizer_overview_section'/>.
6989*a58d3d2aSXin LiThey are then transformed back to obtain quantized LPC coefficients, which
6990*a58d3d2aSXin Liare then used to filter the input signal and measure residual energy for
6991*a58d3d2aSXin Lieach of the four subframes.
6992*a58d3d2aSXin Li</t>
6993*a58d3d2aSXin Li<section title="Burg's Method">
6994*a58d3d2aSXin Li<t>
6995*a58d3d2aSXin LiThe main purpose of linear prediction in SILK is to reduce the bitrate by
6996*a58d3d2aSXin Liminimizing the residual energy.
6997*a58d3d2aSXin LiAt least at high bitrates, perceptual aspects are handled
6998*a58d3d2aSXin Liindependently by the noise shaping filter.
6999*a58d3d2aSXin LiBurg's method is used because it provides higher prediction gain
7000*a58d3d2aSXin Lithan the autocorrelation method and, unlike the covariance method,
7001*a58d3d2aSXin Liproduces stable filters (assuming numerical errors don't spoil
7002*a58d3d2aSXin Lithat). SILK's implementation of Burg's method is also computationally
7003*a58d3d2aSXin Lifaster than the autocovariance method.
7004*a58d3d2aSXin LiThe implementation of Burg's method differs from traditional
7005*a58d3d2aSXin Liimplementations in two aspects.
7006*a58d3d2aSXin LiThe first difference is that it
7007*a58d3d2aSXin Lioperates on autocorrelations, similar to the Schur algorithm <xref target="Schur"/>, but
7008*a58d3d2aSXin Liwith a simple update to the autocorrelations after finding each
7009*a58d3d2aSXin Lireflection coefficient to make the result identical to Burg's method.
7010*a58d3d2aSXin LiThis brings down the complexity of Burg's method to near that of
7011*a58d3d2aSXin Lithe autocorrelation method.
7012*a58d3d2aSXin LiThe second difference is that the signal in each subframe is scaled
7013*a58d3d2aSXin Liby the inverse of the residual quantization step size.  Subframes with
7014*a58d3d2aSXin Lia small quantization step size will on average spend more bits for a
7015*a58d3d2aSXin Ligiven amount of residual energy than subframes with a large step size.
7016*a58d3d2aSXin LiWithout scaling, Burg's method minimizes the total residual energy in
7017*a58d3d2aSXin Liall subframes, which doesn't necessarily minimize the total number of
7018*a58d3d2aSXin Libits needed for coding the quantized residual.  The residual energy
7019*a58d3d2aSXin Liof the scaled subframes is a better measure for that number of
7020*a58d3d2aSXin Libits.
7021*a58d3d2aSXin Li</t>
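<t>
For orientation, the following Python sketch shows a conventional
time-domain Burg recursion together with the per-subframe scaling described
above. It is deliberately not SILK's autocorrelation-based variant, and it
uses floating point; the function names, argument layout, and the sign
convention A(z) = 1 + a(1)*z^-1 + ... are assumptions made for this example.
</t>
<figure>
<artwork><![CDATA[
```python
def scale_subframes(x, subframe_length, step_sizes):
    """Divide each subframe by its residual quantization step size."""
    return [x[t] / step_sizes[t // subframe_length] for t in range(len(x))]


def burg(x, order):
    """Textbook Burg recursion; returns (lpc, reflection_coefficients)."""
    n = len(x)
    f = list(x)   # forward prediction errors
    b = list(x)   # backward prediction errors
    a = []
    refl = []
    for m in range(order):
        num = sum(f[t] * b[t - 1] for t in range(m + 1, n))
        den = sum(f[t] ** 2 + b[t - 1] ** 2 for t in range(m + 1, n))
        # Minimizes the sum of forward and backward error energies.
        k = -2.0 * num / den if den > 0.0 else 0.0
        refl.append(k)
        # Levinson-style update of the LPC coefficients.
        a = [a[i] + k * a[m - 1 - i] for i in range(m)] + [k]
        # Update the error signals, highest index first so that
        # b[t - 1] still holds the value from the previous order.
        for t in range(n - 1, m, -1):
            ft = f[t]
            f[t] = ft + k * b[t - 1]
            b[t] = b[t - 1] + k * ft
    return a, refl
```
]]></artwork>
</figure>
<t>
Each reflection coefficient produced this way has magnitude at most 1, which
is the stability property mentioned above. In this sketch, the scaled signal
from scale_subframes() would be passed to burg() in place of the raw signal.
</t>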
7022*a58d3d2aSXin Li</section>
7023*a58d3d2aSXin Li</section>
7024*a58d3d2aSXin Li</section>
7025*a58d3d2aSXin Li
7026*a58d3d2aSXin Li<section title='LSF Quantization' anchor='lsf_quantizer_overview_section'>
7027*a58d3d2aSXin Li<t>
7028*a58d3d2aSXin LiUnlike many other speech codecs, SILK uses variable bitrate coding
7029*a58d3d2aSXin Lifor the LSFs.
7030*a58d3d2aSXin LiThis improves the average rate-distortion (R-D) tradeoff and reduces outliers.
7031*a58d3d2aSXin LiThe variable bitrate coding minimizes a linear combination of the weighted
7032*a58d3d2aSXin Liquantization errors and the bitrate.
7033*a58d3d2aSXin LiThe weights for the quantization errors are the Inverse
7034*a58d3d2aSXin LiHarmonic Mean Weighting (IHMW) function proposed by Laroia et al.
7035*a58d3d2aSXin Li(see <xref target="laroia-icassp" />).
7036*a58d3d2aSXin LiThese weights are referred to here as Laroia weights.
7037*a58d3d2aSXin Li</t>
7038*a58d3d2aSXin Li<t>
7039*a58d3d2aSXin LiThe LSF quantizer consists of two stages.
7040*a58d3d2aSXin LiThe first stage is an (unweighted) vector quantizer (VQ), with a
7041*a58d3d2aSXin Licodebook size of 32 vectors.
The quantization errors for the codebook vectors are sorted, and
7043*a58d3d2aSXin Lifor the N best vectors a second stage quantizer is run.
7044*a58d3d2aSXin LiBy varying the number N a tradeoff is made between R-D performance
7045*a58d3d2aSXin Liand computational efficiency.
7046*a58d3d2aSXin LiFor each of the N codebook vectors the Laroia weights corresponding
7047*a58d3d2aSXin Lito that vector (and not to the input vector) are calculated.
7048*a58d3d2aSXin LiThen the residual between the input LSF vector and the codebook
7049*a58d3d2aSXin Livector is scaled by the square roots of these Laroia weights.
7050*a58d3d2aSXin LiThis scaling partially normalizes error sensitivity for the
7051*a58d3d2aSXin Liresidual vector, so that a uniform quantizer with fixed
7052*a58d3d2aSXin Listep sizes can be used in the second stage without too much
7053*a58d3d2aSXin Liperformance loss.
7054*a58d3d2aSXin LiAnd by scaling with Laroia weights determined from the first-stage
7055*a58d3d2aSXin Licodebook vector, the process can be reversed in the decoder.
7056*a58d3d2aSXin Li</t>
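<t>
The residual scaling in the first stage can be sketched as below. This Python
example assumes the commonly cited form of the IHMW function, in which each
weight is the sum of the reciprocals of the distances to the two neighboring
LSFs, and it assumes LSFs normalized to the range 0 to 1. The function names
are hypothetical, and the reference implementation uses fixed-point
arithmetic instead.
</t>
<figure>
<artwork><![CDATA[
```python
import math


def laroia_weights(lsf, lo=0.0, hi=1.0):
    """Inverse harmonic mean weights for a sorted LSF vector.

    Assumption: lo and hi are the border values of the LSF range.
    """
    ext = [lo] + list(lsf) + [hi]
    return [1.0 / (ext[i] - ext[i - 1]) + 1.0 / (ext[i + 1] - ext[i])
            for i in range(1, len(ext) - 1)]


def weighted_residual(lsf_in, cb_vec):
    """Scale (input - codebook vector) by sqrt of the weights of cb_vec.

    The weights come from the codebook vector, not the input, so a
    decoder that knows the first-stage index can undo the scaling.
    """
    w = laroia_weights(cb_vec)
    return [math.sqrt(w_i) * (x - c)
            for w_i, x, c in zip(w, lsf_in, cb_vec)]
```
]]></artwork>
</figure>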
7057*a58d3d2aSXin Li<t>
7058*a58d3d2aSXin LiThe second stage uses predictive delayed decision scalar
7059*a58d3d2aSXin Liquantization.
7060*a58d3d2aSXin LiThe quantization error is weighted by Laroia weights determined
7061*a58d3d2aSXin Lifrom the LSF input vector.
7062*a58d3d2aSXin LiThe predictor multiplies the previous quantized residual value
7063*a58d3d2aSXin Liby a prediction coefficient that depends on the vector index from the
7064*a58d3d2aSXin Lifirst stage VQ and on the location in the LSF vector.
7065*a58d3d2aSXin LiThe prediction is subtracted from the LSF residual value before
7066*a58d3d2aSXin Liquantizing the result, and added back afterwards.
7067*a58d3d2aSXin LiThis subtraction can be interpreted as shifting the quantization levels
7068*a58d3d2aSXin Liof the scalar quantizer, and as a result the quantization error of
7069*a58d3d2aSXin Lieach value depends on the quantization decision of the previous value.
7070*a58d3d2aSXin LiThis dependency is exploited by the delayed decision mechanism to
search for a quantization sequence with best R-D performance
7072*a58d3d2aSXin Liwith a Viterbi-like algorithm <xref target="Viterbi"/>.
7073*a58d3d2aSXin LiThe quantizer processes the residual LSF vector in reverse order
7074*a58d3d2aSXin Li(i.e., it starts with the highest residual LSF value).
7075*a58d3d2aSXin LiThis is done because the prediction works slightly
7076*a58d3d2aSXin Libetter in the reverse direction.
7077*a58d3d2aSXin Li</t>
7078*a58d3d2aSXin Li<t>
7079*a58d3d2aSXin LiThe quantization index of the first stage is entropy coded.
7080*a58d3d2aSXin LiThe quantization sequence from the second stage is also entropy
7081*a58d3d2aSXin Licoded, where for each element the probability table is chosen
7082*a58d3d2aSXin Lidepending on the vector index from the first stage and the location
7083*a58d3d2aSXin Liof that element in the LSF vector.
7084*a58d3d2aSXin Li</t>
7085*a58d3d2aSXin Li
7086*a58d3d2aSXin Li<section title='LSF Stabilization' anchor='lsf_stabilizer_overview_section'>
7087*a58d3d2aSXin Li<t>
7088*a58d3d2aSXin LiIf the input is stable, finding the best candidate usually results in a
7089*a58d3d2aSXin Liquantized vector that is also stable. Because of the two-stage approach,
7090*a58d3d2aSXin Lihowever, it is possible that the best quantization candidate is unstable.
7091*a58d3d2aSXin LiThe encoder applies the same stabilization procedure applied by the decoder
(see <xref target="silk_nlsf_stabilization"/>) to ensure the LSF parameters
7093*a58d3d2aSXin Li are within their valid range, increasingly sorted, and have minimum
7094*a58d3d2aSXin Li distances between each other and the border values.
7095*a58d3d2aSXin Li</t>
7096*a58d3d2aSXin Li</section>
7097*a58d3d2aSXin Li</section>
7098*a58d3d2aSXin Li
7099*a58d3d2aSXin Li<section title='LTP Quantization' anchor='ltp_quantizer_overview_section'>
7100*a58d3d2aSXin Li<t>
7101*a58d3d2aSXin LiFor voiced frames, the prediction analysis described in
7102*a58d3d2aSXin Li<xref target='pred_ana_voiced_overview_section' /> resulted in four sets
7103*a58d3d2aSXin Li(one set per subframe) of five LTP coefficients, plus four weighting matrices.
7104*a58d3d2aSXin LiThe LTP coefficients for each subframe are quantized using entropy constrained
7105*a58d3d2aSXin Livector quantization.
7106*a58d3d2aSXin LiA total of three vector codebooks are available for quantization, with
7107*a58d3d2aSXin Lidifferent rate-distortion trade-offs. The three codebooks have 10, 20, and
7108*a58d3d2aSXin Li40 vectors and average rates of about 3, 4, and 5 bits per vector, respectively.
7109*a58d3d2aSXin LiConsequently, the first codebook has larger average quantization distortion at
7110*a58d3d2aSXin Lia lower rate, whereas the last codebook has smaller average quantization
7111*a58d3d2aSXin Lidistortion at a higher rate.
7112*a58d3d2aSXin LiGiven the weighting matrix W_ltp and LTP vector b, the weighted rate-distortion
measure for a codebook vector cb_i with rate r_i is given by
7114*a58d3d2aSXin Li<figure align="center">
7115*a58d3d2aSXin Li<artwork align="center">
7116*a58d3d2aSXin Li<![CDATA[
7117*a58d3d2aSXin Li RD = u * (b - cb_i)' * W_ltp * (b - cb_i) + r_i,
7118*a58d3d2aSXin Li]]>
7119*a58d3d2aSXin Li</artwork>
7120*a58d3d2aSXin Li</figure>
7121*a58d3d2aSXin Liwhere u is a fixed, heuristically-determined parameter balancing the distortion
7122*a58d3d2aSXin Liand rate.
7123*a58d3d2aSXin LiWhich codebook gives the best performance for a given LTP vector depends on the
7124*a58d3d2aSXin Liweighting matrix for that LTP vector.
7125*a58d3d2aSXin LiFor example, for a low valued W_ltp, it is advantageous to use the codebook
7126*a58d3d2aSXin Liwith 10 vectors as it has a lower average rate.
7127*a58d3d2aSXin LiFor a large W_ltp, on the other hand, it is often better to use the codebook
7128*a58d3d2aSXin Liwith 40 vectors, as it is more likely to contain the best codebook vector.
7129*a58d3d2aSXin LiThe weighting matrix W_ltp depends mostly on two aspects of the input signal.
7130*a58d3d2aSXin LiThe first is the periodicity of the signal; the more periodic, the larger W_ltp.
7131*a58d3d2aSXin LiThe second is the change in signal energy in the current subframe, relative to
7132*a58d3d2aSXin Lithe signal one pitch lag earlier.
7133*a58d3d2aSXin LiA decaying energy leads to a larger W_ltp than an increasing energy.
Both aspects fluctuate relatively slowly, so the W_ltp matrices for
the different subframes of one frame are often similar.
7136*a58d3d2aSXin LiBecause of this, one of the three codebooks typically gives good performance
7137*a58d3d2aSXin Lifor all subframes, and therefore the codebook search for the subframe LTP
7138*a58d3d2aSXin Livectors is constrained to only allow codebook vectors to be chosen from the
7139*a58d3d2aSXin Lisame codebook, resulting in a rate reduction.
7140*a58d3d2aSXin Li</t>
7141*a58d3d2aSXin Li
7142*a58d3d2aSXin Li<t>
7143*a58d3d2aSXin LiTo find the best codebook, each of the three vector codebooks is
7144*a58d3d2aSXin Liused to quantize all subframe LTP vectors and produce a combined
7145*a58d3d2aSXin Liweighted rate-distortion measure for each vector codebook.
7146*a58d3d2aSXin LiThe vector codebook with the lowest combined rate-distortion
7147*a58d3d2aSXin Liover all subframes is chosen. The quantized LTP vectors are used
7148*a58d3d2aSXin Liin the noise shaping quantizer, and the index of the codebook
7149*a58d3d2aSXin Liplus the four indices for the four subframe codebook vectors
7150*a58d3d2aSXin Liare passed on to the range encoder.
7151*a58d3d2aSXin Li</t>
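<t>
The codebook search described above can be sketched as follows. This Python
example is only an illustration under stated assumptions: the data layout
(each codebook as a list of vector and rate pairs, each subframe as a pair of
LTP vector and weighting matrix) and the function names are hypothetical, and
the toy vectors in the usage note are not taken from the real codebooks.
</t>
<figure>
<artwork><![CDATA[
```python
def rd_cost(b, cb_vec, rate, W, u):
    """RD = u * (b - cb_i)' * W_ltp * (b - cb_i) + r_i"""
    d = [bi - ci for bi, ci in zip(b, cb_vec)]
    n = len(d)
    dist = sum(d[i] * W[i][j] * d[j] for i in range(n) for j in range(n))
    return u * dist + rate


def select_codebook(subframes, codebooks, u):
    """Pick one codebook for all subframes.

    Returns (codebook index, per-subframe vector indices, total RD).
    """
    best = None
    for cb_index, cb in enumerate(codebooks):
        total = 0.0
        indices = []
        for b, W in subframes:
            costs = [rd_cost(b, vec, rate, W, u) for vec, rate in cb]
            i = min(range(len(cb)), key=costs.__getitem__)
            indices.append(i)
            total += costs[i]
        if best is None or total < best[0]:
            best = (total, cb_index, indices)
    return best[1], best[2], best[0]
```
]]></artwork>
</figure>
<t>
With a small two-entry toy setup, lowering the weight on the distortion term
moves the choice from the larger codebook to the smaller, cheaper one, which
mirrors the behavior described for small and large W_ltp.
</t>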
7152*a58d3d2aSXin Li</section>
7153*a58d3d2aSXin Li
7154*a58d3d2aSXin Li<section title='Prefilter'>
7155*a58d3d2aSXin Li<t>
7156*a58d3d2aSXin LiIn the prefilter the input signal is filtered using the spectral valley
7157*a58d3d2aSXin Lide-emphasis filter coefficients from the noise shaping analysis
7158*a58d3d2aSXin Li(see <xref target='noise_shaping_analysis_overview_section'/>).
7159*a58d3d2aSXin LiBy applying only the noise shaping analysis filter to the input signal,
7160*a58d3d2aSXin Liit provides the input to the noise shaping quantizer.
7161*a58d3d2aSXin Li</t>
7162*a58d3d2aSXin Li</section>
7163*a58d3d2aSXin Li
7164*a58d3d2aSXin Li<section title='Noise Shaping Quantizer'>
7165*a58d3d2aSXin Li<t>
7166*a58d3d2aSXin LiThe noise shaping quantizer independently shapes the signal and coding noise
7167*a58d3d2aSXin Lispectra to obtain a perceptually higher quality at the same bitrate.
7168*a58d3d2aSXin Li</t>
7169*a58d3d2aSXin Li<t>
7170*a58d3d2aSXin LiThe prefilter output signal is multiplied with a compensation gain G computed
7171*a58d3d2aSXin Liin the noise shaping analysis. Then the output of a synthesis shaping filter
7172*a58d3d2aSXin Liis added, and the output of a prediction filter is subtracted to create a
7173*a58d3d2aSXin Liresidual signal.
7174*a58d3d2aSXin LiThe residual signal is multiplied by the inverse quantized quantization gain
7175*a58d3d2aSXin Lifrom the noise shaping analysis, and input to a scalar quantizer.
7176*a58d3d2aSXin LiThe quantization indices of the scalar quantizer represent a signal of pulses
7177*a58d3d2aSXin Lithat is input to the pyramid range encoder.
7178*a58d3d2aSXin LiThe scalar quantizer also outputs a quantization signal, which is multiplied
7179*a58d3d2aSXin Liby the quantized quantization gain from the noise shaping analysis to create
7180*a58d3d2aSXin Lian excitation signal.
7181*a58d3d2aSXin LiThe output of the prediction filter is added to the excitation signal to form
7182*a58d3d2aSXin Lithe quantized output signal y(n).
7183*a58d3d2aSXin LiThe quantized output signal y(n) is input to the synthesis shaping and
7184*a58d3d2aSXin Liprediction filters.
7185*a58d3d2aSXin Li</t>
7186*a58d3d2aSXin Li<t>
7187*a58d3d2aSXin LiOptionally the noise shaping quantizer operates in a delayed decision
7188*a58d3d2aSXin Limode.
7189*a58d3d2aSXin LiIn this mode it uses a Viterbi algorithm to keep track of
7190*a58d3d2aSXin Limultiple rounding choices in the quantizer and select the best
7191*a58d3d2aSXin Lione after a delay of 32 samples.  This improves the rate/distortion
7192*a58d3d2aSXin Liperformance of the quantizer.
7193*a58d3d2aSXin Li</t>
7194*a58d3d2aSXin Li</section>
7195*a58d3d2aSXin Li
7196*a58d3d2aSXin Li<section title='Constant Bitrate Mode'>
7197*a58d3d2aSXin Li<t>
  SILK was designed to run in Variable Bitrate (VBR) mode.  However,
7199*a58d3d2aSXin Li  the reference implementation also has a Constant Bitrate (CBR) mode
7200*a58d3d2aSXin Li  for SILK.  In CBR mode SILK will attempt to encode each packet with
7201*a58d3d2aSXin Li  no more than the allowed number of bits.  The Opus wrapper code
7202*a58d3d2aSXin Li  then pads the bitstream if any unused bits are left in SILK mode, or
7203*a58d3d2aSXin Li  encodes the high band with the remaining number of bits in Hybrid mode.
7204*a58d3d2aSXin Li  The number of payload bits is adjusted by changing
7205*a58d3d2aSXin Li  the quantization gains and the rate/distortion tradeoff in the noise
7206*a58d3d2aSXin Li  shaping quantizer, in an iterative loop
7207*a58d3d2aSXin Li  around the noise shaping quantizer and entropy coding.
7208*a58d3d2aSXin Li  Compared to the SILK VBR mode, the CBR mode has lower
7209*a58d3d2aSXin Li  audio quality at a given average bitrate, and also has higher
7210*a58d3d2aSXin Li  computational complexity.
7211*a58d3d2aSXin Li</t>
7212*a58d3d2aSXin Li</section>
7213*a58d3d2aSXin Li
7214*a58d3d2aSXin Li</section>
7215*a58d3d2aSXin Li
7216*a58d3d2aSXin Li</section>
7217*a58d3d2aSXin Li
7218*a58d3d2aSXin Li
7219*a58d3d2aSXin Li<section title="CELT Encoder">
7220*a58d3d2aSXin Li<t>
7221*a58d3d2aSXin LiMost of the aspects of the CELT encoder can be directly derived from the description
7222*a58d3d2aSXin Liof the decoder. For example, the filters and rotations in the encoder are simply the
7223*a58d3d2aSXin Liinverse of the operation performed by the decoder. Similarly, the quantizers generally
7224*a58d3d2aSXin Lioptimize for the mean square error (because noise shaping is part of the bit-stream itself),
7225*a58d3d2aSXin Liso no special search is required. For this reason, only the less straightforward aspects of the
7226*a58d3d2aSXin Liencoder are described here.
7227*a58d3d2aSXin Li</t>
7228*a58d3d2aSXin Li
7229*a58d3d2aSXin Li<section anchor="pitch-prefilter" title="Pitch Prefilter">
7230*a58d3d2aSXin Li<t>The pitch prefilter is applied after the pre-emphasis. It is applied
7231*a58d3d2aSXin Liin such a way as to be the inverse of the decoder's post-filter. The main non-obvious aspect of the
7232*a58d3d2aSXin Liprefilter is the selection of the pitch period. The pitch search should be optimized for the
7233*a58d3d2aSXin Lifollowing criteria:
7234*a58d3d2aSXin Li<list style="symbols">
7235*a58d3d2aSXin Li<t>continuity: it is important that the pitch period
7236*a58d3d2aSXin Lidoes not change abruptly between frames; and</t>
7237*a58d3d2aSXin Li<t>avoidance of pitch multiples: when the period used is a multiple of the real period
(lower frequency fundamental), the post-filter loses most of its ability to reduce noise.</t>
7239*a58d3d2aSXin Li</list>
7240*a58d3d2aSXin Li</t>
7241*a58d3d2aSXin Li</section>
7242*a58d3d2aSXin Li
7243*a58d3d2aSXin Li<section anchor="normalization" title="Bands and Normalization">
7244*a58d3d2aSXin Li<t>
7245*a58d3d2aSXin LiThe MDCT output is divided into bands that are designed to match the ear's critical
7246*a58d3d2aSXin Libands for the smallest (2.5&nbsp;ms) frame size. The larger frame sizes use integer
7247*a58d3d2aSXin Limultiples of the 2.5&nbsp;ms layout. For each band, the encoder
7248*a58d3d2aSXin Licomputes the energy that will later be encoded. Each band is then normalized by the
7249*a58d3d2aSXin Lisquare root of the <spanx style="strong">unquantized</spanx> energy, such that each band now forms a unit vector X.
7250*a58d3d2aSXin LiThe energy and the normalization are computed by compute_band_energies()
7251*a58d3d2aSXin Liand normalise_bands() (bands.c), respectively.
7252*a58d3d2aSXin Li</t>
7253*a58d3d2aSXin Li</section>
7254*a58d3d2aSXin Li
7255*a58d3d2aSXin Li<section anchor="energy-quantization" title="Energy Envelope Quantization">
7256*a58d3d2aSXin Li
7257*a58d3d2aSXin Li<t>
7258*a58d3d2aSXin LiEnergy quantization (both coarse and fine) can be easily understood from the decoding process.
7259*a58d3d2aSXin LiFor all useful bitrates, the coarse quantizer always chooses the quantized log energy value that
minimizes the error for each band. Only at very low rates does the encoder allow larger errors to
minimize the rate and avoid using more bits than are available. When the
available CPU resources allow it, it is best to try encoding the coarse energy both with and without
7263*a58d3d2aSXin Liinter-frame prediction such that the best prediction mode can be selected. The optimal mode depends on
7264*a58d3d2aSXin Lithe coding rate, the available bitrate, and the current rate of packet loss.
7265*a58d3d2aSXin Li</t>
7266*a58d3d2aSXin Li
7267*a58d3d2aSXin Li<t>The fine energy quantizer always chooses the quantized log energy value that
7268*a58d3d2aSXin Liminimizes the error for each band because the rate of the fine quantization depends only
7269*a58d3d2aSXin Lion the bit allocation and not on the values that are coded.
7270*a58d3d2aSXin Li</t>
7271*a58d3d2aSXin Li</section> <!-- Energy quant -->
7272*a58d3d2aSXin Li
7273*a58d3d2aSXin Li<section title="Bit Allocation">
7274*a58d3d2aSXin Li<t>The encoder must use exactly the same bit allocation process as used by the decoder
7275*a58d3d2aSXin Liand described in <xref target="allocation"/>. The three mechanisms that can be used by the
7276*a58d3d2aSXin Liencoder to adjust the bitrate on a frame-by-frame basis are band boost, allocation trim,
7277*a58d3d2aSXin Liand band skipping.
7278*a58d3d2aSXin Li</t>
7279*a58d3d2aSXin Li
7280*a58d3d2aSXin Li<section title="Band Boost">
7281*a58d3d2aSXin Li<t>The reference encoder makes a decision to boost a band when the energy of that band is significantly
higher than that of the neighboring bands. Let E_j be the log-energy of band j, and define
<list>
<t>D_j = 2*E_j - E_(j-1) - E_(j+1)</t>
7285*a58d3d2aSXin Li</list>
7286*a58d3d2aSXin Li
7287*a58d3d2aSXin LiThe allocation of band j is boosted once if D_j &gt; t1 and twice if D_j &gt; t2. For LM&gt;=1, t1=2 and t2=4,
7288*a58d3d2aSXin Liwhile for LM&lt;1, t1=3 and t2=5.
7289*a58d3d2aSXin Li</t>
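<t>
The boost rule can be sketched in Python as follows. The function name is
hypothetical, and leaving the first and last bands unboosted is an assumption
of this example, since D_j needs both neighbors and the text does not say how
the edge bands are handled.
</t>
<figure>
<artwork><![CDATA[
```python
def band_boosts(E, LM):
    """Return the number of boosts (0, 1 or 2) for each band.

    E is the list of per-band log-energies.
    """
    t1, t2 = (2, 4) if LM >= 1 else (3, 5)
    boosts = [0] * len(E)
    # Assumption: edge bands have no two-sided neighborhood, so skip them.
    for j in range(1, len(E) - 1):
        D = 2 * E[j] - E[j - 1] - E[j + 1]
        if D > t2:
            boosts[j] = 2
        elif D > t1:
            boosts[j] = 1
    return boosts
```
]]></artwork>
</figure>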
7290*a58d3d2aSXin Li
7291*a58d3d2aSXin Li</section>
7292*a58d3d2aSXin Li
7293*a58d3d2aSXin Li<section title="Allocation Trim">
<t>The allocation trim is a value between 0 and 10 (inclusive) that controls the allocation
balance between the low and high frequencies. The encoder starts with a safe "default" of 5
and deviates from that default in two different ways. First, the trim can deviate by +/- 2
depending on the spectral tilt of the input signal. For signals with more low frequencies, the
trim is increased by up to 2, while for signals with more high frequencies, the trim is
decreased by up to 2.
Second, for stereo inputs, the trim value can
be decreased by up to 4 when the inter-channel correlation at low frequency (first 8 bands)
is high.</t>
7303*a58d3d2aSXin Li</section>
7304*a58d3d2aSXin Li
7305*a58d3d2aSXin Li<section title="Band Skipping">
7306*a58d3d2aSXin Li<t>The encoder uses band skipping to ensure that the shape of the bands is only coded
7307*a58d3d2aSXin Liif there is at least 1/2 bit per sample available for the PVQ. If not, then no bit is allocated
7308*a58d3d2aSXin Liand folding is used instead. To ensure continuity in the allocation, some amount of hysteresis is
7309*a58d3d2aSXin Liadded to the process, such that a band that received PVQ bits in the previous frame only needs 7/16
7310*a58d3d2aSXin Libit/sample to be coded for the current frame, while a band that did not receive PVQ bits in the
previous frame needs at least 9/16 bit/sample to be coded.</t>
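<t>
The hysteresis can be written as a single threshold test. The Python sketch
below uses a hypothetical function name and treats both thresholds as
inclusive, which is an assumption of this example.
</t>
<figure>
<artwork><![CDATA[
```python
def code_band_shape(bits_per_sample, coded_in_previous_frame):
    """Decide whether a band gets PVQ bits in the current frame.

    A band coded in the previous frame stays coded down to 7/16
    bit/sample; a band that was skipped needs 9/16 bit/sample.
    """
    threshold = 7.0 / 16.0 if coded_in_previous_frame else 9.0 / 16.0
    return bits_per_sample >= threshold
```
]]></artwork>
</figure>
<t>
At exactly 1/2 bit per sample, a band keeps whatever state it had in the
previous frame, which is what prevents the allocation from toggling.
</t>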
7312*a58d3d2aSXin Li</section>
7313*a58d3d2aSXin Li
7314*a58d3d2aSXin Li</section>
7315*a58d3d2aSXin Li
7316*a58d3d2aSXin Li<section title="Stereo Decisions">
7317*a58d3d2aSXin Li<t>Because CELT applies mid-side stereo coupling in the normalized domain, it does not suffer from
7318*a58d3d2aSXin Liimportant stereo image problems even when the two channels are completely uncorrelated. For this reason
7319*a58d3d2aSXin Liit is always safe to use stereo coupling on any audio frame. That being said, there are some frames
7320*a58d3d2aSXin Lifor which dual (independent) stereo is still more efficient. This decision is made by comparing the estimated
7321*a58d3d2aSXin Lientropy with and without coupling over the first 13 bands, taking into account the fact that all bands with
7322*a58d3d2aSXin Limore than two MDCT bins require one extra degree of freedom when coded in mid-side. Let L1_ms and L1_lr
7323*a58d3d2aSXin Libe the L1-norm of the mid-side vector and the L1-norm of the left-right vector, respectively. The decision
7324*a58d3d2aSXin Lito use mid-side is made if and only if
7325*a58d3d2aSXin Li<figure align="center">
7326*a58d3d2aSXin Li<artwork align="center"><![CDATA[
7327*a58d3d2aSXin Li L1_ms          L1_lr
7328*a58d3d2aSXin Li--------    <   -----
7329*a58d3d2aSXin Libins + E        bins
7330*a58d3d2aSXin Li]]></artwork>
7331*a58d3d2aSXin Li</figure>
7332*a58d3d2aSXin Liwhere bins is the number of MDCT bins in the first 13 bands and E is the number of extra degrees of
freedom for mid-side coding. For LM&gt;1, E=13; otherwise, E=5.
7334*a58d3d2aSXin Li</t>
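<t>
The mid-side decision can be sketched in Python as below. The function names
are hypothetical, and the scaling of the mid and side signals by the square
root of 1/2 is an assumption of this example; the inputs are taken to be the
normalized MDCT coefficients of the first 13 bands of each channel.
</t>
<figure>
<artwork><![CDATA[
```python
import math


def l1_norms(left, right):
    """Return (L1_ms, L1_lr) for two equally long coefficient lists."""
    s = math.sqrt(0.5)  # assumed scaling for mid = s*(l+r), side = s*(l-r)
    l1_lr = sum(abs(l) + abs(r) for l, r in zip(left, right))
    l1_ms = sum(abs(s * (l + r)) + abs(s * (l - r))
                for l, r in zip(left, right))
    return l1_ms, l1_lr


def use_mid_side(l1_ms, l1_lr, bins, LM):
    """Mid-side is used iff L1_ms / (bins + E) < L1_lr / bins."""
    E = 13 if LM > 1 else 5
    return l1_ms / (bins + E) < l1_lr / bins
```
]]></artwork>
</figure>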
7335*a58d3d2aSXin Li
7336*a58d3d2aSXin Li<t>The reference encoder decides on the intensity stereo threshold based on the bitrate alone. After
7337*a58d3d2aSXin Litaking into account the frame size by subtracting 80 bits per frame for coarse energy, the first
7338*a58d3d2aSXin Liband using intensity coding is as follows:
7339*a58d3d2aSXin Li</t>
7340*a58d3d2aSXin Li
7341*a58d3d2aSXin Li<texttable anchor="intensity-thresholds"
7342*a58d3d2aSXin Li title="Thresholds for Intensity Stereo">
7343*a58d3d2aSXin Li<ttcol align='center'>bitrate (kb/s)</ttcol>
7344*a58d3d2aSXin Li<ttcol align='center'>start band</ttcol>
7345*a58d3d2aSXin Li<c>&lt;35</c>      <c>8</c>
7346*a58d3d2aSXin Li<c>35-50</c>      <c>12</c>
7347*a58d3d2aSXin Li<c>50-68</c>      <c>16</c>
<c>68-84</c>      <c>18</c>
7349*a58d3d2aSXin Li<c>84-102</c>     <c>19</c>
7350*a58d3d2aSXin Li<c>102-130</c>     <c>20</c>
7351*a58d3d2aSXin Li<c>&gt;130</c>     <c>disabled</c>
7352*a58d3d2aSXin Li</texttable>
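<t>
A minimal lookup for the table above might look like the following sketch.
It assumes that the ranges are contiguous (so the row for band 18 starts at
68 kb/s), that each range includes its lower edge, and that the input is the
bitrate after the coarse energy adjustment described above. The function name
and the use of None for "disabled" are choices made for this example only.
</t>
<figure>
<artwork><![CDATA[
```python
# (exclusive upper edge in kb/s, first band using intensity stereo)
_THRESHOLDS = [(35, 8), (50, 12), (68, 16), (84, 18), (102, 19)]


def intensity_start_band(bitrate_kbps):
    """Map an adjusted bitrate to the first intensity-coded band.

    Returns None when intensity stereo is disabled (above 130 kb/s).
    The handling of exact boundary values is an assumption.
    """
    for upper, band in _THRESHOLDS:
        if bitrate_kbps < upper:
            return band
    return 20 if bitrate_kbps <= 130 else None
```
]]></artwork>
</figure>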
7353*a58d3d2aSXin Li
7354*a58d3d2aSXin Li
7355*a58d3d2aSXin Li</section>
7356*a58d3d2aSXin Li
7357*a58d3d2aSXin Li<section title="Time-Frequency Decision">
7358*a58d3d2aSXin Li<t>
7359*a58d3d2aSXin LiThe choice of time-frequency resolution used in <xref target="tf-change"></xref> is based on
7360*a58d3d2aSXin LiR-D optimization. The distortion is the L1-norm (sum of absolute values) of each band
7361*a58d3d2aSXin Liafter each TF resolution under consideration. The L1 norm is used because it represents the entropy
7362*a58d3d2aSXin Lifor a Laplacian source. The number of bits required to code a change in TF resolution between
7363*a58d3d2aSXin Litwo bands is higher than the cost of having those two bands use the same resolution, which is
7364*a58d3d2aSXin Liwhat requires the R-D optimization. The optimal decision is computed using the Viterbi algorithm.
7365*a58d3d2aSXin LiSee tf_analysis() in celt/celt.c.
7366*a58d3d2aSXin Li</t>
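<t>
The following sketch illustrates the kind of Viterbi search described above.
It is a simplified model, not a transcription of tf_analysis(): the cost table,
the single constant change_penalty standing in for the extra bits of a
resolution change, and the function name are all assumptions made for
illustration.
</t>
<figure>
<artwork><![CDATA[
```python
def tf_viterbi(cost, change_penalty):
    """Pick one TF resolution per band with minimum total cost.

    cost[b][r] is the L1 distortion of band b at candidate resolution r.
    change_penalty is added whenever adjacent bands differ in resolution.
    Returns the list of chosen resolution indices, one per band.
    """
    n_res = len(cost[0])
    best = list(cost[0])  # best[r]: cheapest path ending at resolution r
    back = []             # back[b-1][r]: predecessor chosen for band b
    for band_cost in cost[1:]:
        new_best = []
        choices = []
        for r in range(n_res):
            # Cheapest way to arrive at resolution r from the previous band.
            totals = [best[p] + (0 if p == r else change_penalty)
                      for p in range(n_res)]
            prev = totals.index(min(totals))
            new_best.append(totals[prev] + band_cost[r])
            choices.append(prev)
        best = new_best
        back.append(choices)
    # Trace the cheapest path back from the last band.
    r = best.index(min(best))
    path = [r]
    for choices in reversed(back):
        r = choices[r]
        path.append(r)
    path.reverse()
    return path
```
]]></artwork>
</figure>
<t>
With a change penalty of zero, each band simply takes its own cheapest
resolution; as the penalty grows, the path switches resolution less often.
</t>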
7367*a58d3d2aSXin Li</section>
7368*a58d3d2aSXin Li
7369*a58d3d2aSXin Li<section title="Spreading Values Decision">
7370*a58d3d2aSXin Li<t>
7371*a58d3d2aSXin LiThe choice of the spreading value in <xref target="spread values"></xref> has an
7372*a58d3d2aSXin Liimpact on the nature of the coding noise introduced by CELT. The larger the f_r value, the
7373*a58d3d2aSXin Lilower the impact of the rotation, and the more tonal the coding noise. The
7374*a58d3d2aSXin Limore tonal the signal, the more tonal the noise should be, so the CELT encoder determines
7375*a58d3d2aSXin Lithe optimal value for f_r by estimating how tonal the signal is. The tonality estimate
is based on a discrete pdf (4-bin histogram) of each band. Bands that have a large number of small
7377*a58d3d2aSXin Livalues are considered more tonal and a decision is made by combining all bands with more than
7378*a58d3d2aSXin Li8 samples. See spreading_decision() in celt/bands.c.
7379*a58d3d2aSXin Li</t>
7380*a58d3d2aSXin Li</section>
7381*a58d3d2aSXin Li
7382*a58d3d2aSXin Li<section anchor="pvq" title="Spherical Vector Quantization">
7383*a58d3d2aSXin Li<t>CELT uses a Pyramid Vector Quantization (PVQ) <xref target="PVQ"></xref>
7384*a58d3d2aSXin Licodebook for quantizing the details of the spectrum in each band that have not
7385*a58d3d2aSXin Libeen predicted by the pitch predictor. The PVQ codebook consists of all sums
7386*a58d3d2aSXin Liof K signed pulses in a vector of N samples, where two pulses at the same position
7387*a58d3d2aSXin Liare required to have the same sign. Thus the codebook includes
7388*a58d3d2aSXin Liall integer codevectors y of N dimensions that satisfy sum(abs(y(j))) = K.
7389*a58d3d2aSXin Li</t>
7390*a58d3d2aSXin Li
7391*a58d3d2aSXin Li<t>
In bands where there are sufficient bits allocated, PVQ is used to encode
7393*a58d3d2aSXin Lithe unit vector that results from the normalization in
7394*a58d3d2aSXin Li<xref target="normalization"></xref> directly. Given a PVQ codevector y,
7395*a58d3d2aSXin Lithe unit vector X is obtained as X = y/||y||, where ||.|| denotes the
7396*a58d3d2aSXin LiL2 norm.
7397*a58d3d2aSXin Li</t>
7398*a58d3d2aSXin Li
7399*a58d3d2aSXin Li
7400*a58d3d2aSXin Li<section anchor="pvq-search" title="PVQ Search">
7401*a58d3d2aSXin Li
7402*a58d3d2aSXin Li<t>
The search for the best codevector y is performed by alg_quant()
(celt/vq.c). There are several possible approaches to the
search, with a trade-off between quality and complexity. The method used in the reference
implementation computes an initial codeword y0 by projecting the normalized spectrum
X onto the codebook pyramid of K-1 pulses:
7408*a58d3d2aSXin Li</t>
7409*a58d3d2aSXin Li<t>
7410*a58d3d2aSXin Liy0 = truncate_towards_zero( (K-1) * X / sum(abs(X)))
7411*a58d3d2aSXin Li</t>
7412*a58d3d2aSXin Li
7413*a58d3d2aSXin Li<t>
7414*a58d3d2aSXin LiDepending on N, K and the input data, the initial codeword y0 may contain from
7415*a58d3d2aSXin Li0 to K-1 non-zero values. All the remaining pulses, with the exception of the last one,
are found iteratively with a greedy search that maximizes the normalized correlation
between y and X, that is, minimizes the cost function:
7418*a58d3d2aSXin Li<figure align="center">
7419*a58d3d2aSXin Li<artwork align="center"><![CDATA[
      T
J = -X * y / ||y||
7422*a58d3d2aSXin Li]]></artwork>
7423*a58d3d2aSXin Li</figure>
7424*a58d3d2aSXin Li</t>
7425*a58d3d2aSXin Li
7426*a58d3d2aSXin Li<t>
The search described above is considered to be a good trade-off between quality
and computational cost. However, there are other possible ways to search the PVQ
codebook, and implementers MAY use any other search method. See alg_quant() in celt/vq.c.
7430*a58d3d2aSXin Li</t>
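<t>
A floating-point sketch of the search described in this section is given
below. It is a simplification rather than a transcription of alg_quant(): it
places every remaining pulse, including the last one, with the same greedy
rule, and it works on magnitudes and restores the signs of X at the end. The
function name and the handling of an all-zero input are assumptions of this
example.
</t>
<figure>
<artwork><![CDATA[
```python
import math


def pvq_search(x, k):
    """Greedy search for an integer vector y with sum(abs(y)) == k (k >= 1)."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        # Arbitrary choice for an all-zero input (assumption of this sketch).
        return [k] + [0] * (n - 1)
    # Initial projection onto the pyramid of K-1 pulses;
    # int() truncates towards zero.
    y = [int((k - 1) * abs(v) / l1) for v in x]
    xy = sum(abs(v) * p for v, p in zip(x, y))  # running X^T * y
    yy = sum(p * p for p in y)                  # running ||y||^2
    for _ in range(k - sum(y)):
        best_j = 0
        best_score = -1.0
        for j in range(n):
            # Correlation X^T*y / ||y|| if one more pulse goes to position j.
            score = (xy + abs(x[j])) / math.sqrt(yy + 2 * y[j] + 1)
            if score > best_score:
                best_j = j
                best_score = score
        xy += abs(x[best_j])
        yy += 2 * y[best_j] + 1
        y[best_j] += 1
    # Each pulse takes the sign of the corresponding input sample.
    return [p if v >= 0 else -p for v, p in zip(x, y)]
```
]]></artwork>
</figure>
<t>
For example, with X = (0.8, -0.6, 0) and K = 4, the projection yields
(1, 1, 0) and the two remaining pulses are added greedily, giving
y = (2, -2, 0).
</t>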
7431*a58d3d2aSXin Li</section>
7432*a58d3d2aSXin Li
7433*a58d3d2aSXin Li<section anchor="cwrs-encoder" title="PVQ Encoding">
7434*a58d3d2aSXin Li
7435*a58d3d2aSXin Li<t>
7436*a58d3d2aSXin LiThe vector to encode, X, is converted into an index i such that
7437*a58d3d2aSXin Li 0&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;V(N,K) as follows.
7438*a58d3d2aSXin LiLet i&nbsp;=&nbsp;0 and k&nbsp;=&nbsp;0.
7439*a58d3d2aSXin LiThen for j&nbsp;=&nbsp;(N&nbsp;-&nbsp;1) down to 0, inclusive, do:
7440*a58d3d2aSXin Li<list style="numbers">
7441*a58d3d2aSXin Li<t>
7442*a58d3d2aSXin LiIf k&nbsp;>&nbsp;0, set
7443*a58d3d2aSXin Li i&nbsp;=&nbsp;i&nbsp;+&nbsp;(V(N-j-1,k-1)&nbsp;+&nbsp;V(N-j,k-1))/2.
7444*a58d3d2aSXin Li</t>
7445*a58d3d2aSXin Li<t>Set k&nbsp;=&nbsp;k&nbsp;+&nbsp;abs(X[j]).</t>
7446*a58d3d2aSXin Li<t>
7447*a58d3d2aSXin LiIf X[j]&nbsp;&lt;&nbsp;0, set
7448*a58d3d2aSXin Li i&nbsp;=&nbsp;i&nbsp;+&nbsp;(V(N-j-1,k)&nbsp;+&nbsp;V(N-j,k))/2.
7449*a58d3d2aSXin Li</t>
7450*a58d3d2aSXin Li</list>
7451*a58d3d2aSXin Li</t>
7452*a58d3d2aSXin Li
7453*a58d3d2aSXin Li<t>
7454*a58d3d2aSXin LiThe index i is then encoded using the procedure in
7455*a58d3d2aSXin Li <xref target="encoding-ints"/> with ft&nbsp;=&nbsp;V(N,K).
7456*a58d3d2aSXin Li</t>
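<t>
The index computation above can be sketched as follows. The function V() is
assumed here to follow the usual PVQ codebook-size recurrence
V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0 for
K > 0; that recurrence, the helper names, and the brute-force codebook()
enumeration are assumptions of this example and are not how the reference
implementation computes these values.
</t>
<figure>
<artwork><![CDATA[
```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def V(n, k):
    """Number of codevectors with n dimensions and k pulses (assumed)."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return V(n - 1, k) + V(n, k - 1) + V(n - 1, k - 1)


def pvq_index(x):
    """Convert an integer codevector x into its index, as in the list above."""
    n = len(x)
    i = 0
    k = 0
    for j in range(n - 1, -1, -1):
        if k > 0:
            i += (V(n - j - 1, k - 1) + V(n - j, k - 1)) // 2
        k += abs(x[j])
        if x[j] < 0:
            i += (V(n - j - 1, k) + V(n - j, k)) // 2
    return i


def codebook(n, k):
    """Enumerate all codevectors by brute force (for checking small cases)."""
    return [y for y in product(range(-k, k + 1), repeat=n)
            if sum(abs(v) for v in y) == k]
```
]]></artwork>
</figure>
<t>
For N = 2 and K = 2 the eight codevectors map onto the indices 0 through 7
exactly once each, which is a convenient sanity check for an implementation.
</t>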
7457*a58d3d2aSXin Li
7458*a58d3d2aSXin Li</section>
7459*a58d3d2aSXin Li
7460*a58d3d2aSXin Li</section>
7461*a58d3d2aSXin Li
7462*a58d3d2aSXin Li
7463*a58d3d2aSXin Li
7464*a58d3d2aSXin Li
7465*a58d3d2aSXin Li
7466*a58d3d2aSXin Li</section>
7467*a58d3d2aSXin Li
7468*a58d3d2aSXin Li</section>
7469*a58d3d2aSXin Li
7470*a58d3d2aSXin Li
7471*a58d3d2aSXin Li<section anchor="conformance" title="Conformance">
7472*a58d3d2aSXin Li
7473*a58d3d2aSXin Li<t>
It is our intention to allow the greatest possible freedom in
7475*a58d3d2aSXin Liimplementing the specification. For this reason, outside of the exceptions
7476*a58d3d2aSXin Linoted in this section, conformance is defined through the reference
7477*a58d3d2aSXin Liimplementation of the decoder provided in <xref target="ref-implementation"/>.
7478*a58d3d2aSXin LiAlthough this document includes an English description of the codec, should
7479*a58d3d2aSXin Lithe description contradict the source code of the reference implementation,
7480*a58d3d2aSXin Lithe latter shall take precedence.
7481*a58d3d2aSXin Li</t>
7482*a58d3d2aSXin Li
7483*a58d3d2aSXin Li<t>
7484*a58d3d2aSXin LiCompliance with this specification means that in addition to following the normative keywords in this document,
7485*a58d3d2aSXin Li a decoder's output MUST also be
7486*a58d3d2aSXin Li within the thresholds specified by the opus_compare.c tool (included
7487*a58d3d2aSXin Li with the code) when compared to the reference implementation for each of the
7488*a58d3d2aSXin Li test vectors provided (see <xref target="test-vectors"></xref>) and for each output
7489*a58d3d2aSXin Li sampling rate and channel count supported. In addition, a compliant
7490*a58d3d2aSXin Li decoder implementation MUST have the same final range decoder state as that of the
7491*a58d3d2aSXin Li reference decoder. It is therefore RECOMMENDED that the
7492*a58d3d2aSXin Li decoder implement the same functional behavior as the reference.
7493*a58d3d2aSXin Li
7494*a58d3d2aSXin Li A decoder implementation is not required to support all output sampling
7495*a58d3d2aSXin Li rates or all output channel counts.
7496*a58d3d2aSXin Li</t>
7497*a58d3d2aSXin Li
7498*a58d3d2aSXin Li<section title="Testing">
7499*a58d3d2aSXin Li<t>
7500*a58d3d2aSXin LiUsing the reference code provided in <xref target="ref-implementation"></xref>,
7501*a58d3d2aSXin Lia test vector can be decoded with
7502*a58d3d2aSXin Li<list>
7503*a58d3d2aSXin Li<t>opus_demo -d &lt;rate&gt; &lt;channels&gt; testvectorX.bit testX.out</t>
7504*a58d3d2aSXin Li</list>
7505*a58d3d2aSXin Liwhere &lt;rate&gt; is the sampling rate and can be 8000, 12000, 16000, 24000, or 48000, and
7506*a58d3d2aSXin Li&lt;channels&gt; is 1 for mono or 2 for stereo.
7507*a58d3d2aSXin Li</t>
7508*a58d3d2aSXin Li
7509*a58d3d2aSXin Li<t>
7510*a58d3d2aSXin LiIf the range decoder state is incorrect for one of the frames, the decoder will exit with
7511*a58d3d2aSXin Li"Error: Range coder state mismatch between encoder and decoder". If the decoder succeeds, then
7512*a58d3d2aSXin Lithe output can be compared with the "reference" output with
7513*a58d3d2aSXin Li<list>
7514*a58d3d2aSXin Li<t>opus_compare -s -r &lt;rate&gt; testvectorX.dec testX.out</t>
7515*a58d3d2aSXin Li</list>
7516*a58d3d2aSXin Lifor stereo or
7517*a58d3d2aSXin Li<list>
7518*a58d3d2aSXin Li<t>opus_compare -r &lt;rate&gt; testvectorX.dec testX.out</t>
7519*a58d3d2aSXin Li</list>
7520*a58d3d2aSXin Lifor mono.
7521*a58d3d2aSXin Li</t>
7522*a58d3d2aSXin Li
7523*a58d3d2aSXin Li<t>In addition to indicating whether the test vector comparison passes, the opus_compare tool
7524*a58d3d2aSXin Lioutputs an "Opus quality metric" that indicates how well the tested decoder matches the
7525*a58d3d2aSXin Lireference implementation. A quality of 0 corresponds to the passing threshold, while
7526*a58d3d2aSXin Lia quality of 100 is the highest possible value and means that the output of the tested decoder is identical to the reference
7527*a58d3d2aSXin Liimplementation. The passing threshold (quality 0) was calibrated in such a way that it corresponds to
7528*a58d3d2aSXin Liadditive white noise with a 48 dB SNR (similar to what can be obtained on a cassette deck).
7529*a58d3d2aSXin LiIt is still possible for an implementation to sound very good with such a low quality measure
7530*a58d3d2aSXin Li(e.g. if the deviation is due to inaudible phase distortion), but unless this is verified by
7531*a58d3d2aSXin Lilistening tests, it is RECOMMENDED that implementations achieve a quality above 90 for 48&nbsp;kHz
7532*a58d3d2aSXin Lidecoding. For other sampling rates, it is normal for the quality metric to be lower
7533*a58d3d2aSXin Li(typically as low as 50 even for a good implementation) because of harmless mismatch with
7534*a58d3d2aSXin Lithe delay and phase of the internal sampling rate conversion.
7535*a58d3d2aSXin Li</t>
7536*a58d3d2aSXin Li
7537*a58d3d2aSXin Li<t>
7538*a58d3d2aSXin LiOn POSIX environments, the run_vectors.sh script can be used to verify all test
7539*a58d3d2aSXin Livectors. This can be done with
7540*a58d3d2aSXin Li<list>
7541*a58d3d2aSXin Li<t>run_vectors.sh &lt;exec path&gt; &lt;vector path&gt; &lt;rate&gt;</t>
7542*a58d3d2aSXin Li</list>
7543*a58d3d2aSXin Liwhere &lt;exec path&gt; is the directory where the opus_demo and opus_compare executables
7544*a58d3d2aSXin Liare built and &lt;vector path&gt; is the directory containing the test vectors.
7545*a58d3d2aSXin Li</t>
7546*a58d3d2aSXin Li</section>
7547*a58d3d2aSXin Li
7548*a58d3d2aSXin Li<section anchor="opus-custom" title="Opus Custom">
7549*a58d3d2aSXin Li<t>
7550*a58d3d2aSXin LiOpus Custom is an OPTIONAL part of the specification that is defined to
7551*a58d3d2aSXin Lihandle special sample rates and frame rates that are not supported by the
7552*a58d3d2aSXin Limain Opus specification. Use of Opus Custom is discouraged for all but very
7553*a58d3d2aSXin Lispecial applications for which a frame size different from 2.5, 5, 10, or 20&nbsp;ms is
7554*a58d3d2aSXin Lineeded (for either complexity or latency reasons). Because Opus Custom is
7555*a58d3d2aSXin Lioptional, streams encoded using Opus Custom cannot be expected to be decodable by all Opus
7556*a58d3d2aSXin Liimplementations. Also, because no in-band mechanism exists for specifying the sampling
7557*a58d3d2aSXin Lirate and frame size of Opus Custom streams, out-of-band signaling is required.
7558*a58d3d2aSXin LiIn Opus Custom operation, only the CELT layer is available, using the opus_custom_* function
7559*a58d3d2aSXin Licalls in opus_custom.h.
7560*a58d3d2aSXin Li</t>
7561*a58d3d2aSXin Li</section>
7562*a58d3d2aSXin Li
7563*a58d3d2aSXin Li</section>
7564*a58d3d2aSXin Li
7565*a58d3d2aSXin Li<section anchor="security" title="Security Considerations">
7566*a58d3d2aSXin Li
7567*a58d3d2aSXin Li<t>
7568*a58d3d2aSXin LiImplementations of the Opus codec need to take appropriate security considerations
7569*a58d3d2aSXin Liinto account, as outlined in <xref target="DOS"/>.
7570*a58d3d2aSXin LiIt is extremely important for the decoder to be robust against malicious
7571*a58d3d2aSXin Lipayloads.
7572*a58d3d2aSXin LiMalicious payloads must not cause the decoder to overrun its allocated memory
7573*a58d3d2aSXin Li or to take an excessive amount of resources to decode.
7574*a58d3d2aSXin LiAlthough problems
7575*a58d3d2aSXin Liin encoders are typically rarer, the same applies to the encoder. Malicious
7576*a58d3d2aSXin Liaudio streams must not cause the encoder to misbehave because this would
7577*a58d3d2aSXin Liallow an attacker to attack transcoding gateways.
7578*a58d3d2aSXin Li</t>
7579*a58d3d2aSXin Li<t>
The reference implementation contains no known buffer overflows or cases where
7581*a58d3d2aSXin Li a specially crafted packet or audio segment could cause a significant increase
7582*a58d3d2aSXin Li in CPU load.
7583*a58d3d2aSXin LiHowever, on certain CPU architectures where denormalized floating-point
7584*a58d3d2aSXin Li operations are much slower than normal floating-point operations, it is
7585*a58d3d2aSXin Li possible for some audio content (e.g., silence or near-silence) to cause an
7586*a58d3d2aSXin Li increase in CPU load.
7587*a58d3d2aSXin LiDenormals can be introduced by reordering operations in the compiler and depend
7588*a58d3d2aSXin Li on the target architecture, so it is difficult to guarantee that an implementation
7589*a58d3d2aSXin Li avoids them.
7590*a58d3d2aSXin LiFor architectures on which denormals are problematic, adding very small
7591*a58d3d2aSXin Li floating-point offsets to the affected signals to prevent significant numbers
7592*a58d3d2aSXin Li of denormalized operations is RECOMMENDED.
7593*a58d3d2aSXin LiAlternatively, it is often possible to configure the hardware to treat
7594*a58d3d2aSXin Li denormals as zero (DAZ).
7595*a58d3d2aSXin LiNo such issue exists for the fixed-point reference implementation.
7596*a58d3d2aSXin Li</t>
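<t>
The effect of a small offset can be illustrated with the sketch below. It
models a decaying one-pole filter state in double precision, whereas the
floating-point reference implementation uses single precision; the filter,
the coefficient, and the offset value are hypothetical and chosen only to
show the mechanism.
</t>
<figure>
<artwork><![CDATA[
```python
import sys


def decay(steps, offset=0.0, coeff=0.5):
    """Iterate y = coeff*y + offset starting from y = 1.0 (silent input)."""
    y = 1.0
    for _ in range(steps):
        y = coeff * y + offset
    return y


def is_denormal(v):
    """True if v is nonzero but below the smallest normal double."""
    return v != 0.0 and abs(v) < sys.float_info.min
```
]]></artwork>
</figure>
<t>
Without an offset, the state after 1050 steps is 2^-1050, which is a
denormal; with an offset of 1e-30 the state settles near 2e-30 and remains a
normal number.
</t>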
7597*a58d3d2aSXin Li<t>The reference implementation was validated in the following conditions:
7598*a58d3d2aSXin Li<list style="numbers">
7599*a58d3d2aSXin Li<t>
7600*a58d3d2aSXin LiSending the decoder valid packets generated by the reference encoder and
7601*a58d3d2aSXin Li verifying that the decoder's final range coder state matches that of the
7602*a58d3d2aSXin Li encoder.
7603*a58d3d2aSXin Li</t>
7604*a58d3d2aSXin Li<t>
7605*a58d3d2aSXin LiSending the decoder packets generated by the reference encoder and then
7606*a58d3d2aSXin Li subjected to random corruption.
7607*a58d3d2aSXin Li</t>
7608*a58d3d2aSXin Li<t>Sending the decoder random packets.</t>
7609*a58d3d2aSXin Li<t>
7610*a58d3d2aSXin LiSending the decoder packets generated by a version of the reference encoder
7611*a58d3d2aSXin Li modified to make random coding decisions (internal fuzzing), including mode
7612*a58d3d2aSXin Li switching, and verifying that the range coder final states match.
7613*a58d3d2aSXin Li</t>
7614*a58d3d2aSXin Li</list>
7615*a58d3d2aSXin LiIn all of the conditions above, both the encoder and the decoder were run
7616*a58d3d2aSXin Li inside the <xref target="Valgrind">Valgrind</xref> memory
7617*a58d3d2aSXin Li debugger, which tracks reads and writes to invalid memory regions as well as
7618*a58d3d2aSXin Li the use of uninitialized memory.
There were no errors reported in any of the tested conditions.
7620*a58d3d2aSXin Li</t>
7621*a58d3d2aSXin Li</section>
7622*a58d3d2aSXin Li
7623*a58d3d2aSXin Li
7624*a58d3d2aSXin Li<section title="IANA Considerations">
7625*a58d3d2aSXin Li<t>
7626*a58d3d2aSXin LiThis document has no actions for IANA.
7627*a58d3d2aSXin Li</t>
7628*a58d3d2aSXin Li</section>
7629*a58d3d2aSXin Li
7630*a58d3d2aSXin Li<section anchor="Acknowledgements" title="Acknowledgements">
7631*a58d3d2aSXin Li<t>
7632*a58d3d2aSXin LiThanks to all other developers, including Raymond Chen, Soeren Skak Jensen, Gregory Maxwell,
7633*a58d3d2aSXin LiChristopher Montgomery, and Karsten Vandborg Soerensen. We would also
7634*a58d3d2aSXin Lilike to thank Igor Dyakonov, Jan Skoglund, and Christian Hoene for their help with subjective testing of the
7635*a58d3d2aSXin LiOpus codec. Thanks to Ralph Giles, John Ridges, Ben Schwartz, Keith Yan, Christian Hoene, Kat Walsh, and many others on the Opus and CELT mailing lists
7636*a58d3d2aSXin Lifor their bug reports and feedback.
7637*a58d3d2aSXin Li</t>
7638*a58d3d2aSXin Li</section>
7639*a58d3d2aSXin Li
7640*a58d3d2aSXin Li<section title="Copying Conditions">
7641*a58d3d2aSXin Li<t>The authors agree to grant third parties the irrevocable right to copy, use and distribute
7642*a58d3d2aSXin Lithe work (excluding Code Components available under the simplified BSD license), with or
7643*a58d3d2aSXin Liwithout modification, in any medium, without royalty, provided that, unless separate
7644*a58d3d2aSXin Lipermission is granted, redistributed modified works do not contain misleading author, version,
7645*a58d3d2aSXin Liname of work, or endorsement information.</t>
7646*a58d3d2aSXin Li</section>
7647*a58d3d2aSXin Li
7648*a58d3d2aSXin Li</middle>
7649*a58d3d2aSXin Li
7650*a58d3d2aSXin Li<back>
7651*a58d3d2aSXin Li
7652*a58d3d2aSXin Li<references title="Normative References">
7653*a58d3d2aSXin Li
7654*a58d3d2aSXin Li<reference anchor="rfc2119">
7655*a58d3d2aSXin Li<front>
7656*a58d3d2aSXin Li<title>Key words for use in RFCs to Indicate Requirement Levels </title>
7657*a58d3d2aSXin Li<author initials="S." surname="Bradner" fullname="Scott Bradner"></author>
7658*a58d3d2aSXin Li</front>
7659*a58d3d2aSXin Li<seriesInfo name="RFC" value="2119" />
7660*a58d3d2aSXin Li</reference>
7661*a58d3d2aSXin Li
7662*a58d3d2aSXin Li</references>
7663*a58d3d2aSXin Li
7664*a58d3d2aSXin Li<references title="Informative References">
7665*a58d3d2aSXin Li
7666*a58d3d2aSXin Li<reference anchor='requirements'>
7667*a58d3d2aSXin Li<front>
7668*a58d3d2aSXin Li<title>Requirements for an Internet Audio Codec</title>
7669*a58d3d2aSXin Li<author initials='J.-M.' surname='Valin' fullname='J.-M. Valin'>
7670*a58d3d2aSXin Li<organization /></author>
7671*a58d3d2aSXin Li<author initials='K.' surname='Vos' fullname='K. Vos'>
7672*a58d3d2aSXin Li<organization /></author>
7673*a58d3d2aSXin Li<author>
7674*a58d3d2aSXin Li<organization>IETF</organization></author>
7675*a58d3d2aSXin Li<date year='2011' month='August' />
7676*a58d3d2aSXin Li<abstract>
7677*a58d3d2aSXin Li<t>This document provides specific requirements for an Internet audio
7678*a58d3d2aSXin Li   codec.  These requirements address quality, sample rate, bitrate,
7679*a58d3d2aSXin Li   and packet-loss robustness, as well as other desirable properties.
7680*a58d3d2aSXin Li</t></abstract></front>
7681*a58d3d2aSXin Li<seriesInfo name='RFC' value='6366' />
7682*a58d3d2aSXin Li<format type='TXT' target='http://tools.ietf.org/rfc/rfc6366.txt' />
7683*a58d3d2aSXin Li</reference>
7684*a58d3d2aSXin Li
7685*a58d3d2aSXin Li<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?>
7686*a58d3d2aSXin Li<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3533.xml"?>
7687*a58d3d2aSXin Li
7688*a58d3d2aSXin Li<reference anchor='SILK' target='http://developer.skype.com/silk'>
7689*a58d3d2aSXin Li<front>
7690*a58d3d2aSXin Li<title>SILK Speech Codec</title>
7691*a58d3d2aSXin Li<author initials='K.' surname='Vos' fullname='K. Vos'>
7692*a58d3d2aSXin Li<organization /></author>
7693*a58d3d2aSXin Li<author initials='S.' surname='Jensen' fullname='S. Jensen'>
7694*a58d3d2aSXin Li<organization /></author>
7695*a58d3d2aSXin Li<author initials='K.' surname='Soerensen' fullname='K. Soerensen'>
7696*a58d3d2aSXin Li<organization /></author>
7697*a58d3d2aSXin Li<date year='2010' month='March' />
7698*a58d3d2aSXin Li<abstract>
7699*a58d3d2aSXin Li<t></t>
7700*a58d3d2aSXin Li</abstract></front>
7701*a58d3d2aSXin Li<seriesInfo name='Internet-Draft' value='draft-vos-silk-01' />
7702*a58d3d2aSXin Li<format type='TXT' target='http://tools.ietf.org/html/draft-vos-silk-01' />
7703*a58d3d2aSXin Li</reference>
7704*a58d3d2aSXin Li
7705*a58d3d2aSXin Li<reference anchor="laroia-icassp">
7706*a58d3d2aSXin Li<front>
7707*a58d3d2aSXin Li<title abbrev="Robust and Efficient Quantization of Speech LSP">
7708*a58d3d2aSXin LiRobust and Efficient Quantization of Speech LSP Parameters Using Structured Vector Quantization
7709*a58d3d2aSXin Li</title>
7710*a58d3d2aSXin Li<author initials="R.L." surname="Laroia" fullname="R.">
7711*a58d3d2aSXin Li<organization/>
7712*a58d3d2aSXin Li</author>
7713*a58d3d2aSXin Li<author initials="N.P." surname="Phamdo" fullname="N.">
7714*a58d3d2aSXin Li<organization/>
7715*a58d3d2aSXin Li</author>
7716*a58d3d2aSXin Li<author initials="N.F." surname="Farvardin" fullname="N.">
7717*a58d3d2aSXin Li<organization/>
7718*a58d3d2aSXin Li</author>
7719*a58d3d2aSXin Li</front>
7720*a58d3d2aSXin Li<seriesInfo name="ICASSP-1991, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 641-644, October" value="1991"/>
7721*a58d3d2aSXin Li</reference>
7722*a58d3d2aSXin Li
7723*a58d3d2aSXin Li<reference anchor='CELT' target='http://celt-codec.org/'>
7724*a58d3d2aSXin Li<front>
7725*a58d3d2aSXin Li<title>Constrained-Energy Lapped Transform (CELT) Codec</title>
7726*a58d3d2aSXin Li<author initials='J-M.' surname='Valin' fullname='J-M. Valin'>
7727*a58d3d2aSXin Li<organization /></author>
7728*a58d3d2aSXin Li<author initials='T&#x2E;B.' surname='Terriberry' fullname='Timothy B. Terriberry'>
7729*a58d3d2aSXin Li<organization /></author>
7730*a58d3d2aSXin Li<author initials='G.' surname='Maxwell' fullname='G. Maxwell'>
7731*a58d3d2aSXin Li<organization /></author>
7732*a58d3d2aSXin Li<author initials='C.' surname='Montgomery' fullname='C. Montgomery'>
7733*a58d3d2aSXin Li<organization /></author>
7734*a58d3d2aSXin Li<date year='2010' month='July' />
7735*a58d3d2aSXin Li<abstract>
7736*a58d3d2aSXin Li<t></t>
7737*a58d3d2aSXin Li</abstract></front>
7738*a58d3d2aSXin Li<seriesInfo name='Internet-Draft' value='draft-valin-celt-codec-02' />
7739*a58d3d2aSXin Li<format type='TXT' target='http://tools.ietf.org/html/draft-valin-celt-codec-02' />
7740*a58d3d2aSXin Li</reference>
7741*a58d3d2aSXin Li
7742*a58d3d2aSXin Li<reference anchor='SRTP-VBR'>
7743*a58d3d2aSXin Li<front>
7744*a58d3d2aSXin Li<title>Guidelines for the use of Variable Bit Rate Audio with Secure RTP</title>
<author initials='C.' surname='Perkins' fullname='C. Perkins'>
7746*a58d3d2aSXin Li<organization /></author>
7747*a58d3d2aSXin Li<author initials='J.M.' surname='Valin' fullname='J.M. Valin'>
7748*a58d3d2aSXin Li<organization /></author>
7749*a58d3d2aSXin Li<date year='2011' month='July' />
7750*a58d3d2aSXin Li<abstract>
7751*a58d3d2aSXin Li<t></t>
7752*a58d3d2aSXin Li</abstract></front>
7753*a58d3d2aSXin Li<seriesInfo name='RFC' value='6562' />
7754*a58d3d2aSXin Li<format type='TXT' target='http://tools.ietf.org/html/rfc6562' />
7755*a58d3d2aSXin Li</reference>
7756*a58d3d2aSXin Li
7757*a58d3d2aSXin Li<reference anchor='DOS'>
7758*a58d3d2aSXin Li<front>
7759*a58d3d2aSXin Li<title>Internet Denial-of-Service Considerations</title>
7760*a58d3d2aSXin Li<author initials='M.' surname='Handley' fullname='M. Handley'>
7761*a58d3d2aSXin Li<organization /></author>
7762*a58d3d2aSXin Li<author initials='E.' surname='Rescorla' fullname='E. Rescorla'>
7763*a58d3d2aSXin Li<organization /></author>
7764*a58d3d2aSXin Li<author>
7765*a58d3d2aSXin Li<organization>IAB</organization></author>
7766*a58d3d2aSXin Li<date year='2006' month='December' />
7767*a58d3d2aSXin Li<abstract>
7768*a58d3d2aSXin Li<t>This document provides an overview of possible avenues for denial-of-service (DoS) attack on Internet systems.  The aim is to encourage protocol designers and network engineers towards designs that are more robust.  We discuss partial solutions that reduce the effectiveness of attacks, and how some solutions might inadvertently open up alternative vulnerabilities.  This memo provides information for the Internet community.</t></abstract></front>
7769*a58d3d2aSXin Li<seriesInfo name='RFC' value='4732' />
7770*a58d3d2aSXin Li<format type='TXT' octets='91844' target='ftp://ftp.isi.edu/in-notes/rfc4732.txt' />
7771*a58d3d2aSXin Li</reference>
7772*a58d3d2aSXin Li
7773*a58d3d2aSXin Li<reference anchor="Martin79">
7774*a58d3d2aSXin Li<front>
7775*a58d3d2aSXin Li<title>Range encoding: An algorithm for removing redundancy from a digitised message</title>
7776*a58d3d2aSXin Li<author initials="G.N.N." surname="Martin" fullname="G. Nigel N. Martin"><organization/></author>
7777*a58d3d2aSXin Li<date year="1979" />
7778*a58d3d2aSXin Li</front>
7779*a58d3d2aSXin Li<seriesInfo name="Proc. Institution of Electronic and Radio Engineers International Conference on Video and Data Recording" value="" />
7780*a58d3d2aSXin Li</reference>
7781*a58d3d2aSXin Li
7782*a58d3d2aSXin Li<reference anchor="coding-thesis">
7783*a58d3d2aSXin Li<front>
7784*a58d3d2aSXin Li<title>Source coding algorithms for fast data compression</title>
7785*a58d3d2aSXin Li<author initials="R." surname="Pasco" fullname=""><organization/></author>
7786*a58d3d2aSXin Li<date month="May" year="1976" />
7787*a58d3d2aSXin Li</front>
7788*a58d3d2aSXin Li<seriesInfo name="Ph.D. thesis" value="Dept. of Electrical Engineering, Stanford University" />
7789*a58d3d2aSXin Li</reference>
7790*a58d3d2aSXin Li
7791*a58d3d2aSXin Li<reference anchor="PVQ">
7792*a58d3d2aSXin Li<front>
7793*a58d3d2aSXin Li<title>A Pyramid Vector Quantizer</title>
7794*a58d3d2aSXin Li<author initials="T." surname="Fischer" fullname=""><organization/></author>
7795*a58d3d2aSXin Li<date month="July" year="1986" />
7796*a58d3d2aSXin Li</front>
7797*a58d3d2aSXin Li<seriesInfo name="IEEE Trans. on Information Theory, Vol. 32" value="pp. 568-583" />
7798*a58d3d2aSXin Li</reference>
7799*a58d3d2aSXin Li
7800*a58d3d2aSXin Li<reference anchor="Kabal86">
7801*a58d3d2aSXin Li<front>
7802*a58d3d2aSXin Li<title>The Computation of Line Spectral Frequencies Using Chebyshev Polynomials</title>
7803*a58d3d2aSXin Li<author initials="P." surname="Kabal" fullname="P. Kabal"><organization/></author>
7804*a58d3d2aSXin Li<author initials="R." surname="Ramachandran" fullname="R. P. Ramachandran"><organization/></author>
7805*a58d3d2aSXin Li<date month="December" year="1986" />
7806*a58d3d2aSXin Li</front>
7807*a58d3d2aSXin Li<seriesInfo name="IEEE Trans. Acoustics, Speech, Signal Processing, vol. 34, no. 6" value="pp. 1419-1426" />
7808*a58d3d2aSXin Li</reference>
7809*a58d3d2aSXin Li
7810*a58d3d2aSXin Li
7811*a58d3d2aSXin Li<reference anchor="Valgrind" target="http://valgrind.org/">
7812*a58d3d2aSXin Li<front>
7813*a58d3d2aSXin Li<title>Valgrind website</title>
7814*a58d3d2aSXin Li<author></author>
7815*a58d3d2aSXin Li</front>
7816*a58d3d2aSXin Li</reference>
7817*a58d3d2aSXin Li
7818*a58d3d2aSXin Li<reference anchor="Google-NetEQ" target="http://code.google.com/p/webrtc/source/browse/trunk/src/modules/audio_coding/NetEQ/main/source/?r=583">
7819*a58d3d2aSXin Li<front>
7820*a58d3d2aSXin Li<title>Google NetEQ code</title>
7821*a58d3d2aSXin Li<author></author>
7822*a58d3d2aSXin Li</front>
7823*a58d3d2aSXin Li</reference>
7824*a58d3d2aSXin Li
7825*a58d3d2aSXin Li<reference anchor="Google-WebRTC" target="http://code.google.com/p/webrtc/">
7826*a58d3d2aSXin Li<front>
7827*a58d3d2aSXin Li<title>Google WebRTC code</title>
7828*a58d3d2aSXin Li<author></author>
7829*a58d3d2aSXin Li</front>
7830*a58d3d2aSXin Li</reference>
7831*a58d3d2aSXin Li
7832*a58d3d2aSXin Li
7833*a58d3d2aSXin Li<reference anchor="Opus-git" target="git://git.xiph.org/opus.git">
7834*a58d3d2aSXin Li<front>
7835*a58d3d2aSXin Li<title>Opus Git Repository</title>
7836*a58d3d2aSXin Li<author></author>
7837*a58d3d2aSXin Li</front>
7838*a58d3d2aSXin Li</reference>
7839*a58d3d2aSXin Li
7840*a58d3d2aSXin Li<reference anchor="Opus-website" target="http://opus-codec.org/">
7841*a58d3d2aSXin Li<front>
7842*a58d3d2aSXin Li<title>Opus website</title>
7843*a58d3d2aSXin Li<author></author>
7844*a58d3d2aSXin Li</front>
7845*a58d3d2aSXin Li</reference>
7846*a58d3d2aSXin Li
7847*a58d3d2aSXin Li<reference anchor="Vorbis-website" target="http://xiph.org/vorbis/">
7848*a58d3d2aSXin Li<front>
7849*a58d3d2aSXin Li<title>Vorbis website</title>
7850*a58d3d2aSXin Li<author></author>
7851*a58d3d2aSXin Li</front>
7852*a58d3d2aSXin Li</reference>
7853*a58d3d2aSXin Li
7854*a58d3d2aSXin Li<reference anchor="Matroska-website" target="http://matroska.org/">
7855*a58d3d2aSXin Li<front>
7856*a58d3d2aSXin Li<title>Matroska website</title>
7857*a58d3d2aSXin Li<author></author>
7858*a58d3d2aSXin Li</front>
7859*a58d3d2aSXin Li</reference>
7860*a58d3d2aSXin Li
7861*a58d3d2aSXin Li<reference anchor="Vectors-website" target="http://opus-codec.org/testvectors/">
7862*a58d3d2aSXin Li<front>
<title>Opus Testvectors (website)</title>
7864*a58d3d2aSXin Li<author></author>
7865*a58d3d2aSXin Li</front>
7866*a58d3d2aSXin Li</reference>
7867*a58d3d2aSXin Li
7868*a58d3d2aSXin Li<reference anchor="Vectors-proc" target="http://www.ietf.org/proceedings/83/slides/slides-83-codec-0.gz">
7869*a58d3d2aSXin Li<front>
7870*a58d3d2aSXin Li<title>Opus Testvectors (proceedings)</title>
7871*a58d3d2aSXin Li<author></author>
7872*a58d3d2aSXin Li</front>
7873*a58d3d2aSXin Li</reference>
7874*a58d3d2aSXin Li
<reference anchor="line-spectral-pairs" target="http://en.wikipedia.org/wiki/Line_spectral_pairs">
<front>
<title>Line Spectral Pairs</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="range-coding" target="http://en.wikipedia.org/wiki/Range_coding">
<front>
<title>Range Coding</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="Hadamard" target="http://en.wikipedia.org/wiki/Hadamard_transform">
<front>
<title>Hadamard Transform</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="Viterbi" target="http://en.wikipedia.org/wiki/Viterbi_algorithm">
<front>
<title>Viterbi Algorithm</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="Whitening" target="http://en.wikipedia.org/wiki/White_noise">
<front>
<title>White Noise</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="LPC" target="http://en.wikipedia.org/wiki/Linear_prediction">
<front>
<title>Linear Prediction</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="MDCT" target="http://en.wikipedia.org/wiki/Modified_discrete_cosine_transform">
<front>
<title>Modified Discrete Cosine Transform</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="FFT" target="http://en.wikipedia.org/wiki/Fast_Fourier_transform">
<front>
<title>Fast Fourier Transform</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="z-transform" target="http://en.wikipedia.org/wiki/Z-transform">
<front>
<title>Z-transform</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>


<reference anchor="Burg">
<front>
<title>Maximum Entropy Spectral Analysis</title>
<author initials="JP." surname="Burg" fullname="J.P. Burg"><organization/></author>
</front>
</reference>

<reference anchor="Schur">
<front>
<title>A fixed point computation of partial correlation coefficients</title>
<author initials="J." surname="Le Roux" fullname="J. Le Roux"><organization/></author>
<author initials="C." surname="Gueguen" fullname="C. Gueguen"><organization/></author>
</front>
<seriesInfo name="ICASSP-1977, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 257-259, October" value="1977"/>
</reference>

<reference anchor="Princen86">
<front>
<title>Analysis/synthesis filter bank design based on time domain aliasing cancellation</title>
<author initials="J." surname="Princen" fullname="John P. Princen"><organization/></author>
<author initials="A." surname="Bradley" fullname="Alan B. Bradley"><organization/></author>
</front>
<seriesInfo name="IEEE Trans. Acoust. Speech Sig. Proc. ASSP-34 (5), 1153-1161" value="1986"/>
</reference>

<reference anchor="Valin2010">
<front>
<title>A High-Quality Speech and Audio Codec With Less Than 10 ms Delay</title>
<author initials="JM" surname="Valin" fullname="Jean-Marc Valin"><organization/>
</author>
<author initials="T. B." surname="Terriberry" fullname="Timothy Terriberry"><organization/></author>
<author initials="C." surname="Montgomery" fullname="Christopher Montgomery"><organization/></author>
<author initials="G." surname="Maxwell" fullname="Gregory Maxwell"><organization/></author>
</front>
<seriesInfo name="IEEE Trans. on Audio, Speech and Language Processing, Vol. 18, No. 1, pp. 58-67" value="2010" />
</reference>


<reference anchor="Zwicker61">
<front>
<title>Subdivision of the audible frequency range into critical bands</title>
<author initials="E." surname="Zwicker" fullname="E. Zwicker"><organization/></author>
<date month="February" year="1961" />
</front>
<seriesInfo name="The Journal of the Acoustical Society of America, Vol. 33, No. 2" value="p. 248" />
</reference>


</references>

<section anchor="ref-implementation" title="Reference Implementation">

<t>This appendix contains the complete source code for the
reference implementation of the Opus codec written in C. By default,
this implementation relies on floating-point arithmetic, but it can be
compiled to use only fixed-point arithmetic by defining the FIXED_POINT
macro. Information on building and using the reference implementation is
available in the README file.
</t>

<t>The implementation can be compiled with either a C89 or a C99
compiler. It is reasonably optimized for most platforms such that
only architecture-specific optimizations are likely to be useful.
The FFT <xref target="FFT"/> used is a slightly modified version of the KISS-FFT library,
but it is easy to substitute any other FFT library.
</t>

<t>
While the reference implementation does not rely on any
<spanx style="emph">undefined behavior</spanx> as defined by C89 or C99,
it relies on common <spanx style="emph">implementation-defined behavior</spanx>
for two's complement architectures:
<list style="symbols">
<t>Right shifts of negative values are consistent with two's complement arithmetic, so that a>>b is equivalent to floor(a/(2**b)),</t>
<t>For conversion to a signed integer of N bits, the value is reduced modulo 2**N to be within range of the type,</t>
<t>The result of integer division of a negative value is truncated towards zero, and</t>
<t>The compiler provides a 64-bit integer type (a C99 requirement which is supported by most C89 compilers).</t>
</list>
</t>
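
<t>
The four assumptions listed above can be probed at build or start-up time.
The following is a minimal sketch and is not part of the reference
implementation: the function name check_platform_assumptions() is
hypothetical, and the conversion check additionally assumes an 8-bit char.
It returns 1 only when all four checks pass on the platform it is compiled for.
</t>
<figure>
<artwork><![CDATA[
```c
#include <limits.h>

/* Hypothetical helper, not part of the reference implementation.
   Returns 1 if the platform shows the implementation-defined
   behavior listed above, and 0 otherwise. */
int check_platform_assumptions(void)
{
    /* volatile keeps the operands from being treated as constants */
    volatile int neg = -5;
    volatile int wide = 456;   /* 456 = 256 + 200 */
    volatile int num = -7;
    int ok = 1;

    /* Right shift of a negative value: floor(-5/2) = -3 */
    ok = ok && ((neg >> 1) == -3);

    /* Conversion to an N-bit signed type reduces modulo 2**N.
       Assumes CHAR_BIT == 8: 456 mod 256 = 200, which is -56. */
    ok = ok && (CHAR_BIT == 8) && ((signed char)wide == -56);

    /* Division of a negative value truncates towards zero */
    ok = ok && (num / 2 == -3) && (num % 2 == -1);

    /* A 64-bit integer type is available */
    ok = ok && (sizeof(long long) * CHAR_BIT >= 64);

    return ok;
}
```
]]></artwork>
</figure>
<t>
A port to a new platform could call such a function from a unit test and
treat a return value of 0 as a sign that the code needs closer review there.
</t>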

<t>
In its current form, the reference implementation also requires the following
architectural characteristics to obtain acceptable performance:
<list style="symbols">
<t>Two's complement arithmetic,</t>
<t>At least a 16-bit by 16-bit integer multiplier (32-bit result), and</t>
<t>At least a 32-bit adder/accumulator.</t>
</list>
</t>


<section title="Extracting the source">
<t>
The complete source code can be extracted from this draft by running the
following commands:

<list style="symbols">
<t><![CDATA[
cat draft-ietf-codec-opus.txt | grep '^\ \ \ ###' | sed -e 's/...###//' | base64 -d > opus_source.tar.gz
]]></t>
<t>
tar xzvf opus_source.tar.gz
</t>
<t>cd opus_source</t>
<t>make</t>
</list>
On systems where the provided Makefile does not work, the following command line may be used to compile
the source code:
<list style="symbols">
<t><![CDATA[
cc -O2 -g -o opus_demo src/opus_demo.c `cat *.mk | grep -v fixed | sed -e 's/.*=//' -e 's/\\\\//'` -DOPUS_BUILD -Iinclude -Icelt -Isilk -Isilk/float -DUSE_ALLOCA -Drestrict= -lm
]]></t></list>
</t>

<t>
On systems where the base64 utility is not present, the following commands can be used instead:
<list style="symbols">
<t><![CDATA[
cat draft-ietf-codec-opus.txt | grep '^\ \ \ ###' | sed -e 's/...###//' > opus.b64
]]></t>
<t>openssl base64 -d -in opus.b64 > opus_source.tar.gz</t>
</list>

</t>
</section>

<section title="Up-to-date Implementation">
<t>
As of the time of publication of this memo, an up-to-date implementation conforming to
this standard is available in a
 <xref target='Opus-git'>Git repository</xref>.
Releases and other resources are available at
 <xref target='Opus-website'/>. However, although that implementation is expected to
 remain conformant with the standard, it is the code in this document that shall
 remain normative.
</t>
</section>

<section title="Base64-encoded Source Code">
<t>
<?rfc include="opus_source.base64"?>
</t>
</section>

<section anchor="test-vectors" title="Test Vectors">
<t>
Because of size constraints, the Opus test vectors are not distributed in this
draft. They are available in the proceedings of the 83rd IETF meeting (Paris) <xref target="Vectors-proc"/> and from the Opus codec website at
<xref target="Vectors-website"/>. These test vectors were created specifically to exercise
all aspects of the decoder and therefore the audio quality of the decoded output is
significantly lower than what Opus can achieve in normal operation.
</t>

<t>
The SHA1 hashes of the files in the test vector package are:
<?rfc include="testvectors_sha1"?>
</t>

</section>

</section>

<section anchor="self-delimiting-framing" title="Self-Delimiting Framing">
<t>
To use the internal framing described in <xref target="modes"/>, the decoder
 must know the total length of the Opus packet, in bytes.
This section describes a simple variation of that framing which can be used
 when the total length of the packet is not known.
Nothing in the encoding of the packet itself allows a decoder to distinguish
 between the regular, undelimited framing and the self-delimiting framing
 described in this appendix.
Which one is used and where must be established by context at the transport
 layer.
It is RECOMMENDED that a transport layer choose exactly one framing scheme,
 rather than allowing an encoder to signal which one it wants to use.
</t>

<t>
For example, although a regular Opus stream does not support more than two
 channels, a multi-channel Opus stream may be formed from several one- and
 two-channel streams.
To pack an Opus packet from each of these streams together in a single packet
 at the transport layer, one could use the self-delimiting framing for all but
 the last stream, and then the regular, undelimited framing for the last one.
Reverting to the undelimited framing for the last stream saves overhead
 (because the total size of the transport-layer packet will still be known),
 and ensures that a "multi-channel" stream which only has a single Opus stream
 uses the same framing as a regular Opus stream does.
This avoids the need for signaling to distinguish these two cases.
</t>

<t>
The self-delimiting framing is identical to the regular, undelimited framing
 from <xref target="modes"/>, except that each Opus packet contains one extra
 length field, encoded using the same one- or two-byte scheme from
 <xref target="frame-length-coding"/>.
This extra length immediately precedes the compressed data of the first Opus
 frame in the packet, and is interpreted in the various modes as follows:
<list style="symbols">
<t>
Code&nbsp;0 packets: It is the length of the single Opus frame (see
 <xref target="sd_code0_packet"/>).
</t>
<t>
Code&nbsp;1 packets: It is the length used for both of the Opus frames (see
 <xref target="sd_code1_packet"/>).
</t>
<t>
Code&nbsp;2 packets: It is the length of the second Opus frame (see
 <xref target="sd_code2_packet"/>).</t>
<t>
CBR Code&nbsp;3 packets: It is the length used for all of the Opus frames (see
 <xref target="sd_code3cbr_packet"/>).
</t>
<t>VBR Code&nbsp;3 packets: It is the length of the last Opus frame (see
 <xref target="sd_code3vbr_packet"/>).
</t>
</list>
</t>
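
<t>
The interpretation above can be illustrated with a small parser that works
out where a self-delimited packet ends. This is a minimal sketch under stated
assumptions, not the reference decoder: the names read_len() and
sd_packet_size() are hypothetical, only code 0, 1, and 2 packets are handled,
and the length decoding follows the one- or two-byte scheme (a first byte
below 252 is the length itself; otherwise the length is four times the second
byte plus the first byte).
</t>
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>

/* Hypothetical helper: decodes the one- or two-byte length field.
   Returns the number of bytes consumed (1 or 2), or 0 if the
   buffer is too short. */
static size_t read_len(const unsigned char *p, size_t avail,
                       size_t *len)
{
    if (avail < 1) return 0;
    if (p[0] < 252) { *len = p[0]; return 1; }
    if (avail < 2) return 0;
    *len = 4 * (size_t)p[1] + p[0];
    return 2;
}

/* Hypothetical helper: returns the total size in bytes of the
   self-delimited packet starting at buf, or 0 if it cannot be
   determined from the avail bytes given. Code 3 packets are
   not covered by this sketch and also yield 0. */
size_t sd_packet_size(const unsigned char *buf, size_t avail)
{
    size_t pos = 1, n1 = 0, n2 = 0, used, payload;
    int code;

    if (avail < 1) return 0;
    code = buf[0] & 3;          /* two low bits of the TOC byte */
    if (code == 3) return 0;

    used = read_len(buf + pos, avail - pos, &n1);
    if (used == 0) return 0;
    pos += used;

    if (code == 0) {
        payload = n1;           /* one frame of N1 bytes */
    } else if (code == 1) {
        payload = 2 * n1;       /* two frames of N1 bytes each */
    } else {
        used = read_len(buf + pos, avail - pos, &n2);
        if (used == 0) return 0;
        pos += used;
        payload = n1 + n2;      /* frames of N1 and N2 bytes */
    }
    return (payload <= avail - pos) ? pos + payload : 0;
}
```
]]></artwork>
</figure>
<t>
A transport that concatenates several self-delimited packets could call such
a function repeatedly, advancing by the returned size each time, and hand
whatever remains to a decoder that expects the regular framing.
</t>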

<figure anchor="sd_code0_packet" title="A Self-Delimited Code 0 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|0|0| N1 (1-2 bytes):                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|               Compressed frame 1 (N1 bytes)...                :
:                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<figure anchor="sd_code1_packet" title="A Self-Delimited Code 1 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|0|1| N1 (1-2 bytes):                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
|               Compressed frame 1 (N1 bytes)...                |
:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
|               Compressed frame 2 (N1 bytes)...                |
:                                               +-+-+-+-+-+-+-+-+
|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<figure anchor="sd_code2_packet" title="A Self-Delimited Code 2 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|0| N1 (1-2 bytes): N2 (1-2 bytes):               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :
|               Compressed frame 1 (N1 bytes)...                |
:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|               Compressed frame 2 (N2 bytes)...                :
:                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<figure anchor="sd_code3cbr_packet" title="A Self-Delimited CBR Code 3 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|1|0|p|     M     | Pad len (Opt) : N1 (1-2 bytes):
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 1 (N1 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 2 (N1 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                              ...                              :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame M (N1 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                  Opus Padding (Optional)...                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<figure anchor="sd_code3vbr_packet" title="A Self-Delimited VBR Code 3 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|1|1|p|     M     | Padding length (Optional)     :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: N1 (1-2 bytes):     ...       :     N[M-1]    |     N[M]      :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 1 (N1 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 2 (N2 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                              ...                              :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:              Compressed frame M (N[M] bytes)...               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                  Opus Padding (Optional)...                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
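
<t>
Returning to the multi-channel example earlier in this appendix, the
following minimal sketch packs one packet from each of two streams into a
single buffer: the first in self-delimiting framing, the last in regular,
undelimited framing. It is an illustration under stated assumptions, not part
of the reference implementation: the names write_len(), sd_wrap_code0(), and
pack_two_streams() are hypothetical, and only code 0 input packets (a TOC
byte followed by one frame) are handled.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes len (at most 1275) using the
   one- or two-byte scheme. Returns the bytes written. */
static size_t write_len(unsigned char *out, size_t len)
{
    if (len < 252) { out[0] = (unsigned char)len; return 1; }
    out[0] = (unsigned char)(252 + (len & 3));
    out[1] = (unsigned char)((len - out[0]) / 4);
    return 2;
}

/* Hypothetical helper: converts a regular code 0 packet into
   its self-delimited form by inserting the frame length after
   the TOC byte. Returns the bytes written, or 0 on error. */
size_t sd_wrap_code0(const unsigned char *pkt, size_t pkt_len,
                     unsigned char *out, size_t cap)
{
    unsigned char lenbuf[2];
    size_t frame_len, n;

    if (pkt_len < 1 || (pkt[0] & 3) != 0) return 0;
    frame_len = pkt_len - 1;
    if (frame_len > 1275) return 0;
    n = write_len(lenbuf, frame_len);
    if (cap < 1 + n + frame_len) return 0;
    out[0] = pkt[0];                         /* TOC byte */
    memcpy(out + 1, lenbuf, n);              /* extra length */
    memcpy(out + 1 + n, pkt + 1, frame_len); /* frame data */
    return 1 + n + frame_len;
}

/* Hypothetical helper: packet a is self-delimited, packet b
   is copied unchanged because the transport knows the total
   size. Returns the bytes written, or 0 on error. */
size_t pack_two_streams(const unsigned char *a, size_t a_len,
                        const unsigned char *b, size_t b_len,
                        unsigned char *out, size_t cap)
{
    size_t n = sd_wrap_code0(a, a_len, out, cap);
    if (n == 0 || cap - n < b_len) return 0;
    memcpy(out + n, b, b_len);
    return n + b_len;
}
```
]]></artwork>
</figure>
<t>
Because the last packet is left untouched, a buffer built this way from a
single stream is byte-for-byte the regular Opus packet, which is the property
the example in this appendix relies on to avoid extra signaling.
</t>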

</section>

</back>

</rfc>