<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
<?rfc toc="yes" symrefs="yes" ?>

<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-opus-14">

<front>
<title abbrev="Interactive Audio Codec">Definition of the Opus Audio Codec</title>


<author initials="JM" surname="Valin" fullname="Jean-Marc Valin">
<organization>Mozilla Corporation</organization>
<address>
<postal>
<street>650 Castro Street</street>
<city>Mountain View</city>
<region>CA</region>
<code>94041</code>
<country>USA</country>
</postal>
<phone>+1 650 903-0800</phone>
<email>[email protected]</email>
</address>
</author>

<author initials="K." surname="Vos" fullname="Koen Vos">
<organization>Skype Technologies S.A.</organization>
<address>
<postal>
<street>Soder Malarstrand 43</street>
<city>Stockholm</city>
<region></region>
<code>11825</code>
<country>SE</country>
</postal>
<phone>+46 73 085 7619</phone>
<email>[email protected]</email>
</address>
</author>

<author initials="T." surname="Terriberry" fullname="Timothy B.
Terriberry">
<organization>Mozilla Corporation</organization>
<address>
<postal>
<street>650 Castro Street</street>
<city>Mountain View</city>
<region>CA</region>
<code>94041</code>
<country>USA</country>
</postal>
<phone>+1 650 903-0800</phone>
<email>[email protected]</email>
</address>
</author>

<date day="17" month="May" year="2012" />

<area>General</area>

<workgroup></workgroup>

<abstract>
<t>
This document defines the Opus interactive speech and audio codec.
Opus is designed to handle a wide range of interactive audio applications,
 including Voice over IP, videoconferencing, in-game chat, and even live,
 distributed music performances.
It scales from low bitrate narrowband speech at 6 kb/s to very high quality
 stereo music at 510 kb/s.
Opus uses both linear prediction (LP) and the Modified Discrete Cosine
 Transform (MDCT) to achieve good compression of both speech and music.
</t>
</abstract>
</front>

<middle>

<section anchor="introduction" title="Introduction">
<t>
The Opus codec is a real-time interactive audio codec designed to meet the requirements
described in <xref target="requirements"></xref>.
It is composed of a linear
 prediction (LP)-based <xref target="LPC"/> layer and a Modified Discrete Cosine Transform
 (MDCT)-based <xref target="MDCT"/> layer.
The main idea behind using two layers is that in speech, linear prediction
 techniques (such as Code-Excited Linear Prediction, or CELP) code low frequencies more efficiently than transform
 (e.g., MDCT) domain techniques, while the situation is reversed for music and
 higher speech frequencies.
Thus, a codec with both layers available can operate over a wider range than
 either one alone and, by combining them, achieve better quality than either
 one individually.
</t>

<t>
The primary normative part of this specification is provided by the source code
 in <xref target="ref-implementation"></xref>.
Only the decoder portion of this software is normative, though a
 significant amount of code is shared by both the encoder and decoder.
<xref target="conformance"/> provides a decoder conformance test.
The decoder contains a great deal of integer and fixed-point arithmetic that
 needs to be performed exactly, including all rounding considerations, so any
 useful specification requires domain-specific symbolic language to adequately
 define these operations.
Additionally, any
conflict between the symbolic representation and the included reference
implementation must be resolved. For the practical reasons of compatibility and
testability, it would be advantageous to give the reference implementation
priority in any disagreement.
The C language is also one of the most
widely understood human-readable symbolic representations for machine
behavior.
For these reasons, this RFC uses the reference implementation as the sole
 symbolic representation of the codec.
</t>

<t>While the symbolic representation is unambiguous and complete, it is not
always the easiest way to understand the codec's operation. For this reason,
this document also describes significant parts of the codec in English and
takes the opportunity to explain the rationale behind many of the more
surprising elements of the design. These descriptions are intended to be
accurate and informative, but the limitations of common English sometimes
result in ambiguity, so it is expected that the reader will always read
them alongside the symbolic representation. Numerous references to the
implementation are provided for this purpose. The descriptions sometimes
differ from the reference in ordering or through mathematical simplification
wherever such deviation makes an explanation easier to understand.
For example, the right shift and left shift operations in the reference
implementation are often described using division and multiplication in the text.
In general, the text is focused on the "what" and "why" while the symbolic
representation most clearly provides the "how".
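<figure>
<preamble>
As a minimal sketch of the correspondence between shifts and arithmetic mentioned above,
the following C fragment expresses multiplication and division by a power of two as shifts.
The helper names mul_pow2 and div_pow2 are hypothetical and are not taken from the reference
implementation; the sketch assumes non-negative operands and results that fit in 32 bits.
</preamble>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Hypothetical helper: x * 2**n written as a left shift.
 * Assumes x >= 0 and that the result fits in an int32_t. */
static int32_t mul_pow2(int32_t x, unsigned n)
{
    return x << n;
}

/* Hypothetical helper: floor(x / 2**n) written as a right shift.
 * Assumes x >= 0; for negative x the result of >> is
 * implementation-defined in C and may differ from x / (1 << n),
 * which truncates toward zero. */
static int32_t div_pow2(int32_t x, unsigned n)
{
    return x >> n;
}
```
]]></artwork>
<postamble>
For instance, mul_pow2(5, 3) yields 40 and div_pow2(41, 3) yields 5, matching 5*8 and 41/8 in integer arithmetic.
</postamble>
</figure>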
</t>

<section anchor="notation" title="Notation and Conventions">
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
 "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
 interpreted as described in RFC 2119 <xref target="rfc2119"></xref>.
</t>
<t>
Various operations in the codec require bit-exact fixed-point behavior, even
 when writing a floating-point implementation.
The notation "Q&lt;n&gt;", where n is an integer, denotes the number of binary
 digits to the right of the binary point in a fixed-point number.
For example, a signed Q14 value in a 16-bit word can represent values from
 -2.0 to 1.99993896484375, inclusive.
This notation is for informational purposes only.
Arithmetic, when described, always operates on the underlying integer.
For example, the text will explicitly indicate any shifts required after a
 multiplication.
</t>
<t>
Expressions, where included in the text, follow C operator rules and
 precedence, with the exception that the syntax "x**y" indicates x raised to
 the power y.
The text also makes use of the following functions:
</t>

<section anchor="min" toc="exclude" title="min(x,y)">
<t>
The smaller of two values x and y.
</t>
</section>

<section anchor="max" toc="exclude" title="max(x,y)">
<t>
The larger of two values x and y.
</t>
</section>

<section anchor="clamp" toc="exclude" title="clamp(lo,x,hi)">
<figure align="center">
<artwork align="center"><![CDATA[
clamp(lo,x,hi) = max(lo,min(x,hi))
]]></artwork>
</figure>
<t>
With this definition, if lo &gt; hi, the lower bound is the one that
 is enforced.
</t>
</section>

<section anchor="sign" toc="exclude" title="sign(x)">
<t>
The sign of x, i.e.,
<figure align="center">
<artwork align="center"><![CDATA[
          ( -1,  x < 0 ,
sign(x) = <  0,  x == 0 ,
          (  1,  x > 0 .
]]></artwork>
</figure>
</t>
</section>

<section anchor="abs" toc="exclude" title="abs(x)">
<t>
The absolute value of x, i.e.,
<figure align="center">
<artwork align="center"><![CDATA[
abs(x) = sign(x)*x .
]]></artwork>
</figure>
</t>
</section>

<section anchor="floor" toc="exclude" title="floor(f)">
<t>
The largest integer z such that z &lt;= f.
</t>
</section>

<section anchor="ceil" toc="exclude" title="ceil(f)">
<t>
The smallest integer z such that z &gt;= f.
</t>
</section>

<section anchor="round" toc="exclude" title="round(f)">
<t>
The integer z nearest to f, with ties rounded towards negative infinity,
 i.e.,
<figure align="center">
<artwork align="center"><![CDATA[
 round(f) = ceil(f - 0.5) .
]]></artwork>
</figure>
</t>
</section>

<section anchor="log2" toc="exclude" title="log2(f)">
<t>
The base-two logarithm of f.
</t>
</section>

<section anchor="ilog" toc="exclude" title="ilog(n)">
<t>
The minimum number of bits required to store a positive integer n in
 binary, or 0 for a non-positive integer n.
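<figure>
<preamble>
One way to compute this quantity with integer operations only is sketched below in C.
The function name ilog and its long argument type are illustrative assumptions; this is not
the routine used by the reference implementation.
</preamble>
<artwork><![CDATA[
```c
/* Hypothetical helper: number of bits needed to store a positive
 * integer n, or 0 when n is zero or negative. */
static int ilog(long n)
{
    int bits = 0;

    /* Each iteration strips one low-order bit; the loop body never
     * runs for n <= 0, so those inputs return 0. */
    while (n > 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}
```
]]></artwork>
<postamble>
The loop counts how many times n can be halved before reaching zero, which equals floor(log2(n))+1 for positive n.
</postamble>
</figure>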
<figure align="center">
<artwork align="center"><![CDATA[
          ( 0,                 n <= 0,
ilog(n) = <
          ( floor(log2(n))+1,  n > 0
]]></artwork>
</figure>
Examples:
<list style="symbols">
<t>ilog(-1) = 0</t>
<t>ilog(0) = 0</t>
<t>ilog(1) = 1</t>
<t>ilog(2) = 2</t>
<t>ilog(3) = 2</t>
<t>ilog(4) = 3</t>
<t>ilog(7) = 3</t>
</list>
</t>
</section>

</section>

</section>

<section anchor="overview" title="Opus Codec Overview">

<t>
The Opus codec scales from 6 kb/s narrowband mono speech to 510 kb/s
 fullband stereo music, with algorithmic delays ranging from 5 ms to
 65.2 ms.
At any given time, either the LP layer, the MDCT layer, or both, may be active.
The codec can seamlessly switch between all of its various operating modes, giving it
 a great deal of flexibility to adapt to varying content and network
 conditions without renegotiating the current session.
The codec allows input and output of various audio bandwidths, defined as
 follows:
</t>
<texttable anchor="audio-bandwidth">
<ttcol>Abbreviation</ttcol>
<ttcol align="right">Audio Bandwidth</ttcol>
<ttcol align="right">Sample Rate (Effective)</ttcol>
<c>NB (narrowband)</c>       <c>4 kHz</c>      <c>8 kHz</c>
<c>MB (medium-band)</c>      <c>6 kHz</c>      <c>12 kHz</c>
<c>WB (wideband)</c>         <c>8 kHz</c>      <c>16 kHz</c>
<c>SWB (super-wideband)</c>  <c>12 kHz</c>     <c>24 kHz</c>
<c>FB (fullband)</c>         <c>20 kHz (*)</c> <c>48 kHz</c>
</texttable>
<t>
(*) Although the sampling theorem allows a bandwidth as large as half the
 sampling rate, Opus never codes audio above 20 kHz, as that is the
 generally accepted upper limit of human hearing.
</t>

<t>
Opus defines super-wideband (SWB) with an effective sample rate of 24 kHz,
 unlike some other audio coding standards that use 32 kHz.
This was chosen for a number of reasons.
The band layout in the MDCT layer naturally allows skipping coefficients for
 frequencies over 12 kHz, but does not allow cleanly dropping just those
 frequencies over 16 kHz.
A sample rate of 24 kHz also makes resampling in the MDCT layer easier,
 as 24 evenly divides 48, and when 24 kHz is sufficient, it can save
 computation in other processing, such as Acoustic Echo Cancellation (AEC).
Experimental changes to the band layout to allow a 16 kHz cutoff
 (32 kHz effective sample rate) showed potential quality degradations at
 other sample rates, and at typical bitrates the number of bits saved by using
 such a cutoff instead of coding in fullband (FB) mode is very small.
Therefore, if an application wishes to process a signal sampled at 32 kHz,
 it should just use FB.
</t>

<t>
The LP layer is based on the SILK codec
 <xref target="SILK"></xref>.
It supports NB, MB, or WB audio and frame sizes from 10 ms to 60 ms,
 and requires an additional 5 ms look-ahead for noise shaping estimation.
A small additional delay (up to 1.5 ms) may be required for sampling rate
 conversion.
Like Vorbis <xref target='Vorbis-website'/> and many other modern codecs, SILK is inherently designed for
 variable-bitrate (VBR) coding, though the encoder can also produce
 constant-bitrate (CBR) streams.
The version of SILK used in Opus is substantially modified from, and not
 compatible with, the stand-alone SILK codec previously deployed by Skype.
This document does not serve to define that format, but those interested in the
 original SILK codec should see <xref target="SILK"/> instead.
</t>

<t>
The MDCT layer is based on the CELT codec <xref target="CELT"></xref>.
It supports NB, WB, SWB, or FB audio and frame sizes from 2.5 ms to
 20 ms, and requires an additional 2.5 ms look-ahead due to the
 overlapping MDCT windows.
The CELT codec is inherently designed for CBR coding, but unlike many CBR
 codecs it is not limited to a set of predetermined rates.
It internally allocates bits to exactly fill any given target budget, and an
 encoder can produce a VBR stream by varying the target on a per-frame basis.
The MDCT layer is not used for speech when the audio bandwidth is WB or less,
 as it is not useful there.
On the other hand, non-speech signals are not always adequately coded using
 linear prediction, so for music, only the MDCT layer should be used.
</t>

<t>
A "Hybrid" mode allows the use of both layers simultaneously with a frame size
 of 10 or 20 ms and a SWB or FB audio bandwidth.
The LP layer codes the low frequencies by resampling the signal down to WB.
The MDCT layer follows, coding the high frequency portion of the signal.
The cutoff between the two lies at 8 kHz, the maximum WB audio bandwidth.
In the MDCT layer, all bands below 8 kHz are discarded, so there is no
 coding redundancy between the two layers.
</t>

<t>
The sample rate (in contrast to the actual audio bandwidth) can be chosen
 independently on the encoder and decoder side, e.g., a fullband signal can be
 decoded as wideband, or vice versa.
This approach ensures a sender and receiver can always interoperate, regardless
 of the capabilities of their actual audio hardware.
Internally, the LP layer always operates at a sample rate of twice the audio
 bandwidth, up to a maximum of 16 kHz, which it continues to use for SWB
 and FB.
The decoder simply resamples its output to support different sample rates.
The MDCT layer always operates internally at a sample rate of 48 kHz.
Since all the supported sample rates evenly divide this rate, and since
 the decoder may easily zero out the high frequency portion of the spectrum in
 the frequency domain, it can simply decimate the MDCT layer output to achieve
 the other supported sample rates very cheaply.
</t>

<t>
After conversion to the common, desired output sample rate, the decoder simply
 adds the output from the two layers together.
To compensate for the different look-ahead required by each layer, the CELT
 encoder input is delayed by an additional 2.7 ms.
This ensures that low frequencies and high frequencies arrive at the same time.
This extra delay may be reduced by an encoder by using less look-ahead for noise
 shaping or using a simpler resampler in the LP layer, but this will reduce
 quality.
However, the base 2.5 ms look-ahead in the CELT layer cannot be reduced in
 the encoder because it is needed for the MDCT overlap, whose size is fixed by
 the decoder.
</t>

<t>
Both layers use the same entropy coder, avoiding any waste from "padding bits"
 between them.
The hybrid approach makes it easy to support both CBR and VBR coding.
Although the LP layer is VBR, the bit allocation of the MDCT layer can produce
 a final stream that is CBR by using all the bits left unused by the LP layer.
</t>

<section title="Control Parameters">
<t>
The Opus codec includes a number of control parameters that can be changed dynamically during
regular operation of the codec, without interrupting the audio stream from the encoder to the decoder.
These parameters only affect the encoder, since any impact they have on the bit-stream is signaled
in-band such that a decoder can decode any Opus stream without any out-of-band signaling. Any Opus
implementation can add or modify these control parameters without affecting interoperability. The most
important encoder control parameters in the reference encoder are listed below.
</t>

<section title="Bitrate" toc="exclude">
<t>
Opus supports all bitrates from 6 kb/s to 510 kb/s. All other parameters being
equal, higher bitrate results in higher quality.
For a frame size of 20 ms, these
are the bitrate "sweet spots" for Opus in various configurations:
<list style="symbols">
<t>8-12 kb/s for NB speech,</t>
<t>16-20 kb/s for WB speech,</t>
<t>28-40 kb/s for FB speech,</t>
<t>48-64 kb/s for FB mono music, and</t>
<t>64-128 kb/s for FB stereo music.</t>
</list>
</t>
</section>

<section title="Number of Channels (Mono/Stereo)" toc="exclude">
<t>
Opus can transmit either mono or stereo frames within a single stream.
When decoding a mono frame in a stereo decoder, the left and right channels are
 identical, and when decoding a stereo frame in a mono decoder, the mono output
 is the average of the left and right channels.
In some cases, it is desirable to encode a stereo input stream in mono (e.g.,
 because the bitrate is too low to encode stereo with sufficient quality).
The number of channels encoded can be selected in real-time, but by default the
 reference encoder attempts to make the best decision possible given the
 current bitrate.
</t>
</section>

<section title="Audio Bandwidth" toc="exclude">
<t>
The audio bandwidths supported by Opus are listed in
 <xref target="audio-bandwidth"/>.
As with the number of channels, any decoder can decode audio encoded at
 any bandwidth.
For example, any Opus decoder operating at 8 kHz can decode a FB Opus
 frame, and any Opus decoder operating at 48 kHz can decode a NB frame.
Similarly, the reference encoder can take a 48 kHz input signal and
 encode it as NB.
The higher the audio bandwidth, the higher the required bitrate to achieve
 acceptable quality.
The audio bandwidth can be explicitly specified in real-time, but by default
 the reference encoder attempts to make the best bandwidth decision possible
 given the current bitrate.
</t>
</section>


<section title="Frame Duration" toc="exclude">
<t>
Opus can encode frames of 2.5, 5, 10, 20, 40, or 60 ms.
It can also combine multiple frames into packets of up to 120 ms.
For real-time applications, sending fewer packets per second reduces the
 bitrate, since it reduces the overhead from IP, UDP, and RTP headers.
However, it increases latency and sensitivity to packet losses, as losing one
 packet constitutes a loss of a bigger chunk of audio.
Increasing the frame duration also slightly improves coding efficiency, but the
 gain becomes small for frame sizes above 20 ms.
For this reason, 20 ms frames are a good choice for most applications.
</t>
</section>

<section title="Complexity" toc="exclude">
<t>
There are various aspects of the Opus encoding process where trade-offs
can be made between CPU complexity and quality/bitrate.
In the reference
encoder, the complexity is selected using an integer from 0 to 10, where
0 is the lowest complexity and 10 is the highest. Examples of
computations for which such trade-offs may occur are:
<list style="symbols">
<t>The order of the pitch analysis whitening filter <xref target="Whitening"/>,</t>
<t>The order of the short-term noise shaping filter,</t>
<t>The number of states in delayed decision quantization of the
residual signal, and</t>
<t>The use of certain bit-stream features such as variable time-frequency
resolution and the pitch post-filter.</t>
</list>
</t>
</section>

<section title="Packet Loss Resilience" toc="exclude">
<t>
Audio codecs often exploit inter-frame correlations to reduce the
bitrate at a cost in error propagation: after losing one packet,
several packets need to be received before the decoder is able to
accurately reconstruct the speech signal. The extent to which Opus
exploits inter-frame dependencies can be adjusted on the fly to
choose a trade-off between bitrate and amount of error propagation.
</t>
</section>

<section title="Forward Error Correction (FEC)" toc="exclude">
<t>
   Another mechanism providing robustness against packet loss is the in-band
   Forward Error Correction (FEC).
Packets that are determined to
   contain perceptually important speech information, such as onsets or
   transients, are encoded again at a lower bitrate and this re-encoded
   information is added to a subsequent packet.
</t>
</section>

<section title="Constant/Variable Bitrate" toc="exclude">
<t>
Opus is more efficient when operating with variable bitrate (VBR), which is
the default. However, in some (rare) applications, constant bitrate (CBR)
is required. There are two main reasons to operate in CBR mode:
<list style="symbols">
<t>When the transport only supports a fixed size for each compressed frame, or</t>
<t>When encryption is used for an audio stream that is either highly constrained
   (e.g., yes/no, recorded prompts) or highly sensitive <xref target="SRTP-VBR"></xref>.</t>
</list>

When low-latency transmission is required over a relatively slow connection,
constrained VBR can also be used. This uses VBR in a way that simulates a
"bit reservoir" and is equivalent to what MP3 (MPEG 1, Layer 3) and
AAC (Advanced Audio Coding) call CBR (i.e., not true
CBR due to the bit reservoir).
</t>
</section>

<section title="Discontinuous Transmission (DTX)" toc="exclude">
<t>
   Discontinuous Transmission (DTX) reduces the bitrate during silence
   or background noise. When DTX is enabled, only one frame is encoded
   every 400 milliseconds.
</t>
</section>

</section>

</section>

<section anchor="modes" title="Internal Framing">

<t>
The Opus encoder produces "packets", which are each a contiguous set of bytes
 meant to be transmitted as a single unit.
The packets described here do not include such things as IP, UDP, or RTP
 headers, which are normally found in a transport-layer packet.
A single packet may contain multiple audio frames, so long as they share a
 common set of parameters, including the operating mode, audio bandwidth, frame
 size, and channel count (mono vs. stereo).
This section describes the possible combinations of these parameters and the
 internal framing used to pack multiple frames into a single packet.
This framing is not self-delimiting.
Instead, it assumes that a higher layer (such as UDP, RTP <xref target='RFC3550'/>,
Ogg <xref target='RFC3533'/>, or Matroska <xref target='Matroska-website'/>)
 will communicate the length, in bytes, of the packet, and it uses this
 information to reduce the framing overhead in the packet itself.
A decoder implementation MUST support the framing described in this section.
An alternative, self-delimiting variant of the framing is described in
 <xref target="self-delimiting-framing"/>.
Support for that variant is OPTIONAL.
</t>

<t>
All bit diagrams in this document number the bits so that bit 0 is the most
 significant bit of the first byte, and bit 7 is the least significant.
Bit 8 is thus the most significant bit of the second byte, etc.
Well-formed Opus packets obey certain requirements, marked [R1] through [R7]
 below.
These are summarized in <xref target="malformed-packets"/> along with
 appropriate means of handling malformed packets.
</t>

<section anchor="toc_byte" title="The TOC Byte">
<t anchor="R1">
A well-formed Opus packet MUST contain at least one byte [R1].
This byte forms a table-of-contents (TOC) header that signals which of the
 various modes and configurations a given packet uses.
It is composed of a configuration number, "config", a stereo flag, "s", and a
 frame count code, "c", arranged as illustrated in
 <xref target="toc_byte_fig"/>.
A description of each of these fields follows.
</t>

<figure anchor="toc_byte_fig" title="The TOC Byte">
<artwork align="center"><![CDATA[
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
| config  |s| c |
+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<t>
The top five bits of the TOC byte, labeled "config", encode one of 32 possible
 configurations of operating mode, audio bandwidth, and frame size.
As described, the LP (SILK) layer and MDCT (CELT) layer can be combined in three possible
 operating modes:
<list style="numbers">
<t>A SILK-only mode for use in low bitrate connections with an audio bandwidth
 of WB or less,</t>
<t>A Hybrid (SILK+CELT) mode for SWB or FB speech at medium bitrates, and</t>
<t>A CELT-only mode for very low delay speech transmission as well as music
 transmission (NB to FB).</t>
</list>
The 32 possible configurations each identify which one of these operating modes
 the packet uses, as well as the audio bandwidth and the frame size.
<xref target="config_bits"/> lists the parameters for each configuration.
</t>
<texttable anchor="config_bits" title="TOC Byte Configuration Parameters">
<ttcol>Configuration Number(s)</ttcol>
<ttcol>Mode</ttcol>
<ttcol>Bandwidth</ttcol>
<ttcol>Frame Sizes</ttcol>
<c>0...3</c>   <c>SILK-only</c> <c>NB</c>  <c>10, 20, 40, 60 ms</c>
<c>4...7</c>   <c>SILK-only</c> <c>MB</c>  <c>10, 20, 40, 60 ms</c>
<c>8...11</c>  <c>SILK-only</c> <c>WB</c>  <c>10, 20, 40, 60 ms</c>
<c>12...13</c> <c>Hybrid</c>    <c>SWB</c> <c>10, 20 ms</c>
<c>14...15</c> <c>Hybrid</c>    <c>FB</c>  <c>10, 20 ms</c>
<c>16...19</c> <c>CELT-only</c> <c>NB</c>  <c>2.5, 5, 10, 20 ms</c>
<c>20...23</c> <c>CELT-only</c> <c>WB</c>  <c>2.5, 5, 10, 20 ms</c>
<c>24...27</c> <c>CELT-only</c> <c>SWB</c> <c>2.5, 5, 10, 20 ms</c>
<c>28...31</c> <c>CELT-only</c> <c>FB</c>  <c>2.5, 5, 10, 20 ms</c>
</texttable>
<t>
The configuration numbers in each range (e.g., 0...3 for NB SILK-only)
 correspond to the various choices of frame size, in the same order.
For example, configuration 0 has a 10 ms frame size and configuration 3
 has a 60 ms frame size.
</t>

<t>
One additional bit, labeled "s", signals mono vs. stereo, with 0 indicating
 mono and 1 indicating stereo.
</t>

<t>
The remaining two bits of the TOC byte, labeled "c", code the number of frames
 per packet (codes 0 to 3) as follows:
<list style="symbols">
<t>0: 1 frame in the packet</t>
<t>1: 2 frames in the packet, each with equal compressed size</t>
<t>2: 2 frames in the packet, with different compressed sizes</t>
<t>3: an arbitrary number of frames in the packet</t>
</list>
This draft refers to a packet as a code 0 packet, code 1 packet, etc., based on
 the value of "c".
</t>

</section>

<section title="Frame Packing">

<t>
This section describes how frames are packed according to each possible value
 of "c" in the TOC byte.
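As a non-normative illustration, the TOC byte fields described in
 <xref target="toc_byte"/> could be unpacked as in the following C sketch.
The names toc_mode, toc_info, and parse_toc are hypothetical and are not taken
 from the reference implementation; frame sizes are expressed in microseconds
 here only so that 2.5 ms fits in an integer.
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Hypothetical types for illustration; not part of the reference code. */
typedef enum { MODE_SILK_ONLY, MODE_HYBRID, MODE_CELT_ONLY } toc_mode;

typedef struct {
    int config;         /* top five bits of the TOC byte, 0..31 */
    int stereo;         /* "s": 0 = mono, 1 = stereo */
    int code;           /* "c": frame count code, 0..3 */
    toc_mode mode;      /* operating mode implied by config */
    int frame_size_us;  /* frame duration in microseconds */
} toc_info;

/* Split a TOC byte into its fields, following the configuration table. */
toc_info parse_toc(uint8_t toc)
{
    static const int silk_us[4]   = { 10000, 20000, 40000, 60000 };
    static const int hybrid_us[2] = { 10000, 20000 };
    static const int celt_us[4]   = { 2500, 5000, 10000, 20000 };
    toc_info t;

    t.config = toc >> 3;
    t.stereo = (toc >> 2) & 1;
    t.code   = toc & 3;
    if (t.config < 12) {            /* configurations 0...11 */
        t.mode = MODE_SILK_ONLY;
        t.frame_size_us = silk_us[t.config & 3];
    } else if (t.config < 16) {     /* configurations 12...15 */
        t.mode = MODE_HYBRID;
        t.frame_size_us = hybrid_us[t.config & 1];
    } else {                        /* configurations 16...31 */
        t.mode = MODE_CELT_ONLY;
        t.frame_size_us = celt_us[t.config & 3];
    }
    return t;
}
```
]]></artwork>
</figure>
Under these assumptions, a TOC byte of 0x08 (config 1, s=0, c=0) would be
 reported as a mono SILK-only packet holding a single 20 ms frame.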
</t>

<section anchor="frame-length-coding" title="Frame Length Coding">
<t>
When a packet contains multiple VBR frames (i.e., code 2 or 3), the compressed
 length of one or more of these frames is indicated with a one- or two-byte
 sequence, with the meaning of the first byte as follows:
<list style="symbols">
<t>0: No frame (discontinuous transmission (DTX) or lost packet)</t>
<t>1...251: Length of the frame in bytes</t>
<t>252...255: A second byte is needed. The total length is (second_byte*4)+first_byte</t>
</list>
</t>

<t>
The special length 0 indicates that no frame is available, either because it
 was dropped during transmission by some intermediary or because the encoder
 chose not to transmit it.
Any Opus frame in any mode MAY have a length of 0.
</t>

<t>
The maximum representable length is 255*4+255=1275 bytes.
For 20 ms frames, this represents a bitrate of 510 kb/s, which is
 approximately the highest useful rate for lossily compressed fullband stereo
 music.
Beyond this point, lossless codecs are more appropriate.
It is also roughly the maximum useful rate of the MDCT layer, as shortly
 thereafter quality no longer improves with additional bits due to limitations
 on the codebook sizes.
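The one- or two-byte length sequence described above could be decoded as in
 the following non-normative C sketch.
The function name read_frame_length and its return convention are assumptions
 made for this illustration only.
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one frame length from data[0..avail-1].
 * Stores the length in *length and returns the number of bytes
 * consumed (1 or 2), or -1 if the sequence is truncated. */
int read_frame_length(const uint8_t *data, size_t avail, int *length)
{
    if (avail < 1)
        return -1;
    if (data[0] < 252) {
        *length = data[0];      /* 0 = no frame, 1...251 = length */
        return 1;
    }
    if (avail < 2)
        return -1;              /* 252...255 needs a second byte */
    *length = 4 * data[1] + data[0];
    return 2;
}
```
]]></artwork>
</figure>
For example, the two-byte sequence (255, 255) decodes to 255*4+255=1275 bytes,
 the maximum noted above.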
</t>

<t anchor="R2">
No length is transmitted for the last frame in a VBR packet, or for any of the
 frames in a CBR packet, as it can be inferred from the total size of the
 packet and the size of all other data in the packet.
However, the length of any individual frame MUST NOT exceed
 1275 bytes [R2], to allow for repacketization by gateways,
 conference bridges, or other software.
</t>
</section>

<section title="Code 0: One Frame in the Packet">

<t>
For code 0 packets, the TOC byte is immediately followed by N-1 bytes
 of compressed data for a single frame (where N is the size of the packet),
 as illustrated in <xref target="code0_packet"/>.
</t>
<figure anchor="code0_packet" title="A Code 0 Packet" align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|0|0|                                               |
+-+-+-+-+-+-+-+-+                                               |
|                    Compressed frame 1 (N-1 bytes)...          :
:                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
</section>

<section title="Code 1: Two Frames in the Packet, Each with Equal Compressed Size">
<t anchor="R3">
For code 1 packets, the TOC byte is immediately followed by the
 (N-1)/2 bytes of compressed data for the first frame, followed by
 (N-1)/2 bytes of compressed data for the second frame, as illustrated in
 <xref target="code1_packet"/>.
The number of payload bytes available for compressed data, N-1, MUST be even
 for all code 1 packets [R3].
</t>
<figure anchor="code1_packet" title="A Code 1 Packet" align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|0|1|                                               |
+-+-+-+-+-+-+-+-+                                               :
|             Compressed frame 1 ((N-1)/2 bytes)...             |
:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
|             Compressed frame 2 ((N-1)/2 bytes)...             |
:                                               +-+-+-+-+-+-+-+-+
|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
</section>

<section title="Code 2: Two Frames in the Packet, with Different Compressed Sizes">
<t anchor="R4">
For code 2 packets, the TOC byte is followed by a one- or two-byte sequence
 indicating the length of the first frame (marked N1 in <xref target='code2_packet'/>),
 followed by N1 bytes of compressed data for the first frame.
The remaining N-N1-2 or N-N1-3 bytes are the compressed data for the
 second frame.
This is illustrated in <xref target="code2_packet"/>.
A code 2 packet MUST contain enough bytes to represent a valid length.
For example, a 1-byte code 2 packet is always invalid, and a 2-byte code 2
 packet whose second byte is in the range 252...255 is also invalid.
The length of the first frame, N1, MUST also be no larger than the size of the
 payload remaining after decoding that length for all code 2 packets [R4].
This makes, for example, a 2-byte code 2 packet with a second byte in the range
 1...251 invalid as well (the only valid 2-byte code 2 packet is one where the
 length of both frames is zero).
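The checks above could be sketched in C as follows.
This is a non-normative illustration; the function name split_code2 and its
 return convention are hypothetical and are not taken from the reference
 implementation.
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split a code 2 packet of n bytes into the sizes
 * of its two frames. Returns 0 on success, or -1 if the packet
 * violates [R2] or [R4]. */
int split_code2(const uint8_t *pkt, size_t n, int *n1, int *n2)
{
    size_t hdr;   /* TOC byte plus the length byte(s) */
    size_t len;   /* N1 */

    if (n < 2 || (pkt[0] & 3) != 2)
        return -1;          /* no room for a length, or not code 2 */
    if (pkt[1] < 252) {
        len = pkt[1];       /* one-byte length (0 means no frame) */
        hdr = 2;
    } else {
        if (n < 3)
            return -1;      /* second length byte is missing */
        len = 4 * (size_t)pkt[2] + pkt[1];
        hdr = 3;
    }
    if (len > n - hdr)
        return -1;          /* [R4]: N1 exceeds remaining payload */
    if (n - hdr - len > 1275)
        return -1;          /* [R2]: implicit second frame too long */
    *n1 = (int)len;
    *n2 = (int)(n - hdr - len);
    return 0;
}
```
]]></artwork>
</figure>
With this sketch, a 2-byte packet whose second byte is 0 yields two frames of
 length zero, while a second byte of 1...255 in a 2-byte packet is rejected.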
</t>
<figure anchor="code2_packet" title="A Code 2 Packet" align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|0| N1 (1-2 bytes):                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
|               Compressed frame 1 (N1 bytes)...                |
:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                     Compressed frame 2...                     :
:                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
</section>

<section title="Code 3: A Signaled Number of Frames in the Packet">
<t anchor="R5">
Code 3 packets signal the number of frames, as well as additional
 padding, called "Opus padding" to indicate that this padding is added at the
 Opus layer, rather than at the transport layer.
Code 3 packets MUST have at least 2 bytes [R6,R7].
The TOC byte is followed by a byte encoding the number of frames in the packet
 in bits 2 to 7 (marked "M" in <xref target='frame_count_byte'/>), with bit 1 indicating whether
 or not Opus padding is inserted (marked "p" in <xref target='frame_count_byte'/>), and bit 0
 indicating VBR (marked "v" in <xref target='frame_count_byte'/>).
M MUST NOT be zero, and the audio duration contained within a packet MUST NOT
 exceed 120 ms [R5].
This limits the maximum frame count for any frame size to 48 (for 2.5 ms
 frames), with lower limits for longer frame sizes.
<xref target="frame_count_byte"/> illustrates the layout of the frame count
 byte.
</t>
<figure anchor="frame_count_byte" title="The frame count byte">
<artwork align="center"><![CDATA[
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|v|p|     M     |
+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
<t>
When Opus padding is used, the number of bytes of padding is encoded in the
 bytes following the frame count byte.
Values from 0...254 indicate that 0...254 bytes of padding are included,
 in addition to the byte(s) used to indicate the size of the padding.
If the value is 255, then the size of the additional padding is 254 bytes,
 plus the padding value encoded in the next byte.
There MUST be at least one more byte in the packet in this case [R6,R7].
The additional padding bytes appear at the end of the packet, and MUST be set
 to zero by the encoder to avoid creating a covert channel.
The decoder MUST accept any value for the padding bytes, however.
</t>
<t>
Although this encoding provides multiple ways to indicate a given number of
 padding bytes, each uses a different number of bytes to indicate the padding
 size, and thus will increase the total packet size by a different amount.
For example, to add 255 bytes to a packet, set the padding bit, p, to 1, insert
 a single byte after the frame count byte with a value of 254, and append 254
 padding bytes with the value zero to the end of the packet.
To add 256 bytes to a packet, set the padding bit to 1, insert two bytes after
 the frame count byte with the values 255 and 0, respectively, and append 254
 padding bytes with the value zero to the end of the packet.
By using the value 255 multiple times, it is possible to create a packet of any
 specific, desired size.
Let P be the number of header bytes used to indicate the padding size plus the
 number of padding bytes themselves (i.e., P is the total number of bytes added
 to the packet).
Then P MUST be no more than N-2 [R6,R7].
</t>
<t anchor="R6">
In the CBR case, let R=N-2-P be the number of bytes remaining in the packet
 after subtracting the (optional) padding.
Then the compressed length of each frame in bytes is equal to R/M.
The value R MUST be a non-negative integer multiple of M [R6].
The compressed data for all M frames follows, each of size
 R/M bytes, as illustrated in <xref target="code3cbr_packet"/>.
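The padding and CBR rules above could be combined as in the following
 non-normative C sketch.
The function name code3_cbr_frame_size is hypothetical; the sketch rejects VBR
 packets and does not check the 120 ms limit of [R5], since that check also
 needs the frame size from the TOC byte.
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the per-frame size R/M of a CBR code 3
 * packet of n bytes and store the frame count M in *m. Returns -1 if
 * the packet is malformed or is a VBR packet. */
int code3_cbr_frame_size(const uint8_t *pkt, size_t n, int *m)
{
    size_t pos = 2;   /* first byte after TOC and frame count bytes */
    size_t p = 0;     /* P: padding length bytes plus padding bytes */
    size_t r;
    int count;

    if (n < 2 || (pkt[0] & 3) != 3)
        return -1;
    if (pkt[1] & 0x80)
        return -1;                /* v bit set: VBR, not handled */
    count = pkt[1] & 0x3F;        /* M, bits 2 to 7 */
    if (count == 0)
        return -1;                /* [R5]: M MUST NOT be zero */
    if (pkt[1] & 0x40) {          /* p bit: Opus padding present */
        int v;
        do {
            if (pos >= n)
                return -1;        /* padding length byte missing */
            v = pkt[pos++];
            p += 1 + (size_t)(v == 255 ? 254 : v);
        } while (v == 255);
    }
    if (p > n - 2)
        return -1;                /* [R6]: P is at most N-2 */
    r = n - 2 - p;                /* R = N-2-P */
    if (r % (size_t)count != 0 || r / (size_t)count > 1275)
        return -1;                /* [R6] and [R2] */
    *m = count;
    return (int)(r / (size_t)count);
}
```
]]></artwork>
</figure>
In this sketch, a single padding length byte of 254 gives P=255, and the pair
 (255, 0) gives P=256, matching the two padding examples given earlier.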
</t>

<figure anchor="code3cbr_packet" title="A CBR Code 3 Packet" align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|1|0|p|     M     |  Padding length (Optional)    :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 1 (R/M bytes)...               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 2 (R/M bytes)...               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                              ...                              :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame M (R/M bytes)...               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                  Opus Padding (Optional)...                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<t anchor="R7">
In the VBR case, the (optional) padding length is followed by M-1 frame
 lengths (indicated by "N1" to "N[M-1]" in <xref target='code3vbr_packet'/>), each encoded in a
 one- or two-byte sequence as described above.
The packet MUST contain enough data for the M-1 lengths after removing the
 (optional) padding, and the sum of these lengths MUST be no larger than the
 number of bytes remaining in the packet after decoding them [R7].
The compressed data for all M frames follows, each frame consisting of the
 indicated number of bytes, with the final frame consuming any remaining bytes
 before the final padding, as illustrated in <xref target="code3vbr_packet"/>.
The number of header bytes (TOC byte, frame count byte, padding length bytes,
 and frame length bytes), plus the signaled length of the first M-1 frames themselves,
 plus the signaled length of the padding MUST be no larger than N, the total size of the
 packet.
</t>

<figure anchor="code3vbr_packet" title="A VBR Code 3 Packet" align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|1|1|p|     M     | Padding length (Optional)     :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: N1 (1-2 bytes): N2 (1-2 bytes):     ...       :     N[M-1]    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 1 (N1 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 2 (N2 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                              ...                              :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                     Compressed frame M...                     :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                  Opus Padding (Optional)...                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
</section>
</section>

<section anchor="examples" title="Examples">
<t>
Simplest case, one NB mono 20 ms SILK frame:
</t>

<figure anchor='framing_example_1'>
<artwork><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    1    |0|0|0|               compressed data...              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<t>
Two FB mono 5 ms CELT frames of the same compressed size:
</t>

<figure anchor='framing_example_2'>
<artwork><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   29    |0|0|1|               compressed data...              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<t>
Two FB mono 20 ms Hybrid frames of different compressed size:
</t>

<figure anchor='framing_example_3'>
<artwork><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   15    |0|1|1|1|0|     2     |      N1       |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                       compressed data...                      :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<t>
Four FB stereo 20 ms CELT frames of the same compressed size:
</t>

<figure anchor='framing_example_4'>
<artwork><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   31    |1|1|1|0|0|     4     |      compressed data...       :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
</section>

<section anchor="malformed-packets" title="Receiving Malformed Packets">
<t>
A receiver MUST NOT process packets which violate any of the rules above as
 normal Opus packets.
They are reserved for future applications, such as in-band headers (containing
 metadata, etc.).
Packets which violate these constraints may cause implementations of
 <spanx style="emph">this</spanx> specification to treat them as malformed, and
 discard them.
</t>
<t>
These constraints are summarized here for reference:
<list style="format [R%d]">
<t>Packets are at least one byte.</t>
<t>No implicit frame length is larger than 1275 bytes.</t>
<t>Code 1 packets have an odd total length, N, so that (N-1)/2 is an
 integer.</t>
<t>Code 2 packets have enough bytes after the TOC for a valid frame
 length, and that length is no larger than the number of bytes remaining in the
 packet.</t>
<t>Code 3 packets contain at least one frame, but no more than 120 ms
 of audio total.</t>
<t>The length of a CBR code 3 packet, N, is at least two bytes, the number of
 bytes added to indicate the padding size plus the trailing padding bytes
 themselves, P, is no more than N-2, and the frame count, M, satisfies
 the constraint that (N-2-P) is a non-negative integer multiple of M.</t>
<t>VBR code 3 packets are large enough to contain all the header bytes (TOC
 byte, frame count byte, any padding length bytes, and any frame length bytes),
 plus the length of the first M-1 frames, plus any trailing padding bytes.</t>
</list>
</t>
</section>

</section>

<section title="Opus Decoder">
<t>
The Opus decoder consists of two main blocks: the SILK decoder and the CELT
 decoder.
At any given time, one or both of the SILK and CELT decoders may be active.
The output of the Opus decoder is the sum of the outputs from the SILK and CELT
 decoders with proper sample rate conversion and delay compensation on the SILK
 side, and optional decimation (when decoding to sample rates less than
 48 kHz) on the CELT side, as illustrated in the block diagram below.
</t>
<figure>
<artwork>
<![CDATA[
                         +---------+    +------------+
                         |  SILK   |    |   Sample   |
                      +->| Decoder |--->|    Rate    |----+
Bit-    +---------+   |  |         |    | Conversion |    v
stream  |  Range  |---+  +---------+    +------------+  /---\  Audio
------->| Decoder |                                     | + |------>
        |         |---+  +---------+    +------------+  \---/
        +---------+   |  |  CELT   |    | Decimation |    ^
                      +->| Decoder |--->| (Optional) |----+
                         |         |    |            |
                         +---------+    +------------+
]]>
</artwork>
</figure>

<section anchor="range-decoder" title="Range Decoder">
<t>
Opus uses an entropy coder based on range coding <xref target="range-coding"></xref>
<xref target="Martin79"></xref>,
which is itself a rediscovery of the FIFO arithmetic code introduced by <xref target="coding-thesis"></xref>.
It is very similar to arithmetic encoding, except that encoding is done with
digits in any base instead of with bits,
so it is faster when using larger bases (i.e., a byte). All of the
calculations in the range coder must use bit-exact integer arithmetic.
</t>
<t>
Symbols may also be coded as "raw bits" packed directly into the bitstream,
 bypassing the range coder.
These are packed backwards starting at the end of the frame, as illustrated in
 <xref target="rawbits-example"/>.
This reduces complexity and makes the stream more resilient to bit errors, as
 corruption in the raw bits will not desynchronize the decoding process, unlike
 corruption in the input to the range decoder.
Raw bits are only used in the CELT layer.
</t>

<figure anchor="rawbits-example" title="Illustrative example of packing range
 coder and raw bits data">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Range coder data (packed MSB to LSB) ->                       :
+                                                               +
:                                                               :
+     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:     | <- Boundary occurs at an arbitrary bit position         :
+-+-+-+                                                         +
:                          <- Raw bits data (packed LSB to MSB) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<t>
Each symbol coded by the range coder is drawn from a finite alphabet and coded
 in a separate "context", which describes the size of the alphabet and the
 relative frequency of each symbol in that alphabet.
</t>
<t>
Suppose there is a context with n symbols, identified with an index that ranges
 from 0 to n-1.
The parameters needed to encode or decode symbol k in this context are
 represented by a three-tuple (fl[k], fh[k], ft), with
 0 &lt;= fl[k] &lt; fh[k] &lt;= ft &lt;= 65535.
The values of this tuple are derived from the probability model for the
 symbol, represented by traditional "frequency counts".
Because Opus uses static contexts these are not updated as symbols are decoded.
Let f[i] be the frequency of symbol i.
Then the three-tuple corresponding to symbol k is given by
</t>
<figure align="center">
<artwork align="center"><![CDATA[
        k-1                                   n-1
        __                                    __
fl[k] = \  f[i],  fh[k] = fl[k] + f[k],  ft = \  f[i]
        /_                                    /_
        i=0                                   i=0
]]></artwork>
</figure>
<t>
The range decoder extracts the symbols and integers encoded using the range
 encoder in <xref target="range-encoder"/>.
The range decoder maintains an internal state vector composed of the two-tuple
 (val, rng), representing the difference between the high end of the
 current range and the actual coded value, minus one, and the size of the
 current range, respectively.
Both val and rng are 32-bit unsigned integer values.
</t>

<section anchor="range-decoder-init" title="Range Decoder Initialization">
<t>
Let b0 be the first input byte (or zero if there are no bytes in this Opus
 frame).
The decoder initializes rng to 128 and initializes val to
 (127 - (b0&gt;&gt;1)), where (b0&gt;&gt;1) is the top 7 bits of the
 first input byte.
It saves the remaining bit, (b0&amp;1), for use in the renormalization
 procedure described in <xref target="range-decoder-renorm"/>, which the
 decoder invokes immediately after initialization to read additional bits and
 establish the invariant that rng &gt; 2**23.
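As a non-normative illustration, the three-tuple derivation and the
 initialization step could be sketched in C as follows.
The names rd_state, rd_init, and context_tuple are hypothetical and the state
 layout is simplified relative to entdec.c; the sketch stops before the
 renormalization pass, so rng is still 128 when rd_init returns.
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout, for illustration only. */
typedef struct {
    uint32_t val;  /* high end of range minus coded value, minus one */
    uint32_t rng;  /* size of the current range */
    int rem;       /* saved low bit of b0, used by renormalization */
} rd_state;

/* Set the initial state, before the first renormalization pass. */
void rd_init(rd_state *s, const uint8_t *frame, size_t len)
{
    uint8_t b0 = len > 0 ? frame[0] : 0;  /* zero if frame is empty */

    s->rng = 128;
    s->val = 127u - (uint32_t)(b0 >> 1);  /* top 7 bits of b0 */
    s->rem = b0 & 1;                      /* remaining bit */
}

/* Derive (fl[k], fh[k], ft) for symbol k from n frequency counts. */
void context_tuple(const uint16_t *f, int n, int k,
                   uint32_t *fl, uint32_t *fh, uint32_t *ft)
{
    uint32_t sum = 0;
    int i;

    *fl = 0;
    for (i = 0; i < n; i++) {
        if (i == k)
            *fl = sum;      /* sum of f[0] through f[k-1] */
        sum += f[i];
    }
    *fh = *fl + f[k];
    *ft = sum;              /* sum of all n counts */
}
```
]]></artwork>
</figure>
For frequency counts (4, 2, 1, 1), symbol 2 would have the three-tuple
 (6, 7, 8) under this sketch.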
</t>
</section>

<section anchor="decoding-symbols" title="Decoding Symbols">
<t>
Decoding a symbol is a two-step process.
The first step determines a 16-bit unsigned value fs, which lies within the
 range of some symbol in the current context.
The second step updates the range decoder state with the three-tuple
 (fl[k], fh[k], ft) corresponding to that symbol.
</t>
<t>
The first step is implemented by ec_decode() (entdec.c), which computes
<figure align="center">
<artwork align="center"><![CDATA[
               val
fs = ft - min(------ + 1, ft) .
              rng/ft
]]></artwork>
</figure>
The divisions here are integer division.
</t>
<t>
The decoder then identifies the symbol in the current context corresponding to
 fs; i.e., the value of k whose three-tuple (fl[k], fh[k], ft)
 satisfies fl[k] &lt;= fs &lt; fh[k].
It uses this tuple to update val according to
<figure align="center">
<artwork align="center"><![CDATA[
            rng
val = val - --- * (ft - fh[k]) .
            ft
]]></artwork>
</figure>
If fl[k] is greater than zero, then the decoder updates rng using
<figure align="center">
<artwork align="center"><![CDATA[
      rng
rng = --- * (fh[k] - fl[k]) .
      ft
]]></artwork>
</figure>
Otherwise, it updates rng using
<figure align="center">
<artwork align="center"><![CDATA[
            rng
rng = rng - --- * (ft - fh[k]) .
            ft
]]></artwork>
</figure>
</t>
<t>
Using a special case for the first symbol (rather than the last symbol, as is
 commonly done in other arithmetic coders) ensures that all the truncation
 error from the finite precision arithmetic accumulates in symbol 0.
This makes the cost of coding a 0 slightly smaller, on average, than its
 estimated probability indicates and makes the cost of coding any other symbol
 slightly larger.
When contexts are designed so that 0 is the most probable symbol, which is
 often the case, this strategy minimizes the inefficiency introduced by the
 finite precision.
It also makes some of the special-case decoding routines in
 <xref target="decoding-alternate"/> particularly simple.
</t>
<t>
After the updates, implemented by ec_dec_update() (entdec.c), the decoder
 normalizes the range using the procedure in the next section, and returns the
 index k.
</t>

<section anchor="range-decoder-renorm" title="Renormalization">
<t>
To normalize the range, the decoder repeats the following process, implemented
 by ec_dec_normalize() (entdec.c), until rng &gt; 2**23.
If rng is already greater than 2**23, the entire process is skipped.
First, it sets rng to (rng&lt;&lt;8).
Then it reads the next byte of the Opus frame and forms an 8-bit value sym,
 using the left-over bit buffered from the previous byte as the high bit
 and the top 7 bits of the byte just read as the other 7 bits of sym.
The remaining bit in the byte just read is buffered for use in the next
 iteration.
If no more input bytes remain, it uses zero bits instead.
See <xref target="range-decoder-init"/> for the initialization used to process
 the first byte.
Then, it sets
<figure align="center">
<artwork align="center"><![CDATA[
val = ((val<<8) + (255-sym)) & 0x7FFFFFFF .
]]></artwork>
</figure>
</t>
<t>
It is normal and expected that the range decoder will read several bytes
 into the raw bits data (if any) at the end of the packet by the time the frame
 is completely decoded, as illustrated in <xref target="finalize-example"/>.
This same data MUST also be returned as raw bits when requested.
The encoder is expected to terminate the stream in such a way that the decoder
 will decode the intended values regardless of the data contained in the raw
 bits.
<xref target="encoder-finalizing"/> describes a procedure for doing this.
If the range decoder consumes all of the bytes belonging to the current frame,
 it MUST continue to use zero when any further input bytes are required, even
 if there is additional data in the current packet from padding or other
 frames.
</t>

<figure anchor="finalize-example" title="Illustrative example of raw bits
 overlapping range coder data">
<artwork align="center"><![CDATA[
        n              n+1             n+2             n+3
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:     | <----------- Overlap region ------------> |             :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ^                                           ^
      |   End of data buffered by the range coder |
...-----------------------------------------------+
      |
      | End of data consumed by raw bits
      +-------------------------------------------------------...
]]></artwork>
</figure>
</section>
</section>

<section anchor="decoding-alternate" title="Alternate Decoding Methods">
<t>
The reference implementation uses three additional decoding methods that are
 exactly equivalent to the above, but make assumptions and simplifications that
 allow for a more efficient implementation.
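As a baseline for comparison, the generic two-step decode of
 <xref target="decoding-symbols"/> might be sketched in C as follows.
This is an illustrative sketch, not the code in entdec.c: rstate,
 decode_fs(), decode_update(), and decode_symbol() are hypothetical names,
 the context is assumed to be passed as an array of frequency counts f[],
 and the renormalization step is omitted.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <stdint.h>

/* Hypothetical holder for the (val, rng) two-tuple. */
typedef struct { uint32_t val, rng; } rstate;

/* Step 1: fs = ft - min(val/(rng/ft) + 1, ft), integer divisions. */
static uint32_t decode_fs(const rstate *s, uint32_t ft) {
    uint32_t q = s->val / (s->rng / ft) + 1;
    return ft - (q < ft ? q : ft);
}

/* Step 2: update val and rng with the three-tuple (fl, fh, ft). */
static void decode_update(rstate *s, uint32_t fl, uint32_t fh,
                          uint32_t ft) {
    uint32_t r = s->rng / ft;
    s->val -= r * (ft - fh);
    if (fl > 0) {
        s->rng = r * (fh - fl);
    } else {
        /* Special case: symbol 0 absorbs the truncation error. */
        s->rng -= r * (ft - fh);
    }
}

/* Decodes one symbol from a context given by n frequency counts. */
static int decode_symbol(rstate *s, const uint16_t *f, int n) {
    uint32_t ft = 0, fl = 0, fs;
    int k;
    for (k = 0; k < n; k++) ft += f[k];
    fs = decode_fs(s, ft);
    /* Find k such that fl[k] <= fs < fh[k]. */
    for (k = 0; fl + f[k] <= fs; k++) fl += f[k];
    decode_update(s, fl, fl + f[k], ft);
    return k;  /* a real decoder would renormalize before returning */
}
```
]]></artwork>
</figure>
Because val stores the distance from the high end of the range, a small val
 selects a high symbol index in this sketch: with the uniform context
 {4, 4, 4, 4}/16 and rng = 2**31, val = 0 decodes index 3 and
 val = 2**31 - 1 decodes index 0.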
</t>
<section anchor="ec_decode_bin" title="ec_decode_bin()">
<t>
The first is ec_decode_bin() (entdec.c), defined using the parameter ftb
 instead of ft.
It is mathematically equivalent to calling ec_decode() with
 ft = (1&lt;&lt;ftb), but avoids one of the divisions.
</t>
</section>
<section anchor="ec_dec_bit_logp" title="ec_dec_bit_logp()">
<t>
The next is ec_dec_bit_logp() (entdec.c), which decodes a single binary symbol,
 replacing both the ec_decode() and ec_dec_update() steps.
The context is described by a single parameter, logp, which is the absolute
 value of the base-2 logarithm of the probability of a "1".
It is mathematically equivalent to calling ec_decode() with
 ft = (1&lt;&lt;logp), followed by ec_dec_update() with
 the 3-tuple (fl[k] = 0,
 fh[k] = (1&lt;&lt;logp) - 1,
 ft = (1&lt;&lt;logp)) if the returned value
 of fs is less than (1&lt;&lt;logp) - 1 (a "0" was decoded), and with
 (fl[k] = (1&lt;&lt;logp) - 1,
 fh[k] = ft = (1&lt;&lt;logp)) otherwise (a "1" was
 decoded).
The implementation requires no multiplications or divisions.
</t>
</section>
<section anchor="ec_dec_icdf" title="ec_dec_icdf()">
<t>
The last is ec_dec_icdf() (entdec.c), which decodes a single symbol with a
 table-based context of up to 8 bits, also replacing both the ec_decode() and
 ec_dec_update() steps, as well as the search for the decoded symbol in between.
The context is described by two parameters, an icdf
 ("inverse" cumulative distribution function) table and ftb.
As with ec_decode_bin(), (1&lt;&lt;ftb) is equivalent to ft.
icdf[k], on the other hand, stores (1&lt;&lt;ftb)-fh[k], which is equal to
 (1&lt;&lt;ftb) - fl[k+1].
fl[0] is assumed to be 0, and the table is terminated by a value of 0 (where
 fh[k] == ft).
</t>
<t>
The function is mathematically equivalent to calling ec_decode() with
 ft = (1&lt;&lt;ftb), using the returned value fs to search the table
 for the first entry where fs &lt; (1&lt;&lt;ftb)-icdf[k], and
 calling ec_dec_update() with
 fl[k] = (1&lt;&lt;ftb) - icdf[k-1] (or 0
 if k == 0), fh[k] = (1&lt;&lt;ftb) - icdf[k],
 and ft = (1&lt;&lt;ftb).
Combining the search with the update allows the division to be replaced by a
 series of multiplications (which are usually much cheaper), and using an
 inverse CDF allows the use of an ftb as large as 8 in an 8-bit table without
 any special cases.
This is the primary interface with the range decoder in the SILK layer, though
 it is used in a few places in the CELT layer as well.
</t>
<t>
Although icdf[k] is more convenient for the code, the frequency counts, f[k],
 are a more natural representation of the probability distribution function
 (PDF) for a given symbol.
Therefore this draft lists the latter, not the former, when describing the
 context in which a symbol is coded as a list, e.g., {4, 4, 4, 4}/16 for a
 uniform context with four possible values and ft = 16.
The value of ft after the slash is always the sum of the entries in the PDF,
 but is included for convenience.
Contexts with identical probabilities, f[k]/ft, but different values of ft
 (or equivalently, ftb) are not the same, and cannot, in general, be used in
 place of one another.
An icdf table is also not capable of representing a PDF where the first symbol
 has 0 probability.
In such contexts, ec_dec_icdf() can decode the symbol by using a table that
 drops the entries for any initial zero-probability values and adding the
 constant offset of the first value with a non-zero probability to its return
 value.
</t>
</section>
</section>

<section anchor="decoding-bits" title="Decoding Raw Bits">
<t>
The raw bits used by the CELT layer are packed at the end of the packet, with
 the least significant bit of the first value packed in the least significant
 bit of the last byte, filling up to the most significant bit in the last byte,
 continuing on to the least significant bit of the penultimate byte, and so on.
The reference implementation reads them using ec_dec_bits() (entdec.c).
Because the range decoder must read several bytes ahead in the stream, as
 described in <xref target="range-decoder-renorm"/>, the input consumed by the
 raw bits may overlap with the input consumed by the range coder, and a decoder
 MUST allow this.
The format should render it impossible to attempt to read more raw bits than
 there are actual bits in the frame, though a decoder may wish to check for
 this and report an error.
</t>
</section>

<section anchor="ec_dec_uint" title="Decoding Uniformly Distributed Integers">
<t>
The function ec_dec_uint() (entdec.c) decodes one of ft equiprobable values in
 the range 0 to (ft - 1), inclusive, each with a frequency of 1,
 where ft may be as large as (2**32 - 1).
Because ec_decode() is limited to a total frequency of (2**16 - 1),
 it splits up the value into a range coded symbol representing up to 8 of the
 high bits, and, if necessary, raw bits representing the remainder of the
 value.
The limit of 8 bits in the range coded symbol is a trade-off between
 implementation complexity, modeling error (since the symbols no longer truly
 have equal coding cost), and rounding error introduced by the range coder
 itself (which gets larger as more bits are included).
Using raw bits reduces the maximum number of divisions required in the worst
 case, but means that it may be possible to decode a value outside the range
 0 to (ft - 1), inclusive.
</t>

<t>
ec_dec_uint() takes a single, positive parameter, ft, which is not necessarily
 a power of two, and returns an integer, t, whose value lies between 0 and
 (ft - 1), inclusive.
Let ftb = ilog(ft - 1), i.e., the number of bits required
 to store (ft - 1) in two's complement notation.
If ftb is 8 or less, then t is decoded with t = ec_decode(ft), and
 the range coder state is updated using the three-tuple (t, t + 1,
 ft).
</t>
<t>
If ftb is greater than 8, then the top 8 bits of t are decoded using
<figure align="center">
<artwork align="center"><![CDATA[
t = ec_decode(((ft - 1) >> (ftb - 8)) + 1) ,
]]></artwork>
</figure>
 the decoder state is updated using the three-tuple
 (t, t + 1,
 ((ft - 1) &gt;&gt; (ftb - 8)) + 1),
 and the remaining bits are decoded as raw bits, setting
<figure align="center">
<artwork align="center"><![CDATA[
t = (t << (ftb - 8)) | ec_dec_bits(ftb - 8) .
]]></artwork>
</figure>
If, at this point, t &gt;= ft, then the current frame is corrupt.
In that case, the decoder should assume there has been an error in the coding,
 decoding, or transmission and SHOULD take measures to conceal the
 error and/or report to the application that the error has occurred.
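The split between the range coded part and the raw bits, and the final
 corruption check, might be sketched in C as follows.
This sketch only performs the arithmetic: the values hi and raw stand in for
 the results of ec_decode() and ec_dec_bits(), and the names ilog() (assumed
 here to return 0 for an input of 0), uint_split, split_uint(), and
 join_uint() are hypothetical and not taken from entdec.c.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <stdint.h>

/* Number of bits needed to store x; assumed to be 0 for x == 0. */
static int ilog(uint32_t x) {
    int n = 0;
    while (x) { n++; x >>= 1; }
    return n;
}

typedef struct {
    int ftb;         /* ilog(ft - 1) */
    uint32_t ft_rc;  /* total frequency passed to the range decoder */
    int raw_bits;    /* number of low bits read as raw bits */
} uint_split;

/* Works out how a value in 0..ft-1 is divided between the coders. */
static uint_split split_uint(uint32_t ft) {
    uint_split s;
    s.ftb = ilog(ft - 1);
    if (s.ftb <= 8) {
        s.raw_bits = 0;
        s.ft_rc = ft;
    } else {
        s.raw_bits = s.ftb - 8;
        s.ft_rc = ((ft - 1) >> s.raw_bits) + 1;
    }
    return s;
}

/* Recombines the two parts; returns -1 if the frame is corrupt. */
static int join_uint(uint32_t ft, uint32_t hi, uint32_t raw,
                     uint32_t *t) {
    uint_split s = split_uint(ft);
    uint32_t v = s.raw_bits ? (hi << s.raw_bits) | raw : hi;
    if (v >= ft) return -1;  /* t >= ft: conceal or report the error */
    *t = v;
    return 0;
}
```
]]></artwork>
</figure>
For example, ft = 1000 gives ftb = 10 in this sketch, so the range coder
 decodes one of 250 values and 2 raw bits supply the remainder.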
</t>

</section>

<section anchor="decoder-tell" title="Current Bit Usage">
<t>
The bit allocation routines in the CELT decoder need a conservative upper bound
 on the number of bits that have been used from the current frame thus far,
 including both range coder bits and raw bits.
This drives allocation decisions that must match those made in the encoder.
The upper bound is computed in the reference implementation to whole-bit
 precision by the function ec_tell() (entcode.h) and to fractional 1/8th bit
 precision by the function ec_tell_frac() (entcode.c).
Like all operations in the range coder, it must be implemented in a bit-exact
 manner, and must produce exactly the same value returned by the same functions
 in the encoder after encoding the same symbols.
</t>
<t>
ec_tell() is guaranteed to return ceil(ec_tell_frac()/8.0).
In various places the codec will check to ensure there is enough room to
 contain a symbol before attempting to decode it.
In practice, although the number of bits used so far is an upper bound,
 decoding a symbol whose probability model suggests it has a worst-case cost of
 p 1/8th bits may actually advance the return value of ec_tell_frac() by
 p-1, p, or p+1 1/8th bits, due to approximation error in that upper bound,
 truncation error in the range coder, and for large values of ft, modeling
 error in ec_dec_uint().
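As a hedged illustration of the two bounds, the procedures given in
 <xref target="ec_tell"/> and <xref target="ec_tell_frac"/> might be sketched
 in C as follows.
The function names tell_whole() and tell_frac() and the explicit
 (nbits_total, rng) arguments are assumptions made for this sketch, and
 ilog() is assumed to return the number of bits needed to store its argument;
 the normative versions are ec_tell() and ec_tell_frac().
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <stdint.h>

/* Number of bits needed to store x; assumed to be 0 for x == 0. */
static int ilog(uint32_t x) {
    int n = 0;
    while (x) { n++; x >>= 1; }
    return n;
}

/* Whole-bit bound: nbits_total minus the bits buffered in rng. */
static uint32_t tell_whole(uint32_t nbits_total, uint32_t rng) {
    return nbits_total - (uint32_t)ilog(rng);
}

/* 1/8th-bit bound: refine lg by three bits of fractional precision.
 * Assumes rng > 2**23, so that lg is at least 24. */
static uint32_t tell_frac(uint32_t nbits_total, uint32_t rng) {
    uint32_t lg = (uint32_t)ilog(rng);
    uint32_t r_q15 = rng >> (lg - 16);  /* 32768 <= r_q15 < 65536 */
    int i;
    for (i = 0; i < 3; i++) {
        uint32_t bit;
        r_q15 = (r_q15 * r_q15) >> 15;  /* square the Q15 value */
        bit = r_q15 >> 16;              /* the 16th bit of r_q15 */
        lg = 2 * lg + bit;
        r_q15 >>= bit;                  /* back into [32768, 65536) */
    }
    return nbits_total * 8 - lg;
}
```
]]></artwork>
</figure>
With nbits_total = 33 and rng = 2**31, the state of a newly initialized
 decoder, this sketch reports 1 whole bit and 8 1/8th bits used.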
</t>
<t>
However, this error is bounded, and periodic calls to ec_tell() or
 ec_tell_frac() at precisely defined points in the decoding process prevent it
 from accumulating.
For a range coder symbol that requires a whole number of bits (i.e.,
 for which ft/(fh[k] - fl[k]) is a power of two), where there are at
 least p 1/8th bits available, decoding the symbol will never cause ec_tell() or
 ec_tell_frac() to exceed the size of the frame ("bust the budget").
In this case the return value of ec_tell_frac() will only advance by more than
 p 1/8th bits if there was an additional, fractional number of bits remaining,
 and it will never advance beyond the next whole-bit boundary, which is safe,
 since frames always contain a whole number of bits.
However, when p is not a whole number of bits, an extra 1/8th bit is required
 to ensure that decoding the symbol will not bust the budget.
</t>
<t>
The reference implementation keeps track of the total number of whole bits that
 have been processed by the decoder so far in the variable nbits_total,
 including the (possibly fractional) number of bits that are currently
 buffered, but not consumed, inside the range coder.
nbits_total is initialized to 9 just before the initial range renormalization
 process completes (or equivalently, it can be initialized to 33 after the
 first renormalization).
The extra two bits over the actual amount buffered by the range coder
 guarantee that it is an upper bound and that there is enough room for the
 encoder to terminate the stream.
Each iteration through the range coder's renormalization loop increases
 nbits_total by 8.
Reading raw bits increases nbits_total by the number of raw bits read.
</t>

<section anchor="ec_tell" title="ec_tell()">
<t>
The whole number of bits buffered in rng may be estimated via lg = ilog(rng).
ec_tell() then becomes a simple matter of removing these bits from the total.
It returns (nbits_total - lg).
</t>
<t>
In a newly initialized decoder, before any symbols have been read, this reports
 that 1 bit has been used.
This is the bit reserved for termination of the encoder.
</t>
</section>

<section anchor="ec_tell_frac" title="ec_tell_frac()">
<t>
ec_tell_frac() estimates the number of bits buffered in rng to fractional
 precision.
Since rng must be greater than 2**23 after renormalization, lg must be at least
 24.
Let
<figure align="center">
<artwork align="center">
<![CDATA[
r_Q15 = rng >> (lg-16) ,
]]></artwork>
</figure>
 so that 32768 &lt;= r_Q15 &lt; 65536, an unsigned Q15 value representing the
 fractional part of rng.
Then the following procedure can be used to add one bit of precision to lg.
First, update
<figure align="center">
<artwork align="center">
<![CDATA[
r_Q15 = (r_Q15*r_Q15) >> 15 .
]]></artwork>
</figure>
Then add the 16th bit of r_Q15 to lg via
<figure align="center">
<artwork align="center">
<![CDATA[
lg = 2*lg + (r_Q15 >> 16) .
]]></artwork>
</figure>
Finally, if this bit was a 1, reduce r_Q15 by a factor of two via
<figure align="center">
<artwork align="center">
<![CDATA[
r_Q15 = r_Q15 >> 1 ,
]]></artwork>
</figure>
 so that it once again lies in the range 32768 &lt;= r_Q15 &lt; 65536.
</t>
<t>
This procedure is repeated three times to extend lg to 1/8th bit precision.
ec_tell_frac() then returns (nbits_total*8 - lg).
</t>
</section>

</section>

</section>

<section anchor="silk_decoder_outline" title="SILK Decoder">
<t>
The decoder's LP layer uses a modified version of the SILK codec (herein simply
 called "SILK"), which runs a decoded excitation signal through adaptive
 long-term and short-term prediction synthesis filters.
It runs at NB, MB, and WB sample rates internally.
When used in a SWB or FB Hybrid frame, the LP layer itself still only runs in
 WB.
</t>

<section title="SILK Decoder Modules">
<t>
An overview of the decoder is given in <xref target="silk_decoder_figure"/>.
</t>
<figure align="center" anchor="silk_decoder_figure" title="SILK Decoder">
<artwork align="center">
<![CDATA[
   +---------+    +------------+
-->| Range   |--->| Decode     |---------------------------+
 1 | Decoder | 2  | Parameters |----------+       5        |
   +---------+    +------------+     4    |                |
                       3 |                |                |
                        \/               \/               \/
                  +------------+   +------------+   +------------+
                  | Generate   |-->| LTP        |-->| LPC        |
                  | Excitation |   | Synthesis  |   | Synthesis  |
                  +------------+   +------------+   +------------+
                                          ^                |
                                          |                |
                      +-------------------+----------------+
                      |                                      6
                      |   +------------+   +-------------+
                      +-->| Stereo     |-->| Sample Rate |-->
                          | Unmixing   | 7 | Conversion  | 8
                          +------------+   +-------------+

1: Range encoded bitstream
2: Coded parameters
3: Pulses, LSBs, and signs
4: Pitch lags, Long-Term Prediction (LTP) coefficients
5: Linear Predictive Coding (LPC) coefficients and gains
6: Decoded signal (mono or mid-side stereo)
7: Unmixed signal (mono or left-right stereo)
8: Resampled signal
]]>
</artwork>
</figure>

<t>
The decoder feeds the bitstream (1) to the range decoder from
 <xref target="range-decoder"/>, and then decodes the parameters in it (2)
 using the procedures detailed in
 Sections <xref format="counter" target="silk_header_bits"/>
 through <xref format="counter" target="silk_signs"/>.
These parameters (3, 4, 5) are used to generate an excitation signal (see
 <xref target="silk_excitation_reconstruction"/>), which is fed to an optional
 long-term prediction (LTP) filter (voiced frames only, see
 <xref target="silk_ltp_synthesis"/>) and then a short-term prediction filter
 (see <xref target="silk_lpc_synthesis"/>), producing the decoded signal (6).
For stereo streams, the mid-side representation is converted to separate left
 and right channels (7).
The result is finally resampled to the desired output sample rate (e.g.,
 48 kHz) so that the resampled signal (8) can be mixed with the CELT
 layer.
</t>

</section>

<section anchor="silk_layer_organization" title="LP Layer Organization">

<t>
Internally, the LP layer of a single Opus frame is composed of either a single
 10 ms regular SILK frame or between one and three 20 ms regular SILK
 frames.
A stereo Opus frame may double the number of regular SILK frames (up to a total
 of six), since it includes separate frames for a mid channel and, optionally,
 a side channel.
Optional Low Bit-Rate Redundancy (LBRR) frames, which are reduced-bitrate
 encodings of previous SILK frames, may be included to aid in recovery from
 packet loss.
If present, these appear before the regular SILK frames.
They are in most respects identical to regular, active SILK frames, except that
 they are usually encoded with a lower bitrate.
This draft uses "SILK frame" to refer to either one and "regular SILK frame" if
 it needs to draw a distinction between the two.
</t>
<t>
Logically, each SILK frame is in turn composed of either two or four 5 ms
 subframes.
Various parameters, such as the quantization gain of the excitation and the
 pitch lag and filter coefficients can vary on a subframe-by-subframe basis.
Physically, the parameters for each subframe are interleaved in the bitstream,
 as described in the relevant sections for each parameter.
</t>
<t>
All of these frames and subframes are decoded from the same range coder, with
 no padding between them.
Thus packing multiple SILK frames in a single Opus frame saves, on average,
 half a byte per SILK frame.
It also allows some parameters to be predicted from prior SILK frames in the
 same Opus frame, since this does not degrade packet loss robustness (beyond
 any penalty for merely using fewer, larger packets to store multiple frames).
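The frame counts described in this section can be summarized with a small C
 sketch.
The function names silk_frames_per_channel() and silk_max_regular_frames()
 are hypothetical, and the sketch assumes the LP layer is only used with
 Opus frame durations of 10, 20, 40, or 60 ms.
<figure align="center">
<artwork align="left"><![CDATA[
```c
/* Regular SILK frames per channel for one Opus frame, or -1 for a
 * duration this sketch does not handle. */
static int silk_frames_per_channel(int opus_frame_ms) {
    switch (opus_frame_ms) {
    case 10: return 1;  /* a single 10 ms SILK frame */
    case 20: return 1;  /* one 20 ms SILK frame */
    case 40: return 2;  /* two 20 ms SILK frames */
    case 60: return 3;  /* three 20 ms SILK frames */
    default: return -1;
    }
}

/* Upper bound on regular SILK frames; for stereo the side channel
 * frames are optional, so the actual count may be lower. */
static int silk_max_regular_frames(int opus_frame_ms, int channels) {
    int n = silk_frames_per_channel(opus_frame_ms);
    return n < 0 ? -1 : n * channels;
}
```
]]></artwork>
</figure>
A 60 ms stereo Opus frame gives the maximum of six regular SILK frames.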
</t>

<t>
Stereo support in SILK uses a variant of mid-side coding, allowing a mono
 decoder to simply decode the mid channel.
However, the data for the two channels is interleaved, so a mono decoder must
 still unpack the data for the side channel.
It would be required to do so anyway for Hybrid Opus frames, or to support
 decoding individual 20 ms frames.
</t>

<t>
<xref target="silk_symbols"/> summarizes the overall grouping of the contents of
 the LP layer.
Figures <xref format="counter" target="silk_mono_60ms_frame"/>
 and <xref format="counter" target="silk_stereo_60ms_frame"/> illustrate
 the ordering of the various SILK frames for a 60 ms Opus frame, for
 mono and stereo, respectively.
</t>

<texttable anchor="silk_symbols"
 title="Organization of the SILK layer of an Opus frame">
<ttcol align="center">Symbol(s)</ttcol>
<ttcol align="center">PDF(s)</ttcol>
<ttcol align="center">Condition</ttcol>

<c>Voice Activity Detection (VAD) flags</c>
<c>{1, 1}/2</c>
<c/>

<c>LBRR flag</c>
<c>{1, 1}/2</c>
<c/>

<c>Per-frame LBRR flags</c>
<c><xref target="silk_lbrr_flag_pdfs"/></c>
<c><xref target="silk_lbrr_flags"/></c>

<c>LBRR Frame(s)</c>
<c><xref target="silk_frame"/></c>
<c><xref target="silk_lbrr_flags"/></c>

<c>Regular SILK Frame(s)</c>
<c><xref target="silk_frame"/></c>
<c/>

</texttable>

<figure align="center" anchor="silk_mono_60ms_frame"
 title="A 60 ms Mono Frame">
<artwork align="center"><![CDATA[
+---------------------------------+
|            VAD Flags            |
+---------------------------------+
|            LBRR Flag            |
+---------------------------------+
| Per-Frame LBRR Flags (Optional) |
+---------------------------------+
|     LBRR Frame 1 (Optional)     |
+---------------------------------+
|     LBRR Frame 2 (Optional)     |
+---------------------------------+
|     LBRR Frame 3 (Optional)     |
1683*a58d3d2aSXin Li+---------------------------------+ 1684*a58d3d2aSXin Li| Regular SILK Frame 1 | 1685*a58d3d2aSXin Li+---------------------------------+ 1686*a58d3d2aSXin Li| Regular SILK Frame 2 | 1687*a58d3d2aSXin Li+---------------------------------+ 1688*a58d3d2aSXin Li| Regular SILK Frame 3 | 1689*a58d3d2aSXin Li+---------------------------------+ 1690*a58d3d2aSXin Li]]></artwork> 1691*a58d3d2aSXin Li</figure> 1692*a58d3d2aSXin Li 1693*a58d3d2aSXin Li<figure align="center" anchor="silk_stereo_60ms_frame" 1694*a58d3d2aSXin Li title="A 60 ms Stereo Frame"> 1695*a58d3d2aSXin Li<artwork align="center"><![CDATA[ 1696*a58d3d2aSXin Li+---------------------------------------+ 1697*a58d3d2aSXin Li| Mid VAD Flags | 1698*a58d3d2aSXin Li+---------------------------------------+ 1699*a58d3d2aSXin Li| Mid LBRR Flag | 1700*a58d3d2aSXin Li+---------------------------------------+ 1701*a58d3d2aSXin Li| Side VAD Flags | 1702*a58d3d2aSXin Li+---------------------------------------+ 1703*a58d3d2aSXin Li| Side LBRR Flag | 1704*a58d3d2aSXin Li+---------------------------------------+ 1705*a58d3d2aSXin Li| Mid Per-Frame LBRR Flags (Optional) | 1706*a58d3d2aSXin Li+---------------------------------------+ 1707*a58d3d2aSXin Li| Side Per-Frame LBRR Flags (Optional) | 1708*a58d3d2aSXin Li+---------------------------------------+ 1709*a58d3d2aSXin Li| Mid LBRR Frame 1 (Optional) | 1710*a58d3d2aSXin Li+---------------------------------------+ 1711*a58d3d2aSXin Li| Side LBRR Frame 1 (Optional) | 1712*a58d3d2aSXin Li+---------------------------------------+ 1713*a58d3d2aSXin Li| Mid LBRR Frame 2 (Optional) | 1714*a58d3d2aSXin Li+---------------------------------------+ 1715*a58d3d2aSXin Li| Side LBRR Frame 2 (Optional) | 1716*a58d3d2aSXin Li+---------------------------------------+ 1717*a58d3d2aSXin Li| Mid LBRR Frame 3 (Optional) | 1718*a58d3d2aSXin Li+---------------------------------------+ 1719*a58d3d2aSXin Li| Side LBRR Frame 3 (Optional) | 1720*a58d3d2aSXin 
Li+---------------------------------------+ 1721*a58d3d2aSXin Li| Mid Regular SILK Frame 1 | 1722*a58d3d2aSXin Li+---------------------------------------+ 1723*a58d3d2aSXin Li| Side Regular SILK Frame 1 (Optional) | 1724*a58d3d2aSXin Li+---------------------------------------+ 1725*a58d3d2aSXin Li| Mid Regular SILK Frame 2 | 1726*a58d3d2aSXin Li+---------------------------------------+ 1727*a58d3d2aSXin Li| Side Regular SILK Frame 2 (Optional) | 1728*a58d3d2aSXin Li+---------------------------------------+ 1729*a58d3d2aSXin Li| Mid Regular SILK Frame 3 | 1730*a58d3d2aSXin Li+---------------------------------------+ 1731*a58d3d2aSXin Li| Side Regular SILK Frame 3 (Optional) | 1732*a58d3d2aSXin Li+---------------------------------------+ 1733*a58d3d2aSXin Li]]></artwork> 1734*a58d3d2aSXin Li</figure> 1735*a58d3d2aSXin Li 1736*a58d3d2aSXin Li</section> 1737*a58d3d2aSXin Li 1738*a58d3d2aSXin Li<section anchor="silk_header_bits" title="Header Bits"> 1739*a58d3d2aSXin Li<t> 1740*a58d3d2aSXin LiThe LP layer begins with two to eight header bits, decoded in silk_Decode() 1741*a58d3d2aSXin Li (dec_API.c). 1742*a58d3d2aSXin LiThese consist of one Voice Activity Detection (VAD) bit per frame (up to 3), 1743*a58d3d2aSXin Li followed by a single flag indicating the presence of LBRR frames. 1744*a58d3d2aSXin LiFor a stereo packet, these first flags correspond to the mid channel, and a 1745*a58d3d2aSXin Li second set of flags is included for the side channel. 1746*a58d3d2aSXin Li</t> 1747*a58d3d2aSXin Li<t> 1748*a58d3d2aSXin LiBecause these are the first symbols decoded by the range coder and because they 1749*a58d3d2aSXin Li are coded as binary values with uniform probability, they can be extracted 1750*a58d3d2aSXin Li directly from the most significant bits of the first byte of compressed data. 1751*a58d3d2aSXin LiThus, a receiver can determine if an Opus frame contains any active SILK frames 1752*a58d3d2aSXin Li without the overhead of using the range decoder. 
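As a non-normative illustration of this shortcut, the following C sketch reads
 the header flags straight from the first byte.
The type silk_header_flags and the function silk_peek_header_flags() are
 hypothetical names that do not appear in the reference implementation.
The sketch assumes that the flags occupy the most significant bits in decoding
 order and that a 1 bit means the flag is set; the channel count and the number
 of SILK frames per channel are assumed to be known from the TOC byte.
<figure align="center">
<artwork align="left"><![CDATA[
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the SILK header bits of one Opus frame. */
typedef struct {
    int vad[2][3];  /* vad[channel][frame]; channel 0 = mid, 1 = side */
    int lbrr[2];    /* LBRR flag of each channel */
} silk_header_flags;

/* Assumption of this sketch: the flags are stored MSB-first in decoding
   order (mid VAD flags, mid LBRR flag, then the same for the side
   channel), and a 1 bit means the flag is set. */
static silk_header_flags silk_peek_header_flags(uint8_t first_byte,
                                                int channels, int frames)
{
    silk_header_flags h = { { { 0 } }, { 0 } };
    int bit = 7;  /* next bit to read, starting at the MSB */
    int ch, f;

    assert(channels >= 1 && channels <= 2);
    assert(frames >= 1 && frames <= 3);
    for (ch = 0; ch < channels; ch++) {
        for (f = 0; f < frames; f++) {
            h.vad[ch][f] = (first_byte >> bit) & 1;
            bit--;
        }
        h.lbrr[ch] = (first_byte >> bit) & 1;
        bit--;
    }
    return h;
}
]]></artwork>
</figure>
A receiver could, for instance, inspect the vad entries of the result to decide
 whether an Opus frame carries any active SILK frames before it starts the
 range decoder.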
</t>
</section>

<section anchor="silk_lbrr_flags" title="Per-Frame LBRR Flags">
<t>
For Opus frames longer than 20 ms, a set of LBRR flags is
 decoded for each channel that has its LBRR flag set.
Each set contains one flag per 20 ms SILK frame.
40 ms Opus frames use the 2-frame LBRR flag PDF from
 <xref target="silk_lbrr_flag_pdfs"/>, and 60 ms Opus frames use the
 3-frame LBRR flag PDF.
For each channel, the resulting 2- or 3-bit integer contains the corresponding
 LBRR flag for each frame, packed in order from the LSB to the MSB.
</t>

<texttable anchor="silk_lbrr_flag_pdfs" title="LBRR Flag PDFs">
<ttcol>Frame Size</ttcol>
<ttcol>PDF</ttcol>
<c>40 ms</c> <c>{0, 53, 53, 150}/256</c>
<c>60 ms</c> <c>{0, 41, 20, 29, 41, 15, 28, 82}/256</c>
</texttable>

<t>
A 10 or 20 ms Opus frame does not contain any per-frame LBRR flags,
 as there may be at most one LBRR frame per channel.
The global LBRR flag in the header bits (see <xref target="silk_header_bits"/>)
 is already sufficient to indicate the presence of that single LBRR frame.
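The rules of this section can be sketched in C as follows (non-normative).
The function silk_expand_lbrr_flags() is a hypothetical helper; its symbol
 argument is assumed to be the integer already decoded with the 2- or 3-frame
 PDF from <xref target="silk_lbrr_flag_pdfs"/>.
<figure align="center">
<artwork align="left"><![CDATA[
#include <assert.h>

/* Hypothetical helper: expand the LBRR information of one channel into
   one flag per SILK frame.
   global_lbrr: the channel's LBRR flag from the header bits.
   frames:      number of SILK frames per channel (1, 2, or 3).
   symbol:      value decoded with the 2- or 3-frame PDF; ignored when
                frames == 1 or global_lbrr == 0. */
static void silk_expand_lbrr_flags(int global_lbrr, int frames,
                                   int symbol, int flags[3])
{
    int f;

    assert(frames >= 1 && frames <= 3);
    for (f = 0; f < 3; f++)
        flags[f] = 0;
    if (!global_lbrr)
        return;
    if (frames == 1) {
        /* 10 or 20 ms: the header flag alone signals the LBRR frame. */
        flags[0] = 1;
        return;
    }
    /* Both PDFs assign zero probability to the value 0. */
    assert(symbol >= 1 && symbol < (1 << frames));
    for (f = 0; f < frames; f++)
        flags[f] = (symbol >> f) & 1;  /* packed from the LSB upward */
}
]]></artwork>
</figure>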
</t>

</section>

<section anchor="silk_lbrr_frames" title="LBRR Frames">
<t>
The LBRR frames, if present, contain an encoded representation of the signal
 immediately prior to the current Opus frame as if it were encoded with the
 current mode, frame size, audio bandwidth, and channel count, even if those
 differ from the prior Opus frame.
When one of these parameters changes from one Opus frame to the next, this
 implies that the LBRR frames of the current Opus frame may not be simple
 drop-in replacements for the contents of the previous Opus frame.
</t>

<t>
For example, when switching from 20 ms to 60 ms, the 60 ms Opus
 frame may contain LBRR frames covering up to three prior 20 ms Opus
 frames, even if those frames already contained LBRR frames covering some of
 the same time periods.
When switching from 20 ms to 10 ms, the 10 ms Opus frame can
 contain an LBRR frame covering at most half the prior 20 ms Opus frame,
 potentially leaving a hole that needs to be concealed from even a single
 packet loss (see <xref target="Packet Loss Concealment"/>).
When switching from mono to stereo, the LBRR frames in the first stereo Opus
 frame MAY contain a non-trivial side channel.
</t>

<t>
In order to properly produce LBRR frames under all conditions, an encoder might
 need to buffer up to 60 ms of audio and re-encode it during these
 transitions.
However, the reference implementation opts to disable LBRR frames at the
 transition point for simplicity.
Since transitions are relatively infrequent in normal usage, this does not have
 a significant impact on packet loss robustness.
</t>

<t>
The LBRR frames immediately follow the LBRR flags, prior to any regular SILK
 frames.
<xref target="silk_frame"/> describes their exact contents.
LBRR frames do not include their own separate VAD flags.
LBRR frames are only meant to be transmitted for active speech, so all LBRR
 frames are treated as active.
</t>

<t>
In a stereo Opus frame longer than 20 ms, although the per-frame LBRR
 flags for the mid channel are coded as a unit before the per-frame LBRR flags
 for the side channel, the LBRR frames themselves are interleaved.
The decoder parses an LBRR frame for the mid channel of a given 20 ms
 interval (if present) and then immediately parses the corresponding LBRR
 frame for the side channel (if present), before proceeding to the next
 20 ms interval.
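The interleaving described above amounts to iterating over time intervals in
 the outer loop and over channels in the inner loop, as the following
 non-normative C sketch shows.
The type silk_lbrr_slot and the function silk_lbrr_parse_order() are
 hypothetical; flags[channel][frame] is assumed to hold the per-frame LBRR
 flags of each channel.
<figure align="center">
<artwork align="left"><![CDATA[
#include <assert.h>

/* Hypothetical record naming one LBRR frame in the bitstream. */
typedef struct {
    int channel;  /* 0 = mid, 1 = side */
    int frame;    /* index of the 20 ms interval */
} silk_lbrr_slot;

/* Writes the LBRR frames that are present, in parsing order, to
   order[] and returns how many there are. */
static int silk_lbrr_parse_order(int channels, int frames,
                                 int flags[2][3],
                                 silk_lbrr_slot order[6])
{
    int count = 0;
    int ch, f;

    assert(channels >= 1 && channels <= 2);
    assert(frames >= 1 && frames <= 3);
    for (f = 0; f < frames; f++) {          /* one 20 ms interval */
        for (ch = 0; ch < channels; ch++) { /* mid first, then side */
            if (flags[ch][f]) {
                order[count].channel = ch;
                order[count].frame = f;
                count++;
            }
        }
    }
    return count;
}
]]></artwork>
</figure>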
</t>
</section>

<section anchor="silk_regular_frames" title="Regular SILK Frames">
<t>
The regular SILK frame(s) follow the LBRR frames (if any).
<xref target="silk_frame"/> describes their contents, as well.
Unlike the LBRR frames, a regular SILK frame is coded for each time interval in
 an Opus frame, even if the corresponding VAD flags are unset.
For stereo Opus frames longer than 20 ms, the regular mid and side SILK
 frames for each 20 ms interval are interleaved, just as with the LBRR
 frames.
The side frame may be skipped by coding an appropriate flag, as detailed in
 <xref target="silk_mid_only_flag"/>.
</t>
</section>

<section anchor="silk_frame" title="SILK Frame Contents">
<t>
Each SILK frame includes a set of side information that encodes
<list style="symbols">
<t>The frame type and quantization type (<xref target="silk_frame_type"/>),</t>
<t>Quantization gains (<xref target="silk_gains"/>),</t>
<t>Short-term prediction filter coefficients (<xref target="silk_nlsfs"/>),</t>
<t>A Line Spectral Frequencies (LSF) interpolation weight (<xref target="silk_nlsf_interpolation"/>),</t>
<t>
Long-term prediction filter lags and gains (<xref target="silk_ltp_params"/>),
 and
</t>
<t>A linear congruential generator (LCG) seed (<xref target="silk_seed"/>).</t>
</list>
The quantized excitation signal (see <xref target="silk_excitation"/>)
follows
 these at the end of the frame.
<xref target="silk_frame_symbols"/> details the overall organization of a
 SILK frame.
</t>

<texttable anchor="silk_frame_symbols"
 title="Order of the symbols in an individual SILK frame">
<ttcol align="center">Symbol(s)</ttcol>
<ttcol align="center">PDF(s)</ttcol>
<ttcol align="center">Condition</ttcol>

<c>Stereo Prediction Weights</c>
<c><xref target="silk_stereo_pred_pdfs"/></c>
<c><xref target="silk_stereo_pred"/></c>

<c>Mid-only Flag</c>
<c><xref target="silk_mid_only_pdf"/></c>
<c><xref target="silk_mid_only_flag"/></c>

<c>Frame Type</c>
<c><xref target="silk_frame_type"/></c>
<c/>

<c>Subframe Gains</c>
<c><xref target="silk_gains"/></c>
<c/>

<c>Normalized LSF Stage-1 Index</c>
<c><xref target="silk_nlsf_stage1_pdfs"/></c>
<c/>

<c>Normalized LSF Stage-2 Residual</c>
<c><xref target="silk_nlsf_stage2"/></c>
<c/>

<c>Normalized LSF Interpolation Weight</c>
<c><xref target="silk_nlsf_interp_pdf"/></c>
<c>20 ms frame</c>

<c>Primary Pitch Lag</c>
<c><xref target="silk_ltp_lags"/></c>
<c>Voiced frame</c>

<c>Subframe Pitch Contour</c>
<c><xref target="silk_pitch_contour_pdfs"/></c>
<c>Voiced frame</c>

<c>Periodicity Index</c>
<c><xref target="silk_perindex_pdf"/></c>
<c>Voiced frame</c>

<c>LTP Filter</c>
<c><xref target="silk_ltp_filter_pdfs"/></c>
<c>Voiced frame</c>

<c>LTP Scaling</c>
<c><xref target="silk_ltp_scaling_pdf"/></c>
<c><xref target="silk_ltp_scaling"/></c>

<c>LCG Seed</c>
<c><xref target="silk_seed_pdf"/></c>
<c/>

<c>Excitation Rate Level</c>
<c><xref target="silk_rate_level_pdfs"/></c>
<c/>

<c>Excitation Pulse Counts</c>
<c><xref target="silk_pulse_count_pdfs"/></c>
<c/>

<c>Excitation Pulse Locations</c>
<c><xref target="silk_pulse_locations"/></c>
<c>Non-zero pulse count</c>

<c>Excitation LSBs</c>
<c><xref target="silk_shell_lsb_pdf"/></c>
<c><xref target="silk_pulse_counts"/></c>

<c>Excitation Signs</c>
<c><xref target="silk_sign_pdfs"/></c>
<c/>

</texttable>

<section anchor="silk_stereo_pred" toc="include"
 title="Stereo Prediction Weights">
<t>
A SILK frame corresponding to the mid channel of a stereo Opus frame begins
 with a pair of side channel prediction weights, designed such
that zeros
 indicate normal mid-side coupling.
Since these weights can change on every frame, the first portion of each frame
 linearly interpolates between the previous weights and the current ones, using
 zeros for the previous weights if none are available.
These prediction weights are never included in a mono Opus frame, and the
 previous weights are reset to zeros on any transition from mono to stereo.
They are also not included in an LBRR frame for the side channel, even if the
 LBRR flags indicate the corresponding mid channel was not coded.
In that case, the previous weights are used, again substituting in zeros if no
 previous weights are available since the last decoder reset
 (see <xref target="decoder-reset"/>).
</t>

<t>
To summarize, these weights are coded if and only if
<list style="symbols">
<t>This is a stereo Opus frame (<xref target="toc_byte"/>), and</t>
<t>The current SILK frame corresponds to the mid channel.</t>
</list>
</t>

<t>
The prediction weights are coded in three separate pieces, which are decoded
 by silk_stereo_decode_pred() (decode_stereo_pred.c).
The first piece jointly codes the high-order part of a table index for both
 weights.
The second piece codes the low-order part of each table index.
The third piece codes an offset used to linearly interpolate between table
 indices.
The details are as follows.
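As a non-normative preview, the following C sketch shows how the five decoded
 symbols combine into the two weights according to the steps that the remainder
 of this section specifies.
The function name silk_stereo_weights() is hypothetical;
 silk_stereo_decode_pred() remains the reference.
The inputs n, i0, i1, i2, and i3 are assumed to have been read from the range
 decoder already, in that order, and the table values are those of
 <xref target="silk_stereo_weights_table"/>.
<figure align="center">
<artwork align="left"><![CDATA[
#include <assert.h>
#include <stdint.h>

/* Stereo weight table in Q13 (16 entries). */
static const int32_t w_Q13[16] = {
    -13732, -10050, -8266, -7526, -6500, -5000, -2950, -820,
       820,   2950,  5000,  6500,  7526,  8266, 10050, 13732
};

/* Hypothetical helper: n comes from the stage-1 PDF, i0 and i2 from
   the stage-2 PDF, i1 and i3 from the stage-3 PDF. */
static void silk_stereo_weights(int n, int i0, int i1, int i2, int i3,
                                int32_t *w0_Q13, int32_t *w1_Q13)
{
    int wi0, wi1;
    int32_t step0, step1;

    assert(n >= 0 && n < 25);
    assert(i0 >= 0 && i0 < 3 && i2 >= 0 && i2 < 3);
    assert(i1 >= 0 && i1 < 5 && i3 >= 0 && i3 < 5);

    wi0 = i0 + 3*(n/5);  /* integer division */
    wi1 = i2 + 3*(n%5);
    /* About one tenth of the gap to the next table entry (6554 is
       roughly 0.1 in Q16).  The table is increasing, so the shifted
       values are never negative. */
    step0 = ((w_Q13[wi0 + 1] - w_Q13[wi0])*6554) >> 16;
    step1 = ((w_Q13[wi1 + 1] - w_Q13[wi1])*6554) >> 16;
    /* w1 is computed first because w0 depends on it. */
    *w1_Q13 = w_Q13[wi1] + step1*(2*i3 + 1);
    *w0_Q13 = w_Q13[wi0] + step0*(2*i1 + 1) - *w1_Q13;
}
]]></artwork>
</figure>
With n = 12, i0 = i2 = 1, and i1 = i3 = 2, both weights evaluate to zero, which
 is the value that indicates normal mid-side coupling.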
</t>

<t>
Let n be an index decoded with the 25-element stage-1 PDF in
 <xref target="silk_stereo_pred_pdfs"/>.
Then let i0 and i1 be indices decoded with the stage-2 and stage-3 PDFs in
 <xref target="silk_stereo_pred_pdfs"/>, respectively, and let i2 and i3
 be two more indices decoded with the stage-2 and stage-3 PDFs, all in that
 order.
</t>

<texttable anchor="silk_stereo_pred_pdfs" title="Stereo Weight PDFs">
<ttcol align="left">Stage</ttcol>
<ttcol align="left">PDF</ttcol>
<c>Stage 1</c>
<c>{7, 2, 1, 1, 1,
 10, 24, 8, 1, 1,
 3, 23, 92, 23, 3,
 1, 1, 8, 24, 10,
 1, 1, 1, 2, 7}/256</c>

<c>Stage 2</c>
<c>{85, 86, 85}/256</c>

<c>Stage 3</c>
<c>{51, 51, 52, 51, 51}/256</c>
</texttable>

<t>
Then use n, i0, and i2 to form two table indices, wi0 and wi1, according to
<figure align="center">
<artwork align="center"><![CDATA[
wi0 = i0 + 3*(n/5)
wi1 = i2 + 3*(n%5)
]]></artwork>
</figure>
 where the division is integer division.
The range of these indices is 0 to 14, inclusive.
Let w_Q13[i] be the i'th weight from <xref target="silk_stereo_weights_table"/>.
Then the two prediction weights, w0_Q13 and w1_Q13, are
<figure align="center">
<artwork align="center"><![CDATA[
w1_Q13 = w_Q13[wi1]
         + (((w_Q13[wi1+1] - w_Q13[wi1])*6554) >> 16)*(2*i3 + 1)

w0_Q13 = w_Q13[wi0]
         + (((w_Q13[wi0+1] - w_Q13[wi0])*6554) >> 16)*(2*i1 + 1)
         - w1_Q13
]]></artwork>
</figure>
N.b., w1_Q13 is computed first here, because w0_Q13 depends on it.
The constant 6554 is approximately 0.1 in Q16.
Although wi0 and wi1 only have 15 possible values,
 <xref target="silk_stereo_weights_table"/> contains 16 entries to allow
 interpolation between entry wi0 and (wi0 + 1) (and likewise for wi1).
</t>

<texttable anchor="silk_stereo_weights_table"
 title="Stereo Weight Table">
<ttcol align="left">Index</ttcol>
<ttcol align="right">Weight (Q13)</ttcol>
 <c>0</c> <c>-13732</c>
 <c>1</c> <c>-10050</c>
 <c>2</c> <c>-8266</c>
 <c>3</c> <c>-7526</c>
 <c>4</c> <c>-6500</c>
 <c>5</c> <c>-5000</c>
 <c>6</c> <c>-2950</c>
 <c>7</c> <c>-820</c>
 <c>8</c> <c>820</c>
 <c>9</c> <c>2950</c>
<c>10</c> <c>5000</c>
<c>11</c> <c>6500</c>
<c>12</c> <c>7526</c>
<c>13</c> <c>8266</c>
<c>14</c> <c>10050</c>
<c>15</c> <c>13732</c>
</texttable>

</section>

<section anchor="silk_mid_only_flag" toc="include" title="Mid-only Flag">
<t>
A flag appears after the stereo prediction weights that indicates if only the
 mid channel is coded for this time interval.
It appears only when
<list style="symbols">
<t>This is a stereo Opus frame (see <xref target="toc_byte"/>),</t>
<t>The current SILK frame corresponds to the mid channel, and</t>
<t>Either
<list style="symbols">
<t>This is a regular SILK frame where the VAD flags
 (see <xref target="silk_header_bits"/>) indicate that the corresponding side
 channel is not active.</t>
<t>
This is an LBRR frame where the LBRR flags
 (see <xref target="silk_header_bits"/> and <xref target="silk_lbrr_flags"/>)
 indicate that the corresponding side channel is not coded.
</t>
</list>
</t>
</list>
It is omitted when there are no stereo weights, for all of the same reasons.
It is also omitted for a regular SILK frame when the VAD flag of the
 corresponding side channel frame is set (indicating it is active).
The side channel must be coded in this case, making the mid-only flag
 redundant.
It is also omitted for an LBRR frame when the corresponding LBRR flags
 indicate the side channel is coded.
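These conditions reduce to a small predicate, sketched below in C for
 illustration only.
The function silk_mid_only_flag_present() and its parameter names are
 hypothetical; side_lbrr is assumed to be nonzero when the LBRR flags indicate
 that the side channel is coded for the time interval in question.
<figure align="center">
<artwork align="left"><![CDATA[
/* Hypothetical predicate: returns 1 if a mid-only flag is coded in the
   current SILK frame, and 0 otherwise.
   stereo:    nonzero for a stereo Opus frame.
   is_mid:    nonzero if the SILK frame belongs to the mid channel.
   is_lbrr:   nonzero for an LBRR frame, zero for a regular frame.
   side_vad:  VAD flag of the side channel for this interval.
   side_lbrr: nonzero if the side channel LBRR frame is coded. */
static int silk_mid_only_flag_present(int stereo, int is_mid,
                                      int is_lbrr, int side_vad,
                                      int side_lbrr)
{
    if (!stereo || !is_mid)
        return 0;           /* no stereo weights, so no flag either */
    if (is_lbrr)
        return !side_lbrr;  /* present only if the side is not coded */
    return !side_vad;       /* present only if the side is inactive */
}
]]></artwork>
</figure>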
</t>

<t>
When the flag is present, the decoder reads a single value using the PDF in
 <xref target="silk_mid_only_pdf"/>, as implemented in
 silk_stereo_decode_mid_only() (decode_stereo_pred.c).
If the flag is set, then there is no corresponding SILK frame for the side
 channel, the entire decoding process for the side channel is skipped, and
 zeros are fed to the stereo unmixing process (see
 <xref target="silk_stereo_unmixing"/>) instead.
As stated above, LBRR frames still include this flag when the LBRR flag
 indicates that the side channel is not coded.
In that case, if this flag is zero (indicating that there should be a side
 channel), then Packet Loss Concealment (PLC, see
 <xref target="Packet Loss Concealment"/>) SHOULD be invoked to recover a
 side channel signal.
Otherwise, the stereo image will collapse.
</t>

<texttable anchor="silk_mid_only_pdf" title="Mid-only Flag PDF">
<ttcol align="left">PDF</ttcol>
<c>{192, 64}/256</c>
</texttable>

</section>

<section anchor="silk_frame_type" toc="include" title="Frame Type">
<t>
Each SILK frame contains a single "frame type" symbol that jointly codes the
 signal type and quantization offset type of the corresponding frame.
If the current frame is a regular SILK frame whose VAD flag was not set (an
 "inactive" frame), then the frame type symbol takes on a value of either 0 or
 1 and is decoded using the first PDF in <xref target="silk_frame_type_pdfs"/>.
If the frame is an LBRR frame or a regular SILK frame whose VAD flag was set
 (an "active" frame), then the value of the symbol may range from 2 to 5,
 inclusive, and is decoded using the second PDF in
 <xref target="silk_frame_type_pdfs"/>.
<xref target="silk_frame_type_table"/> translates between the value of the
 frame type symbol and the corresponding signal type and quantization offset
 type.
</t>

<texttable anchor="silk_frame_type_pdfs" title="Frame Type PDFs">
<ttcol>VAD Flag</ttcol>
<ttcol>PDF</ttcol>
<c>Inactive</c> <c>{26, 230, 0, 0, 0, 0}/256</c>
<c>Active</c> <c>{0, 0, 24, 74, 148, 10}/256</c>
</texttable>

<texttable anchor="silk_frame_type_table"
 title="Signal Type and Quantization Offset Type from Frame Type">
<ttcol>Frame Type</ttcol>
<ttcol>Signal Type</ttcol>
<ttcol align="right">Quantization Offset Type</ttcol>
<c>0</c> <c>Inactive</c> <c>Low</c>
<c>1</c> <c>Inactive</c> <c>High</c>
<c>2</c> <c>Unvoiced</c> <c>Low</c>
<c>3</c> <c>Unvoiced</c> <c>High</c>
<c>4</c> <c>Voiced</c> <c>Low</c>
<c>5</c> <c>Voiced</c> <c>High</c>
</texttable>

</section>

<section anchor="silk_gains" toc="include" title="Subframe Gains">
<t>
A separate quantization gain is coded for each 5 ms subframe.
These gains control the step size between quantization levels of the excitation
 signal and, therefore, the quality of the reconstruction.
They are independent of and unrelated to the pitch contours coded for voiced
 frames.
The quantization gains are themselves uniformly quantized to 6 bits on a
 log scale, giving them a resolution of approximately 1.369 dB and a range
 of approximately 1.94 dB to 88.21 dB.
</t>
<t>
The subframe gains are either coded independently, or relative to the gain from
 the most recent coded subframe in the same channel.
Independent coding is used if and only if
<list style="symbols">
<t>
This is the first subframe in the current SILK frame, and
</t>
<t>Either
<list style="symbols">
<t>
This is the first SILK frame of its type (LBRR or regular) for this channel in
 the current Opus frame, or
 </t>
<t>
The previous SILK frame of the same type (LBRR or regular) for this channel in
 the same Opus frame was not coded.
</t>
</list>
</t>
</list>
</t>

<t>
In an independently coded subframe gain, the 3 most significant bits of the
 quantization gain are decoded using a PDF selected from
 <xref target="silk_independent_gain_msb_pdfs"/> based on the decoded signal
 type (see <xref target="silk_frame_type"/>).
</t>

<texttable anchor="silk_independent_gain_msb_pdfs"
 title="PDFs for Independent Quantization Gain MSB Coding">
<ttcol align="left">Signal Type</ttcol>
<ttcol align="left">PDF</ttcol>
<c>Inactive</c> <c>{32, 112, 68, 29, 12, 1, 1, 1}/256</c>
<c>Unvoiced</c> <c>{2, 17, 45, 60, 62, 47, 19, 4}/256</c>
<c>Voiced</c> <c>{1, 3, 26, 71, 94, 50, 9, 2}/256</c>
</texttable>

<t>
The 3 least significant bits are decoded using a uniform PDF:
</t>
<texttable anchor="silk_independent_gain_lsb_pdf"
 title="PDF for Independent Quantization Gain LSB Coding">
<ttcol align="left">PDF</ttcol>
<c>{32, 32, 32, 32, 32, 32, 32, 32}/256</c>
</texttable>

<t>
These 6 bits are combined to form a value, gain_index, between 0 and 63.
When the gain for the previous subframe is available, then the current gain is
 limited as follows:
<figure align="center">
<artwork align="center"><![CDATA[
log_gain = max(gain_index, previous_log_gain - 16) .
]]></artwork>
</figure>
This may help some implementations limit the change in precision of their
 internal LTP history.
The indices to which this clamp applies cannot simply be removed from the
 codebook, because previous_log_gain will not be available after packet loss.
The clamping is skipped after a decoder reset, and in the side channel if the
 previous frame in the side channel was not coded, since there is no value for
 previous_log_gain available.
It MAY also be skipped after packet loss.
</t>

<t>
For subframes that do not have an independent gain (including the first
 subframe of frames not listed as using independent coding above), the
 quantization gain is coded relative to the gain from the previous subframe (in
 the same channel).
The PDF in <xref target="silk_delta_gain_pdf"/> yields a delta_gain_index value
 between 0 and 40, inclusive.
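As a non-normative illustration, the following C sketch strings together the
 independent update above with the delta update and the dequantization that the
 remainder of this section specifies.
All function names are hypothetical, and a negative previous_log_gain is used
 here, purely as a convention of the sketch, to signal that no previous gain is
 available.
The exponential follows the formula given for silk_log2lin() later in this
 section and is only exercised over the input range that gains produce.
<figure align="center">
<artwork align="left"><![CDATA[
#include <stdint.h>

/* Independent coding: msb and lsb are the two decoded 3-bit symbols.
   Sketch convention: previous_log_gain < 0 means "not available". */
static int log_gain_independent(int msb, int lsb, int previous_log_gain)
{
    int gain_index = (msb << 3) | lsb;  /* 0..63 */

    if (previous_log_gain < 0)
        return gain_index;              /* clamping is skipped */
    return gain_index > previous_log_gain - 16
         ? gain_index : previous_log_gain - 16;
}

/* Delta coding: delta_gain_index is in 0..40. */
static int log_gain_delta(int delta_gain_index, int previous_log_gain)
{
    int a = 2*delta_gain_index - 16;
    int b = previous_log_gain + delta_gain_index - 4;
    int g = a > b ? a : b;

    return g < 0 ? 0 : (g > 63 ? 63 : g);
}

/* Approximates 2**(inLog_Q7/128.0) for the inputs produced by
   gain_q16_sketch() below (2090 to 3923). */
static int32_t log2lin_sketch(int32_t inLog_Q7)
{
    int32_t i = inLog_Q7 >> 7;   /* integer part */
    int32_t f = inLog_Q7 & 127;  /* fractional part */
    /* Equals (-174*f*(128-f)) >> 16 with an arithmetic shift, but is
       written so that no negative value is ever shifted. */
    int32_t corr = -((174*f*(128 - f) + 65535) >> 16);

    return ((int32_t)1 << i) + (corr + f)*(((int32_t)1 << i) >> 7);
}

/* Converts log_gain (0..63) into a linear Q16 scale factor. */
static int32_t gain_q16_sketch(int log_gain)
{
    return log2lin_sketch(((0x1D1C71*log_gain) >> 16) + 2090);
}
]]></artwork>
</figure>
For log_gain values of 0 and 63, gain_q16_sketch() returns 81920 and
 1686110208, which are the two ends of the Q16 gain range stated at the end of
 this section.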
</t>
<texttable anchor="silk_delta_gain_pdf"
 title="PDF for Delta Quantization Gain Coding">
<ttcol align="left">PDF</ttcol>
<c>{6, 5, 11, 31, 132, 21, 8, 4,
 3, 2, 2, 2, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
</texttable>
<t>
The following formula translates this index into a quantization gain for the
 current subframe using the gain from the previous subframe:
<figure align="center">
<artwork align="center"><![CDATA[
log_gain = clamp(0, max(2*delta_gain_index - 16,
                 previous_log_gain + delta_gain_index - 4), 63) .
]]></artwork>
</figure>
</t>
<t>
silk_gains_dequant() (gain_quant.c) dequantizes log_gain for the k'th subframe
 and converts it into a linear Q16 scale factor via
<figure align="center">
<artwork align="center"><![CDATA[
gain_Q16[k] = silk_log2lin((0x1D1C71*log_gain>>16) + 2090)
]]></artwork>
</figure>
</t>
<t>
The function silk_log2lin() (log2lin.c) computes an approximation of
 2**(inLog_Q7/128.0), where inLog_Q7 is its Q7 input.
Let i = inLog_Q7&gt;&gt;7 be the integer part of inLog_Q7 and
 f = inLog_Q7&amp;127 be the fractional part.
2279*a58d3d2aSXin LiThen 2280*a58d3d2aSXin Li<figure align="center"> 2281*a58d3d2aSXin Li<artwork align="center"><![CDATA[ 2282*a58d3d2aSXin Li(1<<i) + ((-174*f*(128-f)>>16)+f)*((1<<i)>>7) 2283*a58d3d2aSXin Li]]></artwork> 2284*a58d3d2aSXin Li</figure> 2285*a58d3d2aSXin Li yields the approximate exponential. 2286*a58d3d2aSXin LiThe final Q16 gain values lie between 81920 and 1686110208, inclusive 2287*a58d3d2aSXin Li (representing scale factors of 1.25 to 25728, respectively). 2288*a58d3d2aSXin Li</t> 2289*a58d3d2aSXin Li</section> 2290*a58d3d2aSXin Li 2291*a58d3d2aSXin Li<section anchor="silk_nlsfs" toc="include" title="Normalized Line Spectral 2292*a58d3d2aSXin Li Frequency (LSF) and Linear Predictive Coding (LPC) Coefficients"> 2293*a58d3d2aSXin Li<t> 2294*a58d3d2aSXin LiA set of normalized Line Spectral Frequency (LSF) coefficients follows the 2295*a58d3d2aSXin Li quantization gains in the bitstream, and represents the Linear Predictive 2296*a58d3d2aSXin Li Coding (LPC) coefficients for the current SILK frame. 2297*a58d3d2aSXin LiOnce decoded, the normalized LSFs form an increasing list of Q15 values between 2298*a58d3d2aSXin Li 0 and 1. 2299*a58d3d2aSXin LiThese represent the interleaved zeros on the upper half of the unit circle 2300*a58d3d2aSXin Li (between 0 and pi, hence "normalized") in the standard decomposition 2301*a58d3d2aSXin Li <xref target="line-spectral-pairs"/> of the LPC filter into a symmetric part 2302*a58d3d2aSXin Li and an anti-symmetric part (P and Q in <xref target="silk_nlsf2lpc"/>). 2303*a58d3d2aSXin LiBecause of non-linear effects in the decoding process, an implementation SHOULD 2304*a58d3d2aSXin Li match the fixed-point arithmetic described in this section exactly. 2305*a58d3d2aSXin LiAn encoder SHOULD also use the same process. 
2306*a58d3d2aSXin Li</t> 2307*a58d3d2aSXin Li<t> 2308*a58d3d2aSXin LiThe normalized LSFs are coded using a two-stage vector quantizer (VQ) 2309*a58d3d2aSXin Li (<xref target="silk_nlsf_stage1"/> and <xref target="silk_nlsf_stage2"/>). 2310*a58d3d2aSXin LiNB and MB frames use an order-10 predictor, while WB frames use an order-16 2311*a58d3d2aSXin Li predictor, and thus have different sets of tables. 2312*a58d3d2aSXin LiAfter reconstructing the normalized LSFs 2313*a58d3d2aSXin Li (<xref target="silk_nlsf_reconstruction"/>), the decoder runs them through a 2314*a58d3d2aSXin Li stabilization process (<xref target="silk_nlsf_stabilization"/>), interpolates 2315*a58d3d2aSXin Li them between frames (<xref target="silk_nlsf_interpolation"/>), converts them 2316*a58d3d2aSXin Li back into LPC coefficients (<xref target="silk_nlsf2lpc"/>), and then runs 2317*a58d3d2aSXin Li them through further processes to limit the range of the coefficients 2318*a58d3d2aSXin Li (<xref target="silk_lpc_range_limit"/>) and the gain of the filter 2319*a58d3d2aSXin Li (<xref target="silk_lpc_gain_limit"/>). 2320*a58d3d2aSXin LiAll of this is necessary to ensure the reconstruction process is stable. 2321*a58d3d2aSXin Li</t> 2322*a58d3d2aSXin Li 2323*a58d3d2aSXin Li<section anchor="silk_nlsf_stage1" title="Normalized LSF Stage 1 Decoding"> 2324*a58d3d2aSXin Li<t> 2325*a58d3d2aSXin LiThe first VQ stage uses a 32-element codebook, coded with one of the PDFs in 2326*a58d3d2aSXin Li <xref target="silk_nlsf_stage1_pdfs"/>, depending on the audio bandwidth and 2327*a58d3d2aSXin Li the signal type of the current SILK frame. 
2328*a58d3d2aSXin LiThis yields a single index, I1, for the entire frame, which 2329*a58d3d2aSXin Li<list style="numbers"> 2330*a58d3d2aSXin Li<t>Indexes an element in a coarse codebook,</t> 2331*a58d3d2aSXin Li<t>Selects the PDFs for the second stage of the VQ, and</t> 2332*a58d3d2aSXin Li<t>Selects the prediction weights used to remove intra-frame redundancy from 2333*a58d3d2aSXin Li the second stage.</t> 2334*a58d3d2aSXin Li</list> 2335*a58d3d2aSXin LiThe actual codebook elements are listed in 2336*a58d3d2aSXin Li <xref target="silk_nlsf_nbmb_codebook"/> and 2337*a58d3d2aSXin Li <xref target="silk_nlsf_wb_codebook"/>, but they are not needed until the last 2338*a58d3d2aSXin Li stages of reconstructing the LSF coefficients. 2339*a58d3d2aSXin Li</t> 2340*a58d3d2aSXin Li 2341*a58d3d2aSXin Li<texttable anchor="silk_nlsf_stage1_pdfs" 2342*a58d3d2aSXin Li title="PDFs for Normalized LSF Stage-1 Index Decoding"> 2343*a58d3d2aSXin Li<ttcol align="left">Audio Bandwidth</ttcol> 2344*a58d3d2aSXin Li<ttcol align="left">Signal Type</ttcol> 2345*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol> 2346*a58d3d2aSXin Li<c>NB or MB</c> <c>Inactive or unvoiced</c> 2347*a58d3d2aSXin Li<c> 2348*a58d3d2aSXin Li{44, 34, 30, 19, 21, 12, 11, 3, 2349*a58d3d2aSXin Li 3, 2, 16, 2, 2, 1, 5, 2, 2350*a58d3d2aSXin Li 1, 3, 3, 1, 1, 2, 2, 2, 2351*a58d3d2aSXin Li 3, 1, 9, 9, 2, 7, 2, 1}/256 2352*a58d3d2aSXin Li</c> 2353*a58d3d2aSXin Li<c>NB or MB</c> <c>Voiced</c> 2354*a58d3d2aSXin Li<c> 2355*a58d3d2aSXin Li{1, 10, 1, 8, 3, 8, 8, 14, 2356*a58d3d2aSXin Li13, 14, 1, 14, 12, 13, 11, 11, 2357*a58d3d2aSXin Li12, 11, 10, 10, 11, 8, 9, 8, 2358*a58d3d2aSXin Li 7, 8, 1, 1, 6, 1, 6, 5}/256 2359*a58d3d2aSXin Li</c> 2360*a58d3d2aSXin Li<c>WB</c> <c>Inactive or unvoiced</c> 2361*a58d3d2aSXin Li<c> 2362*a58d3d2aSXin Li{31, 21, 3, 17, 1, 8, 17, 4, 2363*a58d3d2aSXin Li 1, 18, 16, 4, 2, 3, 1, 10, 2364*a58d3d2aSXin Li 1, 3, 16, 11, 16, 2, 2, 3, 2365*a58d3d2aSXin Li 2, 11, 1, 4, 9, 8, 7, 3}/256 2366*a58d3d2aSXin Li</c> 
2367*a58d3d2aSXin Li<c>WB</c> <c>Voiced</c> 2368*a58d3d2aSXin Li<c> 2369*a58d3d2aSXin Li{1, 4, 16, 5, 18, 11, 5, 14, 2370*a58d3d2aSXin Li15, 1, 3, 12, 13, 14, 14, 6, 2371*a58d3d2aSXin Li14, 12, 2, 6, 1, 12, 12, 11, 2372*a58d3d2aSXin Li10, 3, 10, 5, 1, 1, 1, 3}/256 2373*a58d3d2aSXin Li</c> 2374*a58d3d2aSXin Li</texttable> 2375*a58d3d2aSXin Li 2376*a58d3d2aSXin Li</section> 2377*a58d3d2aSXin Li 2378*a58d3d2aSXin Li<section anchor="silk_nlsf_stage2" title="Normalized LSF Stage 2 Decoding"> 2379*a58d3d2aSXin Li<t> 2380*a58d3d2aSXin LiA total of 16 PDFs are available for the LSF residual in the second stage: the 2381*a58d3d2aSXin Li 8 (a...h) for NB and MB frames given in 2382*a58d3d2aSXin Li <xref target="silk_nlsf_stage2_nbmb_pdfs"/>, and the 8 (i...p) for WB frames 2383*a58d3d2aSXin Li given in <xref target="silk_nlsf_stage2_wb_pdfs"/>. 2384*a58d3d2aSXin LiWhich PDF is used for which coefficient is driven by the index, I1, 2385*a58d3d2aSXin Li decoded in the first stage. 2386*a58d3d2aSXin Li<xref target="silk_nlsf_nbmb_stage2_cb_sel"/> lists the letter of the 2387*a58d3d2aSXin Li corresponding PDF for each normalized LSF coefficient for NB and MB, and 2388*a58d3d2aSXin Li <xref target="silk_nlsf_wb_stage2_cb_sel"/> lists the same information for WB. 
2389*a58d3d2aSXin Li</t> 2390*a58d3d2aSXin Li 2391*a58d3d2aSXin Li<texttable anchor="silk_nlsf_stage2_nbmb_pdfs" 2392*a58d3d2aSXin Li title="PDFs for NB/MB Normalized LSF Stage-2 Index Decoding"> 2393*a58d3d2aSXin Li<ttcol align="left">Codebook</ttcol> 2394*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol> 2395*a58d3d2aSXin Li<c>a</c> <c>{1, 1, 1, 15, 224, 11, 1, 1, 1}/256</c> 2396*a58d3d2aSXin Li<c>b</c> <c>{1, 1, 2, 34, 183, 32, 1, 1, 1}/256</c> 2397*a58d3d2aSXin Li<c>c</c> <c>{1, 1, 4, 42, 149, 55, 2, 1, 1}/256</c> 2398*a58d3d2aSXin Li<c>d</c> <c>{1, 1, 8, 52, 123, 61, 8, 1, 1}/256</c> 2399*a58d3d2aSXin Li<c>e</c> <c>{1, 3, 16, 53, 101, 74, 6, 1, 1}/256</c> 2400*a58d3d2aSXin Li<c>f</c> <c>{1, 3, 17, 55, 90, 73, 15, 1, 1}/256</c> 2401*a58d3d2aSXin Li<c>g</c> <c>{1, 7, 24, 53, 74, 67, 26, 3, 1}/256</c> 2402*a58d3d2aSXin Li<c>h</c> <c>{1, 1, 18, 63, 78, 58, 30, 6, 1}/256</c> 2403*a58d3d2aSXin Li</texttable> 2404*a58d3d2aSXin Li 2405*a58d3d2aSXin Li<texttable anchor="silk_nlsf_stage2_wb_pdfs" 2406*a58d3d2aSXin Li title="PDFs for WB Normalized LSF Stage-2 Index Decoding"> 2407*a58d3d2aSXin Li<ttcol align="left">Codebook</ttcol> 2408*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol> 2409*a58d3d2aSXin Li<c>i</c> <c>{1, 1, 1, 9, 232, 9, 1, 1, 1}/256</c> 2410*a58d3d2aSXin Li<c>j</c> <c>{1, 1, 2, 28, 186, 35, 1, 1, 1}/256</c> 2411*a58d3d2aSXin Li<c>k</c> <c>{1, 1, 3, 42, 152, 53, 2, 1, 1}/256</c> 2412*a58d3d2aSXin Li<c>l</c> <c>{1, 1, 10, 49, 126, 65, 2, 1, 1}/256</c> 2413*a58d3d2aSXin Li<c>m</c> <c>{1, 4, 19, 48, 100, 77, 5, 1, 1}/256</c> 2414*a58d3d2aSXin Li<c>n</c> <c>{1, 1, 14, 54, 100, 72, 12, 1, 1}/256</c> 2415*a58d3d2aSXin Li<c>o</c> <c>{1, 1, 15, 61, 87, 61, 25, 4, 1}/256</c> 2416*a58d3d2aSXin Li<c>p</c> <c>{1, 7, 21, 50, 77, 81, 17, 1, 1}/256</c> 2417*a58d3d2aSXin Li</texttable> 2418*a58d3d2aSXin Li 2419*a58d3d2aSXin Li<texttable anchor="silk_nlsf_nbmb_stage2_cb_sel" 2420*a58d3d2aSXin Li title="Codebook Selection for NB/MB Normalized LSF Stage-2 Index Decoding"> 
2421*a58d3d2aSXin Li<ttcol>I1</ttcol> 2422*a58d3d2aSXin Li<ttcol>Coefficient</ttcol> 2423*a58d3d2aSXin Li<c/> 2424*a58d3d2aSXin Li<c><spanx style="vbare">0 1 2 3 4 5 6 7 8 9</spanx></c> 2425*a58d3d2aSXin Li<c> 0</c> 2426*a58d3d2aSXin Li<c><spanx style="vbare">a a a a a a a a a a</spanx></c> 2427*a58d3d2aSXin Li<c> 1</c> 2428*a58d3d2aSXin Li<c><spanx style="vbare">b d b c c b c b b b</spanx></c> 2429*a58d3d2aSXin Li<c> 2</c> 2430*a58d3d2aSXin Li<c><spanx style="vbare">c b b b b b b b b b</spanx></c> 2431*a58d3d2aSXin Li<c> 3</c> 2432*a58d3d2aSXin Li<c><spanx style="vbare">b c c c c b c b b b</spanx></c> 2433*a58d3d2aSXin Li<c> 4</c> 2434*a58d3d2aSXin Li<c><spanx style="vbare">c d d d d c c c c c</spanx></c> 2435*a58d3d2aSXin Li<c> 5</c> 2436*a58d3d2aSXin Li<c><spanx style="vbare">a f d d c c c c b b</spanx></c> 2437*a58d3d2aSXin Li<c> 6</c> 2438*a58d3d2aSXin Li<c><spanx style="vbare">a c c c c c c c c b</spanx></c> 2439*a58d3d2aSXin Li<c> 7</c> 2440*a58d3d2aSXin Li<c><spanx style="vbare">c d g e e e f e f f</spanx></c> 2441*a58d3d2aSXin Li<c> 8</c> 2442*a58d3d2aSXin Li<c><spanx style="vbare">c e f f e f e g e e</spanx></c> 2443*a58d3d2aSXin Li<c> 9</c> 2444*a58d3d2aSXin Li<c><spanx style="vbare">c e e h e f e f f e</spanx></c> 2445*a58d3d2aSXin Li<c>10</c> 2446*a58d3d2aSXin Li<c><spanx style="vbare">e d d d c d c c c c</spanx></c> 2447*a58d3d2aSXin Li<c>11</c> 2448*a58d3d2aSXin Li<c><spanx style="vbare">b f f g e f e f f f</spanx></c> 2449*a58d3d2aSXin Li<c>12</c> 2450*a58d3d2aSXin Li<c><spanx style="vbare">c h e g f f f f f f</spanx></c> 2451*a58d3d2aSXin Li<c>13</c> 2452*a58d3d2aSXin Li<c><spanx style="vbare">c h f f f f f g f e</spanx></c> 2453*a58d3d2aSXin Li<c>14</c> 2454*a58d3d2aSXin Li<c><spanx style="vbare">d d f e e f e f e e</spanx></c> 2455*a58d3d2aSXin Li<c>15</c> 2456*a58d3d2aSXin Li<c><spanx style="vbare">c d d f f e e e e e</spanx></c> 2457*a58d3d2aSXin Li<c>16</c> 2458*a58d3d2aSXin Li<c><spanx style="vbare">c e e g e f e f f f</spanx></c> 
2459*a58d3d2aSXin Li<c>17</c> 2460*a58d3d2aSXin Li<c><spanx style="vbare">c f e g f f f e f e</spanx></c> 2461*a58d3d2aSXin Li<c>18</c> 2462*a58d3d2aSXin Li<c><spanx style="vbare">c h e f e f e f f f</spanx></c> 2463*a58d3d2aSXin Li<c>19</c> 2464*a58d3d2aSXin Li<c><spanx style="vbare">c f e g h g f g f e</spanx></c> 2465*a58d3d2aSXin Li<c>20</c> 2466*a58d3d2aSXin Li<c><spanx style="vbare">d g h e g f f g e f</spanx></c> 2467*a58d3d2aSXin Li<c>21</c> 2468*a58d3d2aSXin Li<c><spanx style="vbare">c h g e e e f e f f</spanx></c> 2469*a58d3d2aSXin Li<c>22</c> 2470*a58d3d2aSXin Li<c><spanx style="vbare">e f f e g g f g f e</spanx></c> 2471*a58d3d2aSXin Li<c>23</c> 2472*a58d3d2aSXin Li<c><spanx style="vbare">c f f g f g e g e e</spanx></c> 2473*a58d3d2aSXin Li<c>24</c> 2474*a58d3d2aSXin Li<c><spanx style="vbare">e f f f d h e f f e</spanx></c> 2475*a58d3d2aSXin Li<c>25</c> 2476*a58d3d2aSXin Li<c><spanx style="vbare">c d e f f g e f f e</spanx></c> 2477*a58d3d2aSXin Li<c>26</c> 2478*a58d3d2aSXin Li<c><spanx style="vbare">c d c d d e c d d d</spanx></c> 2479*a58d3d2aSXin Li<c>27</c> 2480*a58d3d2aSXin Li<c><spanx style="vbare">b b c c c c c d c c</spanx></c> 2481*a58d3d2aSXin Li<c>28</c> 2482*a58d3d2aSXin Li<c><spanx style="vbare">e f f g g g f g e f</spanx></c> 2483*a58d3d2aSXin Li<c>29</c> 2484*a58d3d2aSXin Li<c><spanx style="vbare">d f f e e e e d d c</spanx></c> 2485*a58d3d2aSXin Li<c>30</c> 2486*a58d3d2aSXin Li<c><spanx style="vbare">c f d h f f e e f e</spanx></c> 2487*a58d3d2aSXin Li<c>31</c> 2488*a58d3d2aSXin Li<c><spanx style="vbare">e e f e f g f g f e</spanx></c> 2489*a58d3d2aSXin Li</texttable> 2490*a58d3d2aSXin Li 2491*a58d3d2aSXin Li<texttable anchor="silk_nlsf_wb_stage2_cb_sel" 2492*a58d3d2aSXin Li title="Codebook Selection for WB Normalized LSF Stage-2 Index Decoding"> 2493*a58d3d2aSXin Li<ttcol>I1</ttcol> 2494*a58d3d2aSXin Li<ttcol>Coefficient</ttcol> 2495*a58d3d2aSXin Li<c/> 2496*a58d3d2aSXin Li<c><spanx style="vbare">0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 
15</spanx></c> 2497*a58d3d2aSXin Li<c> 0</c> 2498*a58d3d2aSXin Li<c><spanx style="vbare">i i i i i i i i i i i i i i i i</spanx></c> 2499*a58d3d2aSXin Li<c> 1</c> 2500*a58d3d2aSXin Li<c><spanx style="vbare">k l l l l l k k k k k j j j i l</spanx></c> 2501*a58d3d2aSXin Li<c> 2</c> 2502*a58d3d2aSXin Li<c><spanx style="vbare">k n n l p m m n k n m n n m l l</spanx></c> 2503*a58d3d2aSXin Li<c> 3</c> 2504*a58d3d2aSXin Li<c><spanx style="vbare">i k j k k j j j j j i i i i i j</spanx></c> 2505*a58d3d2aSXin Li<c> 4</c> 2506*a58d3d2aSXin Li<c><spanx style="vbare">i o n m o m p n m m m n n m m l</spanx></c> 2507*a58d3d2aSXin Li<c> 5</c> 2508*a58d3d2aSXin Li<c><spanx style="vbare">i l n n m l l n l l l l l l k m</spanx></c> 2509*a58d3d2aSXin Li<c> 6</c> 2510*a58d3d2aSXin Li<c><spanx style="vbare">i i i i i i i i i i i i i i i i</spanx></c> 2511*a58d3d2aSXin Li<c> 7</c> 2512*a58d3d2aSXin Li<c><spanx style="vbare">i k o l p k n l m n n m l l k l</spanx></c> 2513*a58d3d2aSXin Li<c> 8</c> 2514*a58d3d2aSXin Li<c><spanx style="vbare">i o k o o m n m o n m m n l l l</spanx></c> 2515*a58d3d2aSXin Li<c> 9</c> 2516*a58d3d2aSXin Li<c><spanx style="vbare">k j i i i i i i i i i i i i i i</spanx></c> 2517*a58d3d2aSXin Li<c>10</c> 2518*a58d3d2aSXin Li<c><spanx style="vbare">i j i i i i i i i i i i i i i j</spanx></c> 2519*a58d3d2aSXin Li<c>11</c> 2520*a58d3d2aSXin Li<c><spanx style="vbare">k k l m n l l l l l l l k k j l</spanx></c> 2521*a58d3d2aSXin Li<c>12</c> 2522*a58d3d2aSXin Li<c><spanx style="vbare">k k l l m l l l l l l l l k j l</spanx></c> 2523*a58d3d2aSXin Li<c>13</c> 2524*a58d3d2aSXin Li<c><spanx style="vbare">l m m m o m m n l n m m n m l m</spanx></c> 2525*a58d3d2aSXin Li<c>14</c> 2526*a58d3d2aSXin Li<c><spanx style="vbare">i o m n m p n k o n p m m l n l</spanx></c> 2527*a58d3d2aSXin Li<c>15</c> 2528*a58d3d2aSXin Li<c><spanx style="vbare">i j i j j j j j j j i i i i j i</spanx></c> 2529*a58d3d2aSXin Li<c>16</c> 2530*a58d3d2aSXin Li<c><spanx style="vbare">j o n p n m n l m n m 
m m l l m</spanx></c> 2531*a58d3d2aSXin Li<c>17</c> 2532*a58d3d2aSXin Li<c><spanx style="vbare">j l l m m l l n k l l n n n l m</spanx></c> 2533*a58d3d2aSXin Li<c>18</c> 2534*a58d3d2aSXin Li<c><spanx style="vbare">k l l k k k l k j k j k j j j m</spanx></c> 2535*a58d3d2aSXin Li<c>19</c> 2536*a58d3d2aSXin Li<c><spanx style="vbare">i k l n l l k k k j j i i i i i</spanx></c> 2537*a58d3d2aSXin Li<c>20</c> 2538*a58d3d2aSXin Li<c><spanx style="vbare">l m l n l l k k j j j j j k k m</spanx></c> 2539*a58d3d2aSXin Li<c>21</c> 2540*a58d3d2aSXin Li<c><spanx style="vbare">k o l p p m n m n l n l l k l l</spanx></c> 2541*a58d3d2aSXin Li<c>22</c> 2542*a58d3d2aSXin Li<c><spanx style="vbare">k l n o o l n l m m l l l l k m</spanx></c> 2543*a58d3d2aSXin Li<c>23</c> 2544*a58d3d2aSXin Li<c><spanx style="vbare">j l l m m m m l n n n l j j j j</spanx></c> 2545*a58d3d2aSXin Li<c>24</c> 2546*a58d3d2aSXin Li<c><spanx style="vbare">k n l o o m p m m n l m m l l l</spanx></c> 2547*a58d3d2aSXin Li<c>25</c> 2548*a58d3d2aSXin Li<c><spanx style="vbare">i o j j i i i i i i i i i i i i</spanx></c> 2549*a58d3d2aSXin Li<c>26</c> 2550*a58d3d2aSXin Li<c><spanx style="vbare">i o o l n k n n l m m p p m m m</spanx></c> 2551*a58d3d2aSXin Li<c>27</c> 2552*a58d3d2aSXin Li<c><spanx style="vbare">l l p l n m l l l k k l l l k l</spanx></c> 2553*a58d3d2aSXin Li<c>28</c> 2554*a58d3d2aSXin Li<c><spanx style="vbare">i i j i i i k j k j j k k k j j</spanx></c> 2555*a58d3d2aSXin Li<c>29</c> 2556*a58d3d2aSXin Li<c><spanx style="vbare">i l k n l l k l k j i i j i i j</spanx></c> 2557*a58d3d2aSXin Li<c>30</c> 2558*a58d3d2aSXin Li<c><spanx style="vbare">l n n m p n l l k l k k j i j i</spanx></c> 2559*a58d3d2aSXin Li<c>31</c> 2560*a58d3d2aSXin Li<c><spanx style="vbare">k l n l m l l l k j k o m i i i</spanx></c> 2561*a58d3d2aSXin Li</texttable> 2562*a58d3d2aSXin Li 2563*a58d3d2aSXin Li<t> 2564*a58d3d2aSXin LiDecoding the second stage residual proceeds as follows. 
2565*a58d3d2aSXin LiFor each coefficient, the decoder reads a symbol using the PDF corresponding to 2566*a58d3d2aSXin Li I1 from either <xref target="silk_nlsf_nbmb_stage2_cb_sel"/> or 2567*a58d3d2aSXin Li <xref target="silk_nlsf_wb_stage2_cb_sel"/>, and subtracts 4 from the result 2568*a58d3d2aSXin Li to give an index in the range -4 to 4, inclusive. 2569*a58d3d2aSXin LiIf the index is either -4 or 4, it reads a second symbol using the PDF in 2570*a58d3d2aSXin Li <xref target="silk_nlsf_ext_pdf"/>, and adds the value of this second symbol 2571*a58d3d2aSXin Li to the index, using the same sign. 2572*a58d3d2aSXin LiThis gives the index, I2[k], a total range of -10 to 10, inclusive. 2573*a58d3d2aSXin Li</t> 2574*a58d3d2aSXin Li 2575*a58d3d2aSXin Li<texttable anchor="silk_nlsf_ext_pdf" 2576*a58d3d2aSXin Li title="PDF for Normalized LSF Index Extension Decoding"> 2577*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol> 2578*a58d3d2aSXin Li<c>{156, 60, 24, 9, 4, 2, 1}/256</c> 2579*a58d3d2aSXin Li</texttable> 2580*a58d3d2aSXin Li 2581*a58d3d2aSXin Li<t> 2582*a58d3d2aSXin LiThe decoded indices from both stages are translated back into normalized LSF 2583*a58d3d2aSXin Li coefficients in silk_NLSF_decode() (NLSF_decode.c). 2584*a58d3d2aSXin LiThe stage-2 indices represent residuals after both the first stage of the VQ 2585*a58d3d2aSXin Li and a separate backwards-prediction step. 2586*a58d3d2aSXin LiThe backwards prediction process in the encoder subtracts a prediction from 2587*a58d3d2aSXin Li each residual formed by a multiple of the coefficient that follows it. 2588*a58d3d2aSXin LiThe decoder must undo this process. 2589*a58d3d2aSXin Li<xref target="silk_nlsf_pred_weights"/> contains lists of prediction weights 2590*a58d3d2aSXin Li for each coefficient. 2591*a58d3d2aSXin LiThere are two lists for NB and MB, and another two lists for WB, giving two 2592*a58d3d2aSXin Li possible prediction weights for each coefficient. 
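As an informal illustration of the index decoding described earlier in this
 section (before the prediction step), the following C sketch combines the two
 decoded symbols into I2[k].
The helper name nlsf_stage2_index() and its parameters are hypothetical and do
 not appear in the reference implementation.
The sketch assumes the caller has already read the symbols from the range
 decoder, and that the extension symbol is read only when the first symbol
 maps to -4 or 4.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Hypothetical helper, not part of the reference code.
 * symbol     : first stage-2 symbol, 0..8
 * ext_symbol : extension symbol, 0..6; ignored unless the base
 *              index is -4 or 4
 * Returns I2[k] in the range -10..10. */
int nlsf_stage2_index(int symbol, int ext_symbol)
{
    assert(symbol >= 0 && symbol <= 8);
    assert(ext_symbol >= 0 && ext_symbol <= 6);

    int index = symbol - 4;      /* base index, -4..4 */
    if (index == -4)
        index -= ext_symbol;     /* extend downward, same sign */
    else if (index == 4)
        index += ext_symbol;     /* extend upward, same sign */
    return index;
}
```
]]></artwork>
</figure>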
2593*a58d3d2aSXin Li</t> 2594*a58d3d2aSXin Li 2595*a58d3d2aSXin Li<texttable anchor="silk_nlsf_pred_weights" 2596*a58d3d2aSXin Li title="Prediction Weights for Normalized LSF Decoding"> 2597*a58d3d2aSXin Li<ttcol align="left">Coefficient</ttcol> 2598*a58d3d2aSXin Li<ttcol align="right">A</ttcol> 2599*a58d3d2aSXin Li<ttcol align="right">B</ttcol> 2600*a58d3d2aSXin Li<ttcol align="right">C</ttcol> 2601*a58d3d2aSXin Li<ttcol align="right">D</ttcol> 2602*a58d3d2aSXin Li <c>0</c> <c>179</c> <c>116</c> <c>175</c> <c>68</c> 2603*a58d3d2aSXin Li <c>1</c> <c>138</c> <c>67</c> <c>148</c> <c>62</c> 2604*a58d3d2aSXin Li <c>2</c> <c>140</c> <c>82</c> <c>160</c> <c>66</c> 2605*a58d3d2aSXin Li <c>3</c> <c>148</c> <c>59</c> <c>176</c> <c>60</c> 2606*a58d3d2aSXin Li <c>4</c> <c>151</c> <c>92</c> <c>178</c> <c>72</c> 2607*a58d3d2aSXin Li <c>5</c> <c>149</c> <c>72</c> <c>173</c> <c>117</c> 2608*a58d3d2aSXin Li <c>6</c> <c>153</c> <c>100</c> <c>174</c> <c>85</c> 2609*a58d3d2aSXin Li <c>7</c> <c>151</c> <c>89</c> <c>164</c> <c>90</c> 2610*a58d3d2aSXin Li <c>8</c> <c>163</c> <c>92</c> <c>177</c> <c>118</c> 2611*a58d3d2aSXin Li <c>9</c> <c/> <c/> <c>174</c> <c>136</c> 2612*a58d3d2aSXin Li<c>10</c> <c/> <c/> <c>196</c> <c>151</c> 2613*a58d3d2aSXin Li<c>11</c> <c/> <c/> <c>182</c> <c>142</c> 2614*a58d3d2aSXin Li<c>12</c> <c/> <c/> <c>198</c> <c>160</c> 2615*a58d3d2aSXin Li<c>13</c> <c/> <c/> <c>192</c> <c>142</c> 2616*a58d3d2aSXin Li<c>14</c> <c/> <c/> <c>182</c> <c>155</c> 2617*a58d3d2aSXin Li</texttable> 2618*a58d3d2aSXin Li 2619*a58d3d2aSXin Li<t> 2620*a58d3d2aSXin LiThe prediction is undone using the procedure implemented in 2621*a58d3d2aSXin Li silk_NLSF_residual_dequant() (NLSF_decode.c), which is as follows. 2622*a58d3d2aSXin LiEach coefficient selects its prediction weight from one of the two lists based 2623*a58d3d2aSXin Li on the stage-1 index, I1. 
2624*a58d3d2aSXin Li<xref target="silk_nlsf_nbmb_weight_sel"/> gives the selections for each 2625*a58d3d2aSXin Li coefficient for NB and MB, and <xref target="silk_nlsf_wb_weight_sel"/> gives 2626*a58d3d2aSXin Li the selections for WB. 2627*a58d3d2aSXin LiLet d_LPC be the order of the codebook, i.e., 10 for NB and MB, and 16 for WB, 2628*a58d3d2aSXin Li and let pred_Q8[k] be the weight for the k'th coefficient selected by this 2629*a58d3d2aSXin Li process for 0 <= k < d_LPC-1. 2630*a58d3d2aSXin LiThen, the stage-2 residual for each coefficient is computed via 2631*a58d3d2aSXin Li<figure align="center"> 2632*a58d3d2aSXin Li<artwork align="center"><![CDATA[ 2633*a58d3d2aSXin Lires_Q10[k] = (k+1 < d_LPC ? (res_Q10[k+1]*pred_Q8[k])>>8 : 0) 2634*a58d3d2aSXin Li + ((((I2[k]<<10) - sign(I2[k])*102)*qstep)>>16) , 2635*a58d3d2aSXin Li]]></artwork> 2636*a58d3d2aSXin Li</figure> 2637*a58d3d2aSXin Li where qstep is the Q16 quantization step size, which is 11796 for NB and MB 2638*a58d3d2aSXin Li and 9830 for WB (representing step sizes of approximately 0.18 and 0.15, 2639*a58d3d2aSXin Li respectively). 
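Because res_Q10[k] depends on res_Q10[k+1], the expression has to be evaluated
 from the last coefficient down to the first.
The following C sketch shows one way to do this.
It is an illustration only, not the reference silk_NLSF_residual_dequant():
 the function name and argument layout are hypothetical, and it assumes that
 a right shift of a negative value is an arithmetic shift.
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch with a hypothetical name and signature.
 * res_q10 : output, d_lpc stage-2 residuals (Q10)
 * i2      : stage-2 indices I2[k], each in -10..10
 * pred_q8 : d_lpc-1 prediction weights selected via I1 (Q8)
 * qstep   : Q16 step size (11796 for NB/MB, 9830 for WB)
 * Assumes >> on a negative value is an arithmetic shift. */
void nlsf_residual_dequant(int32_t *res_q10, const int *i2,
                           const uint8_t *pred_q8, int32_t qstep,
                           int d_lpc)
{
    assert(d_lpc > 0);
    int32_t next = 0;  /* res_Q10[k+1]; unused for the last k */

    /* Each residual is predicted from the one after it. */
    for (int k = d_lpc - 1; k >= 0; k--) {
        int32_t pred = (k + 1 < d_lpc)
                     ? (next * pred_q8[k]) >> 8 : 0;
        int32_t sign = (i2[k] > 0) - (i2[k] < 0);
        /* I2[k]<<10 is written as a multiply so that a negative
         * value is never left-shifted. */
        int32_t q = ((i2[k] * 1024 - sign * 102) * qstep) >> 16;
        res_q10[k] = pred + q;
        next = res_q10[k];
    }
}
```
]]></artwork>
</figure>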
2640*a58d3d2aSXin Li</t> 2641*a58d3d2aSXin Li 2642*a58d3d2aSXin Li<texttable anchor="silk_nlsf_nbmb_weight_sel" 2643*a58d3d2aSXin Li title="Prediction Weight Selection for NB/MB Normalized LSF Decoding"> 2644*a58d3d2aSXin Li<ttcol>I1</ttcol> 2645*a58d3d2aSXin Li<ttcol>Coefficient</ttcol> 2646*a58d3d2aSXin Li<c/> 2647*a58d3d2aSXin Li<c><spanx style="vbare">0 1 2 3 4 5 6 7 8</spanx></c> 2648*a58d3d2aSXin Li<c> 0</c> 2649*a58d3d2aSXin Li<c><spanx style="vbare">A B A A A A A A A</spanx></c> 2650*a58d3d2aSXin Li<c> 1</c> 2651*a58d3d2aSXin Li<c><spanx style="vbare">B A A A A A A A A</spanx></c> 2652*a58d3d2aSXin Li<c> 2</c> 2653*a58d3d2aSXin Li<c><spanx style="vbare">A A A A A A A A A</spanx></c> 2654*a58d3d2aSXin Li<c> 3</c> 2655*a58d3d2aSXin Li<c><spanx style="vbare">B B B A A A A B A</spanx></c> 2656*a58d3d2aSXin Li<c> 4</c> 2657*a58d3d2aSXin Li<c><spanx style="vbare">A B A A A A A A A</spanx></c> 2658*a58d3d2aSXin Li<c> 5</c> 2659*a58d3d2aSXin Li<c><spanx style="vbare">A B A A A A A A A</spanx></c> 2660*a58d3d2aSXin Li<c> 6</c> 2661*a58d3d2aSXin Li<c><spanx style="vbare">B A B B A A A B A</spanx></c> 2662*a58d3d2aSXin Li<c> 7</c> 2663*a58d3d2aSXin Li<c><spanx style="vbare">A B B A A B B A A</spanx></c> 2664*a58d3d2aSXin Li<c> 8</c> 2665*a58d3d2aSXin Li<c><spanx style="vbare">A A B B A B A B B</spanx></c> 2666*a58d3d2aSXin Li<c> 9</c> 2667*a58d3d2aSXin Li<c><spanx style="vbare">A A B B A A B B B</spanx></c> 2668*a58d3d2aSXin Li<c>10</c> 2669*a58d3d2aSXin Li<c><spanx style="vbare">A A A A A A A A A</spanx></c> 2670*a58d3d2aSXin Li<c>11</c> 2671*a58d3d2aSXin Li<c><spanx style="vbare">A B A B B B B B A</spanx></c> 2672*a58d3d2aSXin Li<c>12</c> 2673*a58d3d2aSXin Li<c><spanx style="vbare">A B A B B B B B A</spanx></c> 2674*a58d3d2aSXin Li<c>13</c> 2675*a58d3d2aSXin Li<c><spanx style="vbare">A B B B B B B B A</spanx></c> 2676*a58d3d2aSXin Li<c>14</c> 2677*a58d3d2aSXin Li<c><spanx style="vbare">B A B B A B B B B</spanx></c> 2678*a58d3d2aSXin Li<c>15</c> 2679*a58d3d2aSXin 
Li<c><spanx style="vbare">A B B B B B A B A</spanx></c> 2680*a58d3d2aSXin Li<c>16</c> 2681*a58d3d2aSXin Li<c><spanx style="vbare">A A B B A B A B A</spanx></c> 2682*a58d3d2aSXin Li<c>17</c> 2683*a58d3d2aSXin Li<c><spanx style="vbare">A A B B B A B B B</spanx></c> 2684*a58d3d2aSXin Li<c>18</c> 2685*a58d3d2aSXin Li<c><spanx style="vbare">A B B A A B B B A</spanx></c> 2686*a58d3d2aSXin Li<c>19</c> 2687*a58d3d2aSXin Li<c><spanx style="vbare">A A A B B B A B A</spanx></c> 2688*a58d3d2aSXin Li<c>20</c> 2689*a58d3d2aSXin Li<c><spanx style="vbare">A B B A A B A B A</spanx></c> 2690*a58d3d2aSXin Li<c>21</c> 2691*a58d3d2aSXin Li<c><spanx style="vbare">A B B A A A B B A</spanx></c> 2692*a58d3d2aSXin Li<c>22</c> 2693*a58d3d2aSXin Li<c><spanx style="vbare">A A A A A B B B B</spanx></c> 2694*a58d3d2aSXin Li<c>23</c> 2695*a58d3d2aSXin Li<c><spanx style="vbare">A A B B A A A B B</spanx></c> 2696*a58d3d2aSXin Li<c>24</c> 2697*a58d3d2aSXin Li<c><spanx style="vbare">A A A B A B B B B</spanx></c> 2698*a58d3d2aSXin Li<c>25</c> 2699*a58d3d2aSXin Li<c><spanx style="vbare">A B B B B B B B A</spanx></c> 2700*a58d3d2aSXin Li<c>26</c> 2701*a58d3d2aSXin Li<c><spanx style="vbare">A A A A A A A A A</spanx></c> 2702*a58d3d2aSXin Li<c>27</c> 2703*a58d3d2aSXin Li<c><spanx style="vbare">A A A A A A A A A</spanx></c> 2704*a58d3d2aSXin Li<c>28</c> 2705*a58d3d2aSXin Li<c><spanx style="vbare">A A B A B B A B A</spanx></c> 2706*a58d3d2aSXin Li<c>29</c> 2707*a58d3d2aSXin Li<c><spanx style="vbare">B A A B A A A A A</spanx></c> 2708*a58d3d2aSXin Li<c>30</c> 2709*a58d3d2aSXin Li<c><spanx style="vbare">A A A B B A B A B</spanx></c> 2710*a58d3d2aSXin Li<c>31</c> 2711*a58d3d2aSXin Li<c><spanx style="vbare">B A B B A B B B B</spanx></c> 2712*a58d3d2aSXin Li</texttable> 2713*a58d3d2aSXin Li 2714*a58d3d2aSXin Li<texttable anchor="silk_nlsf_wb_weight_sel" 2715*a58d3d2aSXin Li title="Prediction Weight Selection for WB Normalized LSF Decoding"> 2716*a58d3d2aSXin Li<ttcol>I1</ttcol> 2717*a58d3d2aSXin 
Li<ttcol>Coefficient</ttcol> 2718*a58d3d2aSXin Li<c/> 2719*a58d3d2aSXin Li<c><spanx style="vbare">0 1 2 3 4 5 6 7 8 9 10 11 12 13 14</spanx></c> 2720*a58d3d2aSXin Li<c> 0</c> 2721*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C C C D</spanx></c> 2722*a58d3d2aSXin Li<c> 1</c> 2723*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C C C C</spanx></c> 2724*a58d3d2aSXin Li<c> 2</c> 2725*a58d3d2aSXin Li<c><spanx style="vbare">C C D C C D D D C D D D D C C</spanx></c> 2726*a58d3d2aSXin Li<c> 3</c> 2727*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C D C C</spanx></c> 2728*a58d3d2aSXin Li<c> 4</c> 2729*a58d3d2aSXin Li<c><spanx style="vbare">C D D C D C D D C D D D D D C</spanx></c> 2730*a58d3d2aSXin Li<c> 5</c> 2731*a58d3d2aSXin Li<c><spanx style="vbare">C C D C C C C C C C C C C C C</spanx></c> 2732*a58d3d2aSXin Li<c> 6</c> 2733*a58d3d2aSXin Li<c><spanx style="vbare">D C C C C C C C C C C D C D C</spanx></c> 2734*a58d3d2aSXin Li<c> 7</c> 2735*a58d3d2aSXin Li<c><spanx style="vbare">C D D C C C D C D D D C D C D</spanx></c> 2736*a58d3d2aSXin Li<c> 8</c> 2737*a58d3d2aSXin Li<c><spanx style="vbare">C D C D D C D C D C D D D D D</spanx></c> 2738*a58d3d2aSXin Li<c> 9</c> 2739*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C C C D</spanx></c> 2740*a58d3d2aSXin Li<c>10</c> 2741*a58d3d2aSXin Li<c><spanx style="vbare">C D C C C C C C C C C C C C C</spanx></c> 2742*a58d3d2aSXin Li<c>11</c> 2743*a58d3d2aSXin Li<c><spanx style="vbare">C C D C D D D D D D D C D C C</spanx></c> 2744*a58d3d2aSXin Li<c>12</c> 2745*a58d3d2aSXin Li<c><spanx style="vbare">C C D C C D C D C D C C D C C</spanx></c> 2746*a58d3d2aSXin Li<c>13</c> 2747*a58d3d2aSXin Li<c><spanx style="vbare">C C C C D D C D C D D D D C C</spanx></c> 2748*a58d3d2aSXin Li<c>14</c> 2749*a58d3d2aSXin Li<c><spanx style="vbare">C D C C C D D C D D D C D D D</spanx></c> 2750*a58d3d2aSXin Li<c>15</c> 2751*a58d3d2aSXin Li<c><spanx style="vbare">C C D D C C C C C C C C D D 
C</spanx></c> 2752*a58d3d2aSXin Li<c>16</c> 2753*a58d3d2aSXin Li<c><spanx style="vbare">C D D C D C D D D D D C D C C</spanx></c> 2754*a58d3d2aSXin Li<c>17</c> 2755*a58d3d2aSXin Li<c><spanx style="vbare">C C D C C C C D C C D D D C C</spanx></c> 2756*a58d3d2aSXin Li<c>18</c> 2757*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C C C D</spanx></c> 2758*a58d3d2aSXin Li<c>19</c> 2759*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C D C C</spanx></c> 2760*a58d3d2aSXin Li<c>20</c> 2761*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C C C C</spanx></c> 2762*a58d3d2aSXin Li<c>21</c> 2763*a58d3d2aSXin Li<c><spanx style="vbare">C D C D C D D C D C D C D D C</spanx></c> 2764*a58d3d2aSXin Li<c>22</c> 2765*a58d3d2aSXin Li<c><spanx style="vbare">C C D D D D C D D C C D D C C</spanx></c> 2766*a58d3d2aSXin Li<c>23</c> 2767*a58d3d2aSXin Li<c><spanx style="vbare">C D D C D C D C D C C C C D C</spanx></c> 2768*a58d3d2aSXin Li<c>24</c> 2769*a58d3d2aSXin Li<c><spanx style="vbare">C C C D D C D C D D D D D D D</spanx></c> 2770*a58d3d2aSXin Li<c>25</c> 2771*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C C C D</spanx></c> 2772*a58d3d2aSXin Li<c>26</c> 2773*a58d3d2aSXin Li<c><spanx style="vbare">C D D C C C D D C C D D D D D</spanx></c> 2774*a58d3d2aSXin Li<c>27</c> 2775*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C D C D D D D C D D D</spanx></c> 2776*a58d3d2aSXin Li<c>28</c> 2777*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C C C D</spanx></c> 2778*a58d3d2aSXin Li<c>29</c> 2779*a58d3d2aSXin Li<c><spanx style="vbare">C C C C C C C C C C C C C C D</spanx></c> 2780*a58d3d2aSXin Li<c>30</c> 2781*a58d3d2aSXin Li<c><spanx style="vbare">D C C C C C C C C C C D C C C</spanx></c> 2782*a58d3d2aSXin Li<c>31</c> 2783*a58d3d2aSXin Li<c><spanx style="vbare">C C D C C D D D C C D C C D C</spanx></c> 2784*a58d3d2aSXin Li</texttable> 2785*a58d3d2aSXin Li 2786*a58d3d2aSXin Li</section> 2787*a58d3d2aSXin Li 2788*a58d3d2aSXin 
<section anchor="silk_nlsf_reconstruction"
 title="Reconstructing the Normalized LSF Coefficients">
<t>
Once the stage-1 index I1 and the stage-2 residual res_Q10[] have been decoded,
 the final normalized LSF coefficients can be reconstructed.
</t>
<t>
The spectral distortion introduced by the quantization of each LSF coefficient
 varies, so the stage-2 residual is weighted accordingly, using the
 low-complexity Inverse Harmonic Mean Weighting (IHMW) function proposed in
 <xref target="laroia-icassp"/>.
The weights are derived directly from the stage-1 codebook vector.
Let cb1_Q8[k] be the k'th entry of the stage-1 codebook vector from
 <xref target="silk_nlsf_nbmb_codebook"/> or
 <xref target="silk_nlsf_wb_codebook"/>.
Then for 0 &lt;= k &lt; d_LPC the following expression
 computes the square of the weight as a Q18 value:
<figure align="center">
<artwork align="center">
<![CDATA[
w2_Q18[k] = (1024/(cb1_Q8[k] - cb1_Q8[k-1])
             + 1024/(cb1_Q8[k+1] - cb1_Q8[k])) << 16 ,
]]>
</artwork>
</figure>
 where cb1_Q8[-1] = 0 and cb1_Q8[d_LPC] = 256, and the
 division is integer division.
This is reduced to an unsquared, Q9 value using the following square-root
 approximation:
<figure align="center">
<artwork align="center"><![CDATA[
i = ilog(w2_Q18[k])
f = (w2_Q18[k]>>(i-8)) & 127
y = ((i&1) ? 32768 : 46214) >> ((32-i)>>1)
w_Q9[k] = y + ((213*f*y)>>16)
]]></artwork>
</figure>
The constant 46214 here is approximately the square root of 2 in Q15.
The cb1_Q8[] vector completely determines these weights, and they may be
 tabulated and stored as 13-bit unsigned values (with a range of 1819 to 5227,
 inclusive) to avoid computing them when decoding.
The reference implementation already requires code to compute these weights on
 unquantized coefficients in the encoder, in silk_NLSF_VQ_weights_laroia()
 (NLSF_VQ_weights_laroia.c) and its callers, so it reuses that code in the
 decoder instead of using a pre-computed table to reduce the amount of ROM
 required.
</t>

<texttable anchor="silk_nlsf_nbmb_codebook"
 title="NB/MB Normalized LSF Stage-1 Codebook Vectors">
<ttcol>I1</ttcol>
<ttcol>Codebook (Q8)</ttcol>
<c/>
<c><spanx style="vbare"> 0 1 2 3 4 5 6 7 8 9</spanx></c>
<c>0</c>
<c><spanx style="vbare">12 35 60 83 108 132 157 180 206 228</spanx></c>
<c>1</c>
<c><spanx style="vbare">15 32 55 77 101 125 151 175 201 225</spanx></c>
<c>2</c>
<c><spanx style="vbare">19 42 66 89 114 137 162 184 209 230</spanx></c>
<c>3</c>
<c><spanx style="vbare">12 25 50 72 97 120 147 172 200 223</spanx></c>
<c>4</c>
<c><spanx style="vbare">26 44 69 90 114 135 159 180 205 225</spanx></c>
<c>5</c>
<c><spanx style="vbare">13 22 53 80 106 130 156 180 205 228</spanx></c>
<c>6</c>
<c><spanx style="vbare">15 25 44 64 90 115 142 168 196 222</spanx></c>
<c>7</c>
<c><spanx style="vbare">19 24 62 82 100 120 145 168 190 214</spanx></c>
<c>8</c>
<c><spanx style="vbare">22 31 50 79 103 120 151 170 203 227</spanx></c>
<c>9</c>
<c><spanx style="vbare">21 29 45 65 106 124 150 171 196 224</spanx></c>
<c>10</c>
<c><spanx style="vbare">30 49 75 97 121 142 165 186 209 229</spanx></c>
<c>11</c>
<c><spanx style="vbare">19 25 52 70 93 116 143 166 192 219</spanx></c>
<c>12</c>
<c><spanx style="vbare">26 34 62 75 97 118 145 167 194 217</spanx></c>
<c>13</c>
<c><spanx style="vbare">25 33 56 70 91 113 143 165 196 223</spanx></c>
<c>14</c>
<c><spanx style="vbare">21 34 51 72 97 117 145 171 196 222</spanx></c>
<c>15</c>
<c><spanx style="vbare">20 29 50 67 90 117 144 168 197 221</spanx></c>
<c>16</c>
<c><spanx style="vbare">22 31 48 66 95 117 146 168 196 222</spanx></c>
<c>17</c>
<c><spanx style="vbare">24 33 51 77 116 134 158 180 200 224</spanx></c>
<c>18</c>
<c><spanx style="vbare">21 28 70 87 106 124 149 170 194 217</spanx></c>
<c>19</c>
<c><spanx style="vbare">26 33 53 64 83 117 152 173 204 225</spanx></c>
<c>20</c>
<c><spanx style="vbare">27 34 65 95 108 129 155 174 210 225</spanx></c>
<c>21</c>
<c><spanx style="vbare">20 26 72 99 113 131 154 176 200 219</spanx></c>
<c>22</c>
<c><spanx style="vbare">34 43 61 78 93 114 155 177 205 229</spanx></c>
<c>23</c>
<c><spanx style="vbare">23 29 54 97 124 138 163 179 209 229</spanx></c>
<c>24</c>
<c><spanx style="vbare">30 38 56 89 118 129 158 178 200 231</spanx></c>
<c>25</c>
<c><spanx style="vbare">21 29 49 63 85 111 142 163 193 222</spanx></c>
<c>26</c>
<c><spanx style="vbare">27 48 77 103 133 158 179 196 215 232</spanx></c>
<c>27</c>
<c><spanx style="vbare">29 47 74 99 124 151 176 198 220 237</spanx></c>
<c>28</c>
<c><spanx style="vbare">33 42 61 76 93 121 155 174 207 225</spanx></c>
<c>29</c>
<c><spanx style="vbare">29 53 87 112 136 154 170 188 208 227</spanx></c>
<c>30</c>
<c><spanx style="vbare">24 30 52 84 131 150 166 186 203 229</spanx></c>
<c>31</c>
<c><spanx style="vbare">37 48 64 84 104 118 156 177 201 230</spanx></c>
</texttable>

<texttable anchor="silk_nlsf_wb_codebook"
 title="WB Normalized LSF Stage-1 Codebook Vectors">
<ttcol>I1</ttcol>
<ttcol>Codebook (Q8)</ttcol>
<c/>
<c><spanx style="vbare"> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15</spanx></c>
<c>0</c>
<c><spanx style="vbare"> 7 23 38 54 69 85 100 116 131 147 162 178 193 208 223 239</spanx></c>
<c>1</c>
<c><spanx style="vbare">13 25 41 55 69 83 98 112 127 142 157 171 187 203 220 236</spanx></c>
<c>2</c>
<c><spanx style="vbare">15 21 34 51 61 78 92 106 126 136 152 167 185 205 225 240</spanx></c>
<c>3</c>
<c><spanx style="vbare">10 21 36 50 63 79 95 110 126 141 157 173 189 205 221 237</spanx></c>
<c>4</c>
<c><spanx style="vbare">17 20 37 51 59 78 89 107 123 134 150 164 184 205 224 240</spanx></c>
<c>5</c>
<c><spanx style="vbare">10 15 32 51 67 81 96 112 129 142 158 173 189 204 220 236</spanx></c>
<c>6</c>
<c><spanx style="vbare"> 8 21 37 51 65 79 98 113 126 138 155 168 179 192 209 218</spanx></c>
<c>7</c>
<c><spanx style="vbare">12 15 34 55 63 78 87 108 118 131 148 167 185 203 219 236</spanx></c>
<c>8</c>
<c><spanx style="vbare">16 19 32 36 56 79 91 108 118 136 154 171 186 204 220 237</spanx></c>
<c>9</c>
<c><spanx style="vbare">11 28 43 58 74 89 105 120 135 150 165 180 196 211 226 241</spanx></c>
<c>10</c>
<c><spanx style="vbare"> 6 16 33 46 60 75 92 107 123 137 156 169 185 199 214 225</spanx></c>
<c>11</c>
<c><spanx style="vbare">11 19 30 44 57 74 89 105 121 135 152 169 186 202 218 234</spanx></c>
<c>12</c>
<c><spanx style="vbare">12 19 29 46 57 71 88 100 120 132 148 165 182 199 216 233</spanx></c>
<c>13</c>
<c><spanx style="vbare">17 23 35 46 56 77 92 106 123 134 152 167 185 204 222 237</spanx></c>
<c>14</c>
<c><spanx style="vbare">14 17 45 53 63 75 89 107 115 132 151 171 188 206 221 240</spanx></c>
<c>15</c>
<c><spanx style="vbare"> 9 16 29 40 56 71 88 103 119 137 154 171 189 205 222 237</spanx></c>
<c>16</c>
<c><spanx style="vbare">16 19 36 48 57 76 87 105 118 132 150 167 185 202 218 236</spanx></c>
<c>17</c>
<c><spanx style="vbare">12 17 29 54 71 81 94 104 126 136 149 164 182 201 221 237</spanx></c>
<c>18</c>
<c><spanx style="vbare">15 28 47 62 79 97 115 129 142 155 168 180 194 208 223 238</spanx></c>
<c>19</c>
<c><spanx style="vbare"> 8 14 30 45 62 78 94 111 127 143 159 175 192 207 223 239</spanx></c>
<c>20</c>
<c><spanx style="vbare">17 30 49 62 79 92 107 119 132 145 160 174 190 204 220 235</spanx></c>
<c>21</c>
<c><spanx style="vbare">14 19 36 45 61 76 91 108 121 138 154 172 189 205 222 238</spanx></c>
<c>22</c>
<c><spanx style="vbare">12 18 31 45 60 76 91 107 123 138 154 171 187 204 221 236</spanx></c>
<c>23</c>
<c><spanx style="vbare">13 17 31 43 53 70 83 103 114 131 149 167 185 203 220 237</spanx></c>
<c>24</c>
<c><spanx style="vbare">17 22 35 42 58 78 93 110 125 139 155 170 188 206 224 240</spanx></c>
<c>25</c>
<c><spanx style="vbare"> 8 15 34 50 67 83 99 115 131 146 162 178 193 209 224 239</spanx></c>
<c>26</c>
<c><spanx style="vbare">13 16 41 66 73 86 95 111 128 137 150 163 183 206 225 241</spanx></c>
<c>27</c>
<c><spanx style="vbare">17 25 37 52 63 75 92 102 119 132 144 160 175 191 212 231</spanx></c>
<c>28</c>
<c><spanx style="vbare">19 31 49 65 83 100 117 133 147 161 174 187 200 213 227 242</spanx></c>
<c>29</c>
<c><spanx style="vbare">18 31 52 68 88 103 117 126 138 149 163 177 192 207 223 239</spanx></c>
<c>30</c>
<c><spanx style="vbare">16 29 47 61 76 90 106 119 133 147 161 176 193 209 224 240</spanx></c>
<c>31</c>
<c><spanx style="vbare">15 21 35 50 61 73 86 97 110 119 129 141 175 198 218 237</spanx></c>
</texttable>

<t>
Given the stage-1 codebook entry cb1_Q8[], the stage-2 residual res_Q10[], and
 their corresponding weights, w_Q9[], the reconstructed normalized LSF
 coefficients are
<figure align="center">
<artwork align="center"><![CDATA[
NLSF_Q15[k] = clamp(0,
               (cb1_Q8[k]<<7) + (res_Q10[k]<<14)/w_Q9[k], 32767) ,
]]></artwork>
</figure>
 where the division is integer division.
However, nothing in either the reconstruction process or the
 quantization process in the encoder thus far guarantees that the coefficients
 are monotonically increasing and separated well enough to ensure a stable
 filter <xref target="Kabal86"/>.
When using the reference encoder, roughly 2% of frames violate this constraint.
The next section describes a stabilization procedure used to make these
 guarantees.
</t>

</section>

<section anchor="silk_nlsf_stabilization" title="Normalized LSF Stabilization">
<t>
The normalized LSF stabilization procedure is implemented in
 silk_NLSF_stabilize() (NLSF_stabilize.c).
This process ensures that consecutive values of the normalized LSF
 coefficients, NLSF_Q15[], are spaced some minimum distance apart
 (predetermined to be the 0.01 percentile of a large training set).
<xref target="silk_nlsf_min_spacing"/> gives the minimum spacings for NB and MB
 and those for WB, where row k is the minimum allowed value of
 NLSF_Q15[k]-NLSF_Q15[k-1].
For the purposes of computing this spacing for the first and last coefficient,
 NLSF_Q15[-1] is taken to be 0, and NLSF_Q15[d_LPC] is taken to be 32768.
</t>

<texttable anchor="silk_nlsf_min_spacing"
 title="Minimum Spacing for Normalized LSF Coefficients">
<ttcol>Coefficient</ttcol>
<ttcol align="right">NB and MB</ttcol>
<ttcol align="right">WB</ttcol>
 <c>0</c> <c>250</c> <c>100</c>
 <c>1</c>   <c>3</c>   <c>3</c>
 <c>2</c>   <c>6</c>  <c>40</c>
 <c>3</c>   <c>3</c>   <c>3</c>
 <c>4</c>   <c>3</c>   <c>3</c>
 <c>5</c>   <c>3</c>   <c>3</c>
 <c>6</c>   <c>4</c>   <c>5</c>
 <c>7</c>   <c>3</c>  <c>14</c>
 <c>8</c>   <c>3</c>  <c>14</c>
 <c>9</c>   <c>3</c>  <c>10</c>
<c>10</c> <c>461</c>  <c>11</c>
<c>11</c>     <c/>     <c>3</c>
<c>12</c>     <c/>     <c>8</c>
<c>13</c>     <c/>     <c>9</c>
<c>14</c>     <c/>     <c>7</c>
<c>15</c>     <c/>     <c>3</c>
<c>16</c>     <c/>   <c>347</c>
</texttable>

<t>
The procedure starts off by trying to make small adjustments which attempt to
 minimize the amount of distortion introduced.
After 20 such adjustments, it falls back to a more direct method which
 guarantees the constraints are enforced but may require large adjustments.
</t>
<t>
Let NDeltaMin_Q15[k] be the minimum required spacing for the current audio
 bandwidth from <xref target="silk_nlsf_min_spacing"/>.
First, the procedure finds the index i where
 NLSF_Q15[i] - NLSF_Q15[i-1] - NDeltaMin_Q15[i] is the
 smallest, breaking ties by using the lower value of i.
If this value is non-negative, then the stabilization stops; the coefficients
 satisfy all the constraints.
Otherwise, if i == 0, it sets NLSF_Q15[0] to NDeltaMin_Q15[0], and if
 i == d_LPC, it sets NLSF_Q15[d_LPC-1] to
 (32768 - NDeltaMin_Q15[d_LPC]).
For all other values of i, both NLSF_Q15[i-1] and NLSF_Q15[i] are updated as
 follows:
<figure align="center">
<artwork align="center"><![CDATA[
                                          i-1
                                          __
 min_center_Q15 = (NDeltaMin_Q15[i]>>1) + \  NDeltaMin_Q15[k]
                                          /_
                                          k=0
                                                  d_LPC
                                                  __
 max_center_Q15 = 32768 - (NDeltaMin_Q15[i]>>1) - \  NDeltaMin_Q15[k]
                                                  /_
                                                  k=i+1
center_freq_Q15 = clamp(min_center_Q15,
                        (NLSF_Q15[i-1] + NLSF_Q15[i] + 1)>>1,
                        max_center_Q15)

  NLSF_Q15[i-1] = center_freq_Q15 - (NDeltaMin_Q15[i]>>1)

    NLSF_Q15[i] = NLSF_Q15[i-1] + NDeltaMin_Q15[i] .
]]></artwork>
</figure>
The procedure then repeats until it has either executed 20 times or
 has stopped because the coefficients satisfy all the constraints.
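As a non-normative illustration, the adjustment loop above might be sketched in
 Python as follows.
The names stabilize_iterative, NDELTA_MIN_NB_MB, and NDELTA_MIN_WB are
 hypothetical and do not come from the reference implementation; the spacings
 are copied from <xref target="silk_nlsf_min_spacing"/>.
<figure>
<artwork><![CDATA[
```python
# Non-normative sketch; all names here are illustrative assumptions.
NDELTA_MIN_NB_MB = [250, 3, 6, 3, 3, 3, 4, 3, 3, 3, 461]
NDELTA_MIN_WB = [100, 3, 40, 3, 3, 3, 5, 14, 14, 10, 11, 3, 8, 9, 7, 3, 347]


def clamp(lo, x, hi):
    return max(lo, min(x, hi))


def stabilize_iterative(nlsf_q15, ndelta_min_q15, max_rounds=20):
    """Apply up to max_rounds small adjustments.

    Returns (coefficients, done). done is False when every round was
    used, in which case a caller would go on to the fallback method.
    """
    nlsf = list(nlsf_q15)
    d_lpc = len(nlsf)
    assert len(ndelta_min_q15) == d_lpc + 1
    for _ in range(max_rounds):
        # Slack for i = 0..d_LPC, with NLSF_Q15[-1] = 0 and
        # NLSF_Q15[d_LPC] = 32768.
        ext = [0] + nlsf + [32768]
        slack = [ext[i + 1] - ext[i] - ndelta_min_q15[i]
                 for i in range(d_lpc + 1)]
        worst = min(slack)
        if worst >= 0:
            return nlsf, True
        i = slack.index(worst)  # index() picks the lowest i on ties
        if i == 0:
            nlsf[0] = ndelta_min_q15[0]
        elif i == d_lpc:
            nlsf[d_lpc - 1] = 32768 - ndelta_min_q15[d_lpc]
        else:
            half = ndelta_min_q15[i] >> 1
            min_center = half + sum(ndelta_min_q15[:i])
            max_center = 32768 - half - sum(ndelta_min_q15[i + 1:])
            center = clamp(min_center,
                           (nlsf[i - 1] + nlsf[i] + 1) >> 1,
                           max_center)
            nlsf[i - 1] = center - half
            nlsf[i] = nlsf[i - 1] + ndelta_min_q15[i]
    return nlsf, False
```
]]></artwork>
</figure>
For example, an NB vector whose coefficients 2 and 3 are only one unit apart
 is repaired in a single round, with the pair moved to the required spacing
 of 3 around its rounded midpoint.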
</t>
<t>
After the 20th repetition of the above procedure, the following fallback
 procedure executes once.
First, the values of NLSF_Q15[k] for 0 &lt;= k &lt; d_LPC
 are sorted in ascending order.
Then for each value of k from 0 to d_LPC-1, NLSF_Q15[k] is set to
<figure align="center">
<artwork align="center"><![CDATA[
max(NLSF_Q15[k], NLSF_Q15[k-1] + NDeltaMin_Q15[k]) .
]]></artwork>
</figure>
Next, for each value of k from d_LPC-1 down to 0, NLSF_Q15[k] is set to
<figure align="center">
<artwork align="center"><![CDATA[
min(NLSF_Q15[k], NLSF_Q15[k+1] - NDeltaMin_Q15[k+1]) .
]]></artwork>
</figure>
</t>

</section>

<section anchor="silk_nlsf_interpolation" title="Normalized LSF Interpolation">
<t>
For 20 ms SILK frames, the first half of the frame (i.e., the first two
 subframes) may use normalized LSF coefficients that are interpolated between
 the decoded LSFs for the most recent coded frame (in the same channel) and the
 current frame.
A Q2 interpolation factor, decoded using the PDF in
 <xref target="silk_nlsf_interp_pdf"/>, follows the LSF coefficient indices in
 the bitstream.
This happens in silk_decode_indices() (decode_indices.c).
After either
<list style="symbols">
<t>An uncoded regular SILK frame in the side channel, or</t>
<t>A decoder reset (see <xref target="decoder-reset"/>),</t>
</list>
 the decoder still decodes this factor, but ignores its value and always uses
 4 instead.
For 10 ms SILK frames, this factor is not stored at all.
</t>

<texttable anchor="silk_nlsf_interp_pdf"
 title="PDF for Normalized LSF Interpolation Index">
<ttcol>PDF</ttcol>
<c>{13, 22, 29, 11, 181}/256</c>
</texttable>

<t>
Let n2_Q15[k] be the normalized LSF coefficients decoded by the procedure in
 <xref target="silk_nlsfs"/>, n0_Q15[k] be the LSF coefficients
 decoded for the prior frame, and w_Q2 be the interpolation factor.
Then the normalized LSF coefficients used for the first half of a 20 ms
 frame, n1_Q15[k], are
<figure align="center">
<artwork align="center"><![CDATA[
n1_Q15[k] = n0_Q15[k] + (w_Q2*(n2_Q15[k] - n0_Q15[k]) >> 2) .
]]></artwork>
</figure>
This interpolation is performed in silk_decode_parameters()
 (decode_parameters.c).
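As a non-normative illustration, the interpolation can be sketched in Python
 as follows.
The function name interpolate_nlsf is hypothetical, and the sketch assumes
 that the right shift of a negative product is an arithmetic shift (rounding
 toward negative infinity), which is how Python's shift operator behaves.
<figure>
<artwork><![CDATA[
```python
# Non-normative sketch; interpolate_nlsf is an illustrative name.
def interpolate_nlsf(n0_q15, n2_q15, w_q2):
    """n1_Q15[k] = n0_Q15[k] + (w_Q2*(n2_Q15[k] - n0_Q15[k]) >> 2)."""
    if not 0 <= w_q2 <= 4:
        raise ValueError("w_Q2 must be in the range 0..4")
    # Assumption: >> floors for negative values (arithmetic shift).
    return [a + ((w_q2 * (b - a)) >> 2)
            for a, b in zip(n0_q15, n2_q15)]
```
]]></artwork>
</figure>
A factor of 4 returns the current frame's coefficients unchanged, which is
 why forcing w_Q2 to 4 disables the interpolation.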
</t>
</section>

<section anchor="silk_nlsf2lpc"
 title="Converting Normalized LSFs to LPC Coefficients">
<t>
Any LPC filter A(z) can be split into a symmetric part P(z) and an
 anti-symmetric part Q(z) such that
<figure align="center">
<artwork align="center"><![CDATA[
           d_LPC
            __         -k   1
A(z) = 1 -  \  a[k] * z   = - * (P(z) + Q(z))
            /_              2
            k=1
]]></artwork>
</figure>
with
<figure align="center">
<artwork align="center"><![CDATA[
               -d_LPC-1      -1
P(z) = A(z) + z         * A(z  )

               -d_LPC-1      -1
Q(z) = A(z) - z         * A(z  ) .
]]></artwork>
</figure>
Each even-indexed normalized LSF coefficient corresponds to a pair of conjugate
 roots of P(z), while each odd-indexed coefficient corresponds to a pair of
 conjugate roots of Q(z), all of which lie on the unit circle.
In addition, P(z) has a root at pi and Q(z) has a root at 0.
Thus, they may be reconstructed mathematically from a set of normalized LSF
 coefficients, n[k], as
<figure align="center">
<artwork align="center"><![CDATA[
                   d_LPC/2-1
             -1       ___                         -1    -2
P(z) = (1 + z  ) *    | |  (1 - 2*cos(pi*n[2*k])*z   + z  )
                      k=0

                   d_LPC/2-1
             -1       ___                           -1    -2
Q(z) = (1 - z  ) *    | |  (1 - 2*cos(pi*n[2*k+1])*z   + z  )
                      k=0
]]></artwork>
</figure>
</t>
<t>
However, SILK performs this reconstruction using a fixed-point approximation so
 that all decoders can reproduce it in a bit-exact manner to avoid prediction
 drift.
The function silk_NLSF2A() (NLSF2A.c) implements this procedure.
</t>
<t>
To start, it approximates cos(pi*n[k]) using a table lookup with linear
 interpolation.
The encoder SHOULD use the inverse of this piecewise linear approximation,
 rather than the true inverse of the cosine function, when deriving the
 normalized LSF coefficients.
These values are also re-ordered to improve numerical accuracy when
 constructing the LPC polynomials.
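For intuition only, the mathematical reconstruction of A(z) from P(z) and Q(z)
 given earlier in this section can be sketched in floating point as follows.
This is not the bit-exact fixed-point procedure that a decoder needs, and the
 names poly_mul and nlsf_to_lpc_float are hypothetical.
<figure>
<artwork><![CDATA[
```python
# Non-normative floating-point sketch; names are illustrative.
import math


def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def nlsf_to_lpc_float(n):
    """n: normalized LSFs in (0, 1), ascending, even count.

    Returns a[1..d_LPC] such that A(z) = 1 - sum(a[k] * z^-k).
    """
    d_lpc = len(n)
    p = [1.0, 1.0]    # the (1 + z^-1) factor of P(z)
    q = [1.0, -1.0]   # the (1 - z^-1) factor of Q(z)
    for k in range(d_lpc // 2):
        c_even = math.cos(math.pi * n[2 * k])
        c_odd = math.cos(math.pi * n[2 * k + 1])
        p = poly_mul(p, [1.0, -2.0 * c_even, 1.0])
        q = poly_mul(q, [1.0, -2.0 * c_odd, 1.0])
    a_poly = [(x + y) / 2.0 for x, y in zip(p, q)]
    # a_poly[0] is 1; the z^-(d_LPC+1) terms of P and Q cancel.
    return [-c for c in a_poly[1:d_lpc + 1]]
```
]]></artwork>
</figure>
With this sketch, evenly spaced coefficients such as n = [1/3, 2/3] give
 a[k] = 0 for all k (up to rounding), i.e., a flat filter.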
</t>

<texttable anchor="silk_nlsf_orderings"
 title="LSF Ordering for Polynomial Evaluation">
<ttcol>Coefficient</ttcol>
<ttcol align="right">NB and MB</ttcol>
<ttcol align="right">WB</ttcol>
 <c>0</c> <c>0</c>  <c>0</c>
 <c>1</c> <c>9</c> <c>15</c>
 <c>2</c> <c>6</c>  <c>8</c>
 <c>3</c> <c>3</c>  <c>7</c>
 <c>4</c> <c>4</c>  <c>4</c>
 <c>5</c> <c>5</c> <c>11</c>
 <c>6</c> <c>8</c> <c>12</c>
 <c>7</c> <c>1</c>  <c>3</c>
 <c>8</c> <c>2</c>  <c>2</c>
 <c>9</c> <c>7</c> <c>13</c>
<c>10</c>   <c/>   <c>10</c>
<c>11</c>   <c/>    <c>5</c>
<c>12</c>   <c/>    <c>6</c>
<c>13</c>   <c/>    <c>9</c>
<c>14</c>   <c/>   <c>14</c>
<c>15</c>   <c/>    <c>1</c>
</texttable>

<t>
The top 7 bits of each normalized LSF coefficient index a value in the table,
 and the next 8 bits interpolate between it and the next value.
Let i = (n[k] &gt;&gt; 8) be the integer index and
 f = (n[k] &amp; 255) be the fractional part of a given
 coefficient.
Then the re-ordered, approximated cosine, c_Q17[ordering[k]], is
<figure align="center">
<artwork align="center"><![CDATA[
c_Q17[ordering[k]] = (cos_Q12[i]*256
                      + (cos_Q12[i+1]-cos_Q12[i])*f + 4) >> 3 ,
]]></artwork>
</figure>
 where ordering[k] is the k'th entry of the column of
 <xref target="silk_nlsf_orderings"/> corresponding to the current audio
 bandwidth and cos_Q12[i] is the i'th entry of <xref target="silk_cos_table"/>.
</t>

<texttable anchor="silk_cos_table"
 title="Q12 Cosine Table for LSF Conversion">
<ttcol align="right">i</ttcol>
<ttcol align="right">+0</ttcol>
<ttcol align="right">+1</ttcol>
<ttcol align="right">+2</ttcol>
<ttcol align="right">+3</ttcol>
<c>0</c>
 <c>4096</c> <c>4095</c> <c>4091</c> <c>4085</c>
<c>4</c>
 <c>4076</c> <c>4065</c> <c>4052</c> <c>4036</c>
<c>8</c>
 <c>4017</c> <c>3997</c> <c>3973</c> <c>3948</c>
<c>12</c>
 <c>3920</c> <c>3889</c> <c>3857</c> <c>3822</c>
<c>16</c>
 <c>3784</c> <c>3745</c> <c>3703</c> <c>3659</c>
<c>20</c>
 <c>3613</c> <c>3564</c> <c>3513</c> <c>3461</c>
<c>24</c>
 <c>3406</c> <c>3349</c> <c>3290</c> <c>3229</c>
<c>28</c>
 <c>3166</c> <c>3102</c> <c>3035</c> <c>2967</c>
<c>32</c>
 <c>2896</c> <c>2824</c> <c>2751</c> <c>2676</c>
<c>36</c>
 <c>2599</c> <c>2520</c> <c>2440</c> <c>2359</c>
<c>40</c>
 <c>2276</c> <c>2191</c> <c>2106</c> <c>2019</c>
<c>44</c>
 <c>1931</c> <c>1842</c> <c>1751</c> <c>1660</c>
<c>48</c>
 <c>1568</c> <c>1474</c> <c>1380</c> <c>1285</c>
<c>52</c>
 <c>1189</c> <c>1093</c> <c>995</c> <c>897</c>
<c>56</c>
 <c>799</c> <c>700</c> <c>601</c> <c>501</c>
<c>60</c>
 <c>401</c> <c>301</c> <c>201</c> <c>101</c>
<c>64</c>
 <c>0</c> <c>-101</c> <c>-201</c> <c>-301</c>
<c>68</c>
 <c>-401</c> <c>-501</c> <c>-601</c> <c>-700</c>
<c>72</c>
 <c>-799</c> <c>-897</c> <c>-995</c> <c>-1093</c>
<c>76</c>
<c>-1189</c><c>-1285</c><c>-1380</c><c>-1474</c>
<c>80</c>
<c>-1568</c><c>-1660</c><c>-1751</c><c>-1842</c>
<c>84</c>
<c>-1931</c><c>-2019</c><c>-2106</c><c>-2191</c>
<c>88</c>
<c>-2276</c><c>-2359</c><c>-2440</c><c>-2520</c>
<c>92</c>
<c>-2599</c><c>-2676</c><c>-2751</c><c>-2824</c>
<c>96</c>
<c>-2896</c><c>-2967</c><c>-3035</c><c>-3102</c>
<c>100</c>
<c>-3166</c><c>-3229</c><c>-3290</c><c>-3349</c>
<c>104</c>
<c>-3406</c><c>-3461</c><c>-3513</c><c>-3564</c>
<c>108</c>
<c>-3613</c><c>-3659</c><c>-3703</c><c>-3745</c>
<c>112</c>
<c>-3784</c><c>-3822</c><c>-3857</c><c>-3889</c>
<c>116</c>
<c>-3920</c><c>-3948</c><c>-3973</c><c>-3997</c>
<c>120</c>
<c>-4017</c><c>-4036</c><c>-4052</c><c>-4065</c>
<c>124</c>
<c>-4076</c><c>-4085</c><c>-4091</c><c>-4095</c>
<c>128</c>
<c>-4096</c> <c/> <c/> <c/>
</texttable>

<t>
Given the list of cosine values, silk_NLSF2A_find_poly() (NLSF2A.c)
 computes the coefficients of P and Q, described here via a simple recurrence.
Let p_Q16[k][j] and q_Q16[k][j] be the coefficients of the products of the
 first (k+1) root pairs for P and Q, with j indexing the coefficient number.
Only the first (k+2) coefficients are needed, as the products are symmetric.
Let p_Q16[0][0] = q_Q16[0][0] = 1&lt;&lt;16,
 p_Q16[0][1] = -c_Q17[0], q_Q16[0][1] = -c_Q17[1], and
 d2 = d_LPC/2.
As boundary conditions, assume
 p_Q16[k][j] = q_Q16[k][j] = 0 for all
 j &lt; 0.
Also, assume p_Q16[k][k+2] = p_Q16[k][k] and
 q_Q16[k][k+2] = q_Q16[k][k] (because of the symmetry).
Then, for 0 &lt; k &lt; d2 and 0 &lt;= j &lt;= k+1,
<figure align="center">
<artwork align="center"><![CDATA[
p_Q16[k][j] = p_Q16[k-1][j] + p_Q16[k-1][j-2]
              - ((c_Q17[2*k]*p_Q16[k-1][j-1] + 32768)>>16) ,

q_Q16[k][j] = q_Q16[k-1][j] + q_Q16[k-1][j-2]
              - ((c_Q17[2*k+1]*q_Q16[k-1][j-1] + 32768)>>16) .
]]></artwork>
</figure>
The use of Q17 values for the cosine terms in an otherwise Q16 expression
 implicitly scales them by a factor of 2.
The multiplications in this recurrence may require up to 48 bits of precision
 in the result to avoid overflow.
In practice, each row of the recurrence only depends on the previous row, so an
 implementation does not need to store all of them.
</t>
<t>
silk_NLSF2A() uses the values from the last row of this recurrence to
 reconstruct a 32-bit version of the LPC filter (without the leading 1.0
 coefficient), a32_Q17[k], 0 &lt;= k &lt; d2:
<figure align="center">
<artwork align="center"><![CDATA[
a32_Q17[k]         = -(q_Q16[d2-1][k+1] - q_Q16[d2-1][k])
                     - (p_Q16[d2-1][k+1] + p_Q16[d2-1][k]) ,

a32_Q17[d_LPC-k-1] =  (q_Q16[d2-1][k+1] - q_Q16[d2-1][k])
                     - (p_Q16[d2-1][k+1] + p_Q16[d2-1][k]) .
]]></artwork>
</figure>
The sum and difference of two terms from each of the p_Q16 and q_Q16
 coefficient lists reflect the (1 + z**-1) and
 (1 - z**-1) factors of P and Q, respectively.
The promotion of the expression from Q16 to Q17 implicitly scales the result
 by 1/2.
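As a non-normative illustration, the recurrence and the final combination
 above might be sketched in Python as follows.
The names find_poly_q16 and nlsf2a_core are hypothetical; the sketch takes the
 already re-ordered c_Q17[] values as input, and it relies on Python's
 unbounded integers and arithmetic right shift rather than on 48-bit
 intermediate products.
<figure>
<artwork><![CDATA[
```python
# Non-normative sketch; function names are illustrative assumptions.
def find_poly_q16(c_q17, d2):
    """Last row of the recurrence: coefficients 0..d2 in Q16.

    c_q17 holds the re-ordered cosines for one polynomial only
    (even entries for P, odd entries for Q).
    """
    row = [1 << 16, -c_q17[0]]
    for k in range(1, d2):
        # Two leading zeros stand for j < 0; the appended entry is
        # the symmetry condition row[k+1] = row[k-1].
        prev = [0, 0] + row + [row[k - 1]]
        row = [prev[j + 2] + prev[j]
               - ((c_q17[k] * prev[j + 1] + 32768) >> 16)
               for j in range(k + 2)]
    return row


def nlsf2a_core(c_q17):
    """Combine P and Q into a32_Q17[0..d_LPC-1]."""
    d_lpc = len(c_q17)
    d2 = d_lpc // 2
    p = find_poly_q16(c_q17[0::2], d2)
    q = find_poly_q16(c_q17[1::2], d2)
    a32_q17 = [0] * d_lpc
    for k in range(d2):
        p_sum = p[k + 1] + p[k]     # (1 + z^-1) factor of P
        q_diff = q[k + 1] - q[k]    # (1 - z^-1) factor of Q
        a32_q17[k] = -q_diff - p_sum
        a32_q17[d_lpc - k - 1] = q_diff - p_sum
    return a32_q17
```
]]></artwork>
</figure>
For instance, with all four cosines equal to zero the sketch yields
 [0, -262144, 0, -131072], which corresponds to
 A(z) = (1 + z**-2)**2, as expected when every root pair lies at pi/2.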
</t>
</section>

<section anchor="silk_lpc_range_limit"
 title="Limiting the Range of the LPC Coefficients">
<t>
The a32_Q17[] coefficients are too large to fit in a 16-bit value, which
 significantly increases the cost of applying this filter in fixed-point
 decoders.
Reducing them to Q12 precision doesn't incur any significant quality loss,
 but still does not guarantee they will fit.
silk_NLSF2A() applies up to 10 rounds of bandwidth expansion to limit
 the dynamic range of these coefficients.
Even floating-point decoders SHOULD perform these steps, to avoid mismatch.
</t>
<t>
For each round, the process first finds the index k such that abs(a32_Q17[k])
 is largest, breaking ties by choosing the lowest value of k.
Let maxabs_Q17 be that largest magnitude, abs(a32_Q17[k]).
Then, it computes the corresponding Q12 precision value, maxabs_Q12, subject to
 an upper bound to avoid overflow in subsequent computations:
<figure align="center">
<artwork align="center"><![CDATA[
maxabs_Q12 = min((maxabs_Q17 + 16) >> 5, 163838) .
]]></artwork>
</figure>
If this is larger than 32767, the procedure derives the chirp factor,
 sc_Q16[0], to use in the bandwidth expansion as
<figure align="center">
<artwork align="center"><![CDATA[
                    (maxabs_Q12 - 32767) << 14
sc_Q16[0] = 65470 - -------------------------- ,
                     (maxabs_Q12 * (k+1)) >> 2
]]></artwork>
</figure>
 where the division here is integer division.
This is an approximation of the chirp factor needed to reduce the target
 coefficient to 32767, though it is both less than 0.999 and, for
 k &gt; 0 when maxabs_Q12 is much greater than 32767, still slightly
 too large.
The upper bound on maxabs_Q12, 163838, was chosen because it is equal to
 ((2**31 - 1) &gt;&gt; 14) + 32767, i.e., the
 largest value of maxabs_Q12 that would not overflow the numerator in the
 equation above when stored in a signed 32-bit integer.
</t>
<t>
silk_bwexpander_32() (bwexpander_32.c) performs the bandwidth expansion (again,
 only when maxabs_Q12 is greater than 32767) using the following recurrence:
<figure align="center">
<artwork align="center"><![CDATA[
 a32_Q17[k] = (a32_Q17[k]*sc_Q16[k]) >> 16

sc_Q16[k+1] = (sc_Q16[0]*sc_Q16[k] + 32768) >> 16
]]></artwork>
</figure>
The first multiply may require up to 48 bits of precision in the result to
 avoid overflow.
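As a non-normative illustration, one round of the chirp-factor computation and
 the expansion recurrence might be sketched in Python as follows.
The names chirp_factor_q16 and bwexpand_q17 are hypothetical and are not taken
 from the reference implementation.
<figure>
<artwork><![CDATA[
```python
# Non-normative sketch; function names are illustrative assumptions.
def chirp_factor_q16(maxabs_q17, k):
    """Return sc_Q16[0] for the largest coefficient, found at index k.

    Returns None when maxabs_Q12 <= 32767, i.e., when no expansion
    is needed in this round.
    """
    maxabs_q12 = min((maxabs_q17 + 16) >> 5, 163838)
    if maxabs_q12 <= 32767:
        return None
    num = (maxabs_q12 - 32767) << 14
    den = (maxabs_q12 * (k + 1)) >> 2
    # Both operands are non-negative, so // matches C division.
    return 65470 - num // den


def bwexpand_q17(a32_q17, sc0_q16):
    """Scale coefficient k by sc_Q16[k], starting from sc_Q16[0]."""
    out = []
    sc = sc0_q16
    for a in a32_q17:
        out.append((a * sc) >> 16)
        sc = (sc0_q16 * sc + 32768) >> 16
    return out
```
]]></artwork>
</figure>
Python integers do not overflow, so the precision requirements on the two
 multiplies in the recurrence apply only to fixed-width implementations.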
The second multiply must be unsigned to avoid overflow with only 32 bits of
 precision.
The reference implementation uses a slightly more complex formulation that
 avoids the 32-bit overflow using signed multiplication, but is otherwise
 equivalent.
</t>
<t>
After 10 rounds of bandwidth expansion are performed, the coefficients are
 simply saturated to 16 bits:
<figure align="center">
<artwork align="center"><![CDATA[
a32_Q17[k] = clamp(-32768, (a32_Q17[k] + 16) >> 5, 32767) << 5 .
]]></artwork>
</figure>
Because this performs the actual saturation in the Q12 domain, but converts the
 coefficients back to the Q17 domain for the purposes of prediction gain
 limiting, this step must be performed after the 10th round of bandwidth
 expansion, regardless of whether or not the Q12 version of any coefficient
 still overflows a 16-bit integer.
This saturation is not performed if maxabs_Q12 drops to 32767 or less prior to
 the 10th round.
</t>
</section>

<section anchor="silk_lpc_gain_limit"
 title="Limiting the Prediction Gain of the LPC Filter">
<t>
The prediction gain of an LPC synthesis filter is the square-root of the output
 energy when the filter is excited by a unit-energy impulse.
Even if the Q12 coefficients would fit, the resulting filter may still have a
 significant gain (especially for voiced sounds), making the filter unstable.
silk_NLSF2A() applies up to 18 additional rounds of bandwidth expansion to
 limit the prediction gain.
Instead of controlling the amount of bandwidth expansion using the prediction
 gain itself (which may diverge to infinity for an unstable filter),
 silk_NLSF2A() uses silk_LPC_inverse_pred_gain_QA() (LPC_inv_pred_gain.c) to
 compute the reflection coefficients associated with the filter.
The filter is stable if and only if the magnitude of these coefficients is
 sufficiently less than one.
The reflection coefficients, rc[k], can be computed using a simple Levinson
 recurrence, initialized with the LPC coefficients
 a[d_LPC-1][n] = a[n], and then updated via
<figure align="center">
<artwork align="center"><![CDATA[
    rc[k] = -a[k][k] ,

            a[k][n] - a[k][k-n-1]*rc[k]
a[k-1][n] = --------------------------- .
                             2
                    1 - rc[k]
]]></artwork>
</figure>
</t>
<t>
However, silk_LPC_inverse_pred_gain_QA() approximates this using fixed-point
 arithmetic to guarantee reproducible results across platforms and
 implementations.
Since small changes in the coefficients can make a stable filter unstable, it
 takes the real Q12 coefficients that will be used during reconstruction as
 input.
Thus, let
<figure align="center">
<artwork align="center"><![CDATA[
a32_Q12[n] = (a32_Q17[n] + 16) >> 5
]]></artwork>
</figure>
 be the Q12 version of the LPC coefficients that will eventually be used.
As a simple initial check, the decoder computes the DC response as
<figure align="center">
<artwork align="center"><![CDATA[
          d_LPC-1
            __
DC_resp =   \   a32_Q12[n]
            /_
            n=0
]]></artwork>
</figure>
 and if DC_resp &gt; 4096, the filter is unstable.
</t>
<t>
Increasing the precision of these Q12 coefficients to Q24 for intermediate
 computations allows more accurate computation of the reflection coefficients,
 so the decoder initializes the recurrence via
<figure align="center">
<artwork align="center"><![CDATA[
a32_Q24[d_LPC-1][n] = a32_Q12[n] << 12 .
]]></artwork>
</figure>
Then for each k from d_LPC-1 down to 0, if
 abs(a32_Q24[k][k]) &gt; 16773022, the filter is unstable and the
 recurrence stops.
The constant 16773022 here is approximately 0.99975 in Q24.
Otherwise, row k-1 of a32_Q24 is computed from row k as
<figure align="center">
<artwork align="center"><![CDATA[
      rc_Q31[k] = -a32_Q24[k][k] << 7 ,

     div_Q30[k] = (1<<30) - (rc_Q31[k]*rc_Q31[k] >> 32) ,

          b1[k] = ilog(div_Q30[k]) ,

          b2[k] = b1[k] - 16 ,

                        (1<<29) - 1
     inv_Qb2[k] = ----------------------- ,
                  div_Q30[k] >> (b2[k]+1)

     err_Q29[k] = (1<<29)
                  - ((div_Q30[k]<<(15-b2[k]))*inv_Qb2[k] >> 16) ,

    gain_Qb1[k] = ((inv_Qb2[k] << 16)
                   + (err_Q29[k]*inv_Qb2[k] >> 13)) ,

num_Q24[k-1][n] = a32_Q24[k][n]
                  - ((a32_Q24[k][k-n-1]*rc_Q31[k] + (1<<30)) >> 31) ,

a32_Q24[k-1][n] = (num_Q24[k-1][n]*gain_Qb1[k]
                   + (1<<(b1[k]-1))) >> b1[k] ,
]]></artwork>
</figure>
 where 0 &lt;= n &lt; k.
Here, rc_Q31[k] are the reflection coefficients.
div_Q30[k] is the denominator for each iteration, and gain_Qb1[k] is its
 multiplicative inverse (with b1[k] fractional bits, where b1[k] ranges from
 20 to 31).
inv_Qb2[k], which ranges from 16384 to 32767, is a low-precision version of
 that inverse (with b2[k] fractional bits).
err_Q29[k] is the residual error, ranging from -32763 to 32392, which is used
 to improve the accuracy.
The values num_Q24[k-1][n] for each n are the numerators for the next row of
 coefficients in the recursion, and a32_Q24[k-1][n] is the final version of
 that row.
Every multiply in this procedure except the one used to compute gain_Qb1[k]
 requires more than 32 bits of precision, but otherwise all intermediate
 results fit in 32 bits or less.
In practice, because each row depends only on the next one, an implementation
 does not need to store them all.
</t>
<t>
If abs(a32_Q24[k][k]) &lt;= 16773022 for
 0 &lt;= k &lt; d_LPC, then the filter is considered stable.
However, the problem of determining stability is ill-conditioned when the
 filter contains several reflection coefficients whose magnitude is very close
 to one.
This fixed-point algorithm is not mathematically guaranteed to correctly
 classify filters as stable or unstable in this case, though it does very well
 in practice.
</t>
<t>
On round i, 1 &lt;= i &lt;= 18, if the filter passes these
 stability checks, then this procedure stops, and the final LPC coefficients to
 use for reconstruction in <xref target="silk_lpc_synthesis"/> are
<figure align="center">
<artwork align="center"><![CDATA[
a_Q12[k] = (a32_Q17[k] + 16) >> 5 .
]]></artwork>
</figure>
Otherwise, a round of bandwidth expansion is applied using the same procedure
 as in <xref target="silk_lpc_range_limit"/>, with
<figure align="center">
<artwork align="center"><![CDATA[
sc_Q16[0] = 65536 - (2<<i) .
]]></artwork>
</figure>
During the 15th round, sc_Q16[0] becomes 0 in the above equation, so a_Q12[k]
 is set to 0 for all k, guaranteeing a stable filter.
</t>
</section>

</section>

<section anchor="silk_ltp_params" toc="include"
 title="Long-Term Prediction (LTP) Parameters">
<t>
After the normalized LSF indices and, for 20 ms frames, the LSF
 interpolation index, voiced frames (see <xref target="silk_frame_type"/>)
 include additional LTP parameters.
There is one primary lag index for each SILK frame, but this is refined to
 produce a separate lag index per subframe using a vector quantizer.
Each subframe also gets its own prediction gain coefficient.
</t>

<section anchor="silk_ltp_lags" title="Pitch Lags">
<t>
The primary lag index is coded either relative to the primary lag of the prior
 frame in the same channel, or as an absolute index.
Absolute coding is used if and only if
<list style="symbols">
<t>
This is the first SILK frame of its type (LBRR or regular) for this channel in
 the current Opus frame,
</t>
<t>
The previous SILK frame of the same type (LBRR or regular) for this channel in
 the same Opus frame was not coded, or
</t>
<t>
That previous SILK frame was coded, but was not voiced (see
 <xref target="silk_frame_type"/>).
</t>
</list>
</t>

<t>
With absolute coding, the primary pitch lag may range from 2 ms
 (inclusive) up to 18 ms (exclusive), corresponding to pitches from
 500 Hz down to 55.6 Hz, respectively.
It consists of a high part and a low part, where the decoder reads the high
 part using the 32-entry codebook in <xref target="silk_abs_pitch_high_pdf"/>
 and the low part using the codebook corresponding to the current audio
 bandwidth from <xref target="silk_abs_pitch_low_pdf"/>.
The final primary pitch lag is then
<figure align="center">
<artwork align="center"><![CDATA[
lag = lag_high*lag_scale + lag_low + lag_min
]]></artwork>
</figure>
 where lag_high is the high part, lag_low is the low part, and lag_scale
 and lag_min are the values from the "Scale" and "Minimum Lag" columns of
 <xref target="silk_abs_pitch_low_pdf"/>, respectively.
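A minimal C sketch of this computation is given below for illustration.
The function name abs_primary_lag(), the BW_* indices, and the two lookup
 arrays are hypothetical names introduced here; only the "Scale" and
 "Minimum Lag" values are taken from
 <xref target="silk_abs_pitch_low_pdf"/>.
<figure align="center">
<artwork align="left"><![CDATA[
```c
/* Hypothetical bandwidth indices, used only by this sketch. */
enum { BW_NB = 0, BW_MB = 1, BW_WB = 2 };

/* "Scale" and "Minimum Lag" columns of the low-part table,
 * indexed by the bandwidth indices above. */
static const int lag_scale_tab[3] = {  4,  6,  8 };
static const int lag_min_tab[3]   = { 16, 24, 32 };

/* Combines the decoded high part (0..31) and low part
 * (0..lag_scale-1) into the primary pitch lag, in samples. */
static int abs_primary_lag(int bw, int lag_high, int lag_low)
{
    return lag_high * lag_scale_tab[bw] + lag_low + lag_min_tab[bw];
}
```
]]></artwork>
</figure>
With NB audio, for instance, lag_high = 31 and lag_low = 3 give
 31*4 + 3 + 16 = 143, just below the NB maximum lag of 144.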
</t>

<texttable anchor="silk_abs_pitch_high_pdf"
 title="PDF for High Part of Primary Pitch Lag">
<ttcol align="left">PDF</ttcol>
<c>{3, 3, 6, 11, 21, 30, 32, 19,
 11, 10, 12, 13, 13, 12, 11, 9,
 8, 7, 6, 4, 2, 2, 2, 1,
 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
</texttable>

<texttable anchor="silk_abs_pitch_low_pdf"
 title="PDF for Low Part of Primary Pitch Lag">
<ttcol>Audio Bandwidth</ttcol>
<ttcol>PDF</ttcol>
<ttcol>Scale</ttcol>
<ttcol>Minimum Lag</ttcol>
<ttcol>Maximum Lag</ttcol>
<c>NB</c> <c>{64, 64, 64, 64}/256</c> <c>4</c> <c>16</c> <c>144</c>
<c>MB</c> <c>{43, 42, 43, 43, 42, 43}/256</c> <c>6</c> <c>24</c> <c>216</c>
<c>WB</c> <c>{32, 32, 32, 32, 32, 32, 32, 32}/256</c> <c>8</c> <c>32</c> <c>288</c>
</texttable>

<t>
All frames that do not use absolute coding for the primary lag index use
 relative coding instead.
The decoder reads a single delta value using the 21-entry PDF in
 <xref target="silk_rel_pitch_pdf"/>.
If the resulting value is zero, it falls back to the absolute coding procedure
 from the prior paragraph.
Otherwise, the final primary pitch lag is then
<figure align="center">
<artwork align="center"><![CDATA[
lag = previous_lag + (delta_lag_index - 9)
]]></artwork>
</figure>
 where previous_lag is the primary pitch lag from the most recent frame in the
 same channel and delta_lag_index is the value just decoded.
This allows a per-frame change in the pitch lag of -8 to +11 samples.
The decoder does no clamping at this point, so this value can fall outside the
 range of 2 ms to 18 ms, and the decoder must use this unclamped
 value when using relative coding in the next SILK frame (if any).
However, because an Opus frame can use relative coding for at most two
 consecutive SILK frames, integer overflow should not be an issue.
</t>

<texttable anchor="silk_rel_pitch_pdf"
 title="PDF for Primary Pitch Lag Change">
<ttcol align="left">PDF</ttcol>
<c>{46, 2, 2, 3, 4, 6, 10, 15,
 26, 38, 30, 22, 15, 10, 7, 6,
 4, 4, 2, 2, 2}/256</c>
</texttable>

<t>
After the primary pitch lag, a "pitch contour", stored as a single entry from
 one of four small VQ codebooks, gives lag offsets for each subframe in the
 current SILK frame.
The codebook index is decoded using one of the PDFs in
 <xref target="silk_pitch_contour_pdfs"/> depending on the current frame size
 and audio bandwidth.
Tables <xref format="counter" target="silk_pitch_contour_cb_nb10ms"/>
 through <xref format="counter" target="silk_pitch_contour_cb_mbwb20ms"/>
 give the corresponding offsets to apply to the primary pitch lag for each
 subframe given the decoded codebook index.
</t>

<texttable anchor="silk_pitch_contour_pdfs"
 title="PDFs for Subframe Pitch Contour">
<ttcol>Audio Bandwidth</ttcol>
<ttcol>SILK Frame Size</ttcol>
<ttcol align="right">Codebook Size</ttcol>
<ttcol>PDF</ttcol>
<c>NB</c> <c>10 ms</c> <c>3</c>
<c>{143, 50, 63}/256</c>
<c>NB</c> <c>20 ms</c> <c>11</c>
<c>{68, 12, 21, 17, 19, 22, 30, 24,
 17, 16, 10}/256</c>
<c>MB or WB</c> <c>10 ms</c> <c>12</c>
<c>{91, 46, 39, 19, 14, 12, 8, 7,
 6, 5, 5, 4}/256</c>
<c>MB or WB</c> <c>20 ms</c> <c>34</c>
<c>{33, 22, 18, 16, 15, 14, 14, 13,
 13, 10, 9, 9, 8, 6, 6, 6,
 5, 4, 4, 4, 3, 3, 3, 2,
 2, 2, 2, 2, 2, 2, 1, 1,
 1, 1}/256</c>
</texttable>

<texttable anchor="silk_pitch_contour_cb_nb10ms"
 title="Codebook Vectors for Subframe Pitch Contour: NB, 10 ms Frames">
<ttcol>Index</ttcol>
<ttcol align="right">Subframe Offsets</ttcol>
<c>0</c> <c><spanx style="vbare"> 0  0</spanx></c>
<c>1</c> <c><spanx style="vbare"> 1  0</spanx></c>
<c>2</c> <c><spanx style="vbare"> 0  1</spanx></c>
</texttable>

<texttable anchor="silk_pitch_contour_cb_nb20ms"
 title="Codebook Vectors for Subframe Pitch Contour: NB, 20 ms Frames">
<ttcol>Index</ttcol>
<ttcol align="right">Subframe Offsets</ttcol>
 <c>0</c> <c><spanx style="vbare"> 0  0  0  0</spanx></c>
 <c>1</c> <c><spanx style="vbare"> 2  1  0 -1</spanx></c>
 <c>2</c> <c><spanx style="vbare">-1  0  1  2</spanx></c>
 <c>3</c> <c><spanx style="vbare">-1  0  0  1</spanx></c>
 <c>4</c> <c><spanx style="vbare">-1  0  0  0</spanx></c>
 <c>5</c> <c><spanx style="vbare"> 0  0  0  1</spanx></c>
 <c>6</c> <c><spanx style="vbare"> 0  0  1  1</spanx></c>
 <c>7</c> <c><spanx style="vbare"> 1  1  0  0</spanx></c>
 <c>8</c> <c><spanx style="vbare"> 1  0  0  0</spanx></c>
 <c>9</c> <c><spanx style="vbare"> 0  0  0 -1</spanx></c>
<c>10</c> <c><spanx style="vbare"> 1  0  0 -1</spanx></c>
</texttable>

<texttable anchor="silk_pitch_contour_cb_mbwb10ms"
 title="Codebook Vectors for Subframe Pitch Contour: MB or WB, 10 ms Frames">
<ttcol>Index</ttcol>
<ttcol align="right">Subframe Offsets</ttcol>
 <c>0</c> <c><spanx style="vbare"> 0  0</spanx></c>
 <c>1</c> <c><spanx style="vbare"> 0  1</spanx></c>
 <c>2</c> <c><spanx style="vbare"> 1  0</spanx></c>
 <c>3</c> <c><spanx style="vbare">-1  1</spanx></c>
 <c>4</c> <c><spanx style="vbare"> 1 -1</spanx></c>
 <c>5</c> <c><spanx style="vbare">-1  2</spanx></c>
 <c>6</c> <c><spanx style="vbare"> 2 -1</spanx></c>
 <c>7</c> <c><spanx style="vbare">-2  2</spanx></c>
 <c>8</c> <c><spanx style="vbare"> 2 -2</spanx></c>
 <c>9</c> <c><spanx style="vbare">-2  3</spanx></c>
<c>10</c> <c><spanx style="vbare"> 3 -2</spanx></c>
<c>11</c> <c><spanx style="vbare">-3  3</spanx></c>
</texttable>

<texttable anchor="silk_pitch_contour_cb_mbwb20ms"
 title="Codebook Vectors for Subframe Pitch Contour: MB or WB, 20 ms Frames">
<ttcol>Index</ttcol>
<ttcol align="right">Subframe Offsets</ttcol>
 <c>0</c> <c><spanx style="vbare"> 0  0  0  0</spanx></c>
 <c>1</c> <c><spanx style="vbare"> 0  0  1  1</spanx></c>
 <c>2</c> <c><spanx style="vbare"> 1  1  0  0</spanx></c>
 <c>3</c> <c><spanx style="vbare">-1  0  0  0</spanx></c>
 <c>4</c> <c><spanx style="vbare"> 0  0  0  1</spanx></c>
 <c>5</c> <c><spanx style="vbare"> 1  0  0  0</spanx></c>
 <c>6</c> <c><spanx style="vbare">-1  0  0  1</spanx></c>
 <c>7</c> <c><spanx style="vbare"> 0  0  0 -1</spanx></c>
 <c>8</c> <c><spanx style="vbare">-1  0  1  2</spanx></c>
 <c>9</c> <c><spanx style="vbare"> 1  0  0 -1</spanx></c>
<c>10</c> <c><spanx style="vbare">-2 -1  1  2</spanx></c>
<c>11</c> <c><spanx style="vbare"> 2  1  0 -1</spanx></c>
<c>12</c> <c><spanx style="vbare">-2  0  0  2</spanx></c>
<c>13</c> <c><spanx style="vbare">-2  0  1  3</spanx></c>
<c>14</c> <c><spanx style="vbare"> 2  1 -1 -2</spanx></c>
<c>15</c> <c><spanx style="vbare">-3 -1  1  3</spanx></c>
<c>16</c> <c><spanx style="vbare"> 2  0  0 -2</spanx></c>
<c>17</c> <c><spanx style="vbare"> 3  1  0 -2</spanx></c>
<c>18</c> <c><spanx style="vbare">-3 -1  2  4</spanx></c>
<c>19</c> <c><spanx style="vbare">-4 -1  1  4</spanx></c>
<c>20</c> <c><spanx style="vbare"> 3  1 -1 -3</spanx></c>
<c>21</c> <c><spanx style="vbare">-4 -1  2  5</spanx></c>
<c>22</c> <c><spanx style="vbare"> 4  2 -1 -3</spanx></c>
<c>23</c> <c><spanx style="vbare"> 4  1 -1 -4</spanx></c>
<c>24</c> <c><spanx style="vbare">-5 -1  2  6</spanx></c>
<c>25</c> <c><spanx style="vbare"> 5  2 -1 -4</spanx></c>
<c>26</c> <c><spanx style="vbare">-6 -2  2  6</spanx></c>
<c>27</c> <c><spanx style="vbare">-5 -2  2  5</spanx></c>
<c>28</c> <c><spanx style="vbare"> 6  2 -1 -5</spanx></c>
<c>29</c> <c><spanx style="vbare">-7 -2  3  8</spanx></c>
<c>30</c> <c><spanx style="vbare"> 6  2 -2 -6</spanx></c>
<c>31</c> <c><spanx style="vbare"> 5  2 -2 -5</spanx></c>
<c>32</c> <c><spanx style="vbare"> 8  3 -2 -7</spanx></c>
<c>33</c> <c><spanx style="vbare">-9 -3  3  9</spanx></c>
</texttable>

<t>
The final pitch lag for each subframe is assembled in silk_decode_pitch()
 (decode_pitch.c).
Let lag be the primary pitch lag for the current SILK frame, contour_index be
 the index into the VQ codebook, and lag_cb[contour_index][k] be the
 corresponding entry of the codebook from the appropriate table given above
 for the k'th subframe.
Then the final pitch lag for that subframe is
<figure align="center">
<artwork align="center"><![CDATA[
pitch_lags[k] = clamp(lag_min, lag + lag_cb[contour_index][k],
                      lag_max)
]]></artwork>
</figure>
 where lag_min and lag_max are the values from the "Minimum Lag" and
 "Maximum Lag" columns of <xref target="silk_abs_pitch_low_pdf"/>,
 respectively.
</t>

</section>

<section anchor="silk_ltp_filter" title="LTP Filter Coefficients">
<t>
SILK uses a separate 5-tap pitch filter for each subframe, selected from one
 of three codebooks.
The three codebooks each represent different rate-distortion trade-offs, with
 average rates of 1.61 bits/subframe, 3.68 bits/subframe, and
 4.85 bits/subframe, respectively.
</t>

<t>
The importance of the filter coefficients generally depends on two factors: the
 periodicity of the signal and the relative energy between the current
 subframe and the signal from one period earlier.
Greater periodicity and decaying energy both make the filter coefficients more
 important, so they should be coded with lower distortion and a higher rate.
These properties are relatively stable over the duration of a single SILK
 frame, hence all of the subframes in a SILK frame choose their filter from the
 same codebook.
This is signaled with an explicitly-coded "periodicity index".
This immediately follows the subframe pitch lags, and is coded using the
 3-entry PDF from <xref target="silk_perindex_pdf"/>.
</t>

<texttable anchor="silk_perindex_pdf" title="Periodicity Index PDF">
<ttcol>PDF</ttcol>
<c>{77, 80, 99}/256</c>
</texttable>

<t>
The indices of the filters for each subframe follow.
They are all coded using the PDF from <xref target="silk_ltp_filter_pdfs"/>
 corresponding to the periodicity index.
Tables <xref format="counter" target="silk_ltp_filter_coeffs0"/>
 through <xref format="counter" target="silk_ltp_filter_coeffs2"/>
 contain the corresponding filter taps as signed Q7 integers.
</t>

<texttable anchor="silk_ltp_filter_pdfs" title="LTP Filter PDFs">
<ttcol>Periodicity Index</ttcol>
<ttcol align="right">Codebook Size</ttcol>
<ttcol>PDF</ttcol>
<c>0</c> <c>8</c> <c>{185, 15, 13, 13, 9, 9, 6, 6}/256</c>
<c>1</c> <c>16</c> <c>{57, 34, 21, 20, 15, 13, 12, 13,
 10, 10, 9, 10, 9, 8, 7, 8}/256</c>
<c>2</c> <c>32</c> <c>{15, 16, 14, 12, 12, 12, 11, 11,
 11, 10, 9, 9, 9, 9, 8, 8,
 8, 8, 7, 7, 6, 6, 5, 4,
 5, 4, 4, 4, 3, 4, 3, 2}/256</c>
</texttable>

<texttable anchor="silk_ltp_filter_coeffs0"
 title="Codebook Vectors for LTP Filter, Periodicity Index 0">
<ttcol>Index</ttcol>
<ttcol align="right">Filter Taps (Q7)</ttcol>
 <c>0</c>
<c><spanx style="vbare">  4   6  24   7   5</spanx></c>
 <c>1</c>
<c><spanx style="vbare">  0   0   2   0   0</spanx></c>
 <c>2</c>
<c><spanx style="vbare"> 12  28  41  13  -4</spanx></c>
 <c>3</c>
<c><spanx style="vbare"> -9  15  42  25  14</spanx></c>
 <c>4</c>
<c><spanx style="vbare">  1  -2  62  41  -9</spanx></c>
 <c>5</c>
<c><spanx style="vbare">-10  37  65  -4   3</spanx></c>
 <c>6</c>
<c><spanx style="vbare"> -6   4  66   7  -8</spanx></c>
 <c>7</c>
<c><spanx style="vbare"> 16  14  38  -3  33</spanx></c>
</texttable>

<texttable anchor="silk_ltp_filter_coeffs1"
 title="Codebook Vectors for LTP Filter, Periodicity Index 1">
<ttcol>Index</ttcol>
<ttcol align="right">Filter Taps (Q7)</ttcol>

 <c>0</c>
<c><spanx style="vbare"> 13  22  39  23  12</spanx></c>
 <c>1</c>
<c><spanx style="vbare"> -1  36  64  27  -6</spanx></c>
 <c>2</c>
<c><spanx style="vbare"> -7  10  55  43  17</spanx></c>
 <c>3</c>
<c><spanx style="vbare">  1   1   8   1   1</spanx></c>
 <c>4</c>
<c><spanx style="vbare">  6 -11  74  53  -9</spanx></c>
 <c>5</c>
<c><spanx style="vbare">-12  55  76 -12   8</spanx></c>
 <c>6</c>
<c><spanx style="vbare"> -3   3  93  27  -4</spanx></c>
 <c>7</c>
<c><spanx style="vbare"> 26  39  59   3  -8</spanx></c>
 <c>8</c>
<c><spanx style="vbare">  2   0  77  11   9</spanx></c>
 <c>9</c>
<c><spanx style="vbare"> -8  22  44  -6   7</spanx></c>
<c>10</c>
<c><spanx style="vbare"> 40   9  26   3   9</spanx></c>
<c>11</c>
<c><spanx style="vbare"> -7  20 101  -7   4</spanx></c>
<c>12</c>
<c><spanx style="vbare">  3  -8  42  26   0</spanx></c>
<c>13</c>
<c><spanx style="vbare">-15  33  68   2  23</spanx></c>
<c>14</c>
<c><spanx style="vbare"> -2  55  46  -2  15</spanx></c>
<c>15</c>
<c><spanx style="vbare">  3  -1  21  16  41</spanx></c>
</texttable>

<texttable anchor="silk_ltp_filter_coeffs2"
 title="Codebook Vectors for LTP Filter, Periodicity Index 2">
<ttcol>Index</ttcol>
<ttcol align="right">Filter Taps (Q7)</ttcol>
 <c>0</c>
<c><spanx style="vbare"> -6  27  61  39   5</spanx></c>
 <c>1</c>
<c><spanx style="vbare">-11  42  88   4   1</spanx></c>
 <c>2</c>
<c><spanx style="vbare"> -2  60  65   6  -4</spanx></c>
 <c>3</c>
<c><spanx style="vbare"> -1  -5  73  56   1</spanx></c>
 <c>4</c>
<c><spanx style="vbare"> -9  19  94  29  -9</spanx></c>
 <c>5</c>
<c><spanx style="vbare">  0  12  99   6   4</spanx></c>
 <c>6</c>
<c><spanx style="vbare">  8 -19 102  46 -13</spanx></c>
 <c>7</c>
<c><spanx style="vbare">  3   2  13   3   2</spanx></c>
 <c>8</c>
<c><spanx style="vbare">  9 -21  84  72 -18</spanx></c>
 <c>9</c>
<c><spanx style="vbare">-11  46 104 -22   8</spanx></c>
<c>10</c>
<c><spanx style="vbare"> 18  38  48  23   0</spanx></c>
<c>11</c>
<c><spanx style="vbare">-16  70  83 -21  11</spanx></c>
<c>12</c>
<c><spanx style="vbare">  5 -11 117  22  -8</spanx></c>
<c>13</c>
<c><spanx style="vbare"> -6  23 117 -12   3</spanx></c>
<c>14</c>
<c><spanx style="vbare">  3  -8  95  28   4</spanx></c>
<c>15</c>
<c><spanx style="vbare">-10  15  77  60 -15</spanx></c>
<c>16</c>
<c><spanx style="vbare"> -1   4 124   2  -4</spanx></c>
<c>17</c>
<c><spanx style="vbare">  3  38  84  24 -25</spanx></c>
<c>18</c>
<c><spanx style="vbare">  2  13  42  13  31</spanx></c>
<c>19</c>
<c><spanx style="vbare"> 21  -4  56  46  -1</spanx></c>
<c>20</c>
<c><spanx style="vbare"> -1  35  79 -13  19</spanx></c>
<c>21</c>
<c><spanx style="vbare"> -7  65  88  -9 -14</spanx></c>
<c>22</c>
<c><spanx style="vbare"> 20   4  81  49 -29</spanx></c>
<c>23</c>
<c><spanx style="vbare"> 20   0  75   3 -17</spanx></c>
<c>24</c>
<c><spanx style="vbare">  5  -9  44  92  -8</spanx></c>
<c>25</c>
<c><spanx style="vbare">  1  -3  22  69  31</spanx></c>
<c>26</c>
<c><spanx style="vbare"> -6  95  41 -12   5</spanx></c>
<c>27</c>
<c><spanx style="vbare"> 39  67  16  -4   1</spanx></c>
<c>28</c>
<c><spanx style="vbare">  0  -6 120  55 -36</spanx></c>
<c>29</c>
<c><spanx style="vbare">-13 44 122 4 -24</spanx></c>
<c>30</c>
<c><spanx style="vbare"> 81 5 11 3 7</spanx></c>
<c>31</c>
<c><spanx style="vbare"> 2 0 9 10 88</spanx></c>
</texttable>

</section>

<section anchor="silk_ltp_scaling" title="LTP Scaling Parameter">
<t>
An LTP scaling parameter appears after the LTP filter coefficients if and only
 if
<list style="symbols">
<t>This is a voiced frame (see <xref target="silk_frame_type"/>), and</t>
<t>Either
<list style="symbols">
<t>
This SILK frame corresponds to the first time interval of the
 current Opus frame for its type (LBRR or regular), or
</t>
<t>
This is an LBRR frame where the LBRR flags (see
 <xref target="silk_lbrr_flags"/>) indicate the previous LBRR frame in the same
 channel is not coded.
</t>
</list>
</t>
</list>
This allows the encoder to trade off the prediction gain between
 packets against the recovery time after packet loss.
Unlike absolute-coding for pitch lags, regular SILK frames that are not at the
 start of an Opus frame (i.e., that do not correspond to the first 20 ms
 time interval in Opus frames of 40 or 60 ms) do not include this
 field, even if the prior frame was not voiced, or (in the case of the side
 channel) not even coded.
After an uncoded frame in the side channel, the LTP buffer (see
 <xref target="silk_ltp_synthesis"/>) is cleared to zero, and is thus in a
 known state.
In contrast, LBRR frames do include this field when the prior frame was not
 coded, since the LTP buffer contains the output of the PLC, which is
 non-normative.
</t>
<t>
If present, the decoder reads a value using the 3-entry PDF in
 <xref target="silk_ltp_scaling_pdf"/>.
The three possible values represent Q14 scale factors of 15565, 12288, and
 8192, respectively (corresponding to approximately 0.95, 0.75, and 0.5).
Frames that do not code the scaling parameter use the default factor of 15565
 (approximately 0.95).
</t>

<texttable anchor="silk_ltp_scaling_pdf"
 title="PDF for LTP Scaling Parameter">
<ttcol align="left">PDF</ttcol>
<c>{128, 64, 64}/256</c>
</texttable>

</section>

</section>

<section anchor="silk_seed" toc="include"
 title="Linear Congruential Generator (LCG) Seed">
<t>
As described in <xref target="silk_excitation_reconstruction"/>, SILK uses a
 linear congruential generator (LCG) to inject pseudorandom noise into the
 quantized excitation.
To ensure synchronization of this process between the encoder and decoder, each
 SILK frame stores a 2-bit seed after the LTP parameters (if any).
The encoder may consider the choice of seed during quantization, and the
 flexibility of this choice lets it reduce distortion, helping to pay for the
 bit cost required to signal it.
The decoder reads the seed using the uniform 4-entry PDF in
 <xref target="silk_seed_pdf"/>, yielding a value between 0 and 3, inclusive.
</t>

<texttable anchor="silk_seed_pdf"
 title="PDF for LCG Seed">
<ttcol align="left">PDF</ttcol>
<c>{64, 64, 64, 64}/256</c>
</texttable>

</section>

<section anchor="silk_excitation" toc="include" title="Excitation">
<t>
SILK codes the excitation using a modified version of the Pyramid Vector
 Quantization (PVQ) codebook <xref target="PVQ"/>.
The PVQ codebook is designed for Laplace-distributed values and consists of all
 sums of K signed, unit pulses in a vector of dimension N, where two pulses at
 the same position are required to have the same sign.
Thus the codebook includes all integer codevectors y of dimension N that
 satisfy
<figure align="center">
<artwork align="center"><![CDATA[
N-1
__
\ abs(y[j]) = K .
/_
j=0
]]></artwork>
</figure>
Unlike regular PVQ, SILK uses a variable-length, rather than fixed-length,
 encoding.
This encoding is better suited to the more Gaussian-like distribution of the
 coefficient magnitudes and the non-uniform distribution of their signs (caused
 by the quantization offset described below).
SILK also handles large codebooks by coding the least significant bits (LSBs)
 of each coefficient directly.
This adds a small coding efficiency loss, but greatly reduces the computation
 time and ROM size required for decoding, as implemented in
 silk_decode_pulses() (decode_pulses.c).
</t>

<t>
SILK fixes the dimension of the codebook to N = 16.
The excitation is made up of a number of "shell blocks", each 16 samples in
 size.
<xref target="silk_shell_block_table"/> lists the number of shell blocks
 required for a SILK frame for each possible audio bandwidth and frame size.
10 ms MB frames nominally contain 120 samples (10 ms at
 12 kHz), which is not a multiple of 16.
This is handled by coding 8 shell blocks (128 samples) and discarding the final
 8 samples of the last block.
The decoder contains no special case that prevents an encoder from placing
 pulses in these samples, and they must be correctly parsed from the bitstream
 if present, but they are otherwise ignored.
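<figure align="left">
<preamble>
As an illustration of the block counts described above, the following C sketch
 derives the number of shell blocks, and the number of discarded samples, from
 the frame duration.
It is not part of the reference implementation: the function names are
 hypothetical, and the internal sampling rates of 8 and 16 kHz for NB and WB
 are assumed from the usual definitions of those bandwidths (only the 12 kHz
 MB rate is stated in this section).
</preamble>
<artwork><![CDATA[
```c
/* Illustrative sketch only; these helpers are hypothetical and do not
 * appear in the reference implementation. */
#define SHELL_BLOCK_SIZE 16

/* Number of 16-sample shell blocks needed to cover one SILK frame.
 * rate_khz is the internal sampling rate (assumed 8 for NB, 12 for MB,
 * 16 for WB) and frame_ms is the SILK frame size (10 or 20). */
int shell_blocks_per_frame(int rate_khz, int frame_ms)
{
    int samples = rate_khz * frame_ms;
    /* Round up so that a partial final block is still coded. */
    return (samples + SHELL_BLOCK_SIZE - 1) / SHELL_BLOCK_SIZE;
}

/* Samples at the end of the last block that are parsed but then ignored. */
int discarded_samples(int rate_khz, int frame_ms)
{
    return shell_blocks_per_frame(rate_khz, frame_ms) * SHELL_BLOCK_SIZE
         - rate_khz * frame_ms;
}
```
]]></artwork>
<postamble>
Under these assumptions, only the 10 ms MB case (120 samples) leaves a
 partial block; every other combination is a multiple of 16 samples.
</postamble>
</figure>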
</t>

<texttable anchor="silk_shell_block_table"
 title="Number of Shell Blocks Per SILK Frame">
<ttcol>Audio Bandwidth</ttcol>
<ttcol>Frame Size</ttcol>
<ttcol align="right">Number of Shell Blocks</ttcol>
<c>NB</c> <c>10 ms</c> <c>5</c>
<c>MB</c> <c>10 ms</c> <c>8</c>
<c>WB</c> <c>10 ms</c> <c>10</c>
<c>NB</c> <c>20 ms</c> <c>10</c>
<c>MB</c> <c>20 ms</c> <c>15</c>
<c>WB</c> <c>20 ms</c> <c>20</c>
</texttable>

<section anchor="silk_rate_level" title="Rate Level">
<t>
The first symbol in the excitation is a "rate level", which is an index from 0
 to 8, inclusive, coded using the PDF in <xref target="silk_rate_level_pdfs"/>
 corresponding to the signal type of the current frame (from
 <xref target="silk_frame_type"/>).
The rate level selects the PDF used to decode the number of pulses in
 the individual shell blocks.
It does not directly convey any information about the bitrate or the number of
 pulses itself, but merely changes the probability of the symbols in
 <xref target="silk_pulse_counts"/>.
Level 0 provides a more efficient encoding at low rates generally, and
 level 8 provides a more efficient encoding at high rates generally,
 though the most efficient level for a particular SILK frame may depend on the
 exact distribution of the coded symbols.
An encoder should, but is not required to, use the most efficient rate level.
</t>

<texttable anchor="silk_rate_level_pdfs"
 title="PDFs for the Rate Level">
<ttcol>Signal Type</ttcol>
<ttcol>PDF</ttcol>
<c>Inactive or Unvoiced</c>
<c>{15, 51, 12, 46, 45, 13, 33, 27, 14}/256</c>
<c>Voiced</c>
<c>{33, 30, 36, 17, 34, 49, 18, 21, 18}/256</c>
</texttable>

</section>

<section anchor="silk_pulse_counts" title="Pulses Per Shell Block">
<t>
The total number of pulses in each of the shell blocks follows the rate level.
The pulse counts for all of the shell blocks are coded consecutively, before
 the content of any of the blocks.
Each block may have anywhere from 0 to 16 pulses, inclusive, coded using the
 18-entry PDF in <xref target="silk_pulse_count_pdfs"/> corresponding to the
 rate level from <xref target="silk_rate_level"/>.
The special value 17 indicates that this block has one or more additional
 LSBs to decode for each coefficient.
If the decoder encounters this value, it decodes another value for the actual
 pulse count of the block, but uses the PDF corresponding to the special rate
 level 9 instead of the normal rate level.
This process repeats until the decoder reads a value less than 17, and it then
 sets the number of extra LSBs used to the number of 17's decoded for that
 block.
If it reads the value 17 ten times, then the next iteration uses the special
 rate level 10 instead of 9.
The probability of decoding a 17 when using the PDF for rate level 10 is
 zero, ensuring that the number of LSBs for a block will not exceed 10.
The cumulative distribution for rate level 10 is just a shifted version of
 that for 9 and thus does not require any additional storage.
</t>

<texttable anchor="silk_pulse_count_pdfs"
 title="PDFs for the Pulse Count">
<ttcol>Rate Level</ttcol>
<ttcol>PDF</ttcol>
<c>0</c>
<c>{131, 74, 25, 8, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
<c>1</c>
<c>{58, 93, 60, 23, 7, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
<c>2</c>
<c>{43, 51, 46, 33, 24, 16, 11, 8, 6, 3, 3, 3, 2, 1, 1, 2, 1, 2}/256</c>
<c>3</c>
<c>{17, 52, 71, 57, 31, 12, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
<c>4</c>
<c>{6, 21, 41, 53, 49, 35, 21, 11, 6, 3, 2, 2, 1, 1, 1, 1, 1, 1}/256</c>
<c>5</c>
<c>{7, 14, 22, 28, 29, 28, 25, 20, 17, 13, 11, 9, 7, 5, 4, 4, 3, 10}/256</c>
<c>6</c>
<c>{2, 5, 14, 29, 42, 46, 41, 31, 19, 11, 6, 3, 2, 1, 1, 1, 1, 1}/256</c>
<c>7</c>
<c>{1, 2, 4, 10, 19, 29, 35, 37, 34, 28, 20, 14, 8, 5, 4, 2, 2, 2}/256</c>
<c>8</c>
<c>{1, 2, 2, 5, 9, 14, 20, 24, 27, 28, 26, 23, 20, 15, 11, 8, 6, 15}/256</c>
<c>9</c>
<c>{1, 1, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2}/256</c>
<c>10</c>
<c>{2, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2, 0}/256</c>
</texttable>
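<figure align="left">
<preamble>
The escape mechanism described in this section can be sketched in C as follows.
This is a minimal illustration rather than the reference decoder: the callback
 type pulse_symbol_fn, the function read_pulse_count(), and the scripted symbol
 source are all hypothetical, and the callback merely stands in for a range
 decoder read using the PDF of the requested rate level from
 <xref target="silk_pulse_count_pdfs"/>.
</preamble>
<artwork><![CDATA[
```c
/* Hypothetical callback: returns one symbol (0..17) decoded with the
 * pulse-count PDF of the given rate level (0..10). */
typedef int (*pulse_symbol_fn)(void *ctx, int rate_level);

#define LSB_ESCAPE          17  /* special value signalling one more LSB */
#define RATE_LEVEL_LSB       9  /* special level used after an escape    */
#define RATE_LEVEL_LSB_LAST 10  /* special level used after ten escapes  */
#define MAX_LSBS            10

/* Reads the pulse count of one shell block. Returns the count (0..16)
 * and stores the number of extra LSBs per coefficient in *lsb_count. */
int read_pulse_count(pulse_symbol_fn next, void *ctx, int rate_level,
                     int *lsb_count)
{
    int lsbs = 0;
    int value = next(ctx, rate_level);
    while (value == LSB_ESCAPE) {
        lsbs++;
        /* After the tenth escape, level 10 is used; its PDF assigns zero
         * probability to 17, so the loop ends with at most 10 LSBs. */
        value = next(ctx, lsbs == MAX_LSBS ? RATE_LEVEL_LSB_LAST
                                           : RATE_LEVEL_LSB);
    }
    *lsb_count = lsbs;
    return value;
}

/* Scripted stand-in for the range decoder, for illustration only. */
struct scripted_symbols {
    const int *values;  /* symbols to hand back, in order             */
    int pos;            /* index of the next symbol                   */
    int last_level;     /* rate level requested by the latest read    */
};

int scripted_next(void *ctx, int rate_level)
{
    struct scripted_symbols *s = ctx;
    s->last_level = rate_level;
    return s->values[s->pos++];
}
```
]]></artwork>
<postamble>
With the scripted symbols 17, 17, 5 and a normal rate level of 3, the first
 read uses level 3, the next two use level 9, and the block ends up with a
 pulse count of 5 and 2 extra LSBs per coefficient.
</postamble>
</figure>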

</section>

<section anchor="silk_pulse_locations" title="Pulse Location Decoding">
<t>
The locations of the pulses in each shell block follow the pulse counts,
 as decoded by silk_shell_decoder() (shell_coder.c).
As with the pulse counts, these locations are coded for all the shell blocks
 before any of the remaining information for each block.
Unlike many other codecs, SILK places no restriction on the distribution of
 pulses within a shell block.
All of the pulses may be placed in a single location, or each one in a unique
 location, or anything in between.
</t>

<t>
The location of pulses is coded by recursively partitioning each block into
 halves, and coding how many pulses fall on the left side of the split.
All remaining pulses must fall on the right side of the split.
The process then recurses into the left half, and after that returns, the
 right half (preorder traversal).
The PDF to use is chosen by the size of the current partition (16, 8, 4, or 2)
 and the number of pulses in the partition (1 to 16, inclusive).
Tables <xref format="counter" target="silk_shell_code3_pdfs"/>
 through <xref format="counter" target="silk_shell_code0_pdfs"/> list the
 PDFs used for each partition size and pulse count.
This process skips partitions without any pulses, i.e., where the initial pulse
 count from <xref target="silk_pulse_counts"/> was zero, or where the split in
 the prior level indicated that all of the pulses fell on the other side.
These partitions have nothing to code, so they require no PDF.
</t>

<texttable anchor="silk_shell_code3_pdfs"
 title="PDFs for Pulse Count Split, 16 Sample Partitions">
<ttcol>Pulse Count</ttcol>
<ttcol>PDF</ttcol>
 <c>1</c> <c>{126, 130}/256</c>
 <c>2</c> <c>{56, 142, 58}/256</c>
 <c>3</c> <c>{25, 101, 104, 26}/256</c>
 <c>4</c> <c>{12, 60, 108, 64, 12}/256</c>
 <c>5</c> <c>{7, 35, 84, 87, 37, 6}/256</c>
 <c>6</c> <c>{4, 20, 59, 86, 63, 21, 3}/256</c>
 <c>7</c> <c>{3, 12, 38, 72, 75, 42, 12, 2}/256</c>
 <c>8</c> <c>{2, 8, 25, 54, 73, 59, 27, 7, 1}/256</c>
 <c>9</c> <c>{2, 5, 17, 39, 63, 65, 42, 18, 4, 1}/256</c>
<c>10</c> <c>{1, 4, 12, 28, 49, 63, 54, 30, 11, 3, 1}/256</c>
<c>11</c> <c>{1, 4, 8, 20, 37, 55, 57, 41, 22, 8, 2, 1}/256</c>
<c>12</c> <c>{1, 3, 7, 15, 28, 44, 53, 48, 33, 16, 6, 1, 1}/256</c>
<c>13</c> <c>{1, 2, 6, 12, 21, 35, 47, 48, 40, 25, 12, 5, 1, 1}/256</c>
<c>14</c> <c>{1, 1, 4, 10, 17, 27, 37, 47, 43, 33, 21, 9, 4, 1, 1}/256</c>
<c>15</c> <c>{1, 1, 1, 8, 14, 22, 33, 40, 43, 38, 28, 16, 8, 1, 1, 1}/256</c>
<c>16</c> <c>{1, 1, 1, 1, 13, 18, 27, 36, 41, 41, 34, 24, 14, 1, 1, 1, 1}/256</c>
</texttable>

<texttable anchor="silk_shell_code2_pdfs"
 title="PDFs for Pulse Count Split, 8 Sample Partitions">
<ttcol>Pulse Count</ttcol>
<ttcol>PDF</ttcol>
 <c>1</c> <c>{127, 129}/256</c>
 <c>2</c> <c>{53, 149, 54}/256</c>
 <c>3</c> <c>{22, 105, 106, 23}/256</c>
 <c>4</c> <c>{11, 61, 111, 63, 10}/256</c>
 <c>5</c> <c>{6, 35, 86, 88, 36, 5}/256</c>
 <c>6</c> <c>{4, 20, 59, 87, 62, 21, 3}/256</c>
 <c>7</c> <c>{3, 13, 40, 71, 73, 41, 13, 2}/256</c>
 <c>8</c> <c>{3, 9, 27, 53, 70, 56, 28, 9, 1}/256</c>
 <c>9</c> <c>{3, 8, 19, 37, 57, 61, 44, 20, 6, 1}/256</c>
<c>10</c> <c>{3, 7, 15, 28, 44, 54, 49, 33, 17, 5, 1}/256</c>
<c>11</c> <c>{1, 7, 13, 22, 34, 46, 48, 38, 28, 14, 4, 1}/256</c>
<c>12</c> <c>{1, 1, 11, 22, 27, 35, 42, 47, 33, 25, 10, 1, 1}/256</c>
<c>13</c> <c>{1, 1, 6, 14, 26, 37, 43, 43, 37, 26, 14, 6, 1, 1}/256</c>
<c>14</c> <c>{1, 1, 4, 10, 20, 31, 40, 42, 40, 31, 20, 10, 4, 1, 1}/256</c>
<c>15</c> <c>{1, 1, 3, 8, 16, 26, 35, 38, 38, 35, 26, 16, 8, 3, 1, 1}/256</c>
<c>16</c> <c>{1, 1, 2, 6, 12, 21, 30, 36, 38, 36, 30, 21, 12, 6, 2, 1, 1}/256</c>
</texttable>

<texttable anchor="silk_shell_code1_pdfs"
 title="PDFs for Pulse Count Split, 4 Sample Partitions">
<ttcol>Pulse Count</ttcol>
<ttcol>PDF</ttcol>
 <c>1</c> <c>{127, 129}/256</c>
 <c>2</c> <c>{49, 157, 50}/256</c>
 <c>3</c> <c>{20, 107, 109, 20}/256</c>
 <c>4</c> <c>{11, 60, 113, 62, 10}/256</c>
 <c>5</c> <c>{7, 36, 84, 87, 36, 6}/256</c>
 <c>6</c>
<c>{6, 24, 57, 82, 60, 23, 4}/256</c>
 <c>7</c> <c>{5, 18, 39, 64, 68, 42, 16, 4}/256</c>
 <c>8</c> <c>{6, 14, 29, 47, 61, 52, 30, 14, 3}/256</c>
 <c>9</c> <c>{1, 15, 23, 35, 51, 50, 40, 30, 10, 1}/256</c>
<c>10</c> <c>{1, 1, 21, 32, 42, 52, 46, 41, 18, 1, 1}/256</c>
<c>11</c> <c>{1, 6, 16, 27, 36, 42, 42, 36, 27, 16, 6, 1}/256</c>
<c>12</c> <c>{1, 5, 12, 21, 31, 38, 40, 38, 31, 21, 12, 5, 1}/256</c>
<c>13</c> <c>{1, 3, 9, 17, 26, 34, 38, 38, 34, 26, 17, 9, 3, 1}/256</c>
<c>14</c> <c>{1, 3, 7, 14, 22, 29, 34, 36, 34, 29, 22, 14, 7, 3, 1}/256</c>
<c>15</c> <c>{1, 2, 5, 11, 18, 25, 31, 35, 35, 31, 25, 18, 11, 5, 2, 1}/256</c>
<c>16</c> <c>{1, 1, 4, 9, 15, 21, 28, 32, 34, 32, 28, 21, 15, 9, 4, 1, 1}/256</c>
</texttable>

<texttable anchor="silk_shell_code0_pdfs"
 title="PDFs for Pulse Count Split, 2 Sample Partitions">
<ttcol>Pulse Count</ttcol>
<ttcol>PDF</ttcol>
 <c>1</c> <c>{128, 128}/256</c>
 <c>2</c> <c>{42, 172, 42}/256</c>
 <c>3</c> <c>{21, 107, 107, 21}/256</c>
 <c>4</c> <c>{12, 60, 112, 61, 11}/256</c>
 <c>5</c> <c>{8, 34, 86, 86, 35, 7}/256</c>
 <c>6</c> <c>{8, 23, 55, 90, 55, 20, 5}/256</c>
 <c>7</c> <c>{5, 15, 38, 72, 72, 36, 15, 3}/256</c>
 <c>8</c> <c>{6, 12, 27, 52, 77, 47, 20, 10, 5}/256</c>
 <c>9</c> <c>{6, 19, 28, 35, 40, 40, 35, 28, 19, 6}/256</c>
<c>10</c> <c>{4, 14, 22, 31, 37, 40, 37, 31, 22, 14, 4}/256</c>
<c>11</c> <c>{3, 10, 18, 26, 33, 38, 38, 33, 26, 18, 10, 3}/256</c>
<c>12</c> <c>{2, 8, 13,
21, 29, 36, 38, 36, 29, 21, 13, 8, 2}/256</c>
<c>13</c> <c>{1, 5, 10, 17, 25, 32, 38, 38, 32, 25, 17, 10, 5, 1}/256</c>
<c>14</c> <c>{1, 4, 7, 13, 21, 29, 35, 36, 35, 29, 21, 13, 7, 4, 1}/256</c>
<c>15</c> <c>{1, 2, 5, 10, 17, 25, 32, 36, 36, 32, 25, 17, 10, 5, 2, 1}/256</c>
<c>16</c> <c>{1, 2, 4, 7, 13, 21, 28, 34, 36, 34, 28, 21, 13, 7, 4, 2, 1}/256</c>
</texttable>

</section>

<section anchor="silk_shell_lsb" title="LSB Decoding">
<t>
After the decoder reads the pulse locations for all blocks, it reads the LSBs
 (if any) for each block in turn.
Inside each block, it reads all the LSBs for each coefficient in turn, even
 those where no pulses were allocated, before proceeding to the next one.
For 10 ms MB frames, it reads LSBs even for the extra 8 samples in
 the last block.
The LSBs are coded from most significant to least significant, and they all use
 the PDF in <xref target="silk_shell_lsb_pdf"/>.
</t>

<texttable anchor="silk_shell_lsb_pdf" title="PDF for Excitation LSBs">
<ttcol>PDF</ttcol>
<c>{136, 120}/256</c>
</texttable>

<t>
The number of LSBs read for each coefficient in a block is determined in
 <xref target="silk_pulse_counts"/>.
The magnitude of the coefficient is initially equal to the number of pulses
 placed at that location in <xref target="silk_pulse_locations"/>.
As each LSB is decoded, the magnitude is doubled, and then the value of the LSB
 is added to it, to obtain an updated magnitude.
</t>
</section>

<section anchor="silk_signs" title="Sign Decoding">
<t>
After decoding the pulse locations and the LSBs, the decoder knows the
 magnitude of each coefficient in the excitation.
It then decodes a sign for all coefficients with a non-zero magnitude, using
 one of the PDFs from <xref target="silk_sign_pdfs"/>.
If the value decoded is 0, then the coefficient magnitude is negated.
Otherwise, it remains positive.
</t>

<t>
The decoder chooses the PDF for the sign based on the signal type and
 quantization offset type (from <xref target="silk_frame_type"/>) and the
 number of pulses in the block (from <xref target="silk_pulse_counts"/>).
The number of pulses in the block does not take into account any LSBs.
Most PDFs are skewed towards negative signs because of the quantization offset,
 but the PDFs for zero pulses are highly skewed towards positive signs.
If a block contains many positive coefficients, it is sometimes beneficial to
 code it solely using LSBs (i.e., with zero pulses), since the encoder may be
 able to save enough bits on the signs to justify the less efficient
 coefficient magnitude encoding.
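<figure align="left">
<preamble>
The magnitude update from <xref target="silk_shell_lsb"/> and the sign rules
 above can be sketched in C as follows.
The three helper names are hypothetical, and the already-decoded LSB and sign
 symbols are passed in as plain integers instead of being read from the range
 decoder, so this is an illustration of the arithmetic only.
</preamble>
<artwork><![CDATA[
```c
/* Illustrative helpers; none of these names come from the reference
 * implementation. */

/* Appends lsb_count extra LSBs to a coefficient magnitude. lsb_bits[] holds
 * the decoded bits (0 or 1), most significant first, as they are coded. */
int apply_lsbs(int pulses, const int *lsb_bits, int lsb_count)
{
    int magnitude = pulses;
    for (int k = 0; k < lsb_count; k++) {
        /* Double the magnitude, then add the newly decoded LSB. */
        magnitude = 2 * magnitude + lsb_bits[k];
    }
    return magnitude;
}

/* Applies a decoded sign symbol: 0 negates, any other value keeps the
 * magnitude positive. Only called for non-zero magnitudes. */
int apply_sign(int magnitude, int sign_symbol)
{
    return sign_symbol == 0 ? -magnitude : magnitude;
}

/* Selects the "Pulse Count" row of the sign PDF table. The count is the
 * block's pulse count without LSBs; counts of 6 or more share one PDF. */
int sign_pdf_pulse_index(int block_pulse_count)
{
    return block_pulse_count < 6 ? block_pulse_count : 6;
}
```
]]></artwork>
<postamble>
For instance, a location holding 3 pulses followed by the LSBs 1 and 0 has
 magnitude ((3*2 + 1)*2 + 0) = 14, and a sign symbol of 0 turns it into -14.
</postamble>
</figure>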
</t>

<texttable anchor="silk_sign_pdfs"
 title="PDFs for Excitation Signs">
<ttcol>Signal Type</ttcol>
<ttcol>Quantization Offset Type</ttcol>
<ttcol>Pulse Count</ttcol>
<ttcol>PDF</ttcol>
<c>Inactive</c> <c>Low</c> <c>0</c> <c>{2, 254}/256</c>
<c>Inactive</c> <c>Low</c> <c>1</c> <c>{207, 49}/256</c>
<c>Inactive</c> <c>Low</c> <c>2</c> <c>{189, 67}/256</c>
<c>Inactive</c> <c>Low</c> <c>3</c> <c>{179, 77}/256</c>
<c>Inactive</c> <c>Low</c> <c>4</c> <c>{174, 82}/256</c>
<c>Inactive</c> <c>Low</c> <c>5</c> <c>{163, 93}/256</c>
<c>Inactive</c> <c>Low</c> <c>6 or more</c> <c>{157, 99}/256</c>
<c>Inactive</c> <c>High</c> <c>0</c> <c>{58, 198}/256</c>
<c>Inactive</c> <c>High</c> <c>1</c> <c>{245, 11}/256</c>
<c>Inactive</c> <c>High</c> <c>2</c> <c>{238, 18}/256</c>
<c>Inactive</c> <c>High</c> <c>3</c> <c>{232, 24}/256</c>
<c>Inactive</c> <c>High</c> <c>4</c> <c>{225, 31}/256</c>
<c>Inactive</c> <c>High</c> <c>5</c> <c>{220, 36}/256</c>
<c>Inactive</c> <c>High</c> <c>6 or more</c> <c>{211, 45}/256</c>
<c>Unvoiced</c> <c>Low</c> <c>0</c> <c>{1, 255}/256</c>
<c>Unvoiced</c> <c>Low</c> <c>1</c> <c>{210, 46}/256</c>
<c>Unvoiced</c> <c>Low</c> <c>2</c> <c>{190, 66}/256</c>
<c>Unvoiced</c> <c>Low</c> <c>3</c> <c>{178, 78}/256</c>
<c>Unvoiced</c> <c>Low</c> <c>4</c> <c>{169, 87}/256</c>
<c>Unvoiced</c> <c>Low</c> <c>5</c> <c>{162, 94}/256</c>
<c>Unvoiced</c> <c>Low</c> <c>6 or more</c> <c>{152, 104}/256</c>
<c>Unvoiced</c> <c>High</c> <c>0</c> <c>{48, 208}/256</c>
<c>Unvoiced</c> <c>High</c> <c>1</c> <c>{242, 14}/256</c>
<c>Unvoiced</c> <c>High</c> <c>2</c> <c>{235, 21}/256</c>
<c>Unvoiced</c> <c>High</c> <c>3</c> <c>{224, 32}/256</c>
<c>Unvoiced</c> <c>High</c> <c>4</c> <c>{214, 42}/256</c>
<c>Unvoiced</c> <c>High</c> <c>5</c> <c>{205, 51}/256</c>
<c>Unvoiced</c> <c>High</c> <c>6 or more</c> <c>{190, 66}/256</c>
<c>Voiced</c> <c>Low</c> <c>0</c> <c>{1, 255}/256</c>
<c>Voiced</c> <c>Low</c> <c>1</c> <c>{162, 94}/256</c>
<c>Voiced</c> <c>Low</c> <c>2</c> <c>{152, 104}/256</c>
<c>Voiced</c> <c>Low</c> <c>3</c> <c>{147, 109}/256</c>
<c>Voiced</c> <c>Low</c> <c>4</c> <c>{144, 112}/256</c>
<c>Voiced</c> <c>Low</c> <c>5</c> <c>{141, 115}/256</c>
<c>Voiced</c> <c>Low</c> <c>6 or more</c> <c>{138, 118}/256</c>
<c>Voiced</c> <c>High</c> <c>0</c> <c>{8, 248}/256</c>
<c>Voiced</c> <c>High</c> <c>1</c> <c>{203, 53}/256</c>
<c>Voiced</c> <c>High</c> <c>2</c> <c>{187, 69}/256</c>
<c>Voiced</c> <c>High</c> <c>3</c> <c>{176, 80}/256</c>
<c>Voiced</c> <c>High</c> <c>4</c> <c>{168, 88}/256</c>
<c>Voiced</c> <c>High</c> <c>5</c> <c>{161, 95}/256</c>
<c>Voiced</c> <c>High</c> <c>6 or more</c> <c>{154, 102}/256</c>
</texttable>

</section>

<section anchor="silk_excitation_reconstruction"
 title="Reconstructing the Excitation">

<t>
After the signs have been read, there is enough information to
reconstruct the
 complete excitation signal.
This requires adding a constant quantization offset to each non-zero sample,
 and then pseudorandomly inverting and offsetting every sample.
The constant quantization offset varies depending on the signal type and
 quantization offset type (see <xref target="silk_frame_type"/>).
</t>

<texttable anchor="silk_quantization_offsets"
 title="Excitation Quantization Offsets">
<ttcol align="left">Signal Type</ttcol>
<ttcol align="left">Quantization Offset Type</ttcol>
<ttcol align="right">Quantization Offset (Q23)</ttcol>
<c>Inactive</c> <c>Low</c> <c>25</c>
<c>Inactive</c> <c>High</c> <c>60</c>
<c>Unvoiced</c> <c>Low</c> <c>25</c>
<c>Unvoiced</c> <c>High</c> <c>60</c>
<c>Voiced</c> <c>Low</c> <c>8</c>
<c>Voiced</c> <c>High</c> <c>25</c>
</texttable>

<t>
Let e_raw[i] be the raw excitation value at position i, with a magnitude
 composed of the pulses at that location (see
 <xref target="silk_pulse_locations"/>) combined with any additional LSBs (see
 <xref target="silk_shell_lsb"/>), and with the corresponding sign decoded in
 <xref target="silk_signs"/>.
Additionally, let seed be the current pseudorandom seed, which is initialized
 to the value decoded from <xref target="silk_seed"/> for the first sample in
 the current SILK frame, and updated for each subsequent sample according to
 the procedure below.
Finally, let offset_Q23 be the quantization offset from
 <xref target="silk_quantization_offsets"/>.
Then the following procedure produces the final reconstructed excitation value,
 e_Q23[i]:
<figure align="center">
<artwork align="center"><![CDATA[
e_Q23[i] = (e_raw[i] << 8) - sign(e_raw[i])*20 + offset_Q23;
    seed = (196314165*seed + 907633515) & 0xFFFFFFFF;
e_Q23[i] = (seed & 0x80000000) ? -e_Q23[i] : e_Q23[i];
    seed = (seed + e_raw[i]) & 0xFFFFFFFF;
]]></artwork>
</figure>
When e_raw[i] is zero, sign() returns 0 by the definition in
 <xref target="sign"/>, so the adjustment of 20 is not applied.
The final e_Q23[i] value may require more than 16 bits per sample, but will not
 require more than 23, including the sign.
</t>

</section>

</section>

<section anchor="silk_frame_reconstruction" toc="include"
 title="SILK Frame Reconstruction">

<t>
The remainder of the reconstruction process for the frame does not need to be
 bit-exact, as small errors should only introduce proportionally small
 distortions.
Although the reference implementation only includes a fixed-point version of
 the remaining steps, this section describes them in terms of a floating-point
 version for simplicity.
This produces a signal with a nominal range of -1.0 to 1.0.
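<figure align="left">
<preamble>
By contrast, the excitation that feeds these steps is reconstructed with exact
 integer arithmetic (see <xref target="silk_excitation_reconstruction"/>).
The following C sketch shows one way to express that procedure over a whole
 frame.
It is an illustration under stated assumptions, not the reference code: the
 function names are hypothetical, the shift by 8 is written as a
 multiplication by 256 to stay well-defined for negative values in C, and the
 masking with 0xFFFFFFFF is obtained from the wraparound of uint32_t
 arithmetic.
</preamble>
<artwork><![CDATA[
```c
#include <stdint.h>

/* sign() as used by the procedure: -1, 0, or 1. */
int sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/* Hypothetical helper: reconstructs n excitation samples in Q23.
 * e_raw[]    signed raw excitation values (pulses, LSBs, and sign)
 * seed       the value (0..3) decoded for this SILK frame
 * offset_q23 the quantization offset for the frame's signal type and
 *            quantization offset type */
void reconstruct_excitation(const int *e_raw, int32_t *e_q23, int n,
                            uint32_t seed, int offset_q23)
{
    for (int i = 0; i < n; i++) {
        /* Scale, pull non-zero values 20 towards zero, add the offset. */
        int32_t e = e_raw[i] * 256 - sign_of(e_raw[i]) * 20 + offset_q23;
        /* LCG step; uint32_t arithmetic wraps modulo 2^32. */
        seed = 196314165u * seed + 907633515u;
        /* The top bit of the seed pseudorandomly inverts the sample. */
        if (seed & 0x80000000u) {
            e = -e;
        }
        /* Mix the raw value into the seed, again modulo 2^32. */
        seed += (uint32_t)e_raw[i];
        e_q23[i] = e;
    }
}
```
]]></artwork>
<postamble>
With a seed of 0, an offset of 25, and a first raw value of 2, the first
 output is 2*256 - 20 + 25 = 517, since the first LCG state (907633515) does
 not have its top bit set.
</postamble>
</figure>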
</t>

<t>
silk_decode_core() (decode_core.c) contains the code for the main
 reconstruction process.
It proceeds subframe-by-subframe, since quantization gains, LTP parameters, and
 (in 20 ms SILK frames) LPC coefficients can vary from one subframe to the
 next.
</t>

<t>
Let a_Q12[k] be the LPC coefficients for the current subframe.
If this is the first or second subframe of a 20 ms SILK frame and the LSF
 interpolation factor, w_Q2 (see <xref target="silk_nlsf_interpolation"/>), is
 less than 4, then these correspond to the final LPC coefficients produced by
 <xref target="silk_lpc_gain_limit"/> from the interpolated LSF coefficients,
 n1_Q15[k] (computed in <xref target="silk_nlsf_interpolation"/>).
Otherwise, they correspond to the final LPC coefficients produced from the
 uninterpolated LSF coefficients for the current frame, n2_Q15[k].
</t>

<t>
Also, let n be the number of samples in a subframe (40 for NB, 60 for MB, and
 80 for WB), s be the index of the current subframe in this SILK frame (0 or 1
 for 10 ms frames, or 0 to 3 for 20 ms frames), and j be the index of
 the first sample in the residual corresponding to the current subframe.
</t>

<section anchor="silk_ltp_synthesis" title="LTP Synthesis">
<t>
Voiced SILK frames (see <xref target="silk_frame_type"/>) pass the excitation
 through an LTP filter using the parameters decoded in
 <xref target="silk_ltp_params"/> to produce an LPC residual.
The LTP filter requires LPC residual values from before the current subframe as
 input.
However, since the LPC coefficients may have changed, it obtains this residual
 by "rewhitening" the corresponding output signal using the LPC coefficients
 from the current subframe.
Let out[i] for
 (j - pitch_lags[s] - d_LPC - 2) &lt;= i &lt; j
 be the fully reconstructed output signal from the last
 (pitch_lags[s] + d_LPC + 2) samples of previous subframes
 (see <xref target="silk_lpc_synthesis"/>), where pitch_lags[s] is the pitch
 lag for the current subframe from <xref target="silk_ltp_lags"/>.
During reconstruction of the first subframe for this channel after either
<list style="symbols">
<t>An uncoded regular SILK frame (if this is the side channel), or</t>
<t>A decoder reset (see <xref target="decoder-reset"/>),</t>
</list>
 out[] is rewhitened into an LPC residual,
 res[i], via
<figure align="center">
<artwork align="center"><![CDATA[
         4.0*LTP_scale_Q14
res[i] = ----------------- * clamp(-1.0,
            gain_Q16[s]

                                  d_LPC-1
                                    __              a_Q12[k]
                           out[i] - \  out[i-k-1] * --------, 1.0) .
                                    /_               4096.0
                                    k=0
]]></artwork>
</figure>
This requires storage to buffer up to 306 values of out[i] from previous
 subframes.
This corresponds to WB with a maximum pitch lag of
 18 ms * 16 kHz = 288 samples, plus 16 samples for d_LPC, plus 2
 samples for the width of the LTP filter.
</t>

<t>
Let e_Q23[i] for j &lt;= i &lt; (j + n) be the
 excitation for the current subframe, and b_Q7[k] for
 0 &lt;= k &lt; 5 be the coefficients of the LTP filter
 taken from the codebook entry in one of
 Tables <xref format="counter" target="silk_ltp_filter_coeffs0"/>
 through <xref format="counter" target="silk_ltp_filter_coeffs2"/>
 corresponding to the index decoded for the current subframe in
 <xref target="silk_ltp_filter"/>.
Then for i such that j &lt;= i &lt; (j + n),
 the LPC residual is
<figure align="center">
<artwork align="center"><![CDATA[
                     4
         e_Q23[i]    __                                  b_Q7[k]
res[i] = --------- + \  res[i - pitch_lags[s] + 2 - k] * ------- .
          2.0**23    /_                                   128.0
                     k=0
]]></artwork>
</figure>
</t>

<t>
For unvoiced frames, the LPC residual for
 j &lt;= i &lt; (j + n) is simply a normalized
 copy of the excitation signal, i.e.,
<figure align="center">
<artwork align="center"><![CDATA[
         e_Q23[i]
res[i] = ---------
          2.0**23
]]></artwork>
</figure>
</t>
</section>

<section anchor="silk_lpc_synthesis" title="LPC Synthesis">
<t>
LPC synthesis uses the short-term LPC filter to predict the next output
 coefficient.
For i such that (j - d_LPC) &lt;= i &lt; j, let
 lpc[i] be the result of LPC synthesis from the last d_LPC samples of the
 previous subframe, or zeros in the first subframe for this channel after
 either
<list style="symbols">
<t>An uncoded regular SILK frame (if this is the side channel), or</t>
<t>A decoder reset (see <xref target="decoder-reset"/>).</t>
</list>
Then for i such that j &lt;= i &lt; (j + n), the
 result of LPC synthesis for the current subframe is
<figure align="center">
<artwork align="center"><![CDATA[
                                d_LPC-1
         gain_Q16[s]              __              a_Q12[k]
lpc[i] = ----------- * res[i] +   \  lpc[i-k-1] * -------- .
           65536.0                /_               4096.0
                                  k=0
]]></artwork>
</figure>
The decoder saves the final d_LPC values, i.e., lpc[i] such that
 (j + n - d_LPC) &lt;= i &lt; (j + n),
 to feed into the LPC synthesis of the next subframe.
This requires storage for up to 16 values of lpc[i] (for WB frames).
</t>

<t>
Then, the signal is clamped into the final nominal range:
<figure align="center">
<artwork align="center"><![CDATA[
out[i] = clamp(-1.0, lpc[i], 1.0) .
]]></artwork>
</figure>
This clamping occurs entirely after the LPC synthesis filter has run.
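As a non-normative illustration, the synthesis and clamping steps for one
 subframe can be sketched in floating-point C as follows.
The function name lpc_synthesis_subframe(), the helper clampf(), and the
 lpc_hist buffer layout are assumptions made for this sketch only; the sketch
 also assumes n is at most 80 and d_LPC is at most 16, the WB maxima given
 above.
Note that the filter feedback uses the unclamped lpc[] values, while out[]
 receives the clamped ones.
<figure align="center">
<artwork><![CDATA[
```c
/* Hypothetical helper: clamp(lo, x, hi) as used in the text. */
static float clampf(float lo, float x, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Illustrative sketch only (not the reference code).
 * res[0..n-1]  : LPC residual for the current subframe
 * out[0..n-1]  : clamped output for the current subframe
 * a_Q12        : d_LPC LPC coefficients for this subframe
 * gain_Q16     : quantization gain for this subframe
 * lpc_hist     : the d_LPC unclamped values preceding the subframe,
 *                oldest first; updated in place on return.
 * Assumes n <= 80 and d_LPC <= 16. */
void lpc_synthesis_subframe(const float *res, float *out, int n,
                            const short *a_Q12, int d_LPC,
                            int gain_Q16, float *lpc_hist)
{
    float buf[16 + 80];
    int i, k;

    for (k = 0; k < d_LPC; k++)
        buf[k] = lpc_hist[k];

    for (i = 0; i < n; i++) {
        float acc = (gain_Q16 / 65536.0f) * res[i];
        for (k = 0; k < d_LPC; k++)
            acc += buf[d_LPC + i - k - 1] * (a_Q12[k] / 4096.0f);
        buf[d_LPC + i] = acc;               /* unclamped feedback */
        out[i] = clampf(-1.0f, acc, 1.0f);  /* clamped output */
    }

    /* Save the final d_LPC unclamped values for the next subframe. */
    for (k = 0; k < d_LPC; k++)
        lpc_hist[k] = buf[n + k];
}
```
]]></artwork>
</figure>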
The decoder saves the unclamped values, lpc[i], to feed into the LPC filter for
 the next subframe, but saves the clamped values, out[i], for rewhitening in
 voiced frames.
</t>
</section>

</section>

</section>

<section anchor="silk_stereo_unmixing" title="Stereo Unmixing">
<t>
For stereo streams, after decoding a frame from each channel, the decoder must
 convert the mid-side (MS) representation into a left-right (LR)
 representation.
The function silk_stereo_MS_to_LR (stereo_MS_to_LR.c) implements this process.
In it, the decoder predicts the side channel using a) a simple low-passed
 version of the mid channel, and b) the unfiltered mid channel, using the
 prediction weights decoded in <xref target="silk_stereo_pred"/>.
This simple low-pass filter imposes a one-sample delay, and the unfiltered
mid channel is also delayed by one sample.
In order to allow seamless switching between stereo and mono, mono streams must
 also impose the same one-sample delay.
The encoder requires an additional one-sample delay for both mono and stereo
 streams, though an encoder may omit the delay for mono if it knows it will
 never switch to stereo.
</t>

<t>
The unmixing process operates in two phases.
The first phase lasts for 8 ms, during which it interpolates the
 prediction weights from the previous frame, prev_w0_Q13 and prev_w1_Q13, to
 the values for the current frame, w0_Q13 and w1_Q13.
The second phase simply uses these weights for the remainder of the frame.
</t>

<t>
Let mid[i] and side[i] be the contents of out[i] (from
 <xref target="silk_lpc_synthesis"/>) for the current mid and side channels,
 respectively, and let left[i] and right[i] be the corresponding stereo output
 channels.
If the side channel is not coded (see <xref target="silk_mid_only_flag"/>),
 then side[i] is set to zero.
Also let j be defined as in <xref target="silk_frame_reconstruction"/>, n1 be
 the number of samples in phase 1 (64 for NB, 96 for MB, and 128 for WB),
 and n2 be the total number of samples in the frame.
Then for i such that j &lt;= i &lt; (j + n2),
 the left and right channel output is
<figure align="center">
<artwork align="center"><![CDATA[
          prev_w0_Q13                  (w0_Q13 - prev_w0_Q13)
    w0 =  ----------- + min(i - j, n1)*---------------------- ,
            8192.0                           8192.0*n1

          prev_w1_Q13                  (w1_Q13 - prev_w1_Q13)
    w1 =  ----------- + min(i - j, n1)*---------------------- ,
            8192.0                           8192.0*n1

         mid[i-2] + 2*mid[i-1] + mid[i]
    p0 = ------------------------------ ,
                      4.0

 left[i] = clamp(-1.0, (1 + w1)*mid[i-1] + side[i-1] + w0*p0, 1.0) ,

right[i] = clamp(-1.0, (1 - w1)*mid[i-1] - side[i-1] - w0*p0, 1.0) .
]]></artwork>
</figure>
These formulas require two samples prior to index j, the start of the
 frame, for the mid channel, and one prior sample for the side channel.
For the first frame after a decoder reset, zeros are used instead.
</t>

</section>

<section title="Resampling">
<t>
After stereo unmixing (if any), the decoder applies resampling to convert the
 decoded SILK output to the sample rate desired by the application.
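As a non-normative illustration of the unmixing step that precedes
 resampling, the formulas of <xref target="silk_stereo_unmixing"/> can be
 sketched in floating-point C as follows.
The function name stereo_unmix() and the buffer layout are assumptions made
 for this sketch: mid_buf holds the two history samples followed by the n2
 samples of the frame, and side_buf holds one history sample followed by the
 n2 samples of the frame.
<figure align="center">
<artwork><![CDATA[
```c
/* Hypothetical helper: clamp(lo, x, hi) as used in the text. */
static float clamp_unit(float lo, float x, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Illustrative sketch only (not the reference code).
 * mid_buf[0..n2+1] : mid[j-2], mid[j-1], mid[j], ..., mid[j+n2-1]
 * side_buf[0..n2]  : side[j-1], side[j], ..., side[j+n2-1]
 * left, right      : n2 output samples each
 * n1               : number of interpolation samples (phase 1) */
void stereo_unmix(const float *mid_buf, const float *side_buf,
                  float *left, float *right, int n1, int n2,
                  int prev_w0_Q13, int prev_w1_Q13,
                  int w0_Q13, int w1_Q13)
{
    int t;
    for (t = 0; t < n2; t++) {      /* t = i - j */
        int m = t < n1 ? t : n1;    /* min(i - j, n1) */
        float w0 = prev_w0_Q13 / 8192.0f
                 + m * (w0_Q13 - prev_w0_Q13) / (8192.0f * n1);
        float w1 = prev_w1_Q13 / 8192.0f
                 + m * (w1_Q13 - prev_w1_Q13) / (8192.0f * n1);
        /* Low-passed mid: (mid[i-2] + 2*mid[i-1] + mid[i]) / 4 */
        float p0 = (mid_buf[t] + 2.0f * mid_buf[t + 1]
                    + mid_buf[t + 2]) / 4.0f;
        float m1 = mid_buf[t + 1];  /* mid[i-1]  */
        float s1 = side_buf[t];     /* side[i-1] */
        left[t]  = clamp_unit(-1.0f,
                              (1.0f + w1) * m1 + s1 + w0 * p0, 1.0f);
        right[t] = clamp_unit(-1.0f,
                              (1.0f - w1) * m1 - s1 - w0 * p0, 1.0f);
    }
}
```
]]></artwork>
</figure>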
This is necessary when decoding a Hybrid frame at SWB or FB sample rates, or
 whenever the decoder wants the output at a different sample rate than the
 internal SILK sampling rate (e.g., to allow a constant sample rate when the
 audio bandwidth changes, or to allow mixing with audio from other
 applications).
The resampler itself is non-normative, and a decoder can use any method it
 wants to perform the resampling.
</t>

<t>
However, a minimum amount of delay is imposed to allow the resampler to
 operate, and this delay is normative, so that the corresponding delay can be
 applied to the MDCT layer in the encoder.
A decoder is always free to use a resampler which requires more delay than
 allowed for here (e.g., to improve quality), but it must then delay the output
 of the MDCT layer by this extra amount.
Keeping as much delay as possible on the encoder side allows an encoder which
 knows it will never use any of the SILK or Hybrid modes to skip this delay.
By contrast, if it were all applied by the decoder, then a decoder which
 processes audio in fixed-size blocks would be forced to delay the output of
 CELT frames just in case of a later switch to a SILK or Hybrid mode.
</t>

<t>
<xref target="silk_resampler_delay_alloc"/> gives the maximum resampler delay,
 in milliseconds, for each SILK audio bandwidth.
Because the actual output rate may not be 48 kHz, it may not be possible
 to achieve exactly these delays while using a whole number of input or output
 samples.
The reference implementation is able to resample to any of the supported
 output sampling rates (8, 12, 16, 24, or 48 kHz) within or near this
 delay constraint.
Some resampling filters (including those used by the reference implementation)
 may add a delay that is not an exact integer, or is not linear-phase, and so
 cannot be represented by a single delay at all frequencies.
However, such deviations are unlikely to be perceptible, and the comparison
 tool described in <xref target="conformance"/> is designed to be relatively
 insensitive to them.
The delays listed here are the ones that should be targeted by the encoder.
</t>

<texttable anchor="silk_resampler_delay_alloc"
 title="SILK Resampler Delay Allocations">
<ttcol>Audio Bandwidth</ttcol>
<ttcol>Delay in milliseconds</ttcol>
<c>NB</c> <c>0.538</c>
<c>MB</c> <c>0.692</c>
<c>WB</c> <c>0.706</c>
</texttable>

<t>
NB is given a smaller decoder delay allocation than MB and WB to allow a
 higher-order filter when resampling to 8 kHz in both the encoder and
 decoder.
This implies that the audio content of two SILK frames operating at different
 bandwidths is not perfectly aligned in time.
This is not an issue for any transitions described in
 <xref target="switching"/>, because they all involve a SILK decoder reset.
When the decoder is reset, any samples remaining in the resampling buffer
 are discarded, and the resampler is re-initialized with silence.
</t>

</section>

</section>


<section title="CELT Decoder">

<t>
The CELT layer of Opus is based on the Modified Discrete Cosine Transform
<xref target='MDCT'/> with partially overlapping windows of 5 to 22.5 ms.
The main principle behind CELT is that the MDCT spectrum is divided into
bands that (roughly) follow the Bark scale, i.e., the scale of the ear's
critical bands <xref target="Zwicker61"/>. The normal CELT layer uses 21 of those bands, though Opus
 Custom (see <xref target="opus-custom"/>) may use a different number of bands.
In Hybrid mode, the first 17 bands (up to 8 kHz) are not coded.
A band can contain as few as one MDCT bin per channel, and as many as 176
bins per channel, as detailed in <xref target="celt_band_sizes"/>.
In each band, the gain (energy) is coded separately from
the shape of the spectrum. Coding the gain explicitly makes it easy to
preserve the spectral envelope of the signal. The remaining unit-norm shape
vector is encoded using a Pyramid Vector Quantizer (PVQ) <xref target='PVQ-decoder'/>.
</t>

<texttable anchor="celt_band_sizes"
 title="MDCT Bins Per Channel Per Band for Each Frame Size">
<ttcol>Frame Size:</ttcol>
<ttcol align="right">2.5 ms</ttcol>
<ttcol align="right">5 ms</ttcol>
<ttcol align="right">10 ms</ttcol>
<ttcol align="right">20 ms</ttcol>
<ttcol align="right">Start Frequency</ttcol>
<ttcol align="right">Stop Frequency</ttcol>
<c>Band</c> <c>Bins:</c> <c/> <c/> <c/> <c/> <c/>
 <c>0</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>     <c>0 Hz</c>   <c>200 Hz</c>
 <c>1</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>200 Hz</c>   <c>400 Hz</c>
 <c>2</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>400 Hz</c>   <c>600 Hz</c>
 <c>3</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>600 Hz</c>   <c>800 Hz</c>
 <c>4</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>800 Hz</c>  <c>1000 Hz</c>
 <c>5</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>  <c>1000 Hz</c>  <c>1200 Hz</c>
 <c>6</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>  <c>1200 Hz</c>  <c>1400 Hz</c>
 <c>7</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>  <c>1400 Hz</c>  <c>1600 Hz</c>
 <c>8</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>1600 Hz</c>  <c>2000 Hz</c>
 <c>9</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>2000 Hz</c>  <c>2400 Hz</c>
<c>10</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>2400 Hz</c>  <c>2800 Hz</c>
<c>11</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>2800 Hz</c>  <c>3200 Hz</c>
<c>12</c>  <c>4</c>  <c>8</c> <c>16</c>  <c>32</c>  <c>3200 Hz</c>  <c>4000 Hz</c>
<c>13</c>  <c>4</c>  <c>8</c> <c>16</c>  <c>32</c>  <c>4000 Hz</c>  <c>4800 Hz</c>
<c>14</c>  <c>4</c>  <c>8</c> <c>16</c>  <c>32</c>  <c>4800 Hz</c>  <c>5600 Hz</c>
<c>15</c>  <c>6</c> <c>12</c> <c>24</c>  <c>48</c>  <c>5600 Hz</c>  <c>6800 Hz</c>
<c>16</c>  <c>6</c> <c>12</c> <c>24</c>  <c>48</c>  <c>6800 Hz</c>  <c>8000 Hz</c>
<c>17</c>  <c>8</c> <c>16</c> <c>32</c>  <c>64</c>  <c>8000 Hz</c>  <c>9600 Hz</c>
<c>18</c> <c>12</c> <c>24</c> <c>48</c>  <c>96</c>  <c>9600 Hz</c> <c>12000 Hz</c>
<c>19</c> <c>18</c> <c>36</c> <c>72</c> <c>144</c> <c>12000 Hz</c> <c>15600 Hz</c>
<c>20</c> <c>22</c> <c>44</c> <c>88</c> <c>176</c> <c>15600 Hz</c> <c>20000 Hz</c>
</texttable>

<t>
Transients are notoriously difficult for transform codecs to code.
CELT uses two different strategies for them:
<list style="numbers">
<t>Using multiple smaller MDCTs instead of a single large MDCT, and</t>
<t>Dynamic time-frequency resolution changes (see <xref target='tf-change'/>).</t>
</list>
To improve quality on highly tonal and periodic signals, CELT includes
a prefilter/postfilter combination. The prefilter on the encoder side
attenuates the signal's harmonics. The postfilter on the decoder side
restores the original gain of the harmonics, while shaping the coding noise
to roughly follow the harmonics. Such noise shaping reduces the perception
of the noise.
</t>

<t>
When coding a stereo signal, three coding methods are available:
<list style="symbols">
<t>mid-side stereo: encodes the mean and the difference of the left and right channels,</t>
<t>intensity stereo: only encodes the mean of the left and right channels (discards the difference),</t>
<t>dual stereo: encodes the left and right channels separately.</t>
</list>
</t>

<t>
An overview of the decoder is given in <xref target="celt-decoder-overview"/>.
</t>

<figure anchor="celt-decoder-overview" title="Structure of the CELT decoder">
<artwork align="center"><![CDATA[
               +---------+
               | Coarse  |
            +->| decoder |----+
            |  +---------+    |
            |                 |
            |  +---------+    v
            |  |  Fine   |  +---+
            +->| decoder |->| + |
            |  +---------+  +---+
            |       ^         |
+---------+ |       |         |
|  Range  | | +----------+    v
| Decoder |-+ |   Bit    | +------+
+---------+ | |Allocation| | 2**x |
            | +----------+ +------+
            |       |         |
            |       v         v               +--------+
            |  +---------+  +---+  +-------+  | pitch  |
            +->|   PVQ   |->| * |->| IMDCT |->| post-  |--->
            |  | decoder |  +---+  +-------+  | filter |
            |  +---------+                    +--------+
            |                                      ^
            +--------------------------------------+
]]></artwork>
</figure>

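<t>
As a non-normative illustration of the band layout in
 <xref target="celt_band_sizes"/>, the following C sketch derives the bin
 count and start frequency of each band from the 2.5 ms column alone.
It relies on two observations from the table: each larger frame size doubles
 the bin count, and each bin of a 2.5 ms frame spans 200 Hz.
The names celt_bins_2_5ms, celt_band_bins(), and celt_band_start_hz() are
 hypothetical and are not taken from the reference implementation.
</t>
<figure align="center">
<artwork><![CDATA[
```c
/* Illustrative sketch only (not the reference code).
 * Bins per band for 2.5 ms frames, copied from the band table. */
static const int celt_bins_2_5ms[21] = {
    1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 4, 6, 6, 8, 12, 18, 22
};

/* Bins per channel in a band; lm selects the frame size:
 * 0 = 2.5 ms, 1 = 5 ms, 2 = 10 ms, 3 = 20 ms. */
int celt_band_bins(int band, int lm)
{
    return celt_bins_2_5ms[band] << lm;
}

/* Start frequency of a band in Hz. Passing band = 21 yields the
 * stop frequency of the last band. */
int celt_band_start_hz(int band)
{
    int bins = 0;
    int b;
    for (b = 0; b < band; b++)
        bins += celt_bins_2_5ms[b];
    return bins * 200;  /* one 2.5 ms bin spans 200 Hz */
}
```
]]></artwork>
</figure>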
<t>
The decoder is based on the following symbols and sets of symbols:
</t>

<texttable anchor="celt_symbols"
 title="Order of the Symbols in the CELT Section of the Bitstream">
<ttcol align="center">Symbol(s)</ttcol>
<ttcol align="center">PDF</ttcol>
<ttcol align="center">Condition</ttcol>
<c>silence</c>      <c>{32767, 1}/32768</c> <c></c>
<c>post-filter</c>  <c>{1, 1}/2</c> <c></c>
<c>octave</c>       <c>uniform (6)</c><c>post-filter</c>
<c>period</c>       <c>raw bits (4+octave)</c><c>post-filter</c>
<c>gain</c>         <c>raw bits (3)</c><c>post-filter</c>
<c>tapset</c>       <c>{2, 1, 1}/4</c><c>post-filter</c>
<c>transient</c>    <c>{7, 1}/8</c><c></c>
<c>intra</c>        <c>{7, 1}/8</c><c></c>
<c>coarse energy</c><c><xref target="energy-decoding"/></c><c></c>
<c>tf_change</c>    <c><xref target="transient-decoding"/></c><c></c>
<c>tf_select</c>    <c>{1, 1}/2</c><c><xref target="transient-decoding"/></c>
<c>spread</c>       <c>{7, 2, 21, 2}/32</c><c></c>
<c>dyn. alloc.</c>  <c><xref target="allocation"/></c><c></c>
<c>alloc.
trim</c>  <c>{2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2}/128</c><c></c>
<c>skip</c>         <c>{1, 1}/2</c><c><xref target="allocation"/></c>
<c>intensity</c>    <c>uniform</c><c><xref target="allocation"/></c>
<c>dual</c>         <c>{1, 1}/2</c><c></c>
<c>fine energy</c>  <c><xref target="energy-decoding"/></c><c></c>
<c>residual</c>     <c><xref target="PVQ-decoder"/></c><c></c>
<c>anti-collapse</c><c>{1, 1}/2</c><c><xref target="anti-collapse"/></c>
<c>finalize</c>     <c><xref target="energy-decoding"/></c><c></c>
</texttable>

<t>
The decoder extracts information from the range-coded bitstream in the order
described in <xref target='celt_symbols'/>. In some circumstances, it is
possible for a decoded value to be out of range due to a very small amount of redundancy
in the encoding of large integers by the range coder.
In that case, the decoder should assume there has been an error in the coding,
decoding, or transmission and SHOULD take measures to conceal the error and/or report
to the application that a problem has occurred. Such out-of-range errors cannot occur
in the SILK layer.
</t>

<section anchor="transient-decoding" title="Transient Decoding">
<t>
The "transient" flag indicates whether the frame uses a single long MDCT or several short MDCTs.
When it is set, the MDCT coefficients represent multiple
short MDCTs in the frame. When it is not set, the coefficients represent a single
long MDCT for the frame. The flag is encoded in the bitstream with a probability of 1/8.
In addition to the global transient flag, a per-band
binary flag changes the time-frequency (tf) resolution independently in each band. The
change in tf resolution is defined in tf_select_table[][] in celt.c and depends
on the frame size, whether the transient flag is set, and the value of tf_select.
The tf_select flag uses a 1/2 probability, but is only decoded
if, given the values of all per-band tf_change flags, it can have an impact
on the result.
</t>
</section>

<section anchor="energy-decoding" title="Energy Envelope Decoding">

<t>
It is important to quantize the energy with sufficient resolution because
any energy quantization error cannot be compensated for at a later
stage. Regardless of the resolution used for encoding the spectral shape of a band,
it is perceptually important to preserve the energy in each band. CELT uses a
three-step coarse-fine-fine strategy for encoding the energy in the base-2 log
domain, as implemented in quant_bands.c.</t>

<section anchor="coarse-energy-decoding" title="Coarse energy decoding">
<t>
Coarse quantization of the energy uses a fixed resolution of 6 dB
(integer part of base-2 log). To minimize the bitrate, prediction is applied
both in time (using the previous frame) and in frequency (using the previous
bands).
The part of the prediction that is based on the
previous frame can be disabled, creating an "intra" frame where the energy
is coded without reference to prior frames. The decoder first reads the intra flag
to determine what prediction is used.
The 2-D z-transform <xref target='z-transform'/> of
the prediction filter is:
<figure align="center">
<artwork align="center"><![CDATA[
                            -1          -1
              (1 - alpha*z_l  )*(1 - z_b  )
A(z_l, z_b) = -----------------------------
                                -1
                    1 - beta*z_b
]]></artwork>
</figure>
where b is the band index and l is the frame index. When intra energy is not
used, the prediction coefficients depend on the frame size in use; when intra
energy is used, they are alpha=0 and beta=4915/32768.
The time-domain prediction is based on the final fine quantization of the previous
frame, while the frequency domain (within the current frame) prediction is based
on coarse quantization only (because the fine quantization has not been computed
yet). The prediction is clamped internally so that fixed point implementations with
limited dynamic range always remain in the same state as floating point implementations.
The ideal probability distribution of the prediction error is approximated
using a Laplace distribution
with separate parameters for each frame size in intra- and inter-frame modes. These
parameters are held in the e_prob_model table in quant_bands.c.
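As a non-normative illustration, inverting the prediction filter A(z_l, z_b)
 gives a simple recursion over bands, sketched below in floating-point C.
The function name coarse_energy_reconstruct() and its argument layout are
 assumptions made for this sketch, which is derived from the transfer function
 above rather than copied from the reference implementation; it works in units
 of one coarse step (6 dB) and omits the internal clamping mentioned above.
<figure align="center">
<artwork><![CDATA[
```c
/* Illustrative sketch only (not the reference code).
 * q[0..nb_bands-1]      : decoded coarse prediction errors
 * energy[0..nb_bands-1] : on entry, the energies of the previous
 *                         frame; on return, those of this frame
 * alpha, beta           : prediction coefficients (alpha = 0 for
 *                         an intra frame) */
void coarse_energy_reconstruct(const int *q, float *energy,
                               int nb_bands, float alpha, float beta)
{
    float prev = 0.0f;  /* prediction carried over from lower bands */
    int b;
    for (b = 0; b < nb_bands; b++) {
        /* Time prediction + frequency prediction + decoded error. */
        energy[b] = alpha * energy[b] + prev + (float)q[b];
        /* Inverse of (1 - z_b^-1)/(1 - beta*z_b^-1): each decoded
         * error contributes (1 - beta) of itself to higher bands. */
        prev += (1.0f - beta) * (float)q[b];
    }
}
```
]]></artwork>
</figure>
With alpha set to zero the previous-frame energies are ignored entirely,
 which corresponds to the intra case described above.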
Coarse energy decoding is performed by unquant_coarse_energy() and
unquant_coarse_energy_impl() (quant_bands.c). The decoding of the Laplace-distributed values is
implemented in ec_laplace_decode() (laplace.c).
</t>

</section>

<section anchor="fine-energy-decoding" title="Fine energy quantization">
<t>
The number of bits assigned to fine energy quantization in each band is determined
by the bit allocation computation described in <xref target="allocation"></xref>.
Let B_i be the number of fine energy bits
for band i; the refinement is an integer f in the range [0,2**B_i-1]. The
correction applied to the coarse energy for a given f is (f+1/2)/2**B_i - 1/2. Fine
energy decoding is implemented in unquant_fine_energy() (quant_bands.c).
</t>
<t>
When some bits are left "unused" after all other flags have been decoded, these bits
are assigned to a "final" step of fine allocation. In effect, these bits are used
to add one extra fine energy bit per band per channel. The allocation process
determines two "priorities" for the final fine bits.
Any remaining bits are first assigned only to bands of priority 0, starting
from band 0 and going up. If all bands of priority 0 have received one bit per
channel, then bands of priority 1 are assigned an extra bit per channel,
starting from band 0. If any bits are left after this, they are left unused.
This is implemented in unquant_energy_finalise() (quant_bands.c).
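As a non-normative illustration, the fine energy correction and the
 priority-ordered distribution of the final bits can be sketched in C as
 follows.
The function names fine_energy_correction() and assign_final_bits() and the
 extra[] output array are hypothetical; the sketch covers only the ordering
 rule stated above and omits any other per-band limits that the reference
 implementation may apply.
<figure align="center">
<artwork><![CDATA[
```c
/* Illustrative sketch only (not the reference code).
 * Correction, in coarse steps, for refinement f coded with B bits:
 * (f + 1/2) / 2**B - 1/2. */
float fine_energy_correction(int f, int B)
{
    return ((float)f + 0.5f) / (float)(1 << B) - 0.5f;
}

/* Distribute leftover bits: priority 0 bands first, from band 0
 * upward, then priority 1 bands, one bit per channel per band.
 * extra[b] is set to 1 for each band that receives a bit.
 * Returns the number of bits left unused. */
int assign_final_bits(const int *priority, int *extra, int nb_bands,
                      int channels, int bits_left)
{
    int prio, b;
    for (b = 0; b < nb_bands; b++)
        extra[b] = 0;
    for (prio = 0; prio < 2; prio++) {
        for (b = 0; b < nb_bands && bits_left >= channels; b++) {
            if (priority[b] == prio) {
                extra[b] = 1;
                bits_left -= channels;
            }
        }
    }
    return bits_left;
}
```
]]></artwork>
</figure>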
</t>

</section> <!-- fine energy -->

</section> <!-- Energy decode -->

<section anchor="allocation" title="Bit Allocation">

<t>Because the bit allocation drives the decoding of the range-coder
stream, it MUST be recovered exactly so that identical coding decisions are
made in the encoder and decoder. Any deviation from the reference's resulting
bit allocation will result in corrupted output, though implementers are
free to implement the procedure in any way which produces identical results.</t>

<t>The per-band gain-shape structure of the CELT layer ensures that using
 the same number of bits for the spectral shape of a band in every frame will
 result in a roughly constant signal-to-noise ratio in that band.
This results in coding noise that has the same spectral envelope as the signal.
The masking curve produced by a standard psychoacoustic model also closely
 follows the spectral envelope of the signal.
This structure means that the ideal allocation is more consistent from frame to
 frame than it is for other codecs without an equivalent structure, and that a
 fixed allocation provides fairly consistent perceptual
 performance <xref target='Valin2010'/>.</t>

<t>Many codecs transmit significant amounts of side information to control the
 bit allocation within a frame.
Often this control is only indirect, and must be exercised carefully to
 achieve the desired rate constraints.
The CELT layer, however, can adapt over a very wide range of rates, and thus
 has a large number of codebook sizes to choose from for each band.
Explicitly signaling the size of each of these codebooks would impose
 considerable overhead, even though the allocation is relatively static from
 frame to frame.
This is because all of the information required to compute these codebook sizes
 must be derived from a single frame by itself, in order to retain robustness
 to packet loss, so the signaling cannot take advantage of knowledge of the
 allocation in neighboring frames.
This problem is exacerbated in low-latency (small frame size) applications,
 which would include this overhead in every frame.</t>

<t>For this reason, in the MDCT mode Opus uses a primarily implicit bit
allocation. The available bitstream capacity is known in advance to both
the encoder and decoder without additional signaling, ultimately from the
packet sizes expressed by a higher-level protocol. Using this information,
the codec interpolates an allocation from a hard-coded table.</t>

<t>While the band-energy structure effectively models intra-band masking,
it ignores the weaker inter-band masking, band-temporal masking, and
other less significant perceptual effects. While these effects can
often be ignored, they can become significant for particular samples.
One
mechanism available to encoders would be to simply increase the overall
rate for these frames, but this is not possible in a constant rate mode
and can be fairly inefficient. As a result, three explicitly signaled
mechanisms are provided to alter the implicit allocation:</t>

<t>
<list style="symbols">
<t>Band boost</t>
<t>Allocation trim</t>
<t>Band skipping</t>
</list>
</t>

<t>The first of these mechanisms, band boost, allows an encoder to boost
the allocation in specific bands. The second, allocation trim, works by
biasing the overall allocation towards higher or lower frequency bands. The third, band
skipping, selects which low-precision high frequency bands
will be allocated no shape bits at all.</t>

<t>In stereo mode, there are two additional parameters
potentially coded as part of the allocation procedure: a parameter to allow the
selective elimination of allocation for the 'side' (i.e., intensity stereo) in jointly coded bands,
and a flag to deactivate joint coding (i.e., dual stereo). These values are not signaled if
they would be meaningless in the overall context of the allocation.</t>

<t>Because every signaled adjustment increases overhead and implementation
complexity, none were included speculatively: the reference encoder makes use
of all of these mechanisms.
While the decision logic in the reference was
found to be effective enough to justify the overhead and complexity, further
analysis techniques may be discovered which increase the effectiveness of these
parameters. As with other signaled parameters, an encoder is free to choose the
values in any manner, but unless a technique is known to deliver superior
perceptual results, the methods used by the reference implementation should be
used.</t>

<t>The allocation process consists of the following steps: determining the per-band
maximum allocation vector, decoding the boosts, decoding the tilt (the allocation trim), determining
the remaining capacity of the frame, searching the mode table for the
entry nearest but not exceeding the available space (subject to the tilt, boosts, band
maximums, and band minimums), linear interpolation, reallocation of
unused bits with concurrent skip decoding, determination of the
fine-energy vs. shape split, and final reallocation.
This process results
in a per-band shape allocation (in 1/8th bit units), a per-band fine-energy
allocation (in 1 bit per channel units), a set of band priorities for
controlling the use of remaining bits at the end of the frame, and a
remaining balance of unallocated space, which is usually zero except
at very high rates.</t>

<t>
The "static" bit allocation (in 1/8 bits) for a quality q, excluding the minimums, maximums,
tilt and boosts, is equal to channels*N*alloc[band][q]&lt;&lt;LM&gt;&gt;2, where
alloc[][] is given in <xref target="static_alloc"/> and LM=log2(frame_size/120). The allocation
is obtained by linearly interpolating between two values of q (in steps of 1/64) to find the
highest allocation that does not exceed the number of bits remaining.
</t>

<texttable anchor="static_alloc"
 title="CELT Static Allocation Table">
 <preamble>Rows indicate the MDCT bands, columns are the different quality (q) parameters.
The units are 1/32 bit per MDCT bin.</preamble>
<ttcol align="right">0</ttcol>
<ttcol align="right">1</ttcol>
<ttcol align="right">2</ttcol>
<ttcol align="right">3</ttcol>
<ttcol align="right">4</ttcol>
<ttcol align="right">5</ttcol>
<ttcol align="right">6</ttcol>
<ttcol align="right">7</ttcol>
<ttcol align="right">8</ttcol>
<ttcol align="right">9</ttcol>
<ttcol align="right">10</ttcol>
<c>0</c><c>90</c><c>110</c><c>118</c><c>126</c><c>134</c><c>144</c><c>152</c><c>162</c><c>172</c><c>200</c>
<c>0</c><c>80</c><c>100</c><c>110</c><c>119</c><c>127</c><c>137</c><c>145</c><c>155</c><c>165</c><c>200</c>
<c>0</c><c>75</c><c>90</c><c>103</c><c>112</c><c>120</c><c>130</c><c>138</c><c>148</c><c>158</c><c>200</c>
<c>0</c><c>69</c><c>84</c><c>93</c><c>104</c><c>114</c><c>124</c><c>132</c><c>142</c><c>152</c><c>200</c>
<c>0</c><c>63</c><c>78</c><c>86</c><c>95</c><c>103</c><c>113</c><c>123</c><c>133</c><c>143</c><c>200</c>
<c>0</c><c>56</c><c>71</c><c>80</c><c>89</c><c>97</c><c>107</c><c>117</c><c>127</c><c>137</c><c>200</c>
<c>0</c><c>49</c><c>65</c><c>75</c><c>83</c><c>91</c><c>101</c><c>111</c><c>121</c><c>131</c><c>200</c>
<c>0</c><c>40</c><c>58</c><c>70</c><c>78</c><c>85</c><c>95</c><c>105</c><c>115</c><c>125</c><c>200</c>
<c>0</c><c>34</c><c>51</c><c>65</c><c>72</c><c>78</c><c>88</c><c>98</c><c>108</c><c>118</c><c>198</c>
<c>0</c><c>29</c><c>45</c><c>59</c><c>66</c><c>72</c><c>82</c><c>92</c><c>102</c><c>112</c><c>193</c>
<c>0</c><c>20</c><c>39</c><c>53</c><c>60</c><c>66</c><c>76</c><c>86</c><c>96</c><c>106</c><c>188</c>
<c>0</c><c>18</c><c>32</c><c>47</c><c>54</c><c>60</c><c>70</c><c>80</c><c>90</c><c>100</c><c>183</c>
<c>0</c><c>10</c><c>26</c><c>40</c><c>47</c><c>54</c><c>64</c><c>74</c><c>84</c><c>94</c><c>178</c>
<c>0</c><c>0</c><c>20</c><c>31</c><c>39</c><c>47</c><c>57</c><c>67</c><c>77</c><c>87</c><c>173</c>
<c>0</c><c>0</c><c>12</c><c>23</c><c>32</c><c>41</c><c>51</c><c>61</c><c>71</c><c>81</c><c>168</c>
<c>0</c><c>0</c><c>0</c><c>15</c><c>25</c><c>35</c><c>45</c><c>55</c><c>65</c><c>75</c><c>163</c>
<c>0</c><c>0</c><c>0</c><c>4</c><c>17</c><c>29</c><c>39</c><c>49</c><c>59</c><c>69</c><c>158</c>
<c>0</c><c>0</c><c>0</c><c>0</c><c>12</c><c>23</c><c>33</c><c>43</c><c>53</c><c>63</c><c>153</c>
<c>0</c><c>0</c><c>0</c><c>0</c><c>1</c><c>16</c><c>26</c><c>36</c><c>46</c><c>56</c><c>148</c>
<c>0</c><c>0</c><c>0</c><c>0</c><c>0</c><c>10</c><c>15</c><c>20</c><c>30</c><c>45</c><c>129</c>
<c>0</c><c>0</c><c>0</c><c>0</c><c>0</c><c>1</c><c>1</c><c>1</c><c>1</c><c>20</c><c>104</c>
</texttable>

<t>The maximum allocation vector is an approximation of the maximum space
that can be used by each band for a given mode. The value is
approximate because the shape encoding is variable rate (due
to entropy coding of splitting parameters). Setting the maximum too low reduces the
maximum achievable quality in a band while setting it too high
may result in waste: bitstream capacity available at the end
of the frame which cannot be put to any use. The maximums
specified by the codec reflect the average maximum.
In the reference
implementation, the maximums in bits/sample are precomputed in a static table
(see cache_caps50[] in static_modes_float.h) for each band,
for each value of LM, and for both mono and stereo.

Implementations are expected
to simply use the same table data, but the procedure for generating
this table is included in rate.c as part of compute_pulse_cache().</t>

<t>To convert the values in cache.caps into the actual maximums: first
set nbBands to the maximum number of bands for this mode, and stereo to
zero if stereo is not in use and one otherwise. For each band set N
to the number of MDCT bins covered by the band (for one channel), set LM
to the shift value for the frame size,
then set i to nbBands*(2*LM+stereo) plus the band index. Then set the maximum for the band to
the value at index i of cache.caps plus 64, multiply by the number of channels
in the current frame (one or two) and by N, then divide the result by 4
using integer division. The resulting vector will be called
cap[]. The elements fit in signed 16-bit integers but do not fit in 8 bits.
This procedure is implemented in the reference in the function init_caps() in celt.c.
</t>

<t>The band boosts are represented by a series of binary symbols which
are entropy coded with very low probability. Each band can potentially be boosted
multiple times, subject to the frame actually having enough room to obey
the boost and having enough room to code the boost symbol.
The default
coding cost for a boost starts out at six bits (probability p=1/64), but subsequent boosts
in a band cost only a single bit and every time a band is boosted the
initial cost is reduced (down to a minimum of two bits, or p=1/4). Since the initial
cost of coding a boost is 6 bits, the coding cost of the boost symbols when
completely unused is 0.48 bits/frame for a 21 band mode (21*-log2(1-1/2**6)).</t>

<t>To decode the band boosts: First set 'dynalloc_logp' to 6, the initial
amount of storage required to signal a boost in bits, 'total_bits' to the
size of the frame in 8th bits, 'total_boost' to zero, and 'tell' to the total number
of 8th bits decoded
so far. For each band from the coding start (0 normally, but 17 in Hybrid mode)
to the coding end (which changes depending on the signaled bandwidth), the boost quanta
in units of 1/8 bit is calculated as quanta = min(8*N, max(48, N)).
This represents a boost step size of six bits, subject to a lower limit of
1/8th bit/sample and an upper limit of 1 bit/sample.
Set 'boost' to zero and 'dynalloc_loop_logp'
to dynalloc_logp. While dynalloc_loop_logp (the current worst case symbol cost) in
8th bits plus tell is less than total_bits plus total_boost and boost is less than cap[] for this
band: decode a bit from the bitstream with dynalloc_loop_logp as the cost
of a one and update tell to reflect the current used capacity; if the decoded value
is zero, break the loop; otherwise, add quanta to boost and total_boost, subtract quanta from
total_bits, and set dynalloc_loop_logp to 1.
When the while loop finishes,
boost contains the boost for this band. If boost is non-zero and dynalloc_logp
is greater than 2, decrease dynalloc_logp by one. Once this process has been
executed on all bands, the band boosts have been decoded. This procedure
is implemented around line 2474 of celt.c.</t>

<t>At very low rates it is possible that there won't be enough available
space to execute the inner loop even once. In these cases band boost
is not possible but its overhead is completely eliminated. Because of the
high cost of band boost when activated, a reasonable encoder should not be
using it at very low rates. The reference implements its dynalloc decision
logic around line 1304 of celt.c.</t>

<t>The allocation trim is an integer value from 0 to 10. The default value of
5 indicates no trim. The trim parameter is entropy coded in order to
lower the coding cost of less extreme adjustments. Values lower than
5 bias the allocation towards lower frequencies and values above 5
bias it towards higher frequencies. Like other signaled parameters, signaling
of the trim is gated so that it is not included if there is insufficient space
available in the bitstream.
To decode the trim, first set
the trim value to 5, then if and only if the count of decoded 8th bits so far (ec_tell_frac)
plus 48 (6 bits) is less than or equal to the total frame size in 8th
bits minus total_boost (an output of the above band boost procedure),
decode the trim value using the PDF in <xref target="celt_trim_pdf"/>.</t>

<texttable anchor="celt_trim_pdf" title="PDF for the Trim">
<ttcol>PDF</ttcol>
<c>{2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2}/128</c>
</texttable>

<t>For 10 ms and 20 ms frames that use short blocks and have at least LM+2 bits left prior to
the allocation process, one anti-collapse bit is reserved in the allocation process so it can
be decoded later. Following the anti-collapse reservation, one bit is reserved for skip if available.</t>

<t>For stereo frames, bits are reserved for intensity stereo and for dual stereo. Intensity stereo
requires ilog2(end-start) bits. Those bits are reserved if there are enough bits left. Following this, one
bit is reserved for dual stereo if available.</t>


<t>The allocation computation begins by setting up some initial conditions.
'total' is set to the remaining available 8th bits, computed by taking the
size of the coded frame times 8 and subtracting ec_tell_frac(). From this value, one (8th bit)
is subtracted to ensure that the resulting allocation will be conservative.
'anti_collapse_rsv'
is set to 8 (8th bits) if and only if the frame is a transient, LM is greater than 1, and total is
greater than or equal to (LM+2) * 8. Total is then decremented by anti_collapse_rsv and clamped
to be equal to or greater than zero. 'skip_rsv' is set to 8 (8th bits) if total is greater than
8, otherwise it is zero. Total is then decremented by skip_rsv. This reserves space for the
final skipping flag.</t>

<t>If the current frame is stereo, intensity_rsv is set to the conservative log2 in 8th bits
of the number of coded bands for this frame (given by the table LOG2_FRAC_TABLE in rate.c). If
intensity_rsv is greater than total then intensity_rsv is set to zero. Otherwise total is
decremented by intensity_rsv, and if total is still greater than 8, dual_stereo_rsv is
set to 8 and total is decremented by dual_stereo_rsv.</t>

<t>The allocation process then computes a vector representing the hard minimum allocation
any band will receive for shape. This minimum is higher than the technical limit of the PVQ
process, but very low rate allocations produce an excessively sparse spectrum and these bands
are better served by having no allocation at all. For each coded band, set thresh[band] to
twenty-four times the number of MDCT bins in the band and divide by 16. If 8 times the number
of channels is greater, use that instead. This sets the minimum allocation to one bit per channel
or 24 128th bits (3/16 bit) per MDCT bin, whichever is greater.
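The reservations and the minimum threshold described above can be sketched as follows. This is an
illustrative Python sketch, not the reference implementation: the names reserve_bits() and
min_shape_threshold() are hypothetical, the frame size is assumed to be supplied in bits, and the
LOG2_FRAC_TABLE lookup is not reproduced here but passed in as the assumed parameter intensity_cost
(in 8th bits).
<figure>
<artwork><![CDATA[
```python
def reserve_bits(frame_bits, tell_frac, is_transient, lm, stereo,
                 intensity_cost=0):
    """Initial conditions of the allocation; results are in 1/8 bit units."""
    # One 8th bit is held back so that the allocation stays conservative.
    total = frame_bits * 8 - tell_frac - 1
    anti_collapse_rsv = 0
    if is_transient and lm > 1 and total >= (lm + 2) * 8:
        anti_collapse_rsv = 8
    total = max(0, total - anti_collapse_rsv)
    skip_rsv = 8 if total > 8 else 0
    total -= skip_rsv
    intensity_rsv = 0
    dual_stereo_rsv = 0
    if stereo:
        intensity_rsv = intensity_cost   # assumed: LOG2_FRAC_TABLE lookup
        if intensity_rsv > total:
            intensity_rsv = 0
        else:
            total -= intensity_rsv
            if total > 8:
                dual_stereo_rsv = 8
                total -= dual_stereo_rsv
    return {
        "total": total,
        "anti_collapse_rsv": anti_collapse_rsv,
        "skip_rsv": skip_rsv,
        "intensity_rsv": intensity_rsv,
        "dual_stereo_rsv": dual_stereo_rsv,
    }


def min_shape_threshold(n_bins, channels):
    """thresh[band] in 1/8 bit units for a band covering n_bins MDCT bins."""
    return max(8 * channels, (24 * n_bins) // 16)
```
]]></artwork>
</figure>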
The band-size dependent part of this
value is not scaled by the channel count, because at the very low rates where this limit is
applicable there will usually be no bits allocated to the side.</t>

<t>The previously decoded allocation trim is used to derive a vector of per-band adjustments,
'trim_offsets[]'. For each coded band take the alloc_trim and subtract 5 and LM. Then multiply
the result by the number of channels, the number of MDCT bins in the shortest frame size for this mode,
the number of remaining bands, 2**LM, and 8. Then divide this value by 64. Finally, if the
number of MDCT bins in the band per channel is only one, 8 times the number of channels is subtracted
in order to diminish the allocation by one bit, because width 1 bands receive greater benefit
from the coarse energy coding.</t>


</section>

<section anchor="PVQ-decoder" title="Shape Decoding">
<t>
In each band, the normalized "shape" is encoded
using a vector quantization scheme called a "pyramid vector quantizer".
</t>

<t>In
the simplest case, the number of bits allocated in
<xref target="allocation"></xref> is converted to a number of pulses as described
by <xref target="bits-pulses"></xref>. Knowing the number of pulses and the
number of samples in the band, the decoder calculates the size of the codebook
as detailed in <xref target="cwrs-decoder"></xref>.
The size is used to decode
an unsigned integer (uniform probability model), which is the codeword index.
This index is converted into the corresponding vector as explained in
<xref target="cwrs-decoder"></xref>. This vector is then scaled to unit norm.
</t>

<section anchor="bits-pulses" title="Bits to Pulses">
<t>
Although the allocation is performed in 1/8th bit units, the quantization requires
an integer number of pulses K. To do this, the encoder searches for the value
of K that produces the number of bits nearest to the allocated value
(rounding down if exactly halfway between two values), not to exceed
the total number of bits available. For efficiency reasons, the search is performed against a
precomputed allocation table which only permits some K values for each N. The number of
codebook entries can be computed as explained in <xref target="cwrs-decoder"></xref>. The difference
between the number of bits allocated and the number of bits used is accumulated to a
"balance" (initialized to zero) that helps adjust the
allocation for the next bands. One third of the balance is applied to the
bit allocation of each band to help achieve the target allocation. The only
exceptions are the band before the last and the last band, for which half the balance
and the whole balance are applied, respectively.
</t>
</section>

<section anchor="cwrs-decoder" title="PVQ Decoding">

<t>
Decoding of PVQ vectors is implemented in decode_pulses() (cwrs.c).
The unique codeword index is decoded as a uniformly-distributed integer value between 0 and
V(N,K)-1, where V(N,K) is the number of possible combinations of K pulses in
N samples. The index is then converted to a vector in the same way specified in
<xref target="PVQ"></xref>. The indexing is based on the calculation of V(N,K)
(denoted N(L,K) in <xref target="PVQ"></xref>).
</t>

<t>
 The number of combinations can be computed recursively as
V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0, K != 0.
There are many different ways to compute V(N,K), including precomputed tables and direct
use of the recursive formulation. The reference implementation applies the recursive
formulation one line (or column) at a time to save on memory use,
along with an alternate,
univariate recurrence to initialize an arbitrary line, and direct
polynomial solutions for small N. All of these methods are
equivalent, and have different trade-offs in speed, memory usage, and
code size. Implementations MAY use any methods they like, as long as
they are equivalent to the mathematical definition.
</t>

<t>
The decoded vector X is recovered as follows.
Let i be the index decoded with the procedure in <xref target="ec_dec_uint"/>
 with ft = V(N,K), so that 0 &lt;= i &lt; V(N,K).
Let k = K.
Then for j = 0 to (N - 1), inclusive, do:
<list style="numbers">
<t>Let p = (V(N-j-1,k) + V(N-j,k))/2.</t>
<t>
If i &lt; p, then let sgn = 1, else let sgn = -1
 and set i = i - p.
</t>
<t>Let k0 = k and set p = p - V(N-j-1,k).</t>
<t>
While p &gt; i, set k = k - 1 and
 p = p - V(N-j-1,k).
</t>
<t>
Set X[j] = sgn*(k0 - k) and i = i - p.
</t>
</list>
</t>

<t>
The decoded vector X is then normalized such that its
L2-norm equals one.
</t>
</section>

<section anchor="spreading" title="Spreading">
<t>
The normalized vector decoded in <xref target="cwrs-decoder"/> is then rotated
for the purpose of avoiding tonal artifacts. The rotation gain is equal to
<figure align="center">
<artwork align="center"><![CDATA[
g_r = N / (N + f_r*K)
]]></artwork>
</figure>

where N is the number of dimensions, K is the number of pulses, and f_r depends on
the value of the "spread" parameter in the bit-stream.
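For reference, the recurrence for V(N,K) and the index-to-vector steps of
<xref target="cwrs-decoder"/>, which produce the vector that is rotated here, can be sketched as
follows. This is an illustrative Python sketch using direct memoized recursion, one of the
equivalent methods permitted above; it is not how the reference implementation computes V(N,K),
and the function name decode_pvq() is hypothetical. Normalization to unit L2-norm is omitted.
<figure>
<artwork><![CDATA[
```python
from functools import lru_cache


@lru_cache(maxsize=None)
def V(n, k):
    """Number of combinations of k pulses in n samples."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return V(n - 1, k) + V(n, k - 1) + V(n - 1, k - 1)


def decode_pvq(i, n, k_total):
    """Convert codeword index i into the integer pulse vector X."""
    if not 0 <= i < V(n, k_total):
        raise ValueError("index out of range")
    x = []
    k = k_total
    for j in range(n):
        p = (V(n - j - 1, k) + V(n - j, k)) // 2   # step 1
        if i < p:                                  # step 2
            sgn = 1
        else:
            sgn = -1
            i -= p
        k0 = k                                     # step 3
        p -= V(n - j - 1, k)
        while p > i:                               # step 4
            k -= 1
            p -= V(n - j - 1, k)
        x.append(sgn * (k0 - k))                   # step 5
        i -= p
    return x
```
]]></artwork>
</figure>
With N=2 and K=1 there are V(2,1) = 4 codewords, and indices 0 through 3 decode to
(1,0), (0,1), (0,-1), and (-1,0), respectively.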
</t>

<texttable anchor="spread values" title="Spreading Values">
<ttcol>Spread value</ttcol>
<ttcol>f_r</ttcol>
 <c>0</c> <c>infinite (no rotation)</c>
 <c>1</c> <c>15</c>
 <c>2</c> <c>10</c>
 <c>3</c> <c>5</c>
</texttable>

<t>
The rotation angle is then calculated as
<figure align="center">
<artwork align="center"><![CDATA[
                 2
         pi * g_r
theta = ----------
            4
]]></artwork>
</figure>
A 2-D rotation R(i,j) between points x_i and x_j is defined as:
<figure align="center">
<artwork align="center"><![CDATA[
x_i' =  cos(theta)*x_i + sin(theta)*x_j
x_j' = -sin(theta)*x_i + cos(theta)*x_j
]]></artwork>
</figure>

An N-D rotation is then achieved by applying a series of 2-D rotations back and forth, in the
following order: R(x_1, x_2), R(x_2, x_3), ..., R(x_N-2, x_N-1), R(x_N-1, x_N),
R(x_N-2, x_N-1), ..., R(x_1, x_2).
</t>

<t>
If the decoded vector represents more
than one time block, then this spreading process is applied separately on each time block.
Also, if each block represents 8 samples or more, then another N-D rotation, by
(pi/2-theta), is applied <spanx style="emph">before</spanx> the rotation described above.
This
extra rotation is applied in an interleaved manner with a stride equal to round(sqrt(N/nb_blocks)),
i.e., it is applied independently for each set of samples S_k = {stride*n + k}, n=0..N/stride-1.
</t>
</section>

<section anchor="split" title="Split decoding">
<t>
To avoid the need for multi-precision calculations when decoding PVQ codevectors,
the maximum size allowed for codebooks is 32 bits. When larger codebooks are
needed, the vector is instead split in two sub-vectors of size N/2.
A quantized gain parameter with precision
derived from the current allocation is entropy coded to represent the relative
gains of each side of the split, and the entire decoding process is recursively
applied. Multiple levels of splitting may be applied up to a limit of LM+1 splits.
The same recursive mechanism is applied for the joint coding
of stereo audio.
</t>

</section>

<section anchor="tf-change" title="Time-Frequency change">
<t>
The time-frequency (TF) parameters are used to control the time-frequency resolution tradeoff
in each coded band. For each band, there are two possible TF choices. For the first
band coded, the PDF is {3, 1}/4 for frames marked as transient and {15, 1}/16 for
the other frames. For subsequent bands, the TF choice is coded relative to the
previous TF choice with probability {15, 1}/16 for transient frames and {31, 1}/32
otherwise.
The mapping between the decoded TF choices and the adjustment in TF
resolution is shown in the tables below.
</t>

<texttable anchor='tf_00'
 title="TF Adjustments for Non-transient Frames and tf_select=0">
<ttcol align='center'>Frame size (ms)</ttcol>
<ttcol align='center'>0</ttcol>
<ttcol align='center'>1</ttcol>
<c>2.5</c> <c>0</c> <c>-1</c>
<c>5</c> <c>0</c> <c>-1</c>
<c>10</c> <c>0</c> <c>-2</c>
<c>20</c> <c>0</c> <c>-2</c>
</texttable>

<texttable anchor='tf_01'
 title="TF Adjustments for Non-transient Frames and tf_select=1">
<ttcol align='center'>Frame size (ms)</ttcol>
<ttcol align='center'>0</ttcol>
<ttcol align='center'>1</ttcol>
<c>2.5</c> <c>0</c> <c>-1</c>
<c>5</c> <c>0</c> <c>-2</c>
<c>10</c> <c>0</c> <c>-3</c>
<c>20</c> <c>0</c> <c>-3</c>
</texttable>


<texttable anchor='tf_10'
 title="TF Adjustments for Transient Frames and tf_select=0">
<ttcol align='center'>Frame size (ms)</ttcol>
<ttcol align='center'>0</ttcol>
<ttcol align='center'>1</ttcol>
<c>2.5</c> <c>0</c> <c>-1</c>
<c>5</c> <c>1</c> <c>0</c>
<c>10</c> <c>2</c> <c>0</c>
<c>20</c> <c>3</c> <c>0</c>
</texttable>

<texttable anchor='tf_11'
 title="TF Adjustments for Transient Frames and tf_select=1">
<ttcol align='center'>Frame size (ms)</ttcol>
<ttcol align='center'>0</ttcol>
<ttcol align='center'>1</ttcol>
<c>2.5</c> <c>0</c> <c>-1</c>
<c>5</c> <c>1</c> <c>-1</c>
<c>10</c> <c>1</c> <c>-1</c>
<c>20</c> <c>1</c> <c>-1</c>
</texttable>

<t>
A negative TF adjustment means that the temporal resolution is increased,
while a positive TF adjustment means that the frequency resolution is increased.
Changes in TF resolution are implemented using the Hadamard transform <xref target="Hadamard"/>. To increase
the time resolution by N, N "levels" of the Hadamard transform are applied to the
decoded vector for each interleaved MDCT vector. To increase the frequency resolution
by N (which assumes a transient frame), N levels of the Hadamard transform are applied
<spanx style="emph">across</spanx> the interleaved MDCT vector. In the case of increased
time resolution, the decoder uses the "sequency order" because the input vector
is sorted in time.
</t>
</section>


</section>

<section anchor="anti-collapse" title="Anti-Collapse Processing">
<t>
The anti-collapse feature is designed to avoid the situation where the use of multiple
short MDCTs causes the energy in one or more of the MDCTs to be zero for
some bands, causing unpleasant artifacts.
When the frame has the transient bit set, an anti-collapse bit is decoded.
When anti-collapse is set, the energy in each small MDCT is prevented
from collapsing to zero. For each band of each MDCT where a collapse is
detected, a pseudo-random signal is inserted with an energy corresponding
to the minimum energy over the two previous frames. A renormalization step is
then required to ensure that the anti-collapse step did not alter the
energy preservation property.
</t>
</section>

<section anchor="denormalization" title="Denormalization">
<t>
Just as each band was normalized in the encoder, the last step of the decoder before
the inverse MDCT is to denormalize the bands. Each decoded normalized band is
multiplied by the square root of the decoded energy. This is done by denormalise_bands()
(bands.c).
</t>
</section>

<section anchor="inverse-mdct" title="Inverse MDCT">


<t>The inverse MDCT implementation has no special characteristics. The
input is N frequency-domain samples and the output is 2*N time-domain
samples, while scaling by 1/2. A "low-overlap" window reduces the algorithmic delay.
It is derived from a basic (full overlap) 240-sample version of the window used by the Vorbis codec:
<figure align="center">
<artwork align="center"><![CDATA[
                                  2
       /   /pi      /pi   n + 1/2\ \ \
W(n) = |sin|-- * sin|-- * -------| | |  .
       \   \2       \2       L   / / /
]]></artwork>
</figure>
The low-overlap window is created by zero-padding the basic window and inserting ones in the
middle, such that the resulting window still satisfies power complementarity <xref target='Princen86'/>.
The IMDCT and
windowing are performed by mdct_backward (mdct.c).
</t>

<section anchor="post-filter" title="Post-filter">
<t>
The output of the inverse MDCT (after weighted overlap-add) is sent to the
post-filter. Although the post-filter is applied at the end, the post-filter
parameters are encoded at the beginning, just after the silence flag.
The post-filter can be switched on or off using one bit (logp=1).
If the post-filter is enabled, then the octave is decoded as a uniformly
distributed integer between 0 and 5, inclusive (six possible values). Once the octave is known, the fine pitch
within the octave is decoded using 4+octave raw bits. The final pitch period
is equal to (16<<octave)+fine_pitch-1, so it is bounded between 15 and 1022,
inclusive. Next, the gain is decoded as three raw bits and is equal to
G=3*(int_gain+1)/32. The set of post-filter taps is decoded last, using
a PDF equal to {2, 1, 1}/4. Tapset zero corresponds to the filter coefficients
g0 = 0.3066406250, g1 = 0.2170410156, g2 = 0.1296386719. Tapset one
corresponds to the filter coefficients g0 = 0.4638671875, g1 = 0.2680664062,
g2 = 0, and tapset two uses filter coefficients g0 = 0.7998046875,
g1 = 0.1000976562, g2 = 0.
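As a non-normative illustration, the parameter arithmetic described above can be
sketched as follows. The sketch is written in Python rather than the C of the
reference implementation, the function and table names are hypothetical, and it
assumes that the symbols have already been read from the bitstream and that the
octave is limited to 0..5, which is what the stated upper bound of 1022 implies.
<figure align="left">
<artwork align="left"><![CDATA[
```python
# Non-normative sketch of the post-filter parameter arithmetic.
# All names are hypothetical; inputs are values already decoded
# from the bitstream.

# Tapset index -> (g0, g1, g2), copied from the text above.
TAPSETS = (
    (0.3066406250, 0.2170410156, 0.1296386719),
    (0.4638671875, 0.2680664062, 0.0),
    (0.7998046875, 0.1000976562, 0.0),
)


def postfilter_params(octave, fine_pitch, int_gain, tapset):
    """Return (pitch_period, gain, (g0, g1, g2))."""
    # Assumption: octave is limited to 0..5 (implied by the 1022 bound).
    if not 0 <= octave <= 5:
        raise ValueError("octave out of range")
    # The fine pitch is coded with 4+octave raw bits.
    if not 0 <= fine_pitch < (1 << (4 + octave)):
        raise ValueError("fine_pitch does not fit in 4+octave bits")
    # The gain index is coded with three raw bits.
    if not 0 <= int_gain <= 7:
        raise ValueError("int_gain does not fit in three bits")
    period = (16 << octave) + fine_pitch - 1
    gain = 3 * (int_gain + 1) / 32
    return period, gain, TAPSETS[tapset]
```
]]></artwork>
</figure>
With these assumptions, the smallest inputs give a period of 15 and the largest
inputs (octave 5, fine pitch 511) give a period of 1022, matching the bounds
stated above.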
</t>

<t>
The post-filter response is thus computed as:
 <figure align="center">
 <artwork align="center">
 <![CDATA[
   y(n) = x(n) + G*(g0*y(n-T) + g1*(y(n-T+1)+y(n-T-1))
                              + g2*(y(n-T+2)+y(n-T-2)))
]]>
 </artwork>
 </figure>

During a transition between different gains, a smooth transition is calculated
using the square of the MDCT window. It is important that values of y(n) be
interpolated one at a time such that the past value of y(n) used is interpolated.
</t>
</section>

<section anchor="deemphasis" title="De-emphasis">
<t>
After the post-filter,
the signal is de-emphasized using the inverse of the pre-emphasis filter
used in the encoder:
<figure align="center">
<artwork align="center"><![CDATA[
 1            1
---- = --------------- ,
A(z)                -1
       1 - alpha_p*z
]]></artwork>
</figure>
where alpha_p=0.8500061035.
</t>
</section>

</section>

</section>

<section anchor="Packet Loss Concealment" title="Packet Loss Concealment (PLC)">
<t>
Packet loss concealment (PLC) is an optional decoder-side feature that
SHOULD be included when receiving from an unreliable channel.
Because
PLC is not part of the bitstream, there are many acceptable ways to
implement PLC with different complexity/quality trade-offs.
</t>

<t>
The PLC in
the reference implementation depends on the mode of the last packet received.
In CELT mode, the PLC finds a periodicity in the decoded
signal and repeats the windowed waveform using the pitch offset. The windowed
waveform is overlapped in such a way as to preserve the time-domain aliasing
cancellation with the previous frame and the next frame. This is implemented
in celt_decode_lost() (mdct.c). In SILK mode, the PLC uses LPC extrapolation
from the previous frame, implemented in silk_PLC() (PLC.c).
</t>

<section anchor="clock-drift" title="Clock Drift Compensation">
<t>
Clock drift refers to the gradual desynchronization of two endpoints
whose sample clocks run at different frequencies while they are streaming
live audio. Differences in clock frequencies are generally attributable to
manufacturing variation in the endpoints' clock hardware. For long-lived
streams, the time difference between sender and receiver can grow without
bound.
</t>

<t>
When the sender's clock runs slower than the receiver's, the effect is similar
to packet loss: too few packets are received. The receiver can distinguish
between drift and loss if the transport provides packet timestamps.
A receiver
for live streams SHOULD conceal the effects of drift, and MAY do so by invoking
the PLC.
</t>

<t>
When the sender's clock runs faster than the receiver's, too many packets will
be received. The receiver MAY respond by skipping any packet (i.e., not
submitting the packet for decoding). This is likely to produce a less severe
artifact than if the frame were dropped after decoding.
</t>

<t>
A decoder MAY employ a more sophisticated drift compensation method. For
example, the
<xref target='Google-NetEQ'>NetEQ component</xref>
of the
<xref target='Google-WebRTC'>Google WebRTC codebase</xref>
compensates for drift by adding or removing
one period when the signal is highly periodic. The reference implementation of
Opus allows a caller to learn whether the current frame's signal is highly
periodic, and if so what the period is, using the OPUS_GET_PITCH() request.
</t>
</section>

</section>

<section anchor="switching" title="Configuration Switching">

<t>
Switching between the Opus coding modes, audio bandwidths, and channel counts
 requires careful consideration to avoid audible glitches.
Switching between any two configurations of the CELT-only mode, any two
 configurations of the Hybrid mode, or from WB SILK to Hybrid mode does not
 require any special treatment in the decoder, as the MDCT overlap will smooth
 the transition.
Switching from Hybrid mode to WB SILK requires adding in the final contents
 of the CELT overlap buffer to the first SILK-only packet.
This can be done by decoding a 2.5 ms silence frame with the CELT decoder
 using the channel count of the SILK-only packet (and any choice of audio
 bandwidth), which will correctly handle the cases when the channel count
 changes as well.
</t>

<t>
When changing the channel count for SILK-only or Hybrid packets, the encoder
 can avoid glitches by smoothly varying the stereo width of the input signal
 before or after the transition, and SHOULD do so.
However, other transitions between SILK-only packets or between NB or MB SILK
 and Hybrid packets may cause glitches, because neither the LSF coefficients
 nor the LTP, LPC, stereo unmixing, and resampler buffers are available at the
 new sample rate.
These switches SHOULD be delayed by the encoder until quiet periods or
 transients, where the inevitable glitches will be less audible. Additionally,
 the bit-stream MAY include redundant side information ("redundancy"), in the
 form of additional CELT frames embedded in each of the Opus frames around the
 transition.
</t>

<t>
The other transitions that cannot be easily handled are those where the lower
 frequencies switch between the SILK LP-based model and the CELT MDCT model.
However, an encoder may not have an opportunity to delay such a switch to a
 convenient point.
For example, if the content switches from speech to music, and the encoder does
 not have enough latency in its analysis to detect this in advance, there may
 be no convenient silence period during which to make the transition for quite
 some time.
To avoid or reduce glitches during these problematic mode transitions, and
 also between audio bandwidth changes in the SILK-only modes, transitions MAY
 include redundant side information ("redundancy"), in the form of an
 additional CELT frame embedded in the Opus frame.
</t>

<t>
A transition between coding the lower frequencies with the LP model and the
 MDCT model or a transition that involves changing the SILK bandwidth
 is only normatively specified when it includes redundancy.
For those without redundancy, it is RECOMMENDED that the decoder use a
 concealment technique (e.g., make use of a PLC algorithm) to "fill in" the
 gap or discontinuity caused by the mode transition.
Therefore, PLC MUST NOT be applied during any normative transition, i.e., when
<list style="symbols">
<t>A packet includes redundancy for this transition (as described below),</t>
<t>The transition is between any WB SILK packet and any Hybrid packet, or vice
 versa,</t>
<t>The transition is between any two Hybrid mode packets, or</t>
<t>The transition is between any two CELT mode packets,</t>
</list>
 unless there is actual packet loss.
</t>

<section anchor="side-info" title="Transition Side Information (Redundancy)">
<t>
Transitions with side information include an extra 5 ms "redundant" CELT
 frame within the Opus frame.
This frame is designed to fill in the gap or discontinuity in the different
 layers without requiring the decoder to conceal it.
For transitions from CELT-only to SILK-only or Hybrid, the redundant frame is
 inserted in the first Opus frame after the transition (i.e., the first
 SILK-only or Hybrid frame).
For transitions from SILK-only or Hybrid to CELT-only, the redundant frame is
 inserted in the last Opus frame before the transition (i.e., the last
 SILK-only or Hybrid frame).
</t>

<section anchor="opus_redundancy_flag" title="Redundancy Flag">
<t>
The presence of redundancy is signaled in all SILK-only and Hybrid frames, not
 just those involved in a mode transition.
This allows the frames to be decoded correctly even if an adjacent frame is
 lost.
For SILK-only frames, this signaling is implicit, based on the size of the
 Opus frame and the number of bits consumed decoding the SILK portion of
 it.
After decoding the SILK portion of the Opus frame, the decoder uses ec_tell()
 (see <xref target="ec_tell"/>) to check if there are at least 17 bits
 remaining.
If so, then the frame contains redundancy.
</t>

<t>
For Hybrid frames, this signaling is explicit.
After decoding the SILK portion of the Opus frame, the decoder uses ec_tell()
 (see <xref target="ec_tell"/>) to ensure there are at least 37 bits remaining.
If so, it reads a symbol with the PDF in
 <xref target="opus_redundancy_flag_pdf"/>, and if the value is 1, then the
 frame contains redundancy.
Otherwise (if there were fewer than 37 bits left or the value was 0), the frame
 does not contain redundancy.
</t>

<texttable anchor="opus_redundancy_flag_pdf" title="Redundancy Flag PDF">
<ttcol>PDF</ttcol>
<c>{4095, 1}/4096</c>
</texttable>
</section>

<section anchor="opus_redundancy_pos" title="Redundancy Position Flag">
<t>
Since the current frame is a SILK-only or a Hybrid frame, it must be at least
 10 ms.
Therefore, it needs an additional flag to indicate whether the redundant
 5 ms CELT frame should be mixed into the beginning of the current frame,
 or the end.
After determining that a frame contains redundancy, the decoder reads a
 1-bit symbol with a uniform PDF
 (<xref target="opus_redundancy_pos_pdf"/>).
</t>

<texttable anchor="opus_redundancy_pos_pdf" title="Redundancy Position PDF">
<ttcol>PDF</ttcol>
<c>{1, 1}/2</c>
</texttable>

<t>
If the value is zero, this is the first frame in the transition, and the
 redundancy belongs at the end.
If the value is one, this is the second frame in the transition, and the
 redundancy belongs at the beginning.
There is no way to specify that an Opus frame contains separate redundant CELT
 frames at both the beginning and the end.
</t>
</section>

<section anchor="opus_redundancy_size" title="Redundancy Size">
<t>
Unlike the CELT portion of a Hybrid frame, the redundant CELT frame does not
 use the same entropy coder state as the rest of the Opus frame, because this
 would break the CELT bit allocation mechanism in Hybrid frames.
Thus, a redundant CELT frame always starts and ends on a byte boundary, even in
 SILK-only frames, where this is not strictly necessary.
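As a non-normative illustration, the presence and position rules of the two
 preceding sections can be sketched as follows.
The sketch is in Python rather than the C of the reference implementation; the
 function name, the mode strings, and the two callables that stand in for the
 range decoder are all hypothetical.
<figure align="left">
<artwork align="left"><![CDATA[
```python
# Non-normative sketch; the names are hypothetical and are not part
# of the reference decoder's API.

def decode_redundancy(mode, bits_remaining, read_flag, read_position):
    """Return None if there is no redundancy, else "end" or "beginning".

    mode           -- "silk" or "hybrid"; any other mode is treated
                      as carrying no redundancy
    bits_remaining -- bits left after the SILK portion, as derived
                      from ec_tell()
    read_flag      -- callable decoding the {4095, 1}/4096 flag
    read_position  -- callable decoding the {1, 1}/2 position flag
    """
    if mode == "silk":
        # Implicit signaling: enough space left implies redundancy.
        present = bits_remaining >= 17
    elif mode == "hybrid":
        # Explicit signaling: the flag is read only if at least
        # 37 bits remain (short-circuit keeps read_flag unread).
        present = bits_remaining >= 37 and read_flag() == 1
    else:
        present = False
    if not present:
        return None
    # 0: first frame of the transition, redundancy at the end.
    # 1: second frame of the transition, redundancy at the beginning.
    return "beginning" if read_position() == 1 else "end"
```
]]></artwork>
</figure>
Note that in this sketch the position symbol is consumed only after redundancy
 has been found to be present, mirroring the order described above.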
</t>

<t>
For SILK-only frames, the number of bytes in the redundant CELT frame is simply
 the number of whole bytes remaining, which must be at least 2, due to the
 space check in <xref target="opus_redundancy_flag"/>.
For Hybrid frames, the number of bytes is equal to 2, plus a decoded unsigned
 integer less than 256 (see <xref target="ec_dec_uint"/>).
This may be more than the number of whole bytes remaining in the Opus frame,
 in which case the frame is invalid.
However, a decoder is not required to ignore the entire frame, as this may be
 the result of a bit error that desynchronized the range coder.
There may still be useful data before the error, and a decoder MAY keep any
 audio decoded so far instead of invoking the PLC, but it is RECOMMENDED that
 the decoder stop decoding and discard the rest of the current Opus frame.
</t>

<t>
It would have been possible to avoid these invalid states in the design of Opus
 by limiting the range of the explicit length decoded from Hybrid frames by the
 actual number of whole bytes remaining.
However, this would require an encoder to determine the rate allocation for the
 MDCT layer up front, before it began encoding that layer.
By allowing some invalid sizes, the encoder is able to defer that decision
 until much later.
When encoding Hybrid frames which do not include redundancy, the encoder must
 still decide up-front if it wishes to use the minimum 37 bits required to
 trigger encoding of the redundancy flag, but this is a much looser
 restriction.
</t>

<t>
After determining the size of the redundant CELT frame, the decoder reduces
 the size of the buffer currently in use by the range coder by that amount.
The CELT layer reads any raw bits from the end of this reduced buffer, and all
 calculations of the number of bits remaining in the buffer must be done using
 this new, reduced size, rather than the original size of the Opus frame.
</t>
</section>

<section anchor="opus_redundancy_decoding" title="Decoding the Redundancy">
<t>
The redundant frame is decoded like any other CELT-only frame, with the
 exception that it does not contain a TOC byte.
The frame size is fixed at 5 ms, the channel count is set to that of the
 current frame, and the audio bandwidth is also set to that of the current
 frame, with the exception that for MB SILK frames, it is set to WB.
</t>

<t>
If the redundancy belongs at the beginning (in a CELT-only to SILK-only or
 Hybrid transition), the final reconstructed output uses the first 2.5 ms
 of audio output by the decoder for the redundant frame as-is, discarding
 the corresponding output from the SILK-only or Hybrid portion of the frame.
The remaining 2.5 ms is cross-lapped with the decoded SILK/Hybrid signal
 using CELT's power-complementary MDCT window to ensure a smooth
 transition.
</t>

<t>
If the redundancy belongs at the end (in a SILK-only or Hybrid to CELT-only
 transition), only the second half (2.5 ms) of the audio output by the
 decoder for the redundant frame is used.
In that case, the second half of the redundant frame is cross-lapped with the
 end of the SILK/Hybrid signal, again using CELT's power-complementary MDCT
 window to ensure a smooth transition.
</t>
</section>

</section>

<section anchor="decoder-reset" title="State Reset">
<t>
When a transition occurs, the state of the SILK or the CELT decoder (or both)
 may need to be reset before decoding a frame in the new mode.
This avoids reusing "out of date" memory, which may not have been updated in
 some time or may not be in a well-defined state due to, e.g., PLC.
The SILK state is reset before every SILK-only or Hybrid frame where the
 previous frame was CELT-only.
The CELT state is reset every time the operating mode changes and the new mode
 is either Hybrid or CELT-only, except when the transition uses redundancy as
 described above.
When switching from SILK-only or Hybrid to CELT-only with redundancy, the CELT
 state is reset before decoding the redundant CELT frame embedded in the
 SILK-only or Hybrid frame, but it is not reset before decoding the following
 CELT-only frame.
When switching from CELT-only mode to SILK-only or Hybrid mode with redundancy,
 the CELT decoder is not reset for decoding the redundant CELT frame.
</t>
</section>

<section title="Summary of Transitions">

<t>
<xref target="normative_transitions"/> illustrates all of the normative
 transitions involving a mode change, an audio bandwidth change, or both.
Each one uses an S, H, or C to represent an Opus frame in the corresponding
 mode.
In addition, an R indicates the presence of redundancy in the Opus frame it is
 cross-lapped with.
Its location in the first or last 5 ms is assumed to correspond to whether
 it is the frame before or after the transition.
Other uses of redundancy are non-normative.
Finally, a c indicates the contents of the CELT overlap buffer after the
 previously decoded frame (i.e., as extracted by decoding a silence frame).
<figure align="center" anchor="normative_transitions"
 title="Normative Transitions">
<artwork align="center"><![CDATA[
SILK to SILK with Redundancy:             S -> S -> S
                                                    &
                                                   !R -> R
                                                         &
                                                        ;S -> S -> S

NB or MB SILK to Hybrid with Redundancy:  S -> S -> S
                                                    &
                                                   !R ->;H -> H -> H

WB SILK to Hybrid:                        S -> S -> S ->!H -> H -> H

SILK to CELT with Redundancy:             S -> S -> S
                                                    &
                                                   !R -> C -> C -> C

Hybrid to NB or MB SILK with Redundancy:  H -> H -> H
                                                    &
                                                   !R -> R
                                                         &
                                                        ;S -> S -> S

Hybrid to WB SILK:                        H -> H -> H -> c
                                                      \  +
                                                       > S -> S -> S

Hybrid to CELT with Redundancy:           H -> H -> H
                                                    &
                                                   !R -> C -> C -> C

CELT to SILK with Redundancy:             C -> C -> C -> R
                                                         &
                                                        ;S -> S -> S

CELT to Hybrid with Redundancy:           C -> C -> C -> R
                                                         &
                                                        |H -> H -> H

Key:
S   SILK-only frame                 ;   SILK decoder reset
H   Hybrid frame                    |   CELT and SILK decoder resets
C   CELT-only frame                 !   CELT decoder reset
c   CELT overlap                    +   Direct mixing
R   Redundant CELT frame            &   Windowed cross-lap
]]></artwork>
</figure>
The first two and the last two Opus frames in each example are illustrative,
 i.e., there is no requirement that a stream remain in the same configuration
 for three consecutive frames before or after a switch.
</t>

<t>
The behavior of transitions without redundancy where PLC is allowed is non-normative.
An encoder might still wish to use these transitions if, for example, it
 doesn't want to add the extra bitrate required for redundancy or if it makes
 a decision to switch after it has already transmitted the frame that would
 have had to contain the redundancy.
<xref target="nonnormative_transitions"/> illustrates the recommended
 cross-lapping and decoder resets for these transitions.
6033*a58d3d2aSXin Li<figure align="center" anchor="nonnormative_transitions" 6034*a58d3d2aSXin Li title="Recommended Non-Normative Transitions"> 6035*a58d3d2aSXin Li<artwork align="center"><![CDATA[ 6036*a58d3d2aSXin LiSILK to SILK (audio bandwidth change): S -> S -> S ;S -> S -> S 6037*a58d3d2aSXin Li 6038*a58d3d2aSXin LiNB or MB SILK to Hybrid: S -> S -> S |H -> H -> H 6039*a58d3d2aSXin Li 6040*a58d3d2aSXin LiSILK to CELT without Redundancy: S -> S -> S -> P 6041*a58d3d2aSXin Li & 6042*a58d3d2aSXin Li !C -> C -> C 6043*a58d3d2aSXin Li 6044*a58d3d2aSXin LiHybrid to NB or MB SILK: H -> H -> H -> c 6045*a58d3d2aSXin Li + 6046*a58d3d2aSXin Li ;S -> S -> S 6047*a58d3d2aSXin Li 6048*a58d3d2aSXin LiHybrid to CELT without Redundancy: H -> H -> H -> P 6049*a58d3d2aSXin Li & 6050*a58d3d2aSXin Li !C -> C -> C 6051*a58d3d2aSXin Li 6052*a58d3d2aSXin LiCELT to SILK without Redundancy: C -> C -> C -> P 6053*a58d3d2aSXin Li & 6054*a58d3d2aSXin Li ;S -> S -> S 6055*a58d3d2aSXin Li 6056*a58d3d2aSXin LiCELT to Hybrid without Redundancy: C -> C -> C -> P 6057*a58d3d2aSXin Li & 6058*a58d3d2aSXin Li |H -> H -> H 6059*a58d3d2aSXin Li 6060*a58d3d2aSXin LiKey: 6061*a58d3d2aSXin LiS SILK-only frame ; SILK decoder reset 6062*a58d3d2aSXin LiH Hybrid frame | CELT and SILK decoder resets 6063*a58d3d2aSXin LiC CELT-only frame ! CELT decoder reset 6064*a58d3d2aSXin Lic CELT overlap + Direct mixing 6065*a58d3d2aSXin LiP Packet Loss Concealment & Windowed cross-lap 6066*a58d3d2aSXin Li]]></artwork> 6067*a58d3d2aSXin Li</figure> 6068*a58d3d2aSXin LiEncoders SHOULD NOT use other transitions, e.g., those that involve redundancy 6069*a58d3d2aSXin Li in ways not illustrated in <xref target="normative_transitions"/>. 
6070*a58d3d2aSXin Li</t> 6071*a58d3d2aSXin Li 6072*a58d3d2aSXin Li</section> 6073*a58d3d2aSXin Li 6074*a58d3d2aSXin Li</section> 6075*a58d3d2aSXin Li 6076*a58d3d2aSXin Li</section> 6077*a58d3d2aSXin Li 6078*a58d3d2aSXin Li 6079*a58d3d2aSXin Li<!-- ******************************************************************* --> 6080*a58d3d2aSXin Li<!-- ************************** OPUS ENCODER *********************** --> 6081*a58d3d2aSXin Li<!-- ******************************************************************* --> 6082*a58d3d2aSXin Li 6083*a58d3d2aSXin Li<section title="Opus Encoder"> 6084*a58d3d2aSXin Li<t> 6085*a58d3d2aSXin LiJust like the decoder, the Opus encoder also normally consists of two main blocks: the 6086*a58d3d2aSXin LiSILK encoder and the CELT encoder. However, unlike the case of the decoder, a valid 6087*a58d3d2aSXin Li(though potentially suboptimal) Opus encoder is not required to support all modes and 6088*a58d3d2aSXin Limay thus only include a SILK encoder module or a CELT encoder module. 6089*a58d3d2aSXin LiThe output bit-stream of the Opus encoding contains bits from the SILK and CELT 6090*a58d3d2aSXin Li encoders, though these are not separable due to the use of a range coder. 6091*a58d3d2aSXin LiA block diagram of the encoder is illustrated below. 

<figure align="center" anchor="opus-encoder-figure" title="Opus Encoder">
<artwork>
<![CDATA[
                    +------------+    +---------+
                    |   Sample   |    |  SILK   |------+
                 +->|    Rate    |--->| Encoder |      V
  +-----------+  |  | Conversion |    |         | +---------+
  | Optional  |  |  +------------+    +---------+ |  Range  |
->| High-pass |--+                                | Encoder |---->
  |  Filter   |  |  +--------------+  +---------+ |         | Bit-
  +-----------+  |  |    Delay     |  |  CELT   | +---------+ stream
                 +->| Compensation |->| Encoder |      ^
                    |              |  |         |------+
                    +--------------+  +---------+
]]>
</artwork>
</figure>
</t>

<t>
For a normal encoder where both the SILK and the CELT modules are included, an optimal
encoder should select which coding mode to use at run-time depending on the conditions.
In the reference implementation, the frame size is selected by the application, but the
other configuration parameters (number of channels, bandwidth, mode) are automatically
selected (unless explicitly overridden by the application) depending on the following:
<list style="symbols">
<t>Requested bitrate</t>
<t>Input sampling rate</t>
<t>Type of signal (speech vs music)</t>
<t>Frame size in use</t>
</list>

The type of signal currently needs to be provided by the application (though it can be
changed in real-time).
An Opus encoder implementation could also do automatic detection,
but since Opus is an interactive codec, such an implementation would likely have to either
delay the signal (for non-interactive applications) or delay the mode switching decisions (for
interactive applications).
</t>

<t>
When the encoder is configured for voice over IP applications, the input signal is
filtered by a high-pass filter to remove the lowest part of the spectrum
that contains little speech energy and may contain background noise. This is a second-order
Auto-Regressive Moving Average (ARMA) filter (i.e., one with poles and zeros) with a cut-off
frequency around 50 Hz.
In the future, a music detector may also be used to lower the cut-off frequency when the
input signal is detected to be music rather than speech.
</t>

<section anchor="range-encoder" title="Range Encoder">
<t>
The range coder acts as the bit-packer for Opus.
It is used in three different ways: to encode
<list style="symbols">
<t>
Entropy-coded symbols with a fixed probability model using ec_encode()
 (entenc.c),
</t>
<t>
Integers from 0 to (2**M - 1) using ec_enc_uint() or ec_enc_bits()
 (entenc.c),</t>
<t>
Integers from 0 to (ft - 1) (where ft is not a power of two) using
 ec_enc_uint() (entenc.c).
</t>
</list>
</t>

<t>
The range encoder maintains an internal state vector composed of the four-tuple
 (val, rng, rem, ext) representing the low end of the current
 range, the size of the current range, a single buffered output byte, and a
 count of additional carry-propagating output bytes.
Both val and rng are 32-bit unsigned integer values, rem is a byte value
 less than 255 or the special value -1, and ext is an unsigned integer with at
 least 11 bits.
This state vector is initialized at the start of each frame to the value
 (0, 2**31, -1, 0).
After encoding a sequence of symbols, the value of rng in the encoder should
 exactly match the value of rng in the decoder after decoding the same sequence
 of symbols.
This is a powerful tool for detecting errors in either an encoder or decoder
 implementation.
The value of val, on the other hand, represents different things in the encoder
 and decoder, and is not expected to match.
</t>

<t>
The decoder has no analog for rem and ext.
These are used to perform carry propagation in the renormalization loop below.
Each iteration of this loop produces 9 bits of output, consisting of 8 data
 bits and a carry flag.
The encoder cannot determine the final value of the output bytes until it
 propagates these carry flags.
Therefore the reference implementation buffers a single non-propagating output
 byte (i.e., one with a value less than 255) in rem and keeps a count of
 additional propagating (i.e., 255) output bytes in ext.
An implementation may choose to use any mathematically equivalent scheme to
 perform carry propagation.
</t>

<section anchor="encoding-symbols" title="Encoding Symbols">
<t>
The main encoding function is ec_encode() (entenc.c), which encodes symbol k in
 the current context using the same three-tuple (fl[k], fh[k], ft)
 as the decoder to describe the range of the symbol (see
 <xref target="range-decoder"/>).
</t>
<t>
ec_encode() updates the state of the encoder as follows.
If fl[k] is greater than zero, then
<figure align="center">
<artwork align="center"><![CDATA[
                  rng
val = val + rng - --- * (ft - fl) ,
                  ft

      rng
rng = --- * (fh - fl) .
      ft
]]></artwork>
</figure>
Otherwise, val is unchanged and
<figure align="center">
<artwork align="center"><![CDATA[
            rng
rng = rng - --- * (ft - fh) .
            ft
]]></artwork>
</figure>
The divisions here are integer division.
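The update rule above can be sketched in a few lines of Python.
This is an illustrative sketch only, not the reference implementation: the
 function name ec_encode_update() and the constant MASK32 are hypothetical,
 and renormalization and carry propagation are deliberately left out.
For fl[k] = 0 the sketch shrinks the range by (rng/ft)*(ft - fh), so that the
 remainder of the integer division is assigned to the first symbol.
<figure align="left">
<artwork><![CDATA[
```python
MASK32 = 0xFFFFFFFF  # hypothetical helper: val and rng are 32-bit unsigned


def ec_encode_update(val, rng, fl, fh, ft):
    """Return the new (val, rng) for a symbol with range (fl, fh, ft).

    Illustrative only: renormalization and carry output are omitted.
    """
    if not 0 <= fl < fh <= ft:
        raise ValueError("need 0 <= fl < fh <= ft")
    r = rng // ft  # integer division, as in the text
    if fl > 0:
        val = (val + rng - r * (ft - fl)) & MASK32
        rng = r * (fh - fl)
    else:
        # The first symbol absorbs the remainder of the division.
        rng = rng - r * (ft - fh)
    return val, rng


if __name__ == "__main__":
    # Three equiprobable symbols (ft = 3), each coded from the initial state.
    val0, rng0 = 0, 2**31
    for k in range(3):
        print(k, ec_encode_update(val0, rng0, k, k + 1, 3))
```
]]></artwork>
</figure>
In this sketch, the ranges of all symbols in a context tile the interval
 [val, val + rng) exactly, with no gaps and no overlap.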
</t>

<section anchor="range-encoder-renorm" title="Renormalization">
<t>
After this update, the range is normalized using a procedure very similar to
 that of <xref target="range-decoder-renorm"/>, implemented by
 ec_enc_normalize() (entenc.c).
The following process is repeated until rng &gt; 2**23.
First, the top 9 bits of val, (val&gt;&gt;23), are sent to the carry buffer,
 described in <xref target="ec_enc_carry_out"/>.
Then, the encoder sets
<figure align="center">
<artwork align="center"><![CDATA[
val = (val<<8) & 0x7FFFFFFF ,

rng = rng<<8 .
]]></artwork>
</figure>
</t>
</section>

<section anchor="ec_enc_carry_out"
 title="Carry Propagation and Output Buffering">
<t>
The function ec_enc_carry_out() (entenc.c) implements carry propagation and
 output buffering.
It takes as input a 9-bit value, c, consisting of 8 data bits and an additional
 carry bit.
If c is equal to the value 255, then ext is simply incremented, and no other
 state updates are performed.
Otherwise, let b = (c&gt;&gt;8) be the carry bit.
Then,
<list style="symbols">
<t>
If the buffered byte rem contains a value other than -1, the encoder outputs
 the byte (rem + b).
Otherwise, if rem is -1, no byte is output.
</t>
<t>
If ext is non-zero, then the encoder outputs ext bytes, all with a value of 0
 if b is set or 255 if b is unset, and sets ext to 0.
</t>
<t>
rem is set to the 8 data bits:
<figure align="center">
<artwork align="center"><![CDATA[
rem = c & 255 .
]]></artwork>
</figure>
</t>
</list>
</t>
</section>

</section>

<section anchor="encoding-alternate" title="Alternate Encoding Methods">
<t>
The reference implementation uses three additional encoding methods that are
 exactly equivalent to the above, but make assumptions and simplifications that
 allow for a more efficient implementation.
</t>

<section anchor="ec_encode_bin" title="ec_encode_bin()">
<t>
The first is ec_encode_bin() (entenc.c), defined using the parameter ftb
 instead of ft.
It is mathematically equivalent to calling ec_encode() with
 ft = (1&lt;&lt;ftb), but avoids using division.
</t>
</section>

<section anchor="ec_enc_bit_logp" title="ec_enc_bit_logp()">
<t>
The next is ec_enc_bit_logp() (entenc.c), which encodes a single binary symbol.
The context is described by a single parameter, logp, which is the absolute
 value of the base-2 logarithm of the probability of a "1".
It is mathematically equivalent to calling ec_encode() with the three-tuple
 (fl[k] = 0, fh[k] = (1&lt;&lt;logp) - 1,
 ft = (1&lt;&lt;logp)) if k is 0 and with
 (fl[k] = (1&lt;&lt;logp) - 1,
 fh[k] = ft = (1&lt;&lt;logp)) if k is 1.
The implementation requires no multiplications or divisions.
</t>
</section>

<section anchor="ec_enc_icdf" title="ec_enc_icdf()">
<t>
The last is ec_enc_icdf() (entenc.c), which encodes a single symbol with
 a table-based context of up to 8 bits.
This uses the same icdf table as ec_dec_icdf() from
 <xref target="ec_dec_icdf"/>.
The function is mathematically equivalent to calling ec_encode() with
 fl[k] = (1&lt;&lt;ftb) - icdf[k-1] (or 0 if
 k == 0), fh[k] = (1&lt;&lt;ftb) - icdf[k], and
 ft = (1&lt;&lt;ftb).
This only saves a few arithmetic operations over ec_encode_bin(), but allows
 the encoder to use the same icdf tables as the decoder.
</t>
</section>

</section>

<section anchor="encoding-bits" title="Encoding Raw Bits">
<t>
The raw bits used by the CELT layer are packed at the end of the buffer using
 ec_enc_bits() (entenc.c).
Because the raw bits may continue into the last byte output by the range coder
 if there is room in the low-order bits, the encoder must be prepared to merge
 these values into a single byte.
The procedure in <xref target="encoder-finalizing"/> does this in a way that
 ensures both the range coded data and the raw bits can be decoded
 successfully.
</t>
</section>

<section anchor="encoding-ints" title="Encoding Uniformly Distributed Integers">
<t>
The function ec_enc_uint() (entenc.c) encodes one of ft equiprobable symbols in
 the range 0 to (ft - 1), inclusive, each with a frequency of 1,
 where ft may be as large as (2**32 - 1).
Like the decoder (see <xref target="ec_dec_uint"/>), it splits up the
 value into a range coded symbol representing up to 8 of the high bits, and, if
 necessary, raw bits representing the remainder of the value.
</t>
<t>
ec_enc_uint() takes a two-tuple (t, ft), where t is the value to be
 encoded, 0 &lt;= t &lt; ft, and ft is not necessarily a
 power of two.
Let ftb = ilog(ft - 1), i.e., the number of bits required
 to store (ft - 1) in two's complement notation.
If ftb is 8 or less, then t is encoded directly using ec_encode() with the
 three-tuple (t, t + 1, ft).
</t>
<t>
If ftb is greater than 8, then the top 8 bits of t are encoded using the
 three-tuple (t&gt;&gt;(ftb - 8),
 (t&gt;&gt;(ftb - 8)) + 1,
 ((ft - 1)&gt;&gt;(ftb - 8)) + 1), and the
 remaining bits,
 (t &amp; ((1&lt;&lt;(ftb - 8)) - 1)),
 are encoded as raw bits with ec_enc_bits().
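The split performed by ec_enc_uint() can be sketched as follows.
This is a minimal Python sketch under stated assumptions, not the reference
 code: split_uint() and ilog() are hypothetical helper names, ilog() is taken
 to be the bit length of a non-negative integer, and the actual calls to
 ec_encode() and ec_enc_bits() are replaced by returning their arguments.
<figure align="left">
<artwork><![CDATA[
```python
def ilog(x):
    """Bits needed to store the non-negative integer x (0 for x == 0)."""
    return x.bit_length()


def split_uint(t, ft):
    """Split t (0 <= t < ft) into a range-coded tuple and raw bits.

    Returns ((fl, fh, ft_sym), raw_value, raw_bit_count).  The first item
    is what would be passed to ec_encode(); the last two are what would
    be passed to ec_enc_bits().  Hypothetical helper for illustration.
    """
    if not 0 <= t < ft <= 2**32 - 1:
        raise ValueError("need 0 <= t < ft <= 2**32 - 1")
    ftb = ilog(ft - 1)
    if ftb <= 8:
        # Small alphabets are range coded directly, with no raw bits.
        return (t, t + 1, ft), 0, 0
    shift = ftb - 8
    hi = t >> shift  # top 8 bits of t
    sym = (hi, hi + 1, ((ft - 1) >> shift) + 1)
    return sym, t & ((1 << shift) - 1), shift


if __name__ == "__main__":
    sym, raw, nbits = split_uint(0x12345, 0x20000)
    print(sym, raw, nbits)
    # A decoder can rebuild t from the two parts.
    assert (sym[0] << nbits) | raw == 0x12345
```
]]></artwork>
</figure>
Since ((ft - 1)&gt;&gt;(ftb - 8)) is always less than 256, the total
 frequency of the range coded part never exceeds 256.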
6365*a58d3d2aSXin Li</t> 6366*a58d3d2aSXin Li</section> 6367*a58d3d2aSXin Li 6368*a58d3d2aSXin Li<section anchor="encoder-finalizing" title="Finalizing the Stream"> 6369*a58d3d2aSXin Li<t> 6370*a58d3d2aSXin LiAfter all symbols are encoded, the stream must be finalized by outputting a 6371*a58d3d2aSXin Li value inside the current range. 6372*a58d3d2aSXin LiLet end be the integer in the interval [val, val + rng) with the 6373*a58d3d2aSXin Li largest number of trailing zero bits, b, such that 6374*a58d3d2aSXin Li (end + (1<<b) - 1) is also in the interval 6375*a58d3d2aSXin Li [val, val + rng). 6376*a58d3d2aSXin LiThis choice of end allows the maximum number of trailing bits to be set to 6377*a58d3d2aSXin Li arbitrary values while still ensuring the range coded part of the buffer can 6378*a58d3d2aSXin Li be decoded correctly. 6379*a58d3d2aSXin LiThen, while end is not zero, the top 9 bits of end, i.e., (end>>23), are 6380*a58d3d2aSXin Li passed to the carry buffer in accordance with the procedure in 6381*a58d3d2aSXin Li <xref target="ec_enc_carry_out"/>, and end is updated via 6382*a58d3d2aSXin Li<figure align="center"> 6383*a58d3d2aSXin Li<artwork align="center"><![CDATA[ 6384*a58d3d2aSXin Liend = (end<<8) & 0x7FFFFFFF . 6385*a58d3d2aSXin Li]]></artwork> 6386*a58d3d2aSXin Li</figure> 6387*a58d3d2aSXin LiFinally, if the buffered output byte, rem, is neither zero nor the special 6388*a58d3d2aSXin Li value -1, or the carry count, ext, is greater than zero, then 9 zero bits are 6389*a58d3d2aSXin Li sent to the carry buffer to flush it to the output buffer. 6390*a58d3d2aSXin LiWhen outputting the final byte from the range coder, if it would overlap any 6391*a58d3d2aSXin Li raw bits already packed into the end of the output buffer, they should be ORed 6392*a58d3d2aSXin Li into the same byte. 
6393*a58d3d2aSXin LiThe bit allocation routines in the CELT layer should ensure that this can be 6394*a58d3d2aSXin Li done without corrupting the range coder data so long as end is chosen as 6395*a58d3d2aSXin Li described above. 6396*a58d3d2aSXin LiIf there is any space between the end of the range coder data and the end of 6397*a58d3d2aSXin Li the raw bits, it is padded with zero bits. 6398*a58d3d2aSXin LiThis entire process is implemented by ec_enc_done() (entenc.c). 6399*a58d3d2aSXin Li</t> 6400*a58d3d2aSXin Li</section> 6401*a58d3d2aSXin Li 6402*a58d3d2aSXin Li<section anchor="encoder-tell" title="Current Bit Usage"> 6403*a58d3d2aSXin Li<t> 6404*a58d3d2aSXin Li The bit allocation routines in Opus need to be able to determine a 6405*a58d3d2aSXin Li conservative upper bound on the number of bits that have been used 6406*a58d3d2aSXin Li to encode the current frame thus far. This drives allocation 6407*a58d3d2aSXin Li decisions and ensures that the range coder and raw bits will not 6408*a58d3d2aSXin Li overflow the output buffer. This is computed in the 6409*a58d3d2aSXin Li reference implementation to whole-bit precision by 6410*a58d3d2aSXin Li the function ec_tell() (entcode.h) and to fractional 1/8th bit 6411*a58d3d2aSXin Li precision by the function ec_tell_frac() (entcode.c). 6412*a58d3d2aSXin Li Like all operations in the range coder, it must be implemented in a 6413*a58d3d2aSXin Li bit-exact manner, and must produce exactly the same value returned by 6414*a58d3d2aSXin Li the same functions in the decoder after decoding the same symbols. 6415*a58d3d2aSXin Li</t> 6416*a58d3d2aSXin Li</section> 6417*a58d3d2aSXin Li 6418*a58d3d2aSXin Li</section> 6419*a58d3d2aSXin Li 6420*a58d3d2aSXin Li<section title='SILK Encoder'> 6421*a58d3d2aSXin Li <t> 6422*a58d3d2aSXin Li In many respects the SILK encoder mirrors the SILK decoder described 6423*a58d3d2aSXin Li in <xref target='silk_decoder_outline'/>. 
6424*a58d3d2aSXin Li Details such as the quantization and range coder tables can be found 6425*a58d3d2aSXin Li there, while this section describes the high-level design choices that 6426*a58d3d2aSXin Li were made. 6427*a58d3d2aSXin Li The diagram below shows the basic modules of the SILK encoder. 6428*a58d3d2aSXin Li<figure align="center" anchor="silk_encoder_figure" title="SILK Encoder"> 6429*a58d3d2aSXin Li<artwork> 6430*a58d3d2aSXin Li<![CDATA[ 6431*a58d3d2aSXin Li +----------+ +--------+ +---------+ 6432*a58d3d2aSXin Li | Sample | | Stereo | | SILK | 6433*a58d3d2aSXin Li------>| Rate |--->| Mixing |--->| Core |----------> 6434*a58d3d2aSXin LiInput |Conversion| | | | Encoder | Bitstream 6435*a58d3d2aSXin Li +----------+ +--------+ +---------+ 6436*a58d3d2aSXin Li]]> 6437*a58d3d2aSXin Li</artwork> 6438*a58d3d2aSXin Li</figure> 6439*a58d3d2aSXin Li</t> 6440*a58d3d2aSXin Li 6441*a58d3d2aSXin Li<section title='Sample Rate Conversion'> 6442*a58d3d2aSXin Li<t> 6443*a58d3d2aSXin LiThe input signal's sampling rate is adjusted by a sample rate conversion 6444*a58d3d2aSXin Limodule so that it matches the SILK internal sampling rate. 6445*a58d3d2aSXin LiThe input to the sample rate converter is delayed by a number of samples 6446*a58d3d2aSXin Lidepending on the sample rate ratio, such that the overall delay is constant 6447*a58d3d2aSXin Lifor all input and output sample rates. 6448*a58d3d2aSXin Li</t> 6449*a58d3d2aSXin Li</section> 6450*a58d3d2aSXin Li 6451*a58d3d2aSXin Li<section title='Stereo Mixing'> 6452*a58d3d2aSXin Li<t> 6453*a58d3d2aSXin LiThe stereo mixer is only used for stereo input signals. 6454*a58d3d2aSXin LiIt converts a stereo left/right signal into an adaptive 6455*a58d3d2aSXin Limid/side representation. 6456*a58d3d2aSXin LiThe first step is to compute non-adaptive mid/side signals 6457*a58d3d2aSXin Lias half the sum and difference between left and right signals. 
The side signal is then minimized in energy by subtracting a
prediction of it based on the mid signal.
This prediction works well when the left and right signals
exhibit linear dependency, for instance for an amplitude-panned
input signal.
As in the decoder, the prediction coefficients are linearly
interpolated during the first 8 ms of the frame.
 The mid signal is always encoded, whereas the residual
 side signal is only encoded if it has sufficient
 energy compared to the mid signal's energy.
 If it does not,
 the "mid_only_flag" is set without encoding the side signal.
</t>
<t>
The predictor coefficients are coded regardless of whether
the side signal is encoded.
For each frame, two predictor coefficients are computed, one
that predicts between low-passed mid and side channels, and
one that predicts between high-passed mid and side channels.
The low-pass filter is a simple three-tap filter
and creates a delay of one sample.
The high-pass filtered signal is the difference between
the mid signal delayed by one sample and the low-passed
signal.
Instead of explicitly computing the high-passed
signal, it is computationally more efficient to transform
the prediction coefficients before applying them to the
filtered mid signal, as follows
<figure align="center">
<artwork align="center">
<![CDATA[
pred(n) = LP(n) * w0 + HP(n) * w1
        = LP(n) * w0 + (mid(n-1) - LP(n)) * w1
        = LP(n) * (w0 - w1) + mid(n-1) * w1
]]>
</artwork>
</figure>
where w0 and w1 are the low-pass and high-pass prediction
coefficients, mid(n-1) is the mid signal delayed by one sample,
LP(n) and HP(n) are the low-passed and high-passed
signals and pred(n) is the prediction signal that is subtracted
from the side signal.
</t>
</section>

<section title='SILK Core Encoder'>
<t>
What follows is a description of the core encoder and its components.
For simplicity, the core encoder is referred to simply as the encoder in
the remainder of this section. An overview of the encoder is given in
<xref target="encoder_figure" />.
6508*a58d3d2aSXin Li</t> 6509*a58d3d2aSXin Li<figure align="center" anchor="encoder_figure" title="SILK Core Encoder"> 6510*a58d3d2aSXin Li<artwork align="center"> 6511*a58d3d2aSXin Li<![CDATA[ 6512*a58d3d2aSXin Li +---+ 6513*a58d3d2aSXin Li +--------------------------------->| | 6514*a58d3d2aSXin Li +---------+ | +---------+ | | 6515*a58d3d2aSXin Li |Voice | | |LTP |12 | | 6516*a58d3d2aSXin Li +-->|Activity |--+ +----->|Scaling |-----------+---->| | 6517*a58d3d2aSXin Li | |Detector |3 | | |Control |<--+ | | | 6518*a58d3d2aSXin Li | +---------+ | | +---------+ | | | | 6519*a58d3d2aSXin Li | | | +---------+ | | | | 6520*a58d3d2aSXin Li | | | |Gains | | | | | 6521*a58d3d2aSXin Li | | | +-->|Processor|---|---+---|---->| R | 6522*a58d3d2aSXin Li | | | | | |11 | | | | a | 6523*a58d3d2aSXin Li | \/ | | +---------+ | | | | n | 6524*a58d3d2aSXin Li | +---------+ | | +---------+ | | | | g | 6525*a58d3d2aSXin Li | |Pitch | | | |LSF | | | | | e | 6526*a58d3d2aSXin Li | +->|Analysis |---+ | |Quantizer|---|---|---|---->| | 6527*a58d3d2aSXin Li | | | |4 | | | |8 | | | | E |--> 6528*a58d3d2aSXin Li | | +---------+ | | +---------+ | | | | n | 2 6529*a58d3d2aSXin Li | | | | 9/\ 10| | | | | c | 6530*a58d3d2aSXin Li | | | | | \/ | | | | o | 6531*a58d3d2aSXin Li | | +---------+ | | +----------+ | | | | d | 6532*a58d3d2aSXin Li | | |Noise | +--|-->|Prediction|--+---|---|---->| e | 6533*a58d3d2aSXin Li | +->|Shaping |---|--+ |Analysis |7 | | | | r | 6534*a58d3d2aSXin Li | | |Analysis |5 | | | | | | | | | 6535*a58d3d2aSXin Li | | +---------+ | | +----------+ | | | | | 6536*a58d3d2aSXin Li | | | | /\ | | | | | 6537*a58d3d2aSXin Li | | +----------|--|--------+ | | | | | 6538*a58d3d2aSXin Li | | | \/ \/ \/ \/ \/ | | 6539*a58d3d2aSXin Li | | | +---------+ +------------+ | | 6540*a58d3d2aSXin Li | | | | | |Noise | | | 6541*a58d3d2aSXin Li-+-------+-----+------>|Prefilter|--------->|Shaping |-->| | 6542*a58d3d2aSXin Li1 | | 6 |Quantization|13 | | 6543*a58d3d2aSXin Li +---------+ +------------+ 
+---+ 6544*a58d3d2aSXin Li 6545*a58d3d2aSXin Li1: Input speech signal 6546*a58d3d2aSXin Li2: Range encoded bitstream 6547*a58d3d2aSXin Li3: Voice activity estimate 6548*a58d3d2aSXin Li4: Pitch lags (per 5 ms) and voicing decision (per 20 ms) 6549*a58d3d2aSXin Li5: Noise shaping quantization coefficients 6550*a58d3d2aSXin Li - Short term synthesis and analysis 6551*a58d3d2aSXin Li noise shaping coefficients (per 5 ms) 6552*a58d3d2aSXin Li - Long term synthesis and analysis noise 6553*a58d3d2aSXin Li shaping coefficients (per 5 ms and for voiced speech only) 6554*a58d3d2aSXin Li - Noise shaping tilt (per 5 ms) 6555*a58d3d2aSXin Li - Quantizer gain/step size (per 5 ms) 6556*a58d3d2aSXin Li6: Input signal filtered with analysis noise shaping filters 6557*a58d3d2aSXin Li7: Short and long term prediction coefficients 6558*a58d3d2aSXin Li LTP (per 5 ms) and LPC (per 20 ms) 6559*a58d3d2aSXin Li8: LSF quantization indices 6560*a58d3d2aSXin Li9: LSF coefficients 6561*a58d3d2aSXin Li10: Quantized LSF coefficients 6562*a58d3d2aSXin Li11: Processed gains, and synthesis noise shape coefficients 6563*a58d3d2aSXin Li12: LTP state scaling coefficient. Controlling error propagation 6564*a58d3d2aSXin Li / prediction gain trade-off 6565*a58d3d2aSXin Li13: Quantized signal 6566*a58d3d2aSXin Li]]> 6567*a58d3d2aSXin Li</artwork> 6568*a58d3d2aSXin Li</figure> 6569*a58d3d2aSXin Li 6570*a58d3d2aSXin Li<section title='Voice Activity Detection'> 6571*a58d3d2aSXin Li<t> 6572*a58d3d2aSXin LiThe input signal is processed by a Voice Activity Detector (VAD) to produce 6573*a58d3d2aSXin Lia measure of voice activity, spectral tilt, and signal-to-noise estimates for 6574*a58d3d2aSXin Lieach frame. The VAD uses a sequence of half-band filterbanks to split the 6575*a58d3d2aSXin Lisignal into four subbands: 0...Fs/16, Fs/16...Fs/8, Fs/8...Fs/4, and 6576*a58d3d2aSXin LiFs/4...Fs/2, where Fs is the sampling frequency (8, 12, 16, or 24 kHz). 
6577*a58d3d2aSXin LiThe lowest subband, from 0 - Fs/16, is high-pass filtered with a first-order 6578*a58d3d2aSXin Limoving average (MA) filter (with transfer function H(z) = 1-z**(-1)) to 6579*a58d3d2aSXin Lireduce the energy at the lowest frequencies. For each frame, the signal 6580*a58d3d2aSXin Lienergy per subband is computed. 6581*a58d3d2aSXin LiIn each subband, a noise level estimator tracks the background noise level 6582*a58d3d2aSXin Liand a Signal-to-Noise Ratio (SNR) value is computed as the logarithm of the 6583*a58d3d2aSXin Liratio of energy to noise level. 6584*a58d3d2aSXin LiUsing these intermediate variables, the following parameters are calculated 6585*a58d3d2aSXin Lifor use in other SILK modules: 6586*a58d3d2aSXin Li<list style="symbols"> 6587*a58d3d2aSXin Li<t> 6588*a58d3d2aSXin LiAverage SNR. The average of the subband SNR values. 6589*a58d3d2aSXin Li</t> 6590*a58d3d2aSXin Li 6591*a58d3d2aSXin Li<t> 6592*a58d3d2aSXin LiSmoothed subband SNRs. Temporally smoothed subband SNR values. 6593*a58d3d2aSXin Li</t> 6594*a58d3d2aSXin Li 6595*a58d3d2aSXin Li<t> 6596*a58d3d2aSXin LiSpeech activity level. Based on the average SNR and a weighted average of the 6597*a58d3d2aSXin Lisubband energies. 6598*a58d3d2aSXin Li</t> 6599*a58d3d2aSXin Li 6600*a58d3d2aSXin Li<t> 6601*a58d3d2aSXin LiSpectral tilt. A weighted average of the subband SNRs, with positive weights 6602*a58d3d2aSXin Lifor the low subbands and negative weights for the high subbands. 6603*a58d3d2aSXin Li</t> 6604*a58d3d2aSXin Li</list> 6605*a58d3d2aSXin Li</t> 6606*a58d3d2aSXin Li</section> 6607*a58d3d2aSXin Li 6608*a58d3d2aSXin Li<section title='Pitch Analysis' anchor='pitch_estimator_overview_section'> 6609*a58d3d2aSXin Li<t> 6610*a58d3d2aSXin LiThe input signal is processed by the open loop pitch estimator shown in 6611*a58d3d2aSXin Li<xref target='pitch_estimator_figure' />. 
<figure align="center" anchor="pitch_estimator_figure"
 title="Block diagram of the pitch estimator">
<artwork align="center">
<![CDATA[
                                 +--------+  +----------+
                                 |2 x Down|  |Time-     |
                              +->|sampling|->|Correlator|     |
                              |  |        |  |          |     |4
                              |  +--------+  +----------+    \/
                              |                    | 2    +-------+
                              |                    |  +-->|Speech |5
    +---------+    +--------+ |                   \/  |   |Type   |->
    |LPC      |    |Down    | |              +----------+ |       |
 +->|Analysis | +->|sample  |-+------------->|Time-     | +-------+
 |  |         | |  |to 8 kHz|                |Correlator|----------->
 |  +---------+ |  +--------+                |__________|          6
 |       |      |                                  |3
 |      \/      |                                 \/
 |  +---------+ |                            +----------+
 |  |Whitening| |                            |Time-     |
-+->|Filter   |-+--------------------------->|Correlator|----------->
1   |         |                              |          |          7
    +---------+                              +----------+

1: Input signal
2: Lag candidates from stage 1
3: Lag candidates from stage 2
4: Correlation threshold
5: Voiced/unvoiced flag
6: Pitch correlation
7: Pitch lags
]]>
</artwork>
</figure>
The pitch analysis finds a binary voiced/unvoiced classification, and, for
frames classified as voiced, four pitch lags per frame - one for each
5 ms subframe - and a pitch correlation indicating the periodicity of
the signal.
The input is first whitened using a Linear Prediction (LP) whitening filter,
where the coefficients are computed through standard Linear Prediction Coding
(LPC) analysis. The order of the whitening filter is 16 for best results, but
is reduced to 12 for medium complexity and 8 for low complexity modes.
The whitened signal is analyzed to find pitch lags for which the time
correlation is high.
The analysis consists of three stages for reducing the complexity:
<list style="symbols">
<t>In the first stage, the whitened signal is downsampled to 4 kHz
(from 8 kHz) and the current frame is correlated to a signal delayed
by a range of lags, starting from a shortest lag corresponding to
500 Hz, to a longest lag corresponding to 56 Hz.</t>

<t>
The second stage operates on an 8 kHz signal (downsampled from 12, 16,
or 24 kHz) and measures time correlations only near the lags
corresponding to those that had sufficiently high correlations in the first
stage. The resulting correlations are adjusted for a small bias towards
short lags to avoid ending up with a multiple of the true pitch lag.
The highest adjusted correlation is compared to a threshold depending on:
<list style="symbols">
<t>
Whether the previous frame was classified as voiced
</t>
<t>
The speech activity level
</t>
<t>
The spectral tilt.
</t>
</list>
If the threshold is exceeded, the current frame is classified as voiced and
the lag with the highest adjusted correlation is stored for a final pitch
analysis of the highest precision in the third stage.
</t>
<t>
The last stage operates directly on the whitened input signal to compute time
correlations for each of the four subframes independently in a narrow range
around the lag with highest correlation from the second stage.
</t>
</list>
</t>
</section>

<section title='Noise Shaping Analysis' anchor='noise_shaping_analysis_overview_section'>
<t>
The noise shaping analysis finds gains and filter coefficients used in the
prefilter and noise shaping quantizer. These parameters are chosen such that
they will fulfill several requirements:
<list style="symbols">
<t>
Balancing quantization noise and bitrate.
The quantization gains determine the step size between reconstruction levels
of the excitation signal. Therefore, increasing the quantization gain
amplifies quantization noise, but also reduces the bitrate by lowering
the entropy of the quantization indices.
</t>
<t>
Spectral shaping of the quantization noise; the noise shaping quantizer is
capable of reducing quantization noise in some parts of the spectrum at the
cost of increased noise in other parts without substantially changing the
bitrate.
By shaping the noise such that it follows the signal spectrum, it becomes
less audible. In practice, best results are obtained by making the shape
of the noise spectrum slightly flatter than the signal spectrum.
</t>
<t>
De-emphasizing spectral valleys; by using different coefficients in the
analysis and synthesis part of the prefilter and noise shaping quantizer,
the levels of the spectral valleys can be decreased relative to the levels
of the spectral peaks such as speech formants and harmonics.
This reduces the entropy of the signal, which is the difference between the
coded signal and the quantization noise, thus lowering the bitrate.
</t>
<t>
Matching the levels of the decoded speech formants to the levels of the
original speech formants; an adjustment gain and a first order tilt
coefficient are computed to compensate for the effect of the noise
shaping quantization on the level and spectral tilt.
</t>
</list>
</t>
<t>
<figure align="center" anchor="noise_shape_analysis_spectra_figure"
 title="Noise shaping and spectral de-emphasis illustration">
<artwork align="center">
<![CDATA[
     / \   ___
      |   // \\
      |  //   \\     ____
      |_//     \\___//  \\         ____
      | /  ___  \   /    \\       //  \\
    P |/  /   \  \_/      \\_____//    \\
    o |  /     \     ____  \     /      \\
    w | /       \___/    \  \___/  ____  \\___ 1
    e |/                  \       /    \  \
    r |                    \_____/      \  \__ 2
      |                                  \
      |                                   \___ 3
      |
      +---------------------------------------->
                      Frequency

1: Input signal spectrum
2: De-emphasized and level matched spectrum
3: Quantization noise spectrum
]]>
</artwork>
</figure>
<xref target='noise_shape_analysis_spectra_figure' /> shows an example of an
input signal spectrum (1).
After de-emphasis and level matching, the spectrum has deeper valleys (2).
The quantization noise spectrum (3) more or less follows the input signal
spectrum, while having slightly less pronounced peaks.
The entropy, which provides a lower bound on the bitrate for encoding the
excitation signal, is proportional to the area between the de-emphasized
spectrum (2) and the quantization noise spectrum (3).
Without de-emphasis,
the entropy is proportional to the area between input spectrum (1) and
quantization noise (3) - clearly higher.
</t>

<t>
The transformation from input signal to de-emphasized signal can be
described as a filtering operation with a filter
<figure align="center">
<artwork align="center">
<![CDATA[
                           -1    Wana(z)
H(z) = G * ( 1 - c_tilt * z  ) * -------
                                 Wsyn(z),
]]>
</artwork>
</figure>
having an adjustment gain G, a first order tilt adjustment filter with
tilt coefficient c_tilt, and where
<figure align="center">
<artwork align="center">
<![CDATA[
               16                              d
               __             -k          -L   __             -k
Wana(z) = (1 - \  a_ana(k) * z  ) * (1 - z   * \  b_ana(k) * z  ),
               /_                              /_
               k=1                             k=-d
]]>
</artwork>
</figure>
is the analysis part of the de-emphasis filter, consisting of the short-term
shaping filter with coefficients a_ana(k), and the long-term shaping filter
with coefficients b_ana(k) and pitch lag L.
The parameter d determines the number of long-term shaping filter taps.
</t>

<t>
Similarly, but without the tilt adjustment, the synthesis part can be written as
<figure align="center">
<artwork align="center">
<![CDATA[
               16                              d
               __             -k          -L   __             -k
Wsyn(z) = (1 - \  a_syn(k) * z  ) * (1 - z   * \  b_syn(k) * z  ).
               /_                              /_
               k=1                             k=-d
]]>
</artwork>
</figure>
</t>
<t>
All noise shaping parameters are computed and applied per subframe of 5 ms.
First, an LPC analysis is performed on a windowed signal block of 15 ms.
The signal block has a look-ahead of 5 ms relative to the current subframe,
and the window is an asymmetric sine window. The LPC analysis is done with the
autocorrelation method, with an order between 8, in lowest-complexity mode,
and 16, for best quality.
</t>
<t>
Optionally the LPC analysis and noise shaping filters are warped by replacing
the delay elements by first-order allpass filters.
This increases the frequency resolution at low frequencies and reduces it at
high ones, which better matches the human auditory system and improves
quality.
The warped analysis and filtering comes at a cost in complexity
and is therefore only done in higher complexity modes.
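The autocorrelation-method LPC analysis mentioned earlier in this section can
be illustrated with a textbook Levinson-Durbin recursion. The following Python
sketch is not taken from the reference implementation: the function names
autocorr and levinson_durbin are hypothetical, no window or warping is applied,
and the sign convention (prediction of x[n] as the sum of a(k) * x[n-k]) is
chosen here to match the filters of the form 1 - sum a(k) * z**(-k) used above.
<figure>
<artwork>
<![CDATA[
```python
def autocorr(x, order):
    """Autocorrelation values r[0..order] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a(1..order).

    Returns (a, refl, err): a[k-1] is a(k) in the prediction
    x[n] ~ sum_k a(k) * x[n-k], refl holds the reflection coefficients,
    and err is the residual energy after the final order.
    """
    a = []
    refl = []
    err = r[0]
    for m in range(1, order + 1):
        # Correlation not yet explained by the order m-1 predictor.
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        k_m = acc / err
        # Update a(1..m-1), then append a(m) = k_m.
        a = [a[k] - k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        refl.append(k_m)
        err *= 1.0 - k_m * k_m
    return a, refl, err
```
]]>
</artwork>
</figure>
For example, the autocorrelation sequence [1, 0.5, 0.25] of a first-order
process yields a(1) = 0.5 and a(2) = 0, with the residual energy reduced
from 1 to 0.75.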
</t>
<t>
The quantization gain is found by taking the square root of the residual energy
from the LPC analysis and multiplying it by a value inversely proportional
to the coding quality control parameter and the pitch correlation.
</t>
<t>
Next the two sets of short-term noise shaping coefficients a_ana(k) and
a_syn(k) are obtained by applying different amounts of bandwidth expansion to the
coefficients found in the LPC analysis.
This bandwidth expansion moves the roots of the LPC polynomial towards the
origin, using the formulas
<figure align="center">
<artwork align="center">
<![CDATA[
                      k
 a_ana(k) = a(k)*g_ana , and

                      k
 a_syn(k) = a(k)*g_syn ,
]]>
</artwork>
</figure>
where a(k) is the k'th LPC coefficient, and the bandwidth expansion factors
g_ana and g_syn are calculated as
<figure align="center">
<artwork align="center">
<![CDATA[
g_ana = 0.95 - 0.01*C, and

g_syn = 0.95 + 0.01*C,
]]>
</artwork>
</figure>
where C is the coding quality control parameter between 0 and 1.
Applying more bandwidth expansion to the analysis part than to the synthesis
part gives the desired de-emphasis of spectral valleys in between formants.
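As a minimal sketch of the bandwidth expansion formulas above, the following
Python fragment derives both coefficient sets from one LPC vector. The
function names bandwidth_expand and shaping_coefficients are hypothetical and
chosen only for this illustration; the reference implementation works in
fixed point and is organized differently.
<figure>
<artwork>
<![CDATA[
```python
def bandwidth_expand(a, g):
    """Scale the k'th LPC coefficient (k = 1..len(a)) by g**k."""
    return [coef * g ** k for k, coef in enumerate(a, start=1)]


def shaping_coefficients(a, c):
    """Return (a_ana, a_syn) for coding quality parameter c in [0, 1]."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    g_ana = 0.95 - 0.01 * c   # stronger expansion for the analysis part
    g_syn = 0.95 + 0.01 * c   # weaker expansion for the synthesis part
    return bandwidth_expand(a, g_ana), bandwidth_expand(a, g_syn)
```
]]>
</artwork>
</figure>
Since scaling a(k) by g**k replaces A(z) with A(z/g), every root of the LPC
polynomial is multiplied by g; with C = 1 the analysis roots shrink by 0.94
and the synthesis roots by 0.96.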
</t>

<t>
The long-term shaping is applied only during voiced frames.
It uses three filter taps, described by
<figure align="center">
<artwork align="center">
<![CDATA[
b_ana = F_ana * [0.25, 0.5, 0.25], and

b_syn = F_syn * [0.25, 0.5, 0.25].
]]>
</artwork>
</figure>
For unvoiced frames these coefficients are set to 0. The multiplication factors
F_ana and F_syn are chosen between 0 and 1, depending on the coding quality
control parameter, as well as the calculated pitch correlation and smoothed
subband SNR of the lowest subband. By having F_ana less than F_syn,
the pitch harmonics are emphasized relative to the valleys in between the
harmonics.
</t>

<t>
The tilt coefficient c_tilt is for unvoiced frames chosen as
<figure align="center">
<artwork align="center">
<![CDATA[
c_tilt = 0.25,
]]>
</artwork>
</figure>
and as
<figure align="center">
<artwork align="center">
<![CDATA[
c_tilt = 0.25 + 0.2625 * V
]]>
</artwork>
</figure>
for voiced frames, where V is the voice activity level between 0 and 1.
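The long-term taps and the tilt coefficient above reduce to a few lines of
arithmetic. The Python sketch below uses the hypothetical names long_term_taps
and tilt_coefficient; how F_ana and F_syn themselves are derived is not
covered here, so the factor is simply passed in.
<figure>
<artwork>
<![CDATA[
```python
def long_term_taps(f, voiced):
    """Three long-term shaping taps F * [0.25, 0.5, 0.25], zeros if unvoiced."""
    if not voiced:
        return [0.0, 0.0, 0.0]
    return [f * w for w in (0.25, 0.5, 0.25)]


def tilt_coefficient(voiced, v=0.0):
    """c_tilt: 0.25 for unvoiced frames, 0.25 + 0.2625 * V for voiced ones."""
    return 0.25 + 0.2625 * v if voiced else 0.25
```
]]>
</artwork>
</figure>
With V ranging from 0 to 1, c_tilt for voiced frames therefore lies between
0.25 and 0.5125.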
</t>
<t>
The adjustment gain G serves to correct any level mismatch between the original
and decoded signals that might arise from the noise shaping and de-emphasis.
This gain is computed as the ratio of the prediction gain of the short-term
analysis and synthesis filter coefficients. The prediction gain of an LPC
synthesis filter is the square root of the output energy when the filter is
excited by a unit-energy impulse on the input.
An efficient way to compute the prediction gain is by first computing the
reflection coefficients from the LPC coefficients through the step-down
algorithm, and extracting the prediction gain from the reflection coefficients
as
<figure align="center">
<artwork align="center">
<![CDATA[
               K
              ___          2  -0.5
 predGain = ( | | 1 - (r_k)  )    ,
              k=1
]]>
</artwork>
</figure>
where r_k is the k'th reflection coefficient.
</t>

<t>
Initial values for the quantization gains are computed as the square-root of
the residual energy of the LPC analysis, adjusted by the coding quality control
parameter.
These quantization gains are later adjusted based on the results of the
prediction analysis.
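The prediction gain computation described above (step-down recursion followed
by the product over the reflection coefficients) can be sketched in Python as
follows. This is a floating-point illustration under the assumption that the
filter is written as 1 - sum a(k) * z**(-k); the names reflection_coefficients
and prediction_gain are hypothetical and do not correspond to functions in the
reference implementation.
<figure>
<artwork>
<![CDATA[
```python
def reflection_coefficients(a):
    """Step-down recursion: LPC coefficients a(1..K) to r_1..r_K.

    Assumes A(z) = 1 - sum_k a(k) z^-k and a stable filter, so that
    every reflection coefficient has magnitude below 1.
    """
    a = list(a)
    refl = []
    while a:
        k = a[-1]                 # r_m is the last coefficient of order m
        if abs(k) >= 1.0:
            raise ValueError("unstable filter: |r_k| must be below 1")
        refl.append(k)
        m = len(a)
        denom = 1.0 - k * k
        # Recover the order m-1 predictor from the order m predictor.
        a = [(a[j] + k * a[m - 2 - j]) / denom for j in range(m - 1)]
    refl.reverse()
    return refl


def prediction_gain(a):
    """predGain = (prod_k (1 - r_k^2)) ** -0.5."""
    prod = 1.0
    for r in reflection_coefficients(a):
        prod *= 1.0 - r * r
    return prod ** -0.5
```
]]>
</artwork>
</figure>
As a check against the definition, the first-order synthesis filter with
a(1) = 0.5 has impulse response 0.5**n and output energy 4/3, and the sketch
returns the square root of 4/3 for that input.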
</t>
</section>

<section title='Prediction Analysis' anchor='pred_ana_overview_section'>
<t>
The prediction analysis is performed in one of two ways depending on how
the pitch estimator classified the frame.
The processing for voiced and unvoiced speech is described in
<xref target='pred_ana_voiced_overview_section' /> and
 <xref target='pred_ana_unvoiced_overview_section' />, respectively.
 Inputs to this function include the pre-whitened signal from the
 pitch estimator (see <xref target='pitch_estimator_overview_section'/>).
</t>

<section title='Voiced Speech' anchor='pred_ana_voiced_overview_section'>
<t>
 For a frame of voiced speech the pitch pulses will remain dominant in the
 pre-whitened input signal.
 Further whitening is desirable as it leads to higher quality at the same
 available bitrate.
 To achieve this, a Long-Term Prediction (LTP) analysis is carried out to
 estimate the coefficients of a fifth-order LTP filter for each of four
 subframes.
 The LTP coefficients are quantized using the method described in
 <xref target='ltp_quantizer_overview_section'/>, and the quantized LTP
 coefficients are used to compute the LTP residual signal.
 This LTP residual signal is the input to an LPC analysis where the LPC coefficients are
 estimated using Burg's method <xref target="Burg"/>, such that the residual energy is minimized.
 The estimated LPC coefficients are converted to a Line Spectral Frequency (LSF) vector
 and quantized as described in <xref target='lsf_quantizer_overview_section'/>.
After quantization, the quantized LSF vector is converted back to LPC
coefficients using the full procedure in <xref target="silk_nlsfs"/>.
By using quantized LTP coefficients and LPC coefficients derived from the
quantized LSF coefficients, the encoder remains fully synchronized with the
decoder.
The quantized LPC and LTP coefficients are also used to filter the input
signal and measure residual energy for each of the four subframes.
</t>
</section>
<section title='Unvoiced Speech' anchor='pred_ana_unvoiced_overview_section'>
<t>
For a speech signal that has been classified as unvoiced, there is no need
for LTP filtering, as it has already been determined that the pre-whitened
input signal is not periodic enough within the allowed pitch period range
for LTP analysis to be worth the cost in terms of complexity and bitrate.
The pre-whitened input signal is therefore discarded, and instead the input
signal is used for LPC analysis using Burg's method.
The resulting LPC coefficients are converted to an LSF vector and quantized
as described in the following section.
They are then transformed back to obtain quantized LPC coefficients, which
are then used to filter the input signal and measure residual energy for
each of the four subframes.
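For orientation, the LPC analysis by Burg's method referred to above can be
sketched in its textbook form. The Python code below is the conventional
time-domain recursion on forward and backward prediction errors, not SILK's
variant: it does not operate on autocorrelations and applies no per-subframe
scaling. The function name burg and its return convention are assumptions
made for this illustration.
<figure>
<artwork>
<![CDATA[
```python
def burg(x, order):
    """Textbook Burg recursion.

    Returns (a, err): a[k-1] is a(k) in the prediction
    x[n] ~ sum_k a(k) * x[n-k], and err estimates the residual
    energy per sample after the final order.
    """
    n = len(x)
    f = list(x)   # forward prediction errors
    b = list(x)   # backward prediction errors
    a = []
    err = sum(v * v for v in x) / n
    for m in range(1, order + 1):
        # Reflection coefficient minimizing forward plus backward energy.
        num = 2.0 * sum(f[i] * b[i - 1] for i in range(m, n))
        den = sum(f[i] ** 2 + b[i - 1] ** 2 for i in range(m, n))
        k = num / den if den > 0.0 else 0.0
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        # Walk downwards so that b[i - 1] still holds the old value.
        for i in range(n - 1, m - 1, -1):
            f_old = f[i]
            f[i] = f_old - k * b[i - 1]
            b[i] = b[i - 1] - k * f_old
        err *= 1.0 - k * k
    return a, err
```
]]>
</artwork>
</figure>
Because each reflection coefficient is a ratio of the form 2xy / (x*x + y*y)
summed over the frame, its magnitude never exceeds 1, which is the source of
the stability property usually attributed to Burg's method.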
</t>
<section title="Burg's Method">
<t>
The main purpose of linear prediction in SILK is to reduce the bitrate by
minimizing the residual energy.
At least at high bitrates, perceptual aspects are handled
independently by the noise shaping filter.
Burg's method is used because it provides higher prediction gain
than the autocorrelation method and, unlike the covariance method,
produces stable filters (assuming numerical errors don't spoil
that). SILK's implementation of Burg's method is also computationally
faster than the covariance method.
The implementation of Burg's method differs from traditional
implementations in two aspects.
The first difference is that it
operates on autocorrelations, similar to the Schur algorithm <xref target="Schur"/>, but
with a simple update to the autocorrelations after finding each
reflection coefficient to make the result identical to Burg's method.
This brings down the complexity of Burg's method to near that of
the autocorrelation method.
The second difference is that the signal in each subframe is scaled
by the inverse of the residual quantization step size. Subframes with
a small quantization step size will on average spend more bits for a
given amount of residual energy than subframes with a large step size.
Without scaling, Burg's method minimizes the total residual energy in
all subframes, which doesn't necessarily minimize the total number of
bits needed for coding the quantized residual.
The residual energy
of the scaled subframes is a better measure for that number of
bits.
</t>
</section>
</section>
</section>

<section title='LSF Quantization' anchor='lsf_quantizer_overview_section'>
<t>
Unlike many other speech codecs, SILK uses variable bitrate coding
for the LSFs.
This improves the average rate-distortion (R-D) tradeoff and reduces outliers.
The variable bitrate coding minimizes a linear combination of the weighted
quantization errors and the bitrate.
The weights for the quantization errors are the Inverse
Harmonic Mean Weighting (IHMW) function proposed by Laroia et al.
(see <xref target="laroia-icassp" />).
These weights are referred to here as Laroia weights.
</t>
<t>
The LSF quantizer consists of two stages.
The first stage is an (unweighted) vector quantizer (VQ), with a
codebook size of 32 vectors.
The quantization errors for the codebook vector are sorted, and
for the N best vectors a second stage quantizer is run.
By varying the number N a tradeoff is made between R-D performance
and computational efficiency.
For each of the N codebook vectors the Laroia weights corresponding
to that vector (and not to the input vector) are calculated.
Then the residual between the input LSF vector and the codebook
vector is scaled by the square roots of these Laroia weights.
This scaling partially normalizes error sensitivity for the
residual vector, so that a uniform quantizer with fixed
step sizes can be used in the second stage without too much
performance loss.
Because the Laroia weights are determined from the first-stage
codebook vector, the scaling can be reversed in the decoder.
</t>
<t>
The second stage uses predictive delayed decision scalar
quantization.
The quantization error is weighted by Laroia weights determined
from the LSF input vector.
The predictor multiplies the previous quantized residual value
by a prediction coefficient that depends on the vector index from the
first stage VQ and on the location in the LSF vector.
The prediction is subtracted from the LSF residual value before
quantizing the result, and added back afterwards.
This subtraction can be interpreted as shifting the quantization levels
of the scalar quantizer, and as a result the quantization error of
each value depends on the quantization decision of the previous value.
This dependency is exploited by the delayed decision mechanism to
search for a quantization sequence with best R-D performance
with a Viterbi-like algorithm <xref target="Viterbi"/>.
The quantizer processes the residual LSF vector in reverse order
(i.e., it starts with the highest residual LSF value).
This is done because the prediction works slightly
better in the reverse direction.
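The Laroia weights and the first-stage residual scaling described above can be
sketched as follows. The Python code uses the commonly cited form of the IHMW
weights, in which each weight is the sum of the inverse distances to the two
neighbouring LSFs. The border values 0 and 1 (LSFs normalized to that
interval), the floating-point arithmetic, and the names laroia_weights and
scale_residual are assumptions made for this illustration.
<figure>
<artwork>
<![CDATA[
```python
import math


def laroia_weights(lsf, lo=0.0, hi=1.0):
    """w_i = 1/(f_i - f_(i-1)) + 1/(f_(i+1) - f_i), with borders lo and hi.

    lsf must be strictly increasing and lie strictly inside (lo, hi).
    """
    ext = [lo] + list(lsf) + [hi]
    if any(b <= a for a, b in zip(ext, ext[1:])):
        raise ValueError("LSFs must be strictly increasing inside (lo, hi)")
    return [1.0 / (ext[i] - ext[i - 1]) + 1.0 / (ext[i + 1] - ext[i])
            for i in range(1, len(ext) - 1)]


def scale_residual(lsf_in, cb_vec):
    """Residual (input minus codebook vector), scaled by the square roots
    of the weights computed from the codebook vector, not from the input."""
    w = laroia_weights(cb_vec)
    return [(x - c) * math.sqrt(wi) for x, c, wi in zip(lsf_in, cb_vec, w)]
```
]]>
</artwork>
</figure>
Closely spaced LSFs receive large weights, so errors near spectral peaks are
penalized more heavily; for the vector [0.25, 0.5] the weights are 8 and 6.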
</t>
<t>
The quantization index of the first stage is entropy coded.
The quantization sequence from the second stage is also entropy
coded, where for each element the probability table is chosen
depending on the vector index from the first stage and the location
of that element in the LSF vector.
</t>

<section title='LSF Stabilization' anchor='lsf_stabilizer_overview_section'>
<t>
If the input is stable, finding the best candidate usually results in a
quantized vector that is also stable. Because of the two-stage approach,
however, it is possible that the best quantization candidate is unstable.
The encoder applies the same stabilization procedure applied by the decoder
 (see <xref target="silk_nlsf_stabilization"/>) to ensure the LSF parameters
 are within their valid range, increasingly sorted, and have minimum
 distances between each other and the border values.
</t>
</section>
</section>

<section title='LTP Quantization' anchor='ltp_quantizer_overview_section'>
<t>
For voiced frames, the prediction analysis described in
<xref target='pred_ana_voiced_overview_section' /> resulted in four sets
(one set per subframe) of five LTP coefficients, plus four weighting matrices.
The LTP coefficients for each subframe are quantized using entropy constrained
vector quantization.
A total of three vector codebooks are available for quantization, with
different rate-distortion trade-offs. The three codebooks have 10, 20, and
40 vectors and average rates of about 3, 4, and 5 bits per vector, respectively.
Consequently, the first codebook has larger average quantization distortion at
a lower rate, whereas the last codebook has smaller average quantization
distortion at a higher rate.
Given the weighting matrix W_ltp and LTP vector b, the weighted rate-distortion
measure for a codebook vector cb_i with rate r_i is given by
<figure align="center">
<artwork align="center">
<![CDATA[
 RD = u * (b - cb_i)' * W_ltp * (b - cb_i) + r_i,
]]>
</artwork>
</figure>
where u is a fixed, heuristically-determined parameter balancing the distortion
and rate.
Which codebook gives the best performance for a given LTP vector depends on the
weighting matrix for that LTP vector.
For example, for a low valued W_ltp, it is advantageous to use the codebook
with 10 vectors as it has a lower average rate.
For a large W_ltp, on the other hand, it is often better to use the codebook
with 40 vectors, as it is more likely to contain the best codebook vector.
The weighting matrix W_ltp depends mostly on two aspects of the input signal.
The first is the periodicity of the signal; the more periodic, the larger W_ltp.
The second is the change in signal energy in the current subframe, relative to
the signal one pitch lag earlier.
A decaying energy leads to a larger W_ltp than an increasing energy.
Both aspects fluctuate relatively slowly, so the W_ltp matrices for
the different subframes of one frame are often similar.
Because of this, one of the three codebooks typically gives good performance
for all subframes, and therefore the codebook search for the subframe LTP
vectors is constrained to only allow codebook vectors to be chosen from the
same codebook, resulting in a rate reduction.
</t>

<t>
To find the best codebook, each of the three vector codebooks is
used to quantize all subframe LTP vectors and produce a combined
weighted rate-distortion measure for each vector codebook.
The vector codebook with the lowest combined rate-distortion
over all subframes is chosen. The quantized LTP vectors are used
in the noise shaping quantizer, and the index of the codebook
plus the four indices for the four subframe codebook vectors
are passed on to the range encoder.
</t>
</section>

<section title='Prefilter'>
<t>
In the prefilter the input signal is filtered using the spectral valley
de-emphasis filter coefficients from the noise shaping analysis
(see <xref target='noise_shaping_analysis_overview_section'/>).
By applying only the noise shaping analysis filter to the input signal,
it provides the input to the noise shaping quantizer.
</t>
</section>

<section title='Noise Shaping Quantizer'>
<t>
The noise shaping quantizer independently shapes the signal and coding noise
spectra to obtain a perceptually higher quality at the same bitrate.
</t>
<t>
The prefilter output signal is multiplied with a compensation gain G computed
in the noise shaping analysis. Then the output of a synthesis shaping filter
is added, and the output of a prediction filter is subtracted to create a
residual signal.
The residual signal is multiplied by the inverse quantized quantization gain
from the noise shaping analysis, and input to a scalar quantizer.
The quantization indices of the scalar quantizer represent a signal of pulses
that is input to the pyramid range encoder.
The scalar quantizer also outputs a quantization signal, which is multiplied
by the quantized quantization gain from the noise shaping analysis to create
an excitation signal.
The output of the prediction filter is added to the excitation signal to form
the quantized output signal y(n).
The quantized output signal y(n) is input to the synthesis shaping and
prediction filters.
</t>
<t>
Optionally the noise shaping quantizer operates in a delayed decision
mode.
In this mode it uses a Viterbi algorithm to keep track of
multiple rounding choices in the quantizer and select the best
one after a delay of 32 samples.
This improves the rate/distortion
performance of the quantizer.
</t>
</section>

<section title='Constant Bitrate Mode'>
<t>
 SILK was designed to run in Variable Bitrate (VBR) mode. However
 the reference implementation also has a Constant Bitrate (CBR) mode
 for SILK. In CBR mode SILK will attempt to encode each packet with
 no more than the allowed number of bits. The Opus wrapper code
 then pads the bitstream if any unused bits are left in SILK mode, or
 encodes the high band with the remaining number of bits in Hybrid mode.
 The number of payload bits is adjusted by changing
 the quantization gains and the rate/distortion tradeoff in the noise
 shaping quantizer, in an iterative loop
 around the noise shaping quantizer and entropy coding.
 Compared to the SILK VBR mode, the CBR mode has lower
 audio quality at a given average bitrate, and also has higher
 computational complexity.
</t>
</section>

</section>

</section>


<section title="CELT Encoder">
<t>
Most of the aspects of the CELT encoder can be directly derived from the description
of the decoder. For example, the filters and rotations in the encoder are simply the
inverse of the operation performed by the decoder.
Similarly, the quantizers generally
optimize for the mean square error (because noise shaping is part of the bit-stream itself),
so no special search is required. For this reason, only the less straightforward aspects of the
encoder are described here.
</t>

<section anchor="pitch-prefilter" title="Pitch Prefilter">
<t>The pitch prefilter is applied after the pre-emphasis. It is applied
in such a way as to be the inverse of the decoder's post-filter. The main non-obvious aspect of the
prefilter is the selection of the pitch period. The pitch search should be optimized for the
following criteria:
<list style="symbols">
<t>continuity: it is important that the pitch period
does not change abruptly between frames; and</t>
<t>avoidance of pitch multiples: when the period used is a multiple of the real period
(lower frequency fundamental), the post-filter loses most of its ability to reduce noise</t>
</list>
</t>
</section>

<section anchor="normalization" title="Bands and Normalization">
<t>
The MDCT output is divided into bands that are designed to match the ear's critical
bands for the smallest (2.5 ms) frame size. The larger frame sizes use integer
multiples of the 2.5 ms layout. For each band, the encoder
computes the energy that will later be encoded. Each band is then normalized by the
square root of the <spanx style="strong">unquantized</spanx> energy, such that each band now forms a unit vector X.
The energy and the normalization are computed by compute_band_energies()
and normalise_bands() (bands.c), respectively.
</t>
</section>

<section anchor="energy-quantization" title="Energy Envelope Quantization">

<t>
Energy quantization (both coarse and fine) can be easily understood from the decoding process.
For all useful bitrates, the coarse quantizer always chooses the quantized log energy value that
minimizes the error for each band. Only at very low rates does the encoder allow larger errors to
minimize the rate and avoid using more bits than are available. When
CPU resources allow it, it is best to try encoding the coarse energy both with and without
inter-frame prediction such that the best prediction mode can be selected. The optimal mode depends on
the coding rate, the available bitrate, and the current rate of packet loss.
</t>

<t>The fine energy quantizer always chooses the quantized log energy value that
minimizes the error for each band because the rate of the fine quantization depends only
on the bit allocation and not on the values that are coded.
</t>
</section> <!-- Energy quant -->

<section title="Bit Allocation">
<t>The encoder must use exactly the same bit allocation process as used by the decoder
and described in <xref target="allocation"/>.
The three mechanisms that can be used by the
encoder to adjust the bitrate on a frame-by-frame basis are band boost, allocation trim,
and band skipping.
</t>

<section title="Band Boost">
<t>The reference encoder makes a decision to boost a band when the energy of that band is significantly
higher than that of the neighboring bands. Let E_j be the log-energy of band j; we define
<list>
<t>D_j = 2*E_j - E_(j-1) - E_(j+1) </t>
</list>

The allocation of band j is boosted once if D_j &gt; t1 and twice if D_j &gt; t2. For LM&gt;=1, t1=2 and t2=4,
while for LM&lt;1, t1=3 and t2=5.
</t>

</section>

<section title="Allocation Trim">
<t>The allocation trim is a value between 0 and 10 (inclusive) that controls the allocation
balance between the low and high frequencies. The encoder starts with a safe "default" of 5
and deviates from that default in two different ways. First the trim can deviate by +/- 2
depending on the spectral tilt of the input signal. For signals with more low frequencies, the
trim is increased by up to 2, while for signals with more high frequencies, the trim is
decreased by up to 2.
For stereo inputs, the trim value can
be decreased by up to 4 when the inter-channel correlation at low frequency (first 8 bands)
is high.
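As a non-normative illustration, the band boost rule and the allocation trim
range described in this section can be sketched in C as follows.
The function names, the restriction to interior bands (which have two
neighbors), and the tilt_adj and stereo_adj inputs are assumptions made for
this sketch; they are not taken from the reference implementation, which
derives its adjustments from signal analysis that is not reproduced here.
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Number of boosts (0, 1, or 2) for interior band j, given the
   log-energies e[0..nb_bands-1].  Hypothetical helper: handling of
   the first and last band is not covered by this sketch. */
int band_boost_count(const float *e, int nb_bands, int j, int LM)
{
    /* Thresholds from the text: LM>=1 uses (2,4), LM<1 uses (3,5). */
    float t1 = (LM >= 1) ? 2.0f : 3.0f;
    float t2 = (LM >= 1) ? 4.0f : 5.0f;
    float d;

    assert(j > 0 && j < nb_bands - 1);
    d = 2.0f * e[j] - e[j - 1] - e[j + 1];   /* D_j */
    if (d > t2) return 2;
    if (d > t1) return 1;
    return 0;
}

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Combine the default trim of 5 with a tilt adjustment in [-2, 2]
   and a stereo reduction in [0, 4]; both inputs are assumed to be
   computed elsewhere.  The result is kept within 0..10. */
int allocation_trim(int tilt_adj, int stereo_adj)
{
    int trim = 5;

    trim += clamp_int(tilt_adj, -2, 2);
    trim -= clamp_int(stereo_adj, 0, 4);
    return clamp_int(trim, 0, 10);
}
```
]]></artwork>
</figure>
For example, with log-energies {0, 5, 0} and LM=1, D_1 = 10 exceeds t2, so
band 1 is boosted twice; with no tilt or stereo adjustment, the trim stays at
its default of 5.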
</t>
</section>

<section title="Band Skipping">
<t>The encoder uses band skipping to ensure that the shape of the bands is only coded
if there is at least 1/2 bit per sample available for the PVQ. If not, then no bit is allocated
and folding is used instead. To ensure continuity in the allocation, some amount of hysteresis is
added to the process, such that a band that received PVQ bits in the previous frame only needs 7/16
bit/sample to be coded for the current frame, while a band that did not receive PVQ bits in the
previous frame needs at least 9/16 bit/sample to be coded.</t>
</section>

</section>

<section title="Stereo Decisions">
<t>Because CELT applies mid-side stereo coupling in the normalized domain, it does not suffer from
important stereo image problems even when the two channels are completely uncorrelated. For this reason
it is always safe to use stereo coupling on any audio frame. That being said, there are some frames
for which dual (independent) stereo is still more efficient. This decision is made by comparing the estimated
entropy with and without coupling over the first 13 bands, taking into account the fact that all bands with
more than two MDCT bins require one extra degree of freedom when coded in mid-side. Let L1_ms and L1_lr
be the L1-norm of the mid-side vector and the L1-norm of the left-right vector, respectively.
The decision
to use mid-side is made if and only if
<figure align="center">
<artwork align="center"><![CDATA[
 L1_ms      L1_lr
-------- < -----
bins + E    bins
]]></artwork>
</figure>
where bins is the number of MDCT bins in the first 13 bands and E is the number of extra degrees of
freedom for mid-side coding. For LM&gt;1, E=13, otherwise E=5.
</t>

<t>The reference encoder decides on the intensity stereo threshold based on the bitrate alone. After
taking into account the frame size by subtracting 80 bits per frame for coarse energy, the first
band using intensity coding is as follows:
</t>

<texttable anchor="intensity-thresholds"
 title="Thresholds for Intensity Stereo">
<ttcol align='center'>bitrate (kb/s)</ttcol>
<ttcol align='center'>start band</ttcol>
<c>&lt;35</c> <c>8</c>
<c>35-50</c> <c>12</c>
<c>50-68</c> <c>16</c>
<c>68-84</c> <c>18</c>
<c>84-102</c> <c>19</c>
<c>102-130</c> <c>20</c>
<c>&gt;130</c> <c>disabled</c>
</texttable>


</section>

<section title="Time-Frequency Decision">
<t>
The choice of time-frequency resolution used in <xref target="tf-change"></xref> is based on
R-D optimization. The distortion is the L1-norm (sum of absolute values) of each band
after each TF resolution under consideration.
The L1 norm is used because it represents the entropy
for a Laplacian source. The number of bits required to code a change in TF resolution between
two bands is higher than the cost of having those two bands use the same resolution, which is
what requires the R-D optimization. The optimal decision is computed using the Viterbi algorithm.
See tf_analysis() in celt/celt.c.
</t>
</section>

<section title="Spreading Values Decision">
<t>
The choice of the spreading value in <xref target="spread values"></xref> has an
impact on the nature of the coding noise introduced by CELT. The larger the f_r value, the
lower the impact of the rotation, and the more tonal the coding noise. The
more tonal the signal, the more tonal the noise should be, so the CELT encoder determines
the optimal value for f_r by estimating how tonal the signal is. The tonality estimate
is based on a discrete pdf (4-bin histogram) of each band. Bands that have a large number of small
values are considered more tonal and a decision is made by combining all bands with more than
8 samples. See spreading_decision() in celt/bands.c.
</t>
</section>

<section anchor="pvq" title="Spherical Vector Quantization">
<t>CELT uses a Pyramid Vector Quantization (PVQ) <xref target="PVQ"></xref>
codebook for quantizing the details of the spectrum in each band that have not
been predicted by the pitch predictor.
The PVQ codebook consists of all sums
of K signed pulses in a vector of N samples, where two pulses at the same position
are required to have the same sign. Thus the codebook includes
all integer codevectors y of N dimensions that satisfy sum(abs(y(j))) = K.
</t>

<t>
In bands where there are sufficient bits allocated, PVQ is used to encode
the unit vector that results from the normalization in
<xref target="normalization"></xref> directly. Given a PVQ codevector y,
the unit vector X is obtained as X = y/||y||, where ||.|| denotes the
L2 norm.
</t>


<section anchor="pvq-search" title="PVQ Search">

<t>
The search for the best codevector y is performed by alg_quant()
(vq.c). There are several possible approaches to the
search, with a trade-off between quality and complexity. The method used in the reference
implementation computes an initial codeword y0 by projecting the normalized spectrum
X onto the codebook pyramid of K-1 pulses:
</t>
<t>
y0 = truncate_towards_zero( (K-1) * X / sum(abs(X)))
</t>

<t>
Depending on N, K and the input data, the initial codeword y0 may contain from
0 to K-1 non-zero values.
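As a non-normative illustration, the projection above can be sketched in C
as follows. The function name and the returned pulse count are assumptions
made for this sketch; the reference implementation in alg_quant() is
organized differently and also has a fixed-point variant.
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

/* Project x[0..n-1] onto the pyramid of K-1 pulses and store the
   result in y0[0..n-1].  Returns the number of pulses placed, so
   the caller knows how many remain for the iterative search. */
int pvq_initial_projection(const float *x, int n, int K, int *y0)
{
    float l1 = 0.0f;   /* sum(abs(x)) */
    int placed = 0;
    int j;

    assert(n > 0 && K > 0);
    for (j = 0; j < n; j++)
        l1 += (x[j] < 0.0f) ? -x[j] : x[j];
    for (j = 0; j < n; j++) {
        /* A float-to-int conversion in C truncates towards zero. */
        y0[j] = (l1 > 0.0f) ? (int)((K - 1) * x[j] / l1) : 0;
        placed += abs(y0[j]);
    }
    return placed;
}
```
]]></artwork>
</figure>
Because each component is truncated towards zero, the number of pulses
placed by this step never exceeds K-1.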
All the remaining pulses, with the exception of the last one,
are found iteratively with a greedy search that minimizes the cost J below,
which is equivalent to maximizing the normalized correlation between y and X:
<figure align="center">
<artwork align="center"><![CDATA[
      T
J = -X * y / ||y||
]]></artwork>
</figure>
</t>

<t>
The search described above is considered to be a good trade-off between quality
and computational cost. However, there are other possible ways to search the PVQ
codebook and implementers MAY use any other search method. See alg_quant() in celt/vq.c.
</t>
</section>

<section anchor="cwrs-encoder" title="PVQ Encoding">

<t>
The vector to encode, X, is converted into an index i such that
 0 &lt;= i &lt; V(N,K) as follows.
Let i = 0 and k = 0.
Then for j = (N - 1) down to 0, inclusive, do:
<list style="numbers">
<t>
If k &gt; 0, set
 i = i + (V(N-j-1,k-1) + V(N-j,k-1))/2.
</t>
<t>Set k = k + abs(X[j]).</t>
<t>
If X[j] &lt; 0, set
 i = i + (V(N-j-1,k) + V(N-j,k))/2.
</t>
</list>
</t>

<t>
The index i is then encoded using the procedure in
 <xref target="encoding-ints"/> with ft = V(N,K).
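As a non-normative illustration, the index computation above can be sketched
in C as follows. The function names are assumptions made for this sketch.
V(N,K) is evaluated with the recurrence
V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0
for K != 0, as used in the description of PVQ decoding; the plain recursion
shown here is only practical for small N and K and is not how the reference
implementation computes it.
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* V(n,k): number of PVQ codevectors of n dimensions with k pulses. */
uint32_t pvq_v(int n, int k)
{
    if (k == 0) return 1;   /* V(n,0) = 1, including n = 0 */
    if (n == 0) return 0;   /* V(0,k) = 0 for k != 0       */
    return pvq_v(n - 1, k) + pvq_v(n, k - 1) + pvq_v(n - 1, k - 1);
}

/* Convert the integer codevector x[0..n-1] into an index i such
   that 0 <= i < V(n,K), where K = sum(abs(x[j])). */
uint32_t pvq_index(const int *x, int n)
{
    uint32_t i = 0;
    int k = 0;
    int j;

    assert(n > 0);
    for (j = n - 1; j >= 0; j--) {
        if (k > 0)                                   /* step 1 */
            i += (pvq_v(n - j - 1, k - 1)
                  + pvq_v(n - j, k - 1)) / 2;
        k += abs(x[j]);                              /* step 2 */
        if (x[j] < 0)                                /* step 3 */
            i += (pvq_v(n - j - 1, k)
                  + pvq_v(n - j, k)) / 2;
    }
    return i;
}
```
]]></artwork>
</figure>
For N=2 and K=2, V(2,2) = 8, and the eight codevectors map to the eight
distinct indices 0 through 7; for example, (2,0) maps to 0 and (-1,-1) maps
to 7.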
</t>

</section>

</section>





</section>

</section>


<section anchor="conformance" title="Conformance">

<t>
It is our intention to allow the greatest possible freedom in
implementing the specification. For this reason, outside of the exceptions
noted in this section, conformance is defined through the reference
implementation of the decoder provided in <xref target="ref-implementation"/>.
Although this document includes an English description of the codec, should
the description contradict the source code of the reference implementation,
the latter shall take precedence.
</t>

<t>
Compliance with this specification means that in addition to following the normative keywords in this document,
 a decoder's output MUST also be
 within the thresholds specified by the opus_compare.c tool (included
 with the code) when compared to the reference implementation for each of the
 test vectors provided (see <xref target="test-vectors"></xref>) and for each output
 sampling rate and channel count supported. In addition, a compliant
 decoder implementation MUST have the same final range decoder state as that of the
 reference decoder. It is therefore RECOMMENDED that the
 decoder implement the same functional behavior as the reference.

 A decoder implementation is not required to support all output sampling
 rates or all output channel counts.
</t>

<section title="Testing">
<t>
Using the reference code provided in <xref target="ref-implementation"></xref>,
a test vector can be decoded with
<list>
<t>opus_demo -d &lt;rate&gt; &lt;channels&gt; testvectorX.bit testX.out</t>
</list>
where &lt;rate&gt; is the sampling rate and can be 8000, 12000, 16000, 24000, or 48000, and
&lt;channels&gt; is 1 for mono or 2 for stereo.
</t>

<t>
If the range decoder state is incorrect for one of the frames, the decoder will exit with
"Error: Range coder state mismatch between encoder and decoder". If the decoder succeeds, then
the output can be compared with the "reference" output with
<list>
<t>opus_compare -s -r &lt;rate&gt; testvectorX.dec testX.out</t>
</list>
for stereo or
<list>
<t>opus_compare -r &lt;rate&gt; testvectorX.dec testX.out</t>
</list>
for mono.
</t>

<t>In addition to indicating whether the test vector comparison passes, the opus_compare tool
outputs an "Opus quality metric" that indicates how well the tested decoder matches the
reference implementation.
A quality of 0 corresponds to the passing threshold, while
a quality of 100 is the highest possible value and means that the output of the tested decoder is identical to the reference
implementation. The passing threshold (quality 0) was calibrated in such a way that it corresponds to
additive white noise with a 48 dB SNR (similar to what can be obtained on a cassette deck).
It is still possible for an implementation to sound very good with such a low quality measure
(e.g. if the deviation is due to inaudible phase distortion), but unless this is verified by
listening tests, it is RECOMMENDED that implementations achieve a quality above 90 for 48 kHz
decoding. For other sampling rates, it is normal for the quality metric to be lower
(typically as low as 50 even for a good implementation) because of harmless mismatch with
the delay and phase of the internal sampling rate conversion.
</t>

<t>
On POSIX environments, the run_vectors.sh script can be used to verify all test
vectors. This can be done with
<list>
<t>run_vectors.sh &lt;exec path&gt; &lt;vector path&gt; &lt;rate&gt;</t>
</list>
where &lt;exec path&gt; is the directory where the opus_demo and opus_compare executables
are built and &lt;vector path&gt; is the directory containing the test vectors.
</t>
</section>

<section anchor="opus-custom" title="Opus Custom">
<t>
Opus Custom is an OPTIONAL part of the specification that is defined to
handle special sample rates and frame rates that are not supported by the
main Opus specification. Use of Opus Custom is discouraged for all but very
special applications for which a frame size different from 2.5, 5, 10, or 20 ms is
needed (for either complexity or latency reasons). Because Opus Custom is
optional, streams encoded using Opus Custom cannot be expected to be decodable by all Opus
implementations. Also, because no in-band mechanism exists for specifying the sampling
rate and frame size of Opus Custom streams, out-of-band signaling is required.
In Opus Custom operation, only the CELT layer is available, using the opus_custom_* function
calls in opus_custom.h.
</t>
</section>

</section>

<section anchor="security" title="Security Considerations">

<t>
Implementations of the Opus codec need to take appropriate security considerations
into account, as outlined in <xref target="DOS"/>.
It is extremely important for the decoder to be robust against malicious
payloads.
Malicious payloads must not cause the decoder to overrun its allocated memory
 or to take an excessive amount of resources to decode.
Although problems
in encoders are typically rarer, the same applies to the encoder.
Malicious
audio streams must not cause the encoder to misbehave because this would
allow an attacker to attack transcoding gateways.
</t>
<t>
The reference implementation contains no known buffer overflows or cases where
 a specially crafted packet or audio segment could cause a significant increase
 in CPU load.
However, on certain CPU architectures where denormalized floating-point
 operations are much slower than normal floating-point operations, it is
 possible for some audio content (e.g., silence or near-silence) to cause an
 increase in CPU load.
Denormals can be introduced by reordering operations in the compiler and depend
 on the target architecture, so it is difficult to guarantee that an implementation
 avoids them.
For architectures on which denormals are problematic, adding very small
 floating-point offsets to the affected signals to prevent significant numbers
 of denormalized operations is RECOMMENDED.
Alternatively, it is often possible to configure the hardware to treat
 denormals as zero (DAZ).
No such issue exists for the fixed-point reference implementation.
</t>
<t>The reference implementation was validated in the following conditions:
<list style="numbers">
<t>
Sending the decoder valid packets generated by the reference encoder and
 verifying that the decoder's final range coder state matches that of the
 encoder.
</t>
<t>
Sending the decoder packets generated by the reference encoder and then
 subjected to random corruption.
</t>
<t>Sending the decoder random packets.</t>
<t>
Sending the decoder packets generated by a version of the reference encoder
 modified to make random coding decisions (internal fuzzing), including mode
 switching, and verifying that the range coder final states match.
</t>
</list>
In all of the conditions above, both the encoder and the decoder were run
 inside the <xref target="Valgrind">Valgrind</xref> memory
 debugger, which tracks reads and writes to invalid memory regions as well as
 the use of uninitialized memory.
There were no errors reported on any of the tested conditions.
</t>
</section>


<section title="IANA Considerations">
<t>
This document has no actions for IANA.
</t>
</section>

<section anchor="Acknowledgements" title="Acknowledgements">
<t>
Thanks to all other developers, including Raymond Chen, Soeren Skak Jensen, Gregory Maxwell,
Christopher Montgomery, and Karsten Vandborg Soerensen. We would also
like to thank Igor Dyakonov, Jan Skoglund, and Christian Hoene for their help with subjective testing of the
Opus codec.
Thanks to Ralph Giles, John Ridges, Ben Schwartz, Keith Yan, Christian Hoene, Kat Walsh, and many others on the Opus and CELT mailing lists
for their bug reports and feedback.
</t>
</section>

<section title="Copying Conditions">
<t>The authors agree to grant third parties the irrevocable right to copy, use and distribute
the work (excluding Code Components available under the simplified BSD license), with or
without modification, in any medium, without royalty, provided that, unless separate
permission is granted, redistributed modified works do not contain misleading author, version,
name of work, or endorsement information.</t>
</section>

</middle>

<back>

<references title="Normative References">

<reference anchor="rfc2119">
<front>
<title>Key words for use in RFCs to Indicate Requirement Levels </title>
<author initials="S." surname="Bradner" fullname="Scott Bradner"></author>
</front>
<seriesInfo name="RFC" value="2119" />
</reference>

</references>

<references title="Informative References">

<reference anchor='requirements'>
<front>
<title>Requirements for an Internet Audio Codec</title>
<author initials='J.-M.' surname='Valin' fullname='J.-M. Valin'>
<organization /></author>
<author initials='K.' surname='Vos' fullname='K.
Vos'>
<organization /></author>
<author>
<organization>IETF</organization></author>
<date year='2011' month='August' />
<abstract>
<t>This document provides specific requirements for an Internet audio
 codec. These requirements address quality, sample rate, bitrate,
 and packet-loss robustness, as well as other desirable properties.
</t></abstract></front>
<seriesInfo name='RFC' value='6366' />
<format type='TXT' target='http://tools.ietf.org/rfc/rfc6366.txt' />
</reference>

<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?>
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3533.xml"?>

<reference anchor='SILK' target='http://developer.skype.com/silk'>
<front>
<title>SILK Speech Codec</title>
<author initials='K.' surname='Vos' fullname='K. Vos'>
<organization /></author>
<author initials='S.' surname='Jensen' fullname='S. Jensen'>
<organization /></author>
<author initials='K.' surname='Soerensen' fullname='K.
Soerensen'>
<organization /></author>
<date year='2010' month='March' />
<abstract>
<t></t>
</abstract></front>
<seriesInfo name='Internet-Draft' value='draft-vos-silk-01' />
<format type='TXT' target='http://tools.ietf.org/html/draft-vos-silk-01' />
</reference>

<reference anchor="laroia-icassp">
<front>
<title abbrev="Robust and Efficient Quantization of Speech LSP">
Robust and Efficient Quantization of Speech LSP Parameters Using Structured Vector Quantization
</title>
<author initials="R.L." surname="Laroia" fullname="R.">
<organization/>
</author>
<author initials="N.P." surname="Phamdo" fullname="N.">
<organization/>
</author>
<author initials="N.F." surname="Farvardin" fullname="N.">
<organization/>
</author>
</front>
<seriesInfo name="ICASSP-1991, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 641-644, October" value="1991"/>
</reference>

<reference anchor='CELT' target='http://celt-codec.org/'>
<front>
<title>Constrained-Energy Lapped Transform (CELT) Codec</title>
<author initials='J-M.' surname='Valin' fullname='J-M. Valin'>
<organization /></author>
<author initials='T.B.' surname='Terriberry' fullname='Timothy B. Terriberry'>
<organization /></author>
<author initials='G.' surname='Maxwell' fullname='G.
Maxwell'>
<organization /></author>
<author initials='C.' surname='Montgomery' fullname='C. Montgomery'>
<organization /></author>
<date year='2010' month='July' />
<abstract>
<t></t>
</abstract></front>
<seriesInfo name='Internet-Draft' value='draft-valin-celt-codec-02' />
<format type='TXT' target='http://tools.ietf.org/html/draft-valin-celt-codec-02' />
</reference>

<reference anchor='SRTP-VBR'>
<front>
<title>Guidelines for the use of Variable Bit Rate Audio with Secure RTP</title>
<author initials='C.' surname='Perkins' fullname='C. Perkins'>
<organization /></author>
<author initials='J.M.' surname='Valin' fullname='J.M. Valin'>
<organization /></author>
<date year='2011' month='July' />
<abstract>
<t></t>
</abstract></front>
<seriesInfo name='RFC' value='6562' />
<format type='TXT' target='http://tools.ietf.org/html/rfc6562' />
</reference>

<reference anchor='DOS'>
<front>
<title>Internet Denial-of-Service Considerations</title>
<author initials='M.' surname='Handley' fullname='M. Handley'>
<organization /></author>
<author initials='E.' surname='Rescorla' fullname='E.
Rescorla'>
<organization /></author>
<author>
<organization>IAB</organization></author>
<date year='2006' month='December' />
<abstract>
<t>This document provides an overview of possible avenues for denial-of-service (DoS) attack on Internet systems. The aim is to encourage protocol designers and network engineers towards designs that are more robust. We discuss partial solutions that reduce the effectiveness of attacks, and how some solutions might inadvertently open up alternative vulnerabilities. This memo provides information for the Internet community.</t></abstract></front>
<seriesInfo name='RFC' value='4732' />
<format type='TXT' octets='91844' target='ftp://ftp.isi.edu/in-notes/rfc4732.txt' />
</reference>

<reference anchor="Martin79">
<front>
<title>Range encoding: An algorithm for removing redundancy from a digitised message</title>
<author initials="G.N.N." surname="Martin" fullname="G. Nigel N. Martin"><organization/></author>
<date year="1979" />
</front>
<seriesInfo name="Proc. Institution of Electronic and Radio Engineers International Conference on Video and Data Recording" value="" />
</reference>

<reference anchor="coding-thesis">
<front>
<title>Source coding algorithms for fast data compression</title>
<author initials="R." surname="Pasco" fullname=""><organization/></author>
<date month="May" year="1976" />
</front>
<seriesInfo name="Ph.D. thesis" value="Dept.
of Electrical Engineering, Stanford University" />
</reference>

<reference anchor="PVQ">
<front>
<title>A Pyramid Vector Quantizer</title>
<author initials="T." surname="Fischer" fullname=""><organization/></author>
<date month="July" year="1986" />
</front>
<seriesInfo name="IEEE Trans. on Information Theory, Vol. 32" value="pp. 568-583" />
</reference>

<reference anchor="Kabal86">
<front>
<title>The Computation of Line Spectral Frequencies Using Chebyshev Polynomials</title>
<author initials="P." surname="Kabal" fullname="P. Kabal"><organization/></author>
<author initials="R." surname="Ramachandran" fullname="R. P. Ramachandran"><organization/></author>
<date month="December" year="1986" />
</front>
<seriesInfo name="IEEE Trans. Acoustics, Speech, Signal Processing, vol. 34, no. 6" value="pp.
1419-1426" />
</reference>


<reference anchor="Valgrind" target="http://valgrind.org/">
<front>
<title>Valgrind website</title>
<author></author>
</front>
</reference>

<reference anchor="Google-NetEQ" target="http://code.google.com/p/webrtc/source/browse/trunk/src/modules/audio_coding/NetEQ/main/source/?r=583">
<front>
<title>Google NetEQ code</title>
<author></author>
</front>
</reference>

<reference anchor="Google-WebRTC" target="http://code.google.com/p/webrtc/">
<front>
<title>Google WebRTC code</title>
<author></author>
</front>
</reference>


<reference anchor="Opus-git" target="git://git.xiph.org/opus.git">
<front>
<title>Opus Git Repository</title>
<author></author>
</front>
</reference>

<reference anchor="Opus-website" target="http://opus-codec.org/">
<front>
<title>Opus website</title>
<author></author>
</front>
</reference>

<reference anchor="Vorbis-website" target="http://xiph.org/vorbis/">
<front>
<title>Vorbis website</title>
<author></author>
</front>
</reference>

<reference anchor="Matroska-website"
target="http://matroska.org/">
<front>
<title>Matroska website</title>
<author></author>
</front>
</reference>

<reference anchor="Vectors-website" target="http://opus-codec.org/testvectors/">
<front>
<title>Opus Testvectors (website)</title>
<author></author>
</front>
</reference>

<reference anchor="Vectors-proc" target="http://www.ietf.org/proceedings/83/slides/slides-83-codec-0.gz">
<front>
<title>Opus Testvectors (proceedings)</title>
<author></author>
</front>
</reference>

<reference anchor="line-spectral-pairs" target="http://en.wikipedia.org/wiki/Line_spectral_pairs">
<front>
<title>Line Spectral Pairs</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="range-coding" target="http://en.wikipedia.org/wiki/Range_coding">
<front>
<title>Range Coding</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="Hadamard" target="http://en.wikipedia.org/wiki/Hadamard_transform">
<front>
<title>Hadamard Transform</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="Viterbi"
target="http://en.wikipedia.org/wiki/Viterbi_algorithm">
<front>
<title>Viterbi Algorithm</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="Whitening" target="http://en.wikipedia.org/wiki/White_noise">
<front>
<title>White Noise</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="LPC" target="http://en.wikipedia.org/wiki/Linear_prediction">
<front>
<title>Linear Prediction</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="MDCT" target="http://en.wikipedia.org/wiki/Modified_discrete_cosine_transform">
<front>
<title>Modified Discrete Cosine Transform</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="FFT" target="http://en.wikipedia.org/wiki/Fast_Fourier_transform">
<front>
<title>Fast Fourier Transform</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>

<reference anchor="z-transform" target="http://en.wikipedia.org/wiki/Z-transform">
<front>
<title>Z-transform</title>
<author><organization>Wikipedia</organization></author>
</front>
</reference>


<reference anchor="Burg">
<front>
<title>Maximum Entropy Spectral Analysis</title>
<author initials="JP." surname="Burg" fullname="J.P. Burg"><organization/></author>
</front>
</reference>

<reference anchor="Schur">
<front>
<title>A fixed point computation of partial correlation coefficients</title>
<author initials="J." surname="Le Roux" fullname="J. Le Roux"><organization/></author>
<author initials="C." surname="Gueguen" fullname="C. Gueguen"><organization/></author>
</front>
<seriesInfo name="ICASSP-1977, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 257-259, October" value="1977"/>
</reference>

<reference anchor="Princen86">
<front>
<title>Analysis/synthesis filter bank design based on time domain aliasing cancellation</title>
<author initials="J." surname="Princen" fullname="John P. Princen"><organization/></author>
<author initials="A." surname="Bradley" fullname="Alan B. Bradley"><organization/></author>
</front>
<seriesInfo name="IEEE Trans. Acoust. Speech Sig. Proc. ASSP-34 (5), 1153-1161" value="1986"/>
</reference>

<reference anchor="Valin2010">
<front>
<title>A High-Quality Speech and Audio Codec With Less Than 10 ms delay</title>
<author initials="JM" surname="Valin" fullname="Jean-Marc Valin"><organization/>
</author>
<author initials="T. B."
surname="Terriberry" fullname="Timothy Terriberry"><organization/></author>
<author initials="C." surname="Montgomery" fullname="Christopher Montgomery"><organization/></author>
<author initials="G." surname="Maxwell" fullname="Gregory Maxwell"><organization/></author>
</front>
<seriesInfo name="IEEE Trans. on Audio, Speech and Language Processing, Vol. 18, No. 1, pp. 58-67" value="2010" />
</reference>


<reference anchor="Zwicker61">
<front>
<title>Subdivision of the audible frequency range into critical bands</title>
<author initials="E." surname="Zwicker" fullname="E. Zwicker"><organization/></author>
<date month="February" year="1961" />
</front>
<seriesInfo name="The Journal of the Acoustical Society of America, Vol. 33, No 2" value="p. 248" />
</reference>


</references>

<section anchor="ref-implementation" title="Reference Implementation">

<t>This appendix contains the complete source code for the
reference implementation of the Opus codec written in C. By default,
this implementation relies on floating-point arithmetic, but it can be
compiled to use only fixed-point arithmetic by defining the FIXED_POINT
macro. Information on building and using the reference implementation is
available in the README file.
</t>

<t>The implementation can be compiled with either a C89 or a C99
compiler.
It is reasonably optimized for most platforms such that
only architecture-specific optimizations are likely to be useful.
The FFT <xref target="FFT"/> used is a slightly modified version of the KISS-FFT library,
but it is easy to substitute any other FFT library.
</t>

<t>
While the reference implementation does not rely on any
<spanx style="emph">undefined behavior</spanx> as defined by C89 or C99,
it relies on common <spanx style="emph">implementation-defined behavior</spanx>
for two's complement architectures:
<list style="symbols">
<t>Right shifts of negative values are consistent with two's complement arithmetic, so that a>>b is equivalent to floor(a/(2**b)),</t>
<t>For conversion to a signed integer of N bits, the value is reduced modulo 2**N to be within range of the type,</t>
<t>The result of integer division of a negative value is truncated towards zero, and</t>
<t>The compiler provides a 64-bit integer type (a C99 requirement which is supported by most C89 compilers).</t>
</list>
</t>

<t>
In its current form, the reference implementation also requires the following
architectural characteristics to obtain acceptable performance:
<list style="symbols">
<t>Two's complement arithmetic,</t>
<t>At least a 16 bit by 16 bit integer multiplier (32-bit result), and</t>
<t>At least a 32-bit adder/accumulator.</t>
</list>
</t>


<section title="Extracting the source">
<t>
The complete source code can be extracted from this draft by running the
following commands:

<list style="symbols">
<t><![CDATA[
cat draft-ietf-codec-opus.txt | grep '^\ \ \ ###' | sed -e 's/...###//' | base64 -d > opus_source.tar.gz
]]></t>
<t>
tar xzvf opus_source.tar.gz
</t>
<t>cd opus_source</t>
<t>make</t>
</list>
On systems where the provided Makefile does not work, the following command line may be used to compile
the source code:
<list style="symbols">
<t><![CDATA[
cc -O2 -g -o opus_demo src/opus_demo.c `cat *.mk | grep -v fixed | sed -e 's/.*=//' -e 's/\\\\//'` -DOPUS_BUILD -Iinclude -Icelt -Isilk -Isilk/float -DUSE_ALLOCA -Drestrict= -lm
]]></t></list>
</t>

<t>
On systems where the base64 utility is not present, the following commands can be used instead:
<list style="symbols">
<t><![CDATA[
cat draft-ietf-codec-opus.txt | grep '^\ \ \ ###' | sed -e 's/...###//' > opus.b64
]]></t>
<t>openssl base64 -d -in opus.b64 > opus_source.tar.gz</t>
</list>

</t>
</section>

<section title="Up-to-date Implementation">
<t>
As of the time of publication of this memo, an up-to-date implementation conforming to
this standard is available in a
 <xref target='Opus-git'>Git repository</xref>.
Releases and other resources are available at
 <xref target='Opus-website'/>. However, although that implementation is expected to
 remain conformant with the standard, it is the code in this document that shall
 remain normative.
</t>
</section>

<section title="Base64-encoded Source Code">
<t>
<?rfc include="opus_source.base64"?>
</t>
</section>

<section anchor="test-vectors" title="Test Vectors">
<t>
Because of size constraints, the Opus test vectors are not distributed in this
draft. They are available in the proceedings of the 83rd IETF meeting (Paris) <xref target="Vectors-proc"/> and from the Opus codec website at
<xref target="Vectors-website"/>. These test vectors were created specifically to exercise
all aspects of the decoder, and therefore the audio quality of the decoded output is
significantly lower than what Opus can achieve in normal operation.
</t>

<t>
The SHA-1 hashes of the files in the test vector package are
<?rfc include="testvectors_sha1"?>
</t>

</section>

</section>

<section anchor="self-delimiting-framing" title="Self-Delimiting Framing">
<t>
To use the internal framing described in <xref target="modes"/>, the decoder
 must know the total length of the Opus packet, in bytes.
This section describes a simple variation of that framing which can be used
 when the total length of the packet is not known.
Nothing in the encoding of the packet itself allows a decoder to distinguish
 between the regular, undelimited framing and the self-delimiting framing
 described in this appendix.
Which one is used and where must be established by context at the transport
 layer.
It is RECOMMENDED that a transport layer choose exactly one framing scheme,
 rather than allowing an encoder to signal which one it wants to use.
</t>

<t>
For example, although a regular Opus stream does not support more than two
 channels, a multi-channel Opus stream may be formed from several one- and
 two-channel streams.
To pack an Opus packet from each of these streams together in a single packet
 at the transport layer, one could use the self-delimiting framing for all but
 the last stream, and then the regular, undelimited framing for the last one.
Reverting to the undelimited framing for the last stream saves overhead
 (because the total size of the transport-layer packet will still be known),
 and ensures that a "multi-channel" stream which only has a single Opus stream
 uses the same framing as a regular Opus stream does.
This avoids the need for signaling to distinguish these two cases.
</t>

<t>
The self-delimiting framing is identical to the regular, undelimited framing
 from <xref target="modes"/>, except that each Opus packet contains one extra
 length field, encoded using the same one- or two-byte scheme from
 <xref target="frame-length-coding"/>.
This extra length immediately precedes the compressed data of the first Opus
 frame in the packet, and is interpreted in the various modes as follows:
<list style="symbols">
<t>
Code 0 packets: It is the length of the single Opus frame (see
 <xref target="sd_code0_packet"/>).
</t>
<t>
Code 1 packets: It is the length used for both of the Opus frames (see
 <xref target="sd_code1_packet"/>).
</t>
<t>
Code 2 packets: It is the length of the second Opus frame (see
 <xref target="sd_code2_packet"/>).</t>
<t>
CBR Code 3 packets: It is the length used for all of the Opus frames (see
 <xref target="sd_code3cbr_packet"/>).
</t>
<t>VBR Code 3 packets: It is the length of the last Opus frame (see
 <xref target="sd_code3vbr_packet"/>).
</t>
</list>
</t>

<figure anchor="sd_code0_packet" title="A Self-Delimited Code 0 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|0|0| N1 (1-2 bytes):                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                    Compressed frame 1 (N1 bytes)...           :
:                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<figure anchor="sd_code1_packet" title="A Self-Delimited Code 1 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|0|1| N1 (1-2 bytes):                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
|               Compressed frame 1 (N1 bytes)...                |
:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
|               Compressed frame 2 (N1 bytes)...                |
:                                               +-+-+-+-+-+-+-+-+
|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<figure anchor="sd_code2_packet" title="A Self-Delimited Code 2 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|0| N1 (1-2 bytes): N2 (1-2 bytes :               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :
|               Compressed frame 1 (N1 bytes)...                |
:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|               Compressed frame 2 (N2 bytes)...                :
:                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<figure anchor="sd_code3cbr_packet" title="A Self-Delimited CBR Code 3 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|1|0|p|     M     | Pad len (Opt) : N1 (1-2 bytes):
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 1 (N1 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 2 (N1 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                              ...                              :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame M (N1 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                  Opus Padding (Optional)...                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

<figure anchor="sd_code3vbr_packet" title="A Self-Delimited VBR Code 3 Packet"
 align="center">
<artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|1|1|p|     M     | Padding length (Optional)     :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: N1 (1-2 bytes):     ...       :     N[M-1]    |     N[M]      :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 1 (N1 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:               Compressed frame 2 (N2 bytes)...                :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                              ...                              :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:              Compressed frame M (N[M] bytes)...               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                  Opus Padding (Optional)...                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>

</section>

</back>

</rfc>