<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
<?rfc toc="yes" symrefs="yes" ?>

<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-opus-14">

<front>
<title abbrev="Interactive Audio Codec">Definition of the Opus Audio Codec</title>


<author initials="JM" surname="Valin" fullname="Jean-Marc Valin">
<organization>Mozilla Corporation</organization>
<address>
<postal>
<street>650 Castro Street</street>
<city>Mountain View</city>
<region>CA</region>
<code>94041</code>
<country>USA</country>
</postal>
<phone>+1 650 903-0800</phone>
<email>[email protected]</email>
</address>
</author>

<author initials="K." surname="Vos" fullname="Koen Vos">
<organization>Skype Technologies S.A.</organization>
<address>
<postal>
<street>Soder Malarstrand 43</street>
<city>Stockholm</city>
<region></region>
<code>11825</code>
<country>SE</country>
</postal>
<phone>+46 73 085 7619</phone>
<email>[email protected]</email>
</address>
</author>

<author initials="T." surname="Terriberry" fullname="Timothy B. Terriberry">
<organization>Mozilla Corporation</organization>
<address>
<postal>
<street>650 Castro Street</street>
<city>Mountain View</city>
<region>CA</region>
<code>94041</code>
<country>USA</country>
</postal>
<phone>+1 650 903-0800</phone>
<email>[email protected]</email>
</address>
</author>

<date day="17" month="May" year="2012" />

<area>General</area>

<workgroup></workgroup>

<abstract>
<t>
This document defines the Opus interactive speech and audio codec.
Opus is designed to handle a wide range of interactive audio applications,
 including Voice over IP, videoconferencing, in-game chat, and even live,
 distributed music performances.
It scales from low bitrate narrowband speech at 6 kb/s to very high quality
 stereo music at 510 kb/s.
Opus uses both linear prediction (LP) and the Modified Discrete Cosine
 Transform (MDCT) to achieve good compression of both speech and music.
</t>
</abstract>
</front>

<middle>

<section anchor="introduction" title="Introduction">
<t>
The Opus codec is a real-time interactive audio codec designed to meet the requirements
described in <xref target="requirements"></xref>.
It is composed of a linear
 prediction (LP)-based <xref target="LPC"/> layer and a Modified Discrete Cosine Transform
 (MDCT)-based <xref target="MDCT"/> layer.
The main idea behind using two layers is that in speech, linear prediction
 techniques (such as Code-Excited Linear Prediction, or CELP) code low frequencies more efficiently than transform
 (e.g., MDCT) domain techniques, while the situation is reversed for music and
 higher speech frequencies.
Thus a codec with both layers available can operate over a wider range than
 either one alone and, by combining them, achieve better quality than either
 one individually.
</t>

<t>
The primary normative part of this specification is provided by the source code
 in <xref target="ref-implementation"></xref>.
Only the decoder portion of this software is normative, though a
 significant amount of code is shared by both the encoder and decoder.
<xref target="conformance"/> provides a decoder conformance test.
The decoder contains a great deal of integer and fixed-point arithmetic which
 needs to be performed exactly, including all rounding considerations, so any
 useful specification requires a domain-specific symbolic language to adequately
 define these operations.
Additionally, any
conflict between the symbolic representation and the included reference
implementation must be resolved. For the practical reasons of compatibility and
testability, it would be advantageous to give the reference implementation
priority in any disagreement. The C language is also one of the most
widely understood human-readable symbolic representations for machine
behavior.
For these reasons, this RFC uses the reference implementation as the sole
 symbolic representation of the codec.
</t>

<t>While the symbolic representation is unambiguous and complete, it is not
always the easiest way to understand the codec's operation. For this reason,
this document also describes significant parts of the codec in English and
takes the opportunity to explain the rationale behind many of the more
surprising elements of the design. These descriptions are intended to be
accurate and informative, but the limitations of common English sometimes
result in ambiguity, so it is expected that the reader will always read
them alongside the symbolic representation. Numerous references to the
implementation are provided for this purpose. The descriptions sometimes
differ from the reference in ordering or through mathematical simplification
wherever such deviation makes an explanation easier to understand.
For example, the right shift and left shift operations in the reference
implementation are often described using division and multiplication in the text.
In general, the text is focused on the "what" and "why" while the symbolic
representation most clearly provides the "how".
</t>

<section anchor="notation" title="Notation and Conventions">
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
 "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
 interpreted as described in RFC 2119 <xref target="rfc2119"></xref>.
</t>
<t>
Various operations in the codec require bit-exact fixed-point behavior, even
 when writing a floating-point implementation.
The notation "Q&lt;n&gt;", where n is an integer, denotes the number of binary
 digits to the right of the binary point in a fixed-point number.
For example, a signed Q14 value in a 16-bit word can represent values from
 -2.0 to 1.99993896484375, inclusive.
This notation is for informational purposes only.
Arithmetic, when described, always operates on the underlying integer.
E.g., the text will explicitly indicate any shifts required after a
 multiplication.
</t>
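<t>
As an informal illustration of the Q notation, the following sketch interprets
 a signed 16-bit Q14 word as a real value and multiplies two Q14 values.
The helper names q14_to_double() and q14_mul() are hypothetical and do not
 appear in the reference implementation.
The sketch also assumes that a right shift of a negative integer is an
 arithmetic shift, which C leaves implementation-defined.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpret a signed 16-bit Q14 integer as a
   real value by dividing by 2**14. */
double q14_to_double(int16_t v)
{
    return (double)v / 16384.0;
}

/* Hypothetical helper: multiply two Q14 values.  The product of the
   underlying integers is a Q28 value, so an explicit right shift by
   14 is required to return to Q14.  Assumes an arithmetic right
   shift for negative products (implementation-defined in C). */
int32_t q14_mul(int16_t a, int16_t b)
{
    return ((int32_t)a * (int32_t)b) >> 14;
}

/* The range quoted in the text, plus one multiplication. */
void q14_examples(void)
{
    assert(q14_to_double(INT16_MIN) == -2.0);
    assert(q14_to_double(INT16_MAX) == 1.99993896484375);
    /* 0.5 * 0.5 = 0.25, i.e., (8192 * 8192) >> 14 = 4096 */
    assert(q14_mul(8192, 8192) == 4096);
}
```
]]></artwork>
</figure>
<t>
The shift in q14_mul() is the kind of explicit shift after a multiplication
 that the text points out wherever it is required.
</t>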
<t>
Expressions, where included in the text, follow C operator rules and
 precedence, with the exception that the syntax "x**y" indicates x raised to
 the power y.
The text also makes use of the following functions:
</t>

<section anchor="min" toc="exclude" title="min(x,y)">
<t>
The smaller of the two values x and y.
</t>
</section>

<section anchor="max" toc="exclude" title="max(x,y)">
<t>
The larger of the two values x and y.
</t>
</section>

<section anchor="clamp" toc="exclude" title="clamp(lo,x,hi)">
<figure align="center">
<artwork align="center"><![CDATA[
clamp(lo,x,hi) = max(lo,min(x,hi))
]]></artwork>
</figure>
<t>
With this definition, if lo&nbsp;&gt;&nbsp;hi, the lower bound is the one that
 is enforced.
</t>
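<t>
A minimal integer-only sketch of this definition is shown below.
The names min_int(), max_int(), and clamp_int() are chosen for illustration
 only; the last two assertions exercise the lo&nbsp;&gt;&nbsp;hi case.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

int min_int(int x, int y) { return x < y ? x : y; }
int max_int(int x, int y) { return x > y ? x : y; }

/* clamp(lo,x,hi) = max(lo,min(x,hi)) */
int clamp_int(int lo, int x, int hi)
{
    return max_int(lo, min_int(x, hi));
}

void clamp_examples(void)
{
    assert(clamp_int(0, 5, 10) == 5);    /* already in range   */
    assert(clamp_int(0, -3, 10) == 0);   /* raised to lo       */
    assert(clamp_int(0, 42, 10) == 10);  /* lowered to hi      */
    /* lo > hi: the lower bound is the one that is enforced. */
    assert(clamp_int(8, 5, 2) == 8);
    assert(clamp_int(8, 100, 2) == 8);
}
```
]]></artwork>
</figure>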
</section>

<section anchor="sign" toc="exclude" title="sign(x)">
<t>
The sign of x, i.e.,
<figure align="center">
<artwork align="center"><![CDATA[
          ( -1,  x < 0 ,
sign(x) = <  0,  x == 0 ,
          (  1,  x > 0 .
]]></artwork>
</figure>
</t>
</section>

<section anchor="abs" toc="exclude" title="abs(x)">
<t>
The absolute value of x, i.e.,
<figure align="center">
<artwork align="center"><![CDATA[
abs(x) = sign(x)*x .
]]></artwork>
</figure>
</t>
</section>

<section anchor="floor" toc="exclude" title="floor(f)">
<t>
The largest integer z such that z &lt;= f.
</t>
</section>

<section anchor="ceil" toc="exclude" title="ceil(f)">
<t>
The smallest integer z such that z &gt;= f.
</t>
</section>

<section anchor="round" toc="exclude" title="round(f)">
<t>
The integer z nearest to f, with ties rounded towards negative infinity,
 i.e.,
<figure align="center">
<artwork align="center"><![CDATA[
 round(f) = ceil(f - 0.5) .
]]></artwork>
</figure>
</t>
</section>

<section anchor="log2" toc="exclude" title="log2(f)">
<t>
The base-two logarithm of f.
</t>
</section>

<section anchor="ilog" toc="exclude" title="ilog(n)">
<t>
The minimum number of bits required to store a positive integer n in binary,
 or 0 for a non-positive integer n.
<figure align="center">
<artwork align="center"><![CDATA[
          ( 0,                 n <= 0,
ilog(n) = <
          ( floor(log2(n))+1,  n > 0
]]></artwork>
</figure>
Examples:
<list style="symbols">
<t>ilog(-1) = 0</t>
<t>ilog(0) = 0</t>
<t>ilog(1) = 1</t>
<t>ilog(2) = 2</t>
<t>ilog(3) = 2</t>
<t>ilog(4) = 3</t>
<t>ilog(7) = 3</t>
</list>
</t>
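<t>
One straightforward way to compute ilog(n) with integer arithmetic only is to
 count how many right shifts are needed to reach zero.
The loop below is a sketch written for this explanation and is not taken from
 the reference implementation; it reproduces the examples listed above.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* ilog(n): 0 for n <= 0, otherwise floor(log2(n)) + 1.
   Counts the shifts needed to reduce a positive n to zero. */
int ilog(int n)
{
    int bits = 0;
    while (n > 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}

/* The examples listed in the text. */
void ilog_examples(void)
{
    assert(ilog(-1) == 0);
    assert(ilog(0) == 0);
    assert(ilog(1) == 1);
    assert(ilog(2) == 2);
    assert(ilog(3) == 2);
    assert(ilog(4) == 3);
    assert(ilog(7) == 3);
}
```
]]></artwork>
</figure>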
</section>

</section>

</section>

<section anchor="overview" title="Opus Codec Overview">

<t>
The Opus codec scales from 6&nbsp;kb/s narrowband mono speech to 510&nbsp;kb/s
 fullband stereo music, with algorithmic delays ranging from 5&nbsp;ms to
 65.2&nbsp;ms.
At any given time, either the LP layer, the MDCT layer, or both, may be active.
It can seamlessly switch between all of its various operating modes, giving it
 a great deal of flexibility to adapt to varying content and network
 conditions without renegotiating the current session.
The codec allows input and output of various audio bandwidths, defined as
 follows:
</t>
<texttable anchor="audio-bandwidth">
<ttcol>Abbreviation</ttcol>
<ttcol align="right">Audio Bandwidth</ttcol>
<ttcol align="right">Sample Rate (Effective)</ttcol>
<c>NB (narrowband)</c>       <c>4&nbsp;kHz</c>  <c>8&nbsp;kHz</c>
<c>MB (medium-band)</c>      <c>6&nbsp;kHz</c> <c>12&nbsp;kHz</c>
<c>WB (wideband)</c>         <c>8&nbsp;kHz</c> <c>16&nbsp;kHz</c>
<c>SWB (super-wideband)</c> <c>12&nbsp;kHz</c> <c>24&nbsp;kHz</c>
<c>FB (fullband)</c>        <c>20&nbsp;kHz (*)</c> <c>48&nbsp;kHz</c>
</texttable>
<t>
(*) Although the sampling theorem allows a bandwidth as large as half the
 sampling rate, Opus never codes audio above 20&nbsp;kHz, as that is the
 generally accepted upper limit of human hearing.
</t>

<t>
Opus defines super-wideband (SWB) with an effective sample rate of 24&nbsp;kHz,
 unlike some other audio coding standards that use 32&nbsp;kHz.
This was chosen for a number of reasons.
The band layout in the MDCT layer naturally allows skipping coefficients for
 frequencies over 12&nbsp;kHz, but does not allow cleanly dropping just those
 frequencies over 16&nbsp;kHz.
A sample rate of 24&nbsp;kHz also makes resampling in the MDCT layer easier,
 as 24 evenly divides 48, and when 24&nbsp;kHz is sufficient, it can save
 computation in other processing, such as Acoustic Echo Cancellation (AEC).
Experimental changes to the band layout to allow a 16&nbsp;kHz cutoff
 (32&nbsp;kHz effective sample rate) showed potential quality degradations at
 other sample rates, and at typical bitrates the number of bits saved by using
 such a cutoff instead of coding in fullband (FB) mode is very small.
Therefore, if an application wishes to process a signal sampled at 32&nbsp;kHz,
 it should just use FB.
</t>

<t>
The LP layer is based on the SILK codec
 <xref target="SILK"></xref>.
It supports NB, MB, or WB audio and frame sizes from 10&nbsp;ms to 60&nbsp;ms,
 and requires an additional 5&nbsp;ms look-ahead for noise shaping estimation.
A small additional delay (up to 1.5 ms) may be required for sampling rate
 conversion.
Like Vorbis <xref target='Vorbis-website'/> and many other modern codecs, SILK is inherently designed for
 variable-bitrate (VBR) coding, though the encoder can also produce
 constant-bitrate (CBR) streams.
The version of SILK used in Opus is substantially modified from, and not
 compatible with, the stand-alone SILK codec previously deployed by Skype.
This document does not serve to define that format, but those interested in the
 original SILK codec should see <xref target="SILK"/> instead.
</t>

<t>
The MDCT layer is based on the CELT codec <xref target="CELT"></xref>.
It supports NB, WB, SWB, or FB audio and frame sizes from 2.5&nbsp;ms to
 20&nbsp;ms, and requires an additional 2.5&nbsp;ms look-ahead due to the
 overlapping MDCT windows.
The CELT codec is inherently designed for CBR coding, but unlike many CBR
 codecs it is not limited to a set of predetermined rates.
It internally allocates bits to exactly fill any given target budget, and an
 encoder can produce a VBR stream by varying the target on a per-frame basis.
The MDCT layer is not used for speech when the audio bandwidth is WB or less,
 as it is not useful there.
On the other hand, non-speech signals are not always adequately coded using
 linear prediction, so for music only the MDCT layer should be used.
</t>

<t>
A "Hybrid" mode allows the use of both layers simultaneously with a frame size
 of 10&nbsp;or 20&nbsp;ms and a SWB or FB audio bandwidth.
The LP layer codes the low frequencies by resampling the signal down to WB.
The MDCT layer follows, coding the high frequency portion of the signal.
The cutoff between the two lies at 8&nbsp;kHz, the maximum WB audio bandwidth.
In the MDCT layer, all bands below 8&nbsp;kHz are discarded, so there is no
 coding redundancy between the two layers.
</t>

<t>
The sample rate (in contrast to the actual audio bandwidth) can be chosen
 independently on the encoder and decoder side, e.g., a fullband signal can be
 decoded as wideband, or vice versa.
This approach ensures a sender and receiver can always interoperate, regardless
 of the capabilities of their actual audio hardware.
Internally, the LP layer always operates at a sample rate of twice the audio
 bandwidth, up to a maximum of 16&nbsp;kHz, which it continues to use for SWB
 and FB.
The decoder simply resamples its output to support different sample rates.
The MDCT layer always operates internally at a sample rate of 48&nbsp;kHz.
Since all the supported sample rates evenly divide this rate, and since the
 decoder may easily zero out the high frequency portion of the spectrum in
 the frequency domain, it can simply decimate the MDCT layer output to achieve
 the other supported sample rates very cheaply.
</t>
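<t>
The internal sample rate rules above can be sketched as two small functions.
The names lp_internal_rate_hz() and mdct_decimation_factor() are hypothetical,
 and returning 0 for a rate that does not evenly divide 48&nbsp;kHz is a
 convention chosen for this sketch only.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* LP layer: twice the audio bandwidth, capped at 16 kHz. */
int lp_internal_rate_hz(int audio_bandwidth_hz)
{
    int rate = 2 * audio_bandwidth_hz;
    return rate < 16000 ? rate : 16000;
}

/* MDCT layer: always 48 kHz internally.  Returns the integer
   decimation factor for the requested output rate, or 0 if the rate
   does not evenly divide 48000.  Only divisibility is checked, not
   membership in the table of supported rates. */
int mdct_decimation_factor(int output_rate_hz)
{
    if (output_rate_hz <= 0 || 48000 % output_rate_hz != 0)
        return 0;
    return 48000 / output_rate_hz;
}

void sample_rate_examples(void)
{
    assert(lp_internal_rate_hz(4000) == 8000);    /* NB  */
    assert(lp_internal_rate_hz(6000) == 12000);   /* MB  */
    assert(lp_internal_rate_hz(8000) == 16000);   /* WB  */
    assert(lp_internal_rate_hz(12000) == 16000);  /* SWB */
    assert(lp_internal_rate_hz(20000) == 16000);  /* FB  */
    assert(mdct_decimation_factor(8000) == 6);
    assert(mdct_decimation_factor(24000) == 2);
    assert(mdct_decimation_factor(48000) == 1);
}
```
]]></artwork>
</figure>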

<t>
After conversion to the common, desired output sample rate, the decoder simply
 adds the output from the two layers together.
To compensate for the different look-ahead required by each layer, the CELT
 encoder input is delayed by an additional 2.7&nbsp;ms.
This ensures that low frequencies and high frequencies arrive at the same time.
This extra delay may be reduced by an encoder by using less look-ahead for noise
 shaping or using a simpler resampler in the LP layer, but this will reduce
 quality.
However, the base 2.5&nbsp;ms look-ahead in the CELT layer cannot be reduced in
 the encoder because it is needed for the MDCT overlap, whose size is fixed by
 the decoder.
</t>

<t>
Both layers use the same entropy coder, avoiding any waste from "padding bits"
 between them.
The hybrid approach makes it easy to support both CBR and VBR coding.
Although the LP layer is VBR, the bit allocation of the MDCT layer can produce
 a final stream that is CBR by using all the bits left unused by the LP layer.
</t>

<section title="Control Parameters">
<t>
The Opus codec includes a number of control parameters which can be changed dynamically during
regular operation of the codec, without interrupting the audio stream from the encoder to the decoder.
These parameters only affect the encoder since any impact they have on the bit-stream is signaled
in-band such that a decoder can decode any Opus stream without any out-of-band signaling. Any Opus
implementation can add or modify these control parameters without affecting interoperability. The most
important encoder control parameters in the reference encoder are listed below.
</t>

<section title="Bitrate" toc="exclude">
<t>
Opus supports all bitrates from 6&nbsp;kb/s to 510&nbsp;kb/s. All other parameters being
equal, a higher bitrate results in higher quality. For a frame size of 20&nbsp;ms, these
are the bitrate "sweet spots" for Opus in various configurations:
<list style="symbols">
<t>8-12 kb/s for NB speech,</t>
<t>16-20 kb/s for WB speech,</t>
<t>28-40 kb/s for FB speech,</t>
<t>48-64 kb/s for FB mono music, and</t>
<t>64-128 kb/s for FB stereo music.</t>
</list>
</t>
</section>

<section title="Number of Channels (Mono/Stereo)" toc="exclude">
<t>
Opus can transmit either mono or stereo frames within a single stream.
When decoding a mono frame in a stereo decoder, the left and right channels are
 identical, and when decoding a stereo frame in a mono decoder, the mono output
 is the average of the left and right channels.
In some cases, it is desirable to encode a stereo input stream in mono (e.g.,
 because the bitrate is too low to encode stereo with sufficient quality).
The number of channels encoded can be selected in real-time, but by default the
 reference encoder attempts to make the best decision possible given the
 current bitrate.
</t>
</section>

<section title="Audio Bandwidth" toc="exclude">
<t>
The audio bandwidths supported by Opus are listed in
 <xref target="audio-bandwidth"/>.
As with the number of channels, any decoder can decode audio encoded at
 any bandwidth.
For example, any Opus decoder operating at 8&nbsp;kHz can decode a FB Opus
 frame, and any Opus decoder operating at 48&nbsp;kHz can decode a NB frame.
Similarly, the reference encoder can take a 48&nbsp;kHz input signal and
 encode it as NB.
The higher the audio bandwidth, the higher the required bitrate to achieve
 acceptable quality.
The audio bandwidth can be explicitly specified in real-time, but by default
 the reference encoder attempts to make the best bandwidth decision possible
 given the current bitrate.
</t>
</section>


<section title="Frame Duration" toc="exclude">
<t>
Opus can encode frames of 2.5, 5, 10, 20, 40, or 60&nbsp;ms.
It can also combine multiple frames into packets of up to 120&nbsp;ms.
For real-time applications, sending fewer packets per second reduces the
 bitrate, since it reduces the overhead from IP, UDP, and RTP headers.
However, it increases latency and sensitivity to packet losses, as losing one
 packet constitutes a loss of a bigger chunk of audio.
Increasing the frame duration also slightly improves coding efficiency, but the
 gain becomes small for frame sizes above 20&nbsp;ms.
For this reason, 20&nbsp;ms frames are a good choice for most applications.
</t>
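<t>
To give a rough sense of the header overhead mentioned above, the sketch below
 computes the bitrate consumed by headers for a given packet duration.
It assumes 40 bytes of headers per packet (a 20-byte IPv4 header without
 options, an 8-byte UDP header, and a 12-byte RTP header without CSRC entries
 or extensions); these sizes are assumptions of the sketch, not values stated
 in this document, and the function name is hypothetical.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Assumed per-packet header size: IPv4 (20) + UDP (8) + RTP (12). */
#define ASSUMED_HEADER_BYTES 40

/* Header overhead in bits per second for one packet every
   packet_duration_us microseconds (e.g., 20000 for 20 ms).
   Integer division truncates the result. */
long header_overhead_bps(long packet_duration_us)
{
    return (ASSUMED_HEADER_BYTES * 8L * 1000000L) / packet_duration_us;
}

void overhead_examples(void)
{
    assert(header_overhead_bps(2500) == 128000);   /* 2.5 ms packets */
    assert(header_overhead_bps(20000) == 16000);   /* 20 ms packets  */
    assert(header_overhead_bps(60000) == 5333);    /* 60 ms packets  */
    assert(header_overhead_bps(120000) == 2666);   /* 120 ms packets */
}
```
]]></artwork>
</figure>
<t>
Under these assumptions, 20&nbsp;ms packets carry 16&nbsp;kb/s of header
 overhead, which is comparable to the payload bitrate for WB speech.
</t>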
</section>

<section title="Complexity" toc="exclude">
<t>
There are various aspects of the Opus encoding process where trade-offs
can be made between CPU complexity and quality/bitrate. In the reference
encoder, the complexity is selected using an integer from 0 to 10, where
0 is the lowest complexity and 10 is the highest. Examples of
computations for which such trade-offs may occur are:
<list style="symbols">
<t>The order of the pitch analysis whitening filter <xref target="Whitening"/>,</t>
<t>The order of the short-term noise shaping filter,</t>
<t>The number of states in delayed decision quantization of the
residual signal, and</t>
<t>The use of certain bit-stream features such as variable time-frequency
resolution and the pitch post-filter.</t>
</list>
</t>
</section>

<section title="Packet Loss Resilience" toc="exclude">
<t>
Audio codecs often exploit inter-frame correlations to reduce the
bitrate at a cost in error propagation: after losing one packet,
several packets need to be received before the decoder is able to
accurately reconstruct the speech signal.  The extent to which Opus
exploits inter-frame dependencies can be adjusted on the fly to
choose a trade-off between bitrate and amount of error propagation.
</t>
</section>

<section title="Forward Error Correction (FEC)" toc="exclude">
<t>
   Another mechanism providing robustness against packet loss is the in-band
   Forward Error Correction (FEC).  Packets that are determined to
   contain perceptually important speech information, such as onsets or
   transients, are encoded again at a lower bitrate and this re-encoded
   information is added to a subsequent packet.
</t>
</section>

<section title="Constant/Variable Bitrate" toc="exclude">
<t>
Opus is more efficient when operating with variable bitrate (VBR), which is
the default. However, in some (rare) applications, constant bitrate (CBR)
is required. There are two main reasons to operate in CBR mode:
<list style="symbols">
<t>When the transport only supports a fixed size for each compressed frame, or</t>
<t>When encryption is used for an audio stream that is either highly constrained
   (e.g., yes/no, recorded prompts) or highly sensitive <xref target="SRTP-VBR"></xref>.</t>
</list>

When low-latency transmission is required over a relatively slow connection,
constrained VBR can also be used. This uses VBR in a way that simulates a
"bit reservoir" and is equivalent to what MP3 (MPEG 1, Layer 3) and
AAC (Advanced Audio Coding) call CBR (i.e., not true
CBR due to the bit reservoir).
</t>
</section>

<section title="Discontinuous Transmission (DTX)" toc="exclude">
<t>
   Discontinuous Transmission (DTX) reduces the bitrate during silence
   or background noise.  When DTX is enabled, only one frame is encoded
   every 400 milliseconds.
</t>
</section>

</section>

</section>

<section anchor="modes" title="Internal Framing">

<t>
The Opus encoder produces "packets", which are each a contiguous set of bytes
 meant to be transmitted as a single unit.
The packets described here do not include such things as IP, UDP, or RTP
 headers, which are normally found in a transport-layer packet.
A single packet may contain multiple audio frames, so long as they share a
 common set of parameters, including the operating mode, audio bandwidth, frame
 size, and channel count (mono vs. stereo).
This section describes the possible combinations of these parameters and the
 internal framing used to pack multiple frames into a single packet.
This framing is not self-delimiting.
Instead, it assumes that a higher layer (such as UDP or RTP <xref target='RFC3550'/>
or Ogg <xref target='RFC3533'/> or Matroska <xref target='Matroska-website'/>)
 will communicate the length, in bytes, of the packet, and it uses this
 information to reduce the framing overhead in the packet itself.
A decoder implementation MUST support the framing described in this section.
An alternative, self-delimiting variant of the framing is described in
 <xref target="self-delimiting-framing"/>.
Support for that variant is OPTIONAL.
</t>

<t>
All bit diagrams in this document number the bits so that bit 0 is the most
 significant bit of the first byte, and bit 7 is the least significant.
Bit 8 is thus the most significant bit of the second byte, etc.
Well-formed Opus packets obey certain requirements, marked [R1] through [R7]
 below.
These are summarized in <xref target="malformed-packets"/> along with
 appropriate means of handling malformed packets.
</t>

<section anchor="toc_byte" title="The TOC Byte">
<t anchor="R1">
A well-formed Opus packet MUST contain at least one byte&nbsp;[R1].
This byte forms a table-of-contents (TOC) header that signals which of the
 various modes and configurations a given packet uses.
It is composed of a configuration number, "config", a stereo flag, "s", and a
 frame count code, "c", arranged as illustrated in
 <xref target="toc_byte_fig"/>.
A description of each of these fields follows.
</t>

<figure anchor="toc_byte_fig" title="The TOC Byte">
<artwork align="center"><![CDATA[
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
| config  |s| c |
+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe top five bits of the TOC byte, labeled "config", encode one of 32 possible
*a58d3d2aSXin Li configurations of operating mode, audio bandwidth, and frame size.
*a58d3d2aSXin LiAs described, the LP (SILK) layer and MDCT (CELT) layer can be combined in three possible
*a58d3d2aSXin Li operating modes:
*a58d3d2aSXin Li<list style="numbers">
*a58d3d2aSXin Li<t>A SILK-only mode for use in low bitrate connections with an audio bandwidth
*a58d3d2aSXin Li of WB or less,</t>
*a58d3d2aSXin Li<t>A Hybrid (SILK+CELT) mode for SWB or FB speech at medium bitrates, and</t>
*a58d3d2aSXin Li<t>A CELT-only mode for very low delay speech transmission as well as music
*a58d3d2aSXin Li transmission (NB to FB).</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiThe 32 possible configurations each identify which one of these operating modes
*a58d3d2aSXin Li the packet uses, as well as the audio bandwidth and the frame size.
*a58d3d2aSXin Li<xref target="config_bits"/> lists the parameters for each configuration.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<texttable anchor="config_bits" title="TOC Byte Configuration Parameters">
*a58d3d2aSXin Li<ttcol>Configuration Number(s)</ttcol>
*a58d3d2aSXin Li<ttcol>Mode</ttcol>
*a58d3d2aSXin Li<ttcol>Bandwidth</ttcol>
*a58d3d2aSXin Li<ttcol>Frame Sizes</ttcol>
*a58d3d2aSXin Li<c>0...3</c>   <c>SILK-only</c> <c>NB</c>  <c>10, 20, 40, 60&nbsp;ms</c>
*a58d3d2aSXin Li<c>4...7</c>   <c>SILK-only</c> <c>MB</c>  <c>10, 20, 40, 60&nbsp;ms</c>
*a58d3d2aSXin Li<c>8...11</c>  <c>SILK-only</c> <c>WB</c>  <c>10, 20, 40, 60&nbsp;ms</c>
*a58d3d2aSXin Li<c>12...13</c> <c>Hybrid</c>    <c>SWB</c> <c>10, 20&nbsp;ms</c>
*a58d3d2aSXin Li<c>14...15</c> <c>Hybrid</c>    <c>FB</c>  <c>10, 20&nbsp;ms</c>
*a58d3d2aSXin Li<c>16...19</c> <c>CELT-only</c> <c>NB</c>  <c>2.5, 5, 10, 20&nbsp;ms</c>
*a58d3d2aSXin Li<c>20...23</c> <c>CELT-only</c> <c>WB</c>  <c>2.5, 5, 10, 20&nbsp;ms</c>
*a58d3d2aSXin Li<c>24...27</c> <c>CELT-only</c> <c>SWB</c> <c>2.5, 5, 10, 20&nbsp;ms</c>
*a58d3d2aSXin Li<c>28...31</c> <c>CELT-only</c> <c>FB</c>  <c>2.5, 5, 10, 20&nbsp;ms</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe configuration numbers in each range (e.g., 0...3 for NB SILK-only)
*a58d3d2aSXin Li correspond to the various choices of frame size, in the same order.
*a58d3d2aSXin LiFor example, configuration 0 has a 10&nbsp;ms frame size and configuration 3
*a58d3d2aSXin Li has a 60&nbsp;ms frame size.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiOne additional bit, labeled "s", signals mono vs. stereo, with 0 indicating
*a58d3d2aSXin Li mono and 1 indicating stereo.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe remaining two bits of the TOC byte, labeled "c", code the number of frames
*a58d3d2aSXin Li per packet (codes 0 to 3) as follows:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>0:    1 frame in the packet</t>
*a58d3d2aSXin Li<t>1:    2 frames in the packet, each with equal compressed size</t>
*a58d3d2aSXin Li<t>2:    2 frames in the packet, with different compressed sizes</t>
*a58d3d2aSXin Li<t>3:    an arbitrary number of frames in the packet</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiThis draft refers to a packet as a code 0 packet, code 1 packet, etc., based on
*a58d3d2aSXin Li the value of "c".
*a58d3d2aSXin Li</t>
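<t>
As an informative illustration only, the TOC byte layout and the mapping in
 <xref target="config_bits"/> can be sketched in Python as follows.
The names CONFIG_TABLE and parse_toc are hypothetical and are not taken from
 the reference implementation.
</t>
<figure>
<artwork><![CDATA[
```python
# Informative sketch: split a TOC byte into its fields.
# (first config, last config, mode, bandwidth, frame sizes in ms)
CONFIG_TABLE = [
    (0, 3, "SILK-only", "NB", (10, 20, 40, 60)),
    (4, 7, "SILK-only", "MB", (10, 20, 40, 60)),
    (8, 11, "SILK-only", "WB", (10, 20, 40, 60)),
    (12, 13, "Hybrid", "SWB", (10, 20)),
    (14, 15, "Hybrid", "FB", (10, 20)),
    (16, 19, "CELT-only", "NB", (2.5, 5, 10, 20)),
    (20, 23, "CELT-only", "WB", (2.5, 5, 10, 20)),
    (24, 27, "CELT-only", "SWB", (2.5, 5, 10, 20)),
    (28, 31, "CELT-only", "FB", (2.5, 5, 10, 20)),
]


def parse_toc(toc):
    """Return the fields encoded in a TOC byte (an integer 0..255)."""
    if not 0 <= toc <= 255:
        raise ValueError("TOC must be a single byte")
    config = toc >> 3          # top five bits: "config"
    stereo = (toc >> 2) & 1    # next bit: "s"
    code = toc & 3             # low two bits: "c"
    for first, last, mode, bandwidth, sizes in CONFIG_TABLE:
        if first <= config <= last:
            # Frame sizes are listed in the same order as the
            # configuration numbers within each range.
            frame_ms = sizes[config - first]
            break
    return {
        "config": config,
        "mode": mode,
        "bandwidth": bandwidth,
        "frame_ms": frame_ms,
        "stereo": bool(stereo),
        "code": code,
    }
```
]]></artwork>
</figure>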
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Frame Packing">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThis section describes how frames are packed according to each possible value
*a58d3d2aSXin Li of "c" in the TOC byte.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="frame-length-coding" title="Frame Length Coding">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen a packet contains multiple VBR frames (i.e., code 2 or 3), the compressed
*a58d3d2aSXin Li length of one or more of these frames is indicated with a one- or two-byte
*a58d3d2aSXin Li sequence, with the meaning of the first byte as follows:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>0:          No frame (discontinuous transmission (DTX) or lost packet)</t>
*a58d3d2aSXin Li<t>1...251:    Length of the frame in bytes</t>
*a58d3d2aSXin Li<t>252...255:  A second byte is needed. The total length is (second_byte*4)+first_byte</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe special length 0 indicates that no frame is available, either because it
*a58d3d2aSXin Li was dropped during transmission by some intermediary or because the encoder
*a58d3d2aSXin Li chose not to transmit it.
*a58d3d2aSXin LiAny Opus frame in any mode MAY have a length of 0.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe maximum representable length is 255*4+255=1275&nbsp;bytes.
*a58d3d2aSXin LiFor 20&nbsp;ms frames, this represents a bitrate of 510&nbsp;kb/s, which is
*a58d3d2aSXin Li approximately the highest useful rate for lossily compressed fullband stereo
*a58d3d2aSXin Li music.
*a58d3d2aSXin LiBeyond this point, lossless codecs are more appropriate.
*a58d3d2aSXin LiIt is also roughly the maximum useful rate of the MDCT layer, as shortly
*a58d3d2aSXin Li thereafter quality no longer improves with additional bits due to limitations
*a58d3d2aSXin Li on the codebook sizes.
*a58d3d2aSXin Li</t>
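<t>
The length coding and the arithmetic above can be checked with a short,
 informative Python sketch.
The function names are hypothetical, and encode_frame_length shows only one
 possible way to invert the mapping; it is an assumption of this example
 rather than a requirement of this specification.
</t>
<figure>
<artwork><![CDATA[
```python
# Informative sketch of the one- or two-byte frame length coding.
MAX_FRAME_BYTES = 255 * 4 + 255           # largest representable length
# 20 ms frames means 50 frames per second, 8 bits per byte.
MAX_BITRATE_20MS = MAX_FRAME_BYTES * 8 * 50


def decode_frame_length(data, pos):
    """Return (length, next_pos) for the length field at data[pos]."""
    if pos >= len(data):
        raise ValueError("missing frame length")
    first = data[pos]
    if first < 252:
        # 0 means "no frame"; 1..251 is the length itself.
        return first, pos + 1
    if pos + 1 >= len(data):
        raise ValueError("missing second length byte")
    return data[pos + 1] * 4 + first, pos + 2


def encode_frame_length(length):
    """Return the length field for a frame of 0..1275 bytes."""
    if not 0 <= length <= MAX_FRAME_BYTES:
        raise ValueError("length out of range")
    if length < 252:
        return bytes([length])
    # Pick first in 252..255 so that (length - first) is a multiple of 4.
    first = 252 + (length & 3)
    return bytes([first, (length - first) // 4])
```
]]></artwork>
</figure>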
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t anchor="R2">
*a58d3d2aSXin LiNo length is transmitted for the last frame in a VBR packet, or for any of the
*a58d3d2aSXin Li frames in a CBR packet, as it can be inferred from the total size of the
*a58d3d2aSXin Li packet and the size of all other data in the packet.
*a58d3d2aSXin LiHowever, the length of any individual frame MUST NOT exceed
*a58d3d2aSXin Li 1275&nbsp;bytes&nbsp;[R2], to allow for repacketization by gateways,
*a58d3d2aSXin Li conference bridges, or other software.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Code 0: One Frame in the Packet">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor code&nbsp;0 packets, the TOC byte is immediately followed by N-1&nbsp;bytes
*a58d3d2aSXin Li of compressed data for a single frame (where N is the size of the packet),
*a58d3d2aSXin Li as illustrated in <xref target="code0_packet"/>.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<figure anchor="code0_packet" title="A Code 0 Packet" align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|0|0|                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+                                               |
*a58d3d2aSXin Li|                    Compressed frame 1 (N-1 bytes)...          :
*a58d3d2aSXin Li:                                                               |
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Code 1: Two Frames in the Packet, Each with Equal Compressed Size">
*a58d3d2aSXin Li<t anchor="R3">
*a58d3d2aSXin LiFor code 1 packets, the TOC byte is immediately followed by the
*a58d3d2aSXin Li (N-1)/2&nbsp;bytes of compressed data for the first frame, followed by
*a58d3d2aSXin Li (N-1)/2&nbsp;bytes of compressed data for the second frame, as illustrated in
*a58d3d2aSXin Li <xref target="code1_packet"/>.
*a58d3d2aSXin LiThe number of payload bytes available for compressed data, N-1, MUST be even
*a58d3d2aSXin Li for all code 1 packets&nbsp;[R3].
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<figure anchor="code1_packet" title="A Code 1 Packet" align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|0|1|                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+                                               :
*a58d3d2aSXin Li|             Compressed frame 1 ((N-1)/2 bytes)...             |
*a58d3d2aSXin Li:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                               |                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
*a58d3d2aSXin Li|             Compressed frame 2 ((N-1)/2 bytes)...             |
*a58d3d2aSXin Li:                                               +-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
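<t>
The code&nbsp;0 and code&nbsp;1 layouts need no length fields, so splitting
 such a packet reduces to a few checks.
The following informative Python sketch uses a hypothetical function name and
 reports violations of [R1], [R2], and [R3] by raising ValueError, which is a
 choice made for this example only.
</t>
<figure>
<artwork><![CDATA[
```python
# Informative sketch: extract the frames of a code 0 or code 1 packet.
def split_code0_or_code1(packet):
    """Return the list of compressed frames in the packet."""
    if len(packet) < 1:
        raise ValueError("[R1] packet is empty")
    code = packet[0] & 3
    payload = packet[1:]                  # the N-1 bytes after the TOC
    if code == 0:
        frames = [payload]
    elif code == 1:
        if len(payload) % 2:
            raise ValueError("[R3] code 1 payload length is odd")
        half = len(payload) // 2          # (N-1)/2 bytes per frame
        frames = [payload[:half], payload[half:]]
    else:
        raise ValueError("not a code 0 or code 1 packet")
    if any(len(frame) > 1275 for frame in frames):
        raise ValueError("[R2] frame longer than 1275 bytes")
    return frames
```
]]></artwork>
</figure>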
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Code 2: Two Frames in the Packet, with Different Compressed Sizes">
*a58d3d2aSXin Li<t anchor="R4">
*a58d3d2aSXin LiFor code 2 packets, the TOC byte is followed by a one- or two-byte sequence
*a58d3d2aSXin Li indicating the length of the first frame (marked N1 in <xref target='code2_packet'/>),
*a58d3d2aSXin Li followed by N1 bytes of compressed data for the first frame.
*a58d3d2aSXin LiThe remaining N-N1-2 or N-N1-3&nbsp;bytes are the compressed data for the
*a58d3d2aSXin Li second frame.
*a58d3d2aSXin LiThis is illustrated in <xref target="code2_packet"/>.
*a58d3d2aSXin LiA code 2 packet MUST contain enough bytes to represent a valid length.
*a58d3d2aSXin LiFor example, a 1-byte code 2 packet is always invalid, and a 2-byte code 2
*a58d3d2aSXin Li packet whose second byte is in the range 252...255 is also invalid.
*a58d3d2aSXin LiThe length of the first frame, N1, MUST also be no larger than the size of the
*a58d3d2aSXin Li payload remaining after decoding that length for all code 2 packets&nbsp;[R4].
*a58d3d2aSXin LiThis makes, for example, a 2-byte code 2 packet with a second byte in the range
*a58d3d2aSXin Li 1...251 invalid as well (the only valid 2-byte code 2 packet is one where the
*a58d3d2aSXin Li length of both frames is zero).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<figure anchor="code2_packet" title="A Code 2 Packet" align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|1|0| N1 (1-2 bytes):                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
*a58d3d2aSXin Li|               Compressed frame 1 (N1 bytes)...                |
*a58d3d2aSXin Li:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                               |                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
*a58d3d2aSXin Li|                     Compressed frame 2...                     :
*a58d3d2aSXin Li:                                                               |
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
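<t>
The checks required by [R4] can be sketched as follows.
This informative Python example uses a hypothetical function name; raising
 ValueError for a malformed packet is an assumption of the example.
</t>
<figure>
<artwork><![CDATA[
```python
# Informative sketch: extract the two frames of a code 2 packet.
def split_code2(packet):
    """Return [frame1, frame2], or raise ValueError if malformed."""
    if not packet or packet[0] & 3 != 2:
        raise ValueError("not a code 2 packet")
    if len(packet) < 2:
        raise ValueError("[R4] no room for a frame length")
    n1 = packet[1]
    pos = 2                               # first byte after the length
    if n1 >= 252:
        if len(packet) < 3:
            raise ValueError("[R4] two-byte length is truncated")
        n1 += packet[2] * 4
        pos = 3
    if n1 > len(packet) - pos:
        raise ValueError("[R4] first frame overruns the packet")
    # The second frame takes the remaining N-N1-2 or N-N1-3 bytes.
    frames = [packet[pos:pos + n1], packet[pos + n1:]]
    if len(frames[1]) > 1275:
        raise ValueError("[R2] frame longer than 1275 bytes")
    return frames
```
]]></artwork>
</figure>
<t>
With this sketch, the 2-byte packet consisting of a code&nbsp;2 TOC byte
 followed by a zero byte yields two empty frames, matching the only valid
 2-byte code&nbsp;2 packet described above.
</t>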
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Code 3: A Signaled Number of Frames in the Packet">
*a58d3d2aSXin Li<t anchor="R5">
*a58d3d2aSXin LiCode 3 packets signal the number of frames, as well as additional
*a58d3d2aSXin Li padding, called "Opus padding" to indicate that this padding is added at the
*a58d3d2aSXin Li Opus layer, rather than at the transport layer.
*a58d3d2aSXin LiCode 3 packets MUST have at least 2 bytes&nbsp;[R6,R7].
*a58d3d2aSXin LiThe TOC byte is followed by a byte encoding the number of frames in the packet
*a58d3d2aSXin Li in bits 2 to 7 (marked "M" in <xref target='frame_count_byte'/>), with bit 1 indicating whether
*a58d3d2aSXin Li or not Opus padding is inserted (marked "p" in <xref target='frame_count_byte'/>), and bit 0
*a58d3d2aSXin Li indicating VBR (marked "v" in <xref target='frame_count_byte'/>).
*a58d3d2aSXin LiM MUST NOT be zero, and the audio duration contained within a packet MUST NOT
*a58d3d2aSXin Li exceed 120&nbsp;ms&nbsp;[R5].
*a58d3d2aSXin LiThis limits the maximum frame count for any frame size to 48 (for 2.5&nbsp;ms
*a58d3d2aSXin Li frames), with lower limits for longer frame sizes.
*a58d3d2aSXin Li<xref target="frame_count_byte"/> illustrates the layout of the frame count
*a58d3d2aSXin Li byte.
*a58d3d2aSXin Li</t>
<figure anchor="frame_count_byte" title="The Frame Count Byte">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|v|p|     M     |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
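<t>
As an informative illustration, the frame count byte and the [R5] limits can
 be decoded as in the following Python sketch.
The function name is hypothetical, and the frame duration in milliseconds is
 assumed to have been derived from the TOC byte by the caller.
</t>
<figure>
<artwork><![CDATA[
```python
# Informative sketch: decode the frame count byte of a code 3 packet.
def parse_frame_count_byte(value, frame_ms):
    """Return (vbr, padding, m) and enforce the [R5] limits."""
    vbr = bool(value & 0x80)       # bit 0 (most significant): "v"
    padding = bool(value & 0x40)   # bit 1: "p"
    m = value & 0x3F               # bits 2 to 7: "M"
    if m == 0:
        raise ValueError("[R5] frame count is zero")
    if m * frame_ms > 120:
        raise ValueError("[R5] more than 120 ms of audio")
    return vbr, padding, m
```
]]></artwork>
</figure>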
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen Opus padding is used, the number of bytes of padding is encoded in the
*a58d3d2aSXin Li bytes following the frame count byte.
*a58d3d2aSXin LiValues from 0...254 indicate that 0...254&nbsp;bytes of padding are included,
*a58d3d2aSXin Li in addition to the byte(s) used to indicate the size of the padding.
*a58d3d2aSXin LiIf the value is 255, then the size of the additional padding is 254&nbsp;bytes,
*a58d3d2aSXin Li plus the padding value encoded in the next byte.
*a58d3d2aSXin LiThere MUST be at least one more byte in the packet in this case&nbsp;[R6,R7].
*a58d3d2aSXin LiThe additional padding bytes appear at the end of the packet, and MUST be set
*a58d3d2aSXin Li to zero by the encoder to avoid creating a covert channel.
*a58d3d2aSXin LiThe decoder MUST accept any value for the padding bytes, however.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAlthough this encoding provides multiple ways to indicate a given number of
*a58d3d2aSXin Li padding bytes, each uses a different number of bytes to indicate the padding
*a58d3d2aSXin Li size, and thus will increase the total packet size by a different amount.
*a58d3d2aSXin LiFor example, to add 255 bytes to a packet, set the padding bit, p, to 1, insert
*a58d3d2aSXin Li a single byte after the frame count byte with a value of 254, and append 254
*a58d3d2aSXin Li padding bytes with the value zero to the end of the packet.
*a58d3d2aSXin LiTo add 256 bytes to a packet, set the padding bit to 1, insert two bytes after
*a58d3d2aSXin Li the frame count byte with the values 255 and 0, respectively, and append 254
*a58d3d2aSXin Li padding bytes with the value zero to the end of the packet.
*a58d3d2aSXin LiBy using the value 255 multiple times, it is possible to create a packet of any
*a58d3d2aSXin Li specific, desired size.
*a58d3d2aSXin LiLet P be the number of header bytes used to indicate the padding size plus the
*a58d3d2aSXin Li number of padding bytes themselves (i.e., P is the total number of bytes added
*a58d3d2aSXin Li to the packet).
*a58d3d2aSXin LiThen P MUST be no more than N-2&nbsp;[R6,R7].
*a58d3d2aSXin Li</t>
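<t>
The padding length coding, including the two worked examples above, can be
 reproduced with the following informative Python sketch.
Both function names are hypothetical.
The encoder shown always uses the fewest possible length bytes, which is one
 choice among the several encodings that this section permits.
</t>
<figure>
<artwork><![CDATA[
```python
# Informative sketch of the Opus padding length coding.
def decode_padding(data, pos):
    """Return (padding_bytes, next_pos); data[pos] starts the field."""
    padding = 0
    while True:
        if pos >= len(data):
            raise ValueError("[R6,R7] padding length is truncated")
        value = data[pos]
        pos += 1
        if value == 255:
            padding += 254            # 254 bytes, then read another byte
        else:
            return padding + value, pos


def encode_padding(total):
    """Return (length_bytes, padding_bytes) adding exactly total bytes.

    total counts both the length bytes and the trailing zero bytes,
    i.e., it is the quantity called P in the text.
    """
    if total < 1:
        raise ValueError("at least one length byte is needed")
    # Each 255 byte accounts for itself plus 254 bytes of padding.
    n255, last = divmod(total - 1, 255)
    length_bytes = bytes([255] * n255 + [last])
    return length_bytes, bytes(254 * n255 + last)
```
]]></artwork>
</figure>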
*a58d3d2aSXin Li<t anchor="R6">
*a58d3d2aSXin LiIn the CBR case, let R=N-2-P be the number of bytes remaining in the packet
*a58d3d2aSXin Li after subtracting the (optional) padding.
*a58d3d2aSXin LiThen the compressed length of each frame in bytes is equal to R/M.
*a58d3d2aSXin LiThe value R MUST be a non-negative integer multiple of M&nbsp;[R6].
*a58d3d2aSXin LiThe compressed data for all M frames follows, each of size
*a58d3d2aSXin Li R/M&nbsp;bytes, as illustrated in <xref target="code3cbr_packet"/>.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="code3cbr_packet" title="A CBR Code 3 Packet" align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|1|1|0|p|     M     |  Padding length (Optional)    :
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame 1 (R/M bytes)...               :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame 2 (R/M bytes)...               :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:                              ...                              :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame M (R/M bytes)...               :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li:                  Opus Padding (Optional)...                   |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
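<t>
The CBR computation of R and of the per-frame size R/M can be sketched as
 follows.
This informative Python example uses a hypothetical function name and omits
 the 120&nbsp;ms duration check of [R5], which requires the frame size from
 the TOC byte.
</t>
<figure>
<artwork><![CDATA[
```python
# Informative sketch: extract the frames of a CBR code 3 packet.
def split_code3_cbr(packet):
    """Return the M equal-sized frames, or raise ValueError."""
    n = len(packet)
    if n < 2 or packet[0] & 3 != 3 or packet[1] & 0x80:
        raise ValueError("not a CBR code 3 packet")
    m = packet[1] & 0x3F
    if m == 0:
        raise ValueError("[R5] frame count is zero")
    pos = 2
    pad = 0
    if packet[1] & 0x40:              # "p" bit: padding length follows
        while True:
            if pos >= n:
                raise ValueError("[R6] padding length is truncated")
            value = packet[pos]
            pos += 1
            pad += 254 if value == 255 else value
            if value != 255:
                break
    p = (pos - 2) + pad               # length bytes plus trailing padding
    r = n - 2 - p
    if r < 0 or r % m:
        raise ValueError("[R6] R is not a non-negative multiple of M")
    size = r // m
    if size > 1275:
        raise ValueError("[R2] frame longer than 1275 bytes")
    return [packet[pos + i * size:pos + (i + 1) * size] for i in range(m)]
```
]]></artwork>
</figure>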
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t anchor="R7">
*a58d3d2aSXin LiIn the VBR case, the (optional) padding length is followed by M-1 frame
*a58d3d2aSXin Li lengths (indicated by "N1" to "N[M-1]" in <xref target='code3vbr_packet'/>), each encoded in a
*a58d3d2aSXin Li one- or two-byte sequence as described above.
*a58d3d2aSXin LiThe packet MUST contain enough data for the M-1 lengths after removing the
*a58d3d2aSXin Li (optional) padding, and the sum of these lengths MUST be no larger than the
*a58d3d2aSXin Li number of bytes remaining in the packet after decoding them&nbsp;[R7].
*a58d3d2aSXin LiThe compressed data for all M frames follows, each frame consisting of the
*a58d3d2aSXin Li indicated number of bytes, with the final frame consuming any remaining bytes
 before the final padding, as illustrated in <xref target="code3vbr_packet"/>.
*a58d3d2aSXin LiThe number of header bytes (TOC byte, frame count byte, padding length bytes,
*a58d3d2aSXin Li and frame length bytes), plus the signaled length of the first M-1 frames themselves,
*a58d3d2aSXin Li plus the signaled length of the padding MUST be no larger than N, the total size of the
*a58d3d2aSXin Li packet.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="code3vbr_packet" title="A VBR Code 3 Packet" align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|1|1|1|p|     M     | Padding length (Optional)     :
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li: N1 (1-2 bytes): N2 (1-2 bytes):     ...       :     N[M-1]    |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame 1 (N1 bytes)...                :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame 2 (N2 bytes)...                :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:                              ...                              :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:                     Compressed frame M...                     :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li:                  Opus Padding (Optional)...                   |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
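<t>
The VBR layout can be parsed along the same lines.
The following informative Python sketch uses a hypothetical function name,
 reports violations of [R7] by raising ValueError, and, like any parser that
 sees only the frame count byte, leaves the 120&nbsp;ms duration check of [R5]
 to the caller.
</t>
<figure>
<artwork><![CDATA[
```python
# Informative sketch: extract the frames of a VBR code 3 packet.
def split_code3_vbr(packet):
    """Return the M frames, or raise ValueError if malformed."""
    n = len(packet)
    if n < 2 or packet[0] & 3 != 3 or not packet[1] & 0x80:
        raise ValueError("not a VBR code 3 packet")
    m = packet[1] & 0x3F
    if m == 0:
        raise ValueError("[R5] frame count is zero")
    pos = 2
    pad = 0
    if packet[1] & 0x40:              # "p" bit: padding length follows
        while True:
            if pos >= n:
                raise ValueError("[R7] padding length is truncated")
            value = packet[pos]
            pos += 1
            pad += 254 if value == 255 else value
            if value != 255:
                break
    lengths = []
    for _ in range(m - 1):            # N1 .. N[M-1]
        if pos >= n:
            raise ValueError("[R7] frame length is missing")
        length = packet[pos]
        pos += 1
        if length >= 252:
            if pos >= n:
                raise ValueError("[R7] frame length is truncated")
            length += packet[pos] * 4
            pos += 1
        lengths.append(length)
    # The last frame takes whatever remains before the padding.
    last = n - pos - pad - sum(lengths)
    if last < 0:
        raise ValueError("[R7] signaled lengths exceed the packet size")
    if last > 1275:
        raise ValueError("[R2] frame longer than 1275 bytes")
    lengths.append(last)
    frames = []
    for length in lengths:
        frames.append(packet[pos:pos + length])
        pos += length
    return frames
```
]]></artwork>
</figure>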
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="examples" title="Examples">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSimplest case, one NB mono 20&nbsp;ms SILK frame:
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor='framing_example_1'>
*a58d3d2aSXin Li<artwork><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|    1    |0|0|0|               compressed data...              :
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTwo FB mono 5&nbsp;ms CELT frames of the same compressed size:
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor='framing_example_2'>
*a58d3d2aSXin Li<artwork><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|   29    |0|0|1|               compressed data...              :
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTwo FB mono 20&nbsp;ms Hybrid frames of different compressed size:
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor='framing_example_3'>
*a58d3d2aSXin Li<artwork><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|   15    |0|1|1|1|0|     2     |      N1       |               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
*a58d3d2aSXin Li|                       compressed data...                      :
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFour FB stereo 20&nbsp;ms CELT frames of the same compressed size:
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor='framing_example_4'>
*a58d3d2aSXin Li<artwork><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|   31    |1|1|1|0|0|     4     |      compressed data...       :
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
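<t>
For reference, the header bytes of the four examples above can be computed
 with the following informative Python sketch.
The helper names are hypothetical; the field values are taken from the figures
 in this section.
</t>
<figure>
<artwork><![CDATA[
```python
# Informative sketch: build the header bytes shown in the examples.
def make_toc(config, stereo, code):
    """Pack config (0..31), the stereo flag, and code (0..3)."""
    return (config << 3) | (int(stereo) << 2) | code


def make_frame_count_byte(vbr, padding, m):
    """Pack the "v" and "p" flags and the frame count M (1..48)."""
    return (int(vbr) << 7) | (int(padding) << 6) | m


EXAMPLE_HEADERS = [
    # One NB mono 20 ms SILK frame: 0x08
    bytes([make_toc(1, False, 0)]),
    # Two FB mono 5 ms CELT frames, same size: 0xE9
    bytes([make_toc(29, False, 1)]),
    # Two FB mono 20 ms Hybrid frames, different sizes: 0x7B 0x82
    bytes([make_toc(15, False, 3), make_frame_count_byte(True, False, 2)]),
    # Four FB stereo 20 ms CELT frames, same size: 0xFF 0x04
    bytes([make_toc(31, True, 3), make_frame_count_byte(False, False, 4)]),
]
```
]]></artwork>
</figure>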
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="malformed-packets" title="Receiving Malformed Packets">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiA receiver MUST NOT process packets which violate any of the rules above as
*a58d3d2aSXin Li normal Opus packets.
*a58d3d2aSXin LiThey are reserved for future applications, such as in-band headers (containing
*a58d3d2aSXin Li metadata, etc.).
*a58d3d2aSXin LiPackets which violate these constraints may cause implementations of
*a58d3d2aSXin Li <spanx style="emph">this</spanx> specification to treat them as malformed, and
*a58d3d2aSXin Li discard them.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThese constraints are summarized here for reference:
*a58d3d2aSXin Li<list style="format [R%d]">
*a58d3d2aSXin Li<t>Packets are at least one byte.</t>
*a58d3d2aSXin Li<t>No implicit frame length is larger than 1275 bytes.</t>
*a58d3d2aSXin Li<t>Code 1 packets have an odd total length, N, so that (N-1)/2 is an
*a58d3d2aSXin Li integer.</t>
*a58d3d2aSXin Li<t>Code 2 packets have enough bytes after the TOC for a valid frame
*a58d3d2aSXin Li length, and that length is no larger than the number of bytes remaining in the
*a58d3d2aSXin Li packet.</t>
*a58d3d2aSXin Li<t>Code 3 packets contain at least one frame, but no more than 120&nbsp;ms
*a58d3d2aSXin Li of audio total.</t>
*a58d3d2aSXin Li<t>The length of a CBR code 3 packet, N, is at least two bytes, the number of
*a58d3d2aSXin Li bytes added to indicate the padding size plus the trailing padding bytes
*a58d3d2aSXin Li themselves, P, is no more than N-2, and the frame count, M, satisfies
*a58d3d2aSXin Li the constraint that (N-2-P) is a non-negative integer multiple of M.</t>
*a58d3d2aSXin Li<t>VBR code 3 packets are large enough to contain all the header bytes (TOC
*a58d3d2aSXin Li byte, frame count byte, any padding length bytes, and any frame length bytes),
*a58d3d2aSXin Li plus the length of the first M-1 frames, plus any trailing padding bytes.</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Opus Decoder">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe Opus decoder consists of two main blocks: the SILK decoder and the CELT
*a58d3d2aSXin Li decoder.
*a58d3d2aSXin LiAt any given time, one or both of the SILK and CELT decoders may be active.
The output of the Opus decoder is the sum of the outputs from the SILK and CELT
*a58d3d2aSXin Li decoders with proper sample rate conversion and delay compensation on the SILK
*a58d3d2aSXin Li side, and optional decimation (when decoding to sample rates less than
*a58d3d2aSXin Li 48&nbsp;kHz) on the CELT side, as illustrated in the block diagram below.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<figure>
*a58d3d2aSXin Li<artwork>
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li                         +---------+    +------------+
*a58d3d2aSXin Li                         |  SILK   |    |   Sample   |
*a58d3d2aSXin Li                      +->| Decoder |--->|    Rate    |----+
*a58d3d2aSXin LiBit-    +---------+   |  |         |    | Conversion |    v
*a58d3d2aSXin Listream  |  Range  |---+  +---------+    +------------+  /---\  Audio
*a58d3d2aSXin Li------->| Decoder |                                     | + |------>
*a58d3d2aSXin Li        |         |---+  +---------+    +------------+  \---/
*a58d3d2aSXin Li        +---------+   |  |  CELT   |    | Decimation |    ^
*a58d3d2aSXin Li                      +->| Decoder |--->| (Optional) |----+
*a58d3d2aSXin Li                         |         |    |            |
*a58d3d2aSXin Li                         +---------+    +------------+
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="range-decoder" title="Range Decoder">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiOpus uses an entropy coder based on range coding <xref target="range-coding"></xref>
*a58d3d2aSXin Li<xref target="Martin79"></xref>,
*a58d3d2aSXin Liwhich is itself a rediscovery of the FIFO arithmetic code introduced by <xref target="coding-thesis"></xref>.
It is very similar to arithmetic encoding, except that encoding is done with
 digits in any base instead of with bits,
 so it is faster when using larger bases (i.e., one byte per digit).
All of the calculations in the range coder must use bit-exact integer arithmetic.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSymbols may also be coded as "raw bits" packed directly into the bitstream,
*a58d3d2aSXin Li bypassing the range coder.
*a58d3d2aSXin LiThese are packed backwards starting at the end of the frame, as illustrated in
*a58d3d2aSXin Li <xref target="rawbits-example"/>.
*a58d3d2aSXin LiThis reduces complexity and makes the stream more resilient to bit errors, as
*a58d3d2aSXin Li corruption in the raw bits will not desynchronize the decoding process, unlike
*a58d3d2aSXin Li corruption in the input to the range decoder.
*a58d3d2aSXin LiRaw bits are only used in the CELT layer.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="rawbits-example" title="Illustrative example of packing range
*a58d3d2aSXin Li coder and raw bits data">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| Range coder data (packed MSB to LSB) ->                       :
*a58d3d2aSXin Li+                                                               +
*a58d3d2aSXin Li:                                                               :
*a58d3d2aSXin Li+     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li:     | <- Boundary occurs at an arbitrary bit position         :
*a58d3d2aSXin Li+-+-+-+                                                         +
*a58d3d2aSXin Li:                          <- Raw bits data (packed LSB to MSB) |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiEach symbol coded by the range coder is drawn from a finite alphabet and coded
*a58d3d2aSXin Li in a separate "context", which describes the size of the alphabet and the
*a58d3d2aSXin Li relative frequency of each symbol in that alphabet.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSuppose there is a context with n symbols, identified with an index that ranges
*a58d3d2aSXin Li from 0 to n-1.
*a58d3d2aSXin LiThe parameters needed to encode or decode symbol k in this context are
*a58d3d2aSXin Li represented by a three-tuple (fl[k],&nbsp;fh[k],&nbsp;ft), with
*a58d3d2aSXin Li 0&nbsp;&lt;=&nbsp;fl[k]&nbsp;&lt;&nbsp;fh[k]&nbsp;&lt;=&nbsp;ft&nbsp;&lt;=&nbsp;65535.
*a58d3d2aSXin LiThe values of this tuple are derived from the probability model for the
*a58d3d2aSXin Li symbol, represented by traditional "frequency counts".
Because Opus uses static contexts, these are not updated as symbols are decoded.
*a58d3d2aSXin LiLet f[i] be the frequency of symbol i.
*a58d3d2aSXin LiThen the three-tuple corresponding to symbol k is given by
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li        k-1                                   n-1
*a58d3d2aSXin Li        __                                    __
*a58d3d2aSXin Lifl[k] = \  f[i],  fh[k] = fl[k] + f[k],  ft = \  f[i]
*a58d3d2aSXin Li        /_                                    /_
*a58d3d2aSXin Li        i=0                                   i=0
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
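<t>
As an informal illustration (not part of the reference implementation), the
 three-tuple can be computed directly from a list of frequency counts.
In the Python sketch below, the function name context_tuple() and its
 arguments are hypothetical names chosen for this example only.
</t>
<figure>
<artwork><![CDATA[
```python
def context_tuple(f, k):
    """Return (fl[k], fh[k], ft) for symbol k, given frequency counts f."""
    if not 0 <= k < len(f):
        raise IndexError("symbol index out of range")
    fl = sum(f[:k])   # cumulative frequency of symbols 0 .. k-1
    fh = fl + f[k]    # fl[k] + f[k]
    ft = sum(f)       # total frequency of the context
    if not 0 <= fl < fh <= ft <= 65535:
        raise ValueError("need 0 <= fl[k] < fh[k] <= ft <= 65535")
    return fl, fh, ft
```
]]></artwork>
</figure>
<t>
For example, a uniform context with four symbols of frequency 4 gives the
 tuple (8,&nbsp;12,&nbsp;16) for symbol 2.
</t>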
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe range decoder extracts the symbols and integers encoded using the range
*a58d3d2aSXin Li encoder in <xref target="range-encoder"/>.
*a58d3d2aSXin LiThe range decoder maintains an internal state vector composed of the two-tuple
*a58d3d2aSXin Li (val,&nbsp;rng), representing the difference between the high end of the
*a58d3d2aSXin Li current range and the actual coded value, minus one, and the size of the
*a58d3d2aSXin Li current range, respectively.
*a58d3d2aSXin LiBoth val and rng are 32-bit unsigned integer values.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="range-decoder-init" title="Range Decoder Initialization">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLet b0 be the first input byte (or zero if there are no bytes in this Opus
*a58d3d2aSXin Li frame).
*a58d3d2aSXin LiThe decoder initializes rng to 128 and initializes val to
*a58d3d2aSXin Li (127&nbsp;-&nbsp;(b0&gt;&gt;1)), where (b0&gt;&gt;1) is the top 7 bits of the
*a58d3d2aSXin Li first input byte.
*a58d3d2aSXin LiIt saves the remaining bit, (b0&amp;1), for use in the renormalization
*a58d3d2aSXin Li procedure described in <xref target="range-decoder-renorm"/>, which the
*a58d3d2aSXin Li decoder invokes immediately after initialization to read additional bits and
*a58d3d2aSXin Li establish the invariant that rng&nbsp;&gt;&nbsp;2**23.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="decoding-symbols" title="Decoding Symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiDecoding a symbol is a two-step process.
*a58d3d2aSXin LiThe first step determines a 16-bit unsigned value fs, which lies within the
*a58d3d2aSXin Li range of some symbol in the current context.
*a58d3d2aSXin LiThe second step updates the range decoder state with the three-tuple
*a58d3d2aSXin Li (fl[k],&nbsp;fh[k],&nbsp;ft) corresponding to that symbol.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe first step is implemented by ec_decode() (entdec.c), which computes
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li               val
*a58d3d2aSXin Lifs = ft - min(------ + 1, ft) .
*a58d3d2aSXin Li              rng/ft
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
The divisions here are integer divisions.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoder then identifies the symbol in the current context corresponding to
*a58d3d2aSXin Li fs; i.e., the value of k whose three-tuple (fl[k],&nbsp;fh[k],&nbsp;ft)
*a58d3d2aSXin Li satisfies fl[k]&nbsp;&lt;=&nbsp;fs&nbsp;&lt;&nbsp;fh[k].
*a58d3d2aSXin LiIt uses this tuple to update val according to
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li            rng
*a58d3d2aSXin Lival = val - --- * (ft - fh[k]) .
*a58d3d2aSXin Li            ft
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiIf fl[k] is greater than zero, then the decoder updates rng using
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li      rng
*a58d3d2aSXin Lirng = --- * (fh[k] - fl[k]) .
*a58d3d2aSXin Li      ft
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiOtherwise, it updates rng using
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li            rng
*a58d3d2aSXin Lirng = rng - --- * (ft - fh[k]) .
*a58d3d2aSXin Li            ft
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiUsing a special case for the first symbol (rather than the last symbol, as is
*a58d3d2aSXin Li commonly done in other arithmetic coders) ensures that all the truncation
*a58d3d2aSXin Li error from the finite precision arithmetic accumulates in symbol 0.
*a58d3d2aSXin LiThis makes the cost of coding a 0 slightly smaller, on average, than its
*a58d3d2aSXin Li estimated probability indicates and makes the cost of coding any other symbol
*a58d3d2aSXin Li slightly larger.
*a58d3d2aSXin LiWhen contexts are designed so that 0 is the most probable symbol, which is
*a58d3d2aSXin Li often the case, this strategy minimizes the inefficiency introduced by the
*a58d3d2aSXin Li finite precision.
*a58d3d2aSXin LiIt also makes some of the special-case decoding routines in
*a58d3d2aSXin Li <xref target="decoding-alternate"/> particularly simple.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter the updates, implemented by ec_dec_update() (entdec.c), the decoder
*a58d3d2aSXin Li normalizes the range using the procedure in the next section, and returns the
*a58d3d2aSXin Li index k.
*a58d3d2aSXin Li</t>
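<t>
The two steps above can be sketched as pure functions on the state
 (val,&nbsp;rng).
This Python sketch is illustrative only: it omits renormalization, it passes
 the state explicitly rather than through a decoder structure as entdec.c
 does, and the helper find_symbol(), which searches a list of frequency
 counts, is a hypothetical name introduced for the example.
</t>
<figure>
<artwork><![CDATA[
```python
def ec_decode(val, rng, ft):
    """First step: map the decoder state to a value fs in [0, ft)."""
    return ft - min(val // (rng // ft) + 1, ft)


def find_symbol(f, fs):
    """Find k with fl[k] <= fs < fh[k]; returns (k, fl[k], fh[k])."""
    fl = 0
    for k, count in enumerate(f):
        fh = fl + count
        if fs < fh:
            return k, fl, fh
        fl = fh
    raise ValueError("fs is not below the total frequency")


def ec_dec_update(val, rng, fl, fh, ft):
    """Second step: narrow (val, rng) to the range of the chosen symbol."""
    step = rng // ft              # integer division, as in the text
    val -= step * (ft - fh)
    if fl > 0:
        rng = step * (fh - fl)
    else:
        rng -= step * (ft - fh)   # symbol 0 absorbs the truncation error
    return val, rng
```
]]></artwork>
</figure>
<t>
Note that val measures the distance from the high end of the range, so
 val&nbsp;=&nbsp;0 selects the last symbol of the context rather than the
 first.
</t>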
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="range-decoder-renorm" title="Renormalization">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTo normalize the range, the decoder repeats the following process, implemented
*a58d3d2aSXin Li by ec_dec_normalize() (entdec.c), until rng&nbsp;&gt;&nbsp;2**23.
*a58d3d2aSXin LiIf rng is already greater than 2**23, the entire process is skipped.
*a58d3d2aSXin LiFirst, it sets rng to (rng&lt;&lt;8).
*a58d3d2aSXin LiThen it reads the next byte of the Opus frame and forms an 8-bit value sym,
*a58d3d2aSXin Li using the left-over bit buffered from the previous byte as the high bit
*a58d3d2aSXin Li and the top 7 bits of the byte just read as the other 7 bits of sym.
*a58d3d2aSXin LiThe remaining bit in the byte just read is buffered for use in the next
*a58d3d2aSXin Li iteration.
*a58d3d2aSXin LiIf no more input bytes remain, it uses zero bits instead.
*a58d3d2aSXin LiSee <xref target="range-decoder-init"/> for the initialization used to process
*a58d3d2aSXin Li the first byte.
*a58d3d2aSXin LiThen, it sets
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lival = ((val<<8) + (255-sym)) & 0x7FFFFFFF .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
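<t>
The initialization in <xref target="range-decoder-init"/> and the
 renormalization loop above can be sketched together as follows.
This Python sketch is illustrative only; the class name RangeDecoderState and
 its attribute names are assumptions made for the example and do not mirror
 the structure used in entdec.c.
</t>
<figure>
<artwork><![CDATA[
```python
class RangeDecoderState:
    """Holds (val, rng), the input position, and the left-over bit."""

    def __init__(self, frame):
        self.frame = bytes(frame)
        self.pos = 0
        b0 = self._next_byte()
        self.rng = 128
        self.val = 127 - (b0 >> 1)   # top 7 bits of the first byte
        self.leftover = b0 & 1       # saved for the first renormalization
        self.normalize()

    def _next_byte(self):
        # Once the frame is exhausted, the decoder reads zero bits.
        if self.pos < len(self.frame):
            byte = self.frame[self.pos]
            self.pos += 1
            return byte
        return 0

    def normalize(self):
        while self.rng <= 1 << 23:
            self.rng <<= 8
            byte = self._next_byte()
            # High bit comes from the previous byte, low 7 bits from this one.
            sym = (self.leftover << 7) | (byte >> 1)
            self.leftover = byte & 1
            self.val = ((self.val << 8) + (255 - sym)) & 0x7FFFFFFF
```
]]></artwork>
</figure>
<t>
With this sketch, an empty frame initializes to rng&nbsp;=&nbsp;2**31 and
 val&nbsp;=&nbsp;2**31&nbsp;-&nbsp;1 after three iterations of the loop.
</t>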
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIt is normal and expected that the range decoder will read several bytes
*a58d3d2aSXin Li into the raw bits data (if any) at the end of the packet by the time the frame
*a58d3d2aSXin Li is completely decoded, as illustrated in <xref target="finalize-example"/>.
*a58d3d2aSXin LiThis same data MUST also be returned as raw bits when requested.
*a58d3d2aSXin LiThe encoder is expected to terminate the stream in such a way that the decoder
*a58d3d2aSXin Li will decode the intended values regardless of the data contained in the raw
*a58d3d2aSXin Li bits.
*a58d3d2aSXin Li<xref target="encoder-finalizing"/> describes a procedure for doing this.
*a58d3d2aSXin LiIf the range decoder consumes all of the bytes belonging to the current frame,
*a58d3d2aSXin Li it MUST continue to use zero when any further input bytes are required, even
*a58d3d2aSXin Li if there is additional data in the current packet from padding or other
*a58d3d2aSXin Li frames.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="finalize-example" title="Illustrative example of raw bits
*a58d3d2aSXin Li overlapping range coder data">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li n              n+1             n+2             n+3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li:     | <----------- Overlap region ------------> |             :
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li      ^                                           ^
*a58d3d2aSXin Li      |   End of data buffered by the range coder |
*a58d3d2aSXin Li...-----------------------------------------------+
*a58d3d2aSXin Li      |
*a58d3d2aSXin Li      | End of data consumed by raw bits
*a58d3d2aSXin Li      +-------------------------------------------------------...
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="decoding-alternate" title="Alternate Decoding Methods">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe reference implementation uses three additional decoding methods that are
*a58d3d2aSXin Li exactly equivalent to the above, but make assumptions and simplifications that
*a58d3d2aSXin Li allow for a more efficient implementation.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<section anchor="ec_decode_bin" title="ec_decode_bin()">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe first is ec_decode_bin() (entdec.c), defined using the parameter ftb
*a58d3d2aSXin Li instead of ft.
*a58d3d2aSXin LiIt is mathematically equivalent to calling ec_decode() with
*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;ftb), but avoids one of the divisions.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li<section anchor="ec_dec_bit_logp" title="ec_dec_bit_logp()">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe next is ec_dec_bit_logp() (entdec.c), which decodes a single binary symbol,
*a58d3d2aSXin Li replacing both the ec_decode() and ec_dec_update() steps.
*a58d3d2aSXin LiThe context is described by a single parameter, logp, which is the absolute
*a58d3d2aSXin Li value of the base-2 logarithm of the probability of a "1".
*a58d3d2aSXin LiIt is mathematically equivalent to calling ec_decode() with
*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;logp), followed by ec_dec_update() with
*a58d3d2aSXin Li the 3-tuple (fl[k]&nbsp;=&nbsp;0,
*a58d3d2aSXin Li fh[k]&nbsp;=&nbsp;(1&lt;&lt;logp)&nbsp;-&nbsp;1,
*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;logp)) if the returned value
*a58d3d2aSXin Li of fs is less than (1&lt;&lt;logp)&nbsp;-&nbsp;1 (a "0" was decoded), and with
*a58d3d2aSXin Li (fl[k]&nbsp;=&nbsp;(1&lt;&lt;logp)&nbsp;-&nbsp;1,
*a58d3d2aSXin Li fh[k]&nbsp;=&nbsp;ft&nbsp;=&nbsp;(1&lt;&lt;logp)) otherwise (a "1" was
*a58d3d2aSXin Li decoded).
*a58d3d2aSXin LiThe implementation requires no multiplications or divisions.
*a58d3d2aSXin Li</t>
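<t>
A minimal Python sketch of this shortcut is given below for illustration.
It assumes the state is passed in and returned explicitly as
 (val,&nbsp;rng) and leaves out the renormalization that follows; neither
 choice reflects how entdec.c is organized.
</t>
<figure>
<artwork><![CDATA[
```python
def ec_dec_bit_logp(val, rng, logp):
    """Decode one binary symbol whose probability of "1" is 2**-logp.

    Returns (bit, val, rng) as they stand before renormalization.
    """
    s = rng >> logp        # size of the sub-range assigned to a "1"
    if val < s:
        return 1, val, s   # "1": val is unchanged, rng shrinks to s
    return 0, val - s, rng - s
```
]]></artwork>
</figure>
<t>
Only a shift, a comparison, and a subtraction are needed, which is why no
 multiplications or divisions appear.
</t>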
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li<section anchor="ec_dec_icdf" title="ec_dec_icdf()">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe last is ec_dec_icdf() (entdec.c), which decodes a single symbol with a
*a58d3d2aSXin Li table-based context of up to 8 bits, also replacing both the ec_decode() and
*a58d3d2aSXin Li ec_dec_update() steps, as well as the search for the decoded symbol in between.
*a58d3d2aSXin LiThe context is described by two parameters, an icdf
*a58d3d2aSXin Li ("inverse" cumulative distribution function) table and ftb.
*a58d3d2aSXin LiAs with ec_decode_bin(), (1&lt;&lt;ftb) is equivalent to ft.
icdf[k], on the other hand, stores (1&lt;&lt;ftb)-fh[k], which is equal to
*a58d3d2aSXin Li (1&lt;&lt;ftb)&nbsp;-&nbsp;fl[k+1].
*a58d3d2aSXin Lifl[0] is assumed to be 0, and the table is terminated by a value of 0 (where
*a58d3d2aSXin Li fh[k]&nbsp;==&nbsp;ft).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe function is mathematically equivalent to calling ec_decode() with
*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;ftb), using the returned value fs to search the table
*a58d3d2aSXin Li for the first entry where fs&nbsp;&lt;&nbsp;(1&lt;&lt;ftb)-icdf[k], and
*a58d3d2aSXin Li calling ec_dec_update() with
*a58d3d2aSXin Li fl[k]&nbsp;=&nbsp;(1&lt;&lt;ftb)&nbsp;-&nbsp;icdf[k-1] (or 0
 if k&nbsp;==&nbsp;0), fh[k]&nbsp;=&nbsp;(1&lt;&lt;ftb)&nbsp;-&nbsp;icdf[k],
*a58d3d2aSXin Li and ft&nbsp;=&nbsp;(1&lt;&lt;ftb).
*a58d3d2aSXin LiCombining the search with the update allows the division to be replaced by a
*a58d3d2aSXin Li series of multiplications (which are usually much cheaper), and using an
*a58d3d2aSXin Li inverse CDF allows the use of an ftb as large as 8 in an 8-bit table without
*a58d3d2aSXin Li any special cases.
*a58d3d2aSXin LiThis is the primary interface with the range decoder in the SILK layer, though
*a58d3d2aSXin Li it is used in a few places in the CELT layer as well.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAlthough icdf[k] is more convenient for the code, the frequency counts, f[k],
*a58d3d2aSXin Li are a more natural representation of the probability distribution function
*a58d3d2aSXin Li (PDF) for a given symbol.
Therefore, this draft lists the latter, not the former, when describing the
*a58d3d2aSXin Li context in which a symbol is coded as a list, e.g., {4, 4, 4, 4}/16 for a
*a58d3d2aSXin Li uniform context with four possible values and ft&nbsp;=&nbsp;16.
*a58d3d2aSXin LiThe value of ft after the slash is always the sum of the entries in the PDF,
*a58d3d2aSXin Li but is included for convenience.
*a58d3d2aSXin LiContexts with identical probabilities, f[k]/ft, but different values of ft
*a58d3d2aSXin Li (or equivalently, ftb) are not the same, and cannot, in general, be used in
*a58d3d2aSXin Li place of one another.
*a58d3d2aSXin LiAn icdf table is also not capable of representing a PDF where the first symbol
*a58d3d2aSXin Li has 0 probability.
*a58d3d2aSXin LiIn such contexts, ec_dec_icdf() can decode the symbol by using a table that
*a58d3d2aSXin Li drops the entries for any initial zero-probability values and adding the
*a58d3d2aSXin Li constant offset of the first value with a non-zero probability to its return
*a58d3d2aSXin Li value.
*a58d3d2aSXin Li</t>
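<t>
The relationship between a PDF and its icdf table, and the combined search
 and update, can be sketched in Python as follows.
The sketch is illustrative only: pdf_to_icdf() is a hypothetical helper that
 does not exist in the reference implementation, the state is passed
 explicitly as (val,&nbsp;rng), and renormalization is omitted.
</t>
<figure>
<artwork><![CDATA[
```python
def pdf_to_icdf(f, ftb):
    """Convert frequency counts f, summing to 1 << ftb, to an icdf table."""
    ft = 1 << ftb
    if sum(f) != ft or f[0] == 0:
        raise ValueError("counts must sum to 1 << ftb, with f[0] nonzero")
    icdf, fh = [], 0
    for count in f:
        fh += count
        icdf.append(ft - fh)   # icdf[k] = (1 << ftb) - fh[k]
    return icdf                # the final entry is always 0


def ec_dec_icdf(val, rng, icdf, ftb):
    """Search and update in one pass; returns (k, val, rng)."""
    r = rng >> ftb
    s = rng
    k = -1
    while True:
        k += 1
        t = s                  # top of symbol k's range (rng when k == 0)
        s = r * icdf[k]        # r * (ft - fh[k])
        if val >= s:
            break
    return k, val - s, t - s
```
]]></artwork>
</figure>
<t>
For the PDF {4, 4, 4, 4}/16, the sketch produces the table
 {12,&nbsp;8,&nbsp;4,&nbsp;0} with ftb&nbsp;=&nbsp;4.
</t>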
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="decoding-bits" title="Decoding Raw Bits">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe raw bits used by the CELT layer are packed at the end of the packet, with
*a58d3d2aSXin Li the least significant bit of the first value packed in the least significant
*a58d3d2aSXin Li bit of the last byte, filling up to the most significant bit in the last byte,
*a58d3d2aSXin Li continuing on to the least significant bit of the penultimate byte, and so on.
*a58d3d2aSXin LiThe reference implementation reads them using ec_dec_bits() (entdec.c).
*a58d3d2aSXin LiBecause the range decoder must read several bytes ahead in the stream, as
*a58d3d2aSXin Li described in <xref target="range-decoder-renorm"/>, the input consumed by the
*a58d3d2aSXin Li raw bits may overlap with the input consumed by the range coder, and a decoder
*a58d3d2aSXin Li MUST allow this.
*a58d3d2aSXin LiThe format should render it impossible to attempt to read more raw bits than
*a58d3d2aSXin Li there are actual bits in the frame, though a decoder may wish to check for
*a58d3d2aSXin Li this and report an error.
*a58d3d2aSXin Li</t>
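<t>
The packing order described above can be illustrated with a small Python
 sketch.
The class RawBitReader and its attribute names are hypothetical, and returning
 zero bits once the start of the frame has been passed is an assumption of the
 sketch, not a requirement stated in this section.
</t>
<figure>
<artwork><![CDATA[
```python
class RawBitReader:
    """Reads raw bits backwards, starting from the end of a frame."""

    def __init__(self, frame):
        self.frame = bytes(frame)
        self.end_offs = 0    # bytes consumed from the end so far
        self.window = 0      # buffered bits; the next bit is bit 0
        self.available = 0   # number of valid bits in window

    def read(self, bits):
        while self.available < bits:
            if self.end_offs < len(self.frame):
                self.end_offs += 1
                byte = self.frame[-self.end_offs]
            else:
                byte = 0     # assumption: zeros past the start of the frame
            self.window |= byte << self.available
            self.available += 8
        value = self.window & ((1 << bits) - 1)
        self.window >>= bits
        self.available -= bits
        return value
```
]]></artwork>
</figure>
<t>
For a frame ending in the bytes 0x34 0xA5, reading 4 bits and then 8 bits
 yields 0x5 followed by 0x4A.
</t>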
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="ec_dec_uint" title="Decoding Uniformly Distributed Integers">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe function ec_dec_uint() (entdec.c) decodes one of ft equiprobable values in
*a58d3d2aSXin Li the range 0 to (ft&nbsp;-&nbsp;1), inclusive, each with a frequency of 1,
*a58d3d2aSXin Li where ft may be as large as (2**32&nbsp;-&nbsp;1).
*a58d3d2aSXin LiBecause ec_decode() is limited to a total frequency of (2**16&nbsp;-&nbsp;1),
*a58d3d2aSXin Li it splits up the value into a range coded symbol representing up to 8 of the
*a58d3d2aSXin Li high bits, and, if necessary, raw bits representing the remainder of the
*a58d3d2aSXin Li value.
*a58d3d2aSXin LiThe limit of 8 bits in the range coded symbol is a trade-off between
*a58d3d2aSXin Li implementation complexity, modeling error (since the symbols no longer truly
*a58d3d2aSXin Li have equal coding cost), and rounding error introduced by the range coder
*a58d3d2aSXin Li itself (which gets larger as more bits are included).
*a58d3d2aSXin LiUsing raw bits reduces the maximum number of divisions required in the worst
*a58d3d2aSXin Li case, but means that it may be possible to decode a value outside the range
*a58d3d2aSXin Li 0 to (ft&nbsp;-&nbsp;1), inclusive.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Liec_dec_uint() takes a single, positive parameter, ft, which is not necessarily
*a58d3d2aSXin Li a power of two, and returns an integer, t, whose value lies between 0 and
*a58d3d2aSXin Li (ft&nbsp;-&nbsp;1), inclusive.
*a58d3d2aSXin LiLet ftb&nbsp;=&nbsp;ilog(ft&nbsp;-&nbsp;1), i.e., the number of bits required
*a58d3d2aSXin Li to store (ft&nbsp;-&nbsp;1) in two's complement notation.
*a58d3d2aSXin LiIf ftb is 8 or less, then t is decoded with t&nbsp;=&nbsp;ec_decode(ft), and
*a58d3d2aSXin Li the range coder state is updated using the three-tuple (t, t&nbsp;+&nbsp;1,
*a58d3d2aSXin Li ft).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf ftb is greater than 8, then the top 8 bits of t are decoded using
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lit = ec_decode(((ft - 1) >> (ftb - 8)) + 1) ,
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li the decoder state is updated using the three-tuple
*a58d3d2aSXin Li (t, t&nbsp;+&nbsp;1,
*a58d3d2aSXin Li ((ft&nbsp;-&nbsp;1)&nbsp;&gt;&gt;&nbsp;(ftb&nbsp;-&nbsp;8))&nbsp;+&nbsp;1),
*a58d3d2aSXin Li and the remaining bits are decoded as raw bits, setting
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lit = (t << (ftb - 8)) | ec_dec_bits(ftb - 8) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
If, at this point, t&nbsp;&gt;=&nbsp;ft, then the current frame is corrupt.
*a58d3d2aSXin LiIn that case, the decoder should assume there has been an error in the coding,
*a58d3d2aSXin Li decoding, or transmission and SHOULD take measures to conceal the
*a58d3d2aSXin Li error and/or report to the application that the error has occurred.
*a58d3d2aSXin Li</t>
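<t>
The split between the range coded high bits and the raw low bits can be
 sketched in Python as shown below.
This is an illustration under stated assumptions: the decoder is represented
 by any object offering decode(), update(), and bits() methods (hypothetical
 names), ScriptedDecoder is a stand-in that replays fixed values, and raising
 an exception is only one possible way to report a corrupt frame.
</t>
<figure>
<artwork><![CDATA[
```python
def ec_dec_uint(dec, ft):
    """Decode one of ft equiprobable values using the decoder object dec."""
    if ft < 1:
        raise ValueError("ft must be positive")
    ftb = (ft - 1).bit_length()         # ilog(ft - 1)
    if ftb <= 8:
        t = dec.decode(ft)
        dec.update(t, t + 1, ft)
        return t
    top = ((ft - 1) >> (ftb - 8)) + 1   # total frequency of the high part
    t = dec.decode(top)
    dec.update(t, t + 1, top)
    t = (t << (ftb - 8)) | dec.bits(ftb - 8)
    if t >= ft:
        raise ValueError("decoded value out of range: frame is corrupt")
    return t


class ScriptedDecoder:
    """Stand-in decoder that replays fixed values; for illustration only."""

    def __init__(self, symbol, raw):
        self.symbol, self.raw = symbol, raw
        self.calls = []

    def decode(self, ft):
        self.calls.append(("decode", ft))
        return self.symbol

    def update(self, fl, fh, ft):
        self.calls.append(("update", fl, fh, ft))

    def bits(self, n):
        self.calls.append(("bits", n))
        return self.raw
```
]]></artwork>
</figure>
<t>
With ft&nbsp;=&nbsp;1001, a range coded symbol of 250 followed by raw bits of
 3 would produce 1003, which shows how a corrupt frame can yield a value
 outside the permitted range.
</t>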
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="decoder-tell" title="Current Bit Usage">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe bit allocation routines in the CELT decoder need a conservative upper bound
*a58d3d2aSXin Li on the number of bits that have been used from the current frame thus far,
*a58d3d2aSXin Li including both range coder bits and raw bits.
*a58d3d2aSXin LiThis drives allocation decisions that must match those made in the encoder.
*a58d3d2aSXin LiThe upper bound is computed in the reference implementation to whole-bit
*a58d3d2aSXin Li precision by the function ec_tell() (entcode.h) and to fractional 1/8th bit
*a58d3d2aSXin Li precision by the function ec_tell_frac() (entcode.c).
*a58d3d2aSXin LiLike all operations in the range coder, it must be implemented in a bit-exact
*a58d3d2aSXin Li manner, and must produce exactly the same value returned by the same functions
*a58d3d2aSXin Li in the encoder after encoding the same symbols.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Liec_tell() is guaranteed to return ceil(ec_tell_frac()/8.0).
*a58d3d2aSXin LiIn various places the codec will check to ensure there is enough room to
*a58d3d2aSXin Li contain a symbol before attempting to decode it.
*a58d3d2aSXin LiIn practice, although the number of bits used so far is an upper bound,
*a58d3d2aSXin Li decoding a symbol whose probability model suggests it has a worst-case cost of
*a58d3d2aSXin Li p 1/8th bits may actually advance the return value of ec_tell_frac() by
*a58d3d2aSXin Li p-1, p, or p+1 1/8th bits, due to approximation error in that upper bound,
*a58d3d2aSXin Li truncation error in the range coder, and for large values of ft, modeling
*a58d3d2aSXin Li error in ec_dec_uint().
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiHowever, this error is bounded, and periodic calls to ec_tell() or
*a58d3d2aSXin Li ec_tell_frac() at precisely defined points in the decoding process prevent it
*a58d3d2aSXin Li from accumulating.
*a58d3d2aSXin LiFor a range coder symbol that requires a whole number of bits (i.e.,
*a58d3d2aSXin Li for which ft/(fh[k]&nbsp;-&nbsp;fl[k]) is a power of two), where there are at
*a58d3d2aSXin Li least p 1/8th bits available, decoding the symbol will never cause ec_tell() or
*a58d3d2aSXin Li ec_tell_frac() to exceed the size of the frame ("bust the budget").
*a58d3d2aSXin LiIn this case the return value of ec_tell_frac() will only advance by more than
*a58d3d2aSXin Li p 1/8th bits if there was an additional, fractional number of bits remaining,
*a58d3d2aSXin Li and it will never advance beyond the next whole-bit boundary, which is safe,
*a58d3d2aSXin Li since frames always contain a whole number of bits.
*a58d3d2aSXin LiHowever, when p is not a whole number of bits, an extra 1/8th bit is required
*a58d3d2aSXin Li to ensure that decoding the symbol will not bust the budget.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe reference implementation keeps track of the total number of whole bits that
*a58d3d2aSXin Li have been processed by the decoder so far in the variable nbits_total,
*a58d3d2aSXin Li including the (possibly fractional) number of bits that are currently
*a58d3d2aSXin Li buffered, but not consumed, inside the range coder.
*a58d3d2aSXin Linbits_total is initialized to 9 just before the initial range renormalization
*a58d3d2aSXin Li process completes (or equivalently, it can be initialized to 33 after the
*a58d3d2aSXin Li first renormalization).
*a58d3d2aSXin LiThe extra two bits over the actual amount buffered by the range coder
 guarantee that it is an upper bound and that there is enough room for the
*a58d3d2aSXin Li encoder to terminate the stream.
*a58d3d2aSXin LiEach iteration through the range coder's renormalization loop increases
*a58d3d2aSXin Li nbits_total by 8.
*a58d3d2aSXin LiReading raw bits increases nbits_total by the number of raw bits read.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="ec_tell" title="ec_tell()">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe whole number of bits buffered in rng may be estimated via lg = ilog(rng).
*a58d3d2aSXin Liec_tell() then becomes a simple matter of removing these bits from the total.
*a58d3d2aSXin LiIt returns (nbits_total - lg).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIn a newly initialized decoder, before any symbols have been read, this reports
*a58d3d2aSXin Li that 1 bit has been used.
*a58d3d2aSXin LiThis is the bit reserved for termination of the encoder.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="ec_tell_frac" title="ec_tell_frac()">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Liec_tell_frac() estimates the number of bits buffered in rng to fractional
*a58d3d2aSXin Li precision.
*a58d3d2aSXin LiSince rng must be greater than 2**23 after renormalization, lg must be at least
*a58d3d2aSXin Li 24.
*a58d3d2aSXin LiLet
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Lir_Q15 = rng >> (lg-16) ,
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li so that 32768 &lt;= r_Q15 &lt; 65536, an unsigned Q15 value representing the
*a58d3d2aSXin Li fractional part of rng.
*a58d3d2aSXin LiThen the following procedure can be used to add one bit of precision to lg.
*a58d3d2aSXin LiFirst, update
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Lir_Q15 = (r_Q15*r_Q15) >> 15 .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThen add the 16th bit of r_Q15 to lg via
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Lilg = 2*lg + (r_Q15 >> 16) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiFinally, if this bit was a 1, reduce r_Q15 by a factor of two via
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Lir_Q15 = r_Q15 >> 1 ,
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li so that it once again lies in the range 32768 &lt;= r_Q15 &lt; 65536.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThis procedure is repeated three times to extend lg to 1/8th bit precision.
*a58d3d2aSXin Liec_tell_frac() then returns (nbits_total*8 - lg).
*a58d3d2aSXin Li</t>
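<t>
Both bit-usage estimates can be sketched in Python as follows.
The sketch is illustrative only and assumes that nbits_total and rng are
 passed as plain integers rather than read from a decoder structure.
</t>
<figure>
<artwork><![CDATA[
```python
def ec_tell(nbits_total, rng):
    """Whole bits used so far: nbits_total - ilog(rng)."""
    return nbits_total - rng.bit_length()


def ec_tell_frac(nbits_total, rng):
    """Bits used so far, in units of 1/8th bit."""
    lg = rng.bit_length()          # ilog(rng)
    r_q15 = rng >> (lg - 16)       # 32768 <= r_q15 < 65536
    for _ in range(3):             # each pass adds one bit of precision
        r_q15 = (r_q15 * r_q15) >> 15
        bit = r_q15 >> 16
        lg = 2 * lg + bit
        r_q15 >>= bit              # halve r_q15 only if the bit was 1
    return nbits_total * 8 - lg
```
]]></artwork>
</figure>
<t>
For a newly initialized decoder (nbits_total&nbsp;=&nbsp;33,
 rng&nbsp;=&nbsp;2**31), these return 1 bit and 8 eighth-bits, respectively.
</t>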
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_decoder_outline" title="SILK Decoder">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoder's LP layer uses a modified version of the SILK codec (herein simply
*a58d3d2aSXin Li called "SILK"), which runs a decoded excitation signal through adaptive
*a58d3d2aSXin Li long-term and short-term prediction synthesis filters.
*a58d3d2aSXin LiIt runs at NB, MB, and WB sample rates internally.
*a58d3d2aSXin LiWhen used in a SWB or FB Hybrid frame, the LP layer itself still only runs in
*a58d3d2aSXin Li WB.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="SILK Decoder Modules">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAn overview of the decoder is given in <xref target="silk_decoder_figure"/>.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<figure align="center" anchor="silk_decoder_figure" title="SILK Decoder">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li   +---------+    +------------+
*a58d3d2aSXin Li-->| Range   |--->| Decode     |---------------------------+
*a58d3d2aSXin Li 1 | Decoder | 2  | Parameters |----------+       5        |
*a58d3d2aSXin Li   +---------+    +------------+     4    |                |
*a58d3d2aSXin Li                       3 |                |                |
*a58d3d2aSXin Li                        \/               \/               \/
*a58d3d2aSXin Li                  +------------+   +------------+   +------------+
*a58d3d2aSXin Li                  | Generate   |-->| LTP        |-->| LPC        |
*a58d3d2aSXin Li                  | Excitation |   | Synthesis  |   | Synthesis  |
*a58d3d2aSXin Li                  +------------+   +------------+   +------------+
*a58d3d2aSXin Li                                          ^                |
*a58d3d2aSXin Li                                          |                |
*a58d3d2aSXin Li                      +-------------------+----------------+
*a58d3d2aSXin Li                      |                                      6
*a58d3d2aSXin Li                      |   +------------+   +-------------+
*a58d3d2aSXin Li                      +-->| Stereo     |-->| Sample Rate |-->
*a58d3d2aSXin Li                          | Unmixing   | 7 | Conversion  | 8
*a58d3d2aSXin Li                          +------------+   +-------------+
*a58d3d2aSXin Li
*a58d3d2aSXin Li1: Range encoded bitstream
*a58d3d2aSXin Li2: Coded parameters
*a58d3d2aSXin Li3: Pulses, LSBs, and signs
*a58d3d2aSXin Li4: Pitch lags, Long-Term Prediction (LTP) coefficients
*a58d3d2aSXin Li5: Linear Predictive Coding (LPC) coefficients and gains
*a58d3d2aSXin Li6: Decoded signal (mono or mid-side stereo)
*a58d3d2aSXin Li7: Unmixed signal (mono or left-right stereo)
*a58d3d2aSXin Li8: Resampled signal
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoder feeds the bitstream (1) to the range decoder from
*a58d3d2aSXin Li <xref target="range-decoder"/>, and then decodes the parameters in it (2)
*a58d3d2aSXin Li using the procedures detailed in
*a58d3d2aSXin Li Sections&nbsp;<xref format="counter" target="silk_header_bits"/>
*a58d3d2aSXin Li through&nbsp;<xref format="counter" target="silk_signs"/>.
*a58d3d2aSXin LiThese parameters (3, 4, 5) are used to generate an excitation signal (see
*a58d3d2aSXin Li <xref target="silk_excitation_reconstruction"/>), which is fed to an optional
*a58d3d2aSXin Li long-term prediction (LTP) filter (voiced frames only, see
*a58d3d2aSXin Li <xref target="silk_ltp_synthesis"/>) and then a short-term prediction filter
*a58d3d2aSXin Li (see <xref target="silk_lpc_synthesis"/>), producing the decoded signal (6).
*a58d3d2aSXin LiFor stereo streams, the mid-side representation is converted to separate left
*a58d3d2aSXin Li and right channels (7).
*a58d3d2aSXin LiThe result is finally resampled to the desired output sample rate (e.g.,
*a58d3d2aSXin Li 48&nbsp;kHz) so that the resampled signal (8) can be mixed with the CELT
*a58d3d2aSXin Li layer.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_layer_organization" title="LP Layer Organization">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiInternally, the LP layer of a single Opus frame is composed of either a single
*a58d3d2aSXin Li 10&nbsp;ms regular SILK frame or between one and three 20&nbsp;ms regular SILK
*a58d3d2aSXin Li frames.
*a58d3d2aSXin LiA stereo Opus frame may double the number of regular SILK frames (up to a total
*a58d3d2aSXin Li of six), since it includes separate frames for a mid channel and, optionally,
*a58d3d2aSXin Li a side channel.
*a58d3d2aSXin LiOptional Low Bit-Rate Redundancy (LBRR) frames, which are reduced-bitrate
*a58d3d2aSXin Li encodings of previous SILK frames, may be included to aid in recovery from
*a58d3d2aSXin Li packet loss.
*a58d3d2aSXin LiIf present, these appear before the regular SILK frames.
*a58d3d2aSXin LiThey are in most respects identical to regular, active SILK frames, except that
*a58d3d2aSXin Li they are usually encoded with a lower bitrate.
*a58d3d2aSXin LiThis draft uses "SILK frame" to refer to either one and "regular SILK frame" if
*a58d3d2aSXin Li it needs to draw a distinction between the two.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLogically, each SILK frame is in turn composed of either two or four 5&nbsp;ms
*a58d3d2aSXin Li subframes.
*a58d3d2aSXin LiVarious parameters, such as the quantization gain of the excitation and the
 pitch lag and filter coefficients, can vary on a subframe-by-subframe basis.
*a58d3d2aSXin LiPhysically, the parameters for each subframe are interleaved in the bitstream,
*a58d3d2aSXin Li as described in the relevant sections for each parameter.
*a58d3d2aSXin Li</t>
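<t>
The frame and subframe counts described above can be summarized with a short
 Python sketch.
The function silk_frame_layout() is a hypothetical helper, the set of Opus
 frame durations it accepts (10, 20, 40, and 60&nbsp;ms) is inferred from the
 frame sizes listed above, and the count it returns for stereo is an upper
 bound, since side-channel frames are optional.
</t>
<figure>
<artwork><![CDATA[
```python
def silk_frame_layout(opus_frame_ms, stereo=False):
    """Return (max regular SILK frames, subframes per SILK frame)."""
    if opus_frame_ms == 10:
        frames, frame_ms = 1, 10
    elif opus_frame_ms in (20, 40, 60):
        frames, frame_ms = opus_frame_ms // 20, 20
    else:
        raise ValueError("unsupported LP layer frame duration")
    subframes = frame_ms // 5      # 5 ms subframes: two or four
    channels = 2 if stereo else 1  # mid, plus an optional side channel
    return frames * channels, subframes
```
]]></artwork>
</figure>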
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAll of these frames and subframes are decoded from the same range coder, with
*a58d3d2aSXin Li no padding between them.
*a58d3d2aSXin LiThus packing multiple SILK frames in a single Opus frame saves, on average,
*a58d3d2aSXin Li half a byte per SILK frame.
*a58d3d2aSXin LiIt also allows some parameters to be predicted from prior SILK frames in the
*a58d3d2aSXin Li same Opus frame, since this does not degrade packet loss robustness (beyond
*a58d3d2aSXin Li any penalty for merely using fewer, larger packets to store multiple frames).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiStereo support in SILK uses a variant of mid-side coding, allowing a mono
*a58d3d2aSXin Li decoder to simply decode the mid channel.
*a58d3d2aSXin LiHowever, the data for the two channels is interleaved, so a mono decoder must
*a58d3d2aSXin Li still unpack the data for the side channel.
*a58d3d2aSXin LiIt would be required to do so anyway for Hybrid Opus frames, or to support
*a58d3d2aSXin Li decoding individual 20&nbsp;ms frames.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Li<xref target="silk_symbols"/> summarizes the overall grouping of the contents of
*a58d3d2aSXin Li the LP layer.
*a58d3d2aSXin LiFigures&nbsp;<xref format="counter" target="silk_mono_60ms_frame"/>
*a58d3d2aSXin Li and&nbsp;<xref format="counter" target="silk_stereo_60ms_frame"/> illustrate
 the ordering of the various SILK frames for a 60&nbsp;ms Opus frame, for mono
 and stereo, respectively.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_symbols"
*a58d3d2aSXin Li title="Organization of the SILK layer of an Opus frame">
*a58d3d2aSXin Li<ttcol align="center">Symbol(s)</ttcol>
*a58d3d2aSXin Li<ttcol align="center">PDF(s)</ttcol>
*a58d3d2aSXin Li<ttcol align="center">Condition</ttcol>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Voice Activity Detection (VAD) flags</c>
*a58d3d2aSXin Li<c>{1, 1}/2</c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>LBRR flag</c>
*a58d3d2aSXin Li<c>{1, 1}/2</c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Per-frame LBRR flags</c>
*a58d3d2aSXin Li<c><xref target="silk_lbrr_flag_pdfs"/></c>
*a58d3d2aSXin Li<c><xref target="silk_lbrr_flags"/></c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>LBRR Frame(s)</c>
*a58d3d2aSXin Li<c><xref target="silk_frame"/></c>
*a58d3d2aSXin Li<c><xref target="silk_lbrr_flags"/></c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Regular SILK Frame(s)</c>
*a58d3d2aSXin Li<c><xref target="silk_frame"/></c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure align="center" anchor="silk_mono_60ms_frame"
*a58d3d2aSXin Li title="A 60&nbsp;ms Mono Frame">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li|            VAD Flags            |
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li|            LBRR Flag            |
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li| Per-Frame LBRR Flags (Optional) |
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li|     LBRR Frame 1 (Optional)     |
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li|     LBRR Frame 2 (Optional)     |
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li|     LBRR Frame 3 (Optional)     |
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li|      Regular SILK Frame 1       |
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li|      Regular SILK Frame 2       |
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li|      Regular SILK Frame 3       |
*a58d3d2aSXin Li+---------------------------------+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure align="center" anchor="silk_stereo_60ms_frame"
*a58d3d2aSXin Li title="A 60&nbsp;ms Stereo Frame">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|             Mid VAD Flags             |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|             Mid LBRR Flag             |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|             Side VAD Flags            |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|             Side LBRR Flag            |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|  Mid Per-Frame LBRR Flags (Optional)  |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li| Side Per-Frame LBRR Flags (Optional)  |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|     Mid LBRR Frame 1 (Optional)       |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|     Side LBRR Frame 1 (Optional)      |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|     Mid LBRR Frame 2 (Optional)       |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|     Side LBRR Frame 2 (Optional)      |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|     Mid LBRR Frame 3 (Optional)       |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|     Side LBRR Frame 3 (Optional)      |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|      Mid Regular SILK Frame 1         |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li| Side Regular SILK Frame 1 (Optional)  |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|      Mid Regular SILK Frame 2         |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li| Side Regular SILK Frame 2 (Optional)  |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li|      Mid Regular SILK Frame 3         |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li| Side Regular SILK Frame 3 (Optional)  |
*a58d3d2aSXin Li+---------------------------------------+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
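<t>
The ordering shown in the two figures above can be enumerated with a small,
 non-normative C sketch.
It is not taken from the reference implementation: the names silk_slot,
 silk_slot_kind, and silk_frame_order() are hypothetical, and the routine
 assumes that the per-frame LBRR flags and the mid-only decisions are already
 known.
In an actual decoder, the mid-only decision for an interval only becomes known
 while parsing the corresponding mid frame.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>

/* Hypothetical types, for illustration only. */
typedef enum { SLOT_LBRR, SLOT_REGULAR } silk_slot_kind;

typedef struct {
    silk_slot_kind kind;  /* LBRR or regular SILK frame */
    int channel;          /* 0 = mid (or mono), 1 = side */
    int interval;         /* index of the SILK frame interval, 0..2 */
} silk_slot;

/*
 * Lists the SILK frames in the order they follow the header bits and
 * per-frame LBRR flags. Returns the number of slots written (at most 12).
 *   channels     1 (mono) or 2 (stereo)
 *   nframes      SILK frames per channel (1, 2, or 3)
 *   lbrr[c][i]   nonzero if channel c has an LBRR frame for interval i
 *   mid_only[i]  nonzero if the side regular frame of interval i is skipped
 */
size_t silk_frame_order(int channels, int nframes,
                        int lbrr[2][3], int mid_only[3],
                        silk_slot out[12])
{
    size_t n = 0;
    int i, c;

    /* LBRR frames come first, mid and side interleaved per interval. */
    for (i = 0; i < nframes; i++) {
        for (c = 0; c < channels; c++) {
            if (lbrr[c][i]) {
                out[n].kind = SLOT_LBRR;
                out[n].channel = c;
                out[n].interval = i;
                n++;
            }
        }
    }

    /* Regular frames follow: always for mid, optionally for side. */
    for (i = 0; i < nframes; i++) {
        for (c = 0; c < channels; c++) {
            if (c == 1 && mid_only[i])
                continue;
            out[n].kind = SLOT_REGULAR;
            out[n].channel = c;
            out[n].interval = i;
            n++;
        }
    }
    return n;
}
```
]]></artwork>
</figure>
<t>
For a 60&nbsp;ms stereo frame in which every LBRR frame is present and no side
 frame is skipped, this sketch produces the twelve frame slots of
 <xref target="silk_stereo_60ms_frame"/> in the same order.
</t>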
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_header_bits" title="Header Bits">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe LP layer begins with two to eight header bits, decoded in silk_Decode()
*a58d3d2aSXin Li (dec_API.c).
*a58d3d2aSXin LiThese consist of one Voice Activity Detection (VAD) bit per frame (up to 3),
*a58d3d2aSXin Li followed by a single flag indicating the presence of LBRR frames.
*a58d3d2aSXin LiFor a stereo packet, these first flags correspond to the mid channel, and a
*a58d3d2aSXin Li second set of flags is included for the side channel.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiBecause these are the first symbols decoded by the range coder and because they
*a58d3d2aSXin Li are coded as binary values with uniform probability, they can be extracted
*a58d3d2aSXin Li directly from the most significant bits of the first byte of compressed data.
*a58d3d2aSXin LiThus, a receiver can determine if an Opus frame contains any active SILK frames
*a58d3d2aSXin Li without the overhead of using the range decoder.
*a58d3d2aSXin Li</t>
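<t>
As a non-normative illustration, the following C sketch reads the header
 flags from the first byte without invoking the range decoder.
The type silk_header_flags and the function silk_peek_header_flags() are
 hypothetical names, not part of the reference implementation.
The sketch also assumes that a set bit corresponds to a set flag and that the
 caller already knows the channel count and the number of SILK frames per
 channel from the TOC byte.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Hypothetical container for the SILK header flags of one Opus frame. */
typedef struct {
    int vad[2][3];  /* vad[channel][frame]; channel 0 = mid or mono */
    int lbrr[2];    /* global LBRR flag of each channel */
} silk_header_flags;

/*
 * Reads the header flags straight from the first byte of compressed data.
 * Assumption: a set bit means a set flag.
 *   channels  1 (mono) or 2 (stereo)
 *   nframes   SILK frames per channel (1, 2, or 3)
 * Returns the number of header bits consumed (2 to 8).
 */
int silk_peek_header_flags(uint8_t first_byte, int channels, int nframes,
                           silk_header_flags *out)
{
    int bit = 7;  /* start at the most significant bit */
    int c, f;

    for (c = 0; c < channels; c++) {
        /* One VAD flag per SILK frame, then the channel's LBRR flag. */
        for (f = 0; f < nframes; f++)
            out->vad[c][f] = (first_byte >> bit--) & 1;
        out->lbrr[c] = (first_byte >> bit--) & 1;
    }
    return 7 - bit;
}
```
]]></artwork>
</figure>
<t>
Under these assumptions, a 60&nbsp;ms stereo frame whose first byte is 0xA5
 has mid VAD flags 1, 0, 1, a cleared mid LBRR flag, side VAD flags 0, 1, 0,
 and a set side LBRR flag, using all eight bits.
</t>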
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_lbrr_flags" title="Per-Frame LBRR Flags">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor Opus frames longer than 20&nbsp;ms, a set of LBRR flags is
*a58d3d2aSXin Li decoded for each channel that has its LBRR flag set.
*a58d3d2aSXin LiEach set contains one flag per 20&nbsp;ms SILK frame.
*a58d3d2aSXin Li40&nbsp;ms Opus frames use the 2-frame LBRR flag PDF from
*a58d3d2aSXin Li <xref target="silk_lbrr_flag_pdfs"/>, and 60&nbsp;ms Opus frames use the
*a58d3d2aSXin Li 3-frame LBRR flag PDF.
*a58d3d2aSXin LiFor each channel, the resulting 2- or 3-bit integer contains the corresponding
*a58d3d2aSXin Li LBRR flag for each frame, packed in order from the LSB to the MSB.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_lbrr_flag_pdfs" title="LBRR Flag PDFs">
*a58d3d2aSXin Li<ttcol>Frame Size</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>40&nbsp;ms</c> <c>{0, 53, 53, 150}/256</c>
*a58d3d2aSXin Li<c>60&nbsp;ms</c> <c>{0, 41, 20, 29, 41, 15, 28, 82}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiA 10&nbsp;or 20&nbsp;ms Opus frame does not contain any per-frame LBRR flags,
*a58d3d2aSXin Li as there may be at most one LBRR frame per channel.
*a58d3d2aSXin LiThe global LBRR flag in the header bits (see <xref target="silk_header_bits"/>)
*a58d3d2aSXin Li is already sufficient to indicate the presence of that single LBRR frame.
*a58d3d2aSXin Li</t>
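<t>
The unpacking described in this section can be sketched in C as follows.
This is an illustration only: silk_unpack_lbrr_flags() is a hypothetical
 helper, and it takes the already-decoded symbol as input rather than calling
 the range decoder.
</t>
<figure>
<artwork><![CDATA[
```c
/*
 * Expands one channel's LBRR information into one flag per SILK frame.
 *   global_lbrr  the channel's LBRR flag from the header bits
 *   nframes      SILK frames per channel: 1 (10 or 20 ms), 2 (40 ms),
 *                or 3 (60 ms)
 *   symbol       value decoded with the 2- or 3-frame PDF; ignored when
 *                nframes is 1 or the global flag is clear
 *   flags        receives one flag per frame; unused entries are zeroed
 */
void silk_unpack_lbrr_flags(int global_lbrr, int nframes, int symbol,
                            int flags[3])
{
    int i;

    for (i = 0; i < 3; i++)
        flags[i] = 0;
    if (!global_lbrr)
        return;
    if (nframes == 1) {
        /* No per-frame flags are coded; the global flag is sufficient. */
        flags[0] = 1;
        return;
    }
    /* The first frame's flag is in the LSB, the last frame's in the MSB. */
    for (i = 0; i < nframes; i++)
        flags[i] = (symbol >> i) & 1;
}
```
]]></artwork>
</figure>
<t>
Both PDFs in <xref target="silk_lbrr_flag_pdfs"/> assign zero probability to
 the value 0, so a channel whose LBRR flag is set always has at least one
 per-frame flag set.
</t>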
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_lbrr_frames" title="LBRR Frames">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe LBRR frames, if present, contain an encoded representation of the signal
*a58d3d2aSXin Li immediately prior to the current Opus frame as if it were encoded with the
*a58d3d2aSXin Li current mode, frame size, audio bandwidth, and channel count, even if those
*a58d3d2aSXin Li differ from the prior Opus frame.
*a58d3d2aSXin LiWhen one of these parameters changes from one Opus frame to the next, this
*a58d3d2aSXin Li implies that the LBRR frames of the current Opus frame may not be simple
*a58d3d2aSXin Li drop-in replacements for the contents of the previous Opus frame.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor example, when switching from 20&nbsp;ms to 60&nbsp;ms, the 60&nbsp;ms Opus
*a58d3d2aSXin Li frame may contain LBRR frames covering up to three prior 20&nbsp;ms Opus
*a58d3d2aSXin Li frames, even if those frames already contained LBRR frames covering some of
*a58d3d2aSXin Li the same time periods.
*a58d3d2aSXin LiWhen switching from 20&nbsp;ms to 10&nbsp;ms, the 10&nbsp;ms Opus frame can
*a58d3d2aSXin Li contain an LBRR frame covering at most half the prior 20&nbsp;ms Opus frame,
*a58d3d2aSXin Li potentially leaving a hole that needs to be concealed from even a single
*a58d3d2aSXin Li packet loss (see <xref target="Packet Loss Concealment"/>).
*a58d3d2aSXin LiWhen switching from mono to stereo, the LBRR frames in the first stereo Opus
*a58d3d2aSXin Li frame MAY contain a non-trivial side channel.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIn order to properly produce LBRR frames under all conditions, an encoder might
*a58d3d2aSXin Li need to buffer up to 60&nbsp;ms of audio and re-encode it during these
*a58d3d2aSXin Li transitions.
*a58d3d2aSXin LiHowever, the reference implementation opts to disable LBRR frames at the
*a58d3d2aSXin Li transition point for simplicity.
*a58d3d2aSXin LiSince transitions are relatively infrequent in normal usage, this does not have
*a58d3d2aSXin Li a significant impact on packet loss robustness.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe LBRR frames immediately follow the LBRR flags, prior to any regular SILK
*a58d3d2aSXin Li frames.
*a58d3d2aSXin Li<xref target="silk_frame"/> describes their exact contents.
*a58d3d2aSXin LiLBRR frames do not include their own separate VAD flags.
LBRR frames are only meant to be transmitted for active speech; thus, all LBRR
 frames are treated as active.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIn a stereo Opus frame longer than 20&nbsp;ms, although the per-frame LBRR
*a58d3d2aSXin Li flags for the mid channel are coded as a unit before the per-frame LBRR flags
*a58d3d2aSXin Li for the side channel, the LBRR frames themselves are interleaved.
*a58d3d2aSXin LiThe decoder parses an LBRR frame for the mid channel of a given 20&nbsp;ms
*a58d3d2aSXin Li interval (if present) and then immediately parses the corresponding LBRR
*a58d3d2aSXin Li frame for the side channel (if present), before proceeding to the next
*a58d3d2aSXin Li 20&nbsp;ms interval.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_regular_frames" title="Regular SILK Frames">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe regular SILK frame(s) follow the LBRR frames (if any).
*a58d3d2aSXin Li<xref target="silk_frame"/> describes their contents, as well.
*a58d3d2aSXin LiUnlike the LBRR frames, a regular SILK frame is coded for each time interval in
*a58d3d2aSXin Li an Opus frame, even if the corresponding VAD flags are unset.
*a58d3d2aSXin LiFor stereo Opus frames longer than 20&nbsp;ms, the regular mid and side SILK
*a58d3d2aSXin Li frames for each 20&nbsp;ms interval are interleaved, just as with the LBRR
*a58d3d2aSXin Li frames.
*a58d3d2aSXin LiThe side frame may be skipped by coding an appropriate flag, as detailed in
*a58d3d2aSXin Li <xref target="silk_mid_only_flag"/>.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_frame" title="SILK Frame Contents">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiEach SILK frame includes a set of side information that encodes
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>The frame type and quantization type (<xref target="silk_frame_type"/>),</t>
*a58d3d2aSXin Li<t>Quantization gains (<xref target="silk_gains"/>),</t>
*a58d3d2aSXin Li<t>Short-term prediction filter coefficients (<xref target="silk_nlsfs"/>),</t>
*a58d3d2aSXin Li<t>A Line Spectral Frequencies (LSF) interpolation weight (<xref target="silk_nlsf_interpolation"/>),</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLong-term prediction filter lags and gains (<xref target="silk_ltp_params"/>),
*a58d3d2aSXin Li and
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>A linear congruential generator (LCG) seed (<xref target="silk_seed"/>).</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiThe quantized excitation signal (see <xref target="silk_excitation"/>) follows
*a58d3d2aSXin Li these at the end of the frame.
*a58d3d2aSXin Li<xref target="silk_frame_symbols"/> details the overall organization of a
*a58d3d2aSXin Li SILK frame.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_frame_symbols"
*a58d3d2aSXin Li title="Order of the symbols in an individual SILK frame">
*a58d3d2aSXin Li<ttcol align="center">Symbol(s)</ttcol>
*a58d3d2aSXin Li<ttcol align="center">PDF(s)</ttcol>
*a58d3d2aSXin Li<ttcol align="center">Condition</ttcol>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Stereo Prediction Weights</c>
*a58d3d2aSXin Li<c><xref target="silk_stereo_pred_pdfs"/></c>
*a58d3d2aSXin Li<c><xref target="silk_stereo_pred"/></c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Mid-only Flag</c>
*a58d3d2aSXin Li<c><xref target="silk_mid_only_pdf"/></c>
*a58d3d2aSXin Li<c><xref target="silk_mid_only_flag"/></c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Frame Type</c>
*a58d3d2aSXin Li<c><xref target="silk_frame_type"/></c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Subframe Gains</c>
*a58d3d2aSXin Li<c><xref target="silk_gains"/></c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Normalized LSF Stage-1 Index</c>
*a58d3d2aSXin Li<c><xref target="silk_nlsf_stage1_pdfs"/></c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Normalized LSF Stage-2 Residual</c>
*a58d3d2aSXin Li<c><xref target="silk_nlsf_stage2"/></c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Normalized LSF Interpolation Weight</c>
*a58d3d2aSXin Li<c><xref target="silk_nlsf_interp_pdf"/></c>
*a58d3d2aSXin Li<c>20&nbsp;ms frame</c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Primary Pitch Lag</c>
*a58d3d2aSXin Li<c><xref target="silk_ltp_lags"/></c>
*a58d3d2aSXin Li<c>Voiced frame</c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Subframe Pitch Contour</c>
*a58d3d2aSXin Li<c><xref target="silk_pitch_contour_pdfs"/></c>
*a58d3d2aSXin Li<c>Voiced frame</c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Periodicity Index</c>
*a58d3d2aSXin Li<c><xref target="silk_perindex_pdf"/></c>
*a58d3d2aSXin Li<c>Voiced frame</c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>LTP Filter</c>
*a58d3d2aSXin Li<c><xref target="silk_ltp_filter_pdfs"/></c>
*a58d3d2aSXin Li<c>Voiced frame</c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>LTP Scaling</c>
*a58d3d2aSXin Li<c><xref target="silk_ltp_scaling_pdf"/></c>
*a58d3d2aSXin Li<c><xref target="silk_ltp_scaling"/></c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>LCG Seed</c>
*a58d3d2aSXin Li<c><xref target="silk_seed_pdf"/></c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Excitation Rate Level</c>
*a58d3d2aSXin Li<c><xref target="silk_rate_level_pdfs"/></c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Excitation Pulse Counts</c>
*a58d3d2aSXin Li<c><xref target="silk_pulse_count_pdfs"/></c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Excitation Pulse Locations</c>
*a58d3d2aSXin Li<c><xref target="silk_pulse_locations"/></c>
*a58d3d2aSXin Li<c>Non-zero pulse count</c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Excitation LSBs</c>
*a58d3d2aSXin Li<c><xref target="silk_shell_lsb_pdf"/></c>
*a58d3d2aSXin Li<c><xref target="silk_pulse_counts"/></c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Excitation Signs</c>
*a58d3d2aSXin Li<c><xref target="silk_sign_pdfs"/></c>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_stereo_pred" toc="include"
*a58d3d2aSXin Li title="Stereo Prediction Weights">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiA SILK frame corresponding to the mid channel of a stereo Opus frame begins
*a58d3d2aSXin Li with a pair of side channel prediction weights, designed such that zeros
*a58d3d2aSXin Li indicate normal mid-side coupling.
*a58d3d2aSXin LiSince these weights can change on every frame, the first portion of each frame
*a58d3d2aSXin Li linearly interpolates between the previous weights and the current ones, using
*a58d3d2aSXin Li zeros for the previous weights if none are available.
*a58d3d2aSXin LiThese prediction weights are never included in a mono Opus frame, and the
*a58d3d2aSXin Li previous weights are reset to zeros on any transition from mono to stereo.
*a58d3d2aSXin LiThey are also not included in an LBRR frame for the side channel, even if the
*a58d3d2aSXin Li LBRR flags indicate the corresponding mid channel was not coded.
*a58d3d2aSXin LiIn that case, the previous weights are used, again substituting in zeros if no
*a58d3d2aSXin Li previous weights are available since the last decoder reset
*a58d3d2aSXin Li (see <xref target="decoder-reset"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTo summarize, these weights are coded if and only if
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>This is a stereo Opus frame (<xref target="toc_byte"/>), and</t>
*a58d3d2aSXin Li<t>The current SILK frame corresponds to the mid channel.</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe prediction weights are coded in three separate pieces, which are decoded
*a58d3d2aSXin Li by silk_stereo_decode_pred() (decode_stereo_pred.c).
*a58d3d2aSXin LiThe first piece jointly codes the high-order part of a table index for both
*a58d3d2aSXin Li weights.
*a58d3d2aSXin LiThe second piece codes the low-order part of each table index.
*a58d3d2aSXin LiThe third piece codes an offset used to linearly interpolate between table
*a58d3d2aSXin Li indices.
*a58d3d2aSXin LiThe details are as follows.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLet n be an index decoded with the 25-element stage-1 PDF in
*a58d3d2aSXin Li <xref target="silk_stereo_pred_pdfs"/>.
*a58d3d2aSXin LiThen let i0 and i1 be indices decoded with the stage-2 and stage-3 PDFs in
*a58d3d2aSXin Li <xref target="silk_stereo_pred_pdfs"/>, respectively, and let i2 and i3
*a58d3d2aSXin Li be two more indices decoded with the stage-2 and stage-3 PDFs, all in that
*a58d3d2aSXin Li order.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_stereo_pred_pdfs" title="Stereo Weight PDFs">
*a58d3d2aSXin Li<ttcol align="left">Stage</ttcol>
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>Stage 1</c>
*a58d3d2aSXin Li<c>{7,  2,  1,  1,  1,
*a58d3d2aSXin Li   10, 24,  8,  1,  1,
*a58d3d2aSXin Li    3, 23, 92, 23,  3,
*a58d3d2aSXin Li    1,  1,  8, 24, 10,
*a58d3d2aSXin Li    1,  1,  1,  2,  7}/256</c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Stage 2</c>
*a58d3d2aSXin Li<c>{85, 86, 85}/256</c>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<c>Stage 3</c>
*a58d3d2aSXin Li<c>{51, 51, 52, 51, 51}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThen use n, i0, and i2 to form two table indices, wi0 and wi1, according to
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Liwi0 = i0 + 3*(n/5)
*a58d3d2aSXin Liwi1 = i2 + 3*(n%5)
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where the division is integer division.
*a58d3d2aSXin LiThe range of these indices is 0 to 14, inclusive.
Let w_Q13[i] be the i'th weight from <xref target="silk_stereo_weights_table"/>.
*a58d3d2aSXin LiThen the two prediction weights, w0_Q13 and w1_Q13, are
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
w1_Q13 = w_Q13[wi1]
         + (((w_Q13[wi1+1] - w_Q13[wi1])*6554) >> 16)*(2*i3 + 1)

w0_Q13 = w_Q13[wi0]
         + (((w_Q13[wi0+1] - w_Q13[wi0])*6554) >> 16)*(2*i1 + 1)
         - w1_Q13
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiN.b., w1_Q13 is computed first here, because w0_Q13 depends on it.
*a58d3d2aSXin LiThe constant 6554 is approximately 0.1 in Q16.
*a58d3d2aSXin LiAlthough wi0 and wi1 only have 15 possible values,
*a58d3d2aSXin Li <xref target="silk_stereo_weights_table"/> contains 16 entries to allow
*a58d3d2aSXin Li interpolation between entry wi0 and (wi0&nbsp;+&nbsp;1) (and likewise for wi1).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_stereo_weights_table"
*a58d3d2aSXin Li title="Stereo Weight Table">
*a58d3d2aSXin Li<ttcol align="left">Index</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Weight (Q13)</ttcol>
*a58d3d2aSXin Li <c>0</c> <c>-13732</c>
*a58d3d2aSXin Li <c>1</c> <c>-10050</c>
*a58d3d2aSXin Li <c>2</c>  <c>-8266</c>
*a58d3d2aSXin Li <c>3</c>  <c>-7526</c>
*a58d3d2aSXin Li <c>4</c>  <c>-6500</c>
*a58d3d2aSXin Li <c>5</c>  <c>-5000</c>
*a58d3d2aSXin Li <c>6</c>  <c>-2950</c>
*a58d3d2aSXin Li <c>7</c>   <c>-820</c>
*a58d3d2aSXin Li <c>8</c>    <c>820</c>
*a58d3d2aSXin Li <c>9</c>   <c>2950</c>
*a58d3d2aSXin Li<c>10</c>   <c>5000</c>
*a58d3d2aSXin Li<c>11</c>   <c>6500</c>
*a58d3d2aSXin Li<c>12</c>   <c>7526</c>
*a58d3d2aSXin Li<c>13</c>   <c>8266</c>
*a58d3d2aSXin Li<c>14</c>  <c>10050</c>
*a58d3d2aSXin Li<c>15</c>  <c>13732</c>
*a58d3d2aSXin Li</texttable>
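<t>
The index and weight computations of this section can be sketched in C as
 follows.
This is a non-normative illustration, not the code of
 silk_stereo_decode_pred(): the function name stereo_pred_weights() is
 hypothetical, and the five indices are assumed to have been decoded already.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Stereo weight table in Q13, copied from the table in this section. */
static const int16_t w_Q13[16] = {
    -13732, -10050, -8266, -7526, -6500, -5000, -2950, -820,
       820,   2950,  5000,  6500,  7526,  8266, 10050, 13732
};

/*
 * Combines the five decoded indices into the two Q13 prediction weights.
 *   n       stage-1 index (0..24)
 *   i0, i2  stage-2 indices (0..2)
 *   i1, i3  stage-3 indices (0..4)
 */
void stereo_pred_weights(int n, int i0, int i1, int i2, int i3,
                         int32_t *w0_Q13, int32_t *w1_Q13)
{
    int wi0 = i0 + 3 * (n / 5);  /* integer division */
    int wi1 = i2 + 3 * (n % 5);
    /*
     * The table is strictly increasing, so both differences are positive
     * and the right shifts operate on non-negative values.
     */
    int32_t step0 = ((w_Q13[wi0 + 1] - w_Q13[wi0]) * 6554) >> 16;
    int32_t step1 = ((w_Q13[wi1 + 1] - w_Q13[wi1]) * 6554) >> 16;

    /* w1_Q13 is computed first, because w0_Q13 depends on it. */
    *w1_Q13 = w_Q13[wi1] + step1 * (2 * i3 + 1);
    *w0_Q13 = w_Q13[wi0] + step0 * (2 * i1 + 1) - *w1_Q13;
}
```
]]></artwork>
</figure>
<t>
With n&nbsp;=&nbsp;12 (the most probable stage-1 value), i0&nbsp;=&nbsp;i2&nbsp;=&nbsp;1,
 and i1&nbsp;=&nbsp;i3&nbsp;=&nbsp;2, both table indices equal 7 and both
 weights evaluate to zero, which corresponds to normal mid-side coupling.
</t>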
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_mid_only_flag" toc="include" title="Mid-only Flag">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiA flag appears after the stereo prediction weights that indicates if only the
*a58d3d2aSXin Li mid channel is coded for this time interval.
*a58d3d2aSXin LiIt appears only when
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>This is a stereo Opus frame (see <xref target="toc_byte"/>),</t>
*a58d3d2aSXin Li<t>The current SILK frame corresponds to the mid channel, and</t>
*a58d3d2aSXin Li<t>Either
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>This is a regular SILK frame where the VAD flags
*a58d3d2aSXin Li (see <xref target="silk_header_bits"/>) indicate that the corresponding side
 channel is not active, or</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThis is an LBRR frame where the LBRR flags
*a58d3d2aSXin Li (see <xref target="silk_header_bits"/> and <xref target="silk_lbrr_flags"/>)
*a58d3d2aSXin Li indicate that the corresponding side channel is not coded.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiIt is omitted when there are no stereo weights, for all of the same reasons.
*a58d3d2aSXin LiIt is also omitted for a regular SILK frame when the VAD flag of the
*a58d3d2aSXin Li corresponding side channel frame is set (indicating it is active).
*a58d3d2aSXin LiThe side channel must be coded in this case, making the mid-only flag
*a58d3d2aSXin Li redundant.
*a58d3d2aSXin LiIt is also omitted for an LBRR frame when the corresponding LBRR flags
*a58d3d2aSXin Li indicate the side channel is coded.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen the flag is present, the decoder reads a single value using the PDF in
*a58d3d2aSXin Li <xref target="silk_mid_only_pdf"/>, as implemented in
*a58d3d2aSXin Li silk_stereo_decode_mid_only() (decode_stereo_pred.c).
*a58d3d2aSXin LiIf the flag is set, then there is no corresponding SILK frame for the side
*a58d3d2aSXin Li channel, the entire decoding process for the side channel is skipped, and
*a58d3d2aSXin Li zeros are fed to the stereo unmixing process (see
*a58d3d2aSXin Li <xref target="silk_stereo_unmixing"/>) instead.
*a58d3d2aSXin LiAs stated above, LBRR frames still include this flag when the LBRR flag
*a58d3d2aSXin Li indicates that the side channel is not coded.
*a58d3d2aSXin LiIn that case, if this flag is zero (indicating that there should be a side
*a58d3d2aSXin Li channel), then Packet Loss Concealment (PLC, see
*a58d3d2aSXin Li <xref target="Packet Loss Concealment"/>) SHOULD be invoked to recover a
*a58d3d2aSXin Li side channel signal.
*a58d3d2aSXin LiOtherwise, the stereo image will collapse.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_mid_only_pdf" title="Mid-only Flag PDF">
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>{192, 64}/256</c>
*a58d3d2aSXin Li</texttable>
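<t>
The conditions under which the mid-only flag is coded can be summarized by a
 short predicate.
The following C sketch is illustrative only; mid_only_flag_present() and its
 parameters are hypothetical names that do not appear in the reference
 implementation.
</t>
<figure>
<artwork><![CDATA[
```c
/*
 * Returns nonzero if a mid-only flag is coded in the current SILK frame.
 *   stereo     nonzero for a stereo Opus frame
 *   is_mid     nonzero if this SILK frame belongs to the mid channel
 *   is_lbrr    nonzero for an LBRR frame, zero for a regular SILK frame
 *   side_vad   VAD flag of the corresponding side frame
 *              (consulted for regular frames only)
 *   side_lbrr  LBRR flag of the corresponding side frame
 *              (consulted for LBRR frames only)
 */
int mid_only_flag_present(int stereo, int is_mid, int is_lbrr,
                          int side_vad, int side_lbrr)
{
    /* No stereo weights means no mid-only flag either. */
    if (!stereo || !is_mid)
        return 0;
    /* The flag is redundant when the side frame is known to be coded. */
    return is_lbrr ? !side_lbrr : !side_vad;
}
```
]]></artwork>
</figure>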
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_frame_type" toc="include" title="Frame Type">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiEach SILK frame contains a single "frame type" symbol that jointly codes the
*a58d3d2aSXin Li signal type and quantization offset type of the corresponding frame.
If the current frame is a regular SILK frame whose VAD flag was not set (an
*a58d3d2aSXin Li "inactive" frame), then the frame type symbol takes on a value of either 0 or
*a58d3d2aSXin Li 1 and is decoded using the first PDF in <xref target="silk_frame_type_pdfs"/>.
*a58d3d2aSXin LiIf the frame is an LBRR frame or a regular SILK frame whose VAD flag was set
*a58d3d2aSXin Li (an "active" frame), then the value of the symbol may range from 2 to 5,
*a58d3d2aSXin Li inclusive, and is decoded using the second PDF in
*a58d3d2aSXin Li <xref target="silk_frame_type_pdfs"/>.
*a58d3d2aSXin Li<xref target="silk_frame_type_table"/> translates between the value of the
*a58d3d2aSXin Li frame type symbol and the corresponding signal type and quantization offset
*a58d3d2aSXin Li type.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_frame_type_pdfs" title="Frame Type PDFs">
*a58d3d2aSXin Li<ttcol>VAD Flag</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>Inactive</c> <c>{26, 230, 0, 0, 0, 0}/256</c>
*a58d3d2aSXin Li<c>Active</c>   <c>{0, 0, 24, 74, 148, 10}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_frame_type_table"
*a58d3d2aSXin Li title="Signal Type and Quantization Offset Type from Frame Type">
*a58d3d2aSXin Li<ttcol>Frame Type</ttcol>
*a58d3d2aSXin Li<ttcol>Signal Type</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Quantization Offset Type</ttcol>
*a58d3d2aSXin Li<c>0</c> <c>Inactive</c> <c>Low</c>
*a58d3d2aSXin Li<c>1</c> <c>Inactive</c> <c>High</c>
*a58d3d2aSXin Li<c>2</c> <c>Unvoiced</c> <c>Low</c>
*a58d3d2aSXin Li<c>3</c> <c>Unvoiced</c> <c>High</c>
*a58d3d2aSXin Li<c>4</c> <c>Voiced</c>   <c>Low</c>
*a58d3d2aSXin Li<c>5</c> <c>Voiced</c>   <c>High</c>
*a58d3d2aSXin Li</texttable>
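<t>
Because of the way <xref target="silk_frame_type_table"/> is laid out, the
 mapping reduces to a shift and a mask.
The following C sketch illustrates this; the enumerations and the function
 silk_split_frame_type() are hypothetical names chosen for this example.
</t>
<figure>
<artwork><![CDATA[
```c
/* Hypothetical enumerations mirroring the columns of the table. */
typedef enum {
    SIGNAL_INACTIVE = 0,
    SIGNAL_UNVOICED = 1,
    SIGNAL_VOICED   = 2
} silk_signal_type;

typedef enum {
    QOFFSET_LOW  = 0,
    QOFFSET_HIGH = 1
} silk_qoffset_type;

/*
 * Splits a frame type symbol (0..5) into its two components: the upper
 * bits select the signal type, the lowest bit the quantization offset type.
 */
void silk_split_frame_type(int frame_type, silk_signal_type *signal,
                           silk_qoffset_type *qoffset)
{
    *signal  = (silk_signal_type)(frame_type >> 1);
    *qoffset = (silk_qoffset_type)(frame_type & 1);
}
```
]]></artwork>
</figure>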
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_gains" toc="include" title="Subframe Gains">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiA separate quantization gain is coded for each 5&nbsp;ms subframe.
*a58d3d2aSXin LiThese gains control the step size between quantization levels of the excitation
*a58d3d2aSXin Li signal and, therefore, the quality of the reconstruction.
*a58d3d2aSXin LiThey are independent of and unrelated to the pitch contours coded for voiced
*a58d3d2aSXin Li frames.
*a58d3d2aSXin LiThe quantization gains are themselves uniformly quantized to 6&nbsp;bits on a
*a58d3d2aSXin Li log scale, giving them a resolution of approximately 1.369&nbsp;dB and a range
*a58d3d2aSXin Li of approximately 1.94&nbsp;dB to 88.21&nbsp;dB.
*a58d3d2aSXin Li</t>
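<t>
The quoted resolution follows from the constants in the dequantization
 formula given later in this section, as the following non-normative C sketch
 shows.
It works in floating point and ignores the truncation of the fixed-point
 arithmetic, so it only reproduces the approximate figures; the function names
 are hypothetical.
</t>
<figure>
<artwork><![CDATA[
```c
/* 20*log10(2): one doubling of amplitude expressed in dB. */
#define DB_PER_DOUBLING 6.020599913

/*
 * Size in dB of one step of the 6-bit log gain.
 * 0x1D1C71 is the Q16 multiplier from the dequantization formula; its
 * product with log_gain is a base-2 logarithm in Q7 (units of 1/128).
 */
double silk_gain_step_db(void)
{
    double q7_per_step = (double)0x1D1C71 / 65536.0;
    return q7_per_step / 128.0 * DB_PER_DOUBLING;
}

/* Distance in dB between the smallest (0) and largest (63) log gain. */
double silk_gain_span_db(void)
{
    return 63.0 * silk_gain_step_db();
}
```
]]></artwork>
</figure>
<t>
The step evaluates to roughly 1.369&nbsp;dB and the span to roughly
 86.26&nbsp;dB, in line with the difference between the two ends of the range
 quoted above.
</t>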
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe subframe gains are either coded independently, or relative to the gain from
*a58d3d2aSXin Li the most recent coded subframe in the same channel.
*a58d3d2aSXin LiIndependent coding is used if and only if
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThis is the first subframe in the current SILK frame, and
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>Either
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThis is the first SILK frame of its type (LBRR or regular) for this channel in
*a58d3d2aSXin Li the current Opus frame, or
*a58d3d2aSXin Li </t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe previous SILK frame of the same type (LBRR or regular) for this channel in
*a58d3d2aSXin Li the same Opus frame was not coded.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
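<t>
Expressed as code, the rule above is a simple predicate.
The following C sketch is illustrative only; gain_is_independent() and its
 parameters are hypothetical names.
</t>
<figure>
<artwork><![CDATA[
```c
/*
 * Returns nonzero if the gain of the given subframe is coded independently.
 *   subframe              index of the subframe within its SILK frame
 *   first_of_type         nonzero if this is the first SILK frame of its
 *                         type (LBRR or regular) for this channel in the
 *                         current Opus frame
 *   prev_same_type_coded  nonzero if the previous SILK frame of the same
 *                         type for this channel in the same Opus frame was
 *                         coded
 */
int gain_is_independent(int subframe, int first_of_type,
                        int prev_same_type_coded)
{
    /* Only the first subframe of a SILK frame can be independent. */
    if (subframe != 0)
        return 0;
    return first_of_type || !prev_same_type_coded;
}
```
]]></artwork>
</figure>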
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIn an independently coded subframe gain, the 3 most significant bits of the
*a58d3d2aSXin Li quantization gain are decoded using a PDF selected from
*a58d3d2aSXin Li <xref target="silk_independent_gain_msb_pdfs"/> based on the decoded signal
*a58d3d2aSXin Li type (see <xref target="silk_frame_type"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_independent_gain_msb_pdfs"
*a58d3d2aSXin Li title="PDFs for Independent Quantization Gain MSB Coding">
*a58d3d2aSXin Li<ttcol align="left">Signal Type</ttcol>
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>Inactive</c> <c>{32, 112, 68, 29, 12,  1,  1, 1}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c>  <c>{2,  17, 45, 60, 62, 47, 19, 4}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>    <c>{1,   3, 26, 71, 94, 50,  9, 2}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe 3 least significant bits are decoded using a uniform PDF:
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<texttable anchor="silk_independent_gain_lsb_pdf"
*a58d3d2aSXin Li title="PDF for Independent Quantization Gain LSB Coding">
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>{32, 32, 32, 32, 32, 32, 32, 32}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThese 6 bits are combined to form a value, gain_index, between 0 and 63.
*a58d3d2aSXin LiWhen the gain for the previous subframe is available, then the current gain is
*a58d3d2aSXin Li limited as follows:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lilog_gain = max(gain_index, previous_log_gain - 16) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThis may help some implementations limit the change in precision of their
*a58d3d2aSXin Li internal LTP history.
The indices to which this clamp applies cannot simply be removed from the
*a58d3d2aSXin Li codebook, because previous_log_gain will not be available after packet loss.
*a58d3d2aSXin LiThe clamping is skipped after a decoder reset, and in the side channel if the
*a58d3d2aSXin Li previous frame in the side channel was not coded, since there is no value for
*a58d3d2aSXin Li previous_log_gain available.
*a58d3d2aSXin LiIt MAY also be skipped after packet loss.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor subframes which do not have an independent gain (including the first
*a58d3d2aSXin Li subframe of frames not listed as using independent coding above), the
*a58d3d2aSXin Li quantization gain is coded relative to the gain from the previous subframe (in
*a58d3d2aSXin Li the same channel).
*a58d3d2aSXin LiThe PDF in <xref target="silk_delta_gain_pdf"/> yields a delta_gain_index value
*a58d3d2aSXin Li between 0 and 40, inclusive.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<texttable anchor="silk_delta_gain_pdf"
*a58d3d2aSXin Li title="PDF for Delta Quantization Gain Coding">
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>{6,   5,  11,  31, 132,  21,   8,   4,
*a58d3d2aSXin Li    3,   2,   2,   2,   1,   1,   1,   1,
*a58d3d2aSXin Li    1,   1,   1,   1,   1,   1,   1,   1,
*a58d3d2aSXin Li    1,   1,   1,   1,   1,   1,   1,   1,
*a58d3d2aSXin Li    1,   1,   1,   1,   1,   1,   1,   1,   1}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe following formula translates this index into a quantization gain for the
*a58d3d2aSXin Li current subframe using the gain from the previous subframe:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lilog_gain = clamp(0, max(2*delta_gain_index - 16,
*a58d3d2aSXin Li                   previous_log_gain + delta_gain_index - 4), 63) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
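<t>
The delta-coded case can be sketched in the same non-normative way.
The function name delta_log_gain is an assumption of this sketch, and the
 clamp to the range 0 to 63 is written out with min() and max() rather than a
 clamp() helper.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
def delta_log_gain(delta_gain_index, previous_log_gain):
    """Apply the delta gain formula from the text.

    delta_gain_index: decoded symbol in 0..40.
    previous_log_gain: log_gain of the previous subframe, in 0..63.
    """
    if not 0 <= delta_gain_index <= 40:
        raise ValueError("delta_gain_index must lie in 0..40")
    candidate = max(2 * delta_gain_index - 16,
                    previous_log_gain + delta_gain_index - 4)
    # clamp(0, candidate, 63)
    return min(max(candidate, 0), 63)
```
]]></artwork>
</figure>
<t>
Under this formula, a delta_gain_index of 4 (the most probable symbol in
 <xref target="silk_delta_gain_pdf"/>) leaves the gain unchanged, and large
 indices switch to the steeper 2*delta_gain_index - 16 branch.
</t>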
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Lisilk_gains_dequant() (gain_quant.c) dequantizes log_gain for the k'th subframe
*a58d3d2aSXin Li and converts it into a linear Q16 scale factor via
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Ligain_Q16[k] = silk_log2lin((0x1D1C71*log_gain>>16) + 2090)
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe function silk_log2lin() (log2lin.c) computes an approximation of
*a58d3d2aSXin Li 2**(inLog_Q7/128.0), where inLog_Q7 is its Q7 input.
Let i = inLog_Q7&gt;&gt;7 be the integer part of inLog_Q7 and
*a58d3d2aSXin Li f = inLog_Q7&amp;127 be the fractional part.
*a58d3d2aSXin LiThen
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li(1<<i) + ((-174*f*(128-f)>>16)+f)*((1<<i)>>7)
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li yields the approximate exponential.
The final Q16 gain values lie between 81920 and 1686110208, inclusive
 (representing scale factors of 1.25 to 25728, respectively).
*a58d3d2aSXin Li</t>
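<t>
The two formulas above can be checked numerically with a short, non-normative
 Python sketch.
The names log2lin and dequant_gain_q16 are assumptions of this sketch.
It follows the expressions exactly as written in the text and does not attempt
 to reproduce any additional input handling in silk_log2lin() or
 silk_gains_dequant().
Python's right shift rounds toward negative infinity for negative operands,
 which is assumed here to be the intended behavior of the shifts.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
def log2lin(in_log_q7):
    """Approximate 2**(in_log_q7/128.0) using the formula in the text."""
    i = in_log_q7 >> 7   # integer part
    f = in_log_q7 & 127  # fractional part
    correction = ((-174 * f * (128 - f)) >> 16) + f
    return (1 << i) + correction * ((1 << i) >> 7)


def dequant_gain_q16(log_gain):
    """Convert a log_gain in 0..63 into a linear Q16 scale factor."""
    if not 0 <= log_gain <= 63:
        raise ValueError("log_gain must lie in 0..63")
    return log2lin(((0x1D1C71 * log_gain) >> 16) + 2090)
```
]]></artwork>
</figure>
<t>
Evaluating dequant_gain_q16() at 0 and at 63 gives 81920 and 1686110208,
 matching the bounds stated above.
</t>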
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_nlsfs" toc="include" title="Normalized Line Spectral
*a58d3d2aSXin Li Frequency (LSF) and Linear Predictive Coding (LPC) Coefficients">
*a58d3d2aSXin Li<t>
A set of normalized Line Spectral Frequency (LSF) coefficients follows the
 quantization gains in the bitstream and represents the Linear Predictive
 Coding (LPC) coefficients for the current SILK frame.
*a58d3d2aSXin LiOnce decoded, the normalized LSFs form an increasing list of Q15 values between
*a58d3d2aSXin Li 0 and 1.
*a58d3d2aSXin LiThese represent the interleaved zeros on the upper half of the unit circle
*a58d3d2aSXin Li (between 0 and pi, hence "normalized") in the standard decomposition
*a58d3d2aSXin Li <xref target="line-spectral-pairs"/> of the LPC filter into a symmetric part
*a58d3d2aSXin Li and an anti-symmetric part (P and Q in <xref target="silk_nlsf2lpc"/>).
*a58d3d2aSXin LiBecause of non-linear effects in the decoding process, an implementation SHOULD
*a58d3d2aSXin Li match the fixed-point arithmetic described in this section exactly.
*a58d3d2aSXin LiAn encoder SHOULD also use the same process.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe normalized LSFs are coded using a two-stage vector quantizer (VQ)
*a58d3d2aSXin Li (<xref target="silk_nlsf_stage1"/> and <xref target="silk_nlsf_stage2"/>).
*a58d3d2aSXin LiNB and MB frames use an order-10 predictor, while WB frames use an order-16
*a58d3d2aSXin Li predictor, and thus have different sets of tables.
*a58d3d2aSXin LiAfter reconstructing the normalized LSFs
*a58d3d2aSXin Li (<xref target="silk_nlsf_reconstruction"/>), the decoder runs them through a
*a58d3d2aSXin Li stabilization process (<xref target="silk_nlsf_stabilization"/>), interpolates
*a58d3d2aSXin Li them between frames (<xref target="silk_nlsf_interpolation"/>), converts them
*a58d3d2aSXin Li back into LPC coefficients (<xref target="silk_nlsf2lpc"/>), and then runs
*a58d3d2aSXin Li them through further processes to limit the range of the coefficients
*a58d3d2aSXin Li (<xref target="silk_lpc_range_limit"/>) and the gain of the filter
*a58d3d2aSXin Li (<xref target="silk_lpc_gain_limit"/>).
*a58d3d2aSXin LiAll of this is necessary to ensure the reconstruction process is stable.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_nlsf_stage1" title="Normalized LSF Stage 1 Decoding">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe first VQ stage uses a 32-element codebook, coded with one of the PDFs in
*a58d3d2aSXin Li <xref target="silk_nlsf_stage1_pdfs"/>, depending on the audio bandwidth and
*a58d3d2aSXin Li the signal type of the current SILK frame.
*a58d3d2aSXin LiThis yields a single index, I1, for the entire frame, which
*a58d3d2aSXin Li<list style="numbers">
*a58d3d2aSXin Li<t>Indexes an element in a coarse codebook,</t>
*a58d3d2aSXin Li<t>Selects the PDFs for the second stage of the VQ, and</t>
*a58d3d2aSXin Li<t>Selects the prediction weights used to remove intra-frame redundancy from
*a58d3d2aSXin Li the second stage.</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiThe actual codebook elements are listed in
*a58d3d2aSXin Li <xref target="silk_nlsf_nbmb_codebook"/> and
*a58d3d2aSXin Li <xref target="silk_nlsf_wb_codebook"/>, but they are not needed until the last
*a58d3d2aSXin Li stages of reconstructing the LSF coefficients.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_stage1_pdfs"
*a58d3d2aSXin Li title="PDFs for Normalized LSF Stage-1 Index Decoding">
*a58d3d2aSXin Li<ttcol align="left">Audio Bandwidth</ttcol>
*a58d3d2aSXin Li<ttcol align="left">Signal Type</ttcol>
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>NB or MB</c> <c>Inactive or unvoiced</c>
*a58d3d2aSXin Li<c>
*a58d3d2aSXin Li{44, 34, 30, 19, 21, 12, 11,  3,
*a58d3d2aSXin Li  3,  2, 16,  2,  2,  1,  5,  2,
*a58d3d2aSXin Li  1,  3,  3,  1,  1,  2,  2,  2,
*a58d3d2aSXin Li  3,  1,  9,  9,  2,  7,  2,  1}/256
*a58d3d2aSXin Li</c>
*a58d3d2aSXin Li<c>NB or MB</c> <c>Voiced</c>
*a58d3d2aSXin Li<c>
*a58d3d2aSXin Li{1, 10,  1,  8,  3,  8,  8, 14,
*a58d3d2aSXin Li13, 14,  1, 14, 12, 13, 11, 11,
*a58d3d2aSXin Li12, 11, 10, 10, 11,  8,  9,  8,
*a58d3d2aSXin Li 7,  8,  1,  1,  6,  1,  6,  5}/256
*a58d3d2aSXin Li</c>
*a58d3d2aSXin Li<c>WB</c> <c>Inactive or unvoiced</c>
*a58d3d2aSXin Li<c>
*a58d3d2aSXin Li{31, 21,  3, 17,  1,  8, 17,  4,
*a58d3d2aSXin Li  1, 18, 16,  4,  2,  3,  1, 10,
*a58d3d2aSXin Li  1,  3, 16, 11, 16,  2,  2,  3,
*a58d3d2aSXin Li  2, 11,  1,  4,  9,  8,  7,  3}/256
*a58d3d2aSXin Li</c>
*a58d3d2aSXin Li<c>WB</c> <c>Voiced</c>
*a58d3d2aSXin Li<c>
*a58d3d2aSXin Li{1,  4, 16,  5, 18, 11,  5, 14,
*a58d3d2aSXin Li15,  1,  3, 12, 13, 14, 14,  6,
*a58d3d2aSXin Li14, 12,  2,  6,  1, 12, 12, 11,
*a58d3d2aSXin Li10,  3, 10,  5,  1,  1,  1,  3}/256
*a58d3d2aSXin Li</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_nlsf_stage2" title="Normalized LSF Stage 2 Decoding">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiA total of 16 PDFs are available for the LSF residual in the second stage: the
*a58d3d2aSXin Li 8 (a...h) for NB and MB frames given in
*a58d3d2aSXin Li <xref target="silk_nlsf_stage2_nbmb_pdfs"/>, and the 8 (i...p) for WB frames
*a58d3d2aSXin Li given in <xref target="silk_nlsf_stage2_wb_pdfs"/>.
*a58d3d2aSXin LiWhich PDF is used for which coefficient is driven by the index, I1,
*a58d3d2aSXin Li decoded in the first stage.
*a58d3d2aSXin Li<xref target="silk_nlsf_nbmb_stage2_cb_sel"/> lists the letter of the
*a58d3d2aSXin Li corresponding PDF for each normalized LSF coefficient for NB and MB, and
*a58d3d2aSXin Li <xref target="silk_nlsf_wb_stage2_cb_sel"/> lists the same information for WB.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_stage2_nbmb_pdfs"
*a58d3d2aSXin Li title="PDFs for NB/MB Normalized LSF Stage-2 Index Decoding">
*a58d3d2aSXin Li<ttcol align="left">Codebook</ttcol>
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>a</c> <c>{1,   1,   1,  15, 224,  11,   1,   1,   1}/256</c>
*a58d3d2aSXin Li<c>b</c> <c>{1,   1,   2,  34, 183,  32,   1,   1,   1}/256</c>
*a58d3d2aSXin Li<c>c</c> <c>{1,   1,   4,  42, 149,  55,   2,   1,   1}/256</c>
*a58d3d2aSXin Li<c>d</c> <c>{1,   1,   8,  52, 123,  61,   8,   1,   1}/256</c>
*a58d3d2aSXin Li<c>e</c> <c>{1,   3,  16,  53, 101,  74,   6,   1,   1}/256</c>
*a58d3d2aSXin Li<c>f</c> <c>{1,   3,  17,  55,  90,  73,  15,   1,   1}/256</c>
*a58d3d2aSXin Li<c>g</c> <c>{1,   7,  24,  53,  74,  67,  26,   3,   1}/256</c>
*a58d3d2aSXin Li<c>h</c> <c>{1,   1,  18,  63,  78,  58,  30,   6,   1}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_stage2_wb_pdfs"
*a58d3d2aSXin Li title="PDFs for WB Normalized LSF Stage-2 Index Decoding">
*a58d3d2aSXin Li<ttcol align="left">Codebook</ttcol>
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>i</c> <c>{1,   1,   1,   9, 232,   9,   1,   1,   1}/256</c>
*a58d3d2aSXin Li<c>j</c> <c>{1,   1,   2,  28, 186,  35,   1,   1,   1}/256</c>
*a58d3d2aSXin Li<c>k</c> <c>{1,   1,   3,  42, 152,  53,   2,   1,   1}/256</c>
*a58d3d2aSXin Li<c>l</c> <c>{1,   1,  10,  49, 126,  65,   2,   1,   1}/256</c>
*a58d3d2aSXin Li<c>m</c> <c>{1,   4,  19,  48, 100,  77,   5,   1,   1}/256</c>
*a58d3d2aSXin Li<c>n</c> <c>{1,   1,  14,  54, 100,  72,  12,   1,   1}/256</c>
*a58d3d2aSXin Li<c>o</c> <c>{1,   1,  15,  61,  87,  61,  25,   4,   1}/256</c>
*a58d3d2aSXin Li<c>p</c> <c>{1,   7,  21,  50,  77,  81,  17,   1,   1}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_nbmb_stage2_cb_sel"
*a58d3d2aSXin Li title="Codebook Selection for NB/MB Normalized LSF Stage-2 Index Decoding">
*a58d3d2aSXin Li<ttcol>I1</ttcol>
*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li<c><spanx style="vbare">0&nbsp;1&nbsp;2&nbsp;3&nbsp;4&nbsp;5&nbsp;6&nbsp;7&nbsp;8&nbsp;9</spanx></c>
*a58d3d2aSXin Li<c> 0</c>
*a58d3d2aSXin Li<c><spanx style="vbare">a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a&nbsp;a</spanx></c>
*a58d3d2aSXin Li<c> 1</c>
*a58d3d2aSXin Li<c><spanx style="vbare">b&nbsp;d&nbsp;b&nbsp;c&nbsp;c&nbsp;b&nbsp;c&nbsp;b&nbsp;b&nbsp;b</spanx></c>
*a58d3d2aSXin Li<c> 2</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b&nbsp;b</spanx></c>
*a58d3d2aSXin Li<c> 3</c>
*a58d3d2aSXin Li<c><spanx style="vbare">b&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;b&nbsp;c&nbsp;b&nbsp;b&nbsp;b</spanx></c>
*a58d3d2aSXin Li<c> 4</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;d&nbsp;d&nbsp;d&nbsp;d&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c</spanx></c>
*a58d3d2aSXin Li<c> 5</c>
*a58d3d2aSXin Li<c><spanx style="vbare">a&nbsp;f&nbsp;d&nbsp;d&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;b&nbsp;b</spanx></c>
<c> 6</c>
*a58d3d2aSXin Li<c><spanx style="vbare">a&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;b</spanx></c>
*a58d3d2aSXin Li<c> 7</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;d&nbsp;g&nbsp;e&nbsp;e&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f</spanx></c>
*a58d3d2aSXin Li<c> 8</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;e&nbsp;f&nbsp;f&nbsp;e&nbsp;f&nbsp;e&nbsp;g&nbsp;e&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c> 9</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;e&nbsp;e&nbsp;h&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>10</c>
*a58d3d2aSXin Li<c><spanx style="vbare">e&nbsp;d&nbsp;d&nbsp;d&nbsp;c&nbsp;d&nbsp;c&nbsp;c&nbsp;c&nbsp;c</spanx></c>
*a58d3d2aSXin Li<c>11</c>
*a58d3d2aSXin Li<c><spanx style="vbare">b&nbsp;f&nbsp;f&nbsp;g&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f&nbsp;f</spanx></c>
*a58d3d2aSXin Li<c>12</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;h&nbsp;e&nbsp;g&nbsp;f&nbsp;f&nbsp;f&nbsp;f&nbsp;f&nbsp;f</spanx></c>
*a58d3d2aSXin Li<c>13</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;h&nbsp;f&nbsp;f&nbsp;f&nbsp;f&nbsp;f&nbsp;g&nbsp;f&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>14</c>
*a58d3d2aSXin Li<c><spanx style="vbare">d&nbsp;d&nbsp;f&nbsp;e&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;e&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>15</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;d&nbsp;d&nbsp;f&nbsp;f&nbsp;e&nbsp;e&nbsp;e&nbsp;e&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>16</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;e&nbsp;e&nbsp;g&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f&nbsp;f</spanx></c>
*a58d3d2aSXin Li<c>17</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;f&nbsp;e&nbsp;g&nbsp;f&nbsp;f&nbsp;f&nbsp;e&nbsp;f&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>18</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;h&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f&nbsp;f</spanx></c>
*a58d3d2aSXin Li<c>19</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;f&nbsp;e&nbsp;g&nbsp;h&nbsp;g&nbsp;f&nbsp;g&nbsp;f&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>20</c>
*a58d3d2aSXin Li<c><spanx style="vbare">d&nbsp;g&nbsp;h&nbsp;e&nbsp;g&nbsp;f&nbsp;f&nbsp;g&nbsp;e&nbsp;f</spanx></c>
*a58d3d2aSXin Li<c>21</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;h&nbsp;g&nbsp;e&nbsp;e&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;f</spanx></c>
*a58d3d2aSXin Li<c>22</c>
*a58d3d2aSXin Li<c><spanx style="vbare">e&nbsp;f&nbsp;f&nbsp;e&nbsp;g&nbsp;g&nbsp;f&nbsp;g&nbsp;f&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>23</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;f&nbsp;f&nbsp;g&nbsp;f&nbsp;g&nbsp;e&nbsp;g&nbsp;e&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>24</c>
*a58d3d2aSXin Li<c><spanx style="vbare">e&nbsp;f&nbsp;f&nbsp;f&nbsp;d&nbsp;h&nbsp;e&nbsp;f&nbsp;f&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>25</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;d&nbsp;e&nbsp;f&nbsp;f&nbsp;g&nbsp;e&nbsp;f&nbsp;f&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>26</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;d&nbsp;c&nbsp;d&nbsp;d&nbsp;e&nbsp;c&nbsp;d&nbsp;d&nbsp;d</spanx></c>
*a58d3d2aSXin Li<c>27</c>
*a58d3d2aSXin Li<c><spanx style="vbare">b&nbsp;b&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;c&nbsp;d&nbsp;c&nbsp;c</spanx></c>
*a58d3d2aSXin Li<c>28</c>
*a58d3d2aSXin Li<c><spanx style="vbare">e&nbsp;f&nbsp;f&nbsp;g&nbsp;g&nbsp;g&nbsp;f&nbsp;g&nbsp;e&nbsp;f</spanx></c>
*a58d3d2aSXin Li<c>29</c>
*a58d3d2aSXin Li<c><spanx style="vbare">d&nbsp;f&nbsp;f&nbsp;e&nbsp;e&nbsp;e&nbsp;e&nbsp;d&nbsp;d&nbsp;c</spanx></c>
*a58d3d2aSXin Li<c>30</c>
*a58d3d2aSXin Li<c><spanx style="vbare">c&nbsp;f&nbsp;d&nbsp;h&nbsp;f&nbsp;f&nbsp;e&nbsp;e&nbsp;f&nbsp;e</spanx></c>
*a58d3d2aSXin Li<c>31</c>
*a58d3d2aSXin Li<c><spanx style="vbare">e&nbsp;e&nbsp;f&nbsp;e&nbsp;f&nbsp;g&nbsp;f&nbsp;g&nbsp;f&nbsp;e</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_wb_stage2_cb_sel"
*a58d3d2aSXin Li title="Codebook Selection for WB Normalized LSF Stage-2 Index Decoding">
*a58d3d2aSXin Li<ttcol>I1</ttcol>
*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li<c><spanx style="vbare">0&nbsp;&nbsp;1&nbsp;&nbsp;2&nbsp;&nbsp;3&nbsp;&nbsp;4&nbsp;&nbsp;5&nbsp;&nbsp;6&nbsp;&nbsp;7&nbsp;&nbsp;8&nbsp;&nbsp;9&nbsp;10&nbsp;11&nbsp;12&nbsp;13&nbsp;14&nbsp;15</spanx></c>
*a58d3d2aSXin Li<c> 0</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
*a58d3d2aSXin Li<c> 1</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c> 2</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;k&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c> 3</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j</spanx></c>
*a58d3d2aSXin Li<c> 4</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c> 5</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;m</spanx></c>
*a58d3d2aSXin Li<c> 6</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
*a58d3d2aSXin Li<c> 7</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;k&nbsp;&nbsp;o&nbsp;&nbsp;l&nbsp;&nbsp;p&nbsp;&nbsp;k&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c> 8</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;k&nbsp;&nbsp;o&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;o&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c> 9</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
*a58d3d2aSXin Li<c>10</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j</spanx></c>
*a58d3d2aSXin Li<c>11</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c>12</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c>13</c>
*a58d3d2aSXin Li<c><spanx style="vbare">l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;m</spanx></c>
*a58d3d2aSXin Li<c>14</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;n&nbsp;&nbsp;k&nbsp;&nbsp;o&nbsp;&nbsp;n&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c>15</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;i</spanx></c>
*a58d3d2aSXin Li<c>16</c>
*a58d3d2aSXin Li<c><spanx style="vbare">j&nbsp;&nbsp;o&nbsp;&nbsp;n&nbsp;&nbsp;p&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;m</spanx></c>
*a58d3d2aSXin Li<c>17</c>
*a58d3d2aSXin Li<c><spanx style="vbare">j&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m</spanx></c>
*a58d3d2aSXin Li<c>18</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;m</spanx></c>
*a58d3d2aSXin Li<c>19</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
*a58d3d2aSXin Li<c>20</c>
*a58d3d2aSXin Li<c><spanx style="vbare">l&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;m</spanx></c>
*a58d3d2aSXin Li<c>21</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;o&nbsp;&nbsp;l&nbsp;&nbsp;p&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c>22</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;o&nbsp;&nbsp;o&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;m</spanx></c>
*a58d3d2aSXin Li<c>23</c>
*a58d3d2aSXin Li<c><spanx style="vbare">j&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;j</spanx></c>
*a58d3d2aSXin Li<c>24</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;o&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c>25</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
*a58d3d2aSXin Li<c>26</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;o&nbsp;&nbsp;o&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;k&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;p&nbsp;&nbsp;m&nbsp;&nbsp;m&nbsp;&nbsp;m</spanx></c>
*a58d3d2aSXin Li<c>27</c>
*a58d3d2aSXin Li<c><spanx style="vbare">l&nbsp;&nbsp;l&nbsp;&nbsp;p&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l</spanx></c>
*a58d3d2aSXin Li<c>28</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;j</spanx></c>
*a58d3d2aSXin Li<c>29</c>
*a58d3d2aSXin Li<c><spanx style="vbare">i&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;j</spanx></c>
*a58d3d2aSXin Li<c>30</c>
*a58d3d2aSXin Li<c><spanx style="vbare">l&nbsp;&nbsp;n&nbsp;&nbsp;n&nbsp;&nbsp;m&nbsp;&nbsp;p&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;i&nbsp;&nbsp;j&nbsp;&nbsp;i</spanx></c>
*a58d3d2aSXin Li<c>31</c>
*a58d3d2aSXin Li<c><spanx style="vbare">k&nbsp;&nbsp;l&nbsp;&nbsp;n&nbsp;&nbsp;l&nbsp;&nbsp;m&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;l&nbsp;&nbsp;k&nbsp;&nbsp;j&nbsp;&nbsp;k&nbsp;&nbsp;o&nbsp;&nbsp;m&nbsp;&nbsp;i&nbsp;&nbsp;i&nbsp;&nbsp;i</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiDecoding the second stage residual proceeds as follows.
*a58d3d2aSXin LiFor each coefficient, the decoder reads a symbol using the PDF corresponding to
*a58d3d2aSXin Li I1 from either <xref target="silk_nlsf_nbmb_stage2_cb_sel"/> or
*a58d3d2aSXin Li <xref target="silk_nlsf_wb_stage2_cb_sel"/>, and subtracts 4 from the result
*a58d3d2aSXin Li to give an index in the range -4 to 4, inclusive.
*a58d3d2aSXin LiIf the index is either -4 or 4, it reads a second symbol using the PDF in
*a58d3d2aSXin Li <xref target="silk_nlsf_ext_pdf"/>, and adds the value of this second symbol
*a58d3d2aSXin Li to the index, using the same sign.
*a58d3d2aSXin LiThis gives the index, I2[k], a total range of -10 to 10, inclusive.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_ext_pdf"
*a58d3d2aSXin Li title="PDF for Normalized LSF Index Extension Decoding">
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>{156, 60, 24,  9,  4,  2,  1}/256</c>
*a58d3d2aSXin Li</texttable>
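<t>
The index extension rule can be sketched, non-normatively, as follows.
The function name nlsf_stage2_index and the read_extension callback are
 assumptions of this sketch: read_extension stands in for a call to the range
 decoder using the PDF in <xref target="silk_nlsf_ext_pdf"/> and is invoked
 only when the first symbol lies at either end of its range.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
def nlsf_stage2_index(symbol, read_extension):
    """Map a stage-2 symbol (0..8) to an index I2[k] in -10..10.

    read_extension: zero-argument callable returning a value in 0..6;
    it is called only when the initial index is -4 or 4.
    """
    if not 0 <= symbol <= 8:
        raise ValueError("symbol must lie in 0..8")
    index = symbol - 4
    if index == -4:
        index -= read_extension()  # extend downward, keeping the sign
    elif index == 4:
        index += read_extension()  # extend upward, keeping the sign
    return index
```
]]></artwork>
</figure>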
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoded indices from both stages are translated back into normalized LSF
*a58d3d2aSXin Li coefficients in silk_NLSF_decode() (NLSF_decode.c).
*a58d3d2aSXin LiThe stage-2 indices represent residuals after both the first stage of the VQ
*a58d3d2aSXin Li and a separate backwards-prediction step.
The backwards-prediction process in the encoder subtracts from each residual a
 prediction formed by a multiple of the coefficient that follows it.
*a58d3d2aSXin LiThe decoder must undo this process.
*a58d3d2aSXin Li<xref target="silk_nlsf_pred_weights"/> contains lists of prediction weights
*a58d3d2aSXin Li for each coefficient.
*a58d3d2aSXin LiThere are two lists for NB and MB, and another two lists for WB, giving two
*a58d3d2aSXin Li possible prediction weights for each coefficient.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_pred_weights"
*a58d3d2aSXin Li title="Prediction Weights for Normalized LSF Decoding">
*a58d3d2aSXin Li<ttcol align="left">Coefficient</ttcol>
*a58d3d2aSXin Li<ttcol align="right">A</ttcol>
*a58d3d2aSXin Li<ttcol align="right">B</ttcol>
*a58d3d2aSXin Li<ttcol align="right">C</ttcol>
*a58d3d2aSXin Li<ttcol align="right">D</ttcol>
*a58d3d2aSXin Li <c>0</c> <c>179</c> <c>116</c> <c>175</c>  <c>68</c>
*a58d3d2aSXin Li <c>1</c> <c>138</c>  <c>67</c> <c>148</c>  <c>62</c>
*a58d3d2aSXin Li <c>2</c> <c>140</c>  <c>82</c> <c>160</c>  <c>66</c>
*a58d3d2aSXin Li <c>3</c> <c>148</c>  <c>59</c> <c>176</c>  <c>60</c>
*a58d3d2aSXin Li <c>4</c> <c>151</c>  <c>92</c> <c>178</c>  <c>72</c>
*a58d3d2aSXin Li <c>5</c> <c>149</c>  <c>72</c> <c>173</c> <c>117</c>
*a58d3d2aSXin Li <c>6</c> <c>153</c> <c>100</c> <c>174</c>  <c>85</c>
*a58d3d2aSXin Li <c>7</c> <c>151</c>  <c>89</c> <c>164</c>  <c>90</c>
*a58d3d2aSXin Li <c>8</c> <c>163</c>  <c>92</c> <c>177</c> <c>118</c>
*a58d3d2aSXin Li <c>9</c> <c/>        <c/>      <c>174</c> <c>136</c>
*a58d3d2aSXin Li<c>10</c> <c/>        <c/>      <c>196</c> <c>151</c>
*a58d3d2aSXin Li<c>11</c> <c/>        <c/>      <c>182</c> <c>142</c>
*a58d3d2aSXin Li<c>12</c> <c/>        <c/>      <c>198</c> <c>160</c>
*a58d3d2aSXin Li<c>13</c> <c/>        <c/>      <c>192</c> <c>142</c>
*a58d3d2aSXin Li<c>14</c> <c/>        <c/>      <c>182</c> <c>155</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe prediction is undone using the procedure implemented in
*a58d3d2aSXin Li silk_NLSF_residual_dequant() (NLSF_decode.c), which is as follows.
*a58d3d2aSXin LiEach coefficient selects its prediction weight from one of the two lists based
*a58d3d2aSXin Li on the stage-1 index, I1.
*a58d3d2aSXin Li<xref target="silk_nlsf_nbmb_weight_sel"/> gives the selections for each
*a58d3d2aSXin Li coefficient for NB and MB, and <xref target="silk_nlsf_wb_weight_sel"/> gives
*a58d3d2aSXin Li the selections for WB.
*a58d3d2aSXin LiLet d_LPC be the order of the codebook, i.e., 10 for NB and MB, and 16 for WB,
*a58d3d2aSXin Li and let pred_Q8[k] be the weight for the k'th coefficient selected by this
*a58d3d2aSXin Li process for 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d_LPC-1.
*a58d3d2aSXin LiThen, the stage-2 residual for each coefficient is computed via
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lires_Q10[k] = (k+1 < d_LPC ? (res_Q10[k+1]*pred_Q8[k])>>8 : 0)
*a58d3d2aSXin Li             + ((((I2[k]<<10) - sign(I2[k])*102)*qstep)>>16) ,
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where qstep is the Q16 quantization step size, which is 11796 for NB and MB
*a58d3d2aSXin Li and 9830 for WB (representing step sizes of approximately 0.18 and 0.15,
*a58d3d2aSXin Li respectively).
*a58d3d2aSXin Li</t>
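<t>
Because each residual depends on the one that follows it, the formula above
 has to be evaluated from the last coefficient down to the first.
A non-normative Python sketch of that loop is given below.
The names nlsf_residual_dequant and sign are assumptions of this sketch, as is
 passing the already-selected weights as a list of d_LPC-1 values.
Python's right shift rounds toward negative infinity for negative operands,
 which is assumed here to be the intended behavior of the shifts.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def nlsf_residual_dequant(i2, pred_q8, qstep):
    """Undo the backwards prediction on the stage-2 indices.

    i2: list of d_LPC stage-2 indices, each in -10..10.
    pred_q8: list of d_LPC-1 selected prediction weights (Q8).
    qstep: Q16 step size (11796 for NB and MB, 9830 for WB).
    """
    d_lpc = len(i2)
    if len(pred_q8) != d_lpc - 1:
        raise ValueError("expected d_LPC-1 prediction weights")
    res_q10 = [0] * d_lpc
    for k in range(d_lpc - 1, -1, -1):  # last coefficient first
        if k + 1 < d_lpc:
            pred = (res_q10[k + 1] * pred_q8[k]) >> 8
        else:
            pred = 0
        step = (((i2[k] << 10) - sign(i2[k]) * 102) * qstep) >> 16
        res_q10[k] = pred + step
    return res_q10
```
]]></artwork>
</figure>
<t>
As a toy example with only two coefficients (real frames use 10 or 16),
 the indices {1, -1}, a single weight of 179, and the NB/MB step size of 11796
 produce the residuals {48, -166}.
</t>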
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_nbmb_weight_sel"
*a58d3d2aSXin Li title="Prediction Weight Selection for NB/MB Normalized LSF Decoding">
*a58d3d2aSXin Li<ttcol>I1</ttcol>
*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li<c><spanx style="vbare">0&nbsp;1&nbsp;2&nbsp;3&nbsp;4&nbsp;5&nbsp;6&nbsp;7&nbsp;8</spanx></c>
*a58d3d2aSXin Li<c> 0</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c> 1</c>
*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c> 2</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c> 3</c>
*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c> 4</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c> 5</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c> 6</c>
*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c> 7</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c> 8</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;B</spanx></c>
*a58d3d2aSXin Li<c> 9</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;B</spanx></c>
*a58d3d2aSXin Li<c>10</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>11</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>12</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>13</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>14</c>
*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B</spanx></c>
*a58d3d2aSXin Li<c>15</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>16</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>17</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B</spanx></c>
*a58d3d2aSXin Li<c>18</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>19</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>20</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>21</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>22</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B</spanx></c>
*a58d3d2aSXin Li<c>23</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;B&nbsp;B</spanx></c>
*a58d3d2aSXin Li<c>24</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B</spanx></c>
*a58d3d2aSXin Li<c>25</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>26</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>27</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>28</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>29</c>
*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;A&nbsp;B&nbsp;A&nbsp;A&nbsp;A&nbsp;A&nbsp;A</spanx></c>
*a58d3d2aSXin Li<c>30</c>
*a58d3d2aSXin Li<c><spanx style="vbare">A&nbsp;A&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;A&nbsp;B</spanx></c>
*a58d3d2aSXin Li<c>31</c>
*a58d3d2aSXin Li<c><spanx style="vbare">B&nbsp;A&nbsp;B&nbsp;B&nbsp;A&nbsp;B&nbsp;B&nbsp;B&nbsp;B</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_wb_weight_sel"
*a58d3d2aSXin Li title="Prediction Weight Selection for WB Normalized LSF Decoding">
*a58d3d2aSXin Li<ttcol>I1</ttcol>
*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li<c><spanx style="vbare">0&nbsp;&nbsp;1&nbsp;&nbsp;2&nbsp;&nbsp;3&nbsp;&nbsp;4&nbsp;&nbsp;5&nbsp;&nbsp;6&nbsp;&nbsp;7&nbsp;&nbsp;8&nbsp;&nbsp;9&nbsp;10&nbsp;11&nbsp;12&nbsp;13&nbsp;14</spanx></c>
*a58d3d2aSXin Li<c> 0</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c> 1</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c> 2</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c> 3</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c> 4</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c> 5</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c> 6</c>
*a58d3d2aSXin Li<c><spanx style="vbare">D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c> 7</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c> 8</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c> 9</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c>10</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>11</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>12</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>13</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>14</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c>15</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>16</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>17</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>18</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c>19</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>20</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>21</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>22</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>23</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>24</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c>25</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c>26</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c>27</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c>28</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c>29</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D</spanx></c>
*a58d3d2aSXin Li<c>30</c>
*a58d3d2aSXin Li<c><spanx style="vbare">D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li<c>31</c>
*a58d3d2aSXin Li<c><spanx style="vbare">C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C&nbsp;&nbsp;C&nbsp;&nbsp;D&nbsp;&nbsp;C</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_nlsf_reconstruction"
*a58d3d2aSXin Li title="Reconstructing the Normalized LSF Coefficients">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiOnce the stage-1 index I1 and the stage-2 residual res_Q10[] have been decoded,
*a58d3d2aSXin Li the final normalized LSF coefficients can be reconstructed.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe spectral distortion introduced by the quantization of each LSF coefficient
*a58d3d2aSXin Li varies, so the stage-2 residual is weighted accordingly, using the
*a58d3d2aSXin Li low-complexity Inverse Harmonic Mean Weighting (IHMW) function proposed in
*a58d3d2aSXin Li <xref target="laroia-icassp"/>.
*a58d3d2aSXin LiThe weights are derived directly from the stage-1 codebook vector.
*a58d3d2aSXin LiLet cb1_Q8[k] be the k'th entry of the stage-1 codebook vector from
*a58d3d2aSXin Li <xref target="silk_nlsf_nbmb_codebook"/> or
*a58d3d2aSXin Li <xref target="silk_nlsf_wb_codebook"/>.
*a58d3d2aSXin LiThen for 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d_LPC the following expression
*a58d3d2aSXin Li computes the square of the weight as a Q18 value:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Liw2_Q18[k] = (1024/(cb1_Q8[k] - cb1_Q8[k-1])
*a58d3d2aSXin Li             + 1024/(cb1_Q8[k+1] - cb1_Q8[k])) << 16 ,
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where cb1_Q8[-1]&nbsp;=&nbsp;0 and cb1_Q8[d_LPC]&nbsp;=&nbsp;256, and the
*a58d3d2aSXin Li division is integer division.
*a58d3d2aSXin LiThis is reduced to an unsquared, Q9 value using the following square-root
*a58d3d2aSXin Li approximation:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lii = ilog(w2_Q18[k])
*a58d3d2aSXin Lif = (w2_Q18[k]>>(i-8)) & 127
*a58d3d2aSXin Liy = ((i&1) ? 32768 : 46214) >> ((32-i)>>1)
*a58d3d2aSXin Liw_Q9[k] = y + ((213*f*y)>>16)
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThe constant 46214 here is approximately the square root of 2 in Q15.
*a58d3d2aSXin LiThe cb1_Q8[] vector completely determines these weights, and they may be
*a58d3d2aSXin Li tabulated and stored as 13-bit unsigned values (with a range of 1819 to 5227,
*a58d3d2aSXin Li inclusive) to avoid computing them when decoding.
*a58d3d2aSXin LiThe reference implementation already requires code to compute these weights on
*a58d3d2aSXin Li unquantized coefficients in the encoder, in silk_NLSF_VQ_weights_laroia()
*a58d3d2aSXin Li (NLSF_VQ_weights_laroia.c) and its callers.
*a58d3d2aSXin LiTo reduce the amount of ROM required, it reuses that code in the decoder
*a58d3d2aSXin Li instead of using a pre-computed table.
*a58d3d2aSXin Li</t>
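<t>
As a rough illustration of the two expressions above, the following C sketch
 computes w_Q9[] from a single stage-1 codebook vector.
The names nlsf_ilog() and nlsf_ihmw_weights_q9() are hypothetical and are not
 taken from the reference implementation.
The sketch assumes that the codebook vector is strictly increasing, so that
 neither divisor is zero.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <stdint.h>

/* ilog(n): minimum number of bits needed to store n, or 0 for n <= 0. */
int nlsf_ilog(int32_t n)
{
    int bits = 0;
    while (n > 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}

/* Hypothetical helper: derive the Q9 weights from one stage-1 codebook
 * vector cb1_Q8[0..d_LPC-1], which is assumed to be strictly increasing. */
void nlsf_ihmw_weights_q9(const uint8_t *cb1_Q8, int d_LPC, int16_t *w_Q9)
{
    for (int k = 0; k < d_LPC; k++) {
        /* Boundary conventions: cb1_Q8[-1] = 0 and cb1_Q8[d_LPC] = 256. */
        int32_t prev = (k == 0) ? 0 : cb1_Q8[k - 1];
        int32_t next = (k == d_LPC - 1) ? 256 : cb1_Q8[k + 1];
        int32_t cur = cb1_Q8[k];

        /* Squared weight in Q18, using integer division. */
        int32_t w2_Q18 = (1024 / (cur - prev) + 1024 / (next - cur)) << 16;

        /* Square-root approximation, reducing the result to Q9. */
        int i = nlsf_ilog(w2_Q18);
        int32_t f = (w2_Q18 >> (i - 8)) & 127;
        int32_t y = ((i & 1) ? 32768 : 46214) >> ((32 - i) >> 1);
        w_Q9[k] = (int16_t)(y + ((213 * f * y) >> 16));
    }
}
```
]]></artwork>
</figure>
<t>
With the NB/MB codebook vector for I1&nbsp;=&nbsp;0, this sketch gives
 w_Q9[0]&nbsp;=&nbsp;2897, and with the WB codebook vector for
 I1&nbsp;=&nbsp;7 it gives w_Q9[0]&nbsp;=&nbsp;5227, the upper end of the
 range quoted above.
</t>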
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_nbmb_codebook"
*a58d3d2aSXin Li           title="NB/MB Normalized LSF Stage-1 Codebook Vectors">
*a58d3d2aSXin Li<ttcol>I1</ttcol>
*a58d3d2aSXin Li<ttcol>Codebook (Q8)</ttcol>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;&nbsp;7&nbsp;&nbsp;&nbsp;8&nbsp;&nbsp;&nbsp;9</spanx></c>
*a58d3d2aSXin Li<c>0</c>
*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;&nbsp;35&nbsp;&nbsp;60&nbsp;&nbsp;83&nbsp;108&nbsp;132&nbsp;157&nbsp;180&nbsp;206&nbsp;228</spanx></c>
*a58d3d2aSXin Li<c>1</c>
*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;&nbsp;32&nbsp;&nbsp;55&nbsp;&nbsp;77&nbsp;101&nbsp;125&nbsp;151&nbsp;175&nbsp;201&nbsp;225</spanx></c>
*a58d3d2aSXin Li<c>2</c>
*a58d3d2aSXin Li<c><spanx style="vbare">19&nbsp;&nbsp;42&nbsp;&nbsp;66&nbsp;&nbsp;89&nbsp;114&nbsp;137&nbsp;162&nbsp;184&nbsp;209&nbsp;230</spanx></c>
*a58d3d2aSXin Li<c>3</c>
*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;&nbsp;25&nbsp;&nbsp;50&nbsp;&nbsp;72&nbsp;&nbsp;97&nbsp;120&nbsp;147&nbsp;172&nbsp;200&nbsp;223</spanx></c>
*a58d3d2aSXin Li<c>4</c>
*a58d3d2aSXin Li<c><spanx style="vbare">26&nbsp;&nbsp;44&nbsp;&nbsp;69&nbsp;&nbsp;90&nbsp;114&nbsp;135&nbsp;159&nbsp;180&nbsp;205&nbsp;225</spanx></c>
*a58d3d2aSXin Li<c>5</c>
*a58d3d2aSXin Li<c><spanx style="vbare">13&nbsp;&nbsp;22&nbsp;&nbsp;53&nbsp;&nbsp;80&nbsp;106&nbsp;130&nbsp;156&nbsp;180&nbsp;205&nbsp;228</spanx></c>
*a58d3d2aSXin Li<c>6</c>
*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;&nbsp;25&nbsp;&nbsp;44&nbsp;&nbsp;64&nbsp;&nbsp;90&nbsp;115&nbsp;142&nbsp;168&nbsp;196&nbsp;222</spanx></c>
*a58d3d2aSXin Li<c>7</c>
*a58d3d2aSXin Li<c><spanx style="vbare">19&nbsp;&nbsp;24&nbsp;&nbsp;62&nbsp;&nbsp;82&nbsp;100&nbsp;120&nbsp;145&nbsp;168&nbsp;190&nbsp;214</spanx></c>
*a58d3d2aSXin Li<c>8</c>
*a58d3d2aSXin Li<c><spanx style="vbare">22&nbsp;&nbsp;31&nbsp;&nbsp;50&nbsp;&nbsp;79&nbsp;103&nbsp;120&nbsp;151&nbsp;170&nbsp;203&nbsp;227</spanx></c>
*a58d3d2aSXin Li<c>9</c>
*a58d3d2aSXin Li<c><spanx style="vbare">21&nbsp;&nbsp;29&nbsp;&nbsp;45&nbsp;&nbsp;65&nbsp;106&nbsp;124&nbsp;150&nbsp;171&nbsp;196&nbsp;224</spanx></c>
*a58d3d2aSXin Li<c>10</c>
*a58d3d2aSXin Li<c><spanx style="vbare">30&nbsp;&nbsp;49&nbsp;&nbsp;75&nbsp;&nbsp;97&nbsp;121&nbsp;142&nbsp;165&nbsp;186&nbsp;209&nbsp;229</spanx></c>
*a58d3d2aSXin Li<c>11</c>
*a58d3d2aSXin Li<c><spanx style="vbare">19&nbsp;&nbsp;25&nbsp;&nbsp;52&nbsp;&nbsp;70&nbsp;&nbsp;93&nbsp;116&nbsp;143&nbsp;166&nbsp;192&nbsp;219</spanx></c>
*a58d3d2aSXin Li<c>12</c>
*a58d3d2aSXin Li<c><spanx style="vbare">26&nbsp;&nbsp;34&nbsp;&nbsp;62&nbsp;&nbsp;75&nbsp;&nbsp;97&nbsp;118&nbsp;145&nbsp;167&nbsp;194&nbsp;217</spanx></c>
*a58d3d2aSXin Li<c>13</c>
*a58d3d2aSXin Li<c><spanx style="vbare">25&nbsp;&nbsp;33&nbsp;&nbsp;56&nbsp;&nbsp;70&nbsp;&nbsp;91&nbsp;113&nbsp;143&nbsp;165&nbsp;196&nbsp;223</spanx></c>
*a58d3d2aSXin Li<c>14</c>
*a58d3d2aSXin Li<c><spanx style="vbare">21&nbsp;&nbsp;34&nbsp;&nbsp;51&nbsp;&nbsp;72&nbsp;&nbsp;97&nbsp;117&nbsp;145&nbsp;171&nbsp;196&nbsp;222</spanx></c>
*a58d3d2aSXin Li<c>15</c>
*a58d3d2aSXin Li<c><spanx style="vbare">20&nbsp;&nbsp;29&nbsp;&nbsp;50&nbsp;&nbsp;67&nbsp;&nbsp;90&nbsp;117&nbsp;144&nbsp;168&nbsp;197&nbsp;221</spanx></c>
*a58d3d2aSXin Li<c>16</c>
*a58d3d2aSXin Li<c><spanx style="vbare">22&nbsp;&nbsp;31&nbsp;&nbsp;48&nbsp;&nbsp;66&nbsp;&nbsp;95&nbsp;117&nbsp;146&nbsp;168&nbsp;196&nbsp;222</spanx></c>
*a58d3d2aSXin Li<c>17</c>
*a58d3d2aSXin Li<c><spanx style="vbare">24&nbsp;&nbsp;33&nbsp;&nbsp;51&nbsp;&nbsp;77&nbsp;116&nbsp;134&nbsp;158&nbsp;180&nbsp;200&nbsp;224</spanx></c>
*a58d3d2aSXin Li<c>18</c>
*a58d3d2aSXin Li<c><spanx style="vbare">21&nbsp;&nbsp;28&nbsp;&nbsp;70&nbsp;&nbsp;87&nbsp;106&nbsp;124&nbsp;149&nbsp;170&nbsp;194&nbsp;217</spanx></c>
*a58d3d2aSXin Li<c>19</c>
*a58d3d2aSXin Li<c><spanx style="vbare">26&nbsp;&nbsp;33&nbsp;&nbsp;53&nbsp;&nbsp;64&nbsp;&nbsp;83&nbsp;117&nbsp;152&nbsp;173&nbsp;204&nbsp;225</spanx></c>
*a58d3d2aSXin Li<c>20</c>
*a58d3d2aSXin Li<c><spanx style="vbare">27&nbsp;&nbsp;34&nbsp;&nbsp;65&nbsp;&nbsp;95&nbsp;108&nbsp;129&nbsp;155&nbsp;174&nbsp;210&nbsp;225</spanx></c>
*a58d3d2aSXin Li<c>21</c>
*a58d3d2aSXin Li<c><spanx style="vbare">20&nbsp;&nbsp;26&nbsp;&nbsp;72&nbsp;&nbsp;99&nbsp;113&nbsp;131&nbsp;154&nbsp;176&nbsp;200&nbsp;219</spanx></c>
*a58d3d2aSXin Li<c>22</c>
*a58d3d2aSXin Li<c><spanx style="vbare">34&nbsp;&nbsp;43&nbsp;&nbsp;61&nbsp;&nbsp;78&nbsp;&nbsp;93&nbsp;114&nbsp;155&nbsp;177&nbsp;205&nbsp;229</spanx></c>
*a58d3d2aSXin Li<c>23</c>
*a58d3d2aSXin Li<c><spanx style="vbare">23&nbsp;&nbsp;29&nbsp;&nbsp;54&nbsp;&nbsp;97&nbsp;124&nbsp;138&nbsp;163&nbsp;179&nbsp;209&nbsp;229</spanx></c>
*a58d3d2aSXin Li<c>24</c>
*a58d3d2aSXin Li<c><spanx style="vbare">30&nbsp;&nbsp;38&nbsp;&nbsp;56&nbsp;&nbsp;89&nbsp;118&nbsp;129&nbsp;158&nbsp;178&nbsp;200&nbsp;231</spanx></c>
*a58d3d2aSXin Li<c>25</c>
*a58d3d2aSXin Li<c><spanx style="vbare">21&nbsp;&nbsp;29&nbsp;&nbsp;49&nbsp;&nbsp;63&nbsp;&nbsp;85&nbsp;111&nbsp;142&nbsp;163&nbsp;193&nbsp;222</spanx></c>
*a58d3d2aSXin Li<c>26</c>
*a58d3d2aSXin Li<c><spanx style="vbare">27&nbsp;&nbsp;48&nbsp;&nbsp;77&nbsp;103&nbsp;133&nbsp;158&nbsp;179&nbsp;196&nbsp;215&nbsp;232</spanx></c>
*a58d3d2aSXin Li<c>27</c>
*a58d3d2aSXin Li<c><spanx style="vbare">29&nbsp;&nbsp;47&nbsp;&nbsp;74&nbsp;&nbsp;99&nbsp;124&nbsp;151&nbsp;176&nbsp;198&nbsp;220&nbsp;237</spanx></c>
*a58d3d2aSXin Li<c>28</c>
*a58d3d2aSXin Li<c><spanx style="vbare">33&nbsp;&nbsp;42&nbsp;&nbsp;61&nbsp;&nbsp;76&nbsp;&nbsp;93&nbsp;121&nbsp;155&nbsp;174&nbsp;207&nbsp;225</spanx></c>
*a58d3d2aSXin Li<c>29</c>
*a58d3d2aSXin Li<c><spanx style="vbare">29&nbsp;&nbsp;53&nbsp;&nbsp;87&nbsp;112&nbsp;136&nbsp;154&nbsp;170&nbsp;188&nbsp;208&nbsp;227</spanx></c>
*a58d3d2aSXin Li<c>30</c>
*a58d3d2aSXin Li<c><spanx style="vbare">24&nbsp;&nbsp;30&nbsp;&nbsp;52&nbsp;&nbsp;84&nbsp;131&nbsp;150&nbsp;166&nbsp;186&nbsp;203&nbsp;229</spanx></c>
*a58d3d2aSXin Li<c>31</c>
*a58d3d2aSXin Li<c><spanx style="vbare">37&nbsp;&nbsp;48&nbsp;&nbsp;64&nbsp;&nbsp;84&nbsp;104&nbsp;118&nbsp;156&nbsp;177&nbsp;201&nbsp;230</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_wb_codebook"
*a58d3d2aSXin Li           title="WB Normalized LSF Stage-1 Codebook Vectors">
*a58d3d2aSXin Li<ttcol>I1</ttcol>
*a58d3d2aSXin Li<ttcol>Codebook (Q8)</ttcol>
*a58d3d2aSXin Li<c/>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;2&nbsp;&nbsp;3&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;&nbsp;7&nbsp;&nbsp;&nbsp;8&nbsp;&nbsp;&nbsp;9&nbsp;&nbsp;10&nbsp;&nbsp;11&nbsp;&nbsp;12&nbsp;&nbsp;13&nbsp;&nbsp;14&nbsp;&nbsp;15</spanx></c>
*a58d3d2aSXin Li<c>0</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;7&nbsp;23&nbsp;38&nbsp;54&nbsp;69&nbsp;&nbsp;85&nbsp;100&nbsp;116&nbsp;131&nbsp;147&nbsp;162&nbsp;178&nbsp;193&nbsp;208&nbsp;223&nbsp;239</spanx></c>
*a58d3d2aSXin Li<c>1</c>
*a58d3d2aSXin Li<c><spanx style="vbare">13&nbsp;25&nbsp;41&nbsp;55&nbsp;69&nbsp;&nbsp;83&nbsp;&nbsp;98&nbsp;112&nbsp;127&nbsp;142&nbsp;157&nbsp;171&nbsp;187&nbsp;203&nbsp;220&nbsp;236</spanx></c>
*a58d3d2aSXin Li<c>2</c>
*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;21&nbsp;34&nbsp;51&nbsp;61&nbsp;&nbsp;78&nbsp;&nbsp;92&nbsp;106&nbsp;126&nbsp;136&nbsp;152&nbsp;167&nbsp;185&nbsp;205&nbsp;225&nbsp;240</spanx></c>
*a58d3d2aSXin Li<c>3</c>
*a58d3d2aSXin Li<c><spanx style="vbare">10&nbsp;21&nbsp;36&nbsp;50&nbsp;63&nbsp;&nbsp;79&nbsp;&nbsp;95&nbsp;110&nbsp;126&nbsp;141&nbsp;157&nbsp;173&nbsp;189&nbsp;205&nbsp;221&nbsp;237</spanx></c>
*a58d3d2aSXin Li<c>4</c>
*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;20&nbsp;37&nbsp;51&nbsp;59&nbsp;&nbsp;78&nbsp;&nbsp;89&nbsp;107&nbsp;123&nbsp;134&nbsp;150&nbsp;164&nbsp;184&nbsp;205&nbsp;224&nbsp;240</spanx></c>
*a58d3d2aSXin Li<c>5</c>
*a58d3d2aSXin Li<c><spanx style="vbare">10&nbsp;15&nbsp;32&nbsp;51&nbsp;67&nbsp;&nbsp;81&nbsp;&nbsp;96&nbsp;112&nbsp;129&nbsp;142&nbsp;158&nbsp;173&nbsp;189&nbsp;204&nbsp;220&nbsp;236</spanx></c>
*a58d3d2aSXin Li<c>6</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;8&nbsp;21&nbsp;37&nbsp;51&nbsp;65&nbsp;&nbsp;79&nbsp;&nbsp;98&nbsp;113&nbsp;126&nbsp;138&nbsp;155&nbsp;168&nbsp;179&nbsp;192&nbsp;209&nbsp;218</spanx></c>
*a58d3d2aSXin Li<c>7</c>
*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;15&nbsp;34&nbsp;55&nbsp;63&nbsp;&nbsp;78&nbsp;&nbsp;87&nbsp;108&nbsp;118&nbsp;131&nbsp;148&nbsp;167&nbsp;185&nbsp;203&nbsp;219&nbsp;236</spanx></c>
*a58d3d2aSXin Li<c>8</c>
*a58d3d2aSXin Li<c><spanx style="vbare">16&nbsp;19&nbsp;32&nbsp;36&nbsp;56&nbsp;&nbsp;79&nbsp;&nbsp;91&nbsp;108&nbsp;118&nbsp;136&nbsp;154&nbsp;171&nbsp;186&nbsp;204&nbsp;220&nbsp;237</spanx></c>
*a58d3d2aSXin Li<c>9</c>
*a58d3d2aSXin Li<c><spanx style="vbare">11&nbsp;28&nbsp;43&nbsp;58&nbsp;74&nbsp;&nbsp;89&nbsp;105&nbsp;120&nbsp;135&nbsp;150&nbsp;165&nbsp;180&nbsp;196&nbsp;211&nbsp;226&nbsp;241</spanx></c>
*a58d3d2aSXin Li<c>10</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;6&nbsp;16&nbsp;33&nbsp;46&nbsp;60&nbsp;&nbsp;75&nbsp;&nbsp;92&nbsp;107&nbsp;123&nbsp;137&nbsp;156&nbsp;169&nbsp;185&nbsp;199&nbsp;214&nbsp;225</spanx></c>
*a58d3d2aSXin Li<c>11</c>
*a58d3d2aSXin Li<c><spanx style="vbare">11&nbsp;19&nbsp;30&nbsp;44&nbsp;57&nbsp;&nbsp;74&nbsp;&nbsp;89&nbsp;105&nbsp;121&nbsp;135&nbsp;152&nbsp;169&nbsp;186&nbsp;202&nbsp;218&nbsp;234</spanx></c>
*a58d3d2aSXin Li<c>12</c>
*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;19&nbsp;29&nbsp;46&nbsp;57&nbsp;&nbsp;71&nbsp;&nbsp;88&nbsp;100&nbsp;120&nbsp;132&nbsp;148&nbsp;165&nbsp;182&nbsp;199&nbsp;216&nbsp;233</spanx></c>
*a58d3d2aSXin Li<c>13</c>
*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;23&nbsp;35&nbsp;46&nbsp;56&nbsp;&nbsp;77&nbsp;&nbsp;92&nbsp;106&nbsp;123&nbsp;134&nbsp;152&nbsp;167&nbsp;185&nbsp;204&nbsp;222&nbsp;237</spanx></c>
*a58d3d2aSXin Li<c>14</c>
*a58d3d2aSXin Li<c><spanx style="vbare">14&nbsp;17&nbsp;45&nbsp;53&nbsp;63&nbsp;&nbsp;75&nbsp;&nbsp;89&nbsp;107&nbsp;115&nbsp;132&nbsp;151&nbsp;171&nbsp;188&nbsp;206&nbsp;221&nbsp;240</spanx></c>
*a58d3d2aSXin Li<c>15</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;9&nbsp;16&nbsp;29&nbsp;40&nbsp;56&nbsp;&nbsp;71&nbsp;&nbsp;88&nbsp;103&nbsp;119&nbsp;137&nbsp;154&nbsp;171&nbsp;189&nbsp;205&nbsp;222&nbsp;237</spanx></c>
*a58d3d2aSXin Li<c>16</c>
*a58d3d2aSXin Li<c><spanx style="vbare">16&nbsp;19&nbsp;36&nbsp;48&nbsp;57&nbsp;&nbsp;76&nbsp;&nbsp;87&nbsp;105&nbsp;118&nbsp;132&nbsp;150&nbsp;167&nbsp;185&nbsp;202&nbsp;218&nbsp;236</spanx></c>
*a58d3d2aSXin Li<c>17</c>
*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;17&nbsp;29&nbsp;54&nbsp;71&nbsp;&nbsp;81&nbsp;&nbsp;94&nbsp;104&nbsp;126&nbsp;136&nbsp;149&nbsp;164&nbsp;182&nbsp;201&nbsp;221&nbsp;237</spanx></c>
*a58d3d2aSXin Li<c>18</c>
*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;28&nbsp;47&nbsp;62&nbsp;79&nbsp;&nbsp;97&nbsp;115&nbsp;129&nbsp;142&nbsp;155&nbsp;168&nbsp;180&nbsp;194&nbsp;208&nbsp;223&nbsp;238</spanx></c>
*a58d3d2aSXin Li<c>19</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;8&nbsp;14&nbsp;30&nbsp;45&nbsp;62&nbsp;&nbsp;78&nbsp;&nbsp;94&nbsp;111&nbsp;127&nbsp;143&nbsp;159&nbsp;175&nbsp;192&nbsp;207&nbsp;223&nbsp;239</spanx></c>
*a58d3d2aSXin Li<c>20</c>
*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;30&nbsp;49&nbsp;62&nbsp;79&nbsp;&nbsp;92&nbsp;107&nbsp;119&nbsp;132&nbsp;145&nbsp;160&nbsp;174&nbsp;190&nbsp;204&nbsp;220&nbsp;235</spanx></c>
*a58d3d2aSXin Li<c>21</c>
*a58d3d2aSXin Li<c><spanx style="vbare">14&nbsp;19&nbsp;36&nbsp;45&nbsp;61&nbsp;&nbsp;76&nbsp;&nbsp;91&nbsp;108&nbsp;121&nbsp;138&nbsp;154&nbsp;172&nbsp;189&nbsp;205&nbsp;222&nbsp;238</spanx></c>
*a58d3d2aSXin Li<c>22</c>
*a58d3d2aSXin Li<c><spanx style="vbare">12&nbsp;18&nbsp;31&nbsp;45&nbsp;60&nbsp;&nbsp;76&nbsp;&nbsp;91&nbsp;107&nbsp;123&nbsp;138&nbsp;154&nbsp;171&nbsp;187&nbsp;204&nbsp;221&nbsp;236</spanx></c>
*a58d3d2aSXin Li<c>23</c>
*a58d3d2aSXin Li<c><spanx style="vbare">13&nbsp;17&nbsp;31&nbsp;43&nbsp;53&nbsp;&nbsp;70&nbsp;&nbsp;83&nbsp;103&nbsp;114&nbsp;131&nbsp;149&nbsp;167&nbsp;185&nbsp;203&nbsp;220&nbsp;237</spanx></c>
*a58d3d2aSXin Li<c>24</c>
*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;22&nbsp;35&nbsp;42&nbsp;58&nbsp;&nbsp;78&nbsp;&nbsp;93&nbsp;110&nbsp;125&nbsp;139&nbsp;155&nbsp;170&nbsp;188&nbsp;206&nbsp;224&nbsp;240</spanx></c>
*a58d3d2aSXin Li<c>25</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;8&nbsp;15&nbsp;34&nbsp;50&nbsp;67&nbsp;&nbsp;83&nbsp;&nbsp;99&nbsp;115&nbsp;131&nbsp;146&nbsp;162&nbsp;178&nbsp;193&nbsp;209&nbsp;224&nbsp;239</spanx></c>
*a58d3d2aSXin Li<c>26</c>
*a58d3d2aSXin Li<c><spanx style="vbare">13&nbsp;16&nbsp;41&nbsp;66&nbsp;73&nbsp;&nbsp;86&nbsp;&nbsp;95&nbsp;111&nbsp;128&nbsp;137&nbsp;150&nbsp;163&nbsp;183&nbsp;206&nbsp;225&nbsp;241</spanx></c>
*a58d3d2aSXin Li<c>27</c>
*a58d3d2aSXin Li<c><spanx style="vbare">17&nbsp;25&nbsp;37&nbsp;52&nbsp;63&nbsp;&nbsp;75&nbsp;&nbsp;92&nbsp;102&nbsp;119&nbsp;132&nbsp;144&nbsp;160&nbsp;175&nbsp;191&nbsp;212&nbsp;231</spanx></c>
*a58d3d2aSXin Li<c>28</c>
*a58d3d2aSXin Li<c><spanx style="vbare">19&nbsp;31&nbsp;49&nbsp;65&nbsp;83&nbsp;100&nbsp;117&nbsp;133&nbsp;147&nbsp;161&nbsp;174&nbsp;187&nbsp;200&nbsp;213&nbsp;227&nbsp;242</spanx></c>
*a58d3d2aSXin Li<c>29</c>
*a58d3d2aSXin Li<c><spanx style="vbare">18&nbsp;31&nbsp;52&nbsp;68&nbsp;88&nbsp;103&nbsp;117&nbsp;126&nbsp;138&nbsp;149&nbsp;163&nbsp;177&nbsp;192&nbsp;207&nbsp;223&nbsp;239</spanx></c>
*a58d3d2aSXin Li<c>30</c>
*a58d3d2aSXin Li<c><spanx style="vbare">16&nbsp;29&nbsp;47&nbsp;61&nbsp;76&nbsp;&nbsp;90&nbsp;106&nbsp;119&nbsp;133&nbsp;147&nbsp;161&nbsp;176&nbsp;193&nbsp;209&nbsp;224&nbsp;240</spanx></c>
*a58d3d2aSXin Li<c>31</c>
*a58d3d2aSXin Li<c><spanx style="vbare">15&nbsp;21&nbsp;35&nbsp;50&nbsp;61&nbsp;&nbsp;73&nbsp;&nbsp;86&nbsp;&nbsp;97&nbsp;110&nbsp;119&nbsp;129&nbsp;141&nbsp;175&nbsp;198&nbsp;218&nbsp;237</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiGiven the stage-1 codebook entry cb1_Q8[], the stage-2 residual res_Q10[], and
*a58d3d2aSXin Li their corresponding weights, w_Q9[], the reconstructed normalized LSF
*a58d3d2aSXin Li coefficients are
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin LiNLSF_Q15[k] = clamp(0,
*a58d3d2aSXin Li               (cb1_Q8[k]<<7) + (res_Q10[k]<<14)/w_Q9[k], 32767) ,
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where the division is integer division.
*a58d3d2aSXin LiHowever, nothing in either the reconstruction process or the
*a58d3d2aSXin Li quantization process in the encoder thus far guarantees that the coefficients
*a58d3d2aSXin Li are monotonically increasing and separated well enough to ensure a stable
*a58d3d2aSXin Li filter <xref target="Kabal86"/>.
*a58d3d2aSXin LiWhen using the reference encoder, roughly 2% of frames violate this constraint.
*a58d3d2aSXin LiThe next section describes a stabilization procedure used to make these
*a58d3d2aSXin Li guarantees.
*a58d3d2aSXin Li</t>
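<t>
The reconstruction expression above can be sketched in C as follows.
The function name nlsf_reconstruct_q15() is hypothetical.
The sketch assumes that "integer division" means C's division, which truncates
 toward zero, and it multiplies by 16384 rather than shifting left by 14 so
 that a negative residual does not rely on shifting a negative value.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <stdint.h>

/* Hypothetical helper: combine the stage-1 codebook vector, the stage-2
 * residual, and the Q9 weights into normalized LSF coefficients in Q15. */
void nlsf_reconstruct_q15(const uint8_t *cb1_Q8, const int16_t *res_Q10,
                          const int16_t *w_Q9, int d_LPC,
                          int16_t *NLSF_Q15)
{
    for (int k = 0; k < d_LPC; k++) {
        /* Stage-1 contribution: Q8 to Q15. */
        int32_t base = (int32_t)cb1_Q8[k] << 7;

        /* Weighted residual: (Q10 << 14) / Q9 yields Q15.
         * Assumption: truncating division, as in C. */
        int32_t delta = ((int32_t)res_Q10[k] * 16384) / w_Q9[k];

        /* Clamp the sum to [0, 32767]. */
        int32_t v = base + delta;
        if (v < 0) v = 0;
        if (v > 32767) v = 32767;
        NLSF_Q15[k] = (int16_t)v;
    }
}
```
]]></artwork>
</figure>
<t>
For example, with cb1_Q8[k]&nbsp;=&nbsp;12, res_Q10[k]&nbsp;=&nbsp;1024, and
 w_Q9[k]&nbsp;=&nbsp;2897, the sketch yields
 1536&nbsp;+&nbsp;5791&nbsp;=&nbsp;7327.
</t>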
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_nlsf_stabilization" title="Normalized LSF Stabilization">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe normalized LSF stabilization procedure is implemented in
*a58d3d2aSXin Li silk_NLSF_stabilize() (NLSF_stabilize.c).
*a58d3d2aSXin LiThis process ensures that consecutive values of the normalized LSF
*a58d3d2aSXin Li coefficients, NLSF_Q15[], are spaced some minimum distance apart
*a58d3d2aSXin Li (predetermined to be the 0.01 percentile of a large training set).
*a58d3d2aSXin Li<xref target="silk_nlsf_min_spacing"/> gives the minimum spacings for NB and MB
*a58d3d2aSXin Li and those for WB, where row k is the minimum allowed value of
*a58d3d2aSXin Li NLSF_Q15[k]-NLSF_Q15[k-1].
*a58d3d2aSXin LiFor the purposes of computing this spacing for the first and last coefficient,
*a58d3d2aSXin Li NLSF_Q15[-1] is taken to be 0, and NLSF_Q15[d_LPC] is taken to be 32768.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_min_spacing"
*a58d3d2aSXin Li           title="Minimum Spacing for Normalized LSF Coefficients">
*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
*a58d3d2aSXin Li<ttcol align="right">NB and MB</ttcol>
*a58d3d2aSXin Li<ttcol align="right">WB</ttcol>
*a58d3d2aSXin Li <c>0</c> <c>250</c> <c>100</c>
*a58d3d2aSXin Li <c>1</c>   <c>3</c>   <c>3</c>
*a58d3d2aSXin Li <c>2</c>   <c>6</c>  <c>40</c>
*a58d3d2aSXin Li <c>3</c>   <c>3</c>   <c>3</c>
*a58d3d2aSXin Li <c>4</c>   <c>3</c>   <c>3</c>
*a58d3d2aSXin Li <c>5</c>   <c>3</c>   <c>3</c>
*a58d3d2aSXin Li <c>6</c>   <c>4</c>   <c>5</c>
*a58d3d2aSXin Li <c>7</c>   <c>3</c>  <c>14</c>
*a58d3d2aSXin Li <c>8</c>   <c>3</c>  <c>14</c>
*a58d3d2aSXin Li <c>9</c>   <c>3</c>  <c>10</c>
*a58d3d2aSXin Li<c>10</c> <c>461</c>  <c>11</c>
*a58d3d2aSXin Li<c>11</c>       <c/>   <c>3</c>
*a58d3d2aSXin Li<c>12</c>       <c/>   <c>8</c>
*a58d3d2aSXin Li<c>13</c>       <c/>   <c>9</c>
*a58d3d2aSXin Li<c>14</c>       <c/>   <c>7</c>
*a58d3d2aSXin Li<c>15</c>       <c/>   <c>3</c>
*a58d3d2aSXin Li<c>16</c>       <c/> <c>347</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe procedure starts off by trying to make small adjustments which attempt to
*a58d3d2aSXin Li minimize the amount of distortion introduced.
*a58d3d2aSXin LiAfter 20 such adjustments, it falls back to a more direct method which
*a58d3d2aSXin Li guarantees the constraints are enforced but may require large adjustments.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLet NDeltaMin_Q15[k] be the minimum required spacing for the current audio
*a58d3d2aSXin Li bandwidth from <xref target="silk_nlsf_min_spacing"/>.
*a58d3d2aSXin LiFirst, the procedure finds the index i where
*a58d3d2aSXin Li NLSF_Q15[i]&nbsp;-&nbsp;NLSF_Q15[i-1]&nbsp;-&nbsp;NDeltaMin_Q15[i] is the
*a58d3d2aSXin Li smallest, breaking ties by using the lower value of i.
*a58d3d2aSXin LiIf this value is non-negative, then the stabilization stops; the coefficients
*a58d3d2aSXin Li satisfy all the constraints.
*a58d3d2aSXin LiOtherwise, if i&nbsp;==&nbsp;0, it sets NLSF_Q15[0] to NDeltaMin_Q15[0], and if
*a58d3d2aSXin Li i&nbsp;==&nbsp;d_LPC, it sets NLSF_Q15[d_LPC-1] to
*a58d3d2aSXin Li (32768&nbsp;-&nbsp;NDeltaMin_Q15[d_LPC]).
*a58d3d2aSXin LiFor all other values of i, both NLSF_Q15[i-1] and NLSF_Q15[i] are updated as
*a58d3d2aSXin Li follows:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li                                          i-1
*a58d3d2aSXin Li                                          __
*a58d3d2aSXin Li min_center_Q15 = (NDeltaMin_Q15[i]>>1) + \  NDeltaMin_Q15[k]
*a58d3d2aSXin Li                                          /_
*a58d3d2aSXin Li                                          k=0
*a58d3d2aSXin Li                                                 d_LPC
*a58d3d2aSXin Li                                                  __
*a58d3d2aSXin Li max_center_Q15 = 32768 - (NDeltaMin_Q15[i]>>1) - \  NDeltaMin_Q15[k]
*a58d3d2aSXin Li                                                  /_
*a58d3d2aSXin Li                                                 k=i+1
center_freq_Q15 = clamp(min_center_Q15,
                        (NLSF_Q15[i-1] + NLSF_Q15[i] + 1)>>1,
                        max_center_Q15)
*a58d3d2aSXin Li
*a58d3d2aSXin Li NLSF_Q15[i-1] = center_freq_Q15 - (NDeltaMin_Q15[i]>>1)
*a58d3d2aSXin Li
*a58d3d2aSXin Li   NLSF_Q15[i] = NLSF_Q15[i-1] + NDeltaMin_Q15[i] .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
The procedure then repeats until it has either executed 20 times or
*a58d3d2aSXin Li has stopped because the coefficients satisfy all the constraints.
*a58d3d2aSXin Li</t>
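<t>
As an illustration only, one pass of the small-adjustment step described above
 might be sketched in C as follows.
The function name, the use of 32-bit containers for the Q15 values, and the
 caller-supplied spacing array are assumptions of this sketch, not part of
 the reference implementation.
It also assumes that a right shift of a negative value is arithmetic, and
 that NLSF_Q15[-1] is 0 and NLSF_Q15[d_LPC] is 32768 when measuring the first
 and last spacing.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* One small-adjustment pass over nlsf[0..d-1] (Q15 values).
 * ndelta_min has d+1 entries (the minimum-spacing column).
 * Returns 1 if all constraints already hold, 0 if an
 * adjustment was made. The name is hypothetical. */
int nlsf_adjust_once(int32_t *nlsf, const int32_t *ndelta_min,
                     int d)
{
    int i, k, i_min = 0;
    int32_t diff, min_diff, half, lo, hi, center;

    assert(d >= 2);
    /* Spacing below the first coefficient (NLSF_Q15[-1] = 0). */
    min_diff = nlsf[0] - ndelta_min[0];
    for (i = 1; i <= d; i++) {
        /* NLSF_Q15[d] is taken to be 32768. */
        int32_t upper = (i < d) ? nlsf[i] : 32768;
        diff = upper - nlsf[i - 1] - ndelta_min[i];
        if (diff < min_diff) {  /* strict: ties keep lower i */
            min_diff = diff;
            i_min = i;
        }
    }
    if (min_diff >= 0)
        return 1;
    if (i_min == 0) {
        nlsf[0] = ndelta_min[0];
    } else if (i_min == d) {
        nlsf[d - 1] = 32768 - ndelta_min[d];
    } else {
        half = ndelta_min[i_min] >> 1;
        lo = half;                      /* min_center_Q15 */
        for (k = 0; k < i_min; k++)
            lo += ndelta_min[k];
        hi = 32768 - half;              /* max_center_Q15 */
        for (k = i_min + 1; k <= d; k++)
            hi -= ndelta_min[k];
        center = (nlsf[i_min - 1] + nlsf[i_min] + 1) >> 1;
        if (center < lo) center = lo;
        if (center > hi) center = hi;
        nlsf[i_min - 1] = center - half;
        nlsf[i_min] = nlsf[i_min - 1] + ndelta_min[i_min];
    }
    return 0;
}
```
]]></artwork>
</figure>
<t>
A caller would invoke this up to 20 times, stopping early once it returns 1.
</t>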
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter the 20th repetition of the above procedure, the following fallback
*a58d3d2aSXin Li procedure executes once.
*a58d3d2aSXin LiFirst, the values of NLSF_Q15[k] for 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d_LPC
*a58d3d2aSXin Li are sorted in ascending order.
For the two passes that follow, NLSF_Q15[-1] is taken to be 0 and
 NLSF_Q15[d_LPC] is taken to be 32768.
Then for each value of k from 0 to d_LPC-1, NLSF_Q15[k] is set to
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Limax(NLSF_Q15[k], NLSF_Q15[k-1] + NDeltaMin_Q15[k]) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiNext, for each value of k from d_LPC-1 down to 0, NLSF_Q15[k] is set to
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Limin(NLSF_Q15[k], NLSF_Q15[k+1] - NDeltaMin_Q15[k+1]) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
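<t>
The fallback procedure can be sketched along the following lines.
This is a minimal illustration under stated assumptions: the function name is
 hypothetical, an insertion sort stands in for whatever sort an implementation
 prefers, and the boundary values NLSF_Q15[-1]&nbsp;=&nbsp;0 and
 NLSF_Q15[d_LPC]&nbsp;=&nbsp;32768 are assumed for the first and last
 coefficient.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Fallback pass: sort, then enforce the minimum spacing
 * upward and downward. ndelta_min has d+1 entries.
 * The name is hypothetical. */
void nlsf_fallback(int32_t *nlsf, const int32_t *ndelta_min,
                   int d)
{
    int i, k;
    int32_t prev, next, v;

    assert(d >= 1);
    /* Insertion sort into ascending order. */
    for (i = 1; i < d; i++) {
        v = nlsf[i];
        for (k = i - 1; k >= 0 && nlsf[k] > v; k--)
            nlsf[k + 1] = nlsf[k];
        nlsf[k + 1] = v;
    }
    /* Upward pass, with NLSF_Q15[-1] taken to be 0. */
    prev = 0;
    for (k = 0; k < d; k++) {
        v = prev + ndelta_min[k];
        if (nlsf[k] < v)
            nlsf[k] = v;
        prev = nlsf[k];
    }
    /* Downward pass, with NLSF_Q15[d] taken to be 32768. */
    next = 32768;
    for (k = d - 1; k >= 0; k--) {
        v = next - ndelta_min[k + 1];
        if (nlsf[k] > v)
            nlsf[k] = v;
        next = nlsf[k];
    }
}
```
]]></artwork>
</figure>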
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_nlsf_interpolation" title="Normalized LSF Interpolation">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor 20&nbsp;ms SILK frames, the first half of the frame (i.e., the first two
*a58d3d2aSXin Li subframes) may use normalized LSF coefficients that are interpolated between
*a58d3d2aSXin Li the decoded LSFs for the most recent coded frame (in the same channel) and the
*a58d3d2aSXin Li current frame.
A Q2 interpolation factor, decoded using the PDF in
 <xref target="silk_nlsf_interp_pdf"/>, follows the LSF coefficient indices in
 the bitstream.
*a58d3d2aSXin LiThis happens in silk_decode_indices() (decode_indices.c).
*a58d3d2aSXin LiAfter either
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>An uncoded regular SILK frame in the side channel, or</t>
*a58d3d2aSXin Li<t>A decoder reset (see <xref target="decoder-reset"/>),</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li the decoder still decodes this factor, but ignores its value and always uses
*a58d3d2aSXin Li 4 instead.
*a58d3d2aSXin LiFor 10&nbsp;ms SILK frames, this factor is not stored at all.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_interp_pdf"
*a58d3d2aSXin Li           title="PDF for Normalized LSF Interpolation Index">
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>{13, 22, 29, 11, 181}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLet n2_Q15[k] be the normalized LSF coefficients decoded by the procedure in
*a58d3d2aSXin Li <xref target="silk_nlsfs"/>, n0_Q15[k] be the LSF coefficients
*a58d3d2aSXin Li decoded for the prior frame, and w_Q2 be the interpolation factor.
*a58d3d2aSXin LiThen the normalized LSF coefficients used for the first half of a 20&nbsp;ms
*a58d3d2aSXin Li frame, n1_Q15[k], are
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lin1_Q15[k] = n0_Q15[k] + (w_Q2*(n2_Q15[k] - n0_Q15[k]) >> 2) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThis interpolation is performed in silk_decode_parameters()
*a58d3d2aSXin Li (decode_parameters.c).
*a58d3d2aSXin Li</t>
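<t>
The interpolation formula above might be written in C roughly as follows.
The function name and the 32-bit containers are assumptions of this sketch,
 and the right shift of a negative product is assumed to be arithmetic.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Interpolates between the prior frame's coefficients (n0)
 * and the current frame's (n2) with a Q2 weight w_q2.
 * The name is hypothetical. */
void nlsf_interpolate(int32_t *n1_q15, const int32_t *n0_q15,
                      const int32_t *n2_q15, int w_q2, int d)
{
    int k;

    assert(w_q2 >= 0 && w_q2 <= 4);
    for (k = 0; k < d; k++) {
        /* >> on a negative value is assumed arithmetic. */
        n1_q15[k] = n0_q15[k]
                  + ((w_q2 * (n2_q15[k] - n0_q15[k])) >> 2);
    }
}
```
]]></artwork>
</figure>
<t>
With w_Q2 equal to 4 the result is simply n2_Q15[k], which is why forcing the
 factor to 4 disables interpolation.
</t>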
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_nlsf2lpc"
*a58d3d2aSXin Li title="Converting Normalized LSFs to LPC Coefficients">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAny LPC filter A(z) can be split into a symmetric part P(z) and an
*a58d3d2aSXin Li anti-symmetric part Q(z) such that
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li          d_LPC
*a58d3d2aSXin Li           __         -k   1
*a58d3d2aSXin LiA(z) = 1 - \  a[k] * z   = - * (P(z) + Q(z))
*a58d3d2aSXin Li           /_              2
*a58d3d2aSXin Li           k=1
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liwith
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li               -d_LPC-1      -1
*a58d3d2aSXin LiP(z) = A(z) + z         * A(z  )
*a58d3d2aSXin Li
*a58d3d2aSXin Li               -d_LPC-1      -1
*a58d3d2aSXin LiQ(z) = A(z) - z         * A(z  ) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
Each even-numbered normalized LSF coefficient corresponds to a pair of conjugate
 roots of P(z), while each odd-numbered coefficient corresponds to a pair of
 conjugate roots of Q(z), all of which lie on the unit circle.
In addition, P(z) has a root at an angle of pi (z&nbsp;=&nbsp;-1) and Q(z) has
 a root at an angle of 0 (z&nbsp;=&nbsp;1).
*a58d3d2aSXin LiThus, they may be reconstructed mathematically from a set of normalized LSF
*a58d3d2aSXin Li coefficients, n[k], as
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li                 d_LPC/2-1
*a58d3d2aSXin Li             -1     ___                        -1    -2
*a58d3d2aSXin LiP(z) = (1 + z  ) *  | |  (1 - 2*cos(pi*n[2*k])*z  + z  )
*a58d3d2aSXin Li                    k=0
*a58d3d2aSXin Li
*a58d3d2aSXin Li                 d_LPC/2-1
*a58d3d2aSXin Li             -1     ___                          -1    -2
*a58d3d2aSXin LiQ(z) = (1 - z  ) *  | |  (1 - 2*cos(pi*n[2*k+1])*z  + z  )
*a58d3d2aSXin Li                    k=0
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiHowever, SILK performs this reconstruction using a fixed-point approximation so
*a58d3d2aSXin Li that all decoders can reproduce it in a bit-exact manner to avoid prediction
*a58d3d2aSXin Li drift.
*a58d3d2aSXin LiThe function silk_NLSF2A() (NLSF2A.c) implements this procedure.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTo start, it approximates cos(pi*n[k]) using a table lookup with linear
*a58d3d2aSXin Li interpolation.
*a58d3d2aSXin LiThe encoder SHOULD use the inverse of this piecewise linear approximation,
*a58d3d2aSXin Li rather than the true inverse of the cosine function, when deriving the
*a58d3d2aSXin Li normalized LSF coefficients.
*a58d3d2aSXin LiThese values are also re-ordered to improve numerical accuracy when
*a58d3d2aSXin Li constructing the LPC polynomials.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_nlsf_orderings"
*a58d3d2aSXin Li           title="LSF Ordering for Polynomial Evaluation">
*a58d3d2aSXin Li<ttcol>Coefficient</ttcol>
*a58d3d2aSXin Li<ttcol align="right">NB and MB</ttcol>
*a58d3d2aSXin Li<ttcol align="right">WB</ttcol>
*a58d3d2aSXin Li <c>0</c>  <c>0</c>  <c>0</c>
*a58d3d2aSXin Li <c>1</c>  <c>9</c> <c>15</c>
*a58d3d2aSXin Li <c>2</c>  <c>6</c>  <c>8</c>
*a58d3d2aSXin Li <c>3</c>  <c>3</c>  <c>7</c>
*a58d3d2aSXin Li <c>4</c>  <c>4</c>  <c>4</c>
*a58d3d2aSXin Li <c>5</c>  <c>5</c> <c>11</c>
*a58d3d2aSXin Li <c>6</c>  <c>8</c> <c>12</c>
*a58d3d2aSXin Li <c>7</c>  <c>1</c>  <c>3</c>
*a58d3d2aSXin Li <c>8</c>  <c>2</c>  <c>2</c>
*a58d3d2aSXin Li <c>9</c>  <c>7</c> <c>13</c>
*a58d3d2aSXin Li<c>10</c>      <c/> <c>10</c>
*a58d3d2aSXin Li<c>11</c>      <c/>  <c>5</c>
*a58d3d2aSXin Li<c>12</c>      <c/>  <c>6</c>
*a58d3d2aSXin Li<c>13</c>      <c/>  <c>9</c>
*a58d3d2aSXin Li<c>14</c>      <c/> <c>14</c>
*a58d3d2aSXin Li<c>15</c>      <c/>  <c>1</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
The top 7 bits of each normalized LSF coefficient index a value in the cosine
 table in <xref target="silk_cos_table"/>, and the next 8 bits interpolate
 between it and the next value.
*a58d3d2aSXin LiLet i&nbsp;=&nbsp;(n[k]&nbsp;&gt;&gt;&nbsp;8) be the integer index and
*a58d3d2aSXin Li f&nbsp;=&nbsp;(n[k]&nbsp;&amp;&nbsp;255) be the fractional part of a given
*a58d3d2aSXin Li coefficient.
*a58d3d2aSXin LiThen the re-ordered, approximated cosine, c_Q17[ordering[k]], is
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lic_Q17[ordering[k]] = (cos_Q12[i]*256
*a58d3d2aSXin Li                      + (cos_Q12[i+1]-cos_Q12[i])*f + 4) >> 3 ,
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where ordering[k] is the k'th entry of the column of
*a58d3d2aSXin Li <xref target="silk_nlsf_orderings"/> corresponding to the current audio
*a58d3d2aSXin Li bandwidth and cos_Q12[i] is the i'th entry of <xref target="silk_cos_table"/>.
*a58d3d2aSXin Li</t>
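<t>
As a sketch of the table lookup with linear interpolation, the following C
 fragment evaluates the approximated cosine for a single Q15 coefficient.
The function and array names are hypothetical, the table values are copied
 from <xref target="silk_cos_table"/>, and the re-ordering step is left to the
 caller.
The right shift of a negative value is assumed to be arithmetic.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

static const int16_t cos_q12[129] = {
     4096,  4095,  4091,  4085,  4076,  4065,  4052,  4036,
     4017,  3997,  3973,  3948,  3920,  3889,  3857,  3822,
     3784,  3745,  3703,  3659,  3613,  3564,  3513,  3461,
     3406,  3349,  3290,  3229,  3166,  3102,  3035,  2967,
     2896,  2824,  2751,  2676,  2599,  2520,  2440,  2359,
     2276,  2191,  2106,  2019,  1931,  1842,  1751,  1660,
     1568,  1474,  1380,  1285,  1189,  1093,   995,   897,
      799,   700,   601,   501,   401,   301,   201,   101,
        0,  -101,  -201,  -301,  -401,  -501,  -601,  -700,
     -799,  -897,  -995, -1093, -1189, -1285, -1380, -1474,
    -1568, -1660, -1751, -1842, -1931, -2019, -2106, -2191,
    -2276, -2359, -2440, -2520, -2599, -2676, -2751, -2824,
    -2896, -2967, -3035, -3102, -3166, -3229, -3290, -3349,
    -3406, -3461, -3513, -3564, -3613, -3659, -3703, -3745,
    -3784, -3822, -3857, -3889, -3920, -3948, -3973, -3997,
    -4017, -4036, -4052, -4065, -4076, -4085, -4091, -4095,
    -4096
};

/* Piecewise-linear cos(pi*n) for a Q15 input, Q17 result.
 * The name is hypothetical. */
int32_t nlsf_cos_q17(int32_t n_q15)
{
    int i, f;

    assert(n_q15 >= 0 && n_q15 <= 32767);
    i = n_q15 >> 8;     /* top 7 bits: table index   */
    f = n_q15 & 255;    /* low 8 bits: fraction      */
    /* >> on a negative value is assumed arithmetic. */
    return (cos_q12[i] * 256
            + (cos_q12[i + 1] - cos_q12[i]) * f + 4) >> 3;
}
```
]]></artwork>
</figure>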
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_cos_table"
*a58d3d2aSXin Li           title="Q12 Cosine Table for LSF Conversion">
*a58d3d2aSXin Li<ttcol align="right">i</ttcol>
*a58d3d2aSXin Li<ttcol align="right">+0</ttcol>
*a58d3d2aSXin Li<ttcol align="right">+1</ttcol>
*a58d3d2aSXin Li<ttcol align="right">+2</ttcol>
*a58d3d2aSXin Li<ttcol align="right">+3</ttcol>
*a58d3d2aSXin Li<c>0</c>
*a58d3d2aSXin Li <c>4096</c> <c>4095</c> <c>4091</c> <c>4085</c>
*a58d3d2aSXin Li<c>4</c>
*a58d3d2aSXin Li <c>4076</c> <c>4065</c> <c>4052</c> <c>4036</c>
*a58d3d2aSXin Li<c>8</c>
*a58d3d2aSXin Li <c>4017</c> <c>3997</c> <c>3973</c> <c>3948</c>
*a58d3d2aSXin Li<c>12</c>
*a58d3d2aSXin Li <c>3920</c> <c>3889</c> <c>3857</c> <c>3822</c>
*a58d3d2aSXin Li<c>16</c>
*a58d3d2aSXin Li <c>3784</c> <c>3745</c> <c>3703</c> <c>3659</c>
*a58d3d2aSXin Li<c>20</c>
*a58d3d2aSXin Li <c>3613</c> <c>3564</c> <c>3513</c> <c>3461</c>
*a58d3d2aSXin Li<c>24</c>
*a58d3d2aSXin Li <c>3406</c> <c>3349</c> <c>3290</c> <c>3229</c>
*a58d3d2aSXin Li<c>28</c>
*a58d3d2aSXin Li <c>3166</c> <c>3102</c> <c>3035</c> <c>2967</c>
*a58d3d2aSXin Li<c>32</c>
*a58d3d2aSXin Li <c>2896</c> <c>2824</c> <c>2751</c> <c>2676</c>
*a58d3d2aSXin Li<c>36</c>
*a58d3d2aSXin Li <c>2599</c> <c>2520</c> <c>2440</c> <c>2359</c>
*a58d3d2aSXin Li<c>40</c>
*a58d3d2aSXin Li <c>2276</c> <c>2191</c> <c>2106</c> <c>2019</c>
*a58d3d2aSXin Li<c>44</c>
*a58d3d2aSXin Li <c>1931</c> <c>1842</c> <c>1751</c> <c>1660</c>
*a58d3d2aSXin Li<c>48</c>
*a58d3d2aSXin Li <c>1568</c> <c>1474</c> <c>1380</c> <c>1285</c>
*a58d3d2aSXin Li<c>52</c>
*a58d3d2aSXin Li <c>1189</c> <c>1093</c>  <c>995</c>  <c>897</c>
*a58d3d2aSXin Li<c>56</c>
*a58d3d2aSXin Li  <c>799</c>  <c>700</c>  <c>601</c>  <c>501</c>
*a58d3d2aSXin Li<c>60</c>
*a58d3d2aSXin Li  <c>401</c>  <c>301</c>  <c>201</c>  <c>101</c>
*a58d3d2aSXin Li<c>64</c>
*a58d3d2aSXin Li    <c>0</c> <c>-101</c> <c>-201</c> <c>-301</c>
*a58d3d2aSXin Li<c>68</c>
*a58d3d2aSXin Li <c>-401</c> <c>-501</c> <c>-601</c> <c>-700</c>
*a58d3d2aSXin Li<c>72</c>
*a58d3d2aSXin Li <c>-799</c> <c>-897</c> <c>-995</c> <c>-1093</c>
*a58d3d2aSXin Li<c>76</c>
*a58d3d2aSXin Li<c>-1189</c><c>-1285</c><c>-1380</c><c>-1474</c>
*a58d3d2aSXin Li<c>80</c>
*a58d3d2aSXin Li<c>-1568</c><c>-1660</c><c>-1751</c><c>-1842</c>
*a58d3d2aSXin Li<c>84</c>
*a58d3d2aSXin Li<c>-1931</c><c>-2019</c><c>-2106</c><c>-2191</c>
*a58d3d2aSXin Li<c>88</c>
*a58d3d2aSXin Li<c>-2276</c><c>-2359</c><c>-2440</c><c>-2520</c>
*a58d3d2aSXin Li<c>92</c>
*a58d3d2aSXin Li<c>-2599</c><c>-2676</c><c>-2751</c><c>-2824</c>
*a58d3d2aSXin Li<c>96</c>
*a58d3d2aSXin Li<c>-2896</c><c>-2967</c><c>-3035</c><c>-3102</c>
*a58d3d2aSXin Li<c>100</c>
*a58d3d2aSXin Li<c>-3166</c><c>-3229</c><c>-3290</c><c>-3349</c>
*a58d3d2aSXin Li<c>104</c>
*a58d3d2aSXin Li<c>-3406</c><c>-3461</c><c>-3513</c><c>-3564</c>
*a58d3d2aSXin Li<c>108</c>
*a58d3d2aSXin Li<c>-3613</c><c>-3659</c><c>-3703</c><c>-3745</c>
*a58d3d2aSXin Li<c>112</c>
*a58d3d2aSXin Li<c>-3784</c><c>-3822</c><c>-3857</c><c>-3889</c>
*a58d3d2aSXin Li<c>116</c>
*a58d3d2aSXin Li<c>-3920</c><c>-3948</c><c>-3973</c><c>-3997</c>
*a58d3d2aSXin Li<c>120</c>
*a58d3d2aSXin Li<c>-4017</c><c>-4036</c><c>-4052</c><c>-4065</c>
*a58d3d2aSXin Li<c>124</c>
*a58d3d2aSXin Li<c>-4076</c><c>-4085</c><c>-4091</c><c>-4095</c>
*a58d3d2aSXin Li<c>128</c>
*a58d3d2aSXin Li<c>-4096</c>        <c/>        <c/>        <c/>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiGiven the list of cosine values, silk_NLSF2A_find_poly() (NLSF2A.c)
*a58d3d2aSXin Li computes the coefficients of P and Q, described here via a simple recurrence.
*a58d3d2aSXin LiLet p_Q16[k][j] and q_Q16[k][j] be the coefficients of the products of the
*a58d3d2aSXin Li first (k+1) root pairs for P and Q, with j indexing the coefficient number.
*a58d3d2aSXin LiOnly the first (k+2) coefficients are needed, as the products are symmetric.
*a58d3d2aSXin LiLet p_Q16[0][0]&nbsp;=&nbsp;q_Q16[0][0]&nbsp;=&nbsp;1&lt;&lt;16,
*a58d3d2aSXin Li p_Q16[0][1]&nbsp;=&nbsp;-c_Q17[0], q_Q16[0][1]&nbsp;=&nbsp;-c_Q17[1], and
*a58d3d2aSXin Li d2&nbsp;=&nbsp;d_LPC/2.
*a58d3d2aSXin LiAs boundary conditions, assume
*a58d3d2aSXin Li p_Q16[k][j]&nbsp;=&nbsp;q_Q16[k][j]&nbsp;=&nbsp;0 for all
*a58d3d2aSXin Li j&nbsp;&lt;&nbsp;0.
*a58d3d2aSXin LiAlso, assume p_Q16[k][k+2]&nbsp;=&nbsp;p_Q16[k][k] and
*a58d3d2aSXin Li q_Q16[k][k+2]&nbsp;=&nbsp;q_Q16[k][k] (because of the symmetry).
*a58d3d2aSXin LiThen, for 0&nbsp;&lt;&nbsp;k&nbsp;&lt;&nbsp;d2 and 0&nbsp;&lt;=&nbsp;j&nbsp;&lt;=&nbsp;k+1,
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lip_Q16[k][j] = p_Q16[k-1][j] + p_Q16[k-1][j-2]
*a58d3d2aSXin Li              - ((c_Q17[2*k]*p_Q16[k-1][j-1] + 32768)>>16) ,
*a58d3d2aSXin Li
*a58d3d2aSXin Liq_Q16[k][j] = q_Q16[k-1][j] + q_Q16[k-1][j-2]
*a58d3d2aSXin Li              - ((c_Q17[2*k+1]*q_Q16[k-1][j-1] + 32768)>>16) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThe use of Q17 values for the cosine terms in an otherwise Q16 expression
*a58d3d2aSXin Li implicitly scales them by a factor of 2.
*a58d3d2aSXin LiThe multiplications in this recurrence may require up to 48 bits of precision
*a58d3d2aSXin Li in the result to avoid overflow.
*a58d3d2aSXin LiIn practice, each row of the recurrence only depends on the previous row, so an
*a58d3d2aSXin Li implementation does not need to store all of them.
*a58d3d2aSXin Li</t>
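<t>
The recurrence can be evaluated in place with a single row, as the following
 sketch suggests.
It is written directly from the equations above rather than taken from
 silk_NLSF2A_find_poly(); the function name and the "first" parameter (0 to
 select the P terms, 1 to select the Q terms) are assumptions of the sketch.
It uses 64-bit products and assumes an arithmetic right shift for negative
 values.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Builds the first d2+1 coefficients of P (first = 0) or
 * Q (first = 1) from the interleaved c_Q17[] values,
 * updating one row in place. The name is hypothetical. */
void nlsf_find_poly(int32_t *out_q16, const int32_t *c_q17,
                    int d2, int first)
{
    int j, k;

    assert(d2 >= 1 && (first == 0 || first == 1));
    out_q16[0] = 1 << 16;
    out_q16[1] = -c_q17[first];
    for (k = 1; k < d2; k++) {
        int64_t c = c_q17[2 * k + first];
        /* Symmetry: entry k+1 of row k-1 equals entry k-1. */
        out_q16[k + 1] = out_q16[k - 1];
        /* Descending j lets row k overwrite row k-1. */
        for (j = k + 1; j >= 1; j--) {
            int32_t back2 = (j >= 2) ? out_q16[j - 2] : 0;
            int64_t prod = c * out_q16[j - 1];
            /* 64-bit product; >> assumed arithmetic. */
            out_q16[j] += back2
                        - (int32_t)((prod + 32768) >> 16);
        }
        /* Entry 0 is unchanged: its other terms are zero. */
    }
}
```
]]></artwork>
</figure>
<t>
For example, with every cosine term equal to zero and d2 equal to 2, the
 result is the start of (1&nbsp;+&nbsp;z**-2)**2, i.e., 1.0, 0, and 2.0 in Q16.
</t>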
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Lisilk_NLSF2A() uses the values from the last row of this recurrence to
*a58d3d2aSXin Li reconstruct a 32-bit version of the LPC filter (without the leading 1.0
*a58d3d2aSXin Li coefficient), a32_Q17[k], 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d2:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
a32_Q17[k]         = -(q_Q16[d2-1][k+1] - q_Q16[d2-1][k])
                     - (p_Q16[d2-1][k+1] + p_Q16[d2-1][k]) ,

a32_Q17[d_LPC-k-1] =  (q_Q16[d2-1][k+1] - q_Q16[d2-1][k])
                     - (p_Q16[d2-1][k+1] + p_Q16[d2-1][k]) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThe sum and difference of two terms from each of the p_Q16 and q_Q16
*a58d3d2aSXin Li coefficient lists reflect the (1&nbsp;+&nbsp;z**-1) and
*a58d3d2aSXin Li (1&nbsp;-&nbsp;z**-1) factors of P and Q, respectively.
*a58d3d2aSXin LiThe promotion of the expression from Q16 to Q17 implicitly scales the result
*a58d3d2aSXin Li by 1/2.
*a58d3d2aSXin Li</t>
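<t>
A minimal sketch of this combination step follows, assuming the last rows of
 p_Q16 and q_Q16 are passed in as plain arrays of d2+1 entries.
The function name is hypothetical.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Combines the last rows of p_Q16 and q_Q16 (d_lpc/2 + 1
 * entries each) into a32_Q17[0..d_lpc-1].
 * The name is hypothetical. */
void nlsf_combine_pq(int32_t *a32_q17, const int32_t *p_q16,
                     const int32_t *q_q16, int d_lpc)
{
    int k, d2 = d_lpc / 2;

    assert(d_lpc >= 2 && d_lpc % 2 == 0);
    for (k = 0; k < d2; k++) {
        /* Applies the (1 + z**-1) factor of P. */
        int32_t p_sum = p_q16[k + 1] + p_q16[k];
        /* Applies the (1 - z**-1) factor of Q. */
        int32_t q_diff = q_q16[k + 1] - q_q16[k];

        a32_q17[k] = -q_diff - p_sum;
        a32_q17[d_lpc - k - 1] = q_diff - p_sum;
    }
}
```
]]></artwork>
</figure>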
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_lpc_range_limit"
*a58d3d2aSXin Li title="Limiting the Range of the LPC Coefficients">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe a32_Q17[] coefficients are too large to fit in a 16-bit value, which
*a58d3d2aSXin Li significantly increases the cost of applying this filter in fixed-point
*a58d3d2aSXin Li decoders.
*a58d3d2aSXin LiReducing them to Q12 precision doesn't incur any significant quality loss,
*a58d3d2aSXin Li but still does not guarantee they will fit.
*a58d3d2aSXin Lisilk_NLSF2A() applies up to 10 rounds of bandwidth expansion to limit
*a58d3d2aSXin Li the dynamic range of these coefficients.
*a58d3d2aSXin LiEven floating-point decoders SHOULD perform these steps, to avoid mismatch.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor each round, the process first finds the index k such that abs(a32_Q17[k])
*a58d3d2aSXin Li is largest, breaking ties by choosing the lowest value of k.
Then, letting maxabs_Q17 be that largest magnitude, it computes the
 corresponding Q12 precision value, maxabs_Q12, subject to an upper bound to
 avoid overflow in subsequent computations:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Limaxabs_Q12 = min((maxabs_Q17 + 16) >> 5, 163838) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiIf this is larger than 32767, the procedure derives the chirp factor,
*a58d3d2aSXin Li sc_Q16[0], to use in the bandwidth expansion as
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li                    (maxabs_Q12 - 32767) << 14
*a58d3d2aSXin Lisc_Q16[0] = 65470 - -------------------------- ,
*a58d3d2aSXin Li                    (maxabs_Q12 * (k+1)) >> 2
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where the division here is integer division.
*a58d3d2aSXin LiThis is an approximation of the chirp factor needed to reduce the target
*a58d3d2aSXin Li coefficient to 32767, though it is both less than 0.999 and, for
*a58d3d2aSXin Li k&nbsp;&gt;&nbsp;0 when maxabs_Q12 is much greater than 32767, still slightly
*a58d3d2aSXin Li too large.
*a58d3d2aSXin LiThe upper bound on maxabs_Q12, 163838, was chosen because it is equal to
*a58d3d2aSXin Li ((2**31&nbsp;-&nbsp;1)&nbsp;&gt;&gt;&nbsp;14)&nbsp;+&nbsp;32767, i.e., the
*a58d3d2aSXin Li largest value of maxabs_Q12 that would not overflow the numerator in the
*a58d3d2aSXin Li equation above when stored in a signed 32-bit integer.
*a58d3d2aSXin Li</t>
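<t>
The two computations above might be sketched as follows.
Both function names are hypothetical, and the second function assumes the
 caller has already checked that maxabs_Q12 exceeds 32767.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Q17 magnitude to Q12, capped at 163838.
 * The name is hypothetical. */
int32_t lpc_maxabs_q12(int32_t maxabs_q17)
{
    int64_t v;

    assert(maxabs_q17 >= 0);
    v = ((int64_t)maxabs_q17 + 16) >> 5;
    return v > 163838 ? 163838 : (int32_t)v;
}

/* Chirp factor sc_Q16[0] for the coefficient at index k.
 * The name is hypothetical. */
int32_t lpc_chirp_q16(int32_t maxabs_q12, int k)
{
    int32_t num, den;

    assert(maxabs_q12 > 32767 && maxabs_q12 <= 163838);
    assert(k >= 0 && k < 16);
    /* The cap on maxabs_q12 keeps this within 32 bits. */
    num = (maxabs_q12 - 32767) << 14;
    den = (maxabs_q12 * (k + 1)) >> 2;
    return 65470 - num / den;   /* integer division */
}
```
]]></artwork>
</figure>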
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Lisilk_bwexpander_32() (bwexpander_32.c) performs the bandwidth expansion (again,
*a58d3d2aSXin Li only when maxabs_Q12 is greater than 32767) using the following recurrence:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li a32_Q17[k] = (a32_Q17[k]*sc_Q16[k]) >> 16
*a58d3d2aSXin Li
*a58d3d2aSXin Lisc_Q16[k+1] = (sc_Q16[0]*sc_Q16[k] + 32768) >> 16
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThe first multiply may require up to 48 bits of precision in the result to
*a58d3d2aSXin Li avoid overflow.
*a58d3d2aSXin LiThe second multiply must be unsigned to avoid overflow with only 32 bits of
*a58d3d2aSXin Li precision.
*a58d3d2aSXin LiThe reference implementation uses a slightly more complex formulation that
*a58d3d2aSXin Li avoids the 32-bit overflow using signed multiplication, but is otherwise
*a58d3d2aSXin Li equivalent.
*a58d3d2aSXin Li</t>
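<t>
A sketch of the recurrence as written above, using 64-bit intermediates for
 both multiplies instead of the formulation used in silk_bwexpander_32(), is
 shown below.
The function name is hypothetical, and the right shift of a negative product
 is assumed to be arithmetic.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Applies one round of bandwidth expansion in place.
 * The name is hypothetical. */
void lpc_bwexpand(int32_t *a32_q17, int d_lpc, int32_t sc0_q16)
{
    int k;
    int64_t sc = sc0_q16;       /* sc_Q16[k], starting at k = 0 */

    assert(sc0_q16 >= 0 && sc0_q16 <= 65536);
    for (k = 0; k < d_lpc; k++) {
        /* 64-bit product; >> assumed arithmetic. */
        a32_q17[k] = (int32_t)(((int64_t)a32_q17[k] * sc) >> 16);
        sc = (sc0_q16 * sc + 32768) >> 16;
    }
}
```
]]></artwork>
</figure>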
*a58d3d2aSXin Li<t>
After 10 rounds of bandwidth expansion are performed, the coefficients are
 simply saturated to 16 bits:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lia32_Q17[k] = clamp(-32768, (a32_Q17[k] + 16) >> 5, 32767) << 5 .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiBecause this performs the actual saturation in the Q12 domain, but converts the
*a58d3d2aSXin Li coefficients back to the Q17 domain for the purposes of prediction gain
*a58d3d2aSXin Li limiting, this step must be performed after the 10th round of bandwidth
*a58d3d2aSXin Li expansion, regardless of whether or not the Q12 version of any coefficient
*a58d3d2aSXin Li still overflows a 16-bit integer.
*a58d3d2aSXin LiThis saturation is not performed if maxabs_Q12 drops to 32767 or less prior to
*a58d3d2aSXin Li the 10th round.
*a58d3d2aSXin Li</t>
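<t>
The saturation step could be sketched like this.
The function name is hypothetical; the sketch multiplies by 32 rather than
 shifting left by 5 because a left shift of a negative value is undefined
 in C, and it assumes an arithmetic right shift.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Saturates each coefficient in the Q12 domain and converts
 * it back to Q17. The name is hypothetical. */
void lpc_saturate_q17(int32_t *a32_q17, int d_lpc)
{
    int k;

    assert(d_lpc >= 0);
    for (k = 0; k < d_lpc; k++) {
        /* >> on a negative value is assumed arithmetic. */
        int64_t q12 = ((int64_t)a32_q17[k] + 16) >> 5;

        if (q12 < -32768) q12 = -32768;
        if (q12 > 32767) q12 = 32767;
        a32_q17[k] = (int32_t)(q12 * 32);
    }
}
```
]]></artwork>
</figure>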
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_lpc_gain_limit"
*a58d3d2aSXin Li title="Limiting the Prediction Gain of the LPC Filter">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe prediction gain of an LPC synthesis filter is the square-root of the output
*a58d3d2aSXin Li energy when the filter is excited by a unit-energy impulse.
*a58d3d2aSXin LiEven if the Q12 coefficients would fit, the resulting filter may still have a
*a58d3d2aSXin Li significant gain (especially for voiced sounds), making the filter unstable.
*a58d3d2aSXin Lisilk_NLSF2A() applies up to 18 additional rounds of bandwidth expansion to
*a58d3d2aSXin Li limit the prediction gain.
*a58d3d2aSXin LiInstead of controlling the amount of bandwidth expansion using the prediction
*a58d3d2aSXin Li gain itself (which may diverge to infinity for an unstable filter),
*a58d3d2aSXin Li silk_NLSF2A() uses silk_LPC_inverse_pred_gain_QA() (LPC_inv_pred_gain.c) to
*a58d3d2aSXin Li compute the reflection coefficients associated with the filter.
*a58d3d2aSXin LiThe filter is stable if and only if the magnitude of these coefficients is
*a58d3d2aSXin Li sufficiently less than one.
*a58d3d2aSXin LiThe reflection coefficients, rc[k], can be computed using a simple Levinson
*a58d3d2aSXin Li recurrence, initialized with the LPC coefficients
*a58d3d2aSXin Li a[d_LPC-1][n]&nbsp;=&nbsp;a[n], and then updated via
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li    rc[k] = -a[k][k] ,
*a58d3d2aSXin Li
*a58d3d2aSXin Li            a[k][n] - a[k][k-n-1]*rc[k]
*a58d3d2aSXin Lia[k-1][n] = --------------------------- .
*a58d3d2aSXin Li                             2
*a58d3d2aSXin Li                    1 - rc[k]
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
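<t>
For intuition only, the recurrence above can be run in floating point as in
 the following sketch.
This is not the bit-exact procedure used by the decoder; the function name
 and the choice to stop at the first coefficient with magnitude of one or
 more are assumptions of the illustration.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Floating-point step-down recursion. a[] holds the LPC
 * coefficients on entry and is overwritten. Returns 1 if
 * every reflection coefficient has magnitude below one,
 * 0 otherwise. The name is hypothetical. */
int lpc_reflection(double *rc, double *a, int d_lpc)
{
    int k, n;

    assert(d_lpc >= 1);
    for (k = d_lpc - 1; k >= 0; k--) {
        double den;

        rc[k] = -a[k];
        if (rc[k] >= 1.0 || rc[k] <= -1.0)
            return 0;
        den = 1.0 - rc[k] * rc[k];
        /* Update entries n and k-n-1 together so that both
         * are computed from the values of row k. */
        for (n = 0; n < (k + 1) / 2; n++) {
            double lo = a[n], hi = a[k - n - 1];

            a[n] = (lo - hi * rc[k]) / den;
            a[k - n - 1] = (hi - lo * rc[k]) / den;
        }
    }
    return 1;
}
```
]]></artwork>
</figure>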
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiHowever, silk_LPC_inverse_pred_gain_QA() approximates this using fixed-point
*a58d3d2aSXin Li arithmetic to guarantee reproducible results across platforms and
*a58d3d2aSXin Li implementations.
*a58d3d2aSXin LiSince small changes in the coefficients can make a stable filter unstable, it
*a58d3d2aSXin Li takes the real Q12 coefficients that will be used during reconstruction as
*a58d3d2aSXin Li input.
*a58d3d2aSXin LiThus, let
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lia32_Q12[n] = (a32_Q17[n] + 16) >> 5
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li be the Q12 version of the LPC coefficients that will eventually be used.
*a58d3d2aSXin LiAs a simple initial check, the decoder computes the DC response as
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
        d_LPC-1
*a58d3d2aSXin Li          __
*a58d3d2aSXin LiDC_resp = \   a32_Q12[n]
*a58d3d2aSXin Li          /_
*a58d3d2aSXin Li          n=0
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li and if DC_resp&nbsp;&gt;&nbsp;4096, the filter is unstable.
*a58d3d2aSXin Li</t>
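<t>
The Q12 conversion and the DC response check might look as follows in C.
The function name and its return convention (1 when the check passes) are
 assumptions of this sketch, and the right shift is assumed to be arithmetic.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Writes the Q12 coefficients and returns 1 if the DC
 * response does not exceed 4096 (1.0 in Q12), else 0.
 * The name is hypothetical. */
int lpc_dc_check(int32_t *a32_q12, const int32_t *a32_q17,
                 int d_lpc)
{
    int n;
    int32_t dc_resp = 0;

    assert(d_lpc >= 1 && d_lpc <= 16);
    for (n = 0; n < d_lpc; n++) {
        /* >> on a negative value is assumed arithmetic. */
        a32_q12[n] = (int32_t)(((int64_t)a32_q17[n] + 16) >> 5);
        dc_resp += a32_q12[n];
    }
    return dc_resp <= 4096;
}
```
]]></artwork>
</figure>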
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIncreasing the precision of these Q12 coefficients to Q24 for intermediate
*a58d3d2aSXin Li computations allows more accurate computation of the reflection coefficients,
*a58d3d2aSXin Li so the decoder initializes the recurrence via
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lia32_Q24[d_LPC-1][n] = a32_Q12[n] << 12 .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThen for each k from d_LPC-1 down to 0, if
*a58d3d2aSXin Li abs(a32_Q24[k][k])&nbsp;&gt;&nbsp;16773022, the filter is unstable and the
*a58d3d2aSXin Li recurrence stops.
*a58d3d2aSXin LiThe constant 16773022 here is approximately 0.99975 in Q24.
*a58d3d2aSXin LiOtherwise, row k-1 of a32_Q24 is computed from row k as
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li      rc_Q31[k] = -a32_Q24[k][k] << 7 ,
*a58d3d2aSXin Li
*a58d3d2aSXin Li     div_Q30[k] = (1<<30) - (rc_Q31[k]*rc_Q31[k] >> 32) ,
*a58d3d2aSXin Li
*a58d3d2aSXin Li          b1[k] = ilog(div_Q30[k]) ,
*a58d3d2aSXin Li
*a58d3d2aSXin Li          b2[k] = b1[k] - 16 ,
*a58d3d2aSXin Li
*a58d3d2aSXin Li                        (1<<29) - 1
*a58d3d2aSXin Li     inv_Qb2[k] = ----------------------- ,
*a58d3d2aSXin Li                  div_Q30[k] >> (b2[k]+1)
*a58d3d2aSXin Li
*a58d3d2aSXin Li     err_Q29[k] = (1<<29)
*a58d3d2aSXin Li                  - ((div_Q30[k]<<(15-b2[k]))*inv_Qb2[k] >> 16) ,
*a58d3d2aSXin Li
*a58d3d2aSXin Li    gain_Qb1[k] = ((inv_Qb2[k] << 16)
*a58d3d2aSXin Li                   + (err_Q29[k]*inv_Qb2[k] >> 13)) ,
*a58d3d2aSXin Li
*a58d3d2aSXin Linum_Q24[k-1][n] = a32_Q24[k][n]
*a58d3d2aSXin Li                  - ((a32_Q24[k][k-n-1]*rc_Q31[k] + (1<<30)) >> 31) ,
*a58d3d2aSXin Li
*a58d3d2aSXin Lia32_Q24[k-1][n] = (num_Q24[k-1][n]*gain_Qb1[k]
*a58d3d2aSXin Li                   + (1<<(b1[k]-1))) >> b1[k] ,
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where 0&nbsp;&lt;=&nbsp;n&nbsp;&lt;&nbsp;k.
Here, rc_Q31[k] are the reflection coefficients.
*a58d3d2aSXin Lidiv_Q30[k] is the denominator for each iteration, and gain_Qb1[k] is its
*a58d3d2aSXin Li multiplicative inverse (with b1[k] fractional bits, where b1[k] ranges from
*a58d3d2aSXin Li 20 to 31).
*a58d3d2aSXin Liinv_Qb2[k], which ranges from 16384 to 32767, is a low-precision version of
*a58d3d2aSXin Li that inverse (with b2[k] fractional bits).
*a58d3d2aSXin Lierr_Q29[k] is the residual error, ranging from -32763 to 32392, which is used
*a58d3d2aSXin Li to improve the accuracy.
The values num_Q24[k-1][n] for each n are the numerators for the next row of
*a58d3d2aSXin Li coefficients in the recursion, and a32_Q24[k-1][n] is the final version of
*a58d3d2aSXin Li that row.
*a58d3d2aSXin LiEvery multiply in this procedure except the one used to compute gain_Qb1[k]
*a58d3d2aSXin Li requires more than 32 bits of precision, but otherwise all intermediate
*a58d3d2aSXin Li results fit in 32 bits or less.
*a58d3d2aSXin LiIn practice, because each row only depends on the next one, an implementation
*a58d3d2aSXin Li does not need to store them all.
*a58d3d2aSXin Li</t>
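<t>
The fixed-point recurrence above can be transcribed fairly directly, as in the
 sketch below.
It is an illustration written from the equations, not a copy of
 silk_LPC_inverse_pred_gain_QA(): the function names are hypothetical, all
 intermediates are held in 64 bits for simplicity, the right shift of a
 negative value is assumed to be arithmetic, and the DC response check is
 assumed to have been done separately.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent x, for x > 0. */
static int ilog(int64_t x)
{
    int b = 0;

    while (x > 0) {
        b++;
        x >>= 1;
    }
    return b;
}

/* a_q24[] holds row d_lpc-1 on entry and is overwritten.
 * Returns 1 if the filter is classified as stable, else 0.
 * The name is hypothetical. */
int lpc_stable_q24(int32_t *a_q24, int d_lpc)
{
    int k, n;

    assert(d_lpc >= 1 && d_lpc <= 16);
    for (k = d_lpc - 1; k >= 0; k--) {
        int64_t rc, div, inv, err, gain;
        int32_t row[16];
        int b1, b2;

        if (a_q24[k] > 16773022 || a_q24[k] < -16773022)
            return 0;
        rc = -(int64_t)a_q24[k] * 128;             /* Q31 */
        div = ((int64_t)1 << 30) - ((rc * rc) >> 32);
        b1 = ilog(div);
        b2 = b1 - 16;
        inv = (((int64_t)1 << 29) - 1) / (div >> (b2 + 1));
        err = ((int64_t)1 << 29)
            - (((div << (15 - b2)) * inv) >> 16);
        gain = (inv << 16) + ((err * inv) >> 13);
        for (n = 0; n < k; n++) {
            int64_t num = a_q24[n]
                - ((a_q24[k - n - 1] * rc
                    + ((int64_t)1 << 30)) >> 31);

            row[n] = (int32_t)((num * gain
                + ((int64_t)1 << (b1 - 1))) >> b1);
        }
        /* Row k-1 replaces row k only once it is complete. */
        for (n = 0; n < k; n++)
            a_q24[n] = row[n];
    }
    return 1;
}
```
]]></artwork>
</figure>
<t>
The input row would be formed from the Q12 coefficients shifted left by 12
 bits, as described above.
</t>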
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf abs(a32_Q24[k][k])&nbsp;&lt;=&nbsp;16773022 for
*a58d3d2aSXin Li 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;d_LPC, then the filter is considered stable.
*a58d3d2aSXin LiHowever, the problem of determining stability is ill-conditioned when the
*a58d3d2aSXin Li filter contains several reflection coefficients whose magnitude is very close
*a58d3d2aSXin Li to one.
*a58d3d2aSXin LiThis fixed-point algorithm is not mathematically guaranteed to correctly
*a58d3d2aSXin Li classify filters as stable or unstable in this case, though it does very well
*a58d3d2aSXin Li in practice.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiOn round i, 1&nbsp;&lt;=&nbsp;i&nbsp;&lt;=&nbsp;18, if the filter passes these
*a58d3d2aSXin Li stability checks, then this procedure stops, and the final LPC coefficients to
*a58d3d2aSXin Li use for reconstruction in <xref target="silk_lpc_synthesis"/> are
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lia_Q12[k] = (a32_Q17[k] + 16) >> 5 .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiOtherwise, a round of bandwidth expansion is applied using the same procedure
*a58d3d2aSXin Li as in <xref target="silk_lpc_range_limit"/>, with
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lisc_Q16[0] = 65536 - (2<<i) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiDuring the 15th round, sc_Q16[0] becomes 0 in the above equation, so a_Q12[k]
*a58d3d2aSXin Li is set to 0 for all k, guaranteeing a stable filter.
*a58d3d2aSXin Li</t>
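<t>
The chirp schedule for these rounds is small enough to show directly.
The function name is hypothetical, and the sketch only accepts rounds 1
 through 15, on the assumption that the all-zero filter produced in round 15
 passes the stability checks so that later rounds are never reached.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* sc_Q16[0] for gain-limiting round i, counting from 1.
 * The name is hypothetical. */
int32_t lpc_gain_chirp_q16(int round)
{
    assert(round >= 1 && round <= 15);
    return 65536 - (2 << round);
}
```
]]></artwork>
</figure>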
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_ltp_params" toc="include"
*a58d3d2aSXin Li title="Long-Term Prediction (LTP) Parameters">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter the normalized LSF indices and, for 20&nbsp;ms frames, the LSF
*a58d3d2aSXin Li interpolation index, voiced frames (see <xref target="silk_frame_type"/>)
*a58d3d2aSXin Li include additional LTP parameters.
*a58d3d2aSXin LiThere is one primary lag index for each SILK frame, but this is refined to
*a58d3d2aSXin Li produce a separate lag index per subframe using a vector quantizer.
*a58d3d2aSXin LiEach subframe also gets its own prediction gain coefficient.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_ltp_lags" title="Pitch Lags">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe primary lag index is coded either relative to the primary lag of the prior
*a58d3d2aSXin Li frame in the same channel, or as an absolute index.
*a58d3d2aSXin LiAbsolute coding is used if and only if
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThis is the first SILK frame of its type (LBRR or regular) for this channel in
*a58d3d2aSXin Li the current Opus frame,
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe previous SILK frame of the same type (LBRR or regular) for this channel in
*a58d3d2aSXin Li the same Opus frame was not coded, or
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThat previous SILK frame was coded, but was not voiced (see
*a58d3d2aSXin Li <xref target="silk_frame_type"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWith absolute coding, the primary pitch lag may range from 2&nbsp;ms
*a58d3d2aSXin Li (inclusive) up to 18&nbsp;ms (exclusive), corresponding to pitches from
*a58d3d2aSXin Li 500&nbsp;Hz down to 55.6&nbsp;Hz, respectively.
It consists of a high part and a low part, where the decoder reads the high
*a58d3d2aSXin Li part using the 32-entry codebook in <xref target="silk_abs_pitch_high_pdf"/>
*a58d3d2aSXin Li and the low part using the codebook corresponding to the current audio
*a58d3d2aSXin Li bandwidth from <xref target="silk_abs_pitch_low_pdf"/>.
*a58d3d2aSXin LiThe final primary pitch lag is then
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lilag = lag_high*lag_scale + lag_low + lag_min
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where lag_high is the high part, lag_low is the low part, and lag_scale
*a58d3d2aSXin Li and lag_min are the values from the "Scale" and "Minimum Lag" columns of
*a58d3d2aSXin Li <xref target="silk_abs_pitch_low_pdf"/>, respectively.
*a58d3d2aSXin Li</t>
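<t>
As a brief sketch, the absolute lag computation can be expressed as below,
 using the "Scale" and "Minimum Lag" values from
 <xref target="silk_abs_pitch_low_pdf"/>.
The enumeration and function names are hypothetical, and the two parts are
 assumed to have been decoded already.
</t>
<figure align="center">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Hypothetical bandwidth labels for this sketch. */
enum audio_bw { BW_NB = 0, BW_MB = 1, BW_WB = 2 };

/* Combines the decoded high and low parts into a lag in
 * samples. The name is hypothetical. */
int abs_pitch_lag(enum audio_bw bw, int lag_high, int lag_low)
{
    static const int lag_scale[3] = { 4, 6, 8 };
    static const int lag_min[3] = { 16, 24, 32 };

    assert(lag_high >= 0 && lag_high < 32);
    assert(lag_low >= 0 && lag_low < lag_scale[bw]);
    return lag_high * lag_scale[bw] + lag_low + lag_min[bw];
}
```
]]></artwork>
</figure>
<t>
The largest value this can produce is one less than the "Maximum Lag" column,
 consistent with the 18&nbsp;ms bound being exclusive.
</t>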
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_abs_pitch_high_pdf"
*a58d3d2aSXin Li title="PDF for High Part of Primary Pitch Lag">
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>{3,   3,   6,  11,  21,  30,  32,  19,
*a58d3d2aSXin Li   11,  10,  12,  13,  13,  12,  11,   9,
*a58d3d2aSXin Li    8,   7,   6,   4,   2,   2,   2,   1,
*a58d3d2aSXin Li    1,   1,   1,   1,   1,   1,   1,   1}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_abs_pitch_low_pdf"
*a58d3d2aSXin Li title="PDF for Low Part of Primary Pitch Lag">
*a58d3d2aSXin Li<ttcol>Audio Bandwidth</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<ttcol>Scale</ttcol>
*a58d3d2aSXin Li<ttcol>Minimum Lag</ttcol>
*a58d3d2aSXin Li<ttcol>Maximum Lag</ttcol>
*a58d3d2aSXin Li<c>NB</c> <c>{64, 64, 64, 64}/256</c>                 <c>4</c> <c>16</c> <c>144</c>
*a58d3d2aSXin Li<c>MB</c> <c>{43, 42, 43, 43, 42, 43}/256</c>         <c>6</c> <c>24</c> <c>216</c>
*a58d3d2aSXin Li<c>WB</c> <c>{32, 32, 32, 32, 32, 32, 32, 32}/256</c> <c>8</c> <c>32</c> <c>288</c>
*a58d3d2aSXin Li</texttable>
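<t>
As an illustration only, the absolute-coding formula can be sketched in C as
 follows.
The enum, the table names, and the function name are hypothetical and do not
 appear in the reference implementation; the constants are copied from the
 "Scale" and "Minimum Lag" columns of
 <xref target="silk_abs_pitch_low_pdf"/>.
The sketch assumes lag_high and lag_low have already been read from the
 bitstream.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
/* Bandwidth labels used only by this sketch (hypothetical names). */
enum sketch_bandwidth { SKETCH_NB = 0, SKETCH_MB = 1, SKETCH_WB = 2 };

/* "Scale" and "Minimum Lag" columns, indexed by bandwidth. */
static const int sketch_lag_scale[3] = { 4, 6, 8 };
static const int sketch_lag_min[3]   = { 16, 24, 32 };

/* Combine the decoded high part (0..31) and low part (0..scale-1)
   into the primary pitch lag, in samples. */
static int sketch_abs_pitch_lag(enum sketch_bandwidth bw,
                                int lag_high, int lag_low)
{
    return lag_high * sketch_lag_scale[bw] + lag_low
           + sketch_lag_min[bw];
}
```
]]></artwork>
</figure>
<t>
With the largest symbols (lag_high of 31 and lag_low of lag_scale - 1), this
 yields one less than the "Maximum Lag" column, consistent with the exclusive
 18&nbsp;ms upper bound.
</t>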
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAll frames that do not use absolute coding for the primary lag index use
*a58d3d2aSXin Li relative coding instead.
*a58d3d2aSXin LiThe decoder reads a single delta value using the 21-entry PDF in
*a58d3d2aSXin Li <xref target="silk_rel_pitch_pdf"/>.
If the resulting value is zero, the decoder falls back to the absolute coding
 procedure from the prior paragraph.
Otherwise, the final primary pitch lag is
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lilag = previous_lag + (delta_lag_index - 9)
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where previous_lag is the primary pitch lag from the most recent frame in the
*a58d3d2aSXin Li same channel and delta_lag_index is the value just decoded.
*a58d3d2aSXin LiThis allows a per-frame change in the pitch lag of -8 to +11 samples.
*a58d3d2aSXin LiThe decoder does no clamping at this point, so this value can fall outside the
*a58d3d2aSXin Li range of 2&nbsp;ms to 18&nbsp;ms, and the decoder must use this unclamped
*a58d3d2aSXin Li value when using relative coding in the next SILK frame (if any).
*a58d3d2aSXin LiHowever, because an Opus frame can use relative coding for at most two
*a58d3d2aSXin Li consecutive SILK frames, integer overflow should not be an issue.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_rel_pitch_pdf"
*a58d3d2aSXin Li title="PDF for Primary Pitch Lag Change">
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>{46,  2,  2,  3,  4,  6, 10, 15,
*a58d3d2aSXin Li    26, 38, 30, 22, 15, 10,  7,  6,
*a58d3d2aSXin Li     4,  4,  2,  2,  2}/256</c>
*a58d3d2aSXin Li</texttable>
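<t>
A minimal sketch of the relative-coding rule, assuming delta_lag_index has
 already been decoded, might look like the following.
The function name and its return convention are hypothetical; the sketch
 deliberately applies no clamping, matching the text above.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
/* Returns 1 and writes *lag when relative coding applies.
   Returns 0 (leaving *lag untouched) when delta_lag_index is zero,
   in which case the caller must decode the lag with absolute
   coding instead. */
static int sketch_rel_pitch_lag(int previous_lag, int delta_lag_index,
                                int *lag)
{
    if (delta_lag_index == 0) {
        return 0;
    }
    /* delta_lag_index in 1..20 gives a change of -8..+11 samples. */
    *lag = previous_lag + (delta_lag_index - 9);
    return 1;
}
```
]]></artwork>
</figure>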
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter the primary pitch lag, a "pitch contour", stored as a single entry from
*a58d3d2aSXin Li one of four small VQ codebooks, gives lag offsets for each subframe in the
*a58d3d2aSXin Li current SILK frame.
*a58d3d2aSXin LiThe codebook index is decoded using one of the PDFs in
*a58d3d2aSXin Li <xref target="silk_pitch_contour_pdfs"/> depending on the current frame size
*a58d3d2aSXin Li and audio bandwidth.
*a58d3d2aSXin LiTables&nbsp;<xref format="counter" target="silk_pitch_contour_cb_nb10ms"/>
*a58d3d2aSXin Li through&nbsp;<xref format="counter" target="silk_pitch_contour_cb_mbwb20ms"/>
*a58d3d2aSXin Li give the corresponding offsets to apply to the primary pitch lag for each
*a58d3d2aSXin Li subframe given the decoded codebook index.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_pdfs"
*a58d3d2aSXin Li title="PDFs for Subframe Pitch Contour">
*a58d3d2aSXin Li<ttcol>Audio Bandwidth</ttcol>
*a58d3d2aSXin Li<ttcol>SILK Frame Size</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Codebook Size</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>NB</c>       <c>10&nbsp;ms</c>  <c>3</c>
*a58d3d2aSXin Li<c>{143, 50, 63}/256</c>
*a58d3d2aSXin Li<c>NB</c>       <c>20&nbsp;ms</c> <c>11</c>
*a58d3d2aSXin Li<c>{68, 12, 21, 17, 19, 22, 30, 24,
*a58d3d2aSXin Li    17, 16, 10}/256</c>
*a58d3d2aSXin Li<c>MB or WB</c> <c>10&nbsp;ms</c> <c>12</c>
*a58d3d2aSXin Li<c>{91, 46, 39, 19, 14, 12,  8,  7,
*a58d3d2aSXin Li     6,  5,  5,  4}/256</c>
*a58d3d2aSXin Li<c>MB or WB</c> <c>20&nbsp;ms</c> <c>34</c>
*a58d3d2aSXin Li<c>{33, 22, 18, 16, 15, 14, 14, 13,
*a58d3d2aSXin Li    13, 10,  9,  9,  8,  6,  6,  6,
*a58d3d2aSXin Li     5,  4,  4,  4,  3,  3,  3,  2,
*a58d3d2aSXin Li     2,  2,  2,  2,  2,  2,  1,  1,
*a58d3d2aSXin Li     1,  1}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_cb_nb10ms"
*a58d3d2aSXin Li title="Codebook Vectors for Subframe Pitch Contour: NB, 10&nbsp;ms Frames">
*a58d3d2aSXin Li<ttcol>Index</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Subframe Offsets</ttcol>
*a58d3d2aSXin Li<c>0</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li<c>1</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li<c>2</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_cb_nb20ms"
*a58d3d2aSXin Li title="Codebook Vectors for Subframe Pitch Contour: NB, 20&nbsp;ms Frames">
*a58d3d2aSXin Li<ttcol>Index</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Subframe Offsets</ttcol>
*a58d3d2aSXin Li <c>0</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>1</c> <c><spanx style="vbare">&nbsp;2&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;-1</spanx></c>
*a58d3d2aSXin Li <c>2</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;2</spanx></c>
*a58d3d2aSXin Li <c>3</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>4</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>5</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>6</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>7</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>8</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>9</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-1</spanx></c>
*a58d3d2aSXin Li<c>10</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-1</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_cb_mbwb10ms"
*a58d3d2aSXin Li title="Codebook Vectors for Subframe Pitch Contour: MB or WB, 10&nbsp;ms Frames">
*a58d3d2aSXin Li<ttcol>Index</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Subframe Offsets</ttcol>
*a58d3d2aSXin Li <c>0</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>1</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>2</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>3</c> <c><spanx style="vbare">-1&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>4</c> <c><spanx style="vbare">&nbsp;1&nbsp;-1</spanx></c>
*a58d3d2aSXin Li <c>5</c> <c><spanx style="vbare">-1&nbsp;&nbsp;2</spanx></c>
*a58d3d2aSXin Li <c>6</c> <c><spanx style="vbare">&nbsp;2&nbsp;-1</spanx></c>
*a58d3d2aSXin Li <c>7</c> <c><spanx style="vbare">-2&nbsp;&nbsp;2</spanx></c>
*a58d3d2aSXin Li <c>8</c> <c><spanx style="vbare">&nbsp;2&nbsp;-2</spanx></c>
*a58d3d2aSXin Li <c>9</c> <c><spanx style="vbare">-2&nbsp;&nbsp;3</spanx></c>
*a58d3d2aSXin Li<c>10</c> <c><spanx style="vbare">&nbsp;3&nbsp;-2</spanx></c>
*a58d3d2aSXin Li<c>11</c> <c><spanx style="vbare">-3&nbsp;&nbsp;3</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_pitch_contour_cb_mbwb20ms"
*a58d3d2aSXin Li title="Codebook Vectors for Subframe Pitch Contour: MB or WB, 20&nbsp;ms Frames">
*a58d3d2aSXin Li<ttcol>Index</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Subframe Offsets</ttcol>
*a58d3d2aSXin Li <c>0</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>1</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>2</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>3</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>4</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>5</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>6</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>7</c> <c><spanx style="vbare">&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-1</spanx></c>
*a58d3d2aSXin Li <c>8</c> <c><spanx style="vbare">-1&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;2</spanx></c>
*a58d3d2aSXin Li <c>9</c> <c><spanx style="vbare">&nbsp;1&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-1</spanx></c>
*a58d3d2aSXin Li<c>10</c> <c><spanx style="vbare">-2&nbsp;-1&nbsp;&nbsp;1&nbsp;&nbsp;2</spanx></c>
*a58d3d2aSXin Li<c>11</c> <c><spanx style="vbare">&nbsp;2&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;-1</spanx></c>
*a58d3d2aSXin Li<c>12</c> <c><spanx style="vbare">-2&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;&nbsp;2</spanx></c>
*a58d3d2aSXin Li<c>13</c> <c><spanx style="vbare">-2&nbsp;&nbsp;0&nbsp;&nbsp;1&nbsp;&nbsp;3</spanx></c>
*a58d3d2aSXin Li<c>14</c> <c><spanx style="vbare">&nbsp;2&nbsp;&nbsp;1&nbsp;-1&nbsp;-2</spanx></c>
*a58d3d2aSXin Li<c>15</c> <c><spanx style="vbare">-3&nbsp;-1&nbsp;&nbsp;1&nbsp;&nbsp;3</spanx></c>
*a58d3d2aSXin Li<c>16</c> <c><spanx style="vbare">&nbsp;2&nbsp;&nbsp;0&nbsp;&nbsp;0&nbsp;-2</spanx></c>
*a58d3d2aSXin Li<c>17</c> <c><spanx style="vbare">&nbsp;3&nbsp;&nbsp;1&nbsp;&nbsp;0&nbsp;-2</spanx></c>
*a58d3d2aSXin Li<c>18</c> <c><spanx style="vbare">-3&nbsp;-1&nbsp;&nbsp;2&nbsp;&nbsp;4</spanx></c>
*a58d3d2aSXin Li<c>19</c> <c><spanx style="vbare">-4&nbsp;-1&nbsp;&nbsp;1&nbsp;&nbsp;4</spanx></c>
*a58d3d2aSXin Li<c>20</c> <c><spanx style="vbare">&nbsp;3&nbsp;&nbsp;1&nbsp;-1&nbsp;-3</spanx></c>
*a58d3d2aSXin Li<c>21</c> <c><spanx style="vbare">-4&nbsp;-1&nbsp;&nbsp;2&nbsp;&nbsp;5</spanx></c>
*a58d3d2aSXin Li<c>22</c> <c><spanx style="vbare">&nbsp;4&nbsp;&nbsp;2&nbsp;-1&nbsp;-3</spanx></c>
*a58d3d2aSXin Li<c>23</c> <c><spanx style="vbare">&nbsp;4&nbsp;&nbsp;1&nbsp;-1&nbsp;-4</spanx></c>
*a58d3d2aSXin Li<c>24</c> <c><spanx style="vbare">-5&nbsp;-1&nbsp;&nbsp;2&nbsp;&nbsp;6</spanx></c>
*a58d3d2aSXin Li<c>25</c> <c><spanx style="vbare">&nbsp;5&nbsp;&nbsp;2&nbsp;-1&nbsp;-4</spanx></c>
*a58d3d2aSXin Li<c>26</c> <c><spanx style="vbare">-6&nbsp;-2&nbsp;&nbsp;2&nbsp;&nbsp;6</spanx></c>
*a58d3d2aSXin Li<c>27</c> <c><spanx style="vbare">-5&nbsp;-2&nbsp;&nbsp;2&nbsp;&nbsp;5</spanx></c>
*a58d3d2aSXin Li<c>28</c> <c><spanx style="vbare">&nbsp;6&nbsp;&nbsp;2&nbsp;-1&nbsp;-5</spanx></c>
*a58d3d2aSXin Li<c>29</c> <c><spanx style="vbare">-7&nbsp;-2&nbsp;&nbsp;3&nbsp;&nbsp;8</spanx></c>
*a58d3d2aSXin Li<c>30</c> <c><spanx style="vbare">&nbsp;6&nbsp;&nbsp;2&nbsp;-2&nbsp;-6</spanx></c>
*a58d3d2aSXin Li<c>31</c> <c><spanx style="vbare">&nbsp;5&nbsp;&nbsp;2&nbsp;-2&nbsp;-5</spanx></c>
*a58d3d2aSXin Li<c>32</c> <c><spanx style="vbare">&nbsp;8&nbsp;&nbsp;3&nbsp;-2&nbsp;-7</spanx></c>
*a58d3d2aSXin Li<c>33</c> <c><spanx style="vbare">-9&nbsp;-3&nbsp;&nbsp;3&nbsp;&nbsp;9</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe final pitch lag for each subframe is assembled in silk_decode_pitch()
*a58d3d2aSXin Li (decode_pitch.c).
Let lag be the primary pitch lag for the current SILK frame, contour_index be
 the decoded index into the VQ codebook, and lag_cb[contour_index][k] be the corresponding
*a58d3d2aSXin Li entry of the codebook from the appropriate table given above for the k'th
*a58d3d2aSXin Li subframe.
*a58d3d2aSXin LiThen the final pitch lag for that subframe is
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lipitch_lags[k] = clamp(lag_min, lag + lag_cb[contour_index][k],
*a58d3d2aSXin Li                      lag_max)
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li where lag_min and lag_max are the values from the "Minimum Lag" and
*a58d3d2aSXin Li "Maximum Lag" columns of <xref target="silk_abs_pitch_low_pdf"/>,
*a58d3d2aSXin Li respectively.
*a58d3d2aSXin Li</t>
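<t>
The following sketch illustrates this step for the smallest case, NB with
 10&nbsp;ms frames (two subframes).
It is not the code from decode_pitch.c; the function and table names are
 hypothetical, and the codebook and lag limits are copied from
 <xref target="silk_pitch_contour_cb_nb10ms"/> and
 <xref target="silk_abs_pitch_low_pdf"/>.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
/* Contour codebook for NB, 10 ms frames: one row per index,
   one column per subframe. */
static const int sketch_cb_nb10ms[3][2] = {
    { 0, 0 }, { 1, 0 }, { 0, 1 }
};

/* clamp(lo, x, hi): limit x to the range [lo, hi]. */
static int sketch_clamp(int lo, int x, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Fill pitch_lags[0..1] for an NB 10 ms frame.  The primary lag may
   lie outside [lag_min, lag_max] because relative coding does not
   clamp it; only the per-subframe result is clamped. */
static void sketch_nb10ms_pitch_lags(int lag, int contour_index,
                                     int pitch_lags[2])
{
    const int lag_min = 16;   /* "Minimum Lag" for NB */
    const int lag_max = 144;  /* "Maximum Lag" for NB */
    int k;
    for (k = 0; k < 2; k++) {
        int offset = sketch_cb_nb10ms[contour_index][k];
        pitch_lags[k] = sketch_clamp(lag_min, lag + offset, lag_max);
    }
}
```
]]></artwork>
</figure>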
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_ltp_filter" title="LTP Filter Coefficients">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSILK uses a separate 5-tap pitch filter for each subframe, selected from one
*a58d3d2aSXin Li of three codebooks.
*a58d3d2aSXin LiThe three codebooks each represent different rate-distortion trade-offs, with
*a58d3d2aSXin Li average rates of 1.61&nbsp;bits/subframe, 3.68&nbsp;bits/subframe, and
*a58d3d2aSXin Li 4.85&nbsp;bits/subframe, respectively.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
The importance of the filter coefficients generally depends on two factors: the
 periodicity of the signal and the relative energy between the current subframe
 and the signal from one period earlier.
Greater periodicity and decaying energy both make the filter coefficients more
 important, so such frames should be coded with lower distortion and a higher
 rate.
*a58d3d2aSXin LiThese properties are relatively stable over the duration of a single SILK
*a58d3d2aSXin Li frame, hence all of the subframes in a SILK frame choose their filter from the
*a58d3d2aSXin Li same codebook.
The choice of codebook is signaled with an explicitly coded "periodicity
 index", which immediately follows the subframe pitch lags and is coded using
 the 3-entry PDF from <xref target="silk_perindex_pdf"/>.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_perindex_pdf" title="Periodicity Index PDF">
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>{77, 80, 99}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe indices of the filters for each subframe follow.
*a58d3d2aSXin LiThey are all coded using the PDF from <xref target="silk_ltp_filter_pdfs"/>
*a58d3d2aSXin Li corresponding to the periodicity index.
*a58d3d2aSXin LiTables&nbsp;<xref format="counter" target="silk_ltp_filter_coeffs0"/>
*a58d3d2aSXin Li through&nbsp;<xref format="counter" target="silk_ltp_filter_coeffs2"/>
*a58d3d2aSXin Li contain the corresponding filter taps as signed Q7 integers.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_ltp_filter_pdfs" title="LTP Filter PDFs">
*a58d3d2aSXin Li<ttcol>Periodicity Index</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Codebook Size</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>0</c>  <c>8</c> <c>{185, 15, 13, 13, 9, 9, 6, 6}/256</c>
*a58d3d2aSXin Li<c>1</c> <c>16</c> <c>{57, 34, 21, 20, 15, 13, 12, 13,
*a58d3d2aSXin Li                       10, 10,  9, 10,  9,  8,  7,  8}/256</c>
*a58d3d2aSXin Li<c>2</c> <c>32</c> <c>{15, 16, 14, 12, 12, 12, 11, 11,
*a58d3d2aSXin Li                       11, 10,  9,  9,  9,  9,  8,  8,
*a58d3d2aSXin Li                        8,  8,  7,  7,  6,  6,  5,  4,
*a58d3d2aSXin Li                        5,  4,  4,  4,  3,  4,  3,  2}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_ltp_filter_coeffs0"
*a58d3d2aSXin Li title="Codebook Vectors for LTP Filter, Periodicity Index 0">
*a58d3d2aSXin Li<ttcol>Index</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Filter Taps (Q7)</ttcol>
*a58d3d2aSXin Li <c>0</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;24&nbsp;&nbsp;&nbsp;7&nbsp;&nbsp;&nbsp;5</spanx></c>
*a58d3d2aSXin Li <c>1</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;0&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li <c>2</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;12&nbsp;&nbsp;28&nbsp;&nbsp;41&nbsp;&nbsp;13&nbsp;&nbsp;-4</spanx></c>
*a58d3d2aSXin Li <c>3</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-9&nbsp;&nbsp;15&nbsp;&nbsp;42&nbsp;&nbsp;25&nbsp;&nbsp;14</spanx></c>
*a58d3d2aSXin Li <c>4</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;1&nbsp;&nbsp;-2&nbsp;&nbsp;62&nbsp;&nbsp;41&nbsp;&nbsp;-9</spanx></c>
*a58d3d2aSXin Li <c>5</c>
*a58d3d2aSXin Li<c><spanx style="vbare">-10&nbsp;&nbsp;37&nbsp;&nbsp;65&nbsp;&nbsp;-4&nbsp;&nbsp;&nbsp;3</spanx></c>
*a58d3d2aSXin Li <c>6</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-6&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;66&nbsp;&nbsp;&nbsp;7&nbsp;&nbsp;-8</spanx></c>
*a58d3d2aSXin Li <c>7</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;16&nbsp;&nbsp;14&nbsp;&nbsp;38&nbsp;&nbsp;-3&nbsp;&nbsp;33</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_ltp_filter_coeffs1"
*a58d3d2aSXin Li title="Codebook Vectors for LTP Filter, Periodicity Index 1">
*a58d3d2aSXin Li<ttcol>Index</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Filter Taps (Q7)</ttcol>
*a58d3d2aSXin Li
*a58d3d2aSXin Li <c>0</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;13&nbsp;&nbsp;22&nbsp;&nbsp;39&nbsp;&nbsp;23&nbsp;&nbsp;12</spanx></c>
*a58d3d2aSXin Li <c>1</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-1&nbsp;&nbsp;36&nbsp;&nbsp;64&nbsp;&nbsp;27&nbsp;&nbsp;-6</spanx></c>
*a58d3d2aSXin Li <c>2</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-7&nbsp;&nbsp;10&nbsp;&nbsp;55&nbsp;&nbsp;43&nbsp;&nbsp;17</spanx></c>
*a58d3d2aSXin Li <c>3</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;8&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>4</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;6&nbsp;-11&nbsp;&nbsp;74&nbsp;&nbsp;53&nbsp;&nbsp;-9</spanx></c>
*a58d3d2aSXin Li <c>5</c>
*a58d3d2aSXin Li<c><spanx style="vbare">-12&nbsp;&nbsp;55&nbsp;&nbsp;76&nbsp;-12&nbsp;&nbsp;&nbsp;8</spanx></c>
*a58d3d2aSXin Li <c>6</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-3&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;93&nbsp;&nbsp;27&nbsp;&nbsp;-4</spanx></c>
*a58d3d2aSXin Li <c>7</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;26&nbsp;&nbsp;39&nbsp;&nbsp;59&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;-8</spanx></c>
*a58d3d2aSXin Li <c>8</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;77&nbsp;&nbsp;11&nbsp;&nbsp;&nbsp;9</spanx></c>
*a58d3d2aSXin Li <c>9</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-8&nbsp;&nbsp;22&nbsp;&nbsp;44&nbsp;&nbsp;-6&nbsp;&nbsp;&nbsp;7</spanx></c>
*a58d3d2aSXin Li<c>10</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;40&nbsp;&nbsp;&nbsp;9&nbsp;&nbsp;26&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;9</spanx></c>
*a58d3d2aSXin Li<c>11</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-7&nbsp;&nbsp;20&nbsp;101&nbsp;&nbsp;-7&nbsp;&nbsp;&nbsp;4</spanx></c>
*a58d3d2aSXin Li<c>12</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;-8&nbsp;&nbsp;42&nbsp;&nbsp;26&nbsp;&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li<c>13</c>
*a58d3d2aSXin Li<c><spanx style="vbare">-15&nbsp;&nbsp;33&nbsp;&nbsp;68&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;23</spanx></c>
*a58d3d2aSXin Li<c>14</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-2&nbsp;&nbsp;55&nbsp;&nbsp;46&nbsp;&nbsp;-2&nbsp;&nbsp;15</spanx></c>
*a58d3d2aSXin Li<c>15</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;-1&nbsp;&nbsp;21&nbsp;&nbsp;16&nbsp;&nbsp;41</spanx></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_ltp_filter_coeffs2"
*a58d3d2aSXin Li title="Codebook Vectors for LTP Filter, Periodicity Index 2">
*a58d3d2aSXin Li<ttcol>Index</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Filter Taps (Q7)</ttcol>
*a58d3d2aSXin Li <c>0</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-6&nbsp;&nbsp;27&nbsp;&nbsp;61&nbsp;&nbsp;39&nbsp;&nbsp;&nbsp;5</spanx></c>
*a58d3d2aSXin Li <c>1</c>
*a58d3d2aSXin Li<c><spanx style="vbare">-11&nbsp;&nbsp;42&nbsp;&nbsp;88&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>2</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-2&nbsp;&nbsp;60&nbsp;&nbsp;65&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;-4</spanx></c>
*a58d3d2aSXin Li <c>3</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-1&nbsp;&nbsp;-5&nbsp;&nbsp;73&nbsp;&nbsp;56&nbsp;&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li <c>4</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-9&nbsp;&nbsp;19&nbsp;&nbsp;94&nbsp;&nbsp;29&nbsp;&nbsp;-9</spanx></c>
*a58d3d2aSXin Li <c>5</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;0&nbsp;&nbsp;12&nbsp;&nbsp;99&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;&nbsp;4</spanx></c>
*a58d3d2aSXin Li <c>6</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;8&nbsp;-19&nbsp;102&nbsp;&nbsp;46&nbsp;-13</spanx></c>
*a58d3d2aSXin Li <c>7</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;13&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;2</spanx></c>
*a58d3d2aSXin Li <c>8</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;9&nbsp;-21&nbsp;&nbsp;84&nbsp;&nbsp;72&nbsp;-18</spanx></c>
*a58d3d2aSXin Li <c>9</c>
*a58d3d2aSXin Li<c><spanx style="vbare">-11&nbsp;&nbsp;46&nbsp;104&nbsp;-22&nbsp;&nbsp;&nbsp;8</spanx></c>
*a58d3d2aSXin Li<c>10</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;18&nbsp;&nbsp;38&nbsp;&nbsp;48&nbsp;&nbsp;23&nbsp;&nbsp;&nbsp;0</spanx></c>
*a58d3d2aSXin Li<c>11</c>
*a58d3d2aSXin Li<c><spanx style="vbare">-16&nbsp;&nbsp;70&nbsp;&nbsp;83&nbsp;-21&nbsp;&nbsp;11</spanx></c>
*a58d3d2aSXin Li<c>12</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;5&nbsp;-11&nbsp;117&nbsp;&nbsp;22&nbsp;&nbsp;-8</spanx></c>
*a58d3d2aSXin Li<c>13</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-6&nbsp;&nbsp;23&nbsp;117&nbsp;-12&nbsp;&nbsp;&nbsp;3</spanx></c>
*a58d3d2aSXin Li<c>14</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;-8&nbsp;&nbsp;95&nbsp;&nbsp;28&nbsp;&nbsp;&nbsp;4</spanx></c>
*a58d3d2aSXin Li<c>15</c>
*a58d3d2aSXin Li<c><spanx style="vbare">-10&nbsp;&nbsp;15&nbsp;&nbsp;77&nbsp;&nbsp;60&nbsp;-15</spanx></c>
*a58d3d2aSXin Li<c>16</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-1&nbsp;&nbsp;&nbsp;4&nbsp;124&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;-4</spanx></c>
*a58d3d2aSXin Li<c>17</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;3&nbsp;&nbsp;38&nbsp;&nbsp;84&nbsp;&nbsp;24&nbsp;-25</spanx></c>
*a58d3d2aSXin Li<c>18</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;2&nbsp;&nbsp;13&nbsp;&nbsp;42&nbsp;&nbsp;13&nbsp;&nbsp;31</spanx></c>
*a58d3d2aSXin Li<c>19</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;21&nbsp;&nbsp;-4&nbsp;&nbsp;56&nbsp;&nbsp;46&nbsp;&nbsp;-1</spanx></c>
*a58d3d2aSXin Li<c>20</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-1&nbsp;&nbsp;35&nbsp;&nbsp;79&nbsp;-13&nbsp;&nbsp;19</spanx></c>
*a58d3d2aSXin Li<c>21</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-7&nbsp;&nbsp;65&nbsp;&nbsp;88&nbsp;&nbsp;-9&nbsp;-14</spanx></c>
*a58d3d2aSXin Li<c>22</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;20&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;81&nbsp;&nbsp;49&nbsp;-29</spanx></c>
*a58d3d2aSXin Li<c>23</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;20&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;75&nbsp;&nbsp;&nbsp;3&nbsp;-17</spanx></c>
*a58d3d2aSXin Li<c>24</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;5&nbsp;&nbsp;-9&nbsp;&nbsp;44&nbsp;&nbsp;92&nbsp;&nbsp;-8</spanx></c>
*a58d3d2aSXin Li<c>25</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;1&nbsp;&nbsp;-3&nbsp;&nbsp;22&nbsp;&nbsp;69&nbsp;&nbsp;31</spanx></c>
*a58d3d2aSXin Li<c>26</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;-6&nbsp;&nbsp;95&nbsp;&nbsp;41&nbsp;-12&nbsp;&nbsp;&nbsp;5</spanx></c>
*a58d3d2aSXin Li<c>27</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;39&nbsp;&nbsp;67&nbsp;&nbsp;16&nbsp;&nbsp;-4&nbsp;&nbsp;&nbsp;1</spanx></c>
*a58d3d2aSXin Li<c>28</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;0&nbsp;&nbsp;-6&nbsp;120&nbsp;&nbsp;55&nbsp;-36</spanx></c>
*a58d3d2aSXin Li<c>29</c>
*a58d3d2aSXin Li<c><spanx style="vbare">-13&nbsp;&nbsp;44&nbsp;122&nbsp;&nbsp;&nbsp;4&nbsp;-24</spanx></c>
*a58d3d2aSXin Li<c>30</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;81&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;11&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;7</spanx></c>
*a58d3d2aSXin Li<c>31</c>
*a58d3d2aSXin Li<c><spanx style="vbare">&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;0&nbsp;&nbsp;&nbsp;9&nbsp;&nbsp;10&nbsp;&nbsp;88</spanx></c>
*a58d3d2aSXin Li</texttable>
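<t>
To make the Q7 representation concrete, the sketch below interprets tap
 values as real numbers by dividing by 2**7.
It is illustrative only: the names are hypothetical, and only the first two
 vectors of <xref target="silk_ltp_filter_coeffs0"/> are reproduced.
A decoder would normally keep the taps in fixed point rather than convert
 them to floating point.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
/* First two vectors of the periodicity index 0 codebook,
   stored as signed Q7 integers. */
static const signed char sketch_ltp_cb0_q7[2][5] = {
    { 4, 6, 24, 7, 5 },
    { 0, 0,  2, 0, 0 }
};

/* Interpret a Q7 integer as a real value: q7 / 2^7. */
static double sketch_q7_to_double(int q7)
{
    return q7 / 128.0;
}

/* Sum of the five taps of one codebook vector, as a real value. */
static double sketch_tap_sum(const signed char taps_q7[5])
{
    int sum = 0;
    int i;
    for (i = 0; i < 5; i++) {
        sum += taps_q7[i];
    }
    return sketch_q7_to_double(sum);
}
```
]]></artwork>
</figure>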
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_ltp_scaling" title="LTP Scaling Parameter">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAn LTP scaling parameter appears after the LTP filter coefficients if and only
*a58d3d2aSXin Li if
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>This is a voiced frame (see <xref target="silk_frame_type"/>), and</t>
*a58d3d2aSXin Li<t>Either
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThis SILK frame corresponds to the first time interval of the
*a58d3d2aSXin Li current Opus frame for its type (LBRR or regular), or
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThis is an LBRR frame where the LBRR flags (see
*a58d3d2aSXin Li <xref target="silk_lbrr_flags"/>) indicate the previous LBRR frame in the same
*a58d3d2aSXin Li channel is not coded.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiThis allows the encoder to trade off the prediction gain between
*a58d3d2aSXin Li packets against the recovery time after packet loss.
Unlike absolute coding for pitch lags, regular SILK frames that are not at the
*a58d3d2aSXin Li start of an Opus frame (i.e., that do not correspond to the first 20&nbsp;ms
*a58d3d2aSXin Li time interval in Opus frames of 40&nbsp;or 60&nbsp;ms) do not include this
*a58d3d2aSXin Li field, even if the prior frame was not voiced, or (in the case of the side
*a58d3d2aSXin Li channel) not even coded.
*a58d3d2aSXin LiAfter an uncoded frame in the side channel, the LTP buffer (see
*a58d3d2aSXin Li <xref target="silk_ltp_synthesis"/>) is cleared to zero, and is thus in a
*a58d3d2aSXin Li known state.
*a58d3d2aSXin LiIn contrast, LBRR frames do include this field when the prior frame was not
*a58d3d2aSXin Li coded, since the LTP buffer contains the output of the PLC, which is
*a58d3d2aSXin Li non-normative.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf present, the decoder reads a value using the 3-entry PDF in
*a58d3d2aSXin Li <xref target="silk_ltp_scaling_pdf"/>.
*a58d3d2aSXin LiThe three possible values represent Q14 scale factors of 15565, 12288, and
*a58d3d2aSXin Li 8192, respectively (corresponding to approximately 0.95, 0.75, and 0.5).
*a58d3d2aSXin LiFrames that do not code the scaling parameter use the default factor of 15565
*a58d3d2aSXin Li (approximately 0.95).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_ltp_scaling_pdf"
*a58d3d2aSXin Li title="PDF for LTP Scaling Parameter">
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>{128, 64, 64}/256</c>
*a58d3d2aSXin Li</texttable>
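<t>
The presence condition and the default value can be summarized by the
 following sketch.
The function names and the four flag parameters are hypothetical; a real
 decoder would derive them from the frame type, the position of the SILK
 frame within the Opus frame, and the LBRR flags.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
/* Q14 scale factors selected by the decoded symbol (0, 1, or 2). */
static const int sketch_ltp_scale_q14[3] = { 15565, 12288, 8192 };

/* Returns nonzero if the LTP scaling parameter is coded.  All
   arguments are 0/1 flags supplied by the caller. */
static int sketch_ltp_scaling_present(int is_voiced,
                                      int is_first_in_opus_frame,
                                      int is_lbrr,
                                      int prev_lbrr_coded)
{
    if (!is_voiced) {
        return 0;
    }
    return is_first_in_opus_frame || (is_lbrr && !prev_lbrr_coded);
}

/* Returns the Q14 factor to use.  The symbol is ignored when the
   field is absent, and the default of 15565 applies. */
static int sketch_ltp_scale(int present, int symbol)
{
    return present ? sketch_ltp_scale_q14[symbol] : 15565;
}
```
]]></artwork>
</figure>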
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_seed" toc="include"
*a58d3d2aSXin Li title="Linear Congruential Generator (LCG) Seed">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAs described in <xref target="silk_excitation_reconstruction"/>, SILK uses a
*a58d3d2aSXin Li linear congruential generator (LCG) to inject pseudorandom noise into the
*a58d3d2aSXin Li quantized excitation.
*a58d3d2aSXin LiTo ensure synchronization of this process between the encoder and decoder, each
*a58d3d2aSXin Li SILK frame stores a 2-bit seed after the LTP parameters (if any).
*a58d3d2aSXin LiThe encoder may consider the choice of seed during quantization, and the
*a58d3d2aSXin Li flexibility of this choice lets it reduce distortion, helping to pay for the
*a58d3d2aSXin Li bit cost required to signal it.
*a58d3d2aSXin LiThe decoder reads the seed using the uniform 4-entry PDF in
*a58d3d2aSXin Li <xref target="silk_seed_pdf"/>, yielding a value between 0 and 3, inclusive.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_seed_pdf"
*a58d3d2aSXin Li title="PDF for LCG Seed">
*a58d3d2aSXin Li<ttcol align="left">PDF</ttcol>
*a58d3d2aSXin Li<c>{64, 64, 64, 64}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_excitation" toc="include" title="Excitation">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSILK codes the excitation using a modified version of the Pyramid Vector
*a58d3d2aSXin Li Quantization (PVQ) codebook <xref target="PVQ"/>.
*a58d3d2aSXin LiThe PVQ codebook is designed for Laplace-distributed values and consists of all
*a58d3d2aSXin Li sums of K signed, unit pulses in a vector of dimension N, where two pulses at
*a58d3d2aSXin Li the same position are required to have the same sign.
*a58d3d2aSXin LiThus the codebook includes all integer codevectors y of dimension N that
*a58d3d2aSXin Li satisfy
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin LiN-1
*a58d3d2aSXin Li__
*a58d3d2aSXin Li\  abs(y[j]) = K .
*a58d3d2aSXin Li/_
*a58d3d2aSXin Lij=0
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiUnlike regular PVQ, SILK uses a variable-length, rather than fixed-length,
*a58d3d2aSXin Li encoding.
*a58d3d2aSXin LiThis encoding is better suited to the more Gaussian-like distribution of the
*a58d3d2aSXin Li coefficient magnitudes and the non-uniform distribution of their signs (caused
*a58d3d2aSXin Li by the quantization offset described below).
*a58d3d2aSXin LiSILK also handles large codebooks by coding the least significant bits (LSBs)
*a58d3d2aSXin Li of each coefficient directly.
*a58d3d2aSXin LiThis adds a small coding efficiency loss, but greatly reduces the computation
*a58d3d2aSXin Li time and ROM size required for decoding, as implemented in
*a58d3d2aSXin Li silk_decode_pulses() (decode_pulses.c).
*a58d3d2aSXin Li</t>
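<t>
The codebook constraint above can be checked with a few lines of C.
This is only a sketch of the membership test, not part of
 silk_decode_pulses(); the function name is hypothetical.
</t>
<figure align="center">
<artwork align="center"><![CDATA[
```c
#include <stdlib.h>

/* Returns 1 if y[0..n-1] is a PVQ codevector with exactly k pulses,
   i.e., the absolute values of its entries sum to k. */
static int sketch_is_pvq_codevector(const int *y, int n, int k)
{
    int sum = 0;
    int j;
    for (j = 0; j < n; j++) {
        sum += abs(y[j]);
    }
    return sum == k;
}
```
]]></artwork>
</figure>
<t>
For example, a vector of dimension 16 with y[0]&nbsp;=&nbsp;2,
 y[5]&nbsp;=&nbsp;-1, and zeros elsewhere belongs to the codebook with
 K&nbsp;=&nbsp;3.
</t>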
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSILK fixes the dimension of the codebook to N&nbsp;=&nbsp;16.
*a58d3d2aSXin LiThe excitation is made up of a number of "shell blocks", each 16 samples in
*a58d3d2aSXin Li size.
*a58d3d2aSXin Li<xref target="silk_shell_block_table"/> lists the number of shell blocks
*a58d3d2aSXin Li required for a SILK frame for each possible audio bandwidth and frame size.
*a58d3d2aSXin Li10&nbsp;ms MB frames nominally contain 120&nbsp;samples (10&nbsp;ms at
*a58d3d2aSXin Li 12&nbsp;kHz), which is not a multiple of 16.
*a58d3d2aSXin LiThis is handled by coding 8 shell blocks (128 samples) and discarding the final
*a58d3d2aSXin Li 8 samples of the last block.
*a58d3d2aSXin LiThe decoder contains no special case that prevents an encoder from placing
*a58d3d2aSXin Li pulses in these samples, and they must be correctly parsed from the bitstream
*a58d3d2aSXin Li if present, but they are otherwise ignored.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_shell_block_table"
*a58d3d2aSXin Li title="Number of Shell Blocks Per SILK Frame">
*a58d3d2aSXin Li<ttcol>Audio Bandwidth</ttcol>
*a58d3d2aSXin Li<ttcol>Frame Size</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Number of Shell Blocks</ttcol>
*a58d3d2aSXin Li<c>NB</c> <c>10&nbsp;ms</c>  <c>5</c>
*a58d3d2aSXin Li<c>MB</c> <c>10&nbsp;ms</c>  <c>8</c>
*a58d3d2aSXin Li<c>WB</c> <c>10&nbsp;ms</c> <c>10</c>
*a58d3d2aSXin Li<c>NB</c> <c>20&nbsp;ms</c> <c>10</c>
*a58d3d2aSXin Li<c>MB</c> <c>20&nbsp;ms</c> <c>15</c>
*a58d3d2aSXin Li<c>WB</c> <c>20&nbsp;ms</c> <c>20</c>
*a58d3d2aSXin Li</texttable>
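
<t>
The entries in the table above are consistent with rounding the frame length up
 to a whole number of 16-sample shell blocks.
The following is a minimal sketch of that arithmetic, assuming internal
 sampling rates of 8, 12, and 16&nbsp;kHz for NB, MB, and WB.
The helper name shell_blocks_per_frame() is hypothetical and does not appear in
 the reference implementation.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Samples in a shell block (the fixed codebook dimension N = 16). */
#define SHELL_BLOCK_SIZE 16

/* Hypothetical helper: number of shell blocks for a SILK frame with the
   given internal sampling rate (kHz) and duration (ms). */
int shell_blocks_per_frame(int fs_kHz, int frame_ms)
{
    int samples;
    assert(fs_kHz == 8 || fs_kHz == 12 || fs_kHz == 16);
    assert(frame_ms == 10 || frame_ms == 20);
    samples = fs_kHz * frame_ms;
    /* Round up, so a 120-sample MB frame still gets 8 whole blocks. */
    return (samples + SHELL_BLOCK_SIZE - 1) / SHELL_BLOCK_SIZE;
}
```
]]></artwork>
</figure>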
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_rate_level" title="Rate Level">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe first symbol in the excitation is a "rate level", which is an index from 0
*a58d3d2aSXin Li to 8, inclusive, coded using the PDF in <xref target="silk_rate_level_pdfs"/>
*a58d3d2aSXin Li corresponding to the signal type of the current frame (from
*a58d3d2aSXin Li <xref target="silk_frame_type"/>).
*a58d3d2aSXin LiThe rate level selects the PDF used to decode the number of pulses in
*a58d3d2aSXin Li the individual shell blocks.
*a58d3d2aSXin LiIt does not directly convey any information about the bitrate or the number of
*a58d3d2aSXin Li pulses itself, but merely changes the probability of the symbols in
*a58d3d2aSXin Li <xref target="silk_pulse_counts"/>.
*a58d3d2aSXin LiLevel&nbsp;0 provides a more efficient encoding at low rates generally, and
*a58d3d2aSXin Li level&nbsp;8 provides a more efficient encoding at high rates generally,
*a58d3d2aSXin Li though the most efficient level for a particular SILK frame may depend on the
*a58d3d2aSXin Li exact distribution of the coded symbols.
*a58d3d2aSXin LiAn encoder should, but is not required to, use the most efficient rate level.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_rate_level_pdfs"
*a58d3d2aSXin Li title="PDFs for the Rate Level">
*a58d3d2aSXin Li<ttcol>Signal Type</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>Inactive or Unvoiced</c>
*a58d3d2aSXin Li<c>{15, 51, 12, 46, 45, 13, 33, 27, 14}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>
*a58d3d2aSXin Li<c>{33, 30, 36, 17, 34, 49, 18, 21, 18}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_pulse_counts" title="Pulses Per Shell Block">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe total number of pulses in each of the shell blocks follows the rate level.
*a58d3d2aSXin LiThe pulse counts for all of the shell blocks are coded consecutively, before
*a58d3d2aSXin Li the content of any of the blocks.
*a58d3d2aSXin LiEach block may have anywhere from 0 to 16 pulses, inclusive, coded using the
*a58d3d2aSXin Li 18-entry PDF in <xref target="silk_pulse_count_pdfs"/> corresponding to the
*a58d3d2aSXin Li rate level from <xref target="silk_rate_level"/>.
*a58d3d2aSXin LiThe special value 17 indicates that this block has one or more additional
*a58d3d2aSXin Li LSBs to decode for each coefficient.
*a58d3d2aSXin LiIf the decoder encounters this value, it decodes another value for the actual
*a58d3d2aSXin Li pulse count of the block, but uses the PDF corresponding to the special rate
*a58d3d2aSXin Li level&nbsp;9 instead of the normal rate level.
*a58d3d2aSXin LiThis process repeats until the decoder reads a value less than 17, and it then
*a58d3d2aSXin Li sets the number of extra LSBs used to the number of 17's decoded for that
*a58d3d2aSXin Li block.
*a58d3d2aSXin LiIf it reads the value 17 ten times, then the next iteration uses the special
*a58d3d2aSXin Li rate level&nbsp;10 instead of 9.
*a58d3d2aSXin LiThe probability of decoding a 17 when using the PDF for rate level&nbsp;10 is
*a58d3d2aSXin Li zero, ensuring that the number of LSBs for a block will not exceed 10.
*a58d3d2aSXin LiThe cumulative distribution for rate level&nbsp;10 is just a shifted version of
*a58d3d2aSXin Li that for 9 and thus does not require any additional storage.
*a58d3d2aSXin Li</t>
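
<t>
The loop described above can be sketched as follows.
This is an illustration only: the symbol_fn callback stands in for the range
 decoder, and the names block_count, decode_block_count(), replay_source, and
 replay_next() are hypothetical and are not taken from the reference
 implementation.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Hypothetical symbol source: returns the next value (0..17) decoded
   with the pulse-count PDF for the given rate level. A real decoder
   would call its range decoder here. */
typedef int (*symbol_fn)(void *ctx, int rate_level);

typedef struct {
    int pulses;    /* pulse count for the block, 0..16 */
    int lsb_count; /* extra LSBs per coefficient, 0..10 */
} block_count;

block_count decode_block_count(symbol_fn next, void *ctx, int rate_level)
{
    block_count out = {0, 0};
    int v;
    assert(rate_level >= 0 && rate_level <= 8);
    v = next(ctx, rate_level);
    while (v == 17) {
        out.lsb_count++;
        /* After ten 17's, use level 10, whose PDF cannot yield 17. */
        v = next(ctx, out.lsb_count == 10 ? 10 : 9);
        assert(out.lsb_count < 10 || v != 17);
    }
    out.pulses = v;
    return out;
}

/* Stand-in source for experimentation: replays a fixed list of symbols
   and records the rate level requested for the most recent one. */
typedef struct {
    const int *symbols;
    int pos;
    int last_level;
} replay_source;

int replay_next(void *ctx, int rate_level)
{
    replay_source *src = (replay_source *)ctx;
    src->last_level = rate_level;
    return src->symbols[src->pos++];
}
```
]]></artwork>
</figure>
<t>
Feeding the sketch the symbols 17, 17, 5 gives a pulse count of 5 with two
 extra LSBs per coefficient.
</t>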
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_pulse_count_pdfs"
*a58d3d2aSXin Li title="PDFs for the Pulse Count">
*a58d3d2aSXin Li<ttcol>Rate Level</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>0</c>
*a58d3d2aSXin Li<c>{131, 74, 25, 8, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
*a58d3d2aSXin Li<c>1</c>
*a58d3d2aSXin Li<c>{58, 93, 60, 23, 7, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
*a58d3d2aSXin Li<c>2</c>
*a58d3d2aSXin Li<c>{43, 51, 46, 33, 24, 16, 11, 8, 6, 3, 3, 3, 2, 1, 1, 2, 1, 2}/256</c>
*a58d3d2aSXin Li<c>3</c>
*a58d3d2aSXin Li<c>{17, 52, 71, 57, 31, 12, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256</c>
*a58d3d2aSXin Li<c>4</c>
*a58d3d2aSXin Li<c>{6, 21, 41, 53, 49, 35, 21, 11, 6, 3, 2, 2, 1, 1, 1, 1, 1, 1}/256</c>
*a58d3d2aSXin Li<c>5</c>
*a58d3d2aSXin Li<c>{7, 14, 22, 28, 29, 28, 25, 20, 17, 13, 11, 9, 7, 5, 4, 4, 3, 10}/256</c>
*a58d3d2aSXin Li<c>6</c>
*a58d3d2aSXin Li<c>{2, 5, 14, 29, 42, 46, 41, 31, 19, 11, 6, 3, 2, 1, 1, 1, 1, 1}/256</c>
*a58d3d2aSXin Li<c>7</c>
*a58d3d2aSXin Li<c>{1, 2, 4, 10, 19, 29, 35, 37, 34, 28, 20, 14, 8, 5, 4, 2, 2, 2}/256</c>
*a58d3d2aSXin Li<c>8</c>
*a58d3d2aSXin Li<c>{1, 2, 2, 5, 9, 14, 20, 24, 27, 28, 26, 23, 20, 15, 11, 8, 6, 15}/256</c>
*a58d3d2aSXin Li<c>9</c>
*a58d3d2aSXin Li<c>{1, 1, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2}/256</c>
*a58d3d2aSXin Li<c>10</c>
*a58d3d2aSXin Li<c>{2, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2, 0}/256</c>
*a58d3d2aSXin Li</texttable>
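
<t>
The relationship between rate levels 9 and 10 can be checked numerically.
The sketch below assumes one possible storage convention, in which each table
 entry is 256 minus the cumulative sum of the PDF; the helper names are
 hypothetical.
Under that assumption, the table for level&nbsp;10 equals the table for
 level&nbsp;9 with its first entry dropped.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Hypothetical helper: converts an n-entry PDF (entries sum to 256)
   into icdf[k] = 256 - (pdf[0] + ... + pdf[k]). */
void pdf_to_icdf(const int *pdf, int n, int *icdf)
{
    int k, sum = 0;
    for (k = 0; k < n; k++) {
        sum += pdf[k];
        icdf[k] = 256 - sum;
    }
    assert(sum == 256);
}

/* PDFs for the special rate levels 9 and 10, copied from the table. */
const int pulse_count_pdf9[18] =
    {1, 1, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2};
const int pulse_count_pdf10[18] =
    {2, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2, 0};

/* Returns 1 if the level 10 table is the level 9 table shifted by one
   entry, and 0 otherwise. */
int level10_is_shifted_level9(void)
{
    int icdf9[18], icdf10[18], k;
    pdf_to_icdf(pulse_count_pdf9, 18, icdf9);
    pdf_to_icdf(pulse_count_pdf10, 18, icdf10);
    for (k = 0; k < 17; k++) {
        if (icdf10[k] != icdf9[k + 1]) {
            return 0;
        }
    }
    return 1;
}
```
]]></artwork>
</figure>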
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_pulse_locations" title="Pulse Location Decoding">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe locations of the pulses in each shell block follow the pulse counts,
*a58d3d2aSXin Li as decoded by silk_shell_decoder() (shell_coder.c).
*a58d3d2aSXin LiAs with the pulse counts, these locations are coded for all the shell blocks
*a58d3d2aSXin Li before any of the remaining information for each block.
*a58d3d2aSXin LiUnlike many other codecs, SILK places no restriction on the distribution of
*a58d3d2aSXin Li pulses within a shell block.
*a58d3d2aSXin LiAll of the pulses may be placed in a single location, or each one in a unique
*a58d3d2aSXin Li location, or anything in between.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe location of pulses is coded by recursively partitioning each block into
*a58d3d2aSXin Li halves, and coding how many pulses fall on the left side of the split.
*a58d3d2aSXin LiAll remaining pulses must fall on the right side of the split.
*a58d3d2aSXin LiThe process then recurses into the left half, and after that returns, the
*a58d3d2aSXin Li right half (preorder traversal).
*a58d3d2aSXin LiThe PDF to use is chosen by the size of the current partition (16, 8, 4, or 2)
*a58d3d2aSXin Li and the number of pulses in the partition (1 to 16, inclusive).
*a58d3d2aSXin LiTables&nbsp;<xref format="counter" target="silk_shell_code3_pdfs"/>
*a58d3d2aSXin Li through&nbsp;<xref format="counter" target="silk_shell_code0_pdfs"/> list the
*a58d3d2aSXin Li PDFs used for each partition size and pulse count.
*a58d3d2aSXin LiThis process skips partitions without any pulses, i.e., where the initial pulse
*a58d3d2aSXin Li count from <xref target="silk_pulse_counts"/> was zero, or where the split in
*a58d3d2aSXin Li the prior level indicated that all of the pulses fell on the other side.
*a58d3d2aSXin LiThese partitions have nothing to code, so they require no PDF.
*a58d3d2aSXin Li</t>
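
<t>
The recursion described above can be sketched as follows.
The split_fn callback is a hypothetical stand-in for decoding one split count
 with the PDF selected by the partition size and pulse count; it and the
 function names are illustrative and are not the structure used by
 silk_shell_decoder().
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Hypothetical split source: given a partition of `size` samples that
   holds `pulses` pulses (1..16), returns how many fall in the left
   half. A real decoder would read this from the range decoder. */
typedef int (*split_fn)(void *ctx, int size, int pulses);

/* Fills out[0..size-1] with per-sample pulse counts, visiting the left
   half before the right half (preorder traversal). */
void decode_partition(split_fn split, void *ctx, int *out,
                      int size, int pulses)
{
    int left = 0;
    if (size == 1) {
        out[0] = pulses;
        return;
    }
    if (pulses > 0) {
        /* Empty partitions code nothing, so the source is skipped. */
        left = split(ctx, size, pulses);
        assert(left >= 0 && left <= pulses);
    }
    decode_partition(split, ctx, out, size / 2, left);
    decode_partition(split, ctx, out + size / 2, size / 2,
                     pulses - left);
}

/* Stand-in source: puts every pulse in the left half and counts how
   many times it was consulted. */
int all_left(void *ctx, int size, int pulses)
{
    (void)size;
    ++*(int *)ctx;
    return pulses;
}
```
]]></artwork>
</figure>
<t>
With the all_left() stand-in and five pulses, only the four partitions that
 actually contain pulses (sizes 16, 8, 4, and 2) consult the source, and all
 five pulses land on the first sample.
</t>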
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_shell_code3_pdfs"
*a58d3d2aSXin Li title="PDFs for Pulse Count Split, 16 Sample Partitions">
*a58d3d2aSXin Li<ttcol>Pulse Count</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li <c>1</c> <c>{126, 130}/256</c>
*a58d3d2aSXin Li <c>2</c> <c>{56, 142, 58}/256</c>
*a58d3d2aSXin Li <c>3</c> <c>{25, 101, 104, 26}/256</c>
*a58d3d2aSXin Li <c>4</c> <c>{12, 60, 108, 64, 12}/256</c>
*a58d3d2aSXin Li <c>5</c> <c>{7, 35, 84, 87, 37, 6}/256</c>
*a58d3d2aSXin Li <c>6</c> <c>{4, 20, 59, 86, 63, 21, 3}/256</c>
*a58d3d2aSXin Li <c>7</c> <c>{3, 12, 38, 72, 75, 42, 12, 2}/256</c>
*a58d3d2aSXin Li <c>8</c> <c>{2, 8, 25, 54, 73, 59, 27, 7, 1}/256</c>
*a58d3d2aSXin Li <c>9</c> <c>{2, 5, 17, 39, 63, 65, 42, 18, 4, 1}/256</c>
*a58d3d2aSXin Li<c>10</c> <c>{1, 4, 12, 28, 49, 63, 54, 30, 11, 3, 1}/256</c>
*a58d3d2aSXin Li<c>11</c> <c>{1, 4, 8, 20, 37, 55, 57, 41, 22, 8, 2, 1}/256</c>
*a58d3d2aSXin Li<c>12</c> <c>{1, 3, 7, 15, 28, 44, 53, 48, 33, 16, 6, 1, 1}/256</c>
*a58d3d2aSXin Li<c>13</c> <c>{1, 2, 6, 12, 21, 35, 47, 48, 40, 25, 12, 5, 1, 1}/256</c>
*a58d3d2aSXin Li<c>14</c> <c>{1, 1, 4, 10, 17, 27, 37, 47, 43, 33, 21, 9, 4, 1, 1}/256</c>
*a58d3d2aSXin Li<c>15</c> <c>{1, 1, 1, 8, 14, 22, 33, 40, 43, 38, 28, 16, 8, 1, 1, 1}/256</c>
*a58d3d2aSXin Li<c>16</c> <c>{1, 1, 1, 1, 13, 18, 27, 36, 41, 41, 34, 24, 14, 1, 1, 1, 1}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_shell_code2_pdfs"
*a58d3d2aSXin Li title="PDFs for Pulse Count Split, 8 Sample Partitions">
*a58d3d2aSXin Li<ttcol>Pulse Count</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li <c>1</c> <c>{127, 129}/256</c>
*a58d3d2aSXin Li <c>2</c> <c>{53, 149, 54}/256</c>
*a58d3d2aSXin Li <c>3</c> <c>{22, 105, 106, 23}/256</c>
*a58d3d2aSXin Li <c>4</c> <c>{11, 61, 111, 63, 10}/256</c>
*a58d3d2aSXin Li <c>5</c> <c>{6, 35, 86, 88, 36, 5}/256</c>
*a58d3d2aSXin Li <c>6</c> <c>{4, 20, 59, 87, 62, 21, 3}/256</c>
*a58d3d2aSXin Li <c>7</c> <c>{3, 13, 40, 71, 73, 41, 13, 2}/256</c>
*a58d3d2aSXin Li <c>8</c> <c>{3, 9, 27, 53, 70, 56, 28, 9, 1}/256</c>
*a58d3d2aSXin Li <c>9</c> <c>{3, 8, 19, 37, 57, 61, 44, 20, 6, 1}/256</c>
*a58d3d2aSXin Li<c>10</c> <c>{3, 7, 15, 28, 44, 54, 49, 33, 17, 5, 1}/256</c>
*a58d3d2aSXin Li<c>11</c> <c>{1, 7, 13, 22, 34, 46, 48, 38, 28, 14, 4, 1}/256</c>
*a58d3d2aSXin Li<c>12</c> <c>{1, 1, 11, 22, 27, 35, 42, 47, 33, 25, 10, 1, 1}/256</c>
*a58d3d2aSXin Li<c>13</c> <c>{1, 1, 6, 14, 26, 37, 43, 43, 37, 26, 14, 6, 1, 1}/256</c>
*a58d3d2aSXin Li<c>14</c> <c>{1, 1, 4, 10, 20, 31, 40, 42, 40, 31, 20, 10, 4, 1, 1}/256</c>
*a58d3d2aSXin Li<c>15</c> <c>{1, 1, 3, 8, 16, 26, 35, 38, 38, 35, 26, 16, 8, 3, 1, 1}/256</c>
*a58d3d2aSXin Li<c>16</c> <c>{1, 1, 2, 6, 12, 21, 30, 36, 38, 36, 30, 21, 12, 6, 2, 1, 1}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_shell_code1_pdfs"
*a58d3d2aSXin Li title="PDFs for Pulse Count Split, 4 Sample Partitions">
*a58d3d2aSXin Li<ttcol>Pulse Count</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li <c>1</c> <c>{127, 129}/256</c>
*a58d3d2aSXin Li <c>2</c> <c>{49, 157, 50}/256</c>
*a58d3d2aSXin Li <c>3</c> <c>{20, 107, 109, 20}/256</c>
*a58d3d2aSXin Li <c>4</c> <c>{11, 60, 113, 62, 10}/256</c>
*a58d3d2aSXin Li <c>5</c> <c>{7, 36, 84, 87, 36, 6}/256</c>
*a58d3d2aSXin Li <c>6</c> <c>{6, 24, 57, 82, 60, 23, 4}/256</c>
*a58d3d2aSXin Li <c>7</c> <c>{5, 18, 39, 64, 68, 42, 16, 4}/256</c>
*a58d3d2aSXin Li <c>8</c> <c>{6, 14, 29, 47, 61, 52, 30, 14, 3}/256</c>
*a58d3d2aSXin Li <c>9</c> <c>{1, 15, 23, 35, 51, 50, 40, 30, 10, 1}/256</c>
*a58d3d2aSXin Li<c>10</c> <c>{1, 1, 21, 32, 42, 52, 46, 41, 18, 1, 1}/256</c>
*a58d3d2aSXin Li<c>11</c> <c>{1, 6, 16, 27, 36, 42, 42, 36, 27, 16, 6, 1}/256</c>
*a58d3d2aSXin Li<c>12</c> <c>{1, 5, 12, 21, 31, 38, 40, 38, 31, 21, 12, 5, 1}/256</c>
*a58d3d2aSXin Li<c>13</c> <c>{1, 3, 9, 17, 26, 34, 38, 38, 34, 26, 17, 9, 3, 1}/256</c>
*a58d3d2aSXin Li<c>14</c> <c>{1, 3, 7, 14, 22, 29, 34, 36, 34, 29, 22, 14, 7, 3, 1}/256</c>
*a58d3d2aSXin Li<c>15</c> <c>{1, 2, 5, 11, 18, 25, 31, 35, 35, 31, 25, 18, 11, 5, 2, 1}/256</c>
*a58d3d2aSXin Li<c>16</c> <c>{1, 1, 4, 9, 15, 21, 28, 32, 34, 32, 28, 21, 15, 9, 4, 1, 1}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_shell_code0_pdfs"
*a58d3d2aSXin Li title="PDFs for Pulse Count Split, 2 Sample Partitions">
*a58d3d2aSXin Li<ttcol>Pulse Count</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li <c>1</c> <c>{128, 128}/256</c>
*a58d3d2aSXin Li <c>2</c> <c>{42, 172, 42}/256</c>
*a58d3d2aSXin Li <c>3</c> <c>{21, 107, 107, 21}/256</c>
*a58d3d2aSXin Li <c>4</c> <c>{12, 60, 112, 61, 11}/256</c>
*a58d3d2aSXin Li <c>5</c> <c>{8, 34, 86, 86, 35, 7}/256</c>
*a58d3d2aSXin Li <c>6</c> <c>{8, 23, 55, 90, 55, 20, 5}/256</c>
*a58d3d2aSXin Li <c>7</c> <c>{5, 15, 38, 72, 72, 36, 15, 3}/256</c>
*a58d3d2aSXin Li <c>8</c> <c>{6, 12, 27, 52, 77, 47, 20, 10, 5}/256</c>
*a58d3d2aSXin Li <c>9</c> <c>{6, 19, 28, 35, 40, 40, 35, 28, 19, 6}/256</c>
*a58d3d2aSXin Li<c>10</c> <c>{4, 14, 22, 31, 37, 40, 37, 31, 22, 14, 4}/256</c>
*a58d3d2aSXin Li<c>11</c> <c>{3, 10, 18, 26, 33, 38, 38, 33, 26, 18, 10, 3}/256</c>
*a58d3d2aSXin Li<c>12</c> <c>{2, 8, 13, 21, 29, 36, 38, 36, 29, 21, 13, 8, 2}/256</c>
*a58d3d2aSXin Li<c>13</c> <c>{1, 5, 10, 17, 25, 32, 38, 38, 32, 25, 17, 10, 5, 1}/256</c>
*a58d3d2aSXin Li<c>14</c> <c>{1, 4, 7, 13, 21, 29, 35, 36, 35, 29, 21, 13, 7, 4, 1}/256</c>
*a58d3d2aSXin Li<c>15</c> <c>{1, 2, 5, 10, 17, 25, 32, 36, 36, 32, 25, 17, 10, 5, 2, 1}/256</c>
*a58d3d2aSXin Li<c>16</c> <c>{1, 2, 4, 7, 13, 21, 28, 34, 36, 34, 28, 21, 13, 7, 4, 2, 1}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_shell_lsb" title="LSB Decoding">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter the decoder reads the pulse locations for all blocks, it reads the LSBs
*a58d3d2aSXin Li (if any) for each block in turn.
*a58d3d2aSXin LiInside each block, it reads all the LSBs for each coefficient in turn, even
*a58d3d2aSXin Li those where no pulses were allocated, before proceeding to the next one.
*a58d3d2aSXin LiFor 10&nbsp;ms MB frames, it reads LSBs even for the extra 8&nbsp;samples in
*a58d3d2aSXin Li the last block.
*a58d3d2aSXin LiThe LSBs are coded from most significant to least significant, and they all use
*a58d3d2aSXin Li the PDF in <xref target="silk_shell_lsb_pdf"/>.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_shell_lsb_pdf" title="PDF for Excitation LSBs">
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>{136, 120}/256</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe number of LSBs read for each coefficient in a block is determined in
*a58d3d2aSXin Li <xref target="silk_pulse_counts"/>.
*a58d3d2aSXin LiThe magnitude of the coefficient is initially equal to the number of pulses
*a58d3d2aSXin Li placed at that location in <xref target="silk_pulse_locations"/>.
*a58d3d2aSXin LiAs each LSB is decoded, the magnitude is doubled, and then the value of the LSB
*a58d3d2aSXin Li added to it, to obtain an updated magnitude.
*a58d3d2aSXin Li</t>
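
<t>
As a small worked example of the magnitude update, the sketch below combines
 the pulse count at one location with its LSBs, supplied in decoding order.
The helper name apply_lsbs() is hypothetical.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Hypothetical helper: combines the pulse count at one location with
   its extra LSBs, given most significant first. */
int apply_lsbs(int pulses, const int *lsbs, int lsb_count)
{
    int k, magnitude = pulses;
    assert(pulses >= 0 && pulses <= 16);
    assert(lsb_count >= 0 && lsb_count <= 10);
    for (k = 0; k < lsb_count; k++) {
        assert(lsbs[k] == 0 || lsbs[k] == 1);
        /* Double the magnitude, then add the newly decoded bit. */
        magnitude = 2 * magnitude + lsbs[k];
    }
    return magnitude;
}
```
]]></artwork>
</figure>
<t>
For instance, 3 pulses followed by the LSBs 1 and 0 give a magnitude of
 (3*2&nbsp;+&nbsp;1)*2&nbsp;+&nbsp;0&nbsp;=&nbsp;14, and a location with no
 pulses can still reach a non-zero magnitude through its LSBs alone.
</t>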
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_signs" title="Sign Decoding">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter decoding the pulse locations and the LSBs, the decoder knows the
*a58d3d2aSXin Li magnitude of each coefficient in the excitation.
*a58d3d2aSXin LiIt then decodes a sign for all coefficients with a non-zero magnitude, using
*a58d3d2aSXin Li one of the PDFs from <xref target="silk_sign_pdfs"/>.
*a58d3d2aSXin LiIf the value decoded is 0, then the coefficient magnitude is negated.
*a58d3d2aSXin LiOtherwise, it remains positive.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoder chooses the PDF for the sign based on the signal type and
*a58d3d2aSXin Li quantization offset type (from <xref target="silk_frame_type"/>) and the
*a58d3d2aSXin Li number of pulses in the block (from <xref target="silk_pulse_counts"/>).
*a58d3d2aSXin LiThe number of pulses in the block does not take into account any LSBs.
*a58d3d2aSXin LiMost PDFs are skewed towards negative signs because of the quantization offset,
*a58d3d2aSXin Li but the PDFs for zero pulses are highly skewed towards positive signs.
*a58d3d2aSXin LiIf a block contains many positive coefficients, it is sometimes beneficial to
*a58d3d2aSXin Li code it solely using LSBs (i.e., with zero pulses), since the encoder may be
*a58d3d2aSXin Li able to save enough bits on the signs to justify the less efficient
*a58d3d2aSXin Li coefficient magnitude encoding.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_sign_pdfs"
*a58d3d2aSXin Li title="PDFs for Excitation Signs">
*a58d3d2aSXin Li<ttcol>Signal Type</ttcol>
*a58d3d2aSXin Li<ttcol>Quantization Offset Type</ttcol>
*a58d3d2aSXin Li<ttcol>Pulse Count</ttcol>
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>Inactive</c> <c>Low</c>  <c>0</c>         <c>{2, 254}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>Low</c>  <c>1</c>         <c>{207, 49}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>Low</c>  <c>2</c>         <c>{189, 67}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>Low</c>  <c>3</c>         <c>{179, 77}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>Low</c>  <c>4</c>         <c>{174, 82}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>Low</c>  <c>5</c>         <c>{163, 93}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>Low</c>  <c>6 or more</c> <c>{157, 99}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>High</c> <c>0</c>         <c>{58, 198}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>High</c> <c>1</c>         <c>{245, 11}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>High</c> <c>2</c>         <c>{238, 18}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>High</c> <c>3</c>         <c>{232, 24}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>High</c> <c>4</c>         <c>{225, 31}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>High</c> <c>5</c>         <c>{220, 36}/256</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>High</c> <c>6 or more</c> <c>{211, 45}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>Low</c>  <c>0</c>         <c>{1, 255}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>Low</c>  <c>1</c>         <c>{210, 46}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>Low</c>  <c>2</c>         <c>{190, 66}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>Low</c>  <c>3</c>         <c>{178, 78}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>Low</c>  <c>4</c>         <c>{169, 87}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>Low</c>  <c>5</c>         <c>{162, 94}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>Low</c>  <c>6 or more</c> <c>{152, 104}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>High</c> <c>0</c>         <c>{48, 208}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>High</c> <c>1</c>         <c>{242, 14}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>High</c> <c>2</c>         <c>{235, 21}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>High</c> <c>3</c>         <c>{224, 32}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>High</c> <c>4</c>         <c>{214, 42}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>High</c> <c>5</c>         <c>{205, 51}/256</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>High</c> <c>6 or more</c> <c>{190, 66}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>Low</c>  <c>0</c>         <c>{1, 255}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>Low</c>  <c>1</c>         <c>{162, 94}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>Low</c>  <c>2</c>         <c>{152, 104}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>Low</c>  <c>3</c>         <c>{147, 109}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>Low</c>  <c>4</c>         <c>{144, 112}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>Low</c>  <c>5</c>         <c>{141, 115}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>Low</c>  <c>6 or more</c> <c>{138, 118}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>High</c> <c>0</c>         <c>{8, 248}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>High</c> <c>1</c>         <c>{203, 53}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>High</c> <c>2</c>         <c>{187, 69}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>High</c> <c>3</c>         <c>{176, 80}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>High</c> <c>4</c>         <c>{168, 88}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>High</c> <c>5</c>         <c>{161, 95}/256</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>High</c> <c>6 or more</c> <c>{154, 102}/256</c>
*a58d3d2aSXin Li</texttable>
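
<t>
One way to index the table above is sketched below.
The numeric codes chosen for the signal type and quantization offset type, and
 the names sign_pdf_row() and apply_sign(), are assumptions made for this
 illustration.
The array holds only the first entry of each PDF, i.e., the weight out of 256
 of decoding a 0 (a negative sign), in the row order of the table.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Hypothetical numeric codes, chosen to follow the table's row order. */
enum { SIGNAL_INACTIVE = 0, SIGNAL_UNVOICED = 1, SIGNAL_VOICED = 2 };
enum { OFFSET_LOW = 0, OFFSET_HIGH = 1 };

/* Weight (out of 256) of a negative sign, one row per table entry. */
const int sign_pdf_negative[42] = {
      2, 207, 189, 179, 174, 163, 157,  /* Inactive, Low  */
     58, 245, 238, 232, 225, 220, 211,  /* Inactive, High */
      1, 210, 190, 178, 169, 162, 152,  /* Unvoiced, Low  */
     48, 242, 235, 224, 214, 205, 190,  /* Unvoiced, High */
      1, 162, 152, 147, 144, 141, 138,  /* Voiced,   Low  */
      8, 203, 187, 176, 168, 161, 154   /* Voiced,   High */
};

/* Row of the table for a block; block_pulses excludes any LSBs. */
int sign_pdf_row(int signal_type, int offset_type, int block_pulses)
{
    assert(signal_type >= 0 && signal_type <= 2);
    assert(offset_type == 0 || offset_type == 1);
    assert(block_pulses >= 0 && block_pulses <= 16);
    return 7 * (2 * signal_type + offset_type)
         + (block_pulses < 6 ? block_pulses : 6);
}

/* A decoded symbol of 0 negates the magnitude; 1 leaves it positive. */
int apply_sign(int magnitude, int symbol)
{
    return symbol == 0 ? -magnitude : magnitude;
}
```
]]></artwork>
</figure>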
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_excitation_reconstruction"
*a58d3d2aSXin Li title="Reconstructing the Excitation">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter the signs have been read, there is enough information to reconstruct the
*a58d3d2aSXin Li complete excitation signal.
This requires adjusting the magnitude of each non-zero sample by a small
 constant, adding a quantization offset to every sample, and then
 pseudorandomly inverting the result.
The quantization offset varies depending on the signal type and
 quantization offset type (see <xref target="silk_frame_type"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_quantization_offsets"
*a58d3d2aSXin Li title="Excitation Quantization Offsets">
*a58d3d2aSXin Li<ttcol align="left">Signal Type</ttcol>
*a58d3d2aSXin Li<ttcol align="left">Quantization Offset Type</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Quantization Offset (Q23)</ttcol>
*a58d3d2aSXin Li<c>Inactive</c> <c>Low</c>  <c>25</c>
*a58d3d2aSXin Li<c>Inactive</c> <c>High</c> <c>60</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>Low</c>  <c>25</c>
*a58d3d2aSXin Li<c>Unvoiced</c> <c>High</c> <c>60</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>Low</c>   <c>8</c>
*a58d3d2aSXin Li<c>Voiced</c>   <c>High</c> <c>25</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLet e_raw[i] be the raw excitation value at position i, with a magnitude
*a58d3d2aSXin Li composed of the pulses at that location (see
*a58d3d2aSXin Li <xref target="silk_pulse_locations"/>) combined with any additional LSBs (see
*a58d3d2aSXin Li <xref target="silk_shell_lsb"/>), and with the corresponding sign decoded in
*a58d3d2aSXin Li <xref target="silk_signs"/>.
*a58d3d2aSXin LiAdditionally, let seed be the current pseudorandom seed, which is initialized
*a58d3d2aSXin Li to the value decoded from <xref target="silk_seed"/> for the first sample in
*a58d3d2aSXin Li the current SILK frame, and updated for each subsequent sample according to
*a58d3d2aSXin Li the procedure below.
*a58d3d2aSXin LiFinally, let offset_Q23 be the quantization offset from
*a58d3d2aSXin Li <xref target="silk_quantization_offsets"/>.
*a58d3d2aSXin LiThen the following procedure produces the final reconstructed excitation value,
*a58d3d2aSXin Li e_Q23[i]:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lie_Q23[i] = (e_raw[i] << 8) - sign(e_raw[i])*20 + offset_Q23;
*a58d3d2aSXin Li    seed = (196314165*seed + 907633515) & 0xFFFFFFFF;
*a58d3d2aSXin Lie_Q23[i] = (seed & 0x80000000) ? -e_Q23[i] : e_Q23[i];
*a58d3d2aSXin Li    seed = (seed + e_raw[i]) & 0xFFFFFFFF;
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
When e_raw[i] is zero, sign() returns 0 by the definition in
 <xref target="sign"/>, so the adjustment of 20 is not applied.
*a58d3d2aSXin LiThe final e_Q23[i] value may require more than 16 bits per sample, but will not
*a58d3d2aSXin Li require more than 23, including the sign.
*a58d3d2aSXin Li</t>
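
<t>
The procedure above is close to C already, but as written it left-shifts a
 possibly negative value, which C leaves undefined.
The following sketch restates it for a single sample using fixed-width types.
The function name reconstruct_excitation() and its calling convention are
 hypothetical.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reconstructs one excitation sample and updates
   the pseudorandom seed in place. */
int32_t reconstruct_excitation(int32_t e_raw, int32_t offset_Q23,
                               uint32_t *seed)
{
    int32_t sign = (e_raw > 0) - (e_raw < 0);
    /* Multiply rather than shift, since shifting a negative value left
       is undefined behavior in C. */
    int32_t e_Q23 = e_raw * 256 - sign * 20 + offset_Q23;
    assert(seed != NULL);
    /* Unsigned arithmetic wraps modulo 2**32, like the 0xFFFFFFFF mask. */
    *seed = (uint32_t)(UINT32_C(196314165) * *seed
                       + UINT32_C(907633515));
    if (*seed & UINT32_C(0x80000000)) {
        e_Q23 = -e_Q23;
    }
    *seed = (uint32_t)(*seed + (uint32_t)e_raw);
    return e_Q23;
}
```
]]></artwork>
</figure>
<t>
A decoder would call this once per sample, in order, starting from the seed
 decoded for the frame.
</t>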
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_frame_reconstruction" toc="include"
*a58d3d2aSXin Li title="SILK Frame Reconstruction">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe remainder of the reconstruction process for the frame does not need to be
*a58d3d2aSXin Li bit-exact, as small errors should only introduce proportionally small
*a58d3d2aSXin Li distortions.
*a58d3d2aSXin LiAlthough the reference implementation only includes a fixed-point version of
*a58d3d2aSXin Li the remaining steps, this section describes them in terms of a floating-point
*a58d3d2aSXin Li version for simplicity.
*a58d3d2aSXin LiThis produces a signal with a nominal range of -1.0 to 1.0.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Lisilk_decode_core() (decode_core.c) contains the code for the main
*a58d3d2aSXin Li reconstruction process.
*a58d3d2aSXin LiIt proceeds subframe-by-subframe, since quantization gains, LTP parameters, and
*a58d3d2aSXin Li (in 20&nbsp;ms SILK frames) LPC coefficients can vary from one to the
*a58d3d2aSXin Li next.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLet a_Q12[k] be the LPC coefficients for the current subframe.
*a58d3d2aSXin LiIf this is the first or second subframe of a 20&nbsp;ms SILK frame and the LSF
*a58d3d2aSXin Li interpolation factor, w_Q2 (see <xref target="silk_nlsf_interpolation"/>), is
*a58d3d2aSXin Li less than 4, then these correspond to the final LPC coefficients produced by
*a58d3d2aSXin Li <xref target="silk_lpc_gain_limit"/> from the interpolated LSF coefficients,
*a58d3d2aSXin Li n1_Q15[k] (computed in <xref target="silk_nlsf_interpolation"/>).
*a58d3d2aSXin LiOtherwise, they correspond to the final LPC coefficients produced from the
*a58d3d2aSXin Li uninterpolated LSF coefficients for the current frame, n2_Q15[k].
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAlso, let n be the number of samples in a subframe (40 for NB, 60 for MB, and
*a58d3d2aSXin Li 80 for WB), s be the index of the current subframe in this SILK frame (0 or 1
*a58d3d2aSXin Li for 10&nbsp;ms frames, or 0 to 3 for 20&nbsp;ms frames), and j be the index of
*a58d3d2aSXin Li the first sample in the residual corresponding to the current subframe.
*a58d3d2aSXin Li</t>
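
<t>
The per-subframe choices above can be summarized in a short sketch.
It assumes internal sampling rates of 8, 12, and 16&nbsp;kHz for NB, MB, and
 WB, 5&nbsp;ms subframes, and an interpolation factor w_Q2 in the range 0 to 4;
 the function names are hypothetical.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Samples per 5 ms subframe: 40 (NB), 60 (MB), or 80 (WB). */
int subframe_length(int fs_kHz)
{
    assert(fs_kHz == 8 || fs_kHz == 12 || fs_kHz == 16);
    return 5 * fs_kHz;
}

/* Returns 1 if subframe s uses the LPC coefficients derived from the
   interpolated LSFs (n1_Q15), or 0 for those derived from n2_Q15. */
int uses_interpolated_lpc(int frame_ms, int s, int w_Q2)
{
    assert(frame_ms == 10 || frame_ms == 20);
    assert(s >= 0 && s < frame_ms / 5);
    assert(w_Q2 >= 0 && w_Q2 <= 4);
    return frame_ms == 20 && s < 2 && w_Q2 < 4;
}
```
]]></artwork>
</figure>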
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_ltp_synthesis" title="LTP Synthesis">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiVoiced SILK frames (see <xref target="silk_frame_type"/>) pass the excitation
*a58d3d2aSXin Li through an LTP filter using the parameters decoded in
*a58d3d2aSXin Li <xref target="silk_ltp_params"/> to produce an LPC residual.
*a58d3d2aSXin LiThe LTP filter requires LPC residual values from before the current subframe as
*a58d3d2aSXin Li input.
*a58d3d2aSXin LiHowever, since the LPC coefficients may have changed, it obtains this residual
*a58d3d2aSXin Li by "rewhitening" the corresponding output signal using the LPC coefficients
*a58d3d2aSXin Li from the current subframe.
*a58d3d2aSXin LiLet out[i] for
*a58d3d2aSXin Li (j&nbsp;-&nbsp;pitch_lags[s]&nbsp;-&nbsp;d_LPC&nbsp;-&nbsp;2)&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;j
*a58d3d2aSXin Li be the fully reconstructed output signal from the last
*a58d3d2aSXin Li (pitch_lags[s]&nbsp;+&nbsp;d_LPC&nbsp;+&nbsp;2) samples of previous subframes
*a58d3d2aSXin Li (see <xref target="silk_lpc_synthesis"/>), where pitch_lags[s] is the pitch
*a58d3d2aSXin Li lag for the current subframe from <xref target="silk_ltp_lags"/>.
*a58d3d2aSXin LiDuring reconstruction of the first subframe for this channel after either
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>An uncoded regular SILK frame (if this is the side channel), or</t>
*a58d3d2aSXin Li<t>A decoder reset (see <xref target="decoder-reset"/>),</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li out[] is rewhitened into an LPC residual,
*a58d3d2aSXin Li res[i], via
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li         4.0*LTP_scale_Q14
*a58d3d2aSXin Lires[i] = ----------------- * clamp(-1.0,
*a58d3d2aSXin Li            gain_Q16[s]
*a58d3d2aSXin Li
*a58d3d2aSXin Li                                   d_LPC-1
*a58d3d2aSXin Li                                     __              a_Q12[k]
*a58d3d2aSXin Li                            out[i] - \  out[i-k-1] * --------, 1.0) .
*a58d3d2aSXin Li                                     /_               4096.0
*a58d3d2aSXin Li                                     k=0
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThis requires storage to buffer up to 306 values of out[i] from previous
*a58d3d2aSXin Li subframes.
*a58d3d2aSXin LiThis corresponds to WB with a maximum pitch lag of
*a58d3d2aSXin Li 18&nbsp;ms&nbsp;*&nbsp;16&nbsp;kHz samples, plus 16 samples for d_LPC, plus 2
*a58d3d2aSXin Li samples for the width of the LTP filter.
*a58d3d2aSXin Li</t>
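
<t>
A floating-point sketch of the rewhitening formula follows.
The function name rewhiten() and its argument layout are hypothetical: the
 buffers are indexed as in the text, with first standing for
 j&nbsp;-&nbsp;pitch_lags[s]&nbsp;-&nbsp;2 and last for j, and the gain passed
 in is gain_Q16[s] for the current subframe.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Clamps x into the nominal range [-1.0, 1.0]. */
double clamp_unit(double x)
{
    return x < -1.0 ? -1.0 : (x > 1.0 ? 1.0 : x);
}

/* Hypothetical helper: fills res[i] for first <= i < last from out[].
   Requires first >= d_LPC so that out[i-k-1] stays in the buffer. */
void rewhiten(const double *out, double *res, int first, int last,
              const int16_t *a_Q12, int d_LPC,
              int32_t LTP_scale_Q14, int32_t gain_Q16)
{
    double scale;
    int i, k;
    assert(first >= d_LPC);
    assert(gain_Q16 > 0);
    scale = 4.0 * LTP_scale_Q14 / gain_Q16;
    for (i = first; i < last; i++) {
        double pred = 0.0;
        for (k = 0; k < d_LPC; k++) {
            pred += out[i - k - 1] * (a_Q12[k] / 4096.0);
        }
        res[i] = scale * clamp_unit(out[i] - pred);
    }
}
```
]]></artwork>
</figure>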
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLet e_Q23[i] for j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n) be the
*a58d3d2aSXin Li excitation for the current subframe, and b_Q7[k] for
*a58d3d2aSXin Li 0&nbsp;&lt;=&nbsp;k&nbsp;&lt;&nbsp;5 be the coefficients of the LTP filter
*a58d3d2aSXin Li taken from the codebook entry in one of
*a58d3d2aSXin Li Tables&nbsp;<xref format="counter" target="silk_ltp_filter_coeffs0"/>
*a58d3d2aSXin Li through&nbsp;<xref format="counter" target="silk_ltp_filter_coeffs2"/>
*a58d3d2aSXin Li corresponding to the index decoded for the current subframe in
*a58d3d2aSXin Li <xref target="silk_ltp_filter"/>.
*a58d3d2aSXin LiThen for i such that j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n),
*a58d3d2aSXin Li the LPC residual is
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li                      4
*a58d3d2aSXin Li          e_Q23[i]   __                                  b_Q7[k]
*a58d3d2aSXin Lires[i] = --------- + \  res[i - pitch_lags[s] + 2 - k] * ------- .
*a58d3d2aSXin Li          2.0**23    /_                                   128.0
*a58d3d2aSXin Li                     k=0
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor unvoiced frames, the LPC residual for
*a58d3d2aSXin Li j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n) is simply a normalized
*a58d3d2aSXin Li copy of the excitation signal, i.e.,
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li          e_Q23[i]
*a58d3d2aSXin Lires[i] = ---------
*a58d3d2aSXin Li          2.0**23
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
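
<t>
The voiced and unvoiced cases can be sketched together in floating point.
The function name ltp_synthesis() and its argument layout are hypothetical:
 res[] is indexed as in the text, with history before index j, while e_Q23
 points at the excitation for the current subframe only.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes res[j..j+n-1]. Entries of res[] before j
   hold the rewhitened history; e_Q23[0] corresponds to sample j.
   pitch_lag and b_Q7 are ignored for unvoiced frames. */
void ltp_synthesis(double *res, int j, int n, const int32_t *e_Q23,
                   int voiced, int pitch_lag, const int8_t *b_Q7)
{
    int i, k;
    /* The 5-tap filter reads res[i - pitch_lag - 2 .. i - pitch_lag + 2],
       which must lie in the buffer and strictly before sample i. */
    assert(!voiced || (pitch_lag > 2 && j >= pitch_lag + 2));
    for (i = j; i < j + n; i++) {
        double r = e_Q23[i - j] / 8388608.0; /* divide by 2.0**23 */
        if (voiced) {
            for (k = 0; k < 5; k++) {
                r += res[i - pitch_lag + 2 - k] * (b_Q7[k] / 128.0);
            }
        }
        res[i] = r;
    }
}
```
]]></artwork>
</figure>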
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_lpc_synthesis" title="LPC Synthesis">
*a58d3d2aSXin Li<t>
LPC synthesis uses the short-term LPC filter to predict the next output
 sample.
*a58d3d2aSXin LiFor i such that (j&nbsp;-&nbsp;d_LPC)&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;j, let
*a58d3d2aSXin Li lpc[i] be the result of LPC synthesis from the last d_LPC samples of the
*a58d3d2aSXin Li previous subframe, or zeros in the first subframe for this channel after
*a58d3d2aSXin Li either
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>An uncoded regular SILK frame (if this is the side channel), or</t>
*a58d3d2aSXin Li<t>A decoder reset (see <xref target="decoder-reset"/>).</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiThen for i such that j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n), the
*a58d3d2aSXin Li result of LPC synthesis for the current subframe is
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
                              d_LPC-1
         gain_Q16[s]            __              a_Q12[k]
lpc[i] = ----------- * res[i] + \  lpc[i-k-1] * -------- .
           65536.0              /_               4096.0
                                k=0
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThe decoder saves the final d_LPC values, i.e., lpc[i] such that
*a58d3d2aSXin Li (j&nbsp;+&nbsp;n&nbsp;-&nbsp;d_LPC)&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n),
*a58d3d2aSXin Li to feed into the LPC synthesis of the next subframe.
*a58d3d2aSXin LiThis requires storage for up to 16 values of lpc[i] (for WB frames).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThen, the signal is clamped into the final nominal range:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Liout[i] = clamp(-1.0, lpc[i], 1.0) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThis clamping occurs entirely after the LPC synthesis filter has run.
*a58d3d2aSXin LiThe decoder saves the unclamped values, lpc[i], to feed into the LPC filter for
*a58d3d2aSXin Li the next subframe, but saves the clamped values, out[i], for rewhitening in
*a58d3d2aSXin Li voiced frames.
*a58d3d2aSXin Li</t>
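
<t>
The synthesis filter and the final clamp can be sketched together, which makes
 the distinction between the two saved signals explicit.
The function name lpc_synthesis() and its argument layout are hypothetical; the
 gain passed in is the one for the current subframe.
</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes lpc[j..j+n-1] and out[j..j+n-1].
   lpc[j-d_LPC..j-1] must hold the unclamped state from the previous
   subframe (or zeros after a reset). */
void lpc_synthesis(double *lpc, double *out, const double *res,
                   int j, int n, int32_t gain_Q16,
                   const int16_t *a_Q12, int d_LPC)
{
    int i, k;
    assert(j >= d_LPC);
    for (i = j; i < j + n; i++) {
        double v = (gain_Q16 / 65536.0) * res[i];
        for (k = 0; k < d_LPC; k++) {
            v += lpc[i - k - 1] * (a_Q12[k] / 4096.0);
        }
        /* The filter state keeps the unclamped value. */
        lpc[i] = v;
        /* Only the output copy is clamped to the nominal range. */
        out[i] = v < -1.0 ? -1.0 : (v > 1.0 ? 1.0 : v);
    }
}
```
]]></artwork>
</figure>
<t>
Because the filter recurses on lpc[] rather than out[], a sample that exceeds
 the nominal range still contributes its full value to later samples.
</t>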
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="silk_stereo_unmixing" title="Stereo Unmixing">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor stereo streams, after decoding a frame from each channel, the decoder must
*a58d3d2aSXin Li convert the mid-side (MS) representation into a left-right (LR)
*a58d3d2aSXin Li representation.
*a58d3d2aSXin LiThe function silk_stereo_MS_to_LR (stereo_MS_to_LR.c) implements this process.
*a58d3d2aSXin LiIn it, the decoder predicts the side channel using a) a simple low-passed
*a58d3d2aSXin Li version of the mid channel, and b) the unfiltered mid channel, using the
*a58d3d2aSXin Li prediction weights decoded in <xref target="silk_stereo_pred"/>.
*a58d3d2aSXin LiThis simple low-pass filter imposes a one-sample delay, and the unfiltered
*a58d3d2aSXin Limid channel is also delayed by one sample.
*a58d3d2aSXin LiIn order to allow seamless switching between stereo and mono, mono streams must
*a58d3d2aSXin Li also impose the same one-sample delay.
*a58d3d2aSXin LiThe encoder requires an additional one-sample delay for both mono and stereo
*a58d3d2aSXin Li streams, though an encoder may omit the delay for mono if it knows it will
*a58d3d2aSXin Li never switch to stereo.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe unmixing process operates in two phases.
*a58d3d2aSXin LiThe first phase lasts for 8&nbsp;ms, during which it interpolates the
*a58d3d2aSXin Li prediction weights from the previous frame, prev_w0_Q13 and prev_w1_Q13, to
*a58d3d2aSXin Li the values for the current frame, w0_Q13 and w1_Q13.
*a58d3d2aSXin LiThe second phase simply uses these weights for the remainder of the frame.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiLet mid[i] and side[i] be the contents of out[i] (from
*a58d3d2aSXin Li <xref target="silk_lpc_synthesis"/>) for the current mid and side channels,
*a58d3d2aSXin Li respectively, and let left[i] and right[i] be the corresponding stereo output
*a58d3d2aSXin Li channels.
*a58d3d2aSXin LiIf the side channel is not coded (see <xref target="silk_mid_only_flag"/>),
*a58d3d2aSXin Li then side[i] is set to zero.
*a58d3d2aSXin LiAlso let j be defined as in <xref target="silk_frame_reconstruction"/>, n1 be
*a58d3d2aSXin Li the number of samples in phase&nbsp;1 (64 for NB, 96 for MB, and 128 for WB),
*a58d3d2aSXin Li and n2 be the total number of samples in the frame.
*a58d3d2aSXin LiThen for i such that j&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;(j&nbsp;+&nbsp;n2),
*a58d3d2aSXin Li the left and right channel output is
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li              prev_w0_Q13                  (w0_Q13 - prev_w0_Q13)
*a58d3d2aSXin Li        w0 =  ----------- + min(i - j, n1)*---------------------- ,
*a58d3d2aSXin Li                8192.0                           8192.0*n1
*a58d3d2aSXin Li
*a58d3d2aSXin Li              prev_w1_Q13                  (w1_Q13 - prev_w1_Q13)
*a58d3d2aSXin Li        w1 =  ----------- + min(i - j, n1)*---------------------- ,
*a58d3d2aSXin Li                8192.0                            8192.0*n1
*a58d3d2aSXin Li
*a58d3d2aSXin Li             mid[i-2] + 2*mid[i-1] + mid[i]
*a58d3d2aSXin Li        p0 = ------------------------------ ,
*a58d3d2aSXin Li                          4.0
*a58d3d2aSXin Li
*a58d3d2aSXin Li left[i] = clamp(-1.0, (1 + w1)*mid[i-1] + side[i-1] + w0*p0, 1.0) ,
*a58d3d2aSXin Li
*a58d3d2aSXin Liright[i] = clamp(-1.0, (1 - w1)*mid[i-1] - side[i-1] - w0*p0, 1.0) .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThese formulas require two samples prior to index&nbsp;j, the start of the
*a58d3d2aSXin Li frame, for the mid channel, and one prior sample for the side channel.
*a58d3d2aSXin LiFor the first frame after a decoder reset, zeros are used instead.
*a58d3d2aSXin Li</t>
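<t>
The formulas above can be sketched in floating-point C as shown below.
This is an illustration only, not the fixed-point code of
 silk_stereo_MS_to_LR; the function name stereo_ms_to_lr and its argument
 layout are assumptions made for the example.
Index 0 in the sketch corresponds to index j in the formulas.
</t>
<figure>
<artwork><![CDATA[
```c
static double clamp_unit(double x)
{
    return x < -1.0 ? -1.0 : (x > 1.0 ? 1.0 : x);
}

/* mid must have two valid samples before index 0 and side must have
   one (zeros for the first frame after a decoder reset).
   n1 is the length of the interpolation phase, n2 the frame length. */
static void stereo_ms_to_lr(const double *mid, const double *side,
                            double *left, double *right, int n1, int n2,
                            int prev_w0_Q13, int prev_w1_Q13,
                            int w0_Q13, int w1_Q13)
{
    for (int i = 0; i < n2; i++) {
        int m = i < n1 ? i : n1; /* min(i - j, n1) */
        double w0 = prev_w0_Q13 / 8192.0
                  + m * (w0_Q13 - prev_w0_Q13) / (8192.0 * n1);
        double w1 = prev_w1_Q13 / 8192.0
                  + m * (w1_Q13 - prev_w1_Q13) / (8192.0 * n1);
        /* Low-passed mid channel, centered on mid[i-1]. */
        double p0 = (mid[i - 2] + 2.0 * mid[i - 1] + mid[i]) / 4.0;
        left[i]  = clamp_unit((1.0 + w1) * mid[i - 1] + side[i - 1]
                              + w0 * p0);
        right[i] = clamp_unit((1.0 - w1) * mid[i - 1] - side[i - 1]
                              - w0 * p0);
    }
}
```
]]></artwork>
</figure>
<t>
A caller would pass pointers offset into buffers that retain the last two mid
 samples and the last side sample of the previous frame, so that the negative
 indices are valid.
</t>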
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Resampling">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter stereo unmixing (if any), the decoder applies resampling to convert the
*a58d3d2aSXin Li decoded SILK output to the sample rate desired by the application.
*a58d3d2aSXin LiThis is necessary when decoding a Hybrid frame at SWB or FB sample rates, or
*a58d3d2aSXin Li whenever the decoder wants the output at a different sample rate than the
*a58d3d2aSXin Li internal SILK sampling rate (e.g., to allow a constant sample rate when the
*a58d3d2aSXin Li audio bandwidth changes, or to allow mixing with audio from other
*a58d3d2aSXin Li applications).
*a58d3d2aSXin LiThe resampler itself is non-normative, and a decoder can use any method it
*a58d3d2aSXin Li wants to perform the resampling.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiHowever, a minimum amount of delay is imposed to allow the resampler to
*a58d3d2aSXin Li operate, and this delay is normative, so that the corresponding delay can be
*a58d3d2aSXin Li applied to the MDCT layer in the encoder.
*a58d3d2aSXin LiA decoder is always free to use a resampler which requires more delay than
*a58d3d2aSXin Li allowed for here (e.g., to improve quality), but it must then delay the output
*a58d3d2aSXin Li of the MDCT layer by this extra amount.
*a58d3d2aSXin LiKeeping as much delay as possible on the encoder side allows an encoder which
*a58d3d2aSXin Li knows it will never use any of the SILK or Hybrid modes to skip this delay.
*a58d3d2aSXin LiBy contrast, if it were all applied by the decoder, then a decoder which
*a58d3d2aSXin Li processes audio in fixed-size blocks would be forced to delay the output of
*a58d3d2aSXin Li CELT frames just in case of a later switch to a SILK or Hybrid mode.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
<xref target="silk_resampler_delay_alloc"/> gives the maximum resampler delay,
 in milliseconds, for each SILK audio bandwidth.
*a58d3d2aSXin LiBecause the actual output rate may not be 48&nbsp;kHz, it may not be possible
*a58d3d2aSXin Li to achieve exactly these delays while using a whole number of input or output
*a58d3d2aSXin Li samples.
*a58d3d2aSXin LiThe reference implementation is able to resample to any of the supported
*a58d3d2aSXin Li output sampling rates (8, 12, 16, 24, or 48&nbsp;kHz) within or near this
*a58d3d2aSXin Li delay constraint.
*a58d3d2aSXin LiSome resampling filters (including those used by the reference implementation)
*a58d3d2aSXin Li may add a delay that is not an exact integer, or is not linear-phase, and so
*a58d3d2aSXin Li cannot be represented by a single delay at all frequencies.
*a58d3d2aSXin LiHowever, such deviations are unlikely to be perceptible, and the comparison
*a58d3d2aSXin Li tool described in <xref target="conformance"/> is designed to be relatively
*a58d3d2aSXin Li insensitive to them.
*a58d3d2aSXin LiThe delays listed here are the ones that should be targeted by the encoder.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="silk_resampler_delay_alloc"
*a58d3d2aSXin Li title="SILK Resampler Delay Allocations">
*a58d3d2aSXin Li<ttcol>Audio Bandwidth</ttcol>
<ttcol>Delay in milliseconds</ttcol>
*a58d3d2aSXin Li<c>NB</c> <c>0.538</c>
*a58d3d2aSXin Li<c>MB</c> <c>0.692</c>
*a58d3d2aSXin Li<c>WB</c> <c>0.706</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiNB is given a smaller decoder delay allocation than MB and WB to allow a
*a58d3d2aSXin Li higher-order filter when resampling to 8&nbsp;kHz in both the encoder and
*a58d3d2aSXin Li decoder.
This implies that the audio content of two SILK frames operating at different
 bandwidths is not perfectly aligned in time.
*a58d3d2aSXin LiThis is not an issue for any transitions described in
*a58d3d2aSXin Li <xref target="switching"/>, because they all involve a SILK decoder reset.
*a58d3d2aSXin LiWhen the decoder is reset, any samples remaining in the resampling buffer
*a58d3d2aSXin Li are discarded, and the resampler is re-initialized with silence.
*a58d3d2aSXin Li</t>
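<t>
To see why these delays generally cannot be met with a whole number of
 samples, the following small sketch converts the table entries, taken to be
 in milliseconds as the column header states, into samples at a given rate.
The array and function names are hypothetical.
</t>
<figure>
<artwork><![CDATA[
```c
/* Delay allocations copied from the table above, in milliseconds,
   in the order NB, MB, WB. */
static const double silk_resampler_delay_ms[3] = {0.538, 0.692, 0.706};

/* Convert a delay in milliseconds to (possibly fractional) samples. */
static double delay_in_samples(double delay_ms, int rate_hz)
{
    return delay_ms * rate_hz / 1000.0;
}
```
]]></artwork>
</figure>
<t>
For example, the NB allocation corresponds to roughly 25.8 samples at
 48&nbsp;kHz, which is not an integer.
</t>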
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="CELT Decoder">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe CELT layer of Opus is based on the Modified Discrete Cosine Transform
*a58d3d2aSXin Li<xref target='MDCT'/> with partially overlapping windows of 5 to 22.5 ms.
*a58d3d2aSXin LiThe main principle behind CELT is that the MDCT spectrum is divided into
*a58d3d2aSXin Libands that (roughly) follow the Bark scale, i.e., the scale of the ear's
*a58d3d2aSXin Licritical bands&nbsp;<xref target="Zwicker61"/>. The normal CELT layer uses 21 of those bands, though Opus
*a58d3d2aSXin Li Custom (see <xref target="opus-custom"/>) may use a different number of bands.
*a58d3d2aSXin LiIn Hybrid mode, the first 17 bands (up to 8&nbsp;kHz) are not coded.
A band can contain as few as one MDCT bin per channel, and as many as 176
*a58d3d2aSXin Libins per channel, as detailed in <xref target="celt_band_sizes"/>.
*a58d3d2aSXin LiIn each band, the gain (energy) is coded separately from
*a58d3d2aSXin Lithe shape of the spectrum. Coding the gain explicitly makes it easy to
*a58d3d2aSXin Lipreserve the spectral envelope of the signal. The remaining unit-norm shape
*a58d3d2aSXin Livector is encoded using a Pyramid Vector Quantizer (PVQ)&nbsp;<xref target='PVQ-decoder'/>.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="celt_band_sizes"
*a58d3d2aSXin Li title="MDCT Bins Per Channel Per Band for Each Frame Size">
*a58d3d2aSXin Li<ttcol>Frame Size:</ttcol>
*a58d3d2aSXin Li<ttcol align="right">2.5&nbsp;ms</ttcol>
*a58d3d2aSXin Li<ttcol align="right">5&nbsp;ms</ttcol>
*a58d3d2aSXin Li<ttcol align="right">10&nbsp;ms</ttcol>
*a58d3d2aSXin Li<ttcol align="right">20&nbsp;ms</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Start Frequency</ttcol>
*a58d3d2aSXin Li<ttcol align="right">Stop Frequency</ttcol>
*a58d3d2aSXin Li<c>Band</c> <c>Bins:</c> <c/> <c/> <c/> <c/> <c/>
*a58d3d2aSXin Li <c>0</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>     <c>0&nbsp;Hz</c>   <c>200&nbsp;Hz</c>
*a58d3d2aSXin Li <c>1</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>200&nbsp;Hz</c>   <c>400&nbsp;Hz</c>
*a58d3d2aSXin Li <c>2</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>400&nbsp;Hz</c>   <c>600&nbsp;Hz</c>
*a58d3d2aSXin Li <c>3</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>600&nbsp;Hz</c>   <c>800&nbsp;Hz</c>
*a58d3d2aSXin Li <c>4</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>   <c>800&nbsp;Hz</c>  <c>1000&nbsp;Hz</c>
*a58d3d2aSXin Li <c>5</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>  <c>1000&nbsp;Hz</c>  <c>1200&nbsp;Hz</c>
*a58d3d2aSXin Li <c>6</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>  <c>1200&nbsp;Hz</c>  <c>1400&nbsp;Hz</c>
*a58d3d2aSXin Li <c>7</c>  <c>1</c>  <c>2</c>  <c>4</c>   <c>8</c>  <c>1400&nbsp;Hz</c>  <c>1600&nbsp;Hz</c>
*a58d3d2aSXin Li <c>8</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>1600&nbsp;Hz</c>  <c>2000&nbsp;Hz</c>
*a58d3d2aSXin Li <c>9</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>2000&nbsp;Hz</c>  <c>2400&nbsp;Hz</c>
*a58d3d2aSXin Li<c>10</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>2400&nbsp;Hz</c>  <c>2800&nbsp;Hz</c>
*a58d3d2aSXin Li<c>11</c>  <c>2</c>  <c>4</c>  <c>8</c>  <c>16</c>  <c>2800&nbsp;Hz</c>  <c>3200&nbsp;Hz</c>
*a58d3d2aSXin Li<c>12</c>  <c>4</c>  <c>8</c> <c>16</c>  <c>32</c>  <c>3200&nbsp;Hz</c>  <c>4000&nbsp;Hz</c>
*a58d3d2aSXin Li<c>13</c>  <c>4</c>  <c>8</c> <c>16</c>  <c>32</c>  <c>4000&nbsp;Hz</c>  <c>4800&nbsp;Hz</c>
*a58d3d2aSXin Li<c>14</c>  <c>4</c>  <c>8</c> <c>16</c>  <c>32</c>  <c>4800&nbsp;Hz</c>  <c>5600&nbsp;Hz</c>
*a58d3d2aSXin Li<c>15</c>  <c>6</c> <c>12</c> <c>24</c>  <c>48</c>  <c>5600&nbsp;Hz</c>  <c>6800&nbsp;Hz</c>
*a58d3d2aSXin Li<c>16</c>  <c>6</c> <c>12</c> <c>24</c>  <c>48</c>  <c>6800&nbsp;Hz</c>  <c>8000&nbsp;Hz</c>
*a58d3d2aSXin Li<c>17</c>  <c>8</c> <c>16</c> <c>32</c>  <c>64</c>  <c>8000&nbsp;Hz</c>  <c>9600&nbsp;Hz</c>
*a58d3d2aSXin Li<c>18</c> <c>12</c> <c>24</c> <c>48</c>  <c>96</c>  <c>9600&nbsp;Hz</c> <c>12000&nbsp;Hz</c>
*a58d3d2aSXin Li<c>19</c> <c>18</c> <c>36</c> <c>72</c> <c>144</c> <c>12000&nbsp;Hz</c> <c>15600&nbsp;Hz</c>
*a58d3d2aSXin Li<c>20</c> <c>22</c> <c>44</c> <c>88</c> <c>176</c> <c>15600&nbsp;Hz</c> <c>20000&nbsp;Hz</c>
*a58d3d2aSXin Li</texttable>
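<t>
The table above has a regular structure: each bin of a 2.5&nbsp;ms frame spans
 200&nbsp;Hz, and the number of bins doubles with each doubling of the frame
 size.
A compact sketch that reproduces the table from its band edges is given below.
The identifiers are hypothetical, and LM is assumed to be 0, 1, 2, or 3 for
 frame sizes of 2.5, 5, 10, and 20&nbsp;ms, respectively.
</t>
<figure>
<artwork><![CDATA[
```c
#define CELT_NB_BANDS 21

/* Band edges from the table above, in units of 200 Hz. */
static const int band_edge_200hz[CELT_NB_BANDS + 1] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40, 48,
    60, 78, 100
};

/* MDCT bins per channel in a band for frame size 2.5 ms << LM. */
static int celt_band_bins(int band, int LM)
{
    return (band_edge_200hz[band + 1] - band_edge_200hz[band]) << LM;
}

/* Frequency in Hz of band edge number 'edge' (0 to CELT_NB_BANDS). */
static int celt_band_edge_hz(int edge)
{
    return 200 * band_edge_200hz[edge];
}
```
]]></artwork>
</figure>
<t>
With this layout, band 17 starts at 8000&nbsp;Hz, which is the first band
 coded in Hybrid mode.
</t>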
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTransients are notoriously difficult for transform codecs to code.
*a58d3d2aSXin LiCELT uses two different strategies for them:
*a58d3d2aSXin Li<list style="numbers">
*a58d3d2aSXin Li<t>Using multiple smaller MDCTs instead of a single large MDCT, and</t>
<t>Dynamic time-frequency resolution changes (see <xref target='tf-change'/>).</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiTo improve quality on highly tonal and periodic signals, CELT includes
*a58d3d2aSXin Lia prefilter/postfilter combination. The prefilter on the encoder side
*a58d3d2aSXin Liattenuates the signal's harmonics. The postfilter on the decoder side
*a58d3d2aSXin Lirestores the original gain of the harmonics, while shaping the coding noise
*a58d3d2aSXin Lito roughly follow the harmonics. Such noise shaping reduces the perception
*a58d3d2aSXin Liof the noise.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen coding a stereo signal, three coding methods are available:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>mid-side stereo: encodes the mean and the difference of the left and right channels,</t>
*a58d3d2aSXin Li<t>intensity stereo: only encodes the mean of the left and right channels (discards the difference),</t>
*a58d3d2aSXin Li<t>dual stereo: encodes the left and right channels separately.</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAn overview of the decoder is given in <xref target="celt-decoder-overview"/>.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="celt-decoder-overview" title="Structure of the CELT decoder">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li               +---------+
*a58d3d2aSXin Li               | Coarse  |
*a58d3d2aSXin Li            +->| decoder |----+
*a58d3d2aSXin Li            |  +---------+    |
*a58d3d2aSXin Li            |                 |
*a58d3d2aSXin Li            |  +---------+    v
*a58d3d2aSXin Li            |  |  Fine   |  +---+
*a58d3d2aSXin Li            +->| decoder |->| + |
*a58d3d2aSXin Li            |  +---------+  +---+
*a58d3d2aSXin Li            |       ^         |
*a58d3d2aSXin Li+---------+ |       |         |
*a58d3d2aSXin Li|  Range  | | +----------+    v
*a58d3d2aSXin Li| Decoder |-+ |   Bit    | +------+
*a58d3d2aSXin Li+---------+ | |Allocation| | 2**x |
*a58d3d2aSXin Li            | +----------+ +------+
*a58d3d2aSXin Li            |       |         |
*a58d3d2aSXin Li            |       v         v               +--------+
*a58d3d2aSXin Li            |  +---------+  +---+  +-------+  | pitch  |
*a58d3d2aSXin Li            +->|   PVQ   |->| * |->| IMDCT |->| post-  |--->
*a58d3d2aSXin Li            |  | decoder |  +---+  +-------+  | filter |
*a58d3d2aSXin Li            |  +---------+                    +--------+
*a58d3d2aSXin Li            |                                      ^
*a58d3d2aSXin Li            +--------------------------------------+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoder is based on the following symbols and sets of symbols:
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="celt_symbols"
*a58d3d2aSXin Li title="Order of the Symbols in the CELT Section of the Bitstream">
*a58d3d2aSXin Li<ttcol align="center">Symbol(s)</ttcol>
*a58d3d2aSXin Li<ttcol align="center">PDF</ttcol>
*a58d3d2aSXin Li<ttcol align="center">Condition</ttcol>
*a58d3d2aSXin Li<c>silence</c>      <c>{32767, 1}/32768</c> <c></c>
*a58d3d2aSXin Li<c>post-filter</c>  <c>{1, 1}/2</c> <c></c>
*a58d3d2aSXin Li<c>octave</c>       <c>uniform (6)</c><c>post-filter</c>
*a58d3d2aSXin Li<c>period</c>       <c>raw bits (4+octave)</c><c>post-filter</c>
*a58d3d2aSXin Li<c>gain</c>         <c>raw bits (3)</c><c>post-filter</c>
*a58d3d2aSXin Li<c>tapset</c>       <c>{2, 1, 1}/4</c><c>post-filter</c>
*a58d3d2aSXin Li<c>transient</c>    <c>{7, 1}/8</c><c></c>
*a58d3d2aSXin Li<c>intra</c>        <c>{7, 1}/8</c><c></c>
*a58d3d2aSXin Li<c>coarse energy</c><c><xref target="energy-decoding"/></c><c></c>
*a58d3d2aSXin Li<c>tf_change</c>    <c><xref target="transient-decoding"/></c><c></c>
*a58d3d2aSXin Li<c>tf_select</c>    <c>{1, 1}/2</c><c><xref target="transient-decoding"/></c>
*a58d3d2aSXin Li<c>spread</c>       <c>{7, 2, 21, 2}/32</c><c></c>
*a58d3d2aSXin Li<c>dyn. alloc.</c>  <c><xref target="allocation"/></c><c></c>
*a58d3d2aSXin Li<c>alloc. trim</c>  <c>{2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2}/128</c><c></c>
*a58d3d2aSXin Li<c>skip</c>         <c>{1, 1}/2</c><c><xref target="allocation"/></c>
*a58d3d2aSXin Li<c>intensity</c>    <c>uniform</c><c><xref target="allocation"/></c>
*a58d3d2aSXin Li<c>dual</c>         <c>{1, 1}/2</c><c></c>
*a58d3d2aSXin Li<c>fine energy</c>  <c><xref target="energy-decoding"/></c><c></c>
*a58d3d2aSXin Li<c>residual</c>     <c><xref target="PVQ-decoder"/></c><c></c>
*a58d3d2aSXin Li<c>anti-collapse</c><c>{1, 1}/2</c><c><xref target="anti-collapse"/></c>
*a58d3d2aSXin Li<c>finalize</c>     <c><xref target="energy-decoding"/></c><c></c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoder extracts information from the range-coded bitstream in the order
*a58d3d2aSXin Lidescribed in <xref target='celt_symbols'/>. In some circumstances, it is
*a58d3d2aSXin Lipossible for a decoded value to be out of range due to a very small amount of redundancy
*a58d3d2aSXin Liin the encoding of large integers by the range coder.
*a58d3d2aSXin LiIn that case, the decoder should assume there has been an error in the coding,
*a58d3d2aSXin Lidecoding, or transmission and SHOULD take measures to conceal the error and/or report
to the application that a problem has occurred. Such out-of-range errors cannot occur
*a58d3d2aSXin Liin the SILK layer.
*a58d3d2aSXin Li</t>
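<t>
Each explicit PDF in <xref target='celt_symbols'/> lists integer frequencies
 for the symbol values 0, 1, 2, and so on, over a common total; for example,
 {2,&nbsp;1,&nbsp;1}/4 assigns probabilities 1/2, 1/4, and 1/4.
The sketch below shows, in simplified form, how a cumulative frequency value
 selects a symbol from such a PDF.
It is not the range decoder itself: the function name pdf_lookup is
 hypothetical, and the value fs is assumed to have already been produced by the
 range decoder.
</t>
<figure>
<artwork><![CDATA[
```c
/* Return the symbol k whose cumulative interval contains fs,
   or -1 if fs is not below the PDF total. */
static int pdf_lookup(const unsigned *pdf, int nsyms, unsigned fs)
{
    unsigned fh = 0; /* upper cumulative bound of symbol k */
    for (int k = 0; k < nsyms; k++) {
        fh += pdf[k];
        if (fs < fh)
            return k;
    }
    return -1;
}
```
]]></artwork>
</figure>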
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="transient-decoding" title="Transient Decoding">
*a58d3d2aSXin Li<t>
The "transient" flag indicates whether the frame uses a single long MDCT or several short MDCTs.
When it is set, the MDCT coefficients represent multiple
short MDCTs in the frame. When it is not set, the coefficients represent a single
long MDCT for the frame. The flag is coded with a probability of 1/8 of being set.
In addition to the global transient flag, there is a per-band
binary flag, tf_change, that changes the time-frequency (tf) resolution independently in each band. The
change in tf resolution is defined in tf_select_table[][] in celt.c and depends
on the frame size, whether the transient flag is set, and the value of tf_select.
The tf_select flag uses a 1/2 probability, but is only decoded
if, given the values of all per-band tf_change flags, it can have an impact
on the result.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="energy-decoding" title="Energy Envelope Decoding">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIt is important to quantize the energy with sufficient resolution because
*a58d3d2aSXin Liany energy quantization error cannot be compensated for at a later
*a58d3d2aSXin Listage. Regardless of the resolution used for encoding the spectral shape of a band,
*a58d3d2aSXin Liit is perceptually important to preserve the energy in each band. CELT uses a
*a58d3d2aSXin Lithree-step coarse-fine-fine strategy for encoding the energy in the base-2 log
domain, as implemented in quant_bands.c.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="coarse-energy-decoding" title="Coarse energy decoding">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiCoarse quantization of the energy uses a fixed resolution of 6 dB
*a58d3d2aSXin Li(integer part of base-2 log). To minimize the bitrate, prediction is applied
*a58d3d2aSXin Liboth in time (using the previous frame) and in frequency (using the previous
*a58d3d2aSXin Libands). The part of the prediction that is based on the
*a58d3d2aSXin Liprevious frame can be disabled, creating an "intra" frame where the energy
*a58d3d2aSXin Liis coded without reference to prior frames. The decoder first reads the intra flag
*a58d3d2aSXin Lito determine what prediction is used.
*a58d3d2aSXin LiThe 2-D z-transform <xref target='z-transform'/> of
*a58d3d2aSXin Lithe prediction filter is:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li                            -1          -1
*a58d3d2aSXin Li              (1 - alpha*z_l  )*(1 - z_b  )
*a58d3d2aSXin LiA(z_l, z_b) = -----------------------------
*a58d3d2aSXin Li                                 -1
*a58d3d2aSXin Li                     1 - beta*z_b
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
where b is the band index and l is the frame index. When not using intra energy,
the prediction coefficients depend on the frame size in use; when using intra energy,
they are alpha=0 and beta=4915/32768.
*a58d3d2aSXin LiThe time-domain prediction is based on the final fine quantization of the previous
*a58d3d2aSXin Liframe, while the frequency domain (within the current frame) prediction is based
*a58d3d2aSXin Lion coarse quantization only (because the fine quantization has not been computed
*a58d3d2aSXin Liyet). The prediction is clamped internally so that fixed point implementations with
*a58d3d2aSXin Lilimited dynamic range always remain in the same state as floating point implementations.
The ideal
probability distribution of the prediction error is approximated using a Laplace distribution
with separate parameters for each frame size in intra- and inter-frame modes. These
parameters are held in the e_prob_model table in quant_bands.c.
Coarse energy decoding is performed by unquant_coarse_energy() and
unquant_coarse_energy_impl() (quant_bands.c). The decoding of the Laplace-distributed values is
implemented in ec_laplace_decode() (laplace.c).
*a58d3d2aSXin Li</t>
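<t>
Inverting the prediction filter A(z_l, z_b) gives a simple recurrence over the
 bands of a frame, which the following floating-point sketch illustrates.
It is derived from the transfer function above rather than copied from
 quant_bands.c: the function name is hypothetical, alpha and beta are passed in
 by the caller, and both the internal clamping of the prediction and the
 Laplace decoding of the prediction error q[b] are omitted.
Energies are in the base-2 log domain, so one unit of q[b] corresponds to 6 dB.
</t>
<figure>
<artwork><![CDATA[
```c
/* On entry, E[b] holds the final energy of band b in the previous
   frame; on exit, it holds the coarse energy of the current frame.
   q[b] is the decoded integer prediction error for band b. */
static void coarse_energy_reconstruct(double *E, const int *q, int nbands,
                                      double alpha, double beta)
{
    double prev = 0.0; /* frequency-domain prediction from lower bands */
    for (int b = 0; b < nbands; b++) {
        /* Time prediction (alpha) plus frequency prediction plus error. */
        E[b] = alpha * E[b] + prev + q[b];
        /* Accumulate q through (1 - beta*z_b^-1)/(1 - z_b^-1). */
        prev += (1.0 - beta) * q[b];
    }
}
```
]]></artwork>
</figure>
<t>
For an intra frame, a caller would pass alpha = 0 and beta = 4915.0/32768, so
 that the previous contents of E[] have no influence on the result.
</t>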
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="fine-energy-decoding" title="Fine energy quantization">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe number of bits assigned to fine energy quantization in each band is determined
*a58d3d2aSXin Liby the bit allocation computation described in <xref target="allocation"></xref>.
*a58d3d2aSXin LiLet B_i be the number of fine energy bits
for band i; the refinement is an integer f in the range [0,2**B_i-1]. The correction
applied to the coarse energy for a given f is (f+1/2)/2**B_i - 1/2. Fine
*a58d3d2aSXin Lienergy quantization is implemented in quant_fine_energy() (quant_bands.c).
*a58d3d2aSXin Li</t>
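<t>
As a worked illustration of this mapping, the sketch below evaluates the
 correction in floating point.
The function name fine_energy_offset is hypothetical; the reference
 implementation's own arithmetic may differ in form.
</t>
<figure>
<artwork><![CDATA[
```c
/* Correction, in base-2 log units, for refinement value f coded
   with B bits: (f + 1/2)/2**B - 1/2. */
static double fine_energy_offset(unsigned f, int B)
{
    return (f + 0.5) / (double)(1u << B) - 0.5;
}
```
]]></artwork>
</figure>
<t>
With B_i = 1, the two possible corrections are -0.25 and +0.25, i.e., the
 midpoints of the two halves of the coarse quantization interval.
</t>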
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen some bits are left "unused" after all other flags have been decoded, these bits
*a58d3d2aSXin Liare assigned to a "final" step of fine allocation. In effect, these bits are used
*a58d3d2aSXin Lito add one extra fine energy bit per band per channel. The allocation process
*a58d3d2aSXin Lidetermines two "priorities" for the final fine bits.
*a58d3d2aSXin LiAny remaining bits are first assigned only to bands of priority 0, starting
*a58d3d2aSXin Lifrom band 0 and going up. If all bands of priority 0 have received one bit per
*a58d3d2aSXin Lichannel, then bands of priority 1 are assigned an extra bit per channel,
*a58d3d2aSXin Listarting from band 0. If any bits are left after this, they are left unused.
*a58d3d2aSXin LiThis is implemented in unquant_energy_finalise() (quant_bands.c).
*a58d3d2aSXin Li</t>
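<t>
The order in which the final bits are handed out can be sketched as follows.
This only models the assignment order described above; the function name and
 the extra[] output array are hypothetical, the bits themselves are not read,
 and any further conditions applied by the reference implementation are
 omitted.
</t>
<figure>
<artwork><![CDATA[
```c
/* priority[b] is 0 or 1 for each band.  Sets extra[b] to 1 for each
   band that receives one extra bit per channel, and returns the number
   of bits that remain unused. */
static int assign_final_fine_bits(const int *priority, int nbands,
                                  int channels, int bits_left, int *extra)
{
    for (int b = 0; b < nbands; b++)
        extra[b] = 0;
    for (int prio = 0; prio < 2; prio++) {
        /* Start from band 0 and go up while a full set of bits remains. */
        for (int b = 0; b < nbands && bits_left >= channels; b++) {
            if (priority[b] != prio)
                continue;
            extra[b] = 1;
            bits_left -= channels;
        }
    }
    return bits_left;
}
```
]]></artwork>
</figure>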
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section> <!-- fine energy -->
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section> <!-- Energy decode -->
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="allocation" title="Bit Allocation">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>Because the bit allocation drives the decoding of the range-coder
*a58d3d2aSXin Listream, it MUST be recovered exactly so that identical coding decisions are
*a58d3d2aSXin Limade in the encoder and decoder. Any deviation from the reference's resulting
*a58d3d2aSXin Libit allocation will result in corrupted output, though implementers are
*a58d3d2aSXin Lifree to implement the procedure in any way which produces identical results.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The per-band gain-shape structure of the CELT layer ensures that using
*a58d3d2aSXin Li the same number of bits for the spectral shape of a band in every frame will
*a58d3d2aSXin Li result in a roughly constant signal-to-noise ratio in that band.
*a58d3d2aSXin LiThis results in coding noise that has the same spectral envelope as the signal.
*a58d3d2aSXin LiThe masking curve produced by a standard psychoacoustic model also closely
*a58d3d2aSXin Li follows the spectral envelope of the signal.
*a58d3d2aSXin LiThis structure means that the ideal allocation is more consistent from frame to
*a58d3d2aSXin Li frame than it is for other codecs without an equivalent structure, and that a
*a58d3d2aSXin Li fixed allocation provides fairly consistent perceptual
*a58d3d2aSXin Li performance&nbsp;<xref target='Valin2010'/>.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>Many codecs transmit significant amounts of side information to control the
*a58d3d2aSXin Li bit allocation within a frame.
*a58d3d2aSXin LiOften this control is only indirect, and must be exercised carefully to
*a58d3d2aSXin Li achieve the desired rate constraints.
*a58d3d2aSXin LiThe CELT layer, however, can adapt over a very wide range of rates, and thus
*a58d3d2aSXin Li has a large number of codebook sizes to choose from for each band.
*a58d3d2aSXin LiExplicitly signaling the size of each of these codebooks would impose
*a58d3d2aSXin Li considerable overhead, even though the allocation is relatively static from
*a58d3d2aSXin Li frame to frame.
*a58d3d2aSXin LiThis is because all of the information required to compute these codebook sizes
*a58d3d2aSXin Li must be derived from a single frame by itself, in order to retain robustness
*a58d3d2aSXin Li to packet loss, so the signaling cannot take advantage of knowledge of the
*a58d3d2aSXin Li allocation in neighboring frames.
*a58d3d2aSXin LiThis problem is exacerbated in low-latency (small frame size) applications,
*a58d3d2aSXin Li which would include this overhead in every frame.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>For this reason, in the MDCT mode Opus uses a primarily implicit bit
*a58d3d2aSXin Liallocation. The available bitstream capacity is known in advance to both
*a58d3d2aSXin Lithe encoder and decoder without additional signaling, ultimately from the
*a58d3d2aSXin Lipacket sizes expressed by a higher-level protocol. Using this information,
*a58d3d2aSXin Lithe codec interpolates an allocation from a hard-coded table.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>While the band-energy structure effectively models intra-band masking,
*a58d3d2aSXin Liit ignores the weaker inter-band masking, band-temporal masking, and
*a58d3d2aSXin Liother less significant perceptual effects. While these effects can
*a58d3d2aSXin Lioften be ignored, they can become significant for particular samples. One
*a58d3d2aSXin Limechanism available to encoders would be to simply increase the overall
*a58d3d2aSXin Lirate for these frames, but this is not possible in a constant rate mode
and can be fairly inefficient. As a result, three explicitly signaled
*a58d3d2aSXin Limechanisms are provided to alter the implicit allocation:</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>Band boost</t>
*a58d3d2aSXin Li<t>Allocation trim</t>
*a58d3d2aSXin Li<t>Band skipping</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The first of these mechanisms, band boost, allows an encoder to boost
*a58d3d2aSXin Lithe allocation in specific bands. The second, allocation trim, works by
*a58d3d2aSXin Libiasing the overall allocation towards higher or lower frequency bands. The third, band
*a58d3d2aSXin Liskipping, selects which low-precision high frequency bands
*a58d3d2aSXin Liwill be allocated no shape bits at all.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>In stereo mode there are two additional parameters
*a58d3d2aSXin Lipotentially coded as part of the allocation procedure: a parameter to allow the
*a58d3d2aSXin Liselective elimination of allocation for the 'side' (i.e., intensity stereo) in jointly coded bands,
*a58d3d2aSXin Liand a flag to deactivate joint coding (i.e., dual stereo). These values are not signaled if
*a58d3d2aSXin Lithey would be meaningless in the overall context of the allocation.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>Because every signaled adjustment increases overhead and implementation
*a58d3d2aSXin Licomplexity, none were included speculatively: the reference encoder makes use
*a58d3d2aSXin Liof all of these mechanisms. While the decision logic in the reference was
*a58d3d2aSXin Lifound to be effective enough to justify the overhead and complexity, further
*a58d3d2aSXin Lianalysis techniques may be discovered which increase the effectiveness of these
*a58d3d2aSXin Liparameters. As with other signaled parameters, an encoder is free to choose the
*a58d3d2aSXin Livalues in any manner, but unless a technique is known to deliver superior
*a58d3d2aSXin Liperceptual results the methods used by the reference implementation should be
*a58d3d2aSXin Liused.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The allocation process consists of the following steps: determining the per-band
maximum allocation vector, decoding the boosts, decoding the allocation trim (the "tilt"), determining
*a58d3d2aSXin Lithe remaining capacity of the frame, searching the mode table for the
*a58d3d2aSXin Lientry nearest but not exceeding the available space (subject to the tilt, boosts, band
*a58d3d2aSXin Limaximums, and band minimums), linear interpolation, reallocation of
*a58d3d2aSXin Liunused bits with concurrent skip decoding, determination of the
*a58d3d2aSXin Lifine-energy vs. shape split, and final reallocation. This process results
*a58d3d2aSXin Liin a per-band shape allocation (in 1/8th bit units), a per-band fine-energy
*a58d3d2aSXin Liallocation (in 1 bit per channel units), a set of band priorities for
*a58d3d2aSXin Licontrolling the use of remaining bits at the end of the frame, and a
*a58d3d2aSXin Liremaining balance of unallocated space, which is usually zero except
*a58d3d2aSXin Liat very high rates.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe "static" bit allocation (in 1/8 bits) for a quality q, excluding the minimums, maximums,
*a58d3d2aSXin Litilt and boosts, is equal to channels*N*alloc[band][q]&lt;&lt;LM&gt;&gt;2, where
*a58d3d2aSXin Lialloc[][] is given in <xref target="static_alloc"/> and LM=log2(frame_size/120). The allocation
*a58d3d2aSXin Liis obtained by linearly interpolating between two values of q (in steps of 1/64) to find the
*a58d3d2aSXin Lihighest allocation that does not exceed the number of bits remaining.
*a58d3d2aSXin Li</t>
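<t>
The expression for the static allocation can be written as a small C helper,
 shown below as a sketch.
The function name is hypothetical, and N is assumed here to be the number of
 MDCT bins per channel in the band for a 2.5&nbsp;ms frame, so that N&lt;&lt;LM
 is the number of bins at the actual frame size.
Because the table entries are in units of 1/32 bit per bin, the final shift by
 2 converts the product to units of 1/8 bit.
</t>
<figure>
<artwork><![CDATA[
```c
/* Static allocation for one band, in 1/8 bit units.
   alloc is the table entry alloc[band][q] in 1/32 bit per MDCT bin. */
static int static_alloc_eighth_bits(int channels, int N, int alloc, int LM)
{
    return ((channels * N * alloc) << LM) >> 2;
}
```
]]></artwork>
</figure>
<t>
For instance, a mono one-bin band with a table entry of 90 at LM=3 yields 180
 eighth-bits, or 22.5 bits.
</t>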
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="static_alloc"
*a58d3d2aSXin Li title="CELT Static Allocation Table">
*a58d3d2aSXin Li <preamble>Rows indicate the MDCT bands, columns are the different quality (q) parameters. The units are 1/32 bit per MDCT bin.</preamble>
*a58d3d2aSXin Li<ttcol align="right">0</ttcol>
*a58d3d2aSXin Li<ttcol align="right">1</ttcol>
*a58d3d2aSXin Li<ttcol align="right">2</ttcol>
*a58d3d2aSXin Li<ttcol align="right">3</ttcol>
*a58d3d2aSXin Li<ttcol align="right">4</ttcol>
*a58d3d2aSXin Li<ttcol align="right">5</ttcol>
*a58d3d2aSXin Li<ttcol align="right">6</ttcol>
*a58d3d2aSXin Li<ttcol align="right">7</ttcol>
*a58d3d2aSXin Li<ttcol align="right">8</ttcol>
*a58d3d2aSXin Li<ttcol align="right">9</ttcol>
*a58d3d2aSXin Li<ttcol align="right">10</ttcol>
*a58d3d2aSXin Li<c>0</c><c>90</c><c>110</c><c>118</c><c>126</c><c>134</c><c>144</c><c>152</c><c>162</c><c>172</c><c>200</c>
*a58d3d2aSXin Li<c>0</c><c>80</c><c>100</c><c>110</c><c>119</c><c>127</c><c>137</c><c>145</c><c>155</c><c>165</c><c>200</c>
*a58d3d2aSXin Li<c>0</c><c>75</c><c>90</c><c>103</c><c>112</c><c>120</c><c>130</c><c>138</c><c>148</c><c>158</c><c>200</c>
*a58d3d2aSXin Li<c>0</c><c>69</c><c>84</c><c>93</c><c>104</c><c>114</c><c>124</c><c>132</c><c>142</c><c>152</c><c>200</c>
*a58d3d2aSXin Li<c>0</c><c>63</c><c>78</c><c>86</c><c>95</c><c>103</c><c>113</c><c>123</c><c>133</c><c>143</c><c>200</c>
*a58d3d2aSXin Li<c>0</c><c>56</c><c>71</c><c>80</c><c>89</c><c>97</c><c>107</c><c>117</c><c>127</c><c>137</c><c>200</c>
*a58d3d2aSXin Li<c>0</c><c>49</c><c>65</c><c>75</c><c>83</c><c>91</c><c>101</c><c>111</c><c>121</c><c>131</c><c>200</c>
*a58d3d2aSXin Li<c>0</c><c>40</c><c>58</c><c>70</c><c>78</c><c>85</c><c>95</c><c>105</c><c>115</c><c>125</c><c>200</c>
*a58d3d2aSXin Li<c>0</c><c>34</c><c>51</c><c>65</c><c>72</c><c>78</c><c>88</c><c>98</c><c>108</c><c>118</c><c>198</c>
*a58d3d2aSXin Li<c>0</c><c>29</c><c>45</c><c>59</c><c>66</c><c>72</c><c>82</c><c>92</c><c>102</c><c>112</c><c>193</c>
*a58d3d2aSXin Li<c>0</c><c>20</c><c>39</c><c>53</c><c>60</c><c>66</c><c>76</c><c>86</c><c>96</c><c>106</c><c>188</c>
*a58d3d2aSXin Li<c>0</c><c>18</c><c>32</c><c>47</c><c>54</c><c>60</c><c>70</c><c>80</c><c>90</c><c>100</c><c>183</c>
*a58d3d2aSXin Li<c>0</c><c>10</c><c>26</c><c>40</c><c>47</c><c>54</c><c>64</c><c>74</c><c>84</c><c>94</c><c>178</c>
*a58d3d2aSXin Li<c>0</c><c>0</c><c>20</c><c>31</c><c>39</c><c>47</c><c>57</c><c>67</c><c>77</c><c>87</c><c>173</c>
*a58d3d2aSXin Li<c>0</c><c>0</c><c>12</c><c>23</c><c>32</c><c>41</c><c>51</c><c>61</c><c>71</c><c>81</c><c>168</c>
*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>15</c><c>25</c><c>35</c><c>45</c><c>55</c><c>65</c><c>75</c><c>163</c>
*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>4</c><c>17</c><c>29</c><c>39</c><c>49</c><c>59</c><c>69</c><c>158</c>
*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>0</c><c>12</c><c>23</c><c>33</c><c>43</c><c>53</c><c>63</c><c>153</c>
*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>0</c><c>1</c><c>16</c><c>26</c><c>36</c><c>46</c><c>56</c><c>148</c>
*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>0</c><c>0</c><c>10</c><c>15</c><c>20</c><c>30</c><c>45</c><c>129</c>
*a58d3d2aSXin Li<c>0</c><c>0</c><c>0</c><c>0</c><c>0</c><c>1</c><c>1</c><c>1</c><c>1</c><c>20</c><c>104</c>
*a58d3d2aSXin Li</texttable>
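<t>As a minimal sketch of the static allocation formula (not the reference code), the helper below evaluates channels*N*alloc[band][q]&lt;&lt;LM&gt;&gt;2 for a single table entry. The function name is hypothetical, and N is assumed to be the band width in MDCT bins at the shortest frame size, which is the reading consistent with the table's units of 1/32 bit per bin.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Static allocation for one band, in 1/8 bit units.
   alloc_entry is one value from the static allocation table
   (1/32 bit per MDCT bin). Hypothetical helper, not reference code. */
static int static_alloc_8th_bits(int channels, int N, int alloc_entry,
                                 int LM)
{
    assert(channels == 1 || channels == 2);
    assert(N > 0 && alloc_entry >= 0 && LM >= 0);
    /* The left shift by LM is applied before the right shift by 2. */
    return ((channels * N * alloc_entry) << LM) >> 2;
}
```
]]></artwork>
</figure>
<t>For example, a mono band of width 1 with table entry 90 at LM=3 yields 180 eighth bits, or 22.5 bits.</t>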
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The maximum allocation vector is an approximation of the maximum space
*a58d3d2aSXin Lithat can be used by each band for a given mode. The value is
*a58d3d2aSXin Liapproximate because the shape encoding is variable rate (due
*a58d3d2aSXin Lito entropy coding of splitting parameters). Setting the maximum too low reduces the
*a58d3d2aSXin Limaximum achievable quality in a band while setting it too high
may result in waste: bitstream capacity available at the end
of the frame which cannot be put to any use. The maximums
*a58d3d2aSXin Lispecified by the codec reflect the average maximum. In the reference
*a58d3d2aSXin Liimplementation, the maximums in bits/sample are precomputed in a static table
*a58d3d2aSXin Li(see cache_caps50[] in static_modes_float.h) for each band,
*a58d3d2aSXin Lifor each value of LM, and for both mono and stereo.
*a58d3d2aSXin Li
*a58d3d2aSXin LiImplementations are expected
*a58d3d2aSXin Lito simply use the same table data, but the procedure for generating
*a58d3d2aSXin Lithis table is included in rate.c as part of compute_pulse_cache().</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>To convert the values in cache.caps into the actual maximums: first
*a58d3d2aSXin Liset nbBands to the maximum number of bands for this mode, and stereo to
*a58d3d2aSXin Lizero if stereo is not in use and one otherwise. For each band set N
*a58d3d2aSXin Lito the number of MDCT bins covered by the band (for one channel), set LM
to the shift value for the frame size,
then set i to nbBands*(2*LM+stereo) plus the index of the band. Then set the
maximum for the band to the i-th entry of cache.caps plus 64, multiply by the
number of channels in the current frame (one or two) and by N, then divide the
result by 4 using integer division. The resulting vector will be called
cap[]. The elements fit in signed 16-bit integers but do not fit in 8 bits.
This procedure is implemented in the reference implementation in the function
init_caps() in celt.c.
*a58d3d2aSXin Li</t>
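<t>The conversion can be sketched as follows. This is an illustration under stated assumptions rather than the reference init_caps(): the function name and the caller-supplied table are hypothetical, and the table index is read as nbBands*(2*LM+stereo) plus the band index, since the table holds one entry per band.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Maximum allocation for one band, in 1/8 bit units.
   caps:    precomputed table (values as in cache.caps)
   nbBands: number of bands in the mode
   LM:      frame size shift; stereo: 0 for mono, 1 for stereo
   band:    band index; N: MDCT bins in the band for one channel
   Hypothetical helper; the "+ band" offset is this sketch's reading. */
static int band_cap(const unsigned char *caps, int nbBands, int LM,
                    int stereo, int band, int N)
{
    int channels = stereo ? 2 : 1;
    int i;
    assert(band >= 0 && band < nbBands);
    i = nbBands * (2 * LM + stereo) + band;
    return (caps[i] + 64) * channels * N / 4;
}
```
]]></artwork>
</figure>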
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The band boosts are represented by a series of binary symbols which
*a58d3d2aSXin Liare entropy coded with very low probability. Each band can potentially be boosted
*a58d3d2aSXin Limultiple times, subject to the frame actually having enough room to obey
*a58d3d2aSXin Lithe boost and having enough room to code the boost symbol. The default
*a58d3d2aSXin Licoding cost for a boost starts out at six bits (probability p=1/64), but subsequent boosts
*a58d3d2aSXin Liin a band cost only a single bit and every time a band is boosted the
*a58d3d2aSXin Liinitial cost is reduced (down to a minimum of two bits, or p=1/4). Since the initial
*a58d3d2aSXin Licost of coding a boost is 6 bits, the coding cost of the boost symbols when
*a58d3d2aSXin Licompletely unused is 0.48 bits/frame for a 21 band mode (21*-log2(1-1/2**6)).</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>To decode the band boosts: First set 'dynalloc_logp' to 6, the initial
*a58d3d2aSXin Liamount of storage required to signal a boost in bits, 'total_bits' to the
*a58d3d2aSXin Lisize of the frame in 8th bits, 'total_boost' to zero, and 'tell' to the total number
*a58d3d2aSXin Liof 8th bits decoded
*a58d3d2aSXin Liso far. For each band from the coding start (0 normally, but 17 in Hybrid mode)
*a58d3d2aSXin Lito the coding end (which changes depending on the signaled bandwidth), the boost quanta
*a58d3d2aSXin Liin units of 1/8 bit is calculated as quanta = min(8*N, max(48, N)).
*a58d3d2aSXin LiThis represents a boost step size of six bits, subject to a lower limit of
*a58d3d2aSXin Li1/8th&nbsp;bit/sample and an upper limit of 1&nbsp;bit/sample.
Set 'boost' to zero and 'dynalloc_loop_logp'
to dynalloc_logp. While dynalloc_loop_logp (the current worst-case symbol cost) in
8th bits plus tell is less than total_bits plus total_boost and boost is less than cap[] for this
band: decode a bit from the bitstream with dynalloc_loop_logp as the cost
of a one and update tell to reflect the current used capacity. If the decoded value
is zero, break the loop; otherwise add quanta to boost and total_boost, subtract quanta from
total_bits, and set dynalloc_loop_logp to 1. When the while loop finishes,
boost contains the boost for this band. If boost is non-zero and dynalloc_logp
is greater than 2, decrease dynalloc_logp. Once this process has been
*a58d3d2aSXin Liexecuted on all bands, the band boosts have been decoded. This procedure
*a58d3d2aSXin Liis implemented around line 2474 of celt.c.</t>
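<t>The boost step size is the part of this procedure that is easiest to get wrong, so a small sketch of quanta = min(8*N, max(48, N)) may help. The function name is hypothetical, and N is taken exactly as it appears in the formula.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Boost step for a band of N MDCT bins, in 1/8 bit units:
   six bits (48), but no less than 1/8 bit/sample and no more
   than 1 bit/sample. Hypothetical helper. */
static int boost_quanta(int N)
{
    int q;
    assert(N > 0);
    q = N > 48 ? N : 48;             /* max(48, N)  */
    return 8 * N < q ? 8 * N : q;    /* min(8*N, q) */
}
```
]]></artwork>
</figure>
<t>With this formula, bands narrower than 6 bins are limited by the 1 bit/sample ceiling, and bands wider than 48 bins are raised by the 1/8 bit/sample floor.</t>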
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>At very low rates it is possible that there won't be enough available
*a58d3d2aSXin Lispace to execute the inner loop even once. In these cases band boost
*a58d3d2aSXin Liis not possible but its overhead is completely eliminated. Because of the
*a58d3d2aSXin Lihigh cost of band boost when activated, a reasonable encoder should not be
*a58d3d2aSXin Liusing it at very low rates. The reference implements its dynalloc decision
*a58d3d2aSXin Lilogic around line 1304 of celt.c.</t>
*a58d3d2aSXin Li
<t>The allocation trim is an integer value from 0 to 10. The default value of
*a58d3d2aSXin Li5 indicates no trim. The trim parameter is entropy coded in order to
*a58d3d2aSXin Lilower the coding cost of less extreme adjustments. Values lower than
*a58d3d2aSXin Li5 bias the allocation towards lower frequencies and values above 5
*a58d3d2aSXin Libias it towards higher frequencies. Like other signaled parameters, signaling
*a58d3d2aSXin Liof the trim is gated so that it is not included if there is insufficient space
*a58d3d2aSXin Liavailable in the bitstream. To decode the trim, first set
*a58d3d2aSXin Lithe trim value to 5, then if and only if the count of decoded 8th bits so far (ec_tell_frac)
*a58d3d2aSXin Liplus 48 (6 bits) is less than or equal to the total frame size in 8th
*a58d3d2aSXin Libits minus total_boost (a product of the above band boost procedure),
*a58d3d2aSXin Lidecode the trim value using the PDF in <xref target="celt_trim_pdf"/>.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="celt_trim_pdf" title="PDF for the Trim">
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
<c>{2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2}/128</c>
*a58d3d2aSXin Li</texttable>
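<t>The gating condition for the trim can be written as a one-line check. The sketch below is illustrative only: the function and parameter names are hypothetical, and all quantities are in 1/8 bit units as in the text.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Returns nonzero if the trim symbol is present in the bitstream.
   tell_frac is the count of 1/8 bits decoded so far and
   frame_8th_bits is the total frame size in 1/8 bits.
   Hypothetical helper. */
static int trim_is_coded(int tell_frac, int frame_8th_bits,
                         int total_boost)
{
    assert(tell_frac >= 0 && total_boost >= 0);
    return tell_frac + 48 <= frame_8th_bits - total_boost;
}
```
]]></artwork>
</figure>
<t>When the check returns zero, nothing is read from the bitstream and the trim keeps its default value of 5.</t>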
*a58d3d2aSXin Li
<t>For 10 ms and 20 ms frames that use short blocks and have at least LM+2 bits left prior to
the allocation process, one anti-collapse bit is reserved in the allocation process so it can
be decoded later. Following the anti-collapse reservation, one bit is reserved for skip if available.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>For stereo frames, bits are reserved for intensity stereo and for dual stereo. Intensity stereo
requires ilog2(end-start) bits. Those bits are reserved if there are enough bits left. Following this, one
*a58d3d2aSXin Libit is reserved for dual stereo if available.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The allocation computation begins by setting up some initial conditions.
*a58d3d2aSXin Li'total' is set to the remaining available 8th bits, computed by taking the
*a58d3d2aSXin Lisize of the coded frame times 8 and subtracting ec_tell_frac(). From this value, one (8th bit)
*a58d3d2aSXin Liis subtracted to ensure that the resulting allocation will be conservative. 'anti_collapse_rsv'
*a58d3d2aSXin Liis set to 8 (8th bits) if and only if the frame is a transient, LM is greater than 1, and total is
*a58d3d2aSXin Ligreater than or equal to (LM+2) * 8. Total is then decremented by anti_collapse_rsv and clamped
*a58d3d2aSXin Lito be equal to or greater than zero. 'skip_rsv' is set to 8 (8th bits) if total is greater than
*a58d3d2aSXin Li8, otherwise it is zero. Total is then decremented by skip_rsv. This reserves space for the
*a58d3d2aSXin Lifinal skipping flag.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>If the current frame is stereo, intensity_rsv is set to the conservative log2 in 8th bits
*a58d3d2aSXin Liof the number of coded bands for this frame (given by the table LOG2_FRAC_TABLE in rate.c). If
*a58d3d2aSXin Liintensity_rsv is greater than total then intensity_rsv is set to zero. Otherwise total is
*a58d3d2aSXin Lidecremented by intensity_rsv, and if total is still greater than 8, dual_stereo_rsv is
*a58d3d2aSXin Liset to 8 and total is decremented by dual_stereo_rsv.</t>
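<t>The reservation steps of the two preceding paragraphs can be sketched as a single function. This follows the prose literally and is not the reference code: the struct, the function name, and the parameters are hypothetical, and the intensity stereo cost is passed in by the caller instead of being read from LOG2_FRAC_TABLE.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

struct alloc_reserve {
    int total;              /* 1/8 bits left for the bands */
    int anti_collapse_rsv;
    int skip_rsv;
    int intensity_rsv;
    int dual_stereo_rsv;
};

/* frame_8th_bits: coded frame size in 1/8 bits
   tell_frac:      1/8 bits consumed so far (ec_tell_frac())
   intensity_cost: conservative log2 of the coded band count,
                   in 1/8 bits, supplied by the caller */
static struct alloc_reserve reserve_bits(int frame_8th_bits,
                                         int tell_frac,
                                         int is_transient, int LM,
                                         int is_stereo,
                                         int intensity_cost)
{
    struct alloc_reserve r = {0, 0, 0, 0, 0};
    assert(frame_8th_bits >= 0 && tell_frac >= 0);
    assert(intensity_cost >= 0);
    r.total = frame_8th_bits - tell_frac - 1;  /* stay conservative */
    if (is_transient && LM > 1 && r.total >= (LM + 2) * 8)
        r.anti_collapse_rsv = 8;
    r.total -= r.anti_collapse_rsv;
    if (r.total < 0)
        r.total = 0;
    if (r.total > 8)
        r.skip_rsv = 8;
    r.total -= r.skip_rsv;
    if (is_stereo && intensity_cost <= r.total) {
        r.intensity_rsv = intensity_cost;
        r.total -= r.intensity_rsv;
        if (r.total > 8) {
            r.dual_stereo_rsv = 8;
            r.total -= r.dual_stereo_rsv;
        }
    }
    return r;
}
```
]]></artwork>
</figure>
<t>For a 20 ms stereo transient frame with 1000 eighth bits, 99 of them already consumed, and an assumed intensity cost of 36, the sketch reserves 8, 8, 36, and 8 eighth bits in turn and leaves 840 for the bands.</t>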
*a58d3d2aSXin Li
<t>The allocation process then computes a vector representing the hard minimum allocation
*a58d3d2aSXin Liany band will receive for shape. This minimum is higher than the technical limit of the PVQ
*a58d3d2aSXin Liprocess, but very low rate allocations produce an excessively sparse spectrum and these bands
*a58d3d2aSXin Liare better served by having no allocation at all. For each coded band, set thresh[band] to
*a58d3d2aSXin Litwenty-four times the number of MDCT bins in the band and divide by 16. If 8 times the number
*a58d3d2aSXin Liof channels is greater, use that instead. This sets the minimum allocation to one bit per channel
or 24/128 bit per MDCT bin, whichever is greater. The band-size dependent part of this
*a58d3d2aSXin Livalue is not scaled by the channel count, because at the very low rates where this limit is
*a58d3d2aSXin Liapplicable there will usually be no bits allocated to the side.</t>
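<t>A minimal sketch of the per-band threshold, assuming thresh[] is held in 1/8 bit units like the rest of the allocation (the function name is hypothetical):</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Minimum shape allocation for a band, in 1/8 bit units.
   N is the number of MDCT bins in the band. Hypothetical helper. */
static int band_thresh(int N, int channels)
{
    int by_size = 24 * N / 16;
    int by_channels = 8 * channels;  /* one bit per channel */
    assert(N > 0 && (channels == 1 || channels == 2));
    return by_size > by_channels ? by_size : by_channels;
}
```
]]></artwork>
</figure>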
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The previously decoded allocation trim is used to derive a vector of per-band adjustments,
*a58d3d2aSXin Li'trim_offsets[]'. For each coded band take the alloc_trim and subtract 5 and LM. Then multiply
*a58d3d2aSXin Lithe result by the number of channels, the number of MDCT bins in the shortest frame size for this mode,
*a58d3d2aSXin Lithe number of remaining bands, 2**LM, and 8. Then divide this value by 64. Finally, if the
*a58d3d2aSXin Linumber of MDCT bins in the band per channel is only one, 8 times the number of channels is subtracted
*a58d3d2aSXin Liin order to diminish the allocation by one bit, because width 1 bands receive greater benefit
*a58d3d2aSXin Lifrom the coarse energy coding.</t>
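<t>The trim offset computation can be sketched as below. Several details are assumptions of this sketch and not statements about the reference code: the function names are hypothetical, "the number of remaining bands" is read as the count of coded bands after the current one, and the division by 64 is taken to round toward negative infinity.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Floor division by 64; written out because C's "/" truncates
   toward zero for negative values. The rounding direction is an
   assumption of this sketch. */
static int floor_div64(int v)
{
    return v >= 0 ? v / 64 : -((-v + 63) / 64);
}

/* Trim offset for one band, in 1/8 bit units.
   N0:   MDCT bins in the band at the shortest frame size
   band: band index; end: one past the last coded band */
static int trim_offset(int alloc_trim, int LM, int channels,
                       int N0, int band, int end)
{
    int remaining = end - band - 1;  /* bands after this one */
    int v;
    assert(alloc_trim >= 0 && alloc_trim <= 10);
    assert(band >= 0 && band < end && N0 > 0 && LM >= 0);
    v = (alloc_trim - 5 - LM) * channels * N0 * remaining
        * (1 << LM) * 8;
    v = floor_div64(v);
    if ((N0 << LM) == 1)             /* width-1 band: one bit less */
        v -= 8 * channels;
    return v;
}
```
]]></artwork>
</figure>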
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="PVQ-decoder" title="Shape Decoding">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIn each band, the normalized "shape" is encoded
*a58d3d2aSXin Liusing a vector quantization scheme called a "pyramid vector quantizer".
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>In
*a58d3d2aSXin Lithe simplest case, the number of bits allocated in
*a58d3d2aSXin Li<xref target="allocation"></xref> is converted to a number of pulses as described
*a58d3d2aSXin Liby <xref target="bits-pulses"></xref>. Knowing the number of pulses and the
*a58d3d2aSXin Linumber of samples in the band, the decoder calculates the size of the codebook
*a58d3d2aSXin Lias detailed in <xref target="cwrs-decoder"></xref>. The size is used to decode
*a58d3d2aSXin Lian unsigned integer (uniform probability model), which is the codeword index.
*a58d3d2aSXin LiThis index is converted into the corresponding vector as explained in
*a58d3d2aSXin Li<xref target="cwrs-decoder"></xref>. This vector is then scaled to unit norm.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="bits-pulses" title="Bits to Pulses">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAlthough the allocation is performed in 1/8th bit units, the quantization requires
*a58d3d2aSXin Lian integer number of pulses K. To do this, the encoder searches for the value
*a58d3d2aSXin Liof K that produces the number of bits nearest to the allocated value
*a58d3d2aSXin Li(rounding down if exactly halfway between two values), not to exceed
*a58d3d2aSXin Lithe total number of bits available. For efficiency reasons, the search is performed against a
*a58d3d2aSXin Liprecomputed allocation table which only permits some K values for each N. The number of
*a58d3d2aSXin Licodebook entries can be computed as explained in <xref target="cwrs-decoder"></xref>. The difference
*a58d3d2aSXin Libetween the number of bits allocated and the number of bits used is accumulated to a
*a58d3d2aSXin Li"balance" (initialized to zero) that helps adjust the
*a58d3d2aSXin Liallocation for the next bands. One third of the balance is applied to the
*a58d3d2aSXin Libit allocation of each band to help achieve the target allocation. The only
*a58d3d2aSXin Liexceptions are the band before the last and the last band, for which half the balance
*a58d3d2aSXin Liand the whole balance are applied, respectively.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="cwrs-decoder" title="PVQ Decoding">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiDecoding of PVQ vectors is implemented in decode_pulses() (cwrs.c).
*a58d3d2aSXin LiThe unique codeword index is decoded as a uniformly-distributed integer value between 0 and
*a58d3d2aSXin LiV(N,K)-1, where V(N,K) is the number of possible combinations of K pulses in
*a58d3d2aSXin LiN samples. The index is then converted to a vector in the same way specified in
*a58d3d2aSXin Li<xref target="PVQ"></xref>. The indexing is based on the calculation of V(N,K)
*a58d3d2aSXin Li(denoted N(L,K) in <xref target="PVQ"></xref>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Li The number of combinations can be computed recursively as
*a58d3d2aSXin LiV(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0, K != 0.
*a58d3d2aSXin LiThere are many different ways to compute V(N,K), including precomputed tables and direct
*a58d3d2aSXin Liuse of the recursive formulation. The reference implementation applies the recursive
*a58d3d2aSXin Liformulation one line (or column) at a time to save on memory use,
*a58d3d2aSXin Lialong with an alternate,
*a58d3d2aSXin Liunivariate recurrence to initialize an arbitrary line, and direct
*a58d3d2aSXin Lipolynomial solutions for small N. All of these methods are
*a58d3d2aSXin Liequivalent, and have different trade-offs in speed, memory usage, and
*a58d3d2aSXin Licode size. Implementations MAY use any methods they like, as long as
*a58d3d2aSXin Lithey are equivalent to the mathematical definition.
*a58d3d2aSXin Li</t>
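<t>As one illustration of the row-at-a-time approach, the sketch below evaluates V(N,K) from the recurrence while keeping a single row in memory. It is not the reference implementation; the function name and the PVQ_MAX_K limit are hypothetical, and no overflow check is made because callers are assumed to stay within codebooks that fit in 32 bits.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

#define PVQ_MAX_K 32  /* arbitrary limit for this sketch */

/* V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1),
   with V(N,0) = 1 and V(0,K) = 0 for K != 0. */
static uint32_t pvq_v(int N, int K)
{
    uint32_t row[PVQ_MAX_K + 1];
    int n, k;
    assert(N >= 0 && K >= 0 && K <= PVQ_MAX_K);
    row[0] = 1;                       /* V(0,0) = 1 */
    for (k = 1; k <= K; k++)
        row[k] = 0;                   /* V(0,k) = 0 for k != 0 */
    for (n = 1; n <= N; n++) {
        uint32_t diag = row[0];       /* V(n-1,k-1) */
        for (k = 1; k <= K; k++) {
            uint32_t up = row[k];     /* V(n-1,k) */
            row[k] = up + row[k - 1] + diag;
            diag = up;
        }
    }
    return row[K];
}
```
]]></artwork>
</figure>
<t>For instance, V(2,1) is 4 (one pulse of either sign in either of two positions) and V(3,2) is 18.</t>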
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoded vector X is recovered as follows.
*a58d3d2aSXin LiLet i be the index decoded with the procedure in <xref target="ec_dec_uint"/>
*a58d3d2aSXin Li with ft&nbsp;=&nbsp;V(N,K), so that 0&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;V(N,K).
*a58d3d2aSXin LiLet k&nbsp;=&nbsp;K.
*a58d3d2aSXin LiThen for j&nbsp;=&nbsp;0 to (N&nbsp;-&nbsp;1), inclusive, do:
*a58d3d2aSXin Li<list style="numbers">
*a58d3d2aSXin Li<t>Let p&nbsp;=&nbsp;(V(N-j-1,k)&nbsp;+&nbsp;V(N-j,k))/2.</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf i&nbsp;&lt;&nbsp;p, then let sgn&nbsp;=&nbsp;1, else let sgn&nbsp;=&nbsp;-1
*a58d3d2aSXin Li and set i&nbsp;=&nbsp;i&nbsp;-&nbsp;p.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>Let k0&nbsp;=&nbsp;k and set p&nbsp;=&nbsp;p&nbsp;-&nbsp;V(N-j-1,k).</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhile p&nbsp;&gt;&nbsp;i, set k&nbsp;=&nbsp;k&nbsp;-&nbsp;1 and
*a58d3d2aSXin Li p&nbsp;=&nbsp;p&nbsp;-&nbsp;V(N-j-1,k).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSet X[j]&nbsp;=&nbsp;sgn*(k0&nbsp;-&nbsp;k) and i&nbsp;=&nbsp;i&nbsp;-&nbsp;p.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
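<t>The five numbered steps translate almost directly into code. The sketch below is for illustration only: the function names are hypothetical, and V(N,K) is evaluated by direct recursion, which is far too slow for real use but keeps the example short.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Direct use of the recursive definition of V(N,K); slow, but
   adequate for the small sizes of this sketch. */
static int64_t pvq_count(int N, int K)
{
    if (K == 0) return 1;
    if (N == 0) return 0;
    return pvq_count(N - 1, K) + pvq_count(N, K - 1)
         + pvq_count(N - 1, K - 1);
}

/* Converts codeword index i (0 <= i < V(N,K)) into X[0..N-1],
   following the five numbered steps. Hypothetical helper. */
static void pvq_index_to_vector(int64_t i, int N, int K, int *X)
{
    int j, k = K;
    assert(N > 0 && K >= 0);
    assert(i >= 0 && i < pvq_count(N, K));
    for (j = 0; j < N; j++) {
        int sgn, k0;
        int64_t p = (pvq_count(N - j - 1, k)
                     + pvq_count(N - j, k)) / 2;      /* step 1 */
        if (i < p) {                                   /* step 2 */
            sgn = 1;
        } else {
            sgn = -1;
            i -= p;
        }
        k0 = k;                                        /* step 3 */
        p -= pvq_count(N - j - 1, k);
        while (p > i) {                                /* step 4 */
            k--;
            p -= pvq_count(N - j - 1, k);
        }
        X[j] = sgn * (k0 - k);                         /* step 5 */
        i -= p;
    }
}
```
]]></artwork>
</figure>
<t>With N=2 and K=1, the indices 0 through 3 decode to (1,0), (0,1), (0,-1), and (-1,0), respectively.</t>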
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoded vector X is then normalized such that its
*a58d3d2aSXin LiL2-norm equals one.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="spreading" title="Spreading">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe normalized vector decoded in <xref target="cwrs-decoder"/> is then rotated
*a58d3d2aSXin Lifor the purpose of avoiding tonal artifacts. The rotation gain is equal to
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lig_r = N / (N + f_r*K)
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Liwhere N is the number of dimensions, K is the number of pulses, and f_r depends on
*a58d3d2aSXin Lithe value of the "spread" parameter in the bit-stream.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="spread values" title="Spreading Values">
*a58d3d2aSXin Li<ttcol>Spread value</ttcol>
*a58d3d2aSXin Li<ttcol>f_r</ttcol>
*a58d3d2aSXin Li <c>0</c> <c>infinite (no rotation)</c>
*a58d3d2aSXin Li <c>1</c> <c>15</c>
*a58d3d2aSXin Li <c>2</c> <c>10</c>
*a58d3d2aSXin Li <c>3</c> <c>5</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe rotation angle is then calculated as
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li                 2
*a58d3d2aSXin Li        pi *  g_r
*a58d3d2aSXin Litheta = ----------
*a58d3d2aSXin Li            4
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiA 2-D rotation R(i,j) between points x_i and x_j is defined as:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lix_i' =  cos(theta)*x_i + sin(theta)*x_j
*a58d3d2aSXin Lix_j' = -sin(theta)*x_i + cos(theta)*x_j
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin LiAn N-D rotation is then achieved by applying a series of 2-D rotations back and forth, in the
following order: R(x_1, x_2), R(x_2, x_3), ..., R(x_N-2, x_N-1), R(x_N-1, x_N),
R(x_N-2, x_N-1), ..., R(x_1, x_2).
*a58d3d2aSXin Li</t>
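<t>A minimal sketch of the rotation as described above, using floating point and hypothetical function names. To keep the example free of library dependencies, the caller is assumed to pass c = cos(theta) and s = sin(theta); the interleaved pre-rotation and the per-block handling are not shown.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Rotation angle theta = pi * g_r^2 / 4 with g_r = N / (N + f_r*K).
   Spread value 0 (no rotation) must be handled by the caller. */
static double spread_theta(int N, int K, double f_r)
{
    const double pi = 3.14159265358979323846;
    double g_r;
    assert(N > 0 && K >= 0 && f_r > 0.0);
    g_r = (double)N / ((double)N + f_r * (double)K);
    return pi * g_r * g_r / 4.0;
}

/* 2-D rotation R(i,j) with c = cos(theta), s = sin(theta). */
static void rotate_pair(double *x, size_t i, size_t j,
                        double c, double s)
{
    double xi = x[i], xj = x[j];
    x[i] = c * xi + s * xj;
    x[j] = -s * xi + c * xj;
}

/* N-D rotation: forward over pairs (1,2)..(N-1,N), then back
   over (N-2,N-1)..(1,2), written here with 0-based indices. */
static void spread_rotate(double *x, size_t n, double c, double s)
{
    size_t i;
    if (n < 2)
        return;
    for (i = 0; i + 1 < n; i++)
        rotate_pair(x, i, i + 1, c, s);
    for (i = n - 2; i-- > 0; )
        rotate_pair(x, i, i + 1, c, s);
}
```
]]></artwork>
</figure>
<t>Because every step is a plane rotation, the L2-norm of the vector is unchanged, which is a convenient property to check in a test.</t>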
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf the decoded vector represents more
*a58d3d2aSXin Lithan one time block, then this spreading process is applied separately on each time block.
*a58d3d2aSXin LiAlso, if each block represents 8 samples or more, then another N-D rotation, by
*a58d3d2aSXin Li(pi/2-theta), is applied <spanx style="emph">before</spanx> the rotation described above. This
*a58d3d2aSXin Liextra rotation is applied in an interleaved manner with a stride equal to round(sqrt(N/nb_blocks)),
i.e., it is applied independently for each set of samples S_k = {stride*n + k}, n=0..N/stride-1.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="split" title="Split decoding">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTo avoid the need for multi-precision calculations when decoding PVQ codevectors,
*a58d3d2aSXin Lithe maximum size allowed for codebooks is 32 bits. When larger codebooks are
needed, the vector is instead split into two sub-vectors of size N/2.
*a58d3d2aSXin LiA quantized gain parameter with precision
*a58d3d2aSXin Liderived from the current allocation is entropy coded to represent the relative
*a58d3d2aSXin Ligains of each side of the split, and the entire decoding process is recursively
*a58d3d2aSXin Liapplied. Multiple levels of splitting may be applied up to a limit of LM+1 splits.
*a58d3d2aSXin LiThe same recursive mechanism is applied for the joint coding
*a58d3d2aSXin Liof stereo audio.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="tf-change" title="Time-Frequency change">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe time-frequency (TF) parameters are used to control the time-frequency resolution tradeoff
*a58d3d2aSXin Liin each coded band. For each band, there are two possible TF choices. For the first
*a58d3d2aSXin Liband coded, the PDF is {3, 1}/4 for frames marked as transient and {15, 1}/16 for
*a58d3d2aSXin Lithe other frames. For subsequent bands, the TF choice is coded relative to the
previous TF choice with probability {15, 1}/16 for transient frames and {31, 1}/32
*a58d3d2aSXin Liotherwise. The mapping between the decoded TF choices and the adjustment in TF
*a58d3d2aSXin Liresolution is shown in the tables below.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor='tf_00'
*a58d3d2aSXin Li title="TF Adjustments for Non-transient Frames and tf_select=0">
*a58d3d2aSXin Li<ttcol align='center'>Frame size (ms)</ttcol>
*a58d3d2aSXin Li<ttcol align='center'>0</ttcol>
*a58d3d2aSXin Li<ttcol align='center'>1</ttcol>
*a58d3d2aSXin Li<c>2.5</c>      <c>0</c> <c>-1</c>
*a58d3d2aSXin Li<c>5</c>      <c>0</c> <c>-1</c>
*a58d3d2aSXin Li<c>10</c>      <c>0</c> <c>-2</c>
*a58d3d2aSXin Li<c>20</c>      <c>0</c> <c>-2</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor='tf_01'
*a58d3d2aSXin Li title="TF Adjustments for Non-transient Frames and tf_select=1">
*a58d3d2aSXin Li<ttcol align='center'>Frame size (ms)</ttcol>
*a58d3d2aSXin Li<ttcol align='center'>0</ttcol>
*a58d3d2aSXin Li<ttcol align='center'>1</ttcol>
*a58d3d2aSXin Li<c>2.5</c>      <c>0</c> <c>-1</c>
*a58d3d2aSXin Li<c>5</c>      <c>0</c> <c>-2</c>
*a58d3d2aSXin Li<c>10</c>      <c>0</c> <c>-3</c>
*a58d3d2aSXin Li<c>20</c>      <c>0</c> <c>-3</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor='tf_10'
*a58d3d2aSXin Li title="TF Adjustments for Transient Frames and tf_select=0">
*a58d3d2aSXin Li<ttcol align='center'>Frame size (ms)</ttcol>
*a58d3d2aSXin Li<ttcol align='center'>0</ttcol>
*a58d3d2aSXin Li<ttcol align='center'>1</ttcol>
*a58d3d2aSXin Li<c>2.5</c>      <c>0</c> <c>-1</c>
*a58d3d2aSXin Li<c>5</c>      <c>1</c> <c>0</c>
*a58d3d2aSXin Li<c>10</c>      <c>2</c> <c>0</c>
*a58d3d2aSXin Li<c>20</c>      <c>3</c> <c>0</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor='tf_11'
*a58d3d2aSXin Li title="TF Adjustments for Transient Frames and tf_select=1">
*a58d3d2aSXin Li<ttcol align='center'>Frame size (ms)</ttcol>
*a58d3d2aSXin Li<ttcol align='center'>0</ttcol>
*a58d3d2aSXin Li<ttcol align='center'>1</ttcol>
*a58d3d2aSXin Li<c>2.5</c>      <c>0</c> <c>-1</c>
*a58d3d2aSXin Li<c>5</c>      <c>1</c> <c>-1</c>
*a58d3d2aSXin Li<c>10</c>      <c>1</c> <c>-1</c>
*a58d3d2aSXin Li<c>20</c>      <c>1</c> <c>-1</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiA negative TF adjustment means that the temporal resolution is increased,
*a58d3d2aSXin Liwhile a positive TF adjustment means that the frequency resolution is increased.
*a58d3d2aSXin LiChanges in TF resolution are implemented using the Hadamard transform <xref target="Hadamard"/>. To increase
*a58d3d2aSXin Lithe time resolution by N, N "levels" of the Hadamard transform are applied to the
decoded vector for each interleaved MDCT vector. To increase the frequency resolution
(which assumes a transient frame), N levels of the Hadamard transform are applied
*a58d3d2aSXin Li<spanx style="emph">across</spanx> the interleaved MDCT vector. In the case of increased
*a58d3d2aSXin Litime resolution the decoder uses the "sequency order" because the input vector
*a58d3d2aSXin Liis sorted in time.
*a58d3d2aSXin Li</t>
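<t>The four TF adjustment tables of this section can be folded into a single lookup, sketched below. The array layout and the names are hypothetical, and LM values 0 through 3 are assumed to correspond to the 2.5, 5, 10, and 20 ms frame sizes.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* TF adjustment indexed by [LM][transient][tf_select][tf_choice],
   transcribed from the four TF adjustment tables. */
static const signed char tf_adjust_table[4][2][2][2] = {
    /* 2.5 ms */ { { {0, -1}, {0, -1} }, { {0, -1}, {0, -1} } },
    /* 5 ms   */ { { {0, -1}, {0, -2} }, { {1,  0}, {1, -1} } },
    /* 10 ms  */ { { {0, -2}, {0, -3} }, { {2,  0}, {1, -1} } },
    /* 20 ms  */ { { {0, -2}, {0, -3} }, { {3,  0}, {1, -1} } }
};

static int tf_adjustment(int LM, int transient, int tf_select,
                         int tf_choice)
{
    assert(LM >= 0 && LM <= 3);
    assert(transient == 0 || transient == 1);
    assert(tf_select == 0 || tf_select == 1);
    assert(tf_choice == 0 || tf_choice == 1);
    return tf_adjust_table[LM][transient][tf_select][tf_choice];
}
```
]]></artwork>
</figure>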
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="anti-collapse" title="Anti-Collapse Processing">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe anti-collapse feature is designed to avoid the situation where the use of multiple
*a58d3d2aSXin Lishort MDCTs causes the energy in one or more of the MDCTs to be zero for
*a58d3d2aSXin Lisome bands, causing unpleasant artifacts.
*a58d3d2aSXin LiWhen the frame has the transient bit set, an anti-collapse bit is decoded.
*a58d3d2aSXin LiWhen anti-collapse is set, the energy in each small MDCT is prevented
*a58d3d2aSXin Lifrom collapsing to zero. For each band of each MDCT where a collapse is
*a58d3d2aSXin Lidetected, a pseudo-random signal is inserted with an energy corresponding
*a58d3d2aSXin Lito the minimum energy over the two previous frames. A renormalization step is
*a58d3d2aSXin Lithen required to ensure that the anti-collapse step did not alter the
*a58d3d2aSXin Lienergy preservation property.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="denormalization" title="Denormalization">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiJust as each band was normalized in the encoder, the last step of the decoder before
*a58d3d2aSXin Lithe inverse MDCT is to denormalize the bands. Each decoded normalized band is
*a58d3d2aSXin Limultiplied by the square root of the decoded energy. This is done by denormalise_bands()
*a58d3d2aSXin Li(bands.c).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="inverse-mdct" title="Inverse MDCT">
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The inverse MDCT implementation has no special characteristics. The
*a58d3d2aSXin Liinput is N frequency-domain samples and the output is 2*N time-domain
*a58d3d2aSXin Lisamples, while scaling by 1/2. A "low-overlap" window reduces the algorithmic delay.
*a58d3d2aSXin LiIt is derived from a basic (full overlap) 240-sample version of the window used by the Vorbis codec:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li                                      2
*a58d3d2aSXin Li       /   /pi      /pi   n + 1/2\ \ \
*a58d3d2aSXin LiW(n) = |sin|-- * sin|-- * -------| | | .
*a58d3d2aSXin Li       \   \2       \2       L   / / /
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThe low-overlap window is created by zero-padding the basic window and inserting ones in the
*a58d3d2aSXin Limiddle, such that the resulting window still satisfies power complementarity <xref target='Princen86'/>.
*a58d3d2aSXin LiThe IMDCT and
*a58d3d2aSXin Liwindowing are performed by mdct_backward (mdct.c).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="post-filter" title="Post-filter">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe output of the inverse MDCT (after weighted overlap-add) is sent to the
*a58d3d2aSXin Lipost-filter. Although the post-filter is applied at the end, the post-filter
*a58d3d2aSXin Liparameters are encoded at the beginning, just after the silence flag.
*a58d3d2aSXin LiThe post-filter can be switched on or off using one bit (logp=1).
If the post-filter is enabled, then the octave is decoded as an integer value
between 0 and 5 of uniform probability. Once the octave is known, the fine pitch
within the octave is decoded using 4+octave raw bits. The final pitch period
is equal to (16&lt;&lt;octave)+fine_pitch-1 so it is bounded between 15 and 1022,
inclusive. Next, the gain is decoded as three raw bits and is equal to
*a58d3d2aSXin LiG=3*(int_gain+1)/32. The set of post-filter taps is decoded last, using
*a58d3d2aSXin Lia pdf equal to {2, 1, 1}/4. Tapset zero corresponds to the filter coefficients
*a58d3d2aSXin Lig0 = 0.3066406250, g1 = 0.2170410156, g2 = 0.1296386719. Tapset one
*a58d3d2aSXin Licorresponds to the filter coefficients g0 = 0.4638671875, g1 = 0.2680664062,
*a58d3d2aSXin Lig2 = 0, and tapset two uses filter coefficients g0 = 0.7998046875,
*a58d3d2aSXin Lig1 = 0.1000976562, g2 = 0.
*a58d3d2aSXin Li</t>
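<t>The mapping from the decoded symbols to the pitch period and gain can be sketched as follows. The function names are hypothetical and the entropy decoding itself is not shown. The stated bounds of 15 and 1022 are reached at octave 0 with a fine pitch of 0, and at octave 5 with a fine pitch of 511.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Pitch period from the decoded octave and fine pitch
   (fine_pitch is read from 4+octave raw bits). */
static int postfilter_period(int octave, int fine_pitch)
{
    assert(octave >= 0);
    assert(fine_pitch >= 0 && fine_pitch < (1 << (4 + octave)));
    return (16 << octave) + fine_pitch - 1;
}

/* Gain G = 3*(int_gain+1)/32 from the three raw gain bits. */
static double postfilter_gain(int int_gain)
{
    assert(int_gain >= 0 && int_gain <= 7);
    return 3.0 * (int_gain + 1) / 32.0;
}
```
]]></artwork>
</figure>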
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe post-filter response is thus computed as:
*a58d3d2aSXin Li              <figure align="center">
*a58d3d2aSXin Li                <artwork align="center">
*a58d3d2aSXin Li                  <![CDATA[
   y(n) = x(n) + G*(g0*y(n-T) + g1*(y(n-T+1)+y(n-T-1))
                              + g2*(y(n-T+2)+y(n-T-2)))
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li                </artwork>
*a58d3d2aSXin Li              </figure>
*a58d3d2aSXin Li
*a58d3d2aSXin LiDuring a transition between different gains, a smooth transition is calculated
*a58d3d2aSXin Liusing the square of the MDCT window. It is important that values of y(n) be
*a58d3d2aSXin Liinterpolated one at a time such that the past value of y(n) used is interpolated.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="deemphasis" title="De-emphasis">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter the post-filter,
*a58d3d2aSXin Lithe signal is de-emphasized using the inverse of the pre-emphasis filter
*a58d3d2aSXin Liused in the encoder:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 1            1
*a58d3d2aSXin Li---- = --------------- ,
*a58d3d2aSXin LiA(z)                -1
*a58d3d2aSXin Li       1 - alpha_p*z
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liwhere alpha_p=0.8500061035.
*a58d3d2aSXin Li</t>
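<t>
In the time domain, the transfer function above corresponds to the one-pole
recursion y(n) = x(n) + alpha_p*y(n-1). A minimal floating-point sketch
follows; the function name and the explicit memory argument are hypothetical.
</t>
<figure>
<artwork><![CDATA[
```c
#define ALPHA_P 0.8500061035f

/* mem carries y(n-1) across calls; set it to 0 at the start of a stream. */
static void deemphasis(const float *x, float *y, int n, float *mem)
{
    float prev = *mem;
    int i;
    for (i = 0; i < n; i++) {
        prev = x[i] + ALPHA_P * prev;  /* y(n) = x(n) + alpha_p*y(n-1) */
        y[i] = prev;
    }
    *mem = prev;
}
```
]]></artwork>
</figure>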
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="Packet Loss Concealment" title="Packet Loss Concealment (PLC)">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiPacket loss concealment (PLC) is an optional decoder-side feature that
*a58d3d2aSXin LiSHOULD be included when receiving from an unreliable channel. Because
*a58d3d2aSXin LiPLC is not part of the bitstream, there are many acceptable ways to
*a58d3d2aSXin Liimplement PLC with different complexity/quality trade-offs.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
The PLC in
the reference implementation depends on the mode of the last packet received.
*a58d3d2aSXin LiIn CELT mode, the PLC finds a periodicity in the decoded
*a58d3d2aSXin Lisignal and repeats the windowed waveform using the pitch offset. The windowed
*a58d3d2aSXin Liwaveform is overlapped in such a way as to preserve the time-domain aliasing
*a58d3d2aSXin Licancellation with the previous frame and the next frame. This is implemented
in celt_decode_lost() (celt.c).  In SILK mode, the PLC uses LPC extrapolation
*a58d3d2aSXin Lifrom the previous frame, implemented in silk_PLC() (PLC.c).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="clock-drift" title="Clock Drift Compensation">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiClock drift refers to the gradual desynchronization of two endpoints
*a58d3d2aSXin Liwhose sample clocks run at different frequencies while they are streaming
*a58d3d2aSXin Lilive audio.  Differences in clock frequencies are generally attributable to
*a58d3d2aSXin Limanufacturing variation in the endpoints' clock hardware.  For long-lived
*a58d3d2aSXin Listreams, the time difference between sender and receiver can grow without
*a58d3d2aSXin Libound.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen the sender's clock runs slower than the receiver's, the effect is similar
*a58d3d2aSXin Lito packet loss: too few packets are received.  The receiver can distinguish
*a58d3d2aSXin Libetween drift and loss if the transport provides packet timestamps.  A receiver
*a58d3d2aSXin Lifor live streams SHOULD conceal the effects of drift, and MAY do so by invoking
*a58d3d2aSXin Lithe PLC.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen the sender's clock runs faster than the receiver's, too many packets will
*a58d3d2aSXin Libe received.  The receiver MAY respond by skipping any packet (i.e., not
*a58d3d2aSXin Lisubmitting the packet for decoding).  This is likely to produce a less severe
*a58d3d2aSXin Liartifact than if the frame were dropped after decoding.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiA decoder MAY employ a more sophisticated drift compensation method. For
*a58d3d2aSXin Liexample, the
*a58d3d2aSXin Li<xref target='Google-NetEQ'>NetEQ component</xref>
*a58d3d2aSXin Liof the
*a58d3d2aSXin Li<xref target='Google-WebRTC'>Google WebRTC codebase</xref>
*a58d3d2aSXin Licompensates for drift by adding or removing
*a58d3d2aSXin Lione period when the signal is highly periodic. The reference implementation of
*a58d3d2aSXin LiOpus allows a caller to learn whether the current frame's signal is highly
*a58d3d2aSXin Liperiodic, and if so what the period is, using the OPUS_GET_PITCH() request.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="switching" title="Configuration Switching">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSwitching between the Opus coding modes, audio bandwidths, and channel counts
*a58d3d2aSXin Li requires careful consideration to avoid audible glitches.
*a58d3d2aSXin LiSwitching between any two configurations of the CELT-only mode, any two
*a58d3d2aSXin Li configurations of the Hybrid mode, or from WB SILK to Hybrid mode does not
*a58d3d2aSXin Li require any special treatment in the decoder, as the MDCT overlap will smooth
*a58d3d2aSXin Li the transition.
*a58d3d2aSXin LiSwitching from Hybrid mode to WB SILK requires adding in the final contents
*a58d3d2aSXin Li of the CELT overlap buffer to the first SILK-only packet.
*a58d3d2aSXin LiThis can be done by decoding a 2.5&nbsp;ms silence frame with the CELT decoder
*a58d3d2aSXin Li using the channel count of the SILK-only packet (and any choice of audio
*a58d3d2aSXin Li bandwidth), which will correctly handle the cases when the channel count
*a58d3d2aSXin Li changes as well.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen changing the channel count for SILK-only or Hybrid packets, the encoder
*a58d3d2aSXin Li can avoid glitches by smoothly varying the stereo width of the input signal
*a58d3d2aSXin Li before or after the transition, and SHOULD do so.
*a58d3d2aSXin LiHowever, other transitions between SILK-only packets or between NB or MB SILK
*a58d3d2aSXin Li and Hybrid packets may cause glitches, because neither the LSF coefficients
*a58d3d2aSXin Li nor the LTP, LPC, stereo unmixing, and resampler buffers are available at the
*a58d3d2aSXin Li new sample rate.
*a58d3d2aSXin LiThese switches SHOULD be delayed by the encoder until quiet periods or
*a58d3d2aSXin Li transients, where the inevitable glitches will be less audible. Additionally,
*a58d3d2aSXin Li the bit-stream MAY include redundant side information ("redundancy"), in the
*a58d3d2aSXin Li form of additional CELT frames embedded in each of the Opus frames around the
*a58d3d2aSXin Li transition.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe other transitions that cannot be easily handled are those where the lower
*a58d3d2aSXin Li frequencies switch between the SILK LP-based model and the CELT MDCT model.
*a58d3d2aSXin LiHowever, an encoder may not have an opportunity to delay such a switch to a
*a58d3d2aSXin Li convenient point.
*a58d3d2aSXin LiFor example, if the content switches from speech to music, and the encoder does
*a58d3d2aSXin Li not have enough latency in its analysis to detect this in advance, there may
*a58d3d2aSXin Li be no convenient silence period during which to make the transition for quite
*a58d3d2aSXin Li some time.
*a58d3d2aSXin LiTo avoid or reduce glitches during these problematic mode transitions, and
*a58d3d2aSXin Li also between audio bandwidth changes in the SILK-only modes, transitions MAY
*a58d3d2aSXin Li include redundant side information ("redundancy"), in the form of an
*a58d3d2aSXin Li additional CELT frame embedded in the Opus frame.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiA transition between coding the lower frequencies with the LP model and the
*a58d3d2aSXin Li MDCT model or a transition that involves changing the SILK bandwidth
*a58d3d2aSXin Li is only normatively specified when it includes redundancy.
*a58d3d2aSXin LiFor those without redundancy, it is RECOMMENDED that the decoder use a
*a58d3d2aSXin Li concealment technique (e.g., make use of a PLC algorithm) to "fill in" the
*a58d3d2aSXin Li gap or discontinuity caused by the mode transition.
*a58d3d2aSXin LiTherefore, PLC MUST NOT be applied during any normative transition, i.e., when
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>A packet includes redundancy for this transition (as described below),</t>
*a58d3d2aSXin Li<t>The transition is between any WB SILK packet and any Hybrid packet, or vice
*a58d3d2aSXin Li versa,</t>
*a58d3d2aSXin Li<t>The transition is between any two Hybrid mode packets, or</t>
*a58d3d2aSXin Li<t>The transition is between any two CELT mode packets,</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li unless there is actual packet loss.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="side-info" title="Transition Side Information (Redundancy)">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTransitions with side information include an extra 5&nbsp;ms "redundant" CELT
*a58d3d2aSXin Li frame within the Opus frame.
*a58d3d2aSXin LiThis frame is designed to fill in the gap or discontinuity in the different
*a58d3d2aSXin Li layers without requiring the decoder to conceal it.
*a58d3d2aSXin LiFor transitions from CELT-only to SILK-only or Hybrid, the redundant frame is
*a58d3d2aSXin Li inserted in the first Opus frame after the transition (i.e., the first
*a58d3d2aSXin Li SILK-only or Hybrid frame).
*a58d3d2aSXin LiFor transitions from SILK-only or Hybrid to CELT-only, the redundant frame is
*a58d3d2aSXin Li inserted in the last Opus frame before the transition (i.e., the last
*a58d3d2aSXin Li SILK-only or Hybrid frame).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="opus_redundancy_flag" title="Redundancy Flag">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe presence of redundancy is signaled in all SILK-only and Hybrid frames, not
*a58d3d2aSXin Li just those involved in a mode transition.
*a58d3d2aSXin LiThis allows the frames to be decoded correctly even if an adjacent frame is
*a58d3d2aSXin Li lost.
For SILK-only frames, this signaling is implicit, based on the size of the
 Opus frame and the number of bits consumed decoding the SILK portion of
 it.
*a58d3d2aSXin LiAfter decoding the SILK portion of the Opus frame, the decoder uses ec_tell()
*a58d3d2aSXin Li (see <xref target="ec_tell"/>) to check if there are at least 17 bits
*a58d3d2aSXin Li remaining.
*a58d3d2aSXin LiIf so, then the frame contains redundancy.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor Hybrid frames, this signaling is explicit.
After decoding the SILK portion of the Opus frame, the decoder uses ec_tell()
 (see <xref target="ec_tell"/>) to check whether there are at least 37 bits
 remaining.
*a58d3d2aSXin LiIf so, it reads a symbol with the PDF in
*a58d3d2aSXin Li <xref target="opus_redundancy_flag_pdf"/>, and if the value is 1, then the
*a58d3d2aSXin Li frame contains redundancy.
*a58d3d2aSXin LiOtherwise (if there were fewer than 37 bits left or the value was 0), the frame
*a58d3d2aSXin Li does not contain redundancy.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="opus_redundancy_flag_pdf" title="Redundancy Flag PDF">
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>{4095, 1}/4096</c>
*a58d3d2aSXin Li</texttable>
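<t>
The two space checks can be sketched in C as follows. The function names are
hypothetical; frame_bits is assumed to be eight times the size of the Opus
frame in bytes, and tell is assumed to be the value reported by ec_tell()
after the SILK portion has been decoded.
</t>
<figure>
<artwork><![CDATA[
```c
/* SILK-only frames: redundancy is signaled implicitly by leftover space. */
static int silk_frame_has_redundancy(int frame_bits, int tell)
{
    return frame_bits - tell >= 17;
}

/* Hybrid frames: the flag symbol is only coded when enough bits remain.
 * When this returns 0, the frame has no redundancy and no symbol is read. */
static int hybrid_flag_is_coded(int frame_bits, int tell)
{
    return frame_bits - tell >= 37;
}

/* Hybrid frames: combine the space check with the decoded flag value.
 * flag_symbol is the symbol read with the {4095, 1}/4096 PDF; it is
 * ignored when the flag is not coded. */
static int hybrid_frame_has_redundancy(int frame_bits, int tell,
                                       int flag_symbol)
{
    return hybrid_flag_is_coded(frame_bits, tell) && flag_symbol == 1;
}
```
]]></artwork>
</figure>
<t>
A caller would first test hybrid_flag_is_coded() and decode the flag symbol
only when it returns nonzero, so that the range decoder state is not advanced
for frames that are too short to carry the flag.
</t>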
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="opus_redundancy_pos" title="Redundancy Position Flag">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSince the current frame is a SILK-only or a Hybrid frame, it must be at least
*a58d3d2aSXin Li 10&nbsp;ms.
*a58d3d2aSXin LiTherefore, it needs an additional flag to indicate whether the redundant
*a58d3d2aSXin Li 5&nbsp;ms CELT frame should be mixed into the beginning of the current frame,
*a58d3d2aSXin Li or the end.
*a58d3d2aSXin LiAfter determining that a frame contains redundancy, the decoder reads a
*a58d3d2aSXin Li 1&nbsp;bit symbol with a uniform PDF
*a58d3d2aSXin Li (<xref target="opus_redundancy_pos_pdf"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="opus_redundancy_pos_pdf" title="Redundancy Position PDF">
*a58d3d2aSXin Li<ttcol>PDF</ttcol>
*a58d3d2aSXin Li<c>{1, 1}/2</c>
*a58d3d2aSXin Li</texttable>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf the value is zero, this is the first frame in the transition, and the
*a58d3d2aSXin Li redundancy belongs at the end.
*a58d3d2aSXin LiIf the value is one, this is the second frame in the transition, and the
*a58d3d2aSXin Li redundancy belongs at the beginning.
*a58d3d2aSXin LiThere is no way to specify that an Opus frame contains separate redundant CELT
*a58d3d2aSXin Li frames at both the beginning and the end.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="opus_redundancy_size" title="Redundancy Size">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiUnlike the CELT portion of a Hybrid frame, the redundant CELT frame does not
*a58d3d2aSXin Li use the same entropy coder state as the rest of the Opus frame, because this
*a58d3d2aSXin Li would break the CELT bit allocation mechanism in Hybrid frames.
*a58d3d2aSXin LiThus, a redundant CELT frame always starts and ends on a byte boundary, even in
*a58d3d2aSXin Li SILK-only frames, where this is not strictly necessary.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor SILK-only frames, the number of bytes in the redundant CELT frame is simply
*a58d3d2aSXin Li the number of whole bytes remaining, which must be at least 2, due to the
*a58d3d2aSXin Li space check in <xref target="opus_redundancy_flag"/>.
*a58d3d2aSXin LiFor Hybrid frames, the number of bytes is equal to 2, plus a decoded unsigned
*a58d3d2aSXin Li integer less than 256 (see <xref target="ec_dec_uint"/>).
*a58d3d2aSXin LiThis may be more than the number of whole bytes remaining in the Opus frame,
*a58d3d2aSXin Li in which case the frame is invalid.
*a58d3d2aSXin LiHowever, a decoder is not required to ignore the entire frame, as this may be
*a58d3d2aSXin Li the result of a bit error that desynchronized the range coder.
*a58d3d2aSXin LiThere may still be useful data before the error, and a decoder MAY keep any
*a58d3d2aSXin Li audio decoded so far instead of invoking the PLC, but it is RECOMMENDED that
*a58d3d2aSXin Li the decoder stop decoding and discard the rest of the current Opus frame.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIt would have been possible to avoid these invalid states in the design of Opus
*a58d3d2aSXin Li by limiting the range of the explicit length decoded from Hybrid frames by the
*a58d3d2aSXin Li actual number of whole bytes remaining.
*a58d3d2aSXin LiHowever, this would require an encoder to determine the rate allocation for the
*a58d3d2aSXin Li MDCT layer up front, before it began encoding that layer.
*a58d3d2aSXin LiBy allowing some invalid sizes, the encoder is able to defer that decision
*a58d3d2aSXin Li until much later.
*a58d3d2aSXin LiWhen encoding Hybrid frames which do not include redundancy, the encoder must
*a58d3d2aSXin Li still decide up-front if it wishes to use the minimum 37 bits required to
*a58d3d2aSXin Li trigger encoding of the redundancy flag, but this is a much looser
*a58d3d2aSXin Li restriction.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter determining the size of the redundant CELT frame, the decoder reduces
*a58d3d2aSXin Li the size of the buffer currently in use by the range coder by that amount.
The CELT layer reads any raw bits from the end of this reduced buffer, and all
*a58d3d2aSXin Li calculations of the number of bits remaining in the buffer must be done using
*a58d3d2aSXin Li this new, reduced size, rather than the original size of the Opus frame.
*a58d3d2aSXin Li</t>
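<t>
The size computation and the buffer reduction can be sketched together in C.
The function name and argument layout are hypothetical. The sketch assumes
that the number of whole bytes remaining is the frame size minus tell rounded
up to a byte, where tell is the value reported by ec_tell() after all
redundancy signaling for the frame has been read.
</t>
<figure>
<artwork><![CDATA[
```c
/*
 * frame_bytes: size of the buffer currently used by the range decoder.
 * tell:        bits consumed so far, as reported by ec_tell().
 * is_hybrid:   nonzero for Hybrid frames, zero for SILK-only frames.
 * coded_size:  value in [0, 255] decoded with ec_dec_uint(); ignored for
 *              SILK-only frames.
 * new_bytes:   receives the reduced buffer size on success.
 * Returns the size of the redundant CELT frame in bytes, or -1 if the
 * coded size exceeds the whole bytes remaining (an invalid frame).
 */
static int redundancy_size(int frame_bytes, int tell, int is_hybrid,
                           int coded_size, int *new_bytes)
{
    int used_bytes = (tell + 7) >> 3;  /* bytes already touched */
    int remaining = frame_bytes - used_bytes;
    int size = is_hybrid ? 2 + coded_size : remaining;
    if (size > remaining)
        return -1;
    *new_bytes = frame_bytes - size;
    return size;
}
```
]]></artwork>
</figure>
<t>
What to do with a return value of -1 is left to the caller; as noted above, a
decoder may keep the audio decoded so far or stop and discard the rest of the
frame.
</t>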
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="opus_redundancy_decoding" title="Decoding the Redundancy">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe redundant frame is decoded like any other CELT-only frame, with the
*a58d3d2aSXin Li exception that it does not contain a TOC byte.
*a58d3d2aSXin LiThe frame size is fixed at 5&nbsp;ms, the channel count is set to that of the
*a58d3d2aSXin Li current frame, and the audio bandwidth is also set to that of the current
*a58d3d2aSXin Li frame, with the exception that for MB SILK frames, it is set to WB.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf the redundancy belongs at the beginning (in a CELT-only to SILK-only or
*a58d3d2aSXin Li Hybrid transition), the final reconstructed output uses the first 2.5&nbsp;ms
*a58d3d2aSXin Li of audio output by the decoder for the redundant frame as-is, discarding
*a58d3d2aSXin Li the corresponding output from the SILK-only or Hybrid portion of the frame.
*a58d3d2aSXin LiThe remaining 2.5&nbsp;ms is cross-lapped with the decoded SILK/Hybrid signal
*a58d3d2aSXin Li using the CELT's power-complementary MDCT window to ensure a smooth
*a58d3d2aSXin Li transition.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf the redundancy belongs at the end (in a SILK-only or Hybrid to CELT-only
*a58d3d2aSXin Li transition), only the second half (2.5&nbsp;ms) of the audio output by the
*a58d3d2aSXin Li decoder for the redundant frame is used.
*a58d3d2aSXin LiIn that case, the second half of the redundant frame is cross-lapped with the
*a58d3d2aSXin Li end of the SILK/Hybrid signal, again using CELT's power-complementary MDCT
*a58d3d2aSXin Li window to ensure a smooth transition.
*a58d3d2aSXin Li</t>
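<t>
One way to realize such a cross-lap is sketched below in C. It assumes that
the fade weights are the squares of the rising half of the window, so that the
weights of the two signals sum to one at every sample; this choice, the
function name, and the argument layout are assumptions of the sketch and are
not specified in the text above.
</t>
<figure>
<artwork><![CDATA[
```c
/*
 * Cross-fade from `from` to `to` over n samples (n = 2.5 ms of audio).
 * window[i] rises from about 0 to about 1 and is assumed to be
 * power-complementary: window[i]^2 + window[n-1-i]^2 == 1.
 */
static void cross_lap(const float *from, const float *to, float *out,
                      const float *window, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        float w = window[i] * window[i];
        out[i] = w * to[i] + (1.0f - w) * from[i];
    }
}
```
]]></artwork>
</figure>
<t>
For redundancy at the beginning of a frame, `from` would be the redundant
CELT output and `to` the SILK or Hybrid output; for redundancy at the end, the
roles are swapped.
</t>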
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="decoder-reset" title="State Reset">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen a transition occurs, the state of the SILK or the CELT decoder (or both)
*a58d3d2aSXin Li may need to be reset before decoding a frame in the new mode.
*a58d3d2aSXin LiThis avoids reusing "out of date" memory, which may not have been updated in
*a58d3d2aSXin Li some time or may not be in a well-defined state due to, e.g., PLC.
*a58d3d2aSXin LiThe SILK state is reset before every SILK-only or Hybrid frame where the
*a58d3d2aSXin Li previous frame was CELT-only.
*a58d3d2aSXin LiThe CELT state is reset every time the operating mode changes and the new mode
*a58d3d2aSXin Li is either Hybrid or CELT-only, except when the transition uses redundancy as
*a58d3d2aSXin Li described above.
*a58d3d2aSXin LiWhen switching from SILK-only or Hybrid to CELT-only with redundancy, the CELT
*a58d3d2aSXin Li state is reset before decoding the redundant CELT frame embedded in the
*a58d3d2aSXin Li SILK-only or Hybrid frame, but it is not reset before decoding the following
*a58d3d2aSXin Li CELT-only frame.
*a58d3d2aSXin LiWhen switching from CELT-only mode to SILK-only or Hybrid mode with redundancy,
*a58d3d2aSXin Li the CELT decoder is not reset for decoding the redundant CELT frame.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Summary of Transitions">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Li<xref target="normative_transitions"/> illustrates all of the normative
*a58d3d2aSXin Li transitions involving a mode change, an audio bandwidth change, or both.
*a58d3d2aSXin LiEach one uses an S, H, or C to represent an Opus frame in the corresponding
*a58d3d2aSXin Li mode.
*a58d3d2aSXin LiIn addition, an R indicates the presence of redundancy in the Opus frame it is
*a58d3d2aSXin Li cross-lapped with.
*a58d3d2aSXin LiIts location in the first or last 5&nbsp;ms is assumed to correspond to whether
*a58d3d2aSXin Li it is the frame before or after the transition.
*a58d3d2aSXin LiOther uses of redundancy are non-normative.
*a58d3d2aSXin LiFinally, a c indicates the contents of the CELT overlap buffer after the
*a58d3d2aSXin Li previously decoded frame (i.e., as extracted by decoding a silence frame).
*a58d3d2aSXin Li<figure align="center" anchor="normative_transitions"
*a58d3d2aSXin Li title="Normative Transitions">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin LiSILK to SILK with Redundancy:             S -> S -> S
*a58d3d2aSXin Li                                                    &
*a58d3d2aSXin Li                                                   !R -> R
*a58d3d2aSXin Li                                                         &
*a58d3d2aSXin Li                                                        ;S -> S -> S
*a58d3d2aSXin Li
*a58d3d2aSXin LiNB or MB SILK to Hybrid with Redundancy:  S -> S -> S
*a58d3d2aSXin Li                                                    &
*a58d3d2aSXin Li                                                   !R ->;H -> H -> H
*a58d3d2aSXin Li
*a58d3d2aSXin LiWB SILK to Hybrid:                        S -> S -> S ->!H -> H -> H
*a58d3d2aSXin Li
*a58d3d2aSXin LiSILK to CELT with Redundancy:             S -> S -> S
*a58d3d2aSXin Li                                                    &
*a58d3d2aSXin Li                                                   !R -> C -> C -> C
*a58d3d2aSXin Li
*a58d3d2aSXin LiHybrid to NB or MB SILK with Redundancy:  H -> H -> H
*a58d3d2aSXin Li                                                    &
*a58d3d2aSXin Li                                                   !R -> R
*a58d3d2aSXin Li                                                         &
*a58d3d2aSXin Li                                                        ;S -> S -> S
*a58d3d2aSXin Li
*a58d3d2aSXin LiHybrid to WB SILK:                        H -> H -> H -> c
*a58d3d2aSXin Li                                                      \  +
*a58d3d2aSXin Li                                                       > S -> S -> S
*a58d3d2aSXin Li
*a58d3d2aSXin LiHybrid to CELT with Redundancy:           H -> H -> H
*a58d3d2aSXin Li                                                    &
*a58d3d2aSXin Li                                                   !R -> C -> C -> C
*a58d3d2aSXin Li
*a58d3d2aSXin LiCELT to SILK with Redundancy:             C -> C -> C -> R
*a58d3d2aSXin Li                                                         &
*a58d3d2aSXin Li                                                        ;S -> S -> S
*a58d3d2aSXin Li
*a58d3d2aSXin LiCELT to Hybrid with Redundancy:           C -> C -> C -> R
*a58d3d2aSXin Li                                                         &
*a58d3d2aSXin Li                                                        |H -> H -> H
*a58d3d2aSXin Li
*a58d3d2aSXin LiKey:
*a58d3d2aSXin LiS   SILK-only frame                 ;   SILK decoder reset
*a58d3d2aSXin LiH   Hybrid frame                    |   CELT and SILK decoder resets
*a58d3d2aSXin LiC   CELT-only frame                 !   CELT decoder reset
*a58d3d2aSXin Lic   CELT overlap                    +   Direct mixing
*a58d3d2aSXin LiR   Redundant CELT frame            &   Windowed cross-lap
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThe first two and the last two Opus frames in each example are illustrative,
*a58d3d2aSXin Li i.e., there is no requirement that a stream remain in the same configuration
*a58d3d2aSXin Li for three consecutive frames before or after a switch.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe behavior of transitions without redundancy where PLC is allowed is non-normative.
*a58d3d2aSXin LiAn encoder might still wish to use these transitions if, for example, it
*a58d3d2aSXin Li doesn't want to add the extra bitrate required for redundancy or if it makes
*a58d3d2aSXin Li a decision to switch after it has already transmitted the frame that would
*a58d3d2aSXin Li have had to contain the redundancy.
*a58d3d2aSXin Li<xref target="nonnormative_transitions"/> illustrates the recommended
*a58d3d2aSXin Li cross-lapping and decoder resets for these transitions.
*a58d3d2aSXin Li<figure align="center" anchor="nonnormative_transitions"
*a58d3d2aSXin Li title="Recommended Non-Normative Transitions">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin LiSILK to SILK (audio bandwidth change):    S -> S -> S   ;S -> S -> S
*a58d3d2aSXin Li
*a58d3d2aSXin LiNB or MB SILK to Hybrid:                  S -> S -> S   |H -> H -> H
*a58d3d2aSXin Li
*a58d3d2aSXin LiSILK to CELT without Redundancy:          S -> S -> S -> P
*a58d3d2aSXin Li                                                         &
*a58d3d2aSXin Li                                                        !C -> C -> C
*a58d3d2aSXin Li
*a58d3d2aSXin LiHybrid to NB or MB SILK:                  H -> H -> H -> c
*a58d3d2aSXin Li                                                         +
*a58d3d2aSXin Li                                                        ;S -> S -> S
*a58d3d2aSXin Li
*a58d3d2aSXin LiHybrid to CELT without Redundancy:        H -> H -> H -> P
*a58d3d2aSXin Li                                                         &
*a58d3d2aSXin Li                                                        !C -> C -> C
*a58d3d2aSXin Li
*a58d3d2aSXin LiCELT to SILK without Redundancy:          C -> C -> C -> P
*a58d3d2aSXin Li                                                         &
*a58d3d2aSXin Li                                                        ;S -> S -> S
*a58d3d2aSXin Li
*a58d3d2aSXin LiCELT to Hybrid without Redundancy:        C -> C -> C -> P
*a58d3d2aSXin Li                                                         &
*a58d3d2aSXin Li                                                        |H -> H -> H
*a58d3d2aSXin Li
*a58d3d2aSXin LiKey:
*a58d3d2aSXin LiS   SILK-only frame                 ;   SILK decoder reset
*a58d3d2aSXin LiH   Hybrid frame                    |   CELT and SILK decoder resets
*a58d3d2aSXin LiC   CELT-only frame                 !   CELT decoder reset
*a58d3d2aSXin Lic   CELT overlap                    +   Direct mixing
*a58d3d2aSXin LiP   Packet Loss Concealment         &   Windowed cross-lap
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiEncoders SHOULD NOT use other transitions, e.g., those that involve redundancy
*a58d3d2aSXin Li in ways not illustrated in <xref target="normative_transitions"/>.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<!--  ******************************************************************* -->
*a58d3d2aSXin Li<!--  **************************   OPUS ENCODER   *********************** -->
*a58d3d2aSXin Li<!--  ******************************************************************* -->
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Opus Encoder">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiJust like the decoder, the Opus encoder also normally consists of two main blocks: the
*a58d3d2aSXin LiSILK encoder and the CELT encoder. However, unlike the case of the decoder, a valid
*a58d3d2aSXin Li(though potentially suboptimal) Opus encoder is not required to support all modes and
*a58d3d2aSXin Limay thus only include a SILK encoder module or a CELT encoder module.
*a58d3d2aSXin LiThe output bit-stream of the Opus encoding contains bits from the SILK and CELT
*a58d3d2aSXin Li encoders, though these are not separable due to the use of a range coder.
*a58d3d2aSXin LiA block diagram of the encoder is illustrated below.
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure align="center" anchor="opus-encoder-figure" title="Opus Encoder">
*a58d3d2aSXin Li<artwork>
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li                    +------------+    +---------+
*a58d3d2aSXin Li                    |   Sample   |    |  SILK   |------+
*a58d3d2aSXin Li                 +->|    Rate    |--->| Encoder |      V
*a58d3d2aSXin Li  +-----------+  |  | Conversion |    |         | +---------+
*a58d3d2aSXin Li  | Optional  |  |  +------------+    +---------+ |  Range  |
*a58d3d2aSXin Li->| High-pass |--+                                | Encoder |---->
*a58d3d2aSXin Li  |  Filter   |  |  +--------------+  +---------+ |         | Bit-
*a58d3d2aSXin Li  +-----------+  |  |    Delay     |  |  CELT   | +---------+ stream
*a58d3d2aSXin Li                 +->| Compensation |->| Encoder |      ^
*a58d3d2aSXin Li                    |              |  |         |------+
*a58d3d2aSXin Li                    +--------------+  +---------+
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor a normal encoder where both the SILK and the CELT modules are included, an optimal
*a58d3d2aSXin Liencoder should select which coding mode to use at run-time depending on the conditions.
In the reference implementation, the frame size is selected by the application, but the
other configuration parameters (number of channels, bandwidth, mode) are automatically
selected (unless explicitly overridden by the application) and depend on the following:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>Requested bitrate</t>
*a58d3d2aSXin Li<t>Input sampling rate</t>
*a58d3d2aSXin Li<t>Type of signal (speech vs music)</t>
*a58d3d2aSXin Li<t>Frame size in use</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li
*a58d3d2aSXin LiThe type of signal currently needs to be provided by the application (though it can be
*a58d3d2aSXin Lichanged in real-time). An Opus encoder implementation could also do automatic detection,
*a58d3d2aSXin Libut since Opus is an interactive codec, such an implementation would likely have to either
*a58d3d2aSXin Lidelay the signal (for non-interactive applications) or delay the mode switching decisions (for
*a58d3d2aSXin Liinteractive applications).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhen the encoder is configured for voice over IP applications, the input signal is
*a58d3d2aSXin Lifiltered by a high-pass filter to remove the lowest part of the spectrum
*a58d3d2aSXin Lithat contains little speech energy and may contain background noise. This is a second order
*a58d3d2aSXin LiAuto Regressive Moving Average (i.e., with poles and zeros) filter with a cut-off frequency around 50&nbsp;Hz.
*a58d3d2aSXin LiIn the future, a music detector may also be used to lower the cut-off frequency when the
*a58d3d2aSXin Liinput signal is detected to be music rather than speech.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="range-encoder" title="Range Encoder">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe range coder acts as the bit-packer for Opus.
*a58d3d2aSXin LiIt is used in three different ways: to encode
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiEntropy-coded symbols with a fixed probability model using ec_encode()
*a58d3d2aSXin Li (entenc.c),
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIntegers from 0 to (2**M&nbsp;-&nbsp;1) using ec_enc_uint() or ec_enc_bits()
*a58d3d2aSXin Li (entenc.c),</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIntegers from 0 to (ft&nbsp;-&nbsp;1) (where ft is not a power of two) using
*a58d3d2aSXin Li ec_enc_uint() (entenc.c).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe range encoder maintains an internal state vector composed of the four-tuple
*a58d3d2aSXin Li (val,&nbsp;rng,&nbsp;rem,&nbsp;ext) representing the low end of the current
*a58d3d2aSXin Li range, the size of the current range, a single buffered output byte, and a
*a58d3d2aSXin Li count of additional carry-propagating output bytes.
Both val and rng are 32-bit unsigned integer values, rem is either a byte value
 less than 255 or the special value -1, and ext is an unsigned integer with at
 least 11 bits.
This state vector is initialized at the start of each frame to the value
 (0,&nbsp;2**31,&nbsp;-1,&nbsp;0).
*a58d3d2aSXin LiAfter encoding a sequence of symbols, the value of rng in the encoder should
*a58d3d2aSXin Li exactly match the value of rng in the decoder after decoding the same sequence
*a58d3d2aSXin Li of symbols.
*a58d3d2aSXin LiThis is a powerful tool for detecting errors in either an encoder or decoder
*a58d3d2aSXin Li implementation.
*a58d3d2aSXin LiThe value of val, on the other hand, represents different things in the encoder
*a58d3d2aSXin Li and decoder, and is not expected to match.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe decoder has no analog for rem and ext.
*a58d3d2aSXin LiThese are used to perform carry propagation in the renormalization loop below.
*a58d3d2aSXin LiEach iteration of this loop produces 9 bits of output, consisting of 8 data
*a58d3d2aSXin Li bits and a carry flag.
*a58d3d2aSXin LiThe encoder cannot determine the final value of the output bytes until it
*a58d3d2aSXin Li propagates these carry flags.
Therefore the reference implementation buffers a single non-propagating output
 byte (i.e., one with a value less than 255) in rem and keeps a count of
 additional propagating (i.e., 255) output bytes in ext.
*a58d3d2aSXin LiAn implementation may choose to use any mathematically equivalent scheme to
*a58d3d2aSXin Li perform carry propagation.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="encoding-symbols" title="Encoding Symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe main encoding function is ec_encode() (entenc.c), which encodes symbol k in
*a58d3d2aSXin Li the current context using the same three-tuple (fl[k],&nbsp;fh[k],&nbsp;ft)
*a58d3d2aSXin Li as the decoder to describe the range of the symbol (see
*a58d3d2aSXin Li <xref target="range-decoder"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Liec_encode() updates the state of the encoder as follows.
*a58d3d2aSXin LiIf fl[k] is greater than zero, then
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li                  rng
*a58d3d2aSXin Lival = val + rng - --- * (ft - fl) ,
*a58d3d2aSXin Li                  ft
*a58d3d2aSXin Li
*a58d3d2aSXin Li      rng
*a58d3d2aSXin Lirng = --- * (fh - fl) .
*a58d3d2aSXin Li      ft
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiOtherwise, val is unchanged and
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
            rng
rng = rng - --- * (ft - fh) .
            ft
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThe divisions here are integer division.
*a58d3d2aSXin Li</t>
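<t>
The update can be sketched in C as follows.
This is a minimal illustration, not the reference ec_encode(): the enc_range
 type and the enc_range_update() name are hypothetical, and renormalization is
 omitted.
In the branch where fl is zero, the sketch shrinks rng by
 rng/ft * (ft - fh), so that it mirrors the decoder update in
 <xref target="range-decoder"/>.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Hypothetical container for the two state values this step touches. */
typedef struct {
    uint32_t val;  /* low end of the current range */
    uint32_t rng;  /* size of the current range */
} enc_range;

/* Narrow the range to the symbol with cumulative frequencies [fl, fh)
   out of a total of ft. Renormalization is left to the caller. */
void enc_range_update(enc_range *s, uint32_t fl, uint32_t fh, uint32_t ft)
{
    uint32_t r = s->rng / ft;            /* integer division */
    if (fl > 0) {
        s->val += s->rng - r * (ft - fl);
        s->rng = r * (fh - fl);
    } else {
        /* val is unchanged; drop the part of the range above fh */
        s->rng -= r * (ft - fh);
    }
}
```
]]></artwork>
</figure>
<t>
For example, starting from (val,&nbsp;rng) = (0,&nbsp;2**31) with ft = 4,
 encoding the symbol with (fl,&nbsp;fh) = (1,&nbsp;2) yields val = 2**29 and
 rng = 2**29.
</t>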
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="range-encoder-renorm" title="Renormalization">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter this update, the range is normalized using a procedure very similar to
*a58d3d2aSXin Li that of <xref target="range-decoder-renorm"/>, implemented by
*a58d3d2aSXin Li ec_enc_normalize() (entenc.c).
*a58d3d2aSXin LiThe following process is repeated until rng&nbsp;&gt;&nbsp;2**23.
*a58d3d2aSXin LiFirst, the top 9 bits of val, (val&gt;&gt;23), are sent to the carry buffer,
*a58d3d2aSXin Li described in <xref target="ec_enc_carry_out"/>.
*a58d3d2aSXin LiThen, the encoder sets
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lival = (val<<8) & 0x7FFFFFFF ,
*a58d3d2aSXin Li
*a58d3d2aSXin Lirng = rng<<8 .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="ec_enc_carry_out"
*a58d3d2aSXin Li title="Carry Propagation and Output Buffering">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe function ec_enc_carry_out() (entenc.c) implements carry propagation and
*a58d3d2aSXin Li output buffering.
*a58d3d2aSXin LiIt takes as input a 9-bit value, c, consisting of 8 data bits and an additional
*a58d3d2aSXin Li carry bit.
*a58d3d2aSXin LiIf c is equal to the value 255, then ext is simply incremented, and no other
*a58d3d2aSXin Li state updates are performed.
*a58d3d2aSXin LiOtherwise, let b&nbsp;=&nbsp;(c&gt;&gt;8) be the carry bit.
*a58d3d2aSXin LiThen,
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf the buffered byte rem contains a value other than -1, the encoder outputs
*a58d3d2aSXin Li the byte (rem&nbsp;+&nbsp;b).
*a58d3d2aSXin LiOtherwise, if rem is -1, no byte is output.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf ext is non-zero, then the encoder outputs ext bytes---all with a value of 0
*a58d3d2aSXin Li if b is set, or 255 if b is unset---and sets ext to 0.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Lirem is set to the 8 data bits:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Lirem = c & 255 .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
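<t>
The renormalization loop and the carry buffer can be sketched together as
 follows.
This is an illustration under stated assumptions rather than the reference
 code: the enc_ctx type and all function names are hypothetical, and the output
 buffer is assumed to be large enough, so no bounds or error checks are shown.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder context holding the (val, rng, rem, ext) state. */
typedef struct {
    uint32_t val;        /* low end of the current range */
    uint32_t rng;        /* size of the current range */
    int rem;             /* buffered output byte, or -1 if none */
    uint32_t ext;        /* count of pending 255-valued bytes */
    unsigned char *buf;  /* output buffer, assumed large enough */
    size_t pos;          /* number of bytes written so far */
} enc_ctx;

void enc_ctx_init(enc_ctx *s, unsigned char *buf)
{
    s->val = 0;
    s->rng = 0x80000000u;
    s->rem = -1;
    s->ext = 0;
    s->buf = buf;
    s->pos = 0;
}

/* Accept a 9-bit value c: 8 data bits plus a carry bit. */
void enc_carry_out(enc_ctx *s, unsigned c)
{
    unsigned b;
    unsigned char fill;
    if (c == 255) {          /* may still be changed by a later carry */
        s->ext++;
        return;
    }
    b = c >> 8;              /* carry bit */
    if (s->rem >= 0)
        s->buf[s->pos++] = (unsigned char)(s->rem + b);
    /* pending bytes become 0 if the carry is set, else stay 255 */
    fill = (unsigned char)((255 + b) & 255);
    while (s->ext > 0) {
        s->buf[s->pos++] = fill;
        s->ext--;
    }
    s->rem = (int)(c & 255);
}

/* Shift out bytes while rng is at most 2**23. */
void enc_normalize(enc_ctx *s)
{
    while (s->rng <= 0x800000u) {
        enc_carry_out(s, s->val >> 23);
        s->val = (s->val << 8) & 0x7FFFFFFFu;
        s->rng <<= 8;
    }
}
```
]]></artwork>
</figure>
<t>
Feeding the values 0x12, 255, 255, and 0x134 to enc_carry_out() in that order
 writes the bytes 0x13, 0x00, 0x00 and leaves 0x34 buffered in rem: the carry
 in the last value increments the buffered byte and turns both pending 255
 bytes into zeros.
</t>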
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="encoding-alternate" title="Alternate Encoding Methods">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe reference implementation uses three additional encoding methods that are
*a58d3d2aSXin Li exactly equivalent to the above, but make assumptions and simplifications that
*a58d3d2aSXin Li allow for a more efficient implementation.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="ec_encode_bin" title="ec_encode_bin()">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe first is ec_encode_bin() (entenc.c), defined using the parameter ftb
*a58d3d2aSXin Li instead of ft.
*a58d3d2aSXin LiIt is mathematically equivalent to calling ec_encode() with
*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;ftb), but avoids using division.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="ec_enc_bit_logp" title="ec_enc_bit_logp()">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe next is ec_enc_bit_logp() (entenc.c), which encodes a single binary symbol.
*a58d3d2aSXin LiThe context is described by a single parameter, logp, which is the absolute
*a58d3d2aSXin Li value of the base-2 logarithm of the probability of a "1".
*a58d3d2aSXin LiIt is mathematically equivalent to calling ec_encode() with the 3-tuple
*a58d3d2aSXin Li (fl[k]&nbsp;=&nbsp;0, fh[k]&nbsp;=&nbsp;(1&lt;&lt;logp)&nbsp;-&nbsp;1,
*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;logp)) if k is 0 and with
*a58d3d2aSXin Li (fl[k]&nbsp;=&nbsp;(1&lt;&lt;logp)&nbsp;-&nbsp;1,
*a58d3d2aSXin Li fh[k]&nbsp;=&nbsp;ft&nbsp;=&nbsp;(1&lt;&lt;logp)) if k is 1.
*a58d3d2aSXin LiThe implementation requires no multiplications or divisions.
*a58d3d2aSXin Li</t>
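<t>
The equivalence can be checked with a small sketch that applies both updates
 to the same starting state.
All names here are hypothetical, renormalization is omitted, and the
 shift-only routine is one possible formulation rather than a copy of the
 reference code.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

typedef struct {
    uint32_t val;  /* low end of the current range */
    uint32_t rng;  /* size of the current range */
} enc_range;

/* General update for an arbitrary (fl, fh, ft); uses a division. */
void enc_range_update(enc_range *s, uint32_t fl, uint32_t fh, uint32_t ft)
{
    uint32_t r = s->rng / ft;
    if (fl > 0) {
        s->val += s->rng - r * (ft - fl);
        s->rng = r * (fh - fl);
    } else {
        s->rng -= r * (ft - fh);
    }
}

/* Shift-only update for one bit whose probability of "1" is 2**(-logp). */
void enc_bit_logp_update(enc_range *s, int bit, unsigned logp)
{
    uint32_t one = s->rng >> logp;   /* share of the range given to "1" */
    uint32_t zero = s->rng - one;    /* share of the range given to "0" */
    if (bit) {
        s->val += zero;
        s->rng = one;
    } else {
        s->rng = zero;
    }
}

/* Returns 1 if both updates produce the same (val, rng). */
int bit_logp_matches(uint32_t val, uint32_t rng, int bit, unsigned logp)
{
    enc_range a = { val, rng };
    enc_range b = { val, rng };
    uint32_t ft = (uint32_t)1 << logp;
    if (bit)
        enc_range_update(&a, ft - 1, ft, ft);
    else
        enc_range_update(&a, 0, ft - 1, ft);
    enc_bit_logp_update(&b, bit, logp);
    return a.val == b.val && a.rng == b.rng;
}
```
]]></artwork>
</figure>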
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="ec_enc_icdf" title="ec_enc_icdf()">
*a58d3d2aSXin Li<t>
The last is ec_enc_icdf() (entenc.c), which encodes a single symbol with
 a table-based context of up to 8 bits.
*a58d3d2aSXin LiThis uses the same icdf table as ec_dec_icdf() from
*a58d3d2aSXin Li <xref target="ec_dec_icdf"/>.
*a58d3d2aSXin LiThe function is mathematically equivalent to calling ec_encode() with
*a58d3d2aSXin Li fl[k]&nbsp;=&nbsp;(1&lt;&lt;ftb)&nbsp;-&nbsp;icdf[k-1] (or 0 if
*a58d3d2aSXin Li k&nbsp;==&nbsp;0), fh[k]&nbsp;=&nbsp;(1&lt;&lt;ftb)&nbsp;-&nbsp;icdf[k], and
*a58d3d2aSXin Li ft&nbsp;=&nbsp;(1&lt;&lt;ftb).
*a58d3d2aSXin LiThis only saves a few arithmetic operations over ec_encode_bin(), but allows
*a58d3d2aSXin Li the encoder to use the same icdf tables as the decoder.
*a58d3d2aSXin Li</t>
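<t>
The mapping from an icdf table to the three-tuple can be sketched as follows.
The freq_tuple type and the icdf_to_tuple() name are hypothetical, and the
 table used in the example after the code is invented for illustration; it is
 not one of the codec's tables.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Cumulative-frequency tuple handed to the general encoder. */
typedef struct {
    uint32_t fl;
    uint32_t fh;
    uint32_t ft;
} freq_tuple;

/* Derive (fl, fh, ft) for symbol k from an inverse CDF table.
   Since fh[k] = ft - icdf[k], the entry for the last symbol is 0. */
freq_tuple icdf_to_tuple(int k, const unsigned char *icdf, unsigned ftb)
{
    freq_tuple t;
    t.ft = (uint32_t)1 << ftb;
    t.fl = (k == 0) ? 0 : t.ft - icdf[k - 1];
    t.fh = t.ft - icdf[k];
    return t;
}
```
]]></artwork>
</figure>
<t>
With the illustrative table {192, 64, 0} and ftb = 8, the three symbols map to
 (0,&nbsp;64,&nbsp;256), (64,&nbsp;192,&nbsp;256), and
 (192,&nbsp;256,&nbsp;256), i.e., probabilities of 1/4, 1/2, and 1/4.
</t>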
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="encoding-bits" title="Encoding Raw Bits">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe raw bits used by the CELT layer are packed at the end of the buffer using
*a58d3d2aSXin Li ec_enc_bits() (entenc.c).
*a58d3d2aSXin LiBecause the raw bits may continue into the last byte output by the range coder
*a58d3d2aSXin Li if there is room in the low-order bits, the encoder must be prepared to merge
*a58d3d2aSXin Li these values into a single byte.
*a58d3d2aSXin LiThe procedure in <xref target="encoder-finalizing"/> does this in a way that
*a58d3d2aSXin Li ensures both the range coded data and the raw bits can be decoded
*a58d3d2aSXin Li successfully.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="encoding-ints" title="Encoding Uniformly Distributed Integers">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe function ec_enc_uint() (entenc.c) encodes one of ft equiprobable symbols in
*a58d3d2aSXin Li the range 0 to (ft&nbsp;-&nbsp;1), inclusive, each with a frequency of 1,
*a58d3d2aSXin Li where ft may be as large as (2**32&nbsp;-&nbsp;1).
*a58d3d2aSXin LiLike the decoder (see <xref target="ec_dec_uint"/>), it splits up the
*a58d3d2aSXin Li value into a range coded symbol representing up to 8 of the high bits, and, if
*a58d3d2aSXin Li necessary, raw bits representing the remainder of the value.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Liec_enc_uint() takes a two-tuple (t,&nbsp;ft), where t is the value to be
*a58d3d2aSXin Li encoded, 0&nbsp;&lt;=&nbsp;t&nbsp;&lt;&nbsp;ft, and ft is not necessarily a
*a58d3d2aSXin Li power of two.
*a58d3d2aSXin LiLet ftb&nbsp;=&nbsp;ilog(ft&nbsp;-&nbsp;1), i.e., the number of bits required
*a58d3d2aSXin Li to store (ft&nbsp;-&nbsp;1) in two's complement notation.
*a58d3d2aSXin LiIf ftb is 8 or less, then t is encoded directly using ec_encode() with the
*a58d3d2aSXin Li three-tuple (t, t&nbsp;+&nbsp;1, ft).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf ftb is greater than 8, then the top 8 bits of t are encoded using the
*a58d3d2aSXin Li three-tuple (t&gt;&gt;(ftb&nbsp;-&nbsp;8),
*a58d3d2aSXin Li (t&gt;&gt;(ftb&nbsp;-&nbsp;8))&nbsp;+&nbsp;1,
*a58d3d2aSXin Li ((ft&nbsp;-&nbsp;1)&gt;&gt;(ftb&nbsp;-&nbsp;8))&nbsp;+&nbsp;1), and the
*a58d3d2aSXin Li remaining bits,
 (t&nbsp;&amp;&nbsp;((1&lt;&lt;(ftb&nbsp;-&nbsp;8))&nbsp;-&nbsp;1)),
*a58d3d2aSXin Li are encoded as raw bits with ec_enc_bits().
*a58d3d2aSXin Li</t>
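<t>
The split into a range coded part and a raw part can be sketched as follows.
The uint_split type and the function names are hypothetical; the sketch only
 computes the values that would be passed on and does no encoding itself.
It assumes ft is greater than 1 and that t is less than ft.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Number of bits needed to represent x (0 for x == 0). */
int ilog32(uint32_t x)
{
    int n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* What would be handed to the range coder and to the raw bit packer. */
typedef struct {
    uint32_t fl, fh, ft;  /* three-tuple for the range coded symbol */
    uint32_t raw;         /* value of the raw bits, if any */
    int raw_bits;         /* number of raw bits (0 if none) */
} uint_split;

/* Split t, with 0 <= t < ft and ft > 1, as described above. */
uint_split split_uint(uint32_t t, uint32_t ft)
{
    uint_split out;
    int ftb = ilog32(ft - 1);
    if (ftb > 8) {
        int shift = ftb - 8;              /* low bits sent raw */
        out.fl = t >> shift;
        out.fh = out.fl + 1;
        out.ft = ((ft - 1) >> shift) + 1;
        out.raw = t & (((uint32_t)1 << shift) - 1);
        out.raw_bits = shift;
    } else {
        out.fl = t;
        out.fh = t + 1;
        out.ft = ft;
        out.raw = 0;
        out.raw_bits = 0;
    }
    return out;
}
```
]]></artwork>
</figure>
<t>
For t = 1234 and ft = 5000, ftb is 13, so the range coded symbol uses the
 three-tuple (38,&nbsp;39,&nbsp;157) and the remaining 5 bits, with value 18,
 are sent raw.
</t>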
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="encoder-finalizing" title="Finalizing the Stream">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAfter all symbols are encoded, the stream must be finalized by outputting a
*a58d3d2aSXin Li value inside the current range.
*a58d3d2aSXin LiLet end be the integer in the interval [val,&nbsp;val&nbsp;+&nbsp;rng) with the
*a58d3d2aSXin Li largest number of trailing zero bits, b, such that
*a58d3d2aSXin Li (end&nbsp;+&nbsp;(1&lt;&lt;b)&nbsp;-&nbsp;1) is also in the interval
*a58d3d2aSXin Li [val,&nbsp;val&nbsp;+&nbsp;rng).
*a58d3d2aSXin LiThis choice of end allows the maximum number of trailing bits to be set to
*a58d3d2aSXin Li arbitrary values while still ensuring the range coded part of the buffer can
*a58d3d2aSXin Li be decoded correctly.
*a58d3d2aSXin LiThen, while end is not zero, the top 9 bits of end, i.e., (end&gt;&gt;23), are
*a58d3d2aSXin Li passed to the carry buffer in accordance with the procedure in
*a58d3d2aSXin Li <xref target="ec_enc_carry_out"/>, and end is updated via
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Liend = (end<<8) & 0x7FFFFFFF .
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiFinally, if the buffered output byte, rem, is neither zero nor the special
*a58d3d2aSXin Li value -1, or the carry count, ext, is greater than zero, then 9 zero bits are
*a58d3d2aSXin Li sent to the carry buffer to flush it to the output buffer.
*a58d3d2aSXin LiWhen outputting the final byte from the range coder, if it would overlap any
*a58d3d2aSXin Li raw bits already packed into the end of the output buffer, they should be ORed
*a58d3d2aSXin Li into the same byte.
*a58d3d2aSXin LiThe bit allocation routines in the CELT layer should ensure that this can be
*a58d3d2aSXin Li done without corrupting the range coder data so long as end is chosen as
*a58d3d2aSXin Li described above.
*a58d3d2aSXin LiIf there is any space between the end of the range coder data and the end of
*a58d3d2aSXin Li the raw bits, it is padded with zero bits.
*a58d3d2aSXin LiThis entire process is implemented by ec_enc_done() (entenc.c).
*a58d3d2aSXin Li</t>
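<t>
One way to find a suitable end is sketched below.
It tries an aligned block of 2**(ilog(rng)&nbsp;-&nbsp;1) values first and
 falls back to a block of half that size if the first does not fit.
The function names are hypothetical, rng is assumed to be normalized (greater
 than 2**23), and the sketch is not claimed to match ec_enc_done() line for
 line.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Number of bits needed to represent x (0 for x == 0). */
int ilog32(uint32_t x)
{
    int n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Pick end in [val, val + rng) such that every value from end through
   (end | *mask) also lies in that interval. Assumes rng > 2**23. */
uint32_t choose_end(uint32_t val, uint32_t rng, uint32_t *mask)
{
    uint64_t top = (uint64_t)val + rng;   /* exclusive upper bound */
    uint32_t msk = 0x7FFFFFFFu >> (32 - ilog32(rng));
    uint32_t end = (val + msk) & ~msk;    /* round val up to alignment */
    if ((uint64_t)(end | msk) >= top) {
        /* block too large: halve it and realign */
        msk >>= 1;
        end = (val + msk) & ~msk;
    }
    *mask = msk;
    return end;
}
```
]]></artwork>
</figure>
<t>
The bits covered by the returned mask are the trailing bits that may later be
 set to arbitrary values.
For val = 0x12345678 and rng = 2**24, the sketch returns end = 0x12800000 with
 a mask of 0x007FFFFF.
</t>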
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="encoder-tell" title="Current Bit Usage">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Li   The bit allocation routines in Opus need to be able to determine a
*a58d3d2aSXin Li   conservative upper bound on the number of bits that have been used
*a58d3d2aSXin Li   to encode the current frame thus far. This drives allocation
*a58d3d2aSXin Li   decisions and ensures that the range coder and raw bits will not
*a58d3d2aSXin Li   overflow the output buffer. This is computed in the
*a58d3d2aSXin Li   reference implementation to whole-bit precision by
*a58d3d2aSXin Li   the function ec_tell() (entcode.h) and to fractional 1/8th bit
*a58d3d2aSXin Li   precision by the function ec_tell_frac() (entcode.c).
*a58d3d2aSXin Li   Like all operations in the range coder, it must be implemented in a
*a58d3d2aSXin Li   bit-exact manner, and must produce exactly the same value returned by
*a58d3d2aSXin Li   the same functions in the decoder after decoding the same symbols.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='SILK Encoder'>
*a58d3d2aSXin Li  <t>
*a58d3d2aSXin Li    In many respects the SILK encoder mirrors the SILK decoder described
*a58d3d2aSXin Li    in <xref target='silk_decoder_outline'/>.
*a58d3d2aSXin Li    Details such as the quantization and range coder tables can be found
*a58d3d2aSXin Li    there, while this section describes the high-level design choices that
*a58d3d2aSXin Li    were made.
*a58d3d2aSXin Li    The diagram below shows the basic modules of the SILK encoder.
*a58d3d2aSXin Li<figure align="center" anchor="silk_encoder_figure" title="SILK Encoder">
*a58d3d2aSXin Li<artwork>
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li       +----------+    +--------+    +---------+
*a58d3d2aSXin Li       |  Sample  |    | Stereo |    |  SILK   |
*a58d3d2aSXin Li------>|   Rate   |--->| Mixing |--->|  Core   |---------->
*a58d3d2aSXin LiInput  |Conversion|    |        |    | Encoder |  Bitstream
*a58d3d2aSXin Li       +----------+    +--------+    +---------+
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Sample Rate Conversion'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe input signal's sampling rate is adjusted by a sample rate conversion
*a58d3d2aSXin Limodule so that it matches the SILK internal sampling rate.
*a58d3d2aSXin LiThe input to the sample rate converter is delayed by a number of samples
*a58d3d2aSXin Lidepending on the sample rate ratio, such that the overall delay is constant
*a58d3d2aSXin Lifor all input and output sample rates.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Stereo Mixing'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe stereo mixer is only used for stereo input signals.
*a58d3d2aSXin LiIt converts a stereo left/right signal into an adaptive
*a58d3d2aSXin Limid/side representation.
*a58d3d2aSXin LiThe first step is to compute non-adaptive mid/side signals
*a58d3d2aSXin Lias half the sum and difference between left and right signals.
*a58d3d2aSXin LiThe side signal is then minimized in energy by subtracting a
*a58d3d2aSXin Liprediction of it based on the mid signal.
*a58d3d2aSXin LiThis prediction works well when the left and right signals
*a58d3d2aSXin Liexhibit linear dependency, for instance for an amplitude-panned
*a58d3d2aSXin Liinput signal.
As in the decoder, the prediction coefficients are linearly
interpolated during the first 8&nbsp;ms of the frame.
The mid signal is always encoded, whereas the residual
side signal is only encoded if it has sufficient
energy compared to the mid signal's energy.
If it does not,
the "mid_only_flag" is set and the side signal is not encoded.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe predictor coefficients are coded regardless of whether
*a58d3d2aSXin Lithe side signal is encoded.
*a58d3d2aSXin LiFor each frame, two predictor coefficients are computed, one
*a58d3d2aSXin Lithat predicts between low-passed mid and side channels, and
*a58d3d2aSXin Lione that predicts between high-passed mid and side channels.
*a58d3d2aSXin LiThe low-pass filter is a simple three-tap filter
*a58d3d2aSXin Liand creates a delay of one sample.
*a58d3d2aSXin LiThe high-pass filtered signal is the difference between
*a58d3d2aSXin Lithe mid signal delayed by one sample and the low-passed
*a58d3d2aSXin Lisignal.  Instead of explicitly computing the high-passed
*a58d3d2aSXin Lisignal, it is computationally more efficient to transform
*a58d3d2aSXin Lithe prediction coefficients before applying them to the
*a58d3d2aSXin Lifiltered mid signal, as follows
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Lipred(n) = LP(n) * w0 + HP(n) * w1
*a58d3d2aSXin Li        = LP(n) * w0 + (mid(n-1) - LP(n)) * w1
*a58d3d2aSXin Li        = LP(n) * (w0 - w1) + mid(n-1) * w1
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liwhere w0 and w1 are the low-pass and high-pass prediction
*a58d3d2aSXin Licoefficients, mid(n-1) is the mid signal delayed by one sample,
*a58d3d2aSXin LiLP(n) and HP(n) are the low-passed and high-passed
*a58d3d2aSXin Lisignals and pred(n) is the prediction signal that is subtracted
*a58d3d2aSXin Lifrom the side signal.
*a58d3d2aSXin Li</t>
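<t>
The identity above can be illustrated with a floating-point sketch.
The function names are hypothetical, and the low-pass taps
(1/4,&nbsp;1/2,&nbsp;1/4) are an assumption made for this example; the text
above only states that the filter has three taps and a delay of one sample.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>

/* Three-tap low-pass with one sample of delay. The (1/4, 1/2, 1/4)
   taps are an assumption made for this illustration. Requires n >= 2. */
double lp3(const double *mid, size_t n)
{
    return 0.25 * mid[n - 2] + 0.5 * mid[n - 1] + 0.25 * mid[n];
}

/* Prediction computed with an explicit high-passed signal. */
double pred_explicit(const double *mid, size_t n, double w0, double w1)
{
    double lp = lp3(mid, n);
    double hp = mid[n - 1] - lp;   /* delayed mid minus low-passed mid */
    return lp * w0 + hp * w1;
}

/* Same prediction with the coefficients transformed instead, so the
   high-passed signal is never formed. */
double pred_folded(const double *mid, size_t n, double w0, double w1)
{
    return lp3(mid, n) * (w0 - w1) + mid[n - 1] * w1;
}
```
]]></artwork>
</figure>
<t>
The folded form saves one subtraction per sample, because (w0 - w1) is
computed once per set of coefficients rather than once per sample.
</t>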
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='SILK Core Encoder'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhat follows is a description of the core encoder and its components.
*a58d3d2aSXin LiFor simplicity, the core encoder is referred to simply as the encoder in
*a58d3d2aSXin Lithe remainder of this section. An overview of the encoder is given in
*a58d3d2aSXin Li<xref target="encoder_figure" />.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<figure align="center" anchor="encoder_figure" title="SILK Core Encoder">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li                                                             +---+
*a58d3d2aSXin Li                          +--------------------------------->|   |
*a58d3d2aSXin Li     +---------+          |      +---------+                 |   |
*a58d3d2aSXin Li     |Voice    |          |      |LTP      |12               |   |
*a58d3d2aSXin Li +-->|Activity |--+       +----->|Scaling  |-----------+---->|   |
*a58d3d2aSXin Li |   |Detector |3 |       |      |Control  |<--+       |     |   |
*a58d3d2aSXin Li |   +---------+  |       |      +---------+   |       |     |   |
*a58d3d2aSXin Li |                |       |      +---------+   |       |     |   |
*a58d3d2aSXin Li |                |       |      |Gains    |   |       |     |   |
*a58d3d2aSXin Li |                |       |  +-->|Processor|---|---+---|---->| R |
*a58d3d2aSXin Li |                |       |  |   |         |11 |   |   |     | a |
*a58d3d2aSXin Li |               \/       |  |   +---------+   |   |   |     | n |
*a58d3d2aSXin Li |          +---------+   |  |   +---------+   |   |   |     | g |
*a58d3d2aSXin Li |          |Pitch    |   |  |   |LSF      |   |   |   |     | e |
*a58d3d2aSXin Li |       +->|Analysis |---+  |   |Quantizer|---|---|---|---->|   |
*a58d3d2aSXin Li |       |  |         |4  |  |   |         |8  |   |   |     | E |-->
*a58d3d2aSXin Li |       |  +---------+   |  |   +---------+   |   |   |     | n | 2
*a58d3d2aSXin Li |       |                |  |    9/\  10|     |   |   |     | c |
*a58d3d2aSXin Li |       |                |  |     |    \/     |   |   |     | o |
*a58d3d2aSXin Li |       |  +---------+   |  |   +----------+  |   |   |     | d |
*a58d3d2aSXin Li |       |  |Noise    |   +--|-->|Prediction|--+---|---|---->| e |
*a58d3d2aSXin Li |       +->|Shaping  |---|--+   |Analysis  |7 |   |   |     | r |
*a58d3d2aSXin Li |       |  |Analysis |5  |  |   |          |  |   |   |     |   |
*a58d3d2aSXin Li |       |  +---------+   |  |   +----------+  |   |   |     |   |
*a58d3d2aSXin Li |       |                |  |        /\       |   |   |     |   |
*a58d3d2aSXin Li |       |     +----------|--|--------+        |   |   |     |   |
*a58d3d2aSXin Li |       |     |         \/  \/               \/  \/  \/     |   |
*a58d3d2aSXin Li |       |     |       +---------+          +------------+   |   |
*a58d3d2aSXin Li |       |     |       |         |          |Noise       |   |   |
*a58d3d2aSXin Li-+-------+-----+------>|Prefilter|--------->|Shaping     |-->|   |
*a58d3d2aSXin Li1                      |         | 6        |Quantization|13 |   |
*a58d3d2aSXin Li                       +---------+          +------------+   +---+
*a58d3d2aSXin Li
*a58d3d2aSXin Li1:  Input speech signal
*a58d3d2aSXin Li2:  Range encoded bitstream
*a58d3d2aSXin Li3:  Voice activity estimate
*a58d3d2aSXin Li4:  Pitch lags (per 5 ms) and voicing decision (per 20 ms)
*a58d3d2aSXin Li5:  Noise shaping quantization coefficients
*a58d3d2aSXin Li  - Short term synthesis and analysis
*a58d3d2aSXin Li    noise shaping coefficients (per 5 ms)
*a58d3d2aSXin Li  - Long term synthesis and analysis noise
*a58d3d2aSXin Li    shaping coefficients (per 5 ms and for voiced speech only)
*a58d3d2aSXin Li  - Noise shaping tilt (per 5 ms)
*a58d3d2aSXin Li  - Quantizer gain/step size (per 5 ms)
*a58d3d2aSXin Li6:  Input signal filtered with analysis noise shaping filters
*a58d3d2aSXin Li7:  Short and long term prediction coefficients
*a58d3d2aSXin Li    LTP (per 5 ms) and LPC (per 20 ms)
*a58d3d2aSXin Li8:  LSF quantization indices
*a58d3d2aSXin Li9:  LSF coefficients
*a58d3d2aSXin Li10: Quantized LSF coefficients
*a58d3d2aSXin Li11: Processed gains, and synthesis noise shape coefficients
*a58d3d2aSXin Li12: LTP state scaling coefficient. Controlling error propagation
*a58d3d2aSXin Li   / prediction gain trade-off
*a58d3d2aSXin Li13: Quantized signal
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Voice Activity Detection'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe input signal is processed by a Voice Activity Detector (VAD) to produce
*a58d3d2aSXin Lia measure of voice activity, spectral tilt, and signal-to-noise estimates for
*a58d3d2aSXin Lieach frame. The VAD uses a sequence of half-band filterbanks to split the
*a58d3d2aSXin Lisignal into four subbands: 0...Fs/16, Fs/16...Fs/8, Fs/8...Fs/4, and
*a58d3d2aSXin LiFs/4...Fs/2, where Fs is the sampling frequency (8, 12, 16, or 24&nbsp;kHz).
The lowest subband, from 0 to Fs/16, is high-pass filtered with a first-order
*a58d3d2aSXin Limoving average (MA) filter (with transfer function H(z) = 1-z**(-1)) to
*a58d3d2aSXin Lireduce the energy at the lowest frequencies. For each frame, the signal
*a58d3d2aSXin Lienergy per subband is computed.
*a58d3d2aSXin LiIn each subband, a noise level estimator tracks the background noise level
*a58d3d2aSXin Liand a Signal-to-Noise Ratio (SNR) value is computed as the logarithm of the
*a58d3d2aSXin Liratio of energy to noise level.
*a58d3d2aSXin LiUsing these intermediate variables, the following parameters are calculated
*a58d3d2aSXin Lifor use in other SILK modules:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAverage SNR. The average of the subband SNR values.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSmoothed subband SNRs. Temporally smoothed subband SNR values.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSpeech activity level. Based on the average SNR and a weighted average of the
*a58d3d2aSXin Lisubband energies.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSpectral tilt. A weighted average of the subband SNRs, with positive weights
*a58d3d2aSXin Lifor the low subbands and negative weights for the high subbands.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
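<t>
The first-order MA filter and the per-subband energy computation can be
sketched as follows.
This is a floating-point illustration with hypothetical function names; the
filterbank, the noise level estimator, and the fixed-point arithmetic of an
actual implementation are not shown.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>

/* Apply y(n) = x(n) - x(n-1), i.e. H(z) = 1 - z**(-1), in place.
   *prev carries the last input sample across calls. */
void ma_highpass(double *x, size_t len, double *prev)
{
    size_t i;
    for (i = 0; i < len; i++) {
        double in = x[i];
        x[i] = in - *prev;
        *prev = in;
    }
}

/* Sum of squares of a block of samples. */
double block_energy(const double *x, size_t len)
{
    double e = 0.0;
    size_t i;
    for (i = 0; i < len; i++)
        e += x[i] * x[i];
    return e;
}
```
]]></artwork>
</figure>
<t>
A constant input is removed entirely by this filter, which is the intended
reduction of energy at the lowest frequencies.
</t>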
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Pitch Analysis' anchor='pitch_estimator_overview_section'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe input signal is processed by the open loop pitch estimator shown in
*a58d3d2aSXin Li<xref target='pitch_estimator_figure' />.
*a58d3d2aSXin Li<figure align="center" anchor="pitch_estimator_figure"
*a58d3d2aSXin Li title="Block diagram of the pitch estimator">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li                                 +--------+  +----------+
*a58d3d2aSXin Li                                 |2 x Down|  |Time-     |
*a58d3d2aSXin Li                              +->|sampling|->|Correlator|     |
*a58d3d2aSXin Li                              |  |        |  |          |     |4
*a58d3d2aSXin Li                              |  +--------+  +----------+    \/
*a58d3d2aSXin Li                              |                    | 2    +-------+
*a58d3d2aSXin Li                              |                    |  +-->|Speech |5
*a58d3d2aSXin Li    +---------+    +--------+ |                   \/  |   |Type   |->
*a58d3d2aSXin Li    |LPC      |    |Down    | |              +----------+ |       |
*a58d3d2aSXin Li +->|Analysis | +->|sample  |-+------------->|Time-     | +-------+
*a58d3d2aSXin Li |  |         | |  |to 8 kHz|                |Correlator|----------->
*a58d3d2aSXin Li |  +---------+ |  +--------+                |__________|          6
*a58d3d2aSXin Li |       |      |                                  |3
*a58d3d2aSXin Li |      \/      |                                 \/
*a58d3d2aSXin Li |  +---------+ |                            +----------+
*a58d3d2aSXin Li |  |Whitening| |                            |Time-     |
*a58d3d2aSXin Li-+->|Filter   |-+--------------------------->|Correlator|----------->
*a58d3d2aSXin Li1   |         |                              |          |          7
*a58d3d2aSXin Li    +---------+                              +----------+
*a58d3d2aSXin Li
*a58d3d2aSXin Li1: Input signal
*a58d3d2aSXin Li2: Lag candidates from stage 1
*a58d3d2aSXin Li3: Lag candidates from stage 2
*a58d3d2aSXin Li4: Correlation threshold
*a58d3d2aSXin Li5: Voiced/unvoiced flag
*a58d3d2aSXin Li6: Pitch correlation
*a58d3d2aSXin Li7: Pitch lags
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiThe pitch analysis finds a binary voiced/unvoiced classification, and, for
*a58d3d2aSXin Liframes classified as voiced, four pitch lags per frame - one for each
*a58d3d2aSXin Li5&nbsp;ms subframe - and a pitch correlation indicating the periodicity of
*a58d3d2aSXin Lithe signal.
*a58d3d2aSXin LiThe input is first whitened using a Linear Prediction (LP) whitening filter,
*a58d3d2aSXin Liwhere the coefficients are computed through standard Linear Prediction Coding
*a58d3d2aSXin Li(LPC) analysis. The order of the whitening filter is 16 for best results, but
*a58d3d2aSXin Liis reduced to 12 for medium complexity and 8 for low complexity modes.
*a58d3d2aSXin LiThe whitened signal is analyzed to find pitch lags for which the time
*a58d3d2aSXin Licorrelation is high.
To reduce complexity, the analysis proceeds in three stages:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>In the first stage, the whitened signal is downsampled to 4&nbsp;kHz
*a58d3d2aSXin Li(from 8&nbsp;kHz) and the current frame is correlated to a signal delayed
*a58d3d2aSXin Liby a range of lags, starting from a shortest lag corresponding to
*a58d3d2aSXin Li500&nbsp;Hz, to a longest lag corresponding to 56&nbsp;Hz.</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe second stage operates on an 8&nbsp;kHz signal (downsampled from 12, 16,
*a58d3d2aSXin Lior 24&nbsp;kHz) and measures time correlations only near the lags
*a58d3d2aSXin Licorresponding to those that had sufficiently high correlations in the first
*a58d3d2aSXin Listage. The resulting correlations are adjusted for a small bias towards
*a58d3d2aSXin Lishort lags to avoid ending up with a multiple of the true pitch lag.
*a58d3d2aSXin LiThe highest adjusted correlation is compared to a threshold depending on:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhether the previous frame was classified as voiced
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe speech activity level
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe spectral tilt.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiIf the threshold is exceeded, the current frame is classified as voiced and
*a58d3d2aSXin Lithe lag with the highest adjusted correlation is stored for a final pitch
*a58d3d2aSXin Lianalysis of the highest precision in the third stage.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe last stage operates directly on the whitened input signal to compute time
*a58d3d2aSXin Licorrelations for each of the four subframes independently in a narrow range
*a58d3d2aSXin Liaround the lag with highest correlation from the second stage.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
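<t>
The core operation shared by all three stages, a search for the lag with the
highest time correlation, can be sketched as follows.
This is a generic single-stage correlator with a hypothetical name and a
simple normalization chosen for the example; the staged search, the short-lag
bias, and the voicing threshold described above are not shown.
</t>
<figure>
<artwork><![CDATA[
```c
/* Return the lag in [min_lag, max_lag] that maximizes the squared
   correlation between the frame x[start .. start+len-1] and the same
   frame delayed by lag, normalized by the delayed frame's energy.
   Requires start >= max_lag. Ties keep the shortest lag. */
int best_lag(const double *x, int start, int len, int min_lag, int max_lag)
{
    int best = min_lag;
    double best_score = -1.0;
    int lag;
    for (lag = min_lag; lag <= max_lag; lag++) {
        double xc = 0.0;   /* cross-correlation at this lag */
        double e = 0.0;    /* energy of the delayed frame */
        double score;
        int i;
        for (i = start; i < start + len; i++) {
            xc += x[i] * x[i - lag];
            e += x[i - lag] * x[i - lag];
        }
        /* negative correlations are not treated as pitch candidates */
        score = (xc > 0.0 && e > 0.0) ? xc * xc / e : 0.0;
        if (score > best_score) {
            best_score = score;
            best = lag;
        }
    }
    return best;
}
```
]]></artwork>
</figure>
<t>
At 4&nbsp;kHz, the range of 500&nbsp;Hz to 56&nbsp;Hz corresponds to lags of
roughly 8 to 71 samples, so a first-stage search would call this with
min_lag&nbsp;=&nbsp;8 and max_lag&nbsp;=&nbsp;71.
</t>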
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Noise Shaping Analysis' anchor='noise_shaping_analysis_overview_section'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe noise shaping analysis finds gains and filter coefficients used in the
*a58d3d2aSXin Liprefilter and noise shaping quantizer. These parameters are chosen such that
*a58d3d2aSXin Lithey will fulfill several requirements:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiBalancing quantization noise and bitrate.
*a58d3d2aSXin LiThe quantization gains determine the step size between reconstruction levels
*a58d3d2aSXin Liof the excitation signal. Therefore, increasing the quantization gain
*a58d3d2aSXin Liamplifies quantization noise, but also reduces the bitrate by lowering
*a58d3d2aSXin Lithe entropy of the quantization indices.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSpectral shaping of the quantization noise; the noise shaping quantizer is
*a58d3d2aSXin Licapable of reducing quantization noise in some parts of the spectrum at the
*a58d3d2aSXin Licost of increased noise in other parts without substantially changing the
*a58d3d2aSXin Libitrate.
*a58d3d2aSXin LiBy shaping the noise such that it follows the signal spectrum, it becomes
*a58d3d2aSXin Liless audible. In practice, best results are obtained by making the shape
*a58d3d2aSXin Liof the noise spectrum slightly flatter than the signal spectrum.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiDe-emphasizing spectral valleys; by using different coefficients in the
*a58d3d2aSXin Lianalysis and synthesis part of the prefilter and noise shaping quantizer,
*a58d3d2aSXin Lithe levels of the spectral valleys can be decreased relative to the levels
*a58d3d2aSXin Liof the spectral peaks such as speech formants and harmonics.
*a58d3d2aSXin LiThis reduces the entropy of the signal, which is the difference between the
*a58d3d2aSXin Licoded signal and the quantization noise, thus lowering the bitrate.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiMatching the levels of the decoded speech formants to the levels of the
*a58d3d2aSXin Lioriginal speech formants; an adjustment gain and a first order tilt
*a58d3d2aSXin Licoefficient are computed to compensate for the effect of the noise
*a58d3d2aSXin Lishaping quantization on the level and spectral tilt.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Li<figure align="center" anchor="noise_shape_analysis_spectra_figure"
*a58d3d2aSXin Li title="Noise shaping and spectral de-emphasis illustration">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li  / \   ___
*a58d3d2aSXin Li   |   // \\
*a58d3d2aSXin Li   |  //   \\     ____
*a58d3d2aSXin Li   |_//     \\___//  \\         ____
*a58d3d2aSXin Li   | /  ___  \   /    \\       //  \\
*a58d3d2aSXin Li P |/  /   \  \_/      \\_____//    \\
*a58d3d2aSXin Li o |  /     \     ____  \     /      \\
*a58d3d2aSXin Li w | /       \___/    \  \___/  ____  \\___ 1
*a58d3d2aSXin Li e |/                  \       /    \  \
*a58d3d2aSXin Li r |                    \_____/      \  \__ 2
*a58d3d2aSXin Li   |                                  \
*a58d3d2aSXin Li   |                                   \___ 3
*a58d3d2aSXin Li   |
*a58d3d2aSXin Li   +---------------------------------------->
*a58d3d2aSXin Li                    Frequency
*a58d3d2aSXin Li
*a58d3d2aSXin Li1: Input signal spectrum
*a58d3d2aSXin Li2: De-emphasized and level matched spectrum
*a58d3d2aSXin Li3: Quantization noise spectrum
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li<xref target='noise_shape_analysis_spectra_figure' /> shows an example of an
*a58d3d2aSXin Liinput signal spectrum (1).
*a58d3d2aSXin LiAfter de-emphasis and level matching, the spectrum has deeper valleys (2).
*a58d3d2aSXin LiThe quantization noise spectrum (3) more or less follows the input signal
*a58d3d2aSXin Lispectrum, while having slightly less pronounced peaks.
*a58d3d2aSXin LiThe entropy, which provides a lower bound on the bitrate for encoding the
*a58d3d2aSXin Liexcitation signal, is proportional to the area between the de-emphasized
spectrum (2) and the quantization noise spectrum (3). Without de-emphasis,
the entropy is proportional to the area between the input spectrum (1) and
the quantization noise spectrum (3), which is clearly larger.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe transformation from input signal to de-emphasized signal can be
*a58d3d2aSXin Lidescribed as a filtering operation with a filter
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li                           -1    Wana(z)
*a58d3d2aSXin LiH(z) = G * ( 1 - c_tilt * z  ) * -------
*a58d3d2aSXin Li                                 Wsyn(z),
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Lihaving an adjustment gain G, a first order tilt adjustment filter with
*a58d3d2aSXin Litilt coefficient c_tilt, and where
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
               16                            d
               __             -k        -L  __            -k
Wana(z) = (1 - \  a_ana(k) * z  )*(1 - z  * \ b_ana(k) * z  ),
               /_                           /_
               k=1                          k=-d
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liis the analysis part of the de-emphasis filter, consisting of the short-term
*a58d3d2aSXin Lishaping filter with coefficients a_ana(k), and the long-term shaping filter
*a58d3d2aSXin Liwith coefficients b_ana(k) and pitch lag L.
*a58d3d2aSXin LiThe parameter d determines the number of long-term shaping filter taps.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSimilarly, but without the tilt adjustment, the synthesis part can be written as
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
               16                            d
               __             -k        -L  __            -k
Wsyn(z) = (1 - \  a_syn(k) * z  )*(1 - z  * \ b_syn(k) * z  ).
               /_                           /_
               k=1                          k=-d
*a58d3d2aSXin Li            ]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
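<t>
As a rough illustration of the filter definitions above, the following
Python sketch evaluates Wana(z) or Wsyn(z), and from them H(z), at a single
normalized frequency omega (radians per sample).
The function names, the argument layout, and the use of floating-point
arithmetic are assumptions made for this example; they are not taken from
the reference implementation.
<figure>
<artwork><![CDATA[
```python
import cmath


def shaping_response(a, b, lag, omega):
    """Evaluate W(z) at z = exp(j*omega).

    a:   short-term coefficients a(1..K)
    b:   long-term taps b(-d..d), odd length (may be empty)
    lag: pitch lag L in samples
    """
    z_inv = cmath.exp(-1j * omega)
    short = 1 - sum(a_k * z_inv ** k
                    for k, a_k in enumerate(a, start=1))
    d = (len(b) - 1) // 2
    taps = sum(b_k * z_inv ** k
               for k, b_k in zip(range(-d, d + 1), b))
    return short * (1 - z_inv ** lag * taps)


def deemphasis_response(gain, c_tilt, ana, syn, lag, omega):
    """Evaluate H(z) = G * (1 - c_tilt / z) * Wana(z) / Wsyn(z).

    ana and syn are (a, b) coefficient pairs (hypothetical layout).
    """
    z_inv = cmath.exp(-1j * omega)
    w_ana = shaping_response(ana[0], ana[1], lag, omega)
    w_syn = shaping_response(syn[0], syn[1], lag, omega)
    return gain * (1 - c_tilt * z_inv) * w_ana / w_syn
```
]]></artwork>
</figure>
Taking the magnitude of the returned complex value over a grid of
frequencies gives a curve comparable to spectrum (2) relative to
spectrum (1) in the illustration.
</t>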
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAll noise shaping parameters are computed and applied per subframe of 5&nbsp;ms.
*a58d3d2aSXin LiFirst, an LPC analysis is performed on a windowed signal block of 15&nbsp;ms.
*a58d3d2aSXin LiThe signal block has a look-ahead of 5&nbsp;ms relative to the current subframe,
and the window is an asymmetric sine window. The LPC analysis uses the
autocorrelation method, with an order ranging from 8, in the lowest-complexity
mode, to 16, for best quality.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
Optionally, the LPC analysis and noise shaping filters are warped by replacing
the delay elements with first-order allpass filters.
This increases the frequency resolution at low frequencies and reduces it at
high ones, which better matches the human auditory system and improves
quality.
The warped analysis and filtering come at a cost in complexity
and are therefore only done in higher-complexity modes.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe quantization gain is found by taking the square root of the residual energy
*a58d3d2aSXin Lifrom the LPC analysis and multiplying it by a value inversely proportional
*a58d3d2aSXin Lito the coding quality control parameter and the pitch correlation.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiNext the two sets of short-term noise shaping coefficients a_ana(k) and
*a58d3d2aSXin Lia_syn(k) are obtained by applying different amounts of bandwidth expansion to the
*a58d3d2aSXin Licoefficients found in the LPC analysis.
*a58d3d2aSXin LiThis bandwidth expansion moves the roots of the LPC polynomial towards the
*a58d3d2aSXin Liorigin, using the formulas
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li                      k
*a58d3d2aSXin Li a_ana(k) = a(k)*g_ana , and
*a58d3d2aSXin Li
*a58d3d2aSXin Li                      k
*a58d3d2aSXin Li a_syn(k) = a(k)*g_syn ,
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liwhere a(k) is the k'th LPC coefficient, and the bandwidth expansion factors
*a58d3d2aSXin Lig_ana and g_syn are calculated as
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Lig_ana = 0.95 - 0.01*C, and
*a58d3d2aSXin Li
*a58d3d2aSXin Lig_syn = 0.95 + 0.01*C,
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liwhere C is the coding quality control parameter between 0 and 1.
*a58d3d2aSXin LiApplying more bandwidth expansion to the analysis part than to the synthesis
*a58d3d2aSXin Lipart gives the desired de-emphasis of spectral valleys in between formants.
*a58d3d2aSXin Li</t>
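<t>
The bandwidth expansion formulas above can be sketched in a few lines of
Python.
This is a floating-point illustration only; the function names are
hypothetical, and the list a is assumed to hold a(1) through a(K) in order.
<figure>
<artwork><![CDATA[
```python
def bandwidth_expand(a, g):
    """Scale the k-th LPC coefficient (k = 1..len(a)) by g**k."""
    return [a_k * g ** k for k, a_k in enumerate(a, start=1)]


def shaping_coefficients(a, c):
    """Return (a_ana, a_syn) for a quality parameter c in [0, 1]."""
    g_ana = 0.95 - 0.01 * c
    g_syn = 0.95 + 0.01 * c
    return bandwidth_expand(a, g_ana), bandwidth_expand(a, g_syn)
```
]]></artwork>
</figure>
Because g_ana is never larger than g_syn, each analysis coefficient is at
most as large in magnitude as the corresponding synthesis coefficient.
</t>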
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe long-term shaping is applied only during voiced frames.
*a58d3d2aSXin LiIt uses three filter taps, described by
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li  <![CDATA[
*a58d3d2aSXin Lib_ana = F_ana * [0.25, 0.5, 0.25], and
*a58d3d2aSXin Li
*a58d3d2aSXin Lib_syn = F_syn * [0.25, 0.5, 0.25].
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin LiFor unvoiced frames these coefficients are set to 0. The multiplication factors
*a58d3d2aSXin LiF_ana and F_syn are chosen between 0 and 1, depending on the coding quality
*a58d3d2aSXin Licontrol parameter, as well as the calculated pitch correlation and smoothed
*a58d3d2aSXin Lisubband SNR of the lowest subband. By having F_ana less than F_syn,
*a58d3d2aSXin Lithe pitch harmonics are emphasized relative to the valleys in between the
*a58d3d2aSXin Liharmonics.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
For unvoiced frames, the tilt coefficient c_tilt is chosen as
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Lic_tilt = 0.25,
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liand as
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Lic_tilt = 0.25 + 0.2625 * V
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Lifor voiced frames, where V is the voice activity level between 0 and 1.
*a58d3d2aSXin Li</t>
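<t>
The dependence of the long-term shaping taps and the tilt coefficient on
the frame type can be summarized in a short Python sketch.
The function names are hypothetical, and the multiplication factor f
(standing for F_ana or F_syn) is taken as an input because its derivation
is not given in closed form here.
<figure>
<artwork><![CDATA[
```python
def long_term_taps(voiced, f):
    """Three long-term shaping taps; all zero for unvoiced frames."""
    if not voiced:
        return [0.0, 0.0, 0.0]
    return [f * 0.25, f * 0.5, f * 0.25]


def tilt_coefficient(voiced, v):
    """c_tilt for a voice activity level v in [0, 1]."""
    if not voiced:
        return 0.25
    return 0.25 + 0.2625 * v
```
]]></artwork>
</figure>
</t>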
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe adjustment gain G serves to correct any level mismatch between the original
*a58d3d2aSXin Liand decoded signals that might arise from the noise shaping and de-emphasis.
*a58d3d2aSXin LiThis gain is computed as the ratio of the prediction gain of the short-term
*a58d3d2aSXin Lianalysis and synthesis filter coefficients. The prediction gain of an LPC
*a58d3d2aSXin Lisynthesis filter is the square root of the output energy when the filter is
*a58d3d2aSXin Liexcited by a unit-energy impulse on the input.
*a58d3d2aSXin LiAn efficient way to compute the prediction gain is by first computing the
*a58d3d2aSXin Lireflection coefficients from the LPC coefficients through the step-down
*a58d3d2aSXin Lialgorithm, and extracting the prediction gain from the reflection coefficients
*a58d3d2aSXin Lias
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li               K
*a58d3d2aSXin Li              ___          2  -0.5
*a58d3d2aSXin Li predGain = ( | | 1 - (r_k)  )    ,
*a58d3d2aSXin Li              k=1
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liwhere r_k is the k'th reflection coefficient.
*a58d3d2aSXin Li</t>
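<t>
A minimal floating-point sketch of this computation is given here in
Python.
It assumes the sign convention A(z) = 1 - sum of a(k) * z^(-k), matching
the short-term part of the shaping filters, and uses the textbook step-down
recursion; the function names are hypothetical, and the fixed-point details
of the reference implementation are not reproduced.
<figure>
<artwork><![CDATA[
```python
import math


def reflection_coefficients(a):
    """Step-down recursion from LPC to reflection coefficients.

    a[k-1] holds a(k) of A(z) = 1 - sum_k a(k) * z^(-k).
    """
    cur = list(a)
    refl = [0.0] * len(cur)
    for m in range(len(cur), 0, -1):
        r = cur[m - 1]          # last coefficient of the order-m set
        if abs(r) >= 1.0:
            raise ValueError("filter is not stable")
        refl[m - 1] = r
        denom = 1.0 - r * r
        # Reduce the order-m predictor to order m - 1.
        cur = [(cur[i] + r * cur[m - 2 - i]) / denom
               for i in range(m - 1)]
    return refl


def prediction_gain(a):
    """(prod_k (1 - r_k^2)) ** -0.5 for the synthesis filter 1/A(z)."""
    prod = 1.0
    for r in reflection_coefficients(a):
        prod *= 1.0 - r * r
    return 1.0 / math.sqrt(prod)
```
]]></artwork>
</figure>
For the first-order example a = [0.5], the impulse response of 1/A(z) is
0.5^n with energy 1/(1 - 0.25), so the prediction gain is the square root
of 4/3, which is what prediction_gain([0.5]) evaluates.
</t>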
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiInitial values for the quantization gains are computed as the square-root of
*a58d3d2aSXin Lithe residual energy of the LPC analysis, adjusted by the coding quality control
*a58d3d2aSXin Liparameter.
*a58d3d2aSXin LiThese quantization gains are later adjusted based on the results of the
*a58d3d2aSXin Liprediction analysis.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Prediction Analysis' anchor='pred_ana_overview_section'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe prediction analysis is performed in one of two ways depending on how
*a58d3d2aSXin Lithe pitch estimator classified the frame.
*a58d3d2aSXin LiThe processing for voiced and unvoiced speech is described in
*a58d3d2aSXin Li<xref target='pred_ana_voiced_overview_section' /> and
*a58d3d2aSXin Li  <xref target='pred_ana_unvoiced_overview_section' />, respectively.
*a58d3d2aSXin Li  Inputs to this function include the pre-whitened signal from the
*a58d3d2aSXin Li  pitch estimator (see <xref target='pitch_estimator_overview_section'/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Voiced Speech' anchor='pred_ana_voiced_overview_section'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Li  For a frame of voiced speech the pitch pulses will remain dominant in the
*a58d3d2aSXin Li  pre-whitened input signal.
*a58d3d2aSXin Li  Further whitening is desirable as it leads to higher quality at the same
*a58d3d2aSXin Li  available bitrate.
*a58d3d2aSXin Li  To achieve this, a Long-Term Prediction (LTP) analysis is carried out to
*a58d3d2aSXin Li  estimate the coefficients of a fifth-order LTP filter for each of four
*a58d3d2aSXin Li  subframes.
*a58d3d2aSXin Li  The LTP coefficients are quantized using the method described in
*a58d3d2aSXin Li  <xref target='ltp_quantizer_overview_section'/>, and the quantized LTP
*a58d3d2aSXin Li  coefficients are used to compute the LTP residual signal.
*a58d3d2aSXin Li  This LTP residual signal is the input to an LPC analysis where the LPC coefficients are
*a58d3d2aSXin Li  estimated using Burg's method <xref target="Burg"/>, such that the residual energy is minimized.
*a58d3d2aSXin Li  The estimated LPC coefficients are converted to a Line Spectral Frequency (LSF) vector
*a58d3d2aSXin Li  and quantized as described in <xref target='lsf_quantizer_overview_section'/>.
*a58d3d2aSXin LiAfter quantization, the quantized LSF vector is converted back to LPC
*a58d3d2aSXin Licoefficients using the full procedure in <xref target="silk_nlsfs"/>.
*a58d3d2aSXin LiBy using quantized LTP coefficients and LPC coefficients derived from the
*a58d3d2aSXin Liquantized LSF coefficients, the encoder remains fully synchronized with the
*a58d3d2aSXin Lidecoder.
*a58d3d2aSXin LiThe quantized LPC and LTP coefficients are also used to filter the input
*a58d3d2aSXin Lisignal and measure residual energy for each of the four subframes.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li<section title='Unvoiced Speech' anchor='pred_ana_unvoiced_overview_section'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor a speech signal that has been classified as unvoiced, there is no need
*a58d3d2aSXin Lifor LTP filtering, as it has already been determined that the pre-whitened
*a58d3d2aSXin Liinput signal is not periodic enough within the allowed pitch period range
*a58d3d2aSXin Lifor LTP analysis to be worth the cost in terms of complexity and bitrate.
*a58d3d2aSXin LiThe pre-whitened input signal is therefore discarded, and instead the input
*a58d3d2aSXin Lisignal is used for LPC analysis using Burg's method.
*a58d3d2aSXin LiThe resulting LPC coefficients are converted to an LSF vector and quantized
*a58d3d2aSXin Lias described in the following section.
*a58d3d2aSXin LiThey are then transformed back to obtain quantized LPC coefficients, which
*a58d3d2aSXin Liare then used to filter the input signal and measure residual energy for
*a58d3d2aSXin Lieach of the four subframes.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<section title="Burg's Method">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe main purpose of linear prediction in SILK is to reduce the bitrate by
*a58d3d2aSXin Liminimizing the residual energy.
*a58d3d2aSXin LiAt least at high bitrates, perceptual aspects are handled
*a58d3d2aSXin Liindependently by the noise shaping filter.
Burg's method is used because it provides higher prediction gain
than the autocorrelation method and, unlike the covariance method,
produces stable filters (provided that numerical errors do not
compromise stability). SILK's implementation of Burg's method is also
computationally faster than the covariance method.
*a58d3d2aSXin LiThe implementation of Burg's method differs from traditional
*a58d3d2aSXin Liimplementations in two aspects.
*a58d3d2aSXin LiThe first difference is that it
*a58d3d2aSXin Lioperates on autocorrelations, similar to the Schur algorithm <xref target="Schur"/>, but
*a58d3d2aSXin Liwith a simple update to the autocorrelations after finding each
*a58d3d2aSXin Lireflection coefficient to make the result identical to Burg's method.
*a58d3d2aSXin LiThis brings down the complexity of Burg's method to near that of
*a58d3d2aSXin Lithe autocorrelation method.
*a58d3d2aSXin LiThe second difference is that the signal in each subframe is scaled
*a58d3d2aSXin Liby the inverse of the residual quantization step size.  Subframes with
*a58d3d2aSXin Lia small quantization step size will on average spend more bits for a
*a58d3d2aSXin Ligiven amount of residual energy than subframes with a large step size.
*a58d3d2aSXin LiWithout scaling, Burg's method minimizes the total residual energy in
*a58d3d2aSXin Liall subframes, which doesn't necessarily minimize the total number of
*a58d3d2aSXin Libits needed for coding the quantized residual.  The residual energy
*a58d3d2aSXin Liof the scaled subframes is a better measure for that number of
*a58d3d2aSXin Libits.
*a58d3d2aSXin Li</t>
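<t>
For orientation, the following Python sketch shows the traditional
time-domain form of Burg's method, which minimizes the sum of the forward
and backward prediction error energies at each order.
It is not SILK's variant: it does not operate on autocorrelations and it
does not apply the per-subframe scaling described above.
The function name and the convention A(z) = 1 - sum of a(k) * z^(-k) are
assumptions made for this example.
<figure>
<artwork><![CDATA[
```python
def burg(x, order):
    """Textbook Burg recursion.

    Returns (a, e): predictor coefficients a(1..order) and an
    estimate of the residual energy.
    """
    n = len(x)
    f = list(x)                 # forward prediction error
    b = list(x)                 # backward prediction error
    a = []
    e = sum(v * v for v in x)
    for m in range(order):
        num = 0.0
        den = 0.0
        for t in range(m + 1, n):
            num += f[t] * b[t - 1]
            den += f[t] * f[t] + b[t - 1] * b[t - 1]
        k = 2.0 * num / den if den > 0.0 else 0.0
        new_f = f[:]
        new_b = b[:]
        for t in range(m + 1, n):
            new_f[t] = f[t] - k * b[t - 1]
            new_b[t] = b[t - 1] - k * f[t]
        f, b = new_f, new_b
        # Levinson-style update of the predictor coefficients.
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        e *= 1.0 - k * k
    return a, e
```
]]></artwork>
</figure>
Each reflection coefficient k computed this way has magnitude at most 1,
which is the property behind the stability of the resulting filters.
</t>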
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='LSF Quantization' anchor='lsf_quantizer_overview_section'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiUnlike many other speech codecs, SILK uses variable bitrate coding
*a58d3d2aSXin Lifor the LSFs.
*a58d3d2aSXin LiThis improves the average rate-distortion (R-D) tradeoff and reduces outliers.
*a58d3d2aSXin LiThe variable bitrate coding minimizes a linear combination of the weighted
*a58d3d2aSXin Liquantization errors and the bitrate.
*a58d3d2aSXin LiThe weights for the quantization errors are the Inverse
*a58d3d2aSXin LiHarmonic Mean Weighting (IHMW) function proposed by Laroia et al.
*a58d3d2aSXin Li(see <xref target="laroia-icassp" />).
*a58d3d2aSXin LiThese weights are referred to here as Laroia weights.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe LSF quantizer consists of two stages.
*a58d3d2aSXin LiThe first stage is an (unweighted) vector quantizer (VQ), with a
*a58d3d2aSXin Licodebook size of 32 vectors.
The quantization errors for the codebook vectors are sorted, and
for the N best vectors a second stage quantizer is run.
*a58d3d2aSXin LiBy varying the number N a tradeoff is made between R-D performance
*a58d3d2aSXin Liand computational efficiency.
*a58d3d2aSXin LiFor each of the N codebook vectors the Laroia weights corresponding
*a58d3d2aSXin Lito that vector (and not to the input vector) are calculated.
*a58d3d2aSXin LiThen the residual between the input LSF vector and the codebook
*a58d3d2aSXin Livector is scaled by the square roots of these Laroia weights.
*a58d3d2aSXin LiThis scaling partially normalizes error sensitivity for the
*a58d3d2aSXin Liresidual vector, so that a uniform quantizer with fixed
*a58d3d2aSXin Listep sizes can be used in the second stage without too much
*a58d3d2aSXin Liperformance loss.
*a58d3d2aSXin LiAnd by scaling with Laroia weights determined from the first-stage
*a58d3d2aSXin Licodebook vector, the process can be reversed in the decoder.
*a58d3d2aSXin Li</t>
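<t>
The first stage and the residual scaling can be sketched in Python as
follows.
The weight formula used here is the commonly cited form of the inverse
harmonic mean weighting (the sum of the reciprocals of the distances to the
two neighboring values); the border values lo and hi, the function names,
and the floating-point arithmetic are assumptions made for this example.
<figure>
<artwork><![CDATA[
```python
import math


def laroia_weights(lsf, lo=0.0, hi=math.pi):
    """Weight i is 1/(gap below lsf[i]) + 1/(gap above lsf[i])."""
    ext = [lo] + list(lsf) + [hi]
    return [1.0 / (ext[i] - ext[i - 1]) + 1.0 / (ext[i + 1] - ext[i])
            for i in range(1, len(ext) - 1)]


def n_best_first_stage(lsf, codebook, n):
    """Indices of the n codebook vectors with smallest squared error."""
    errs = [(sum((x - c) ** 2 for x, c in zip(lsf, cv)), idx)
            for idx, cv in enumerate(codebook)]
    errs.sort()
    return [idx for _, idx in errs[:n]]


def scaled_residual(lsf, cb_vec, lo=0.0, hi=math.pi):
    """Residual scaled by square roots of the codebook vector's weights."""
    w = laroia_weights(cb_vec, lo, hi)
    return [(x - c) * math.sqrt(w_i)
            for x, c, w_i in zip(lsf, cb_vec, w)]
```
]]></artwork>
</figure>
Since the weights in scaled_residual depend only on the codebook vector,
a decoder that knows the first-stage index can compute the same weights
and undo the scaling.
</t>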
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe second stage uses predictive delayed decision scalar
*a58d3d2aSXin Liquantization.
*a58d3d2aSXin LiThe quantization error is weighted by Laroia weights determined
*a58d3d2aSXin Lifrom the LSF input vector.
*a58d3d2aSXin LiThe predictor multiplies the previous quantized residual value
*a58d3d2aSXin Liby a prediction coefficient that depends on the vector index from the
*a58d3d2aSXin Lifirst stage VQ and on the location in the LSF vector.
*a58d3d2aSXin LiThe prediction is subtracted from the LSF residual value before
*a58d3d2aSXin Liquantizing the result, and added back afterwards.
*a58d3d2aSXin LiThis subtraction can be interpreted as shifting the quantization levels
*a58d3d2aSXin Liof the scalar quantizer, and as a result the quantization error of
*a58d3d2aSXin Lieach value depends on the quantization decision of the previous value.
*a58d3d2aSXin LiThis dependency is exploited by the delayed decision mechanism to
search for the quantization sequence with the best R-D performance
using a Viterbi-like algorithm <xref target="Viterbi"/>.
*a58d3d2aSXin LiThe quantizer processes the residual LSF vector in reverse order
*a58d3d2aSXin Li(i.e., it starts with the highest residual LSF value).
*a58d3d2aSXin LiThis is done because the prediction works slightly
*a58d3d2aSXin Libetter in the reverse direction.
*a58d3d2aSXin Li</t>
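<t>
The prediction structure of the second stage can be illustrated with a
simplified Python sketch.
This version makes a greedy rounding decision for each value and therefore
omits the delayed decision search and the rate term; the step size, the
per-position prediction coefficients, and the function names are
hypothetical.
<figure>
<artwork><![CDATA[
```python
def predictive_scalar_quantize(res, pred_coef, step):
    """Greedy predictive scalar quantizer, highest index first.

    Returns (indices, recon), where recon holds the quantized
    residual values.
    """
    n = len(res)
    indices = [0] * n
    recon = [0.0] * n
    prev = 0.0
    for i in range(n - 1, -1, -1):
        pred = pred_coef[i] * prev
        idx = int(round((res[i] - pred) / step))
        recon[i] = pred + idx * step   # prediction added back
        indices[i] = idx
        prev = recon[i]
    return indices, recon


def predictive_dequantize(indices, pred_coef, step):
    """Rebuild the quantized residual from the indices alone."""
    n = len(indices)
    recon = [0.0] * n
    prev = 0.0
    for i in range(n - 1, -1, -1):
        pred = pred_coef[i] * prev
        recon[i] = pred + indices[i] * step
        prev = recon[i]
    return recon
```
]]></artwork>
</figure>
Because the prediction uses only previously quantized values, the second
function reproduces the encoder's reconstruction exactly from the indices.
</t>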
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe quantization index of the first stage is entropy coded.
*a58d3d2aSXin LiThe quantization sequence from the second stage is also entropy
*a58d3d2aSXin Licoded, where for each element the probability table is chosen
*a58d3d2aSXin Lidepending on the vector index from the first stage and the location
*a58d3d2aSXin Liof that element in the LSF vector.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='LSF Stabilization' anchor='lsf_stabilizer_overview_section'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf the input is stable, finding the best candidate usually results in a
*a58d3d2aSXin Liquantized vector that is also stable. Because of the two-stage approach,
*a58d3d2aSXin Lihowever, it is possible that the best quantization candidate is unstable.
The encoder applies the same stabilization procedure applied by the decoder
 (see <xref target="silk_nlsf_stabilization"/>) to ensure the LSF parameters
 are within their valid range, sorted in increasing order, and have minimum
 distances between each other and the border values.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='LTP Quantization' anchor='ltp_quantizer_overview_section'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor voiced frames, the prediction analysis described in
*a58d3d2aSXin Li<xref target='pred_ana_voiced_overview_section' /> resulted in four sets
*a58d3d2aSXin Li(one set per subframe) of five LTP coefficients, plus four weighting matrices.
*a58d3d2aSXin LiThe LTP coefficients for each subframe are quantized using entropy constrained
*a58d3d2aSXin Livector quantization.
*a58d3d2aSXin LiA total of three vector codebooks are available for quantization, with
*a58d3d2aSXin Lidifferent rate-distortion trade-offs. The three codebooks have 10, 20, and
*a58d3d2aSXin Li40 vectors and average rates of about 3, 4, and 5 bits per vector, respectively.
*a58d3d2aSXin LiConsequently, the first codebook has larger average quantization distortion at
*a58d3d2aSXin Lia lower rate, whereas the last codebook has smaller average quantization
*a58d3d2aSXin Lidistortion at a higher rate.
*a58d3d2aSXin LiGiven the weighting matrix W_ltp and LTP vector b, the weighted rate-distortion
measure for a codebook vector cb_i with rate r_i is given by
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center">
*a58d3d2aSXin Li<![CDATA[
*a58d3d2aSXin Li RD = u * (b - cb_i)' * W_ltp * (b - cb_i) + r_i,
*a58d3d2aSXin Li]]>
*a58d3d2aSXin Li</artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liwhere u is a fixed, heuristically-determined parameter balancing the distortion
*a58d3d2aSXin Liand rate.
*a58d3d2aSXin LiWhich codebook gives the best performance for a given LTP vector depends on the
*a58d3d2aSXin Liweighting matrix for that LTP vector.
*a58d3d2aSXin LiFor example, for a low valued W_ltp, it is advantageous to use the codebook
*a58d3d2aSXin Liwith 10 vectors as it has a lower average rate.
*a58d3d2aSXin LiFor a large W_ltp, on the other hand, it is often better to use the codebook
*a58d3d2aSXin Liwith 40 vectors, as it is more likely to contain the best codebook vector.
*a58d3d2aSXin LiThe weighting matrix W_ltp depends mostly on two aspects of the input signal.
*a58d3d2aSXin LiThe first is the periodicity of the signal; the more periodic, the larger W_ltp.
*a58d3d2aSXin LiThe second is the change in signal energy in the current subframe, relative to
*a58d3d2aSXin Lithe signal one pitch lag earlier.
*a58d3d2aSXin LiA decaying energy leads to a larger W_ltp than an increasing energy.
Both aspects fluctuate relatively slowly, so the W_ltp matrices for the
different subframes of one frame are often similar.
*a58d3d2aSXin LiBecause of this, one of the three codebooks typically gives good performance
*a58d3d2aSXin Lifor all subframes, and therefore the codebook search for the subframe LTP
*a58d3d2aSXin Livectors is constrained to only allow codebook vectors to be chosen from the
*a58d3d2aSXin Lisame codebook, resulting in a rate reduction.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTo find the best codebook, each of the three vector codebooks is
*a58d3d2aSXin Liused to quantize all subframe LTP vectors and produce a combined
*a58d3d2aSXin Liweighted rate-distortion measure for each vector codebook.
*a58d3d2aSXin LiThe vector codebook with the lowest combined rate-distortion
*a58d3d2aSXin Liover all subframes is chosen. The quantized LTP vectors are used
*a58d3d2aSXin Liin the noise shaping quantizer, and the index of the codebook
*a58d3d2aSXin Liplus the four indices for the four subframe codebook vectors
*a58d3d2aSXin Liare passed on to the range encoder.
*a58d3d2aSXin Li</t>
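<t>
The codebook selection described above can be sketched in Python as an
exhaustive search.
The data layout (lists of vectors, one list of rates per codebook, W_ltp
as a nested list) and the function names are assumptions made for this
example, and any codebook contents used with it are placeholders rather
than the actual SILK tables.
<figure>
<artwork><![CDATA[
```python
def rd_cost(b, w, cb_vec, rate, u):
    """u * (b - cb)' * W * (b - cb) + rate."""
    d = [x - c for x, c in zip(b, cb_vec)]
    dist = sum(d[i] * w[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))
    return u * dist + rate


def quantize_subframe(b, w, codebook, rates, u):
    """Return (cost, index) of the best vector in one codebook."""
    costs = [rd_cost(b, w, cv, r, u)
             for cv, r in zip(codebook, rates)]
    best = min(range(len(costs)), key=costs.__getitem__)
    return costs[best], best


def choose_codebook(bs, ws, codebooks, rates, u):
    """Pick one codebook for all subframes of a frame.

    Returns (codebook index, per-subframe vector indices).
    """
    best = None
    for c, (codebook, cb_rates) in enumerate(zip(codebooks, rates)):
        picks = [quantize_subframe(b, w, codebook, cb_rates, u)
                 for b, w in zip(bs, ws)]
        total = sum(cost for cost, _ in picks)
        if best is None or total < best[0]:
            best = (total, c, [i for _, i in picks])
    return best[1], best[2]
```
]]></artwork>
</figure>
With a small W_ltp the rate term dominates and the search tends to pick
the low-rate codebook; with a large W_ltp the distortion term dominates and
the larger codebook tends to win.
</t>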
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Prefilter'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIn the prefilter the input signal is filtered using the spectral valley
*a58d3d2aSXin Lide-emphasis filter coefficients from the noise shaping analysis
*a58d3d2aSXin Li(see <xref target='noise_shaping_analysis_overview_section'/>).
Applying only the noise shaping analysis filter to the input signal
yields the input to the noise shaping quantizer.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Noise Shaping Quantizer'>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe noise shaping quantizer independently shapes the signal and coding noise
*a58d3d2aSXin Lispectra to obtain a perceptually higher quality at the same bitrate.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
The prefilter output signal is multiplied by a compensation gain G computed
in the noise shaping analysis. Then the output of a synthesis shaping filter
is added, and the output of a prediction filter is subtracted to create a
residual signal.
The residual signal is multiplied by the inverse of the quantized quantization
gain from the noise shaping analysis, and input to a scalar quantizer.
*a58d3d2aSXin LiThe quantization indices of the scalar quantizer represent a signal of pulses
*a58d3d2aSXin Lithat is input to the pyramid range encoder.
*a58d3d2aSXin LiThe scalar quantizer also outputs a quantization signal, which is multiplied
*a58d3d2aSXin Liby the quantized quantization gain from the noise shaping analysis to create
*a58d3d2aSXin Lian excitation signal.
*a58d3d2aSXin LiThe output of the prediction filter is added to the excitation signal to form
*a58d3d2aSXin Lithe quantized output signal y(n).
*a58d3d2aSXin LiThe quantized output signal y(n) is input to the synthesis shaping and
*a58d3d2aSXin Liprediction filters.
*a58d3d2aSXin Li</t>
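<t>
The signal flow in the preceding paragraph can be written out as a
per-sample loop.
The Python sketch here is a literal, simplified reading of that paragraph:
it uses plain rounding as the scalar quantizer, treats the shaping and
prediction filters as short FIR filters on past output samples, and leaves
out long-term prediction, dithering, and delayed decision.
All names and the filter layout are hypothetical.
<figure>
<artwork><![CDATA[
```python
def noise_shaping_quantize(x, comp_gain, q_gain, shape, pred):
    """Quantize prefiltered samples x; return (pulses, y).

    shape, pred: coefficients applied to y(n-1), y(n-2), ...
    """
    y = []
    pulses = []
    for sample in x:
        past = y[::-1]                      # past[0] is y(n-1)
        shp = sum(c * v for c, v in zip(shape, past))
        prd = sum(c * v for c, v in zip(pred, past))
        residual = comp_gain * sample + shp - prd
        q = int(round(residual / q_gain))   # scalar quantizer
        excitation = q * q_gain
        pulses.append(q)
        y.append(prd + excitation)          # quantized output y(n)
    return pulses, y
```
]]></artwork>
</figure>
The list pulses corresponds to the signal of pulses passed to the range
encoder, and y to the quantized output signal fed back into both filters.
</t>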
*a58d3d2aSXin Li<t>
Optionally, the noise shaping quantizer operates in a delayed decision
*a58d3d2aSXin Limode.
*a58d3d2aSXin LiIn this mode it uses a Viterbi algorithm to keep track of
*a58d3d2aSXin Limultiple rounding choices in the quantizer and select the best
*a58d3d2aSXin Lione after a delay of 32 samples.  This improves the rate/distortion
*a58d3d2aSXin Liperformance of the quantizer.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title='Constant Bitrate Mode'>
*a58d3d2aSXin Li<t>
  SILK was designed to run in Variable Bitrate (VBR) mode.  However,
*a58d3d2aSXin Li  the reference implementation also has a Constant Bitrate (CBR) mode
*a58d3d2aSXin Li  for SILK.  In CBR mode SILK will attempt to encode each packet with
*a58d3d2aSXin Li  no more than the allowed number of bits.  The Opus wrapper code
*a58d3d2aSXin Li  then pads the bitstream if any unused bits are left in SILK mode, or
*a58d3d2aSXin Li  encodes the high band with the remaining number of bits in Hybrid mode.
*a58d3d2aSXin Li  The number of payload bits is adjusted by changing
*a58d3d2aSXin Li  the quantization gains and the rate/distortion tradeoff in the noise
*a58d3d2aSXin Li  shaping quantizer, in an iterative loop
*a58d3d2aSXin Li  around the noise shaping quantizer and entropy coding.
*a58d3d2aSXin Li  Compared to the SILK VBR mode, the CBR mode has lower
*a58d3d2aSXin Li  audio quality at a given average bitrate, and also has higher
*a58d3d2aSXin Li  computational complexity.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="CELT Encoder">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiMost of the aspects of the CELT encoder can be directly derived from the description
*a58d3d2aSXin Liof the decoder. For example, the filters and rotations in the encoder are simply the
*a58d3d2aSXin Liinverse of the operation performed by the decoder. Similarly, the quantizers generally
*a58d3d2aSXin Lioptimize for the mean square error (because noise shaping is part of the bit-stream itself),
*a58d3d2aSXin Liso no special search is required. For this reason, only the less straightforward aspects of the
*a58d3d2aSXin Liencoder are described here.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="pitch-prefilter" title="Pitch Prefilter">
*a58d3d2aSXin Li<t>The pitch prefilter is applied after the pre-emphasis. It is applied
*a58d3d2aSXin Liin such a way as to be the inverse of the decoder's post-filter. The main non-obvious aspect of the
*a58d3d2aSXin Liprefilter is the selection of the pitch period. The pitch search should be optimized for the
*a58d3d2aSXin Lifollowing criteria:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>continuity: it is important that the pitch period
*a58d3d2aSXin Lidoes not change abruptly between frames; and</t>
<t>avoidance of pitch multiples: when the period used is a multiple of the real period
(lower frequency fundamental), the post-filter loses most of its ability to reduce noise.</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="normalization" title="Bands and Normalization">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe MDCT output is divided into bands that are designed to match the ear's critical
*a58d3d2aSXin Libands for the smallest (2.5&nbsp;ms) frame size. The larger frame sizes use integer
*a58d3d2aSXin Limultiples of the 2.5&nbsp;ms layout. For each band, the encoder
*a58d3d2aSXin Licomputes the energy that will later be encoded. Each band is then normalized by the
*a58d3d2aSXin Lisquare root of the <spanx style="strong">unquantized</spanx> energy, such that each band now forms a unit vector X.
*a58d3d2aSXin LiThe energy and the normalization are computed by compute_band_energies()
*a58d3d2aSXin Liand normalise_bands() (bands.c), respectively.
*a58d3d2aSXin Li</t>
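<t>
A floating-point sketch of the per-band energy computation and
normalization is shown here in Python.
It is not the code of compute_band_energies() or normalise_bands(); the
function names, the representation of the band layout as a list of edge
indices, and the handling of all-zero bands are assumptions made for this
example.
<figure>
<artwork><![CDATA[
```python
import math


def band_energies(x, edges):
    """Energy (sum of squares) of each band edges[i]..edges[i+1]."""
    return [sum(v * v for v in x[lo:hi])
            for lo, hi in zip(edges, edges[1:])]


def normalize_bands(x, edges):
    """Divide each band by the square root of its energy."""
    out = list(x)
    for lo, hi in zip(edges, edges[1:]):
        norm = math.sqrt(sum(v * v for v in x[lo:hi]))
        if norm > 0.0:          # leave all-zero bands untouched
            out[lo:hi] = [v / norm for v in x[lo:hi]]
    return out
```
]]></artwork>
</figure>
After normalization, every band with nonzero energy has unit Euclidean
norm, which is the unit vector X referred to above.
</t>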
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="energy-quantization" title="Energy Envelope Quantization">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiEnergy quantization (both coarse and fine) can be easily understood from the decoding process.
*a58d3d2aSXin LiFor all useful bitrates, the coarse quantizer always chooses the quantized log energy value that
minimizes the error for each band. Only at very low rates does the encoder allow larger errors to
minimize the rate and avoid using more bits than are available. When the
available CPU resources allow it, it is best to try encoding the coarse energy both with and without
inter-frame prediction such that the best prediction mode can be selected. The optimal mode depends on
the coding rate, the available bitrate, and the current rate of packet loss.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The fine energy quantizer always chooses the quantized log energy value that
*a58d3d2aSXin Liminimizes the error for each band because the rate of the fine quantization depends only
*a58d3d2aSXin Lion the bit allocation and not on the values that are coded.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section> <!-- Energy quant -->
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Bit Allocation">
*a58d3d2aSXin Li<t>The encoder must use exactly the same bit allocation process as used by the decoder
*a58d3d2aSXin Liand described in <xref target="allocation"/>. The three mechanisms that can be used by the
*a58d3d2aSXin Liencoder to adjust the bitrate on a frame-by-frame basis are band boost, allocation trim,
*a58d3d2aSXin Liand band skipping.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Band Boost">
*a58d3d2aSXin Li<t>The reference encoder makes a decision to boost a band when the energy of that band is significantly
higher than that of the neighboring bands. Let E_j be the log-energy of band j; we define
<list>
<t>D_j = 2*E_j - E_(j-1) - E_(j+1) </t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li
*a58d3d2aSXin LiThe allocation of band j is boosted once if D_j &gt; t1 and twice if D_j &gt; t2. For LM&gt;=1, t1=2 and t2=4,
*a58d3d2aSXin Liwhile for LM&lt;1, t1=3 and t2=5.
*a58d3d2aSXin Li</t>
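<t>
As an illustration only, the boost rule above can be sketched in Python as follows.
The function name band_boosts() is hypothetical and is not part of the reference
implementation. The sketch also assumes that the first and last bands, which have only
one neighbor, are never boosted; the text above does not specify their handling.
</t>
<figure>
<artwork><![CDATA[
```python
def band_boosts(log_energy, lm):
    """Return the number of boosts (0, 1 or 2) for each band."""
    # Thresholds from the text: (2, 4) for LM >= 1, (3, 5) for LM < 1.
    t1, t2 = (2.0, 4.0) if lm >= 1 else (3.0, 5.0)
    boosts = [0] * len(log_energy)
    # Assumption: edge bands lack one neighbor and are left unboosted.
    for j in range(1, len(log_energy) - 1):
        d = 2.0 * log_energy[j] - log_energy[j - 1] - log_energy[j + 1]
        if d > t2:
            boosts[j] = 2
        elif d > t1:
            boosts[j] = 1
    return boosts
```
]]></artwork>
</figure>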
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Allocation Trim">
<t>The allocation trim is a value between 0 and 10 (inclusive) that controls the allocation
balance between the low and high frequencies. The encoder starts with a safe "default" of 5
and deviates from that default in two different ways. First, the trim can deviate by +/- 2
depending on the spectral tilt of the input signal. For signals with more low frequencies, the
trim is increased by up to 2, while for signals with more high frequencies, the trim is
decreased by up to 2.
Second, for stereo inputs, the trim value can
be decreased by up to 4 when the inter-channel correlation at low frequency (first 8 bands)
is high. </t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Band Skipping">
*a58d3d2aSXin Li<t>The encoder uses band skipping to ensure that the shape of the bands is only coded
*a58d3d2aSXin Liif there is at least 1/2 bit per sample available for the PVQ. If not, then no bit is allocated
*a58d3d2aSXin Liand folding is used instead. To ensure continuity in the allocation, some amount of hysteresis is
*a58d3d2aSXin Liadded to the process, such that a band that received PVQ bits in the previous frame only needs 7/16
*a58d3d2aSXin Libit/sample to be coded for the current frame, while a band that did not receive PVQ bits in the
previous frame needs at least 9/16 bit/sample to be coded.</t>
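<t>
The hysteresis can be illustrated with the following Python sketch. The function and
parameter names are hypothetical, and the sketch assumes that a band whose allocation is
exactly equal to the threshold is coded.
</t>
<figure>
<artwork><![CDATA[
```python
def code_band_shape(pvq_bits, num_samples, coded_in_previous_frame):
    """Decide whether the shape of a band is coded with PVQ."""
    # 7/16 bit/sample if the band had PVQ bits in the previous
    # frame, 9/16 bit/sample otherwise (hysteresis around 1/2).
    sixteenths = 7 if coded_in_previous_frame else 9
    # Cross-multiplied form of pvq_bits/num_samples >= sixteenths/16.
    return 16 * pvq_bits >= sixteenths * num_samples
```
]]></artwork>
</figure>
<t>
For example, a band of 8 samples with 4 bits available (1/2 bit per sample) keeps its
PVQ allocation if it was coded in the previous frame, but is skipped otherwise.
</t>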
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Stereo Decisions">
<t>Because CELT applies mid-side stereo coupling in the normalized domain, it does not suffer from
significant stereo image problems even when the two channels are completely uncorrelated. For this reason
*a58d3d2aSXin Liit is always safe to use stereo coupling on any audio frame. That being said, there are some frames
*a58d3d2aSXin Lifor which dual (independent) stereo is still more efficient. This decision is made by comparing the estimated
*a58d3d2aSXin Lientropy with and without coupling over the first 13 bands, taking into account the fact that all bands with
*a58d3d2aSXin Limore than two MDCT bins require one extra degree of freedom when coded in mid-side. Let L1_ms and L1_lr
*a58d3d2aSXin Libe the L1-norm of the mid-side vector and the L1-norm of the left-right vector, respectively. The decision
*a58d3d2aSXin Lito use mid-side is made if and only if
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li L1_ms          L1_lr
*a58d3d2aSXin Li--------    <   -----
*a58d3d2aSXin Libins + E        bins
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Liwhere bins is the number of MDCT bins in the first 13 bands and E is the number of extra degrees of
*a58d3d2aSXin Lifreedom for mid-side coding. For LM>1, E=13, otherwise E=5.
*a58d3d2aSXin Li</t>
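<t>
The comparison above can be sketched in Python as follows. The function name is
hypothetical, and the L1-norms are taken as inputs because how the reference encoder
forms the mid-side vector is not detailed here.
</t>
<figure>
<artwork><![CDATA[
```python
def use_mid_side(l1_ms, l1_lr, bins, lm):
    """Return True if mid-side coding is chosen over dual stereo."""
    # Extra degrees of freedom for mid-side coding, from the text.
    extra = 13 if lm > 1 else 5
    # Cross-multiplied form of L1_ms/(bins + E) < L1_lr/bins,
    # valid because bins and bins + E are both positive.
    return l1_ms * bins < l1_lr * (bins + extra)
```
]]></artwork>
</figure>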
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The reference encoder decides on the intensity stereo threshold based on the bitrate alone. After
*a58d3d2aSXin Litaking into account the frame size by subtracting 80 bits per frame for coarse energy, the first
*a58d3d2aSXin Liband using intensity coding is as follows:
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<texttable anchor="intensity-thresholds"
*a58d3d2aSXin Li title="Thresholds for Intensity Stereo">
*a58d3d2aSXin Li<ttcol align='center'>bitrate (kb/s)</ttcol>
*a58d3d2aSXin Li<ttcol align='center'>start band</ttcol>
*a58d3d2aSXin Li<c>&lt;35</c>      <c>8</c>
*a58d3d2aSXin Li<c>35-50</c>      <c>12</c>
*a58d3d2aSXin Li<c>50-68</c>      <c>16</c>
<c>68-84</c>      <c>18</c>
*a58d3d2aSXin Li<c>84-102</c>     <c>19</c>
*a58d3d2aSXin Li<c>102-130</c>     <c>20</c>
*a58d3d2aSXin Li<c>&gt;130</c>     <c>disabled</c>
*a58d3d2aSXin Li</texttable>
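<t>
A table lookup of this kind could be written as in the Python sketch below. The names
are hypothetical. The sketch assumes that the bitrate ranges are contiguous (so that
start band 18 covers 68 to 84 kb/s) and that the upper bound of each range is exclusive;
the table does not state which side of a boundary is inclusive.
</t>
<figure>
<artwork><![CDATA[
```python
# (exclusive upper bound in kb/s, first band using intensity coding)
INTENSITY_THRESHOLDS = [
    (35, 8), (50, 12), (68, 16), (84, 18), (102, 19), (130, 20),
]

def intensity_start_band(bitrate_kbps):
    """Return the intensity start band, or None when disabled."""
    # bitrate_kbps is the rate left after the coarse energy
    # adjustment described in the text.
    for upper, band in INTENSITY_THRESHOLDS:
        if bitrate_kbps < upper:
            return band
    return None  # intensity stereo is disabled at high rates
```
]]></artwork>
</figure>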
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Time-Frequency Decision">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe choice of time-frequency resolution used in <xref target="tf-change"></xref> is based on
R-D optimization. The distortion is the L1-norm (sum of absolute values) of each band
after applying each TF resolution under consideration. The L1-norm is used because it represents the entropy
for a Laplacian source. The number of bits required to code a change in TF resolution between
*a58d3d2aSXin Litwo bands is higher than the cost of having those two bands use the same resolution, which is
*a58d3d2aSXin Liwhat requires the R-D optimization. The optimal decision is computed using the Viterbi algorithm.
*a58d3d2aSXin LiSee tf_analysis() in celt/celt.c.
*a58d3d2aSXin Li</t>
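<t>
The following Python sketch shows the general shape of such a Viterbi search. It is not
a transcription of tf_analysis(). The function name, the single change_penalty weight
(expressed in the same units as the L1-norm), and the cost layout are assumptions made
for illustration.
</t>
<figure>
<artwork><![CDATA[
```python
def tf_decision(l1_cost, change_penalty):
    """Pick one TF resolution per band with the lowest total cost.

    l1_cost[b][s] is the L1-norm of band b under candidate
    resolution s; change_penalty is the (assumed) cost of using
    a different resolution in two adjacent bands.
    """
    states = range(len(l1_cost[0]))
    # best[s]: lowest cost of any path ending in state s so far.
    best = list(l1_cost[0])
    back = []
    for costs in l1_cost[1:]:
        new_best, choice = [], []
        for s in states:
            # Cost of reaching s from each previous state p.
            cands = [best[p] + (change_penalty if p != s else 0)
                     for p in states]
            p_min = min(states, key=cands.__getitem__)
            new_best.append(cands[p_min] + costs[s])
            choice.append(p_min)
        best = new_best
        back.append(choice)
    # Trace back from the cheapest final state.
    s = min(states, key=best.__getitem__)
    path = [s]
    for choice in reversed(back):
        s = choice[s]
        path.append(s)
    path.reverse()
    return path
```
]]></artwork>
</figure>
<t>
With a penalty of zero, each band simply takes its lowest-distortion resolution. A
larger penalty favors runs of bands that share the same resolution.
</t>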
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Spreading Values Decision">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe choice of the spreading value in <xref target="spread values"></xref> has an
*a58d3d2aSXin Liimpact on the nature of the coding noise introduced by CELT. The larger the f_r value, the
*a58d3d2aSXin Lilower the impact of the rotation, and the more tonal the coding noise. The
*a58d3d2aSXin Limore tonal the signal, the more tonal the noise should be, so the CELT encoder determines
*a58d3d2aSXin Lithe optimal value for f_r by estimating how tonal the signal is. The tonality estimate
is based on a discrete pdf (4-bin histogram) of each band. Bands that have a large number of small
*a58d3d2aSXin Livalues are considered more tonal and a decision is made by combining all bands with more than
*a58d3d2aSXin Li8 samples. See spreading_decision() in celt/bands.c.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="pvq" title="Spherical Vector Quantization">
*a58d3d2aSXin Li<t>CELT uses a Pyramid Vector Quantization (PVQ) <xref target="PVQ"></xref>
*a58d3d2aSXin Licodebook for quantizing the details of the spectrum in each band that have not
*a58d3d2aSXin Libeen predicted by the pitch predictor. The PVQ codebook consists of all sums
*a58d3d2aSXin Liof K signed pulses in a vector of N samples, where two pulses at the same position
*a58d3d2aSXin Liare required to have the same sign. Thus the codebook includes
*a58d3d2aSXin Liall integer codevectors y of N dimensions that satisfy sum(abs(y(j))) = K.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
In bands where there are sufficient bits allocated, PVQ is used to encode
*a58d3d2aSXin Lithe unit vector that results from the normalization in
*a58d3d2aSXin Li<xref target="normalization"></xref> directly. Given a PVQ codevector y,
*a58d3d2aSXin Lithe unit vector X is obtained as X = y/||y||, where ||.|| denotes the
*a58d3d2aSXin LiL2 norm.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="pvq-search" title="PVQ Search">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe search for the best codevector y is performed by alg_quant()
*a58d3d2aSXin Li(vq.c). There are several possible approaches to the
*a58d3d2aSXin Lisearch, with a trade-off between quality and complexity. The method used in the reference
implementation computes an initial codeword y0 by projecting the normalized spectrum
*a58d3d2aSXin LiX onto the codebook pyramid of K-1 pulses:
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Liy0 = truncate_towards_zero( (K-1) * X / sum(abs(X)))
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiDepending on N, K and the input data, the initial codeword y0 may contain from
*a58d3d2aSXin Li0 to K-1 non-zero values. All the remaining pulses, with the exception of the last one,
are found iteratively with a greedy search that minimizes the cost J, the negative of the
normalized correlation between y and X:
*a58d3d2aSXin Li<figure align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li      T
*a58d3d2aSXin LiJ = -X * y / ||y||
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe search described above is considered to be a good trade-off between quality
and computational cost. However, there are other possible ways to search the PVQ
codebook, and implementers MAY use any other search method. See alg_quant() in celt/vq.c.
*a58d3d2aSXin Li</t>
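<t>
A simplified version of this search is sketched below in Python. It is not the code of
alg_quant(). The function name is hypothetical, the input is assumed to be non-zero,
and, as a simplification, every remaining pulse (including the last one) is placed with
the same greedy rule.
</t>
<figure>
<artwork><![CDATA[
```python
import math

def pvq_search(x, k):
    """Greedy search for an integer y with sum(abs(y)) == k."""
    mag = [abs(v) for v in x]
    total = sum(mag)  # assumed non-zero
    # Initial projection onto the pyramid of K-1 pulses;
    # int() truncates towards zero.
    y = [int((k - 1) * m / total) for m in mag]
    xy = sum(m * p for m, p in zip(mag, y))  # X^T * y
    yy = sum(p * p for p in y)               # ||y||^2
    for _ in range(k - sum(y)):
        best_j, best_score = 0, -1.0
        for j, m in enumerate(mag):
            # Normalized correlation if one pulse is added at j.
            score = (xy + m) / math.sqrt(yy + 2 * y[j] + 1)
            if score > best_score:
                best_j, best_score = j, score
        xy += mag[best_j]
        yy += 2 * y[best_j] + 1
        y[best_j] += 1
    # Each pulse takes the sign of the matching input sample.
    return [-p if v < 0 else p for p, v in zip(y, x)]
```
]]></artwork>
</figure>
<t>
Maximizing the normalized correlation at each step is equivalent to minimizing J. The
running sums xy and yy avoid recomputing the full inner products for every candidate
position.
</t>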
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="cwrs-encoder" title="PVQ Encoding">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe vector to encode, X, is converted into an index i such that
*a58d3d2aSXin Li 0&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;V(N,K) as follows.
*a58d3d2aSXin LiLet i&nbsp;=&nbsp;0 and k&nbsp;=&nbsp;0.
*a58d3d2aSXin LiThen for j&nbsp;=&nbsp;(N&nbsp;-&nbsp;1) down to 0, inclusive, do:
*a58d3d2aSXin Li<list style="numbers">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf k&nbsp;>&nbsp;0, set
*a58d3d2aSXin Li i&nbsp;=&nbsp;i&nbsp;+&nbsp;(V(N-j-1,k-1)&nbsp;+&nbsp;V(N-j,k-1))/2.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>Set k&nbsp;=&nbsp;k&nbsp;+&nbsp;abs(X[j]).</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf X[j]&nbsp;&lt;&nbsp;0, set
*a58d3d2aSXin Li i&nbsp;=&nbsp;i&nbsp;+&nbsp;(V(N-j-1,k)&nbsp;+&nbsp;V(N-j,k))/2.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe index i is then encoded using the procedure in
*a58d3d2aSXin Li <xref target="encoding-ints"/> with ft&nbsp;=&nbsp;V(N,K).
*a58d3d2aSXin Li</t>
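<t>
The index computation can be sketched in Python as follows. The function names are
hypothetical. The codebook size V(N,K) is computed here with the recurrence
V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0 for K > 0,
which is not restated in this section and is used as an assumption of the sketch.
</t>
<figure>
<artwork><![CDATA[
```python
from functools import lru_cache

@lru_cache(maxsize=None)
def pvq_size(n, k):
    """V(N,K): count of integer vectors y with sum(abs(y)) == K."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return (pvq_size(n - 1, k) + pvq_size(n, k - 1)
            + pvq_size(n - 1, k - 1))

def pvq_index(x):
    """Map an integer pulse vector to an index in [0, V(N,K))."""
    n = len(x)
    i = 0
    k = 0
    for j in range(n - 1, -1, -1):
        if k > 0:  # step 1
            i += (pvq_size(n - j - 1, k - 1)
                  + pvq_size(n - j, k - 1)) // 2
        k += abs(x[j])  # step 2
        if x[j] < 0:  # step 3
            i += (pvq_size(n - j - 1, k) + pvq_size(n - j, k)) // 2
    return i
```
]]></artwork>
</figure>
<t>
The divisions by two are exact for every term the loop visits, so integer division is
used. Enumerating all vectors for small N and K gives each index from 0 to V(N,K)-1
exactly once.
</t>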
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="conformance" title="Conformance">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
It is our intention to allow the greatest possible freedom in
*a58d3d2aSXin Liimplementing the specification. For this reason, outside of the exceptions
*a58d3d2aSXin Linoted in this section, conformance is defined through the reference
*a58d3d2aSXin Liimplementation of the decoder provided in <xref target="ref-implementation"/>.
*a58d3d2aSXin LiAlthough this document includes an English description of the codec, should
*a58d3d2aSXin Lithe description contradict the source code of the reference implementation,
*a58d3d2aSXin Lithe latter shall take precedence.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiCompliance with this specification means that in addition to following the normative keywords in this document,
*a58d3d2aSXin Li a decoder's output MUST also be
*a58d3d2aSXin Li within the thresholds specified by the opus_compare.c tool (included
*a58d3d2aSXin Li with the code) when compared to the reference implementation for each of the
*a58d3d2aSXin Li test vectors provided (see <xref target="test-vectors"></xref>) and for each output
*a58d3d2aSXin Li sampling rate and channel count supported. In addition, a compliant
*a58d3d2aSXin Li decoder implementation MUST have the same final range decoder state as that of the
*a58d3d2aSXin Li reference decoder. It is therefore RECOMMENDED that the
*a58d3d2aSXin Li decoder implement the same functional behavior as the reference.
*a58d3d2aSXin Li
*a58d3d2aSXin Li A decoder implementation is not required to support all output sampling
*a58d3d2aSXin Li rates or all output channel counts.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Testing">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiUsing the reference code provided in <xref target="ref-implementation"></xref>,
*a58d3d2aSXin Lia test vector can be decoded with
*a58d3d2aSXin Li<list>
*a58d3d2aSXin Li<t>opus_demo -d &lt;rate&gt; &lt;channels&gt; testvectorX.bit testX.out</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Liwhere &lt;rate&gt; is the sampling rate and can be 8000, 12000, 16000, 24000, or 48000, and
*a58d3d2aSXin Li&lt;channels&gt; is 1 for mono or 2 for stereo.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIf the range decoder state is incorrect for one of the frames, the decoder will exit with
*a58d3d2aSXin Li"Error: Range coder state mismatch between encoder and decoder". If the decoder succeeds, then
*a58d3d2aSXin Lithe output can be compared with the "reference" output with
*a58d3d2aSXin Li<list>
*a58d3d2aSXin Li<t>opus_compare -s -r &lt;rate&gt; testvectorX.dec testX.out</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Lifor stereo or
*a58d3d2aSXin Li<list>
*a58d3d2aSXin Li<t>opus_compare -r &lt;rate&gt; testvectorX.dec testX.out</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Lifor mono.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>In addition to indicating whether the test vector comparison passes, the opus_compare tool
*a58d3d2aSXin Lioutputs an "Opus quality metric" that indicates how well the tested decoder matches the
*a58d3d2aSXin Lireference implementation. A quality of 0 corresponds to the passing threshold, while
*a58d3d2aSXin Lia quality of 100 is the highest possible value and means that the output of the tested decoder is identical to the reference
*a58d3d2aSXin Liimplementation. The passing threshold (quality 0) was calibrated in such a way that it corresponds to
*a58d3d2aSXin Liadditive white noise with a 48 dB SNR (similar to what can be obtained on a cassette deck).
*a58d3d2aSXin LiIt is still possible for an implementation to sound very good with such a low quality measure
*a58d3d2aSXin Li(e.g. if the deviation is due to inaudible phase distortion), but unless this is verified by
*a58d3d2aSXin Lilistening tests, it is RECOMMENDED that implementations achieve a quality above 90 for 48&nbsp;kHz
*a58d3d2aSXin Lidecoding. For other sampling rates, it is normal for the quality metric to be lower
*a58d3d2aSXin Li(typically as low as 50 even for a good implementation) because of harmless mismatch with
*a58d3d2aSXin Lithe delay and phase of the internal sampling rate conversion.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiOn POSIX environments, the run_vectors.sh script can be used to verify all test
*a58d3d2aSXin Livectors. This can be done with
*a58d3d2aSXin Li<list>
*a58d3d2aSXin Li<t>run_vectors.sh &lt;exec path&gt; &lt;vector path&gt; &lt;rate&gt;</t>
*a58d3d2aSXin Li</list>
where &lt;exec path&gt; is the directory where the opus_demo and opus_compare executables
are built, &lt;vector path&gt; is the directory containing the test vectors, and &lt;rate&gt; is
the sampling rate.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="opus-custom" title="Opus Custom">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiOpus Custom is an OPTIONAL part of the specification that is defined to
*a58d3d2aSXin Lihandle special sample rates and frame rates that are not supported by the
*a58d3d2aSXin Limain Opus specification. Use of Opus Custom is discouraged for all but very
*a58d3d2aSXin Lispecial applications for which a frame size different from 2.5, 5, 10, or 20&nbsp;ms is
*a58d3d2aSXin Lineeded (for either complexity or latency reasons). Because Opus Custom is
*a58d3d2aSXin Lioptional, streams encoded using Opus Custom cannot be expected to be decodable by all Opus
*a58d3d2aSXin Liimplementations. Also, because no in-band mechanism exists for specifying the sampling
*a58d3d2aSXin Lirate and frame size of Opus Custom streams, out-of-band signaling is required.
*a58d3d2aSXin LiIn Opus Custom operation, only the CELT layer is available, using the opus_custom_* function
*a58d3d2aSXin Licalls in opus_custom.h.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="security" title="Security Considerations">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiImplementations of the Opus codec need to take appropriate security considerations
*a58d3d2aSXin Liinto account, as outlined in <xref target="DOS"/>.
*a58d3d2aSXin LiIt is extremely important for the decoder to be robust against malicious
*a58d3d2aSXin Lipayloads.
*a58d3d2aSXin LiMalicious payloads must not cause the decoder to overrun its allocated memory
*a58d3d2aSXin Li or to take an excessive amount of resources to decode.
*a58d3d2aSXin LiAlthough problems
*a58d3d2aSXin Liin encoders are typically rarer, the same applies to the encoder. Malicious
*a58d3d2aSXin Liaudio streams must not cause the encoder to misbehave because this would
*a58d3d2aSXin Liallow an attacker to attack transcoding gateways.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
The reference implementation contains no known buffer overflows or cases where
*a58d3d2aSXin Li a specially crafted packet or audio segment could cause a significant increase
*a58d3d2aSXin Li in CPU load.
*a58d3d2aSXin LiHowever, on certain CPU architectures where denormalized floating-point
*a58d3d2aSXin Li operations are much slower than normal floating-point operations, it is
*a58d3d2aSXin Li possible for some audio content (e.g., silence or near-silence) to cause an
*a58d3d2aSXin Li increase in CPU load.
*a58d3d2aSXin LiDenormals can be introduced by reordering operations in the compiler and depend
*a58d3d2aSXin Li on the target architecture, so it is difficult to guarantee that an implementation
*a58d3d2aSXin Li avoids them.
*a58d3d2aSXin LiFor architectures on which denormals are problematic, adding very small
*a58d3d2aSXin Li floating-point offsets to the affected signals to prevent significant numbers
*a58d3d2aSXin Li of denormalized operations is RECOMMENDED.
*a58d3d2aSXin LiAlternatively, it is often possible to configure the hardware to treat
*a58d3d2aSXin Li denormals as zero (DAZ).
*a58d3d2aSXin LiNo such issue exists for the fixed-point reference implementation.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>The reference implementation was validated in the following conditions:
*a58d3d2aSXin Li<list style="numbers">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSending the decoder valid packets generated by the reference encoder and
*a58d3d2aSXin Li verifying that the decoder's final range coder state matches that of the
*a58d3d2aSXin Li encoder.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSending the decoder packets generated by the reference encoder and then
*a58d3d2aSXin Li subjected to random corruption.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>Sending the decoder random packets.</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiSending the decoder packets generated by a version of the reference encoder
*a58d3d2aSXin Li modified to make random coding decisions (internal fuzzing), including mode
*a58d3d2aSXin Li switching, and verifying that the range coder final states match.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiIn all of the conditions above, both the encoder and the decoder were run
*a58d3d2aSXin Li inside the <xref target="Valgrind">Valgrind</xref> memory
*a58d3d2aSXin Li debugger, which tracks reads and writes to invalid memory regions as well as
*a58d3d2aSXin Li the use of uninitialized memory.
*a58d3d2aSXin LiThere were no errors reported on any of the tested conditions.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="IANA Considerations">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThis document has no actions for IANA.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="Acknowledgements" title="Acknowledgements">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThanks to all other developers, including Raymond Chen, Soeren Skak Jensen, Gregory Maxwell,
*a58d3d2aSXin LiChristopher Montgomery, and Karsten Vandborg Soerensen. We would also
*a58d3d2aSXin Lilike to thank Igor Dyakonov, Jan Skoglund, and Christian Hoene for their help with subjective testing of the
*a58d3d2aSXin LiOpus codec. Thanks to Ralph Giles, John Ridges, Ben Schwartz, Keith Yan, Christian Hoene, Kat Walsh, and many others on the Opus and CELT mailing lists
*a58d3d2aSXin Lifor their bug reports and feedback.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Copying Conditions">
*a58d3d2aSXin Li<t>The authors agree to grant third parties the irrevocable right to copy, use and distribute
*a58d3d2aSXin Lithe work (excluding Code Components available under the simplified BSD license), with or
*a58d3d2aSXin Liwithout modification, in any medium, without royalty, provided that, unless separate
*a58d3d2aSXin Lipermission is granted, redistributed modified works do not contain misleading author, version,
*a58d3d2aSXin Liname of work, or endorsement information.</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</middle>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<back>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<references title="Normative References">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="rfc2119">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Key words for use in RFCs to Indicate Requirement Levels </title>
*a58d3d2aSXin Li<author initials="S." surname="Bradner" fullname="Scott Bradner"></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li<seriesInfo name="RFC" value="2119" />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</references>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<references title="Informative References">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor='requirements'>
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Requirements for an Internet Audio Codec</title>
*a58d3d2aSXin Li<author initials='J.-M.' surname='Valin' fullname='J.-M. Valin'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author initials='K.' surname='Vos' fullname='K. Vos'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author>
*a58d3d2aSXin Li<organization>IETF</organization></author>
*a58d3d2aSXin Li<date year='2011' month='August' />
*a58d3d2aSXin Li<abstract>
*a58d3d2aSXin Li<t>This document provides specific requirements for an Internet audio
*a58d3d2aSXin Li   codec.  These requirements address quality, sample rate, bitrate,
*a58d3d2aSXin Li   and packet-loss robustness, as well as other desirable properties.
*a58d3d2aSXin Li</t></abstract></front>
*a58d3d2aSXin Li<seriesInfo name='RFC' value='6366' />
*a58d3d2aSXin Li<format type='TXT' target='http://tools.ietf.org/rfc/rfc6366.txt' />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?>
*a58d3d2aSXin Li<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3533.xml"?>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor='SILK' target='http://developer.skype.com/silk'>
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>SILK Speech Codec</title>
*a58d3d2aSXin Li<author initials='K.' surname='Vos' fullname='K. Vos'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author initials='S.' surname='Jensen' fullname='S. Jensen'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author initials='K.' surname='Soerensen' fullname='K. Soerensen'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<date year='2010' month='March' />
*a58d3d2aSXin Li<abstract>
*a58d3d2aSXin Li<t></t>
*a58d3d2aSXin Li</abstract></front>
*a58d3d2aSXin Li<seriesInfo name='Internet-Draft' value='draft-vos-silk-01' />
*a58d3d2aSXin Li<format type='TXT' target='http://tools.ietf.org/html/draft-vos-silk-01' />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="laroia-icassp">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title abbrev="Robust and Efficient Quantization of Speech LSP">
*a58d3d2aSXin LiRobust and Efficient Quantization of Speech LSP Parameters Using Structured Vector Quantization
*a58d3d2aSXin Li</title>
*a58d3d2aSXin Li<author initials="R.L." surname="Laroia" fullname="R.">
*a58d3d2aSXin Li<organization/>
*a58d3d2aSXin Li</author>
*a58d3d2aSXin Li<author initials="N.P." surname="Phamdo" fullname="N.">
*a58d3d2aSXin Li<organization/>
*a58d3d2aSXin Li</author>
*a58d3d2aSXin Li<author initials="N.F." surname="Farvardin" fullname="N.">
*a58d3d2aSXin Li<organization/>
*a58d3d2aSXin Li</author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li<seriesInfo name="ICASSP-1991, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 641-644, October" value="1991"/>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor='CELT' target='http://celt-codec.org/'>
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Constrained-Energy Lapped Transform (CELT) Codec</title>
*a58d3d2aSXin Li<author initials='J-M.' surname='Valin' fullname='J-M. Valin'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author initials='T&#x2E;B.' surname='Terriberry' fullname='Timothy B. Terriberry'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author initials='G.' surname='Maxwell' fullname='G. Maxwell'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author initials='C.' surname='Montgomery' fullname='C. Montgomery'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<date year='2010' month='July' />
*a58d3d2aSXin Li<abstract>
*a58d3d2aSXin Li<t></t>
*a58d3d2aSXin Li</abstract></front>
*a58d3d2aSXin Li<seriesInfo name='Internet-Draft' value='draft-valin-celt-codec-02' />
*a58d3d2aSXin Li<format type='TXT' target='http://tools.ietf.org/html/draft-valin-celt-codec-02' />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor='SRTP-VBR'>
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Guidelines for the use of Variable Bit Rate Audio with Secure RTP</title>
<author initials='C.' surname='Perkins' fullname='C. Perkins'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author initials='J.M.' surname='Valin' fullname='J.M. Valin'>
*a58d3d2aSXin Li<organization /></author>
<date year='2012' month='March' />
*a58d3d2aSXin Li<abstract>
*a58d3d2aSXin Li<t></t>
*a58d3d2aSXin Li</abstract></front>
*a58d3d2aSXin Li<seriesInfo name='RFC' value='6562' />
*a58d3d2aSXin Li<format type='TXT' target='http://tools.ietf.org/html/rfc6562' />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor='DOS'>
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Internet Denial-of-Service Considerations</title>
*a58d3d2aSXin Li<author initials='M.' surname='Handley' fullname='M. Handley'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author initials='E.' surname='Rescorla' fullname='E. Rescorla'>
*a58d3d2aSXin Li<organization /></author>
*a58d3d2aSXin Li<author>
*a58d3d2aSXin Li<organization>IAB</organization></author>
*a58d3d2aSXin Li<date year='2006' month='December' />
*a58d3d2aSXin Li<abstract>
*a58d3d2aSXin Li<t>This document provides an overview of possible avenues for denial-of-service (DoS) attack on Internet systems.  The aim is to encourage protocol designers and network engineers towards designs that are more robust.  We discuss partial solutions that reduce the effectiveness of attacks, and how some solutions might inadvertently open up alternative vulnerabilities.  This memo provides information for the Internet community.</t></abstract></front>
*a58d3d2aSXin Li<seriesInfo name='RFC' value='4732' />
*a58d3d2aSXin Li<format type='TXT' octets='91844' target='ftp://ftp.isi.edu/in-notes/rfc4732.txt' />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Martin79">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Range encoding: An algorithm for removing redundancy from a digitised message</title>
*a58d3d2aSXin Li<author initials="G.N.N." surname="Martin" fullname="G. Nigel N. Martin"><organization/></author>
*a58d3d2aSXin Li<date year="1979" />
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li<seriesInfo name="Proc. Institution of Electronic and Radio Engineers International Conference on Video and Data Recording" value="" />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="coding-thesis">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Source coding algorithms for fast data compression</title>
*a58d3d2aSXin Li<author initials="R." surname="Pasco" fullname=""><organization/></author>
*a58d3d2aSXin Li<date month="May" year="1976" />
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li<seriesInfo name="Ph.D. thesis" value="Dept. of Electrical Engineering, Stanford University" />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="PVQ">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>A Pyramid Vector Quantizer</title>
*a58d3d2aSXin Li<author initials="T." surname="Fischer" fullname=""><organization/></author>
*a58d3d2aSXin Li<date month="July" year="1986" />
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li<seriesInfo name="IEEE Trans. on Information Theory, Vol. 32" value="pp. 568-583" />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Kabal86">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>The Computation of Line Spectral Frequencies Using Chebyshev Polynomials</title>
*a58d3d2aSXin Li<author initials="P." surname="Kabal" fullname="P. Kabal"><organization/></author>
*a58d3d2aSXin Li<author initials="R." surname="Ramachandran" fullname="R. P. Ramachandran"><organization/></author>
*a58d3d2aSXin Li<date month="December" year="1986" />
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li<seriesInfo name="IEEE Trans. Acoustics, Speech, Signal Processing, vol. 34, no. 6" value="pp. 1419-1426" />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Valgrind" target="http://valgrind.org/">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Valgrind website</title>
*a58d3d2aSXin Li<author></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Google-NetEQ" target="http://code.google.com/p/webrtc/source/browse/trunk/src/modules/audio_coding/NetEQ/main/source/?r=583">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Google NetEQ code</title>
*a58d3d2aSXin Li<author></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Google-WebRTC" target="http://code.google.com/p/webrtc/">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Google WebRTC code</title>
*a58d3d2aSXin Li<author></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Opus-git" target="git://git.xiph.org/opus.git">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Opus Git Repository</title>
*a58d3d2aSXin Li<author></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Opus-website" target="http://opus-codec.org/">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Opus website</title>
*a58d3d2aSXin Li<author></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Vorbis-website" target="http://xiph.org/vorbis/">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Vorbis website</title>
*a58d3d2aSXin Li<author></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Matroska-website" target="http://matroska.org/">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Matroska website</title>
*a58d3d2aSXin Li<author></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Vectors-website" target="http://opus-codec.org/testvectors/">
*a58d3d2aSXin Li<front>
<title>Opus Testvectors (website)</title>
*a58d3d2aSXin Li<author></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Vectors-proc" target="http://www.ietf.org/proceedings/83/slides/slides-83-codec-0.gz">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Opus Testvectors (proceedings)</title>
*a58d3d2aSXin Li<author></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="line-spectral-pairs" target="http://en.wikipedia.org/wiki/Line_spectral_pairs">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Line Spectral Pairs</title>
*a58d3d2aSXin Li<author><organization>Wikipedia</organization></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="range-coding" target="http://en.wikipedia.org/wiki/Range_coding">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Range Coding</title>
*a58d3d2aSXin Li<author><organization>Wikipedia</organization></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Hadamard" target="http://en.wikipedia.org/wiki/Hadamard_transform">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Hadamard Transform</title>
*a58d3d2aSXin Li<author><organization>Wikipedia</organization></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Viterbi" target="http://en.wikipedia.org/wiki/Viterbi_algorithm">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Viterbi Algorithm</title>
*a58d3d2aSXin Li<author><organization>Wikipedia</organization></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Whitening" target="http://en.wikipedia.org/wiki/White_noise">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>White Noise</title>
*a58d3d2aSXin Li<author><organization>Wikipedia</organization></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="LPC" target="http://en.wikipedia.org/wiki/Linear_prediction">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Linear Prediction</title>
*a58d3d2aSXin Li<author><organization>Wikipedia</organization></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="MDCT" target="http://en.wikipedia.org/wiki/Modified_discrete_cosine_transform">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Modified Discrete Cosine Transform</title>
*a58d3d2aSXin Li<author><organization>Wikipedia</organization></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="FFT" target="http://en.wikipedia.org/wiki/Fast_Fourier_transform">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Fast Fourier Transform</title>
*a58d3d2aSXin Li<author><organization>Wikipedia</organization></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="z-transform" target="http://en.wikipedia.org/wiki/Z-transform">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Z-transform</title>
*a58d3d2aSXin Li<author><organization>Wikipedia</organization></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Burg">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Maximum Entropy Spectral Analysis</title>
*a58d3d2aSXin Li<author initials="JP." surname="Burg" fullname="J.P. Burg"><organization/></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Schur">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>A fixed point computation of partial correlation coefficients</title>
*a58d3d2aSXin Li<author initials="J." surname="Le Roux" fullname="J. Le Roux"><organization/></author>
*a58d3d2aSXin Li<author initials="C." surname="Gueguen" fullname="C. Gueguen"><organization/></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li<seriesInfo name="ICASSP-1977, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 257-259, October" value="1977"/>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Princen86">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Analysis/synthesis filter bank design based on time domain aliasing cancellation</title>
*a58d3d2aSXin Li<author initials="J." surname="Princen" fullname="John P. Princen"><organization/></author>
*a58d3d2aSXin Li<author initials="A." surname="Bradley" fullname="Alan B. Bradley"><organization/></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li<seriesInfo name="IEEE Trans. Acoust. Speech Sig. Proc. ASSP-34 (5), 1153-1161" value="1986"/>
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Valin2010">
*a58d3d2aSXin Li<front>
<title>A High-Quality Speech and Audio Codec With Less Than 10-ms Delay</title>
*a58d3d2aSXin Li<author initials="JM" surname="Valin" fullname="Jean-Marc Valin"><organization/>
*a58d3d2aSXin Li</author>
*a58d3d2aSXin Li<author initials="T. B." surname="Terriberry" fullname="Timothy Terriberry"><organization/></author>
*a58d3d2aSXin Li<author initials="C." surname="Montgomery" fullname="Christopher Montgomery"><organization/></author>
*a58d3d2aSXin Li<author initials="G." surname="Maxwell" fullname="Gregory Maxwell"><organization/></author>
*a58d3d2aSXin Li</front>
*a58d3d2aSXin Li<seriesInfo name="IEEE Trans. on Audio, Speech and Language Processing, Vol. 18, No. 1, pp. 58-67" value="2010" />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<reference anchor="Zwicker61">
*a58d3d2aSXin Li<front>
*a58d3d2aSXin Li<title>Subdivision of the audible frequency range into critical bands</title>
*a58d3d2aSXin Li<author initials="E." surname="Zwicker" fullname="E. Zwicker"><organization/></author>
*a58d3d2aSXin Li<date month="February" year="1961" />
*a58d3d2aSXin Li</front>
<seriesInfo name="The Journal of the Acoustical Society of America, Vol. 33, No. 2" value="p. 248" />
*a58d3d2aSXin Li</reference>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li</references>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="ref-implementation" title="Reference Implementation">
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>This appendix contains the complete source code for the
*a58d3d2aSXin Lireference implementation of the Opus codec written in C. By default,
*a58d3d2aSXin Lithis implementation relies on floating-point arithmetic, but it can be
*a58d3d2aSXin Licompiled to use only fixed-point arithmetic by defining the FIXED_POINT
*a58d3d2aSXin Limacro. Information on building and using the reference implementation is
*a58d3d2aSXin Liavailable in the README file.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>The implementation can be compiled with either a C89 or a C99
*a58d3d2aSXin Licompiler. It is reasonably optimized for most platforms such that
*a58d3d2aSXin Lionly architecture-specific optimizations are likely to be useful.
*a58d3d2aSXin LiThe FFT <xref target="FFT"/> used is a slightly modified version of the KISS-FFT library,
*a58d3d2aSXin Libut it is easy to substitute any other FFT library.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiWhile the reference implementation does not rely on any
*a58d3d2aSXin Li<spanx style="emph">undefined behavior</spanx> as defined by C89 or C99,
*a58d3d2aSXin Liit relies on common <spanx style="emph">implementation-defined behavior</spanx>
*a58d3d2aSXin Lifor two's complement architectures:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>Right shifts of negative values are consistent with two's complement arithmetic, so that a>>b is equivalent to floor(a/(2**b)),</t>
*a58d3d2aSXin Li<t>For conversion to a signed integer of N bits, the value is reduced modulo 2**N to be within range of the type,</t>
*a58d3d2aSXin Li<t>The result of integer division of a negative value is truncated towards zero, and</t>
*a58d3d2aSXin Li<t>The compiler provides a 64-bit integer type (a C99 requirement which is supported by most C89 compilers).</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
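<t>
As an illustration only, the arithmetic implied by the first three assumptions
can be modelled in Python. Python integers are unbounded and its right-shift
operator already floors, so this sketch shows the results the reference code
expects from a C compiler rather than testing one. The function names are
hypothetical and do not appear in the reference implementation.
</t>
<figure><artwork><![CDATA[
```python
def shift_right(a: int, b: int) -> int:
    """Arithmetic right shift: equals floor(a / 2**b), also for negative a."""
    # Python's >> floors on negative values, which is the behaviour the
    # reference implementation assumes of the C compiler.
    return a >> b


def to_signed(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits into the range of a signed N-bit integer."""
    value &= (1 << bits) - 1        # keep the low N bits (value mod 2**N)
    if value >= 1 << (bits - 1):    # top bit set: a negative number
        value -= 1 << bits
    return value


def c_divide(a: int, b: int) -> int:
    """Integer division truncated towards zero (Python's // floors instead)."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient
```
]]></artwork></figure>
<t>
For example, shift_right(-5, 1) gives -3, to_signed(32768, 16) gives -32768,
and c_divide(-7, 2) gives -3, whereas Python's own -7 // 2 gives -4.
</t>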
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiIn its current form, the reference implementation also requires the following
*a58d3d2aSXin Liarchitectural characteristics to obtain acceptable performance:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>Two's complement arithmetic,</t>
<t>At least a 16-bit by 16-bit integer multiplier (32-bit result), and</t>
*a58d3d2aSXin Li<t>At least a 32-bit adder/accumulator.</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Extracting the source">
*a58d3d2aSXin Li<t>
The complete source code can be extracted from this draft by running the
following commands:
*a58d3d2aSXin Li
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t><![CDATA[
*a58d3d2aSXin Licat draft-ietf-codec-opus.txt | grep '^\ \ \ ###' | sed -e 's/...###//' | base64 -d > opus_source.tar.gz
*a58d3d2aSXin Li]]></t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Litar xzvf opus_source.tar.gz
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>cd opus_source</t>
*a58d3d2aSXin Li<t>make</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin LiOn systems where the provided Makefile does not work, the following command line may be used to compile
*a58d3d2aSXin Lithe source code:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t><![CDATA[
*a58d3d2aSXin Licc -O2 -g -o opus_demo src/opus_demo.c `cat *.mk | grep -v fixed | sed -e 's/.*=//' -e 's/\\\\//'` -DOPUS_BUILD -Iinclude -Icelt -Isilk -Isilk/float -DUSE_ALLOCA -Drestrict= -lm
*a58d3d2aSXin Li]]></t></list>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiOn systems where the base64 utility is not present, the following commands can be used instead:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t><![CDATA[
*a58d3d2aSXin Licat draft-ietf-codec-opus.txt | grep '^\ \ \ ###' | sed -e 's/...###//' > opus.b64
*a58d3d2aSXin Li]]></t>
*a58d3d2aSXin Li<t>openssl base64 -d -in opus.b64 > opus_source.tar.gz</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</t>
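<t>
Where neither utility is available, the same extraction can be sketched in
Python. This is a minimal sketch, not part of the reference implementation:
the function name extract_tarball is hypothetical, and it assumes, as the grep
and sed commands above do, that every payload line of the text version starts
with three spaces followed by "###".
</t>
<figure><artwork><![CDATA[
```python
import base64

# Three spaces then "###": the prefix matched by grep '^\ \ \ ###'
# and removed by sed -e 's/...###//'.
MARKER = "   ###"


def extract_tarball(draft_text: str) -> bytes:
    """Return the decoded tarball bytes embedded in the text of the draft."""
    payload = "".join(
        line[len(MARKER):].strip()   # drop the marker and stray whitespace
        for line in draft_text.splitlines()
        if line.startswith(MARKER)   # ignore all ordinary lines of the draft
    )
    return base64.b64decode(payload)
```
]]></artwork></figure>
<t>
The returned bytes can then be written to opus_source.tar.gz and unpacked with
tar as shown above.
</t>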
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Up-to-date Implementation">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiAs of the time of publication of this memo, an up-to-date implementation conforming to
*a58d3d2aSXin Lithis standard is available in a
*a58d3d2aSXin Li <xref target='Opus-git'>Git repository</xref>.
*a58d3d2aSXin LiReleases and other resources are available at
*a58d3d2aSXin Li <xref target='Opus-website'/>. However, although that implementation is expected to
*a58d3d2aSXin Li remain conformant with the standard, it is the code in this document that shall
*a58d3d2aSXin Li remain normative.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section title="Base64-encoded Source Code">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin Li<?rfc include="opus_source.base64"?>
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="test-vectors" title="Test Vectors">
*a58d3d2aSXin Li<t>
Because of size constraints, the Opus test vectors are not distributed in this
draft. They are available in the proceedings of the 83rd IETF meeting (Paris) <xref target="Vectors-proc"/> and from the Opus codec website at
<xref target="Vectors-website"/>. These test vectors were created specifically to exercise
all aspects of the decoder, and therefore the audio quality of the decoded output is
significantly lower than what Opus can achieve in normal operation.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
The SHA1 hashes of the files in the test vector package are:
*a58d3d2aSXin Li<?rfc include="testvectors_sha1"?>
*a58d3d2aSXin Li</t>
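<t>
A downloaded test vector file can be checked against its listed hash with any
SHA1 tool. As one possible approach, the following Python sketch computes the
hexadecimal SHA1 digest of a file; the function name sha1_of_file is
hypothetical, and comparing the result with the published value is left to the
caller.
</t>
<figure><artwork><![CDATA[
```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hexadecimal SHA1 digest of the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as stream:
        # Read in chunks so that large test vectors need not fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```
]]></artwork></figure>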
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<section anchor="self-delimiting-framing" title="Self-Delimiting Framing">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiTo use the internal framing described in <xref target="modes"/>, the decoder
*a58d3d2aSXin Li must know the total length of the Opus packet, in bytes.
*a58d3d2aSXin LiThis section describes a simple variation of that framing which can be used
*a58d3d2aSXin Li when the total length of the packet is not known.
*a58d3d2aSXin LiNothing in the encoding of the packet itself allows a decoder to distinguish
*a58d3d2aSXin Li between the regular, undelimited framing and the self-delimiting framing
*a58d3d2aSXin Li described in this appendix.
*a58d3d2aSXin LiWhich one is used and where must be established by context at the transport
*a58d3d2aSXin Li layer.
*a58d3d2aSXin LiIt is RECOMMENDED that a transport layer choose exactly one framing scheme,
*a58d3d2aSXin Li rather than allowing an encoder to signal which one it wants to use.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiFor example, although a regular Opus stream does not support more than two
*a58d3d2aSXin Li channels, a multi-channel Opus stream may be formed from several one- and
*a58d3d2aSXin Li two-channel streams.
*a58d3d2aSXin LiTo pack an Opus packet from each of these streams together in a single packet
*a58d3d2aSXin Li at the transport layer, one could use the self-delimiting framing for all but
*a58d3d2aSXin Li the last stream, and then the regular, undelimited framing for the last one.
*a58d3d2aSXin LiReverting to the undelimited framing for the last stream saves overhead
*a58d3d2aSXin Li (because the total size of the transport-layer packet will still be known),
*a58d3d2aSXin Li and ensures that a "multi-channel" stream which only has a single Opus stream
*a58d3d2aSXin Li uses the same framing as a regular Opus stream does.
*a58d3d2aSXin LiThis avoids the need for signaling to distinguish these two cases.
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiThe self-delimiting framing is identical to the regular, undelimited framing
*a58d3d2aSXin Li from <xref target="modes"/>, except that each Opus packet contains one extra
*a58d3d2aSXin Li length field, encoded using the same one- or two-byte scheme from
*a58d3d2aSXin Li <xref target="frame-length-coding"/>.
*a58d3d2aSXin LiThis extra length immediately precedes the compressed data of the first Opus
*a58d3d2aSXin Li frame in the packet, and is interpreted in the various modes as follows:
*a58d3d2aSXin Li<list style="symbols">
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiCode&nbsp;0 packets: It is the length of the single Opus frame (see
*a58d3d2aSXin Li <xref target="sd_code0_packet"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiCode&nbsp;1 packets: It is the length used for both of the Opus frames (see
*a58d3d2aSXin Li <xref target="sd_code1_packet"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiCode&nbsp;2 packets: It is the length of the second Opus frame (see
*a58d3d2aSXin Li <xref target="sd_code2_packet"/>).</t>
*a58d3d2aSXin Li<t>
*a58d3d2aSXin LiCBR Code&nbsp;3 packets: It is the length used for all of the Opus frames (see
*a58d3d2aSXin Li <xref target="sd_code3cbr_packet"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li<t>VBR Code&nbsp;3 packets: It is the length of the last Opus frame (see
*a58d3d2aSXin Li <xref target="sd_code3vbr_packet"/>).
*a58d3d2aSXin Li</t>
*a58d3d2aSXin Li</list>
*a58d3d2aSXin Li</t>
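<t>
The extra length field uses the one- or two-byte frame length coding: a first
byte of 0 to 251 is the length itself, while a first byte of 252 to 255 means
that a second byte follows and the length is four times the second byte plus
the first. A minimal Python sketch of that coding is given below; the names
read_length and write_length are hypothetical and are not taken from the
reference implementation.
</t>
<figure><artwork><![CDATA[
```python
from typing import Tuple

MAX_FRAME_BYTES = 1275  # largest length the two-byte form can express


def write_length(size: int) -> bytes:
    """Encode a frame length in one or two bytes."""
    if not 0 <= size <= MAX_FRAME_BYTES:
        raise ValueError("frame length out of range")
    if size < 252:
        return bytes([size])
    first = 252 + (size & 3)            # 252..255 signals a second byte
    return bytes([first, (size - first) >> 2])


def read_length(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a frame length at buf[pos]; return (length, next position)."""
    if pos >= len(buf):
        raise ValueError("truncated length field")
    first = buf[pos]
    if first < 252:
        return first, pos + 1
    if pos + 1 >= len(buf):
        raise ValueError("truncated length field")
    return 4 * buf[pos + 1] + first, pos + 2
```
]]></artwork></figure>
<t>
Under these assumptions a 251-byte frame costs one length byte, and a 252-byte
frame is written as the two bytes 252 and 0.
</t>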
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="sd_code0_packet" title="A Self-Delimited Code 0 Packet"
*a58d3d2aSXin Li align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|0|0| N1 (1-2 bytes):                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
*a58d3d2aSXin Li|               Compressed frame 1 (N1 bytes)...                :
*a58d3d2aSXin Li:                                                               |
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="sd_code1_packet" title="A Self-Delimited Code 1 Packet"
*a58d3d2aSXin Li align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|0|1| N1 (1-2 bytes):                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
*a58d3d2aSXin Li|               Compressed frame 1 (N1 bytes)...                |
*a58d3d2aSXin Li:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                               |                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
*a58d3d2aSXin Li|               Compressed frame 2 (N1 bytes)...                |
*a58d3d2aSXin Li:                                               +-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="sd_code2_packet" title="A Self-Delimited Code 2 Packet"
*a58d3d2aSXin Li align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|1|0| N1 (1-2 bytes): N2 (1-2 bytes :               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :
*a58d3d2aSXin Li|               Compressed frame 1 (N1 bytes)...                |
*a58d3d2aSXin Li:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                               |                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
*a58d3d2aSXin Li|               Compressed frame 2 (N2 bytes)...                :
*a58d3d2aSXin Li:                                                               |
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="sd_code3cbr_packet" title="A Self-Delimited CBR Code 3 Packet"
*a58d3d2aSXin Li align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|1|1|0|p|     M     | Pad len (Opt) : N1 (1-2 bytes):
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame 1 (N1 bytes)...                :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame 2 (N1 bytes)...                :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:                              ...                              :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame M (N1 bytes)...                :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li:                  Opus Padding (Optional)...                   |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
*a58d3d2aSXin Li
*a58d3d2aSXin Li<figure anchor="sd_code3vbr_packet" title="A Self-Delimited VBR Code 3 Packet"
*a58d3d2aSXin Li align="center">
*a58d3d2aSXin Li<artwork align="center"><![CDATA[
*a58d3d2aSXin Li 0                   1                   2                   3
*a58d3d2aSXin Li 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li| config  |s|1|1|1|p|     M     | Padding length (Optional)     :
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li: N1 (1-2 bytes):     ...       :     N[M-1]    |     N[M]      :
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame 1 (N1 bytes)...                :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:               Compressed frame 2 (N2 bytes)...                :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:                              ...                              :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li:              Compressed frame M (N[M] bytes)...               :
*a58d3d2aSXin Li|                                                               |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li:                  Opus Padding (Optional)...                   |
*a58d3d2aSXin Li+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*a58d3d2aSXin Li]]></artwork>
*a58d3d2aSXin Li</figure>
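<t>
The layouts in the figures above can be walked with a short parser. The
following Python sketch is illustrative rather than normative: the names are
hypothetical, it returns the compressed frames together with the position just
past the packet, and it assumes the caller checks the configuration number and
the total packet duration separately.
</t>
<figure><artwork><![CDATA[
```python
from typing import List, Tuple


def read_length(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a one- or two-byte frame length; return (length, next pos)."""
    if pos >= len(buf):
        raise ValueError("truncated length field")
    first = buf[pos]
    if first < 252:
        return first, pos + 1
    if pos + 1 >= len(buf):
        raise ValueError("truncated length field")
    return 4 * buf[pos + 1] + first, pos + 2


def parse_self_delimited(buf: bytes, pos: int = 0) -> Tuple[List[bytes], int]:
    """Parse one self-delimited packet at buf[pos:]; return (frames, next pos)."""
    if pos >= len(buf):
        raise ValueError("empty packet")
    code = buf[pos] & 0x03              # low two bits of the TOC byte
    pos += 1
    padding = 0
    if code == 0:                       # one frame of N1 bytes
        size, pos = read_length(buf, pos)
        sizes = [size]
    elif code == 1:                     # two frames, both N1 bytes
        size, pos = read_length(buf, pos)
        sizes = [size, size]
    elif code == 2:                     # two frames, N1 then N2 bytes
        first, pos = read_length(buf, pos)
        second, pos = read_length(buf, pos)
        sizes = [first, second]
    else:                               # code 3: frame count byte follows
        if pos >= len(buf):
            raise ValueError("missing frame count byte")
        vbr = bool(buf[pos] & 0x80)
        has_padding = bool(buf[pos] & 0x40)
        count = buf[pos] & 0x3F
        pos += 1
        if count == 0:
            raise ValueError("frame count must not be zero")
        while has_padding:              # 255 means 254 bytes and continue
            if pos >= len(buf):
                raise ValueError("truncated padding length")
            value = buf[pos]
            pos += 1
            padding += 254 if value == 255 else value
            has_padding = value == 255
        if vbr:                         # N1 .. N[M], one length per frame
            sizes = []
            for _ in range(count):
                size, pos = read_length(buf, pos)
                sizes.append(size)
        else:                           # CBR: a single N1 shared by M frames
            size, pos = read_length(buf, pos)
            sizes = [size] * count
    frames = []
    for size in sizes:
        if pos + size > len(buf):
            raise ValueError("truncated frame data")
        frames.append(buf[pos:pos + size])
        pos += size
    if pos + padding > len(buf):
        raise ValueError("truncated padding")
    return frames, pos + padding
```
]]></artwork></figure>
<t>
In the multi-channel arrangement described earlier, a receiver could call such
a function once for each stream except the last, and then treat the remaining
bytes, starting at the returned position, as a regular undelimited packet.
</t>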
*a58d3d2aSXin Li
*a58d3d2aSXin Li</section>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</back>
*a58d3d2aSXin Li
*a58d3d2aSXin Li</rfc>