/******************************************************************************
 *
 *  Copyright (C) 2014 The Android Open Source Project
 *  Copyright 2003 - 2004 Open Interface North America, Inc. All rights reserved.
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at:
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 *
 ******************************************************************************/

/**********************************************************************************
  $Revision: #1 $
***********************************************************************************/

/** @file

This file, along with synthesis-generated.c, contains the synthesis
filterbank routines. The operations performed correspond to the
operations described in A2DP Appendix B, Figure 12.3. Several
mathematical optimizations are performed, particularly for the
8-subband case.

One important optimization is to note that the "matrixing" operation
can be decomposed into the product of a type II discrete cosine kernel
and another, sparse matrix.

According to Fig 12.3, in the 8-subband case,
@code
    N[k][i] = cos((i+0.5)*(k+4)*pi/8), k = 0..15 and i = 0..7
@endcode

N can be factored as R * C2, where C2 is an 8-point type II discrete
cosine kernel given by
@code
    C2[k][i] = cos((i+0.5)*k*pi/8), k = 0..7 and i = 0..7
@endcode

R turns out to be a sparse 16x8 matrix with the following non-zero
entries:
@code
    R[k][k+4]        =  1,   k = 0..3
    R[k][abs(12-k)]  = -1,   k = 5..15
@endcode

The spec describes computing V[0..15] as the product of N and the
vector of 8 input subband samples. Writing S for that input vector (to
keep it distinct from the sparse matrix R above):
@code
    V[0..15] = N * S = (R * C2) * S = R * (C2 * S)
@endcode

C2 * S corresponds to computing the discrete cosine transform of S, so
V[0..15] can be computed by taking the DCT of S followed by assignment
and selective negation of the DCT result into V.
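
The factorization N = R * C2 can be checked numerically. The following
is a minimal sketch and not part of the decoder: cos_table and
max_factorization_error are hypothetical names introduced here, and the
cosine table is built with the recurrence
cos(j*t) = 2*cos(t)*cos((j-1)*t) - cos((j-2)*t) only so that the check
needs no math library. It compares every entry of N with the matching
entry of R * C2 and returns the largest absolute difference, which
should be at rounding-error level.

```c
// Hypothetical helper: tab[j] = cos(j*pi/16) for j = 0..n-1 (n >= 2).
static void cos_table(double *tab, int n)
{
    const double c1 = 0.98078528040323044913; // cos(pi/16)
    int j;

    tab[0] = 1.0;
    tab[1] = c1;
    for (j = 2; j < n; j++) {
        tab[j] = 2.0 * c1 * tab[j - 1] - tab[j - 2];
    }
}

// Largest |N[k][i] - (R*C2)[k][i]| over k = 0..15, i = 0..7.
double max_factorization_error(void)
{
    double tab[286]; // largest angle index is (2*7+1)*(15+4) = 285
    double worst = 0.0;
    int k, i;

    cos_table(tab, 286);
    for (k = 0; k < 16; k++) {
        for (i = 0; i < 8; i++) {
            // N[k][i] = cos((i+0.5)*(k+4)*pi/8) = cos((2i+1)*(k+4)*pi/16)
            double n = tab[(2 * i + 1) * (k + 4)];
            double rc2; // entry (k, i) of R*C2
            double diff;

            if (k < 4) {
                rc2 = tab[(2 * i + 1) * (k + 4)];        // +C2[k+4][i]
            } else if (k == 4) {
                rc2 = 0.0;                               // R[4][0..7] is all zeros
            } else {
                int m = (k <= 12) ? (12 - k) : (k - 12); // abs(12-k)
                rc2 = -tab[(2 * i + 1) * m];             // -C2[abs(12-k)][i]
            }
            diff = n - rc2;
            if (diff < 0.0) {
                diff = -diff;
            }
            if (diff > worst) {
                worst = diff;
            }
        }
    }
    return worst;
}
```

Calling max_factorization_error() and comparing the result against a
small tolerance such as 1e-9 is enough to confirm the row mapping used
for R.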

        Although this was derived empirically using GNU Octave, it is
        formally demonstrated in, e.g., Liu, Chi-Min and Lee,
        Wen-Chieh. "A Unified Fast Algorithm for Cosine Modulated
        Filter Banks in Current Audio Coding Standards." Journal of
        the AES 47 (December 1999): 1061.

Given the shift operation performed prior to computing V[0..15], it is
clear that V[0..159] represents a rolling history of the 10 most
recent groups of blocks input to the synthesis operation. Interpreting
the matrix N in light of its factorization into C2 and R, R's
sparseness has implications for interpreting the values in V. In
particular, there is considerable redundancy in the values stored in
V. Furthermore, since R[4][0..7] are all zeros, one out of every 16
values in V will be zero regardless of the input data. Within each
block of 16 values in V, fully half of them are redundant or
irrelevant:

@code
    V[ 0] =  DCT[4]
    V[ 1] =  DCT[5]
    V[ 2] =  DCT[6]
    V[ 3] =  DCT[7]
    V[ 4] = 0
    V[ 5] = -DCT[7] = -V[3] (redundant)
    V[ 6] = -DCT[6] = -V[2] (redundant)
    V[ 7] = -DCT[5] = -V[1] (redundant)
    V[ 8] = -DCT[4] = -V[0] (redundant)
    V[ 9] = -DCT[3]
    V[10] = -DCT[2]
    V[11] = -DCT[1]
    V[12] = -DCT[0]
    V[13] = -DCT[1] = V[11] (redundant)
    V[14] = -DCT[2] = V[10] (redundant)
    V[15] = -DCT[3] = V[ 9] (redundant)
@endcode
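
The table above can be written as a small routine. This is only an
illustrative sketch: expand_dct_to_v is a hypothetical name, and the
decoder itself avoids materializing the full V in this way. It expands
one 8-point DCT result into the 16 values of V listed in the table.

```c
#include <stdint.h>

// Expand one 8-point DCT result into the 16 entries of V from the table.
void expand_dct_to_v(const int32_t dct[8], int32_t v[16])
{
    int k;

    for (k = 0; k < 4; k++) {
        v[k] = dct[k + 4];                       // V[0..3] = DCT[4..7]
    }
    v[4] = 0;                                    // always zero
    for (k = 5; k < 16; k++) {
        int m = (k <= 12) ? (12 - k) : (k - 12); // abs(12-k)
        v[k] = -dct[m];                          // V[5..15] = -DCT[abs(12-k)]
    }
}
```

With dct = {10, 11, 12, 13, 14, 15, 16, 17}, the redundancy is visible
directly: v[5] equals -v[3], v[8] equals -v[0], and v[13] equals v[11].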

Since the elements of V beyond 15 were originally computed the same
way during a previous run, what holds true for V[x] also holds true
for V[x+16]. Thus, so long as care is taken to maintain the mapping,
we need only actually store the unique values, which correspond to the
output of the DCT, in some cases inverted. In fact, instead of storing
V[0..159], we could store DCT[0..79] which would contain a history of
DCT results. More on this in a bit.

Going back to figure 12.3 in the spec, it should be clear that the
vector U need not actually be explicitly constructed, but that with
suitable indexing into V during the window operation, the same end can
be accomplished. In the same spirit as the pseudocode shown in the
figure, the following is the construction of W without using U:

@code
    for i=0 to 79 do
        W[i] = D[i]*VSIGN(i)*V[remap_V(i)]

    where remap_V(i) = 32*(int(i/16)) + (i % 16) + (i % 16 >= 8 ? 16 : 0)
    and VSIGN(i) maps i%16 into
        { 1, 1, 1, 1, 0, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1 }
    These values correspond to the signs of the redundant values as
    shown in the explanation three paragraphs above.
@endcode
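
The indexing part of this claim can be checked in isolation. The sketch
below is not decoder code: remap_v and remap_matches_u are hypothetical
names, and the explicit construction of U is written from memory of the
8-subband pseudocode in the figure, so it should be compared against
the spec before being relied on. It builds U the long way and confirms
that V[remap_V(i)] picks out the same element for every i.

```c
// remap_V as defined in the text.
int remap_v(int i)
{
    return 32 * (i / 16) + (i % 16) + ((i % 16 >= 8) ? 16 : 0);
}

// Returns 1 if V[remap_v(i)] equals an explicitly built U[i] for i = 0..79.
int remap_matches_u(const int v[160])
{
    int u[80];
    int i, j;

    // Explicit construction of U (assumed form of the spec's pseudocode).
    for (i = 0; i < 5; i++) {
        for (j = 0; j < 8; j++) {
            u[i * 16 + j]     = v[i * 32 + j];
            u[i * 16 + 8 + j] = v[i * 32 + 24 + j];
        }
    }
    for (i = 0; i < 80; i++) {
        if (u[i] != v[remap_v(i)]) {
            return 0;
        }
    }
    return 1;
}
```

Filling v with distinct values before calling remap_matches_u makes the
comparison meaningful, since any indexing mistake then shows up as a
mismatch.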

We saw above how V[4..8,13..15] (and by extension
V[(4..8,13..15)+16*n]) can be defined in terms of other elements
within the subblock of V. V[0..3,9..12] correspond to DCT elements, so
W can be expressed directly in terms of the stored DCT history:

@code
    for i=0 to 79 do
        W[i] = D[i]*DSIGN(i)*DCT[remap_DCT(i)]
@endcode

The DCT is calculated using the Arai-Agui-Nakajima factorization,
which saves some computation by producing output that needs to be
multiplied by scaling factors before being used.

@code
    for i=0 to 79 do
        W[i] = D[i]*SCALE[i%8]*AAN_DCT[remap_DCT(i)]
@endcode

D can be premultiplied with the DCT scaling factors to yield

@code
    for i=0 to 79 do
        W[i] = DSCALED[i]*AAN_DCT[remap_DCT(i)] where DSCALED[i] = D[i]*SCALE[i%8]
@endcode

The output samples X[0..7] are defined as sums of W:

@code
        X[j] = sum{i=0..9}(W[j+8*i])
@endcode
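
As a plain-C reading of the summation above, the following sketch folds
an 80-element W into the 8 output samples. sum_windowed is a
hypothetical name, and the generated window routines in this codec fuse
this step with the multiplication by D rather than storing W at all.

```c
#include <stdint.h>

// X[j] = sum over i = 0..9 of W[j + 8*i], for j = 0..7.
void sum_windowed(const int32_t w[80], int32_t x[8])
{
    int i, j;

    for (j = 0; j < 8; j++) {
        int32_t acc = 0;
        for (i = 0; i < 10; i++) {
            acc += w[j + 8 * i];
        }
        x[j] = acc;
    }
}
```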

@ingroup codec_internal
*/

/**
@addtogroup codec_internal
@{
*/

#include "oi_codec_sbc_private.h"

const OI_INT32 dec_window_4[21] = {
           0,        /* +0.00000000E+00 */
          97,        /* +5.36548976E-04 */
         270,        /* +1.49188357E-03 */
         495,        /* +2.73370904E-03 */
         694,        /* +3.83720193E-03 */
         704,        /* +3.89205149E-03 */
         338,        /* +1.86581691E-03 */
        -554,        /* -3.06012286E-03 */
        1974,        /* +1.09137620E-02 */
        3697,        /* +2.04385087E-02 */
        5224,        /* +2.88757392E-02 */
        5824,        /* +3.21939290E-02 */
        4681,        /* +2.58767811E-02 */
        1109,        /* +6.13245186E-03 */
       -5214,        /* -2.88217274E-02 */
      -14047,        /* -7.76463494E-02 */
       24529,        /* +1.35593274E-01 */
       35274,        /* +1.94987841E-01 */
       44618,        /* +2.46636662E-01 */
       50984,        /* +2.81828203E-01 */
       53243,        /* +2.94315332E-01 */
};

#define DCTII_4_K06_FIX ( 11585)/* S1.14      11585   0.707107*/

#define DCTII_4_K08_FIX ( 21407)/* S1.14      21407   1.306563*/

#define DCTII_4_K09_FIX (-15137)/* S1.14     -15137  -0.923880*/

#define DCTII_4_K10_FIX ( -8867)/* S1.14      -8867  -0.541196*/

/** Scales x by y bits to the right, adding a rounding factor.
 */
#ifndef SCALE
#define SCALE(x, y) (((x) + (1 <<((y)-1))) >> (y))
#endif

#ifndef CLIP_INT16
#define CLIP_INT16(x) do { if (x > OI_INT16_MAX) { x = OI_INT16_MAX; } else if (x < OI_INT16_MIN) { x = OI_INT16_MIN; } } while (0)
#endif

/**
 * Default C language implementation of a 16x32->32 multiply. This function may
 * be replaced by a platform-specific version for speed.
 *
 * @param u A signed 16-bit multiplicand
 * @param v A signed 32-bit multiplier
 *
 * @return  A signed 32-bit value corresponding to the 32 most significant bits
 * of the 48-bit product of u and v.
 */
INLINE OI_INT32 default_mul_16s_32s_hi(OI_INT16 u, OI_INT32 v);
INLINE OI_INT32 default_mul_16s_32s_hi(OI_INT16 u, OI_INT32 v)
{
    OI_UINT16 v0;
    OI_INT16 v1;

    OI_INT32 w,x;

    v0 = (OI_UINT16)(v & 0xffff);
    v1 = (OI_INT16) (v >> 16);

    w = v1 * u;
    x = u * v0;

    return w + (x >> 16);
}

#define MUL_16S_32S_HI(_x, _y) default_mul_16s_32s_hi(_x, _y)

#define LONG_MULT_DCT(K, sample) (MUL_16S_32S_HI(K, sample)<<2)

PRIVATE void SynthWindow80_generated(OI_INT16 *pcm, SBC_BUFFER_T const * RESTRICT buffer, OI_UINT strideShift);
PRIVATE void SynthWindow112_generated(OI_INT16 *pcm, SBC_BUFFER_T const * RESTRICT buffer, OI_UINT strideShift);
PRIVATE void dct2_8(SBC_BUFFER_T * RESTRICT out, OI_INT32 const * RESTRICT x);

typedef void (*SYNTH_FRAME)(OI_CODEC_SBC_DECODER_CONTEXT *context, OI_INT16 *pcm, OI_UINT blkstart, OI_UINT blkcount);

#ifndef COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS
#define COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(dest, src) do { shift_buffer(dest, src, 72); } while (0)
#endif

#ifndef DCT2_8
#define DCT2_8(dst, src) dct2_8(dst, src)
#endif

#ifndef SYNTH80
#define SYNTH80 SynthWindow80_generated
#endif

#ifndef SYNTH112
#define SYNTH112 SynthWindow112_generated
#endif

PRIVATE void OI_SBC_SynthFrame_80(OI_CODEC_SBC_DECODER_CONTEXT *context, OI_INT16 *pcm, OI_UINT blkstart, OI_UINT blkcount);
PRIVATE void OI_SBC_SynthFrame_80(OI_CODEC_SBC_DECODER_CONTEXT *context, OI_INT16 *pcm, OI_UINT blkstart, OI_UINT blkcount)
{
    OI_UINT blk;
    OI_UINT ch;
    OI_UINT nrof_channels = context->common.frameInfo.nrof_channels;
    OI_UINT pcmStrideShift = (context->common.pcmStride == 1) ? 0 : 1;
    OI_UINT offset = context->common.filterBufferOffset;
    OI_INT32 *s = context->common.subdata + (8 * nrof_channels * blkstart);
    OI_UINT blkstop = blkstart + blkcount;

    for (blk = blkstart; blk < blkstop; blk++) {
        if (offset == 0) {
            COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(context->common.filterBuffer[0] + context->common.filterBufferLen - 72, context->common.filterBuffer[0]);
            if (nrof_channels == 2) {
                COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(context->common.filterBuffer[1] + context->common.filterBufferLen - 72, context->common.filterBuffer[1]);
            }
            offset = context->common.filterBufferLen - 80;
        } else {
            offset -= 1*8;
        }

        for (ch = 0; ch < nrof_channels; ch++) {
            DCT2_8(context->common.filterBuffer[ch] + offset, s);
            SYNTH80(pcm + ch, context->common.filterBuffer[ch] + offset, pcmStrideShift);
            s += 8;
        }
        pcm += (8 << pcmStrideShift);
    }
    context->common.filterBufferOffset = offset;
}

PRIVATE void OI_SBC_SynthFrame_4SB(OI_CODEC_SBC_DECODER_CONTEXT *context, OI_INT16 *pcm, OI_UINT blkstart, OI_UINT blkcount);
PRIVATE void OI_SBC_SynthFrame_4SB(OI_CODEC_SBC_DECODER_CONTEXT *context, OI_INT16 *pcm, OI_UINT blkstart, OI_UINT blkcount)
{
    OI_UINT blk;
    OI_UINT ch;
    OI_UINT nrof_channels = context->common.frameInfo.nrof_channels;
    OI_UINT pcmStrideShift = (context->common.pcmStride == 1) ? 0 : 1;
    OI_UINT offset = context->common.filterBufferOffset;
    OI_INT32 *s = context->common.subdata + (8 * nrof_channels * blkstart);
    OI_UINT blkstop = blkstart + blkcount;

    for (blk = blkstart; blk < blkstop; blk++) {
        if (offset == 0) {
            COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(context->common.filterBuffer[0] + context->common.filterBufferLen - 72, context->common.filterBuffer[0]);
            if (nrof_channels == 2) {
                COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(context->common.filterBuffer[1] + context->common.filterBufferLen - 72, context->common.filterBuffer[1]);
            }
            offset = context->common.filterBufferLen - 80;
        } else {
            offset -= 8;
        }
        for (ch = 0; ch < nrof_channels; ch++) {
            cosineModulateSynth4(context->common.filterBuffer[ch] + offset, s);
            SynthWindow40_int32_int32_symmetry_with_sum(pcm + ch,
                                                        context->common.filterBuffer[ch] + offset,
                                                        pcmStrideShift);
            s += 4;
        }
        pcm += (4 << pcmStrideShift);
    }
    context->common.filterBufferOffset = offset;
}

#ifdef SBC_ENHANCED

PRIVATE void OI_SBC_SynthFrame_Enhanced(OI_CODEC_SBC_DECODER_CONTEXT *context, OI_INT16 *pcm, OI_UINT blkstart, OI_UINT blkcount)
{
    OI_UINT blk;
    OI_UINT ch;
    OI_UINT nrof_channels = context->common.frameInfo.nrof_channels;
    OI_UINT pcmStrideShift = context->common.pcmStride == 1 ? 0 : 1;
    OI_UINT offset = context->common.filterBufferOffset;
    OI_INT32 *s = context->common.subdata + 8 * nrof_channels * blkstart;
    OI_UINT blkstop = blkstart + blkcount;

    for (blk = blkstart; blk < blkstop; blk++) {
        if (offset == 0) {
            COPY_BACKWARD_32BIT_ALIGNED_104_HALFWORDS(context->common.filterBuffer[0] + context->common.filterBufferLen - 104, context->common.filterBuffer[0]);
            if (nrof_channels == 2) {
                COPY_BACKWARD_32BIT_ALIGNED_104_HALFWORDS(context->common.filterBuffer[1] + context->common.filterBufferLen - 104, context->common.filterBuffer[1]);
            }
            offset = context->common.filterBufferLen - 112;
        } else {
            offset -= 8;
        }
        for (ch = 0; ch < nrof_channels; ++ch) {
            DCT2_8(context->common.filterBuffer[ch] + offset, s);
            SYNTH112(pcm + ch, context->common.filterBuffer[ch] + offset, pcmStrideShift);
            s += 8;
        }
        pcm += (8 << pcmStrideShift);
    }
    context->common.filterBufferOffset = offset;
}

static const SYNTH_FRAME SynthFrameEnhanced[] = {
    (SYNTH_FRAME) NULL,   /* invalid */
    OI_SBC_SynthFrame_Enhanced, /* mono */
    OI_SBC_SynthFrame_Enhanced  /* stereo */
};

#endif

static const SYNTH_FRAME SynthFrame8SB[] = {
    (SYNTH_FRAME) NULL, /* invalid */
    OI_SBC_SynthFrame_80, /* mono */
    OI_SBC_SynthFrame_80  /* stereo */
};


static const SYNTH_FRAME SynthFrame4SB[] = {
    (SYNTH_FRAME) NULL, /* invalid */
    OI_SBC_SynthFrame_4SB, /* mono */
    OI_SBC_SynthFrame_4SB  /* stereo */
};

PRIVATE void OI_SBC_SynthFrame(OI_CODEC_SBC_DECODER_CONTEXT *context, OI_INT16 *pcm, OI_UINT start_block, OI_UINT nrof_blocks)
{
    OI_UINT nrof_subbands = context->common.frameInfo.nrof_subbands;
    OI_UINT nrof_channels = context->common.frameInfo.nrof_channels;

    OI_ASSERT(nrof_subbands == 4 || nrof_subbands == 8);
    if (nrof_subbands == 4) {
        SynthFrame4SB[nrof_channels](context, pcm, start_block, nrof_blocks);
#ifdef SBC_ENHANCED
    } else if (context->common.frameInfo.enhanced) {
        SynthFrameEnhanced[nrof_channels](context, pcm, start_block, nrof_blocks);
#endif /* SBC_ENHANCED */
    } else {
        SynthFrame8SB[nrof_channels](context, pcm, start_block, nrof_blocks);
    }
}


void SynthWindow40_int32_int32_symmetry_with_sum(OI_INT16 *pcm, SBC_BUFFER_T buffer[80], OI_UINT strideShift)
{
    OI_INT32 pa;
    OI_INT32 pb;

    /* These values should be zero, since out[2] of the 4-band cosine modulation
     * is always zero. */
    OI_ASSERT(buffer[ 2] == 0);
    OI_ASSERT(buffer[10] == 0);
    OI_ASSERT(buffer[18] == 0);
    OI_ASSERT(buffer[26] == 0);
    OI_ASSERT(buffer[34] == 0);
    OI_ASSERT(buffer[42] == 0);
    OI_ASSERT(buffer[50] == 0);
    OI_ASSERT(buffer[58] == 0);
    OI_ASSERT(buffer[66] == 0);
    OI_ASSERT(buffer[74] == 0);


    pa  = dec_window_4[ 4] * (buffer[12] + buffer[76]);
    pa += dec_window_4[ 8] * (buffer[16] - buffer[64]);
    pa += dec_window_4[12] * (buffer[28] + buffer[60]);
    pa += dec_window_4[16] * (buffer[32] - buffer[48]);
    pa += dec_window_4[20] *  buffer[44];
    pa = SCALE(-pa, 15);
    CLIP_INT16(pa);
    pcm[(uint32_t)(0 << strideShift)] = (OI_INT16)pa;


    pa  = dec_window_4[ 1] * buffer[ 1]; pb  = dec_window_4[ 1] * buffer[79];
    pb += dec_window_4[ 3] * buffer[ 3]; pa += dec_window_4[ 3] * buffer[77];
    pa += dec_window_4[ 5] * buffer[13]; pb += dec_window_4[ 5] * buffer[67];
    pb += dec_window_4[ 7] * buffer[15]; pa += dec_window_4[ 7] * buffer[65];
    pa += dec_window_4[ 9] * buffer[17]; pb += dec_window_4[ 9] * buffer[63];
    pb += dec_window_4[11] * buffer[19]; pa += dec_window_4[11] * buffer[61];
    pa += dec_window_4[13] * buffer[29]; pb += dec_window_4[13] * buffer[51];
    pb += dec_window_4[15] * buffer[31]; pa += dec_window_4[15] * buffer[49];
    pa += dec_window_4[17] * buffer[33]; pb += dec_window_4[17] * buffer[47];
    pb += dec_window_4[19] * buffer[35]; pa += dec_window_4[19] * buffer[45];
    pa = SCALE(-pa, 15);
    CLIP_INT16(pa);
    pcm[(uint32_t)(1 << strideShift)] = (OI_INT16)(pa);
    pb = SCALE(-pb, 15);
    CLIP_INT16(pb);
    pcm[(uint32_t)(3 << strideShift)] = (OI_INT16)(pb);


    pa  = dec_window_4[2] * (/*buffer[ 2] + */ buffer[78]);  /* buffer[ 2] is always zero */
    pa += dec_window_4[6] * (buffer[14] /* + buffer[66]*/);  /* buffer[66] is always zero */
    pa += dec_window_4[10] * (/*buffer[18] + */ buffer[62]);  /* buffer[18] is always zero */
    pa += dec_window_4[14] * (buffer[30] /* + buffer[50]*/);  /* buffer[50] is always zero */
    pa += dec_window_4[18] * (/*buffer[34] + */ buffer[46]);  /* buffer[34] is always zero */
    pa = SCALE(-pa, 15);
    CLIP_INT16(pa);
    pcm[(uint32_t)(2 << strideShift)] = (OI_INT16)(pa);
}


/**
  This routine implements the cosine modulation matrix for 4-subband
  synthesis. This is called "matrixing" in the SBC specification. This
  matrix, M4, can be factored into a 4-point Type II Discrete Cosine
  Transform, DCTII_4, and a matrix S4, given here:

  @code
        __               __
       |   0   0   1   0   |
       |   0   0   0   1   |
       |   0   0   0   0   |
       |   0   0   0  -1   |
  S4 = |   0   0  -1   0   |
       |   0  -1   0   0   |
       |  -1   0   0   0   |
       |__ 0  -1   0   0 __|

  M4 * in = S4 * (DCTII_4 * in)
  @endcode

  (DCTII_4 * in) is computed using a Fast Cosine Transform. The algorithm
  here is based on an implementation computed by the SPIRAL computer
  algebra system, manually converted to fixed-point arithmetic. S4 can be
  implemented using only assignment and negation.
  */
PRIVATE void cosineModulateSynth4(SBC_BUFFER_T * RESTRICT out, OI_INT32 const * RESTRICT in)
{
    OI_INT32 f0, f1, f2, f3, f4, f7, f8, f9, f10;
    OI_INT32 y0, y1, y2, y3;

    f0 = (in[0] - in[3]);
    f1 = (in[0] + in[3]);
    f2 = (in[1] - in[2]);
    f3 = (in[1] + in[2]);

    f4 = f1 - f3;

    y0 = -SCALE(f1 + f3, DCT_SHIFT);
    y2 = -SCALE(LONG_MULT_DCT(DCTII_4_K06_FIX, f4), DCT_SHIFT);
    f7 = f0 + f2;
    f8 = LONG_MULT_DCT(DCTII_4_K08_FIX, f0);
    f9 = LONG_MULT_DCT(DCTII_4_K09_FIX, f7);
    f10 = LONG_MULT_DCT(DCTII_4_K10_FIX, f2);
    y3 = -SCALE(f8 + f9, DCT_SHIFT);
    y1 = -SCALE(f10 - f9, DCT_SHIFT);

    out[0] = (OI_INT16)-y2;
    out[1] = (OI_INT16)-y3;
    out[2] = (OI_INT16)0;
    out[3] = (OI_INT16)y3;
    out[4] = (OI_INT16)y2;
    out[5] = (OI_INT16)y1;
    out[6] = (OI_INT16)y0;
    out[7] = (OI_INT16)y1;
}



/**
@}
*/
