;
; jdsample.asm - upsampling (MMX)
;
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
; Copyright (C) 2016, D. R. Commander.
;
; Based on the x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208

%include "jsimdext.inc"

; --------------------------------------------------------------------------
    SECTION     SEG_CONST

    alignz      32
    GLOBAL_DATA(jconst_fancy_upsample_mmx)

EXTN(jconst_fancy_upsample_mmx):

PW_ONE   times 4 dw 1
PW_TWO   times 4 dw 2
PW_THREE times 4 dw 3
PW_SEVEN times 4 dw 7
PW_EIGHT times 4 dw 8

    alignz      32

; --------------------------------------------------------------------------
    SECTION     SEG_TEXT
    BITS        32
;
; Fancy processing for the common case of 2:1 horizontal and 1:1 vertical.
;
; The upsampling algorithm is linear interpolation between pixel centers,
; also known as a "triangle filter".
; This is a good compromise between speed and visual quality.  The centers
; of the output pixels are 1/4 and 3/4 of the way between input pixel centers.
;
; GLOBAL(void)
; jsimd_h2v1_fancy_upsample_mmx(int max_v_samp_factor,
;                               JDIMENSION downsampled_width,
;                               JSAMPARRAY input_data,
;                               JSAMPARRAY *output_data_ptr);
;

%define max_v_samp(b)       (b) + 8     ; int max_v_samp_factor
%define downsamp_width(b)   (b) + 12    ; JDIMENSION downsampled_width
%define input_data(b)       (b) + 16    ; JSAMPARRAY input_data
%define output_data_ptr(b)  (b) + 20    ; JSAMPARRAY *output_data_ptr

    align       32
    GLOBAL_FUNCTION(jsimd_h2v1_fancy_upsample_mmx)

EXTN(jsimd_h2v1_fancy_upsample_mmx):
    push        ebp
    mov         ebp, esp
    pushpic     ebx
;   push        ecx                     ; need not be preserved
;   push        edx                     ; need not be preserved
    push        esi
    push        edi

    get_GOT     ebx                     ; get GOT address

    mov         eax, JDIMENSION [downsamp_width(ebp)]  ; colctr
    test        eax, eax
    jz          near .return

    mov         ecx, INT [max_v_samp(ebp)]  ; rowctr
    test        ecx, ecx
    jz          near .return

    mov         esi, JSAMPARRAY [input_data(ebp)]  ; input_data
    mov         edi, POINTER [output_data_ptr(ebp)]
    mov         edi, JSAMPARRAY [edi]   ; output_data
    alignx      16, 7
.rowloop:
    push        eax                     ; colctr
    push        edi
    push        esi

    mov         esi, JSAMPROW [esi]     ; inptr
    mov         edi, JSAMPROW [edi]     ; outptr

    test        eax, SIZEOF_MMWORD-1
    jz          short .skip
    mov         dl, JSAMPLE [esi+(eax-1)*SIZEOF_JSAMPLE]
    mov         JSAMPLE [esi+eax*SIZEOF_JSAMPLE], dl  ; insert a dummy sample
.skip:
    pxor        mm0, mm0                ; mm0=(all 0's)
    pcmpeqb     mm7, mm7
    psrlq       mm7, (SIZEOF_MMWORD-1)*BYTE_BIT
    pand        mm7, MMWORD [esi+0*SIZEOF_MMWORD]

    add         eax, byte SIZEOF_MMWORD-1
    and         eax, byte -SIZEOF_MMWORD
    cmp         eax, byte SIZEOF_MMWORD
    ja          short .columnloop
    alignx      16, 7

.columnloop_last:
    pcmpeqb     mm6, mm6
    psllq       mm6, (SIZEOF_MMWORD-1)*BYTE_BIT
    pand        mm6, MMWORD [esi+0*SIZEOF_MMWORD]
    jmp         short .upsample
    alignx      16, 7

.columnloop:
    movq        mm6, MMWORD [esi+1*SIZEOF_MMWORD]
    psllq       mm6, (SIZEOF_MMWORD-1)*BYTE_BIT

.upsample:
    movq        mm1, MMWORD [esi+0*SIZEOF_MMWORD]
    movq        mm2, mm1
    movq        mm3, mm1                ; mm1=( 0 1 2 3 4 5 6 7)
    psllq       mm2, BYTE_BIT           ; mm2=( - 0 1 2 3 4 5 6)
    psrlq       mm3, BYTE_BIT           ; mm3=( 1 2 3 4 5 6 7 -)

    por         mm2, mm7                ; mm2=(-1 0 1 2 3 4 5 6)
    por         mm3, mm6                ; mm3=( 1 2 3 4 5 6 7 8)

    movq        mm7, mm1
    psrlq       mm7, (SIZEOF_MMWORD-1)*BYTE_BIT  ; mm7=( 7 - - - - - - -)

    movq        mm4, mm1
    punpcklbw   mm1, mm0                ; mm1=( 0 1 2 3)
    punpckhbw   mm4, mm0                ; mm4=( 4 5 6 7)
    movq        mm5, mm2
    punpcklbw   mm2, mm0                ; mm2=(-1 0 1 2)
    punpckhbw   mm5, mm0                ; mm5=( 3 4 5 6)
    movq        mm6, mm3
    punpcklbw   mm3, mm0                ; mm3=( 1 2 3 4)
    punpckhbw   mm6, mm0                ; mm6=( 5 6 7 8)

    pmullw      mm1, [GOTOFF(ebx,PW_THREE)]
    pmullw      mm4, [GOTOFF(ebx,PW_THREE)]
    paddw       mm2, [GOTOFF(ebx,PW_ONE)]
    paddw       mm5, [GOTOFF(ebx,PW_ONE)]
    paddw       mm3, [GOTOFF(ebx,PW_TWO)]
    paddw       mm6, [GOTOFF(ebx,PW_TWO)]

    paddw       mm2, mm1
    paddw       mm5, mm4
    psrlw       mm2, 2                  ; mm2=OutLE=( 0 2 4 6)
    psrlw       mm5, 2                  ; mm5=OutHE=( 8 10 12 14)
    paddw       mm3, mm1
    paddw       mm6, mm4
    psrlw       mm3, 2                  ; mm3=OutLO=( 1 3 5 7)
    psrlw       mm6, 2                  ; mm6=OutHO=( 9 11 13 15)

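    ; Sketch of the arithmetic so far, as read from the register comments in
    ; the h2v1 routine (in[] and out[] are illustrative names, not identifiers
    ; in this file).  Each 16-bit lane now holds
    ;
    ;   out[2x]   = (3 * in[x] + in[x-1] + 1) >> 2     (mm2, mm5: even outputs)
    ;   out[2x+1] = (3 * in[x] + in[x+1] + 2) >> 2     (mm3, mm6: odd outputs)
    ;
    ; The row edges reuse the nearest sample: in[-1] is taken as in[0], and
    ; the sample past the end of the row as the last sample.  Worked example,
    ; assuming a two-sample row in = (10, 20):
    ;
    ;   out[0] = (30 + 10 + 1) >> 2 = 10      out[1] = (30 + 20 + 2) >> 2 = 13
    ;   out[2] = (60 + 10 + 1) >> 2 = 17      out[3] = (60 + 20 + 2) >> 2 = 20
    ;
    ; The psllw/por pair that follows moves each odd result into the high byte
    ; of its lane, so every lane becomes an (even, odd) byte pair in memory.
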
    psllw       mm3, BYTE_BIT
    psllw       mm6, BYTE_BIT
    por         mm2, mm3                ; mm2=OutL=( 0 1 2 3 4 5 6 7)
    por         mm5, mm6                ; mm5=OutH=( 8 9 10 11 12 13 14 15)

    movq        MMWORD [edi+0*SIZEOF_MMWORD], mm2
    movq        MMWORD [edi+1*SIZEOF_MMWORD], mm5

    sub         eax, byte SIZEOF_MMWORD
    add         esi, byte 1*SIZEOF_MMWORD  ; inptr
    add         edi, byte 2*SIZEOF_MMWORD  ; outptr
    cmp         eax, byte SIZEOF_MMWORD
    ja          near .columnloop
    test        eax, eax
    jnz         near .columnloop_last

    pop         esi
    pop         edi
    pop         eax

    add         esi, byte SIZEOF_JSAMPROW  ; input_data
    add         edi, byte SIZEOF_JSAMPROW  ; output_data
    dec         ecx                        ; rowctr
    jg          near .rowloop

    emms                                ; empty MMX state

.return:
    pop         edi
    pop         esi
;   pop         edx                     ; need not be preserved
;   pop         ecx                     ; need not be preserved
    poppic      ebx
    pop         ebp
    ret

; --------------------------------------------------------------------------
;
; Fancy processing for the common case of 2:1 horizontal and 2:1 vertical.
; Again a triangle filter; see comments for h2v1 case, above.
;
; GLOBAL(void)
; jsimd_h2v2_fancy_upsample_mmx(int max_v_samp_factor,
;                               JDIMENSION downsampled_width,
;                               JSAMPARRAY input_data,
;                               JSAMPARRAY *output_data_ptr);
;

%define max_v_samp(b)       (b) + 8     ; int max_v_samp_factor
%define downsamp_width(b)   (b) + 12    ; JDIMENSION downsampled_width
%define input_data(b)       (b) + 16    ; JSAMPARRAY input_data
%define output_data_ptr(b)  (b) + 20    ; JSAMPARRAY *output_data_ptr

%define original_ebp  ebp + 0
%define wk(i)         ebp - (WK_NUM - (i)) * SIZEOF_MMWORD  ; mmword wk[WK_NUM]
%define WK_NUM        4
%define gotptr        wk(0) - SIZEOF_POINTER  ; void *gotptr
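
; Sketch of the arithmetic used by the h2v2 routine, as read from its
; register comments.  row[], Int0/Int1 mirror those comments; out0/out1 and
; the index x are illustrative names only:
;
;   Int0[x] = 3 * row[0][x] + row[-1][x]      (feeds the upper output row)
;   Int1[x] = 3 * row[0][x] + row[+1][x]      (feeds the lower output row)
;
;   out0[2x]   = (3 * Int0[x] + Int0[x-1] + 8) >> 4
;   out0[2x+1] = (3 * Int0[x] + Int0[x+1] + 7) >> 4
;   (out1 is formed the same way from Int1)
;
; The vertical pass is left unscaled (weights 3:1, sum 4) and the horizontal
; pass applies the same 3:1 weights, so the combined weight is 16 and a
; single shift by 4 normalizes the result.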

    align       32
    GLOBAL_FUNCTION(jsimd_h2v2_fancy_upsample_mmx)

EXTN(jsimd_h2v2_fancy_upsample_mmx):
    push        ebp
    mov         eax, esp                    ; eax = original ebp
    sub         esp, byte 4
    and         esp, byte (-SIZEOF_MMWORD)  ; align to 64 bits
    mov         [esp], eax
    mov         ebp, esp                    ; ebp = aligned ebp
    lea         esp, [wk(0)]
    pushpic     eax                     ; make a room for GOT address
    push        ebx
;   push        ecx                     ; need not be preserved
;   push        edx                     ; need not be preserved
    push        esi
    push        edi

    get_GOT     ebx                     ; get GOT address
    movpic      POINTER [gotptr], ebx   ; save GOT address

    mov         edx, eax                ; edx = original ebp
    mov         eax, JDIMENSION [downsamp_width(edx)]  ; colctr
    test        eax, eax
    jz          near .return

    mov         ecx, INT [max_v_samp(edx)]  ; rowctr
    test        ecx, ecx
    jz          near .return

    mov         esi, JSAMPARRAY [input_data(edx)]  ; input_data
    mov         edi, POINTER [output_data_ptr(edx)]
    mov         edi, JSAMPARRAY [edi]   ; output_data
    alignx      16, 7
.rowloop:
    push        eax                     ; colctr
    push        ecx
    push        edi
    push        esi

    mov         ecx, JSAMPROW [esi-1*SIZEOF_JSAMPROW]  ; inptr1(above)
    mov         ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW]  ; inptr0
    mov         esi, JSAMPROW [esi+1*SIZEOF_JSAMPROW]  ; inptr1(below)
    mov         edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]  ; outptr0
    mov         edi, JSAMPROW [edi+1*SIZEOF_JSAMPROW]  ; outptr1

    test        eax, SIZEOF_MMWORD-1
    jz          short .skip
    push        edx
    mov         dl, JSAMPLE [ecx+(eax-1)*SIZEOF_JSAMPLE]
    mov         JSAMPLE [ecx+eax*SIZEOF_JSAMPLE], dl
    mov         dl, JSAMPLE [ebx+(eax-1)*SIZEOF_JSAMPLE]
    mov         JSAMPLE [ebx+eax*SIZEOF_JSAMPLE], dl
    mov         dl, JSAMPLE [esi+(eax-1)*SIZEOF_JSAMPLE]
    mov         JSAMPLE [esi+eax*SIZEOF_JSAMPLE], dl  ; insert a dummy sample
    pop         edx
.skip:
    ; -- process the first column block

    movq        mm0, MMWORD [ebx+0*SIZEOF_MMWORD]  ; mm0=row[ 0][0]
    movq        mm1, MMWORD [ecx+0*SIZEOF_MMWORD]  ; mm1=row[-1][0]
    movq        mm2, MMWORD [esi+0*SIZEOF_MMWORD]  ; mm2=row[+1][0]

    pushpic     ebx
    movpic      ebx, POINTER [gotptr]   ; load GOT address

    pxor        mm3, mm3                ; mm3=(all 0's)
    movq        mm4, mm0
    punpcklbw   mm0, mm3                ; mm0=row[ 0][0]( 0 1 2 3)
    punpckhbw   mm4, mm3                ; mm4=row[ 0][0]( 4 5 6 7)
    movq        mm5, mm1
    punpcklbw   mm1, mm3                ; mm1=row[-1][0]( 0 1 2 3)
    punpckhbw   mm5, mm3                ; mm5=row[-1][0]( 4 5 6 7)
    movq        mm6, mm2
    punpcklbw   mm2, mm3                ; mm2=row[+1][0]( 0 1 2 3)
    punpckhbw   mm6, mm3                ; mm6=row[+1][0]( 4 5 6 7)

    pmullw      mm0, [GOTOFF(ebx,PW_THREE)]
    pmullw      mm4, [GOTOFF(ebx,PW_THREE)]

    pcmpeqb     mm7, mm7
    psrlq       mm7, (SIZEOF_MMWORD-2)*BYTE_BIT

    paddw       mm1, mm0                ; mm1=Int0L=( 0 1 2 3)
    paddw       mm5, mm4                ; mm5=Int0H=( 4 5 6 7)
    paddw       mm2, mm0                ; mm2=Int1L=( 0 1 2 3)
    paddw       mm6, mm4                ; mm6=Int1H=( 4 5 6 7)

    movq        MMWORD [edx+0*SIZEOF_MMWORD], mm1  ; temporarily save
    movq        MMWORD [edx+1*SIZEOF_MMWORD], mm5  ; the intermediate data
    movq        MMWORD [edi+0*SIZEOF_MMWORD], mm2
    movq        MMWORD [edi+1*SIZEOF_MMWORD], mm6

    pand        mm1, mm7                ; mm1=( 0 - - -)
    pand        mm2, mm7                ; mm2=( 0 - - -)

    movq        MMWORD [wk(0)], mm1
    movq        MMWORD [wk(1)], mm2

    poppic      ebx

    add         eax, byte SIZEOF_MMWORD-1
    and         eax, byte -SIZEOF_MMWORD
    cmp         eax, byte SIZEOF_MMWORD
    ja          short .columnloop
    alignx      16, 7

.columnloop_last:
    ; -- process the last column block

    pushpic     ebx
    movpic      ebx, POINTER [gotptr]   ; load GOT address

    pcmpeqb     mm1, mm1
    psllq       mm1, (SIZEOF_MMWORD-2)*BYTE_BIT
    movq        mm2, mm1

    pand        mm1, MMWORD [edx+1*SIZEOF_MMWORD]  ; mm1=( - - - 7)
    pand        mm2, MMWORD [edi+1*SIZEOF_MMWORD]  ; mm2=( - - - 7)

    movq        MMWORD [wk(2)], mm1
    movq        MMWORD [wk(3)], mm2

    jmp         short .upsample
    alignx      16, 7

.columnloop:
    ; -- process the next column block

    movq        mm0, MMWORD [ebx+1*SIZEOF_MMWORD]  ; mm0=row[ 0][1]
    movq        mm1, MMWORD [ecx+1*SIZEOF_MMWORD]  ; mm1=row[-1][1]
    movq        mm2, MMWORD [esi+1*SIZEOF_MMWORD]  ; mm2=row[+1][1]

    pushpic     ebx
    movpic      ebx, POINTER [gotptr]   ; load GOT address

    pxor        mm3, mm3                ; mm3=(all 0's)
    movq        mm4, mm0
    punpcklbw   mm0, mm3                ; mm0=row[ 0][1]( 0 1 2 3)
    punpckhbw   mm4, mm3                ; mm4=row[ 0][1]( 4 5 6 7)
    movq        mm5, mm1
    punpcklbw   mm1, mm3                ; mm1=row[-1][1]( 0 1 2 3)
    punpckhbw   mm5, mm3                ; mm5=row[-1][1]( 4 5 6 7)
    movq        mm6, mm2
    punpcklbw   mm2, mm3                ; mm2=row[+1][1]( 0 1 2 3)
    punpckhbw   mm6, mm3                ; mm6=row[+1][1]( 4 5 6 7)

    pmullw      mm0, [GOTOFF(ebx,PW_THREE)]
    pmullw      mm4, [GOTOFF(ebx,PW_THREE)]

    paddw       mm1, mm0                ; mm1=Int0L=( 0 1 2 3)
    paddw       mm5, mm4                ; mm5=Int0H=( 4 5 6 7)
    paddw       mm2, mm0                ; mm2=Int1L=( 0 1 2 3)
    paddw       mm6, mm4                ; mm6=Int1H=( 4 5 6 7)

    movq        MMWORD [edx+2*SIZEOF_MMWORD], mm1  ; temporarily save
    movq        MMWORD [edx+3*SIZEOF_MMWORD], mm5  ; the intermediate data
    movq        MMWORD [edi+2*SIZEOF_MMWORD], mm2
    movq        MMWORD [edi+3*SIZEOF_MMWORD], mm6

    psllq       mm1, (SIZEOF_MMWORD-2)*BYTE_BIT  ; mm1=( - - - 0)
    psllq       mm2, (SIZEOF_MMWORD-2)*BYTE_BIT  ; mm2=( - - - 0)

    movq        MMWORD [wk(2)], mm1
    movq        MMWORD [wk(3)], mm2

.upsample:
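    ; State on entry to .upsample, as far as it can be read from the code
    ; paths that jump or fall through to this label:
    ;   [edx], [edi]   Int0 / Int1 words of this block (upper / lower row)
    ;   wk(0), wk(1)   low word  = Int0 / Int1 sample left of the block ("-1")
    ;   wk(2), wk(3)   high word = Int0 / Int1 sample right of the block ("8")
    ; For the first block the left sample is sample 0 itself; for the last
    ; block the right sample is sample 7 itself.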
    ; -- process the upper row

    movq        mm7, MMWORD [edx+0*SIZEOF_MMWORD]  ; mm7=Int0L=( 0 1 2 3)
    movq        mm3, MMWORD [edx+1*SIZEOF_MMWORD]  ; mm3=Int0H=( 4 5 6 7)

    movq        mm0, mm7
    movq        mm4, mm3
    psrlq       mm0, 2*BYTE_BIT                  ; mm0=( 1 2 3 -)
    psllq       mm4, (SIZEOF_MMWORD-2)*BYTE_BIT  ; mm4=( - - - 4)
    movq        mm5, mm7
    movq        mm6, mm3
    psrlq       mm5, (SIZEOF_MMWORD-2)*BYTE_BIT  ; mm5=( 3 - - -)
    psllq       mm6, 2*BYTE_BIT                  ; mm6=( - 4 5 6)

    por         mm0, mm4                         ; mm0=( 1 2 3 4)
    por         mm5, mm6                         ; mm5=( 3 4 5 6)

    movq        mm1, mm7
    movq        mm2, mm3
    psllq       mm1, 2*BYTE_BIT                  ; mm1=( - 0 1 2)
    psrlq       mm2, 2*BYTE_BIT                  ; mm2=( 5 6 7 -)
    movq        mm4, mm3
    psrlq       mm4, (SIZEOF_MMWORD-2)*BYTE_BIT  ; mm4=( 7 - - -)

    por         mm1, MMWORD [wk(0)]              ; mm1=(-1 0 1 2)
    por         mm2, MMWORD [wk(2)]              ; mm2=( 5 6 7 8)

    movq        MMWORD [wk(0)], mm4

    pmullw      mm7, [GOTOFF(ebx,PW_THREE)]
    pmullw      mm3, [GOTOFF(ebx,PW_THREE)]
    paddw       mm1, [GOTOFF(ebx,PW_EIGHT)]
    paddw       mm5, [GOTOFF(ebx,PW_EIGHT)]
    paddw       mm0, [GOTOFF(ebx,PW_SEVEN)]
    paddw       mm2, [GOTOFF(ebx,PW_SEVEN)]

    paddw       mm1, mm7
    paddw       mm5, mm3
    psrlw       mm1, 4                  ; mm1=Out0LE=( 0 2 4 6)
    psrlw       mm5, 4                  ; mm5=Out0HE=( 8 10 12 14)
    paddw       mm0, mm7
    paddw       mm2, mm3
    psrlw       mm0, 4                  ; mm0=Out0LO=( 1 3 5 7)
    psrlw       mm2, 4                  ; mm2=Out0HO=( 9 11 13 15)

    psllw       mm0, BYTE_BIT
    psllw       mm2, BYTE_BIT
    por         mm1, mm0                ; mm1=Out0L=( 0 1 2 3 4 5 6 7)
    por         mm5, mm2                ; mm5=Out0H=( 8 9 10 11 12 13 14 15)

    movq        MMWORD [edx+0*SIZEOF_MMWORD], mm1
    movq        MMWORD [edx+1*SIZEOF_MMWORD], mm5

    ; -- process the lower row

    movq        mm6, MMWORD [edi+0*SIZEOF_MMWORD]  ; mm6=Int1L=( 0 1 2 3)
    movq        mm4, MMWORD [edi+1*SIZEOF_MMWORD]  ; mm4=Int1H=( 4 5 6 7)

    movq        mm7, mm6
    movq        mm3, mm4
    psrlq       mm7, 2*BYTE_BIT                  ; mm7=( 1 2 3 -)
    psllq       mm3, (SIZEOF_MMWORD-2)*BYTE_BIT  ; mm3=( - - - 4)
    movq        mm0, mm6
    movq        mm2, mm4
    psrlq       mm0, (SIZEOF_MMWORD-2)*BYTE_BIT  ; mm0=( 3 - - -)
    psllq       mm2, 2*BYTE_BIT                  ; mm2=( - 4 5 6)

    por         mm7, mm3                         ; mm7=( 1 2 3 4)
    por         mm0, mm2                         ; mm0=( 3 4 5 6)

    movq        mm1, mm6
    movq        mm5, mm4
    psllq       mm1, 2*BYTE_BIT                  ; mm1=( - 0 1 2)
    psrlq       mm5, 2*BYTE_BIT                  ; mm5=( 5 6 7 -)
    movq        mm3, mm4
    psrlq       mm3, (SIZEOF_MMWORD-2)*BYTE_BIT  ; mm3=( 7 - - -)

    por         mm1, MMWORD [wk(1)]              ; mm1=(-1 0 1 2)
    por         mm5, MMWORD [wk(3)]              ; mm5=( 5 6 7 8)

    movq        MMWORD [wk(1)], mm3

    pmullw      mm6, [GOTOFF(ebx,PW_THREE)]
    pmullw      mm4, [GOTOFF(ebx,PW_THREE)]
Build Coastguard Worker paddw mm1, [GOTOFF(ebx,PW_EIGHT)] 468*dfc6aa5cSAndroid Build Coastguard Worker paddw mm0, [GOTOFF(ebx,PW_EIGHT)] 469*dfc6aa5cSAndroid Build Coastguard Worker paddw mm7, [GOTOFF(ebx,PW_SEVEN)] 470*dfc6aa5cSAndroid Build Coastguard Worker paddw mm5, [GOTOFF(ebx,PW_SEVEN)] 471*dfc6aa5cSAndroid Build Coastguard Worker 472*dfc6aa5cSAndroid Build Coastguard Worker paddw mm1, mm6 473*dfc6aa5cSAndroid Build Coastguard Worker paddw mm0, mm4 474*dfc6aa5cSAndroid Build Coastguard Worker psrlw mm1, 4 ; mm1=Out1LE=( 0 2 4 6) 475*dfc6aa5cSAndroid Build Coastguard Worker psrlw mm0, 4 ; mm0=Out1HE=( 8 10 12 14) 476*dfc6aa5cSAndroid Build Coastguard Worker paddw mm7, mm6 477*dfc6aa5cSAndroid Build Coastguard Worker paddw mm5, mm4 478*dfc6aa5cSAndroid Build Coastguard Worker psrlw mm7, 4 ; mm7=Out1LO=( 1 3 5 7) 479*dfc6aa5cSAndroid Build Coastguard Worker psrlw mm5, 4 ; mm5=Out1HO=( 9 11 13 15) 480*dfc6aa5cSAndroid Build Coastguard Worker 481*dfc6aa5cSAndroid Build Coastguard Worker psllw mm7, BYTE_BIT 482*dfc6aa5cSAndroid Build Coastguard Worker psllw mm5, BYTE_BIT 483*dfc6aa5cSAndroid Build Coastguard Worker por mm1, mm7 ; mm1=Out1L=( 0 1 2 3 4 5 6 7) 484*dfc6aa5cSAndroid Build Coastguard Worker por mm0, mm5 ; mm0=Out1H=( 8 9 10 11 12 13 14 15) 485*dfc6aa5cSAndroid Build Coastguard Worker 486*dfc6aa5cSAndroid Build Coastguard Worker movq MMWORD [edi+0*SIZEOF_MMWORD], mm1 487*dfc6aa5cSAndroid Build Coastguard Worker movq MMWORD [edi+1*SIZEOF_MMWORD], mm0 488*dfc6aa5cSAndroid Build Coastguard Worker 489*dfc6aa5cSAndroid Build Coastguard Worker poppic ebx 490*dfc6aa5cSAndroid Build Coastguard Worker 491*dfc6aa5cSAndroid Build Coastguard Worker sub eax, byte SIZEOF_MMWORD 492*dfc6aa5cSAndroid Build Coastguard Worker add ecx, byte 1*SIZEOF_MMWORD ; inptr1(above) 493*dfc6aa5cSAndroid Build Coastguard Worker add ebx, byte 1*SIZEOF_MMWORD ; inptr0 494*dfc6aa5cSAndroid Build Coastguard Worker add esi, byte 1*SIZEOF_MMWORD ; inptr1(below) 495*dfc6aa5cSAndroid Build 
Coastguard Worker add edx, byte 2*SIZEOF_MMWORD ; outptr0 496*dfc6aa5cSAndroid Build Coastguard Worker add edi, byte 2*SIZEOF_MMWORD ; outptr1 497*dfc6aa5cSAndroid Build Coastguard Worker cmp eax, byte SIZEOF_MMWORD 498*dfc6aa5cSAndroid Build Coastguard Worker ja near .columnloop 499*dfc6aa5cSAndroid Build Coastguard Worker test eax, eax 500*dfc6aa5cSAndroid Build Coastguard Worker jnz near .columnloop_last 501*dfc6aa5cSAndroid Build Coastguard Worker 502*dfc6aa5cSAndroid Build Coastguard Worker pop esi 503*dfc6aa5cSAndroid Build Coastguard Worker pop edi 504*dfc6aa5cSAndroid Build Coastguard Worker pop ecx 505*dfc6aa5cSAndroid Build Coastguard Worker pop eax 506*dfc6aa5cSAndroid Build Coastguard Worker 507*dfc6aa5cSAndroid Build Coastguard Worker add esi, byte 1*SIZEOF_JSAMPROW ; input_data 508*dfc6aa5cSAndroid Build Coastguard Worker add edi, byte 2*SIZEOF_JSAMPROW ; output_data 509*dfc6aa5cSAndroid Build Coastguard Worker sub ecx, byte 2 ; rowctr 510*dfc6aa5cSAndroid Build Coastguard Worker jg near .rowloop 511*dfc6aa5cSAndroid Build Coastguard Worker 512*dfc6aa5cSAndroid Build Coastguard Worker emms ; empty MMX state 513*dfc6aa5cSAndroid Build Coastguard Worker 514*dfc6aa5cSAndroid Build Coastguard Worker.return: 515*dfc6aa5cSAndroid Build Coastguard Worker pop edi 516*dfc6aa5cSAndroid Build Coastguard Worker pop esi 517*dfc6aa5cSAndroid Build Coastguard Worker; pop edx ; need not be preserved 518*dfc6aa5cSAndroid Build Coastguard Worker; pop ecx ; need not be preserved 519*dfc6aa5cSAndroid Build Coastguard Worker pop ebx 520*dfc6aa5cSAndroid Build Coastguard Worker mov esp, ebp ; esp <- aligned ebp 521*dfc6aa5cSAndroid Build Coastguard Worker pop esp ; esp <- original ebp 522*dfc6aa5cSAndroid Build Coastguard Worker pop ebp 523*dfc6aa5cSAndroid Build Coastguard Worker ret 524*dfc6aa5cSAndroid Build Coastguard Worker 525*dfc6aa5cSAndroid Build Coastguard Worker; -------------------------------------------------------------------------- 526*dfc6aa5cSAndroid 
Build Coastguard Worker; 527*dfc6aa5cSAndroid Build Coastguard Worker; Fast processing for the common case of 2:1 horizontal and 1:1 vertical. 528*dfc6aa5cSAndroid Build Coastguard Worker; It's still a box filter. 529*dfc6aa5cSAndroid Build Coastguard Worker; 530*dfc6aa5cSAndroid Build Coastguard Worker; GLOBAL(void) 531*dfc6aa5cSAndroid Build Coastguard Worker; jsimd_h2v1_upsample_mmx(int max_v_samp_factor, JDIMENSION output_width, 532*dfc6aa5cSAndroid Build Coastguard Worker; JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr); 533*dfc6aa5cSAndroid Build Coastguard Worker; 534*dfc6aa5cSAndroid Build Coastguard Worker 535*dfc6aa5cSAndroid Build Coastguard Worker%define max_v_samp(b) (b) + 8 ; int max_v_samp_factor 536*dfc6aa5cSAndroid Build Coastguard Worker%define output_width(b) (b) + 12 ; JDIMENSION output_width 537*dfc6aa5cSAndroid Build Coastguard Worker%define input_data(b) (b) + 16 ; JSAMPARRAY input_data 538*dfc6aa5cSAndroid Build Coastguard Worker%define output_data_ptr(b) (b) + 20 ; JSAMPARRAY *output_data_ptr 539*dfc6aa5cSAndroid Build Coastguard Worker 540*dfc6aa5cSAndroid Build Coastguard Worker align 32 541*dfc6aa5cSAndroid Build Coastguard Worker GLOBAL_FUNCTION(jsimd_h2v1_upsample_mmx) 542*dfc6aa5cSAndroid Build Coastguard Worker 543*dfc6aa5cSAndroid Build Coastguard WorkerEXTN(jsimd_h2v1_upsample_mmx): 544*dfc6aa5cSAndroid Build Coastguard Worker push ebp 545*dfc6aa5cSAndroid Build Coastguard Worker mov ebp, esp 546*dfc6aa5cSAndroid Build Coastguard Worker; push ebx ; unused 547*dfc6aa5cSAndroid Build Coastguard Worker; push ecx ; need not be preserved 548*dfc6aa5cSAndroid Build Coastguard Worker; push edx ; need not be preserved 549*dfc6aa5cSAndroid Build Coastguard Worker push esi 550*dfc6aa5cSAndroid Build Coastguard Worker push edi 551*dfc6aa5cSAndroid Build Coastguard Worker 552*dfc6aa5cSAndroid Build Coastguard Worker mov edx, JDIMENSION [output_width(ebp)] 553*dfc6aa5cSAndroid Build Coastguard Worker add edx, byte (2*SIZEOF_MMWORD)-1 
    and         edx, byte -(2*SIZEOF_MMWORD)
    jz          short .return

    mov         ecx, INT [max_v_samp(ebp)]  ; rowctr
    test        ecx, ecx
    jz          short .return

    mov         esi, JSAMPARRAY [input_data(ebp)]  ; input_data
    mov         edi, POINTER [output_data_ptr(ebp)]
    mov         edi, JSAMPARRAY [edi]              ; output_data
    alignx      16, 7
.rowloop:
    push        edi
    push        esi

    mov         esi, JSAMPROW [esi]     ; inptr
    mov         edi, JSAMPROW [edi]     ; outptr
    mov         eax, edx                ; colctr
    alignx      16, 7
.columnloop:

    movq        mm0, MMWORD [esi+0*SIZEOF_MMWORD]

    movq        mm1, mm0
    punpcklbw   mm0, mm0
    punpckhbw   mm1, mm1

    movq        MMWORD [edi+0*SIZEOF_MMWORD], mm0
    movq        MMWORD [edi+1*SIZEOF_MMWORD], mm1

    sub         eax, byte 2*SIZEOF_MMWORD
    jz          short .nextrow

    movq        mm2, MMWORD [esi+1*SIZEOF_MMWORD]

    movq        mm3, mm2
    punpcklbw   mm2, mm2
    punpckhbw   mm3, mm3

    movq        MMWORD [edi+2*SIZEOF_MMWORD], mm2
    movq        MMWORD [edi+3*SIZEOF_MMWORD], mm3

    sub         eax, byte 2*SIZEOF_MMWORD
    jz          short .nextrow

    add         esi, byte 2*SIZEOF_MMWORD  ; inptr
    add         edi, byte 4*SIZEOF_MMWORD  ; outptr
    jmp         short .columnloop
    alignx      16, 7

.nextrow:
    pop         esi
    pop         edi

    add         esi, byte SIZEOF_JSAMPROW  ; input_data
    add         edi, byte SIZEOF_JSAMPROW  ; output_data
    dec         ecx                        ; rowctr
    jg          short .rowloop

    emms                                ; empty MMX state

.return:
    pop         edi
    pop         esi
;   pop         edx                     ; need not be preserved
;   pop         ecx                     ; need not be preserved
;   pop         ebx                     ; unused
    pop         ebp
    ret

; --------------------------------------------------------------------------
;
; Fast processing for the common case of 2:1 horizontal and 2:1 vertical.
; It's still a box filter.
;
; GLOBAL(void)
; jsimd_h2v2_upsample_mmx(int max_v_samp_factor, JDIMENSION output_width,
;                         JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr);
;

%define max_v_samp(b)       (b) + 8     ; int max_v_samp_factor
%define output_width(b)     (b) + 12    ; JDIMENSION output_width
%define input_data(b)       (b) + 16    ; JSAMPARRAY input_data
%define output_data_ptr(b)  (b) + 20    ; JSAMPARRAY *output_data_ptr

    align       32
    GLOBAL_FUNCTION(jsimd_h2v2_upsample_mmx)

EXTN(jsimd_h2v2_upsample_mmx):
    push        ebp
    mov         ebp, esp
    push        ebx
;   push        ecx                     ; need not be preserved
;   push        edx                     ; need not be preserved
    push        esi
    push        edi

    mov         edx, JDIMENSION [output_width(ebp)]
    add         edx, byte (2*SIZEOF_MMWORD)-1
    and         edx, byte -(2*SIZEOF_MMWORD)
    jz          near .return

    mov         ecx, INT [max_v_samp(ebp)]  ; rowctr
    test        ecx, ecx
    jz          short .return

    mov         esi, JSAMPARRAY [input_data(ebp)]  ; input_data
    mov         edi, POINTER [output_data_ptr(ebp)]
    mov         edi, JSAMPARRAY [edi]              ; output_data
    alignx      16, 7
.rowloop:
    push        edi
    push        esi

    mov         esi, JSAMPROW [esi]                    ; inptr
    mov         ebx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]  ; outptr0
    mov         edi, JSAMPROW [edi+1*SIZEOF_JSAMPROW]  ; outptr1
    mov         eax, edx                               ; colctr
    alignx      16, 7
.columnloop:

    movq        mm0, MMWORD [esi+0*SIZEOF_MMWORD]

    movq        mm1, mm0
    punpcklbw   mm0, mm0
    punpckhbw   mm1, mm1

    movq        MMWORD [ebx+0*SIZEOF_MMWORD], mm0
    movq        MMWORD [ebx+1*SIZEOF_MMWORD], mm1
    movq        MMWORD [edi+0*SIZEOF_MMWORD], mm0
    movq        MMWORD [edi+1*SIZEOF_MMWORD], mm1

    sub         eax, byte 2*SIZEOF_MMWORD
    jz          short .nextrow

    movq        mm2, MMWORD [esi+1*SIZEOF_MMWORD]

    movq        mm3, mm2
    punpcklbw   mm2, mm2
    punpckhbw   mm3, mm3

    movq        MMWORD [ebx+2*SIZEOF_MMWORD], mm2
    movq        MMWORD [ebx+3*SIZEOF_MMWORD], mm3
    movq        MMWORD [edi+2*SIZEOF_MMWORD], mm2
    movq        MMWORD [edi+3*SIZEOF_MMWORD], mm3

    sub         eax, byte 2*SIZEOF_MMWORD
    jz          short .nextrow

    add         esi, byte 2*SIZEOF_MMWORD  ; inptr
    add         ebx, byte 4*SIZEOF_MMWORD  ; outptr0
    add         edi, byte 4*SIZEOF_MMWORD  ; outptr1
    jmp         short .columnloop
    alignx      16, 7

.nextrow:
    pop         esi
    pop         edi

    add         esi, byte 1*SIZEOF_JSAMPROW  ; input_data
    add         edi, byte 2*SIZEOF_JSAMPROW  ; output_data
    sub         ecx, byte 2                  ; rowctr
    jg          short .rowloop

    emms                                ; empty MMX state

.return:
    pop         edi
    pop         esi
;   pop         edx                     ; need not be preserved
;   pop         ecx                     ; need not be preserved
    pop         ebx
    pop         ebp
    ret

; For some reason, the OS X linker does not honor the request to align the
; segment unless we do this.
    align       32