//===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
registers, to generate better spill code.

//===----------------------------------------------------------------------===//

The first function below should compile to a single lvx from the constant pool,
the second to an xor/stvx:

void foo(void) {
  int x[8] __attribute__((aligned(128))) = { 1, 1, 1, 17, 1, 1, 1, 1 };
  bar (x);
}

#include <string.h>
void foo(void) {
  int x[8] __attribute__((aligned(128)));
  memset (x, 0, sizeof (x));
  bar (x);
}

//===----------------------------------------------------------------------===//

Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763

Adding +0.0 turns a product of -0.0 into +0.0, whereas adding -0.0 leaves every
product unchanged, including its sign.  When -ffast-math is on, we can use 0.0.

//===----------------------------------------------------------------------===//

  Consider this:
  v4f32 Vector;
  v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X };

Since we know that "Vector" is 16-byte aligned and we know the element offset
of ".X", we should change the load into a lve*x instruction, instead of doing
a load/store/lve*x sequence.

//===----------------------------------------------------------------------===//

For functions that use altivec AND have calls, we are VRSAVE'ing all call
clobbered regs.

//===----------------------------------------------------------------------===//

Implement passing vectors by value into calls and receiving them as arguments.

//===----------------------------------------------------------------------===//

GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
of C1/C2/C3, then a load and vperm of Variable.

//===----------------------------------------------------------------------===//

We need a way to teach tblgen that some operands of an intrinsic are required to
be constants.  The verifier should enforce this constraint.

//===----------------------------------------------------------------------===//

We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
aligned stack slot, followed by a load/vperm.  We should probably just store it
to a scalar stack slot, then use lvsl/vperm to load it.  If the value is already
in memory this is a big win.

//===----------------------------------------------------------------------===//

extract_vector_elt of an arbitrary constant vector can be done with the
following instructions:

vTemp = vec_splat(v0,2);    // 2 is the element the src is in.
vec_ste(vTemp,0,&destloc);  // vec_ste takes (vector, offset, pointer).

We can do an arbitrary non-constant value by using lvsr/perm/ste.
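
The reason for the splat in the constant-index sequence above can be seen in a
small scalar model.  This is a sketch in plain C, not AltiVec code:
v4i32_model, model_vspltw, model_stvewx and extract_via_splat are hypothetical
names, and the model assumes that the element store picks its lane from the low
bits of the destination address.  After the splat every lane holds the wanted
element, so the store is right wherever destloc sits in its 16-byte block.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for a 4 x i32 vector register (hypothetical model type). */
typedef struct { int32_t w[4]; } v4i32_model;

/* Model of vspltw: copy lane idx into every lane. */
static v4i32_model model_vspltw(v4i32_model v, unsigned idx) {
    v4i32_model r;
    assert(idx < 4);
    for (unsigned i = 0; i < 4; ++i)
        r.w[i] = v.w[idx];
    return r;
}

/* Model of stvewx (assumed semantics): the lane that gets stored is chosen
   by bits 2-3 of the destination address, not by the caller. */
static void model_stvewx(v4i32_model v, int32_t *dest) {
    unsigned lane = (unsigned)(((uintptr_t)dest >> 2) & 3);
    *dest = v.w[lane];
}

/* extract_vector_elt with a constant index: splat, then element store. */
static void extract_via_splat(v4i32_model v, unsigned idx, int32_t *dest) {
    model_stvewx(model_vspltw(v, idx), dest);
}
```

Calling extract_via_splat(v, 2, &out[i]) for each slot of a four-element int32_t
array writes element 2 of v into every slot, whichever lane the address selects.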

//===----------------------------------------------------------------------===//

If we want to tie instruction selection into the scheduler, we can do some
constant formation with different instructions.  For example, we can generate
"vsplti -1" with "vcmpequw R,R" and 1,1,1,1 with "vsubcuw R,R", and 0,0,0,0 with
"vsplti 0" or "vxor".  Each of these uses a different execution unit, and thus
could help scheduling.

This is probably only reasonable for a post-pass scheduler.

//===----------------------------------------------------------------------===//

For this function:

void test(vector float *A, vector float *B) {
  vector float C = (vector float)vec_cmpeq(*A, *B);
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
  *A = C;
}

we get the following basic block:

        ...
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp v4, v3, v2
        vcmpeqfp. v2, v3, v2
        bne cr6, LBB1_2 ; cond_next

The vcmpeqfp/vcmpeqfp. instructions currently cannot be merged when the
vcmpeqfp. result is used by a branch.  This can be improved.

//===----------------------------------------------------------------------===//

The code generated for this is truly awful:

vector float test(float a, float b) {
  return (vector float){ 0.0, a, 0.0, 0.0};
}

LCPI1_0:                                        ;  float
        .space  4
        .text
        .globl  _test
        .align  4
_test:
        mfspr r2, 256
        oris r3, r2, 4096
        mtspr 256, r3
        lis r3, ha16(LCPI1_0)
        addi r4, r1, -32
        stfs f1, -16(r1)
        addi r5, r1, -16
        lfs f0, lo16(LCPI1_0)(r3)
        stfs f0, -32(r1)
        lvx v2, 0, r4
        lvx v3, 0, r5
        vmrghw v3, v3, v2
        vspltw v2, v2, 0
        vmrghw v2, v2, v3
        mtspr 256, r2
        blr

//===----------------------------------------------------------------------===//

int foo(vector float *x, vector float *y) {
  if (vec_all_eq(*x,*y)) return 3245;
  else return 12;
}

A predicate compare being used in a select_cc should have the same peephole
applied to it as a predicate compare used by a br_cc.  There should be no
mfcr here:

_foo:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        li r5, 12
        li r6, 3245
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        mfcr r3, 2
        rlwinm r3, r3, 25, 31, 31
        cmpwi cr0, r3, 0
        bne cr0, LBB1_2 ; entry
LBB1_1: ; entry
        mr r6, r5
LBB1_2: ; entry
        mr r3, r6
        mtspr 256, r2
        blr

//===----------------------------------------------------------------------===//

CodeGen/PowerPC/vec_constants.ll has an and operation that should be
codegen'd to andc.  The issue is that the 'all ones' build vector is
SelectNodeTo'd a VSPLTISB instruction node before the and/xor is selected
which prevents the vnot pattern from matching.


//===----------------------------------------------------------------------===//

An alternative to the store/store/load approach for illegal insert element
lowering would be:

1. store element to any ol' slot
2. lvx the slot
3. lvsl 0; splat index; vcmpeq to generate a select mask
4. lvsl slot + x; vperm to rotate result into correct slot
5. vsel result together.

//===----------------------------------------------------------------------===//

Should codegen branches on vec_any/vec_all to avoid mfcr.  Two examples:

#include <altivec.h>
int f(vector float a, vector float b)
{
  int aa = 0;
  if (vec_all_ge(a, b))
    aa |= 0x1;
  if (vec_any_ge(a,b))
    aa |= 0x2;
  return aa;
}

vector float f(vector float a, vector float b) {
  if (vec_any_eq(a, b))
    return a;
  else
    return b;
}

//===----------------------------------------------------------------------===//

We should do a little better with eliminating dead stores.
The stores to the stack are dead since %a and %b are not needed.

; Function Attrs: nounwind
define <16 x i8> @test_vpmsumb() #0 {
  entry:
  %a = alloca <16 x i8>, align 16
  %b = alloca <16 x i8>, align 16
  store <16 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16>, <16 x i8>* %a, align 16
  store <16 x i8> <i8 113, i8 114, i8 115, i8 116, i8 117, i8 118, i8 119, i8 120, i8 121, i8 122, i8 123, i8 124, i8 125, i8 126, i8 127, i8 112>, <16 x i8>* %b, align 16
  %0 = load <16 x i8>* %a, align 16
  %1 = load <16 x i8>* %b, align 16
  %2 = call <16 x i8> @llvm.ppc.altivec.crypto.vpmsumb(<16 x i8> %0, <16 x i8> %1)
  ret <16 x i8> %2
}


; Function Attrs: nounwind readnone
declare <16 x i8> @llvm.ppc.altivec.crypto.vpmsumb(<16 x i8>, <16 x i8>) #1


Produces the following code with -mtriple=powerpc64-unknown-linux-gnu:
# BB#0:                                 # %entry
    addis 3, 2, .LCPI0_0@toc@ha
    addis 4, 2, .LCPI0_1@toc@ha
    addi 3, 3, .LCPI0_0@toc@l
    addi 4, 4, .LCPI0_1@toc@l
    lxvw4x 0, 0, 3
    addi 3, 1, -16
    lxvw4x 35, 0, 4
    stxvw4x 0, 0, 3
    ori 2, 2, 0
    lxvw4x 34, 0, 3
    addi 3, 1, -32
    stxvw4x 35, 0, 3
    vpmsumb 2, 2, 3
    blr
    .long   0
    .quad   0

The two stxvw4x instructions are not needed.
With -mtriple=powerpc64le-unknown-linux-gnu, the associated permutes
are present too.

//===----------------------------------------------------------------------===//

The following example is found in test/CodeGen/PowerPC/vec_add_sub_doubleword.ll:

define <2 x i64> @increment_by_val(<2 x i64> %x, i64 %val) nounwind {
       %tmpvec = insertelement <2 x i64> <i64 0, i64 0>, i64 %val, i32 0
       %tmpvec2 = insertelement <2 x i64> %tmpvec, i64 %val, i32 1
       %result = add <2 x i64> %x, %tmpvec2
       ret <2 x i64> %result
}

This will generate the following instruction sequence:
        std 5, -8(1)
        std 5, -16(1)
        addi 3, 1, -16
        ori 2, 2, 0
        lxvd2x 35, 0, 3
        vaddudm 2, 2, 3
        blr

This will almost certainly cause a load-hit-store hazard.
Since val is a value parameter, it should not need to be saved onto
the stack, unless it's being done to set up the vector register. Instead,
it would be better to splat the value into a vector register, and then
remove the (dead) stores to the stack.
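
For reference, a source-level sketch of the same function written so that the
splat is an ordinary vector value.  It assumes the GCC/Clang vector_size
extension; the v2i64 typedef and check_increment are hypothetical helpers, not
part of the test file.  Whether a given compiler then forms the splat in a
register instead of going through the stack depends on the target and backend.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang vector extension: two 64-bit lanes in one 16-byte value. */
typedef int64_t v2i64 __attribute__((vector_size(16)));

/* Source-level equivalent of @increment_by_val: the scalar is placed in
   both lanes of a vector value, which is then added to x lane by lane. */
static v2i64 increment_by_val(v2i64 x, int64_t val) {
    v2i64 splat = { val, val };
    return x + splat;
}

/* Small self-check: both lanes are incremented by the same scalar. */
static void check_increment(void) {
    v2i64 r = increment_by_val((v2i64){ 1, 2 }, 40);
    assert(r[0] == 41 && r[1] == 42);
}
```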

//===----------------------------------------------------------------------===//

At the moment we always generate a lxsdx in preference to lfd, or stxsdx in
preference to stfd.  When we have a reg-immediate addressing mode, this is a
poor choice, since we have to load the address into an index register.  This
should be fixed for P7/P8.

//===----------------------------------------------------------------------===//

Right now, ShuffleKind 0 is supported only on BE, and ShuffleKind 2 only on LE.
However, we could actually support both kinds on either endianness, if we check
for the appropriate shufflevector pattern for each case.  This would cause
some additional shufflevectors to be recognized and implemented via the
"swapped" form.

//===----------------------------------------------------------------------===//

There is a utility program called PerfectShuffle that generates a table of the
shortest instruction sequence for implementing a shufflevector operation on
PowerPC.  However, this was designed for big-endian code generation.  We could
modify this program to create a little endian version of the table.  The table
is used in PPCISelLowering.cpp, PPCTargetLowering::LowerVECTOR_SHUFFLE().

//===----------------------------------------------------------------------===//

Opportunities to use instructions from PPCInstrVSX.td during code gen
  - Conversion instructions (Sections 7.6.1.5 and 7.6.1.6 of ISA 2.07)
  - Scalar comparisons (xscmpodp and xscmpudp)
  - Min and max (xsmaxdp, xsmindp, xvmaxdp, xvmindp, xvmaxsp, xvminsp)

Related to this: we currently do not generate the lxvw4x instruction for either
v4f32 or v4i32, probably because adding a dag pattern to the recognizer requires
a single target type.  This should probably be addressed in the PPCISelDAGToDAG logic.

//===----------------------------------------------------------------------===//

Currently EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT are type-legal only
for v2f64 with VSX available.  We should create custom lowering
support for the other vector types.  Without this support, we generate
sequences with load-hit-store hazards.

v4f32 can be supported with VSX by shifting the correct element into
big-endian lane 0, using xscvspdpn to produce a double-precision
representation of the single-precision value in big-endian
double-precision lane 0, and reinterpreting lane 0 as an FPR or
vector-scalar register.

v2i64 can be supported with VSX and P8Vector in the same manner as
v2f64, followed by a direct move to a GPR.

v4i32 can be supported with VSX and P8Vector by shifting the correct
element into big-endian lane 1, using a direct move to a GPR, and
sign-extending the 32-bit result to 64 bits.

v8i16 can be supported with VSX and P8Vector by shifting the correct
element into big-endian lane 3, using a direct move to a GPR, and
sign-extending the 16-bit result to 64 bits.

v16i8 can be supported with VSX and P8Vector by shifting the correct
element into big-endian lane 7, using a direct move to a GPR, and
sign-extending the 8-bit result to 64 bits.
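
The lane numbers 1, 3 and 7 above all name the low-order end of big-endian
doubleword 0.  The following scalar model in plain C is a sketch of that
reasoning, not backend code.  All names in it (vreg, from_i32,
rotate_left_bytes, direct_move_dw0, extract_via_direct_move) are hypothetical,
the byte rotate stands in for whatever shift or permute the lowering would
pick, and the direct move is assumed to copy big-endian doubleword 0 to the GPR.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit register modeled as 16 bytes in big-endian order:
   b[0] is the most significant byte of doubleword 0. */
typedef struct { uint8_t b[16]; } vreg;

/* Build a register image from four i32 elements, big-endian layout. */
static vreg from_i32(const int32_t e[4]) {
    vreg v;
    for (unsigned i = 0; i < 4; ++i) {
        uint32_t u = (uint32_t)e[i];
        for (unsigned j = 0; j < 4; ++j)
            v.b[4 * i + j] = (uint8_t)(u >> (24 - 8 * j));
    }
    return v;
}

/* Rotate the whole register left by n bytes. */
static vreg rotate_left_bytes(vreg v, unsigned n) {
    vreg r;
    for (unsigned i = 0; i < 16; ++i)
        r.b[i] = v.b[(i + n) % 16];
    return r;
}

/* Model of the direct move (assumed): doubleword 0 goes to a 64-bit GPR. */
static uint64_t direct_move_dw0(vreg v) {
    uint64_t g = 0;
    for (unsigned i = 0; i < 8; ++i)
        g = (g << 8) | v.b[i];
    return g;
}

/* Extract element idx of `size` bytes (1, 2 or 4) and sign-extend it. */
static int64_t extract_via_direct_move(vreg v, unsigned size, unsigned idx) {
    assert(size == 1 || size == 2 || size == 4);
    assert(idx < 16 / size);
    unsigned target_lane = 8 / size - 1;            /* lane 7, 3 or 1 */
    unsigned n = (idx * size + 16 - target_lane * size) % 16;
    uint64_t g = direct_move_dw0(rotate_left_bytes(v, n));
    unsigned bits = size * 8;
    uint64_t low = g & ((UINT64_C(1) << bits) - 1); /* element bits only */
    uint64_t sign = UINT64_C(1) << (bits - 1);
    return (int64_t)(low ^ sign) - (int64_t)sign;   /* sign-extend */
}
```

With the register image of { 1, -2, 3, -4 }, extracting word 1 gives -2, and
extracting halfword 3 (the low half of that same word) also gives -2.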