//===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
registers, to generate better spill code.

//===----------------------------------------------------------------------===//

Of the two functions below, the first should be a single lvx from the constant
pool, the second should be an xor/stvx:

void foo(void) {
  int x[8] __attribute__((aligned(128))) = { 1, 1, 1, 17, 1, 1, 1, 1 };
  bar (x);
}

#include <string.h>
void foo(void) {
  int x[8] __attribute__((aligned(128)));
  memset (x, 0, sizeof (x));
  bar (x);
}

//===----------------------------------------------------------------------===//

Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763

When -ffast-math is on, we can use 0.0.
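
A minimal one-lane sketch of why the addend matters, assuming IEEE-754
round-to-nearest and a build without -ffast-math.  The helper names are made
up for illustration, and the multiply and add are done separately rather than
fused, which does not change the sign of a zero result:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical one-lane model of "multiply implemented as a*b + addend". */
float mul_via_madd(float a, float b, float addend) {
    /* volatile keeps the compiler from folding the arithmetic away */
    volatile float product = a * b;
    volatile float sum = product + addend;
    return sum;
}

/* True only for the bit pattern of -0.0f. */
int is_negative_zero(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits == 0x80000000u;
}

/* Hypothetical self-check: -1.0 * 0.0 is -0.0 under IEEE-754 rules. */
void check_madd_zero_sign(void) {
    /* adding -0.0 keeps the sign of a negative-zero product */
    assert(is_negative_zero(mul_via_madd(-1.0f, 0.0f, -0.0f)));
    /* adding +0.0 turns the same product into +0.0 */
    assert(!is_negative_zero(mul_via_madd(-1.0f, 0.0f, 0.0f)));
    /* for non-zero products the addend makes no difference */
    assert(mul_via_madd(3.0f, 2.0f, -0.0f) == 6.0f);
}
```

Adding -0.0 is therefore an identity for every product, while adding 0.0 is
only safe when the sign of zero may be ignored, as under -ffast-math.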

//===----------------------------------------------------------------------===//

Consider this:

  v4f32 Vector;
  v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X };

Since we know that "Vector" is 16-byte aligned and we know the element offset
of ".X", we should change the load into a lve*x instruction, instead of doing
a load/store/lve*x sequence.

//===----------------------------------------------------------------------===//

For functions that use altivec AND have calls, we are VRSAVE'ing all
call-clobbered regs.

//===----------------------------------------------------------------------===//

Implement passing vectors by value into calls and receiving them as arguments.

//===----------------------------------------------------------------------===//

GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
of C1/C2/C3, then a load and vperm of Variable.

//===----------------------------------------------------------------------===//

We need a way to teach tblgen that some operands of an intrinsic are required to
be constants.  The verifier should enforce this constraint.

//===----------------------------------------------------------------------===//

We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
aligned stack slot, followed by a load/vperm.  We should probably just store it
to a scalar stack slot, then use lvsl/vperm to load it.  If the value is already
in memory this is a big win.

//===----------------------------------------------------------------------===//

extract_vector_elt of an arbitrary constant vector can be done with the
following instructions:

vTemp = vec_splat(v0,2);    // 2 is the element the src is in.
vec_ste(&destloc,0,vTemp);

We can do an arbitrary non-constant value by using lvsr/perm/ste.

//===----------------------------------------------------------------------===//

If we want to tie instruction selection into the scheduler, we can do some
constant formation with different instructions.  For example, we can generate
"vsplti -1" with "vcmpequw R,R" and 1,1,1,1 with "vsubcuw R,R", and 0,0,0,0 with
"vsplti 0" or "vxor", each of which uses a different execution unit and thus
could help scheduling.

This is probably only reasonable for a post-pass scheduler.
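
The identities behind these alternatives can be checked with a scalar model of
one 32-bit lane.  This is only a sketch: the lane_* functions are hypothetical
stand-ins for the per-lane behaviour of the named instructions, not compiler
code:

```c
#include <assert.h>
#include <stdint.h>

/* vcmpequw: a lane becomes all ones when the inputs are equal, else zero. */
uint32_t lane_vcmpequw(uint32_t a, uint32_t b) {
    return a == b ? 0xFFFFFFFFu : 0u;
}

/* vsubcuw: a lane receives the carry out of a + ~b + 1, i.e. 1 when a >= b. */
uint32_t lane_vsubcuw(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + (uint64_t)(uint32_t)~b + 1u;
    return (uint32_t)(wide >> 32);
}

/* vxor: bitwise exclusive or. */
uint32_t lane_vxor(uint32_t a, uint32_t b) {
    return a ^ b;
}

/* With both operands the same register R, the result no longer depends on
   what R happens to contain. */
void check_constant_formation(void) {
    const uint32_t samples[] = { 0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        uint32_t r = samples[i];
        assert(lane_vcmpequw(r, r) == 0xFFFFFFFFu);  /* -1 in every lane */
        assert(lane_vsubcuw(r, r) == 1u);            /*  1 in every lane */
        assert(lane_vxor(r, r) == 0u);               /*  0 in every lane */
    }
}
```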

//===----------------------------------------------------------------------===//

For this function:

void test(vector float *A, vector float *B) {
  vector float C = (vector float)vec_cmpeq(*A, *B);
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
  *A = C;
}

we get the following basic block:

        ...
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp v4, v3, v2
        vcmpeqfp. v2, v3, v2
        bne cr6, LBB1_2 ; cond_next

The vcmpeqfp/vcmpeqfp. instructions currently cannot be merged when the
vcmpeqfp. result is used by a branch.  This can be improved.

//===----------------------------------------------------------------------===//

The code generated for this is truly awful:

vector float test(float a, float b) {
 return (vector float){ 0.0, a, 0.0, 0.0};
}

LCPI1_0:                                        ;  float
        .space  4
        .text
        .globl  _test
        .align  4
_test:
        mfspr r2, 256
        oris r3, r2, 4096
        mtspr 256, r3
        lis r3, ha16(LCPI1_0)
        addi r4, r1, -32
        stfs f1, -16(r1)
        addi r5, r1, -16
        lfs f0, lo16(LCPI1_0)(r3)
        stfs f0, -32(r1)
        lvx v2, 0, r4
        lvx v3, 0, r5
        vmrghw v3, v3, v2
        vspltw v2, v2, 0
        vmrghw v2, v2, v3
        mtspr 256, r2
        blr

//===----------------------------------------------------------------------===//

int foo(vector float *x, vector float *y) {
        if (vec_all_eq(*x,*y)) return 3245;
        else return 12;
}

A predicate compare being used in a select_cc should have the same peephole
applied to it as a predicate compare used by a br_cc.  There should be no
mfcr here:

_foo:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        li r5, 12
        li r6, 3245
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        mfcr r3, 2
        rlwinm r3, r3, 25, 31, 31
        cmpwi cr0, r3, 0
        bne cr0, LBB1_2 ; entry
LBB1_1: ; entry
        mr r6, r5
LBB1_2: ; entry
        mr r3, r6
        mtspr 256, r2
        blr

//===----------------------------------------------------------------------===//

CodeGen/PowerPC/vec_constants.ll has an and operation that should be
codegen'd to andc.  The issue is that the 'all ones' build vector is
SelectNodeTo'd a VSPLTISB instruction node before the and/xor is selected
which prevents the vnot pattern from matching.


//===----------------------------------------------------------------------===//

An alternative to the store/store/load approach for illegal insert element
lowering would be:

1. store element to any ol' slot
2. lvx the slot
3. lvsl 0; splat index; vcmpeq to generate a select mask
4. lvsl slot + x; vperm to rotate result into correct slot
5. vsel result together.
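
The five steps above can be sketched with a scalar model of a four-lane
vector.  Everything here is illustrative: the function names, the marker value
for the don't-care words of the slot, and the choice of source position are
assumptions, and the lvsl/vperm pair is modelled simply as a word rotate:

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 };

/* Step 3: all ones in the lane whose number equals idx, zero elsewhere. */
void make_select_mask(uint32_t mask[LANES], unsigned idx) {
    for (unsigned lane = 0; lane < LANES; ++lane)
        mask[lane] = (lane == idx) ? 0xFFFFFFFFu : 0u;
}

/* Step 4: rotate so that the word at position src lands in lane idx. */
void rotate_into_lane(uint32_t out[LANES], const uint32_t in[LANES],
                      unsigned src, unsigned idx) {
    for (unsigned lane = 0; lane < LANES; ++lane)
        out[lane] = in[(lane + src + LANES - idx) % LANES];
}

/* Step 5: vsel takes bits from b where mask is set, from a elsewhere. */
void vsel(uint32_t out[LANES], const uint32_t a[LANES],
          const uint32_t b[LANES], const uint32_t mask[LANES]) {
    for (unsigned lane = 0; lane < LANES; ++lane)
        out[lane] = (a[lane] & ~mask[lane]) | (b[lane] & mask[lane]);
}

/* Insert elt into vec[idx] without storing the whole vector to memory. */
void insert_element(uint32_t vec[LANES], uint32_t elt, unsigned idx) {
    /* Steps 1-2: the scalar sits somewhere in a loaded 16-byte slot; the
       other words are don't-care and are modelled with a marker value. */
    const unsigned src = 2;  /* arbitrary position inside the slot */
    uint32_t slot[LANES] = { 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu };
    uint32_t mask[LANES], rotated[LANES], result[LANES];

    slot[src] = elt;
    make_select_mask(mask, idx);
    rotate_into_lane(rotated, slot, src, idx);
    vsel(result, vec, rotated, mask);
    for (unsigned lane = 0; lane < LANES; ++lane)
        vec[lane] = result[lane];
}
```

Only the scalar goes through memory in this scheme; the original vector stays
in a register, so the don't-care words of the slot never reach the result.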

//===----------------------------------------------------------------------===//

Should codegen branches on vec_any/vec_all to avoid mfcr.  Two examples:

#include <altivec.h>
int f(vector float a, vector float b)
{
  int aa = 0;
  if (vec_all_ge(a, b))
    aa |= 0x1;
  if (vec_any_ge(a,b))
    aa |= 0x2;
  return aa;
}

vector float f(vector float a, vector float b) {
  if (vec_any_eq(a, b))
    return a;
  else
    return b;
}
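
A scalar sketch of why no mfcr should be needed, assuming the usual AltiVec
behaviour in which a record-form compare sets one CR6 bit when every lane
compared true and another when no lane did.  The struct and the model_*
functions are hypothetical names for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* The two CR6 bits written by a record-form vector compare. */
struct cr6 {
    bool all_true;   /* every lane compared true */
    bool all_false;  /* no lane compared true    */
};

/* Model of a record-form equality compare over four float lanes. */
struct cr6 model_vcmpeqfp_dot(const float a[4], const float b[4]) {
    struct cr6 cr = { true, true };
    for (int lane = 0; lane < 4; ++lane) {
        if (a[lane] == b[lane])
            cr.all_false = false;
        else
            cr.all_true = false;
    }
    return cr;
}

/* vec_all_eq is exactly the "all true" bit. */
bool model_vec_all_eq(const float a[4], const float b[4]) {
    return model_vcmpeqfp_dot(a, b).all_true;
}

/* vec_any_eq is the negation of the "all false" bit. */
bool model_vec_any_eq(const float a[4], const float b[4]) {
    return !model_vcmpeqfp_dot(a, b).all_false;
}
```

Each predicate reduces to a single CR6 bit or its negation, which a
conditional branch on cr6 can test directly.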

//===----------------------------------------------------------------------===//

We should do a little better with eliminating dead stores.
The stores to the stack are dead since %a and %b are not needed:

; Function Attrs: nounwind
define <16 x i8> @test_vpmsumb() #0 {
  entry:
  %a = alloca <16 x i8>, align 16
  %b = alloca <16 x i8>, align 16
  store <16 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16>, <16 x i8>* %a, align 16
  store <16 x i8> <i8 113, i8 114, i8 115, i8 116, i8 117, i8 118, i8 119, i8 120, i8 121, i8 122, i8 123, i8 124, i8 125, i8 126, i8 127, i8 112>, <16 x i8>* %b, align 16
  %0 = load <16 x i8>* %a, align 16
  %1 = load <16 x i8>* %b, align 16
  %2 = call <16 x i8> @llvm.ppc.altivec.crypto.vpmsumb(<16 x i8> %0, <16 x i8> %1)
  ret <16 x i8> %2
}


; Function Attrs: nounwind readnone
declare <16 x i8> @llvm.ppc.altivec.crypto.vpmsumb(<16 x i8>, <16 x i8>) #1


This produces the following code with -mtriple=powerpc64-unknown-linux-gnu:
# BB#0:                                 # %entry
    addis 3, 2, .LCPI0_0@toc@ha
    addis 4, 2, .LCPI0_1@toc@ha
    addi 3, 3, .LCPI0_0@toc@l
    addi 4, 4, .LCPI0_1@toc@l
    lxvw4x 0, 0, 3
    addi 3, 1, -16
    lxvw4x 35, 0, 4
    stxvw4x 0, 0, 3
    ori 2, 2, 0
    lxvw4x 34, 0, 3
    addi 3, 1, -32
    stxvw4x 35, 0, 3
    vpmsumb 2, 2, 3
    blr
    .long   0
    .quad   0

The two stxvw4x instructions are not needed.
With -mtriple=powerpc64le-unknown-linux-gnu, the associated permutes
are present too.

//===----------------------------------------------------------------------===//

The following example is found in test/CodeGen/PowerPC/vec_add_sub_doubleword.ll:

define <2 x i64> @increment_by_val(<2 x i64> %x, i64 %val) nounwind {
       %tmpvec = insertelement <2 x i64> <i64 0, i64 0>, i64 %val, i32 0
       %tmpvec2 = insertelement <2 x i64> %tmpvec, i64 %val, i32 1
       %result = add <2 x i64> %x, %tmpvec2
       ret <2 x i64> %result
}

This will generate the following instruction sequence:
        std 5, -8(1)
        std 5, -16(1)
        addi 3, 1, -16
        ori 2, 2, 0
        lxvd2x 35, 0, 3
        vaddudm 2, 2, 3
        blr

This will almost certainly cause a load-hit-store hazard.
Since val is a value parameter, it should not need to be saved onto
the stack, unless that is being done to set up the vector register.
Instead, it would be better to splat the value into a vector register, and
then remove the (dead) stores to the stack.

//===----------------------------------------------------------------------===//

At the moment we always generate a lxsdx in preference to lfd, or stxsdx in
preference to stfd.  When we have a reg-immediate addressing mode, this is a
poor choice, since we have to load the address into an index register.  This
should be fixed for P7/P8.

//===----------------------------------------------------------------------===//

Right now, ShuffleKind 0 is supported only on BE, and ShuffleKind 2 only on LE.
However, we could actually support both kinds on either endianness, if we check
for the appropriate shufflevector pattern for each case ...  this would cause
some additional shufflevectors to be recognized and implemented via the
"swapped" form.

//===----------------------------------------------------------------------===//

There is a utility program called PerfectShuffle that generates a table of the
shortest instruction sequence for implementing a shufflevector operation on
PowerPC.  However, this was designed for big-endian code generation.  We could
modify this program to create a little endian version of the table.  The table
is used in PPCISelLowering.cpp, PPCTargetLowering::LowerVECTOR_SHUFFLE().

//===----------------------------------------------------------------------===//

Opportunities to use instructions from PPCInstrVSX.td during code gen:
  - Conversion instructions (Sections 7.6.1.5 and 7.6.1.6 of ISA 2.07)
  - Scalar comparisons (xscmpodp and xscmpudp)
  - Min and max (xsmaxdp, xsmindp, xvmaxdp, xvmindp, xvmaxsp, xvminsp)

Related to this: we currently do not generate the lxvw4x instruction for either
v4f32 or v4i32, probably because adding a dag pattern to the recognizer requires
a single target type.  This should probably be addressed in the
PPCISelDAGToDAG logic.

//===----------------------------------------------------------------------===//

Currently EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT are type-legal only
for v2f64 with VSX available.  We should create custom lowering
support for the other vector types.  Without this support, we generate
sequences with load-hit-store hazards.

v4f32 can be supported with VSX by shifting the correct element into
big-endian lane 0, using xscvspdpn to produce a double-precision
representation of the single-precision value in big-endian
double-precision lane 0, and reinterpreting lane 0 as an FPR or
vector-scalar register.

v2i64 can be supported with VSX and P8Vector in the same manner as
v2f64, followed by a direct move to a GPR.

v4i32 can be supported with VSX and P8Vector by shifting the correct
element into big-endian lane 1, using a direct move to a GPR, and
sign-extending the 32-bit result to 64 bits.

v8i16 can be supported with VSX and P8Vector by shifting the correct
element into big-endian lane 3, using a direct move to a GPR, and
sign-extending the 16-bit result to 64 bits.

v16i8 can be supported with VSX and P8Vector by shifting the correct
element into big-endian lane 7, using a direct move to a GPR, and
sign-extending the 8-bit result to 64 bits.
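
The lane numbers 1, 3 and 7 all name the element that ends in the low-order
bytes of big-endian doubleword 0.  A byte-level sketch, assuming the direct
move copies doubleword 0 into the GPR; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Doubleword 0 of a 16-byte vector held in big-endian byte order; this is
   the value the direct move is assumed to copy into a GPR. */
uint64_t doubleword0(const uint8_t vec[16]) {
    uint64_t dw = 0;
    for (int i = 0; i < 8; ++i)
        dw = (dw << 8) | vec[i];
    return dw;
}

/* Write an element of the given byte width into a big-endian lane. */
void put_element(uint8_t vec[16], unsigned lane, unsigned bytes,
                 uint64_t value) {
    for (unsigned i = 0; i < bytes; ++i)
        vec[lane * bytes + i] = (uint8_t)(value >> (8 * (bytes - 1 - i)));
}

/* Sign-extend the low `bits` bits of a GPR value to 64 bits (bits < 64).
   The upper bits of the GPR are whatever else was in doubleword 0. */
int64_t sign_extend(uint64_t gpr, unsigned bits) {
    uint64_t sign = (uint64_t)1 << (bits - 1);
    uint64_t low = gpr & ((sign << 1) - 1);
    return (int64_t)(low ^ sign) - (int64_t)sign;
}
```

For example, a v4i32 element placed in lane 1 occupies bytes 4..7, so
sign_extend(doubleword0(v), 32) recovers it regardless of what lane 0 holds.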