Missing support
===============

* The PNaCl LLVM backend expands shufflevector operations into sequences of
  insertelement and extractelement operations. For instance:

    define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
    entry:
      %res = shufflevector <4 x i32> %arg1,
                           <4 x i32> %arg2,
                           <4 x i32> <i32 4, i32 5, i32 0, i32 1>
      ret <4 x i32> %res
    }

  gets expanded into:

    define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
    entry:
      %0 = extractelement <4 x i32> %arg2, i32 0
      %1 = insertelement <4 x i32> undef, i32 %0, i32 0
      %2 = extractelement <4 x i32> %arg2, i32 1
      %3 = insertelement <4 x i32> %1, i32 %2, i32 1
      %4 = extractelement <4 x i32> %arg1, i32 0
      %5 = insertelement <4 x i32> %3, i32 %4, i32 2
      %6 = extractelement <4 x i32> %arg1, i32 1
      %7 = insertelement <4 x i32> %5, i32 %6, i32 3
      ret <4 x i32> %7
    }

  Subzero should recognize these sequences and recombine them into
  shuffle operations where appropriate.

* Add support for vector constants in the backend. The current code
  materializes the vector constants it needs (e.g., for performing icmp on
  unsigned operands) using register operations. When the register
  initialization is too complicated (such as in
  TargetX8632::makeVectorOfHighOrderBits()), the constant should instead be
  loaded from a constant pool.

* [x86 specific] llvm-mc does not allow lea to take a mem128 memory operand
  when assembling x86-32 code. The current InstX8632Lea::emit() code uses
  Variable::asType() to convert any mem128 Variables into a compatible memory
  operand type. However, the emit code does not convert OperandX8632Mem
  operands, so if an OperandX8632Mem is passed to lea as mem128, the
  resulting code will not assemble. One way to fix this is to implement
  OperandX8632Mem::asType().

* [x86 specific] Lower shl with <4 x i32> using some clever float conversion:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html

* [x86 specific] Add support for using aligned mov operations (movaps). This
  will require passing alignment information to loads and stores.

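The shufflevector recombination described in the first item can be sketched
roughly as follows. This is only an illustration under stated assumptions: the
``Op`` enum, the ``Inst`` struct, and ``recoverShuffleMask()`` are hypothetical
stand-ins invented for this sketch, not Subzero's actual IR classes or API. The
sketch walks an insertelement chain backwards from its final result and, if
every inserted scalar was extracted from one of two source vectors and the
chain starts at undef, reports the equivalent shufflevector mask.

```cpp
#include <array>
#include <vector>

// Hypothetical mini-IR for illustration only (not Subzero's real classes).
enum class Op { Arg, Undef, Extract, Insert };

struct Inst {
  Op Kind;
  int Src;  // Extract: vector being read. Insert: vector being updated.
  int Elem; // Insert: index of the Extract that produced the scalar.
  int Lane; // Extract: lane read. Insert: lane written.
};

using Mask = std::array<int, 4>;

// Returns true and fills M when the chain ending at Result is equivalent to a
// shuffle of Arg1 and Arg2. Mask values 0..3 select from Arg1 and 4..7 select
// from Arg2, following the LLVM shufflevector convention.
bool recoverShuffleMask(const std::vector<Inst> &Prog, int Result, int Arg1,
                        int Arg2, Mask &M) {
  const int Size = static_cast<int>(Prog.size());
  M = {{-1, -1, -1, -1}};
  int Cur = Result;
  while (Cur >= 0 && Cur < Size && Prog[Cur].Kind == Op::Insert) {
    const Inst &Ins = Prog[Cur];
    if (Ins.Lane < 0 || Ins.Lane >= 4 || Ins.Elem < 0 || Ins.Elem >= Size)
      return false;
    const Inst &Ext = Prog[Ins.Elem];
    if (Ext.Kind != Op::Extract || Ext.Lane < 0 || Ext.Lane >= 4)
      return false;
    int Base;
    if (Ext.Src == Arg1)
      Base = 0;
    else if (Ext.Src == Arg2)
      Base = 4;
    else
      return false; // Scalar came from somewhere else: not a shuffle.
    // Walking backwards, the first write seen for a lane is the final one.
    if (M[Ins.Lane] == -1)
      M[Ins.Lane] = Base + Ext.Lane;
    Cur = Ins.Src;
  }
  // The chain must start from undef and define every lane.
  if (Cur < 0 || Cur >= Size || Prog[Cur].Kind != Op::Undef)
    return false;
  for (int L : M)
    if (L == -1)
      return false;
  return true;
}

// The expanded sequence from the example above, encoded in the mini-IR.
std::vector<Inst> expandedExample() {
  return {
      {Op::Arg, -1, -1, -1},    // 0: %arg1
      {Op::Arg, -1, -1, -1},    // 1: %arg2
      {Op::Undef, -1, -1, -1},  // 2: undef
      {Op::Extract, 1, -1, 0},  // 3: %0 = extractelement %arg2, 0
      {Op::Insert, 2, 3, 0},    // 4: %1 = insertelement undef, %0, 0
      {Op::Extract, 1, -1, 1},  // 5: %2 = extractelement %arg2, 1
      {Op::Insert, 4, 5, 1},    // 6: %3 = insertelement %1, %2, 1
      {Op::Extract, 0, -1, 0},  // 7: %4 = extractelement %arg1, 0
      {Op::Insert, 6, 7, 2},    // 8: %5 = insertelement %3, %4, 2
      {Op::Extract, 0, -1, 1},  // 9: %6 = extractelement %arg1, 1
      {Op::Insert, 8, 9, 3},    // 10: %7 = insertelement %5, %6, 3
  };
}
```

Calling ``recoverShuffleMask(expandedExample(), 10, 0, 1, M)`` should recover
the mask ``<4, 5, 0, 1>`` of the original shufflevector. A real pass would also
have to check that the intermediate values have no other uses before replacing
the chain.
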
x86 SIMD Diversification
========================

* Vector "bitwise" operations have several variant instructions: the AND
  operation can be implemented with pand, andpd, or andps. This pattern also
  holds for ANDN, OR, and XOR.

* Vector "mov" instructions can be diversified (e.g., movdqu instead of movups)
  at the cost of a possible performance penalty.

* Scalar FP arithmetic can be diversified by performing the operations with the
  vector version of the instructions.
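As a rough illustration of why the bitwise variants are interchangeable, the
sketch below computes the same 128-bit AND through the integer,
single-precision, and double-precision SSE2 intrinsics. It assumes an x86
target with SSE2; the ``Bits128`` alias and the helper names are invented for
this example and are not part of Subzero. Note that a compiler is free to emit
any of the equivalent instructions for each intrinsic, so this demonstrates
the equivalence of the results, not the exact opcode chosen.

```cpp
#include <emmintrin.h> // SSE2 intrinsics (also provides _mm_and_ps via SSE)

#include <array>
#include <cstdint>

// Raw bits of one 128-bit vector, viewed as four 32-bit lanes.
using Bits128 = std::array<std::uint32_t, 4>;

static __m128i loadBits(const Bits128 &V) {
  return _mm_loadu_si128(reinterpret_cast<const __m128i *>(V.data()));
}

static Bits128 storeBits(__m128i V) {
  Bits128 Out;
  _mm_storeu_si128(reinterpret_cast<__m128i *>(Out.data()), V);
  return Out;
}

// Integer-domain AND (the intrinsic that corresponds to pand).
Bits128 andAsPand(const Bits128 &A, const Bits128 &B) {
  return storeBits(_mm_and_si128(loadBits(A), loadBits(B)));
}

// Single-precision-domain AND (the intrinsic that corresponds to andps).
Bits128 andAsAndps(const Bits128 &A, const Bits128 &B) {
  __m128 R = _mm_and_ps(_mm_castsi128_ps(loadBits(A)),
                        _mm_castsi128_ps(loadBits(B)));
  return storeBits(_mm_castps_si128(R));
}

// Double-precision-domain AND (the intrinsic that corresponds to andpd).
Bits128 andAsAndpd(const Bits128 &A, const Bits128 &B) {
  __m128d R = _mm_and_pd(_mm_castsi128_pd(loadBits(A)),
                         _mm_castsi128_pd(loadBits(B)));
  return storeBits(_mm_castpd_si128(R));
}
```

All three helpers should return identical bits for any inputs, which is what
lets a code generator pick among the variants freely. The same reasoning
applies to the ANDN, OR, and XOR families.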