xref: /aosp_15_r20/external/llvm/lib/Target/ARM/README.txt (revision 9880d6810fe72a1726cb53787c6711e909410d58)
//===---------------------------------------------------------------------===//
// Random ideas for the ARM backend.
//===---------------------------------------------------------------------===//

Reimplement 'select' in terms of 'SEL'.

* We would really like to support UXTAB16, but we need to prove that the
  add cannot overflow between the two 16-bit chunks.

* Implement pre/post increment support.  (e.g. PR935)
* Implement smarter constant generation for binops with large immediates.
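
To make the UXTAB16 concern concrete, here is a minimal C sketch (not backend
code). The function names are hypothetical. It models UXTAB16 as two
independent 16-bit lane adds and compares that with the plain 32-bit add a
pattern matcher would see. The two agree only when the low lane does not carry
into the high lane.

```c
#include <stdint.h>

/* Reference model of UXTAB16: each 16-bit lane of rn is added to the
   zero-extended byte 0 / byte 2 of rm, with no carry between the lanes. */
uint32_t uxtab16_ref(uint32_t rn, uint32_t rm) {
    uint32_t lo = ((rn & 0xffffu) + (rm & 0xffu)) & 0xffffu;
    uint32_t hi = ((rn >> 16) + ((rm >> 16) & 0xffu)) & 0xffffu;
    return (hi << 16) | lo;
}

/* The ordinary 32-bit expression: rn + (rm & 0x00ff00ff).
   Here a carry out of the low lane leaks into the high lane. */
uint32_t plain_add(uint32_t rn, uint32_t rm) {
    return rn + (rm & 0x00ff00ffu);
}

/* Returns 1 when the low lane cannot carry for these operands, which is
   the property that would have to be proven before using UXTAB16. */
int low_lane_cannot_carry(uint32_t rn, uint32_t rm) {
    return ((rn & 0xffffu) + (rm & 0xffu)) <= 0xffffu;
}
```

For rn = 0x0000ffff and rm = 0x00000001 the two functions disagree, which is
exactly the case the proof has to rule out.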

A few ARMv6T2 ops should be pattern matched: BFI, SBFX, and UBFX.
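
As a rough illustration, these are the C-level shift/mask idioms that such
patterns would need to recognize. The helper names are hypothetical, and the
sketch assumes 1 <= width and lsb + width <= 32. It models the semantics only
and says nothing about how the matcher should be written.

```c
#include <stdint.h>

/* All-ones mask of the given width (1..32). */
static uint32_t low_mask(unsigned width) {
    return (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
}

/* UBFX-like: extract an unsigned bitfield. */
uint32_t ubfx(uint32_t x, unsigned lsb, unsigned width) {
    return (x >> lsb) & low_mask(width);
}

/* SBFX-like: extract a bitfield and sign-extend it. */
int32_t sbfx(uint32_t x, unsigned lsb, unsigned width) {
    uint32_t field = ubfx(x, lsb, width);
    uint32_t sign = 1u << (width - 1);
    /* Flip the sign bit, then subtract it; done in 64 bits to stay defined. */
    return (int32_t)((int64_t)(field ^ sign) - (int64_t)sign);
}

/* BFI-like: insert the low 'width' bits of src into dst at bit 'lsb'. */
uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb, unsigned width) {
    uint32_t mask = low_mask(width) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}
```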

Interesting optimization for PIC codegen on arm-linux:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129

//===---------------------------------------------------------------------===//

Crazy idea:  Consider code that uses lots of 8-bit or 16-bit values.  By the
time regalloc happens, these values are now in a 32-bit register, usually with
the top-bits known to be sign or zero extended.  If spilled, we should be able
to spill these to an 8-bit or 16-bit stack slot, zero or sign extending as part
of the reload.

Doing this reduces the size of the stack frame (important for thumb etc), and
also increases the likelihood that we will be able to reload multiple values
from the stack with a single load.

//===---------------------------------------------------------------------===//

The constant island pass is in good shape.  Some cleanups might be desirable,
but there is unlikely to be much improvement in the generated code.

1.  There may be some advantage to trying to be smarter about the initial
placement, rather than putting everything at the end.

2.  There might be some compile-time efficiency to be had by representing
consecutive islands as a single block rather than multiple blocks.

3.  Use a priority queue to sort constant pool users in inverse order of
    position so we always process the one closest to the end of the function
    first. This may simplify CreateNewWater.

//===---------------------------------------------------------------------===//

Eliminate copysign custom expansion. We are still generating crappy code with
default expansion + if-conversion.

//===---------------------------------------------------------------------===//

Eliminate one instruction from:

define i32 @_Z6slow4bii(i32 %x, i32 %y) {
        %tmp = icmp sgt i32 %x, %y
        %retval = select i1 %tmp, i32 %x, i32 %y
        ret i32 %retval
}

__Z6slow4bii:
        cmp r0, r1
        movgt r1, r0
        mov r0, r1
        bx lr
=>

__Z6slow4bii:
        cmp r0, r1
        movle r0, r1
        bx lr

//===---------------------------------------------------------------------===//

Implement long long "X-3" with instructions that fold the immediate in.  These
were disabled due to badness with the ARM carry flag on subtracts.
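
For reference, a small C model of the 64-bit "X-3" split into 32-bit halves,
in the style of a SUBS/SBC pair. The function name is hypothetical. The point
it illustrates is that on ARM the carry flag after a subtract means "no
borrow", the opposite of many other targets, which is easy to get wrong when
folding immediates.

```c
#include <stdint.h>

/* Compute x - 3 using only 32-bit operations on the two halves. */
uint64_t sub3_by_halves(uint64_t x) {
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);

    /* SUBS-like step: low half minus the immediate. */
    uint32_t res_lo = lo - 3u;
    /* ARM convention: C = 1 when the subtract did NOT borrow. */
    uint32_t carry = (lo >= 3u) ? 1u : 0u;

    /* SBC-like step: hi - 0 - (1 - C). */
    uint32_t res_hi = hi - 0u - (1u - carry);

    return ((uint64_t)res_hi << 32) | res_lo;
}
```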

//===---------------------------------------------------------------------===//

More load / store optimizations:
1) Better representation for block transfer? This is from Olden/power:

	fldd d0, [r4]
	fstd d0, [r4, #+32]
	fldd d0, [r4, #+8]
	fstd d0, [r4, #+40]
	fldd d0, [r4, #+16]
	fstd d0, [r4, #+48]
	fldd d0, [r4, #+24]
	fstd d0, [r4, #+56]

If we can spare the registers, it would be better to use fldm and fstm here.
Need major register allocator enhancement though.

2) Can we recognize the relative position of constantpool entries? i.e. Treat

	ldr r0, LCPI17_3
	ldr r1, LCPI17_4
	ldr r2, LCPI17_5

   as
	ldr r0, LCPI17
	ldr r1, LCPI17+4
	ldr r2, LCPI17+8

   Then the ldr's can be combined into a single ldm. See Olden/power.

Note that for ARM v4, gcc uses ldmia to load the pair of 32-bit values that
represent a 64-bit double FP constant:

	adr	r0, L6
	ldmia	r0, {r0-r1}

	.align 2
L6:
	.long	-858993459
	.long	1074318540

3) struct copies appear to be done field by field
instead of by words, at least sometimes:

struct foo { int x; short s; char c1; char c2; };
void cpy(struct foo*a, struct foo*b) { *a = *b; }

llvm code (-O2)
        ldrb r3, [r1, #+6]
        ldr r2, [r1]
        ldrb r12, [r1, #+7]
        ldrh r1, [r1, #+4]
        str r2, [r0]
        strh r1, [r0, #+4]
        strb r3, [r0, #+6]
        strb r12, [r0, #+7]
gcc code (-O2)
        ldmia   r1, {r1-r2}
        stmia   r0, {r1-r2}

In this benchmark poor handling of aggregate copies has shown up as
having a large effect on size, and possibly speed as well (we don't have
a good way to measure on ARM).
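
The word-wise copy in item 3 can be sketched in C as follows. This is only an
illustration of what the ldmia / stmia pair does. The name cpy_words is
hypothetical, and the sketch assumes a 4-byte int and 2-byte short, so that
struct foo is exactly two 32-bit words with no padding.

```c
#include <stdint.h>
#include <string.h>

struct foo { int x; short s; char c1; char c2; };

/* Assumption: the struct is exactly two 32-bit words on this target. */
_Static_assert(sizeof(struct foo) == 8, "struct foo must be two words");

/* Copy the whole struct as two words instead of field by field. */
void cpy_words(struct foo *a, const struct foo *b) {
    uint32_t w[2];
    memcpy(w, b, sizeof w);   /* like: ldmia r1, {r1-r2} */
    memcpy(a, w, sizeof w);   /* like: stmia r0, {r1-r2} */
}
```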

//===---------------------------------------------------------------------===//

* Consider this silly example:

double bar(double x) {
  double r = foo(3.1);
  return x+r;
}

_bar:
        stmfd sp!, {r4, r5, r7, lr}
        add r7, sp, #8
        mov r4, r0
        mov r5, r1
        fldd d0, LCPI1_0
        fmrrd r0, r1, d0
        bl _foo
        fmdrr d0, r4, r5
        fmsr s2, r0
        fsitod d1, s2
        faddd d0, d1, d0
        fmrrd r0, r1, d0
        ldmfd sp!, {r4, r5, r7, pc}

Ignore the prologue and epilogue stuff for a second. Note
	mov r4, r0
	mov r5, r1
the copies to callee-save registers and the fact they are only being used by the
fmdrr instruction. It would have been better had the fmdrr been scheduled
before the call and its result placed in a callee-save DPR register. The two
mov ops would not have been necessary.

//===---------------------------------------------------------------------===//

Calling convention related stuff:

* gcc's parameter passing implementation is terrible and we suffer as a result:

e.g.
struct s {
  double d1;
  int s1;
};

void foo(struct s S) {
  printf("%g, %d\n", S.d1, S.s1);
}

'S' is passed via registers r0, r1, r2. But gcc stores them to the stack, and
then reloads them to r1, r2, and r3 before issuing the call (r0 contains the
address of the format string):

	stmfd	sp!, {r7, lr}
	add	r7, sp, #0
	sub	sp, sp, #12
	stmia	sp, {r0, r1, r2}
	ldmia	sp, {r1-r2}
	ldr	r0, L5
	ldr	r3, [sp, #8]
L2:
	add	r0, pc, r0
	bl	L_printf$stub

Instead of a stmia, ldmia, and a ldr, wouldn't it be better to do three moves?

* Returning an aggregate type is even worse:

e.g.
struct s foo(void) {
  struct s S = {1.1, 2};
  return S;
}

	mov	ip, r0
	ldr	r0, L5
	sub	sp, sp, #12
L2:
	add	r0, pc, r0
	@ lr needed for prologue
	ldmia	r0, {r0, r1, r2}
	stmia	sp, {r0, r1, r2}
	stmia	ip, {r0, r1, r2}
	mov	r0, ip
	add	sp, sp, #12
	bx	lr

r0 (and later ip) is the hidden parameter from caller to store the value in. The
first ldmia loads the constants into r0, r1, r2. The last stmia stores r0, r1,
r2 into the address passed in. However, there is one additional stmia that
stores r0, r1, and r2 to some stack location. The store is dead.

The llvm-gcc generated code looks like this:

csretcc void %foo(%struct.s* %agg.result) {
entry:
	%S = alloca %struct.s, align 4		; <%struct.s*> [#uses=1]
	%memtmp = alloca %struct.s		; <%struct.s*> [#uses=1]
	cast %struct.s* %S to sbyte*		; <sbyte*>:0 [#uses=2]
	call void %llvm.memcpy.i32( sbyte* %0, sbyte* cast ({ double, int }* %C.0.904 to sbyte*), uint 12, uint 4 )
	cast %struct.s* %agg.result to sbyte*		; <sbyte*>:1 [#uses=2]
	call void %llvm.memcpy.i32( sbyte* %1, sbyte* %0, uint 12, uint 0 )
	cast %struct.s* %memtmp to sbyte*		; <sbyte*>:2 [#uses=1]
	call void %llvm.memcpy.i32( sbyte* %2, sbyte* %1, uint 12, uint 0 )
	ret void
}

llc ends up issuing two memcpy's (the first memcpy becomes 3 loads from
constantpool). Perhaps we should 1) fix llvm-gcc so the memcpy is translated
into a number of load and stores, or 2) custom lower memcpy (of small size) to
be ldmia / stmia. I think option 2 is better but the current register
allocator cannot allocate a chunk of registers at a time.

A feasible temporary solution is to use specific physical registers at the
lowering time for small (<= 4 words?) transfer size.

* ARM CSRet calling convention requires the hidden argument to be returned by
the callee.

//===---------------------------------------------------------------------===//

We can definitely do a better job on BB placements to eliminate some branches.
It's very common to see llvm generated assembly code that looks like this:

LBB3:
 ...
LBB4:
...
  beq LBB3
  b LBB2

If BB4 is the only predecessor of BB3, then we can emit BB3 after BB4. We can
then eliminate beq and turn the unconditional branch to LBB2 into a bne.

See McCat/18-imp/ComputeBoundingBoxes for an example.

//===---------------------------------------------------------------------===//

Pre-/post- indexed load / stores:

1) We should not make the pre/post- indexed load/store transform if the base ptr
is guaranteed to be live beyond the load/store. This can happen if the base
ptr is live out of the block in which we are performing the optimization. e.g.

mov r1, r2
ldr r3, [r1], #4
...

vs.

ldr r3, [r2]
add r1, r2, #4
...

In most cases, this is just a wasted optimization. However, sometimes it can
negatively impact the performance because two-address code is more restrictive
when it comes to scheduling.

Unfortunately, liveout information is currently unavailable during DAG combine
time.

2) Consider splitting an indexed load / store into a pair of add/sub + load/store
   to solve #1 (in TwoAddressInstructionPass.cpp).

3) Enhance LSR to generate more opportunities for indexed ops.

4) Once we add support for multiple result patterns, write indexed load
   patterns instead of C++ instruction selection code.

5) Use VLDM / VSTM to emulate indexed FP load / store.

//===---------------------------------------------------------------------===//

Implement support for some more tricky ways to materialize immediates.  For
example, to get 0xffff8000, we can use:

mov r9, #&3f8000
sub r9, r9, #&400000
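
The arithmetic behind this trick can be checked with a short C sketch. It
assumes the classic ARM-mode data-processing immediate (an 8-bit value rotated
right by an even amount); the helper names are hypothetical. Both 0x3f8000 and
0x400000 are encodable, while neither 0xffff8000 nor its complement is, so a
single mov or mvn cannot produce the value.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (r taken modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned r) {
    r &= 31u;
    return r ? (v << r) | (v >> (32u - r)) : v;
}

/* Returns 1 if v is an 8-bit value rotated right by an even amount,
   i.e. some even left-rotation brings it back into the range 0..255. */
int is_arm_modified_imm(uint32_t v) {
    for (unsigned r = 0; r < 32; r += 2) {
        if (rotl32(v, r) <= 0xffu)
            return 1;
    }
    return 0;
}

/* The two-instruction sequence from the text: mov a, then sub b. */
uint32_t mov_then_sub(uint32_t a, uint32_t b) {
    return a - b;
}
```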

//===---------------------------------------------------------------------===//

We sometimes generate multiple add / sub instructions to update sp in prologue
and epilogue if the inc / dec value is too large to fit in a single immediate
operand. In some cases, perhaps it might be better to load the value from a
constantpool instead.
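
As a rough way to see when the constantpool might win, the following C sketch
counts how many add / sub immediates a simple greedy split would need for a
given sp adjustment. It assumes ARM-mode immediates (8 bits at an even bit
position, ignoring wrap-around encodings), and the function name is
hypothetical. The backend's actual splitting logic may differ.

```c
#include <stdint.h>

/* Greedy estimate of how many ARM-mode add/sub immediates are needed to
   apply 'amount': repeatedly peel off the 8-bit chunk that starts at the
   lowest set bit, rounded down to an even bit position. */
unsigned count_sp_adjust_chunks(uint32_t amount) {
    unsigned n = 0;
    while (amount != 0) {
        unsigned lsb = 0;
        while (((amount >> lsb) & 1u) == 0)
            lsb++;
        lsb &= ~1u;                    /* rotations come in steps of two */
        amount &= ~(0xffu << lsb);     /* this chunk is handled by one op */
        n++;
    }
    return n;
}
```

Under these assumptions a frame adjustment of 0x12345 bytes takes three
instructions, whereas a constantpool load plus one add takes two instructions
and one pool word.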

//===---------------------------------------------------------------------===//

GCC generates significantly better code for this function.

int foo(int StackPtr, unsigned char *Line, unsigned char *Stack, int LineLen) {
    int i = 0;

    if (StackPtr != 0) {
       while (StackPtr != 0 && i < (((LineLen) < (32768))? (LineLen) : (32768)))
          Line[i++] = Stack[--StackPtr];
        if (LineLen > 32768)
        {
            while (StackPtr != 0 && i < LineLen)
            {
                i++;
                --StackPtr;
            }
        }
    }
    return StackPtr;
}

//===---------------------------------------------------------------------===//

This should compile to the mlas instruction:
int mlas(int x, int y, int z) { return ((x * y + z) < 0) ? 7 : 13; }

//===---------------------------------------------------------------------===//

At some point, we should triage these to see if they still apply to us:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19598
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18560
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11831
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11826
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11825
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11824
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11823
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11820
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10982

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10242
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9831
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9760
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9759
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9703
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9702
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9663

http://www.inf.u-szeged.hu/gcc-arm/
http://citeseer.ist.psu.edu/debus04linktime.html

//===---------------------------------------------------------------------===//

gcc generates smaller code for this function at -O2 or -Os:

void foo(signed char* p) {
  if (*p == 3)
     bar();
   else if (*p == 4)
    baz();
  else if (*p == 5)
    quux();
}

llvm decides it's a good idea to turn the repeated if...else into a
binary tree, as if it were a switch; the resulting code requires -1
compare-and-branches when *p<=2 or *p==5, the same number if *p==4
or *p>6, and +1 if *p==3.  So it should be a speed win
(on balance).  However, the revised code is larger, with 4 conditional
branches instead of 3.

More seriously, there is a byte->word extend before
each comparison, where there should be only one, and the condition codes
are not remembered when the same two values are compared twice.

//===---------------------------------------------------------------------===//

More LSR enhancements possible:

1. Teach LSR about pre- and post- indexed ops to allow the iv increment to be
   merged into a load / store.
2. Allow iv reuse even when a type conversion is required. For example, i8
   and i32 load / store addressing modes are identical.


//===---------------------------------------------------------------------===//

This:

int foo(int a, int b, int c, int d) {
  long long acc = (long long)a * (long long)b;
  acc += (long long)c * (long long)d;
  return (int)(acc >> 32);
}

Should compile to use SMLAL (Signed Multiply Accumulate Long) which multiplies
two signed 32-bit values to produce a 64-bit value, and accumulates this with
a 64-bit value.

We currently get this with both v4 and v6:

_foo:
        smull r1, r0, r1, r0
        smull r3, r2, r3, r2
        adds r3, r3, r1
        adc r0, r2, r0
        bx lr
436*9880d681SAndroid Build Coastguard Worker
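A rough C model of the intended shape, one SMULL followed by one SMLAL.  The
names smlal and foo_smlal are made up for this sketch; the accumulate is done
in unsigned arithmetic so that it wraps the way the hardware does.

```c
#include <stdint.h>

/* Model of SMLAL: 64-bit accumulator plus a signed 32x32->64 product. */
static int64_t smlal(int64_t acc, int32_t a, int32_t b) {
  int64_t prod = (int64_t)a * (int64_t)b;
  return (int64_t)((uint64_t)acc + (uint64_t)prod); /* wraps like hardware */
}

/* Same computation as foo() above: one multiply, one multiply-accumulate. */
int foo_smlal(int a, int b, int c, int d) {
  int64_t acc = (int64_t)a * (int64_t)b; /* SMULL */
  acc = smlal(acc, c, d);                /* SMLAL */
  return (int)(acc >> 32);               /* high word of the accumulator */
}
```
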
//===---------------------------------------------------------------------===//

This:
        #include <utility>
        std::pair<unsigned, bool> full_add(unsigned a, unsigned b)
        { return std::make_pair(a + b, a + b < a); }
        bool no_overflow(unsigned a, unsigned b)
        { return !full_add(a, b).second; }

Should compile to:

_Z8full_addjj:
	adds	r2, r1, r2
	movcc	r1, #0
	movcs	r1, #1
	str	r2, [r0, #0]
	strb	r1, [r0, #4]
	mov	pc, lr

_Z11no_overflowjj:
	cmn	r0, r1
	movcs	r0, #0
	movcc	r0, #1
	mov	pc, lr

not:

__Z8full_addjj:
        add r3, r2, r1
        str r3, [r0]
        mov r2, #1
        mov r12, #0
        cmp r3, r1
        movlo r12, r2
        str r12, [r0, #+4]
        bx lr
__Z11no_overflowjj:
        add r3, r1, r0
        mov r2, #1
        mov r1, #0
        cmp r3, r0
        movhs r1, r2
        mov r0, r1
        bx lr

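For reference, a plain C rendering of the same idiom (struct and function
names are invented for this sketch).  The "sum < a" test is true exactly when
the unsigned add wrapped, which is the carry flag that ADDS / CMN already
produce.

```c
#include <stdbool.h>

struct sum_carry {
  unsigned sum;
  bool carry;
};

struct sum_carry full_add_c(unsigned a, unsigned b) {
  struct sum_carry r;
  r.sum = a + b;       /* unsigned add wraps modulo 2^N */
  r.carry = r.sum < a; /* true exactly when the add wrapped (carry out) */
  return r;
}

bool no_overflow_c(unsigned a, unsigned b) {
  return !full_add_c(a, b).carry;
}
```
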
//===---------------------------------------------------------------------===//

Some of the NEON intrinsics may be appropriate for more general use, either
as target-independent intrinsics or perhaps elsewhere in the ARM backend.
Some of them may also be lowered to target-independent SDNodes, and perhaps
some new SDNodes could be added.

For example, maximum, minimum, and absolute value operations are well-defined
and standard operations, both for vector and scalar types.

The current NEON-specific intrinsics for count leading zeros and count one
bits could perhaps be replaced by the target-independent ctlz and ctpop
intrinsics.  It may also make sense to add a target-independent "ctls"
intrinsic for "count leading sign bits".  Likewise, the backend could use
the target-independent SDNodes for these operations.

ARMv6 has scalar saturating and halving adds and subtracts.  The same
intrinsics could possibly be used for both NEON's vector implementations of
those operations and the ARMv6 scalar versions.

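One possible scalar definition of "count leading sign bits", sketched in C in
terms of a count-leading-zeros that is defined at zero.  The names clz32 and
ctls32 are assumptions for this note, not existing intrinsics; the result is
the number of bits below the sign bit that are equal to it.

```c
#include <stdint.h>

/* Portable count-leading-zeros, defined as 32 for an input of 0. */
static unsigned clz32(uint32_t x) {
  unsigned n = 0;
  if (x == 0)
    return 32;
  while (!(x & 0x80000000u)) {
    x <<= 1;
    n++;
  }
  return n;
}

/* Hypothetical "ctls": number of bits below the sign bit that equal it. */
unsigned ctls32(int32_t x) {
  uint32_t u = (uint32_t)x;
  if (x < 0)
    u = ~u;            /* fold negative inputs onto the non-negative case */
  return clz32(u) - 1; /* bit 31 of u is clear here, so clz32(u) >= 1 */
}
```
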
//===---------------------------------------------------------------------===//

Split out LDR (literal) from normal ARM LDR instruction. Also consider splitting
LDR into imm12 and so_reg forms. This allows us to clean up some code. e.g.
ARMLoadStoreOptimizer does not need to look at LDR (literal) and LDR (so_reg)
while ARMConstantIslandPass only needs to worry about LDR (literal).

//===---------------------------------------------------------------------===//

Constant island pass should make use of full range SoImm values for LEApcrel.
Be careful though as the last attempt caused infinite looping on lencod.

//===---------------------------------------------------------------------===//

Predication issue. This function:

extern unsigned array[ 128 ];
int     foo( int x ) {
  int     y;
  y = array[ x & 127 ];
  if ( x & 128 )
     y = 123456789 & ( y >> 2 );
  else
     y = 123456789 & y;
  return y;
}

compiles to:

_foo:
	and r1, r0, #127
	ldr r2, LCPI1_0
	ldr r2, [r2]
	ldr r1, [r2, +r1, lsl #2]
	mov r2, r1, lsr #2
	tst r0, #128
	moveq r2, r1
	ldr r0, LCPI1_1
	and r0, r2, r0
	bx lr

It would be better to do something like this, to fold the shift into the
conditional move:

	and r1, r0, #127
	ldr r2, LCPI1_0
	ldr r2, [r2]
	ldr r1, [r2, +r1, lsl #2]
	tst r0, #128
	movne r1, r1, lsr #2
	ldr r0, LCPI1_1
	and r0, r1, r0
	bx lr

This saves an instruction and a register.

//===---------------------------------------------------------------------===//

It might be profitable to cse MOVi16 if there are lots of 32-bit immediates
with the same bottom half.

//===---------------------------------------------------------------------===//

Robert Muth started working on an alternate jump table implementation that
does not put the tables in-line in the text.  This is more like the llvm
default jump table implementation.  This might be useful sometime.  Several
revisions of patches are on the mailing list, beginning at:
http://lists.llvm.org/pipermail/llvm-dev/2009-June/022763.html

//===---------------------------------------------------------------------===//

Make use of the "rbit" instruction.

//===---------------------------------------------------------------------===//

Take a look at test/CodeGen/Thumb2/machine-licm.ll. ARM should be taught how
to licm and cse the unnecessary load from cp#1.

//===---------------------------------------------------------------------===//

The CMN instruction sets the flags like an ADD instruction, while CMP sets
them like a subtract. Therefore, to be able to use CMN for comparisons that
depend on flags other than Z, we'll need additional logic to reverse the
conditionals associated with the comparison. Perhaps a pseudo-instruction for
the comparison, with a post-codegen pass to clean up and handle the condition
codes?
See PR5694 for testcase.

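A small C model of the Z and C flags may make the problem concrete.  The
struct and function names are invented for this sketch, and N and V are left
out.  "CMP a, -b" and "CMN a, b" always agree on Z, but the carry flag differs
when b is 0, so a carry-based condition cannot be carried over blindly.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags {
  bool z, c;
};

/* CMP a, b: flags of a - b.  C is set when the subtract does not borrow. */
struct flags cmp_flags(uint32_t a, uint32_t b) {
  struct flags f;
  f.z = (uint32_t)(a - b) == 0;
  f.c = a >= b;
  return f;
}

/* CMN a, b: flags of a + b.  C is the carry out of the addition. */
struct flags cmn_flags(uint32_t a, uint32_t b) {
  struct flags f;
  uint32_t sum = (uint32_t)(a + b);
  f.z = sum == 0;
  f.c = sum < a;
  return f;
}
```

With a = 5 and b = 0, cmp_flags(5, 0) has C set while cmn_flags(5, 0) has C
clear, even though -0 == 0.
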
//===---------------------------------------------------------------------===//

Given the following on armv5:
int test1(int A, int B) {
  return (A&-8388481)|(B&8388480);
}

We currently generate:
	ldr	r2, .LCPI0_0
	and	r0, r0, r2
	ldr	r2, .LCPI0_1
	and	r1, r1, r2
	orr	r0, r1, r0
	bx	lr

We should be able to replace the second ldr+and with a bic (i.e. reuse the
constant which was already loaded).  Not sure what's necessary to do that.

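The reason a bic works is that the two masks are bitwise complements:
8388480 is 0x007FFF80 and -8388481 is 0xFF80007F.  A C sketch of the
single-constant form (the macro and function names are made up here):

```c
/* The two masks in test1 are bitwise complements of each other. */
#define MASK_A (-8388481) /* 0xFF80007F */
#define MASK_B 8388480    /* 0x007FFF80 == ~MASK_A */

int test1_bic(int A, int B) {
  int m = MASK_A;            /* the one constant that gets loaded */
  return (A & m) | (B & ~m); /* B & ~m is what "bic r1, r1, r2" computes */
}
```
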
//===---------------------------------------------------------------------===//

The code generated for bswap on armv4/5 (CPUs without rev) is less than ideal:

int a(int x) { return __builtin_bswap32(x); }

a:
	mov	r1, #255, 24
	mov	r2, #255, 16
	and	r1, r1, r0, lsr #8
	and	r2, r2, r0, lsl #8
	orr	r1, r1, r0, lsr #24
	orr	r0, r2, r0, lsl #24
	orr	r0, r0, r1
	bx	lr

Something like the following would be better (fewer instructions/registers):
	eor     r1, r0, r0, ror #16
	bic     r1, r1, #0xff0000
	mov     r1, r1, lsr #8
	eor     r0, r1, r0, ror #8
	bx	lr

A custom Thumb version would also be a slight improvement over the generic
version.

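To check that the shorter sequence really is a byte swap, here is a
line-for-line C transcription.  The helper names ror32 and bswap32_eor are
invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits, for 0 < n < 32. */
static uint32_t ror32(uint32_t x, unsigned n) {
  assert(n > 0 && n < 32);
  return (x >> n) | (x << (32 - n));
}

uint32_t bswap32_eor(uint32_t x) {
  uint32_t t = x ^ ror32(x, 16); /* eor r1, r0, r0, ror #16 */
  t &= ~0x00ff0000u;             /* bic r1, r1, #0xff0000   */
  t >>= 8;                       /* mov r1, r1, lsr #8      */
  return t ^ ror32(x, 8);        /* eor r0, r1, r0, ror #8  */
}
```
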
//===---------------------------------------------------------------------===//

Consider the following simple C code:

void foo(unsigned char *a, unsigned char *b, int *c) {
 if ((*a | *b) == 0) *c = 0;
}

Currently llvm-gcc generates something like this (nice branchless code I'd say):

       ldrb    r0, [r0]
       ldrb    r1, [r1]
       orr     r0, r1, r0
       tst     r0, #255
       moveq   r0, #0
       streq   r0, [r2]
       bx      lr

Note that both "tst" and "moveq" are redundant: ldrb zero-extends, so the orr
could set the flags itself (orrs), and r0 is already zero whenever the eq
condition holds.

//===---------------------------------------------------------------------===//

When loading immediate constants with movt/movw, if there are multiple
constants needed with the same low 16 bits, and those values are not live at
the same time, it would be possible to use a single movw instruction, followed
by multiple movt instructions to rewrite the high bits to different values.
For example:

  volatile store i32 -1, i32* inttoptr (i32 1342210076 to i32*), align 4, !tbaa !0
  volatile store i32 -1, i32* inttoptr (i32 1342341148 to i32*), align 4, !tbaa !0

is compiled and optimized to:

    movw    r0, #32796
    mov.w    r1, #-1
    movt    r0, #20480
    str    r1, [r0]
    movw    r0, #32796    @ <= this MOVW is not needed, value is there already
    movt    r0, #20482
    str    r1, [r0]

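The arithmetic behind the example, modelled in C: both addresses have 32796
(0x801C) in the low halfword and differ only in the high halfword (20480 vs.
20482).  The helper names below are made up for this sketch.

```c
#include <stdint.h>

/* MOVW: write the low halfword and zero the high halfword. */
static uint32_t movw(uint16_t imm) {
  return imm;
}

/* MOVT: replace the high halfword, keep the low halfword. */
static uint32_t movt(uint32_t reg, uint16_t imm) {
  return (reg & 0xffffu) | ((uint32_t)imm << 16);
}

/* Hypothetical helper: materialize both addresses with a single MOVW. */
void two_addresses(uint32_t out[2]) {
  uint32_t r0 = movw(32796);
  r0 = movt(r0, 20480);
  out[0] = r0;          /* 1342210076 */
  r0 = movt(r0, 20482); /* low halfword is still 32796 */
  out[1] = r0;          /* 1342341148 */
}
```
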
//===---------------------------------------------------------------------===//

Improve codegen for selects:
if (x != 0) x = 1
if (x == 1) x = 1

ARM codegen used to look like this:
       mov     r1, r0
       cmp     r1, #1
       mov     r0, #0
       moveq   r0, #1

The naive lowering selects between two different values. It should recognize
that the test is an equality test, so this is a conditional move rather than a
select:
       cmp     r0, #1
       movne   r0, #0

Currently this is an ARM-specific dag combine. We probably should make it into a
target-neutral one.

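The two assembly sequences above correspond roughly to the following C forms
(function names are invented for this sketch).  They agree for every input
because, when x == 1, x already holds the value being selected.

```c
/* Select form: materialize 0, then conditionally overwrite it with 1. */
int select_form(int x) {
  return x == 1 ? 1 : 0;
}

/* Conditional-move form: only the "not equal" case needs a write. */
int cmov_form(int x) {
  if (x != 1)
    x = 0;
  return x;
}
```
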
//===---------------------------------------------------------------------===//

Optimize unnecessary checks for zero with __builtin_clz/ctz.  Those builtins
are specified to be undefined at zero, so portable code must check for zero
and handle it as a special case.  That is unnecessary on ARM where those
operations are implemented in a way that is well-defined for zero.  For
example:

int f(int x) { return x ? __builtin_clz(x) : sizeof(int)*8; }

should just be implemented with a CLZ instruction.  Since there are other
targets, e.g., PPC, that share this behavior, it would be best to implement
this in a target-independent way: we should probably fold that (when using
"undefined at zero" semantics) to set the "defined at zero" bit and have
the code generator expand out the right code.

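A sketch of why the guard is foldable, assuming a GCC/Clang-style compiler
(for __builtin_clz) and a 32-bit int.  f_checked is the portable source form;
clz_insn_model is a made-up software model of the ARM CLZ instruction, which
returns 32 for a zero input.  The two agree on every input, including 0.

```c
#include <limits.h>
#include <stdint.h>

/* Portable source form: the builtin is undefined for 0, so guard it. */
int f_checked(int x) {
  return x ? __builtin_clz((unsigned)x) : (int)(sizeof(int) * CHAR_BIT);
}

/* Software model of the ARM CLZ instruction, which returns 32 for 0. */
int clz_insn_model(uint32_t x) {
  int n = 0;
  if (x == 0)
    return 32;
  while (!(x & 0x80000000u)) {
    x <<= 1;
    n++;
  }
  return n;
}
```
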
//===---------------------------------------------------------------------===//

Clean up the test/MC/ARM files to have more robust register choices.

R0 should not be used as a register operand in the assembler tests as it's then
not possible to distinguish between a correct encoding and a missing operand
encoding, as zero is the default value for the binary encoder.
e.g.,
    add r0, r0  // bad
    add r3, r5  // good

Register operands should be distinct. That is, when the encoding does not
require two syntactical operands to refer to the same register, two different
registers should be used in the test so as to catch errors where the
operands are swapped in the encoding.
e.g.,
    subs.w r1, r1, r1 // bad
    subs.w r1, r2, r3 // good

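To see why r0 hides bugs, consider a toy encoder for the 16-bit Thumb
"add Rdn, Rm" form.  The bit layout below (01000100 DN Rm Rdn) is assumed for
illustration, and both functions are made up for this sketch; the second one
deliberately forgets to encode Rm.  With r0 operands the two are
indistinguishable; with r3/r5 the bug shows up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for 16-bit Thumb "add Rdn, Rm": 01000100 DN Rm[3:0] Rdn[2:0] */
uint16_t encode_thumb_add(unsigned rdn, unsigned rm) {
  assert(rdn < 16 && rm < 16);
  return (uint16_t)(0x4400u | (((rdn >> 3) & 1u) << 7) | ((rm & 0xfu) << 3) |
                    (rdn & 7u));
}

/* Hypothetical buggy encoder: the Rm field is never filled in (stays 0). */
uint16_t encode_thumb_add_missing_rm(unsigned rdn, unsigned rm) {
  assert(rdn < 16 && rm < 16);
  (void)rm;
  return (uint16_t)(0x4400u | (((rdn >> 3) & 1u) << 7) | (rdn & 7u));
}
```
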