//===---------------------------------------------------------------------===//
// Random ideas for the X86 backend.
//===---------------------------------------------------------------------===//

Improvements to the multiply -> shift/add algorithm:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg01590.html

//===---------------------------------------------------------------------===//

Improve code like this (occurs fairly frequently, e.g. in LLVM):
long long foo(int x) { return 1LL << x; }

http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01109.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01128.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01136.html

Another useful one would be  ~0ULL >> X and ~0ULL << X.

One better solution for 1LL << x is:
        xorl    %eax, %eax
        xorl    %edx, %edx
        testb   $32, %cl
        sete    %al
        setne   %dl
        sall    %cl, %eax
        sall    %cl, %edx

But that requires good 8-bit subreg support.

Also, this might be better.  It's an extra shift, but it's one instruction
shorter, and doesn't stress 8-bit subreg support.
(From http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01148.html,
but without the unnecessary and.)
        movl %ecx, %eax
        shrl $5, %eax
        movl %eax, %edx
        xorl $1, %edx
        sall %cl, %eax
        sall %cl, %edx

64-bit shifts (in general) expand to really bad code.  Instead of using
cmovs, we should expand to a conditional branch like GCC produces.
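
For reference, the sete/setne idea for 1LL << x can be modeled in C roughly as
follows.  This is only a sketch: shl1_64 is a made-up name, x is assumed to be
in [0, 63], and the explicit "& 31" stands in for the way 32-bit x86 shifts
mask their count to 5 bits.

```c
#include <stdint.h>

/* Branchless 1LL << x built from two 32-bit halves (hypothetical helper).
   Assumes 0 <= x < 64. */
uint64_t shl1_64(unsigned x) {
    uint32_t lo = (x & 32) == 0;  /* sete:  the bit lands in the low word  */
    uint32_t hi = (x & 32) != 0;  /* setne: the bit lands in the high word */
    lo <<= (x & 31);              /* sall %cl: count is masked to 5 bits   */
    hi <<= (x & 31);
    return ((uint64_t)hi << 32) | lo;
}
```

Exactly one of the two halves starts out as 1, so no branch or cmov is needed
to pick which word receives the bit.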

//===---------------------------------------------------------------------===//

Some isel ideas:

1. Dynamic programming based approach when compile time is not an
   issue.
2. Code duplication (addressing mode) during isel.
3. Other ideas from "Register-Sensitive Selection, Duplication, and
   Sequencing of Instructions".
4. Scheduling for reduced register pressure.  E.g. "Minimum Register
   Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs"
   and other related papers.
   http://citeseer.ist.psu.edu/govindarajan01minimum.html

//===---------------------------------------------------------------------===//

Should we promote i16 to i32 to avoid partial register update stalls?

//===---------------------------------------------------------------------===//

Leave any_extend as a pseudo instruction and hint to the register
allocator. Delay codegen until after register allocation.
Note: any_extend is now turned into an INSERT_SUBREG. We still need to teach
the coalescer how to deal with it though.

//===---------------------------------------------------------------------===//

It appears icc uses push for parameter passing. This needs investigating.

//===---------------------------------------------------------------------===//

The instruction selector sometimes misses folding a load into a compare.  The
pattern is written as (cmp reg, (load p)).  Because the compare isn't
commutative, it is not matched with the load on both sides.  The dag combiner
should be made smart enough to canonicalize the load into the RHS of a compare
when it can invert the result of the compare for free.

//===---------------------------------------------------------------------===//

In many cases, LLVM generates code like this:

_test:
        movl 8(%esp), %eax
        cmpl %eax, 4(%esp)
        setl %al
        movzbl %al, %eax
        ret

On some processors (which ones?), it is more efficient to do this:

_test:
        movl 8(%esp), %ebx
        xor  %eax, %eax
        cmpl %ebx, 4(%esp)
        setl %al
        ret

Doing this correctly is tricky though, as the xor clobbers the flags.

//===---------------------------------------------------------------------===//

We should generate bts/btr/etc instructions on targets where they are cheap or
when codesize is important.  e.g., for:

void setbit(int *target, int bit) {
    *target |= (1 << bit);
}
void clearbit(int *target, int bit) {
    *target &= ~(1 << bit);
}

//===---------------------------------------------------------------------===//

Instead of the following for memset char*, 1, 10:

	movl $16843009, 4(%edx)
	movl $16843009, (%edx)
	movw $257, 8(%edx)

It might be better to generate

	movl $16843009, %eax
	movl %eax, 4(%edx)
	movl %eax, (%edx)
	movw %ax, 8(%edx)

when we can spare a register. It reduces code size.
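
The constants above are just the fill byte replicated: 16843009 is 0x01010101
and 257 is 0x0101, which is why the low 16 bits of the register (%ax) are
enough for the trailing two-byte store.  A small C sketch of that relationship
(splat_byte is a hypothetical helper name, not something in the backend):

```c
#include <stdint.h>

/* Replicate one byte into all four bytes of a 32-bit word. */
uint32_t splat_byte(uint8_t b) {
    return (uint32_t)b * 0x01010101u;
}
```

For a fill byte of 1 this gives 0x01010101, and truncating that to 16 bits
gives 0x0101.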

//===---------------------------------------------------------------------===//

Evaluate what the best way to codegen sdiv X, (2^C) is.  For X/8, we currently
get this:

define i32 @test1(i32 %X) {
    %Y = sdiv i32 %X, 8
    ret i32 %Y
}

_test1:
        movl 4(%esp), %eax
        movl %eax, %ecx
        sarl $31, %ecx
        shrl $29, %ecx
        addl %ecx, %eax
        sarl $3, %eax
        ret

GCC knows several different ways to codegen it, one of which is this:

_test1:
        movl    4(%esp), %eax
        cmpl    $-1, %eax
        leal    7(%eax), %ecx
        cmovle  %ecx, %eax
        sarl    $3, %eax
        ret

which is probably slower, but it's interesting at least :)
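
Both sequences add a bias of 7 to negative inputs so that the arithmetic shift
rounds toward zero instead of toward minus infinity.  In C terms, roughly (the
function names are illustrative, and >> on a negative int32_t is assumed to be
an arithmetic shift, which C leaves implementation-defined):

```c
#include <stdint.h>

/* Mirrors the sarl/shrl/addl/sarl sequence. */
int32_t sdiv8_shift(int32_t x) {
    uint32_t sign = (uint32_t)(x >> 31);  /* sarl $31: all ones if x < 0 */
    uint32_t bias = sign >> 29;           /* shrl $29: 7 if x < 0, else 0 */
    return (x + (int32_t)bias) >> 3;      /* addl, then sarl $3 */
}

/* Mirrors the cmpl/leal/cmovle/sarl sequence. */
int32_t sdiv8_cmov(int32_t x) {
    int32_t t = (x <= -1) ? x + 7 : x;    /* select the biased value if x < 0 */
    return t >> 3;
}
```

Either function should agree with x / 8 for every int32_t x under the stated
assumption.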

//===---------------------------------------------------------------------===//

We are currently lowering large (1MB+) memmove/memcpy to rep/stosl and rep/movsl.
We should leave these as libcalls for everything over a much lower threshold,
since libc is hand tuned for medium and large mem ops (avoiding RFO for large
stores, TLB preheating, etc).

//===---------------------------------------------------------------------===//

Optimize this into something reasonable:
 x * copysign(1.0, y) * copysign(1.0, z)
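
One possible reading of "something reasonable", assuming IEEE-754 doubles: the
two multiplies by +/-1.0 can only flip the sign of x, so the expression amounts
to xoring the sign bits of y and z into x.  A sketch (mul_signs is a
hypothetical name, and NaN sign/payload details are ignored):

```c
#include <stdint.h>
#include <string.h>

/* x * copysign(1.0, y) * copysign(1.0, z) done with integer sign-bit ops. */
double mul_signs(double x, double y, double z) {
    const uint64_t sign_mask = 0x8000000000000000ull;
    uint64_t xb, yb, zb;
    memcpy(&xb, &x, sizeof xb);
    memcpy(&yb, &y, sizeof yb);
    memcpy(&zb, &z, sizeof zb);
    xb ^= (yb ^ zb) & sign_mask;  /* flip x's sign iff y and z signs differ */
    memcpy(&x, &xb, sizeof x);
    return x;
}
```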

//===---------------------------------------------------------------------===//

Optimize copysign(x, *y) to use an integer load from y.
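
A sketch of what that could look like at the source level, assuming IEEE-754
doubles (copysign_mem is a hypothetical name): only the sign bit of *y is
needed, so it can be fetched with an integer load and merged with integer ops
rather than loading *y into an FP register first.

```c
#include <stdint.h>
#include <string.h>

/* copysign(x, *y) where *y is only ever read as an integer. */
double copysign_mem(double x, const double *y) {
    const uint64_t sign_mask = 0x8000000000000000ull;
    uint64_t xb, yb;
    memcpy(&yb, y, sizeof yb);                   /* integer load from y */
    memcpy(&xb, &x, sizeof xb);
    xb = (xb & ~sign_mask) | (yb & sign_mask);   /* keep |x|, take y's sign */
    memcpy(&x, &xb, sizeof x);
    return x;
}
```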

//===---------------------------------------------------------------------===//

The following tests perform worse with LSR:

lambda, siod, optimizer-eval, ackermann, hash2, nestedloop, strcat, and Treesort.

//===---------------------------------------------------------------------===//

Adding to the list of cmp / test poor codegen issues:

int test(__m128 *A, __m128 *B) {
  if (_mm_comige_ss(*A, *B))
    return 3;
  else
    return 4;
}

_test:
	movl 8(%esp), %eax
	movaps (%eax), %xmm0
	movl 4(%esp), %eax
	movaps (%eax), %xmm1
	comiss %xmm0, %xmm1
	setae %al
	movzbl %al, %ecx
	movl $3, %eax
	movl $4, %edx
	cmpl $0, %ecx
	cmove %edx, %eax
	ret

Note the setae, movzbl, cmpl, cmove can be replaced with a single cmovae. There
are a number of issues. 1) We are introducing a setcc between the result of the
intrinsic call and the select. 2) The intrinsic is expected to produce an i32
value, so an any_extend (which becomes a zero extend) is added.

We probably need some kind of target DAG combine hook to fix this.

//===---------------------------------------------------------------------===//

We generate significantly worse code for this than GCC:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21150
http://gcc.gnu.org/bugzilla/attachment.cgi?id=8701

There is also one case we do worse on PPC.

//===---------------------------------------------------------------------===//

For this:

int test(int a)
{
  return a * 3;
}

We currently emit
	imull $3, 4(%esp), %eax

Perhaps this is what we really should generate? Is imull three or four
cycles? Note: ICC generates this:
	movl	4(%esp), %eax
	leal	(%eax,%eax,2), %eax

The current instruction priority is based on pattern complexity. The former is
more "complex" because it folds a load so the latter will not be emitted.

Perhaps we should use AddedComplexity to give LEA32r a higher priority? We
should always try to match LEA first since the LEA matching code makes an
estimate to determine whether the match is profitable.

However, if we care more about code size, then imull is better. It's two bytes
shorter than movl + leal.

On a Pentium M, both variants have the same characteristics with regard
to throughput; however, the multiplication has a latency of four cycles, as
opposed to two cycles for the movl+lea variant.

//===---------------------------------------------------------------------===//

It appears gcc places string data with linkonce linkage in
.section __TEXT,__const_coal,coalesced instead of
.section __DATA,__const_coal,coalesced.
Take a look at darwin.h; there are other Darwin assembler directives that we
do not make use of.

//===---------------------------------------------------------------------===//

define i32 @foo(i32* %a, i32 %t) {
entry:
	br label %cond_true

cond_true:		; preds = %cond_true, %entry
	%x.0.0 = phi i32 [ 0, %entry ], [ %tmp9, %cond_true ]		; <i32> [#uses=3]
	%t_addr.0.0 = phi i32 [ %t, %entry ], [ %tmp7, %cond_true ]		; <i32> [#uses=1]
	%tmp2 = getelementptr i32* %a, i32 %x.0.0		; <i32*> [#uses=1]
	%tmp3 = load i32* %tmp2		; <i32> [#uses=1]
	%tmp5 = add i32 %t_addr.0.0, %x.0.0		; <i32> [#uses=1]
	%tmp7 = add i32 %tmp5, %tmp3		; <i32> [#uses=2]
	%tmp9 = add i32 %x.0.0, 1		; <i32> [#uses=2]
	%tmp = icmp sgt i32 %tmp9, 39		; <i1> [#uses=1]
	br i1 %tmp, label %bb12, label %cond_true

bb12:		; preds = %cond_true
	ret i32 %tmp7
}
is pessimized by -loop-reduce and -indvars

//===---------------------------------------------------------------------===//

u32 to float conversion improvement:

float uint32_2_float( unsigned u ) {
  float fl = (int) (u & 0xffff);
  float fh = (int) (u >> 16);
  fh *= 0x1.0p16f;
  return fh + fl;
}

00000000        subl    $0x04,%esp
00000003        movl    0x08(%esp,1),%eax
00000007        movl    %eax,%ecx
00000009        shrl    $0x10,%ecx
0000000c        cvtsi2ss        %ecx,%xmm0
00000010        andl    $0x0000ffff,%eax
00000015        cvtsi2ss        %eax,%xmm1
00000019        mulss   0x00000078,%xmm0
00000021        addss   %xmm1,%xmm0
00000025        movss   %xmm0,(%esp,1)
0000002a        flds    (%esp,1)
0000002d        addl    $0x04,%esp
00000030        ret

//===---------------------------------------------------------------------===//

When using the fastcc ABI, align the stack slots of arguments of type double
on an 8-byte boundary to improve performance.

//===---------------------------------------------------------------------===//

GCC's ix86_expand_int_movcc function (in i386.c) has a ton of interesting
simplifications for integer "x cmp y ? a : b".

//===---------------------------------------------------------------------===//

Consider the expansion of:

define i32 @test3(i32 %X) {
        %tmp1 = urem i32 %X, 255
        ret i32 %tmp1
}

Currently it compiles to:

...
        movl $2155905153, %ecx
        movl 8(%esp), %esi
        movl %esi, %eax
        mull %ecx
...

This could be "reassociated" into:

        movl $2155905153, %eax
        movl 8(%esp), %ecx
        mull %ecx

to avoid the copy.  In fact, the existing two-address stuff would do this
except that mul isn't a commutative 2-addr instruction.  I guess this has
to be done at isel time based on the #uses to mul?
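
For context, 2155905153 is 0x80808081, the usual magic constant for unsigned
division by 255.  The instructions elided by "..." are not shown here, so the
shift amount and the final multiply/subtract below are an assumption about the
standard expansion rather than a transcription of the compiler output (urem255
is an illustrative name):

```c
#include <stdint.h>

/* x % 255 via multiply-by-reciprocal; commutativity of the 32x32->64
   multiply is what makes the operand swap above legal. */
uint32_t urem255(uint32_t x) {
    uint64_t wide = (uint64_t)x * 2155905153u;  /* mull: edx:eax = x * magic */
    uint32_t q = (uint32_t)(wide >> 39);        /* assumed: x / 255 */
    return x - q * 255u;                        /* remainder */
}
```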

//===---------------------------------------------------------------------===//

Make sure the instruction which starts a loop does not cross a cacheline
boundary. This requires knowing the exact length of each machine instruction.
That is somewhat complicated, but doable. Example 256.bzip2:

In the new trace, the hot loop has an instruction which crosses a cacheline
boundary.  In addition to potential cache misses, this can't help decoding as I
imagine there has to be some kind of complicated decoder reset and realignment
to grab the bytes from the next cacheline.

532  532 0x3cfc movb     (1809(%esp, %esi), %bl   <<<--- spans 2 64 byte lines
942  942 0x3d03 movl     %dh, (1809(%esp, %esi)
937  937 0x3d0a incl     %esi
3    3   0x3d0b cmpb     %bl, %dl
27   27  0x3d0d jnz      0x000062db <main+11707>
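
Checking for this is cheap once instruction lengths are known.  A sketch, with
a hypothetical helper name and the 64-byte line size taken from the trace
above; the 7-byte length of the first instruction is inferred from the next
address in the trace (0x3d03 - 0x3cfc):

```c
#include <stdint.h>

/* Nonzero if the len-byte instruction at addr straddles a line boundary.
   Assumes len >= 1. */
int crosses_line(uint32_t addr, uint32_t len, uint32_t line_size) {
    return (addr / line_size) != ((addr + len - 1) / line_size);
}
```

With these inputs, crosses_line(0x3cfc, 7, 64) is nonzero because the
instruction covers 0x3cfc through 0x3d02 and a new line starts at 0x3d00.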

//===---------------------------------------------------------------------===//

In c99 mode, the preprocessor doesn't like assembly comments like #TRUNCATE.

//===---------------------------------------------------------------------===//

This could be a single 16-bit load.

int f(char *p) {
    if ((p[0] == 1) & (p[1] == 2)) return 1;
    return 0;
}
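
Because the test uses & rather than &&, p[1] is read unconditionally, so
merging the two byte loads is safe.  A sketch of the combined form, assuming a
little-endian target such as x86 (f_merged is a hypothetical name; memcpy is
used to sidestep alignment and aliasing concerns in portable C):

```c
#include <stdint.h>
#include <string.h>

int f_merged(const char *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);  /* one 16-bit load */
    return v == 0x0201;       /* low byte is p[0] == 1, high byte is p[1] == 2 */
}
```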

//===---------------------------------------------------------------------===//

We should inline lrintf and probably other libc functions.

//===---------------------------------------------------------------------===//

This code:

void test(int X) {
  if (X) abort();
}

is currently compiled to:

_test:
        subl $12, %esp
        cmpl $0, 16(%esp)
        jne LBB1_1
        addl $12, %esp
        ret
LBB1_1:
        call L_abort$stub

It would be better to produce:

_test:
        subl $12, %esp
        cmpl $0, 16(%esp)
        jne L_abort$stub
        addl $12, %esp
        ret

This can be applied to any no-return function call that takes no arguments etc.
Alternatively, the stack save/restore logic could be shrink-wrapped, producing
something like this:

_test:
        cmpl $0, 4(%esp)
        jne LBB1_1
        ret
LBB1_1:
        subl $12, %esp
        call L_abort$stub

Both are useful in different situations.  Finally, it could be shrink-wrapped
and tail called, like this:

_test:
        cmpl $0, 4(%esp)
        jne LBB1_1
        ret
LBB1_1:
        pop %eax   # realign stack.
        call L_abort$stub

Though this probably isn't worth it.
435*9880d681SAndroid Build Coastguard Worker
//===---------------------------------------------------------------------===//

Sometimes it is better to codegen subtractions from a constant (e.g. 7-x) with
a neg instead of a sub instruction.  Consider:

int test(char X) { return 7-X; }

We currently produce:
_test:
        movl $7, %eax
        movsbl 4(%esp), %ecx
        subl %ecx, %eax
        ret

We would use one fewer register if codegen'd as:

        movsbl 4(%esp), %eax
        neg %eax
        add $7, %eax
        ret

Note that this isn't beneficial if the load can be folded into the sub.  In
this case, we want a sub:

int test(int X) { return 7-X; }
_test:
        movl $7, %eax
        subl 4(%esp), %eax
        ret

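The neg form relies on the identity 7 - X == (-X) + 7.  A minimal C sketch of
the two instruction sequences, with hypothetical function names (sub_form and
neg_form are not part of the backend):

```c
/* Mirrors movl $7 / movsbl / subl: needs a second register for X. */
static int sub_form(signed char x)
{
    return 7 - x;
}

/* Mirrors movsbl / neg / add $7: everything happens in one register. */
static int neg_form(signed char x)
{
    int r = x;      /* movsbl: sign-extend the char */
    r = -r;         /* neg */
    r += 7;         /* add $7 */
    return r;
}
```

Both functions return the same value for every char input; only the register
usage of the corresponding machine code differs.
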
//===---------------------------------------------------------------------===//

Leaf functions that require one 4-byte spill slot have a prolog like this:

_foo:
        pushl   %esi
        subl    $4, %esp
...
and an epilog like this:
        addl    $4, %esp
        popl    %esi
        ret

It would be smaller, and potentially faster, to push eax on entry and to
pop into a dummy register instead of using addl/subl of esp.  Just don't pop
into any return registers :)

//===---------------------------------------------------------------------===//

The X86 backend should fold (branch (or (setcc, setcc))) into multiple
branches.  We generate really poor code for:

double testf(double a) {
       return a == 0.0 ? 0.0 : (a > 0.0 ? 1.0 : -1.0);
}

For example, the entry BB is:

_testf:
        subl    $20, %esp
        pxor    %xmm0, %xmm0
        movsd   24(%esp), %xmm1
        ucomisd %xmm0, %xmm1
        setnp   %al
        sete    %cl
        testb   %cl, %al
        jne     LBB1_5  # UnifiedReturnBlock
LBB1_1: # cond_true


It would be better to replace the last four instructions with:

        jp LBB1_1
        je LBB1_5
LBB1_1:

We also codegen the inner ?: into a diamond:

        cvtss2sd        LCPI1_0(%rip), %xmm2
        cvtss2sd        LCPI1_1(%rip), %xmm3
        ucomisd %xmm1, %xmm0
        ja      LBB1_3  # cond_true
LBB1_2: # cond_true
        movapd  %xmm3, %xmm2
LBB1_3: # cond_true
        movapd  %xmm2, %xmm0
        ret

We should sink the load of xmm3 into the LBB1_2 block.  This should
be pretty easy, and will nuke all the copies.

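The reason two flag tests are involved is that a == 0.0 must be false when a
is a NaN.  A rough C sketch of that decomposition follows; the helper
is_ordered_equal_zero is hypothetical and only mimics the setnp/sete pair:

```c
#include <math.h>

/* Same function as the testcase above. */
static double testf(double a)
{
    return a == 0.0 ? 0.0 : (a > 0.0 ? 1.0 : -1.0);
}

/* "ordered" plays the role of setnp (parity clear after ucomisd) and
   "equal" the role of sete (ZF is set for equal *or* unordered). */
static int is_ordered_equal_zero(double a)
{
    int ordered = !isnan(a);
    int equal = !(a < 0.0 || a > 0.0);
    return ordered && equal;
}
```

With two branches (jp, then je) the same decision is made without
materializing either flag in a register.
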
//===---------------------------------------------------------------------===//

This:
        #include <utility>
        inline std::pair<unsigned, bool> full_add(unsigned a, unsigned b)
        { return std::make_pair(a + b, a + b < a); }
        bool no_overflow(unsigned a, unsigned b)
        { return !full_add(a, b).second; }

Should compile to:
        addl    %esi, %edi
        setae   %al
        movzbl  %al, %eax
        ret

on x86-64, instead of the rather stupid-looking:
        addl    %esi, %edi
        setb    %al
        xorb    $1, %al
        movzbl  %al, %eax
        ret


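The fold amounts to inverting the condition code instead of inverting the
result.  A small C sketch of the two forms, using hypothetical helper names
rather than the C++ testcase itself:

```c
#include <stdbool.h>

/* Unsigned addition wraps, so a carry out happened exactly when the
   wrapped sum is smaller than one of the operands (the "setb" condition). */
static bool carry_out(unsigned a, unsigned b)
{
    return a + b < a;
}

/* setb followed by xorb $1: compute the carry, then negate it. */
static bool no_overflow_negated(unsigned a, unsigned b)
{
    return !carry_out(a, b);
}

/* setae: test the inverted condition directly. */
static bool no_overflow_direct(unsigned a, unsigned b)
{
    return a + b >= a;
}
```
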
//===---------------------------------------------------------------------===//

The following code:

bb114.preheader:  ; preds = %cond_next94
        %tmp231232 = sext i16 %tmp62 to i32  ; <i32> [#uses=1]
        %tmp233 = sub i32 32, %tmp231232  ; <i32> [#uses=1]
        %tmp245246 = sext i16 %tmp65 to i32  ; <i32> [#uses=1]
        %tmp252253 = sext i16 %tmp68 to i32  ; <i32> [#uses=1]
        %tmp254 = sub i32 32, %tmp252253  ; <i32> [#uses=1]
        %tmp553554 = bitcast i16* %tmp37 to i8*  ; <i8*> [#uses=2]
        %tmp583584 = sext i16 %tmp98 to i32  ; <i32> [#uses=1]
        %tmp585 = sub i32 32, %tmp583584  ; <i32> [#uses=1]
        %tmp614615 = sext i16 %tmp101 to i32  ; <i32> [#uses=1]
        %tmp621622 = sext i16 %tmp104 to i32  ; <i32> [#uses=1]
        %tmp623 = sub i32 32, %tmp621622  ; <i32> [#uses=1]
        br label %bb114

produces:

LBB3_5: # bb114.preheader
        movswl  -68(%ebp), %eax
        movl    $32, %ecx
        movl    %ecx, -80(%ebp)
        subl    %eax, -80(%ebp)
        movswl  -52(%ebp), %eax
        movl    %ecx, -84(%ebp)
        subl    %eax, -84(%ebp)
        movswl  -70(%ebp), %eax
        movl    %ecx, -88(%ebp)
        subl    %eax, -88(%ebp)
        movswl  -50(%ebp), %eax
        subl    %eax, %ecx
        movl    %ecx, -76(%ebp)
        movswl  -42(%ebp), %eax
        movl    %eax, -92(%ebp)
        movswl  -66(%ebp), %eax
        movl    %eax, -96(%ebp)
        movw    $0, -98(%ebp)

This appears to be bad because the RA is not folding the store to the stack
slot into the movl.  The above instructions could be:
        movl    $32, -80(%ebp)
...
        movl    $32, -84(%ebp)
...
This seems like a cross between remat and spill folding.

The generated code also has redundant subtractions of %eax from a stack slot.
However, %ecx doesn't change, so we could simply subtract %eax from %ecx first
and then use %ecx (or vice-versa).

//===---------------------------------------------------------------------===//

This code:

        %tmp659 = icmp slt i16 %tmp654, 0  ; <i1> [#uses=1]
        br i1 %tmp659, label %cond_true662, label %cond_next715

produces this:

        testw   %cx, %cx
        movswl  %cx, %esi
        jns     LBB4_109        # cond_next715

Shark tells us that using %cx in the testw instruction is sub-optimal. It
suggests using the 32-bit register (which is what ICC uses).

//===---------------------------------------------------------------------===//

We compile this:

void compare (long long foo) {
  if (foo < 4294967297LL)
    abort();
}

to:

compare:
        subl    $4, %esp
        cmpl    $0, 8(%esp)
        setne   %al
        movzbw  %al, %ax
        cmpl    $1, 12(%esp)
        setg    %cl
        movzbw  %cl, %cx
        cmove   %ax, %cx
        testb   $1, %cl
        jne     .LBB1_2 # UnifiedReturnBlock
.LBB1_1:        # ifthen
        call    abort
.LBB1_2:        # UnifiedReturnBlock
        addl    $4, %esp
        ret

(also really horrible code on ppc).  This is due to the expand code for 64-bit
compares.  GCC produces multiple branches, which is much nicer:

compare:
        subl    $12, %esp
        movl    20(%esp), %edx
        movl    16(%esp), %eax
        decl    %edx
        jle     .L7
.L5:
        addl    $12, %esp
        ret
        .p2align 4,,7
.L7:
        jl      .L4
        cmpl    $0, %eax
        .p2align 4,,8
        ja      .L5
.L4:
        .p2align 4,,9
        call    abort

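The multi-branch form compares the high words first and only looks at the low
words when the high words are equal.  Here is a sketch of that split in C; the
helper names are hypothetical and this is not the legalizer's actual expansion:

```c
#include <stdint.h>

/* Signed "less than" on the high words, done with unsigned arithmetic:
   flipping the sign bit maps signed order onto unsigned order. */
static int high_word_less(uint32_t a, uint32_t b)
{
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

/* Signed 64-bit foo < c using only 32-bit compares. */
static int less_than_split(int64_t foo, int64_t c)
{
    uint32_t foo_hi = (uint32_t)((uint64_t)foo >> 32);
    uint32_t foo_lo = (uint32_t)(uint64_t)foo;
    uint32_t c_hi = (uint32_t)((uint64_t)c >> 32);
    uint32_t c_lo = (uint32_t)(uint64_t)c;

    if (foo_hi != c_hi)                  /* first branch: high words decide */
        return high_word_less(foo_hi, c_hi);
    return foo_lo < c_lo;                /* second branch: unsigned low words */
}
```

For the constant 4294967297 in the testcase, both halves of the constant are 1.
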
//===---------------------------------------------------------------------===//

Tail call optimization improvements: Tail call optimization currently
pushes onto the top of the stack (their normal place for non-tail-call
optimized calls) all arguments that are sourced from the caller's arguments
or from a virtual register (which may in turn be sourced from the caller's
arguments).
This is done to prevent overwriting of parameters (see example
below) that might be used later.

example:

int callee(int32, int64);
int caller(int32 arg1, int32 arg2) {
  int64 local = arg2 * 2;
  return callee(arg2, (int64)local);
}

[arg1]          [!arg2 no longer valid since we moved local onto it]
[arg2]      ->  [(int64)
[RETADDR]        local  ]

Moving arg1 onto the stack slot of callee function would overwrite
arg2 of the caller.

Possible optimizations:


 - Analyse the actual parameters of the callee to see which of them would
   overwrite a caller parameter that is still used by the callee, and push
   only those onto the top of the stack.

   int callee (int32 arg1, int32 arg2);
   int caller (int32 arg1, int32 arg2) {
       return callee(arg1,arg2);
   }

   Here we don't need to write any variables to the top of the stack
   since they don't overwrite each other.

   int callee (int32 arg1, int32 arg2);
   int caller (int32 arg1, int32 arg2) {
       return callee(arg2,arg1);
   }

   Here we need to push the arguments because they overwrite each
   other.

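The analysis could look roughly like the following conservative check.  This
is only a sketch under simplifying assumptions: every argument occupies one
equally sized slot, and the src array, slot_is_overwritten and needs_staging
are hypothetical names, not existing backend code.

```c
#include <stddef.h>

/* src[i] is the index of the caller's incoming slot that outgoing argument
   i is read from, or -1 if the value does not come from an incoming slot. */

/* Outgoing argument `slot` is stored on top of incoming slot `slot`; the
   slot keeps its value only if the argument is passed through unchanged. */
static int slot_is_overwritten(const int *src, size_t nargs, size_t slot)
{
    return slot < nargs && src[slot] != (int)slot;
}

/* Nonzero when outgoing argument i must first be pushed onto the top of
   the stack because the slot it is read from gets clobbered. */
static int needs_staging(const int *src, size_t nargs, size_t i)
{
    if (src[i] < 0 || src[i] == (int)i)
        return 0;
    return slot_is_overwritten(src, nargs, (size_t)src[i]);
}
```

For callee(arg1,arg2) the source map is {0, 1} and nothing is staged; for
callee(arg2,arg1) it is {1, 0} and both arguments are staged.
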
//===---------------------------------------------------------------------===//

main ()
{
  int i = 0;
  unsigned long int z = 0;

  do {
    z -= 0x00004000;
    i++;
    if (i > 0x00040000)
      abort ();
  } while (z > 0);
  exit (0);
}

gcc compiles this to:

_main:
        subl    $28, %esp
        xorl    %eax, %eax
        jmp     L2
L3:
        cmpl    $262144, %eax
        je      L10
L2:
        addl    $1, %eax
        cmpl    $262145, %eax
        jne     L3
        call    L_abort$stub
L10:
        movl    $0, (%esp)
        call    L_exit$stub

llvm:

_main:
        subl    $12, %esp
        movl    $1, %eax
        movl    $16384, %ecx
LBB1_1: # bb
        cmpl    $262145, %eax
        jge     LBB1_4  # cond_true
LBB1_2: # cond_next
        incl    %eax
        addl    $4294950912, %ecx
        cmpl    $16384, %ecx
        jne     LBB1_1  # bb
LBB1_3: # bb11
        xorl    %eax, %eax
        addl    $12, %esp
        ret
LBB1_4: # cond_true
        call    L_abort$stub

1. LSR should rewrite the first cmp with induction variable %ecx.
2. DAG combiner should fold
        leal    1(%eax), %edx
        cmpl    $262145, %edx
   =>
        cmpl    $262144, %eax

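Item 2 moves the +1 from the register onto the constant.  A minimal C sketch
of both forms for an equality test, with hypothetical function names; for
ordered signed compares the fold would additionally need to know that the
increment cannot overflow:

```c
#include <stdint.h>

/* leal 1(%eax), %edx ; cmpl $262145, %edx */
static int cmp_after_increment(uint32_t x)
{
    return x + 1u == 262145u;
}

/* cmpl $262144, %eax */
static int cmp_folded(uint32_t x)
{
    return x == 262144u;
}
```

Because unsigned arithmetic wraps, the two equality tests agree for every
32-bit input, including 0xFFFFFFFF.
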
//===---------------------------------------------------------------------===//

define i64 @test(double %X) {
        %Y = fptosi double %X to i64
        ret i64 %Y
}

compiles to:

_test:
        subl    $20, %esp
        movsd   24(%esp), %xmm0
        movsd   %xmm0, 8(%esp)
        fldl    8(%esp)
        fisttpll        (%esp)
        movl    4(%esp), %edx
        movl    (%esp), %eax
        addl    $20, %esp
        #FP_REG_KILL
        ret

This should just fldl directly from the input stack slot.

//===---------------------------------------------------------------------===//

This code:
int foo (int x) { return (x & 65535) | 255; }

Should compile into:

_foo:
        movzwl  4(%esp), %eax
        orl     $255, %eax
        ret

instead of:
_foo:
        movl    $65280, %eax
        andl    4(%esp), %eax
        orl     $255, %eax
        ret

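All three formulations compute the same value.  A C sketch for comparison;
foo_movzwl and foo_masked are hypothetical names for the desired and the
current code shape:

```c
#include <stdint.h>

/* The testcase as written. */
static int foo(int x)
{
    return (x & 65535) | 255;
}

/* movzwl + orl: zero-extend the low 16 bits, then set the low byte. */
static int foo_movzwl(int x)
{
    uint32_t low16 = (uint16_t)x;
    return (int)(low16 | 255u);
}

/* movl $65280 + andl + orl: the and mask was narrowed to 0xFF00 because
   the or sets the low byte anyway. */
static int foo_masked(int x)
{
    return (int)(((uint32_t)x & 65280u) | 255u);
}
```
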
//===---------------------------------------------------------------------===//

We're codegen'ing multiply of long longs inefficiently:

unsigned long long LLM(unsigned long long arg1, unsigned long long arg2) {
  return arg1 *  arg2;
}

We compile to (-fomit-frame-pointer):

_LLM:
        pushl   %esi
        movl    8(%esp), %ecx
        movl    16(%esp), %esi
        movl    %esi, %eax
        mull    %ecx
        imull   12(%esp), %esi
        addl    %edx, %esi
        imull   20(%esp), %ecx
        movl    %esi, %edx
        addl    %ecx, %edx
        popl    %esi
        ret

This looks like a scheduling deficiency and lack of remat of the load from
the argument area.  ICC apparently produces:

        movl      8(%esp), %ecx
        imull     12(%esp), %ecx
        movl      16(%esp), %eax
        imull     4(%esp), %eax
        addl      %eax, %ecx
        movl      4(%esp), %eax
        mull      12(%esp)
        addl      %ecx, %edx
        ret

Note that it remat'd loads from 4(esp) and 12(esp).  See this GCC PR:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236

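Both sequences implement the usual decomposition of a 64-bit multiply into one
widening 32x32 multiply plus two truncating ones.  A sketch in C, assuming a
32-bit unsigned int; llm_split is a hypothetical name:

```c
#include <stdint.h>

static uint64_t llm_split(uint64_t arg1, uint64_t arg2)
{
    uint32_t a_lo = (uint32_t)arg1, a_hi = (uint32_t)(arg1 >> 32);
    uint32_t b_lo = (uint32_t)arg2, b_hi = (uint32_t)(arg2 >> 32);

    /* mull: full 64-bit product of the low words (edx:eax). */
    uint64_t low_product = (uint64_t)a_lo * b_lo;

    /* Two imull plus an addl: only the low 32 bits of the cross terms
       can reach the result. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    /* Final addl into the high word. */
    return low_product + ((uint64_t)cross << 32);
}
```

The a_hi * b_hi term is dropped entirely because it only affects bits 64 and
above.
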
//===---------------------------------------------------------------------===//

We can fold a store into "zeroing a reg".  Instead of:

xorl    %eax, %eax
movl    %eax, 124(%esp)

we should get:

movl    $0, 124(%esp)

if the flags of the xor are dead.

Likewise, we isel "x<<1" into "add reg,reg".  If reg is spilled, this should
be folded into: shl [mem], 1

//===---------------------------------------------------------------------===//

In SSE mode, we turn abs and neg into a load from the constant pool plus a xor
or and instruction, for example:

	xorpd	LCPI1_0, %xmm2

However, if xmm2 gets spilled, we end up with really ugly code like this:

	movsd	(%esp), %xmm0
	xorpd	LCPI1_0, %xmm0
	movsd	%xmm0, (%esp)

Since we 'know' that this is a 'neg', we can actually "fold" the spill into
the neg/abs instruction, turning it into an *integer* operation, like this:

	xorl 2147483648, [mem+4]     ## 2147483648 = (1 << 31)

You could also use xorb, but xorl is less likely to lead to a partial register
stall.  Here is a contrived testcase:

double a, b, c;
void bar(void);
void test(double *P) {
  double X = *P;
  a = X;
  bar();
  X = -X;
  b = X;
  bar();
  c = X;
}
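
The integer form works because negating an IEEE-754 double only flips its sign
bit, which is bit 31 of the high 32-bit word (the word at [mem+4] on
little-endian x86).  A minimal C sketch of the idea, assuming a 64-bit IEEE-754
double; the helper names neg_in_memory and abs_in_memory are made up for
illustration and are not part of LLVM:

```c
#include <stdint.h>
#include <string.h>

/* Negate a double that lives in memory by flipping its sign bit (bit 63)
   with an integer XOR, without going through an SSE register. */
static void neg_in_memory(double *slot) {
  uint64_t bits;
  memcpy(&bits, slot, sizeof bits);   /* read the raw 64-bit image */
  bits ^= UINT64_C(1) << 63;          /* flip the sign bit */
  memcpy(slot, &bits, sizeof bits);
}

/* abs is the same idea with an AND that clears the sign bit. */
static void abs_in_memory(double *slot) {
  uint64_t bits;
  memcpy(&bits, slot, sizeof bits);
  bits &= ~(UINT64_C(1) << 63);       /* clear the sign bit */
  memcpy(slot, &bits, sizeof bits);
}
```

The sketch uses a 64-bit XOR/AND to stay endian-neutral; on x86-32 the backend
would want the 32-bit operation on the high word, as in the xorl shown earlier.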

//===---------------------------------------------------------------------===//

The generated code on x86 for checking for signed overflow on a multiply the
obvious way is much longer than it needs to be.

int x(int a, int b) {
  long long prod = (long long)a*b;
  return  prod > 0x7FFFFFFF || prod < (-0x7FFFFFFF-1);
}

See PR2053 for more details.
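
For comparison, here are two other ways to write the same predicate in C.  This
is only a sketch: it assumes a 32-bit int, the function names are illustrative,
and __builtin_mul_overflow is a GCC/Clang extension rather than standard C.  The
builtin form is the one that should map most directly onto an imull followed by
a check of OF.

```c
#include <limits.h>
#include <stdint.h>

/* The "obvious" check: widen, multiply, range-compare. */
static int overflows_obvious(int a, int b) {
  long long prod = (long long)a * b;
  return prod > INT_MAX || prod < INT_MIN;
}

/* Same predicate with a single unsigned compare: adding 2^31 maps the valid
   range [-2^31, 2^31-1] onto [0, 2^32-1]. */
static int overflows_biased(int a, int b) {
  int64_t prod = (int64_t)a * b;
  return (uint64_t)prod + UINT64_C(0x80000000) > UINT64_C(0xFFFFFFFF);
}

/* Compiler builtin (GCC/Clang extension): reports whether a*b overflowed. */
static int overflows_builtin(int a, int b) {
  int result;
  return __builtin_mul_overflow(a, b, &result);
}
```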

//===---------------------------------------------------------------------===//

We should investigate using cdq/cltd (effect: edx = sar eax, 31)
more aggressively; it should cost the same as a move+shift on any modern
processor, but it's a lot shorter. The downside is that it puts more
pressure on register allocation because it has fixed operands.

Example:
int abs(int x) {return x < 0 ? -x : x;}

gcc compiles this to the following when using march/mtune=pentium2/3/4/m/etc.:
abs:
        movl    4(%esp), %eax
        cltd
        xorl    %edx, %eax
        subl    %edx, %eax
        ret
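
The sequence above is the usual sign-mask identity: with m equal to 0 for
non-negative x and all ones for negative x (what cltd leaves in edx),
abs(x) = (x ^ m) - m.  A C sketch of the same computation, assuming 32-bit two's
complement integers; abs_cltd is a hypothetical name, and the result is returned
as unsigned so that INT32_MIN does not overflow:

```c
#include <stdint.h>

/* Branch-free abs mirroring "cltd; xorl %edx,%eax; subl %edx,%eax". */
static uint32_t abs_cltd(int32_t x) {
  /* cltd computes edx = sar(eax, 31), i.e. 0 or 0xFFFFFFFF.  The mask is
     built from a comparison here so the sketch does not depend on how >>
     treats negative values in C. */
  uint32_t mask = 0u - (uint32_t)(x < 0);
  uint32_t v = (uint32_t)x;
  return (v ^ mask) - mask;   /* xor flips the bits, sub adds the +1 */
}
```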

//===---------------------------------------------------------------------===//

Take the following code (from
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16541):

extern unsigned char first_one[65536];
int FirstOnet(unsigned long long arg1)
{
  if (arg1 >> 48)
    return (first_one[arg1 >> 48]);
  return 0;
}

The following code is currently generated:
FirstOnet:
        movl    8(%esp), %eax
        cmpl    $65536, %eax
        movl    4(%esp), %ecx
        jb      .LBB1_2 # UnifiedReturnBlock
.LBB1_1:        # ifthen
        shrl    $16, %eax
        movzbl  first_one(%eax), %eax
        ret
.LBB1_2:        # UnifiedReturnBlock
        xorl    %eax, %eax
        ret

We could change the "movl 8(%esp), %eax" into "movzwl 10(%esp), %eax"; this
lets us change the cmpl into a testl, which is shorter, and eliminate the shift.
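
In C terms the proposed change replaces a shift with a narrower load.  A sketch,
assuming a little-endian target such as x86 (the helper names are illustrative);
arg1 starts at 4(%esp), so byte offset 6 within it is 10(%esp):

```c
#include <stdint.h>
#include <string.h>

/* What the source says: shift the 64-bit argument right by 48. */
static unsigned top16_by_shift(uint64_t arg) {
  return (unsigned)(arg >> 48);
}

/* What a movzwl would do: a zero-extending 16-bit load from byte offset 6
   of the argument's memory image.  Only valid on little-endian targets. */
static unsigned top16_by_load(const uint64_t *arg) {
  uint16_t half;
  memcpy(&half, (const unsigned char *)arg + 6, sizeof half);
  return half;
}
```

Because the loaded value is already the table index, the "is it nonzero" check
becomes a test of the register against itself and no shift is needed.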

//===---------------------------------------------------------------------===//

We compile this function:

define i32 @foo(i32 %a, i32 %b, i32 %c, i8 zeroext  %d) nounwind  {
entry:
	%tmp2 = icmp eq i8 %d, 0		; <i1> [#uses=1]
	br i1 %tmp2, label %bb7, label %bb

bb:		; preds = %entry
	%tmp6 = add i32 %b, %a		; <i32> [#uses=1]
	ret i32 %tmp6

bb7:		; preds = %entry
	%tmp10 = sub i32 %a, %c		; <i32> [#uses=1]
	ret i32 %tmp10
}

to:

foo:                                    # @foo
# BB#0:                                 # %entry
	movl	4(%esp), %ecx
	cmpb	$0, 16(%esp)
	je	.LBB0_2
# BB#1:                                 # %bb
	movl	8(%esp), %eax
	addl	%ecx, %eax
	ret
.LBB0_2:                                # %bb7
	movl	12(%esp), %edx
	movl	%ecx, %eax
	subl	%edx, %eax
	ret

There's an obviously unnecessary movl in .LBB0_2, and we could eliminate a
couple more movls by putting 4(%esp) into %eax instead of %ecx.

//===---------------------------------------------------------------------===//

See rdar://4653682.

From flops:

LBB1_15:        # bb310
        cvtss2sd        LCPI1_0, %xmm1
        addsd   %xmm1, %xmm0
        movsd   176(%esp), %xmm2
        mulsd   %xmm0, %xmm2
        movapd  %xmm2, %xmm3
        mulsd   %xmm3, %xmm3
        movapd  %xmm3, %xmm4
        mulsd   LCPI1_23, %xmm4
        addsd   LCPI1_24, %xmm4
        mulsd   %xmm3, %xmm4
        addsd   LCPI1_25, %xmm4
        mulsd   %xmm3, %xmm4
        addsd   LCPI1_26, %xmm4
        mulsd   %xmm3, %xmm4
        addsd   LCPI1_27, %xmm4
        mulsd   %xmm3, %xmm4
        addsd   LCPI1_28, %xmm4
        mulsd   %xmm3, %xmm4
        addsd   %xmm1, %xmm4
        mulsd   %xmm2, %xmm4
        movsd   152(%esp), %xmm1
        addsd   %xmm4, %xmm1
        movsd   %xmm1, 152(%esp)
        incl    %eax
        cmpl    %eax, %esi
        jge     LBB1_15 # bb310
LBB1_16:        # bb358.loopexit
        movsd   152(%esp), %xmm0
        addsd   %xmm0, %xmm0
        addsd   LCPI1_22, %xmm0
        movsd   %xmm0, 152(%esp)

Rather than spilling the result of the last addsd in the loop, we should
insert a copy to split the interval (one for the duration of the loop, one
extending to the fall through). The register pressure in the loop isn't high
enough to warrant the spill.

Also check why xmm7 is not used at all in the function.

//===---------------------------------------------------------------------===//

Take the following:

target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-S128"
target triple = "i386-apple-darwin8"
@in_exit.4870.b = internal global i1 false		; <i1*> [#uses=2]
define fastcc void @abort_gzip() noreturn nounwind  {
entry:
	%tmp.b.i = load i1* @in_exit.4870.b		; <i1> [#uses=1]
	br i1 %tmp.b.i, label %bb.i, label %bb4.i
bb.i:		; preds = %entry
	tail call void @exit( i32 1 ) noreturn nounwind
	unreachable
bb4.i:		; preds = %entry
	store i1 true, i1* @in_exit.4870.b
	tail call void @exit( i32 1 ) noreturn nounwind
	unreachable
}
declare void @exit(i32) noreturn nounwind

This compiles into:
_abort_gzip:                            ## @abort_gzip
## BB#0:                                ## %entry
	subl	$12, %esp
	movb	_in_exit.4870.b, %al
	cmpb	$1, %al
	jne	LBB0_2

We somehow miss folding the movb into the cmpb.

//===---------------------------------------------------------------------===//

We compile:

int test(int x, int y) {
  return x-y-1;
}

into (-m64):

_test:
	decl	%edi
	movl	%edi, %eax
	subl	%esi, %eax
	ret

It would be better to codegen this as x+~y (notl+addl).
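
The rewrite relies on the two's complement identity ~y == -y - 1, so that
x - y - 1 == x + ~y.  A quick C check of the identity, using unsigned arithmetic
so that wraparound is well defined (the function names are illustrative):

```c
/* The form in the source: subtract, then decrement. */
static unsigned sub_then_dec(unsigned x, unsigned y) {
  return x - y - 1u;
}

/* The proposed form: complement y (notl), then add (addl). */
static unsigned add_not(unsigned x, unsigned y) {
  return x + ~y;
}
```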

//===---------------------------------------------------------------------===//

This code:

int foo(const char *str,...)
{
 __builtin_va_list a; int x;
 __builtin_va_start(a,str); x = __builtin_va_arg(a,int); __builtin_va_end(a);
 return x;
}

gets compiled into this on x86-64:
	subq    $200, %rsp
        movaps  %xmm7, 160(%rsp)
        movaps  %xmm6, 144(%rsp)
        movaps  %xmm5, 128(%rsp)
        movaps  %xmm4, 112(%rsp)
        movaps  %xmm3, 96(%rsp)
        movaps  %xmm2, 80(%rsp)
        movaps  %xmm1, 64(%rsp)
        movaps  %xmm0, 48(%rsp)
        movq    %r9, 40(%rsp)
        movq    %r8, 32(%rsp)
        movq    %rcx, 24(%rsp)
        movq    %rdx, 16(%rsp)
        movq    %rsi, 8(%rsp)
        leaq    (%rsp), %rax
        movq    %rax, 192(%rsp)
        leaq    208(%rsp), %rax
        movq    %rax, 184(%rsp)
        movl    $48, 180(%rsp)
        movl    $8, 176(%rsp)
        movl    176(%rsp), %eax
        cmpl    $47, %eax
        jbe     .LBB1_3 # bb
.LBB1_1:        # bb3
        movq    184(%rsp), %rcx
        leaq    8(%rcx), %rax
        movq    %rax, 184(%rsp)
.LBB1_2:        # bb4
        movl    (%rcx), %eax
        addq    $200, %rsp
        ret
.LBB1_3:        # bb
        movl    %eax, %ecx
        addl    $8, %eax
        addq    192(%rsp), %rcx
        movl    %eax, 176(%rsp)
        jmp     .LBB1_2 # bb4

gcc 4.3 generates:
	subq    $96, %rsp
.LCFI0:
        leaq    104(%rsp), %rax
        movq    %rsi, -80(%rsp)
        movl    $8, -120(%rsp)
        movq    %rax, -112(%rsp)
        leaq    -88(%rsp), %rax
        movq    %rax, -104(%rsp)
        movl    $8, %eax
        cmpl    $48, %eax
        jb      .L6
        movq    -112(%rsp), %rdx
        movl    (%rdx), %eax
        addq    $96, %rsp
        ret
        .p2align 4,,10
        .p2align 3
.L6:
        mov     %eax, %edx
        addq    -104(%rsp), %rdx
        addl    $8, %eax
        movl    %eax, -120(%rsp)
        movl    (%rdx), %eax
        addq    $96, %rsp
        ret

and it gets compiled into this on x86:
	pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        leal    12(%ebp), %eax
        movl    %eax, -4(%ebp)
        leal    16(%ebp), %eax
        movl    %eax, -4(%ebp)
        movl    12(%ebp), %eax
        addl    $4, %esp
        popl    %ebp
        ret

gcc 4.3 generates:
	pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        popl    %ebp
        ret

//===---------------------------------------------------------------------===//

Teach tblgen not to check bitconvert source type in some cases. This allows us
to consolidate the following patterns in X86InstrMMX.td:

def : Pat<(v2i32 (bitconvert (i64 (vector_extract (v2i64 VR128:$src),
                                                  (iPTR 0))))),
          (v2i32 (MMX_MOVDQ2Qrr VR128:$src))>;
def : Pat<(v4i16 (bitconvert (i64 (vector_extract (v2i64 VR128:$src),
                                                  (iPTR 0))))),
          (v4i16 (MMX_MOVDQ2Qrr VR128:$src))>;
def : Pat<(v8i8 (bitconvert (i64 (vector_extract (v2i64 VR128:$src),
                                                  (iPTR 0))))),
          (v8i8 (MMX_MOVDQ2Qrr VR128:$src))>;

There are other cases in various td files.

//===---------------------------------------------------------------------===//

Take something like the following on x86-32:
unsigned a(unsigned long long x, unsigned y) {return x % y;}

We currently generate a libcall, but we really shouldn't: the expansion is
shorter and likely faster than the libcall.  The expected code is something
like the following:

	movl	12(%ebp), %eax
	movl	16(%ebp), %ecx
	xorl	%edx, %edx
	divl	%ecx
	movl	8(%ebp), %eax
	divl	%ecx
	movl	%edx, %eax
	ret

A similar code sequence works for division.
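
The expansion is schoolbook long division in base 2^32: divide the high word
first, then feed its remainder in as the high half of the second divl.  A C
model of the sequence, assuming y != 0; divl_model, urem64_32 and udiv64_32 are
hypothetical helpers for illustration, not LLVM or libgcc functions:

```c
#include <stdint.h>

/* Models one divl: divides the 64-bit value hi:lo by d and returns the
   quotient, storing the remainder in *rem.  Like the hardware instruction,
   it is only meaningful when hi < d, so the quotient fits in 32 bits. */
static uint32_t divl_model(uint32_t hi, uint32_t lo, uint32_t d,
                           uint32_t *rem) {
  uint64_t n = ((uint64_t)hi << 32) | lo;
  *rem = (uint32_t)(n % d);
  return (uint32_t)(n / d);
}

/* x % y with two 32-bit divides. */
static uint32_t urem64_32(uint64_t x, uint32_t y) {
  uint32_t rem;
  divl_model(0, (uint32_t)(x >> 32), y, &rem);  /* xorl %edx,%edx; divl */
  divl_model(rem, (uint32_t)x, y, &rem);        /* rem < y, so this is safe */
  return rem;
}

/* x / y: the same two steps, keeping both quotients. */
static uint64_t udiv64_32(uint64_t x, uint32_t y) {
  uint32_t rem;
  uint32_t q_hi = divl_model(0, (uint32_t)(x >> 32), y, &rem);
  uint32_t q_lo = divl_model(rem, (uint32_t)x, y, &rem);
  return ((uint64_t)q_hi << 32) | q_lo;
}
```

The second divide cannot fault because the remainder of the first step is
always smaller than the divisor.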

//===---------------------------------------------------------------------===//

We currently compile this:

define i32 @func1(i32 %v1, i32 %v2) nounwind {
entry:
  %t = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %v1, i32 %v2)
  %sum = extractvalue {i32, i1} %t, 0
  %obit = extractvalue {i32, i1} %t, 1
  br i1 %obit, label %overflow, label %normal
normal:
  ret i32 %sum
overflow:
  call void @llvm.trap()
  unreachable
}
declare {i32, i1} @llvm.sadd.with.overflow.i32(i32, i32)
declare void @llvm.trap()

to:

_func1:
	movl	4(%esp), %eax
	addl	8(%esp), %eax
	jo	LBB1_2	## overflow
LBB1_1:	## normal
	ret
LBB1_2:	## overflow
	ud2

It would be nice to produce "into" someday.

//===---------------------------------------------------------------------===//

Test instructions can be eliminated by using EFLAGS values from arithmetic
instructions. This is currently not done for mul, and, or, xor, neg, shl,
sra, srl, shld, shrd, atomic ops, and others. It is also currently not done
for read-modify-write instructions. It is also currently not done if the
OF or CF flags are needed.

The shift operators have the complication that when the shift count is
zero, EFLAGS is left unchanged, so they can only subsume a test instruction
if the shift count is known to be non-zero. Also, using the EFLAGS value
from a shift is apparently very slow on some x86 implementations.

In read-modify-write instructions, the root node in the isel match is
the store, and isel has no way for the use of the EFLAGS result of the
arithmetic to be remapped to the new node.

Add and subtract instructions set OF on signed overflow and CF on unsigned
overflow, while test instructions always clear OF and CF. In order to
replace a test with an add or subtract in a situation where OF or CF is
needed, codegen must be able to prove that the operation cannot see
signed or unsigned overflow, respectively.
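
To make the OF/CF condition concrete, here is a simplified C model of the flags
after an addl versus after a testl of the result.  It is only a sketch: it
tracks ZF, SF, CF and OF (ignoring PF and AF), and the struct and function
names are invented for illustration:

```c
#include <stdint.h>

struct flags { int zf, sf, cf, of; };

/* Flags produced by "addl b, a" with result r = a + b. */
static struct flags flags_after_add(uint32_t a, uint32_t b) {
  uint32_t r = a + b;
  struct flags f;
  f.zf = (r == 0);
  f.sf = (int)(r >> 31);
  f.cf = (r < a);                            /* unsigned carry out */
  f.of = (int)(((a ^ r) & (b ^ r)) >> 31);   /* signed overflow */
  return f;
}

/* Flags produced by "testl r, r": ZF/SF reflect r, CF and OF are cleared. */
static struct flags flags_after_test(uint32_t r) {
  struct flags f;
  f.zf = (r == 0);
  f.sf = (int)(r >> 31);
  f.cf = 0;
  f.of = 0;
  return f;
}

/* Returns 1 if a test of a+b sets the same flags the add already set, as far
   as the consumer can tell.  needs_cf_of says whether it reads CF or OF. */
static int test_is_redundant(uint32_t a, uint32_t b, int needs_cf_of) {
  struct flags add = flags_after_add(a, b);
  struct flags tst = flags_after_test(a + b);
  if (add.zf != tst.zf || add.sf != tst.sf)
    return 0;                                /* ZF/SF always agree */
  if (!needs_cf_of)
    return 1;
  return add.cf == tst.cf && add.of == tst.of;
}
```

In this model the test is always redundant for a consumer of ZF/SF only, and
redundant for a consumer of CF/OF exactly when the add does not overflow.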

//===---------------------------------------------------------------------===//

memcpy/memmove do not lower to SSE copies when possible.  A silly example is:
define <16 x float> @foo(<16 x float> %A) nounwind {
	%tmp = alloca <16 x float>, align 16
	%tmp2 = alloca <16 x float>, align 16
	store <16 x float> %A, <16 x float>* %tmp
	%s = bitcast <16 x float>* %tmp to i8*
	%s2 = bitcast <16 x float>* %tmp2 to i8*
	call void @llvm.memcpy.i64(i8* %s, i8* %s2, i64 64, i32 16)
	%R = load <16 x float>* %tmp2
	ret <16 x float> %R
}

declare void @llvm.memcpy.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind

which compiles to:

_foo:
	subl	$140, %esp
	movaps	%xmm3, 112(%esp)
	movaps	%xmm2, 96(%esp)
	movaps	%xmm1, 80(%esp)
	movaps	%xmm0, 64(%esp)
	movl	60(%esp), %eax
	movl	%eax, 124(%esp)
	movl	56(%esp), %eax
	movl	%eax, 120(%esp)
	movl	52(%esp), %eax
        <many many more 32-bit copies>
	movaps	(%esp), %xmm0
	movaps	16(%esp), %xmm1
	movaps	32(%esp), %xmm2
	movaps	48(%esp), %xmm3
	addl	$140, %esp
	ret

On Nehalem, it may even be cheaper to just use movups when unaligned than to
fall back to lower-granularity chunks.

//===---------------------------------------------------------------------===//

Implement processor-specific optimizations for parity with GCC on these
processors.  GCC does two optimizations:

1. ix86_pad_returns inserts a noop before ret instructions if the ret is
   immediately preceded by a conditional branch or is the target of a jump.
2. ix86_avoid_jump_misspredicts inserts noops in cases where a 16-byte block of
   code contains more than 3 branches.

The first one is done for all AMDs, Core2, and "Generic".
The second one is done for: Atom, Pentium Pro, all AMDs, Pentium 4, Nocona,
  Core 2, and "Generic".
1344*9880d681SAndroid Build Coastguard Worker
//===---------------------------------------------------------------------===//

Testcase:
int x(int a) { return (a&0xf0)>>4; }

Current output:
	movl	4(%esp), %eax
	shrl	$4, %eax
	andl	$15, %eax
	ret

Ideal output:
	movzbl	4(%esp), %eax
	shrl	$4, %eax
	ret

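The two sequences agree because masking with 0xf0 and shifting right by 4 keeps
only bits 4..7, which is the same as zero-extending the low byte and shifting
it. A minimal C sketch of that equivalence (the function names are made up for
illustration and are not part of the testcase):

```c
/* Form used in the testcase: mask bits 4..7, then shift them down. */
int nibble_mask_shift(int a) { return (a & 0xf0) >> 4; }

/* Form matching the ideal output: zero-extend the low byte (movzbl),
   then shift right by 4 (shrl). */
int nibble_zext_shift(int a) { return (unsigned char)a >> 4; }
```
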
//===---------------------------------------------------------------------===//

Re-implement atomic builtins __sync_add_and_fetch() and __sync_sub_and_fetch()
properly.

When the return value is not used (i.e. we only care about the value in
memory), x86 does not have to use xadd to implement these. Instead, it can use
add, sub, inc, dec instructions with the "lock" prefix.

This is currently implemented using an instruction selection trick. The
issue is that the target-independent pattern produces one output and a chain,
and we want to map it into one that just outputs a chain. The current trick is
to select it into a MERGE_VALUES with the first definition being an
implicit_def. The proper solution is to add new ISD opcodes for the no-output
variant. The DAG combiner can then transform the node before it gets to target
node selection.

Problem #2 is that we are adding a whole bunch of x86 atomic instructions when
in fact these instructions are identical to the non-lock versions. We need a
way to add target-specific information to target nodes and have this
information carried over to machine instructions. The asm printer (or JIT) can
use this information to add the "lock" prefix.

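At the source level the two cases look roughly like this. This is only a sketch
using the GCC/Clang __sync builtins; the wrapper names bump and bump_and_get
are hypothetical.

```c
/* Result unused: only the memory update matters, so a locked add or inc
   on the memory operand is sufficient. */
void bump(int *counter) { (void)__sync_add_and_fetch(counter, 1); }

/* Result used: the updated value has to end up in a register, which is
   the case where a locked xadd is actually needed. */
int bump_and_get(int *counter) { return __sync_add_and_fetch(counter, 1); }
```
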
//===---------------------------------------------------------------------===//

struct B {
  unsigned char y0 : 1;
};

int bar(struct B* a) { return a->y0; }

define i32 @bar(%struct.B* nocapture %a) nounwind readonly optsize {
  %1 = getelementptr inbounds %struct.B* %a, i64 0, i32 0
  %2 = load i8* %1, align 1
  %3 = and i8 %2, 1
  %4 = zext i8 %3 to i32
  ret i32 %4
}

bar:                                    # @bar
# BB#0:
        movb    (%rdi), %al
        andb    $1, %al
        movzbl  %al, %eax
        ret

Missed optimization: should be movl+andl.

//===---------------------------------------------------------------------===//

The x86_64 ABI says:

Booleans, when stored in a memory object, are stored as single byte objects the
value of which is always 0 (false) or 1 (true).

We are not using this fact:

int bar(_Bool *a) { return *a; }

define i32 @bar(i8* nocapture %a) nounwind readonly optsize {
  %1 = load i8* %a, align 1, !tbaa !0
  %tmp = and i8 %1, 1
  %2 = zext i8 %tmp to i32
  ret i32 %2
}

bar:
        movb    (%rdi), %al
        andb    $1, %al
        movzbl  %al, %eax
        ret

GCC produces

bar:
        movzbl  (%rdi), %eax
        ret

//===---------------------------------------------------------------------===//

Consider the following two functions compiled with clang:
_Bool foo(int *x) { return !(*x & 4); }
unsigned bar(int *x) { return !(*x & 4); }

foo:
	movl	4(%esp), %eax
	testb	$4, (%eax)
	sete	%al
	movzbl	%al, %eax
	ret

bar:
	movl	4(%esp), %eax
	movl	(%eax), %eax
	shrl	$2, %eax
	andl	$1, %eax
	xorl	$1, %eax
	ret

The second function generates more code even though the two functions are
functionally identical.

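For reference, the two instruction sequences correspond to the following two
C formulations of the same predicate. This is a sketch with made-up function
names, meant only to show that both shapes compute the same value:

```c
/* Shape of the code generated for foo: test bit 2 and materialize the
   resulting flag (testb + sete + movzbl). */
unsigned bit2_clear_test(int v) { return (v & 4) == 0; }

/* Shape of the code generated for bar: shift bit 2 down, mask it, and
   invert it (shrl + andl + xorl). */
unsigned bit2_clear_shift(int v) { return (((unsigned)v >> 2) & 1u) ^ 1u; }
```
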
//===---------------------------------------------------------------------===//

Take the following C code:
int f(int a, int b) { return (unsigned char)a == (unsigned char)b; }

We generate the following IR with clang:
define i32 @f(i32 %a, i32 %b) nounwind readnone {
entry:
  %tmp = xor i32 %b, %a                           ; <i32> [#uses=1]
  %tmp6 = and i32 %tmp, 255                       ; <i32> [#uses=1]
  %cmp = icmp eq i32 %tmp6, 0                     ; <i1> [#uses=1]
  %conv5 = zext i1 %cmp to i32                    ; <i32> [#uses=1]
  ret i32 %conv5
}

And the following x86 code:
	xorl	%esi, %edi
	testb	$-1, %dil
	sete	%al
	movzbl	%al, %eax
	ret

A cmpb instead of the xorl+testb would be one instruction shorter.

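The xor-and-mask form in the IR and a direct byte compare are interchangeable,
which is why a single cmpb would do. A small C sketch of the two forms (the
helper names are hypothetical):

```c
/* What the IR computes: xor the inputs and test the low 8 bits. */
int eq_low8_xor(int a, int b) { return ((a ^ b) & 255) == 0; }

/* What a single cmpb would compute: compare the low bytes directly. */
int eq_low8_cmp(int a, int b) { return (unsigned char)a == (unsigned char)b; }
```
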
//===---------------------------------------------------------------------===//

Given the following C code:
int f(int a, int b) { return (signed char)a == (signed char)b; }

We generate the following IR with clang:
define i32 @f(i32 %a, i32 %b) nounwind readnone {
entry:
  %sext = shl i32 %a, 24                          ; <i32> [#uses=1]
  %conv1 = ashr i32 %sext, 24                     ; <i32> [#uses=1]
  %sext6 = shl i32 %b, 24                         ; <i32> [#uses=1]
  %conv4 = ashr i32 %sext6, 24                    ; <i32> [#uses=1]
  %cmp = icmp eq i32 %conv1, %conv4               ; <i1> [#uses=1]
  %conv5 = zext i1 %cmp to i32                    ; <i32> [#uses=1]
  ret i32 %conv5
}

And the following x86 code:
	movsbl	%sil, %eax
	movsbl	%dil, %ecx
	cmpl	%eax, %ecx
	sete	%al
	movzbl	%al, %eax
	ret

It should be possible to eliminate the sign extensions.

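The sign extensions are removable because equality only depends on the low
8 bits: two values sign-extend to the same 32-bit result exactly when their low
bytes match. A sketch of that claim in C, with hypothetical helper names and
assuming the usual wrapping int-to-signed-char conversion that GCC and Clang
implement:

```c
/* Compare after sign-extending the low byte (movsbl + movsbl + cmpl).
   The narrowing conversion is implementation-defined in C; GCC and Clang
   wrap modulo 256, which is what this sketch assumes. */
int eq_sext8(int a, int b) { return (signed char)a == (signed char)b; }

/* Compare the low bytes without any extension (a single cmpb). */
int eq_byte(int a, int b) { return (unsigned char)a == (unsigned char)b; }
```
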
//===---------------------------------------------------------------------===//

LLVM misses a load+store narrowing opportunity in this code:

%struct.bf = type { i64, i16, i16, i32 }

@bfi = external global %struct.bf*                ; <%struct.bf**> [#uses=2]

define void @t1() nounwind ssp {
entry:
  %0 = load %struct.bf** @bfi, align 8            ; <%struct.bf*> [#uses=1]
  %1 = getelementptr %struct.bf* %0, i64 0, i32 1 ; <i16*> [#uses=1]
  %2 = bitcast i16* %1 to i32*                    ; <i32*> [#uses=2]
  %3 = load i32* %2, align 1                      ; <i32> [#uses=1]
  %4 = and i32 %3, -65537                         ; <i32> [#uses=1]
  store i32 %4, i32* %2, align 1
  %5 = load %struct.bf** @bfi, align 8            ; <%struct.bf*> [#uses=1]
  %6 = getelementptr %struct.bf* %5, i64 0, i32 1 ; <i16*> [#uses=1]
  %7 = bitcast i16* %6 to i32*                    ; <i32*> [#uses=2]
  %8 = load i32* %7, align 1                      ; <i32> [#uses=1]
  %9 = and i32 %8, -131073                        ; <i32> [#uses=1]
  store i32 %9, i32* %7, align 1
  ret void
}

LLVM currently emits this:

  movq  bfi(%rip), %rax
  andl  $-65537, 8(%rax)
  movq  bfi(%rip), %rax
  andl  $-131073, 8(%rax)
  ret

It could narrow the loads and stores to emit this:

  movq  bfi(%rip), %rax
  andb  $-2, 10(%rax)
  movq  bfi(%rip), %rax
  andb  $-3, 10(%rax)
  ret

The trouble is that there is a TokenFactor between the store and the
load, making it non-trivial to determine if there's anything between
the load and the store which would prohibit narrowing.

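The narrowing is valid because each 32-bit mask clears a single bit that lives
in one byte: -65537 (0xFFFEFFFF) clears bit 16, which on little-endian x86 is
bit 0 of the byte at offset 8 + 2 = 10, and -131073 (0xFFFDFFFF) clears bit 1 of
the same byte. A rough C model of the wide and narrow forms follows; the helper
names are hypothetical and a little-endian target is assumed:

```c
#include <stdint.h>
#include <string.h>

/* Wide form: 32-bit load, and, store at byte offset 8
   (andl $mask, 8(%rax)). */
void clear_wide(unsigned char *base, uint32_t mask) {
    uint32_t w;
    memcpy(&w, base + 8, sizeof w);
    w &= mask;
    memcpy(base + 8, &w, sizeof w);
}

/* Narrow form: touch only the byte at offset 10
   (andb $mask8, 10(%rax)). Assumes little-endian byte order. */
void clear_narrow(unsigned char *base, uint8_t mask8) {
    base[10] &= mask8;
}
```
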
//===---------------------------------------------------------------------===//

This code:
void foo(unsigned x) {
  if (x == 0) bar();
  else if (x == 1) qux();
}

currently compiles into:
_foo:
	movl	4(%esp), %eax
	cmpl	$1, %eax
	je	LBB0_3
	testl	%eax, %eax
	jne	LBB0_4

The testl could be removed by reusing the flags from the cmpl:
_foo:
	movl	4(%esp), %eax
	cmpl	$1, %eax
	je	LBB0_3
	ja	LBB0_4

0 is the only unsigned number < 1, so once x == 1 has been ruled out, x != 0
is the same condition as x > 1.

//===---------------------------------------------------------------------===//

This code:

%0 = type { i32, i1 }

define i32 @add32carry(i32 %sum, i32 %x) nounwind readnone ssp {
entry:
  %uadd = tail call %0 @llvm.uadd.with.overflow.i32(i32 %sum, i32 %x)
  %cmp = extractvalue %0 %uadd, 1
  %inc = zext i1 %cmp to i32
  %add = add i32 %x, %sum
  %z.0 = add i32 %add, %inc
  ret i32 %z.0
}

declare %0 @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone

compiles to:

_add32carry:                            ## @add32carry
	addl	%esi, %edi
	sbbl	%ecx, %ecx
	movl	%edi, %eax
	subl	%ecx, %eax
	ret

But it could be:

_add32carry:
	leal	(%rsi,%rdi), %eax
	cmpl	%esi, %eax
	adcl	$0, %eax
	ret

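The shorter sequence relies on the fact that an unsigned add overflowed exactly
when the wrapped sum is smaller than either operand, so the carry can be
recomputed with a compare. A C sketch of the same computation (the function
name mirrors the IR, and the per-line instruction mapping is only
illustrative):

```c
#include <stdint.h>

/* Add two 32-bit values and fold the carry-out back into the result. */
uint32_t add32carry(uint32_t sum, uint32_t x) {
    uint32_t s = sum + x;    /* leal (%rsi,%rdi), %eax */
    uint32_t carry = s < x;  /* cmpl %esi, %eax sets CF when s < x */
    return s + carry;        /* adcl $0, %eax */
}
```
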
//===---------------------------------------------------------------------===//

The hot loop of 256.bzip2 contains code that looks a bit like this:

int foo(char *P, char *Q, int x, int y) {
  if (P[0] != Q[0])
     return P[0] < Q[0];
  if (P[1] != Q[1])
     return P[1] < Q[1];
  if (P[2] != Q[2])
     return P[2] < Q[2];
   return P[3] < Q[3];
}

In the real code, we get a lot more wrong than this.  However, even in this
code we generate:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movb	(%rsi), %al
	movb	(%rdi), %cl
	cmpb	%al, %cl
	je	LBB0_2
LBB0_1:                                 ## %if.then
	cmpb	%al, %cl
	jmp	LBB0_5
LBB0_2:                                 ## %if.end
	movb	1(%rsi), %al
	movb	1(%rdi), %cl
	cmpb	%al, %cl
	jne	LBB0_1
## BB#3:                                ## %if.end38
	movb	2(%rsi), %al
	movb	2(%rdi), %cl
	cmpb	%al, %cl
	jne	LBB0_1
## BB#4:                                ## %if.end60
	movb	3(%rdi), %al
	cmpb	3(%rsi), %al
LBB0_5:                                 ## %if.end60
	setl	%al
	movzbl	%al, %eax
	ret

Note that we generate jumps to LBB0_1 which does a redundant compare.  The
redundant compare also forces the register values to be live, which prevents
folding one of the loads into the compare.  In contrast, GCC 4.2 produces:

_foo:
	movzbl	(%rsi), %eax
	cmpb	%al, (%rdi)
	jne	L10
L12:
	movzbl	1(%rsi), %eax
	cmpb	%al, 1(%rdi)
	jne	L10
	movzbl	2(%rsi), %eax
	cmpb	%al, 2(%rdi)
	jne	L10
	movzbl	3(%rdi), %eax
	cmpb	3(%rsi), %al
L10:
	setl	%al
	movzbl	%al, %eax
	ret

which is "perfect".

//===---------------------------------------------------------------------===//

For the branch in the following code:
int a();
int b(int x, int y) {
  if (x & (1<<(y&7)))
    return a();
  return y;
}

We currently generate:
	movb	%sil, %al
	andb	$7, %al
	movzbl	%al, %eax
	btl	%eax, %edi
	jae	.LBB0_2

movl+andl would be shorter than the movb+andb+movzbl sequence.

//===---------------------------------------------------------------------===//

For the following:
struct u1 {
    float x, y;
};
float foo(struct u1 u) {
    return u.x + u.y;
}

We currently generate:
	movdqa	%xmm0, %xmm1
	pshufd	$1, %xmm0, %xmm0        # xmm0 = xmm0[1,0,0,0]
	addss	%xmm1, %xmm0
	ret

We could save an instruction here by commuting the addss.

//===---------------------------------------------------------------------===//

This (from PR9661):

float clamp_float(float a) {
        if (a > 1.0f)
                return 1.0f;
        else if (a < 0.0f)
                return 0.0f;
        else
                return a;
}

Could compile to:

clamp_float:                            # @clamp_float
        movss   .LCPI0_0(%rip), %xmm1
        minss   %xmm1, %xmm0
        pxor    %xmm1, %xmm1
        maxss   %xmm1, %xmm0
        ret

with -ffast-math.

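The -ffast-math requirement comes from NaN handling: minss and maxss return
their second (source) operand when the comparison is unordered, so the min/max
sequence maps a NaN input to 1.0, whereas the original C returns the NaN
unchanged. A C sketch of the min/max form, using hypothetical helpers min_ss
and max_ss that model the instruction semantics with ternaries:

```c
/* Models "minss src, dst": keep dst only if dst < src is true, so an
   unordered comparison (NaN) yields src. */
static float min_ss(float dst, float src) { return dst < src ? dst : src; }

/* Models "maxss src, dst": keep dst only if dst > src is true. */
static float max_ss(float dst, float src) { return dst > src ? dst : src; }

/* Clamp to [0, 1] the way the minss/maxss sequence would. */
float clamp_float_minmax(float a) {
    return max_ss(min_ss(a, 1.0f), 0.0f);
}
```

For ordinary finite inputs this agrees with clamp_float; it differs for NaN
and for the sign of a negative zero input.
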
//===---------------------------------------------------------------------===//

This function (from PR9803):

int clamp2(int a) {
        if (a > 5)
                a = 5;
        if (a < 0)
                return 0;
        return a;
}

Compiles to:

_clamp2:                                ## @clamp2
        pushq   %rbp
        movq    %rsp, %rbp
        cmpl    $5, %edi
        movl    $5, %ecx
        cmovlel %edi, %ecx
        testl   %ecx, %ecx
        movl    $0, %eax
        cmovnsl %ecx, %eax
        popq    %rbp
        ret

The move of 0 could be scheduled above the test so that it can become an
xor reg,reg.  (xor clobbers the flags, so it cannot sit between the test and
the cmov.)

//===---------------------------------------------------------------------===//

GCC PR48986.  We currently compile this:

void bar(void);
void yyy(int* p) {
    if (__sync_fetch_and_add(p, -1) == 1)
      bar();
}

into:
	movl	$-1, %eax
	lock
	xaddl	%eax, (%rdi)
	cmpl	$1, %eax
	je	LBB0_2

Instead we could generate:

	lock
	decl	(%rdi)
	je	LBB0_2

The trick is to match "fetch_and_add(X, -C) == C": the old value equals C
exactly when the new value is zero, which is what ZF reports after the locked
operation.

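At the source level the rewrite amounts to comparing the new value against zero
instead of the old value against the constant. A sketch using the GCC/Clang
__sync builtins, with made-up function names:

```c
/* Original form: fetch the old value, then compare it with 1
   (needs the old value in a register, hence xadd). */
int dec_hit_zero_old(int *p) { return __sync_fetch_and_add(p, -1) == 1; }

/* Equivalent form: decrement and test the new value against 0, which
   only needs the zero flag produced by a locked dec. */
int dec_hit_zero_new(int *p) { return __sync_sub_and_fetch(p, 1) == 0; }
```
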
//===---------------------------------------------------------------------===//

unsigned t(unsigned a, unsigned b) {
  return a <= b ? 5 : -5;
}

We generate:
	movl	$5, %ecx
	cmpl	%esi, %edi
	movl	$-5, %eax
	cmovbel	%ecx, %eax

GCC:
	cmpl	%edi, %esi
	sbbl	%eax, %eax
	andl	$-10, %eax
	addl	$5, %eax

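The GCC sequence turns the borrow from the compare into an all-ones or all-zero
mask and uses it to pick between the two constants arithmetically. A C sketch
of that idea (t_branchless is a hypothetical name; t_ref restates the original
function for comparison):

```c
/* Branch-free form following the GCC sequence. */
unsigned t_branchless(unsigned a, unsigned b) {
    unsigned mask = 0u - (unsigned)(b < a); /* cmpl + sbbl: ~0 when b < a */
    return (mask & (unsigned)-10) + 5u;     /* andl $-10, then addl $5 */
}

/* Reference: the original select between 5 and -5. */
unsigned t_ref(unsigned a, unsigned b) { return a <= b ? 5 : -5; }
```
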
//===---------------------------------------------------------------------===//