//===---------------------------------------------------------------------===//
// Random ideas for the X86 backend.
//===---------------------------------------------------------------------===//

Improvements to the multiply -> shift/add algorithm:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg01590.html

//===---------------------------------------------------------------------===//

Improve code like this (occurs fairly frequently, e.g. in LLVM):
long long foo(int x) { return 1LL << x; }

http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01109.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01128.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01136.html

Another useful one would be ~0ULL >> X and ~0ULL << X.

One better solution for 1LL << x is:
        xorl    %eax, %eax
        xorl    %edx, %edx
        testb   $32, %cl
        sete    %al
        setne   %dl
        sall    %cl, %eax
        sall    %cl, %edx

But that requires good 8-bit subreg support.

Also, this might be better.  It's an extra shift, but it's one instruction
shorter, and doesn't stress 8-bit subreg support.
(From http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01148.html,
but without the unnecessary and.)
        movl %ecx, %eax
        shrl $5, %eax
        movl %eax, %edx
        xorl $1, %edx
        sall %cl, %eax
        sall %cl, %edx

64-bit shifts (in general) expand to really bad code.  Instead of using
cmovs, we should expand to a conditional branch like GCC produces.
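
The branch-free idea behind both sequences can be sketched in C.  This is only
an illustration, not code from the backend: the helper name shl_one and the
u64_halves struct are made up here.  It relies on the fact that a 32-bit x86
shift uses only the low five bits of %cl; C leaves shifts by 32 or more
undefined, so the sketch masks the count explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the two 32-bit halves of a 64-bit value. */
typedef struct { uint32_t lo, hi; } u64_halves;

/* Hypothetical helper: computes 1LL << x for 0 <= x <= 63 without a branch. */
static u64_halves shl_one(unsigned x) {
    assert(x < 64);
    uint32_t in_high = (x >> 5) & 1u;  /* 1 when bit 5 of x is set (x >= 32) */
    uint32_t in_low  = in_high ^ 1u;   /* 1 when the set bit lands in the low half */
    unsigned count   = x & 31u;        /* count masking done by 32-bit x86 shifts */
    u64_halves r;
    r.lo = in_low << count;
    r.hi = in_high << count;
    return r;
}
```

Exactly one of in_low and in_high is 1, so one half receives the shifted bit
and the other stays zero.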

//===---------------------------------------------------------------------===//

Some isel ideas:

1. Dynamic programming based approach when compile time is not an
   issue.
2. Code duplication (addressing mode) during isel.
3. Other ideas from "Register-Sensitive Selection, Duplication, and
   Sequencing of Instructions".
4. Scheduling for reduced register pressure.  E.g. "Minimum Register
   Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs"
   and other related papers.
   http://citeseer.ist.psu.edu/govindarajan01minimum.html

//===---------------------------------------------------------------------===//

Should we promote i16 to i32 to avoid partial register update stalls?

//===---------------------------------------------------------------------===//

Leave any_extend as pseudo instruction and hint to register
allocator. Delay codegen until post register allocation.
Note. any_extend is now turned into an INSERT_SUBREG.
We still need to teach the coalescer how to deal with it though.

//===---------------------------------------------------------------------===//

It appears icc uses push for parameter passing. Need to investigate.

//===---------------------------------------------------------------------===//

The instruction selector sometimes misses folding a load into a compare.  The
pattern is written as (cmp reg, (load p)).  Because the compare isn't
commutative, it is not matched with the load on both sides.  The dag combiner
should be made smart enough to canonicalize the load into the RHS of a compare
when it can invert the result of the compare for free.
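
Moving the load from the LHS to the RHS means swapping the compare operands,
which is only valid if the predicate is swapped too.  A minimal sketch of that
bookkeeping follows.  The cond_code enum and both helpers are hypothetical
names for illustration, not the dag combiner's actual types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical signed condition codes, for illustration only. */
typedef enum { CC_EQ, CC_NE, CC_LT, CC_LE, CC_GT, CC_GE } cond_code;

/* Returns the predicate P' such that (a P b) == (b P' a). */
static cond_code swap_operands(cond_code cc) {
    switch (cc) {
    case CC_LT: return CC_GT;
    case CC_LE: return CC_GE;
    case CC_GT: return CC_LT;
    case CC_GE: return CC_LE;
    default:    return cc;  /* EQ and NE are symmetric */
    }
}

/* Evaluates a predicate on two ints, so the identity above can be checked. */
static bool eval_cc(cond_code cc, int a, int b) {
    switch (cc) {
    case CC_EQ: return a == b;
    case CC_NE: return a != b;
    case CC_LT: return a <  b;
    case CC_LE: return a <= b;
    case CC_GT: return a >  b;
    case CC_GE: return a >= b;
    }
    assert(0 && "unknown condition code");
    return false;
}
```

With this, (cmp (load p), reg) under predicate P can be rewritten as
(cmp reg, (load p)) under swap_operands(P), which matches the existing pattern.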

//===---------------------------------------------------------------------===//

In many cases, LLVM generates code like this:

_test:
        movl 8(%esp), %eax
        cmpl %eax, 4(%esp)
        setl %al
        movzbl %al, %eax
        ret

On some processors (which ones?), it is more efficient to do this:

_test:
        movl 8(%esp), %ebx
        xor  %eax, %eax
        cmpl %ebx, 4(%esp)
        setl %al
        ret

Doing this correctly is tricky though, as the xor clobbers the flags.

//===---------------------------------------------------------------------===//

We should generate bts/btr/etc instructions on targets where they are cheap or
when codesize is important.
e.g., for:

void setbit(int *target, int bit) {
    *target |= (1 << bit);
}
void clearbit(int *target, int bit) {
    *target &= ~(1 << bit);
}

//===---------------------------------------------------------------------===//

Instead of the following for memset char*, 1, 10:

        movl $16843009, 4(%edx)
        movl $16843009, (%edx)
        movw $257, 8(%edx)

It might be better to generate

        movl $16843009, %eax
        movl %eax, 4(%edx)
        movl %eax, (%edx)
        movw %ax, 8(%edx)

when we can spare a register. It reduces code size.

//===---------------------------------------------------------------------===//

Evaluate what the best way to codegen sdiv X, (2^C) is.
For X/8, we currently
get this:

define i32 @test1(i32 %X) {
    %Y = sdiv i32 %X, 8
    ret i32 %Y
}

_test1:
        movl 4(%esp), %eax
        movl %eax, %ecx
        sarl $31, %ecx
        shrl $29, %ecx
        addl %ecx, %eax
        sarl $3, %eax
        ret

GCC knows several different ways to codegen it, one of which is this:

_test1:
        movl    4(%esp), %eax
        cmpl    $-1, %eax
        leal    7(%eax), %ecx
        cmovle  %ecx, %eax
        sarl    $3, %eax
        ret

which is probably slower, but it's interesting at least :)

//===---------------------------------------------------------------------===//

We are currently lowering large (1MB+) memmove/memcpy to rep/stosl and rep/movsl.
We should leave these as libcalls for everything over a much lower threshold,
since libc is hand tuned for medium and large mem ops (avoiding RFO for large
stores, TLB preheating, etc).

//===---------------------------------------------------------------------===//

Optimize this into something reasonable:
 x * copysign(1.0, y) * copysign(1.0, z)

//===---------------------------------------------------------------------===//

Optimize copysign(x, *y) to use an integer load from y.

//===---------------------------------------------------------------------===//

The following tests perform worse with LSR:

lambda, siod, optimizer-eval, ackermann, hash2, nestedloop, strcat, and Treesor.

//===---------------------------------------------------------------------===//

Adding to the list of cmp / test poor codegen issues:

int test(__m128 *A, __m128 *B) {
  if (_mm_comige_ss(*A, *B))
    return 3;
  else
    return 4;
}

_test:
        movl 8(%esp), %eax
        movaps (%eax), %xmm0
        movl 4(%esp), %eax
        movaps (%eax), %xmm1
        comiss %xmm0, %xmm1
        setae %al
        movzbl %al, %ecx
        movl $3, %eax
        movl $4, %edx
        cmpl $0, %ecx
        cmove %edx, %eax
        ret

Note the setae, movzbl, cmpl, cmove can be replaced with a single cmovae. There
are a number of issues. 1) We are introducing a setcc between the result of the
intrinsic call and select.
2) The intrinsic is expected to produce an i32 value
so an any extend (which becomes a zero extend) is added.

We probably need some kind of target DAG combine hook to fix this.

//===---------------------------------------------------------------------===//

We generate significantly worse code for this than GCC:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21150
http://gcc.gnu.org/bugzilla/attachment.cgi?id=8701

There is also one case we do worse on PPC.

//===---------------------------------------------------------------------===//

For this:

int test(int a)
{
  return a * 3;
}

We currently emit
        imull $3, 4(%esp), %eax

Perhaps this is what we really should generate?  Is imull three or four
cycles?
Note: ICC generates this:
        movl 4(%esp), %eax
        leal (%eax,%eax,2), %eax

The current instruction priority is based on pattern complexity. The former is
more "complex" because it folds a load so the latter will not be emitted.

Perhaps we should use AddedComplexity to give LEA32r a higher priority? We
should always try to match LEA first since the LEA matching code does some
estimate to determine whether the match is profitable.

However, if we care more about code size, then imull is better. It's two bytes
shorter than movl + leal.

On a Pentium M, both variants have the same characteristics with regard
to throughput; however, the multiplication has a latency of four cycles, as
opposed to two cycles for the movl+lea variant.

//===---------------------------------------------------------------------===//

It appears gcc places string data with linkonce linkage in
.section __TEXT,__const_coal,coalesced instead of
.section __DATA,__const_coal,coalesced.
Take a look at darwin.h, there are other Darwin assembler directives that we
do not make use of.

//===---------------------------------------------------------------------===//

define i32 @foo(i32* %a, i32 %t) {
entry:
        br label %cond_true

cond_true:              ; preds = %cond_true, %entry
        %x.0.0 = phi i32 [ 0, %entry ], [ %tmp9, %cond_true ]           ; <i32> [#uses=3]
        %t_addr.0.0 = phi i32 [ %t, %entry ], [ %tmp7, %cond_true ]     ; <i32> [#uses=1]
        %tmp2 = getelementptr i32* %a, i32 %x.0.0               ; <i32*> [#uses=1]
        %tmp3 = load i32* %tmp2         ; <i32> [#uses=1]
        %tmp5 = add i32 %t_addr.0.0, %x.0.0             ; <i32> [#uses=1]
        %tmp7 = add i32 %tmp5, %tmp3            ; <i32> [#uses=2]
        %tmp9 = add i32 %x.0.0, 1               ; <i32> [#uses=2]
        %tmp = icmp sgt i32 %tmp9, 39           ; <i1> [#uses=1]
        br i1 %tmp, label %bb12, label %cond_true

bb12:           ; preds = %cond_true
        ret i32 %tmp7
}
is pessimized by -loop-reduce and -indvars

//===---------------------------------------------------------------------===//

u32 to float conversion improvement:

float uint32_2_float( unsigned u ) {
  float fl = (int) (u & 0xffff);
  float fh = (int) (u >> 16);
  fh *= 0x1.0p16f;
  return fh + fl;
}

00000000        subl    $0x04,%esp
00000003        movl    0x08(%esp,1),%eax
00000007        movl    %eax,%ecx
00000009        shrl    $0x10,%ecx
0000000c        cvtsi2ss        %ecx,%xmm0
00000010        andl    $0x0000ffff,%eax
00000015        cvtsi2ss        %eax,%xmm1
00000019        mulss   0x00000078,%xmm0
00000021        addss   %xmm1,%xmm0
00000025        movss   %xmm0,(%esp,1)
0000002a        flds    (%esp,1)
0000002d        addl    $0x04,%esp
00000030        ret

//===---------------------------------------------------------------------===//

When using fastcc abi, align stack slot of argument of type double on 8 byte
boundary to improve performance.

//===---------------------------------------------------------------------===//

GCC's ix86_expand_int_movcc function (in i386.c) has a ton of interesting
simplifications for integer "x cmp y ? a : b".

//===---------------------------------------------------------------------===//

Consider the expansion of:

define i32 @test3(i32 %X) {
        %tmp1 = urem i32 %X, 255
        ret i32 %tmp1
}

Currently it compiles to:

...
        movl $2155905153, %ecx
        movl 8(%esp), %esi
        movl %esi, %eax
        mull %ecx
...

This could be "reassociated" into:

        movl $2155905153, %eax
        movl 8(%esp), %ecx
        mull %ecx

to avoid the copy.  In fact, the existing two-address stuff would do this
except that mul isn't a commutative 2-addr instruction.  I guess this has
to be done at isel time based on the #uses to mul?

//===---------------------------------------------------------------------===//

Make sure the instruction which starts a loop does not cross a cacheline
boundary. This requires knowing the exact length of each machine instruction.
That is somewhat complicated, but doable. Example 256.bzip2:

In the new trace, the hot loop has an instruction which crosses a cacheline
boundary.  In addition to potential cache misses, this can't help decoding as I
imagine there has to be some kind of complicated decoder reset and realignment
to grab the bytes from the next cacheline.
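
Once instruction lengths are known, the check itself is cheap.  The sketch
below is illustrative only: crosses_cache_line and CACHE_LINE_BYTES are names
invented here, and the 64-byte line size is an assumption taken from the
256.bzip2 trace in this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes matches the trace, other CPUs may differ. */
enum { CACHE_LINE_BYTES = 64 };

/* Hypothetical helper: true when an instruction of `len` bytes starting at
   `addr` has its first and last byte in different cache lines. */
static bool crosses_cache_line(uint32_t addr, uint32_t len) {
    assert(len > 0);
    uint32_t first_line = addr / CACHE_LINE_BYTES;
    uint32_t last_line  = (addr + len - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

In the trace, the movb at 0x3cfc is followed by an instruction at 0x3d03, so it
is 7 bytes long and straddles the line boundary at 0x3d00;
crosses_cache_line(0x3cfc, 7) reports that, while the 7-byte instruction at
0x3d03 fits within one line.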

532  532 0x3cfc movb     (1809(%esp, %esi), %bl   <<<--- spans 2 64 byte lines
942  942 0x3d03 movl     %dh, (1809(%esp, %esi)
937  937 0x3d0a incl     %esi
3    3   0x3d0b cmpb     %bl, %dl
27   27  0x3d0d jnz      0x000062db <main+11707>

//===---------------------------------------------------------------------===//

In c99 mode, the preprocessor doesn't like assembly comments like #TRUNCATE.

//===---------------------------------------------------------------------===//

This could be a single 16-bit load.

int f(char *p) {
    if ((p[0] == 1) & (p[1] == 2)) return 1;
    return 0;
}

//===---------------------------------------------------------------------===//

We should inline lrintf and probably other libc functions.

//===---------------------------------------------------------------------===//

This code:

void test(int X) {
  if (X) abort();
}

is currently compiled to:

_test:
        subl $12, %esp
        cmpl $0, 16(%esp)
        jne LBB1_1
        addl $12, %esp
        ret
LBB1_1:
        call L_abort$stub

It would be better to produce:

_test:
        subl $12, %esp
        cmpl $0, 16(%esp)
        jne L_abort$stub
        addl $12, %esp
        ret

This can be applied to any no-return function call that takes no arguments etc.
Alternatively, the stack save/restore logic could be shrink-wrapped, producing
something like this:

_test:
        cmpl $0, 4(%esp)
        jne LBB1_1
        ret
LBB1_1:
        subl $12, %esp
        call L_abort$stub

Both are useful in different situations.  Finally, it could be shrink-wrapped
and tail called, like this:

_test:
        cmpl $0, 4(%esp)
        jne LBB1_1
        ret
LBB1_1:
        pop %eax   # realign stack.
        call L_abort$stub

Though this probably isn't worth it.

//===---------------------------------------------------------------------===//

Sometimes it is better to codegen subtractions from a constant (e.g. 7-x) with
a neg instead of a sub instruction.
Consider:

int test(char X) { return 7-X; }

we currently produce:
_test:
        movl $7, %eax
        movsbl 4(%esp), %ecx
        subl %ecx, %eax
        ret

We would use one fewer register if codegen'd as:

        movsbl 4(%esp), %eax
        neg %eax
        add $7, %eax
        ret

Note that this isn't beneficial if the load can be folded into the sub.
In this case, we want a sub:

int test(int X) { return 7-X; }
_test:
        movl    $7, %eax
        subl    4(%esp), %eax
        ret

//===---------------------------------------------------------------------===//

Leaf functions that require one 4-byte spill slot have a prolog like this:

_foo:
        pushl   %esi
        subl    $4, %esp
...
and an epilog like this:
        addl    $4, %esp
        popl    %esi
        ret

It would be smaller, and potentially faster, to push eax on entry and to
pop into a dummy register instead of using addl/subl of esp.  Just don't pop
into any return registers :)

//===---------------------------------------------------------------------===//

The X86 backend should fold (branch (or (setcc, setcc))) into multiple
branches.
We generate really poor code for:

double testf(double a) {
       return a == 0.0 ? 0.0 : (a > 0.0 ? 1.0 : -1.0);
}

For example, the entry BB is:

_testf:
        subl    $20, %esp
        pxor    %xmm0, %xmm0
        movsd   24(%esp), %xmm1
        ucomisd %xmm0, %xmm1
        setnp   %al
        sete    %cl
        testb   %cl, %al
        jne     LBB1_5  # UnifiedReturnBlock
LBB1_1: # cond_true


it would be better to replace the last four instructions with:

        jp      LBB1_1
        je      LBB1_5
LBB1_1:

We also codegen the inner ?: into a diamond:

        cvtss2sd        LCPI1_0(%rip), %xmm2
        cvtss2sd        LCPI1_1(%rip), %xmm3
        ucomisd %xmm1, %xmm0
        ja      LBB1_3  # cond_true
LBB1_2: # cond_true
        movapd  %xmm3, %xmm2
LBB1_3: # cond_true
        movapd  %xmm2, %xmm0
        ret

We should sink the load into xmm3 into the LBB1_2 block.  This should
be pretty easy, and will nuke all the copies.

//===---------------------------------------------------------------------===//

This:
        #include <utility>
        inline std::pair<unsigned, bool> full_add(unsigned a, unsigned b)
        { return std::make_pair(a + b, a + b < a); }
        bool no_overflow(unsigned a, unsigned b)
        { return !full_add(a, b).second; }

Should compile to:
        addl    %esi, %edi
        setae   %al
        movzbl  %al, %eax
        ret

on x86-64, instead of the rather stupid-looking:
        addl    %esi, %edi
        setb    %al
        xorb    $1, %al
        movzbl  %al, %eax
        ret

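A minimal C sketch of why the xorb is redundant, assuming a 32-bit unsigned.
The names carry_out, no_overflow_negated and no_overflow_direct are
illustrative and do not come from the testcase above.  The negation of
"a + b < a" is "a + b >= a", which is the carry-clear condition that a single
setae reads after the addl:

```c
#include <stdbool.h>

/* Carry out of an unsigned add: the wrapped sum is smaller than an operand
   exactly when the add carried (the condition setb reads after addl). */
static bool carry_out(unsigned a, unsigned b) {
    return a + b < a;
}

/* Negated form, as written in the testcase: setb followed by xorb $1. */
static bool no_overflow_negated(unsigned a, unsigned b) {
    return !carry_out(a, b);
}

/* Direct form: the inverted condition, which needs only setae. */
static bool no_overflow_direct(unsigned a, unsigned b) {
    return a + b >= a;
}
```

Both forms return the same value for every input, so folding the negation
into the condition code is always safe here.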

//===---------------------------------------------------------------------===//

The following code:

bb114.preheader:                ; preds = %cond_next94
        %tmp231232 = sext i16 %tmp62 to i32             ; <i32> [#uses=1]
        %tmp233 = sub i32 32, %tmp231232                ; <i32> [#uses=1]
        %tmp245246 = sext i16 %tmp65 to i32             ; <i32> [#uses=1]
        %tmp252253 = sext i16 %tmp68 to i32             ; <i32> [#uses=1]
        %tmp254 = sub i32 32, %tmp252253                ; <i32> [#uses=1]
        %tmp553554 = bitcast i16* %tmp37 to i8*         ; <i8*> [#uses=2]
        %tmp583584 = sext i16 %tmp98 to i32             ; <i32> [#uses=1]
        %tmp585 = sub i32 32, %tmp583584                ; <i32> [#uses=1]
        %tmp614615 = sext i16 %tmp101 to i32            ; <i32> [#uses=1]
        %tmp621622 = sext i16 %tmp104 to i32            ; <i32> [#uses=1]
        %tmp623 = sub i32 32, %tmp621622                ; <i32> [#uses=1]
        br label %bb114

produces:

LBB3_5: # bb114.preheader
        movswl  -68(%ebp), %eax
        movl    $32, %ecx
        movl    %ecx, -80(%ebp)
        subl    %eax, -80(%ebp)
        movswl  -52(%ebp), %eax
        movl    %ecx, -84(%ebp)
        subl    %eax, -84(%ebp)
        movswl  -70(%ebp), %eax
        movl    %ecx, -88(%ebp)
        subl    %eax, -88(%ebp)
        movswl  -50(%ebp), %eax
        subl    %eax, %ecx
        movl    %ecx, -76(%ebp)
        movswl  -42(%ebp), %eax
        movl    %eax, -92(%ebp)
        movswl  -66(%ebp), %eax
        movl    %eax, -96(%ebp)
        movw    $0, -98(%ebp)

This appears to be bad because the RA is not folding the store to the stack
slot into the movl.  The above instructions could be:
        movl    $32, -80(%ebp)
...
        movl    $32, -84(%ebp)
...
This seems like a cross between remat and spill folding.

This has redundant subtractions of %eax from a stack slot. However, %ecx doesn't
change, so we could simply subtract %eax from %ecx first and then use %ecx (or
vice-versa).

//===---------------------------------------------------------------------===//

This code:

        %tmp659 = icmp slt i16 %tmp654, 0               ; <i1> [#uses=1]
        br i1 %tmp659, label %cond_true662, label %cond_next715

produces this:

        testw   %cx, %cx
        movswl  %cx, %esi
        jns     LBB4_109        # cond_next715

Shark tells us that using %cx in the testw instruction is sub-optimal. It
suggests using the 32-bit register (which is what ICC uses).
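A small C sketch of one way the 32-bit form stays correct, under the
assumption that the test is done on the sign-extended copy that the movswl
already produces.  The text above does not say which 32-bit sequence Shark or
ICC actually prefer, and the function names below are illustrative only.  The
sign of a 16-bit value and the sign of its sign extension are always the same:

```c
#include <stdint.h>

/* 16-bit sign test, corresponding to testw %cx, %cx followed by js/jns. */
static int is_negative_16(int16_t v) {
    return v < 0;
}

/* The same predicate on the sign-extended 32-bit value, corresponding to
   testing the result of the movswl instead of the 16-bit subregister. */
static int is_negative_32(int16_t v) {
    int32_t wide = v;   /* sign extension, as movswl does */
    return wide < 0;
}
```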

//===---------------------------------------------------------------------===//

We compile this:

void compare (long long foo) {
  if (foo < 4294967297LL)
    abort();
}

to:

compare:
        subl    $4, %esp
        cmpl    $0, 8(%esp)
        setne   %al
        movzbw  %al, %ax
        cmpl    $1, 12(%esp)
        setg    %cl
        movzbw  %cl, %cx
        cmove   %ax, %cx
        testb   $1, %cl
        jne     .LBB1_2 # UnifiedReturnBlock
.LBB1_1:        # ifthen
        call    abort
.LBB1_2:        # UnifiedReturnBlock
        addl    $4, %esp
        ret

(also really horrible code on ppc).  This is due to the expand code for 64-bit
compares.
GCC produces multiple branches, which is much nicer:

compare:
        subl    $12, %esp
        movl    20(%esp), %edx
        movl    16(%esp), %eax
        decl    %edx
        jle     .L7
.L5:
        addl    $12, %esp
        ret
        .p2align 4,,7
.L7:
        jl      .L4
        cmpl    $0, %eax
        .p2align 4,,8
        ja      .L5
.L4:
        .p2align 4,,9
        call    abort

//===---------------------------------------------------------------------===//

Tail call optimization improvements: Tail call optimization currently
pushes all arguments on the top of the stack (their normal place for
non-tail call optimized calls) that source from the caller's arguments
or that source from a virtual register (also possibly sourcing from
the caller's arguments).
This is done to prevent overwriting of parameters (see example
below) that might be used later.

example:

int callee(int32, int64);
int caller(int32 arg1, int32 arg2) {
  int64 local = arg2 * 2;
  return callee(arg2, (int64)local);
}

[arg1]          [!arg2 no longer valid since we moved local onto it]
[arg2]      ->  [(int64)
[RETADDR]        local  ]

Moving arg1 onto the stack slot of callee function would overwrite
arg2 of the caller.

Possible optimizations:


 - Analyse the actual parameters of the callee to see which would
   overwrite a caller parameter which is used by the callee and only
   push them onto the top of the stack.

   int callee (int32 arg1, int32 arg2);
   int caller (int32 arg1, int32 arg2) {
       return callee(arg1,arg2);
   }

   Here we don't need to write any variables to the top of the stack
   since they don't overwrite each other.

   int callee (int32 arg1, int32 arg2);
   int caller (int32 arg1, int32 arg2) {
       return callee(arg2,arg1);
   }

   Here we need to push the arguments because they overwrite each
   other.
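
The analysis in the bullet above could be sketched roughly as follows.  This
is a conservative model under stated assumptions, not the backend's actual
logic: every argument slot has the same size, outgoing argument i is written
to incoming slot i, values that do not come from an incoming slot are already
in a register, and no attempt is made to reorder the writes.  The names
needs_staging and count_staged and the src encoding are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* src[i] is the index of the caller's incoming argument slot that outgoing
   argument i is read from, or -1 if it does not come from an incoming slot.
   Hypothetical model: equal-sized slots, outgoing argument i lands in
   slot i. */
static bool needs_staging(const int *src, size_t n, size_t i) {
    int s = src[i];
    if (s < 0 || (size_t)s == i)
        return false;           /* not from a slot, or already in place */
    if ((size_t)s >= n)
        return false;           /* the source slot is never written */
    return src[s] != s;         /* slot s is overwritten by outgoing arg s */
}

/* Counts the outgoing arguments that must be pushed to the top of the
   stack before being moved into their final slots. */
static size_t count_staged(const int *src, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (needs_staging(src, n, i))
            ++count;
    return count;
}
```

With this encoding, callee(arg1,arg2) is src = {0, 1} and needs no staging,
while callee(arg2,arg1) is src = {1, 0} and stages both arguments, matching
the two examples above.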

//===---------------------------------------------------------------------===//

main ()
{
  int i = 0;
  unsigned long int z = 0;

  do {
    z -= 0x00004000;
    i++;
    if (i > 0x00040000)
      abort ();
  } while (z > 0);
  exit (0);
}

gcc compiles this to:

_main:
        subl    $28, %esp
        xorl    %eax, %eax
        jmp     L2
L3:
        cmpl    $262144, %eax
        je      L10
L2:
        addl    $1, %eax
        cmpl    $262145, %eax
        jne     L3
        call    L_abort$stub
L10:
        movl    $0, (%esp)
        call    L_exit$stub

llvm:

_main:
        subl    $12, %esp
        movl    $1, %eax
        movl    $16384, %ecx
LBB1_1: # bb
        cmpl    $262145, %eax
        jge     LBB1_4  # cond_true
LBB1_2: # cond_next
        incl    %eax
        addl    $4294950912, %ecx
        cmpl    $16384, %ecx
        jne     LBB1_1  # bb
LBB1_3: # bb11
        xorl    %eax, %eax
        addl    $12, %esp
        ret
LBB1_4: # cond_true
        call    L_abort$stub

1. LSR should rewrite the first cmp with induction variable %ecx.
2. DAG combiner should fold
        leal    1(%eax), %edx
        cmpl    $262145, %edx
   =>
        cmpl    $262144, %eax

//===---------------------------------------------------------------------===//

define i64 @test(double %X) {
        %Y = fptosi double %X to i64
        ret i64 %Y
}

compiles to:

_test:
        subl    $20, %esp
        movsd   24(%esp), %xmm0
        movsd   %xmm0, 8(%esp)
        fldl    8(%esp)
        fisttpll        (%esp)
        movl    4(%esp), %edx
        movl    (%esp), %eax
        addl    $20, %esp
        #FP_REG_KILL
        ret

This should just fldl directly from the input stack slot.

//===---------------------------------------------------------------------===//

This code:
int foo (int x) { return (x & 65535) | 255; }

Should compile into:

_foo:
        movzwl  4(%esp), %eax
        orl     $255, %eax
        ret

instead of:
_foo:
        movl    $65280, %eax
        andl    4(%esp), %eax
        orl     $255, %eax
        ret

//===---------------------------------------------------------------------===//

We're codegen'ing multiply of long longs inefficiently:

unsigned long long LLM(unsigned long long arg1, unsigned long long arg2) {
  return arg1 * arg2;
}

We compile to (-fomit-frame-pointer):

_LLM:
        pushl   %esi
        movl    8(%esp), %ecx
        movl    16(%esp), %esi
        movl    %esi, %eax
        mull    %ecx
        imull   12(%esp), %esi
        addl    %edx, %esi
        imull   20(%esp), %ecx
        movl    %esi, %edx
        addl    %ecx, %edx
        popl    %esi
        ret

This looks like a scheduling deficiency and lack of remat of the load from
the argument area.  ICC apparently produces:

        movl    8(%esp), %ecx
        imull   12(%esp), %ecx
        movl    16(%esp), %eax
        imull   4(%esp), %eax
        addl    %eax, %ecx
        movl    4(%esp), %eax
        mull    12(%esp)
        addl    %ecx, %edx
        ret

Note that it remat'd loads from 4(esp) and 12(esp).
See this GCC PR:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236

//===---------------------------------------------------------------------===//

We can fold a store into "zeroing a reg".  Instead of:

xorl    %eax, %eax
movl    %eax, 124(%esp)

we should get:

movl    $0, 124(%esp)

if the flags of the xor are dead.

Likewise, we isel "x<<1" into "add reg,reg".
If reg is spilled, this should
be folded into: shl [mem], 1

//===---------------------------------------------------------------------===//

In SSE mode, we turn abs and neg into a load from the constant pool plus a xor
or and instruction, for example:

        xorpd   LCPI1_0, %xmm2

However, if xmm2 gets spilled, we end up with really ugly code like this:

        movsd   (%esp), %xmm0
        xorpd   LCPI1_0, %xmm0
        movsd   %xmm0, (%esp)

Since we 'know' that this is a 'neg', we can actually "fold" the spill into
the neg/abs instruction, turning it into an *integer* operation, like this:

        xorl 2147483648, [mem+4]     ## 2147483648 = (1 << 31)

You could also use xorb, but xorl is less likely to lead to a partial register
stall.
Here is a contrived testcase:

double a, b, c;
void test(double *P) {
  double X = *P;
  a = X;
  bar();
  X = -X;
  b = X;
  bar();
  c = X;
}

//===---------------------------------------------------------------------===//

The generated code on x86 for checking for signed overflow on a multiply the
obvious way is much longer than it needs to be.

int x(int a, int b) {
  long long prod = (long long)a*b;
  return  prod > 0x7FFFFFFF || prod < (-0x7FFFFFFF-1);
}

See PR2053 for more details.
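
As a rough C sketch of why the check can be short: the 64-bit product fits in
an int exactly when its high 32 bits are the sign extension of its low 32
bits, which is the condition a 32-bit imul reports through the overflow flag.
The function names are illustrative, and this does not claim to reproduce
what PR2053 proposes.

```c
#include <stdint.h>

/* The check as written above: widen, multiply, compare against the range
   of a 32-bit int. */
static int overflows_range(int32_t a, int32_t b) {
    int64_t prod = (int64_t)a * b;
    return prod > INT32_MAX || prod < INT32_MIN;
}

/* Equivalent check: the product fits in 32 bits exactly when the high half
   is the sign extension of the low half. */
static int overflows_halves(int32_t a, int32_t b) {
    uint64_t bits = (uint64_t)((int64_t)a * b);
    uint32_t lo = (uint32_t)bits;
    uint32_t hi = (uint32_t)(bits >> 32);
    uint32_t sign_fill = (lo >> 31) ? 0xFFFFFFFFu : 0u;
    return hi != sign_fill;
}
```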

//===---------------------------------------------------------------------===//

We should investigate using cdq/cltd (effect: edx = sar eax, 31)
more aggressively; it should cost the same as a move+shift on any modern
processor, but it's a lot shorter. Downside is that it puts more
pressure on register allocation because it has fixed operands.

Example:
int abs(int x) {return x < 0 ? -x : x;}

gcc compiles this to the following when using march/mtune=pentium2/3/4/m/etc.:
abs:
        movl    4(%esp), %eax
        cltd
        xorl    %edx, %eax
        subl    %edx, %eax
        ret

//===---------------------------------------------------------------------===//

Take the following code (from
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16541):

extern unsigned char first_one[65536];
int FirstOnet(unsigned long long arg1)
{
946*9880d681SAndroid Build Coastguard Worker if (arg1 >> 48) 947*9880d681SAndroid Build Coastguard Worker return (first_one[arg1 >> 48]); 948*9880d681SAndroid Build Coastguard Worker return 0; 949*9880d681SAndroid Build Coastguard Worker} 950*9880d681SAndroid Build Coastguard Worker 951*9880d681SAndroid Build Coastguard Worker 952*9880d681SAndroid Build Coastguard WorkerThe following code is currently generated: 953*9880d681SAndroid Build Coastguard WorkerFirstOnet: 954*9880d681SAndroid Build Coastguard Worker movl 8(%esp), %eax 955*9880d681SAndroid Build Coastguard Worker cmpl $65536, %eax 956*9880d681SAndroid Build Coastguard Worker movl 4(%esp), %ecx 957*9880d681SAndroid Build Coastguard Worker jb .LBB1_2 # UnifiedReturnBlock 958*9880d681SAndroid Build Coastguard Worker.LBB1_1: # ifthen 959*9880d681SAndroid Build Coastguard Worker shrl $16, %eax 960*9880d681SAndroid Build Coastguard Worker movzbl first_one(%eax), %eax 961*9880d681SAndroid Build Coastguard Worker ret 962*9880d681SAndroid Build Coastguard Worker.LBB1_2: # UnifiedReturnBlock 963*9880d681SAndroid Build Coastguard Worker xorl %eax, %eax 964*9880d681SAndroid Build Coastguard Worker ret 965*9880d681SAndroid Build Coastguard Worker 966*9880d681SAndroid Build Coastguard WorkerWe could change the "movl 8(%esp), %eax" into "movzwl 10(%esp), %eax"; this 967*9880d681SAndroid Build Coastguard Workerlets us change the cmpl into a testl, which is shorter, and eliminate the shift. 
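
The narrowing idea can be sketched in C as follows. This is only an
illustration under a little-endian assumption (true on x86); the function
names top16_by_shift and top16_by_load are hypothetical. Because arg1 lives at
4(%esp)..11(%esp), its top 16 bits are the halfword at byte offset 6 of the
value, i.e. 10(%esp), so a zero-extending 16-bit load yields arg1 >> 48
directly.

```c
#include <stdint.h>
#include <string.h>

/* Reference form: load all 64 bits, then shift down by 48. */
uint32_t top16_by_shift(const uint64_t *p) {
  return (uint32_t)(*p >> 48);
}

/* Narrow-load form: read bytes 6..7 of the value and zero-extend.
   Little-endian only; this is the C analogue of "movzwl 10(%esp), %eax". */
uint32_t top16_by_load(const uint64_t *p) {
  uint16_t half;
  memcpy(&half, (const unsigned char *)p + 6, sizeof half);
  return half;
}
```

With the narrow load the zero test no longer needs the 65536 immediate, and
the result can index first_one without a further shift.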

//===---------------------------------------------------------------------===//

We compile this function:

define i32 @foo(i32 %a, i32 %b, i32 %c, i8 zeroext %d) nounwind {
entry:
        %tmp2 = icmp eq i8 %d, 0 ; <i1> [#uses=1]
        br i1 %tmp2, label %bb7, label %bb

bb: ; preds = %entry
        %tmp6 = add i32 %b, %a ; <i32> [#uses=1]
        ret i32 %tmp6

bb7: ; preds = %entry
        %tmp10 = sub i32 %a, %c ; <i32> [#uses=1]
        ret i32 %tmp10
}

to:

foo: # @foo
# BB#0: # %entry
        movl 4(%esp), %ecx
        cmpb $0, 16(%esp)
        je .LBB0_2
# BB#1: # %bb
        movl 8(%esp), %eax
        addl %ecx, %eax
        ret
.LBB0_2: # %bb7
        movl 12(%esp), %edx
        movl %ecx, %eax
        subl %edx, %eax
        ret

There's an obviously unnecessary movl in .LBB0_2, and we could eliminate a
couple more movls by putting 4(%esp) into %eax instead of %ecx.

//===---------------------------------------------------------------------===//

See rdar://4653682.

From flops:

LBB1_15: # bb310
        cvtss2sd LCPI1_0, %xmm1
        addsd %xmm1, %xmm0
        movsd 176(%esp), %xmm2
        mulsd %xmm0, %xmm2
        movapd %xmm2, %xmm3
        mulsd %xmm3, %xmm3
        movapd %xmm3, %xmm4
        mulsd LCPI1_23, %xmm4
        addsd LCPI1_24, %xmm4
        mulsd %xmm3, %xmm4
        addsd LCPI1_25, %xmm4
        mulsd %xmm3, %xmm4
        addsd LCPI1_26, %xmm4
        mulsd %xmm3, %xmm4
        addsd LCPI1_27, %xmm4
        mulsd %xmm3, %xmm4
        addsd LCPI1_28, %xmm4
        mulsd %xmm3, %xmm4
        addsd %xmm1, %xmm4
        mulsd %xmm2, %xmm4
        movsd 152(%esp), %xmm1
        addsd %xmm4, %xmm1
        movsd %xmm1, 152(%esp)
        incl %eax
        cmpl %eax, %esi
        jge LBB1_15 # bb310
LBB1_16: # bb358.loopexit
        movsd 152(%esp), %xmm0
        addsd %xmm0, %xmm0
        addsd LCPI1_22, %xmm0
        movsd %xmm0, 152(%esp)

Rather than spilling the result of the last addsd in the loop, we should
insert a copy to split the interval (one for the duration of the loop, one
extending to the fall through). The register pressure in the loop isn't high
enough to warrant the spill.

Also check why xmm7 is not used at all in the function.

//===---------------------------------------------------------------------===//

Take the following:

target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-S128"
target triple = "i386-apple-darwin8"
@in_exit.4870.b = internal global i1 false ; <i1*> [#uses=2]
define fastcc void @abort_gzip() noreturn nounwind {
entry:
        %tmp.b.i = load i1* @in_exit.4870.b ; <i1> [#uses=1]
        br i1 %tmp.b.i, label %bb.i, label %bb4.i
bb.i: ; preds = %entry
        tail call void @exit( i32 1 ) noreturn nounwind
        unreachable
bb4.i: ; preds = %entry
        store i1 true, i1* @in_exit.4870.b
        tail call void @exit( i32 1 ) noreturn nounwind
        unreachable
}
declare void @exit(i32) noreturn nounwind

This compiles into:
_abort_gzip: ## @abort_gzip
## BB#0: ## %entry
        subl $12, %esp
        movb _in_exit.4870.b, %al
        cmpb $1, %al
        jne LBB0_2

We somehow miss folding the movb into the cmpb.

//===---------------------------------------------------------------------===//

We compile:

int test(int x, int y) {
  return x-y-1;
}

into (-m64):

_test:
        decl %edi
        movl %edi, %eax
        subl %esi, %eax
        ret

it would be better to codegen as: x+~y (notl+addl)

//===---------------------------------------------------------------------===//

This code:

int foo(const char *str,...)
{
 __builtin_va_list a; int x;
 __builtin_va_start(a,str); x = __builtin_va_arg(a,int); __builtin_va_end(a);
 return x;
}

gets compiled into this on x86-64:
        subq $200, %rsp
        movaps %xmm7, 160(%rsp)
        movaps %xmm6, 144(%rsp)
        movaps %xmm5, 128(%rsp)
        movaps %xmm4, 112(%rsp)
        movaps %xmm3, 96(%rsp)
        movaps %xmm2, 80(%rsp)
        movaps %xmm1, 64(%rsp)
        movaps %xmm0, 48(%rsp)
        movq %r9, 40(%rsp)
        movq %r8, 32(%rsp)
        movq %rcx, 24(%rsp)
        movq %rdx, 16(%rsp)
        movq %rsi, 8(%rsp)
        leaq (%rsp), %rax
        movq %rax, 192(%rsp)
        leaq 208(%rsp), %rax
        movq %rax, 184(%rsp)
        movl $48, 180(%rsp)
        movl $8, 176(%rsp)
        movl 176(%rsp), %eax
        cmpl $47, %eax
        jbe .LBB1_3 # bb
.LBB1_1: # bb3
        movq 184(%rsp), %rcx
        leaq 8(%rcx), %rax
        movq %rax, 184(%rsp)
.LBB1_2: # bb4
        movl (%rcx), %eax
        addq $200, %rsp
        ret
.LBB1_3: # bb
        movl %eax, %ecx
        addl $8, %eax
        addq 192(%rsp), %rcx
        movl %eax, 176(%rsp)
        jmp .LBB1_2 # bb4

gcc 4.3 generates:
        subq $96, %rsp
.LCFI0:
        leaq 104(%rsp), %rax
        movq %rsi, -80(%rsp)
        movl $8, -120(%rsp)
        movq %rax, -112(%rsp)
        leaq -88(%rsp), %rax
        movq %rax, -104(%rsp)
        movl $8, %eax
        cmpl $48, %eax
        jb .L6
        movq -112(%rsp), %rdx
        movl (%rdx), %eax
        addq $96, %rsp
        ret
        .p2align 4,,10
        .p2align 3
.L6:
        mov %eax, %edx
        addq -104(%rsp), %rdx
        addl $8, %eax
        movl %eax, -120(%rsp)
        movl (%rdx), %eax
        addq $96, %rsp
        ret

and it gets compiled into this on x86:
        pushl %ebp
        movl %esp, %ebp
        subl $4, %esp
        leal 12(%ebp), %eax
        movl %eax, -4(%ebp)
        leal 16(%ebp), %eax
        movl %eax, -4(%ebp)
        movl 12(%ebp), %eax
        addl $4, %esp
        popl %ebp
        ret

gcc 4.3 generates:
        pushl %ebp
        movl %esp, %ebp
        movl 12(%ebp), %eax
        popl %ebp
        ret

//===---------------------------------------------------------------------===//

Teach tblgen not to check bitconvert source type in some cases. This allows us
to consolidate the following patterns in X86InstrMMX.td:

def : Pat<(v2i32 (bitconvert (i64 (vector_extract (v2i64 VR128:$src),
                                                  (iPTR 0))))),
          (v2i32 (MMX_MOVDQ2Qrr VR128:$src))>;
def : Pat<(v4i16 (bitconvert (i64 (vector_extract (v2i64 VR128:$src),
                                                  (iPTR 0))))),
          (v4i16 (MMX_MOVDQ2Qrr VR128:$src))>;
def : Pat<(v8i8 (bitconvert (i64 (vector_extract (v2i64 VR128:$src),
                                                  (iPTR 0))))),
          (v8i8 (MMX_MOVDQ2Qrr VR128:$src))>;

There are other cases in various td files.

//===---------------------------------------------------------------------===//

Take something like the following on x86-32:
unsigned a(unsigned long long x, unsigned y) {return x % y;}

We currently generate a libcall, but we really shouldn't: the expansion is
shorter and likely faster than the libcall.
The expected code is something like the following:

        movl 12(%ebp), %eax
        movl 16(%ebp), %ecx
        xorl %edx, %edx
        divl %ecx
        movl 8(%ebp), %eax
        divl %ecx
        movl %edx, %eax
        ret

A similar code sequence works for division.

//===---------------------------------------------------------------------===//

We currently compile this:

define i32 @func1(i32 %v1, i32 %v2) nounwind {
entry:
  %t = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %v1, i32 %v2)
  %sum = extractvalue {i32, i1} %t, 0
  %obit = extractvalue {i32, i1} %t, 1
  br i1 %obit, label %overflow, label %normal
normal:
  ret i32 %sum
overflow:
  call void @llvm.trap()
  unreachable
}
declare {i32, i1} @llvm.sadd.with.overflow.i32(i32, i32)
declare void @llvm.trap()

to:

_func1:
        movl 4(%esp), %eax
        addl 8(%esp), %eax
        jo LBB1_2 ## overflow
LBB1_1: ## normal
        ret
LBB1_2: ## overflow
        ud2

it would be nice to produce "into" someday.

//===---------------------------------------------------------------------===//

Test instructions can be eliminated by using EFLAGS values from arithmetic
instructions. This is currently not done for mul, and, or, xor, neg, shl,
sra, srl, shld, shrd, atomic ops, and others. It is also currently not done
for read-modify-write instructions. It is also currently not done if the
OF or CF flags are needed.

The shift operators have the complication that when the shift count is
zero, EFLAGS is not set, so they can only subsume a test instruction if
the shift count is known to be non-zero. Also, using the EFLAGS value
from a shift is apparently very slow on some x86 implementations.

In read-modify-write instructions, the root node in the isel match is
the store, and isel has no way for the use of the EFLAGS result of the
arithmetic to be remapped to the new node.

Add and subtract instructions set OF on signed overflow and CF on unsigned
overflow, while test instructions always clear OF and CF. In order to
replace a test with an add or subtract in a situation where OF or CF is
needed, codegen must be able to prove that the operation cannot see
signed or unsigned overflow, respectively.

//===---------------------------------------------------------------------===//

memcpy/memmove do not lower to SSE copies when possible.
A silly example is:
define <16 x float> @foo(<16 x float> %A) nounwind {
        %tmp = alloca <16 x float>, align 16
        %tmp2 = alloca <16 x float>, align 16
        store <16 x float> %A, <16 x float>* %tmp
        %s = bitcast <16 x float>* %tmp to i8*
        %s2 = bitcast <16 x float>* %tmp2 to i8*
        call void @llvm.memcpy.i64(i8* %s, i8* %s2, i64 64, i32 16)
        %R = load <16 x float>* %tmp2
        ret <16 x float> %R
}

declare void @llvm.memcpy.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind

which compiles to:

_foo:
        subl $140, %esp
        movaps %xmm3, 112(%esp)
        movaps %xmm2, 96(%esp)
        movaps %xmm1, 80(%esp)
        movaps %xmm0, 64(%esp)
        movl 60(%esp), %eax
        movl %eax, 124(%esp)
        movl 56(%esp), %eax
        movl %eax, 120(%esp)
        movl 52(%esp), %eax
        <many many more 32-bit copies>
        movaps (%esp), %xmm0
        movaps 16(%esp), %xmm1
        movaps 32(%esp), %xmm2
        movaps 48(%esp), %xmm3
        addl $140, %esp
        ret

On Nehalem, it may even be cheaper to just use movups when unaligned than to
fall back to lower-granularity chunks.

//===---------------------------------------------------------------------===//

Implement processor-specific optimizations for parity with GCC on these
processors. GCC does two optimizations:

1. ix86_pad_returns inserts a noop before ret instructions if immediately
   preceded by a conditional branch or is the target of a jump.
2. ix86_avoid_jump_misspredicts inserts noops in cases where a 16-byte block of
   code contains more than 3 branches.

The first one is done for all AMDs, Core2, and "Generic"
The second one is done for: Atom, Pentium Pro, all AMDs, Pentium 4, Nocona,
  Core 2, and "Generic"

//===---------------------------------------------------------------------===//
Testcase:
int x(int a) { return (a&0xf0)>>4; }

Current output:
        movl 4(%esp), %eax
        shrl $4, %eax
        andl $15, %eax
        ret

Ideal output:
        movzbl 4(%esp), %eax
        shrl $4, %eax
        ret

//===---------------------------------------------------------------------===//

Re-implement atomic builtins __sync_add_and_fetch() and __sync_sub_and_fetch
properly.

When the return value is not used (i.e. we only care about the value in
memory), x86 does not have to use xadd to implement these.
Instead, it can use add, sub, inc, dec instructions with the "lock" prefix.

This is currently implemented using a bit of an instruction selection trick. The
issue is that the target-independent pattern produces one output and a chain,
and we want to map it into one that just outputs a chain. The current trick is
to select it into a MERGE_VALUES with the first definition being an
implicit_def. The proper solution is to add new ISD opcodes for the no-output
variant. The DAG combiner can then transform the node before it gets to target
node selection.

Problem #2 is that we are adding a whole bunch of x86 atomic instructions when
in fact these instructions are identical to the non-lock versions. We need a way
to add target-specific information to target nodes and have this information
carried over to machine instructions. The asm printer (or JIT) can use this
information to add the "lock" prefix.
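
To make the two cases concrete, here is a small C sketch using the GCC/Clang
__sync builtins named above. The global `counter` and the function names are
hypothetical and exist only for illustration; which instructions a given
compiler actually emits is not guaranteed by this code.

```c
/* Hypothetical shared counter, used only for this illustration. */
static int counter;

/* Result discarded: only the memory update matters, so the backend is free
   to use a locked add/inc instead of an instruction that returns a value. */
void bump(void) {
  (void)__sync_add_and_fetch(&counter, 1);
}

/* Same situation for subtraction: a locked sub/dec would suffice. */
void drop(void) {
  (void)__sync_sub_and_fetch(&counter, 1);
}

/* Result used: the updated value has to end up in a register. */
int bump_and_get(void) {
  return __sync_add_and_fetch(&counter, 1);
}
```

Only bump_and_get needs the fetched value; the first two functions are the
"no-output variant" that the text proposes separate ISD opcodes for.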

//===---------------------------------------------------------------------===//

struct B {
  unsigned char y0 : 1;
};

int bar(struct B* a) { return a->y0; }

define i32 @bar(%struct.B* nocapture %a) nounwind readonly optsize {
  %1 = getelementptr inbounds %struct.B* %a, i64 0, i32 0
  %2 = load i8* %1, align 1
  %3 = and i8 %2, 1
  %4 = zext i8 %3 to i32
  ret i32 %4
}

bar: # @bar
# BB#0:
        movb (%rdi), %al
        andb $1, %al
        movzbl %al, %eax
        ret

Missed optimization: should be movl+andl.

//===---------------------------------------------------------------------===//

The x86_64 ABI says:

Booleans, when stored in a memory object, are stored as single byte objects the
value of which is always 0 (false) or 1 (true).

We are not using this fact:

int bar(_Bool *a) { return *a; }

define i32 @bar(i8* nocapture %a) nounwind readonly optsize {
  %1 = load i8* %a, align 1, !tbaa !0
  %tmp = and i8 %1, 1
  %2 = zext i8 %tmp to i32
  ret i32 %2
}

bar:
        movb    (%rdi), %al
        andb    $1, %al
        movzbl  %al, %eax
        ret

GCC produces

bar:
        movzbl  (%rdi), %eax
        ret

//===---------------------------------------------------------------------===//

Consider the following two functions compiled with clang:
_Bool foo(int *x) { return !(*x & 4); }
unsigned bar(int *x) { return !(*x & 4); }

foo:
        movl    4(%esp), %eax
        testb   $4, (%eax)
        sete    %al
        movzbl  %al, %eax
        ret

bar:
        movl    4(%esp), %eax
        movl    (%eax), %eax
        shrl    $2, %eax
        andl    $1, %eax
        xorl    $1, %eax
        ret

The second function generates more code even though the two functions are
functionally identical.
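
As a quick sanity check of the "functionally identical" claim, the sketch below
evaluates both source forms plus the shift/and/xor form that bar currently
compiles to. The helpers bar_as_emitted and forms_agree are made-up names for
illustration only.

```c
#include <stdbool.h>

/* Same bodies as foo and bar above. */
bool foo(int *x) { return !(*x & 4); }
unsigned bar(int *x) { return !(*x & 4); }

/* Hypothetical helper spelling out the shrl/andl/xorl sequence
   currently emitted for bar. */
unsigned bar_as_emitted(int *x) {
  unsigned v = (unsigned)*x;
  return ((v >> 2) & 1u) ^ 1u;
}

/* Returns 1 if all three forms agree for the given input. */
int forms_agree(int value) {
  unsigned a = foo(&value);
  unsigned b = bar(&value);
  unsigned c = bar_as_emitted(&value);
  return a == b && b == c;
}
```

Since all three compute "bit 2 is clear", the test+sete form used for foo
should be usable for bar as well.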

//===---------------------------------------------------------------------===//

Take the following C code:
int f(int a, int b) { return (unsigned char)a == (unsigned char)b; }

We generate the following IR with clang:
define i32 @f(i32 %a, i32 %b) nounwind readnone {
entry:
  %tmp = xor i32 %b, %a                           ; <i32> [#uses=1]
  %tmp6 = and i32 %tmp, 255                       ; <i32> [#uses=1]
  %cmp = icmp eq i32 %tmp6, 0                     ; <i1> [#uses=1]
  %conv5 = zext i1 %cmp to i32                    ; <i32> [#uses=1]
  ret i32 %conv5
}

And the following x86 code:
        xorl    %esi, %edi
        testb   $-1, %dil
        sete    %al
        movzbl  %al, %eax
        ret

A cmpb instead of the xorl+testb would be one instruction shorter.
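
The equivalence that makes a single byte compare possible can be sketched in C:
comparing the low bytes directly gives the same answer as the xor/mask/compare
form in the IR above. The name f_xor_mask is made up for this example.

```c
/* The source-level comparison of the low bytes. */
int f(int a, int b) { return (unsigned char)a == (unsigned char)b; }

/* The form used in the IR: xor, mask the low byte, compare against zero. */
int f_xor_mask(int a, int b) { return ((a ^ b) & 255) == 0; }
```

Both depend only on the low 8 bits of a and b, which is what a cmpb on the
byte subregisters would test.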

//===---------------------------------------------------------------------===//

Given the following C code:
int f(int a, int b) { return (signed char)a == (signed char)b; }

We generate the following IR with clang:
define i32 @f(i32 %a, i32 %b) nounwind readnone {
entry:
  %sext = shl i32 %a, 24                          ; <i32> [#uses=1]
  %conv1 = ashr i32 %sext, 24                     ; <i32> [#uses=1]
  %sext6 = shl i32 %b, 24                         ; <i32> [#uses=1]
  %conv4 = ashr i32 %sext6, 24                    ; <i32> [#uses=1]
  %cmp = icmp eq i32 %conv1, %conv4               ; <i1> [#uses=1]
  %conv5 = zext i1 %cmp to i32                    ; <i32> [#uses=1]
  ret i32 %conv5
}

And the following x86 code:
        movsbl  %sil, %eax
        movsbl  %dil, %ecx
        cmpl    %eax, %ecx
        sete    %al
        movzbl  %al, %eax
        ret

It should be possible to eliminate the sign
extensions.

//===---------------------------------------------------------------------===//

LLVM misses a load+store narrowing opportunity in this code:

%struct.bf = type { i64, i16, i16, i32 }

@bfi = external global %struct.bf*                ; <%struct.bf**> [#uses=2]

define void @t1() nounwind ssp {
entry:
  %0 = load %struct.bf** @bfi, align 8            ; <%struct.bf*> [#uses=1]
  %1 = getelementptr %struct.bf* %0, i64 0, i32 1 ; <i16*> [#uses=1]
  %2 = bitcast i16* %1 to i32*                    ; <i32*> [#uses=2]
  %3 = load i32* %2, align 1                      ; <i32> [#uses=1]
  %4 = and i32 %3, -65537                         ; <i32> [#uses=1]
  store i32 %4, i32* %2, align 1
  %5 = load %struct.bf** @bfi, align 8            ; <%struct.bf*> [#uses=1]
  %6 = getelementptr %struct.bf* %5, i64 0, i32 1 ; <i16*> [#uses=1]
  %7 = bitcast i16* %6 to i32*                    ; <i32*> [#uses=2]
  %8 = load i32* %7, align 1                      ; <i32> [#uses=1]
  %9 = and i32 %8, -131073                        ; <i32> [#uses=1]
  store i32 %9, i32* %7, align 1
  ret void
}

LLVM currently emits this:

        movq    bfi(%rip), %rax
        andl    $-65537, 8(%rax)
        movq    bfi(%rip), %rax
        andl    $-131073, 8(%rax)
        ret

It could narrow the loads and stores to emit this:

        movq    bfi(%rip), %rax
        andb    $-2, 10(%rax)
        movq    bfi(%rip), %rax
        andb    $-3, 10(%rax)
        ret

The trouble is that there is a TokenFactor between the store and the
load, making it non-trivial to determine if there's anything between
the load and the store which would prohibit narrowing.
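
To see why the narrowed form is equivalent, note that -65537 is 0xFFFEFFFF, so
the 32-bit "and" at offset 8 only clears bit 16, which on little-endian x86 is
bit 0 of the byte at offset 10. The sketch below models the object as a byte
buffer and checks that both forms leave it in the same state. All function
names are made up for illustration, and the explicit little-endian helpers are
an assumption used to keep the example independent of the host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Read/write a 32-bit little-endian value (x86 byte order) in a byte buffer. */
static uint32_t load_le32(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)v;
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
  p[3] = (uint8_t)(v >> 24);
}

/* Wide form, modelling: andl $-65537, 8(%rax) */
void clear_wide(uint8_t *obj) {
  store_le32(obj + 8, load_le32(obj + 8) & (uint32_t)-65537);
}

/* Narrow form, modelling: andb $-2, 10(%rax) */
void clear_narrow(uint8_t *obj) {
  obj[10] &= (uint8_t)-2;
}

/* Returns 1 if both forms leave a 16-byte object in the same state. */
int narrowing_matches(const uint8_t init[16]) {
  uint8_t a[16], b[16];
  memcpy(a, init, 16);
  memcpy(b, init, 16);
  clear_wide(a);
  clear_narrow(b);
  return memcmp(a, b, 16) == 0;
}
```

This only shows that the two stores write the same bytes; it says nothing about
the TokenFactor problem described above.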

//===---------------------------------------------------------------------===//

This code:
void foo(unsigned x) {
  if (x == 0) bar();
  else if (x == 1) qux();
}

currently compiles into:
_foo:
        movl    4(%esp), %eax
        cmpl    $1, %eax
        je      LBB0_3
        testl   %eax, %eax
        jne     LBB0_4

the testl could be removed:
_foo:
        movl    4(%esp), %eax
        cmpl    $1, %eax
        je      LBB0_3
        jb      LBB0_4

0 is the only unsigned number < 1.
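
The idea of reusing one comparison against 1 for both tests can be sketched in
C as a three-way classification. The function classify and its return codes
are hypothetical and only stand in for the calls to bar() and qux().

```c
/* Hypothetical classification based on a single comparison against 1:
   0 -> x == 0 (would call bar), 1 -> x == 1 (would call qux), 2 -> neither. */
int classify(unsigned x) {
  if (x == 1) return 1;  /* "equal" outcome of the compare (je) */
  if (x < 1)  return 0;  /* "below" outcome (jb): only x == 0 is below 1 */
  return 2;              /* "above" outcome: nothing to do */
}
```

All three outcomes come from the flags of the one cmpl, so no second test of
x is needed.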

//===---------------------------------------------------------------------===//

This code:

%0 = type { i32, i1 }

define i32 @add32carry(i32 %sum, i32 %x) nounwind readnone ssp {
entry:
  %uadd = tail call %0 @llvm.uadd.with.overflow.i32(i32 %sum, i32 %x)
  %cmp = extractvalue %0 %uadd, 1
  %inc = zext i1 %cmp to i32
  %add = add i32 %x, %sum
  %z.0 = add i32 %add, %inc
  ret i32 %z.0
}

declare %0 @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone

compiles to:

_add32carry:                            ## @add32carry
        addl    %esi, %edi
        sbbl    %ecx, %ecx
        movl    %edi, %eax
        subl    %ecx, %eax
        ret

But it could be:

_add32carry:
        leal    (%rsi,%rdi), %eax
        cmpl    %esi, %eax
        adcl    $0, %eax
        ret

//===---------------------------------------------------------------------===//

The hot loop of 256.bzip2 contains code that looks a bit like this:

int foo(char *P, char *Q, int x, int y) {
  if (P[0] != Q[0])
     return P[0] < Q[0];
  if (P[1] != Q[1])
     return P[1] < Q[1];
  if (P[2] != Q[2])
     return P[2] < Q[2];
   return P[3] < Q[3];
}

In the real code, we get a lot more wrong than this.
However, even in this code we generate:

_foo:                                   ## @foo
## BB#0:                                ## %entry
        movb    (%rsi), %al
        movb    (%rdi), %cl
        cmpb    %al, %cl
        je      LBB0_2
LBB0_1:                                 ## %if.then
        cmpb    %al, %cl
        jmp     LBB0_5
LBB0_2:                                 ## %if.end
        movb    1(%rsi), %al
        movb    1(%rdi), %cl
        cmpb    %al, %cl
        jne     LBB0_1
## BB#3:                                ## %if.end38
        movb    2(%rsi), %al
        movb    2(%rdi), %cl
        cmpb    %al, %cl
        jne     LBB0_1
## BB#4:                                ## %if.end60
        movb    3(%rdi), %al
        cmpb    3(%rsi), %al
LBB0_5:                                 ## %if.end60
        setl    %al
        movzbl  %al, %eax
        ret

Note that we generate jumps to LBB0_1 which does a redundant compare.
The redundant compare also forces the register values to be live, which
prevents folding one of the loads into the compare. In contrast, GCC 4.2
produces:

_foo:
        movzbl  (%rsi), %eax
        cmpb    %al, (%rdi)
        jne     L10
L12:
        movzbl  1(%rsi), %eax
        cmpb    %al, 1(%rdi)
        jne     L10
        movzbl  2(%rsi), %eax
        cmpb    %al, 2(%rdi)
        jne     L10
        movzbl  3(%rdi), %eax
        cmpb    3(%rsi), %al
L10:
        setl    %al
        movzbl  %al, %eax
        ret

which is "perfect".

//===---------------------------------------------------------------------===//

For the branch in the following code:
int a();
int b(int x, int y) {
  if (x & (1<<(y&7)))
    return a();
  return y;
}

We currently generate:
        movb    %sil, %al
        andb    $7, %al
        movzbl  %al, %eax
        btl     %eax, %edi
        jae     .LBB0_2

movl+andl would be shorter than the movb+andb+movzbl sequence.

//===---------------------------------------------------------------------===//

For the following:
struct u1 {
    float x, y;
};
float foo(struct u1 u) {
    return u.x + u.y;
}

We currently generate:
        movdqa  %xmm0, %xmm1
        pshufd  $1, %xmm0, %xmm0        # xmm0 = xmm0[1,0,0,0]
        addss   %xmm1, %xmm0
        ret

We could save an instruction here by commuting the addss.

//===---------------------------------------------------------------------===//

This (from PR9661):

float clamp_float(float a) {
        if (a > 1.0f)
                return 1.0f;
        else if (a < 0.0f)
                return 0.0f;
        else
                return a;
}

Could compile to:

clamp_float:                            # @clamp_float
        movss   .LCPI0_0(%rip), %xmm1
        minss   %xmm1, %xmm0
        pxor    %xmm1, %xmm1
        maxss   %xmm1, %xmm0
        ret

with -ffast-math.
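
The min/max structure of that sequence can be written out in C as below. This
is only a sketch: min_f, max_f and clamp_float_minmax are made-up names, and
the rewrite is assumed to be valid only when NaNs and signed zeros do not
matter, which is why -ffast-math is required above.

```c
/* Select-style min and max. Under the fast-math assumption (no NaN or
   signed-zero concerns) these shapes correspond to minss and maxss. */
static float min_f(float a, float b) { return a < b ? a : b; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Clamp to [0, 1]: first cap at 1.0f, then raise to at least 0.0f. */
float clamp_float_minmax(float a) {
  return max_f(min_f(a, 1.0f), 0.0f);
}
```

For ordinary (non-NaN) inputs this returns the same values as clamp_float.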

//===---------------------------------------------------------------------===//

This function (from PR9803):

int clamp2(int a) {
        if (a > 5)
                a = 5;
        if (a < 0)
                return 0;
        return a;
}

Compiles to:

_clamp2:                                ## @clamp2
        pushq   %rbp
        movq    %rsp, %rbp
        cmpl    $5, %edi
        movl    $5, %ecx
        cmovlel %edi, %ecx
        testl   %ecx, %ecx
        movl    $0, %eax
        cmovnsl %ecx, %eax
        popq    %rbp
        ret

The move of 0 could be scheduled above the test so that it can become an
xor reg,reg.

//===---------------------------------------------------------------------===//

GCC PR48986.
We currently compile this:

void bar(void);
void yyy(int* p) {
    if (__sync_fetch_and_add(p, -1) == 1)
      bar();
}

into:
        movl    $-1, %eax
        lock
        xaddl   %eax, (%rdi)
        cmpl    $1, %eax
        je      LBB0_2

Instead we could generate:

        lock
        decl    (%rdi)
        je      LBB0_2

The trick is to match "fetch_and_add(X, -C) == C".

//===---------------------------------------------------------------------===//

unsigned t(unsigned a, unsigned b) {
  return a <= b ? 5 : -5;
}

We generate:
        movl    $5, %ecx
        cmpl    %esi, %edi
        movl    $-5, %eax
        cmovbel %ecx, %eax

GCC:
        cmpl    %edi, %esi
        sbbl    %eax, %eax
        andl    $-10, %eax
        addl    $5, %eax

//===---------------------------------------------------------------------===//
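
For reference, the arithmetic behind GCC's cmpl/sbbl/andl/addl sequence for
t() can be written out in C as follows. The name t_branchless is made up for
this sketch; the mask plays the role of the sbbl result.

```c
/* The original function. */
unsigned t(unsigned a, unsigned b) {
  return a <= b ? 5 : -5;
}

/* Branchless form mirroring the GCC sequence. */
unsigned t_branchless(unsigned a, unsigned b) {
  /* cmpl %edi, %esi sets the carry flag when b < a (unsigned);
     sbbl %eax, %eax then yields 0 or all ones. */
  unsigned mask = -(unsigned)(b < a);
  /* andl $-10 gives 0 or -10; addl $5 gives 5 or -5. */
  return (mask & -10u) + 5u;
}
```

The two selected values differ by 10, so the mask picks either 0 or -10 and
adding 5 produces the final result without a cmov.

//===---------------------------------------------------------------------===//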