Target Independent Opportunities:

//===---------------------------------------------------------------------===//

We should recognize various "overflow detection" idioms and translate them into
llvm.uadd.with.overflow and similar intrinsics.  Here is a multiply idiom:

unsigned int mul(unsigned int a,unsigned int b) {
  if ((unsigned long long)a*b>0xffffffff)
    exit(0);
  return a*b;
}

The legalization code for mul-with-overflow needs to be made more robust before
this can be implemented though.

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math).  Misc/mandel will like this. :)  This isn't
safe in general, even on darwin.  See the libm implementation of hypot for
examples (which special case when x/y are exactly zero to get signed zeros etc
right).
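
A small sketch of one reason the hypot expansion is unsafe without fast-math:
the intermediate x*x+y*y can overflow even when the final result is
representable.  For x = y = 1e200 the true hypot is about 1.41e200, but the
sum of squares is already +infinity.  The helper names below are made up for
this note, and the sketch assumes double arithmetic is evaluated in double
precision (not x87 extended precision):

```c
#include <float.h>

/* Hypothetical helper: the sum of squares that the expanded form would feed
   to llvm.sqrt. */
double sum_of_squares(double x, double y) {
  return x * x + y * y;
}

/* Nonzero when the intermediate sum overflows to +infinity, i.e. when it
   compares greater than the largest finite double. */
int sum_of_squares_overflows(double x, double y) {
  return sum_of_squares(x, y) > DBL_MAX;
}
```

A libm hypot typically rescales its operands to avoid this, which is part of
why the naive expansion is only acceptable when precision doesn't matter.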

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
   x = 1ULL << i;

into:
 long long tmp = 1;
 for (i = ...; ++i, tmp+=tmp)
   x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)

//===---------------------------------------------------------------------===//

Reassociate should turn things like:

int factorial(int X) {
 return X*X*X*X*X*X*X*X;
}

into llvm.powi calls, allowing the code generator to produce balanced
multiplication trees.
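
For reference, here is what "balanced multiplication tree" means for the X^8
case, written out by hand in C.  The function names are made up for this
note, and unsigned is used so that wraparound is well defined:

```c
/* Product as written in the source: a chain of 7 dependent multiplies. */
unsigned pow8_linear(unsigned X) {
  return X * X * X * X * X * X * X * X;
}

/* Balanced tree: 3 multiplies, each one squaring the previous result. */
unsigned pow8_tree(unsigned X) {
  unsigned X2 = X * X;    /* X^2 */
  unsigned X4 = X2 * X2;  /* X^4 */
  return X4 * X4;         /* X^8 */
}
```

Both forms compute the same value modulo 2^32; the second is the shape the
code generator would ideally produce from a powi(X, 8).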

First, the intrinsic needs to be extended to support integers, and second the
code generator needs to be enhanced to lower these to multiplication trees.

//===---------------------------------------------------------------------===//

Interesting? testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

This is blocked on not handling X*X*X -> powi(X, 3) (see note above).  The issue
is that we end up getting t = 2*X  s = t*t   and don't turn this into 4*X*X,
which is the same number of multiplies and is canonical, because the 2*X has
multiple uses.
Here's a simple example:

define i32 @test15(i32 %X1) {
  %B = mul i32 %X1, 47   ; X1*47
  %C = mul i32 %B, %B
  ret i32 %C
}


//===---------------------------------------------------------------------===//

Reassociate should handle the example in GCC PR16157:

extern int a0, a1, a2, a3, a4; extern int b0, b1, b2, b3, b4;
void f () {  /* this can be optimized to four additions... */
        b4 = a4 + a3 + a2 + a1 + a0;
        b3 = a3 + a2 + a1 + a0;
        b2 = a2 + a1 + a0;
        b1 = a1 + a0;
}

This requires reassociating to forms of expressions that are already available,
something that reassoc doesn't think about yet.
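
The four-addition form that the PR16157 comment alludes to looks roughly like
this.  It is a hand-written sketch of the desired result, not compiler output;
the globals are defined here (rather than declared extern) so the snippet is
self-contained, and f_reassociated is a made-up name:

```c
int a0, a1, a2, a3, a4;
int b0, b1, b2, b3, b4;

/* Each partial sum reuses the previous one, so only four adds are needed
   instead of the ten in the original f(). */
void f_reassociated(void) {
  int s1 = a1 + a0;
  int s2 = a2 + s1;
  int s3 = a3 + s2;
  int s4 = a4 + s3;
  b1 = s1;
  b2 = s2;
  b3 = s3;
  b4 = s4;
}
```

Getting here requires reassociate to prefer a4 + (a3 + (a2 + (a1 + a0))),
because the inner sums are the ones that are already available.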


//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j,int *l)  {  return memcmp(j,l,4);  }
int h(int *j, int *l) {  return *j - *l; }

This could be done in SelectionDAGISel.cpp, along with other special cases,
for 1,2,4,8 bytes.

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize.  It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

For vector types, DataLayout.cpp::getTypeInfo() returns alignment that is equal
to the type size.
It works but can be overly conservative as the alignment of
specific vector types is target dependent.

//===---------------------------------------------------------------------===//

We should produce an unaligned load from code like this:

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns.
Instead of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl    _foo, %eax
        cmpl    $1, %edi
        sbbl    $-1, %eax
        movl    %eax, _foo

//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      double sincos(double x, double *sin, double *cos);
      float sincosf(float x, float *sin, float *cos);
      long double sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers.  See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687

This is now easily doable with MRVs.
We could even make an intrinsic for this
if anyone cared enough about sincos.

//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int.  On a 32-bit platform it would be just
so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.
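
To make the "one 32-bit XOR" point concrete, here is roughly what the
unswitched loops amount to once the 64-bit state is split into halves, as a
32-bit target sees it.  The state64 type and flip_target_bit name are
hypothetical (libquantum's real structures differ), and the sketch assumes
target < 64:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit state split into two 32-bit halves. */
typedef struct {
  uint32_t lo, hi;
} state64;

/* The test on target is hoisted out of the loop, so each loop body touches
   only the half that actually contains the target bit. */
void flip_target_bit(state64 *node, size_t size, unsigned target) {
  size_t i;
  if (target < 32) {
    uint32_t mask = (uint32_t)1 << target;
    for (i = 0; i < size; i++)
      node[i].lo ^= mask;
  } else {
    uint32_t mask = (uint32_t)1 << (target - 32);
    for (i = 0; i < size; i++)
      node[i].hi ^= mask;
  }
}
```

The other half would be XORed with zero in every iteration, which is the
wasted work the unswitching removes.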

It would also be nice to recognize the reg->size doesn't alias reg->node[i], but
this requires TBAA.

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine (yes, it really is bswap):

unsigned long reverse(unsigned v) {
    unsigned t;
    t = v ^ ((v << 16) | (v >> 16));
    t &= ~0xff0000;
    v = (v << 24) | (v >> 8);
    return v ^ (t >> 8);
}

//===---------------------------------------------------------------------===//

[LOOP DELETION]

We don't delete this output free loop, because trip count analysis doesn't
realize that it is finite (if it were infinite, it would be undefined).  Not
having this blocks Loop Idiom from matching strlen and friends.

void foo(char *C) {
  int x = 0;
  while (*C)
    ++x,++C;
}

//===---------------------------------------------------------------------===//

[LOOP RECOGNITION]

These idioms should be recognized as popcount (see PR1488):

unsigned countbits_slow(unsigned v) {
  unsigned c;
  for (c = 0; v; v >>= 1)
    c += v & 1;
  return c;
}

unsigned int popcount(unsigned int input) {
  unsigned int count = 0;
  for (unsigned int i =  0; i < 4 * 8; i++)
    count += (input >> i) & 1;
  return count;
}

This should be recognized as CLZ:  rdar://8459039

unsigned clz_a(unsigned a) {
  int i;
  for (i=0;i<32;i++)
    if (a & (1<<(31-i)))
      return i;
  return 32;
}

This sort of thing should be added to the loop idiom pass.

//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

//===---------------------------------------------------------------------===//

-instcombine should handle this transform:
   icmp pred (sdiv X / C1 ), C2
when X, C1, and C2 are unsigned.  Similarly for udiv and signed operands.

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match.
See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.

//===---------------------------------------------------------------------===//

[LOOP OPTIMIZATION]

SingleSource/Benchmarks/Misc/dt.c shows several interesting optimization
opportunities in its double_array_divs_variable function: it needs loop
interchange, memory promotion (which LICM already does), vectorization and
variable trip count loop unrolling (since it has a constant trip count).
ICC apparently produces this very nice code with -ffast-math:

..B1.70:                        # Preds ..B1.70 ..B1.69
        mulpd     %xmm0, %xmm1                                  #108.2
        mulpd     %xmm0, %xmm1                                  #108.2
        mulpd     %xmm0, %xmm1                                  #108.2
        mulpd     %xmm0, %xmm1                                  #108.2
        addl      $8, %edx                                      #
        cmpl      $131072, %edx                                 #108.2
        jb        ..B1.70       # Prob 99%                      #108.2

It would be better to count down to zero, but this is a lot better than what we
do.

//===---------------------------------------------------------------------===//

Consider:

typedef unsigned U32;
typedef unsigned long long U64;
int test (U32 *inst, U64 *regs) {
    U64 effective_addr2;
    U32 temp = *inst;
    int r1 = (temp >> 20) & 0xf;
    int b2 = (temp >> 16) & 0xf;
    effective_addr2 = temp & 0xfff;
    if (b2) effective_addr2 += regs[b2];
    b2 = (temp >> 12) & 0xf;
    if (b2) effective_addr2 += regs[b2];
    effective_addr2 &= regs[4];
    if ((effective_addr2 & 3) == 0)
        return 1;
    return 0;
}

Note that only the low 2 bits of effective_addr2 are used.  On 32-bit systems,
we don't eliminate the computation of the top half of effective_addr2 because
we don't have whole-function selection dags.  On x86, this means we use one
more register for the function when effective_addr2 is declared as U64 than
when it is declared U32.

PHI Slicing could be extended to do this.

//===---------------------------------------------------------------------===//

Tail call elim should be more aggressive, checking to see if the call is
followed by an uncond branch to an exit block.

; This testcase is due to tail-duplication not wanting to copy the return
; instruction into the terminating blocks because there was other code
; optimized out of the function after the taildup happened.
; RUN: llvm-as < %s | opt -tailcallelim | llvm-dis | not grep call

define i32 @t4(i32 %a) {
entry:
        %tmp.1 = and i32 %a, 1          ; <i32> [#uses=1]
        %tmp.2 = icmp ne i32 %tmp.1, 0          ; <i1> [#uses=1]
        br i1 %tmp.2, label %then.0, label %else.0

then.0:         ; preds = %entry
        %tmp.5 = add i32 %a, -1         ; <i32> [#uses=1]
        %tmp.3 = call i32 @t4( i32 %tmp.5 )             ; <i32> [#uses=1]
        br label %return

else.0:         ; preds = %entry
        %tmp.7 = icmp ne i32 %a, 0              ; <i1> [#uses=1]
        br i1 %tmp.7, label %then.1, label %return

then.1:         ; preds = %else.0
        %tmp.11 = add i32 %a, -2                ; <i32> [#uses=1]
        %tmp.9 = call i32 @t4( i32 %tmp.11 )            ; <i32> [#uses=1]
        br label %return

return:         ; preds = %then.1, %else.0, %then.0
        %result.0 = phi i32 [ 0, %else.0 ], [ %tmp.3, %then.0 ],
                            [ %tmp.9, %then.1 ]
        ret i32 %result.0
}

//===---------------------------------------------------------------------===//

Tail recursion elimination should handle:

int pow2m1(int n) {
 if (n == 0)
   return 0;
 return 2 * pow2m1 (n - 1) + 1;
}

Also, multiplies can be turned into SHL's, so they should be handled as if
they were associative.  "return foo() << 1" can be tail recursion eliminated.

//===---------------------------------------------------------------------===//

Argument promotion should promote arguments for recursive functions, like
this:

; RUN: llvm-as < %s | opt -argpromotion | llvm-dis | grep x.val

define internal i32 @foo(i32* %x) {
entry:
        %tmp = load i32* %x             ; <i32> [#uses=0]
        %tmp.foo = call i32 @foo( i32* %x )             ; <i32> [#uses=1]
        ret i32 %tmp.foo
}

define i32 @bar(i32* %x) {
entry:
        %tmp3 = call i32 @foo( i32* %x )                ; <i32> [#uses=1]
        ret i32 %tmp3
}

//===---------------------------------------------------------------------===//

We should investigate an instruction sinking pass.  Consider this silly
example in pic mode:

#include <assert.h>
void foo(int x) {
  assert(x);
  //...
}

we compile this to:
_foo:
        subl    $28, %esp
        call    "L1$pb"
"L1$pb":
        popl    %eax
        cmpl    $0, 32(%esp)
        je      LBB1_2  # cond_true
LBB1_1: # return
        # ...
        addl    $28, %esp
        ret
LBB1_2: # cond_true
...

The PIC base computation (call+popl) is only used on one path through the
code, but is currently always computed in the entry block.  It would be
better to sink the picbase computation down into the block for the
assertion, as it is the only one that uses it.  This happens for a lot of
code with early outs.

Another example is loads of arguments, which are usually emitted into the
entry block on targets like x86.  If not used in all paths through a
function, they should be sunk into the ones that do.

In this case, whole-function-isel would also handle this.

//===---------------------------------------------------------------------===//

Investigate lowering of sparse switch statements into perfect hash tables:
http://burtleburtle.net/bob/hash/perfect.html

//===---------------------------------------------------------------------===//

We should turn things like "load+fabs+store" and "load+fneg+store" into the
corresponding integer operations.
On a yonah, this loop:

double a[256];
void foo() {
  int i, b;
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] = -a[i];
}

is twice as slow as this loop:

long long a[256];
void foo() {
  int i, b;
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] ^= (1ULL << 63);
}

and I suspect other processors are similar. On X86 in particular this is a
big win because doing this with integers allows the use of read/modify/write
instructions.

//===---------------------------------------------------------------------===//

DAG Combiner should try to combine small loads into larger loads when
profitable.
For example, we compile this C++ example:

struct THotKey { short Key; bool Control; bool Shift; bool Alt; };
extern THotKey m_HotKey;
THotKey GetHotKey () { return m_HotKey; }

into (-m64 -O3 -fno-exceptions -static -fomit-frame-pointer):

__Z9GetHotKeyv:                         ## @_Z9GetHotKeyv
        movq    _m_HotKey@GOTPCREL(%rip), %rax
        movzwl  (%rax), %ecx
        movzbl  2(%rax), %edx
        shlq    $16, %rdx
        orq     %rcx, %rdx
        movzbl  3(%rax), %ecx
        shlq    $24, %rcx
        orq     %rdx, %rcx
        movzbl  4(%rax), %eax
        shlq    $32, %rax
        orq     %rcx, %rax
        ret

//===---------------------------------------------------------------------===//

We should add an FRINT node to the DAG to model targets that have legal
implementations of ceil/floor/rint.

//===---------------------------------------------------------------------===//

Consider:

int test() {
  long long input[8] = {1,0,1,0,1,0,1,0};
  foo(input);
}

Clang compiles this into:

  call void @llvm.memset.p0i8.i64(i8* %tmp, i8 0, i64 64, i32 16, i1 false)
  %0 = getelementptr [8 x i64]* %input, i64 0, i64 0
  store i64 1, i64* %0, align 16
  %1 = getelementptr [8 x i64]* %input, i64 0, i64 2
  store i64 1, i64* %1, align 16
  %2 = getelementptr [8 x i64]* %input, i64 0, i64 4
  store i64 1, i64* %2, align 16
  %3 = getelementptr [8 x i64]* %input, i64 0, i64 6
  store i64 1, i64* %3, align 16

Which gets codegen'd into:

        pxor    %xmm0, %xmm0
        movaps  %xmm0, -16(%rbp)
        movaps  %xmm0, -32(%rbp)
        movaps  %xmm0, -48(%rbp)
        movaps  %xmm0, -64(%rbp)
        movq    $1, -64(%rbp)
        movq    $1, -48(%rbp)
        movq    $1, -32(%rbp)
        movq    $1, -16(%rbp)

It would be better to have 4 movq's of 0 instead of the movaps's.

//===---------------------------------------------------------------------===//

http://llvm.org/PR717:

The following code should compile into "ret int undef". Instead, LLVM
produces "ret int 0":

int f() {
  int x = 4;
  int y;
  if (x == 3) y = 0;
  return y;
}

//===---------------------------------------------------------------------===//

The loop unroller should partially unroll loops (instead of peeling them)
when code growth isn't too bad and when an unroll count allows simplification
of some code within the loop.
One trivial example is:

#include <stdio.h>
int main() {
  int nRet = 17;
  int nLoop;
  for ( nLoop = 0; nLoop < 1000; nLoop++ ) {
    if ( nLoop & 1 )
      nRet += 2;
    else
      nRet -= 1;
  }
  return nRet;
}

Unrolling by 2 would eliminate the '&1' in both copies, leading to a net
reduction in code size. The resultant code would then also be suitable for
exit value computation.

//===---------------------------------------------------------------------===//

We miss a bunch of rotate opportunities on various targets, including ppc, x86,
etc. On X86, we miss a bunch of 'rotate by variable' cases because the rotate
matching code in dag combine doesn't look through truncates aggressively
enough.
Here are some testcases reduced from GCC PR17886:

unsigned long long f5(unsigned long long x, unsigned long long y) {
  return (x << 8) | ((y >> 48) & 0xffull);
}
unsigned long long f6(unsigned long long x, unsigned long long y, int z) {
  switch(z) {
  case 1:
    return (x << 8) | ((y >> 48) & 0xffull);
  case 2:
    return (x << 16) | ((y >> 40) & 0xffffull);
  case 3:
    return (x << 24) | ((y >> 32) & 0xffffffull);
  case 4:
    return (x << 32) | ((y >> 24) & 0xffffffffull);
  default:
    return (x << 40) | ((y >> 16) & 0xffffffffffull);
  }
}

//===---------------------------------------------------------------------===//

This (and similar related idioms):

unsigned int foo(unsigned char i) {
  return i | (i<<8) | (i<<16) | (i<<24);
}

compiles into:

define i32 @foo(i8 zeroext %i) nounwind readnone ssp noredzone {
entry:
  %conv = zext i8 %i to i32
  %shl = shl i32 %conv, 8
  %shl5 = shl i32 %conv, 16
  %shl9 = shl i32 %conv, 24
  %or = or i32 %shl9, %conv
  %or6 = or i32 %or, %shl5
  %or10 = or i32 %or6, %shl
  ret i32 %or10
}

it would be better as:

unsigned int bar(unsigned char i) {
  unsigned int j=i | (i << 8);
  return j | (j<<16);
}

aka:

define i32 @bar(i8 zeroext %i) nounwind readnone ssp noredzone {
entry:
  %conv = zext i8 %i to i32
  %shl = shl i32 %conv, 8
  %or = or i32 %shl, %conv
  %shl5 = shl i32 %or, 16
  %or6 = or i32 %shl5, %or
  ret i32 %or6
}

or even i*0x01010101, depending on the speed of the multiplier. The best way to
handle this is to canonicalize it to a multiply in IR and have codegen handle
lowering multiplies to shifts on cpus where shifts are faster.

//===---------------------------------------------------------------------===//

We do a number of simplifications in simplify libcalls to strength reduce
standard library functions, but we don't currently merge them together. For
example, it is useful to merge memcpy(a,b,strlen(b)) -> strcpy. This can only
be done safely if "b" isn't modified between the strlen and memcpy of course.
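
A rough C sketch of the two forms being merged. The helper names are made up
for illustration, and the sketch assumes the copy length includes the
terminating NUL (strlen(b)+1), since strcpy always writes it; the note above
does not spell that detail out.

```c
#include <assert.h>
#include <string.h>

/* Two-call form: measure the string, then copy it.
   Assumption: the +1 copies the terminating NUL as well. */
static char *copy_with_strlen_memcpy(char *dst, const char *src) {
  size_t len = strlen(src);
  memcpy(dst, src, len + 1);
  return dst;
}

/* Merged form: a single library call that walks the string once. */
static char *copy_with_strcpy(char *dst, const char *src) {
  return strcpy(dst, src);
}

/* Both forms should leave identical bytes in the destination buffer. */
static void check_copy_forms(void) {
  char a[16], b[16];
  memset(a, 'x', sizeof a);
  memset(b, 'x', sizeof b);
  copy_with_strlen_memcpy(a, "hotkey");
  copy_with_strcpy(b, "hotkey");
  assert(memcmp(a, b, sizeof a) == 0);
}
```

The merge is only valid when nothing writes to the source string between the
strlen and the memcpy, which is the aliasing condition mentioned above.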

//===---------------------------------------------------------------------===//

We compile this program: (from GCC PR11680)
http://gcc.gnu.org/bugzilla/attachment.cgi?id=4487

Into code that runs the same speed in fast/slow modes, but both modes run 2x
slower than when compiled with GCC (either 4.0 or 4.2):

$ llvm-g++ perf.cpp -O3 -fno-exceptions
$ time ./a.out fast
1.821u 0.003s 0:01.82 100.0% 0+0k 0+0io 0pf+0w

$ g++ perf.cpp -O3 -fno-exceptions
$ time ./a.out fast
0.821u 0.001s 0:00.82 100.0% 0+0k 0+0io 0pf+0w

It looks like we are making the same inlining decisions, so this may be raw
codegen badness or something else (haven't investigated).

//===---------------------------------------------------------------------===//

Divisibility by constant can be simplified (according to GCC PR12849) from
being a mulhi to being a mul lo (cheaper).
Testcase:

void bar(unsigned n) {
  if (n % 3 == 0)
    true();
}

This is equivalent to the following, where 2863311531 is the multiplicative
inverse of 3, and 1431655766 is ((2^32)-1)/3+1:
void bar(unsigned n) {
  if (n * 2863311531U < 1431655766U)
    true();
}

The same transformation can work with an even modulo with the addition of a
rotate: rotate the result of the multiply to the right by the number of bits
which need to be zero for the condition to be true, and shrink the compare RHS
by the same amount. Unless the target supports rotates, though, that
transformation probably isn't worthwhile.

The transformation can also easily be made to work with non-zero equality
comparisons: just transform, for example, "n % 3 == 1" to "(n-1) % 3 == 0".
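
As a sanity check of the multiply-and-compare form for the "n % 3 == 0" case,
here is a small C sketch that compares it against the remainder form. The
function names are hypothetical, and it assumes a 32-bit unsigned type
(uint32_t), which the two constants require.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: uses the remainder directly. */
static int divisible_by_3_mod(uint32_t n) { return n % 3 == 0; }

/* Multiply form: 2863311531 is the inverse of 3 modulo 2^32, so multiples
   of 3 map onto 0 .. (2^32-1)/3 and every other value lands above that. */
static int divisible_by_3_mul(uint32_t n) {
  return (uint32_t)(n * UINT32_C(2863311531)) < UINT32_C(1431655766);
}

/* Compares both forms on small values and near the top of the range. */
static void check_divisible_by_3(void) {
  uint32_t n;
  for (n = 0; n < 100000; n++)
    assert(divisible_by_3_mod(n) == divisible_by_3_mul(n));
  for (n = UINT32_MAX; n > UINT32_MAX - 100000; n--)
    assert(divisible_by_3_mod(n) == divisible_by_3_mul(n));
}
```

The multiply form needs only the low half of the product, which is why it is
cheaper than the mulhi-based remainder sequence.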

//===---------------------------------------------------------------------===//

Better mod/ref analysis for scanf would allow us to eliminate the vtable and a
bunch of other stuff from this example (see PR1604):

#include <cstdio>
struct test {
  int val;
  virtual ~test() {}
};

int main() {
  test t;
  std::scanf("%d", &t.val);
  std::printf("%d\n", t.val);
}

//===---------------------------------------------------------------------===//

These functions perform the same computation, but produce different assembly.

define i8 @select(i8 %x) readnone nounwind {
  %A = icmp ult i8 %x, 250
  %B = select i1 %A, i8 0, i8 1
  ret i8 %B
}

define i8 @addshr(i8 %x) readnone nounwind {
  %A = zext i8 %x to i9
  %B = add i9 %A, 6       ;; 256 - 250 == 6
  %C = lshr i9 %B, 8
  %D = trunc i9 %C to i8
  ret i8 %D
}

//===---------------------------------------------------------------------===//

From gcc bug 24696:
int
f (unsigned long a, unsigned long b, unsigned long c)
{
  return ((a & (c - 1)) != 0) || ((b & (c - 1)) != 0);
}
int
f (unsigned long a, unsigned long b, unsigned long c)
{
  return ((a & (c - 1)) != 0) | ((b & (c - 1)) != 0);
}
Both should combine to ((a|b) & (c-1)) != 0.
Currently not optimized with
"clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

From GCC Bug 20192:
#define PMD_MASK    (~((1UL << 23) - 1))
void clear_pmd_range(unsigned long start, unsigned long end)
{
   if (!(start & ~PMD_MASK) && !(end & ~PMD_MASK))
       f();
}
The expression should optimize to something like
"!((start|end)&~PMD_MASK)". Currently not optimized with "clang
-emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

unsigned int f(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return
i;}
unsigned int f2(unsigned int i, unsigned int n) {++i; i += i == n; return i;}
These should combine to the same thing. Currently, the first function
produces better code on X86.
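
To make the equivalence concrete, the two functions can be compared directly.
This is only an illustrative check over a small range plus the wraparound
case; check_f_matches_f2 is a made-up helper name.

```c
#include <assert.h>

/* Branchy form from the entry above. */
static unsigned int f(unsigned int i, unsigned int n) {
  ++i;
  if (i == n)
    ++i;
  return i;
}

/* Arithmetic form: adds the 0/1 result of the comparison. */
static unsigned int f2(unsigned int i, unsigned int n) {
  ++i;
  i += i == n;
  return i;
}

/* Both forms should agree for every input pair tried here. */
static void check_f_matches_f2(void) {
  unsigned int i, n;
  for (i = 0; i < 64; i++)
    for (n = 0; n < 64; n++)
      assert(f(i, n) == f2(i, n));
  /* Wraparound case: ++i wraps to 0, which then equals n. */
  assert(f(~0u, 0) == f2(~0u, 0));
}
```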

//===---------------------------------------------------------------------===//

From GCC Bug 15784:
#define abs(x) x>0?x:-x
int f(int x, int y)
{
 return (abs(x)) >= 0;
}
This should optimize to x != INT_MIN. (With -fwrapv.) Currently not
optimized with "clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

From GCC Bug 14753:
void
rotate_cst (unsigned int a)
{
 a = (a << 10) | (a >> 22);
 if (a == 123)
   bar ();
}
void
minus_cst (unsigned int a)
{
 unsigned int tem;

 tem = 20 - a;
 if (tem == 5)
   bar ();
}
void
mask_gt (unsigned int a)
{
 /* This is equivalent to a > 15. */
 if ((a & ~7) > 8)
   bar ();
}
void
rshift_gt (unsigned int a)
{
 /* This is equivalent to a > 23. */
 if ((a >> 2) > 5)
   bar ();
}

All should simplify to a single comparison. All of these are
currently not optimized with "clang -emit-llvm-bc | opt
-O3".

//===---------------------------------------------------------------------===//

From GCC Bug 32605:
int c(int* x) {return (char*)x+2 == (char*)x;}
Should combine to 0. Currently not optimized with "clang
-emit-llvm-bc | opt -O3" (although llc can optimize it).

//===---------------------------------------------------------------------===//

int a(unsigned b) {return ((b << 31) | (b << 30)) >> 31;}
Should be combined to "((b >> 1) | b) & 1". Currently not optimized
with "clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

unsigned a(unsigned x, unsigned y) { return x | (y & 1) | (y & 2);}
Should combine to "x | (y & 3)". Currently not optimized with "clang
-emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

int a(int a, int b, int c) {return (~a & c) | ((c|a) & b);}
Should fold to "(~a & c) | (a & b)". Currently not optimized with
"clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

int a(int a,int b) {return (~(a|b))|a;}
Should fold to "a|~b". Currently not optimized with "clang
-emit-llvm-bc | opt -O3".
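
The fold can be checked mechanically. The sketch below (function names are
made up for illustration) compares the two forms over a small range; because
the operators are purely bitwise, each bit position behaves independently, so
a range that covers every combination of bit values is a reasonable check.

```c
#include <assert.h>

/* Form as written in the entry above. */
static int or_not_original(int a, int b) { return (~(a | b)) | a; }

/* Proposed folded form. */
static int or_not_folded(int a, int b) { return a | ~b; }

/* Compares both forms over a range that exercises every per-bit case. */
static void check_or_not_fold(void) {
  int a, b;
  for (a = -128; a < 128; a++)
    for (b = -128; b < 128; b++)
      assert(or_not_original(a, b) == or_not_folded(a, b));
}
```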

//===---------------------------------------------------------------------===//

int a(int a, int b) {return (a&&b) || (a&&!b);}
Should fold to "a". Currently not optimized with "clang -emit-llvm-bc
| opt -O3".

//===---------------------------------------------------------------------===//

int a(int a, int b, int c) {return (a&&b) || (!a&&c);}
Should fold to "a ? b : c", or at least something sane. Currently not
optimized with "clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

int a(int a, int b, int c) {return (a&&b) || (a&&c) || (a&&b&&c);}
Should fold to a && (b || c). Currently not optimized with "clang
-emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

int a(int x) {return x | ((x & 8) ^ 8);}
Should combine to x | 8. Currently not optimized with "clang
-emit-llvm-bc | opt -O3".
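
The reasoning is that (x & 8) ^ 8 is 8 when bit 3 of x is clear and 0 when it
is set, so OR-ing it into x always leaves bit 3 set. A small C check, with
hypothetical function names, might look like this:

```c
#include <assert.h>

/* Form as written in the entry above. */
static int set_bit3_original(int x) { return x | ((x & 8) ^ 8); }

/* Proposed combined form. */
static int set_bit3_folded(int x) { return x | 8; }

/* Compares both forms for inputs with bit 3 set and with bit 3 clear. */
static void check_set_bit3(void) {
  int x;
  for (x = -1024; x < 1024; x++)
    assert(set_bit3_original(x) == set_bit3_folded(x));
}
```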

//===---------------------------------------------------------------------===//

int a(int x) {return x ^ ((x & 8) ^ 8);}
Should also combine to x | 8. Currently not optimized with "clang
-emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

int a(int x) {return ((x | -9) ^ 8) & x;}
Should combine to x & -9. Currently not optimized with "clang
-emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

unsigned a(unsigned a) {return a * 0x11111111 >> 28 & 1;}
Should combine to "a * 0x88888888 >> 31". Currently not optimized
with "clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

unsigned a(char* x) {if ((*x & 32) == 0) return b();}
There's an unnecessary zext in the generated code with "clang
-emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

unsigned a(unsigned long long x) {return 40 * (x >> 1);}
Should combine to "20 * (((unsigned)x) & -2)".  Currently not
optimized with "clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

int g(int x) { return (x - 10) < 0; }
Should combine to "x <= 9" (the sub has nsw).  Currently not
optimized with "clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

int g(int x) { return (x + 10) < 0; }
Should combine to "x < -10" (the add has nsw).  Currently not
optimized with "clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

int f(int i, int j) { return i < j + 1; }
int g(int i, int j) { return j > i - 1; }
Should combine to "i <= j" (the add/sub has nsw).
Currently not optimized with "clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

unsigned f(unsigned x) { return ((x & 7) + 1) & 15; }
The & 15 part should be optimized away, since it doesn't change the result.
Currently not optimized with "clang -emit-llvm-bc | opt -O3".

//===---------------------------------------------------------------------===//

This was noticed in the entry block for grokdeclarator in 403.gcc:

  %tmp = icmp eq i32 %decl_context, 4
  %decl_context_addr.0 = select i1 %tmp, i32 3, i32 %decl_context
  %tmp1 = icmp eq i32 %decl_context_addr.0, 1
  %decl_context_addr.1 = select i1 %tmp1, i32 0, i32 %decl_context_addr.0

tmp1 should be simplified to something like:
  (!tmp && decl_context == 1)

which is just decl_context == 1, since decl_context == 1 already implies !tmp.

This allows recursive simplifications: tmp1 is used all over the place in
the function, e.g.
by:

  %tmp23 = icmp eq i32 %decl_context_addr.1, 0    ; <i1> [#uses=1]
  %tmp24 = xor i1 %tmp1, true                     ; <i1> [#uses=1]
  %or.cond8 = and i1 %tmp23, %tmp24               ; <i1> [#uses=1]

later.

//===---------------------------------------------------------------------===//

[STORE SINKING]

Store sinking: This code:

void f (int n, int *cond, int *res) {
  int i;
  *res = 0;
  for (i = 0; i < n; i++)
    if (*cond)
      *res ^= 234; /* (*) */
}

On this function GVN hoists the fully redundant value of *res, but nothing
moves the store out.
This gives us this code:

bb:             ; preds = %bb2, %entry
  %.rle = phi i32 [ 0, %entry ], [ %.rle6, %bb2 ]
  %i.05 = phi i32 [ 0, %entry ], [ %indvar.next, %bb2 ]
  %1 = load i32* %cond, align 4
  %2 = icmp eq i32 %1, 0
  br i1 %2, label %bb2, label %bb1

bb1:            ; preds = %bb
  %3 = xor i32 %.rle, 234
  store i32 %3, i32* %res, align 4
  br label %bb2

bb2:            ; preds = %bb, %bb1
  %.rle6 = phi i32 [ %3, %bb1 ], [ %.rle, %bb ]
  %indvar.next = add i32 %i.05, 1
  %exitcond = icmp eq i32 %indvar.next, %n
  br i1 %exitcond, label %return, label %bb

DSE should sink partially dead stores to get the store out of the loop.
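
For reference, here is a hand-written sketch of what the loop looks like once
the store is sunk.  It is an illustration, not output from any pass, and the
names f_original and f_sunk are hypothetical.  The initial "*res = 0" store is
kept ahead of the loop, so the two versions also agree when cond and res
point at the same int (the loop then never sees a nonzero *cond).

```c
#include <assert.h>

/* The loop from the entry above: stores to *res inside the loop. */
static void f_original(int n, int *cond, int *res) {
  int i;
  *res = 0;
  for (i = 0; i < n; i++)
    if (*cond)
      *res ^= 234;
}

/* Hand-sunk version: accumulate in a register, store once after the loop. */
static void f_sunk(int n, int *cond, int *res) {
  int i;
  int acc = 0;
  *res = 0;
  for (i = 0; i < n; i++)
    if (*cond)
      acc ^= 234;
  *res = acc;
}
```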

Here's another partially dead case:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12395

//===---------------------------------------------------------------------===//

Scalar PRE hoists the mul in the common block up to the else:

int test (int a, int b, int c, int g) {
  int d, e;
  if (a)
    d = b * c;
  else
    d = b - c;
  e = b * c + g;
  return d + e;
}

It would be better to do the mul once, above the if, to reduce code size.
This is GCC PR38204.
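
A sketch of the shape we would like for the "test" function above, with the
multiply done once before the branch.  The names test_original and
test_hoisted are made up for this comparison.

```c
#include <assert.h>

/* The function from the entry above, unchanged apart from its name. */
static int test_original(int a, int b, int c, int g) {
  int d, e;
  if (a)
    d = b * c;
  else
    d = b - c;
  e = b * c + g;
  return d + e;
}

/* Desired form: one multiply above the if, shared by both uses. */
static int test_hoisted(int a, int b, int c, int g) {
  int prod = b * c;
  int d = a ? prod : b - c;
  int e = prod + g;
  return d + e;
}
```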


//===---------------------------------------------------------------------===//
This simple function from 179.art:

int winner, numf2s;
struct { double y; int reset; } *Y;

void find_match() {
  int i;
  winner = 0;
  for (i=0;i<numf2s;i++)
    if (Y[i].y > Y[winner].y)
      winner = i;
}

Compiles into (with clang TBAA):

for.body:                                         ; preds = %for.inc, %bb.nph
  %indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.inc ]
  %i.01718 = phi i32 [ 0, %bb.nph ], [ %i.01719, %for.inc ]
  %tmp4 = getelementptr inbounds %struct.anon* %tmp3, i64 %indvar, i32 0
  %tmp5 = load double* %tmp4, align 8, !tbaa !4
  %idxprom7 = sext i32 %i.01718 to i64
  %tmp10 = getelementptr inbounds %struct.anon* %tmp3, i64 %idxprom7, i32 0
  %tmp11 = load double* %tmp10, align 8, !tbaa !4
  %cmp12 = fcmp ogt double %tmp5, %tmp11
  br i1 %cmp12, label %if.then, label %for.inc

if.then:                                          ; preds = %for.body
  %i.017 = trunc i64 %indvar to i32
  br label %for.inc

for.inc:                                          ; preds = %for.body, %if.then
  %i.01719 = phi i32 [ %i.01718, %for.body ], [ %i.017, %if.then ]
  %indvar.next = add i64 %indvar, 1
  %exitcond = icmp eq i64 %indvar.next, %tmp22
  br i1 %exitcond, label %for.cond.for.end_crit_edge, label %for.body


It is good that we hoisted the reloads of numf2s and Y out of the loop and
sunk the store to winner out.

However, this is awful on several levels.  First, there is a conditional
truncate in the loop (-indvars at fault? why can't we completely promote the
IV to i64?).

Beyond that, we have a partially redundant load in the loop: if "winner" (aka
%i.01718) isn't updated, we reload Y[winner].y the next time through the loop.
Similarly, the addressing that feeds it (including the sext) is redundant.
In the end we get this generated assembly:

LBB0_2:                                 ## %for.body
                                        ## =>This Inner Loop Header: Depth=1
        movsd   (%rdi), %xmm0
        movslq  %edx, %r8
        shlq    $4, %r8
        ucomisd (%rcx,%r8), %xmm0
        jbe     LBB0_4
        movl    %esi, %edx
LBB0_4:                                 ## %for.inc
        addq    $16, %rdi
        incq    %rsi
        cmpq    %rsi, %rax
        jne     LBB0_2

All things considered this isn't too bad, but we shouldn't need the movslq or
the shlq instruction, or the load folded into ucomisd every time through the
loop.

On an x86-specific topic, if the loop can't be restructured, the movl should be
a cmov.
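
One way to picture the restructured find_match loop is to keep Y[winner].y in
a local, which removes the partially redundant load and the addressing that
feeds it.  This is a source-level sketch under assumptions: the array and
count are passed as parameters instead of globals, and the names struct elem
and find_match_cached are hypothetical.

```c
#include <assert.h>

struct elem { double y; int reset; };

/* Returns the index the original loop would leave in "winner". */
static int find_match_cached(const struct elem *Y, int numf2s) {
  int winner = 0;
  if (numf2s <= 0)
    return winner;            /* original loop body never runs */
  double best = Y[0].y;       /* cached copy of Y[winner].y */
  for (int i = 0; i < numf2s; i++) {
    double cur = Y[i].y;
    if (cur > best) {         /* same ordered compare as the original */
      best = cur;
      winner = i;
    }
  }
  return winner;
}
```

The update of best/winner is the kind of pair that could become a cmov (or
select) rather than a branch.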

//===---------------------------------------------------------------------===//

[STORE SINKING]

GCC PR37810 is an interesting case where we should sink the load/store of *P
into the if block and out of the loop, so we don't reload/store it on the
non-call path.

for () {
  *P += 1;
  if ()
    call();
  else
    ...
}
->
tmp = *P;
for () {
  tmp += 1;
  if () {
    *P = tmp;
    call();
    tmp = *P;
  } else ...
}
*P = tmp;

We now hoist the reload after the call (Transforms/GVN/lpre-call-wrap.ll), but
we don't sink the store.  We need partially dead store sinking.

//===---------------------------------------------------------------------===//

[LOAD PRE CRIT EDGE SPLITTING]

GCC PR37166: Sinking of loads prevents SROA'ing the "g" struct on the stack
leading to excess stack traffic. This could be handled by GVN with some crazy
symbolic phi translation.  The code we get looks like (g is on the stack):

bb2:            ; preds = %bb1
..
  %9 = getelementptr %struct.f* %g, i32 0, i32 0
  store i32 %8, i32* %9, align 4
  br label %bb3

bb3:            ; preds = %bb1, %bb2, %bb
  %c_addr.0 = phi %struct.f* [ %g, %bb2 ], [ %c, %bb ], [ %c, %bb1 ]
  %b_addr.0 = phi %struct.f* [ %b, %bb2 ], [ %g, %bb ], [ %b, %bb1 ]
  %10 = getelementptr %struct.f* %c_addr.0, i32 0, i32 0
  %11 = load i32* %10, align 4

%11 is partially redundant, and in BB2 it should have the value %8.

GCC PR33344 and PR35287 are similar cases.


//===---------------------------------------------------------------------===//

[LOAD PRE]

There are many load PRE testcases in testsuite/gcc.dg/tree-ssa/loadpre* in the
GCC testsuite; the ones we don't get yet are (checked through loadpre25):

[CRIT EDGE BREAKING]
predcom-4.c

[PRE OF READONLY CALL]
loadpre5.c

[TURN SELECT INTO BRANCH]
loadpre14.c loadpre15.c

actually a conditional increment: loadpre18.c loadpre19.c

//===---------------------------------------------------------------------===//

[LOAD PRE / STORE SINKING / SPEC HACK]

This is a chunk of code from 456.hmmer:

int f(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int *dpp,
      int *tpdm, int xmb, int *bp, int *ms) {
  int k, sc;
  for (k = 1; k <= M; k++) {
    mc[k] = mpp[k-1] + tpmm[k-1];
    if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc;
    if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
    if ((sc = xmb + bp[k]) > mc[k]) mc[k] = sc;
    mc[k] += ms[k];
  }
}

It is very profitable for this benchmark to turn the conditional stores to mc[k]
into a conditional move (select instr in IR) and allow the final store to do the
store.  See GCC PR27313 for more details.  Note that this transformation is
valid even with the new C++ memory model, since mc[k] is previously loaded and
later stored.

//===---------------------------------------------------------------------===//

[SCALAR PRE]
There are many PRE testcases in testsuite/gcc.dg/tree-ssa/ssa-pre-*.c in the
GCC testsuite.

//===---------------------------------------------------------------------===//

There are some interesting cases in testsuite/gcc.dg/tree-ssa/predcom-*.c in
the GCC testsuite.  For example, we get the first example in predcom-1.c, but
miss the second one:

unsigned fib[1000];
unsigned avg[1000];

__attribute__ ((noinline))
void count_averages(int n) {
  int i;
  for (i = 1; i < n; i++)
    avg[i] = (((unsigned long) fib[i - 1] + fib[i] + fib[i + 1]) / 3) & 0xffff;
}

which compiles into two loads instead of one in the loop.

predcom-2.c is the same as predcom-1.c

predcom-3.c is very similar but needs loads feeding each other instead of
store->load.
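
For the count_averages loop above, the form we would like to reach carries the
two previously loaded fib values in registers, leaving one load per iteration.
A hand-written sketch, not compiler output: the second output array avg_pc and
the helper predcom_matches are hypothetical and exist only to compare the two
versions.  It assumes n is at most 999 so that fib[i + 1] stays in bounds.

```c
#include <assert.h>

static unsigned fib[1000];
static unsigned avg[1000];    /* written by the original loop */
static unsigned avg_pc[1000]; /* hypothetical output of the rewritten loop */

/* The loop from the entry above. */
static void count_averages(int n) {
  int i;
  for (i = 1; i < n; i++)
    avg[i] = (((unsigned long) fib[i - 1] + fib[i] + fib[i + 1]) / 3) & 0xffff;
}

/* Same computation with fib[i-1] and fib[i] carried across iterations. */
static void count_averages_predcom(int n) {
  if (n <= 1)
    return;
  unsigned long prev = fib[0], cur = fib[1];
  for (int i = 1; i < n; i++) {
    unsigned long next = fib[i + 1]; /* the only load from fib per iteration */
    avg_pc[i] = ((prev + cur + next) / 3) & 0xffff;
    prev = cur;
    cur = next;
  }
}

/* Fills fib with wrapping Fibonacci numbers and compares both loops. */
static int predcom_matches(int n) {
  fib[0] = 0;
  fib[1] = 1;
  for (int i = 2; i < 1000; i++)
    fib[i] = fib[i - 1] + fib[i - 2];
  count_averages(n);
  count_averages_predcom(n);
  for (int i = 1; i < n; i++)
    if (avg[i] != avg_pc[i])
      return 0;
  return 1;
}
```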


//===---------------------------------------------------------------------===//

[ALIAS ANALYSIS]

Type based alias analysis:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14705

We should do better analysis of posix_memalign.  At the least it should
no-capture its pointer argument, at best, we should know that the out-value
result doesn't point to anything (like malloc).  One example of this is in
SingleSource/Benchmarks/Misc/dt.c

//===---------------------------------------------------------------------===//

Interesting missed case because of control flow flattening (should be 2 loads):
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26629
With: llvm-gcc t2.c -S -o - -O0 -emit-llvm | llvm-as |
             opt -mem2reg -gvn -instcombine | llvm-dis
we miss it because we need 1) CRIT EDGE 2) MULTIPLE DIFFERENT
VALS PRODUCED BY ONE BLOCK OVER DIFFERENT PATHS

//===---------------------------------------------------------------------===//

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19633
We could eliminate the branch condition here, since loading from null is
undefined:

struct S { int w, x, y, z; };
struct T { int r; struct S s; };
void bar (struct S, int);
void foo (int a, struct T b)
{
  struct S *c = 0;
  if (a)
    c = &b.s;
  bar (*c, a);
}

//===---------------------------------------------------------------------===//

simplifylibcalls should do several optimizations for strspn/strcspn:

strcspn(x, "a") -> inlined loop for up to 3 letters (similarly for strspn):

size_t __strcspn_c3 (__const char *__s, int __reject1, int __reject2,
                     int __reject3) {
  register size_t __result = 0;
  while (__s[__result] != '\0' && __s[__result] != __reject1 &&
         __s[__result] != __reject2 && __s[__result] != __reject3)
    ++__result;
  return __result;
}

This should turn into a switch on the character.  See PR3253 for some notes on
codegen.

456.hmmer apparently uses strcspn and strspn a lot.  471.omnetpp uses strspn.

//===---------------------------------------------------------------------===//

simplifylibcalls should turn these snprintf idioms into memcpy (GCC PR47917)

char buf1[6], buf2[6], buf3[4], buf4[4];
int i;

int foo (void) {
  int ret = snprintf (buf1, sizeof buf1, "abcde");
  ret += snprintf (buf2, sizeof buf2, "abcdef") * 16;
  ret += snprintf (buf3, sizeof buf3, "%s", i++ < 6 ? "abc" : "def") * 256;
  ret += snprintf (buf4, sizeof buf4, "%s", i++ > 10 ? "abcde" : "defgh")*4096;
  return ret;
}

//===---------------------------------------------------------------------===//

"gas" uses this idiom:
  else if (strchr ("+-/*%|&^:[]()~", *intel_parser.op_string))
..
  else if (strchr ("<>", *intel_parser.op_string))

Those should be turned into a switch.  SimplifyLibCalls only gets the second
case.

//===---------------------------------------------------------------------===//

252.eon contains this interesting code:

  %3072 = getelementptr [100 x i8]* %tempString, i32 0, i32 0
  %3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
  %strlen = call i32 @strlen(i8* %3072)    ; uses = 1
  %endptr = getelementptr [100 x i8]* %tempString, i32 0, i32 %strlen
  call void @llvm.memcpy.i32(i8* %endptr,
    i8* getelementptr ([5 x i8]* @"\01LC42", i32 0, i32 0), i32 5, i32 1)
  %3074 = call i32 @strlen(i8* %endptr) nounwind readonly

This is interesting for a couple of reasons.  First, the strlen that follows
the memcpy can be replaced with:

  %3074 = call i32 @strlen([5 x i8]* @"\01LC42") nounwind readonly

because the string was just copied into the specified memory buffer.  This,
in turn, can be constant folded to "4".

In other code, it contains:

  %endptr6978 = bitcast i8* %endptr69 to i32*
  store i32 7107374, i32* %endptr6978, align 1
  %3167 = call i32 @strlen(i8* %endptr69) nounwind readonly

Which could also be constant folded.  Whatever is producing this should probably
be fixed to leave this as a memcpy from a string.
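
To see why the i32 store above folds, note that 7107374 is 0x006C732E.  On a
little-endian target (an assumption here, the IR itself does not say) the
bytes land in memory as '.', 's', 'l', 0, so the strlen is 3.  The helper
below is a hypothetical illustration that lays the bytes out explicitly
rather than relying on the host byte order:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Length of the C string formed by a 32-bit value stored little-endian.
   Returns 4 when the value contains no zero byte, meaning "at least 4". */
static size_t strlen_of_le_store(uint32_t value) {
  char bytes[5];
  for (int i = 0; i < 4; ++i)
    bytes[i] = (char)((value >> (8 * i)) & 0xff);
  bytes[4] = '\0'; /* guard so strlen always terminates */
  return strlen(bytes);
}
```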

Further, eon also has an interesting partially redundant strlen call:

bb8:            ; preds = %_ZN18eonImageCalculatorC1Ev.exit
  %682 = getelementptr i8** %argv, i32 6          ; <i8**> [#uses=2]
  %683 = load i8** %682, align 4                  ; <i8*> [#uses=4]
  %684 = load i8* %683, align 1                   ; <i8> [#uses=1]
  %685 = icmp eq i8 %684, 0                       ; <i1> [#uses=1]
  br i1 %685, label %bb10, label %bb9

bb9:            ; preds = %bb8
  %686 = call i32 @strlen(i8* %683) nounwind readonly
  %687 = icmp ugt i32 %686, 254                   ; <i1> [#uses=1]
  br i1 %687, label %bb10, label %bb11

bb10:           ; preds = %bb9, %bb8
  %688 = call i32 @strlen(i8* %683) nounwind readonly

This could be eliminated by doing the strlen once in bb8, saving code size and
improving perf on the bb8->9->10 path.

//===---------------------------------------------------------------------===//

I see an interesting fully redundant call to strlen left in 186.crafty:InputMove
which looks like:
  %movetext11 = getelementptr [128 x i8]* %movetext, i32 0, i32 0


bb62:           ; preds = %bb55, %bb53
  %promote.0 = phi i32 [ %169, %bb55 ], [ 0, %bb53 ]
  %171 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1
  %172 = add i32 %171, -1                         ; <i32> [#uses=1]
  %173 = getelementptr [128 x i8]* %movetext, i32 0, i32 %172

... no stores ...
  br i1 %or.cond, label %bb65, label %bb72

bb65:           ; preds = %bb62
  store i8 0, i8* %173, align 1
  br label %bb72

bb72:           ; preds = %bb65, %bb62
  %trank.1 = phi i32 [ %176, %bb65 ], [ -1, %bb62 ]
  %177 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1

Note that on the bb62->bb72 path, the %177 strlen call is partially
redundant with the %171 call.  At worst, we could shove the %177 strlen call
up into the bb65 block, moving it out of the bb62->bb72 path.  However, note
that bb65 stores to the string, zeroing out the last byte.  This means that on
that path the value of %177 is actually just %171-1.  A sub is cheaper than a
strlen!

This pattern repeats several times, basically doing:

  A = strlen(P);
  P[A-1] = 0;
  B = strlen(P);

where it is "obvious" that B = A-1.
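
A small C sketch of why B = A-1 holds.  The function name is made up, and the
sketch assumes P is non-empty so that P[A-1] is in bounds:

```c
#include <string.h>

/* Bytes 0..a-2 are known to be non-NUL because strlen returned a,
   so after the store the first NUL sits at index a-1. */
size_t chop_last(char *p) {
    size_t a = strlen(p);   /* A */
    p[a - 1] = 0;           /* assumes a >= 1 */
    return a - 1;           /* what a second strlen(p) would return */
}
```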

//===---------------------------------------------------------------------===//

186.crafty has this interesting pattern with the "out.4543" variable:

call void @llvm.memcpy.i32(
        i8* getelementptr ([10 x i8]* @out.4543, i32 0, i32 0),
       i8* getelementptr ([7 x i8]* @"\01LC28700", i32 0, i32 0), i32 7, i32 1)
%101 = call @printf(i8* ...  @out.4543, i32 0, i32 0)) nounwind

It is basically doing:

  memcpy(globalarray, "string", 7);
  printf(...,  globalarray);

Anyway, by knowing that printf just reads the memory and forward substituting
the string directly into the printf, this eliminates reads from globalarray.
Since this pattern occurs frequently in crafty (due to the "DisplayTime" and
other similar functions) there are many stores to "out".  Once all the printfs
stop using "out", all that is left is the memcpys into it.  This should allow
globalopt to remove the "stored only" global.
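
The forward substitution can be sketched in C.  This sketch swaps printf for
snprintf purely so the output can be inspected; the function names and the
"%s" format are assumptions, not crafty's code:

```c
#include <stdio.h>
#include <string.h>

static char out[10];   /* plays the role of @out.4543 */

/* Before: copy the literal into the global, then format from the global. */
int show_before(char *dst, size_t n) {
    memcpy(out, "string", 7);
    return snprintf(dst, n, "%s", out);
}

/* After: the literal is substituted directly, so "out" is never read. */
int show_after(char *dst, size_t n) {
    return snprintf(dst, n, "%s", "string");
}
```

Both functions produce the same text; only the first one reads the global.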

//===---------------------------------------------------------------------===//

This code:

define inreg i32 @foo(i8* inreg %p) nounwind {
  %tmp0 = load i8* %p
  %tmp1 = ashr i8 %tmp0, 5
  %tmp2 = sext i8 %tmp1 to i32
  ret i32 %tmp2
}

could be dagcombine'd to a sign-extending load with a shift.
For example, on x86 this currently gets this:

        movb    (%eax), %al
        sarb    $5, %al
        movsbl  %al, %eax

while it could get this:

        movsbl  (%eax), %eax
        sarl    $5, %eax

//===---------------------------------------------------------------------===//

GCC PR31029:

int test(int x) { return 1-x == x; }     // --> return false
int test2(int x) { return 2-x == x; }    // --> return x == 1 ?

Always foldable for odd constants; what is the rule for even ones?

//===---------------------------------------------------------------------===//

PR 3381: GEP to field of size 0 inside a struct could be turned into GEP
for next field in struct (which is at same address).

For example: store of float into { {{}}, float } could be turned into a store to
the float directly.

//===---------------------------------------------------------------------===//

The arg promotion pass should make use of nocapture to make its alias analysis
stuff much more precise.
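
Returning to the GCC PR31029 question above about even constants, here is a
hedged sketch in wrapping 32-bit unsigned arithmetic.  C - x == x means
2*x == C (mod 2^32), which has no solution for odd C and exactly two for even
C: C/2 and C/2 + 2^31.  For signed int, where overflow is undefined, a compiler
may presumably keep only x == C/2.  The function names are illustrative:

```c
#include <stdint.h>

/* The original comparison, done in wrapping 32-bit arithmetic. */
int sub_eq(uint32_t c, uint32_t x) {
    return (uint32_t)(c - x) == x;
}

/* Folded form: never true for odd c, two candidate values for even c. */
int sub_eq_folded(uint32_t c, uint32_t x) {
    if (c & 1u)
        return 0;
    uint32_t half = c >> 1;
    return x == half || x == (half | 0x80000000u);
}
```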

//===---------------------------------------------------------------------===//

The following functions should be optimized to use a select instead of a
branch (from gcc PR40072):

char char_int(int m) {if(m>7) return 0; return m;}
int int_char(char m) {if(m>7) return 0; return m;}

//===---------------------------------------------------------------------===//

int func(int a, int b) { if (a & 0x80) b |= 0x80; else b &= ~0x80; return b; }

Generates this:

define i32 @func(i32 %a, i32 %b) nounwind readnone ssp {
entry:
  %0 = and i32 %a, 128                            ; <i32> [#uses=1]
  %1 = icmp eq i32 %0, 0                          ; <i1> [#uses=1]
  %2 = or i32 %b, 128                             ; <i32> [#uses=1]
  %3 = and i32 %b, -129                           ; <i32> [#uses=1]
  %b_addr.0 = select i1 %1, i32 %3, i32 %2        ; <i32> [#uses=1]
  ret i32 %b_addr.0
}

However, it's functionally equivalent to:

  b = (b & ~0x80) | (a & 0x80);

Which generates this:

define i32 @func(i32 %a, i32 %b) nounwind readnone ssp {
entry:
  %0 = and i32 %b, -129                           ; <i32> [#uses=1]
  %1 = and i32 %a, 128                            ; <i32> [#uses=1]
  %2 = or i32 %0, %1                              ; <i32> [#uses=1]
  ret i32 %2
}

This can be generalized for other forms:

  b = (b & ~0x80) | (a & 0x40) << 1;

//===---------------------------------------------------------------------===//

These two functions produce different code.
They shouldn't:

#include <stdint.h>

uint8_t p1(uint8_t b, uint8_t a) {
  b = (b & ~0xc0) | (a & 0xc0);
  return (b);
}

uint8_t p2(uint8_t b, uint8_t a) {
  b = (b & ~0x40) | (a & 0x40);
  b = (b & ~0x80) | (a & 0x80);
  return (b);
}

define zeroext i8 @p1(i8 zeroext %b, i8 zeroext %a) nounwind readnone ssp {
entry:
  %0 = and i8 %b, 63                              ; <i8> [#uses=1]
  %1 = and i8 %a, -64                             ; <i8> [#uses=1]
  %2 = or i8 %1, %0                               ; <i8> [#uses=1]
  ret i8 %2
}

define zeroext i8 @p2(i8 zeroext %b, i8 zeroext %a) nounwind readnone ssp {
entry:
  %0 = and i8 %b, 63                              ; <i8> [#uses=1]
  %.masked = and i8 %a, 64                        ; <i8> [#uses=1]
  %1 = and i8 %a, -128                            ; <i8> [#uses=1]
  %2 = or i8 %1, %0                               ; <i8> [#uses=1]
  %3 = or i8 %2, %.masked                         ; <i8> [#uses=1]
  ret i8 %3
}

//===---------------------------------------------------------------------===//

IPSCCP does not currently propagate argument dependent constants through
functions where it does not know all of the callers.  This includes functions
with normal external linkage as well as templates, C99 inline functions etc.
Specifically, it does nothing to:

define i32 @test(i32 %x, i32 %y, i32 %z) nounwind {
entry:
  %0 = add nsw i32 %y, %z
  %1 = mul i32 %0, %x
  %2 = mul i32 %y, %z
  %3 = add nsw i32 %1, %2
  ret i32 %3
}

define i32 @test2() nounwind {
entry:
  %0 = call i32 @test(i32 1, i32 2, i32 4) nounwind
  ret i32 %0
}

It would be interesting to extend IPSCCP to be able to handle simple cases like
this, where all of the arguments to a call are
constant.  Because IPSCCP runs
before inlining, trivial templates and inline functions are not yet inlined.
The results for a function + set of constant arguments should be memoized in a
map.

//===---------------------------------------------------------------------===//

The libcall constant folding stuff should be moved out of SimplifyLibcalls into
libanalysis' constantfolding logic.  This would allow IPSCCP to be able to
handle simple things like this:

static int foo(const char *X) { return strlen(X); }
int bar() { return foo("abcd"); }

//===---------------------------------------------------------------------===//

functionattrs doesn't know much about memcpy/memset.
This function should be
marked readnone rather than readonly, since it only twiddles local memory, but
functionattrs doesn't handle memset/memcpy/memmove aggressively:

struct X { int *p; int *q; };
int foo() {
 int i = 0, j = 1;
 struct X x, y;
 int **p;
 y.p = &i;
 x.q = &j;
 p = __builtin_memcpy (&x, &y, sizeof (int *));
 return **p;
}

This can be seen at:
$ clang t.c -S -o - -mkernel -O0 -emit-llvm | opt -functionattrs -S


//===---------------------------------------------------------------------===//

Missed instcombine transformation:
define i1 @a(i32 %x) nounwind readnone {
entry:
  %cmp = icmp eq i32 %x, 30
  %sub = add i32 %x, -30
  %cmp2 = icmp ugt i32 %sub, 9
  %or = or i1 %cmp, %cmp2
  ret i1 %or
}
This should be optimized to a single compare.  Testcase derived from gcc.

//===---------------------------------------------------------------------===//

Missed instcombine or reassociate transformation:
int a(int a, int b) { return (a==12)&(b>47)&(b<58); }

The sgt and slt should be combined into a single comparison. Testcase derived
from gcc.

//===---------------------------------------------------------------------===//

Missed instcombine transformation:

  %382 = srem i32 %tmp14.i, 64                    ; [#uses=1]
  %383 = zext i32 %382 to i64                     ; [#uses=1]
  %384 = shl i64 %381, %383                       ; [#uses=1]
  %385 = icmp slt i32 %tmp14.i, 64                ; [#uses=1]

The srem can be transformed to an and because if %tmp14.i is negative, the
shift is undefined.  Testcase derived from 403.gcc.
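
The three missed transformations above each rest on a small arithmetic
identity.  The following C sketch spells out one possible target form for
each; the function names are made up, and the exact forms instcombine would
pick are an assumption:

```c
#include <stdint.h>

/* @a above: (x == 30) | ((x - 30) >u 9) means "x is not in [31, 39]",
   which is one unsigned compare after subtracting 31. */
int a_orig(uint32_t x)   { return (x == 30u) | ((uint32_t)(x - 30u) > 9u); }
int a_single(uint32_t x) { return (uint32_t)(x - 31u) > 8u; }

/* (b > 47) & (b < 58): a signed range check becomes one unsigned compare. */
int range_orig(int b)   { return (b > 47) & (b < 58); }
int range_single(int b) { return ((unsigned)b - 48u) < 10u; }

/* srem by 64 against a mask: the two agree whenever x is non-negative,
   which is the only case where the later shift is defined. */
int rem_orig(int x) { return x % 64; }
int rem_mask(int x) { return x & 63; }
```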

//===---------------------------------------------------------------------===//

This is a range comparison on a divided result (from 403.gcc):

  %1337 = sdiv i32 %1336, 8                       ; [#uses=1]
  %.off.i208 = add i32 %1336, 7                   ; [#uses=1]
  %1338 = icmp ult i32 %.off.i208, 15             ; [#uses=1]

We already catch this (removing the sdiv) if there isn't an add; we should
handle the 'add' as well.  This is a common idiom with its builtin_alloca code.
C testcase:

int a(int x) { return (unsigned)(x/16+7) < 15; }

Another similar case involves truncations on 64-bit targets:

  %361 = sdiv i64 %.046, 8                        ; [#uses=1]
  %362 = trunc i64 %361 to i32                    ; [#uses=2]
...
  %367 = icmp eq i32 %362, 0                      ; [#uses=1]

//===---------------------------------------------------------------------===//

Missed instcombine/dagcombine transformation:
define void @lshift_lt(i8 zeroext %a) nounwind {
entry:
  %conv = zext i8 %a to i32
  %shl = shl i32 %conv, 3
  %cmp = icmp ult i32 %shl, 33
  br i1 %cmp, label %if.then, label %if.end

if.then:
  tail call void @bar() nounwind
  ret void

if.end:
  ret void
}
declare void @bar() nounwind

The shift should be eliminated.  Testcase derived from gcc.
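
A short C sketch of the lshift_lt case, with illustrative function names.
Because %a is zero-extended from 8 bits, a << 3 cannot overflow 32 bits, so
8*a < 33 is the same as a < 5:

```c
#include <stdint.h>

/* Original form: widen, shift, then compare against 33. */
int lt_shifted(uint8_t a) { return ((uint32_t)a << 3) < 33u; }

/* Shift-free form: 8*a < 33 holds exactly when a <= 4. */
int lt_plain(uint8_t a)   { return a < 5; }
```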

//===---------------------------------------------------------------------===//

These compile into different code, one gets recognized as a switch and the
other doesn't due to phase ordering issues (PR6212):

int test1(int mainType, int subType) {
  if (mainType == 7)
    subType = 4;
  else if (mainType == 9)
    subType = 6;
  else if (mainType == 11)
    subType = 9;
  return subType;
}

int test2(int mainType, int subType) {
  if (mainType == 7)
    subType = 4;
  if (mainType == 9)
    subType = 6;
  if (mainType == 11)
    subType = 9;
  return subType;
}

//===---------------------------------------------------------------------===//

The following test case (from PR6576):

define i32 @mul(i32 %a, i32 %b) nounwind readnone {
entry:
  %cond1 = icmp eq i32 %b, 0                      ; <i1> [#uses=1]
  br i1 %cond1, label %exit, label %bb.nph
bb.nph:                                           ; preds = %entry
  %tmp = mul i32 %b, %a                           ; <i32> [#uses=1]
  ret i32 %tmp
exit:                                             ; preds = %entry
  ret i32 0
}

could be reduced to:

define i32 @mul(i32 %a, i32 %b) nounwind readnone {
entry:
  %tmp = mul i32 %b, %a
  ret i32 %tmp
}

//===---------------------------------------------------------------------===//

We should use DSE + llvm.lifetime.end to delete dead vtable pointer updates.
See GCC PR34949

Another interesting case is that something related could be used for variables
that go const after their ctor has finished.
In these cases, globalopt (which
can statically run the constructor) could mark the global const (so it gets put
in the readonly section).  A testcase would be:

#include <complex>
using namespace std;
const complex<char> should_be_in_rodata (42,-42);
complex<char> should_be_in_data (42,-42);
complex<char> should_be_in_bss;

Where we currently evaluate the ctors but the globals don't become const because
the optimizer doesn't know they "become const" after the ctor is done.  See
GCC PR4131 for more examples.

//===---------------------------------------------------------------------===//

In this code:

long foo(long x) {
  return x > 1 ? x : 1;
}

LLVM emits a comparison with 1 instead of 0. 0 would be equivalent
and cheaper on most targets.
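
The equivalence claimed here can be checked with a two-line C sketch (the
function names are illustrative).  The only input where the two conditions
differ is x == 1, and there both arms yield 1:

```c
/* Compare against 1, as currently emitted. */
long max1_cmp1(long x) { return x > 1 ? x : 1; }

/* Compare against 0: differs only at x == 1, where x itself is 1. */
long max1_cmp0(long x) { return x > 0 ? x : 1; }
```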

LLVM prefers comparisons with zero over non-zero in general, but in this
case it chooses instead to keep the max operation obvious.

//===---------------------------------------------------------------------===//

define void @a(i32 %x) nounwind {
entry:
  switch i32 %x, label %if.end [
    i32 0, label %if.then
    i32 1, label %if.then
    i32 2, label %if.then
    i32 3, label %if.then
    i32 5, label %if.then
  ]
if.then:
  tail call void @foo() nounwind
  ret void
if.end:
  ret void
}
declare void @foo()

Generated code on x86-64 (other platforms give similar results):
a:
        cmpl    $5, %edi
        ja      .LBB0_2
        cmpl    $4, %edi
        jne     .LBB0_3
.LBB0_2:
        ret
.LBB0_3:
        jmp     foo  # TAILCALL

If we wanted to be really clever, we could simplify the whole thing to
something like the following, which eliminates a branch:
        xorl    $1, %edi
        cmpl    $4, %edi
        jbe     .LBB0_2
        ret
.LBB0_2:
        jmp     foo  # TAILCALL

//===---------------------------------------------------------------------===//

We compile this:

int foo(int a) { return (a & (~15)) / 16; }

Into:

define i32 @foo(i32 %a) nounwind readnone ssp {
entry:
  %and = and i32 %a, -16
  %div = sdiv i32 %and, 16
  ret i32 %div
}

but this code (X & -A)/A is X >> log2(A) when A is a power of 2, so this case
should be instcombined into just "a >> 4".
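
A small C sketch of the (X & -A)/A identity for A = 16, with made-up function
names.  Masking off the low four bits leaves an exact multiple of 16, so the
rounding direction of the signed division no longer matters.  Note the
assumption: right-shifting a negative int is implementation-defined in C, and
the sketch relies on the arithmetic shift that GCC and Clang document:

```c
/* Division form, as written in the source above. */
int div_form(int a)   { return (a & ~15) / 16; }

/* Shift form: assumes >> on a negative int is an arithmetic shift. */
int shift_form(int a) { return a >> 4; }
```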

We do get this at the codegen level, so something knows about it, but
instcombine should catch it earlier:

_foo:                                   ## @foo
## BB#0:                                ## %entry
        movl    %edi, %eax
        sarl    $4, %eax
        ret

//===---------------------------------------------------------------------===//

This code (from GCC PR28685):

int test(int a, int b) {
  int lt = a < b;
  int eq = a == b;
  if (lt)
    return 1;
  return eq;
}

Is compiled to:

define i32 @test(i32 %a, i32 %b) nounwind readnone ssp {
entry:
  %cmp = icmp slt i32 %a, %b
  br i1 %cmp, label %return, label %if.end

if.end:                                           ; preds = %entry
  %cmp5 = icmp eq i32 %a, %b
  %conv6 = zext i1 %cmp5 to i32
  ret i32 %conv6

return:                                           ; preds = %entry
  ret i32 1
}

it could be:

define i32 @test__(i32 %a, i32 %b) nounwind readnone ssp {
entry:
  %0 = icmp sle i32 %a, %b
  %retval = zext i1 %0 to i32
  ret i32 %retval
}

//===---------------------------------------------------------------------===//

This code can be seen in viterbi:

  %64 = call noalias i8* @malloc(i64 %62) nounwind
...
  %67 = call i64 @llvm.objectsize.i64(i8* %64, i1 false) nounwind
  %68 = call i8* @__memset_chk(i8* %64, i32 0, i64 %62, i64 %67) nounwind

llvm.objectsize.i64 should be taught about malloc/calloc, allowing it to
fold to %62.
This is a security win (overflows of malloc will get caught)
and also a performance win by exposing more memsets to the optimizer.

This occurs several times in viterbi.

Note that this would change the semantics of @llvm.objectsize which by its
current definition always folds to a constant. We also should make sure that
we remove checking in code like

  char *p = malloc(strlen(s)+1);
  __strcpy_chk(p, s, __builtin_objectsize(p, 0));

//===---------------------------------------------------------------------===//

clang -O3 currently compiles this code

int g(unsigned int a) {
  unsigned int c[100];
  c[10] = a;
  c[11] = a;
  unsigned int b = c[10] + c[11];
  if(b > a*2) a = 4;
  else a = 8;
  return a + 7;
}

into

define i32 @g(i32 %a) nounwind readnone {
  %add = shl i32 %a, 1
  %mul = shl i32 %a, 1
  %cmp = icmp ugt i32 %add, %mul
  %a.addr.0 = select i1 %cmp, i32 11, i32 15
  ret i32 %a.addr.0
}

The icmp should fold to false. This CSE opportunity is only available
after GVN and InstCombine have run.

//===---------------------------------------------------------------------===//

memcpyopt should turn this:

define i8* @test10(i32 %x) {
  %alloc = call noalias i8* @malloc(i32 %x) nounwind
  call void @llvm.memset.p0i8.i32(i8* %alloc, i8 0, i32 %x, i32 1, i1 false)
  ret i8* %alloc
}

into a call to calloc. We should make sure that we analyze calloc as
aggressively as malloc though.
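
At the source level, the rewrite corresponds roughly to the following C
sketch. The function names are hypothetical, and the null check in the first
function is an addition that keeps the C well-defined; the IR above has no
such check.

```c
#include <stdlib.h>
#include <string.h>

/* Shape of the code before the transform: allocate, then clear.
   The null check is an addition for this sketch only. */
static void *alloc_then_clear(size_t n) {
  void *p = malloc(n);
  if (p)
    memset(p, 0, n);
  return p;
}

/* Shape of the code after the transform: one zeroing allocation. */
static void *alloc_cleared(size_t n) { return calloc(1, n); }

/* Returns 1 if the first n bytes at p are all zero, else 0. */
static int all_zero(const void *p, size_t n) {
  const unsigned char *bytes = (const unsigned char *)p;
  for (size_t i = 0; i < n; ++i)
    if (bytes[i] != 0)
      return 0;
  return 1;
}
```

Both allocators hand back memory whose first n bytes read as zero, which is
the property the transform relies on.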

//===---------------------------------------------------------------------===//

clang -O3 doesn't optimize this:

void f1(int* begin, int* end) {
  std::fill(begin, end, 0);
}

into a memset. This is PR8942.

//===---------------------------------------------------------------------===//

clang -O3 -fno-exceptions currently compiles this code:

void f(int N) {
  std::vector<int> v(N);

  extern void sink(void*); sink(&v);
}

into

define void @_Z1fi(i32 %N) nounwind {
entry:
  %v2 = alloca [3 x i32*], align 8
  %v2.sub = getelementptr inbounds [3 x i32*]* %v2, i64 0, i64 0
  %tmpcast = bitcast [3 x i32*]* %v2 to %"class.std::vector"*
  %conv = sext i32 %N to i64
  store i32* null, i32** %v2.sub, align 8, !tbaa !0
  %tmp3.i.i.i.i.i = getelementptr inbounds [3 x i32*]* %v2, i64 0, i64 1
  store i32* null, i32** %tmp3.i.i.i.i.i, align 8, !tbaa !0
  %tmp4.i.i.i.i.i = getelementptr inbounds [3 x i32*]* %v2, i64 0, i64 2
  store i32* null, i32** %tmp4.i.i.i.i.i, align 8, !tbaa !0
  %cmp.i.i.i.i = icmp eq i32 %N, 0
  br i1 %cmp.i.i.i.i, label %_ZNSt12_Vector_baseIiSaIiEEC2EmRKS0_.exit.thread.i.i, label %cond.true.i.i.i.i

_ZNSt12_Vector_baseIiSaIiEEC2EmRKS0_.exit.thread.i.i: ; preds = %entry
  store i32* null, i32** %v2.sub, align 8, !tbaa !0
  store i32* null, i32** %tmp3.i.i.i.i.i, align 8, !tbaa !0
  %add.ptr.i5.i.i = getelementptr inbounds i32* null, i64 %conv
  store i32* %add.ptr.i5.i.i, i32** %tmp4.i.i.i.i.i, align 8, !tbaa !0
  br label %_ZNSt6vectorIiSaIiEEC1EmRKiRKS0_.exit

cond.true.i.i.i.i:                                ; preds = %entry
  %cmp.i.i.i.i.i = icmp slt i32 %N, 0
  br i1 %cmp.i.i.i.i.i, label %if.then.i.i.i.i.i, label %_ZNSt12_Vector_baseIiSaIiEEC2EmRKS0_.exit.i.i

if.then.i.i.i.i.i:                                ; preds = %cond.true.i.i.i.i
  call void @_ZSt17__throw_bad_allocv() noreturn nounwind
  unreachable

_ZNSt12_Vector_baseIiSaIiEEC2EmRKS0_.exit.i.i:    ; preds = %cond.true.i.i.i.i
  %mul.i.i.i.i.i = shl i64 %conv, 2
  %call3.i.i.i.i.i = call noalias i8* @_Znwm(i64 %mul.i.i.i.i.i) nounwind
  %0 = bitcast i8* %call3.i.i.i.i.i to i32*
  store i32* %0, i32** %v2.sub, align 8, !tbaa !0
  store i32* %0, i32** %tmp3.i.i.i.i.i, align 8, !tbaa !0
  %add.ptr.i.i.i = getelementptr inbounds i32* %0, i64 %conv
  store i32* %add.ptr.i.i.i, i32** %tmp4.i.i.i.i.i, align 8, !tbaa !0
  call void @llvm.memset.p0i8.i64(i8* %call3.i.i.i.i.i, i8 0, i64 %mul.i.i.i.i.i, i32 4, i1 false)
  br label %_ZNSt6vectorIiSaIiEEC1EmRKiRKS0_.exit

This is just the code handling the construction of the vector. Most surprising
here is the fact that all three null stores in %entry are dead but are not
removed (because we do no cross-block DSE).

Also surprising is that %conv isn't simplified to 0 in %....exit.thread.i.i.
This is because the client of LazyValueInfo doesn't simplify all instruction
operands, just selected ones.

//===---------------------------------------------------------------------===//

clang -O3 -fno-exceptions currently compiles this code:

void f(char* a, int n) {
  __builtin_memset(a, 0, n);
  for (int i = 0; i < n; ++i)
    a[i] = 0;
}

into:

define void @_Z1fPci(i8* nocapture %a, i32 %n) nounwind {
entry:
  %conv = sext i32 %n to i64
  tail call void @llvm.memset.p0i8.i64(i8* %a, i8 0, i64 %conv, i32 1, i1 false)
  %cmp8 = icmp sgt i32 %n, 0
  br i1 %cmp8, label %for.body.lr.ph, label %for.end

for.body.lr.ph:                                   ; preds = %entry
  %tmp10 = add i32 %n, -1
  %tmp11 = zext i32 %tmp10 to i64
  %tmp12 = add i64 %tmp11, 1
  call void @llvm.memset.p0i8.i64(i8* %a, i8 0, i64 %tmp12, i32 1, i1 false)
  ret void

for.end:                                          ; preds = %entry
  ret void
}

This shouldn't need the ((zext (%n - 1)) + 1) game, and it should ideally fold
the two memsets together.

The issue with the addition only occurs in 64-bit mode, and appears to be at
least partially caused by Scalar Evolution not keeping its cache updated: it
returns the "wrong" result immediately after indvars runs, but figures out the
expected result if it is run from scratch on IR resulting from running indvars.

//===---------------------------------------------------------------------===//

clang -O3 -fno-exceptions currently compiles this code:

struct S {
  unsigned short m1, m2;
  unsigned char m3, m4;
};

void f(int N) {
  std::vector<S> v(N);
  extern void sink(void*); sink(&v);
}

into poor code for zero-initializing 'v' when N is >0.
The problem is that S is only 6 bytes, but each element is 8 byte-aligned.
We generate a loop and 4 stores on each iteration. If the struct were 8 bytes,
this would get turned into a memset.

In order to handle this we have to:
  A) Teach clang to generate metadata for memsets of structs that have holes in
     them.
  B) Teach clang to use such a memset for zero init of this struct (since it has
     a hole), instead of doing elementwise zeroing.

//===---------------------------------------------------------------------===//

clang -O3 currently compiles this code:

extern const int magic;
double f() { return 0.0 * magic; }

into

@magic = external constant i32

define double @_Z1fv() nounwind readnone {
entry:
  %tmp = load i32* @magic, align 4, !tbaa !0
  %conv = sitofp i32 %tmp to double
  %mul = fmul double %conv, 0.000000e+00
  ret double %mul
}

We should be able to fold away this fmul to 0.0. More generally, fmul(x,0.0)
can be folded to 0.0 if we can prove that the LHS is not -0.0, not a NaN, and
not an INF. The CannotBeNegativeZero predicate in value tracking should be
extended to support general "fpclassify" operations that can return
yes/no/unknown for each of these predicates.

In this predicate, we know that uitofp is trivially never NaN or -0.0, and
we know that it isn't +/-Inf if the floating point type has enough exponent bits
to represent the largest integer value as < inf.

//===---------------------------------------------------------------------===//

When optimizing a transformation that can change the sign of 0.0 (such as the
0.0*val -> 0.0 transformation above), it might be provable that the sign of the
expression doesn't matter. For example, by the above rules, we can't transform
fmul(sitofp(x), 0.0) into 0.0, because x might be -1 and the result of the
expression is defined to be -0.0.
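
The sign problem can be observed directly with a small C sketch. The function
names are hypothetical, and the sketch assumes IEEE 754 doubles and a build
without fast-math style flags. It detects -0.0 by dividing 1.0 by the result,
since 1.0 / -0.0 is negative infinity under those assumptions.

```c
/* Mirrors fmul(sitofp(x), 0.0): convert a signed int to double and
   multiply by positive zero. */
static double mul_by_zero(int x) { return (double)x * 0.0; }

/* Returns 1 if the product is negative zero, else 0.
   Assumes IEEE 754 semantics: 1.0 / -0.0 == -inf, 1.0 / +0.0 == +inf. */
static int result_is_negative_zero(int x) {
  double r = mul_by_zero(x);
  return r == 0.0 && (1.0 / r) < 0.0;
}
```

Under these assumptions any negative x yields -0.0 while zero and positive x
yield +0.0, which is why the fold is only safe when the sign is known not to
matter.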

If we look at the uses of the fmul for example, we might be able to prove that
none of the uses care about the sign of zero. For example, if we have:

  fadd(fmul(sitofp(x), 0.0), 2.0)

Since we know that x+2.0 doesn't care about the sign of any zeros in X, we can
transform the fmul to 0.0, and then the fadd to 2.0.

//===---------------------------------------------------------------------===//

We should enhance memcpy/memmove/memset to allow a metadata node on them
indicating that some bytes of the transfer are undefined. This is useful for
frontends like clang when lowering struct copies, when some elements of the
struct are undefined.
Consider something like this:

struct x {
  char a;
  int b[4];
};
void foo(struct x*P);
struct x testfunc() {
  struct x V1, V2;
  foo(&V1);
  V2 = V1;

  return V2;
}

We currently compile this to:
$ clang t.c -S -o - -O0 -emit-llvm | opt -sroa -S


%struct.x = type { i8, [4 x i32] }

define void @testfunc(%struct.x* sret %agg.result) nounwind ssp {
entry:
  %V1 = alloca %struct.x, align 4
  call void @foo(%struct.x* %V1)
  %tmp1 = bitcast %struct.x* %V1 to i8*
  %0 = bitcast %struct.x* %V1 to i160*
  %srcval1 = load i160* %0, align 4
  %tmp2 = bitcast %struct.x* %agg.result to i8*
  %1 = bitcast %struct.x* %agg.result to i160*
  store i160 %srcval1, i160* %1, align 4
  ret void
}

This happens because SRoA sees that the temp alloca is being memcpy'd into
and out of, and since it has holes, SRoA has to be conservative. If we knew
about the holes, then this could be much much better.

Having information about these holes would also improve memcpy (etc) lowering at
llc time when it gets inlined, because we can use smaller transfers. This also
avoids partial register stalls in some important cases.

//===---------------------------------------------------------------------===//

We don't fold (icmp (add) (add)) unless the two adds only have a single use.
There are a lot of cases that we're refusing to fold in (e.g.) 256.bzip2, for
example:

  %indvar.next90 = add i64 %indvar89, 1     ;; Has 2 uses
  %tmp96 = add i64 %tmp95, 1                ;; Has 1 use
  %exitcond97 = icmp eq i64 %indvar.next90, %tmp96

We don't fold this because we don't want to introduce an overlapped live range
of the ivar.
However, we can make this more aggressive without causing performance issues
in two ways:

1. If *either* the LHS or RHS has a single use, we can definitely do the
   transformation. In the overlapping liverange case we're trading one register
   use for one fewer operation, which is a reasonable trade. Before doing this
   we should verify that the llc output actually shrinks for some benchmarks.
2. If both ops have multiple uses, we can still fold it if the operations are
   both sinkable to *after* the icmp (e.g. in a subsequent block) which doesn't
   increase register pressure.

There are a ton of icmp's we aren't simplifying because of the reg pressure
concern. Care is warranted here though because many of these are induction
variables and other cases that matter a lot to performance, like the above.
Here's a blob of code that you can drop into the bottom of visitICmp to see some
missed cases:

  { Value *A, *B, *C, *D;
    if (match(Op0, m_Add(m_Value(A), m_Value(B))) &&
        match(Op1, m_Add(m_Value(C), m_Value(D))) &&
        (A == C || A == D || B == C || B == D)) {
      errs() << "OP0 = " << *Op0 << " U=" << Op0->getNumUses() << "\n";
      errs() << "OP1 = " << *Op1 << " U=" << Op1->getNumUses() << "\n";
      errs() << "CMP = " << I << "\n\n";
    }
  }

//===---------------------------------------------------------------------===//

define i1 @test1(i32 %x) nounwind {
  %and = and i32 %x, 3
  %cmp = icmp ult i32 %and, 2
  ret i1 %cmp
}

Can be folded to (x & 2) == 0.
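
The equivalence for @test1 is easy to confirm by brute force, since only the
low two bits of x matter. A minimal C sketch, with hypothetical helper names:

```c
/* Mirrors @test1: mask to the low two bits, then unsigned compare with 2. */
static int test1_orig(unsigned x) { return (x & 3) < 2; }

/* The proposed fold: only bit 1 decides the outcome. */
static int test1_folded(unsigned x) { return (x & 2) == 0; }

/* Returns 1 if both forms agree for every x in [0, limit), else 0. */
static int folds_agree(unsigned limit) {
  for (unsigned x = 0; x < limit; ++x)
    if (test1_orig(x) != test1_folded(x))
      return 0;
  return 1;
}
```

The masked value is 0 or 1 exactly when bit 1 is clear, so checking x = 0..3
already covers every distinct case; a larger limit just repeats them.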

define i1 @test2(i32 %x) nounwind {
  %and = and i32 %x, 3
  %cmp = icmp ugt i32 %and, 1
  ret i1 %cmp
}

Can be folded to (x & 2) != 0.

SimplifyDemandedBits shrinks the "and" constant to 2 but instcombine misses the
icmp transform.

//===---------------------------------------------------------------------===//

This code:

typedef struct {
int f1:1;
int f2:1;
int f3:1;
int f4:29;
} t1;

typedef struct {
int f1:1;
int f2:1;
int f3:30;
} t2;

t1 s1;
t2 s2;

void func1(void)
{
s1.f1 = s2.f1;
s1.f2 = s2.f2;
}

Compiles into this IR (on x86-64 at least):

%struct.t1 = type { i8, [3 x i8] }
@s2 = global %struct.t1 zeroinitializer, align 4
@s1 = global %struct.t1 zeroinitializer, align 4
define void @func1() nounwind ssp noredzone {
entry:
  %0 = load i32* bitcast (%struct.t1* @s2 to i32*), align 4
  %bf.val.sext5 = and i32 %0, 1
  %1 = load i32* bitcast (%struct.t1* @s1 to i32*), align 4
  %2 = and i32 %1, -4
  %3 = or i32 %2, %bf.val.sext5
  %bf.val.sext26 = and i32 %0, 2
  %4 = or i32 %3, %bf.val.sext26
  store i32 %4, i32* bitcast (%struct.t1* @s1 to i32*), align 4
  ret void
}

The two or's should be merged into one, and likewise the two and's of %0.
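
Written out on plain 32-bit words, the merge looks like the C sketch below.
The function names are hypothetical; s1 and s2 stand for the two loaded words
(%1 and %0 in the IR).

```c
#include <stdint.h>

/* Mirrors the IR above: clear the low two bits of s1, then or in
   bit 0 and bit 1 of s2 as two separate and/or pairs. */
static uint32_t copy_bits_unmerged(uint32_t s1, uint32_t s2) {
  uint32_t r = (s1 & ~UINT32_C(3)) | (s2 & UINT32_C(1));
  return r | (s2 & UINT32_C(2));
}

/* The merged form: one and of s2 with 3, one or. */
static uint32_t copy_bits_merged(uint32_t s1, uint32_t s2) {
  return (s1 & ~UINT32_C(3)) | (s2 & UINT32_C(3));
}
```

The two forms compute the same word because (s2 & 1) | (s2 & 2) is s2 & 3.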

//===---------------------------------------------------------------------===//

Machine level code hoisting can be useful in some cases. For example, PR9408
is about:

typedef union {
  void (*f1)(int);
  void (*f2)(long);
} funcs;

void foo(funcs f, int which) {
  int a = 5;
  if (which) {
    f.f1(a);
  } else {
    f.f2(a);
  }
}

which we compile to:

foo:                                    # @foo
# BB#0:                                 # %entry
        pushq   %rbp
        movq    %rsp, %rbp
        testl   %esi, %esi
        movq    %rdi, %rax
        je      .LBB0_2
# BB#1:                                 # %if.then
        movl    $5, %edi
        callq   *%rax
        popq    %rbp
        ret
.LBB0_2:                                # %if.else
        movl    $5, %edi
        callq   *%rax
        popq    %rbp
        ret

Note that bb1 and bb2 are the same. This doesn't happen at the IR level
because one call is passing an i32 and the other is passing an i64.

//===---------------------------------------------------------------------===//

I see this sort of pattern in 176.gcc in a few places (e.g. the start of
store_bit_field). The rem should be replaced with a multiply and subtract:

  %3 = sdiv i32 %A, %B
  %4 = srem i32 %A, %B

Similarly for udiv/urem. Note that this shouldn't be done on X86 or ARM,
which can do this in a single operation (instruction or libcall). It is
probably best to do this in the code generator.
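
A minimal C sketch of the multiply-and-subtract rewrite, assuming C99
truncating division so that a == (a / b) * b + a % b. The struct and function
names are made up for illustration. Callers must avoid b == 0 and
INT_MIN / -1, exactly as with the original sdiv/srem pair.

```c
#include <assert.h>

/* Hypothetical result type holding both values. */
struct divrem {
  int quot;
  int rem;
};

/* Computes quotient and remainder with a single division; the remainder
   is recovered as a - quot * b instead of a second divide. */
struct divrem divrem_one_division(int a, int b) {
  struct divrem r;
  r.quot = a / b;            /* the only division (sdiv) */
  r.rem  = a - r.quot * b;   /* srem as multiply + subtract */
  return r;
}
```

For example, divrem_one_division(-7, 3) gives quot == -2 and rem == -1,
matching what -7 / 3 and -7 % 3 produce in C99.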

//===---------------------------------------------------------------------===//

unsigned foo(unsigned x, unsigned y) { return (x & y) == 0 || x == 0; }
should fold to (x & y) == 0.

//===---------------------------------------------------------------------===//

unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; }
should fold to x > y.

//===---------------------------------------------------------------------===//
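
The two folds just above can be sanity-checked by brute force. The sketch
below is only a check over a small range, not a proof, and every function name
in it is hypothetical. The reasoning it exercises: x == 0 implies
(x & y) == 0, and an unsigned x > y implies x != 0, so the extra test is
redundant in both cases.

```c
#include <assert.h>

/* First fold: the "|| x == 0" arm adds nothing. */
unsigned and_or_original(unsigned x, unsigned y) { return (x & y) == 0 || x == 0; }
unsigned and_or_folded(unsigned x, unsigned y)   { return (x & y) == 0; }

/* Second fold: the "&& x != 0" arm adds nothing for unsigned values. */
unsigned gt_and_original(unsigned x, unsigned y) { return x > y && x != 0; }
unsigned gt_and_folded(unsigned x, unsigned y)   { return x > y; }

/* Compares original and folded forms for all x, y in [0, limit).
   Returns 1 if they agree everywhere, 0 otherwise. */
int folds_agree(unsigned limit) {
  for (unsigned x = 0; x < limit; ++x) {
    for (unsigned y = 0; y < limit; ++y) {
      if (and_or_original(x, y) != and_or_folded(x, y)) return 0;
      if (gt_and_original(x, y) != gt_and_folded(x, y)) return 0;
    }
  }
  return 1;
}
```

Calling folds_agree(256) covers every pair of 8-bit values.

//===---------------------------------------------------------------------===//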