//===---------------------------------------------------------------------===//
// Random ideas for the ARM backend.
//===---------------------------------------------------------------------===//

Reimplement 'select' in terms of 'SEL'.

* We would really like to support UXTAB16, but we need to prove that the
  add cannot overflow from one 16-bit chunk into the other.

* Implement pre/post increment support.  (e.g. PR935)
* Implement smarter constant generation for binops with large immediates.

A few ARMv6T2 ops should be pattern matched: BFI, SBFX, and UBFX.

Interesting optimization for PIC codegen on arm-linux:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129

//===---------------------------------------------------------------------===//

Crazy idea:  Consider code that uses lots of 8-bit or 16-bit values.  By the
time regalloc happens, these values are now in a 32-bit register, usually with
the top-bits known to be sign or zero extended.
If spilled, we should be able
to spill these to an 8-bit or 16-bit stack slot, zero or sign extending as part
of the reload.

Doing this reduces the size of the stack frame (important for thumb etc), and
also increases the likelihood that we will be able to reload multiple values
from the stack with a single load.

//===---------------------------------------------------------------------===//

The constant island pass is in good shape.  Some cleanups might be desirable,
but there is unlikely to be much improvement in the generated code.

1.  There may be some advantage to trying to be smarter about the initial
placement, rather than putting everything at the end.

2.  There might be some compile-time efficiency to be had by representing
consecutive islands as a single block rather than multiple blocks.

3.  Use a priority queue to sort constant pool users in inverse order of
    position so we always process the one closest to the end of functions
    first. This may simplify CreateNewWater.

//===---------------------------------------------------------------------===//

Eliminate copysign custom expansion. We are still generating crappy code with
default expansion + if-conversion.

//===---------------------------------------------------------------------===//

Eliminate one instruction from:

define i32 @_Z6slow4bii(i32 %x, i32 %y) {
        %tmp = icmp sgt i32 %x, %y
        %retval = select i1 %tmp, i32 %x, i32 %y
        ret i32 %retval
}

__Z6slow4bii:
        cmp r0, r1
        movgt r1, r0
        mov r0, r1
        bx lr
=>

__Z6slow4bii:
        cmp r0, r1
        movle r0, r1
        bx lr

//===---------------------------------------------------------------------===//

Implement long long "X-3" with
instructions that fold the immediate in.  These
were disabled due to badness with the ARM carry flag on subtracts.

//===---------------------------------------------------------------------===//

More load / store optimizations:
1) Better representation for block transfer? This is from Olden/power:

        fldd d0, [r4]
        fstd d0, [r4, #+32]
        fldd d0, [r4, #+8]
        fstd d0, [r4, #+40]
        fldd d0, [r4, #+16]
        fstd d0, [r4, #+48]
        fldd d0, [r4, #+24]
        fstd d0, [r4, #+56]

If we can spare the registers, it would be better to use fldm and fstm here.
Need major register allocator enhancement though.

2) Can we recognize the relative position of constantpool entries? i.e.
   Treat

        ldr r0, LCPI17_3
        ldr r1, LCPI17_4
        ldr r2, LCPI17_5

   as
        ldr r0, LCPI17
        ldr r1, LCPI17+4
        ldr r2, LCPI17+8

   Then the ldr's can be combined into a single ldm. See Olden/power.

Note for ARM v4 gcc uses ldmia to load a pair of 32-bit values to represent a
double 64-bit FP constant:

        adr     r0, L6
        ldmia   r0, {r0-r1}

        .align 2
L6:
        .long   -858993459
        .long   1074318540

3) struct copies appear to be done field by field
instead of by words, at least sometimes:

struct foo { int x; short s; char c1; char c2; };
void cpy(struct foo*a, struct foo*b) { *a = *b; }

llvm code (-O2)
        ldrb r3, [r1, #+6]
        ldr r2, [r1]
        ldrb r12, [r1, #+7]
        ldrh r1, [r1, #+4]
        str r2, [r0]
        strh r1, [r0, #+4]
        strb r3, [r0, #+6]
        strb r12, [r0, #+7]
gcc code (-O2)
        ldmia r1, {r1-r2}
        stmia r0, {r1-r2}

In this benchmark poor handling of aggregate copies has shown up as
having a large effect on size, and possibly speed as well (we don't have
a good way to measure on ARM).

//===---------------------------------------------------------------------===//

* Consider this silly example:

double bar(double x) {
  double r = foo(3.1);
  return x+r;
}

_bar:
        stmfd sp!, {r4, r5, r7, lr}
        add r7, sp, #8
        mov r4, r0
        mov r5, r1
        fldd d0, LCPI1_0
        fmrrd r0, r1, d0
        bl _foo
        fmdrr d0, r4, r5
        fmsr s2, r0
        fsitod d1, s2
        faddd d0, d1, d0
        fmrrd r0, r1, d0
        ldmfd sp!, {r4, r5, r7, pc}

Ignore the prologue and epilogue stuff for a second. Note
        mov r4, r0
        mov r5, r1
the copies to callee-save registers and the fact they are only being used by the
fmdrr instruction. It would have been better had the fmdrr been scheduled
before the call and placed the result in a callee-save DPR register. The two
mov ops would not have been necessary.

//===---------------------------------------------------------------------===//

Calling convention related stuff:

* gcc's parameter passing implementation is terrible and we suffer as a result:

e.g.
struct s {
  double d1;
  int s1;
};

void foo(struct s S) {
  printf("%g, %d\n", S.d1, S.s1);
}

'S' is passed via registers r0, r1, r2. But gcc stores them to the stack, and
then reloads them to r1, r2, and r3 before issuing the call (r0 contains the
address of the format string):

        stmfd   sp!, {r7, lr}
        add     r7, sp, #0
        sub     sp, sp, #12
        stmia   sp, {r0, r1, r2}
        ldmia   sp, {r1-r2}
        ldr     r0, L5
        ldr     r3, [sp, #8]
L2:
        add     r0, pc, r0
        bl      L_printf$stub

Instead of a stmia, ldmia, and a ldr, wouldn't it be better to do three moves?

* Returning an aggregate type is even worse:

e.g.
struct s foo(void) {
  struct s S = {1.1, 2};
  return S;
}

        mov     ip, r0
        ldr     r0, L5
        sub     sp, sp, #12
L2:
        add     r0, pc, r0
        @ lr needed for prologue
        ldmia   r0, {r0, r1, r2}
        stmia   sp, {r0, r1, r2}
        stmia   ip, {r0, r1, r2}
        mov     r0, ip
        add     sp, sp, #12
        bx      lr

r0 (and later ip) is the hidden parameter from caller to store the value in. The
first ldmia loads the constants into r0, r1, r2. The last stmia stores r0, r1,
r2 into the address passed in. However, there is one additional stmia that
stores r0, r1, and r2 to some stack location. The store is dead.

The llvm-gcc generated code looks like this:

csretcc void %foo(%struct.s* %agg.result) {
entry:
        %S = alloca %struct.s, align 4          ; <%struct.s*> [#uses=1]
        %memtmp = alloca %struct.s              ; <%struct.s*> [#uses=1]
        cast %struct.s* %S to sbyte*            ; <sbyte*>:0 [#uses=2]
        call void %llvm.memcpy.i32( sbyte* %0, sbyte* cast ({ double, int }* %C.0.904 to sbyte*), uint 12, uint 4 )
        cast %struct.s* %agg.result to sbyte*   ; <sbyte*>:1 [#uses=2]
        call void %llvm.memcpy.i32( sbyte* %1, sbyte* %0, uint 12, uint 0 )
        cast %struct.s* %memtmp to sbyte*       ; <sbyte*>:2 [#uses=1]
        call void %llvm.memcpy.i32( sbyte* %2, sbyte* %1, uint 12, uint 0 )
        ret void
}

llc ends up issuing two memcpy's (the first memcpy becomes 3 loads from
constantpool). Perhaps we should 1) fix llvm-gcc so the memcpy is translated
into a number of load and stores, or 2) custom lower memcpy (of small size) to
be ldmia / stmia. I think option 2 is better but the current register
allocator cannot allocate a chunk of registers at a time.

A feasible temporary solution is to use specific physical registers at the
lowering time for small (<= 4 words?) transfer size.

* ARM CSRet calling convention requires the hidden argument to be returned by
the callee.

//===---------------------------------------------------------------------===//

We can definitely do a better job on BB placements to eliminate some branches.
It's very common to see llvm generated assembly code that looks like this:

LBB3:
 ...
LBB4:
...
  beq LBB3
  b LBB2

If BB4 is the only predecessor of BB3, then we can emit BB3 after BB4. We can
then eliminate beq and turn the unconditional branch to LBB2 to a bne.

See McCat/18-imp/ComputeBoundingBoxes for an example.

//===---------------------------------------------------------------------===//

Pre-/post- indexed load / stores:

1) We should not make the pre/post- indexed load/store transform if the base ptr
is guaranteed to be live beyond the load/store. This can happen if the base
ptr is live out of the block we are performing the optimization. e.g.

mov r1, r2
ldr r3, [r1], #4
...

vs.

ldr r3, [r2]
add r1, r2, #4
...

In most cases, this is just a wasted optimization. However, sometimes it can
negatively impact the performance because two-address code is more restrictive
when it comes to scheduling.

Unfortunately, liveout information is currently unavailable during DAG combine
time.

2) Consider splitting an indexed load / store into a pair of add/sub + load/store
   to solve #1 (in TwoAddressInstructionPass.cpp).

3) Enhance LSR to generate more opportunities for indexed ops.

4) Once we have added support for multiple result patterns, write indexed load
   patterns instead of C++ instruction selection code.

5) Use VLDM / VSTM to emulate indexed FP load / store.

//===---------------------------------------------------------------------===//

Implement support for some more tricky ways to materialize immediates.  For
example, to get 0xffff8000, we can use:

mov r9, #&3f8000
sub r9, r9, #&400000

//===---------------------------------------------------------------------===//

We sometimes generate multiple add / sub instructions to update sp in prologue
and epilogue if the inc / dec value is too large to fit in a single immediate
operand.
In some cases, perhaps it might be better to load the value from a
constantpool instead.

//===---------------------------------------------------------------------===//

GCC generates significantly better code for this function.

int foo(int StackPtr, unsigned char *Line, unsigned char *Stack, int LineLen) {
    int i = 0;

    if (StackPtr != 0) {
       while (StackPtr != 0 && i < (((LineLen) < (32768))? (LineLen) : (32768)))
          Line[i++] = Stack[--StackPtr];
        if (LineLen > 32768)
        {
            while (StackPtr != 0 && i < LineLen)
            {
                i++;
                --StackPtr;
            }
        }
    }
    return StackPtr;
}

//===---------------------------------------------------------------------===//

This should compile to the mlas instruction:
int mlas(int x, int y, int z) { return ((x * y + z) < 0) ? 7 : 13; }

//===---------------------------------------------------------------------===//

At some point, we should triage these to see if they still apply to us:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19598
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18560
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11831
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11826
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11825
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11824
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11823
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11820
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10982

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10242
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9831
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9760
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9759
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9703
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9702
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9663

http://www.inf.u-szeged.hu/gcc-arm/
http://citeseer.ist.psu.edu/debus04linktime.html

//===---------------------------------------------------------------------===//

gcc generates smaller code for this function at -O2 or -Os:

void foo(signed char* p) {
  if (*p == 3)
     bar();
   else if (*p == 4)
    baz();
  else if (*p == 5)
    quux();
}

llvm decides it's a good idea to turn the repeated if...else into a
binary tree, as if it were a switch; the resulting code requires -1
compare-and-branches when *p<=2 or *p==5, the same number if *p==4
or *p>6, and +1 if *p==3.  So it should be a speed win
(on balance).  However, the revised code is larger, with 4 conditional
branches instead of 3.

More seriously, there is a byte->word extend before
each comparison, where there should be only one, and the condition codes
are not remembered when the same two values are compared twice.

//===---------------------------------------------------------------------===//

More LSR enhancements possible:

1. Teach LSR about pre- and post- indexed ops to allow the iv increment to be
   merged into a load / store.
2. Allow iv reuse even when a type conversion is required. For example, i8
   and i32 load / store addressing modes are identical.


//===---------------------------------------------------------------------===//

This:

int foo(int a, int b, int c, int d) {
  long long acc = (long long)a * (long long)b;
  acc += (long long)c * (long long)d;
  return (int)(acc >> 32);
}

Should compile to use SMLAL (Signed Multiply Accumulate Long) which multiplies
two signed 32-bit values to produce a 64-bit value, and accumulates this with
a 64-bit value.
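
As a rough illustration of the arithmetic involved, here is a minimal C model
of SMLAL and of foo rewritten as one widening multiply followed by one
multiply-accumulate. This is only a sketch: the names smlal_model and
foo_via_smlal are made up for this note and are not LLVM or ARM APIs, and the
model says nothing about register assignment or instruction selection.

```c
#include <stdint.h>

/* Hypothetical reference model of SMLAL: widen both 32-bit operands to
 * 64 bits, multiply, and add the product to a 64-bit accumulator.
 * The addition is done on unsigned values so that wraparound is
 * well defined in C, matching the modulo-2^64 behaviour of the hardware. */
static int64_t smlal_model(int64_t acc, int32_t a, int32_t b) {
    uint64_t product = (uint64_t)((int64_t)a * (int64_t)b);
    return (int64_t)((uint64_t)acc + product);
}

/* The function from the note above, expressed as the two steps an
 * SMULL + SMLAL sequence would perform. */
static int32_t foo_via_smlal(int32_t a, int32_t b, int32_t c, int32_t d) {
    int64_t acc = (int64_t)a * (int64_t)b;  /* what smull computes   */
    acc = smlal_model(acc, c, d);           /* what smlal would add  */
    /* Take the high 32 bits without shifting a negative signed value. */
    return (int32_t)(uint32_t)((uint64_t)acc >> 32);
}
```

With this model, foo_via_smlal(65536, 65536, 65536, 65536) accumulates
2^32 + 2^32 = 2^33, so the high word is 2.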

We currently get this with both v4 and v6:

_foo:
        smull r1, r0, r1, r0
        smull r3, r2, r3, r2
        adds r3, r3, r1
        adc r0, r2, r0
        bx lr

//===---------------------------------------------------------------------===//

This:
        #include <utility>
        std::pair<unsigned, bool> full_add(unsigned a, unsigned b)
        { return std::make_pair(a + b, a + b < a); }
        bool no_overflow(unsigned a, unsigned b)
        { return !full_add(a, b).second; }

Should compile to:

_Z8full_addjj:
        adds    r2, r1, r2
        movcc   r1, #0
        movcs   r1, #1
        str     r2, [r0, #0]
        strb    r1, [r0, #4]
        mov     pc, lr

_Z11no_overflowjj:
        cmn     r0, r1
        movcs   r0, #0
        movcc   r0, #1
        mov     pc, lr

not:

__Z8full_addjj:
        add     r3, r2, r1
        str     r3, [r0]
        mov     r2, #1
        mov     r12, #0
        cmp     r3, r1
        movlo   r12, r2
        str     r12, [r0, #+4]
        bx      lr
__Z11no_overflowjj:
        add     r3, r1, r0
        mov     r2, #1
        mov     r1, #0
        cmp     r3, r0
        movhs   r1, r2
        mov     r0, r1
        bx      lr

//===---------------------------------------------------------------------===//

Some of the NEON intrinsics may be appropriate for more general use, either
as target-independent intrinsics or perhaps elsewhere in the ARM backend.
Some of them may also be lowered to target-independent SDNodes, and perhaps
some new SDNodes could be added.

For example, maximum, minimum, and absolute value operations are well-defined
and standard operations, both for vector and scalar types.

The current NEON-specific intrinsics for count leading zeros and count one
bits could perhaps be replaced by the target-independent ctlz and ctpop
intrinsics.  It may also make sense to add a target-independent "ctls"
intrinsic for "count leading sign bits".  Likewise, the backend could use
the target-independent SDNodes for these operations.

ARMv6 has scalar saturating and halving adds and subtracts.  The same
intrinsics could possibly be used for both NEON's vector implementations of
those operations and the ARMv6 scalar versions.

//===---------------------------------------------------------------------===//

Split out LDR (literal) from the normal ARM LDR instruction. Also consider
splitting LDR into imm12 and so_reg forms. This allows us to clean up some
code, e.g. ARMLoadStoreOptimizer does not need to look at LDR (literal) and
LDR (so_reg), while ARMConstantIslandPass only needs to worry about
LDR (literal).

//===---------------------------------------------------------------------===//

Constant island pass should make use of full range SoImm values for LEApcrel.
Be careful though as the last attempt caused infinite looping on lencod.

//===---------------------------------------------------------------------===//

Predication issue.  This function:

extern unsigned array[ 128 ];
int foo( int x ) {
  int     y;
  y = array[ x & 127 ];
  if ( x & 128 )
     y = 123456789 & ( y >> 2 );
  else
     y = 123456789 & y;
  return y;
}

compiles to:

_foo:
        and r1, r0, #127
        ldr r2, LCPI1_0
        ldr r2, [r2]
        ldr r1, [r2, +r1, lsl #2]
        mov r2, r1, lsr #2
        tst r0, #128
        moveq r2, r1
        ldr r0, LCPI1_1
        and r0, r2, r0
        bx lr

It would be better to do something like this, to fold the shift into the
conditional move:

        and r1, r0, #127
        ldr r2, LCPI1_0
        ldr r2, [r2]
        ldr r1, [r2, +r1, lsl #2]
        tst r0, #128
        movne r1, r1, lsr #2
        ldr r0, LCPI1_1
        and r0, r1, r0
        bx lr

This saves an instruction and a register.

//===---------------------------------------------------------------------===//

It might be profitable to cse MOVi16 if there are lots of 32-bit immediates
with the same bottom half.

//===---------------------------------------------------------------------===//

Robert Muth started working on an alternate jump table implementation that
does not put the tables in-line in the text.
This is more like the llvm default jump table implementation.  This might be
useful sometime.  Several revisions of patches are on the mailing list,
beginning at:
http://lists.llvm.org/pipermail/llvm-dev/2009-June/022763.html

//===---------------------------------------------------------------------===//

Make use of the "rbit" instruction.

//===---------------------------------------------------------------------===//

Take a look at test/CodeGen/Thumb2/machine-licm.ll. ARM should be taught how
to licm and cse the unnecessary load from cp#1.

//===---------------------------------------------------------------------===//

The CMN instruction sets the flags like an ADD instruction, while CMP sets
them like a subtract. Therefore to be able to use CMN for comparisons other
than the Z bit, we'll need additional logic to reverse the conditionals
associated with the comparison. Perhaps a pseudo-instruction for the comparison,
with a post-codegen pass to clean up and handle the condition codes?
See PR5694 for testcase.
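
To illustrate why only the Z bit is safe, here is a rough C model of the flag
computation. It is a sketch, not backend code: the Flags struct and the
cmn_flags/cmp_flags helpers are hypothetical names introduced only for this
note. Replacing "CMP a, -b" with "CMN a, b" always gives the same Z, but C
differs when b is 0 and V differs when b is 0x80000000.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four ARM condition flags (illustrative model only). */
typedef struct { bool n, z, c, v; } Flags;

/* Flags as set by "CMN a, b": an ADDS whose result is discarded. */
static Flags cmn_flags(uint32_t a, uint32_t b) {
  uint32_t r = a + b;
  Flags f;
  f.n = (r >> 31) != 0;
  f.z = (r == 0);
  f.c = (r < a);                            /* carry out of bit 31 */
  f.v = ((~(a ^ b) & (a ^ r)) >> 31) != 0;  /* signed overflow on add */
  return f;
}

/* Flags as set by "CMP a, b": a SUBS whose result is discarded. */
static Flags cmp_flags(uint32_t a, uint32_t b) {
  uint32_t r = a - b;
  Flags f;
  f.n = (r >> 31) != 0;
  f.z = (r == 0);
  f.c = (a >= b);                           /* C set means "no borrow" */
  f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;   /* signed overflow on subtract */
  return f;
}
```

Comparing cmn_flags(a, b) against cmp_flags(a, 0u - b) for a few inputs shows
which condition codes would have to be rewritten by such a cleanup pass.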

//===---------------------------------------------------------------------===//

Given the following on armv5:
int test1(int A, int B) {
  return (A&-8388481)|(B&8388480);
}

We currently generate:
        ldr     r2, .LCPI0_0
        and     r0, r0, r2
        ldr     r2, .LCPI0_1
        and     r1, r1, r2
        orr     r0, r1, r0
        bx      lr

We should be able to replace the second ldr+and with a bic (i.e. reuse the
constant which was already loaded).  Not sure what's necessary to do that.
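
The rewrite works because the two masks are bitwise complements
(-8388481 == ~8388480). A small C sketch of the identity follows; the function
names are hypothetical, and it assumes the first loaded constant is the mask
applied to A.

```c
#include <stdint.h>

/* The expression from the test case, using two separate constants. */
static uint32_t test1_two_consts(uint32_t a, uint32_t b) {
  return (a & (uint32_t)-8388481) | (b & UINT32_C(8388480));
}

/* Same result with a single constant: AND for A, AND-NOT (BIC) for B. */
static uint32_t test1_one_const(uint32_t a, uint32_t b) {
  const uint32_t k = (uint32_t)-8388481;  /* 0xFF80007F, loaded once */
  return (a & k) | (b & ~k);              /* b & ~k is what BIC computes */
}
```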

//===---------------------------------------------------------------------===//

The code generated for bswap on armv4/5 (CPUs without rev) is less than ideal:

int a(int x) { return __builtin_bswap32(x); }

a:
        mov     r1, #255, 24
        mov     r2, #255, 16
        and     r1, r1, r0, lsr #8
        and     r2, r2, r0, lsl #8
        orr     r1, r1, r0, lsr #24
        orr     r0, r2, r0, lsl #24
        orr     r0, r0, r1
        bx      lr

Something like the following would be better (fewer instructions/registers):
        eor     r1, r0, r0, ror #16
        bic     r1, r1, #0xff0000
        mov     r1, r1, lsr #8
        eor     r0, r1, r0, ror #8
        bx      lr

A custom Thumb version would also be a slight improvement over the generic
version.
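
As a sanity check, the four-instruction sequence can be modelled in C, one
statement per instruction. This is only a sketch; ror32 and bswap32_four_ops
are hypothetical helper names, not anything in the backend.

```c
#include <stdint.h>

/* Rotate right by n bits; n must be in the range 1..31. */
static uint32_t ror32(uint32_t x, unsigned n) {
  return (x >> n) | (x << (32 - n));
}

/* Byte swap using the same four data operations as the proposed sequence. */
static uint32_t bswap32_four_ops(uint32_t x) {
  uint32_t t = x ^ ror32(x, 16);  /* eor r1, r0, r0, ror #16 */
  t &= ~UINT32_C(0x00ff0000);     /* bic r1, r1, #0xff0000   */
  t >>= 8;                        /* mov r1, r1, lsr #8      */
  return t ^ ror32(x, 8);         /* eor r0, r1, r0, ror #8  */
}
```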

//===---------------------------------------------------------------------===//

Consider the following simple C code:

void foo(unsigned char *a, unsigned char *b, int *c) {
 if ((*a | *b) == 0) *c = 0;
}

Currently llvm-gcc generates something like this (nice branchless code I'd
say):

        ldrb    r0, [r0]
        ldrb    r1, [r1]
        orr     r0, r1, r0
        tst     r0, #255
        moveq   r0, #0
        streq   r0, [r2]
        bx      lr

Note that both "tst" and "moveq" are redundant: the ldrb results are already
zero-extended, so an "orrs" would set the Z flag directly, and r0 is already
zero whenever the "eq" condition holds.

//===---------------------------------------------------------------------===//

When loading immediate constants with movt/movw, if there are multiple
constants needed with the same low 16 bits, and those values are not live at
the same time, it would be possible to use a single movw instruction, followed
by multiple movt instructions to rewrite the high bits to different values.
For example:

  volatile store i32 -1, i32* inttoptr (i32 1342210076 to i32*), align 4,
  !tbaa !0
  volatile store i32 -1, i32* inttoptr (i32 1342341148 to i32*), align 4,
  !tbaa !0

is compiled and optimized to:

        movw    r0, #32796
        mov.w   r1, #-1
        movt    r0, #20480
        str     r1, [r0]
        movw    r0, #32796    @ <= this MOVW is not needed, value is there already
        movt    r0, #20482
        str     r1, [r0]

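
The condition for dropping the second movw is just "same low halfword". A
minimal C sketch of that check, using the two addresses from the example; the
helper names are hypothetical and the liveness requirement is not modelled:

```c
#include <stdint.h>

/* Low 16 bits of an immediate: the part written by movw. */
static uint16_t movw_half(uint32_t imm) { return (uint16_t)(imm & 0xffffu); }

/* High 16 bits of an immediate: the part written by movt. */
static uint16_t movt_half(uint32_t imm) { return (uint16_t)(imm >> 16); }

/* Nonzero if a register holding `prev` can become `next` with a movt alone,
   since movt leaves the low halfword untouched. */
static int needs_only_movt(uint32_t prev, uint32_t next) {
  return movw_half(prev) == movw_half(next);
}
```

Here 1342210076 is 0x5000801C and 1342341148 is 0x5002801C, so both share the
low half 32796 and differ only in the movt operand (20480 versus 20482).
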
//===---------------------------------------------------------------------===//

Improve codegen for selects:
if (x != 0) x = 1
if (x == 1) x = 1

ARM codegen used to look like this:
        mov     r1, r0
        cmp     r1, #1
        mov     r0, #0
        moveq   r0, #1

The naive lowering selects between two different values. It should recognize
that the test is an equality test, so it is more a conditional move than a
select:
        cmp     r0, #1
        movne   r0, #0

Currently this is an ARM-specific dag combine. We probably should make it into
a target-neutral one.

//===---------------------------------------------------------------------===//

Optimize unnecessary checks for zero with __builtin_clz/ctz.  Those builtins
are specified to be undefined at zero, so portable code must check for zero
and handle it as a special case.  That is unnecessary on ARM where those
operations are implemented in a way that is well-defined for zero.
For example:

int f(int x) { return x ? __builtin_clz(x) : sizeof(int)*8; }

should just be implemented with a CLZ instruction.  Since there are other
targets, e.g., PPC, that share this behavior, it would be best to implement
this in a target-independent way: we should probably fold that (when using
"undefined at zero" semantics) to set the "defined at zero" bit and have
the code generator expand out the right code.

//===---------------------------------------------------------------------===//

Clean up the test/MC/ARM files to have more robust register choices.

R0 should not be used as a register operand in the assembler tests as it's then
not possible to distinguish between a correct encoding and a missing operand
encoding, as zero is the default value for the binary encoder.
e.g.,
    add r0, r0    // bad
    add r3, r5    // good

Register operands should be distinct.
That is, when the encoding does not require two syntactical operands to refer
to the same register, two different registers should be used in the test so as
to catch errors where the operands are swapped in the encoding.
e.g.,
    subs.w r1, r1, r1    // bad
    subs.w r1, r2, r3    // good