1*9880d681SAndroid Build Coastguard WorkerWed Jun 25 15:13:51 CDT 2003 2*9880d681SAndroid Build Coastguard Worker 3*9880d681SAndroid Build Coastguard WorkerFirst-level instrumentation 4*9880d681SAndroid Build Coastguard Worker--------------------------- 5*9880d681SAndroid Build Coastguard Worker 6*9880d681SAndroid Build Coastguard WorkerWe use opt to do Bytecode-to-bytecode instrumentation. Look at 7*9880d681SAndroid Build Coastguard Workerback-edges and insert llvm_first_trigger() function call which takes 8*9880d681SAndroid Build Coastguard Workerno arguments and no return value. This instrumentation is designed to 9*9880d681SAndroid Build Coastguard Workerbe easy to remove, for instance by writing a NOP over the function 10*9880d681SAndroid Build Coastguard Workercall instruction. 11*9880d681SAndroid Build Coastguard Worker 12*9880d681SAndroid Build Coastguard WorkerKeep count of every call to llvm_first_trigger(), and maintain 13*9880d681SAndroid Build Coastguard Workercounters in a map indexed by return address. If the trigger count 14*9880d681SAndroid Build Coastguard Workerexceeds a threshold, we identify a hot loop and perform second-level 15*9880d681SAndroid Build Coastguard Workerinstrumentation on the hot loop region (the instructions between the 16*9880d681SAndroid Build Coastguard Workertarget of the back-edge and the branch that causes the back-edge). We 17*9880d681SAndroid Build Coastguard Workerdo not move code across basic-block boundaries. 18*9880d681SAndroid Build Coastguard Worker 19*9880d681SAndroid Build Coastguard Worker 20*9880d681SAndroid Build Coastguard WorkerSecond-level instrumentation 21*9880d681SAndroid Build Coastguard Worker--------------------------- 22*9880d681SAndroid Build Coastguard Worker 23*9880d681SAndroid Build Coastguard WorkerWe remove the first-level instrumentation by overwriting the CALL to 24*9880d681SAndroid Build Coastguard Workerllvm_first_trigger() with a NOP. 25*9880d681SAndroid Build Coastguard Worker 26*9880d681SAndroid Build Coastguard WorkerThe reoptimizer maintains a map between machine-code basic blocks and 27*9880d681SAndroid Build Coastguard WorkerLLVM BasicBlock*s. We only keep track of paths that start at the 28*9880d681SAndroid Build Coastguard Workerfirst machine-code basic block of the hot loop region. 29*9880d681SAndroid Build Coastguard Worker 30*9880d681SAndroid Build Coastguard WorkerHow do we keep track of which edges to instrument, and which edges are 31*9880d681SAndroid Build Coastguard Workerexits from the hot region? 3 step process. 32*9880d681SAndroid Build Coastguard Worker 33*9880d681SAndroid Build Coastguard Worker1) Do a DFS from the first machine-code basic block of the hot loop 34*9880d681SAndroid Build Coastguard Workerregion and mark reachable edges. 35*9880d681SAndroid Build Coastguard Worker 36*9880d681SAndroid Build Coastguard Worker2) Do a DFS from the last machine-code basic block of the hot loop 37*9880d681SAndroid Build Coastguard Workerregion IGNORING back edges, and mark the edges which are reachable in 38*9880d681SAndroid Build Coastguard Worker1) and also in 2) (i.e., must be reachable from both the start BB and 39*9880d681SAndroid Build Coastguard Workerthe end BB of the hot region). 40*9880d681SAndroid Build Coastguard Worker 41*9880d681SAndroid Build Coastguard Worker3) Mark BBs which end in edges that exit the hot region; we need to 42*9880d681SAndroid Build Coastguard Workerinstrument these differently. 43*9880d681SAndroid Build Coastguard Worker 44*9880d681SAndroid Build Coastguard WorkerAssume that there is 1 free register. On SPARC we use %g1, which LLC 45*9880d681SAndroid Build Coastguard Workerhas agreed not to use. Shift a 1 into it at the beginning. At every 46*9880d681SAndroid Build Coastguard Workeredge which corresponds to a conditional branch, we shift 0 for not 47*9880d681SAndroid Build Coastguard Workertaken and 1 for taken into a register. This uniquely numbers the paths 48*9880d681SAndroid Build Coastguard Workerthrough the hot region. Silently fail if we need more than 64 bits. 49*9880d681SAndroid Build Coastguard Worker 50*9880d681SAndroid Build Coastguard WorkerAt the end BB we call countPath and increment the counter based on %g1 51*9880d681SAndroid Build Coastguard Workerand the return address of the countPath call. We keep track of the 52*9880d681SAndroid Build Coastguard Workernumber of iterations and the number of paths. We only run this 53*9880d681SAndroid Build Coastguard Workerversion 30 or 40 times. 54*9880d681SAndroid Build Coastguard Worker 55*9880d681SAndroid Build Coastguard WorkerFind the BBs that total 90% or more of execution, and aggregate them 56*9880d681SAndroid Build Coastguard Workertogether to form our trace. But we do not allow more than 5 paths; if 57*9880d681SAndroid Build Coastguard Workerwe have more than 5 we take the ones that are executed the most. We 58*9880d681SAndroid Build Coastguard Workerverify our assumption that we picked a hot back-edge in first-level 59*9880d681SAndroid Build Coastguard Workerinstrumentation, by making sure that the number of times we took an 60*9880d681SAndroid Build Coastguard Workerexit edge from the hot trace is less than 10% of the number of 61*9880d681SAndroid Build Coastguard Workeriterations. 62*9880d681SAndroid Build Coastguard Worker 63*9880d681SAndroid Build Coastguard WorkerLLC has been taught to recognize llvm_first_trigger() calls and NOT 64*9880d681SAndroid Build Coastguard Workergenerate saves and restores of caller-saved registers around these 65*9880d681SAndroid Build Coastguard Workercalls. 66*9880d681SAndroid Build Coastguard Worker 67*9880d681SAndroid Build Coastguard Worker 68*9880d681SAndroid Build Coastguard WorkerPhase behavior 69*9880d681SAndroid Build Coastguard Worker-------------- 70*9880d681SAndroid Build Coastguard Worker 71*9880d681SAndroid Build Coastguard WorkerWe turn off llvm_first_trigger() calls with NOPs, but this would hide 72*9880d681SAndroid Build Coastguard Workerphase behavior from us (when some funcs/traces stop being hot and 73*9880d681SAndroid Build Coastguard Workerothers become hot.) 74*9880d681SAndroid Build Coastguard Worker 75*9880d681SAndroid Build Coastguard WorkerWe have a SIGALRM timer that counts time for us. Every time we get a 76*9880d681SAndroid Build Coastguard WorkerSIGALRM we look at our priority queue of locations where we have 77*9880d681SAndroid Build Coastguard Workerremoved llvm_first_trigger() calls. Each location is inserted along 78*9880d681SAndroid Build Coastguard Workerwith a time when we will next turn instrumentation back on for that 79*9880d681SAndroid Build Coastguard Workercall site. If the time has arrived for a particular call site, we pop 80*9880d681SAndroid Build Coastguard Workerthat off the prio. queue and turn instrumentation back on for that 81*9880d681SAndroid Build Coastguard Workercall site. 82*9880d681SAndroid Build Coastguard Worker 83*9880d681SAndroid Build Coastguard Worker 84*9880d681SAndroid Build Coastguard WorkerGenerating traces 85*9880d681SAndroid Build Coastguard Worker----------------- 86*9880d681SAndroid Build Coastguard Worker 87*9880d681SAndroid Build Coastguard WorkerWhen we finally generate an optimized trace we first copy the code 88*9880d681SAndroid Build Coastguard Workerinto the trace cache. This leaves us with 3 copies of the code: the 89*9880d681SAndroid Build Coastguard Workeroriginal code, the instrumented code, and the optimized trace. The 90*9880d681SAndroid Build Coastguard Workeroptimized trace does not have instrumentation. The original code and 91*9880d681SAndroid Build Coastguard Workerthe instrumented code are modified to have a branch to the trace 92*9880d681SAndroid Build Coastguard Workercache, where the optimized traces are kept. 93*9880d681SAndroid Build Coastguard Worker 94*9880d681SAndroid Build Coastguard WorkerWe copy the code from the original to the instrumentation version 95*9880d681SAndroid Build Coastguard Workerby tracing the LLVM-to-Machine code basic block map and then copying 96*9880d681SAndroid Build Coastguard Workereach machine code basic block we think is in the hot region into the 97*9880d681SAndroid Build Coastguard Workertrace cache. Then we instrument that code. The process is similar for 98*9880d681SAndroid Build Coastguard Workergenerating the final optimized trace; we copy the same basic blocks 99*9880d681SAndroid Build Coastguard Workerbecause we might need to put in fixup code for exit BBs. 100*9880d681SAndroid Build Coastguard Worker 101*9880d681SAndroid Build Coastguard WorkerLLVM basic blocks are not typically used in the Reoptimizer except 102*9880d681SAndroid Build Coastguard Workerfor the mapping information. 103*9880d681SAndroid Build Coastguard Worker 104*9880d681SAndroid Build Coastguard WorkerWe are restricted to using single instructions to branch between the 105*9880d681SAndroid Build Coastguard Workeroriginal code, trace, and instrumented code. So we have to keep the 106*9880d681SAndroid Build Coastguard Workercode copies in memory near the original code (they can't be far enough 107*9880d681SAndroid Build Coastguard Workeraway that a single pc-relative branch would not work.) Malloc() or 108*9880d681SAndroid Build Coastguard Workerdata region space is too far away. this impacts the design of the 109*9880d681SAndroid Build Coastguard Workertrace cache. 110*9880d681SAndroid Build Coastguard Worker 111*9880d681SAndroid Build Coastguard WorkerWe use a dummy function that is full of a bunch of for loops which we 112*9880d681SAndroid Build Coastguard Workeroverwrite with trace-cache code. The trace manager keeps track of 113*9880d681SAndroid Build Coastguard Workerwhether or not we have enough space in the trace cache, etc. 114*9880d681SAndroid Build Coastguard Worker 115*9880d681SAndroid Build Coastguard WorkerThe trace insertion routine takes an original start address, a vector 116*9880d681SAndroid Build Coastguard Workerof machine instructions representing the trace, index of branches and 117*9880d681SAndroid Build Coastguard Workertheir corresponding absolute targets, and index of calls and their 118*9880d681SAndroid Build Coastguard Workercorresponding absolute targets. 119*9880d681SAndroid Build Coastguard Worker 120*9880d681SAndroid Build Coastguard WorkerThe trace insertion routine is responsible for inserting branches from 121*9880d681SAndroid Build Coastguard Workerthe beginning of the original code to the beginning of the optimized 122*9880d681SAndroid Build Coastguard Workertrace. This is because at some point the trace cache may run out of 123*9880d681SAndroid Build Coastguard Workerspace and it may have to evict a trace, at which point the branch to 124*9880d681SAndroid Build Coastguard Workerthe trace would also have to be removed. It uses a round-robin 125*9880d681SAndroid Build Coastguard Workerreplacement policy; we have found that this is almost as good as LRU 126*9880d681SAndroid Build Coastguard Workerand better than random (especially because of problems fitting the new 127*9880d681SAndroid Build Coastguard Workertrace in.) 128*9880d681SAndroid Build Coastguard Worker 129*9880d681SAndroid Build Coastguard WorkerWe cannot deal with discontiguous trace cache areas. The trace cache 130*9880d681SAndroid Build Coastguard Workeris supposed to be cache-line-aligned, but it is not page-aligned. 131*9880d681SAndroid Build Coastguard Worker 132*9880d681SAndroid Build Coastguard WorkerWe generate instrumentation traces and optimized traces into separate 133*9880d681SAndroid Build Coastguard Workertrace caches. We keep the instrumented code around because you don't 134*9880d681SAndroid Build Coastguard Workerwant to delete a trace when you still might have to return to it 135*9880d681SAndroid Build Coastguard Worker(i.e., return from an llvm_first_trigger() or countPath() call.) 136*9880d681SAndroid Build Coastguard Worker 137*9880d681SAndroid Build Coastguard Worker 138