xref: /aosp_15_r20/external/llvm/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt (revision 9880d6810fe72a1726cb53787c6711e909410d58)
1*9880d681SAndroid Build Coastguard WorkerWed Jun 25 15:13:51 CDT 2003
2*9880d681SAndroid Build Coastguard Worker
3*9880d681SAndroid Build Coastguard WorkerFirst-level instrumentation
4*9880d681SAndroid Build Coastguard Worker---------------------------
5*9880d681SAndroid Build Coastguard Worker
6*9880d681SAndroid Build Coastguard WorkerWe use opt to do Bytecode-to-bytecode instrumentation. Look at
7*9880d681SAndroid Build Coastguard Workerback-edges and insert llvm_first_trigger() function call which takes
8*9880d681SAndroid Build Coastguard Workerno arguments and no return value. This instrumentation is designed to
9*9880d681SAndroid Build Coastguard Workerbe easy to remove, for instance by writing a NOP over the function
10*9880d681SAndroid Build Coastguard Workercall instruction.
11*9880d681SAndroid Build Coastguard Worker
12*9880d681SAndroid Build Coastguard WorkerKeep count of every call to llvm_first_trigger(), and maintain
13*9880d681SAndroid Build Coastguard Workercounters in a map indexed by return address. If the trigger count
14*9880d681SAndroid Build Coastguard Workerexceeds a threshold, we identify a hot loop and perform second-level
15*9880d681SAndroid Build Coastguard Workerinstrumentation on the hot loop region (the instructions between the
16*9880d681SAndroid Build Coastguard Workertarget of the back-edge and the branch that causes the back-edge).  We
17*9880d681SAndroid Build Coastguard Workerdo not move code across basic-block boundaries.
18*9880d681SAndroid Build Coastguard Worker
19*9880d681SAndroid Build Coastguard Worker
20*9880d681SAndroid Build Coastguard WorkerSecond-level instrumentation
21*9880d681SAndroid Build Coastguard Worker---------------------------
22*9880d681SAndroid Build Coastguard Worker
23*9880d681SAndroid Build Coastguard WorkerWe remove the first-level instrumentation by overwriting the CALL to
24*9880d681SAndroid Build Coastguard Workerllvm_first_trigger() with a NOP.
25*9880d681SAndroid Build Coastguard Worker
26*9880d681SAndroid Build Coastguard WorkerThe reoptimizer maintains a map between machine-code basic blocks and
27*9880d681SAndroid Build Coastguard WorkerLLVM BasicBlock*s.  We only keep track of paths that start at the
28*9880d681SAndroid Build Coastguard Workerfirst machine-code basic block of the hot loop region.
29*9880d681SAndroid Build Coastguard Worker
30*9880d681SAndroid Build Coastguard WorkerHow do we keep track of which edges to instrument, and which edges are
31*9880d681SAndroid Build Coastguard Workerexits from the hot region? 3 step process.
32*9880d681SAndroid Build Coastguard Worker
33*9880d681SAndroid Build Coastguard Worker1) Do a DFS from the first machine-code basic block of the hot loop
34*9880d681SAndroid Build Coastguard Workerregion and mark reachable edges.
35*9880d681SAndroid Build Coastguard Worker
36*9880d681SAndroid Build Coastguard Worker2) Do a DFS from the last machine-code basic block of the hot loop
37*9880d681SAndroid Build Coastguard Workerregion IGNORING back edges, and mark the edges which are reachable in
38*9880d681SAndroid Build Coastguard Worker1) and also in 2) (i.e., must be reachable from both the start BB and
39*9880d681SAndroid Build Coastguard Workerthe end BB of the hot region).
40*9880d681SAndroid Build Coastguard Worker
41*9880d681SAndroid Build Coastguard Worker3) Mark BBs which end in edges that exit the hot region; we need to
42*9880d681SAndroid Build Coastguard Workerinstrument these differently.
43*9880d681SAndroid Build Coastguard Worker
44*9880d681SAndroid Build Coastguard WorkerAssume that there is 1 free register. On SPARC we use %g1, which LLC
45*9880d681SAndroid Build Coastguard Workerhas agreed not to use.  Shift a 1 into it at the beginning. At every
46*9880d681SAndroid Build Coastguard Workeredge which corresponds to a conditional branch, we shift 0 for not
47*9880d681SAndroid Build Coastguard Workertaken and 1 for taken into a register. This uniquely numbers the paths
48*9880d681SAndroid Build Coastguard Workerthrough the hot region. Silently fail if we need more than 64 bits.
49*9880d681SAndroid Build Coastguard Worker
50*9880d681SAndroid Build Coastguard WorkerAt the end BB we call countPath and increment the counter based on %g1
51*9880d681SAndroid Build Coastguard Workerand the return address of the countPath call.  We keep track of the
52*9880d681SAndroid Build Coastguard Workernumber of iterations and the number of paths.  We only run this
53*9880d681SAndroid Build Coastguard Workerversion 30 or 40 times.
54*9880d681SAndroid Build Coastguard Worker
55*9880d681SAndroid Build Coastguard WorkerFind the BBs that total 90% or more of execution, and aggregate them
56*9880d681SAndroid Build Coastguard Workertogether to form our trace. But we do not allow more than 5 paths; if
57*9880d681SAndroid Build Coastguard Workerwe have more than 5 we take the ones that are executed the most.  We
58*9880d681SAndroid Build Coastguard Workerverify our assumption that we picked a hot back-edge in first-level
59*9880d681SAndroid Build Coastguard Workerinstrumentation, by making sure that the number of times we took an
60*9880d681SAndroid Build Coastguard Workerexit edge from the hot trace is less than 10% of the number of
61*9880d681SAndroid Build Coastguard Workeriterations.
62*9880d681SAndroid Build Coastguard Worker
63*9880d681SAndroid Build Coastguard WorkerLLC has been taught to recognize llvm_first_trigger() calls and NOT
64*9880d681SAndroid Build Coastguard Workergenerate saves and restores of caller-saved registers around these
65*9880d681SAndroid Build Coastguard Workercalls.
66*9880d681SAndroid Build Coastguard Worker
67*9880d681SAndroid Build Coastguard Worker
68*9880d681SAndroid Build Coastguard WorkerPhase behavior
69*9880d681SAndroid Build Coastguard Worker--------------
70*9880d681SAndroid Build Coastguard Worker
71*9880d681SAndroid Build Coastguard WorkerWe turn off llvm_first_trigger() calls with NOPs, but this would hide
72*9880d681SAndroid Build Coastguard Workerphase behavior from us (when some funcs/traces stop being hot and
73*9880d681SAndroid Build Coastguard Workerothers become hot.)
74*9880d681SAndroid Build Coastguard Worker
75*9880d681SAndroid Build Coastguard WorkerWe have a SIGALRM timer that counts time for us. Every time we get a
76*9880d681SAndroid Build Coastguard WorkerSIGALRM we look at our priority queue of locations where we have
77*9880d681SAndroid Build Coastguard Workerremoved llvm_first_trigger() calls. Each location is inserted along
78*9880d681SAndroid Build Coastguard Workerwith a time when we will next turn instrumentation back on for that
79*9880d681SAndroid Build Coastguard Workercall site. If the time has arrived for a particular call site, we pop
80*9880d681SAndroid Build Coastguard Workerthat off the prio. queue and turn instrumentation back on for that
81*9880d681SAndroid Build Coastguard Workercall site.
82*9880d681SAndroid Build Coastguard Worker
83*9880d681SAndroid Build Coastguard Worker
84*9880d681SAndroid Build Coastguard WorkerGenerating traces
85*9880d681SAndroid Build Coastguard Worker-----------------
86*9880d681SAndroid Build Coastguard Worker
87*9880d681SAndroid Build Coastguard WorkerWhen we finally generate an optimized trace we first copy the code
88*9880d681SAndroid Build Coastguard Workerinto the trace cache. This leaves us with 3 copies of the code: the
89*9880d681SAndroid Build Coastguard Workeroriginal code, the instrumented code, and the optimized trace. The
90*9880d681SAndroid Build Coastguard Workeroptimized trace does not have instrumentation. The original code and
91*9880d681SAndroid Build Coastguard Workerthe instrumented code are modified to have a branch to the trace
92*9880d681SAndroid Build Coastguard Workercache, where the optimized traces are kept.
93*9880d681SAndroid Build Coastguard Worker
94*9880d681SAndroid Build Coastguard WorkerWe copy the code from the original to the instrumentation version
95*9880d681SAndroid Build Coastguard Workerby tracing the LLVM-to-Machine code basic block map and then copying
96*9880d681SAndroid Build Coastguard Workereach machine code basic block we think is in the hot region into the
97*9880d681SAndroid Build Coastguard Workertrace cache. Then we instrument that code. The process is similar for
98*9880d681SAndroid Build Coastguard Workergenerating the final optimized trace; we copy the same basic blocks
99*9880d681SAndroid Build Coastguard Workerbecause we might need to put in fixup code for exit BBs.
100*9880d681SAndroid Build Coastguard Worker
101*9880d681SAndroid Build Coastguard WorkerLLVM basic blocks are not typically used in the Reoptimizer except
102*9880d681SAndroid Build Coastguard Workerfor the mapping information.
103*9880d681SAndroid Build Coastguard Worker
104*9880d681SAndroid Build Coastguard WorkerWe are restricted to using single instructions to branch between the
105*9880d681SAndroid Build Coastguard Workeroriginal code, trace, and instrumented code. So we have to keep the
106*9880d681SAndroid Build Coastguard Workercode copies in memory near the original code (they can't be far enough
107*9880d681SAndroid Build Coastguard Workeraway that a single pc-relative branch would not work.) Malloc() or
108*9880d681SAndroid Build Coastguard Workerdata region space is too far away. this impacts the design of the
109*9880d681SAndroid Build Coastguard Workertrace cache.
110*9880d681SAndroid Build Coastguard Worker
111*9880d681SAndroid Build Coastguard WorkerWe use a dummy function that is full of a bunch of for loops which we
112*9880d681SAndroid Build Coastguard Workeroverwrite with trace-cache code. The trace manager keeps track of
113*9880d681SAndroid Build Coastguard Workerwhether or not we have enough space in the trace cache, etc.
114*9880d681SAndroid Build Coastguard Worker
115*9880d681SAndroid Build Coastguard WorkerThe trace insertion routine takes an original start address, a vector
116*9880d681SAndroid Build Coastguard Workerof machine instructions representing the trace, index of branches and
117*9880d681SAndroid Build Coastguard Workertheir corresponding absolute targets, and index of calls and their
118*9880d681SAndroid Build Coastguard Workercorresponding absolute targets.
119*9880d681SAndroid Build Coastguard Worker
120*9880d681SAndroid Build Coastguard WorkerThe trace insertion routine is responsible for inserting branches from
121*9880d681SAndroid Build Coastguard Workerthe beginning of the original code to the beginning of the optimized
122*9880d681SAndroid Build Coastguard Workertrace. This is because at some point the trace cache may run out of
123*9880d681SAndroid Build Coastguard Workerspace and it may have to evict a trace, at which point the branch to
124*9880d681SAndroid Build Coastguard Workerthe trace would also have to be removed. It uses a round-robin
125*9880d681SAndroid Build Coastguard Workerreplacement policy; we have found that this is almost as good as LRU
126*9880d681SAndroid Build Coastguard Workerand better than random (especially because of problems fitting the new
127*9880d681SAndroid Build Coastguard Workertrace in.)
128*9880d681SAndroid Build Coastguard Worker
129*9880d681SAndroid Build Coastguard WorkerWe cannot deal with discontiguous trace cache areas.  The trace cache
130*9880d681SAndroid Build Coastguard Workeris supposed to be cache-line-aligned, but it is not page-aligned.
131*9880d681SAndroid Build Coastguard Worker
132*9880d681SAndroid Build Coastguard WorkerWe generate instrumentation traces and optimized traces into separate
133*9880d681SAndroid Build Coastguard Workertrace caches. We keep the instrumented code around because you don't
134*9880d681SAndroid Build Coastguard Workerwant to delete a trace when you still might have to return to it
135*9880d681SAndroid Build Coastguard Worker(i.e., return from an llvm_first_trigger() or countPath() call.)
136*9880d681SAndroid Build Coastguard Worker
137*9880d681SAndroid Build Coastguard Worker
138