xref: /aosp_15_r20/external/swiftshader/third_party/subzero/docs/DESIGN.rst (revision 03ce13f70fcc45d86ee91b7ee4cab1936a95046e)
1*03ce13f7SAndroid Build Coastguard WorkerDesign of the Subzero fast code generator
2*03ce13f7SAndroid Build Coastguard Worker=========================================
3*03ce13f7SAndroid Build Coastguard Worker
4*03ce13f7SAndroid Build Coastguard WorkerIntroduction
5*03ce13f7SAndroid Build Coastguard Worker------------
6*03ce13f7SAndroid Build Coastguard Worker
7*03ce13f7SAndroid Build Coastguard WorkerThe `Portable Native Client (PNaCl) <http://gonacl.com>`_ project includes
8*03ce13f7SAndroid Build Coastguard Workercompiler technology based on `LLVM <http://llvm.org/>`_.  The developer uses the
9*03ce13f7SAndroid Build Coastguard WorkerPNaCl toolchain to compile their application to architecture-neutral PNaCl
10*03ce13f7SAndroid Build Coastguard Workerbitcode (a ``.pexe`` file), using as much architecture-neutral optimization as
11*03ce13f7SAndroid Build Coastguard Workerpossible.  The ``.pexe`` file is downloaded to the user's browser where the
12*03ce13f7SAndroid Build Coastguard WorkerPNaCl translator (a component of Chrome) compiles the ``.pexe`` file to
13*03ce13f7SAndroid Build Coastguard Worker`sandboxed
14*03ce13f7SAndroid Build Coastguard Worker<https://developer.chrome.com/native-client/reference/sandbox_internals/index>`_
15*03ce13f7SAndroid Build Coastguard Workernative code.  The translator uses architecture-specific optimizations as much as
16*03ce13f7SAndroid Build Coastguard Workerpractical to generate good native code.
17*03ce13f7SAndroid Build Coastguard Worker
18*03ce13f7SAndroid Build Coastguard WorkerThe native code can be cached by the browser to avoid repeating translation on
19*03ce13f7SAndroid Build Coastguard Workerfuture page loads.  However, first-time user experience is hampered by long
20*03ce13f7SAndroid Build Coastguard Workertranslation times.  The LLVM-based PNaCl translator is pretty slow, even when
21*03ce13f7SAndroid Build Coastguard Workerusing ``-O0`` to minimize optimizations, so delays are especially noticeable on
22*03ce13f7SAndroid Build Coastguard Workerslow browser platforms such as ARM-based Chromebooks.
23*03ce13f7SAndroid Build Coastguard Worker
24*03ce13f7SAndroid Build Coastguard WorkerTranslator slowness can be mitigated or hidden in a number of ways.
25*03ce13f7SAndroid Build Coastguard Worker
26*03ce13f7SAndroid Build Coastguard Worker- Parallel translation.  However, slow machines where this matters most, e.g.
27*03ce13f7SAndroid Build Coastguard Worker  ARM-based Chromebooks, are likely to have fewer cores to parallelize across,
28*03ce13f7SAndroid Build Coastguard Worker  and are likely to less memory available for multiple translation threads to
29*03ce13f7SAndroid Build Coastguard Worker  use.
30*03ce13f7SAndroid Build Coastguard Worker
31*03ce13f7SAndroid Build Coastguard Worker- Streaming translation, i.e. start translating as soon as the download starts.
32*03ce13f7SAndroid Build Coastguard Worker  This doesn't help much when translation speed is 10× slower than download
33*03ce13f7SAndroid Build Coastguard Worker  speed, or the ``.pexe`` file is already cached while the translated binary was
34*03ce13f7SAndroid Build Coastguard Worker  flushed from the cache.
35*03ce13f7SAndroid Build Coastguard Worker
36*03ce13f7SAndroid Build Coastguard Worker- Arrange the web page such that translation is done in parallel with
37*03ce13f7SAndroid Build Coastguard Worker  downloading large assets.
38*03ce13f7SAndroid Build Coastguard Worker
39*03ce13f7SAndroid Build Coastguard Worker- Arrange the web page to distract the user with `cat videos
40*03ce13f7SAndroid Build Coastguard Worker  <https://www.youtube.com/watch?v=tLt5rBfNucc>`_ while translation is in
41*03ce13f7SAndroid Build Coastguard Worker  progress.
42*03ce13f7SAndroid Build Coastguard Worker
43*03ce13f7SAndroid Build Coastguard WorkerOr, improve translator performance to something more reasonable.
44*03ce13f7SAndroid Build Coastguard Worker
45*03ce13f7SAndroid Build Coastguard WorkerThis document describes Subzero's attempt to improve translation speed by an
46*03ce13f7SAndroid Build Coastguard Workerorder of magnitude while rivaling LLVM's code quality.  Subzero does this
47*03ce13f7SAndroid Build Coastguard Workerthrough minimal IR layering, lean data structures and passes, and a careful
48*03ce13f7SAndroid Build Coastguard Workerselection of fast optimization passes.  It has two optimization recipes: full
49*03ce13f7SAndroid Build Coastguard Workeroptimizations (``O2``) and minimal optimizations (``Om1``).  The recipes are the
50*03ce13f7SAndroid Build Coastguard Workerfollowing (described in more detail below):
51*03ce13f7SAndroid Build Coastguard Worker
52*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
53*03ce13f7SAndroid Build Coastguard Worker| O2 recipe                              | Om1 recipe                  |
54*03ce13f7SAndroid Build Coastguard Worker+========================================+=============================+
55*03ce13f7SAndroid Build Coastguard Worker| Parse .pexe file                       | Parse .pexe file            |
56*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
57*03ce13f7SAndroid Build Coastguard Worker| Loop nest analysis                     |                             |
58*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
59*03ce13f7SAndroid Build Coastguard Worker| Local common subexpression elimination |                             |
60*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
61*03ce13f7SAndroid Build Coastguard Worker| Address mode inference                 |                             |
62*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
63*03ce13f7SAndroid Build Coastguard Worker| Read-modify-write (RMW) transform      |                             |
64*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
65*03ce13f7SAndroid Build Coastguard Worker| Basic liveness analysis                |                             |
66*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
67*03ce13f7SAndroid Build Coastguard Worker| Load optimization                      |                             |
68*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
69*03ce13f7SAndroid Build Coastguard Worker|                                        | Phi lowering (simple)       |
70*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
71*03ce13f7SAndroid Build Coastguard Worker| Target lowering                        | Target lowering             |
72*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
73*03ce13f7SAndroid Build Coastguard Worker| Full liveness analysis                 |                             |
74*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
75*03ce13f7SAndroid Build Coastguard Worker| Register allocation                    | Minimal register allocation |
76*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
77*03ce13f7SAndroid Build Coastguard Worker| Phi lowering (advanced)                |                             |
78*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
79*03ce13f7SAndroid Build Coastguard Worker| Post-phi register allocation           |                             |
80*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
81*03ce13f7SAndroid Build Coastguard Worker| Branch optimization                    |                             |
82*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
83*03ce13f7SAndroid Build Coastguard Worker| Code emission                          | Code emission               |
84*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+
85*03ce13f7SAndroid Build Coastguard Worker
86*03ce13f7SAndroid Build Coastguard WorkerGoals
87*03ce13f7SAndroid Build Coastguard Worker=====
88*03ce13f7SAndroid Build Coastguard Worker
89*03ce13f7SAndroid Build Coastguard WorkerTranslation speed
90*03ce13f7SAndroid Build Coastguard Worker-----------------
91*03ce13f7SAndroid Build Coastguard Worker
92*03ce13f7SAndroid Build Coastguard WorkerWe'd like to be able to translate a ``.pexe`` file as fast as download speed.
93*03ce13f7SAndroid Build Coastguard WorkerAny faster is in a sense wasted effort.  Download speed varies greatly, but
94*03ce13f7SAndroid Build Coastguard Workerwe'll arbitrarily say 1 MB/sec.  We'll pick the ARM A15 CPU as the example of a
95*03ce13f7SAndroid Build Coastguard Workerslow machine.  We observe a 3× single-thread performance difference between A15
96*03ce13f7SAndroid Build Coastguard Workerand a high-end x86 Xeon E5-2690 based workstation, and aggressively assume a
97*03ce13f7SAndroid Build Coastguard Worker``.pexe`` file could be compressed to 50% on the web server using gzip transport
98*03ce13f7SAndroid Build Coastguard Workercompression, so we set the translation speed goal to 6 MB/sec on the high-end
99*03ce13f7SAndroid Build Coastguard WorkerXeon workstation.
100*03ce13f7SAndroid Build Coastguard Worker
101*03ce13f7SAndroid Build Coastguard WorkerCurrently, at the ``-O0`` level, the LLVM-based PNaCl translation translates at
102*03ce13f7SAndroid Build Coastguard Worker⅒ the target rate.  The ``-O2`` mode takes 3× as long as the ``-O0`` mode.
103*03ce13f7SAndroid Build Coastguard Worker
104*03ce13f7SAndroid Build Coastguard WorkerIn other words, Subzero's goal is to improve over LLVM's translation speed by
105*03ce13f7SAndroid Build Coastguard Worker10×.
106*03ce13f7SAndroid Build Coastguard Worker
107*03ce13f7SAndroid Build Coastguard WorkerCode quality
108*03ce13f7SAndroid Build Coastguard Worker------------
109*03ce13f7SAndroid Build Coastguard Worker
110*03ce13f7SAndroid Build Coastguard WorkerSubzero's initial goal is to produce code that meets or exceeds LLVM's ``-O0``
111*03ce13f7SAndroid Build Coastguard Workercode quality.  The stretch goal is to approach LLVM ``-O2`` code quality.  On
112*03ce13f7SAndroid Build Coastguard Workeraverage, LLVM ``-O2`` performs twice as well as LLVM ``-O0``.
113*03ce13f7SAndroid Build Coastguard Worker
114*03ce13f7SAndroid Build Coastguard WorkerIt's important to note that the quality of Subzero-generated code depends on
115*03ce13f7SAndroid Build Coastguard Workertarget-neutral optimizations and simplifications being run beforehand in the
116*03ce13f7SAndroid Build Coastguard Workerdeveloper environment.  The ``.pexe`` file reflects these optimizations.  For
117*03ce13f7SAndroid Build Coastguard Workerexample, Subzero assumes that the basic blocks are ordered topologically where
118*03ce13f7SAndroid Build Coastguard Workerpossible (which makes liveness analysis converge fastest), and Subzero does not
119*03ce13f7SAndroid Build Coastguard Workerdo any function inlining because it should already have been done.
120*03ce13f7SAndroid Build Coastguard Worker
121*03ce13f7SAndroid Build Coastguard WorkerTranslator size
122*03ce13f7SAndroid Build Coastguard Worker---------------
123*03ce13f7SAndroid Build Coastguard Worker
124*03ce13f7SAndroid Build Coastguard WorkerThe current LLVM-based translator binary (``pnacl-llc``) is about 10 MB in size.
125*03ce13f7SAndroid Build Coastguard WorkerWe think 1 MB is a more reasonable size -- especially for such a component that
126*03ce13f7SAndroid Build Coastguard Workeris distributed to a billion Chrome users.  Thus we target a 10× reduction in
127*03ce13f7SAndroid Build Coastguard Workerbinary size.
128*03ce13f7SAndroid Build Coastguard Worker
129*03ce13f7SAndroid Build Coastguard WorkerFor development, Subzero can be built for all target architectures, and all
130*03ce13f7SAndroid Build Coastguard Workerdebugging and diagnostic options enabled.  For a smaller translator, we restrict
131*03ce13f7SAndroid Build Coastguard Workerto a single target architecture, and define a ``MINIMAL`` build where
132*03ce13f7SAndroid Build Coastguard Workerunnecessary features are compiled out.
133*03ce13f7SAndroid Build Coastguard Worker
134*03ce13f7SAndroid Build Coastguard WorkerSubzero leverages some data structures from LLVM's ``ADT`` and ``Support``
135*03ce13f7SAndroid Build Coastguard Workerinclude directories, which have little impact on translator size.  It also uses
136*03ce13f7SAndroid Build Coastguard Workersome of LLVM's bitcode decoding code (for binary-format ``.pexe`` files), again
137*03ce13f7SAndroid Build Coastguard Workerwith little size impact.  In non-``MINIMAL`` builds, the translator size is much
138*03ce13f7SAndroid Build Coastguard Workerlarger due to including code for parsing text-format bitcode files and forming
139*03ce13f7SAndroid Build Coastguard WorkerLLVM IR.
140*03ce13f7SAndroid Build Coastguard Worker
141*03ce13f7SAndroid Build Coastguard WorkerMemory footprint
142*03ce13f7SAndroid Build Coastguard Worker----------------
143*03ce13f7SAndroid Build Coastguard Worker
144*03ce13f7SAndroid Build Coastguard WorkerThe current LLVM-based translator suffers from an issue in which some
145*03ce13f7SAndroid Build Coastguard Workerfunction-specific data has to be retained in memory until all translation
146*03ce13f7SAndroid Build Coastguard Workercompletes, and therefore the memory footprint grows without bound.  Large
147*03ce13f7SAndroid Build Coastguard Worker``.pexe`` files can lead to the translator process holding hundreds of MB of
148*03ce13f7SAndroid Build Coastguard Workermemory by the end.  The translator runs in a separate process, so this memory
149*03ce13f7SAndroid Build Coastguard Workergrowth doesn't *directly* affect other processes, but it does dirty the physical
150*03ce13f7SAndroid Build Coastguard Workermemory and contributes to a perception of bloat and sometimes a reality of
151*03ce13f7SAndroid Build Coastguard Workerout-of-memory tab killing, especially noticeable on weaker systems.
152*03ce13f7SAndroid Build Coastguard Worker
153*03ce13f7SAndroid Build Coastguard WorkerSubzero should maintain a stable memory footprint throughout translation.  It's
154*03ce13f7SAndroid Build Coastguard Workernot really practical to set a specific limit, because there is not really a
155*03ce13f7SAndroid Build Coastguard Workerpractical limit on a single function's size, but the footprint should be
156*03ce13f7SAndroid Build Coastguard Worker"reasonable" and be proportional to the largest input function size, not the
157*03ce13f7SAndroid Build Coastguard Workertotal ``.pexe`` file size.  Simply put, Subzero should not have memory leaks or
158*03ce13f7SAndroid Build Coastguard Workerinexorable memory growth.  (We use ASAN builds to test for leaks.)
159*03ce13f7SAndroid Build Coastguard Worker
160*03ce13f7SAndroid Build Coastguard WorkerMultithreaded translation
161*03ce13f7SAndroid Build Coastguard Worker-------------------------
162*03ce13f7SAndroid Build Coastguard Worker
163*03ce13f7SAndroid Build Coastguard WorkerIt should be practical to translate different functions concurrently and see
164*03ce13f7SAndroid Build Coastguard Workergood scalability.  Some locking may be needed, such as accessing output buffers
165*03ce13f7SAndroid Build Coastguard Workeror constant pools, but that should be fairly minimal.  In contrast, LLVM was
166*03ce13f7SAndroid Build Coastguard Workeronly designed for module-level parallelism, and as such, the PNaCl translator
167*03ce13f7SAndroid Build Coastguard Workerinternally splits a ``.pexe`` file into several modules for concurrent
168*03ce13f7SAndroid Build Coastguard Workertranslation.  All output needs to be deterministic regardless of the level of
169*03ce13f7SAndroid Build Coastguard Workermultithreading, i.e. functions and data should always be output in the same
170*03ce13f7SAndroid Build Coastguard Workerorder.
171*03ce13f7SAndroid Build Coastguard Worker
172*03ce13f7SAndroid Build Coastguard WorkerTarget architectures
173*03ce13f7SAndroid Build Coastguard Worker--------------------
174*03ce13f7SAndroid Build Coastguard Worker
175*03ce13f7SAndroid Build Coastguard WorkerInitial target architectures are x86-32, x86-64, ARM32, and MIPS32.  Future
176*03ce13f7SAndroid Build Coastguard Workertargets include ARM64 and MIPS64, though these targets lack NaCl support
177*03ce13f7SAndroid Build Coastguard Workerincluding a sandbox model or a validator.
178*03ce13f7SAndroid Build Coastguard Worker
179*03ce13f7SAndroid Build Coastguard WorkerThe first implementation is for x86-32, because it was expected to be
180*03ce13f7SAndroid Build Coastguard Workerparticularly challenging, and thus more likely to draw out any design problems
181*03ce13f7SAndroid Build Coastguard Workerearly:
182*03ce13f7SAndroid Build Coastguard Worker
183*03ce13f7SAndroid Build Coastguard Worker- There are a number of special cases, asymmetries, and warts in the x86
184*03ce13f7SAndroid Build Coastguard Worker  instruction set.
185*03ce13f7SAndroid Build Coastguard Worker
186*03ce13f7SAndroid Build Coastguard Worker- Complex addressing modes may be leveraged for better code quality.
187*03ce13f7SAndroid Build Coastguard Worker
188*03ce13f7SAndroid Build Coastguard Worker- 64-bit integer operations have to be lowered into longer sequences of 32-bit
189*03ce13f7SAndroid Build Coastguard Worker  operations.
190*03ce13f7SAndroid Build Coastguard Worker
191*03ce13f7SAndroid Build Coastguard Worker- Paucity of physical registers may reveal code quality issues early in the
192*03ce13f7SAndroid Build Coastguard Worker  design.
193*03ce13f7SAndroid Build Coastguard Worker
194*03ce13f7SAndroid Build Coastguard WorkerDetailed design
195*03ce13f7SAndroid Build Coastguard Worker===============
196*03ce13f7SAndroid Build Coastguard Worker
197*03ce13f7SAndroid Build Coastguard WorkerIntermediate representation - ICE
198*03ce13f7SAndroid Build Coastguard Worker---------------------------------
199*03ce13f7SAndroid Build Coastguard Worker
200*03ce13f7SAndroid Build Coastguard WorkerSubzero's IR is called ICE.  It is designed to be reasonably similar to LLVM's
201*03ce13f7SAndroid Build Coastguard WorkerIR, which is reflected in the ``.pexe`` file's bitcode structure.  It has a
202*03ce13f7SAndroid Build Coastguard Workerrepresentation of global variables and initializers, and a set of functions.
203*03ce13f7SAndroid Build Coastguard WorkerEach function contains a list of basic blocks, and each basic block constains a
204*03ce13f7SAndroid Build Coastguard Workerlist of instructions.  Instructions that operate on stack and register variables
205*03ce13f7SAndroid Build Coastguard Workerdo so using static single assignment (SSA) form.
206*03ce13f7SAndroid Build Coastguard Worker
207*03ce13f7SAndroid Build Coastguard WorkerThe ``.pexe`` file is translated one function at a time (or in parallel by
208*03ce13f7SAndroid Build Coastguard Workermultiple translation threads).  The recipe for optimization passes depends on
209*03ce13f7SAndroid Build Coastguard Workerthe specific target and optimization level, and is described in detail below.
210*03ce13f7SAndroid Build Coastguard WorkerGlobal variables (types and initializers) are simply and directly translated to
211*03ce13f7SAndroid Build Coastguard Workerobject code, without any meaningful attempts at optimization.
212*03ce13f7SAndroid Build Coastguard Worker
213*03ce13f7SAndroid Build Coastguard WorkerA function's control flow graph (CFG) is represented by the ``Ice::Cfg`` class.
214*03ce13f7SAndroid Build Coastguard WorkerIts key contents include:
215*03ce13f7SAndroid Build Coastguard Worker
216*03ce13f7SAndroid Build Coastguard Worker- A list of ``CfgNode`` pointers, generally held in topological order.
217*03ce13f7SAndroid Build Coastguard Worker
218*03ce13f7SAndroid Build Coastguard Worker- A list of ``Variable`` pointers corresponding to local variables used in the
219*03ce13f7SAndroid Build Coastguard Worker  function plus compiler-generated temporaries.
220*03ce13f7SAndroid Build Coastguard Worker
221*03ce13f7SAndroid Build Coastguard WorkerA basic block is represented by the ``Ice::CfgNode`` class.  Its key contents
222*03ce13f7SAndroid Build Coastguard Workerinclude:
223*03ce13f7SAndroid Build Coastguard Worker
224*03ce13f7SAndroid Build Coastguard Worker- A linear list of instructions, in the same style as LLVM.  The last
225*03ce13f7SAndroid Build Coastguard Worker  instruction of the list is always a terminator instruction: branch, switch,
226*03ce13f7SAndroid Build Coastguard Worker  return, unreachable.
227*03ce13f7SAndroid Build Coastguard Worker
228*03ce13f7SAndroid Build Coastguard Worker- A list of Phi instructions, also in the same style as LLVM.  They are held as
229*03ce13f7SAndroid Build Coastguard Worker  a linear list for convenience, though per Phi semantics, they are executed "in
230*03ce13f7SAndroid Build Coastguard Worker  parallel" without dependencies on each other.
231*03ce13f7SAndroid Build Coastguard Worker
232*03ce13f7SAndroid Build Coastguard Worker- An unordered list of ``CfgNode`` pointers corresponding to incoming edges, and
233*03ce13f7SAndroid Build Coastguard Worker  another list for outgoing edges.
234*03ce13f7SAndroid Build Coastguard Worker
235*03ce13f7SAndroid Build Coastguard Worker- The node's unique, 0-based index into the CFG's node list.
236*03ce13f7SAndroid Build Coastguard Worker
237*03ce13f7SAndroid Build Coastguard WorkerAn instruction is represented by the ``Ice::Inst`` class.  Its key contents
238*03ce13f7SAndroid Build Coastguard Workerinclude:
239*03ce13f7SAndroid Build Coastguard Worker
240*03ce13f7SAndroid Build Coastguard Worker- A list of source operands.
241*03ce13f7SAndroid Build Coastguard Worker
242*03ce13f7SAndroid Build Coastguard Worker- Its destination variable, if the instruction produces a result in an
243*03ce13f7SAndroid Build Coastguard Worker  ``Ice::Variable``.
244*03ce13f7SAndroid Build Coastguard Worker
245*03ce13f7SAndroid Build Coastguard Worker- A bitvector indicating which variables' live ranges this instruction ends.
246*03ce13f7SAndroid Build Coastguard Worker  This is computed during liveness analysis.
247*03ce13f7SAndroid Build Coastguard Worker
248*03ce13f7SAndroid Build Coastguard WorkerInstructions kinds are divided into high-level ICE instructions and low-level
249*03ce13f7SAndroid Build Coastguard WorkerICE instructions.  High-level instructions consist of the PNaCl/LLVM bitcode
250*03ce13f7SAndroid Build Coastguard Workerinstruction kinds.  Each target architecture implementation extends the
251*03ce13f7SAndroid Build Coastguard Workerinstruction space with its own set of low-level instructions.  Generally,
252*03ce13f7SAndroid Build Coastguard Workerlow-level instructions correspond to individual machine instructions.  The
253*03ce13f7SAndroid Build Coastguard Workerhigh-level ICE instruction space includes a few additional instruction kinds
254*03ce13f7SAndroid Build Coastguard Workerthat are not part of LLVM but are generally useful (e.g., an Assignment
255*03ce13f7SAndroid Build Coastguard Workerinstruction), or are useful across targets.
256*03ce13f7SAndroid Build Coastguard Worker
257*03ce13f7SAndroid Build Coastguard WorkerSpecifically, high-level ICE instructions that derive from LLVM (but with PNaCl
258*03ce13f7SAndroid Build Coastguard WorkerABI restrictions as documented in the `PNaCl Bitcode Reference Manual
259*03ce13f7SAndroid Build Coastguard Worker<https://developer.chrome.com/native-client/reference/pnacl-bitcode-abi>`_) are
260*03ce13f7SAndroid Build Coastguard Workerthe following:
261*03ce13f7SAndroid Build Coastguard Worker
262*03ce13f7SAndroid Build Coastguard Worker- Alloca: allocate data on the stack
263*03ce13f7SAndroid Build Coastguard Worker
264*03ce13f7SAndroid Build Coastguard Worker- Arithmetic: binary operations of the form ``A = B op C``
265*03ce13f7SAndroid Build Coastguard Worker
266*03ce13f7SAndroid Build Coastguard Worker- Br: conditional or unconditional branch
267*03ce13f7SAndroid Build Coastguard Worker
268*03ce13f7SAndroid Build Coastguard Worker- Call: function call
269*03ce13f7SAndroid Build Coastguard Worker
270*03ce13f7SAndroid Build Coastguard Worker- Cast: unary type-conversion operations
271*03ce13f7SAndroid Build Coastguard Worker
272*03ce13f7SAndroid Build Coastguard Worker- ExtractElement: extract a scalar element from a vector-type value
273*03ce13f7SAndroid Build Coastguard Worker
274*03ce13f7SAndroid Build Coastguard Worker- Fcmp: floating-point comparison
275*03ce13f7SAndroid Build Coastguard Worker
276*03ce13f7SAndroid Build Coastguard Worker- Icmp: integer comparison
277*03ce13f7SAndroid Build Coastguard Worker
278*03ce13f7SAndroid Build Coastguard Worker- Intrinsic: invoke a known intrinsic
279*03ce13f7SAndroid Build Coastguard Worker
280*03ce13f7SAndroid Build Coastguard Worker- InsertElement: insert a scalar element into a vector-type value
281*03ce13f7SAndroid Build Coastguard Worker
282*03ce13f7SAndroid Build Coastguard Worker- Load: load a value from memory
283*03ce13f7SAndroid Build Coastguard Worker
284*03ce13f7SAndroid Build Coastguard Worker- Phi: implement the SSA phi node
285*03ce13f7SAndroid Build Coastguard Worker
286*03ce13f7SAndroid Build Coastguard Worker- Ret: return from the function
287*03ce13f7SAndroid Build Coastguard Worker
288*03ce13f7SAndroid Build Coastguard Worker- Select: essentially the C language operation of the form ``X = C ? Y : Z``
289*03ce13f7SAndroid Build Coastguard Worker
290*03ce13f7SAndroid Build Coastguard Worker- Store: store a value into memory
291*03ce13f7SAndroid Build Coastguard Worker
292*03ce13f7SAndroid Build Coastguard Worker- Switch: generalized branch to multiple possible locations
293*03ce13f7SAndroid Build Coastguard Worker
294*03ce13f7SAndroid Build Coastguard Worker- Unreachable: indicate that this portion of the code is unreachable
295*03ce13f7SAndroid Build Coastguard Worker
296*03ce13f7SAndroid Build Coastguard WorkerThe additional high-level ICE instructions are the following:
297*03ce13f7SAndroid Build Coastguard Worker
298*03ce13f7SAndroid Build Coastguard Worker- Assign: a simple ``A=B`` assignment.  This is useful for e.g. lowering Phi
299*03ce13f7SAndroid Build Coastguard Worker  instructions to non-SSA assignments, before lowering to machine code.
300*03ce13f7SAndroid Build Coastguard Worker
301*03ce13f7SAndroid Build Coastguard Worker- FakeDef, FakeUse, FakeKill.  These are tools used to preserve consistency in
302*03ce13f7SAndroid Build Coastguard Worker  liveness analysis, elevated to the high-level because they are used by all
303*03ce13f7SAndroid Build Coastguard Worker  targets.  They are described in more detail at the end of this section.
304*03ce13f7SAndroid Build Coastguard Worker
305*03ce13f7SAndroid Build Coastguard Worker- JumpTable: this represents the result of switch optimization analysis, where
306*03ce13f7SAndroid Build Coastguard Worker  some switch instructions may use jump tables instead of cascading
307*03ce13f7SAndroid Build Coastguard Worker  compare/branches.
308*03ce13f7SAndroid Build Coastguard Worker
309*03ce13f7SAndroid Build Coastguard WorkerAn operand is represented by the ``Ice::Operand`` class.  In high-level ICE, an
310*03ce13f7SAndroid Build Coastguard Workeroperand is either an ``Ice::Constant`` or an ``Ice::Variable``.  Constants
311*03ce13f7SAndroid Build Coastguard Workerinclude scalar integer constants, scalar floating point constants, Undef (an
312*03ce13f7SAndroid Build Coastguard Workerunspecified constant of a particular scalar or vector type), and symbol
313*03ce13f7SAndroid Build Coastguard Workerconstants (essentially addresses of globals).  Note that the PNaCl ABI does not
314*03ce13f7SAndroid Build Coastguard Workerinclude vector-type constants besides Undef, and as such, Subzero (so far) has
315*03ce13f7SAndroid Build Coastguard Workerno reason to represent vector-type constants internally.  A variable represents
316*03ce13f7SAndroid Build Coastguard Workera value allocated on the stack (though not including alloca-derived storage).
317*03ce13f7SAndroid Build Coastguard WorkerAmong other things, a variable holds its unique, 0-based index into the CFG's
318*03ce13f7SAndroid Build Coastguard Workervariable list.
319*03ce13f7SAndroid Build Coastguard Worker
320*03ce13f7SAndroid Build Coastguard WorkerEach target can extend the ``Constant`` and ``Variable`` classes for its own
321*03ce13f7SAndroid Build Coastguard Workerneeds.  In addition, the ``Operand`` class may be extended, e.g. to define an
322*03ce13f7SAndroid Build Coastguard Workerx86 ``MemOperand`` that encodes a base register, an index register, an index
323*03ce13f7SAndroid Build Coastguard Workerregister shift amount, and a constant offset.
324*03ce13f7SAndroid Build Coastguard Worker
325*03ce13f7SAndroid Build Coastguard WorkerRegister allocation and liveness analysis are restricted to Variable operands.
326*03ce13f7SAndroid Build Coastguard WorkerBecause of the importance of register allocation to code quality, and the
327*03ce13f7SAndroid Build Coastguard Workertranslation-time cost of liveness analysis, Variable operands get some special
328*03ce13f7SAndroid Build Coastguard Workertreatment in ICE.  Most notably, a frequent pattern in Subzero is to iterate
329*03ce13f7SAndroid Build Coastguard Workeracross all the Variables of an instruction.  An instruction holds a list of
330*03ce13f7SAndroid Build Coastguard Workeroperands, but an operand may contain 0, 1, or more Variables.  As such, the
331*03ce13f7SAndroid Build Coastguard Worker``Operand`` class specially holds a list of Variables contained within, for
332*03ce13f7SAndroid Build Coastguard Workerquick access.
333*03ce13f7SAndroid Build Coastguard Worker
334*03ce13f7SAndroid Build Coastguard WorkerA Subzero transformation pass may work by deleting an existing instruction and
335*03ce13f7SAndroid Build Coastguard Workerreplacing it with zero or more new instructions.  Instead of actually deleting
336*03ce13f7SAndroid Build Coastguard Workerthe existing instruction, we generally mark it as deleted and insert the new
337*03ce13f7SAndroid Build Coastguard Workerinstructions right after the deleted instruction.  When printing the IR for
338*03ce13f7SAndroid Build Coastguard Workerdebugging, this is a big help because it makes it much more clear how the
339*03ce13f7SAndroid Build Coastguard Workernon-deleted instructions came about.
340*03ce13f7SAndroid Build Coastguard Worker
341*03ce13f7SAndroid Build Coastguard WorkerSubzero has a few special instructions to help with liveness analysis
342*03ce13f7SAndroid Build Coastguard Workerconsistency.
343*03ce13f7SAndroid Build Coastguard Worker
344*03ce13f7SAndroid Build Coastguard Worker- The FakeDef instruction gives a fake definition of some variable.  For
345*03ce13f7SAndroid Build Coastguard Worker  example, on x86-32, a divide instruction defines both ``%eax`` and ``%edx``
346*03ce13f7SAndroid Build Coastguard Worker  but an ICE instruction can represent only one destination variable.  This is
347*03ce13f7SAndroid Build Coastguard Worker  similar for multiply instructions, and for function calls that return a 64-bit
348*03ce13f7SAndroid Build Coastguard Worker  integer result in the ``%edx:%eax`` pair.  Also, using the ``xor %eax, %eax``
349*03ce13f7SAndroid Build Coastguard Worker  trick to set ``%eax`` to 0 requires an initial FakeDef of ``%eax``.
350*03ce13f7SAndroid Build Coastguard Worker
351*03ce13f7SAndroid Build Coastguard Worker- The FakeUse instruction registers a use of a variable, typically to prevent an
352*03ce13f7SAndroid Build Coastguard Worker  earlier assignment to that variable from being dead-code eliminated.  For
353*03ce13f7SAndroid Build Coastguard Worker  example, lowering an operation like ``x=cc?y:z`` may be done using x86's
354*03ce13f7SAndroid Build Coastguard Worker  conditional move (cmov) instruction: ``mov z, x; cmov_cc y, x``.  Without a
355*03ce13f7SAndroid Build Coastguard Worker  FakeUse of ``x`` between the two instructions, the liveness analysis pass may
356*03ce13f7SAndroid Build Coastguard Worker  dead-code eliminate the first instruction.
357*03ce13f7SAndroid Build Coastguard Worker
358*03ce13f7SAndroid Build Coastguard Worker- The FakeKill instruction is added after a call instruction, and is a quick way
359*03ce13f7SAndroid Build Coastguard Worker  of indicating that caller-save registers are invalidated.
360*03ce13f7SAndroid Build Coastguard Worker
361*03ce13f7SAndroid Build Coastguard WorkerPexe parsing
362*03ce13f7SAndroid Build Coastguard Worker------------
363*03ce13f7SAndroid Build Coastguard Worker
364*03ce13f7SAndroid Build Coastguard WorkerSubzero includes an integrated PNaCl bitcode parser for ``.pexe`` files.  It
365*03ce13f7SAndroid Build Coastguard Workerparses the ``.pexe`` file function by function, ultimately constructing an ICE
366*03ce13f7SAndroid Build Coastguard WorkerCFG for each function.  After a function is parsed, its CFG is handed off to the
367*03ce13f7SAndroid Build Coastguard Workertranslation phase.  The bitcode parser also parses global initializer data and
368*03ce13f7SAndroid Build Coastguard Workerhands it off to be translated to data sections in the object file.
369*03ce13f7SAndroid Build Coastguard Worker
370*03ce13f7SAndroid Build Coastguard WorkerSubzero has another parsing strategy for testing/debugging.  LLVM libraries can
371*03ce13f7SAndroid Build Coastguard Workerbe used to parse a module into LLVM IR (though very slowly relative to Subzero
372*03ce13f7SAndroid Build Coastguard Workernative parsing).  Then we iterate across the LLVM IR and construct high-level
373*03ce13f7SAndroid Build Coastguard WorkerICE, handing off each CFG to the translation phase.
374*03ce13f7SAndroid Build Coastguard Worker
375*03ce13f7SAndroid Build Coastguard WorkerOverview of lowering
376*03ce13f7SAndroid Build Coastguard Worker--------------------
377*03ce13f7SAndroid Build Coastguard Worker
378*03ce13f7SAndroid Build Coastguard WorkerIn general, translation goes like this:
379*03ce13f7SAndroid Build Coastguard Worker
380*03ce13f7SAndroid Build Coastguard Worker- Parse the next function from the ``.pexe`` file and construct a CFG consisting
381*03ce13f7SAndroid Build Coastguard Worker  of high-level ICE.
382*03ce13f7SAndroid Build Coastguard Worker
383*03ce13f7SAndroid Build Coastguard Worker- Do analysis passes and transformation passes on the high-level ICE, as
384*03ce13f7SAndroid Build Coastguard Worker  desired.
385*03ce13f7SAndroid Build Coastguard Worker
386*03ce13f7SAndroid Build Coastguard Worker- Lower each high-level ICE instruction into a sequence of zero or more
387*03ce13f7SAndroid Build Coastguard Worker  low-level ICE instructions.  Each high-level instruction is generally lowered
388*03ce13f7SAndroid Build Coastguard Worker  independently, though the target lowering is allowed to look ahead in the
389*03ce13f7SAndroid Build Coastguard Worker  CfgNode's instruction list if desired.
390*03ce13f7SAndroid Build Coastguard Worker
391*03ce13f7SAndroid Build Coastguard Worker- Do more analysis and transformation passes on the low-level ICE, as desired.
392*03ce13f7SAndroid Build Coastguard Worker
393*03ce13f7SAndroid Build Coastguard Worker- Assemble the low-level CFG into an ELF object file (alternatively, a textual
394*03ce13f7SAndroid Build Coastguard Worker  assembly file that is later assembled by some external tool).
395*03ce13f7SAndroid Build Coastguard Worker
396*03ce13f7SAndroid Build Coastguard Worker- Repeat for all functions, and also produce object code for data such as global
397*03ce13f7SAndroid Build Coastguard Worker  initializers and internal constant pools.
398*03ce13f7SAndroid Build Coastguard Worker
399*03ce13f7SAndroid Build Coastguard WorkerCurrently there are two optimization levels: ``O2`` and ``Om1``.  For ``O2``,
400*03ce13f7SAndroid Build Coastguard Workerthe intention is to apply all available optimizations to get the best code
401*03ce13f7SAndroid Build Coastguard Workerquality (though the initial code quality goal is measured against LLVM's ``O0``
402*03ce13f7SAndroid Build Coastguard Workercode quality).  For ``Om1``, the intention is to apply as few optimizations as
403*03ce13f7SAndroid Build Coastguard Workerpossible and produce code as quickly as possible, accepting poor code quality.
404*03ce13f7SAndroid Build Coastguard Worker``Om1`` is short for "O-minus-one", i.e. "worse than O0", or in other words,
405*03ce13f7SAndroid Build Coastguard Worker"sub-zero".
406*03ce13f7SAndroid Build Coastguard Worker
407*03ce13f7SAndroid Build Coastguard WorkerHigh-level debuggability of generated code is so far not a design requirement.
408*03ce13f7SAndroid Build Coastguard WorkerSubzero doesn't really do transformations that would obfuscate debugging; the
409*03ce13f7SAndroid Build Coastguard Workermain thing might be that register allocation (including stack slot coalescing
410*03ce13f7SAndroid Build Coastguard Workerfor stack-allocated variables whose live ranges don't overlap) may render a
411*03ce13f7SAndroid Build Coastguard Workervariable's value unobtainable after its live range ends.  This would not be an
412*03ce13f7SAndroid Build Coastguard Workerissue for ``Om1`` since it doesn't register-allocate program-level variables,
413*03ce13f7SAndroid Build Coastguard Workernor does it coalesce stack slots.  That said, fully supporting debuggability
414*03ce13f7SAndroid Build Coastguard Workerwould require a few additions:
415*03ce13f7SAndroid Build Coastguard Worker
416*03ce13f7SAndroid Build Coastguard Worker- DWARF support would need to be added to Subzero's ELF file emitter.  Subzero
417*03ce13f7SAndroid Build Coastguard Worker  propagates global symbol names, local variable names, and function-internal
418*03ce13f7SAndroid Build Coastguard Worker  label names that are present in the ``.pexe`` file.  This would allow a
419*03ce13f7SAndroid Build Coastguard Worker  debugger to map addresses back to symbols in the ``.pexe`` file.
420*03ce13f7SAndroid Build Coastguard Worker
421*03ce13f7SAndroid Build Coastguard Worker- To map ``.pexe`` file symbols back to meaningful source-level symbol names,
422*03ce13f7SAndroid Build Coastguard Worker  file names, line numbers, etc., Subzero would need to handle `LLVM bitcode
423*03ce13f7SAndroid Build Coastguard Worker  metadata <http://llvm.org/docs/LangRef.html#metadata>`_ and ``llvm.dbg``
424*03ce13f7SAndroid Build Coastguard Worker  `instrinsics<http://llvm.org/docs/LangRef.html#dbg-intrinsics>`_.
425*03ce13f7SAndroid Build Coastguard Worker
426*03ce13f7SAndroid Build Coastguard Worker- The PNaCl toolchain explicitly strips all this from the ``.pexe`` file, and so
427*03ce13f7SAndroid Build Coastguard Worker  the toolchain would need to be modified to preserve it.
428*03ce13f7SAndroid Build Coastguard Worker
429*03ce13f7SAndroid Build Coastguard WorkerOur experience so far is that ``Om1`` translates twice as fast as ``O2``, but
430*03ce13f7SAndroid Build Coastguard Workerproduces code with one third the code quality.  ``Om1`` is good for testing and
431*03ce13f7SAndroid Build Coastguard Workerdebugging -- during translation, it tends to expose errors in the basic lowering
432*03ce13f7SAndroid Build Coastguard Workerthat might otherwise have been hidden by the register allocator or other
433*03ce13f7SAndroid Build Coastguard Workeroptimization passes.  It also helps determine whether a code correctness problem
434*03ce13f7SAndroid Build Coastguard Workeris a fundamental problem in the basic lowering, or an error in another
435*03ce13f7SAndroid Build Coastguard Workeroptimization pass.
436*03ce13f7SAndroid Build Coastguard Worker
437*03ce13f7SAndroid Build Coastguard WorkerThe implementation of target lowering also controls the recipe of passes used
438*03ce13f7SAndroid Build Coastguard Workerfor ``Om1`` and ``O2`` translation.  For example, address mode inference may
439*03ce13f7SAndroid Build Coastguard Workeronly be relevant for x86.
440*03ce13f7SAndroid Build Coastguard Worker
441*03ce13f7SAndroid Build Coastguard WorkerLowering strategy
442*03ce13f7SAndroid Build Coastguard Worker-----------------
443*03ce13f7SAndroid Build Coastguard Worker
444*03ce13f7SAndroid Build Coastguard WorkerThe core of Subzero's lowering from high-level ICE to low-level ICE is to lower
445*03ce13f7SAndroid Build Coastguard Workereach high-level instruction down to a sequence of low-level target-specific
446*03ce13f7SAndroid Build Coastguard Workerinstructions, in a largely context-free setting.  That is, each high-level
447*03ce13f7SAndroid Build Coastguard Workerinstruction conceptually has a simple template expansion into low-level
448*03ce13f7SAndroid Build Coastguard Workerinstructions, and lowering can in theory be done in any order.  This may sound
449*03ce13f7SAndroid Build Coastguard Workerlike a small effort, but quite a large number of templates may be needed because
450*03ce13f7SAndroid Build Coastguard Workerof the number of PNaCl types and instruction variants.  Furthermore, there may
451*03ce13f7SAndroid Build Coastguard Workerbe optimized templates, e.g. to take advantage of operator commutativity (for
452*03ce13f7SAndroid Build Coastguard Workerexample, ``x=x+1`` might allow a bettern lowering than ``x=1+x``).  This is
453*03ce13f7SAndroid Build Coastguard Workersimilar to other template-based approaches in fast code generation or
454*03ce13f7SAndroid Build Coastguard Workerinterpretation, though some decisions are deferred until after some global
455*03ce13f7SAndroid Build Coastguard Workeranalysis passes, mostly related to register allocation, stack slot assignment,
456*03ce13f7SAndroid Build Coastguard Workerand specific choice of instruction variant and addressing mode.
457*03ce13f7SAndroid Build Coastguard Worker
458*03ce13f7SAndroid Build Coastguard WorkerThe key idea for a lowering template is to produce valid low-level instructions
459*03ce13f7SAndroid Build Coastguard Workerthat are guaranteed to meet address mode and other structural requirements of
460*03ce13f7SAndroid Build Coastguard Workerthe instruction set.  For example, on x86, the source operand of an integer
461*03ce13f7SAndroid Build Coastguard Workerstore instruction must be an immediate or a physical register; a shift
462*03ce13f7SAndroid Build Coastguard Workerinstruction's shift amount must be an immediate or in register ``%cl``; a
463*03ce13f7SAndroid Build Coastguard Workerfunction's integer return value is in ``%eax``; most x86 instructions are
464*03ce13f7SAndroid Build Coastguard Workertwo-operand, in contrast to corresponding three-operand high-level instructions;
465*03ce13f7SAndroid Build Coastguard Workeretc.
466*03ce13f7SAndroid Build Coastguard Worker
467*03ce13f7SAndroid Build Coastguard WorkerBecause target lowering runs before register allocation, there is no way to know
468*03ce13f7SAndroid Build Coastguard Workerwhether a given ``Ice::Variable`` operand lives on the stack or in a physical
469*03ce13f7SAndroid Build Coastguard Workerregister.  When the low-level instruction calls for a physical register operand,
470*03ce13f7SAndroid Build Coastguard Workerthe target lowering can create an infinite-weight Variable.  This tells the
471*03ce13f7SAndroid Build Coastguard Workerregister allocator to assign infinite weight when making decisions, effectively
472*03ce13f7SAndroid Build Coastguard Workerguaranteeing some physical register.  Variables can also be pre-colored to a
473*03ce13f7SAndroid Build Coastguard Workerspecific physical register (``cl`` in the shift example above), which also gives
474*03ce13f7SAndroid Build Coastguard Workerinfinite weight.
475*03ce13f7SAndroid Build Coastguard Worker
476*03ce13f7SAndroid Build Coastguard WorkerTo illustrate, consider a high-level arithmetic instruction on 32-bit integer
477*03ce13f7SAndroid Build Coastguard Workeroperands::
478*03ce13f7SAndroid Build Coastguard Worker
479*03ce13f7SAndroid Build Coastguard Worker    A = B + C
480*03ce13f7SAndroid Build Coastguard Worker
481*03ce13f7SAndroid Build Coastguard WorkerX86 target lowering might produce the following::
482*03ce13f7SAndroid Build Coastguard Worker
483*03ce13f7SAndroid Build Coastguard Worker    T.inf = B  // mov instruction
484*03ce13f7SAndroid Build Coastguard Worker    T.inf += C // add instruction
485*03ce13f7SAndroid Build Coastguard Worker    A = T.inf  // mov instruction
486*03ce13f7SAndroid Build Coastguard Worker
487*03ce13f7SAndroid Build Coastguard WorkerHere, ``T.inf`` is an infinite-weight temporary.  As long as ``T.inf`` has a
488*03ce13f7SAndroid Build Coastguard Workerphysical register, the three lowered instructions are all encodable regardless
489*03ce13f7SAndroid Build Coastguard Workerof whether ``B`` and ``C`` are physical registers, memory, or immediates, and
490*03ce13f7SAndroid Build Coastguard Workerwhether ``A`` is a physical register or in memory.
491*03ce13f7SAndroid Build Coastguard Worker
492*03ce13f7SAndroid Build Coastguard WorkerIn this example, ``A`` must be a Variable and one may be tempted to simplify the
493*03ce13f7SAndroid Build Coastguard Workerlowering sequence by setting ``A`` as infinite-weight and using::
494*03ce13f7SAndroid Build Coastguard Worker
495*03ce13f7SAndroid Build Coastguard Worker        A = B  // mov instruction
496*03ce13f7SAndroid Build Coastguard Worker        A += C // add instruction
497*03ce13f7SAndroid Build Coastguard Worker
498*03ce13f7SAndroid Build Coastguard WorkerThis has two problems.  First, if the original instruction was actually ``A =
499*03ce13f7SAndroid Build Coastguard WorkerB + A``, the result would be incorrect.  Second, assigning ``A`` a physical
500*03ce13f7SAndroid Build Coastguard Workerregister applies throughout ``A``'s entire live range.  This is probably not
501*03ce13f7SAndroid Build Coastguard Workerwhat is intended, and may ultimately lead to a failure to allocate a register
502*03ce13f7SAndroid Build Coastguard Workerfor an infinite-weight variable.
503*03ce13f7SAndroid Build Coastguard Worker
504*03ce13f7SAndroid Build Coastguard WorkerThis style of lowering leads to many temporaries being generated, so in ``O2``
505*03ce13f7SAndroid Build Coastguard Workermode, we rely on the register allocator to clean things up.  For example, in the
506*03ce13f7SAndroid Build Coastguard Workerexample above, if ``B`` ends up getting a physical register and its live range
507*03ce13f7SAndroid Build Coastguard Workerends at this instruction, the register allocator is likely to reuse that
508*03ce13f7SAndroid Build Coastguard Workerregister for ``T.inf``.  This leads to ``T.inf=B`` being a redundant register
509*03ce13f7SAndroid Build Coastguard Workercopy, which is removed as an emission-time peephole optimization.
510*03ce13f7SAndroid Build Coastguard Worker
511*03ce13f7SAndroid Build Coastguard WorkerO2 lowering
512*03ce13f7SAndroid Build Coastguard Worker-----------
513*03ce13f7SAndroid Build Coastguard Worker
514*03ce13f7SAndroid Build Coastguard WorkerCurrently, the ``O2`` lowering recipe is the following:
515*03ce13f7SAndroid Build Coastguard Worker
516*03ce13f7SAndroid Build Coastguard Worker- Loop nest analysis
517*03ce13f7SAndroid Build Coastguard Worker
518*03ce13f7SAndroid Build Coastguard Worker- Local common subexpression elimination
519*03ce13f7SAndroid Build Coastguard Worker
520*03ce13f7SAndroid Build Coastguard Worker- Address mode inference
521*03ce13f7SAndroid Build Coastguard Worker
522*03ce13f7SAndroid Build Coastguard Worker- Read-modify-write (RMW) transformation
523*03ce13f7SAndroid Build Coastguard Worker
524*03ce13f7SAndroid Build Coastguard Worker- Basic liveness analysis
525*03ce13f7SAndroid Build Coastguard Worker
526*03ce13f7SAndroid Build Coastguard Worker- Load optimization
527*03ce13f7SAndroid Build Coastguard Worker
528*03ce13f7SAndroid Build Coastguard Worker- Target lowering
529*03ce13f7SAndroid Build Coastguard Worker
530*03ce13f7SAndroid Build Coastguard Worker- Full liveness analysis
531*03ce13f7SAndroid Build Coastguard Worker
532*03ce13f7SAndroid Build Coastguard Worker- Register allocation
533*03ce13f7SAndroid Build Coastguard Worker
534*03ce13f7SAndroid Build Coastguard Worker- Phi instruction lowering (advanced)
535*03ce13f7SAndroid Build Coastguard Worker
536*03ce13f7SAndroid Build Coastguard Worker- Post-phi lowering register allocation
537*03ce13f7SAndroid Build Coastguard Worker
538*03ce13f7SAndroid Build Coastguard Worker- Branch optimization
539*03ce13f7SAndroid Build Coastguard Worker
540*03ce13f7SAndroid Build Coastguard WorkerThese passes are described in more detail below.
541*03ce13f7SAndroid Build Coastguard Worker
542*03ce13f7SAndroid Build Coastguard WorkerOm1 lowering
543*03ce13f7SAndroid Build Coastguard Worker------------
544*03ce13f7SAndroid Build Coastguard Worker
545*03ce13f7SAndroid Build Coastguard WorkerCurrently, the ``Om1`` lowering recipe is the following:
546*03ce13f7SAndroid Build Coastguard Worker
547*03ce13f7SAndroid Build Coastguard Worker- Phi instruction lowering (simple)
548*03ce13f7SAndroid Build Coastguard Worker
549*03ce13f7SAndroid Build Coastguard Worker- Target lowering
550*03ce13f7SAndroid Build Coastguard Worker
551*03ce13f7SAndroid Build Coastguard Worker- Register allocation (infinite-weight and pre-colored only)
552*03ce13f7SAndroid Build Coastguard Worker
553*03ce13f7SAndroid Build Coastguard WorkerOptimization passes
554*03ce13f7SAndroid Build Coastguard Worker-------------------
555*03ce13f7SAndroid Build Coastguard Worker
556*03ce13f7SAndroid Build Coastguard WorkerLiveness analysis
557*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^
558*03ce13f7SAndroid Build Coastguard Worker
559*03ce13f7SAndroid Build Coastguard WorkerLiveness analysis is a standard dataflow optimization, implemented as follows.
560*03ce13f7SAndroid Build Coastguard WorkerFor each node (basic block), its live-out set is computed as the union of the
561*03ce13f7SAndroid Build Coastguard Workerlive-in sets of its successor nodes.  Then the node's instructions are processed
562*03ce13f7SAndroid Build Coastguard Workerin reverse order, updating the live set, until the beginning of the node is
563*03ce13f7SAndroid Build Coastguard Workerreached, and the node's live-in set is recorded.  If this iteration has changed
564*03ce13f7SAndroid Build Coastguard Workerthe node's live-in set, the node's predecessors are marked for reprocessing.
565*03ce13f7SAndroid Build Coastguard WorkerThis continues until no more nodes need reprocessing.  If nodes are processed in
566*03ce13f7SAndroid Build Coastguard Workerreverse topological order, the number of iterations over the CFG is generally
567*03ce13f7SAndroid Build Coastguard Workerequal to the maximum loop nest depth.
568*03ce13f7SAndroid Build Coastguard Worker
569*03ce13f7SAndroid Build Coastguard WorkerTo implement this, each node records its live-in and live-out sets, initialized
570*03ce13f7SAndroid Build Coastguard Workerto the empty set.  Each instruction records which of its Variables' live ranges
571*03ce13f7SAndroid Build Coastguard Workerend in that instruction, initialized to the empty set.  A side effect of
572*03ce13f7SAndroid Build Coastguard Workerliveness analysis is dead instruction elimination.  Each instruction can be
573*03ce13f7SAndroid Build Coastguard Workermarked as tentatively dead, and after the algorithm converges, the tentatively
574*03ce13f7SAndroid Build Coastguard Workerdead instructions are permanently deleted.
575*03ce13f7SAndroid Build Coastguard Worker
576*03ce13f7SAndroid Build Coastguard WorkerOptionally, after this liveness analysis completes, we can do live range
577*03ce13f7SAndroid Build Coastguard Workerconstruction, in which we calculate the live range of each variable in terms of
578*03ce13f7SAndroid Build Coastguard Workerinstruction numbers.  A live range is represented as a union of segments, where
579*03ce13f7SAndroid Build Coastguard Workerthe segment endpoints are instruction numbers.  Instruction numbers are required
580*03ce13f7SAndroid Build Coastguard Workerto be unique across the CFG, and monotonically increasing within a basic block.
581*03ce13f7SAndroid Build Coastguard WorkerAs a union of segments, live ranges can contain "gaps" and are therefore
582*03ce13f7SAndroid Build Coastguard Workerprecise.  Because of SSA properties, a variable's live range can start at most
583*03ce13f7SAndroid Build Coastguard Workeronce in a basic block, and can end at most once in a basic block.  Liveness
584*03ce13f7SAndroid Build Coastguard Workeranalysis keeps track of which variable/instruction tuples begin live ranges and
585*03ce13f7SAndroid Build Coastguard Workerend live ranges, and combined with live-in and live-out sets, we can efficiently
586*03ce13f7SAndroid Build Coastguard Workerbuild up live ranges of all variables across all basic blocks.
587*03ce13f7SAndroid Build Coastguard Worker
588*03ce13f7SAndroid Build Coastguard WorkerA lot of care is taken to try to make liveness analysis fast and efficient.
589*03ce13f7SAndroid Build Coastguard WorkerBecause of the lowering strategy, the number of variables is generally
590*03ce13f7SAndroid Build Coastguard Workerproportional to the number of instructions, leading to an O(N^2) complexity
591*03ce13f7SAndroid Build Coastguard Workeralgorithm if implemented naively.  To improve things based on sparsity, we note
592*03ce13f7SAndroid Build Coastguard Workerthat most variables are "local" and referenced in at most one basic block (in
593*03ce13f7SAndroid Build Coastguard Workercontrast to the "global" variables with multi-block usage), and therefore cannot
594*03ce13f7SAndroid Build Coastguard Workerbe live across basic blocks.  Therefore, the live-in and live-out sets,
595*03ce13f7SAndroid Build Coastguard Workertypically represented as bit vectors, can be limited to the set of global
596*03ce13f7SAndroid Build Coastguard Workervariables, and the intra-block liveness bit vector can be compacted to hold the
597*03ce13f7SAndroid Build Coastguard Workerglobal variables plus the local variables for that block.
598*03ce13f7SAndroid Build Coastguard Worker
599*03ce13f7SAndroid Build Coastguard WorkerRegister allocation
600*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^
601*03ce13f7SAndroid Build Coastguard Worker
602*03ce13f7SAndroid Build Coastguard WorkerSubzero implements a simple linear-scan register allocator, based on the
603*03ce13f7SAndroid Build Coastguard Workerallocator described by Hanspeter Mössenböck and Michael Pfeiffer in `Linear Scan
604*03ce13f7SAndroid Build Coastguard WorkerRegister Allocation in the Context of SSA Form and Register Constraints
605*03ce13f7SAndroid Build Coastguard Worker<ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe02.PDF>`_.  This allocator has
606*03ce13f7SAndroid Build Coastguard Workerseveral nice features:
607*03ce13f7SAndroid Build Coastguard Worker
608*03ce13f7SAndroid Build Coastguard Worker- Live ranges are represented as unions of segments, as described above, rather
609*03ce13f7SAndroid Build Coastguard Worker  than a single start/end tuple.
610*03ce13f7SAndroid Build Coastguard Worker
611*03ce13f7SAndroid Build Coastguard Worker- It allows pre-coloring of variables with specific physical registers.
612*03ce13f7SAndroid Build Coastguard Worker
613*03ce13f7SAndroid Build Coastguard Worker- It applies equally well to pre-lowered Phi instructions.
614*03ce13f7SAndroid Build Coastguard Worker
615*03ce13f7SAndroid Build Coastguard WorkerThe paper suggests an approach of aggressively coalescing variables across Phi
616*03ce13f7SAndroid Build Coastguard Workerinstructions (i.e., trying to force Phi source and destination variables to have
617*03ce13f7SAndroid Build Coastguard Workerthe same register assignment), but we reject that in favor of the more natural
618*03ce13f7SAndroid Build Coastguard Workerpreference mechanism described below.
619*03ce13f7SAndroid Build Coastguard Worker
620*03ce13f7SAndroid Build Coastguard WorkerWe enhance the algorithm in the paper with the capability of automatic inference
621*03ce13f7SAndroid Build Coastguard Workerof register preference, and with the capability of allowing overlapping live
622*03ce13f7SAndroid Build Coastguard Workerranges to safely share the same register in certain circumstances.  If we are
623*03ce13f7SAndroid Build Coastguard Workerconsidering register allocation for variable ``A``, and ``A`` has a single
624*03ce13f7SAndroid Build Coastguard Workerdefining instruction ``A=B+C``, then the preferred register for ``A``, if
625*03ce13f7SAndroid Build Coastguard Workeravailable, would be the register assigned to ``B`` or ``C``, if any, provided
626*03ce13f7SAndroid Build Coastguard Workerthat ``B`` or ``C``'s live range does not overlap ``A``'s live range.  In this
627*03ce13f7SAndroid Build Coastguard Workerway we infer a good register preference for ``A``.
628*03ce13f7SAndroid Build Coastguard Worker
629*03ce13f7SAndroid Build Coastguard WorkerWe allow overlapping live ranges to get the same register in certain cases.
630*03ce13f7SAndroid Build Coastguard WorkerSuppose a high-level instruction like::
631*03ce13f7SAndroid Build Coastguard Worker
632*03ce13f7SAndroid Build Coastguard Worker    A = unary_op(B)
633*03ce13f7SAndroid Build Coastguard Worker
634*03ce13f7SAndroid Build Coastguard Workerhas been target-lowered like::
635*03ce13f7SAndroid Build Coastguard Worker
636*03ce13f7SAndroid Build Coastguard Worker    T.inf = B
637*03ce13f7SAndroid Build Coastguard Worker    A = unary_op(T.inf)
638*03ce13f7SAndroid Build Coastguard Worker
639*03ce13f7SAndroid Build Coastguard WorkerFurther, assume that ``B``'s live range continues beyond this instruction
640*03ce13f7SAndroid Build Coastguard Workersequence, and that ``B`` has already been assigned some register.  Normally, we
641*03ce13f7SAndroid Build Coastguard Workermight want to infer ``B``'s register as a good candidate for ``T.inf``, but it
642*03ce13f7SAndroid Build Coastguard Workerturns out that ``T.inf`` and ``B``'s live ranges overlap, requiring them to have
643*03ce13f7SAndroid Build Coastguard Workerdifferent registers.  But ``T.inf`` is just a read-only copy of ``B`` that is
644*03ce13f7SAndroid Build Coastguard Workerguaranteed to be in a register, so in theory these overlapping live ranges could
645*03ce13f7SAndroid Build Coastguard Workersafely have the same register.  Our implementation allows this overlap as long
646*03ce13f7SAndroid Build Coastguard Workeras ``T.inf`` is never modified within ``B``'s live range, and ``B`` is never
647*03ce13f7SAndroid Build Coastguard Workermodified within ``T.inf``'s live range.
648*03ce13f7SAndroid Build Coastguard Worker
649*03ce13f7SAndroid Build Coastguard WorkerSubzero's register allocator can be run in 3 configurations.
650*03ce13f7SAndroid Build Coastguard Worker
651*03ce13f7SAndroid Build Coastguard Worker- Normal mode.  All Variables are considered for register allocation.  It
652*03ce13f7SAndroid Build Coastguard Worker  requires full liveness analysis and live range construction as a prerequisite.
653*03ce13f7SAndroid Build Coastguard Worker  This is used by ``O2`` lowering.
654*03ce13f7SAndroid Build Coastguard Worker
655*03ce13f7SAndroid Build Coastguard Worker- Minimal mode.  Only infinite-weight or pre-colored Variables are considered.
656*03ce13f7SAndroid Build Coastguard Worker  All other Variables are stack-allocated.  It does not require liveness
657*03ce13f7SAndroid Build Coastguard Worker  analysis; instead, it quickly scans the instructions and records first
658*03ce13f7SAndroid Build Coastguard Worker  definitions and last uses of all relevant Variables, using that to construct a
659*03ce13f7SAndroid Build Coastguard Worker  single-segment live range.  Although this includes most of the Variables, the
660*03ce13f7SAndroid Build Coastguard Worker  live ranges are mostly simple, short, and rarely overlapping, which the
661*03ce13f7SAndroid Build Coastguard Worker  register allocator handles efficiently.  This is used by ``Om1`` lowering.
662*03ce13f7SAndroid Build Coastguard Worker
663*03ce13f7SAndroid Build Coastguard Worker- Post-phi lowering mode.  Advanced phi lowering is done after normal-mode
664*03ce13f7SAndroid Build Coastguard Worker  register allocation, and may result in new infinite-weight Variables that need
665*03ce13f7SAndroid Build Coastguard Worker  registers.  One would like to just run something like minimal mode to assign
666*03ce13f7SAndroid Build Coastguard Worker  registers to the new Variables while respecting existing register allocation
667*03ce13f7SAndroid Build Coastguard Worker  decisions.  However, it sometimes happens that there are no free registers.
668*03ce13f7SAndroid Build Coastguard Worker  In this case, some register needs to be forcibly spilled to the stack and
669*03ce13f7SAndroid Build Coastguard Worker  temporarily reassigned to the new Variable, and reloaded at the end of the new
670*03ce13f7SAndroid Build Coastguard Worker  Variable's live range.  The register must be one that has no explicit
671*03ce13f7SAndroid Build Coastguard Worker  references during the Variable's live range.  Since Subzero currently doesn't
672*03ce13f7SAndroid Build Coastguard Worker  track def/use chains (though it does record the CfgNode where a Variable is
673*03ce13f7SAndroid Build Coastguard Worker  defined), we just do a brute-force search across the CfgNode's instruction
674*03ce13f7SAndroid Build Coastguard Worker  list for the instruction numbers of interest.  This situation happens very
675*03ce13f7SAndroid Build Coastguard Worker  rarely, so there's little point for now in improving its performance.
676*03ce13f7SAndroid Build Coastguard Worker
677*03ce13f7SAndroid Build Coastguard WorkerThe basic linear-scan algorithm may, as it proceeds, rescind an early register
678*03ce13f7SAndroid Build Coastguard Workerallocation decision, leaving that Variable to be stack-allocated.  Some of these
679*03ce13f7SAndroid Build Coastguard Workertimes, it turns out that the Variable could have been given a different register
680*03ce13f7SAndroid Build Coastguard Workerwithout conflict, but by this time it's too late.  The literature recognizes
681*03ce13f7SAndroid Build Coastguard Workerthis situation and describes "second-chance bin-packing", which Subzero can do.
682*03ce13f7SAndroid Build Coastguard WorkerWe can rerun the register allocator in a mode that respects existing register
683*03ce13f7SAndroid Build Coastguard Workerallocation decisions, and sometimes it finds new non-conflicting opportunities.
684*03ce13f7SAndroid Build Coastguard WorkerIn fact, we can repeatedly run the register allocator until convergence.
685*03ce13f7SAndroid Build Coastguard WorkerUnfortunately, in the current implementation, these subsequent register
686*03ce13f7SAndroid Build Coastguard Workerallocation passes end up being extremely expensive.  This is because of the
687*03ce13f7SAndroid Build Coastguard Workertreatment of the "unhandled pre-colored" Variable set, which is normally very
688*03ce13f7SAndroid Build Coastguard Workersmall but ends up being quite large on subsequent passes.  Its performance can
689*03ce13f7SAndroid Build Coastguard Workerprobably be made acceptable with a better choice of data structures, but for now
690*03ce13f7SAndroid Build Coastguard Workerthis second-chance mechanism is disabled.
691*03ce13f7SAndroid Build Coastguard Worker
692*03ce13f7SAndroid Build Coastguard WorkerFuture work is to implement LLVM's `Greedy
693*03ce13f7SAndroid Build Coastguard Worker<http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html>`_
694*03ce13f7SAndroid Build Coastguard Workerregister allocator as a replacement for the basic linear-scan algorithm, given
695*03ce13f7SAndroid Build Coastguard WorkerLLVM's experience with its improvement in code quality.  (The blog post claims
696*03ce13f7SAndroid Build Coastguard Workerthat the Greedy allocator also improved maintainability because a lot of hacks
697*03ce13f7SAndroid Build Coastguard Workercould be removed, but Subzero is probably not yet to that level of hacks, and is
698*03ce13f7SAndroid Build Coastguard Workerless likely to see that particular benefit.)
699*03ce13f7SAndroid Build Coastguard Worker
700*03ce13f7SAndroid Build Coastguard WorkerLocal common subexpression elimination
701*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
702*03ce13f7SAndroid Build Coastguard Worker
703*03ce13f7SAndroid Build Coastguard WorkerThe Local CSE implementation goes through each instruction and records a portion
704*03ce13f7SAndroid Build Coastguard Workerof each ``Seen`` instruction in a hashset-like container.  That portion consists
705*03ce13f7SAndroid Build Coastguard Workerof the entire instruction except for any dest variable. That means ``A = X + Y``
706*03ce13f7SAndroid Build Coastguard Workerand ``B = X + Y`` will be considered to be 'equal' for this purpose. This allows
707*03ce13f7SAndroid Build Coastguard Workerus to detect common subexpressions.
708*03ce13f7SAndroid Build Coastguard Worker
709*03ce13f7SAndroid Build Coastguard WorkerWhenever a repetition is detected, the redundant variables are stored in a
710*03ce13f7SAndroid Build Coastguard Workercontainer mapping the replacee to the replacement. In the case above, it would
711*03ce13f7SAndroid Build Coastguard Workerbe ``MAP[B] = A`` assuming ``B = X + Y`` comes after ``A = X + Y``.
712*03ce13f7SAndroid Build Coastguard Worker
713*03ce13f7SAndroid Build Coastguard WorkerAt any point if a variable that has an entry in the replacement table is
714*03ce13f7SAndroid Build Coastguard Workerencountered, it is replaced with the variable it is mapped to. This ensures that
715*03ce13f7SAndroid Build Coastguard Workerthe redundant variables will not have any uses in the basic block, allowing
716*03ce13f7SAndroid Build Coastguard Workerdead code elimination to clean up the redundant instruction.
717*03ce13f7SAndroid Build Coastguard Worker
718*03ce13f7SAndroid Build Coastguard WorkerWith SSA, the information stored is never invalidated. However, non-SSA input is
719*03ce13f7SAndroid Build Coastguard Workersupported with the ``-lcse=no-ssa`` option. This has to maintain some extra
720*03ce13f7SAndroid Build Coastguard Workerdependency information to ensure proper invalidation on variable assignment.
721*03ce13f7SAndroid Build Coastguard WorkerThis is not rigorously tested because this pass is run at an early stage where
722*03ce13f7SAndroid Build Coastguard Workerit is safe to assume variables have a single definition. This is not enabled by
723*03ce13f7SAndroid Build Coastguard Workerdefault because it bumps the compile time overhead from 2% to 6%.
724*03ce13f7SAndroid Build Coastguard Worker
725*03ce13f7SAndroid Build Coastguard WorkerLoop-invariant code motion
726*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^
727*03ce13f7SAndroid Build Coastguard Worker
728*03ce13f7SAndroid Build Coastguard WorkerThis pass utilizes the loop analysis information to hoist invariant instructions
729*03ce13f7SAndroid Build Coastguard Workerto loop pre-headers. A loop must have a single entry node (header) and that node
730*03ce13f7SAndroid Build Coastguard Workermust have a single external predecessor for this optimization to work, as it is
731*03ce13f7SAndroid Build Coastguard Workercurrently implemented.
732*03ce13f7SAndroid Build Coastguard Worker
733*03ce13f7SAndroid Build Coastguard WorkerThe pass works by iterating over all instructions in the loop until the set of
734*03ce13f7SAndroid Build Coastguard Workerinvariant instructions converges. In each iteration, a non-invariant instruction
735*03ce13f7SAndroid Build Coastguard Workerinvolving only constants or a variable known to be invariant is added to the
736*03ce13f7SAndroid Build Coastguard Workerresult set. The destination variable of that instruction is added to the set of
737*03ce13f7SAndroid Build Coastguard Workervariables known to be invariant (which is initialized with the constant args).
738*03ce13f7SAndroid Build Coastguard Worker
739*03ce13f7SAndroid Build Coastguard WorkerImproving the loop-analysis infrastructure is likely to have significant impact
740*03ce13f7SAndroid Build Coastguard Workeron this optimization. Inserting an extra node to act as the pre-header when the
741*03ce13f7SAndroid Build Coastguard Workerheader has multiple incoming edges from outside could also be a good idea.
742*03ce13f7SAndroid Build Coastguard WorkerExpanding the initial invariant variable set to contain all variables that do
743*03ce13f7SAndroid Build Coastguard Workernot have definitions inside the loop does not seem to improve anything.
744*03ce13f7SAndroid Build Coastguard Worker
745*03ce13f7SAndroid Build Coastguard WorkerShort circuit evaluation
746*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^
747*03ce13f7SAndroid Build Coastguard Worker
748*03ce13f7SAndroid Build Coastguard WorkerShort circuit evaluation splits nodes and introduces early jumps when the result
749*03ce13f7SAndroid Build Coastguard Workerof a logical operation can be determined early and there are no observable side
750*03ce13f7SAndroid Build Coastguard Workereffects of skipping the rest of the computation. The instructions considered
751*03ce13f7SAndroid Build Coastguard Workerbackwards from the end of the basic blocks. When a definition of a variable
752*03ce13f7SAndroid Build Coastguard Workerinvolved in a conditional jump is found, an extra jump can be inserted in that
753*03ce13f7SAndroid Build Coastguard Workerlocation, moving the rest of the instructions in the node to a newly inserted
754*03ce13f7SAndroid Build Coastguard Workernode. Consider this example::
755*03ce13f7SAndroid Build Coastguard Worker
756*03ce13f7SAndroid Build Coastguard Worker  __N :
757*03ce13f7SAndroid Build Coastguard Worker    a = <something>
758*03ce13f7SAndroid Build Coastguard Worker    Instruction 1 without side effect
759*03ce13f7SAndroid Build Coastguard Worker    ... b = <something> ...
760*03ce13f7SAndroid Build Coastguard Worker    Instruction N without side effect
761*03ce13f7SAndroid Build Coastguard Worker    t1 = or a b
762*03ce13f7SAndroid Build Coastguard Worker    br t1 __X __Y
763*03ce13f7SAndroid Build Coastguard Worker
764*03ce13f7SAndroid Build Coastguard Workeris transformed to::
765*03ce13f7SAndroid Build Coastguard Worker
766*03ce13f7SAndroid Build Coastguard Worker  __N :
767*03ce13f7SAndroid Build Coastguard Worker    a = <something>
768*03ce13f7SAndroid Build Coastguard Worker    br a __X __N_ext
769*03ce13f7SAndroid Build Coastguard Worker
770*03ce13f7SAndroid Build Coastguard Worker  __N_ext :
771*03ce13f7SAndroid Build Coastguard Worker    Instruction 1 without side effect
772*03ce13f7SAndroid Build Coastguard Worker    ... b = <something> ...
773*03ce13f7SAndroid Build Coastguard Worker    Instruction N without side effect
774*03ce13f7SAndroid Build Coastguard Worker    br b __X __Y
775*03ce13f7SAndroid Build Coastguard Worker
776*03ce13f7SAndroid Build Coastguard WorkerThe logic for AND is analogous, the only difference is that the early jump is
777*03ce13f7SAndroid Build Coastguard Workerfacilitated by a ``false`` value instead of ``true``.
778*03ce13f7SAndroid Build Coastguard Worker
779*03ce13f7SAndroid Build Coastguard WorkerGlobal Variable Splitting
780*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^
781*03ce13f7SAndroid Build Coastguard Worker
782*03ce13f7SAndroid Build Coastguard WorkerGlobal variable splitting (``-split-global-vars``) is run after register
783*03ce13f7SAndroid Build Coastguard Workerallocation. It works on the variables that did not manage to get registers (but
784*03ce13f7SAndroid Build Coastguard Workerare allowed to) and decomposes their live ranges into the individual segments
785*03ce13f7SAndroid Build Coastguard Worker(which span a single node at most). New variables are created (but not yet used)
786*03ce13f7SAndroid Build Coastguard Workerwith these smaller live ranges and the register allocator is run again. This is
787*03ce13f7SAndroid Build Coastguard Workernot inefficient as the old variables that already had registers are now
788*03ce13f7SAndroid Build Coastguard Workerconsidered pre-colored.
789*03ce13f7SAndroid Build Coastguard Worker
790*03ce13f7SAndroid Build Coastguard WorkerThe new variables that get registers replace their parent variables for their
791*03ce13f7SAndroid Build Coastguard Workerportion of its (parent's) live range. A copy from the old variable to the new
792*03ce13f7SAndroid Build Coastguard Workeris introduced before the first use and the reverse after the last def in the
793*03ce13f7SAndroid Build Coastguard Workerlive range.
794*03ce13f7SAndroid Build Coastguard Worker
795*03ce13f7SAndroid Build Coastguard WorkerBasic phi lowering
796*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^
797*03ce13f7SAndroid Build Coastguard Worker
798*03ce13f7SAndroid Build Coastguard WorkerThe simplest phi lowering strategy works as follows (this is how LLVM ``-O0``
799*03ce13f7SAndroid Build Coastguard Workerimplements it).  Consider this example::
800*03ce13f7SAndroid Build Coastguard Worker
801*03ce13f7SAndroid Build Coastguard Worker    L1:
802*03ce13f7SAndroid Build Coastguard Worker      ...
803*03ce13f7SAndroid Build Coastguard Worker      br L3
804*03ce13f7SAndroid Build Coastguard Worker    L2:
805*03ce13f7SAndroid Build Coastguard Worker      ...
806*03ce13f7SAndroid Build Coastguard Worker      br L3
807*03ce13f7SAndroid Build Coastguard Worker    L3:
808*03ce13f7SAndroid Build Coastguard Worker      A = phi [B, L1], [C, L2]
809*03ce13f7SAndroid Build Coastguard Worker      X = phi [Y, L1], [Z, L2]
810*03ce13f7SAndroid Build Coastguard Worker
811*03ce13f7SAndroid Build Coastguard WorkerFor each destination of a phi instruction, we can create a temporary and insert
812*03ce13f7SAndroid Build Coastguard Workerthe temporary's assignment at the end of the predecessor block::
813*03ce13f7SAndroid Build Coastguard Worker
814*03ce13f7SAndroid Build Coastguard Worker    L1:
815*03ce13f7SAndroid Build Coastguard Worker      ...
816*03ce13f7SAndroid Build Coastguard Worker      A' = B
817*03ce13f7SAndroid Build Coastguard Worker      X' = Y
818*03ce13f7SAndroid Build Coastguard Worker      br L3
819*03ce13f7SAndroid Build Coastguard Worker    L2:
820*03ce13f7SAndroid Build Coastguard Worker      ...
821*03ce13f7SAndroid Build Coastguard Worker      A' = C
822*03ce13f7SAndroid Build Coastguard Worker      X' = Z
823*03ce13f7SAndroid Build Coastguard Worker      br L3
824*03ce13f7SAndroid Build Coastguard Worker    L2:
825*03ce13f7SAndroid Build Coastguard Worker      A = A'
826*03ce13f7SAndroid Build Coastguard Worker      X = X'
827*03ce13f7SAndroid Build Coastguard Worker
828*03ce13f7SAndroid Build Coastguard WorkerThis transformation is very simple and reliable.  It can be done before target
829*03ce13f7SAndroid Build Coastguard Workerlowering and register allocation, and it easily avoids the classic lost-copy and
830*03ce13f7SAndroid Build Coastguard Workerrelated problems.  ``Om1`` lowering uses this strategy.
831*03ce13f7SAndroid Build Coastguard Worker
832*03ce13f7SAndroid Build Coastguard WorkerHowever, it has the disadvantage of initializing temporaries even for branches
833*03ce13f7SAndroid Build Coastguard Workernot taken, though that could be mitigated by splitting non-critical edges and
834*03ce13f7SAndroid Build Coastguard Workerputting assignments in the edge-split nodes.  Another problem is that without
835*03ce13f7SAndroid Build Coastguard Workerextra machinery, the assignments to ``A``, ``A'``, ``X``, and ``X'`` are given a
836*03ce13f7SAndroid Build Coastguard Workerspecific ordering even though phi semantics are that the assignments are
837*03ce13f7SAndroid Build Coastguard Workerparallel or unordered.  This sometimes imposes false live range overlaps and
838*03ce13f7SAndroid Build Coastguard Workerleads to poorer register allocation.
839*03ce13f7SAndroid Build Coastguard Worker
840*03ce13f7SAndroid Build Coastguard WorkerAdvanced phi lowering
841*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^
842*03ce13f7SAndroid Build Coastguard Worker
843*03ce13f7SAndroid Build Coastguard Worker``O2`` lowering defers phi lowering until after register allocation to avoid the
844*03ce13f7SAndroid Build Coastguard Workerproblem of false live range overlaps.  It works as follows.  We split each
845*03ce13f7SAndroid Build Coastguard Workerincoming edge and move the (parallel) phi assignments into the split nodes.  We
846*03ce13f7SAndroid Build Coastguard Workerlinearize each set of assignments by finding a safe, topological ordering of the
847*03ce13f7SAndroid Build Coastguard Workerassignments, respecting register assignments as well.  For example::
848*03ce13f7SAndroid Build Coastguard Worker
849*03ce13f7SAndroid Build Coastguard Worker    A = B
850*03ce13f7SAndroid Build Coastguard Worker    X = Y
851*03ce13f7SAndroid Build Coastguard Worker
852*03ce13f7SAndroid Build Coastguard WorkerNormally these assignments could be executed in either order, but if ``B`` and
853*03ce13f7SAndroid Build Coastguard Worker``X`` are assigned the same physical register, we would want to use the above
854*03ce13f7SAndroid Build Coastguard Workerordering.  Dependency cycles are broken by introducing a temporary.  For
855*03ce13f7SAndroid Build Coastguard Workerexample::
856*03ce13f7SAndroid Build Coastguard Worker
857*03ce13f7SAndroid Build Coastguard Worker    A = B
858*03ce13f7SAndroid Build Coastguard Worker    B = A
859*03ce13f7SAndroid Build Coastguard Worker
860*03ce13f7SAndroid Build Coastguard WorkerHere, a temporary breaks the cycle::
861*03ce13f7SAndroid Build Coastguard Worker
862*03ce13f7SAndroid Build Coastguard Worker    t = A
863*03ce13f7SAndroid Build Coastguard Worker    A = B
864*03ce13f7SAndroid Build Coastguard Worker    B = t
865*03ce13f7SAndroid Build Coastguard Worker
866*03ce13f7SAndroid Build Coastguard WorkerFinally, we use the existing target lowering to lower the assignments in this
867*03ce13f7SAndroid Build Coastguard Workerbasic block, and once that is done for all basic blocks, we run the post-phi
868*03ce13f7SAndroid Build Coastguard Workervariant of register allocation on the edge-split basic blocks.
869*03ce13f7SAndroid Build Coastguard Worker
870*03ce13f7SAndroid Build Coastguard WorkerWhen computing a topological order, we try to first schedule assignments whose
871*03ce13f7SAndroid Build Coastguard Workersource has a physical register, and last schedule assignments whose destination
872*03ce13f7SAndroid Build Coastguard Workerhas a physical register.  This helps reduce register pressure.
873*03ce13f7SAndroid Build Coastguard Worker
874*03ce13f7SAndroid Build Coastguard WorkerX86 address mode inference
875*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^
876*03ce13f7SAndroid Build Coastguard Worker
877*03ce13f7SAndroid Build Coastguard WorkerWe try to take advantage of the x86 addressing mode that includes a base
878*03ce13f7SAndroid Build Coastguard Workerregister, an index register, an index register scale amount, and an immediate
879*03ce13f7SAndroid Build Coastguard Workeroffset.  We do this through simple pattern matching.  Starting with a load or
880*03ce13f7SAndroid Build Coastguard Workerstore instruction where the address is a variable, we initialize the base
881*03ce13f7SAndroid Build Coastguard Workerregister to that variable, and look up the instruction where that variable is
882*03ce13f7SAndroid Build Coastguard Workerdefined.  If that is an add instruction of two variables and the index register
883*03ce13f7SAndroid Build Coastguard Workerhasn't been set, we replace the base and index register with those two
884*03ce13f7SAndroid Build Coastguard Workervariables.  If instead it is an add instruction of a variable and a constant, we
885*03ce13f7SAndroid Build Coastguard Workerreplace the base register with the variable and add the constant to the
886*03ce13f7SAndroid Build Coastguard Workerimmediate offset.
887*03ce13f7SAndroid Build Coastguard Worker
888*03ce13f7SAndroid Build Coastguard WorkerThere are several more patterns that can be matched.  This pattern matching
889*03ce13f7SAndroid Build Coastguard Workercontinues on the load or store instruction until no more matches are found.
890*03ce13f7SAndroid Build Coastguard WorkerBecause a program typically has few load and store instructions (not to be
891*03ce13f7SAndroid Build Coastguard Workerconfused with instructions that manipulate stack variables), this address mode
892*03ce13f7SAndroid Build Coastguard Workerinference pass is fast.
893*03ce13f7SAndroid Build Coastguard Worker
894*03ce13f7SAndroid Build Coastguard WorkerX86 read-modify-write inference
895*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
896*03ce13f7SAndroid Build Coastguard Worker
897*03ce13f7SAndroid Build Coastguard WorkerA reasonably common bitcode pattern is a non-atomic update of a memory
898*03ce13f7SAndroid Build Coastguard Workerlocation::
899*03ce13f7SAndroid Build Coastguard Worker
900*03ce13f7SAndroid Build Coastguard Worker    x = load addr
901*03ce13f7SAndroid Build Coastguard Worker    y = add x, 1
902*03ce13f7SAndroid Build Coastguard Worker    store y, addr
903*03ce13f7SAndroid Build Coastguard Worker
904*03ce13f7SAndroid Build Coastguard WorkerOn x86, with good register allocation, the Subzero passes described above
905*03ce13f7SAndroid Build Coastguard Workergenerate code with only this quality::
906*03ce13f7SAndroid Build Coastguard Worker
907*03ce13f7SAndroid Build Coastguard Worker    mov [%ebx], %eax
908*03ce13f7SAndroid Build Coastguard Worker    add $1, %eax
909*03ce13f7SAndroid Build Coastguard Worker    mov %eax, [%ebx]
910*03ce13f7SAndroid Build Coastguard Worker
911*03ce13f7SAndroid Build Coastguard WorkerHowever, x86 allows for this kind of code::
912*03ce13f7SAndroid Build Coastguard Worker
913*03ce13f7SAndroid Build Coastguard Worker    add $1, [%ebx]
914*03ce13f7SAndroid Build Coastguard Worker
915*03ce13f7SAndroid Build Coastguard Workerwhich requires fewer instructions, but perhaps more importantly, requires fewer
916*03ce13f7SAndroid Build Coastguard Workerphysical registers.
917*03ce13f7SAndroid Build Coastguard Worker
918*03ce13f7SAndroid Build Coastguard WorkerIt's also important to note that this transformation only makes sense if the
919*03ce13f7SAndroid Build Coastguard Workerstore instruction ends ``x``'s live range.
920*03ce13f7SAndroid Build Coastguard Worker
921*03ce13f7SAndroid Build Coastguard WorkerSubzero's ``O2`` recipe includes an early pass to find read-modify-write (RMW)
922*03ce13f7SAndroid Build Coastguard Workeropportunities via simple pattern matching.  The only problem is that it is run
923*03ce13f7SAndroid Build Coastguard Workerbefore liveness analysis, which is needed to determine whether ``x``'s live
924*03ce13f7SAndroid Build Coastguard Workerrange ends after the RMW.  Since liveness analysis is one of the most expensive
925*03ce13f7SAndroid Build Coastguard Workerpasses, it's not attractive to run it an extra time just for RMW analysis.
926*03ce13f7SAndroid Build Coastguard WorkerInstead, we essentially generate both the RMW and the non-RMW versions, and then
927*03ce13f7SAndroid Build Coastguard Workerduring lowering, the RMW version deletes itself if it finds x still live.
928*03ce13f7SAndroid Build Coastguard Worker
929*03ce13f7SAndroid Build Coastguard WorkerX86 compare-branch inference
930*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^^^
931*03ce13f7SAndroid Build Coastguard Worker
932*03ce13f7SAndroid Build Coastguard WorkerIn the LLVM instruction set, the compare/branch pattern works like this::
933*03ce13f7SAndroid Build Coastguard Worker
934*03ce13f7SAndroid Build Coastguard Worker    cond = icmp eq a, b
935*03ce13f7SAndroid Build Coastguard Worker    br cond, target
936*03ce13f7SAndroid Build Coastguard Worker
937*03ce13f7SAndroid Build Coastguard WorkerThe result of the icmp instruction is a single bit, and a conditional branch
938*03ce13f7SAndroid Build Coastguard Workertests that bit.  By contrast, most target architectures use this pattern::
939*03ce13f7SAndroid Build Coastguard Worker
940*03ce13f7SAndroid Build Coastguard Worker    cmp a, b  // implicitly sets various bits of FLAGS register
941*03ce13f7SAndroid Build Coastguard Worker    br eq, target  // branch on a particular FLAGS bit
942*03ce13f7SAndroid Build Coastguard Worker
943*03ce13f7SAndroid Build Coastguard WorkerA naive lowering sequence conditionally sets ``cond`` to 0 or 1, then tests
944*03ce13f7SAndroid Build Coastguard Worker``cond`` and conditionally branches.  Subzero has a pass that identifies
945*03ce13f7SAndroid Build Coastguard Workerboolean-based operations like this and folds them into a single
946*03ce13f7SAndroid Build Coastguard Workercompare/branch-like operation.  It is set up for more than just cmp/br though.
947*03ce13f7SAndroid Build Coastguard WorkerBoolean producers include icmp (integer compare), fcmp (floating-point compare),
948*03ce13f7SAndroid Build Coastguard Workerand trunc (integer truncation when the destination has bool type).  Boolean
949*03ce13f7SAndroid Build Coastguard Workerconsumers include branch, select (the ternary operator from the C language), and
950*03ce13f7SAndroid Build Coastguard Workersign-extend and zero-extend when the source has bool type.
951*03ce13f7SAndroid Build Coastguard Worker
952*03ce13f7SAndroid Build Coastguard WorkerCode emission
953*03ce13f7SAndroid Build Coastguard Worker-------------
954*03ce13f7SAndroid Build Coastguard Worker
955*03ce13f7SAndroid Build Coastguard WorkerSubzero's integrated assembler is derived from Dart's `assembler code
956*03ce13f7SAndroid Build Coastguard Worker<https://github.com/dart-lang/sdk/tree/master/runtime/vm>'_.  There is a pass
957*03ce13f7SAndroid Build Coastguard Workerthat iterates through the low-level ICE instructions and invokes the relevant
958*03ce13f7SAndroid Build Coastguard Workerassembler functions.  Placeholders are added for later fixup of branch target
959*03ce13f7SAndroid Build Coastguard Workeroffsets.  (Backward branches use short offsets if possible; forward branches
960*03ce13f7SAndroid Build Coastguard Workergenerally use long offsets unless it is an intra-block branch of "known" short
961*03ce13f7SAndroid Build Coastguard Workerlength.)  The assembler emits into a staging buffer.  Once emission into the
962*03ce13f7SAndroid Build Coastguard Workerstaging buffer for a function is complete, the data is emitted to the output
963*03ce13f7SAndroid Build Coastguard Workerfile as an ELF object file, and metadata such as relocations, symbol table, and
964*03ce13f7SAndroid Build Coastguard Workerstring table, are accumulated for emission at the end.  Global data initializers
965*03ce13f7SAndroid Build Coastguard Workerare emitted similarly.  A key point is that at this point, the staging buffer
966*03ce13f7SAndroid Build Coastguard Workercan be deallocated, and only a minimum of data needs to held until the end.
967*03ce13f7SAndroid Build Coastguard Worker
968*03ce13f7SAndroid Build Coastguard WorkerAs a debugging alternative, Subzero can emit textual assembly code which can
969*03ce13f7SAndroid Build Coastguard Workerthen be run through an external assembler.  This is of course super slow, but
970*03ce13f7SAndroid Build Coastguard Workerquite valuable when bringing up a new target.
971*03ce13f7SAndroid Build Coastguard Worker
972*03ce13f7SAndroid Build Coastguard WorkerAs another debugging option, the staging buffer can be emitted as textual
973*03ce13f7SAndroid Build Coastguard Workerassembly, primarily in the form of ".byte" lines.  This allows the assembler to
974*03ce13f7SAndroid Build Coastguard Workerbe tested separately from the ELF related code.
975*03ce13f7SAndroid Build Coastguard Worker
976*03ce13f7SAndroid Build Coastguard WorkerMemory management
977*03ce13f7SAndroid Build Coastguard Worker-----------------
978*03ce13f7SAndroid Build Coastguard Worker
979*03ce13f7SAndroid Build Coastguard WorkerWhere possible, we allocate from a ``CfgLocalAllocator`` which derives from
980*03ce13f7SAndroid Build Coastguard WorkerLLVM's ``BumpPtrAllocator``.  This is an arena-style allocator where objects
981*03ce13f7SAndroid Build Coastguard Workerallocated from the arena are never actually freed; instead, when the CFG
982*03ce13f7SAndroid Build Coastguard Workertranslation completes and the CFG is deleted, the entire arena memory is
983*03ce13f7SAndroid Build Coastguard Workerreclaimed at once.  This style of allocation works well in an environment like a
984*03ce13f7SAndroid Build Coastguard Workercompiler where there are distinct phases with only easily-identifiable objects
985*03ce13f7SAndroid Build Coastguard Workerliving across phases.  It frees the developer from having to manage object
986*03ce13f7SAndroid Build Coastguard Workerdeletion, and it amortizes deletion costs across just a single arena deletion at
987*03ce13f7SAndroid Build Coastguard Workerthe end of the phase.  Furthermore, it helps scalability by allocating entirely
988*03ce13f7SAndroid Build Coastguard Workerfrom thread-local memory pools, and minimizing global locking of the heap.
989*03ce13f7SAndroid Build Coastguard Worker
990*03ce13f7SAndroid Build Coastguard WorkerInstructions are probably the most heavily allocated complex class in Subzero.
991*03ce13f7SAndroid Build Coastguard WorkerWe represent an instruction list as an intrusive doubly linked list, allocate
992*03ce13f7SAndroid Build Coastguard Workerall instructions from the ``CfgLocalAllocator``, and we make sure each
993*03ce13f7SAndroid Build Coastguard Workerinstruction subclass is basically `POD
994*03ce13f7SAndroid Build Coastguard Worker<http://en.cppreference.com/w/cpp/concept/PODType>`_ (Plain Old Data) with a
995*03ce13f7SAndroid Build Coastguard Workertrivial destructor.  This way, when the CFG is finished, we don't need to
996*03ce13f7SAndroid Build Coastguard Workerindividually deallocate every instruction.  We do similar for Variables, which
997*03ce13f7SAndroid Build Coastguard Workeris probably the second most popular complex class.
998*03ce13f7SAndroid Build Coastguard Worker
999*03ce13f7SAndroid Build Coastguard WorkerThere are some situations where passes need to use some `STL container class
1000*03ce13f7SAndroid Build Coastguard Worker<http://en.cppreference.com/w/cpp/container>`_.  Subzero has a way of using the
1001*03ce13f7SAndroid Build Coastguard Worker``CfgLocalAllocator`` as the container allocator if this is needed.
1002*03ce13f7SAndroid Build Coastguard Worker
1003*03ce13f7SAndroid Build Coastguard WorkerMultithreaded translation
1004*03ce13f7SAndroid Build Coastguard Worker-------------------------
1005*03ce13f7SAndroid Build Coastguard Worker
1006*03ce13f7SAndroid Build Coastguard WorkerSubzero is designed to be able to translate functions in parallel.  With the
1007*03ce13f7SAndroid Build Coastguard Worker``-threads=N`` command-line option, there is a 3-stage producer-consumer
1008*03ce13f7SAndroid Build Coastguard Workerpipeline:
1009*03ce13f7SAndroid Build Coastguard Worker
1010*03ce13f7SAndroid Build Coastguard Worker- A single thread parses the ``.pexe`` file and produces a sequence of work
1011*03ce13f7SAndroid Build Coastguard Worker  units.  A work unit can be either a fully constructed CFG, or a set of global
1012*03ce13f7SAndroid Build Coastguard Worker  initializers.  The work unit includes its sequence number denoting its parse
1013*03ce13f7SAndroid Build Coastguard Worker  order.  Each work unit is added to the translation queue.
1014*03ce13f7SAndroid Build Coastguard Worker
1015*03ce13f7SAndroid Build Coastguard Worker- There are N translation threads that draw work units from the translation
1016*03ce13f7SAndroid Build Coastguard Worker  queue and lower them into assembler buffers.  Each assembler buffer is added
1017*03ce13f7SAndroid Build Coastguard Worker  to the emitter queue, tagged with its sequence number.  The CFG and its
1018*03ce13f7SAndroid Build Coastguard Worker  ``CfgLocalAllocator`` are disposed of at this point.
1019*03ce13f7SAndroid Build Coastguard Worker
1020*03ce13f7SAndroid Build Coastguard Worker- A single thread draws assembler buffers from the emitter queue and appends to
1021*03ce13f7SAndroid Build Coastguard Worker  the output file.  It uses the sequence numbers to reintegrate the assembler
1022*03ce13f7SAndroid Build Coastguard Worker  buffers according to the original parse order, such that output order is
1023*03ce13f7SAndroid Build Coastguard Worker  always deterministic.
1024*03ce13f7SAndroid Build Coastguard Worker
1025*03ce13f7SAndroid Build Coastguard WorkerThis means that with ``-threads=N``, there are actually ``N+1`` spawned threads
1026*03ce13f7SAndroid Build Coastguard Workerfor a total of ``N+2`` execution threads, taking the parser and emitter threads
1027*03ce13f7SAndroid Build Coastguard Workerinto account.  For the special case of ``N=0``, execution is entirely sequential
1028*03ce13f7SAndroid Build Coastguard Worker-- the same thread parses, translates, and emits, one function at a time.  This
1029*03ce13f7SAndroid Build Coastguard Workeris useful for performance measurements.
1030*03ce13f7SAndroid Build Coastguard Worker
1031*03ce13f7SAndroid Build Coastguard WorkerIdeally, we would like to get near-linear scalability as the number of
1032*03ce13f7SAndroid Build Coastguard Workertranslation threads increases.  We expect that ``-threads=1`` should be slightly
1033*03ce13f7SAndroid Build Coastguard Workerfaster than ``-threads=0`` as the small amount of time spent parsing and
1034*03ce13f7SAndroid Build Coastguard Workeremitting is done largely in parallel with translation.  With perfect
1035*03ce13f7SAndroid Build Coastguard Workerscalability, we see ``-threads=N`` translating ``N`` times as fast as
1036*03ce13f7SAndroid Build Coastguard Worker``-threads=1``, up until the point where parsing or emitting becomes the
1037*03ce13f7SAndroid Build Coastguard Workerbottleneck, or ``N+2`` exceeds the number of CPU cores.  In reality, memory
1038*03ce13f7SAndroid Build Coastguard Workerperformance would become a bottleneck and efficiency might peak at, say, 75%.
1039*03ce13f7SAndroid Build Coastguard Worker
1040*03ce13f7SAndroid Build Coastguard WorkerCurrently, parsing takes about 11% of total sequential time.  If translation
1041*03ce13f7SAndroid Build Coastguard Workerscalability ever gets so fast and awesomely scalable that parsing becomes a
1042*03ce13f7SAndroid Build Coastguard Workerbottleneck, it should be possible to make parsing multithreaded as well.
1043*03ce13f7SAndroid Build Coastguard Worker
1044*03ce13f7SAndroid Build Coastguard WorkerInternally, all shared, mutable data is held in the GlobalContext object, and
1045*03ce13f7SAndroid Build Coastguard Workeraccess to each field is guarded by a mutex.
1046*03ce13f7SAndroid Build Coastguard Worker
1047*03ce13f7SAndroid Build Coastguard WorkerSecurity
1048*03ce13f7SAndroid Build Coastguard Worker--------
1049*03ce13f7SAndroid Build Coastguard Worker
1050*03ce13f7SAndroid Build Coastguard WorkerSubzero includes a number of security features in the generated code, as well as
1051*03ce13f7SAndroid Build Coastguard Workerin the Subzero translator itself, which run on top of the existing Native Client
1052*03ce13f7SAndroid Build Coastguard Workersandbox as well as Chrome's OS-level sandbox.
1053*03ce13f7SAndroid Build Coastguard Worker
1054*03ce13f7SAndroid Build Coastguard WorkerSandboxed translator
1055*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^
1056*03ce13f7SAndroid Build Coastguard Worker
1057*03ce13f7SAndroid Build Coastguard WorkerWhen running inside the browser, the Subzero translator executes as sandboxed,
1058*03ce13f7SAndroid Build Coastguard Workeruntrusted code that is initially checked by the validator, just like the
1059*03ce13f7SAndroid Build Coastguard WorkerLLVM-based ``pnacl-llc`` translator.  As such, the Subzero binary should be no
1060*03ce13f7SAndroid Build Coastguard Workermore or less secure than the translator it replaces, from the point of view of
1061*03ce13f7SAndroid Build Coastguard Workerthe Chrome sandbox.  That said, Subzero is much smaller than ``pnacl-llc`` and
1062*03ce13f7SAndroid Build Coastguard Workerwas designed from the start with security in mind, so one expects fewer attacker
1063*03ce13f7SAndroid Build Coastguard Workeropportunities here.
1064*03ce13f7SAndroid Build Coastguard Worker
1065*03ce13f7SAndroid Build Coastguard WorkerFuzzing
1066*03ce13f7SAndroid Build Coastguard Worker^^^^^^^
1067*03ce13f7SAndroid Build Coastguard Worker
1068*03ce13f7SAndroid Build Coastguard WorkerWe have started fuzz-testing the ``.pexe`` files input to Subzero, using a
1069*03ce13f7SAndroid Build Coastguard Workercombination of `afl-fuzz <http://lcamtuf.coredump.cx/afl/>`_, LLVM's `libFuzzer
1070*03ce13f7SAndroid Build Coastguard Worker<http://llvm.org/docs/LibFuzzer.html>`_, and custom tooling.  The purpose is to
1071*03ce13f7SAndroid Build Coastguard Workerfind and fix cases where Subzero crashes or otherwise ungracefully fails on
1072*03ce13f7SAndroid Build Coastguard Workerunexpected inputs, and to do so automatically over a large range of unexpected
1073*03ce13f7SAndroid Build Coastguard Workerinputs.  By fixing bugs that arise from fuzz testing, we reduce the possibility
1074*03ce13f7SAndroid Build Coastguard Workerof an attacker exploiting these bugs.
1075*03ce13f7SAndroid Build Coastguard Worker
1076*03ce13f7SAndroid Build Coastguard WorkerMost of the problems found so far are ones most appropriately handled in the
1077*03ce13f7SAndroid Build Coastguard Workerparser.  However, there have been a couple that have identified problems in the
1078*03ce13f7SAndroid Build Coastguard Workerlowering, or otherwise inappropriately triggered assertion failures and fatal
1079*03ce13f7SAndroid Build Coastguard Workererrors.  We continue to dig into this area.
1080*03ce13f7SAndroid Build Coastguard Worker
1081*03ce13f7SAndroid Build Coastguard WorkerFuture security work
1082*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^
1083*03ce13f7SAndroid Build Coastguard Worker
1084*03ce13f7SAndroid Build Coastguard WorkerSubzero is well-positioned to explore other future security enhancements, e.g.:
1085*03ce13f7SAndroid Build Coastguard Worker
1086*03ce13f7SAndroid Build Coastguard Worker- Tightening the Native Client sandbox.  ABI changes, such as the previous work
1087*03ce13f7SAndroid Build Coastguard Worker  on `hiding the sandbox base address
1088*03ce13f7SAndroid Build Coastguard Worker  <https://docs.google.com/document/d/1eskaI4353XdsJQFJLRnZzb_YIESQx4gNRzf31dqXVG8>`_
1089*03ce13f7SAndroid Build Coastguard Worker  in x86-64, are easy to experiment with in Subzero.
1090*03ce13f7SAndroid Build Coastguard Worker
1091*03ce13f7SAndroid Build Coastguard Worker- Making the executable code section read-only.  This would prevent a PNaCl
1092*03ce13f7SAndroid Build Coastguard Worker  application from inspecting its own binary and trying to find ROP gadgets even
1093*03ce13f7SAndroid Build Coastguard Worker  after code diversification has been performed.  It may still be susceptible to
1094*03ce13f7SAndroid Build Coastguard Worker  `blind ROP <http://www.scs.stanford.edu/brop/bittau-brop.pdf>`_ attacks,
1095*03ce13f7SAndroid Build Coastguard Worker  security is still overall improved.
1096*03ce13f7SAndroid Build Coastguard Worker
1097*03ce13f7SAndroid Build Coastguard Worker- Instruction selection diversification.  It may be possible to lower a given
1098*03ce13f7SAndroid Build Coastguard Worker  instruction in several largely equivalent ways, which gives more opportunities
1099*03ce13f7SAndroid Build Coastguard Worker  for code randomization.
1100*03ce13f7SAndroid Build Coastguard Worker
1101*03ce13f7SAndroid Build Coastguard WorkerChrome integration
1102*03ce13f7SAndroid Build Coastguard Worker------------------
1103*03ce13f7SAndroid Build Coastguard Worker
1104*03ce13f7SAndroid Build Coastguard WorkerCurrently Subzero is available in Chrome for the x86-32 architecture, but under
1105*03ce13f7SAndroid Build Coastguard Workera flag.  When the flag is enabled, Subzero is used when the `manifest file
1106*03ce13f7SAndroid Build Coastguard Worker<https://developer.chrome.com/native-client/reference/nacl-manifest-format>`_
1107*03ce13f7SAndroid Build Coastguard Workerlinking to the ``.pexe`` file specifies the ``O0`` optimization level.
1108*03ce13f7SAndroid Build Coastguard Worker
1109*03ce13f7SAndroid Build Coastguard WorkerThe next step is to remove the flag, i.e. invoke Subzero as the only translator
1110*03ce13f7SAndroid Build Coastguard Workerfor ``O0``-specified manifest files.
1111*03ce13f7SAndroid Build Coastguard Worker
1112*03ce13f7SAndroid Build Coastguard WorkerUltimately, Subzero might produce code rivaling LLVM ``O2`` quality, in which
1113*03ce13f7SAndroid Build Coastguard Workercase Subzero could be used for all PNaCl translation.
1114*03ce13f7SAndroid Build Coastguard Worker
1115*03ce13f7SAndroid Build Coastguard WorkerCommand line options
1116*03ce13f7SAndroid Build Coastguard Worker--------------------
1117*03ce13f7SAndroid Build Coastguard Worker
1118*03ce13f7SAndroid Build Coastguard WorkerSubzero has a number of command-line options for debugging and diagnostics.
1119*03ce13f7SAndroid Build Coastguard WorkerAmong the more interesting are the following.
1120*03ce13f7SAndroid Build Coastguard Worker
1121*03ce13f7SAndroid Build Coastguard Worker- Using the ``-verbose`` flag, Subzero will dump the CFG, or produce other
1122*03ce13f7SAndroid Build Coastguard Worker  diagnostic output, with various levels of detail after each pass.  Instruction
1123*03ce13f7SAndroid Build Coastguard Worker  numbers can be printed or suppressed.  Deleted instructions can be printed or
1124*03ce13f7SAndroid Build Coastguard Worker  suppressed (they are retained in the instruction list, as discussed earlier,
1125*03ce13f7SAndroid Build Coastguard Worker  because they can help explain how lower-level instructions originated).
1126*03ce13f7SAndroid Build Coastguard Worker  Liveness information can be printed when available.  Details of register
1127*03ce13f7SAndroid Build Coastguard Worker  allocation can be printed as register allocator decisions are made.  And more.
1128*03ce13f7SAndroid Build Coastguard Worker
1129*03ce13f7SAndroid Build Coastguard Worker- Running Subzero with any level of verbosity produces an enormous amount of
1130*03ce13f7SAndroid Build Coastguard Worker  output.  When debugging a single function, verbose output can be suppressed
1131*03ce13f7SAndroid Build Coastguard Worker  except for a particular function.  The ``-verbose-focus`` flag suppresses
1132*03ce13f7SAndroid Build Coastguard Worker  verbose output except for the specified function.
1133*03ce13f7SAndroid Build Coastguard Worker
1134*03ce13f7SAndroid Build Coastguard Worker- Subzero has a ``-timing`` option that prints a breakdown of pass-level timing
1135*03ce13f7SAndroid Build Coastguard Worker  at exit.  Timing markers can be placed in the Subzero source code to demarcate
1136*03ce13f7SAndroid Build Coastguard Worker  logical operations or passes of interest.  Basic timing information plus
1137*03ce13f7SAndroid Build Coastguard Worker  call-stack type timing information is printed at the end.
1138*03ce13f7SAndroid Build Coastguard Worker
1139*03ce13f7SAndroid Build Coastguard Worker- Along with ``-timing``, the user can instead get a report on the overall
1140*03ce13f7SAndroid Build Coastguard Worker  translation time for each function, to help focus on timing outliers.  Also,
1141*03ce13f7SAndroid Build Coastguard Worker  ``-timing-focus`` limits the ``-timing`` reporting to a single function,
1142*03ce13f7SAndroid Build Coastguard Worker  instead of aggregating pass timing across all functions.
1143*03ce13f7SAndroid Build Coastguard Worker
1144*03ce13f7SAndroid Build Coastguard Worker- The ``-szstats`` option reports various statistics on each function, such as
1145*03ce13f7SAndroid Build Coastguard Worker  stack frame size, static instruction count, etc.  It may be helpful to track
1146*03ce13f7SAndroid Build Coastguard Worker  these stats over time as Subzero is improved, as an approximate measure of
1147*03ce13f7SAndroid Build Coastguard Worker  code quality.
1148*03ce13f7SAndroid Build Coastguard Worker
1149*03ce13f7SAndroid Build Coastguard Worker- The flag ``-asm-verbose``, in conjunction with emitting textual assembly
1150*03ce13f7SAndroid Build Coastguard Worker  output, annotate the assembly output with register-focused liveness
1151*03ce13f7SAndroid Build Coastguard Worker  information.  In particular, each basic block is annotated with which
1152*03ce13f7SAndroid Build Coastguard Worker  registers are live-in and live-out, and each instruction is annotated with
1153*03ce13f7SAndroid Build Coastguard Worker  which registers' and stack locations' live ranges end at that instruction.
1154*03ce13f7SAndroid Build Coastguard Worker  This is really useful when studying the generated code to find opportunities
1155*03ce13f7SAndroid Build Coastguard Worker  for code quality improvements.
1156*03ce13f7SAndroid Build Coastguard Worker
1157*03ce13f7SAndroid Build Coastguard WorkerTesting and debugging
1158*03ce13f7SAndroid Build Coastguard Worker---------------------
1159*03ce13f7SAndroid Build Coastguard Worker
1160*03ce13f7SAndroid Build Coastguard WorkerLLVM lit tests
1161*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^
1162*03ce13f7SAndroid Build Coastguard Worker
1163*03ce13f7SAndroid Build Coastguard WorkerFor basic testing, Subzero uses LLVM's `lit
1164*03ce13f7SAndroid Build Coastguard Worker<http://llvm.org/docs/CommandGuide/lit.html>`_ framework for running tests.  We
1165*03ce13f7SAndroid Build Coastguard Workerhave a suite of hundreds of small functions where we test for particular
1166*03ce13f7SAndroid Build Coastguard Workerassembly code patterns across different target architectures.
1167*03ce13f7SAndroid Build Coastguard Worker
1168*03ce13f7SAndroid Build Coastguard WorkerCross tests
1169*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^
1170*03ce13f7SAndroid Build Coastguard Worker
1171*03ce13f7SAndroid Build Coastguard WorkerUnfortunately, the lit tests don't do a great job of precisely testing the
1172*03ce13f7SAndroid Build Coastguard Workercorrectness of the output.  Much better are the cross tests, which are execution
1173*03ce13f7SAndroid Build Coastguard Workertests that compare Subzero and ``pnacl-llc`` translated bitcode across a wide
1174*03ce13f7SAndroid Build Coastguard Workervariety of interesting inputs.  Each cross test consists of a set of C, C++,
1175*03ce13f7SAndroid Build Coastguard Workerand/or low-level bitcode files.  The C and C++ source files are compiled down to
1176*03ce13f7SAndroid Build Coastguard Workerbitcode.  The bitcode files are translated by ``pnacl-llc`` and also by Subzero.
1177*03ce13f7SAndroid Build Coastguard WorkerSubzero mangles global symbol names with a special prefix to avoid duplicate
1178*03ce13f7SAndroid Build Coastguard Workersymbol errors.  A driver program invokes both versions on a large set of
1179*03ce13f7SAndroid Build Coastguard Workerinteresting inputs, and reports when the Subzero and ``pnacl-llc`` results
1180*03ce13f7SAndroid Build Coastguard Workerdiffer.  Cross tests turn out to be an excellent way of testing the basic
1181*03ce13f7SAndroid Build Coastguard Workerlowering patterns, but they are less useful for testing more global things like
1182*03ce13f7SAndroid Build Coastguard Workerliveness analysis and register allocation.
1183*03ce13f7SAndroid Build Coastguard Worker
1184*03ce13f7SAndroid Build Coastguard WorkerBisection debugging
1185*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^
1186*03ce13f7SAndroid Build Coastguard Worker
1187*03ce13f7SAndroid Build Coastguard WorkerSometimes with a new application, Subzero will end up producing incorrect code
1188*03ce13f7SAndroid Build Coastguard Workerthat either crashes at runtime or otherwise produces the wrong results.  When
1189*03ce13f7SAndroid Build Coastguard Workerthis happens, we need to narrow it down to a single function (or small set of
1190*03ce13f7SAndroid Build Coastguard Workerfunctions) that yield incorrect behavior.  For this, we have a bisection
1191*03ce13f7SAndroid Build Coastguard Workerdebugging framework.  Here, we initially translate the entire application once
1192*03ce13f7SAndroid Build Coastguard Workerwith Subzero and once with ``pnacl-llc``.  We then use ``objdump`` to
1193*03ce13f7SAndroid Build Coastguard Workerselectively weaken symbols based on a list provided on the command line. The
1194*03ce13f7SAndroid Build Coastguard Workertwo object files can then be linked together without link errors, with the
1195*03ce13f7SAndroid Build Coastguard Workerdesired version of each method "winning".  Then the binary is tested, and
1196*03ce13f7SAndroid Build Coastguard Workerbisection proceeds based on whether the binary produces correct output.
1197*03ce13f7SAndroid Build Coastguard Worker
1198*03ce13f7SAndroid Build Coastguard WorkerWhen the bisection completes, we are left with a minimal set of
1199*03ce13f7SAndroid Build Coastguard WorkerSubzero-translated functions that cause the failure.  Usually it is a single
1200*03ce13f7SAndroid Build Coastguard Workerfunction, though sometimes it might require a combination of several functions
1201*03ce13f7SAndroid Build Coastguard Workerto cause a failure; this may be due to an incorrect call ABI, for example.
1202*03ce13f7SAndroid Build Coastguard WorkerHowever, Murphy's Law implies that the single failing function is enormous and
1203*03ce13f7SAndroid Build Coastguard Workerimpractical to debug.  In that case, we can restart the bisection, explicitly
1204*03ce13f7SAndroid Build Coastguard Workerignoring the enormous function, and try to find another candidate to debug.
1205*03ce13f7SAndroid Build Coastguard Worker(Future work is to automate this to find all minimal sets of functions, so that
1206*03ce13f7SAndroid Build Coastguard Workerdebugging can focus on the simplest example.)
1207*03ce13f7SAndroid Build Coastguard Worker
1208*03ce13f7SAndroid Build Coastguard WorkerFuzz testing
1209*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^
1210*03ce13f7SAndroid Build Coastguard Worker
1211*03ce13f7SAndroid Build Coastguard WorkerAs described above, we try to find internal Subzero bugs using fuzz testing
1212*03ce13f7SAndroid Build Coastguard Workertechniques.
1213*03ce13f7SAndroid Build Coastguard Worker
1214*03ce13f7SAndroid Build Coastguard WorkerSanitizers
1215*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^
1216*03ce13f7SAndroid Build Coastguard Worker
1217*03ce13f7SAndroid Build Coastguard WorkerSubzero can be built with `AddressSanitizer
1218*03ce13f7SAndroid Build Coastguard Worker<http://clang.llvm.org/docs/AddressSanitizer.html>`_ (ASan) or `ThreadSanitizer
1219*03ce13f7SAndroid Build Coastguard Worker<http://clang.llvm.org/docs/ThreadSanitizer.html>`_ (TSan) support.  This is
1220*03ce13f7SAndroid Build Coastguard Workerdone using something as simple as ``make ASAN=1`` or ``make TSAN=1``.  So far,
1221*03ce13f7SAndroid Build Coastguard Workermultithreading has been simple enough that TSan hasn't found any bugs, but ASan
1222*03ce13f7SAndroid Build Coastguard Workerhas found at least one memory leak which was subsequently fixed.
1223*03ce13f7SAndroid Build Coastguard Worker`UndefinedBehaviorSanitizer
1224*03ce13f7SAndroid Build Coastguard Worker<http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation>`_
1225*03ce13f7SAndroid Build Coastguard Worker(UBSan) support is in progress.  `Control flow integrity sanitization
1226*03ce13f7SAndroid Build Coastguard Worker<http://clang.llvm.org/docs/ControlFlowIntegrity.html>`_ is also under
1227*03ce13f7SAndroid Build Coastguard Workerconsideration.
1228*03ce13f7SAndroid Build Coastguard Worker
1229*03ce13f7SAndroid Build Coastguard WorkerCurrent status
1230*03ce13f7SAndroid Build Coastguard Worker==============
1231*03ce13f7SAndroid Build Coastguard Worker
1232*03ce13f7SAndroid Build Coastguard WorkerTarget architectures
1233*03ce13f7SAndroid Build Coastguard Worker--------------------
1234*03ce13f7SAndroid Build Coastguard Worker
1235*03ce13f7SAndroid Build Coastguard WorkerSubzero is currently more or less complete for the x86-32 target.  It has been
1236*03ce13f7SAndroid Build Coastguard Workerrefactored and extended to handle x86-64 as well, and that is mostly complete at
1237*03ce13f7SAndroid Build Coastguard Workerthis point.
1238*03ce13f7SAndroid Build Coastguard Worker
1239*03ce13f7SAndroid Build Coastguard WorkerARM32 work is in progress.  It currently lacks the testing level of x86, at
1240*03ce13f7SAndroid Build Coastguard Workerleast in part because Subzero's register allocator needs modifications to handle
1241*03ce13f7SAndroid Build Coastguard WorkerARM's aliasing of floating point and vector registers.  Specifically, a 64-bit
1242*03ce13f7SAndroid Build Coastguard Workerregister is actually a gang of two consecutive and aligned 32-bit registers, and
1243*03ce13f7SAndroid Build Coastguard Workera 128-bit register is a gang of 4 consecutive and aligned 32-bit registers.
1244*03ce13f7SAndroid Build Coastguard WorkerARM64 work has not started; when it does, it will be native-only since the
1245*03ce13f7SAndroid Build Coastguard WorkerNative Client sandbox model, validator, and other tools have never been defined.
1246*03ce13f7SAndroid Build Coastguard Worker
1247*03ce13f7SAndroid Build Coastguard WorkerAn external contributor is adding MIPS support, in most part by following the
1248*03ce13f7SAndroid Build Coastguard WorkerARM work.
1249*03ce13f7SAndroid Build Coastguard Worker
1250*03ce13f7SAndroid Build Coastguard WorkerTranslator performance
1251*03ce13f7SAndroid Build Coastguard Worker----------------------
1252*03ce13f7SAndroid Build Coastguard Worker
1253*03ce13f7SAndroid Build Coastguard WorkerSingle-threaded translation speed is currently about 5× the ``pnacl-llc``
1254*03ce13f7SAndroid Build Coastguard Workertranslation speed.  For a large ``.pexe`` file, the time breaks down as:
1255*03ce13f7SAndroid Build Coastguard Worker
1256*03ce13f7SAndroid Build Coastguard Worker- 11% for parsing and initial IR building
1257*03ce13f7SAndroid Build Coastguard Worker
1258*03ce13f7SAndroid Build Coastguard Worker- 4% for emitting to /dev/null
1259*03ce13f7SAndroid Build Coastguard Worker
1260*03ce13f7SAndroid Build Coastguard Worker- 27% for liveness analysis (two liveness passes plus live range construction)
1261*03ce13f7SAndroid Build Coastguard Worker
1262*03ce13f7SAndroid Build Coastguard Worker- 15% for linear-scan register allocation
1263*03ce13f7SAndroid Build Coastguard Worker
1264*03ce13f7SAndroid Build Coastguard Worker- 9% for basic lowering
1265*03ce13f7SAndroid Build Coastguard Worker
1266*03ce13f7SAndroid Build Coastguard Worker- 10% for advanced phi lowering
1267*03ce13f7SAndroid Build Coastguard Worker
1268*03ce13f7SAndroid Build Coastguard Worker- ~11% for other minor analysis
1269*03ce13f7SAndroid Build Coastguard Worker
1270*03ce13f7SAndroid Build Coastguard Worker- ~10% measurement overhead to acquire these numbers
1271*03ce13f7SAndroid Build Coastguard Worker
1272*03ce13f7SAndroid Build Coastguard WorkerSome improvements could undoubtedly be made, but it will be hard to increase the
1273*03ce13f7SAndroid Build Coastguard Workerspeed to 10× of ``pnacl-llc`` while keeping acceptable code quality.  With
1274*03ce13f7SAndroid Build Coastguard Worker``-Om1`` (lack of) optimization, we do actually achieve roughly 10×
1275*03ce13f7SAndroid Build Coastguard Worker``pnacl-llc`` translation speed, but code quality drops by a factor of 3.
1276*03ce13f7SAndroid Build Coastguard Worker
1277*03ce13f7SAndroid Build Coastguard WorkerCode quality
1278*03ce13f7SAndroid Build Coastguard Worker------------
1279*03ce13f7SAndroid Build Coastguard Worker
1280*03ce13f7SAndroid Build Coastguard WorkerMeasured across 16 components of spec2k, Subzero's code quality is uniformly
1281*03ce13f7SAndroid Build Coastguard Workerbetter than ``pnacl-llc`` ``-O0`` code quality, and in many cases solidly
1282*03ce13f7SAndroid Build Coastguard Workerbetween ``pnacl-llc`` ``-O0`` and ``-O2``.
1283*03ce13f7SAndroid Build Coastguard Worker
1284*03ce13f7SAndroid Build Coastguard WorkerTranslator size
1285*03ce13f7SAndroid Build Coastguard Worker---------------
1286*03ce13f7SAndroid Build Coastguard Worker
1287*03ce13f7SAndroid Build Coastguard WorkerWhen built in MINIMAL mode, the x86-64 native translator size for the x86-32
1288*03ce13f7SAndroid Build Coastguard Workertarget is about 700 KB, not including the size of functions referenced in
1289*03ce13f7SAndroid Build Coastguard Workerdynamically-linked libraries.  The sandboxed version of Subzero is a bit over 1
1290*03ce13f7SAndroid Build Coastguard WorkerMB, and it is statically linked and also includes nop padding for bundling as
1291*03ce13f7SAndroid Build Coastguard Workerwell as indirect branch masking.
1292*03ce13f7SAndroid Build Coastguard Worker
1293*03ce13f7SAndroid Build Coastguard WorkerTranslator memory footprint
1294*03ce13f7SAndroid Build Coastguard Worker---------------------------
1295*03ce13f7SAndroid Build Coastguard Worker
1296*03ce13f7SAndroid Build Coastguard WorkerIt's hard to draw firm conclusions about memory footprint, since the footprint
1297*03ce13f7SAndroid Build Coastguard Workeris at least proportional to the input function size, and there is no real limit
1298*03ce13f7SAndroid Build Coastguard Workeron the size of functions in the ``.pexe`` file.
1299*03ce13f7SAndroid Build Coastguard Worker
1300*03ce13f7SAndroid Build Coastguard WorkerThat said, we looked at the memory footprint over time as Subzero translated
1301*03ce13f7SAndroid Build Coastguard Worker``pnacl-llc.pexe``, which is the largest ``.pexe`` file (7.2 MB) at our
1302*03ce13f7SAndroid Build Coastguard Workerdisposal.  One of LLVM's libraries that Subzero uses can report the current
1303*03ce13f7SAndroid Build Coastguard Workermalloc heap usage.  With single-threaded translation, Subzero tends to hover
1304*03ce13f7SAndroid Build Coastguard Workeraround 15 MB of memory usage.  There are a couple of monstrous functions where
1305*03ce13f7SAndroid Build Coastguard WorkerSubzero grows to around 100 MB, but then it drops back down after those
1306*03ce13f7SAndroid Build Coastguard Workerfunctions finish translating.  In contrast, ``pnacl-llc`` grows larger and
1307*03ce13f7SAndroid Build Coastguard Workerlarger throughout translation, reaching several hundred MB by the time it
1308*03ce13f7SAndroid Build Coastguard Workercompletes.
1309*03ce13f7SAndroid Build Coastguard Worker
1310*03ce13f7SAndroid Build Coastguard WorkerIt's a bit more interesting when we enable multithreaded translation.  When
1311*03ce13f7SAndroid Build Coastguard Workerthere are N translation threads, Subzero implements a policy that limits the
1312*03ce13f7SAndroid Build Coastguard Workersize of the translation queue to N entries -- if it is "full" when the parser
1313*03ce13f7SAndroid Build Coastguard Workertries to add a new CFG, the parser blocks until one of the translation threads
1314*03ce13f7SAndroid Build Coastguard Workerremoves a CFG.  This means the number of in-memory CFGs can (and generally does)
1315*03ce13f7SAndroid Build Coastguard Workerreach 2*N+1, and so the memory footprint rises in proportion to the number of
1316*03ce13f7SAndroid Build Coastguard Workerthreads.  Adding to the pressure is the observation that the monstrous functions
1317*03ce13f7SAndroid Build Coastguard Workeralso take proportionally longer time to translate, so there's a good chance many
1318*03ce13f7SAndroid Build Coastguard Workerof the monstrous functions will be active at the same time with multithreaded
1319*03ce13f7SAndroid Build Coastguard Workertranslation.  As a result, for N=32, Subzero's memory footprint peaks at about
1320*03ce13f7SAndroid Build Coastguard Worker260 MB, but drops back down as the large functions finish translating.
1321*03ce13f7SAndroid Build Coastguard Worker
1322*03ce13f7SAndroid Build Coastguard WorkerIf this peak memory size becomes a problem, it might be possible for the parser
1323*03ce13f7SAndroid Build Coastguard Workerto resequence the functions to try to spread out the larger functions, or to
1324*03ce13f7SAndroid Build Coastguard Workerthrottle the translation queue to prevent too many in-flight large functions.
1325*03ce13f7SAndroid Build Coastguard WorkerIt may also be possible to throttle based on memory pressure signaling from
1326*03ce13f7SAndroid Build Coastguard WorkerChrome.
1327*03ce13f7SAndroid Build Coastguard Worker
1328*03ce13f7SAndroid Build Coastguard WorkerTranslator scalability
1329*03ce13f7SAndroid Build Coastguard Worker----------------------
1330*03ce13f7SAndroid Build Coastguard Worker
1331*03ce13f7SAndroid Build Coastguard WorkerCurrently scalability is "not very good".  Multiple translation threads lead to
1332*03ce13f7SAndroid Build Coastguard Workerfaster translation, but not to the degree desired.  We haven't dug in to
1333*03ce13f7SAndroid Build Coastguard Workerinvestigate yet.
1334*03ce13f7SAndroid Build Coastguard Worker
1335*03ce13f7SAndroid Build Coastguard WorkerThere are a few areas to investigate.  First, there may be contention on the
1336*03ce13f7SAndroid Build Coastguard Workerconstant pool, which all threads access, and which requires locked access even
1337*03ce13f7SAndroid Build Coastguard Workerfor reading.  This could be mitigated by keeping a CFG-local cache of the most
1338*03ce13f7SAndroid Build Coastguard Workercommon constants.
1339*03ce13f7SAndroid Build Coastguard Worker
1340*03ce13f7SAndroid Build Coastguard WorkerSecond, there may be contention on memory allocation.  While almost all CFG
1341*03ce13f7SAndroid Build Coastguard Workerobjects are allocated from the CFG-local allocator, some passes use temporary
1342*03ce13f7SAndroid Build Coastguard WorkerSTL containers that use the default allocator, which may require global locking.
1343*03ce13f7SAndroid Build Coastguard WorkerThis could be mitigated by switching these to the CFG-local allocator.
1344*03ce13f7SAndroid Build Coastguard Worker
1345*03ce13f7SAndroid Build Coastguard WorkerThird, multithreading may make the default allocator strategy more expensive.
1346*03ce13f7SAndroid Build Coastguard WorkerIn a single-threaded environment, a pass will allocate its containers, run the
1347*03ce13f7SAndroid Build Coastguard Workerpass, and deallocate the containers.  This results in stack-like allocation
1348*03ce13f7SAndroid Build Coastguard Workerbehavior and makes the heap free list easier to manage, with less heap
1349*03ce13f7SAndroid Build Coastguard Workerfragmentation.  But when multithreading is added, the allocations and
1350*03ce13f7SAndroid Build Coastguard Workerdeallocations become much less stack-like, making allocation and deallocation
1351*03ce13f7SAndroid Build Coastguard Workeroperations individually more expensive.  Again, this could be mitigated by
1352*03ce13f7SAndroid Build Coastguard Workerswitching these to the CFG-local allocator.
1353