1*03ce13f7SAndroid Build Coastguard WorkerDesign of the Subzero fast code generator 2*03ce13f7SAndroid Build Coastguard Worker========================================= 3*03ce13f7SAndroid Build Coastguard Worker 4*03ce13f7SAndroid Build Coastguard WorkerIntroduction 5*03ce13f7SAndroid Build Coastguard Worker------------ 6*03ce13f7SAndroid Build Coastguard Worker 7*03ce13f7SAndroid Build Coastguard WorkerThe `Portable Native Client (PNaCl) <http://gonacl.com>`_ project includes 8*03ce13f7SAndroid Build Coastguard Workercompiler technology based on `LLVM <http://llvm.org/>`_. The developer uses the 9*03ce13f7SAndroid Build Coastguard WorkerPNaCl toolchain to compile their application to architecture-neutral PNaCl 10*03ce13f7SAndroid Build Coastguard Workerbitcode (a ``.pexe`` file), using as much architecture-neutral optimization as 11*03ce13f7SAndroid Build Coastguard Workerpossible. The ``.pexe`` file is downloaded to the user's browser where the 12*03ce13f7SAndroid Build Coastguard WorkerPNaCl translator (a component of Chrome) compiles the ``.pexe`` file to 13*03ce13f7SAndroid Build Coastguard Worker`sandboxed 14*03ce13f7SAndroid Build Coastguard Worker<https://developer.chrome.com/native-client/reference/sandbox_internals/index>`_ 15*03ce13f7SAndroid Build Coastguard Workernative code. The translator uses architecture-specific optimizations as much as 16*03ce13f7SAndroid Build Coastguard Workerpractical to generate good native code. 17*03ce13f7SAndroid Build Coastguard Worker 18*03ce13f7SAndroid Build Coastguard WorkerThe native code can be cached by the browser to avoid repeating translation on 19*03ce13f7SAndroid Build Coastguard Workerfuture page loads. However, first-time user experience is hampered by long 20*03ce13f7SAndroid Build Coastguard Workertranslation times. The LLVM-based PNaCl translator is pretty slow, even when 21*03ce13f7SAndroid Build Coastguard Workerusing ``-O0`` to minimize optimizations, so delays are especially noticeable on 22*03ce13f7SAndroid Build Coastguard Workerslow browser platforms such as ARM-based Chromebooks. 23*03ce13f7SAndroid Build Coastguard Worker 24*03ce13f7SAndroid Build Coastguard WorkerTranslator slowness can be mitigated or hidden in a number of ways. 25*03ce13f7SAndroid Build Coastguard Worker 26*03ce13f7SAndroid Build Coastguard Worker- Parallel translation. However, slow machines where this matters most, e.g. 27*03ce13f7SAndroid Build Coastguard Worker ARM-based Chromebooks, are likely to have fewer cores to parallelize across, 28*03ce13f7SAndroid Build Coastguard Worker and are likely to less memory available for multiple translation threads to 29*03ce13f7SAndroid Build Coastguard Worker use. 30*03ce13f7SAndroid Build Coastguard Worker 31*03ce13f7SAndroid Build Coastguard Worker- Streaming translation, i.e. start translating as soon as the download starts. 32*03ce13f7SAndroid Build Coastguard Worker This doesn't help much when translation speed is 10× slower than download 33*03ce13f7SAndroid Build Coastguard Worker speed, or the ``.pexe`` file is already cached while the translated binary was 34*03ce13f7SAndroid Build Coastguard Worker flushed from the cache. 35*03ce13f7SAndroid Build Coastguard Worker 36*03ce13f7SAndroid Build Coastguard Worker- Arrange the web page such that translation is done in parallel with 37*03ce13f7SAndroid Build Coastguard Worker downloading large assets. 38*03ce13f7SAndroid Build Coastguard Worker 39*03ce13f7SAndroid Build Coastguard Worker- Arrange the web page to distract the user with `cat videos 40*03ce13f7SAndroid Build Coastguard Worker <https://www.youtube.com/watch?v=tLt5rBfNucc>`_ while translation is in 41*03ce13f7SAndroid Build Coastguard Worker progress. 42*03ce13f7SAndroid Build Coastguard Worker 43*03ce13f7SAndroid Build Coastguard WorkerOr, improve translator performance to something more reasonable. 44*03ce13f7SAndroid Build Coastguard Worker 45*03ce13f7SAndroid Build Coastguard WorkerThis document describes Subzero's attempt to improve translation speed by an 46*03ce13f7SAndroid Build Coastguard Workerorder of magnitude while rivaling LLVM's code quality. Subzero does this 47*03ce13f7SAndroid Build Coastguard Workerthrough minimal IR layering, lean data structures and passes, and a careful 48*03ce13f7SAndroid Build Coastguard Workerselection of fast optimization passes. It has two optimization recipes: full 49*03ce13f7SAndroid Build Coastguard Workeroptimizations (``O2``) and minimal optimizations (``Om1``). The recipes are the 50*03ce13f7SAndroid Build Coastguard Workerfollowing (described in more detail below): 51*03ce13f7SAndroid Build Coastguard Worker 52*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 53*03ce13f7SAndroid Build Coastguard Worker| O2 recipe | Om1 recipe | 54*03ce13f7SAndroid Build Coastguard Worker+========================================+=============================+ 55*03ce13f7SAndroid Build Coastguard Worker| Parse .pexe file | Parse .pexe file | 56*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 57*03ce13f7SAndroid Build Coastguard Worker| Loop nest analysis | | 58*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 59*03ce13f7SAndroid Build Coastguard Worker| Local common subexpression elimination | | 60*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 61*03ce13f7SAndroid Build Coastguard Worker| Address mode inference | | 62*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 63*03ce13f7SAndroid Build Coastguard Worker| Read-modify-write (RMW) transform | | 64*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 65*03ce13f7SAndroid Build Coastguard Worker| Basic liveness analysis | | 66*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 67*03ce13f7SAndroid Build Coastguard Worker| Load optimization | | 68*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 69*03ce13f7SAndroid Build Coastguard Worker| | Phi lowering (simple) | 70*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 71*03ce13f7SAndroid Build Coastguard Worker| Target lowering | Target lowering | 72*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 73*03ce13f7SAndroid Build Coastguard Worker| Full liveness analysis | | 74*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 75*03ce13f7SAndroid Build Coastguard Worker| Register allocation | Minimal register allocation | 76*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 77*03ce13f7SAndroid Build Coastguard Worker| Phi lowering (advanced) | | 78*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 79*03ce13f7SAndroid Build Coastguard Worker| Post-phi register allocation | | 80*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 81*03ce13f7SAndroid Build Coastguard Worker| Branch optimization | | 82*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 83*03ce13f7SAndroid Build Coastguard Worker| Code emission | Code emission | 84*03ce13f7SAndroid Build Coastguard Worker+----------------------------------------+-----------------------------+ 85*03ce13f7SAndroid Build Coastguard Worker 86*03ce13f7SAndroid Build Coastguard WorkerGoals 87*03ce13f7SAndroid Build Coastguard Worker===== 88*03ce13f7SAndroid Build Coastguard Worker 89*03ce13f7SAndroid Build Coastguard WorkerTranslation speed 90*03ce13f7SAndroid Build Coastguard Worker----------------- 91*03ce13f7SAndroid Build Coastguard Worker 92*03ce13f7SAndroid Build Coastguard WorkerWe'd like to be able to translate a ``.pexe`` file as fast as download speed. 93*03ce13f7SAndroid Build Coastguard WorkerAny faster is in a sense wasted effort. Download speed varies greatly, but 94*03ce13f7SAndroid Build Coastguard Workerwe'll arbitrarily say 1 MB/sec. We'll pick the ARM A15 CPU as the example of a 95*03ce13f7SAndroid Build Coastguard Workerslow machine. We observe a 3× single-thread performance difference between A15 96*03ce13f7SAndroid Build Coastguard Workerand a high-end x86 Xeon E5-2690 based workstation, and aggressively assume a 97*03ce13f7SAndroid Build Coastguard Worker``.pexe`` file could be compressed to 50% on the web server using gzip transport 98*03ce13f7SAndroid Build Coastguard Workercompression, so we set the translation speed goal to 6 MB/sec on the high-end 99*03ce13f7SAndroid Build Coastguard WorkerXeon workstation. 100*03ce13f7SAndroid Build Coastguard Worker 101*03ce13f7SAndroid Build Coastguard WorkerCurrently, at the ``-O0`` level, the LLVM-based PNaCl translation translates at 102*03ce13f7SAndroid Build Coastguard Worker⅒ the target rate. The ``-O2`` mode takes 3× as long as the ``-O0`` mode. 103*03ce13f7SAndroid Build Coastguard Worker 104*03ce13f7SAndroid Build Coastguard WorkerIn other words, Subzero's goal is to improve over LLVM's translation speed by 105*03ce13f7SAndroid Build Coastguard Worker10×. 106*03ce13f7SAndroid Build Coastguard Worker 107*03ce13f7SAndroid Build Coastguard WorkerCode quality 108*03ce13f7SAndroid Build Coastguard Worker------------ 109*03ce13f7SAndroid Build Coastguard Worker 110*03ce13f7SAndroid Build Coastguard WorkerSubzero's initial goal is to produce code that meets or exceeds LLVM's ``-O0`` 111*03ce13f7SAndroid Build Coastguard Workercode quality. The stretch goal is to approach LLVM ``-O2`` code quality. On 112*03ce13f7SAndroid Build Coastguard Workeraverage, LLVM ``-O2`` performs twice as well as LLVM ``-O0``. 113*03ce13f7SAndroid Build Coastguard Worker 114*03ce13f7SAndroid Build Coastguard WorkerIt's important to note that the quality of Subzero-generated code depends on 115*03ce13f7SAndroid Build Coastguard Workertarget-neutral optimizations and simplifications being run beforehand in the 116*03ce13f7SAndroid Build Coastguard Workerdeveloper environment. The ``.pexe`` file reflects these optimizations. For 117*03ce13f7SAndroid Build Coastguard Workerexample, Subzero assumes that the basic blocks are ordered topologically where 118*03ce13f7SAndroid Build Coastguard Workerpossible (which makes liveness analysis converge fastest), and Subzero does not 119*03ce13f7SAndroid Build Coastguard Workerdo any function inlining because it should already have been done. 120*03ce13f7SAndroid Build Coastguard Worker 121*03ce13f7SAndroid Build Coastguard WorkerTranslator size 122*03ce13f7SAndroid Build Coastguard Worker--------------- 123*03ce13f7SAndroid Build Coastguard Worker 124*03ce13f7SAndroid Build Coastguard WorkerThe current LLVM-based translator binary (``pnacl-llc``) is about 10 MB in size. 125*03ce13f7SAndroid Build Coastguard WorkerWe think 1 MB is a more reasonable size -- especially for such a component that 126*03ce13f7SAndroid Build Coastguard Workeris distributed to a billion Chrome users. Thus we target a 10× reduction in 127*03ce13f7SAndroid Build Coastguard Workerbinary size. 128*03ce13f7SAndroid Build Coastguard Worker 129*03ce13f7SAndroid Build Coastguard WorkerFor development, Subzero can be built for all target architectures, and all 130*03ce13f7SAndroid Build Coastguard Workerdebugging and diagnostic options enabled. For a smaller translator, we restrict 131*03ce13f7SAndroid Build Coastguard Workerto a single target architecture, and define a ``MINIMAL`` build where 132*03ce13f7SAndroid Build Coastguard Workerunnecessary features are compiled out. 133*03ce13f7SAndroid Build Coastguard Worker 134*03ce13f7SAndroid Build Coastguard WorkerSubzero leverages some data structures from LLVM's ``ADT`` and ``Support`` 135*03ce13f7SAndroid Build Coastguard Workerinclude directories, which have little impact on translator size. It also uses 136*03ce13f7SAndroid Build Coastguard Workersome of LLVM's bitcode decoding code (for binary-format ``.pexe`` files), again 137*03ce13f7SAndroid Build Coastguard Workerwith little size impact. In non-``MINIMAL`` builds, the translator size is much 138*03ce13f7SAndroid Build Coastguard Workerlarger due to including code for parsing text-format bitcode files and forming 139*03ce13f7SAndroid Build Coastguard WorkerLLVM IR. 140*03ce13f7SAndroid Build Coastguard Worker 141*03ce13f7SAndroid Build Coastguard WorkerMemory footprint 142*03ce13f7SAndroid Build Coastguard Worker---------------- 143*03ce13f7SAndroid Build Coastguard Worker 144*03ce13f7SAndroid Build Coastguard WorkerThe current LLVM-based translator suffers from an issue in which some 145*03ce13f7SAndroid Build Coastguard Workerfunction-specific data has to be retained in memory until all translation 146*03ce13f7SAndroid Build Coastguard Workercompletes, and therefore the memory footprint grows without bound. Large 147*03ce13f7SAndroid Build Coastguard Worker``.pexe`` files can lead to the translator process holding hundreds of MB of 148*03ce13f7SAndroid Build Coastguard Workermemory by the end. The translator runs in a separate process, so this memory 149*03ce13f7SAndroid Build Coastguard Workergrowth doesn't *directly* affect other processes, but it does dirty the physical 150*03ce13f7SAndroid Build Coastguard Workermemory and contributes to a perception of bloat and sometimes a reality of 151*03ce13f7SAndroid Build Coastguard Workerout-of-memory tab killing, especially noticeable on weaker systems. 152*03ce13f7SAndroid Build Coastguard Worker 153*03ce13f7SAndroid Build Coastguard WorkerSubzero should maintain a stable memory footprint throughout translation. It's 154*03ce13f7SAndroid Build Coastguard Workernot really practical to set a specific limit, because there is not really a 155*03ce13f7SAndroid Build Coastguard Workerpractical limit on a single function's size, but the footprint should be 156*03ce13f7SAndroid Build Coastguard Worker"reasonable" and be proportional to the largest input function size, not the 157*03ce13f7SAndroid Build Coastguard Workertotal ``.pexe`` file size. Simply put, Subzero should not have memory leaks or 158*03ce13f7SAndroid Build Coastguard Workerinexorable memory growth. (We use ASAN builds to test for leaks.) 159*03ce13f7SAndroid Build Coastguard Worker 160*03ce13f7SAndroid Build Coastguard WorkerMultithreaded translation 161*03ce13f7SAndroid Build Coastguard Worker------------------------- 162*03ce13f7SAndroid Build Coastguard Worker 163*03ce13f7SAndroid Build Coastguard WorkerIt should be practical to translate different functions concurrently and see 164*03ce13f7SAndroid Build Coastguard Workergood scalability. Some locking may be needed, such as accessing output buffers 165*03ce13f7SAndroid Build Coastguard Workeror constant pools, but that should be fairly minimal. In contrast, LLVM was 166*03ce13f7SAndroid Build Coastguard Workeronly designed for module-level parallelism, and as such, the PNaCl translator 167*03ce13f7SAndroid Build Coastguard Workerinternally splits a ``.pexe`` file into several modules for concurrent 168*03ce13f7SAndroid Build Coastguard Workertranslation. All output needs to be deterministic regardless of the level of 169*03ce13f7SAndroid Build Coastguard Workermultithreading, i.e. functions and data should always be output in the same 170*03ce13f7SAndroid Build Coastguard Workerorder. 171*03ce13f7SAndroid Build Coastguard Worker 172*03ce13f7SAndroid Build Coastguard WorkerTarget architectures 173*03ce13f7SAndroid Build Coastguard Worker-------------------- 174*03ce13f7SAndroid Build Coastguard Worker 175*03ce13f7SAndroid Build Coastguard WorkerInitial target architectures are x86-32, x86-64, ARM32, and MIPS32. Future 176*03ce13f7SAndroid Build Coastguard Workertargets include ARM64 and MIPS64, though these targets lack NaCl support 177*03ce13f7SAndroid Build Coastguard Workerincluding a sandbox model or a validator. 178*03ce13f7SAndroid Build Coastguard Worker 179*03ce13f7SAndroid Build Coastguard WorkerThe first implementation is for x86-32, because it was expected to be 180*03ce13f7SAndroid Build Coastguard Workerparticularly challenging, and thus more likely to draw out any design problems 181*03ce13f7SAndroid Build Coastguard Workerearly: 182*03ce13f7SAndroid Build Coastguard Worker 183*03ce13f7SAndroid Build Coastguard Worker- There are a number of special cases, asymmetries, and warts in the x86 184*03ce13f7SAndroid Build Coastguard Worker instruction set. 185*03ce13f7SAndroid Build Coastguard Worker 186*03ce13f7SAndroid Build Coastguard Worker- Complex addressing modes may be leveraged for better code quality. 187*03ce13f7SAndroid Build Coastguard Worker 188*03ce13f7SAndroid Build Coastguard Worker- 64-bit integer operations have to be lowered into longer sequences of 32-bit 189*03ce13f7SAndroid Build Coastguard Worker operations. 190*03ce13f7SAndroid Build Coastguard Worker 191*03ce13f7SAndroid Build Coastguard Worker- Paucity of physical registers may reveal code quality issues early in the 192*03ce13f7SAndroid Build Coastguard Worker design. 193*03ce13f7SAndroid Build Coastguard Worker 194*03ce13f7SAndroid Build Coastguard WorkerDetailed design 195*03ce13f7SAndroid Build Coastguard Worker=============== 196*03ce13f7SAndroid Build Coastguard Worker 197*03ce13f7SAndroid Build Coastguard WorkerIntermediate representation - ICE 198*03ce13f7SAndroid Build Coastguard Worker--------------------------------- 199*03ce13f7SAndroid Build Coastguard Worker 200*03ce13f7SAndroid Build Coastguard WorkerSubzero's IR is called ICE. It is designed to be reasonably similar to LLVM's 201*03ce13f7SAndroid Build Coastguard WorkerIR, which is reflected in the ``.pexe`` file's bitcode structure. It has a 202*03ce13f7SAndroid Build Coastguard Workerrepresentation of global variables and initializers, and a set of functions. 203*03ce13f7SAndroid Build Coastguard WorkerEach function contains a list of basic blocks, and each basic block constains a 204*03ce13f7SAndroid Build Coastguard Workerlist of instructions. Instructions that operate on stack and register variables 205*03ce13f7SAndroid Build Coastguard Workerdo so using static single assignment (SSA) form. 206*03ce13f7SAndroid Build Coastguard Worker 207*03ce13f7SAndroid Build Coastguard WorkerThe ``.pexe`` file is translated one function at a time (or in parallel by 208*03ce13f7SAndroid Build Coastguard Workermultiple translation threads). The recipe for optimization passes depends on 209*03ce13f7SAndroid Build Coastguard Workerthe specific target and optimization level, and is described in detail below. 210*03ce13f7SAndroid Build Coastguard WorkerGlobal variables (types and initializers) are simply and directly translated to 211*03ce13f7SAndroid Build Coastguard Workerobject code, without any meaningful attempts at optimization. 212*03ce13f7SAndroid Build Coastguard Worker 213*03ce13f7SAndroid Build Coastguard WorkerA function's control flow graph (CFG) is represented by the ``Ice::Cfg`` class. 214*03ce13f7SAndroid Build Coastguard WorkerIts key contents include: 215*03ce13f7SAndroid Build Coastguard Worker 216*03ce13f7SAndroid Build Coastguard Worker- A list of ``CfgNode`` pointers, generally held in topological order. 217*03ce13f7SAndroid Build Coastguard Worker 218*03ce13f7SAndroid Build Coastguard Worker- A list of ``Variable`` pointers corresponding to local variables used in the 219*03ce13f7SAndroid Build Coastguard Worker function plus compiler-generated temporaries. 220*03ce13f7SAndroid Build Coastguard Worker 221*03ce13f7SAndroid Build Coastguard WorkerA basic block is represented by the ``Ice::CfgNode`` class. Its key contents 222*03ce13f7SAndroid Build Coastguard Workerinclude: 223*03ce13f7SAndroid Build Coastguard Worker 224*03ce13f7SAndroid Build Coastguard Worker- A linear list of instructions, in the same style as LLVM. The last 225*03ce13f7SAndroid Build Coastguard Worker instruction of the list is always a terminator instruction: branch, switch, 226*03ce13f7SAndroid Build Coastguard Worker return, unreachable. 227*03ce13f7SAndroid Build Coastguard Worker 228*03ce13f7SAndroid Build Coastguard Worker- A list of Phi instructions, also in the same style as LLVM. They are held as 229*03ce13f7SAndroid Build Coastguard Worker a linear list for convenience, though per Phi semantics, they are executed "in 230*03ce13f7SAndroid Build Coastguard Worker parallel" without dependencies on each other. 231*03ce13f7SAndroid Build Coastguard Worker 232*03ce13f7SAndroid Build Coastguard Worker- An unordered list of ``CfgNode`` pointers corresponding to incoming edges, and 233*03ce13f7SAndroid Build Coastguard Worker another list for outgoing edges. 234*03ce13f7SAndroid Build Coastguard Worker 235*03ce13f7SAndroid Build Coastguard Worker- The node's unique, 0-based index into the CFG's node list. 236*03ce13f7SAndroid Build Coastguard Worker 237*03ce13f7SAndroid Build Coastguard WorkerAn instruction is represented by the ``Ice::Inst`` class. Its key contents 238*03ce13f7SAndroid Build Coastguard Workerinclude: 239*03ce13f7SAndroid Build Coastguard Worker 240*03ce13f7SAndroid Build Coastguard Worker- A list of source operands. 241*03ce13f7SAndroid Build Coastguard Worker 242*03ce13f7SAndroid Build Coastguard Worker- Its destination variable, if the instruction produces a result in an 243*03ce13f7SAndroid Build Coastguard Worker ``Ice::Variable``. 244*03ce13f7SAndroid Build Coastguard Worker 245*03ce13f7SAndroid Build Coastguard Worker- A bitvector indicating which variables' live ranges this instruction ends. 246*03ce13f7SAndroid Build Coastguard Worker This is computed during liveness analysis. 247*03ce13f7SAndroid Build Coastguard Worker 248*03ce13f7SAndroid Build Coastguard WorkerInstructions kinds are divided into high-level ICE instructions and low-level 249*03ce13f7SAndroid Build Coastguard WorkerICE instructions. High-level instructions consist of the PNaCl/LLVM bitcode 250*03ce13f7SAndroid Build Coastguard Workerinstruction kinds. Each target architecture implementation extends the 251*03ce13f7SAndroid Build Coastguard Workerinstruction space with its own set of low-level instructions. Generally, 252*03ce13f7SAndroid Build Coastguard Workerlow-level instructions correspond to individual machine instructions. The 253*03ce13f7SAndroid Build Coastguard Workerhigh-level ICE instruction space includes a few additional instruction kinds 254*03ce13f7SAndroid Build Coastguard Workerthat are not part of LLVM but are generally useful (e.g., an Assignment 255*03ce13f7SAndroid Build Coastguard Workerinstruction), or are useful across targets. 256*03ce13f7SAndroid Build Coastguard Worker 257*03ce13f7SAndroid Build Coastguard WorkerSpecifically, high-level ICE instructions that derive from LLVM (but with PNaCl 258*03ce13f7SAndroid Build Coastguard WorkerABI restrictions as documented in the `PNaCl Bitcode Reference Manual 259*03ce13f7SAndroid Build Coastguard Worker<https://developer.chrome.com/native-client/reference/pnacl-bitcode-abi>`_) are 260*03ce13f7SAndroid Build Coastguard Workerthe following: 261*03ce13f7SAndroid Build Coastguard Worker 262*03ce13f7SAndroid Build Coastguard Worker- Alloca: allocate data on the stack 263*03ce13f7SAndroid Build Coastguard Worker 264*03ce13f7SAndroid Build Coastguard Worker- Arithmetic: binary operations of the form ``A = B op C`` 265*03ce13f7SAndroid Build Coastguard Worker 266*03ce13f7SAndroid Build Coastguard Worker- Br: conditional or unconditional branch 267*03ce13f7SAndroid Build Coastguard Worker 268*03ce13f7SAndroid Build Coastguard Worker- Call: function call 269*03ce13f7SAndroid Build Coastguard Worker 270*03ce13f7SAndroid Build Coastguard Worker- Cast: unary type-conversion operations 271*03ce13f7SAndroid Build Coastguard Worker 272*03ce13f7SAndroid Build Coastguard Worker- ExtractElement: extract a scalar element from a vector-type value 273*03ce13f7SAndroid Build Coastguard Worker 274*03ce13f7SAndroid Build Coastguard Worker- Fcmp: floating-point comparison 275*03ce13f7SAndroid Build Coastguard Worker 276*03ce13f7SAndroid Build Coastguard Worker- Icmp: integer comparison 277*03ce13f7SAndroid Build Coastguard Worker 278*03ce13f7SAndroid Build Coastguard Worker- Intrinsic: invoke a known intrinsic 279*03ce13f7SAndroid Build Coastguard Worker 280*03ce13f7SAndroid Build Coastguard Worker- InsertElement: insert a scalar element into a vector-type value 281*03ce13f7SAndroid Build Coastguard Worker 282*03ce13f7SAndroid Build Coastguard Worker- Load: load a value from memory 283*03ce13f7SAndroid Build Coastguard Worker 284*03ce13f7SAndroid Build Coastguard Worker- Phi: implement the SSA phi node 285*03ce13f7SAndroid Build Coastguard Worker 286*03ce13f7SAndroid Build Coastguard Worker- Ret: return from the function 287*03ce13f7SAndroid Build Coastguard Worker 288*03ce13f7SAndroid Build Coastguard Worker- Select: essentially the C language operation of the form ``X = C ? Y : Z`` 289*03ce13f7SAndroid Build Coastguard Worker 290*03ce13f7SAndroid Build Coastguard Worker- Store: store a value into memory 291*03ce13f7SAndroid Build Coastguard Worker 292*03ce13f7SAndroid Build Coastguard Worker- Switch: generalized branch to multiple possible locations 293*03ce13f7SAndroid Build Coastguard Worker 294*03ce13f7SAndroid Build Coastguard Worker- Unreachable: indicate that this portion of the code is unreachable 295*03ce13f7SAndroid Build Coastguard Worker 296*03ce13f7SAndroid Build Coastguard WorkerThe additional high-level ICE instructions are the following: 297*03ce13f7SAndroid Build Coastguard Worker 298*03ce13f7SAndroid Build Coastguard Worker- Assign: a simple ``A=B`` assignment. This is useful for e.g. lowering Phi 299*03ce13f7SAndroid Build Coastguard Worker instructions to non-SSA assignments, before lowering to machine code. 300*03ce13f7SAndroid Build Coastguard Worker 301*03ce13f7SAndroid Build Coastguard Worker- FakeDef, FakeUse, FakeKill. These are tools used to preserve consistency in 302*03ce13f7SAndroid Build Coastguard Worker liveness analysis, elevated to the high-level because they are used by all 303*03ce13f7SAndroid Build Coastguard Worker targets. They are described in more detail at the end of this section. 304*03ce13f7SAndroid Build Coastguard Worker 305*03ce13f7SAndroid Build Coastguard Worker- JumpTable: this represents the result of switch optimization analysis, where 306*03ce13f7SAndroid Build Coastguard Worker some switch instructions may use jump tables instead of cascading 307*03ce13f7SAndroid Build Coastguard Worker compare/branches. 308*03ce13f7SAndroid Build Coastguard Worker 309*03ce13f7SAndroid Build Coastguard WorkerAn operand is represented by the ``Ice::Operand`` class. In high-level ICE, an 310*03ce13f7SAndroid Build Coastguard Workeroperand is either an ``Ice::Constant`` or an ``Ice::Variable``. Constants 311*03ce13f7SAndroid Build Coastguard Workerinclude scalar integer constants, scalar floating point constants, Undef (an 312*03ce13f7SAndroid Build Coastguard Workerunspecified constant of a particular scalar or vector type), and symbol 313*03ce13f7SAndroid Build Coastguard Workerconstants (essentially addresses of globals). Note that the PNaCl ABI does not 314*03ce13f7SAndroid Build Coastguard Workerinclude vector-type constants besides Undef, and as such, Subzero (so far) has 315*03ce13f7SAndroid Build Coastguard Workerno reason to represent vector-type constants internally. A variable represents 316*03ce13f7SAndroid Build Coastguard Workera value allocated on the stack (though not including alloca-derived storage). 317*03ce13f7SAndroid Build Coastguard WorkerAmong other things, a variable holds its unique, 0-based index into the CFG's 318*03ce13f7SAndroid Build Coastguard Workervariable list. 319*03ce13f7SAndroid Build Coastguard Worker 320*03ce13f7SAndroid Build Coastguard WorkerEach target can extend the ``Constant`` and ``Variable`` classes for its own 321*03ce13f7SAndroid Build Coastguard Workerneeds. In addition, the ``Operand`` class may be extended, e.g. to define an 322*03ce13f7SAndroid Build Coastguard Workerx86 ``MemOperand`` that encodes a base register, an index register, an index 323*03ce13f7SAndroid Build Coastguard Workerregister shift amount, and a constant offset. 324*03ce13f7SAndroid Build Coastguard Worker 325*03ce13f7SAndroid Build Coastguard WorkerRegister allocation and liveness analysis are restricted to Variable operands. 326*03ce13f7SAndroid Build Coastguard WorkerBecause of the importance of register allocation to code quality, and the 327*03ce13f7SAndroid Build Coastguard Workertranslation-time cost of liveness analysis, Variable operands get some special 328*03ce13f7SAndroid Build Coastguard Workertreatment in ICE. Most notably, a frequent pattern in Subzero is to iterate 329*03ce13f7SAndroid Build Coastguard Workeracross all the Variables of an instruction. An instruction holds a list of 330*03ce13f7SAndroid Build Coastguard Workeroperands, but an operand may contain 0, 1, or more Variables. As such, the 331*03ce13f7SAndroid Build Coastguard Worker``Operand`` class specially holds a list of Variables contained within, for 332*03ce13f7SAndroid Build Coastguard Workerquick access. 333*03ce13f7SAndroid Build Coastguard Worker 334*03ce13f7SAndroid Build Coastguard WorkerA Subzero transformation pass may work by deleting an existing instruction and 335*03ce13f7SAndroid Build Coastguard Workerreplacing it with zero or more new instructions. Instead of actually deleting 336*03ce13f7SAndroid Build Coastguard Workerthe existing instruction, we generally mark it as deleted and insert the new 337*03ce13f7SAndroid Build Coastguard Workerinstructions right after the deleted instruction. When printing the IR for 338*03ce13f7SAndroid Build Coastguard Workerdebugging, this is a big help because it makes it much more clear how the 339*03ce13f7SAndroid Build Coastguard Workernon-deleted instructions came about. 340*03ce13f7SAndroid Build Coastguard Worker 341*03ce13f7SAndroid Build Coastguard WorkerSubzero has a few special instructions to help with liveness analysis 342*03ce13f7SAndroid Build Coastguard Workerconsistency. 343*03ce13f7SAndroid Build Coastguard Worker 344*03ce13f7SAndroid Build Coastguard Worker- The FakeDef instruction gives a fake definition of some variable. For 345*03ce13f7SAndroid Build Coastguard Worker example, on x86-32, a divide instruction defines both ``%eax`` and ``%edx`` 346*03ce13f7SAndroid Build Coastguard Worker but an ICE instruction can represent only one destination variable. This is 347*03ce13f7SAndroid Build Coastguard Worker similar for multiply instructions, and for function calls that return a 64-bit 348*03ce13f7SAndroid Build Coastguard Worker integer result in the ``%edx:%eax`` pair. Also, using the ``xor %eax, %eax`` 349*03ce13f7SAndroid Build Coastguard Worker trick to set ``%eax`` to 0 requires an initial FakeDef of ``%eax``. 350*03ce13f7SAndroid Build Coastguard Worker 351*03ce13f7SAndroid Build Coastguard Worker- The FakeUse instruction registers a use of a variable, typically to prevent an 352*03ce13f7SAndroid Build Coastguard Worker earlier assignment to that variable from being dead-code eliminated. For 353*03ce13f7SAndroid Build Coastguard Worker example, lowering an operation like ``x=cc?y:z`` may be done using x86's 354*03ce13f7SAndroid Build Coastguard Worker conditional move (cmov) instruction: ``mov z, x; cmov_cc y, x``. Without a 355*03ce13f7SAndroid Build Coastguard Worker FakeUse of ``x`` between the two instructions, the liveness analysis pass may 356*03ce13f7SAndroid Build Coastguard Worker dead-code eliminate the first instruction. 357*03ce13f7SAndroid Build Coastguard Worker 358*03ce13f7SAndroid Build Coastguard Worker- The FakeKill instruction is added after a call instruction, and is a quick way 359*03ce13f7SAndroid Build Coastguard Worker of indicating that caller-save registers are invalidated. 360*03ce13f7SAndroid Build Coastguard Worker 361*03ce13f7SAndroid Build Coastguard WorkerPexe parsing 362*03ce13f7SAndroid Build Coastguard Worker------------ 363*03ce13f7SAndroid Build Coastguard Worker 364*03ce13f7SAndroid Build Coastguard WorkerSubzero includes an integrated PNaCl bitcode parser for ``.pexe`` files. It 365*03ce13f7SAndroid Build Coastguard Workerparses the ``.pexe`` file function by function, ultimately constructing an ICE 366*03ce13f7SAndroid Build Coastguard WorkerCFG for each function. After a function is parsed, its CFG is handed off to the 367*03ce13f7SAndroid Build Coastguard Workertranslation phase. The bitcode parser also parses global initializer data and 368*03ce13f7SAndroid Build Coastguard Workerhands it off to be translated to data sections in the object file. 369*03ce13f7SAndroid Build Coastguard Worker 370*03ce13f7SAndroid Build Coastguard WorkerSubzero has another parsing strategy for testing/debugging. LLVM libraries can 371*03ce13f7SAndroid Build Coastguard Workerbe used to parse a module into LLVM IR (though very slowly relative to Subzero 372*03ce13f7SAndroid Build Coastguard Workernative parsing). Then we iterate across the LLVM IR and construct high-level 373*03ce13f7SAndroid Build Coastguard WorkerICE, handing off each CFG to the translation phase. 374*03ce13f7SAndroid Build Coastguard Worker 375*03ce13f7SAndroid Build Coastguard WorkerOverview of lowering 376*03ce13f7SAndroid Build Coastguard Worker-------------------- 377*03ce13f7SAndroid Build Coastguard Worker 378*03ce13f7SAndroid Build Coastguard WorkerIn general, translation goes like this: 379*03ce13f7SAndroid Build Coastguard Worker 380*03ce13f7SAndroid Build Coastguard Worker- Parse the next function from the ``.pexe`` file and construct a CFG consisting 381*03ce13f7SAndroid Build Coastguard Worker of high-level ICE. 382*03ce13f7SAndroid Build Coastguard Worker 383*03ce13f7SAndroid Build Coastguard Worker- Do analysis passes and transformation passes on the high-level ICE, as 384*03ce13f7SAndroid Build Coastguard Worker desired. 385*03ce13f7SAndroid Build Coastguard Worker 386*03ce13f7SAndroid Build Coastguard Worker- Lower each high-level ICE instruction into a sequence of zero or more 387*03ce13f7SAndroid Build Coastguard Worker low-level ICE instructions. Each high-level instruction is generally lowered 388*03ce13f7SAndroid Build Coastguard Worker independently, though the target lowering is allowed to look ahead in the 389*03ce13f7SAndroid Build Coastguard Worker CfgNode's instruction list if desired. 390*03ce13f7SAndroid Build Coastguard Worker 391*03ce13f7SAndroid Build Coastguard Worker- Do more analysis and transformation passes on the low-level ICE, as desired. 392*03ce13f7SAndroid Build Coastguard Worker 393*03ce13f7SAndroid Build Coastguard Worker- Assemble the low-level CFG into an ELF object file (alternatively, a textual 394*03ce13f7SAndroid Build Coastguard Worker assembly file that is later assembled by some external tool). 395*03ce13f7SAndroid Build Coastguard Worker 396*03ce13f7SAndroid Build Coastguard Worker- Repeat for all functions, and also produce object code for data such as global 397*03ce13f7SAndroid Build Coastguard Worker initializers and internal constant pools. 398*03ce13f7SAndroid Build Coastguard Worker 399*03ce13f7SAndroid Build Coastguard WorkerCurrently there are two optimization levels: ``O2`` and ``Om1``. For ``O2``, 400*03ce13f7SAndroid Build Coastguard Workerthe intention is to apply all available optimizations to get the best code 401*03ce13f7SAndroid Build Coastguard Workerquality (though the initial code quality goal is measured against LLVM's ``O0`` 402*03ce13f7SAndroid Build Coastguard Workercode quality). For ``Om1``, the intention is to apply as few optimizations as 403*03ce13f7SAndroid Build Coastguard Workerpossible and produce code as quickly as possible, accepting poor code quality. 404*03ce13f7SAndroid Build Coastguard Worker``Om1`` is short for "O-minus-one", i.e. "worse than O0", or in other words, 405*03ce13f7SAndroid Build Coastguard Worker"sub-zero". 406*03ce13f7SAndroid Build Coastguard Worker 407*03ce13f7SAndroid Build Coastguard WorkerHigh-level debuggability of generated code is so far not a design requirement. 408*03ce13f7SAndroid Build Coastguard WorkerSubzero doesn't really do transformations that would obfuscate debugging; the 409*03ce13f7SAndroid Build Coastguard Workermain thing might be that register allocation (including stack slot coalescing 410*03ce13f7SAndroid Build Coastguard Workerfor stack-allocated variables whose live ranges don't overlap) may render a 411*03ce13f7SAndroid Build Coastguard Workervariable's value unobtainable after its live range ends. This would not be an 412*03ce13f7SAndroid Build Coastguard Workerissue for ``Om1`` since it doesn't register-allocate program-level variables, 413*03ce13f7SAndroid Build Coastguard Workernor does it coalesce stack slots. That said, fully supporting debuggability 414*03ce13f7SAndroid Build Coastguard Workerwould require a few additions: 415*03ce13f7SAndroid Build Coastguard Worker 416*03ce13f7SAndroid Build Coastguard Worker- DWARF support would need to be added to Subzero's ELF file emitter. Subzero 417*03ce13f7SAndroid Build Coastguard Worker propagates global symbol names, local variable names, and function-internal 418*03ce13f7SAndroid Build Coastguard Worker label names that are present in the ``.pexe`` file. This would allow a 419*03ce13f7SAndroid Build Coastguard Worker debugger to map addresses back to symbols in the ``.pexe`` file. 420*03ce13f7SAndroid Build Coastguard Worker 421*03ce13f7SAndroid Build Coastguard Worker- To map ``.pexe`` file symbols back to meaningful source-level symbol names, 422*03ce13f7SAndroid Build Coastguard Worker file names, line numbers, etc., Subzero would need to handle `LLVM bitcode 423*03ce13f7SAndroid Build Coastguard Worker metadata <http://llvm.org/docs/LangRef.html#metadata>`_ and ``llvm.dbg`` 424*03ce13f7SAndroid Build Coastguard Worker `instrinsics<http://llvm.org/docs/LangRef.html#dbg-intrinsics>`_. 425*03ce13f7SAndroid Build Coastguard Worker 426*03ce13f7SAndroid Build Coastguard Worker- The PNaCl toolchain explicitly strips all this from the ``.pexe`` file, and so 427*03ce13f7SAndroid Build Coastguard Worker the toolchain would need to be modified to preserve it. 428*03ce13f7SAndroid Build Coastguard Worker 429*03ce13f7SAndroid Build Coastguard WorkerOur experience so far is that ``Om1`` translates twice as fast as ``O2``, but 430*03ce13f7SAndroid Build Coastguard Workerproduces code with one third the code quality. ``Om1`` is good for testing and 431*03ce13f7SAndroid Build Coastguard Workerdebugging -- during translation, it tends to expose errors in the basic lowering 432*03ce13f7SAndroid Build Coastguard Workerthat might otherwise have been hidden by the register allocator or other 433*03ce13f7SAndroid Build Coastguard Workeroptimization passes. It also helps determine whether a code correctness problem 434*03ce13f7SAndroid Build Coastguard Workeris a fundamental problem in the basic lowering, or an error in another 435*03ce13f7SAndroid Build Coastguard Workeroptimization pass. 436*03ce13f7SAndroid Build Coastguard Worker 437*03ce13f7SAndroid Build Coastguard WorkerThe implementation of target lowering also controls the recipe of passes used 438*03ce13f7SAndroid Build Coastguard Workerfor ``Om1`` and ``O2`` translation. For example, address mode inference may 439*03ce13f7SAndroid Build Coastguard Workeronly be relevant for x86. 440*03ce13f7SAndroid Build Coastguard Worker 441*03ce13f7SAndroid Build Coastguard WorkerLowering strategy 442*03ce13f7SAndroid Build Coastguard Worker----------------- 443*03ce13f7SAndroid Build Coastguard Worker 444*03ce13f7SAndroid Build Coastguard WorkerThe core of Subzero's lowering from high-level ICE to low-level ICE is to lower 445*03ce13f7SAndroid Build Coastguard Workereach high-level instruction down to a sequence of low-level target-specific 446*03ce13f7SAndroid Build Coastguard Workerinstructions, in a largely context-free setting. That is, each high-level 447*03ce13f7SAndroid Build Coastguard Workerinstruction conceptually has a simple template expansion into low-level 448*03ce13f7SAndroid Build Coastguard Workerinstructions, and lowering can in theory be done in any order. This may sound 449*03ce13f7SAndroid Build Coastguard Workerlike a small effort, but quite a large number of templates may be needed because 450*03ce13f7SAndroid Build Coastguard Workerof the number of PNaCl types and instruction variants. Furthermore, there may 451*03ce13f7SAndroid Build Coastguard Workerbe optimized templates, e.g. to take advantage of operator commutativity (for 452*03ce13f7SAndroid Build Coastguard Workerexample, ``x=x+1`` might allow a bettern lowering than ``x=1+x``). This is 453*03ce13f7SAndroid Build Coastguard Workersimilar to other template-based approaches in fast code generation or 454*03ce13f7SAndroid Build Coastguard Workerinterpretation, though some decisions are deferred until after some global 455*03ce13f7SAndroid Build Coastguard Workeranalysis passes, mostly related to register allocation, stack slot assignment, 456*03ce13f7SAndroid Build Coastguard Workerand specific choice of instruction variant and addressing mode. 457*03ce13f7SAndroid Build Coastguard Worker 458*03ce13f7SAndroid Build Coastguard WorkerThe key idea for a lowering template is to produce valid low-level instructions 459*03ce13f7SAndroid Build Coastguard Workerthat are guaranteed to meet address mode and other structural requirements of 460*03ce13f7SAndroid Build Coastguard Workerthe instruction set. For example, on x86, the source operand of an integer 461*03ce13f7SAndroid Build Coastguard Workerstore instruction must be an immediate or a physical register; a shift 462*03ce13f7SAndroid Build Coastguard Workerinstruction's shift amount must be an immediate or in register ``%cl``; a 463*03ce13f7SAndroid Build Coastguard Workerfunction's integer return value is in ``%eax``; most x86 instructions are 464*03ce13f7SAndroid Build Coastguard Workertwo-operand, in contrast to corresponding three-operand high-level instructions; 465*03ce13f7SAndroid Build Coastguard Workeretc. 466*03ce13f7SAndroid Build Coastguard Worker 467*03ce13f7SAndroid Build Coastguard WorkerBecause target lowering runs before register allocation, there is no way to know 468*03ce13f7SAndroid Build Coastguard Workerwhether a given ``Ice::Variable`` operand lives on the stack or in a physical 469*03ce13f7SAndroid Build Coastguard Workerregister. When the low-level instruction calls for a physical register operand, 470*03ce13f7SAndroid Build Coastguard Workerthe target lowering can create an infinite-weight Variable. This tells the 471*03ce13f7SAndroid Build Coastguard Workerregister allocator to assign infinite weight when making decisions, effectively 472*03ce13f7SAndroid Build Coastguard Workerguaranteeing some physical register. Variables can also be pre-colored to a 473*03ce13f7SAndroid Build Coastguard Workerspecific physical register (``cl`` in the shift example above), which also gives 474*03ce13f7SAndroid Build Coastguard Workerinfinite weight. 475*03ce13f7SAndroid Build Coastguard Worker 476*03ce13f7SAndroid Build Coastguard WorkerTo illustrate, consider a high-level arithmetic instruction on 32-bit integer 477*03ce13f7SAndroid Build Coastguard Workeroperands:: 478*03ce13f7SAndroid Build Coastguard Worker 479*03ce13f7SAndroid Build Coastguard Worker A = B + C 480*03ce13f7SAndroid Build Coastguard Worker 481*03ce13f7SAndroid Build Coastguard WorkerX86 target lowering might produce the following:: 482*03ce13f7SAndroid Build Coastguard Worker 483*03ce13f7SAndroid Build Coastguard Worker T.inf = B // mov instruction 484*03ce13f7SAndroid Build Coastguard Worker T.inf += C // add instruction 485*03ce13f7SAndroid Build Coastguard Worker A = T.inf // mov instruction 486*03ce13f7SAndroid Build Coastguard Worker 487*03ce13f7SAndroid Build Coastguard WorkerHere, ``T.inf`` is an infinite-weight temporary. As long as ``T.inf`` has a 488*03ce13f7SAndroid Build Coastguard Workerphysical register, the three lowered instructions are all encodable regardless 489*03ce13f7SAndroid Build Coastguard Workerof whether ``B`` and ``C`` are physical registers, memory, or immediates, and 490*03ce13f7SAndroid Build Coastguard Workerwhether ``A`` is a physical register or in memory. 491*03ce13f7SAndroid Build Coastguard Worker 492*03ce13f7SAndroid Build Coastguard WorkerIn this example, ``A`` must be a Variable and one may be tempted to simplify the 493*03ce13f7SAndroid Build Coastguard Workerlowering sequence by setting ``A`` as infinite-weight and using:: 494*03ce13f7SAndroid Build Coastguard Worker 495*03ce13f7SAndroid Build Coastguard Worker A = B // mov instruction 496*03ce13f7SAndroid Build Coastguard Worker A += C // add instruction 497*03ce13f7SAndroid Build Coastguard Worker 498*03ce13f7SAndroid Build Coastguard WorkerThis has two problems. First, if the original instruction was actually ``A = 499*03ce13f7SAndroid Build Coastguard WorkerB + A``, the result would be incorrect. Second, assigning ``A`` a physical 500*03ce13f7SAndroid Build Coastguard Workerregister applies throughout ``A``'s entire live range. This is probably not 501*03ce13f7SAndroid Build Coastguard Workerwhat is intended, and may ultimately lead to a failure to allocate a register 502*03ce13f7SAndroid Build Coastguard Workerfor an infinite-weight variable. 503*03ce13f7SAndroid Build Coastguard Worker 504*03ce13f7SAndroid Build Coastguard WorkerThis style of lowering leads to many temporaries being generated, so in ``O2`` 505*03ce13f7SAndroid Build Coastguard Workermode, we rely on the register allocator to clean things up. For example, in the 506*03ce13f7SAndroid Build Coastguard Workerexample above, if ``B`` ends up getting a physical register and its live range 507*03ce13f7SAndroid Build Coastguard Workerends at this instruction, the register allocator is likely to reuse that 508*03ce13f7SAndroid Build Coastguard Workerregister for ``T.inf``. This leads to ``T.inf=B`` being a redundant register 509*03ce13f7SAndroid Build Coastguard Workercopy, which is removed as an emission-time peephole optimization. 510*03ce13f7SAndroid Build Coastguard Worker 511*03ce13f7SAndroid Build Coastguard WorkerO2 lowering 512*03ce13f7SAndroid Build Coastguard Worker----------- 513*03ce13f7SAndroid Build Coastguard Worker 514*03ce13f7SAndroid Build Coastguard WorkerCurrently, the ``O2`` lowering recipe is the following: 515*03ce13f7SAndroid Build Coastguard Worker 516*03ce13f7SAndroid Build Coastguard Worker- Loop nest analysis 517*03ce13f7SAndroid Build Coastguard Worker 518*03ce13f7SAndroid Build Coastguard Worker- Local common subexpression elimination 519*03ce13f7SAndroid Build Coastguard Worker 520*03ce13f7SAndroid Build Coastguard Worker- Address mode inference 521*03ce13f7SAndroid Build Coastguard Worker 522*03ce13f7SAndroid Build Coastguard Worker- Read-modify-write (RMW) transformation 523*03ce13f7SAndroid Build Coastguard Worker 524*03ce13f7SAndroid Build Coastguard Worker- Basic liveness analysis 525*03ce13f7SAndroid Build Coastguard Worker 526*03ce13f7SAndroid Build Coastguard Worker- Load optimization 527*03ce13f7SAndroid Build Coastguard Worker 528*03ce13f7SAndroid Build Coastguard Worker- Target lowering 529*03ce13f7SAndroid Build Coastguard Worker 530*03ce13f7SAndroid Build Coastguard Worker- Full liveness analysis 531*03ce13f7SAndroid Build Coastguard Worker 532*03ce13f7SAndroid Build Coastguard Worker- Register allocation 533*03ce13f7SAndroid Build Coastguard Worker 534*03ce13f7SAndroid Build Coastguard Worker- Phi instruction lowering (advanced) 535*03ce13f7SAndroid Build Coastguard Worker 536*03ce13f7SAndroid Build Coastguard Worker- Post-phi lowering register allocation 537*03ce13f7SAndroid Build Coastguard Worker 538*03ce13f7SAndroid Build Coastguard Worker- Branch optimization 539*03ce13f7SAndroid Build Coastguard Worker 540*03ce13f7SAndroid Build Coastguard WorkerThese passes are described in more detail below. 541*03ce13f7SAndroid Build Coastguard Worker 542*03ce13f7SAndroid Build Coastguard WorkerOm1 lowering 543*03ce13f7SAndroid Build Coastguard Worker------------ 544*03ce13f7SAndroid Build Coastguard Worker 545*03ce13f7SAndroid Build Coastguard WorkerCurrently, the ``Om1`` lowering recipe is the following: 546*03ce13f7SAndroid Build Coastguard Worker 547*03ce13f7SAndroid Build Coastguard Worker- Phi instruction lowering (simple) 548*03ce13f7SAndroid Build Coastguard Worker 549*03ce13f7SAndroid Build Coastguard Worker- Target lowering 550*03ce13f7SAndroid Build Coastguard Worker 551*03ce13f7SAndroid Build Coastguard Worker- Register allocation (infinite-weight and pre-colored only) 552*03ce13f7SAndroid Build Coastguard Worker 553*03ce13f7SAndroid Build Coastguard WorkerOptimization passes 554*03ce13f7SAndroid Build Coastguard Worker------------------- 555*03ce13f7SAndroid Build Coastguard Worker 556*03ce13f7SAndroid Build Coastguard WorkerLiveness analysis 557*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^ 558*03ce13f7SAndroid Build Coastguard Worker 559*03ce13f7SAndroid Build Coastguard WorkerLiveness analysis is a standard dataflow optimization, implemented as follows. 560*03ce13f7SAndroid Build Coastguard WorkerFor each node (basic block), its live-out set is computed as the union of the 561*03ce13f7SAndroid Build Coastguard Workerlive-in sets of its successor nodes. Then the node's instructions are processed 562*03ce13f7SAndroid Build Coastguard Workerin reverse order, updating the live set, until the beginning of the node is 563*03ce13f7SAndroid Build Coastguard Workerreached, and the node's live-in set is recorded. If this iteration has changed 564*03ce13f7SAndroid Build Coastguard Workerthe node's live-in set, the node's predecessors are marked for reprocessing. 565*03ce13f7SAndroid Build Coastguard WorkerThis continues until no more nodes need reprocessing. If nodes are processed in 566*03ce13f7SAndroid Build Coastguard Workerreverse topological order, the number of iterations over the CFG is generally 567*03ce13f7SAndroid Build Coastguard Workerequal to the maximum loop nest depth. 568*03ce13f7SAndroid Build Coastguard Worker 569*03ce13f7SAndroid Build Coastguard WorkerTo implement this, each node records its live-in and live-out sets, initialized 570*03ce13f7SAndroid Build Coastguard Workerto the empty set. Each instruction records which of its Variables' live ranges 571*03ce13f7SAndroid Build Coastguard Workerend in that instruction, initialized to the empty set. A side effect of 572*03ce13f7SAndroid Build Coastguard Workerliveness analysis is dead instruction elimination. Each instruction can be 573*03ce13f7SAndroid Build Coastguard Workermarked as tentatively dead, and after the algorithm converges, the tentatively 574*03ce13f7SAndroid Build Coastguard Workerdead instructions are permanently deleted. 575*03ce13f7SAndroid Build Coastguard Worker 576*03ce13f7SAndroid Build Coastguard WorkerOptionally, after this liveness analysis completes, we can do live range 577*03ce13f7SAndroid Build Coastguard Workerconstruction, in which we calculate the live range of each variable in terms of 578*03ce13f7SAndroid Build Coastguard Workerinstruction numbers. A live range is represented as a union of segments, where 579*03ce13f7SAndroid Build Coastguard Workerthe segment endpoints are instruction numbers. Instruction numbers are required 580*03ce13f7SAndroid Build Coastguard Workerto be unique across the CFG, and monotonically increasing within a basic block. 581*03ce13f7SAndroid Build Coastguard WorkerAs a union of segments, live ranges can contain "gaps" and are therefore 582*03ce13f7SAndroid Build Coastguard Workerprecise. Because of SSA properties, a variable's live range can start at most 583*03ce13f7SAndroid Build Coastguard Workeronce in a basic block, and can end at most once in a basic block. Liveness 584*03ce13f7SAndroid Build Coastguard Workeranalysis keeps track of which variable/instruction tuples begin live ranges and 585*03ce13f7SAndroid Build Coastguard Workerend live ranges, and combined with live-in and live-out sets, we can efficiently 586*03ce13f7SAndroid Build Coastguard Workerbuild up live ranges of all variables across all basic blocks. 587*03ce13f7SAndroid Build Coastguard Worker 588*03ce13f7SAndroid Build Coastguard WorkerA lot of care is taken to try to make liveness analysis fast and efficient. 589*03ce13f7SAndroid Build Coastguard WorkerBecause of the lowering strategy, the number of variables is generally 590*03ce13f7SAndroid Build Coastguard Workerproportional to the number of instructions, leading to an O(N^2) complexity 591*03ce13f7SAndroid Build Coastguard Workeralgorithm if implemented naively. To improve things based on sparsity, we note 592*03ce13f7SAndroid Build Coastguard Workerthat most variables are "local" and referenced in at most one basic block (in 593*03ce13f7SAndroid Build Coastguard Workercontrast to the "global" variables with multi-block usage), and therefore cannot 594*03ce13f7SAndroid Build Coastguard Workerbe live across basic blocks. Therefore, the live-in and live-out sets, 595*03ce13f7SAndroid Build Coastguard Workertypically represented as bit vectors, can be limited to the set of global 596*03ce13f7SAndroid Build Coastguard Workervariables, and the intra-block liveness bit vector can be compacted to hold the 597*03ce13f7SAndroid Build Coastguard Workerglobal variables plus the local variables for that block. 598*03ce13f7SAndroid Build Coastguard Worker 599*03ce13f7SAndroid Build Coastguard WorkerRegister allocation 600*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^ 601*03ce13f7SAndroid Build Coastguard Worker 602*03ce13f7SAndroid Build Coastguard WorkerSubzero implements a simple linear-scan register allocator, based on the 603*03ce13f7SAndroid Build Coastguard Workerallocator described by Hanspeter Mössenböck and Michael Pfeiffer in `Linear Scan 604*03ce13f7SAndroid Build Coastguard WorkerRegister Allocation in the Context of SSA Form and Register Constraints 605*03ce13f7SAndroid Build Coastguard Worker<ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe02.PDF>`_. This allocator has 606*03ce13f7SAndroid Build Coastguard Workerseveral nice features: 607*03ce13f7SAndroid Build Coastguard Worker 608*03ce13f7SAndroid Build Coastguard Worker- Live ranges are represented as unions of segments, as described above, rather 609*03ce13f7SAndroid Build Coastguard Worker than a single start/end tuple. 610*03ce13f7SAndroid Build Coastguard Worker 611*03ce13f7SAndroid Build Coastguard Worker- It allows pre-coloring of variables with specific physical registers. 612*03ce13f7SAndroid Build Coastguard Worker 613*03ce13f7SAndroid Build Coastguard Worker- It applies equally well to pre-lowered Phi instructions. 614*03ce13f7SAndroid Build Coastguard Worker 615*03ce13f7SAndroid Build Coastguard WorkerThe paper suggests an approach of aggressively coalescing variables across Phi 616*03ce13f7SAndroid Build Coastguard Workerinstructions (i.e., trying to force Phi source and destination variables to have 617*03ce13f7SAndroid Build Coastguard Workerthe same register assignment), but we reject that in favor of the more natural 618*03ce13f7SAndroid Build Coastguard Workerpreference mechanism described below. 619*03ce13f7SAndroid Build Coastguard Worker 620*03ce13f7SAndroid Build Coastguard WorkerWe enhance the algorithm in the paper with the capability of automatic inference 621*03ce13f7SAndroid Build Coastguard Workerof register preference, and with the capability of allowing overlapping live 622*03ce13f7SAndroid Build Coastguard Workerranges to safely share the same register in certain circumstances. If we are 623*03ce13f7SAndroid Build Coastguard Workerconsidering register allocation for variable ``A``, and ``A`` has a single 624*03ce13f7SAndroid Build Coastguard Workerdefining instruction ``A=B+C``, then the preferred register for ``A``, if 625*03ce13f7SAndroid Build Coastguard Workeravailable, would be the register assigned to ``B`` or ``C``, if any, provided 626*03ce13f7SAndroid Build Coastguard Workerthat ``B`` or ``C``'s live range does not overlap ``A``'s live range. In this 627*03ce13f7SAndroid Build Coastguard Workerway we infer a good register preference for ``A``. 628*03ce13f7SAndroid Build Coastguard Worker 629*03ce13f7SAndroid Build Coastguard WorkerWe allow overlapping live ranges to get the same register in certain cases. 630*03ce13f7SAndroid Build Coastguard WorkerSuppose a high-level instruction like:: 631*03ce13f7SAndroid Build Coastguard Worker 632*03ce13f7SAndroid Build Coastguard Worker A = unary_op(B) 633*03ce13f7SAndroid Build Coastguard Worker 634*03ce13f7SAndroid Build Coastguard Workerhas been target-lowered like:: 635*03ce13f7SAndroid Build Coastguard Worker 636*03ce13f7SAndroid Build Coastguard Worker T.inf = B 637*03ce13f7SAndroid Build Coastguard Worker A = unary_op(T.inf) 638*03ce13f7SAndroid Build Coastguard Worker 639*03ce13f7SAndroid Build Coastguard WorkerFurther, assume that ``B``'s live range continues beyond this instruction 640*03ce13f7SAndroid Build Coastguard Workersequence, and that ``B`` has already been assigned some register. Normally, we 641*03ce13f7SAndroid Build Coastguard Workermight want to infer ``B``'s register as a good candidate for ``T.inf``, but it 642*03ce13f7SAndroid Build Coastguard Workerturns out that ``T.inf`` and ``B``'s live ranges overlap, requiring them to have 643*03ce13f7SAndroid Build Coastguard Workerdifferent registers. But ``T.inf`` is just a read-only copy of ``B`` that is 644*03ce13f7SAndroid Build Coastguard Workerguaranteed to be in a register, so in theory these overlapping live ranges could 645*03ce13f7SAndroid Build Coastguard Workersafely have the same register. Our implementation allows this overlap as long 646*03ce13f7SAndroid Build Coastguard Workeras ``T.inf`` is never modified within ``B``'s live range, and ``B`` is never 647*03ce13f7SAndroid Build Coastguard Workermodified within ``T.inf``'s live range. 648*03ce13f7SAndroid Build Coastguard Worker 649*03ce13f7SAndroid Build Coastguard WorkerSubzero's register allocator can be run in 3 configurations. 650*03ce13f7SAndroid Build Coastguard Worker 651*03ce13f7SAndroid Build Coastguard Worker- Normal mode. All Variables are considered for register allocation. It 652*03ce13f7SAndroid Build Coastguard Worker requires full liveness analysis and live range construction as a prerequisite. 653*03ce13f7SAndroid Build Coastguard Worker This is used by ``O2`` lowering. 654*03ce13f7SAndroid Build Coastguard Worker 655*03ce13f7SAndroid Build Coastguard Worker- Minimal mode. Only infinite-weight or pre-colored Variables are considered. 656*03ce13f7SAndroid Build Coastguard Worker All other Variables are stack-allocated. It does not require liveness 657*03ce13f7SAndroid Build Coastguard Worker analysis; instead, it quickly scans the instructions and records first 658*03ce13f7SAndroid Build Coastguard Worker definitions and last uses of all relevant Variables, using that to construct a 659*03ce13f7SAndroid Build Coastguard Worker single-segment live range. Although this includes most of the Variables, the 660*03ce13f7SAndroid Build Coastguard Worker live ranges are mostly simple, short, and rarely overlapping, which the 661*03ce13f7SAndroid Build Coastguard Worker register allocator handles efficiently. This is used by ``Om1`` lowering. 662*03ce13f7SAndroid Build Coastguard Worker 663*03ce13f7SAndroid Build Coastguard Worker- Post-phi lowering mode. Advanced phi lowering is done after normal-mode 664*03ce13f7SAndroid Build Coastguard Worker register allocation, and may result in new infinite-weight Variables that need 665*03ce13f7SAndroid Build Coastguard Worker registers. One would like to just run something like minimal mode to assign 666*03ce13f7SAndroid Build Coastguard Worker registers to the new Variables while respecting existing register allocation 667*03ce13f7SAndroid Build Coastguard Worker decisions. However, it sometimes happens that there are no free registers. 668*03ce13f7SAndroid Build Coastguard Worker In this case, some register needs to be forcibly spilled to the stack and 669*03ce13f7SAndroid Build Coastguard Worker temporarily reassigned to the new Variable, and reloaded at the end of the new 670*03ce13f7SAndroid Build Coastguard Worker Variable's live range. The register must be one that has no explicit 671*03ce13f7SAndroid Build Coastguard Worker references during the Variable's live range. Since Subzero currently doesn't 672*03ce13f7SAndroid Build Coastguard Worker track def/use chains (though it does record the CfgNode where a Variable is 673*03ce13f7SAndroid Build Coastguard Worker defined), we just do a brute-force search across the CfgNode's instruction 674*03ce13f7SAndroid Build Coastguard Worker list for the instruction numbers of interest. This situation happens very 675*03ce13f7SAndroid Build Coastguard Worker rarely, so there's little point for now in improving its performance. 676*03ce13f7SAndroid Build Coastguard Worker 677*03ce13f7SAndroid Build Coastguard WorkerThe basic linear-scan algorithm may, as it proceeds, rescind an early register 678*03ce13f7SAndroid Build Coastguard Workerallocation decision, leaving that Variable to be stack-allocated. Some of these 679*03ce13f7SAndroid Build Coastguard Workertimes, it turns out that the Variable could have been given a different register 680*03ce13f7SAndroid Build Coastguard Workerwithout conflict, but by this time it's too late. The literature recognizes 681*03ce13f7SAndroid Build Coastguard Workerthis situation and describes "second-chance bin-packing", which Subzero can do. 682*03ce13f7SAndroid Build Coastguard WorkerWe can rerun the register allocator in a mode that respects existing register 683*03ce13f7SAndroid Build Coastguard Workerallocation decisions, and sometimes it finds new non-conflicting opportunities. 684*03ce13f7SAndroid Build Coastguard WorkerIn fact, we can repeatedly run the register allocator until convergence. 685*03ce13f7SAndroid Build Coastguard WorkerUnfortunately, in the current implementation, these subsequent register 686*03ce13f7SAndroid Build Coastguard Workerallocation passes end up being extremely expensive. This is because of the 687*03ce13f7SAndroid Build Coastguard Workertreatment of the "unhandled pre-colored" Variable set, which is normally very 688*03ce13f7SAndroid Build Coastguard Workersmall but ends up being quite large on subsequent passes. Its performance can 689*03ce13f7SAndroid Build Coastguard Workerprobably be made acceptable with a better choice of data structures, but for now 690*03ce13f7SAndroid Build Coastguard Workerthis second-chance mechanism is disabled. 691*03ce13f7SAndroid Build Coastguard Worker 692*03ce13f7SAndroid Build Coastguard WorkerFuture work is to implement LLVM's `Greedy 693*03ce13f7SAndroid Build Coastguard Worker<http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html>`_ 694*03ce13f7SAndroid Build Coastguard Workerregister allocator as a replacement for the basic linear-scan algorithm, given 695*03ce13f7SAndroid Build Coastguard WorkerLLVM's experience with its improvement in code quality. (The blog post claims 696*03ce13f7SAndroid Build Coastguard Workerthat the Greedy allocator also improved maintainability because a lot of hacks 697*03ce13f7SAndroid Build Coastguard Workercould be removed, but Subzero is probably not yet to that level of hacks, and is 698*03ce13f7SAndroid Build Coastguard Workerless likely to see that particular benefit.) 699*03ce13f7SAndroid Build Coastguard Worker 700*03ce13f7SAndroid Build Coastguard WorkerLocal common subexpression elimination 701*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 702*03ce13f7SAndroid Build Coastguard Worker 703*03ce13f7SAndroid Build Coastguard WorkerThe Local CSE implementation goes through each instruction and records a portion 704*03ce13f7SAndroid Build Coastguard Workerof each ``Seen`` instruction in a hashset-like container. That portion consists 705*03ce13f7SAndroid Build Coastguard Workerof the entire instruction except for any dest variable. That means ``A = X + Y`` 706*03ce13f7SAndroid Build Coastguard Workerand ``B = X + Y`` will be considered to be 'equal' for this purpose. This allows 707*03ce13f7SAndroid Build Coastguard Workerus to detect common subexpressions. 708*03ce13f7SAndroid Build Coastguard Worker 709*03ce13f7SAndroid Build Coastguard WorkerWhenever a repetition is detected, the redundant variables are stored in a 710*03ce13f7SAndroid Build Coastguard Workercontainer mapping the replacee to the replacement. In the case above, it would 711*03ce13f7SAndroid Build Coastguard Workerbe ``MAP[B] = A`` assuming ``B = X + Y`` comes after ``A = X + Y``. 712*03ce13f7SAndroid Build Coastguard Worker 713*03ce13f7SAndroid Build Coastguard WorkerAt any point if a variable that has an entry in the replacement table is 714*03ce13f7SAndroid Build Coastguard Workerencountered, it is replaced with the variable it is mapped to. This ensures that 715*03ce13f7SAndroid Build Coastguard Workerthe redundant variables will not have any uses in the basic block, allowing 716*03ce13f7SAndroid Build Coastguard Workerdead code elimination to clean up the redundant instruction. 717*03ce13f7SAndroid Build Coastguard Worker 718*03ce13f7SAndroid Build Coastguard WorkerWith SSA, the information stored is never invalidated. However, non-SSA input is 719*03ce13f7SAndroid Build Coastguard Workersupported with the ``-lcse=no-ssa`` option. This has to maintain some extra 720*03ce13f7SAndroid Build Coastguard Workerdependency information to ensure proper invalidation on variable assignment. 721*03ce13f7SAndroid Build Coastguard WorkerThis is not rigorously tested because this pass is run at an early stage where 722*03ce13f7SAndroid Build Coastguard Workerit is safe to assume variables have a single definition. This is not enabled by 723*03ce13f7SAndroid Build Coastguard Workerdefault because it bumps the compile time overhead from 2% to 6%. 724*03ce13f7SAndroid Build Coastguard Worker 725*03ce13f7SAndroid Build Coastguard WorkerLoop-invariant code motion 726*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^ 727*03ce13f7SAndroid Build Coastguard Worker 728*03ce13f7SAndroid Build Coastguard WorkerThis pass utilizes the loop analysis information to hoist invariant instructions 729*03ce13f7SAndroid Build Coastguard Workerto loop pre-headers. A loop must have a single entry node (header) and that node 730*03ce13f7SAndroid Build Coastguard Workermust have a single external predecessor for this optimization to work, as it is 731*03ce13f7SAndroid Build Coastguard Workercurrently implemented. 732*03ce13f7SAndroid Build Coastguard Worker 733*03ce13f7SAndroid Build Coastguard WorkerThe pass works by iterating over all instructions in the loop until the set of 734*03ce13f7SAndroid Build Coastguard Workerinvariant instructions converges. In each iteration, a non-invariant instruction 735*03ce13f7SAndroid Build Coastguard Workerinvolving only constants or a variable known to be invariant is added to the 736*03ce13f7SAndroid Build Coastguard Workerresult set. The destination variable of that instruction is added to the set of 737*03ce13f7SAndroid Build Coastguard Workervariables known to be invariant (which is initialized with the constant args). 738*03ce13f7SAndroid Build Coastguard Worker 739*03ce13f7SAndroid Build Coastguard WorkerImproving the loop-analysis infrastructure is likely to have significant impact 740*03ce13f7SAndroid Build Coastguard Workeron this optimization. Inserting an extra node to act as the pre-header when the 741*03ce13f7SAndroid Build Coastguard Workerheader has multiple incoming edges from outside could also be a good idea. 742*03ce13f7SAndroid Build Coastguard WorkerExpanding the initial invariant variable set to contain all variables that do 743*03ce13f7SAndroid Build Coastguard Workernot have definitions inside the loop does not seem to improve anything. 744*03ce13f7SAndroid Build Coastguard Worker 745*03ce13f7SAndroid Build Coastguard WorkerShort circuit evaluation 746*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^ 747*03ce13f7SAndroid Build Coastguard Worker 748*03ce13f7SAndroid Build Coastguard WorkerShort circuit evaluation splits nodes and introduces early jumps when the result 749*03ce13f7SAndroid Build Coastguard Workerof a logical operation can be determined early and there are no observable side 750*03ce13f7SAndroid Build Coastguard Workereffects of skipping the rest of the computation. The instructions considered 751*03ce13f7SAndroid Build Coastguard Workerbackwards from the end of the basic blocks. When a definition of a variable 752*03ce13f7SAndroid Build Coastguard Workerinvolved in a conditional jump is found, an extra jump can be inserted in that 753*03ce13f7SAndroid Build Coastguard Workerlocation, moving the rest of the instructions in the node to a newly inserted 754*03ce13f7SAndroid Build Coastguard Workernode. Consider this example:: 755*03ce13f7SAndroid Build Coastguard Worker 756*03ce13f7SAndroid Build Coastguard Worker __N : 757*03ce13f7SAndroid Build Coastguard Worker a = <something> 758*03ce13f7SAndroid Build Coastguard Worker Instruction 1 without side effect 759*03ce13f7SAndroid Build Coastguard Worker ... b = <something> ... 760*03ce13f7SAndroid Build Coastguard Worker Instruction N without side effect 761*03ce13f7SAndroid Build Coastguard Worker t1 = or a b 762*03ce13f7SAndroid Build Coastguard Worker br t1 __X __Y 763*03ce13f7SAndroid Build Coastguard Worker 764*03ce13f7SAndroid Build Coastguard Workeris transformed to:: 765*03ce13f7SAndroid Build Coastguard Worker 766*03ce13f7SAndroid Build Coastguard Worker __N : 767*03ce13f7SAndroid Build Coastguard Worker a = <something> 768*03ce13f7SAndroid Build Coastguard Worker br a __X __N_ext 769*03ce13f7SAndroid Build Coastguard Worker 770*03ce13f7SAndroid Build Coastguard Worker __N_ext : 771*03ce13f7SAndroid Build Coastguard Worker Instruction 1 without side effect 772*03ce13f7SAndroid Build Coastguard Worker ... b = <something> ... 773*03ce13f7SAndroid Build Coastguard Worker Instruction N without side effect 774*03ce13f7SAndroid Build Coastguard Worker br b __X __Y 775*03ce13f7SAndroid Build Coastguard Worker 776*03ce13f7SAndroid Build Coastguard WorkerThe logic for AND is analogous, the only difference is that the early jump is 777*03ce13f7SAndroid Build Coastguard Workerfacilitated by a ``false`` value instead of ``true``. 778*03ce13f7SAndroid Build Coastguard Worker 779*03ce13f7SAndroid Build Coastguard WorkerGlobal Variable Splitting 780*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^ 781*03ce13f7SAndroid Build Coastguard Worker 782*03ce13f7SAndroid Build Coastguard WorkerGlobal variable splitting (``-split-global-vars``) is run after register 783*03ce13f7SAndroid Build Coastguard Workerallocation. It works on the variables that did not manage to get registers (but 784*03ce13f7SAndroid Build Coastguard Workerare allowed to) and decomposes their live ranges into the individual segments 785*03ce13f7SAndroid Build Coastguard Worker(which span a single node at most). New variables are created (but not yet used) 786*03ce13f7SAndroid Build Coastguard Workerwith these smaller live ranges and the register allocator is run again. This is 787*03ce13f7SAndroid Build Coastguard Workernot inefficient as the old variables that already had registers are now 788*03ce13f7SAndroid Build Coastguard Workerconsidered pre-colored. 789*03ce13f7SAndroid Build Coastguard Worker 790*03ce13f7SAndroid Build Coastguard WorkerThe new variables that get registers replace their parent variables for their 791*03ce13f7SAndroid Build Coastguard Workerportion of its (parent's) live range. A copy from the old variable to the new 792*03ce13f7SAndroid Build Coastguard Workeris introduced before the first use and the reverse after the last def in the 793*03ce13f7SAndroid Build Coastguard Workerlive range. 794*03ce13f7SAndroid Build Coastguard Worker 795*03ce13f7SAndroid Build Coastguard WorkerBasic phi lowering 796*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^ 797*03ce13f7SAndroid Build Coastguard Worker 798*03ce13f7SAndroid Build Coastguard WorkerThe simplest phi lowering strategy works as follows (this is how LLVM ``-O0`` 799*03ce13f7SAndroid Build Coastguard Workerimplements it). Consider this example:: 800*03ce13f7SAndroid Build Coastguard Worker 801*03ce13f7SAndroid Build Coastguard Worker L1: 802*03ce13f7SAndroid Build Coastguard Worker ... 803*03ce13f7SAndroid Build Coastguard Worker br L3 804*03ce13f7SAndroid Build Coastguard Worker L2: 805*03ce13f7SAndroid Build Coastguard Worker ... 806*03ce13f7SAndroid Build Coastguard Worker br L3 807*03ce13f7SAndroid Build Coastguard Worker L3: 808*03ce13f7SAndroid Build Coastguard Worker A = phi [B, L1], [C, L2] 809*03ce13f7SAndroid Build Coastguard Worker X = phi [Y, L1], [Z, L2] 810*03ce13f7SAndroid Build Coastguard Worker 811*03ce13f7SAndroid Build Coastguard WorkerFor each destination of a phi instruction, we can create a temporary and insert 812*03ce13f7SAndroid Build Coastguard Workerthe temporary's assignment at the end of the predecessor block:: 813*03ce13f7SAndroid Build Coastguard Worker 814*03ce13f7SAndroid Build Coastguard Worker L1: 815*03ce13f7SAndroid Build Coastguard Worker ... 816*03ce13f7SAndroid Build Coastguard Worker A' = B 817*03ce13f7SAndroid Build Coastguard Worker X' = Y 818*03ce13f7SAndroid Build Coastguard Worker br L3 819*03ce13f7SAndroid Build Coastguard Worker L2: 820*03ce13f7SAndroid Build Coastguard Worker ... 821*03ce13f7SAndroid Build Coastguard Worker A' = C 822*03ce13f7SAndroid Build Coastguard Worker X' = Z 823*03ce13f7SAndroid Build Coastguard Worker br L3 824*03ce13f7SAndroid Build Coastguard Worker L2: 825*03ce13f7SAndroid Build Coastguard Worker A = A' 826*03ce13f7SAndroid Build Coastguard Worker X = X' 827*03ce13f7SAndroid Build Coastguard Worker 828*03ce13f7SAndroid Build Coastguard WorkerThis transformation is very simple and reliable. It can be done before target 829*03ce13f7SAndroid Build Coastguard Workerlowering and register allocation, and it easily avoids the classic lost-copy and 830*03ce13f7SAndroid Build Coastguard Workerrelated problems. ``Om1`` lowering uses this strategy. 831*03ce13f7SAndroid Build Coastguard Worker 832*03ce13f7SAndroid Build Coastguard WorkerHowever, it has the disadvantage of initializing temporaries even for branches 833*03ce13f7SAndroid Build Coastguard Workernot taken, though that could be mitigated by splitting non-critical edges and 834*03ce13f7SAndroid Build Coastguard Workerputting assignments in the edge-split nodes. Another problem is that without 835*03ce13f7SAndroid Build Coastguard Workerextra machinery, the assignments to ``A``, ``A'``, ``X``, and ``X'`` are given a 836*03ce13f7SAndroid Build Coastguard Workerspecific ordering even though phi semantics are that the assignments are 837*03ce13f7SAndroid Build Coastguard Workerparallel or unordered. This sometimes imposes false live range overlaps and 838*03ce13f7SAndroid Build Coastguard Workerleads to poorer register allocation. 839*03ce13f7SAndroid Build Coastguard Worker 840*03ce13f7SAndroid Build Coastguard WorkerAdvanced phi lowering 841*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^ 842*03ce13f7SAndroid Build Coastguard Worker 843*03ce13f7SAndroid Build Coastguard Worker``O2`` lowering defers phi lowering until after register allocation to avoid the 844*03ce13f7SAndroid Build Coastguard Workerproblem of false live range overlaps. It works as follows. We split each 845*03ce13f7SAndroid Build Coastguard Workerincoming edge and move the (parallel) phi assignments into the split nodes. We 846*03ce13f7SAndroid Build Coastguard Workerlinearize each set of assignments by finding a safe, topological ordering of the 847*03ce13f7SAndroid Build Coastguard Workerassignments, respecting register assignments as well. For example:: 848*03ce13f7SAndroid Build Coastguard Worker 849*03ce13f7SAndroid Build Coastguard Worker A = B 850*03ce13f7SAndroid Build Coastguard Worker X = Y 851*03ce13f7SAndroid Build Coastguard Worker 852*03ce13f7SAndroid Build Coastguard WorkerNormally these assignments could be executed in either order, but if ``B`` and 853*03ce13f7SAndroid Build Coastguard Worker``X`` are assigned the same physical register, we would want to use the above 854*03ce13f7SAndroid Build Coastguard Workerordering. Dependency cycles are broken by introducing a temporary. For 855*03ce13f7SAndroid Build Coastguard Workerexample:: 856*03ce13f7SAndroid Build Coastguard Worker 857*03ce13f7SAndroid Build Coastguard Worker A = B 858*03ce13f7SAndroid Build Coastguard Worker B = A 859*03ce13f7SAndroid Build Coastguard Worker 860*03ce13f7SAndroid Build Coastguard WorkerHere, a temporary breaks the cycle:: 861*03ce13f7SAndroid Build Coastguard Worker 862*03ce13f7SAndroid Build Coastguard Worker t = A 863*03ce13f7SAndroid Build Coastguard Worker A = B 864*03ce13f7SAndroid Build Coastguard Worker B = t 865*03ce13f7SAndroid Build Coastguard Worker 866*03ce13f7SAndroid Build Coastguard WorkerFinally, we use the existing target lowering to lower the assignments in this 867*03ce13f7SAndroid Build Coastguard Workerbasic block, and once that is done for all basic blocks, we run the post-phi 868*03ce13f7SAndroid Build Coastguard Workervariant of register allocation on the edge-split basic blocks. 869*03ce13f7SAndroid Build Coastguard Worker 870*03ce13f7SAndroid Build Coastguard WorkerWhen computing a topological order, we try to first schedule assignments whose 871*03ce13f7SAndroid Build Coastguard Workersource has a physical register, and last schedule assignments whose destination 872*03ce13f7SAndroid Build Coastguard Workerhas a physical register. This helps reduce register pressure. 873*03ce13f7SAndroid Build Coastguard Worker 874*03ce13f7SAndroid Build Coastguard WorkerX86 address mode inference 875*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^ 876*03ce13f7SAndroid Build Coastguard Worker 877*03ce13f7SAndroid Build Coastguard WorkerWe try to take advantage of the x86 addressing mode that includes a base 878*03ce13f7SAndroid Build Coastguard Workerregister, an index register, an index register scale amount, and an immediate 879*03ce13f7SAndroid Build Coastguard Workeroffset. We do this through simple pattern matching. Starting with a load or 880*03ce13f7SAndroid Build Coastguard Workerstore instruction where the address is a variable, we initialize the base 881*03ce13f7SAndroid Build Coastguard Workerregister to that variable, and look up the instruction where that variable is 882*03ce13f7SAndroid Build Coastguard Workerdefined. If that is an add instruction of two variables and the index register 883*03ce13f7SAndroid Build Coastguard Workerhasn't been set, we replace the base and index register with those two 884*03ce13f7SAndroid Build Coastguard Workervariables. If instead it is an add instruction of a variable and a constant, we 885*03ce13f7SAndroid Build Coastguard Workerreplace the base register with the variable and add the constant to the 886*03ce13f7SAndroid Build Coastguard Workerimmediate offset. 887*03ce13f7SAndroid Build Coastguard Worker 888*03ce13f7SAndroid Build Coastguard WorkerThere are several more patterns that can be matched. This pattern matching 889*03ce13f7SAndroid Build Coastguard Workercontinues on the load or store instruction until no more matches are found. 890*03ce13f7SAndroid Build Coastguard WorkerBecause a program typically has few load and store instructions (not to be 891*03ce13f7SAndroid Build Coastguard Workerconfused with instructions that manipulate stack variables), this address mode 892*03ce13f7SAndroid Build Coastguard Workerinference pass is fast. 893*03ce13f7SAndroid Build Coastguard Worker 894*03ce13f7SAndroid Build Coastguard WorkerX86 read-modify-write inference 895*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 896*03ce13f7SAndroid Build Coastguard Worker 897*03ce13f7SAndroid Build Coastguard WorkerA reasonably common bitcode pattern is a non-atomic update of a memory 898*03ce13f7SAndroid Build Coastguard Workerlocation:: 899*03ce13f7SAndroid Build Coastguard Worker 900*03ce13f7SAndroid Build Coastguard Worker x = load addr 901*03ce13f7SAndroid Build Coastguard Worker y = add x, 1 902*03ce13f7SAndroid Build Coastguard Worker store y, addr 903*03ce13f7SAndroid Build Coastguard Worker 904*03ce13f7SAndroid Build Coastguard WorkerOn x86, with good register allocation, the Subzero passes described above 905*03ce13f7SAndroid Build Coastguard Workergenerate code with only this quality:: 906*03ce13f7SAndroid Build Coastguard Worker 907*03ce13f7SAndroid Build Coastguard Worker mov [%ebx], %eax 908*03ce13f7SAndroid Build Coastguard Worker add $1, %eax 909*03ce13f7SAndroid Build Coastguard Worker mov %eax, [%ebx] 910*03ce13f7SAndroid Build Coastguard Worker 911*03ce13f7SAndroid Build Coastguard WorkerHowever, x86 allows for this kind of code:: 912*03ce13f7SAndroid Build Coastguard Worker 913*03ce13f7SAndroid Build Coastguard Worker add $1, [%ebx] 914*03ce13f7SAndroid Build Coastguard Worker 915*03ce13f7SAndroid Build Coastguard Workerwhich requires fewer instructions, but perhaps more importantly, requires fewer 916*03ce13f7SAndroid Build Coastguard Workerphysical registers. 917*03ce13f7SAndroid Build Coastguard Worker 918*03ce13f7SAndroid Build Coastguard WorkerIt's also important to note that this transformation only makes sense if the 919*03ce13f7SAndroid Build Coastguard Workerstore instruction ends ``x``'s live range. 920*03ce13f7SAndroid Build Coastguard Worker 921*03ce13f7SAndroid Build Coastguard WorkerSubzero's ``O2`` recipe includes an early pass to find read-modify-write (RMW) 922*03ce13f7SAndroid Build Coastguard Workeropportunities via simple pattern matching. The only problem is that it is run 923*03ce13f7SAndroid Build Coastguard Workerbefore liveness analysis, which is needed to determine whether ``x``'s live 924*03ce13f7SAndroid Build Coastguard Workerrange ends after the RMW. Since liveness analysis is one of the most expensive 925*03ce13f7SAndroid Build Coastguard Workerpasses, it's not attractive to run it an extra time just for RMW analysis. 926*03ce13f7SAndroid Build Coastguard WorkerInstead, we essentially generate both the RMW and the non-RMW versions, and then 927*03ce13f7SAndroid Build Coastguard Workerduring lowering, the RMW version deletes itself if it finds x still live. 928*03ce13f7SAndroid Build Coastguard Worker 929*03ce13f7SAndroid Build Coastguard WorkerX86 compare-branch inference 930*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 931*03ce13f7SAndroid Build Coastguard Worker 932*03ce13f7SAndroid Build Coastguard WorkerIn the LLVM instruction set, the compare/branch pattern works like this:: 933*03ce13f7SAndroid Build Coastguard Worker 934*03ce13f7SAndroid Build Coastguard Worker cond = icmp eq a, b 935*03ce13f7SAndroid Build Coastguard Worker br cond, target 936*03ce13f7SAndroid Build Coastguard Worker 937*03ce13f7SAndroid Build Coastguard WorkerThe result of the icmp instruction is a single bit, and a conditional branch 938*03ce13f7SAndroid Build Coastguard Workertests that bit. By contrast, most target architectures use this pattern:: 939*03ce13f7SAndroid Build Coastguard Worker 940*03ce13f7SAndroid Build Coastguard Worker cmp a, b // implicitly sets various bits of FLAGS register 941*03ce13f7SAndroid Build Coastguard Worker br eq, target // branch on a particular FLAGS bit 942*03ce13f7SAndroid Build Coastguard Worker 943*03ce13f7SAndroid Build Coastguard WorkerA naive lowering sequence conditionally sets ``cond`` to 0 or 1, then tests 944*03ce13f7SAndroid Build Coastguard Worker``cond`` and conditionally branches. Subzero has a pass that identifies 945*03ce13f7SAndroid Build Coastguard Workerboolean-based operations like this and folds them into a single 946*03ce13f7SAndroid Build Coastguard Workercompare/branch-like operation. It is set up for more than just cmp/br though. 947*03ce13f7SAndroid Build Coastguard WorkerBoolean producers include icmp (integer compare), fcmp (floating-point compare), 948*03ce13f7SAndroid Build Coastguard Workerand trunc (integer truncation when the destination has bool type). Boolean 949*03ce13f7SAndroid Build Coastguard Workerconsumers include branch, select (the ternary operator from the C language), and 950*03ce13f7SAndroid Build Coastguard Workersign-extend and zero-extend when the source has bool type. 951*03ce13f7SAndroid Build Coastguard Worker 952*03ce13f7SAndroid Build Coastguard WorkerCode emission 953*03ce13f7SAndroid Build Coastguard Worker------------- 954*03ce13f7SAndroid Build Coastguard Worker 955*03ce13f7SAndroid Build Coastguard WorkerSubzero's integrated assembler is derived from Dart's `assembler code 956*03ce13f7SAndroid Build Coastguard Worker<https://github.com/dart-lang/sdk/tree/master/runtime/vm>'_. There is a pass 957*03ce13f7SAndroid Build Coastguard Workerthat iterates through the low-level ICE instructions and invokes the relevant 958*03ce13f7SAndroid Build Coastguard Workerassembler functions. Placeholders are added for later fixup of branch target 959*03ce13f7SAndroid Build Coastguard Workeroffsets. (Backward branches use short offsets if possible; forward branches 960*03ce13f7SAndroid Build Coastguard Workergenerally use long offsets unless it is an intra-block branch of "known" short 961*03ce13f7SAndroid Build Coastguard Workerlength.) The assembler emits into a staging buffer. Once emission into the 962*03ce13f7SAndroid Build Coastguard Workerstaging buffer for a function is complete, the data is emitted to the output 963*03ce13f7SAndroid Build Coastguard Workerfile as an ELF object file, and metadata such as relocations, symbol table, and 964*03ce13f7SAndroid Build Coastguard Workerstring table, are accumulated for emission at the end. Global data initializers 965*03ce13f7SAndroid Build Coastguard Workerare emitted similarly. A key point is that at this point, the staging buffer 966*03ce13f7SAndroid Build Coastguard Workercan be deallocated, and only a minimum of data needs to held until the end. 967*03ce13f7SAndroid Build Coastguard Worker 968*03ce13f7SAndroid Build Coastguard WorkerAs a debugging alternative, Subzero can emit textual assembly code which can 969*03ce13f7SAndroid Build Coastguard Workerthen be run through an external assembler. This is of course super slow, but 970*03ce13f7SAndroid Build Coastguard Workerquite valuable when bringing up a new target. 971*03ce13f7SAndroid Build Coastguard Worker 972*03ce13f7SAndroid Build Coastguard WorkerAs another debugging option, the staging buffer can be emitted as textual 973*03ce13f7SAndroid Build Coastguard Workerassembly, primarily in the form of ".byte" lines. This allows the assembler to 974*03ce13f7SAndroid Build Coastguard Workerbe tested separately from the ELF related code. 975*03ce13f7SAndroid Build Coastguard Worker 976*03ce13f7SAndroid Build Coastguard WorkerMemory management 977*03ce13f7SAndroid Build Coastguard Worker----------------- 978*03ce13f7SAndroid Build Coastguard Worker 979*03ce13f7SAndroid Build Coastguard WorkerWhere possible, we allocate from a ``CfgLocalAllocator`` which derives from 980*03ce13f7SAndroid Build Coastguard WorkerLLVM's ``BumpPtrAllocator``. This is an arena-style allocator where objects 981*03ce13f7SAndroid Build Coastguard Workerallocated from the arena are never actually freed; instead, when the CFG 982*03ce13f7SAndroid Build Coastguard Workertranslation completes and the CFG is deleted, the entire arena memory is 983*03ce13f7SAndroid Build Coastguard Workerreclaimed at once. This style of allocation works well in an environment like a 984*03ce13f7SAndroid Build Coastguard Workercompiler where there are distinct phases with only easily-identifiable objects 985*03ce13f7SAndroid Build Coastguard Workerliving across phases. It frees the developer from having to manage object 986*03ce13f7SAndroid Build Coastguard Workerdeletion, and it amortizes deletion costs across just a single arena deletion at 987*03ce13f7SAndroid Build Coastguard Workerthe end of the phase. Furthermore, it helps scalability by allocating entirely 988*03ce13f7SAndroid Build Coastguard Workerfrom thread-local memory pools, and minimizing global locking of the heap. 989*03ce13f7SAndroid Build Coastguard Worker 990*03ce13f7SAndroid Build Coastguard WorkerInstructions are probably the most heavily allocated complex class in Subzero. 991*03ce13f7SAndroid Build Coastguard WorkerWe represent an instruction list as an intrusive doubly linked list, allocate 992*03ce13f7SAndroid Build Coastguard Workerall instructions from the ``CfgLocalAllocator``, and we make sure each 993*03ce13f7SAndroid Build Coastguard Workerinstruction subclass is basically `POD 994*03ce13f7SAndroid Build Coastguard Worker<http://en.cppreference.com/w/cpp/concept/PODType>`_ (Plain Old Data) with a 995*03ce13f7SAndroid Build Coastguard Workertrivial destructor. This way, when the CFG is finished, we don't need to 996*03ce13f7SAndroid Build Coastguard Workerindividually deallocate every instruction. We do similar for Variables, which 997*03ce13f7SAndroid Build Coastguard Workeris probably the second most popular complex class. 998*03ce13f7SAndroid Build Coastguard Worker 999*03ce13f7SAndroid Build Coastguard WorkerThere are some situations where passes need to use some `STL container class 1000*03ce13f7SAndroid Build Coastguard Worker<http://en.cppreference.com/w/cpp/container>`_. Subzero has a way of using the 1001*03ce13f7SAndroid Build Coastguard Worker``CfgLocalAllocator`` as the container allocator if this is needed. 1002*03ce13f7SAndroid Build Coastguard Worker 1003*03ce13f7SAndroid Build Coastguard WorkerMultithreaded translation 1004*03ce13f7SAndroid Build Coastguard Worker------------------------- 1005*03ce13f7SAndroid Build Coastguard Worker 1006*03ce13f7SAndroid Build Coastguard WorkerSubzero is designed to be able to translate functions in parallel. With the 1007*03ce13f7SAndroid Build Coastguard Worker``-threads=N`` command-line option, there is a 3-stage producer-consumer 1008*03ce13f7SAndroid Build Coastguard Workerpipeline: 1009*03ce13f7SAndroid Build Coastguard Worker 1010*03ce13f7SAndroid Build Coastguard Worker- A single thread parses the ``.pexe`` file and produces a sequence of work 1011*03ce13f7SAndroid Build Coastguard Worker units. A work unit can be either a fully constructed CFG, or a set of global 1012*03ce13f7SAndroid Build Coastguard Worker initializers. The work unit includes its sequence number denoting its parse 1013*03ce13f7SAndroid Build Coastguard Worker order. Each work unit is added to the translation queue. 1014*03ce13f7SAndroid Build Coastguard Worker 1015*03ce13f7SAndroid Build Coastguard Worker- There are N translation threads that draw work units from the translation 1016*03ce13f7SAndroid Build Coastguard Worker queue and lower them into assembler buffers. Each assembler buffer is added 1017*03ce13f7SAndroid Build Coastguard Worker to the emitter queue, tagged with its sequence number. The CFG and its 1018*03ce13f7SAndroid Build Coastguard Worker ``CfgLocalAllocator`` are disposed of at this point. 1019*03ce13f7SAndroid Build Coastguard Worker 1020*03ce13f7SAndroid Build Coastguard Worker- A single thread draws assembler buffers from the emitter queue and appends to 1021*03ce13f7SAndroid Build Coastguard Worker the output file. It uses the sequence numbers to reintegrate the assembler 1022*03ce13f7SAndroid Build Coastguard Worker buffers according to the original parse order, such that output order is 1023*03ce13f7SAndroid Build Coastguard Worker always deterministic. 1024*03ce13f7SAndroid Build Coastguard Worker 1025*03ce13f7SAndroid Build Coastguard WorkerThis means that with ``-threads=N``, there are actually ``N+1`` spawned threads 1026*03ce13f7SAndroid Build Coastguard Workerfor a total of ``N+2`` execution threads, taking the parser and emitter threads 1027*03ce13f7SAndroid Build Coastguard Workerinto account. For the special case of ``N=0``, execution is entirely sequential 1028*03ce13f7SAndroid Build Coastguard Worker-- the same thread parses, translates, and emits, one function at a time. This 1029*03ce13f7SAndroid Build Coastguard Workeris useful for performance measurements. 1030*03ce13f7SAndroid Build Coastguard Worker 1031*03ce13f7SAndroid Build Coastguard WorkerIdeally, we would like to get near-linear scalability as the number of 1032*03ce13f7SAndroid Build Coastguard Workertranslation threads increases. We expect that ``-threads=1`` should be slightly 1033*03ce13f7SAndroid Build Coastguard Workerfaster than ``-threads=0`` as the small amount of time spent parsing and 1034*03ce13f7SAndroid Build Coastguard Workeremitting is done largely in parallel with translation. With perfect 1035*03ce13f7SAndroid Build Coastguard Workerscalability, we see ``-threads=N`` translating ``N`` times as fast as 1036*03ce13f7SAndroid Build Coastguard Worker``-threads=1``, up until the point where parsing or emitting becomes the 1037*03ce13f7SAndroid Build Coastguard Workerbottleneck, or ``N+2`` exceeds the number of CPU cores. In reality, memory 1038*03ce13f7SAndroid Build Coastguard Workerperformance would become a bottleneck and efficiency might peak at, say, 75%. 1039*03ce13f7SAndroid Build Coastguard Worker 1040*03ce13f7SAndroid Build Coastguard WorkerCurrently, parsing takes about 11% of total sequential time. If translation 1041*03ce13f7SAndroid Build Coastguard Workerscalability ever gets so fast and awesomely scalable that parsing becomes a 1042*03ce13f7SAndroid Build Coastguard Workerbottleneck, it should be possible to make parsing multithreaded as well. 1043*03ce13f7SAndroid Build Coastguard Worker 1044*03ce13f7SAndroid Build Coastguard WorkerInternally, all shared, mutable data is held in the GlobalContext object, and 1045*03ce13f7SAndroid Build Coastguard Workeraccess to each field is guarded by a mutex. 1046*03ce13f7SAndroid Build Coastguard Worker 1047*03ce13f7SAndroid Build Coastguard WorkerSecurity 1048*03ce13f7SAndroid Build Coastguard Worker-------- 1049*03ce13f7SAndroid Build Coastguard Worker 1050*03ce13f7SAndroid Build Coastguard WorkerSubzero includes a number of security features in the generated code, as well as 1051*03ce13f7SAndroid Build Coastguard Workerin the Subzero translator itself, which run on top of the existing Native Client 1052*03ce13f7SAndroid Build Coastguard Workersandbox as well as Chrome's OS-level sandbox. 1053*03ce13f7SAndroid Build Coastguard Worker 1054*03ce13f7SAndroid Build Coastguard WorkerSandboxed translator 1055*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^ 1056*03ce13f7SAndroid Build Coastguard Worker 1057*03ce13f7SAndroid Build Coastguard WorkerWhen running inside the browser, the Subzero translator executes as sandboxed, 1058*03ce13f7SAndroid Build Coastguard Workeruntrusted code that is initially checked by the validator, just like the 1059*03ce13f7SAndroid Build Coastguard WorkerLLVM-based ``pnacl-llc`` translator. As such, the Subzero binary should be no 1060*03ce13f7SAndroid Build Coastguard Workermore or less secure than the translator it replaces, from the point of view of 1061*03ce13f7SAndroid Build Coastguard Workerthe Chrome sandbox. That said, Subzero is much smaller than ``pnacl-llc`` and 1062*03ce13f7SAndroid Build Coastguard Workerwas designed from the start with security in mind, so one expects fewer attacker 1063*03ce13f7SAndroid Build Coastguard Workeropportunities here. 1064*03ce13f7SAndroid Build Coastguard Worker 1065*03ce13f7SAndroid Build Coastguard WorkerFuzzing 1066*03ce13f7SAndroid Build Coastguard Worker^^^^^^^ 1067*03ce13f7SAndroid Build Coastguard Worker 1068*03ce13f7SAndroid Build Coastguard WorkerWe have started fuzz-testing the ``.pexe`` files input to Subzero, using a 1069*03ce13f7SAndroid Build Coastguard Workercombination of `afl-fuzz <http://lcamtuf.coredump.cx/afl/>`_, LLVM's `libFuzzer 1070*03ce13f7SAndroid Build Coastguard Worker<http://llvm.org/docs/LibFuzzer.html>`_, and custom tooling. The purpose is to 1071*03ce13f7SAndroid Build Coastguard Workerfind and fix cases where Subzero crashes or otherwise ungracefully fails on 1072*03ce13f7SAndroid Build Coastguard Workerunexpected inputs, and to do so automatically over a large range of unexpected 1073*03ce13f7SAndroid Build Coastguard Workerinputs. By fixing bugs that arise from fuzz testing, we reduce the possibility 1074*03ce13f7SAndroid Build Coastguard Workerof an attacker exploiting these bugs. 1075*03ce13f7SAndroid Build Coastguard Worker 1076*03ce13f7SAndroid Build Coastguard WorkerMost of the problems found so far are ones most appropriately handled in the 1077*03ce13f7SAndroid Build Coastguard Workerparser. However, there have been a couple that have identified problems in the 1078*03ce13f7SAndroid Build Coastguard Workerlowering, or otherwise inappropriately triggered assertion failures and fatal 1079*03ce13f7SAndroid Build Coastguard Workererrors. We continue to dig into this area. 1080*03ce13f7SAndroid Build Coastguard Worker 1081*03ce13f7SAndroid Build Coastguard WorkerFuture security work 1082*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^^ 1083*03ce13f7SAndroid Build Coastguard Worker 1084*03ce13f7SAndroid Build Coastguard WorkerSubzero is well-positioned to explore other future security enhancements, e.g.: 1085*03ce13f7SAndroid Build Coastguard Worker 1086*03ce13f7SAndroid Build Coastguard Worker- Tightening the Native Client sandbox. ABI changes, such as the previous work 1087*03ce13f7SAndroid Build Coastguard Worker on `hiding the sandbox base address 1088*03ce13f7SAndroid Build Coastguard Worker <https://docs.google.com/document/d/1eskaI4353XdsJQFJLRnZzb_YIESQx4gNRzf31dqXVG8>`_ 1089*03ce13f7SAndroid Build Coastguard Worker in x86-64, are easy to experiment with in Subzero. 1090*03ce13f7SAndroid Build Coastguard Worker 1091*03ce13f7SAndroid Build Coastguard Worker- Making the executable code section read-only. This would prevent a PNaCl 1092*03ce13f7SAndroid Build Coastguard Worker application from inspecting its own binary and trying to find ROP gadgets even 1093*03ce13f7SAndroid Build Coastguard Worker after code diversification has been performed. It may still be susceptible to 1094*03ce13f7SAndroid Build Coastguard Worker `blind ROP <http://www.scs.stanford.edu/brop/bittau-brop.pdf>`_ attacks, 1095*03ce13f7SAndroid Build Coastguard Worker security is still overall improved. 1096*03ce13f7SAndroid Build Coastguard Worker 1097*03ce13f7SAndroid Build Coastguard Worker- Instruction selection diversification. It may be possible to lower a given 1098*03ce13f7SAndroid Build Coastguard Worker instruction in several largely equivalent ways, which gives more opportunities 1099*03ce13f7SAndroid Build Coastguard Worker for code randomization. 1100*03ce13f7SAndroid Build Coastguard Worker 1101*03ce13f7SAndroid Build Coastguard WorkerChrome integration 1102*03ce13f7SAndroid Build Coastguard Worker------------------ 1103*03ce13f7SAndroid Build Coastguard Worker 1104*03ce13f7SAndroid Build Coastguard WorkerCurrently Subzero is available in Chrome for the x86-32 architecture, but under 1105*03ce13f7SAndroid Build Coastguard Workera flag. When the flag is enabled, Subzero is used when the `manifest file 1106*03ce13f7SAndroid Build Coastguard Worker<https://developer.chrome.com/native-client/reference/nacl-manifest-format>`_ 1107*03ce13f7SAndroid Build Coastguard Workerlinking to the ``.pexe`` file specifies the ``O0`` optimization level. 1108*03ce13f7SAndroid Build Coastguard Worker 1109*03ce13f7SAndroid Build Coastguard WorkerThe next step is to remove the flag, i.e. invoke Subzero as the only translator 1110*03ce13f7SAndroid Build Coastguard Workerfor ``O0``-specified manifest files. 1111*03ce13f7SAndroid Build Coastguard Worker 1112*03ce13f7SAndroid Build Coastguard WorkerUltimately, Subzero might produce code rivaling LLVM ``O2`` quality, in which 1113*03ce13f7SAndroid Build Coastguard Workercase Subzero could be used for all PNaCl translation. 1114*03ce13f7SAndroid Build Coastguard Worker 1115*03ce13f7SAndroid Build Coastguard WorkerCommand line options 1116*03ce13f7SAndroid Build Coastguard Worker-------------------- 1117*03ce13f7SAndroid Build Coastguard Worker 1118*03ce13f7SAndroid Build Coastguard WorkerSubzero has a number of command-line options for debugging and diagnostics. 1119*03ce13f7SAndroid Build Coastguard WorkerAmong the more interesting are the following. 1120*03ce13f7SAndroid Build Coastguard Worker 1121*03ce13f7SAndroid Build Coastguard Worker- Using the ``-verbose`` flag, Subzero will dump the CFG, or produce other 1122*03ce13f7SAndroid Build Coastguard Worker diagnostic output, with various levels of detail after each pass. Instruction 1123*03ce13f7SAndroid Build Coastguard Worker numbers can be printed or suppressed. Deleted instructions can be printed or 1124*03ce13f7SAndroid Build Coastguard Worker suppressed (they are retained in the instruction list, as discussed earlier, 1125*03ce13f7SAndroid Build Coastguard Worker because they can help explain how lower-level instructions originated). 1126*03ce13f7SAndroid Build Coastguard Worker Liveness information can be printed when available. Details of register 1127*03ce13f7SAndroid Build Coastguard Worker allocation can be printed as register allocator decisions are made. And more. 1128*03ce13f7SAndroid Build Coastguard Worker 1129*03ce13f7SAndroid Build Coastguard Worker- Running Subzero with any level of verbosity produces an enormous amount of 1130*03ce13f7SAndroid Build Coastguard Worker output. When debugging a single function, verbose output can be suppressed 1131*03ce13f7SAndroid Build Coastguard Worker except for a particular function. The ``-verbose-focus`` flag suppresses 1132*03ce13f7SAndroid Build Coastguard Worker verbose output except for the specified function. 1133*03ce13f7SAndroid Build Coastguard Worker 1134*03ce13f7SAndroid Build Coastguard Worker- Subzero has a ``-timing`` option that prints a breakdown of pass-level timing 1135*03ce13f7SAndroid Build Coastguard Worker at exit. Timing markers can be placed in the Subzero source code to demarcate 1136*03ce13f7SAndroid Build Coastguard Worker logical operations or passes of interest. Basic timing information plus 1137*03ce13f7SAndroid Build Coastguard Worker call-stack type timing information is printed at the end. 1138*03ce13f7SAndroid Build Coastguard Worker 1139*03ce13f7SAndroid Build Coastguard Worker- Along with ``-timing``, the user can instead get a report on the overall 1140*03ce13f7SAndroid Build Coastguard Worker translation time for each function, to help focus on timing outliers. Also, 1141*03ce13f7SAndroid Build Coastguard Worker ``-timing-focus`` limits the ``-timing`` reporting to a single function, 1142*03ce13f7SAndroid Build Coastguard Worker instead of aggregating pass timing across all functions. 1143*03ce13f7SAndroid Build Coastguard Worker 1144*03ce13f7SAndroid Build Coastguard Worker- The ``-szstats`` option reports various statistics on each function, such as 1145*03ce13f7SAndroid Build Coastguard Worker stack frame size, static instruction count, etc. It may be helpful to track 1146*03ce13f7SAndroid Build Coastguard Worker these stats over time as Subzero is improved, as an approximate measure of 1147*03ce13f7SAndroid Build Coastguard Worker code quality. 1148*03ce13f7SAndroid Build Coastguard Worker 1149*03ce13f7SAndroid Build Coastguard Worker- The flag ``-asm-verbose``, in conjunction with emitting textual assembly 1150*03ce13f7SAndroid Build Coastguard Worker output, annotate the assembly output with register-focused liveness 1151*03ce13f7SAndroid Build Coastguard Worker information. In particular, each basic block is annotated with which 1152*03ce13f7SAndroid Build Coastguard Worker registers are live-in and live-out, and each instruction is annotated with 1153*03ce13f7SAndroid Build Coastguard Worker which registers' and stack locations' live ranges end at that instruction. 1154*03ce13f7SAndroid Build Coastguard Worker This is really useful when studying the generated code to find opportunities 1155*03ce13f7SAndroid Build Coastguard Worker for code quality improvements. 1156*03ce13f7SAndroid Build Coastguard Worker 1157*03ce13f7SAndroid Build Coastguard WorkerTesting and debugging 1158*03ce13f7SAndroid Build Coastguard Worker--------------------- 1159*03ce13f7SAndroid Build Coastguard Worker 1160*03ce13f7SAndroid Build Coastguard WorkerLLVM lit tests 1161*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^ 1162*03ce13f7SAndroid Build Coastguard Worker 1163*03ce13f7SAndroid Build Coastguard WorkerFor basic testing, Subzero uses LLVM's `lit 1164*03ce13f7SAndroid Build Coastguard Worker<http://llvm.org/docs/CommandGuide/lit.html>`_ framework for running tests. We 1165*03ce13f7SAndroid Build Coastguard Workerhave a suite of hundreds of small functions where we test for particular 1166*03ce13f7SAndroid Build Coastguard Workerassembly code patterns across different target architectures. 1167*03ce13f7SAndroid Build Coastguard Worker 1168*03ce13f7SAndroid Build Coastguard WorkerCross tests 1169*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^ 1170*03ce13f7SAndroid Build Coastguard Worker 1171*03ce13f7SAndroid Build Coastguard WorkerUnfortunately, the lit tests don't do a great job of precisely testing the 1172*03ce13f7SAndroid Build Coastguard Workercorrectness of the output. Much better are the cross tests, which are execution 1173*03ce13f7SAndroid Build Coastguard Workertests that compare Subzero and ``pnacl-llc`` translated bitcode across a wide 1174*03ce13f7SAndroid Build Coastguard Workervariety of interesting inputs. Each cross test consists of a set of C, C++, 1175*03ce13f7SAndroid Build Coastguard Workerand/or low-level bitcode files. The C and C++ source files are compiled down to 1176*03ce13f7SAndroid Build Coastguard Workerbitcode. The bitcode files are translated by ``pnacl-llc`` and also by Subzero. 1177*03ce13f7SAndroid Build Coastguard WorkerSubzero mangles global symbol names with a special prefix to avoid duplicate 1178*03ce13f7SAndroid Build Coastguard Workersymbol errors. A driver program invokes both versions on a large set of 1179*03ce13f7SAndroid Build Coastguard Workerinteresting inputs, and reports when the Subzero and ``pnacl-llc`` results 1180*03ce13f7SAndroid Build Coastguard Workerdiffer. Cross tests turn out to be an excellent way of testing the basic 1181*03ce13f7SAndroid Build Coastguard Workerlowering patterns, but they are less useful for testing more global things like 1182*03ce13f7SAndroid Build Coastguard Workerliveness analysis and register allocation. 1183*03ce13f7SAndroid Build Coastguard Worker 1184*03ce13f7SAndroid Build Coastguard WorkerBisection debugging 1185*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^^^^^^^^ 1186*03ce13f7SAndroid Build Coastguard Worker 1187*03ce13f7SAndroid Build Coastguard WorkerSometimes with a new application, Subzero will end up producing incorrect code 1188*03ce13f7SAndroid Build Coastguard Workerthat either crashes at runtime or otherwise produces the wrong results. When 1189*03ce13f7SAndroid Build Coastguard Workerthis happens, we need to narrow it down to a single function (or small set of 1190*03ce13f7SAndroid Build Coastguard Workerfunctions) that yield incorrect behavior. For this, we have a bisection 1191*03ce13f7SAndroid Build Coastguard Workerdebugging framework. Here, we initially translate the entire application once 1192*03ce13f7SAndroid Build Coastguard Workerwith Subzero and once with ``pnacl-llc``. We then use ``objdump`` to 1193*03ce13f7SAndroid Build Coastguard Workerselectively weaken symbols based on a list provided on the command line. The 1194*03ce13f7SAndroid Build Coastguard Workertwo object files can then be linked together without link errors, with the 1195*03ce13f7SAndroid Build Coastguard Workerdesired version of each method "winning". Then the binary is tested, and 1196*03ce13f7SAndroid Build Coastguard Workerbisection proceeds based on whether the binary produces correct output. 1197*03ce13f7SAndroid Build Coastguard Worker 1198*03ce13f7SAndroid Build Coastguard WorkerWhen the bisection completes, we are left with a minimal set of 1199*03ce13f7SAndroid Build Coastguard WorkerSubzero-translated functions that cause the failure. Usually it is a single 1200*03ce13f7SAndroid Build Coastguard Workerfunction, though sometimes it might require a combination of several functions 1201*03ce13f7SAndroid Build Coastguard Workerto cause a failure; this may be due to an incorrect call ABI, for example. 1202*03ce13f7SAndroid Build Coastguard WorkerHowever, Murphy's Law implies that the single failing function is enormous and 1203*03ce13f7SAndroid Build Coastguard Workerimpractical to debug. In that case, we can restart the bisection, explicitly 1204*03ce13f7SAndroid Build Coastguard Workerignoring the enormous function, and try to find another candidate to debug. 1205*03ce13f7SAndroid Build Coastguard Worker(Future work is to automate this to find all minimal sets of functions, so that 1206*03ce13f7SAndroid Build Coastguard Workerdebugging can focus on the simplest example.) 1207*03ce13f7SAndroid Build Coastguard Worker 1208*03ce13f7SAndroid Build Coastguard WorkerFuzz testing 1209*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^^^ 1210*03ce13f7SAndroid Build Coastguard Worker 1211*03ce13f7SAndroid Build Coastguard WorkerAs described above, we try to find internal Subzero bugs using fuzz testing 1212*03ce13f7SAndroid Build Coastguard Workertechniques. 1213*03ce13f7SAndroid Build Coastguard Worker 1214*03ce13f7SAndroid Build Coastguard WorkerSanitizers 1215*03ce13f7SAndroid Build Coastguard Worker^^^^^^^^^^ 1216*03ce13f7SAndroid Build Coastguard Worker 1217*03ce13f7SAndroid Build Coastguard WorkerSubzero can be built with `AddressSanitizer 1218*03ce13f7SAndroid Build Coastguard Worker<http://clang.llvm.org/docs/AddressSanitizer.html>`_ (ASan) or `ThreadSanitizer 1219*03ce13f7SAndroid Build Coastguard Worker<http://clang.llvm.org/docs/ThreadSanitizer.html>`_ (TSan) support. This is 1220*03ce13f7SAndroid Build Coastguard Workerdone using something as simple as ``make ASAN=1`` or ``make TSAN=1``. So far, 1221*03ce13f7SAndroid Build Coastguard Workermultithreading has been simple enough that TSan hasn't found any bugs, but ASan 1222*03ce13f7SAndroid Build Coastguard Workerhas found at least one memory leak which was subsequently fixed. 1223*03ce13f7SAndroid Build Coastguard Worker`UndefinedBehaviorSanitizer 1224*03ce13f7SAndroid Build Coastguard Worker<http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation>`_ 1225*03ce13f7SAndroid Build Coastguard Worker(UBSan) support is in progress. `Control flow integrity sanitization 1226*03ce13f7SAndroid Build Coastguard Worker<http://clang.llvm.org/docs/ControlFlowIntegrity.html>`_ is also under 1227*03ce13f7SAndroid Build Coastguard Workerconsideration. 1228*03ce13f7SAndroid Build Coastguard Worker 1229*03ce13f7SAndroid Build Coastguard WorkerCurrent status 1230*03ce13f7SAndroid Build Coastguard Worker============== 1231*03ce13f7SAndroid Build Coastguard Worker 1232*03ce13f7SAndroid Build Coastguard WorkerTarget architectures 1233*03ce13f7SAndroid Build Coastguard Worker-------------------- 1234*03ce13f7SAndroid Build Coastguard Worker 1235*03ce13f7SAndroid Build Coastguard WorkerSubzero is currently more or less complete for the x86-32 target. It has been 1236*03ce13f7SAndroid Build Coastguard Workerrefactored and extended to handle x86-64 as well, and that is mostly complete at 1237*03ce13f7SAndroid Build Coastguard Workerthis point. 1238*03ce13f7SAndroid Build Coastguard Worker 1239*03ce13f7SAndroid Build Coastguard WorkerARM32 work is in progress. It currently lacks the testing level of x86, at 1240*03ce13f7SAndroid Build Coastguard Workerleast in part because Subzero's register allocator needs modifications to handle 1241*03ce13f7SAndroid Build Coastguard WorkerARM's aliasing of floating point and vector registers. Specifically, a 64-bit 1242*03ce13f7SAndroid Build Coastguard Workerregister is actually a gang of two consecutive and aligned 32-bit registers, and 1243*03ce13f7SAndroid Build Coastguard Workera 128-bit register is a gang of 4 consecutive and aligned 32-bit registers. 1244*03ce13f7SAndroid Build Coastguard WorkerARM64 work has not started; when it does, it will be native-only since the 1245*03ce13f7SAndroid Build Coastguard WorkerNative Client sandbox model, validator, and other tools have never been defined. 1246*03ce13f7SAndroid Build Coastguard Worker 1247*03ce13f7SAndroid Build Coastguard WorkerAn external contributor is adding MIPS support, in most part by following the 1248*03ce13f7SAndroid Build Coastguard WorkerARM work. 1249*03ce13f7SAndroid Build Coastguard Worker 1250*03ce13f7SAndroid Build Coastguard WorkerTranslator performance 1251*03ce13f7SAndroid Build Coastguard Worker---------------------- 1252*03ce13f7SAndroid Build Coastguard Worker 1253*03ce13f7SAndroid Build Coastguard WorkerSingle-threaded translation speed is currently about 5× the ``pnacl-llc`` 1254*03ce13f7SAndroid Build Coastguard Workertranslation speed. For a large ``.pexe`` file, the time breaks down as: 1255*03ce13f7SAndroid Build Coastguard Worker 1256*03ce13f7SAndroid Build Coastguard Worker- 11% for parsing and initial IR building 1257*03ce13f7SAndroid Build Coastguard Worker 1258*03ce13f7SAndroid Build Coastguard Worker- 4% for emitting to /dev/null 1259*03ce13f7SAndroid Build Coastguard Worker 1260*03ce13f7SAndroid Build Coastguard Worker- 27% for liveness analysis (two liveness passes plus live range construction) 1261*03ce13f7SAndroid Build Coastguard Worker 1262*03ce13f7SAndroid Build Coastguard Worker- 15% for linear-scan register allocation 1263*03ce13f7SAndroid Build Coastguard Worker 1264*03ce13f7SAndroid Build Coastguard Worker- 9% for basic lowering 1265*03ce13f7SAndroid Build Coastguard Worker 1266*03ce13f7SAndroid Build Coastguard Worker- 10% for advanced phi lowering 1267*03ce13f7SAndroid Build Coastguard Worker 1268*03ce13f7SAndroid Build Coastguard Worker- ~11% for other minor analysis 1269*03ce13f7SAndroid Build Coastguard Worker 1270*03ce13f7SAndroid Build Coastguard Worker- ~10% measurement overhead to acquire these numbers 1271*03ce13f7SAndroid Build Coastguard Worker 1272*03ce13f7SAndroid Build Coastguard WorkerSome improvements could undoubtedly be made, but it will be hard to increase the 1273*03ce13f7SAndroid Build Coastguard Workerspeed to 10× of ``pnacl-llc`` while keeping acceptable code quality. With 1274*03ce13f7SAndroid Build Coastguard Worker``-Om1`` (lack of) optimization, we do actually achieve roughly 10× 1275*03ce13f7SAndroid Build Coastguard Worker``pnacl-llc`` translation speed, but code quality drops by a factor of 3. 1276*03ce13f7SAndroid Build Coastguard Worker 1277*03ce13f7SAndroid Build Coastguard WorkerCode quality 1278*03ce13f7SAndroid Build Coastguard Worker------------ 1279*03ce13f7SAndroid Build Coastguard Worker 1280*03ce13f7SAndroid Build Coastguard WorkerMeasured across 16 components of spec2k, Subzero's code quality is uniformly 1281*03ce13f7SAndroid Build Coastguard Workerbetter than ``pnacl-llc`` ``-O0`` code quality, and in many cases solidly 1282*03ce13f7SAndroid Build Coastguard Workerbetween ``pnacl-llc`` ``-O0`` and ``-O2``. 1283*03ce13f7SAndroid Build Coastguard Worker 1284*03ce13f7SAndroid Build Coastguard WorkerTranslator size 1285*03ce13f7SAndroid Build Coastguard Worker--------------- 1286*03ce13f7SAndroid Build Coastguard Worker 1287*03ce13f7SAndroid Build Coastguard WorkerWhen built in MINIMAL mode, the x86-64 native translator size for the x86-32 1288*03ce13f7SAndroid Build Coastguard Workertarget is about 700 KB, not including the size of functions referenced in 1289*03ce13f7SAndroid Build Coastguard Workerdynamically-linked libraries. The sandboxed version of Subzero is a bit over 1 1290*03ce13f7SAndroid Build Coastguard WorkerMB, and it is statically linked and also includes nop padding for bundling as 1291*03ce13f7SAndroid Build Coastguard Workerwell as indirect branch masking. 1292*03ce13f7SAndroid Build Coastguard Worker 1293*03ce13f7SAndroid Build Coastguard WorkerTranslator memory footprint 1294*03ce13f7SAndroid Build Coastguard Worker--------------------------- 1295*03ce13f7SAndroid Build Coastguard Worker 1296*03ce13f7SAndroid Build Coastguard WorkerIt's hard to draw firm conclusions about memory footprint, since the footprint 1297*03ce13f7SAndroid Build Coastguard Workeris at least proportional to the input function size, and there is no real limit 1298*03ce13f7SAndroid Build Coastguard Workeron the size of functions in the ``.pexe`` file. 1299*03ce13f7SAndroid Build Coastguard Worker 1300*03ce13f7SAndroid Build Coastguard WorkerThat said, we looked at the memory footprint over time as Subzero translated 1301*03ce13f7SAndroid Build Coastguard Worker``pnacl-llc.pexe``, which is the largest ``.pexe`` file (7.2 MB) at our 1302*03ce13f7SAndroid Build Coastguard Workerdisposal. One of LLVM's libraries that Subzero uses can report the current 1303*03ce13f7SAndroid Build Coastguard Workermalloc heap usage. With single-threaded translation, Subzero tends to hover 1304*03ce13f7SAndroid Build Coastguard Workeraround 15 MB of memory usage. There are a couple of monstrous functions where 1305*03ce13f7SAndroid Build Coastguard WorkerSubzero grows to around 100 MB, but then it drops back down after those 1306*03ce13f7SAndroid Build Coastguard Workerfunctions finish translating. In contrast, ``pnacl-llc`` grows larger and 1307*03ce13f7SAndroid Build Coastguard Workerlarger throughout translation, reaching several hundred MB by the time it 1308*03ce13f7SAndroid Build Coastguard Workercompletes. 1309*03ce13f7SAndroid Build Coastguard Worker 1310*03ce13f7SAndroid Build Coastguard WorkerIt's a bit more interesting when we enable multithreaded translation. When 1311*03ce13f7SAndroid Build Coastguard Workerthere are N translation threads, Subzero implements a policy that limits the 1312*03ce13f7SAndroid Build Coastguard Workersize of the translation queue to N entries -- if it is "full" when the parser 1313*03ce13f7SAndroid Build Coastguard Workertries to add a new CFG, the parser blocks until one of the translation threads 1314*03ce13f7SAndroid Build Coastguard Workerremoves a CFG. This means the number of in-memory CFGs can (and generally does) 1315*03ce13f7SAndroid Build Coastguard Workerreach 2*N+1, and so the memory footprint rises in proportion to the number of 1316*03ce13f7SAndroid Build Coastguard Workerthreads. Adding to the pressure is the observation that the monstrous functions 1317*03ce13f7SAndroid Build Coastguard Workeralso take proportionally longer time to translate, so there's a good chance many 1318*03ce13f7SAndroid Build Coastguard Workerof the monstrous functions will be active at the same time with multithreaded 1319*03ce13f7SAndroid Build Coastguard Workertranslation. As a result, for N=32, Subzero's memory footprint peaks at about 1320*03ce13f7SAndroid Build Coastguard Worker260 MB, but drops back down as the large functions finish translating. 1321*03ce13f7SAndroid Build Coastguard Worker 1322*03ce13f7SAndroid Build Coastguard WorkerIf this peak memory size becomes a problem, it might be possible for the parser 1323*03ce13f7SAndroid Build Coastguard Workerto resequence the functions to try to spread out the larger functions, or to 1324*03ce13f7SAndroid Build Coastguard Workerthrottle the translation queue to prevent too many in-flight large functions. 1325*03ce13f7SAndroid Build Coastguard WorkerIt may also be possible to throttle based on memory pressure signaling from 1326*03ce13f7SAndroid Build Coastguard WorkerChrome. 1327*03ce13f7SAndroid Build Coastguard Worker 1328*03ce13f7SAndroid Build Coastguard WorkerTranslator scalability 1329*03ce13f7SAndroid Build Coastguard Worker---------------------- 1330*03ce13f7SAndroid Build Coastguard Worker 1331*03ce13f7SAndroid Build Coastguard WorkerCurrently scalability is "not very good". Multiple translation threads lead to 1332*03ce13f7SAndroid Build Coastguard Workerfaster translation, but not to the degree desired. We haven't dug in to 1333*03ce13f7SAndroid Build Coastguard Workerinvestigate yet. 1334*03ce13f7SAndroid Build Coastguard Worker 1335*03ce13f7SAndroid Build Coastguard WorkerThere are a few areas to investigate. First, there may be contention on the 1336*03ce13f7SAndroid Build Coastguard Workerconstant pool, which all threads access, and which requires locked access even 1337*03ce13f7SAndroid Build Coastguard Workerfor reading. This could be mitigated by keeping a CFG-local cache of the most 1338*03ce13f7SAndroid Build Coastguard Workercommon constants. 1339*03ce13f7SAndroid Build Coastguard Worker 1340*03ce13f7SAndroid Build Coastguard WorkerSecond, there may be contention on memory allocation. While almost all CFG 1341*03ce13f7SAndroid Build Coastguard Workerobjects are allocated from the CFG-local allocator, some passes use temporary 1342*03ce13f7SAndroid Build Coastguard WorkerSTL containers that use the default allocator, which may require global locking. 1343*03ce13f7SAndroid Build Coastguard WorkerThis could be mitigated by switching these to the CFG-local allocator. 1344*03ce13f7SAndroid Build Coastguard Worker 1345*03ce13f7SAndroid Build Coastguard WorkerThird, multithreading may make the default allocator strategy more expensive. 1346*03ce13f7SAndroid Build Coastguard WorkerIn a single-threaded environment, a pass will allocate its containers, run the 1347*03ce13f7SAndroid Build Coastguard Workerpass, and deallocate the containers. This results in stack-like allocation 1348*03ce13f7SAndroid Build Coastguard Workerbehavior and makes the heap free list easier to manage, with less heap 1349*03ce13f7SAndroid Build Coastguard Workerfragmentation. But when multithreading is added, the allocations and 1350*03ce13f7SAndroid Build Coastguard Workerdeallocations become much less stack-like, making allocation and deallocation 1351*03ce13f7SAndroid Build Coastguard Workeroperations individually more expensive. Again, this could be mitigated by 1352*03ce13f7SAndroid Build Coastguard Workerswitching these to the CFG-local allocator. 1353