========================================
Precompiled Header and Modules Internals
========================================

.. contents::
   :local:

This document describes the design and implementation of Clang's precompiled
headers (PCH) and modules. If you are interested in the end-user view, please
see the :ref:`User's Manual <usersmanual-precompiled-headers>`.

Using Precompiled Headers with ``clang``
----------------------------------------

The Clang compiler frontend, ``clang -cc1``, supports two command line options
for generating and using PCH files.

To generate PCH files using ``clang -cc1``, use the option ``-emit-pch``:

.. code-block:: bash

  $ clang -cc1 test.h -emit-pch -o test.h.pch

This option is transparently used by ``clang`` when generating PCH files. The
resulting PCH file contains the serialized form of the compiler's internal
representation after it has completed parsing and semantic analysis. The PCH
file can then be used as a prefix header with the ``-include-pch`` option:

.. code-block:: bash

  $ clang -cc1 -include-pch test.h.pch test.c -o test.s

Design Philosophy
-----------------

Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we *precompile* that bundle of headers into a single precompiled
header (PCH file). Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.

A precompiled header implementation improves performance when:

* Loading the PCH file is significantly faster than re-parsing the bundle of
  headers stored within the PCH file. Thus, a precompiled header design
  attempts to minimize the cost of reading the PCH file. Ideally, this cost
  should not vary with the size of the precompiled header file.

* The cost of generating the PCH file initially is not so large that it
  counters the per-source-file performance improvement due to eliminating the
  need to parse the bundled headers in the first place. This is particularly
  important on multi-core systems, because PCH file generation serializes the
  build when all compilations require the PCH file to be up-to-date.

Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and to load those AST
files. From an implementation standpoint, modules are a generalization of
precompiled headers, lifting a number of restrictions placed on precompiled
headers.
In particular, a translation unit can use only one precompiled header, and it
must be included at the beginning of the translation unit. The extensions to
the AST file format required for modules are discussed in the section on
:ref:`modules <pchinternals-modules>`.

Clang's AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file. The AST file itself contains a serialized representation of Clang's
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as `LLVM's bitcode file format
<http://llvm.org/docs/BitCodeFormat.html>`_.

Clang's AST files are loaded "lazily" from disk. When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored. The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times. The actual header data
in the AST file (macros, functions, variables, types, etc.) is loaded only
when it is referenced from the user's code, at which point only that entity
(and the entities it depends on) is deserialized from the AST file. With this
approach, the cost of using an AST file for a translation unit is proportional
to the amount of code actually used from the AST file, rather than to the size
of the AST file itself.
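
The following is a minimal C++ sketch of this lazy scheme, not Clang's
implementation. It assumes a toy "file" that is simply a vector of records,
each carrying a name and the IDs of the records it refers to; ``Record``,
``Entity``, and ``LazyASTFile`` are hypothetical names. Constructing the reader
deserializes nothing, and ``get()`` deserializes an entity (and, transitively,
the entities it refers to) the first time it is requested, caching the result.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <memory>
  #include <string>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for one serialized record in an AST file.
  struct Record {
    std::string Name;            // payload describing the entity
    std::vector<uint32_t> Deps;  // IDs of the records this one refers to
  };

  // In-memory node produced by deserializing a Record.
  struct Entity {
    std::string Name;
    std::vector<const Entity *> Refs;
  };

  class LazyASTFile {
  public:
    // "Opening" the file only keeps a reference to the on-disk data (think of
    // a memory-mapped file); no record is deserialized here.
    explicit LazyASTFile(const std::vector<Record> &Data) : Disk(Data) {}

    // Returns the entity with the given ID, deserializing it and the entities
    // it refers to on first use.
    const Entity *get(uint32_t ID) {
      assert(ID < Disk.size() && "unknown entity ID");
      auto It = Loaded.find(ID);
      if (It != Loaded.end())
        return It->second.get();

      const Record &R = Disk[ID];
      auto E = std::make_unique<Entity>();
      E->Name = R.Name;
      Entity *Raw = E.get();
      // Cache before resolving references so that cyclic references terminate.
      Loaded.emplace(ID, std::move(E));
      ++NumDeserialized;
      for (uint32_t Dep : R.Deps)
        Raw->Refs.push_back(get(Dep));
      return Raw;
    }

    std::size_t numDeserialized() const { return NumDeserialized; }
    std::size_t numRecords() const { return Disk.size(); }

  private:
    const std::vector<Record> &Disk;  // must outlive this object
    std::unordered_map<uint32_t, std::unique_ptr<Entity>> Loaded;
    std::size_t NumDeserialized = 0;
  };

With four records (``int``, ``Point`` referring to ``int``, ``printf``, and
``origin`` referring to ``Point``), requesting ``origin`` deserializes three
entities and leaves ``printf`` untouched, so the work tracks what is used
rather than the size of the file.
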

When given the ``-print-stats`` option, Clang produces statistics
describing how much of the AST file was actually loaded from disk. For a
simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:

.. code-block:: none

  *** AST File Statistics:
    895/39981 source location entries read (2.238563%)
    19/15315 types read (0.124061%)
    20/82685 declarations read (0.024188%)
    154/58070 identifiers read (0.265197%)
    0/7260 selectors read (0.000000%)
    0/30842 statements read (0.000000%)
    4/8400 macros read (0.047619%)
    1/4995 lexical declcontexts read (0.020020%)
    0/4413 visible declcontexts read (0.000000%)
    0/7230 method pool entries read (0.000000%)
    0 method pool misses

For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header. These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.

Precompiled headers can be chained. When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file.
For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file. The
mechanisms behind chained precompiled headers are discussed in a :ref:`later
section <pchinternals-chained>`.

AST File Contents
-----------------

An AST file produced by Clang is an object file container with a ``clangast``
(COFF) or ``__clangast`` (ELF and Mach-O) section containing the serialized AST.
Other target-specific sections in the object file container are used to hold
debug information for the data types defined in the AST. Tools built on top of
libclang that do not need debug information may also produce raw AST files that
only contain the serialized AST.

The ``clangast`` section is organized into several different blocks, each of
which contains the serialized representation of a part of Clang's internal
representation. Each of the blocks corresponds to either a block or a record
within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
The contents of each of these logical blocks are described below.

.. image:: PCHLayout.png

The ``llvm-objdump`` utility provides a ``-raw-clang-ast`` option to extract the
binary contents of the AST section from an object file container.

The `llvm-bcanalyzer <http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_
utility can be used to examine the actual structure of the bitstream for the AST
section. This information can be used both to help understand the structure of
the AST section and to isolate areas where the AST representation can still be
optimized, e.g., through the introduction of abbreviations.

Metadata Block
^^^^^^^^^^^^^^

The metadata block contains several records that provide information about how
the AST file was built. This metadata is primarily used to validate the use of
an AST file. For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target. The metadata block
contains information about:

Language options
  Describes the particular language dialect used to compile the AST file,
  including major options (e.g., Objective-C support) and more minor options
  (e.g., support for "``//``" comments). The contents of this record correspond
  to the ``LangOptions`` class.

Target architecture
  The target triple that describes the architecture, platform, and ABI for
  which the AST file was generated, e.g., ``i386-apple-darwin9``.

AST version
  The major and minor version numbers of the AST file format.
  Changes in the minor version number should not affect backward
  compatibility, while changes in the major version number imply that a newer
  compiler cannot read an older precompiled header (and vice-versa).

Original file name
  The full path of the header that was used to generate the AST file.

Predefines buffer
  Although not explicitly stored as part of the metadata, the predefines buffer
  is used in the validation of the AST file. The predefines buffer itself
  contains code generated by the compiler to initialize the preprocessor state
  according to the current target, platform, and command-line options. For
  example, the predefines buffer will contain "``#define __STDC__ 1``" when we
  are compiling C without Microsoft extensions. The predefines buffer itself
  is stored within the :ref:`pchinternals-sourcemgr`, but its contents are
  verified along with the rest of the metadata.

A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on. Each of those files will be
loaded along with this AST file.

For chained precompiled headers, the language options, target architecture and
predefines buffer data are taken from the end of the chain, since they have to
match anyway.

.. _pchinternals-sourcemgr:

Source Manager Block
^^^^^^^^^^^^^^^^^^^^

The source manager block contains the serialized representation of Clang's
:ref:`SourceManager <SourceManager>` class, which handles the mapping from
source locations (as represented in Clang's abstract syntax tree) into actual
column/line positions within a source file or macro instantiation. The AST
file's representation of the source manager also includes information about all
of the headers that were (transitively) included when building the AST file.

The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations to which a source location
can refer. Each of these is referenced by a numeric "file ID", which is a
unique number (allocated starting at 1) stored in the source location. Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.

The source manager block also contains information about all of the headers
that were included when building the AST file.
This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header depend on a macro like
``LLVM_CLANG_SOURCEMANAGER_H``).

.. _pchinternals-preprocessor:

Preprocessor Block
^^^^^^^^^^^^^^^^^^

The preprocessor block contains the serialized representation of the
preprocessor. Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro. The macro definitions are only read
from the AST file when the name of the macro first occurs in the program. This
lazy loading of macro definitions is triggered by lookups into the
:ref:`identifier table <pchinternals-ident-table>`.

.. _pchinternals-types:

Types Block
^^^^^^^^^^^

The types block contains the serialized representation of all of the types
referenced in the translation unit. Each Clang type node (``PointerType``,
``FunctionProtoType``, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.

Each type has a type ID, an integer that uniquely identifies that type.
Type ID 0 represents the NULL type; type IDs less than
``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.),
while other "user-defined" type IDs are assigned consecutively from
``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered. The AST file has
an associated mapping from each user-defined type ID to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types. When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits. The lower three bits are used to represent the ``const``, ``volatile``,
and ``restrict`` qualifiers, as in Clang's :ref:`QualType <QualType>` class.

.. _pchinternals-decls:

Declarations Block
^^^^^^^^^^^^^^^^^^

The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit. Each Clang declaration node
(``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the
AST file. When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding ``Decl`` node. As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file. In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the AST file where that declaration is described.
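
The type ID scheme from the Types Block, together with the ID-to-offset table
that declarations also use, can be sketched in a few lines of C++. This is an
illustration under stated assumptions, not Clang's code:
``kNumPredefTypeIDs`` is a toy stand-in for ``NUM_PREDEF_TYPE_IDS``, the bit
assigned to each qualifier is assumed, and ``TypeOffsetIndex``,
``encodeTypeRef``, ``typeIDOf``, and ``qualsOf`` are hypothetical names.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Toy stand-in for NUM_PREDEF_TYPE_IDS; the real constant is defined by Clang.
  constexpr uint32_t kNumPredefTypeIDs = 8;

  // Assumed bit assignment for the three qualifiers kept in the low bits.
  enum : uint32_t { kConst = 0x1, kRestrict = 0x2, kVolatile = 0x4 };
  constexpr uint32_t kQualBits = 3;
  constexpr uint32_t kQualMask = (1u << kQualBits) - 1;

  // A type reference is the type ID shifted left by 3, with the qualifiers
  // stored in the low three bits.
  constexpr uint32_t encodeTypeRef(uint32_t TypeID, uint32_t Quals) {
    return (TypeID << kQualBits) | (Quals & kQualMask);
  }
  constexpr uint32_t typeIDOf(uint32_t Ref) { return Ref >> kQualBits; }
  constexpr uint32_t qualsOf(uint32_t Ref) { return Ref & kQualMask; }

  // Maps user-defined type IDs to the offset of their record in the types block.
  class TypeOffsetIndex {
  public:
    static bool isPredefined(uint32_t ID) { return ID < kNumPredefTypeIDs; }

    // Records a newly written type and returns its ID; user-defined IDs are
    // assigned consecutively starting at kNumPredefTypeIDs.
    uint32_t add(uint64_t Offset) {
      Offsets.push_back(Offset);
      return kNumPredefTypeIDs + static_cast<uint32_t>(Offsets.size() - 1);
    }

    // Finds where the record for a user-defined type starts, so a reader can
    // deserialize just that one type.
    uint64_t offsetOf(uint32_t ID) const {
      assert(!isPredefined(ID) && "predefined types have no record");
      return Offsets.at(ID - kNumPredefTypeIDs);
    }

  private:
    std::vector<uint64_t> Offsets;
  };

Keeping the qualifiers in the reference rather than in a separate type record
means that ``const Point`` and ``Point`` can share one serialized record in
this sketch.
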

Declarations in Clang's abstract syntax trees are stored hierarchically. At
the top of the hierarchy is the translation unit (``TranslationUnitDecl``),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node. Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on. Within Clang, each declaration is stored within a :ref:`declaration
context <DeclContext>`, as represented by the ``DeclContext`` class.
Declaration contexts provide the mechanism to perform name lookup within a
given declaration (e.g., find the member named ``x`` in a structure) and
iterate over the declarations stored within a context (e.g., iterate over all
of the fields of a structure for structure layout).

In Clang's AST file format, deserializing a declaration that is a
``DeclContext`` is a separate operation from deserializing all of the
declarations stored within that declaration context. Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit. When required, the declarations
stored within a declaration context will be deserialized.
There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:

* When the front end performs name lookup to find a name ``x`` within a given
  declaration context (for example, during semantic analysis of the expression
  ``p->x``, where ``p``'s type is defined in the precompiled header), Clang
  refers to an on-disk hash table that maps from the names within that
  declaration context to the declaration IDs that represent each visible
  declaration with that name. The actual declarations will then be
  deserialized to provide the results of name lookup.
* When the front end performs iteration over all of the declarations within a
  declaration context, all of those declarations are immediately
  deserialized. For large declaration contexts (e.g., the translation unit),
  this operation is expensive; however, large declaration contexts are not
  traversed in normal compilation, since such a traversal is unnecessary.
  By contrast, it is common for the code generator and semantic analysis to
  traverse declaration contexts for structs, classes, unions, and
  enumerations, but those contexts contain relatively few declarations in
  the common case.
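
The two access paths can be modeled with a small C++ sketch. It is not Clang's
``DeclContext``: ``LazyDeclContext``, ``Decl``, and the loader callback are
hypothetical, and an in-memory ``std::unordered_map`` stands in for the
on-disk hash table. Name lookup deserializes only the declarations registered
under the requested name, while iteration has to deserialize every declaration
in the context.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <functional>
  #include <map>
  #include <string>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  using DeclID = uint32_t;

  // Hypothetical in-memory form of a deserialized declaration.
  struct Decl {
    DeclID ID;
    std::string Name;
  };

  // Hypothetical declaration context whose members live in an AST file and
  // are deserialized on demand.
  class LazyDeclContext {
  public:
    using NameTable = std::unordered_map<std::string, std::vector<DeclID>>;
    using Loader = std::function<Decl(DeclID)>;

    LazyDeclContext(NameTable Names, std::vector<DeclID> AllIDs, Loader L)
        : Lookup(std::move(Names)), Lexical(std::move(AllIDs)),
          Load(std::move(L)) {}

    // Name lookup: consult the name table and deserialize only the
    // declarations registered under Name.
    std::vector<const Decl *> lookup(const std::string &Name) {
      std::vector<const Decl *> Result;
      auto It = Lookup.find(Name);
      if (It == Lookup.end())
        return Result;
      for (DeclID ID : It->second)
        Result.push_back(&get(ID));
      return Result;
    }

    // Iteration: every declaration in the context has to be deserialized.
    std::vector<const Decl *> decls() {
      std::vector<const Decl *> Result;
      for (DeclID ID : Lexical)
        Result.push_back(&get(ID));
      return Result;
    }

    std::size_t numLoaded() const { return Cache.size(); }

  private:
    // Deserializes a declaration once and caches it; std::map keeps
    // references to existing entries valid as new ones are inserted.
    const Decl &get(DeclID ID) {
      auto It = Cache.find(ID);
      if (It == Cache.end())
        It = Cache.emplace(ID, Load(ID)).first;
      return It->second;
    }

    NameTable Lookup;
    std::vector<DeclID> Lexical;
    Loader Load;
    std::map<DeclID, Decl> Cache;
  };

For a context with declarations ``x`` (IDs 1 and 3) and ``y`` (ID 2), looking
up ``x`` loads two declarations, and a later call to ``decls()`` loads the
remaining one, mirroring why iterating over a very large context is avoided.
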

Statements and Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^

Statements and expressions are stored in the AST file in both the :ref:`types
<pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks,
because every statement or expression will be associated with either a type or
declaration. The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression. For example, the statement representing the body of a function
will be stored directly following the declaration of the function.

As with types and declarations, each statement and expression kind in Clang's
abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression. Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size). Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those subexpressions, using a
form of `Reverse Polish Notation
<http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_.
For example, an
expression ``3 - 4 + 5`` would be represented as follows:

+-----------------------+
| ``IntegerLiteral(5)`` |
+-----------------------+
| ``IntegerLiteral(4)`` |
+-----------------------+
| ``IntegerLiteral(3)`` |
+-----------------------+
| ``BinaryOperator(-)`` |
+-----------------------+
| ``BinaryOperator(+)`` |
+-----------------------+
| ``STOP``              |
+-----------------------+

When reading this representation, Clang reads each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression on to a stack. When a record contains *N* subexpressions
(``BinaryOperator`` has two of them), those expressions are popped from the
top of the stack. The special STOP code indicates that we have reached the end
of a serialized expression or statement; other expression or statement records
may follow, but they are part of a different expression.

.. _pchinternals-ident-table:

Identifier Table Block
^^^^^^^^^^^^^^^^^^^^^^

The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier's information (i.e., the ``IdentifierInfo`` structure). The
serialized representation contains:

* The actual identifier string.
371*67e74705SXin Li* Flags that describe whether this identifier is the name of a built-in, a 372*67e74705SXin Li poisoned identifier, an extension token, or a macro. 373*67e74705SXin Li* If the identifier names a macro, the offset of the macro definition within 374*67e74705SXin Li the :ref:`pchinternals-preprocessor`. 375*67e74705SXin Li* If the identifier names one or more declarations visible from translation 376*67e74705SXin Li unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these 377*67e74705SXin Li declarations. 378*67e74705SXin Li 379*67e74705SXin LiWhen an AST file is loaded, the AST file reader mechanism introduces itself 380*67e74705SXin Liinto the identifier table as an external lookup source. Thus, when the user 381*67e74705SXin Liprogram refers to an identifier that has not yet been seen, Clang will perform 382*67e74705SXin Lia lookup into the identifier table. If an identifier is found, its contents 383*67e74705SXin Li(macro definitions, flags, top-level declarations, etc.) will be deserialized, 384*67e74705SXin Liat which point the corresponding ``IdentifierInfo`` structure will have the 385*67e74705SXin Lisame contents it would have after parsing the headers in the AST file. 386*67e74705SXin Li 387*67e74705SXin LiWithin the AST file, the identifiers used to name declarations are represented 388*67e74705SXin Liwith an integral value. A separate table provides a mapping from this integral 389*67e74705SXin Livalue (the identifier ID) to the location within the on-disk hash table where 390*67e74705SXin Lithat identifier is stored. This mapping is used when deserializing the name of 391*67e74705SXin Lia declaration, the identifier of a token, or any other construct in the AST 392*67e74705SXin Lifile that refers to a name. 393*67e74705SXin Li 394*67e74705SXin Li.. 
_pchinternals-method-pool: 395*67e74705SXin Li 396*67e74705SXin LiMethod Pool Block 397*67e74705SXin Li^^^^^^^^^^^^^^^^^ 398*67e74705SXin Li 399*67e74705SXin LiThe method pool block is represented as an on-disk hash table that serves two 400*67e74705SXin Lipurposes: it provides a mapping from the names of Objective-C selectors to the 401*67e74705SXin Liset of Objective-C instance and class methods that have that particular 402*67e74705SXin Liselector (which is required for semantic analysis in Objective-C) and also 403*67e74705SXin Listores all of the selectors used by entities within the AST file. The design 404*67e74705SXin Liof the method pool is similar to that of the :ref:`identifier table 405*67e74705SXin Li<pchinternals-ident-table>`: the first time a particular selector is formed 406*67e74705SXin Liduring the compilation of the program, Clang will search in the on-disk hash 407*67e74705SXin Litable of selectors; if found, Clang will read the Objective-C methods 408*67e74705SXin Liassociated with that selector into the appropriate front-end data structure 409*67e74705SXin Li(``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and 410*67e74705SXin Liclass methods, respectively). 411*67e74705SXin Li 412*67e74705SXin LiAs with identifiers, selectors are represented by numeric values within the AST 413*67e74705SXin Lifile. A separate index maps these numeric selector values to the offset of the 414*67e74705SXin Liselector within the on-disk hash table, and will be used when de-serializing an 415*67e74705SXin LiObjective-C method declaration (or other Objective-C construct) that refers to 416*67e74705SXin Lithe selector. 417*67e74705SXin Li 418*67e74705SXin LiAST Reader Integration Points 419*67e74705SXin Li----------------------------- 420*67e74705SXin Li 421*67e74705SXin LiThe "lazy" deserialization behavior of AST files requires their integration 422*67e74705SXin Liinto several completely different submodules of Clang. 
For example, lazily
deserializing the declarations during name lookup requires that the name-lookup
routines be able to query the AST file to find entities stored there.

For each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules. The ``ASTReader`` class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang's data structures. ``ASTReader`` implements the
following abstract classes:

``ExternalSLocEntrySource``
  This abstract interface is associated with the ``SourceManager`` class, and
  is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to
  load the details of a file, buffer, or macro instantiation.

``IdentifierInfoLookup``
  This abstract interface is associated with the ``IdentifierTable`` class, and
  is used whenever the program source refers to an identifier that has not yet
  been seen. In this case, the AST reader searches for this identifier within
  its :ref:`identifier table <pchinternals-ident-table>` to load any top-level
  declarations or macros associated with that identifier.

``ExternalASTSource``
  This abstract interface is associated with the ``ASTContext`` class, and is
  used whenever the abstract syntax tree nodes need to be loaded from the AST
  file.
  It provides the ability to deserialize declarations and types
  identified by their numeric values, read the bodies of functions when
  required, and read the declarations stored within a declaration context
  (either for iteration or for name lookup).

``ExternalSemaSource``
  This abstract interface is associated with the ``Sema`` class, and is used
  whenever semantic analysis needs to read information from the :ref:`global
  method pool <pchinternals-method-pool>`.

.. _pchinternals-chained:

Chained precompiled headers
---------------------------

Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user. To minimize the amount
of reparsing required after a change to the file, a form of precompiled header,
called a precompiled *preamble*, is automatically generated by parsing
all of the headers in the source file, up to and including the last
``#include``. When only the source file changes (and none of the headers it
depends on), reparsing of that source file can use the precompiled preamble and
start parsing after the ``#include``\ s, so parsing time is proportional to the
size of the source file (rather than all of its includes).
However, the compilation of that translation unit may already use a
precompiled header: in this case, Clang will create the precompiled preamble as
a chained precompiled header that refers to the original precompiled header.
This drastically reduces the time needed to serialize the precompiled preamble
for use in reparsing.

Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies. A
translation unit then includes the precompiled header that starts the chain
(i.e., the one that no other precompiled header depends on). This linearity of
dependencies is important for the semantic model of chained precompiled
headers, because the most recent precompiled header can provide information
that overrides the information provided by the precompiled headers it depends
on, just like a header file ``B.h`` that includes another header ``A.h`` can
modify the state produced by parsing ``A.h``, e.g., by ``#undef``'ing a macro
defined in ``A.h``.

There are several ways in which chained precompiled headers generalize the AST
file model:

Numbering of IDs
  Many different kinds of entities (identifiers, declarations, types, etc.)
  have ID numbers that start at 1 or some other predefined constant and grow
  upward. Each precompiled header records the maximum ID number it has
  assigned in each category.
  Then, when a new precompiled header is generated that depends on (chains to)
  another precompiled header, it starts counting at the next available ID
  number. This way, one can determine, given an ID number, which AST file
  actually contains the entity.

Name lookup
  When writing a chained precompiled header, Clang attempts to write only
  information that has changed from the precompiled header on which it is
  based. This changes the lookup algorithm for the various tables, such as the
  :ref:`identifier table <pchinternals-ident-table>`: the search starts at the
  most recent precompiled header. If no entry is found, lookup then proceeds
  to the identifier table in the precompiled header it depends on, and so on.
  Once a lookup succeeds, that result is considered definitive, overriding any
  results from earlier precompiled headers.

Update records
  There are various ways in which a later precompiled header can modify the
  entities described in an earlier precompiled header. For example, later
  precompiled headers can add entries into the various name-lookup tables for
  the translation unit or namespaces, or add new categories to an Objective-C
  class. Each of these updates is captured in an "update record" that is
  stored in the chained precompiled header file and will be loaded along with
  the original entity.

.. _pchinternals-modules:

Modules
-------

Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files. All of the same techniques used to make chained
precompiled headers work (ID numbering, name lookup, update records) are
shared with modules. However, the DAG nature of modules introduces a number of
additional complications to the model:

Numbering of IDs
  The simple, linear numbering scheme used in chained precompiled headers falls
  apart with the module DAG, because different modules may end up with
  different numbering schemes for entities they imported from common shared
  modules. To account for this, each module file provides information about
  which modules it depends on and which ID numbers it assigned to the entities
  in those modules, as well as which ID numbers it took for its own new
  entities. The AST reader then maps these "local" ID numbers into a "global"
  ID number space for the current translation unit, providing a one-to-one
  mapping between entities (in whatever AST file they inhabit) and global ID
  numbers. If that translation unit is then serialized into an AST file, this
  mapping will be stored for use when the AST file is imported.

Declaration merging
  It is possible for a given entity (from the language's perspective) to be
  declared multiple times in different places.
  For example, two different headers can each declare ``printf`` or
  forward-declare ``struct stat``. If each of those headers is included in a
  module, and some third party imports both of those modules, there is a
  potentially serious problem: name lookup for ``printf`` or ``struct stat``
  will find both declarations, but the AST nodes are unrelated. This would
  result in a compilation error due to ambiguous name lookup. Therefore, the
  AST reader performs declaration merging according to the appropriate
  language semantics, ensuring that the two disjoint declarations are merged
  into a single redeclaration chain (with a common canonical declaration), so
  that it is as if one of the headers had been included before the other.

Name visibility
  Modules allow certain names that occur during module creation to be "hidden",
  so that they are not part of the public interface of the module and are not
  visible to its clients. The AST reader maintains a "visible" bit on various
  AST nodes (declarations, macros, etc.) to indicate whether that particular
  AST node is currently visible; the various name lookup mechanisms in Clang
  inspect the visible bit to determine whether that entity, which is still in
  the AST (because other, visible AST nodes may depend on it), can actually be
  found by name lookup.
  When a new (sub)module is imported, it may make existing, non-visible,
  already-deserialized AST nodes visible; it is the responsibility of the AST
  reader to find and update these AST nodes when it is notified of the import.
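
The local-to-global remapping described under *Numbering of IDs* can be
illustrated with a small sketch. It is a simplified, hypothetical model rather
than Clang's actual ``ASTReader`` code: ``ModuleFileInfo`` and
``GlobalIDMapper`` are invented names, and the sketch assumes that each module
file numbers the entities of every module it depends on as one contiguous local
range, and that modules are loaded in dependency order. Each loaded module
receives a fresh global range for its own entities, and translating a local ID
amounts to finding the local range that contains it and adding the offset to
that range's global base.

.. code-block:: c++

   #include <cstdint>
   #include <map>
   #include <stdexcept>
   #include <string>

   // Hypothetical, simplified description of one module file's ID layout.
   struct ModuleFileInfo {
     std::string Name;
     // Local IDs [OwnLocalBase, OwnLocalBase + NumOwn) denote the entities
     // that this module defines itself.
     uint32_t OwnLocalBase;
     uint32_t NumOwn;
     // For every module this one depends on (directly or indirectly): the
     // first local ID this file used for that module's own entities.
     std::map<std::string, uint32_t> ImportLocalBase;
   };

   // Maps module-local IDs into one global ID space for the current
   // translation unit (illustrative only).
   class GlobalIDMapper {
   public:
     // Dependencies of M must already be loaded (topological order of the
     // module DAG); otherwise std::map::at throws std::out_of_range.
     void load(const ModuleFileInfo &M) {
       // Give this module's own entities a fresh, contiguous global range.
       OwnGlobalBase[M.Name] = NextGlobalID;
       OwnCount[M.Name] = M.NumOwn;
       NextGlobalID += M.NumOwn;

       RangeMap &R = Remap[M.Name];
       R[M.OwnLocalBase] = Range{OwnGlobalBase[M.Name], M.NumOwn};
       for (const auto &Import : M.ImportLocalBase)
         R[Import.second] =
             Range{OwnGlobalBase.at(Import.first), OwnCount.at(Import.first)};
     }

     // Translates LocalID, as written in module file Module, to a global ID.
     uint32_t toGlobal(const std::string &Module, uint32_t LocalID) const {
       const RangeMap &R = Remap.at(Module);
       auto It = R.upper_bound(LocalID); // first range starting after LocalID
       if (It == R.begin())
         throw std::out_of_range("local ID precedes every range");
       --It; // last range starting at or before LocalID
       uint32_t Offset = LocalID - It->first;
       if (Offset >= It->second.Count)
         throw std::out_of_range("local ID not covered by any range");
       return It->second.GlobalBase + Offset;
     }

   private:
     struct Range {
       uint32_t GlobalBase; // global ID of the range's first entity
       uint32_t Count;      // number of entities in the range
     };
     // Keyed by the first local ID of each range; assumes no overlap.
     using RangeMap = std::map<uint32_t, Range>;

     std::map<std::string, RangeMap> Remap;
     std::map<std::string, uint32_t> OwnGlobalBase, OwnCount;
     uint32_t NextGlobalID = 1; // 0 is reserved for "no entity" here
   };

For example, suppose a ``Base`` module defines four entities, a ``Left``
module numbers them starting at local ID 1, and a ``Right`` module numbers them
starting at local ID 4. After loading ``Base``, ``Left``, and ``Right`` in that
order, ``toGlobal("Left", 2)`` and ``toGlobal("Right", 5)`` both return 2, the
global ID of ``Base``'s second entity, so the two modules' different local
numbers resolve to a single entity.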