========================================
Precompiled Header and Modules Internals
========================================

.. contents::
   :local:

This document describes the design and implementation of Clang's precompiled
headers (PCH) and modules.  If you are interested in the end-user view, please
see the :ref:`User's Manual <usersmanual-precompiled-headers>`.

Using Precompiled Headers with ``clang``
----------------------------------------

The Clang compiler frontend, ``clang -cc1``, supports two command-line options
for generating and using PCH files.

To generate PCH files using ``clang -cc1``, use the option ``-emit-pch``:

.. code-block:: bash

  $ clang -cc1 test.h -emit-pch -o test.h.pch

This option is transparently used by ``clang`` when generating PCH files.  The
resulting PCH file contains the serialized form of the compiler's internal
representation after it has completed parsing and semantic analysis.  The PCH
file can then be used as a prefix header with the ``-include-pch`` option:

.. code-block:: bash

  $ clang -cc1 -include-pch test.h.pch test.c -o test.s

Design Philosophy
-----------------

Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we *precompile* that bundle of headers into a single precompiled
header (PCH file).  Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.

A precompiled header implementation improves performance when:

* Loading the PCH file is significantly faster than re-parsing the bundle of
  headers stored within the PCH file.  Thus, a precompiled header design
  attempts to minimize the cost of reading the PCH file.  Ideally, this cost
  should not vary with the size of the precompiled header file.

* The cost of generating the PCH file initially is not so large that it
  counters the per-source-file performance improvement due to eliminating the
  need to parse the bundled headers in the first place.  This is particularly
  important on multi-core systems, because PCH file generation serializes the
  build when all compilations require the PCH file to be up-to-date.

Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and to load those AST
files later.  From an implementation standpoint, modules are a generalization
of precompiled headers, lifting a number of restrictions placed on precompiled
headers.  In particular, a translation unit can use only one precompiled
header, and it must be included at the beginning of the translation unit.  The
extensions to the AST file format required for modules are discussed in the
section on :ref:`modules <pchinternals-modules>`.

Clang's AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file.  The AST file itself contains a serialized representation of Clang's
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as `LLVM's bitcode file format
<http://llvm.org/docs/BitCodeFormat.html>`_.

Clang's AST files are loaded "lazily" from disk.  When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored.  The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times.  The actual header data
in the AST file --- macros, functions, variables, types, etc. --- is loaded
only when it is referenced from the user's code, at which point only that
entity (and those entities it depends on) is deserialized from the AST file.
With this approach, the cost of using an AST file for a translation unit is
proportional to the amount of code actually used from the AST file, rather than
being proportional to the size of the AST file itself.
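
The idea can be illustrated with a small Python sketch.  It is not Clang's
implementation: the ``LazyASTFile`` class, its ``get`` and ``stats`` methods,
and the use of ``pickle`` as the record encoding are all hypothetical stand-ins
chosen for brevity.  The sketch keeps an index of record offsets and decodes a
record only the first time it is requested.

.. code-block:: python

  import pickle


  class LazyASTFile:
      """Toy stand-in for an AST file: an offset index plus a blob of records."""

      def __init__(self, entities):
          # "Write" the file: serialize every entity and remember where it starts.
          # (A real reader would find this index already stored in the file.)
          self._blob = bytearray()
          self._index = {}  # entity name -> (offset, length)
          for name, value in entities.items():
              data = pickle.dumps(value)
              self._index[name] = (len(self._blob), len(data))
              self._blob += data
          self._loaded = {}  # entities deserialized so far

      def get(self, name):
          # Deserialize on first use only; later requests hit the cache.
          if name not in self._loaded:
              offset, length = self._index[name]
              raw = bytes(self._blob[offset:offset + length])
              self._loaded[name] = pickle.loads(raw)
          return self._loaded[name]

      def stats(self):
          # In the spirit of "N/M declarations read".
          return len(self._loaded), len(self._index)


  ast_file = LazyASTFile({f"decl{i}": {"id": i} for i in range(1000)})
  ast_file.get("decl7")
  ast_file.get("decl7")
  read, total = ast_file.stats()

After the two requests above, ``read`` counts a single deserialized entity out
of ``total`` stored ones, which is the property the lazy design aims for.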

When given the ``-print-stats`` option, Clang produces statistics
describing how much of the AST file was actually loaded from disk.  For a
simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:

.. code-block:: none

  *** AST File Statistics:
    895/39981 source location entries read (2.238563%)
    19/15315 types read (0.124061%)
    20/82685 declarations read (0.024188%)
    154/58070 identifiers read (0.265197%)
    0/7260 selectors read (0.000000%)
    0/30842 statements read (0.000000%)
    4/8400 macros read (0.047619%)
    1/4995 lexical declcontexts read (0.020020%)
    0/4413 visible declcontexts read (0.000000%)
    0/7230 method pool entries read (0.000000%)
    0 method pool misses

For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header.  These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.

Precompiled headers can be chained.  When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file.  For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file.  The
mechanisms behind chained precompiled headers are discussed in a :ref:`later
section <pchinternals-chained>`.
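
As a rough illustration of "only writing the new data", the following Python
sketch models a PCH as a dictionary with an optional parent.  The ``ToyPCH``
class and its methods are hypothetical, and the sketch ignores any case where
the new file must amend data that already lives in the original.

.. code-block:: python

  class ToyPCH:
      """Hypothetical model of a PCH: its own entries plus an optional parent."""

      def __init__(self, entries, parent=None):
          self.parent = parent
          # Store only what the parent chain does not already provide.
          self.own = {
              name: value
              for name, value in entries.items()
              if parent is None or not parent.contains(name)
          }

      def contains(self, name):
          if name in self.own:
              return True
          return self.parent is not None and self.parent.contains(name)

      def lookup(self, name):
          # Search this file first, then fall back to the file it references.
          if name in self.own:
              return self.own[name]
          if self.parent is not None:
              return self.parent.lookup(name)
          raise KeyError(name)


  common = ToyPCH({"printf": "decl from stdio.h", "size_t": "typedef"})
  per_file = ToyPCH(
      {"printf": "decl from stdio.h", "size_t": "typedef", "helper": "decl from foo.c"},
      parent=common,
  )

Here ``per_file.own`` holds only the ``helper`` entry, while lookups for
``printf`` are answered by ``common``.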

AST File Contents
-----------------

An AST file produced by Clang is an object file container with a ``clangast``
(COFF) or ``__clangast`` (ELF and Mach-O) section containing the serialized AST.
Other target-specific sections in the object file container are used to hold
debug information for the data types defined in the AST.  Tools built on top of
libclang that do not need debug information may also produce raw AST files that
only contain the serialized AST.

The ``clangast`` section is organized into several different blocks, each of
which contains the serialized representation of a part of Clang's internal
representation.  Each of the blocks corresponds to either a block or a record
within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
The contents of each of these logical blocks are described below.

.. image:: PCHLayout.png

The ``llvm-objdump`` utility provides a ``-raw-clang-ast`` option to extract the
binary contents of the AST section from an object file container.

The `llvm-bcanalyzer <http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_
utility can be used to examine the actual structure of the bitstream for the AST
section.  This information can be used both to help understand the structure of
the AST section and to isolate areas where the AST representation can still be
optimized, e.g., through the introduction of abbreviations.

Metadata Block
^^^^^^^^^^^^^^

The metadata block contains several records that provide information about how
the AST file was built.  This metadata is primarily used to validate the use of
an AST file.  For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target.  The metadata block
contains information about:

Language options
  Describes the particular language dialect used to compile the AST file,
  including major options (e.g., Objective-C support) and more minor options
  (e.g., support for "``//``" comments).  The contents of this record correspond to
  the ``LangOptions`` class.

Target architecture
  The target triple that describes the architecture, platform, and ABI for
  which the AST file was generated, e.g., ``i386-apple-darwin9``.

AST version
  The major and minor version numbers of the AST file format.  Changes in the
  minor version number should not affect backward compatibility, while changes
  in the major version number imply that a newer compiler cannot read an older
  precompiled header (and vice-versa).

Original file name
  The full path of the header that was used to generate the AST file.

Predefines buffer
  Although not explicitly stored as part of the metadata, the predefines buffer
  is used in the validation of the AST file.  The predefines buffer itself
  contains code generated by the compiler to initialize the preprocessor state
  according to the current target, platform, and command-line options.  For
  example, the predefines buffer will contain "``#define __STDC__ 1``" when we
  are compiling C without Microsoft extensions.  The predefines buffer itself
  is stored within the :ref:`pchinternals-sourcemgr`, but its contents are
  verified along with the rest of the metadata.

A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on.  Each of those files will be
loaded along with this AST file.

For chained precompiled headers, the language options, target architecture, and
predefines buffer data are taken from the end of the chain, since they have to
match anyway.
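
The validation role of the metadata can be sketched in Python as follows.  The
``Metadata`` record and ``validate`` function are hypothetical, the version
numbers are made up, and comparing every field for strict equality is a
simplifying assumption rather than a description of Clang's actual checks.

.. code-block:: python

  from dataclasses import dataclass, field


  @dataclass(frozen=True)
  class Metadata:
      """Hypothetical subset of the information in the metadata block."""
      major_version: int
      minor_version: int
      target_triple: str
      lang_options: frozenset = field(default_factory=frozenset)
      predefines: str = ""


  def validate(pch, current):
      """Return the reasons the PCH cannot be used (empty if compatible)."""
      problems = []
      # Minor version differences are tolerated; major ones are not.
      if pch.major_version != current.major_version:
          problems.append("AST format major version differs")
      if pch.target_triple != current.target_triple:
          problems.append("target triple differs")
      if pch.lang_options != current.lang_options:
          problems.append("language options differ")
      if pch.predefines != current.predefines:
          problems.append("predefines buffer differs")
      return problems


  predefines = "#define __STDC__ 1\n"
  pch = Metadata(1, 0, "i386-apple-darwin9", frozenset({"objc"}), predefines)
  compiler = Metadata(1, 3, "x86_64-apple-darwin9", frozenset({"objc"}), predefines)
  problems = validate(pch, compiler)

With these inputs, only the target triple mismatch is reported; the differing
minor version is accepted.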

.. _pchinternals-sourcemgr:

Source Manager Block
^^^^^^^^^^^^^^^^^^^^

The source manager block contains the serialized representation of Clang's
:ref:`SourceManager <SourceManager>` class, which handles the mapping from
source locations (as represented in Clang's abstract syntax tree) into actual
column/line positions within a source file or macro instantiation.  The AST
file's representation of the source manager also includes information about all
of the headers that were (transitively) included when building the AST file.

The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations to which a source location
can refer.  Each of these is referenced by a numeric "file ID", which is a
unique number (allocated starting at 1) stored in the source location.  Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored.  The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.

The source manager block also contains information about all of the headers
that were included when building the AST file.  This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header depend on a macro like
``LLVM_CLANG_SOURCEMANAGER_H``).

.. _pchinternals-preprocessor:

Preprocessor Block
^^^^^^^^^^^^^^^^^^

The preprocessor block contains the serialized representation of the
preprocessor.  Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro.  The macro definitions are only read
from the AST file when the name of the macro first occurs in the program.  This
lazy loading of macro definitions is triggered by lookups into the
:ref:`identifier table <pchinternals-ident-table>`.

.. _pchinternals-types:

Types Block
^^^^^^^^^^^

The types block contains the serialized representation of all of the types
referenced in the translation unit.  Each Clang type node (``PointerType``,
``FunctionProtoType``, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.

Each type has a unique type ID, which is an integer that uniquely identifies
that type.  Type ID 0 represents the NULL type, type IDs less than
``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.),
while other "user-defined" type IDs are assigned consecutively from
``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered.  The AST file has
an associated mapping from each user-defined type ID to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types.  When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits.  The lower three bits are used to represent the ``const``, ``volatile``,
and ``restrict`` qualifiers, as in Clang's :ref:`QualType <QualType>` class.
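
The reference encoding can be made concrete with a short Python sketch.  The
helper names are hypothetical, the value given to ``NUM_PREDEF_TYPE_IDS`` is a
placeholder rather than Clang's constant, and the assignment of individual
qualifier bits is assumed to follow Clang's ``Qualifiers`` enumeration.

.. code-block:: python

  # Qualifier bits; this particular assignment is an assumption of the sketch.
  CONST, RESTRICT, VOLATILE = 0x1, 0x2, 0x4
  QUAL_BITS = 3
  NUM_PREDEF_TYPE_IDS = 100  # placeholder value, not Clang's actual constant


  def encode_type_ref(type_id, quals=0):
      """Pack a type ID and its qualifier bits into one integer."""
      return (type_id << QUAL_BITS) | quals


  def decode_type_ref(ref):
      """Split an encoded reference back into (type ID, qualifier bits)."""
      return ref >> QUAL_BITS, ref & ((1 << QUAL_BITS) - 1)


  def is_predefined(type_id):
      # ID 0 is the null type; IDs below NUM_PREDEF_TYPE_IDS are built-in types.
      return 0 < type_id < NUM_PREDEF_TYPE_IDS


  # A reference to "const volatile T", where T is the sixth user-defined type.
  ref = encode_type_ref(NUM_PREDEF_TYPE_IDS + 5, CONST | VOLATILE)
  type_id, quals = decode_type_ref(ref)

Decoding recovers both the original type ID and the qualifier set, so a single
integer is enough to describe a qualified type reference.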

.. _pchinternals-decls:

Declarations Block
^^^^^^^^^^^^^^^^^^

The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit.  Each Clang declaration node
(``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the
AST file.  When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding ``Decl`` node.  As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file.  In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the AST file where that declaration is described.

Declarations in Clang's abstract syntax trees are stored hierarchically.  At
the top of the hierarchy is the translation unit (``TranslationUnitDecl``),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node.  Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on.  Within Clang, each declaration is stored within a :ref:`declaration
context <DeclContext>`, as represented by the ``DeclContext`` class.
Declaration contexts provide the mechanism to perform name lookup within a
given declaration (e.g., find the member named ``x`` in a structure) and
iterate over the declarations stored within a context (e.g., iterate over all
of the fields of a structure for structure layout).

In Clang's AST file format, deserializing a declaration that is a
``DeclContext`` is a separate operation from deserializing all of the
declarations stored within that declaration context.  Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit.  When required, the declarations
stored within a declaration context will be deserialized.  There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:

* When the front end performs name lookup to find a name ``x`` within a given
  declaration context (for example, during semantic analysis of the expression
  ``p->x``, where ``p``'s type is defined in the precompiled header), Clang
  refers to an on-disk hash table that maps from the names within that
  declaration context to the declaration IDs that represent each visible
  declaration with that name.  The actual declarations will then be
  deserialized to provide the results of name lookup.
* When the front end performs iteration over all of the declarations within a
  declaration context, all of those declarations are immediately
  deserialized.  For large declaration contexts (e.g., the translation unit),
  this operation is expensive; however, large declaration contexts are not
  traversed in normal compilation, since such a traversal is unnecessary.
  The code generator and semantic analysis do commonly traverse declaration
  contexts for structs, classes, unions, and enumerations, but those contexts
  contain relatively few declarations in the common case.
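
The difference between the two access paths can be sketched in Python.  The
``LazyDeclContext`` class is hypothetical, a plain dictionary stands in for the
on-disk hash table, and copying a dictionary stands in for the real work of
deserializing a declaration.

.. code-block:: python

  class LazyDeclContext:
      """Hypothetical declaration context backed by serialized data."""

      def __init__(self, records, name_table):
          self._records = records        # decl ID -> serialized declaration
          self._name_table = name_table  # name -> list of decl IDs
          self.deserialized = {}         # decl ID -> materialized declaration

      def _load(self, decl_id):
          if decl_id not in self.deserialized:
              # Stand-in for real deserialization work.
              self.deserialized[decl_id] = dict(self._records[decl_id])
          return self.deserialized[decl_id]

      def lookup(self, name):
          # Name lookup: only declarations with this name are materialized.
          return [self._load(i) for i in self._name_table.get(name, [])]

      def decls(self):
          # Iteration: every declaration in the context is materialized.
          return [self._load(i) for i in sorted(self._records)]


  records = {1: {"name": "x"}, 2: {"name": "y"}, 3: {"name": "z"}}
  ctx = LazyDeclContext(records, {"x": [1], "y": [2], "z": [3]})
  found = ctx.lookup("x")
  loaded_after_lookup = len(ctx.deserialized)
  all_decls = ctx.decls()
  loaded_after_iteration = len(ctx.deserialized)

Name lookup touches one declaration here, whereas iteration forces all three to
be loaded, which is why iterating over a very large context is costly.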

Statements and Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^

Statements and expressions are stored in the AST file in both the :ref:`types
<pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks,
because every statement or expression will be associated with either a type or
declaration.  The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression.  For example, the statement representing the body of a function
will be stored directly following the declaration of the function.

As with types and declarations, each statement and expression kind in Clang's
abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression.  Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size).  Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those subexpressions, using a
form of `Reverse Polish Notation
<http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_.  For example, an
expression ``3 - 4 + 5`` would be represented as follows:

+-----------------------+
| ``IntegerLiteral(5)`` |
+-----------------------+
| ``IntegerLiteral(4)`` |
+-----------------------+
| ``IntegerLiteral(3)`` |
+-----------------------+
| ``BinaryOperator(-)`` |
+-----------------------+
| ``BinaryOperator(+)`` |
+-----------------------+
|       ``STOP``        |
+-----------------------+

When reading this representation, Clang evaluates each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression on to a stack.  When a record contains *N* subexpressions ---
``BinaryOperator`` has two of them --- those expressions are popped from the
top of the stack.  The special STOP code indicates that we have reached the end
of a serialized expression or statement; other expression or statement records
may follow, but they are part of a different expression.
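
The stack discipline can be traced with a small Python sketch that reads the
record sequence for ``3 - 4 + 5``, with the two operator entries modelled as
``BinaryOperator`` records.  The tuple-based record and tree encodings are
hypothetical, and the pop order (left operand on top) is inferred from the
statement that subexpressions are stored in reverse order.

.. code-block:: python

  def read_expr(records):
      """Rebuild one expression tree from records stored in the order above."""
      stack = []
      for kind, value in records:
          if kind == "STOP":
              break
          if kind == "IntegerLiteral":
              stack.append(value)
          elif kind == "BinaryOperator":
              # Operands were written in reverse, so the left one is on top.
              lhs = stack.pop()
              rhs = stack.pop()
              stack.append((value, lhs, rhs))
          else:
              raise ValueError(f"unknown record kind: {kind}")
      if len(stack) != 1:
          raise ValueError("a well-formed stream leaves exactly one root")
      return stack[0]


  def evaluate(node):
      """Evaluate the rebuilt tree, purely to check its shape."""
      if isinstance(node, int):
          return node
      op, lhs, rhs = node
      left, right = evaluate(lhs), evaluate(rhs)
      return left - right if op == "-" else left + right


  records = [
      ("IntegerLiteral", 5),
      ("IntegerLiteral", 4),
      ("IntegerLiteral", 3),
      ("BinaryOperator", "-"),
      ("BinaryOperator", "+"),
      ("STOP", None),
  ]
  tree = read_expr(records)

The resulting tree groups the expression as ``(3 - 4) + 5``, matching the
left-to-right associativity of the source expression.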

.. _pchinternals-ident-table:

Identifier Table Block
^^^^^^^^^^^^^^^^^^^^^^

The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier's information (e.g., the ``IdentifierInfo`` structure).  The
serialized representation contains:

* The actual identifier string.
* Flags that describe whether this identifier is the name of a built-in, a
  poisoned identifier, an extension token, or a macro.
* If the identifier names a macro, the offset of the macro definition within
  the :ref:`pchinternals-preprocessor`.
* If the identifier names one or more declarations visible from translation
  unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these
  declarations.

When an AST file is loaded, the AST file reader mechanism introduces itself
into the identifier table as an external lookup source.  Thus, when the user
program refers to an identifier that has not yet been seen, Clang will perform
a lookup into the identifier table.  If an identifier is found, its contents
(macro definitions, flags, top-level declarations, etc.) will be deserialized,
at which point the corresponding ``IdentifierInfo`` structure will have the
same contents it would have after parsing the headers in the AST file.

Within the AST file, the identifiers used to name declarations are represented
with an integral value.  A separate table provides a mapping from this integral
value (the identifier ID) to the location within the on-disk hash table where
that identifier is stored.  This mapping is used when deserializing the name of
a declaration, the identifier of a token, or any other construct in the AST
file that refers to a name.
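
The "external lookup source" arrangement can be sketched in Python.  Both
classes below are hypothetical simplifications: a dictionary stands in for the
on-disk hash table, and an identifier's information is reduced to a flag and a
list of declaration IDs.

.. code-block:: python

  class IdentifierTable:
      """Toy identifier table with an optional external lookup source."""

      def __init__(self):
          self._infos = {}
          self.external = None  # installed when an AST file is loaded

      def get(self, name):
          if name not in self._infos:
              # First sighting: ask the external source before creating an entry.
              info = self.external.lookup(name) if self.external else None
              if info is None:
                  info = {"is_macro": False, "decl_ids": []}
              self._infos[name] = info
          return self._infos[name]


  class ToyASTReader:
      """Hypothetical reader exposing the on-disk table as a plain dict."""

      def __init__(self, on_disk):
          self._on_disk = on_disk
          self.lookups = 0

      def lookup(self, name):
          self.lookups += 1
          return self._on_disk.get(name)


  table = IdentifierTable()
  reader = ToyASTReader({"NSObject": {"is_macro": False, "decl_ids": [42]}})
  table.external = reader
  first = table.get("NSObject")
  second = table.get("NSObject")
  local = table.get("my_var")

The reader is consulted once per previously unseen identifier; repeated uses of
``NSObject`` are served from the in-memory table.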

.. _pchinternals-method-pool:

Method Pool Block
^^^^^^^^^^^^^^^^^

The method pool block is represented as an on-disk hash table that serves two
purposes: it provides a mapping from the names of Objective-C selectors to the
set of Objective-C instance and class methods that have that particular
selector (which is required for semantic analysis in Objective-C) and also
stores all of the selectors used by entities within the AST file.  The design
of the method pool is similar to that of the :ref:`identifier table
<pchinternals-ident-table>`: the first time a particular selector is formed
during the compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data structure
(``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and
class methods, respectively).

As with identifiers, selectors are represented by numeric values within the AST
file.  A separate index maps these numeric selector values to the offset of the
selector within the on-disk hash table, and will be used when deserializing an
Objective-C method declaration (or other Objective-C construct) that refers to
the selector.
417*67e74705SXin Li
418*67e74705SXin LiAST Reader Integration Points
419*67e74705SXin Li-----------------------------
420*67e74705SXin Li
421*67e74705SXin LiThe "lazy" deserialization behavior of AST files requires their integration
422*67e74705SXin Liinto several completely different submodules of Clang.  For example, lazily
423*67e74705SXin Lideserializing the declarations during name lookup requires that the name-lookup
424*67e74705SXin Liroutines be able to query the AST file to find entities stored there.
425*67e74705SXin Li
426*67e74705SXin LiFor each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules.  The ``ASTReader`` class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang's data structures.  ``ASTReader`` implements the
following abstract classes:

``ExternalSLocEntrySource``
  This abstract interface is associated with the ``SourceManager`` class, and
  is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to
  load the details of a file, buffer, or macro instantiation.

``IdentifierInfoLookup``
  This abstract interface is associated with the ``IdentifierTable`` class, and
  is used whenever the program source refers to an identifier that has not yet
  been seen.  In this case, the AST reader searches for this identifier within
  its :ref:`identifier table <pchinternals-ident-table>` to load any top-level
  declarations or macros associated with that identifier.

``ExternalASTSource``
  This abstract interface is associated with the ``ASTContext`` class, and is
  used whenever abstract syntax tree nodes need to be loaded from the AST
  file.  It provides the ability to deserialize declarations and types
  identified by their numeric values, read the bodies of functions when
  required, and read the declarations stored within a declaration context
  (either for iteration or for name lookup).

``ExternalSemaSource``
  This abstract interface is associated with the ``Sema`` class, and is used
  whenever semantic analysis needs to read information from the :ref:`global
  method pool <pchinternals-method-pool>`.
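
The relationship between these interfaces and a single reader object can be
illustrated with a small, self-contained sketch.  It is not Clang's API: every
name below (``IdentifierLookupSketch``, ``DeclSourceSketch``, ``ReaderSketch``,
``IdentifierTableSketch``) is hypothetical, the real interfaces are far larger,
and plain strings stand in for deserialized data.  The sketch only shows the
shape of the design: a client holds a pointer to a narrow abstract interface,
while one reader implements several such interfaces and loads data on first
use.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <utility>

  // Hypothetical stand-in for an interface such as IdentifierInfoLookup.
  struct IdentifierLookupSketch {
    virtual ~IdentifierLookupSketch() = default;
    // Returns the stored information for Name, or nullptr if it is unknown.
    virtual const std::string *lookup(const std::string &Name) = 0;
  };

  // Hypothetical stand-in for an interface such as ExternalASTSource.
  struct DeclSourceSketch {
    virtual ~DeclSourceSketch() = default;
    // Materializes the declaration identified by a numeric ID.
    virtual std::string getDecl(unsigned ID) = 0;
  };

  // A single reader implements every interface (assumed, simplified model).
  class ReaderSketch : public IdentifierLookupSketch, public DeclSourceSketch {
    std::map<std::string, std::string> OnDiskIdents; // models the AST file
    std::map<unsigned, std::string> OnDiskDecls;     // models the AST file

  public:
    unsigned NumLoads = 0; // number of successful loads from the "file"

    ReaderSketch(std::map<std::string, std::string> Idents,
                 std::map<unsigned, std::string> Decls)
        : OnDiskIdents(std::move(Idents)), OnDiskDecls(std::move(Decls)) {}

    const std::string *lookup(const std::string &Name) override {
      auto It = OnDiskIdents.find(Name);
      if (It == OnDiskIdents.end())
        return nullptr;
      ++NumLoads;
      return &It->second;
    }

    std::string getDecl(unsigned ID) override {
      auto It = OnDiskDecls.find(ID);
      assert(It != OnDiskDecls.end() && "unknown declaration ID");
      ++NumLoads;
      return It->second;
    }
  };

  // A client that knows only the abstract interface, never the reader type.
  class IdentifierTableSketch {
    IdentifierLookupSketch *External;
    std::map<std::string, std::string> Seen; // identifiers already loaded

  public:
    explicit IdentifierTableSketch(IdentifierLookupSketch *E) : External(E) {}

    // Consults the external source only for identifiers not yet seen.
    const std::string *get(const std::string &Name) {
      auto It = Seen.find(Name);
      if (It != Seen.end())
        return &It->second;
      if (External) {
        if (const std::string *Info = External->lookup(Name))
          return &Seen.emplace(Name, *Info).first->second;
      }
      return nullptr;
    }
  };

In this sketch, asking ``IdentifierTableSketch`` for the same name twice
consults the external source only once, which is the essence of lazy loading
through an abstract interface.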

.. _pchinternals-chained:

Chained precompiled headers
---------------------------

Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user.  To minimize the amount
of reparsing required after a change to the file, a form of precompiled header
--- called a precompiled *preamble* --- is automatically generated by parsing
all of the headers in the source file, up to and including the last
``#include``.  When only the source file changes (and none of the headers it
depends on), reparsing of that source file can use the precompiled preamble and
start parsing after the ``#include``\ s, so parsing time is proportional to the
size of the source file (rather than all of its includes).  However, the
compilation of that translation unit may already use a precompiled header: in
this case, Clang will create the precompiled preamble as a chained precompiled
header that refers to the original precompiled header.  This drastically
reduces the time needed to serialize the precompiled preamble for use in
reparsing.

Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies.  A
translation unit will then include the precompiled header that starts the chain
(i.e., nothing depends on it).  This linearity of dependencies is important for
the semantic model of chained precompiled headers, because the most-recent
precompiled header can provide information that overrides the information
provided by the precompiled headers it depends on, just like a header file
``B.h`` that includes another header ``A.h`` can modify the state produced by
parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``.
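
The overriding behavior can be sketched as a lookup that walks the chain from
the most recent file to the oldest and stops at the first file that mentions
the name.  This is an illustrative model rather than Clang's implementation:
``MacroChangeSketch``, ``ChainedFileSketch``, and ``lookupMacro`` are
hypothetical names, and a macro is reduced to a name plus a replacement string.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // What one file in the chain recorded about a macro (assumed layout).
  struct MacroChangeSketch {
    bool Defined;     // false means this file #undef'd the macro
    std::string Body; // replacement text, meaningful only when Defined
  };

  // Each file stores only the macros it changed relative to its predecessor.
  struct ChainedFileSketch {
    std::map<std::string, MacroChangeSketch> MacroChanges;
  };

  // Chain.front() is the oldest file and Chain.back() the most recent one.
  // Returns the macro's replacement text as the translation unit sees it, or
  // nullptr if the macro is undefined.  The pointer refers into Chain.
  inline const std::string *
  lookupMacro(const std::vector<ChainedFileSketch> &Chain,
              const std::string &Name) {
    assert(!Chain.empty() && "expected at least one file in the chain");
    for (auto It = Chain.rbegin(); It != Chain.rend(); ++It) {
      auto Found = It->MacroChanges.find(Name);
      if (Found != It->MacroChanges.end()) {
        // The most recent file that mentions Name is definitive.
        return Found->second.Defined ? &Found->second.Body : nullptr;
      }
    }
    return nullptr; // no file in the chain mentions Name
  }

If the older file defines ``DEBUG`` and the newer one records an ``#undef``,
``lookupMacro`` reports the macro as undefined, mirroring the ``A.h`` and
``B.h`` example.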

There are several ways in which chained precompiled headers generalize the AST
file model:

Numbering of IDs
  Many different kinds of entities --- identifiers, declarations, types, etc.
  --- have ID numbers that start at 1 or some other predefined constant and
  grow upward.  Each precompiled header records the maximum ID number it has
  assigned in each category.  Then, when a new precompiled header is generated
  that depends on (chains to) another precompiled header, it will start
  counting at the next available ID number.  This way, one can determine, given
  an ID number, which AST file actually contains the entity.

Name lookup
  When writing a chained precompiled header, Clang attempts to write only
  information that has changed from the precompiled header on which it is
  based.  This changes the lookup algorithm for the various tables, such as the
  :ref:`identifier table <pchinternals-ident-table>`: the search starts at the
  most-recent precompiled header.  If no entry is found, lookup then proceeds
  to the identifier table in the precompiled header it depends on, and so on.
  Once a lookup succeeds, that result is considered definitive, overriding any
  results from earlier precompiled headers.

Update records
  There are various ways in which a later precompiled header can modify the
  entities described in an earlier precompiled header.  For example, later
  precompiled headers can add entries into the various name-lookup tables for
  the translation unit or namespaces, or add new categories to an Objective-C
  class.  Each of these updates is captured in an "update record" that is
  stored in the chained precompiled header file and will be loaded along with
  the original entity.
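
The ID numbering scheme for a linear chain can be sketched as follows.  The
class ``ChainIDAllocatorSketch`` and its members are hypothetical and cover a
single category of IDs that starts at 1; the sketch only illustrates how
recording the maximum ID per file is enough to recover the owning file of any
ID.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <cstddef>
  #include <vector>

  // Bookkeeping for one ID category (for example, declarations).
  class ChainIDAllocatorSketch {
    // LastID[i] is the largest ID assigned by file i; the list is sorted
    // because each file continues where the previous one stopped.
    std::vector<unsigned> LastID;

  public:
    // Adds a file that chains to all previously added files and assigns it
    // NumEntities fresh IDs.  Returns the first ID of the new file's range.
    unsigned addFile(unsigned NumEntities) {
      unsigned First = LastID.empty() ? 1 : LastID.back() + 1;
      LastID.push_back(First + NumEntities - 1);
      return First;
    }

    // Returns the index of the file whose ID range contains ID.
    std::size_t ownerOf(unsigned ID) const {
      assert(!LastID.empty() && ID >= 1 && ID <= LastID.back());
      // The owner is the first file whose maximum ID is not less than ID.
      auto It = std::lower_bound(LastID.begin(), LastID.end(), ID);
      return static_cast<std::size_t>(It - LastID.begin());
    }
  };

With a first file that owns 100 declarations and a chained file that adds 20
more, the second file's IDs begin at 101, so ID 100 resolves to the first file
and ID 101 to the second.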

.. _pchinternals-modules:

Modules
-------

Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files.  All of the same techniques used to make chained
precompiled headers work --- ID numbering, name lookup, update records --- are
shared with modules.  However, the DAG nature of modules introduces a number of
additional complications to the model:

Numbering of IDs
  The simple, linear numbering scheme used in chained precompiled headers falls
  apart with the module DAG, because different modules may end up with
  different numbering schemes for entities they imported from common shared
  modules.  To account for this, each module file provides information about
  which modules it depends on and which ID numbers it assigned to the entities
  in those modules, as well as which ID numbers it took for its own new
  entities.  The AST reader then maps these "local" ID numbers into a "global"
  ID number space for the current translation unit, providing a 1-1 mapping
  between entities (in whatever AST file they inhabit) and global ID numbers.
  If that translation unit is then serialized into an AST file, this mapping
  will be stored for use when the AST file is imported.

Declaration merging
  It is possible for a given entity (from the language's perspective) to be
  declared multiple times in different places.  For example, two different
  headers can have the declaration of ``printf`` or could forward-declare
  ``struct stat``.  If each of those headers is included in a module, and some
  third party imports both of those modules, there is a potentially serious
  problem: name lookup for ``printf`` or ``struct stat`` will find both
  declarations, but the AST nodes are unrelated.  This would result in a
  compilation error, due to an ambiguity in name lookup.  Therefore, the AST
  reader performs declaration merging according to the appropriate language
  semantics, ensuring that the two disjoint declarations are merged into a
  single redeclaration chain (with a common canonical declaration), so that it
  is as if one of the headers had been included before the other.

Name visibility
  Modules allow certain names that occur during module creation to be "hidden",
  so that they are not part of the public interface of the module and are not
  visible to its clients.  The AST reader maintains a "visible" bit on various
  AST nodes (declarations, macros, etc.) to indicate whether that particular
  AST node is currently visible; the various name lookup mechanisms in Clang
  inspect the visible bit to determine whether that entity, which is still in
  the AST (because other, visible AST nodes may depend on it), can actually be
  found by name lookup.  When a new (sub)module is imported, it may make
  existing, non-visible, already-deserialized AST nodes visible; it is the
  responsibility of the AST reader to find and update these AST nodes when it
  is notified of the import.
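
The local-to-global ID mapping described under "Numbering of IDs" can be
sketched with a deliberately simplified model.  All names here
(``LocalRangeSketch``, ``ModuleFileSketch``, ``GlobalIDMapSketch``) are
hypothetical, and the sketch assumes that each module file describes its local
numbering as contiguous ranges, each naming the module that owns the entities
in that range.  Clang's actual on-disk representation is not shown.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // A contiguous block of local IDs inside one module file, together with the
  // module whose entities those IDs name (the file itself or a dependency).
  struct LocalRangeSketch {
    unsigned FirstLocal;
    unsigned Count;
    std::string Owner;
  };

  // What a module file records about its own local numbering.
  struct ModuleFileSketch {
    std::string Name;
    std::vector<LocalRangeSketch> Ranges;
  };

  // Reader-side table for a single translation unit.
  class GlobalIDMapSketch {
    std::map<std::string, unsigned> GlobalBase; // owner -> first global ID
    unsigned NextGlobalID = 1;

  public:
    // Reserves global IDs for a module's own entities.  Registering the same
    // module again (reached through a second importer) changes nothing.
    void registerModule(const std::string &Name, unsigned NumOwnEntities) {
      if (GlobalBase.emplace(Name, NextGlobalID).second)
        NextGlobalID += NumOwnEntities;
    }

    // Maps a local ID used inside File to the translation unit's global ID.
    unsigned toGlobal(const ModuleFileSketch &File, unsigned LocalID) const {
      for (const LocalRangeSketch &R : File.Ranges) {
        if (LocalID >= R.FirstLocal && LocalID - R.FirstLocal < R.Count)
          return GlobalBase.at(R.Owner) + (LocalID - R.FirstLocal);
      }
      assert(false && "local ID not described by this module file");
      return 0;
    }
  };

Two module files may number the entities of a shared dependency differently,
for example one placing them at local IDs 1 to 10 and another at 8 to 17.  In
this sketch both numberings resolve to the same global IDs, which is the 1-1
property the reader has to provide.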