============================
"Clang" CFE Internals Manual
============================

.. contents::
   :local:

Introduction
============

This document describes some of the more important APIs and internal design
decisions made in the Clang C front-end.  The purpose of this document is to
both capture some of this high-level information and also describe some of the
design decisions behind it.  This is meant for people interested in hacking on
Clang, not for end-users.  The description below is categorized by libraries,
and does not describe any of the clients of the libraries.

LLVM Support Library
====================

The LLVM ``libSupport`` library provides many underlying libraries and
`data-structures <http://llvm.org/docs/ProgrammersManual.html>`_, including
command line option processing, various containers and a system abstraction
layer, which is used for file system access.

The Clang "Basic" Library
=========================

This library certainly needs a better name.  The "basic" library contains a
number of low-level utilities for tracking and manipulating source buffers,
locations within the source buffers, diagnostics, tokens, target abstraction,
and information about the subset of the language being compiled for.

Part of this infrastructure is specific to C (such as the ``TargetInfo``
class); other parts could be reused for other non-C-based languages
(``SourceLocation``, ``SourceManager``, ``Diagnostics``, ``FileManager``).
When and if there is future demand we can figure out if it makes sense to
introduce a new library, move the general classes somewhere else, or introduce
some other solution.

We describe the roles of these classes in order of their dependencies.

The Diagnostics Subsystem
-------------------------

The Clang Diagnostics subsystem is an important part of how the compiler
communicates with the human.  Diagnostics are the warnings and errors produced
when the code is incorrect or dubious.  In Clang, each diagnostic produced has
(at the minimum) a unique ID, an English translation associated with it, a
:ref:`SourceLocation <SourceLocation>` to "put the caret", and a severity
(e.g., ``WARNING`` or ``ERROR``).  They can also optionally include a number of
arguments to the diagnostic (which fill in "%0"'s in the string) as well as a
number of source ranges that relate to the diagnostic.

In this section, we'll be giving examples produced by the Clang command line
driver, but diagnostics can be :ref:`rendered in many different ways
<DiagnosticClient>` depending on how the ``DiagnosticClient`` interface is
implemented.  A representative example of a diagnostic is:

.. code-block:: text

  t.c:38:15: error: invalid operands to binary expression ('int *' and '_Complex float')
  P = (P-42) + Gamma*4;
      ~~~~~~ ^ ~~~~~~~

In this example, you can see the English translation, the severity (error),
the source location (the caret ("``^``") and file/line/column info), the
source ranges "``~~~~``", and the arguments to the diagnostic ("``int *``" and
"``_Complex float``").  You'll have to believe me that there is a unique ID
backing the diagnostic :).

Getting all of this to happen has several steps and involves many moving
pieces.  This section describes them and talks about best practices when adding
a new diagnostic.

The ``Diagnostic*Kinds.td`` files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Diagnostics are created by adding an entry to one of the
``clang/Basic/Diagnostic*Kinds.td`` files, depending on what library will be
using it.  From this file, :program:`tblgen` generates the unique ID of the
diagnostic, the severity of the diagnostic and the English translation + format
string.

There is little sanity with the naming of the unique IDs right now.  Some
start with ``err_``, ``warn_``, ``ext_`` to encode the severity into the name.
Since the enum is referenced in the C++ code that produces the diagnostic, it
is somewhat useful for it to be reasonably short.

The severity of the diagnostic comes from the set {``NOTE``, ``REMARK``,
``WARNING``, ``EXTENSION``, ``EXTWARN``, ``ERROR``}.  The ``ERROR`` severity is
used for diagnostics indicating the program is never acceptable under any
circumstances.  When an error is emitted, the AST for the input code may not be
fully built.  The ``EXTENSION`` and ``EXTWARN`` severities are used for
extensions to the language that Clang accepts.  This means that Clang fully
understands and can represent them in the AST, but we produce diagnostics to
tell the user their code is non-portable.  The difference is that the former
are ignored by default, and the latter warn by default.  The ``WARNING``
severity is used for constructs that are valid in the currently selected source
language but that are dubious in some way.  The ``REMARK`` severity provides
generic information about the compilation that is not necessarily related to
any dubious code.  The ``NOTE`` level is used to staple more information onto
previous diagnostics.

These *severities* are mapped into a smaller set of output *levels* (the
``Diagnostic::Level`` enum, {``Ignored``, ``Note``, ``Remark``, ``Warning``,
``Error``, ``Fatal``}) by the diagnostics subsystem based on various
configuration options.  Clang internally supports a fully fine grained mapping
mechanism that allows you to map almost any diagnostic to the output level that
you want.  The only diagnostics that cannot be mapped are ``NOTE``\ s, which
always follow the severity of the previously emitted diagnostic, and
``ERROR``\ s, which can only be mapped to ``Fatal`` (it is not possible to turn
an error into a warning, for example).

Diagnostic mappings are used in many ways.  For example, if the user specifies
``-pedantic``, ``EXTENSION`` maps to ``Warning``; if they specify
``-pedantic-errors``, it turns into ``Error``.  This is used to implement
options like ``-Wunused-macros``, ``-Wundef`` etc.

Mapping to ``Fatal`` should only be used for diagnostics that are considered so
severe that error recovery won't be able to recover sensibly from them (thus
spewing a ton of bogus errors).  One example of this class of error is failure
to ``#include`` a file.

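The severity-to-level mapping described above can be sketched as a small
standalone model.  This is only an illustration, not Clang's implementation:
``ToyLevelMapper``, ``ToyOptions`` and both enums are hypothetical names, the
real mapping is configurable per diagnostic, and passing remarks straight
through is a simplification made here.

.. code-block:: c++

  #include <cassert>

  // All names here are hypothetical; this is not Clang's real interface.
  enum class Severity { Note, Remark, Warning, Extension, ExtWarn, Error };
  enum class Level { Ignored, Note, Remark, Warning, Error, Fatal };

  struct ToyOptions {
    bool Pedantic = false;       // models -pedantic
    bool PedanticErrors = false; // models -pedantic-errors
  };

  class ToyLevelMapper {
    bool LastWasIgnored = false; // notes follow the previous diagnostic

  public:
    Level map(Severity S, const ToyOptions &Opts) {
      Level Result = Level::Ignored;
      switch (S) {
      case Severity::Note:
        // A note is shown only if the diagnostic it is stapled to was shown.
        return LastWasIgnored ? Level::Ignored : Level::Note;
      case Severity::Remark:
        Result = Level::Remark; // simplification: passed straight through
        break;
      case Severity::Warning:
      case Severity::ExtWarn:
        Result = Level::Warning; // both warn by default
        break;
      case Severity::Extension:
        if (Opts.PedanticErrors)
          Result = Level::Error;
        else if (Opts.Pedantic)
          Result = Level::Warning;
        break; // otherwise ignored by default
      case Severity::Error:
        Result = Level::Error; // could only ever be raised to Fatal
        break;
      }
      assert(!(S == Severity::Error && Result < Level::Error) &&
             "errors must never be downgraded");
      LastWasIgnored = (Result == Level::Ignored);
      return Result;
    }
  };

With default options an ``Extension`` maps to ``Level::Ignored`` and a
following ``Note`` is dropped with it; setting ``Pedantic`` turns the same
pair into ``Level::Warning`` and ``Level::Note``.
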
The Format String
^^^^^^^^^^^^^^^^^

The format string for the diagnostic is very simple, but it has some power.  It
takes the form of a string in English with markers that indicate where and how
arguments to the diagnostic are inserted and formatted.  For example, here are
some simple format strings:

.. code-block:: c++

  "binary integer literals are an extension"
  "format string contains '\\0' within the string body"
  "more '%%' conversions than data arguments"
  "invalid operands to binary expression (%0 and %1)"
  "overloaded '%0' must be a %select{unary|binary|unary or binary}2 operator"
       " (has %1 parameter%s1)"

These examples show some important points of format strings.  You can use any
plain ASCII character in the diagnostic string except "``%``" without a
problem, but these are C strings, so you have to use and be aware of all the C
escape sequences (as in the second example).  If you want to produce a "``%``"
in the output, use the "``%%``" escape sequence, like the third diagnostic.
Finally, Clang uses the "``%...[digit]``" sequences to specify where and how
arguments to the diagnostic are formatted.

Arguments to the diagnostic are numbered according to how they are specified by
the C++ code that :ref:`produces them <internals-producing-diag>`, and are
referenced by ``%0`` .. ``%9``.  If you have more than 10 arguments to your
diagnostic, you are doing something wrong :).  Unlike ``printf``, there is no
requirement that arguments to the diagnostic end up in the output in the same
order as they are specified; you could have a format string with "``%1 %0``"
that swaps them, for example.  The text between the percent and the digit
consists of formatting instructions.  If there are no instructions, the
argument is just turned into a string and substituted in.

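To make the numbering rule concrete, here is a minimal standalone sketch of
the substitution step.  It handles only ``%%`` and plain ``%0`` .. ``%9``
markers; ``formatSimple`` is a hypothetical helper written for this
illustration and is not part of Clang.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <vector>

  // Hypothetical helper: expands "%%" and "%N" (N = 0..9) in Fmt.
  // Formatting instructions between '%' and the digit are not handled here.
  std::string formatSimple(const std::string &Fmt,
                           const std::vector<std::string> &Args) {
    std::string Out;
    for (std::size_t I = 0; I < Fmt.size(); ++I) {
      if (Fmt[I] != '%') {
        Out += Fmt[I];
        continue;
      }
      assert(I + 1 < Fmt.size() && "dangling '%' at end of format string");
      char Next = Fmt[++I];
      if (Next == '%') { // "%%" produces a literal percent sign
        Out += '%';
        continue;
      }
      assert(Next >= '0' && Next <= '9' && "modifiers are out of scope here");
      std::size_t Index = static_cast<std::size_t>(Next - '0');
      assert(Index < Args.size() && "format refers to a missing argument");
      Out += Args[Index]; // arguments may appear in any order
    }
    return Out;
  }

For instance, ``formatSimple("%1 %0", {"a", "b"})`` yields ``"b a"``, which is
the argument swap mentioned above.
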
Here are some "best practices" for writing the English format string:

* Keep the string short.  It should ideally fit in the 80 column limit of the
  ``DiagnosticKinds.td`` file.  This avoids the diagnostic wrapping when
  printed, and forces you to think about the important point you are conveying
  with the diagnostic.
* Take advantage of location information.  The user will be able to see the
  line and location of the caret, so you don't need to tell them that the
  problem is with the 4th argument to the function: just point to it.
* Do not capitalize the diagnostic string, and do not end it with a period.
* If you need to quote something in the diagnostic string, use single quotes.

Diagnostics should never take random English strings as arguments: you
shouldn't use "``you have a problem with %0``" and pass in things like "``your
argument``" or "``your return value``" as arguments.  Doing this prevents
:ref:`translating <internals-diag-translation>` the Clang diagnostics to other
languages (because they'll get random English words in their otherwise
localized diagnostic).  The exceptions to this are C/C++ language keywords
(e.g., ``auto``, ``const``, ``mutable``, etc.) and C/C++ operators (``/=``).
Note that things like "pointer" and "reference" are not keywords.  On the other
hand, you *can* include anything that comes from the user's source code,
including variable names, types, labels, etc.  The "``select``" format can be
used to achieve this sort of thing in a localizable way, see below.

Formatting a Diagnostic Argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Arguments to diagnostics are fully typed internally, and come from a few
different classes: integers, types, names, and random strings.  Depending on
the class of the argument, it can be optionally formatted in different ways.
This gives the ``DiagnosticClient`` information about what the argument means
without requiring it to use a specific presentation (consider this MVC for
Clang :).

Here are the different diagnostic argument formats currently supported by
Clang:

**"s" format**

Example:
  ``"requires %1 parameter%s1"``
Class:
  Integers
Description:
  This is a simple formatter for integers that is useful when producing English
  diagnostics.  When the integer is 1, it prints as nothing.  When the integer
  is not 1, it prints as "``s``".  This allows some simple grammatical forms to
  be handled correctly, and eliminates the need to use gross things like
  ``"requires %1 parameter(s)"``.

**"select" format**

Example:
  ``"must be a %select{unary|binary|unary or binary}2 operator"``
Class:
  Integers
Description:
  This format specifier is used to merge multiple related diagnostics together
  into one common one, without requiring the difference to be specified as an
  English string argument.  Instead of specifying the string, the diagnostic
  gets an integer argument and the format string selects the numbered option.
  In this case, the "``%2``" value must be an integer in the range [0..2].  If
  it is 0, it prints "unary"; if it is 1, it prints "binary"; if it is 2, it
  prints "unary or binary".  This allows other language translations to
  substitute reasonable words (or entire phrases) based on the semantics of the
  diagnostic instead of having to do things textually.  The selected string
  does undergo formatting.

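The behavior of the "s" and "select" formats can be sketched in a few lines of
standalone C++.  ``formatS`` and ``formatSelect`` are hypothetical helpers for
illustration only; in particular, this sketch does not format the selected
string any further, which the real modifier does.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // "%s": prints nothing when the value is 1 and "s" otherwise.
  std::string formatS(int Value) { return Value == 1 ? "" : "s"; }

  // "%select{a|b|c}": Options is the text between the braces.  The
  // alternatives are split on '|' and the Index-th one is returned.
  std::string formatSelect(const std::string &Options, unsigned Index) {
    std::vector<std::string> Alternatives(1);
    for (char C : Options) {
      if (C == '|')
        Alternatives.emplace_back(); // start the next alternative
      else
        Alternatives.back() += C;
    }
    assert(Index < Alternatives.size() && "select index out of range");
    return Alternatives[Index];
  }

With these helpers, ``"parameter" + formatS(3)`` gives ``"parameters"`` and
``formatSelect("unary|binary|unary or binary", 1)`` gives ``"binary"``.
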
**"plural" format**

Example:
  ``"you have %1 %plural{1:mouse|:mice}1 connected to your computer"``
Class:
  Integers
Description:
  This is a formatter for complex plural forms.  It is designed to handle even
  the requirements of languages with very complex plural forms, as many Baltic
  languages have.  The argument consists of a series of expression/form pairs
  separated by "``|``"; within each pair, the expression and the form are
  separated by ":".  The first form whose expression evaluates to true is the
  result of the modifier.

  An expression can be empty, in which case it is always true.  See the example
  at the top.  Otherwise, it is a series of one or more numeric conditions,
  separated by ",".  If any condition matches, the expression matches.  Each
  numeric condition can take one of three forms.

  * number: A simple decimal number matches if the argument is the same as the
    number.  Example: ``"%plural{1:mouse|:mice}4"``
  * range: A range in square brackets matches if the argument is within the
    range.  The range is inclusive on both ends.  Example:
    ``"%plural{0:none|1:one|[2,5]:some|:many}2"``
  * modulo: A modulo operator is followed by a number, an equals sign and
    either a number or a range.  The tests are the same as for plain numbers
    and ranges, but the argument is taken modulo the number first.  Example:
    ``"%plural{%100=0:even hundred|%100=[1,50]:lower half|:everything else}1"``

  The parser is very unforgiving.  A syntax error, even whitespace, will abort,
  as will a failure to match the argument against any expression.

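The evaluation rules for "plural" are easier to follow with a small model.
The sketch below is a standalone approximation written for this document, not
Clang's parser: ``formatPlural`` and its helpers are hypothetical names, the
input is the text between the braces, and the form is returned verbatim without
further formatting.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <string>

  namespace {
  // Reads a decimal number at Pos and advances Pos past it.
  unsigned parseNumber(const std::string &S, std::size_t &Pos) {
    assert(Pos < S.size() && S[Pos] >= '0' && S[Pos] <= '9' && "expected digit");
    unsigned N = 0;
    while (Pos < S.size() && S[Pos] >= '0' && S[Pos] <= '9')
      N = N * 10 + static_cast<unsigned>(S[Pos++] - '0');
    return N;
  }

  // Matches Value against "N" or an inclusive range "[Low,High]".
  bool matchNumberOrRange(const std::string &S, std::size_t &Pos,
                          unsigned Value) {
    if (Pos >= S.size() || S[Pos] != '[')
      return parseNumber(S, Pos) == Value;
    ++Pos; // skip '['
    unsigned Low = parseNumber(S, Pos);
    assert(Pos < S.size() && S[Pos] == ',' && "expected ',' in range");
    ++Pos;
    unsigned High = parseNumber(S, Pos);
    assert(Pos < S.size() && S[Pos] == ']' && "expected ']' after range");
    ++Pos;
    return Low <= Value && Value <= High;
  }

  // One condition: a number, a range, or "%M=" followed by either of them.
  bool matchCondition(const std::string &S, std::size_t &Pos, unsigned Value) {
    if (Pos < S.size() && S[Pos] == '%') {
      ++Pos;
      unsigned Modulus = parseNumber(S, Pos);
      assert(Modulus != 0 && Pos < S.size() && S[Pos] == '=');
      ++Pos;
      return matchNumberOrRange(S, Pos, Value % Modulus);
    }
    return matchNumberOrRange(S, Pos, Value);
  }
  } // namespace

  // Returns the first form whose expression matches Value.
  std::string formatPlural(const std::string &Body, unsigned Value) {
    std::size_t Pos = 0;
    while (Pos < Body.size()) {
      std::size_t End = Body.find('|', Pos);
      if (End == std::string::npos)
        End = Body.size();
      std::string Pair = Body.substr(Pos, End - Pos);
      std::size_t Colon = Pair.find(':');
      assert(Colon != std::string::npos && "pair needs an 'expr:form' shape");
      std::string Expr = Pair.substr(0, Colon);
      bool Matches = Expr.empty(); // an empty expression is always true
      std::size_t P = 0;
      while (!Matches && P < Expr.size()) {
        Matches = matchCondition(Expr, P, Value);
        if (!Matches && P < Expr.size()) {
          assert(Expr[P] == ',' && "conditions are separated by ','");
          ++P;
        }
      }
      if (Matches)
        return Pair.substr(Colon + 1);
      Pos = End + 1;
    }
    assert(false && "no plural form matched the argument");
    return "";
  }

Like the parser described above, this sketch is unforgiving: malformed input
or an unmatched argument trips an assertion instead of being recovered from.
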
**"ordinal" format**

Example:
  ``"ambiguity in %ordinal0 argument"``
Class:
  Integers
Description:
  This is a formatter which represents the argument number as an ordinal: the
  value ``1`` becomes ``1st``, ``3`` becomes ``3rd``, and so on.  Values less
  than ``1`` are not supported.  This formatter is currently hard-coded to use
  English ordinals.

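A minimal sketch of English ordinal formatting follows.  ``formatOrdinal`` is a
hypothetical helper, not the function Clang uses; it applies the usual English
rule that 11, 12 and 13 take "th".

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Hypothetical helper: 1 -> "1st", 2 -> "2nd", 3 -> "3rd", 4 -> "4th", ...
  std::string formatOrdinal(unsigned Value) {
    assert(Value >= 1 && "values less than 1 are not supported");
    const char *Suffix = "th";
    unsigned LastTwo = Value % 100;
    if (LastTwo < 11 || LastTwo > 13) { // 11th, 12th and 13th keep "th"
      switch (Value % 10) {
      case 1: Suffix = "st"; break;
      case 2: Suffix = "nd"; break;
      case 3: Suffix = "rd"; break;
      default: break;
      }
    }
    return std::to_string(Value) + Suffix;
  }
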
**"objcclass" format**

Example:
  ``"method %objcclass0 not found"``
Class:
  ``DeclarationName``
Description:
  This is a simple formatter that indicates the ``DeclarationName`` corresponds
  to an Objective-C class method selector.  As such, it prints the selector
  with a leading "``+``".

**"objcinstance" format**

Example:
  ``"method %objcinstance0 not found"``
Class:
  ``DeclarationName``
Description:
  This is a simple formatter that indicates the ``DeclarationName`` corresponds
  to an Objective-C instance method selector.  As such, it prints the selector
  with a leading "``-``".

**"q" format**

Example:
  ``"candidate found by name lookup is %q0"``
Class:
  ``NamedDecl *``
Description:
  This formatter indicates that the fully-qualified name of the declaration
  should be printed, e.g., "``std::vector``" rather than "``vector``".

**"diff" format**

Example:
  ``"no known conversion %diff{from $ to $|from argument type to parameter type}1,2"``
Class:
  ``QualType``
Description:
  This formatter takes two ``QualType``\ s and attempts to print a template
  difference between the two.  If tree printing is off, the text inside the
  braces before the pipe is printed, with the formatted text replacing the $.
  If tree printing is on, the text after the pipe is printed and a type tree is
  printed after the diagnostic message.

It is really easy to add format specifiers to the Clang diagnostics system, but
they should be discussed before they are added.  If you are creating a lot of
repetitive diagnostics and/or have an idea for a useful formatter, please bring
it up on the cfe-dev mailing list.

.. _internals-producing-diag:

Producing the Diagnostic
^^^^^^^^^^^^^^^^^^^^^^^^

Now that you've created the diagnostic in the ``Diagnostic*Kinds.td`` file, you
need to write the code that detects the condition in question and emits the new
diagnostic.  Various components of Clang (e.g., the preprocessor, ``Sema``,
etc.) provide a helper function named "``Diag``".  It creates a diagnostic and
accepts the arguments, ranges, and other information that goes along with it.

For example, the binary expression error comes from code like this:

.. code-block:: c++

  if (various things that are bad)
    Diag(Loc, diag::err_typecheck_invalid_operands)
      << lex->getType() << rex->getType()
      << lex->getSourceRange() << rex->getSourceRange();

This shows the use of the ``Diag`` method: it takes a location (a
:ref:`SourceLocation <SourceLocation>` object) and a diagnostic enum value
(which matches the name from ``Diagnostic*Kinds.td``).  If the diagnostic takes
arguments, they are specified with the ``<<`` operator: the first argument
becomes ``%0``, the second becomes ``%1``, etc.  The diagnostic interface
allows you to specify arguments of many different types, including ``int`` and
``unsigned`` for integer arguments, ``const char*`` and ``std::string`` for
string arguments, ``DeclarationName`` and ``const IdentifierInfo *`` for names,
``QualType`` for types, etc.  ``SourceRange``\ s are also specified with the
``<<`` operator, but do not have a specific ordering requirement.

As you can see, adding and producing a diagnostic is pretty straightforward.
The hard part is deciding exactly what you need to say to help the user,
picking a suitable wording, and providing the information needed to format it
correctly.  The good news is that the call site that issues a diagnostic should
be completely independent of how the diagnostic is formatted and in what
language it is rendered.

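The way ``<<`` collects arguments and ranges can be modeled with a toy builder.
Everything below (``ToyDiagBuilder``, ``ToyRange``) is hypothetical and only
mirrors the shape of the interface described above; it shows that arguments are
numbered in the order they are streamed while ranges are kept separately and
do not affect that numbering.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <vector>

  // Hypothetical stand-in for a source range, as a pair of offsets.
  struct ToyRange {
    unsigned Begin;
    unsigned End;
  };

  // Hypothetical builder: collects typed arguments and highlighted ranges.
  class ToyDiagBuilder {
    std::vector<std::string> Args; // Args[0] fills %0, Args[1] fills %1, ...
    std::vector<ToyRange> Ranges;  // order is not significant

  public:
    ToyDiagBuilder &operator<<(const std::string &Arg) {
      Args.push_back(Arg);
      return *this;
    }
    ToyDiagBuilder &operator<<(const char *Arg) {
      Args.push_back(Arg);
      return *this;
    }
    ToyDiagBuilder &operator<<(int Arg) {
      Args.push_back(std::to_string(Arg));
      return *this;
    }
    ToyDiagBuilder &operator<<(ToyRange R) {
      Ranges.push_back(R);
      return *this;
    }

    const std::string &arg(std::size_t Index) const {
      assert(Index < Args.size() && "no such argument");
      return Args[Index];
    }
    std::size_t numArgs() const { return Args.size(); }
    std::size_t numRanges() const { return Ranges.size(); }
  };

Streaming ``"int *"``, a range, ``"_Complex float"`` and another range into a
builder leaves ``arg(0)`` as ``"int *"`` and ``arg(1)`` as
``"_Complex float"``, with two ranges recorded on the side.
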
Fix-It Hints
^^^^^^^^^^^^

In some cases, the front end emits diagnostics when it is clear that some small
change to the source code would fix the problem.  Examples include a missing
semicolon at the end of a statement or a use of deprecated syntax that is
easily rewritten into a more modern form.  Clang tries very hard to emit the
diagnostic and recover gracefully in these and other cases.

However, for these cases where the fix is obvious, the diagnostic can be
annotated with a hint (referred to as a "fix-it hint") that describes how to
change the code referenced by the diagnostic to fix the problem.  For example,
it might add the missing semicolon at the end of the statement or rewrite the
use of a deprecated construct into something more palatable.  Here is one such
example from the C++ front end, where we warn about the right-shift operator
changing meaning from C++98 to C++11:

.. code-block:: text

  test.cpp:3:7: warning: use of right-shift operator ('>>') in template argument
                         will require parentheses in C++11
  A<100 >> 2> *a;
        ^
    (       )

Here, the fix-it hint is suggesting that parentheses be added, and showing
exactly where those parentheses would be inserted into the source code.  The
fix-it hints themselves describe what changes to make to the source code in an
abstract manner, which the text diagnostic printer renders as a line of
"insertions" below the caret line.  :ref:`Other diagnostic clients
<DiagnosticClient>` might choose to render the code differently (e.g., as
markup inline) or even give the user the ability to automatically fix the
problem.

Fix-it hints on errors and warnings need to obey these rules:

* Since they are automatically applied if ``-Xclang -fixit`` is passed to the
  driver, they should only be used when it's very likely they match the user's
  intent.
* Clang must recover from errors as if the fix-it had been applied.

If a fix-it can't obey these rules, put the fix-it on a note.  Fix-its on notes
are not applied automatically.

All fix-it hints are described by the ``FixItHint`` class, instances of which
should be attached to the diagnostic using the ``<<`` operator in the same way
that highlighted source ranges and arguments are passed to the diagnostic.
Fix-it hints can be created with one of three constructors:

* ``FixItHint::CreateInsertion(Loc, Code)``

    Specifies that the given ``Code`` (a string) should be inserted before the
    source location ``Loc``.

* ``FixItHint::CreateRemoval(Range)``

    Specifies that the code in the given source ``Range`` should be removed.

* ``FixItHint::CreateReplacement(Range, Code)``

    Specifies that the code in the given source ``Range`` should be removed,
    and replaced with the given ``Code`` string.

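The three kinds of hint can be modeled on a plain string to show what applying
them means.  This is a toy under stated assumptions: ``ToyFixIt`` and the
helper functions are hypothetical, locations are plain character offsets,
ranges are half-open ``[Begin, End)``, and hints are assumed not to overlap or
share a start offset.  The real ``FixItHint`` works on source locations and
ranges instead.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <vector>

  // Hypothetical hint: replace the characters in [Begin, End) with Code.
  struct ToyFixIt {
    std::size_t Begin;
    std::size_t End;
    std::string Code;
  };

  // Insert Code before offset Loc.
  ToyFixIt createInsertion(std::size_t Loc, const std::string &Code) {
    return ToyFixIt{Loc, Loc, Code};
  }
  // Remove the characters in [Begin, End).
  ToyFixIt createRemoval(std::size_t Begin, std::size_t End) {
    return ToyFixIt{Begin, End, ""};
  }
  // Replace the characters in [Begin, End) with Code.
  ToyFixIt createReplacement(std::size_t Begin, std::size_t End,
                             const std::string &Code) {
    return ToyFixIt{Begin, End, Code};
  }

  // Applies the hints back to front so that earlier offsets stay valid.
  std::string applyFixIts(std::string Source, std::vector<ToyFixIt> Hints) {
    std::sort(Hints.begin(), Hints.end(),
              [](const ToyFixIt &A, const ToyFixIt &B) {
                return A.Begin > B.Begin;
              });
    for (const ToyFixIt &H : Hints) {
      assert(H.Begin <= H.End && H.End <= Source.size() && "bad range");
      Source.replace(H.Begin, H.End - H.Begin, H.Code);
    }
    return Source;
  }

Applied to the right-shift example above, inserting ``(`` before offset 2 and
``)`` before offset 10 of ``A<100 >> 2> *a;`` produces ``A<(100 >> 2)> *a;``.
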
.. _DiagnosticClient:

The ``DiagnosticClient`` Interface
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once code generates a diagnostic with all of the arguments and the rest of the
relevant information, Clang needs to know what to do with it.  As previously
mentioned, the diagnostic machinery goes through some filtering to map a
severity onto a diagnostic level, then (assuming the diagnostic is not mapped
to "``Ignored``") it invokes an object that implements the
``DiagnosticClient`` interface with the information.

It is possible to implement this interface in many different ways.  For
example, the normal Clang ``DiagnosticClient`` (named
``TextDiagnosticPrinter``) turns the arguments into strings (according to the
various formatting rules), prints out the file/line/column information and the
string, then prints out the line of code, the source ranges, and the caret.
However, this behavior isn't required.

Another implementation of the ``DiagnosticClient`` interface is the
``TextDiagnosticBuffer`` class, which is used when Clang is in ``-verify``
mode.  Instead of formatting and printing out the diagnostics, this
implementation just captures and remembers the diagnostics as they fly by.
Then ``-verify`` compares the list of produced diagnostics to the list of
expected ones.  If they disagree, it prints out its own output.  Full
documentation for the ``-verify`` mode can be found in the Clang API
documentation for `VerifyDiagnosticConsumer
</doxygen/classclang_1_1VerifyDiagnosticConsumer.html#details>`_.

There are many other possible implementations of this interface, and this is
why we prefer diagnostics to pass down rich structured information in
arguments.  For example, an HTML output might want declaration names to be
linkified to where they come from in the source.  Another example is that a GUI
might let you click on typedefs to expand them.  This application would want to
pass significantly more information about types through to the GUI than a
simple flat string.  The interface allows this to happen.

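The division of labor between the diagnostic machinery and its clients can be
sketched with a toy interface.  None of the names below are Clang's: the
``Toy*`` classes and ``report`` are hypothetical, and the message is passed as
an already formatted string purely to keep the example short, which is exactly
the flattening the real interface avoids.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  enum class ToyLevel { Ignored, Note, Remark, Warning, Error, Fatal };

  struct ToyDiagnostic {
    ToyLevel Level;
    std::string File;
    unsigned Line;
    unsigned Column;
    std::string Message; // pre-formatted here; a simplification
  };

  // Hypothetical client interface: one callback per reported diagnostic.
  class ToyDiagnosticClient {
  public:
    virtual ~ToyDiagnosticClient() = default;
    virtual void handleDiagnostic(const ToyDiagnostic &D) = 0;
  };

  // Renders "file:line:column: level: message" lines into a string.
  class ToyTextPrinter : public ToyDiagnosticClient {
  public:
    std::string Output;
    void handleDiagnostic(const ToyDiagnostic &D) override {
      static const char *const Names[] = {"ignored", "note",  "remark",
                                          "warning", "error", "fatal error"};
      assert(D.Level != ToyLevel::Ignored && "ignored diagnostics are filtered");
      Output += D.File + ":" + std::to_string(D.Line) + ":" +
                std::to_string(D.Column) + ": " +
                Names[static_cast<int>(D.Level)] + ": " + D.Message + "\n";
    }
  };

  // Remembers diagnostics instead of printing them, e.g. for later checks.
  class ToyBuffer : public ToyDiagnosticClient {
  public:
    std::vector<ToyDiagnostic> Seen;
    void handleDiagnostic(const ToyDiagnostic &D) override {
      Seen.push_back(D);
    }
  };

  // The machinery filters ignored diagnostics before invoking the client.
  void report(ToyDiagnosticClient &Client, const ToyDiagnostic &D) {
    if (D.Level == ToyLevel::Ignored)
      return;
    Client.handleDiagnostic(D);
  }

The same ``report`` call drives both clients; only the client decides whether
the diagnostic becomes text or a stored record.
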
.. _internals-diag-translation:

Adding Translations to Clang
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Not possible yet!  Diagnostic strings should be written in UTF-8; the client
can translate to the relevant code page if needed.  Each translation completely
replaces the format string for the diagnostic.

469*67e74705SXin Li.. _SourceLocation:
470*67e74705SXin Li.. _SourceManager:
471*67e74705SXin Li
472*67e74705SXin LiThe ``SourceLocation`` and ``SourceManager`` classes
473*67e74705SXin Li----------------------------------------------------
474*67e74705SXin Li
475*67e74705SXin LiStrangely enough, the ``SourceLocation`` class represents a location within the
476*67e74705SXin Lisource code of the program.  Important design points include:
477*67e74705SXin Li
478*67e74705SXin Li#. ``sizeof(SourceLocation)`` must be extremely small, as these are embedded
479*67e74705SXin Li   into many AST nodes and are passed around often.  Currently it is 32 bits.
480*67e74705SXin Li#. ``SourceLocation`` must be a simple value object that can be efficiently
481*67e74705SXin Li   copied.
482*67e74705SXin Li#. We should be able to represent a source location for any byte of any input
483*67e74705SXin Li   file.  This includes in the middle of tokens, in whitespace, in trigraphs,
484*67e74705SXin Li   etc.
485*67e74705SXin Li#. A ``SourceLocation`` must encode the current ``#include`` stack that was
486*67e74705SXin Li   active when the location was processed.  For example, if the location
487*67e74705SXin Li   corresponds to a token, it should contain the set of ``#include``\ s active
488*67e74705SXin Li   when the token was lexed.  This allows us to print the ``#include`` stack
489*67e74705SXin Li   for a diagnostic.
490*67e74705SXin Li#. ``SourceLocation`` must be able to describe macro expansions, capturing both
491*67e74705SXin Li   the ultimate instantiation point and the source of the original character
492*67e74705SXin Li   data.
493*67e74705SXin Li
494*67e74705SXin LiIn practice, the ``SourceLocation`` works together with the ``SourceManager``
495*67e74705SXin Liclass to encode two pieces of information about a location: its spelling
496*67e74705SXin Lilocation and its instantiation location.  For most tokens, these will be the
497*67e74705SXin Lisame.  However, for a macro expansion (or tokens that came from a ``_Pragma``
498*67e74705SXin Lidirective) these will describe the location of the characters corresponding to
499*67e74705SXin Lithe token and the location where the token was used (i.e., the macro
500*67e74705SXin Liinstantiation point or the location of the ``_Pragma`` itself).
501*67e74705SXin Li
502*67e74705SXin LiThe Clang front-end inherently depends on the location of a token being tracked
503*67e74705SXin Licorrectly.  If it is ever incorrect, the front-end may get confused and die.
504*67e74705SXin LiThe reason for this is that the notion of the "spelling" of a ``Token`` in
505*67e74705SXin LiClang depends on being able to find the original input characters for the
506*67e74705SXin Litoken.  This concept maps directly to the "spelling location" for the token.
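
The split between a compact handle and a manager that decodes it can be
sketched as follows.  This is a toy model rather than Clang's implementation:
``Loc``, ``ToyManager``, and the use of the high bit to mark macro locations
are assumptions made for illustration only.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Hypothetical stand-ins; these are not Clang's real classes.
  struct Loc {
    std::uint32_t Raw; // high bit set => index into the expansion table
  };
  static_assert(sizeof(Loc) == 4, "a location must stay 32 bits wide");

  class ToyManager {
    struct Expansion {
      Loc Spelling;   // where the characters of the token live
      Loc ExpandedAt; // where the macro was used
    };
    std::vector<Expansion> Table;
    static constexpr std::uint32_t MacroBit = 1u << 31;

  public:
    // A location that refers directly to a byte offset in some buffer.
    static Loc fileLoc(std::uint32_t Offset) { return Loc{Offset}; }

    // Record a macro-expanded token and hand back a compact handle.
    Loc macroLoc(Loc Spelling, Loc ExpandedAt) {
      Table.push_back({Spelling, ExpandedAt});
      return Loc{MacroBit | static_cast<std::uint32_t>(Table.size() - 1)};
    }

    static bool isMacro(Loc L) { return (L.Raw & MacroBit) != 0; }

    // For ordinary tokens both queries return the location itself.
    Loc spellingLoc(Loc L) const {
      return isMacro(L) ? Table[L.Raw & ~MacroBit].Spelling : L;
    }
    Loc expansionLoc(Loc L) const {
      return isMacro(L) ? Table[L.Raw & ~MacroBit].ExpandedAt : L;
    }
  };

The handle stays 32 bits wide and trivially copyable, and everything else
lives in the manager's side table.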
507*67e74705SXin Li
508*67e74705SXin Li``SourceRange`` and ``CharSourceRange``
509*67e74705SXin Li---------------------------------------
510*67e74705SXin Li
511*67e74705SXin Li.. mostly taken from http://lists.llvm.org/pipermail/cfe-dev/2010-August/010595.html
512*67e74705SXin Li
513*67e74705SXin LiClang represents most source ranges by [first, last], where "first" and "last"
514*67e74705SXin Lieach point to the beginning of their respective tokens.  For example consider
515*67e74705SXin Lithe ``SourceRange`` of the following statement:
516*67e74705SXin Li
517*67e74705SXin Li.. code-block:: text
518*67e74705SXin Li
519*67e74705SXin Li  x = foo + bar;
520*67e74705SXin Li  ^first    ^last
521*67e74705SXin Li
522*67e74705SXin LiTo map from this representation to a character-based representation, the "last"
523*67e74705SXin Lilocation needs to be adjusted to point to (or past) the end of that token with
524*67e74705SXin Lieither ``Lexer::MeasureTokenLength()`` or ``Lexer::getLocForEndOfToken()``.  For
525*67e74705SXin Lithe rare cases where character-level source ranges information is needed we use
526*67e74705SXin Lithe ``CharSourceRange`` class.
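
As a rough illustration of that adjustment, the sketch below converts a token
range into a half-open character range over a plain string.  The names
``measureToken``, ``TokenRange``, ``CharRange``, and ``toCharRange`` are
hypothetical, and the token measurement is deliberately simplistic; real code
would rely on the ``Lexer`` helpers named above.

.. code-block:: c++

  #include <cassert>
  #include <cctype>
  #include <cstddef>
  #include <string>

  // Hypothetical helper: length of the token starting at Offset.  A real
  // implementation would re-lex the buffer instead of guessing.
  std::size_t measureToken(const std::string &Buf, std::size_t Offset) {
    std::size_t End = Offset;
    auto IsIdent = [](unsigned char C) { return std::isalnum(C) || C == '_'; };
    if (End < Buf.size() && IsIdent(Buf[End])) {
      while (End < Buf.size() && IsIdent(Buf[End]))
        ++End;
    } else if (End < Buf.size()) {
      ++End; // treat any other character as a one-character token
    }
    return End - Offset;
  }

  struct TokenRange { std::size_t First, Last; }; // both are token starts
  struct CharRange { std::size_t Begin, End; };   // half-open [Begin, End)

  // Move the end of the range past the last token.
  CharRange toCharRange(const std::string &Buf, TokenRange R) {
    return {R.First, R.Last + measureToken(Buf, R.Last)};
  }

For the statement above, the token range ``{0, 10}`` (the starts of ``x`` and
``bar``) maps to the character range covering ``x = foo + bar``.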
527*67e74705SXin Li
528*67e74705SXin LiThe Driver Library
529*67e74705SXin Li==================
530*67e74705SXin Li
531*67e74705SXin LiThe clang Driver and library are documented :doc:`here <DriverInternals>`.
532*67e74705SXin Li
533*67e74705SXin LiPrecompiled Headers
534*67e74705SXin Li===================
535*67e74705SXin Li
536*67e74705SXin LiClang supports two implementations of precompiled headers.  The default
implementation, precompiled headers (:doc:`PCH <PCHInternals>`), uses a
538*67e74705SXin Liserialized representation of Clang's internal data structures, encoded with the
539*67e74705SXin Li`LLVM bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
540*67e74705SXin LiPretokenized headers (:doc:`PTH <PTHInternals>`), on the other hand, contain a
541*67e74705SXin Liserialized representation of the tokens encountered when preprocessing a header
542*67e74705SXin Li(and anything that header includes).
543*67e74705SXin Li
544*67e74705SXin LiThe Frontend Library
545*67e74705SXin Li====================
546*67e74705SXin Li
547*67e74705SXin LiThe Frontend library contains functionality useful for building tools on top of
548*67e74705SXin Lithe Clang libraries, for example several methods for outputting diagnostics.
549*67e74705SXin Li
550*67e74705SXin LiThe Lexer and Preprocessor Library
551*67e74705SXin Li==================================
552*67e74705SXin Li
553*67e74705SXin LiThe Lexer library contains several tightly-connected classes that are involved
554*67e74705SXin Liwith the nasty process of lexing and preprocessing C source code.  The main
555*67e74705SXin Liinterface to this library for outside clients is the large ``Preprocessor``
556*67e74705SXin Liclass.  It contains the various pieces of state that are required to coherently
557*67e74705SXin Liread tokens out of a translation unit.
558*67e74705SXin Li
559*67e74705SXin LiThe core interface to the ``Preprocessor`` object (once it is set up) is the
560*67e74705SXin Li``Preprocessor::Lex`` method, which returns the next :ref:`Token <Token>` from
561*67e74705SXin Lithe preprocessor stream.  There are two types of token providers that the
562*67e74705SXin Lipreprocessor is capable of reading from: a buffer lexer (provided by the
563*67e74705SXin Li:ref:`Lexer <Lexer>` class) and a buffered token stream (provided by the
564*67e74705SXin Li:ref:`TokenLexer <TokenLexer>` class).
565*67e74705SXin Li
566*67e74705SXin Li.. _Token:
567*67e74705SXin Li
568*67e74705SXin LiThe Token class
569*67e74705SXin Li---------------
570*67e74705SXin Li
571*67e74705SXin LiThe ``Token`` class is used to represent a single lexed token.  Tokens are
intended to be used by the lexer/preprocessor and parser libraries, but are not
573*67e74705SXin Liintended to live beyond them (for example, they should not live in the ASTs).
574*67e74705SXin Li
575*67e74705SXin LiTokens most often live on the stack (or some other location that is efficient
576*67e74705SXin Lito access) as the parser is running, but occasionally do get buffered up.  For
577*67e74705SXin Liexample, macro definitions are stored as a series of tokens, and the C++
578*67e74705SXin Lifront-end periodically needs to buffer tokens up for tentative parsing and
579*67e74705SXin Livarious pieces of look-ahead.  As such, the size of a ``Token`` matters.  On a
580*67e74705SXin Li32-bit system, ``sizeof(Token)`` is currently 16 bytes.
581*67e74705SXin Li
582*67e74705SXin LiTokens occur in two forms: :ref:`annotation tokens <AnnotationToken>` and
normal tokens.  Normal tokens are those returned by the lexer; annotation
584*67e74705SXin Litokens represent semantic information and are produced by the parser, replacing
585*67e74705SXin Linormal tokens in the token stream.  Normal tokens contain the following
586*67e74705SXin Liinformation:
587*67e74705SXin Li
588*67e74705SXin Li* **A SourceLocation** --- This indicates the location of the start of the
589*67e74705SXin Li  token.
590*67e74705SXin Li
591*67e74705SXin Li* **A length** --- This stores the length of the token as stored in the
592*67e74705SXin Li  ``SourceBuffer``.  For tokens that include them, this length includes
593*67e74705SXin Li  trigraphs and escaped newlines which are ignored by later phases of the
594*67e74705SXin Li  compiler.  By pointing into the original source buffer, it is always possible
595*67e74705SXin Li  to get the original spelling of a token completely accurately.
596*67e74705SXin Li
597*67e74705SXin Li* **IdentifierInfo** --- If a token takes the form of an identifier, and if
598*67e74705SXin Li  identifier lookup was enabled when the token was lexed (e.g., the lexer was
599*67e74705SXin Li  not reading in "raw" mode) this contains a pointer to the unique hash value
600*67e74705SXin Li  for the identifier.  Because the lookup happens before keyword
601*67e74705SXin Li  identification, this field is set even for language keywords like "``for``".
602*67e74705SXin Li
603*67e74705SXin Li* **TokenKind** --- This indicates the kind of token as classified by the
604*67e74705SXin Li  lexer.  This includes things like ``tok::starequal`` (for the "``*=``"
605*67e74705SXin Li  operator), ``tok::ampamp`` for the "``&&``" token, and keyword values (e.g.,
606*67e74705SXin Li  ``tok::kw_for``) for identifiers that correspond to keywords.  Note that
607*67e74705SXin Li  some tokens can be spelled multiple ways.  For example, C++ supports
608*67e74705SXin Li  "operator keywords", where things like "``and``" are treated exactly like the
609*67e74705SXin Li  "``&&``" operator.  In these cases, the kind value is set to ``tok::ampamp``,
610*67e74705SXin Li  which is good for the parser, which doesn't have to consider both forms.  For
611*67e74705SXin Li  something that cares about which form is used (e.g., the preprocessor
612*67e74705SXin Li  "stringize" operator) the spelling indicates the original form.
613*67e74705SXin Li
614*67e74705SXin Li* **Flags** --- There are currently four flags tracked by the
615*67e74705SXin Li  lexer/preprocessor system on a per-token basis:
616*67e74705SXin Li
617*67e74705SXin Li  #. **StartOfLine** --- This was the first token that occurred on its input
618*67e74705SXin Li     source line.
619*67e74705SXin Li  #. **LeadingSpace** --- There was a space character either immediately before
620*67e74705SXin Li     the token or transitively before the token as it was expanded through a
     macro.  The definition of this flag is closely tied to the
     stringizing requirements of the preprocessor.
623*67e74705SXin Li  #. **DisableExpand** --- This flag is used internally to the preprocessor to
624*67e74705SXin Li     represent identifier tokens which have macro expansion disabled.  This
625*67e74705SXin Li     prevents them from being considered as candidates for macro expansion ever
626*67e74705SXin Li     in the future.
627*67e74705SXin Li  #. **NeedsCleaning** --- This flag is set if the original spelling for the
628*67e74705SXin Li     token includes a trigraph or escaped newline.  Since this is uncommon,
629*67e74705SXin Li     many pieces of code can fast-path on tokens that did not need cleaning.
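
The fields listed above can be pictured as a small struct with a flag
bitmask.  The sketch below is a hypothetical miniature: ``MiniToken``, its
field widths, and ``canUseRawSpelling`` are invented for illustration and do
not match the layout of Clang's ``Token``.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Hypothetical miniature of a lexed token; names and widths are
  // illustrative only.
  struct MiniToken {
    enum Flag : std::uint8_t {
      StartOfLine   = 1 << 0,
      LeadingSpace  = 1 << 1,
      DisableExpand = 1 << 2,
      NeedsCleaning = 1 << 3,
    };

    std::uint32_t Loc;    // start of the token
    std::uint32_t Length; // length as stored in the source buffer
    const void *Ident;    // identifier information, or nullptr
    std::uint16_t Kind;   // token kind
    std::uint8_t Flags;   // bitwise OR of Flag values

    void setFlag(Flag F) { Flags |= F; }
    void clearFlag(Flag F) { Flags &= static_cast<std::uint8_t>(~F); }
    bool hasFlag(Flag F) const { return (Flags & F) != 0; }
  };

  // Callers can take a fast path when the spelling needs no cleaning.
  bool canUseRawSpelling(const MiniToken &T) {
    return !T.hasFlag(MiniToken::NeedsCleaning);
  }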
630*67e74705SXin Li
631*67e74705SXin LiOne interesting (and somewhat unusual) aspect of normal tokens is that they
632*67e74705SXin Lidon't contain any semantic information about the lexed value.  For example, if
633*67e74705SXin Lithe token was a pp-number token, we do not represent the value of the number
634*67e74705SXin Lithat was lexed (this is left for later pieces of code to decide).
635*67e74705SXin LiAdditionally, the lexer library has no notion of typedef names vs variable
636*67e74705SXin Linames: both are returned as identifiers, and the parser is left to decide
637*67e74705SXin Liwhether a specific identifier is a typedef or a variable (tracking this
638*67e74705SXin Lirequires scope information among other things).  The parser can do this
639*67e74705SXin Litranslation by replacing tokens returned by the preprocessor with "Annotation
640*67e74705SXin LiTokens".
641*67e74705SXin Li
642*67e74705SXin Li.. _AnnotationToken:
643*67e74705SXin Li
644*67e74705SXin LiAnnotation Tokens
645*67e74705SXin Li-----------------
646*67e74705SXin Li
647*67e74705SXin LiAnnotation tokens are tokens that are synthesized by the parser and injected
648*67e74705SXin Liinto the preprocessor's token stream (replacing existing tokens) to record
649*67e74705SXin Lisemantic information found by the parser.  For example, if "``foo``" is found
650*67e74705SXin Lito be a typedef, the "``foo``" ``tok::identifier`` token is replaced with an
651*67e74705SXin Li``tok::annot_typename``.  This is useful for a couple of reasons: 1) this makes
652*67e74705SXin Liit easy to handle qualified type names (e.g., "``foo::bar::baz<42>::t``") in
653*67e74705SXin LiC++ as a single "token" in the parser.  2) if the parser backtracks, the
654*67e74705SXin Lireparse does not need to redo semantic analysis to determine whether a token
655*67e74705SXin Lisequence is a variable, type, template, etc.
656*67e74705SXin Li
657*67e74705SXin LiAnnotation tokens are created by the parser and reinjected into the parser's
658*67e74705SXin Litoken stream (when backtracking is enabled).  Because they can only exist in
659*67e74705SXin Litokens that the preprocessor-proper is done with, it doesn't need to keep
660*67e74705SXin Liaround flags like "start of line" that the preprocessor uses to do its job.
661*67e74705SXin LiAdditionally, an annotation token may "cover" a sequence of preprocessor tokens
662*67e74705SXin Li(e.g., "``a::b::c``" is five preprocessor tokens).  As such, the valid fields
663*67e74705SXin Liof an annotation token are different than the fields for a normal token (but
664*67e74705SXin Lithey are multiplexed into the normal ``Token`` fields):
665*67e74705SXin Li
666*67e74705SXin Li* **SourceLocation "Location"** --- The ``SourceLocation`` for the annotation
667*67e74705SXin Li  token indicates the first token replaced by the annotation token.  In the
668*67e74705SXin Li  example above, it would be the location of the "``a``" identifier.
669*67e74705SXin Li* **SourceLocation "AnnotationEndLoc"** --- This holds the location of the last
670*67e74705SXin Li  token replaced with the annotation token.  In the example above, it would be
671*67e74705SXin Li  the location of the "``c``" identifier.
672*67e74705SXin Li* **void* "AnnotationValue"** --- This contains an opaque object that the
673*67e74705SXin Li  parser gets from ``Sema``.  The parser merely preserves the information for
674*67e74705SXin Li  ``Sema`` to later interpret based on the annotation token kind.
675*67e74705SXin Li* **TokenKind "Kind"** --- This indicates the kind of Annotation token this is.
676*67e74705SXin Li  See below for the different valid kinds.
677*67e74705SXin Li
678*67e74705SXin LiAnnotation tokens currently come in three kinds:
679*67e74705SXin Li
680*67e74705SXin Li#. **tok::annot_typename**: This annotation token represents a resolved
681*67e74705SXin Li   typename token that is potentially qualified.  The ``AnnotationValue`` field
682*67e74705SXin Li   contains the ``QualType`` returned by ``Sema::getTypeName()``, possibly with
683*67e74705SXin Li   source location information attached.
684*67e74705SXin Li#. **tok::annot_cxxscope**: This annotation token represents a C++ scope
685*67e74705SXin Li   specifier, such as "``A::B::``".  This corresponds to the grammar
686*67e74705SXin Li   productions "*::*" and "*:: [opt] nested-name-specifier*".  The
687*67e74705SXin Li   ``AnnotationValue`` pointer is a ``NestedNameSpecifier *`` returned by the
688*67e74705SXin Li   ``Sema::ActOnCXXGlobalScopeSpecifier`` and
689*67e74705SXin Li   ``Sema::ActOnCXXNestedNameSpecifier`` callbacks.
690*67e74705SXin Li#. **tok::annot_template_id**: This annotation token represents a C++
691*67e74705SXin Li   template-id such as "``foo<int, 4>``", where "``foo``" is the name of a
692*67e74705SXin Li   template.  The ``AnnotationValue`` pointer is a pointer to a ``malloc``'d
693*67e74705SXin Li   ``TemplateIdAnnotation`` object.  Depending on the context, a parsed
694*67e74705SXin Li   template-id that names a type might become a typename annotation token (if
695*67e74705SXin Li   all we care about is the named type, e.g., because it occurs in a type
696*67e74705SXin Li   specifier) or might remain a template-id token (if we want to retain more
697*67e74705SXin Li   source location information or produce a new type, e.g., in a declaration of
698*67e74705SXin Li   a class template specialization).  template-id annotation tokens that refer
699*67e74705SXin Li   to a type can be "upgraded" to typename annotation tokens by the parser.
700*67e74705SXin Li
As mentioned above, annotation tokens are not returned by the preprocessor;
702*67e74705SXin Lithey are formed on demand by the parser.  This means that the parser has to be
703*67e74705SXin Liaware of cases where an annotation could occur and form it where appropriate.
704*67e74705SXin LiThis is somewhat similar to how the parser handles Translation Phase 6 of C99:
705*67e74705SXin LiString Concatenation (see C99 5.1.1.2).  In the case of string concatenation,
706*67e74705SXin Lithe preprocessor just returns distinct ``tok::string_literal`` and
707*67e74705SXin Li``tok::wide_string_literal`` tokens and the parser eats a sequence of them
708*67e74705SXin Liwherever the grammar indicates that a string literal can occur.
709*67e74705SXin Li
710*67e74705SXin LiIn order to do this, whenever the parser expects a ``tok::identifier`` or
711*67e74705SXin Li``tok::coloncolon``, it should call the ``TryAnnotateTypeOrScopeToken`` or
712*67e74705SXin Li``TryAnnotateCXXScopeToken`` methods to form the annotation token.  These
713*67e74705SXin Limethods will maximally form the specified annotation tokens and replace the
current token with them, if applicable.  If the current token is not valid for
715*67e74705SXin Lian annotation token, it will remain an identifier or "``::``" token.
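
To make the replacement step concrete, here is a toy version of forming a
scope annotation over a token vector.  ``Tok``, ``Kind``, and
``tryAnnotateScope`` are hypothetical names; the real parser works against the
preprocessor's token stream and consults ``Sema``, neither of which is
modelled here.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <vector>

  // Hypothetical token model, not Clang's Token class.
  enum class Kind { Identifier, ColonColon, AnnotScope, Other };

  struct Tok {
    Kind K;
    std::string Text;
    std::size_t Begin; // location of the first covered token
    std::size_t End;   // location of the last covered token
  };

  // If the tokens at Pos start a scope specifier such as "a::b::", replace
  // them with one AnnotScope token.  Returns true if an annotation was formed.
  bool tryAnnotateScope(std::vector<Tok> &Toks, std::size_t Pos) {
    std::size_t I = Pos;
    std::string Spelling;
    // Greedily consume "identifier ::" pairs to form the maximal annotation.
    while (I + 1 < Toks.size() && Toks[I].K == Kind::Identifier &&
           Toks[I + 1].K == Kind::ColonColon) {
      Spelling += Toks[I].Text + "::";
      I += 2;
    }
    if (I == Pos)
      return false; // nothing to annotate; the token stays as it was

    Tok Annot{Kind::AnnotScope, Spelling, Toks[Pos].Begin, Toks[I - 1].End};
    Toks.erase(Toks.begin() + Pos, Toks.begin() + I);
    Toks.insert(Toks.begin() + Pos, Annot);
    return true;
  }

The annotation records the locations of the first and last tokens it covers,
mirroring the ``Location`` and ``AnnotationEndLoc`` fields described earlier.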
716*67e74705SXin Li
717*67e74705SXin Li.. _Lexer:
718*67e74705SXin Li
719*67e74705SXin LiThe ``Lexer`` class
720*67e74705SXin Li-------------------
721*67e74705SXin Li
722*67e74705SXin LiThe ``Lexer`` class provides the mechanics of lexing tokens out of a source
723*67e74705SXin Libuffer and deciding what they mean.  The ``Lexer`` is complicated by the fact
724*67e74705SXin Lithat it operates on raw buffers that have not had spelling eliminated (this is
725*67e74705SXin Lia necessity to get decent performance), but this is countered with careful
726*67e74705SXin Licoding as well as standard performance techniques (for example, the comment
727*67e74705SXin Lihandling code is vectorized on X86 and PowerPC hosts).
728*67e74705SXin Li
729*67e74705SXin LiThe lexer has a couple of interesting modal features:
730*67e74705SXin Li
731*67e74705SXin Li* The lexer can operate in "raw" mode.  This mode has several features that
732*67e74705SXin Li  make it possible to quickly lex the file (e.g., it stops identifier lookup,
733*67e74705SXin Li  doesn't specially handle preprocessor tokens, handles EOF differently, etc).
734*67e74705SXin Li  This mode is used for lexing within an "``#if 0``" block, for example.
735*67e74705SXin Li* The lexer can capture and return comments as tokens.  This is required to
736*67e74705SXin Li  support the ``-C`` preprocessor mode, which passes comments through, and is
  used by the diagnostic checker to identify expect-error annotations.
738*67e74705SXin Li* The lexer can be in ``ParsingFilename`` mode, which happens when
739*67e74705SXin Li  preprocessing after reading a ``#include`` directive.  This mode changes the
740*67e74705SXin Li  parsing of "``<``" to return an "angled string" instead of a bunch of tokens
741*67e74705SXin Li  for each thing within the filename.
742*67e74705SXin Li* When parsing a preprocessor directive (after "``#``") the
  ``ParsingPreprocessorDirective`` mode is entered.  This changes the lexer to
744*67e74705SXin Li  return EOD at a newline.
745*67e74705SXin Li* The ``Lexer`` uses a ``LangOptions`` object to know whether trigraphs are
746*67e74705SXin Li  enabled, whether C++ or ObjC keywords are recognized, etc.
747*67e74705SXin Li
748*67e74705SXin LiIn addition to these modes, the lexer keeps track of a couple of other features
749*67e74705SXin Lithat are local to a lexed buffer, which change as the buffer is lexed:
750*67e74705SXin Li
751*67e74705SXin Li* The ``Lexer`` uses ``BufferPtr`` to keep track of the current character being
752*67e74705SXin Li  lexed.
753*67e74705SXin Li* The ``Lexer`` uses ``IsAtStartOfLine`` to keep track of whether the next
754*67e74705SXin Li  lexed token will start with its "start of line" bit set.
755*67e74705SXin Li* The ``Lexer`` keeps track of the current "``#if``" directives that are active
756*67e74705SXin Li  (which can be nested).
* The ``Lexer`` keeps track of a :ref:`MultipleIncludeOpt
758*67e74705SXin Li  <MultipleIncludeOpt>` object, which is used to detect whether the buffer uses
759*67e74705SXin Li  the standard "``#ifndef XX`` / ``#define XX``" idiom to prevent multiple
760*67e74705SXin Li  inclusion.  If a buffer does, subsequent includes can be ignored if the
761*67e74705SXin Li  "``XX``" macro is defined.
762*67e74705SXin Li
763*67e74705SXin Li.. _TokenLexer:
764*67e74705SXin Li
765*67e74705SXin LiThe ``TokenLexer`` class
766*67e74705SXin Li------------------------
767*67e74705SXin Li
768*67e74705SXin LiThe ``TokenLexer`` class is a token provider that returns tokens from a list of
tokens that came from somewhere else.  It is typically used for two things: 1)
returning tokens from a macro definition as it is being expanded, and 2)
returning tokens from an arbitrary buffer of tokens.  The latter use is used by
772*67e74705SXin Li``_Pragma`` and will most likely be used to handle unbounded look-ahead for the
773*67e74705SXin LiC++ parser.
774*67e74705SXin Li
775*67e74705SXin Li.. _MultipleIncludeOpt:
776*67e74705SXin Li
777*67e74705SXin LiThe ``MultipleIncludeOpt`` class
778*67e74705SXin Li--------------------------------
779*67e74705SXin Li
780*67e74705SXin LiThe ``MultipleIncludeOpt`` class implements a really simple little state
781*67e74705SXin Limachine that is used to detect the standard "``#ifndef XX`` / ``#define XX``"
782*67e74705SXin Liidiom that people typically use to prevent multiple inclusion of headers.  If a
783*67e74705SXin Libuffer uses this idiom and is subsequently ``#include``'d, the preprocessor can
784*67e74705SXin Lisimply check to see whether the guarding condition is defined or not.  If so,
785*67e74705SXin Lithe preprocessor can completely ignore the include of the header.
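
A state machine of this kind might look roughly like the sketch below.  The
class name ``GuardDetector`` and its event methods are assumptions made for
illustration; they are not the interface of the real ``MultipleIncludeOpt``
class.

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Hypothetical include-guard detector.  The caller reports events as it
  // walks one header from top to bottom.
  class GuardDetector {
    bool InGuard = false; // currently inside the top-level #ifndef?
    bool Closed = false;  // matching #endif already seen
    bool Invalid = false; // the idiom was violated somewhere
    std::string Macro;

  public:
    // Called for a top-level "#ifndef Name".
    void enterTopLevelIfndef(const std::string &Name) {
      if (InGuard || Closed) {
        Invalid = true; // a second top-level conditional breaks the idiom
        return;
      }
      InGuard = true;
      Macro = Name;
    }

    // Called for the "#endif" that closes the top-level conditional.
    void exitTopLevelConditional() {
      if (!InGuard) {
        Invalid = true;
        return;
      }
      InGuard = false;
      Closed = true;
    }

    // Called for every other token; only tokens outside the guard matter.
    void sawToken() {
      if (!InGuard)
        Invalid = true;
    }

    // At end of file: the guard macro, or "" if the idiom was not used.
    std::string guardMacro() const {
      return (!Invalid && Closed) ? Macro : std::string();
    }
  };

If ``guardMacro()`` returns a name and that macro is still defined on a later
``#include``, the header does not need to be read again.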
786*67e74705SXin Li
787*67e74705SXin Li.. _Parser:
788*67e74705SXin Li
789*67e74705SXin LiThe Parser Library
790*67e74705SXin Li==================
791*67e74705SXin Li
792*67e74705SXin LiThis library contains a recursive-descent parser that polls tokens from the
793*67e74705SXin Lipreprocessor and notifies a client of the parsing progress.
794*67e74705SXin Li
795*67e74705SXin LiHistorically, the parser used to talk to an abstract ``Action`` interface that
796*67e74705SXin Lihad virtual methods for parse events, for example ``ActOnBinOp()``.  When Clang
797*67e74705SXin Ligrew C++ support, the parser stopped supporting general ``Action`` clients --
it now always talks to the :ref:`Sema library <Sema>`.  However, the Parser
799*67e74705SXin Listill accesses AST objects only through opaque types like ``ExprResult`` and
800*67e74705SXin Li``StmtResult``.  Only :ref:`Sema <Sema>` looks at the AST node contents of these
801*67e74705SXin Liwrappers.
802*67e74705SXin Li
803*67e74705SXin Li.. _AST:
804*67e74705SXin Li
805*67e74705SXin LiThe AST Library
806*67e74705SXin Li===============
807*67e74705SXin Li
808*67e74705SXin Li.. _Type:
809*67e74705SXin Li
810*67e74705SXin LiThe ``Type`` class and its subclasses
811*67e74705SXin Li-------------------------------------
812*67e74705SXin Li
The ``Type`` class and its subclasses are an important part of the AST.
814*67e74705SXin LiTypes are accessed through the ``ASTContext`` class, which implicitly creates
815*67e74705SXin Liand uniques them as they are needed.  Types have a couple of non-obvious
816*67e74705SXin Lifeatures: 1) they do not capture type qualifiers like ``const`` or ``volatile``
817*67e74705SXin Li(see :ref:`QualType <QualType>`), and 2) they implicitly capture typedef
818*67e74705SXin Liinformation.  Once created, types are immutable (unlike decls).
819*67e74705SXin Li
820*67e74705SXin LiTypedefs in C make semantic analysis a bit more complex than it would be without
821*67e74705SXin Lithem.  The issue is that we want to capture typedef information and represent it
822*67e74705SXin Liin the AST perfectly, but the semantics of operations need to "see through"
823*67e74705SXin Litypedefs.  For example, consider this code:
824*67e74705SXin Li
825*67e74705SXin Li.. code-block:: c++
826*67e74705SXin Li
827*67e74705SXin Li  void func() {
828*67e74705SXin Li    typedef int foo;
829*67e74705SXin Li    foo X, *Y;
830*67e74705SXin Li    typedef foo *bar;
831*67e74705SXin Li    bar Z;
832*67e74705SXin Li    *X; // error
833*67e74705SXin Li    **Y; // error
834*67e74705SXin Li    **Z; // error
835*67e74705SXin Li  }
836*67e74705SXin Li
837*67e74705SXin LiThe code above is illegal, and thus we expect there to be diagnostics emitted
838*67e74705SXin Lion the annotated lines.  In this example, we expect to get:
839*67e74705SXin Li
840*67e74705SXin Li.. code-block:: text
841*67e74705SXin Li
842*67e74705SXin Li  test.c:6:1: error: indirection requires pointer operand ('foo' invalid)
843*67e74705SXin Li    *X; // error
844*67e74705SXin Li    ^~
845*67e74705SXin Li  test.c:7:1: error: indirection requires pointer operand ('foo' invalid)
846*67e74705SXin Li    **Y; // error
847*67e74705SXin Li    ^~~
848*67e74705SXin Li  test.c:8:1: error: indirection requires pointer operand ('foo' invalid)
849*67e74705SXin Li    **Z; // error
850*67e74705SXin Li    ^~~
851*67e74705SXin Li
852*67e74705SXin LiWhile this example is somewhat silly, it illustrates the point: we want to
853*67e74705SXin Liretain typedef information where possible, so that we can emit errors about
854*67e74705SXin Li"``std::string``" instead of "``std::basic_string<char, std:...``".  Doing this
855*67e74705SXin Lirequires properly keeping typedef information (for example, the type of ``X``
856*67e74705SXin Liis "``foo``", not "``int``"), and requires properly propagating it through the
857*67e74705SXin Livarious operators (for example, the type of ``*Y`` is "``foo``", not
858*67e74705SXin Li"``int``").  In order to retain this information, the type of these expressions
859*67e74705SXin Liis an instance of the ``TypedefType`` class, which indicates that the type of
860*67e74705SXin Lithese expressions is a typedef for "``foo``".
861*67e74705SXin Li
862*67e74705SXin LiRepresenting types like this is great for diagnostics, because the
863*67e74705SXin Liuser-specified type is always immediately available.  There are two problems
864*67e74705SXin Liwith this: first, various semantic checks need to make judgements about the
865*67e74705SXin Li*actual structure* of a type, ignoring typedefs.  Second, we need an efficient
866*67e74705SXin Liway to query whether two types are structurally identical to each other,
867*67e74705SXin Liignoring typedefs.  The solution to both of these problems is the idea of
868*67e74705SXin Licanonical types.
869*67e74705SXin Li
870*67e74705SXin LiCanonical Types
871*67e74705SXin Li^^^^^^^^^^^^^^^
872*67e74705SXin Li
873*67e74705SXin LiEvery instance of the ``Type`` class contains a canonical type pointer.  For
874*67e74705SXin Lisimple types with no typedefs involved (e.g., "``int``", "``int*``",
875*67e74705SXin Li"``int**``"), the type just points to itself.  For types that have a typedef
876*67e74705SXin Lisomewhere in their structure (e.g., "``foo``", "``foo*``", "``foo**``",
877*67e74705SXin Li"``bar``"), the canonical type pointer points to their structurally equivalent
878*67e74705SXin Litype without any typedefs (e.g., "``int``", "``int*``", "``int**``", and
879*67e74705SXin Li"``int*``" respectively).
880*67e74705SXin Li
881*67e74705SXin LiThis design provides a constant time operation (dereferencing the canonical type
882*67e74705SXin Lipointer) that gives us access to the structure of types.  For example, we can
883*67e74705SXin Litrivially tell that "``bar``" and "``foo*``" are the same type by dereferencing
884*67e74705SXin Litheir canonical type pointers and doing a pointer comparison (they both point
885*67e74705SXin Lito the single "``int*``" type).
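
The comparison can be sketched with a handful of hand-built nodes.  ``Ty``,
``sameType``, and ``Example`` are hypothetical names; in Clang the nodes are
created and uniqued by ``ASTContext`` rather than wired up by hand.

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Hypothetical miniature type nodes; not Clang's Type hierarchy.
  struct Ty {
    enum KindTy { Builtin, Pointer, Typedef } Kind;
    std::string Name;    // used by Builtin and Typedef nodes
    const Ty *Inner;     // pointee (Pointer) or underlying type (Typedef)
    const Ty *Canonical; // structurally equivalent type with no typedefs
  };

  // Structural identity is a single pointer comparison.
  bool sameType(const Ty &A, const Ty &B) { return A.Canonical == B.Canonical; }

  // The types from the running example, wired up by hand.
  struct Example {
    Ty Int{Ty::Builtin, "int", nullptr, &Int};    // canonical: itself
    Ty IntPtr{Ty::Pointer, "", &Int, &IntPtr};    // canonical: itself
    Ty Foo{Ty::Typedef, "foo", &Int, &Int};       // typedef int foo;
    Ty FooPtr{Ty::Pointer, "", &Foo, &IntPtr};    // foo *
    Ty Bar{Ty::Typedef, "bar", &FooPtr, &IntPtr}; // typedef foo *bar;
  };

With these nodes, ``sameType(E.Bar, E.FooPtr)`` holds for an ``Example E``
because both canonical pointers refer to the single ``int*`` node.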
886*67e74705SXin Li
887*67e74705SXin LiCanonical types and typedef types bring up some complexities that must be
888*67e74705SXin Licarefully managed.  Specifically, the ``isa``/``cast``/``dyn_cast`` operators
889*67e74705SXin Ligenerally shouldn't be used in code that is inspecting the AST.  For example,
890*67e74705SXin Liwhen type checking the indirection operator (unary "``*``" on a pointer), the
891*67e74705SXin Litype checker must verify that the operand has a pointer type.  It would not be
892*67e74705SXin Licorrect to check that with "``isa<PointerType>(SubExpr->getType())``", because
893*67e74705SXin Lithis predicate would fail if the subexpression had a typedef type.
894*67e74705SXin Li
The solution to this problem is a set of helper methods on ``Type``, used to
896*67e74705SXin Licheck their properties.  In this case, it would be correct to use
897*67e74705SXin Li"``SubExpr->getType()->isPointerType()``" to do the check.  This predicate will
898*67e74705SXin Lireturn true if the *canonical type is a pointer*, which is true any time the
899*67e74705SXin Litype is structurally a pointer type.  The only hard part here is remembering
900*67e74705SXin Linot to use the ``isa``/``cast``/``dyn_cast`` operations.
901*67e74705SXin Li
902*67e74705SXin LiThe second problem we face is how to get access to the pointer type once we
903*67e74705SXin Liknow it exists.  To continue the example, the result type of the indirection
904*67e74705SXin Lioperator is the pointee type of the subexpression.  In order to determine the
905*67e74705SXin Litype, we need to get the instance of ``PointerType`` that best captures the
906*67e74705SXin Litypedef information in the program.  If the type of the expression is literally
907*67e74705SXin Lia ``PointerType``, we can return that, otherwise we have to dig through the
908*67e74705SXin Litypedefs to find the pointer type.  For example, if the subexpression had type
909*67e74705SXin Li"``foo*``", we could return that type as the result.  If the subexpression had
910*67e74705SXin Litype "``bar``", we want to return "``foo*``" (note that we do *not* want
911*67e74705SXin Li"``int*``").  In order to provide all of this, ``Type`` has a
912*67e74705SXin Li``getAsPointerType()`` method that checks whether the type is structurally a
913*67e74705SXin Li``PointerType`` and, if so, returns the best one.  If not, it returns a null
914*67e74705SXin Lipointer.
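
The difference between checking the node kind directly and going through the
helpers can be seen in a small model.  ``Ty``, ``isPointerType``, and
``getAsPointer`` below are hypothetical free-standing analogues of the helper
methods described above, not Clang's own code.

.. code-block:: c++

  #include <cassert>

  // Hypothetical miniature type nodes; not Clang's Type hierarchy.
  struct Ty {
    enum KindTy { Builtin, Pointer, Typedef } Kind;
    const Ty *Inner;     // pointee (Pointer) or underlying type (Typedef)
    const Ty *Canonical; // structurally equivalent type with no typedefs
  };

  // Analogue of the helper predicate: inspect the canonical type, not the
  // node itself, so typedefs of pointers are accepted.
  bool isPointerType(const Ty &T) { return T.Canonical->Kind == Ty::Pointer; }

  // Analogue of the "get as pointer" helper: peel typedefs one level at a
  // time and stop at the first literal pointer node, which keeps the most
  // typedef information.  Returns nullptr for non-pointer types.
  const Ty *getAsPointer(const Ty &T) {
    if (!isPointerType(T))
      return nullptr;
    const Ty *Cur = &T;
    while (Cur->Kind == Ty::Typedef)
      Cur = Cur->Inner;
    return Cur;
  }

For a node modelling ``bar``, a direct kind check sees a typedef and fails,
whereas ``getAsPointer`` returns the ``foo*`` node instead of the canonical
``int*``.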
915*67e74705SXin Li
916*67e74705SXin LiThis structure is somewhat mystical, but after meditating on it, it will make
917*67e74705SXin Lisense to you :).
918*67e74705SXin Li
919*67e74705SXin Li.. _QualType:
920*67e74705SXin Li
921*67e74705SXin LiThe ``QualType`` class
922*67e74705SXin Li----------------------
923*67e74705SXin Li
924*67e74705SXin LiThe ``QualType`` class is designed as a trivial value class that is small,
925*67e74705SXin Lipassed by-value and is efficient to query.  The idea of ``QualType`` is that it
926*67e74705SXin Listores the type qualifiers (``const``, ``volatile``, ``restrict``, plus some
927*67e74705SXin Liextended qualifiers required by language extensions) separately from the types
928*67e74705SXin Lithemselves.  ``QualType`` is conceptually a pair of "``Type*``" and the bits
929*67e74705SXin Lifor these type qualifiers.
930*67e74705SXin Li
931*67e74705SXin LiBy storing the type qualifiers as bits in the conceptual pair, it is extremely
932*67e74705SXin Liefficient to get the set of qualifiers on a ``QualType`` (just return the field
933*67e74705SXin Liof the pair), add a type qualifier (which is a trivial constant-time operation
934*67e74705SXin Lithat sets a bit), and remove one or more type qualifiers (just return a
935*67e74705SXin Li``QualType`` with the bitfield set to empty).
936*67e74705SXin Li
Further, because the bits are stored outside of the type itself, we do not need
to create duplicates of types with different sets of qualifiers (i.e. there is
only a single heap allocated "``int``" type: "``const int``" and "``volatile
const int``" both point to the same heap allocated "``int``" type).  This
reduces the heap size used to represent types and also means we do not have to
consider qualifiers when uniquing types (:ref:`Type <Type>` does not even
contain qualifiers).
944*67e74705SXin Li
945*67e74705SXin LiIn practice, the two most common type qualifiers (``const`` and ``restrict``)
946*67e74705SXin Liare stored in the low bits of the pointer to the ``Type`` object, together with
947*67e74705SXin Lia flag indicating whether extended qualifiers are present (which must be
948*67e74705SXin Liheap-allocated).  This means that ``QualType`` is exactly the same size as a
949*67e74705SXin Lipointer.
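
The pointer-sized representation described above can be illustrated with a
small standalone sketch.  This is a toy model, not Clang's implementation:
``ToyType`` and ``ToyQualType`` are hypothetical names, only two qualifier bits
are modelled, and the extended-qualifier flag is omitted.  It assumes that
``ToyType`` objects are at least 8-byte aligned, so the low bits of a pointer
to one are always zero and can be borrowed.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical stand-in for a uniqued type node.  The alignment guarantees
  // that the low three bits of any pointer to it are zero.
  struct alignas(8) ToyType {
    const char *Name;
  };

  // Toy qualified-type handle: one word that holds a ToyType pointer with the
  // qualifier flags packed into its low bits.
  class ToyQualType {
    std::uintptr_t Value;
    enum : std::uintptr_t { QualMask = 0x3 };

  public:
    enum Qual : std::uintptr_t { Const = 0x1, Restrict = 0x2 };

    explicit ToyQualType(const ToyType *T, std::uintptr_t Quals = 0)
        : Value(reinterpret_cast<std::uintptr_t>(T) | (Quals & QualMask)) {}

    // Masks the flag bits off to recover the original pointer.
    const ToyType *getTypePtr() const {
      return reinterpret_cast<const ToyType *>(Value & ~QualMask);
    }
    std::uintptr_t getQualifiers() const { return Value & QualMask; }
    bool isConstQualified() const { return (Value & Const) != 0; }

    // Adding a qualifier sets a bit; no new ToyType is created.
    ToyQualType withConst() const {
      return ToyQualType(getTypePtr(), getQualifiers() | Const);
    }
    // Removing all qualifiers returns a handle with the flag bits cleared.
    ToyQualType getUnqualifiedType() const {
      return ToyQualType(getTypePtr());
    }
  };

  static_assert(sizeof(ToyQualType) == sizeof(void *),
                "the handle is exactly one pointer wide");

Because the qualifiers live in the handle rather than in the pointee, a handle
and its ``withConst()`` variant refer to the same ``ToyType`` object, which is
the property that lets a single "``int``" serve every qualified variant.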
950*67e74705SXin Li
951*67e74705SXin Li.. _DeclarationName:
952*67e74705SXin Li
953*67e74705SXin LiDeclaration names
954*67e74705SXin Li-----------------
955*67e74705SXin Li
956*67e74705SXin LiThe ``DeclarationName`` class represents the name of a declaration in Clang.
957*67e74705SXin LiDeclarations in the C family of languages can take several different forms.
958*67e74705SXin LiMost declarations are named by simple identifiers, e.g., "``f``" and "``x``" in
959*67e74705SXin Lithe function declaration ``f(int x)``.  In C++, declaration names can also name
960*67e74705SXin Liclass constructors ("``Class``" in ``struct Class { Class(); }``), class
961*67e74705SXin Lidestructors ("``~Class``"), overloaded operator names ("``operator+``"), and
962*67e74705SXin Liconversion functions ("``operator void const *``").  In Objective-C,
963*67e74705SXin Lideclaration names can refer to the names of Objective-C methods, which involve
964*67e74705SXin Lithe method name and the parameters, collectively called a *selector*, e.g.,
965*67e74705SXin Li"``setWidth:height:``".  Since all of these kinds of entities --- variables,
966*67e74705SXin Lifunctions, Objective-C methods, C++ constructors, destructors, and operators
967*67e74705SXin Li--- are represented as subclasses of Clang's common ``NamedDecl`` class,
968*67e74705SXin Li``DeclarationName`` is designed to efficiently represent any kind of name.
969*67e74705SXin Li
970*67e74705SXin LiGiven a ``DeclarationName`` ``N``, ``N.getNameKind()`` will produce a value
971*67e74705SXin Lithat describes what kind of name ``N`` stores.  There are 10 options (all of
972*67e74705SXin Lithe names are inside the ``DeclarationName`` class).
973*67e74705SXin Li
974*67e74705SXin Li``Identifier``
975*67e74705SXin Li
976*67e74705SXin Li  The name is a simple identifier.  Use ``N.getAsIdentifierInfo()`` to retrieve
977*67e74705SXin Li  the corresponding ``IdentifierInfo*`` pointing to the actual identifier.
978*67e74705SXin Li
979*67e74705SXin Li``ObjCZeroArgSelector``, ``ObjCOneArgSelector``, ``ObjCMultiArgSelector``
980*67e74705SXin Li
981*67e74705SXin Li  The name is an Objective-C selector, which can be retrieved as a ``Selector``
982*67e74705SXin Li  instance via ``N.getObjCSelector()``.  The three possible name kinds for
983*67e74705SXin Li  Objective-C reflect an optimization within the ``DeclarationName`` class:
984*67e74705SXin Li  both zero- and one-argument selectors are stored as a masked
985*67e74705SXin Li  ``IdentifierInfo`` pointer, and therefore require very little space, since
986*67e74705SXin Li  zero- and one-argument selectors are far more common than multi-argument
987*67e74705SXin Li  selectors (which use a different structure).
988*67e74705SXin Li
989*67e74705SXin Li``CXXConstructorName``
990*67e74705SXin Li
991*67e74705SXin Li  The name is a C++ constructor name.  Use ``N.getCXXNameType()`` to retrieve
992*67e74705SXin Li  the :ref:`type <QualType>` that this constructor is meant to construct.  The
993*67e74705SXin Li  type is always the canonical type, since all constructors for a given type
994*67e74705SXin Li  have the same name.
995*67e74705SXin Li
996*67e74705SXin Li``CXXDestructorName``
997*67e74705SXin Li
998*67e74705SXin Li  The name is a C++ destructor name.  Use ``N.getCXXNameType()`` to retrieve
999*67e74705SXin Li  the :ref:`type <QualType>` whose destructor is being named.  This type is
1000*67e74705SXin Li  always a canonical type.
1001*67e74705SXin Li
1002*67e74705SXin Li``CXXConversionFunctionName``
1003*67e74705SXin Li
1004*67e74705SXin Li  The name is a C++ conversion function.  Conversion functions are named
1005*67e74705SXin Li  according to the type they convert to, e.g., "``operator void const *``".
1006*67e74705SXin Li  Use ``N.getCXXNameType()`` to retrieve the type that this conversion function
1007*67e74705SXin Li  converts to.  This type is always a canonical type.
1008*67e74705SXin Li
1009*67e74705SXin Li``CXXOperatorName``
1010*67e74705SXin Li
1011*67e74705SXin Li  The name is a C++ overloaded operator name.  Overloaded operators are named
1012*67e74705SXin Li  according to their spelling, e.g., "``operator+``" or "``operator new []``".
1013*67e74705SXin Li  Use ``N.getCXXOverloadedOperator()`` to retrieve the overloaded operator (a
1014*67e74705SXin Li  value of type ``OverloadedOperatorKind``).
1015*67e74705SXin Li
1016*67e74705SXin Li``CXXLiteralOperatorName``
1017*67e74705SXin Li
  The name is a C++11 user-defined literal operator.  User-defined literal
  operators are named according to the suffix they define, e.g., "``_foo``"
  for "``operator "" _foo``".  Use ``N.getCXXLiteralIdentifier()`` to retrieve
  the corresponding ``IdentifierInfo*`` pointing to the identifier.
1023*67e74705SXin Li
1024*67e74705SXin Li``CXXUsingDirective``
1025*67e74705SXin Li
  The name is a C++ using directive.  Using directives are not really
  ``NamedDecl``\ s, in that they all have the same name, but they are
  implemented as such in order to store them in a ``DeclContext``
  effectively.
1030*67e74705SXin Li
1031*67e74705SXin Li``DeclarationName``\ s are cheap to create, copy, and compare.  They require
1032*67e74705SXin Lionly a single pointer's worth of storage in the common cases (identifiers,
1033*67e74705SXin Lizero- and one-argument Objective-C selectors) and use dense, uniqued storage
1034*67e74705SXin Lifor the other kinds of names.  Two ``DeclarationName``\ s can be compared for
1035*67e74705SXin Liequality (``==``, ``!=``) using a simple bitwise comparison, can be ordered
1036*67e74705SXin Liwith ``<``, ``>``, ``<=``, and ``>=`` (which provide a lexicographical ordering
1037*67e74705SXin Lifor normal identifiers but an unspecified ordering for other kinds of names),
1038*67e74705SXin Liand can be placed into LLVM ``DenseMap``\ s and ``DenseSet``\ s.
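
The "single pointer plus a kind tag" idea behind these properties can be
sketched in standalone C++.  Everything here is hypothetical and much simpler
than the real class: ``ToyIdentifierTable`` and ``ToyName`` are invented names,
only three kinds are modelled, a constructor name is represented by a tagged
identifier rather than by a type, and ``std::unordered_map`` stands in for
``DenseMap``.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <functional>
  #include <set>
  #include <string>
  #include <unordered_map>

  // Hypothetical identifier table: each distinct spelling is stored once, so
  // equal spellings always yield the same address.
  class ToyIdentifierTable {
    std::set<std::string> Spellings;

  public:
    const std::string *get(const std::string &S) {
      return &*Spellings.insert(S).first;
    }
  };

  static_assert(alignof(std::string) >= 4, "two low pointer bits must be free");

  // Toy name: an interned-identifier pointer with the kind in the low bits.
  class ToyName {
    std::uintptr_t Word = 0;
    enum : std::uintptr_t { KindMask = 0x3 };

  public:
    enum Kind : std::uintptr_t { Identifier = 0, Constructor = 1, Destructor = 2 };

    ToyName() = default;
    ToyName(const std::string *Id, Kind K = Identifier)
        : Word(reinterpret_cast<std::uintptr_t>(Id) | K) {}

    Kind getNameKind() const { return static_cast<Kind>(Word & KindMask); }
    const std::string &getSpelling() const {
      return *reinterpret_cast<const std::string *>(Word & ~KindMask);
    }
    std::uintptr_t getOpaqueValue() const { return Word; }

    // Equality is a comparison of the single stored word.
    friend bool operator==(ToyName A, ToyName B) { return A.Word == B.Word; }
    friend bool operator!=(ToyName A, ToyName B) { return A.Word != B.Word; }
  };

  // Hashing the opaque word makes ToyName usable as a hash-map key.
  struct ToyNameHash {
    std::size_t operator()(ToyName N) const {
      return std::hash<std::uintptr_t>()(N.getOpaqueValue());
    }
  };

With this layout, two names built from the same spelling and kind compare equal
without touching the string, and a
``std::unordered_map<ToyName, int, ToyNameHash>`` treats them as one key.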
1039*67e74705SXin Li
1040*67e74705SXin Li``DeclarationName`` instances can be created in different ways depending on
1041*67e74705SXin Liwhat kind of name the instance will store.  Normal identifiers
(``IdentifierInfo`` pointers) and Objective-C selectors (``Selector``) can be
implicitly converted to ``DeclarationName``\ s.  Names for C++ constructors,
1044*67e74705SXin Lidestructors, conversion functions, and overloaded operators can be retrieved
1045*67e74705SXin Lifrom the ``DeclarationNameTable``, an instance of which is available as
1046*67e74705SXin Li``ASTContext::DeclarationNames``.  The member functions
1047*67e74705SXin Li``getCXXConstructorName``, ``getCXXDestructorName``,
1048*67e74705SXin Li``getCXXConversionFunctionName``, and ``getCXXOperatorName``, respectively,
1049*67e74705SXin Lireturn ``DeclarationName`` instances for the four kinds of C++ special function
1050*67e74705SXin Linames.
1051*67e74705SXin Li
1052*67e74705SXin Li.. _DeclContext:
1053*67e74705SXin Li
1054*67e74705SXin LiDeclaration contexts
1055*67e74705SXin Li--------------------
1056*67e74705SXin Li
1057*67e74705SXin LiEvery declaration in a program exists within some *declaration context*, such
1058*67e74705SXin Lias a translation unit, namespace, class, or function.  Declaration contexts in
Clang are represented by the ``DeclContext`` class, from which the various
declaration-context AST nodes (``TranslationUnitDecl``, ``NamespaceDecl``,
``RecordDecl``, ``FunctionDecl``, etc.) derive.  The ``DeclContext`` class
1062*67e74705SXin Liprovides several facilities common to each declaration context:
1063*67e74705SXin Li
1064*67e74705SXin LiSource-centric vs. Semantics-centric View of Declarations
1065*67e74705SXin Li
1066*67e74705SXin Li  ``DeclContext`` provides two views of the declarations stored within a
1067*67e74705SXin Li  declaration context.  The source-centric view accurately represents the
1068*67e74705SXin Li  program source code as written, including multiple declarations of entities
1069*67e74705SXin Li  where present (see the section :ref:`Redeclarations and Overloads
1070*67e74705SXin Li  <Redeclarations>`), while the semantics-centric view represents the program
1071*67e74705SXin Li  semantics.  The two views are kept synchronized by semantic analysis while
1072*67e74705SXin Li  the ASTs are being constructed.
1073*67e74705SXin Li
1074*67e74705SXin LiStorage of declarations within that context
1075*67e74705SXin Li
1076*67e74705SXin Li  Every declaration context can contain some number of declarations.  For
1077*67e74705SXin Li  example, a C++ class (represented by ``RecordDecl``) contains various member
1078*67e74705SXin Li  functions, fields, nested types, and so on.  All of these declarations will
1079*67e74705SXin Li  be stored within the ``DeclContext``, and one can iterate over the
1080*67e74705SXin Li  declarations via [``DeclContext::decls_begin()``,
1081*67e74705SXin Li  ``DeclContext::decls_end()``).  This mechanism provides the source-centric
1082*67e74705SXin Li  view of declarations in the context.
1083*67e74705SXin Li
1084*67e74705SXin LiLookup of declarations within that context
1085*67e74705SXin Li
1086*67e74705SXin Li  The ``DeclContext`` structure provides efficient name lookup for names within
1087*67e74705SXin Li  that declaration context.  For example, if ``N`` is a namespace we can look
1088*67e74705SXin Li  for the name ``N::f`` using ``DeclContext::lookup``.  The lookup itself is
1089*67e74705SXin Li  based on a lazily-constructed array (for declaration contexts with a small
1090*67e74705SXin Li  number of declarations) or hash table (for declaration contexts with more
1091*67e74705SXin Li  declarations).  The lookup operation provides the semantics-centric view of
1092*67e74705SXin Li  the declarations in the context.
1093*67e74705SXin Li
1094*67e74705SXin LiOwnership of declarations
1095*67e74705SXin Li
1096*67e74705SXin Li  The ``DeclContext`` owns all of the declarations that were declared within
1097*67e74705SXin Li  its declaration context, and is responsible for the management of their
1098*67e74705SXin Li  memory as well as their (de-)serialization.
1099*67e74705SXin Li
1100*67e74705SXin LiAll declarations are stored within a declaration context, and one can query
1101*67e74705SXin Liinformation about the context in which each declaration lives.  One can
1102*67e74705SXin Liretrieve the ``DeclContext`` that contains a particular ``Decl`` using
1103*67e74705SXin Li``Decl::getDeclContext``.  However, see the section
1104*67e74705SXin Li:ref:`LexicalAndSemanticContexts` for more information about how to interpret
1105*67e74705SXin Lithis context information.
1106*67e74705SXin Li
1107*67e74705SXin Li.. _Redeclarations:
1108*67e74705SXin Li
1109*67e74705SXin LiRedeclarations and Overloads
1110*67e74705SXin Li^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1111*67e74705SXin Li
1112*67e74705SXin LiWithin a translation unit, it is common for an entity to be declared several
1113*67e74705SXin Litimes.  For example, we might declare a function "``f``" and then later
1114*67e74705SXin Lire-declare it as part of an inlined definition:
1115*67e74705SXin Li
.. code-block:: c++

  void f(int x, int y, int z = 1);

  inline void f(int x, int y, int z) { /* ...  */ }
1121*67e74705SXin Li
1122*67e74705SXin LiThe representation of "``f``" differs in the source-centric and
1123*67e74705SXin Lisemantics-centric views of a declaration context.  In the source-centric view,
1124*67e74705SXin Liall redeclarations will be present, in the order they occurred in the source
1125*67e74705SXin Licode, making this view suitable for clients that wish to see the structure of
1126*67e74705SXin Lithe source code.  In the semantics-centric view, only the most recent "``f``"
1127*67e74705SXin Liwill be found by the lookup, since it effectively replaces the first
1128*67e74705SXin Lideclaration of "``f``".
1129*67e74705SXin Li
1130*67e74705SXin LiIn the semantics-centric view, overloading of functions is represented
1131*67e74705SXin Liexplicitly.  For example, given two declarations of a function "``g``" that are
1132*67e74705SXin Lioverloaded, e.g.,
1133*67e74705SXin Li
.. code-block:: c++

  void g();
  void g(int);
1138*67e74705SXin Li
the ``DeclContext::lookup`` operation will return a
``DeclContext::lookup_result`` that contains a range of iterators over
declarations of "``g``".  Clients that perform semantic analysis on a program
and are not concerned with the actual source code will primarily use this
semantics-centric view.
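
The difference between the two views can be made concrete with a toy model.
The sketch below is not Clang code: ``ToyDecl`` and ``ToyDeclContext`` are
hypothetical, and a declaration is reduced to a name plus a signature string
that identifies the overload.  It keeps every declaration in source order, and
maintains a lookup table in which a redeclaration replaces the earlier entry
while a new overload is added alongside the existing ones.

.. code-block:: c++

  #include <cstddef>
  #include <map>
  #include <string>
  #include <vector>

  struct ToyDecl {
    std::string Name;       // e.g. "g"
    std::string Signature;  // e.g. "(int)"; identifies an overload
    unsigned Index;         // position in source order
  };

  class ToyDeclContext {
    std::vector<ToyDecl> Decls;                              // source-centric
    std::map<std::string, std::vector<std::size_t>> Lookup;  // semantics-centric

  public:
    void addDecl(const std::string &Name, const std::string &Sig) {
      std::size_t Index = Decls.size();
      ToyDecl D{Name, Sig, static_cast<unsigned>(Index)};
      Decls.push_back(D);

      std::vector<std::size_t> &Found = Lookup[Name];
      for (std::size_t &Slot : Found) {
        if (Decls[Slot].Signature == Sig) {
          Slot = Index;  // redeclaration: the newest one replaces the older
          return;
        }
      }
      Found.push_back(Index);  // a different signature: a new overload
    }

    // Source-centric view: every declaration, in the order written.
    const std::vector<ToyDecl> &decls() const { return Decls; }

    // Semantics-centric view: one entry per overload of Name.
    std::vector<ToyDecl> lookup(const std::string &Name) const {
      std::vector<ToyDecl> Result;
      auto It = Lookup.find(Name);
      if (It != Lookup.end())
        for (std::size_t Index : It->second)
          Result.push_back(Decls[Index]);
      return Result;
    }
  };

Adding the two declarations of "``f``" and the two overloads of "``g``" from
the examples above leaves four entries in ``decls()``, while ``lookup("f")``
yields only the second declaration and ``lookup("g")`` yields both overloads.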
1144*67e74705SXin Li
1145*67e74705SXin Li.. _LexicalAndSemanticContexts:
1146*67e74705SXin Li
1147*67e74705SXin LiLexical and Semantic Contexts
1148*67e74705SXin Li^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1149*67e74705SXin Li
1150*67e74705SXin LiEach declaration has two potentially different declaration contexts: a
1151*67e74705SXin Li*lexical* context, which corresponds to the source-centric view of the
1152*67e74705SXin Lideclaration context, and a *semantic* context, which corresponds to the
1153*67e74705SXin Lisemantics-centric view.  The lexical context is accessible via
1154*67e74705SXin Li``Decl::getLexicalDeclContext`` while the semantic context is accessible via
1155*67e74705SXin Li``Decl::getDeclContext``, both of which return ``DeclContext`` pointers.  For
1156*67e74705SXin Limost declarations, the two contexts are identical.  For example:
1157*67e74705SXin Li
.. code-block:: c++

  class X {
  public:
    void f(int x);
  };
1164*67e74705SXin Li
1165*67e74705SXin LiHere, the semantic and lexical contexts of ``X::f`` are the ``DeclContext``
1166*67e74705SXin Liassociated with the class ``X`` (itself stored as a ``RecordDecl`` AST node).
1167*67e74705SXin LiHowever, we can now define ``X::f`` out-of-line:
1168*67e74705SXin Li
.. code-block:: c++

  void X::f(int x = 17) { /* ...  */ }
1172*67e74705SXin Li
1173*67e74705SXin LiThis definition of "``f``" has different lexical and semantic contexts.  The
1174*67e74705SXin Lilexical context corresponds to the declaration context in which the actual
1175*67e74705SXin Lideclaration occurred in the source code, e.g., the translation unit containing
1176*67e74705SXin Li``X``.  Thus, this declaration of ``X::f`` can be found by traversing the
1177*67e74705SXin Lideclarations provided by [``decls_begin()``, ``decls_end()``) in the
1178*67e74705SXin Litranslation unit.
1179*67e74705SXin Li
1180*67e74705SXin LiThe semantic context of ``X::f`` corresponds to the class ``X``, since this
1181*67e74705SXin Limember function is (semantically) a member of ``X``.  Lookup of the name ``f``
1182*67e74705SXin Liinto the ``DeclContext`` associated with ``X`` will then return the definition
1183*67e74705SXin Liof ``X::f`` (including information about the default argument).
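
As a rough illustration of the two contexts, the following standalone sketch
models a declaration that records both of them.  The names ``ToyContext``,
``ToyDecl`` and ``addDecl`` are hypothetical, and the lookup table keeps only
the most recent declaration of each name; Clang's own data structures are
considerably richer.

.. code-block:: c++

  #include <map>
  #include <string>
  #include <vector>

  struct ToyContext;

  struct ToyDecl {
    std::string Name;
    ToyContext *Semantic;  // plays the role of Decl::getDeclContext
    ToyContext *Lexical;   // plays the role of Decl::getLexicalDeclContext
  };

  struct ToyContext {
    std::string Name;
    std::vector<const ToyDecl *> Written;            // declarations written here
    std::map<std::string, const ToyDecl *> Members;  // name lookup table
  };

  // Records D in the source-order list of its lexical context and makes it
  // the lookup result for its name in its semantic context.
  void addDecl(const ToyDecl &D) {
    D.Lexical->Written.push_back(&D);
    D.Semantic->Members[D.Name] = &D;
  }

For the out-of-line definition of ``X::f``, the semantic context would be the
context for ``X`` and the lexical context would be the translation unit, so the
definition appears when walking the translation unit's ``Written`` list but is
found by name only through the ``Members`` table of ``X``.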
1184*67e74705SXin Li
1185*67e74705SXin LiTransparent Declaration Contexts
1186*67e74705SXin Li^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1187*67e74705SXin Li
1188*67e74705SXin LiIn C and C++, there are several contexts in which names that are logically
1189*67e74705SXin Lideclared inside another declaration will actually "leak" out into the enclosing
1190*67e74705SXin Liscope from the perspective of name lookup.  The most obvious instance of this
1191*67e74705SXin Libehavior is in enumeration types, e.g.,
1192*67e74705SXin Li
.. code-block:: c++

  enum Color {
    Red,
    Green,
    Blue
  };
1200*67e74705SXin Li
1201*67e74705SXin LiHere, ``Color`` is an enumeration, which is a declaration context that contains
1202*67e74705SXin Lithe enumerators ``Red``, ``Green``, and ``Blue``.  Thus, traversing the list of
1203*67e74705SXin Lideclarations contained in the enumeration ``Color`` will yield ``Red``,
1204*67e74705SXin Li``Green``, and ``Blue``.  However, outside of the scope of ``Color`` one can
1205*67e74705SXin Liname the enumerator ``Red`` without qualifying the name, e.g.,
1206*67e74705SXin Li
.. code-block:: c++

  Color c = Red;
1210*67e74705SXin Li
1211*67e74705SXin LiThere are other entities in C++ that provide similar behavior.  For example,
1212*67e74705SXin Lilinkage specifications that use curly braces:
1213*67e74705SXin Li
.. code-block:: c++

  extern "C" {
    void f(int);
    void g(int);
  }
  // f and g are visible here
1221*67e74705SXin Li
For source-level accuracy, we treat the linkage specification and the
enumeration type as declaration contexts in which their enclosed declarations
("``Red``", "``Green``", and "``Blue``"; "``f``" and "``g``") are declared.
However, these declarations are visible outside of the scope of the
declaration context.
1226*67e74705SXin Li
1227*67e74705SXin LiThese language features (and several others, described below) have roughly the
1228*67e74705SXin Lisame set of requirements: declarations are declared within a particular lexical
1229*67e74705SXin Licontext, but the declarations are also found via name lookup in scopes
1230*67e74705SXin Lienclosing the declaration itself.  This feature is implemented via
1231*67e74705SXin Li*transparent* declaration contexts (see
1232*67e74705SXin Li``DeclContext::isTransparentContext()``), whose declarations are visible in the
1233*67e74705SXin Linearest enclosing non-transparent declaration context.  This means that the
1234*67e74705SXin Lilexical context of the declaration (e.g., an enumerator) will be the
1235*67e74705SXin Litransparent ``DeclContext`` itself, as will the semantic context, but the
1236*67e74705SXin Lideclaration will be visible in every outer context up to and including the
1237*67e74705SXin Lifirst non-transparent declaration context (since transparent declaration
1238*67e74705SXin Licontexts can be nested).
1239*67e74705SXin Li
1240*67e74705SXin LiThe transparent ``DeclContext``\ s are:
1241*67e74705SXin Li
1242*67e74705SXin Li* Enumerations (but not C++11 "scoped enumerations"):
1243*67e74705SXin Li
  .. code-block:: c++

    enum Color {
      Red,
      Green,
      Blue
    };
    // Red, Green, and Blue are in scope
1252*67e74705SXin Li
1253*67e74705SXin Li* C++ linkage specifications:
1254*67e74705SXin Li
  .. code-block:: c++

    extern "C" {
      void f(int);
      void g(int);
    }
    // f and g are in scope
1262*67e74705SXin Li
1263*67e74705SXin Li* Anonymous unions and structs:
1264*67e74705SXin Li
  .. code-block:: c++

    struct LookupTable {
      bool IsVector;
      union {
        std::vector<Item> *Vector;
        std::set<Item> *Set;
      };
    };

    LookupTable LT;
    LT.Vector = 0; // Okay: finds Vector inside the unnamed union
1277*67e74705SXin Li
1278*67e74705SXin Li* C++11 inline namespaces:
1279*67e74705SXin Li
  .. code-block:: c++

    namespace mylib {
      inline namespace debug {
        class X;
      }
    }
    mylib::X *xp; // okay: mylib::X refers to mylib::debug::X
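
The visibility rule for transparent contexts can be sketched in a few lines of
standalone C++.  This is only a toy model with hypothetical names
(``ToyContext``, ``declare``): a context is reduced to a parent pointer, a
transparency flag and the set of names that lookup in that context would find.

.. code-block:: c++

  #include <set>
  #include <string>

  struct ToyContext {
    ToyContext *Parent;
    bool IsTransparent;             // cf. DeclContext::isTransparentContext()
    std::set<std::string> Visible;  // names that lookup in this context finds
  };

  // Declares Name in Ctx and makes it visible in every enclosing context up
  // to and including the first non-transparent one.
  void declare(ToyContext &Ctx, const std::string &Name) {
    ToyContext *Cur = &Ctx;
    Cur->Visible.insert(Name);
    while (Cur->IsTransparent && Cur->Parent) {
      Cur = Cur->Parent;
      Cur->Visible.insert(Name);
    }
  }

With an enumeration nested inside a linkage specification inside a translation
unit, declaring ``Red`` in the enumeration makes it visible in all three
contexts, whereas a name declared in a non-transparent context such as a class
stays in that context.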
1288*67e74705SXin Li
1289*67e74705SXin Li.. _MultiDeclContext:
1290*67e74705SXin Li
1291*67e74705SXin LiMultiply-Defined Declaration Contexts
1292*67e74705SXin Li^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1293*67e74705SXin Li
1294*67e74705SXin LiC++ namespaces have the interesting --- and, so far, unique --- property that
1295*67e74705SXin Lithe namespace can be defined multiple times, and the declarations provided by
1296*67e74705SXin Lieach namespace definition are effectively merged (from the semantic point of
1297*67e74705SXin Liview).  For example, the following two code snippets are semantically
1298*67e74705SXin Liindistinguishable:
1299*67e74705SXin Li
.. code-block:: c++

  // Snippet #1:
  namespace N {
    void f();
  }
  namespace N {
    void f(int);
  }

  // Snippet #2:
  namespace N {
    void f();
    void f(int);
  }
1315*67e74705SXin Li
1316*67e74705SXin LiIn Clang's representation, the source-centric view of declaration contexts will
1317*67e74705SXin Liactually have two separate ``NamespaceDecl`` nodes in Snippet #1, each of which
1318*67e74705SXin Liis a declaration context that contains a single declaration of "``f``".
1319*67e74705SXin LiHowever, the semantics-centric view provided by name lookup into the namespace
1320*67e74705SXin Li``N`` for "``f``" will return a ``DeclContext::lookup_result`` that contains a
1321*67e74705SXin Lirange of iterators over declarations of "``f``".
1322*67e74705SXin Li
``DeclContext`` manages multiply-defined declaration contexts internally.  The
function ``DeclContext::getPrimaryContext`` retrieves the "primary" context for
a given ``DeclContext`` instance, which is the ``DeclContext`` responsible for
maintaining the lookup table used for the semantics-centric view.  Given a
``DeclContext``, one can obtain the set of declaration contexts that are
semantically connected to this declaration context, in source order, including
this context (which will be the only result, for non-namespace contexts) via
``DeclContext::collectAllContexts``.  Note that these functions are used
internally within the lookup and insertion methods of the ``DeclContext``, so
the vast majority of clients can ignore them.
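
One way to picture the primary-context arrangement is the standalone sketch
below.  It is a toy model under stated assumptions, not Clang's implementation:
``ToyNamespaceBlock`` is a hypothetical type, declarations are plain strings,
and the first block of a namespace is simply assumed to be the primary one that
owns the shared lookup table.

.. code-block:: c++

  #include <map>
  #include <string>
  #include <vector>

  struct ToyNamespaceBlock {
    ToyNamespaceBlock *Primary;                  // owner of the lookup table
    std::vector<std::string> Written;            // declarations in this block
    std::vector<ToyNamespaceBlock *> AllBlocks;  // used on the primary only
    std::map<std::string, std::vector<std::string>> Lookup;  // primary only

    // Pass an earlier block of the same namespace to reopen it; pass nothing
    // to start a new namespace whose primary block is this one.
    explicit ToyNamespaceBlock(ToyNamespaceBlock *Previous = nullptr)
        : Primary(Previous ? Previous->Primary : this) {
      Primary->AllBlocks.push_back(this);
    }
    ToyNamespaceBlock(const ToyNamespaceBlock &) = delete;

    // The declaration is recorded where it was written, but the name goes
    // into the primary block's table.
    void addDecl(const std::string &Name, const std::string &Sig) {
      Written.push_back(Name + Sig);
      Primary->Lookup[Name].push_back(Name + Sig);
    }

    // Lookup always consults the primary block, whichever block is asked.
    std::vector<std::string> lookup(const std::string &Name) const {
      auto It = Primary->Lookup.find(Name);
      if (It == Primary->Lookup.end())
        return std::vector<std::string>();
      return It->second;
    }
  };

Modelling Snippet #1 with two blocks gives each block a single entry in
``Written``, while ``lookup("f")`` on either block returns both overloads, and
the primary block's ``AllBlocks`` lists the blocks in source order.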
1333*67e74705SXin Li
1334*67e74705SXin Li.. _CFG:
1335*67e74705SXin Li
1336*67e74705SXin LiThe ``CFG`` class
1337*67e74705SXin Li-----------------
1338*67e74705SXin Li
The ``CFG`` class is designed to represent a source-level control-flow graph
for a single statement (``Stmt*``).  Typically instances of ``CFG`` are
constructed for function bodies (usually an instance of ``CompoundStmt``), but
a ``CFG`` can also be constructed for an instance of any class that subclasses
``Stmt``, which includes simple expressions.  Control-flow graphs
are especially useful for performing `flow- or path-sensitive
<http://en.wikipedia.org/wiki/Data_flow_analysis#Sensitivities>`_ program
analyses on a given function.
1347*67e74705SXin Li
1348*67e74705SXin LiBasic Blocks
1349*67e74705SXin Li^^^^^^^^^^^^
1350*67e74705SXin Li
1351*67e74705SXin LiConcretely, an instance of ``CFG`` is a collection of basic blocks.  Each basic
1352*67e74705SXin Liblock is an instance of ``CFGBlock``, which simply contains an ordered sequence
1353*67e74705SXin Liof ``Stmt*`` (each referring to statements in the AST).  The ordering of
1354*67e74705SXin Listatements within a block indicates unconditional flow of control from one
1355*67e74705SXin Listatement to the next.  :ref:`Conditional control-flow
1356*67e74705SXin Li<ConditionalControlFlow>` is represented using edges between basic blocks.  The
1357*67e74705SXin Listatements within a given ``CFGBlock`` can be traversed using the
1358*67e74705SXin Li``CFGBlock::*iterator`` interface.
1359*67e74705SXin Li
A ``CFG`` object owns the instances of ``CFGBlock`` within the control-flow
graph it represents.  Each ``CFGBlock`` within a CFG is also uniquely numbered
(accessible via ``CFGBlock::getBlockID()``).  Currently the numbering follows
the order in which the blocks were created, but no assumptions should be made
about how ``CFGBlocks`` are numbered other than that their numbers are unique
and range from 0 to N-1 (where N is the number of basic blocks in the CFG).
1366*67e74705SXin Li
1367*67e74705SXin LiEntry and Exit Blocks
1368*67e74705SXin Li^^^^^^^^^^^^^^^^^^^^^
1369*67e74705SXin Li
1370*67e74705SXin LiEach instance of ``CFG`` contains two special blocks: an *entry* block
1371*67e74705SXin Li(accessible via ``CFG::getEntry()``), which has no incoming edges, and an
1372*67e74705SXin Li*exit* block (accessible via ``CFG::getExit()``), which has no outgoing edges.
1373*67e74705SXin LiNeither block contains any statements, and they serve the role of providing a
1374*67e74705SXin Liclear entrance and exit for a body of code such as a function body.  The
1375*67e74705SXin Lipresence of these empty blocks greatly simplifies the implementation of many
1376*67e74705SXin Lianalyses built on top of CFGs.
1377*67e74705SXin Li
1378*67e74705SXin Li.. _ConditionalControlFlow:
1379*67e74705SXin Li
1380*67e74705SXin LiConditional Control-Flow
1381*67e74705SXin Li^^^^^^^^^^^^^^^^^^^^^^^^
1382*67e74705SXin Li
Conditional control-flow (such as that induced by if-statements and loops) is
represented as edges between ``CFGBlocks``.  Because different C language
1385*67e74705SXin Liconstructs can induce control-flow, each ``CFGBlock`` also records an extra
1386*67e74705SXin Li``Stmt*`` that represents the *terminator* of the block.  A terminator is
1387*67e74705SXin Lisimply the statement that caused the control-flow, and is used to identify the
1388*67e74705SXin Linature of the conditional control-flow between blocks.  For example, in the
1389*67e74705SXin Licase of an if-statement, the terminator refers to the ``IfStmt`` object in the
1390*67e74705SXin LiAST that represented the given branch.
1391*67e74705SXin Li
1392*67e74705SXin LiTo illustrate, consider the following code example:
1393*67e74705SXin Li
.. code-block:: c++

  int foo(int x) {
    x = x + 1;
    if (x > 2)
      x++;
    else {
      x += 2;
      x *= 2;
    }

    return x;
  }
1407*67e74705SXin Li
After invoking the parser+semantic analyzer on this code fragment, the AST of
the body of ``foo`` is referenced by a single ``Stmt*``.  We can then construct
an instance of ``CFG`` representing the control-flow graph of this function
body by a single call to a static class method:
1412*67e74705SXin Li
.. code-block:: c++

  Stmt *FooBody = ...
  std::unique_ptr<CFG> FooCFG = CFG::buildCFG(FooBody);
1417*67e74705SXin Li
1418*67e74705SXin LiAlong with providing an interface to iterate over its ``CFGBlocks``, the
1419*67e74705SXin Li``CFG`` class also provides methods that are useful for debugging and
1420*67e74705SXin Livisualizing CFGs.  For example, the method ``CFG::dump()`` dumps a
1421*67e74705SXin Lipretty-printed version of the CFG to standard error.  This is especially useful
1422*67e74705SXin Liwhen one is using a debugger such as gdb.  For example, here is the output of
1423*67e74705SXin Li``FooCFG->dump()``:
1424*67e74705SXin Li
.. code-block:: text

 [ B5 (ENTRY) ]
    Predecessors (0):
    Successors (1): B4

 [ B4 ]
    1: x = x + 1
    2: (x > 2)
    T: if [B4.2]
    Predecessors (1): B5
    Successors (2): B3 B2

 [ B3 ]
    1: x++
    Predecessors (1): B4
    Successors (1): B1

 [ B2 ]
    1: x += 2
    2: x *= 2
    Predecessors (1): B4
    Successors (1): B1

 [ B1 ]
    1: return x;
    Predecessors (2): B2 B3
    Successors (1): B0

 [ B0 (EXIT) ]
    Predecessors (1): B1
    Successors (0):
1457*67e74705SXin Li
For each block, the pretty-printed output displays the number of
*predecessor* blocks (blocks that have outgoing control-flow to the given
block) and *successor* blocks (blocks that have incoming control-flow from the
given block).  We can also clearly see the special entry and exit blocks at
the beginning and end of the pretty-printed output.  For the entry block
(block B5), the number of predecessor blocks is 0, while for the exit block
(block B0) the number of successor blocks is 0.
1465*67e74705SXin Li
1466*67e74705SXin LiThe most interesting block here is B4, whose outgoing control-flow represents
1467*67e74705SXin Lithe branching caused by the sole if-statement in ``foo``.  Of particular
1468*67e74705SXin Liinterest is the second statement in the block, ``(x > 2)``, and the terminator,
1469*67e74705SXin Liprinted as ``if [B4.2]``.  The second statement represents the evaluation of
1470*67e74705SXin Lithe condition of the if-statement, which occurs before the actual branching of
1471*67e74705SXin Licontrol-flow.  Within the ``CFGBlock`` for B4, the ``Stmt*`` for the second
1472*67e74705SXin Listatement refers to the actual expression in the AST for ``(x > 2)``.  Thus
1473*67e74705SXin Lipointers to subclasses of ``Expr`` can appear in the list of statements in a
1474*67e74705SXin Liblock, and not just subclasses of ``Stmt`` that refer to proper C statements.
1475*67e74705SXin Li
1476*67e74705SXin LiThe terminator of block B4 is a pointer to the ``IfStmt`` object in the AST.
1477*67e74705SXin LiThe pretty-printer outputs ``if [B4.2]`` because the condition expression of
1478*67e74705SXin Lithe if-statement has an actual place in the basic block, and thus the
1479*67e74705SXin Literminator is essentially *referring* to the expression that is the second
1480*67e74705SXin Listatement of block B4 (i.e., B4.2).  In this manner, conditions for
1481*67e74705SXin Licontrol-flow (which also includes conditions for loops and switch statements)
1482*67e74705SXin Liare hoisted into the actual basic block.
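
The relationship between the predecessor and successor lists in the dump can
be modeled with a small stand-alone sketch.  The ``Block`` struct and both
helper functions are hypothetical and are not part of Clang's ``CFG`` API, and
the edges listed for B5 through B2 are an assumption based on the earlier part
of the dump for ``foo``.

.. code-block:: c++

  #include <map>
  #include <vector>

  // Hypothetical stand-in for a basic block; this is not Clang's CFGBlock.
  struct Block {
    std::vector<int> Succs;  // IDs of the successor blocks
  };

  // Successor edges for the example; B5 is the entry and B0 is the exit.
  std::map<int, Block> makeExampleCFG() {
    return {
        {5, Block{{4}}},     // B5 (ENTRY)
        {4, Block{{3, 2}}},  // B4, terminated by "if [B4.2]"
        {3, Block{{1}}},
        {2, Block{{1}}},
        {1, Block{{0}}},     // B1: return x;
        {0, Block{{}}},      // B0 (EXIT)
    };
  }

  // Derive each block's predecessor list by inverting the successor edges.
  std::map<int, std::vector<int>>
  computePredecessors(const std::map<int, Block> &Blocks) {
    std::map<int, std::vector<int>> Preds;
    for (const auto &Entry : Blocks) {
      Preds[Entry.first];  // ensure blocks without predecessors get an entry
      for (int Succ : Entry.second.Succs)
        Preds[Succ].push_back(Entry.first);
    }
    return Preds;
  }

Calling ``computePredecessors(makeExampleCFG())`` yields an empty list for the
entry block and the list B2, B3 for block B1, matching the dump.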
1483*67e74705SXin Li
1484*67e74705SXin Li.. Implicit Control-Flow
1485*67e74705SXin Li.. ^^^^^^^^^^^^^^^^^^^^^
1486*67e74705SXin Li
1487*67e74705SXin Li.. A key design principle of the ``CFG`` class was to not require any
1488*67e74705SXin Li.. transformations to the AST in order to represent control-flow.  Thus the
1489*67e74705SXin Li.. ``CFG`` does not perform any "lowering" of the statements in an AST: loops
1490*67e74705SXin Li.. are not transformed into guarded gotos, short-circuit operations are not
1491*67e74705SXin Li.. converted to a set of if-statements, and so on.
1492*67e74705SXin Li
1493*67e74705SXin LiConstant Folding in the Clang AST
1494*67e74705SXin Li---------------------------------
1495*67e74705SXin Li
1496*67e74705SXin LiThere are several places where constants and constant folding matter a lot to
1497*67e74705SXin Lithe Clang front-end.  First, in general, we prefer the AST to retain the source
code as close to how the user wrote it as possible.  This means that if they
wrote "``5+4``", we want to keep the addition and two constants in the AST
rather than fold to "``9``".  As a result, constant folding becomes a tree walk
that needs to handle the various cases.
1502*67e74705SXin Li
1503*67e74705SXin LiHowever, there are places in both C and C++ that require constants to be
1504*67e74705SXin Lifolded.  For example, the C standard defines what an "integer constant
1505*67e74705SXin Liexpression" (i-c-e) is with very precise and specific requirements.  The
1506*67e74705SXin Lilanguage then requires i-c-e's in a lot of places (for example, the size of a
1507*67e74705SXin Libitfield, the value for a case statement, etc).  For these, we have to be able
1508*67e74705SXin Lito constant fold the constants, to do semantic checks (e.g., verify bitfield
1509*67e74705SXin Lisize is non-negative and that case statements aren't duplicated).  We aim for
1510*67e74705SXin LiClang to be very pedantic about this, diagnosing cases when the code does not
1511*67e74705SXin Liuse an i-c-e where one is required, but accepting the code unless running with
1512*67e74705SXin Li``-pedantic-errors``.
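
As a minimal illustration of the contexts mentioned above, the following
sketch uses an expression in a bit-field width and in case labels.  It is
written in the common subset of C and C++; the names ``Width``, ``Flags`` and
``classify`` are made up for this example.

.. code-block:: c++

  // "5 + 4" stays an addition in the AST, but semantic analysis has to fold
  // it to check the uses below.
  enum { Width = 5 + 4 };

  struct Flags {
    unsigned Bits : Width;  // the bit-field width must be an i-c-e
  };

  int classify(int V) {
    switch (V) {
    case Width:      // the case value must be an i-c-e
      return 1;
    case Width + 1:  // folding shows this label is not a duplicate
      return 2;
    default:
      return 0;
    }
  }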
1513*67e74705SXin Li
1514*67e74705SXin LiThings get a little bit more tricky when it comes to compatibility with
1515*67e74705SXin Lireal-world source code.  Specifically, GCC has historically accepted a huge
1516*67e74705SXin Lisuperset of expressions as i-c-e's, and a lot of real world code depends on
this unfortunate accident of history (including, e.g., the glibc system
1518*67e74705SXin Liheaders).  GCC accepts anything its "fold" optimizer is capable of reducing to
1519*67e74705SXin Lian integer constant, which means that the definition of what it accepts changes
1520*67e74705SXin Lias its optimizer does.  One example is that GCC accepts things like "``case
1521*67e74705SXin LiX-X:``" even when ``X`` is a variable, because it can fold this to 0.
1522*67e74705SXin Li
Another issue is how constants interact with the extensions we support, such
1524*67e74705SXin Lias ``__builtin_constant_p``, ``__builtin_inf``, ``__extension__`` and many
1525*67e74705SXin Liothers.  C99 obviously does not specify the semantics of any of these
1526*67e74705SXin Liextensions, and the definition of i-c-e does not include them.  However, these
1527*67e74705SXin Liextensions are often used in real code, and we have to have a way to reason
1528*67e74705SXin Liabout them.
1529*67e74705SXin Li
1530*67e74705SXin LiFinally, this is not just a problem for semantic analysis.  The code generator
1531*67e74705SXin Liand other clients have to be able to fold constants (e.g., to initialize global
variables) and have to handle a superset of what C99 allows.  Further, these
1533*67e74705SXin Liclients can benefit from extended information.  For example, we know that
1534*67e74705SXin Li"``foo() || 1``" always evaluates to ``true``, but we can't replace the
1535*67e74705SXin Liexpression with ``true`` because it has side effects.
1536*67e74705SXin Li
1537*67e74705SXin LiImplementation Approach
1538*67e74705SXin Li^^^^^^^^^^^^^^^^^^^^^^^
1539*67e74705SXin Li
After trying several different approaches, we've finally converged on a design.
(Note that at the time of this writing, not all of this has been implemented;
consider this a design goal!)  Our basic approach is to define a single
recursive evaluation method (``Expr::Evaluate``), which is implemented
in ``AST/ExprConstant.cpp``.  Given an expression with "scalar" type (integer,
fp, complex, or pointer) this method returns the following information:
1546*67e74705SXin Li
1547*67e74705SXin Li* Whether the expression is an integer constant expression, a general constant
1548*67e74705SXin Li  that was folded but has no side effects, a general constant that was folded
1549*67e74705SXin Li  but that does have side effects, or an uncomputable/unfoldable value.
1550*67e74705SXin Li* If the expression was computable in any way, this method returns the
1551*67e74705SXin Li  ``APValue`` for the result of the expression.
1552*67e74705SXin Li* If the expression is not evaluatable at all, this method returns information
1553*67e74705SXin Li  on one of the problems with the expression.  This includes a
1554*67e74705SXin Li  ``SourceLocation`` for where the problem is, and a diagnostic ID that explains
1555*67e74705SXin Li  the problem.  The diagnostic should have ``ERROR`` type.
1556*67e74705SXin Li* If the expression is not an integer constant expression, this method returns
1557*67e74705SXin Li  information on one of the problems with the expression.  This includes a
1558*67e74705SXin Li  ``SourceLocation`` for where the problem is, and a diagnostic ID that
1559*67e74705SXin Li  explains the problem.  The diagnostic should have ``EXTENSION`` type.
1560*67e74705SXin Li
1561*67e74705SXin LiThis information gives various clients the flexibility that they want, and we
1562*67e74705SXin Liwill eventually have some helper methods for various extensions.  For example,
1563*67e74705SXin Li``Sema`` should have a ``Sema::VerifyIntegerConstantExpression`` method, which
1564*67e74705SXin Licalls ``Evaluate`` on the expression.  If the expression is not foldable, the
1565*67e74705SXin Lierror is emitted, and it would return ``true``.  If the expression is not an
1566*67e74705SXin Lii-c-e, the ``EXTENSION`` diagnostic is emitted.  Finally it would return
1567*67e74705SXin Li``false`` to indicate that the AST is OK.
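
The four-way classification and the helper described above can be sketched
with a toy evaluator.  Everything in this block is an assumption made for
illustration: the ``FoldKind`` enumeration, the ``Expr`` node kinds and the
folding rules are simplified stand-ins, not the interface of
``Expr::Evaluate`` or of ``Sema``.

.. code-block:: c++

  #include <algorithm>
  #include <memory>

  // Ordered from most to least restrictive so std::max picks the weaker class.
  enum class FoldKind { ICE, Folded, FoldedWithSideEffects, Unfoldable };

  struct Result {
    FoldKind Kind;
    long Value;  // meaningful unless Kind == FoldKind::Unfoldable
  };

  // Toy expression tree; these node kinds are illustrative only.
  struct Expr {
    enum Op { IntLit, Add, Comma, LogicalOr, Call } Opcode = IntLit;
    long Literal = 0;
    std::shared_ptr<Expr> LHS, RHS;
  };
  using ExprPtr = std::shared_ptr<Expr>;

  ExprPtr node(Expr::Op Op, ExprPtr L = nullptr, ExprPtr R = nullptr) {
    auto E = std::make_shared<Expr>();
    E->Opcode = Op;
    E->LHS = L;
    E->RHS = R;
    return E;
  }
  ExprPtr lit(long V) {
    ExprPtr E = node(Expr::IntLit);
    E->Literal = V;
    return E;
  }

  Result evaluate(const Expr &E) {
    const Result Fail = {FoldKind::Unfoldable, 0};
    switch (E.Opcode) {
    case Expr::IntLit:
      return {FoldKind::ICE, E.Literal};
    case Expr::Call:  // calls are opaque and assumed to have side effects
      return Fail;
    default:
      break;
    }
    Result L = evaluate(*E.LHS), R = evaluate(*E.RHS);
    bool LHasEffects = L.Kind >= FoldKind::FoldedWithSideEffects;
    switch (E.Opcode) {
    case Expr::Add:
      if (L.Kind == FoldKind::Unfoldable || R.Kind == FoldKind::Unfoldable)
        return Fail;
      return {std::max(L.Kind, R.Kind), L.Value + R.Value};
    case Expr::Comma:  // foldable, but never an i-c-e in this toy
      if (R.Kind == FoldKind::Unfoldable)
        return Fail;
      if (LHasEffects)
        return {FoldKind::FoldedWithSideEffects, R.Value};
      return {std::max(R.Kind, FoldKind::Folded), R.Value};
    case Expr::LogicalOr:
      if (L.Kind != FoldKind::Unfoldable && L.Value != 0)
        return {L.Kind, 1};
      if (R.Kind == FoldKind::Unfoldable)
        return Fail;
      if (L.Kind == FoldKind::Unfoldable) {  // e.g. "foo() || 1"
        if (R.Value != 0)
          return {FoldKind::FoldedWithSideEffects, 1};
        return Fail;
      }
      return {std::max(L.Kind, R.Kind), R.Value != 0};
    default:
      return Fail;
    }
  }

  // Follows the contract described above: returns true on error.
  bool verifyIntegerConstantExpression(const Expr &E, long &Value,
                                       bool &IsExtension) {
    Result R = evaluate(E);
    if (R.Kind == FoldKind::Unfoldable)
      return true;  // the ERROR diagnostic would be emitted here
    IsExtension = R.Kind != FoldKind::ICE;  // EXTENSION diagnostic if set
    Value = R.Value;
    return false;
  }

In this model ``5+4`` is classified as an i-c-e, a comma expression such as
``(0, 5)`` is folded but flagged as an extension, ``foo() || 1`` is folded with
side effects, and a bare call is not foldable.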
1568*67e74705SXin Li
1569*67e74705SXin LiOther clients can use the information in other ways, for example, codegen can
1570*67e74705SXin Lijust use expressions that are foldable in any way.
1571*67e74705SXin Li
1572*67e74705SXin LiExtensions
1573*67e74705SXin Li^^^^^^^^^^
1574*67e74705SXin Li
This section describes how some of the various extensions Clang supports
interact with constant evaluation:
1577*67e74705SXin Li
1578*67e74705SXin Li* ``__extension__``: The expression form of this extension causes any
1579*67e74705SXin Li  evaluatable subexpression to be accepted as an integer constant expression.
1580*67e74705SXin Li* ``__builtin_constant_p``: This returns true (as an integer constant
1581*67e74705SXin Li  expression) if the operand evaluates to either a numeric value (that is, not
1582*67e74705SXin Li  a pointer cast to integral type) of integral, enumeration, floating or
1583*67e74705SXin Li  complex type, or if it evaluates to the address of the first character of a
1584*67e74705SXin Li  string literal (possibly cast to some other type).  As a special case, if
1585*67e74705SXin Li  ``__builtin_constant_p`` is the (potentially parenthesized) condition of a
1586*67e74705SXin Li  conditional operator expression ("``?:``"), only the true side of the
1587*67e74705SXin Li  conditional operator is considered, and it is evaluated with full constant
1588*67e74705SXin Li  folding.
1589*67e74705SXin Li* ``__builtin_choose_expr``: The condition is required to be an integer
1590*67e74705SXin Li  constant expression, but we accept any constant as an "extension of an
1591*67e74705SXin Li  extension".  This only evaluates one operand depending on which way the
1592*67e74705SXin Li  condition evaluates.
1593*67e74705SXin Li* ``__builtin_classify_type``: This always returns an integer constant
1594*67e74705SXin Li  expression.
1595*67e74705SXin Li* ``__builtin_inf, nan, ...``: These are treated just like a floating-point
1596*67e74705SXin Li  literal.
1597*67e74705SXin Li* ``__builtin_abs, copysign, ...``: These are constant folded as general
1598*67e74705SXin Li  constant expressions.
1599*67e74705SXin Li* ``__builtin_strlen`` and ``strlen``: These are constant folded as integer
1600*67e74705SXin Li  constant expressions if the argument is a string literal.
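
A few of these builtins can be exercised in constant contexts as follows.
This is a sketch that assumes a Clang or GCC compiler in C++11 mode or later;
the names ``literalLength``, ``Infinity`` and ``Negated`` are made up for the
example.

.. code-block:: c++

  // A numeric constant operand makes __builtin_constant_p yield 1.
  static_assert(__builtin_constant_p(42) == 1, "numeric constant operand");

  // __builtin_strlen of a string literal is folded to a constant.
  constexpr unsigned long literalLength() { return __builtin_strlen("clang"); }
  static_assert(literalLength() == 5, "strlen of a literal is folded");

  // __builtin_inf is treated like a floating-point literal.
  constexpr double Infinity = __builtin_inf();
  static_assert(Infinity > 1.0e300, "infinity exceeds any finite value");

  // __builtin_copysign is folded as a general constant expression.
  constexpr double Negated = __builtin_copysign(2.5, -1.0);
  static_assert(Negated == -2.5, "copysign is folded");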
1601*67e74705SXin Li
1602*67e74705SXin Li.. _Sema:
1603*67e74705SXin Li
1604*67e74705SXin LiThe Sema Library
1605*67e74705SXin Li================
1606*67e74705SXin Li
1607*67e74705SXin LiThis library is called by the :ref:`Parser library <Parser>` during parsing to
1608*67e74705SXin Lido semantic analysis of the input.  For valid programs, Sema builds an AST for
1609*67e74705SXin Liparsed constructs.
1610*67e74705SXin Li
1611*67e74705SXin Li.. _CodeGen:
1612*67e74705SXin Li
1613*67e74705SXin LiThe CodeGen Library
1614*67e74705SXin Li===================
1615*67e74705SXin Li
1616*67e74705SXin LiCodeGen takes an :ref:`AST <AST>` as input and produces `LLVM IR code
1617*67e74705SXin Li<//llvm.org/docs/LangRef.html>`_ from it.
1618*67e74705SXin Li
1619*67e74705SXin LiHow to change Clang
1620*67e74705SXin Li===================
1621*67e74705SXin Li
1622*67e74705SXin LiHow to add an attribute
1623*67e74705SXin Li-----------------------
1624*67e74705SXin LiAttributes are a form of metadata that can be attached to a program construct,
1625*67e74705SXin Liallowing the programmer to pass semantic information along to the compiler for
1626*67e74705SXin Livarious uses. For example, attributes may be used to alter the code generation
1627*67e74705SXin Lifor a program construct, or to provide extra semantic information for static
1628*67e74705SXin Lianalysis. This document explains how to add a custom attribute to Clang.
1629*67e74705SXin LiDocumentation on existing attributes can be found `here
1630*67e74705SXin Li<//clang.llvm.org/docs/AttributeReference.html>`_.
1631*67e74705SXin Li
1632*67e74705SXin LiAttribute Basics
1633*67e74705SXin Li^^^^^^^^^^^^^^^^
1634*67e74705SXin LiAttributes in Clang are handled in three stages: parsing into a parsed attribute
1635*67e74705SXin Lirepresentation, conversion from a parsed attribute into a semantic attribute,
1636*67e74705SXin Liand then the semantic handling of the attribute.
1637*67e74705SXin Li
1638*67e74705SXin LiParsing of the attribute is determined by the various syntactic forms attributes
1639*67e74705SXin Lican take, such as GNU, C++11, and Microsoft style attributes, as well as other
1640*67e74705SXin Liinformation provided by the table definition of the attribute. Ultimately, the
1641*67e74705SXin Liparsed representation of an attribute object is an ``AttributeList`` object.
1642*67e74705SXin LiThese parsed attributes chain together as a list of parsed attributes attached
1643*67e74705SXin Lito a declarator or declaration specifier. The parsing of attributes is handled
1644*67e74705SXin Liautomatically by Clang, except for attributes spelled as keywords. When
1645*67e74705SXin Liimplementing a keyword attribute, the parsing of the keyword and creation of the
1646*67e74705SXin Li``AttributeList`` object must be done manually.
1647*67e74705SXin Li
1648*67e74705SXin LiEventually, ``Sema::ProcessDeclAttributeList()`` is called with a ``Decl`` and
1649*67e74705SXin Lian ``AttributeList``, at which point the parsed attribute can be transformed
1650*67e74705SXin Liinto a semantic attribute. The process by which a parsed attribute is converted
1651*67e74705SXin Liinto a semantic attribute depends on the attribute definition and semantic
1652*67e74705SXin Lirequirements of the attribute. The end result, however, is that the semantic
1653*67e74705SXin Liattribute object is attached to the ``Decl`` object, and can be obtained by a
1654*67e74705SXin Licall to ``Decl::getAttr<T>()``.
1655*67e74705SXin Li
1656*67e74705SXin LiThe structure of the semantic attribute is also governed by the attribute
1657*67e74705SXin Lidefinition given in Attr.td. This definition is used to automatically generate
1658*67e74705SXin Lifunctionality used for the implementation of the attribute, such as a class
1659*67e74705SXin Liderived from ``clang::Attr``, information for the parser to use, automated
1660*67e74705SXin Lisemantic checking for some attributes, etc.
1661*67e74705SXin Li
1662*67e74705SXin Li
1663*67e74705SXin Li``include/clang/Basic/Attr.td``
1664*67e74705SXin Li^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1665*67e74705SXin LiThe first step to adding a new attribute to Clang is to add its definition to
1666*67e74705SXin Li`include/clang/Basic/Attr.td
1667*67e74705SXin Li<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Attr.td?view=markup>`_.
1668*67e74705SXin LiThis tablegen definition must derive from the ``Attr`` (tablegen, not
1669*67e74705SXin Lisemantic) type, or one of its derivatives. Most attributes will derive from the
1670*67e74705SXin Li``InheritableAttr`` type, which specifies that the attribute can be inherited by
1671*67e74705SXin Lilater redeclarations of the ``Decl`` it is associated with.
1672*67e74705SXin Li``InheritableParamAttr`` is similar to ``InheritableAttr``, except that the
1673*67e74705SXin Liattribute is written on a parameter instead of a declaration. If the attribute
1674*67e74705SXin Liis intended to apply to a type instead of a declaration, such an attribute
1675*67e74705SXin Lishould derive from ``TypeAttr``, and will generally not be given an AST
1676*67e74705SXin Lirepresentation. (Note that this document does not cover the creation of type
1677*67e74705SXin Liattributes.) An attribute that inherits from ``IgnoredAttr`` is parsed, but will
1678*67e74705SXin Ligenerate an ignored attribute diagnostic when used, which may be useful when an
1679*67e74705SXin Liattribute is supported by another vendor but not supported by clang.
1680*67e74705SXin Li
1681*67e74705SXin LiThe definition will specify several key pieces of information, such as the
1682*67e74705SXin Lisemantic name of the attribute, the spellings the attribute supports, the
1683*67e74705SXin Liarguments the attribute expects, and more. Most members of the ``Attr`` tablegen
1684*67e74705SXin Litype do not require definitions in the derived definition as the default
1685*67e74705SXin Lisuffice. However, every attribute must specify at least a spelling list, a
1686*67e74705SXin Lisubject list, and a documentation list.
1687*67e74705SXin Li
1688*67e74705SXin LiSpellings
1689*67e74705SXin Li~~~~~~~~~
1690*67e74705SXin LiAll attributes are required to specify a spelling list that denotes the ways in
1691*67e74705SXin Liwhich the attribute can be spelled. For instance, a single semantic attribute
1692*67e74705SXin Limay have a keyword spelling, as well as a C++11 spelling and a GNU spelling. An
1693*67e74705SXin Liempty spelling list is also permissible and may be useful for attributes which
1694*67e74705SXin Liare created implicitly. The following spellings are accepted:
1695*67e74705SXin Li
1696*67e74705SXin Li  ============  ================================================================
1697*67e74705SXin Li  Spelling      Description
1698*67e74705SXin Li  ============  ================================================================
1699*67e74705SXin Li  ``GNU``       Spelled with a GNU-style ``__attribute__((attr))`` syntax and
1700*67e74705SXin Li                placement.
1701*67e74705SXin Li  ``CXX11``     Spelled with a C++-style ``[[attr]]`` syntax. If the attribute
1702*67e74705SXin Li                is meant to be used by Clang, it should set the namespace to
1703*67e74705SXin Li                ``"clang"``.
1704*67e74705SXin Li  ``Declspec``  Spelled with a Microsoft-style ``__declspec(attr)`` syntax.
  ``Keyword``   The attribute is spelled as a keyword, and requires custom
1706*67e74705SXin Li                parsing.
1707*67e74705SXin Li  ``GCC``       Specifies two spellings: the first is a GNU-style spelling, and
1708*67e74705SXin Li                the second is a C++-style spelling with the ``gnu`` namespace.
1709*67e74705SXin Li                Attributes should only specify this spelling for attributes
1710*67e74705SXin Li                supported by GCC.
1711*67e74705SXin Li  ``Pragma``    The attribute is spelled as a ``#pragma``, and requires custom
1712*67e74705SXin Li                processing within the preprocessor. If the attribute is meant to
1713*67e74705SXin Li                be used by Clang, it should set the namespace to ``"clang"``.
1714*67e74705SXin Li                Note that this spelling is not used for declaration attributes.
1715*67e74705SXin Li  ============  ================================================================
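
From the user's side, the spelling kinds correspond to source forms like the
ones below.  This sketch assumes a Clang or GCC compiler in C++11 mode or
later; the function and type names are made up, and ``always_inline`` and
``alignas`` are used only as familiar examples of each form.

.. code-block:: c++

  // GNU spelling: __attribute__((attr)).
  __attribute__((always_inline)) inline int viaGNU(int X) { return X + 1; }

  // CXX11 spelling in the "gnu" namespace.  The GCC spelling kind provides
  // this form together with the GNU form above.
  [[gnu::always_inline]] inline int viaCXX11(int X) { return X + 1; }

  // Keyword spelling: alignas is an attribute spelled as a keyword.
  struct alignas(16) Aligned {
    char Data;
  };

  static_assert(alignof(Aligned) == 16, "alignas sets the alignment");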
1716*67e74705SXin Li
1717*67e74705SXin LiSubjects
1718*67e74705SXin Li~~~~~~~~
1719*67e74705SXin LiAttributes appertain to one or more ``Decl`` subjects. If the attribute attempts
1720*67e74705SXin Lito attach to a subject that is not in the subject list, a diagnostic is issued
1721*67e74705SXin Liautomatically. Whether the diagnostic is a warning or an error depends on how
1722*67e74705SXin Lithe attribute's ``SubjectList`` is defined, but the default behavior is to warn.
1723*67e74705SXin LiThe diagnostics displayed to the user are automatically determined based on the
1724*67e74705SXin Lisubjects in the list, but a custom diagnostic parameter can also be specified in
1725*67e74705SXin Lithe ``SubjectList``. The diagnostics generated for subject list violations are
1726*67e74705SXin Lieither ``diag::warn_attribute_wrong_decl_type`` or
1727*67e74705SXin Li``diag::err_attribute_wrong_decl_type``, and the parameter enumeration is found
in `include/clang/Sema/AttributeList.h
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Sema/AttributeList.h?view=markup>`_.
1730*67e74705SXin LiIf a previously unused Decl node is added to the ``SubjectList``, the logic used
1731*67e74705SXin Lito automatically determine the diagnostic parameter in `utils/TableGen/ClangAttrEmitter.cpp
1732*67e74705SXin Li<http://llvm.org/viewvc/llvm-project/cfe/trunk/utils/TableGen/ClangAttrEmitter.cpp?view=markup>`_
1733*67e74705SXin Limay need to be updated.
1734*67e74705SXin Li
1735*67e74705SXin LiBy default, all subjects in the SubjectList must either be a Decl node defined
1736*67e74705SXin Liin ``DeclNodes.td``, or a statement node defined in ``StmtNodes.td``. However,
1737*67e74705SXin Limore complex subjects can be created by creating a ``SubsetSubject`` object.
1738*67e74705SXin LiEach such object has a base subject which it appertains to (which must be a
1739*67e74705SXin LiDecl or Stmt node, and not a SubsetSubject node), and some custom code which is
1740*67e74705SXin Licalled when determining whether an attribute appertains to the subject. For
1741*67e74705SXin Liinstance, a ``NonBitField`` SubsetSubject appertains to a ``FieldDecl``, and
1742*67e74705SXin Litests whether the given FieldDecl is a bit field. When a SubsetSubject is
1743*67e74705SXin Lispecified in a SubjectList, a custom diagnostic parameter must also be provided.
1744*67e74705SXin Li
1745*67e74705SXin LiDiagnostic checking for attribute subject lists is automated except when
1746*67e74705SXin Li``HasCustomParsing`` is set to ``1``.
1747*67e74705SXin Li
1748*67e74705SXin LiDocumentation
1749*67e74705SXin Li~~~~~~~~~~~~~
1750*67e74705SXin LiAll attributes must have some form of documentation associated with them.
1751*67e74705SXin LiDocumentation is table generated on the public web server by a server-side
1752*67e74705SXin Liprocess that runs daily. Generally, the documentation for an attribute is a
stand-alone definition in `include/clang/Basic/AttrDocs.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/AttrDocs.td?view=markup>`_
1755*67e74705SXin Lithat is named after the attribute being documented.
1756*67e74705SXin Li
1757*67e74705SXin LiIf the attribute is not for public consumption, or is an implicitly-created
1758*67e74705SXin Liattribute that has no visible spelling, the documentation list can specify the
1759*67e74705SXin Li``Undocumented`` object. Otherwise, the attribute should have its documentation
1760*67e74705SXin Liadded to AttrDocs.td.
1761*67e74705SXin Li
1762*67e74705SXin LiDocumentation derives from the ``Documentation`` tablegen type. All derived
1763*67e74705SXin Litypes must specify a documentation category and the actual documentation itself.
1764*67e74705SXin LiAdditionally, it can specify a custom heading for the attribute, though a
1765*67e74705SXin Lidefault heading will be chosen when possible.
1766*67e74705SXin Li
1767*67e74705SXin LiThere are four predefined documentation categories: ``DocCatFunction`` for
1768*67e74705SXin Liattributes that appertain to function-like subjects, ``DocCatVariable`` for
1769*67e74705SXin Liattributes that appertain to variable-like subjects, ``DocCatType`` for type
1770*67e74705SXin Liattributes, and ``DocCatStmt`` for statement attributes. A custom documentation
1771*67e74705SXin Licategory should be used for groups of attributes with similar functionality.
Custom categories are good for providing overview information for the attributes
grouped under them. For instance, the consumed annotation attributes define a
1774*67e74705SXin Licustom category, ``DocCatConsumed``, that explains what consumed annotations are
1775*67e74705SXin Liat a high level.
1776*67e74705SXin Li
1777*67e74705SXin LiDocumentation content (whether it is for an attribute or a category) is written
1778*67e74705SXin Liusing reStructuredText (RST) syntax.
1779*67e74705SXin Li
1780*67e74705SXin LiAfter writing the documentation for the attribute, it should be locally tested
1781*67e74705SXin Lito ensure that there are no issues generating the documentation on the server.
1782*67e74705SXin LiLocal testing requires a fresh build of clang-tblgen. To generate the attribute
1783*67e74705SXin Lidocumentation, execute the following command::
1784*67e74705SXin Li
1785*67e74705SXin Li  clang-tblgen -gen-attr-docs -I /path/to/clang/include /path/to/clang/include/clang/Basic/Attr.td -o /path/to/clang/docs/AttributeReference.rst
1786*67e74705SXin Li
1787*67e74705SXin LiWhen testing locally, *do not* commit changes to ``AttributeReference.rst``.
1788*67e74705SXin LiThis file is generated by the server automatically, and any changes made to this
1789*67e74705SXin Lifile will be overwritten.
1790*67e74705SXin Li
1791*67e74705SXin LiArguments
1792*67e74705SXin Li~~~~~~~~~
1793*67e74705SXin LiAttributes may optionally specify a list of arguments that can be passed to the
1794*67e74705SXin Liattribute. Attribute arguments specify both the parsed form and the semantic
1795*67e74705SXin Liform of the attribute. For example, if ``Args`` is
1796*67e74705SXin Li``[StringArgument<"Arg1">, IntArgument<"Arg2">]`` then
1797*67e74705SXin Li``__attribute__((myattribute("Hello", 3)))`` will be a valid use; it requires
1798*67e74705SXin Litwo arguments while parsing, and the Attr subclass' constructor for the
1799*67e74705SXin Lisemantic attribute will require a string and integer argument.
1800*67e74705SXin Li
1801*67e74705SXin LiAll arguments have a name and a flag that specifies whether the argument is
1802*67e74705SXin Lioptional. The associated C++ type of the argument is determined by the argument
1803*67e74705SXin Lidefinition type. If the existing argument types are insufficient, new types can
1804*67e74705SXin Libe created, but it requires modifying `utils/TableGen/ClangAttrEmitter.cpp
1805*67e74705SXin Li<http://llvm.org/viewvc/llvm-project/cfe/trunk/utils/TableGen/ClangAttrEmitter.cpp?view=markup>`_
1806*67e74705SXin Lito properly support the type.
1807*67e74705SXin Li
1808*67e74705SXin LiOther Properties
1809*67e74705SXin Li~~~~~~~~~~~~~~~~
1810*67e74705SXin LiThe ``Attr`` definition has other members which control the behavior of the
attribute. Many of them are special-purpose and beyond the scope of this
document; however, a few deserve mention.
1813*67e74705SXin Li
1814*67e74705SXin LiIf the parsed form of the attribute is more complex, or differs from the
1815*67e74705SXin Lisemantic form, the ``HasCustomParsing`` bit can be set to ``1`` for the class,
1816*67e74705SXin Liand the parsing code in `Parser::ParseGNUAttributeArgs()
1817*67e74705SXin Li<http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Parse/ParseDecl.cpp?view=markup>`_
can be updated for the special case. Note that this only applies to attributes
with a GNU spelling -- attributes with a ``__declspec`` spelling currently ignore
this flag and are handled by ``Parser::ParseMicrosoftDeclSpec``.
1821*67e74705SXin Li
1822*67e74705SXin LiNote that setting this member to 1 will opt out of common attribute semantic
1823*67e74705SXin Lihandling, requiring extra implementation efforts to ensure the attribute
1824*67e74705SXin Liappertains to the appropriate subject, etc.
1825*67e74705SXin Li
If the attribute should not be propagated from a template declaration to an
1827*67e74705SXin Liinstantiation of the template, set the ``Clone`` member to 0. By default, all
1828*67e74705SXin Liattributes will be cloned to template instantiations.
1829*67e74705SXin Li
1830*67e74705SXin LiAttributes that do not require an AST node should set the ``ASTNode`` field to
``0`` to avoid polluting the AST. Note that anything inheriting from
``TypeAttr`` or ``IgnoredAttr`` automatically does not generate an AST node. All
1833*67e74705SXin Liother attributes generate an AST node by default. The AST node is the semantic
1834*67e74705SXin Lirepresentation of the attribute.
1835*67e74705SXin Li
1836*67e74705SXin LiThe ``LangOpts`` field specifies a list of language options required by the
1837*67e74705SXin Liattribute.  For instance, all of the CUDA-specific attributes specify ``[CUDA]``
1838*67e74705SXin Lifor the ``LangOpts`` field, and when the CUDA language option is not enabled, an
1839*67e74705SXin Li"attribute ignored" warning diagnostic is emitted. Since language options are
1840*67e74705SXin Linot table generated nodes, new language options must be created manually and
should specify the spelling used by the ``LangOptions`` class.
1842*67e74705SXin Li
1843*67e74705SXin LiCustom accessors can be generated for an attribute based on the spelling list
1844*67e74705SXin Lifor that attribute. For instance, if an attribute has two different spellings:
1845*67e74705SXin Li'Foo' and 'Bar', accessors can be created:
1846*67e74705SXin Li``[Accessor<"isFoo", [GNU<"Foo">]>, Accessor<"isBar", [GNU<"Bar">]>]``
1847*67e74705SXin LiThese accessors will be generated on the semantic form of the attribute,
1848*67e74705SXin Liaccepting no arguments and returning a ``bool``.
1849*67e74705SXin Li
1850*67e74705SXin LiAttributes that do not require custom semantic handling should set the
``SemaHandler`` field to ``0``. Note that anything inheriting from
``IgnoredAttr`` automatically does not get a semantic handler. All other
1853*67e74705SXin Liattributes are assumed to use a semantic handler by default. Attributes
1854*67e74705SXin Liwithout a semantic handler are not given a parsed attribute ``Kind`` enumerator.
1855*67e74705SXin Li
1856*67e74705SXin LiTarget-specific attributes may share a spelling with other attributes in
1857*67e74705SXin Lidifferent targets. For instance, the ARM and MSP430 targets both have an
1858*67e74705SXin Liattribute spelled ``GNU<"interrupt">``, but with different parsing and semantic
1859*67e74705SXin Lirequirements. To support this feature, an attribute inheriting from
1860*67e74705SXin Li``TargetSpecificAttribute`` may specify a ``ParseKind`` field. This field
should be the same value between all attributes sharing a spelling, and
1862*67e74705SXin Licorresponds to the parsed attribute's ``Kind`` enumerator. This allows
1863*67e74705SXin Liattributes to share a parsed attribute kind, but have distinct semantic
1864*67e74705SXin Liattribute classes. For instance, ``AttributeList::AT_Interrupt`` is the shared
parsed attribute kind, but ``ARMInterruptAttr`` and ``MSP430InterruptAttr`` are the
1866*67e74705SXin Lisemantic attributes generated.
1867*67e74705SXin Li
1868*67e74705SXin LiBy default, when declarations are merging attributes, an attribute will not be
1869*67e74705SXin Liduplicated. However, if an attribute can be duplicated during this merging
1870*67e74705SXin Listage, set ``DuplicatesAllowedWhileMerging`` to ``1``, and the attribute will
1871*67e74705SXin Libe merged.
1872*67e74705SXin Li
1873*67e74705SXin LiBy default, attribute arguments are parsed in an evaluated context. If the
1874*67e74705SXin Liarguments for an attribute should be parsed in an unevaluated context (akin to
1875*67e74705SXin Lithe way the argument to a ``sizeof`` expression is parsed), set
1876*67e74705SXin Li``ParseArgumentsAsUnevaluated`` to ``1``.
1877*67e74705SXin Li
1878*67e74705SXin LiIf additional functionality is desired for the semantic form of the attribute,
1879*67e74705SXin Lithe ``AdditionalMembers`` field specifies code to be copied verbatim into the
1880*67e74705SXin Lisemantic attribute class object, with ``public`` access.
1881*67e74705SXin Li
1882*67e74705SXin LiBoilerplate
1883*67e74705SXin Li^^^^^^^^^^^
1884*67e74705SXin LiAll semantic processing of declaration attributes happens in `lib/Sema/SemaDeclAttr.cpp
1885*67e74705SXin Li<http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaDeclAttr.cpp?view=markup>`_,
1886*67e74705SXin Liand generally starts in the ``ProcessDeclAttribute()`` function. If the
attribute is a "simple" attribute -- meaning that it requires no custom semantic
processing aside from what is automatically provided -- add a call to
1889*67e74705SXin Li``handleSimpleAttribute<YourAttr>(S, D, Attr);`` to the switch statement.
1890*67e74705SXin LiOtherwise, write a new ``handleYourAttr()`` function, and add that to the switch
1891*67e74705SXin Listatement. Please do not implement handling logic directly in the ``case`` for
1892*67e74705SXin Lithe attribute.
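
The shape of that dispatch can be sketched with stand-in types.  The ``Sema``,
``Decl`` and ``ParsedAttribute`` structs below are hypothetical placeholders,
and ``handleSimpleAttribute`` here is a mock that only mirrors the calling
convention described above, not Clang's implementation.

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical stand-ins for Clang's Sema, Decl and parsed attribute types.
  struct Sema {};
  struct Decl {
    std::vector<std::string> Attrs;  // names of the attached attributes
  };
  struct ParsedAttribute {
    enum Kind { AT_Simple, AT_Custom, AT_Unknown } K;
    int Arg;  // single integer argument, if any
  };

  struct SimpleAttr {
    static const char *name() { return "simple"; }
  };

  // Mock of the "simple" path: attach the attribute with no extra checks.
  template <typename AttrT>
  void handleSimpleAttribute(Sema &, Decl &D, const ParsedAttribute &) {
    D.Attrs.push_back(AttrT::name());
  }

  // Custom handling lives in its own function rather than in the case body.
  void handleCustomAttr(Sema &, Decl &D, const ParsedAttribute &A) {
    if (A.Arg < 0)
      return;  // a real handler would emit a diagnostic here
    D.Attrs.push_back("custom(" + std::to_string(A.Arg) + ")");
  }

  void processDeclAttribute(Sema &S, Decl &D, const ParsedAttribute &A) {
    switch (A.K) {
    case ParsedAttribute::AT_Simple:
      handleSimpleAttribute<SimpleAttr>(S, D, A);
      break;
    case ParsedAttribute::AT_Custom:
      handleCustomAttr(S, D, A);
      break;
    default:
      break;  // unknown attributes are ignored in this sketch
    }
  }

Keeping each ``case`` to a single call keeps the switch readable as the number
of attributes grows.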
1893*67e74705SXin Li
1894*67e74705SXin LiUnless otherwise specified by the attribute definition, common semantic checking
1895*67e74705SXin Liof the parsed attribute is handled automatically. This includes diagnosing
1896*67e74705SXin Liparsed attributes that do not appertain to the given ``Decl``, ensuring the
1897*67e74705SXin Licorrect minimum number of arguments are passed, etc.
1898*67e74705SXin Li
If the attribute adds additional warnings, define a ``DiagGroup`` in
`include/clang/Basic/DiagnosticGroups.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticGroups.td?view=markup>`_
named after the attribute's ``Spelling`` with "_"s replaced by "-"s. If there
is only a single diagnostic, it is permissible to use ``InGroup<DiagGroup<"your-attribute">>``
directly in `DiagnosticSemaKinds.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td?view=markup>`_.

All semantic diagnostics generated for your attribute, including automatically
generated ones (such as subjects and argument counts), should have a
corresponding test case.

Semantic handling
^^^^^^^^^^^^^^^^^
Most attributes are implemented to have some effect on the compiler: for
instance, to modify the way code is generated or to add extra semantic checks
for an analysis pass. Having added the attribute definition and conversion
to the semantic representation for the attribute, what remains is to implement
the custom logic requiring use of the attribute.

The ``clang::Decl`` object can be queried for the presence or absence of an
attribute using ``hasAttr<T>()``. To obtain a pointer to the semantic
representation of the attribute, ``getAttr<T>()`` may be used.

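The query pattern can be illustrated with a standalone mock. The classes below
are hypothetical, simplified stand-ins (``DeclSketch``, ``AttrSketch``, and the
two attribute types are invented here), and the kind-tag lookup is only an
assumption about one reasonable way to implement such queries. The point is the
calling convention: ``hasAttr<T>()`` answers yes or no, and ``getAttr<T>()``
yields a typed pointer that is null when the attribute is absent.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
struct AttrSketch {
  enum Kind { Aligned, Unused };
  explicit AttrSketch(Kind K) : TheKind(K) {}
  virtual ~AttrSketch() = default;
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct AlignedAttrSketch : AttrSketch {
  explicit AlignedAttrSketch(unsigned A) : AttrSketch(Aligned), Alignment(A) {}
  static bool classof(const AttrSketch *A) { return A->getKind() == Aligned; }
  unsigned Alignment;
};

struct UnusedAttrSketch : AttrSketch {
  UnusedAttrSketch() : AttrSketch(Unused) {}
  static bool classof(const AttrSketch *A) { return A->getKind() == Unused; }
};

class DeclSketch {
  std::vector<std::unique_ptr<AttrSketch>> Attrs;

public:
  void addAttr(std::unique_ptr<AttrSketch> A) { Attrs.push_back(std::move(A)); }

  // Returns the first attribute of type T, or nullptr if there is none.
  template <typename T> T *getAttr() const {
    for (const auto &A : Attrs)
      if (T::classof(A.get()))
        return static_cast<T *>(A.get());
    return nullptr;
  }

  template <typename T> bool hasAttr() const { return getAttr<T>() != nullptr; }
};

// Typical client code: fetch the attribute once and use its payload.
unsigned alignmentOrDefault(const DeclSketch &D, unsigned Default) {
  if (const auto *AA = D.getAttr<AlignedAttrSketch>())
    return AA->Alignment;
  return Default;
}
```

When the attribute carries data, calling ``getAttr<T>()`` once and testing the
result avoids a second lookup after ``hasAttr<T>()``.
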
How to add an expression or statement
-------------------------------------

Expressions and statements are among the most fundamental constructs within a
compiler, because they interact with many different parts of the AST, semantic
analysis, and IR generation.  Therefore, adding a new expression or statement
kind into Clang requires some care.  The following list details the various
places in Clang where an expression or statement needs to be introduced, along
with patterns to follow to ensure that the new expression or statement works
well across all of the C languages.  We focus on expressions, but statements
are similar.

#. Introduce parsing actions into the parser.  Recursive-descent parsing is
   mostly self-explanatory, but there are a few things that are worth keeping
   in mind:

   * Keep as much source location information as possible! You'll want it later
     to produce great diagnostics and support Clang's various features that map
     between source code and the AST.
   * Write tests for all of the "bad" parsing cases, to make sure your recovery
     is good.  If you have matched delimiters (e.g., parentheses, square
     brackets, etc.), use ``Parser::BalancedDelimiterTracker`` to give nice
     diagnostics when things go wrong.

#. Introduce semantic analysis actions into ``Sema``.  Semantic analysis should
   always involve two functions: an ``ActOnXXX`` function that will be called
   directly from the parser, and a ``BuildXXX`` function that performs the
   actual semantic analysis and will (eventually!) build the AST node.  It's
   fairly common for the ``ActOnXXX`` function to do very little (often just
   some minor translation from the parser's representation to ``Sema``'s
   representation of the same thing), but the separation is still important:
   C++ template instantiation, for example, should always call the ``BuildXXX``
   variant.  Several notes on semantic analysis before we get into construction
   of the AST:

   * Your expression probably involves some types and some subexpressions.
     Make sure to fully check that those types, and the types of those
     subexpressions, meet your expectations.  Add implicit conversions where
     necessary to make sure that all of the types line up exactly the way you
     want them.  Write extensive tests to check that you're getting good
     diagnostics for mistakes and that you can use various forms of
     subexpressions with your expression.
   * When type-checking a type or subexpression, make sure to first check
     whether the type is "dependent" (``Type::isDependentType()``) or whether a
     subexpression is type-dependent (``Expr::isTypeDependent()``).  If any of
     these return ``true``, then you're inside a template and you can't do much
     type-checking now.  That's normal, and your AST node (when you get there)
     will have to deal with this case.  At this point, you can write tests that
     use your expression within templates, but don't try to instantiate the
     templates.
   * For each subexpression, be sure to call ``Sema::CheckPlaceholderExpr()``
     to deal with "weird" expressions that don't behave well as subexpressions.
     Then, determine whether you need to perform lvalue-to-rvalue conversions
     (``Sema::DefaultLvalueConversion``) or the usual unary conversions
     (``Sema::UsualUnaryConversions``), for places where the subexpression is
     producing a value you intend to use.
   * Your ``BuildXXX`` function will probably just return ``ExprError()`` at
     this point, since you don't have an AST.  That's perfectly fine, and
     shouldn't impact your testing.

#. Introduce an AST node for your new expression.  This starts with declaring
   the node in ``include/clang/Basic/StmtNodes.td`` and creating a new class
   for your expression in the appropriate ``include/clang/AST/Expr*.h`` header.
   It's best to look at the class for a similar expression to get ideas, and
   there are some specific things to watch for:

   * If you need to allocate memory, use the ``ASTContext`` allocator to
     allocate memory.  Never use raw ``malloc`` or ``new``, and never hold any
     resources in an AST node, because the destructor of an AST node is never
     called.
   * Make sure that ``getSourceRange()`` covers the exact source range of your
     expression.  This is needed for diagnostics and for IDE support.
   * Make sure that ``children()`` visits all of the subexpressions.  This is
     important for a number of features (e.g., IDE support, C++ variadic
     templates).  If you have sub-types, you'll also need to visit those
     sub-types in ``RecursiveASTVisitor``.
   * Add printing support (``StmtPrinter.cpp``) for your expression.
   * Add profiling support (``StmtProfile.cpp``) for your AST node, noting the
     distinguishing (non-source location) characteristics of an instance of
     your expression.  Omitting this step will lead to hard-to-diagnose
     failures regarding matching of template declarations.
   * Add serialization support (``ASTReaderStmt.cpp``, ``ASTWriterStmt.cpp``)
     for your AST node.

#. Teach semantic analysis to build your AST node.  At this point, you can wire
   up your ``Sema::BuildXXX`` function to actually create your AST.  A few
   things to check at this point:

   * If your expression can construct a new C++ class or return a new
     Objective-C object, be sure to update and then call
     ``Sema::MaybeBindToTemporary`` for your just-created AST node to be sure
     that the object gets properly destructed.  An easy way to test this is to
     return a C++ class with a private destructor: semantic analysis should
     flag an error here with the attempt to call the destructor.
   * Inspect the generated AST by printing it using ``clang -cc1 -ast-print``,
     to make sure you're capturing all of the important information about how
     the AST was written.
   * Inspect the generated AST under ``clang -cc1 -ast-dump`` to verify that
     all of the types in the generated AST line up the way you want them.
     Remember that clients of the AST should never have to "think" to
     understand what's going on.  For example, all implicit conversions should
     show up explicitly in the AST.
   * Write tests that use your expression as a subexpression of other,
     well-known expressions.  Can you call a function using your expression as
     an argument?  Can you use the ternary operator?

#. Teach code generation to create IR for your AST node.  This step is the
   first (and only) one that requires knowledge of LLVM IR.  There are several
   things to keep in mind:

   * Code generation is separated into scalar/aggregate/complex and
     lvalue/rvalue paths, depending on what kind of result your expression
     produces.  On occasion, this requires some careful factoring of code to
     avoid duplication.
   * ``CodeGenFunction`` contains functions ``ConvertType`` and
     ``ConvertTypeForMem`` that convert Clang's types (``clang::Type*`` or
     ``clang::QualType``) to LLVM types.  Use the former for values, and the
     latter for memory locations: test with the C++ "``bool``" type to check
     this.  If you find that you are having to use LLVM bitcasts to make the
     subexpressions of your expression have the type that your expression
     expects, STOP!  Go fix semantic analysis and the AST so that you don't
     need these bitcasts.
   * The ``CodeGenFunction`` class has a number of helper functions to make
     certain operations easy, such as generating code to produce an lvalue or
     an rvalue, or to initialize a memory location with a given value.  Prefer
     to use these functions rather than directly writing loads and stores,
     because these functions take care of some of the tricky details for you
     (e.g., for exceptions).
   * If your expression requires some special behavior in the event of an
     exception, look at the ``push*Cleanup`` functions in ``CodeGenFunction``
     to introduce a cleanup.  You shouldn't have to deal with
     exception-handling directly.
   * Testing is extremely important in IR generation.  Use ``clang -cc1
     -emit-llvm`` and `FileCheck
     <http://llvm.org/docs/CommandGuide/FileCheck.html>`_ to verify that you're
     generating the right IR.

#. Teach template instantiation how to cope with your AST node, which requires
   some fairly simple code:

   * Make sure that your expression's constructor properly computes the flags
     for type dependence (i.e., the type your expression produces can change
     from one instantiation to the next), value dependence (i.e., the constant
     value your expression produces can change from one instantiation to the
     next), instantiation dependence (i.e., a template parameter occurs
     anywhere in your expression), and whether your expression contains a
     parameter pack (for variadic templates).  Often, computing these flags
     just means combining the results from the various types and
     subexpressions.
   * Add ``TransformXXX`` and ``RebuildXXX`` functions to the ``TreeTransform``
     class template in ``Sema``.  ``TransformXXX`` should (recursively)
     transform all of the subexpressions and types within your expression,
     using ``getDerived().TransformYYY``.  If all of the subexpressions and
     types transform without error, it will then call the ``RebuildXXX``
     function, which will in turn call ``getSema().BuildXXX`` to perform
     semantic analysis and build your expression.
   * To test template instantiation, take those tests you wrote to make sure
     that you were type checking with type-dependent expressions and dependent
     types (from step #2) and instantiate those templates with various types,
     some of which type-check and some of which don't, and test the error
     messages in each case.

#. There are some "extras" that make other features work better.  It's worth
   handling these extras to give your expression complete integration into
   Clang:

   * Add code completion support for your expression in
     ``SemaCodeComplete.cpp``.
   * If your expression has types in it, or has any "interesting" features
     other than subexpressions, extend libclang's ``CursorVisitor`` to provide
     proper visitation for your expression, enabling various IDE features such
     as syntax highlighting, cross-referencing, and so on.  The
     ``c-index-test`` helper program can be used to test these features.

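The split between ``ActOnXXX`` and ``BuildXXX`` (step 2) and the way template
instantiation funnels back through ``BuildXXX`` (step 6) can be sketched with a
drastically simplified standalone mock. None of the types below are Clang's:
``SemaSketch``, ``InstantiatorSketch``, the three-valued ``TypeSketch``, and
the "negate" expression are all invented for illustration. The sketch also
leaves out the ``getDerived()`` indirection of the real ``TreeTransform`` and
owns nodes with ``std::unique_ptr`` purely for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these are Clang's real types.
enum class TypeSketch { Int, Pointer, Dependent };

struct ExprSketch {
  std::string Op; // "" for a leaf operand
  TypeSketch Type;
  std::unique_ptr<ExprSketch> Operand;
  bool isTypeDependent() const { return Type == TypeSketch::Dependent; }
};
// A null result plays the role of ExprError().
using ExprResultSketch = std::unique_ptr<ExprSketch>;

ExprResultSketch makeOperand(TypeSketch T) {
  return ExprResultSketch(new ExprSketch{"", T, nullptr});
}

struct SemaSketch {
  int NumErrors = 0;

  // BuildXXX: the single place that type-checks and creates the node.
  ExprResultSketch BuildNegateExpr(ExprResultSketch Operand) {
    if (!Operand)
      return nullptr;
    TypeSketch ResultType = TypeSketch::Int;
    if (Operand->isTypeDependent()) {
      // Inside a template: defer checking until instantiation.
      ResultType = TypeSketch::Dependent;
    } else if (Operand->Type != TypeSketch::Int) {
      ++NumErrors; // a real implementation would emit a diagnostic here
      return nullptr;
    }
    return ExprResultSketch(
        new ExprSketch{"negate", ResultType, std::move(Operand)});
  }

  // ActOnXXX: called by the parser; it only translates and forwards.
  ExprResultSketch ActOnNegateExpr(ExprResultSketch Operand) {
    return BuildNegateExpr(std::move(Operand));
  }
};

// Instantiation: substitute the dependent type, then rebuild via BuildXXX.
struct InstantiatorSketch {
  SemaSketch &S;
  TypeSketch Replacement;

  ExprResultSketch TransformExpr(const ExprSketch &E) {
    if (E.Op == "negate")
      return TransformNegateExpr(E);
    return makeOperand(E.isTypeDependent() ? Replacement : E.Type);
  }

  ExprResultSketch TransformNegateExpr(const ExprSketch &E) {
    assert(E.Operand && "a negate node always has an operand");
    ExprResultSketch Operand = TransformExpr(*E.Operand);
    if (!Operand)
      return nullptr;
    return RebuildNegateExpr(std::move(Operand));
  }

  ExprResultSketch RebuildNegateExpr(ExprResultSketch Operand) {
    return S.BuildNegateExpr(std::move(Operand));
  }
};
```

Because the parser path and the instantiation path both end in
``BuildNegateExpr``, a type error that was deferred for a dependent operand is
reported by the same code once a concrete type is substituted.
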
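The allocation rule from step 3 (allocate through the context, and never rely
on a node's destructor) can be illustrated with a toy arena. This is only a
sketch under stated assumptions: ``ArenaSketch``, ``NodeSketch``, and
``createNode`` are hypothetical names, and the one-slab-per-allocation scheme
is a simplification that is not how a production allocator would work.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical arena: memory is released in bulk when the arena dies, and
// the destructors of objects placed in it are never run.
class ArenaSketch {
  std::vector<std::unique_ptr<char[]>> Slabs;

public:
  void *allocate(std::size_t Size) {
    assert(Size > 0 && "cannot allocate an empty node");
    // Simplification: one slab per allocation, default new[] alignment.
    Slabs.push_back(std::unique_ptr<char[]>(new char[Size]));
    return Slabs.back().get();
  }
};

// Hypothetical node type that counts destructor calls so the effect is visible.
struct NodeSketch {
  static int NumDestroyed;
  int Value;
  explicit NodeSketch(int V) : Value(V) {}
  ~NodeSketch() { ++NumDestroyed; }
};
int NodeSketch::NumDestroyed = 0;

// Constructs a T inside the arena with placement new.
template <typename T, typename... Args>
T *createNode(ArenaSketch &Arena, Args &&...As) {
  return new (Arena.allocate(sizeof(T))) T(std::forward<Args>(As)...);
}
```

Destroying the arena frees the raw memory but leaves ``NumDestroyed`` at zero,
which is why a node that owned a heap buffer or a file handle would leak it.
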