=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang, or use tools that work based on Clang's AST, like the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesized expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.
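To make the idea of an unreduced tree concrete, here is a minimal,
self-contained C++ sketch. It does not use Clang: the classes ``Expr``,
``ParenExpr``, ``BinaryOperator`` and ``IntegerLiteral`` only borrow the
names of the corresponding Clang nodes, and ``print``, ``evaluate`` and
``makeExample`` are hypothetical helpers invented for this illustration.
The sketch keeps the parentheses and the unevaluated constant in the tree,
so the text as written can be reproduced while the value can still be
computed on demand.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy node types. The names echo Clang's, but these are NOT Clang classes.
struct Expr {
  virtual ~Expr() = default;
  virtual std::string print() const = 0;  // text as it was written
  virtual long evaluate() const = 0;      // value, computed only on demand
};

struct IntegerLiteral : Expr {
  long value;
  explicit IntegerLiteral(long v) : value(v) {}
  std::string print() const override { return std::to_string(value); }
  long evaluate() const override { return value; }
};

struct BinaryOperator : Expr {
  char op;  // only '+' and '*' are modeled in this sketch
  std::unique_ptr<Expr> lhs, rhs;
  BinaryOperator(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
  std::string print() const override {
    return lhs->print() + " " + op + " " + rhs->print();
  }
  long evaluate() const override {
    long l = lhs->evaluate(), r = rhs->evaluate();
    return op == '*' ? l * r : l + r;
  }
};

// The parentheses get their own node instead of being dropped.
struct ParenExpr : Expr {
  std::unique_ptr<Expr> inner;
  explicit ParenExpr(std::unique_ptr<Expr> e) : inner(std::move(e)) {}
  std::string print() const override { return "(" + inner->print() + ")"; }
  long evaluate() const override { return inner->evaluate(); }
};

// Builds the unreduced tree for the source text "(6 * 7) + 1".
std::unique_ptr<Expr> makeExample() {
  auto product = std::make_unique<BinaryOperator>(
      '*', std::make_unique<IntegerLiteral>(6),
      std::make_unique<IntegerLiteral>(7));
  auto paren = std::make_unique<ParenExpr>(std::move(product));
  return std::make_unique<BinaryOperator>(
      '+', std::move(paren), std::make_unique<IntegerLiteral>(1));
}
```

With this toy tree, ``makeExample()->print()`` gives back ``(6 * 7) + 1``
and ``evaluate()`` gives ``43``. A tree that had already folded the
constant or discarded the parentheses could not reproduce the first form,
which is the property a refactoring tool depends on.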

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The online Doxygen
documentation is also indexed by search engines, so searching for
"clang" plus the AST node's class name usually turns up the Doxygen
page of the class you're looking for (for example, search for:
clang ParenExpr).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to actually look
at it on some simple example code. Clang has a built-in AST-dump mode,
which can be enabled with the flag ``-ast-dump``.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -Xclang is used to pass
    # options directly to the C++ frontend.
    $ clang -Xclang -ast-dump -fsyntax-only test.cc
    TranslationUnitDecl 0x5aea0d0 <<invalid sloc>>
    ... cutting out internal declarations of clang ...
    `-FunctionDecl 0x5aeab50 <test.cc:1:1, line:4:1> f 'int (int)'
      |-ParmVarDecl 0x5aeaa90 <line:1:7, col:11> x 'int'
      `-CompoundStmt 0x5aead88 <col:14, line:4:1>
        |-DeclStmt 0x5aead10 <line:2:3, col:24>
        | `-VarDecl 0x5aeac10 <col:3, col:23> result 'int'
        |   `-ParenExpr 0x5aeacf0 <col:16, col:23> 'int'
        |     `-BinaryOperator 0x5aeacc8 <col:17, col:21> 'int' '/'
        |       |-ImplicitCastExpr 0x5aeacb0 <col:17> 'int' <LValueToRValue>
        |       | `-DeclRefExpr 0x5aeac68 <col:17> 'int' lvalue ParmVar 0x5aeaa90 'x' 'int'
        |       `-IntegerLiteral 0x5aeac90 <col:21> 'int' 42
        `-ReturnStmt 0x5aead68 <line:3:3, col:10>
          `-ImplicitCastExpr 0x5aead50 <col:10> 'int' <LValueToRValue>
            `-DeclRefExpr 0x5aead28 <col:10> 'int' lvalue Var 0x5aeac10 'result' 'int'

The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our result variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversing the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
or accessing Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.

There is also a multitude of nodes in the AST that are not part of a
larger hierarchy, and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node; this information has to be encoded for each specific node type.
This algorithm is encoded in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.
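The traversal idea can be sketched without Clang itself. The following
self-contained C++ model is only an illustration under stated assumptions:
the ``Decl`` and ``Stmt`` structs, the ``KindCounter`` visitor, and the
helpers ``decl``, ``stmt`` and ``buildExample`` are all hypothetical and
are not Clang's API. It mirrors the situation described above: the two
node families share no base class, so the visitor has to know, per family,
which edges lead to further nodes (for example, a declaration statement
leads back into a declaration).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Decl;  // declarations and statements are separate hierarchies

// Toy statement node (expressions are modeled as statements too).
struct Stmt {
  std::string kind;
  std::vector<std::shared_ptr<Stmt>> children;  // sub-statements, operands
  std::shared_ptr<Decl> declared;               // set only for a DeclStmt
};

// Toy declaration node.
struct Decl {
  std::string kind;
  std::vector<std::shared_ptr<Decl>> members;  // parameters, contained decls
  std::shared_ptr<Stmt> body;                  // function body or initializer
};

// Visitor that encodes, for each family, which edges to follow.
struct KindCounter {
  std::map<std::string, int> counts;

  void traverseDecl(const Decl &d) {
    ++counts[d.kind];
    for (const auto &m : d.members) traverseDecl(*m);
    if (d.body) traverseStmt(*d.body);
  }
  void traverseStmt(const Stmt &s) {
    ++counts[s.kind];
    if (s.declared) traverseDecl(*s.declared);
    for (const auto &c : s.children) traverseStmt(*c);
  }
};

// Small construction helpers (hypothetical, for brevity only).
std::shared_ptr<Stmt> stmt(std::string kind,
                           std::vector<std::shared_ptr<Stmt>> children = {},
                           std::shared_ptr<Decl> declared = nullptr) {
  return std::make_shared<Stmt>(
      Stmt{std::move(kind), std::move(children), std::move(declared)});
}
std::shared_ptr<Decl> decl(std::string kind,
                           std::vector<std::shared_ptr<Decl>> members = {},
                           std::shared_ptr<Stmt> body = nullptr) {
  return std::make_shared<Decl>(
      Decl{std::move(kind), std::move(members), std::move(body)});
}

// Rebuilds the shape of the -ast-dump output shown for test.cc.
std::shared_ptr<Decl> buildExample() {
  auto init = stmt("ParenExpr",
                   {stmt("BinaryOperator",
                         {stmt("ImplicitCastExpr", {stmt("DeclRefExpr")}),
                          stmt("IntegerLiteral")})});
  auto result = decl("VarDecl", {}, init);
  auto body = stmt(
      "CompoundStmt",
      {stmt("DeclStmt", {}, result),
       stmt("ReturnStmt", {stmt("ImplicitCastExpr", {stmt("DeclRefExpr")})})});
  auto f = decl("FunctionDecl", {decl("ParmVarDecl")}, body);
  return decl("TranslationUnitDecl", {f});
}
```

Starting ``KindCounter::traverseDecl`` at the node returned by
``buildExample()`` reaches every node of the toy tree, crossing from
declarations into statements and back. Clang's real visitor covers far
more node families and edge kinds, which is why that knowledge lives in
one reusable class rather than in every tool.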

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.
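The shape of these relationships can be written down as a very small C++
sketch. The class names below are borrowed from Clang, but the definitions
are simplified stand-ins and not Clang's real declarations; ``isExpression``
and ``isDeclContext`` are hypothetical helpers, and the sketch uses
``dynamic_cast`` for brevity, whereas Clang relies on its own
``isa``/``dyn_cast`` casting utilities.

```cpp
#include <cassert>
#include <type_traits>

// Two independent roots: there is deliberately no common "Node" base.
struct Decl { virtual ~Decl() = default; };
struct Stmt { virtual ~Stmt() = default; };

// Base for declarations that can contain other declarations.
struct DeclContext { virtual ~DeclContext() = default; };

// Expressions are a kind of statement.
struct Expr : Stmt {};
struct ParenExpr : Expr {};

// Some declarations are also declaration contexts, others are not.
struct FunctionDecl : Decl, DeclContext {};
struct VarDecl : Decl {};

static_assert(std::is_base_of<Stmt, Expr>::value,
              "every expression is a statement");
static_assert(!std::is_base_of<Decl, Stmt>::value &&
                  !std::is_base_of<Stmt, Decl>::value,
              "declarations and statements are unrelated hierarchies");

// Accepts any statement, which includes any expression.
bool isExpression(const Stmt &s) {
  return dynamic_cast<const Expr *>(&s) != nullptr;
}

// True for declarations that also derive from DeclContext.
bool isDeclContext(const Decl &d) {
  return dynamic_cast<const DeclContext *>(&d) != nullptr;
}
```

In this model a ``ParenExpr`` can be passed wherever a ``Stmt`` is
expected, and a ``FunctionDecl`` is both a ``Decl`` and a ``DeclContext``
while a ``VarDecl`` is only a ``Decl``.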