===============================================================
Tutorial for building tools using LibTooling and LibASTMatchers
===============================================================

This document is intended to show how to build a useful source-to-source
translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
explicitly aimed at people who are new to Clang, so all you should need
is a working knowledge of C++ and the command line.

In order to work on the compiler, you need some basic knowledge of the
abstract syntax tree (AST). To this end, the reader is encouraged to
skim the :doc:`Introduction to the Clang
AST <IntroductionToTheClangAST>`.

Step 0: Obtaining Clang
=======================

As Clang is part of the LLVM project, you'll need to download LLVM's
source code first. Both Clang and LLVM are maintained as Subversion
repositories, but we'll be accessing them through the git mirror. For
further information, see the `getting started
guide <http://llvm.org/docs/GettingStarted.html>`_.

.. code-block:: console

      mkdir ~/clang-llvm && cd ~/clang-llvm
      git clone http://llvm.org/git/llvm.git
      cd llvm/tools
      git clone http://llvm.org/git/clang.git
      cd clang/tools
      git clone http://llvm.org/git/clang-tools-extra.git extra

Next you need to obtain the CMake build system and Ninja build tool. You
may already have CMake installed, but current binary versions of CMake
aren't built with Ninja support.

.. code-block:: console

      cd ~/clang-llvm
      git clone https://github.com/martine/ninja.git
      cd ninja
      git checkout release
      ./bootstrap.py
      sudo cp ninja /usr/bin/

      cd ~/clang-llvm
      git clone git://cmake.org/stage/cmake.git
      cd cmake
      git checkout next
      ./bootstrap
      make
      sudo make install

Okay. Now we'll build Clang!

.. code-block:: console

      cd ~/clang-llvm
      mkdir build && cd build
      cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
      ninja
      ninja check       # Test LLVM only.
      ninja clang-test  # Test Clang only.
      ninja install

And we're live.

All of the tests should pass, though there is a (very) small chance that
you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
in both the llvm and clang directories should fix any problems.

Next, we want to set Clang as its own compiler.

.. code-block:: console

      cd ~/clang-llvm/build
      ccmake ../llvm

The second command will bring up a GUI for configuring Clang. You need
to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
configure, then ``'g'`` to generate CMake's files.

Finally, run ninja one last time, and you're done.

Step 1: Create a ClangTool
==========================

Now that we have enough background knowledge, it's time to create the
simplest productive ClangTool in existence: a syntax checker. While this
already exists as ``clang-check``, it's important to understand what's
going on.

First, we'll need to create a new directory for our tool and tell CMake
that it exists. As this is not going to be a core Clang tool, it will
live in the ``tools/extra`` repository.

.. code-block:: console

      cd ~/clang-llvm/llvm/tools/clang
      mkdir tools/extra/loop-convert
      echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
      vim tools/extra/loop-convert/CMakeLists.txt

CMakeLists.txt should have the following contents:

::

      set(LLVM_LINK_COMPONENTS support)

      add_clang_executable(loop-convert
        LoopConvert.cpp
        )
      target_link_libraries(loop-convert
        clangTooling
        clangBasic
        clangASTMatchers
        )

With that done, Ninja will be able to compile our tool. Let's give it
something to compile! Put the following into
``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
why the different parts are needed can be found in the `LibTooling
documentation <LibTooling.html>`_.

.. code-block:: c++

      // Declares clang::SyntaxOnlyAction.
      #include "clang/Frontend/FrontendActions.h"
      #include "clang/Tooling/CommonOptionsParser.h"
      #include "clang/Tooling/Tooling.h"
      // Declares llvm::cl::extrahelp.
      #include "llvm/Support/CommandLine.h"

      using namespace clang::tooling;
      using namespace llvm;

      // Apply a custom category to all command-line options so that they are the
      // only ones displayed.
      static llvm::cl::OptionCategory MyToolCategory("my-tool options");

      // CommonOptionsParser declares HelpMessage with a description of the common
      // command-line options related to the compilation database and input files.
      // It's nice to have this help message in all tools.
      static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

      // A help message for this specific tool can be added afterwards.
      static cl::extrahelp MoreHelp("\nMore help text...");

      int main(int argc, const char **argv) {
        CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());
        return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
      }

And that's it! You can compile our new tool by running ninja from the
``build`` directory.

.. code-block:: console

      cd ~/clang-llvm/build
      ninja

You should now be able to run the syntax checker, which is located in
``~/clang-llvm/build/bin``, on any source file. Try it!

.. code-block:: console

      echo "int main() { return 0; }" > test.cpp
      bin/loop-convert test.cpp --

Note the two dashes after we specify the source file. The additional
options for the compiler are passed after the dashes rather than loading
them from a compilation database; there just aren't any options needed
right now.

Intermezzo: Learn AST matcher basics
====================================

Clang recently introduced the :doc:`ASTMatcher
library <LibASTMatchers>` to provide a simple, powerful, and
concise way to describe specific patterns in the AST. Implemented as a
DSL powered by macros and templates (see
`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
curious), matchers offer the feel of algebraic data types common to
functional programming languages.

For example, suppose you wanted to examine only binary operators. There
is a matcher to do exactly that, conveniently named ``binaryOperator``.
I'll give you one guess what this matcher does:

.. code-block:: c++

      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))

Shockingly, it will match against addition expressions whose left-hand
side is exactly the literal 0. It will not match against other forms of
0, such as ``'\0'`` or ``NULL``, but it will match against macros that
expand to 0. The matcher will also not match against calls to the
overloaded operator ``'+'``, as there is a separate ``cxxOperatorCallExpr``
matcher to handle overloaded operators.

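As a minimal sketch of the distinction just described, the two
declarations below put the built-in matcher next to one for overloaded
operators. The variable names ``BuiltinZeroPlus`` and ``OverloadedPlus``
are ours, chosen for illustration. We assume a Clang release in which the
C++-specific matchers carry the ``cxx`` prefix (``cxxOperatorCallExpr``);
releases before that renaming spell the same matcher ``operatorCallExpr``.

.. code-block:: c++

      #include "clang/ASTMatchers/ASTMatchers.h"

      using namespace clang::ast_matchers;

      // Built-in addition whose left-hand side is the integer literal 0.
      StatementMatcher BuiltinZeroPlus =
          binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))));

      // Calls to an overloaded operator+, e.g. on a user-defined class type.
      StatementMatcher OverloadedPlus =
          cxxOperatorCallExpr(hasOverloadedOperatorName("+"));

Given ``0 + x`` with ``x`` of type ``int``, we would expect only the first
matcher to apply; given ``a + b`` where ``a`` has a class type with its own
``operator+``, only the second.
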
There are AST matchers to match all the different nodes of the AST,
narrowing matchers to only match AST nodes fulfilling specific criteria,
and traversal matchers to get from one kind of AST node to another. For
a complete list of AST matchers, take a look at the `AST Matcher
Reference <LibASTMatchersReference.html>`_.

All matchers that are nouns describe entities in the AST and can be
bound, so that they can be referred to whenever a match is found. To do
so, simply call the method ``bind`` on these matchers, e.g.:

.. code-block:: c++

      varDecl(hasType(isInteger())).bind("intvar")

Step 2: Using AST matchers
==========================

Okay, on to using matchers for real. Our goal is a matcher that
captures all ``for`` statements that define a new variable
initialized to zero. Let's start by matching all ``for`` loops:

.. code-block:: c++

      forStmt()

Next, we want to specify that a single variable is declared in the first
portion of the loop, so we can extend the matcher to

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))

Finally, we can add the condition that the variable is initialized to
zero.

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
        hasInitializer(integerLiteral(equals(0))))))))

It is fairly easy to read and understand the matcher definition ("match
loops whose init portion declares a single variable which is initialized
to the integer literal 0"), but deciding that every piece is necessary
is more difficult. Note that this matcher will not match loops whose
variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
zero besides the integer 0.

The last step is giving the matcher a name and binding the ``ForStmt``
as we will want to do something with it:

.. code-block:: c++

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

Once you have defined your matchers, you will need to add a little more
scaffolding in order to run them. Matchers are paired with a
``MatchCallback`` and registered with a ``MatchFinder`` object, then run
from a ``ClangTool``. More code!

Add the following to ``LoopConvert.cpp``:

.. code-block:: c++

      #include "clang/ASTMatchers/ASTMatchers.h"
      #include "clang/ASTMatchers/ASTMatchFinder.h"

      using namespace clang;
      using namespace clang::ast_matchers;

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

      class LoopPrinter : public MatchFinder::MatchCallback {
      public:
        virtual void run(const MatchFinder::MatchResult &Result) {
          if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
            FS->dump();
        }
      };

And change ``main()`` to:

.. code-block:: c++

      int main(int argc, const char **argv) {
        CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());

        LoopPrinter Printer;
        MatchFinder Finder;
        Finder.addMatcher(LoopMatcher, &Printer);

        return Tool.run(newFrontendActionFactory(&Finder).get());
      }

Now, you should be able to recompile and run the code to discover ``for``
loops. Create a new file with a few examples, and test out our new
handiwork:

.. code-block:: console

      cd ~/clang-llvm/build
      ninja loop-convert
      vim ~/test-files/simple-loops.cc
      bin/loop-convert ~/test-files/simple-loops.cc

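The tutorial does not show the contents of ``simple-loops.cc``, so the
file below is purely a hypothetical example of what it might contain;
every function name in it is made up for illustration. The comments
record which loops we would expect the Step 2 matcher to report, going by
the matcher definition above: only a loop whose init portion declares a
single variable initialized to the integer literal ``0`` should be dumped.

.. code-block:: c++

      // simple-loops.cc: hypothetical test input (contents are an assumption).
      const int N = 5;

      // Expected to match: one int variable, initialized to the literal 0.
      int sumForward(const int (&Arr)[N]) {
        int Sum = 0;
        for (int i = 0; i < N; ++i)
          Sum += Arr[i];
        return Sum;
      }

      // Not expected to match: the initializer is 1, not 0.
      int sumTail(const int (&Arr)[N]) {
        int Sum = 0;
        for (int i = 1; i < N; ++i)
          Sum += Arr[i];
        return Sum;
      }

      // Not expected to match: the init portion declares two variables.
      int sumPairs(const int (&Arr)[N]) {
        int Sum = 0;
        for (int i = 0, j = N - 1; i < j; ++i, --j)
          Sum += Arr[i] + Arr[j];
        return Sum;
      }

      // Not expected to match: the init portion is an assignment, not a
      // declaration.
      int sumAssigned(const int (&Arr)[N]) {
        int Sum = 0;
        int i;
        for (i = 0; i < N; ++i)
          Sum += Arr[i];
        return Sum;
      }
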
Step 3.5: More Complicated Matchers
===================================

Our simple matcher is capable of discovering ``for`` loops, but we would
still need to filter out many more ourselves. We can do a good portion
of the remaining work with some cleverly chosen matchers, but first we
need to decide exactly which properties we want to allow.

How can we characterize ``for`` loops over arrays which would be eligible
for translation to range-based syntax? Loops over arrays of
size ``N`` that:

1. start at index ``0``
2. iterate consecutively
3. end at index ``N-1``

We already check for (1), so all we need to add is a check to the loop's
condition to ensure that the loop's index variable is compared against
``N`` and another check to ensure that the increment step just
increments this same variable. The matcher for (2) is straightforward:
require a pre- or post-increment of the same variable declared in the
init portion.

Unfortunately, such a matcher is impossible to write. Matchers contain
no logic for comparing two arbitrary AST nodes and determining whether
or not they are equal, so the best we can do is matching more than we
would like to allow, and punting extra comparisons to the callback.

In any case, we can start building this sub-matcher. We can require that
the increment step be a unary increment like this:

.. code-block:: c++

      hasIncrement(unaryOperator(hasOperatorName("++")))

Specifying what is incremented introduces another quirk of Clang's AST:
usages of variables are represented as ``DeclRefExpr`` nodes ("declaration
reference expressions") because they are expressions which refer to
variable declarations. To find a ``unaryOperator`` that refers to a
specific declaration, we can simply add a second condition to it:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr())))

Furthermore, we can restrict our matcher to only match if the
incremented variable is an integer:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))

And the last step will be to attach an identifier to this variable, so
that we can retrieve it in the callback:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(
          varDecl(hasType(isInteger())).bind("incrementVariable"))))))

We can add this code to the definition of ``LoopMatcher`` and make sure
that our program, outfitted with the new matcher, only prints out loops
that declare a single variable initialized to zero and have an increment
step consisting of a unary increment of some variable.

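Combined with the matcher from Step 2, the result might look like the
sketch below. It only splices together pieces already shown above,
keeping the ``"forLoop"`` and ``"incrementVariable"`` identifiers; the
layout is our choice, and it assumes the same headers and namespaces
that ``LoopConvert.cpp`` already uses.

.. code-block:: c++

      #include "clang/ASTMatchers/ASTMatchers.h"

      using namespace clang::ast_matchers;

      // Sketch: the Step 2 matcher extended with the increment sub-matcher.
      StatementMatcher LoopMatcher =
          forStmt(hasLoopInit(declStmt(hasSingleDecl(
                      varDecl(hasInitializer(integerLiteral(equals(0))))))),
                  hasIncrement(unaryOperator(
                      hasOperatorName("++"),
                      hasUnaryOperand(declRefExpr(to(
                          varDecl(hasType(isInteger()))
                              .bind("incrementVariable")))))))
              .bind("forLoop");

Because ``forStmt`` requires all of its arguments to match, a loop must
now satisfy both the init and the increment conditions before the
callback runs.
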
Now, we just need to add a matcher to check if the condition part of the
``for`` loop compares a variable against the size of the array. There is
only one problem: we don't know which array we're iterating over
without looking at the body of the loop! We are again restricted to
approximating the result we want with matchers, filling in the details
in the callback. So we start with:

.. code-block:: c++

      hasCondition(binaryOperator(hasOperatorName("<")))

It makes sense to ensure that the left-hand side is a reference to a
variable, and that the right-hand side has integer type.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
        hasRHS(expr(hasType(isInteger())))))

Why? Because it doesn't work. Of the three loops provided in
``test-files/simple.cpp``, zero of them have a matching condition. A
quick look at the AST dump of the first for loop, produced by the
previous iteration of loop-convert, shows us the answer:

::

      (ForStmt 0x173b240
        (DeclStmt 0x173afc8
          0x173af50 "int i =
            (IntegerLiteral 0x173afa8 'int' 0)")
        <<>>
        (BinaryOperator 0x173b060 '_Bool' '<'
          (ImplicitCastExpr 0x173b030 'int'
            (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
          (ImplicitCastExpr 0x173b048 'int'
            (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
        (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
          (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
        (CompoundStatement ...

We already know that the declaration and increments both match, or this
loop wouldn't have been dumped. The culprit lies in the implicit cast
applied to the first operand (i.e. the LHS) of the less-than operator,
an L-value to R-value conversion applied to the expression referencing
``i``. Thankfully, the matcher library offers a solution to this problem
in the form of ``ignoringParenImpCasts``, which instructs the matcher to
ignore implicit casts and parentheses before continuing to match.
Adjusting the condition operator will restore the desired match.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(ignoringParenImpCasts(declRefExpr(
          to(varDecl(hasType(isInteger())))))),
        hasRHS(expr(hasType(isInteger())))))

After adding binds to the expressions we wished to capture and
extracting the identifier strings into variables, we have array-step-2
completed.

Step 4: Retrieving Matched Nodes
================================

So far, the matcher callback isn't very interesting: it just dumps the
loop's AST. At some point, we will need to make changes to the input
source code. Next, we'll work on using the nodes we bound in the
previous step.

The ``MatchFinder::MatchCallback::run()`` callback takes a
``MatchFinder::MatchResult&`` as its parameter. We're most interested in
its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
class to represent contextual information about the AST, as the name
implies, though the most functionally important detail is that several
operations require an ``ASTContext*`` parameter. More immediately useful
is the set of matched nodes, and how we retrieve them.

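As a small sketch of how the two members fit together, the callback
below (a hypothetical ``LoopLocationPrinter``, not part of the tutorial's
tool) fetches the bound loop from ``Nodes`` and uses the ``SourceManager``
owned by ``Context`` to print where the loop starts. It assumes the
``"forLoop"`` binding used by ``LoopMatcher``.

.. code-block:: c++

      #include "clang/AST/ASTContext.h"
      #include "clang/AST/Stmt.h"
      #include "clang/ASTMatchers/ASTMatchFinder.h"
      #include "clang/Basic/SourceManager.h"
      #include "llvm/Support/raw_ostream.h"

      using namespace clang;
      using namespace clang::ast_matchers;

      // Hypothetical callback: prints the location of each matched loop.
      class LoopLocationPrinter : public MatchFinder::MatchCallback {
      public:
        void run(const MatchFinder::MatchResult &Result) override {
          // Nodes maps the identifiers given to bind() to the matched nodes.
          const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
          if (!FS)
            return;
          // Context gives access to the source manager, among other things.
          const SourceManager &SM = Result.Context->getSourceManager();
          FS->getForLoc().print(llvm::outs(), SM);
          llvm::outs() << ": matched a for loop\n";
        }
      };

Registering it works the same way as for ``LoopPrinter``: create an
instance and pass its address to ``Finder.addMatcher()`` together with
``LoopMatcher``.
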
Since we bind three variables (identified by ``"initVarName"``,
``"incVarName"``, and ``"condVarName"``), we can obtain the matched nodes by
using the ``getNodeAs()`` member function.

In ``LoopConvert.cpp`` add

.. code-block:: c++

      #include "clang/AST/ASTContext.h"

Change ``LoopMatcher`` to

.. code-block:: c++

      StatementMatcher LoopMatcher =
          forStmt(hasLoopInit(declStmt(
                      hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
                                        .bind("initVarName")))),
                  hasIncrement(unaryOperator(
                      hasOperatorName("++"),
                      hasUnaryOperand(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("incVarName")))))),
                  hasCondition(binaryOperator(
                      hasOperatorName("<"),
                      hasLHS(ignoringParenImpCasts(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("condVarName"))))),
                      hasRHS(expr(hasType(isInteger())))))).bind("forLoop");

And change ``LoopPrinter::run`` to

.. code-block:: c++

      void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
        ASTContext *Context = Result.Context;
        const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
        // We do not want to convert header files!
        if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
          return;
        const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
        const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
        const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");

        if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
          return;
        llvm::outs() << "Potential array-based loop discovered.\n";
      }

Clang associates a ``VarDecl`` with each variable to represent the variable's
declaration. Since the "canonical" form of each declaration is unique by
address, all we need to do is make sure neither ``ValueDecl`` (base class of
``VarDecl``) is ``NULL`` and compare the canonical Decls.

.. code-block:: c++

      static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
        return First && Second &&
               First->getCanonicalDecl() == Second->getCanonicalDecl();
      }

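To see why the comparison goes through the canonical declaration,
consider a variable that is declared more than once. The file below is a
made-up illustration (the names ``Counter`` and ``bump`` are ours). As far
as we understand Clang's AST, each of the three declarations gets its own
``VarDecl`` node, and ``getCanonicalDecl()`` maps all of them to the first
one, so comparing raw ``VarDecl`` pointers could report two different
variables where there is only one.

.. code-block:: c++

      extern int Counter; // First declaration; this is the canonical one.
      extern int Counter; // Redeclaration of the same variable.
      int Counter = 0;    // Definition, again a redeclaration.

      // Reads and updates the single variable that all three declarations name.
      int bump() { return ++Counter; }

The loop index in this tutorial is a local variable with a single
declaration, so the canonical form matters little here; the helper simply
stays correct for variables like ``Counter``.
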
If execution reaches the end of ``LoopPrinter::run()``, we know that the
loop shell looks like

.. code-block:: c++

      for (int i = 0; i < expr(); ++i) { ... }

For now, we will just print a message explaining that we found a loop.
The next section will deal with recursively traversing the AST to
discover all changes needed.

As a side note, it's not as trivial to test if two expressions are the same,
though Clang has already done the hard work for us by providing a way to
canonicalize expressions:

.. code-block:: c++

      static bool areSameExpr(ASTContext *Context, const Expr *First,
                              const Expr *Second) {
        if (!First || !Second)
          return false;
        llvm::FoldingSetNodeID FirstID, SecondID;
        First->Profile(FirstID, *Context, true);
        Second->Profile(SecondID, *Context, true);
        return FirstID == SecondID;
      }

This code relies on the comparison between two
``llvm::FoldingSetNodeID`` objects. As the documentation for
``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
a description of a node in the AST, based on its properties, along with
those of its children. ``FoldingSetNodeID`` then serves as a hash we can
use to compare expressions. We will need ``areSameExpr`` later. Before
you run the new code on the additional loops added to
``test-files/simple.cpp``, try to figure out which ones will be considered
potentially convertible.