===============================================================
Tutorial for building tools using LibTooling and LibASTMatchers
===============================================================

This document is intended to show how to build a useful source-to-source
translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
explicitly aimed at people who are new to Clang, so all you should need
is a working knowledge of C++ and the command line.

In order to work on the compiler, you need some basic knowledge of the
abstract syntax tree (AST). To this end, the reader is encouraged to
skim the :doc:`Introduction to the Clang
AST <IntroductionToTheClangAST>`.

Step 0: Obtaining Clang
=======================

As Clang is part of the LLVM project, you'll need to download LLVM's
source code first. Both Clang and LLVM are maintained as Subversion
repositories, but we'll be accessing them through the git mirror. For
further information, see the `getting started
guide <http://llvm.org/docs/GettingStarted.html>`_.

.. code-block:: console

  mkdir ~/clang-llvm && cd ~/clang-llvm
  git clone http://llvm.org/git/llvm.git
  cd llvm/tools
  git clone http://llvm.org/git/clang.git
  cd clang/tools
  git clone http://llvm.org/git/clang-tools-extra.git extra

Next you need to obtain the CMake build system and Ninja build tool.
You may already have CMake installed, but current binary versions of CMake
aren't built with Ninja support.

.. code-block:: console

  cd ~/clang-llvm
  git clone https://github.com/martine/ninja.git
  cd ninja
  git checkout release
  ./bootstrap.py
  sudo cp ninja /usr/bin/

  cd ~/clang-llvm
  git clone git://cmake.org/stage/cmake.git
  cd cmake
  git checkout next
  ./bootstrap
  make
  sudo make install

Okay. Now we'll build Clang!

.. code-block:: console

  cd ~/clang-llvm
  mkdir build && cd build
  cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
  ninja
  ninja check       # Test LLVM only.
  ninja clang-test  # Test Clang only.
  ninja install

And we're live.

All of the tests should pass, though there is a (very) small chance that
you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
in both the llvm and clang directories should fix any problems.

Finally, we want to set Clang as its own compiler.

.. code-block:: console

  cd ~/clang-llvm/build
  ccmake ../llvm

The second command will bring up a GUI for configuring Clang. You need
to set the entry for ``CMAKE_CXX_COMPILER``.
Press ``'t'`` to turn on
advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
configure, then ``'g'`` to generate CMake's files.

Finally, run ninja one last time, and you're done.

Step 1: Create a ClangTool
==========================

Now that we have enough background knowledge, it's time to create the
simplest productive ClangTool in existence: a syntax checker. While this
already exists as ``clang-check``, it's important to understand what's
going on.

First, we'll need to create a new directory for our tool and tell CMake
that it exists. As this is not going to be a core clang tool, it will
live in the ``tools/extra`` repository.

.. code-block:: console

  cd ~/clang-llvm/llvm/tools/clang
  mkdir tools/extra/loop-convert
  echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
  vim tools/extra/loop-convert/CMakeLists.txt

CMakeLists.txt should have the following contents:

::

  set(LLVM_LINK_COMPONENTS support)

  add_clang_executable(loop-convert
    LoopConvert.cpp
    )
  target_link_libraries(loop-convert
    clangTooling
    clangBasic
    clangASTMatchers
    )

With that done, Ninja will be able to compile our tool.
Let's give it
something to compile! Put the following into
``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
why the different parts are needed can be found in the `LibTooling
documentation <LibTooling.html>`_.

.. code-block:: c++

  // Declares clang::SyntaxOnlyAction.
  #include "clang/Frontend/FrontendActions.h"
  #include "clang/Tooling/CommonOptionsParser.h"
  #include "clang/Tooling/Tooling.h"
  // Declares llvm::cl::extrahelp.
  #include "llvm/Support/CommandLine.h"

  using namespace clang::tooling;
  using namespace llvm;

  // Apply a custom category to all command-line options so that they are the
  // only ones displayed.
  static llvm::cl::OptionCategory MyToolCategory("my-tool options");

  // CommonOptionsParser declares HelpMessage with a description of the common
  // command-line options related to the compilation database and input files.
  // It's nice to have this help message in all tools.
  static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

  // A help message for this specific tool can be added afterwards.
  static cl::extrahelp MoreHelp("\nMore help text...");

  int main(int argc, const char **argv) {
    CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
    ClangTool Tool(OptionsParser.getCompilations(),
                   OptionsParser.getSourcePathList());
    return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
  }

And that's it! You can compile our new tool by running ninja from the
``build`` directory.

.. code-block:: console

  cd ~/clang-llvm/build
  ninja

You should now be able to run the syntax checker, which is located in
``~/clang-llvm/build/bin``, on any source file. Try it!

.. code-block:: console

  echo "int main() { return 0; }" > test.cpp
  bin/loop-convert test.cpp --

Note the two dashes after we specify the source file. The additional
options for the compiler are passed after the dashes rather than loading
them from a compilation database - there just aren't any options needed
right now.

Intermezzo: Learn AST matcher basics
====================================

Clang recently introduced the :doc:`ASTMatcher
library <LibASTMatchers>` to provide a simple, powerful, and
concise way to describe specific patterns in the AST.
Implemented as a
DSL powered by macros and templates (see
`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
curious), matchers offer the feel of algebraic data types common to
functional programming languages.

For example, suppose you wanted to examine only binary operators. There
is a matcher to do exactly that, conveniently named ``binaryOperator``.
I'll give you one guess what this matcher does:

.. code-block:: c++

  binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))

Shockingly, it will match against addition expressions whose left hand
side is exactly the literal 0. It will not match against other forms of
0, such as ``'\0'`` or ``NULL``, but it will match against macros that
expand to 0. The matcher will also not match against calls to the
overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
matcher to handle overloaded operators.

There are AST matchers to match all the different nodes of the AST,
narrowing matchers to only match AST nodes fulfilling specific criteria,
and traversal matchers to get from one kind of AST node to another. For
a complete list of AST matchers, take a look at the `AST Matcher
Reference <LibASTMatchersReference.html>`_.

All matchers that are nouns describe entities in the AST and can be
bound, so that they can be referred to whenever a match is found.
To do
so, simply call the method ``bind`` on these matchers, e.g.:

.. code-block:: c++

  varDecl(hasType(isInteger())).bind("intvar")

Step 2: Using AST matchers
==========================

Okay, on to using matchers for real. Our goal is a matcher
which will capture all ``for`` statements that define a new variable
initialized to zero. Let's start with matching all ``for`` loops:

.. code-block:: c++

  forStmt()

Next, we want to specify that a single variable is declared in the first
portion of the loop, so we can extend the matcher to

.. code-block:: c++

  forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))

Finally, we can add the condition that the variable is initialized to
zero.

.. code-block:: c++

  forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
    hasInitializer(integerLiteral(equals(0))))))))

It is fairly easy to read and understand the matcher definition ("match
loops whose init portion declares a single variable which is initialized
to the integer literal 0"), but deciding that every piece is necessary
is more difficult. Note that this matcher will not match loops whose
variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
zero besides the integer 0.
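
To make the matching rules above concrete, here is a small sketch of a test
input. The file and function names are hypothetical, and the "expected to
match" comments are derived by reading the matcher definition rather than by
running the tool, so treat them as predictions to check against the tool's
output.

.. code-block:: c++

  // Hypothetical test input for the matcher
  //   forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
  //     hasInitializer(integerLiteral(equals(0))))))))
  const int N = 5;
  int Arr[N] = {1, 2, 3, 4, 5};

  // Expected to match: one variable, initialized to the integer literal 0.
  int sumFromZero() {
    int Sum = 0;
    for (int i = 0; i < N; ++i)
      Sum += Arr[i];
    return Sum;
  }

  // Not expected to match: the initializer is the literal 1, not 0.
  int sumFromOne() {
    int Sum = 0;
    for (int i = 1; i < N; ++i)
      Sum += Arr[i];
    return Sum;
  }

  // Not expected to match: two variables are declared in the init portion,
  // so hasSingleDecl does not apply.
  int sumTwoVars() {
    int Sum = 0;
    for (int i = 0, j = N - 1; i < j; ++i, --j)
      Sum += Arr[i] + Arr[j];
    return Sum;
  }

  // Not expected to match: 0.0 is a floating-point literal, not an
  // integer literal.
  int countHalfSteps() {
    int Count = 0;
    for (double d = 0.0; d < 2.0; d += 0.5)
      ++Count;
    return Count;
  }

The functions return values only so that the file is a complete, compilable
program; the matcher looks at the shape of each ``for`` statement, not at
what the loop computes.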

The last step is giving the matcher a name and binding the ``ForStmt``
as we will want to do something with it:

.. code-block:: c++

  StatementMatcher LoopMatcher =
    forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
      hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

Once you have defined your matchers, you will need to add a little more
scaffolding in order to run them. Matchers are paired with a
``MatchCallback`` and registered with a ``MatchFinder`` object, then run
from a ``ClangTool``. More code!

Add the following to ``LoopConvert.cpp``:

.. code-block:: c++

  #include "clang/ASTMatchers/ASTMatchers.h"
  #include "clang/ASTMatchers/ASTMatchFinder.h"

  using namespace clang;
  using namespace clang::ast_matchers;

  StatementMatcher LoopMatcher =
    forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
      hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

  class LoopPrinter : public MatchFinder::MatchCallback {
  public:
    virtual void run(const MatchFinder::MatchResult &Result) {
      // Dump the AST of every loop bound to "forLoop".
      if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
        FS->dump();
    }
  };

And change ``main()`` to:

.. code-block:: c++

  int main(int argc, const char **argv) {
    CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
    ClangTool Tool(OptionsParser.getCompilations(),
                   OptionsParser.getSourcePathList());

    LoopPrinter Printer;
    MatchFinder Finder;
    Finder.addMatcher(LoopMatcher, &Printer);

    return Tool.run(newFrontendActionFactory(&Finder).get());
  }

Now, you should be able to recompile and run the code to discover ``for``
loops. Create a new file with a few examples, and test out our new
handiwork:

.. code-block:: console

  cd ~/clang-llvm/build
  ninja loop-convert
  vim ~/test-files/simple-loops.cc
  bin/loop-convert ~/test-files/simple-loops.cc

Step 3.5: More Complicated Matchers
===================================

Our simple matcher is capable of discovering for loops, but we would
still need to filter out many more ourselves. We can do a good portion
of the remaining work with some cleverly chosen matchers, but first we
need to decide exactly which properties we want to allow.

How can we characterize for loops over arrays which would be eligible
for translation to range-based syntax?
Loops over arrays of
size ``N`` that:

- start at index ``0``
- iterate consecutively
- end at index ``N-1``

We already check for the first property, so all we need to add is a check
to the loop's condition to ensure that the loop's index variable is
compared against ``N`` and another check to ensure that the increment step
just increments this same variable. The matcher for the second property is
straightforward in principle: require a pre- or post-increment of the same
variable declared in the init portion.

Unfortunately, such a matcher is impossible to write. Matchers contain
no logic for comparing two arbitrary AST nodes and determining whether
or not they are equal, so the best we can do is matching more than we
would like to allow, and punting extra comparisons to the callback.

In any case, we can start building this sub-matcher. We can require that
the increment step be a unary increment like this:

.. code-block:: c++

  hasIncrement(unaryOperator(hasOperatorName("++")))

Specifying what is incremented introduces another quirk of Clang's AST:
Usages of variables are represented as ``DeclRefExpr``'s ("declaration
reference expressions") because they are expressions which refer to
variable declarations. To find a ``unaryOperator`` that refers to a
specific declaration, we can simply add a second condition to it:

.. code-block:: c++

  hasIncrement(unaryOperator(
    hasOperatorName("++"),
    hasUnaryOperand(declRefExpr())))

Furthermore, we can restrict our matcher to only match if the
incremented variable is an integer:

.. code-block:: c++

  hasIncrement(unaryOperator(
    hasOperatorName("++"),
    hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))

And the last step will be to attach an identifier to this variable, so
that we can retrieve it in the callback:

.. code-block:: c++

  hasIncrement(unaryOperator(
    hasOperatorName("++"),
    hasUnaryOperand(declRefExpr(to(
      varDecl(hasType(isInteger())).bind("incrementVariable"))))))

We can add this code to the definition of ``LoopMatcher`` and make sure
that our program, outfitted with the new matcher, only prints out loops
that declare a single variable initialized to zero and have an increment
step consisting of a unary increment of some variable.

Now, we just need to add a matcher to check if the condition part of the
``for`` loop compares a variable against the size of the array. There is
only one problem - we don't know which array we're iterating over
without looking at the body of the loop! We are again restricted to
approximating the result we want with matchers, filling in the details
in the callback. So we start with:

.. code-block:: c++

  hasCondition(binaryOperator(hasOperatorName("<")))

It makes sense to ensure that the left-hand side is a reference to a
variable, and that the right-hand side has integer type.

.. code-block:: c++

  hasCondition(binaryOperator(
    hasOperatorName("<"),
    hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
    hasRHS(expr(hasType(isInteger())))))

Unfortunately, this does not work. Of the three loops provided in
``test-files/simple.cpp``, zero of them have a matching condition. A
quick look at the AST dump of the first for loop, produced by the
previous iteration of loop-convert, shows us why:

::

  (ForStmt 0x173b240
    (DeclStmt 0x173afc8
      0x173af50 "int i =
        (IntegerLiteral 0x173afa8 'int' 0)")
    <<<NULL>>>
    (BinaryOperator 0x173b060 '_Bool' '<'
      (ImplicitCastExpr 0x173b030 'int'
        (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
      (ImplicitCastExpr 0x173b048 'int'
        (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
    (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
      (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
    (CompoundStatement ...

We already know that the declaration and increments both match, or this
loop wouldn't have been dumped. The culprit lies in the implicit cast
applied to the first operand (i.e.
the LHS) of the less-than operator,
an L-value to R-value conversion applied to the expression referencing
``i``. Thankfully, the matcher library offers a solution to this problem
in the form of ``ignoringParenImpCasts``, which instructs the matcher to
ignore implicit casts and parentheses before continuing to match.
Adjusting the condition operator will restore the desired match.

.. code-block:: c++

  hasCondition(binaryOperator(
    hasOperatorName("<"),
    hasLHS(ignoringParenImpCasts(declRefExpr(
      to(varDecl(hasType(isInteger())))))),
    hasRHS(expr(hasType(isInteger())))))

After adding binds to the expressions we wished to capture and
extracting the identifier strings into variables, we have array-step-2
completed.

Step 4: Retrieving Matched Nodes
================================

So far, the matcher callback isn't very interesting: it just dumps the
loop's AST. At some point, we will need to make changes to the input
source code. Next, we'll work on using the nodes we bound in the
previous step.

The ``MatchCallback::run()`` callback takes a
``const MatchFinder::MatchResult &`` as its parameter. We're most
interested in its ``Context`` and ``Nodes`` members. Clang uses the
``ASTContext`` class to represent contextual information about the AST,
as the name implies, though the most functionally important detail is
that several operations require an ``ASTContext*`` parameter.
More immediately useful
is the set of matched nodes, and how we retrieve them.

Since we bind three variables (identified by ``"condVarName"``,
``"initVarName"``, and ``"incVarName"``), we can obtain the matched nodes by
using the ``getNodeAs()`` member function.

In ``LoopConvert.cpp`` add

.. code-block:: c++

  #include "clang/AST/ASTContext.h"

Change ``LoopMatcher`` to

.. code-block:: c++

  StatementMatcher LoopMatcher =
      forStmt(hasLoopInit(declStmt(
                  hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
                                    .bind("initVarName")))),
              hasIncrement(unaryOperator(
                  hasOperatorName("++"),
                  hasUnaryOperand(declRefExpr(
                      to(varDecl(hasType(isInteger())).bind("incVarName")))))),
              hasCondition(binaryOperator(
                  hasOperatorName("<"),
                  hasLHS(ignoringParenImpCasts(declRefExpr(
                      to(varDecl(hasType(isInteger())).bind("condVarName"))))),
                  hasRHS(expr(hasType(isInteger())))))).bind("forLoop");

And change ``LoopPrinter::run`` to

.. code-block:: c++

  void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
    ASTContext *Context = Result.Context;
    const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
    // We do not want to convert header files!
    if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc()))
      return;
    const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
    const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
    const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");

    // All three bound declarations must refer to the same variable.
    if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
      return;
    llvm::outs() << "Potential array-based loop discovered.\n";
  }

Clang associates a ``VarDecl`` with each variable to represent the variable's
declaration. Since the "canonical" form of each declaration is unique by
address, all we need to do is make sure neither ``ValueDecl`` (base class of
``VarDecl``) is ``NULL`` and compare the canonical Decls.

.. code-block:: c++

  static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
    return First && Second &&
           First->getCanonicalDecl() == Second->getCanonicalDecl();
  }

If execution reaches the end of ``LoopPrinter::run()``, we know that the
loop shell looks like

.. code-block:: c++

  for (int i = 0; i < expr(); ++i) { ... }

For now, we will just print a message explaining that we found a loop.
The next section will deal with recursively traversing the AST to
discover all changes needed.
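
Returning to ``areSameVariable`` for a moment: the reason for comparing
canonical declarations rather than the raw pointers is that one entity can be
declared more than once, and each declaration is a separate node. The sketch
below is a toy model written without Clang; ``ToyDecl``, its fields, and the
helper names are hypothetical, and it assumes, as a simplification, that the
canonical declaration is simply the first one in the redeclaration chain.

.. code-block:: c++

  // Toy stand-in for a declaration node; this is NOT clang::VarDecl.
  struct ToyDecl {
    const char *Name;
    const ToyDecl *Previous; // Earlier declaration of the same entity, if any.

    // Walk back to the first declaration, which plays the canonical role here.
    const ToyDecl *getCanonicalDecl() const {
      const ToyDecl *D = this;
      while (D->Previous)
        D = D->Previous;
      return D;
    }
  };

  // Same shape as areSameVariable in the tutorial, over the toy type.
  bool areSameToyVariable(const ToyDecl *First, const ToyDecl *Second) {
    return First && Second &&
           First->getCanonicalDecl() == Second->getCanonicalDecl();
  }

  // Returns true if the toy comparison behaves as described in the comments.
  bool toyCanonicalDemo() {
    ToyDecl ExternX{"x", nullptr};  // models: extern int x;
    ToyDecl DefX{"x", &ExternX};    // models: int x = 0; (a redeclaration)
    ToyDecl OtherX{"x", nullptr};   // models: an unrelated x in another scope
    return areSameToyVariable(&ExternX, &DefX) &&  // same entity
           !areSameToyVariable(&DefX, &OtherX) &&  // same name, other entity
           !areSameToyVariable(&DefX, nullptr);    // null is never "the same"
  }

Comparing ``&ExternX`` and ``&DefX`` directly would report two different
variables even though both declare the same ``x``, which is the situation the
canonical comparison is meant to handle.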

As a side note, it's not as trivial to test if two expressions are the same,
though Clang has already done the hard work for us by providing a way to
canonicalize expressions:

.. code-block:: c++

  static bool areSameExpr(ASTContext *Context, const Expr *First,
                          const Expr *Second) {
    if (!First || !Second)
      return false;
    llvm::FoldingSetNodeID FirstID, SecondID;
    First->Profile(FirstID, *Context, true);
    Second->Profile(SecondID, *Context, true);
    return FirstID == SecondID;
  }

This code relies on the comparison between two
``llvm::FoldingSetNodeIDs``. As the documentation for
``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
a description of a node in the AST, based on its properties, along with
those of its children. ``FoldingSetNodeID`` then serves as a hash we can
use to compare expressions. We will need ``areSameExpr`` later. Before
you run the new code on the additional loops added to
``test-files/simple.cpp``, try to figure out which ones will be considered
potentially convertible.
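
The idea behind profile-based comparison can be sketched without Clang. The
toy model below is an illustration only, not how ``Stmt::Profile()`` or
``FoldingSetNodeID`` are implemented: ``ToyExpr``, its fields, and the helper
functions are hypothetical, and a plain ``std::vector<unsigned>`` stands in
for the ID. Each node appends its own properties followed by those of its
children, so two separately built trees with the same structure produce equal
IDs.

.. code-block:: c++

  #include <memory>
  #include <utility>
  #include <vector>

  // Toy expression node; this is NOT clang::Expr.
  struct ToyExpr {
    enum Kind : unsigned { Literal, VarRef, Add };
    Kind K;
    unsigned Value; // Literal value or variable id; 0 for Add.
    std::vector<std::shared_ptr<ToyExpr>> Children;

    // Append a structural description of this node, then of its children.
    void Profile(std::vector<unsigned> &ID) const {
      ID.push_back(K);
      ID.push_back(Value);
      ID.push_back(static_cast<unsigned>(Children.size()));
      for (const auto &C : Children)
        C->Profile(ID);
    }
  };

  // Same shape as areSameExpr in the tutorial, over the toy type.
  bool areSameToyExpr(const ToyExpr *First, const ToyExpr *Second) {
    if (!First || !Second)
      return false;
    std::vector<unsigned> FirstID, SecondID;
    First->Profile(FirstID);
    Second->Profile(SecondID);
    return FirstID == SecondID;
  }

  std::shared_ptr<ToyExpr> makeLeaf(ToyExpr::Kind K, unsigned Value) {
    return std::make_shared<ToyExpr>(ToyExpr{K, Value, {}});
  }

  std::shared_ptr<ToyExpr> makeAdd(std::shared_ptr<ToyExpr> L,
                                   std::shared_ptr<ToyExpr> R) {
    return std::make_shared<ToyExpr>(
        ToyExpr{ToyExpr::Add, 0, {std::move(L), std::move(R)}});
  }

  // Returns true if structurally equal trees compare equal and others do not.
  bool toyProfileDemo() {
    auto A = makeAdd(makeLeaf(ToyExpr::VarRef, 1), makeLeaf(ToyExpr::Literal, 2));
    auto B = makeAdd(makeLeaf(ToyExpr::VarRef, 1), makeLeaf(ToyExpr::Literal, 2));
    auto C = makeAdd(makeLeaf(ToyExpr::Literal, 2), makeLeaf(ToyExpr::VarRef, 1));
    return areSameToyExpr(A.get(), B.get()) &&  // x + 2 vs. x + 2
           !areSameToyExpr(A.get(), C.get());   // x + 2 vs. 2 + x
  }

Note that ``A`` and ``B`` are distinct objects at different addresses, so a
pointer comparison would not consider them equal; the comparison succeeds
only because their profiles are identical.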