xref: /aosp_15_r20/external/clang/lib/StaticAnalyzer/README.txt (revision 67e74705e28f6214e480b399dd47ea732279e315)
1*67e74705SXin Li//===----------------------------------------------------------------------===//
2*67e74705SXin Li// Clang Static Analyzer
3*67e74705SXin Li//===----------------------------------------------------------------------===//
4*67e74705SXin Li
5*67e74705SXin Li= Library Structure =
6*67e74705SXin Li
7*67e74705SXin LiThe analyzer library has two layers: a (low-level) static analysis
8*67e74705SXin Liengine (GRExprEngine.cpp and friends), and some static checkers
9*67e74705SXin Li(*Checker.cpp).  The latter are built on top of the former via the
10*67e74705SXin LiChecker and CheckerVisitor interfaces (Checker.h and
11*67e74705SXin LiCheckerVisitor.h).  The Checker interface is designed to be minimal
12*67e74705SXin Liand simple for checker writers, and attempts to isolate them from much
13*67e74705SXin Liof the gore of the internal analysis engine.
14*67e74705SXin Li
15*67e74705SXin Li= How It Works =
16*67e74705SXin Li
17*67e74705SXin LiThe analyzer is inspired by several foundational research papers ([1],
18*67e74705SXin Li[2]).  (FIXME: kremenek to add more links)
19*67e74705SXin Li
20*67e74705SXin LiIn a nutshell, the analyzer is basically a source code simulator that
21*67e74705SXin Litraces out possible paths of execution.  The state of the program
22*67e74705SXin Li(values of variables and expressions) is encapsulated by the state
23*67e74705SXin Li(ProgramState).  A location in the program is called a program point
24*67e74705SXin Li(ProgramPoint), and the combination of state and program point is a
25*67e74705SXin Linode in an exploded graph (ExplodedGraph).  The term "exploded" comes
26*67e74705SXin Lifrom exploding the control-flow edges in the control-flow graph (CFG).
27*67e74705SXin Li
28*67e74705SXin LiConceptually the analyzer does a reachability analysis through the
29*67e74705SXin LiExplodedGraph.  We start at a root node, which has the entry program
30*67e74705SXin Lipoint and initial state, and then simulate transitions by analyzing
31*67e74705SXin Liindividual expressions.  The analysis of an expression can cause the
32*67e74705SXin Listate to change, resulting in a new node in the ExplodedGraph with an
33*67e74705SXin Liupdated program point and an updated state.  A bug is found by hitting
34*67e74705SXin Lia node that satisfies some "bug condition" (basically a violation of a
35*67e74705SXin Lichecking invariant).
36*67e74705SXin Li
37*67e74705SXin LiThe analyzer traces out multiple paths by reasoning about branches and
38*67e74705SXin Lithen bifurcating the state: on the true branch the conditions of the
39*67e74705SXin Libranch are assumed to be true and on the false branch the conditions
40*67e74705SXin Liof the branch are assumed to be false.  Such "assumptions" create
41*67e74705SXin Liconstraints on the values of the program, and those constraints are
42*67e74705SXin Lirecorded in the ProgramState object (and are manipulated by the
43*67e74705SXin LiConstraintManager).  If assuming the conditions of a branch would
44*67e74705SXin Licause the constraints to be unsatisfiable, the branch is considered
45*67e74705SXin Liinfeasible and that path is not taken.  This is how we get
46*67e74705SXin Lipath-sensitivity.  We reduce exponential blow-up by caching nodes.  If
47*67e74705SXin Lia new node with the same state and program point as an existing node
48*67e74705SXin Liwould get generated, the path "caches out" and we simply reuse the
49*67e74705SXin Liexisting node.  Thus the ExplodedGraph is not a DAG; it can contain
50*67e74705SXin Licycles as paths loop back onto each other and cache out.
51*67e74705SXin Li
52*67e74705SXin LiProgramState and ExplodedNodes are basically immutable once created.  Once
53*67e74705SXin Lione creates a ProgramState, you need to create a new one to get a new
54*67e74705SXin LiProgramState.  This immutability is key since the ExplodedGraph represents
55*67e74705SXin Lithe behavior of the analyzed program from the entry point.  To
56*67e74705SXin Lirepresent these efficiently, we use functional data structures (e.g.,
57*67e74705SXin LiImmutableMaps) which share data between instances.
58*67e74705SXin Li
59*67e74705SXin LiFinally, individual Checkers work by also manipulating the analysis
60*67e74705SXin Listate.  The analyzer engine talks to them via a visitor interface.
61*67e74705SXin LiFor example, the PreVisitCallExpr() method is called by GRExprEngine
62*67e74705SXin Lito tell the Checker that we are about to analyze a CallExpr, and the
63*67e74705SXin Lichecker is asked to check for any preconditions that might not be
64*67e74705SXin Lisatisfied.  The checker can do nothing, or it can generate a new
65*67e74705SXin LiProgramState and ExplodedNode which contains updated checker state.  If it
66*67e74705SXin Lifinds a bug, it can tell the BugReporter object about the bug,
67*67e74705SXin Liproviding it an ExplodedNode which is the last node in the path that
68*67e74705SXin Litriggered the problem.
69*67e74705SXin Li
70*67e74705SXin Li= Notes about C++ =
71*67e74705SXin Li
72*67e74705SXin LiSince now constructors are seen before the variable that is constructed
73*67e74705SXin Liin the CFG, we create a temporary object as the destination region that
74*67e74705SXin Liis constructed into. See ExprEngine::VisitCXXConstructExpr().
75*67e74705SXin Li
76*67e74705SXin LiIn ExprEngine::processCallExit(), we always bind the object region to the
77*67e74705SXin Lievaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the
78*67e74705SXin Licorresponding lazy compound value if the variable is not a reference, and
79*67e74705SXin Libind the variable region to the lazy compound value. If the variable
80*67e74705SXin Liis a reference, just use the object region as the initilizer value.
81*67e74705SXin Li
82*67e74705SXin LiBefore entering a C++ method (or ctor/dtor), the 'this' region is bound
83*67e74705SXin Lito the object region. In ctors, we synthesize 'this' region with
84*67e74705SXin LiCXXRecordDecl*, which means we do not use type qualifiers. In methods, we
85*67e74705SXin Lisynthesize 'this' region with CXXMethodDecl*, which has getThisType()
86*67e74705SXin Litaking type qualifiers into account. It does not matter we use qualified
87*67e74705SXin Li'this' region in one method and unqualified 'this' region in another
88*67e74705SXin Limethod, because we only need to ensure the 'this' region is consistent
89*67e74705SXin Liwhen we synthesize it and create it directly from CXXThisExpr in a single
90*67e74705SXin Limethod call.
91*67e74705SXin Li
92*67e74705SXin Li= Working on the Analyzer =
93*67e74705SXin Li
94*67e74705SXin LiIf you are interested in bringing up support for C++ expressions, the
95*67e74705SXin Libest place to look is the visitation logic in GRExprEngine, which
96*67e74705SXin Lihandles the simulation of individual expressions.  There are plenty of
97*67e74705SXin Liexamples there of how other expressions are handled.
98*67e74705SXin Li
99*67e74705SXin LiIf you are interested in writing checkers, look at the Checker and
100*67e74705SXin LiCheckerVisitor interfaces (Checker.h and CheckerVisitor.h).  Also look
101*67e74705SXin Liat the files named *Checker.cpp for examples on how you can implement
102*67e74705SXin Lithese interfaces.
103*67e74705SXin Li
104*67e74705SXin Li= Debugging the Analyzer =
105*67e74705SXin Li
106*67e74705SXin LiThere are some useful command-line options for debugging.  For example:
107*67e74705SXin Li
108*67e74705SXin Li$ clang -cc1 -help | grep analyze
109*67e74705SXin Li -analyze-function <value>
110*67e74705SXin Li -analyzer-display-progress
111*67e74705SXin Li -analyzer-viz-egraph-graphviz
112*67e74705SXin Li ...
113*67e74705SXin Li
114*67e74705SXin LiThe first allows you to specify only analyzing a specific function.
115*67e74705SXin LiThe second prints to the console what function is being analyzed.  The
116*67e74705SXin Lithird generates a graphviz dot file of the ExplodedGraph.  This is
117*67e74705SXin Liextremely useful when debugging the analyzer and viewing the
118*67e74705SXin Lisimulation results.
119*67e74705SXin Li
120*67e74705SXin LiOf course, viewing the CFG (Control-Flow Graph) is also useful:
121*67e74705SXin Li
122*67e74705SXin Li$ clang -cc1 -help | grep cfg
123*67e74705SXin Li -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses
124*67e74705SXin Li -cfg-add-initializers   Add C++ initializers to CFGs for all analyses
125*67e74705SXin Li -cfg-dump               Display Control-Flow Graphs
126*67e74705SXin Li -cfg-view               View Control-Flow Graphs using GraphViz
127*67e74705SXin Li -unoptimized-cfg        Generate unoptimized CFGs for all analyses
128*67e74705SXin Li
129*67e74705SXin Li-cfg-dump dumps a textual representation of the CFG to the console,
130*67e74705SXin Liand -cfg-view creates a GraphViz representation.
131*67e74705SXin Li
132*67e74705SXin Li= References =
133*67e74705SXin Li
134*67e74705SXin Li[1] Precise interprocedural dataflow analysis via graph reachability,
135*67e74705SXin Li    T Reps, S Horwitz, and M Sagiv, POPL '95,
136*67e74705SXin Li    http://portal.acm.org/citation.cfm?id=199462
137*67e74705SXin Li
138*67e74705SXin Li[2] A memory model for static analysis of C programs, Z Xu, T
139*67e74705SXin Li    Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf
140