xref: /aosp_15_r20/external/clang/docs/analyzer/RegionStore.txt (revision 67e74705e28f6214e480b399dd47ea732279e315)
1*67e74705SXin LiThe analyzer "Store" represents the contents of memory regions. It is an opaque
2*67e74705SXin Lifunctional data structure stored in each ProgramState; the only class that can
3*67e74705SXin Limodify the store is its associated StoreManager.
4*67e74705SXin Li
5*67e74705SXin LiCurrently (Feb. 2013), the only StoreManager implementation being used is
6*67e74705SXin LiRegionStoreManager. This store records bindings to memory regions using a "base
7*67e74705SXin Liregion + offset" key. (This allows `*p` and `p[0]` to map to the same location,
8*67e74705SXin Liamong other benefits.)
9*67e74705SXin Li
10*67e74705SXin LiRegions are grouped into "clusters", which roughly correspond to "regions with
11*67e74705SXin Lithe same base region". This allows certain operations to be more efficient,
12*67e74705SXin Lisuch as invalidation.
13*67e74705SXin Li
14*67e74705SXin LiRegions that do not have a known offset use a special "symbolic" offset. These
15*67e74705SXin Likeys store both the original region, and the "concrete offset region" -- the
16*67e74705SXin Lilast region whose offset is entirely concrete. (For example, in the expression
17*67e74705SXin Li`foo.bar[1][i].baz`, the concrete offset region is the array `foo.bar[1]`,
18*67e74705SXin Lisince that has a known offset from the start of the top-level `foo` struct.)
19*67e74705SXin Li
20*67e74705SXin Li
21*67e74705SXin LiBinding Invalidation
22*67e74705SXin Li====================
23*67e74705SXin Li
24*67e74705SXin LiSupporting both concrete and symbolic offsets makes things a bit tricky. Here's
25*67e74705SXin Lian example:
26*67e74705SXin Li
27*67e74705SXin Li    foo[0] = 0;
28*67e74705SXin Li    foo[1] = 1;
29*67e74705SXin Li    foo[i] = i;
30*67e74705SXin Li
31*67e74705SXin LiAfter the third assignment, nothing can be said about the value of `foo[0]`,
32*67e74705SXin Libecause `foo[i]` may have overwritten it! Thus, *binding to a region with a
33*67e74705SXin Lisymbolic offset invalidates the entire concrete offset region.* We know
34*67e74705SXin Li`foo[i]` is somewhere within `foo`, so we don't have to invalidate anything
35*67e74705SXin Lielse, but we do have to be conservative about all other bindings within `foo`.
36*67e74705SXin Li
37*67e74705SXin LiContinuing the example:
38*67e74705SXin Li
39*67e74705SXin Li    foo[i] = i;
40*67e74705SXin Li    foo[0] = 0;
41*67e74705SXin Li
42*67e74705SXin LiAfter this latest assignment, nothing can be said about the value of `foo[i]`,
43*67e74705SXin Libecause `foo[0]` may have overwritten it! *Binding to a region R with a
44*67e74705SXin Liconcrete offset invalidates any symbolic offset bindings whose concrete offset
45*67e74705SXin Liregion is a super-region **or** sub-region of R.* All we know about `foo[i]` is
46*67e74705SXin Lithat it is somewhere within `foo`, so changing *anything* within `foo` might
47*67e74705SXin Lichange `foo[i]`, and changing *all* of `foo` (or its base region) will
48*67e74705SXin Li*definitely* change `foo[i]`.
49*67e74705SXin Li
50*67e74705SXin LiThis logic could be improved by using the current constraints on `i`, at the
51*67e74705SXin Licost of speed. The latter case could also be improved by matching region kinds,
52*67e74705SXin Lii.e. changing `foo[0].a` is unlikely to affect `foo[i].b`, no matter what `i`
53*67e74705SXin Liis.
54*67e74705SXin Li
55*67e74705SXin LiFor more detail, read through RegionStoreManager::removeSubRegionBindings in
56*67e74705SXin LiRegionStore.cpp.
57*67e74705SXin Li
58*67e74705SXin Li
59*67e74705SXin LiObjCIvarRegions
60*67e74705SXin Li===============
61*67e74705SXin Li
62*67e74705SXin LiObjective-C instance variables require a bit of special handling. Like struct
63*67e74705SXin Lifields, they are not base regions, and when their parent object region is
64*67e74705SXin Liinvalidated, all the instance variables must be invalidated as well. However,
65*67e74705SXin Lithey have no concrete compile-time offsets (in the modern, "non-fragile"
66*67e74705SXin Liruntime), and so cannot easily be represented as an offset from the start of
67*67e74705SXin Lithe object in the analyzer. Moreover, this means that invalidating a single
68*67e74705SXin Liinstance variable should *not* invalidate the rest of the object, since unlike
69*67e74705SXin Listruct fields or array elements there is no way to perform pointer arithmetic
70*67e74705SXin Lito access another instance variable.
71*67e74705SXin Li
72*67e74705SXin LiConsequently, although the base region of an ObjCIvarRegion is the entire
73*67e74705SXin Liobject, RegionStore offsets are computed from the start of the instance
74*67e74705SXin Livariable. Thus it is not valid to assume that all bindings with non-symbolic
75*67e74705SXin Lioffsets start from the base region!
76*67e74705SXin Li
77*67e74705SXin Li
78*67e74705SXin LiRegion Invalidation
79*67e74705SXin Li===================
80*67e74705SXin Li
81*67e74705SXin LiUnlike binding invalidation, region invalidation occurs when the entire
82*67e74705SXin Licontents of a region may have changed---say, because it has been passed to a
83*67e74705SXin Lifunction the analyzer can model, like memcpy, or because its address has
84*67e74705SXin Liescaped, usually as an argument to an opaque function call. In these cases we
85*67e74705SXin Lineed to throw away not just all bindings within the region itself, but within
86*67e74705SXin Liits entire cluster, since neighboring regions may be accessed via pointer
87*67e74705SXin Liarithmetic.
88*67e74705SXin Li
89*67e74705SXin LiRegion invalidation typically does even more than this, however. Because it
90*67e74705SXin Liusually represents the complete escape of a region from the analyzer's model,
91*67e74705SXin Liits *contents* must also be transitively invalidated. (For example, if a region
92*67e74705SXin Li'p' of type 'int **' is invalidated, the contents of '*p' and '**p' may have
93*67e74705SXin Lichanged as well.) The algorithm that traverses this transitive closure of
94*67e74705SXin Liaccessible regions is known as ClusterAnalysis, and is also used for finding
95*67e74705SXin Liall live bindings in the store (in order to throw away the dead ones). The name
96*67e74705SXin Li"ClusterAnalysis" predates the cluster-based organization of bindings, but
97*67e74705SXin Lirefers to the same concept: during invalidation and liveness analysis, all
98*67e74705SXin Libindings within a cluster must be treated in the same way for a conservative
99*67e74705SXin Limodel of program behavior.
100*67e74705SXin Li
101*67e74705SXin Li
102*67e74705SXin LiDefault Bindings
103*67e74705SXin Li================
104*67e74705SXin Li
105*67e74705SXin LiMost bindings in RegionStore are simple scalar values -- integers and pointers.
106*67e74705SXin LiThese are known as "Direct" bindings. However, RegionStore supports a second
107*67e74705SXin Litype of binding called a "Default" binding. These are used to provide values to
108*67e74705SXin Liall the elements of an aggregate type (struct or array) without having to
109*67e74705SXin Liexplicitly specify a binding for each individual element.
110*67e74705SXin Li
111*67e74705SXin LiWhen there is no Direct binding for a particular region, the store manager
112*67e74705SXin Lilooks at each super-region in turn to see if there is a Default binding. If so,
113*67e74705SXin Lithis value is used as the value of the original region. The search ends when
114*67e74705SXin Lithe base region is reached, at which point the RegionStore will pick an
115*67e74705SXin Liappropriate default value for the region (usually a symbolic value, but
116*67e74705SXin Lisometimes zero, for static data, or "uninitialized", for stack variables).
117*67e74705SXin Li
118*67e74705SXin Li  int manyInts[10];
119*67e74705SXin Li  manyInts[1] = 42;   // Creates a Direct binding for manyInts[1].
120*67e74705SXin Li  print(manyInts[1]); // Retrieves the Direct binding for manyInts[1];
121*67e74705SXin Li  print(manyInts[0]); // There is no Direct binding for manyInts[1].
122*67e74705SXin Li                      // Is there a Default binding for the entire array?
123*67e74705SXin Li                      // There is not, but it is a stack variable, so we use
124*67e74705SXin Li                      // "uninitialized" as the default value (and emit a
125*67e74705SXin Li                      // diagnostic!).
126*67e74705SXin Li
127*67e74705SXin LiNOTE: The fact that bindings are stored as a base region plus an offset limits
128*67e74705SXin Lithe Default Binding strategy, because in C aggregates can contain other
129*67e74705SXin Liaggregates. In the current implementation of RegionStore, there is no way to
130*67e74705SXin Lidistinguish a Default binding for an entire aggregate from a Default binding
131*67e74705SXin Lifor the sub-aggregate at offset 0.
132*67e74705SXin Li
133*67e74705SXin Li
134*67e74705SXin LiLazy Bindings (LazyCompoundVal)
135*67e74705SXin Li===============================
136*67e74705SXin Li
137*67e74705SXin LiRegionStore implements an optimization for copying aggregates (structs and
138*67e74705SXin Liarrays) called "lazy bindings", implemented using a special SVal called
139*67e74705SXin LiLazyCompoundVal. When the store is asked for the "binding" for an entire
140*67e74705SXin Liaggregate (i.e. for an lvalue-to-rvalue conversion), it returns a
141*67e74705SXin LiLazyCompoundVal instead. When this value is then stored into a variable, it is
142*67e74705SXin Libound as a Default value. This makes copying arrays and structs much cheaper
143*67e74705SXin Lithan if they had required memberwise access.
144*67e74705SXin Li
145*67e74705SXin LiUnder the hood, a LazyCompoundVal is implemented as a uniqued pair of (region,
146*67e74705SXin Listore), representing "the value of the region during this 'snapshot' of the
147*67e74705SXin Listore". This has important implications for any sort of liveness or
148*67e74705SXin Lireachability analysis, which must take the bindings in the old store into
149*67e74705SXin Liaccount.
150*67e74705SXin Li
151*67e74705SXin LiRetrieving a value from a lazy binding happens in the same way as any other
152*67e74705SXin LiDefault binding: since there is no direct binding, the store manager falls back
153*67e74705SXin Lito super-regions to look for an appropriate default binding. LazyCompoundVal
154*67e74705SXin Lidiffers from a normal default binding, however, in that it contains several
155*67e74705SXin Lidifferent values, instead of one value that will appear several times. Because
156*67e74705SXin Liof this, the store manager has to reconstruct the subregion chain on top of the
157*67e74705SXin LiLazyCompoundVal region, and look up *that* region in the previous store.
158*67e74705SXin Li
159*67e74705SXin LiHere's a concrete example:
160*67e74705SXin Li
161*67e74705SXin Li    CGPoint p;
162*67e74705SXin Li    p.x = 42;       // A Direct binding is made to the FieldRegion 'p.x'.
163*67e74705SXin Li    CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a
164*67e74705SXin Li                    // snapshot of the current store state. This value is then
165*67e74705SXin Li                    // used as a Default binding for the VarRegion 'p2'.
166*67e74705SXin Li    return p2.x;    // The binding for FieldRegion 'p2.x' is requested.
167*67e74705SXin Li                    // There is no Direct binding, so we look for a Default
168*67e74705SXin Li                    // binding to 'p2' and find the LCV.
169*67e74705SXin Li                    // Because it's an LCV, we look at our requested region
170*67e74705SXin Li                    // and see that it's the '.x' field. We ask for the value
171*67e74705SXin Li                    // of 'p.x' within the snapshot, and get back 42.
172