1*67e74705SXin LiThe analyzer "Store" represents the contents of memory regions. It is an opaque 2*67e74705SXin Lifunctional data structure stored in each ProgramState; the only class that can 3*67e74705SXin Limodify the store is its associated StoreManager. 4*67e74705SXin Li 5*67e74705SXin LiCurrently (Feb. 2013), the only StoreManager implementation being used is 6*67e74705SXin LiRegionStoreManager. This store records bindings to memory regions using a "base 7*67e74705SXin Liregion + offset" key. (This allows `*p` and `p[0]` to map to the same location, 8*67e74705SXin Liamong other benefits.) 9*67e74705SXin Li 10*67e74705SXin LiRegions are grouped into "clusters", which roughly correspond to "regions with 11*67e74705SXin Lithe same base region". This allows certain operations to be more efficient, 12*67e74705SXin Lisuch as invalidation. 13*67e74705SXin Li 14*67e74705SXin LiRegions that do not have a known offset use a special "symbolic" offset. These 15*67e74705SXin Likeys store both the original region, and the "concrete offset region" -- the 16*67e74705SXin Lilast region whose offset is entirely concrete. (For example, in the expression 17*67e74705SXin Li`foo.bar[1][i].baz`, the concrete offset region is the array `foo.bar[1]`, 18*67e74705SXin Lisince that has a known offset from the start of the top-level `foo` struct.) 19*67e74705SXin Li 20*67e74705SXin Li 21*67e74705SXin LiBinding Invalidation 22*67e74705SXin Li==================== 23*67e74705SXin Li 24*67e74705SXin LiSupporting both concrete and symbolic offsets makes things a bit tricky. Here's 25*67e74705SXin Lian example: 26*67e74705SXin Li 27*67e74705SXin Li foo[0] = 0; 28*67e74705SXin Li foo[1] = 1; 29*67e74705SXin Li foo[i] = i; 30*67e74705SXin Li 31*67e74705SXin LiAfter the third assignment, nothing can be said about the value of `foo[0]`, 32*67e74705SXin Libecause `foo[i]` may have overwritten it! Thus, *binding to a region with a 33*67e74705SXin Lisymbolic offset invalidates the entire concrete offset region.* We know 34*67e74705SXin Li`foo[i]` is somewhere within `foo`, so we don't have to invalidate anything 35*67e74705SXin Lielse, but we do have to be conservative about all other bindings within `foo`. 36*67e74705SXin Li 37*67e74705SXin LiContinuing the example: 38*67e74705SXin Li 39*67e74705SXin Li foo[i] = i; 40*67e74705SXin Li foo[0] = 0; 41*67e74705SXin Li 42*67e74705SXin LiAfter this latest assignment, nothing can be said about the value of `foo[i]`, 43*67e74705SXin Libecause `foo[0]` may have overwritten it! *Binding to a region R with a 44*67e74705SXin Liconcrete offset invalidates any symbolic offset bindings whose concrete offset 45*67e74705SXin Liregion is a super-region **or** sub-region of R.* All we know about `foo[i]` is 46*67e74705SXin Lithat it is somewhere within `foo`, so changing *anything* within `foo` might 47*67e74705SXin Lichange `foo[i]`, and changing *all* of `foo` (or its base region) will 48*67e74705SXin Li*definitely* change `foo[i]`. 49*67e74705SXin Li 50*67e74705SXin LiThis logic could be improved by using the current constraints on `i`, at the 51*67e74705SXin Licost of speed. The latter case could also be improved by matching region kinds, 52*67e74705SXin Lii.e. changing `foo[0].a` is unlikely to affect `foo[i].b`, no matter what `i` 53*67e74705SXin Liis. 54*67e74705SXin Li 55*67e74705SXin LiFor more detail, read through RegionStoreManager::removeSubRegionBindings in 56*67e74705SXin LiRegionStore.cpp. 57*67e74705SXin Li 58*67e74705SXin Li 59*67e74705SXin LiObjCIvarRegions 60*67e74705SXin Li=============== 61*67e74705SXin Li 62*67e74705SXin LiObjective-C instance variables require a bit of special handling. Like struct 63*67e74705SXin Lifields, they are not base regions, and when their parent object region is 64*67e74705SXin Liinvalidated, all the instance variables must be invalidated as well. However, 65*67e74705SXin Lithey have no concrete compile-time offsets (in the modern, "non-fragile" 66*67e74705SXin Liruntime), and so cannot easily be represented as an offset from the start of 67*67e74705SXin Lithe object in the analyzer. Moreover, this means that invalidating a single 68*67e74705SXin Liinstance variable should *not* invalidate the rest of the object, since unlike 69*67e74705SXin Listruct fields or array elements there is no way to perform pointer arithmetic 70*67e74705SXin Lito access another instance variable. 71*67e74705SXin Li 72*67e74705SXin LiConsequently, although the base region of an ObjCIvarRegion is the entire 73*67e74705SXin Liobject, RegionStore offsets are computed from the start of the instance 74*67e74705SXin Livariable. Thus it is not valid to assume that all bindings with non-symbolic 75*67e74705SXin Lioffsets start from the base region! 76*67e74705SXin Li 77*67e74705SXin Li 78*67e74705SXin LiRegion Invalidation 79*67e74705SXin Li=================== 80*67e74705SXin Li 81*67e74705SXin LiUnlike binding invalidation, region invalidation occurs when the entire 82*67e74705SXin Licontents of a region may have changed---say, because it has been passed to a 83*67e74705SXin Lifunction the analyzer can model, like memcpy, or because its address has 84*67e74705SXin Liescaped, usually as an argument to an opaque function call. In these cases we 85*67e74705SXin Lineed to throw away not just all bindings within the region itself, but within 86*67e74705SXin Liits entire cluster, since neighboring regions may be accessed via pointer 87*67e74705SXin Liarithmetic. 88*67e74705SXin Li 89*67e74705SXin LiRegion invalidation typically does even more than this, however. Because it 90*67e74705SXin Liusually represents the complete escape of a region from the analyzer's model, 91*67e74705SXin Liits *contents* must also be transitively invalidated. (For example, if a region 92*67e74705SXin Li'p' of type 'int **' is invalidated, the contents of '*p' and '**p' may have 93*67e74705SXin Lichanged as well.) The algorithm that traverses this transitive closure of 94*67e74705SXin Liaccessible regions is known as ClusterAnalysis, and is also used for finding 95*67e74705SXin Liall live bindings in the store (in order to throw away the dead ones). The name 96*67e74705SXin Li"ClusterAnalysis" predates the cluster-based organization of bindings, but 97*67e74705SXin Lirefers to the same concept: during invalidation and liveness analysis, all 98*67e74705SXin Libindings within a cluster must be treated in the same way for a conservative 99*67e74705SXin Limodel of program behavior. 100*67e74705SXin Li 101*67e74705SXin Li 102*67e74705SXin LiDefault Bindings 103*67e74705SXin Li================ 104*67e74705SXin Li 105*67e74705SXin LiMost bindings in RegionStore are simple scalar values -- integers and pointers. 106*67e74705SXin LiThese are known as "Direct" bindings. However, RegionStore supports a second 107*67e74705SXin Litype of binding called a "Default" binding. These are used to provide values to 108*67e74705SXin Liall the elements of an aggregate type (struct or array) without having to 109*67e74705SXin Liexplicitly specify a binding for each individual element. 110*67e74705SXin Li 111*67e74705SXin LiWhen there is no Direct binding for a particular region, the store manager 112*67e74705SXin Lilooks at each super-region in turn to see if there is a Default binding. If so, 113*67e74705SXin Lithis value is used as the value of the original region. The search ends when 114*67e74705SXin Lithe base region is reached, at which point the RegionStore will pick an 115*67e74705SXin Liappropriate default value for the region (usually a symbolic value, but 116*67e74705SXin Lisometimes zero, for static data, or "uninitialized", for stack variables). 117*67e74705SXin Li 118*67e74705SXin Li int manyInts[10]; 119*67e74705SXin Li manyInts[1] = 42; // Creates a Direct binding for manyInts[1]. 120*67e74705SXin Li print(manyInts[1]); // Retrieves the Direct binding for manyInts[1]; 121*67e74705SXin Li print(manyInts[0]); // There is no Direct binding for manyInts[1]. 122*67e74705SXin Li // Is there a Default binding for the entire array? 123*67e74705SXin Li // There is not, but it is a stack variable, so we use 124*67e74705SXin Li // "uninitialized" as the default value (and emit a 125*67e74705SXin Li // diagnostic!). 126*67e74705SXin Li 127*67e74705SXin LiNOTE: The fact that bindings are stored as a base region plus an offset limits 128*67e74705SXin Lithe Default Binding strategy, because in C aggregates can contain other 129*67e74705SXin Liaggregates. In the current implementation of RegionStore, there is no way to 130*67e74705SXin Lidistinguish a Default binding for an entire aggregate from a Default binding 131*67e74705SXin Lifor the sub-aggregate at offset 0. 132*67e74705SXin Li 133*67e74705SXin Li 134*67e74705SXin LiLazy Bindings (LazyCompoundVal) 135*67e74705SXin Li=============================== 136*67e74705SXin Li 137*67e74705SXin LiRegionStore implements an optimization for copying aggregates (structs and 138*67e74705SXin Liarrays) called "lazy bindings", implemented using a special SVal called 139*67e74705SXin LiLazyCompoundVal. When the store is asked for the "binding" for an entire 140*67e74705SXin Liaggregate (i.e. for an lvalue-to-rvalue conversion), it returns a 141*67e74705SXin LiLazyCompoundVal instead. When this value is then stored into a variable, it is 142*67e74705SXin Libound as a Default value. This makes copying arrays and structs much cheaper 143*67e74705SXin Lithan if they had required memberwise access. 144*67e74705SXin Li 145*67e74705SXin LiUnder the hood, a LazyCompoundVal is implemented as a uniqued pair of (region, 146*67e74705SXin Listore), representing "the value of the region during this 'snapshot' of the 147*67e74705SXin Listore". This has important implications for any sort of liveness or 148*67e74705SXin Lireachability analysis, which must take the bindings in the old store into 149*67e74705SXin Liaccount. 150*67e74705SXin Li 151*67e74705SXin LiRetrieving a value from a lazy binding happens in the same way as any other 152*67e74705SXin LiDefault binding: since there is no direct binding, the store manager falls back 153*67e74705SXin Lito super-regions to look for an appropriate default binding. LazyCompoundVal 154*67e74705SXin Lidiffers from a normal default binding, however, in that it contains several 155*67e74705SXin Lidifferent values, instead of one value that will appear several times. Because 156*67e74705SXin Liof this, the store manager has to reconstruct the subregion chain on top of the 157*67e74705SXin LiLazyCompoundVal region, and look up *that* region in the previous store. 158*67e74705SXin Li 159*67e74705SXin LiHere's a concrete example: 160*67e74705SXin Li 161*67e74705SXin Li CGPoint p; 162*67e74705SXin Li p.x = 42; // A Direct binding is made to the FieldRegion 'p.x'. 163*67e74705SXin Li CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a 164*67e74705SXin Li // snapshot of the current store state. This value is then 165*67e74705SXin Li // used as a Default binding for the VarRegion 'p2'. 166*67e74705SXin Li return p2.x; // The binding for FieldRegion 'p2.x' is requested. 167*67e74705SXin Li // There is no Direct binding, so we look for a Default 168*67e74705SXin Li // binding to 'p2' and find the LCV. 169*67e74705SXin Li // Because it's an LCV, we look at our requested region 170*67e74705SXin Li // and see that it's the '.x' field. We ask for the value 171*67e74705SXin Li // of 'p.x' within the snapshot, and get back 42. 172