1# Comparison 2 3This is implemented in `comparison.{h,cc}`. 4 5Graph comparison is the basis of all STG difference reporting. 6 7All the various STG node attributes (such as `size` or `name`) and edge 8identifiers (whether explicit or not, such as function parameter index) can be 9reduced to generic labels. This (over)simplification reduces the problem to 10solve to: 11 12* given two rooted directed graphs, containing labelled nodes and edges 13* generate a graph that encapsulates all the differences found by following 14 matching pairs of edges 15 16Report generation from such a graph is the subject of [Reporting](reporting.md). 17 18STG compares the graphs starting at the root nodes and recursing along edges 19with matching labels. The recursion stops at incomparable nodes (such as a 20`struct` and `int`). Otherwise a comparison specific to the kind of node is 21performed. 22 23Given that the graph will in general not be a tree and may contain cycles, STG 24 25* memoises comparisons with known results 26* detects comparison cycles and ensures correct termination and propagation of 27 results 28 29Overall, this ensures that any given pair of nodes (one from each graph) is 30compared at most once. In the case of a small change to a typical ABI graph of 31size *N*, *O(N)* node comparisons are performed. 32 33## Ignoring certain kinds of differences 34 35It has proved useful to add selective suppression of certain kinds of 36differences, for distinct purposes. 37 38| **ignore** | **directionality** | **purpose** | 39| ------------------------ | ------------------ | ------------------------- | 40| interface addition | asymmetric | compatibility checking | 41| type definition addition | asymmetric | compatibility checking | 42| primitive type encoding | symmetric | cross comparison noise | 43: : : reduction : 44| member size | symmetric | cross comparison noise | 45: : : reduction : 46| enum underlying type | symmetric | cross comparison noise | 47: : : reduction : 48| qualifier | symmetric | cross comparison noise | 49: : : reduction : 50| symbol CRC | symmetric | cross comparison noise | 51: : : reduction : 52| symbol type presence | symmetric | libabigail XML comparison | 53: : : noise reduction : 54| type declaration status | symmetric | libabigail XML comparison | 55: : : noise reduction : 56 57The first two options can be used to test whether one ABI is a subset of 58another. 59 60It can be useful to cross compare ABIs extracted in different ways to validate 61the fidelity of one ABI source against another. Where the models or 62implementations differ systematically, suppressing those differences will make 63the remainder more obvious. 64 65The libabigail versions used in Android's GKI project often generated ABIs with 66spurious differences due to the disappearance (or reappearance) of type 67definitions and (occasionally) symbol types. The corresponding ignore options 68replicate the behaviour of libabigail's `abidiff`. 69 70## Differences and difference graphs 71 72When comparing a pair of nodes, each difference falls into one of the following 73categories: 74 75* node label - a purely local difference 76* matching edge - a recursive difference found by following edges with 77 matching labels 78* added or removed labelled edges, where edges are identified by label - 79 modelled as a recursive difference with a node absent on one side 80 81Each node in an STG difference graph is one of the following[^1]: 82 83* a node removal or addition, containing 84 * a reference to either a node in first graph or one in the second 85* a node change, containing 86 * a reference to two nodes, one in each of the two graphs 87 * a possibly-empty list of differences which can each be one of 88 * a node attribute difference in the form of some informative text 89 * an edge difference in the form of a link to a difference node 90 91[^1]: STG models comparisons as pairs of nodes where either node can be absent. 92 The absent-absent comparison is used to represent "no change". All such 93 edges, except those representing the root of a "no change" graph comparison, 94 are pruned during diff graph creation. 95 96Note that STG's difference nodes are *unkinded*, there is only one kind of 97difference node, unlike STG's data nodes where there is a separate kind of node 98for each kind of C type etc. 99 100## Matching and comparing collections of edges 101 102While an algorithm based on generic label comparison will work, there are a 103couple of issues: 104 105* collections of outgoing edges may not have an obvious label to assign 106 (multiple anonymous members of a `struct`, for example) 107* edges are often ordered (parameters and members, for example) and we want to 108 preserve this order when reporting differences 109 110Comparing two pointer types is straightforward, just compare the pointed-to 111types. However, symbol tables, function arguments and struct members all require 112comparisons of multiple entities simultaneously. 113 114STG compares edge aggregates as follows: 115 116* arrays (like function arguments): by index, comparing recursively items with 117 the same index, reporting removals and additions of the remaining items 118* maps (like the symbol table): by key, comparing recursively items with 119 matching keys, reporting removals and additions of unmatched items 120* otherwise synthesise a key for comparison, compare by key, report 121 differences in the original order of items being compared (favouring the 122 second list's order over the first's); the reordering is an *O(n²)* 123 operation and it might be possible to adapt the more general Myer's diff 124 algorithm to reduce this 125 126## Implementation Details 127 128Comparison is mostly done pair-wise recursively with a DFS, by the function 129object `CompareWorker` and with the help of the [SCC finder](scc.md). 130 131The algorithm divides responsibility between `operator()(Id, Id)` and various 132`operator()(Node, Node)` methods. There are also various helpers in and outside 133`CompareWorker`. 134 135The `Result` type encapsulates the difference between two nodes being compared. 136It contains both a list (`Diff`) of differences (`DiffDetail`) and a boolean 137equality outcome. The latter is used to propagate inequality information in the 138presence of cycles in the diff comparison graph. 139 140### `operator()(Node, Node)` 141 142Specialised for each `Node` type, these methods have the job of computing local 143differences, matching edges and obtaining edge differences from recursive calls 144to `operator()(Id, Id)` (or `Removed` and `Added`, if edge labels are 145unmatched). 146 147Local differences can easily be rendered as text, but edge differences need 148recursive calls, the results of which are merged into the local differences 149`Result` with helper methods. 150 151These methods form the bulk of the comparison code, so in general we want them 152to be as small as possible, containing no boilerplate and simply mirroring the 153node data. The helper functions were therefore chosen for power, laziness and 154concision. 155 156### `Added` and `Removed` 157 158These take care of comparisons where one side is absent. 159 160There are several reasons for not folding this functionality into `operator(Id, 161Id)` itself: 162 163* it would result in unnecessary extra work as its callers would need to pack 164 and the function would need to unpack `optional<Id>` arguments 165* added and removed nodes have none of the other interesting features that it 166 handles 167* `Added` and `Removed` don't need to decorate their return values with any 168 difference information 169 170### `Defined` 171 172This takes care of comparisons of user-defined types which may be forward 173declarations or full definitions. 174 175### `Nodes` 176 177These take care of comparisons of sequences of arbitrary nodes (or enumerators). 178 179STG uses "matching keys" to reduce the problem of comparing sequences to 180comparing maps. It attempts to preserve original sequence order as described in 181`order.h`. 182 183### `operator(Id, Id)` 184 185This controls recursion and handles some special cases before delegating to some 186`operator()(Node, Node)` in the "normal" case. 187 188It takes care of the following: 189 190* revisited, completed comparisons 191* revisited, in-progress comparison 192* qualified types 193* typedefs 194* incomparable and comparable nodes 195* comparison cycles 196 197Note that the non-trivial special cases relating to typedefs and qualified types 198(and their current concrete representations) require non-parallel traversals of 199the graphs being compared; this is the only place where the comparison is not 200purely structural. 201 202#### Revisited Nodes and Recursive Comparison 203 204STG comparison relies on `SCC` to control recursion and behaviour in the face of 205graphs containing arbitrary cycles. Any difference found affecting one node in a 206comparison cycle affects all nodes in that cycle. 207 208Excluding the two special cases documented in the following sections, the 209comparison steps are approximately: 210 2111. if the comparison already has a known result then return this 2122. if the comparison already is in progress then return a potential difference 2133. start node visit, register the node with the SCC finder 214 1. (special handling of qualified types and typedefs) 215 2. incomparable nodes (such as a `struct` and an `int`) go to `Mismatch` 216 which returns a difference; there is no further recursion 217 3. otherwise delegate node comparison to a node-specific function; with 218 possible recursion 219 4. the comparison result here is tentative, due to potential cycles 2204. finish node visit, informing the SCC finder 2215. if an SCC was closed, we've just finished its root comparison 222 1. root compared equal? discard unwanted potential differences 223 2. difference found? record confirmed differences 224 3. record all its comparison results as final 2256. return result (which will be final if an SCC was closed) 226 227#### Typedefs 228 229This special handling is subject to change. 230 231* `typedef` foo bar ⇔ foo 232 233Typedefs are named type aliases which cannot refer to themselves or later 234defined types. The referred-to type is exactly identical to the typedef. So for 235difference *finding*, typedefs should just be resolved and skipped over. 236However, for *reporting*, it may be still useful to say where a difference came 237from. This requires extra handling to collect the typedef names on each side of 238the comparison, when there is something to report. 239 240If `operator()(Id, Id)` sees a comparison involving a typedef, it resolves 241typedef chains on both sides and keeps track of the names. Then it calls itself 242recursively. If the result is no-diff, it returns no-diff, otherwise, it reports 243the differences between the types at the end of the typedef chains. 244 245An alternative would be to genuinely just follow the epsilons. Store the 246typedefs in the diff tree but record the comparison of what they resolve to. The 247presentation layer can decorate the comparison text with resolution chains. 248 249Note that qualified typedefs present extra complications. 250 251#### Qualified Types 252 253This special handling is subject to change. 254 255* `const` → `volatile` → foo ⇔ `volatile` → `const` → foo 256 257STG currently represents type qualifiers as separate, individual nodes. They are 258relevant for finding differences but there may be no guarantee of the order in 259which they will appear. For diff reporting, STG currently reports added and 260removed qualifiers but also compares the underlying types. 261 262This implies that when faced with a comparison involving a qualifier, 263`operator()(Id, Id)` should collect and compare all qualifiers on both sides and 264treat the types as compound objects consisting of their qualifiers and the 265underlying types, either or both of which may have differences to report. 266Comparing the underlying types requires recursive calls. 267 268Note that qualified typedefs present extra complications. 269 270#### Qualified typedefs 271 272STG does not currently do anything special for qualified typedefs which can have 273subtle and surprising behaviours. For example: 274 275Before: 276 277```c++ 278const int quux; 279``` 280 281After 1: 282 283```c++ 284typedef int foo; 285const foo quux; 286``` 287 288After 2: 289 290```c++ 291typedef const int foo; 292foo quux; 293``` 294 295After 3: 296 297```c++ 298typedef const int foo; 299const foo quux; 300``` 301 302In all cases above, the type of `quux` is unchanged. These examples strongly 303suggest that a better model of C types would involve tracking qualification as a 304decoration present on every type node, including typedefs. 305 306Note that this behaviour implies C's type system is not purely constructive as 307there is machinery to discard duplicate qualifiers which would be illegal 308elsewhere. 309 310For the moment, we can pretend that outer qualifications are always significant, 311even though they may be absorbed by inner ones, and risk occasional false 312positives. 313 314A worse case is: 315 316Before: 317 318```c++ 319const int quux[]; 320``` 321 322After 1: 323 324```c++ 325typedef int foo[]; 326const foo quux; 327``` 328 329After 2: 330 331```c++ 332typedef const int foo[]; 333foo quux; 334``` 335 336After 3: 337 338```c++ 339typedef const int foo[]; 340const foo quux; 341``` 342 343All the `quux` are identically typed. There is an additional wart that what 344would normally be illegal qualifiers on an array type instead decorate its 345element type. 346 347Finally, worst is: 348 349Before: 350 351```c++ 352const int quux(); 353``` 354 355After 1: 356 357```c++ 358typedef int foo(); 359const foo quux; 360``` 361 362After 2: 363 364```c++ 365typedef const int foo(); 366foo quux; 367``` 368 369After 3: 370 371```c++ 372typedef const int foo(); 373const foo quux; 374``` 375 376The two `const foo quux` cases invoke undefined behaviour. The consistently 377crazy behaviour would have been to decorate the return type instead. 378 379The "worstest" case is GCC allowing the specification of typedef alignment 380different to the defining type. This abomination should not exist and should 381never be used. Alignment is not currently modelled by STG. 382 383### Diff helpers 384 385These are mainly used by the `CompareWorker::operator()(Node, Node)` methods. 386 387* `MarkIncomparable` - nodes are just different 388* `AddNodeDiff` - add node difference, unconditionally 389* `AddEdgeDiff` - add edge difference (addition or removal), unconditionally 390* `MaybeAddNodeDiff` - add node difference (label change), conditionally 391* `MaybeAddEdgeDiff` - add matching edge recursive difference, conditionally 392 393Variants are possible where text is generated lazily on a recursive diff being 394found, as are ones where labels are compared and serialised only if different. 395 396## Diff Presentation 397 398In general, there are two problems to solve: 399 400* generating suitable text, for 401 * nodes and edges 402 * node and edge differences 403* building a report with some meaningful structure 404 405Node and edge description and report structure are the responsibility of the 406*reporting* code. See [Naming](naming.md) for more detailed notes on node 407description, mainly C type name syntax. 408 409Several report formats are supported and the simplest is (omitting various 410complications) a rendering of a difference graph as a difference *tree* where 411revisiting nodes is avoided by reporting 2 additional artificial kinds of 412difference: 413 4141. already reported - to handle diff sharing 4152. being compared - to handle diff cycles 416 417The various formats are not documented further here. 418 419Finally, node and edge difference description is currently the responsibility of 420the *comparison* code. This may change in the future, but might require a typed 421difference graph. 422