xref: /aosp_15_r20/external/stg/doc/comparison.md (revision 9e3b08ae94a55201065475453d799e8b1378bea6)
1# Comparison
2
3This is implemented in `comparison.{h,cc}`.
4
5Graph comparison is the basis of all STG difference reporting.
6
7All the various STG node attributes (such as `size` or `name`) and edge
8identifiers (whether explicit or not, such as function parameter index) can be
9reduced to generic labels. This (over)simplification reduces the problem to
10solve to:
11
12*   given two rooted directed graphs, containing labelled nodes and edges
13*   generate a graph that encapsulates all the differences found by following
14    matching pairs of edges
15
16Report generation from such a graph is the subject of [Reporting](reporting.md).
17
18STG compares the graphs starting at the root nodes and recursing along edges
19with matching labels. The recursion stops at incomparable nodes (such as a
20`struct` and `int`). Otherwise a comparison specific to the kind of node is
21performed.
22
23Given that the graph will in general not be a tree and may contain cycles, STG
24
25*   memoises comparisons with known results
26*   detects comparison cycles and ensures correct termination and propagation of
27    results
28
29Overall, this ensures that any given pair of nodes (one from each graph) is
30compared at most once. In the case of a small change to a typical ABI graph of
31size *N*, *O(N)* node comparisons are performed.
32
33## Ignoring certain kinds of differences
34
35It has proved useful to add selective suppression of certain kinds of
36differences, for distinct purposes.
37
38| **ignore**               | **directionality** | **purpose**               |
39| ------------------------ | ------------------ | ------------------------- |
40| interface addition       | asymmetric         | compatibility checking    |
41| type definition addition | asymmetric         | compatibility checking    |
42| primitive type encoding  | symmetric          | cross comparison noise    |
43:                          :                    : reduction                 :
44| member size              | symmetric          | cross comparison noise    |
45:                          :                    : reduction                 :
46| enum underlying type     | symmetric          | cross comparison noise    |
47:                          :                    : reduction                 :
48| qualifier                | symmetric          | cross comparison noise    |
49:                          :                    : reduction                 :
50| symbol CRC               | symmetric          | cross comparison noise    |
51:                          :                    : reduction                 :
52| symbol type presence     | symmetric          | libabigail XML comparison |
53:                          :                    : noise reduction           :
54| type declaration status  | symmetric          | libabigail XML comparison |
55:                          :                    : noise reduction           :
56
57The first two options can be used to test whether one ABI is a subset of
58another.
59
60It can be useful to cross compare ABIs extracted in different ways to validate
61the fidelity of one ABI source against another. Where the models or
62implementations differ systematically, suppressing those differences will make
63the remainder more obvious.
64
65The libabigail versions used in Android's GKI project often generated ABIs with
66spurious differences due to the disappearance (or reappearance) of type
67definitions and (occasionally) symbol types. The corresponding ignore options
68replicate the behaviour of libabigail's `abidiff`.
69
70## Differences and difference graphs
71
72When comparing a pair of nodes, each difference falls into one of the following
73categories:
74
75*   node label - a purely local difference
76*   matching edge - a recursive difference found by following edges with
77    matching labels
78*   added or removed labelled edges, where edges are identified by label -
79    modelled as a recursive difference with a node absent on one side
80
81Each node in an STG difference graph is one of the following[^1]:
82
83*   a node removal or addition, containing
84    *   a reference to either a node in first graph or one in the second
85*   a node change, containing
86    *   a reference to two nodes, one in each of the two graphs
87    *   a possibly-empty list of differences which can each be one of
88        *   a node attribute difference in the form of some informative text
89        *   an edge difference in the form of a link to a difference node
90
91[^1]: STG models comparisons as pairs of nodes where either node can be absent.
92    The absent-absent comparison is used to represent "no change". All such
93    edges, except those representing the root of a "no change" graph comparison,
94    are pruned during diff graph creation.
95
96Note that STG's difference nodes are *unkinded*, there is only one kind of
97difference node, unlike STG's data nodes where there is a separate kind of node
98for each kind of C type etc.
99
100## Matching and comparing collections of edges
101
102While an algorithm based on generic label comparison will work, there are a
103couple of issues:
104
105*   collections of outgoing edges may not have an obvious label to assign
106    (multiple anonymous members of a `struct`, for example)
107*   edges are often ordered (parameters and members, for example) and we want to
108    preserve this order when reporting differences
109
110Comparing two pointer types is straightforward, just compare the pointed-to
111types. However, symbol tables, function arguments and struct members all require
112comparisons of multiple entities simultaneously.
113
114STG compares edge aggregates as follows:
115
116*   arrays (like function arguments): by index, comparing recursively items with
117    the same index, reporting removals and additions of the remaining items
118*   maps (like the symbol table): by key, comparing recursively items with
119    matching keys, reporting removals and additions of unmatched items
120*   otherwise synthesise a key for comparison, compare by key, report
121    differences in the original order of items being compared (favouring the
122    second list's order over the first's); the reordering is an *O(n²)*
123    operation and it might be possible to adapt the more general Myer's diff
124    algorithm to reduce this
125
126## Implementation Details
127
128Comparison is mostly done pair-wise recursively with a DFS, by the function
129object `CompareWorker` and with the help of the [SCC finder](scc.md).
130
131The algorithm divides responsibility between `operator()(Id, Id)` and various
132`operator()(Node, Node)` methods. There are also various helpers in and outside
133`CompareWorker`.
134
135The `Result` type encapsulates the difference between two nodes being compared.
136It contains both a list (`Diff`) of differences (`DiffDetail`) and a boolean
137equality outcome. The latter is used to propagate inequality information in the
138presence of cycles in the diff comparison graph.
139
140### `operator()(Node, Node)`
141
142Specialised for each `Node` type, these methods have the job of computing local
143differences, matching edges and obtaining edge differences from recursive calls
144to `operator()(Id, Id)` (or `Removed` and `Added`, if edge labels are
145unmatched).
146
147Local differences can easily be rendered as text, but edge differences need
148recursive calls, the results of which are merged into the local differences
149`Result` with helper methods.
150
151These methods form the bulk of the comparison code, so in general we want them
152to be as small as possible, containing no boilerplate and simply mirroring the
153node data. The helper functions were therefore chosen for power, laziness and
154concision.
155
156### `Added` and `Removed`
157
158These take care of comparisons where one side is absent.
159
160There are several reasons for not folding this functionality into `operator(Id,
161Id)` itself:
162
163*   it would result in unnecessary extra work as its callers would need to pack
164    and the function would need to unpack `optional<Id>` arguments
165*   added and removed nodes have none of the other interesting features that it
166    handles
167*   `Added` and `Removed` don't need to decorate their return values with any
168    difference information
169
170### `Defined`
171
172This takes care of comparisons of user-defined types which may be forward
173declarations or full definitions.
174
175### `Nodes`
176
177These take care of comparisons of sequences of arbitrary nodes (or enumerators).
178
179STG uses "matching keys" to reduce the problem of comparing sequences to
180comparing maps. It attempts to preserve original sequence order as described in
181`order.h`.
182
183### `operator(Id, Id)`
184
185This controls recursion and handles some special cases before delegating to some
186`operator()(Node, Node)` in the "normal" case.
187
188It takes care of the following:
189
190*   revisited, completed comparisons
191*   revisited, in-progress comparison
192*   qualified types
193*   typedefs
194*   incomparable and comparable nodes
195*   comparison cycles
196
197Note that the non-trivial special cases relating to typedefs and qualified types
198(and their current concrete representations) require non-parallel traversals of
199the graphs being compared; this is the only place where the comparison is not
200purely structural.
201
202#### Revisited Nodes and Recursive Comparison
203
204STG comparison relies on `SCC` to control recursion and behaviour in the face of
205graphs containing arbitrary cycles. Any difference found affecting one node in a
206comparison cycle affects all nodes in that cycle.
207
208Excluding the two special cases documented in the following sections, the
209comparison steps are approximately:
210
2111.  if the comparison already has a known result then return this
2122.  if the comparison already is in progress then return a potential difference
2133.  start node visit, register the node with the SCC finder
214    1.  (special handling of qualified types and typedefs)
215    2.  incomparable nodes (such as a `struct` and an `int`) go to `Mismatch`
216        which returns a difference; there is no further recursion
217    3.  otherwise delegate node comparison to a node-specific function; with
218        possible recursion
219    4.  the comparison result here is tentative, due to potential cycles
2204.  finish node visit, informing the SCC finder
2215.  if an SCC was closed, we've just finished its root comparison
222    1.  root compared equal? discard unwanted potential differences
223    2.  difference found? record confirmed differences
224    3.  record all its comparison results as final
2256.  return result (which will be final if an SCC was closed)
226
227#### Typedefs
228
229This special handling is subject to change.
230
231*   `typedef` foo bar ⇔ foo
232
233Typedefs are named type aliases which cannot refer to themselves or later
234defined types. The referred-to type is exactly identical to the typedef. So for
235difference *finding*, typedefs should just be resolved and skipped over.
236However, for *reporting*, it may be still useful to say where a difference came
237from. This requires extra handling to collect the typedef names on each side of
238the comparison, when there is something to report.
239
240If `operator()(Id, Id)` sees a comparison involving a typedef, it resolves
241typedef chains on both sides and keeps track of the names. Then it calls itself
242recursively. If the result is no-diff, it returns no-diff, otherwise, it reports
243the differences between the types at the end of the typedef chains.
244
245An alternative would be to genuinely just follow the epsilons. Store the
246typedefs in the diff tree but record the comparison of what they resolve to. The
247presentation layer can decorate the comparison text with resolution chains.
248
249Note that qualified typedefs present extra complications.
250
251#### Qualified Types
252
253This special handling is subject to change.
254
255*   `const` → `volatile` → foo ⇔ `volatile` → `const` → foo
256
257STG currently represents type qualifiers as separate, individual nodes. They are
258relevant for finding differences but there may be no guarantee of the order in
259which they will appear. For diff reporting, STG currently reports added and
260removed qualifiers but also compares the underlying types.
261
262This implies that when faced with a comparison involving a qualifier,
263`operator()(Id, Id)` should collect and compare all qualifiers on both sides and
264treat the types as compound objects consisting of their qualifiers and the
265underlying types, either or both of which may have differences to report.
266Comparing the underlying types requires recursive calls.
267
268Note that qualified typedefs present extra complications.
269
270#### Qualified typedefs
271
272STG does not currently do anything special for qualified typedefs which can have
273subtle and surprising behaviours. For example:
274
275Before:
276
277```c++
278const int quux;
279```
280
281After 1:
282
283```c++
284typedef int foo;
285const foo quux;
286```
287
288After 2:
289
290```c++
291typedef const int foo;
292foo quux;
293```
294
295After 3:
296
297```c++
298typedef const int foo;
299const foo quux;
300```
301
302In all cases above, the type of `quux` is unchanged. These examples strongly
303suggest that a better model of C types would involve tracking qualification as a
304decoration present on every type node, including typedefs.
305
306Note that this behaviour implies C's type system is not purely constructive as
307there is machinery to discard duplicate qualifiers which would be illegal
308elsewhere.
309
310For the moment, we can pretend that outer qualifications are always significant,
311even though they may be absorbed by inner ones, and risk occasional false
312positives.
313
314A worse case is:
315
316Before:
317
318```c++
319const int quux[];
320```
321
322After 1:
323
324```c++
325typedef int foo[];
326const foo quux;
327```
328
329After 2:
330
331```c++
332typedef const int foo[];
333foo quux;
334```
335
336After 3:
337
338```c++
339typedef const int foo[];
340const foo quux;
341```
342
343All the `quux` are identically typed. There is an additional wart that what
344would normally be illegal qualifiers on an array type instead decorate its
345element type.
346
347Finally, worst is:
348
349Before:
350
351```c++
352const int quux();
353```
354
355After 1:
356
357```c++
358typedef int foo();
359const foo quux;
360```
361
362After 2:
363
364```c++
365typedef const int foo();
366foo quux;
367```
368
369After 3:
370
371```c++
372typedef const int foo();
373const foo quux;
374```
375
376The two `const foo quux` cases invoke undefined behaviour. The consistently
377crazy behaviour would have been to decorate the return type instead.
378
379The "worstest" case is GCC allowing the specification of typedef alignment
380different to the defining type. This abomination should not exist and should
381never be used. Alignment is not currently modelled by STG.
382
383### Diff helpers
384
385These are mainly used by the `CompareWorker::operator()(Node, Node)` methods.
386
387*   `MarkIncomparable` - nodes are just different
388*   `AddNodeDiff` - add node difference, unconditionally
389*   `AddEdgeDiff` - add edge difference (addition or removal), unconditionally
390*   `MaybeAddNodeDiff` - add node difference (label change), conditionally
391*   `MaybeAddEdgeDiff` - add matching edge recursive difference, conditionally
392
393Variants are possible where text is generated lazily on a recursive diff being
394found, as are ones where labels are compared and serialised only if different.
395
396## Diff Presentation
397
398In general, there are two problems to solve:
399
400*   generating suitable text, for
401    *   nodes and edges
402    *   node and edge differences
403*   building a report with some meaningful structure
404
405Node and edge description and report structure are the responsibility of the
406*reporting* code. See [Naming](naming.md) for more detailed notes on node
407description, mainly C type name syntax.
408
409Several report formats are supported and the simplest is (omitting various
410complications) a rendering of a difference graph as a difference *tree* where
411revisiting nodes is avoided by reporting 2 additional artificial kinds of
412difference:
413
4141.  already reported - to handle diff sharing
4152.  being compared - to handle diff cycles
416
417The various formats are not documented further here.
418
419Finally, node and edge difference description is currently the responsibility of
420the *comparison* code. This may change in the future, but might require a typed
421difference graph.
422