# Tutorial

The simplest use for the STG tools is to extract, store and compare ABI representations.

This tutorial uses long options throughout. Equivalent short options can be found in the manual pages for [`stg`](stg.md) and [`stgdiff`](stgdiff.md). Both tools understand `-` as a shorthand for `/dev/stdout`.

### Working Example - code and compilation

This small code sample will be used as a working example. Copy it into a file called `tree.c`.

```c
struct N {
  struct N * left;
  struct N * right;
  int value;
};

unsigned int count(struct N * tree) {
  return tree ? count(tree->left) + count(tree->right) + 1 : 0;
}

int sum(struct N * tree) {
  return tree ? sum(tree->left) + sum(tree->right) + tree->value : 0;
}
```

Compile it:

```shell
gcc -Wall -Wextra -g -c tree.c -o tree.o
```

## Extraction from ELF / DWARF

`stg` is the tool for extracting ABI representations, though it can do more sophisticated things as well. The simplest invocation of `stg` looks something like this:

```shell
stg --elf library.so --output library.stg
```

Adding the `--annotate` option can be useful, especially when debugging ABI issues or when experimenting with the tools, as now.

If the output consists of just symbols and you get a warning about missing DWARF information, then `library.so` has no DWARF debugging information. For meaningful results, `stg` should be run on an *unstripped* ELF file, which may require build system adjustments.

### Working Example - ABI extraction

Run this:

```shell
stg --elf tree.o --annotate --output -
```

And you should get something like this:

```proto
version: 0x00000002
root_id: 0x84ea5130  # interface
pointer_reference {
  id: 0x32b38621
  kind: POINTER
  pointee_type_id: 0xe08efe1a  # struct N
}
primitive {
  id: 0x4585663f
  name: "unsigned int"
  encoding: UNSIGNED_INTEGER
  bytesize: 0x00000004
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
member {
  id: 0x35cbdb23
  name: "left"
  type_id: 0x32b38621  # struct N*
}
member {
  id: 0x0b440ffb
  name: "right"
  type_id: 0x32b38621  # struct N*
  offset: 64
}
member {
  id: 0xa06f75d5
  name: "value"
  type_id: 0x6720d32f  # int
  offset: 128
}
struct_union {
  id: 0xe08efe1a
  kind: STRUCT
  name: "N"
  definition {
    bytesize: 24
    member_id: 0x35cbdb23  # struct N* left
    member_id: 0x0b440ffb  # struct N* right
    member_id: 0xa06f75d5  # int value
  }
}
function {
  id: 0x912c02a7
  return_type_id: 0x6720d32f  # int
  parameter_id: 0x32b38621  # struct N*
}
function {
  id: 0xc2779f73
  return_type_id: 0x4585663f  # unsigned int
  parameter_id: 0x32b38621  # struct N*
}
elf_symbol {
  id: 0xbb237197
  name: "count"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0xc2779f73  # unsigned int(struct N*)
  full_name: "count"
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x912c02a7  # int(struct N*)
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0xbb237197  # unsigned int count(struct N*)
  symbol_id: 0x4fdeca38  # int sum(struct N*)
}
```
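
The member `offset` values in the annotated output appear to be expressed in bits, while `bytesize` is in bytes. As a rough cross-check, the following sketch computes the same quantities directly in C. `OFFSET_BITS` and `check_layout` are hypothetical helpers introduced here for illustration. They are not part of `tree.c` or of STG, and the asserted relations assume a typical platform where pointers are naturally aligned.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Same definition as in tree.c. */
struct N {
  struct N * left;
  struct N * right;
  int value;
};

/* Hypothetical helper: offset of a member in bits rather than bytes. */
#define OFFSET_BITS(type, member) (offsetof(type, member) * CHAR_BIT)

/* Hypothetical helper: checks layout relations expected on typical platforms. */
void check_layout(void) {
  /* The first member sits at offset 0. */
  assert(OFFSET_BITS(struct N, left) == 0);
  /* "right" follows one pointer, "value" follows two. */
  assert(OFFSET_BITS(struct N, right) == sizeof(struct N *) * CHAR_BIT);
  assert(OFFSET_BITS(struct N, value) == 2 * sizeof(struct N *) * CHAR_BIT);
  /* The struct size covers all members plus any trailing padding. */
  assert(sizeof(struct N) >= 2 * sizeof(struct N *) + sizeof(int));
}
```

On a typical 64-bit Linux target, with 8-byte pointers and a 4-byte `int`, these relations correspond to offsets of 0, 64 and 128 bits and a size of 24 bytes, which is what the extracted representation records.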

## Filtering

When first starting to manage the ABI of a binary, it is usually desirable to restrict the interface surface to the necessary minimum. Any superfluous symbols or type definitions in the ABI representation can result in spurious ABI differences in later reports.

For exposed symbols, it is common to control symbol *visibility*. Type definitions can be either exposed in public header files or hidden in private header files, perhaps with only public forward declarations, but this does not remove any type definitions from the DWARF information.

STG provides filtering facilities for both symbols and types, for example:

```shell
stg --files '*.h' --elf library.so --output library.stg
```

This ensures that definitions of types defined outside header files, and perhaps used as opaque pointer handles, are omitted from the ABI representation. If you separate public and private headers, use a glob pattern that distinguishes the two.

Sets of symbol or file names can be read from a file. In this example, all symbols whose names begin with `api_`, except those listed in the `obsolete` file, are kept.

```shell
stg --symbols 'api_* & ! :obsolete' --elf library.so --output library.stg
```

For historical reasons, the literal filter file format is compatible with libabigail's symbol list format, but this is subject to change.

```ini
[list]
# one symbol per line
foo # comments, whitespace and empty lines are all ignored
bar
baz
```
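
To make the meaning of a filter expression such as `'api_* & ! :obsolete'` concrete, here is a minimal model of it in C: a name is kept if it matches the glob and is not in the list. This is only an illustration of the semantics described above, not STG's implementation. POSIX `fnmatch` stands in for the glob matcher, and the `obsolete` array, its contents and the function names are all assumptions made for the example.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical contents of the "obsolete" list file. */
static const char *const obsolete[] = {"api_old_init", "api_old_free"};

/* Returns true if name appears in the list of n entries. */
static bool in_list(const char *name, const char *const *list, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    if (strcmp(name, list[i]) == 0) return true;
  }
  return false;
}

/* Models 'api_* & ! :obsolete': matches the glob AND is not listed. */
bool keep_symbol(const char *name) {
  assert(name != NULL);
  const bool matches_glob = fnmatch("api_*", name, 0) == 0;
  const size_t n = sizeof(obsolete) / sizeof(obsolete[0]);
  return matches_glob && !in_list(name, obsolete, n);
}
```

With these assumed inputs, `keep_symbol("api_open")` is true, while `keep_symbol("api_old_init")` and `keep_symbol("internal_helper")` are both false.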

### Working Example - filtering the ABI

Let's say that `struct N` is supposed to be an opaque type that user code only gets pointers to and, additionally, the function `count` should be excluded from the ABI (perhaps due to an argument over its return type).

We can exclude the definition of `struct N`, along with that of any other types defined in `tree.c`, using a file filter. The symbol can be excluded by name.

Run this:

```shell
stg --elf tree.o --files '*.h' --symbols '!count' --output -
```

The result should be something like this:

```proto
version: 0x00000002
root_id: 0x84ea5130
pointer_reference {
  id: 0x26944aa7
  kind: POINTER
  pointee_type_id: 0xb011cc02
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
struct_union {
  id: 0xb011cc02
  kind: STRUCT
  name: "N"
}
function {
  id: 0x9425f186
  return_type_id: 0x6720d32f
  parameter_id: 0x26944aa7
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x9425f186
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0x4fdeca38
}
```
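
The opaque-type arrangement that this filtering assumes can be sketched in C as follows. The split into a public header and a private source file is shown with comments in a single listing so that it compiles on its own. The header name `tree.h` and the `sum_demo` function are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* --- what a hypothetical public header, tree.h, could contain --- */
struct N;                  /* forward declaration only: the layout stays private */
int sum(struct N * tree);  /* users pass pointers around and never look inside */

/* --- what stays in tree.c --- */
struct N {
  struct N * left;
  struct N * right;
  int value;
};

int sum(struct N * tree) {
  return tree ? sum(tree->left) + sum(tree->right) + tree->value : 0;
}

/* Hypothetical self-check: a three-node tree holding 1, 2 and 3. */
int sum_demo(void) {
  struct N a = {NULL, NULL, 1};
  struct N b = {NULL, NULL, 2};
  struct N root = {&a, &b, 3};
  assert(sum(NULL) == 0);  /* an empty tree sums to zero */
  return sum(&root);
}
```

Because the full definition of `struct N` lives in `tree.c` rather than in a file matching `*.h`, the file filter leaves only the bare `struct N` node in the representation, as in the filtered output.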

## ABI Comparison

`stgdiff` is the tool for comparing ABI representations and reporting differences, though it has some other, more specialised, uses. The simplest invocation of `stgdiff` looks something like this:

```shell
stgdiff --stg old/library.stg new/library.stg --output -
```

This will report ABI differences in the default (`small`) format.

### Working Example - ABI differences - small format

The function `sum` has a type that depends on `struct N`. Any change to either might affect the ABI exposed via `sum`. For example, if the type of the `value` member is changed to `short` and the file is recompiled, STG can detect this difference.

First rerun the STG extraction, specifying `--output tree-old.stg`. Make the source code change, recompile and extract the ABI with `--output tree-new.stg`.

Then run this:

```shell
stgdiff --stg tree-old.stg tree-new.stg --output -
```

To get this:

```text
type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'
```
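
It may help to see why this counts as an ABI difference even when the struct's overall layout looks the same. The sketch below places the old and new definitions side by side under the hypothetical names `struct N_old` and `struct N_new` (renamed only so that both fit in one file) and compares them. The helper functions are likewise illustrative, and the results depend on the platform's sizes and alignment rules.

```c
#include <assert.h>
#include <stddef.h>

/* Layout before the change, as in the original tree.c. */
struct N_old {
  struct N_old * left;
  struct N_old * right;
  int value;
};

/* Layout after changing the type of value to short. */
struct N_new {
  struct N_new * left;
  struct N_new * right;
  short value;
};

/* Returns 1 if value moved or the overall struct size changed. */
int layout_changed(void) {
  return offsetof(struct N_old, value) != offsetof(struct N_new, value)
      || sizeof(struct N_old) != sizeof(struct N_new);
}

/* Returns 1 if the new member occupies fewer bytes than the old one. */
int value_is_narrower(void) {
  return sizeof(((struct N_new *)0)->value) < sizeof(((struct N_old *)0)->value);
}

/* Hypothetical check: value keeps its offset in both versions. */
void check_value_change(void) {
  assert(offsetof(struct N_old, value) == offsetof(struct N_new, value));
}
```

On common targets, trailing padding keeps the offset of `value` and the size of the struct unchanged, so `layout_changed()` returns 0 while `value_is_narrower()` returns 1. Code compiled against the old definition would still read and write `value` as an `int`, which is why the member type change is worth reporting.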

The `small` format omits parts of the ABI graph which haven't changed.[^1] To see all impacted nodes, use `--format flat` instead.

[^1]: The similarly named `short` format goes a bit further and will omit and summarise certain repetitive differences.

### Working Example - ABI differences - flat format

```text
function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed

type 'struct N' changed
  member 'struct N* left' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member 'struct N* right' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'
```
And if you really want to see more of the graph structure, use `--format plain`.

### Working Example - ABI differences - plain format

```text
function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed
        member 'struct N* left' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member 'struct N* right' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member changed from 'int value' to 'short int value'
          type changed from 'int' to 'short int'
```
Or just use `--format viz` which generates input for [Graphviz](https://graphviz.org/).

### Working Example - ABI differences - viz format

```dot
digraph "ABI diff" {
  "0" [shape=rectangle, label="'interface'"]
  "1" [label="'int sum(struct N*)'"]
  "2" [label="'int(struct N*)'"]
  "3" [label="'struct N*'"]
  "4" [shape=rectangle, label="'struct N'"]
  "5" [label="'struct N* left'"]
  "5" -> "3" [label=""]
  "4" -> "5" [label=""]
  "6" [label="'struct N* right'"]
  "6" -> "3" [label=""]
  "4" -> "6" [label=""]
  "7" [label="'int value' → 'short int value'"]
  "8" [color=red, label="'int' → 'short int'"]
  "7" -> "8" [label=""]
  "4" -> "7" [label=""]
  "3" -> "4" [label="pointed-to"]
  "2" -> "3" [label="parameter 1"]
  "1" -> "2" [label=""]
  "0" -> "1" [label=""]
}
```