# STG STG stands for Symbol-Type Graph. # Overview STG models Application Binary Interfaces. It supports extraction of ABIs from DWARF and ingestion of BTF and libabigail XML into its model. Its primary purpose is monitoring an ABI for changes over time and reporting such changes in a comprehensible fashion. STG captures symbol information, the size and layout of structs, function argument and return types and much more, in a graph representation. Difference reporting happens via a graph comparison. Currently, STG functionality is exposed as two command-line tools, `stg` (for ABI extraction) and `stgdiff` (for ABI comparison), and a native file format. ## Model STG's model is an *abstraction* which does not and cannot capture every possible interface property, invariant or behaviour. Conversely, the model includes distinctions which are API significant but not ABI significant. Concretely, STG's model is a rooted, connected, directed graph where each kind of node corresponds to a meaningful ABI entity such as a symbol, function type or struct member. Nodes have specific attributes, such as name or size. Outgoing edges specify things like return type. STG's model does not impose any constraints on which nodes may be joined by edges. Each node has an identity. However, for the purpose of comparison, nodes are considered equal if they are of the same kind, have the same attributes and matching outgoing edges and all nodes reachable via a pair of matching edges are (recursively) equal. Renumbering nodes, (de)duplicating nodes and adding/removing unreachable nodes do not affect this relationship. ### Symbols As modelled by STG, symbols correspond closely to ELF symbols as seen in `.dynsym` for shared object files or in `.symtab` for object files. In the case of the Linux kernel, the `.symtab` is enriched with metadata and the effective "ksymtab" is actually a subset of the ELF symbols together with CRC and namespace information. STG links symbols to their source-level types where these are known. Symbols defined purely in assembly language will not have type information. The symbol table is contained in the root node of the graph, which is an *Interface* node. ### Types STG models the C, C++ and (to a limited extent) Rust type systems. For example, C++ template value parameters are poorly modelled for the simple reason that this would require modelling C++ *values* as well as types, something that DWARF itself doesn't do to the full extent permitted by C++20. As type definitions are in general mutually recursive, an STG ABI is in general a cyclic graph. The root node of the graph can also contain a list of interface types, which may not necessarily be reachable from the interface symbols. ## Supported Input Formats, Parsers and Limitations STG can read its own native format for processing or comparison. It can also process libabigail XML and BTF (`.BTF` ELF sections), with some limitations due to model, design and implementation differences including missing features. ### Kinds of Node STG has the following kinds of node. * **Special** - used for `void` and `...` * **Pointer / Reference** - `*`, `&` and `&&` * **Pointer to Member** - `foo::*` * **Typedef** - `typedef` and `using ... = ...` * **Qualified** - `const` and friends * **Primitive** - concrete types such as `int` and friends * **Array** - `foo[N]` - there is no distinction between zero and indeterminate length in the model * **Base Class** - inheritance metadata * **Method** - (only) virtual function * **Member** - data member * **Variant Member** - discriminated member * **Struct / Union** - `struct foo` etc., Rust tuples too * **Enumeration** - including the underlying value type - only values that are within the range of signed 64-bit integer are correctly modelled * **Variant** - for Rust enums holding data * **Function** - multiple argument, single return type * **ELF Symbol** - name, version, ELF metadata, Linux kernel metadata * **Interface** - top-level collection of symbols and types An STG ABI consists of a rooted, connected graph of such nodes, and *nothing else*. STG is blind to anything that cannot be represented by its model. ### Native Format STG's native file format is a protocol buffer text format. It is suitable for revision control, rather than human consumption. It is effectively described by [`stg.proto`](../stg.proto). In this textual serialisation of ABI graphs, external node identifiers and node order are chosen to minimise file changes when a small subset of the graph changes. As an example, this is the definition of the **Typedef** node kind: ```proto message Typedef { fixed32 id = 1; string name = 2; fixed32 referred_type_id = 3; } ``` ### Abigail (a.k.a. libabigail XML) [libabigail](https://sourceware.org/libabigail/) is another project for ABI monitoring. It uses a format that can be parsed as XML. This command will transform Abigail into STG: ```shell stg --abi library.xml --output library.stg ``` The main features modelled in Abigail but not STG are: * source file, line and column information * C++ access specifiers (public, protected, private) The Abigail reader has these distinct phases of operation: 1. text parsed into an XML tree 2. XML cleaning - whitespace and unused attributes are stripped 3. XML tidying - issues like duplicate nodes are resolved, if possible 4. XML parsed into a graph with symbol information held separately 5. symbols and root node added to the graph 6. useless type qualifiers are stripped in post-processing ### BTF [BTF](https://docs.kernel.org/bpf/btf.html) is typically used for the Linux kernel where it is generated by `pahole -J` from ELF and DWARF information. It can also be generated natively instead of DWARF using `gcc -gbtf` and by Clang, but only for eBPF targets. This command will transform BTF into STG: ```shell stg --btf vmlinux --output vmlinux.stg ``` STG has primarily been tested against the `pahole` (libbtf) dialect of BTF and support is not complete. * split BTF is not supported at all * any `.BTF.ext` section is just ignored * some kinds of BTF node are not handled: * `BTF_KIND_DATASEC` - skip * `BTF_KIND_DECL_TAG` - abort * `BTF_KIND_TYPE_TAG` - abort The BTF reader has these distinct phases of operation: 1. file is opened as ELF and `.BTF` section data found 2. BTF header processed 3. BTF nodes parsed into a graph with symbol information held separately 4. symbols and root node added to the graph ### DWARF The ELF / DWARF reader operates similarly to the other readers at a high level, but much more work has to be done to turn ELF symbols and DWARF DIEs into STG nodes. 1. the ELF file is checked for DWARF - missing DWARF results in a warning 2. the ELF symbols are read (from `.dynsym` in the case of shared object file) 3. the DWARF information is parsed into a partial STG graph 4. the ELF and DWARF information are stitched together, adding symbols and a root node to the graph 5. useless type qualifiers are stripped in post-processing ## Output preprocessing Before `stg` outputs a serialised graph, it performs: 1. a type normalisation step that unifies overlapping type definitions 2. a final deduplication step to eliminate other redundant nodes