1# C Type Names 2 3This is implementated in `naming.{h,cc}`. 4 5STG does not contain full type names for every type node in the graph. In order 6to meaningfully describe type changes, STG needs to be able to render C and C++ 7type names back into source-level syntax. 8 9## Implementation 10 11The STG type `Name` is the basic entity representing the human-friendly name of 12a graph node. Values of this type are computed recursively by `Describe` and are 13memoised in a `NameCache`. 14 15`Name` copes with the inside-out C type syntax in a fairly uniform fashion. 16There is some slightly special code for `Qualifier` decoration. 17 18Infinite recursion prevention in `Describe` is done in the most straightforward 19fashion possible. One consequence of this is that if there is a naming cycle, 20the memoised names for nodes in the cycle will depend on where it was entered. 21 22The rest of this document covers the concepts and design principles. 23 24## Background 25 26In sensible operator grammars, composition can be done using precedence levels. 27 28Example with binary operators (there are minor adjustments needed if operators 29have left or right associativity): 30 31**op** | **precedence** 32------ | -------------- 33`+` | 0 34`*` | 1 35*num* | 2 36 37```haskell 38data Expr = Number Int | Times Expr Expr | Plus Expr Expr 39 40show_paren p x = if p then "(" ++ x ++ ")" else x 41 42shows_prec p (Number n) = show n 43shows_prec p (Times e1 e2) = show_paren (p > 1) $ shows_prec 2 e1 ++ "*" ++ shows_prec 2 e2 44shows_prec p (Plus e1 e2) = show_paren (p > 0) $ shows_prec 1 e1 ++ "+" ++ shows_prec 1 e2 45``` 46 47The central idea is that expressions are rendered in the context of a precedence 48level. Parentheses are needed if the context precedence is higher than the 49expression's own precedence. Atomic values can be viewed as expressions having 50maximal precedence. The default precedence context for printing an expression is 51the minimal one; no parentheses will be emitted. 52 53## The more-than-slightly-bonkers C declaration syntax 54 55C's type syntax is closely related to the inside-out declaration syntax it uses 56and has the same precedence rules. A simplified, partial precedence table for 57types might look like this. 58 59**thing** | **precedence** 60------------ | -------------- 61`int` | 0 62`refer *` | 1 63`elt[N]` | 2 64`ret(args)` | 2 65`identifier` | 3 66 67The basic (lowest precedence) elements are: 68 69* primitive types, possibly CV-qualified 70* typedefs, possibly CVR-qualified 71* struct, union and enum types, possibly CV-qualified 72 73The "operators" in increasing precedence level order are: 74 75* pointer-to, possibly CVR-qualified 76* function (return type) and array (element type) 77 78The atomic (highest precedence) elements are: 79 80* variable names 81* function names 82 83### CVR-qualifiers 84 85The qualifiers `const`, `volatile` and `restrict` appear to the right of the 86pointer-to operator `*` and are idiomatically placed to the left of the basic 87elements.[^1] They can be considered as transparent to precedence. 88 89[^1]: They can be idiomatically placed to the right, but that's a different 90 idiom. 91 92### User-defined types 93 94Struct, union and enum types can be named or anonymous. The normal case is a 95named type. Anonymous types are given structural descriptions. 96 97### Pointer, array and function types 98 99To declare `x` as a pointer to type `t`, we declare the dereferencing of `x` as 100`t`. 101 102```c++ 103t * x; 104``` 105 106To declare `x` as an array of type `t` and size `n`, we declare the `n`th 107element of `x` as `t`. 108 109```c++ 110t x[n]; 111``` 112 113To declare `x` as a function returning type `t` and taking args `n`, we declare 114the result of applying `x` to `n` as `t`. 115 116```c++ 117t x(n); 118``` 119 120The context precedence level for rendering `x`, which may be a complex thing in 121its own right, is 2 in the case of arrays and functions and 1 in the case of 122pointers. 123 124We need to do things inside-out now, because the outer type is a leaf of the 125type expression tree. Instead we say `x` has a precedence and the leaf type 126wraps around it, using parentheses if `x`'s precedence is less than the leaf 127type (which typically happens if `x` is a pointer type). 128 129In each of these cases `x` has been replaced by another type that mentions `y`. 130 131```c++ 132t * * y; // y is a pointer to pointer to t 133t * y[m]; // y is an array of pointer to t 134t * y(m); // y is a function returning pointer to t 135t (* y)[n]; // y is a pointer to an array of t 136t y[m][n]; // y is an array of arrays of t 137t y(m)[n]; // if a function could return an array, this is what it would look like 138t (* y)(n); // y is a pointer to a function returning t 139t y[m](n); // if an array could contain functions, this is what it would look like 140t y(m)(n); // if a function could return a function, this is what it would look like 141``` 142 143## Concise and Efficient Pretty Printer (sketch) 144 145This builds a type recursively, so that outside bits will be rendered first. The 146recursion needs to keep track of a left piece, a right piece and the precedence 147level of the hole in the middle. 148 149```haskell 150data LR = L | R deriving Eq 151 152data Type = Basic String | Ptr Type | Function Type [Type] | Array Type Int | Decl String Type 153 154render_final expr = ll ++ rr where 155 (ll, _, rr) = render expr 156 157render (Basic name) = (name, 0, "") 158render (Ptr ref) = add L 1 "*" (render ref) 159render (Function ret args) = add R 2 ("(" ++ intercalate ", " (map final_render args) ++ ")") (render ret) 160render (Array elt size) = add R 2 ("[" ++ show size ++ "]") (render elt) 161render (Decl name t) = add L 3 name (render t) 162 163add side prec text (l, p, r) = 164 case side of 165 L -> (ll ++ text, prec, rr) 166 R -> (ll, prec, text ++ rr) 167 where 168 paren = prec < p 169 ll = if paren then l ++ "(" else if side == L then l ++ " " else l 170 rr = if paren then ")" ++ r else r 171``` 172 173To finally print something, just print the left and right pieces in sequence. 174 175The cunning bit about structuring the pretty printer this way is that `render` 176can be memoised. With the expectation that any given type will appear many times 177in typical output, this is a big win. 178 179NOTE: C type qualifiers are a significant extra complication and are omitted 180from this sketch. They must appear to the left or right of (the left part of) a 181type name. Which side can be determined by the current precendence. 182 183NOTE: Whitespace can be emitted sparingly as specific side / precedence contexts 184can imply the impossibility of inadvertently joining two words. 185