xref: /aosp_15_r20/external/stg/doc/naming.md (revision 9e3b08ae94a55201065475453d799e8b1378bea6)
1# C Type Names
2
3This is implementated in `naming.{h,cc}`.
4
5STG does not contain full type names for every type node in the graph. In order
6to meaningfully describe type changes, STG needs to be able to render C and C++
7type names back into source-level syntax.
8
9## Implementation
10
11The STG type `Name` is the basic entity representing the human-friendly name of
12a graph node. Values of this type are computed recursively by `Describe` and are
13memoised in a `NameCache`.
14
15`Name` copes with the inside-out C type syntax in a fairly uniform fashion.
16There is some slightly special code for `Qualifier` decoration.
17
18Infinite recursion prevention in `Describe` is done in the most straightforward
19fashion possible. One consequence of this is that if there is a naming cycle,
20the memoised names for nodes in the cycle will depend on where it was entered.
21
22The rest of this document covers the concepts and design principles.
23
24## Background
25
26In sensible operator grammars, composition can be done using precedence levels.
27
28Example with binary operators (there are minor adjustments needed if operators
29have left or right associativity):
30
31**op** | **precedence**
32------ | --------------
33`+`    | 0
34`*`    | 1
35*num*  | 2
36
37```haskell
38data Expr = Number Int | Times Expr Expr | Plus Expr Expr
39
40show_paren p x = if p then "(" ++ x ++ ")" else x
41
42shows_prec p (Number n) = show n
43shows_prec p (Times e1 e2) = show_paren (p > 1) $ shows_prec 2 e1 ++ "*" ++ shows_prec 2 e2
44shows_prec p (Plus e1 e2) = show_paren (p > 0) $ shows_prec 1 e1 ++ "+" ++ shows_prec 1 e2
45```
46
47The central idea is that expressions are rendered in the context of a precedence
48level. Parentheses are needed if the context precedence is higher than the
49expression's own precedence. Atomic values can be viewed as expressions having
50maximal precedence. The default precedence context for printing an expression is
51the minimal one; no parentheses will be emitted.
52
53## The more-than-slightly-bonkers C declaration syntax
54
55C's type syntax is closely related to the inside-out declaration syntax it uses
56and has the same precedence rules. A simplified, partial precedence table for
57types might look like this.
58
59**thing**    | **precedence**
60------------ | --------------
61`int`        | 0
62`refer *`    | 1
63`elt[N]`     | 2
64`ret(args)`  | 2
65`identifier` | 3
66
67The basic (lowest precedence) elements are:
68
69*   primitive types, possibly CV-qualified
70*   typedefs, possibly CVR-qualified
71*   struct, union and enum types, possibly CV-qualified
72
73The "operators" in increasing precedence level order are:
74
75*   pointer-to, possibly CVR-qualified
76*   function (return type) and array (element type)
77
78The atomic (highest precedence) elements are:
79
80*   variable names
81*   function names
82
83### CVR-qualifiers
84
85The qualifiers `const`, `volatile` and `restrict` appear to the right of the
86pointer-to operator `*` and are idiomatically placed to the left of the basic
87elements.[^1] They can be considered as transparent to precedence.
88
89[^1]: They can be idiomatically placed to the right, but that's a different
90    idiom.
91
92### User-defined types
93
94Struct, union and enum types can be named or anonymous. The normal case is a
95named type. Anonymous types are given structural descriptions.
96
97### Pointer, array and function types
98
99To declare `x` as a pointer to type `t`, we declare the dereferencing of `x` as
100`t`.
101
102```c++
103t * x;
104```
105
106To declare `x` as an array of type `t` and size `n`, we declare the `n`th
107element of `x` as `t`.
108
109```c++
110t x[n];
111```
112
113To declare `x` as a function returning type `t` and taking args `n`, we declare
114the result of applying `x` to `n` as `t`.
115
116```c++
117t x(n);
118```
119
120The context precedence level for rendering `x`, which may be a complex thing in
121its own right, is 2 in the case of arrays and functions and 1 in the case of
122pointers.
123
124We need to do things inside-out now, because the outer type is a leaf of the
125type expression tree. Instead we say `x` has a precedence and the leaf type
126wraps around it, using parentheses if `x`'s precedence is less than the leaf
127type (which typically happens if `x` is a pointer type).
128
129In each of these cases `x` has been replaced by another type that mentions `y`.
130
131```c++
132t * * y;     // y is a pointer to pointer to t
133t * y[m];    // y is an array of pointer to t
134t * y(m);    // y is a function returning pointer to t
135t (* y)[n];  // y is a pointer to an array of t
136t y[m][n];   // y is an array of arrays of t
137t y(m)[n];   // if a function could return an array, this is what it would look like
138t (* y)(n);  // y is a pointer to a function returning t
139t y[m](n);   // if an array could contain functions, this is what it would look like
140t y(m)(n);   // if a function could return a function, this is what it would look like
141```
142
143## Concise and Efficient Pretty Printer (sketch)
144
145This builds a type recursively, so that outside bits will be rendered first. The
146recursion needs to keep track of a left piece, a right piece and the precedence
147level of the hole in the middle.
148
149```haskell
150data LR = L | R deriving Eq
151
152data Type = Basic String | Ptr Type | Function Type [Type] | Array Type Int | Decl String Type
153
154render_final expr = ll ++ rr where
155  (ll, _, rr) = render expr
156
157render (Basic name) = (name, 0, "")
158render (Ptr ref) = add L 1 "*" (render ref)
159render (Function ret args) = add R 2 ("(" ++ intercalate ", " (map final_render args) ++ ")") (render ret)
160render (Array elt size) = add R 2 ("[" ++ show size ++ "]") (render elt)
161render (Decl name t) = add L 3 name (render t)
162
163add side prec text (l, p, r) =
164  case side of
165    L -> (ll ++ text, prec, rr)
166    R -> (ll, prec, text ++ rr)
167  where
168    paren = prec < p
169    ll = if paren then l ++ "(" else if side == L then l ++ " " else l
170    rr = if paren then ")" ++ r else r
171```
172
173To finally print something, just print the left and right pieces in sequence.
174
175The cunning bit about structuring the pretty printer this way is that `render`
176can be memoised. With the expectation that any given type will appear many times
177in typical output, this is a big win.
178
179NOTE: C type qualifiers are a significant extra complication and are omitted
180from this sketch. They must appear to the left or right of (the left part of) a
181type name. Which side can be determined by the current precendence.
182
183NOTE: Whitespace can be emitted sparingly as specific side / precedence contexts
184can imply the impossibility of inadvertently joining two words.
185