xref: /aosp_15_r20/external/emboss/doc/design_docs/archive/next_keyword.md (revision 99e0aae7469b87d12f0ad23e61142c2d74c1ef70)
1# Design Sketch: Packed Field Notation
2
3This document is provided for historical interest.  This feature is now
4implemented in the form of the `$next` keyword.
5
6
7## Motivation
8
9Many structures have many or most fields laid out consecutively, possibly with
10padding for alignment.  For example:
11
12    struct Simple:
13      0 [+2]  UInt  field_1
14      2 [+4]  UInt  field_2
15      6 [+2]  UInt  field_3
16
17For simple structures of fixed-size fields, the main issue is unchecked
18redundancy: it is relatively easy to enter the wrong value for the field
19offset, and no compiler checks will help.
20
21For more complex structures with multiple variable-sized fields, this can lead
22to unwieldy offsets:
23
24    struct Complex:
25      0     [+2]  UInt    header_length (h)
26      2     [+h]  Header  header
27      2+h   [+2]  UInt    body_length (b)
28      4+h   [+b]  Body    body
29      4+h+b [+4]  UInt    crc
30
31In both cases, there is some benefit to a shorthand notation that says 'this
32field should be placed immediately after the end of the lexically-previous
33field.
34
35
36## Example
37
38(Exact syntax TBD.)
39
40    struct Complex:
41      0     [+2]  UInt    header_length (h)
42      $next [+h]  Header  header
43      $next [+2]  UInt    body_length (b)
44      $next [+b]  Body    body
45      $next [+4]  UInt    crc
46
47It is tempting to use some more specialized, terser syntax, like:
48
49    struct Complex:
50      0  [+2]  UInt    header_length (h)
51      ^^ [+h]  Header  header
52      ^^ [+2]  UInt    body_length (b)
53      ^^ [+b]  Body    body
54      ^^ [+4]  UInt    crc
55
56However, an explicit symbol has the advantage that you can use it in
57expressions, if needed:
58
59    struct ComplexWithGap:
60      0       [+2]  UInt    header_length (h)
61      $next   [+h]  Header  header
62      $next   [+2]  UInt    body_length (b)
63      # 2-byte reserved gap.
64      $next+2 [+b]  Body    body
65      $next   [+4]  UInt    crc
66
67Or, with a (not-yet-implemented) `$align()` function:
68
69    struct ComplexWithAlignment:
70      0                [+2]  UInt    header_length (h)
71      $align($next, 4) [+h]  Header  header
72      $align($next, 4) [+2]  UInt    body_length (b)
73      $align($next, 4) [+b]  Body    body
74      $align($next, 4) [+4]  UInt    crc
75
76
77## Implementation
78
79Assuming the "new symbol" approach:
80
811.  Pick a new symbol name -- preferably, come up with a few alternatives and
82    do a quick survey.  By convention, Emboss built-in symbols start with `$`.
832.  Add the new name to `LITERAL_TOKEN_PATTERNS` in
84    `compiler/front_end/tokenizer.py`.
853.  Add a new production for `builtin-word -> "$new_symbol"` to the `_word()`
86    function in `module_ir.py`.
874.  Add a new compiler pass before `synthetics.synthesize_fields`, to replace
88    the new symbol with the expanded representation.  This should be relatively
89    straightforward -- something that uses `fast_traverse_ir_top_down()` to
90    find all `ir_data.Structure` elements in the IR, then iterates over the
91    field offsets within each structure, and recursively replaces any
92    `ir_data.Expression`s with a
93    `builtin_reference.canonical_name.object_path[0]` equal to
94    `"$new_symbol"`.  It would probably be useful to make
95    `traverse_ir._fast_traverse_proto_top_down()` into a public function, so
96    that you do not have to re-write `Expression` traversal.
97
98For this change, the back end should not need any modifications.
99