# Design Sketch: Packed Field Notation

This document is provided for historical interest. This feature is now
implemented in the form of the `$next` keyword.


## Motivation

In many structures, most or all fields are laid out consecutively, possibly
with padding for alignment. For example:

    struct Simple:
      0 [+2]  UInt  field_1
      2 [+4]  UInt  field_2
      6 [+2]  UInt  field_3

For simple structures of fixed-size fields, the main issue is unchecked
redundancy: it is relatively easy to enter the wrong value for the field
offset, and no compiler checks will help.

For more complex structures with multiple variable-sized fields, explicit
offsets become unwieldy:

    struct Complex:
      0     [+2]  UInt    header_length (h)
      2     [+h]  Header  header
      2+h   [+2]  UInt    body_length (b)
      4+h   [+b]  Body    body
      4+h+b [+4]  UInt    crc

In both cases, there is some benefit to a shorthand notation that says "this
field should be placed immediately after the end of the lexically-previous
field."


## Example

(Exact syntax TBD.)

    struct Complex:
      0     [+2]  UInt    header_length (h)
      $next [+h]  Header  header
      $next [+2]  UInt    body_length (b)
      $next [+b]  Body    body
      $next [+4]  UInt    crc

It is tempting to use some more specialized, terser syntax, like:

    struct Complex:
      0  [+2]  UInt    header_length (h)
      ^^ [+h]  Header  header
      ^^ [+2]  UInt    body_length (b)
      ^^ [+b]  Body    body
      ^^ [+4]  UInt    crc

However, an explicit symbol has the advantage that you can use it in
expressions, if needed:

    struct ComplexWithGap:
      0       [+2]  UInt    header_length (h)
      $next   [+h]  Header  header
      $next   [+2]  UInt    body_length (b)
      # 2-byte reserved gap.
      $next+2 [+b]  Body    body
      $next   [+4]  UInt    crc

Or, with a (not-yet-implemented) `$align()` function:

    struct ComplexWithAlignment:
      0                [+2]  UInt    header_length (h)
      $align($next, 4) [+h]  Header  header
      $align($next, 4) [+2]  UInt    body_length (b)
      $align($next, 4) [+b]  Body    body
      $align($next, 4) [+4]  UInt    crc


## Implementation

Assuming the "new symbol" approach:

1.  Pick a new symbol name -- preferably, come up with a few alternatives and
    do a quick survey. By convention, Emboss built-in symbols start with `$`.
2.  Add the new name to `LITERAL_TOKEN_PATTERNS` in
    `compiler/front_end/tokenizer.py`.
3.  Add a new production for `builtin-word -> "$new_symbol"` to the `_word()`
    function in `module_ir.py`.
4.  Add a new compiler pass before `synthetics.synthesize_fields`, to replace
    the new symbol with the expanded representation. This should be relatively
    straightforward -- something that uses `fast_traverse_ir_top_down()` to
    find all `ir_data.Structure` elements in the IR, then iterates over the
    field offsets within each structure, and recursively replaces any
    `ir_data.Expression`s with a
    `builtin_reference.canonical_name.object_path[0]` equal to
    `"$new_symbol"`. It would probably be useful to make
    `traverse_ir._fast_traverse_proto_top_down()` into a public function, so
    that you do not have to re-write `Expression` traversal.

For this change, the back end should not need any modifications.
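
The replacement pass described in step 4 can be illustrated with a toy model. The sketch below is not Emboss compiler code and does not use the real `ir_data` or `traverse_ir` APIs: `Field`, `substitute()`, `expand_next()`, and `evaluate()` are hypothetical names, and expressions are modeled as plain Python values (an `int`, a symbol name, or a `("+", left, right)` tuple). It only shows the idea of walking the fields in lexical order and recursively replacing the placeholder symbol with "previous offset + previous size". Raising an error when the first field uses the symbol is an assumption of this sketch.

```python
from dataclasses import dataclass

NEXT = "$next"  # placeholder symbol that the pass expands away


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a struct field; not an Emboss IR class."""
    name: str
    offset: object  # int, symbol name, or ("+", left, right)
    size: object    # same expression forms as offset


def substitute(expr, replacement):
    """Recursively replace every `$next` leaf in expr with replacement."""
    if expr == NEXT:
        if replacement is None:
            # Assumption of this sketch: the first field may not use $next.
            raise ValueError("$next used with no previous field")
        return replacement
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *(substitute(arg, replacement) for arg in args))
    return expr  # constants and other symbols are left untouched


def expand_next(fields):
    """Return new fields whose offsets no longer mention `$next`."""
    expanded = []
    previous_end = None  # end of the lexically-previous field, if any
    for field in fields:
        offset = substitute(field.offset, previous_end)
        expanded.append(Field(field.name, offset, field.size))
        previous_end = ("+", offset, field.size)
    return expanded


def evaluate(expr, env):
    """Evaluate an expanded expression, looking symbols up in env."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    if op != "+":
        raise ValueError(f"unsupported operator: {op}")
    return evaluate(left, env) + evaluate(right, env)


# Toy encoding of ComplexWithGap from the Example section.
COMPLEX_WITH_GAP = [
    Field("header_length", 0, 2),
    Field("header", NEXT, "h"),
    Field("body_length", NEXT, 2),
    Field("body", ("+", NEXT, 2), "b"),  # 2-byte reserved gap
    Field("crc", NEXT, 4),
]

# With h = 10 and b = 20 the offsets come out as 0, 2, 12, 16, 36.
for field in expand_next(COMPLEX_WITH_GAP):
    print(field.name, evaluate(field.offset, {"h": 10, "b": 20}))
```

Because the expansion happens before any later pass sees the fields, everything downstream only ever deals with ordinary offset expressions, which is consistent with the note that the back end should not need changes.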