ISASPEC - XML Based ISA Specification
=====================================

isaspec provides a mechanism to describe an instruction set in XML, and
to generate a disassembler and assembler from that description. The
intention is to describe the instruction set more formally than a
hand-coded assembler and disassembler would, and to better decouple the
shader compiler from the underlying instruction encoding, which simplifies
dealing with instruction encoding differences between generations of GPU.

Benefits of a formal ISA description, compared to hand-coded assemblers
and disassemblers, include easier detection of new bit combinations that
were not seen in previous generations, thanks to a more rigorous
description of bits that are expected to be '0', '1', or 'x' (dontcare),
and verification that different encodings don't have conflicting bits
(i.e. that the specification cannot result in more than one valid
interpretation of any bit pattern).

The isaspec tool and XML schema are intended to be generic (not specific
to ir3), although there are currently a couple of limitations due to
shortcuts taken to get things up and running (which are mostly not
inherent to the XML schema, and should not be too difficult to remove from
the Python code and the decode/disasm utility):

* Maximum "field" size is 64b
* Fixed instruction size

Often, especially when new functionality is added in later generations
while retaining (or at least mostly retaining) backwards compatibility
with encodings used in earlier generations, the actual encoding can be
rather messy to describe. To support this, isaspec provides several
flexible mechanisms, such as conditional overrides and derived fields.
This not only allows for describing an irregular instruction encoding, but
also allows matching an existing disasm syntax (which might not have been
designed around the idea of disassembly based on a formal ISA description).

Bitsets
-------

The fundamental concept of matching a bit pattern to an instruction
decoding/encoding is the concept of a hierarchical tree of bitsets.
This is intended to match how the HW decodes instructions, where certain
bits describe the instruction (and sub-encoding, and so on), and other
bits describe various operands to the instruction.

Bitsets can also be used recursively as the type of a field described
in another bitset.

The leaves of the tree of instruction bitsets represent every possible
instruction. Deciding which instruction a bit pattern represents amounts to:

.. code-block:: c

   /* ignore dontcare bits and bits outside this bitset's mask */
   m = (val & bitsets[n]->mask) & ~bitsets[n]->dontcare;

   if (m == bitsets[n]->match) {
      /* we've found the instruction description */
   }

For example, the starting point to decode an ir3 instruction is a 64b
bitset:

.. code-block:: xml

   <bitset name="#instruction" size="64">
       <doc>
           Encoding of an ir3 instruction. All instructions are 64b.
       </doc>
   </bitset>

At the first level of the instruction encoding hierarchy, the high three
bits group instructions into "categories":

.. code-block:: xml

   <bitset name="#instruction-cat2" extends="#instruction">
       <field name="DST" low="32" high="39" type="#reg-gpr"/>
       <field name="REPEAT" low="40" high="41" type="#rptN"/>
       <field name="SAT" pos="42" type="bool" display="(sat)"/>
       <field name="SS" pos="44" type="bool" display="(ss)"/>
       <field name="UL" pos="45" type="bool" display="(ul)"/>
       <field name="DST_CONV" pos="46" type="bool">
           <doc>
               Destination register is opposite precision as source, i.e.
               if {FULL} is true then destination is half precision, and
               vice versa.
           </doc>
       </field>
       <derived name="DST_HALF" expr="#dest-half" type="bool" display="h"/>
       <field name="EI" pos="47" type="bool" display="(ei)"/>
       <field name="FULL" pos="52" type="bool">
           <doc>Full precision source registers</doc>
       </field>
       <field name="JP" pos="59" type="bool" display="(jp)"/>
       <field name="SY" pos="60" type="bool" display="(sy)"/>
       <pattern low="61" high="63">010</pattern>  <!-- cat2 -->
       <!--
           NOTE, both SRC1_R and SRC2_R are defined at this level because
           SRC2_R is still a valid bit for (nopN) (REPEAT==0) for cat2
           instructions with only a single src
       -->
       <field name="SRC1_R" pos="43" type="bool" display="(r)"/>
       <field name="SRC2_R" pos="51" type="bool" display="(r)"/>
       <derived name="ZERO" expr="#zero" type="bool" display=""/>
   </bitset>

The ``<pattern>`` elements are the part(s) that determine which leaf-node
bitset matches against a given bit pattern. The leaf node's match/mask/
dontcare bitmasks are a combination of those defined at the leaf node and,
recursively, at each parent bitset.

For example, cat2 instructions (ALU instructions with up to two src
registers) can have either one or two source registers:

.. code-block:: xml

   <bitset name="#instruction-cat2-1src" extends="#instruction-cat2">
       <override expr="#cat2-cat3-nop-encoding">
           <display>
               {SY}{SS}{JP}{SAT}(nop{NOP}) {UL}{NAME} {EI}{DST_HALF}{DST}, {SRC1}
           </display>
           <derived name="NOP" expr="#cat2-cat3-nop-value" type="uint"/>
           <field name="SRC1" low="0" high="15" type="#multisrc">
               <param name="ZERO" as="SRC_R"/>
               <param name="FULL"/>
           </field>
       </override>
       <display>
           {SY}{SS}{JP}{SAT}{REPEAT}{UL}{NAME} {EI}{DST_HALF}{DST}, {SRC1}
       </display>
       <pattern low="16" high="31">xxxxxxxxxxxxxxxx</pattern>
       <pattern low="48" high="50">xxx</pattern>  <!-- COND -->
       <field name="SRC1" low="0" high="15" type="#multisrc">
           <param name="SRC1_R" as="SRC_R"/>
           <param name="FULL"/>
       </field>
   </bitset>

   <bitset name="absneg.f" extends="#instruction-cat2-1src">
       <pattern low="53" high="58">000110</pattern>
   </bitset>

In this example, ``absneg.f`` is a concrete cat2 instruction (leaf node of
the bitset inheritance tree) which has a single src register. At the
``#instruction-cat2-1src`` level, bits that are used for the 2nd src arg
and condition code (for cat2 instructions which use a condition code) are
defined as 'x' (dontcare), which matches our understanding of the hardware
(but also lets the disassembler flag cases where '1' bits show up in places
we don't expect, which may signal a new instruction (sub)encoding).

You'll notice that ``SRC1`` refers back to a different bitset hierarchy
that describes the various src register encodings (used for cat2 and
cat4 instructions), i.e. GPR vs CONST vs relative GPR/CONST. For fields
which have bitset types, parameters can be "passed" in via ``<param>``
elements, which can be referred to by the display template string, and/or
expressions.
For example, this helps to deal with cases where other fields
outside of that bitset control the encoding/decoding, such as in the
``#multisrc`` example:

.. code-block:: xml

   <bitset name="#multisrc" size="16">
       <doc>
           Encoding for instruction source which can be GPR/CONST/IMMED
           or relative GPR/CONST.
       </doc>
   </bitset>

   ...

   <bitset name="#multisrc-gpr" extends="#multisrc">
       <display>
           {ABSNEG}{SRC_R}{HALF}{SRC}
       </display>
       <derived name="HALF" expr="#multisrc-half" type="bool" display="h"/>
       <field name="SRC" low="0" high="7" type="#reg-gpr"/>
       <pattern low="8" high="13">000000</pattern>
       <field name="ABSNEG" low="14" high="15" type="#absneg"/>
   </bitset>

At some level in the bitset inheritance hierarchy, there is expected to be a
``<display>`` element specifying a template string used during bitset
decoding. The display template consists of references to fields (which may
be derived fields) specified as ``{FIELDNAME}`` and other characters
which are just echoed through to the resulting decoded bitset.

The special field reference ``{NAME}`` prints the name of the bitset. This is
often useful when the ``<display>`` element is at a higher level than the
leaves of the hierarchy, for example a whole class of similar instructions that
only differ in opcode.

Sometimes there may be multiple variants of an instruction that must be
different bitsets, for example because they are so different that they must
derive from different bitsets, but they have the same name. Because bitset
names must be unique in the encoder, this can be a problem, but it can be
worked around with the ``displayname`` attribute on the ``bitset``, which
changes how ``{NAME}`` is displayed but not the name used in the encoder.
``displayname`` is only useful for leaf bitsets.
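
As a rough illustration of how a display template could be expanded, the
following C sketch echoes literal characters through and replaces each
``{FIELDNAME}`` reference with text returned by a lookup callback. The names
``expand_display``, ``lookup_fn`` and ``demo_lookup`` and the callback
interface are hypothetical and not part of isaspec's generated decoder, and
the register names returned by ``demo_lookup`` are made-up sample values.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical callback: returns the decoded text for a field name,
    * or NULL if the field is unknown. */
   typedef const char *(*lookup_fn)(const char *field, void *data);

   /* Expand tmpl into out (outsz bytes, always NUL-terminated on success).
    * Returns 0 on success, -1 on a malformed template, an unknown field,
    * or insufficient space. */
   static int
   expand_display(const char *tmpl, lookup_fn lookup, void *data,
                  char *out, size_t outsz)
   {
      size_t n = 0;

      if (outsz == 0)
         return -1;
      out[0] = '\0';

      while (*tmpl) {
         if (*tmpl != '{') {
            /* literal character: echo it through */
            if (n + 1 >= outsz)
               return -1;
            out[n++] = *tmpl++;
            out[n] = '\0';
            continue;
         }

         /* field reference: extract the name between the braces */
         const char *end = strchr(tmpl, '}');
         char name[64];
         if (!end)
            return -1;
         size_t len = (size_t)(end - tmpl - 1);
         if (len == 0 || len >= sizeof(name))
            return -1;
         memcpy(name, tmpl + 1, len);
         name[len] = '\0';

         /* append the decoded field text */
         const char *val = lookup(name, data);
         if (!val)
            return -1;
         size_t vlen = strlen(val);
         if (n + vlen >= outsz)
            return -1;
         memcpy(out + n, val, vlen + 1);
         n += vlen;
         tmpl = end + 1;
      }
      return 0;
   }

   /* Hypothetical decoded state: (sy) set, all other flags clear. */
   static const char *
   demo_lookup(const char *field, void *data)
   {
      (void)data;
      if (!strcmp(field, "SY"))   return "(sy)";
      if (!strcmp(field, "NAME")) return "absneg.f";
      if (!strcmp(field, "DST"))  return "r0.x";
      if (!strcmp(field, "SRC1")) return "r1.y";
      return "";   /* flags that are not set print nothing */
   }

With ``demo_lookup``, expanding the default ``#instruction-cat2-1src``
template yields ``(sy)absneg.f r0.x, r1.y``: unset boolean fields contribute
an empty string, and the space and comma are copied through unchanged.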

It is possible to define a column alignment value per field to influence
the visual output. It needs to be specified as ``{FIELDNAME:align=xx}``.

The ``<override>`` element will be described in the next section, but it
provides for both different decoded instruction syntax/mnemonics (when
simply providing a different display template string) as well as instruction
encoding where different ranges of bits have a different meaning based on
some other bitfield (or combination of bitfields). In the
``#instruction-cat2-1src`` example above, it is used to cover the cases where
``SRCn_R`` has a different meaning and a different disassembly syntax
depending on whether ``REPEAT`` equals zero.

The ``<template>`` element can be used to represent a placeholder for a more
complex ``<display>`` substring.

Overrides
---------

In many cases, a bitset is not convenient for describing the expected
disasm syntax, and/or interpretation of some range of bits differs based
on some other field or combination of fields. These *could* be modeled
as different derived bitsets, at the expense of a combinatorial explosion
of the size of the bitset inheritance tree. For example, *every* cat2
(and cat3) instruction has a ``(nopN)`` interpretation in addition to
the ``(rptN)`` interpretation.

An ``<override>`` in a bitset allows redefining the display string and/or
field definitions from the default case. If the override's expr(ession)
evaluates to non-zero, the ``<display>``, ``<field>``, and ``<derived>``
elements inside the override take precedence over what is defined at the
top level of the bitset (i.e. the default case).

Expressions
-----------

Both ``<override>`` and ``<derived>`` fields make use of ``<expr>`` elements,
either defined inline, or defined and named at the top level and referred to
by name in multiple other places. An expression is a simple 'C' expression
which can reference fields (including other derived fields) with the same
``{FIELDNAME}`` syntax as display template strings. For example:

.. code-block:: xml

   <expr name="#cat2-cat3-nop-encoding">
       (({SRC1_R} != 0) || ({SRC2_R} != 0)) && ({REPEAT} == 0)
   </expr>

In the case of ``<override>`` elements, the override applies if the expression
evaluates to non-zero. In the case of ``<derived>`` fields, the expression
evaluates to the value of the derived field.

Branching
---------

isaspec supports a few special field types for printing branch destinations. If
``isaspec_decode_options::branch_labels`` is true, a pre-pass over the program
to be disassembled determines which instructions are branch destinations.
Labels are then printed at those destinations when disassembling, in addition
to printing the name of the destination when printing the field itself.

There are two different types, which affect how the destination is computed. If
the field type is ``branch``, then the field is interpreted as a signed offset
from the current instruction. If the type is ``absbranch``, then it is
interpreted as an offset from the first instruction to be disassembled. In
either case, the offset is multiplied by the instruction size.

For example, here is what a signed-offset unconditional jump instruction might
look like:

.. code-block:: xml

   <bitset name="jump" extends="#instruction">
       <display>
           jump #{OFFSET}
       </display>
       <pattern low="26" high="31">110010</pattern> <!-- opcode goes here -->
       <field name="OFFSET" low="0" high="25" type="branch"/>
   </bitset>

This would produce a disassembly like ``jump #l42`` if the destination is 42
instructions after the start of the disassembly. The destination would be
preceded by a line with just ``l42:``.
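
The destination computation described above can be sketched in C as follows.
This is only an illustration under stated assumptions: the helper names
(``sign_extend``, ``branch_dest_instr``, ``branch_dest_bytes``) are
hypothetical, and treating an ``absbranch`` field as unsigned is an assumption
that the text above does not spell out.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Sign-extend the low `width` bits of val (1 <= width <= 64). */
   static int64_t
   sign_extend(uint64_t val, unsigned width)
   {
      uint64_t sign = UINT64_C(1) << (width - 1);

      if (width < 64)
         val &= (UINT64_C(1) << width) - 1;
      return (int64_t)((val ^ sign) - sign);
   }

   /* Destination as an instruction index, counted from the first
    * instruction being disassembled. */
   static int64_t
   branch_dest_instr(uint64_t field, unsigned width, bool absolute,
                     int64_t cur_instr)
   {
      if (absolute)
         return (int64_t)field;   /* absbranch (assumed unsigned) */

      /* branch: signed offset relative to the current instruction */
      return cur_instr + sign_extend(field, width);
   }

   /* Scale an instruction index by the instruction size in bytes. */
   static int64_t
   branch_dest_bytes(int64_t dest_instr, unsigned instr_size_bytes)
   {
      return dest_instr * (int64_t)instr_size_bytes;
   }

For the 26-bit ``OFFSET`` field of the ``jump`` example, a field value of 2 at
instruction 40 gives destination 42 (label ``l42``), and the field value
``0x3FFFFFD`` (that is, -3) at instruction 10 gives destination 7. With 64b
instructions, destination 42 corresponds to byte offset 336.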

``branch`` and ``absbranch`` fields can additionally have a ``call="true"``
attribute. For now, this just changes the disassembly. In particular, the label
prefix is changed to ``fxn`` and an extra empty line is added before the
destination to visually separate the disassembly into functions. So, for
example, a call instruction defined like this:

.. code-block:: xml

   <bitset name="call" extends="#instruction">
       <display>
           call #{OFFSET}
       </display>
       <pattern low="26" high="31">110010</pattern> <!-- opcode goes here -->
       <field name="OFFSET" low="0" high="25" type="branch" call="true"/>
   </bitset>

will disassemble to ``call #fxn42``.

Finally, users with special knowledge about where execution may start can define
"entrypoints" when disassembling, which are printed like function call
destinations, with an extra empty line, but with an arbitrary user-defined
name. Names that are ``fxn`` or ``l`` followed by a number are discouraged
because they may clash with automatically generated names.

Encoding
--------

To facilitate instruction encoding, ``<encode>`` elements can be provided
to teach the generated instruction packing code how to map from data structures
representing the IR to fields. For example:

.. code-block:: xml

   <bitset name="#instruction" size="64">
       <doc>
           Encoding of an ir3 instruction. All instructions are 64b.
       </doc>
       <gen min="300"/>
       <encode type="struct ir3_instruction *" case-prefix="OPC_">
           <!--
               Define mapping from encode src to individual fields,
               which are common across all instruction categories
               at the root instruction level

               Not all of these apply to all instructions, but we
               can define mappings here for anything that is used
               in more than one instruction category. For things
               that are specific to a single instruction category,
               mappings should be defined at that level instead.
           -->
           <map name="DST">src->regs[0]</map>
           <map name="SRC1">src->regs[1]</map>
           <map name="SRC2">src->regs[2]</map>
           <map name="SRC3">src->regs[3]</map>
           <map name="REPEAT">src->repeat</map>
           <map name="SS">!!(src->flags &amp; IR3_INSTR_SS)</map>
           <map name="JP">!!(src->flags &amp; IR3_INSTR_JP)</map>
           <map name="SY">!!(src->flags &amp; IR3_INSTR_SY)</map>
           <map name="UL">!!(src->flags &amp; IR3_INSTR_UL)</map>
           <map name="EQ">0</map>  <!-- We don't use this (yet) -->
           <map name="SAT">!!(src->flags &amp; IR3_INSTR_SAT)</map>
       </encode>
   </bitset>

The ``type`` attribute specifies that the input to encoding an instruction
is a ``struct ir3_instruction *``. In the case of bitset hierarchies with
multiple possible leaf nodes, a ``case-prefix`` attribute should be supplied
along with a function that maps the bitset encode source to an enum value
with the specified prefix prepended to the uppercased leaf node name. I.e. in
this case, "add.f" becomes ``OPC_ADD_F``.

Individual ``<map>`` elements teach the encoder how to map from the encode
source to fields in the encoded instruction.
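
To make the ``case-prefix`` naming rule concrete, here is a small C sketch
that derives the enum name from a leaf bitset name. The function name
``case_name`` is hypothetical, and mapping *every* non-alphanumeric character
to ``_`` is an assumption: the text above only confirms that "add.f" becomes
``OPC_ADD_F``.

.. code-block:: c

   #include <ctype.h>
   #include <stddef.h>
   #include <string.h>

   /* Write prefix + uppercased leaf name into out. Non-alphanumeric
    * characters become '_' (assumption; only '.' is confirmed above).
    * Returns 0 on success, -1 if out is too small. */
   static int
   case_name(const char *prefix, const char *leaf, char *out, size_t outsz)
   {
      size_t plen = strlen(prefix);
      size_t llen = strlen(leaf);

      if (plen + llen + 1 > outsz)
         return -1;

      memcpy(out, prefix, plen);
      for (size_t i = 0; i < llen; i++) {
         unsigned char c = (unsigned char)leaf[i];
         out[plen + i] = isalnum(c) ? (char)toupper(c) : '_';
      }
      out[plen + llen] = '\0';
      return 0;
   }

Calling ``case_name("OPC_", "add.f", buf, sizeof(buf))`` fills ``buf`` with
``OPC_ADD_F``, and the ``absneg.f`` leaf from the earlier example would map to
``OPC_ABSNEG_F`` under the same rule.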