=====================
Adreno Five Microcode
=====================

.. contents::

.. _afuc-introduction:

Introduction
============

Adreno GPUs prior to 6xx use two microcontrollers to parse the command stream,
set up the hardware for draws (or compute jobs), and do various GPU
housekeeping.  They are relatively simple (basically glorified
register writers) and nearly all their state is in a collection
of registers.  That is, there is no stack, and no memory assigned to
them; any global state like which bank of context registers is to
be used in the next draw is stored in a register.

The setup is similar to Radeon; in fact Adreno 2xx through 4xx used
basically the same instruction set as r600.  There is a "PFP"
(Prefetch Parser) and "ME" (Micro Engine, also confusingly referred
to as "PM4").  These make up the "CP" ("Command Parser").  The
PFP runs ahead of the ME, with some PM4 packets handled entirely
in the PFP.
Between the PFP and ME is a FIFO ("MEQ").  In the
generations prior to Adreno 5xx, the PFP and ME had different
instruction sets.

Starting with Adreno 5xx, a new microcontroller with a unified
instruction set was introduced, although the overall architecture
and purpose of the two microcontrollers remain the same.

For lack of a better name, this new instruction set is called
"Adreno Five MicroCode" or "afuc".  (No idea what Qualcomm calls
it internally.)

With Adreno 6xx, the separate PFP and ME are replaced with a single
SQE microcontroller using the same instruction set as 5xx.

Starting with Adreno 660, another processor called LPAC (Low Priority
Asynchronous Compute) is introduced, which is a slightly cut-down copy of the
SQE used to execute background compute tasks.  Unlike on 5xx, the firmware is
bundled together with the main SQE firmware, and the SQE is responsible for
booting LPAC.  On 7xx, to implement concurrent binning the SQE is split into two
processors called BR and BV.
Again, the firmware for all three is bundled
together and BR is responsible for booting both BV and LPAC.

.. _afuc-overview:

Instruction Set Overview
========================

The afuc instruction set is heavily inspired by MIPS, but not exactly
compatible.

Registers
=========

Similar to MIPS, there are 32 registers, and some are special purpose.  ``$00``
is the same as ``$zero`` on MIPS: it reads 0 and writes are discarded.

Registers are displayed in the current disassembly with a hexadecimal
numbering, e.g. ``$0a`` is encoded as 10.

The ABI used when processing packets is that ``$01`` contains the current PM4
header, registers from ``$02`` up to ``$11`` are temporaries and may be freely
clobbered by the packet handler, while ``$12`` and above are used to store
global state like the IB level and next visible draw (for draw skipping).
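Because the register numbers are hexadecimal, ``$11`` is register 17 rather
than 11, which is easy to misread. The following is a minimal Python sketch of
the naming and ABI split described above. The function names ``reg_name`` and
``abi_role`` and the role strings are illustrative assumptions, not part of any
afuc tool.

```python
def reg_name(index: int) -> str:
    """Format a register index the way the disassembly prints it (hex)."""
    if not 0 <= index < 32:
        raise ValueError("afuc has 32 registers")
    return f"${index:02x}"


def abi_role(index: int) -> str:
    """Role of a register under the packet-processing ABI described above.

    Illustrative only: special-purpose registers at the top of the register
    file are not distinguished here and fall under "global state".
    """
    if not 0 <= index < 32:
        raise ValueError("afuc has 32 registers")
    if index == 0x00:
        return "zero"          # reads 0, writes discarded
    if index == 0x01:
        return "PM4 header"    # current packet header
    if index <= 0x11:
        return "temporary"     # $02..$11, free for the packet handler
    return "global state"      # $12 and above


if __name__ == "__main__":
    for i in (0, 1, 10, 17, 18):
        print(reg_name(i), abi_role(i))
```

Note that the upper bound of the temporaries is written as ``0x11`` in the
sketch so that it lines up with the ``$11`` spelling used in the disassembly.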

Unlike in MIPS, there is a special small hardware-managed stack and special
instructions ``call``/``ret`` which use it. The stack only contains return
addresses; there is no "stack frame" to spill values to. As a result, ``$sp``,
``$fp``, and ``$ra`` don't exist as on MIPS. Instead the last 3 registers are
used to :ref:`read <afuc-read>` from various queues and
:ref:`write GPU registers <afuc-reg-writes>`. In addition there is a ``$rem``
register which normally contains the number of words remaining in the packet
but can also be used as a normal register in combination with the ``rep`` prefix.

.. _afuc-alu:

ALU Instructions
================

The following instructions are available:

- ``add`` - add
- ``addhi`` - add + carry (for upper 32b of 64b value)
- ``sub`` - subtract
- ``subhi`` - subtract + carry (for upper 32b of 64b value)
- ``and`` - bitwise AND
- ``or`` - bitwise OR
- ``xor`` - bitwise XOR
- ``not`` - bitwise NOT (no src1)
- ``shl`` - shift-left
- ``ushr`` - unsigned shift-right
- ``ishr`` - signed shift-right
- ``rot`` - rotate-left (like shift-left with wrap-around)
- ``mul8`` - multiply low 8b of two src
- ``min`` - minimum
- ``max`` - maximum
- ``cmp`` - compare two values

Similar to MIPS, the ALU instructions can take either two src registers, or a
src plus 16b immediate as 2nd src, for example::

   add $dst, $src, 0x1234   ; src2 is immed
   add $dst, $src1, $src2   ; src2 is reg

The ``not`` instruction only takes a single source::

   not $dst, $src
   not $dst, 0x1234

One departure from MIPS is that there is a special immediate-form ``mov``
instruction that can shift the 16-bit immediate by a given amount::

   mov $dst, 0x1234 << 2

This replaces ``lui`` on MIPS (just use a shift of 16) while also allowing the
quick construction of small bitfields, which comes in handy in various places.

.. _afuc-alu-cmp:

The ``cmp`` instruction returns:

- ``0x00`` if src1 > src2
- ``0x2b`` if src1 == src2
- ``0x1e`` if src1 < src2

See the explanation in :ref:`afuc-branch`.


.. _afuc-branch:

Branch Instructions
===================

The following branch/jump instructions are available:

- ``brne`` - branch if not equal (or bit not set)
- ``breq`` - branch if equal (or bit set)
- ``jump`` - unconditional jump

Both ``brne`` and ``breq`` have two forms, comparing the src register
against either a small immediate (up to 5 bits) or a specific bit::

   breq $src, b3, #somelabel  ; branch if src & (1 << 3)
   breq $src, 0x3, #somelabel ; branch if src == 3

The branch instructions are encoded with a 16b relative offset.
Since ``$00`` always reads back zero, it can be used to construct
an unconditional relative jump.
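The result codes of ``cmp`` listed earlier are chosen so that single bits
encode useful predicates. The following is a minimal Python model of that
interaction, a sketch rather than a description of the hardware. The helper
names are hypothetical, and the operands are compared as plain integers because
the text does not say whether ``cmp`` is signed or unsigned.

```python
# Result codes of cmp, taken from the list in the ALU section.
CMP_GT = 0x00  # src1 >  src2
CMP_EQ = 0x2b  # src1 == src2
CMP_LT = 0x1e  # src1 <  src2


def afuc_cmp(src1: int, src2: int) -> int:
    """Model of cmp; signedness is an assumption (plain integer compare)."""
    if src1 > src2:
        return CMP_GT
    if src1 == src2:
        return CMP_EQ
    return CMP_LT


def breq_bit_taken(src: int, bit: int) -> bool:
    """Bit-test form of breq: taken if the given bit of src is set."""
    return (src >> bit) & 1 == 1


def brne_bit_taken(src: int, bit: int) -> bool:
    """Bit-test form of brne: taken if the given bit of src is clear."""
    return not breq_bit_taken(src, bit)


def breq_imm_taken(src: int, imm: int) -> bool:
    """Immediate form of breq: taken if src equals the small immediate."""
    return src == imm


if __name__ == "__main__":
    for a, b in ((1, 2), (2, 2), (3, 2)):
        r = afuc_cmp(a, b)
        print(a, b, hex(r),
              "le" if breq_bit_taken(r, 1) else "gt",
              "lt" if breq_bit_taken(r, 2) else "ge")
```

In this model bit 1 is set for both the "equal" and "less than" codes, so a
bit-1 test distinguishes le from gt, while bit 2 is set only for "less than",
so a bit-2 test distinguishes lt from ge. Comparing a register that reads zero
against the immediate 0 is always true, which is why ``$00`` yields an
unconditional relative jump.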

The :ref:`cmp <afuc-alu-cmp>` instruction can be paired with the
bit-test variants of ``brne``/``breq`` to implement gt/ge/lt/le,
due to the bit pattern it returns, for example::

   cmp $04, $02, $03
   breq $04, b1, #somelabel

will branch if ``$02`` is less than or equal to ``$03``.

Delay slots
-----------

Branch instructions have a delay slot, so the following instruction is always
executed regardless of whether the branch is taken or not. Unlike MIPS, a branch in
the delay slot is legal as long as the original branch and the branch in its
delay slot are never both taken. Because jump tables are awkward and slow due
to the lack of memory caching, this is often exploited to create dense
sequences of branches to implement switch-case constructs::

   breq $02, 0x1, #foo
   breq $02, 0x2, #bar
   breq $02, 0x3, #baz
   ...
   nop
   jump #default

Another common use of a branch in a delay slot is a double-jump (jump to one
location if a condition is true, and another location if false). In MIPS this
requires two delay slots::

   beq $t0, 0x1, #foo
   nop ; beq delay slot
   b #bar
   nop ; b delay slot

In afuc this only requires a delay slot for the second branch::

   breq $02, 0x1, #foo
   brne $02, 0x1, #bar
   nop

Note that for the second branch we had to use a conditional branch with the
opposite condition instead of an unconditional branch as in the MIPS example,
to guarantee that at most one is ever taken.

.. _afuc-call:

Call/Return
===========

Simple subroutines can be implemented with ``call``/``ret``.
The
jump instruction encodes a fixed offset from the SQE instruction base.

  TODO not sure how many levels deep function calls can be nested.
  There isn't really a stack.  Definitely seems to be multiple
  levels of function call, see in PFP: CP_CONTEXT_SWITCH_YIELD -> f13 ->
  f22.

.. _afuc-nop:

NOPs
====

Afuc has a special NOP encoding where the low 24 bits are ignored by the
processor. On a5xx the high 8 bits are ``00``, on a6xx they are ``01``
(probably to make sure that 0 is not a legal instruction, increasing the
chances of halting immediately when something is misconfigured). This is used
sometimes to create a "payload" that is ignored when executed. For example, the
first 2 instructions of the firmware typically contain the firmware ID and
version followed by the packet handling table offset encoded as NOPs. They are
skipped when executed but they are later read as data by the bootstrap routine.

.. _afuc-control:

Control Registers
=================

Control registers are a special register space that can only be read/written
directly by CP through ``cread``/``cwrite`` instructions:

- ``cread $dst, [$off + addr], flags``
- ``cread $dst, [$off + addr]!, flags``
- ``cwrite $src, [$off + addr], flags``
- ``cwrite $src, [$off + addr]!, flags``

Control registers ``0x000`` to ``0x0ff`` are private registers used to control
the CP, for example to indicate where to read from memory or (normal)
registers. ``0x100`` to ``0x17f`` are a private scratch space used by the
firmware however it wants, for example as an ad-hoc stack to spill registers
when calling a function or to store the scratch used in ``CP_SCRATCH_TO_*``
packets. Starting with the introduction of LPAC, ``0x200`` to ``0x27f`` are a
shared scratch space used to communicate between processors, and on a7xx they
can also be written on event completion to implement so-called "on-chip
timestamps".
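As a quick way to keep the three address ranges apart when reading a
disassembly, they can be written down as a small lookup. This Python sketch
only restates the ranges given above; the function name and the returned labels
are illustrative assumptions.

```python
def control_reg_space(addr: int) -> str:
    """Classify a control register address using the ranges described above."""
    if 0x000 <= addr <= 0x0ff:
        return "private control"   # controls the CP itself
    if 0x100 <= addr <= 0x17f:
        return "private scratch"   # firmware-defined use, e.g. ad-hoc stack
    if 0x200 <= addr <= 0x27f:
        return "shared scratch"    # only since the introduction of LPAC
    return "unknown"               # not covered by the text


if __name__ == "__main__":
    for addr in (0x0b0, 0x100, 0x17f, 0x200, 0x180):
        print(hex(addr), control_reg_space(addr))
```

Addresses outside the listed ranges are reported as "unknown" rather than
guessed at, since the text does not describe them.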

In cases where no offset is needed, ``$00`` is frequently used as the offset.

The addressing mode with ``!`` is a pre-increment mode that writes the final
address ``$off + addr`` to ``$off``.

For example, the following sequence loads the ``CP_INDIRECT_BUFFER`` parameters
and writes them to the ``CP_IBn_BASE_LO/HI/BUFSIZE`` control registers::

   ; load CP_INDIRECT_BUFFER parameters from cmdstream:
   mov $02, $data   ; low 32b of IB target address
   mov $03, $data   ; high 32b of IB target
   mov $04, $data   ; IB size in dwords

   ; sanity check # of dwords:
   breq $04, 0x0, #l23

   ; this seems something to do with figuring out whether
   ; we are going from RB->IB1 or IB1->IB2 (ie. so the
   ; below cwrite instructions update either
   ; CP_IB1_BASE_LO/HI/BUFSIZE or CP_IB2_BASE_LO/HI/BUFSIZE)
   and $05, $18, 0x0003
   shl $05, $05, 0x0002

   ; update CP_IBn_BASE_LO/HI/BUFSIZE:
   cwrite $02, [$05 + 0x0b0], 0x8
   cwrite $03, [$05 + 0x0b1], 0x8
   cwrite $04, [$05 + 0x0b2], 0x8

Unlike normal GPU registers, writing control registers seems to always take
effect immediately; if writing a control register triggers some complex
operation that the firmware needs to wait for, then it typically uses a
spinloop with another control register to wait for it to finish.

Control registers are documented in ``adreno_control_regs.xml``.
The
disassembler will try to recognize an immediate address as a known control
register and print it, for example this sequence similar to the above sequence
but on a6xx::

   and $05, $12, 0x0003
   shl $05, $05, 0x0002
   cwrite $0e, [$05 + @IB1_BASE], 0x0
   cwrite $0b, [$05 + @IB1_BASE+0x1], 0x0
   cwrite $04, [$05 + @IB1_DWORDS], 0x0

.. _afuc-sqe-regs:

SQE Registers
=============

Starting with a6xx, the state of the SQE processor itself can be accessed
through ``sread``/``swrite`` instructions that work identically to
``cread``/``cwrite``. For example, this includes the state of the
``call``/``ret`` stack. This is mainly used during the preemption routine but
it's also used to set the entrypoint for preemption.

.. _afuc-read:

Reading Memory and Registers
============================

The CP accesses memory directly with no caching. This means that except for
very small amounts of data accessed rarely, ``load`` and ``store`` are very
slow. Instead, ME/PFP and later SQE read memory through various queues. Reading
registers also uses a queue, likely because burst reading several registers at
once is faster than reading them one-by-one and reading does not complete
immediately. Queueing up a read involves writing an (address, length) pair to a
control register, and data is read from the queue using one of three special
registers:

- ``$data`` reads the next PM4 packet word. This comes from the RB, IB1, IB2,
  or SDS (Set Draw State) queue, controlled by ``@IB_LEVEL``. It also
  decrements ``$rem`` if it isn't already decremented by a rep prefix.
- ``$memdata`` reads the next word from a memory read buffer (MRB) set up by
  writing ``@MEM_READ_ADDR``/``@MEM_READ_DWORDS``. It's used by things like
  ``CP_MEMCPY`` and reading indirect draw parameters in ``CP_DRAW_INDIRECT``.
- ``$regdata`` reads from a register read buffer (RRB) set up by
  ``@REG_READ_ADDR``/``@REG_READ_DWORDS``.

RB, IB1, IB2, SDS, and MRB make up the Read-Only Queue or ROQ, in addition to
the Visibility Stream Decoder (VSD) which is set up via a similar control
register pair but is read by a fixed-function parser that the CP accesses via a
few control registers.

.. _afuc-reg-writes:

Writing Registers
=================

The same special registers, when used as a destination, can be used to
write GPU registers on ME. Because they have a totally different function when
used as a destination, they use different names:

- ``$addr`` sets the address and disables ``CP_PROTECT`` address checking.
- ``$usraddr`` sets the address and checks it against the ``CP_PROTECT`` access
  table. It's used for addresses specified by the PM4 packet stream instead of
  internally.
- ``$data`` writes the register and auto-increments the address.

For example, to write two consecutive scratch registers::

   mov $addr, CP_SCRATCH_REG[0x2] ; set register to write
   mov $data, $03                 ; CP_SCRATCH_REG[0x2]
   mov $data, $04                 ; CP_SCRATCH_REG[0x3]
   ...

Subsequent writes to ``$data`` will increment the address of the register
to write, so a sequence of consecutive registers can be written. On a5xx ME,
this will directly write the register; on a6xx SQE this will instead determine
which cluster(s) the register belongs to and push the write onto the
appropriate per-cluster queue(s), letting the SQE run ahead of the GPU.

When bit 18 of ``$addr`` is set, the auto-incrementing is disabled. This is
often used with :ref:`NRT_DATA <afuc-mem-writes>`.
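The ``$addr``/``$data`` behaviour can be illustrated with a small Python model.
This is a sketch under stated assumptions and not how the hardware is
implemented: the class name ``RegWriter``, the register offset ``0x1002``, and
the idea that the bits below bit 18 hold the register offset are all
hypothetical; only the auto-increment and the meaning of bit 18 come from the
text above.

```python
NO_AUTOINC_BIT = 1 << 18  # from the text: bit 18 of $addr disables auto-increment


class RegWriter:
    """Toy model of writing GPU registers through $addr and $data."""

    def __init__(self) -> None:
        self.addr = 0
        self.autoinc = True
        self.regs = {}  # register offset -> last value written

    def set_addr(self, value: int) -> None:
        """Model of `mov $addr, value`."""
        self.autoinc = not (value & NO_AUTOINC_BIT)
        # Assumption: the bits below bit 18 are the register offset.
        self.addr = value & (NO_AUTOINC_BIT - 1)

    def write_data(self, value: int) -> None:
        """Model of `mov $data, value`."""
        self.regs[self.addr] = value & 0xFFFFFFFF
        if self.autoinc:
            self.addr += 1  # next write goes to the following register


if __name__ == "__main__":
    HYPOTHETICAL_REG = 0x1002  # made-up offset, for illustration only

    w = RegWriter()
    w.set_addr(HYPOTHETICAL_REG)
    w.write_data(3)
    w.write_data(4)
    print(w.regs)  # two consecutive registers written

    w = RegWriter()
    w.set_addr(HYPOTHETICAL_REG | NO_AUTOINC_BIT)
    w.write_data(3)
    w.write_data(4)
    print(w.regs)  # both writes hit the same register
```

With auto-increment disabled, repeated ``$data`` writes all target the same
register, which is the property that makes it useful for a data port.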

On a5xx ME, ``$regdata`` can also be used to directly read a register::

   mov $addr, CP_SCRATCH_REG[0x2]
   mov $03, $regdata
   mov $04, $regdata

This does not exist on a6xx because register reads are not synchronized against
writes any more.

Many registers that are updated frequently have two banks, so they can be
updated without stalling for the previous draw to finish. On a5xx, these banks are
arranged so bit 11 is zero for bank 0 and 1 for bank 1. The ME fw (at
least the version I'm looking at) stores this in ``$17``, so to update these
registers from ME::

   or $addr, $17, VFD_INDEX_OFFSET
   mov $data, $03
   ...

On a6xx this is handled transparently to the SQE, and the bank to use is stored
separately in the cluster queue.

Registers can also be written directly, skipping the queue, by writing
``@REG_WRITE_ADDR``/``@REG_WRITE``. This is used on a6xx for certain frontend
registers that have their own queues and on a5xx is used by the PFP::

   mov $0c, CP_SCRATCH_REG[0x7]
   mov $02, 0x789a   ; value
   cwrite $0c, [$00 + @REG_WRITE_ADDR], 0x8
   cwrite $02, [$00 + @REG_WRITE], 0x8

Like with the ``$addr``/``$data`` approach, the destination register address
increments on each write to ``@REG_WRITE``.

.. _afuc-pipe-regs:

Pipe Registers
--------------

This is yet another private register space, triggered by writing to the high 8
bits of ``$addr`` and then writing ``$data`` normally. Some pipe registers like
``WAIT_MEM_WRITES`` or ``WAIT_GPU_IDLE`` have no data and a write is triggered
immediately when ``$addr`` is written, for example in ``CP_WAIT_MEM_WRITES``::

   mov $addr, 0x0084 << 24 ; |WAIT_MEM_WRITES

The pipe register is decoded here by the disassembler in a comment.
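The arithmetic behind that example can be checked with a short Python sketch.
It models the immediate-form ``mov`` as a 16-bit immediate shifted left within
a 32-bit register and extracts the high 8 bits as the pipe register index. The
function names are hypothetical, the valid shift range is not stated in the
text and is not enforced here, and the name table contains only the one entry
that appears in the example above.

```python
def mov_imm(imm16: int, shift: int) -> int:
    """Model of `mov $dst, imm16 << shift`, truncated to 32 bits."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    return (imm16 << shift) & 0xFFFFFFFF


def pipe_reg_index(addr_value: int) -> int:
    """Pipe register selected by the high 8 bits of the value written to $addr."""
    return (addr_value >> 24) & 0xFF


# Only the entry visible in the disassembly comment above; not a full table.
PIPE_REG_NAMES = {0x84: "WAIT_MEM_WRITES"}


if __name__ == "__main__":
    value = mov_imm(0x0084, 24)
    index = pipe_reg_index(value)
    print(hex(value), hex(index), PIPE_REG_NAMES.get(index, "?"))
```

The same ``mov_imm`` helper with a shift of 16 reproduces what ``lui`` does on
MIPS, for example ``mov_imm(0x1234, 16)`` gives ``0x12340000``.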

The main differences between pipe registers and control registers are:

- They are always write-only.
- On a6xx they are pipelined together with normal register writes; on a5xx they
  are written from ME like normal registers.
- Writing them can take an arbitrary amount of time, so they can be used to
  wait for some condition without spinning.

In short, they behave more like normal registers but are not expected to be
read/written by anything other than CP. Over time more and more GPU registers
not touched by the kernel driver have been converted to pipe registers.

.. _afuc-mem-writes:

Writing Memory
==============

Writing memory is done by writing GPU registers:

- ``CP_ME_NRT_ADDR_LO``/``_HI`` - write to set the address to read or write
- ``CP_ME_NRT_DATA`` - write to trigger write to address in ``CP_ME_NRT_ADDR``.

The address register increments with successive writes.
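
The auto-increment can be pictured with a toy software model. This is only a
sketch: the class ``NrtPort`` and its method names are invented here, and the
4-byte step per data write is an assumption (one 32-bit word per write), not
something the text above states.

```python
class NrtPort:
    """Toy model of the CP_ME_NRT_ADDR / CP_ME_NRT_DATA register pair."""

    WORD_BYTES = 4  # assumption: each data write stores one 32-bit word

    def __init__(self):
        self.addr = 0
        self.memory = {}  # byte address -> 32-bit word

    def write_addr(self, lo, hi):
        # Models writing CP_ME_NRT_ADDR_LO followed by CP_ME_NRT_ADDR_HI.
        self.addr = (hi << 32) | lo

    def write_data(self, value):
        # Models writing CP_ME_NRT_DATA: store, then advance the address.
        self.memory[self.addr] = value & 0xFFFFFFFF
        self.addr += self.WORD_BYTES


port = NrtPort()
port.write_addr(0x1000, 0x0)
port.write_data(0x11111111)  # low word of a 64-bit value
port.write_data(0x22222222)  # high word lands at the next word address
```

With this model, two back-to-back data writes store a 64-bit value without
touching the address register in between.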

On a5xx, this seems to be only used by ME. If PFP were also using it, they would
race with each other. It can also be used for reads, primarily small reads.

Memory Write example::

   ; store 64b value in $04+$05 to 64b address in $02+$03
   mov $addr, CP_ME_NRT_ADDR_LO
   mov $data, $02
   mov $data, $03
   mov $addr, CP_ME_NRT_DATA
   mov $data, $04
   mov $data, $05

Memory Read example::

   ; load 64b value from address in $02+$03 into $04+$05
   mov $addr, CP_ME_NRT_ADDR_LO
   mov $data, $02
   mov $data, $03
   mov $04, $addr
   mov $05, $addr

On a6xx ``CP_ME_NRT_ADDR`` and ``CP_ME_NRT_DATA`` have been replaced by
:ref:`pipe registers <afuc-pipe-regs>`. These can only be used for writes, but
otherwise work similarly.

Load and Store Instructions
===========================

a6xx adds ``load`` and ``store`` instructions that work similarly to ``cread``
and ``cwrite``. Because the address is 64 bits but registers are 32-bit, the
high 32 bits come from the ``@LOAD_STORE_HI``
:ref:`control register <afuc-control>`. They are mostly used by the context
switch routine and even then very sparingly, before the memory read/write queue
state is saved or while it is being restored.

Modifiers
=========

There are several modifiers that enable more compact and efficient
implementations of common patterns:

.. _afuc-rep:

Repeat
------

``(rep)`` repeats the same instruction ``$rem`` times. More precisely, it
decrements ``$rem`` after the instruction executes if it wasn't already
decremented from a read from ``$data`` and re-executes the instruction until
``$rem`` is 0.
It can be used with ALU instructions and control instructions.
Usually it is used in conjunction with ``$data`` to read the rest of the packet
in one instruction, but it can also be used freestanding. For example, this
snippet clears the control register scratch space::

   mov $rem, 0x0080 ; clear 0x80 registers
   mov $03, 0x00ff  ; start at 0xff + 1 = 0x100
   (rep)cwrite $00, [$03 + 0x001], 0x4

Note the use of pre-increment mode, so that the first execution clears
``0x100`` and updates ``$03`` to ``0x100``, the second execution clears
``0x101`` and updates ``$03`` to ``0x101``, and so on.

.. _afuc-xmov:

eXtra Moves
-----------

``(xmovN)`` is an optimization which lets the firmware read multiple words from
a queue in the same cycle. Conceptually, it adds "extra" mov instructions to be
executed after a given ALU instruction, although in practice they are likely
executed in parallel. ``(xmov1)`` adds up to 1 move, ``(xmov2)`` adds up to 2,
and ``(xmov3)`` adds up to 3.
The actual number of moves added is the minimum
of the number in the instruction and ``$rem``, so a ``(xmov3)`` instruction
behaves like a ``(xmov1)`` instruction if ``$rem = 1``. Given an instruction::

   (xmovN) alu $dst, $src1, $src2

or a 1-source instruction::

   (xmovN) alu $dst, $src2

then we compute the number of extra moves ``M = min(N, $rem)``. If ``M = 1``,
then we add::

   mov $data, $src2

If ``M = 2``, then we add::

   mov $data, $src2
   mov $data, $src2

Finally, as a special case explained below, if ``M = 3`` then we add::

   mov $data, $src2
   mov $dst, $src2 ; !!!
   mov $data, $src2

If ``$dst`` is not one of the "special" registers ``$data``, ``$addr``,
``$usraddr``, then ``$data`` is replaced by ``$00`` in all destinations, i.e.
the results of the subsequent moves are discarded.

The purpose of the ``M = 3`` special case is mostly to efficiently implement
``CP_CONTEXT_REG_BUNCH``. This is the entire implementation of
``CP_CONTEXT_REG_BUNCH``, which is essentially just one instruction::

   CP_CONTEXT_REG_BUNCH:
   (rep)(xmov3)mov $usraddr, $data
   waitin
   mov $01, $data

If there are 4 or more words remaining in the packet, that is if there are at
least two more registers to write, then (ignoring the ``(rep)`` for a moment)
the instruction expands to::

   mov $usraddr, $data
   mov $data, $data
   mov $usraddr, $data
   mov $data, $data

This is likely all executed in a single cycle, allowing us to write 2 registers
per cycle.
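
The expansion rules for ``(xmovN)`` can be summarized in a few lines of Python.
This is only a sketch of the conceptual model described in this section, not of
what the hardware does internally: the function name ``xmov_extra_moves`` is
hypothetical, and ``rem`` stands for whatever value of ``$rem`` enters
``min(N, $rem)`` (exactly when the hardware samples it is not modelled).

```python
SPECIAL_DSTS = {"$data", "$addr", "$usraddr"}


def xmov_extra_moves(n, rem, dst, src2):
    """List the conceptual extra moves added by (xmovN) for N in 1..3."""
    m = min(n, rem)
    # M = 3 is the special case: the middle extra move targets $dst again.
    dsts = {
        0: [],
        1: ["$data"],
        2: ["$data", "$data"],
        3: ["$data", dst, "$data"],
    }[m]
    if dst not in SPECIAL_DSTS:
        # Non-special destination: the extra $data writes are discarded.
        dsts = ["$00" if d == "$data" else d for d in dsts]
    return [f"mov {d}, {src2}" for d in dsts]


# CP_CONTEXT_REG_BUNCH body with enough words left in the packet:
base = ["mov $usraddr, $data"]
for line in base + xmov_extra_moves(3, 3, "$usraddr", "$data"):
    print(line)
```

Running the example prints the same four moves as the expansion listed above,
alternating between ``$usraddr`` and ``$data`` as the destination.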

``(xmov1)`` can also be added to ``(rep)mov $data, $data``, which is a common
pattern to write the rest of the packet to successive registers, to write up to
2 registers per cycle as well. The firmware does not use ``(xmov3)`` for this
pattern, however, so 2 registers per cycle is likely a hardware limitation.

Although ``(xmovN)`` is often used in combination with ``(rep)``, it doesn't
have to be. For example, ``(xmov1)mov $data, $data`` moves the next 2 packet
words to 2 successive registers.

.. _afuc-sds:

Set Draw State
--------------

``(sdsN)`` is a modifier for ``cwrite`` used to accelerate
``CP_SET_DRAW_STATE``. For each draw state group to update,
``CP_SET_DRAW_STATE`` needs to copy 3 words from the packet containing the
group to update, metadata, and base address plus size.
Using the ``(sds2)``
modifier as well as ``(rep)``, this can be accomplished in a single
instruction::

   (rep)(sds2)cwrite $data, [$00 + @DRAW_STATE_SET_HDR]

The first word containing the header is written to ``@DRAW_STATE_SET_HDR``, and
the second and third words containing the draw state base come from reading the
source again twice and are written directly to the draw state RAM.

In testing with other control registers, ``(sdsN)`` causes the source to be
read ``N`` extra times and then thrown away. Only when used in combination with
``@DRAW_STATE_SET_HDR`` do the extra source reads have an effect.

.. _afuc-peek:

Peek
----

``(peek)`` is valid on ALU instructions without an immediate. It modifies what
``$data`` (and possibly ``$memdata`` and ``$regdata``) do by making them avoid
consuming the word. The next read to ``$data`` will return the same thing.
This
is used solely by ``CP_INDIRECT_BUFFER`` to test if there is a subsequent IB
that can be prefetched while the first IB is executed without actually
consuming the header for the next packet. It is introduced on a7xx, and
replaces the use of a special control register.

Packet Table
============

The core of the microprocessor's job is to parse each packet header and jump to
its handler. This is done through a ``waitin`` instruction which waits for the
packet header to become available and then parses the header and jumps to the
handler using a jump table. However, it does *not* actually consume the header.
Like any branch instruction, it has a delay slot, and by convention this delay
slot always contains a ``mov $01, $data`` instruction. This consumes the same
header that ``waitin`` parsed and puts it in ``$01`` so that the packet header
is available in ``$01`` in the next packet's handler.
Thus all packet handlers end with
this sequence::

   waitin
   mov $01, $data

The jump table itself is initialized by the SQE in the bootstrap routine at the
beginning of the firmware. Amongst other tasks, it reads the offset of the jump
table from the NOP payload at the beginning, then uses a jump table embedded at
the end of the firmware to set it up by writing the ``@PACKET_TABLE_WRITE``
control register. After everything is set up, it does the ``waitin`` sequence
to start handling the first packet (which should be ``CP_ME_INIT``).

Example Packet
==============

Let's examine an implementation of ``CP_MEM_WRITE``::

   CP_MEM_WRITE:
   mov $addr, 0x00a0 << 24 ; |NRT_ADDR

First, we set up the register to write to, which is the ``NRT_ADDR``
:ref:`pipe register <afuc-pipe-regs>`.
It turns out that the low 2 bits of
``NRT_ADDR`` are a flag which when 1 disables auto-incrementing ``NRT_ADDR``
when ``NRT_DATA`` is written, but we don't want this behavior so we have to
make sure they are clear::

   or $02, $data, 0x0003  ; reading $data reads the next PM4 word
   xor $data, $02, 0x0003 ; writing $data writes the register, which is NRT_ADDR

Writing ``$data`` auto-increments ``$addr``, so now the next write is to
``0xa1`` or ``NRT_ADDR+1`` (``NRT_ADDR`` is a 64-bit register)::

   mov $data, $data

Now, we have to write ``NRT_DATA``. We want to repeatedly write the same
register, without having to fight the auto-increment by resetting ``$addr``
each time, which is where bit 18, which disables auto-increment, comes in
handy::

   mov $addr, 0xa204 << 16 ; |NRT_DATA

Finally, we have to repeatedly copy the remaining PM4 packet data to the
``NRT_DATA`` register, which we can do in one instruction with
:ref:`(rep) <afuc-rep>`.
Furthermore, we can use :ref:`(xmov1) <afuc-xmov>` to
squeeze out some more performance::

   (rep)(xmov1)mov $data, $data

At the end is the standard go-to-next-packet sequence::

   waitin
   mov $01, $data

Reassembling Firmwares
======================

Of course, the main use of assembling is to take the firmware you're using,
modify it to test something, and reassemble it. Reassembling a firmware should
work out-of-the-box, and should give you back an identical firmware, but there
is a caveat if you want to reassemble a modified firmware and use preemption.
The preemption routines contain a few tables embedded in the firmware, and they
load the offset of the table with a ``mov`` instruction that needs to be turned
into a relocation and then add it to ``CP_SQE_INSTR_BASE``. ``afuc-asm``
supports using labels as immediates for this::

   foo:
   [00000000]
   ...

   mov $02, #foo << 2 ; #foo will be replaced with the offset in words

However, you have to manually insert the labels and replace the constant. On
a7xx there are multiple tables next to each other that look like one table, so
be careful to make sure you've found all the places it offsets from
``CP_SQE_INSTR_BASE``! There are also tables in the BV microcode on a7xx. To
check that the relocations are correct, check that reassembling an otherwise
unmodified firmware still gives an identical result after adding the
relocations.

A6XX NOTES
==========

The ``$14`` register holds global flags set by::

  CP_SKIP_IB2_ENABLE_LOCAL - b8
  CP_SKIP_IB2_ENABLE_GLOBAL - b9
  CP_SET_MARKER
    MODE=GMEM - sets b15
    MODE=BLIT2D - clears b15, b12, b7
  CP_SET_MODE - b29+b30
  CP_SET_VISIBILITY_OVERRIDE - b11, b21, b30?
  CP_SET_DRAW_STATE - checks b29+b30

  CP_COND_REG_EXEC - checks b10, which should be predicate flag?