=====================
Adreno Five Microcode
=====================

.. contents::

.. _afuc-introduction:

Introduction
============

Adreno GPUs prior to 6xx use two micro-controllers to parse the command-stream,
set up the hardware for draws (or compute jobs), and do various GPU
housekeeping.  They are relatively simple (basically glorified
register writers) and basically all their state is in a collection
of registers.  I.e., there is no stack, and no memory assigned to
them; any global state like which bank of context registers is to
be used in the next draw is stored in a register.

The setup is similar to radeon; in fact, Adreno 2xx through 4xx used
basically the same instruction set as r600.  There is a "PFP"
(Prefetch Parser) and "ME" (Micro Engine, also confusingly referred
to as "PM4").  These make up the "CP" ("Command Parser").  The
PFP runs ahead of the ME, with some PM4 packets handled entirely
in the PFP.  Between the PFP and ME is a FIFO ("MEQ").  In the
generations prior to Adreno 5xx, the PFP and ME had different
instruction sets.

Starting with Adreno 5xx, a new microcontroller with a unified
instruction set was introduced, although the overall architecture
and purpose of the two microcontrollers remain the same.

For lack of a better name, this new instruction set is called
"Adreno Five MicroCode" or "afuc".  (No idea what Qualcomm calls
it internally.)

With Adreno 6xx, the separate PFP and ME are replaced with a single
SQE microcontroller using the same instruction set as 5xx.

Starting with Adreno 660, another processor called LPAC (Low Priority
Asynchronous Compute) is introduced, which is a slightly cut-down copy of the
SQE used to execute background compute tasks. Unlike on 5xx, the LPAC firmware
is bundled together with the main SQE firmware, and the SQE is responsible for
booting LPAC. On 7xx, to implement concurrent binning, the SQE is split into two
processors called BR and BV. Again, the firmware for all three is bundled
together and BR is responsible for booting both BV and LPAC.

.. _afuc-overview:

Instruction Set Overview
========================

The afuc instruction set is heavily inspired by MIPS, but not exactly
compatible.

Registers
=========

Similar to MIPS, there are 32 registers, and some are special purpose. ``$00``
is the same as ``$zero`` on MIPS: it reads 0 and writes are discarded.

Registers are displayed in the current disassembly with a hexadecimal
numbering, e.g. ``$0a`` is encoded as 10.

The ABI used when processing packets is that ``$01`` contains the current PM4
header, registers from ``$02`` up to ``$11`` are temporaries and may be freely
clobbered by the packet handler, while ``$12`` and above are used to store
global state like the IB level and next visible draw (for draw skipping).
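
The numbering and ABI split described above can be sketched in a few lines of
Python. This is only an illustration: the helper names ``parse_reg`` and
``abi_role`` are made up for this example and are not part of the afuc tools,
and everything from ``$12`` upwards is lumped into one bucket because the
exact assignment of the global and special registers is not spelled out here.

.. code-block:: python

   def parse_reg(name: str) -> int:
       """Convert a disassembly register name such as '$0a' to its number."""
       if not name.startswith("$"):
           raise ValueError(f"not a register name: {name!r}")
       num = int(name[1:], 16)  # registers are printed in hexadecimal
       if not 0 <= num < 32:
           raise ValueError(f"register out of range: {name!r}")
       return num


   def abi_role(num: int) -> str:
       """Classify a register number according to the packet-handling ABI."""
       if num == 0x00:
           return "zero"               # reads 0, writes are discarded
       if num == 0x01:
           return "pm4-header"         # current PM4 header
       if 0x02 <= num <= 0x11:
           return "temporary"          # free for the packet handler to clobber
       return "global-or-special"      # $12 and above

For instance, ``parse_reg("$0a")`` yields 10, which ``abi_role`` reports as a
temporary.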

Unlike in MIPS, there is a special small hardware-managed stack and special
instructions ``call``/``ret`` which use it. The stack only contains return
addresses; there is no "stack frame" to spill values to. As a result, ``$sp``,
``$fp``, and ``$ra`` don't exist as on MIPS. Instead the last 3 registers are
used to :ref:`read <afuc-read>` from various queues and
:ref:`write GPU registers <afuc-reg-writes>`. In addition there is a ``$rem``
register which normally contains the number of words remaining in the packet
but can also be used as a normal register in combination with the rep prefix.

.. _afuc-alu:

ALU Instructions
================

The following instructions are available:

- ``add``   - add
- ``addhi`` - add + carry (for upper 32b of 64b value)
- ``sub``   - subtract
- ``subhi`` - subtract + carry (for upper 32b of 64b value)
- ``and``   - bitwise AND
- ``or``    - bitwise OR
- ``xor``   - bitwise XOR
- ``not``   - bitwise NOT (no src1)
- ``shl``   - shift-left
- ``ushr``  - unsigned shift-right
- ``ishr``  - signed shift-right
- ``rot``   - rotate-left (like shift-left with wrap-around)
- ``mul8``  - multiply low 8b of two src
- ``min``   - minimum
- ``max``   - maximum
- ``cmp``   - compare two values

Similar to MIPS, the ALU instructions can take either two src registers, or a
src plus 16b immediate as 2nd src, e.g.::

  add $dst, $src, 0x1234   ; src2 is immed
  add $dst, $src1, $src2   ; src2 is reg

The ``not`` instruction only takes a single source::

  not $dst, $src
  not $dst, 0x1234

One departure from MIPS is that there is a special immediate-form ``mov``
instruction that can shift the 16-bit immediate by a given amount::

   mov $dst, 0x1234 << 2

This replaces ``lui`` on MIPS (just use a shift of 16) while also allowing the
quick construction of small bitfields, which comes in handy in various places.
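
As a rough model of what the shifted ``mov`` computes, the Python sketch below
treats registers as 32-bit values. ``mov_imm`` and ``lui`` are illustrative
names only, and the range of shift amounts the real encoding accepts is not
modelled. With a shift of 16 the result matches MIPS ``lui``, and a small
immediate with a small shift builds a bitfield such as ``0x3 << 8``.

.. code-block:: python

   MASK32 = 0xFFFFFFFF


   def mov_imm(imm16: int, shift: int = 0) -> int:
       """Model ``mov $dst, imm16 << shift`` on a 32-bit register."""
       if not 0 <= imm16 <= 0xFFFF:
           raise ValueError("immediate must fit in 16 bits")
       return (imm16 << shift) & MASK32


   def lui(imm16: int) -> int:
       """MIPS ``lui``: place the 16-bit immediate in the upper half-word."""
       return (imm16 << 16) & MASK32


   value = mov_imm(0x1234, 2)     # the example from the text
   upper = mov_imm(0x1234, 16)    # same result as lui(0x1234)
   field = mov_imm(0x3, 8)        # a two-bit field placed at bit 8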

.. _afuc-alu-cmp:

The ``cmp`` instruction returns:

- ``0x00`` if src1 > src2
- ``0x2b`` if src1 == src2
- ``0x1e`` if src1 < src2

See explanation in :ref:`afuc-branch`.


.. _afuc-branch:

Branch Instructions
===================

The following branch/jump instructions are available:

- ``brne`` - branch if not equal (or bit not set)
- ``breq`` - branch if equal (or bit set)
- ``jump`` - unconditional jump

Both ``brne`` and ``breq`` have two forms, comparing the src register
against either a small immediate (up to 5 bits) or a specific bit::

  breq $src, b3, #somelabel  ; branch if src & (1 << 3)
  breq $src, 0x3, #somelabel ; branch if src == 3

The branch instructions are encoded with a 16b relative offset.
Since ``$00`` always reads back zero, it can be used to construct
an unconditional relative jump (comparing ``$00`` against ``0x0`` with
``breq`` always succeeds).

The :ref:`cmp <afuc-alu-cmp>` instruction can be paired with the
bit-test variants of ``brne``/``breq`` to implement gt/ge/lt/le,
due to the bit pattern it returns, for example::

  cmp $04, $02, $03
  breq $04, b1, #somelabel

will branch if ``$02`` is less than or equal to ``$03``.
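
To see why the bit tests work, the three result values can be checked
mechanically. The Python sketch below models ``cmp`` with plain integer
comparison (whether the hardware compares signed or unsigned is not covered
here) and derives each relation from a single bit of the result. Only the
``b1`` test for "less than or equal" comes from the example above; the other
bit choices follow from the constants ``0x2b`` and ``0x1e`` and are not
necessarily the ones the firmware uses.

.. code-block:: python

   CMP_GT, CMP_EQ, CMP_LT = 0x00, 0x2B, 0x1E


   def cmp(src1: int, src2: int) -> int:
       """Model the value returned by the ``cmp`` instruction."""
       if src1 > src2:
           return CMP_GT
       if src1 == src2:
           return CMP_EQ
       return CMP_LT


   def bit_set(value: int, bit: int) -> bool:
       """The test performed by ``breq $src, bN`` (``brne`` negates it)."""
       return bool(value & (1 << bit))


   def le(a, b):  # breq on b1: set in both 0x2b and 0x1e
       return bit_set(cmp(a, b), 1)


   def lt(a, b):  # breq on b2: set only in 0x1e
       return bit_set(cmp(a, b), 2)


   def gt(a, b):  # brne on b1
       return not bit_set(cmp(a, b), 1)


   def ge(a, b):  # brne on b2
       return not bit_set(cmp(a, b), 2)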

Delay slots
-----------

Branch instructions have a delay slot, so the following instruction is always
executed regardless of whether the branch is taken or not. Unlike MIPS, a
branch in the delay slot is legal as long as the original branch and the
branch in its delay slot are never both taken. Because jump tables are awkward
and slow due to the lack of memory caching, this is often exploited to create
dense sequences of branches to implement switch-case constructs::

   breq $02, 0x1, #foo
   breq $02, 0x2, #bar
   breq $02, 0x3, #baz
   ...
   nop
   jump #default

Another common use of a branch in a delay slot is a double-jump (jump to one
location if a condition is true, and another location if false). In MIPS this
requires two delay slots::

   beq $t0, 0x1, #foo
   nop ; beq delay slot
   b #bar
   nop ; b delay slot

In afuc this only requires a delay slot for the second branch::

   breq $02, 0x1, #foo
   brne $02, 0x1, #bar
   nop

Note that for the second branch we had to use a conditional branch with the
opposite condition instead of an unconditional branch as in the MIPS example,
to guarantee that at most one is ever taken.
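
The double-jump can be traced with a toy interpreter. The Python sketch below
is a simplified model written for this document, not the real hardware
behaviour: instructions are tuples, branch targets are list indices, and the
model raises an error when a branch is taken inside the delay slot of another
taken branch (what the hardware actually does in that case is not described
here). Running it with ``$02`` equal to 1 reaches ``foo`` without executing
the ``nop``; any other value executes the ``nop`` and reaches ``bar``.

.. code-block:: python

   def run(program, regs, max_steps=100):
       """Run a toy program; return (label reached, list of executed pcs)."""
       pc = 0
       pending = None  # target of a taken branch whose delay slot runs next
       trace = []
       for _ in range(max_steps):
           if pc >= len(program):
               break
           op = program[pc]
           trace.append(pc)
           in_delay_slot = pending is not None
           next_pc = pending if in_delay_slot else pc + 1
           pending = None
           if op[0] in ("breq", "brne"):
               kind, reg, imm, target = op
               taken = (regs[reg] == imm) == (kind == "breq")
               if taken and in_delay_slot:
                   raise RuntimeError("branch and delay-slot branch both taken")
               if taken:
                   pending = target  # takes effect after the delay slot
           elif op[0] == "done":
               return op[1], trace
           pc = next_pc
       return None, trace


   DOUBLE_JUMP = [
       ("breq", 2, 0x1, 3),  # 0: breq $02, 0x1, #foo
       ("brne", 2, 0x1, 4),  # 1: brne $02, 0x1, #bar (delay slot of 0)
       ("nop",),             # 2: delay slot of the brne
       ("done", "foo"),      # 3: #foo
       ("done", "bar"),      # 4: #bar
   ]

   result_foo = run(DOUBLE_JUMP, {2: 1})
   result_bar = run(DOUBLE_JUMP, {2: 2})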

.. _afuc-call:

Call/Return
===========

Simple subroutines can be implemented with ``call``/``ret``.  The
jump instruction encodes a fixed offset from the SQE instruction base.

  TODO not sure how many levels deep function calls can be nested.
  There isn't really a stack.  Definitely seems to be multiple
  levels of function call, see in PFP: CP_CONTEXT_SWITCH_YIELD -> f13 ->
  f22.

.. _afuc-nop:

NOPs
====

Afuc has a special NOP encoding where the low 24 bits are ignored by the
processor. On a5xx the high 8 bits are ``00``; on a6xx they are ``01``
(probably to make sure that 0 is not a legal instruction, increasing the
chances of halting immediately when something is misconfigured). This is
sometimes used to create a "payload" that is ignored when executed. For
example, the first 2 instructions of the firmware typically contain the
firmware ID and version followed by the packet handling table offset encoded
as NOPs. They are skipped when executed but they are later read as data by the
bootstrap routine.
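
The payload trick amounts to packing 24 bits of data under the NOP opcode
byte. The Python sketch below shows the packing and unpacking using only the
facts stated above (high 8 bits ``00`` on a5xx and ``01`` on a6xx, low 24 bits
ignored). The function names and the ``gen`` strings are illustrative, and how
the firmware ID, version and table offset are laid out inside the payload is
not modelled.

.. code-block:: python

   NOP_OPCODE = {"a5xx": 0x00, "a6xx": 0x01}


   def encode_nop(payload: int, gen: str = "a6xx") -> int:
       """Build a NOP instruction word carrying a 24-bit payload."""
       if not 0 <= payload <= 0xFFFFFF:
           raise ValueError("payload must fit in 24 bits")
       return (NOP_OPCODE[gen] << 24) | payload


   def decode_nop(word: int, gen: str = "a6xx") -> int:
       """Return the payload of a NOP word, as a loader reading it as data."""
       if (word >> 24) & 0xFF != NOP_OPCODE[gen]:
           raise ValueError("not a NOP for this generation")
       return word & 0xFFFFFF


   word = encode_nop(0x123456, "a6xx")
   payload = decode_nop(word, "a6xx")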

.. _afuc-control:

Control Registers
=================

Control registers are a special register space that can only be read/written
directly by CP through ``cread``/``cwrite`` instructions:

- ``cread $dst, [$off + addr], flags``
- ``cread $dst, [$off + addr]!, flags``
- ``cwrite $src, [$off + addr], flags``
- ``cwrite $src, [$off + addr]!, flags``

Control registers ``0x000`` to ``0x0ff`` are private registers used to control
the CP, for example to indicate where to read from memory or (normal)
registers.  ``0x100`` to ``0x17f`` are a private scratch space used by the
firmware however it wants, for example as an ad-hoc stack to spill registers
when calling a function or to store the scratch used in ``CP_SCRATCH_TO_*``
packets. Starting with the introduction of LPAC, ``0x200`` to ``0x27f`` are a
shared scratch space used to communicate between processors. On a7xx they
can also be written on event completion to implement so-called "on-chip
timestamps".

In cases where no offset is needed, ``$00`` is frequently used as the offset.

The addressing mode with ``!`` is a pre-increment mode that writes the final
address ``$off + addr`` to ``$off``.
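
The two addressing modes can be modelled as follows. This Python sketch is an
illustration only: the class ``ControlRegs`` and its fields are invented for
this example, the ``flags`` operand is left out because its meaning is not
described here, and unwritten control registers are assumed to read as zero.

.. code-block:: python

   class ControlRegs:
       """Toy model of ``cread``/``cwrite`` with optional pre-increment."""

       def __init__(self):
           self.regs = [0] * 32  # general registers; $00 is hard-wired to zero
           self.ctrl = {}        # control register space, indexed by address

       def _set(self, reg, value):
           if reg != 0:          # writes to $00 are discarded
               self.regs[reg] = value & 0xFFFFFFFF

       def _address(self, off, addr, preinc):
           final = self.regs[off] + addr
           if preinc:            # the ``!`` form writes the address back
               self._set(off, final)
           return final

       def cwrite(self, src, off, addr, preinc=False):
           value = self.regs[src]  # read the source before any write-back
           self.ctrl[self._address(off, addr, preinc)] = value

       def cread(self, dst, off, addr, preinc=False):
           self._set(dst, self.ctrl.get(self._address(off, addr, preinc), 0))


   cp = ControlRegs()
   cp.regs[5] = 0x100                    # $05: base of the scratch area
   cp.regs[2] = 0xAA
   cp.cwrite(2, 5, 0x1, preinc=True)     # cwrite $02, [$05 + 0x001]!
   cp.cwrite(2, 5, 0x1, preinc=True)     # lands one register further on
   cp.cread(3, 0, 0x101)                 # cread $03, [$00 + 0x101]

Repeating a pre-increment write with the same ``addr`` walks through
consecutive control registers, which is what makes the scratch space usable as
an ad-hoc stack.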

For example, the following sequence loads the ``CP_INDIRECT_BUFFER``
parameters and updates ``CP_IBn_BASE_LO/HI/BUFSIZE``::

  ; load CP_INDIRECT_BUFFER parameters from cmdstream:
  mov $02, $data   ; low 32b of IB target address
  mov $03, $data   ; high 32b of IB target
  mov $04, $data   ; IB size in dwords

  ; sanity check # of dwords:
  breq $04, 0x0, #l23

  ; this seems something to do with figuring out whether
  ; we are going from RB->IB1 or IB1->IB2 (ie. so the
  ; below cwrite instructions update either
  ; CP_IB1_BASE_LO/HI/BUFSIZE or CP_IB2_BASE_LO/HI/BUFSIZE)
  and $05, $18, 0x0003
  shl $05, $05, 0x0002

  ; update CP_IBn_BASE_LO/HI/BUFSIZE:
  cwrite $02, [$05 + 0x0b0], 0x8
  cwrite $03, [$05 + 0x0b1], 0x8
  cwrite $04, [$05 + 0x0b2], 0x8

Unlike normal GPU registers, writing control registers seems to always take
effect immediately; if writing a control register triggers some complex
operation that the firmware needs to wait for, then it typically uses a
spinloop with another control register to wait for it to finish.

Control registers are documented in ``adreno_control_regs.xml``. The
disassembler will try to recognize an immediate address as a known control
register and print it by name, for example in this sequence, which is similar
to the one above but taken from a6xx::

  and $05, $12, 0x0003
  shl $05, $05, 0x0002
  cwrite $0e, [$05 + @IB1_BASE], 0x0
  cwrite $0b, [$05 + @IB1_BASE+0x1], 0x0
  cwrite $04, [$05 + @IB1_DWORDS], 0x0

.. _afuc-sqe-regs:

SQE Registers
=============

Starting with a6xx, the state of the SQE processor itself can be accessed
through ``sread``/``swrite`` instructions that work identically to
``cread``/``cwrite``. For example, this includes the state of the
``call``/``ret`` stack. This is mainly used during the preemption routine but
it's also used to set the entrypoint for preemption.

.. _afuc-read:

Reading Memory and Registers
============================

The CP accesses memory directly with no caching. This means that except for
very small amounts of data accessed rarely, ``load`` and ``store`` are very
slow. Instead, ME/PFP and later SQE read memory through various queues. Reading
registers also uses a queue, likely because burst reading several registers at
once is faster than reading them one-by-one and reading does not complete
immediately. Queueing up a read involves writing an (address, length) pair to
a pair of control registers, and data is read from the queue using one of
three special registers:

- ``$data`` reads the next PM4 packet word. This comes from the RB, IB1, IB2,
  or SDS (Set Draw State) queue, controlled by ``@IB_LEVEL``. It also
  decrements ``$rem`` if it isn't already decremented by a rep prefix.
- ``$memdata`` reads the next word from a memory read buffer (MRB) set up by
  writing ``@MEM_READ_ADDR``/``@MEM_READ_DWORDS``. It's used by things like
  ``CP_MEMCPY`` and reading indirect draw parameters in ``CP_DRAW_INDIRECT``.
- ``$regdata`` reads from a register read buffer (RRB) set up by
  ``@REG_READ_ADDR``/``@REG_READ_DWORDS``.

RB, IB1, IB2, SDS, and MRB make up the Read-Only Queue or ROQ, in addition to
the Visibility Stream Decoder (VSD) which is set up via a similar control
register pair but is read by a fixed-function parser that the CP accesses via a
few control registers.
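
The queue-then-drain pattern for ``$memdata`` can be sketched as below. This
Python model is deliberately simplified and makes several assumptions that
are not stated above: memory is a dictionary keyed by byte address, the
address is a single value rather than a lo/hi pair, and writing the dword
count is what starts the fetch. The class and method names are invented for
the example.

.. code-block:: python

   from collections import deque


   class MemReadBuffer:
       """Toy model of the MRB: queue up a read, then drain it word by word."""

       def __init__(self, memory):
           self.memory = memory  # byte address -> dword; stands in for GPU memory
           self.addr = 0
           self.fifo = deque()

       def write_mem_read_addr(self, addr):
           """Model a ``cwrite`` to ``@MEM_READ_ADDR``."""
           self.addr = addr

       def write_mem_read_dwords(self, count):
           """Model a ``cwrite`` to ``@MEM_READ_DWORDS`` (assumed to trigger)."""
           for i in range(count):
               self.fifo.append(self.memory[self.addr + 4 * i])

       def memdata(self):
           """Model one read of ``$memdata``: pop the next queued dword."""
           return self.fifo.popleft()


   mrb = MemReadBuffer({0x1000: 11, 0x1004: 22, 0x1008: 33})
   mrb.write_mem_read_addr(0x1000)
   mrb.write_mem_read_dwords(3)
   words = [mrb.memdata() for _ in range(3)]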

.. _afuc-reg-writes:

Writing Registers
=================

The same special registers, when used as a destination, can be used to
write GPU registers on ME. Because they have a totally different function when
used as a destination, they use different names:

- ``$addr`` sets the address and disables ``CP_PROTECT`` address checking.
- ``$usraddr`` sets the address and checks it against the ``CP_PROTECT`` access
  table. It's used for addresses specified by the PM4 packet stream instead of
  internally.
- ``$data`` writes the register and auto-increments the address.

For example, to write two consecutive scratch registers::

  mov $addr, CP_SCRATCH_REG[0x2] ; set register to write
  mov $data, $03                 ; CP_SCRATCH_REG[0x2]
  mov $data, $04                 ; CP_SCRATCH_REG[0x3]
  ...

Subsequent writes to ``$data`` will increment the address of the register
to write, so a sequence of consecutive registers can be written. On a5xx ME,
this will directly write the register; on a6xx SQE this will instead determine
which cluster(s) the register belongs to and push the write onto the
appropriate per-cluster queue(s), letting the SQE run ahead of the GPU.

When bit 18 of ``$addr`` is set, the auto-incrementing is disabled. This is
often used with :ref:`NRT_DATA <afuc-mem-writes>`.
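
A minimal Python model of the ``$addr``/``$data`` pair, including the bit 18
behaviour, might look like the sketch below. The class name, the use of the
low 18 bits as the register offset, and the offsets ``0x100`` and ``0x200``
are assumptions made for illustration; ``CP_PROTECT`` checking and the a6xx
per-cluster queues are not modelled.

.. code-block:: python

   NO_INCREMENT = 1 << 18  # bit 18 of $addr disables auto-increment


   class RegWriter:
       """Toy model of writing GPU registers through $addr and $data."""

       def __init__(self):
           self.gpu_regs = {}
           self.addr = 0
           self.auto_increment = True

       def write_addr(self, value):
           """Model ``mov $addr, value``."""
           self.auto_increment = not (value & NO_INCREMENT)
           # Assumption: the register offset lives in the bits below bit 18.
           self.addr = value & (NO_INCREMENT - 1)

       def write_data(self, value):
           """Model ``mov $data, value``."""
           self.gpu_regs[self.addr] = value
           if self.auto_increment:
               self.addr += 1


   w = RegWriter()
   w.write_addr(0x100)                 # hypothetical register offset
   w.write_data(1)                     # goes to 0x100
   w.write_data(2)                     # goes to 0x101
   w.write_addr(0x200 | NO_INCREMENT)  # same target for every write
   w.write_data(7)
   w.write_data(8)

With auto-increment disabled, both of the last two writes hit offset
``0x200``, which is the behaviour a FIFO-like data register needs.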

On a5xx ME, ``$regdata`` can also be used to directly read a register::

  mov $addr, CP_SCRATCH_REG[0x2]
  mov $03, $regdata
  mov $04, $regdata

This does not exist on a6xx because register reads are not synchronized against
writes any more.

Many registers that are updated frequently have two banks, so they can be
updated without stalling for the previous draw to finish.  On a5xx, these
banks are arranged so bit 11 is zero for bank 0 and 1 for bank 1.  The ME fw
(at least the version I'm looking at) stores this in ``$17``, so to update
these registers from ME::

  or $addr, $17, VFD_INDEX_OFFSET
  mov $data, $03
  ...

On a6xx this is handled transparently to the SQE, and the bank to use is stored
separately in the cluster queue.
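
The a5xx bank selection is a single OR with a value that has only bit 11 set
or clear. The Python sketch below spells that out. The register offset
``0x0408`` is a made-up placeholder with bit 11 clear, not the real value of
``VFD_INDEX_OFFSET``, and the function names are illustrative.

.. code-block:: python

   BANK_BIT = 11


   def bank_select(bank: int) -> int:
       """Value the firmware would keep in ``$17`` for the given bank."""
       if bank not in (0, 1):
           raise ValueError("there are only two banks")
       return bank << BANK_BIT


   def banked_addr(bank_sel: int, reg: int) -> int:
       """Model ``or $addr, $17, REG``."""
       return bank_sel | reg


   def bank_of(addr: int) -> int:
       """Recover the bank number from a banked register address."""
       return (addr >> BANK_BIT) & 1


   HYPOTHETICAL_REG = 0x0408  # placeholder offset, bit 11 clear
   addr_bank0 = banked_addr(bank_select(0), HYPOTHETICAL_REG)
   addr_bank1 = banked_addr(bank_select(1), HYPOTHETICAL_REG)

Flipping between the two banks therefore only requires updating the one
register that holds the bank-select value.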
383*61046927SAndroid Build Coastguard Worker
384*61046927SAndroid Build Coastguard WorkerRegisters can also be written directly, skipping the queue, by writing
385*61046927SAndroid Build Coastguard Worker``@REG_WRITE_ADDR``/``@REG_WRITE``. This is used on a6xx for certain frontend
386*61046927SAndroid Build Coastguard Workerregisters that have their own queues and on a5xx is used by the PFP::
387*61046927SAndroid Build Coastguard Worker
388*61046927SAndroid Build Coastguard Worker  mov $0c, CP_SCRATCH_REG[0x7]
389*61046927SAndroid Build Coastguard Worker  mov $02, 0x789a   ; value
390*61046927SAndroid Build Coastguard Worker  cwrite $0c, [$00 + @REG_WRITE_ADDR], 0x8
391*61046927SAndroid Build Coastguard Worker  cwrite $02, [$00 + @REG_WRITE], 0x8
392*61046927SAndroid Build Coastguard Worker
393*61046927SAndroid Build Coastguard WorkerLike with the ``$addr``/``$data`` approach, the destination register address
394*61046927SAndroid Build Coastguard Workerincrements on each write to ``@REG_WRITE``.
395*61046927SAndroid Build Coastguard Worker
396*61046927SAndroid Build Coastguard Worker.. _afuc-pipe-regs:
397*61046927SAndroid Build Coastguard Worker
398*61046927SAndroid Build Coastguard WorkerPipe Registers
399*61046927SAndroid Build Coastguard Worker--------------
400*61046927SAndroid Build Coastguard Worker
401*61046927SAndroid Build Coastguard WorkerThis yet another private register space, triggered by writing to the high 8
402*61046927SAndroid Build Coastguard Workerbits of ``$addr`` and then writing ``$data`` normally. Some pipe registers like
403*61046927SAndroid Build Coastguard Worker``WAIT_MEM_WRITES`` or ``WAIT_GPU_IDLE`` have no data and a write is triggered
404*61046927SAndroid Build Coastguard Workerimmediately when ``$addr`` is written, for example in ``CP_WAIT_MEM_WRITES``::
405*61046927SAndroid Build Coastguard Worker
406*61046927SAndroid Build Coastguard Worker  mov $addr, 0x0084 << 24 ; |WAIT_MEM_WRITES
407*61046927SAndroid Build Coastguard Worker
408*61046927SAndroid Build Coastguard WorkerThe pipe register is decoded here by the disassembler in a comment.
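
To make the encoding concrete, here is a small Python sketch that splits an
``$addr`` value into the pipe register ID (the high 8 bits) and the remaining
low bits. The IDs in the table are only the ones that appear in disassembly
comments in this document; the helper name is made up, and this is not a
description of how the real disassembler is implemented.

```python
# Pipe register IDs copied from disassembly comments in this document.
PIPE_REG_NAMES = {
    0x84: "WAIT_MEM_WRITES",
    0xA0: "NRT_ADDR",
    0xA2: "NRT_DATA",
}


def decode_addr(value):
    """Split a 32-bit $addr value into (pipe register ID, low 24 bits)."""
    value &= 0xFFFFFFFF
    return value >> 24, value & 0x00FFFFFF


pipe_id, low_bits = decode_addr(0x0084 << 24)
pipe_name = PIPE_REG_NAMES[pipe_id]
```

Here ``pipe_name`` ends up as ``WAIT_MEM_WRITES``, matching the comment the
disassembler emits.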

The main differences between pipe registers and control registers are:

- They are always write-only.
- On a6xx they are pipelined together with normal register writes; on a5xx they
  are written from ME like normal registers.
- Writing them can take an arbitrary amount of time, so they can be used to
  wait for some condition without spinning.

In short, they behave more like normal registers but are not expected to be
read/written by anything other than CP. Over time more and more GPU registers
not touched by the kernel driver have been converted to pipe registers.

.. _afuc-mem-writes:

Writing Memory
==============

Writing memory is done by writing GPU registers:

- ``CP_ME_NRT_ADDR_LO``/``_HI`` - write to set the address to read or write
- ``CP_ME_NRT_DATA`` - write to trigger write to address in ``CP_ME_NRT_ADDR``.

The address register increments with successive writes.

On a5xx, this seems to be only used by ME.  If PFP were also using it, they would
race with each other.  It can also be used for reads, primarily small reads.

Memory Write example::

  ; store 64b value in $04+$05 to 64b address in $02+$03
  mov $addr, CP_ME_NRT_ADDR_LO
  mov $data, $02
  mov $data, $03
  mov $addr, CP_ME_NRT_DATA
  mov $data, $04
  mov $data, $05

Memory Read example::

  ; load 64b value from address in $02+$03 into $04+$05
  mov $addr, CP_ME_NRT_ADDR_LO
  mov $data, $02
  mov $data, $03
  mov $04, $addr
  mov $05, $addr

On a6xx ``CP_ME_NRT_ADDR`` and ``CP_ME_NRT_DATA`` have been replaced by
:ref:`pipe registers <afuc-pipe-regs>` and they can only be used for writes, but
this otherwise works similarly.
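
The write path can be pictured with a toy Python model. Everything here is a
sketch: the class and method names are invented, and the step of 4 bytes per
write is an assumption (the text only says the address increments).

```python
class NrtPort:
    """Toy model of the CP_ME_NRT_ADDR / CP_ME_NRT_DATA register pair."""

    DWORD = 4  # assumption: the address advances by one 32-bit word per write

    def __init__(self):
        self.addr = 0
        self.memory = {}  # byte address -> 32-bit word

    def set_addr(self, lo, hi):
        """Model the two writes to CP_ME_NRT_ADDR_LO/_HI."""
        self.addr = ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)

    def write_data(self, word):
        """Model one write to CP_ME_NRT_DATA: store, then advance the address."""
        self.memory[self.addr] = word & 0xFFFFFFFF
        self.addr += self.DWORD


port = NrtPort()
port.set_addr(0x1000, 0x1)   # plays the role of $02, $03
port.write_data(0xDEADBEEF)  # plays the role of $04
port.write_data(0x12345678)  # plays the role of $05
```

After the two data writes, the model holds one word at ``0x1_0000_1000`` and
the next at ``0x1_0000_1004``, which is the effect the write example relies on.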

Load and Store Instructions
===========================

a6xx adds ``load`` and ``store`` instructions that work similarly to ``cread``
and ``cwrite``. Because the address is 64 bits but registers are 32 bits, the
high 32 bits come from the ``@LOAD_STORE_HI``
:ref:`control register <afuc-control>`. They are mostly used by the context
switch routine and even then very sparingly: before the memory read/write queue
state is saved, and while it is being restored.

Modifiers
=========

There are several modifiers that enable more compact and efficient
implementations of common patterns:

.. _afuc-rep:

Repeat
------

``(rep)`` repeats the same instruction ``$rem`` times. More precisely, it
decrements ``$rem`` after the instruction executes if it wasn't already
decremented by a read from ``$data``, and re-executes the instruction until
``$rem`` is 0.  It can be used with ALU instructions and control instructions.
Usually it is used in conjunction with ``$data`` to read the rest of the packet
in one instruction, but it can also be used freestanding, for example this
snippet clears the control register scratch space::

  mov $rem, 0x0080 ; clear 0x80 registers
  mov $03, 0x00ff ; start at 0xff + 1 = 0x100
  (rep)cwrite $00, [$03 + 0x001], 0x4

Note the use of pre-increment mode, so that the first execution clears
``0x100`` and updates ``$03`` to ``0x100``, the second execution clears
``0x101`` and updates ``$03`` to ``0x101``, and so on.
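
The loop that ``(rep)`` performs for this snippet can be written out in Python.
This is a sketch of the behaviour described above, not of the hardware: the
function name is invented, ``$00`` is taken to read as zero (which is what
"clears" implies), and the case where ``$rem`` starts at 0 is not modelled.

```python
def rep_cwrite_clear(rem, base, ctrl_regs):
    """Model `(rep)cwrite $00, [$03 + 0x001], 0x4` with pre-increment.

    rem  -- initial value of $rem (assumed non-zero)
    base -- initial value of $03
    Returns the final value of $03.
    """
    while rem > 0:
        base += 1            # pre-increment: update $03 first ...
        ctrl_regs[base] = 0  # ... then write $00 to that control register
        rem -= 1             # (rep) decrements $rem after each execution
    return base


ctrl_regs = {}
final_base = rep_cwrite_clear(rem=0x80, base=0xFF, ctrl_regs=ctrl_regs)
```

The model touches control registers ``0x100`` through ``0x17f`` and leaves
``$03`` at ``0x17f``.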

.. _afuc-xmov:

eXtra Moves
-----------

``(xmovN)`` is an optimization which lets the firmware read multiple words from
a queue in the same cycle. Conceptually, it adds "extra" mov instructions to be
executed after a given ALU instruction, although in practice they are likely
executed in parallel. ``(xmov1)`` adds up to 1 move, ``(xmov2)`` adds up to 2,
and ``(xmov3)`` adds up to 3. The actual number of moves added is the minimum
of the number in the instruction and ``$rem``, so an ``(xmov3)`` instruction
behaves like an ``(xmov1)`` instruction if ``$rem = 1``. Given an instruction::

  (xmovN) alu $dst, $src1, $src2

or a 1-source instruction::

  (xmovN) alu $dst, $src2

then we compute the number of extra moves ``M = min(N, $rem)``. If ``M = 1``,
then we add::

  mov $data, $src2

If ``M = 2``, then we add::

  mov $data, $src2
  mov $data, $src2

Finally, as a special case explained below, if ``M = 3`` then we add::

  mov $data, $src2
  mov $dst, $src2 ; !!!
  mov $data, $src2

If ``$dst`` is not one of the "special" registers ``$data``, ``$addr``,
``$usraddr``, then ``$data`` is replaced by ``$00`` in all destinations, i.e.
the results of the subsequent moves are discarded.
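
The expansion rules above can be restated as a short Python function that
returns the extra moves as text. The function name is invented for
illustration, and ``rem`` is simply whatever value of ``$rem`` the hardware
uses for the ``min``; exactly when that value is sampled is not modelled here.

```python
SPECIAL_DSTS = {"$data", "$addr", "$usraddr"}


def expand_xmov(n, rem, dst, src2):
    """Return the extra moves added by (xmovN) for `alu dst, ..., src2`."""
    m = min(n, rem)
    if m == 3:
        # Special case: the middle move targets $dst instead of $data.
        extra_dsts = ["$data", dst, "$data"]
    else:
        extra_dsts = ["$data"] * m
    if dst not in SPECIAL_DSTS:
        # Results of the $data moves are discarded by redirecting them to $00.
        extra_dsts = ["$00" if d == "$data" else d for d in extra_dsts]
    return [f"mov {d}, {src2}" for d in extra_dsts]


bunch_moves = expand_xmov(3, 3, "$usraddr", "$data")
short_moves = expand_xmov(3, 1, "$usraddr", "$data")
```

``bunch_moves`` contains the three extra moves of the ``M = 3`` case with
``$usraddr`` as the destination, while ``short_moves`` contains a single
``mov $data, $data`` because ``$rem`` limits the expansion.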

The purpose of the ``M = 3`` special case is mostly to efficiently implement
``CP_CONTEXT_REG_BUNCH``. This is the entire implementation of
``CP_CONTEXT_REG_BUNCH``, which is essentially just one instruction::

  CP_CONTEXT_REG_BUNCH:
  (rep)(xmov3)mov $usraddr, $data
  waitin
  mov $01, $data

If there are 4 or more words remaining in the packet, that is if there are at
least two more registers to write, then (ignoring the ``(rep)`` for a moment)
the instruction expands to::

  mov $usraddr, $data
  mov $data, $data
  mov $usraddr, $data
  mov $data, $data

This is likely all executed in a single cycle, allowing us to write 2 registers
per cycle.

``(xmov1)`` can also be added to ``(rep)mov $data, $data``, which is a common
pattern to write the rest of the packet to successive registers, to write up to
2 registers per cycle as well. The firmware does not use ``(xmov3)`` for this
pattern, however, so 2 registers per cycle is likely a hardware limitation.

Although ``(xmovN)`` is often used in combination with ``(rep)``, it doesn't
have to be. For example, ``(xmov1)mov $data, $data`` moves the next 2 packet
words to 2 successive registers.

.. _afuc-sds:

Set Draw State
--------------

``(sdsN)`` is a modifier for ``cwrite`` used to accelerate
``CP_SET_DRAW_STATE``. For each draw state group to update,
``CP_SET_DRAW_STATE`` needs to copy 3 words from the packet containing the
group to update, metadata, and base address plus size.  Using the ``(sds2)``
modifier as well as ``(rep)``, this can be accomplished in a single
instruction::

  (rep)(sds2)cwrite $data, [$00 + @DRAW_STATE_SET_HDR]

The first word containing the header is written to ``@DRAW_STATE_SET_HDR``, and
the second and third words containing the draw state base come from reading the
source again twice and are written directly to the draw state RAM.

In testing with other control registers, ``(sdsN)`` causes the source to be
read ``N`` extra times and then thrown away. Only when used in combination with
``@DRAW_STATE_SET_HDR`` do the extra source reads have an effect.

.. _afuc-peek:

Peek
----

``(peek)`` is valid on ALU instructions without an immediate. It modifies what
``$data`` (and possibly ``$memdata`` and ``$regdata``) do by making them avoid
consuming the word. The next read of ``$data`` will return the same thing. This
is used solely by ``CP_INDIRECT_BUFFER`` to test if there is a subsequent IB
that can be prefetched while the first IB is executed, without actually
consuming the header for the next packet. It is introduced on a7xx, and
replaces the use of a special control register.
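
The difference between an ordinary read and a ``(peek)`` read can be shown with
a toy queue in Python. The class is purely illustrative and says nothing about
how the hardware FIFO is built.

```python
from collections import deque


class PacketFifo:
    """Toy model of the queue behind $data."""

    def __init__(self, words):
        self._words = deque(words)

    def read(self):
        """Ordinary read of $data: return the next word and consume it."""
        return self._words.popleft()

    def peek(self):
        """Read with (peek): return the next word but leave it queued."""
        return self._words[0]


fifo = PacketFifo([0x11, 0x22])
peeked = fifo.peek()
first = fifo.read()
second = fifo.read()
```

``peeked`` and ``first`` are the same word, because the peek did not advance
the queue; only the ordinary reads consume anything.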

Packet Table
============

The core of the microprocessor's job is to parse each packet header and jump to
its handler. This is done through a ``waitin`` instruction which waits for the
packet header to become available and then parses the header and jumps to the
handler using a jump table. However it does *not* actually consume the header.
Like any branch instruction, it has a delay slot, and by convention this delay
slot always contains a ``mov $01, $data`` instruction. This consumes the same
header that ``waitin`` parsed and puts it in ``$01`` so that the packet header
is available in ``$01`` to the next packet's handler. Thus all packet handlers
end with this sequence::

  waitin
  mov $01, $data

The jump table itself is initialized by the SQE in the bootstrap routine at the
beginning of the firmware. Amongst other tasks, it reads the offset of the jump
table from the NOP payload at the beginning, then uses a jump table embedded at
the end of the firmware to set it up by writing the ``@PACKET_TABLE_WRITE``
control register.  After everything is set up, it does the ``waitin`` sequence
to start handling the first packet (which should be ``CP_ME_INIT``).
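
The interplay between ``waitin`` and its delay slot can be sketched in Python.
All names here are hypothetical, as are the opcode numbers and handler names in
the table; the opcode extraction assumes a type-7 PM4 header with the opcode in
bits 16..22, which this section does not itself state.

```python
from collections import deque


def type7_opcode(header):
    # Assumption: type-7 PM4 header with the opcode in bits 16..22.
    return (header >> 16) & 0x7F


def waitin(fifo, packet_table, opcode_of):
    """Model `waitin` plus its `mov $01, $data` delay slot.

    Returns (handler, r1), where r1 is the header as seen by the handler.
    """
    header = fifo[0]                           # waitin only looks at the header
    handler = packet_table[opcode_of(header)]  # jump-table lookup
    r1 = fifo.popleft()                        # delay slot consumes the header
    return handler, r1


# Hypothetical opcode numbers and handler names, for illustration only.
packet_table = {0x10: "handler_a", 0x3D: "handler_b"}
fifo = deque([0x703D0002, 0xAAAA, 0xBBBB])

handler, r1 = waitin(fifo, packet_table, type7_opcode)
```

After the call, the handler has been selected, ``r1`` holds the header, and the
queue starts at the first payload word.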

Example Packet
==============

Let's examine an implementation of ``CP_MEM_WRITE``::

  CP_MEM_WRITE:
  mov $addr, 0x00a0 << 24 ; |NRT_ADDR

First, we set up the register to write to, which is the ``NRT_ADDR``
:ref:`pipe register <afuc-pipe-regs>`. It turns out that the low 2 bits of
``NRT_ADDR`` are a flag which when 1 disables auto-incrementing ``NRT_ADDR``
when ``NRT_DATA`` is written, but we don't want this behavior so we have to
make sure they are clear::

  or $02, $data, 0x0003 ; reading $data reads the next PM4 word
  xor $data, $02, 0x0003 ; writing $data writes the register, which is NRT_ADDR
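
The ``or``/``xor`` pair above first forces both low bits to 1 and then flips
them back to 0, which leaves every other bit untouched. A quick Python check of
that identity (the function name is made up):

```python
def clear_low2(word):
    """Model `or $02, $data, 0x3` followed by `xor $data, $02, 0x3`."""
    forced = word | 0x3  # force both low bits to 1
    return forced ^ 0x3  # flip them back to 0


# Try all four combinations of the two low bits on an example address.
results = [clear_low2(0x1000 | low) for low in range(4)]
```

Every entry of ``results`` is ``0x1000``, the same as masking with ``~0x3``.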

Writing ``$data`` auto-increments ``$addr``, so now the next write is to
``0xa1`` or ``NRT_ADDR+1`` (``NRT_ADDR`` is a 64-bit register)::

  mov $data, $data

Now, we have to write ``NRT_DATA``. We want to repeatedly write the same
register, without having to fight the auto-increment by resetting ``$addr``
each time, which is where bit 18 of ``$addr``, which disables auto-increment,
comes in handy::

  mov $addr, 0xa204 << 16 ; |NRT_DATA

Finally, we have to repeatedly copy the remaining PM4 packet data to the
``NRT_DATA`` register, which we can do in one instruction with
:ref:`(rep) <afuc-rep>`. Furthermore we can use :ref:`(xmov1) <afuc-xmov>` to
squeeze out some more performance::

  (rep)(xmov1)mov $data, $data

At the end is the standard go-to-next-packet sequence::

  waitin
  mov $01, $data
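
Putting the steps together, the handler can be traced with a small Python
model that records which pipe register each ``$data`` write lands in. This is
a sketch under stated assumptions: the class and function names are invented,
only the pipe-register case of ``$addr`` is modelled, and auto-increment is
taken to advance the pipe register ID by one.

```python
NO_AUTO_INCREMENT = 1 << 18  # bit 18 of $addr, per the text above


class PipeWriter:
    """Toy model of $addr/$data when targeting pipe registers."""

    def __init__(self):
        self.reg = 0
        self.auto_inc = True
        self.writes = []  # list of (pipe register ID, value)

    def set_addr(self, value):
        """Model a write to $addr."""
        self.reg = (value >> 24) & 0xFF
        self.auto_inc = not (value & NO_AUTO_INCREMENT)

    def write_data(self, value):
        """Model a write to $data."""
        self.writes.append((self.reg, value))
        if self.auto_inc:
            self.reg += 1


def cp_mem_write(payload):
    """Trace CP_MEM_WRITE for the PM4 words that follow the header."""
    words = iter(payload)
    cp = PipeWriter()
    cp.set_addr(0x00A0 << 24)                 # |NRT_ADDR
    cp.write_data((next(words) | 0x3) ^ 0x3)  # low word, flag bits cleared
    cp.write_data(next(words))                # high word goes to NRT_ADDR+1
    cp.set_addr(0xA204 << 16)                 # |NRT_DATA, no auto-increment
    for word in words:                        # (rep)(xmov1)mov $data, $data
        cp.write_data(word)
    return cp.writes


writes = cp_mem_write([0x1003, 0x1, 0xAAAA, 0xBBBB])
```

For this payload the trace shows one write each to ``0xa0`` and ``0xa1``,
followed by two writes that both land in ``0xa2``.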

Reassembling Firmwares
======================

Of course, the main use of assembling is to take the firmware you're using,
modify it to test something, and reassemble it. Reassembling a firmware should
work out-of-the-box, and should give you back an identical firmware, but there
is a caveat if you want to reassemble a modified firmware and use preemption.
The preemption routines contain a few tables embedded in the firmware, and they
load the offset of the table with a ``mov`` instruction and then add it to
``CP_SQE_INSTR_BASE``; that ``mov`` needs to be turned into a relocation.
``afuc-asm`` supports using labels as immediates for this::

  foo:
  [00000000]
  ...

  mov $02, #foo << 2 ; #foo will be replaced with the offset in words

However, you have to manually insert the labels and replace the constant. On
a7xx there are multiple tables next to each other that look like one table, so
be careful to make sure you've found all the places it offsets from
``CP_SQE_INSTR_BASE``! There are also tables in the BV microcode on a7xx. To
check that the relocations are correct, check that reassembling an otherwise
unmodified firmware still gives an identical result after adding the
relocations.
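
One way to do that last check is a byte-for-byte comparison of the original
and reassembled images. The helper below is a generic Python sketch, not part
of the afuc tools; the function name is made up, and you would pass it whatever
paths your own original and reassembled files have.

```python
from pathlib import Path


def first_difference(original_path, reassembled_path):
    """Return the byte offset of the first mismatch, or None if identical."""
    a = Path(original_path).read_bytes()
    b = Path(reassembled_path).read_bytes()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other; they diverge where it ends.
        return min(len(a), len(b))
    return None
```

A result of ``None`` means the relocations reproduced the original firmware
exactly; any other value points at the first byte that changed.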

A6XX NOTES
==========

The ``$14`` register holds global flags set by::

  CP_SKIP_IB2_ENABLE_LOCAL - b8
  CP_SKIP_IB2_ENABLE_GLOBAL - b9
  CP_SET_MARKER
    MODE=GMEM - sets b15
    MODE=BLIT2D - clears b15, b12, b7
  CP_SET_MODE - b29+b30
  CP_SET_VISIBILITY_OVERRIDE - b11, b21, b30?
  CP_SET_DRAW_STATE - checks b29+b30

  CP_COND_REG_EXEC - checks b10, which should be predicate flag?