# Special registers

`r0l` is the hardware nesting counter.

`r1` is the hardware link register.

`r5` and `r6` are preloaded in vertex shaders to the vertex ID and instance ID.

# ABI

The following section describes the ABI used by non-monolithic programs.

## Vertex

Registers have the following layout at the beginning of the vertex shader
(written by the vertex prolog):

* `r0-r4` and `r7` are undefined. This avoids preloading into the nesting
  counter or having unaligned values. The prolog is free to use these registers
  as temporaries.
* `r5-r6` retain their usual meanings, even if the vertex shader is running as a
  hardware compute shader. This allows software index fetch code to run in the
  prolog without contaminating the main shader key.
* `r8` onwards contains 128-bit uniform vectors for each attribute. This
  accommodates 30 attributes without spilling, exceeding the 16 attribute API
  minimum. For 32 attributes, we will need to use function calls or the stack.

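The register arithmetic behind the last bullet can be sketched as follows. This is an illustrative calculation, not code from the compiler: the helper name `attribute_base_register` is hypothetical, and `NUM_GPRS = 128` is an assumption, chosen because it is the register-file size consistent with the 30-attribute figure above.

```python
# Illustrative sketch only; names and NUM_GPRS are assumptions, not compiler API.
NUM_GPRS = 128          # assumed number of 32-bit GPRs (r0..r127)
FIRST_ATTRIB_REG = 8    # attributes start at r8, per the layout above
REGS_PER_ATTRIB = 4     # one 128-bit vector occupies four 32-bit registers


def attribute_base_register(index: int) -> int:
    """Return the first 32-bit register holding attribute `index`."""
    if index < 0:
        raise ValueError("attribute index must be non-negative")
    base = FIRST_ATTRIB_REG + REGS_PER_ATTRIB * index
    if base + REGS_PER_ATTRIB > NUM_GPRS:
        raise ValueError(f"attribute {index} does not fit in registers")
    return base


# Number of attributes that fit in registers without spilling.
MAX_ATTRIBS_IN_REGS = (NUM_GPRS - FIRST_ATTRIB_REG) // REGS_PER_ATTRIB
```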
One useful property is that the GPR usage of the combined program is equal to
the GPR usage of the main shader. The prolog cannot write higher registers than
read by the main shader.

Vertex prologs do not have any uniform registers allocated for preamble
optimization or constant promotion, as this adds complexity without any
legitimate use case.

For a vertex shader reading $n$ attributes, the following uniform layout is
used:

* The first $n$ 64-bit uniforms are the base addresses of each attribute.
* The next $n$ 32-bit uniforms are the associated clamps (sizes). Presently
  robustness is always used.
* The next 2x32-bit uniform is the base vertex and base instance. This must
  always be reserved because it is unknown at vertex shader compile-time whether
  any attribute will use instancing. Reserving also the base vertex allows us to
  push both conveniently with a single USC Uniform word.
* The next 16-bit uniform is the draw ID.
* For a hardware compute shader, the next 48 bits are padding.
* For a hardware compute shader, the next 64-bit uniform is a pointer to the
  input assembly buffer.

In total, the first $6n + 5$ 16-bit uniform slots are reserved for a hardware
vertex shader, or $6n + 12$ for a hardware compute shader.

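The slot totals above can be checked by walking the layout in units of 16-bit uniform slots. The following is a minimal sketch under stated assumptions: the function name and dictionary keys are hypothetical, and placing base vertex before base instance follows the order in which the text lists them rather than anything confirmed by the compiler.

```python
# Illustrative sketch only; function and key names are hypothetical.
def vertex_uniform_layout(n: int, hw_compute: bool = False) -> dict:
    """Return offsets, in 16-bit uniform slots, for a shader reading n attributes."""
    if n < 0:
        raise ValueError("attribute count must be non-negative")
    layout = {}
    slot = 0
    layout["attrib_base"] = slot       # n x 64-bit base addresses
    slot += 4 * n
    layout["attrib_clamp"] = slot      # n x 32-bit clamps
    slot += 2 * n
    layout["base_vertex"] = slot       # 32-bit (order assumed)
    slot += 2
    layout["base_instance"] = slot     # 32-bit (order assumed)
    slot += 2
    layout["draw_id"] = slot           # 16-bit
    slot += 1
    if hw_compute:
        slot += 3                      # 48 bits of padding
        layout["input_assembly"] = slot  # 64-bit pointer
        slot += 4
    layout["total"] = slot
    return layout
```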
## Fragment

When sample shading is enabled in a non-monolithic fragment shader, the fragment
shader has the following register inputs:

* `r0l = 0`. This is the hardware nesting counter.
* `r0h` is the mask of samples currently being shaded. This usually equals
  `1 << sample ID`, for "true" per-sample shading.

When sample shading is disabled, no register inputs are defined. The fragment
prolog (if present) may clobber whatever registers it pleases.

Registers have the following layout at the end of the fragment shader (read by
the fragment epilog):

* `r0l = 0` if sample shading is enabled. This is implicitly true.
* `r0h` preserved if sample shading is enabled.
* `r2` and `r3l` contain the emitted depth/stencil respectively, if
  depth and/or stencil are written by the fragment shader. Depth/stencil writes
  must be deferred to the epilog for correctness when the epilog can discard
  (i.e. when alpha-to-coverage is enabled).
* `r3h` contains the logically emitted sample mask, if the fragment shader uses
  forced early tests. This predicates the epilog's stores.
* The vec4 of 32-bit registers beginning at `r(4 * (i + 1))` contains the colour
  output for render target `i`. When dual source blending is enabled, there is
  only a single render target and the dual source colour is treated as the
  second render target (registers r8-r11).

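The colour output mapping in the last bullet reduces to one line of arithmetic. The sketch below is purely illustrative; the helper name `colour_output_registers` is hypothetical and not part of the compiler.

```python
# Illustrative sketch only; the helper name is hypothetical.
def colour_output_registers(render_target: int) -> range:
    """Return the four 32-bit register indices holding the colour for a render target."""
    if render_target < 0:
        raise ValueError("render target index must be non-negative")
    base = 4 * (render_target + 1)
    return range(base, base + 4)


# With dual source blending, the dual source colour takes the slot of
# render target 1, which is r8-r11.
DUAL_SOURCE_REGISTERS = colour_output_registers(1)
```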
Uniform registers have the following layout:

* u0_u1: 64-bit render target texture heap
* u2...u5: Blend constant
* u6_u7: Root descriptor, so we can fetch the 64-bit fragment invocation counter
  address and (OpenGL only) the 64-bit polygon stipple address