# Special registers

`r0l` is the hardware nesting counter.

`r1` is the hardware link register.

`r5` and `r6` are preloaded in vertex shaders to the vertex ID and instance ID.

# ABI

The following section describes the ABI used by non-monolithic programs.

## Vertex

Registers have the following layout at the beginning of the vertex shader
(written by the vertex prolog):

* `r0-r4` and `r7` undefined. This avoids preloading into the nesting counter or
  having unaligned values. The prolog is free to use these registers as
  temporaries.
* `r5-r6` retain their usual meanings, even if the vertex shader is running as a
  hardware compute shader. This allows software index fetch code to run in the
  prolog without contaminating the main shader key.
* `r8` onwards contains 128-bit uniform vectors for each attribute. This
  accommodates 30 attributes without spilling, exceeding the 16 attribute API
  minimum. For 32 attributes, we will need to use function calls or the stack.

One useful property is that the GPR usage of the combined program is equal to
the GPR usage of the main shader. The prolog cannot write higher registers than
those read by the main shader.

Vertex prologs do not have any uniform registers allocated for preamble
optimization or constant promotion, as this adds complexity without any
legitimate use case.

For a vertex shader reading $n$ attributes, the following uniform layout is
used:

* The first $n$ 64-bit uniforms are the base addresses of each attribute.
* The next $n$ 32-bit uniforms are the associated clamps (sizes). Presently
  robustness is always used.
* The next 2x32-bit uniform holds the base vertex and base instance. This must
  always be reserved because it is unknown at vertex shader compile-time whether
  any attribute will use instancing. Reserving the base vertex as well allows us
  to push both conveniently with a single USC Uniform word.
* The next 16-bit uniform is the draw ID.
* For a hardware compute shader, the next 48 bits are padding.
* For a hardware compute shader, the next 64-bit uniform is a pointer to the
  input assembly buffer.

In total, the first $6n + 5$ 16-bit uniform slots are reserved for a hardware
vertex shader, or $6n + 12$ for a hardware compute shader.

## Fragment

When sample shading is enabled in a non-monolithic fragment shader, the fragment
shader has the following register inputs:

* `r0l = 0`. This is the hardware nesting counter.
* `r0h` is the mask of samples currently being shaded. This usually equals
  `1 << sample ID`, for "true" per-sample shading.

When sample shading is disabled, no register inputs are defined. The fragment
prolog (if present) may clobber whatever registers it pleases.

Registers have the following layout at the end of the fragment shader (read by
the fragment epilog):

* `r0l = 0` if sample shading is enabled. This is implicitly true.
* `r0h` preserved if sample shading is enabled.
* `r2` and `r3l` contain the emitted depth and stencil respectively, if
  depth and/or stencil are written by the fragment shader. Depth/stencil writes
  must be deferred to the epilog for correctness when the epilog can discard
  (i.e. when alpha-to-coverage is enabled).
* `r3h` contains the logically emitted sample mask, if the fragment shader uses
  forced early tests. This predicates the epilog's stores.
* The vec4 of 32-bit registers beginning at `r(4 * (i + 1))` contains the colour
  output for render target `i`. When dual source blending is enabled, there is
  only a single render target and the dual source colour is treated as the
  second render target (registers `r8-r11`).

Uniform registers have the following layout:

* u0_u1: 64-bit render target texture heap
* u2...u5: Blend constant
* u6_u7: Root descriptor, so we can fetch the 64-bit fragment invocation counter
  address and (OpenGL only) the 64-bit polygon stipple address
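
As a rough illustration of the vertex uniform layout and the fragment colour
register mapping described in this section, the arithmetic can be sketched in
Python. This is not driver code: the function names and the dictionary keys are
hypothetical, and the sketch only restates the sizes and formulas given above,
assuming the fields are packed back to back in the order listed.

```python
def vertex_uniform_layout(n, hw_compute=False):
    """Return ({name: (offset, size)}, total) in 16-bit uniform slots.

    Hypothetical helper: field names are illustrative only. Fields are
    assumed to be packed contiguously in the documented order.
    """
    layout = {}
    offset = 0

    # n 64-bit attribute base addresses: 4 slots each.
    for i in range(n):
        layout[f"attrib_base[{i}]"] = (offset, 4)
        offset += 4

    # n 32-bit clamps (sizes): 2 slots each.
    for i in range(n):
        layout[f"attrib_clamp[{i}]"] = (offset, 2)
        offset += 2

    # 2x32-bit base vertex and base instance: 4 slots together.
    layout["base_vertex_instance"] = (offset, 4)
    offset += 4

    # 16-bit draw ID: 1 slot.
    layout["draw_id"] = (offset, 1)
    offset += 1

    if hw_compute:
        # 48 bits of padding, then the 64-bit input assembly pointer.
        layout["padding"] = (offset, 3)
        offset += 3
        layout["input_assembly"] = (offset, 4)
        offset += 4

    return layout, offset


def colour_output_registers(rt):
    """Return the four 32-bit register indices for render target rt."""
    base = 4 * (rt + 1)
    return list(range(base, base + 4))


if __name__ == "__main__":
    for n in (0, 1, 16):
        _, vs_total = vertex_uniform_layout(n)
        _, cs_total = vertex_uniform_layout(n, hw_compute=True)
        print(n, vs_total, cs_total)
    print(colour_output_registers(0), colour_output_registers(1))
```

For $n = 16$, the sketch reserves 101 slots for a hardware vertex shader and
108 for a hardware compute shader, matching $6n + 5$ and $6n + 12$. Render
target 1, or the dual source colour, maps to registers 8 through 11.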