| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| test/ | 25-Apr-2025 | - | 1,250 | 968 |
| README.md | 25-Apr-2025 | 3.8 KiB | 87 | 65 |
| agx_builder.h.py | 25-Apr-2025 | 4.7 KiB | 193 | 157 |
| agx_compile.c | 25-Apr-2025 | 114 KiB | 3,642 | 2,539 |
| agx_compile.h | 25-Apr-2025 | 9 KiB | 322 | 170 |
| agx_compiler.h | 25-Apr-2025 | 27.2 KiB | 1,061 | 749 |
| agx_dce.c | 25-Apr-2025 | 1.5 KiB | 59 | 37 |
| agx_debug.h | 25-Apr-2025 | 874 B | 38 | 23 |
| agx_insert_waits.c | 25-Apr-2025 | 4.6 KiB | 165 | 103 |
| agx_ir.c | 25-Apr-2025 | 459 B | 18 | 11 |
| agx_liveness.c | 25-Apr-2025 | 3.4 KiB | 112 | 59 |
| agx_lower_64bit.c | 25-Apr-2025 | 1.5 KiB | 64 | 37 |
| agx_lower_divergent_shuffle.c | 25-Apr-2025 | 2.1 KiB | 72 | 46 |
| agx_lower_parallel_copy.c | 25-Apr-2025 | 13.1 KiB | 424 | 244 |
| agx_lower_pseudo.c | 25-Apr-2025 | 3.2 KiB | 115 | 80 |
| agx_lower_spill.c | 25-Apr-2025 | 2.5 KiB | 78 | 46 |
| agx_lower_uniform_sources.c | 25-Apr-2025 | 2.5 KiB | 87 | 65 |
| agx_minifloat.h | 25-Apr-2025 | 1.6 KiB | 63 | 37 |
| agx_nir.h | 25-Apr-2025 | 519 B | 19 | 10 |
| agx_nir_algebraic.py | 25-Apr-2025 | 7.4 KiB | 214 | 131 |
| agx_nir_lower_address.c | 25-Apr-2025 | 11.4 KiB | 354 | 223 |
| agx_nir_lower_cull_distance.c | 25-Apr-2025 | 4.1 KiB | 129 | 71 |
| agx_nir_lower_discard_zs_emit.c | 25-Apr-2025 | 4.2 KiB | 155 | 104 |
| agx_nir_lower_frag_sidefx.c | 25-Apr-2025 | 3.2 KiB | 104 | 51 |
| agx_nir_lower_interpolation.c | 25-Apr-2025 | 5.6 KiB | 170 | 103 |
| agx_nir_lower_sample_mask.c | 25-Apr-2025 | 7.4 KiB | 227 | 102 |
| agx_nir_lower_shared_bitsize.c | 25-Apr-2025 | 951 B | 38 | 27 |
| agx_nir_lower_subgroups.c | 25-Apr-2025 | 8 KiB | 252 | 177 |
| agx_nir_opt_preamble.c | 25-Apr-2025 | 8.6 KiB | 340 | 272 |
| agx_opcodes.c.py | 25-Apr-2025 | 1,002 B | 41 | 34 |
| agx_opcodes.h.py | 25-Apr-2025 | 1.9 KiB | 92 | 74 |
| agx_opcodes.py | 25-Apr-2025 | 17.7 KiB | 520 | 388 |
| agx_opt_break_if.c | 25-Apr-2025 | 2.2 KiB | 82 | 47 |
| agx_opt_compact_constants.c | 25-Apr-2025 | 1.8 KiB | 63 | 42 |
| agx_opt_cse.c | 25-Apr-2025 | 3.6 KiB | 140 | 97 |
| agx_opt_empty_else.c | 25-Apr-2025 | 2.1 KiB | 102 | 46 |
| agx_opt_jmp_none.c | 25-Apr-2025 | 5.8 KiB | 185 | 91 |
| agx_opt_promote_constants.c | 25-Apr-2025 | 5.4 KiB | 185 | 114 |
| agx_optimizer.c | 25-Apr-2025 | 15.6 KiB | 552 | 329 |
| agx_pack.c | 25-Apr-2025 | 39.8 KiB | 1,185 | 921 |
| agx_performance.c | 25-Apr-2025 | 3.9 KiB | 136 | 96 |
| agx_pressure_schedule.c | 25-Apr-2025 | 7.3 KiB | 277 | 174 |
| agx_print.c | 25-Apr-2025 | 5.7 KiB | 265 | 199 |
| agx_register_allocate.c | 25-Apr-2025 | 52.4 KiB | 1,598 | 996 |
| agx_reindex_ssa.c | 25-Apr-2025 | 689 B | 34 | 21 |
| agx_repair_ssa.c | 25-Apr-2025 | 8.6 KiB | 316 | 214 |
| agx_spill.c | 25-Apr-2025 | 33.8 KiB | 1,216 | 775 |
| agx_validate.c | 25-Apr-2025 | 13.8 KiB | 532 | 391 |
| meson.build | 25-Apr-2025 | 3.1 KiB | 124 | 114 |

README.md

# Special registers

`r0l` is the hardware nesting counter.

`r1` is the hardware link register.

`r5` and `r6` are preloaded in vertex shaders with the vertex ID and instance
ID, respectively.

# ABI

The following section describes the ABI used by non-monolithic programs.

## Vertex

Registers have the following layout at the beginning of the vertex shader
(written by the vertex prolog):

* `r0-r4` and `r7` are undefined. This avoids preloading into the nesting
  counter or having unaligned values. The prolog is free to use these registers
  as temporaries.
* `r5-r6` retain their usual meanings, even if the vertex shader is running as a
  hardware compute shader. This allows software index fetch code to run in the
  prolog without contaminating the main shader key.
* `r8` onwards contains 128-bit uniform vectors for each attribute. This
  accommodates 30 attributes without spilling, exceeding the 16 attribute API
  minimum. For 32 attributes, we will need to use function calls or the stack.
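
The attribute placement above reduces to simple arithmetic. The following is a minimal sketch, not code from the compiler: the helper names are hypothetical, and the 128-entry register file size is an assumption chosen to agree with the 30 attribute figure quoted above (8 + 30 * 4 = 128).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 128 32-bit GPRs, consistent with "30 attributes without
 * spilling" in the layout above. Not taken from the compiler source. */
#define AGX_NUM_GPRS 128

/* From the layout above: attributes start at r8. */
#define VS_FIRST_ATTRIB_REG 8

/* A 128-bit vector spans four 32-bit registers. */
#define VS_REGS_PER_ATTRIB 4

/* First 32-bit register holding attribute `attrib` (hypothetical helper). */
static unsigned
vs_attrib_base_reg(unsigned attrib)
{
   return VS_FIRST_ATTRIB_REG + VS_REGS_PER_ATTRIB * attrib;
}

/* Whether attribute `attrib` fits entirely in the assumed register file. */
static bool
vs_attrib_fits(unsigned attrib)
{
   return vs_attrib_base_reg(attrib) + VS_REGS_PER_ATTRIB <= AGX_NUM_GPRS;
}
```

Under these assumptions, attribute 0 occupies `r8-r11` and attribute 29 occupies `r124-r127`, so attribute 30 is the first one that no longer fits.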

One useful property is that the GPR usage of the combined program is equal to
the GPR usage of the main shader. The prolog cannot write higher registers than
read by the main shader.

Vertex prologs do not have any uniform registers allocated for preamble
optimization or constant promotion, as this adds complexity without any
legitimate use case.

For a vertex shader reading $n$ attributes, the following uniform layout is
used:

* The first $n$ 64-bit uniforms are the base addresses of each attribute.
* The next $n$ 32-bit uniforms are the associated clamps (sizes). Presently
  robustness is always used.
* The next 2x32-bit uniform is the base vertex and base instance. This must
  always be reserved because it is unknown at vertex shader compile-time whether
  any attribute will use instancing. Reserving also the base vertex allows us to
  push both conveniently with a single USC Uniform word.
* The next 16-bit uniform is the draw ID.
* For a hardware compute shader, the next 48 bits are padding.
* For a hardware compute shader, the next 64-bit uniform is a pointer to the
  input assembly buffer.

In total, the first $6n + 5$ 16-bit uniform slots are reserved for a hardware
vertex shader, or $6n + 12$ for a hardware compute shader.
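
The totals follow from counting 16-bit slots: $4n$ for the base addresses, $2n$ for the clamps, 4 for base vertex and base instance, and 1 for the draw ID, plus 3 slots of padding and 4 for the pointer in the compute case. A minimal sketch of the resulting offsets follows. The helper names are hypothetical, and placing the base vertex before the base instance is an assumption taken from the order in which the list names them.

```c
#include <assert.h>
#include <stdbool.h>

/* All offsets and sizes are in 16-bit uniform slots. Helper names are
 * hypothetical; only the sizes come from the layout described above. */

/* The n 64-bit base addresses come first, 4 slots each. */
static unsigned
vs_uniform_base_address(unsigned n, unsigned attrib)
{
   assert(attrib < n);
   return 4 * attrib;
}

/* The n 32-bit clamps follow, 2 slots each. */
static unsigned
vs_uniform_clamp(unsigned n, unsigned attrib)
{
   assert(attrib < n);
   return 4 * n + 2 * attrib;
}

/* Base vertex, with the base instance assumed to sit 2 slots later. */
static unsigned
vs_uniform_base_vertex(unsigned n)
{
   return 6 * n;
}

/* The 16-bit draw ID follows the 2x32-bit pair. */
static unsigned
vs_uniform_draw_id(unsigned n)
{
   return 6 * n + 4;
}

/* Hardware compute only: 3 slots of padding, then the 64-bit pointer. */
static unsigned
vs_uniform_input_assembly(unsigned n)
{
   return 6 * n + 8;
}

/* Total number of reserved 16-bit slots. */
static unsigned
vs_uniform_reserved(unsigned n, bool hw_compute)
{
   return hw_compute ? 6 * n + 12 : 6 * n + 5;
}
```

The 48 bits of padding bring the input assembly pointer to slot $6n + 8$, which is a multiple of 4 slots (64 bits) whenever $n$ is even.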

## Fragment

When sample shading is enabled in a non-monolithic fragment shader, the fragment
shader has the following register inputs:

* `r0l = 0`. This is the hardware nesting counter.
* `r0h` is the mask of samples currently being shaded. This usually equals
  `1 << sample ID`, for "true" per-sample shading.

When sample shading is disabled, no register inputs are defined. The fragment
prolog (if present) may clobber whatever registers it pleases.

Registers have the following layout at the end of the fragment shader (read by
the fragment epilog):

* `r0l = 0` if sample shading is enabled. This is implicitly true.
* `r0h` preserved if sample shading is enabled.
* `r2` and `r3l` contain the emitted depth and stencil, respectively, if
  depth and/or stencil are written by the fragment shader. Depth/stencil writes
  must be deferred to the epilog for correctness when the epilog can discard
  (i.e. when alpha-to-coverage is enabled).
* `r3h` contains the logically emitted sample mask, if the fragment shader uses
  forced early tests. This predicates the epilog's stores.
* The vec4 of 32-bit registers beginning at `r(4 * (i + 1))` contains the colour
  output for render target `i`. When dual source blending is enabled, there is
  only a single render target and the dual source colour is treated as the
  second render target (registers r8-r11).
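
The register arithmetic in the fragment ABI above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the 16-sample bound is an assumption that follows from `r0h` being a 16-bit half register.

```c
#include <assert.h>
#include <stdint.h>

/* Each colour output is a vec4 of 32-bit registers. */
#define FS_REGS_PER_COLOUR 4

/* Sample mask held in r0h for "true" per-sample shading (hypothetical
 * helper). The mask has to fit in a 16-bit half register. */
static uint16_t
fs_sample_mask(unsigned sample_id)
{
   assert(sample_id < 16);
   return (uint16_t)(1u << sample_id);
}

/* First 32-bit register of the colour output for render target `rt`. */
static unsigned
fs_colour_base_reg(unsigned rt)
{
   return FS_REGS_PER_COLOUR * (rt + 1);
}

/* With dual source blending, the second colour is treated as render
 * target 1, so it lands in r8-r11. */
static unsigned
fs_dual_source_base_reg(void)
{
   return fs_colour_base_reg(1);
}
```

Render target 0 therefore occupies `r4-r7`, leaving `r0-r3` for the nesting counter, sample masks, depth and stencil described in the list.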

Uniform registers have the following layout:

* u0_u1: 64-bit render target texture heap
* u2...u5: Blend constant
* u6_u7: Root descriptor, so we can fetch the 64-bit fragment invocation counter
  address and (OpenGL only) the 64-bit polygon stipple address
