# Special registers

`r0l` is the hardware nesting counter.

`r1` is the hardware link register.

`r5` and `r6` are preloaded in vertex shaders with the vertex ID and instance
ID, respectively.

# ABI

The following section describes the ABI used by non-monolithic programs.

## Vertex

Registers have the following layout at the beginning of the vertex shader
(written by the vertex prolog):

* `r0-r4` and `r7` undefined. This avoids preloading into the nesting counter or
  having unaligned values. The prolog is free to use these registers as
  temporaries.
* `r5-r6` retain their usual meanings, even if the vertex shader is running as a
  hardware compute shader. This allows software index fetch code to run in the
  prolog without contaminating the main shader key.
* `r8` onwards contains 128-bit uniform vectors for each attribute.
  Accommodates 30 attributes without spilling, exceeding the 16 attribute API
  minimum. For 32 attributes, we will need to use function calls or the stack.
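The register arithmetic behind the attribute layout can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the count of 128 32-bit GPRs is an assumption inferred from the 30-attribute figure above (8 + 4 × 30 = 128), not something stated in this document.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTION: 128 32-bit GPRs, inferred from the 30-attribute limit. */
#define GPR_COUNT 128u
/* Attributes start at r8; each 128-bit vector spans four 32-bit registers. */
#define FIRST_ATTRIB_REG 8u
#define REGS_PER_ATTRIB 4u

/* Hypothetical helper: do `n` attributes fit in registers without spilling? */
static inline bool attribs_fit(unsigned n)
{
   return FIRST_ATTRIB_REG + REGS_PER_ATTRIB * n <= GPR_COUNT;
}

/* Hypothetical helper: index of the first 32-bit register of `attrib`. */
static inline unsigned attrib_first_reg(unsigned attrib)
{
   assert(attribs_fit(attrib + 1));
   return FIRST_ATTRIB_REG + REGS_PER_ATTRIB * attrib;
}
```

Under that assumption, attribute 29 occupies `r124` through `r127`, so a 31st attribute has nowhere to go.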

One useful property is that the GPR usage of the combined program is equal to
the GPR usage of the main shader, because the prolog cannot write registers
higher than those read by the main shader.

Vertex prologs do not have any uniform registers allocated for preamble
optimization or constant promotion, as this adds complexity without any
legitimate use case.

For a vertex shader reading $n$ attributes, the following layout is used:

* The first $n$ 64-bit uniforms are the base addresses of each attribute.
* The next $n$ 32-bit uniforms are the associated clamps (sizes). Presently
  robustness is always used.
* The next 2x32-bit uniform is the base vertex and base instance. This must
  always be reserved because it is unknown at vertex shader compile-time whether
  any attribute will use instancing. Reserving also the base vertex allows us to
  push both conveniently with a single USC Uniform word.
* The next 16-bit uniform is the draw ID.
* For a hardware compute shader, the next 48 bits are padding.
* For a hardware compute shader, the next 64-bit uniform is a pointer to the
  input assembly buffer.

In total, the first $6n + 5$ 16-bit uniform slots are reserved for a hardware
vertex shader, or $6n + 12$ for a hardware compute shader.
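As a sanity check on those totals, the layout can be written out as offsets in 16-bit uniform slots. The sketch below is illustrative only: the struct and function names are hypothetical, and placing the base vertex before the base instance is an assumption based on the order in which the list mentions them.

```c
#include <assert.h>
#include <stdbool.h>

/* All offsets are in 16-bit uniform slots. Names are hypothetical. */
struct vs_uniform_layout {
   unsigned base_vertex;    /* 32-bit, 2 slots (ordering is an assumption) */
   unsigned base_instance;  /* 32-bit, 2 slots */
   unsigned draw_id;        /* 16-bit, 1 slot */
   unsigned input_assembly; /* 64-bit, 4 slots; hardware compute only */
   unsigned total;          /* number of reserved slots */
};

/* 64-bit base address of attribute i: 4 slots each, starting at slot 0. */
static inline unsigned attrib_base_slot(unsigned n, unsigned i)
{
   assert(i < n);
   return 4 * i;
}

/* 32-bit clamp of attribute i: 2 slots each, after the n base addresses. */
static inline unsigned attrib_clamp_slot(unsigned n, unsigned i)
{
   assert(i < n);
   return 4 * n + 2 * i;
}

static inline struct vs_uniform_layout vs_layout(unsigned n, bool hw_compute)
{
   struct vs_uniform_layout l = {0};
   unsigned slot = 6 * n; /* n addresses (4n) plus n clamps (2n) */

   l.base_vertex = slot;   slot += 2;
   l.base_instance = slot; slot += 2;
   l.draw_id = slot;       slot += 1;

   if (hw_compute) {
      slot += 3; /* 48 bits of padding */
      l.input_assembly = slot;
      slot += 4;
   }

   l.total = slot;
   return l;
}
```

For example, with three attributes the helper reports 23 reserved slots for a hardware vertex shader and 30 for a hardware compute shader, matching $6n + 5$ and $6n + 12$.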

## Fragment

When sample shading is enabled in a non-monolithic fragment shader, the fragment
shader has the following register inputs:

* `r0l = 0`. This is the hardware nesting counter.
* `r0h` is the mask of samples currently being shaded. This usually equals
  `1 << sample ID`, for "true" per-sample shading.

When sample shading is disabled, no register inputs are defined. The fragment
prolog (if present) may clobber whatever registers it pleases.

Registers have the following layout at the end of the fragment shader (read by
the fragment epilog):

* `r0l = 0` if sample shading is enabled. This is implicitly true.
* `r0h` preserved if sample shading is enabled.
* `r2` and `r3l` contain the emitted depth/stencil respectively, if
  depth and/or stencil are written by the fragment shader. Depth/stencil writes
  must be deferred to the epilog for correctness when the epilog can discard
  (i.e. when alpha-to-coverage is enabled).
* `r3h` contains the logically emitted sample mask, if the fragment shader uses
  forced early tests. This predicates the epilog's stores.
* The vec4 of 32-bit registers beginning at `r(4 * (i + 1))` contains the colour
  output for render target `i`. When dual source blending is enabled, there is
  only a single render target and the dual source colour is treated as the
  second render target (registers r8-r11).
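The register indexing used by the fragment ABI can be sketched with two small helpers. Both names are hypothetical, and the 16-sample bound is an assumption that follows only from `r0h` being a 16-bit half register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: first 32-bit register of the colour vec4 for render
 * target `rt`, following r(4 * (i + 1)). With dual source blending, the dual
 * source colour is treated as render target 1. */
static inline unsigned colour_first_reg(unsigned rt)
{
   return 4 * (rt + 1);
}

/* Hypothetical helper: the value expected in r0h for "true" per-sample
 * shading of a single sample. ASSUMPTION: at most 16 samples, since r0h is a
 * 16-bit half register. */
static inline uint16_t single_sample_mask(unsigned sample_id)
{
   assert(sample_id < 16);
   return (uint16_t)(1u << sample_id);
}
```

Render target 0 therefore lands in `r4` through `r7`, and the dual source colour in `r8` through `r11`, as described above.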

Uniform registers have the following layout:

* `u0_u1`: 64-bit render target texture heap
* `u2...u5`: Blend constant
* `u6_u7`: Root descriptor, so we can fetch the 64-bit fragment invocation
  counter address and (OpenGL only) the 64-bit polygon stipple address