Low Resolution Z Buffer
=======================

This document is based on reverse engineering of a6xx hardware. a5xx should be
similar to a6xx before gen3.

The Low Resolution Z (LRZ) buffer works much like a depth prepass: it helps
the HW avoid executing the fragment shader on fragments that would
subsequently be discarded by the depth test.

The interesting part of this feature is that it allows applications
to submit the vertices in any order.

Citing official Adreno documentation:

::

  [A Low Resolution Z (LRZ)] pass is also referred to as draw order independent
  depth rejection. During the binning pass, a low resolution Z-buffer is constructed,
  and can reject LRZ-tile wide contributions to boost binning performance. This LRZ
  is then used during the rendering pass to reject pixels efficiently before testing
  against the full resolution Z-buffer.

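As a rough illustration of the idea (not of the actual hardware layout), the
sketch below keeps one conservative depth value per block of pixels and rejects
a fragment when it cannot pass a ``LESS`` test anywhere in its block. The block
size, the function names, and the choice to store the farthest depth per block
are all assumptions made for the example.

```python
# Hypothetical model of low-resolution depth rejection for a LESS depth test.
# BLOCK, build_lrz and lrz_rejects are illustrative names, not hardware terms.
BLOCK = 8  # assumed block edge in pixels


def build_lrz(depth, width, height):
    """Store the farthest (maximum) depth of each BLOCK x BLOCK block."""
    bw = (width + BLOCK - 1) // BLOCK
    bh = (height + BLOCK - 1) // BLOCK
    lrz = [[0.0] * bw for _ in range(bh)]
    for y in range(height):
        for x in range(width):
            by, bx = y // BLOCK, x // BLOCK
            lrz[by][bx] = max(lrz[by][bx], depth[y][x])
    return lrz


def lrz_rejects(lrz, x, y, frag_z):
    """A fragment that is not nearer than the farthest depth in its block
    fails LESS against every pixel of the block, so it can be skipped."""
    return frag_z >= lrz[y // BLOCK][x // BLOCK]


# A 16x8 depth buffer: the left block holds near geometry, the right block
# still holds the clear value.
depth = [[0.25] * 8 + [1.0] * 8 for _ in range(8)]
lrz = build_lrz(depth, 16, 8)
```

With this data, a fragment at depth ``0.5`` is rejected in the left block and
kept in the right one, regardless of the order in which the geometry arrived.
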
Limitations
-----------

There are two main limitations of LRZ:

- Since LRZ is an early depth test, it cannot be used when late-z is required;
- The LRZ buffer can only be formed in one direction. Changing the depth
  comparison direction without disabling LRZ would lead to a malformed LRZ buffer.

Pre-a650 (before gen3)
----------------------

The direction is fully tracked on the CPU. In a render pass, LRZ starts with an
unknown direction. The direction is set the first time a depth write occurs;
if it changes afterwards, the direction becomes invalid and LRZ is
disabled for the rest of the render pass.

Since the direction is not tracked by the GPU, it's impossible to know whether
LRZ is enabled during construction of secondary command buffers.

For the same reason, it's impossible to reuse LRZ between render passes.

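The CPU-side tracking described in this section can be sketched as a small
state machine. This is a simplified model under stated assumptions:
``CpuLrzState``, ``Dir`` and ``on_draw`` are hypothetical names, and only
depth-writing draws are considered.

```python
from enum import Enum


class Dir(Enum):
    UNKNOWN = 0
    LESS = 1
    GREATER = 2
    INVALID = 3


class CpuLrzState:
    """Hypothetical per-render-pass tracker; names are illustrative."""

    def __init__(self):
        # Every render pass starts with an unknown direction.
        self.direction = Dir.UNKNOWN
        self.enabled = True

    def on_draw(self, depth_write, draw_dir):
        # Assumption: draws without depth writes do not affect the direction.
        if not depth_write or not self.enabled:
            return
        if self.direction is Dir.UNKNOWN:
            # The first depth write fixes the direction.
            self.direction = draw_dir
        elif self.direction is not draw_dir:
            # A later change invalidates LRZ for the rest of the pass.
            self.direction = Dir.INVALID
            self.enabled = False
```

Because this state lives only on the CPU and is reset per render pass, nothing
in it is available when a secondary command buffer is recorded separately.
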
A650+ (gen3+)
-------------

Now the LRZ direction can be tracked on the GPU. There are two parts:

- Direction byte which stores the current LRZ direction - ``GRAS_LRZ_CNTL.DIR``.
- Parameters of the last used depth view - ``GRAS_LRZ_DEPTH_VIEW``.

The idea is the same as when LRZ is tracked on the CPU: when ``GRAS_LRZ_CNTL``
is used, its direction is compared to the previously known direction,
and the direction byte is set to disabled when the directions are incompatible.

Additionally, to reuse LRZ between render passes, ``GRAS_LRZ_CNTL`` checks
if the current value of ``GRAS_LRZ_DEPTH_VIEW`` is equal to the value
stored in the buffer. If not, LRZ is disabled. This is necessary
because the depth buffer may have several layers and mip levels, while the
LRZ buffer represents only a single layer + mip level.

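A minimal model of the two checks described in this section might look as
follows. The numeric direction encoding, the representation of the depth view
as a single integer, and the names ``LrzTracking`` and ``apply_lrz_cntl`` are
assumptions for illustration; the real register layout is not described here.

```python
from dataclasses import dataclass

# Assumed encoding of the direction byte, for illustration only.
DIR_UNKNOWN, DIR_LESS, DIR_GREATER, DIR_DISABLED = range(4)


@dataclass
class LrzTracking:
    """Hypothetical model of the state stored alongside the LRZ buffer."""
    direction: int = DIR_UNKNOWN
    depth_view: int = 0


def apply_lrz_cntl(stored, cntl_dir, current_depth_view):
    """Return True if LRZ stays usable after this GRAS_LRZ_CNTL use."""
    if stored.direction == DIR_DISABLED:
        return False
    # LRZ covers a single layer + mip level, so the view must match.
    if stored.depth_view != current_depth_view:
        stored.direction = DIR_DISABLED
        return False
    if stored.direction == DIR_UNKNOWN:
        stored.direction = cntl_dir
    elif stored.direction != cntl_dir:
        # Incompatible directions: mark the direction byte as disabled.
        stored.direction = DIR_DISABLED
        return False
    return True
```

Since the state persists with the buffer rather than on the CPU, a later render
pass can run the same checks and keep using LRZ when they succeed.
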
A7XX
-------------

A7XX introduces the concept of bidirectional LRZ, where there are two LRZ
buffers, one for each direction. This way LRZ doesn't need to be disabled
when the direction changes. By default, this behavior is disabled, but the
LRZ buffers have to be allocated with this space in mind, as fast clears
will always write metadata for both.

Additionally, there are now two separate LRZ buffers (on top of one for
each direction, for a total of four). Due to concurrent binning, one can be
used for binning while the other is used for rendering. These can be
flipped between via the ``LRZ_FLIP_BUFFER`` event, which can be put inside
a conditional block for either the BV or BR.

LRZ Fast-Clear
--------------

The LRZ fast-clear buffer is initialized to zeroes and is read/written
when ``GRAS_LRZ_CNTL.FC_ENABLE`` is set. It appears to store 1 bit per block:
``0`` means the block has the original depth clear value, and ``1`` means that the
corresponding block in LRZ has been modified.

LRZ fast-clear conservatively clears the LRZ buffer. At the point where LRZ is
written, the LRZ block which corresponds to a single fast-clear bit is cleared:

- To ``0.0`` if depth comparison is ``GREATER``
- To ``1.0`` if depth comparison is ``LESS``

This way it's always valid to fast-clear.

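The lazy clear described above can be modelled as follows. ``FastClearLrz``,
``touch`` and ``conservative_clear_value`` are hypothetical names, and the model
tracks only the fast-clear bit and the cleared value, not the actual depth
update. The two clear values are the ones that reject the least for their
comparison, which is why they are always safe.

```python
def conservative_clear_value(compare_op):
    """Value a block is lazily cleared to for the given comparison."""
    if compare_op == "GREATER":
        return 0.0  # no fragment depth is below 0.0, so nothing is rejected
    if compare_op == "LESS":
        return 1.0  # no fragment depth is above 1.0, so nothing is rejected
    raise ValueError(f"unsupported comparison: {compare_op}")


class FastClearLrz:
    """Hypothetical model: one fast-clear bit per LRZ block."""

    def __init__(self, num_blocks):
        self.fc_bits = [0] * num_blocks    # 0: block still has the clear value
        self.blocks = [None] * num_blocks  # LRZ contents, unset until touched

    def touch(self, block, compare_op):
        """Clear the block on its first write and mark it as modified."""
        if self.fc_bits[block] == 0:
            self.blocks[block] = conservative_clear_value(compare_op)
            self.fc_bits[block] = 1
        return self.blocks[block]
```
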
On A7XX, the original depth clear value can be specified exactly, allowing for
fast-clear to any value rather than just ``1.0`` or ``0.0``.

LRZ Feedback
-------------

Some draws write depth but cannot contribute to LRZ during the BINNING pass,
e.g. when the fragment shader has "discard" in it. However, they can contribute
to LRZ during the RENDERING pass via the LRZ feedback mechanism. This may allow
the draws that follow to depth test against the updated LRZ, which is especially
important if such "bad" draws were at the start of the render pass.

LRZ feedback happens during the RENDERING pass when ``LRZ_FEEDBACK_ZMODE_MASK``
is set. If a draw has an ``a6xx_ztest_mode`` whose corresponding flag is set in
``LRZ_FEEDBACK_ZMODE_MASK``, its depth values are used for feedback.

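The mask test amounts to a bitwise check, sketched below. The mode names and
bit values in ``FEEDBACK_BIT`` are placeholders chosen for the example; this
document does not specify the real ``a6xx_ztest_mode`` encoding or mask layout.

```python
# Placeholder mode names and bit values, assumed for illustration only.
FEEDBACK_BIT = {
    "EARLY_Z": 0x1,
    "EARLY_LRZ_LATE_Z": 0x2,
    "LATE_Z": 0x4,
}


def draw_feeds_lrz(ztest_mode, feedback_zmode_mask):
    """True if a draw with this z-test mode contributes depth to LRZ
    during the RENDERING pass."""
    return bool(FEEDBACK_BIT[ztest_mode] & feedback_zmode_mask)
```

With a mask of ``0x4`` in this placeholder encoding, only ``LATE_Z`` draws would
feed LRZ, and a mask of ``0x0`` turns feedback off entirely.
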
LRZ feedback, alongside LRZ testing, also works during sysmem rendering.

LRZ Precision
-------------

LRZ always uses ``Z16_UNORM``. The epsilon for it is ``1.f / (1 << 16)``, which is
not enough to represent all values of ``Z32_UNORM`` or ``Z32_FLOAT``.
This especially raises questions in the context of fast-clear: if fast-clear
uses a value which cannot be precisely represented by LRZ, we wouldn't
be able to round it in the correct direction, since the direction is tracked
on the GPU.

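To make the rounding problem concrete, the sketch below quantizes a depth value
using the epsilon quoted above. The value ``1.f / (1 << 17)``, which is also the
clear value used in the test described in this section, falls exactly halfway
between two representable steps. Rounding up would be the conservative choice
for ``LESS`` and rounding down for ``GREATER``, so the right answer depends on a
direction the CPU does not know. The helper names are hypothetical.

```python
import math

Z16_STEP = 1.0 / (1 << 16)  # epsilon quoted in the text above


def z16_round_down(z):
    """Nearest representable value at or below z (safe for GREATER)."""
    return math.floor(z / Z16_STEP) * Z16_STEP


def z16_round_up(z):
    """Nearest representable value at or above z (safe for LESS)."""
    return math.ceil(z / Z16_STEP) * Z16_STEP


clear = 1.0 / (1 << 17)
lower = z16_round_down(clear)
upper = z16_round_up(clear)
```
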
However, it seems that depth comparisons with LRZ values have some "slack",
and nothing special needs to be done for such depth clear values.

How it was tested:

- Clear ``Z32_FLOAT`` attachment to ``1.f / (1 << 17)``

  - LRZ buffer contains all zeroes.

- Do draws and check whether all samples are passing:

  - ``OP_GREATER`` with ``(1.f / (1 << 17) + float32_epsilon)`` - passing;
  - ``OP_GREATER`` with ``(1.f / (1 << 17) - float32_epsilon)`` - not passing;
  - ``OP_LESS`` with ``(1.f / (1 << 17) - float32_epsilon)`` - passing;
  - ``OP_LESS`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing;
  - ``OP_LESS_OR_EQ`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing.

In all cases the resulting LRZ buffer is all zeroes and the LRZ direction is updated.

LRZ Caches
----------

``LRZ_FLUSH`` flushes and invalidates the LRZ caches. There are two caches:

- Cache for fast-clear buffer;
- Cache for direction byte + depth view params.

They can be cleared by ``LRZ_CLEAR``. To become visible in GPU memory,
the caches should be flushed with ``LRZ_FLUSH`` afterwards.

``GRAS_LRZ_CNTL`` reads from these caches.

154