Freedreno
=========

Freedreno is the GLES and GL driver for Adreno 2xx-6xx GPUs. It implements up
to OpenGL ES 3.2 and desktop OpenGL 4.5.

See the `Freedreno Wiki
<https://gitlab.freedesktop.org/freedreno/freedreno/-/wikis/home>`__ for more
details.

Turnip
------

Turnip is a Vulkan 1.3 driver for Adreno 6xx GPUs.

The current set of specific chip versions supported can be found in
:file:`src/freedreno/common/freedreno_devices.py`. The current set of features
supported can be found rendered at `Mesa Matrix <https://mesamatrix.net/>`__.
There are no plans to port to a5xx or earlier GPUs.

Hardware architecture
---------------------

Adreno is mostly a tile-mode renderer ("gmem"), but with the option to bypass
tiling and render directly to system memory ("sysmem"). It is UMA, using
mostly write-combined memory but with the ability to map some buffers as cache
coherent with the CPU.

.. toctree::
   :glob:

   freedreno/hw/*

Hardware acronyms
^^^^^^^^^^^^^^^^^

.. glossary::

   Cluster
      A group of hardware registers, often with multiple copies to allow
      pipelining. There is an M:N relationship between hardware blocks that do
      work and the clusters of registers for the state that hardware blocks
      use.

   CP
      Command Processor. Reads the stream of state changes and draw commands
      generated by the driver.

   PFP
      Prefetch Parser. Adreno 2xx-4xx CP component.

   ME
      Micro Engine. Adreno 2xx-4xx CP component after PFP, handles most PM4
      commands.

   SQE
      a6xx+ replacement for PFP/ME. This is the microcontroller that runs the
      microcode (loaded from Linux) which actually processes the command
      stream and writes to the hardware registers. See `afuc
      <https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/freedreno/afuc/README.rst>`__.

   ROQ
      DMA engine used by the SQE for reading memory, with some prefetch
      buffering. Mostly reads in the command stream, but also serves for
      ``CP_MEMCPY``/``CP_MEM_TO_REG`` and visibility stream reads.

   SP
      Shader Processor. Unified, scalar shader engine. One or more, depending
      on GPU and tier.

   TP
      Texture Processor.

   UCHE
      Unified L2 Cache. 32KB on A330, unclear how big now.

   CCU
      Color Cache Unit.

   VSC
      Visibility Stream Compressor.

   PVS
      Primitive Visibility Stream.

   FE
      Front End? Index buffer and vertex attribute fetch cluster. Includes PC,
      VFD, VPC.

   VFD
      Vertex Fetch and Decode.

   VPC
      Varying/Position Cache? Hardware block that stores shaded vertex data
      for primitive assembly.

   HLSQ
      High Level Sequencer. Manages state for the SPs, batches up PS
      invocations between primitives, is involved in preemption.

   PC_VS
      Cluster where varyings are read from VPC and assembled into primitives
      to feed GRAS.

   VS
      Vertex Shader. Responsible for generating VS/GS/tess invocations.

   GRAS
      Rasterizer. Responsible for generating PS invocations from primitives,
      also does LRZ.

   PS
      Pixel Shader.

   RB
      Render Backend. Performs both early and late Z testing, blending, and
      attachment stores of the output of the PS.

   GMEM
      Roughly 128KB-1MB of memory on the GPU (SKU-dependent), used to store
      attachments during tiled rendering.

   LRZ
      Low Resolution Z. A low resolution area of the depth buffer that can be
      initialized during the binning pass to contain the worst-case (farthest)
      Z values in a block, and then used to early reject fragments during
      rasterization.
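Putting these blocks together, the overall flow of a tiled ("gmem") render
pass versus the direct "sysmem" path looks roughly like the conceptual sketch
below. This is illustration only, not actual driver code: all types and helper
names are made-up stand-ins for the real command-stream generation in
Freedreno/Turnip.

.. code-block:: c

   /* Conceptual sketch only: the shape of a tiled ("gmem") render pass versus
    * the direct "sysmem" path.  All types and helpers are hypothetical. */
   #include <stdbool.h>

   struct pass { bool sysmem; int num_bins; };

   static void emit_draws(struct pass *p) { (void)p; /* record all draws */ }
   static void emit_binning_pass(struct pass *p) { (void)p; /* VSC + LRZ init */ }
   static void emit_gmem_load_or_clear(int bin) { (void)bin; }
   static void emit_draws_for_bin(struct pass *p, int bin) { (void)p; (void)bin; }
   static void emit_gmem_resolve(int bin) { (void)bin; }

   void render_pass(struct pass *pass)
   {
      if (pass->sysmem) {
         /* Bypass tiling: render directly to system memory. */
         emit_draws(pass);
         return;
      }

      /* Binning pass: build per-bin visibility streams (VSC) and optionally
       * initialize LRZ with worst-case Z per block. */
      emit_binning_pass(pass);

      /* For each bin: load/clear attachments into GMEM, replay only the draws
       * visible in that bin, then resolve GMEM back out to system memory. */
      for (int bin = 0; bin < pass->num_bins; bin++) {
         emit_gmem_load_or_clear(bin);
         emit_draws_for_bin(pass, bin);
         emit_gmem_resolve(bin);
      }
   }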
Cache hierarchy
^^^^^^^^^^^^^^^

The a6xx GPUs have two main caches: CCU and UCHE.

UCHE (Unified L2 Cache) is the cache behind the vertex fetch, VSC writes,
texture L1, LRZ, and storage image accesses (``ldib``/``stib``). Misses and
flushes access system memory.

The CCU is the separate cache used by 2D blits and sysmem render target access
(and also for resolves to system memory when in GMEM mode). Its memory comes
from a carveout of GMEM controlled by ``RB_CCU_CNTL``, with a varying amount
reserved based on whether we're in a render pass using GMEM for attachment
storage, or we're doing sysmem rendering. Cache entries have the attachment
number and layer mixed into the cache tag in some way, likely so that a
fragment's access is spread through the cache even if the attachments are the
same size and alignments in address space. This means that the cache must be
flushed and invalidated between memory being used for one attachment and
another (notably depth vs color, but also MRT color).

The Texture Processors (TP) additionally have a small L1 cache (1KB on A330,
unclear how big now) before accessing UCHE. This cache is used for normal
sampling like ``sam`` and ``isam`` (and the compiler will make read-only
storage image access through it as well). It is not coherent with UCHE (you
may get stale results when you ``sam`` after ``stib``), but it must get
flushed per draw or something, because you don't need a manual invalidate
between draws storing to an image and draws sampling from a texture.

The command processor (CP) does not read from either of these caches, and
instead uses FIFOs in the ROQ to avoid stalls reading from system memory.

Draw states
^^^^^^^^^^^

The SQE is not a fast processor, and tiled rendering means that many draws
won't even be used in many bins. So, starting with a5xx, state updates can be
batched up into "draw states" that point to a fragment of CP packets. At draw
time, if the draw call is going to actually execute (some primitive is visible
in the current tile), the SQE goes through the ``GROUP_ID``\s and, for any
with an update since the last time they were executed, it executes the
corresponding fragment.

Starting with a6xx, states can be tagged with whether they should be executed
at draw time for any of sysmem, binning, or tile rendering. This allows a
single command stream to be generated which can be executed in any of the
modes, unlike pre-a6xx where we had to generate separate command lists for the
binning and rendering phases.

Note that this means that the generated draw state has to always update all of
the state you have chosen to pack into that ``GROUP_ID``, since any of your
previous state changes in a previous draw state command may have been skipped.
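As a rough illustration, a draw state group entry boils down to a dword count,
flags saying in which modes the fragment should be executed, a ``GROUP_ID``,
and the GPU address of the fragment of CP packets. The sketch below is not the
actual Turnip/Freedreno helpers, and the bit positions are assumptions; the
authoritative layout is the ``CP_SET_DRAW_STATE`` packet definition in the
register database.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* One group entry of a CP_SET_DRAW_STATE-style packet (illustrative only;
    * field positions here are assumptions, not the real encoding). */
   struct draw_state_entry {
      uint32_t hdr;
      uint64_t fragment_iova;
   };

   static struct draw_state_entry
   draw_state_entry(uint32_t group_id, uint32_t count_dwords, uint64_t iova,
                    bool binning, bool gmem, bool sysmem)
   {
      struct draw_state_entry e;
      e.hdr = count_dwords |              /* size of the state fragment       */
              (binning ? 1u << 20 : 0) |  /* execute during the binning pass  */
              (gmem    ? 1u << 21 : 0) |  /* execute during tiled rendering   */
              (sysmem  ? 1u << 22 : 0) |  /* execute during sysmem rendering  */
              (group_id << 24);           /* which GROUP_ID this updates      */
      e.fragment_iova = iova;             /* address of the fragment of CP packets */
      return e;
   }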
Pipelining (a6xx+)
^^^^^^^^^^^^^^^^^^

Most CP commands write to registers. In a6xx+, the registers are located in
clusters corresponding to the stage of the pipeline they are used from (see
``enum tu_stage`` for a list). To pipeline state updates and drawing,
registers generally have two copies ("contexts") in their cluster, so previous
draws can be working on the previous set of register state while the next
draw's state is being set up. You can find which registers go into which
clusters by looking at :command:`crashdec` output in the ``regs-name:
CP_MEMPOOL`` section.

As the SQE processes register writes in the command stream, it sends them into
a per-cluster queue stored in ``CP_MEMPOOL``. This allows the pipeline stages
to process their stream of register updates and events independently of each
other (so even with just 2 contexts in a stage, earlier stages can proceed on
to later draws before later stages have caught up).

Each cluster has a per-context bit indicating that the context is done/free.
Register writes will stall on the context being done.

During a 3D draw command, the SQE generates several internal events that flow
through the pipeline:

- ``CP_EVENT_START`` clears the done bit for the context when written to the
  cluster.
- ``PC_EVENT_CMD``/``PC_DRAW_CMD``/``HLSQ_EVENT_CMD``/``HLSQ_DRAW_CMD`` kick
  off the actual event/drawing.
- ``CONTEXT_DONE`` event completes after the event/draw is complete and sets
  the done flag.
- ``CP_EVENT_END`` waits for the done flag on the next context, then copies
  all the registers that were dirtied in this context to that one.

The 2D blit engine has its own ``CP_2D_EVENT_START``, ``CP_2D_EVENT_END``, and
``CONTEXT_DONE_2D``, so the 2D and 3D register contexts can do separate
context rollover.

Because the clusters proceed independently of each other even across draws, if
you need to synchronize an earlier cluster to the output of a later one, then
you will need to ``CP_WAIT_FOR_IDLE`` after flushing and invalidating any
necessary caches.

Also, note that some registers are not banked at all, and will require a
``CP_WAIT_FOR_IDLE`` for any previous usage of the register to complete.

In a2xx-a4xx, there weren't per-stage clusters; instead there were two
register banks that were flipped between per draw.

Bindless/Bindful Descriptors (a6xx+)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Starting with a6xx, cat5 (texture) and cat6 (image/SSBO/UBO) instructions are
extended to support bindless descriptors.

In the old bindful model, descriptors are separate for textures, samplers,
UBOs, and IBOs (combined descriptor for images and SSBOs), with separate
registers for the memory containing the array of descriptors, and/or different
``STATE_TYPE`` and ``STATE_BLOCK`` for ``CP_LOAD_STATE``/``_FRAG``/``_GEOM``
to pre-load the descriptors into cache:

- textures - per-shader-stage

  - registers: ``SP_xS_TEX_CONST``/``SP_xS_TEX_COUNT``
  - state-type: ``ST6_CONSTANTS``
  - state-block: ``SB6_xS_TEX``

- samplers - per-shader-stage

  - registers: ``SP_xS_TEX_SAMP``
  - state-type: ``ST6_SHADER``
  - state-block: ``SB6_xS_TEX``

- UBOs - per-shader-stage

  - registers: none
  - state-type: ``ST6_UBO``
  - state-block: ``SB6_xS_SHADER``

- IBOs - global across 3d shader stages, separate for compute shaders

  - registers: ``SP_IBO``/``SP_IBO_COUNT`` or ``SP_CS_IBO``/``SP_CS_IBO_COUNT``
  - state-type: ``ST6_SHADER``
  - state-block: ``SB6_IBO`` or ``SB6_CS_IBO`` for compute shaders
  - Note: unlike per-shader-stage descriptors, ``CP_LOAD_STATE6`` is used,
    as opposed to ``CP_LOAD_STATE6_GEOM`` or ``CP_LOAD_STATE6_FRAG``
    depending on shader stage.

.. note::
   For the per-shader-stage registers and state-blocks the ``xS`` notation
   refers to per-shader-stage names, e.g. ``SP_FS_TEX_CONST`` or ``SB6_DS_TEX``.

Textures and IBOs (images) use *basically* the same 64-byte descriptor format,
with some exceptions (for example, for IBOs cubemaps are handled as a 2D
array). SSBOs are just untyped buffers, but otherwise use the same descriptors
and instructions as images. Samplers use a 16-byte descriptor, and UBOs use an
8-byte descriptor which packs the size in the upper 15 bits of the UBO
address.
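As a small illustration of the UBO case, the 8-byte descriptor is just a
64-bit address with the size folded into the top 15 bits. This is a hedged
sketch, not the driver's descriptor-packing code; in particular, the unit in
which the size is expressed is an assumption here.

.. code-block:: c

   #include <stdint.h>

   /* Pack a 2-dword UBO descriptor: 64-bit GPU address with the UBO size in
    * the upper 15 bits (bits 17..31 of the second dword).  The unit of
    * "size" is an assumption made for illustration. */
   static void pack_ubo_descriptor(uint32_t desc[2], uint64_t iova, uint32_t size)
   {
      desc[0] = (uint32_t)iova;
      desc[1] = (uint32_t)(iova >> 32) | (size << 17);
   }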
In the bindless model, descriptors are split into 5 descriptor sets, which are
global across shader stages (but, as with bindful IBO descriptors, separate
for 3d stages vs the compute stage). Each HW descriptor set is an array of
descriptors of configurable size (each descriptor set can be configured for a
descriptor pitch of 8 bytes or 64 bytes). Each descriptor can be of arbitrary
format (i.e. UBOs/IBOs/textures/samplers interleaved); its interpretation by
the HW is determined by the instruction that references the descriptor. Each
descriptor set can contain at least 2^16 descriptors.

The HW is configured with the base address of each descriptor set via an array
of "BINDLESS_BASE" registers, i.e. ``SP_BINDLESS_BASE[n]``/``HLSQ_BINDLESS_BASE[n]``
for 3d shader stages, or ``SP_CS_BINDLESS_BASE[n]``/``HLSQ_CS_BINDLESS_BASE[n]``
for compute shaders, with the descriptor pitch encoded in the low bits.
Which of the descriptor sets is referenced is encoded via three bits in the
instruction. The address of a descriptor is calculated as::

   descriptor_addr = (BINDLESS_BASE[n] & ~0x3) +
                     (idx * 4 * (2 << (BINDLESS_BASE[n] & 0x3)))

.. note::
   Turnip reserves one descriptor set for internal use and exposes the other
   four to the application via the Vulkan API.
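For illustration, the same address calculation can be written out in C as a
direct transcription of the formula above (``bindless_base`` here is just the
64-bit register value, with the pitch field in its two low bits):

.. code-block:: c

   #include <stdint.h>

   /* Compute the GPU address of descriptor "idx" in a bindless descriptor
    * set, per the formula above.  A low-bits value of 0 gives an 8-byte
    * descriptor pitch, 3 gives 64 bytes. */
   static uint64_t bindless_descriptor_addr(uint64_t bindless_base, uint32_t idx)
   {
      uint64_t set_base = bindless_base & ~(uint64_t)0x3;
      uint32_t pitch = 4 * (2u << (bindless_base & 0x3));   /* 8 or 64 bytes */
      return set_base + (uint64_t)idx * pitch;
   }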
Software Architecture
---------------------

Freedreno and Turnip use a shared core for the shader compiler, image layout,
and register and command stream definitions. They implement separate state
management and command stream generation.

.. toctree::
   :glob:

   freedreno/*

GPU devcoredump
^^^^^^^^^^^^^^^

A kernel message from DRM of "gpu fault" can mean any sort of error reported
by the GPU (including its internal hang detection). If a fault in GPU address
space happened, you should expect to find a message from the iommu, with the
faulting address and the hardware unit involved:

.. code-block:: text

   *** gpu fault: ttbr0=000000001c941000 iova=000000010066a000 dir=READ type=TRANSLATION source=TP|VFD (0,0,0,1)

On a GPU fault or hang, a GPU core dump is taken by the DRM driver and saved
to ``/sys/devices/virtual/devcoredump/**/data``. You can cp that file to a
:file:`crash.devcore` to save it, otherwise the kernel will expire it
eventually. Echo 1 to the file to free the core early, as another core won't
be taken until then.

Once you have your core file, you can use :command:`crashdec -f crash.devcore`
to decode it. The output will have ``ESTIMATED CRASH LOCATION`` where we
estimate the CP to have stopped. Note that it is expected that this will be
some distance past whatever state triggered the fault, given GPU pipelining,
and will often be at some ``CP_REG_TO_MEM`` (which waits on previous WFIs) or
``CP_WAIT_FOR_ME`` (which waits for all register writes to land) or similar
event. You can try running the workload with ``TU_DEBUG=flushall`` or
``FD_MESA_DEBUG=flush`` to try to close in on the failing commands.

You can also find what commands were queued up to each cluster in the
``regs-name: CP_MEMPOOL`` section.

If ``ESTIMATED CRASH LOCATION`` doesn't exist, you could look at
``CP_SQE_STAT``, though going here is the last resort and likely won't be
helpful.

.. code-block::

   indexed-registers:
     - regs-name: CP_SQE_STAT
       dwords: 51
          PC: 00d7 <-------------
         PKT: CP_LOAD_STATE6_FRAG
         $01: 70348003  $11: 00000000
         $02: 20000000  $12: 00000022

The ``PC`` value is an instruction address in the current firmware. You would
need to disassemble the firmware (:file:`/lib/firmware/qcom/aXXX_sqe.fw`) via:

.. code-block:: sh

   afuc-disasm -v a650_sqe.fw > a650_sqe.fw.disasm

Now you should search for the PC value in the disassembly, e.g.:

.. code-block::

   l018: 00d1: 08dd0001  add $addr, $06, 0x0001
         00d2: 981ff806  mov $data, $data
         00d3: 8a080001  mov $08, 0x0001 << 16
         00d4: 3108ffff  or $08, $08, 0xffff
         00d5: 9be8f805  and $data, $data, $08
         00d6: 9806e806  mov $addr, $06
         00d7: 9803f806  mov $data, $03 <------------- HERE
         00d8: d8000000  waitin
         00d9: 981f0806  mov $01, $data

Command Stream Capture
^^^^^^^^^^^^^^^^^^^^^^

During Mesa development, it's often useful to look at the command streams we
send to the kernel. We have an interface for the kernel to capture all
submitted command streams:

.. code-block:: sh

   cat /sys/kernel/debug/dri/0/rd > cmdstream &

By default, command stream capture does not capture texture/vertex/etc. data.
You can enable capturing all the BOs with:

.. code-block:: sh

   echo Y > /sys/module/msm/parameters/rd_full

Note that, since all command streams get captured, it is easy to run the
system out of memory doing this, so you probably don't want to enable it
during play of a heavyweight game. Instead, to capture a command stream within
a game, you probably want to cause a crash in the GPU during a frame of
interest so that a single GPU core dump is generated. Emitting ``0xdeadbeef``
in the CS should be enough to cause a fault.

The ``fd_rd_output`` facilities provide support for generating the command
stream capture from inside Mesa. Different ``FD_RD_DUMP`` options are
available:

- ``enable`` simply enables dumping the command stream on each submit for a
  given logical device. When a more advanced option is specified, ``enable``
  is implied.
- ``combine`` will combine all dumps into a single file instead of writing the
  dump for each submit into a standalone file.
- ``full`` will dump every buffer object, which is necessary for replays of
  command streams (see below).
- ``trigger`` will establish a trigger file through which dumps can be better
  controlled. Writing a positive integer value into the file will enable
  dumping of that many subsequent submits. Writing -1 will enable dumping of
  submits until disabled. Writing 0 (or any other value) will disable dumps.

Output dump files and the trigger file (when enabled) are hard-coded to be
placed under ``/tmp``, or ``/data/local/tmp`` under Android.
``FD_RD_DUMP_TESTNAME`` can be used to specify a more descriptive prefix for
the output or trigger files.

The functionality is generic to any Freedreno-based backend, but is currently
only integrated in the MSM backend of Turnip. Using the existing
``TU_DEBUG=rd`` option will translate to ``FD_RD_DUMP=enable``.

Capturing Hang RD
+++++++++++++++++

The devcore file doesn't contain all submitted command streams, only the
hanging one. Additionally, it is geared towards analyzing the GPU state at the
moment of the crash.

Alternatively, it's possible to obtain the whole submission with all command
streams via ``/sys/kernel/debug/dri/0/hangrd``:

.. code-block:: sh

   sudo cat /sys/kernel/debug/dri/0/hangrd > logfile.rd  # Do the cat _before_ the expected hang

The format of hangrd is the same as in ordinary command stream capture.
``rd_full`` also has the same effect on it.

Replaying Command Stream
^^^^^^^^^^^^^^^^^^^^^^^^

The ``replay`` tool allows replaying captured ``rd`` files to reproduce GPU
faults. It is especially useful for transient GPU issues, since replaying has
a much higher chance of reproducing them.

Dumping rendering results or even just memory is currently unsupported.

- Replaying command streams requires a kernel with ``MSM_INFO_SET_IOVA``
  support.
- It requires the ``rd`` capture to have full snapshots of the memory
  (``rd_full`` is enabled).

Replaying is done via the ``replay`` tool:

.. code-block:: sh

   ./replay test_replay.rd

More examples:

.. code-block:: sh

   ./replay --first=start_submit_n --last=last_submit_n test_replay.rd

.. code-block:: sh

   ./replay --override=0 test_replay.rd

Editing Command Stream (a6xx+)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While replaying a fault is useful in itself, modifying the capture to
understand what causes the fault can be even more useful.

``rddecompiler`` decompiles a single cmdstream from an ``rd`` file into
compilable C source. Given the address space bounds, the generated program
creates a new ``rd`` which can be used to override a cmdstream with
``replay``. The generated ``rd`` is not replayable on its own and depends on
buffers provided by the source ``rd``.

The C source can be compiled by putting it into
:file:`src/freedreno/decode/generate-rd.cc`.

The workflow would look like this:

1. Find the number of the cmdstream you want to edit;
2. Decompile it:

.. code-block:: sh

   ./rddecompiler -s %cmd_stream_n% example.rd > src/freedreno/decode/generate-rd.cc

3. Edit the command stream;
4. Compile and deploy Freedreno tools;
5. Plug the generator into cmdstream replay:
.. code-block:: sh

   ./replay --override=%cmd_stream_n%

6. Repeat 3-5.

GPU Hang Debugging
^^^^^^^^^^^^^^^^^^

This is not a step-by-step guide, but mostly an enumeration of available
methods.

Useful ``TU_DEBUG`` (for Turnip) options to narrow down the cause of a hang:

``sysmem``, ``gmem``, ``nobin``, ``forcebin``, ``noubwc``, ``nolrz``,
``flushall``, ``syncdraw``, ``rast_order``

Useful ``FD_MESA_DEBUG`` (for Freedreno) options:

``sysmem``, ``gmem``, ``nobin``, ``noubwc``, ``nolrz``, ``notile``,
``dclear``, ``ddraw``, ``flush``, ``inorder``, ``noblit``

Useful ``IR3_SHADER_DEBUG`` options:

``nouboopt``, ``spillall``, ``nopreamble``, ``nofp16``

Use the Graphics Flight Recorder to narrow down the place which hangs, or use
our own breadcrumbs implementation in case of unrecoverable hangs.

In case of faults, use RenderDoc to find the problematic command. If it's a
draw call, edit the shaders in RenderDoc to find whether the culprit is a
shader. If so, bisect it.

If editing the shader perturbs the assembly too much and the issue becomes
unreproducible, try editing the assembly itself via
``IR3_SHADER_OVERRIDE_PATH``.

If the fault or hang is transient, try capturing an ``rd`` and replaying it.
If the issue reproduces, bisect the GPU packets until the culprit is found.
Do the above if the culprit is not a shader.

The hang recovery mechanism in the kernel is not perfect; in case of
unrecoverable hangs, check whether the kernel is up to date and look for
unmerged patches which could improve the recovery.

GPU Breadcrumbs
+++++++++++++++

The breadcrumbs described below are available only in Turnip.

Freedreno has simpler breadcrumbs: in debug builds it writes breadcrumbs into
``CP_SCRATCH_REG[6]`` and per-tile breadcrumbs into ``CP_SCRATCH_REG[7]``, so
they are available in the devcoredump. TODO: generalize Turnip's breadcrumbs
implementation.

This is a simple implementation of breadcrumbs for tracking GPU progress,
intended to be a last resort when debugging unrecoverable hangs. For best
results, use Vulkan traces to have a predictable place of hang.

For ordinary hangs, the more user-friendly solution is GFR
(Graphics Flight Recorder).

Our breadcrumbs implementation aims to handle cases where nothing can be done
after the hang. In-driver breadcrumbs also allow more precise tracking, since
we can target a single GPU packet.

While breadcrumbs support gmem, try to reproduce the hang in sysmem mode,
because it requires far fewer breadcrumb writes and syncs.

Breadcrumbs settings:

.. code-block:: sh

   TU_BREADCRUMBS=%IP%:%PORT%,break=%BREAKPOINT%:%BREAKPOINT_HITS%

``BREAKPOINT``
   The breadcrumb starting from which we require an explicit ack.
``BREAKPOINT_HITS``
   How many times the breakpoint should be reached for the break to occur.
   Necessary for gmem mode and re-usable cmdbuffers, in both of which the same
   cmdstream could be executed several times.

A typical workflow would be:

- Start listening for breadcrumbs on a remote host:

.. code-block:: sh

   nc -lvup $PORT | stdbuf -o0 xxd -pc -c 4 | awk -Wposix '{printf("%u:%u\n", "0x" $0, a[$0]++)}'

- Start capturing the command stream;
- Replay the hanging trace with:

.. code-block:: sh

   TU_BREADCRUMBS=$IP:$PORT,break=-1:0

- Increase the hangcheck period:
.. code-block:: sh

   echo -n 60000 > /sys/kernel/debug/dri/0/hangcheck_period_ms

- After a GPU hang, note the last breadcrumb and relaunch the trace with:

.. code-block:: sh

   TU_BREADCRUMBS=%IP%:%PORT%,break=%LAST_BREADCRUMB%:%HITS%

- After the breakpoint is reached, each breadcrumb requires an explicit ack
  from the user. This way it's possible to find the last packet which didn't
  hang.

- Find the packet in the decoded cmdstream.

Debugging random failures
^^^^^^^^^^^^^^^^^^^^^^^^^

In most cases random GPU faults and rendering artifacts are caused by some
kind of undefined behavior that falls under the following categories:

- Usage of a stale reg value;
- Usage of stale memory (e.g. expecting it to be zeroed when it is not);
- Lack of proper synchronization.

Finding instances of stale reg reads
++++++++++++++++++++++++++++++++++++

Turnip has a debug option to stomp registers with invalid values in order to
catch the cases where stale data is read:

.. code-block:: sh

   MESA_VK_ABORT_ON_DEVICE_LOSS=1 \
   TU_DEBUG_STALE_REGS_RANGE=0x00000c00,0x0000be01 \
   TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass \
   ./app

.. envvar:: TU_DEBUG_STALE_REGS_RANGE

   The register range in which registers will be stomped. Add ``inverse`` to
   the flags in order for this range to specify which registers NOT to stomp.

.. envvar:: TU_DEBUG_STALE_REGS_FLAGS

   ``cmdbuf``
      stomp registers at the start of each command buffer.
   ``renderpass``
      stomp registers before each render pass.
   ``inverse``
      changes ``TU_DEBUG_STALE_REGS_RANGE`` meaning to
      "regs that should NOT be stomped".

The best way to pinpoint the register which causes a failure is to bisect the
register range. In the case where a failure is caused by a combination of
several registers, the ``inverse`` flag may be set to find the register which
prevents the failure.