1:orphan: 2 3.. _radv-debug-hang: 4 5Debugging GPU hangs with RADV 6============================= 7 8UMR (optional) 9-------------- 10 11UMR is needed for dumping a lot of useful information. Clone, build and install 12`UMR <https://gitlab.freedesktop.org/tomstdenis/umr>`__. Do not forget to run 13``chmod +s $(which umr)`` so RADV can actually access UMR. 14 15UMR needs to access some kernel debug interfaces: 16 17.. code-block:: sh 18 19 chmod 777 /sys/kernel/debug 20 chmod -R 777 /sys/kernel/debug/dri 21 22Secure boot has to be disabled as well. 23 24Generating and analyzing hang reports 25------------------------------------- 26 27With UMR installed, you can now set ``RADV_DEBUG=hang`` which makes RADV insert 28trace markers and synchronization and check for hangs. The hang report will be 29saved to ``~/radv_dumps_<pid>_<time>``. Inside the directory of the hang report, 30there are a couple of files: 31 32* ``*.spv``: SPIR-V binaries of the pipeline that was bound when the hang 33 occurred. 34* ``app_info.log``: ``VkApplicationInfo`` fields. 35* ``bo_history.log``: A list of every GPU memory allocation and deallocation. 36 If the GPU hang was caused by a page fault, you can use 37 `radv_check_va.py <https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/amd/vulkan/radv_check_va.py>`__ 38 to figure out if address is invalid or used after the memory was deallocated. 39* ``bo_ranges.log``: Address ranges that were valid at the time of submission. 40* ``dmesg.log``: Output of ``dmesg``, if available. 41* ``gpu_info.log``: Fields of ``radeon_info``. 42* ``pipeline.log``: IR of the shaders that were bound during the hang as well as 43 programm counters of waves executing said shaders and bound descriptors. 44* ``registers.log``: Various GPU state registers. 45* ``trace.log``: An annotated list of the command stream that caused the hang. 46 the commands that hung come after 47 ``!!!!! This is the last trace point that was reached by the CP !!!!!``. 48* ``umr_ring.log``: Similar to ``trace.log``. 49* ``umr_waves.log``: A list of waves that were active at the time of the hang, 50 including register values. 51* ``vm_fault.log``: The page fault address if a page fault occurred. 52 53Debugging Steam games 54--------------------- 55 56Steam games require a bit more work so RADV can access UMR: In your Steam library, 57make sure **Tools** is checked and search for **Steam Linux Runtime**. 58Under **Properties** -> **Installed Files**, click **Browse**, open 59``_v2-entry-point`` and add 60 61.. code-block:: sh 62 63 shift 2 64 exec "$@" 65 66at the top of the file. Hang debugging can be enabled by selecting the faulting 67game and adding ``RADV_DEBUG=hang %command%`` under **Properties** -> **General** 68-> **LAUNCH OPTIONS**. 69 70Debugging hangs without RADV_DEBUG=hang 71--------------------------------------- 72 73In some situations, ``RADV_DEBUG=hang`` wouldn't be able to generate a GPU hang 74report, like for synchronization issues (because it enables 75``RADV_DEBUG=syncshaders`` behind the scene). An alternative solution is to 76disable GPU recovery by adding ``amdgpu.gpu_recovery=0`` to your kernel command 77line options. And then invoke UMR manually with 78``umr --by-pci <pci_id> -O bits,halt_waves -go 0 -wa <ring> -go 1 2>&1`` for 79dumping the waves and ``umr --by-pci <pci_id> -RS <ring> 2>&1`` for dumping the 80rings once the GPU hang occurred. 81